Improvements in Tailoring Statistics Collection with Teradata 14.00: Using Sample, MaxValueLength, and MaxIntervals Options
Learn about the latest improvements in Teradata 14.00 that allow you to customize the collection of statistics to better suit your needs. These improvements include the option to set different sample sizes, consider more bytes for histogram creation, and choose the number of intervals for building s
Teradata 14.00 offers three enhancements that enable you to customize statistic collection to suit your requirements better:
COLLECT STATISTICS USING SAMPLE n PERCENT COLUMN(column_name) ON table_name;COLLECT STATISTICS USING MAXINTERVALS n COLUMN(column_name) ON table_name;COLLECT STATISTICS USING MAXVALUELENGTH n COLUMN(column_name) ON table_name;The USING SAMPLE option
Previously, a single global percentage for sampling table rows was applied to all statistics, even when SAMPLE statistics were already available. However, with Teradata 14.00, you now have the ability to specify a unique sample size for each collection statement.
Want more practical data engineering analysis like this?
Join DWHPro Letters and get field-tested notes on Teradata, Snowflake, AI, migrations, performance, and enterprise data work. Early subscribers keep launch access before the paid plan launches.
The USING MAXVALUELENGTH option
Get the next issue by email.
Teradata statistics information is created by considering a limited number of bytes in a column, starting from the leftmost character.
Teradata 14.00 increased the differentiation threshold for individual row entries from 16 to 25 bytes. However, columns that consume more bytes are still considered equal. Additionally, character columns no longer undergo padding with spaces, which used to occur for all row entries consuming fewer than the threshold level of bytes prior to Teradata 14.00.
Although the default value is presently 25 bytes, you may encounter wider columns, such as VARCHAR(2000), in which most entries vary primarily in details towards the right end of the string.
To accurately distinguish between these entries, character counts beyond position 25 and up to 2000 must be included for histogram generation. To accomplish this, the MAXVALUELENGTH option can be utilized to expand the number of bytes analyzed.
The USING MAXINTERVALS option
A new Teradata statistics feature pertains to the number of intervals utilized in constructing the statistics histogram. Each interval incorporates the most common value and its corresponding record count within the interval.
In prior versions, the histogram class number remained constant (currently set to 250 intervals by default on Teradata 14); however, it is now possible to select a value ranging from 0 to 500. What benefits can be derived from altering the interval count?
Increasing the number of classes in the histogram improves the chances of capturing localized value concentrations. This is particularly beneficial for optimization tasks when the distribution of row entries displays numerous sharp peaks and valleys concentrated in a small range of potential values. Aggregating them into a single histogram class can lead to distortion.
Planning or surviving an enterprise data platform migration?
I write regularly about the performance, cost, architecture, and project mistakes that show up in real Teradata, Snowflake, Databricks, and enterprise data work.
Subscribe before the paid plan launches and keep launch access.
Written by Roland Wenzlofsky, founder of DWHPro and author of Teradata Query Performance Tuning. DWHPro has helped data warehouse practitioners for 15+ years.