Optimizing Performance with Teradata Summary Statistics
Discover how Teradata summary statistics provide valuable information about the number of rows in a table, helping to optimize performance tuning.
Teradata summary statistics were introduced in Release 14 to give the Optimizer vital information on table row counts. These statistics are valuable for performance tuning as they can be efficiently gathered and are crucial in creating an execution plan. They are especially useful when collecting column statistics requires significant resources. The syntax for collecting summary statistics is simple:
COLLECT SUMMARY STATISTICS ON <TABLE>;Teradata summary statistics include the number of rows, average data block size, and average row size per table. The Teradata Optimizer heavily relies on the number of rows per table to approximate the cost of scanning a table, underscoring its critical importance.
In previous Teradata versions lacking summary statistics, gathering statistics on the "PARTITION" dummy column and primary index columns was imperative.
Want more practical data engineering analysis like this?
Join DWHPro Letters and get field-tested notes on Teradata, Snowflake, AI, migrations, performance, and enterprise data work. Early subscribers keep launch access before the paid plan launches.
When summary statistics are defined for a table, they are refreshed whenever any column statistics are updated. To optimize resource usage, it is recommended to update all column statistics at once, as shown below:
COLLECT STATISTICS ON <TABLE>; -- all column statistics and summary statistics are refreshed. Although less resource-intensive than traditional column/index statistics, one should still avoid the following approach:
COLLECT STATISTICS ON <TABLE> <COLUMN_1>; -- first collection of summary statistics
...
COLLECT STATISTICS ON <TABLE> <COLUMN_N>; -- needless nth collection of summary statisticsTeradata summary statistics serve as a great starting point for gathering table-level statistics. When trying to update table statistics for a table lacking any column statistics, an error will arise:
COLLECT STATISTICS ON <TABLE>;
COLLECT failed. [3624] There are no statistics defined for the table.Efficiently generating preliminary summary statistics for all tables can resolve this issue.
Get the next issue by email.
COLLECT SUMMARY STATISTICS ON <TABLE>;COLLECT STATISTICS ON <TABLE>;
Collect complete. 1 rows processedTeradata Statistics FAQ
What are Teradata statistics used for?
Teradata statistics are essential for the optimizer to generate efficient and accurate query execution plans. By collecting statistics on tables, columns, and indexes, the optimizer can better understand the data distribution, cardinality, and uniqueness within the database. This information helps the optimizer make informed decisions about join methods, access paths, and resource allocation when processing queries. In essence, Teradata statistics are crucial for improving query performance and ensuring optimal system resource usage.
How to show statistics in Teradata?
To show statistics in Teradata, you can use the HELP STATISTICS command. This command displays the available statistics for a specific table, index, or column. The syntax for the HELP STATISTICS command is as follows:
HELP STATISTICS database_name.table_name;
Replace database_name and table_name with the appropriate database and table names for which you want to display the statistics. You can also specify a particular column or index by appending it to the table name like this:
HELP STATISTICS database_name.table_name.column_name;
How often do you collect statistics in Teradata?
The frequency of collecting statistics in Teradata depends on various factors, such as the nature of the data, the rate of data changes, and the impact on query performance. As a general guideline, you should consider collecting statistics when:
- A table is newly created or loaded with data.
- A significant amount of data has been added, updated, or deleted (typically around 10% or more of the table's total data).
- You observe a decline in query performance or the optimizer starts generating suboptimal execution plans.
You may need to collect statistics more often for tables with frequently changing data. Conversely, you can collect statistics less frequently for tables with relatively static data.
Why do we collect stats in Teradata?
Collecting statistics in Teradata is essential for maintaining optimal query performance and resource utilization. By gathering information on data distribution, cardinality, and uniqueness, the optimizer can make informed decisions when creating query execution plans. Collecting statistics helps the optimizer:
- Choose the most efficient join methods and access paths for the given data.
- Estimate the number of rows each operation will process and return, allowing for better resource allocation and parallelism.
- Determine the cost of different execution plan options, leading to selecting the most efficient plan.
In summary, collecting stats in Teradata is critical for ensuring efficient query execution and maximizing system performance.
Planning or surviving an enterprise data platform migration?
I write regularly about the performance, cost, architecture, and project mistakes that show up in real Teradata, Snowflake, Databricks, and enterprise data work.
Subscribe before the paid plan launches and keep launch access.
Written by Roland Wenzlofsky, founder of DWHPro and author of Teradata Query Performance Tuning. DWHPro has helped data warehouse practitioners for 15+ years.