Share

Optimizing Workload with Teradata's Sample Statistics Feature

Learn how to gather statistical samples in Teradata systems to reduce workload and generate comparable execution plans through sample statistics.

Optimizing Workload with Teradata's Sample Statistics Feature
tune3

The Teradata Collect Statistics Using Sample feature explained

Utilizing statistical samples in a Teradata system can lessen the workload and produce execution plans equivalent to those resulting from complete statistics when implemented with care.

What are sample statistics? The optimizer utilizes a 2% row sample to gather demographic information. The sample size can be adjusted by executing the statement below, and the new sample size will remain valid until the session ends:

DIAGNOSTIC "COLLECTSTATS, SAMPLESIZE=n" ON FOR SESSION;

When can sample statistics be applied effectively? Ensuring the table is not skewed is crucial in obtaining accurate sample statistics. To determine skewness, execute the following query:


Want more practical data engineering analysis like this?

Join DWHPro Letters and get field-tested notes on Teradata, Snowflake, AI, migrations, performance, and enterprise data work. Early subscribers keep launch access before the paid plan launches.

Get the next issue


SELECT HASHAMP (HASHBUCKET (HASHROW ())) , COUNT (*)FROM <TABLE>GROUP BY 1;

Sample statistics are adequate for skew values that are less than or equal to 5%. Full statistics are unnecessary for small tables, but for large tables, complete statistics can use up a significant amount of CPU and Disk IO resources. In these instances, utilizing sample statistics can be a viable option.

COLLECT STATISTICS USING SAMPLE ON <TABLE> COLUMN (<COLUMN>);

Global temporary tables, join indices, and volatile tables do not support statistics collection.

For more information on Teradata Statistics, please refer to Teradata Statistics: Everything You Need to Know.


Planning or surviving an enterprise data platform migration?

I write regularly about the performance, cost, architecture, and project mistakes that show up in real Teradata, Snowflake, Databricks, and enterprise data work.

Subscribe before the paid plan launches and keep launch access.

Written by Roland Wenzlofsky, founder of DWHPro and author of Teradata Query Performance Tuning. DWHPro has helped data warehouse practitioners for 15+ years.

Subscribe to DWHPro Letters

Practical field notes on enterprise data engineering, production AI systems, platform migration, and the senior engineering market.
Written by Roland Wenzlofsky Founder of DWHPro Author of Teradata Query Performance Tuning
Get the next issue
Subscribe