Optimizing Workload with Teradata's Sample Statistics Feature

Learn how to gather statistical samples in Teradata systems to reduce workload and generate comparable execution plans through sample statistics.

tune3

The Teradata Collect Statistics Using Sample feature explained

Utilizing statistical samples in a Teradata system can lessen the workload and produce execution plans equivalent to those resulting from complete statistics when implemented with care.

What are sample statistics? The optimizer utilizes a 2% row sample to gather demographic information. The sample size can be adjusted by executing the statement below, and the new sample size will remain valid until the session ends:

DIAGNOSTIC "COLLECTSTATS, SAMPLESIZE=n" ON FOR SESSION;

When can sample statistics be applied effectively? Ensuring the table is not skewed is crucial in obtaining accurate sample statistics. To determine skewness, execute the following query:

SELECT HASHAMP (HASHBUCKET (HASHROW ())) , COUNT (*)FROM <TABLE>GROUP BY 1;

Sample statistics are adequate for skew values that are less than or equal to 5%. Full statistics are unnecessary for small tables, but for large tables, complete statistics can use up a significant amount of CPU and Disk IO resources. In these instances, utilizing sample statistics can be a viable option.

COLLECT STATISTICS USING SAMPLE ON <TABLE> COLUMN (<COLUMN>);

Global temporary tables, join indices, and volatile tables do not support statistics collection.

For more information on Teradata Statistics, please refer to Teradata Statistics: Everything You Need to Know.

The Teradata Collect Statistics Using Sample feature explained

Subscribe to DWHPro Letters