Understanding Teradata's SAMPLE Function: Randomly Select Rows and More
What is the Teradata Sample Function?
The SAMPLE function returns a set of randomly selected rows. Here are the characteristics:
- Ask for a sample with an absolute number of rows
- Ask for a sample that contains a certain percentage of the table rows
- Ask for more than one sample at the same time
- Include the column SAMPLEID to determine from which sample a row originates
- Ask for samples with or without duplicates
The Syntax of the Sample Function
SAMPLE [WITH REPLACEMENT] [RANDOMIZED ALLOCATION][WHEN <condition> THEN] {<number of row> | <percentage>} [...,<number-of-rows> | <percentage>][ELSE {<number of rows> | <percentage } END]
Here are some examples of how to use the SAMPLE function:
Want more practical data engineering analysis like this?
Join DWHPro Letters and get field-tested notes on Teradata, Snowflake, AI, migrations, performance, and enterprise data work. Early subscribers keep launch access before the paid plan launches.
100 Random Rows
SELECT * FROM CustomerSAMPLE 100;
10% of Table Rows
Get the next issue by email.
SELECT * FROM CustomerSAMPLE .10;
2 Samples with 10% of Table Rows (no Duplicates)
SELECT * FROM CustomerSAMPLE .1,.1;
2 Samples with 10 Rows and SAMPLEID
SELECT t01.* , SAMPLEIDFROM Customer t01SAMPLE 10,10;
2 Samples with 10% of Table Rows (with Duplicates)
SELECT * FROM CustomerSAMPLE WITH REPLACEMENT .1,.1;
2 Samples with Conditional Logic Applied
SELECT * FROM CustomerSAMPLE WHEN CustomerId > 1000 THEN .1,.1 ELSE .3,.3 END;
Planning or surviving an enterprise data platform migration?
I write regularly about the performance, cost, architecture, and project mistakes that show up in real Teradata, Snowflake, Databricks, and enterprise data work.
Subscribe before the paid plan launches and keep launch access.
Written by Roland Wenzlofsky, founder of DWHPro and author of Teradata Query Performance Tuning. DWHPro has helped data warehouse practitioners for 15+ years.