Building a Teradata Data Warehouse: Considerations for ETL Process, SQL Queries, and Physical Data Model

design4

This post aims to compile all crucial aspects to be considered while constructing a Teradata Data Warehouse, including the ETL process and SQL queries.

This list is just the beginning, and I anticipate receiving valuable feedback from my readers to expand it in the future. Initially, I have provided a few concepts, but I intend to include more categories and topics in the coming days and weeks. Furthermore, you can contribute your ideas at the end of this blog post, which I will continuously incorporate into the article, along with your name if desired.

The Teradata Physical Data Model

Are all Primary Key columns defined as NOT NULL?
Does the Primary Index distribute the rows equally across all AMPs?
Are the Primary Index column values stable, or do they change frequently?
Do columns with the same content have the same data type across all tables?
Are SET tables only used for exceptional cases, otherwise always MULTISET tables?
Are tables from which a range of values is often queried row-level partitioned?
Do tables that are often joined together have the same primary index?
Is the primary index of a table often used in a WHERE condition?
Was Multi-Value compression performed on all tables?
Are all row partitions maintained (to cover new values)?
Is the correct CHARACTER SET used (LATIN vs. UNICODE)?
Is the DATABLOCKSIZE appropriate for the workload (Tactical vs. Strategic)?
Are tables created without the FALLBACK option? (Raluca Topor)
Was a 1-AMP design chosen for tiny tables (1-AMP Hashmap)? (Peter Nagy)
Was columnar considered for wide datamarts with narrow queries? (Peter Nagy)
Was NOT NULL unique key applied on parent tables to enable join elimination? (alternative to soft RI) (Peter Nagy)
Was primary AMP design considered for tables that are frequently reloaded and yet still joined? (Peter Nagy)
Is adequate sample percentage used during collect stats for big tables? (Peter Nagy)
Are statistics copied from table to table whenever it's a replica instead of recollected? (Peter Nagy)
Is a value-ordered NUSI considered for range query columns on tables where row partitioning is not an option? (Peter Nagy)

The Teradata Statistics

Are full statistics collected on all small tables (fewer table rows than AMPs)?
Are statistics collected on all non-indexed columns that are used in WHERE conditions?
Are full statistics collected on all skewed columns?
Do you use DELETE instead of DROP/CREATE to preserve statistics histories?
Have you collected statistics on all join columns?
Have you collected statistics on all Primary Indexes?
Have you collected statistics on all NUSIs?
Have you collected statistics on all partition columns (individually and in combination)?
Are you using Create Table Table1 as Table2 with data and stats to keep the stats? (Atharva Joshi)

Teradata SQL

Do you use UNION only if you need to remove duplicates, otherwise, UNION ALL?
Do you only select the columns you really need?
Do you use outer joins only when it is really necessary?
If possible, do you use WHERE conditions to enable partition elimination?
Do you use multi-statement requests where possible?
Do you use CLOB instead of large VARCHAR for rarely requested columns?
Are you careful not to use functions on join columns?
Are you using a Fast Path INSERT/SELECT?

Please Send Me Your Own Tuning Ideas

Please add this tuning idea!

Want more practical data engineering analysis like this?

Join DWHPro Letters and get field-tested notes on Teradata, Snowflake, AI, migrations, performance, and enterprise data work. DWHPro Letters is free. Subscribe to get new issues by email.

Get the next issue

Your email address will not be used for marketing purposes or stored.

Planning or surviving an enterprise data platform migration?

I write regularly about the performance, cost, architecture, and project mistakes that show up in real Teradata, Snowflake, Databricks, and enterprise data work.

Subscribe for free and keep launch access.

Written by Roland Wenzlofsky, founder of DWHPro and author of Teradata Query Performance Tuning. DWHPro has helped data warehouse practitioners for 15+ years.

The Teradata Physical Data Model

The Teradata Statistics

Teradata SQL

Please Send Me Your Own Tuning Ideas

Subscribe to DWHPro Letters