Optimizing Join and Aggregation with Teradata UNION ALL Views
Teradata 16 introduces a new option for handling sets of rows combined with "UNION ALL" in views and derived tables, reducing resource usage. The optimizer can apply aggregation steps and join operations to each branch of the "UNION ALL" separately, resulting in smaller spool sizes and improved perf
Teradata UNION ALL Introduction
Teradata introduced a new Optimizer feature for managing sets of rows combined by UNION ALL in views and derived tables.
In earlier versions of Teradata, rows combined with Teradata UNION ALL were consolidated into a shared spool before further operations like joining another table or performing an aggregation step. Consider the following example of a join operation:
REPLACE VIEW TestView AS(SELECT PK, COL1 FROM TheTableAUNION ALLSELECT PK, COL1 FROM TheTableB) ;SELECT *FROMTestView T01INNER JOINTestTable T02ONT01.PK = T02.PKWHERET01.COL2 = 100;The Teradata 15.10 Optimizer creates a common spool and performs a join operation with the "TestTable".
Want more practical data engineering analysis like this?
Join DWHPro Letters and get field-tested notes on Teradata, Snowflake, AI, migrations, performance, and enterprise data work. Early subscribers keep launch access before the paid plan launches.
The common UNION ALL spool will not reduce rows during construction. All join preparation steps, including redistribution on join columns, must be performed using the spool created from the combination of all UNION ALL branches.
The Optimizer in Teradata 16 enables the division of a join into several segments and offers an additional choice.
Each Teradata UNION ALL branch can be joined separately to the "TestTable". The final outcome of these join processes is stored in a final spool.
To illustrate the equivalent queries, the Teradata 16 Optimizer effectively executes:
Get the next issue by email.
SELECT * FROM TheTableA T01
INNER JOIN
TestTable T02
ON T01.PK = T02.PK
WHERE T01.COL2 = 100;
SELECT * FROM TheTableB T01
INNER JOIN
TestTable T02
ON T01.PK = T02.PK
WHERE T01.COL2 = 100;
This option can save resources by effectively decreasing the selected row count in every join.
The potential for decreased resource consumption rises as tables become larger and are merged using UNION ALL before being joined with an additional table.
The Teradata 16 Optimizer can apply aggregation to each branch of UNION ALL separately and then execute global aggregation in a subsequent step.
Before Teradata 16, aggregation occurred once all UNION ALL branches had been consolidated into a shared spool.
The primary benefit lies in counting the rows of a view or derived table that includes multiple branches merged with UNION ALL. Before Teradata 16, a common spool was created first, which required a full row count.
Starting with Teradata 14.10, the Optimizer has implemented a uniform optimization method for individual tables. This entails computing the row count on the Cylinder Index for each table linked by a UNION ALL operation and subsequently summing up the results. This technique remains in use in Teradata 16.
The Optimizer can utilize the aforementioned techniques to enhance performance by reducing spool sizes by pushing aggregation or joining into each UNION ALL branch.
The Optimizer heavily relies on statistics to determine the feasibility of implementing these methods.
Planning or surviving an enterprise data platform migration?
I write regularly about the performance, cost, architecture, and project mistakes that show up in real Teradata, Snowflake, Databricks, and enterprise data work.
Subscribe before the paid plan launches and keep launch access.
Written by Roland Wenzlofsky, founder of DWHPro and author of Teradata Query Performance Tuning. DWHPro has helped data warehouse practitioners for 15+ years.