What is Teradata Query Rewriting? Top 6 Optimization Techniques Explained
What is Teradata Query Rewriting?
Teradata query rewriting is an integral component of the optimization process for Teradata.
The optimizer replaces your query with a more efficient and faster version.
Both queries must produce identical results. Teradata implements various optimization techniques to enhance performance.
Want more practical data engineering analysis like this?
Join DWHPro Letters and get field-tested notes on Teradata, Snowflake, AI, migrations, performance, and enterprise data work. Early subscribers keep launch access before the paid plan launches.
We will demonstrate the commonly used optimization methods.
1. Removing unreferenced columns and expressions from the VIEW select list
Without enhancement, all VIEW columns must be spooled during query execution.
Let's assume we have the following view definition:
CREATE VIEW MyView AS SELECT col1, col2, AVG(col3) myavg FROM table1 t01, table2 t02 WHERE t01.pk=t02.pk GROUP BY 1,2; We are executing the following query:
SELECT MAX(myavg) FROM MyView;In the example, col1 and col2 can be removed, and only myavg column will be spooled.
The following example shows another case where spool usage can be eliminated.
SELECT COUNT(*) FROM MyView; The query does not refer to any view column. The number of rows can be determined solely from the cylinder index.
The optimizer commonly reduces spool usage by eliminating view columns from the spool.
Get the next issue by email.
2. Conversion of outer joins to inner joins
If the optimizer recognizes that a query can be solved using an inner join, it will swap out the outer join to enhance its performance.
SELECT DISTINCT(col1) FROM table1 LEFT OUTER JOIN table2 ON table1.pk=table2.pk WHERE table2.col2 = 100 ;In the example above, the outer join can be converted to an inner one because the WHERE condition filters non-matching rows.
3. Folding of Views
The optimizer removes any reference to views from the execution plan, known as view folding.
Assuming the given view definition:
CREATE VIEW MyView AS SELECT col1, col2, SUM(col3*col4) the_sum FROM table1, table2 WHERE table1.pk=table2.pk GROUP BY 1,2;We execute the query below:
SELECT col2 FROM MyView WHERE the_sum > 100;The optimizer would rewrite the above example in the following form:
SELECT col2 FROM table1, table2 WHERE table1.pk=table2.pk GROUP BY 1,2 HAVING SUM(col3*col4) > 100;The view has been removed from the execution plan.
4. Predicate Pushdown
The optimizer applies query predicates by pushing them down. For instance, in the t01 inner sub-query, the WHERE predicate is already implemented.
SELECT SUM(the_sum) the_overall_sum FROM ( SELECT col1, col2, SUM(col3*col4) the_sum FROM table1, table2 WHERE table1.col1=table2.col2 GROUP BY 1, 2 ) t01 WHERE col1 IN (1,2,3,4,5) ;5. Elimination of SET operator branches
If a branch in SET has an unsatisfiable condition, it will be removed from the query.
Consider the following query as an example:
SELECT * FROM table1 WHERE month_id= 1 UNION ALL SELECT * FROM table2 WHERE month_id = 1 ;We assume a check constraint on month_id to ensure that table1 only has month_id = 1 and table2 only has month_id = 2.
The optimizer can rewrite the query above safely to:
SELECT * FROM table1 WHERE month_id = 1 ;6. Elimination of Joins
Unnecessary joins have been removed from the execution plan.
SELECT table1.* FROM table1 LEFT JOIN table2 ON table1.pk=table2.pk ;Only Table 1 needs to be spooled to fulfill the query above.
SELECT table1.* FROM table1;The optimizer endeavors to enhance your queries. I trust you found this post enjoyable!
Planning or surviving an enterprise data platform migration?
I write regularly about the performance, cost, architecture, and project mistakes that show up in real Teradata, Snowflake, Databricks, and enterprise data work.
Subscribe before the paid plan launches and keep launch access.
Written by Roland Wenzlofsky, founder of DWHPro and author of Teradata Query Performance Tuning. DWHPro has helped data warehouse practitioners for 15+ years.