The Teradata NESTED join and the impact of the Primary Index

Discover the benefits of using a Teradata NESTED join strategy for economical data joining. Learn how to enable this feature and optimize query design.

tune4

The MERGE join is the most commonly utilized join strategy, or its equivalent for row partitioned tables, known as the row key-based merge join. One seldom-seen join strategy is the Teradata NESTED join.

The Teradata Optimizer must carefully consider query design to select the most cost-effective joining method.

First, we will examine the mechanics of a nested join and the technology that facilitates it.

Want more practical data engineering analysis like this?

Join DWHPro Letters and get field-tested notes on Teradata, Snowflake, AI, migrations, performance, and enterprise data work. DWHPro Letters is free. Subscribe to get new issues by email.

Get the next issue

What allows for nesting to occur? The Optimizer must perform a distinct index lookup on one table and an indexed search on the other table.

A distinct index search can refer to either primary index access or secondary index access. The second table must be accessible via any index, regardless of uniqueness.

In a fast-paced setting, where avoiding unique primary indexes is common practice to conceal duplicate key issues, numerous opportunities to perform a nested join go unnoticed.

In this test scenario, we will demonstrate the significant impact this can have on a system.

I created two test tables, allowing the NESTED join strategy to be used.
CREATE MULTISET TABLE TABLE_1 (PK INTEGER NOT NULL, ATTRIB_A INTEGER) UNIQUE PRIMARY INDEX (PK);
CREATE MULTISET TABLE TABLE_2 (PK INTEGER NOT NULL, ATTRIB_B INTEGER) PRIMARY INDEX (ATTRIB_B );

Get the next issue by email.

I inserted arbitrary data into both tables.

INSERT INTO TABLE_1 SELECT ROW_NUMBER() OVER ( ORDER BY 1), RANDOM(1,100) FROM SYS_CALENDAR.CALENDAR;

INSERT INTO TABLE_2 SELECT ROW_NUMBER() OVER ( ORDER BY 1), ROW_NUMBER() OVER ( ORDER BY 1) FROM SYS_CALENDAR.CALENDAR;

I then created a query meeting the ideal NESTED join prerequisites.

SELECT * FROM TABLE_1 t01 INNER JOIN TABLE_2 t02 ON t01.PK = T02.PK
WHERE t01.PK = 1 AND t02.ATTRIB_B = 10;

The result was retrieving the result set with only two disk accesses.

Subsequently, I removed the uniqueness constraint from the initial test table, changing it from UPI to NUPI. Without any unique secondary index, this will prevent the Optimizer from utilizing the NESTED join strategy.

DROP TABLE TABLE_1;
CREATE MULTISET TABLE TABLE_1 (PK INTEGER NOT NULL, ATTRIB_A INTEGER) PRIMARY INDEX (PK);

The Optimizer formulated an alternative plan, opting to execute a MERGE join.

The second execution plan is highly inefficient, requiring a staggering 228 disk accesses instead of a mere 2.

Conclusion:

Do not disregard the uniqueness of primary indexes for dubious reasons.

Planning or surviving an enterprise data platform migration?

I write regularly about the performance, cost, architecture, and project mistakes that show up in real Teradata, Snowflake, Databricks, and enterprise data work.

Subscribe for free and keep launch access.

Written by Roland Wenzlofsky, founder of DWHPro and author of Teradata Query Performance Tuning. DWHPro has helped data warehouse practitioners for 15+ years.

Subscribe to DWHPro Letters