Optimizing Teradata Joins with Skewed Data: Strategies and Solutions
Skewed Teradata Joins - The Initial Situation
Consider the following scenario: one table contains currencies, and the other contains customer accounts with their corresponding currency. The ISO code of the currency (CURRENCY_CD) serves as a foreign key in the customer table.
CREATE TABLE Currency
(
CURRENCY_CD VARCHAR(20) NOT NULL,
CURRENCY_NAME VARCHAR(200)
) PRIMARY INDEX (CURRENCY_CD);
CREATE TABLE Customer
(
CUSTOMER_ID INTEGER NOT NULL,
CUSTOMER_NAME VARCHAR(255),
-- ...
CURRENCY_CD CHAR(20)
) PRIMARY INDEX (CUSTOMER_ID);
CUSTOMER TABLE
| CUSTOMER_ID | CUSTOMER_NAME | CURRENCY_CD |
| 1 | Nina Lowery | EUR |
| 2 | Alexia Neal | USD |
| 3 | Kyla Chan | NULL |
| 4 | Alesha Ferrell | NULL |
| 5 | Cara Adams | NULL |
| 6 | Abigail Larsen | NULL |
| 7 | Amie Massey | NULL |
CURRENCY TABLE
| CURRENCY_CD | CURRENCY_NAME |
| EUR | Euro |
| USD | US Dollar |
| AUD | Australian Dollar |
| HUF | Hungarian Forint |
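To join on column CURRENCY_CD, matching rows of both tables must reside on the same AMP. The Currency table is already distributed by CURRENCY_CD, its primary index. The Customer table, however, is distributed by CUSTOMER_ID, so its rows must first be redistributed by the row hash of CURRENCY_CD (alternatively, the optimizer can duplicate the Currency table to all AMPs).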
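A minimal sketch of the join under discussion. It assumes a LEFT OUTER JOIN, because customers without a currency should remain in the result; the article itself does not specify the join type. Prefixing the query with EXPLAIN shows the execution plan without running it, including whether the Customer rows are redistributed by CURRENCY_CD or the Currency table is duplicated.

```sql
-- Assumption: customers without a currency must be kept, hence LEFT OUTER JOIN.
-- EXPLAIN returns the optimizer's plan instead of executing the query.
EXPLAIN
SELECT cu.CUSTOMER_ID,
       cu.CUSTOMER_NAME,
       c.CURRENCY_NAME
FROM Customer cu
LEFT OUTER JOIN Currency c
  ON cu.CURRENCY_CD = c.CURRENCY_CD;
```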
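Our sample data shows that most customers have no currency assigned. Because all NULL values produce the same row hash, redistributing the Customer rows by CURRENCY_CD sends all of them to a single AMP, and the join becomes skewed.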
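To check the skew before running the join, you can compute the AMP to which each Customer row would be sent. This sketch uses the standard Teradata hash functions HASHROW, HASHBUCKET, and HASHAMP; with the sample data, all five NULL rows map to the same AMP.

```sql
-- Count how many Customer rows would land on each AMP if the table
-- were redistributed by the row hash of CURRENCY_CD.
SELECT HASHAMP(HASHBUCKET(HASHROW(CURRENCY_CD))) AS target_amp,
       COUNT(*) AS row_count
FROM Customer
GROUP BY 1
ORDER BY 2 DESC;
```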
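Recent Teradata versions mitigate this with Partial Redistribution and Partial Duplication (PRPD): rows of the large table that carry skewed values stay on their local AMP, the matching rows of the other table are duplicated to all AMPs, and the remaining rows are redistributed as usual. Nevertheless, the problem can persist.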
This is particularly true when the optimizer lacks current statistics on the biased (highly frequent) values of the join column.
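A minimal sketch for keeping the optimizer informed: refresh the statistics on the join column and, if in doubt, inspect the value distribution yourself. Whether the optimizer then handles the skewed values well still depends on the Teradata release and the data volumes involved.

```sql
-- Refresh statistics on the join column so the optimizer knows
-- which values (here: NULL) are heavily over-represented.
COLLECT STATISTICS COLUMN (CURRENCY_CD) ON Customer;

-- Inspect the value distribution directly.
SELECT CURRENCY_CD,
       COUNT(*) AS row_count
FROM Customer
GROUP BY 1
ORDER BY 2 DESC;
```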
A Possible Solution To This Problem
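One possible solution is to replace each NULL value with a unique value. Unique values have different row hashes, so these rows are spread evenly across the AMPs during redistribution, and since none of them exists in the Currency table, they still find no join partner.
In our case, for example, we can use CUSTOMER_ID: cast it to a character string, prefix it with a special string, and write the result into column CURRENCY_CD: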
| CUSTOMER_ID | CUSTOMER_NAME | CURRENCY_CD |
| 1 | Nina Lowery | EUR |
| 2 | Alexia Neal | USD |
| 3 | Kyla Chan | #NULL#3 |
| 4 | Alesha Ferrell | #NULL#4 |
| 5 | Cara Adams | #NULL#5 |
| 6 | Abigail Larsen | #NULL#6 |
| 7 | Amie Massey | #NULL#7 |
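The replacement could be done with an UPDATE such as the following sketch. It assumes the '#NULL#' prefix from the example above and that this prefix never occurs in a real currency code; the result is at most 17 characters long and therefore fits into CHAR(20). Every load process writing to Customer would have to apply the same rule, otherwise new NULL values reappear.

```sql
-- Replace each NULL with a unique value that never matches a currency.
-- TRIM removes any blanks the numeric-to-character conversion might add.
UPDATE Customer
SET CURRENCY_CD = '#NULL#' || TRIM(CAST(CUSTOMER_ID AS VARCHAR(11)))
WHERE CURRENCY_CD IS NULL;
```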
However, this approach means that NULL values can no longer be identified as such: every query that checks for a missing currency needs additional logic instead of a simple IS NULL.
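A sketch of that additional logic, again assuming the '#NULL#' prefix. The join itself needs no change, because the replacement values find no partner in Currency; only reports that should display NULL, and filters for customers without a currency, must test for the prefix.

```sql
-- Restore the original NULL for reporting: a CASE without ELSE yields NULL.
SELECT cu.CUSTOMER_ID,
       cu.CUSTOMER_NAME,
       CASE WHEN cu.CURRENCY_CD NOT LIKE '#NULL#%' THEN cu.CURRENCY_CD
       END AS CURRENCY_CD_CLEAN,
       c.CURRENCY_NAME
FROM Customer cu
LEFT OUTER JOIN Currency c
  ON cu.CURRENCY_CD = c.CURRENCY_CD;

-- "Customers without a currency" can no longer be found with IS NULL.
SELECT COUNT(*) AS customers_without_currency
FROM Customer
WHERE CURRENCY_CD LIKE '#NULL#%';
```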
Want to gain comprehensive knowledge about joins in Teradata? Read this article.
https://letters.dwhpro.com/content/files/2026/05/teradata-join-strategies.html
Written by Roland Wenzlofsky, founder of DWHPro and author of Teradata Query Performance Tuning. DWHPro has helped data warehouse practitioners for 15+ years.