Shrinking Teradata Tables: Reduce Table Size by 90% with This Simple Trick
Despite implementing Multivalue Compression to minimize the size of our tables, we still require additional space.
Shrinking A Teradata Table To A Minimum Size
I once used a trick that reduced an already optimized table with multivalue compression to just 10% of its original size.
This is a pattern worth watching for in your own tables.
Want more practical data engineering analysis like this?
Join DWHPro Letters and get field-tested notes on Teradata, Snowflake, AI, migrations, performance, and enterprise data work. Launch access is open. The single paid plan will be EUR 49 per year.
I will illustrate using the table below:
If you work with enterprise data platforms, migrations, performance tuning, or AI-driven delivery teams, DWHPro Letters is written for you. Get the next issue by email.
CREATE SET TABLE DWHPRO.Sales
(
SalesDate DATE FORMAT 'YY/MM/DD' NOT NULL,
Customer_Category_Cd CHAR(5) NOT NULL COMPRESS ('10000','20000'),
SalesId CHAR(50) NOT NULL,
Sales_Main_Type_Cd CHAR(1) NOT NULL COMPRESS ('A','B','C','D','E'),
Sales_Sub_Type_Cd CHAR(5) NOT NULL COMPRESS '00001','00002','00003'),
Sales_Value_Cd CHAR(5) NOT NULL COMPRESS ('VAL01','VAL02','VAL03'),
Sales_Class_Cd CHAR(5) NOT NULL COMPRESS ('Cla01','Cla02'),
Group_SalesId CHAR(50) NOT NULL COMPRESS (''),
Currency CHAR(3) NOT NULL COMPRESS ('EUR','USD','AUD','NZD'),
Final_Flag INTEGER NOT NULL COMPRESS 0 ,
Price DECIMAL(38,2) COMPRESS (0.00, 100.00,200.00)
PRIMARY INDEX (SalesId)
PARTITION BY RANGE_N(
SalesDate BETWEEN DATE '2019-01-01' AND DATE '2020-12-31' EACH INTERVAL '1' DAY,NO RANGE, UNKNOWN)
UNIQUE INDEX (SalesDate,Customer_Category_Cd,SalesId,Sales_Main_Type_Cd ,Sales_Sub_Type_Cd,Sales_Value_Cd,Sales_Class_Cd ,Group_SalesId ,Currency ,Final_Flag );SalesId is the primary index that distributes efficiently and is frequently utilized for joining.
The Primary Key was duplicated using a Unique Secondary Index (USI).
The intent is to prevent the insertion of duplicate primary keys into the table.
Initially, this solution appears reasonable, but it ultimately proves very costly. I will explain the reason shortly. First, let's consider the table's size:
SELECT SUM(CURRENTPERM) / (1024**3) AS Size_GB FROM DBC.TABLESIZE
WHERE TABLENAME = 'Sales';
-> 125,50 GBNext, we must determine the table's size after removing the USI.
DROP UNIQUE INDEX (SalesDate,Customer_Category_Cd,SalesId ,Sales_Main_Type_Cd,Sales_Sub_Type_Cd,Sales_Value_Cd,
Sales_Class_Cd ,Group_SalesId ,Currency ,Final_Flag)
ON DWHPRO.Sales; SELECT SUM(CURRENTPERM) / (1024**3) AS Size_GB FROM DBC.TABLESIZE
WHERE TABLENAME = 'Sales';
-> 12,50 GB
The Reason For A Huge Table Size
The USI is uncompressed and essentially duplicates the base table due to the inclusion of nearly all columns.
Teradata Primary Key Check Solution
In my opinion, creating secondary indexes solely to enforce the primary key is a flawed practice.
Usually, it is more logical to perform this task programmatically during the ETL process. This approach also eliminates the need for maintaining the secondary index, which can be resource-intensive.
Planning or surviving an enterprise data platform migration?
I write regularly about the performance, cost, architecture, and project mistakes that show up in real Teradata, Snowflake, Databricks, and enterprise data work.
Subscribe before the paid plan launches and keep launch access.
Written by Roland Wenzlofsky, founder of DWHPro and author of Teradata Query Performance Tuning. DWHPro has helped data warehouse practitioners for 15+ years.