Teradata Historisation - Daily snapshot to change history
For various reasons, we may need to switch our Teradata historisation from snapshot tables to a start date/end date logic table, which I usually call a change history, as only changes trigger a historisation.
A principal motivation for such a move away from a daily snapshot table could be space restrictions.
Whatever the reason may be, changing the historisation type is easier than one may expect. Here is a SQL template that handles this task precisely. RSD and RED columns are the technical columns defining each record's validity, start date, and end date.
Want more practical data engineering analysis like this?
Join DWHPro Letters and get field-tested notes on Teradata, Snowflake, AI, migrations, performance, and enterprise data work. Early subscribers keep launch access before the paid plan launches.
[quote]
SELECT
<KEY_COLUMNS>,
<NON_KEY_COLUMNS>,
RSD,
CASE WHEN
MAX(CHANGED_IND) OVER (PARTITION BY <KEY_COLUMNS> ORDER BY RSD ROWS BETWEEN 1 FOLLOWING AND 1 FOLLOWING ) = 2
THEN
MAX(RED_PREV) OVER ( PARTITION BY <KEY_COLUMNS>, ORDER BY RSD ROWS BETWEEN 1 FOLLOWING AND 1 FOLLOWING )
ELSE
COALESCE(((MAX(RSD) OVER ( PARTITION BY KEY_COLUMNS> ORDER BY RSD ROWS BETWEEN 1 FOLLOWING AND 1 FOLLOWING ))-1), ENDI)
END AS RED
FROM
(
SELECT
<KEY_COLUMNS>,<NON_KEY_COLUMNS>,SNAPSHOT_DATE AS RSD,SNAPSHOT_DATE AS RED,
MAX(RED) OVER (PARTITION BY <KEY_COLUMNS> ORDER BY RSD ROWS BETWEEN 1 PRECEDING AND 1 PRECEDING) AS RED_PREV,
Get the next issue by email.
CASE
WHEN
(RED_PREV+1) <> RSD THEN 2
WHEN
(MIN() OVER (PARTITION BY <KEY_COLUMNS> ORDER BY RSD ROWS BETWEEN 1 PRECEDING AND CURRENT ROW)=
MAX() OVER (PARTITION BY <KEY_COLUMNS> ORDER BY RSD ROWS BETWEEN 1 PRECEDING AND CURRENT ROW) OR
(MIN() OVER (PARTITION BY <KEY_COLUMNS> ORDER BY RSD ROWS BETWEEN 1 PRECEDING AND 1 PRECEDING) IS NULL AND
MIN() OVER (PARTITION BY <KEY_COLUMNS> ORDER BY RSD ROWS BETWEEN CURRENT ROW AND CURRENT ROW) IS NULL))
AND
...
AND
(MIN() OVER (PARTITION BY <KEY_COLUMNS> ORDER BY RSD ROWS BETWEEN 1 PRECEDING AND CURRENT ROW)=
MAX( <ATTRIBUTEn) OVER (PARTITION BY ORDER BY RSD ROWS BETWEEN 1 PRECEDING AND CURRENT ROW) OR
(MIN() OVER (PARTITION BY <KEY_COLUMNS> ORDER BY RSD ROWS BETWEEN 1 PRECEDING AND 1 PRECEDING) IS NULL AND
MIN( <ATTRIBUTEn) OVER (PARTITION BY ORDER BY RSD ROWS BETWEEN CURRENT ROW AND CURRENT ROW) IS NULL))
THEN 0 ELSE 1
END AS CHANGED_IND,
MAX(RED) OVER ( PARTITION BY <KEY_COLUMNS> ORDER BY RSD ROWS BETWEEN 1 FOLLOWING AND 1 FOLLOWING) AS RED_NEXT,
MAX(RED) OVER ( PARTITION BY <KEY_COLUMNS> AS ENDI
FROM
<SNAPHOT_TABLE> i
QUALIFY (CHANGED_IND=0 and RED_PREV IS NULL) OR (CHANGED_IND <> 0)
) i;
[/quote]
Planning or surviving an enterprise data platform migration?
I write regularly about the performance, cost, architecture, and project mistakes that show up in real Teradata, Snowflake, Databricks, and enterprise data work.
Subscribe before the paid plan launches and keep launch access.
Written by Roland Wenzlofsky, founder of DWHPro and author of Teradata Query Performance Tuning. DWHPro has helped data warehouse practitioners for 15+ years.