Unnecessary IO Elimination – Teradata View Optimization

tune1

Summary

Today I will show you how to identify the source problem to tune the query's performance utilizing the unnecessary amount of IO, the UII indicator, and how the WHERE clause placed on a VIEW is evaluated.

This case shows that the WHERE clause placed on VIEW for 2 Partitioned tables resulted in excessive block reads and applying WHERE condition afterward. You need to be fully aware of the amount of data query reads and the amount of data they need.

Performance tuning boosted the performance fivefold. The query is completed 4-5 times faster and utilizes 4-5 times fewer system resources:

Want more practical data engineering analysis like this?

Join DWHPro Letters and get field-tested notes on Teradata, Snowflake, AI, migrations, performance, and enterprise data work. DWHPro Letters is free. Subscribe to get new issues by email.

Get the next issue

1+ Billion rows INSERTED

Here is the target query for today, which populates a large fact table of the data mart.

- INSERT INTO ... SELECT FROM ...

- The query is run daily

- Over 1000M rows inserted (1 billion)

INSERT INTO PRD3_1_db_dmcrmuakb.t_crm_ft_campaign_membership ( ... ) SELECT ... FROM PRD3_1_db_dmcrmuakb.v_crm_ft_campaign_membership WHERE (crm_wave_launch_dt >= (TD_SYSFNLIB.OADD_MONTHS (DATE , -1 ))) /* Today - 1 month */ AND (crm_campaign_enter_dt >= (TD_SYSFNLIB.OADD_MONTHS (DATE , -1 ))) /* Today - 1 month */ ;

Poor performance detected

Query performance statistics stored in Teradata QueryLog signify that the query is suspicious and performs poorly in unnecessary IO, excessive Spool space, and CPU utilization.

Take a look at Teradata performance metrics:

What knowledge can we discover from this data?

What is particularly wrong, and how can we tune and improve the query?

A closer look at IO, source view DDL

First, let us describe what UII (Unnecessary IO indicator) is and what it can tell us.

Get the next issue by email.

UII is calculated as SumIO / (SumCPU * 1000). The metric is used to understand query efficiency for consuming CPU and IO resources. If UII is relatively high, it could mean that many data blocks are read, but a relatively small proportion of them are processed.

Let us examine source View DDL through which the target table is populated.

REPLACE VIEW SBX.V_CRM_FT_CAMPAIGN_MEMBERSHIP AS LOCKING ROW ACCESS SELECT ... FROM SBX.T_CRM_CAMPAIGN_MEMBER CCM JOIN SBX.T_MD_CRM_CAMPAIGN CP ON ( ... AND CP.WAVE_LAUNCH_DATE>=OADD_MONTHS(CURRENT_DATE,-6) AND CCM.CAMPAIGN_ENTER_DATE >=OADD_MONTHS(CURRENT_DATE,-6));

A-ha! Here we see that the latest six months are extracted from the view.

The tables are partitioned by columns used in WHERE clause:

CREATE MULTISET TABLE SBX.T_CRM_CAMPAIGN_MEMBER ,FALLBACK , NO BEFORE JOURNAL, NO AFTER JOURNAL, CHECKSUM = DEFAULT, DEFAULT MERGEBLOCKRATIO ( ... ) PRIMARY INDEX ( MEMBER_ID , WAVE_ID ) PARTITION BY RANGE_N(CAMPAIGN_ENTER_DATE BETWEEN DATE '2014-12-01' AND DATE '2025-12-31' EACH INTERVAL '1' MONTH );

But only last month's data is inserted in the target table! The INSERT statement is as:

So, the query reads five months' data in vain with no particular reason. The thing is that before the final query applies any WHERE clause, the whole data is read.

Every data block under this VIEW is read first, and then non-relevant data is eliminated. But why read unnecessary data? Let us figure out how to force Teradata to extract only relevant partitions.

Here is the answer:

1. We might put the WHERE clause from the final query INSIDE the source VIEW DDL.

2. We might create an entirely new VIEW for this ETL process if some other users might not want any changes to the VIEW.

Improved query performance

This optimization gave us five times boost in performance. The query is completed 4-5 times faster and utilizes 4-5 times fewer system resources.

Detailed query metrics are below:

Artemiy Kozyr is Data Engineer at Sberbank, Moscow, with a Master's Degree in CS.
He has five years of experience in Data Warehousing, ETL, and Visualization for Financial Institutions.

Contacts:

artemiy.kozyr@gmail.com
http://linkedin.com/in/artemiykozyr/

Planning or surviving an enterprise data platform migration?

I write regularly about the performance, cost, architecture, and project mistakes that show up in real Teradata, Snowflake, Databricks, and enterprise data work.

Subscribe for free and keep launch access.

Written by Roland Wenzlofsky, founder of DWHPro and author of Teradata Query Performance Tuning. DWHPro has helped data warehouse practitioners for 15+ years.

Summary

1+ Billion rows INSERTED

Poor performance detected

A closer look at IO, source view DDL

Improved query performance

Subscribe to DWHPro Letters