How to Avoid Full Table Scans When Using LIKE Operators in Teradata Indexing

Learn how to avoid resource-killing full table scans on extensive tables with Teradata Indexing. Discover a fantastic trick to optimize query performance.

sql2

When Teradata Indexing fails

Selecting rows from extensive tables with billions of entries can be resource-intensive. While it may not cause significant strain on resources when accessing rows directly through primary indices or unique secondary indices (USI), using a "LIKE" operator to select rows can be extremely resource-intensive.

SELECT * FROM wide_table WHERE column_B LIKE '%A%';

Typically, the above query is resolved through a full table scan (FTS), without any additional measures.

Using a NUSI or single table join index (STJI) can minimize disk IO and decrease query runtime.

Want more practical data engineering analysis like this?

Join DWHPro Letters and get field-tested notes on Teradata, Snowflake, AI, migrations, performance, and enterprise data work. DWHPro Letters is free. Subscribe to get new issues by email.

Get the next issue

Using an STJI may be viable; however, persuading the optimizer to replace the FTS on the base table with an FTS on a NUSI sub-table could prove challenging. The inadequate statistics (whether collected or sampled) may not adequately support index usage. Typically, when the search parameter begins with "value%", the optimizer may opt for a NUSI traversal instead of "%value%". This preference stems from the storage manner of statistic histograms in Teradata.

Get the next issue by email.

Note that the use of LIKE operators prevents a binary search in NUSI data blocks due to sorting the row pointer array by ROWHASH of indexed columns, which LIKE operators cannot hash. This limits access to a full table scan of the NUSI sub-table, which is only feasible when the number of NUSI data blocks to be read is significantly lower than the number of base table data blocks accessed with an FTS.

The Solution without Teradata Indexing

Fortunately, there is a great technique to prevent a complete table scan on a broad table.

We generate a condensed version of the big table, comprising solely the primary key columns and those essential for fulfilling the LIKE operator's requirements. To employ this technique, it is crucial that the primary key and primary index of the table match.

We will join our auxiliary table with the wide table instead of selecting directly from the latter.

SELECT t01.*FROM wide_table t01INNER JOIN auxiliary_table t02 ON t01.primary_key = t02.primary_key WHERE t02.column_A LIKE '%A%';

If all necessary statistics are available for the query, the optimizer will generate an execution plan that is less demanding on resources than the original plan.

Performing a complete scan of the smaller auxiliary table will isolate only the rows that fulfill the LIKE operator and transfer them to the spool. Since the auxiliary table is significantly smaller, there will be a significant decrease in IO.

The second step involves a swift and AMP-local primary index merge join of the spooled records with the wide table.

The above solution was effectively executed in multiple telecommunication projects, which involved carrying out daily queries using LIKE operators on massive call detail record (CDR) tables. Typically, the generation of auxiliary tables is deferred until the nightly batch load window.

Planning or surviving an enterprise data platform migration?

I write regularly about the performance, cost, architecture, and project mistakes that show up in real Teradata, Snowflake, Databricks, and enterprise data work.

Subscribe for free and keep launch access.

Written by Roland Wenzlofsky, founder of DWHPro and author of Teradata Query Performance Tuning. DWHPro has helped data warehouse practitioners for 15+ years.

When Teradata Indexing fails

The Solution without Teradata Indexing

Subscribe to DWHPro Letters