Optimizing Queries with LIKE Operator in Teradata

Learn how to optimize LIKE operator queries in Teradata to avoid full table scans and improve performance. Read on for expert tips and tricks.

sql2

Queries using the LIKE operator typically result in a full table scan.

When the LIKE operator matches from the left, the Teradata Optimizer can swiftly search the statistic histograms for demographic data metrics, including row count.

Regrettably, the Teradata Optimizer cannot utilize statistics for a LIKE match, specifically LIKE '%ABC%'. Accordingly, it will anticipate a weak selectivity and consistently opt for the full table scan as the least costly access path. The full table scan can be implemented on the base table, a join index, or a NUSI.

Want more practical data engineering analysis like this?

Join DWHPro Letters and get field-tested notes on Teradata, Snowflake, AI, migrations, performance, and enterprise data work. DWHPro Letters is free. Subscribe to get new issues by email.

Get the next issue

However, accessing indexes necessitates further qualifying against other columns.

If you work with enterprise data platforms, migrations, performance tuning, or AI-driven delivery teams, DWHPro Letters is written for you. Get the next issue by email.

The Teradata Optimizer may perform a full table scan on a smaller NUSI or a Join Index if there are many distinct values.

However, as it is not guaranteed, we have a technique to force a full table scan on a smaller, compact table.

Assuming the table we are querying with the LIKE operator appears as follows:

CREATE SET TABLE Table1(PK INTEGER NOT NULLcolA VARCHAR(100),colB VARCHAR(100),colC VARCHAR(100),...colZ VARCHAR(100)) UNIQUE PRIMARY INDEX (PK)

Here is a typical query:

SELECT * FROM Table1 WHERE colA like '%ABC%';

To prevent a complete table scan on the large table, we generate a secondary table that solely includes the essential columns necessary for the LIKE comparison (in this instance, "colA" is sufficient) along with the distinct primary key column "PK":

CREATE SET TABLE Helper(PK INTEGER NOT NULL,colA  VARCHAR(100)) UNIQUE PRIMARY INDEX (PK);

INSERT INTO Helper SELECT PK, colA FROM Table1;COLLECT STATISTICS ON Helper COLUMN(PK);

We have now converted the initial SQL query:

SELECT Table1.*FROMTable1 t01INNER JOINHelper t02ONt01.PK = t02.PKWHERE Helper.colA like '%ABC%';

Teradata will perform a complete table scan on the smaller "Helper" table and execute an AMP-local merge join based on row keys with the larger "Table1" table.

The benefit is only applicable if scanning the large table is more costly than scanning the small table and executing a consecutive AMP-local merge join.

Planning or surviving an enterprise data platform migration?

I write regularly about the performance, cost, architecture, and project mistakes that show up in real Teradata, Snowflake, Databricks, and enterprise data work.

Subscribe for free and keep launch access.

Written by Roland Wenzlofsky, founder of DWHPro and author of Teradata Query Performance Tuning. DWHPro has helped data warehouse practitioners for 15+ years.

Subscribe to DWHPro Letters