Teradata strtok_split_to_table, CSVLD, Unpack: Column Splitting in Teradata
Splitting column content can be a challenging task. Teradata provides multiple methods to achieve this, each with unique pros and cons. In this article, we will examine these techniques in detail. To begin, we will generate a table containing sufficient random data to assess every option's efficiency and resource utilization.
CREATE TABLE DWHPRO.Split
(
PK INTEGER,
ToBeSplit VARCHAR(200)
) PRIMARY INDEX (PK);INSERT INTO DWHPRO.Split
SELECT
ROW_NUMBER() OVER (ORDER BY 1) AS PK,
'Row' || TRIM(PK) || 'Col1,' || 'Row' || TRIM(PK) || 'Col2,' || 'Row' || TRIM(PK) || 'Col3'
FROM
(
SELECT * FROM SYS_CALENDAR.CALENDAR
UNION ALL SELECT * FROM SYS_CALENDAR.CALENDAR
UNION ALL SELECT * FROM SYS_CALENDAR.CALENDAR
UNION ALL SELECT * FROM SYS_CALENDAR.CALENDAR
UNION ALL SELECT * FROM SYS_CALENDAR.CALENDAR
UNION ALL SELECT * FROM SYS_CALENDAR.CALENDAR
UNION ALL SELECT * FROM SYS_CALENDAR.CALENDAR
UNION ALL SELECT * FROM SYS_CALENDAR.CALENDAR
UNION ALL SELECT * FROM SYS_CALENDAR.CALENDAR
) t01;The Teradata "CSVLD" Function
This function requires all defined columns to be present. An error message will be shown if any columns cannot be created.
SET QUERY_BAND = 'Method=CSVLDFunc;' FOR SESSION;
SELECT * FROM TABLE (CSVLD(Split.ToBeSplit, ',', '')
RETURNS (col1 varchar(200), col2 varchar(200), col3 varchar(200))) as T1;The Teradata "Unpack" Function
In contrast to the previously discussed function, the Unpack function also allows you to omit rows that cannot be split into all defined columns. The parameter "IgnoreInvalid" is used for this purpose:
Want more practical data engineering analysis like this?
Join DWHPro Letters and get field-tested notes on Teradata, Snowflake, AI, migrations, performance, and enterprise data work. Early subscribers keep launch access before the paid plan launches.
SET QUERY_BAND = 'Method=UnpackFunc;' FOR SESSION;
SELECT
PK,
CAST(Col1 AS VARCHAR(255)),
CAST(Col2 AS VARCHAR(255)),
CAST(Col3 AS VARCHAR(255))
FROM Unpack (
ON Split
USING
TargetColumn ('ToBeSplit')
OutputColumns ('Col1', 'Col2', 'Col3')
OutputDataTypes ('varchar', 'varchar', 'varchar')
Delimiter (',')
Regex ('(.*)')
RegexSet (1)
IgnoreInvalid ('true')
) AS t01
;The Teradata "strtok_split_to_table" Function
This function splits data into columns and assigns each column a number for identification. Additionally, it creates a new row for each column.
Get the next issue by email.
SET QUERY_BAND = 'Method=strtok_split_to_tab;' FOR SESSION;
WITH T1 as
(
SELECT d.* FROM TABLE (strtok_split_to_table( Split.PK, Split.ToBeSplit, ',')
RETURNS (PK integer, tokennum integer, Col varchar(100)) ) as d
)
SELECT T1.* FROM t1
;SELECT QUERYBAND,Spoolusage,TotalIOCount,AMPCPUTIme
FROM DBC.DBQLOGTBL
WHERE QUERYBAND LIKE '%Method%'Performance Considerations
Only the Teradata functions "CSVLD" and "Unpack" produce one output row for each input row, making them directly comparable. In contrast, "strtok_split_to_table" multiplies the number of rows and requires additional spool space.
Based on our empirical measurements (each query was executed three times), the Unpack method necessitates approximately 43% less spool space and 59% fewer I/Os.
| Queryband | SpoolUsage | TotalIOCount | AMPCPUTime |
| =S> Method=CSVLDFunc; | 110 559 232 | 332,00 | 1,89 |
| =S> Method=CSVLDFunc; | 110 559 232 | 332,00 | 1,89 |
| =S> Method=CSVLDFunc; | 110 559 232 | 332,00 | 1,90 |
| =S> Method=strtok_split_to_tab; | 245 719 040 | 608,00 | 4,71 |
| =S> Method=strtok_split_to_tab; | 245 719 040 | 608,00 | 4,72 |
| =S> Method=strtok_split_to_tab; | 245 719 040 | 608,00 | 4,98 |
| =S> Method=UnpackFunc; | 63 406 080 | 136,00 | 1,90 |
| =S> Method=UnpackFunc; | 63 406 080 | 136,00 | 1,79 |
| =S> Method=UnpackFunc; | 63 406 080 | 136,00 | 1,87 |
Discover this fascinating article on Stack Overflow.
Planning or surviving an enterprise data platform migration?
I write regularly about the performance, cost, architecture, and project mistakes that show up in real Teradata, Snowflake, Databricks, and enterprise data work.
Subscribe before the paid plan launches and keep launch access.
Written by Roland Wenzlofsky, founder of DWHPro and author of Teradata Query Performance Tuning. DWHPro has helped data warehouse practitioners for 15+ years.