
Teradata Physical Storage: Understanding the Technical Details for Performance Optimization


A thorough understanding of the Teradata system's technical intricacies is crucial to attaining expertise in performance optimization. Although many of us are knowledgeable about existing optimization opportunities, resolving complex performance issues often requires an in-depth understanding of internal data storage.

Until recently, data was stored solely on hard disks. Solid-state disks are gaining ground, but hard drives are expected to remain in use for a considerable time.

Teradata systems currently employ both solid-state disks and hard drives for data storage.




Teradata's disk storage is organized into cylinders, sectors, and data blocks that occupy sectors within a cylinder. This layout is the starting point of our journey into its inner workings.

Data records are stored in blocks allocated to specific cylinders on the AMP's disk.

Teradata Physical Storage - How can a table row be accessed?


The Parsing Engine sends the AMPs requests that contain the internal TABLEID, which uniquely identifies a table in a Teradata system. To locate a data record on disk, the AMP uses two indexes: the master index and the cylinder index.

The Teradata Master Index

The Master Index records which cylinders hold a table's rows, so the AMP does not have to read every cylinder to find a single data row, for example during primary index access. To avoid redundancy, the master index stores only the lowest ROWID and the highest ROWHASH of each cylinder.


The master index maintains a sub-index of available cylinders for write operations such as inserts and updates.

The master index is cached in the FSG cache and readily accessible on every AMP for data retrieval.
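The master index lookup described above can be pictured as a search over sorted key ranges. The following is a minimal sketch, not Teradata's actual implementation: the names `MasterIndexEntry` and `find_cylinder` are hypothetical, and the key is simplified to a `(table_id, row_hash)` pair, ignoring the uniqueness part of the ROWID and rows that spill over into a second cylinder.

```python
from bisect import bisect_right
from typing import List, NamedTuple, Optional, Tuple

Key = Tuple[int, int]  # (table_id, row_hash); a simplification of the real keys


class MasterIndexEntry(NamedTuple):
    """Hypothetical model of one master index entry (one per cylinder)."""
    low_key: Key    # lowest key stored on the cylinder
    high_key: Key   # highest key stored on the cylinder
    cylinder: int   # cylinder number


def find_cylinder(master_index: List[MasterIndexEntry],
                  table_id: int, row_hash: int) -> Optional[int]:
    """Return the cylinder whose key range covers the key, or None.

    Assumes master_index is sorted by low_key and ranges do not overlap.
    """
    key = (table_id, row_hash)
    low_keys = [entry.low_key for entry in master_index]
    # Last entry whose low_key is <= key.
    pos = bisect_right(low_keys, key) - 1
    if pos < 0:
        return None
    entry = master_index[pos]
    return entry.cylinder if key <= entry.high_key else None


# Made-up example data: table 100 spans two cylinders, table 200 one.
master_index = [
    MasterIndexEntry((100, 0), (100, 4999), cylinder=7),
    MasterIndexEntry((100, 5000), (100, 9999), cylinder=12),
    MasterIndexEntry((200, 0), (200, 9999), cylinder=3),
]

print(find_cylinder(master_index, 100, 6000))  # 12
print(find_cylinder(master_index, 150, 1))     # None
```

Because only the range boundaries are kept per cylinder, the structure stays small enough to sit in memory, which matches the article's point that the master index is always cached.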

The Teradata Cylinder Index

After querying the master index, the AMP quickly locates the cylinders containing the table records. The cylinder index is then used to pinpoint the specific cylinder sectors that hold the necessary data blocks, which ultimately house the required data rows. Notably, the cylinder index stores the initial and final sectors for every data block in a table.

Because the cylinder index is larger than the master index, it may not fit completely in the FSG cache.

As in the master index, the lowest ROWID and the highest ROWHASH are stored, so the AMP only has to read the sectors that contain the required records. The cylinder index also maintains a sub-index of all free disk sectors.

Once the AMP determines the first and last sector of the data block containing the necessary data record(s), the block can be transferred to the FSG cache for performing select, update, insert, or delete operations.
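The step from cylinder index to sector range can be sketched as follows. This is an illustrative model under stated assumptions, not the real on-disk format: `BlockDescriptor` and `find_block_sectors` are hypothetical names, the sector numbers are invented, and the sub-index of free sectors is left out.

```python
from bisect import bisect_right
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class BlockDescriptor:
    """Hypothetical cylinder index entry describing one data block."""
    low_row_id: Tuple[int, int]  # (row hash, uniqueness value) of the first row
    high_row_hash: int           # highest row hash in the block
    first_sector: int            # first sector occupied by the block
    last_sector: int             # last sector occupied by the block


def find_block_sectors(cylinder_index: List[BlockDescriptor],
                       row_hash: int) -> Optional[Tuple[int, int]]:
    """Return (first_sector, last_sector) of the block that may hold row_hash.

    Assumes the descriptors are sorted by low_row_id and do not overlap.
    """
    low_hashes = [d.low_row_id[0] for d in cylinder_index]
    # Last block whose lowest row hash is <= the searched row hash.
    pos = bisect_right(low_hashes, row_hash) - 1
    if pos < 0:
        return None
    block = cylinder_index[pos]
    if row_hash > block.high_row_hash:
        return None
    return (block.first_sector, block.last_sector)


# Made-up example data: three data blocks on one cylinder.
cylinder_index = [
    BlockDescriptor((0, 1), 999, first_sector=0, last_sector=15),
    BlockDescriptor((1000, 1), 4999, first_sector=16, last_sector=47),
    BlockDescriptor((5000, 1), 9999, first_sector=48, last_sector=63),
]

print(find_block_sectors(cylinder_index, 1200))   # (16, 47)
print(find_block_sectors(cylinder_index, 10000))  # None
```

Only the sectors in the returned range need to be read into the FSG cache; every other block on the cylinder is skipped.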

The AMP will locate the requested data record(s) by performing a binary search on the data block(s).

The binary search algorithm efficiently locates a row in a sorted array of row IDs by repeatedly comparing the searched row ID with the row ID of the middle element of the pointer array.

Each element of the pointer array points to a table row. If the row ID of the middle element matches the searched row ID, the row is read from the position that element points to. If the searched row ID is smaller than the middle element's row ID, the search continues to the left of the middle element; if it is greater, the search continues to the right. These steps repeat until the searched row ID is found or no elements remain, which means the row does not exist in the block.
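The binary search over the pointer array can be sketched in a few lines. This is a simplified model, assuming a row ID is a `(row hash, uniqueness value)` pair and that the pointer array holds byte offsets sorted by the row ID they point to; `find_row` and the dictionary standing in for the data block are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

RowId = Tuple[int, int]  # (row hash, uniqueness value)


def find_row(pointers: List[int],
             rows_at_offset: Dict[int, Tuple[RowId, str]],
             wanted: RowId) -> Optional[str]:
    """Binary search over a pointer array ordered by row ID.

    pointers       -- offsets into the block, sorted by the row ID they point to
    rows_at_offset -- stand-in for the block: offset -> (row ID, row payload)
    """
    lo, hi = 0, len(pointers) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        row_id, payload = rows_at_offset[pointers[mid]]
        if row_id == wanted:
            return payload      # middle element points to the wanted row
        if wanted < row_id:
            hi = mid - 1        # continue left of the middle element
        else:
            lo = mid + 1        # continue right of the middle element
    return None                 # no elements left: row is not in this block


# Made-up block: rows sit at arbitrary offsets, only the pointers are sorted.
block = {
    0: ((30, 1), "row C"),
    40: ((10, 1), "row A"),
    80: ((20, 1), "row B"),
    120: ((20, 2), "row B2"),
}
pointer_array = [40, 80, 120, 0]  # ordered by row ID: (10,1) (20,1) (20,2) (30,1)

print(find_row(pointer_array, block, (20, 2)))  # row B2
print(find_row(pointer_array, block, (15, 1)))  # None
```

Searching the pointers rather than the rows themselves means the rows do not have to be kept physically sorted inside the block; only the small pointer array has to stay ordered.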

In part two of "The Ultimate Teradata Physical Storage Guide" series, we'll examine the data block structure, record storage within the blocks, and Teradata's management of varying block size needs.

We will demonstrate Teradata's data block maintenance techniques, which involve adding, reducing, and splitting blocks as necessary to reduce fragmentation and optimize system speed.



Written by Roland Wenzlofsky, founder of DWHPro and author of Teradata Query Performance Tuning. DWHPro has helped data warehouse practitioners for 15+ years.
