Understanding Teradata Nodes in Parallel Systems: Architecture and Scalability

Learn about Teradata Nodes - Linux systems packed into cabinets with multicore CPUs, memory, and parallel database extension software (PDE).

What is a Teradata Node?

Teradata nodes are Linux systems packed into a single cabinet, each containing multiple physical multicore CPUs and ample memory. These systems run the parallel database extension software (PDE) on top of the Linux operating system.

Each node hosts the core components of a Teradata system (see our article about the Teradata high-level architecture):

- The Parsing Engines
- The AMPs
- Two redundant BYNETs for the communication between AMPs and Parsing Engines.

Parallelism within a node is achieved by uniformly distributing the workload among all AMPs.
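How evenly the work spreads depends on how evenly the rows are spread. The following is a minimal sketch of hash-based row distribution, not Teradata's actual implementation: the MD5 hash, the modulo mapping, and the function names `amp_for_key` and `rows_per_amp` are assumptions chosen for illustration.

```python
import hashlib
from collections import Counter


def amp_for_key(key: str, num_amps: int) -> int:
    """Map a distribution key to an AMP number.

    Hypothetical stand-in for Teradata's row hashing: MD5 plus modulo
    is used here only because it is deterministic and well mixed.
    """
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % num_amps


def rows_per_amp(keys, num_amps: int) -> list:
    """Count how many rows land on each AMP for the given keys."""
    counts = Counter(amp_for_key(key, num_amps) for key in keys)
    return [counts.get(amp, 0) for amp in range(num_amps)]


# Many distinct key values: rows spread across all AMPs.
unique_keys = [f"customer-{i}" for i in range(10_000)]
# One repeated key value: every row hashes to the same AMP.
skewed_keys = ["UNKNOWN"] * 10_000

even = rows_per_amp(unique_keys, 8)
skewed = rows_per_amp(skewed_keys, 8)

print("distinct keys:", even)
print("single key:   ", skewed)
```

With distinct keys, every AMP receives a similar share of the rows. With a single repeated key, one AMP holds all 10,000 rows and the other seven hold none, so only one AMP has work to do.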

The Teradata architecture offers excellent scalability: many nodes can be connected to form one large system.

The idea that doubling the number of nodes doubles performance is often called linear scalability. However, this is largely a myth in practice, because it holds only if the workload is perfectly parallel across all AMPs.

Workload skew is a well-known problem. Adding more nodes will not make a SQL statement faster if its workload remains on a single AMP. This is an important consideration when expanding your system.
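The effect of skew on scalability can be put into numbers with a simple model. This is a sketch under stated assumptions, not a Teradata formula: elapsed time is taken to be the work on the busiest AMP, a fixed fraction of the work is pinned to one AMP, and the rest is spread evenly. The names `elapsed_time` and `speedup_from_doubling` are hypothetical.

```python
def elapsed_time(total_work: float, num_amps: int, skewed_fraction: float = 0.0) -> float:
    """Elapsed time of a step, modeled as the load of the busiest AMP.

    Assumption: `skewed_fraction` of the work sits on a single AMP and
    the remainder is spread evenly across all AMPs (including that one).
    """
    even_share = total_work * (1.0 - skewed_fraction) / num_amps
    hot_amp_load = even_share + total_work * skewed_fraction
    return hot_amp_load


def speedup_from_doubling(total_work: float, num_amps: int, skewed_fraction: float) -> float:
    """Ratio of elapsed time before and after doubling the AMP count."""
    before = elapsed_time(total_work, num_amps, skewed_fraction)
    after = elapsed_time(total_work, 2 * num_amps, skewed_fraction)
    return before / after


for fraction in (0.0, 0.1, 1.0):
    speedup = speedup_from_doubling(1000.0, 40, fraction)
    print(f"skewed fraction {fraction:.1f}: speedup {speedup:.2f}x")
```

In this model, a perfectly even workload gets the full 2x speedup, a workload with 10% of the work on one AMP gains only about 10%, and a workload that sits entirely on one AMP gains nothing.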

In addition, fault tolerance is limited, so such architectures are not very resilient. It is not feasible for a Teradata system to scale to thousands of nodes. Hot standby nodes can provide some level of fault tolerance, but they come at a high cost.

In parallel system terminology, a single node is referred to as a symmetric multiprocessing (SMP) node. A system consisting of at least two nodes is classified as a massively parallel processing (MPP) system.

The communication network, BYNET, is software-based within a single node but hardware-based between nodes, facilitating communication between AMPs and Parsing Engines across different nodes.

Two BYNETs are always available for reasons of performance and fault tolerance.

Both networks are used concurrently to maximize throughput, provided they operate without errors. In the event of a network failure, the backup network ensures uninterrupted operation. Teradata becomes inoperative only if both networks fail.
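The behavior described above (both networks share the traffic, one takes over if the other fails, and the system stops only when both are down) can be sketched as follows. This is an illustrative model only: the classes `Network` and `DualNetwork`, the names `BYNET-0` and `BYNET-1`, and the round-robin policy are assumptions, not Teradata internals.

```python
from dataclasses import dataclass, field


@dataclass
class Network:
    """One of the two redundant networks (hypothetical model)."""
    name: str
    healthy: bool = True
    delivered: list = field(default_factory=list)


class DualNetwork:
    """Alternates between two networks and fails over when one is down."""

    def __init__(self) -> None:
        self.networks = [Network("BYNET-0"), Network("BYNET-1")]
        self._next = 0  # index of the network to try first

    def send(self, message: str) -> str:
        """Deliver a message and return the name of the network used."""
        for offset in range(len(self.networks)):
            index = (self._next + offset) % len(self.networks)
            network = self.networks[index]
            if network.healthy:
                network.delivered.append(message)
                # Start with the other network next time to share the load.
                self._next = (index + 1) % len(self.networks)
                return network.name
        raise RuntimeError("both networks are down: system inoperative")


link = DualNetwork()
print([link.send(f"msg-{i}") for i in range(4)])  # both networks in use

link.networks[0].healthy = False                  # simulate a failure
print([link.send(f"msg-{i}") for i in range(4, 6)])  # traffic continues on one
```

While both networks are healthy, messages alternate between them. After the simulated failure, all traffic moves to the remaining network, and `send` raises an error only if that one is marked unhealthy as well.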

Years ago, Teradata's BYNET provided a notable advantage by sorting and merging data itself, thus reducing the CPU workload. With the advent of multicore processors, this benefit is probably no longer as significant. The shift from BYNET to InfiniBand as the primary data transmission backbone is another contributing factor.

Written by Roland Wenzlofsky, founder of DWHPro and author of Teradata Query Performance Tuning. DWHPro has helped data warehouse practitioners for 15+ years.
