Share

Loading Data into Teradata using Python and Fastload

Loading Data into Teradata using Python and Fastload
tool3

This article assumes prior basic knowledge of Python and demonstrates the easy process of loading data using Python and the Teradata SQL Driver for Python with Fastload.

If you are using Windows, we recommend using WinPython if you don't have Python installed yet. Because a Jupyter Notebook is included, we use it to show how easy it is to load data into Teradata.

Download the latest WinPython version from here.


Want more practical data engineering analysis like this?

Join DWHPro Letters and get field-tested notes on Teradata, Snowflake, AI, migrations, performance, and enterprise data work. Early subscribers keep launch access before the paid plan launches.

Get the next issue


Download the latest version of WinPython

The table definition is provided below:

Get the next issue by email.
CREATE MULTISET TABLE DWHPRO.PythonFastload
(
      PK  INTEGER,
      TheDate DATE FORMAT 'YY/MM/DD',
      TheInteger INTEGER,
      TheDecimal DECIMAL(10,2),
      TheVarchar VARCHAR(20) CHARACTER SET LATIN NOT CASESPECIFIC
) PRIMARY INDEX ( PK );

The Required Packages

The following packages are necessary to access Teradata:

  • teradatasql
  • pandas
  • NumPy
  • faker

The pandas, NumPy, and faker packages serve solely to generate fictitious random data for import.

import pandas as pdimport numpy as npimport teradatasql
import faker.providers as ffrom faker import Fakerfake=Faker()from datetime import datetime

Next, we establish a connection to Teradata and initiate a cursor.

con = teradatasql.connect ('{"host":"127.0.0.1","user":"dbc","passsword":"dbc","database":"DWHPRO"}')cur = con.cursor ()

We'll create a helper function for generating random dates to insert:

def random_dates(start, end, size):    divide_by = 24 * 60 * 60 * 10**9    start_u = start.value // divide_by    end_u = end.value // divide_by    return pd.to_datetime(np.random.randint(start_u, end_u, size), unit="D").date

Function executing the Fastload

This function utilizes a data frame to load the simulated data.

def fload(size_tran = 100000):
    """Uses Teradata FastLoad method to load data
    """
    try:
        sRequest = "DROP TABLE DWHPRO.PythonFastload"
        print (sRequest)
        cur.execute (sRequest)
    except Exception as ex:
        print ("Ignoring", str (ex).split ("\n") [0])

    sRequest = """CREATE MULTISET TABLE DWHPRO.PythonFastload  (
     PK INTEGER
    ,TheDate DATE
    ,TheInteger INTEGER
    ,TheDecimal decimal(10,2)
    ,TheVarChar varchar(20)
    )
    """#
    print (sRequest)
    cur.execute (sRequest)


    sRequest = "{fn teradata_nativesql}{fn teradata_autocommit_off}"
    print (sRequest)
    cur.execute (sRequest)

    size = 100000
    start = datetime.now()
    tran = pd.DataFrame( )
    tran["PK"] = [x+1 for x in range(size_tran)]
    tran["TheDate"] = random_dates(start=pd.to_datetime('2021-01-01'), end=pd.to_datetime('2021-12-31'), size=size_tran)
    tran["TheInteger"] = np.random.choice([x+1 for x in range(size)] , size =size_tran)
    tran["TheDecimal"]= [ np.random.randint(10, 100000) / 100 for x in range(size_tran)]
    tran["TheVarChar"]=   np.random.choice([fake.iban()  for _ in range(1000)] , size = size_tran)


    tran_list = tran.values.tolist()
    parameter = ",?"*len(tran_list[0])
    parameter = parameter[1:]

    aaoValues= tran.values.tolist()

    sInsert = "{fn teradata_try_fastload}INSERT INTO DWHPRO.PythonFastload ("+parameter+")"

    print (sInsert)
    cur.execute (sInsert, aaoValues)

    con.commit()

    print(f"transaction: data generated {datetime.now()-start}")
    print(f"transaction: data inserted {datetime.now()-start}")

We execute the Fastload by calling the function, then close the cursor and terminate the connection to Teradata.

fload(size_tran = 100000)
cur.close()con.close()

If you encounter an error message in the Jupyter Notebook indicating too many active load or unload tasks, you can immediately terminate these sessions using the following command:

SELECT AbortSessions (1, 'DBC', 0, 'Y', 'N');

To terminate a specific session, specify the sessionid parameter. Alternatively, using the second parameter will terminate all active sessions for the user.


Planning or surviving an enterprise data platform migration?

I write regularly about the performance, cost, architecture, and project mistakes that show up in real Teradata, Snowflake, Databricks, and enterprise data work.

Subscribe before the paid plan launches and keep launch access.

Written by Roland Wenzlofsky, founder of DWHPro and author of Teradata Query Performance Tuning. DWHPro has helped data warehouse practitioners for 15+ years.

Subscribe to DWHPro Letters

Practical field notes on enterprise data engineering, production AI systems, platform migration, and the senior engineering market.
Written by Roland Wenzlofsky Founder of DWHPro Author of Teradata Query Performance Tuning
Get the next issue
Subscribe