Loading Data into Teradata using Python and Fastload
This article assumes prior basic knowledge of Python and demonstrates the easy process of loading data using Python and the Teradata SQL Driver for Python with Fastload.
If you are using Windows, we recommend using WinPython if you don't have Python installed yet. Because a Jupyter Notebook is included, we use it to show how easy it is to load data into Teradata.
Download the latest WinPython version from here.
Want more practical data engineering analysis like this?
Join DWHPro Letters and get field-tested notes on Teradata, Snowflake, AI, migrations, performance, and enterprise data work. Early subscribers keep launch access before the paid plan launches.
Download the latest version of WinPython
The table definition is provided below:
Get the next issue by email.
CREATE MULTISET TABLE DWHPRO.PythonFastload
(
PK INTEGER,
TheDate DATE FORMAT 'YY/MM/DD',
TheInteger INTEGER,
TheDecimal DECIMAL(10,2),
TheVarchar VARCHAR(20) CHARACTER SET LATIN NOT CASESPECIFIC
) PRIMARY INDEX ( PK );The Required Packages
The following packages are necessary to access Teradata:
- teradatasql
- pandas
- NumPy
- faker
The pandas, NumPy, and faker packages serve solely to generate fictitious random data for import.
import pandas as pdimport numpy as npimport teradatasqlimport faker.providers as ffrom faker import Fakerfake=Faker()from datetime import datetimeNext, we establish a connection to Teradata and initiate a cursor.
con = teradatasql.connect ('{"host":"127.0.0.1","user":"dbc","passsword":"dbc","database":"DWHPRO"}')cur = con.cursor ()We'll create a helper function for generating random dates to insert:
def random_dates(start, end, size): divide_by = 24 * 60 * 60 * 10**9 start_u = start.value // divide_by end_u = end.value // divide_by return pd.to_datetime(np.random.randint(start_u, end_u, size), unit="D").dateFunction executing the Fastload
This function utilizes a data frame to load the simulated data.
def fload(size_tran = 100000):
"""Uses Teradata FastLoad method to load data
"""
try:
sRequest = "DROP TABLE DWHPRO.PythonFastload"
print (sRequest)
cur.execute (sRequest)
except Exception as ex:
print ("Ignoring", str (ex).split ("\n") [0])
sRequest = """CREATE MULTISET TABLE DWHPRO.PythonFastload (
PK INTEGER
,TheDate DATE
,TheInteger INTEGER
,TheDecimal decimal(10,2)
,TheVarChar varchar(20)
)
"""#
print (sRequest)
cur.execute (sRequest)
sRequest = "{fn teradata_nativesql}{fn teradata_autocommit_off}"
print (sRequest)
cur.execute (sRequest)
size = 100000
start = datetime.now()
tran = pd.DataFrame( )
tran["PK"] = [x+1 for x in range(size_tran)]
tran["TheDate"] = random_dates(start=pd.to_datetime('2021-01-01'), end=pd.to_datetime('2021-12-31'), size=size_tran)
tran["TheInteger"] = np.random.choice([x+1 for x in range(size)] , size =size_tran)
tran["TheDecimal"]= [ np.random.randint(10, 100000) / 100 for x in range(size_tran)]
tran["TheVarChar"]= np.random.choice([fake.iban() for _ in range(1000)] , size = size_tran)
tran_list = tran.values.tolist()
parameter = ",?"*len(tran_list[0])
parameter = parameter[1:]
aaoValues= tran.values.tolist()
sInsert = "{fn teradata_try_fastload}INSERT INTO DWHPRO.PythonFastload ("+parameter+")"
print (sInsert)
cur.execute (sInsert, aaoValues)
con.commit()
print(f"transaction: data generated {datetime.now()-start}")
print(f"transaction: data inserted {datetime.now()-start}")We execute the Fastload by calling the function, then close the cursor and terminate the connection to Teradata.
fload(size_tran = 100000)cur.close()con.close()If you encounter an error message in the Jupyter Notebook indicating too many active load or unload tasks, you can immediately terminate these sessions using the following command:
SELECT AbortSessions (1, 'DBC', 0, 'Y', 'N');To terminate a specific session, specify the sessionid parameter. Alternatively, using the second parameter will terminate all active sessions for the user.
Planning or surviving an enterprise data platform migration?
I write regularly about the performance, cost, architecture, and project mistakes that show up in real Teradata, Snowflake, Databricks, and enterprise data work.
Subscribe before the paid plan launches and keep launch access.
Written by Roland Wenzlofsky, founder of DWHPro and author of Teradata Query Performance Tuning. DWHPro has helped data warehouse practitioners for 15+ years.