In this post, I’ll explain how to add the current date and time to an existing PySpark DataFrame in a Fabric Notebook. This is particularly helpful when inserting data into a Fabric Lakehouse table, as it allows you to track when each record was added. I’ll walk through an example using a DataFrame to first load some sample data and show how to append a new column with the current timestamp to capture the insertion time.
The data I am using is new york taxi dataset, after I opened my notebook and attached my lakehouse, I can simply upload the file to my lakehouse.
data:image/s3,"s3://crabby-images/77d17/77d17bf24585039621d365336b22f5fa17c32622" alt=""
Once I confirm file being uploaded, I can load the data using spark.
data:image/s3,"s3://crabby-images/dbe7c/dbe7c6b012a1228ccac5e36653d9d5cea4dd0be1" alt=""
data:image/s3,"s3://crabby-images/524ad/524ad03ef5407f0d07fb3b59f181962452dc398f" alt=""
To add column for datetime, you need to import lit and current_timestamp function then adding a new column. Run the query you will see CurrentDateTime column appended to the right.
dft = df.withColumn("CurrentDateTime", lit(current_timestamp()))
data:image/s3,"s3://crabby-images/2496b/2496b8f6a8688da71bafade0013d868cb41f41c8" alt=""
For validation purpose, you can put it in the first column by doing df.select(“columnname”, “*”), similar to how we do it in sql.
data:image/s3,"s3://crabby-images/2e844/2e8448243c376b1a05905ce8d1e52b611f5344dc" alt=""
This is a very easy trick but quite useful when tracing changes, I hope you find it helpful.