How to add Current DateTime as New Column to Fabric Lakehouse using Notebook PySpark

In this post, I’ll explain how to add the current date and time to an existing PySpark DataFrame in a Fabric Notebook. This is particularly helpful when inserting data into a Fabric Lakehouse table, as it allows you to track when each record was added. I’ll walk through an example using a DataFrame to first load some sample data and show how to append a new column with the current timestamp to capture the insertion time.

The data I am using is new york taxi dataset, after I opened my notebook and attached my lakehouse, I can simply upload the file to my lakehouse.

Once I confirm file being uploaded, I can load the data using spark.

To add column for datetime, you need to import lit and current_timestamp function then adding a new column. Run the query you will see CurrentDateTime column appended to the right.

dft = df.withColumn("CurrentDateTime", lit(current_timestamp()))

For validation purpose, you can put it in the first column by doing df.select(“columnname”, “*”), similar to how we do it in sql.

This is a very easy trick but quite useful when tracing changes, I hope you find it helpful.

Daily BI Talks

Business Intelligence Chats and Tips for Data Professionals!

How to add Current DateTime as New Column to Fabric Lakehouse using Notebook PySpark