09-23-2021 03:06 PM
Hi All,
I have a daily Spark job that reads and joins 3-4 source tables and writes the resulting dataframe in Parquet format. The dataframe has 100+ columns. Since the job runs daily, our deduplication logic identifies the latest record from each source table, joins them, and then overwrites the existing Parquet output in full.
The question is: is there a way to write incrementally, so we only touch records that are new or whose values have changed, instead of overwriting the whole file every day?
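For context, a simplified sketch of what the job does today (the table names, the id key, and the updated_at column here are placeholders, not our real schema):

# Simplified sketch of the current daily job; names are placeholders.
from pyspark.sql import SparkSession, Window
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()

def latest_records(df, key_col="id", ts_col="updated_at"):
    """Keep only the most recent record per key (the dedup step)."""
    w = Window.partitionBy(key_col).orderBy(F.col(ts_col).desc())
    return (df.withColumn("rn", F.row_number().over(w))
              .filter("rn = 1")
              .drop("rn"))

src_a = latest_records(spark.table("source_a"))
src_b = latest_records(spark.table("source_b"))
src_c = latest_records(spark.table("source_c"))

joined = src_a.join(src_b, "id").join(src_c, "id")  # 100+ columns after the joins

# Full overwrite of the existing Parquet output every day.
joined.write.mode("overwrite").parquet("/mnt/datalake/target_table")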
09-24-2021 11:19 AM
Thanks, appreciate the quick response.
09-27-2021 04:09 AM
The MERGE functionality of Delta Lake is what you are looking for.
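Once the target is stored as a Delta table rather than plain Parquet (an existing Parquet directory can be converted with CONVERT TO DELTA), a MERGE upsert looks roughly like this. This is a minimal PySpark sketch; the path, the id key, and the staging table name are placeholders for your own names:

# Sketch of a Delta Lake MERGE (upsert) into the target table.
from delta.tables import DeltaTable
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Daily batch of deduplicated, joined source records (same 100+ column schema).
updates_df = spark.table("staging.daily_joined_updates")

target = DeltaTable.forPath(spark, "/mnt/datalake/target_table")

(target.alias("t")
    .merge(updates_df.alias("u"), "t.id = u.id")
    .whenMatchedUpdateAll()      # update existing records whose key matches
    .whenNotMatchedInsertAll()   # insert brand-new records
    .execute())

If you want to skip rewrites of rows whose values did not actually change, whenMatchedUpdateAll also accepts a condition argument where you can compare the relevant columns.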
09-27-2021 02:55 PM
Thanks werners