[spark] Support Merge Into DELETE for data evolution tables#8182

Draft
leaves12138 wants to merge 1 commit into apache:master from leaves12138:codex/spark-data-evolution-merge-delete

@leaves12138

Contributor

What changed

  • Add a data-evolution delete rewriter that rewrites normal and blob files by retained row-id ranges.
  • Wire matched DELETE actions into Spark DataEvolution MERGE INTO, including mixed update/delete sequencing and Spark 4.0 parity.
  • Add RowTracking and Blob test coverage for merge delete and combined update+delete cases.
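
To make "retained row-id ranges" concrete, here is a minimal sketch of how such ranges could be derived for one file. It assumes a file covers a contiguous, inclusive row-id range and that the matched DELETE yields a set of row ids to drop. `RetainedRowIdRanges` and its methods are hypothetical names for illustration, not classes from this PR or from Paimon.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper (not a Paimon class): row-id ranges that survive a delete. */
final class RetainedRowIdRanges {

    /**
     * Returns inclusive [start, end] ranges of row ids within
     * [firstRowId, lastRowId] that are not listed in deletedRowIds.
     */
    static List<long[]> retained(long firstRowId, long lastRowId, long[] deletedRowIds) {
        long[] deleted = deletedRowIds.clone();
        Arrays.sort(deleted);
        List<long[]> ranges = new ArrayList<>();
        long next = firstRowId; // first row id not yet classified as kept or deleted
        for (long id : deleted) {
            if (id < next || id > lastRowId) {
                continue; // duplicate id, or id outside this file's range
            }
            if (id > next) {
                ranges.add(new long[] {next, id - 1}); // rows kept before this deleted id
            }
            next = id + 1;
        }
        if (next <= lastRowId) {
            ranges.add(new long[] {next, lastRowId}); // rows kept after the last deleted id
        }
        return ranges;
    }

    /** Formats ranges as "[a, b], [c, d]" for display. */
    static String format(List<long[]> ranges) {
        StringBuilder sb = new StringBuilder();
        for (long[] r : ranges) {
            if (sb.length() > 0) {
                sb.append(", ");
            }
            sb.append('[').append(r[0]).append(", ").append(r[1]).append(']');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // File covers row ids 100..109; rows 103, 104 and 107 matched the DELETE.
        System.out.println(format(retained(100, 109, new long[] {103, 104, 107})));
        // prints: [100, 102], [105, 106], [108, 109]
    }
}
```

A file whose rows are all deleted yields an empty list, which a rewriter could treat as "drop the file" rather than "rewrite it".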

Why

DataEvolution tables currently handle merge updates/inserts, but matched deletes need to physically rewrite the affected row-id ranges so normal files and corresponding blob files stay aligned.
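
The alignment requirement can be illustrated with a small sketch: if the same retained ranges are applied to a normal file and to its blob file, the i-th surviving row of one still corresponds to the i-th surviving row of the other. The sketch assumes, purely for illustration, that both files cover the same row-id range starting at `firstRowId` and models each file as an in-memory list. `AlignedRewriteSketch` and `keep` are hypothetical names, not part of this PR.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical illustration (not Paimon code): one set of ranges applied to two files. */
final class AlignedRewriteSketch {

    /** Keeps only the rows whose row id falls inside one of the inclusive retained ranges. */
    static <T> List<T> keep(List<T> rows, long firstRowId, long[][] retainedRanges) {
        List<T> out = new ArrayList<>();
        for (long[] range : retainedRanges) {
            int from = (int) (range[0] - firstRowId);   // inclusive list index
            int to = (int) (range[1] - firstRowId) + 1; // exclusive list index
            out.addAll(rows.subList(from, to));
        }
        return out;
    }

    public static void main(String[] args) {
        long firstRowId = 100;
        List<String> normalFile = Arrays.asList("k0", "k1", "k2", "k3", "k4");
        List<String> blobFile = Arrays.asList("b0", "b1", "b2", "b3", "b4");
        long[][] retained = {{100, 100}, {103, 104}}; // rows 101 and 102 were deleted

        // Both rewrites use the same ranges, so positions still line up afterwards.
        System.out.println(keep(normalFile, firstRowId, retained)); // [k0, k3, k4]
        System.out.println(keep(blobFile, firstRowId, retained));   // [b0, b3, b4]
    }
}
```

Rewriting only one of the two files would leave the other with five rows against three, which is the misalignment the description refers to.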

Validation

  • mvn -pl paimon-spark/paimon-spark-common -am -Pspark3,fast-build -DskipTests compile
  • JAVA_HOME=/Users/yejunhao/Library/Java/JavaVirtualMachines/ms-17.0.16/Contents/Home PATH=/Users/yejunhao/Library/Java/JavaVirtualMachines/ms-17.0.16/Contents/Home/bin:$PATH mvn -pl paimon-spark/paimon-spark-4.0 -am -Pspark4,fast-build -DskipTests compile
  • mvn -pl paimon-spark/paimon-spark-3.5 -am -Pspark3,fast-build -DfailIfNoTests=false -DwildcardSuites=org.apache.paimon.spark.sql.RowTrackingTest -Dtest=none test
    Result: ran 37 tests; the new data-evolution delete tests did not fail, while 5 existing local failures hit codegen loader / generated class instantiation issues.
  • mvn -pl paimon-spark/paimon-spark-ut -am -Pspark3,fast-build -DfailIfNoTests=false -DwildcardSuites=org.apache.paimon.spark.sql.BlobTestWithV2Write -Dtest=none test
    Result: ran 15 tests; the new blob delete test did not fail, while 2 existing local failures hit the same codegen loader issue.
