Add logical range partitioning representation #22777

Merged: gabotechs merged 2 commits into apache:main from gene-bordegaray:gene.bordegaray/2026/06/logical-source-partitioning on Jun 10, 2026
Conversation

@gene-bordegaray gene-bordegaray commented Jun 5, 2026

Contributor

Which issue does this PR close?

  • Closes #22778.
  • Related: #21992, #22395.
  • Needed by #22657.

Rationale for this change

Declared scan output partitioning should use logical partitioning metadata, not physical partitioning types. This adds logical range partitioning so range-partitioned sources can declare their layout at the logical layer.

What changes are included in this PR?

  • Add logical Partitioning::Range and RangePartitioning.
  • Move SplitPoint and shared split-point validation to datafusion-common.
  • Wire logical range partitioning through expression traversal, rewrites, and display.
  • Leave planning, logical proto, and Substrait explicitly unsupported for now.
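
The shape of these changes can be pictured with a small stand-alone sketch. It does not use the DataFusion crates: `SplitPoint`, `RangePartitioning`, and `Partitioning` below are simplified stand-ins, and the field layout, the `try_new` validation rules, and the `partition_count` and `describe` helpers are assumptions made for illustration, not the API added by this PR.

```rust
// Stand-in types only: the real definitions live in datafusion-expr and
// datafusion-common and may differ in names, fields, and validation rules.

/// Hypothetical split point: one boundary value per partitioning expression.
#[derive(Debug, Clone, PartialEq, Eq, PartialOrd, Ord)]
pub struct SplitPoint(pub Vec<i64>);

/// Hypothetical logical range partitioning: expressions plus ordered boundaries.
#[derive(Debug, Clone, PartialEq)]
pub struct RangePartitioning {
    pub exprs: Vec<String>, // strings stand in for logical expressions
    pub split_points: Vec<SplitPoint>,
}

impl RangePartitioning {
    /// Assumed validation: every split point has one value per expression,
    /// and split points are strictly increasing.
    pub fn try_new(
        exprs: Vec<String>,
        split_points: Vec<SplitPoint>,
    ) -> Result<Self, String> {
        if exprs.is_empty() {
            return Err("range partitioning needs at least one expression".to_string());
        }
        for sp in &split_points {
            if sp.0.len() != exprs.len() {
                return Err(format!(
                    "split point has {} values but there are {} expressions",
                    sp.0.len(),
                    exprs.len()
                ));
            }
        }
        if split_points.windows(2).any(|w| w[0] >= w[1]) {
            return Err("split points must be strictly increasing".to_string());
        }
        Ok(Self { exprs, split_points })
    }

    /// In this sketch, N split points divide the key space into N + 1 partitions.
    pub fn partition_count(&self) -> usize {
        self.split_points.len() + 1
    }
}

/// Stand-in for the logical partitioning enum with the new `Range` variant.
#[derive(Debug, Clone, PartialEq)]
pub enum Partitioning {
    RoundRobinBatch(usize),
    Hash(Vec<String>, usize),
    Range(RangePartitioning),
    DistributeBy(Vec<String>),
}

/// Display-style helper: every match over the enum needs an arm for `Range`.
pub fn describe(p: &Partitioning) -> String {
    match p {
        Partitioning::RoundRobinBatch(n) => format!("RoundRobinBatch({})", n),
        Partitioning::Hash(exprs, n) => format!("Hash([{}], {})", exprs.join(", "), n),
        Partitioning::Range(r) => format!(
            "Range([{}], {} partitions)",
            r.exprs.join(", "),
            r.partition_count()
        ),
        Partitioning::DistributeBy(exprs) => format!("DistributeBy([{}])", exprs.join(", ")),
    }
}
```

Keeping validation in a constructor is one way to let logical and physical layers share the same split-point rules, which is the motivation given for moving `SplitPoint` to `datafusion-common`.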

Are these changes tested?

Yes. Unit tests added.

Are there any user-facing changes?

Yes. This adds a public logical range partitioning API. No breaking API changes.

@gene-bordegaray gene-bordegaray changed the title from "Add logical range partitioning representation" to "[WIP] Add logical range partitioning representation" Jun 5, 2026
@github-actions github-actions Bot added the logical-expr, physical-expr, core, substrait, common, and proto labels Jun 5, 2026
github-actions Bot commented Jun 5, 2026


Thank you for opening this pull request!

Reviewer note: cargo-semver-checks reported the current version number is not SemVer-compatible with the changes in this pull request (compared against the base branch).

Details
     Cloning apache/main
    Building datafusion v54.0.0 (current)
       Built [  91.233s] (current)
     Parsing datafusion v54.0.0 (current)
      Parsed [   0.031s] (current)
    Building datafusion v54.0.0 (baseline)
       Built [  90.354s] (baseline)
     Parsing datafusion v54.0.0 (baseline)
      Parsed [   0.032s] (baseline)
    Checking datafusion v54.0.0 -> v54.0.0 (no change; assume patch)
     Checked [   0.696s] 223 checks: 223 pass, 30 skip
     Summary no semver update required
    Finished [ 184.393s] datafusion
    Building datafusion-common v54.0.0 (current)
       Built [  30.327s] (current)
     Parsing datafusion-common v54.0.0 (current)
      Parsed [   0.055s] (current)
    Building datafusion-common v54.0.0 (baseline)
       Built [  30.417s] (baseline)
     Parsing datafusion-common v54.0.0 (baseline)
      Parsed [   0.055s] (baseline)
    Checking datafusion-common v54.0.0 -> v54.0.0 (no change; assume patch)
     Checked [   0.769s] 223 checks: 223 pass, 30 skip
     Summary no semver update required
    Finished [  62.597s] datafusion-common
    Building datafusion-expr v54.0.0 (current)
       Built [  24.126s] (current)
     Parsing datafusion-expr v54.0.0 (current)
      Parsed [   0.069s] (current)
    Building datafusion-expr v54.0.0 (baseline)
       Built [  23.997s] (baseline)
     Parsing datafusion-expr v54.0.0 (baseline)
      Parsed [   0.069s] (baseline)
    Checking datafusion-expr v54.0.0 -> v54.0.0 (no change; assume patch)
     Checked [   1.438s] 223 checks: 221 pass, 1 fail, 1 warn, 30 skip

--- failure enum_variant_added: enum variant added on exhaustive enum ---

Description:
A publicly-visible enum without #[non_exhaustive] has a new variant.
        ref: https://doc.rust-lang.org/cargo/reference/semver.html#enum-variant-new
       impl: https://github.com/obi1kenobi/cargo-semver-checks/tree/v0.48.0/src/lints/enum_variant_added.ron

Failed in:
  variant Partitioning:Range in /home/runner/work/datafusion/datafusion/datafusion/expr/src/logical_plan/plan.rs:4451
  variant Partitioning:Range in /home/runner/work/datafusion/datafusion/datafusion/expr/src/logical_plan/plan.rs:4451
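
A minimal sketch of why this lint fires, using a stand-in enum rather than the real `Partitioning` type. A downstream `match` that lists every variant stops compiling when a variant is added; a wildcard arm (which `#[non_exhaustive]` forces on code in other crates) avoids that. The payloads and the `partition_count` helper here are hypothetical, and nothing in this thread says DataFusion marks the enum `#[non_exhaustive]`.

```rust
/// Stand-in for a library enum. `#[non_exhaustive]` only constrains code in
/// *other* crates: they must include a wildcard arm when matching on it.
#[non_exhaustive]
#[derive(Debug, Clone, PartialEq)]
pub enum Partitioning {
    RoundRobinBatch(usize),
    Hash(usize),
    Range(usize), // the newly added variant
    DistributeBy,
}

/// Downstream-style match written defensively with a wildcard arm, so that
/// a new variant does not turn into a compile error.
pub fn partition_count(p: &Partitioning) -> Option<usize> {
    match p {
        Partitioning::RoundRobinBatch(n) | Partitioning::Hash(n) => Some(*n),
        Partitioning::DistributeBy => None,
        // Without this arm, adding `Range` to an exhaustive match fails with
        // error[E0004]: non-exhaustive patterns.
        _ => None,
    }
}
```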

--- warning partial_ord_enum_variants_reordered: enum variants reordered in #[derive(PartialOrd)] enum ---

Description:
A public enum that derives PartialOrd had its variants reordered. #[derive(PartialOrd)] uses the enum variant order to set the enum's ordering behavior, so this change may break downstream code that relies on the previous order.
        ref: https://doc.rust-lang.org/std/cmp/trait.PartialOrd.html#derivable
       impl: https://github.com/obi1kenobi/cargo-semver-checks/tree/v0.48.0/src/lints/partial_ord_enum_variants_reordered.ron

Failed in:
  Partitioning::DistributeBy moved from position 3 to 4, in /home/runner/work/datafusion/datafusion/datafusion/expr/src/logical_plan/plan.rs:4453
  Partitioning::DistributeBy moved from position 3 to 4, in /home/runner/work/datafusion/datafusion/datafusion/expr/src/logical_plan/plan.rs:4453
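
The effect of this warning can be reproduced in miniature. The sketch below uses fieldless stand-in enums (the real variants carry payloads, which the derived comparison consults only after variant position) and two hypothetical helpers. It shows that inserting `Range` before `DistributeBy` shifts the latter's declaration index, while comparisons between pre-existing variants keep their result.

```rust
/// Variant order before the change (stand-in, payloads omitted).
pub mod before {
    #[derive(Debug, Clone, Copy, PartialEq, PartialOrd)]
    pub enum Partitioning {
        RoundRobinBatch,
        Hash,
        DistributeBy,
    }
}

/// Variant order after the change (stand-in, payloads omitted).
pub mod after {
    #[derive(Debug, Clone, Copy, PartialEq, PartialOrd)]
    pub enum Partitioning {
        RoundRobinBatch,
        Hash,
        Range, // new variant inserted before DistributeBy
        DistributeBy,
    }
}

/// Zero-based declaration index of `DistributeBy` before and after.
/// (The tool's own position numbering may use a different base.)
pub fn distribute_by_index() -> (usize, usize) {
    (
        before::Partitioning::DistributeBy as usize,
        after::Partitioning::DistributeBy as usize,
    )
}

/// True if `Hash < DistributeBy` gives the same answer in both versions.
pub fn hash_vs_distribute_by_unchanged() -> bool {
    (before::Partitioning::Hash < before::Partitioning::DistributeBy)
        == (after::Partitioning::Hash < after::Partitioning::DistributeBy)
}
```

What does change is anything that depends on where the new variant sorts: here `Range` compares greater than `Hash` and less than `DistributeBy`.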

     Summary semver requires new major version: 1 major and 0 minor checks failed
     Warning produced 1 major and 0 minor level warnings
    Finished [  50.576s] datafusion-expr
    Building datafusion-physical-expr v54.0.0 (current)
       Built [  26.646s] (current)
     Parsing datafusion-physical-expr v54.0.0 (current)
      Parsed [   0.048s] (current)
    Building datafusion-physical-expr v54.0.0 (baseline)
       Built [  26.491s] (baseline)
     Parsing datafusion-physical-expr v54.0.0 (baseline)
      Parsed [   0.046s] (baseline)
    Checking datafusion-physical-expr v54.0.0 -> v54.0.0 (no change; assume patch)
     Checked [   0.405s] 223 checks: 222 pass, 1 fail, 0 warn, 30 skip

--- failure struct_missing: pub struct removed or renamed ---

Description:
A publicly-visible struct cannot be imported by its prior path. A `pub use` may have been removed, or the struct itself may have been renamed or removed entirely.
        ref: https://doc.rust-lang.org/cargo/reference/semver.html#item-remove
       impl: https://github.com/obi1kenobi/cargo-semver-checks/tree/v0.48.0/src/lints/struct_missing.ron

Failed in:
  struct datafusion_physical_expr::SplitPoint, previously in file /home/runner/work/datafusion/datafusion/target/semver-checks/git-apache_main/2231dfcfec6ed67c8b624f6f394b6d09d3843f69/datafusion/physical-expr/src/partitioning.rs:225
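
As the description notes, a `pub use` at the old location is the usual way to keep a moved type importable by its prior path. The sketch below models the two crates as modules in one file; the module names, the `SplitPoint` field, and the `same_type` helper are hypothetical, and the log above does not show whether the merged PR kept such a re-export.

```rust
/// Stand-in for `datafusion-common`: the type's new home.
pub mod common {
    #[derive(Debug, Clone, PartialEq)]
    pub struct SplitPoint(pub Vec<i64>);
}

/// Stand-in for `datafusion-physical-expr`: the type's previous home.
pub mod physical_expr {
    // Re-exporting keeps the old import path compiling after the move.
    pub use super::common::SplitPoint;
}

/// Both paths name the same type, so a value passes between them unchanged.
pub fn same_type(p: physical_expr::SplitPoint) -> common::SplitPoint {
    p
}
```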

     Summary semver requires new major version: 1 major and 0 minor checks failed
    Finished [  54.445s] datafusion-physical-expr
    Building datafusion-proto v54.0.0 (current)
       Built [  53.620s] (current)
     Parsing datafusion-proto v54.0.0 (current)
      Parsed [   0.017s] (current)
    Building datafusion-proto v54.0.0 (baseline)
       Built [  53.196s] (baseline)
     Parsing datafusion-proto v54.0.0 (baseline)
      Parsed [   0.018s] (baseline)
    Checking datafusion-proto v54.0.0 -> v54.0.0 (no change; assume patch)
     Checked [   0.292s] 223 checks: 223 pass, 30 skip
     Summary no semver update required
    Finished [ 108.517s] datafusion-proto
    Building datafusion-substrait v54.0.0 (current)
       Built [ 313.977s] (current)
     Parsing datafusion-substrait v54.0.0 (current)
      Parsed [   0.016s] (current)
    Building datafusion-substrait v54.0.0 (baseline)
       Built [ 315.378s] (baseline)
     Parsing datafusion-substrait v54.0.0 (baseline)
      Parsed [   0.018s] (baseline)
    Checking datafusion-substrait v54.0.0 -> v54.0.0 (no change; assume patch)
     Checked [   0.239s] 223 checks: 223 pass, 30 skip
     Summary no semver update required
    Finished [ 631.359s] datafusion-substrait

@github-actions github-actions Bot added the auto detected api change label Jun 5, 2026
@gene-bordegaray gene-bordegaray changed the title from "[WIP] Add logical range partitioning representation" to "Add logical range partitioning representation" Jun 5, 2026
@gene-bordegaray

Contributor Author

cc: @gabotechs @stuhood

@gene-bordegaray gene-bordegaray marked this pull request as ready for review June 5, 2026 18:35

@stuhood stuhood left a comment

Contributor

Thanks!

Comment thread datafusion/core/src/physical_planner.rs Outdated
Comment on lines +1268 to +1270
return not_impl_err!(
"Physical plan does not support Range repartitioning"
);

Contributor

This is a TODO, right? Should it point at a ticket?

Contributor Author

Yes, good catch. I can add the epic: #22395
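
One way to tie the placeholder error to the tracking epic, sketched with stand-in types instead of the real planner and the `not_impl_err!` macro. `PlanError`, the reduced `Partitioning` enum, and `plan_partition_count` are hypothetical names, and the message wording is only a suggestion.

```rust
/// Stand-in error type; the real planner returns a DataFusion error.
#[derive(Debug, PartialEq)]
pub enum PlanError {
    NotImplemented(String),
}

/// Reduced stand-in for the logical partitioning enum.
pub enum Partitioning {
    RoundRobinBatch(usize),
    Range,
}

/// Returns the partition count for supported schemes and a not-implemented
/// error that names the tracking epic for the unsupported one.
pub fn plan_partition_count(p: &Partitioning) -> Result<usize, PlanError> {
    match p {
        Partitioning::RoundRobinBatch(n) => Ok(*n),
        // TODO(#22395): plan Range repartitioning once physical support exists.
        Partitioning::Range => Err(PlanError::NotImplemented(
            "Physical plan does not support Range repartitioning (tracked in #22395)"
                .to_string(),
        )),
    }
}
```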

@gene-bordegaray gene-bordegaray force-pushed the gene.bordegaray/2026/06/logical-source-partitioning branch from 7f0949d to d3907a8 Compare June 6, 2026 13:12

@gabotechs gabotechs left a comment

Contributor

Just left some non-blocking comments. Pretty straightforward PR! Thanks @gene-bordegaray and @stuhood.

Comment thread datafusion/core/src/physical_planner.rs Outdated
Comment thread datafusion/expr/src/logical_plan/plan.rs Outdated
Comment thread datafusion/expr/src/logical_plan/plan.rs Outdated
Comment thread datafusion/core/src/physical_planner.rs Outdated
@gabotechs gabotechs added this pull request to the merge queue Jun 10, 2026
Merged via the queue into apache:main with commit d23321d Jun 10, 2026
38 checks passed
@gene-bordegaray

Contributor Author

Sorry for not responding while travelling, but I've addressed the comments. Thanks @gabotechs 😄

@gabotechs

Contributor

I know 👍 it's completely fine, you responded with code so that's good enough!

AdamGS pushed a commit to AdamGS/arrow-datafusion that referenced this pull request Jun 11, 2026
## Which issue does this PR close?

- Closes apache#22778.
- Related: apache#21992, apache#22395.
- Needed by apache#22657.
Labels

auto detected api change, common, core, logical-expr, physical-expr, proto, substrait
