Add example for PartitionedFile schema #22809

Merged
comphead merged 3 commits into
apache:main from
fpetkovski:partition-file-example
Jun 9, 2026

@fpetkovski commented Jun 7, 2026

Contributor

Which issue does this PR close?

Addresses the suggestion in #22360 (review) to add an example for specifying an Arrow schema for a PartitionedFile.

What changes are included in this PR?

  • Add an example in datafusion-examples/examples/data_io/partitioned_file_schema.rs.

Are these changes tested?

Tested with

cd datafusion-examples/examples
cargo run --example data_io -- partitioned_file_schema

Are there any user-facing changes?

No user-facing changes.

cc @alamb

@fpetkovski force-pushed the partition-file-example branch from eafcd76 to 7ae54d6 June 7, 2026 19:27
@fpetkovski marked this pull request as ready for review June 7, 2026 19:47
Comment thread datafusion-examples/examples/data_io/partitioned_file_schema.rs Outdated
Comment thread datafusion-examples/examples/data_io/partitioned_file_schema.rs Outdated
//! (file: query_http_csv.rs, desc: Query CSV files via HTTP)
//!
//! - `remote_catalog`
//! (file: remote_catalog.rs, desc: Interact with a remote catalog)

Member

Please add an entry for partitioned_file_schema

Contributor Author

Thanks, updated.


let table_schema = Arc::new(Schema::new(vec![
    Field::new("a", DataType::Int32, true),
    Field::new("b", DataType::Float64, true),

Member

Contributor Author

I added a comment for this field, let me know if it makes sense.

@alamb left a comment

Contributor

Thank you @fpetkovski and @martin-g

I ran it locally like this:

andrewlamb@Andrews-MacBook-Pro-3:~/Software/datafusion$ cargo run --profile=ci --example data_io -- partitioned_file_schema
    Finished `ci` profile [unoptimized] target(s) in 0.25s
     Running `target/ci/examples/data_io partitioned_file_schema`
RecordBatch { schema: Schema { fields: [Field { name: "a", data_type: Int32, nullable: true }, Field { name: "b", data_type: Float64, nullable: true }], metadata: {} }, columns: [PrimitiveArray<Int32>
[
  1,
  2,
  3,
  4,
  5,
], PrimitiveArray<Float64>
[
  null,
  null,
  null,
  null,
  null,
]], row_count: 5 }
RecordBatch { schema: Schema { fields: [Field { name: "a", data_type: Int32, nullable: true }, Field { name: "b", data_type: Float64, nullable: true }], metadata: {} }, columns: [PrimitiveArray<Int32>
[
  1,
  2,
  3,
  4,
  5,
], PrimitiveArray<Float64>
[
  null,
  null,
  null,
  null,
  null,
]], row_count: 5 }
Got schema error: ParquetError(ArrowError("Incompatible supplied Arrow schema: data type mismatch for field a: requested Int64 but found Int32"))
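
One way to read the output above: column `b` is in the supplied schema but not in the file, so it comes back as all nulls, while asking for `a` as `Int64` when the file stores `Int32` is rejected. The sketch below models that behavior with the standard library only. It is not DataFusion's or the Parquet reader's implementation; `DataType`, `ColumnSource`, and `reconcile` are hypothetical names introduced for illustration, and the error text only imitates the tail of the message shown above.

```rust
use std::collections::HashMap;

/// Simplified stand-in for an Arrow data type (hypothetical, illustration only).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum DataType {
    Int32,
    Int64,
    Float64,
}

/// How one field of the supplied schema would be produced for a given file.
#[derive(Debug, PartialEq, Eq)]
pub enum ColumnSource {
    /// The file has a column with this name and the same type.
    FromFile,
    /// The file has no such column, so every row is null.
    AllNull,
}

/// Compare a caller-supplied schema against the columns physically in a file.
/// Returns one `ColumnSource` per supplied field, or an error on the first
/// field whose type differs from the type stored in the file.
pub fn reconcile(
    supplied: &[(&str, DataType)],
    file: &[(&str, DataType)],
) -> Result<Vec<ColumnSource>, String> {
    // Index the file's columns by name for lookup.
    let file_types: HashMap<&str, DataType> = file.iter().copied().collect();
    supplied
        .iter()
        .map(|(name, requested)| match file_types.get(name) {
            Some(found) if found == requested => Ok(ColumnSource::FromFile),
            Some(found) => Err(format!(
                "data type mismatch for field {name}: requested {requested:?} but found {found:?}"
            )),
            None => Ok(ColumnSource::AllNull),
        })
        .collect()
}
```

With a file containing only `a: Int32`, calling `reconcile` with the supplied schema `a: Int32, b: Float64` yields `[FromFile, AllNull]`, and calling it with `a: Int64` yields the mismatch error.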

I took the liberty of pushing a commit to your branch to resolve a CI error: https://github.com/apache/datafusion/actions/runs/27138078890/job/80100749669?pr=22809

/// already known, it can be supplied up front so this inference step is
/// skipped, saving an I/O round trip and metadata parse per file.
///
/// The example writes a small Parquet file with a single `Int32` column `a` and

Contributor

Thank you -- this is a nice description of what is going on

@alamb added the documentation label (Improvements or additions to documentation) Jun 8, 2026
@comphead added this pull request to the merge queue Jun 9, 2026
Merged via the queue into apache:main with commit bdfdd09 Jun 9, 2026
35 checks passed