Add example for PartitionedFile schema by fpetkovski · Pull Request #22809 · apache/datafusion

fpetkovski · 2026-06-07T19:23:45Z

Which issue does this PR close?

Addresses the suggestion in #22360 (review) to add an example for specifying an Arrow schema for a PartitionedFile.

What changes are included in this PR?

Add an example in datafusion-examples/examples/data_io/partitioned_file_schema.rs.

Are these changes tested?

Tested with

cd datafusion-examples/examples
cargo run --example data_io -- partitioned_file_schema

Are there any user-facing changes?

No user-facing changes.

cc @alamb

martin-g · 2026-06-08T11:25:11Z

 //!   (file: query_http_csv.rs, desc: Query CSV files via HTTP)
 //!
 //! - `remote_catalog`
 //!   (file: remote_catalog.rs, desc: Interact with a remote catalog)


Please add an entry for partitioned_file_schema

Thanks, updated.

martin-g · 2026-06-08T11:30:38Z

+
+    let table_schema = Arc::new(Schema::new(vec![
+        Field::new("a", DataType::Int32, true),
+        Field::new("b", DataType::Float64, true),


Please add a comment explaining the purpose of field b. It is not mentioned at https://github.com/apache/datafusion/pull/22809/changes#diff-5097924e81226127006feb2aab9ff70726bf3ad7d6bb5d6d73a7a53f0412636bR45

I added a comment for this field, let me know if it makes sense.
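For readers following the thread, here is a rough sketch of what field b appears to demonstrate: a column that is declared in the table schema but absent from the file comes back as all nulls. This is a std-only toy model, not DataFusion or Arrow code; `Column`, `DataType`, `null_column`, and `project` are hypothetical names made up for illustration.

```rust
// Toy stand-ins for Arrow types; these are NOT the arrow-rs definitions.
#[derive(Debug, Clone, PartialEq)]
enum Column {
    Int32(Vec<Option<i32>>),
    Float64(Vec<Option<f64>>),
}

#[derive(Debug, Clone, Copy, PartialEq)]
enum DataType {
    Int32,
    Float64,
}

/// Build an all-null column of the requested type and length.
fn null_column(data_type: DataType, num_rows: usize) -> Column {
    match data_type {
        DataType::Int32 => Column::Int32(vec![None; num_rows]),
        DataType::Float64 => Column::Float64(vec![None; num_rows]),
    }
}

/// Project the file's columns onto the table schema (hypothetical helper).
/// Columns that the file does not contain are filled with nulls.
fn project(
    table_schema: &[(&str, DataType)],
    file_columns: &[(&str, Column)],
    num_rows: usize,
) -> Vec<Column> {
    table_schema
        .iter()
        .map(|(name, data_type)| {
            file_columns
                .iter()
                .find(|(file_name, _)| file_name == name)
                .map(|(_, column)| column.clone())
                .unwrap_or_else(|| null_column(*data_type, num_rows))
        })
        .collect()
}

fn main() {
    // The table schema declares `a` and `b`; the file only stores `a`.
    let table_schema = [("a", DataType::Int32), ("b", DataType::Float64)];
    let file_columns = [(
        "a",
        Column::Int32(vec![Some(1), Some(2), Some(3), Some(4), Some(5)]),
    )];
    let batch = project(&table_schema, &file_columns, 5);
    println!("{batch:?}");
}
```

The model ignores type checking between the table schema and the file; it only shows the null-filling of the missing column.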

alamb

Thank you @fpetkovski and @martin-g

I ran it locally like this:

andrewlamb@Andrews-MacBook-Pro-3:~/Software/datafusion$ cargo run --profile=ci --example data_io -- partitioned_file_schema
    Finished `ci` profile [unoptimized] target(s) in 0.25s
     Running `target/ci/examples/data_io partitioned_file_schema`
RecordBatch { schema: Schema { fields: [Field { name: "a", data_type: Int32, nullable: true }, Field { name: "b", data_type: Float64, nullable: true }], metadata: {} }, columns: [PrimitiveArray<Int32>
[
  1,
  2,
  3,
  4,
  5,
], PrimitiveArray<Float64>
[
  null,
  null,
  null,
  null,
  null,
]], row_count: 5 }
RecordBatch { schema: Schema { fields: [Field { name: "a", data_type: Int32, nullable: true }, Field { name: "b", data_type: Float64, nullable: true }], metadata: {} }, columns: [PrimitiveArray<Int32>
[
  1,
  2,
  3,
  4,
  5,
], PrimitiveArray<Float64>
[
  null,
  null,
  null,
  null,
  null,
]], row_count: 5 }
Got schema error: ParquetError(ArrowError("Incompatible supplied Arrow schema: data type mismatch for field a: requested Int64 but found Int32"))

I took the liberty of pushing a commit to your branch to resolve a CI error: https://github.com/apache/datafusion/actions/runs/27138078890/job/80100749669?pr=22809
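The last line of the output above is a schema error: the supplied schema asked for `Int64` on field `a`, while the file stores `Int32`. A minimal sketch of that kind of compatibility check follows, assuming (as the first two batches suggest) that a supplied field missing from the file is accepted. It is a std-only model; `DataType` and `check_supplied_schema` are hypothetical names and not part of the parquet or DataFusion APIs.

```rust
// Toy data types; NOT the arrow-rs `DataType` enum.
#[derive(Debug, Clone, Copy, PartialEq)]
enum DataType {
    Int32,
    Int64,
    Float64,
}

/// Compare a supplied schema against the file's schema (hypothetical helper).
/// Fields missing from the file are accepted; fields present in both must
/// have the same data type.
fn check_supplied_schema(
    supplied: &[(&str, DataType)],
    file: &[(&str, DataType)],
) -> Result<(), String> {
    for (name, requested) in supplied {
        let found = file.iter().find(|(file_name, _)| file_name == name);
        if let Some((_, found)) = found {
            if found != requested {
                return Err(format!(
                    "data type mismatch for field {name}: requested {requested:?} but found {found:?}"
                ));
            }
        }
    }
    Ok(())
}

fn main() {
    let file = [("a", DataType::Int32)];

    // Matching type for `a`, extra field `b`: accepted in this model.
    let ok = check_supplied_schema(
        &[("a", DataType::Int32), ("b", DataType::Float64)],
        &file,
    );
    println!("{ok:?}");

    // Wrong type for `a`: rejected.
    let err = check_supplied_schema(&[("a", DataType::Int64)], &file);
    println!("{err:?}");
}
```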

alamb · 2026-06-08T15:11:33Z

+/// already known, it can be supplied up front so this inference step is
+/// skipped, saving an I/O round trip and metadata parse per file.
+///
+/// The example writes a small Parquet file with a single `Int32` column `a` and


Thank you -- this is a nice description of what is going on
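As a rough illustration of the control flow the doc comment describes, the sketch below counts how often a file's metadata is consulted when a schema is or is not supplied up front. It is a std-only model under stated assumptions: `Schema`, `FakeFile`, and `resolve_schema` are hypothetical names, and the real reader may do more work than this.

```rust
use std::cell::Cell;

// Toy schema: just a list of column names. NOT the arrow-rs `Schema`.
#[derive(Debug, Clone, PartialEq)]
struct Schema(Vec<String>);

/// Hypothetical stand-in for a file whose footer must be fetched and parsed
/// before its schema is known.
struct FakeFile {
    footer_schema: Schema,
    metadata_reads: Cell<usize>,
}

impl FakeFile {
    /// Simulate schema inference: one metadata read per call.
    fn infer_schema(&self) -> Schema {
        self.metadata_reads.set(self.metadata_reads.get() + 1);
        self.footer_schema.clone()
    }
}

/// Use the supplied schema when there is one; otherwise fall back to inference.
fn resolve_schema(file: &FakeFile, supplied: Option<Schema>) -> Schema {
    supplied.unwrap_or_else(|| file.infer_schema())
}

fn main() {
    let file = FakeFile {
        footer_schema: Schema(vec!["a".to_string()]),
        metadata_reads: Cell::new(0),
    };

    // No schema supplied: the model performs one metadata read.
    let inferred = resolve_schema(&file, None);
    println!("{inferred:?}, reads = {}", file.metadata_reads.get());

    // Schema supplied: the inference step is skipped, so the count is unchanged.
    let supplied = Schema(vec!["a".to_string(), "b".to_string()]);
    let resolved = resolve_schema(&file, Some(supplied));
    println!("{resolved:?}, reads = {}", file.metadata_reads.get());
}
```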

Add example for PartitionedFile schema

7ae54d6

fpetkovski force-pushed the partition-file-example branch from eafcd76 to 7ae54d6 Compare June 7, 2026 19:27

fpetkovski marked this pull request as ready for review June 7, 2026 19:47

martin-g reviewed Jun 8, 2026


fpetkovski and others added 2 commits June 8, 2026 14:36

Address code review comments

7cf92a7

doc: update list of examples

7b5a0c5

alamb approved these changes Jun 8, 2026


alamb added the documentation label Jun 8, 2026

comphead added this pull request to the merge queue Jun 9, 2026

Merged via the queue into apache:main with commit bdfdd09 Jun 9, 2026
35 checks passed

comphead merged 3 commits into apache:main from fpetkovski:partition-file-example


4 participants
