interactive: port the data model from flat [i64] to a Value ADT + Term scalar language #760

Draft
frankmcsherry wants to merge 8 commits into TimelyDataflow:master-next from frankmcsherry:value-next

@frankmcsherry

Member

Replaces the flat [i64]/FieldExpr data model with a Value ADT (Int/Tuple/Variant/List) and a Term scalar language, on top of master-next's scope-tree IR +
substrate-generic backend, and brings the explanation rewrite back online over the new model. Four reviewable commits:

  1. Value data model — Value + the tree-walking Term interpreter in ir.rs; Projection becomes {key: Term, val: Term}; Reducer gains Collect, Expr/LinearOp gain FlatMap;
    both parsers parse the full Term grammar (tuples/lists/spread, proj, inject/case, fold, builtins, named constructors). backend/vec.rs evaluates Terms over Value rows.
    Existing programs verified (reach → 4 reachable; scc → 3 cycle edges).
  2. ADT example programs — unnest (flatmap/collect round-trip), binders (fold + named case binders), adt (constructors/case), congruence + eqsat (variable-arity e-node
    congruence and the full equality-saturation fixpoint), cse_tree.
  3. Explain back online — implements the decoupled RowModel/Dataflow traits for Value/Term. The demand envelope is a flat value tuple [V | chain | q] matching the host
    lift; time_le/strip are inlined and folded.rs is retired. All sufficiency tests pass, including the --ignored sweeps (scc 100/110, the join partner-time regression at
    1000/1100, tc/reach fuzz).
  4. Explain CLI — dump_explain and ddir_vec --explain restored.

Deferred: the columnar substrate (backend::col/ddir_col) needs a Columnar story for Value.
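
For readers new to the model, a minimal sketch of what a `Value` ADT with a tree-walking `Term` evaluator can look like. This is illustrative only: the variant names `Int`/`Tuple`/`Variant`/`List` come from the description above, but the `Term` variants, the `eval` signature, and the `Option` error handling are assumptions, not the contents of ir.rs.

```rust
// Hypothetical sketch; not the actual definitions in ir.rs.
#[derive(Clone, Debug, PartialEq, Eq, PartialOrd, Ord)]
pub enum Value {
    Int(i64),
    Tuple(Vec<Value>),
    Variant(usize, Box<Value>), // constructor tag plus payload
    List(Vec<Value>),
}

// A small subset of a scalar language over `Value` rows (assumed shape).
#[derive(Clone, Debug)]
pub enum Term {
    Row,                        // the whole input row
    Const(i64),                 // integer literal
    Proj(Box<Term>, usize),     // field access on a tuple
    Tuple(Vec<Term>),           // tuple construction
    Add(Box<Term>, Box<Term>),  // integer addition
    Inject(usize, Box<Term>),   // build a variant with the given tag
    Case(Box<Term>, Vec<Term>), // arm `tag` is evaluated with the payload as its row
}

/// Tree-walking evaluation; `None` signals a shape mismatch or overflow.
pub fn eval(term: &Term, row: &Value) -> Option<Value> {
    match term {
        Term::Row => Some(row.clone()),
        Term::Const(n) => Some(Value::Int(*n)),
        Term::Proj(t, i) => match eval(t, row)? {
            Value::Tuple(fields) => fields.get(*i).cloned(),
            _ => None,
        },
        Term::Tuple(ts) => ts
            .iter()
            .map(|t| eval(t, row))
            .collect::<Option<Vec<Value>>>()
            .map(Value::Tuple),
        Term::Add(a, b) => match (eval(a, row)?, eval(b, row)?) {
            (Value::Int(x), Value::Int(y)) => x.checked_add(y).map(Value::Int),
            _ => None,
        },
        Term::Inject(tag, t) => Some(Value::Variant(*tag, Box::new(eval(t, row)?))),
        Term::Case(scrutinee, arms) => match eval(scrutinee, row)? {
            Value::Variant(tag, payload) => eval(arms.get(tag)?, &payload),
            _ => None,
        },
    }
}
```

Under this shape a `Projection {key: Term, val: Term}` is just two `eval` calls per row, which is the property that lets one interpreter serve map, filter, join, and reduce.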

🤖 Generated with Claude Code

frankmcsherry and others added 8 commits June 15, 2026 08:00
Replace the flat [i64]/FieldExpr data model with the Value ADT (Int/Tuple/
Variant/List) and the Term scalar language, on master-next's scope-tree IR +
substrate-generic backend.

- ir.rs: Value + the tree-walking Term interpreter (eval); LinearOp gains
  FlatMap, Filter/EnterAt now carry Term. Drops RowLike/FieldExpr eval and the
  arity transfer functions (those were explain-only).
- parse: Projection is now {key: Term, val: Term}; Reducer gains Collect; Expr
  gains FlatMap. Both front-ends parse the full Term grammar (tuples/lists/
  spread, proj, inject/case, fold, builtins) plus named constructors + pattern
  `case` (pipe), reconciled with master-next's import/export syntax.
- backend/vec.rs: Row = Value; render_linear/join/reduce evaluate Terms;
  Collect NEST reducer. Value derives serde (ExchangeData bound).
- gen_row produces (Tuple[Int;arity], unit); ddir_vec gains EDGES_FILE input.

Deferred to later stages: explain + its folded helper (need RowModel for
Value/Term), and the col substrate (needs a Columnar story for Value).

Verified: lib tests pass; reach.ddp (root 0, chain 0-1-2-3) -> 4 reachable;
scc.ddp (cycle 0-1-2 + trivial 3-4) -> 3 cycle edges.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Port the Value/ADT example programs onto the scope-tree base (old `result …;`
-> `export "result" = …;`), exercising the new scalar language end-to-end:

- unnest.ddp  — flatmap (UNNEST) / collect (NEST) list round-trip
- binders.ddp — fold with named pattern-`case` binders
- adt.ddp     — named constructors + pattern `case`
- congruence.ddp / eqsat.ddp — variable-arity e-node congruence and the full
  equality-saturation fixpoint
- cse_tree.ddp — common-subexpression sharing over expression trees

Verified on master-next: eqsat reproduces both scenarios (pure congruence
5~1 then mul(5,2)~mul(1,2); and the a~b cascade collapsing all three muls);
unnest round-trips position-ordered; adt yields the same 98/102 buckets.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
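
The flatmap/collect round-trip that unnest.ddp exercises can be pictured with plain vectors. The sketch below is an assumption-laden stand-in, not the backend code: `unnest` and `collect` are hypothetical free functions, rows are `(key, value)` pairs, and position is carried explicitly so that the NEST side can restore list order.

```rust
use std::collections::BTreeMap;

// Hypothetical, trimmed-down value type for this illustration only.
#[derive(Clone, Debug, PartialEq, Eq, PartialOrd, Ord)]
pub enum Value {
    Int(i64),
    List(Vec<Value>),
}

/// UNNEST: explode each (key, List) row into (key, position, element) rows.
/// Rows whose value is not a list produce nothing.
pub fn unnest(rows: &[(Value, Value)]) -> Vec<(Value, usize, Value)> {
    let mut out = Vec::new();
    for (key, val) in rows {
        if let Value::List(items) = val {
            for (pos, item) in items.iter().enumerate() {
                out.push((key.clone(), pos, item.clone()));
            }
        }
    }
    out
}

/// NEST: group by key, order each group by position, and rebuild the list.
pub fn collect(rows: &[(Value, usize, Value)]) -> Vec<(Value, Value)> {
    let mut groups: BTreeMap<Value, Vec<(usize, Value)>> = BTreeMap::new();
    for (key, pos, item) in rows {
        groups.entry(key.clone()).or_default().push((*pos, item.clone()));
    }
    groups
        .into_iter()
        .map(|(key, mut items)| {
            items.sort(); // position-ordered, independent of arrival order
            (key, Value::List(items.into_iter().map(|(_, v)| v).collect()))
        })
        .collect()
}
```

One caveat of this simplified version: a row with an empty list emits no exploded rows, so it does not survive the round trip.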
Re-enable the explanation rewrite on the Value data model by implementing the
decoupled `RowModel`/`Dataflow` traits for `Value`/`Term`.

- explain/mod.rs: a `Val` RowModel whose demand envelope is a flat value tuple
  `[V | chain (innermost-first) | q]` — matching the host lift's `append_iter`.
  Each rule builds `Term`-based projections/predicates over field indices
  (replacing the flat `[i64]` `FieldExpr` column ranges); `time_le`/`strip` are
  inlined (the `folded` algebra), and a `Spread`-bounding `expand_value_fields`
  keeps bare-row refs from pulling in chain coords. `Sb`'s `Dataflow` predicate
  is now `Term`. The clone/resolve/shape machinery is unchanged; the shape pass
  is `Term`-arity.
- Count now yields a one-field tuple `(count)`, keeping "a value is a tuple" so
  `$1[0]` and the explain envelope hold uniformly.
- decouple.rs: drop the flat executable contract; the `nested_contract`
  model-agnostic proof remains the runnable spec. `folded.rs` retired.
- tests/explain.rs restored, ported to Value rows + the flat query envelope.

Verified: all 8 sufficiency tests pass, plus the heavy --ignored sweeps
(scc 100/110, the join partner-time regression at 1000/1100, tc/reach fuzz).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
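
A rough picture of the flat envelope `[V | chain | q]` described in the commit above: value fields first, then the chain coordinates innermost-first, then the query id. The helper names `envelope`, `value_fields`, and `query_id` are hypothetical, and integer chain coordinates and query ids are an assumption; the real construction goes through `Term` projections rather than direct slicing.

```rust
// Hypothetical, trimmed-down value type for this illustration only.
#[derive(Clone, Debug, PartialEq, Eq)]
pub enum Value {
    Int(i64),
    Tuple(Vec<Value>),
}

/// Build `[V | chain (innermost-first) | q]` as one flat tuple.
pub fn envelope(value: &[Value], chain_innermost_first: &[i64], q: i64) -> Value {
    let mut fields = value.to_vec();
    fields.extend(chain_innermost_first.iter().map(|t| Value::Int(*t)));
    fields.push(Value::Int(q));
    Value::Tuple(fields)
}

/// Bounded view of the value part: a bare-row reference should see only the
/// first `arity` fields, never the chain coordinates or the query id.
pub fn value_fields(env: &Value, arity: usize) -> Option<&[Value]> {
    match env {
        Value::Tuple(fields) if fields.len() >= arity => Some(&fields[..arity]),
        _ => None,
    }
}

/// The query id is the last field of the envelope.
pub fn query_id(env: &Value) -> Option<&Value> {
    match env {
        Value::Tuple(fields) => fields.last(),
        _ => None,
    }
}
```

The bounded accessor is the point of the sketch: because the envelope is flat, anything that spreads "the row" has to know the value arity, or it drags timestamps and query ids along with the data.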
- dump_explain: re-enabled (prints the scope-tree IR before/after the rewrite);
  it has no data-model dependencies and works as-is now that explain is online.
- ddir_vec --explain / --query=K:V[,q] / --debug-demand: re-enabled. The query
  input is seeded with the flat demand envelope `(key ; val ++ q)`; demand
  collections can be tapped with --debug-demand.

The CLI assigns every source the uniform shape (arity, 0), so --explain is for
single-input-arity programs (e.g. scc); mixed-arity programs (reach's arity-1
roots) need explicit per-input shapes, as the integration tests use. Verified:
scc.ddp --explain demands the cycle edges that produced the queried output.

The columnar substrate (ddir_col / backend::col) stays deferred — it needs a
Columnar story for Value.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…(stage 5)

Restore a unit-level by-example spec for the reverse rules — but over the model
the crate actually evaluates. The removed `[i64]` `contract` tested `Flat` via
`eval_fields`/`eval_condition`; this `value_contract` runs the same six specs on
real `Value` rows in `Val`'s flat envelope `[V | chain | q]`, through an
in-memory `Value` dataflow against `explain::Val`.

`nested_contract` (a different, nested layout over a toy model) stays as the
proof that the *rules* are model-agnostic; `value_contract` pins the *model*
the backend runs, closing the unit-coverage gap the deletion opened.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Consolidate the front-end language docs into one reference, on the `pipe`
module (the .ddp front-end): the collection language (sources, pipe operators
incl. flatmap/collect, statements, `con` decls) and the scalar `Term` language
(row/field access, arithmetic, products/lists/sums, named constructors,
pattern `case`, `fold` with `^0`/`^1`, binders, `if`). Doc-only; previously
this had to be teased out of the `Term` variants, `build_builtin`, and example
programs. `Term`'s doc now points here for the concrete syntax.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…rses

Capture the design for restoring the explainability invariant (writable =>
explainable) once the data model is Value/ADTs, and for unifying the per-op
reverse rules. Core: a universal backstop (witness the inputs, key by output,
join on output) makes every op explainable with no op-specific logic; an
optional inverse — factored as (PRESERVED_out, PRESERVED_in, RESIDUAL, REFORM)
with Total/None as the endpoints and a precision dial between — is a per-op
optimization. Maps today's lossy_*/keyed/join/folded onto the interface,
notes the opaque-envelope interaction (which also deletes the shape pass), and
lays out a phased plan. Doc-only; no behavior change.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…tmap

Per inverse-design.md, a contract-style test proving the universal backstop
reverses `flatmap` — the op the live rewrite currently panics on — using only
the existing `Dataflow` primitives over real `Value` rows (with a `List`): a
forward-built `(output -> input)` pair table, one join on the output, and a
`REFORM` projection that recovers the whole input (the `None` endpoint). It
demands one exploded output and gets back exactly the input row whose list
carried that element, with the query id.

Additive and isolated to the test harness — the live rewrite is untouched. This
turns "the regression is closable" into a running demonstration; the phased plan
(opaque envelope, refactor lossy_* through the interface, wire FlatMap into the
real walk) follows.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
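
The backstop described in the commit above (pair table, join on output, recover the whole input) can be sketched over in-memory vectors. This is not the test in the PR: `flatmap` and `explain` are hypothetical stand-ins, the row shape `Tuple[key, List]` is an assumption, and the join is a `BTreeMap` lookup rather than a `Dataflow` primitive.

```rust
use std::collections::BTreeMap;

// Hypothetical, trimmed-down value type for this illustration only.
#[derive(Clone, Debug, PartialEq, Eq, PartialOrd, Ord)]
pub enum Value {
    Int(i64),
    Tuple(Vec<Value>),
    List(Vec<Value>),
}

/// Forward op: explode `Tuple[key, List[..]]` into one `Tuple[key, elem]` per element.
pub fn flatmap(row: &Value) -> Vec<Value> {
    match row {
        Value::Tuple(fields) => match fields.as_slice() {
            [key, Value::List(items)] => items
                .iter()
                .map(|item| Value::Tuple(vec![key.clone(), item.clone()]))
                .collect(),
            _ => Vec::new(),
        },
        _ => Vec::new(),
    }
}

/// Universal backstop: run the op forward to build an (output -> input) pair
/// table, join the demanded outputs against it, and return the whole input
/// row together with the query id that asked for it.
pub fn explain(inputs: &[Value], demands: &[(Value, i64)]) -> Vec<(Value, i64)> {
    let mut pairs: BTreeMap<Value, Vec<&Value>> = BTreeMap::new();
    for input in inputs {
        for output in flatmap(input) {
            pairs.entry(output).or_default().push(input);
        }
    }
    let mut result = Vec::new();
    for (output, q) in demands {
        if let Some(sources) = pairs.get(output) {
            for input in sources {
                result.push(((*input).clone(), *q));
            }
        }
    }
    result
}
```

Nothing here inspects how `flatmap` works, which is what makes the approach op-agnostic; the cost is materializing the pair table, which an op-specific inverse would avoid.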