[rust-compiler] Tolerate unknown statement kinds at the AST boundary #36705

Closed
poteto wants to merge 1 commit into lauren/ts-unknown-01-fixtures from lauren/ts-unknown-02-tolerance

@poteto
Collaborator

@poteto poteto commented Jun 6, 2026

Summary

Babel can emit statement kinds the typed AST does not model; #36704 pins three TS module-interop forms. Deserialization previously failed the whole file on the first such node, while the TS reference compiles the file and leaves the statement alone.

  • Statement gains a final #[serde(untagged)] Unknown(UnknownStatement) variant carrying the complete raw Babel node. Deserialization is hand-written and dispatches modeled type tags through a KnownStatement helper enum, so a malformed modeled node still errors with its precise field-level message instead of degrading to Unknown; only genuinely unmodeled tags take the catch-all. A known_statements! macro is the single source for the dispatch enum, its From mapping, and the tag list, so the three cannot drift from each other.
  • Top-level unknown statements are preserved verbatim through re-serialization. Function-body occurrences record the standard UnsupportedSyntax bailout with an UnsupportedNode instruction carrying the raw node. The TS reference reaches its equivalent default case only via assertExhaustive, which Babel's closed types make unreachable; in Rust unmodeled syntax is reachable by construction, so it degrades per the fault-tolerance model instead of crashing.
  • Program-level analyses handle Unknown explicitly: the gating reference-before-declaration scan walks the raw node for identifier references (an export = X does reference X), and the prefilter and return-analysis arms are deliberately inert.
  • The SWC/OXC reverse converters get loud tripwire arms (a deliberate throw in generated code) instead of silently dropping unknown nodes. The SWC forward path is fixed in the next PR of this stack.
  • rust-port-0001-babel-ast.md's no-catch-all policy is amended to document Statement as the single deliberate exception.
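The dispatch rule in the first bullet can be illustrated with a small, dependency-free sketch. Everything here is an assumption made for illustration: `Raw` stands in for `serde_json::Value`, the two modeled variants and the `parse_statement`, `parse_known`, and `obj` helpers are hypothetical, and the real code goes through serde and a KnownStatement enum. The point is only the ordering: read the `type` tag first, route modeled tags to the typed parser so their errors surface, and keep the whole raw node for any other tag.

```rust
use std::collections::BTreeMap;

/// Simplified stand-in for `serde_json::Value` (assumption for this sketch).
#[derive(Debug, Clone, PartialEq)]
enum Raw {
    Str(String),
    Object(BTreeMap<String, Raw>),
}

/// Hypothetical, heavily reduced statement enum.
#[derive(Debug, PartialEq)]
enum Statement {
    Empty,
    Expression { expression: String },
    /// Catch-all that keeps the complete raw node.
    Unknown(Raw),
}

/// Tags the typed AST models in this sketch.
const KNOWN_TAGS: &[&str] = &["EmptyStatement", "ExpressionStatement"];

/// Reads the tag first, then either parses strictly or preserves the node.
fn parse_statement(raw: Raw) -> Result<Statement, String> {
    let tag = match &raw {
        Raw::Object(fields) => match fields.get("type") {
            Some(Raw::Str(tag)) => tag.clone(),
            Some(_) => return Err("`type` must be a string".to_string()),
            None => return Err("missing `type`".to_string()),
        },
        Raw::Str(_) => return Err("statement must be an object".to_string()),
    };
    if !KNOWN_TAGS.contains(&tag.as_str()) {
        // Genuinely unmodeled tag: take the catch-all with the raw node intact.
        return Ok(Statement::Unknown(raw));
    }
    // Modeled tag: a malformed node is an error, never a silent Unknown.
    parse_known(&tag, &raw)
}

/// Strict parser for modeled tags; reports the offending field by name.
fn parse_known(tag: &str, raw: &Raw) -> Result<Statement, String> {
    let Raw::Object(fields) = raw else {
        return Err("statement must be an object".to_string());
    };
    match tag {
        "EmptyStatement" => Ok(Statement::Empty),
        "ExpressionStatement" => match fields.get("expression") {
            Some(Raw::Str(e)) => Ok(Statement::Expression { expression: e.clone() }),
            _ => Err("ExpressionStatement: missing or invalid field `expression`".to_string()),
        },
        other => Err(format!("tag `{other}` is listed as known but has no parser")),
    }
}

/// Test helper: builds an object node from string key/value pairs.
fn obj(pairs: &[(&str, &str)]) -> Raw {
    Raw::Object(
        pairs
            .iter()
            .map(|(k, v)| (k.to_string(), Raw::Str(v.to_string())))
            .collect(),
    )
}
```

Under these assumptions, a node tagged `TSImportEqualsDeclaration` comes back as `Statement::Unknown` holding the original node, while an `ExpressionStatement` without its `expression` field yields an error that names the field.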

Perf note: deserialization now materializes a serde_json::Value per statement before typed parsing. The marginal cost is a move-based tree rebuild at a one-time boundary; the previous derive also buffered every node through serde's internal Content to read the tag, so the delta is allocation shape, not asymptotics.

Test plan

  • cargo test --workspace green; unit tests cover program-level and nested-in-function round trips with reserialize equality, known-tag non-shadowing, precise malformed-node errors, missing/non-string/non-object type, scoped raw mutation, and a lowering integration test pinning the function-body bailout shape
  • All three fixtures pass scoped Babel e2e including events parity (import lib = require("shared-runtime") is preserved verbatim next to real memoization)
  • Full Babel e2e 1791/1798; the only failures are the 7 documented pre-existing ones
  • test-babel-ast.sh (round-trip and scope-resolution tests) fully green on this stack

Stack

| PR | Slice | Base |
| --- | --- | --- |
| #36517, #36518, #36522 | SWC parity stack (open) | pr-36173 |
| #36704 | TDD fixtures (red) | #36522's branch |
| #36705 | Unknown-statement tolerance (Babel green) | #36704's branch |
| #36706 | SWC preservation + renames (SWC green) | #36705's branch |

Merge order: bottom up.

@meta-cla meta-cla Bot added the CLA Signed label Jun 6, 2026
@github-actions github-actions Bot added the React Core Team label (Opened by a member of the React Core Team) Jun 6, 2026
@poteto poteto force-pushed the lauren/ts-unknown-01-fixtures branch from fb7d3cc to 1e41304 Compare June 6, 2026 22:37
@poteto poteto force-pushed the lauren/ts-unknown-02-tolerance branch from 51f8c00 to 0efc429 Compare June 6, 2026 22:37
Babel can emit statement kinds the typed AST does not model (the
todo-ts-* fixtures pin three TS module-interop forms). Deserialization
previously failed the whole file on the first such node, while the TS
reference compiles the file and leaves the statement alone.

Statement gains a final #[serde(untagged)] Unknown(UnknownStatement)
variant carrying the complete raw node. Deserialization is hand-written
and dispatches modeled `type` tags through a KnownStatement helper so a
malformed modeled node still errors with its precise field-level
message instead of degrading to Unknown; only genuinely unmodeled tags
take the catch-all. The TS reference reaches its equivalent default
case only via assertExhaustive, which Babel's closed types make
unreachable, so it would crash if it ever got there; here unmodeled
syntax is reachable by construction and degrades
instead: top-level statements are preserved verbatim through
re-serialization, and function-body occurrences record the standard
UnsupportedSyntax bailout with an UnsupportedNode instruction carrying
the raw node. A known_statements! macro is the single source for the
dispatch enum, its From mapping, and the tag list, so those three
cannot drift; a variant added to Statement but not the macro is the one
remaining silent gap, documented on the variant.
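As a rough illustration of the single-source idea, the sketch below shows how one macro invocation could generate a dispatch enum, its From mapping, and the tag list together. It is not the PR's macro: the input syntax, the unit-only variants, the `from_tag` helper, and the `classify` function are assumptions chosen to keep the example short.

```rust
/// Hypothetical, reduced public enum. In this sketch `Unknown` only keeps the tag.
#[derive(Debug, PartialEq)]
enum Statement {
    Empty,
    Debugger,
    Unknown(String),
}

/// One list of `Variant => "Tag"` pairs generates all three artifacts,
/// so adding a pair cannot update one of them and miss the others.
macro_rules! known_statements {
    ($($variant:ident => $tag:literal),+ $(,)?) => {
        /// Dispatch enum with one variant per modeled tag.
        #[derive(Debug, Clone, Copy, PartialEq)]
        enum KnownStatement { $($variant),+ }

        /// Every modeled tag, in declaration order.
        const KNOWN_TAGS: &[&str] = &[$($tag),+];

        impl KnownStatement {
            /// Maps a `type` tag to its dispatch variant, if modeled.
            fn from_tag(tag: &str) -> Option<Self> {
                match tag {
                    $($tag => Some(KnownStatement::$variant),)+
                    _ => None,
                }
            }
        }

        impl From<KnownStatement> for Statement {
            fn from(known: KnownStatement) -> Self {
                match known {
                    $(KnownStatement::$variant => Statement::$variant),+
                }
            }
        }
    };
}

known_statements! {
    Empty => "EmptyStatement",
    Debugger => "DebuggerStatement",
}

/// Routes modeled tags through the generated enum, everything else to Unknown.
fn classify(tag: &str) -> Statement {
    match KnownStatement::from_tag(tag) {
        Some(known) => known.into(),
        None => Statement::Unknown(tag.to_string()),
    }
}
```

The gap the commit message calls out is visible here too: a variant added to `Statement` but not to the macro invocation compiles without complaint, and its tag falls through to `Unknown`.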

UnknownStatement caches BaseNode for position helpers; the scoped
with_raw_mut mutator refreshes the cache and rejects mutations that
strip `type`, so the two views cannot desync. Program-level analyses
treat Unknown explicitly: the gating reference-before-declaration scan
walks the raw node for identifier references (an `export = X` does
reference X), and the prefilter and return-analysis arms are
deliberately inert. SWC/OXC reverse converters emit a deliberate
runtime tripwire (a throw in generated code) for the arms that are
unreachable until the SWC forward conversion stops rewriting these
statements to EmptyStatement in the next slice.
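The cache-plus-scoped-mutator arrangement might look roughly like the following. This is a sketch under stated assumptions: the raw node is reduced to a flat map, `BaseNode` holds only `start` and `end`, and rejecting a bad mutation by returning an error and discarding the change is a guess, since the commit message says only that such mutations are rejected. Only the name `with_raw_mut` comes from the text above.

```rust
use std::collections::BTreeMap;

/// Simplified stand-in for a JSON value (assumption for this sketch).
#[derive(Debug, Clone, PartialEq)]
enum Raw {
    Num(u32),
    Str(String),
}

/// Hypothetical reduced position cache.
#[derive(Debug, Clone, PartialEq, Default)]
struct BaseNode {
    start: Option<u32>,
    end: Option<u32>,
}

#[derive(Debug, Clone)]
struct UnknownStatement {
    raw: BTreeMap<String, Raw>,
    base: BaseNode,
}

impl UnknownStatement {
    fn new(raw: BTreeMap<String, Raw>) -> Result<Self, String> {
        let base = Self::read_base(&raw)?;
        Ok(Self { raw, base })
    }

    /// Validates `type` and derives the cached positions from the raw node.
    fn read_base(raw: &BTreeMap<String, Raw>) -> Result<BaseNode, String> {
        match raw.get("type") {
            Some(Raw::Str(_)) => {}
            _ => return Err("unknown statement must keep a string `type`".to_string()),
        }
        let num = |key: &str| match raw.get(key) {
            Some(Raw::Num(n)) => Some(*n),
            _ => None,
        };
        Ok(BaseNode { start: num("start"), end: num("end") })
    }

    fn base(&self) -> &BaseNode {
        &self.base
    }

    fn raw(&self) -> &BTreeMap<String, Raw> {
        &self.raw
    }

    /// The only way to mutate the raw node: edit a copy, re-derive the cache,
    /// and commit both together, so the two views never disagree.
    fn with_raw_mut(
        &mut self,
        edit: impl FnOnce(&mut BTreeMap<String, Raw>),
    ) -> Result<(), String> {
        let mut candidate = self.raw.clone();
        edit(&mut candidate);
        let base = Self::read_base(&candidate)?; // fails if `type` was stripped
        self.raw = candidate;
        self.base = base;
        Ok(())
    }
}
```

Because callers never get a long-lived mutable reference to the raw node, every write passes through the same validation and cache refresh. Editing a clone is the simplest way to get rollback in a sketch; a real implementation could avoid the copy.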

Deserialization now materializes a serde_json::Value per statement
before typed parsing. The cost is one move-based tree rebuild per
nesting level at a one-time boundary; the previous derive also buffered
every node through serde's internal Content to read the tag, so the
delta is allocation shape, not asymptotics.

Verified: ast unit tests including malformed/edge cases, a lowering
integration test pinning the function-body bailout, round_trip green on
the three fixtures, scoped and full Babel e2e green on all three with
events parity, cargo test --workspace green. The scope-resolution half
of test-babel-ast.sh is green on this stack's base and remains red
corpus-wide on the pr-36173 tip, whose node-ID migration removed
position-based keying while babel-ast-to-json.mjs still emits
offset-based scope JSON; that generator gap needs its own fix before
this stack rebases onto the tip. rust-port-0001-babel-ast.md's
no-catch-all policy is amended to document Statement as the deliberate
exception.
@poteto poteto force-pushed the lauren/ts-unknown-01-fixtures branch from 1e41304 to e4141db Compare June 6, 2026 22:47
@poteto poteto force-pushed the lauren/ts-unknown-02-tolerance branch from 0efc429 to 64199c8 Compare June 6, 2026 22:47
@poteto
Collaborator Author

poteto commented Jun 7, 2026

Closing: ported directly to the umbrella PR branch (rust-research, #36173) along with the rest of this stack.

@poteto poteto closed this Jun 7, 2026