Generative testing: grammar-derived inputs + by-construction consistency (#25) by johnsoncodehk · Pull Request #28 · johnsoncodehk/monogram

johnsoncodehk · 2026-06-08T13:37:41Z

Closes #25.

The source IS a grammar, so the same combinator object the parser / highlighter / tree-sitter derive from is also a generator. Walking its rule IR emits guaranteed-legal inputs — replacing "hope the corpus contains the shape" (the blind spot that hid #23/#24 from a monogramWrong=0 metric) with systematic, deterministic, grammar-derived coverage. This PR implements #25 plus a follow-on roadmap that pushes the generator to deterministic precision and operationalizes its findings.

#25 core — the method

test/grammar-gen.ts: a generic, language-agnostic walker over the shared RuleExpr IR. Every per-language fact (indent tokens, flow brackets, compact indicators, markup delimiters) is read from grammar.indent / .markup, never hardcoded.
test/generative.ts: two by-construction judges, no external oracle: round-trip (every derivation parses as its rule) and scope ≡ role (the flat highlighter's scope at each parsed token must match the token's by-construction role; floor-blind, so a `-` mis-painted as string is caught, which is the blind spot the role-graded scope-gap metric had). It re-surfaces #23/#24 by construction (verified by reverting each fix), with mustCover asserting the corpus keeps containing both shape-classes.
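To make the "grammar as generator" idea and the round-trip judge concrete, here is a minimal sketch over a toy IR. `MiniExpr`, `MiniGrammar`, `derive`, `ends`, and `parsesAs` are hypothetical names invented for this illustration; the real RuleExpr IR and the walker in test/grammar-gen.ts are richer and are not reproduced here.

```typescript
// Hypothetical miniature IR; Monogram's real RuleExpr is richer than this.
type MiniExpr =
  | { kind: "lit"; text: string }
  | { kind: "seq"; items: MiniExpr[] }
  | { kind: "alt"; options: MiniExpr[] }
  | { kind: "ref"; rule: string };

type MiniGrammar = Record<string, MiniExpr>;

// Enumerate every string derivable from `expr`, following at most `depth` rule references.
function derive(g: MiniGrammar, expr: MiniExpr, depth: number): string[] {
  switch (expr.kind) {
    case "lit":
      return [expr.text];
    case "alt":
      return expr.options.flatMap((option) => derive(g, option, depth));
    case "ref": {
      const target = g[expr.rule];
      return target && depth > 0 ? derive(g, target, depth - 1) : [];
    }
    case "seq":
      return expr.items.reduce<string[]>((prefixes, item) => {
        const parts = derive(g, item, depth);
        return prefixes.flatMap((prefix) => parts.map((part) => prefix + part));
      }, [""]);
  }
}

// Recognizer over the same IR: every offset where `expr` can end when started at `pos`.
function ends(g: MiniGrammar, expr: MiniExpr, input: string, pos: number, depth: number): number[] {
  switch (expr.kind) {
    case "lit":
      return input.startsWith(expr.text, pos) ? [pos + expr.text.length] : [];
    case "alt":
      return Array.from(new Set(expr.options.flatMap((option) => ends(g, option, input, pos, depth))));
    case "ref": {
      const target = g[expr.rule];
      return target && depth > 0 ? ends(g, target, input, pos, depth - 1) : [];
    }
    case "seq": {
      let starts = [pos];
      for (const item of expr.items) {
        starts = Array.from(new Set(starts.flatMap((p) => ends(g, item, input, p, depth))));
      }
      return starts;
    }
  }
}

// Round-trip judge: an input passes if the rule it was derived from consumes all of it.
function parsesAs(g: MiniGrammar, rule: string, input: string, depth: number): boolean {
  return ends(g, { kind: "ref", rule }, input, 0, depth).includes(input.length);
}

// Toy grammar: list := "[" items "]" ; items := value | value "," items ; value := "x" | list
const listGrammar: MiniGrammar = {
  list: { kind: "seq", items: [{ kind: "lit", text: "[" }, { kind: "ref", rule: "items" }, { kind: "lit", text: "]" }] },
  items: {
    kind: "alt",
    options: [
      { kind: "ref", rule: "value" },
      { kind: "seq", items: [{ kind: "ref", rule: "value" }, { kind: "lit", text: "," }, { kind: "ref", rule: "items" }] },
    ],
  },
  value: { kind: "alt", options: [{ kind: "lit", text: "x" }, { kind: "ref", rule: "list" }] },
};

const derived = derive(listGrammar, { kind: "ref", rule: "list" }, 4);
const allRoundTrip = derived.every((text) => parsesAs(listGrammar, "list", text, 4));
```

Because generation and recognition share one IR and one depth budget, a round-trip failure here could only mean the two walkers disagree with each other. That is the sense in which the judge needs no external oracle, and also why it proves self-consistency only.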

Roadmap (this PR, on top of the core)

Deterministic per-token coverage (tokenCover) — directed descent to each scoped token via the distTo BFS, so numerics / regex / etc. (which the shallow corpus never reached) are graded. TS scope-checked tokens 157→326.
Determinism — replaced the one random strategy (fuzz) with deterministic t-wise systematic coverage (complete to 3-wise); generateInputs(grammar) is now a pure function (seed eliminated). This is what makes the gap ledger commit-trackable, and it makes the tool faithful to its own "systematic, not a representativeness bet" thesis: discovery is bounded by generator PRECISION, not random luck.
Precision — make the known gap shape-classes systematically producible (markup no-space self-close; block-context [-in-scalar; block scalars |/>), so the deterministic check finds them on purpose.
Gap ledger (test/gap-ledger.ts → KNOWN-GAPS.md): collects the discovered divergences, delta-debug-minimizes each (`<aA aA = "a"/>` → `<A A=""/>`), classifies via the neutral oracle (typescript / yaml / parse5; over-accepts dropped), and fingerprints for cross-commit identity. Deterministic, regenerated with --write, CI-gated up-to-date with --check. It currently lists the HTML/Vue self-close `/` gap: a real, valid-input divergence the corpus-bound scope-gap metric is blind to (the `/` is a lexical-floor punct role).
Comment coverage — comments are skip tokens (no CST leaf), so the judge couldn't see them; closed via deterministic comment injection recorded as witnesses in GenInput.tokens (its first consumer) + a judge arm grading each witness span. 0→N comment spans graded per language (YAML 442, TS 46, …), proven non-trivial (mutating a comment scope fails the gate).

Test-suite cleanup (the #25 part-2)

Deleted 9 dev-only scratch probes (each confirmed not a CI gate). Folded the per-language scope-gap + src-coverage adapters into two data-driven drivers (scope-gap-run.ts / src-coverage-run.ts) + a config table, <lang> preserved as a parameter — byte-identical output to the old adapters (the README coverage-table algorithm is unchanged; only the dispatch table was rewired). The thicker html / yaml / vue adapters are delegated to.

Boundary

Round-trip proves only self-consistency — never that the parser matches an external semantic boundary — so conformance / scope-gap-vs-official / repo-compat and the negative tests all stay. The gap fixes for what the ledger finds are a separate concern (a fix-highlighter-gaps branch), so this PR demonstrates the tool FINDING them.

Verification

npm run gen byte-identical · tsc --noEmit clean · sanity 15/15 · yaml-depth-witnesses 10/0 · agnostic 9/9 · generative 7/7 + depth-site 2/2 · gap-ledger deterministic + --check clean + selftest · coverage-table end-to-end. CI runs node test/generative.ts, the gap-ledger selftest, and gap-ledger --check.

…ncy (#25)

Walk the shared combinator IR to emit guaranteed-legal inputs for any Monogram grammar, replacing corpus sampling with systematic, bounded coverage: the lever a normal highlighter lacks (the source IS a grammar).

Two by-construction judges, no external oracle:
- round-trip: every generated derivation parses as the rule it was rooted at (parser self-consistency); the structured strategies are ~88% legal, fuzz is exploratory (random choices wander outside the IR's context constraints).
- scope ≡ role: the flat highlighter's scope at each parsed token must agree with the token's by-construction role (the scope the grammar declares). Where they disagree is the #23/#24 class: a value-leading `---` the parser keeps a plain scalar but a flat grammar mis-scopes as a marker; an inner sequence `-` the parser knows is an indicator but a flat grammar folds into a string. Floor-blind (compares the punctuation class directly), so a `-` painted string is caught.

The check independently re-surfaces both: a directed-nesting derivation produces `- - x\n - x` (#24); the anchored-marker scan catches a value-leading marker misfire (#23). Verified by reverting each fix (the gate fires), and depth-site coverage is asserted so generation can't silently stop exercising them.

Test-suite cleanup alongside:
- delete 9 dev-only scratch / superseded probes (each confirmed not a CI gate).
- fold the per-language scope-gap + src-coverage adapters into two data-driven drivers (scope-gap-run.ts / src-coverage-run.ts) + a config table, the per-language entry preserved as a <lang> parameter. Output byte-identical to the old adapters; coverage-table.ts and package.json rewired. The thicker html / yaml / vue adapters keep their files and are delegated to.

Adds: grammar-gen.ts (the walker), generative.ts (the judges), curated-corpora.ts. CI runs node test/generative.ts.
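The scope ≡ role comparison can be pictured with a small sketch. The types `RoleToken` and `ScopeSpan`, the functions `bucketOf` and `scopeRoleMismatches`, and the scope strings below are assumptions made for illustration; the actual judge in generative.ts applies its own bucket partition and leniency rules, which this does not model.

```typescript
// Hypothetical shapes: a parsed token with its by-construction role, and a highlighter span.
type RoleToken = { start: number; end: number; role: string };
type ScopeSpan = { start: number; end: number; scope: string };

// Bucket a TextMate-style scope by its first segment ("string.unquoted" -> "string").
function bucketOf(scope: string): string {
  const [head = ""] = scope.split(".");
  return head;
}

// Report every token whose painted bucket disagrees with its role.
// In this sketch, a token with no covering span also counts as a mismatch.
function scopeRoleMismatches(tokens: RoleToken[], spans: ScopeSpan[]): RoleToken[] {
  return tokens.filter((token) => {
    const covering = spans.find((span) => span.start <= token.start && token.end <= span.end);
    return covering === undefined || bucketOf(covering.scope) !== token.role;
  });
}

// Input text "- - x": the parser knows both dashes are sequence indicators.
const parsedTokens: RoleToken[] = [
  { start: 0, end: 1, role: "punctuation" },
  { start: 2, end: 3, role: "punctuation" },
  { start: 4, end: 5, role: "string" },
];

// A flat highlighter that folds the inner "- x" into one string span (the #24 shape).
const foldedSpans: ScopeSpan[] = [
  { start: 0, end: 1, scope: "punctuation.definition.sequence" },
  { start: 2, end: 5, scope: "string.unquoted" },
];

// A highlighter that paints the inner dash as punctuation.
const agreeingSpans: ScopeSpan[] = [
  { start: 0, end: 1, scope: "punctuation.definition.sequence" },
  { start: 2, end: 3, scope: "punctuation.definition.sequence" },
  { start: 4, end: 5, scope: "string.unquoted" },
];

const foldedMismatches = scopeRoleMismatches(parsedTokens, foldedSpans);
const agreeingMismatches = scopeRoleMismatches(parsedTokens, agreeingSpans);
```

Comparing the bucket directly, rather than grading only tokens above some role floor, is what lets the inner dash be flagged even though it is plain punctuation.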

…overage

The generated legal corpus never reached whole scoped token classes the scope≡role judge checks: for TypeScript, numerics (Hex/Octal/Binary/BigInt/Number), because the legal corpus is shallow/structural and never lands on an expression-position literal (proven: raising cap/fuzz still yields zero numerics).

Add a 5th strategy `tokenCover`: for each scoped, samplable token, descend the SHORTEST path from the entry rule that references it (reusing the distTo/exprDist BFS), build a minimal legal context (fillContent/minExpand), and substitute sampleVariants. Deterministic and minimal-context, so it stays cheap on the large TS grammar (no depth strategies for token-stream).

Also sweep all top-level token-pattern `alt` branches in sampleVariants (so a Number emits hex/oct/bin/float/bigint, not just `0`), guarded against the interesting-literal embed for decimal-start / start()-anchored tokens (no `-0x1`, no broken column-0 anchor).

TS declared-scope tokens checked 157→326 (numerics now graded); generative 7/7 consistent, depth-site 2/2 (#23/#24 intact); agnostic 9/9.
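The directed descent can be sketched as a breadth-first search over a rule-reference graph. `RefGraph`, `shortestPath`, and the toy rule names are hypothetical and chosen for this illustration; the PR's distTo/exprDist helpers work on the real IR and are not shown.

```typescript
// Hypothetical reference graph: each rule maps to the rules or tokens it mentions.
type RefGraph = Record<string, string[]>;

// Breadth-first search from `entry`; returns the shortest chain of names ending at `target`.
function shortestPath(graph: RefGraph, entry: string, target: string): string[] | null {
  const previous = new Map<string, string | null>([[entry, null]]);
  const queue: string[] = [entry];
  for (let head = 0; head < queue.length; head++) {
    const current = queue[head] as string;
    if (current === target) {
      // Walk the predecessor links back to the entry rule.
      const path: string[] = [];
      let step: string | null = current;
      while (step !== null) {
        path.unshift(step);
        step = previous.get(step) ?? null;
      }
      return path;
    }
    for (const next of graph[current] ?? []) {
      if (!previous.has(next)) {
        previous.set(next, current);
        queue.push(next);
      }
    }
  }
  return null; // target is unreachable from the entry rule
}

// Toy grammar shape: a numeric token sits several rules below the entry.
const toyRefs: RefGraph = {
  Program: ["Statement"],
  Statement: ["ExpressionStatement", "Block"],
  Block: ["Statement"],
  ExpressionStatement: ["Expression"],
  Expression: ["BinaryExpression", "Literal"],
  BinaryExpression: ["Expression"],
  Literal: ["Number", "String"],
};

const pathToNumber = shortestPath(toyRefs, "Program", "Number");
```

A generator that follows `pathToNumber` and fills every other position with its cheapest legal expansion reaches the token in a small input, which matches the commit's point that shallow structural sampling never lands there by itself.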

Generation was seed-dependent: different opts.seed → different fuzz outputs → different "discovered" divergences. That's fatal for a reproducible gap ledger (random testing shows presence, not absence, and can't be tracked across commits) and contradicts the project's own "systematic, not a representativeness bet" thesis.

The only random STRUCTURE was `fuzz` (this.rand for alt/quantifier choices); enum/nestChain/tokenCover already rotate on a variant index. Replace fuzz with `cover`: the same walk, but every production choice comes from a deterministic mixed-radix Chooser indexed by round i alone: the first few choice points form a full base-N cartesian (t-wise interaction coverage by construction: measured complete to 3-wise), the tail perturbed by rotations. this.rand is seeded from a fixed constant; opts.seed is now a no-op.

generateInputs(grammar) is a pure function of the grammar: byte-identical across runs for all 7 languages. 7/7 consistent, depth-site 2/2 (#23/#24 intact); agnostic 9/9. Foundation for a deterministic, commit-trackable gap ledger.
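A mixed-radix chooser of the kind described can be sketched in a few lines. `makeChooser`, its `head` parameter, and the specific rotation rule for the tail are assumptions for illustration only; the PR's Chooser may split head and tail differently.

```typescript
// Returns a choice function for one generation round. Each call picks an index in [0, arity).
// The first `head` choice points consume mixed-radix digits of `round`, so consecutive rounds
// enumerate their full cartesian product; later points rotate with the round number.
function makeChooser(round: number, head: number): (arity: number) => number {
  let rest = round;
  let point = 0;
  return (arity: number): number => {
    const index = point++;
    if (index < head) {
      const digit = rest % arity;
      rest = Math.floor(rest / arity);
      return digit;
    }
    return (round + index) % arity; // tail: deterministic rotation, no randomness
  };
}

// Two choice points with 2 and 3 alternatives: rounds 0..5 should cover all 6 combinations.
const seenCombos = new Set<string>();
for (let round = 0; round < 6; round++) {
  const choose = makeChooser(round, 2);
  seenCombos.add(`${choose(2)},${choose(3)}`);
}

// The same round always yields the same choices, so generation is a pure function.
const firstRun = makeChooser(4, 2);
const secondRun = makeChooser(4, 2);
const sameChoices = firstRun(2) === secondRun(2) && firstRun(3) === secondRun(3);
```

Since the round index is the only input, two runs over the same grammar visit the same choices in the same order, which is the property the gap ledger depends on.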

…ible

Deterministic generation found 0 divergences: the gaps random fuzz hit were luck, and the deterministic generator couldn't produce those shapes. Discovery is bounded by generator PRECISION, not luck; so make the known gap shape-classes producible (config-derived, no language names):
- markup: a NO-SPACE (tight) render variant + a directed `markupSelfCloseAttr` producer so `<img src="a"/>` (quoted attr flush against `/>`) forms. The HTML/Vue self-close `/` gap now surfaces deterministically under "discovered": «/» got «string.unquoted.html».
- indent: sample plain scalars from `blockPattern` + splice a flow bracket mid-token, and directed `indentExplicitKeyBracket` producer, so `? k [y : …` forms (round-trips).
- indent: `indentBlockScalar` synthesis for the `never()`-token block scalar `|`/`>` (introducer + deeper-indented body), so `string.unquoted.block` is covered (was 0%).

Deterministic preserved (generateInputs pure); 7/7 gated-clean; depth-site 2/2 (#23/#24 intact); agnostic 9/9.

Honest finding: the YAML explicit-key `[` divergence is a `name`-bucket scope (entity.name.tag), which the scope≡role gates (literal→content, anchored-marker) structurally don't flag; this is a check-precision item for a follow-up, distinct from producibility (which is now done). The HTML `/` is unambiguously gate-1.

…WN-GAPS.md)

Operationalize the scope≡role check's "discovered" divergences into a committed, commit-trackable ledger instead of console output that vanishes.

test/gap-ledger.ts: for each language, collect the discovered divergences (reusing the EXACT detection, factored into generative-detect.ts so generative.ts's gate is unchanged), MINIMIZE each via delta-debugging to a stable minimal repro, CLASSIFY via the neutral oracle (typescript/yaml/parse5) keeping only oracle-VALID-input gaps (over-accepts dropped), and FINGERPRINT (content hash, stable across commits). Emits KNOWN-GAPS.md (human + machine-readable), regenerated with `--write`, gated up-to-date with `--check`. Deterministic: two runs → byte-identical ledger.

Currently 2 gaps, 0 dropped: the HTML/Vue self-close `/` mis-scope (`<aA aA = "a"/>` ddmin-minimized to `<A A=""/>`), the floor-blind divergence the corpus-bound scope-gap metric can't see. CI runs the selftest + `--check`. generative 7/7 unchanged; agnostic 9/9; deterministic.

The fixes for these gaps live on a separate branch (highlighter product changes), so the ledger here demonstrates the tool FINDING them; a later layer can reconcile the ledger into GitHub issues.
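The minimize-then-fingerprint step can be illustrated with a simplified delta-debugging loop. Everything here is a sketch under stated assumptions: `ddmin` tries chunk removals only, `stillDiverges` is a regex stand-in for "oracle-valid and still mis-scoped", and `fingerprint` uses a hand-written FNV-1a hash because the source says only "content hash" without naming one.

```typescript
// Simplified delta debugging: repeatedly delete a chunk while the predicate still holds.
// On exit, no single character can be removed without losing the failure (1-minimal).
function ddmin(input: string, stillFails: (candidate: string) => boolean): string {
  let current = input;
  let chunks = 2;
  while (current.length >= 2) {
    const size = Math.ceil(current.length / chunks);
    let reduced = false;
    for (let start = 0; start < current.length; start += size) {
      const candidate = current.slice(0, start) + current.slice(start + size);
      if (stillFails(candidate)) {
        current = candidate;
        chunks = Math.max(chunks - 1, 2);
        reduced = true;
        break;
      }
    }
    if (!reduced) {
      if (chunks >= current.length) break; // finest granularity tried, nothing removable
      chunks = Math.min(chunks * 2, current.length);
    }
  }
  return current;
}

// Stand-in content hash (32-bit FNV-1a), rendered as 8 hex digits.
function fingerprint(text: string): string {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193);
  }
  return (hash >>> 0).toString(16).padStart(8, "0");
}

// Hypothetical predicate: a well-formed self-closing tag whose quoted attribute
// sits flush against "/>" (the shape the ledger reports).
const stillDiverges = (candidate: string): boolean =>
  /^<\w+ \w+ ?= ?"[^"]*"\/>$/.test(candidate);

const minimalRepro = ddmin('<aA aA = "a"/>', stillDiverges);
const reproId = fingerprint(minimalRepro);
```

Hashing the minimized repro rather than the raw generated input is what would keep an entry's identity stable when unrelated generator changes alter the surrounding text.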

Comments are skip:true tokens: the parser drops them, so they are never CST leaves, so the scope≡role judge (which walks the parser's CST) never checked the highlighter's comment scopes (0% covered). Closing it needs a witness the GENERATOR records, not a parser leaf.

4a: deterministic comment injection at one safe position per mode (config-derived, no `//`/`#`/`` after a tagClose. A re-parse-and-drop net keeps round-trip clean; the injected comment is recorded as a witness in GenInput.tokens (its first consumer), inheriting the host's tier.

4b: the judge grades each witness span: the flat highlighter must paint `comment` somewhere in it (same scopeBucket partition + leniency); a comment painted non-comment is unambiguous, so it GATES.

Coverage hole closed 0→N graded per language (YAML 442, TS 46, …), all 0 uncolored today; proven non-trivial: mutating a comment scope makes every witness uncolored and the gate fail. Deterministic preserved; 7/7 + depth-site 2/2 (#23/#24); gap-ledger --check clean; agnostic 9/9.
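The witness mechanism can be sketched as follows. `WitnessSpan`, `PaintedSpan`, `injectComment`, `uncoloredWitnesses`, and the scope strings are hypothetical names for this illustration; the real witnesses live in GenInput.tokens and the real judge applies the scopeBucket partition and leniency mentioned above.

```typescript
// Hypothetical shapes: where the generator put a comment, and what the highlighter painted.
type WitnessSpan = { start: number; end: number };
type PaintedSpan = { start: number; end: number; scope: string };

// Splice a comment into generated text and remember exactly where it went.
function injectComment(text: string, at: number, comment: string): { text: string; witness: WitnessSpan } {
  return {
    text: text.slice(0, at) + comment + text.slice(at),
    witness: { start: at, end: at + comment.length },
  };
}

// A witness is uncolored if no overlapping painted span has a `comment` scope.
function uncoloredWitnesses(witnesses: WitnessSpan[], painted: PaintedSpan[]): WitnessSpan[] {
  return witnesses.filter(
    (witness) =>
      !painted.some(
        (span) =>
          span.start < witness.end &&
          span.end > witness.start &&
          span.scope.split(".")[0] === "comment",
      ),
  );
}

const injected = injectComment("a: 1", 4, " # note"); // text becomes "a: 1 # note"

// Highlighter output that paints the comment correctly.
const healthyPaint: PaintedSpan[] = [
  { start: 0, end: 1, scope: "entity.name.tag" },
  { start: 3, end: 4, scope: "constant.numeric" },
  { start: 5, end: 11, scope: "comment.line.number-sign" },
];

// Mutation check: repaint the comment as a string and the witness must be reported.
const mutatedPaint: PaintedSpan[] = healthyPaint.map((span) =>
  span.scope.startsWith("comment") ? { ...span, scope: "string.unquoted" } : span,
);

const healthyMisses = uncoloredWitnesses([injected.witness], healthyPaint);
const mutatedMisses = uncoloredWitnesses([injected.witness], mutatedPaint);
```

The mutation at the end mirrors the non-triviality argument in the commit message: if repainting the comment did not produce an uncolored witness, the check would be vacuous.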

johnsoncodehk added 6 commits June 8, 2026 21:19

johnsoncodehk merged commit 437a5ea into master Jun 8, 2026
2 checks passed

johnsoncodehk merged 6 commits into master from generative-testing