Generative testing: grammar-derived inputs + by-construction consistency (#25) #28

Merged
johnsoncodehk merged 6 commits into master from generative-testing on Jun 8, 2026

@johnsoncodehk (Owner) commented Jun 8, 2026

Closes #25.

The source IS a grammar, so the same combinator object the parser / highlighter / tree-sitter derive from is also a generator. Walking its rule IR emits guaranteed-legal inputs — replacing "hope the corpus contains the shape" (the blind spot that hid #23/#24 from a monogramWrong=0 metric) with systematic, deterministic, grammar-derived coverage. This PR implements #25 plus a follow-on roadmap that pushes the generator to deterministic precision and operationalizes its findings.

#25 core — the method

test/grammar-gen.ts is a generic, language-agnostic walker over the shared RuleExpr IR. Every per-language fact (indent tokens, flow brackets, compact indicators, markup delimiters) is read from grammar.indent / .markup, never hardcoded.

test/generative.ts provides two by-construction judges, with no external oracle: round-trip (every derivation parses as its rule) and scope ≡ role (the flat highlighter's scope at each parsed token must match the token's by-construction role). The scope ≡ role judge is floor-blind, so a `-` mis-painted as a string is caught; this is the blind spot the role-graded scope-gap metric had. The check re-surfaces #23/#24 by construction (verified by reverting each fix), with mustCover asserting that the corpus keeps containing both shape-classes.
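
As a rough illustration of the idea (not the repository's walker), the sketch below uses a deliberately tiny, hypothetical rule IR: the `Rule` type, `derive`, and `ends` are assumptions made for this example, and the real RuleExpr is richer. It shows the same grammar object acting as both a bounded generator and a recognizer, so that every generated string can be round-tripped.

```typescript
// Hypothetical, simplified rule IR. The repository's RuleExpr is richer than this.
type Rule =
  | { kind: "lit"; text: string }
  | { kind: "seq"; items: Rule[] }
  | { kind: "alt"; options: Rule[] }
  | { kind: "opt"; item: Rule }
  | { kind: "ref"; name: string };
type Grammar = Record<string, Rule>;

// Generator: every string `rule` derives while following at most `depth` nested refs.
function derive(g: Grammar, rule: Rule, depth: number): string[] {
  switch (rule.kind) {
    case "lit": return [rule.text];
    case "seq": {
      let acc = [""];
      for (const item of rule.items) {
        const parts = derive(g, item, depth);
        const next: string[] = [];
        for (const a of acc) for (const b of parts) next.push(a + b);
        acc = next;
      }
      return acc;
    }
    case "alt": {
      const out: string[] = [];
      for (const o of rule.options) out.push(...derive(g, o, depth));
      return out;
    }
    case "opt": return ["", ...derive(g, rule.item, depth)];
    // Budget exhausted: this branch contributes no derivations.
    case "ref": return depth === 0 ? [] : derive(g, g[rule.name], depth - 1);
  }
}

// Recognizer over the same IR: every offset where `rule` can end when started at `pos`.
function ends(g: Grammar, rule: Rule, s: string, pos: number): Set<number> {
  switch (rule.kind) {
    case "lit":
      return new Set(s.startsWith(rule.text, pos) ? [pos + rule.text.length] : []);
    case "seq": {
      let cur = new Set([pos]);
      for (const item of rule.items) {
        const next = new Set<number>();
        cur.forEach(p => ends(g, item, s, p).forEach(e => next.add(e)));
        cur = next;
      }
      return cur;
    }
    case "alt": {
      const out = new Set<number>();
      for (const o of rule.options) ends(g, o, s, pos).forEach(e => out.add(e));
      return out;
    }
    case "opt": {
      const out = ends(g, rule.item, s, pos);
      out.add(pos);
      return out;
    }
    case "ref": return ends(g, g[rule.name], s, pos);
  }
}

// Toy grammar: list := "[" (item ("," item)?)? "]" ; item := "a" | list
const grammar: Grammar = {
  list: { kind: "seq", items: [
    { kind: "lit", text: "[" },
    { kind: "opt", item: { kind: "seq", items: [
      { kind: "ref", name: "item" },
      { kind: "opt", item: { kind: "seq", items: [
        { kind: "lit", text: "," }, { kind: "ref", name: "item" },
      ] } },
    ] } },
    { kind: "lit", text: "]" },
  ] },
  item: { kind: "alt", options: [{ kind: "lit", text: "a" }, { kind: "ref", name: "list" }] },
};

const entry: Rule = { kind: "ref", name: "list" };
const roundTrips = (s: string): boolean => ends(grammar, entry, s, 0).has(s.length);
const inputs = derive(grammar, entry, 3); // 7 strings, from "[]" up to "[[],[]]"
console.log(inputs.length, inputs.every(roundTrips));
```

The recursion budget is what keeps enumeration finite here; the PR's strategies bound generation in their own ways, which this sketch does not try to reproduce.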

Roadmap (this PR, on top of the core)

  • Deterministic per-token coverage (tokenCover) — directed descent to each scoped token via the distTo BFS, so numerics / regex / etc. (which the shallow corpus never reached) are graded. TS scope-checked tokens 157→326.
  • Determinism — replaced the one random strategy (fuzz) with deterministic t-wise systematic coverage (complete to 3-wise); generateInputs(grammar) is now a pure function (seed eliminated). This is what makes the gap ledger commit-trackable, and it makes the tool faithful to its own "systematic, not a representativeness bet" thesis: discovery is bounded by generator PRECISION, not random luck.
  • Precision — make the known gap shape-classes systematically producible (markup no-space self-close; block-context `[`-in-scalar; block scalars `|`/`>`), so the deterministic check finds them on purpose.
  • Gap ledger (test/gap-ledger.ts → KNOWN-GAPS.md) — collects the discovered divergences, delta-debug-minimizes each (`<aA aA = "a"/>` → `<A A=""/>`), classifies via the neutral oracle (typescript / yaml / parse5; over-accepts dropped), and fingerprints for cross-commit identity. Deterministic, regenerated with --write, CI-gated up-to-date with --check. It currently lists the HTML/Vue self-close `/` gap, a real, valid-input divergence the corpus-bound scope-gap metric is blind to (the `/` is a lexical-floor punct role).
  • Comment coverage — comments are skip tokens (no CST leaf), so the judge couldn't see them; closed via deterministic comment injection recorded as witnesses in GenInput.tokens (its first consumer) + a judge arm grading each witness span. 0→N comment spans graded per language (YAML 442, TS 46, …), proven non-trivial (mutating a comment scope fails the gate).

Test-suite cleanup (the #25 part-2)

Deleted 9 dev-only scratch probes (each confirmed not a CI gate). Folded the per-language scope-gap + src-coverage adapters into two data-driven drivers (scope-gap-run.ts / src-coverage-run.ts) + a config table, with <lang> preserved as a parameter. Output is byte-identical to the old adapters (the README coverage-table algorithm is unchanged; only the dispatch table was rewired). The thicker html / yaml / vue adapters keep their own files, and the drivers delegate to them.
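
The shape of such a driver might look roughly like the sketch below. Everything in it is hypothetical (the `LangConfig` fields, the table entries, and the report strings are invented for illustration); it only shows the pattern of one driver reading a per-language table and handing off to a thicker adapter where one is registered.

```typescript
// Hypothetical config shape; field names are illustrative, not the repository's.
interface LangConfig {
  corpus: string[];                          // sample inputs for this language
  delegate?: (corpus: string[]) => string;   // thicker adapter to hand off to, if any
}

const table: Record<string, LangConfig> = {
  typescript: { corpus: ["let x = 1", "x + 2"] },
  yaml: { corpus: ["- a"], delegate: corpus => `yaml adapter: ${corpus.length} file(s)` },
};

// One driver for every language: <lang> is just a lookup key.
function run(lang: string): string {
  const cfg = table[lang];
  if (cfg === undefined) throw new Error(`unknown language: ${lang}`);
  return cfg.delegate
    ? cfg.delegate(cfg.corpus)
    : `generic driver: ${cfg.corpus.length} file(s)`;
}

console.log(run("typescript")); // generic driver: 2 file(s)
console.log(run("yaml"));       // yaml adapter: 1 file(s)
```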

Boundary

Round-trip proves only self-consistency — never that the parser matches an external semantic boundary — so conformance / scope-gap-vs-official / repo-compat and the negative tests all stay. The gap fixes for what the ledger finds are a separate concern (a fix-highlighter-gaps branch), so this PR demonstrates the tool FINDING them.

Verification

npm run gen byte-identical · tsc --noEmit clean · sanity 15/15 · yaml-depth-witnesses 10/0 · agnostic 9/9 · generative 7/7 + depth-site 2/2 · gap-ledger deterministic + --check clean + selftest · coverage-table end-to-end. CI runs node test/generative.ts, the gap-ledger selftest, and gap-ledger --check.

…ncy (#25)

Walk the shared combinator IR to emit guaranteed-legal inputs for any Monogram
grammar, replacing corpus sampling with systematic, bounded coverage — the lever
a normal highlighter lacks (the source IS a grammar). Two by-construction judges,
no external oracle:

- round-trip: every generated derivation parses as the rule it was rooted at
  (parser self-consistency); the structured strategies are ~88% legal, fuzz is
  exploratory (random choices wander outside the IR's context constraints).
- scope ≡ role: the flat highlighter's scope at each parsed token must agree with
  the token's by-construction role (the scope the grammar declares). Where they
  disagree is the #23/#24 class — a value-leading `---` the parser keeps a plain
  scalar but a flat grammar mis-scopes as a marker; an inner sequence `-` the
  parser knows is an indicator but a flat grammar folds into a string. Floor-blind
  (compares the punctuation class directly), so a `-` painted string is caught.

The check independently re-surfaces both: a directed-nesting derivation produces
`- - x\n  - x` (#24); the anchored-marker scan catches a value-leading marker
misfire (#23). Verified by reverting each fix — the gate fires — and depth-site
coverage is asserted so generation can't silently stop exercising them.
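
A minimal sketch of the scope ≡ role comparison on the `- - x` shape, under stated assumptions: the `RoleToken` / `ScopeSpan` types, the three-way bucket partition, and the scope names are all illustrative stand-ins, not the judge in test/generative.ts.

```typescript
type Bucket = "punctuation" | "string" | "comment";
interface RoleToken { start: number; end: number; role: Bucket }  // what the parser knows
interface ScopeSpan { start: number; end: number; scope: string } // what the highlighter paints

// Map a TextMate-style scope name onto a coarse bucket (illustrative partition).
function bucketOf(scope: string): Bucket | undefined {
  if (scope.startsWith("punctuation")) return "punctuation";
  if (scope.startsWith("string")) return "string";
  if (scope.startsWith("comment")) return "comment";
  return undefined;
}

// Report every parsed token whose painted bucket differs from its declared role.
function divergences(src: string, tokens: RoleToken[], spans: ScopeSpan[]): string[] {
  const out: string[] = [];
  for (const t of tokens) {
    const span = spans.find(s => s.start <= t.start && t.end <= s.end);
    const got = span ? bucketOf(span.scope) : undefined;
    if (got !== t.role) {
      const text = src.slice(t.start, t.end);
      out.push(`«${text}» at ${t.start}: expected ${t.role}, got ${span?.scope ?? "(unscoped)"}`);
    }
  }
  return out;
}

const src = "- - x";
const tokens: RoleToken[] = [
  { start: 0, end: 1, role: "punctuation" }, // outer sequence indicator
  { start: 2, end: 3, role: "punctuation" }, // inner sequence indicator
  { start: 4, end: 5, role: "string" },
];
// A flat grammar that recognizes only the line-leading `-` and folds the rest into a string.
const flat: ScopeSpan[] = [
  { start: 0, end: 1, scope: "punctuation.definition.sequence" },
  { start: 2, end: 5, scope: "string.unquoted" },
];

const found = divergences(src, tokens, flat);
console.log(found); // one entry, for the inner `-` at offset 2
```

Because the comparison is on the bucket itself, a punctuation token painted as a string counts as a miss rather than being forgiven as low-value punctuation, which is the floor-blind property described above.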

Test-suite cleanup alongside:
- delete 9 dev-only scratch / superseded probes (each confirmed not a CI gate).
- fold the per-language scope-gap + src-coverage adapters into two data-driven
  drivers (scope-gap-run.ts / src-coverage-run.ts) + a config table, the
  per-language entry preserved as a <lang> parameter. Output byte-identical to
  the old adapters; coverage-table.ts and package.json rewired. The thicker html
  / yaml / vue adapters keep their files and are delegated to.

Adds: grammar-gen.ts (the walker), generative.ts (the judges), curated-corpora.ts.
CI runs node test/generative.ts.
…overage

The generated legal corpus never reached whole scoped token classes the
scope≡role judge checks — for TypeScript, numerics (Hex/Octal/Binary/BigInt/
Number), because the legal corpus is shallow/structural and never lands on an
expression-position literal (proven: raising cap/fuzz still yields zero numerics).

Add a 5th strategy `tokenCover`: for each scoped, samplable token, descend the
SHORTEST path from the entry rule that references it (reusing the distTo/exprDist
BFS), build a minimal legal context (fillContent/minExpand), and substitute
sampleVariants. Deterministic and minimal-context, so it stays cheap on the large
TS grammar (no depth strategies for token-stream). Also sweep all top-level
token-pattern `alt` branches in sampleVariants (so a Number emits hex/oct/bin/
float/bigint, not just `0`), guarded against the interesting-literal embed for
decimal-start / start()-anchored tokens (no `-0x1`, no broken column-0 anchor).
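
The directed descent can be pictured as a breadth-first search over the rule-reference graph. The sketch below is an assumption-laden toy: `RefGraph`, `shortestPath`, and the rule names are made up for illustration, and it stops at finding the path, leaving out the context filling and variant substitution described above.

```typescript
// rule name -> names of the rules/tokens it references (hypothetical toy graph)
type RefGraph = Record<string, string[]>;

// Breadth-first search: the first time `target` is dequeued, the path is a shortest one.
function shortestPath(graph: RefGraph, entry: string, target: string): string[] | undefined {
  const prev = new Map<string, string | undefined>([[entry, undefined]]);
  const queue = [entry];
  while (queue.length > 0) {
    const node = queue.shift()!;
    if (node === target) {
      const path: string[] = [];
      for (let n: string | undefined = node; n !== undefined; n = prev.get(n)) path.unshift(n);
      return path;
    }
    for (const next of graph[node] ?? []) {
      if (!prev.has(next)) {
        prev.set(next, node);
        queue.push(next);
      }
    }
  }
  return undefined; // target not reachable from the entry rule
}

const graph: RefGraph = {
  Program: ["Statement"],
  Statement: ["ExpressionStatement", "IfStatement"],
  IfStatement: ["Expression", "Statement"],
  ExpressionStatement: ["Expression"],
  Expression: ["Binary", "Literal"],
  Binary: ["Expression"],
  Literal: ["HexNumber", "String"],
};

const path = shortestPath(graph, "Program", "HexNumber");
console.log(path?.join(" > "));
// Program > Statement > ExpressionStatement > Expression > Literal > HexNumber
```

Visiting neighbours in declaration order makes the chosen path deterministic when several shortest paths exist, which matters if the output is meant to be reproducible.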

TS declared-scope tokens checked 157→326 (numerics now graded); generative 7/7
consistent, depth-site 2/2 (#23/#24 intact); agnostic 9/9.

Generation was seed-dependent — different opts.seed → different fuzz outputs →
different "discovered" divergences. That's fatal for a reproducible gap ledger
(random testing shows presence, not absence, and can't be tracked across commits)
and contradicts the project's own "systematic, not a representativeness bet" thesis.

The only random STRUCTURE was `fuzz` (this.rand for alt/quantifier choices); enum/
nestChain/tokenCover already rotate on a variant index. Replace fuzz with `cover`:
the same walk, but every production choice comes from a deterministic mixed-radix
Chooser indexed by round i alone — the first few choice points form a full base-N
cartesian (t-wise interaction coverage by construction: measured complete to
3-wise), the tail perturbed by rotations. this.rand is seeded from a fixed constant;
opts.seed is now a no-op. generateInputs(grammar) is a pure function of the grammar:
byte-identical across runs for all 7 languages.
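
One way to picture a round-indexed mixed-radix chooser is sketched below. The class body is an assumption written for this example (only the name `Chooser` and the idea of indexing by round come from the text), and it covers just the cartesian prefix, not the rotation-perturbed tail.

```typescript
// Each choice point consumes one mixed-radix digit of the round index,
// so the choices are a pure function of (round, sequence of option counts).
class Chooser {
  private rest: number;
  constructor(round: number) {
    this.rest = round;
  }
  // Pick one of `n` options.
  pick(n: number): number {
    const digit = this.rest % n;
    this.rest = Math.floor(this.rest / n);
    return digit;
  }
}

// Three choice points with 2, 3 and 2 options respectively.
const radices = [2, 3, 2];
const rounds = radices.reduce((a, b) => a * b, 1); // 12 rounds cover the full product

function choicesFor(round: number): string {
  const c = new Chooser(round);
  return radices.map(n => c.pick(n)).join(",");
}

const seen = new Set<string>();
for (let i = 0; i < rounds; i++) seen.add(choicesFor(i));
console.log(seen.size); // 12: every combination appears exactly once
```

Running rounds 0 to 11 enumerates the whole 2 × 3 × 2 product, so every pairwise and three-way interaction among these choice points is exercised without any random source.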

7/7 consistent, depth-site 2/2 (#23/#24 intact); agnostic 9/9. Foundation for a
deterministic, commit-trackable gap ledger.
…ible

Deterministic generation found 0 divergences — the gaps random fuzz hit were
luck, and the deterministic generator couldn't produce those shapes. Discovery is
bounded by generator PRECISION, not luck; so make the known gap shape-classes
producible (config-derived, no language names):

- markup: a NO-SPACE (tight) render variant + a directed `markupSelfCloseAttr`
  producer so `<img src="a"/>` (quoted attr flush against `/>`) forms. The HTML/Vue
  self-close `/` gap now surfaces deterministically under "discovered":
  «/» got «string.unquoted.html».
- indent: sample plain scalars from `blockPattern` + splice a flow bracket mid-token,
  and a directed `indentExplicitKeyBracket` producer, so `? k [y : …` forms (round-trips).
- indent: `indentBlockScalar` synthesis for the `never()`-token block scalar `|`/`>`
  (introducer + deeper-indented body), so `string.unquoted.block` is covered (was 0%).
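
For intuition only, two of these shapes can be produced by very small string builders. The functions below are hypothetical stand-ins (they take their delimiters as plain parameters instead of reading them from grammar config, and they are not `markupSelfCloseAttr` or `indentBlockScalar`):

```typescript
// Render a self-closing tag either with a space before `/>` (loose)
// or with the quoted attribute value flush against it (tight).
function selfClose(tag: string, attr: string, value: string, tight: boolean): string {
  return `<${tag} ${attr}="${value}"${tight ? "" : " "}/>`;
}

// Build a block scalar: the introducer (`|` or `>`) on the key line,
// then body lines indented deeper than the key.
function blockScalar(key: string, indicator: "|" | ">", body: string[], keyIndent = 0): string {
  const pad = " ".repeat(keyIndent);
  const bodyPad = " ".repeat(keyIndent + 2);
  return `${pad}${key}: ${indicator}\n` + body.map(line => bodyPad + line + "\n").join("");
}

console.log(selfClose("img", "src", "a", true));  // <img src="a"/>
console.log(selfClose("img", "src", "a", false)); // <img src="a" />
console.log(JSON.stringify(blockScalar("k", "|", ["x", "y"]))); // "k: |\n  x\n  y\n"
```

The point of the tight variant is that a generator which always emits the loose form can never place the quote directly before `/>`, so the mis-scope stays out of reach no matter how many inputs are produced.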

Determinism preserved (generateInputs pure); 7/7 gated-clean; depth-site 2/2
(#23/#24 intact); agnostic 9/9. Honest finding: the YAML explicit-key `[` divergence
is a `name`-bucket scope (entity.name.tag), which the scope≡role gates (literal→content,
anchored-marker) structurally don't flag — a check-precision item for a follow-up,
distinct from producibility (which is now done). The HTML `/` is unambiguously gate-1.
…WN-GAPS.md)

Operationalize the scope≡role check's "discovered" divergences into a committed,
commit-trackable ledger instead of console output that vanishes. test/gap-ledger.ts:
for each language, collect the discovered divergences (reusing the EXACT detection,
factored into generative-detect.ts so generative.ts's gate is unchanged), MINIMIZE
each via delta-debugging to a stable minimal repro, CLASSIFY via the neutral oracle
(typescript/yaml/parse5) keeping only oracle-VALID-input gaps (over-accepts dropped),
and FINGERPRINT (content hash, stable across commits). Emits KNOWN-GAPS.md
(human + machine-readable), regenerated with `--write`, gated up-to-date with `--check`.
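
The minimize and fingerprint steps can be sketched as follows. This is a simplified, complement-only delta-debugging loop over characters, written for illustration and not taken from test/gap-ledger.ts; the `stillDiverges` regex is a stand-in for "the oracle accepts the input and the highlighter still mis-scopes it", and FNV-1a is just a placeholder for whatever content hash the ledger actually uses.

```typescript
// Simplified ddmin: repeatedly try dropping one chunk; refine granularity when nothing can go.
function ddmin(input: string, failing: (s: string) => boolean): string {
  let chars = Array.from(input);
  let n = 2; // number of chunks at the current granularity
  while (chars.length >= 2) {
    const size = Math.ceil(chars.length / n);
    let reduced = false;
    for (let start = 0; start < chars.length; start += size) {
      const candidate = chars.slice(0, start).concat(chars.slice(start + size));
      if (failing(candidate.join(""))) {
        chars = candidate;
        n = Math.max(n - 1, 2);
        reduced = true;
        break;
      }
    }
    if (!reduced) {
      if (n >= chars.length) break; // single-character granularity reached: 1-minimal
      n = Math.min(n * 2, chars.length);
    }
  }
  return chars.join("");
}

// Placeholder 32-bit FNV-1a content hash, rendered as 8 hex digits.
function fingerprint(s: string): string {
  let h = 0x811c9dc5;
  for (let i = 0; i < s.length; i++) {
    h ^= s.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0;
  }
  return ("0000000" + h.toString(16)).slice(-8);
}

// Stand-in predicate: a self-closing tag whose quoted attribute is flush against `/>`.
const stillDiverges = (s: string): boolean =>
  /^<[A-Za-z]+ [A-Za-z]+\s*=\s*"[^"]*"\/>$/.test(s);

const minimal = ddmin('<aA aA = "a"/>', stillDiverges);
console.log(minimal); // <A A=""/>
console.log(fingerprint("html|" + minimal));
```

Hashing the minimized repro instead of the raw generated input is what lets the same gap keep the same identity when unrelated generator changes alter the original input.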

Deterministic: two runs → byte-identical ledger. Currently 2 gaps, 0 dropped — the
HTML/Vue self-close `/` mis-scope (`<aA aA = "a"/>` ddmin-minimized to `<A A=""/>`),
the floor-blind divergence the corpus-bound scope-gap metric can't see. CI runs the
selftest + `--check`. generative 7/7 unchanged; agnostic 9/9; deterministic.

The fixes for these gaps live on a separate branch (highlighter product changes), so
the ledger here demonstrates the tool FINDING them; a later layer can reconcile the
ledger into GitHub issues.

Comments are skip:true tokens — the parser drops them, so they are never CST leaves,
so the scope≡role judge (which walks the parser's CST) never checked the highlighter's
comment scopes (0% covered). Closing it needs a witness the GENERATOR records, not a
parser leaf.

4a — deterministic comment injection at one safe position per mode (config-derived, no
`//`/`#`/`<!--` hardcoded): token-stream → a no-newline block comment at an inter-token
space; indent → end-of-line `# c` outside flow; markup → `<!-- c -->` after a tagClose.
A re-parse-and-drop net keeps round-trip clean; the injected comment is recorded as a
witness in GenInput.tokens (its first consumer), inheriting the host's tier.

4b — the judge grades each witness span: the flat highlighter must paint `comment`
somewhere in it (same scopeBucket partition + leniency); a comment painted non-comment
is unambiguous, so it GATES. Coverage hole closed 0→N graded per language (YAML 442,
TS 46, …), all 0 uncolored today; proven non-trivial — mutating a comment scope makes
every witness uncolored and the gate fail.
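
A rough sketch of the witness mechanism, assuming a much simpler `GenInput` shape than the real one: the generator records where it put the comment, and the judge checks that span against the highlighter's output instead of looking for a parser leaf. The `Witness` type, `injectLineComment`, `uncolored`, and the scope names are illustrative assumptions.

```typescript
interface Witness { start: number; end: number; role: "comment" }
interface GenInput { text: string; tokens: Witness[] }
interface ScopeSpan { start: number; end: number; scope: string }

// Append an end-of-line comment to the first line and record where it landed.
// `marker` is passed in, standing in for a value read from grammar config.
function injectLineComment(text: string, marker: string): GenInput {
  const eol = text.indexOf("\n");
  const at = eol === -1 ? text.length : eol;
  const comment = ` ${marker} c`;
  return {
    text: text.slice(0, at) + comment + text.slice(at),
    // The witness covers the comment itself, not the separating space.
    tokens: [{ start: at + 1, end: at + comment.length, role: "comment" }],
  };
}

// A witness counts as colored if any comment-scoped span overlaps it.
function uncolored(input: GenInput, spans: ScopeSpan[]): Witness[] {
  return input.tokens.filter(w =>
    !spans.some(s => s.scope.startsWith("comment") && s.start < w.end && w.start < s.end));
}

const gen = injectLineComment("k: v\nj: w", "#");
const w = gen.tokens[0];
console.log(JSON.stringify(gen.text), gen.text.slice(w.start, w.end)); // "k: v # c\nj: w" # c

const good: ScopeSpan[] = [{ start: 5, end: 8, scope: "comment.line" }];
const mutated: ScopeSpan[] = [{ start: 5, end: 8, scope: "string.unquoted" }];
console.log(uncolored(gen, good).length, uncolored(gen, mutated).length); // 0 1
```

The mutated case mirrors the non-triviality check in the text: if the highlighter stops painting the span as a comment, the witness is reported and the gate fails.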

Determinism preserved; 7/7 + depth-site 2/2 (#23/#24); gap-ledger --check clean;
agnostic 9/9.
@johnsoncodehk merged commit 437a5ea into master on Jun 8, 2026
2 checks passed

Development

Successfully merging this pull request may close these issues.

Generative testing: grammar-derived inputs + by-construction consistency (with test-suite cleanup)
