01#3256
Open
Guajir0-code wants to merge 6 commits into
After the model edits a .rs file via edit_file/write_file, the runtime now automatically runs cargo check, clippy, fmt --check, and test on the owning crate and folds the result back into the tool_result. If any step fails, is_error=true forces the model to correct on the next iteration instead of waiting for the user to notice.

- New verifier module with Verifier trait and CargoVerifier impl (manifest discovery, subprocess timeout, output truncation preserving error/warning lines, early-exit after first failure).
- RuntimeVerifierConfig wired through settings.json with nested schema validation, precedence User/Project/Local.
- ConversationRuntime integrates the verifier between post-hook and the tool_result, with record_verifier_ran telemetry.
- CLI wires CargoVerifier from config.
- 12 e2e tests spawn real cargo against temp crates to cover passing code, type errors, fmt violations, timeouts, step skipping after failure, nested files, alternate path keys, and malformed input.

Also clears pre-existing clippy/compile errors in unrelated crates (ApiError missing suggested_action in 4 CLI tests, map_unwrap_or, duration_suboptimal_units, trailing commas, result_large_err) so the workspace passes cargo clippy --workspace --all-targets -D warnings and cargo test --workspace end-to-end.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
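The "output truncation preserving error/warning lines" behaviour described above could look roughly like the sketch below. This is not the PR's code: the function names, the line-based budget, and the assumption that cargo-style diagnostics start with `error` or `warning` are all hypothetical, chosen only to illustrate the idea.

```rust
/// Hypothetical predicate: treats a line as a diagnostic when it starts with
/// "error" or "warning" (an assumption about cargo-style output).
fn is_diagnostic(line: &str) -> bool {
    let trimmed = line.trim_start();
    trimmed.starts_with("error") || trimmed.starts_with("warning")
}

/// Hypothetical helper: keeps at most `max_lines` lines of verifier output,
/// giving diagnostic lines priority over surrounding context lines, and
/// appends a marker saying how many lines were dropped.
fn truncate_preserving_diagnostics(output: &str, max_lines: usize) -> String {
    let lines: Vec<&str> = output.lines().collect();
    if lines.len() <= max_lines {
        return output.to_string();
    }

    let diagnostic_count = lines.iter().filter(|line| is_diagnostic(line)).count();
    // Diagnostics may use the whole budget; context only gets what is left over.
    let mut diagnostic_budget = max_lines;
    let mut context_budget = max_lines.saturating_sub(diagnostic_count);

    let mut kept: Vec<String> = Vec::new();
    let mut dropped = 0usize;
    for line in lines.iter().copied() {
        let budget = if is_diagnostic(line) {
            &mut diagnostic_budget
        } else {
            &mut context_budget
        };
        if *budget > 0 {
            *budget -= 1;
            kept.push(line.to_string());
        } else {
            dropped += 1;
        }
    }

    kept.push(format!("... [{dropped} lines truncated]"));
    kept.join("\n")
}
```

Keeping the original line order means the model still sees each diagnostic next to whatever context survived, rather than a reshuffled digest.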
Bring in the staged verifier rework from upstream (Rust / Node-TS / Python adapters, quick+final phases, structured VerificationReport, final-gate loop), the new Verification message role, getrandom-based OAuth PKCE generation, and the Windows-compatible hook/MCP test infrastructure.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
- verifier: emit Unavailable reports on Node package.json IO/parse errors instead of silent None (Bug 2)
- verifier: drop dead final_phase param from verify_node/verify_python
- verifier: scope CARGO_TERM_COLOR=never to cargo invocations only
- verifier: remove dead VerificationReport::target()
- conversation: make_final_gate_reminder report_id now includes adapter_id and mutation_sequence (Bug 3: prevents collisions across adapters)
- conversation: when run_final_verification returns None, emit synthetic Unavailable report and advance ledger instead of silent continue (Bug 1)
- conversation: cap final-gate attempts at MAX_FINAL_GATE_ATTEMPTS=5 per (adapter, root); emit aborted Unavailable report on overflow (Bug 4)

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Previous fix keyed the counter only by (adapter, project_root), so when the model edited code mid-turn and mutation_sequence advanced, the prior attempts carried over and prematurely tripped the cap on otherwise-valid work. Key the counter by (adapter, root, mutation_sequence) so each snapshot gets its own budget.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
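A minimal sketch of the per-snapshot attempt budget described in this commit, assuming a plain `HashMap` ledger. Only `MAX_FINAL_GATE_ATTEMPTS = 5` and the key components (adapter, root, mutation_sequence) come from the commit messages; `GateKey`, `FinalGateLedger`, `try_attempt`, and the field types are hypothetical names for illustration.

```rust
use std::collections::HashMap;
use std::path::PathBuf;

/// Cap taken from the commit message.
const MAX_FINAL_GATE_ATTEMPTS: u32 = 5;

/// Hypothetical key type: one budget per (adapter, root, mutation_sequence).
/// The concrete field types are assumptions.
#[derive(Clone, Debug, PartialEq, Eq, Hash)]
struct GateKey {
    adapter_id: String,
    project_root: PathBuf,
    mutation_sequence: u64,
}

/// Hypothetical ledger tracking how many final-gate attempts each key has used.
#[derive(Default)]
struct FinalGateLedger {
    attempts: HashMap<GateKey, u32>,
}

impl FinalGateLedger {
    /// Returns true and records the attempt if the key still has budget;
    /// returns false once the cap for that exact key is exhausted.
    fn try_attempt(&mut self, key: GateKey) -> bool {
        let count = self.attempts.entry(key).or_insert(0);
        if *count >= MAX_FINAL_GATE_ATTEMPTS {
            return false;
        }
        *count += 1;
        true
    }
}
```

Because `mutation_sequence` is part of the key, a new edit starts from a fresh count instead of inheriting attempts from the previous snapshot, which is the behaviour this fix describes.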
Merges complete edit→verify→fix pipeline into the repo:

Trunk (phases 1-5):
- Rich StepDiagnostics across Rust/Node/Python adapters
- Change-scoped verification via nearest manifest walk
- VerificationReport content block with shadow/text/typed report modes
- RuntimeVerifierMode::Auto + CLAUDE_CODE_VERIFIER_AUTO
- Parallel bash validation wiring (permission_enforcer + tools)
- verifier_ran telemetry with adapter/phase/failure_kind/mutation_sequence

Post-trunk modules:
- runtime::critic: CriticPlanner with subagent_depth guard, diff thresholds (>=4 files OR >=200 lines OR >1 root), per-mutation dedup; wired into LiveCli post-turn pipeline
- runtime::rollout_metrics: aggregate(), evaluate_budget_gates(), samples_from_traces() with 1pp/5%/10%/15% regression limits
- promote_auto_skill + run_promote_auto_skill_cli: 3-fixture replay + human approval + 10% token budget gate for auto-generated skills
- Explicit marker-based adapter detector for auto-mode verification
- Dedicated bash permission parity tests

Telemetry: turn_completed now emits turn_latency_ms + tokens_total.

Validation: cargo fmt, clippy -D warnings, cargo test --workspace all green (+27 new tests across runtime, tools, cli).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
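The critic's diff thresholds (>=4 files OR >=200 lines OR >1 root) can be expressed as a small predicate. The sketch below is illustrative only: `DiffStats` and `should_run_critic` are hypothetical names, and reading the subagent_depth guard as "run only at depth 0" is an assumption, since the commit message does not spell out the guard's exact rule.

```rust
/// Hypothetical summary of a turn's changes.
struct DiffStats {
    files_changed: usize,
    lines_changed: usize,
    roots_touched: usize,
}

/// Hypothetical predicate mirroring the thresholds quoted in the commit
/// message: >=4 files OR >=200 lines OR >1 root.
fn should_run_critic(stats: &DiffStats, subagent_depth: u32) -> bool {
    // Assumption: the guard skips the critic inside subagents (depth > 0).
    if subagent_depth > 0 {
        return false;
    }
    stats.files_changed >= 4 || stats.lines_changed >= 200 || stats.roots_touched > 1
}
```

The three conditions are joined with OR, so a small diff that spans two project roots still qualifies even though it is far below the file and line thresholds.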
Latest main already contains the functional BrokenPipe tolerance in plugins::hooks::CommandWithStdin::output_with_stdin, but the only coverage for the original CI failure was the higher-level plugin hook test. Add a deterministic regression that exercises the exact low-level EPIPE path by spawning a hook child that closes stdin immediately while the parent writes an oversized payload.

This keeps the real root cause explicit: Linux surfaced BrokenPipe from the parent's stdin write after the hook child closed fd 0 early. Missing execute bits were not the primary bug.

Constraint: Keep the change surgical on top of latest main
Rejected: Re-open the production code path | latest main already contains the runtime fix
Rejected: Inflate HookRunner payloads in the regression | HOOK_* env injection hit ARG_MAX before the pipe path
Confidence: high
Scope-risk: narrow
Reversibility: clean
Directive: Keep BrokenPipe coverage near CommandWithStdin so future refactors do not regress the Linux EPIPE path
Tested: cargo test -p plugins hooks::tests::collects_and_runs_hooks_from_enabled_plugins -- --exact (10x)
Tested: cargo test -p plugins hooks::tests::output_with_stdin_tolerates_broken_pipe_when_child_closes_stdin_early -- --exact (10x)
Tested: cargo test --workspace
Not-tested: GitHub Actions rerun on the PR branch
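The BrokenPipe tolerance this commit refers to can be illustrated in isolation. The sketch below is not the body of `CommandWithStdin::output_with_stdin`; `write_tolerating_broken_pipe` and `FailingWriter` are hypothetical names, and the writer is a stand-in for a child's stdin so the behaviour can be exercised without spawning a process.

```rust
use std::io::{self, ErrorKind, Write};

/// Hypothetical helper: writes the payload, treating BrokenPipe (EPIPE) as
/// success because the reader closing early is not an error for the caller.
/// Any other I/O error is still propagated.
fn write_tolerating_broken_pipe<W: Write>(writer: &mut W, payload: &[u8]) -> io::Result<()> {
    match writer.write_all(payload) {
        Ok(()) => Ok(()),
        Err(err) if err.kind() == ErrorKind::BrokenPipe => Ok(()),
        Err(err) => Err(err),
    }
}

/// Test double standing in for a pipe: every write fails with the given kind.
struct FailingWriter(ErrorKind);

impl Write for FailingWriter {
    fn write(&mut self, _buf: &[u8]) -> io::Result<usize> {
        Err(io::Error::new(self.0, "simulated write failure"))
    }

    fn flush(&mut self) -> io::Result<()> {
        Ok(())
    }
}
```

Matching only on `ErrorKind::BrokenPipe` keeps the tolerance narrow, so an unrelated failure such as a permission error still surfaces to the caller.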
This isn't even a PR what 😭 you just used ai to generate and refactor a bunch of files
Summary
Anti-slop triage
Verification
git diff --check passes.
Resolution gate