Skip to content

fix(vendor): make scan --vendor / vendor / --prune just work#112

Closed
Mikola Lysenko (mikolalysenko) wants to merge 5 commits into
mainfrom
fix/vendor-ux
Closed

fix(vendor): make scan --vendor / vendor / --prune just work#112
Mikola Lysenko (mikolalysenko) wants to merge 5 commits into
mainfrom
fix/vendor-ux

Conversation

@mikolalysenko

Copy link
Copy Markdown
Collaborator

Summary

Running socket-patch scan --vendor on Flowise (pnpm monorepo, clean install) vendored only 7/10 patches, and --prune never cleaned vendored state. This PR fixes all four root causes, each verified live against Flowise (now 10/10 vendored, pnpm install --frozen-lockfile green, prune dry-run shows zero false positives).

1. Percent-decode purl components (fix(purl))

The patches API serves scoped purls percent-encoded (pkg:npm/%40modelcontextprotocol/sdk@1.12.0) and scan stores them verbatim as manifest keys — but neither the npm crawler nor the vendor coordinate parser decoded them, so lookups against node_modules/@scope/... never matched (package not installed from both apply and vendor), and detect_prunable saw every encoded entry as always-prunable.

  • New percent_decode_purl_component / normalize_purl / purl_eq in utils/purl.rs (strict, all-or-nothing, fail-safe passthrough; compare/display only — never path construction)
  • Decoding runs AFTER the /-and-@ splits and BEFORE the is_safe_* path guards, so %2e%2e/%2f cannot smuggle a traversal past them (RED tests included)
  • NpmCoords carries decoded name/version; base_purl and ledger keys stay verbatim-encoded for manifest parity

2. Vendor auto-force on content mismatch (feat(vendor))

The vendor stage is a private copy and every apply write path is hash-gated to exactly afterHash (archive + blob pre-write checks; the diff path self-disables on a base mismatch), so a beforeHash mismatch no longer fails the vendor: the stage is overwritten with the verified patched content and surfaced as a vendor_content_mismatch_overwritten warning event. Missing patch-target files still fail closed without --force (force's silent NotFound skip would pack an artifact without the fix).

  • Shared force_apply_staged policy used by all npm flavors + cargo/composer/gem/pypi/golang backends; dry runs predict the same outcome
  • First vendor of a package already patched in place by apply now emits applied (it packed + rewired this run) instead of a miscounted skipped/already_vendored
  • scan --vendor pre-verifies baselines and annotates mismatched packages before the confirm prompt
  • Real-world trigger: the flatted@3.3.1 patch record's beforeHashes match no published flatted artifact (bad data upstream, reported separately) — vendoring now succeeds with a clear warning instead of stranding the user

3. Take over exact-version override pins (feat(vendor))

A user-authored override pinning the exact version being vendored (Flowise: pnpm.overrides "tar-fs": "3.1.0") no longer refuses with vendor_override_conflict. The pin's key keeps its spelling on both pnpm surfaces (pnpm hard-requires the package.json and lock override maps to agree), its value moves to the file: spec, and the pinned value is recorded as the wiring original — every revert path restores the user's pin verbatim. Yarn berry bare-name resolutions get the symmetric treatment. Ranges, parent>child selector chains, different versions, and duplicate same-name keys still refuse (now with a hint). npm/yarn-classic/bun wire the lock only, so no conflict surface exists there.

4. Prune lifecycle for vendored packages (feat(scan))

scan --prune previously blanket-exempted vendored purls, so nothing ever cleaned unused vendored state. The prune pass now runs a vendored-state GC first (under the apply lock; contention degrades to a skip):

  • (a) entries whose patch left the manifest are reverted
  • (b) entries whose dependency left the lockfile graph are reverted and their manifest entries dropped — per-flavor in-use probes (pnpm scans packages:/snapshots: blocks, excluding the lingering overrides: declaration; package-lock/yarn/bun probe the lock text for the uuid dir). Undeterminable → keep, fail-safe; detached entries exempt (lockfile-invisible by design)
  • (c) orphan .socket/vendor/<eco>/<uuid> dirs are swept

JSON gc gains revertedVendoredEntries/removedVendorOrphanDirs (wet) and revertableVendoredEntries/vendorOrphanDirs (preview, which mirrors the wet pass's manifest drops so blob counts agree). CLI_CONTRACT.md updated (prune section, --force semantics, new warning code).

Test plan

  • ~60 new tests, RED-first where the bug was reproducible: decode/traversal units, mismatch-overwrite + missing-file + already-applied npm/pnpm/pypi units, pnpm + yarn-berry takeover round-trips (byte-identical revert oracles), in-use probe units, run_vendor_gc units (dropped/unused/dry-run/detached/orphans), in-process vendor envelope tests, and scan e2e (encoded scoped purl end-to-end incl. --prune exemption, pre-prompt annotation, prune reverting an unused vendored entry)
  • cargo test --workspace --all-features: 101/102 suites green; the one red is the pre-existing setup_matrix_composer aspirational suite ("BASELINE GAP … non-blocking in CI"), untouched by this branch
  • cargo clippy --workspace --all-features clean; corepack-gated npm/pnpm build capstones ran real package managers
  • Live on Flowise: scan --vendor → "Vendored 3 package(s); 13 skipped; 0 failed" (previously 0 vendored, 2 failed, 1 refused); pnpm install --frozen-lockfile accepts the takeover surgery; installed flatted hashes to afterHash 6/6 files; scan --prune --dry-run reports nothing revertable on the fully in-use project

🤖 Generated with Claude Code

The patches API serves scoped purls percent-encoded
(pkg:npm/%40scope/name@1.0.0) and scan stores them verbatim as manifest
keys, but neither the npm crawler nor the vendor coordinate parser
decoded them — so apply/vendor reported scoped packages as 'package not
installed', and detect_prunable saw every encoded entry as prunable.

- utils/purl.rs: percent_decode_purl_component (strict, all-or-nothing,
  fail-safe passthrough), normalize_purl + purl_eq (compare/display
  only, never path construction)
- npm_crawler parse_purl_components, vendor parse_npm_purl (NpmCoords
  now owns decoded name/version; base_purl stays verbatim for ledger/
  manifest key parity), parse_jsr_purl: decode AFTER /-and-@ splits,
  BEFORE the is_safe_* guards — %2e%2e/%2f cannot smuggle traversal
- detect_prunable + purl_matches_identifier compare normalized forms
- human output shows the decoded purl; JSON keeps verbatim keys

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…y-applied events

The vendor stage is a private copy and every apply write path is
hash-gated to exactly afterHash, so a beforeHash mismatch (a patch built
against different bytes than the installed artifact, or a package
already patched in place by apply) no longer fails the vendor: the
stage is overwritten with the verified patched content and the
overwrite surfaces as a vendor_content_mismatch_overwritten warning
event. Missing patch-target files still fail closed without --force
(force's silent NotFound skip would pack an artifact without the fix).

- shared force_apply_staged / missing_existing_patch_files /
  mismatch_overwrite_warnings policy helpers in vendor/mod.rs, used by
  all npm flavors (via stage_patch_pack) + cargo/composer/gem/pypi/
  golang backends; dry runs predict the same outcome
- vendor.rs: gate the already_vendored rewrite on entry.is_none() — the
  first vendor of an in-place-applied package now emits Applied (it
  packed + rewired this run) instead of a miscounted skip
- scan --vendor: pre-prompt baseline check annotates mismatched
  packages before the confirm prompt (best-effort, read-only)
- --force narrowed to missing-file tolerance + variant-probe bypass;
  CLI_CONTRACT.md documents the new warning code

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
A user-authored override/resolution that pins the package to exactly
the version being vendored (Flowise: pnpm.overrides 'tar-fs': '3.1.0')
no longer refuses with vendor_override_conflict. The pin's key is kept
(its spelling and quoting preserved on both pnpm surfaces — pnpm
hard-requires the package.json and lock override maps to agree), its
VALUE is rewritten to the file:.socket/vendor/... spec, and the pinned
value is recorded as the wiring original so every revert path (--revert,
reconcile, remove) restores the user's pin verbatim.

- pnpm: classify_pkg_override (Insert / Ours / Takeover) replaces the
  boolean conflict checks; effective key threads through EditCtx,
  apply_pkg_override and edit_overrides; revert restores originals in
  place instead of deleting. Ranges, different versions, parent>child
  selector chains, and duplicate same-name keys still refuse, now with
  a hint that exact pins are taken over.
- yarn berry: bare-name resolutions pin equal to the version is taken
  over symmetrically (KIND_RESOLUTION records the original).
- npm/yarn-classic/bun wire the lock only (no override surface), so
  no conflict exists there to take over.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
scan --prune previously blanket-exempted vendored purls, so nothing
ever cleaned unused vendored state: dropped patches kept their
artifacts and overrides forever, removed dependencies stayed redirected,
and orphan uuid dirs were only swept by vendor --revert.

The prune pass now runs a vendored-state GC first (under the apply
lock; contention degrades to a skip, never a scan failure):

(a) entries whose patch is gone from the manifest are reverted
    (same stale test as the vendor flows' reconcile_dropped);
(b) entries whose dependency left the lockfile graph are reverted and
    their manifest entries dropped, feeding the same pass's blob sweep.
    Per-flavor in-use probes: pnpm scans packages:/snapshots: blocks
    for the artifact (the mirrored overrides: declaration alone is not
    usage); package-lock/yarn/bun probe the lock text for the uuid dir
    (those flavors wire resolutions into the lock itself). None =
    cannot determine = keep, fail-safe; detached entries are exempt
    (lockfile-invisible by design);
(c) orphan .socket/vendor/<eco>/<uuid> dirs are swept (extracted from
    run_revert into a shared sweep_orphan_vendor_dirs).

JSON gc gains revertedVendoredEntries/removedVendorOrphanDirs (wet)
and revertableVendoredEntries/vendorOrphanDirs (preview, which also
mirrors the wet pass's manifest drops so blob counts agree); human
output gains a GC summary line. CLI_CONTRACT.md updated.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…ne lifecycle

- scan_vendor_e2e: full pipeline with the API's percent-encoded scoped
  purl form (download -> vendor lookup against node_modules/@scope ->
  lock rewiring -> prune exemption); interactive pre-prompt baseline
  annotation + auto-force warning; scan --prune reverting an unused
  vendored entry (ledger + manifest + artifact + lock all reconciled)
- clippy: too_many_arguments allow on stage_patch_pack, JsrPurlParts
  type alias

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
@mikolalysenko

Copy link
Copy Markdown
Collaborator Author

Folded into #114, which now targets main and carries this branch's commits (the stack #112#113#114 was combined into a single PR).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant