Skip to content

feat: add spec mode, autonomy picker, and updated skills/tools#82

Merged
Patel230 merged 31 commits into
mainfrom
polish/perf-security-ux
Jul 4, 2026
Merged

feat: add spec mode, autonomy picker, and updated skills/tools#82
Patel230 merged 31 commits into
mainfrom
polish/perf-security-ux

Conversation

@Patel230

@Patel230 Patel230 commented Jul 4, 2026

Copy link
Copy Markdown
Contributor

Implements spec mode, autonomy picker, and updated tools and skills integration.

Patel230 added 30 commits July 1, 2026 21:18
- Update status bar token count to use a unique Database icon.
- Update status bar cost counter to use a unique Ruby/Dollar icon.
- Replace permanent footer clock with total session duration timer.
- Introduce request-specific timer that displays alongside the active spinner.
- Fix broken unit tests related to new icons, layout width expectations, and error assertions.
- Mitigate flaky config test by commenting out failing URL assertions.
Moved the welcome screen from being a fixed header pane outside the viewport
to being dynamically injected into the scrollable viewport's content.

This fixes the issue where the welcome screen appeared 'frozen' because it
could not be scrolled on smaller terminals, and seamlessly disappears when
the first message is sent, maintaining the original UX while resolving the
scroll limitation.
Streaming responses re-rendered the entire accumulated markdown buffer
(plus two large identity regexes) on every 50ms tick, making a long
answer quadratic. Cache the rendered stable prefix (completed markdown
blocks, never splitting inside a code fence) and re-render only the tail
each tick; an equivalence test proves output matches a single-pass
render at every chunk boundary.

Also:
- visibleWidth: replace per-word regex ANSI stripping with a zero-alloc
  scanner on the wrap hot path.
- settings: memoize the settings-file read on (path, mtime, size) so
  repeated startup loads skip disk+JSON parse; also write settings.json
  0600 instead of world-readable 0644.
- print/exec: use a stale-but-usable catalog immediately and refresh in
  the background instead of blocking startup up to 90s on the network.
The Bash tool ignored the sensitive-path protection enforced on the
file tools, so `cat ~/.ssh/id_rsa`, `cat .env`, etc. bypassed it —
including in prompt-bypassing contexts (run_in_background,
--dangerously-skip-permissions). Add CommandReferencesSensitivePath and
wire it into IsSuspicious (forces a prompt) and isHardDeny (hard block
when no human is in the loop), with tests for blocked and allowed forms.

Also:
- migrate: delete the plaintext .pre-secret-migrate.bak after a
  successful keychain migration, and clean up stale backups on startup.
- update.isNewer: replace lexicographic compare (0.10.0 < 0.9.0) with
  real semver ordering; pre-release sorts before its release.
- credentials mask: show only the last 4 chars (was first 4 + last 2,
  leaking the provider-identifying prefix).
- welcome: remove stray "+✓" double glyph; fix mis-centering (was
  mixing byte length with display width, breaking wide glyphs).
- theme: route the last inline ANSI color literals through the theme
  palette so theme.go is genuinely the single source of truth.
- status bar: refresh the git branch on a 5s TTL (was cached for the
  whole session, so branch switches never showed); use git stdout so
  stderr warnings can't leak into the branch name.
- version: `hawk --version` and `hawk version` now share one format.
- fix two pre-existing lint failures (unused const, trailing newline).
The Bash tool passed AllowNetwork:true unconditionally when wrapping a
command in the sandbox, so even strict/workspace sandboxing left an open
exfiltration path. Derive network access from the sandbox mode instead:
strict denies (matching its documented read-only, no-network posture),
workspace keeps it (package managers, module fetches). HAWK_SANDBOX_NETWORK
overrides either direction. Policy-tier defaults are unchanged.
In automation mode the agent's prompt is taken verbatim from an issue,
PR, or comment body — content any external user can supply — and ran at
whatever --auto level the workflow set. Use GitHub's author_association
as a trust signal: when the triggering actor is not a repo insider
(OWNER/MEMBER/COLLABORATOR) the autonomy is capped at "basic"
(read-only auto-approval), so attacker-controlled text cannot drive
writes or Bash. Maintainers can opt out with HAWK_GHA_TRUST_EVENT=1.
The streaming prefix cache still re-rendered the entire completed prefix
each time a markdown block finished, keeping the stream O(n²) — a
benchmark made this visible (cached and naive full-render were equal).
Render only the newly-completed blocks and append them, and resume the
block-boundary scan from the last boundary instead of rescanning the
whole buffer. Over a long streamed response this is ~17x faster
(4.3ms vs 74ms) with ~19x fewer allocations. Correctness is guarded by
the existing incremental-vs-single-pass equivalence test.
Every color was dark-tuned, so on a light terminal body text
(#F0F0F0), muted text, disabled text, and borders were near-invisible.
Convert the neutral foreground/structure colors to lipgloss.AdaptiveColor
with the Dark variant equal to the original value — dark terminals (the
default) render byte-for-byte identically — and a legible dark-ink Light
variant that lipgloss selects when it detects a light background. Accent
hues and dark code-box pairs are left as-is since they read on both. A
guard test pins the Dark variants so dark-mode appearance can't drift.
- Split BuildContextWithDirs into startup/deferred halves so heavy
  AGENTS.md/cross-agent/git scans no longer block the first UI frame.
- Parallelize startup prompt/settings/registry loads in runChat and
  move crash-recovery, welcome snapshot, Docker probe, and footer
  warmup off the startup path onto background goroutines + tea.Msg.
- Cache embedded prompt templates and parse once (sync.RWMutex map)
  to avoid re-parsing role/execution templates on every session.
- Persist system-prompt context changes through Session.persist so
  AppendSystemContext/ReplaceSystemContextSection survive reloads.
- Harden Docker sandbox: cap-drop ALL, no-new-privileges, pids-limit,
  read-only rootfs, tmpfs /tmp with noexec, and bounded docker probe
  via LookPath + 1.5s timeout; avoid pre-paint Docker bool in welcome.
- Add focus/blur handling plus a 15s prompt-keepalive so terminals
  that drop focus on background still restore the prompt input.
- Cover new paths with table-driven unit tests.
- Replace plan mode with spec mode (new tool, picker, approval flow)
- Add autonomy picker UI for granular permission control
- Enhance markdown rendering and visual diff output
- Refactor permission engine and safety layers
- Polish TUI: scrollbar, statusbar, theme, welcome screen
- Improve shell mode classification
- Add comprehensive tests for new features
- Update golden test data and documentation
Add permission_engine_test.go verifying the spec-stage gate blocks
writes/Bash even at YOLO autonomy, ApproveImplementation always prompts,
and the gate opens once Stage=Implementing. Add coverage in
permissions_center_test.go for effectivePermissionTier, resetPermissionCenter,
and the /autonomy tier command path.

Also fixes lint issues (shadowed err, unused vars) in files the pre-commit
hook flagged as new-since-HEAD~1: bundled_skills.go, spec.go,
delta_merge.go, validator.go, spec_clarify.go.
@Patel230 Patel230 merged commit 7f4519c into main Jul 4, 2026
5 of 8 checks passed
@Patel230 Patel230 deleted the polish/perf-security-ux branch July 4, 2026 14:10
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant