diff --git a/rfc/0005-sandbox-proxy-egress-adapter/README.md b/rfc/0005-sandbox-proxy-egress-adapter/README.md new file mode 100644 index 000000000..cbbb97374 --- /dev/null +++ b/rfc/0005-sandbox-proxy-egress-adapter/README.md @@ -0,0 +1,404 @@ +--- +authors: + - "@johntmyers" +state: draft +links: + - https://github.com/NVIDIA/OpenShell/issues/1107 + - https://github.com/NVIDIA/OpenShell/pull/1083 + - https://github.com/NVIDIA/OpenShell/pull/1151 + - https://github.com/NVIDIA/OpenShell/pull/1286 + - https://github.com/NVIDIA/OpenShell/pull/1511 +--- + +# RFC 0005 - Sandbox Proxy Egress Adapter Model + + + +## Summary + +Refactor sandbox egress around one shared authorization and relay pipeline. +CONNECT, forward HTTP, native TCP capture, policy DNS, `inference.local`, +`policy.local`, and metadata loopback should become narrow adapters that +translate userland entry points into common runtime intents. Policy evaluation, +destination validation, credential injection, request-body rewrite, WebSocket +handling, protocol parsing, and upstream dialing should happen behind shared +boundaries. + +The codebase has already moved in this direction by splitting networking into +`openshell-supervisor-network` and process/netns work into +`openshell-supervisor-process`. This RFC proposes the next internal boundary: +make proxy entry mechanisms pluggable without duplicating authorization, +destination validation, or relay behavior. + +Supporting detail lives in: + +- [Current shape appendix](current-shape.md) +- [Technical design appendix](technical-design.md) +- [Implementation plan](implementation-plan.md) + +## Motivation + +The sandbox proxy supports several connection surfaces: explicit CONNECT, +forward HTTP, local inference and policy APIs, metadata loopback, TLS +termination, REST and GraphQL inspection, WebSocket inspection, credential +injection, and nftables-backed bypass detection. 
These features are valuable, +but changes to policy and enforcement still tend to touch multiple entry paths. + +The risk is asymmetric enforcement. A security fix can be added to CONNECT and +missed in forward HTTP; endpoint metadata can be selected differently from the +logged policy; a credential path can gain request-body or WebSocket support +without the same behavior existing in another relay mode. + +The target shape separates three concerns: + +- **Adapters** describe how userland reached the networking component. +- **Authorization** decides whether the egress is allowed and what endpoint + behavior applies. +- **Relays** own bytes, credentials, protocol parsing, and upstream dialing. + +This also prepares the proxy for future deployment modes. Today the proxy runs +inside the sandbox supervisor process. The networking leaf can already run in a +network-only mode, and a future standalone binary or sidecar should be possible +if it implements the same userland surfaces, gateway APIs, and policy +enforcement contracts. + +## Non-goals + +- Replace CONNECT with forward proxy as the only explicit proxy mode. +- Add SOCKS support. +- Add HTTP/2 L7 parsing in this refactor. Inspected HTTP paths should continue + to reject unsupported h2c upgrades instead of silently upgrading to raw + traffic. +- Redesign provider credential storage. +- Reintroduce iptables as the sandbox packet filtering backend. +- Use eBPF connect hooks for transparent capture. Native TCP capture needs a + userland proxy in the byte stream for TLS termination and protocol parsing. + +## Proposal + +### Migration Big Rocks + +1. **Transport and local-service adapters.** CONNECT, forward HTTP, + transparent TCP, policy DNS, `inference.local`, `policy.local`, and metadata + loopback become small adapters. They parse their surface and produce either + an egress intent, a local response, or a DNS answer. They do not duplicate + policy evaluation. +2. 
**Egress intent and decision.** Shared authorization evaluates L4 policy and + endpoint selection once per connection intent and returns one decision + containing the matched policy, matched endpoint, process identity, allowed + IP metadata, TLS behavior, protocol enforcement, and credential injection + plan. +3. **Relays.** Relays receive an authorized destination connector, not an + already-open upstream socket. HTTP relays evaluate every request before + upstream write. TCP relays copy bytes for L4-only endpoints or hand the + stream to a protocol parser when endpoint policy requires native protocol + enforcement. + +### Unified Adapter Flow + +```mermaid +flowchart TD + User["Userland payload / harness"] + + subgraph ExplicitProxy["Explicit proxy listener"] + ProxyBytes["HTTP proxy bytes"] + IsConnect{"CONNECT request?"} + Connect["CONNECT adapter"] + Forward["Forward HTTP adapter"] + ProxyBytes --> IsConnect + IsConnect -- Yes --> Connect + IsConnect -- No --> Forward + end + + subgraph NativeTcp["Policy DNS + native TCP"] + NameLookup["Userland DNS lookup"] + PolicyDns["Policy DNS adapter"] + DnsAnswer["DNS answer + active mapping"] + NativeConnect["Userland connect(ip:port)"] + TcpAdapter["Transparent TCP adapter"] + NameLookup --> PolicyDns + PolicyDns --> DnsAnswer + DnsAnswer --> NativeConnect + NativeConnect --> TcpAdapter + end + + subgraph LocalApis["Sandbox-local services"] + InferenceReq["Request to inference.local"] + PolicyReq["Request to policy.local"] + MetadataReq["Request to metadata loopback"] + InferenceAdapter["Inference local adapter"] + PolicyAdapter["Policy local adapter"] + MetadataAdapter["Metadata loopback adapter"] + InferenceReq --> InferenceAdapter + PolicyReq --> PolicyAdapter + MetadataReq --> MetadataAdapter + end + + subgraph Shared["Shared external egress pipeline"] + Intent["EgressIntent"] + Auth["Authorize and select endpoint"] + Decision["EgressDecision"] + Validate["Resolve and validate destination"] + Relay["Relay"] + 
Deny["Adapter-specific deny response"] + Intent --> Auth + Auth --> Allowed{"Allowed?"} + Allowed -- No --> Deny + Allowed -- Yes --> Decision + Decision --> Validate + Validate --> Relay + end + + User --> ProxyBytes + User --> NameLookup + User --> NativeConnect + User --> InferenceReq + User --> PolicyReq + User --> MetadataReq + + Connect --> Intent + Forward --> Intent + TcpAdapter --> Intent + InferenceAdapter --> InferenceResp["Local inference response"] + PolicyAdapter --> PolicyResp["Local policy response"] + MetadataAdapter --> MetadataResp["Local metadata credential response"] +``` + +Each adapter still owns its response shape. If authorization denies a CONNECT +intent, the CONNECT adapter returns a tunnel denial. If forward HTTP is denied, +the forward adapter returns an HTTP denial. If policy DNS refuses a name, it +returns the appropriate DNS response. The shared layer decides the outcome; the +adapter renders it for its protocol. + +### Relay Flow + +```mermaid +flowchart TD + Start["Authorized egress + destination connector"] + Start --> FirstReq{"First HTTP request already parsed?"} + + FirstReq -- Yes --> ForwardMode{"decision.endpoint.enforcement"} + ForwardMode -- "None" --> HttpCred["HTTP relay
credential injection only"] + ForwardMode -- "Http" --> HttpL7["HTTP relay<br/>REST/GraphQL/WebSocket policy"] + ForwardMode -- "TcpApplication" --> BadForward["Deny: HTTP request for TCP app endpoint"] + + FirstReq -- No --> TlsPolicy{"TLS handling skipped?"} + TlsPolicy -- Yes --> Readable["Readable client stream"] + TlsPolicy -- No --> Peek["Peek client bytes"] + Peek --> Tls{"TLS ClientHello?"} + Tls -- Yes --> Terminate["Shared TLS terminator"] + Tls -- No --> Readable + Terminate --> Readable + + Readable --> Enforce{"decision.endpoint.enforcement"} + Enforce -- "None" --> Sniff{"Looks like HTTP?"} + Sniff -- Yes --> HttpCred + Sniff -- No --> TcpRelay["TcpRelay<br/>byte copy"] + + Enforce -- "Http" --> MustHttp{"Looks like HTTP?"} + MustHttp -- Yes --> HttpL7 + MustHttp -- No --> DenyHttp["Deny: expected HTTP"] + + Enforce -- "TcpApplication" --> TcpParser["TcpRelay<br/>protocol parser owns loop"] + + HttpCred --> ReqLoop["HTTP request loop"] + HttpL7 --> ReqLoop + ReqLoop --> ReqPolicy{"Request allowed?"} + ReqPolicy -- No --> ReqDeny["Local HTTP deny<br/>no upstream write"] + ReqPolicy -- Yes --> StaticCreds["Resolve static placeholders"] + StaticCreds --> TokenGrant["Apply endpoint token grant if configured"] + TokenGrant --> Rewrite["Rewrite configured credential slots"] + Rewrite --> HttpDial["Connect or reuse upstream"] + HttpDial --> HttpResponse["Write request and relay response"] + HttpResponse --> Upgrade{"101 WebSocket upgrade?"} + Upgrade -- No --> ReqLoop + Upgrade -- Yes --> WsInspect{"WebSocket inspection or rewrite configured?"} + WsInspect -- No --> RawUpgrade["Raw upgraded stream"] + WsInspect -- Yes --> WsRelay["WebSocket relay
text-frame rewrite / message policy"] + + TcpParser --> ParserDial["Parser calls connector when protocol allows"] + TcpRelay --> TcpDial["Connect upstream"] + TcpDial --> ByteCopy["Copy bytes"] +``` + +Relay rules: + +- HTTP credential injection happens in both HTTP modes: L4-only HTTP and + HTTP-inspected. +- Credential injection includes static placeholder rewrite and endpoint-bound + dynamic token grants. Token grants run after policy allow and before upstream + write; failures deny without forwarding the request. +- Static credential rewrite covers request target, query, headers, opt-in REST + request bodies, and opt-in client-to-server WebSocket text frames. +- HTTP L7 policy is evaluated before upstream write for each request. +- WebSocket upgrade policy is evaluated as HTTP first. After an allowed `101` + upgrade, the WebSocket relay owns frame parsing when text-frame credential + rewrite, WebSocket transport policy, GraphQL-over-WebSocket policy, or safe + compression handling is configured. Other upgraded streams remain raw. +- Forward HTTP must stay in the shared HTTP relay loop or in an equivalent + guarded single-request relay. It must not evaluate one request and then + switch to raw bidirectional copy. +- `protocol: tcp` or an omitted protocol means L4 authorization plus byte copy, + except that HTTP-looking streams may still use HTTP credential injection. +- Future native protocol parsers, such as Redis, Postgres, or MySQL, own the + full message loop and can parse multiple commands or queries on one TCP + session. + +### Adapter Responsibilities + +CONNECT remains the generic explicit proxy mode for HTTPS and arbitrary TCP. +The CONNECT adapter parses `CONNECT host:port` into an `EgressIntent`, asks the +shared authorization boundary for an `EgressDecision`, returns the tunnel-ready +response only after the connection is allowed, and then hands the tunnel to the +relay. 
The upstream connection is opened by the HTTP relay or TCP parser when +payload policy allows it. + +Forward HTTP is compatibility for clients that send absolute-form HTTP +requests. The adapter parses the first request, rewrites proxy framing only at +the relay boundary, rejects `https://` absolute-form requests, rejects +unsupported h2c upgrades on inspected routes, and either stays in a shared HTTP +request loop or forces `Connection: close` for a guarded single request. + +Transparent TCP is for native clients that do not know they are using a proxy. +It depends on policy DNS and nftables capture: DNS answers create active +endpoint mappings, userland later calls `connect(ip:port)`, nftables redirects +matching traffic to a userland listener, and the TCP adapter recovers the +original destination before building an intent. + +Policy DNS replaces static `/etc/hosts` snapshots for native TCP names. It is +query-driven: check whether the name is policy-eligible, resolve through +trusted DNS, filter returned IPs, publish the active endpoint mapping, and +answer userland. The later `connect(ip:port)` still runs through normal +authorization. + +Local service adapters stay outside the normal external egress relay: +`inference.local` routes chat, completion, model discovery, embeddings, and +provider-specific inference traffic through the router with local limits; +`policy.local` exposes current policy, denial summaries, proposal submission, +and proposal wait routes; metadata loopback serves provider metadata +credentials to SDKs that bypass HTTP proxy variables. + +### Network Enforcement Substrate + +Current main uses nftables for sandbox bypass enforcement. It accepts +proxy-bound traffic, loopback, and established flows, then rejects and +optionally logs other TCP/UDP traffic for the bypass monitor. That is +enforcement, not native TCP capture. 
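The rule order described above can be pictured as a small decision function. This is a minimal sketch for illustration only: the real enforcement is done by nftables rules, not Rust code, and `Proto`, `Packet`, `Verdict`, and `classify` are hypothetical names that do not come from the codebase. Traffic that is neither TCP nor UDP is left as `Unspecified` because the text above does not say how it is handled.

```rust
// Illustrative model only: real enforcement lives in nftables rules inside the
// sandbox network namespace. Every name in this block is hypothetical.

/// Transport protocol of the packet being classified.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Proto {
    Tcp,
    Udp,
    Other,
}

/// The packet attributes that the described rules look at.
#[derive(Debug, Clone, Copy)]
pub struct Packet {
    pub proto: Proto,
    /// Destination is the sandbox proxy listener.
    pub to_proxy: bool,
    /// Destination is a loopback address.
    pub to_loopback: bool,
    /// Packet belongs to an already established flow.
    pub established: bool,
}

/// Outcome of the modeled rule chain.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Verdict {
    Accept,
    /// Rejected; `logged` is true when a log entry feeds the bypass monitor.
    Reject { logged: bool },
    /// Not covered by the rules described in this section.
    Unspecified,
}

/// Applies the accept rules first, then the reject (and optional log) rule.
pub fn classify(packet: Packet, log_rejects: bool) -> Verdict {
    if packet.to_proxy || packet.to_loopback || packet.established {
        return Verdict::Accept;
    }
    match packet.proto {
        Proto::Tcp | Proto::Udp => Verdict::Reject { logged: log_rejects },
        Proto::Other => Verdict::Unspecified,
    }
}
```

Nothing in this model redirects a packet into the proxy, which is the distinction the section draws between enforcement and native TCP capture.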
+ +```mermaid +flowchart TD + Packet["Userland packet"] --> ProxyDest{"Proxy destination?"} + ProxyDest -- Yes --> AcceptProxy["nftables accept"] + ProxyDest -- No --> Capture{"Future native TCP capture match?"} + Capture -- Yes --> Redirect["nftables redirect/TPROXY to transparent adapter"] + Capture -- No --> Reject["nftables log + reject bypass"] + Reject --> Monitor["Bypass monitor emits OCSF"] +``` + +Transparent TCP work should extend this nftables model with explicit capture +rules that run before the reject path and are scoped to active policy DNS +mappings. It should not add a parallel iptables path. + +### Deployment Modes + +| Mode | Shape | Status | +|------|-------|--------| +| Embedded supervisor | `openshell-sandbox` orchestrates `openshell-supervisor-network` and `openshell-supervisor-process` | Current | +| Network-only supervisor | Networking, policy, proxy, local services, and background tasks run without a payload process leaf | Current runtime mode | +| Standalone proxy binary | Supervisor launches networking as a separate process with explicit APIs | Future packaging/API work | +| Sidecar proxy | Proxy runs outside the payload container but inside the sandbox boundary | Future isolation mode | + +A pluggable proxy must expose the right userland surfaces, implement the +gateway APIs it needs, and prove equivalent policy enforcement through tests. +The nftables rules that force or reject userland traffic belong to the sandbox +network boundary even if the proxy process later moves into a standalone binary +or sidecar. + +## Implementation plan + +The detailed migration plan lives in [implementation-plan.md](implementation-plan.md). +The intended order is: + +1. Add regression coverage around the current split, forward HTTP invariants, + endpoint selection, token grants, WebSocket/body rewrite, metadata loopback, + and nftables bypass enforcement. +2. Introduce `EgressIntent` and `EgressDecision` inside + `openshell-supervisor-network`. +3. 
Move destination validation and endpoint metadata materialization behind the + shared decision and connector boundary. +4. Consolidate forward HTTP, CONNECT HTTP inspection, credential injection, + request-body rewrite, and WebSocket handling behind shared HTTP/WebSocket + relay code. +5. Move TLS detection and termination ahead of the HTTP/TCP relay split. +6. Add the TCP relay/parser boundary, then policy DNS and native TCP capture. +7. Treat local services and deployment modes as explicit runtime contracts. + +## Risks + +- Tightening endpoint metadata failures from fail-open to deny may expose + latent policy or Rego errors. +- Deterministic endpoint selection may reject policies that currently load but + only work by accident. +- Token grants add a runtime dependency on SPIFFE Workload API and token + endpoints. Failures should remain fail-closed and sanitized. +- Transparent TCP capture adds network namespace interception complexity and + must coexist with the nftables bypass reject/log table. +- Sidecar mode needs a reliable identity source for binary/path scoped policy. +- Metadata loopback and `policy.local` expand sandbox-local control surfaces + and need strict route validation, body limits, redaction, and authentication + boundaries. +- Provider-composed policy rules use a reserved namespace. Decisions and logs + must distinguish provider-derived policy from user-authored policy without + exposing provider rules as editable sandbox proposals. + +## Alternatives + +### Keep patching each entry path + +This has the lowest short-term cost but keeps security behavior duplicated +across CONNECT, forward HTTP, and local services. It also makes future TCP +application protocol support harder because each parser must be wired through +multiple entry mechanisms. + +### Replace CONNECT with forward proxy + +Forward proxy only covers plaintext absolute-form HTTP requests. It is not a +replacement for HTTPS tunnels, WebSocket tunnels, or arbitrary TCP clients. 
+CONNECT should remain the generic explicit proxy mode. + +### Build only transparent TCP + +Transparent TCP helps native clients but does not replace explicit proxy +support used by common HTTP tooling. It also requires policy DNS and nftables +capture before it can safely preserve endpoint identity. + +## Prior art + +The current `openshell-supervisor-network` split is the immediate prior step: +it already separates proxy, OPA, L7, inference routing, policy-local routes, +TLS, and token grants from process supervision. + +The current `openshell-supervisor-process` netns and bypass monitor are the +packet-enforcement substrate. Transparent TCP should extend that nftables +model rather than creating a second firewall path. + +The existing L7 relay is the behavioral prior art for this RFC. It already +proves per-request HTTP evaluation, GraphQL parsing, WebSocket frame handling, +request-body rewrite, and token-grant injection can live behind relay +boundaries. + +## Open questions + +1. Should overlapping endpoint metadata be rejected at policy load time, or + should policy name plus endpoint index define precedence? +2. Should direct IP connects to a policy-DNS-resolved TCP endpoint be accepted, + or should DNS query correlation be required for stricter modes? +3. What TTL cap and stale-generation grace period should policy DNS use? +4. Which process identity source should sidecar mode use when it cannot inspect + payload process metadata through local `/proc`? +5. Which proxy capabilities should be negotiated with the gateway at startup? +6. Should metadata loopback be modeled as an adapter inside + `openshell-supervisor-network`, or remain orchestrated by `openshell-sandbox` + with shared credential/provider helpers? 
diff --git a/rfc/0005-sandbox-proxy-egress-adapter/current-shape.md b/rfc/0005-sandbox-proxy-egress-adapter/current-shape.md new file mode 100644 index 000000000..d4090e4f6 --- /dev/null +++ b/rfc/0005-sandbox-proxy-egress-adapter/current-shape.md @@ -0,0 +1,223 @@ +# Current Shape Appendix + +This appendix records the current proxy shape and the review findings that +motivate the adapter model. The main RFC intentionally keeps these details out +of the direction document. + +## Current Runtime Split + +The proxy is no longer only a large module inside `openshell-sandbox`. +Current main has three relevant runtime owners: + +```mermaid +flowchart TD + Sandbox["openshell-sandbox
orchestrator"] + Network["openshell-supervisor-network<br/>proxy, OPA, L7, TLS, inference,<br/>policy.local, token grants"] + Process["openshell-supervisor-process<br/>process leaf, SSH, netns,
nftables, bypass monitor"] + Denials["Denial/activity aggregators"] + Gateway["Gateway policy/provider APIs"] + + Sandbox --> Network + Sandbox --> Process + Network --> Denials + Process --> Denials + Sandbox --> Gateway + Network --> Gateway +``` + +`openshell-sandbox` creates the shared network namespace, owns denial/activity +channels, starts the policy poll loop, starts networking, starts the metadata +loopback server when needed, and then optionally starts the process leaf. If +`process_enabled` is false, the supervisor can run in network-only mode and +keep networking/background tasks alive until shutdown. + +`openshell-supervisor-network` owns the explicit proxy listener, OPA engine +integration, L7 enforcement, TLS termination, inference routing, policy-local +routes, identity cache, provider credential injection, and token grants. + +`openshell-supervisor-process` owns process execution, SSH, network namespace +helpers, nftables bypass rules, and the bypass monitor that turns nftables LOG +entries into OCSF events. + +## Current Userland-Facing Surfaces + +The networking surface currently includes: + +- CONNECT proxy traffic for HTTPS and generic TCP tunnels. +- Forward HTTP proxy traffic for absolute-form HTTP requests. +- `inference.local` for local inference routing. +- `policy.local` for current policy, denial summaries, proposal submission, + and proposal wait routes. +- GCE metadata loopback for SDKs that bypass HTTP proxy variables. +- nftables bypass enforcement for direct TCP/UDP egress that does not enter + the proxy. +- OPA/Rego policy and endpoint metadata lookups. +- DNS resolution and endpoint validation for CONNECT and forward HTTP egress. +- Static provider credential injection and redaction. +- Endpoint-bound dynamic token grant injection. +- Opt-in REST request-body credential rewrite. +- L7 REST, GraphQL, WebSocket, and GraphQL-over-WebSocket enforcement. + +The issue is not that these features exist. 
The issue is that entry mechanisms, +policy evaluation, endpoint metadata lookup, credential injection, and byte +relay decisions are still interleaved. + +## Current CONNECT Shape + +```mermaid +flowchart TD + Client["Client CONNECT host:port"] --> Parse["Parse CONNECT target"] + Parse --> L4["Evaluate network policy"] + L4 --> Allowed{"Allowed?"} + Allowed -- No --> Deny["CONNECT denial"] + Allowed -- Yes --> Meta["Query endpoint metadata"] + Meta --> Config{"L7, TLS, or credential config?"} + Config -- No --> Tunnel["Return tunnel-ready response"] + Config -- Yes --> Tunnel + Tunnel --> Inspect["Inspect tunneled bytes when possible"] + Inspect --> Relay["HTTP/WebSocket/TCP relay selection"] + Relay --> Inject["Static credentials and token grants if configured"] + Inject --> Upstream["Open upstream when relay policy allows"] +``` + +CONNECT is still the strongest entry shape because the tunnel relay can keep +parsing HTTP requests on long-lived connections and enforce request policy per +request. + +## Current Forward HTTP Shape + +```mermaid +flowchart TD + Client["Absolute-form HTTP request"] --> Parse["Parse first request"] + Parse --> L4["Evaluate network policy"] + L4 --> Allowed{"Allowed?"} + Allowed -- No --> Deny["HTTP denial"] + Allowed -- Yes --> L7{"Matching L7 endpoint?"} + L7 -- Yes --> Eval["Evaluate REST/GraphQL/WebSocket policy"] + Eval --> Guard["Reject unsupported h2c upgrade when inspected"] + Guard --> Rewrite["Rewrite to origin-form + configured credentials"] + L7 -- No --> Rewrite + Rewrite --> Token["Apply token grant if endpoint-bound"] + Token --> Close["Force Connection: close except WebSocket upgrade"] + Close --> Upstream["Open upstream"] + Upstream --> Relay["Guarded HTTP relay / upgrade relay"] +``` + +Latest main no longer has the old raw-copy-after-first-request shape for +ordinary forward HTTP. 
It rewrites ordinary requests with `Connection: close`, +uses guarded HTTP relay helpers for body handling, rejects inspected h2c +upgrades, injects token grants, and sends allowed WebSocket upgrades through +the upgrade relay. That is a narrower surface than the historical bidirectional +copy, but it is still orchestrated separately from the CONNECT relay path. + +## Current Local Service Shape + +```mermaid +flowchart TD + Request["Request to local name"] --> Match{"Known local route?"} + Match -- "inference.local" --> Inference["Inference route adapter"] + Match -- "policy.local" --> Policy["Policy local adapter"] + Match -- "metadata loopback" --> Metadata["Metadata credential server"] + Match -- No --> External["Normal egress path"] + Inference --> InferenceResp["Local inference response"] + Policy --> PolicyResp["Local policy response"] + Metadata --> MetadataResp["Metadata response"] +``` + +`inference.local` now covers buffered and streaming inference shapes including +chat/completion routes, model discovery, embeddings, and provider-specific +routes. `policy.local` supports the agentic approval loop: agents can submit +narrow proposals and wait on approval/reload before retrying. Metadata +loopback exists for provider credentials consumed by SDKs that do not honor +HTTP proxy variables. + +These are userland-facing network surfaces. They should stay distinct from +external egress while still fitting the adapter model. 
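The local route match in the diagram above can be sketched as a pure function that runs before any external egress handling. This is a hedged illustration, not the current implementation: `LocalRoute` and `match_local_route` are hypothetical names, and treating metadata loopback as just another host comparison is a simplifying assumption, so its host is passed in as a parameter rather than hard-coded.

```rust
// Sketch of the "Known local route?" branch. All names are hypothetical.

/// Where a request should be handled.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum LocalRoute {
    /// Served by the `inference.local` adapter.
    Inference,
    /// Served by the `policy.local` adapter.
    Policy,
    /// Served by the metadata credential server.
    Metadata,
    /// Not a local name: falls through to the normal egress path.
    External,
}

/// Classifies a request host. `metadata_host` is an assumption of this sketch;
/// the RFC does not state how metadata loopback requests are recognized.
pub fn match_local_route(host: &str, metadata_host: &str) -> LocalRoute {
    // DNS names are case-insensitive and may carry a trailing root dot.
    let host = host.strip_suffix('.').unwrap_or(host);
    if host.eq_ignore_ascii_case("inference.local") {
        LocalRoute::Inference
    } else if host.eq_ignore_ascii_case("policy.local") {
        LocalRoute::Policy
    } else if host.eq_ignore_ascii_case(metadata_host) {
        LocalRoute::Metadata
    } else {
        LocalRoute::External
    }
}
```

Keeping the match separate from authorization mirrors the point above: local services are distinct from external egress, yet each one still sits behind a narrow, named entry point.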
+ +## Current Network Namespace Enforcement + +```mermaid +flowchart TD + Start["Process in sandbox network namespace"] --> Dest{"Destination"} + Dest -- "Proxy host_ip:port" --> Proxy["Accept to sandbox proxy"] + Dest -- "Loopback" --> Loopback["Accept loopback"] + Dest -- "Established/related" --> Established["Accept response packet"] + Dest -- "Other TCP/UDP" --> Reject["nftables log + reject"] + Reject --> Monitor["Bypass monitor reads dmesg"] + Monitor --> OCSF["OCSF network + detection events"] +``` + +The process leaf installs an `inet` nftables filter table for bypass +enforcement. The table accepts proxy-bound traffic, loopback, and established +flows, then rejects and optionally logs other TCP/UDP traffic. It does not +currently redirect native TCP connections into the proxy. + +## Findings To Preserve + +### Invariant: forward proxy must not relay unevaluated follow-on HTTP bytes + +The historical forward path evaluated at most the first absolute-form request, +rewrote it, then switched to bidirectional copy. Bytes already buffered after +the first header block, or later pipelined requests on the same client/upstream +connection, could reach upstream without the CONNECT L7 relay's per-request +parser/evaluator. + +Latest main mitigates this by forcing ordinary forward HTTP to one request per +connection and by using guarded relay helpers. The adapter model should +preserve the invariant either by keeping forward HTTP single-request/close or +by passing the first parsed request into a shared HTTP relay loop. + +### Endpoint config is not tied to deterministic matched policy + +The policy name used for L4 authorization and logging can be selected through a +different precedence rule than endpoint metadata. With overlapping host, port, +and binary rules, allowed IPs, TLS behavior, enforcement, and +`allow_encoded_slash` can come from a different endpoint than the policy name +logged and used for L4 allow. 
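To make the mismatch concrete, the sketch below selects the policy name and the endpoint settings from a single record, so the two cannot diverge. It is an assumption-laden illustration: `Candidate` and `select_endpoint` are hypothetical, and ordering by policy name and then endpoint index is only one of the precedence options raised in the RFC's open questions, not a decided rule. Rejecting overlapping endpoints at policy load time is the other option.

```rust
// Hypothetical sketch: one record carries both the policy name and the
// endpoint settings, so logging and enforcement read from the same match.

/// One endpoint rule that matched the host, port, and binary of an intent.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Candidate {
    pub policy_name: String,
    pub endpoint_index: usize,
    /// Mirrors the `tls: skip` opt-out.
    pub tls_skip: bool,
    pub allow_encoded_slash: bool,
}

/// Picks exactly one candidate using a total order on
/// (policy_name, endpoint_index). The chosen precedence is an assumption.
pub fn select_endpoint(candidates: &[Candidate]) -> Option<&Candidate> {
    candidates.iter().min_by(|a, b| {
        a.policy_name
            .cmp(&b.policy_name)
            .then(a.endpoint_index.cmp(&b.endpoint_index))
    })
}
```

Because the caller receives one `Candidate`, the policy name that gets logged and the `allow_encoded_slash` or TLS behavior that gets enforced always describe the same rule, whatever order the candidates arrived in.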
+ +The adapter model requires authorization to return one decision with one +deterministic matched endpoint. + +### Endpoint metadata query failures should not erase enforcement + +If endpoint metadata lookup fails, callers can interpret the result as no L7 +configuration and downgrade to credential-only or raw L4 relay. + +The adapter model treats endpoint metadata as part of the authorization result. +Failure to materialize required metadata should deny rather than erase extended +configuration. + +### Destination validation must be shared + +Private address checks, `allowed_ips`, exact declared private endpoint trust, +trusted gateway aliases, SSRF checks, and control-plane port blocks have grown +over time. They should be centralized so CONNECT, forward HTTP, future +transparent TCP, and local-service egress use the same resolved-destination +rules. + +## Existing Feature Inventory + +The refactor should preserve: + +- CONNECT explicit proxy support. +- Forward HTTP explicit proxy support. +- Network-only supervisor mode. +- nftables bypass reject/log enforcement. +- Provider credential injection and redaction. +- Dynamic token grant injection through SPIFFE-backed provider credentials. +- REST request-body credential rewrite. +- WebSocket text-frame credential rewrite. +- REST endpoint method/path policy. +- GraphQL-over-HTTP policy. +- WebSocket transport and GraphQL-over-WebSocket policy. +- h2c rejection on inspected HTTP routes. +- Inference routing through `inference.local`, including embeddings. +- Agent-facing policy advisor routes through `policy.local`. +- GCE metadata loopback for supported provider credentials. +- Timeout and resource tracking for client, upstream, and local service work. +- Structured OCSF logging for network and HTTP policy outcomes. +- SSRF and internal address protections. +- Exact declared private endpoint handling. +- Control-plane port protection. +- `allowed_ips` endpoint restrictions. 
+- TLS auto-detection and termination for inspectable client connections. diff --git a/rfc/0005-sandbox-proxy-egress-adapter/implementation-plan.md b/rfc/0005-sandbox-proxy-egress-adapter/implementation-plan.md new file mode 100644 index 000000000..3b00a512b --- /dev/null +++ b/rfc/0005-sandbox-proxy-egress-adapter/implementation-plan.md @@ -0,0 +1,168 @@ +# Implementation Plan + +This plan is intentionally separate from the main RFC so the proposal can stay +direction-focused. + +## Phase 0 - Regression Tests + +- Add tests for forward HTTP pipelining and keep-alive follow-on requests, + including the current `Connection: close` mitigation. +- Add tests for forward HTTP h2c rejection on inspected endpoints. +- Add tests for overlapping endpoint metadata selection. +- Add tests for endpoint metadata query failures. +- Add tests for control-plane port blocking through all destination validation + paths. +- Add tests for exact declared private endpoint trust and `allowed_ips` + behavior across CONNECT and forward HTTP. +- Add tests proving static credential injection works in L4-only HTTP and + HTTP-inspected paths. +- Add tests proving token grant success injects the configured header and token + grant failure does not forward upstream. +- Add tests for REST request-body credential rewrite, WebSocket text-frame + credential rewrite, WebSocket GraphQL policy, and compression handling. +- Add tests for `policy.local` proposal wait behavior and `inference.local` + buffered/streaming route limits. +- Add tests for metadata loopback startup/failure behavior when provider + credentials require it. +- Add nftables bypass enforcement tests that verify proxy-bound traffic is + accepted while direct TCP/UDP egress is rejected and logged when available. + +## Phase 1 - Authorization Result + +- Introduce `EgressIntent` and `EgressDecision` inside + `openshell-supervisor-network`. +- Make authorization return matched policy and matched endpoint metadata + together. 
+- Include policy source on the decision: user-authored, provider-derived, or + local-service internal. +- Include protocol enforcement and credential injection plan on the decision. +- Fail closed when required endpoint metadata cannot be materialized. +- Emit consistent OCSF network denial events from the shared boundary. + +## Phase 2 - Shared Destination Validation + +- Move DNS resolution, allowed IP filtering, SSRF checks, exact declared + endpoint handling, trusted gateway aliases, and control-plane port checks + into one destination validation path. +- Return an `UpstreamConnector` rather than an opened upstream socket. +- Add tests proving CONNECT, forward HTTP, and future transparent TCP use the + same validation behavior. + +## Phase 3 - Forward HTTP Adapter + +- Convert forward HTTP into an adapter that parses the first absolute-form + request and builds an egress intent. +- Route the parsed first request into the shared HTTP relay or preserve the + current guarded single-request relay behavior. +- Preserve `https://` absolute-form rejection. +- Preserve h2c rejection on inspected routes. +- Keep the no-raw-copy invariant after the first request. + +## Phase 4 - HTTP, WebSocket, And Credential Relay Consolidation + +- Centralize HTTP request parsing, REST policy, GraphQL policy, WebSocket + upgrade policy, credential resolution, redaction, request rewrite, upstream + dial, and response relay. +- Evaluate every HTTP request before upstream write. +- Ensure denied HTTP requests do not create upstream TCP sessions. +- Preserve static placeholder rewrite for target, query, and headers. +- Preserve dynamic token grant injection after request allow and before + upstream write. +- Preserve opt-in REST request-body credential rewrite behind the shared HTTP + relay, including bounded buffering, supported content-type handling, + `Content-Length` recomputation, and fail-closed unresolved placeholders. 
+- Preserve WebSocket upgrade handling behind the shared relay, including + opt-in client-to-server text-frame credential rewrite, WebSocket transport + message policy, GraphQL-over-WebSocket policy, and raw passthrough for other + upgraded protocols. + +## Phase 5 - Shared TLS Termination + +- Move client-side TLS detection and termination before the HTTP/TCP relay + split. +- Keep endpoint TLS behavior on `EgressDecision`. +- Treat `tls: skip` as the explicit opt-out for TLS handling. +- Remove duplicate HTTP-specific and TCP-specific TLS termination decisions. + +## Phase 6 - TCP Relay And Parser Boundary + +- Use `TcpRelay` for byte relay and TCP application parser dispatch. +- Keep `protocol: tcp` or omitted protocol as L4 authorization plus byte copy. +- Add a TCP application parser dispatch point for future protocol enforcement. +- Let TCP application parsers own their message loop and call the connector + when protocol state allows. + +## Phase 7 - Policy DNS And Transparent TCP + +- Add policy DNS registration for native TCP endpoint names. +- Replace static host-file mapping with query-driven DNS answers. +- Publish active DNS answer state and capture rules. +- Implement nftables REDIRECT/TPROXY capture rules ahead of the bypass reject + path; do not add a parallel iptables path. +- Coordinate capture rule ownership with `openshell-supervisor-process::netns`. +- Implement transparent TCP adapter lookup from captured original destination + to active endpoint generation. +- Decide TTL and stale-generation behavior. + +## Phase 8 - Local Service Adapters + +- Model `inference.local` as a local adapter with TLS termination, route + validation, provider auth injection, streaming/buffered limits, and OCSF + logging. +- Model `policy.local` as a local adapter for current policy, bounded denial + summaries, policy proposals, and proposal wait. 
+- Decide whether metadata loopback remains orchestrated in `openshell-sandbox` + or moves behind a local adapter boundary in `openshell-supervisor-network`. +- Keep these paths outside normal external egress relay while preserving + credential redaction and route validation. + +## Phase 9 - Runtime Boundary + +- Keep embedded supervisor mode as the first migration target. +- Treat the existing `openshell-supervisor-network` and + `openshell-supervisor-process` split as the structural baseline. +- Define the proxy runtime API needed for a future standalone binary: + configured listeners, policy updates, provider credentials, token grants, + gateway calls, telemetry, denial/activity events, and shutdown. +- Identify process identity requirements for standalone and sidecar modes. +- Add capability negotiation with the gateway if standalone proxy versions can + differ from gateway versions. + +## Phase 10 - Cleanup + +- Remove duplicated endpoint metadata queries from relay paths. +- Remove duplicated destination validation and deny rendering where adapters + can own response shape. +- Remove any remaining forward HTTP raw-copy fallback. +- Remove stale references to iptables or static `/etc/hosts` native TCP + mapping from proxy design docs. +- Update architecture docs once implementation lands. + +## Testing Plan + +- Unit-test each adapter's intent construction and deny response shape. +- Unit-test authorization precedence for overlapping policy and endpoint rules. +- Unit-test provider-derived rule namespace handling and `policy.local` + filtering. +- Integration-test shared destination validation across CONNECT, forward HTTP, + and transparent TCP. +- Integration-test HTTP keep-alive and pipelined requests with REST, GraphQL, + and WebSocket upgrade enforcement. +- Integration-test credential injection in L4-only HTTP and HTTP-inspected + paths. 
+- Integration-test token grant success, cache hit, malformed token, resolver + unavailable, and token endpoint failure. +- Integration-test REST request-body credential rewrite for JSON, + form-url-encoded, `text/*`, unsupported content types, chunked framing, body + caps, and unresolved placeholders. +- Integration-test WebSocket text-frame credential rewrite, raw upgraded + passthrough, WebSocket message policy, GraphQL-over-WebSocket policy, and + safe compression negotiation. +- Integration-test TLS termination before HTTP/TCP relay split. +- Integration-test `protocol: tcp` byte-copy behavior. +- Add parser harness tests before adding Redis, Postgres, or similar TCP + application parsers. +- Integration-test policy DNS TTL, stale generation handling, and captured + connect correlation. +- Integration-test `inference.local`, `policy.local`, and metadata loopback + body limits, timeout behavior, redaction, and local denial responses. diff --git a/rfc/0005-sandbox-proxy-egress-adapter/technical-design.md b/rfc/0005-sandbox-proxy-egress-adapter/technical-design.md new file mode 100644 index 000000000..837a638d9 --- /dev/null +++ b/rfc/0005-sandbox-proxy-egress-adapter/technical-design.md @@ -0,0 +1,338 @@ +# Technical Design Appendix + +This appendix carries implementation-level design details behind the main RFC. + +## Existing Runtime Boundary + +`openshell-supervisor-network::run::run_networking` is the current networking +startup boundary. It builds policy-local context, waits for policy binary +symlink resolution, creates the identity cache, writes the TLS CA, builds TLS +state, resolves inference routes, wires provider credentials and token grants, +and starts the proxy. + +This is a useful outer boundary, but it is not yet the proxy adapter boundary. +The proxy still needs internal `EgressIntent` and `EgressDecision` boundaries +so CONNECT, forward HTTP, local routes, and future native TCP capture do not +duplicate policy and relay orchestration. 
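
The concern about CONNECT and forward HTTP duplicating policy orchestration can be illustrated with a minimal sketch. The types and function names below (`Transport`, `Intent`, `connect_intent`, `forward_http_intent`, `authorize`) are hypothetical simplifications, not the real `openshell-supervisor-network` API. The sketch assumes each adapter only parses its own surface, and that one shared function makes the allow/deny call:

```rust
// Hypothetical, simplified stand-ins for the RFC's intent types.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Transport {
    Connect,
    ForwardHttp,
}

#[derive(Debug, Clone, PartialEq, Eq)]
struct Intent {
    transport: Transport,
    host: String,
    port: u16,
}

/// CONNECT adapter: parses an authority-form target such as `example.com:443`.
/// IPv6 literals are out of scope for this sketch.
fn connect_intent(target: &str) -> Option<Intent> {
    let (host, port) = target.rsplit_once(':')?;
    if host.is_empty() {
        return None;
    }
    Some(Intent {
        transport: Transport::Connect,
        host: host.to_ascii_lowercase(),
        port: port.parse::<u16>().ok()?,
    })
}

/// Forward HTTP adapter: parses an absolute-form `http://` target.
/// Any other scheme, including `https://`, is rejected here.
fn forward_http_intent(target: &str) -> Option<Intent> {
    let rest = target.strip_prefix("http://")?;
    let authority = rest.split('/').next()?;
    let (host, port) = match authority.rsplit_once(':') {
        Some((h, p)) => (h, p.parse::<u16>().ok()?),
        None => (authority, 80),
    };
    if host.is_empty() {
        return None;
    }
    Some(Intent {
        transport: Transport::ForwardHttp,
        host: host.to_ascii_lowercase(),
        port,
    })
}

/// The single shared authorization step. Neither adapter evaluates policy itself.
/// The allow-list is a placeholder for real policy evaluation.
fn authorize(intent: &Intent, allowed: &[(&str, u16)]) -> bool {
    allowed
        .iter()
        .any(|(host, port)| *host == intent.host && *port == intent.port)
}
```

Because both adapters reduce to the same `Intent`, a policy fix made in `authorize` applies to both entry paths.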
+ +## Shared Data Boundaries + +### EgressIntent + +`EgressIntent` is the normalized description of what userland is trying to do. + +It should carry: + +- entry transport: CONNECT, forward HTTP, transparent TCP, local HTTP, policy + DNS, or metadata loopback; +- requested destination host/port or captured original IP/port; +- process identity inputs collected by the adapter/runtime; +- optional first HTTP request for forward proxy traffic; +- optional local service route; +- policy generation or DNS mapping generation when relevant. + +Adapters build intents. They should not query endpoint metadata, select TLS +mode, or select relays. + +### EgressDecision + +`EgressDecision` is the policy result consumed by validation and relay code. + +It should carry: + +- allow or deny; +- deterministic matched policy identifier; +- whether the policy is user-authored, provider-derived, or local-service + internal; +- deterministic matched endpoint identifier and endpoint metadata; +- process identity used for evaluation; +- destination and allowed IP constraints; +- TLS behavior; +- protocol enforcement; +- credential injection plan; +- logging context and denial reason. + +Relay code should read this decision. It should not query OPA again for +endpoint metadata, TLS mode, allowed IPs, credential behavior, or parser +selection. 
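
As a rough illustration of relay code reading the decision instead of re-querying policy, the sketch below selects a relay purely from a decision value. The reduced types (`Outcome`, `Enforcement`, `Relay`, `Endpoint`, `Decision`) and `select_relay` are hypothetical. The sketch also assumes that an allow without endpoint metadata is treated as a denial, which is one possible reading of the fail-closed requirement:

```rust
// Hypothetical, reduced versions of the decision types; names are illustrative.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Outcome {
    Allow,
    Deny,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Enforcement {
    None,
    Http,
    TcpApplication,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Relay {
    Denied,
    TcpByteCopy,
    Http,
    TcpParser,
}

struct Endpoint {
    enforcement: Enforcement,
}

struct Decision {
    outcome: Outcome,
    endpoint: Option<Endpoint>,
}

/// Pure function of the decision: no second policy query is needed to pick a relay.
fn select_relay(decision: &Decision) -> Relay {
    match (decision.outcome, decision.endpoint.as_ref()) {
        (Outcome::Deny, _) => Relay::Denied,
        // Assumption: missing endpoint metadata on an allow fails closed.
        (Outcome::Allow, None) => Relay::Denied,
        (Outcome::Allow, Some(endpoint)) => match endpoint.enforcement {
            Enforcement::None => Relay::TcpByteCopy,
            Enforcement::Http => Relay::Http,
            Enforcement::TcpApplication => Relay::TcpParser,
        },
    }
}
```

Keeping the selection a pure function of the decision keeps the logged policy and the applied relay behavior from drifting apart.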
+ +## Protocol Enforcement + +Use a protocol enforcement value derived from endpoint policy: + +| Policy protocol | Enforcement | Relay behavior | +|-----------------|-------------|----------------| +| omitted / `tcp` | None | L4 authorization plus byte relay, with optional HTTP sniff for credential injection | +| `rest` | HTTP | HTTP request parser with REST rules, plus opt-in request-body and WebSocket text-frame credential rewrite | +| `graphql` | HTTP | HTTP request parser with GraphQL-over-HTTP rules | +| `websocket` | HTTP | HTTP upgrade policy followed by WebSocket frame policy or GraphQL-over-WebSocket policy | +| future `redis`, `postgres`, `mysql`, ... | TCP application | Protocol-specific TCP parser owns the message loop | + +`protocol: tcp` is effectively the default L4 mode. It should not run TCP +application parsers. Avoid using the term "provider" for parser concepts +because providers are already a first-class credential and routing domain in +OpenShell. + +## Suggested Types + +The exact Rust shape can evolve, but the boundaries should look like this: + +```rust +enum EgressTransport { + Connect, + ForwardHttp, + TransparentTcp, + PolicyDns, + LocalHttp, + MetadataLoopback, +} + +struct EgressIntent { + transport: EgressTransport, + destination: RequestedDestination, + process: ProcessIdentity, + first_request: Option, + local_route: Option, + generation: Option, +} + +struct EgressDecision { + outcome: PolicyOutcome, + matched_policy: Option<MatchedPolicy>, + endpoint: Option<MatchedEndpoint>, + log_context: EgressLogContext, +} + +struct MatchedPolicy { + id: PolicyId, + source: PolicySource, +} + +enum PolicySource { + User, + ProviderDerived, + LocalService, +} + +struct MatchedEndpoint { + id: EndpointId, + allowed_ips: AllowedIpPolicy, + tls: TlsPolicy, + enforcement: ProtocolEnforcement, + credentials: CredentialInjectionPlan, +} + +enum ProtocolEnforcement { + None, + Http(HttpL7Config), + TcpApplication(TcpApplicationConfig), +} + +enum HttpL7Protocol { + Rest, +
Graphql, + Websocket, +} + +struct HttpL7Config { + protocol: HttpL7Protocol, + path: EndpointPathScope, + allow_encoded_slash: bool, + enforcement_mode: L7EnforcementMode, + websocket_credential_rewrite: bool, + request_body_credential_rewrite: bool, + websocket_graphql_policy: bool, + graphql_max_body_bytes: usize, +} + +struct CredentialInjectionPlan { + static_placeholders: StaticPlaceholderPlan, + token_grant: Option<TokenGrantPlan>, +} + +struct StaticPlaceholderPlan { + http_target_query_header: bool, + rest_request_body: bool, + websocket_text_frames: bool, +} + +struct TokenGrantPlan { + provider_key: String, + auth_style: TokenGrantAuthStyle, + token_endpoint: String, +} + +struct RelayContext { + decision: EgressDecision, + connector: UpstreamConnector, + deadlines: RelayDeadlines, + telemetry: RelayTelemetry, +} +``` + +`UpstreamConnector` is the relay-owned dial boundary. It encapsulates the +validated destination and lets relays/parsers open an upstream connection only +after protocol policy allows it.
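
One way to picture the connector boundary is a value that carries the already-validated address and is only invoked after request policy passes. The sketch below is a simplification under stated assumptions: `Connector`, its `dial` method, and `relay_request` are hypothetical names, the dial is simulated with a counter rather than a real socket, and the method allow-list stands in for real REST policy:

```rust
use std::cell::Cell;
use std::net::SocketAddr;

/// Hypothetical stand-in for `UpstreamConnector`. It holds a destination that
/// is assumed to have passed destination validation already.
struct Connector {
    validated: SocketAddr,
    dials: Cell<u32>,
}

impl Connector {
    fn new(validated: SocketAddr) -> Self {
        Self { validated, dials: Cell::new(0) }
    }

    /// Simulated dial: a real connector would open a TCP stream here.
    /// The counter lets a test observe whether a dial happened.
    fn dial(&self) -> SocketAddr {
        self.dials.set(self.dials.get() + 1);
        self.validated
    }
}

/// Evaluates request policy first and touches the connector only on allow.
fn relay_request(
    method: &str,
    allowed_methods: &[&str],
    connector: &Connector,
) -> Result<SocketAddr, &'static str> {
    if !allowed_methods.contains(&method) {
        return Err("denied by request policy");
    }
    Ok(connector.dial())
}
```

With this shape, a denied request cannot create an upstream session, because the relay never reaches the only code path that dials.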
+ +## Current Owners And Proposed Cleanup + +| Current owner | Current responsibility | Proposed cleanup | +|---------------|------------------------|------------------| +| `openshell-sandbox` | Orchestrator, policy poll loop, denial/activity channels, metadata loopback startup, network-only lifecycle | Keep as orchestration; avoid embedding per-entry proxy policy decisions | +| `openshell-supervisor-network::run` | Networking startup and handles | Become the stable runtime API for embedded and future standalone modes | +| `openshell-supervisor-network::proxy` | CONNECT, forward HTTP, local route dispatch, destination validation, denial rendering | Split into adapters, authorization, destination, relay selection, and adapter response rendering | +| `openshell-supervisor-network::opa` | Policy engine and Rego queries | Return deterministic `EgressDecision` data instead of separate policy and endpoint lookups | +| `openshell-supervisor-network::l7` | REST, GraphQL, WebSocket, inference helpers, TLS, token grants | Keep as protocol/relay implementation behind shared relay boundaries | +| `openshell-supervisor-network::policy_local` | `policy.local` state and routes | Model as a local adapter with explicit limits and proposal/wait behavior | +| `openshell-supervisor-process::netns` | nftables bypass rules and namespace helpers | Remain owner of bypass enforcement; coordinate future capture rules with network proxy mappings | +| `openshell-supervisor-process::bypass_monitor` | nftables LOG parsing and OCSF bypass telemetry | Remain telemetry producer for bypass violations | +| `openshell-core::secrets` and provider credential state | Static placeholder sources and dynamic credential metadata | Feed credential injection plans; do not leak secrets into decision logs | + +## Policy DNS And Resolved TCP State + +Policy DNS should be query-driven rather than a static `/etc/hosts` snapshot. + +1. Policy load registers eligible native TCP endpoint names. +2. 
Userland performs DNS lookup. +3. Policy DNS checks whether the name is registered for native TCP. +4. Policy DNS resolves through trusted upstream DNS. +5. Answers are filtered against endpoint metadata and SSRF controls. +6. The adapter publishes the DNS answer, endpoint generation, and capture rule. +7. Userland later calls `connect(ip:port)`. +8. Transparent TCP recovers the original destination and maps it to the active + endpoint generation. +9. Normal egress authorization and relay selection run. + +The resolved endpoint store is therefore not a preemptive global DNS snapshot. +It is active state produced by policy-eligible lookups and consumed by +transparent TCP connects. + +## nftables Boundary + +Current main uses nftables, not iptables, for sandbox network bypass +enforcement. The installed `inet` table accepts traffic to the sandbox proxy, +loopback, and established/related flows, then rejects and optionally logs other +TCP/UDP traffic. The bypass monitor reads those log lines and emits OCSF +network and detection events. + +Transparent TCP capture should build on this same nftables substrate: + +- capture rules must run before the generic bypass reject rules; +- capture rules should be scoped to active policy DNS IP/port mappings; +- capture state should be updated atomically with endpoint generation changes; +- reject/log rules remain the fallback for unmatched TCP/UDP egress; +- VM or Podman driver nftables rules are infrastructure NAT/isolation and + should not be treated as the proxy policy enforcement point. + +## Endpoint Selection And OPA + +OPA/Rego should return policy and endpoint metadata through one deterministic +authorization result. It should not let policy name and endpoint config be +selected by different precedence rules. + +Two acceptable approaches: + +- Reject overlapping endpoint metadata at load or merge time. +- Define a single deterministic precedence key and use it for both policy name + and endpoint metadata. 
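
The second approach can be sketched as follows. The `Candidate` type and the specific key (exact-host matches first, then policy name, then endpoint identifier) are assumptions chosen for illustration; the RFC does not fix the key. What matters is that policy name and endpoint metadata are read from the same selected candidate:

```rust
/// Hypothetical overlapping match produced by policy evaluation.
#[derive(Debug, Clone, PartialEq, Eq)]
struct Candidate {
    policy: String,
    endpoint: String,
    exact_host: bool,
}

/// Assumed precedence: exact-host matches sort first, then policy name,
/// then endpoint identifier. `false < true`, hence the negation.
fn precedence_key(c: &Candidate) -> (bool, &str, &str) {
    (!c.exact_host, c.policy.as_str(), c.endpoint.as_str())
}

/// Picks one candidate; callers read both the policy name and the endpoint
/// metadata from the returned value, never from separate lookups.
fn select(candidates: &[Candidate]) -> Option<&Candidate> {
    candidates
        .iter()
        .min_by(|a, b| precedence_key(a).cmp(&precedence_key(b)))
}
```

Since the key is a total order over the candidate's own fields, the result does not depend on the order in which the policy engine happens to enumerate matches.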
+ +Endpoint metadata query failures should fail closed when metadata is required +for the selected endpoint. They should not silently downgrade to L4 behavior. + +Provider-derived policies use a reserved rule-name namespace. The gateway and +sandbox sync should prevent user-authored `_provider_*` rules, and +`policy.local` proposal surfaces should not expose provider-derived rules as +editable user policy. `EgressDecision` should still identify provider-derived +matches for logging and debugging. + +## Credential Injection Boundary + +Credential injection belongs in the HTTP/WebSocket relay after policy allow and +before upstream write. + +1. Authorization selects the endpoint and computes a credential injection plan. +2. The HTTP relay resolves credentials only when it has an allowed request. +3. Static placeholder values are resolved and redacted from logs. +4. Endpoint-bound token grants obtain or reuse a dynamic access token. +5. The final upstream request or WebSocket frame is rewritten immediately + before write. + +Both L4-only HTTP and HTTP-inspected paths can inject credentials. The +difference is whether REST, GraphQL, or WebSocket policy is evaluated before +the rewrite. + +Credential rewrite slots should be explicit: + +- request target, query values, and headers for HTTP-family traffic; +- REST request bodies only when `request_body_credential_rewrite` is enabled; +- client-to-server WebSocket text frames only when + `websocket_credential_rewrite` is enabled; +- GraphQL-over-WebSocket connection/control messages when they are carried in + text frames and the endpoint enables the WebSocket rewrite path; +- token grant headers for endpoint-bound provider credentials. + +Request-body rewrite is REST-only. 
It should buffer bounded UTF-8 textual +bodies, including JSON, form-url-encoded, and `text/*`, recompute +`Content-Length`, preserve unsupported bodies that contain no reserved +credential markers, and fail closed when a reserved placeholder cannot be +resolved safely. Binary WebSocket frames are not rewritten. + +Token grants are dynamic credential injection. They use provider metadata to +request a SPIFFE JWT-SVID, exchange it for an OAuth2 access token, cache the +token, and inject either an `Authorization: Bearer` header or a configured +custom header. Token grant failures should return a local relay error and must +not forward the request upstream. + +## Parser Boundary + +Protocol parsers operate on streams owned by the relay. + +- HTTP parsing converts bytes into request metadata, evaluates request policy, + and loops for keep-alive or pipelined requests. +- WebSocket parsing starts only after an allowed HTTP upgrade. It validates the + handshake/frame stream and owns client-to-server text-frame inspection when + credential rewrite, transport message policy, GraphQL-over-WebSocket policy, + or compression handling is configured. +- TCP application parsers read client and upstream streams as needed and own + their message loop. +- A TCP parser can deny before dialing, dial for a server handshake, or keep + evaluating commands/queries throughout the session. + +This avoids a separate dial strategy enum. The parser knows which protocol +milestone is sufficient to call the validated connector. + +## Local Service Adapter Boundary + +Local services are network surfaces but not normal external egress: + +- `inference.local` terminates local client traffic, validates known inference + routes, strips caller auth, injects provider routing/auth, and applies + streaming or buffered limits based on route type. +- `policy.local` serves policy snapshots, denial summaries, proposal + submission, and proposal wait. 
It should never expose secrets or provider + rules as editable policy. +- Metadata loopback serves provider metadata credentials for SDKs that bypass + HTTP proxy variables. It should use the same provider credential state and + redaction discipline as other credential paths. + +These adapters may call gateway APIs or local credential helpers, but they +should not bypass policy and credential invariants that apply to external +egress. + +## Timeout And Resource Ownership + +| Owner | Resource | +|-------|----------| +| Adapter | Client-side parse timeout and adapter-specific deny response | +| Authorization | OPA deadline and policy evaluation telemetry | +| Destination validator | DNS timeout, allowed IP checks, SSRF checks, control-plane port checks | +| TLS terminator | Client TLS handshake timeout and certificate selection | +| HTTP relay | Per-request read/write deadlines, body caps, request-body rewrite caps, upstream reuse | +| WebSocket relay | Upgrade validation, frame limits, text-frame rewrite, compression limits, message policy | +| TCP relay | Byte-copy idle timeout and half-close handling | +| TCP parser | Protocol message timeouts and parser-specific limits | +| Local service adapter | Local route body limits, response caps, gateway call timeout | +| Token grant resolver | SPIFFE Workload API timeout, token endpoint timeout, cache TTL | + +Timeouts should be recorded in telemetry at the owner boundary that can explain +the failure.
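
A minimal sketch of owner-attributed timeout telemetry, assuming each stage reports its budget and measured elapsed time. `Owner`, `TimeoutEvent`, and `first_timeout` are hypothetical names, and only a few owners from the table are modeled:

```rust
use std::time::Duration;

/// A subset of the owners in the table above, for illustration only.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Owner {
    Adapter,
    Authorization,
    DestinationValidator,
    HttpRelay,
}

/// Hypothetical telemetry record naming the boundary that can explain the failure.
#[derive(Debug, PartialEq, Eq)]
struct TimeoutEvent {
    owner: Owner,
    budget: Duration,
    elapsed: Duration,
}

/// Returns an event for the first stage, in pipeline order, that exceeded its budget.
/// Each tuple is (owner, budget, elapsed).
fn first_timeout(stages: &[(Owner, Duration, Duration)]) -> Option<TimeoutEvent> {
    stages
        .iter()
        .find(|(_, budget, elapsed)| elapsed > budget)
        .map(|&(owner, budget, elapsed)| TimeoutEvent { owner, budget, elapsed })
}
```

Attributing the event to the first stage that overran keeps a slow OPA evaluation from being reported as a relay timeout further down the pipeline.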