[awf] ARC/DinD: remaining workarounds needed for zero-config chroot mode on Kubernetes runners #4399

## Context

Despite significant ARC/DinD improvements in AWF (PRs #2839, #2843, #3218, #3554, #3852, #3914, #4026), real-world users on ARC/DinD runners still require a composite action with ~100 lines of shell workarounds to get agentic workflows running. See [gh-aw#34896 comment](https://github.com/github/gh-aw/issues/34896#issuecomment-4600100004) for the full workaround code.

The core infrastructure (path prefix, socket detection, /etc synthesis) is fixed. This issue tracks the **remaining gaps** that prevent a truly zero-workaround experience.

---

## Gap 1: Copilot HOME/identity vars not forwarded in chroot mode

**Problem:** AWF chroot mode passes `HOME=/home/runner`, `USER=root`, `LOGNAME=root` to the agent exec regardless of `engine.env` settings. The Copilot CLI can't write to `~/.copilot` in the chrooted DinD filesystem and exits silently with status 1.

**Current workaround:** Users create a shell shim (`copilot.real` + wrapper) that forces `HOME=/tmp/gh-aw/home USER=runner LOGNAME=runner` before exec'ing the real binary.

**Proposed fix:**
- Add `chroot.identity` to stdin config:
  ```json
  {
    "chroot": {
      "identity": {
        "home": "/tmp/gh-aw/home",
        "user": "runner",
        "uid": 1001,
        "gid": 1001
      }
    }
  }
  ```
- AWF's `entrypoint.sh` reads these from config and sets `HOME`, `USER`, `LOGNAME` **after** the chroot pivot, overriding the defaults.
- Document in `src/awf-config-schema.json` and `docs/chroot-mode.md`.
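
The override described above could be resolved on the TypeScript side before it reaches `entrypoint.sh`. The following is a minimal sketch, not AWF's actual implementation: `ChrootIdentity`, `DEFAULT_IDENTITY_ENV`, and `resolveIdentityEnv` are hypothetical names, and mirroring `USER` into `LOGNAME` is an assumption based on the current workaround.

```typescript
// Hypothetical shape of the proposed `chroot.identity` block.
// Field names follow the JSON example in this issue.
interface ChrootIdentity {
  home?: string;
  user?: string;
  uid?: number;
  gid?: number;
}

// Values chroot mode passes today, per the problem statement above.
const DEFAULT_IDENTITY_ENV: Record<string, string> = {
  HOME: "/home/runner",
  USER: "root",
  LOGNAME: "root",
};

// Returns the HOME/USER/LOGNAME values to export after the chroot pivot.
// Fields missing from the config fall back to the current defaults.
function resolveIdentityEnv(identity?: ChrootIdentity): Record<string, string> {
  const env: Record<string, string> = { ...DEFAULT_IDENTITY_ENV };
  if (identity?.home) {
    env.HOME = identity.home;
  }
  if (identity?.user) {
    env.USER = identity.user;
    env.LOGNAME = identity.user; // assumption: LOGNAME mirrors USER
  }
  return env;
}
```

With the example config, this would yield `HOME=/tmp/gh-aw/home`, `USER=runner`, and `LOGNAME=runner`, matching what the shell shim forces today. The sketch leaves `uid` and `gid` unused, because this issue does not specify how they would be applied.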

---

## Gap 2: /tmp/gh-aw directory tree pre-staging inside DinD daemon

**Problem:** On ARC/DinD, the Docker daemon's `/tmp` is a separate filesystem from the runner's `/tmp`. AWF writes files to the runner's `/tmp/gh-aw/`, but the daemon (which creates containers) can't see them. Users must pre-create the directory tree with correct permissions inside the daemon's filesystem before AWF runs.

**Current workaround:** Users run `docker run --rm ... -v /tmp:/host-tmp:rw ... mkdir -p /host-tmp/gh-aw/{.cache,.config,.local/state,home,mcp-logs,...} && chmod -R 0777` as a pre-agent step.

**Proposed fix:**
- Add `dind.preStageDirs` to stdin config:
  ```json
  {
    "dind": {
      "preStageDirs": true,
      "workDir": "/tmp/gh-aw",
      "stagingImage": "ghcr.io/github/gh-aw-firewall/agent:latest"
    }
  }
  ```
- When `preStageDirs: true` and a DinD environment is detected, AWF runs a lightweight init container to create the required directory tree with open permissions before starting the compose stack.
- This reuses the existing DinD detection logic from PR #3554.
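
To illustrate the init-container step, here is a rough sketch of how the `docker run` invocation might be assembled from the config. It makes several assumptions: `DindPreStageConfig`, `KNOWN_SUBDIRS`, and `buildPreStageArgs` are hypothetical names, the staging image is assumed to provide `sh`, and the subdirectory list covers only the entries named in the current workaround (the full list is elided there).

```typescript
// Hypothetical shape of the proposed pre-staging fields under `dind`.
interface DindPreStageConfig {
  preStageDirs: boolean;
  workDir: string;
  stagingImage: string;
}

// Subdirectories named in the current workaround. The source elides the
// rest of the list, so this set is partial and illustrative only.
const KNOWN_SUBDIRS = [".cache", ".config", ".local/state", "home", "mcp-logs"];

// Builds the argv for a short-lived init container that creates the tree
// inside the daemon's filesystem. Returns null when pre-staging is
// disabled or no DinD environment was detected.
function buildPreStageArgs(
  cfg: DindPreStageConfig,
  dindDetected: boolean,
): string[] | null {
  if (!cfg.preStageDirs || !dindDetected) {
    return null;
  }
  const mountPoint = "/host-work"; // assumption: arbitrary in-container path
  const dirs = KNOWN_SUBDIRS.map((d) => `${mountPoint}/${d}`).join(" ");
  // Paths are not shell-quoted; this sketch assumes they contain no spaces.
  const script = `mkdir -p ${dirs} && chmod -R 0777 ${mountPoint}`;
  return [
    "docker", "run", "--rm",
    "--entrypoint", "sh", // bypass the image's own entrypoint
    "-v", `${cfg.workDir}:${mountPoint}:rw`,
    cfg.stagingImage,
    "-c", script,
  ];
}
```

Because the bind mount is resolved by the daemon, the directories land in the daemon's `/tmp/gh-aw` rather than the runner's, which is the point of the pre-staging step.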

---

## Gap 3: Engine binary staging into DinD daemon's /usr/local/bin

**Problem:** The Copilot CLI is installed on the runner at runtime by gh-aw. In DinD mode, however, the runner's filesystem is not visible to containers created by the daemon. The binary must be copied into the daemon's filesystem so AWF's `/usr:/host/usr:ro` mount exposes it inside the chroot.

**Current workaround:** Users `docker run ... -v /usr/local/bin:/daemon-usr-local-bin:rw ... cp copilot /daemon-usr-local-bin/` after installation.

**Proposed fix:**
- Add `dind.stageEngineBinary` to stdin config:
  ```json
  {
    "dind": {
      "stageEngineBinary": {
        "path": "/usr/local/bin/copilot",
        "targetPath": "/usr/local/bin/copilot"
      }
    }
  }
  ```
- AWF detects the DinD split filesystem, locates the engine binary on the runner, and stages it into the daemon's filesystem via a short-lived container before starting the agent.
- The binary path comes from config (non-sensitive); no secrets involved.
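
One possible way to plan the staging step is sketched below. This is an assumption-heavy illustration rather than the proposed implementation: `StageEngineBinary`, `StagePlan`, and `planEngineStaging` are hypothetical names, and streaming the binary over the container's stdin is one technique that works across a split filesystem, not necessarily what the current workaround does (its mounts are elided in the source).

```typescript
// Hypothetical shape of the proposed `dind.stageEngineBinary` block.
interface StageEngineBinary {
  path: string;       // binary location on the runner
  targetPath: string; // destination inside the daemon's filesystem
}

interface StagePlan {
  argv: string[];    // docker CLI invocation for the short-lived container
  stdinFile: string; // runner-side file to stream to the container's stdin
}

// Plans a copy of the engine binary into the daemon's filesystem.
// The runner's files are not visible to the daemon, so the binary is
// streamed over stdin instead of being bind-mounted.
function planEngineStaging(cfg: StageEngineBinary, stagingImage: string): StagePlan {
  if (cfg.path[0] !== "/" || cfg.targetPath[0] !== "/") {
    throw new Error("stageEngineBinary paths must be absolute");
  }
  const slash = cfg.targetPath.lastIndexOf("/");
  const targetDir = cfg.targetPath.slice(0, slash) || "/";
  const fileName = cfg.targetPath.slice(slash + 1);
  if (fileName === "") {
    throw new Error("targetPath must name a file");
  }
  const dest = `/stage/${fileName}`; // assumption: arbitrary in-container path
  return {
    argv: [
      "docker", "run", "--rm", "-i",
      "--entrypoint", "sh", // assumes the staging image provides sh
      "-v", `${targetDir}:/stage:rw`,
      stagingImage,
      "-c", `cat > ${dest} && chmod 0755 ${dest}`,
    ],
    stdinFile: cfg.path,
  };
}
```

A caller would spawn `argv` and pipe the contents of `stdinFile` into the process. The staging image is passed in separately here; reusing `dind.stagingImage` from Gap 2 would be a natural choice, though this issue does not say so explicitly.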

---

## Gap 4: MCP DOCKER_HOST env for DinD socket

**Problem:** MCP servers (github-mcp-server, mcpg) need to know the Docker socket location when running inside DinD. The user currently has to manually set `sandbox.mcp.env.DOCKER_HOST`.

**Current workaround:** `sandbox.mcp.env.DOCKER_HOST: tcp://localhost:2375` in workflow frontmatter.

**Proposed fix:**
- When AWF detects DinD mode (already supported), automatically propagate the detected Docker host to MCP server containers as `DOCKER_HOST`.
- No config change is needed: this is implicit behavior when `--enable-dind` or auto-detection is active.
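
The propagation rule could look roughly like the sketch below. `resolveMcpEnv` is a hypothetical helper, and letting an explicit `sandbox.mcp.env.DOCKER_HOST` take precedence over the detected value is an assumption, made so that existing workflows using the workaround keep their behavior.

```typescript
// Merges the auto-detected Docker host into the MCP server environment.
// `userEnv` stands in for the values from `sandbox.mcp.env`.
function resolveMcpEnv(
  userEnv: Record<string, string>,
  detectedDockerHost: string | undefined,
): Record<string, string> {
  const env: Record<string, string> = { ...userEnv };
  // Assumption: an explicit user setting wins over auto-detection.
  if (env.DOCKER_HOST === undefined && detectedDockerHost) {
    env.DOCKER_HOST = detectedDockerHost;
  }
  return env;
}
```

When no DinD host is detected, the environment passes through unchanged, so non-DinD runners are unaffected.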

---

## Design Principles

All proposed config fields follow AWF's existing conventions:

| Parameter | Location | Rationale |
|-----------|----------|-----------|
| `chroot.identity.home` | stdin config | Non-sensitive path configuration |
| `chroot.identity.user` | stdin config | Non-sensitive identity hint |
| `chroot.identity.uid/gid` | stdin config | Non-sensitive numeric IDs |
| `dind.preStageDirs` | stdin config | Boolean flag, no secrets |
| `dind.workDir` | stdin config | Non-sensitive path |
| `dind.stagingImage` | stdin config | Non-sensitive image reference |
| `dind.stageEngineBinary.path/targetPath` | stdin config | Non-sensitive filesystem paths |
| API keys, tokens | env vars only | **Never** in config; passed via `-e` flags |

### Documentation requirements
- All new fields MUST be added to `src/awf-config-schema.json` with descriptions
- All new fields MUST be reflected in `src/types/` TypeScript interfaces
- `docs/chroot-mode.md` MUST document the ARC/DinD identity override behavior
- A new `docs/arc-dind.md` guide should consolidate all ARC/DinD configuration in one place
- The AWF spec (`awf-config-spec.yaml` if applicable) MUST include the new fields

---

## Success Criteria

A user on ARC/DinD runners can run an agentic workflow with **only** standard workflow frontmatter fields (no composite action, no `pre-agent-steps`, no `resources:` block). The AWF binary handles all filesystem staging internally based on the stdin config provided by the gh-aw compiler.

## References

- User report: https://github.com/github/gh-aw/issues/34896#issuecomment-4600100004
- Upstream issues (closed but incomplete): gh-aw#30838, gh-aw#30840
- Existing ARC/DinD PRs: #2839, #2843, #3218, #3554, #3852, #3914, #4026
- AWF config schema: `src/awf-config-schema.json`
- Chroot docs: `docs/chroot-mode.md`
