callstack · thymikee · Jun 6, 2026 · Jun 6, 2026 · Jun 6, 2026 · Jun 6, 2026
diff --git a/docs/adr/0004-ios-snapshot-backend-strategy.md b/docs/adr/0004-ios-snapshot-backend-strategy.md
@@ -0,0 +1,79 @@
+# ADR 0004: iOS Snapshot Backend Strategy
+
+## Status
+
+Accepted
+
+## Context
+
+Agent Device exposes iOS UI state through snapshots produced by the long-lived XCTest runner. The
+runner has two different snapshot needs:
+
+- rich diagnostics and selector disambiguation, where a recursive XCTest snapshot is useful because
+  it preserves hierarchy, static text, wrappers, scroll containers, and ancestry;
+- agent-facing compact interactive context, where the important contract is fast, bounded discovery
+  of visible controls and stable refs for the next action.
+
+These needs should not share one capture strategy blindly. Recursive `XCUIElement.snapshot()` is
+rich, but some real simulator app trees can make XCTest fail with `kAXErrorIllegalArgument` while
+the same app remains visually usable and can be inspected by lower-level simulator accessibility
+services. Bluesky is the current known example: Argent's `ax-service` can describe the screen, but
+XCTest recursive snapshots and typed `XCUIElementQuery` enumeration can degrade to no useful child
+nodes.
+
+This is different from presentation filtering. The daemon's snapshot presentation can hide noisy
+or inaccessible nodes, but it cannot recover nodes that XCTest never returns. More filters,
+Maestro-specific heuristics, or retries in the daemon would only make this failure slower and less
+predictable.
+
+## Decision
+
+Keep XCTest as the default iOS automation runner and split iOS snapshot capture into explicit
+strategies:
+
+- **Full tree strategy**: use recursive XCTest snapshots for normal/full snapshots, raw snapshots,
+  diagnostics, and cases that need hierarchy. If XCTest reports a real AX serialization failure,
+  preserve that error instead of pretending the UI is empty.
+- **Compact interactive strategy**: for `snapshot -i -c`, use a bounded flat XCTest query strategy
+  that avoids recursive root snapshots and app/window property reads. It should prefer fast,
+  one-screen actionability over hierarchy fidelity and should return a sparse root quickly when
+  XCTest cannot enumerate controls.
+- **Future simulator AX-service strategy**: treat Bluesky-class failures as evidence that XCTest is
+  not a complete semantic snapshot backend. A robust semantic fix should add a host-side simulator
+  accessibility backend, similar in role to `idb` accessibility commands or Argent's `ax-service`,
+  and normalize its output into the same `SnapshotNode` model. That backend can be simulator-only;
+  physical devices can continue using XCTest unless a supported lower-level API exists.
+
+The daemon should make degraded compact output observable. If an iOS compact interactive snapshot
+contains only the synthetic application root, surface a warning so agents know the snapshot is
+bounded fallback output rather than proof that the screen has no controls.
+
+## Regression Notes
+
+PR #639 made XCTest AX serialization failures explicit instead of swallowing them as empty
+snapshots. That was the correct diagnostic change, but it exposed apps whose accessibility trees
+XCTest cannot serialize.
+
+The first compact fallback then still paid several XCTest reads (`app.label`, `app.identifier`,
+`app.frame`, window frame lookup) before enumerating flat controls. On broken trees those reads can
+hit the same AX failure path, which made `snapshot -i -c` much slower than the plain snapshot in
+some apps. PR #700 changed compact interactive snapshots to enter the flat strategy immediately and
+avoid those app/window reads.
+
+## Consequences
+
+Compact interactive snapshots are allowed to be less complete than full snapshots, but they must be
+bounded and honest. They should never block for the full daemon snapshot timeout because one app has
+a pathological AX tree.
+
+Full snapshots remain the right tool when hierarchy matters. They may still fail loudly on
+XCTest-broken trees; that failure is useful because retrying the same recursive capture is unlikely
+to reveal a different tree.
+
+A future AX-service backend is the correct place to regain Bluesky-class semantic coverage. It
+should be added as a platform backend with its own lifecycle, protocol, normalization, timing
+metrics, and fallback rules, not as another special case inside the XCTest runner.
+
+When adding new iOS snapshot behavior, maintainers should first decide which strategy owns it. If a
+change tries to make compact snapshots rich by reintroducing recursive snapshots, or tries to make
+full snapshots fast by hiding XCTest failures, it is probably crossing strategy boundaries.
diff --git a/ios-runner/AgentDeviceRunner/AgentDeviceRunnerUITests/RunnerTests+Alert.swift b/ios-runner/AgentDeviceRunner/AgentDeviceRunnerUITests/RunnerTests+Alert.swift
@@ -8,20 +8,18 @@ extension RunnerTests {
   }
 
   func resolveAlert(app activeApp: XCUIApplication) -> RunnerAlert? {
+#if !os(macOS)
+    if let systemModal = firstBlockingSystemModal(in: springboard) {
+      return runnerAlert(root: systemModal, ownerApp: springboard)
+    }
+#endif
     if let alert = firstExistingElement(in: activeApp.alerts.allElementsBoundByIndex) {
       return runnerAlert(root: alert, ownerApp: activeApp)
     }
     if let popup = firstDismissPopupWindow(in: activeApp) {
       return runnerAlert(root: popup, ownerApp: activeApp)
     }
-#if os(macOS)
-    return nil
-#else
-    if let systemModal = firstBlockingSystemModal(in: springboard) {
-      return runnerAlert(root: systemModal, ownerApp: springboard)
-    }
     return nil
-#endif
   }
 
   func handleAlert(_ alert: RunnerAlert, action: String) -> Response {
@@ -33,6 +31,27 @@ extension RunnerTests {
       if let response = unsupportedResponse(for: outcome) {
         return response
       }
+      sleepFor(0.2)
+      if alertStillVisible(alert, actionButtonLabel: button.label) {
+        let frame = button.frame
+        if !frame.isNull && !frame.isEmpty {
+          let coordinateOutcome = tapAt(app: alert.ownerApp, x: frame.midX, y: frame.midY)
+          if let response = unsupportedResponse(for: coordinateOutcome) {
+            return response
+          }
+          sleepFor(0.2)
+        }
+      }
+      if alertStillVisible(alert, actionButtonLabel: button.label) {
+        return Response(
+          ok: false,
+          error: ErrorPayload(
+            code: "INTERACTION_FAILED",
+            message: "alert \(action) did not dismiss the visible alert",
+            hint: "The alert button was found but the system still reports the alert after tapping it."
+          )
+        )
+      }
       return Response(ok: true, data: DataPayload(message: action == "accept" ? "accepted" : "dismissed"))
     }
 
@@ -53,6 +72,21 @@ extension RunnerTests {
     return RunnerAlert(root: root, ownerApp: ownerApp, buttons: buttons)
   }
 
+  private func alertStillVisible(_ alert: RunnerAlert, actionButtonLabel: String) -> Bool {
+    guard let current = resolveAlert(app: alert.ownerApp) else {
+      return false
+    }
+    let previousTitle = preferredAlertTitle(alert.root, buttons: alert.buttons)
+    let currentTitle = preferredAlertTitle(current.root, buttons: current.buttons)
+    if previousTitle == currentTitle {
+      return true
+    }
+    let normalizedActionLabel = actionButtonLabel.trimmingCharacters(in: .whitespacesAndNewlines)
+    return current.buttons.contains { button in
+      button.label.trimmingCharacters(in: .whitespacesAndNewlines) == normalizedActionLabel
+    }
+  }
+
   private func firstExistingElement(in elements: [XCUIElement]) -> XCUIElement? {
     elements.first { isVisibleElement($0) }
   }