iOS Simulator UI Automation Guide

Empirically verified on: iPhone 12 mini (iOS 18.6), Xcode 16.4, macOS Intel
UDID: 062F8F0A-B3E5-4A4B-BC8A-B01E98CF27F2
App: de.vpsj.pi-remote (URL scheme: pi-remote://)


TL;DR

Use Facebook's idb (idb_companion + idb CLI). It talks to the simulator via gRPC, reads the full accessibility tree to find elements by label/ID, and provides tap, swipe, text-input, key, and screenshot primitives — all without knowing window coordinates ahead of time and without touching the app source.


Approach Comparison Table

| Method | What it can do | Pros | Cons | Verified? |
|---|---|---|---|---|
| idb (fb-idb + idb_companion) | tap, swipe, text input, keys, describe-all, screenshot | Full accessibility tree; no coordinate guessing; no app changes needed; free CLI | idb Python client needs Python ≤3.12 venv; older companion (Aug 2022) but still works on iOS 18 | ✓ YES |
| xcrun simctl io screenshot | screenshot only | Built-in, no install | Only screenshots + video; no input | ✓ YES (limited) |
| xcrun simctl ui | appearance/contrast/font-size | Built-in | Zero UI element interaction | ✓ YES (limited) |
| xcrun simctl openurl | open URL scheme | Built-in; NO confirm prompt | Can't tap buttons or assert UI | ✓ YES |
| xcrun simctl privacy | grant/revoke permissions | Bypasses permission dialogs | No interaction | ✓ YES |
| xcrun simctl push | send push notifications | Built-in | No UI interaction | ✓ YES |
| xcodebuild test + XCUITest | everything | Official Apple, most powerful | Requires test target in Xcode project; heavyweight; can't add test target to existing app without source changes | ✗ NOT TESTED (requires source changes) |
| WebDriverAgent / Appium | everything | Cross-platform, widely used | Complex setup; requires WDA compiled for simulator; port juggling | ✗ NOT TESTED |
| AppleScript / System Events | host-OS window automation | Sometimes useful for macOS dialogs | Requires accessibility permissions on host; unreliable for simulator internals | ✗ NOT VERIFIED |
| cliclick (current approach) | coordinate-based mouse clicks | No install | Fragile (window-position dependent); not accessibility-aware | ✗ SUPERSEDED |
| Private CoreSimulator APIs | anything | Low-level control | Undocumented; breaks on Xcode updates | ✗ NOT ATTEMPTED |

Install Instructions

One-time setup

# 1. Install idb_companion via Homebrew
brew tap facebook/fb
brew install idb-companion

# 2. Install idb Python CLI in a Python 3.12 venv
#    (the client has asyncio compatibility issues with Python 3.14+)
python3.12 -m venv /opt/idb-venv
/opt/idb-venv/bin/pip install fb-idb

# Verify
idb_companion --version   # prints build date JSON
/opt/idb-venv/bin/idb --help

Per-session setup (start the companion)

SIM="062F8F0A-B3E5-4A4B-BC8A-B01E98CF27F2"
IDB="/opt/idb-venv/bin/idb"

# Start idb_companion in the background
idb_companion --udid $SIM &>/tmp/idb-companion.log &

# Connect the idb client to it
$IDB connect localhost 10882

# Verify
$IDB list-targets | grep $SIM

Recipes: 7 Verified Primitives

1. Tap a button by accessibility label

# Helper function: find element by AXLabel or AXUniqueId, compute its center, tap it
tap_by_label() {
  local label="$1"
  local coords x y
  # The label is passed as argv[1] rather than interpolated into the Python
  # source, so labels containing quotes cannot break the script.
  coords=$("$IDB" ui describe-all --udid "$SIM" | python3 -c '
import json, sys
label = sys.argv[1]
for el in json.load(sys.stdin):
    if label in (el.get("AXLabel"), el.get("AXUniqueId")):
        f = el["frame"]
        cx = f["x"] + f["width"] / 2
        cy = f["y"] + f["height"] / 2
        print(f"{cx:.0f} {cy:.0f}")
        break
' "$label")
  if [ -z "$coords" ]; then
    echo "ERROR: element '$label' not found" >&2
    return 1
  fi
  read -r x y <<< "$coords"
  echo "Tapping '$label' at ($x, $y)"
  "$IDB" ui tap --udid "$SIM" "$x" "$y"
}

# Example: tap the Settings button
tap_by_label "Settings"
# → opens the Settings sheet ✓

# Tap Done to dismiss it
tap_by_label "Done"

Evidence: 02-after-settings-tap.png (Settings sheet opened after tap).
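
The lookup inside tap_by_label can also be kept as a standalone Python function, which makes it testable without a running simulator. This is only a sketch: center_of is a hypothetical helper name, and the sample elements are invented to mirror the fields the recipe reads (AXLabel, AXUniqueId, frame), not captured from a real run.

```python
import json


def center_of(elements, label):
    """Return the (x, y) center of the first element whose AXLabel or
    AXUniqueId equals `label`, or None when nothing matches."""
    for el in elements:
        if label in (el.get("AXLabel"), el.get("AXUniqueId")):
            f = el["frame"]
            return (f["x"] + f["width"] / 2, f["y"] + f["height"] / 2)
    return None


# Invented sample shaped like `idb ui describe-all` output (not real data).
SAMPLE = json.loads("""
[
  {"AXLabel": "Settings", "AXUniqueId": null,
   "frame": {"x": 300, "y": 50, "width": 44, "height": 44}},
  {"AXLabel": null, "AXUniqueId": "header.closeButton",
   "frame": {"x": 10, "y": 60, "width": 30, "height": 30}}
]
""")

print(center_of(SAMPLE, "Settings"))            # (322.0, 72.0)
print(center_of(SAMPLE, "header.closeButton"))  # (25.0, 75.0)
print(center_of(SAMPLE, "Nonexistent"))         # None
```

Returning None instead of raising keeps the "not found" decision with the caller, matching the shell helper's explicit error branch.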

2. Type text into a focused field

# Tap the text area / input field first to give it focus
$IDB ui tap --udid $SIM 187 400        # tap center of terminal text area

# Type text
$IDB ui text --udid $SIM "echo hello_idb_test"

# Press Enter (HID keycode 40 = Return)
$IDB ui key --udid $SIM 40

Note: idb ui text types the literal string. The on-screen keyboard does not need to be visible, because idb injects the characters as simulated keyboard events instead of tapping keys. Most printable ASCII needs no escaping beyond normal shell quoting.

Evidence: 05-after-type.png shows "echo hello_idb_test" in the terminal input; 08-after-swipe.png shows "hello_idb_test" printed as output after Enter.
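
The magic number 40 is the USB HID usage ID for Return. A small lookup table avoids scattering such numbers through scripts. The sketch below only builds argument lists and does not run idb; HID_KEYS, key_command, and type_and_submit are hypothetical names, and only keycode 40 was exercised in this guide. The other codes come from the USB HID keyboard usage table and are assumed, not verified, to behave the same way with idb ui key.

```python
# USB HID keyboard usage IDs. Only 40 (Return) was verified in this guide;
# the rest are standard HID values assumed to work with `idb ui key`.
HID_KEYS = {
    "return": 40,
    "escape": 41,
    "backspace": 42,
    "tab": 43,
    "space": 44,
    "right": 79,
    "left": 80,
    "down": 81,
    "up": 82,
}


def key_command(idb, udid, key_name):
    """Build the argv list for pressing one named key."""
    return [idb, "ui", "key", "--udid", udid, str(HID_KEYS[key_name])]


def type_and_submit(idb, udid, text):
    """Build the two argv lists for typing `text` and pressing Return."""
    return [
        [idb, "ui", "text", "--udid", udid, text],
        key_command(idb, udid, "return"),
    ]


for argv in type_and_submit("/opt/idb-venv/bin/idb", "SIM-UDID", "echo hello_idb_test"):
    print(argv)
```

Each list can be handed to subprocess.run as-is, which sidesteps shell quoting for the typed text.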

3. Swipe / scroll

# Syntax: idb ui swipe x_start y_start x_end y_end [--duration <s>] [--delta <px>]

# Scroll DOWN (swipe up): from (187,600) to (187,200)
$IDB ui swipe --udid $SIM 187 600 187 200

# Scroll UP (swipe down):
$IDB ui swipe --udid $SIM 187 200 187 600

# Swipe right from the left edge (navigate back):
$IDB ui swipe --udid $SIM 20 400 300 400

# Slow swipe (for drag interactions):
$IDB ui swipe --udid $SIM 187 600 187 200 --duration 0.8

Evidence: 07-before-swipe.png → 08-after-swipe.png show the terminal view scrolled to reveal earlier output.
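
The hard-coded coordinates above fit a 375×812 pt screen. To reuse the recipe on another device, the swipe endpoints can be derived from the screen size. A minimal sketch, assuming the logical size is known in points; scroll_swipe is a hypothetical helper, and the naming follows the recipe (scrolling down means the finger moves up).

```python
def scroll_swipe(width, height, direction, span=0.5):
    """Return (x1, y1, x2, y2) for a vertical swipe along the screen's
    center line, covering `span` of the screen height.

    direction="down" scrolls content down (finger moves bottom -> top),
    direction="up" scrolls content up (finger moves top -> bottom).
    """
    x = width // 2
    mid = height / 2
    half = height * span / 2
    top, bottom = round(mid - half), round(mid + half)
    if direction == "down":
        return (x, bottom, x, top)
    if direction == "up":
        return (x, top, x, bottom)
    raise ValueError(f"unknown direction: {direction!r}")


# 375x812 is the assumed logical size of the iPhone 12 mini used in this guide.
print(scroll_swipe(375, 812, "down"))  # (187, 609, 187, 203)
print(scroll_swipe(375, 812, "up"))    # (187, 203, 187, 609)
```

The four numbers map directly onto the x_start y_start x_end y_end arguments of idb ui swipe.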

4. Assert that a view / text is visible

idb exposes the full iOS accessibility tree. Two levels of assertions:

4a. Assert element exists by label

assert_visible() {
  local label="$1"
  local found
  found=$("$IDB" ui describe-all --udid "$SIM" | python3 -c '
import json, sys
label = sys.argv[1]  # passed as an argument so quotes in the label are safe
for el in json.load(sys.stdin):
    if label in (el.get("AXLabel"), el.get("AXUniqueId")):
        print("found")
        break
' "$label")
  if [ "$found" = "found" ]; then
    echo "✓ '$label' is visible"
    return 0
  else
    echo "✗ '$label' not visible"
    return 1
  fi
}

assert_visible "Settings"    # → ✓ 'Settings' is visible
assert_visible "Nonexistent" # → ✗ 'Nonexistent' not visible
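
A one-shot check like assert_visible can race against animations, which is why the worked example sprinkles sleep calls. A polling variant is one alternative. The sketch below is an assumption-laden illustration: wait_until_visible and is_visible are hypothetical names, and describe is any callable returning the parsed describe-all list, so the loop can be exercised without a simulator.

```python
import time


def is_visible(elements, label):
    """True if any element's AXLabel or AXUniqueId equals `label`."""
    return any(
        label in (el.get("AXLabel"), el.get("AXUniqueId")) for el in elements
    )


def wait_until_visible(describe, label, timeout=5.0, interval=0.5,
                       sleep=time.sleep, clock=time.monotonic):
    """Poll `describe()` until `label` shows up or `timeout` seconds pass.

    `describe` must return the element list (e.g. parsed describe-all JSON).
    `sleep` and `clock` are injectable so the loop can be tested quickly.
    """
    deadline = clock() + timeout
    while True:
        if is_visible(describe(), label):
            return True
        if clock() >= deadline:
            return False
        sleep(interval)
```

In a real script, describe would wrap a subprocess call to idb ui describe-all followed by json.loads; that wiring is left out here on purpose.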

4b. Assert TextArea content (app-specific limitation)

piRemote renders the terminal using SwiftTerm's custom drawing (not UIKit UILabels), so the AXValue of the TextArea node is always empty. Text shown in the terminal is not accessible via the accessibility tree.

Workaround: Take a screenshot and process it with OCR, or check app-layer state directly (e.g. via Sidecar's REST API for piRemote specifically).

For apps using standard UIKit UILabel/UITextField, AXLabel or AXValue will contain the text and assert_visible above works perfectly.

5. Screenshot tied to a specific UI element

# Full screenshot
$IDB screenshot --udid $SIM /tmp/before.png

# Element-scoped crop: find element frame → crop with sips
element_screenshot() {
  local label="$1"
  local out="$2"
  local scale=3  # iPhone 12 mini @3x

  local info
  info=$("$IDB" ui describe-all --udid "$SIM" | python3 -c '
import json, sys
label, scale = sys.argv[1], int(sys.argv[2])
pad = 10  # padding around the element, in points
for el in json.load(sys.stdin):
    if label in (el.get("AXLabel"), el.get("AXUniqueId")):
        f = el["frame"]
        print(max(0, int((f["y"] - pad) * scale)),   # offsetY, clamped to the image
              max(0, int((f["x"] - pad) * scale)),   # offsetX, clamped to the image
              int((f["height"] + 2 * pad) * scale),  # cropH
              int((f["width"] + 2 * pad) * scale))   # cropW
        break
' "$label" "$scale")
  if [ -z "$info" ]; then
    echo "ERROR: element '$label' not found" >&2
    return 1
  fi
  local oy ox ch cw
  read -r oy ox ch cw <<< "$info"
  "$IDB" screenshot --udid "$SIM" /tmp/_elem_full.png
  cp /tmp/_elem_full.png "$out"
  # sips crops in place: offsets are Y then X, size is height then width
  sips "$out" --cropOffset "$oy" "$ox" --cropToHeightWidth "$ch" "$cw" &>/dev/null
  echo "Saved element screenshot to $out"
}

element_screenshot "Settings" /tmp/settings-btn.png
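
The point-to-pixel arithmetic is the part most likely to go wrong near screen edges, where padding can push the crop outside the image. A sketch of the same calculation with clamping on all four sides follows. crop_box is a hypothetical helper, and the 1125×2436 image size is an assumption (375×812 pt at 3x); check it against the real PNG dimensions before relying on it.

```python
def crop_box(frame, scale, pad, image_w, image_h):
    """Convert an accessibility frame (points) into a pixel crop rectangle.

    Returns (offset_y, offset_x, height, width), the order used by
    `sips --cropOffset` and `--cropToHeightWidth`, clamped to the image.
    """
    left = max(0, int((frame["x"] - pad) * scale))
    top = max(0, int((frame["y"] - pad) * scale))
    right = min(image_w, int((frame["x"] + frame["width"] + pad) * scale))
    bottom = min(image_h, int((frame["y"] + frame["height"] + pad) * scale))
    return (top, left, bottom - top, right - left)


# Invented frames; image size assumed to be 375x812 pt rendered at 3x.
inner = {"x": 300, "y": 50, "width": 44, "height": 44}
edge = {"x": 360, "y": 0, "width": 30, "height": 20}
print(crop_box(inner, 3, 10, 1125, 2436))  # (120, 870, 192, 192)
print(crop_box(edge, 3, 10, 1125, 2436))   # (0, 1050, 90, 75)
```

The second call shows the clamp at work: the padded box would start above the image and end past its right edge, so both sides are trimmed.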

6. Dismiss system alerts

System alerts (permission dialogs, "Open in…" URL sheets, etc.) appear as normal elements in the accessibility tree. The universal pattern:

# Wait for and dismiss any alert with an "Allow" or "Open" button
dismiss_alert() {
  local timeout=${1:-5}
  local deadline=$((SECONDS + timeout))  # SECONDS is bash's elapsed-time counter
  local coords x y
  while [ "$SECONDS" -lt "$deadline" ]; do
    coords=$("$IDB" ui describe-all --udid "$SIM" | python3 -c "
import json, sys
# Tried in order, so an affirmative button wins over Don't Allow
wanted = ['Allow', 'Allow Once', 'Allow While Using App',
          'Open', 'OK', 'Continue', 'Don\\'t Allow']
frames = {}
for el in json.load(sys.stdin):
    frames.setdefault(el.get('AXLabel') or '', el.get('frame'))
for label in wanted:
    f = frames.get(label)
    if f:
        print(f\"{f['x']+f['width']/2:.0f} {f['y']+f['height']/2:.0f}\")
        break
" 2>/dev/null)
    if [ -n "$coords" ]; then
      read -r x y <<< "$coords"
      "$IDB" ui tap --udid "$SIM" "$x" "$y"
      echo "Alert dismissed"
      return 0
    fi
    sleep 0.5
  done
  echo "No alert found within ${timeout}s"
  return 1
}
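
Which button gets tapped matters: on a permission dialog, hitting "Don't Allow" first would deny access. The selection step can be isolated as a pure function that walks a preference list instead of the tree order. This is a sketch under assumptions: pick_alert_button is a hypothetical name, the default preference order is a suggestion, and the sample elements are invented.

```python
def pick_alert_button(elements, preferred=(
        "Allow", "Allow While Using App", "Allow Once",
        "Open", "OK", "Continue")):
    """Return the first element matching `preferred` in preference order,
    independent of the order elements appear in the accessibility tree.
    Returns None if no preferred button is present."""
    by_label = {}
    for el in elements:
        label = el.get("AXLabel")
        if label and label not in by_label:
            by_label[label] = el
    for label in preferred:
        if label in by_label:
            return by_label[label]
    return None


# Invented alert: the negative button comes first in tree order.
alert = [
    {"AXLabel": "Don't Allow", "frame": {"x": 40, "y": 500, "width": 140, "height": 44}},
    {"AXLabel": "Allow", "frame": {"x": 195, "y": 500, "width": 140, "height": 44}},
]
print(pick_alert_button(alert)["AXLabel"])  # Allow
```

Leaving "Don't Allow" out of the default list means a dialog offering only that choice is reported as unhandled instead of being silently denied.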

For pre-emptive dismissal (avoid the dialog entirely):

# Grant permissions before the app asks, suppressing the dialog.
# (notifications cannot be granted this way on iOS 18.6; see Known Gotchas #5)
xcrun simctl privacy $SIM grant photos de.vpsj.pi-remote
xcrun simctl privacy $SIM grant location de.vpsj.pi-remote

Verified behavior: When an iOS pop-up/sheet is present, idb ui describe-all returns elements from within it. The Close button of the native iOS share sheet was found at AXUniqueId: header.closeButton and successfully tapped.

7. Open a deep link (URL scheme) without a confirm prompt

# xcrun simctl openurl talks directly to SpringBoard, bypassing the
# "Open in piRemote?" confirmation prompt that Safari would show.
xcrun simctl openurl $SIM "pi-remote://test"

This was verified to open piRemote immediately without any system dialog.

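In principle, the confirm prompt only appears when the URL is opened from inside another app (e.g. Safari); on the iOS 18 simulator even that path opened piRemote without a prompt (see Known Gotchas #4). To attempt to trigger the prompt itself: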

  1. Open Safari: xcrun simctl openurl $SIM "https://example.com"
  2. Use tap_by_label "Address" → type the URL → press Enter
  3. Wait for the alert → use dismiss_alert above

Verified: 33-deeplink-no-prompt.png shows piRemote active after xcrun simctl openurl, with no intermediate dialog.


Complete Worked Example

Launch app → tap Settings → verify it opens → dismiss → type a command → submit → verify output (via screenshot).

#!/usr/bin/env bash
set -euo pipefail

SIM="062F8F0A-B3E5-4A4B-BC8A-B01E98CF27F2"
APP="de.vpsj.pi-remote"
IDB="/opt/idb-venv/bin/idb"
EVIDENCE="/tmp/sim-run-$(date +%Y%m%d-%H%M%S)"
mkdir -p "$EVIDENCE"

# ── 0. Start companion (idempotent) ────────────────────────────────────────
pkill idb_companion 2>/dev/null || true
idb_companion --udid "$SIM" &>/tmp/idb-companion.log &
sleep 2
$IDB connect localhost 10882

# ── 1. Launch app ──────────────────────────────────────────────────────────
xcrun simctl launch "$SIM" "$APP"
sleep 2

$IDB screenshot --udid "$SIM" "$EVIDENCE/01-launched.png"
echo "✓ App launched"

# ── 2. Tap Settings button ─────────────────────────────────────────────────
tap_by_label() {
  local label="$1"
  local coords x y
  # The label goes in as argv[1] so quotes inside it cannot break the Python source
  coords=$("$IDB" ui describe-all --udid "$SIM" | python3 -c '
import json, sys
label = sys.argv[1]
for el in json.load(sys.stdin):
    if label in (el.get("AXLabel"), el.get("AXUniqueId")):
        f = el["frame"]
        cx, cy = f["x"] + f["width"] / 2, f["y"] + f["height"] / 2
        print(f"{cx:.0f} {cy:.0f}")
        break
' "$label")
  if [ -z "$coords" ]; then echo "ERROR: '$label' not found" >&2; return 1; fi
  read -r x y <<< "$coords"
  "$IDB" ui tap --udid "$SIM" "$x" "$y"
}

tap_by_label "Settings"
sleep 1
$IDB screenshot --udid "$SIM" "$EVIDENCE/02-settings-open.png"

# Assert Settings sheet is showing
$IDB ui describe-all --udid "$SIM" | python3 -c "
import json, sys
data = json.load(sys.stdin)
assert any(el.get('AXLabel') == 'Done' for el in data), 'Settings sheet not open!'
print('✓ Settings sheet is visible (Done button found)')
"

# ── 3. Dismiss Settings ────────────────────────────────────────────────────
tap_by_label "Done"
sleep 0.5
echo "✓ Settings dismissed"

# ── 4. Type a command and submit ───────────────────────────────────────────
$IDB ui tap --udid "$SIM" 187 400   # focus terminal text area
$IDB ui text --udid "$SIM" "echo hello_idb_test"
$IDB screenshot --udid "$SIM" "$EVIDENCE/03-typed.png"
$IDB ui key --udid "$SIM" 40        # Return
sleep 1

$IDB screenshot --udid "$SIM" "$EVIDENCE/04-submitted.png"
echo "✓ Command typed and submitted"

# ── 5. Verify via screenshot (visual) ──────────────────────────────────────
echo "✓ Check $EVIDENCE/04-submitted.png — 'hello_idb_test' should be visible in terminal"

Screenshots from verified run

| Step | Screenshot |
|---|---|
| Before tap | (image: before tap) |
| After tapping Settings | (image: settings open) |
| Before typing | (image: before type) |
| After typing "echo hello_idb_test" | (image: after type) |
| Before scroll | (image: before swipe) |
| After scroll (shows "hello_idb_test" output) | (image: after swipe) |
| Deep link, no prompt | (image: deep link) |

Known Gotchas

1. Python version for idb CLI

idb (the Python client) calls asyncio.get_event_loop(), which has been deprecated since Python 3.10 and raises RuntimeError in 3.14 when no current event loop exists. Always run it from a Python 3.12 venv:

python3.12 -m venv /opt/idb-venv
/opt/idb-venv/bin/pip install fb-idb

2. idb_companion must be started first

The idb_companion process acts as a gRPC server for the simulator. Start it before any idb client calls:

idb_companion --udid $SIM &>/tmp/idb.log &
sleep 2
idb connect localhost 10882

If you forget, idb commands silently return empty results.
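
Because the failure is silent, it helps to turn empty output into a loud error at the point where the JSON is parsed. A minimal sketch, assuming that an empty or element-free describe-all result means the companion is not reachable (a running app should always expose at least one element); parse_describe_all and CompanionNotConnected are hypothetical names.

```python
import json


class CompanionNotConnected(RuntimeError):
    """Raised when describe-all output is empty, which in this guide's
    experience points at a companion that is not running or connected."""


def parse_describe_all(raw):
    """Parse `idb ui describe-all` output, failing loudly on empty results."""
    text = raw.strip()
    if not text:
        raise CompanionNotConnected(
            "describe-all printed nothing; is idb_companion running and connected?"
        )
    elements = json.loads(text)
    if not elements:
        raise CompanionNotConnected(
            "describe-all returned no elements; is idb_companion connected?"
        )
    return elements


print(parse_describe_all('[{"AXLabel": "Settings"}]'))
```

Routing every lookup through a parser like this makes "element not found" and "nothing is listening" distinguishable in logs.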

3. Terminal text not in accessibility tree

piRemote's terminal (SwiftTerm) renders text via CoreText/Metal, not via UILabel. Therefore idb ui describe-all returns an empty AXValue for the terminal's TextArea node. You cannot assert terminal text content via accessibility.

Workarounds:

  • Visual: compare screenshots (e.g. use tesseract or mlx_vlm for OCR)
  • Programmatic: query Sidecar's REST API (http://10.13.37.2:17373)
  • Add an accessibilityValue to the SwiftTerm view (requires source change)
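
The visual workaround can be sketched as a screenshot passed through the tesseract command-line tool, followed by a forgiving text comparison. This assumes tesseract is installed and on PATH, which this guide did not verify; ocr_text and contains_text are hypothetical helper names. OCR of terminal fonts is error-prone, so a match is evidence, not proof.

```python
import subprocess


def ocr_text(png_path):
    """Run the tesseract CLI on a screenshot and return the recognized text.
    Assumes `tesseract` is on PATH; `stdout` tells it to print the result."""
    result = subprocess.run(
        ["tesseract", png_path, "stdout"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def contains_text(ocr_output, expected):
    """Case-insensitive containment check that ignores all whitespace,
    since OCR often inserts or drops spaces and line breaks."""
    def squash(s):
        return "".join(s.split()).lower()
    return squash(expected) in squash(ocr_output)


# Pure check on a hand-written string standing in for OCR output.
sample = "$ echo hello_idb_test\nhello_idb_test\n"
print(contains_text(sample, "hello_idb_test"))  # True
```

A typical use would be contains_text(ocr_text(path_to_screenshot), "hello_idb_test") after the submit step of the worked example.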

4. URL-scheme confirm prompt

xcrun simctl openurl routes through SpringBoard directly and never shows an "Open in piRemote?" confirmation. That prompt only appears when:

  • Safari (or another app) navigates to the custom URL scheme
  • There are multiple apps registered for the scheme

In the simulator there is usually only one app per scheme, so even Safari navigating to pi-remote:// opens it immediately, without a prompt. If you do need to test the confirmation dialog, open a page in Safari that links to the URL scheme (using an HTML <a> tag) and tap the link.

5. xcrun simctl privacy requires Booted sim

xcrun simctl privacy only works against a booted simulator. Even then, grant/revoke fails with "Operation not permitted" for some protected services (e.g. notifications on iOS 18.6). Use privacy reset to force re-prompting, or privacy grant for services that support it (photos, location, contacts, microphone, etc.).

6. Simulator window does not need to be focused or visible

idb injects events through the Simulator framework (not host-OS mouse clicks), so the simulator window does not need to be in the foreground. Events work even when another macOS window is on top.

7. describe-all returns a flat list, not a nested tree

The output of idb ui describe-all is a flat JSON array. Parent/child relationships are not directly encoded. If two elements share the same AXLabel, pick the one whose center is closest to the expected coordinates.
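
Disambiguating by proximity can be sketched as a small function that measures each candidate's center against a rough expected position. nearest_match is a hypothetical helper name and the sample elements are invented for illustration.

```python
import math


def nearest_match(elements, label, expected_x, expected_y):
    """Among elements whose AXLabel equals `label`, return the one whose
    center is closest to (expected_x, expected_y), or None if no match."""
    def distance(el):
        f = el["frame"]
        cx = f["x"] + f["width"] / 2
        cy = f["y"] + f["height"] / 2
        return math.hypot(cx - expected_x, cy - expected_y)

    candidates = [el for el in elements if el.get("AXLabel") == label]
    return min(candidates, key=distance, default=None)


# Two invented rows that both expose a "Delete" button.
rows = [
    {"AXLabel": "Delete", "frame": {"x": 10, "y": 100, "width": 50, "height": 20}},
    {"AXLabel": "Delete", "frame": {"x": 10, "y": 300, "width": 50, "height": 20}},
]
print(nearest_match(rows, "Delete", 40, 290)["frame"]["y"])  # 300
```

The expected coordinates only need to be approximate, for example the center of the list row the button belongs to.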

8. idb_companion version vs Xcode version mismatch

Homebrew's idb_companion was built against an older Xcode (Aug 2022). On Xcode 16.4 it still works for all tested operations but may miss newer simulator features. The warning about "Xcode 16.4 being outdated" is from Homebrew's Tier 2 support and can be ignored.


What Was NOT Verified

| Feature | Status |
|---|---|
| XCUITest via xcodebuild test | Not tested: requires adding a test target (source change) |
| WebDriverAgent / Appium | Not tested: complex setup; overkill for shell-based automation |
| AppleScript + System Events | Not tested: requires granting host-OS accessibility; slow |
| idb ui describe-point for filled coordinates | Partially: returns empty element when no accessible element exists at exact point |
| Terminal text assertion via accessibility | Does NOT work: custom renderer |
| xcrun simctl privacy for notifications | Fails on iOS 18.6 with "Operation not permitted" |
| URL scheme confirm prompt via Safari link click | Triggers SpringBoard directly with no prompt in practice on iOS 18 sim |