pi-remote-control/docs/PHASE-1-sidecar.md

Phase 1 — Sidecar Production-Ready

  • Status: blocked on Phase 0 verdict.
  • Owners: parallelisable across multiple agents (see the task table).
  • Branch base: main after Phase 0 merge. Feature branches per work stream (see SYNC.md).
  • Spec reference: reference/SPEC-ios-app.md §4.

Streaming primitive (decided in Phase 0.5): tmux control mode (tmux -C attach). Pane output is delivered via parsed %output events, which is robust across alternate-screen transitions — unlike pipe-pane (which Phase 0 found unreliable). See reference/PHASE-0.5-report.md and the spike code in branch feat/spike-tmux-cc.
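To make the %output handling concrete, here is a minimal sketch of a line parser with an octal-escape decoder. It assumes each control-mode line arrives as raw bytes without its trailing newline and has the shape `%output <pane-id> <escaped-data>`. The names OutputEvent, decodeOctalEscapes and parseOutputLine are hypothetical and are not taken from spike-cc.ts.

```typescript
// Illustrative sketch; names are hypothetical, not taken from spike-cc.ts.

interface OutputEvent {
  paneId: string;   // tmux pane id, e.g. "%1"
  data: Uint8Array; // pane bytes with octal escapes resolved
}

const OUTPUT_PREFIX = "%output ";

function isOctalDigit(byte: number): boolean {
  return byte >= 0x30 && byte <= 0x37; // "0".."7"
}

// Replace every \NNN escape (three octal digits) with the byte it encodes.
function decodeOctalEscapes(escaped: Uint8Array): Uint8Array {
  const out = new Uint8Array(escaped.length); // decoding never grows the data
  let n = 0;
  for (let i = 0; i < escaped.length; i++) {
    const isEscape =
      escaped[i] === 0x5c && // backslash
      i + 3 < escaped.length &&
      isOctalDigit(escaped[i + 1]) &&
      isOctalDigit(escaped[i + 2]) &&
      isOctalDigit(escaped[i + 3]);
    if (isEscape) {
      const value =
        ((escaped[i + 1] - 0x30) << 6) |
        ((escaped[i + 2] - 0x30) << 3) |
        (escaped[i + 3] - 0x30);
      out[n++] = value & 0xff;
      i += 3; // skip the three digits just consumed
    } else {
      out[n++] = escaped[i];
    }
  }
  return out.subarray(0, n);
}

// Returns null for any line that is not a well-formed %output event.
function parseOutputLine(line: Uint8Array): OutputEvent | null {
  for (let i = 0; i < OUTPUT_PREFIX.length; i++) {
    if (line[i] !== OUTPUT_PREFIX.charCodeAt(i)) return null;
  }
  const sep = line.indexOf(0x20, OUTPUT_PREFIX.length); // space after pane id
  if (sep === -1) return null;
  let paneId = "";
  for (let i = OUTPUT_PREFIX.length; i < sep; i++) {
    paneId += String.fromCharCode(line[i]);
  }
  return { paneId: paneId, data: decodeOctalEscapes(line.subarray(sep + 1)) };
}
```

Working on bytes rather than decoded strings keeps multi-byte UTF-8 sequences in pane output intact, which matters because the decoded data is forwarded as a binary stream.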

Goal

Phase 1 extends the pi-remote-control extension into a full sidecar that can serve the iOS app. End state: a single Node process, started alongside pi (or as a system service), that exposes a WebSocket API for:

  • Stream attach/detach with reconnect.
  • Send-keys input.
  • Multi-session lifecycle (spawn, list, rename, kill).
  • Snapshot, disk-buffered replay.
  • State, slash-command-registry side-channel.
  • QR-based pairing, bearer-token auth, self-signed TLS with pinning.
  • Health endpoint.

After Phase 1 we can drive everything from wscat or a small Web UI. The iOS app is not required to validate Phase 1.

Acceptance Criteria

Each S-feature listed below must be implemented, manually exercised, and covered by at least a basic smoke test. In addition:

  • pi-remote pair prints a working QR.
  • Two parallel sessions can be spawned, switched between, and one can be killed without disturbing the other.
  • WebSocket-level integration smoke test: a script that opens a stream, sends keys, receives output, drops the connection, reconnects with lastSeq, observes a clean delta.
  • wss:// works against the self-signed cert; the fingerprint matches the QR contents.
  • Sidecar survives restart and reattaches to all existing tmux sessions without losing state.
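The reconnect criterion (resume with lastSeq, observe a clean delta) implies a small decision on the server side. The sketch below is one way to express it, assuming seq increases by one per chunk and the disk buffer retains a contiguous window of chunks. ResumePlan, BufferWindow and planResume are hypothetical names, and the choice to fall back to a snapshot whenever the delta cannot be served is an assumption, not something the plan specifies.

```typescript
// Illustrative sketch; types and function names are hypothetical.

type ResumePlan =
  | { kind: "delta"; fromSeq: number; toSeq: number } // replay buffered chunks
  | { kind: "snapshot" }                              // send a fresh snapshot
  | { kind: "up-to-date" };                           // nothing to replay

// Inclusive range of chunk sequence numbers still held in the buffer.
interface BufferWindow {
  oldestSeq: number;
  newestSeq: number;
}

function planResume(lastSeq: number | null, win: BufferWindow | null): ResumePlan {
  // First attach, or nothing buffered yet: only a snapshot can help.
  if (lastSeq === null || win === null) return { kind: "snapshot" };
  if (lastSeq === win.newestSeq) return { kind: "up-to-date" };
  // Client claims to be ahead of the server: its history is not ours.
  if (lastSeq > win.newestSeq) return { kind: "snapshot" };
  // The chunk right after lastSeq was already evicted: the delta has a gap.
  if (lastSeq + 1 < win.oldestSeq) return { kind: "snapshot" };
  return { kind: "delta", fromSeq: lastSeq + 1, toSeq: win.newestSeq };
}
```

With a window of 3..9, a client resuming from 5 would be sent chunks 6..9, while a client resuming from 1 would get a snapshot because chunk 2 is no longer available.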

Architecture Sketch

extensions/remote-control/
├── index.ts             — extension entry point (existing, extended)
├── server/              — NEW: HTTP/WS server, split into route modules
│   ├── server.ts        — bootstrap, TLS, middleware
│   ├── routes/
│   │   ├── stream.ts    — S-02 binary stream + S-04 sequence + S-05 snapshot
│   │   ├── input.ts     — S-03 send-keys
│   │   ├── sessions.ts  — S-09 multi-session CRUD
│   │   ├── commands.ts  — S-08 slash-command registry
│   │   ├── side.ts      — S-07 state side-channel
│   │   └── health.ts    — S-12 health
│   └── upgrade.ts       — WS upgrade routing per session/topic
├── tmux/                — NEW: tmux wrapper (control-mode client)
│   ├── manager.ts       — spawn/list/kill sessions, metadata via @options
│   ├── control.ts       — `tmux -C` control-mode client, %output parser, byte streaming
│   ├── input.ts         — send-keys translation (key names → tmux send-keys)
│   └── snapshot.ts      — capture-pane wrapper
├── buffer/              — NEW: disk ringbuffer per session
│   ├── writer.ts        — append, cap enforcement, watchdog
│   └── reader.ts        — range read for snapshot fallback
├── sequence.ts          — NEW: monotonic chunk numbering shared by stream + buffer
├── auth/                — auth/pairing module
│   ├── tokens.ts        — bearer-token CRUD (extends existing auth.ts)
│   ├── pairing.ts       — pi-remote pair, QR rendering, exchange
│   └── tls.ts           — self-signed cert generation + fingerprint
├── pi/                  — adapter to pi ExtensionAPI
│   ├── events.ts        — subscribe agent_start/end, tool_*, session_*
│   ├── commands.ts      — pi.getCommands() wrapper
│   └── autoname.ts      — S-09a, spawn pi -p subprocess
└── cli/                 — CLI entrypoints (pi-remote attach/pair/auth/health)
    └── index.ts

html.ts, messages.ts, the existing server.ts and config.ts remain for the legacy HTML client during the transition; they are tagged as legacy in code comments. They will be retired after Phase 2 ships.
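As an illustration of what tmux/input.ts (key names → tmux send-keys) might do, the sketch below maps the input messages from IC-1 to tmux argument lists. It is a sketch under assumptions: the function name toSendKeysArgs is hypothetical, "shift-enter" is deliberately left unmapped because the plan does not say how it is encoded, and delivering a paste by wrapping the text in the bracketed-paste markers (ESC [200~ and ESC [201~) and sending it with send-keys -l is only one possible approach.

```typescript
// Illustrative sketch; names are hypothetical and the mapping is an assumption.

type InputMessage =
  | { type: "key"; name: string }
  | { type: "keys"; data: string }
  | { type: "paste"; data: string };

// IC-1 key names mapped to tmux key names. "shift-enter" is intentionally
// absent: its tmux encoding is not specified in the plan.
const TMUX_KEY_NAMES: { [name: string]: string } = {
  escape: "Escape",
  tab: "Tab",
  up: "Up",
  down: "Down",
  left: "Left",
  right: "Right",
  enter: "Enter",
};

const PASTE_START = "\x1b[200~"; // bracketed-paste start marker
const PASTE_END = "\x1b[201~";   // bracketed-paste end marker

// Returns the argv to pass to the tmux binary, or null for unknown keys.
function toSendKeysArgs(target: string, msg: InputMessage): string[] | null {
  switch (msg.type) {
    case "key": {
      const known = Object.prototype.hasOwnProperty.call(TMUX_KEY_NAMES, msg.name);
      return known ? ["send-keys", "-t", target, TMUX_KEY_NAMES[msg.name]] : null;
    }
    case "keys":
      // -l sends the text literally; "--" keeps text starting with "-"
      // from being parsed as an option.
      return ["send-keys", "-t", target, "-l", "--", msg.data];
    case "paste":
      return ["send-keys", "-t", target, "-l", "--", PASTE_START + msg.data + PASTE_END];
  }
}
```

Returning an argument array rather than a shell string means the text never passes through a shell, so user input needs no quoting.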

Task Breakdown

Tasks are numbered T-1.<n>. The "Parallel With" column shows which other tasks can be in flight simultaneously without merge pain. The "Touches" column lists the files an agent may modify.

| ID | Task | Touches | Depends on | Parallel With |
|---|---|---|---|---|
| T-1.0 | Server refactor scaffold. Carve server.ts into the server/ and route modules above; existing HTML behaviour must still work; CI green. | extensions/remote-control/server/**, minimal edit of index.ts | none | none (must land first) |
| T-1.1 | tmux/manager + tmux/control + tmux/snapshot. Spawn, list, kill, metadata via @description. Control-mode client (tmux -C attach), %output parser with octal-escape decoder, broadcast bytes to subscribers. Snapshot via capture-pane. Reference: feat/spike-tmux-cc branch (spike-cc.ts). | tmux/** | T-1.0 | T-1.2, T-1.3, T-1.4, T-1.5, T-1.6 |
| T-1.2 | Sequence module + buffer/writer + buffer/reader. Monotone chunk IDs, disk ringbuffer with caps (100MB/session, 1GB global, free-space watchdog), idle-cleanup. | sequence.ts, buffer/** | T-1.0 | T-1.1, T-1.3, T-1.4, T-1.5, T-1.6 |
| T-1.3 | Auth: tokens + pairing + TLS. Self-signed cert generation, fingerprint, bearer-token CRUD, pi-remote pair CLI + QR rendering, pi-remote auth list/revoke/name. | auth/**, cli/index.ts (subcommands only) | T-1.0 | T-1.1, T-1.2, T-1.4, T-1.5, T-1.6 |
| T-1.4 | pi adapter. Subscribe ExtensionAPI events, expose getCommands, implement autoname.ts spawning pi -p. | pi/**, edits in index.ts to wire subscriptions | T-1.0 | T-1.1, T-1.2, T-1.3, T-1.5, T-1.6 |
| T-1.5 | Stream + input + snapshot routes (S-02/S-03/S-04/S-05). WS upgrade routing, binary stream, sequence cursor resume, send-keys with bracketed-paste. | server/routes/stream.ts, server/routes/input.ts, server/upgrade.ts | T-1.0, T-1.1, T-1.2 | T-1.6, T-1.7 |
| T-1.6 | Side-channel + commands + sessions routes (S-07/S-08/S-09). | server/routes/side.ts, server/routes/commands.ts, server/routes/sessions.ts | T-1.0, T-1.1, T-1.4 | T-1.5, T-1.7 |
| T-1.7 | Health endpoint + config + watchdog (S-12). Disk watchdog ties buffer caps to global state. | server/routes/health.ts, new config.toml schema in config.ts | T-1.0, T-1.2 | T-1.5, T-1.6 |
| T-1.8 | Integration smoke harness. Node script under scripts/smoke/ that spawns a sidecar, opens a stream, sends keys, drops + reconnects, verifies delta. | scripts/smoke/** | T-1.5, T-1.6 | none |
| T-1.9 | Docs: operator guide. README section "Running pi-remote as a sidecar", config sample, troubleshooting. | README.md, optionally docs/reference/OPERATOR.md | T-1.5, T-1.6, T-1.7 | T-1.8 |
| T-1.10 | APNs scaffold (deferred but cheap). apns/ module: config schema, JWT generation, push primitive. Stub the device-token registry; flesh out in Phase 2 when iOS app provides tokens. | apns/**, edits in auth/tokens.ts to store device-tokens | T-1.3 | T-1.5..T-1.7 |

Interface Contracts (lock early to enable parallelism)

These are the contracts that downstream tasks depend on. They must be agreed and frozen at the start of Phase 1 — see SYNC.md for the freeze protocol.

IC-1 — WebSocket frames

// binary frame  : raw ANSI stream bytes (output direction only).
// text frame    : JSON, type-discriminated.

type ClientToServer =
  | { type: "resume"; lastSeq: number | null }
  | { type: "key"; name: string }              // "escape" | "tab" | "up" | "down" | "left" | "right" | "enter" | "shift-enter"
  | { type: "keys"; data: string }             // literal text, sent via send-keys -l
  | { type: "paste"; data: string }            // wrapped in bracketed-paste
  | { type: "snapshot-request" };

type ServerToClient =
  | { type: "state"; value: "thinking" | "tool" | "idle" | "awaiting-input"; tool?: string; ts: number }
  // tree event dropped — out of iOS scope. Revisit if a dashboard wants it.
  // resize ClientToServer deferred — fixed 120×40 for v1.
  | { type: "snapshot"; seq: number; data: string }        // base64 ANSI snapshot
  | { type: "session-meta"; name: string; description?: string; createdAt: string }
  | { type: "error"; code: string; message: string };

Each binary frame starts with an 8-byte big-endian header carrying its seq, so the sequence number travels alongside the raw ANSI bytes rather than inside them. Owner: T-1.5.
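A minimal sketch of the 8-byte big-endian seq header on binary frames follows. It assumes seq stays within JavaScript's safe-integer range, which matches lastSeq being typed as number in IC-1. encodeFrame and decodeFrame are hypothetical names.

```typescript
// Illustrative sketch; function names are hypothetical.

const HEADER_BYTES = 8;
const TWO_POW_32 = 0x100000000;
const MAX_SAFE_SEQ = 9007199254740991; // 2^53 - 1

function isValidSeq(seq: number): boolean {
  return isFinite(seq) && Math.floor(seq) === seq && seq >= 0 && seq <= MAX_SAFE_SEQ;
}

// Prefix the payload with seq as an unsigned 64-bit big-endian integer.
function encodeFrame(seq: number, payload: Uint8Array): Uint8Array {
  if (!isValidSeq(seq)) throw new RangeError("seq must be a non-negative safe integer");
  const frame = new Uint8Array(HEADER_BYTES + payload.length);
  const view = new DataView(frame.buffer);
  view.setUint32(0, Math.floor(seq / TWO_POW_32)); // high 32 bits (big-endian is the default)
  view.setUint32(4, seq >>> 0);                    // low 32 bits
  frame.set(payload, HEADER_BYTES);
  return frame;
}

// Split a received binary frame back into seq and payload.
function decodeFrame(frame: Uint8Array): { seq: number; payload: Uint8Array } {
  if (frame.length < HEADER_BYTES) throw new RangeError("frame shorter than header");
  const view = new DataView(frame.buffer, frame.byteOffset, frame.byteLength);
  const seq = view.getUint32(0) * TWO_POW_32 + view.getUint32(4);
  if (!isValidSeq(seq)) throw new RangeError("seq exceeds safe integer range");
  return { seq: seq, payload: frame.subarray(HEADER_BYTES) };
}
```

Splitting the value into two 32-bit halves avoids BigInt while still producing a wire format that a client can read as a single 64-bit integer.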

IC-2 — HTTP REST shape

GET    /health                     → { ok, sessions, bufferBytes, ... }
POST   /sessions                   → { id, name }
GET    /sessions                   → [{ id, name, description, state, lastOutputAt }, …]
PATCH  /sessions/:id               → updates @description
DELETE /sessions/:id               → kills tmux session, optionally clears buffer
GET    /sessions/:id/commands      → [{ name, description, args }]
GET    /sessions/:id/thumbnail     → text/plain capture-pane (40×12)

All endpoints require a bearer token, and all responses are application/json unless noted. Owner: T-1.5..T-1.7.
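For the bearer-token requirement, a small sketch of extracting the token from an Authorization header is shown here. It assumes the conventional `Authorization: Bearer <token>` form with the token character set from RFC 6750; parseBearer is a hypothetical helper, and looking the token up in the token store is out of scope for this sketch.

```typescript
// Illustrative sketch; parseBearer is a hypothetical helper.

// Accepts "Bearer <token>" (scheme matched case-insensitively) and returns
// the token, or null when the header is missing or malformed.
function parseBearer(header: string | undefined): string | null {
  if (header === undefined) return null;
  // Token characters follow the b64token rule: letters, digits, "-._~+/"
  // and optional trailing "=" padding.
  const m = /^Bearer +([A-Za-z0-9\-._~+\/]+=*)$/i.exec(header.trim());
  return m === null ? null : m[1];
}
```

When the extracted token is compared against stored tokens, a constant-time comparison such as Node's crypto.timingSafeEqual is the usual choice, so that response timing does not leak how many characters matched.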

IC-3 — Pairing payload

QR encodes a pi-remote:// URL:

pi-remote://<host>:<port>?pair=<pairing-token>&fp=<sha256-hex>&name=<sidecar-name>

Pairing exchange: the client sends POST /pair with { pairingToken, deviceToken?, environment?, deviceName? }, and the server replies with { bearerToken, sidecarId }.

deviceToken and environment are optional pre-Phase-2, mandatory from Phase 2 onward. Owner: T-1.3.
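The pairing URL from IC-3 can be built and parsed with a few lines. This is a sketch with hypothetical names (PairingInfo, buildPairingUrl, parsePairingUrl), assuming fp is the 64-character hex form of a SHA-256 fingerprint and that a missing name parameter is treated as an empty string. It uses a regular expression rather than a URL class so that it depends only on core language features.

```typescript
// Illustrative sketch; names and validation rules are assumptions.

interface PairingInfo {
  host: string;
  port: number;
  pairingToken: string;
  fingerprint: string; // SHA-256 fingerprint, 64 hex characters
  name: string;
}

function buildPairingUrl(p: PairingInfo): string {
  return (
    "pi-remote://" + p.host + ":" + p.port +
    "?pair=" + encodeURIComponent(p.pairingToken) +
    "&fp=" + p.fingerprint.toLowerCase() +
    "&name=" + encodeURIComponent(p.name)
  );
}

function parsePairingUrl(url: string): PairingInfo | null {
  // Host is either a bracketed IPv6 literal or a plain hostname / IPv4 address.
  const m = /^pi-remote:\/\/(\[[^\]]+\]|[^:\/?#]+):(\d{1,5})\?([^#]*)$/.exec(url);
  if (m === null) return null;

  const params: { [key: string]: string } = {};
  const pairs = m[3].split("&");
  for (let i = 0; i < pairs.length; i++) {
    const eq = pairs[i].indexOf("=");
    if (eq <= 0) continue; // skip empty or malformed pairs
    try {
      const key = decodeURIComponent(pairs[i].slice(0, eq));
      params[key] = decodeURIComponent(pairs[i].slice(eq + 1));
    } catch (_err) {
      return null; // invalid percent-encoding
    }
  }

  const port = parseInt(m[2], 10);
  const pair = params["pair"];
  const fp = params["fp"];
  if (port < 1 || port > 65535) return null;
  if (typeof pair !== "string" || pair.length === 0) return null;
  if (typeof fp !== "string" || !/^[0-9a-fA-F]{64}$/.test(fp)) return null;

  return {
    host: m[1],
    port: port,
    pairingToken: pair,
    fingerprint: fp.toLowerCase(),
    name: typeof params["name"] === "string" ? params["name"] : "",
  };
}
```

Rejecting any fp that is not exactly 64 hex characters lets a client fail fast on a damaged QR scan before it attempts a TLS connection.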

IC-4 — Config schema (TOML)

[server]
host        = "0.0.0.0"
port        = 7777
state_dir   = "~/.local/share/pi-remote"

[buffer]
per_session_mb = 100
global_gb      = 1
free_min_gb    = 1
idle_days      = 30

[tmux]
default_width  = 120
default_height = 40

[apns]
team_id   = "..."
key_id    = "..."
key_path  = "..."
bundle_id = "..."

[autoname]
enabled        = true
trigger_after  = 3   # user messages
model          = "claude-haiku-4-5"

Owner: T-1.7.
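To show how the [buffer] section might be consumed, the sketch below merges overrides with the defaults listed above and converts them to bytes and milliseconds. It assumes binary units (1 MB = 1024 × 1024 bytes), which the schema does not state, and that TOML parsing has already produced a plain object. BufferConfig, BufferLimits and resolveBufferLimits are hypothetical names.

```typescript
// Illustrative sketch; names and the binary-unit interpretation are assumptions.

interface BufferConfig {
  per_session_mb: number;
  global_gb: number;
  free_min_gb: number;
  idle_days: number;
}

interface BufferLimits {
  perSessionBytes: number;
  globalBytes: number;
  freeMinBytes: number;
  idleMs: number;
}

// Defaults copied from the [buffer] section of IC-4.
const BUFFER_DEFAULTS: BufferConfig = {
  per_session_mb: 100,
  global_gb: 1,
  free_min_gb: 1,
  idle_days: 30,
};

const MIB = 1024 * 1024;
const GIB = 1024 * MIB;
const DAY_MS = 24 * 60 * 60 * 1000;

// Pick the override if present, else the default; reject non-positive values.
function pick(name: keyof BufferConfig, overrides: Partial<BufferConfig>): number {
  const override = overrides[name];
  const value = override === undefined ? BUFFER_DEFAULTS[name] : override;
  if (typeof value !== "number" || !isFinite(value) || value <= 0) {
    throw new RangeError("[buffer] " + name + " must be a positive number");
  }
  return value;
}

function resolveBufferLimits(overrides: Partial<BufferConfig>): BufferLimits {
  return {
    perSessionBytes: pick("per_session_mb", overrides) * MIB,
    globalBytes: pick("global_gb", overrides) * GIB,
    freeMinBytes: pick("free_min_gb", overrides) * GIB,
    idleMs: pick("idle_days", overrides) * DAY_MS,
  };
}
```

Converting once at load time means the writer and watchdog compare plain byte counts and never need to know which unit the operator used.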

Branching Strategy

  • Each task is a feature branch off main, named feat/p1-<task-id>-<slug>, e.g. feat/p1-t1-1-tmux-manager.
  • Open a PR as soon as a task is ready for review. Squash-merge.
  • T-1.0 (refactor) lands first, then T-1.1..T-1.4 can run fully in parallel.
  • T-1.5..T-1.7 each consume one or more of the lower-layer modules; they start as soon as the dependency PR is in main.

Test Strategy

  • Unit: per-module pure-logic tests under extensions/remote-control/**/__tests__/.
  • Integration smoke: T-1.8 script, runnable locally and in CI.
  • Manual: each task PR lists manual-verification steps.
  • No iOS testing in this phase.

Risks

  • R1. Per-session buffer-cap accounting can race with the global watchdog. Mitigation: serialise buffer writes through a single async queue per session and guard the global cap with a mutex.
  • R2. ExtensionAPI event names might shift in future pi versions. Mitigation: pin pi version range in package.json, isolate adapter in pi/events.ts.
  • R3. pi -p auto-name calls cost money. Mitigation: gate behind [autoname] enabled, debounce, skip if user already named the session.
  • R4. tmux control-mode protocol is text-framed; binary pane bytes are octal-escaped (\NNN). Parser must handle high-throughput bursts (~50fps during tool output). Mitigation: streaming line-parser with no full-buffer copies; per-line decode allocates only the escaped payload. Reference decode in spike-cc.ts.
  • R5. tmux version requirement. Control mode is stable from tmux 2.0; newer features (e.g. the pane-died event) need 2.5+. Mitigation: tmux/manager.ts checks tmux -V at startup and refuses to run on versions below 2.5, with a clear error.
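The R5 startup check reduces to parsing the output of tmux -V and comparing it with the 2.5 minimum. The sketch below covers only the parsing and comparison. It assumes output such as "tmux 3.3a" or "tmux next-3.4" and treats anything it cannot parse as unknown. parseTmuxVersion and meetsMinimum are hypothetical names.

```typescript
// Illustrative sketch; names are hypothetical and the accepted output
// formats are assumptions.

interface TmuxVersion {
  major: number;
  minor: number;
}

// Extract major.minor from `tmux -V` output; suffix letters such as the
// "a" in "3.3a" are ignored. Returns null when no version number is found.
function parseTmuxVersion(output: string): TmuxVersion | null {
  const m = /^tmux\s+(?:next-)?(\d+)\.(\d+)/.exec(output.trim());
  if (m === null) return null;
  return { major: parseInt(m[1], 10), minor: parseInt(m[2], 10) };
}

function meetsMinimum(v: TmuxVersion, minMajor: number, minMinor: number): boolean {
  return v.major > minMajor || (v.major === minMajor && v.minor >= minMinor);
}
```

A caller would run tmux -V, pass its stdout to parseTmuxVersion, and exit with a clear error when the result is null or meetsMinimum(v, 2, 5) is false.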

Exit / Handover

  • All T-1.x merged.
  • Smoke harness passes locally and in CI.
  • Operator guide complete.
  • A short docs/reference/PHASE-1-report.md summarising deviations from the plan, especially anything that affects Phase 2 contracts.
  • Update SYNC.md to unblock Phase 2.