# Phase 1 — Sidecar Production-Ready
> **Status:** blocked on Phase 0 verdict.
>
> **Owners:** parallelisable across multiple agents; see the task table.
>
> **Branch base:** `main` after Phase 0 merge. Feature branches per work
> stream (see `SYNC.md`).
>
> **Spec reference:** [`reference/SPEC-ios-app.md`](./reference/SPEC-ios-app.md) §4.
>
> **Streaming primitive (decided in Phase 0.5):** tmux **control mode**
> (`tmux -C attach`). Pane output is delivered via parsed `%output`
> events, which is robust across alternate-screen transitions, unlike
> `pipe-pane` (which Phase 0 found unreliable). See
> `reference/PHASE-0.5-report.md` and the spike code in branch
> `feat/spike-tmux-cc`.
## Goal
Phase 1 turns the `pi-remote-control` extension into a full sidecar that can
serve the iOS app. End state: a single Node process, started alongside pi
(or as a system service), that exposes a WebSocket API for:
- Stream attach/detach with reconnect.
- Send-keys input.
- Multi-session lifecycle (spawn, list, rename, kill).
- Snapshot, disk-buffered replay.
- State, slash-command-registry side-channel.
- QR-based pairing, bearer-token auth, self-signed TLS with pinning.
- Health endpoint.
After Phase 1 we can drive everything from `wscat` or a small Web UI.
The iOS app is **not** required to validate Phase 1.
## Acceptance Criteria
Every S-feature listed below is implemented, manually exercised, and covered
by at least a smoke test. In addition:
- `pi-remote pair` prints a working QR.
- Two parallel sessions can be spawned, switched between, and one can be
killed without disturbing the other.
- WebSocket-level integration smoke test: a script that opens a stream,
sends keys, receives output, drops the connection, reconnects with
`lastSeq`, observes a clean delta.
- `wss://` works against the self-signed cert; the fingerprint matches the
QR contents.
- Sidecar survives restart and reattaches to all existing tmux sessions
without losing state.
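
The reconnect criterion above (resume with `lastSeq`, observe a clean delta) can be modelled in memory. This is a minimal sketch under assumptions, not the planned `sequence.ts`: the `SequenceLog` class, its `capacity` parameter, and the convention that `null` means "fall back to a snapshot" are hypothetical names chosen for illustration.

```ts
// Hypothetical in-memory model of sequence-cursor resume.
interface Chunk {
  seq: number;      // monotonically increasing, starts at 1
  data: Uint8Array; // raw pane bytes
}

class SequenceLog {
  private chunks: Chunk[] = [];
  private nextSeq = 1;
  private readonly capacity: number;

  constructor(capacity: number) {
    this.capacity = capacity;
  }

  // Store a chunk, evicting the oldest one once the cap is exceeded.
  append(data: Uint8Array): number {
    const seq = this.nextSeq++;
    this.chunks.push({ seq, data });
    if (this.chunks.length > this.capacity) this.chunks.shift();
    return seq;
  }

  // Chunks newer than lastSeq, or null when some of them were already
  // evicted (assumption: the caller then sends a snapshot instead).
  // A null lastSeq means "no cursor": return everything still retained.
  since(lastSeq: number | null): Chunk[] | null {
    const oldest = this.chunks.length > 0 ? this.chunks[0].seq : this.nextSeq;
    const from = lastSeq === null ? oldest : lastSeq + 1;
    if (from < oldest) return null;
    return this.chunks.filter((c) => c.seq >= from);
  }
}
```

With a capacity of 3 and five appended chunks, `since(3)` yields chunks 4 and 5, while `since(1)` yields `null` because chunk 2 has been evicted.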
## Architecture Sketch
```
extensions/remote-control/
├── index.ts — extension entry point (existing, extended)
├── server/ — NEW: HTTP/WS server, split into route modules
│   ├── server.ts — bootstrap, TLS, middleware
│   ├── routes/
│   │   ├── stream.ts — S-02 binary stream + S-04 sequence + S-05 snapshot
│   │   ├── input.ts — S-03 send-keys
│   │   ├── sessions.ts — S-09 multi-session CRUD
│   │   ├── commands.ts — S-08 slash-command registry
│   │   ├── side.ts — S-07 state side-channel
│   │   └── health.ts — S-12 health
│   └── upgrade.ts — WS upgrade routing per session/topic
├── tmux/ — NEW: tmux wrapper (control-mode client)
│   ├── manager.ts — spawn/list/kill sessions, metadata via @options
│   ├── control.ts — `tmux -C` control-mode client, %output parser, byte streaming
│   ├── input.ts — send-keys translation (key names → tmux send-keys)
│   └── snapshot.ts — capture-pane wrapper
├── buffer/ — NEW: disk ringbuffer per session
│   ├── writer.ts — append, cap enforcement, watchdog
│   └── reader.ts — range read for snapshot fallback
├── sequence.ts — NEW: monotonic chunk numbering shared by stream + buffer
├── auth/ — auth/pairing module
│   ├── tokens.ts — bearer-token CRUD (extends existing auth.ts)
│   ├── pairing.ts — pi-remote pair, QR rendering, exchange
│   └── tls.ts — self-signed cert generation + fingerprint
├── pi/ — adapter to pi ExtensionAPI
│   ├── events.ts — subscribe agent_start/end, tool_*, session_*
│   ├── commands.ts — pi.getCommands() wrapper
│   └── autoname.ts — S-09a, spawn pi -p subprocess
└── cli/ — CLI entrypoints (pi-remote attach/pair/auth/health)
    └── index.ts
```
`html.ts`, `messages.ts`, the existing `server.ts` and `config.ts` remain
for the legacy HTML client during the transition; they are tagged as
*legacy* in code comments. They will be retired after Phase 2 ships.
## Task Breakdown
Tasks are numbered `T-1.<n>`. The "Parallel With" column shows which other
tasks can be in flight simultaneously without merge pain. The "Touches"
column lists the files an agent may modify.
| ID | Task | Touches | Depends on | Parallel With |
|---|---|---|---|---|
| T-1.0 | **Server refactor scaffold.** Carve `server.ts` into the `server/` and route modules above; existing HTML behaviour must still work; CI green. | `extensions/remote-control/server/**`, minimal edit of `index.ts` | — | none — must land first |
| T-1.1 | **tmux/manager + tmux/control + tmux/snapshot.** Spawn, list, kill, metadata via `@description`. **Control-mode client** (`tmux -C attach`), `%output` parser with octal-escape decoder, broadcast bytes to subscribers. Snapshot via `capture-pane`. Reference: `feat/spike-tmux-cc` branch (`spike-cc.ts`). | `tmux/**` | T-1.0 | T-1.2, T-1.3, T-1.4, T-1.5, T-1.6 |
| T-1.2 | **Sequence module + buffer/writer + buffer/reader.** Monotonic chunk IDs, disk ringbuffer with caps (100 MB/session, 1 GB global, free-space watchdog), idle-cleanup. | `sequence.ts`, `buffer/**` | T-1.0 | T-1.1, T-1.3, T-1.4, T-1.5, T-1.6 |
| T-1.3 | **Auth: tokens + pairing + TLS.** Self-signed cert generation, fingerprint, bearer-token CRUD, `pi-remote pair` CLI + QR rendering, `pi-remote auth list/revoke/name`. | `auth/**`, `cli/index.ts` (subcommands only) | T-1.0 | T-1.1, T-1.2, T-1.4, T-1.5, T-1.6 |
| T-1.4 | **pi adapter.** Subscribe ExtensionAPI events, expose `getCommands`, implement `autoname.ts` spawning `pi -p`. | `pi/**`, edits in `index.ts` to wire subscriptions | T-1.0 | T-1.1, T-1.2, T-1.3, T-1.5, T-1.6 |
| T-1.5 | **Stream + input + snapshot routes (S-02/S-03/S-04/S-05).** WS upgrade routing, binary stream, sequence cursor resume, send-keys with bracketed-paste. | `server/routes/stream.ts`, `server/routes/input.ts`, `server/upgrade.ts` | T-1.0, T-1.1, T-1.2 | T-1.6, T-1.7 |
| T-1.6 | **Side-channel + commands + sessions routes (S-07/S-08/S-09).** | `server/routes/side.ts`, `server/routes/commands.ts`, `server/routes/sessions.ts` | T-1.0, T-1.1, T-1.4 | T-1.5, T-1.7 |
| T-1.7 | **Health endpoint + config + watchdog (S-12).** Disk watchdog ties buffer caps to global state. | `server/routes/health.ts`, new `config.toml` schema in `config.ts` | T-1.0, T-1.2 | T-1.5, T-1.6 |
| T-1.8 | **Integration smoke harness.** Node script under `scripts/smoke/` that spawns a sidecar, opens a stream, sends keys, drops + reconnects, verifies delta. | `scripts/smoke/**` | T-1.5, T-1.6 | none |
| T-1.9 | **Docs: operator guide.** README section "Running pi-remote as a sidecar", config sample, troubleshooting. | `README.md`, optionally `docs/reference/OPERATOR.md` | T-1.5, T-1.6, T-1.7 | parallel with T-1.8 |
| T-1.10 | **APNs scaffold (deferred but cheap).** `apns/` module: config schema, JWT generation, push primitive. Stub the device-token registry — flesh out in Phase 2 when iOS app provides tokens. | `apns/**`, edits in `auth/tokens.ts` to store device-tokens | T-1.3 | T-1.5..T-1.7 |
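
T-1.1's `%output` parser has to undo tmux's octal escaping before bytes are broadcast. The following is a minimal sketch of that decode step, not the code from `spike-cc.ts`. It assumes each control-mode line arrives as raw bytes without the trailing newline and has the form `%output %<pane-id> <data>`; the function names `decodeOctalEscapes` and `parseOutputLine` are hypothetical.

```ts
// Sketch only: decodes the payload of one control-mode "%output" line.
const BACKSLASH = 0x5c;
const OUTPUT_PREFIX = new TextEncoder().encode("%output %");

function isOctalDigit(b: number | undefined): b is number {
  return b !== undefined && b >= 0x30 && b <= 0x37;
}

// Turn every "\NNN" (backslash + three octal digits) into one byte and
// copy all other bytes through unchanged.
function decodeOctalEscapes(src: Uint8Array): Uint8Array {
  const out = new Uint8Array(src.length); // decoded output is never longer
  let n = 0;
  for (let i = 0; i < src.length; i++) {
    const a = src[i + 1];
    const b = src[i + 2];
    const c = src[i + 3];
    if (src[i] === BACKSLASH && isOctalDigit(a) && isOctalDigit(b) && isOctalDigit(c)) {
      out[n++] = (((a - 0x30) << 6) | ((b - 0x30) << 3) | (c - 0x30)) & 0xff;
      i += 3;
    } else {
      out[n++] = src[i];
    }
  }
  return out.subarray(0, n);
}

// Returns null for any line that is not an %output notification.
function parseOutputLine(line: Uint8Array): { paneId: number; data: Uint8Array } | null {
  for (let i = 0; i < OUTPUT_PREFIX.length; i++) {
    if (line[i] !== OUTPUT_PREFIX[i]) return null;
  }
  let i = OUTPUT_PREFIX.length;
  const start = i;
  let paneId = 0;
  while (i < line.length && line[i] >= 0x30 && line[i] <= 0x39) {
    paneId = paneId * 10 + (line[i] - 0x30);
    i++;
  }
  if (i === start || line[i] !== 0x20) return null;
  return { paneId, data: decodeOctalEscapes(line.subarray(i + 1)) };
}
```

Working on `Uint8Array` rather than strings keeps bytes that tmux passes through unescaped intact, so the result can be forwarded as a binary frame without re-encoding.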
## Interface Contracts (lock early to enable parallelism)
These are the contracts that downstream tasks depend on. They must be
agreed and frozen at the start of Phase 1 — see `SYNC.md` for the freeze
protocol.
### IC-1 — WebSocket frames
```ts
// binary frame : raw ANSI stream bytes (output direction only).
// text frame   : JSON, type-discriminated.
type ClientToServer =
  | { type: "resume"; lastSeq: number | null }
  | { type: "key"; name: string }   // "escape" | "tab" | "up" | "down" | "left" | "right" | "enter" | "shift-enter"
  | { type: "keys"; data: string }  // literal text, sent via send-keys -l
  | { type: "paste"; data: string } // wrapped in bracketed-paste
  | { type: "snapshot-request" };
// resize (ClientToServer) deferred: fixed 120×40 for v1.

type ServerToClient =
  | { type: "state"; value: "thinking" | "tool" | "idle" | "awaiting-input"; tool?: string; ts: number }
  | { type: "snapshot"; seq: number; data: string } // base64 ANSI snapshot
  | { type: "session-meta"; name: string; description?: string; createdAt: string }
  | { type: "error"; code: string; message: string };
// tree event dropped: out of iOS scope. Revisit if a dashboard wants it.
```
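
The three input messages above map fairly directly onto `tmux send-keys` argument lists. The sketch below is an assumption about how `tmux/input.ts` might do this, not a settled design: `toSendKeysArgs` and `KEY_NAMES` are hypothetical, `paste` is wrapped by hand in the bracketed-paste markers `ESC[200~` and `ESC[201~`, and `shift-enter` is deliberately left unmapped because its encoding depends on the terminal and tmux configuration.

```ts
// Sketch only: builds argv for `tmux send-keys`; it does not spawn tmux.
type InputMessage =
  | { type: "key"; name: string }
  | { type: "keys"; data: string }
  | { type: "paste"; data: string };

// Wire name -> tmux key name. A Map avoids accidental prototype lookups.
const KEY_NAMES = new Map<string, string>([
  ["escape", "Escape"],
  ["tab", "Tab"],
  ["up", "Up"],
  ["down", "Down"],
  ["left", "Left"],
  ["right", "Right"],
  ["enter", "Enter"],
]);

function toSendKeysArgs(target: string, msg: InputMessage): string[] {
  switch (msg.type) {
    case "key": {
      const key = KEY_NAMES.get(msg.name);
      if (key === undefined) throw new Error(`unsupported key: ${msg.name}`);
      return ["send-keys", "-t", target, key];
    }
    case "keys":
      // -l sends the text literally; "--" protects data starting with "-".
      return ["send-keys", "-t", target, "-l", "--", msg.data];
    case "paste":
      return ["send-keys", "-t", target, "-l", "--", `\x1b[200~${msg.data}\x1b[201~`];
  }
}
```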
Each binary frame starts with an 8-byte big-endian header carrying its
`seq`; the remaining bytes are the raw stream payload. Owner: T-1.5.
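
The 8-byte big-endian `seq` header on binary frames can be sketched as follows. This is an illustration under assumptions rather than the frozen contract: `encodeFrame` and `decodeFrame` are hypothetical names, and `seq` is limited to JavaScript's safe-integer range so it can also travel as a JSON `number` in `resume` and `snapshot` messages.

```ts
// Sketch only: [8-byte big-endian seq][payload bytes].
const HEADER_BYTES = 8;
const TWO_32 = 2 ** 32;

function encodeFrame(seq: number, payload: Uint8Array): Uint8Array {
  if (!Number.isSafeInteger(seq) || seq < 0) {
    throw new RangeError(`seq out of range: ${seq}`);
  }
  const frame = new Uint8Array(HEADER_BYTES + payload.length);
  const view = new DataView(frame.buffer);
  view.setUint32(0, Math.floor(seq / TWO_32), false); // high 32 bits
  view.setUint32(4, seq % TWO_32, false);             // low 32 bits
  frame.set(payload, HEADER_BYTES);
  return frame;
}

function decodeFrame(frame: Uint8Array): { seq: number; payload: Uint8Array } {
  if (frame.length < HEADER_BYTES) {
    throw new RangeError("frame shorter than the 8-byte header");
  }
  const view = new DataView(frame.buffer, frame.byteOffset, frame.byteLength);
  const high = view.getUint32(0, false);
  const low = view.getUint32(4, false);
  if (high > 0x1fffff) {
    throw new RangeError("seq exceeds Number.MAX_SAFE_INTEGER");
  }
  return { seq: high * TWO_32 + low, payload: frame.subarray(HEADER_BYTES) };
}
```

Splitting the value into two 32-bit words avoids `BigInt`, at the cost of capping `seq` at 2^53 - 1.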
### IC-2 — HTTP REST shape
```
GET /health → { ok, sessions, bufferBytes, ... }
POST /sessions → { id, name }
GET /sessions → [{ id, name, description, state, lastOutputAt }, …]
PATCH /sessions/:id → updates @description
DELETE /sessions/:id → kills tmux session, optionally clears buffer
GET /sessions/:id/commands → [{ name, description, args }]
GET /sessions/:id/thumbnail → text/plain capture-pane (40×12)
```
All endpoints above require a bearer token. Responses are `application/json`
unless noted. Owner: T-1.5..T-1.7.
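
A bearer-token gate for these endpoints might start from a helper like the one below. This is a sketch assuming the conventional `Authorization: Bearer <token>` header; `parseBearer`, `isAuthorized`, and the in-memory `Set` of tokens are hypothetical and stand in for whatever `auth/tokens.ts` ends up providing.

```ts
// Sketch only: header parsing plus a lookup in a hypothetical token set.
function parseBearer(header: string | undefined): string | null {
  if (header === undefined) return null;
  // Scheme is case-insensitive; the token uses the b64token alphabet.
  const match = /^Bearer +([A-Za-z0-9\-._~+\/]+=*)$/i.exec(header.trim());
  return match ? match[1] : null;
}

function isAuthorized(
  header: string | undefined,
  validTokens: ReadonlySet<string>,
): boolean {
  const token = parseBearer(header);
  return token !== null && validTokens.has(token);
}
```

A real implementation would probably store token hashes rather than raw tokens and compare them in constant time; the `Set` lookup here only keeps the example short.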
### IC-3 — Pairing payload
QR encodes a `pi-remote://` URL:
```
pi-remote://<host>:<port>?pair=<pairing-token>&fp=<sha256-hex>&name=<sidecar-name>
```
Pairing exchange: client `POST /pair` with `{ pairingToken, deviceToken?, environment?, deviceName? }` → server replies `{ bearerToken, sidecarId }`.
`deviceToken` and `environment` are **optional** pre-Phase-2, **mandatory** from Phase 2 onward. Owner: T-1.3.
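
To make the QR payload concrete, here is a small sketch that builds and parses the `pi-remote://` URL. Assumptions: `buildPairingUrl`, `parsePairingUrl`, and the `PairingPayload` shape are hypothetical, the fingerprint is taken to be 64 hexadecimal characters with no separators, and the port is treated as mandatory.

```ts
// Sketch only: round-trips the pairing URL described above.
interface PairingPayload {
  host: string;
  port: number;
  pairingToken: string;
  fingerprint: string; // SHA-256 of the cert, hex-encoded (assumed 64 chars)
  name: string;
}

function buildPairingUrl(p: PairingPayload): string {
  const query = [
    `pair=${encodeURIComponent(p.pairingToken)}`,
    `fp=${encodeURIComponent(p.fingerprint)}`,
    `name=${encodeURIComponent(p.name)}`,
  ].join("&");
  return `pi-remote://${p.host}:${p.port}?${query}`;
}

function parsePairingUrl(raw: string): PairingPayload {
  const url = new URL(raw);
  if (url.protocol !== "pi-remote:") {
    throw new Error(`unexpected scheme: ${url.protocol}`);
  }
  const port = Number(url.port);
  if (!url.hostname || !Number.isInteger(port) || port <= 0) {
    throw new Error("pairing URL needs an explicit host and port");
  }
  const pairingToken = url.searchParams.get("pair");
  const fingerprint = url.searchParams.get("fp");
  const name = url.searchParams.get("name");
  if (!pairingToken || !name) {
    throw new Error("pairing URL is missing pair or name");
  }
  if (!fingerprint || !/^[0-9a-f]{64}$/i.test(fingerprint)) {
    throw new Error("fp must be a 64-character hex SHA-256 fingerprint");
  }
  return { host: url.hostname, port, pairingToken, fingerprint, name };
}
```

The client would then compare `fingerprint` against the certificate presented on the `wss://` connection before sending `POST /pair`.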
### IC-4 — Config schema (TOML)
```toml
[server]
host = "0.0.0.0"
port = 7777
state_dir = "~/.local/share/pi-remote"
[buffer]
per_session_mb = 100
global_gb = 1
free_min_gb = 1
idle_days = 30
[tmux]
default_width = 120
default_height = 40
[apns]
team_id = "..."
key_id = "..."
key_path = "..."
bundle_id = "..."
[autoname]
enabled = true
trigger_after = 3 # user messages
model = "claude-haiku-4-5"
```
Owner: T-1.7.
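
How the `[buffer]` limits might interact on each append can be sketched as a pure decision function. Everything here is an assumption for illustration: the `decideAppend` name, the `Usage` shape, the four outcomes, the check order, and the choice to read MB and GB as binary units (MiB, GiB).

```ts
// Sketch only: decides what the writer should do before appending a chunk.
interface BufferConfig {
  per_session_mb: number;
  global_gb: number;
  free_min_gb: number;
}

interface Usage {
  sessionBytes: number;  // bytes buffered for this session
  globalBytes: number;   // bytes buffered across all sessions
  freeDiskBytes: number; // free space on the state_dir volume
}

type Decision = "append" | "evict-session-oldest" | "evict-global-oldest" | "pause";

const MIB = 1024 * 1024;
const GIB = 1024 * MIB;

function decideAppend(cfg: BufferConfig, usage: Usage, chunkBytes: number): Decision {
  // The free-space floor wins over everything else: stop writing to disk.
  if (usage.freeDiskBytes - chunkBytes < cfg.free_min_gb * GIB) return "pause";
  // Ring-buffer behaviour: drop this session's oldest data first.
  if (usage.sessionBytes + chunkBytes > cfg.per_session_mb * MIB) return "evict-session-oldest";
  // Then enforce the cap shared by all sessions.
  if (usage.globalBytes + chunkBytes > cfg.global_gb * GIB) return "evict-global-oldest";
  return "append";
}
```

Keeping the decision free of I/O makes it straightforward to unit-test the cap arithmetic separately from the writer and the watchdog.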
## Branching Strategy
- Each task is a feature branch off `main`, named `feat/p1-<task-id>-<slug>`,
e.g. `feat/p1-t1-1-tmux-manager`.
- Open a PR as soon as a task is ready for review. Squash-merge.
- T-1.0 (refactor) lands first; T-1.1..T-1.4 can then run fully in parallel.
- T-1.5..T-1.7 each consume one or more of the lower-layer modules; they
start as soon as the dependency PR is in `main`.
## Test Strategy
- Unit: per-module pure-logic tests under `extensions/remote-control/**/__tests__/`.
- Integration smoke: T-1.8 script, runnable locally and in CI.
- Manual: each task PR lists manual-verification steps.
- No iOS testing in this phase.
## Risks
- **R1.** Per-session buffer-cap accounting can race with the global
watchdog. Mitigation: serialise buffer writes through a single async queue
per session, lock the global cap behind a mutex.
- **R2.** ExtensionAPI event names might shift in future pi versions.
Mitigation: pin pi version range in `package.json`, isolate adapter in
`pi/events.ts`.
- **R3.** `pi -p` auto-name calls cost money. Mitigation: gate behind
`[autoname] enabled`, debounce, skip if user already named the session.
- **R4.** tmux control-mode protocol is text-framed; binary pane bytes are
octal-escaped (`\NNN`). Parser must handle high-throughput bursts (~50fps
during tool output). Mitigation: streaming line-parser with no full-buffer
copies; per-line decode allocates only the escaped payload. Reference
decode in `spike-cc.ts`.
- **R5.** tmux version requirement. Control mode is stable from tmux 2.0;
newer features (e.g. the `pane-died` hook) need 2.5+. Mitigation:
`tmux/manager.ts` checks `tmux -V` at startup and refuses to run on < 2.5
with a clear error.
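
The R5 startup check could look roughly like this. It is a sketch, not the planned `tmux/manager.ts` code: the function names are hypothetical, and it only parses a version string such as `tmux 3.3a`, so the caller is assumed to obtain that string by running `tmux -V`.

```ts
// Sketch only: version gate for the output of `tmux -V`.
interface TmuxVersion {
  major: number;
  minor: number;
}

// Accepts e.g. "tmux 3.3a" or "tmux next-3.4"; returns null if no
// "<major>.<minor>" pair is present.
function parseTmuxVersion(output: string): TmuxVersion | null {
  const m = /(\d+)\.(\d+)/.exec(output);
  return m ? { major: Number(m[1]), minor: Number(m[2]) } : null;
}

function meetsMinimum(v: TmuxVersion, min: TmuxVersion): boolean {
  return v.major > min.major || (v.major === min.major && v.minor >= min.minor);
}

function assertSupportedTmux(
  output: string,
  min: TmuxVersion = { major: 2, minor: 5 },
): TmuxVersion {
  const v = parseTmuxVersion(output);
  if (v === null) {
    throw new Error(`cannot parse tmux version from: ${output.trim()}`);
  }
  if (!meetsMinimum(v, min)) {
    throw new Error(
      `tmux ${v.major}.${v.minor} is too old; need >= ${min.major}.${min.minor}`,
    );
  }
  return v;
}
```

Version strings with no numeric part are rejected rather than guessed at, which keeps the failure mode explicit.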
## Exit / Handover
- All T-1.x merged.
- Smoke harness passes locally and in CI.
- Operator guide complete.
- A short `docs/reference/PHASE-1-report.md` summarising deviations from
the plan, especially anything that affects Phase 2 contracts.
- Update `SYNC.md` to unblock Phase 2.