It's 9:14am. You're on the subway. Claude Code is running on your laptop at home, mid-task — it finished restructuring a service layer and hit a decision point. It's waiting for your input. The session is live, context loaded, everything ready. And you can't do anything about it until you get back to your desk.
That's the friction. Not the lack of a coding environment on mobile — the inability to interact with an AI session that's already running somewhere else.
Why the obvious solutions don't work
The first thing developers try is SSH. It works, technically. But on a phone screen, SSH is barely usable — tiny keyboard, no proper terminal emulator behavior, and the latency makes every keystroke feel like a coin flip. You're not in a real Claude Code session; you're fighting the interface.
The second attempt is usually ngrok or a similar reverse tunnel: point ngrok at localhost, get a public URL, connect from mobile. The problem is that your traffic routes through ngrok's servers, where the TLS connection terminates. Every message, including your code, prompts, and Claude's responses, is readable by a third party. If your project contains API keys, internal business logic, or anything you'd consider sensitive, that's not a tradeoff you should make for the convenience of checking in from your phone.
Screen sharing (TeamViewer, Screens, etc.) is the third option people reach for. It works, but only when your laptop is awake, plugged in, and connected. It's not async — you need both devices in a cooperative state simultaneously. The battery drain is real, and for a quick "review what Claude just did" check-in, it's massive overkill.
None of these solve the actual use case: async, low-friction access to a running AI session from mobile, without compromising what the session contains.
What Termly does differently
Termly is a WebSocket relay with zero-knowledge end-to-end encryption. The relay server never decrypts anything — it routes encrypted bytes between your CLI and your mobile device. Your laptop holds one key, your phone holds the other. The server sees blobs.
Setup is three commands:
```shell
# Install the CLI
npm install -g @termly-dev/cli

# Start Claude Code through Termly
cd /your/project
termly start --ai claude-code
```

A QR code appears in your terminal. Open the Termly app on iOS or Android, scan it, and your Claude Code session mirrors to your phone. The whole process takes under 60 seconds.
It works with Claude Code, OpenCode, Cursor, and any other CLI-based AI coding tool. The mobile app is free.
Key properties:
- End-to-end encrypted with AES-256-GCM — keys never leave your devices
- Zero-knowledge relay — the server cannot read your code or prompts even if subpoenaed
- One mobile device per session — when you connect a new phone, the old connection is cleanly closed
- Push notifications — get alerted when Claude finishes a task and is waiting for input
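The encryption primitive named above (AES-256-GCM) is available in Node's built-in crypto module. A minimal sketch of what encrypt/decrypt looks like on each end follows; how Termly actually derives and exchanges the key during QR pairing is not shown here, and the function names are illustrative:

```typescript
import { randomBytes, createCipheriv, createDecipheriv } from "node:crypto";

// Encrypt a message with a 32-byte key. GCM produces an auth tag that
// lets the receiver detect any tampering with the ciphertext.
function encrypt(key: Buffer, plaintext: Buffer) {
  const iv = randomBytes(12); // 96-bit nonce, the standard size for GCM
  const cipher = createCipheriv("aes-256-gcm", key, iv);
  const ciphertext = Buffer.concat([cipher.update(plaintext), cipher.final()]);
  return { iv, ciphertext, tag: cipher.getAuthTag() };
}

function decrypt(key: Buffer, iv: Buffer, ciphertext: Buffer, tag: Buffer): Buffer {
  const decipher = createDecipheriv("aes-256-gcm", key, iv);
  decipher.setAuthTag(tag); // decryption fails if the tag doesn't verify
  return Buffer.concat([decipher.update(ciphertext), decipher.final()]);
}
```

A relay that only ever sees `iv`, `ciphertext`, and `tag` can route the message but cannot read it without the key.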
The architecture, for people who want to understand it
This is the part worth understanding if you're building anything similar.
The relay model
When you run termly start, two WebSocket connections are established:
- Agent connection (CLI → relay server): the CLI pairs with a session code and gets a sessionId back
- Client connection (mobile → relay server): authenticated via JWT, joined to the same sessionId
The server maintains two maps: sessionId → agentSocket and sessionId → clientSocket. When a message arrives from the CLI, it's forwarded to the mobile socket. When input arrives from mobile, it goes to the CLI. That's it. The server is a router, not a processor.
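The routing core described above fits in a few lines. This is a sketch under the assumption that each session has exactly one agent socket and one client socket; the names are illustrative, not Termly's actual internals:

```typescript
// Minimal socket shape: all the relay needs is the ability to send bytes.
interface RelaySocket {
  send(data: Buffer): void;
}

const agents = new Map<string, RelaySocket>();  // sessionId -> CLI socket
const clients = new Map<string, RelaySocket>(); // sessionId -> mobile socket

// A frame arrives from the CLI side: forward it to the paired mobile socket.
// If no client is attached, the frame is simply dropped.
function onAgentMessage(sessionId: string, frame: Buffer): void {
  clients.get(sessionId)?.send(frame);
}

// Input arrives from the mobile side: forward it to the CLI.
function onClientMessage(sessionId: string, frame: Buffer): void {
  agents.get(sessionId)?.send(frame);
}
```

Nothing is inspected, stored, or transformed; the relay's whole job is two map lookups.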
The raw relay fast-path
The naive implementation would parse every incoming WebSocket frame as JSON, inspect the type, then re-serialize and forward. For a high-frequency stream of Claude's output, this means two JSON operations per message under load.
Termly avoids this by scanning only the first 50 bytes of each incoming frame to identify the message type. For data messages (output, error, catchup_batch), the raw Buffer is forwarded directly to the mobile socket — no JSON.parse, no JSON.stringify. This eliminates double-serialization overhead and reduces CPU load by roughly 50-70% under real usage compared to the naive approach.
Only control messages (ping, pong) go through the full parse path — and those arrive at most once every 5 seconds.
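The fast-path idea can be sketched as follows. The wire format here (JSON frames with a leading `"type"` field) is an assumption for illustration, not Termly's actual protocol:

```typescript
// Message types that take the raw-forward fast path.
const DATA_TYPES = ["output", "error", "catchup_batch"];

// Classify a frame by peeking at its first 50 bytes only, so a large
// payload costs the same to classify as a small one.
function isDataFrame(frame: Buffer): boolean {
  const head = frame.subarray(0, 50).toString("utf8");
  return DATA_TYPES.some((t) => head.includes(`"type":"${t}"`));
}

function relay(frame: Buffer, clientSocket: { send(b: Buffer): void }): string {
  if (isDataFrame(frame)) {
    clientSocket.send(frame); // raw forward: no JSON.parse, no JSON.stringify
    return "forwarded";
  }
  // Slow path: only low-frequency control messages get fully parsed.
  const msg = JSON.parse(frame.toString("utf8"));
  return msg.type; // e.g. "ping", so the caller can answer with a pong
}
```

The prefix scan relies on the sender always serializing `type` first; that ordering is part of the protocol contract, not an accident.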
Reconnection and catchup
Mobile connections drop. Subway tunnels, background app suspension, switching WiFi networks — a mobile client can't be assumed to always be connected.
When Termly's mobile app reconnects, it sends a lastSeq parameter — the sequence number of the last message it successfully received. The relay forwards a catchup_request to the CLI agent. The CLI replays all messages after lastSeq from its own local buffer, sending them in batches (catchup_batch messages). The relay forwards these batches directly to mobile.
The server never buffers messages. The CLI does. This is intentional: it keeps the relay stateless and avoids a memory accumulation problem when many sessions are open.
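A sketch of what the CLI-side buffer might look like, assuming every outbound message carries a monotonically increasing sequence number. The batch size and field names are illustrative:

```typescript
interface Outbound {
  seq: number;     // monotonically increasing per session
  payload: string; // encrypted blob in the real system
}

class ReplayBuffer {
  private buf: Outbound[] = [];
  private nextSeq = 1;

  // Every message sent to the relay is also recorded locally.
  record(payload: string): Outbound {
    const msg = { seq: this.nextSeq++, payload };
    this.buf.push(msg);
    return msg;
  }

  // On catchup_request, replay everything after lastSeq in batches,
  // which become the catchup_batch messages the relay forwards raw.
  catchup(lastSeq: number, batchSize = 100): Outbound[][] {
    const missed = this.buf.filter((m) => m.seq > lastSeq);
    const batches: Outbound[][] = [];
    for (let i = 0; i < missed.length; i += batchSize) {
      batches.push(missed.slice(i, i + batchSize));
    }
    return batches;
  }
}
```

A production version would also cap the buffer (dropping the oldest entries), since the CLI cannot hold an unbounded history for a client that never reconnects.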
Abuse protection
A token bucket rate limiter applies to all data messages from the CLI. The bucket refills at 512 KB/s — roughly 800x the actual output rate of a real LLM (~600 bytes/s). The capacity is 30 MB, allowing for legitimate bursts like cat large_file without disconnecting.
The implementation uses no timers. Refill is calculated lazily on each message arrival: elapsed_ms * (rate / 1000) tokens added, capped at capacity. Three arithmetic operations. If the bucket empties, the connection closes with code 1008 Policy Violation — the CLI reconnects with a fresh full bucket.
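The lazy-refill bucket described above is small enough to show in full. This is a sketch using the stated rates (512 KB/s refill, 30 MB capacity); the class and method names are illustrative:

```typescript
const RATE_BYTES_PER_SEC = 512 * 1024;       // refill rate: 512 KB/s
const CAPACITY_BYTES = 30 * 1024 * 1024;     // burst capacity: 30 MB

class TokenBucket {
  private tokens = CAPACITY_BYTES;           // start full
  private lastRefill = Date.now();

  // Called on each message arrival. Returns false when the bucket is
  // empty, at which point the caller closes the socket with code 1008.
  consume(bytes: number): boolean {
    // Lazy refill: no timers, just arithmetic against elapsed time.
    const now = Date.now();
    const elapsedMs = now - this.lastRefill;
    this.tokens = Math.min(
      CAPACITY_BYTES,
      this.tokens + elapsedMs * (RATE_BYTES_PER_SEC / 1000),
    );
    this.lastRefill = now;

    if (this.tokens < bytes) return false;
    this.tokens -= bytes;
    return true;
  }
}
```

Because refill happens on the consume path, an idle connection costs nothing, and a reconnecting CLI simply constructs a fresh, full bucket.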
Why zero-knowledge matters specifically for code
A relay that decrypts your messages is a single point of compromise. Your codebase contains things you don't want exposed: API keys hardcoded during development, internal service URLs, proprietary business logic. If the relay server is breached, or compelled to produce records, encrypted blobs are useless without the keys — which never leave your CLI and mobile device.
This is the architectural guarantee that SSH through a cloud proxy or an ngrok tunnel cannot give you.
Real workflows
Commute review. Claude finishes a 40-file refactor while you're on the train. You open Termly, read through the summary, spot a problem in how it handled the auth middleware, and send a correction prompt. By the time you get to the office, Claude has already re-done the relevant files. You never opened your laptop.
Emergency hotfix. 2am, production incident, you're not at your desk. You open Termly on your iPhone, connect to the Claude Code session running on your home machine, describe the bug, guide it to the fix, review the diff, and merge from mobile. The whole thing takes 12 minutes. Without Termly, you're either scrambling to find a laptop or waiting until morning.
iPad as a second screen. Run Claude Code on your Mac, mirror to iPad with Termly. Use Split View: Safari with the relevant docs on one side, Claude's session on the other. Use voice input for longer prompts. This setup works especially well for research-heavy tasks where you want to paste context from browser without alt-tabbing constantly.
Full setup
```shell
# 1. Install Termly CLI
npm install -g @termly-dev/cli

# 2. Navigate to your project
cd /path/to/your/project

# 3. Launch Claude Code through Termly
termly start --ai claude-code
# A QR code appears in the terminal

# 4. Open Termly app → tap "+" → scan QR code
# Your Claude Code session appears on mobile instantly
```

For OpenCode:

```shell
termly start --ai opencode
```

Download the app:
- iOS: https://apps.apple.com/app/id6754467087
- Android: https://play.google.com/store/apps/details?id=dev.termly.app
- Website: https://termly.dev
The broader pattern
Most developer tooling assumes you're at a desk. The mental model is: computer → tool → output, consumed locally. AI coding agents are no different — they run locally, they produce output locally, they wait for local input.
The zero-knowledge relay pattern breaks that assumption without compromising the security model. Your compute stays local. Your data stays encrypted. Your UI can be anywhere. Termly is one implementation of this for AI coding agents, but the architecture — stateless relay, client-side encryption, async catchup via sequence numbers — applies to anything that needs to bridge a local process to a remote client without a cloud intermediary that can see your data.
If you're building developer tooling that needs async mobile access, that's the architecture worth stealing.