A Star Trek-style **communicator badge**: a wearable, voice-first AI device you tap and talk to. Not a tricorder (that's a separate project). ComBadge is the wearable companion: tap, speak, get things done. Voice comms over Wi-Fi to a local LLM.
**Goal:** Wear it daily, talk to it, get answers back. Watch form factor is the target.
---
## Hardware Requirements
### Hard Gates (Must Have)
| Requirement | Why |
|-------------|-----|
| Microphone | Voice input |
| Speaker or audio output | Voice output |
| Battery (LiPo with charging) | Wearable power |
| IMU (accelerometer) | Tap-to-talk activation |
| Wi-Fi | Connection to host/OpenClaw |
| ESP32-S3 or equivalent | WiFi + enough RAM for PicoClaw |
### Nice-to-Have (Don't Gate On)
| Feature | Notes |
|---------|-------|
| Screen | Useful for status (glanceable), not required for voice I/O |
| Camera | Not a priority for ComBadge |
| BLE | WiFi handles connectivity |
| Built-in IMU | An IMU is a hard gate, but it can be an external part on I2C |
**Screen policy:** Screen is a nice-to-have for status display (glanceable). Do NOT rule out headless boards that meet all hard gates. LEDs or audio feedback are valid substitutes for the screen.
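The gate logic above can be written down as a small check, which keeps the screen policy honest when comparing boards. This is a minimal sketch, assuming a hypothetical feature-tag vocabulary (`mic`, `speaker`, `esp32s3_class_soc`, and so on) and hand-entered board profiles; the screen is deliberately absent from the gate set, and parts that can be added externally (such as an I2C IMU) are passed in separately.

```python
from typing import AbstractSet, FrozenSet, Set

# Hard gates from the table above. "screen" is intentionally NOT listed:
# headless boards must never be ruled out for lacking a display.
# "battery" here means LiPo charging plus a connector, not a bundled cell.
HARD_GATES: FrozenSet[str] = frozenset(
    {"mic", "speaker", "battery", "imu", "wifi", "esp32s3_class_soc"}
)


def missing_gates(onboard: AbstractSet[str],
                  external: AbstractSet[str] = frozenset()) -> Set[str]:
    """Return the hard gates not covered by onboard or externally added parts."""
    return set(HARD_GATES - onboard - external)


# Hypothetical board profiles, hand-entered for illustration.
WAVESHARE_2_06 = {"mic", "speaker", "battery", "imu", "wifi",
                  "esp32s3_class_soc", "screen"}
# XIAO ESP32S3 Sense: no onboard speaker or IMU.
XIAO_S3_SENSE = {"mic", "battery", "wifi", "ble", "camera", "esp32s3_class_soc"}

if __name__ == "__main__":
    print(missing_gates(WAVESHARE_2_06))                     # set()
    print(sorted(missing_gates(XIAO_S3_SENSE)))              # ['imu', 'speaker']
    print(missing_gates(XIAO_S3_SENSE, {"speaker", "imu"}))  # set()
```

A headless board passes as long as the gaps can be covered by external parts, which is exactly the "do not rule out headless boards" rule.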
---
## Lead Hardware: Waveshare ESP32-S3-Touch-AMOLED-2.06
**Status:** TOP CONTENDER — Daily driver form factor
**Commitment:** Not ready to commit yet — battery needs more research
**Pros:**
- PicoClaw compatible: full agent loop on-device possible
- Designed specifically for voice AI interaction
**Cons:**
- External battery (no built-in LiPo — need to source MX1.25 connector pack)
- Strap form factor = bulkier than flat badge
- Higher price than M5StickS3 (~$40-50 vs $20-25)
**Battery note:** Needs a LiPo with MX1.25 connector. 500mAh+ is reasonable for all-day wear. Budget extra for a quality LiPo pack.
---
### Alternative: Band Module (Slim Add-On)
**Concept:** A slim pod that slides into a watch band (Whoop/Polar style) — the band is just a carrier, the electronics are the pod. Worn 24/7 alongside Apple Watch.
**Goal:** Voice AI in a band form factor, not a watch.
| Component | Option |
|-----------|--------|
| **SoC** | ESP32-S3 mini (XIAO or custom) |
| **Mic** | MEMS digital mic (I2S) |
| **Speaker** | Mini speaker pointing toward wrist/forearm |
| **Feedback** | Haptic motor + RGB LEDs (no screen) |
| **IMU** | For tap-to-talk wake |
| **Battery** | Slim LiPo (initial target: 40-80mAh; sizing study target: 150mAh) |
| **PMIC** | AXP2101 or similar for aggressive power management |
**Why this form factor:**
- Worn 24/7 like a fitness band
- Doesn't compete with Apple Watch
- Apple Watch handles fitness/notifications; this handles voice AI
- Slim profile — same as Whoop/Polar Loop
**Power management (extreme):**
- IMU in interrupt mode — device in deep sleep until tap detected
- Mic bias off until wake
- ESP32 deep sleep: ~10-20µA average
- Initial target: 40-80mAh for a full day of voice use (20-30 interactions); the sizing study raised this to 150mAh
- This is tight — needs careful power budgeting
**Audio challenge:** Speaker pointed at wrist won't be loud enough for open-air use. Sound needs to travel up the arm to your ear. May need to evaluate speaker size vs. form factor.
**LEDs:** Simple RGB for status (listening, connected, error). WS2812 or similar.
**Haptics:** Mini ERM or linear resonant actuator for tap confirmation and alerts.
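A minimal sketch of the status feedback, assuming the three states named above and hypothetical color and pulse-length choices. The one hardware detail it encodes is that WS2812 pixels take each color as bytes in green-red-blue order, which is easy to get wrong when driving them directly.

```python
from enum import Enum
from typing import Dict, List, Tuple

RGB = Tuple[int, int, int]


class BadgeState(Enum):
    LISTENING = "listening"
    CONNECTED = "connected"
    ERROR = "error"


# Hypothetical color choices; tune on real hardware.
STATUS_LED: Dict[BadgeState, RGB] = {
    BadgeState.LISTENING: (0, 0, 255),  # blue while the mic is capturing
    BadgeState.CONNECTED: (0, 255, 0),  # green once the Wi-Fi/host link is up
    BadgeState.ERROR: (255, 0, 0),      # red on any failure
}

# Hypothetical haptic pulse lengths in milliseconds (0 = no buzz).
STATUS_HAPTIC_MS: Dict[BadgeState, int] = {
    BadgeState.LISTENING: 40,   # short tick confirms the tap registered
    BadgeState.CONNECTED: 0,
    BadgeState.ERROR: 200,      # long buzz for alerts
}


def ws2812_frame(pixels: List[RGB]) -> bytes:
    """Pack RGB pixels into the GRB byte order WS2812 LEDs expect on the wire."""
    out = bytearray()
    for r, g, b in pixels:
        out += bytes((g, r, b))
    return bytes(out)
```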
**Status:** Concept phase — sizing study complete
### Band Module Sizing
**Target envelope:** 35 × 25 × 10mm
Real-world reference:
| Device | Pod Dimensions | Weight |
|--------|---------------|--------|
| Whoop 5.0 | 34.7 × 24 × 10.6mm | 26.5g |
| Whoop 4.0 | 35.97 × 25 × 10.1mm | 11.3g |
| Polar Loop | 42 × 27 × 9mm | 29g total |
Target battery: **150mAh** (slightly thicker than Whoop — doable for all-day voice AI)
**Tight spots:** The speaker protrusion is the main challenge, at ~10.5mm at the thickest point (just over the 10mm target envelope). The battery is the other dimensional limiter. The speaker audio path (wrist→ear) needs prototype testing before full commitment.
### Band Module Power Budget
Target: 150mAh for full day (~20-30 voice interactions)
**Realistic for heavy use:** 30-50mAh/day, so 150mAh gives comfortable headroom.

**Power management:**
- Mic bias OFF except during capture
- Wi-Fi OFF except while streaming
- ESP32 deep sleep between interactions
- AXP2101 handles all power-domain gating
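The 30-50mAh/day figure can be sanity-checked with a simple charge budget. This sketch assumes ~20 s awake per interaction at ~200mA average (Wi-Fi connect, capture, stream, playback) and takes the 20µA upper end of the deep-sleep range; the active numbers and the 80% usable-capacity derating are assumptions, not measurements, and should be replaced with bench data.

```python
# Rough daily charge budget for the band module. All currents and
# durations passed in below are assumptions for illustration, not measurements.

SECONDS_PER_HOUR = 3600.0


def daily_charge_mah(interactions_per_day: int,
                     active_seconds_per_interaction: float,
                     active_current_ma: float,
                     sleep_current_ua: float,
                     hours_per_day: float = 24.0) -> float:
    """Estimated charge drawn per day, in mAh (awake time + deep-sleep time)."""
    active_hours = (interactions_per_day * active_seconds_per_interaction
                    / SECONDS_PER_HOUR)
    sleep_hours = hours_per_day - active_hours
    active_mah = active_current_ma * active_hours
    sleep_mah = (sleep_current_ua / 1000.0) * sleep_hours  # uA -> mA
    return active_mah + sleep_mah


def days_of_runtime(battery_mah: float, daily_mah: float,
                    usable_fraction: float = 0.8) -> float:
    """Days per charge, derating the pack to an assumed usable fraction."""
    return battery_mah * usable_fraction / daily_mah


if __name__ == "__main__":
    # Heavy day: 30 interactions, ~20 s awake each at ~200 mA,
    # 20 uA in deep sleep the rest of the time.
    daily = daily_charge_mah(30, 20.0, 200.0, 20.0)
    print(f"daily draw: {daily:.1f} mAh")
    print(f"150 mAh pack: {days_of_runtime(150.0, daily):.1f} days")
```

Under these assumptions a heavy day draws about 34mAh, and deep sleep contributes under 0.5mAh of it. The radio-on time per interaction, not the sleep current, is therefore the number to measure first.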
### Waveshare 2.06 Fit Assessment
**The 2.06" is the recommended Waveshare model:** it has all the features (dual mics, speaker, IMU, RTC, TF card slot) in a proper watch form factor.
| Hard Gate | Status |
|-----------|--------|
| Mic | ✅ Dual mics |
| Speaker | ✅ Onboard |
| Battery | ✅ External (MX1.25 header) — need to source LiPo |
| IMU | ✅ QMI8658 |
| WiFi | ✅ 802.11 b/g/n |
| SoC | ✅ ESP32-S3 |
---
## Architecture
### Activation: IMU Tap Detection
```
User taps watch → IMU detects acceleration spike
→ Mic turns on, watch starts listening
→ User speaks → audio streams to host
→ Response → speaker + screen confirms
```
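The tap step can be prototyped in software against recorded accelerometer data before its thresholds are moved into the IMU's interrupt configuration. A minimal sketch, assuming samples in units of g and a hypothetical 1.5 g threshold with a 300 ms refractory window (both to be tuned): a tap is a sample whose magnitude departs from 1 g (gravity) by more than the threshold, and the refractory window keeps one physical tap, which rings for several samples, from being counted twice.

```python
import math
from typing import Iterable, List, Tuple

GRAVITY_G = 1.0  # at rest the accelerometer magnitude reads ~1 g


def detect_taps(samples: Iterable[Tuple[float, float, float]],
                sample_rate_hz: float,
                threshold_g: float = 1.5,
                refractory_s: float = 0.3) -> List[int]:
    """Return the sample indices where a tap starts."""
    refractory_samples = int(refractory_s * sample_rate_hz)
    taps: List[int] = []
    next_allowed = 0  # first index at which a new tap may be reported
    for i, (ax, ay, az) in enumerate(samples):
        magnitude = math.sqrt(ax * ax + ay * ay + az * az)
        if i >= next_allowed and abs(magnitude - GRAVITY_G) > threshold_g:
            taps.append(i)
            next_allowed = i + refractory_samples
    return taps
```

On the badge, the equivalent comparison would run inside the IMU as its wake interrupt so the ESP32 can stay in deep sleep; this host-side version is only for choosing thresholds.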
**Why IMU tap over wake word:**
- Mic bias off until tap fires = much lower power
- IMU in interrupt mode + ESP32 deep sleep ≈ microamps average
- Tap-to-talk is more badge-authentic (Star Trek style)
- No false triggers from ambient conversation
**Wake word is optional** — if you want always-listening, add a TinyML model. For power savings, IMU tap is the default.
### Two Modes
#### Mode A: Watch as Thin Client
Watch captures audio, streams to a nearby host (Tricorder M10, home server, OpenClaw instance). Host handles STT → LLM → TTS. Watch outputs audio + status.
**Pros:** Simple on-watch logic, fast response, no LLM complexity on ESP32
**Cons:** Network-dependent
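A minimal host-side sketch of Mode A, assuming the badge streams 16 kHz, 16-bit mono PCM and the host calls Ollama's `/api/generate` endpoint on its default port. STT and TTS are passed in as plain functions because those engines are not chosen yet; `handle_utterance`, `ollama_llm`, and the `qwen2.5:0.5b` model tag are illustrative assumptions. The assumed PCM format works out to 32kB/s (about 256kbit/s), well within Wi-Fi capacity.

```python
import json
import urllib.request
from typing import Callable, Tuple

# Ollama's default local endpoint for one-shot text generation.
OLLAMA_GENERATE_URL = "http://localhost:11434/api/generate"

# Assumed badge audio format: 16 kHz, 16-bit, mono PCM.
SAMPLE_RATE_HZ = 16_000
BYTES_PER_SAMPLE = 2
STREAM_BYTES_PER_SECOND = SAMPLE_RATE_HZ * BYTES_PER_SAMPLE  # 32,000 B/s


def build_generate_request(prompt: str, model: str) -> Tuple[str, bytes]:
    """Return (url, JSON body) for a non-streaming /api/generate call."""
    body = {"model": model, "prompt": prompt, "stream": False}
    return OLLAMA_GENERATE_URL, json.dumps(body).encode("utf-8")


def ollama_llm(prompt: str, model: str = "qwen2.5:0.5b") -> str:
    """Send the prompt to a local Ollama server and return the generated text."""
    url, body = build_generate_request(prompt, model)
    req = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req, timeout=60) as resp:
        return json.loads(resp.read().decode("utf-8"))["response"]


def handle_utterance(pcm: bytes,
                     stt: Callable[[bytes], str],
                     llm: Callable[[str], str],
                     tts: Callable[[str], bytes]) -> bytes:
    """Host-side Mode A pipeline: badge audio in, reply audio out."""
    transcript = stt(pcm)
    reply = llm(transcript)
    return tts(reply)
```

Keeping the stages injectable means the badge-facing protocol can be tested with stub STT/TTS before any speech engine is installed on the host.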
#### Mode B: Watch as Full Agent (PicoClaw)
Watch runs PicoClaw, connects directly to Ollama. Full autonomous agent loop on-watch.
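**Pros:** No intermediate host (Tricorder M10, home server, or OpenClaw instance); the agent loop runs on the watch, which only needs a reachable Ollama endpoint
**Cons:** ESP32-S3 is constrained. The agent loop, prompts, and tool state must fit in 8MB PSRAM; on-device models are limited to tiny quantized ones, so anything in the Qwen 0.5B class has to run on the Ollama host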
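The PSRAM limit can be checked with back-of-envelope arithmetic: model weights take roughly parameters × bits per weight ÷ 8 bytes. This sketch counts weights only (KV cache, activations, and the agent runtime come on top), so it understates the real requirement.

```python
PSRAM_BYTES = 8 * 1024 * 1024  # 8MB PSRAM on the ESP32-S3 boards discussed here


def weight_bytes(n_params: float, bits_per_weight: float) -> float:
    """Approximate size of model weights alone, in bytes."""
    return n_params * bits_per_weight / 8


if __name__ == "__main__":
    qwen_half_b_q4 = weight_bytes(0.5e9, 4)  # 0.5B params at 4 bits each
    print(f"{qwen_half_b_q4 / 1e6:.0f} MB of weights")           # 250 MB
    print(f"{qwen_half_b_q4 / PSRAM_BYTES:.1f}x the 8MB PSRAM")  # 29.8x
    # Largest 4-bit model whose weights alone would fit in PSRAM:
    print(f"{PSRAM_BYTES * 8 / 4 / 1e6:.1f}M params")            # 16.8M
```

Even before runtime overhead, a 0.5B model at 4 bits is about 30 times the available PSRAM, which is why inference stays on the Ollama host in Mode B.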
**Decision:** **Mode B (ESP-Claw) as primary plan.** Start there and fall back to Mode A if needed. The band module's design lead time leaves room to evaluate ESP-Claw on dev hardware before committing to the custom hardware.
### ESP-Claw (NEW — 2026-04-23)
**Espressif's official** agent framework for ESP32-S3. Released 2026-04-23.
**Why it matters:** Validates that full local agent loop is possible on 8MB PSRAM. Inspired by OpenClaw. Directly integrates with OpenClaw via MCP.
**MCP:** Acts as both MCP server (exposes hardware) and MCP client (calls external agents)
**Memory:** On-chip structured long-term memory — preferences and routines extracted from conversations
**Offline:** Lua scripts execute deterministically even offline
**Event-driven:** Local event bus drives sensor triggers, millisecond-latency response
**Flash:** Web Flasher available or build from source
**Relevance to ComBadge:**
- MCP server mode: ESP-Claw on band module exposes hardware to OpenClaw host
- MCP client mode: ESP-Claw calls OpenClaw for heavy reasoning while handling local control
- Qwen backend: can connect to local Ollama instance (no cloud required)
- Already hardware-validated on M5Stack StickS3 (the same ESP32-S3 module we considered)
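In MCP server mode, an external agent such as OpenClaw invokes badge hardware through MCP's JSON-RPC 2.0 `tools/call` method. A minimal sketch of the dispatch step only, assuming hypothetical tools `set_status_led` and `buzz`; a real MCP server also handles `initialize` and `tools/list`, and nothing here reflects ESP-Claw's actual implementation or tool names.

```python
import json
from typing import Any, Callable, Dict


# Hypothetical hardware tools; names and arguments are illustrative only.
def set_status_led(args: Dict[str, Any]) -> str:
    return f"led set to {args.get('state', 'idle')}"


def buzz(args: Dict[str, Any]) -> str:
    return f"buzzed {int(args.get('ms', 40))} ms"


TOOLS: Dict[str, Callable[[Dict[str, Any]], str]] = {
    "set_status_led": set_status_led,
    "buzz": buzz,
}


def handle_jsonrpc(raw: str) -> str:
    """Answer an MCP tools/call request (JSON-RPC 2.0) from an external agent."""
    req = json.loads(raw)
    req_id = req.get("id")
    if req.get("method") != "tools/call":
        # Standard JSON-RPC code for an unsupported method.
        error = {"code": -32601, "message": "Method not found"}
        return json.dumps({"jsonrpc": "2.0", "id": req_id, "error": error})
    params = req.get("params", {})
    tool = TOOLS.get(params.get("name"))
    if tool is None:
        # Unknown tool is reported as a JSON-RPC invalid-params error.
        error = {"code": -32602, "message": f"Unknown tool: {params.get('name')}"}
        return json.dumps({"jsonrpc": "2.0", "id": req_id, "error": error})
    text = tool(params.get("arguments", {}))
    result = {"content": [{"type": "text", "text": text}], "isError": False}
    return json.dumps({"jsonrpc": "2.0", "id": req_id, "result": result})
```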
**vs PicoClaw:** ESP-Claw is Espressif's production-grade version. PicoClaw is community/M5Stack. Feature set is similar but ESP-Claw has official support, MCP native, and on-chip memory architecture.
**Status:** Released 2026-04-23; evaluate as the primary Mode B path
### Voice Pipeline (on Host)
| Component | Option |
|-----------|--------|
| **STT** | Whisper, run locally (e.g. whisper.cpp) |
| **LLM** | Ollama (local, e.g. Qwen 0.5B, Gemma 4 on capable hosts) |
- Dual mics on the Waveshare are better for voice capture than the M5StickS3's single mic
- Added hard gates vs nice-to-have framework
- Screen is nice-to-have, not a gate — do NOT rule out headless boards
- IMU tap detection adopted as default activation (replaces wake word)
- XIAO Sense added as alternative candidate (headless; meets hard gates once an external speaker and IMU are added)
- Local-on-badge capabilities defined
### 2026-04-19 — XIAO ESP32S3 Sense Evaluation
- Seeed XIAO ESP32S3 Sense evaluated against M5StickS3
- Sense has mic, WiFi, BLE, and battery management, but not every hard gate
- No speaker, IMU, or display: speaker and IMU would need external components
- M5StickS3 remains lead due to all-in-one packaging
### 2026-04-15 — Architecture Decision
Start with badge as thin client streaming to local Ollama. Reduces on-device complexity while voice pipeline is proven. Full PicoClaw agent mode is the goal but not the starting point.
### 2026-03-30 — Hardware Candidate
M5StickS3 chosen as lead candidate over AtomS3 because it has built-in audio (mic + speaker). AtomS3 lacks audio, making it a component rather than standalone solution.