Voice & UI

Talk to your agent and use the local web interface.

Voice Mode

GitClaw supports real-time bidirectional voice via two adapters:

OpenAI Realtime (default)

  • Model: gpt-realtime-2025-08-28
  • Real-time audio streaming over WebSocket
  • Supports image input (camera frames)
  • Requires: OPENAI_API_KEY

Gemini Live

  • Model: gemini-2.0-flash
  • Alternative voice provider
  • Free tier available
  • Requires: GEMINI_API_KEY

Select the adapter with the --voice flag:

# OpenAI voice (default)
gitclaw --voice --dir ~/assistant

# Gemini voice
gitclaw --voice gemini --dir ~/assistant

Text-Only Fallback

If no voice API key is set, GitClaw still starts the web UI server but with voice disabled. A warning banner appears in the UI, mic/camera/speaker buttons are hidden, and text input routes directly to the agent via query().
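
The fallback decision can be sketched as a small pure function. This is an illustration only, not GitClaw's actual implementation: the names VoiceAdapter, VoiceStatus, and resolveVoiceStatus are hypothetical, and it assumes that only the key for the selected adapter is checked. The environment variable names are the ones listed for each adapter.

```typescript
// Hypothetical types for illustration; not part of GitClaw's documented API.
type VoiceAdapter = "openai" | "gemini";

interface VoiceStatus {
  adapter: VoiceAdapter;
  voiceEnabled: boolean;
  warning?: string; // text a UI could show in its warning banner
}

// Environment variable required by each adapter (from the adapter list).
const REQUIRED_KEY: Record<VoiceAdapter, string> = {
  openai: "OPENAI_API_KEY",
  gemini: "GEMINI_API_KEY",
};

// Assumption: only the selected adapter's key decides whether voice is enabled.
function resolveVoiceStatus(
  adapter: VoiceAdapter,
  env: Record<string, string | undefined>,
): VoiceStatus {
  const keyName = REQUIRED_KEY[adapter];
  const key = env[keyName];
  if (key !== undefined && key.trim() !== "") {
    return { adapter, voiceEnabled: true };
  }
  return {
    adapter,
    voiceEnabled: false,
    warning: `${keyName} is not set; voice is disabled, text input still works.`,
  };
}
```

In a Node process this could be called as resolveVoiceStatus("openai", process.env), with the result deciding whether the mic, camera, and speaker controls are rendered.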

Camera

  • Front/back camera toggle (mobile)
  • Captures one JPEG frame per second
  • Frames injected into conversation as images
  • Auto-captures on "memorable moments" (laughter, excitement)
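
A minimal browser-side sketch of the camera toggle and the once-per-second JPEG capture follows. It uses standard web APIs (the facingMode constraint, a canvas, and canvas.toBlob) and is not taken from GitClaw's source. The function names, the 0.8 JPEG quality, and the onFrame callback are assumptions made for illustration.

```typescript
type Facing = "user" | "environment"; // front / back camera

// Pure helper: flip between the front and back camera.
function toggleFacing(current: Facing): Facing {
  return current === "user" ? "environment" : "user";
}

// Constraints object to pass to navigator.mediaDevices.getUserMedia().
function cameraConstraints(facing: Facing): MediaStreamConstraints {
  return { video: { facingMode: facing }, audio: false };
}

// Browser-only: once per interval, draw the current video frame onto a canvas
// and hand the encoded JPEG to the caller. Returns a function that stops capture.
function startFrameCapture(
  video: HTMLVideoElement,
  onFrame: (jpeg: Blob) => void,
  intervalMs = 1000,
): () => void {
  const canvas = document.createElement("canvas");
  const ctx = canvas.getContext("2d");
  const timer = setInterval(() => {
    if (!ctx || video.videoWidth === 0) return; // stream not ready yet
    canvas.width = video.videoWidth;
    canvas.height = video.videoHeight;
    ctx.drawImage(video, 0, 0);
    canvas.toBlob(
      (blob) => {
        if (blob) onFrame(blob);
      },
      "image/jpeg",
      0.8, // assumed quality setting
    );
  }, intervalMs);
  return () => clearInterval(timer);
}
```

To switch cameras under this sketch, stop the current stream's tracks, call getUserMedia again with cameraConstraints(toggleFacing(current)), and attach the new stream to the same video element.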

Web UI

The voice server runs at http://localhost:3333 and serves the web UI.

Tabs

Tab            Features
Chat           Real-time conversation, voice controls, camera, agent vitals, file system viewer
Skills         Browse and install skills from the marketplace
Integrations   Connect Composio services (Gmail, Calendar, Slack, GitHub)
Communication  Telegram bot setup, WhatsApp connection, phone/SMS webhook
SkillFlows     Visual workflow builder: chain skills into multi-step flows
Scheduler      Create cron jobs that run prompts on a schedule
Settings       Model selection, API keys, custom base URL (saved to .env and agent.yaml)

Agent Vitals

Real-time metrics displayed in the Chat tab:

  • CPU — Delta-based percentage (blue)
  • Memory — RSS in MB (orange)
  • Tokens — Total tokens used in session (purple)
  • Uptime — Server uptime synced from backend (green)
  • Pulse — CPU wave visualization
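
One way a Node.js backend could produce the CPU, memory, and uptime figures is sketched below. It relies on the standard process.cpuUsage(), process.memoryUsage(), and process.uptime() calls. The Vitals shape and function names are hypothetical, and GitClaw's real sampler may differ. Token counts come from the agent session rather than the process, so they are left out.

```typescript
// Hypothetical payload shape for illustration.
interface Vitals {
  cpuPercent: number; // CPU time used since the previous sample, as % of wall time
  memoryMb: number;   // resident set size in MB
  uptimeSec: number;  // process uptime in seconds
}

// Pure helper: CPU microseconds consumed over elapsed wall-clock microseconds.
// The result can exceed 100 when the process keeps more than one core busy.
function cpuPercent(cpuMicros: number, elapsedMicros: number): number {
  if (elapsedMicros <= 0) return 0;
  return (cpuMicros / elapsedMicros) * 100;
}

// Returns a sampler; each call reports usage since the previous call (the delta).
function createVitalsSampler(): () => Vitals {
  let lastCpu = process.cpuUsage(); // cumulative user/system time in microseconds
  let lastTime = Date.now();        // wall clock in milliseconds

  return () => {
    const cpu = process.cpuUsage();
    const now = Date.now();
    const cpuMicros = cpu.user - lastCpu.user + (cpu.system - lastCpu.system);
    const elapsedMicros = (now - lastTime) * 1000;
    lastCpu = cpu;
    lastTime = now;
    return {
      cpuPercent: cpuPercent(cpuMicros, elapsedMicros),
      memoryMb: process.memoryUsage().rss / (1024 * 1024),
      uptimeSec: process.uptime(),
    };
  };
}
```

Calling the sampler on a fixed timer (for example once per second) and pushing each result to the browser would be enough to drive gauges like the ones listed above.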

Mobile Responsive

On viewports narrower than 700px, the layout adapts:

  • Tabs become a scrollable horizontal strip
  • Camera panel stacks vertically
  • Controls have 44px touch targets
  • Sidebar overlays instead of pushing content
  • All views stack vertically