
OpenQuack

openquack · larryxiao/openquack · ★ 28 · last commit 2026-05-26

Primitive shape
No installable primitives
00

Summary

OpenQuack — Summary

OpenQuack is a macOS menu-bar voice dictation app that runs Whisper entirely on-device via WhisperKit on Apple Silicon; no audio, text, or telemetry leaves the device. Press a configurable hotkey (default ⌃⇧Space), speak, and press again; the transcript appears at the cursor in any app. It supports 99 languages, custom vocabulary dictionaries, auto-paste, smart formatting, and silence-based auto-stop. It is distributed as a Homebrew Cask or DMG and is written in Swift/SwiftUI for macOS 13+.

The "remote control" angle relevant to this batch is its roadmap integration with AI coding agent sessions: the SPEC-031a design doc covers "voice reply to live sessions," meaning OpenQuack will allow voice-driven interaction with running Claude Code sessions.

Compared to the seeds, OpenQuack has no parallel: none of the 11 seeds address voice input or local speech-to-text as an agent input surface. It is closest to agent-os in its "markdown scaffold, zero primitives" philosophy but operates in an entirely different dimension (speech input vs. structured memory files).

01

Overview

OpenQuack — Overview

Origin

Personal project by larryxiao. v2.0.0-alpha.15 analyzed. 3 contributors. Active development as of 2026-05-26.

Philosophy

"Local. Everything runs on your device — recording, transcription, optional polish. Nothing leaves: no audio, no text, no telemetry, no signup. Confidential work stays confidential, by construction."

"Fast, especially on long clips. Whisper streams while you speak, so a 5-minute dictation finishes in about 3 seconds after you stop."

"OpenQuack is AI-native open source — every PR cites a SPEC, atomic tasks come from the roadmap, the workflow is friendly to coding agents at scale."

Privacy Contract

The README makes a strict privacy commitment: no audio or text ever leaves the device. This is a design constraint that shapes the entire architecture (on-device Whisper, no cloud API).

Agent Integration Angle

OpenQuack is being extended as a voice control surface for AI coding agents. SPEC-031a ("voice reply to live sessions") is merged as a design doc — voice will be able to send commands to running agent sessions, turning the dictation tool into a voice remote control for coding agents. The existing "voice-launched agent" feature (as of v2.0.0-alpha.15) already integrates with Claude Code.

What It Is Not

OpenQuack is not an agent framework, not a workflow harness, and not a memory system. It is a speech input surface that can connect to coding agents.

02

Architecture

OpenQuack — Architecture

Distribution

  • Homebrew Cask: brew tap larryxiao/openquack https://github.com/larryxiao/openquack && brew install --cask openquack
  • DMG download from GitHub Releases
  • No npm/pip package

Platform

  • macOS 13+ only
  • Apple Silicon required for WhisperKit performance (Intel not mentioned)
  • Swift/SwiftUI + WhisperKit framework

Directory Structure

Sources/           — Swift source code
Tests/             — Swift tests
Casks/             — Homebrew Cask definition
scripts/           — Build/packaging scripts
docs/              — Architecture, benchmarks, roadmap, vision, i18n
  ARCHITECTURE.md
  BENCHMARKS.md
  ROADMAP.md
  VISION.md
  DEVELOPMENT.md
  INSTALL.md
  TUTORIAL.md
bench/             — Benchmark harness
Package.swift      — Swift package definition
Package.resolved   — Dependency lock

Required Runtime

  • macOS 13+
  • Apple Silicon (for WhisperKit GPU acceleration)
  • No network connection required at runtime

Dependencies

  • WhisperKit (Argmax) — On-device Whisper inference on Apple Silicon
  • KeyboardShortcuts (Sindre Sorhus) — Hotkey machinery
  • OpenAI Whisper model (stored locally, loaded on demand)

Target AI Tools

  • Primary: Claude Code (via voice-launched agent feature, SPEC-031a roadmap)
  • General: any app that accepts keyboard input (voice appears at cursor)

Install

brew tap larryxiao/openquack https://github.com/larryxiao/openquack
brew install --cask openquack
# Or download DMG from GitHub Releases

Configuration

Settings stored locally (macOS user defaults or config file — exact path not disclosed in README). Configurable: hotkey, language, auto-paste toggle, custom dictionary, silence-stop timeout, launch at login.

03

Components

OpenQuack — Components

Core Components

Component             Description
Menu-bar app          macOS status bar icon; the main entry point
Hotkey listener       Configurable dictation hotkey (default ⌃⇧Space); agent kickoff on ⌃Space
Recording engine      Microphone capture via AVFoundation
WhisperKit inference  On-device Whisper ASR on Apple Silicon
Transcript formatter  Capitalisation, end-punctuation, um/uh cleanup
Custom dictionary     User-defined vocabulary (proper nouns, project names)
Auto-paste            Simulates keyboard input to active app or falls back to clipboard
Language selector     99 languages, auto-detect
Silence detector      Auto-stop after configurable silence period
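
The transcript formatter's three listed jobs (capitalisation, end-punctuation, um/uh cleanup) can be illustrated with a short sketch. This is Python for readability, not the project's Swift code; the function name, the filler pattern, and the choice of a full stop as the default end mark are assumptions made for illustration only.

```python
import re

# Hypothetical filler pattern: the README names only "um/uh" cleanup.
# An optional trailing comma or full stop is removed together with the filler.
FILLERS = re.compile(r"\b(?:um+|uh+)\b[,.]?\s*", re.IGNORECASE)


def format_transcript(raw: str) -> str:
    """Drop fillers, collapse whitespace, capitalise, and add end punctuation."""
    text = FILLERS.sub("", raw)
    text = re.sub(r"\s+", " ", text).strip()
    if not text:
        return ""
    # Capitalise only the first character; leave the rest untouched so that
    # proper nouns produced by the recogniser keep their casing.
    text = text[0].upper() + text[1:]
    if text[-1] not in ".!?":
        text += "."
    return text
```

For example, `format_transcript("um so the build uh fails on main")` yields `"So the build fails on main."`. Word boundaries in the pattern keep words such as "umbrella" intact.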

Roadmap Components (Not Yet Shipped)

Component                                 Status
In-context transcription                  Deferred; reads surrounding text for disambiguation
Thinking mode                             Deferred; opt-in second pass through local LLM (Ollama/MLX-LM)
Voice reply to live sessions (SPEC-031a)  Design doc merged; implementation to land across the next few releases

Agent Integration (Current)

  • Voice-launched agent: speak a prompt to launch a Claude Code session. Dictation keeps the default hotkey (⌃⇧Space); kickoff uses ⌃Space
  • Blank-audio kickoffs no longer burn a Claude session (v2.0.0-alpha.15 fix)
  • Curated Whisper-bias dictionary: Claude, Anthropic, macOS proper nouns — restore with one click

No Framework Primitives

OpenQuack ships zero slash-commands, skills, agents, hooks, or MCP servers. It is purely a native macOS application with no Claude Code plugin structure.

05

Prompts

OpenQuack — Prompts

OpenQuack is a native macOS application, not a prompt framework. It contains no agent prompt files. The AGENTS.md provides guidance for AI contributors.

Excerpt 1: AGENTS.md Contribution Guidance

From AGENTS.md:

See CLAUDE.md for project architecture, commands, and conventions.

The AGENTS.md defers to CLAUDE.md — a single-line delegation pattern common in AI-native open source projects.

Excerpt 2: README — Privacy Contract (behavioral specification)

From README:

Nothing leaves your device — audio, text, nothing. Recording and transcription are fully local. Always.
No analytics, no telemetry, no signup.

This is an immutable behavioral constraint embedded in the product description rather than a prompt.

Excerpt 3: Release Note (v2.0.0-alpha.15) — Feature Specification Pattern

From README release notes:

Polish on the voice-launched agent. Dictation hotkey default back to ⌃⇧Space (kickoff moves to ⌃Space), 
blank-audio kickoffs no longer burn a claude session, a curated Whisper-bias dictionary ships in Settings 
(Claude, Anthropic, macOS, …) — restore with one click. SPEC-031a (voice reply to live sessions) merged 
as a design doc; implementation lands across the next few releases.

Prompting technique: SPEC-driven development — each feature is linked to a SPEC document ID, creating traceability between design docs and shipped behavior. This mirrors the openspec seed's delta-spec approach but applied to a native app rather than AI agent instructions.

Note

No docs/VISION.md or spec files were fetched (they exist on the GitHub repo but were not retrieved in this analysis). The SPEC-031a design doc mentioned in release notes represents the closest thing to a "prompt" — it specifies how voice should integrate with live coding sessions.

09

Uniqueness

OpenQuack — Uniqueness

differs_from_seeds

OpenQuack has no meaningful parallel in the 11 seeds. None of the seeds — superpowers, spec-kit, claude-flow, openspec, BMAD-METHOD, taskmaster-ai, agent-os, kiro, claude-conductor, spec-driver, ccmemory — address voice input as an agent interaction surface. The closest analogy is agent-os in philosophy (simple, focused, no-framework primitives), but they solve entirely different problems. OpenQuack is a macOS app, not a Claude Code plugin or methodology. The only relevant connection to the seeds is the roadmap SPEC-031a feature that will allow voice to reach live Claude Code sessions — at that point it would become a voice-input remote-control surface for any framework that runs in a terminal.

Positioning

OpenQuack sits at the input layer of the human-agent interaction stack: it converts voice to text and injects it into any app. In the remote-control batch, it represents the voice channel versus the Telegram channel (CCBot), the terminal TUI channel (Sidecar), and the web/mobile channel (IM.codes). Its competitive differentiation is privacy-first local execution, in contrast to Wispr Flow and Typeless (both cloud-based and closed-source).

Observable Failure Modes

  1. macOS-only: no Windows or Linux support
  2. Apple Silicon requirement: the README does not mention Intel Macs, so behavior and performance there are unknown
  3. Alpha status: v2.0.0-alpha.15, not production-stable
  4. No live-session integration yet: SPEC-031a is design-doc-only; apart from the voice-launched kickoff, the current version just produces text the user must manually send
  5. 3-contributor project: limited maintenance bandwidth
  6. WER in noise: ~6.3% in office noise (vs. ~2.6% baseline), roughly one word in sixteen, which is meaningful for technical dictation

What Is Genuinely Novel

  • Only on-device voice dictation tool explicitly designed for AI coding agent workflows
  • Privacy-by-construction (no audio/text ever leaves device — enforced architecturally, not just policy)
  • SPEC-driven development workflow makes it dogfood for AI-native development methodology
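  • AI-native development stance: the README describes OpenQuack as "AI-native open source". The IM.codes README voices a similar philosophy ("I haven't written any code myself — it was built almost entirely by Claude Code"), but that quote is not from OpenQuack

04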

Workflow

OpenQuack — Workflow

Dictation Workflow

  1. Press hotkey (⌃⇧Space)
  2. Speak
  3. Press hotkey again (or silence auto-stops)
  4. WhisperKit transcribes on-device
  5. Formatted transcript auto-pasted at cursor (or copied to clipboard)
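
The toggle-plus-auto-stop behavior in the steps above can be modelled as a two-state machine. The following Python sketch is illustrative only and is not taken from the OpenQuack source; the class name, the action strings, and the 2-second default timeout are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum, auto


class State(Enum):
    IDLE = auto()
    RECORDING = auto()


@dataclass
class DictationSession:
    silence_timeout: float = 2.0  # seconds; hypothetical default
    state: State = State.IDLE
    _last_voice_at: float = 0.0

    def on_hotkey(self, now: float) -> str:
        """First press starts recording; second press stops and transcribes."""
        if self.state is State.IDLE:
            self.state = State.RECORDING
            self._last_voice_at = now
            return "start_recording"
        self.state = State.IDLE
        return "stop_and_transcribe"

    def on_audio_frame(self, now: float, is_speech: bool) -> str:
        """Auto-stop once no speech has been seen for `silence_timeout` seconds."""
        if self.state is not State.RECORDING:
            return "ignore"
        if is_speech:
            self._last_voice_at = now
            return "continue"
        if now - self._last_voice_at >= self.silence_timeout:
            self.state = State.IDLE
            return "stop_and_transcribe"
        return "continue"
```

Passing timestamps in explicitly, rather than reading a clock inside the class, keeps the sketch deterministic and easy to test.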

Agent Interaction Workflow (Current)

  1. Open Claude Code terminal or any agent chat interface
  2. Press hotkey to dictate a message
  3. Transcript appears in the agent's input field
  4. User presses Enter (no automatic submission)

Voice-Launched Agent Workflow (v2.0.0-alpha.15)

  1. Press kickoff hotkey (⌃Space)
  2. Speak the task
  3. OpenQuack launches a Claude Code session with the transcript as the initial prompt
  4. If the recording is blank, no session is launched (blank-audio guard)
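
How the blank-audio guard decides is not described in the README. One plausible approach, sketched here in Python under that caveat, is to require both a minimum signal level and a non-empty transcript before launching anything. The function names and the `min_rms` threshold are hypothetical.

```python
import math
from typing import Sequence


def rms(samples: Sequence[float]) -> float:
    """Root-mean-square level of samples normalised to the range [-1.0, 1.0]."""
    if not samples:
        return 0.0
    return math.sqrt(sum(x * x for x in samples) / len(samples))


def should_launch_session(
    samples: Sequence[float],
    transcript: str,
    min_rms: float = 0.01,  # hypothetical threshold, not an OpenQuack setting
) -> bool:
    """Launch only when the audio is audible and the transcript has content."""
    if rms(samples) < min_rms:
        return False
    return bool(transcript.strip())
```

Checking the signal level first means a silent recording can be rejected without relying on the recogniser's output.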

Phase → Artifact Map

Phase               Artifact
Recording           In-memory audio buffer (never persisted)
Transcription       In-memory text string
Auto-paste          Keyboard event injected into active app
Clipboard fallback  macOS pasteboard content
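
The last two rows of the map describe a paste-then-fallback decision. A minimal Python sketch of that control flow follows, with the two OS-level operations injected as callables so the logic stays platform-neutral. The function and parameter names are hypothetical and do not reflect OpenQuack's actual API.

```python
from typing import Callable


def deliver_transcript(
    text: str,
    paste_at_cursor: Callable[[str], bool],    # returns True if the paste succeeded
    copy_to_clipboard: Callable[[str], None],
    auto_paste: bool = True,
) -> str:
    """Try to paste at the cursor; otherwise leave the text on the clipboard."""
    if not text:
        return "nothing"
    if auto_paste and paste_at_cursor(text):
        return "pasted"
    copy_to_clipboard(text)
    return "clipboard"
```

With auto-paste turned off, or when the paste callable reports failure, the transcript still ends up somewhere the user can retrieve it.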

Approval Gates

None. OpenQuack has no approval gates — it is a synchronous input tool.

AI-Native Development Workflow

OpenQuack itself is developed "AI-native open source" style: every PR cites a SPEC, atomic tasks come from the roadmap. SPEC IDs like SPEC-031a appear in release notes, connecting features to design documents.

06

Memory Context

OpenQuack — Memory & Context

Memory Architecture

OpenQuack has no memory system. Audio is captured in memory, transcribed, and discarded. No transcript history is stored (as a privacy guarantee).

State Persistence

  • User settings (hotkey preference, language selection, custom dictionary, auto-paste toggle) — stored in macOS user defaults or app-specific config; exact paths not exposed
  • Custom vocabulary dictionary — stored locally on device
  • No session history, no transcript log

Context Injection (Roadmap)

The "In-context transcription" roadmap feature (deferred) would read surrounding text before transcribing, using it to disambiguate domain terms. This would be the closest thing to context-awareness — reading from the active app's text field.

Cross-Session Handoff

None. Each dictation is independent; there is no concept of a "session" in OpenQuack.

Compaction

Not applicable.

Summary

OpenQuack is a stateless voice input tool. Its memory footprint is ~120 MB RAM at idle. The Whisper model lives on disk and only loads on hotkey press. Privacy-by-design means zero persistence of any user speech or text data.

07

Orchestration

OpenQuack — Orchestration

Multi-Agent Support

No. OpenQuack is a single-user, single-session input tool. It does not coordinate multiple agents.

Orchestration Pattern

None. It is a keyboard-input augmentation tool.

Execution Mode

Event-driven (hotkey-triggered). Dormant between hotkey presses.

Isolation Mechanism

Not applicable (no agent execution).

Multi-Model Support

No. The ASR model (Whisper) is fixed. The "Thinking mode" roadmap feature would optionally route through Ollama/MLX-LM as a local LLM polish pass, but this is not shipped.

Agent Integration Mode (Current + Roadmap)

Currently: OpenQuack produces text that the user pastes into any agent interface. The agent is unmodified.

Roadmap (SPEC-031a): "Voice reply to live sessions" — OpenQuack will be able to send voice-transcribed messages directly to running Claude Code (and possibly other) sessions, making it a voice-controlled remote for the session rather than just a dictation tool.

Consensus Mechanism

None.

Prompt Chaining

No.

08

UI CLI Surface

OpenQuack — UI & CLI Surface

CLI Binary

None. OpenQuack is a macOS app, not a CLI tool. No binary exposed in PATH.

Local UI

  • Type: macOS menu-bar app (status bar icon)
  • Tech stack: Swift/SwiftUI
  • Features:
    • Menu-bar popover showing recording status ("Listening"), live level meter, last transcript
    • "Pasted at cursor" confirmation message
    • Settings panel: hotkey customization, language selection, custom dictionary, auto-paste toggle
    • Launch at login toggle

Observability

  • ~120 MB RAM at idle
  • Whisper model loads on-demand (only when hotkey pressed)
  • No telemetry, no logging to external services
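
Loading the model only on first use is a standard lazy-initialisation pattern, and it is consistent with the low idle footprint noted above. The Python sketch below shows the general idea; the `LazyModel` class and its `load_count` counter are hypothetical and say nothing about how WhisperKit or OpenQuack manage model lifetime.

```python
from typing import Callable, Generic, TypeVar

T = TypeVar("T")


class LazyModel(Generic[T]):
    """Defer an expensive load until the first call to get()."""

    def __init__(self, loader: Callable[[], T]) -> None:
        self._loader = loader
        self._loaded = False
        self._model = None
        self.load_count = 0  # for illustration: how many times the loader ran

    def get(self) -> T:
        if not self._loaded:
            self._model = self._loader()
            self._loaded = True
            self.load_count += 1
        return self._model
```

Constructing the wrapper costs nothing; the loader runs once, on the first `get()`, and later calls reuse the same object.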

IDE Integration

None directly. OpenQuack works in any app that accepts keyboard input, including VS Code, terminal apps, JetBrains IDEs, etc. — but via system-level auto-paste, not via IDE plugin APIs.

Cross-Tool Portability

High by nature — it operates at the OS input layer (simulates keyboard events). Works with any application.

Benchmarks

From docs/BENCHMARKS.md (referenced, not fetched):

  • ~2.6% word-error rate on real human speech (baseline M4/16GB)
  • ~6.3% WER in realistic office noise
  • 5-minute dictation finishes ~3 seconds after speaking stops
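
For readers unfamiliar with the metric, word-error rate is the word-level edit distance (substitutions, insertions, deletions) divided by the number of reference words. The Python sketch below computes it in the standard way. The lowercase and whitespace-split normalisation is an assumption; docs/BENCHMARKS.md was not fetched, so the project's exact scoring rules are unknown.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference word count."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion (reference word missing)
                curr[j - 1] + 1,     # insertion (extra hypothesis word)
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)
```

One wrong word in a four-word reference, as in `word_error_rate("open the pull request", "open a pull request")`, gives 0.25. At the reported ~2.6%, that corresponds to roughly one error per 38 words.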
