OpenQuack

openquack · larryxiao/openquack · ★ 28 · last commit 2026-05-26

Primitive shape

No installable primitives

Summary

OpenQuack — Summary

OpenQuack is a macOS menu-bar voice dictation app that runs Whisper entirely on-device via WhisperKit on Apple Silicon — no audio, text, or telemetry ever leaves the device. Press a configurable hotkey (default ⌃⇧Space), speak, press again; the transcript appears at the cursor in any app. It supports 99 languages, custom vocabulary dictionaries, auto-paste, smart formatting, and silence-based auto-stop. The "remote control" angle relevant to this batch is its roadmap integration with AI coding agent sessions: the SPEC-031a design doc covers "voice reply to live sessions," meaning OpenQuack will allow voice-driven interaction with running Claude Code sessions. Distributed as a Homebrew Cask or DMG; Swift/SwiftUI on macOS 13+. Compared to the seeds, OpenQuack has no parallel — none of the 11 seeds address voice input or local speech-to-text as an agent input surface; it is closest to agent-os in its "markdown scaffold, zero primitives" philosophy but operates in an entirely different dimension (speech input vs. structured memory files).

Overview

OpenQuack — Overview

Origin

Personal project by larryxiao. v2.0.0-alpha.15 analyzed. 3 contributors. Active development as of 2026-05-26.

Philosophy

"Local. Everything runs on your device — recording, transcription, optional polish. Nothing leaves: no audio, no text, no telemetry, no signup. Confidential work stays confidential, by construction."

"Fast, especially on long clips. Whisper streams while you speak, so a 5-minute dictation finishes in about 3 seconds after you stop."

"OpenQuack is AI-native open source — every PR cites a SPEC, atomic tasks come from the roadmap, the workflow is friendly to coding agents at scale."

Privacy Contract

The README makes a strict privacy commitment: no audio or text ever leaves the device. This is a design constraint that shapes the entire architecture (on-device Whisper, no cloud API).

Agent Integration Angle

OpenQuack is being extended as a voice control surface for AI coding agents. SPEC-031a ("voice reply to live sessions") is merged as a design doc — voice will be able to send commands to running agent sessions, turning the dictation tool into a voice remote control for coding agents. The existing "voice-launched agent" feature (as of v2.0.0-alpha.15) already integrates with Claude Code.

What It Is Not

OpenQuack is not an agent framework, not a workflow harness, and not a memory system. It is a speech input surface that can connect to coding agents.

Architecture

OpenQuack — Architecture

Distribution

Homebrew Cask: brew tap larryxiao/openquack https://github.com/larryxiao/openquack && brew install --cask openquack
DMG download from GitHub Releases
No npm/pip package

Platform

macOS 13+ only
Apple Silicon required for WhisperKit performance (Intel not mentioned)
Swift/SwiftUI + WhisperKit framework

Directory Structure

Sources/           — Swift source code
Tests/             — Swift tests
Casks/             — Homebrew Cask definition
scripts/           — Build/packaging scripts
docs/              — Architecture, benchmarks, roadmap, vision, i18n
  ARCHITECTURE.md
  BENCHMARKS.md
  ROADMAP.md
  VISION.md
  DEVELOPMENT.md
  INSTALL.md
  TUTORIAL.md
bench/             — Benchmark harness
Package.swift      — Swift package definition
Package.resolved   — Dependency lock

Required Runtime

macOS 13+
Apple Silicon (for WhisperKit GPU acceleration)
No network connection required at runtime

Dependencies

WhisperKit (Argmax) — On-device Whisper inference on Apple Silicon
KeyboardShortcuts (Sindre Sorhus) — Hotkey machinery
OpenAI Whisper model (stored locally, loaded on demand)

Target AI Tools

Primary: Claude Code (via voice-launched agent feature, SPEC-031a roadmap)
General: any app that accepts keyboard input (voice appears at cursor)

Install

brew tap larryxiao/openquack https://github.com/larryxiao/openquack
brew install --cask openquack
# Or download DMG from GitHub Releases

Configuration

Settings stored locally (macOS user defaults or config file — exact path not disclosed in README). Configurable: hotkey, language, auto-paste toggle, custom dictionary, silence-stop timeout, launch at login.
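
As a rough illustration of the configurable surface listed above, the options could be modelled as a small record that is serialized locally. This is a hypothetical sketch in Python, not the app's Swift code: the field names, the JSON format, and every default except the documented ⌃⇧Space hotkey are assumptions.

```python
from dataclasses import dataclass, field, asdict
import json


@dataclass
class DictationSettings:
    """Hypothetical model of the options the README lists as configurable."""
    hotkey: str = "ctrl+shift+space"      # documented default (⌃⇧Space)
    language: str = "auto"                # assumption: auto-detect by default
    auto_paste: bool = True               # assumption
    custom_dictionary: list[str] = field(default_factory=list)
    silence_stop_seconds: float = 2.0     # assumption: real default not stated
    launch_at_login: bool = False         # assumption


def to_json(settings: DictationSettings) -> str:
    """Serialize settings for local storage; nothing is sent off-device."""
    return json.dumps(asdict(settings), ensure_ascii=False, sort_keys=True)


def from_json(text: str) -> DictationSettings:
    """Rebuild settings from the JSON produced by to_json."""
    return DictationSettings(**json.loads(text))
```

A round trip such as `from_json(to_json(DictationSettings(language="de")))` returns an equal record, which is the property a local settings store would need.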

Components

OpenQuack — Components

Core Components

Component	Description
Menu-bar app	macOS status bar icon — the main entry point
Hotkey listener	Configurable dictation hotkey (default ⌃⇧Space) and agent kickoff hotkey (⌃Space)
Recording engine	Microphone capture via AVFoundation
WhisperKit inference	On-device Whisper ASR on Apple Silicon
Transcript formatter	Capitalisation, end-punctuation, um/uh cleanup
Custom dictionary	User-defined vocabulary (proper nouns, project names)
Auto-paste	Simulates keyboard input to active app or falls back to clipboard
Language selector	99 languages, auto-detect
Silence detector	Auto-stop after configurable silence period
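
The transcript formatter row (capitalisation, end-punctuation, um/uh cleanup) can be sketched as a small text pass. The function name, the filler pattern, and the choice of a full stop as default end punctuation are assumptions for illustration; the app's actual rules are not documented here.

```python
import re

# Hypothetical filler pattern; the app's actual list is not documented.
FILLERS = re.compile(r"\b(?:um+|uh+)\b[,.]?\s*", re.IGNORECASE)


def format_transcript(raw: str) -> str:
    """Remove um/uh fillers, capitalise the first letter, ensure end punctuation."""
    text = FILLERS.sub("", raw)
    text = re.sub(r"\s+", " ", text).strip()   # collapse leftover whitespace
    if not text:
        return ""
    text = text[0].upper() + text[1:]
    if text[-1] not in ".!?":
        text += "."
    return text
```

The word-boundary anchors keep words such as "umbrella" intact while still removing standalone fillers.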

Roadmap Components (Not Yet Shipped)

Component	Status
In-context transcription	Deferred — reads surrounding text for disambiguation
Thinking mode	Deferred — opt-in second pass through local LLM (Ollama/MLX-LM)
Voice reply to live sessions (SPEC-031a)	Design doc merged; implementation in progress

Agent Integration (Current)

Voice-launched agent: the kickoff hotkey (⌃Space) lets the user speak a prompt and launch a Claude Code session with it; the dictation hotkey keeps its default of ⌃⇧Space
Blank-audio kickoffs no longer burn a Claude session (v2.0.0-alpha.15 fix)
Curated Whisper-bias dictionary: Claude, Anthropic, macOS proper nouns — restore with one click

No Framework Primitives

OpenQuack ships zero slash-commands, skills, agents, hooks, or MCP servers. It is purely a native macOS application with no Claude Code plugin structure.

Prompts

OpenQuack — Prompts

OpenQuack is a native macOS application, not a prompt framework. It contains no agent prompt files. The AGENTS.md provides guidance for AI contributors.

Excerpt 1: AGENTS.md Contribution Guidance

From AGENTS.md:

See CLAUDE.md for project architecture, commands, and conventions.

The AGENTS.md defers to CLAUDE.md — a single-line delegation pattern common in AI-native open source projects.

Excerpt 2: README — Privacy Contract (behavioral specification)

From README:

Nothing leaves your device — audio, text, nothing. Recording and transcription are fully local. Always.
No analytics, no telemetry, no signup.

This is an immutable behavioral constraint embedded in the product description rather than a prompt.

Excerpt 3: Release Note (v2.0.0-alpha.15) — Feature Specification Pattern

From README release notes:

Polish on the voice-launched agent. Dictation hotkey default back to ⌃⇧Space (kickoff moves to ⌃Space), 
blank-audio kickoffs no longer burn a claude session, a curated Whisper-bias dictionary ships in Settings 
(Claude, Anthropic, macOS, …) — restore with one click. SPEC-031a (voice reply to live sessions) merged 
as a design doc; implementation lands across the next few releases.

Prompting technique: SPEC-driven development — each feature is linked to a SPEC document ID, creating traceability between design docs and shipped behavior. This mirrors the openspec seed's delta-spec approach but applied to a native app rather than AI agent instructions.

Note

No docs/VISION.md or spec files were fetched (they exist on the GitHub repo but were not retrieved in this analysis). The SPEC-031a design doc mentioned in release notes represents the closest thing to a "prompt" — it specifies how voice should integrate with live coding sessions.

Uniqueness

OpenQuack — Uniqueness

differs_from_seeds

OpenQuack has no meaningful parallel in the 11 seeds. None of the seeds — superpowers, spec-kit, claude-flow, openspec, BMAD-METHOD, taskmaster-ai, agent-os, kiro, claude-conductor, spec-driver, ccmemory — address voice input as an agent interaction surface. The closest analogy is agent-os in philosophy (simple, focused, no-framework primitives), but they solve entirely different problems. OpenQuack is a macOS app, not a Claude Code plugin or methodology. The only relevant connection to the seeds is the roadmap SPEC-031a feature that will allow voice to reach live Claude Code sessions — at that point it would become a voice-input remote-control surface for any framework that runs in a terminal.

Positioning

OpenQuack sits at the input layer of the human-agent interaction stack: it converts voice to text and injects it into any app. In the remote-control batch, it represents the voice channel versus the Telegram channel (CCBot), the terminal TUI channel (Sidecar), and the web/mobile channel (IM.codes). Its competitive differentiation is privacy-first local execution — competing with Wispr Flow and Typeless (both cloud, both closed).

Observable Failure Modes

macOS-only: no Windows or Linux support
Apple Silicon requirement: the README does not mention Intel Macs, so Intel support and performance are unknown
Alpha status: v2.0.0-alpha.15, not production-stable
No live-session integration yet: SPEC-031a is design-doc-only; apart from the voice-launched kickoff, the current version just produces text the user must manually send
3-contributor project: limited maintenance bandwidth
WER in noise: ~6.3% in office noise — meaningful for technical dictation

What Is Genuinely Novel

Among the tools in this analysis, the only on-device voice dictation tool explicitly designed for AI coding agent workflows
Privacy-by-construction (no audio/text ever leaves device — enforced architecturally, not just policy)
SPEC-driven development workflow makes it dogfood for AI-native development methodology
AI-native acknowledgment: "I haven't written any code myself — it was built almost entirely by Claude Code" (IM.codes README — same philosophy applies here)

Workflow

OpenQuack — Workflow

Dictation Workflow

Press hotkey (⌃⇧Space)
Speak
Press hotkey again (or silence auto-stops)
WhisperKit transcribes on-device
Formatted transcript auto-pasted at cursor (or copied to clipboard)
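
The steps above amount to a two-state toggle: the hotkey starts recording, and either a second press or a run of silence stops it and triggers transcription. A minimal sketch, assuming a caller-supplied `transcribe` function standing in for on-device Whisper and an illustrative silence limit; the class and parameter names are hypothetical.

```python
from enum import Enum, auto


class State(Enum):
    IDLE = auto()
    RECORDING = auto()


class DictationSession:
    """Toy model of the toggle: hotkey starts, hotkey or silence stops."""

    def __init__(self, transcribe, silence_limit: float = 2.0):
        self.transcribe = transcribe          # stand-in for on-device ASR
        self.silence_limit = silence_limit    # assumed timeout, in seconds
        self.state = State.IDLE
        self.chunks: list[str] = []
        self.silent_for = 0.0
        self.output: list[str] = []           # stand-in for paste-at-cursor

    def press_hotkey(self) -> None:
        if self.state is State.IDLE:
            self.chunks, self.silent_for = [], 0.0
            self.state = State.RECORDING
        else:
            self._finish()

    def feed(self, chunk: str, duration: float, is_silent: bool) -> None:
        """Feed one audio chunk; auto-stop once silence reaches the limit."""
        if self.state is not State.RECORDING:
            return
        if is_silent:
            self.silent_for += duration
            if self.silent_for >= self.silence_limit:
                self._finish()
        else:
            self.silent_for = 0.0
            self.chunks.append(chunk)

    def _finish(self) -> None:
        self.state = State.IDLE
        text = self.transcribe(self.chunks)
        self.chunks = []                      # audio buffer is discarded
        if text:
            self.output.append(text)
```

Chunks are plain strings here only to keep the sketch testable without audio; in a real recorder they would be sample buffers.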

Agent Interaction Workflow (Current)

Open Claude Code terminal or any agent chat interface
Press hotkey to dictate a message
Transcript appears in the agent's input field
User presses Enter (no automatic submission)

Voice-Launched Agent Workflow (v2.0.0-alpha.15)

Press kickoff hotkey (⌃Space)
Speak the task
OpenQuack launches a Claude Code session with the transcript as the initial prompt
Empty audio detected → session not launched (blank-audio guard)
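
The blank-audio guard in the last step can be illustrated with an energy check followed by an empty-transcript check. How OpenQuack actually detects blank audio or starts the session is not documented in the README, so the RMS threshold, the function names, and the `transcribe` and `launch` callbacks below are all assumptions.

```python
import math
from typing import Callable, Optional, Sequence


def rms(samples: Sequence[float]) -> float:
    """Root-mean-square level of PCM samples scaled to [-1.0, 1.0]."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def kickoff(
    samples: Sequence[float],
    transcribe: Callable[[Sequence[float]], str],
    launch: Callable[[str], None],
    min_rms: float = 0.01,  # assumed threshold, not taken from the project
) -> Optional[str]:
    """Launch an agent session only when the clip contains usable speech."""
    if rms(samples) < min_rms:
        return None                      # blank audio: skip transcription too
    prompt = transcribe(samples).strip()
    if not prompt:
        return None                      # ASR returned nothing usable
    launch(prompt)
    return prompt
```

Checking both the signal level and the transcript means a silent clip never reaches the agent, and neither does a noisy clip that produces no words.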

Phase → Artifact Map

Phase	Artifact
Recording	In-memory audio buffer (never persisted)
Transcription	In-memory text string
Auto-paste	Keyboard event injected into active app
Clipboard fallback	macOS pasteboard content

Approval Gates

None. OpenQuack has no approval gates — it is a synchronous input tool.

AI-Native Development Workflow

OpenQuack itself is developed "AI-native open source" style: every PR cites a SPEC, atomic tasks come from the roadmap. SPEC IDs like SPEC-031a appear in release notes, connecting features to design documents.

Memory Context

OpenQuack — Memory & Context

Memory Architecture

OpenQuack has no memory system. Audio is captured in memory, transcribed, and discarded. No transcript history is stored (as a privacy guarantee).

State Persistence

User settings (hotkey preference, language selection, custom dictionary, auto-paste toggle) — stored in macOS user defaults or app-specific config; exact paths not exposed
Custom vocabulary dictionary — stored locally on device
No session history, no transcript log

Context Injection (Roadmap)

The "In-context transcription" roadmap feature (deferred) would read surrounding text before transcribing, using it to disambiguate domain terms. This would be the closest thing to context-awareness — reading from the active app's text field.

Cross-Session Handoff

None. Each dictation is independent; there is no concept of a "session" in OpenQuack.

Compaction

Not applicable.

Summary

OpenQuack is a stateless voice input tool. Its memory footprint is ~120 MB RAM at idle. The Whisper model lives on disk and only loads on hotkey press. Privacy-by-design means zero persistence of any user speech or text data.

Orchestration

OpenQuack — Orchestration

Multi-Agent Support

No. OpenQuack is a single-user, single-session input tool. It does not coordinate multiple agents.

Orchestration Pattern

None. It is a keyboard-input augmentation tool.

Execution Mode

Event-driven (hotkey-triggered). Dormant between hotkey presses.

Isolation Mechanism

Not applicable (no agent execution).

Multi-Model Support

No. The ASR model (Whisper) is fixed. The "Thinking mode" roadmap feature would optionally route through Ollama/MLX-LM as a local LLM polish pass, but this is not shipped.

Agent Integration Mode (Current + Roadmap)

Currently: OpenQuack pastes the transcript at the cursor in any agent interface, and the voice-launched agent feature can start a Claude Code session from a spoken prompt. The agent itself is unmodified.

Roadmap (SPEC-031a): "Voice reply to live sessions" — OpenQuack will be able to send voice-transcribed messages directly to running Claude Code (and possibly other) sessions, making it a voice-controlled remote for the session rather than just a dictation tool.

Consensus Mechanism

None.

Prompt Chaining

No.

UI CLI Surface

OpenQuack — UI & CLI Surface

CLI Binary

None. OpenQuack is a macOS app, not a CLI tool. No binary exposed in PATH.

Local UI

Type: macOS menu-bar app (status bar icon)
Tech stack: Swift/SwiftUI
Features:
- Menu-bar popover showing recording status ("Listening"), live level meter, last transcript
- "Pasted at cursor" confirmation message
- Settings panel: hotkey customization, language selection, custom dictionary, auto-paste toggle
- Launch at login toggle

Observability

~120 MB RAM at idle
Whisper model loads on-demand (only when hotkey pressed)
No telemetry, no logging to external services

IDE Integration

None directly. OpenQuack works in any app that accepts keyboard input, including VS Code, terminal apps, JetBrains IDEs, etc. — but via system-level auto-paste, not via IDE plugin APIs.

Cross-Tool Portability

High by nature — it operates at the OS input layer (simulates keyboard events). Works with any application.

Benchmarks

From docs/BENCHMARKS.md (referenced, not fetched):

~2.6% word-error rate on real human speech (baseline M4/16GB)
~6.3% WER in realistic office noise
5-minute dictation finishes ~3 seconds after speaking stops
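
For readers unfamiliar with the metric, word-error rate is the word-level edit distance between a reference transcript and the recognizer output, divided by the number of reference words. A WER of about 2.6% therefore corresponds to roughly 26 word errors per 1,000 reference words, and 6.3% to roughly 63. The sketch below is a generic implementation of that definition, not the project's benchmark harness; lowercasing and whitespace tokenisation are simplifying assumptions.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between ref[:i-1] and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr[j] = min(prev[j] + 1,         # deletion
                          curr[j - 1] + 1,     # insertion
                          prev[j - 1] + cost)  # match or substitution
        prev = curr
    return prev[-1] / len(ref)
```

For example, `word_error_rate("open the settings file", "open the setting file")` counts one substitution over four reference words.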


Distribution

Type: desktop-app
License: MIT
Install: one-liner
Version: 2.0.0-alpha.15

Surfaces

CLI binary: No
CLI subcmds: 0
Local UI: desktop-app
Tech stack: Swift/SwiftUI + WhisperKit

Components

Commands: 0
Skills: 0
Subagents: 0
Hooks: 0
MCP servers: 0
MCP tools: 0
Scripts: 0
Templates: 0

Workflow

Phases: 4
Approval gates: 0
Spec format: none
Spec storage: none
Delta or full: none

Orchestration

Multi-agent: No
Pattern: none
Max concurrent: 1
Isolation: none
Consensus: none
Prompt chaining: No

Multi-model

Multi-model: No
BYOK: No
Locked to: whisper (on-device)
Modal: text+vision+audio

Execution

Mode: event-driven
Compaction: No
Session handoff: No
Streaming: No

Memory

Type: none
Persistence: none
Search: none
State files: 1 file

Quality

TDD: No
TDD mechanism: none
Self-review: none

Git / Observability

Auto commit: No
Auto PR: No
Auto merge: No
Worktree/feat: No
Audit log: No
Audit format: none
Replay: No

Tools

Primary: claude-code
Targets: 2
Portability: high

Signals

Stars: 28
Last commit: 2026-05-26
Contributors: 3
Maintainer: active
Quality score: 0/10