TabbyML/Tabby

tabby · TabbyML/tabby · ★ 34k · last commit 2026-03-02

Primitive shape
No installable primitives
00

Summary

TabbyML/Tabby — Summary

Tabby is a self-hosted AI coding assistant server written in Rust, providing an open-source, on-premises alternative to GitHub Copilot. It runs as a Docker container or native binary, includes a web admin dashboard, and offers IDE plugins for VS Code, JetBrains, Vim/Neovim, and Eclipse. Unlike cloud-based tools, Tabby runs on the organization's own hardware (including consumer GPUs), provides a REST/OpenAPI interface for integration, and ships FIM (Fill-In-the-Middle) completion as its primary feature. The enterprise edition adds team management, LDAP authentication, Git context indexing, and an "Answer Engine" for internal Q&A. Recent developments include the Pochi agent (private preview) for issue-to-PR automation.

Tabby is architecturally different from all seed frameworks: it is an AI server, not a client-side agent methodology. Where superpowers/BMAD/spec-kit augment how Claude Code behaves, Tabby replaces the AI backend itself — organizations run their own Tabby server instead of hitting Anthropic/OpenAI APIs. Seed frameworks like taskmaster-ai and claude-flow assume a cloud LLM; Tabby IS the LLM serving infrastructure.

01

Overview

TabbyML/Tabby — Overview

Origin

Created by TabbyML and maintained as an open-source project. First stable release in August 2023. The project focuses on self-hosted AI coding infrastructure for organizations that cannot or do not want to send code to cloud AI services.

Philosophy

From README:

"Tabby is a self-hosted AI coding assistant, offering an open-source and on-premises alternative to GitHub Copilot."

Three key features:

  1. Self-contained: No external DBMS or cloud service required
  2. OpenAPI interface: Easy integration with existing infrastructure
  3. Consumer GPU support: Runs on gaming GPUs, not just data center hardware

Architecture Philosophy

Tabby is fundamentally an inference server + admin platform:

  • Inference: serves code completions via FIM (Fill-In-the-Middle) and chat via its REST API
  • Platform: admin UI for team management, usage analytics, model management
  • Integration: IDE plugins connect to the local Tabby server

The codebase is a Rust workspace with:

  • crates/tabby: main server
  • ee/tabby-webserver: enterprise web server and admin UI
  • clients/vscode, clients/vim, clients/intellij: IDE clients
  • clients/tabby-agent: the shared agent client library (TypeScript)

RAG for Code Completion (Key Technical Innovation)

From a 2023 blog post referenced in README:

"RAG-based code completion is enabled by detail in v0.3.0! Tabby utilizes repo-level context to get even smarter!"

Tabby uses repository-level context indexing (RAG) for code completion suggestions — not just the open file, but the entire repository structure. This is a differentiator from simple FIM completions.

Pochi (Agent, Private Preview)

"Implement GitHub issues by connecting them to Pochi tasks and create PRs directly from the sidebar with a breakdown of CI/Lint/Test results"

Pochi is an emerging agent layer being built on top of Tabby's infrastructure.

02

Architecture

TabbyML/Tabby — Architecture

Distribution

# Docker (easiest)
docker run -it \
  --gpus all -p 8080:8080 -v $HOME/.tabby:/data \
  tabbyml/tabby \
  serve --model StarCoder-1B --device cuda --chat-model Qwen2-1.5B-Instruct

# Native binary
# Download from GitHub releases or package managers

Source Layout (Rust + TypeScript monorepo)

crates/
  tabby/              # Main server (Rust)
    src/
      serve.rs        # HTTP server entrypoint
      routes/         # API routes
      services/       # Business logic
  tabby-common/       # Shared types
  tabby-inference/    # Inference engine
  tabby-index/        # Repository context indexing
  tabby-crawler/      # Code crawler for indexing
  tabby-git/          # Git integration
ee/
  tabby-db/           # Database layer
  tabby-schema/       # GraphQL schema
  tabby-webserver/    # Enterprise web server (Rust/Axum)
  tabby-ui/           # Admin web UI
clients/
  vscode/             # VS Code extension
  intellij/           # JetBrains plugin
  vim/                # Vim/Neovim plugin
  eclipse/            # Eclipse plugin
  tabby-agent/        # Shared TypeScript agent library
  tabby-chat-panel/   # Chat UI component

Config Files

  • tabby.json — model specification (FIM template, chat template)
  • ~/.tabby/ — data directory (models, index, logs)

Required Runtime

  • Docker (recommended)
  • Or: Rust toolchain for building from source
  • GPU: CUDA, Metal (Apple Silicon), or CPU inference

API

REST API at http://localhost:8080:

  • /v1/completions — FIM code completions
  • /v1/chat/completions — chat completions (OpenAI-compatible)
  • Admin API via web UI
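
The two endpoints above can be exercised with a small client. The sketch below is illustrative only: the request field names (`language`, `segments.prefix`, `segments.suffix`), the optional bearer token, and the helper names are assumptions not confirmed by this document, so check them against the server's own OpenAPI description before relying on them.

```python
import json
import urllib.request
from typing import Optional

BASE_URL = "http://localhost:8080"  # default address used in this document


def build_completion_request(language: str, prefix: str, suffix: str = "") -> dict:
    """Build a FIM completion payload (field names are an assumption)."""
    return {"language": language, "segments": {"prefix": prefix, "suffix": suffix}}


def build_chat_request(question: str) -> dict:
    """Build an OpenAI-style chat payload holding a single user message."""
    return {"messages": [{"role": "user", "content": question}]}


def post_json(path: str, payload: dict, token: Optional[str] = None) -> dict:
    """POST a JSON payload to the server and decode the JSON response."""
    headers = {"Content-Type": "application/json"}
    if token:  # hypothetical bearer token, only if the server requires auth
        headers["Authorization"] = f"Bearer {token}"
    request = urllib.request.Request(
        BASE_URL + path,
        data=json.dumps(payload).encode("utf-8"),
        headers=headers,
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


if __name__ == "__main__":
    payload = build_completion_request("python", "def fib(n):\n    ", "\n")
    print(json.dumps(payload, indent=2))
    # post_json("/v1/completions", payload)  # requires a running server
```

Keeping payload construction separate from the network call makes the request shape easy to inspect and adjust without a running server.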

Model Format

GGUF via llama.cpp, stored in ~/.tabby/models/

Target AI Tools

Tabby IS the AI tool (server). IDE clients connect to it. No external cloud LLM dependency by default.

03

Components

TabbyML/Tabby — Components

Server Components (Rust)

Component | Purpose
tabby crate | Main HTTP server (Axum)
tabby-inference | Model inference engine (llama.cpp via GGML)
tabby-index | Repository context indexing (RAG)
tabby-crawler | Code crawler for building the index
tabby-git | Git repository integration
tabby-common | Shared types and utilities

Enterprise Components (Rust)

Component | Purpose
tabby-webserver | Enterprise web server with admin UI
tabby-schema | GraphQL schema
tabby-db | Database layer (SQLite/PostgreSQL)
tabby-email | Email notification system
tabby-ui | Admin web UI (React)

IDE Clients

Client | Platform
clients/vscode | VS Code extension
clients/intellij | JetBrains plugin (IntelliJ, PyCharm, etc.)
clients/vim | Vim/Neovim plugin
clients/eclipse | Eclipse plugin
clients/tabby-agent | Shared TypeScript library for IDE clients
clients/tabby-chat-panel | Chat UI component

Admin Web UI Features

  • Team management
  • User authentication (password, OAuth, LDAP)
  • Model management and switching
  • Usage analytics and reporting
  • Activity log
  • Answer Engine (internal Q&A)
  • Git context management (GitHub/GitLab integration)

Pochi (Agent, Private Preview)

GitHub integration agent that:

  • Connects GitHub issues to tasks
  • Creates PRs from the sidebar
  • Shows CI/Lint/Test breakdown results

Answer Engine

Internal knowledge base:

  • Connect to GitHub/GitLab repositories
  • Index documentation and code
  • Q&A interface for internal engineering teams

RAG (Repository Context)

The tabby-index and tabby-crawler components build and maintain a searchable index of the repository for use in code completion suggestions. This provides repo-level context for completions, not just file-level.

05

Prompts

TabbyML/Tabby — Prompts

Prompt 1: FIM (Fill-In-the-Middle) Template

Source: MODEL_SPEC.md (tabby.json model specification)

Technique: Prompt templating for FIM inference. The model specification defines how context is structured for code completion requests.

{
  "prompt_template": "<PRE>{prefix}<SUF>{suffix}<MID>",
  "chat_template": "<s>{% for message in messages %}{% if message['role'] == 'user' %}{{ '[INST] ' + message['content'] + ' [/INST]' }}{% elif message['role'] == 'assistant' %}{{ message['content'] + '</s> ' }}{% endif %}{% endfor %}"
}

The FIM template:

  • {prefix}: code before the cursor
  • {suffix}: code after the cursor
  • The model fills in the <MID> section

This is the core prompting primitive for code completion. Unlike chat-based agents that use free-form prompts, Tabby uses structured FIM templates defined per-model.
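
As a minimal sketch of how such a template is applied, the snippet below fills the `prompt_template` string from the specification above with a prefix and suffix. The function name `render_fim_prompt` is hypothetical, and the sentinel tokens (`<PRE>`, `<SUF>`, `<MID>`) vary per model, so this shows the mechanism rather than Tabby's actual implementation.

```python
# Template string copied from the model specification shown in this section.
PROMPT_TEMPLATE = "<PRE>{prefix}<SUF>{suffix}<MID>"


def render_fim_prompt(prefix: str, suffix: str, template: str = PROMPT_TEMPLATE) -> str:
    """Substitute the code before and after the cursor into the FIM template.

    str.format only parses the template, so braces inside the prefix or
    suffix (common in source code) are inserted verbatim.
    """
    return template.format(prefix=prefix, suffix=suffix)


if __name__ == "__main__":
    before_cursor = "def add(a, b):\n    return "
    after_cursor = "\n"
    print(render_fim_prompt(before_cursor, after_cursor))
```

The model is then expected to generate the text that belongs at the cursor, continuing after the final `<MID>` token.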


Prompt 2: RAG Context Injection

Source: Inferred from tabby-index and tabby-crawler architecture

Technique: Retrieval-Augmented Generation for code completion. The repository index provides relevant code snippets as additional context for completions.

When generating a completion, Tabby:

  1. Takes the current file context (prefix + suffix)
  2. Queries the repository index for similar/relevant code patterns
  3. Injects retrieved snippets as additional context
  4. Generates the completion with expanded context

This is documented as "locally relevant snippets (declarations from local LSP, and recently modified code)" added in v0.5.
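
The four steps above can be sketched with a toy retriever. This is not Tabby's indexing or ranking code: the token-overlap (Jaccard) scoring, the comment-prefix injection format, and all function names are assumptions chosen to keep the example self-contained.

```python
import re


def tokenize(text: str) -> set:
    """Split text into a set of lowercase identifier-like tokens."""
    return set(re.findall(r"[a-z_][a-z0-9_]*", text.lower()))


def jaccard(a: set, b: set) -> float:
    """Token-set overlap in [0, 1]; a stand-in for a real relevance score."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def retrieve(query: str, snippets: list, top_k: int = 2) -> list:
    """Step 2: return up to top_k indexed snippets that overlap the query."""
    query_tokens = tokenize(query)
    scored = [(jaccard(query_tokens, tokenize(s)), s) for s in snippets]
    scored = [pair for pair in scored if pair[0] > 0]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [snippet for _, snippet in scored[:top_k]]


def inject_context(prefix: str, snippets: list, comment: str = "# ") -> str:
    """Step 3: prepend retrieved snippets to the prefix as comment lines."""
    lines = []
    for snippet in snippets:
        lines.extend(comment + line for line in snippet.splitlines())
    if not lines:
        return prefix
    return "\n".join(lines) + "\n" + prefix


if __name__ == "__main__":
    index = [
        "def parse_config(path):\n    return load(path)",
        "class HttpClient:\n    pass",
    ]
    prefix = "config = parse_config("  # step 1: current file context
    print(inject_context(prefix, retrieve(prefix, index, top_k=1)))
```

The expanded prefix would then be rendered into the model's FIM template as usual (step 4).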


Prompt 3: Answer Engine Query Handling

Source: Inferred from v0.13 release notes

Technique: Grounded Q&A with source attribution. The Answer Engine is designed to provide "reliable and precise answers" grounded in indexed content, with citations.

The system likely uses a retrieval step (query → relevant docs/code) followed by generation with explicit "cite your sources" constraints, similar to the OpenHands documentation microagent.
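
To make the retrieve-then-cite idea concrete, here is a hedged sketch of how a grounded prompt might be assembled and how citations in an answer could be checked. The prompt wording, the `Source` type, and both function names are hypothetical; nothing here is taken from Tabby's Answer Engine code.

```python
import re
from dataclasses import dataclass


@dataclass
class Source:
    """A retrieved document or code snippet (hypothetical shape)."""
    title: str
    text: str


def build_grounded_prompt(question: str, sources: list) -> str:
    """Number each source and instruct the model to cite them as [n]."""
    numbered = "\n\n".join(
        f"[{i}] {source.title}\n{source.text}"
        for i, source in enumerate(sources, start=1)
    )
    return (
        "Answer the question using only the sources below. "
        "Cite sources as [n] after each claim. "
        "If the sources do not contain the answer, say so.\n\n"
        f"Sources:\n{numbered}\n\nQuestion: {question}\nAnswer:"
    )


def cited_indices(answer: str, source_count: int) -> list:
    """Return the sorted, in-range source numbers that the answer cites."""
    found = {int(match) for match in re.findall(r"\[(\d+)\]", answer)}
    return sorted(i for i in found if 1 <= i <= source_count)


if __name__ == "__main__":
    sources = [Source("README", "Run the server with Docker.")]
    print(build_grounded_prompt("How do I start the server?", sources))
```

Checking citations after generation lets a caller drop or flag references to sources that were never supplied.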


Prompting Techniques Used

  1. FIM templates: Structured prefix/suffix/middle format for code completion — not conversational prompting
  2. Per-model prompt templates: Each model has its own template defined in tabby.json, not a universal format
  3. RAG injection: Repository context retrieved and injected as additional prompt context
  4. Grounded generation: Answer Engine requires source attribution
  5. Chat templates: Jinja2-style templates for instruct/chat models (separate from FIM templates)

09

Uniqueness

TabbyML/Tabby — Uniqueness and Positioning

Differs from Seeds

Tabby is architecturally different from all seed frameworks — it is an AI inference server, not a client-side agent methodology. Where superpowers, BMAD-method, spec-kit, agent-os, and other seeds augment how a cloud LLM agent behaves, Tabby replaces the cloud LLM itself. Organizations run Tabby on their own servers, and IDE plugins connect to it instead of Anthropic/OpenAI. Seed frameworks like taskmaster-ai, claude-flow, and openspec assume a cloud LLM as the backend; Tabby IS the backend. The closest seed comparison is ccmemory (both provide infrastructure rather than methodology) but ccmemory augments Claude Code with memory while Tabby provides the entire AI serving layer.

Key Differentiators

  1. Self-hosted inference: Organizations run Tabby on their own hardware. No data leaves the company. This is the primary value proposition — data sovereignty.

  2. Consumer GPU support: Metal (Apple M1/M2), CUDA, and CPU inference. This makes self-hosting viable for small teams with gaming GPUs.

  3. Admin platform: The most complete admin dashboard in this corpus — team management, LDAP authentication, usage analytics, activity logs, notification systems.

  4. Answer Engine: Internal knowledge base with repository indexing and grounded Q&A. No other framework in this batch ships this as a built-in feature.

  5. Multi-IDE support: VS Code, JetBrains, Vim/Neovim, and Eclipse. The broadest IDE coverage in this batch.

  6. FIM-native: Built around Fill-In-the-Middle inference rather than chat-based code generation. This is a different interaction model — passive inline completion vs active conversation.

  7. RAG for completions: Repository-level context indexing (v0.3.0+) provides multi-file context for completions without any user action.

Observable Failure Modes

  1. Last commit March 2026: The main TabbyML/tabby repo's last commit is dated 2026-03-02, which may indicate that development is slowing or moving to Pochi
  2. Not an agent: Tabby's core value is code completion, not autonomous task execution. Users wanting an agent (like aider/cline/opencode) need to look elsewhere or wait for Pochi
  3. Hardware requirements: Self-hosting requires a machine with adequate GPU/CPU; not suitable for all teams
  4. Pochi is private preview: The agent layer is not public; users cannot evaluate it
  5. Enterprise features require setup: LDAP, GitLab/GitHub integration, team management require significant admin configuration

04

Workflow

TabbyML/Tabby — Workflow

Developer Workflow (Code Completion)

  1. Server running — Tabby server started on local machine or company server
  2. IDE plugin installed — VS Code, JetBrains, Vim, or Eclipse
  3. Type code — as developer types, IDE plugin sends context to Tabby
  4. FIM completion — Tabby generates fill-in-the-middle suggestions
  5. Accept/reject — Tab to accept, Escape to reject
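
Step 3 depends on the plugin turning the editor buffer and cursor position into the prefix and suffix that FIM needs. The following is a minimal sketch of that split, assuming 0-based line and column numbers; the function name is hypothetical and real IDE clients work with richer document models.

```python
def split_at_cursor(text: str, line: int, column: int) -> tuple:
    """Split a buffer into (prefix, suffix) at a 0-based line/column cursor."""
    lines = text.splitlines(keepends=True) or [""]
    if not 0 <= line < len(lines):
        raise ValueError(f"line {line} is outside the buffer")
    # Clamp the column so the cursor never lands past the end of the line.
    line_length = len(lines[line].rstrip("\r\n"))
    column = max(0, min(column, line_length))
    offset = sum(len(previous) for previous in lines[:line]) + column
    return text[:offset], text[offset:]


if __name__ == "__main__":
    buffer = "def add(a, b):\n    return \nprint(add(1, 2))\n"
    prefix, suffix = split_at_cursor(buffer, line=1, column=11)
    print(repr(prefix))
    print(repr(suffix))
```

The prefix and suffix always concatenate back to the original buffer, which is a useful invariant to assert in client code.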

Chat Workflow

  1. Open chat panel — available in VS Code sidebar
  2. Ask question — natural language question about code
  3. Response — Tabby generates response using chat model

Answer Engine Workflow (Enterprise)

  1. Admin indexes repositories — Git repos (GitHub/GitLab) indexed
  2. Developer asks question — via Answer Engine web interface
  3. Response with citations — grounded in indexed codebase content

Pochi (Agent) Workflow (Private Preview)

  1. Connect GitHub issues to Pochi tasks — in Tabby admin
  2. Pochi analyzes issue — reads related code
  3. Creates PR — with implementation
  4. CI/Lint/Test breakdown — shows in sidebar

Admin Workflow

  1. Access admin UI at http://localhost:8080/admin
  2. Configure models — completion model, chat model
  3. Manage teams — add users, set permissions
  4. View analytics — usage stats, model performance

Phases + Artifacts Table

Phase | Artifact
Indexing | Repository context index (~/.tabby/)
Code completion | Inline suggestion (in IDE)
Chat | Response (in chat panel)
Answer Engine | Grounded answer with source citations
Pochi | PR with implementation

Approval Gates

Gate | Type | Notes
Code completion acceptance | Tab keypress | No modal dialog
Pochi PR creation | GitHub PR review | Standard PR approval process

06

Memory Context

TabbyML/Tabby — Memory and Context

State Storage

State | Storage | Scope
Models | ~/.tabby/models/ | Server-global
Repository index | ~/.tabby/ | Server-global
User accounts | SQLite/PostgreSQL | Server-global
Usage analytics | Database | Server-global
Sessions | In-memory | Session

Repository Context Index

The tabby-index and tabby-crawler maintain a searchable index of:

  • Code snippets from configured repositories
  • Documentation
  • Recently modified files (via local LSP)
  • GitHub/GitLab repository content (enterprise)

This index is persistent across sessions and updated when repos change.

Context for Completions

When serving a completion request:

  • Current file context (prefix + suffix from IDE)
  • RAG-retrieved relevant snippets from the repository index
  • Recently modified files (local LSP declarations)

This provides multi-file context without requiring explicit file selection by the developer.

Chat Context

Standard conversation context (recent message history). No long-term memory beyond the index.

Answer Engine Memory

The Answer Engine indexes and makes queryable:

  • Connected Git repositories (code)
  • Imported documentation files
  • User-submitted knowledge base entries

This is Tabby's longest-term "memory": the indexed knowledge is kept in the server's data directory (~/.tabby/) and is queryable whenever the Tabby server is running.

07

Orchestration

TabbyML/Tabby — Orchestration

Multi-Agent

No multi-agent system in the core Tabby server. Pochi (in private preview) is an emerging agent layer but not yet public.

Orchestration Pattern

None. Tabby is a server that serves API requests — not an orchestrator.

Isolation Mechanism

Server-side: each API request is handled independently. No worktrees, no containers for user sessions.

Multi-Model

Yes, but for different purposes:

  • --model: FIM completion model (e.g., StarCoder-1B)
  • --chat-model: Chat model (e.g., Qwen2-1.5B-Instruct)

These are configured at server startup, not per-request.
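
Because both models are fixed at startup, a deployment script only needs to assemble one command line. The sketch below builds the argument list from the flags shown in this document (`--model`, `--device`, `--chat-model`); the helper name and its defaults are hypothetical.

```python
from typing import List, Optional


def build_serve_command(
    model: str,
    chat_model: Optional[str] = None,
    device: str = "cuda",
) -> List[str]:
    """Assemble a `tabby serve` argument list from the chosen models."""
    command = ["tabby", "serve", "--model", model, "--device", device]
    if chat_model is not None:
        # The chat model is optional; completion works without it.
        command += ["--chat-model", chat_model]
    return command


if __name__ == "__main__":
    print(" ".join(build_serve_command("StarCoder-1B", "Qwen2-1.5B-Instruct")))
    # To launch the server, pass the list to subprocess.run(...) on a
    # machine where the tabby binary is installed.
```

Switching models therefore means restarting the server with a different argument list rather than changing a per-request parameter.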

Execution Mode

Background daemon. The Tabby server runs continuously and serves API requests from IDE plugins.

Crash Recovery

Docker restart policies handle server restart. The index persists in ~/.tabby/.

Context Compaction

Not applicable — each request is stateless at the API level.

Consensus Mechanism

None.

Prompt Chaining

Not applicable to core completion use case. Pochi may implement this.

08

UI and CLI Surface

TabbyML/Tabby — UI and CLI Surface

CLI Binary

Name: tabby (the server binary)
Primary use: tabby serve --model <model> --device <device>
Not a coding agent CLI — it's a server daemon

Admin Web Dashboard

Tabby ships a full admin web interface:

  • URL: http://localhost:8080/admin (or configured host)
  • Stack: React + Rust/Axum backend
  • Features:
    • Model management (switch models, view available models)
    • Team and user management
    • LDAP/OAuth/SSO authentication
    • Usage analytics and reporting
    • Activity log
    • Answer Engine Q&A interface
    • Git context configuration (GitHub/GitLab integration)
    • Notification management

This is the most complete admin dashboard in this batch — significantly more capable than other frameworks' UIs.

IDE Extensions

Extension | Availability
VS Code | Available in Marketplace (TabbyML.vscode-tabby)
JetBrains | Available in JetBrains Marketplace
Vim/Neovim | Available
Eclipse | Available

All extensions connect to the Tabby server via REST API.

Chat Side Panel

VS Code extension includes:

  • Chat panel for Q&A
  • @-mention files to add context
  • Edit inline with right-click option
  • Multiple choice inline completion

Answer Engine Web Interface

A separate web interface for the Answer Engine provides internal search and Q&A for engineering teams. Pages (persistent, shareable) can be created from Answer Engine conversations.

Observability

  • Admin UI: activity log, usage analytics
  • Per-user and per-team statistics
  • Model performance metrics
  • No file-based audit log (database-backed)

OpenAPI

Tabby exposes a documented REST API, enabling integration with custom tools and workflows.
