TabbyML/Tabby

tabby · TabbyML/tabby · ★ 34k · last commit 2026-03-02

Primitive shape

No installable primitives

Summary

TabbyML/Tabby — Summary

Tabby is a self-hosted AI coding assistant server written in Rust, providing an open-source, on-premises alternative to GitHub Copilot. It runs as a Docker container or native binary with a web admin dashboard, and IDE plugins for VS Code, JetBrains, Vim/Neovim, and Eclipse. Unlike cloud-based tools, Tabby runs on the organization's own hardware (including consumer GPUs), provides a REST/OpenAPI interface for integration, and ships FIM (Fill-In-the-Middle) completion as the primary feature. The enterprise edition adds team management, LDAP authentication, Git context indexing, and an "Answer Engine" for internal Q&A. Recent developments include the Pochi agent (private preview) for issue-to-PR automation.

Tabby is architecturally different from all seed frameworks: it is an AI server, not a client-side agent methodology. Where superpowers/BMAD/spec-kit augment how Claude Code behaves, Tabby replaces the AI backend itself — organizations run their own Tabby server instead of hitting Anthropic/OpenAI APIs. Seed frameworks like taskmaster-ai and claude-flow assume a cloud LLM; Tabby IS the LLM serving infrastructure.

Overview

TabbyML/Tabby — Overview

Origin

Tabby was created by TabbyML and is maintained as an open-source project. Its first stable release was in August 2023. The project focuses on self-hosted AI coding infrastructure for organizations that cannot or do not want to send code to cloud AI services.

Philosophy

From README:

"Tabby is a self-hosted AI coding assistant, offering an open-source and on-premises alternative to GitHub Copilot."

Three key features:

Self-contained: No external DBMS or cloud service required
OpenAPI interface: Easy integration with existing infrastructure
Consumer GPU support: Runs on gaming GPUs, not just data center hardware

Architecture Philosophy

Tabby is fundamentally an inference server + admin platform:

Inference: serves code completions via FIM (Fill-In-the-Middle) and chat via its REST API
Platform: admin UI for team management, usage analytics, model management
Integration: IDE plugins connect to the local Tabby server

The codebase is a Rust workspace with:

crates/tabby: main server
ee/tabby-webserver: enterprise web server and admin UI
clients/vscode, clients/vim, clients/intellij: IDE clients
clients/tabby-agent: the shared agent client library (TypeScript)

RAG for Code Completion (Key Technical Innovation)

From a 2023 blog post referenced in README:

"RAG-based code completion is enabled by detail in v0.3.0! Tabby utilizes repo-level context to get even smarter!"

Tabby uses repository-level context indexing (RAG) for code completion suggestions — not just the open file, but the entire repository structure. This is a differentiator from simple FIM completions.

Pochi (Agent, Private Preview)

"Implement GitHub issues by connecting them to Pochi tasks and create PRs directly from the sidebar with a breakdown of CI/Lint/Test results"

Pochi is an emerging agent layer being built on top of Tabby's infrastructure.

Architecture

TabbyML/Tabby — Architecture

Distribution

# Docker (easiest)
docker run -it \
  --gpus all -p 8080:8080 -v $HOME/.tabby:/data \
  tabbyml/tabby \
  serve --model StarCoder-1B --device cuda --chat-model Qwen2-1.5B-Instruct

# Native binary
# Download from GitHub releases or package managers

Source Layout (Rust + TypeScript monorepo)

crates/
  tabby/              # Main server (Rust)
    src/
      serve.rs        # HTTP server entrypoint
      routes/         # API routes
      services/       # Business logic
  tabby-common/       # Shared types
  tabby-inference/    # Inference engine
  tabby-index/        # Repository context indexing
  tabby-crawler/      # Code crawler for indexing
  tabby-git/          # Git integration
ee/
  tabby-db/           # Database layer
  tabby-schema/       # GraphQL schema
  tabby-webserver/    # Enterprise web server (Rust/Axum)
  tabby-ui/           # Admin web UI
clients/
  vscode/             # VS Code extension
  intellij/           # JetBrains plugin
  vim/                # Vim/Neovim plugin
  eclipse/            # Eclipse plugin
  tabby-agent/        # Shared TypeScript agent library
  tabby-chat-panel/   # Chat UI component

Config Files

tabby.json — model specification (FIM template, chat template)
~/.tabby/ — data directory (models, index, logs)

Required Runtime

Docker (recommended)
Or: Rust toolchain for building from source
Hardware: CUDA GPU, Metal (Apple Silicon), or CPU-only inference

API

REST API at http://localhost:8080:

/v1/completions — FIM code completions
/v1/chat/completions — chat completions (OpenAI-compatible)
Admin API via web UI
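
A minimal sketch of calling the completion endpoint from Python, using only the standard library. The request body shape (a `language` tag plus `segments.prefix` and `segments.suffix`) is an assumption that this section does not state, so check the server's OpenAPI documentation before relying on it. `build_completion_request` is a hypothetical helper, and a deployment with authentication enabled would presumably need an authorization header as well.

```python
import json
import urllib.request


def build_completion_request(base_url: str, language: str,
                             prefix: str, suffix: str = "") -> urllib.request.Request:
    """Build (but do not send) a POST request for the /v1/completions endpoint."""
    # Assumed body shape: a language tag plus the code before and after the cursor.
    body = {"language": language, "segments": {"prefix": prefix, "suffix": suffix}}
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_completion_request("http://localhost:8080", "python", "def fib(n):\n    ", "\n")
# To send it against a running server:
#   with urllib.request.urlopen(req) as resp:
#       print(json.load(resp))
```

Building the request separately from sending it keeps the payload easy to inspect or log before any network call is made.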

Model Format

GGUF via llama.cpp, stored in ~/.tabby/models/

Target AI Tools

Tabby IS the AI tool (server). IDE clients connect to it. No external cloud LLM dependency by default.

Components

TabbyML/Tabby — Components

Server Components (Rust)

Component	Purpose
`tabby` crate	Main HTTP server (Axum)
`tabby-inference`	Model inference engine (llama.cpp, which is built on GGML)
`tabby-index`	Repository context indexing (RAG)
`tabby-crawler`	Code crawler for building the index
`tabby-git`	Git repository integration
`tabby-common`	Shared types and utilities

Enterprise Components (Rust)

Component	Purpose
`tabby-webserver`	Enterprise web server with admin UI
`tabby-schema`	GraphQL schema
`tabby-db`	Database layer (SQLite/PostgreSQL)
`tabby-email`	Email notification system
`tabby-ui`	Admin web UI (React)

IDE Clients

Client	Platform
`clients/vscode`	VS Code extension
`clients/intellij`	JetBrains plugin (IntelliJ, PyCharm, etc.)
`clients/vim`	Vim/Neovim plugin
`clients/eclipse`	Eclipse plugin
`clients/tabby-agent`	Shared TypeScript library for IDE clients
`clients/tabby-chat-panel`	Chat UI component

Admin Web UI Features

Team management
User authentication (password, OAuth, LDAP)
Model management and switching
Usage analytics and reporting
Activity log
Answer Engine (internal Q&A)
Git context management (GitHub/GitLab integration)

Pochi (Agent, Private Preview)

GitHub integration agent that:

Connects GitHub issues to tasks
Creates PRs from the sidebar
Shows CI/Lint/Test breakdown results

Answer Engine

Internal knowledge base:

Connect to GitHub/GitLab repositories
Index documentation and code
Q&A interface for internal engineering teams

RAG (Repository Context)

The tabby-index and tabby-crawler components build and maintain a searchable index of the repository for use in code completion suggestions. This provides repo-level context for completions, not just file-level.

Prompts

TabbyML/Tabby — Prompts

Prompt 1: FIM (Fill-In-the-Middle) Template

Source: MODEL_SPEC.md — tabby.json model specification

Technique: Prompt templating for FIM inference. The model specification defines how context is structured for code completion requests.

{
  "prompt_template": "<PRE>{prefix}<SUF>{suffix}<MID>",
  "chat_template": "<s>{% for message in messages %}{% if message['role'] == 'user' %}{{ '[INST] ' + message['content'] + ' [/INST]' }}{% elif message['role'] == 'assistant' %}{{ message['content'] + '</s> ' }}{% endif %}{% endfor %}"
}

The FIM template:

{prefix}: code before the cursor
{suffix}: code after the cursor
<MID>: marker after which the model generates the missing middle section

This is the core prompting primitive for code completion. Unlike chat-based agents that use free-form prompts, Tabby uses structured FIM templates defined per-model.
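
As an illustration, the substitution step can be sketched in a few lines of Python. The template string is the `prompt_template` value shown above; `render_fim` is a hypothetical helper, not a function from the Tabby codebase, and the real server may handle templating differently.

```python
import re

# prompt_template value copied from the model specification above.
FIM_TEMPLATE = "<PRE>{prefix}<SUF>{suffix}<MID>"


def render_fim(template: str, prefix: str, suffix: str) -> str:
    """Substitute {prefix} and {suffix} into a FIM prompt template."""
    values = {"prefix": prefix, "suffix": suffix}
    # A single regex pass replaces only the two placeholders in the template,
    # so braces inside the code being completed are left untouched.
    return re.sub(r"\{(prefix|suffix)\}", lambda m: values[m.group(1)], template)


prompt = render_fim(FIM_TEMPLATE, "def add(a, b):\n    return ", "\n\nprint(add(1, 2))")
```

The model would then be asked to continue `prompt`, and whatever it generates after `<MID>` is the suggested insertion at the cursor.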

Prompt 2: RAG Context Injection

Source: Inferred from tabby-index and tabby-crawler architecture

Technique: Retrieval-Augmented Generation for code completion. The repository index provides relevant code snippets as additional context for completions.

When generating a completion, Tabby:

Takes the current file context (prefix + suffix)
Queries the repository index for similar/relevant code patterns
Injects retrieved snippets as additional context
Generates the completion with expanded context

The injected context is documented as "locally relevant snippets (declarations from local LSP, and recently modified code)", added in v0.5.
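
The four steps above can be sketched as a toy pipeline. Everything here is an assumption made for illustration: the in-memory `index` dictionary, the token-overlap ranking, and the choice to inject snippets as comments ahead of the prefix all stand in for whatever the real `tabby-index` component does, which this section does not describe.

```python
import re


def tokens(text: str) -> set[str]:
    """Lowercased identifier-like tokens found in a piece of code."""
    return set(re.findall(r"[A-Za-z_][A-Za-z0-9_]*", text.lower()))


def retrieve(index: dict[str, str], query: str, k: int = 2) -> list[tuple[str, str]]:
    """Rank indexed snippets by token overlap with the query; keep the top k with non-zero overlap."""
    q = tokens(query)
    scored = [(len(q & tokens(body)), path, body) for path, body in index.items()]
    scored = [s for s in scored if s[0] > 0]
    scored.sort(key=lambda s: (-s[0], s[1]))  # highest overlap first, path as tie-break
    return [(path, body) for _, path, body in scored[:k]]


def inject(prefix: str, snippets: list[tuple[str, str]], comment: str = "#") -> str:
    """Prepend retrieved snippets to the prefix as commented-out context."""
    lines: list[str] = []
    for path, body in snippets:
        lines.append(f"{comment} Path: {path}")
        lines.extend(f"{comment} {line}" for line in body.splitlines())
    return "\n".join(lines + [prefix]) if lines else prefix


# Hypothetical two-file index.
index = {
    "utils/geometry.py": "def circle_area(radius):\n    return 3.14159 * radius ** 2",
    "utils/strings.py": "def slugify(title):\n    return title.lower().replace(' ', '-')",
}
prefix = "def cylinder_volume(radius, height):\n    return "
snippets = retrieve(index, prefix, k=1)
expanded_prefix = inject(prefix, snippets)
```

Here the geometry helper ranks first because it shares the `radius` identifier with the code being written, so it is the snippet that ends up in the expanded prefix.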

Prompt 3: Answer Engine Query Handling

Source: Inferred from v0.13 release notes

Technique: Grounded Q&A with source attribution. The Answer Engine is designed to provide "reliable and precise answers" grounded in indexed content, with citations.

The system likely uses a retrieval step (query → relevant docs/code) followed by generation with explicit "cite your sources" constraints, similar to the OpenHands documentation microagent.
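
A generic sketch of the retrieve-then-cite pattern described above, assuming numbered sources and bracketed `[n]` citations. The prompt wording and the helpers `build_grounded_prompt` and `is_grounded` are hypothetical; nothing here is taken from the Answer Engine's actual implementation.

```python
import re


def build_grounded_prompt(question: str, sources: list[tuple[str, str]]) -> str:
    """Number each retrieved source and ask for an answer that cites those numbers."""
    parts = ["Answer using only the numbered sources below. Cite each claim as [n]."]
    for n, (title, text) in enumerate(sources, start=1):
        parts.append(f"[{n}] {title}\n{text}")
    parts.append(f"Question: {question}")
    return "\n\n".join(parts)


def is_grounded(answer: str, source_count: int) -> bool:
    """True if the answer cites at least one source and every citation is in range."""
    cited = {int(n) for n in re.findall(r"\[(\d+)\]", answer)}
    return bool(cited) and all(1 <= n <= source_count for n in cited)


sources = [
    ("README.md", "The server listens on port 8080."),
    ("serve.rs", "HTTP server entrypoint."),
]
grounded_prompt = build_grounded_prompt("Which port does the server use?", sources)
```

A post-generation check such as `is_grounded` is one simple way to reject answers that cite nothing or cite a source that was never retrieved.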

Prompting Techniques Used

FIM templates: Structured prefix/suffix/middle format for code completion — not conversational prompting
Per-model prompt templates: Each model has its own template defined in tabby.json, not a universal format
RAG injection: Repository context retrieved and injected as additional prompt context
Grounded generation: Answer Engine requires source attribution
Chat templates: Jinja2-style templates for instruct/chat models (separate from FIM templates)
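
To make the chat template concrete, here is a hand-written Python equivalent of the `chat_template` shown in Prompt 1. It is a sketch that mirrors the template's branches directly instead of invoking a Jinja engine; `render_chat` is a hypothetical name.

```python
def render_chat(messages: list[dict[str, str]]) -> str:
    """Render a message list the way the chat_template in Prompt 1 would."""
    out = ["<s>"]
    for message in messages:
        if message["role"] == "user":
            out.append("[INST] " + message["content"] + " [/INST]")
        elif message["role"] == "assistant":
            out.append(message["content"] + "</s> ")
        # Any other role is skipped, because the template has no branch for it.
    return "".join(out)


history = [
    {"role": "user", "content": "hi"},
    {"role": "assistant", "content": "hello"},
    {"role": "user", "content": "bye"},
]
chat_prompt = render_chat(history)
```

Unlike the FIM template, this format encodes turn boundaries, which is why completion and chat models each need their own template.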

Uniqueness

TabbyML/Tabby — Uniqueness and Positioning

Differs from Seeds

Tabby is architecturally different from all seed frameworks — it is an AI inference server, not a client-side agent methodology. Where superpowers, BMAD-method, spec-kit, agent-os, and other seeds augment how a cloud LLM agent behaves, Tabby replaces the cloud LLM itself. Organizations run Tabby on their own servers, and IDE plugins connect to it instead of Anthropic/OpenAI. Seed frameworks like taskmaster-ai, claude-flow, and openspec assume a cloud LLM as the backend; Tabby IS the backend. The closest seed comparison is ccmemory (both provide infrastructure rather than methodology) but ccmemory augments Claude Code with memory while Tabby provides the entire AI serving layer.

Key Differentiators

Self-hosted inference: Organizations run Tabby on their own hardware. No data leaves the company. This is the primary value proposition — data sovereignty.
Consumer GPU support: Metal (Apple M1/M2), CUDA, and CPU inference. This makes self-hosting viable for small teams with gaming GPUs.
Admin platform: The most complete admin dashboard in this corpus — team management, LDAP authentication, usage analytics, activity logs, notification systems.
Answer Engine: Internal knowledge base with repository indexing and grounded Q&A. No other framework in this batch ships this as a built-in feature.
Multi-IDE support: VS Code, JetBrains, Vim/Neovim, and Eclipse. The broadest IDE coverage in this batch.
FIM-native: Built around Fill-In-the-Middle inference rather than chat-based code generation. This is a different interaction model — passive inline completion vs active conversation.
RAG for completions: Repository-level context indexing (v0.3.0+) provides multi-file context for completions without any user action.

Observable Failure Modes

Last commit March 2026: the most recent commit to the main TabbyML/tabby repo is dated 2026-03-02, which may indicate that development is slowing or shifting to Pochi
Not an agent: Tabby's core value is code completion, not autonomous task execution. Users wanting an agent (like aider/cline/opencode) need to look elsewhere or wait for Pochi
Hardware requirements: Self-hosting requires a machine with adequate GPU/CPU; not suitable for all teams
Pochi is private preview: The agent layer is not public; users cannot evaluate it
Enterprise features require setup: LDAP, GitLab/GitHub integration, team management require significant admin configuration

Workflow

TabbyML/Tabby — Workflow

Developer Workflow (Code Completion)

Server running — Tabby server started on local machine or company server
IDE plugin installed — VS Code, JetBrains, Vim, or Eclipse
Type code — as developer types, IDE plugin sends context to Tabby
FIM completion — Tabby generates fill-in-the-middle suggestions
Accept/reject — Tab to accept, Escape to reject
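
The "IDE plugin sends context" step amounts to splitting the editor buffer at the cursor. A minimal sketch, assuming a 0-based line and column measured in characters; `split_at_cursor` is a hypothetical helper, and real plugins may count positions differently.

```python
def split_at_cursor(text: str, line: int, column: int) -> tuple[str, str]:
    """Split buffer text into (prefix, suffix) at a 0-based line/column cursor."""
    lines = text.splitlines(keepends=True)
    # Characters in all earlier lines, plus the column within the current line.
    offset = sum(len(l) for l in lines[:line]) + column
    # Clamp so that an out-of-range cursor cannot index past the buffer.
    offset = max(0, min(offset, len(text)))
    return text[:offset], text[offset:]


buffer = "def greet(name):\n    return \n\nprint(greet('Tabby'))\n"
# Cursor sits at the end of "    return " on the second line.
prefix, suffix = split_at_cursor(buffer, line=1, column=11)
```

The resulting `prefix` and `suffix` are the two values a FIM request needs.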

Chat Workflow

Open chat panel — available in VS Code sidebar
Ask question — natural language question about code
Response — Tabby generates response using chat model

Answer Engine Workflow (Enterprise)

Admin indexes repositories — Git repos (GitHub/GitLab) indexed
Developer asks question — via Answer Engine web interface
Response with citations — grounded in indexed codebase content

Pochi (Agent) Workflow (Private Preview)

Connect GitHub issues to Pochi tasks — in Tabby admin
Pochi analyzes issue — reads related code
Creates PR — with implementation
CI/Lint/Test breakdown — shows in sidebar

Admin Workflow

Access admin UI — http://localhost:8080/admin
Configure models — completion model, chat model
Manage teams — add users, set permissions
View analytics — usage stats, model performance

Phases + Artifacts Table

Phase	Artifact
Indexing	Repository context index (`~/.tabby/`)
Code completion	Inline suggestion (in IDE)
Chat	Response (in chat panel)
Answer Engine	Grounded answer with source citations
Pochi	PR with implementation

Approval Gates

Gate	Type	Notes
Code completion acceptance	Tab keypress	No modal dialog
Pochi PR creation	GitHub PR review	Standard PR approval process

Memory Context

TabbyML/Tabby — Memory and Context

State Storage

State	Storage	Scope
Models	`~/.tabby/models/`	Server-global
Repository index	`~/.tabby/`	Server-global
User accounts	SQLite/PostgreSQL	Server-global
Usage analytics	Database	Server-global
Sessions	In-memory	Session

Repository Context Index

The tabby-index and tabby-crawler maintain a searchable index of:

Code snippets from configured repositories
Documentation
Declarations from the local LSP and recently modified code
GitHub/GitLab repository content (enterprise)

This index is persistent across sessions and updated when repos change.

Context for Completions

When serving a completion request:

Current file context (prefix + suffix from IDE)
RAG-retrieved relevant snippets from the repository index
Locally relevant snippets (declarations from the local LSP and recently modified code)

This provides multi-file context without requiring explicit file selection by the developer.

Chat Context

Standard conversation context (recent message history). No long-term memory beyond the index.

Answer Engine Memory

The Answer Engine indexes and makes queryable:

Connected Git repositories (code)
Imported documentation files
User-submitted knowledge base entries

This is Tabby's longest-term "memory": the indexed knowledge is stored in the server's data directory (~/.tabby/) and persists across server restarts.

Orchestration

TabbyML/Tabby — Orchestration

Multi-Agent

No multi-agent system in the core Tabby server. Pochi (in private preview) is an emerging agent layer but not yet public.

Orchestration Pattern

None. Tabby is a server that serves API requests — not an orchestrator.

Isolation Mechanism

Server-side: each API request is handled independently. No worktrees, no containers for user sessions.

Multi-Model

Yes, but for different purposes:

--model: FIM completion model (e.g., StarCoder-1B)
--chat-model: Chat model (e.g., Qwen2-1.5B-Instruct)

These are configured at server startup, not per-request.
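
Because both models are fixed at startup, a launcher script only has to assemble the command line once. A small sketch using the flags shown in this document (`--model`, `--device`, `--chat-model`); `tabby_serve_args` is a hypothetical helper.

```python
import shlex
from typing import Optional


def tabby_serve_args(model: str, device: str, chat_model: Optional[str] = None) -> list[str]:
    """Assemble the argument list for `tabby serve` from the flags named above."""
    args = ["tabby", "serve", "--model", model, "--device", device]
    if chat_model is not None:
        # The chat model is optional; completion works without it.
        args += ["--chat-model", chat_model]
    return args


args = tabby_serve_args("StarCoder-1B", "cuda", "Qwen2-1.5B-Instruct")
command_line = shlex.join(args)
```

Passing `args` to `subprocess.run` would start the server. Changing either model means restarting the process with new arguments.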

Execution Mode

Background daemon. The Tabby server runs continuously and serves API requests from IDE plugins.

Crash Recovery

Docker restart policies handle server restart. The index persists in ~/.tabby/.

Context Compaction

Not applicable — each request is stateless at the API level.

Consensus Mechanism

None.

Prompt Chaining

Not applicable to core completion use case. Pochi may implement this.

UI/CLI Surface

TabbyML/Tabby — UI and CLI Surface

CLI Binary

Name: tabby (the server binary)
Primary use: tabby serve --model <model> --device <device>
Not a coding agent CLI — it's a server daemon

Admin Web Dashboard

Tabby ships a full admin web interface:

URL: http://localhost:8080/admin (or configured host)
Stack: React + Rust/Axum backend
Features:
- Model management (switch models, view available models)
- Team and user management
- LDAP/OAuth/SSO authentication
- Usage analytics and reporting
- Activity log
- Answer Engine Q&A interface
- Git context configuration (GitHub/GitLab integration)
- Notification management

This is the most complete admin dashboard in this batch — significantly more capable than other frameworks' UIs.

IDE Extensions

Extension	Platform
VS Code	Available in Marketplace (`TabbyML.vscode-tabby`)
JetBrains	Available in JetBrains Marketplace
Vim/Neovim	Available
Eclipse	Available

All extensions connect to the Tabby server via REST API.

Chat Side Panel

VS Code extension includes:

Chat panel for Q&A
@-mention files to add context
Edit inline with right-click option
Multiple choice inline completion

Answer Engine Web Interface

A separate web interface for the Answer Engine provides internal search and Q&A for engineering teams. Answer Engine conversations can be turned into Pages, which are persistent and shareable.

Observability

Admin UI: activity log, usage analytics
Per-user and per-team statistics
Model performance metrics
No file-based audit log (database-backed)

OpenAPI

Tabby exposes a documented REST API, enabling integration with custom tools and workflows.

Distribution

Type: docker-image
License: NOASSERTION
Install: multi-step
Version: main (2026-03-02)

Surfaces

CLI binary: tabby
CLI subcmds: 3
Local UI: web-dashboard
UI port: 8080
Tech stack: React + Rust/Axum REST API + GraphQL (tabby-schema)

Components

Commands: 0
Skills: 0
Subagents: 0
Hooks: 0
MCP servers: 0
MCP tools: 0
Scripts: 0
Templates: 0

Workflow

Phases: 6
Approval gates: 0
Spec format: none
Spec storage: none
Delta or full: none

Orchestration

Multi-agent: No
Pattern: none
Max concurrent: 0
Isolation: none
Consensus: none
Prompt chaining: No

Multi-model

Multi-model: Yes
BYOK: No
Modal: text

Execution

Mode: background-daemon
Crash recovery: Yes
Compaction: No
Session handoff: No
Streaming: Yes

Memory

Type: hybrid
Persistence: global
Search: vector
State files: 3 files

Quality

TDD: No
TDD mechanism: none
Self-review: none

Git / Observability

Auto commit: No
Auto PR: No
Auto merge: No
Worktree/feat: No
Audit log: Yes
Audit format: proprietary
Replay: No

Tools

Primary: vscode
Targets: 5
Portability: high

Signals

Stars: 34k
Last commit: 2026-03-02
Contributors: 100
Maintainer: active
Quality score: 3/10