OmniCoreAgent

omnicore-agent · omnirexflora-labs/omnicoreagent · ★ 241 · last commit 2026-05-25

Primitive shape

No installable primitives

Summary

OmniCoreAgent is an open-source Python agent harness (pip: omnicoreagent) that positions itself as an "application-facing harness layer" providing the runtime boundary around LLMs. Its distinctive features are parallel tool batches (the model requests independent tool calls together and receives one structured observation), BM25-based tool retrieval for large toolsets, signature-based loop detection, MCP client integration, a workspace file system with optional cloud storage (S3, R2), dynamic subagents, durable background tasks (scheduled or manual, with run history and leases), and REST/SSE serving via OmniServe. The framework uses LiteLLM for model routing, making it compatible with any LiteLLM-supported provider, and supports OpenTelemetry, LangSmith, and Opik for observability. Modular extras: [redis], [postgres], [mongodb], [s3], [serve], [tokenizer], [otel], [langsmith], [opik]. The version is dynamic (derived from git tags).

Compared to seeds, OmniCoreAgent most closely resembles claude-flow in bundling parallel tool execution, memory, subagents, and production serving in one package, but differs: it is a pure Python pip package with no MCP server of its own (it is an MCP client), no LangGraph dependency, and uses LiteLLM for model abstraction rather than LangChain.

Overview

Origin

Developed by Abiola Adeshina / Omnirexflora Labs. Published as omnicoreagent on PyPI. The tagline is "The Open Production Agent Harness for Python."

Core Philosophy

From the README:

"An LLM is not an agent by itself. The model provides intelligence; the harness gives that intelligence a working environment."

The framework explicitly defines four harness boundaries:

Layer	What It Owns
Agent harness	Model loop, prompt contract, tools, observations, memory, context, workspace, guardrails, events, subagents
Serving boundary	OmniServe REST/SSE APIs, request lifecycle, readiness, auth, rate limits, metrics
Background boundary	Durable scheduled/manual task execution with task state, run history, leases, retries, and workspace output
External tool boundary	MCP server tools and local Python tools through one runtime surface

Parallel Tool Batches (Key Differentiator)

From the README:

"The usual tool loop looks like this: LLM -> call tool A -> wait -> result -> LLM -> call tool B -> wait -> result. OmniCoreAgent lets the model request independent tools together: LLM -> [tool A + tool B + tool C in parallel] -> one structured observation -> LLM."

Structured Observations

Raw tool output is reformatted before being added to the message thread. Failed tools appear alongside successful ones in the same observation, so a single failure does not abort the whole step.
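
A minimal sketch of the batching idea described above, using only the standard library. It is not OmniCoreAgent's implementation: `run_batch`, the observation layout, and the two demo tools are hypothetical names chosen for illustration. The point it shows is that independent calls run concurrently and that failures are folded into the same observation as successes.

```python
import asyncio
from typing import Any, Awaitable, Callable

ToolFn = Callable[..., Awaitable[Any]]


async def run_batch(calls: list[tuple[str, ToolFn, dict]]) -> dict:
    """Run independent tool calls concurrently and fold them into one observation."""
    # return_exceptions=True returns a failing tool's exception as a value,
    # so the results of the other tools in the batch are not lost.
    results = await asyncio.gather(
        *(fn(**kwargs) for _, fn, kwargs in calls),
        return_exceptions=True,
    )
    observation: dict = {"succeeded": {}, "failed": {}}
    for (name, _, _), result in zip(calls, results):
        if isinstance(result, Exception):
            observation["failed"][name] = f"{type(result).__name__}: {result}"
        else:
            observation["succeeded"][name] = result
    return observation


# Hypothetical tools, present only to exercise the sketch.
async def fetch_weather(city: str) -> str:
    await asyncio.sleep(0.01)
    return f"sunny in {city}"


async def fetch_stock(symbol: str) -> float:
    await asyncio.sleep(0.01)
    raise ValueError(f"unknown symbol {symbol}")


if __name__ == "__main__":
    batch = [
        ("weather", fetch_weather, {"city": "Lagos"}),
        ("stock", fetch_stock, {"symbol": "XYZ"}),
    ]
    print(asyncio.run(run_batch(batch)))
```

The sketch keys results by call name, so two calls with the same name in one batch would overwrite each other; a real harness would key by a per-call identifier.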

Architecture

Distribution

Type: pip package (omnicoreagent on PyPI)
Version analyzed: dynamic (git-tagged via uv-dynamic-versioning)
Install: pip install omnicoreagent
Optional extras: [redis], [postgres], [mongodb], [s3], [serve], [tokenizer], [otel], [langsmith], [opik]
Required runtime: Python >=3.10

Repository Layout

src/omnicoreagent/          # Main Python package
├── __init__.py
├── core/                   # Core agent loop, prompt contract, tool runner
├── background/             # Durable background tasks, run history, leases
├── governance/             # Guardrails, policy enforcement
├── mcp_clients_connection/ # MCP client integration
├── serve/                  # OmniServe REST/SSE API layer (FastAPI)
└── workflows/              # Workflow orchestration

engineering/                # Internal engineering docs
cookbook/                   # Usage examples (progressive complexity)
docker/                     # Docker configuration

Key Dependencies

litellm>=1.83.14 — LLM model routing (supports any LiteLLM provider)
mcp[cli]>=1.27.0 — MCP client support
pydantic>=2.12.5 — data validation
rich>=15.0.0 — terminal output formatting
click>=8.1.8 — CLI framework
Optional: redis, sqlalchemy+psycopg2, motor+pymongo, boto3, fastapi+uvicorn, tiktoken, opentelemetry-*

Config Files

Environment variables: LLM_API_KEY, LLM_PROVIDER, LLM_MODEL (via litellm)
No project-level config file required; configuration is programmatic
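
Because configuration is programmatic, one plausible pattern is to build the `model_config` dict from the environment at startup. The helper below is a sketch under that assumption: `model_config_from_env` is a hypothetical name, and the mapping from `LLM_PROVIDER`/`LLM_MODEL` to the `provider`/`model` keys is inferred from the quickstart, not confirmed by the README. `LLM_API_KEY` is deliberately left in the environment for the library to read.

```python
import os
from typing import Mapping, Optional


def model_config_from_env(env: Optional[Mapping[str, str]] = None) -> dict:
    """Build a model_config dict from LLM_PROVIDER and LLM_MODEL."""
    env = os.environ if env is None else env
    missing = [key for key in ("LLM_PROVIDER", "LLM_MODEL") if not env.get(key)]
    if missing:
        raise RuntimeError(f"missing environment variables: {', '.join(missing)}")
    # The API key is not copied into the config; it stays in the environment.
    return {"provider": env["LLM_PROVIDER"], "model": env["LLM_MODEL"]}


if __name__ == "__main__":
    print(model_config_from_env({"LLM_PROVIDER": "openai", "LLM_MODEL": "gpt-4o"}))
```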

Deployment Options

Standalone Python process
Docker (via docker/ directory)
OmniServe (built-in FastAPI server)

Target Tools

LLM-agnostic via LiteLLM (OpenAI, Anthropic, Google, Azure, Cohere, local models). MCP-compatible for tool integration.

Components

Core Agent

Component	Purpose
`OmniCoreAgent(name, system_instruction, model_config)`	Main agent class
`agent.run(prompt, session_id)`	Execute agent with a prompt
`agent.cleanup()`	Release resources
Parallel tool batcher	Runs independent tool calls concurrently
Structured observation formatter	Formats tool results before LLM sees them
BM25 tool retrieval	Finds relevant tools from large toolsets
Signature loop detector	Detects and breaks repetitive action patterns

Memory + Context

Component	Purpose
Session memory	Per-session message history
Context management	Token limit enforcement, context windowing
Tool output offloading	Write large tool results to workspace, replace with reference
Cross-session recall	Persistent memory across sessions

Workspace

Component	Purpose
Local workspace files	Read/write/list files in agent workspace
Cloud storage backends	S3, R2 (via `[s3]` extra)
Artifact management	Store large tool results as workspace files

MCP Client

Component	Purpose
`mcp_clients_connection/`	Connect to MCP servers as a client
Local Python tools	Expose Python functions as tools
Unified tool surface	MCP tools + local tools through one interface
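
To make the "one interface" idea concrete, here is a small registry sketch in which local Python functions and remote tools are looked up and called the same way. Everything in it is hypothetical: `ToolRegistry`, `register_remote`, and `fake_mcp_call` are illustration names, and the remote call is a stand-in coroutine, not the MCP SDK.

```python
import asyncio
import inspect
from typing import Any, Awaitable, Callable

RemoteCall = Callable[[str, dict], Awaitable[Any]]


class ToolRegistry:
    """One lookup table for local Python functions and remote (MCP-style) tools."""

    def __init__(self) -> None:
        self._tools: dict = {}

    def register_local(self, fn: Callable[..., Any]) -> Callable[..., Any]:
        # Usable as a decorator; the function name becomes the tool name.
        self._tools[fn.__name__] = fn
        return fn

    def register_remote(self, name: str, call_remote: RemoteCall) -> None:
        # Wrap the remote call so it has the same keyword-argument shape
        # as a local function.
        async def proxy(**kwargs: Any) -> Any:
            return await call_remote(name, kwargs)

        self._tools[name] = proxy

    def names(self) -> list:
        return sorted(self._tools)

    async def call(self, tool_name: str, arguments: dict) -> Any:
        result = self._tools[tool_name](**arguments)
        # Local tools may be sync or async; remote proxies are always async.
        if inspect.isawaitable(result):
            result = await result
        return result


registry = ToolRegistry()


@registry.register_local
def add(a: int, b: int) -> int:
    return a + b


async def fake_mcp_call(tool: str, arguments: dict) -> dict:
    # Stand-in for a request to an MCP server.
    return {"tool": tool, "echo": arguments}


registry.register_remote("search_docs", fake_mcp_call)
```

The caller only ever sees `registry.call(name, arguments)`, which is the property the table row describes.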

Subagents

Component	Purpose
Dynamic subagents	Spawn specialized child agents at runtime
Shared workspace output	Parent and child share workspace filesystem

Background Tasks

Component	Purpose
Durable task store	Persist task definitions
Run history	Track all past runs
Task leases	Prevent concurrent duplicate execution
Retries + cancellation	Automatic retry on failure; manual cancellation
Scheduled tasks	Cron-style or manual trigger
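
The lease row is the least self-explanatory entry in this table, so a sketch may help. A lease is a time-limited claim on a task: a worker that holds it may run the task, and the claim expires on its own if the worker dies. The `LeaseTable` class below is an in-memory illustration with hypothetical names; a durable system would keep leases in a shared store, and nothing here reflects OmniCoreAgent's actual storage or API.

```python
import time
import uuid
from typing import Callable, Optional


class LeaseTable:
    """In-memory lease table keyed by task id."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._leases: dict = {}  # task_id -> (token, expires_at)
        self._clock = clock

    def acquire(self, task_id: str, ttl_seconds: float) -> Optional[str]:
        """Return a lease token, or None if another holder's lease is still live."""
        now = self._clock()
        holder = self._leases.get(task_id)
        if holder is not None and holder[1] > now:
            return None
        token = uuid.uuid4().hex
        self._leases[task_id] = (token, now + ttl_seconds)
        return token

    def release(self, task_id: str, token: str) -> bool:
        """Release only if the caller still holds the current lease."""
        holder = self._leases.get(task_id)
        if holder is None or holder[0] != token:
            return False
        del self._leases[task_id]
        return True
```

The token check in `release` matters: a worker whose lease expired and was taken over must not be able to release the new holder's claim.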

OmniServe (REST/SSE API)

Component	Purpose
FastAPI server	Expose agents as HTTP endpoints
SSE streaming	Stream agent responses in real time
Auth	API key or custom auth
Rate limiting	Per-client request rate limits
Readiness probe	`/health` endpoint
Metrics	Request count, latency, error rate
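
Per-client rate limiting is commonly implemented as a token bucket per client key. The sketch below shows that technique in plain Python; it is an assumption about how such a limit could work, not a description of OmniServe internals, and `TokenBucket` and `PerClientLimiter` are hypothetical names.

```python
import time
from typing import Callable


class TokenBucket:
    """Allow bursts up to `capacity`, refilling at `rate` tokens per second."""

    def __init__(self, rate: float, capacity: int,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.rate = rate
        self.capacity = capacity
        self._clock = clock
        self._tokens = float(capacity)
        self._updated = clock()

    def allow(self) -> bool:
        now = self._clock()
        # Refill in proportion to elapsed time, capped at capacity.
        elapsed = now - self._updated
        self._tokens = min(self.capacity, self._tokens + elapsed * self.rate)
        self._updated = now
        if self._tokens >= 1:
            self._tokens -= 1
            return True
        return False


class PerClientLimiter:
    """One bucket per client id (for example, per API key)."""

    def __init__(self, rate: float, capacity: int,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._rate, self._capacity, self._clock = rate, capacity, clock
        self._buckets: dict = {}

    def allow(self, client_id: str) -> bool:
        if client_id not in self._buckets:
            self._buckets[client_id] = TokenBucket(
                self._rate, self._capacity, self._clock
            )
        return self._buckets[client_id].allow()
```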

Observability

OpenTelemetry (OTLP export) — [otel] extra
LangSmith integration — [langsmith] extra
Opik integration — [opik] extra

Prompts

Excerpt 1: Quickstart (from README)

agent = OmniCoreAgent(
    name="assistant",
    system_instruction="You are a helpful assistant.",
    model_config={"provider": "openai", "model": "gpt-4o"},
)

result = await agent.run(
    "Research the top 3 open-source agent runtimes and summarize them.",
    session_id="quickstart",
)

Prompting technique: Simple string system instruction + string user prompt. The harness manages the message loop, tool calls, and structured observations internally. The system instruction is a single string; no template variables or chaining required for the basic case.

Excerpt 2: LiteLLM Model Config

model_config={"provider": "openai", "model": "gpt-4o"}
# or
model_config={"provider": "anthropic", "model": "claude-sonnet-4-20250514"}
# or any LiteLLM-supported provider

Prompting technique: Model routing is config-driven via LiteLLM's provider abstraction. The same system instruction and user prompt are sent to any supported model without code changes.

Prompting Architecture

System instruction: Python string passed to OmniCoreAgent(system_instruction=...)
User prompt: Python string passed to agent.run(prompt, ...)
No static markdown prompt files are shipped
Structured observations: tool results are reformatted by the harness before the LLM sees them (prompt-level context engineering happens automatically)
The harness manages the tool-call contract, parser, resolver, parallel runner, and result formatter, so the prompt is wrapped by harness infrastructure on both the input and the output side

Uniqueness

Differs from Seeds

OmniCoreAgent is most similar to claude-flow in its production-first ambitions: parallel tool execution, memory, subagents, and HTTP serving in one package. The key deltas: OmniCoreAgent is a pip package (not npm), uses LiteLLM rather than being Claude-specific, and acts as an MCP client (not a server). It also adds signature-based loop detection and BM25 tool retrieval, production features not present in claude-flow's 305-tool MCP server approach. Unlike taskmaster-ai (task management with JSON files) or deepagents-langchain (LangGraph-specific), OmniCoreAgent is LLM-framework-agnostic via LiteLLM.

Distinctive Position

Parallel tool batches with structured observations — the most technically rigorous handling of parallel tool calls in this batch: one model request → N parallel tool calls → one structured observation (failed tools shown alongside successful ones)
Signature loop detection — algorithmic detection of repetitive action patterns, not just prompt-based warnings
BM25 tool retrieval — scales to large toolsets without context exhaustion
Four-boundary architecture documented explicitly: agent harness, serving boundary, background boundary, external tool boundary
Background task system with leases and run history — production durable task execution not seen in other Python harnesses in this batch

Explicit Anti-Patterns

Stated in README: "Most demos stop at 'LLM plus tool loop.' Production agents fail in the layer around that loop: slow sequential tool calls, noisy observations, repeated actions, context exhaustion, unsafe tool output, missing workspace state, uninspectable background work, and weak serving boundaries."

Observable Failure Modes

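Dynamic versioning (git-tagged): versions are derived from git tags rather than declared in pyproject.toml, which makes pinning against a stable semver harder
litellm>=1.83.14 is a recent version floor with no stated upper bound, so new LiteLLM releases could introduce breakage
Stars (241) suggest early adoption; production-readiness claims may outrun community validation
No built-in git automation; isolation is limited to scoped workspaces, with no container isolation or git worktrees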

Inspired By

Production agent reliability patterns, LiteLLM, MCP protocol.

Workflow

Basic Execution

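import asyncio

from omnicoreagent import OmniCoreAgent


async def main() -> None:
    agent = OmniCoreAgent(
        name="assistant",
        system_instruction="You are a helpful assistant.",
        model_config={"provider": "openai", "model": "gpt-4o"},
    )
    try:
        result = await agent.run(
            "Research the top 3 open-source agent runtimes.",
            session_id="quickstart",
        )
        print(result["response"])
    finally:
        # Release resources even if the run raises.
        await agent.cleanup()


# agent.run() is awaited, so it has to be driven by an event loop.
asyncio.run(main())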

Parallel Tool Execution Loop

LLM reasons → identifies independent tool calls
→ tool batcher runs them concurrently
→ structured observation combines all results (including failures)
→ LLM reasons from the complete observation
→ repeat until done

This differs from the standard sequential tool loop where each call is a separate LLM round-trip.

Phases

Phase	Description	Artifact
Agent initialization	`OmniCoreAgent(...)` — configure model, tools, memory	Agent object
Run	`agent.run(prompt, session_id)` — execute agent loop	Response dict
Tool batching	Parallel tool execution per reasoning step	Structured observation
Session memory	Message history maintained per `session_id`	In-memory or persistent
Background task	Durable task with run history and lease	Task run record
OmniServe	Expose agent via FastAPI HTTP/SSE	API endpoint

Approval Gates

Not documented in the README. The governance/ module may include human-in-the-loop controls.

Progressive Complexity Pattern

The cookbook is organized as progressive complexity tiers:

Hello world
Agent with local tools
Agent with MCP tools
Context + memory management
Tool output offloading
Real applications
Workflows
OmniServe

Memory and Context

Session Memory

Per-session message history maintained by session_id
In-memory by default; Redis/PostgreSQL/MongoDB persistence via optional extras
context_management handles token limits (windowing + summarization)

Tool Output Offloading

Large tool outputs are written to workspace files rather than added to the message thread
The message thread receives a file reference instead of the raw output
Prevents context exhaustion from large API responses or file contents
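
A sketch of the offloading step, assuming a simple character threshold and a local directory as the workspace. The function name `offload_if_large`, the `artifacts/` folder, and the shape of the returned reference are all hypothetical; the source only states that large outputs become workspace files and the thread receives a reference.

```python
import hashlib
import tempfile
from pathlib import Path


def offload_if_large(output: str, workspace: Path, max_chars: int = 2000) -> dict:
    """Return the output inline if small, else write it to a file and return a reference."""
    if len(output) <= max_chars:
        return {"type": "inline", "content": output}
    # Content-addressed file name: identical outputs map to the same artifact.
    digest = hashlib.sha256(output.encode("utf-8")).hexdigest()[:12]
    artifact = workspace / "artifacts" / f"tool_output_{digest}.txt"
    artifact.parent.mkdir(parents=True, exist_ok=True)
    artifact.write_text(output, encoding="utf-8")
    return {
        "type": "reference",
        "path": str(artifact.relative_to(workspace)),
        "chars": len(output),
        "preview": output[:200],  # short excerpt so the model knows what was stored
    }


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(offload_if_large("x" * 5000, Path(tmp)))
```

Only the reference dict would be appended to the message thread; the model can then read the file through the workspace tools if it needs the full content.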

Workspace

Local workspace filesystem: agent can read/write/list files
Cloud storage: S3 or R2 backends (via [s3] extra)
Shared between parent agent and dynamic subagents

Cross-Session Memory

Persistent memory via Redis, PostgreSQL, or MongoDB backends
Recall across sessions: the agent can query previous session outputs

BM25 Tool Retrieval

For large toolsets, BM25 retrieval surfaces relevant tools based on the current task
Prevents context exhaustion from loading all tool descriptions into the prompt
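
For readers unfamiliar with BM25, the following is a compact Okapi BM25 ranker over tool descriptions, written from the textbook formula. It illustrates the retrieval technique only: `BM25ToolIndex`, the tokenizer, and the parameter defaults (`k1=1.5`, `b=0.75`) are assumptions for this sketch, not OmniCoreAgent's code or settings.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list:
    return re.findall(r"[a-z0-9]+", text.lower())


class BM25ToolIndex:
    """Rank tool descriptions against a query with Okapi BM25."""

    def __init__(self, tools: dict, k1: float = 1.5, b: float = 0.75) -> None:
        self.k1, self.b = k1, b
        self.names = list(tools)
        self.docs = [Counter(tokenize(tools[name])) for name in self.names]
        self.lengths = [sum(doc.values()) for doc in self.docs]
        self.avg_len = sum(self.lengths) / len(self.docs)
        # Document frequency: how many descriptions contain each term.
        self.df = Counter(term for doc in self.docs for term in doc)

    def _idf(self, term: str) -> float:
        n, df = len(self.docs), self.df.get(term, 0)
        # The "+ 1" inside the log keeps IDF non-negative for very common terms.
        return math.log((n - df + 0.5) / (df + 0.5) + 1)

    def search(self, query: str, top_k: int = 3) -> list:
        terms = tokenize(query)
        scored = []
        for name, doc, length in zip(self.names, self.docs, self.lengths):
            score = 0.0
            for term in terms:
                tf = doc.get(term, 0)
                # Length normalisation: long descriptions need more hits to score as high.
                norm = tf + self.k1 * (1 - self.b + self.b * length / self.avg_len)
                score += self._idf(term) * tf * (self.k1 + 1) / norm
            scored.append((score, name))
        ranked = sorted(scored, key=lambda pair: pair[0], reverse=True)
        return [name for score, name in ranked[:top_k] if score > 0]


if __name__ == "__main__":
    index = BM25ToolIndex({
        "read_file": "Read the contents of a file from the workspace",
        "send_email": "Send an email message to a recipient",
        "query_db": "Run a SQL query against the database",
    })
    print(index.search("send an email to the team", top_k=1))
```

Only the names returned by `search` would have their full descriptions placed in the prompt, which is how retrieval keeps large toolsets from exhausting the context.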

Context Management Strategy

Token limit enforcement (via [tokenizer] extra with tiktoken)
Context windowing: oldest messages may be dropped or summarized
Tool output offloading: large results → workspace file
BM25 tool discovery: only relevant tools loaded into context
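
The windowing item in the list above can be sketched as "keep the system message, then keep the newest messages that still fit". The code below assumes a crude four-characters-per-token estimate in place of a real tokenizer such as tiktoken, and it drops old messages without summarizing them; `estimate_tokens` and `fit_to_budget` are hypothetical names.

```python
def estimate_tokens(text: str) -> int:
    # Crude stand-in for a real tokenizer: roughly one token per 4 characters.
    return max(1, len(text) // 4)


def fit_to_budget(messages: list, max_tokens: int) -> list:
    """Keep system messages plus the newest other messages that fit the budget."""
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]
    budget = max_tokens - sum(estimate_tokens(m["content"]) for m in system)
    kept = []
    for message in reversed(rest):  # walk from newest to oldest
        cost = estimate_tokens(message["content"])
        if cost > budget:
            break  # everything older than this point is dropped
        kept.append(message)
        budget -= cost
    return system + kept[::-1]  # restore chronological order
```

A summarizing variant would replace the dropped prefix with one synthetic message instead of discarding it.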

Orchestration

Multi-Agent Pattern

OmniCoreAgent supports dynamic subagent spawning:

Parent agent can spawn specialized child agents at runtime
Child agents share the parent's workspace output
Pattern: hierarchical — parent delegates to children

Parallel Tool Batching

The core execution innovation: the model requests multiple independent tools simultaneously, the harness executes them in parallel, and returns a single structured observation. This is not multi-agent coordination; it is single-agent parallel tool execution.

Multi-Model

Via LiteLLM: any model_config can specify any LiteLLM-supported provider. Different agent instances can use different models. No config-file routing system.

Background Tasks

The background task system provides durable work isolation:

Tasks run in a separate execution context
State machine with run history, leases, retries
Cancellable and schedulable
Output written to workspace
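
The state machine with run history and retries might look roughly like the sketch below. The status names, the exponential backoff, and the `TaskState` and `run_with_retries` names are assumptions made for illustration; the README describes the capabilities but this is not its code.

```python
import asyncio
from dataclasses import dataclass, field
from typing import Any, Awaitable, Callable


@dataclass
class TaskState:
    name: str
    status: str = "pending"  # pending -> running -> retrying -> succeeded/failed
    history: list = field(default_factory=list)  # one record per attempt


async def run_with_retries(
    task: Callable[[], Awaitable[Any]],
    state: TaskState,
    max_attempts: int = 3,
    base_delay: float = 0.0,
) -> Any:
    """Run `task`, recording every attempt; return its result or None on failure."""
    for attempt in range(1, max_attempts + 1):
        state.status = "running"
        try:
            result = await task()
        except Exception as exc:
            state.history.append(
                {"attempt": attempt, "status": "failed", "detail": repr(exc)}
            )
            if attempt == max_attempts:
                state.status = "failed"
                return None
            state.status = "retrying"
            # Exponential backoff: base_delay, 2x, 4x, ...
            await asyncio.sleep(base_delay * 2 ** (attempt - 1))
        else:
            state.history.append(
                {"attempt": attempt, "status": "succeeded", "detail": str(result)}
            )
            state.status = "succeeded"
            return result
    return None


def make_flaky(failures: int) -> Callable[[], Awaitable[str]]:
    """Hypothetical task that fails `failures` times, then succeeds."""
    remaining = {"n": failures}

    async def task() -> str:
        if remaining["n"] > 0:
            remaining["n"] -= 1
            raise RuntimeError("upstream unavailable")
        return "report.md"

    return task
```

Keeping a record for failed attempts as well as the final one is what makes the background work inspectable after the fact.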

Execution Modes

Interactive: agent.run() returns when done
Background: durable scheduled/manual task execution
OmniServe: event-driven HTTP/SSE serving

Isolation Mechanism

Workspace isolation: each session/run writes to a scoped workspace. No container isolation or git worktrees.
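
One way to read "scoped workspace" is path scoping: every write is resolved under a per-session directory and rejected if it would land outside it. The sketch below shows that check with `pathlib` (Python 3.9+ for `is_relative_to`). The directory layout and the `scoped_path` and `write_scoped` helpers are hypothetical, and this is a path check, not a sandbox.

```python
import tempfile
from pathlib import Path


def scoped_path(root: Path, session_id: str, relative: str) -> Path:
    """Resolve `relative` inside the session's directory, rejecting escapes."""
    root = root.resolve()
    base = (root / session_id).resolve()
    target = (base / relative).resolve()
    # Both checks matter: a crafted session_id could climb out of root, and a
    # crafted relative path could climb out of the session directory.
    if base == root or not base.is_relative_to(root):
        raise PermissionError(f"invalid session id: {session_id!r}")
    if not target.is_relative_to(base):
        raise PermissionError(f"path escapes workspace: {relative!r}")
    return target


def write_scoped(root: Path, session_id: str, relative: str, content: str) -> Path:
    target = scoped_path(root, session_id, relative)
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(content, encoding="utf-8")
    return target


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(write_scoped(Path(tmp), "session-a", "notes/summary.md", "hello"))
```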

Signature Loop Detection

Monitors the sequence of tool calls and detects repeated patterns (signatures) indicating the agent is stuck in a loop. Breaks out automatically.
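
A minimal sketch of signature-based detection, assuming a signature is a hash of the tool name plus its canonicalized arguments and that "stuck" means the same signature recurring several times within a short window. The window size, the repeat threshold, and the `LoopDetector` name are illustrative choices, not values taken from OmniCoreAgent.

```python
import hashlib
import json
from collections import deque


def call_signature(tool_name: str, arguments: dict) -> str:
    """Stable fingerprint: the same tool with the same arguments gives the same hash."""
    # sort_keys makes the fingerprint independent of argument order.
    payload = json.dumps({"tool": tool_name, "args": arguments}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class LoopDetector:
    def __init__(self, window: int = 6, max_repeats: int = 3) -> None:
        self.recent = deque(maxlen=window)  # only the last `window` calls count
        self.max_repeats = max_repeats

    def record(self, tool_name: str, arguments: dict) -> bool:
        """Record a call; return True when it looks like the agent is looping."""
        signature = call_signature(tool_name, arguments)
        self.recent.append(signature)
        return self.recent.count(signature) >= self.max_repeats
```

When `record` returns True, a harness could stop the run or inject a corrective observation; which of these OmniCoreAgent does is not stated in the source beyond "breaks out automatically".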

UI/CLI Surface

CLI Binary

click>=8.1.8 is a dependency, which suggests a CLI exists, but no binary is explicitly declared in the [project.scripts] section of pyproject.toml. The CLI is likely exposed via python -m omnicoreagent or a click-based entry point in src/.

Local UI

None documented. OmniCoreAgent is a library + OmniServe API server; no web dashboard.

OmniServe (REST/SSE API)

FastAPI-based agent serving (via [serve] extra: fastapi, uvicorn)
Features: SSE streaming, API key auth, rate limiting, readiness probe, metrics
This is an HTTP API server for production deployment, not a monitoring UI
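
For context on the SSE feature, the sketch below shows the Server-Sent Events wire format that any SSE endpoint has to emit: an optional `event:` line, a `data:` line, and a blank line to end the frame. The event names `token` and `done` and the JSON payload shape are hypothetical, and the code formats frames only; it does not use FastAPI or reproduce OmniServe's actual event schema.

```python
import json
from typing import Iterable, Iterator, Optional


def format_sse(data: dict, event: Optional[str] = None) -> str:
    """Serialize one Server-Sent Events frame."""
    lines = []
    if event:
        lines.append(f"event: {event}")
    # json.dumps produces a single line by default, so one data: field is enough.
    lines.append(f"data: {json.dumps(data)}")
    # A blank line terminates the frame.
    return "\n".join(lines) + "\n\n"


def stream_agent_events(chunks: Iterable[str]) -> Iterator[str]:
    """Turn response chunks into SSE frames, ending with a completion frame."""
    for index, chunk in enumerate(chunks):
        yield format_sse({"index": index, "text": chunk}, event="token")
    yield format_sse({"status": "complete"}, event="done")


if __name__ == "__main__":
    for frame in stream_agent_events(["Hel", "lo"]):
        print(frame, end="")
```

A generator like `stream_agent_events` is the kind of object a streaming HTTP response would iterate over, with the `text/event-stream` content type.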

AI Tool Docs Integration

The README includes a dedicated section "Use docs with AI tools" with links for:

Ask AI (query the hosted docs)
/llms.txt (for LLM-native documentation access)
Hosted docs MCP (for AI coding tool integration)
Cursor, VS Code, ChatGPT, Claude, Perplexity guides

This suggests the framework actively targets AI coding tool integration for its own documentation.

Cross-Tool Portability

High: the LiteLLM backend supports any LiteLLM-supported provider, and the framework is not tied to any AI coding tool.

Related frameworks

same archetype · same primary tool · same memory type

claude-mem (thedotmack) ★ 78k

A8 Cross-runtime harness

Background worker service captures every tool call as an observation, AI-compresses sessions, and auto-injects relevant past…

pi (badlogic/earendil) ★ 55k

A8 Cross-runtime harness

A minimal, hackable, multi-provider terminal coding agent that adapts to your workflows via npm-installable TypeScript Extensions…

Agent Skills (Addy Osmani) ★ 46k

A8 Cross-runtime harness

Encodes senior-engineer software development lifecycle as 23 auto-routed skills and 7 slash commands for any AI coding agent.

wshobson/agents Plugin Marketplace ★ 36k

A8 Cross-runtime harness

Single Markdown source for 83 domain-specialized plugins that auto-generates idiomatic artifacts for five AI coding harnesses.

TabbyML/Tabby ★ 34k

A8 Cross-runtime harness

Self-hosted AI coding assistant server (alternative to GitHub Copilot) with admin dashboard, RAG-based completions, and multi-IDE…

Compound Engineering ★ 17k

A8 Cross-runtime harness

Make each unit of engineering work compound into easier future work via brainstorm→plan→execute→review→learn cycles.

Distribution

Type: pip-package
License: MIT
Install: one-liner
Version: unknown (dynamic git versioning)

Surfaces

CLI binary: unknown (click-based)
Local UI: No
Tech stack: OmniServe (FastAPI/uvicorn, not a UI — an API server)

Components

Commands: 0
Skills: 0
Subagents: 0
Hooks: 0
MCP servers: 0
MCP tools: 0
Scripts: 0
Templates: 0

Workflow

Phases: 6
Approval gates: 0
Spec format: none
Spec storage: none
Delta or full: none

Orchestration

Multi-agent: Yes
Pattern: hierarchical
Isolation: none
Consensus: none
Prompt chaining: Yes

Multi-model

Multi-model: Yes
BYOK: Yes
Modal: text

Execution

Mode: interactive-loop
Crash recovery: Yes
Compaction: Yes
Session handoff: Yes
Streaming: Yes

Memory

Type: hybrid
Persistence: project
Search: full-text
State files: 2 files

Quality

TDD: No
TDD mechanism: none
Self-review: none

Git / Observability

Auto commit: No
Auto PR: No
Auto merge: No
Worktree/feat: No
Audit log: Yes
Audit format: jsonl
Replay: No

Tools

Primary: LiteLLM (provider-agnostic)
Targets: 2
Portability: high

Signals

Stars: 241
Last commit: 2026-05-25
Maintainer: active
Quality score: 4/10