System Architecture
This page explains Agenta's system architecture: what each component does and how they connect.
System Overview
Agenta uses a microservices architecture deployed as Docker containers. The diagram below shows how the main layers connect.
                  ┌─────────────────────────────────────┐
                  │                Users                │
                  │     (Developers, AI Engineers)      │
                  └──────────────────┬──────────────────┘
                                     │
                  ┌──────────────────▼──────────────────┐
                  │        Load Balancer / Proxy        │
                  │         (Traefik or Nginx)          │
                  │       Handles SSL and routing       │
                  └──────────────────┬──────────────────┘
                                     │
         ┌───────────────────────────┼───────────────────────────┐
         │                           │                           │
         ▼                           ▼                           ▼
┌─────────────────┐         ┌─────────────────┐         ┌─────────────────┐
│    Frontend     │         │   API Backend   │         │  Services API   │
│    (Web UI)     │◄───────►│    (FastAPI)    │◄───────►│    (FastAPI)    │
│                 │         │                 │         │                 │
│ • Next.js App   │         │ • REST API      │         │ • Completion    │
│ • Playground    │         │ • Core logic    │         │ • Chat          │
│ • Admin UI      │         │ • Persistence   │         │ • LLM adapters  │
└────────┬────────┘         └────────┬────────┘         └────────┬────────┘
         │                           │                           │
         │                           ▼                           ▼
         │              ┌─────────────────────────┐     ┌─────────────────┐
         │              │       Worker Pool       │     │  runner :8765   │
         │              │   (background procs)    │     │  (agent runs)   │
         │              │ • worker-evaluations    │     └────────┬────────┘
         │              │ • worker-tracing        │              │
         │              │ • worker-webhooks       │              │
         │              │ • worker-events         │              │
         │              │ • worker-records        │              │
         │              │ • worker-interactions   │              │
         │              │ • worker-triggers       │              │
         │              │ • cron                  │              │
         │              └────────────┬────────────┘              │
         │                           │                           │
         ▼                           ▼                           ▼
┌─────────────────────────────────────────────────────────────────────────┐
│                          Infrastructure Layer                           │
│                                                                         │
│ ┌─────────────────┐ ┌─────────────────┐ ┌──────────────┐ ┌────────────┐ │
│ │   PostgreSQL    │ │      Redis      │ │ SuperTokens  │ │ SeaweedFS  │ │
│ │                 │ │                 │ │              │ │   :8333    │ │
│ │ • Core DB       │ │ • Task queues   │ │ • Auth       │ │ (bundled   │ │
│ │ • Tracing DB    │ │ • Streams       │ │ • Sessions   │ │ or ext S3) │ │
│ │ • Auth DB       │ │ • Caching       │ │              │ │            │ │
│ └─────────────────┘ └─────────────────┘ └──────────────┘ └────────────┘ │
└─────────────────────────────────────────────────────────────────────────┘
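The proxy's routing role can be sketched as a prefix table over the internal ports listed on this page. This is a hypothetical model, not Agenta's Traefik or Nginx configuration: the /services/ prefix comes from the Services API section, while the /api/ prefix for the API backend and the upstream host names (web, api, services) are assumptions.

```python
# Hypothetical routing table: (path prefix, upstream host, internal port).
# Ports come from this page; the "/api/" prefix and host names are assumptions.
ROUTES = [
    ("/services/", "services", 8080),  # Services API (FastAPI)
    ("/api/", "api", 8000),            # API backend (FastAPI), assumed prefix
    ("/", "web", 3000),                # Next.js frontend catches everything else
]


def route(path: str) -> str:
    """Return the upstream URL the proxy would forward `path` to."""
    # Check longer prefixes first so "/" does not shadow the specific routes.
    for prefix, host, port in sorted(ROUTES, key=lambda r: len(r[0]), reverse=True):
        if path.startswith(prefix):
            return f"http://{host}:{port}{path}"
    raise ValueError(f"no route for {path!r}")


if __name__ == "__main__":
    # Example paths are illustrative, not real Agenta endpoints.
    for p in ["/services/chat/example", "/api/example", "/playground"]:
        print(p, "->", route(p))
```

Longest-prefix matching is the usual way reverse proxies resolve overlapping routes, which is why the catch-all "/" is tried last.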
Frontend Components
Web UI (Next.js Application)
- Technology: React, TypeScript, Next.js
- Port: 3000 (internal)
- Purpose: Primary user interface for the Agenta platform
Key Responsibilities:
- User Interface: Web interface for managing applications
- Playground: Interactive environment for testing and evaluating LLM applications
- Evaluation Dashboard: Visualizations and metrics for application performance
- Application Management: Create, configure, and deploy AI applications
- User Authentication: Login, registration, and session management
Backend Components
API Service (FastAPI)
- Technology: Python, FastAPI, SQLAlchemy
- Port: 8000 (internal)
- Purpose: Core business logic and API endpoints
Key Responsibilities:
- REST API: Provides RESTful endpoints for frontend and external integrations
- Business Logic: Implements core platform functionality
- Data Management: Handles CRUD operations for applications, evaluations, experiments, etc.
- Authentication: Integrates with SuperTokens for user authentication
- Application Orchestration: Manages application lifecycle and deployment
- Evaluation Management: Coordinates evaluation runs and result collection
Worker Services (TaskIQ + Async Consumers)
- Technology: Python workers, TaskIQ, asyncio consumers, Redis, PostgreSQL
- Purpose: Background processing for evaluations, tracing, events, and webhooks
Key Responsibilities:
- Evaluation Execution: worker-evaluations runs asynchronous evaluation workloads
- Tracing Ingestion: worker-tracing consumes OTLP tracing pipelines
- Webhook Delivery: worker-webhooks dispatches outbound webhook notifications
- Event Processing: worker-events processes internal event streams
- Session Records: worker-records persists agent session records from the streams:records Redis stream
- Interaction Dispatch: worker-interactions consumes the queues:interactions queue and dispatches async session interactions
- Trigger Processing: worker-triggers processes trigger events for automated workflow execution
TaskIQ Integration:
- Broker: Uses Redis streams for queueing and task distribution
- Task Registration: Evaluation tasks are registered at worker startup
- Execution: Workers consume Redis-backed jobs and process them asynchronously
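The register-then-consume pattern above can be modeled with the standard library alone. This is a hypothetical illustration of the pattern, not TaskIQ's API: `asyncio.Queue` stands in for the Redis-backed broker, and the task name `run_evaluation` is invented.

```python
import asyncio
from typing import Any, Awaitable, Callable, Dict

# Registry of task name -> coroutine function, filled when the worker module loads.
TASKS: Dict[str, Callable[..., Awaitable[Any]]] = {}


def task(name: str):
    """Register a coroutine under `name` (a stand-in for TaskIQ task registration)."""
    def register(fn):
        TASKS[name] = fn
        return fn
    return register


@task("run_evaluation")  # hypothetical task name
async def run_evaluation(run_id: str) -> str:
    await asyncio.sleep(0)  # placeholder for real evaluation work
    return f"evaluated {run_id}"


async def worker(queue: asyncio.Queue, results: Dict[Any, Any]) -> None:
    """Consume jobs until cancelled, dispatching each by its registered name."""
    while True:
        job_id, name, kwargs = await queue.get()
        try:
            results[job_id] = await TASKS[name](**kwargs)
        except Exception as exc:  # record the failure instead of killing the worker
            results[job_id] = exc
        finally:
            queue.task_done()


async def main(n_workers: int = 2) -> Dict[Any, Any]:
    queue: asyncio.Queue = asyncio.Queue()  # stands in for the Redis stream
    results: Dict[Any, Any] = {}
    workers = [asyncio.create_task(worker(queue, results)) for _ in range(n_workers)]
    for i in range(3):
        await queue.put((i, "run_evaluation", {"run_id": f"run-{i}"}))
    await queue.join()  # block until every queued job has been processed
    for w in workers:
        w.cancel()
    await asyncio.gather(*workers, return_exceptions=True)
    return results


if __name__ == "__main__":
    print(asyncio.run(main()))
```

In the real deployment the queue lives in Redis, so producers (the API) and consumers (worker processes) can run in separate containers and scale independently.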
Agent Runner
- Technology: Node.js TypeScript sidecar
- Port: 8765 (internal)
- Purpose: Executes agent workflows on behalf of the Services API
The runner receives /run requests from the Services API (routed via AGENTA_RUNNER_URL) and starts harness processes (Pi, Claude Code, or other supported adapters) in local or remote sandboxes. It mounts durable working directories from the store into each sandbox and relays server-side tools back to the Services API without exposing the full stack environment to the harness.
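One way to keep the stack environment away from a harness process is to start it with an explicit allowlist of variables. The sketch below is written in Python for consistency with the other examples (the runner itself is TypeScript), and the allowlist contents are a hypothetical choice; which variables the runner actually forwards is not documented here.

```python
import os
import subprocess
import sys
from typing import Dict, List, Optional

# Hypothetical allowlist: only these variables are passed to the harness.
HARNESS_ENV_ALLOWLIST = {"PATH", "HOME", "LANG"}


def harness_env(full_env: Dict[str, str], extra: Optional[Dict[str, str]] = None) -> Dict[str, str]:
    """Keep allowlisted variables, add per-run extras, and drop everything else."""
    env = {k: v for k, v in full_env.items() if k in HARNESS_ENV_ALLOWLIST}
    env.update(extra or {})
    return env


def start_harness(cmd: List[str], workdir: str) -> subprocess.CompletedProcess:
    """Run a harness command in `workdir` without inheriting stack secrets."""
    return subprocess.run(
        cmd,
        cwd=workdir,
        env=harness_env(dict(os.environ)),
        capture_output=True,
        text=True,
        check=False,
    )


if __name__ == "__main__":
    os.environ["POSTGRES_PASSWORD"] = "example"  # pretend stack secret
    probe = [sys.executable, "-c", "import os; print('POSTGRES_PASSWORD' in os.environ)"]
    print(start_harness(probe, ".").stdout.strip())  # the child does not see the secret
```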
Sandbox matrix:
- local: in-process on the runner host; the default for compose and Kubernetes deployments.
- daytona: a remote Daytona cloud sandbox; requires SANDBOX_AGENT_PROVIDER=daytona on the runner.
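Provider selection on the runner can be sketched as reading SANDBOX_AGENT_PROVIDER and falling back to local. Only the variable name and the two provider values come from this page; the function itself and the fail-fast error for unknown values are illustrative assumptions.

```python
import os
from typing import Mapping, Optional

SUPPORTED_PROVIDERS = {"local", "daytona"}  # values listed in the sandbox matrix


def sandbox_provider(env: Optional[Mapping[str, str]] = None) -> str:
    """Pick the sandbox provider, defaulting to in-process 'local' sandboxes."""
    env = os.environ if env is None else env
    provider = env.get("SANDBOX_AGENT_PROVIDER", "local").strip().lower() or "local"
    if provider not in SUPPORTED_PROVIDERS:
        # Hypothetical behavior: fail fast rather than silently fall back.
        raise ValueError(f"unsupported SANDBOX_AGENT_PROVIDER: {provider!r}")
    return provider


if __name__ == "__main__":
    print(sandbox_provider({}))                                     # local
    print(sandbox_provider({"SANDBOX_AGENT_PROVIDER": "daytona"}))  # daytona
```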
Services Backend
Services API (FastAPI)
- Technology: Python, FastAPI
- Port: 8080 (internal)
- Purpose: LLM-facing endpoints and service-layer APIs exposed under /services/*
Key Responsibilities:
- LLM Integration: Connects to various LLM providers (OpenAI, Anthropic, etc.)
- Prompt Processing: Handles prompt templates and variable substitution
- Response Generation: Manages LLM API calls and response handling
- Provider Abstraction: Unified interface across different LLM providers
- Error Handling: Handles failures returned by LLM provider APIs
- Endpoint Groups: Includes /services/completion/* and /services/chat/*
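Prompt processing, provider abstraction, and error handling can be pictured together as template rendering followed by calls through a shared interface. The adapters below are fakes that make no network calls; the class names, the `{{variable}}` template syntax, and the try-the-next-provider fallback policy are assumptions for illustration, not the Services API's implementation.

```python
import re
from typing import Dict, List, Protocol


class ProviderError(Exception):
    """Raised by an adapter when its upstream LLM API call fails."""


class LLMProvider(Protocol):
    name: str

    def complete(self, prompt: str) -> str: ...


class FakeOpenAI:
    name = "openai"

    def complete(self, prompt: str) -> str:
        raise ProviderError("rate limited")  # simulate an upstream failure


class FakeAnthropic:
    name = "anthropic"

    def complete(self, prompt: str) -> str:
        return f"[{self.name}] {prompt}"


def render(template: str, variables: Dict[str, str]) -> str:
    """Substitute {{name}} placeholders; a missing variable raises KeyError."""
    return re.sub(r"\{\{\s*(\w+)\s*\}\}", lambda m: variables[m.group(1)], template)


def complete_with_fallback(providers: List[LLMProvider], template: str, variables: Dict[str, str]) -> str:
    """Render the prompt, then return the first successful provider response."""
    prompt = render(template, variables)
    errors = []
    for provider in providers:
        try:
            return provider.complete(prompt)
        except ProviderError as exc:
            errors.append(f"{provider.name}: {exc}")
    raise ProviderError("all providers failed: " + "; ".join(errors))


if __name__ == "__main__":
    print(complete_with_fallback([FakeOpenAI(), FakeAnthropic()], "Summarize {{ topic }}", {"topic": "Redis"}))
```

Because callers only depend on `complete()`, adding a provider means writing one adapter rather than touching every endpoint.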
Infrastructure Services
PostgreSQL (Database)
- Technology: PostgreSQL 17
- Port: 5432
- Purpose: Primary data storage
Databases:
- Core Database: Application data, Datasets, Evaluations, Users & Profiles, etc.
- Tracing Database: Execution traces and performance metrics
- SuperTokens Database: Authentication and user management data
Redis (Task Queue, Caching & Sessions)
- Technology: Redis
- Ports: 6379 (volatile), 6381 (durable)
- Purpose: Task queue, caching, pub/sub, streams
Use Cases:
- Task Queue: TaskIQ broker for background job distribution and processing
- Application Caching: Frequently accessed data
- Session Storage: User sessions and temporary data
- Task Results: TaskIQ task results and status
- Real-time Data: Live updates and notifications
- Rate Limiting: API rate limit counters
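Rate-limit counters in Redis are commonly implemented as a fixed window: increment a counter per client and let the key expire when the window ends. The sketch below models that pattern in memory (a dict stands in for Redis `INCR` plus `EXPIRE`); Agenta's actual limiter may use a different algorithm, and the limits shown are arbitrary.

```python
import time
from typing import Callable, Dict, Tuple


class FixedWindowLimiter:
    """In-memory stand-in for a Redis INCR + EXPIRE fixed-window counter."""

    def __init__(self, limit: int, window_s: float, clock: Callable[[], float] = time.monotonic):
        self.limit = limit
        self.window_s = window_s
        self.clock = clock
        self._counters: Dict[str, Tuple[int, float]] = {}  # key -> (count, expires_at)

    def allow(self, key: str) -> bool:
        now = self.clock()
        count, expires_at = self._counters.get(key, (0, now + self.window_s))
        if now >= expires_at:  # window elapsed, like the Redis key expiring
            count, expires_at = 0, now + self.window_s
        count += 1  # like INCR
        self._counters[key] = (count, expires_at)
        return count <= self.limit


if __name__ == "__main__":
    limiter = FixedWindowLimiter(limit=2, window_s=60)
    print([limiter.allow("client-a") for _ in range(3)])  # [True, True, False]
```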
SuperTokens (Authentication)
- Technology: SuperTokens
- Port: 3567
- Purpose: Authentication and user management
Features:
- User Authentication: Login/logout, password management
- Session Management: Secure session handling with JWT
- OAuth Integration: Google and GitHub
- User Management: User registration, profile management
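Because sessions are JWT-based, inspecting a token's claims can help when debugging session issues. The helper below decodes only the payload segment and does not verify the signature, so it is a debugging aid, not an authorization check; verification should go through the SuperTokens SDK. The claims in the usage example are made up.

```python
import base64
import json
from typing import Any, Dict


def _b64url_decode(segment: str) -> bytes:
    """Decode base64url, restoring the '=' padding that JWTs strip."""
    return base64.urlsafe_b64decode(segment + "=" * (-len(segment) % 4))


def jwt_claims(token: str) -> Dict[str, Any]:
    """Decode the payload of a JWT WITHOUT verifying its signature."""
    _header, payload, _signature = token.split(".")
    return json.loads(_b64url_decode(payload))


if __name__ == "__main__":
    def enc(obj: Dict[str, Any]) -> str:
        raw = json.dumps(obj).encode()
        return base64.urlsafe_b64encode(raw).rstrip(b"=").decode()

    # Illustrative token: made-up claims and a fake signature.
    token = ".".join([enc({"alg": "RS256"}), enc({"sub": "user-123", "exp": 1700000000}), "sig"])
    print(jwt_claims(token))
```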
Durable Store (SeaweedFS / S3)
- Technology: SeaweedFS (bundled) or any S3-compatible store (AWS S3, Cloudflare R2, MinIO)
- Port: 8333 (bundled SeaweedFS)
- Purpose: S3-compatible object store backing durable agent workspaces
Files written during an agent run are stored here and remounted automatically on the next turn, so agent workspaces survive sandbox teardown.
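The persistence cycle can be sketched as: save the workspace under a per-session prefix when the sandbox is torn down, then restore that prefix into the fresh sandbox on the next turn. This copy-based version only illustrates the idea (the runner mounts the store rather than copying files); a dict stands in for the S3 bucket, and the `workspaces/<session>/` key layout and function names are hypothetical.

```python
import tempfile
from pathlib import Path
from typing import Dict

Bucket = Dict[str, bytes]  # object key -> contents; stand-in for an S3 bucket


def save_workspace(bucket: Bucket, session_id: str, workdir: Path) -> int:
    """Upload every file under `workdir` to keys prefixed by the session ID."""
    count = 0
    for path in workdir.rglob("*"):
        if path.is_file():
            key = f"workspaces/{session_id}/{path.relative_to(workdir).as_posix()}"
            bucket[key] = path.read_bytes()
            count += 1
    return count


def restore_workspace(bucket: Bucket, session_id: str, workdir: Path) -> int:
    """Recreate the session's files inside a fresh `workdir`."""
    prefix = f"workspaces/{session_id}/"
    count = 0
    for key, data in bucket.items():
        if key.startswith(prefix):
            target = workdir / key[len(prefix):]
            target.parent.mkdir(parents=True, exist_ok=True)
            target.write_bytes(data)
            count += 1
    return count


if __name__ == "__main__":
    bucket: Bucket = {}
    with tempfile.TemporaryDirectory() as turn1, tempfile.TemporaryDirectory() as turn2:
        (Path(turn1) / "notes").mkdir()
        (Path(turn1) / "notes" / "plan.md").write_text("step 1")
        save_workspace(bucket, "s-1", Path(turn1))     # sandbox torn down after this
        restore_workspace(bucket, "s-1", Path(turn2))  # next turn, new sandbox
        print((Path(turn2) / "notes" / "plan.md").read_text())
```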
The store.seaweedfs.enabled Helm toggle controls whether the chart bundles a SeaweedFS StatefulSet or points store.endpointUrl at an external store, mirroring the postgresql.enabled pattern. store.endpointUrl is always set explicitly, never inferred; a remote S3-compatible store (AWS S3, MinIO) must provide it.
Per-deployment default:
- Dev compose: SeaweedFS container bundled.
- Railway: SeaweedFS service and volume (publicly reachable, no tunnel needed).
- Kubernetes (gh self-host): no bundled SeaweedFS; supply external S3 credentials via store.* values.
- Kubernetes (operator choice): enable via store.seaweedfs.enabled=true.
- Live / private cloud: external AWS S3 (store.seaweedfs.enabled=false).
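The toggle logic can be expressed as a small check over the chart's store.* values. This is a hypothetical validation written against a plain dict, not the chart's template logic: the keys store.seaweedfs.enabled and store.endpointUrl come from this page, the default of `false` is assumed from the Kubernetes self-host row, and the in-cluster URL in the usage example is invented.

```python
from typing import Any, Dict


def resolve_store(values: Dict[str, Any]) -> Dict[str, Any]:
    """Summarize the store.* Helm values, enforcing an explicit endpoint URL."""
    store = values.get("store") or {}
    endpoint = store.get("endpointUrl")
    if not endpoint:
        # The endpoint URL is always explicit, whether SeaweedFS is bundled or not.
        raise ValueError("store.endpointUrl must be set explicitly")
    # Assumed default: no bundled SeaweedFS unless the operator opts in.
    bundled = bool((store.get("seaweedfs") or {}).get("enabled", False))
    return {"bundled_seaweedfs": bundled, "endpoint_url": endpoint}


if __name__ == "__main__":
    # Hypothetical in-cluster URL for a bundled SeaweedFS S3 gateway on :8333.
    print(resolve_store({"store": {"seaweedfs": {"enabled": True}, "endpointUrl": "http://seaweedfs:8333"}}))
    print(resolve_store({"store": {"endpointUrl": "https://s3.us-east-1.amazonaws.com"}}))
```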
See the Store configuration reference.
Service Dependencies
Frontend Dependencies
Web UI depends on:
├── API Service (primary backend)
├── Services API (playground and model calls)
└── Authentication (SuperTokens via API)
Backend Dependencies
API Service depends on:
├── PostgreSQL (data persistence)
├── Redis (task queue, caching, sessions)
├── SuperTokens (authentication)
└── Worker pool (async task execution)
Services API depends on:
├── PostgreSQL (agent and service state)
├── LLM providers (model calls)
└── runner sidecar (agent workflow execution via AGENTA_RUNNER_URL)
Worker Dependencies
Worker pool depends on:
├── Redis (queues and streams)
├── PostgreSQL (state and persistence)
├── API backend (coordination and config)
├── worker-records (streams:records stream → session persistence)
└── Services API / external endpoints (workload-specific processing)
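Read together, these trees form a small dependency graph, and a topological sort of it yields a start order with dependencies first. The sketch below transcribes the trees into Python's `graphlib`. API Service and Worker pool list each other, so a strict sort raises `CycleError`; treating the API's link to the worker pool as runtime-only (jobs wait in Redis until a worker picks them up) is an assumption made here to break the cycle, not documented startup behavior.

```python
from graphlib import CycleError, TopologicalSorter
from typing import Dict, List, Set

# Edges transcribed from the trees above (node -> what it depends on).
# "worker-records" is omitted because it belongs to the worker pool itself.
DEPENDS_ON: Dict[str, Set[str]] = {
    "Web UI": {"API Service", "Services API", "SuperTokens"},
    "API Service": {"PostgreSQL", "Redis", "SuperTokens", "Worker pool"},
    "Services API": {"PostgreSQL", "LLM providers", "runner"},
    "Worker pool": {"Redis", "PostgreSQL", "API Service", "Services API"},
}


def start_order(graph: Dict[str, Set[str]]) -> List[str]:
    """Return a start order with dependencies first; raises CycleError on cycles."""
    return list(TopologicalSorter(graph).static_order())


def without_edge(graph: Dict[str, Set[str]], node: str, dep: str) -> Dict[str, Set[str]]:
    """Copy `graph` with the edge node -> dep removed."""
    return {n: (deps - {dep} if n == node else set(deps)) for n, deps in graph.items()}


if __name__ == "__main__":
    try:
        start_order(DEPENDS_ON)
    except CycleError as exc:
        print("cycle:", exc.args[1])
    # Assumption: the API only enqueues jobs in Redis, so workers need not start first.
    print(start_order(without_edge(DEPENDS_ON, "API Service", "Worker pool")))
```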