The 100 packages, SDKs, CLIs, and runtimes you actually install when you build AI agents in May 2026, ranked head-to-head on a single board.
LangGraph alone shipped 43,268,641 monthly PyPI downloads in April 2026. That is one package, in one month, on one runtime. Multiply that across the framework layer (LangChain, CrewAI, AutoGen, Pydantic AI, smolagents, LlamaIndex, DSPy, Haystack), the agent SDKs (OpenAI Agents, Claude Agent SDK, Google ADK), the coding agents (Claude Code, Codex CLI, Aider, Continue), the browser stacks (Playwright, browser-use, Stagehand), the local LLM runtimes (vLLM, Ollama, llama.cpp, MLX), the sandboxes (E2B, Daytona, Modal, Docker), the memory systems (Mem0, Zep, Letta), the eval and observability tooling (Langfuse, Inspect, Promptfoo, OpenLLMetry), the vector indexes (Chroma, Qdrant, LanceDB, Weaviate, pgvector), and the package managers and shell tools that hold the whole thing together (uv, pnpm, Bun, ripgrep, ast-grep, tree-sitter), and you get a picture of how much of the modern agent stack is shipped as something you pip install or npm i rather than something you call over HTTPS.
This guide is the install side of the agent stack. Hosted APIs, MCP servers, and SaaS connectors are covered in our companion guides. Here we focus on packages, SDKs, CLIs, runtimes, libraries, and binaries: the things that actually land in your requirements.txt, your package.json, your Docker image, your ~/.local/bin, your node_modules, your global CLI path. We rank the top 100 installs for AI agents on a single global board, scored against the criteria a builder cares about in May 2026: agent-readiness, install friction, production reliability, ecosystem pull, and cost-to-run.
The list deliberately weights coding agents heavily, because that is where the install density is highest in 2026 and where most of the structural innovation in tooling is happening. But it also covers what you need for browser/RPA agents, voice agents, research agents, customer-support agents, and horizontal workforce agents like the ones we build inside O-mega.ai on top of Suprsonic. When the same package serves multiple archetypes (and most do), we say so explicitly.
Written by Yuma Heymans (@yumahey), founder of O-mega.ai, who spent the last year shipping autonomous coding agents, browser agents, and outbound-research agents on top of roughly 60 of the 100 packages on this list.
Contents
- The Master Table: All 100 Installs Ranked
- Scoring Criteria and Weights
- The Coding Agent Stack (Where Half the Installs Live)
- Agent Frameworks: LangGraph, CrewAI, AutoGen, Pydantic AI, smolagents
- First-Party Agent SDKs: OpenAI, Anthropic, Google
- Coding Agent CLIs: Claude Code, Codex, Aider, Continue, Cursor
- Browser and Computer-Use Installs
- Local Inference Runtimes: vLLM, Ollama, llama.cpp, MLX
- Sandboxes and Code Execution: E2B, Daytona, Modal, Docker
- Memory Systems: Mem0, Zep, Letta, Supermemory
- Retrieval and Vector Indexes
- Evals and Observability: Langfuse, Inspect, Promptfoo, OpenLLMetry
- Durable Workflow Runtimes: Temporal, Inngest, Restate, Trigger.dev
- Voice Agents: LiveKit Agents, Pipecat
- Structured Output: Instructor, Outlines, LiteLLM
- Code-Search Primitives: ripgrep, ast-grep, tree-sitter, fd
- Package Managers and Runtimes: uv, pnpm, Bun, Node, Python
- How Suprsonic Fits Alongside These Installs
- Picking the 10 Installs You Actually Need
- What Will Move on This List by November 2026
1. The Master Table: All 100 Installs Ranked
This is one unified ranking of 100 installable packages, SDKs, CLIs, runtimes, libraries, and binaries that AI-agent builders reach for in May 2026. The board is sorted globally by Final Score, highest first. The Category column tells you what role the install plays so you can compare like with like, but the rank itself is global. A code-search binary at #14 is a more important install for an agent stack than a hosted-browser SDK at #85, and that is the point of putting them on the same board.
The five criteria, with weights: Agent-Readiness (30%), Install Friction (15%), Reliability at Production Scale (25%), Ecosystem Pull (20%), Cost-to-Run (10%). Each install is scored 0-10 per criterion. Each cell contains the score and a one-line justification with the actual data point. The Final Score is the weighted average, rounded to one decimal. Section 2 expands on what each criterion means and why it carries the weight it does.
| # | Install | Category | Lang/Runtime | Agent-Ready (30%) | Install Friction (15%) | Reliability (25%) | Ecosystem (20%) | Cost (10%) | Final |
|---|---|---|---|---|---|---|---|---|---|
| 1 | FastAPI | Web Framework | Python | 10 - Pydantic-native, OpenAPI, default for agent control planes | 10 - pip install fastapi | 10 - very widely deployed | 10 - default Python agent server framework | 10 - free OSS | 10.0 |
| 2 | Pydantic | Schema/Validation | Python | 10 - tool schemas, structured output, every framework uses it | 10 - pip install pydantic | 10 - Pydantic Inc, used in OpenAI/Anthropic SDK | 10 - 230M+ monthly dl | 10 - free OSS | 10.0 |
| 3 | ripgrep | Code Search Binary | Rust binary | 10 - the search tool every coding agent shells out to | 10 - brew install rg / cargo / native | 10 - BurntSushi, Rust core | 10 - shipped inside VS Code, Cursor, Claude Code | 10 - free OSS | 10.0 |
| 4 | tree-sitter | Parser Library | C + bindings | 10 - the AST every code agent uses | 10 - pip install tree-sitter + grammars | 10 - GitHub, universal | 10 - underpins ast-grep, Neovim, many editors | 10 - free OSS | 10.0 |
| 5 | Zod | Schema/Validation | TypeScript | 10 - tool schemas for TS agents | 10 - npm i zod | 10 - de facto TS schema lib | 10 - 30M+ weekly dl | 10 - free OSS | 10.0 |
| 6 | Anthropic SDK | LLM Client SDK | Python + TS + Go + Java | 10 - tools, prompt caching, computer use, batch | 10 - pip install anthropic | 10 - Anthropic-maintained, 99.9% uptime SLA | 10 - Claude Sonnet/Opus 4.x flagship clients | 8 - usage paid, SDK free | 9.7 |
| 7 | GitHub CLI (gh) | DevOps Binary | Go binary | 9 - GitHub interface every coding agent shells out to | 10 - brew install gh | 10 - GitHub | 10 - universal | 10 - free | 9.7 |
| 8 | jq | JSON Tool | C binary | 9 - JSON tool every agent shells out to | 10 - universal install | 10 - venerable | 10 - default JSON wrangler | 10 - free OSS | 9.7 |
| 9 | MCP Python SDK | Protocol SDK | Python | 10 - reference impl of the standard | 9 - pip install mcp | 9 - Linux Foundation governed (AAIF) since Dec 2025 | 10 - 110M monthly SDK dl across MCP ecosystem | 10 - free OSS | 9.7 |
| 10 | MCP TypeScript SDK | Protocol SDK | TypeScript | 10 - reference impl, used by Claude Desktop, Cursor, Copilot | 9 - npm i @modelcontextprotocol/sdk | 9 - same governance | 10 - 300+ MCP clients, 21K+ servers | 10 - free OSS | 9.7 |
| 11 | OpenAI Python SDK | LLM Client SDK | Python | 10 - Responses API, Realtime, Assistants | 10 - pip install openai | 10 - OpenAI maintained, scale-tested | 10 - largest install base in agent space | 8 - usage paid, SDK free | 9.7 |
| 12 | OpenAI TS SDK | LLM Client SDK | TypeScript | 10 - same surface as Python | 10 - npm i openai | 10 - same | 10 - same | 8 - same | 9.7 |
| 13 | Vercel AI SDK | LLM Helper SDK | TypeScript | 10 - tool calling, streaming, generateObject | 10 - npm i ai | 9 - Vercel | 10 - default TS LLM SDK | 10 - free OSS | 9.7 |
| 14 | ast-grep | Code Search/Edit | Rust binary | 10 - structural search via tree-sitter, MCP server | 10 - brew install ast-grep | 9 - mature, 7K+ stars | 9 - default structural tool for coding agents | 10 - free OSS | 9.6 |
| 15 | Hugging Face Hub | Model Distribution | Python | 9 - 1M+ models, datasets, Inference Endpoints | 10 - pip install huggingface_hub | 10 - HF, universal | 10 - the registry for OSS models | 10 - free OSS | 9.6 |
| 16 | Instructor | Structured Output | Python + TS + Go + Ruby | 10 - Pydantic-validated outputs, 100+ providers | 10 - pip install instructor | 9 - active maintenance, Jan 2026 release | 9 - default structured-output lib | 10 - free OSS | 9.6 |
| 17 | LangGraph | Agent Framework | Python + TS | 10 - graph runtime, MCP, structured tools, native interrupts | 9 - pip install langgraph, no native deps | 9 - 43.2M monthly PyPI dl, used at JPMC, LinkedIn | 10 - 100K+ stars across LangChain repos | 10 - free OSS, optional cloud | 9.6 |
| 18 | Langfuse | Observability | Python + TS + self-host | 10 - traces, evals, prompt mgmt, OTel | 9 - SDK + Docker / Cloud | 9 - Langfuse Inc, OSS | 10 - default OSS LLM observability | 10 - free OSS, paid cloud | 9.6 |
| 19 | LiteLLM | LLM Router/Gateway | Python | 10 - 100+ providers behind one API, proxy mode | 10 - pip install litellm | 9 - widely deployed proxy | 9 - default for multi-provider agents | 10 - free OSS, paid cloud | 9.6 |
| 20 | Ollama | Local LLM Runtime | Go binary | 9 - "Docker of local LLMs", MCP client, function calling | 10 - one curl install, no GPU required | 10 - millions of users, very stable | 10 - default for "run a model locally" | 10 - free OSS | 9.6 |
| 21 | pnpm | JS Package Manager | Node | 9 - 65.5M weekly dl, monorepo-friendly for agent stacks | 10 - npm i -g pnpm | 10 - very stable | 10 - default for serious TS agent monorepos | 10 - free OSS | 9.6 |
| 22 | fd | File Search Binary | Rust binary | 9 - default file find for coding agents | 10 - brew install fd | 10 - sharkdp | 9 - shipped in many AI dev environments | 10 - free OSS | 9.5 |
| 23 | httpx | HTTP Client | Python | 9 - async + sync, HTTP/2, timeouts | 10 - pip install httpx | 10 - Encode foundation, 200M+ monthly dl | 9 - default async client across most agent frameworks | 10 - free OSS | 9.5 |
| 24 | llama.cpp | Local LLM Runtime | C++ binary | 9 - GGUF, MCP client merged Mar 2026, structured output | 9 - brew install llama.cpp or build | 10 - ggerganov, b8200+ builds | 10 - underpins LM Studio, Ollama, etc | 10 - free OSS | 9.5 |
| 25 | OpenTelemetry SDK | Observability | Python + TS + Go + .NET | 9 - OTel GenAI semantic conventions GA | 9 - pip install opentelemetry-sdk | 10 - CNCF, universal | 10 - the substrate every observability stack uses | 10 - free OSS | 9.5 |
| 26 | AnyIO | Async Library | Python | 9 - portable async primitives behind FastAPI/httpx | 10 - pip install anyio | 10 - mature | 9 - underpins much async-agent code | 10 - free OSS | 9.4 |
| 27 | Composio SDK | Tool Platform SDK | Python + TS | 10 - 850+ tools, managed OAuth, MCP | 10 - pip install composio | 8 - Composio, OSS core | 9 - default integration platform for agents | 9 - free 200K calls/mo, $29/mo | 9.4 |
| 28 | Inspect AI | Eval Framework | Python | 10 - UK AISI grade, model graders, sandboxing | 10 - pip install inspect-ai | 9 - UK AI Security Institute | 8 - default for serious safety/agent evals | 10 - free OSS | 9.4 |
| 29 | Inspect Evals | Eval Suites | Python | 10 - SWE-bench, Agentic, etc out of box | 10 - pip install inspect-evals | 9 - UK AISI | 8 - rising eval standard | 10 - free OSS | 9.4 |
| 30 | LlamaIndex | RAG/Agent Framework | Python + TS | 9 - agents, query engines, structured output | 10 - pip install llama-index | 9 - LlamaIndex Inc, scale-tested | 10 - default RAG framework, 38K+ stars | 10 - free OSS | 9.4 |
| 31 | Mem0 | Agent Memory | Python + TS | 10 - universal memory, OpenMemory MCP server | 10 - pip install mem0ai | 8 - 48K+ stars, Mem0 Inc | 9 - default chosen for personalization in 2026 | 10 - free OSS, paid cloud | 9.4 |
| 32 | smolagents | Agent Framework | Python | 10 - code-first agents, ~1K LOC core, Hub native | 10 - pip install smolagents | 8 - 26K+ stars, Hugging Face maintained | 9 - fastest-growing OSS framework in 2026 | 10 - free OSS | 9.4 |
| 33 | Tenacity | Retry Library | Python | 9 - retry/backoff lib agents use around httpx | 10 - pip install tenacity | 10 - mature | 9 - default retry helper | 10 - free OSS | 9.4 |
| 34 | Uvicorn | ASGI Server | Python | 9 - paired with FastAPI by default | 10 - pip install uvicorn | 10 - mature | 9 - default FastAPI runner | 10 - free OSS | 9.4 |
| 35 | vLLM | Local LLM Runtime | Python + CUDA | 9 - PagedAttention, FlashAttention 4, Anthropic-compat in v0.17 | 7 - GPU + CUDA, vllm-mlx for Apple | 10 - production at every major lab + serving company | 10 - default high-throughput inference engine | 10 - free OSS | 9.4 |
| 36 | Playwright | Browser Automation | Python + TS | 9 - Test Agents, MCP server, accessibility tree | 8 - pip install playwright && playwright install | 10 - Microsoft, used by every browser-agent stack | 10 - de facto browser automation standard | 10 - free OSS | 9.3 |
| 37 | Qdrant Client | Vector DB Client | Python + TS + Rust | 9 - rich filtering, named vectors, hybrid search | 10 - pip install qdrant-client | 9 - Qdrant in Rust, production-grade | 9 - top-3 OSS vector DB | 10 - free OSS, cloud | 9.3 |
| 38 | Weaviate Client | Vector DB Client | Python + TS + Go | 9 - hybrid search, modules, generative | 10 - pip install weaviate-client | 9 - Weaviate B.V., scale | 9 - top-3 OSS vector DB | 10 - free OSS, cloud | 9.3 |
| 39 | browser-use | Browser Agent | Python | 10 - browser-first, accessibility-tree input, multi-tab | 9 - pip install browser-use + Playwright | 8 - 50K+ stars, used in many demos | 9 - reference design for OSS browser agents | 10 - free OSS | 9.2 |
| 40 | Docker | Sandbox/Container | binary | 8 - first-class in OpenAI Agents SDK harness | 9 - install Docker Desktop or engine | 10 - universal | 10 - every team has it | 10 - free for most uses | 9.2 |
| 41 | Docker Compose | Container Orchestration | Python/Go | 8 - standard way to run agents + sidecars locally | 10 - bundled with Docker | 10 - Docker | 10 - universal | 10 - free | 9.2 |
| 42 | Hono | Web Framework | TypeScript | 9 - edge/Bun/Cloudflare-native agent surfaces | 10 - npm i hono | 9 - Hono team | 9 - rising fast for TS agents | 10 - free OSS | 9.2 |
| 43 | LangChain Core | Agent Framework | Python + TS | 9 - tools, runnables, MCP adapters | 10 - pip install langchain | 8 - very fast iteration sometimes breaks API | 10 - 100K+ stars, biggest ecosystem | 10 - free OSS | 9.2 |
| 44 | LiveKit Agents | Voice Agent | Python | 10 - WebRTC voice agents, plugin model | 9 - pip install "livekit-agents[openai,silero,deepgram,cartesia]" | 9 - LiveKit, used by ElevenLabs, OpenAI Voice | 9 - default OSS voice agent stack | 8 - infra paid | 9.2 |
| 45 | Llama 4 | Open-Weight Model | Python | 9 - Meta agentic OSS line | 8 - HF download | 9 - Meta | 10 - largest open-weight install base | 10 - free | 9.2 |
| 46 | pgvector | Vector Index | Postgres extension | 8 - SQL-native, just Postgres | 9 - CREATE EXTENSION vector | 10 - Postgres, battle-tested | 10 - default vector layer at most companies | 10 - free OSS | 9.2 |
| 47 | Suprsonic SDK | Capability API SDK | Python + TS | 10 - one key unlocks 17+ capabilities, Agent API | 10 - pip install suprsonic / npm i suprsonic | 8 - production at O-mega, hardening | 8 - rising in horizontal-agent stacks | 9 - credit-billed, free tier | 9.2 |
| 48 | uv | Python Package Manager | Rust binary | 9 - sub-second resolves unlock per-agent venvs | 10 - one curl install, zero deps | 9 - Astral, used by Anthropic, OpenAI, Pydantic team | 9 - replacing pip across new agent repos in 2026 | 10 - free OSS | 9.2 |
| 49 | DeepSeek-V4 / R-series | Open-Weight Model | Python | 9 - reasoning + coding, low-cost API | 9 - HF or API SDK | 9 - DeepSeek | 9 - default cheap-frontier | 10 - free OSS | 9.1 |
| 50 | Promptfoo | Eval CLI | Node | 9 - red-teaming, regression evals, CI-native | 10 - npx promptfoo | 9 - Promptfoo Inc | 8 - default OSS eval CLI | 10 - free OSS | 9.1 |
| 51 | Qwen 3.5 (Qwen-Agent) | Open-Weight Model + Agent | Python | 9 - native function calling, top OSS coding | 9 - HF + Qwen-Agent pkg | 9 - Alibaba | 9 - rising fast in 2026 | 10 - free | 9.1 |
| 52 | Sentry SDK | Error Tracking | Python + TS + many | 8 - errors + AI agent context now | 10 - pip install sentry-sdk | 10 - Sentry | 10 - default errors stack | 8 - free tier + paid | 9.1 |
| 53 | Tavily SDK | Search API SDK | Python + TS | 9 - agent-shaped search + extract + crawl | 10 - pip install tavily-python | 9 - Tavily, acquired by Nebius | 9 - default agent search SDK | 9 - free 1K/mo + paid | 9.1 |
| 54 | Aider | Coding Agent CLI | Python | 9 - mature pair-programmer, multi-file edits | 10 - pip install aider-chat | 9 - one of the longest-running tools | 8 - smaller than Claude Code now | 9 - free, paid model | 9.0 |
| 55 | Browserbase SDK | Browser-Agent Infra | TS + Python | 10 - hosted browsers, Stagehand-native | 9 - npm i @browserbasehq/sdk | 9 - Browserbase Inc | 9 - one of two default hosted-browser stacks | 7 - paid SaaS | 9.0 |
| 56 | Chroma | Vector DB Client | Python + TS | 9 - embedded mode, MCP, collections | 10 - pip install chromadb | 8 - Chroma Inc, in-process default | 9 - default RAG vector DB for prototypes | 10 - free OSS, paid cloud | 9.0 |
| 57 | Claude Agent SDK | Agent SDK | Python + TS | 10 - tool-use first, agents-as-tools, file/exec tools | 9 - pip install claude-agent-sdk | 8 - renamed from Claude Code SDK in early 2026 | 9 - powers Claude Code, every Anthropic agent product | 8 - SDK free, Claude usage paid | 9.0 |
| 58 | Claude Code | Coding Agent CLI | Node 18+ / native | 10 - reference Anthropic agent loop, MCP, subagents, hooks | 9 - npm i -g @anthropic-ai/claude-code or native installer | 9 - shipped daily, used by Anthropic-internal teams | 9 - reference impl for Claude Agent SDK, plugins, skills | 7 - billed via Pro/Max + API | 9.0 |
| 59 | CrewAI | Agent Framework | Python | 9 - role-based crews, hierarchical, marketplace | 10 - pip install crewai | 8 - 46K+ stars, enterprise marketplace | 9 - default in three-framework shortlist | 10 - free OSS, paid cloud | 9.0 |
| 60 | Cursor CLI | Coding Agent CLI | binary | 10 - same agent loop as Cursor IDE | 9 - one-line curl install script | 9 - Cursor-the-company, scale | 10 - largest agent IDE install base | 6 - usage billed | 9.0 |
| 61 | E2B SDK | Sandbox SDK | Python + TS | 10 - Firecracker microVM agents, code interpreter | 9 - pip install e2b + API key | 9 - 150ms cold starts, 24h sessions | 9 - in OpenAI Agents SDK 8-sandbox set | 7 - paid SaaS, free tier | 9.0 |
| 62 | Firecrawl SDK | Scrape API SDK | Python + TS | 9 - crawl, scrape, extract, MCP | 10 - pip install firecrawl-py | 9 - Firecrawl, scale | 9 - default agent scraping SDK | 8 - paid SaaS, free tier | 9.0 |
| 63 | fzf | Fuzzy Finder | Go binary | 8 - quick selection in agent CLIs | 10 - brew install fzf | 10 - hardened | 9 - default fuzzy finder | 10 - free OSS | 9.0 |
| 64 | GitHub Copilot CLI | Coding Agent CLI | Node | 9 - agent mode in CLI + IDE | 9 - gh extension install github/gh-copilot | 10 - GitHub | 10 - 50M+ users | 7 - paid | 9.0 |
| 65 | Letta | Agent Runtime | Python | 10 - self-editing memory, archival blocks, REST API | 9 - pip install letta + server | 8 - born from MemGPT research | 8 - chosen for long-running agents | 10 - free OSS | 9.0 |
| 66 | OpenAI Agents SDK | Agent SDK | Python + TS | 10 - handoffs, guardrails, sessions, 8-sandbox harness (Apr 2026) | 9 - pip install openai-agents | 9 - replaced Swarm, batched releases, OpenAI maintained | 9 - direct path to GPT-5.5/Codex models | 8 - SDK free, model usage paid | 9.0 |
| 67 | OpenAI gpt-oss | Open-Weight Model Pkg | Python | 9 - Apache-2.0 OpenAI weights, agentic-tuned | 8 - download GGUF/safetensors | 9 - OpenAI | 9 - first-party OSS reset | 10 - free | 9.0 |
| 68 | OpenLLMetry | Observability | Python + TS | 9 - OTel instrumentation, vendor-neutral | 10 - pip install traceloop-sdk | 9 - Apache 2.0, ServiceNow now stewards | 8 - OTel-native, plugs into Datadog/Honeycomb | 10 - free OSS | 9.0 |
| 69 | Outlines | Structured Output | Python | 10 - constrained sampling, regex/grammar/JSON | 10 - pip install outlines | 8 - .txt research project, hardened | 8 - go-to when you need guarantees | 10 - free OSS | 9.0 |
| 70 | Pipecat | Voice Agent | Python | 10 - frame-based pipeline, VAD/STT/LLM/TTS | 9 - pip install pipecat-ai | 8 - Daily, hardening | 8 - alternative to LiveKit Agents | 10 - free OSS | 9.0 |
| 71 | Pydantic AI | Agent Framework | Python | 10 - typed agents, model-agnostic, structured output by default | 10 - one pip, runs on stdlib + Pydantic | 8 - 16.5K+ stars, Pydantic Inc maintained | 8 - rapid adoption among Pydantic users | 10 - free OSS | 9.0 |
| 72 | Transformers | Inference Library | Python | 8 - reference impl, agents use it less directly in prod | 9 - pip install transformers | 10 - HF, foundational | 10 - 200M+ monthly dl | 10 - free OSS | 9.0 |
| 73 | Twilio Python SDK | Telecom SDK | Python | 8 - SMS/voice for agents | 10 - pip install twilio | 10 - Twilio | 10 - default telecom SDK | 7 - usage billed | 9.0 |
| 74 | Zep | Agent Memory | Python + TS | 10 - temporal knowledge graph, +15pt over Mem0 on LongMemEval | 9 - SDK + Zep server / Cloud | 9 - production at scale | 8 - rising in enterprise stacks | 8 - paid cloud, OSS variant | 9.0 |
| 75 | Cursor (IDE) | Coding Agent IDE | Electron | 10 - chat, composer, agent mode | 9 - download installer | 9 - hardened | 10 - largest paid agent IDE | 6 - $20/mo Pro | 8.9 |
| 76 | LanceDB | Vector DB Client | Python + TS + Rust | 9 - on-disk Lance format, multimodal, embedded | 10 - pip install lancedb | 8 - LanceDB Inc | 8 - rising fast for larger-than-memory datasets | 10 - free OSS | 8.9 |
| 77 | OpenAI Evals | Eval Framework | Python | 9 - older but still used | 10 - pip install evals | 8 - OpenAI maintained | 8 - mid-tier mindshare in 2026 | 10 - free OSS | 8.9 |
| 78 | DSPy Optimizers | Prompt/Weight Optimization | Python | 9 - MIPRO/BootstrapFinetune | 10 - in pip install dspy | 8 - active research | 8 - default optimizer suite | 10 - free OSS | 8.9 |
| 79 | Arcade SDK | Auth-First Tools | Python + TS | 10 - just-in-time OAuth, secure agent auth | 9 - pip install arcade-ai | 8 - Arcade, OSS core | 7 - smaller catalog | 9 - generous free | 8.8 |
| 80 | Datadog Agent + Tracer | APM | binary + SDK | 9 - LLM Observability product | 8 - install Datadog Agent | 10 - Datadog | 10 - dominant enterprise APM | 5 - expensive | 8.8 |
| 81 | DSPy | Agent Framework | Python | 9 - programming not prompting, optimizers, MIPRO | 10 - pip install dspy | 8 - Stanford NLP, 19K+ stars | 8 - mindshare among researchers and serious builders | 10 - free OSS | 8.8 |
| 82 | Modal SDK | Sandbox + Infra SDK | Python | 9 - sandboxes, GPU jobs, agents, gVisor | 9 - pip install modal + auth | 9 - Modal Labs, Python-ML scale | 9 - large Python-native install base | 7 - paid SaaS, free tier | 8.8 |
| 83 | PostHog SDK | Product Analytics | Python + TS + many | 8 - LLM analytics, replays | 10 - one pip / npm | 9 - PostHog, OSS option | 9 - rising in agent product analytics | 9 - generous free | 8.8 |
| 84 | PyTorch | ML Framework | Python | 8 - underneath everything | 8 - pip install torch (2GB+) | 10 - Linux Foundation | 10 - universal | 10 - free OSS | 8.8 |
| 85 | Anchor Browser SDK | Browser-Agent Infra | Python + TS | 10 - hosted Chromium for agents, persist + proxy | 9 - pip install anchor-browser + key | 8 - Anchor, used inside O-mega | 8 - rising browser-agent infra | 7 - paid SaaS | 8.7 |
| 86 | AutoGen | Agent Framework | Python + .NET | 8 - mature multi-agent, AutoGen 2 redesign | 9 - pip install autogen-agentchat | 9 - Microsoft, 36K+ stars | 9 - large enterprise install base | 10 - free OSS | 8.7 |
| 87 | Continue.dev | Coding Agent IDE | VS Code + JetBrains | 9 - in-editor agent, MCP, custom models | 9 - install from marketplace | 8 - Continue Inc | 8 - alternative to Cursor for OSS-leaning teams | 10 - free OSS | 8.7 |
| 88 | Daytona SDK | Sandbox SDK | Python + TS | 10 - sub-90ms cold starts, AI-first pivot | 9 - pip install daytona-sdk + API key | 8 - youngest of the three, hardening | 8 - in OpenAI Agents SDK 8-sandbox set | 8 - paid SaaS, free tier | 8.7 |
| 89 | Inngest SDK | Durable Workflow | TypeScript + Python + Go | 9 - event-driven, Temporal-compat workflows Feb 2026 | 9 - npm i inngest + cloud | 9 - production-ready | 8 - rising in serverless agents | 8 - paid cloud, OSS | 8.7 |
| 90 | MLX | Apple-Silicon ML | Python + Swift | 8 - Apple-Silicon native, vllm-mlx wrapper | 10 - pip install mlx-lm, M-series only | 9 - Apple maintained, mature in 2026 | 8 - dominant on Mac | 10 - free OSS | 8.7 |
| 91 | OpenAI Codex CLI | Coding Agent CLI | Rust binary | 10 - GPT-5.5 native, sandbox proxy, MCP | 9 - npm i -g @openai/codex or native | 8 - GA in 2025, hardened through 2026 | 8 - tied to OpenAI account | 7 - usage billed | 8.7 |
| 92 | Temporal SDK | Durable Workflow | Python + TS + Go + Java | 9 - durable agents, $5B valuation Feb 2026 | 8 - SDK + Temporal cluster | 10 - 9.1T lifetime executions on cloud | 9 - enterprise default for durable | 7 - cloud paid, OSS | 8.7 |
| 93 | CUDA Toolkit | GPU Runtime | binary | 8 - serving stack dep | 6 - heavy install, version traps | 10 - NVIDIA | 10 - dominant | 10 - free | 8.6 |
| 94 | Stagehand | Browser Agent | TypeScript | 9 - high-level act/observe/extract on top of Playwright | 9 - npm i @browserbasehq/stagehand | 8 - Browserbase | 8 - rising fast in TS-first stacks | 9 - free OSS, paid runtime | 8.6 |
| 95 | Trigger.dev SDK | Durable Workflow | TypeScript | 9 - serverless durable jobs, AI-native | 10 - npm i @trigger.dev/sdk | 8 - Trigger.dev v3 hardened | 8 - default TS background jobs | 8 - paid cloud, OSS | 8.6 |
| 96 | Windsurf | Coding Agent IDE | Electron | 10 - Cascade agent, deep edit | 9 - download installer | 8 - formerly Codeium, acquired by Cognition | 8 - second-largest paid agent IDE | 7 - tiered | 8.5 |
| 97 | Codex (OpenAI) IDE | Coding Agent IDE | hosted | 10 - GPT-5.5 native, multi-task | 9 - chatgpt.com/codex | 8 - GA 2025, evolving | 8 - large but newer | 6 - billed | 8.4 |
| 98 | Hyperbrowser SDK | Browser-Agent Infra | TS + Python | 9 - alt hosted browser, proxy + stealth | 9 - npm i @hyperbrowser/sdk | 8 - newer | 7 - growing | 8 - paid SaaS | 8.4 |
| 99 | Nango SDK | Integration SDK | TypeScript + Python | 9 - 700+ APIs, OSS, MCP server | 9 - npm i nango + Nango cluster | 8 - mature | 8 - widely used | 7 - $50+/mo | 8.4 |
| 100 | Restate SDK | Durable Workflow | TypeScript + Python + Java | 9 - sidecar durable execution | 9 - SDK + Restate binary | 8 - commercial launch Mar 2026 | 7 - newer, growing | 9 - open source core | 8.4 |
There are a few things worth noticing about this board. The very top is dominated not by frameworks but by substrate: schema libraries (Pydantic, Zod), parser libraries (tree-sitter), shell binaries (ripgrep, jq, GitHub CLI), and HTTP/server primitives (httpx, FastAPI). These score perfectly because they fail almost no criterion. They are free, they install in one command, they have years of production hardening, every framework imports them, and they are exactly what agents shell out to or wrap. The second band is the agent surface: LangGraph, Claude Code, the OpenAI Agents SDK, Pydantic AI, Claude Agent SDK, Playwright. These are where the agent loop actually lives. The third band is runtime and infra: vLLM, Ollama, llama.cpp, E2B, Daytona, Modal, Temporal, Inngest. None of this would matter without the substrate beneath it, which is why we ranked them on a single board rather than splitting tiers into separate tables.
2. Scoring Criteria and Weights
The five criteria below are not generic. They were chosen specifically for installable AI-agent tooling, where the relevant questions are not "what API does it expose" but "what does it actually take to land this in your stack and run it for a year." Each criterion is followed by why it carries the weight it does, and what a 0 and a 10 actually look like.
Agent-Readiness (30%, the heaviest weight) measures whether the install was designed for the way agents work. Concretely: native function-calling, Pydantic/Zod-shaped tool schemas, MCP support, async-first APIs, structured-output guarantees, streaming, sane interrupt and cancellation semantics, and parallel-safety. A package that scores 10 here is something an agent loop can use directly without wrapping; a package that scores 0 either fights the agent loop or has no model-aware surface at all. This carries the heaviest weight because the entire point of the install is to be used by agents. Being free, fast to install, and broadly used does not matter if the surface is hostile to the agent loop.
Install Friction (15%) measures how painful it is to actually land the package on a machine. One-line installs (pip install x, npm i x, brew install x, curl ... | sh) score high. Heavy native dependencies, GPU/CUDA requirements, multi-step manual setup, finicky version pinning, and operating-system traps drag the score down. This carries 15% because in practice it is one of the strongest predictors of whether a builder actually adopts the install or quietly switches to something easier. The agent ecosystem of 2026 is friction-allergic: if you are not a one-line install, you are competing against five things that are.
Reliability at Production Scale (25%) measures whether the install survives running in production for a year. Stable APIs across versions, real users at scale, active maintenance, sane release cadence, low rate of breaking changes, and a real failure-recovery story. A 10 here is something you can pin to a major version and forget; a 0 here is something that breaks every month. This carries the second-heaviest weight because every other criterion gets erased if the install is unstable at scale. We have all watched a 50K-star OSS framework from 2024 evaporate or fork its way into incompatibility, and that is the risk this column is pricing in.
Ecosystem Pull (20%) measures the gravitational mass of the install. GitHub stars, weekly/monthly downloads, framework neighbors, community velocity, integrations, and how often the install shows up as a dependency of other installs on this list. Tree-sitter is a textbook example: it is not flashy, but it is so deep in the substrate (ast-grep, Neovim, GitHub Code Search, Cursor, Claude Code, every coding agent worth shipping) that its ecosystem pull is effectively maximum. Lower-pull installs may be excellent technically but harder to hire for, harder to find tutorials for, and harder to debug when you hit an edge case at 2 AM.
Cost-to-Run (10%, the lightest weight) measures the actual cost over a realistic year of agent operation. Free OSS that runs anywhere scores 10. Paid SaaS with a free tier and reasonable per-unit pricing scores 7-9. Paid SaaS with a $750/month minimum or per-seat enterprise pricing scores 3-5. This carries the lightest weight because most installable tooling in 2026 is free or near-free at the install layer, and the dominant cost is model usage on hosted APIs (which is not what this article is about). For sandboxes, observability SaaS, and durable workflow clouds, cost-to-run does matter, and the weighting captures that without letting it dominate.
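For readers who want to sanity-check a row, the final-score arithmetic is small enough to show in full. A minimal sketch, using LangGraph's row from the master table as the worked example:

```python
# Weighted final score as defined above: five criteria, weights summing to 1.0.
WEIGHTS = {
    "agent_ready": 0.30,
    "install_friction": 0.15,
    "reliability": 0.25,
    "ecosystem": 0.20,
    "cost": 0.10,
}

def final_score(scores: dict[str, float]) -> float:
    """Weighted average of the five 0-10 criterion scores, rounded to one decimal."""
    return round(sum(WEIGHTS[k] * scores[k] for k in WEIGHTS), 1)

# LangGraph's row from the master table: 10 / 9 / 9 / 10 / 10 -> 9.6
print(final_score({
    "agent_ready": 10, "install_friction": 9,
    "reliability": 9, "ecosystem": 10, "cost": 10,
}))
```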
The weights are deliberately top-heavy on agent-readiness and reliability because in practice those are what determine whether a project ships. A free, easy-to-install package that fights the agent loop is worse than a paid, harder-to-install package that hands you the right primitives. The board reflects that bias, and so do the rankings: the substrate that scores a straight 10 on agent-readiness and reliability is what surfaces at the top.
3. The Coding Agent Stack (Where Half the Installs Live)
Coding agents are the densest install category in 2026. Roughly 40 of the 100 installs above are either coding-agent specific or are most often installed for a coding-agent purpose even if they are general-purpose tools. That is not an accident. Coding agents are the most economically successful agent archetype today, with paid IDE-and-CLI products (Cursor, Claude Code, Copilot, Codex, Windsurf, Continue) generating real monetised usage and pulling a large ecosystem of supporting installs along with them. We covered the broader market in our Claude Code pricing guide, and the structural reason coding agents work better than other archetypes is that the test-and-feedback loop is genuinely tight: the code either compiles or it does not, the test either passes or it does not, the diff either lands or it does not. That tightness rewards iteration, and iteration is exactly what an agent loop does well.
The coding-agent install stack has three strata. The top stratum is the agent product itself: a CLI or IDE that wraps a model and exposes an agent loop. Claude Code is the reference implementation here, with the broadest set of agent primitives (subagents, hooks, MCP, skills) that we have seen ship in a coding agent. OpenAI Codex CLI is the GPT-5.5-native equivalent, rebuilt in Rust, with sandbox proxy networking and ChatGPT device-code auth - OpenAI Codex changelog. Cursor CLI, Aider, Continue.dev, GitHub Copilot CLI, and Windsurf round out the band. None of these is "right." They all wrap the same agent loop pattern around a different surface (terminal vs IDE) and a different default model.
The middle stratum is the shell tooling the agent shells out to. Coding agents do not write file system code from scratch; they call binaries. ripgrep for search, fd for file find, jq for JSON, fzf for fuzzy selection, GitHub CLI for repo and PR work, ast-grep for structural search and rewrite, tree-sitter as the parser library underneath ast-grep and most code-aware tools. ast-grep specifically has emerged as the install that closes the largest capability gap in 2026: an AI assistant limited to regex-based search and replace is genuinely more limited than one that can write structural patterns over an AST - batsov.com. ast-grep also exposes itself as an MCP server, which means an agent that has both a shell and an MCP client can use it through whichever surface it prefers. In practice, the agents that handle large multi-file refactors well are the ones whose operators have installed ast-grep alongside ripgrep and fd, not just ripgrep alone.
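To make the shell-out pattern concrete, here is a minimal sketch of an agent tool that wraps both binaries. Flag names are taken from the current ripgrep and ast-grep docs, so verify them against your installed versions:

```python
import json
import subprocess

def rg_search(pattern: str, path: str = ".") -> list[dict]:
    """Shell out to ripgrep the way a coding agent does; --json emits one event per line."""
    # ripgrep exits non-zero when there are no matches, so we don't check the return code.
    out = subprocess.run(
        ["rg", "--json", pattern, path],
        capture_output=True, text=True,
    )
    return [json.loads(line) for line in out.stdout.splitlines() if line]

def ast_grep_search(pattern: str, lang: str, path: str = ".") -> list[dict]:
    """Structural search over the AST; --json makes ast-grep emit a JSON array of matches."""
    out = subprocess.run(
        ["ast-grep", "run", "--pattern", pattern, "--lang", lang, "--json", path],
        capture_output=True, text=True,
    )
    return json.loads(out.stdout or "[]")

# The regex finds literal text; the structural pattern finds every call site
# regardless of formatting or line breaks.
print(len(rg_search(r"requests\.get", "src")))
print(len(ast_grep_search("requests.get($URL)", "python", "src")))
```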
The bottom stratum is the language-runtime and package-manager substrate. uv has eaten the Python install layer at most new agent-stack repos, replacing pip with sub-second resolves that make per-agent virtualenvs cheap to spin up and discard. pnpm has become the default for serious TypeScript agent monorepos because of disk-efficient hard-linking and faster cold installs than npm or Yarn - pnpm vs Bun benchmarks. Bun is faster on cold installs but ties you to its runtime; for an agent stack that wants Node compatibility, pnpm 10 is the safer default. The package manager looks like a low-stakes choice until you are running 50 ephemeral sandboxes per minute and the per-resolve overhead becomes the bottleneck. Then it is the most important install in the stack.
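What "per-agent virtualenvs become cheap" looks like in practice is roughly the sketch below. The throwaway-directory pattern is our illustration, not a uv feature; uv venv and uv pip install are real subcommands, and the POSIX python path is assumed:

```python
import subprocess
import tempfile
from pathlib import Path

def ephemeral_env(packages: list[str]) -> Path:
    """Create a throwaway per-agent virtualenv; uv's sub-second resolves make this
    cheap enough to do once per sandbox rather than once per project."""
    env_dir = Path(tempfile.mkdtemp(prefix="agent-env-"))
    subprocess.run(["uv", "venv", str(env_dir / ".venv")], check=True)
    subprocess.run(
        ["uv", "pip", "install",
         "--python", str(env_dir / ".venv" / "bin" / "python"),  # POSIX layout assumed
         *packages],
        check=True,
    )
    return env_dir

env = ephemeral_env(["httpx", "pydantic"])
```

When the environment does not need to outlive the script, uv run --with collapses the same pattern into a single command.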
The other thing to notice about the coding-agent strata is that almost none of these top-stratum products are built from scratch. They lean on OpenAI/Anthropic/Google SDKs at the model layer, on Pydantic/Zod at the schema layer, on httpx/fetch at the transport layer, on tree-sitter at the parsing layer, and on ripgrep/fd/jq at the shell layer. The CLI is mostly orchestration. Once you understand that, the install list reads less like "100 unrelated packages" and more like a layered cake, where each install solves a specific layer and the agent product is the icing.
4. Agent Frameworks: LangGraph, CrewAI, AutoGen, Pydantic AI, smolagents
The framework layer is where most builders make their first significant install decision, and in 2026 the choice has narrowed but not collapsed. LangGraph sits at the top of this list because it has the broadest production install base, the deepest set of agent primitives (state machines, interrupts, time travel, persistent checkpoints), and the largest neighboring ecosystem (LangChain, LangSmith, MCP adapters, eval tooling) - LangGraph PyPI. Its 43.2M monthly PyPI downloads are the single largest install number in this entire guide outside of generic primitives. That number tells you nothing about whether LangGraph is the right choice for your specific agent, but it does tell you that the path is well-trodden, the issue tracker has answers, and the people you might hire have probably touched it.
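A minimal LangGraph sketch, showing the graph-runtime shape those download numbers are buying into; the node bodies here are stubs where a real agent would call a model:

```python
from typing import TypedDict
from langgraph.graph import StateGraph, START, END

class State(TypedDict):
    query: str
    answer: str

def plan(state: State) -> dict:
    # A real node would call a model or a tool here; this just threads state through.
    return {"answer": f"plan for: {state['query']}"}

def act(state: State) -> dict:
    return {"answer": state["answer"] + " -> done"}

builder = StateGraph(State)
builder.add_node("plan", plan)
builder.add_node("act", act)
builder.add_edge(START, "plan")
builder.add_edge("plan", "act")
builder.add_edge("act", END)
graph = builder.compile()

print(graph.invoke({"query": "refactor the auth module", "answer": ""}))
```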
CrewAI sits in the next band with 46K+ stars and a marketplace push that has made it the default choice for "multiple specialised agents collaborate on a task" patterns - DecisionCrafters. Its abstractions (crews, agents, tasks, processes) are easier to reason about for non-engineers than LangGraph's graph runtime, which has made it popular in product-engineering teams that include non-Python-natives. The trade-off is less control over the agent loop's internals; you get less rope and less rope-burn.
AutoGen has the longest production track record and the deepest Microsoft-shop integration. AutoGen 2 redesigned the framework around a more robust messaging substrate, and its 36K+ stars reflect a hard core of enterprise installations. Pydantic AI, with 16.5K+ stars, is the youngest of this set but the cleanest in design: typed agents, model-agnostic, structured output is the default rather than a plugin, and it inherits Pydantic's reputation for schemas that do not rot under refactoring. smolagents, also from this generation, is the code-first outlier: instead of tool calling, agents write code in a sandboxed Python interpreter and the framework runs it - DecisionCrafters smolagents. Its 1K-LOC core means you can read the entire framework in an afternoon, which is something you cannot say about LangGraph or AutoGen.
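Pydantic AI's design point is easiest to see in code. A minimal sketch; note that the exact keyword and attribute names (output_type, .output) have shifted across releases, so check the version you install:

```python
from pydantic import BaseModel
from pydantic_ai import Agent

class CompanyFacts(BaseModel):
    name: str
    employee_count: int
    is_hiring: bool

# Structured output is the default posture: the agent's result is a validated model,
# not a string you parse afterwards. (output_type was result_type in early releases.)
agent = Agent("openai:gpt-4o", output_type=CompanyFacts)

result = agent.run_sync("Summarise what you know about Acme Corp.")
facts: CompanyFacts = result.output  # already validated by Pydantic
print(facts.employee_count)
```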
The right way to choose is not "which is best" but "which constraint dominates." If your team is Python-native and likes typed surfaces, Pydantic AI. If you are doing graph-shaped multi-agent work with checkpoints, LangGraph. If you are doing role-based collaboration with a non-engineer-friendly mental model, CrewAI. If you are deeply Microsoft-oriented, AutoGen. If you want the smallest surface possible and trust the model to write code, smolagents. Suprsonic sits beside any of these as the capability layer, not as a competitor: every framework above can call Suprsonic via REST or via the MCP server, and most of our internal agents at O-mega use Pydantic AI or LangGraph wrapping Suprsonic capabilities for search, scrape, enrichment, and generation - the internal-tooling architecture covers this pattern in more depth.
What people miss when they pick a framework on stars alone is that the agent SDKs from the model labs (OpenAI Agents SDK, Anthropic Claude Agent SDK, Google ADK) are now strong enough that they substitute for a third-party framework in a meaningful share of cases. We cover those next, and they belong on the same shortlist when you are choosing.
5. First-Party Agent SDKs: OpenAI, Anthropic, Google
The 2026 release of native agent SDKs from each major lab has changed how the framework decision is made. OpenAI Agents SDK replaced the experimental Swarm framework with a production-grade toolkit organised around handoffs (agents transferring control with conversation context) and equipped with guardrails, sessions, and a sandbox harness that as of April 2026 supports eight providers (E2B, Modal, Docker, Vercel, Cloudflare, Daytona, Runloop, Blaxel) plus a model-native filesystem-and-exec harness - OpenAI Agents SDK sandbox blog. The sandbox harness alone is a major adoption driver because it shifts the secure-code-execution problem from "build-your-own" to "pick a provider and pass a config."
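The handoff primitive is the part worth seeing in code. A minimal sketch against the openai-agents package; the agent names, instructions, and billing scenario are invented for illustration:

```python
from agents import Agent, Runner

billing = Agent(
    name="billing",
    instructions="Handle billing questions only.",
)
triage = Agent(
    name="triage",
    instructions="Route the user to the right specialist.",
    handoffs=[billing],  # transfers control with conversation context intact
)

result = Runner.run_sync(triage, "I was double-charged last month.")
print(result.final_output)
```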
Anthropic Claude Agent SDK was renamed from Claude Code SDK in early 2026 to reflect a broader ambition than just code, and is now the SDK behind Claude Code itself. It takes a tool-use-first approach where agents are Claude models equipped with tools, including the ability to invoke other agents as tools - Anthropic Agent SDK overview. For builders who want to use the same agent loop that Anthropic uses internally for Claude Code, this is the install. It is also the most natural path if you plan to ship MCP-native agents because Anthropic is the protocol's original author and the SDK's MCP support is first-class.
Google Agent Development Kit (ADK), introduced in April 2026, takes a hierarchical agent-tree orchestration model and is the only first-party SDK that incorporates multimodal capabilities natively, allowing agents to process images, audio, and video through Gemini's multimodal API - HolySheep AI showdown. For voice, vision-inspection, and document-understanding agents, ADK's native multimodal posture is a real differentiator over the OpenAI and Anthropic SDKs, both of which support multimodal but treat it as a tool surface rather than a built-in.
The strategic question for 2026 is whether to build on a first-party SDK or a third-party framework. The honest answer is mostly first-party, with third-party where you need cross-model abstraction. If you are confident you will be on Claude for the next year, Claude Agent SDK is hard to beat for Claude-native work. If you are GPT-native, OpenAI Agents SDK is the right call. If you genuinely need to swap models across vendors for cost or capability reasons, LangGraph or Pydantic AI gives you a portable abstraction. The mistake people made in 2024-2025 was treating "framework" as a portability decision when most teams stayed on one model anyway. The 2026 mistake to avoid is the inverse: locking into a first-party SDK when your roadmap genuinely involves multi-model routing. LiteLLM as the routing layer sits underneath either choice and erases most of the friction.
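The routing layer in code, as a hedged sketch: the model IDs below are illustrative, each hosted provider needs its API key in the environment, and the ollama/ prefix assumes a local Ollama server:

```python
from litellm import completion

# Same call shape whether the model is Anthropic, OpenAI, or a local server.
for model in ["claude-sonnet-4-20250514", "gpt-4o", "ollama/llama3"]:
    resp = completion(
        model=model,
        messages=[{"role": "user", "content": "One-line status update, please."}],
    )
    print(model, "->", resp.choices[0].message.content)
```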
6. Coding Agent CLIs: Claude Code, Codex, Aider, Continue, Cursor
The CLI surface is where coding agents have crystallised in 2026. Claude Code ships as both an npm i -g @anthropic-ai/claude-code package and a native installer that requires zero Node.js dependencies - Claude Code CLI guide. Anthropic's stated stance is that the native installer is the recommended path, with npm remaining fully supported as a fallback. Inside the binary the agent loop wraps Claude Sonnet or Opus, exposes hooks for pre/post-tool events, supports subagents for parallel work, and integrates MCP servers for tool extensions. This is the most feature-rich coding-agent CLI today.
OpenAI Codex CLI is the GPT-5.5-and-Codex-models-native equivalent, rebuilt in Rust for speed and shipping with sandbox proxy networking on Windows, ChatGPT device-code sign-in, MCP support, and a codex exec mode that takes a prompt plus stdin for one-shot scripted use (sketched below) - OpenAI Codex CLI reference. The April 2026 release also restored several TUI and MCP workflows after a regression earlier in the year. For OpenAI-native shops or for teams that want a Rust-based CLI rather than Node, Codex is the install. Aider is the mature pair-programmer that runs in the terminal and supports multiple model backends; it predates both Claude Code and Codex CLI and is the right choice if you want a single tool that works with Claude, GPT-5.5, Gemini, or any local model interchangeably - Aider GitHub.
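For scripted use, the codex exec shape described above looks roughly like this from Python; the prompt text and the pending.diff file are illustrative assumptions, so treat this as a sketch rather than the CLI's documented contract:

```python
import subprocess

# One-shot, scripted use: prompt as an argument, working material piped on stdin.
diff = open("pending.diff").read()  # hypothetical file with a diff to review
result = subprocess.run(
    ["codex", "exec", "Review this diff and list any bugs you see."],
    input=diff, capture_output=True, text=True,
)
print(result.stdout)
```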
Continue.dev is the in-IDE alternative for VS Code and JetBrains, designed for teams that want an open-source, model-agnostic agent in the editor without paying for Cursor or Windsurf. Cursor CLI mirrors the Cursor IDE's agent loop in a terminal surface and lets Cursor users keep one mental model across both. GitHub Copilot CLI ships as gh extension install github/gh-copilot and brings GitHub-native agent capabilities to the terminal for the 50M+ users on Copilot. Each of these is a defensible install, and the right choice depends on what you already pay for, what models you are willing to be locked to, and whether you want IDE-resident or terminal-resident behavior.
The pattern across these CLIs is that they all converge on roughly the same primitive set: file-aware editing, shell command execution, MCP tool extension, sandboxing, and either subagents or task delegation of some form. The differences are surface, not substance. What is real is that shipping a coding agent without ripgrep, ast-grep, fd, jq, and the GitHub CLI on the host machine is leaving most of the agent's capability on the table. The CLI is the orchestrator; the binaries are the leverage. Builders who skip the binary install layer end up with an agent that "feels slow" without ever realising the slowness is a regex search where a structural search would have collapsed the work to one call.
7. Browser and Computer-Use Installs
Browser automation is the second-densest install category after coding agents. The dominant low-level library is Playwright, which since 2025 has actively repositioned itself for AI agents with the addition of Test Agents (planner, generator, healer), an MCP server, and accessibility-tree-based interaction modes that are deterministic and require no vision models - Playwright AI ecosystem 2026. For Python and TypeScript, pip install playwright followed by playwright install is a one-line setup that lands a fully working browser stack with Chromium, Firefox, and WebKit. There is essentially no defensible reason to choose Puppeteer over Playwright in 2026 for an agent context.
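The accessibility-tree angle is worth one concrete look, since it is what lets agents act on pages without a vision model. A minimal sketch with Playwright's sync API (the snapshot call is the long-standing accessibility API; newer releases expose richer agent-oriented surfaces on top):

```python
from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    browser = p.chromium.launch(headless=True)
    page = browser.new_page()
    page.goto("https://example.com")
    # The accessibility tree gives an agent a deterministic, vision-free view of the page.
    snapshot = page.accessibility.snapshot() or {}
    print(page.title(), snapshot.get("role"))
    browser.close()
```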
On top of Playwright sits the agentic browser layer. browser-use is the Python-first library that has crossed 50,000 GitHub stars and powers many open-source AI agents - browser-use GitHub. It exposes browser control as a single agent-friendly API, handles multi-tab browsing, and has a memory and parallel-agent system. Stagehand is the TypeScript-first equivalent from Browserbase, exposing high-level act, observe, and extract primitives over Playwright. Both libraries make the same trade-off: more control with raw Playwright, more agent-shaped surface with browser-use or Stagehand.
For production browser-agent infrastructure, the choice is between hosted providers and self-hosted Playwright. Browserbase and Anchor are the two most-installed hosted-browser SDKs in 2026, both offering persistent profiles, residential proxies, and stealth modes. Hyperbrowser is a newer entrant in the same category. We cover the trade-offs in our Anchor Browser alternatives guide, and the short version is that for any serious production browser-agent workload at >10K browser-minutes per month you want hosted infrastructure, not a Playwright cluster you maintain.
The newer category to watch is computer-use agents in the OS sense, not the browser sense. OpenAI's Operator and Anthropic's Computer Use both expose pixel-level control over a virtual machine, and the install side of this is provider-side (you call an API, not run a binary locally). For local computer-use, Open Interpreter and Self-Operating Computer remain the two OSS installs to know, but neither has crossed into the same scale as browser-use yet. The structural reason is that browser is a constrained environment with a stable accessibility tree, whereas a full OS is an unconstrained environment where vision models are still expensive and brittle. Until vision-model latency drops another 5-10x, browser-only agents will remain the dominant installable surface.
8. Local Inference Runtimes: vLLM, Ollama, llama.cpp, MLX
Local inference is the most important install category for builders who care about cost, latency, or sovereignty. As covered in our open-source personal AI guide, the four installs that matter in 2026 are vLLM (high-throughput serving), Ollama (single-user simplicity), llama.cpp (the C++ engine underneath much of the rest), and MLX (Apple Silicon native).
vLLM released v0.17.0 on March 7, 2026 with PyTorch 2.10, FlashAttention 4, and Anthropic API compatibility, and v0.16.0 expanded multi-platform support to NVIDIA, AMD ROCm, Intel XPU, and TPU - vLLM benchmarks 2026. It wins on throughput and latency predictability above roughly five concurrent users, and it is the install you want when you are serving an agent stack with bursty or sustained concurrency. The vllm-mlx fork rebuilds the PagedAttention mechanism on top of MLX and is the go-to for high-concurrency Apple Silicon serving, hitting 400+ tokens/sec on M-series hardware - vllm-mlx GitHub.
Ollama has become the "Docker of local LLMs": a process manager and OpenAI-compatible API server that abstracts away model management, hardware detection, and configuration. The 2026 v0.8+ release adds dynamic hardware detection optimised for M4's AMX (Apple Matrix) instruction set. For single-user agent development on a laptop, Ollama is the right install; for production serving, vLLM or vllm-mlx is. The reason to install both is that they coexist comfortably and serve different workloads cleanly.
llama.cpp sits beneath much of this. Builds reached b8200+ as of March 2026, and the release added MCP client support directly in llama-server for tool calling via Model Context Protocol, plus an autoparser for structured output and multiple speed improvements for Qwen 3.5 and linear attention architectures. MLX has matured into the premier Apple Silicon inference framework, with the MLX Community on Hugging Face becoming a primary distribution channel for Mac-optimised models. The structural insight here is that the local inference layer is now a four-tool stack, not a one-tool decision. Builders who treat it as "Ollama vs vLLM" miss that llama.cpp is the engine under Ollama and MLX is the substrate under everything Apple-Silicon-shaped.
The reason this matters for agents is that agent workloads are bursty and parallel, which is exactly the workload pattern that vLLM was designed for and that Ollama struggles with above the single-user threshold. If you are running 50 sub-agents in parallel against a local Llama 4 or Qwen 3.5 model, vLLM will give you 5-10x the throughput of Ollama on the same hardware. If you are running one agent at a time on a laptop, Ollama will give you a better developer experience. Most serious teams install both, route through LiteLLM, and switch backends based on context.
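Because both runtimes expose OpenAI-compatible endpoints, switching backends is a base_url change, not a rewrite. A minimal sketch; the ports are the defaults, and the model tag is whatever your local server has loaded:

```python
from openai import OpenAI

# Ollama and vLLM both speak the OpenAI wire format on localhost.
ollama = OpenAI(base_url="http://localhost:11434/v1", api_key="unused")
vllm = OpenAI(base_url="http://localhost:8000/v1", api_key="unused")

client = vllm  # route bursty multi-agent load to vLLM, laptop work to Ollama
resp = client.chat.completions.create(
    model="qwen3.5",  # illustrative tag: use whatever the local server has loaded
    messages=[{"role": "user", "content": "ping"}],
)
print(resp.choices[0].message.content)
```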
9. Sandboxes and Code Execution: E2B, Daytona, Modal, Docker
When an agent generates code that needs to execute, you do not want it executing in your main process. The four installs that matter for safe code execution are E2B, Daytona, Modal, and Docker, and the OpenAI Agents SDK April 2026 release codified this by adding native support for eight sandbox providers (the four above plus Vercel, Cloudflare, Runloop, and Blaxel) plus a model-native harness with filesystem tools and security features - Modal sandbox products.
E2B uses Firecracker microVMs with 150ms cold starts and is the best AI-first SDK in this category. The microVM model gives you a dedicated kernel per session and hardware-level isolation, but with a 24-hour session limit. Daytona pivoted to AI code execution in 2026 and is the fastest cold-start sandbox at sub-90ms, using Docker containers with a shared kernel for faster init - Daytona vs E2B. The trade-off is that container isolation is weaker than microVM isolation; for adversarial workloads, E2B is the safer install.
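The E2B surface is small enough to show whole. A minimal sketch assuming the current e2b-code-interpreter package and an E2B_API_KEY in the environment; method names have shifted across SDK versions, so check yours:

```python
from e2b_code_interpreter import Sandbox

# Each Sandbox is a Firecracker microVM: dedicated kernel, ~150ms cold start.
with Sandbox() as sandbox:
    execution = sandbox.run_code("sum(i * i for i in range(10))")
    print(execution.text)  # model-generated code runs there, not in your process
```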
Modal is the AI-infra platform that includes sandboxes as one of its products, with sandboxes running on gVisor and dynamically defined at runtime. Modal is the best install for Python-ML-heavy agents because the same SDK that runs your sandbox also runs your GPU jobs, your scheduled functions, and your batch workloads. Docker itself remains the universal fallback: every team has it, and the OpenAI Agents SDK supports it natively for local development. For agents running on a laptop during development, Docker is fine; for production scale or untrusted code, you want a hosted sandbox provider.
The structural reason these installs matter so much in 2026 is that agents that can execute code beat agents that cannot, by a large margin, on most coding-style tasks. SWE-bench numbers, agent benchmarks, and production reports all point to the same conclusion: an agent with sandboxed execution shipped with pip install e2b or pip install modal will outperform an agent that only generates code, because the former can iterate against feedback. Skipping the sandbox install layer is leaving the largest single capability multiplier on the table.
10. Memory Systems: Mem0, Zep, Letta, Supermemory
Agent memory is the install category that matured most visibly in 2025-2026. The state-of-the-art trio is Mem0, Zep, and Letta, each optimising for a different memory shape. Mem0 is the universal memory layer with 48K+ GitHub stars, best for chatbot and personal assistant memory, and it ships an OpenMemory MCP server that any MCP client can talk to - Mem0 GitHub. Zep stores memory as a temporal knowledge graph that tracks how facts change over time, and on the LongMemEval benchmark using GPT-4o it scores 63.8% vs Mem0's 49.0%, a 15-point gap driven by Zep's temporal graph - Mem0 graph memory blog. Letta is an agent runtime built around self-editing memory where agents manage what stays in-context versus archival storage through dedicated memory-management tools.
The right way to think about this install layer is what shape of memory does my agent need. For "remember preferences and personal facts across sessions," install Mem0. For "track how a fact changes over time and reason about that history," install Zep. For "long-running agents that manage their own context window," install Letta. For "ground only on local data with no cloud calls," install Supermemory or SuperLocalMemory, which hit 74% retrieval and 60% zero-LLM accuracy on LoCoMo in local-only mode-A, and 87.7% in cloud-assisted mode-C.
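For the most common case, the Mem0 surface is two calls. A minimal sketch, with the return shape hedged since it has changed across releases; the default config also needs an LLM key in the environment for fact extraction:

```python
from mem0 import Memory

m = Memory()

# Write once per interaction; Mem0 extracts and deduplicates facts behind the scenes.
m.add("Prefers weekly summaries over daily digests.", user_id="user-42")

# Later sessions retrieve by semantic relevance, scoped to the same user.
hits = m.search("how often should I send reports?", user_id="user-42")
for h in hits["results"]:  # "results" key per recent SDK versions
    print(h["memory"])
```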
The mistake people make is treating memory as one install when it is really three different installs serving three different needs. A horizontal-agent platform might install all three: Mem0 for user-preference memory, Zep for temporal facts, and Letta-style self-editing for the agent's own working context. We cover the architecture pattern in more depth in our building AI agents 2026 insider guide, and the practical consequence for an install list is that memory is rarely one row in requirements.txt; it is two or three. The board ranks Mem0 highest because it is the most-installed, but Zep and Letta are not alternatives to Mem0 so much as complements.
11. Retrieval and Vector Indexes
The vector index layer is more boring than it was in 2024, which is a good thing for builders. The list of installs to know in 2026 collapsed to five: Chroma, Qdrant, LanceDB, Weaviate, and pgvector. Chroma is the default recommendation for most projects: simple to deploy, lightweight, genuinely production-ready despite the "dev tool" reputation, and pip install chromadb lands an embedded mode in seconds - vector DB comparison 2026. Qdrant is open-source, written in Rust, focused on payload filtering and rich query, and is the right install when you are building a legal or financial AI with complex metadata filtering.
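The prototype path in code, as a minimal Chroma sketch; the collection name and documents are invented:

```python
import chromadb

client = chromadb.Client()  # in-process, zero infrastructure
docs = client.create_collection("agent-notes")

docs.add(
    ids=["n1", "n2"],
    documents=[
        "Customer prefers invoices in PDF.",
        "Deploys happen Tuesday mornings.",
    ],
)
print(docs.query(query_texts=["when do we ship?"], n_results=1)["documents"])
```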
LanceDB stands out for larger-than-memory datasets with disk-based indexing and an in-process model that eliminates external database management. Weaviate goes beyond storage and search by including built-in modules for generating embeddings; you can insert raw text and Weaviate handles the vectorization. pgvector is the sleeper choice that has been winning steadily: if you already run Postgres, CREATE EXTENSION vector gives you SQL-native vector search without adding a new database to your stack. For most teams, this is the install path of least resistance.
The right decision rule in 2026 is: if you are building a prototype or a small app, install Chroma. If you already run Postgres, install pgvector. If you have larger-than-memory data, install LanceDB. If you need rich payload filtering, install Qdrant. If you want hybrid search with built-in vectorization, install Weaviate. The fact that the decision tree fits in one sentence each is the real story: the vector-DB layer has commoditised, and the install you pick matters less than how well your retrieval surfaces evolve. We expand on this in our building your first MCP server guide, where the retrieval layer is one half of any tool server worth building.
12. Evals and Observability: Langfuse, Inspect, Promptfoo, OpenLLMetry
You cannot improve what you cannot measure, and the agent-eval and observability install layer is now mature enough that "we will add observability later" is not a defensible engineering position. The three installs that ship for most production stacks are Langfuse, Inspect AI, and Promptfoo, with OpenLLMetry and the broader OpenTelemetry SDK beneath them as the OTel-native substrate.
Langfuse is the open-source LLM engineering platform that combines observability, prompts, evals, experiments, and human annotation into one connected workflow - Langfuse GitHub. Its 15% performance overhead at scale is reasonable for the level of observability it provides. It supports LangGraph, OpenAI Agents, Pydantic AI, CrewAI, n8n, and most agent frameworks out of the box. For most builders, this is the first observability install and it covers most of the pain.
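Instrumentation with Langfuse is one decorator per step. A minimal sketch; note the observe import path moved between SDK v2 and v3, and the LANGFUSE_* keys are read from the environment:

```python
from langfuse import observe  # v2 used: from langfuse.decorators import observe

@observe()  # one decorator per traced step; nesting builds the trace tree
def plan_step(task: str) -> str:
    return f"plan: {task}"

@observe()
def run_agent(task: str) -> str:
    return plan_step(task) + " -> executed"

run_agent("enrich 50 leads")
```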
Inspect AI is the eval framework from the UK AI Security Institute and has emerged as the default install for serious agent and safety evals. It supports model graders, sandboxed eval environments, and a large library of off-the-shelf eval suites including SWE-bench and agentic benchmarks. Promptfoo is the eval CLI with strong red-teaming and regression-eval support, designed to run in CI alongside your tests. OpenLLMetry instruments LLM and agent applications with OpenTelemetry semantic conventions; in March 2026 ServiceNow acquired Traceloop (the company behind OpenLLMetry) for an estimated $60-80M, though the OSS project continues under Apache 2.0 - OpenLLMetry 2026 morphllm.
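A minimal Inspect eval, assuming its Task/solver/scorer surface; the dataset path is illustrative:

```python
# Minimal Inspect AI eval: dataset in, generation solver, model-graded
# scorer. The dataset path is illustrative. Run with, for example:
#   inspect eval support_eval.py --model openai/gpt-4o
from inspect_ai import Task, task
from inspect_ai.dataset import json_dataset
from inspect_ai.scorer import model_graded_fact
from inspect_ai.solver import generate

@task
def support_eval():
    return Task(
        dataset=json_dataset("evals/support.jsonl"),
        solver=generate(),
        scorer=model_graded_fact(),
    )
```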
The structural insight in this layer is that OpenTelemetry has won the race to be the substrate. OTel GenAI semantic conventions are now GA, which means traces from any framework can flow into any backend (Langfuse, Datadog, Honeycomb, New Relic) without per-vendor instrumentation. The install pattern that scales is: instrument with OpenLLMetry or framework-native OTel, route to Langfuse for LLM-specific dashboards, and route to Datadog or your existing APM for cross-cutting infra. The mistake is locking into a vendor-specific SDK that does not emit OTel; that forecloses your options 12-18 months from now when something better appears.
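In practice that pattern is a two-line init plus standard OTLP routing. A sketch assuming OpenLLMetry's Traceloop SDK and a local collector that fans traces out to Langfuse and your APM; the endpoint value is illustrative:

```python
# OpenLLMetry init: emits OTel GenAI spans for the LLM and agent
# libraries it instruments. The OTLP endpoint is illustrative and would
# point at a collector that fans out to Langfuse and your existing APM.
import os
from traceloop.sdk import Traceloop

os.environ.setdefault("OTEL_EXPORTER_OTLP_ENDPOINT", "http://localhost:4318")
Traceloop.init(app_name="agent-stack")

# From here, calls through instrumented libraries are traced
# automatically; no per-vendor SDK lock-in.
```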
13. Durable Workflow Runtimes: Temporal, Inngest, Restate, Trigger.dev
Agents are flaky in ways that traditional applications are not. Models time out, tools fail, networks blip, sandboxes get killed. Agents that survive a year in production survive on durable execution, and the four installs that matter are Temporal, Inngest, Restate, and Trigger.dev. We covered the broader pattern in our analysis of agent infrastructure, and the install layer crystallised in 2026 around these four.
Temporal is the industrial-grade option, born from Uber's Cadence project, and the market validated this thesis decisively when Temporal raised $300M at a $5B valuation on February 17, 2026, with 9.1 trillion lifetime action executions on its cloud - Temporal vs Restate 2026. For workflows that absolutely cannot fail (financial transactions, multi-day sagas, regulated domains), Temporal is the install. Trigger.dev v3 reimagined background jobs for the serverless era: write TypeScript functions, Trigger.dev handles execution, retries, and observability. For 90% of TypeScript-first agent stacks, Trigger.dev is the simplest path to reliable background jobs.
Inngest adopts a serverless-first, event-driven philosophy and shipped Temporal-compatible workflows in February 2026, which collapses some of the historical Temporal-vs-Inngest tradeoff. Restate launched commercially in March 2026 and takes a different shape: rather than a separate cluster, it is a lightweight sidecar that intercepts your HTTP calls and adds durability. For agents that mostly call HTTP services and want durability without re-architecting around a workflow engine, Restate is the install to evaluate.
The broader insight is that durable execution is not a "nice to have" for agents; it is what separates demos from production. An agent that works 95% of the time is unusable at scale because the 5% failures compound into stuck sessions, lost work, and user trust erosion. The install pattern that works in 2026 is: agent loop in your framework of choice, durable execution wrapping anything that touches the network or a sandbox, observability through Langfuse or OTel. Skipping the durable layer is the most common reason agent prototypes do not graduate to production.
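Here is what "durable execution wrapping anything that touches the network or a sandbox" looks like in Temporal's Python SDK, as a sketch; the activity body and retry numbers are illustrative:

```python
# Temporal sketch: the flaky part (LLM call, tool call, sandbox exec)
# lives in an activity, so timeouts and retries are the engine's job and
# the workflow survives process restarts. Retry numbers are illustrative.
from datetime import timedelta
from temporalio import activity, workflow
from temporalio.common import RetryPolicy

@activity.defn
async def call_model(prompt: str) -> str:
    ...  # the network call goes here

@workflow.defn
class AgentStep:
    @workflow.run
    async def run(self, prompt: str) -> str:
        return await workflow.execute_activity(
            call_model,
            prompt,
            start_to_close_timeout=timedelta(minutes=2),
            retry_policy=RetryPolicy(maximum_attempts=5),
        )
```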
14. Voice Agents: LiveKit Agents, Pipecat
Voice is the install category that scaled fastest in 2025-2026 and is now the second most economically successful agent archetype after coding. The two installs that dominate the open-source voice-agent stack are LiveKit Agents and Pipecat. LiveKit Agents is the WebRTC-native framework with a plugin model: pip install "livekit-agents[openai,silero,deepgram,cartesia,turn-detector]" lands a working voice agent stack with VAD (Silero), STT (Deepgram), LLM (OpenAI), TTS (Cartesia), and turn detection in one install - LiveKit Agents GitHub. It is used inside ElevenLabs, OpenAI's voice products, and many production voice startups.
Pipecat is the Python-native equivalent from Daily, designed around a frame-based streaming model that orchestrates VAD/STT/LLM/TTS with automatic interruption handling. For teams that want a more Pythonic abstraction over the same primitives, Pipecat is the right install. Vapi and Retell are the turnkey hosted alternatives; for teams without DevOps or that need a 5-minute setup, they are defensible, but you give up the customisation that LiveKit Agents and Pipecat offer.
The decision rule for voice is similar to other agent categories: build with LiveKit Agents or Pipecat when you exceed 10-50K minutes/month because of cost, when you need stricter compliance (HIPAA, SOC2), when you require <500ms end-to-end latency, when you need full observability, or when you have deep custom integrations - AssemblyAI orchestration tools 2026. For lower volumes or rapid prototyping, hosted is fine. For serious voice infrastructure, the install layer is non-negotiable.
15. Structured Output: Instructor, Outlines, LiteLLM
Structured output is where agent reliability comes from. An agent that returns prose is hard to compose; an agent that returns a typed Pydantic model or a Zod-validated object is composable. The three installs that own this layer in 2026 are Instructor, Outlines, and LiteLLM. Instructor released its latest version on January 29, 2026 and is the default Python install for structured outputs, supporting Pydantic-validated responses across LiteLLM and 100+ providers - Instructor PyPI. It is also available in TypeScript, Go, and Ruby.
Outlines takes a different approach: rather than validating after generation (Instructor's pattern), Outlines uses constrained token sampling to guarantee structured output during generation directly from the LLM. For cases where you need hard guarantees (the JSON will parse, the regex will match, the grammar will hold), Outlines is the install. The trade-off is slightly more setup and a smaller provider list, since constrained sampling requires logit access that not every API exposes.
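A constrained-generation sketch, assuming the outlines 0.x API (newer releases have reshaped this surface, so verify against current docs); the model choice is illustrative and runs locally via transformers:

```python
# Outlines sketch: sampling is masked so only tokens consistent with the
# schema can be emitted, which means the output is guaranteed to parse.
# Assumes the outlines 0.x API; model choice is illustrative.
import outlines
from pydantic import BaseModel

class Verdict(BaseModel):
    label: str
    confidence: float

model = outlines.models.transformers("microsoft/Phi-3-mini-4k-instruct")
to_verdict = outlines.generate.json(model, Verdict)

verdict = to_verdict("Classify the build status: 'all checks green'")
```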
LiteLLM is not strictly a structured-output library but is the universal substrate beneath them. It exposes 100+ LLM providers behind a single OpenAI-compatible API, and pairing LiteLLM with Instructor is one of the most powerful structured output setups available. For agents that need to swap models across providers without rewriting the structured-output layer, the install pattern is pip install litellm instructor and you get retries, validation, and provider abstraction in one stack.
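The pairing is two installs and one adapter. A sketch assuming instructor's from_litellm adapter and an OPENAI_API_KEY for the example model string, which could be any LiteLLM-supported provider:

```python
# LiteLLM for provider routing + Instructor for Pydantic validation.
# Assumes instructor's from_litellm adapter; the model string is
# illustrative and can be any LiteLLM-supported provider/model.
import instructor
import litellm
from pydantic import BaseModel

class Invoice(BaseModel):
    vendor: str
    total: float

client = instructor.from_litellm(litellm.completion)

invoice = client.chat.completions.create(
    model="gpt-4o",
    response_model=Invoice,  # retried and validated until it fits
    messages=[{"role": "user", "content": "Extract: ACME Corp, $120.50"}],
)
print(invoice.vendor, invoice.total)
```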
The reason to install all three (LiteLLM at the routing layer, Instructor for validation, Outlines for hard constraints) is that they are not substitutes. Each solves a different failure mode. Skipping any of them means re-implementing it in your agent code. We cover the broader pattern in our LLM tool gateways guide, and the practical takeaway is that structured output is no longer optional for production agents, and the install layer for it is mature enough that there is no excuse to roll your own.
16. Code-Search Primitives: ripgrep, ast-grep, tree-sitter, fd
This category has the highest score-to-stars ratio on the entire list, because the installs are deeply unsexy and absolutely critical. ripgrep is the search tool every coding agent shells out to; ast-grep is the structural search that closes the largest capability gap in 2026; tree-sitter is the parser library underneath ast-grep and most code-aware tooling; fd is the file-find binary that replaces find with something that respects .gitignore. None of these is an "agent" install, but every coding agent that ships in 2026 either depends on them or should.
ripgrep is shipped inside VS Code, Cursor, Claude Code, and Codex CLI - Supercharging Claude Code with the right tools. It is the default search binary across the agent stack and the only meaningful improvement over it is structural search via ast-grep. ast-grep supports 20+ languages via tree-sitter, exposes itself as an MCP server, and is the install that turns "an AI assistant limited to regex-based search and replace" into "an AI assistant that can write structural patterns over an AST." If you only install one binary specifically for coding agents in 2026, install ast-grep.
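"Shells out to" is literal. Below is a sketch of the search tool most coding agents wrap around ripgrep, using its --json event stream; the wrapper name is ours:

```python
# What "every coding agent shells out to ripgrep" looks like in practice:
# rg --json emits one JSON event per line, and the agent's search tool
# keeps the match events. The wrapper name is illustrative.
import json
import subprocess

def rg_search(pattern: str, path: str = ".") -> list[dict]:
    proc = subprocess.run(
        ["rg", "--json", pattern, path],
        capture_output=True, text=True,
    )
    matches = []
    for line in proc.stdout.splitlines():
        event = json.loads(line)
        if event.get("type") == "match":
            matches.append(event["data"])
    return matches

for m in rg_search(r"def \w+_tool", "src/"):
    print(m["path"]["text"], m["line_number"])
```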
tree-sitter sits underneath ast-grep, Neovim, GitHub Code Search, and many editors. It is what makes parsing 20+ languages feasible in a single tool. fd is a file-search binary that respects .gitignore by default and is the install that makes "find files by pattern" not embarrassingly slow on a large repo. jq is the JSON tool every agent shells out to for structured data manipulation. fzf is the fuzzy finder that powers interactive selection in most agent CLIs. GitHub CLI (gh) is the GitHub API surface every coding agent uses to read PRs, comment, and create issues.
The reason these score so high on the master board is that they fail almost no criterion: free, one-line install, decade of production hardening, every agent uses them, no model dependency. The criticism that "these are not AI installs" misses the structural point: an AI-agent stack is not just AI installs. It is the AI surface plus the substrate the AI shells out to. Builders who skip the substrate end up with agents that compete on prompt-engineering rather than on capability. Builders who install the substrate end up with agents that have actual leverage.
17. Package Managers and Runtimes: uv, pnpm, Bun, Node, Python
This category is about the substrate beneath everything. uv has become the default Python package manager for new agent-stack repos in 2026, replacing pip with sub-second resolves that make per-agent virtualenvs cheap to spin up and discard. For an agent stack that runs many ephemeral sandboxes per minute, the per-resolve overhead of pip is genuinely a bottleneck and uv erases it. The install is one curl line with zero dependencies, and Astral (the maintainer) fields an engineering bench on par with the teams at Anthropic, OpenAI, and Pydantic.
On the JavaScript side, the 2026 leaderboard is npm 11.x, Yarn 4.x (Berry), pnpm 10.x, and Bun 1.3. Bun is 4-5x faster than pnpm 10 on cold installs and 10-30x faster than npm, but it locks you into the Bun runtime - pnpm vs Bun 2026. pnpm 10 has 65.5M weekly npm downloads vs Bun's still-modest CLI adoption, offers universal Node.js compatibility and battle-tested monorepo support, and uses 70% less disk space than npm by storing packages once in a global store and hard-linking them into each project. For most agent monorepos, pnpm is the right install. Bun is right when you want a Node-runtime alternative and your codebase can tolerate the migration.
Node.js 22+ is the install requirement for Claude Code's npm path and most modern TypeScript agent stacks. Python 3.11+ is the practical floor for most agent libraries (Pydantic AI requires it, smolagents recommends it, some LiteLLM features need it). The runtime version is one of those installs that nobody thinks about until something silently fails on 3.10 or 3.9, at which point you discover that the version pin in your pyproject.toml was the actual problem all along. The boring lesson is that runtime version is not a free choice; pick the latest stable LTS for both Node and Python and revisit only when something forces you to.
18. How Suprsonic Fits Alongside These Installs
Suprsonic is a capability-API SDK that sits beside, not against, every install on this list. The model is simple: one Suprsonic API key, one credit pool, one unified response envelope, and 17+ capabilities that an agent does not have natively (search, scrape, profile enrichment, email finding, image generation, text-to-speech, transcription, document extraction, file conversion, screenshot capture, invoice parsing, subtitle generation, image background removal). The install is pip install suprsonic or npm i suprsonic, plus an omk_-prefixed key, plus zero per-provider OAuth.
The reason Suprsonic earns a spot at #47 on this board is not because it is competitive with LangGraph or Pydantic AI; it is because it is complementary to all of them. An agent built on LangGraph, Pydantic AI, or the OpenAI Agents SDK still needs to search the web, scrape pages, find emails, generate images, and convert files. Without Suprsonic, that means signing up for 6-12 different services, managing 6-12 API keys, handling 6-12 response formats, and paying 6-12 bills. With Suprsonic, it is one row in requirements.txt and one environment variable. We cover the deeper trade-offs in our Suprsonic alternatives guide, and the install pattern that works for most agent stacks is framework + first-party SDK + Suprsonic + Composio + memory + observability, where Suprsonic handles the capability layer and Composio handles the user-OAuth integration layer (Gmail, Salesforce, Slack).
The other thing Suprsonic earns this spot for is its discovery surface. The GET /v1/tools endpoint returns tool definitions in the exact format that OpenAI's function calling and Anthropic's tools accept, the MCP server is listed on 18+ registries including the official MCP Registry, Glama, and Smithery (covered in our top 50 API marketplaces guide), and the Python and TypeScript SDKs are typed with Pydantic and Zod respectively. None of those is a flashy feature; they are exactly the substrate hygiene that determines whether the install fits cleanly into the rest of an agent stack. The practical test for any capability install is "can my agent loop discover and call this without me writing 50 lines of glue?" Suprsonic answers yes by design.
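That practical test is easy to sketch. The block below assumes a base URL and auth header shape for the GET /v1/tools endpoint described above (both are assumptions; check the Suprsonic docs), and passes the result straight to OpenAI function calling:

```python
# Discovery-pattern sketch: fetch Suprsonic's tool definitions and hand
# them straight to OpenAI function calling. The base URL and auth header
# shape are assumptions; check the Suprsonic docs for the real values.
import os
import httpx
from openai import OpenAI

tools = httpx.get(
    "https://api.suprsonic.com/v1/tools",  # assumed base URL
    headers={"Authorization": f"Bearer {os.environ['SUPRSONIC_API_KEY']}"},
).json()

client = OpenAI()
resp = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Find the pricing page for acme.com"}],
    tools=tools,  # already in function-calling format, per the docs above
)
```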
19. Picking the 10 Installs You Actually Need
If you read the full board and now feel paralysed, the practical install set for a starting agent stack in May 2026 is closer to ten than to one hundred. The core list, optimised for a coding-or-horizontal agent stack:
- Anthropic SDK + OpenAI SDK at the model layer, both routed through LiteLLM for provider switching.
- A framework: LangGraph if you want the broadest ecosystem, Pydantic AI if you want typed surfaces, or smolagents if you want minimal code.
- Pydantic (Python) or Zod (TS) for tool schemas and structured output, plus Instructor for validated outputs.
- MCP Python SDK or MCP TypeScript SDK so your agent can speak the protocol every other tool now speaks.
- A sandbox: E2B for safety, Daytona for speed, Modal if you are also doing GPU jobs.
- A memory system: Mem0 for personal-fact memory, plus Zep if you need temporal reasoning.
- Langfuse for observability, instrumented via OpenLLMetry for OTel-compat.
- ripgrep, ast-grep, fd, jq, GitHub CLI as shell binaries, especially if any part of the agent is a coding agent.
- Playwright plus browser-use or Stagehand if any part of the agent touches the web.
- Suprsonic for search, scrape, enrichment, generation, and other capabilities the LLM does not have natively.
Beyond those ten, you add things as needs surface. Temporal or Inngest when you discover that a workflow needs to survive process restarts. vLLM or Ollama when you decide to run models locally. LiveKit Agents or Pipecat when voice becomes a requirement. DSPy when you want to optimise prompts programmatically rather than by hand. Haystack when you are in a regulated industry that needs the governance affordances. The board above is the menu; this paragraph is the order most teams should place.
The other thing worth saying explicitly is that install velocity matters more than install correctness in the first month. The biggest cost of a new agent stack is the time between signup and first working loop. uv makes that cheaper for Python, pnpm makes it cheaper for TypeScript, Suprsonic and Composio make it cheaper at the capability layer, and the first-party SDKs from OpenAI, Anthropic, and Google make it cheaper at the model layer. Optimise for the second-most-important install being one command away, not for the install set being theoretically perfect on day one.
20. What Will Move on This List by November 2026
The board above is a snapshot, and a snapshot of a fast-moving market. A few specific moves are likely by November 2026, and naming them now lets you weight your install decisions accordingly.
The first-party agent SDKs will absorb more of what the third-party frameworks do. Today the OpenAI Agents SDK has handoffs and sandboxes, the Claude Agent SDK has agents-as-tools and MCP, Google ADK has hierarchical orchestration and multimodal. By November, each of the three will have most of the capabilities that any of them has today, which means the framework decision will collapse for single-vendor builders. The third-party frameworks (LangGraph, Pydantic AI, smolagents) will still matter for multi-vendor abstraction, but their mindshare will compress. The structural reason is that the labs have an unfair advantage in shipping agent SDK features fast: they own the model, the API, and the tools surface.
The sandbox layer will consolidate around the providers OpenAI's harness blesses. The OpenAI Agents SDK added native support for eight sandbox providers in April 2026; that is a lot today, and it will be fewer by November as builders coalesce around the two or three with the best cold starts, the best DX, and the best pricing. E2B and Daytona are the most likely survivors at the top; Modal and Docker will hold the Python-ML and local-dev positions. The smaller providers may exit or get acquired.
Memory will become first-class in the first-party SDKs. Today you install Mem0 or Zep separately; by November, Anthropic and OpenAI will likely ship native memory primitives in the Agent SDKs, possibly as managed services. That does not kill Mem0 or Zep (they will still be the OSS option for self-hosters), but it does compress their commercial premium. The structural reason is the same as for sandboxes: the labs see the gap, the integration surface is theirs, and they will close it.
The browser-agent layer will ship a "Playwright Test Agents for Python" equivalent. Today the Playwright Test Agents (planner, generator, healer) ship for JavaScript and TypeScript only, with Python support tracked as a feature request - Playwright Python Test Agents request. By November that gap will likely close. When it does, the install pattern for Python browser agents collapses from "Playwright + browser-use" to "Playwright + Test Agents" for many use cases. browser-use will still matter as a higher-level abstraction, but the floor will rise.
MCP will deepen, not broaden. The protocol is settled, the registries are full, and the next 12 months are about quality, governance, and security rather than count of servers. Trust scores, signed registries, OAuth-on-MCP, and capability scoping will be the install-relevant changes. From a builder's perspective, that means the choice of MCP servers will start looking like the choice of npm packages does today: there are millions, you install the small set that has audited maintainers, and you avoid the rest. We covered the security shape of this in our 50 best MCP servers guide, and the install hygiene is the same as for any other dependency.
Suprsonic and the capability-API category will grow because individual provider integration does not commoditise the way model intelligence does. As we wrote in our Suprsonic alternatives guide, the integration layer is one of the categories where value accrues rather than commoditises. Every organisation is different, every provider has its own quirks, and the abstraction work does not ride the cost-of-intelligence curve down. The install layer for capabilities, integrations, and waterfall failover will grow in 2026 in absolute terms even as model APIs commoditise. That is exactly why Suprsonic exists, and it is why the install patterns above lean on it as a default rather than treating it as optional.
The thing that will not change is the substrate-first ordering of this list. ripgrep, tree-sitter, Pydantic, Zod, FastAPI, httpx, and Hugging Face Hub will still be at the top in November because they are not in a category that the labs are competing for. They are the parts every agent install ends up depending on, and that is exactly the kind of install that compounds rather than churns.
This guide reflects the AI-agent install landscape as of May 2026. Package versions, GitHub stars, download counts, and pricing change frequently in this category. Verify current details against official package registries and project repositories before pinning versions in production. The 100 installs above were chosen on a single global board and are not categorical winners; the right install set for any given agent stack is a subset of this board, sized to the actual problem you are solving.