Models

Re-exports from PydanticAI under murmur.models so user code never imports pydantic_ai directly (Public API Rule). Two unrelated families live here: Model classes (one per LLM vendor — pass to Agent.model for non-default Provider / endpoint / HTTP client configuration) and concurrency primitives (cap provider-side HTTP request concurrency).

Model classes

| Class | Vendor |
| --- | --- |
| AnthropicModel | Anthropic |
| BedrockConverseModel | AWS Bedrock |
| CerebrasModel | Cerebras |
| CohereModel | Cohere |
| GoogleModel | Gemini (Google AI Studio / Vertex) |
| GroqModel | Groq |
| HuggingFaceModel | HuggingFace Inference Providers |
| MistralModel | Mistral |
| OllamaModel | Ollama |
| OpenAIChatModel | OpenAI Chat Completions API (and OpenAI-compatible endpoints) |
| OpenAIResponsesModel | OpenAI Responses API |
| OpenRouterModel | OpenRouter |
| XaiModel | xAI (Grok) |
| FallbackModel | Wraps several models for automatic failover (Murmur builds this for you when you set Agent.fallback_models=) |
| Model | Abstract base class — useful for type-annotating user code that holds a model of any kind |

Pair a Model with the matching Provider when you need non-default authentication, an alternative endpoint, or a custom HTTP client. For the common case, prefer Agent(model="vendor:model_name") — PydanticAI auto-resolves it. See the Models & providers concept guide and the upstream PydanticAI per-vendor docs for each Model's constructor signature and supported model IDs.
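The string shorthand follows a simple vendor:model_name convention. As an illustrative sketch only (the real resolver lives in PydanticAI and handles more than this), the split works like:

```python
def split_model_spec(spec: str) -> tuple[str, str]:
    """Split a 'vendor:model_name' spec into (vendor, model_id).

    Illustrative only -- PydanticAI's actual resolution also maps the
    vendor prefix to the right Model class and default Provider.
    """
    vendor, sep, model_id = spec.partition(":")
    if not sep or not model_id:
        raise ValueError(f"expected 'vendor:model_name', got {spec!r}")
    return vendor, model_id

print(split_model_spec("openai:gpt-5.2"))  # -> ('openai', 'gpt-5.2')
```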

OutlinesModel is intentionally not re-exported because it requires the optional outlines extra. If you need it, install the extra and import directly from pydantic_ai.models.outlines.

Concurrency primitives

Used to cap provider-side HTTP request concurrency:

from murmur.models import (
    AbstractConcurrencyLimiter,
    ConcurrencyLimit,
    ConcurrencyLimiter,
)

See the Agents concept guide for usage. The two Agent knobs that consume these are:

  • Agent.max_concurrent_requests: int | None — convenience int knob, builds a fresh limiter per agent.
  • Agent.model_concurrency_limiter: AbstractConcurrencyLimiter | None — pre-built limiter shared across agents.

The two are mutually exclusive. The runtime wraps the resolved model in pydantic_ai.models.concurrency.ConcurrencyLimitedModel outside any FallbackModel, so one slot covers the whole run regardless of which fallback served the request.

Classes

| Class | Purpose |
| --- | --- |
| ConcurrencyLimiter | The default in-process limiter. Constructor: ConcurrencyLimiter(max_running, *, max_queued=None, name=None, tracer=None). Tracks waiting count and emits OpenTelemetry spans while awaiting a slot. |
| ConcurrencyLimit | Frozen dataclass: ConcurrencyLimit(max_running, max_queued=None). Pass to ConcurrencyLimiter.from_limit(...) for backpressure (ConcurrencyLimitExceeded when queue depth exceeds max_queued). |
| AbstractConcurrencyLimiter | Base class for custom limiters. Subclass and implement acquire(source) / release() for cross-process backends (e.g. Redis-backed for fleet-wide RPM caps). |
from murmur import Agent
from murmur.models import ConcurrencyLimit, ConcurrencyLimiter

# Per-agent cap (one limiter per agent):
solo = Agent(name="solo", model="openai:gpt-5.2", max_concurrent_requests=5)

# Shared cap across agents:
pool = ConcurrencyLimiter(max_running=10, name="openai-pool")
head = Agent(name="head", model="openai:gpt-5.2", model_concurrency_limiter=pool)
minion = Agent(name="minion", model="openai:gpt-5.2", model_concurrency_limiter=pool)

# Backpressure (raises ConcurrencyLimitExceeded when queue exceeds max_queued):
bp = ConcurrencyLimiter.from_limit(
    ConcurrencyLimit(max_running=5, max_queued=20),
    name="openai-bp",
)
strict = Agent(name="strict", model="openai:gpt-5.2", model_concurrency_limiter=bp)

For the full primitive reference, see PydanticAI's concurrency module docs.