Models

Browse available models routable through Multi-Router. Compare capabilities, pricing, and specs.
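The model IDs below (e.g. anthropic/claude-haiku-4-5) are the slugs you pass in a request's model field. As a minimal sketch, assuming Multi-Router accepts an OpenAI-style chat-completions JSON body (an assumption; this page does not document the request schema), a payload could be built like this:

```python
import json

def build_payload(model_slug: str, prompt: str, max_tokens: int = 1024) -> dict:
    """Assemble an OpenAI-style chat-completions body for a routed model.

    The request shape is an assumption about Multi-Router's API, not
    something this page documents; adjust to your router's actual schema.
    """
    return {
        "model": model_slug,  # slug from the catalog below
        "max_tokens": max_tokens,
        "messages": [{"role": "user", "content": prompt}],
    }

payload = build_payload("anthropic/claude-haiku-4-5", "Summarize this ticket.")
print(json.dumps(payload, indent=2))
```

Switching providers is then a one-line change: swap the slug for, say, google/gemini-2.5-flash.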

Current Models

Claude Haiku 4.5
Anthropic · active
anthropic/claude-haiku-4-5

The fastest model with near-frontier intelligence. Best for low-latency, high-volume workloads.

Context: 200K | Max output: 64K | Latency: fastest
Price: $1.00 input / $5.00 output per 1M tokens
Capabilities: text, vision, tool use, extended thinking

Claude Opus 4.6
Anthropic · active
anthropic/claude-opus-4-6

The most intelligent model for building agents and coding. Exceptional performance in reasoning, complex analysis, and multi-step tasks.

Context: 1M | Max output: 128K | Latency: moderate
Price: $5.00 input / $25.00 output per 1M tokens
Capabilities: text, vision, tool use, extended thinking, adaptive thinking

Claude Sonnet 4.6
Anthropic · active
anthropic/claude-sonnet-4-6

The best combination of speed and intelligence. Ideal for high-throughput tasks requiring strong reasoning.

Context: 1M | Max output: 64K | Latency: fast
Price: $3.00 input / $15.00 output per 1M tokens
Capabilities: text, vision, tool use, extended thinking, adaptive thinking
Gemini 2.5 Flash
Google · active
google/gemini-2.5-flash

Best price-performance model for large-scale processing and low-latency, high-volume tasks, with thinking.

Context: 1M | Max output: 65.5K | Latency: fast
Price: $0.30 input / $2.50 output per 1M tokens
Capabilities: text, vision, tool use, reasoning

Gemini 2.5 Flash-Lite
Google · active
google/gemini-2.5-flash-lite

Fastest and most budget-friendly multimodal model. Ideal for classification, extraction, and low-latency workloads.

Context: 1M | Max output: 65.5K | Latency: fastest
Price: $0.10 input / $0.40 output per 1M tokens
Capabilities: text, vision, tool use, reasoning

Gemini 2.5 Pro
Google · active
google/gemini-2.5-pro

State-of-the-art thinking model for complex problems in code, math, STEM, and long-context analysis.

Context: 1M | Max output: 65.5K | Latency: moderate
Price: $1.25 input / $10.00 output per 1M tokens
Capabilities: text, vision, tool use, reasoning

Gemini 3 Flash
Google · active
google/gemini-3-flash-preview

Frontier-class reasoning and multimodal understanding at a fraction of the cost of larger models.

Context: 1M | Max output: 65.5K | Latency: fast
Price: $0.50 input / $3.00 output per 1M tokens
Capabilities: text, vision, tool use, reasoning

Gemini 3.1 Flash-Lite
Google · active
google/gemini-3.1-flash-lite-preview

Most cost-efficient multimodal model. Optimized for high-frequency lightweight tasks like translation and data extraction.

Context: 1M | Max output: 65.5K | Latency: fastest
Price: $0.25 input / $1.50 output per 1M tokens
Capabilities: text, vision, tool use, reasoning

Gemini 3.1 Pro
Google · active
google/gemini-3.1-pro-preview

Advanced intelligence with better thinking, improved token efficiency, and factual consistency. Optimized for software engineering and agentic workflows.

Context: 1M | Max output: 65.5K | Latency: moderate
Price: $2.00 input / $12.00 output per 1M tokens
Capabilities: text, vision, tool use, reasoning
MiniMax M2.5
MiniMax · active
minimax/minimax-m2-5

General-purpose model with tool use and 197K context. Free tier.

Context: 197K | Max output: 8.2K | Latency: fast
Price: $0.00 input / $0.00 output per 1M tokens
Capabilities: text, tool use
Nemotron 3 Nano 30B A3B
NVIDIA · active
nvidia/nemotron-3-nano-30b-a3b

Compact MoE model for efficient text generation with 256K context. Free tier.

Context: 256K | Max output: 8.2K | Latency: fast
Price: $0.00 input / $0.00 output per 1M tokens
Capabilities: text

Nemotron 3 Super
NVIDIA · active
nvidia/nemotron-3-super

MoE model (120B total, 12B active) optimized for text tasks with 262K context. Free tier.

Context: 262K | Max output: 8.2K | Latency: fast
Price: $0.00 input / $0.00 output per 1M tokens
Capabilities: text

Nemotron Nano 12B 2 VL
NVIDIA · active
nvidia/nemotron-nano-12b-2-vl

Compact vision-language model with 128K context. Free tier.

Context: 128K | Max output: 8.2K | Latency: fast
Price: $0.00 input / $0.00 output per 1M tokens
Capabilities: text, vision

Nemotron Nano 9B V2
NVIDIA · active
nvidia/nemotron-nano-9b-v2

Compact reasoning model with 128K context. Free tier.

Context: 128K | Max output: 8.2K | Latency: fast
Price: $0.00 input / $0.00 output per 1M tokens
Capabilities: text, reasoning
GPT-4.1
OpenAI · active
openai/gpt-4.1

Excels at instruction following and tool calling with broad knowledge across domains. 1M token context.

Context: 1M | Max output: 32.8K | Latency: fast
Price: $2.00 input / $8.00 output per 1M tokens
Capabilities: text, vision, tool use

GPT-5
OpenAI · active
openai/gpt-5

Strong coding, reasoning, and agentic capabilities across domains with built-in reasoning tokens.

Context: 400K | Max output: 128K | Latency: moderate
Price: $1.25 input / $10.00 output per 1M tokens
Capabilities: text, vision, tool use, reasoning

GPT-5 Mini
OpenAI · active
openai/gpt-5-mini

Near-frontier intelligence for cost-sensitive, low-latency, high-volume workloads.

Context: 400K | Max output: 128K | Latency: fast
Price: $0.25 input / $2.00 output per 1M tokens
Capabilities: text, vision, tool use

GPT-5 Nano
OpenAI · active
openai/gpt-5-nano

Fastest and cheapest GPT-5 variant. Optimized for summarization and classification tasks.

Context: 400K | Max output: 128K | Latency: fastest
Price: $0.05 input / $0.40 output per 1M tokens
Capabilities: text, vision, tool use

GPT-5.4
OpenAI · active
openai/gpt-5.4

Most capable frontier model. Exceptional at complex professional work with configurable reasoning depth.

Context: 1.1M | Max output: 128K | Latency: moderate
Price: $2.50 input / $15.00 output per 1M tokens
Capabilities: text, vision, tool use, reasoning

GPT-5.4 Pro
OpenAI · active
openai/gpt-5.4-pro

Premium tier using extended compute for harder problems. Best for research-grade reasoning tasks.

Context: 1.1M | Max output: 128K | Latency: moderate
Price: $30.00 input / $180.00 output per 1M tokens
Capabilities: text, vision, tool use, reasoning

o3
OpenAI · active
openai/o3

Sets a new standard for math, science, coding, and visual reasoning tasks, with the highest reasoning capability.

Context: 200K | Max output: 100K | Latency: moderate
Price: $2.00 input / $8.00 output per 1M tokens
Capabilities: text, vision, tool use, reasoning

o3-pro
OpenAI · active
openai/o3-pro

Extended-compute version of o3 for harder problems. Best for research-grade reasoning tasks.

Context: 200K | Max output: 100K | Latency: moderate
Price: $20.00 input / $80.00 output per 1M tokens
Capabilities: text, vision, tool use, reasoning

o4-mini
OpenAI · active
openai/o4-mini

Fast, cost-efficient reasoning model optimized for coding and visual tasks.

Context: 200K | Max output: 100K | Latency: fast
Price: $1.10 input / $4.40 output per 1M tokens
Capabilities: text, vision, tool use, reasoning
Qwen3 Coder 480B A35B
Qwen · active
qwen/qwen3-coder-480b-a35b

MoE coding model (480B total, 35B active) with tool use and 262K context. Free tier.

Context: 262K | Max output: 8.2K | Latency: moderate
Price: $0.00 input / $0.00 output per 1M tokens
Capabilities: text, tool use

Qwen3 Next 80B A3B Instruct
Qwen · active
qwen/qwen3-next-80b-a3b-instruct

MoE instruction-tuned model with tool use and 262K context. Free tier.

Context: 262K | Max output: 8.2K | Latency: fast
Price: $0.00 input / $0.00 output per 1M tokens
Capabilities: text, tool use
Step 3.5 Flash
StepFun · active
stepfun/step-3-5-flash

MoE model (196B total, 11B active) with strong reasoning and 256K context. Free tier.

Context: 256K | Max output: 8.2K | Latency: fast
Price: $0.00 input / $0.00 output per 1M tokens
Capabilities: text, reasoning
GLM 4.5 Air
Z.ai · active
z-ai/glm-4-5-air

Reasoning model with 131K context from Zhipu AI. Free tier.

Context: 131K | Max output: 8.2K | Latency: fast
Price: $0.00 input / $0.00 output per 1M tokens
Capabilities: text, reasoning
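All prices above are quoted per 1M tokens, so a request's cost is input_tokens × input_price / 1M plus output_tokens × output_price / 1M. A small helper for comparing models, with the prices copied from the cards above:

```python
def cost_usd(input_tokens: int, output_tokens: int,
             input_per_m: float, output_per_m: float) -> float:
    """Cost of one request given per-1M-token prices from the catalog."""
    return (input_tokens * input_per_m + output_tokens * output_per_m) / 1_000_000

# Claude Haiku 4.5: $1.00 input / $5.00 output per 1M tokens
haiku = cost_usd(10_000, 2_000, 1.00, 5.00)   # $0.02
# GPT-5 Nano: $0.05 input / $0.40 output per 1M tokens
nano = cost_usd(10_000, 2_000, 0.05, 0.40)    # $0.0013
print(f"haiku=${haiku:.4f}  nano=${nano:.4f}")
```

Note that output tokens dominate the bill on most models here, since output is priced four to six times higher than input.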

Legacy Models

Still available but consider migrating to current models for improved performance.

Claude 3 Haiku (deprecated) | anthropic/claude-3-haiku | 200K context, 4.1K max output, $0.25 input / $1.25 output per 1M tokens

Claude Opus 4 (legacy) | anthropic/claude-opus-4 | 200K context, 32K max output, $15.00 input / $75.00 output per 1M tokens

Claude Opus 4.1 (legacy) | anthropic/claude-opus-4-1 | 200K context, 32K max output, $15.00 input / $75.00 output per 1M tokens

Claude Opus 4.5 (legacy) | anthropic/claude-opus-4-5 | 200K context, 64K max output, $5.00 input / $25.00 output per 1M tokens

Claude Sonnet 4 (legacy) | anthropic/claude-sonnet-4 | 200K context, 64K max output, $3.00 input / $15.00 output per 1M tokens

Claude Sonnet 4.5 (legacy) | anthropic/claude-sonnet-4-5 | 200K context, 64K max output, $3.00 input / $15.00 output per 1M tokens

Gemini 2.0 Flash (deprecated) | google/gemini-2.0-flash | 1M context, 8.2K max output, $0.10 input / $0.40 output per 1M tokens

GPT-3.5 Turbo (deprecated) | openai/gpt-3.5-turbo | 16.4K context, 4.1K max output, $0.50 input / $1.50 output per 1M tokens

GPT-4 (legacy) | openai/gpt-4 | 8.2K context, 8.2K max output, $30.00 input / $60.00 output per 1M tokens

GPT-4 Turbo (legacy) | openai/gpt-4-turbo | 128K context, 4.1K max output, $10.00 input / $30.00 output per 1M tokens

GPT-4.1 Mini (legacy) | openai/gpt-4.1-mini | 1M context, 32.8K max output, $0.40 input / $1.60 output per 1M tokens

GPT-4.1 Nano (legacy) | openai/gpt-4.1-nano | 1M context, 32.8K max output, $0.10 input / $0.40 output per 1M tokens

GPT-4o (legacy) | openai/gpt-4o | 128K context, 16.4K max output, $2.50 input / $10.00 output per 1M tokens

GPT-4o Mini (legacy) | openai/gpt-4o-mini | 128K context, 16.4K max output, $0.15 input / $0.60 output per 1M tokens

GPT-5 Pro (legacy) | openai/gpt-5-pro | 400K context, 272K max output, $15.00 input / $120.00 output per 1M tokens

GPT-5.1 (legacy) | openai/gpt-5.1 | 400K context, 128K max output, $1.25 input / $10.00 output per 1M tokens

GPT-5.2 (legacy) | openai/gpt-5.2 | 400K context, 128K max output, $1.75 input / $14.00 output per 1M tokens

GPT-5.2 Pro (legacy) | openai/gpt-5.2-pro | 400K context, 128K max output, $21.00 input / $168.00 output per 1M tokens

o1 (legacy) | openai/o1 | 200K context, 100K max output, $15.00 input / $60.00 output per 1M tokens

o1-mini (deprecated) | openai/o1-mini | 128K context, 65.5K max output, $1.10 input / $4.40 output per 1M tokens

o1-pro (legacy) | openai/o1-pro | 200K context, 100K max output, $150.00 input / $600.00 output per 1M tokens

o3-mini (legacy) | openai/o3-mini | 200K context, 100K max output, $1.10 input / $4.40 output per 1M tokens