Models & Providers
Model ID Format
All model IDs follow the pattern <provider>/<model-name>:
gemini/gemini-2.5-flash
bedrock/nova-pro
bedrock/qwen3-32b
bedrock/kimi-k2.5
openrouter/anthropic/claude-3-5-sonnet
kimi/moonshot-v1-8k
minimax/MiniMax-M2.5
local/llama-3
The gateway parses the prefix to route requests to the correct provider.
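The prefix routing described above can be sketched as follows. This is a minimal illustration, not the gateway's actual parser: `parseModelId` and `ParsedModelId` are hypothetical names, and splitting at the first `/` only is an assumption inferred from IDs such as openrouter/anthropic/claude-3-5-sonnet, where the model name itself contains a slash.

```typescript
// Hypothetical result type for a parsed model ID.
interface ParsedModelId {
  provider: string; // text before the first "/"
  model: string;    // everything after the first "/"
}

// Splits at the first "/" only, so multi-segment model names stay intact.
function parseModelId(id: string): ParsedModelId {
  const slash = id.indexOf("/");
  // Reject IDs with no slash, an empty provider, or an empty model name.
  if (slash <= 0 || slash === id.length - 1) {
    throw new Error(`Invalid model ID "${id}": expected <provider>/<model-name>`);
  }
  return { provider: id.slice(0, slash), model: id.slice(slash + 1) };
}
```

With this sketch, parseModelId("openrouter/anthropic/claude-3-5-sonnet") yields provider "openrouter" and model "anthropic/claude-3-5-sonnet", while an ID without a slash throws.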
Providers
Gemini (Google)
| Model | Prompt ($/1M) | Completion ($/1M) | Context |
|---|---|---|---|
| gemini/gemini-2.5-pro | $1.25 | $10.00 | 1M |
| gemini/gemini-2.5-flash | $0.15 | $0.60 | 1M |
| gemini/gemini-2.5-flash-lite | $0.10 | $0.40 | 1M |
| gemini/gemini-2.0-flash | $0.10 | $0.40 | 1M |
| gemini/gemini-1.5-pro | $1.25 | $5.00 | 2M |
| gemini/gemini-1.5-flash | $0.075 | $0.30 | 1M |
Kimi (Moonshot AI)
| Model | Prompt ($/1M) | Completion ($/1M) | Context |
|---|---|---|---|
| kimi/kimi-k2.5 | $0.60 | $3.00 | 262k |
| kimi/moonshot-v1-8k | $0.20 | $2.00 | 8k |
| kimi/moonshot-v1-32k | $1.00 | $3.00 | 32k |
| kimi/moonshot-v1-128k | $2.00 | $5.00 | 131k |
INFO
kimi-k2.5 ignores temperature, top_p, and penalty parameters.
MiniMax
| Model | Prompt ($/1M) | Completion ($/1M) | Context |
|---|---|---|---|
| minimax/MiniMax-M2.7 | $0.30 | $1.20 | 204k |
| minimax/MiniMax-M2.5 | $0.118 | $0.95 | 196k |
| minimax/MiniMax-M2 | $0.255 | $1.00 | 196k |
| minimax/MiniMax-M1 | $0.40 | $1.76 | 1M |
| minimax/MiniMax-Text-01 | $0.20 | $1.10 | 1M |
INFO
MiniMax ignores presence_penalty and frequency_penalty parameters.
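Because these parameters have no effect on the models named in the notes above, a client may prefer to drop them before sending a request so that the payload reflects what is actually applied. The sketch below is one possible client-side helper, not part of the gateway: `ignoredParams` and `stripIgnored` are hypothetical names, the request is assumed to be an OpenAI-style JSON object, and "penalty parameters" for kimi-k2.5 is read as presence_penalty and frequency_penalty.

```typescript
// Returns the request fields that the notes above say are ignored for a model ID.
function ignoredParams(modelId: string): string[] {
  if (modelId === "kimi/kimi-k2.5") {
    // Assumption: "penalty parameters" means presence_penalty and frequency_penalty.
    return ["temperature", "top_p", "presence_penalty", "frequency_penalty"];
  }
  if (modelId.indexOf("minimax/") === 0) {
    return ["presence_penalty", "frequency_penalty"];
  }
  return [];
}

// Returns a shallow copy of the request without the ignored fields.
function stripIgnored(
  modelId: string,
  request: Record<string, unknown>
): Record<string, unknown> {
  const ignored = ignoredParams(modelId);
  const result: Record<string, unknown> = {};
  for (const key of Object.keys(request)) {
    if (ignored.indexOf(key) === -1) {
      result[key] = request[key];
    }
  }
  return result;
}
```

The helper only covers the direct kimi/ and minimax/ providers. Whether the same limits apply to the Bedrock-hosted variants is not stated in this document, so the sketch leaves those requests unchanged.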
AWS Bedrock (ap-northeast-1)
Access to 30+ models across multiple providers via AWS Bedrock in Tokyo. Full list available from GET /v1/models.
Anthropic Claude
| Model | Prompt ($/1M) | Completion ($/1M) | Context |
|---|---|---|---|
| bedrock/claude-opus-4-7 | $5.00 | $25.00 | 200k |
| bedrock/claude-opus-4-6 | $5.00 | $25.00 | 200k |
| bedrock/claude-sonnet-4-6 | $3.00 | $15.00 | 200k |
| bedrock/claude-haiku-4-5-20251001 | $1.00 | $5.00 | 200k |
Amazon Nova
| Model | Prompt ($/1M) | Completion ($/1M) | Context |
|---|---|---|---|
| bedrock/nova-2-lite | $0.08 | $0.32 | 300k |
| bedrock/nova-pro | $0.80 | $3.20 | 300k |
| bedrock/nova-lite | $0.06 | $0.24 | 300k |
| bedrock/nova-micro | $0.035 | $0.14 | 128k |
DeepSeek
| Model | Prompt ($/1M) | Completion ($/1M) | Context |
|---|---|---|---|
| bedrock/deepseek-v3-2 | $0.74 | $2.22 | 64k |
| bedrock/deepseek-v3 | $0.55 | $1.65 | 64k |
Google Gemma
| Model | Prompt ($/1M) | Completion ($/1M) | Context |
|---|---|---|---|
| bedrock/gemma-3-27b | $0.28 | $0.46 | 8k |
| bedrock/gemma-3-12b | $0.11 | $0.35 | 8k |
| bedrock/gemma-3-4b | $0.05 | $0.10 | 8k |
MiniMax (via Bedrock)
| Model | Prompt ($/1M) | Completion ($/1M) | Context |
|---|---|---|---|
| bedrock/minimax-m2-5 | $0.36 | $1.44 | 1M |
| bedrock/minimax-m2-1 | $0.36 | $1.44 | 1M |
| bedrock/minimax-m2 | $0.36 | $1.45 | 1M |
Mistral AI
| Model | Prompt ($/1M) | Completion ($/1M) | Context |
|---|---|---|---|
| bedrock/mistral-large-3 | $0.61 | $1.82 | 128k |
| bedrock/devstral-2-123b | $0.48 | $2.40 | 128k |
| bedrock/magistral-small-2509 | $0.61 | $1.82 | 40k |
| bedrock/ministral-14b | $0.24 | $0.24 | 128k |
| bedrock/ministral-8b | $0.18 | $0.18 | 128k |
| bedrock/ministral-3b | $0.12 | $0.12 | 128k |
| bedrock/voxtral-small-24b | $0.20 | $0.60 | 32k |
| bedrock/voxtral-mini-3b | $0.05 | $0.05 | 32k |
Moonshot AI (via Bedrock)
| Model | Prompt ($/1M) | Completion ($/1M) | Context |
|---|---|---|---|
| bedrock/kimi-k2.5 | $0.72 | $3.60 | 131k |
| bedrock/kimi-k2-thinking | $0.73 | $3.03 | 131k |
INFO
bedrock/kimi-k2-thinking is a reasoning model that uses an internal thinking budget. Use max_tokens ≥ 1000 to ensure output is produced.
NVIDIA Nemotron
| Model | Prompt ($/1M) | Completion ($/1M) | Context |
|---|---|---|---|
| bedrock/nemotron-super-120b | $0.18 | $0.78 | 128k |
| bedrock/nemotron-nano-30b | $0.07 | $0.29 | 128k |
| bedrock/nemotron-nano-12b-v2 | $0.07 | $0.29 | 64k |
| bedrock/nemotron-nano-9b-v2 | $0.07 | $0.28 | 64k |
OpenAI OSS (via Bedrock)
| Model | Prompt ($/1M) | Completion ($/1M) | Context |
|---|---|---|---|
| bedrock/gpt-oss-120b | $0.155 | $0.618 | 32k |
| bedrock/gpt-oss-20b | $0.072 | $0.309 | 32k |
INFO
bedrock/gpt-oss-20b requires max_tokens ≥ 500 to produce output.
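The max_tokens minimums stated in the notes for bedrock/kimi-k2-thinking and bedrock/gpt-oss-20b can be enforced on the client before a request is sent. The following is a small sketch under that assumption; `MIN_MAX_TOKENS` and `ensureMaxTokens` are hypothetical names, and the document does not say whether the gateway applies any such adjustment itself.

```typescript
// Minimum max_tokens values taken from the notes in this section.
const MIN_MAX_TOKENS: Record<string, number> = {
  "bedrock/kimi-k2-thinking": 1000,
  "bedrock/gpt-oss-20b": 500,
};

// Raises a too-small max_tokens to the documented minimum.
// An unset value is left unset so the provider default still applies.
function ensureMaxTokens(modelId: string, requested?: number): number | undefined {
  const floor = MIN_MAX_TOKENS[modelId];
  if (floor === undefined || requested === undefined) {
    return requested;
  }
  return Math.max(requested, floor);
}
```

For example, ensureMaxTokens("bedrock/gpt-oss-20b", 100) returns 500, while a value already above the minimum, or any value for a model without a documented minimum, passes through unchanged.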
Qwen
| Model | Prompt ($/1M) | Completion ($/1M) | Context |
|---|---|---|---|
| bedrock/qwen3-vl-235b | $0.64 | $3.22 | 128k |
| bedrock/qwen3-coder-480b | $0.60 | $1.44 | 128k |
| bedrock/qwen3-235b-a22b | $0.227 | $0.906 | 128k |
| bedrock/qwen3-next-80b | $0.18 | $1.45 | 128k |
| bedrock/qwen3-coder-30b | $0.20 | $0.60 | 128k |
| bedrock/qwen3-32b | $0.155 | $0.618 | 128k |
Z.AI (GLM)
| Model | Prompt ($/1M) | Completion ($/1M) | Context |
|---|---|---|---|
| bedrock/glm-5 | $1.20 | $3.84 | 128k |
| bedrock/glm-4-7 | $0.72 | $2.64 | 128k |
| bedrock/glm-4-7-flash | $0.07 | $0.40 | 128k |
OpenRouter
Provides access to 400+ models. Models are fetched dynamically from the OpenRouter API. Pricing comes from the API response.
Local
For self-hosted models (Ollama, vLLM, or any OpenAI-compatible endpoint). Free pricing ($0/$0). Context defaults to 4096 if not specified by the model metadata.
Pricing
Prices shown above are downstream costs (what the gateway pays the provider). The gateway applies a configurable markup (default 20%) on top.
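Effective cost = downstream cost × (1 + markup / 100)
For example, at the default 20% markup, prompt tokens for gemini/gemini-2.5-pro cost $1.25 × 1.20 = $1.50 per 1M.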
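The formula can be applied to a whole request as in the sketch below. `ModelPrice` and `effectiveCost` are hypothetical names, prices are in USD per 1M tokens as in the tables, and applying the markup to the combined prompt and completion cost is an assumption about how the gateway bills.

```typescript
// Downstream prices in USD per 1M tokens, as listed in the provider tables.
interface ModelPrice {
  promptPerM: number;
  completionPerM: number;
}

// Computes the effective (marked-up) cost of one request in USD.
// markupPercent defaults to 20, the default markup stated above.
function effectiveCost(
  price: ModelPrice,
  promptTokens: number,
  completionTokens: number,
  markupPercent: number = 20
): number {
  const downstream =
    (promptTokens / 1_000_000) * price.promptPerM +
    (completionTokens / 1_000_000) * price.completionPerM;
  return downstream * (1 + markupPercent / 100);
}

// bedrock/nova-pro: $0.80 prompt and $3.20 completion per 1M tokens.
// Downstream: 0.80 + 1.60 = 2.40 USD; with the 20% markup, about 2.88 USD.
const novaProCost = effectiveCost({ promptPerM: 0.8, completionPerM: 3.2 }, 1_000_000, 500_000);
```

Results are floating-point values, so they should be rounded before being displayed or compared.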
Adding a New Provider
- Implement the LLMProvider interface in src/services/llm/
- Register the provider in src/services/llm/index.ts
- Add the API key env variable to src/config.ts
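The three steps above might look roughly like the sketch below. The real LLMProvider interface is not shown in this document, so everything here is hypothetical: `SketchLLMProvider`, `EchoProvider`, `registerProvider`, `providerFor`, `createEchoProvider`, and the `ECHO_API_KEY` variable are made-up names meant only to illustrate the shape of the work. The actual method signatures in src/services/llm/ should be followed instead.

```typescript
// Hypothetical stand-in for the real LLMProvider interface, which may differ.
interface SketchLLMProvider {
  readonly prefix: string; // the <provider> part of the model ID
  complete(model: string, prompt: string): Promise<string>;
}

// Step 1: implement the interface. This provider echoes the prompt
// instead of calling a real API.
class EchoProvider implements SketchLLMProvider {
  readonly prefix = "echo";
  complete(model: string, prompt: string): Promise<string> {
    return Promise.resolve(`[${model}] ${prompt}`);
  }
}

// Step 2: register the provider under its prefix (hypothetical registry).
const registry: Record<string, SketchLLMProvider> = {};

function registerProvider(provider: SketchLLMProvider): void {
  if (registry[provider.prefix] !== undefined) {
    throw new Error(`Provider "${provider.prefix}" is already registered`);
  }
  registry[provider.prefix] = provider;
}

// Looks up the provider by the text before the first "/" in the model ID.
function providerFor(modelId: string): SketchLLMProvider {
  const slash = modelId.indexOf("/");
  const prefix = slash === -1 ? modelId : modelId.slice(0, slash);
  const provider = registry[prefix];
  if (provider === undefined) {
    throw new Error(`No provider registered for prefix "${prefix}"`);
  }
  return provider;
}

// Step 3: require the API key from configuration. ECHO_API_KEY is a made-up name.
function createEchoProvider(env: Record<string, string | undefined>): EchoProvider {
  if (!env["ECHO_API_KEY"]) {
    throw new Error("ECHO_API_KEY is not set");
  }
  return new EchoProvider();
}
```

In this sketch, calling registerProvider(createEchoProvider(env)) at startup would make providerFor("echo/some-model") return the new provider. Failing fast on a missing key is a design choice that surfaces configuration errors before any request is routed.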