Models & Providers

Model ID Format

All model IDs follow the pattern <provider>/<model-name>:

gemini/gemini-2.5-flash
bedrock/nova-pro
bedrock/qwen3-32b
bedrock/kimi-k2.5
openrouter/anthropic/claude-3-5-sonnet
kimi/moonshot-v1-8k
minimax/MiniMax-M2.5
local/llama-3

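The gateway parses the prefix (the segment before the first slash) to route each request to the correct provider. The remainder of the ID is the model name, which may itself contain slashes, as in openrouter/anthropic/claude-3-5-sonnet.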
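
The prefix routing described above can be sketched as follows. This is a minimal illustration, not the gateway's actual code: `parseModelId` and `ParsedModelId` are hypothetical names, and splitting on the first slash is an assumption inferred from IDs such as openrouter/anthropic/claude-3-5-sonnet.

```typescript
// Hypothetical helper: split a model ID into its provider prefix and model name.
// Assumption: only the first "/" separates provider from model name.
interface ParsedModelId {
  provider: string;
  model: string;
}

function parseModelId(id: string): ParsedModelId {
  const slash = id.indexOf("/");
  // Reject IDs with no slash, an empty provider, or an empty model name.
  if (slash <= 0 || slash === id.length - 1) {
    throw new Error(`Invalid model ID "${id}": expected <provider>/<model-name>`);
  }
  return { provider: id.slice(0, slash), model: id.slice(slash + 1) };
}

const example = parseModelId("openrouter/anthropic/claude-3-5-sonnet");
// provider is "openrouter"; model is "anthropic/claude-3-5-sonnet"
console.log(example.provider, example.model);
```

Splitting on the first slash only keeps nested OpenRouter names intact, so the provider can receive the model name unchanged.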

Providers

Gemini (Google)

| Model | Prompt ($/1M) | Completion ($/1M) | Context |
| --- | --- | --- | --- |
| gemini/gemini-2.5-pro | $1.25 | $10.00 | 1M |
| gemini/gemini-2.5-flash | $0.15 | $0.60 | 1M |
| gemini/gemini-2.5-flash-lite | $0.10 | $0.40 | 1M |
| gemini/gemini-2.0-flash | $0.10 | $0.40 | 1M |
| gemini/gemini-1.5-pro | $1.25 | $5.00 | 2M |
| gemini/gemini-1.5-flash | $0.075 | $0.30 | 1M |

Kimi (Moonshot AI)

| Model | Prompt ($/1M) | Completion ($/1M) | Context |
| --- | --- | --- | --- |
| kimi/kimi-k2.5 | $0.60 | $3.00 | 262k |
| kimi/moonshot-v1-8k | $0.20 | $2.00 | 8k |
| kimi/moonshot-v1-32k | $1.00 | $3.00 | 32k |
| kimi/moonshot-v1-128k | $2.00 | $5.00 | 131k |

INFO

kimi-k2.5 ignores temperature, top_p, and penalty parameters.

MiniMax

| Model | Prompt ($/1M) | Completion ($/1M) | Context |
| --- | --- | --- | --- |
| minimax/MiniMax-M2.7 | $0.30 | $1.20 | 204k |
| minimax/MiniMax-M2.5 | $0.118 | $0.95 | 196k |
| minimax/MiniMax-M2 | $0.255 | $1.00 | 196k |
| minimax/MiniMax-M1 | $0.40 | $1.76 | 1M |
| minimax/MiniMax-Text-01 | $0.20 | $1.10 | 1M |

INFO

MiniMax ignores presence_penalty and frequency_penalty parameters.
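
A client may want to warn callers when they set a parameter the target model ignores. The sketch below is one hypothetical way to do that; `IGNORED_PARAMS` and `findIgnoredParams` are illustrative names, not part of the gateway. For kimi/kimi-k2.5, the two penalty parameter names are an assumption, since the note above only says "penalty parameters". Whether the Bedrock-hosted variants behave the same way is not stated, so they are left out.

```typescript
// Parameters documented as ignored, keyed by exact model ID or by provider prefix (ending in "/").
// Assumption: "penalty parameters" for kimi-k2.5 means presence_penalty and frequency_penalty.
const IGNORED_PARAMS: Record<string, string[]> = {
  "kimi/kimi-k2.5": ["temperature", "top_p", "presence_penalty", "frequency_penalty"],
  "minimax/": ["presence_penalty", "frequency_penalty"],
};

// Return the names of request parameters that are set but would be ignored by the model.
function findIgnoredParams(model: string, params: Record<string, unknown>): string[] {
  const ignored: string[] = [];
  for (const key of Object.keys(IGNORED_PARAMS)) {
    const matches = key.endsWith("/") ? model.startsWith(key) : model === key;
    if (!matches) continue;
    for (const name of IGNORED_PARAMS[key]) {
      if (params[name] !== undefined && ignored.indexOf(name) === -1) {
        ignored.push(name);
      }
    }
  }
  return ignored;
}

const unused = findIgnoredParams("minimax/MiniMax-M2.5", {
  temperature: 0.7,
  presence_penalty: 0.2,
});
if (unused.length > 0) {
  console.warn(`Ignored by this model: ${unused.join(", ")}`);
}
```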

AWS Bedrock (ap-northeast-1)

Provides access to 30+ models from multiple providers via AWS Bedrock in the Tokyo region (ap-northeast-1). The full list is available from GET /v1/models.

Anthropic Claude

| Model | Prompt ($/1M) | Completion ($/1M) | Context |
| --- | --- | --- | --- |
| bedrock/claude-opus-4-7 | $5.00 | $25.00 | 200k |
| bedrock/claude-opus-4-6 | $5.00 | $25.00 | 200k |
| bedrock/claude-sonnet-4-6 | $3.00 | $15.00 | 200k |
| bedrock/claude-haiku-4-5-20251001 | $1.00 | $5.00 | 200k |

Amazon Nova

| Model | Prompt ($/1M) | Completion ($/1M) | Context |
| --- | --- | --- | --- |
| bedrock/nova-2-lite | $0.08 | $0.32 | 300k |
| bedrock/nova-pro | $0.80 | $3.20 | 300k |
| bedrock/nova-lite | $0.06 | $0.24 | 300k |
| bedrock/nova-micro | $0.035 | $0.14 | 128k |

DeepSeek

| Model | Prompt ($/1M) | Completion ($/1M) | Context |
| --- | --- | --- | --- |
| bedrock/deepseek-v3-2 | $0.74 | $2.22 | 64k |
| bedrock/deepseek-v3 | $0.55 | $1.65 | 64k |

Google Gemma

| Model | Prompt ($/1M) | Completion ($/1M) | Context |
| --- | --- | --- | --- |
| bedrock/gemma-3-27b | $0.28 | $0.46 | 8k |
| bedrock/gemma-3-12b | $0.11 | $0.35 | 8k |
| bedrock/gemma-3-4b | $0.05 | $0.10 | 8k |

MiniMax (via Bedrock)

| Model | Prompt ($/1M) | Completion ($/1M) | Context |
| --- | --- | --- | --- |
| bedrock/minimax-m2-5 | $0.36 | $1.44 | 1M |
| bedrock/minimax-m2-1 | $0.36 | $1.44 | 1M |
| bedrock/minimax-m2 | $0.36 | $1.45 | 1M |

Mistral AI

| Model | Prompt ($/1M) | Completion ($/1M) | Context |
| --- | --- | --- | --- |
| bedrock/mistral-large-3 | $0.61 | $1.82 | 128k |
| bedrock/devstral-2-123b | $0.48 | $2.40 | 128k |
| bedrock/magistral-small-2509 | $0.61 | $1.82 | 40k |
| bedrock/ministral-14b | $0.24 | $0.24 | 128k |
| bedrock/ministral-8b | $0.18 | $0.18 | 128k |
| bedrock/ministral-3b | $0.12 | $0.12 | 128k |
| bedrock/voxtral-small-24b | $0.20 | $0.60 | 32k |
| bedrock/voxtral-mini-3b | $0.05 | $0.05 | 32k |

Moonshot AI (via Bedrock)

| Model | Prompt ($/1M) | Completion ($/1M) | Context |
| --- | --- | --- | --- |
| bedrock/kimi-k2.5 | $0.72 | $3.60 | 131k |
| bedrock/kimi-k2-thinking | $0.73 | $3.03 | 131k |

INFO

bedrock/kimi-k2-thinking is a reasoning model that uses an internal thinking budget. Use max_tokens ≥ 1000 to ensure output is produced.

NVIDIA Nemotron

| Model | Prompt ($/1M) | Completion ($/1M) | Context |
| --- | --- | --- | --- |
| bedrock/nemotron-super-120b | $0.18 | $0.78 | 128k |
| bedrock/nemotron-nano-30b | $0.07 | $0.29 | 128k |
| bedrock/nemotron-nano-12b-v2 | $0.07 | $0.29 | 64k |
| bedrock/nemotron-nano-9b-v2 | $0.07 | $0.28 | 64k |

OpenAI OSS (via Bedrock)

| Model | Prompt ($/1M) | Completion ($/1M) | Context |
| --- | --- | --- | --- |
| bedrock/gpt-oss-120b | $0.155 | $0.618 | 32k |
| bedrock/gpt-oss-20b | $0.072 | $0.309 | 32k |

INFO

bedrock/gpt-oss-20b requires max_tokens ≥ 500 to produce output.
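
The two max_tokens minimums noted for Bedrock models (1000 for bedrock/kimi-k2-thinking, 500 for bedrock/gpt-oss-20b) could be enforced on the client side with a small guard like the one below. `MIN_MAX_TOKENS` and `ensureMinMaxTokens` are hypothetical names. Leaving an unset max_tokens untouched is also an assumption, since the provider default is not documented here.

```typescript
// Minimum max_tokens values taken from the notes in this section.
const MIN_MAX_TOKENS: Record<string, number> = {
  "bedrock/kimi-k2-thinking": 1000,
  "bedrock/gpt-oss-20b": 500,
};

// Raise max_tokens to the documented minimum when the caller set it too low.
// An unset value is returned as-is (assumption: the provider default is sufficient).
function ensureMinMaxTokens(model: string, maxTokens?: number): number | undefined {
  const floor = MIN_MAX_TOKENS[model];
  if (floor === undefined || maxTokens === undefined) {
    return maxTokens;
  }
  return Math.max(maxTokens, floor);
}

// A request for 200 tokens is raised to 1000 for the reasoning model.
console.log(ensureMinMaxTokens("bedrock/kimi-k2-thinking", 200));
```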

Qwen

| Model | Prompt ($/1M) | Completion ($/1M) | Context |
| --- | --- | --- | --- |
| bedrock/qwen3-vl-235b | $0.64 | $3.22 | 128k |
| bedrock/qwen3-coder-480b | $0.60 | $1.44 | 128k |
| bedrock/qwen3-235b-a22b | $0.227 | $0.906 | 128k |
| bedrock/qwen3-next-80b | $0.18 | $1.45 | 128k |
| bedrock/qwen3-coder-30b | $0.20 | $0.60 | 128k |
| bedrock/qwen3-32b | $0.155 | $0.618 | 128k |

Z.AI (GLM)

| Model | Prompt ($/1M) | Completion ($/1M) | Context |
| --- | --- | --- | --- |
| bedrock/glm-5 | $1.20 | $3.84 | 128k |
| bedrock/glm-4-7 | $0.72 | $2.64 | 128k |
| bedrock/glm-4-7-flash | $0.07 | $0.40 | 128k |

OpenRouter

Provides access to 400+ models. The model list is fetched dynamically from the OpenRouter API, and pricing is taken from the API response.

Local

For self-hosted models served through Ollama, vLLM, or any OpenAI-compatible endpoint. Local models are free: both prompt and completion tokens are priced at $0. The context window defaults to 4096 tokens when the model metadata does not specify one.
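
The defaults for local models can be illustrated with a short sketch. `LocalModelMetadata`, its `contextLength` field, and `describeLocalModel` are hypothetical names chosen for this example; the actual metadata shape depends on the endpoint.

```typescript
// Default context window for local models, as stated above.
const DEFAULT_LOCAL_CONTEXT = 4096;

// Hypothetical metadata shape: the context length may be absent.
interface LocalModelMetadata {
  id: string;
  contextLength?: number;
}

interface LocalModelInfo {
  id: string;
  promptPricePerMillion: number;
  completionPricePerMillion: number;
  context: number;
}

// Build the pricing and context entry for a local model.
function describeLocalModel(meta: LocalModelMetadata): LocalModelInfo {
  return {
    id: `local/${meta.id}`,
    promptPricePerMillion: 0, // local models are free
    completionPricePerMillion: 0,
    context: meta.contextLength ?? DEFAULT_LOCAL_CONTEXT,
  };
}

console.log(describeLocalModel({ id: "llama-3" }).context); // 4096
```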

Pricing

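Prices shown above are downstream costs in USD per 1M tokens (what the gateway pays the provider). The gateway applies a configurable markup (default 20%) on top.

Effective cost = downstream cost × (1 + markup / 100), where markup is a percentage. With the default 20% markup, a downstream cost of $1.00 is billed as $1.20.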
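
As a worked example of the formula, the sketch below computes the cost of a single request from token counts. The function and type names are hypothetical, and it assumes the markup is applied to the combined prompt and completion cost. Prices are the bedrock/nova-pro figures listed above.

```typescript
// Per-1M-token prices for one model (downstream cost, before markup).
interface ModelPrice {
  promptPerMillion: number;
  completionPerMillion: number;
}

// Effective cost = downstream cost × (1 + markup / 100).
// markupPercent defaults to 20, the default markup stated above.
function effectiveCost(
  price: ModelPrice,
  promptTokens: number,
  completionTokens: number,
  markupPercent: number = 20,
): number {
  const downstream =
    (promptTokens / 1_000_000) * price.promptPerMillion +
    (completionTokens / 1_000_000) * price.completionPerMillion;
  return downstream * (1 + markupPercent / 100);
}

const novaPro: ModelPrice = { promptPerMillion: 0.8, completionPerMillion: 3.2 };

// 1M prompt tokens + 500k completion tokens:
// downstream = 0.80 + 1.60 = 2.40, effective = 2.40 × 1.20 = 2.88
console.log(effectiveCost(novaPro, 1_000_000, 500_000).toFixed(4)); // "2.8800"
```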

Adding a New Provider

  1. Implement the LLMProvider interface in src/services/llm/
  2. Register the provider in src/services/llm/index.ts
  3. Add the API key environment variable to src/config.ts
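
The steps above might look roughly like the sketch below. The real LLMProvider interface is not shown in this document, so every shape here (`ChatRequest`, `ChatResponse`, the `chat` method, the registry functions, and the `EchoProvider` class) is hypothetical and only illustrates the general pattern of implementing an interface and registering it under a provider prefix.

```typescript
// Hypothetical request/response shapes; the real ones in src/services/llm/ may differ.
interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

interface ChatRequest {
  model: string;
  messages: ChatMessage[];
  maxTokens?: number;
}

interface ChatResponse {
  content: string;
  promptTokens: number;
  completionTokens: number;
}

// Step 1: a provider implements a common interface (assumed shape).
interface LLMProvider {
  readonly name: string; // the prefix used in model IDs, e.g. "echo" for echo/<model-name>
  chat(request: ChatRequest): Promise<ChatResponse>;
}

// Toy provider that echoes the last message back; stands in for a real API client.
class EchoProvider implements LLMProvider {
  readonly name = "echo";
  private readonly apiKey: string;

  // Step 3: the key would come from configuration; here it is passed in directly.
  constructor(apiKey: string) {
    if (!apiKey) throw new Error("EchoProvider requires an API key");
    this.apiKey = apiKey;
  }

  async chat(request: ChatRequest): Promise<ChatResponse> {
    const last = request.messages[request.messages.length - 1];
    return { content: last ? last.content : "", promptTokens: 0, completionTokens: 0 };
  }
}

// Step 2: a registry keyed by provider prefix (hypothetical; the real index.ts may differ).
const providers = new Map<string, LLMProvider>();

function registerProvider(provider: LLMProvider): void {
  if (providers.has(provider.name)) {
    throw new Error(`Provider "${provider.name}" is already registered`);
  }
  providers.set(provider.name, provider);
}

function getProvider(name: string): LLMProvider {
  const provider = providers.get(name);
  if (!provider) throw new Error(`Unknown provider "${name}"`);
  return provider;
}

registerProvider(new EchoProvider("example-key"));
```

Keying the registry by the same prefix that appears in model IDs lets the router look up a provider directly from the parsed ID.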

Released under the MIT License.