Models & Providers

Model ID Format

All model IDs follow the pattern <provider>/<model-name>:

gemini/gemini-2.5-flash
bedrock/nova-pro
bedrock/qwen3-32b
bedrock/kimi-k2.5
openrouter/anthropic/claude-3-5-sonnet
kimi/moonshot-v1-8k
minimax/MiniMax-M2.5
local/llama-3

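The gateway parses the prefix (the text before the first slash) to route each request to the correct provider. Everything after the first slash is the model name, which may itself contain slashes, as in the OpenRouter example.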
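The prefix routing described here can be sketched as a split on the first slash. This is a minimal illustration, not the gateway's actual parser: the function name parseModelId and the ParsedModelId type are hypothetical.

```typescript
// Hypothetical helper: the gateway's real parser is not shown in this document.
interface ParsedModelId {
  provider: string; // text before the first "/"
  model: string;    // everything after the first "/", may itself contain "/"
}

function parseModelId(id: string): ParsedModelId {
  const slash = id.indexOf("/");
  // Reject IDs with no slash, an empty provider, or an empty model name.
  if (slash <= 0 || slash === id.length - 1) {
    throw new Error(`Invalid model ID "${id}": expected <provider>/<model-name>`);
  }
  return { provider: id.slice(0, slash), model: id.slice(slash + 1) };
}

const geminiId = parseModelId("gemini/gemini-2.5-flash");
// geminiId.provider === "gemini", geminiId.model === "gemini-2.5-flash"

const openrouterId = parseModelId("openrouter/anthropic/claude-3-5-sonnet");
// openrouterId.provider === "openrouter", openrouterId.model === "anthropic/claude-3-5-sonnet"
```

Splitting on the first slash only keeps nested names such as anthropic/claude-3-5-sonnet intact for the provider that receives them.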

Providers

Gemini (Google)

Model	Prompt ($/1M)	Completion ($/1M)	Context
`gemini/gemini-2.5-pro`	$1.25	$10.00	1M
`gemini/gemini-2.5-flash`	$0.15	$0.60	1M
`gemini/gemini-2.5-flash-lite`	$0.10	$0.40	1M
`gemini/gemini-2.0-flash`	$0.10	$0.40	1M
`gemini/gemini-1.5-pro`	$1.25	$5.00	2M
`gemini/gemini-1.5-flash`	$0.075	$0.30	1M

Kimi (Moonshot AI)

Model	Prompt ($/1M)	Completion ($/1M)	Context
`kimi/kimi-k2.5`	$0.60	$3.00	262k
`kimi/moonshot-v1-8k`	$0.20	$2.00	8k
`kimi/moonshot-v1-32k`	$1.00	$3.00	32k
`kimi/moonshot-v1-128k`	$2.00	$5.00	131k

INFO

kimi-k2.5 ignores temperature, top_p, and penalty parameters.

MiniMax

Model	Prompt ($/1M)	Completion ($/1M)	Context
`minimax/MiniMax-M2.7`	$0.30	$1.20	204k
`minimax/MiniMax-M2.5`	$0.118	$0.95	196k
`minimax/MiniMax-M2`	$0.255	$1.00	196k
`minimax/MiniMax-M1`	$0.40	$1.76	1M
`minimax/MiniMax-Text-01`	$0.20	$1.10	1M

INFO

MiniMax ignores presence_penalty and frequency_penalty parameters.
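A client may want to detect sampling parameters that will have no effect before sending a request. The sketch below encodes the notes for kimi-k2.5 and MiniMax. The helper names are hypothetical, "penalty parameters" is read as presence_penalty and frequency_penalty, and the rules are applied only to the kimi/ and minimax/ prefixes; whether the Bedrock-hosted variants behave the same way is not stated here.

```typescript
type SamplingParams = {
  temperature?: number;
  top_p?: number;
  presence_penalty?: number;
  frequency_penalty?: number;
  max_tokens?: number;
};

// Hypothetical helper: parameters the documented notes say are ignored.
function ignoredParams(modelId: string): (keyof SamplingParams)[] {
  if (modelId === "kimi/kimi-k2.5") {
    // Assumption: "penalty parameters" means presence_penalty and frequency_penalty.
    return ["temperature", "top_p", "presence_penalty", "frequency_penalty"];
  }
  if (modelId.startsWith("minimax/")) {
    return ["presence_penalty", "frequency_penalty"];
  }
  return [];
}

// Returns the names of parameters that were set but will be ignored.
function setButIgnored(modelId: string, params: SamplingParams): string[] {
  return ignoredParams(modelId).filter((key) => params[key] !== undefined);
}

const kimiWarnings = setButIgnored("kimi/kimi-k2.5", { temperature: 0.2, max_tokens: 100 });
// kimiWarnings is ["temperature"]
```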

AWS Bedrock (ap-northeast-1)

Provides access to 30+ models from multiple vendors through AWS Bedrock in the Tokyo region (ap-northeast-1). The full list is available from GET /v1/models.

Anthropic Claude

Model	Prompt ($/1M)	Completion ($/1M)	Context
`bedrock/claude-opus-4-7`	$5.00	$25.00	200k
`bedrock/claude-opus-4-6`	$5.00	$25.00	200k
`bedrock/claude-sonnet-4-6`	$3.00	$15.00	200k
`bedrock/claude-haiku-4-5-20251001`	$1.00	$5.00	200k

Amazon Nova

Model	Prompt ($/1M)	Completion ($/1M)	Context
`bedrock/nova-2-lite`	$0.08	$0.32	300k
`bedrock/nova-pro`	$0.80	$3.20	300k
`bedrock/nova-lite`	$0.06	$0.24	300k
`bedrock/nova-micro`	$0.035	$0.14	128k

DeepSeek

Model	Prompt ($/1M)	Completion ($/1M)	Context
`bedrock/deepseek-v3-2`	$0.74	$2.22	64k
`bedrock/deepseek-v3`	$0.55	$1.65	64k

Google Gemma

Model	Prompt ($/1M)	Completion ($/1M)	Context
`bedrock/gemma-3-27b`	$0.28	$0.46	8k
`bedrock/gemma-3-12b`	$0.11	$0.35	8k
`bedrock/gemma-3-4b`	$0.05	$0.10	8k

MiniMax (via Bedrock)

Model	Prompt ($/1M)	Completion ($/1M)	Context
`bedrock/minimax-m2-5`	$0.36	$1.44	1M
`bedrock/minimax-m2-1`	$0.36	$1.44	1M
`bedrock/minimax-m2`	$0.36	$1.45	1M

Mistral AI

Model	Prompt ($/1M)	Completion ($/1M)	Context
`bedrock/mistral-large-3`	$0.61	$1.82	128k
`bedrock/devstral-2-123b`	$0.48	$2.40	128k
`bedrock/magistral-small-2509`	$0.61	$1.82	40k
`bedrock/ministral-14b`	$0.24	$0.24	128k
`bedrock/ministral-8b`	$0.18	$0.18	128k
`bedrock/ministral-3b`	$0.12	$0.12	128k
`bedrock/voxtral-small-24b`	$0.20	$0.60	32k
`bedrock/voxtral-mini-3b`	$0.05	$0.05	32k

Moonshot AI (via Bedrock)

Model	Prompt ($/1M)	Completion ($/1M)	Context
`bedrock/kimi-k2.5`	$0.72	$3.60	131k
`bedrock/kimi-k2-thinking`	$0.73	$3.03	131k

INFO

bedrock/kimi-k2-thinking is a reasoning model that uses an internal thinking budget. Use max_tokens ≥ 1000 to ensure output is produced.

NVIDIA Nemotron

Model	Prompt ($/1M)	Completion ($/1M)	Context
`bedrock/nemotron-super-120b`	$0.18	$0.78	128k
`bedrock/nemotron-nano-30b`	$0.07	$0.29	128k
`bedrock/nemotron-nano-12b-v2`	$0.07	$0.29	64k
`bedrock/nemotron-nano-9b-v2`	$0.07	$0.28	64k

OpenAI OSS (via Bedrock)

Model	Prompt ($/1M)	Completion ($/1M)	Context
`bedrock/gpt-oss-120b`	$0.155	$0.618	32k
`bedrock/gpt-oss-20b`	$0.072	$0.309	32k

INFO

bedrock/gpt-oss-20b requires max_tokens ≥ 500 to produce output.
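The max_tokens minimums noted for bedrock/kimi-k2-thinking and bedrock/gpt-oss-20b can be enforced on the client side. The following is a sketch only: the table and the function name withTokenFloor are hypothetical, and the two floors are the values stated in this document.

```typescript
// Minimum max_tokens values taken from the notes in this document.
const MIN_MAX_TOKENS: Record<string, number> = {
  "bedrock/kimi-k2-thinking": 1000,
  "bedrock/gpt-oss-20b": 500,
};

// Hypothetical helper: raises max_tokens to the documented floor, if any.
function withTokenFloor(modelId: string, maxTokens: number): number {
  const floor = MIN_MAX_TOKENS[modelId] ?? 0;
  return Math.max(maxTokens, floor);
}

const ossTokens = withTokenFloor("bedrock/gpt-oss-20b", 100);       // 500
const kimiTokens = withTokenFloor("bedrock/kimi-k2-thinking", 2000); // 2000
const novaTokens = withTokenFloor("bedrock/nova-pro", 100);          // 100
```

Raising the value silently is one possible design; rejecting the request with an error is equally reasonable if callers should know their limit was too low.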

Qwen

Model	Prompt ($/1M)	Completion ($/1M)	Context
`bedrock/qwen3-vl-235b`	$0.64	$3.22	128k
`bedrock/qwen3-coder-480b`	$0.60	$1.44	128k
`bedrock/qwen3-235b-a22b`	$0.227	$0.906	128k
`bedrock/qwen3-next-80b`	$0.18	$1.45	128k
`bedrock/qwen3-coder-30b`	$0.20	$0.60	128k
`bedrock/qwen3-32b`	$0.155	$0.618	128k

Z.AI (GLM)

Model	Prompt ($/1M)	Completion ($/1M)	Context
`bedrock/glm-5`	$1.20	$3.84	128k
`bedrock/glm-4-7`	$0.72	$2.64	128k
`bedrock/glm-4-7-flash`	$0.07	$0.40	128k

OpenRouter

Provides access to 400+ models. Models are fetched dynamically from the OpenRouter API. Pricing comes from the API response.

Local

For self-hosted models served by Ollama, vLLM, or any OpenAI-compatible endpoint. Local models are free ($0 for both prompt and completion tokens). The context window defaults to 4096 tokens if the model metadata does not specify one.

Pricing

Prices shown above are downstream costs (what the gateway pays the provider). The gateway applies a configurable markup (default 20%) on top.

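Effective cost = downstream cost × (1 + markup / 100), where markup is the configured percentage. With the default 20% markup, a downstream cost of $1.00 is billed at $1.20.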
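As a worked example of the formula, the sketch below computes the downstream and effective cost of a single request from per-million-token prices. The type and function names are hypothetical; the prices are those listed for gemini/gemini-2.5-flash and the markup is the documented default.

```typescript
interface ModelPrice {
  promptPerMillion: number;     // USD per 1M prompt tokens
  completionPerMillion: number; // USD per 1M completion tokens
}

// Downstream cost in USD: what the gateway pays the provider.
function downstreamCost(price: ModelPrice, promptTokens: number, completionTokens: number): number {
  return (
    (promptTokens * price.promptPerMillion + completionTokens * price.completionPerMillion) /
    1_000_000
  );
}

// Effective cost in USD after applying the markup percentage (default 20).
function effectiveCost(downstream: number, markupPercent: number = 20): number {
  return downstream * (1 + markupPercent / 100);
}

// gemini/gemini-2.5-flash: $0.15 prompt, $0.60 completion per 1M tokens.
const flashPrice: ModelPrice = { promptPerMillion: 0.15, completionPerMillion: 0.6 };
const flashDownstream = downstreamCost(flashPrice, 1_000_000, 500_000); // about 0.45 USD
const flashBilled = effectiveCost(flashDownstream);                      // about 0.54 USD
```

The results are floating-point values, so compare them with a tolerance or round to a fixed number of decimals before display.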

Adding a New Provider

1. Implement the LLMProvider interface in src/services/llm/
2. Register the provider in src/services/llm/index.ts
3. Add the API key env variable to src/config.ts
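The three steps above might look roughly like the sketch below. The real LLMProvider interface is not shown in this document, so every shape here is hypothetical: the ChatRequest and ChatResponse types, the prefix field, the complete method, the registry, the "acme" provider, and the ACME_API_KEY variable name are all illustrative assumptions. Consult the actual interface in src/services/llm/ before writing a provider.

```typescript
// Hypothetical shapes; the real LLMProvider interface may differ.
interface ChatRequest { model: string; prompt: string; maxTokens?: number }
interface ChatResponse { text: string; promptTokens: number; completionTokens: number }

interface LLMProvider {
  readonly prefix: string; // provider prefix used in model IDs, e.g. "acme"
  complete(req: ChatRequest): Promise<ChatResponse>;
}

// Step 1: implement the interface (would live in src/services/llm/).
class AcmeProvider implements LLMProvider {
  readonly prefix = "acme";
  private readonly apiKey: string;

  constructor(apiKey: string) {
    this.apiKey = apiKey;
  }

  async complete(req: ChatRequest): Promise<ChatResponse> {
    if (!this.apiKey) throw new Error("ACME_API_KEY is not set");
    // A real provider would call the upstream API here.
    throw new Error(`AcmeProvider.complete is not implemented (model: ${req.model})`);
  }
}

// Step 2: register the provider (would live in src/services/llm/index.ts).
const providers = new Map<string, LLMProvider>();

function registerProvider(provider: LLMProvider): void {
  providers.set(provider.prefix, provider);
}

// Step 3: read the API key from the environment (would live in src/config.ts).
function loadAcmeKey(env: Record<string, string | undefined>): string {
  const key = env["ACME_API_KEY"];
  if (!key) throw new Error("ACME_API_KEY is not set");
  return key;
}

// In the gateway, env would typically be process.env.
registerProvider(new AcmeProvider(loadAcmeKey({ ACME_API_KEY: "example-key" })));
```

Keying the registry by prefix means a model ID such as acme/some-model could be routed with a single map lookup once the ID has been split on its first slash.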
