Model Pricing

Pay only for what you use, with simple per-token pricing across all our models.

| Model | Description | Input ($/M tokens) | Output ($/M tokens) | Speed (tokens/sec) | Context window (tokens) |
|---|---|---|---|---|---|
| kimi-k2 (Recommended) | SOTA AI model with 32B active parameters | $0.10 | $0.25 | ~50 | 131k |
| gpt-oss-120b | Open-source OpenAI model | $0.07 | $0.27 | ~200 | 131k |
| kimi-k2-turbo | Kimi K2 at 400+ t/s | $0.50 | $1.00 | ~500 | 131k |
| glm-4.5 | Highly accurate model (set in instant-response mode) | $0.10 | $0.10 | ~80 | 131k |
| qwen3-235b-a22b-2507-thinking | Newest version of the model | $0.10 | $0.30 | ~20 | 131k |
| qwen3-235b-a22b-2507-thinking-turbo | Newest version of the model | $0.50 | $1.00 | ~300 | 131k |
| qwen3-coder | Incredible Qwen3 coding model | $0.15 | $0.35 | ~30 | 262k |
| qwen3-coder-turbo | Super-fast qwen3-coder deployment | $0.20 | $0.50 | ~150 | 131k |
| qwen3-235b-a22b-2507-instruct | Newest version of the model | $0.10 | $0.25 | ~30 | 262k |
| qwen3-235b-a22b | Largest Qwen3 reasoning model | $0.12 | $0.20 | ~60 | 33k |
| deepseek-r1-0528 | Updated version of deepseek-r1 | $0.25 | $0.25 | ~100 | 131k |
| deepseek-r1-0528-turbo | deepseek-r1-0528 at 200+ t/s | $1.00 | $2.00 | ~200 | 131k |
| deepseek-v3-0324 | Newest version of deepseek-v3 | $0.20 | $0.25 | ~30 | 131k |
| deepseek-v3-0324-turbo | Fastest DeepSeek-V3-0324 deployment available | $0.50 | $1.00 | ~325 | 131k |
| llama3.3-70b | Open weights, strong performance | $0.02 | $0.10 | ~30 | 131k |
| llama3.1-8b | Smaller Llama-3.1 model | $0.01 | $0.06 | ~50 | 131k |
| llama-4-scout | Powerful model with a large context window | $0.08 | $0.40 | ~65 | 262k |
| deepseek-r1-distill-llama-70b | Smaller than DeepSeek-R1, decent performance | $0.10 | $0.10 | ~50 | 131k |
| deepseek-r1-distill-qwen-32b | Smaller than the Llama distillation, good for code | $0.10 | $0.10 | ~50 | 131k |
| gemma-3-27b-it | Highly performant non-reasoning model | $0.04 | $0.10 | ~80 | 131k |
| multilingual-e5-large-instruct | Fast, inexpensive embedding model | $0.02 | $0.02 | ~75 | 512 |
| gemma-3n-e4b-it | Best 4B Gemma model (as of release) | $0.03 | $0.05 | ~70 | 128k |
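To see how per-million-token pricing translates into the cost of a single request, here is a minimal sketch in Python. The `PRICES` dictionary and the `estimate_cost` helper are illustrative (not part of any official SDK); the prices shown are taken from the table above.

```python
# Illustrative helper: estimate the dollar cost of one request from
# per-million-token prices. Prices below come from the pricing table.
PRICES = {  # model: (input $/M tokens, output $/M tokens)
    "kimi-k2": (0.10, 0.25),
    "deepseek-v3-0324": (0.20, 0.25),
    "llama3.1-8b": (0.01, 0.06),
}

def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the request cost in dollars for the given token counts."""
    in_price, out_price = PRICES[model]
    # Prices are quoted per million tokens, so divide by 1,000,000.
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000

# Example: a kimi-k2 call with 10,000 input and 2,000 output tokens costs
# (10,000 * $0.10 + 2,000 * $0.25) / 1,000,000 = $0.0015.
print(f"${estimate_cost('kimi-k2', 10_000, 2_000):.4f}")
```

Input and output tokens are billed at different rates, so for long generations the output price usually dominates the total.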