# Model Pricing

Pay only for what you use, with simple per-token pricing across all our models.
| Model | Description | Input Price (per 1M tokens) | Output Price (per 1M tokens) | Speed (tokens/sec) | Context Window (tokens) |
|---|---|---|---|---|---|
| kimi-k2 (recommended) | SOTA AI model with 32B active parameters | $0.10 | $0.25 | ~50 | 131k |
| gpt-oss-120b | Open-source OpenAI model | $0.07 | $0.27 | ~200 | 131k |
| kimi-k2-turbo | Kimi K2 at 400+ tokens/sec | $0.50 | $1.00 | ~500 | 131k |
| glm-4.5 | Highly accurate model (set to instant-response mode) | $0.10 | $0.10 | ~80 | 131k |
| qwen3-235b-a22b-2507-thinking | Newest version of the model | $0.10 | $0.30 | ~20 | 131k |
| qwen3-235b-a22b-2507-thinking-turbo | Newest version of the model | $0.50 | $1.00 | ~300 | 131k |
| qwen3-coder | Qwen3 coding model | $0.15 | $0.35 | ~30 | 262k |
| qwen3-coder-turbo | Faster qwen3-coder deployment | $0.20 | $0.50 | ~150 | 131k |
| qwen3-235b-a22b-2507-instruct | Newest version of the model | $0.10 | $0.25 | ~30 | 262k |
| qwen3-235b-a22b | Largest Qwen3 reasoning model | $0.12 | $0.20 | ~60 | 33k |
| deepseek-r1-0528 | Updated version of deepseek-r1 | $0.25 | $0.25 | ~100 | 131k |
| deepseek-r1-0528-turbo | deepseek-r1-0528 at 200+ tokens/sec | $1.00 | $2.00 | ~200 | 131k |
| deepseek-v3-0324 | Newest version of deepseek-v3 | $0.20 | $0.25 | ~30 | 131k |
| deepseek-v3-0324-turbo | Fastest deepseek-v3-0324 deployment available | $0.50 | $1.00 | ~325 | 131k |
| llama3.3-70b | Open weights, strong performance | $0.02 | $0.10 | ~30 | 131k |
| llama3.1-8b | Smaller Llama 3.1 model | $0.01 | $0.06 | ~50 | 131k |
| llama-4-scout | Powerful model with a large context window | $0.08 | $0.40 | ~65 | 262k |
| deepseek-r1-distill-llama-70b | Smaller than DeepSeek-R1, decent performance | $0.10 | $0.10 | ~50 | 131k |
| deepseek-r1-distill-qwen-32b | Smaller than the Llama distillation, good for code | $0.10 | $0.10 | ~50 | 131k |
| gemma-3-27b-it | Highly performant non-reasoning model | $0.04 | $0.10 | ~80 | 131k |
| multilingual-e5-large-instruct | Fast, inexpensive embedding model | $0.02 | $0.02 | ~75 | 512 |
| gemma-3n-e4b-it | Best 4B Gemma model (as of release) | $0.03 | $0.05 | ~70 | 128k |
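Per-token billing is straightforward to estimate: each request costs (input tokens ÷ 1M) × input price plus (output tokens ÷ 1M) × output price. Here is a minimal sketch in Python, using a few prices from the table above; the `PRICES` dict and `cost_usd` helper are illustrative, not part of any official SDK.

```python
# Illustrative per-token cost estimator.
# Prices are USD per million tokens, taken from the table above: (input, output).
PRICES = {
    "kimi-k2": (0.10, 0.25),
    "qwen3-coder": (0.15, 0.35),
    "llama3.1-8b": (0.01, 0.06),
}

def cost_usd(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the estimated cost in USD for one request."""
    input_price, output_price = PRICES[model]
    return (input_tokens / 1_000_000) * input_price \
         + (output_tokens / 1_000_000) * output_price

# Example: a kimi-k2 request with a 10k-token prompt and a 2k-token completion.
print(f"${cost_usd('kimi-k2', 10_000, 2_000):.4f}")  # $0.0015
```

At these rates, a typical request costs a fraction of a cent; costs only become material at high volume or with long contexts.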