Model Pricing
Pay only for what you use with simple per-token pricing across all our models
Model | Description | Input Price (per million tokens) | Output Price (per million tokens) | Speed (tokens/sec) | Context Window (tokens) |
---|---|---|---|---|---|
deepseek-r1-0528 (Recommended) | High quality outputs with ~95% accuracy | $0.40 | $0.90 | ~30 | 131k |
deepseek-r1-0528-turbo | High quality outputs at high speeds | $1.00 | $2.00 | ~180 | 131k |
deepseek-r1 | High quality outputs with 96.7% accuracy | $0.50 | $0.90 | ~25 | 131k |
deepseek-r1-turbo | High speeds around 230 t/s at 95.1% accuracy | $1.00 | $2.00 | ~230 | 131k |
deepseek-v3-0324 | Latest version, improved efficiency | $0.20 | $0.65 | ~35 | 131k |
deepseek-v3-0324-turbo | Fastest DeepSeek-V3-0324 deployment available | $0.50 | $1.00 | ~325 | 131k |
deepseek-v3 | Base for DeepSeek-R1 | $0.20 | $0.65 | ~35 | 131k |
llama3.3-70b | Open weights, strong performance | $0.12 | $0.35 | ~30 | 131k |
llama3.1-405b | Larger of the Llama-3.1 models | $0.50 | $0.50 | ~35 | 131k |
llama3.1-8b | Smaller Llama-3.1 model | $0.01 | $0.06 | ~50 | 131k |
llama3.1-tulu3-405b | More natural than 3.1-405b | $0.60 | $0.60 | ~30 | 131k |
llama-4-scout | Powerful model with high context | $0.08 | $0.40 | ~65 | 262k |
deepseek-r1-distill-llama-70b | Smaller than DeepSeek-R1, decent performance | $0.20 | $0.70 | ~30 | 131k |
deepseek-r1-distill-qwen-32b | Smaller than Llama distillation, good for code | $0.15 | $0.22 | ~50 | 131k |
qwen-qwq-32b | Near DeepSeek-R1 performance | $0.20 | $0.20 | ~25 | 131k |
gemma-3-27b-it | Incredibly performant non-reasoning model | $0.08 | $0.18 | ~80 | 131k |
qwen3-235b-a22b | Largest Qwen3 reasoning model | $0.12 | $0.50 | ~60 | 33k |
multilingual-e5-large-instruct | Fast, inexpensive embedding model | $0.02 | $0.02 | ~75 | 512 |
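As a rough illustration of how per-token pricing adds up, the sketch below estimates the cost of a workload from the listed prices. It assumes input and output tokens are billed separately at their per-million rates and then summed; the `PRICES` dictionary and the `estimate_cost` helper are hypothetical names used for illustration, not part of any official SDK, and only two models from the table are included.

```python
from decimal import Decimal

# Prices in USD per million tokens (input, output), copied from the table
# for two example models. The dictionary itself is illustrative only.
PRICES = {
    "deepseek-r1-0528": (Decimal("0.40"), Decimal("0.90")),
    "llama3.1-8b": (Decimal("0.01"), Decimal("0.06")),
}

MILLION = Decimal(1_000_000)


def estimate_cost(model, input_tokens, output_tokens):
    """Estimate the USD cost, assuming input and output are billed separately."""
    input_price, output_price = PRICES[model]
    return (input_price * input_tokens + output_price * output_tokens) / MILLION


# 2M input tokens and 500k output tokens on deepseek-r1-0528:
# 2 * $0.40 + 0.5 * $0.90 = $1.25
cost = estimate_cost("deepseek-r1-0528", 2_000_000, 500_000)
print(f"${cost:.2f}")
```

`Decimal` is used instead of floats so that small per-token amounts do not pick up rounding noise when summed over many requests.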