Model Pricing

Pay only for what you use, with simple per-token pricing across all our models.

| Model | Description | Input Price (per million tokens) | Output Price (per million tokens) | Speed (tokens/sec) | Context Window (tokens) |
|---|---|---|---|---|---|
| deepseek-r1-0528 (Recommended) | High quality outputs with ~95% accuracy | $0.40 | $0.90 | ~30 | 131k |
| deepseek-r1-0528-turbo | High quality outputs at high speeds | $1.00 | $2.00 | ~180 | 131k |
| deepseek-r1 | High quality outputs with 96.7% accuracy | $0.50 | $0.90 | ~25 | 131k |
| deepseek-r1-turbo | High speeds around 230 t/s at 95.1% accuracy | $1.00 | $2.00 | ~230 | 131k |
| deepseek-v3-0324 | Latest version, improved efficiency | $0.20 | $0.65 | ~35 | 131k |
| deepseek-v3-0324-turbo | Fastest DeepSeek-V3-0324 deployment available | $0.50 | $1.00 | ~325 | 131k |
| deepseek-v3 | Base for DeepSeek-R1 | $0.20 | $0.65 | ~35 | 131k |
| llama3.3-70b | Open weights, strong performance | $0.12 | $0.35 | ~30 | 131k |
| llama3.1-405b | Larger of the Llama-3.1 models | $0.50 | $0.50 | ~35 | 131k |
| llama3.1-8b | Smaller Llama-3.1 model | $0.01 | $0.06 | ~50 | 131k |
| llama3.1-tulu3-405b | More natural than 3.1-405b | $0.60 | $0.60 | ~30 | 131k |
| llama-4-scout | Powerful model with high context | $0.08 | $0.40 | ~65 | 262k |
| deepseek-r1-distill-llama-70b | Smaller than DeepSeek-R1, decent performance | $0.20 | $0.70 | ~30 | 131k |
| deepseek-r1-distill-qwen-32b | Smaller than llama distillation, good for code | $0.15 | $0.22 | ~50 | 131k |
| qwen-qwq-32b | Near DeepSeek-R1 performance | $0.20 | $0.20 | ~25 | 131k |
| gemma-3-27b-it | Incredibly performant non-reasoning model | $0.08 | $0.18 | ~80 | 131k |
| qwen3-235b-a22b | Largest Qwen3 reasoning model | $0.12 | $0.50 | ~60 | 33k |
| multilingual-e5-large-instruct | Fast, inexpensive embedding model | $0.02 | $0.02 | ~75 | 512 |
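
To make the per-token arithmetic concrete, here is a minimal sketch of how a single request's cost could be estimated from the prices listed above. The `PRICES` dictionary and the `estimate_cost` helper are hypothetical names introduced for illustration; they are not part of any official SDK, and only three models are included. Actual billing may round or meter tokens differently.

```python
from decimal import Decimal

# USD per million tokens as (input, output), copied from the prices listed above.
# Only a few models are shown; this dict is illustrative, not an official API.
PRICES = {
    "deepseek-r1-0528": (Decimal("0.40"), Decimal("0.90")),
    "deepseek-v3-0324": (Decimal("0.20"), Decimal("0.65")),
    "llama3.1-8b": (Decimal("0.01"), Decimal("0.06")),
}

MILLION = Decimal(1_000_000)


def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> Decimal:
    """Return the estimated cost in USD for one request (hypothetical helper)."""
    input_price, output_price = PRICES[model]
    # Input and output tokens are priced separately, each per million tokens.
    return (input_price * input_tokens + output_price * output_tokens) / MILLION


if __name__ == "__main__":
    # A 2,000-token prompt with a 500-token completion on deepseek-r1-0528.
    cost = estimate_cost("deepseek-r1-0528", 2_000, 500)
    print(f"${cost:.6f}")
```

`Decimal` is used instead of floats so that very small per-request amounts add up without binary rounding error when summed over many requests.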