Large Language Model (LLM) Pricing
| Model | Quantization | Context / Max output (tokens) | $ / M in | $ / M out | Speed |
|---|---|---|---|---|---|
| kimi-k2-0905 | fp8 | 131,072 / 131,072 | $0.15 | $0.55 | ~29 t/s |
| kimi-k2-0905-turbo | fp8 | 131,072 / 8,192 | $0.35 | $1.00 | ~1019 t/s |
| kimi-k2-eco | fp4 | 131,072 / 131,072 | $0.05 | $0.10 | ~15 t/s |
| glm-4.6 | fp8 | 131,072 / 131,072 | $0.30 | $0.60 | ~51 t/s |
| glm-4.5 | fp8 | 131,072 / 131,072 | $0.20 | $0.40 | ~25 t/s |
| deepseek-v3.2-exp | fp4 | 131,072 / 131,072 | $0.15 | $0.30 | ~107 t/s |
| deepseek-v3.1-terminus | fp4 | 131,072 / 131,072 | $0.20 | $0.50 | ~18 t/s |
| deepseek-v3.1-terminus-reasoner | fp4 | 131,072 / 131,072 | $0.20 | $0.50 | ~36 t/s |
| deepseek-v3.1 | fp4 | 131,072 / 131,072 | $0.15 | $0.50 | ~18 t/s |
| deepseek-v3.1-reasoner | fp4 | 131,072 / 131,072 | $0.15 | $0.50 | ~42 t/s |
| deepseek-v3-0324 | fp4 | 131,072 / 8,192 | $0.20 | $0.25 | ~14 t/s |
| deepseek-v3-0324-turbo | fp4 | 131,072 / 8,192 | $0.50 | $1.00 | ~839 t/s |
| deepseek-r1-0528 | fp4 | 131,072 / 131,072 | $0.25 | $0.25 | ~209 t/s |
| deepseek-r1-0528-turbo | fp4 | 131,072 / 131,072 | $1.00 | $2.00 | ~819 t/s |
| qwen3-next-80b-a3b-instruct | fp8 | 262,144 / 262,144 | $0.08 | $0.38 | ~282 t/s |
| qwen3-235b-a22b-2507-instruct | fp8 | 131,072 / 131,072 | $0.10 | $0.25 | ~20 t/s |
| qwen3-235b-a22b-2507-thinking | fp8 | 131,072 / 131,072 | $0.10 | $0.30 | ~29 t/s |
| qwen3-coder | fp8 | 131,072 / 131,072 | $0.15 | $0.35 | ~256 t/s |
| qwen3-coder-turbo | fp8 | 131,072 / 131,072 | $0.20 | $0.50 | ~439 t/s |
| gpt-oss-120b | fp4 | 131,072 / 131,072 | $0.07 | $0.27 | ~132 t/s |
| gemma-3-27b-it | fp8 | 131,072 / 131,072 | $0.04 | $0.10 | ~246 t/s |
| llama-4-scout | fp8 | 262,144 / 16,384 | $0.08 | $0.40 | ~39 t/s |
| llama3.3-70b | fp8 | 131,072 / 8,192 | $0.12 | $0.20 | ~68 t/s |
| deepseek-r1-distill-llama-70b | fp4 | 65,536 / 65,536 | $0.10 | $0.10 | ~62 t/s |
| deepseek-r1-distill-qwen-32b | fp4 | 65,536 / 65,536 | $0.10 | $0.10 | ~40 t/s |
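Since the prices are quoted per million tokens, the cost of a single request is `input_tokens / 1e6 × ($/M in) + output_tokens / 1e6 × ($/M out)`. A minimal sketch of that arithmetic, using a few entries copied from the table above (the `PRICES` dict and `request_cost` helper are illustrative, not part of any provider's API):

```python
# Per-request cost estimation from the pricing table above.
# Prices are USD per million tokens: (input, output).
PRICES = {
    "kimi-k2-0905": (0.15, 0.55),
    "deepseek-v3.2-exp": (0.15, 0.30),
    "qwen3-coder": (0.15, 0.35),
    "gpt-oss-120b": (0.07, 0.27),
}

def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """USD cost of one request: tokens / 1e6 * price per million tokens."""
    usd_in, usd_out = PRICES[model]
    return input_tokens / 1_000_000 * usd_in + output_tokens / 1_000_000 * usd_out

# Example: a 20k-token prompt with a 2k-token completion on kimi-k2-0905
# costs 0.02 * $0.15 + 0.002 * $0.55 = $0.0041.
print(f"${request_cost('kimi-k2-0905', 20_000, 2_000):.4f}")
```

Note that for the "turbo" variants the higher per-token price buys throughput, not capacity, so the same formula applies; only the price pair changes.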