
DeepSeek R1

Ollama deepseek-r1:70b

The deepseek-r1:70b model image in Ollama is 43 GB; once loaded into memory it occupies about 46 GB.

deepseek-r1:70b    d37b54d01a76    46 GB    100% GPU     8192       4 minutes from now

total duration:       7m10.425334424s
load duration:        56.463007ms
prompt eval count:    25 token(s)
prompt eval duration: 44.639105ms
prompt eval rate:     560.05 tokens/s
eval count:           1480 token(s)
eval duration:        7m10.322503313s
eval rate:            3.44 tokens/s
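
The reported rates are simply token counts divided by the matching durations. The following is a minimal sketch of that arithmetic, assuming the durations are printed in the Go-style format seen above (for example `7m10.322503313s` or `44.639105ms`). The helper names `parse_duration` and `tokens_per_second` are hypothetical and are not part of Ollama.

```python
import re

# Seconds per unit for Go-style duration strings (assumed format of the log above).
_UNIT_SECONDS = {"h": 3600.0, "m": 60.0, "s": 1.0, "ms": 1e-3, "us": 1e-6, "µs": 1e-6, "ns": 1e-9}

# Multi-character units come first so that "ms" is not read as "m" followed by "s".
_PART = re.compile(r"(\d+(?:\.\d+)?)(ms|us|µs|ns|h|m|s)")


def parse_duration(text: str) -> float:
    """Convert a duration such as '7m10.32s' or '44.6ms' to seconds."""
    parts = _PART.findall(text)
    # Reject input that the pattern does not cover completely.
    if not parts or "".join(number + unit for number, unit in parts) != text:
        raise ValueError(f"unrecognized duration: {text!r}")
    return sum(float(number) * _UNIT_SECONDS[unit] for number, unit in parts)


def tokens_per_second(token_count: int, duration: str) -> float:
    """Throughput as token count divided by elapsed seconds."""
    return token_count / parse_duration(duration)


if __name__ == "__main__":
    # Values copied from the log above.
    print(f"prompt eval rate: {tokens_per_second(25, '44.639105ms'):.2f} tokens/s")
    print(f"eval rate:        {tokens_per_second(1480, '7m10.322503313s'):.2f} tokens/s")
```

Rounded to two decimals, both results agree with the rates in the log, which shows that nearly all of the 7m10s total duration was spent on generation rather than on loading or prompt evaluation.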

llama.cpp Benchmark

The data below comes from ~/lzc-aipod-benchmark/deepseek-r1-distill-70b/cumulative-report.md. The model is DeepSeek-R1-Distill-Llama-70B-Q4_K_M.gguf and the runtime is llama.cpp. This section is the Distill 70B GGUF benchmark, not the ollama deepseek-r1:70b pull recorded above.

| Device | Runtime | Weight | Single Stream | Peak Aggregate | Power | GPU Temp |
| --- | --- | --- | --- | --- | --- | --- |
| Jetson AGX Orin | llama.cpp | GGUF Q4_K_M | 3.346 tok/s (g64 / c1) | 33.851 tok/s (g64 / c16) | 50.32~68.787 W | 89.5 °C |
| Thor T4000 | llama.cpp | GGUF Q4_K_M | 2.909~2.912 tok/s (g64~g1024 / c1) | 46.165 tok/s (g64 / c48) | 22.066~22.295 W for c1; 37.798 W at peak | 52~53 °C for c1; 48.562 °C at peak |
| Thor | llama.cpp | GGUF Q4_K_M | 4.392~4.415 tok/s (g64~g1024 / c1) | - | 37.443~39.254 W | 68~75 °C |
| Thor T5000 | llama.cpp | GGUF Q4_K_M | 4.458~4.552 tok/s (g64~g1024 / c1) | - | 37.699~38.937 W | 61~63 °C |

Notes

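  • The Jetson AGX Orin row corresponds to the test machine lzc-pod-juyIZt in the benchmark repository.
  • In this round, both THOR and T5000 completed single-stream steady-state retests at g64, 128, 256, 512, 1024 / c1 / 300s.
  • Complete high-concurrency peak results currently exist only for T4000 and Jetson AGX Orin, so the peak aggregate throughput for THOR and T5000 is left blank for now.
  • Data source: /home/catdog/lzc-aipod-benchmark/deepseek-r1-distill-70b/cumulative-report.md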
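
For the two devices that have peak results, the single-stream and peak-aggregate columns can be compared to see how throughput scales with concurrency. The sketch below is only illustrative arithmetic on the table's figures: the names `RESULTS` and `scaling_summary` are hypothetical, and for Thor T4000 it assumes the lower bound of the single-stream range (2.909 tok/s) as the baseline.

```python
# Figures copied from the benchmark table:
# (single-stream tok/s, peak aggregate tok/s, concurrency at peak).
RESULTS = {
    "Jetson AGX Orin": (3.346, 33.851, 16),
    "Thor T4000": (2.909, 46.165, 48),  # assumption: lower bound of 2.909~2.912
}


def scaling_summary(single: float, peak: float, concurrency: int) -> dict:
    """Derive per-stream throughput and scaling ratios from one table row."""
    per_stream = peak / concurrency
    return {
        # Average throughput each concurrent stream sees at the peak setting.
        "per_stream_tok_s": per_stream,
        # How many times the aggregate exceeds a single stream.
        "aggregate_speedup": peak / single,
        # Fraction of single-stream speed each stream keeps under load.
        "per_stream_retention": per_stream / single,
    }


if __name__ == "__main__":
    for device, (single, peak, concurrency) in RESULTS.items():
        summary = scaling_summary(single, peak, concurrency)
        print(
            f"{device}: {summary['per_stream_tok_s']:.3f} tok/s per stream, "
            f"{summary['aggregate_speedup']:.1f}x aggregate speedup, "
            f"{summary['per_stream_retention']:.0%} of single-stream speed"
        )
```

The two peaks were measured at different concurrency levels (c16 and c48), so the ratios describe each device's own scaling and are not a like-for-like comparison between devices.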