DeepSeek R1
Ollama deepseek-r1:70b
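In Ollama, the deepseek-r1:70b model image is 43 GB; once loaded into memory it takes about 46 GB. The `ollama ps` entry and the timing statistics of a single generation follow.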
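A rough sanity check of the figures in this subsection, as a Python sketch. It estimates a KV cache size that could account for the gap between the 43 GB image and the ~46 GB resident size, and it recomputes Ollama's rates as token count divided by duration. Several inputs are assumptions, not facts from this record: the architecture numbers (80 layers, 8 KV heads, head dimension 128, i.e. the Llama-3 70B layout), an f16 cache, a cache sized for the 8192-token context in the `ollama ps` line, and Ollama's "GB" being decimal gigabytes. `kv_cache_bytes` and `parse_duration` are illustrative helpers.

```python
import re

# ASSUMPTIONS (not stated in the source): deepseek-r1:70b follows the Llama-3 70B
# layout (80 layers, 8 KV heads, head_dim 128), the KV cache is stored as f16
# (2 bytes per element), and it is sized for the 8192-token context in `ollama ps`.
N_LAYERS, N_KV_HEADS, HEAD_DIM, N_CTX = 80, 8, 128, 8192

# Multipliers for the unit suffixes Go uses when printing a time.Duration.
_UNITS = {"h": 3600.0, "m": 60.0, "s": 1.0, "ms": 1e-3, "us": 1e-6, "µs": 1e-6, "ns": 1e-9}
# "ms" is listed before "m" so that "44.6ms" is not read as minutes.
_PART = re.compile(r"(\d+(?:\.\d+)?)(h|ms|us|µs|ns|m|s)")
_WHOLE = re.compile(r"(?:\d+(?:\.\d+)?(?:h|ms|us|µs|ns|m|s))+")


def kv_cache_bytes(n_layers: int, n_kv_heads: int, head_dim: int,
                   n_ctx: int, bytes_per_elem: int = 2) -> int:
    """Size of the K and V caches across all layers and context positions."""
    return 2 * n_layers * n_kv_heads * head_dim * n_ctx * bytes_per_elem


def parse_duration(text: str) -> float:
    """Convert a Go-style duration such as '7m10.322503313s' to seconds."""
    if not _WHOLE.fullmatch(text):
        raise ValueError(f"unrecognized duration: {text!r}")
    return sum(float(num) * _UNITS[unit] for num, unit in _PART.findall(text))


if __name__ == "__main__":
    kv = kv_cache_bytes(N_LAYERS, N_KV_HEADS, HEAD_DIM, N_CTX)
    print(f"estimated f16 KV cache: {kv / 1e9:.2f} GB")  # 2.68 GB
    # Each rate is the phase's token count divided by its duration.
    print(f"eval rate: {1480 / parse_duration('7m10.322503313s'):.2f} tokens/s")  # 3.44
    print(f"prompt eval rate: {25 / parse_duration('44.639105ms'):.2f} tokens/s")  # 560.05
```

Under these assumptions the cache comes to about 2.68 GB, so 43 GB plus the cache lands near 45.7 GB; the remainder of the ~46 GB would plausibly be compute buffers. This is an estimate, not a measured breakdown. Both recomputed rates match the logged 3.44 and 560.05 tokens/s.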
```text
deepseek-r1:70b    d37b54d01a76    46 GB    100% GPU    8192    4 minutes from now

total duration:       7m10.425334424s
load duration:        56.463007ms
prompt eval count:    25 token(s)
prompt eval duration: 44.639105ms
prompt eval rate:     560.05 tokens/s
eval count:           1480 token(s)
eval duration:        7m10.322503313s
eval rate:            3.44 tokens/s
```
llama.cpp Benchmark
The data below comes from `~/lzc-aipod-benchmark/deepseek-r1-distill-70b/cumulative-report.md`; the model is `DeepSeek-R1-Distill-Llama-70B-Q4_K_M.gguf` and the runtime is llama.cpp. This section is a GGUF benchmark of the Distill 70B model, not the Ollama `deepseek-r1:70b` pull record above.
| Device | Runtime | Weights | Single Stream | Peak Aggregate | Power | GPU Temp |
|---|---|---|---|---|---|---|
| Jetson AGX Orin | llama.cpp | GGUF Q4_K_M | 3.346 tok/s (g64 / c1) | 33.851 tok/s (g64 / c16) | 50.32~68.787 W | 89.5 °C |
| Thor T4000 | llama.cpp | GGUF Q4_K_M | 2.909~2.912 tok/s (g64~g1024 / c1) | 46.165 tok/s (g64 / c48) | 22.066~22.295 W for c1; 37.798 W at peak | 52~53 °C for c1; 48.562 °C at peak |
| Thor | llama.cpp | GGUF Q4_K_M | 4.392~4.415 tok/s (g64~g1024 / c1) | - | 37.443~39.254 W | 68~75 °C |
| Thor T5000 | llama.cpp | GGUF Q4_K_M | 4.458~4.552 tok/s (g64~g1024 / c1) | - | 37.699~38.937 W | 61~63 °C |
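The two rows with peak results can be compared through a few derived ratios: average throughput per concurrent stream at the peak, how many times the single-stream rate the peak aggregate reaches, and tokens per joule for T4000 only (the only row whose power figure is explicitly tied to the peak). A minimal Python sketch; `PeakRow` and its methods are illustrative names. Because T4000's single-stream figure spans g64~g1024 while its peak was measured at g64, the scaling is reported as a range.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class PeakRow:
    device: str
    single_min: float                     # lowest c1 single-stream tok/s in the table
    single_max: float                     # highest c1 single-stream tok/s in the table
    peak_aggregate: float                 # peak aggregate tok/s
    peak_concurrency: int                 # concurrency at which that peak was recorded
    peak_power_w: Optional[float] = None  # only stated explicitly for T4000

    def per_stream(self) -> float:
        """Average tok/s each concurrent stream sees at the peak."""
        return self.peak_aggregate / self.peak_concurrency

    def scaling(self) -> Tuple[float, float]:
        """Peak aggregate divided by single-stream throughput, as (low, high)."""
        return (self.peak_aggregate / self.single_max,
                self.peak_aggregate / self.single_min)

    def tok_per_joule(self) -> Optional[float]:
        """tok/s per W (equivalently tokens per joule) at the peak, if known."""
        if self.peak_power_w is None:
            return None
        return self.peak_aggregate / self.peak_power_w


# Values copied from the table; Thor and Thor T5000 have no peak results yet.
ROWS = [
    PeakRow("Jetson AGX Orin", 3.346, 3.346, 33.851, 16),
    PeakRow("Thor T4000", 2.909, 2.912, 46.165, 48, peak_power_w=37.798),
]

if __name__ == "__main__":
    for row in ROWS:
        lo, hi = row.scaling()
        eff = row.tok_per_joule()
        eff_txt = f"{eff:.2f} tok/J" if eff is not None else "n/a"
        print(f"{row.device}: {row.per_stream():.2f} tok/s per stream, "
              f"{lo:.2f}~{hi:.2f}x single-stream, {eff_txt}")
```

With these inputs each stream gets roughly 2.12 tok/s on Jetson AGX Orin at c16 and 0.96 tok/s on T4000 at c48, while the aggregate reaches about 10.1x and 15.9x the single-stream rate respectively: batching trades per-request speed for total throughput.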
Notes
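- The Jetson AGX Orin row corresponds to the test machine `lzc-pod-juyIZt` in the benchmark repository.
- In this round, Thor and Thor T5000 both completed single-stream steady-state re-runs at g64, 128, 256, 512, 1024 / c1 / 300s.
- Complete high-concurrency peak results currently exist only for T4000 and Jetson AGX Orin, so the peak aggregate throughput for Thor and Thor T5000 is left blank for now.
- Data source: `/home/catdog/lzc-aipod-benchmark/deepseek-r1-distill-70b/cumulative-report.md`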
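The notes do not define the `g… / c… / 300s` notation or say how a steady-state figure is computed. The Python sketch below assumes one plausible reading, labeled hypothetical throughout: `g` is the number of tokens generated per request, `c` the number of concurrent streams, and `300s` a fixed measurement window in which single-stream requests run back to back. `run_request`, `sweep`, and `steady_state_tok_s` are hypothetical names, not llama.cpp APIs.

```python
import itertools
import time
from typing import Callable, Iterable, List, Tuple

# HYPOTHETICAL reading of the notes' "g64,128,256,512,1024 / c1 / 300s":
# g = tokens generated per request, c = concurrent streams, 300s = window.
GEN_LENGTHS = [64, 128, 256, 512, 1024]
CONCURRENCY = [1]
WINDOW_S = 300.0


def sweep(gen_lengths: Iterable[int], concurrency: Iterable[int],
          window_s: float) -> List[Tuple[int, int, float]]:
    """Expand the sweep notation into (g, c, window_s) run configurations."""
    return [(g, c, window_s) for g, c in itertools.product(gen_lengths, concurrency)]


def steady_state_tok_s(run_request: Callable[[int], int], gen_tokens: int,
                       window_s: float,
                       clock: Callable[[], float] = time.monotonic) -> float:
    """Single-stream throughput over a fixed window.

    `run_request` is a hypothetical hook that performs one blocking request
    for `gen_tokens` tokens and returns how many tokens were generated.
    Requests run back to back; the last one may overrun the window, so the
    rate divides by the actual elapsed time rather than by `window_s`.
    """
    if window_s <= 0:
        raise ValueError("window_s must be positive")
    start = clock()
    total = 0
    while clock() - start < window_s:
        total += run_request(gen_tokens)
    return total / (clock() - start)


if __name__ == "__main__":
    class FakeBackend:
        """Stand-in that 'generates' at a fixed rate on a simulated clock."""

        def __init__(self, tok_s: float) -> None:
            self.now, self.tok_s = 0.0, tok_s

        def clock(self) -> float:
            return self.now

        def run(self, n: int) -> int:
            self.now += n / self.tok_s  # simulated time spent generating n tokens
            return n

    for g, c, window in sweep(GEN_LENGTHS, CONCURRENCY, WINDOW_S):
        fake = FakeBackend(4.0)
        print(g, c, steady_state_tok_s(fake.run, g, window, clock=fake.clock))
```

Dividing by the measured elapsed time rather than by the nominal 300 s keeps the final request, which usually overruns the window, from inflating the rate.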