GPT-OSS 120B

Benchmark

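The data below comes from ~/lzc-aipod-benchmark/gpt-oss-120b-moe/cumulative-report.md and covers the gpt-oss-120b-mxfp4 model on the llama.cpp runtime with GGUF weights. In the configuration labels, `gN` is the generation length and `cN` the number of concurrent requests.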

| Device | Runtime | Weight | Single Stream | Peak Aggregate | Power | GPU Temp |
|---|---|---|---|---|---|---|
| Thor | `llama.cpp` | `GGUF mxfp4` | `34.615 tok/s` (`g512 / c1`) | `178.536 tok/s` (`g64 / c16`) | `31.793 W` | `67 °C` |
| Thor T5000 | `llama.cpp` | `GGUF mxfp4` | `35.088 tok/s` (`g1024 / c1`) | `330.602 tok/s` (`g64 / c48`) | `31.261 W` for latest `c1`; `54.419 W` at `c48` | `56 °C` for latest `c1`; `57.593 °C` at `c48` |
| Jetson AGX Orin | `-` | `-` | `no data` | `no data` | `-` | `-` |
| Thor T4000 | `-` | `-` | `no data` | `no data` | `-` | `-` |
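As a rough reading aid, the sketch below derives a few ratios from the table: per-stream throughput at the peak concurrency, the aggregate speedup over the single-stream figure, and how close that speedup comes to linear scaling. These ratios are not part of the source report, and the `derived_metrics` helper is hypothetical. Each device's single-stream and peak figures also use different generation lengths (g512 or g1024 versus g64), so the comparison is only approximate.

```python
# Figures copied from the benchmark table (llama.cpp, GGUF mxfp4).
# Note: single-stream and peak runs use different generation lengths,
# so the ratios below are approximate, not like-for-like.
RESULTS = {
    "Thor": {"single": 34.615, "peak": 178.536, "peak_concurrency": 16},
    "Thor T5000": {"single": 35.088, "peak": 330.602, "peak_concurrency": 48},
}


def derived_metrics(single, peak, peak_concurrency):
    """Hypothetical helper: return (per-stream tok/s at peak,
    aggregate speedup over one stream, fraction of linear scaling)."""
    per_stream = peak / peak_concurrency
    speedup = peak / single
    efficiency = speedup / peak_concurrency  # 1.0 would mean perfect linear scaling
    return per_stream, speedup, efficiency


for device, r in RESULTS.items():
    per_stream, speedup, eff = derived_metrics(
        r["single"], r["peak"], r["peak_concurrency"]
    )
    print(
        f"{device}: {per_stream:.2f} tok/s per stream at c{r['peak_concurrency']}, "
        f"{speedup:.2f}x aggregate over c1, {eff:.0%} of linear scaling"
    )
```

With the table values, the T5000 still delivers roughly 6.9 tok/s per stream at c48, about a 9.4x aggregate gain over its c1 result, while Thor reaches about 5.2x at c16.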

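Thor's current best single-stream result comes from a generation-length matrix of g128 / g256 / g512 / g1024, in which g512 / c1 reached 34.615 tok/s.
The T5000's default single-stream configuration is fixed at g1024 / c1 / server_parallel=1 / 300s; the latest re-run measured 35.088 tok/s.
The T5000's high-concurrency peak comes from an extended c40 / c48 sweep, in which g64 / c48 reached 330.602 tok/s.
The benchmark repository does not yet contain official gpt-oss-120b results for Orin or T4000, so those rows are marked no data.
Data source: /home/catdog/lzc-aipod-benchmark/gpt-oss-120b-moe/cumulative-report.md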
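The T5000 row reports power at both ends of the concurrency range, so throughput per watt (equivalently, tokens per joule) can be estimated. This is a minimal sketch that assumes the `c1` power reading belongs to the latest g1024 / c1 run and the `c48` reading belongs to the g64 / c48 peak run; the report does not state that pairing explicitly, and `tokens_per_joule` is a hypothetical helper.

```python
# T5000 figures from the table: (throughput in tok/s, reported power in W).
# ASSUMPTION: the c1 power reading pairs with the latest g1024 / c1 run,
# and the c48 power reading pairs with the g64 / c48 peak run.
T5000_RUNS = {
    "g1024 / c1": (35.088, 31.261),
    "g64 / c48": (330.602, 54.419),
}


def tokens_per_joule(tok_per_s, watts):
    """Hypothetical helper: tok/s divided by W (J/s) gives tokens per joule."""
    return tok_per_s / watts


for label, (tps, watts) in T5000_RUNS.items():
    print(f"{label}: {tokens_per_joule(tps, watts):.2f} tok/J")
```

Under that assumption, the c48 run yields roughly 6.08 tok/J against about 1.12 tok/J at c1, around 5.4x more tokens per unit of energy.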