🏆 ICML 2026 — Efficient Qwen Leaderboard

Minimizing Inference Latency for Qwen3.5-4B on A10G

Hardware: NVIDIA A10G (ml.g5.xlarge) | Ranked by average speedup over baseline (mean of the Short, Medium, and Long speedups)

Baseline (unoptimized Qwen3.5-4B): Short: 2,582 ms | Medium: 5,441 ms | Long: 6,576 ms | Avg: 4,866 ms
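The listed averages appear to be the arithmetic mean of the three per-length speedups (baseline latency divided by team latency), not the ratio of average latencies. Recomputing the top row under that assumption reproduces its listed value. This is a minimal sketch using the baseline numbers above and team AFM-as4vvw34's latencies. The function names are illustrative and do not belong to any leaderboard tooling.

```python
# Baseline latencies (ms) from the leaderboard header.
BASELINE_MS = {"short": 2582, "medium": 5441, "long": 6576}


def avg_speedup(team_ms: dict[str, float], baseline_ms: dict[str, float] = BASELINE_MS) -> float:
    """Mean of per-length speedups: (1/3) * sum(baseline / team)."""
    ratios = [baseline_ms[k] / team_ms[k] for k in baseline_ms]
    return sum(ratios) / len(ratios)


def ratio_of_means(team_ms: dict[str, float], baseline_ms: dict[str, float] = BASELINE_MS) -> float:
    """Alternative metric, shown for contrast: mean baseline latency / mean team latency."""
    base_mean = sum(baseline_ms.values()) / len(baseline_ms)
    team_mean = sum(team_ms.values()) / len(team_ms)
    return base_mean / team_mean


team1 = {"short": 333, "medium": 835, "long": 2678}  # AFM-as4vvw34

print(f"{avg_speedup(team1):.3f}")     # prints 5.575
print(f"{ratio_of_means(team1):.3f}")  # prints 3.796
```

The two metrics diverge for this entry (5.575 vs 3.796) because its gains are concentrated in the Short and Medium tiers. The mean of ratios weights each tier equally. The ratio of means weights absolute milliseconds, so the Long tier dominates it. The small gap from the listed 5.576x is consistent with the leaderboard computing from unrounded latencies rather than the displayed millisecond values.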
# | Team | Avg Speedup | Short (64 tokens) | Medium (2,048 tokens) | Long (8,192 tokens) | MMLU-Pro (≥ 0.621) | IFEval (≥ 0.814) | GPQA-Diamond (≥ 0.630)
1 | AFM-as4vvw34 | 5.576x | 333 ms | 835 ms | 2,678 ms | ✅ 0.628 | ✅ 0.833 | ✅ 0.636
2 | AFM-69568ssx | 3.404x | 620 ms | 1,312 ms | 3,456 ms | ✅ 0.648 | ✅ 0.833 | ✅ 0.667
3 | AFM-hnknxz5w | 3.401x | 622 ms | 1,317 ms | 3,424 ms | ✅ 0.658 | ✅ 0.845 | ✅ 0.707
4 | AFM-xpvr9w7k | 3.013x | 668 ms | 1,594 ms | 3,729 ms | ✅ 0.646 | ✅ 0.821 | ✅ 0.636
5 | AFM-pysuua5t | 2.799x | 774 ms | 1,825 ms | 3,161 ms | ✅ 0.652 | ✅ 0.857 | ✅ 0.636
6 | AFM-wp5spjm8 | 2.432x | 1,154 ms | 2,308 ms | 2,435 ms | ✅ 0.649 | ✅ 0.857 | ✅ 0.687
7 | AFM-ayxvmqbj | 2.103x | 1,124 ms | 2,500 ms | 3,580 ms | ✅ 0.663 | ✅ 0.821 | ✅ 0.657
8 | AFM-z9qv547h | 2.103x | 1,125 ms | 2,500 ms | 3,582 ms | ✅ 0.644 | ✅ 0.821 | ✅ 0.677
9 | AFM-gv7e2ebx | 2.103x | 1,124 ms | 2,499 ms | 3,579 ms | ✅ 0.647 | ✅ 0.821 | ✅ 0.636
10 | AFM-npq35pxt | 1.570x | 1,584 ms | 3,384 ms | 4,468 ms | ✅ 0.669 | ✅ 0.845 | ✅ 0.636
11 | AFM-rc3csarb | 1.481x | 1,743 ms | 3,652 ms | 4,467 ms | ✅ 0.672 | ✅ 0.869 | ✅ 0.727
12 | AFM-newrkm40 | 0.999x | 2,584 ms | 5,444 ms | 6,577 ms | ✅ 0.688 | ✅ 0.845 | ✅ 0.687
13 | Baseline | 0.999x | 2,585 ms | 5,444 ms | 6,577 ms | ✅ 0.685 | ✅ 0.857 | ✅ 0.697
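Each row reports three accuracy scores against the floors in the header, and every listed entry clears all three. Assume that an entry must meet every floor to be ranked. The header implies this but does not state it outright. Under that assumption, eligibility and ordering reduce to a filter followed by a sort. In this minimal sketch, the `Entry` class, the `THRESHOLDS` keys, and the `hypothetical-fast` entry are illustrative only. They are not part of any leaderboard API or result.

```python
from dataclasses import dataclass

# Accuracy floors from the table header, as fractions (62.1% -> 0.621).
THRESHOLDS = {"mmlu_pro": 0.621, "ifeval": 0.814, "gpqa_diamond": 0.630}


@dataclass
class Entry:
    team: str
    avg_speedup: float
    scores: dict[str, float]


def qualifies(entry: Entry, thresholds: dict[str, float] = THRESHOLDS) -> bool:
    """True if the entry meets or exceeds every accuracy floor."""
    return all(entry.scores[name] >= floor for name, floor in thresholds.items())


def rank(entries: list[Entry]) -> list[Entry]:
    """Drop entries that miss any floor, then sort by speedup (highest first).

    sorted() is stable, so tied speedups keep their input order; the
    leaderboard's actual tie-break rule is not stated.
    """
    eligible = [e for e in entries if qualifies(e)]
    return sorted(eligible, key=lambda e: e.avg_speedup, reverse=True)


entries = [
    Entry("AFM-69568ssx", 3.404, {"mmlu_pro": 0.648, "ifeval": 0.833, "gpqa_diamond": 0.667}),
    Entry("AFM-as4vvw34", 5.576, {"mmlu_pro": 0.628, "ifeval": 0.833, "gpqa_diamond": 0.636}),
    Entry("AFM-hnknxz5w", 3.401, {"mmlu_pro": 0.658, "ifeval": 0.845, "gpqa_diamond": 0.707}),
    # Hypothetical entry (not on the leaderboard): fastest, but below the GPQA-Diamond floor.
    Entry("hypothetical-fast", 6.000, {"mmlu_pro": 0.650, "ifeval": 0.830, "gpqa_diamond": 0.600}),
]

for pos, e in enumerate(rank(entries), start=1):
    print(pos, e.team, f"{e.avg_speedup:.3f}x")
# prints:
# 1 AFM-as4vvw34 5.576x
# 2 AFM-69568ssx 3.404x
# 3 AFM-hnknxz5w 3.401x
```

Filtering before sorting means a fast entry that misses any floor never appears in the ranking, regardless of its speedup. Stable sorting leaves ties, such as the three 2.103x entries at ranks 7 to 9, in input order.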