Minimizing Inference Latency for Qwen3.5-4B on A10G

| # | Team | Avg Speedup | Short (64 tokens) | Medium (2048 tokens) | Long (8192 tokens) | MMLU-Pro (≥ 0.621) | IFEval (≥ 0.814) | GPQA-Diamond (≥ 0.630) |
|---|---|---|---|---|---|---|---|---|
| 1 | AFM-as4vvw34 | 5.576x | 333 ms | 835 ms | 2,678 ms | ✅ 0.628 | ✅ 0.833 | ✅ 0.636 |
| 2 | AFM-69568ssx | 3.404x | 620 ms | 1,312 ms | 3,456 ms | ✅ 0.648 | ✅ 0.833 | ✅ 0.667 |
| 3 | AFM-hnknxz5w | 3.401x | 622 ms | 1,317 ms | 3,424 ms | ✅ 0.658 | ✅ 0.845 | ✅ 0.707 |
| 4 | AFM-xpvr9w7k | 3.013x | 668 ms | 1,594 ms | 3,729 ms | ✅ 0.646 | ✅ 0.821 | ✅ 0.636 |
| 5 | AFM-pysuua5t | 2.799x | 774 ms | 1,825 ms | 3,161 ms | ✅ 0.652 | ✅ 0.857 | ✅ 0.636 |
| 6 | AFM-wp5spjm8 | 2.432x | 1,154 ms | 2,308 ms | 2,435 ms | ✅ 0.649 | ✅ 0.857 | ✅ 0.687 |
| 7 | AFM-ayxvmqbj | 2.103x | 1,124 ms | 2,500 ms | 3,580 ms | ✅ 0.663 | ✅ 0.821 | ✅ 0.657 |
| 8 | AFM-z9qv547h | 2.103x | 1,125 ms | 2,500 ms | 3,582 ms | ✅ 0.644 | ✅ 0.821 | ✅ 0.677 |
| 9 | AFM-gv7e2ebx | 2.103x | 1,124 ms | 2,499 ms | 3,579 ms | ✅ 0.647 | ✅ 0.821 | ✅ 0.636 |
| 10 | AFM-npq35pxt | 1.570x | 1,584 ms | 3,384 ms | 4,468 ms | ✅ 0.669 | ✅ 0.845 | ✅ 0.636 |
| 11 | AFM-rc3csarb | 1.481x | 1,743 ms | 3,652 ms | 4,467 ms | ✅ 0.672 | ✅ 0.869 | ✅ 0.727 |
| 12 | AFM-newrkm40 | 0.999x | 2,584 ms | 5,444 ms | 6,577 ms | ✅ 0.688 | ✅ 0.845 | ✅ 0.687 |
| 13 | Baseline | 0.999x | 2,585 ms | 5,444 ms | 6,577 ms | ✅ 0.685 | ✅ 0.857 | ✅ 0.697 |
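
The table does not say how the Avg Speedup column is derived. One reading that is consistent with the listed numbers (an assumption, not something the source states) is the arithmetic mean of the three per-length speedups, each taken as the Baseline row's latency divided by the team's latency. A minimal Python sketch of that check, using a hypothetical helper named `avg_speedup`:

```python
# Baseline latencies in milliseconds, copied from the Baseline row of the table.
BASELINE_MS = {"short": 2585, "medium": 5444, "long": 6577}


def avg_speedup(latencies_ms, baseline_ms=BASELINE_MS):
    """Arithmetic mean of per-length speedups (baseline / team latency).

    ASSUMPTION: the leaderboard does not document its formula; this is one
    interpretation that roughly reproduces the Avg Speedup column.
    """
    ratios = [baseline_ms[key] / latencies_ms[key] for key in baseline_ms]
    return sum(ratios) / len(ratios)


# Latencies for the top-ranked row (AFM-as4vvw34), copied from the table.
top_row = {"short": 333, "medium": 835, "long": 2678}
print(f"{avg_speedup(top_row):.3f}x")
```

For the top row this gives roughly 5.58, close to the reported 5.576x. The small gap, like the Baseline row's own 0.999x, suggests the official reference timings may differ slightly from the Baseline row shown here.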