AnyGroundBench
A Specialized-Domain Benchmark for Video Grounding in Vision-Language Models
* Equal contribution
Vision-Language Models (VLMs) have demonstrated immense promise in Spatio-Temporal Video Grounding (STVG). However, current evaluation protocols are largely confined to zero-shot assessments on general, daily-life benchmarks. This creates a critical disconnect from real-world applications in specialized fields, where models inevitably encounter rare visual concepts and complex spatio-temporal dynamics. Since exhaustive pre-training across infinite data distributions is infeasible, the ability to adapt to novel domains is essential.
To bridge this gap, we introduce AnyGroundBench, a benchmark designed to shift STVG evaluation from static zero-shot testing to rigorous domain adaptation. Targeting five specialized domains (animal, industry, sports, surgery, and public security), AnyGroundBench pairs newly captured videos, such as expert-annotated mouse behaviors, with established datasets, unifying them through dense, high-fidelity spatio-temporal annotations. Crucially, the benchmark provides dedicated training subsets to systematically measure domain adaptability.
We extensively evaluate 15 state-of-the-art VLMs, assessing their zero-shot generalization and In-Context Learning (ICL) capabilities under practical computational constraints. Our findings reveal that current models fail at both zero-shot generalization and ICL-based adaptation when confronted with specialized domains, exposing critical flaws in spatio-temporal reasoning that future research must address.
Each cell reports STVG / TVG / SVG, measured by vIoU@0.3, tIoU@0.3, and sIoU@0.3, respectively. Each +ICL row shows the 2-shot In-Context Learning performance of the model in the row directly above it.
| Models | Animal | Industry | Sports | Surgery | Public Security |
|---|---|---|---|---|---|
| Proprietary VLMs | |||||
| GPT-4o | 0.00 / 16.5 / 7.64 | 2.95 / 8.87 / 14.7 | 0.61 / 17.1 / 0.00 | 0.00 / 13.4 / 0.00 | 0.80 / 56.0 / 4.40 |
| +ICL | 0.00 / 17.3 / 12.7 | 1.25 / 10.4 / 4.14 | 0.61 / 22.0 / 0.61 | 0.00 / 26.1 / 0.00 | 6.35 / 66.9 / 6.40 |
| GPT-5.1 | 3.18 / 19.7 / 55.4 | 5.32 / 9.46 / 28.4 | 0.00 / 25.7 / 1.84 | 1.38 / 22.6 / 0.45 | 4.80 / 61.2 / 35.6 |
| +ICL | 5.59 / 23.0 / 59.2 | 2.74 / 11.3 / 34.9 | 1.22 / 23.9 / 7.36 | 2.64 / 41.7 / 0.44 | 9.63 / 66.4 / 39.2 |
| Gemini-2.5-Flash | 2.54 / 26.1 / 15.2 | 0.59 / 21.3 / 4.73 | 1.84 / 31.9 / 3.68 | 0.00 / 22.6 / 0.97 | 0.40 / 51.2 / 2.00 |
| +ICL | 1.91 / 19.1 / 12.1 | 1.18 / 23.0 / 8.87 | 0.00 / 41.7 / 0.00 | 0.46 / 18.9 / 3.59 | 2.40 / 61.6 / 3.60 |
| Gemini-2.5-Pro | 8.28 / 36.9 / 20.3 | 1.18 / 39.6 / 17.1 | 0.61 / 37.4 / 7.97 | 1.38 / 31.4 / 2.77 | 4.00 / 65.8 / 22.8 |
| +ICL | 8.28 / 31.2 / 45.2 | 8.28 / 37.2 / 26.0 | 3.68 / 43.5 / 5.52 | 6.01 / 41.2 / 24.8 | 8.80 / 70.4 / 20.0 |
| Gemini-3-Flash | 14.0 / 36.3 / 51.5 | 5.91 / 30.7 / 20.1 | 2.45 / 29.4 / 2.45 | 0.92 / 37.9 / 11.1 | 10.7 / 66.8 / 45.5 |
| +ICL | 13.3 / 33.1 / 45.8 | 6.50 / 33.1 / 24.2 | 1.22 / 44.1 / 3.68 | 6.48 / 42.5 / 27.0 | 30.8 / 78.4 / 25.1 |
| Gemini-3.1-Pro | 16.5 / 37.5 / 70.7 | 7.69 / 21.8 / 41.4 | 1.22 / 26.3 / 16.5 | 4.16 / 32.8 / 26.1 | 22.8 / 69.4 / 52.0 |
| +ICL | 12.7 / 33.1 / 60.5 | 11.8 / 27.8 / 39.6 | 1.22 / 39.8 / 7.36 | 9.72 / 42.1 / 23.4 | 22.4 / 77.2 / 41.6 |
| Open-source Specialized VLMs | |||||
| LLaVA-ST | 12.1 / 19.7 / 53.5 | 0.00 / 9.46 / 12.4 | 0.79 / 8.58 / 3.68 | 0.00 / 12.0 / 0.45 | 0.80 / 35.6 / 13.2 |
| Open-source General-Purpose VLMs | |||||
| Qwen3-VL-4B | 5.73 / 25.4 / 19.7 | 0.00 / 4.14 / 5.32 | 0.00 / 8.58 / 0.00 | 0.00 / 13.4 / 0.00 | 0.40 / 39.2 / 0.00 |
| +ICL | 0.63 / 28.6 / 0.63 | 1.18 / 15.9 / 0.00 | 0.00 / 17.7 / 0.00 | 0.00 / 34.2 / 0.00 | 0.00 / 59.6 / 0.00 |
| Qwen3-VL-8B | 3.82 / 19.7 / 0.00 | 0.00 / 10.6 / 0.00 | 0.00 / 11.6 / 0.00 | 0.00 / 15.7 / 0.00 | 0.80 / 46.0 / 0.00 |
| +ICL | 0.00 / 28.0 / 4.45 | 0.59 / 14.7 / 0.00 | 0.00 / 23.3 / 0.00 | 0.00 / 34.2 / 0.00 | 0.40 / 65.6 / 0.00 |
| Qwen3.5-4B | 2.54 / 30.5 / 13.3 | 0.00 / 14.7 / 7.69 | 0.00 / 12.8 / 0.00 | 0.00 / 28.2 / 1.49 | 0.40 / 49.2 / 2.00 |
| +ICL | 3.18 / 29.2 / 17.1 | 2.95 / 23.6 / 5.91 | 0.00 / 20.8 / 0.00 | 0.46 / 34.7 / 6.70 | 2.00 / 65.2 / 2.40 |
| Qwen3.5-9B | 4.45 / 35.0 / 20.3 | 0.59 / 17.1 / 12.4 | 0.61 / 14.7 / 0.00 | 0.00 / 28.2 / 1.35 | 0.40 / 50.4 / 4.80 |
| +ICL | 2.54 / 35.0 / 10.1 | 1.77 / 19.5 / 14.2 | 0.00 / 25.7 / 0.61 | 0.00 / 25.4 / 7.15 | 2.40 / 62.8 / 2.80 |
| Eagle2.5-8B | 0.00 / 24.8 / 1.27 | 0.00 / 7.69 / 2.36 | 0.00 / 7.97 / 0.00 | 0.00 / 15.2 / 0.00 | 0.00 / 47.6 / 0.40 |
| +ICL | 0.00 / 25.4 / 0.00 | 0.00 / 12.4 / 0.00 | 0.00 / 20.8 / 0.00 | 0.00 / 28.2 / 0.44 | 0.00 / 61.6 / 0.00 |
| InternVL3-8B | 0.63 / 15.2 / 3.82 | 0.00 / 5.91 / 1.18 | 0.00 / 6.79 / 0.00 | 0.00 / 10.6 / 0.00 | 0.00 / 4.40 / 0.80 |
| +ICL | 0.00 / 15.9 / 0.00 | 0.00 / 7.10 / 0.00 | 0.00 / 7.36 / 0.00 | 0.00 / 25.4 / 0.00 | 0.00 / 7.60 / 0.00 |
| InternVL3-14B | 0.63 / 17.8 / 7.64 | 0.00 / 5.32 / 1.18 | 0.00 / 7.36 / 0.00 | 0.00 / 10.6 / 0.00 | 0.00 / 13.2 / 0.00 |
| +ICL | 0.00 / 15.2 / 0.00 | 0.00 / 8.59 / 0.00 | 0.00 / 8.64 / 0.00 | 0.00 / 14.5 / 0.49 | 0.00 / 19.0 / 0.00 |
| InternVL3.5-8B | 0.00 / 11.4 / 0.00 | 0.00 / 2.95 / 1.18 | 0.00 / 6.13 / 0.00 | 0.00 / 7.40 / 0.00 | 0.00 / 3.60 / 0.00 |
| +ICL | 0.00 / 5.09 / 1.27 | 0.00 / 7.81 / 2.36 | 0.00 / 3.68 / 0.00 | 0.46 / 18.0 / 2.69 | 0.40 / 4.00 / 0.00 |
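The three metrics in the table can be made concrete with a minimal sketch. It assumes the definitions commonly used in the STVG literature, which may differ in detail from the benchmark's own evaluation code: tIoU is the overlap of predicted and ground-truth frame sets, sIoU is the mean box IoU over ground-truth frames, vIoU is the summed box IoU over shared frames divided by the size of the frame union, and "@0.3" is the percentage of samples scoring at least 0.3. The function names, the per-frame dictionary format, and the `>=` threshold convention are illustrative assumptions.

```python
from typing import Dict, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), assumed format
Tube = Dict[int, Box]                    # frame index -> box, assumed format


def box_iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    area_a = max(0.0, a[2] - a[0]) * max(0.0, a[3] - a[1])
    area_b = max(0.0, b[2] - b[0]) * max(0.0, b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def t_iou(pred: Tube, gt: Tube) -> float:
    """Temporal IoU: shared frames over the union of frames."""
    union = set(pred) | set(gt)
    return len(set(pred) & set(gt)) / len(union) if union else 0.0


def s_iou(pred: Tube, gt: Tube) -> float:
    """Spatial IoU: mean box IoU over ground-truth frames.
    Ground-truth frames without a prediction contribute 0."""
    if not gt:
        return 0.0
    return sum(box_iou(pred[f], gt[f]) for f in gt if f in pred) / len(gt)


def v_iou(pred: Tube, gt: Tube) -> float:
    """Video IoU: box IoU summed over shared frames, divided by the
    number of frames in the union, so temporal errors are penalized."""
    union = set(pred) | set(gt)
    if not union:
        return 0.0
    shared = set(pred) & set(gt)
    return sum(box_iou(pred[f], gt[f]) for f in shared) / len(union)


def score_at(scores: Sequence[float], threshold: float = 0.3) -> float:
    """Percentage of samples whose score reaches the threshold."""
    if not scores:
        return 0.0
    return 100.0 * sum(s >= threshold for s in scores) / len(scores)


# Toy example: ground truth spans frames 0-3, prediction spans frames 2-5,
# and the predicted box covers the lower half of the ground-truth box.
gt = {f: (0.0, 0.0, 10.0, 10.0) for f in range(0, 4)}
pred = {f: (0.0, 0.0, 10.0, 5.0) for f in range(2, 6)}
print(t_iou(pred, gt), s_iou(pred, gt), v_iou(pred, gt))
```

In this toy case the per-frame box IoU is 0.5 on the two shared frames, giving tIoU = 2/6, sIoU = 1/4, and vIoU = 1/6. Under these assumed definitions vIoU can never exceed tIoU for the same sample, because it weights temporal overlap by spatial overlap. Note that in the table the three numbers come from separate tasks (STVG, TVG, SVG), so they are not necessarily computed from the same prediction.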