Xiaomi: MiMo-V2-Omni
xiaomi
MiMo-V2-Omni is a frontier omni-modal model that natively processes image, video, and audio inputs within a unified architecture. It combines strong multimodal perception with agentic capability: visual grounding, multi-step...
- Context window: 262,144 tokens
- Input cost: $0.40 / 1M tokens
- Output cost: $2.00 / 1M tokens
- Latency (p50): not listed
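
As a rough illustration of how the listed prices translate into a per-request cost, the sketch below multiplies token counts by the per-million rates. The helper name `estimate_cost_usd` is hypothetical, and the check that input and output tokens together must fit in the context window is an assumption, not something the listing states.

```python
# Illustrative cost estimate based on the listed prices (not an official SDK).
INPUT_USD_PER_MTOK = 0.40    # listed input price per 1M tokens
OUTPUT_USD_PER_MTOK = 2.00   # listed output price per 1M tokens
CONTEXT_WINDOW = 262_144     # listed context window, in tokens


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Return the estimated USD cost of one request (hypothetical helper)."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    # Assumption: input and output tokens share the same context window.
    if input_tokens + output_tokens > CONTEXT_WINDOW:
        raise ValueError("request exceeds the listed context window")
    return (
        input_tokens * INPUT_USD_PER_MTOK
        + output_tokens * OUTPUT_USD_PER_MTOK
    ) / 1_000_000


# Example: 100,000 input tokens and 2,000 output tokens.
print(f"${estimate_cost_usd(100_000, 2_000):.4f}")  # prints $0.0440
```

Under these prices a long prompt is cheap relative to generated text: output tokens cost five times as much as input tokens, so the 2,000 output tokens in the example account for about 9% of the total despite being about 2% of the tokens.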
