ByteDance: UI-TARS 7B
bytedance
UI-TARS-1.5 is a multimodal vision-language agent optimized for GUI-based environments, including desktop interfaces, web browsers, mobile systems, and games. Developed by ByteDance, it extends the UI-TARS framework with reinforcement...
- Context window: 128,000 tokens
- Input cost: $0.10 / 1M tokens
- Output cost: $0.20 / 1M tokens
- Latency (p50): not available
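The per-token prices above translate into a per-request cost by simple arithmetic. The sketch below is illustrative only: the helper names `estimate_cost_usd` and `fits_context` are hypothetical, and it assumes that input and output tokens share the 128,000-token context window, which the listing does not state. How screenshots or other images are counted as tokens is provider-specific and not modeled here.

```python
# Figures copied from the listing above (USD per 1M tokens).
INPUT_USD_PER_MILLION = 0.10
OUTPUT_USD_PER_MILLION = 0.20
CONTEXT_WINDOW_TOKENS = 128_000


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Hypothetical helper: estimated cost of a single request in USD."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    total = (
        input_tokens * INPUT_USD_PER_MILLION
        + output_tokens * OUTPUT_USD_PER_MILLION
    )
    return total / 1_000_000


def fits_context(input_tokens: int, output_tokens: int) -> bool:
    """Assumption: input and output tokens both count against the window."""
    return input_tokens + output_tokens <= CONTEXT_WINDOW_TOKENS


if __name__ == "__main__":
    # Example: a long GUI session prompt with a short action response.
    cost = estimate_cost_usd(100_000, 2_000)
    print(f"estimated cost: ${cost:.4f}")
    print(f"fits in context: {fits_context(100_000, 2_000)}")
```

For a request with 100,000 input tokens and 2,000 output tokens, this works out to $0.0100 for input plus $0.0004 for output, or about $0.0104 in total.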
