Meta: Llama 3.2 11B Vision Instruct
meta-llama
Llama 3.2 11B Vision is an 11-billion-parameter multimodal model designed for tasks that combine visual and textual data. It excels at tasks such as image captioning and...
- Context window: 131,072 tokens
- Input cost: $0.24 / 1M tokens
- Output cost: $0.24 / 1M tokens
- Latency (p50): not available
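As a rough illustration of how the listed prices translate into a per-request cost, here is a minimal sketch. The function names `estimate_cost_usd` and `fits_context` are hypothetical, and two things are assumptions not stated in the listing: that prices are billed linearly per token, and that the context window covers input and output tokens combined. Image inputs may be billed differently and are not modeled here.

```python
from decimal import Decimal

# Figures copied from the listing, in USD per 1M tokens.
INPUT_USD_PER_MILLION = Decimal("0.24")
OUTPUT_USD_PER_MILLION = Decimal("0.24")
CONTEXT_WINDOW_TOKENS = 131_072

TOKENS_PER_MILLION = Decimal(1_000_000)


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> Decimal:
    """Estimate the cost of one request, assuming linear per-token billing."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    input_cost = Decimal(input_tokens) * INPUT_USD_PER_MILLION / TOKENS_PER_MILLION
    output_cost = Decimal(output_tokens) * OUTPUT_USD_PER_MILLION / TOKENS_PER_MILLION
    return input_cost + output_cost


def fits_context(input_tokens: int, output_tokens: int) -> bool:
    """Check the request against the context window.

    Assumption: the window is shared by input and output tokens.
    """
    return input_tokens + output_tokens <= CONTEXT_WINDOW_TOKENS


if __name__ == "__main__":
    # 100,000 input tokens plus 2,000 output tokens.
    print(estimate_cost_usd(100_000, 2_000))  # 0.02448
    print(fits_context(100_000, 2_000))       # True
```

`Decimal` is used instead of floats so that small per-token amounts add up without binary rounding noise.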
