Vision & Multimodal
6 models
llama-3.2-11b-vision-instruct
meta
The Llama 3.2-Vision instruction-tuned models are optimized for visual recognition, image reasoning, captioning, and answering general questions about an image.
text, vision, reasoning
128,000 ctx · $0.05/1M in
llama-4-scout-17b-16e-instruct
meta
Meta's Llama 4 Scout is a natively multimodal model with 17 billion active parameters and 16 experts. It uses a mixture-of-experts architecture to offer industry-leading performance in text and image understanding.
text, vision, instruct
131,000 ctx · $0.27/1M in
Llama 4 Maverick 17B Instruct
meta
Llama 4 Maverick 17B Instruct, available via AWS Bedrock (us-east-1).
text, vision, multimodal
ctx · Free in
Llama 3.2 11B Instruct
meta
Llama 3.2 11B Instruct, available via AWS Bedrock (us-east-1).
text, vision, multimodal
ctx · Free in
Llama 3.2 90B Instruct
meta
Llama 3.2 90B Instruct, available via AWS Bedrock (us-east-1).
text, vision, multimodal
ctx · Free in
Llama 4 Scout 17B Instruct
meta
Llama 4 Scout 17B Instruct, available via AWS Bedrock (us-east-1).
text, vision, multimodal
ctx · Free in
