All Models
449 models · Page 1 of 13
stable-diffusion-v1-5-img2img
Stable Diffusion is a latent text-to-image diffusion model capable of generating photo-realistic images. Img2img generates a new image from an input image with Stable Diffusion.
uform-gen2-qwen-500m
UForm-Gen is a small generative vision-language model primarily designed for image captioning and visual question answering. The model was pre-trained on an internal image captioning dataset and fine-tuned on public instruction datasets: SVIT, LVIS, and VQA datasets.
flux-2-dev
FLUX.2 [dev] is an image model from Black Forest Labs that generates highly realistic and detailed images, with multi-reference support.
gemma-3-12b-it
Gemma 3 models are well-suited for a variety of text generation and image understanding tasks, including question answering, summarization, and reasoning. They are multimodal, accepting text and image input and generating text output, with a large 128K context window and multilingual support in over 140 languages, and are available in more sizes than previous versions.
llama-3.2-11b-vision-instruct
The Llama 3.2-Vision instruction-tuned models are optimized for visual recognition, image reasoning, captioning, and answering general questions about an image.
llama-4-scout-17b-16e-instruct
Meta's Llama 4 Scout is a 17 billion parameter model with 16 experts that is natively multimodal. These models leverage a mixture-of-experts architecture to offer industry-leading performance in text and image understanding.
lucid-origin
Lucid Origin from Leonardo.AI is their most adaptable and prompt-responsive model to date. Whether you're generating images with sharp graphic design, stunning full-HD renders, or highly specific creative direction, it adheres closely to your prompts, renders text with accuracy, and supports a wide array of visual styles and aesthetics, from stylized concept art to crisp product mockups.
mistral-small-3.1-24b-instruct
Building upon Mistral Small 3 (2501), Mistral Small 3.1 (2503) adds state-of-the-art vision understanding and enhances long context capabilities up to 128k tokens without compromising text performance. With 24 billion parameters, this model achieves top-tier capabilities in both text and vision tasks.
kimi-k2.5
Kimi K2.5 is a frontier-scale open-source model with a 256k context window, multi-turn tool calling, vision inputs, and structured outputs for agentic workloads.
flux-1-schnell
FLUX.1 [schnell] is a 12 billion parameter rectified flow transformer capable of generating images from text descriptions.
stable-diffusion-xl-lightning
SDXL-Lightning is a lightning-fast text-to-image generation model. It can generate high-quality 1024px images in a few steps.
flux-2-klein-4b
FLUX.2 [klein] is an ultra-fast, distilled image model. It unifies image generation and editing in a single model, delivering state-of-the-art quality and enabling interactive workflows, real-time previews, and latency-critical applications.
phoenix-1.0
Phoenix 1.0 is a model by Leonardo.Ai that generates images with exceptional prompt adherence and coherent text.
dreamshaper-8-lcm
Stable Diffusion model that has been fine-tuned to be better at photorealism without sacrificing range.
stable-diffusion-xl-base-1.0
Diffusion-based text-to-image generative model by Stability AI. Generates and modifies images based on text prompts.
stable-diffusion-v1-5-inpainting
Stable Diffusion Inpainting is a latent text-to-image diffusion model capable of generating photo-realistic images given any text input, with the extra capability of inpainting the pictures by using a mask.
kimi-k2.6
Kimi K2.6 is a frontier-scale open-source 1T parameter model with a 262.1k context window, multi-turn tool calling, vision inputs, and structured outputs for agentic workloads.
flux-2-klein-9b
FLUX.2 [klein] 9B is a 9 billion parameter model that can generate images from text descriptions and supports multi-reference editing capabilities.
llava-1.5-7b-hf
LLaVA is an open-source chatbot trained by fine-tuning LLaMA/Vicuna on GPT-generated multimodal instruction-following data. It is an auto-regressive language model, based on the transformer architecture.
Stable Image Control Structure
Stable Image Control Structure - available via AWS Bedrock (us-east-1).
Nova Premier
Nova Premier - available via AWS Bedrock (us-east-1).
Stable Image Style Transfer
Stable Image Style Transfer - available via AWS Bedrock (us-east-1).
Titan Multimodal Embeddings G1
Titan Multimodal Embeddings G1 - available via AWS Bedrock (us-east-1).
Claude 3 Haiku
Claude 3 Haiku - available via AWS Bedrock (us-east-1).
Claude Sonnet 4.5
Claude Sonnet 4.5 - available via AWS Bedrock (us-east-1).
Gemma 3 4B IT
Gemma 3 4B IT - available via AWS Bedrock (us-east-1).
Pixtral Large (25.02)
Pixtral Large (25.02) - available via AWS Bedrock (us-east-1).
Nova Reel
Nova Reel - available via AWS Bedrock (us-east-1).
Claude Opus 4.7
Claude Opus 4.7 - available via AWS Bedrock (us-east-1).
Nova Canvas
Nova Canvas - available via AWS Bedrock (us-east-1).
Llama 4 Maverick 17B Instruct
Llama 4 Maverick 17B Instruct - available via AWS Bedrock (us-east-1).
