modelstop.top

AI Model Catalogue

Browse 449 models across providers, modalities, and use cases.

All Models

449 models · Page 1 of 13

stable-diffusion-v1-5-img2img

runwayml

Stable Diffusion is a latent text-to-image diffusion model capable of generating photorealistic images. Img2img generates a new image from an input image using Stable Diffusion.

vision · image · free
Context: not listed · Input: $0.00/1M tokens
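
In img2img, the input image is partially noised and then denoised, so only the tail of the sampling schedule runs. The sketch below shows one common way a `strength` value in [0, 1] can be mapped to the number of denoising steps actually executed. The function name `img2img_schedule` and the rounding rule are assumptions modeled on typical diffusion pipelines, not this provider's API.

```python
def img2img_schedule(num_inference_steps: int, strength: float) -> tuple[int, int]:
    """Return (start_index, steps_run) for a hypothetical img2img sampler.

    strength=0.0 leaves the input image unchanged (no denoising steps);
    strength=1.0 noises it fully and runs the whole schedule, like text-to-image.
    """
    if not 0.0 <= strength <= 1.0:
        raise ValueError("strength must be in [0, 1]")
    # Number of schedule steps whose noise level is applied to the input image.
    init_steps = min(int(num_inference_steps * strength), num_inference_steps)
    # Skip the noisiest part of the schedule and start denoising from here.
    start_index = max(num_inference_steps - init_steps, 0)
    return start_index, num_inference_steps - start_index
```

For example, with 50 steps and a strength of 0.75, this sketch skips the first 13 steps and runs the remaining 37, so lower strengths preserve more of the input image.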

uform-gen2-qwen-500m

unum

UForm-Gen is a small generative vision-language model designed primarily for image captioning and visual question answering. The model was pre-trained on an internal image-captioning dataset and fine-tuned on public instruction datasets: SVIT, LVIS, and VQA datasets.

text · vision · image
Context: not listed · Input: $0.00/1M tokens

flux-2-dev

black-forest-labs

FLUX.2 [dev] is an image model from Black Forest Labs that generates highly realistic, detailed images and supports multiple reference images.

vision · image · free
Context: not listed · Input: $0.00/1M tokens

gemma-3-12b-it

google

Gemma 3 models are well suited to a variety of text generation and image understanding tasks, including question answering, summarization, and reasoning. They are multimodal, accepting text and image input and generating text output, and they offer a large 128K context window, support for over 140 languages, and more model sizes than previous versions.

text · vision · reasoning
Context: 80,000 tokens · Input: $0.35/1M tokens
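
Listings quote prices per 1M input tokens alongside a context size, so estimating a request is simple arithmetic. The sketch below computes input-token cost and checks whether a request fits the window. It assumes that the prompt and the generated output share one context window, and it ignores output-token pricing, which this page does not list. Both helper names are hypothetical.

```python
def input_cost_usd(prompt_tokens: int, price_per_million: float) -> float:
    """Input-token cost for a price quoted as dollars per 1M input tokens."""
    return prompt_tokens / 1_000_000 * price_per_million


def fits_context(prompt_tokens: int, max_output_tokens: int, context_window: int) -> bool:
    """Assumption: prompt and generated tokens are counted against one shared window."""
    return prompt_tokens + max_output_tokens <= context_window
```

At the listed $0.35 per 1M input tokens for gemma-3-12b-it, a 50,000-token prompt costs about $0.0175 in input tokens. Under the shared-window assumption, it fits the listed 80,000-token context only if the requested output is at most 30,000 tokens.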

llama-3.2-11b-vision-instruct

meta

The Llama 3.2-Vision instruction-tuned models are optimized for visual recognition, image reasoning, captioning, and answering general questions about an image.

text · vision · reasoning
Context: 128,000 tokens · Input: $0.05/1M tokens

llama-4-scout-17b-16e-instruct

meta

Meta's Llama 4 Scout is a natively multimodal model with 17 billion active parameters and 16 experts. It uses a mixture-of-experts architecture, which Meta says delivers industry-leading performance in text and image understanding.

text · vision · instruct
Context: 131,000 tokens · Input: $0.27/1M tokens
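
The listing does not describe Llama 4 Scout's router, so the sketch below shows a generic top-k mixture-of-experts step instead. A gate scores every expert, only the k highest-scoring experts run, and their outputs are combined with softmax weights renormalized over the selected experts. The scalar input, the toy experts, and the gate logits are hypothetical illustrations, not Meta's implementation.

```python
import math
from typing import Callable, Sequence

Expert = Callable[[float], float]


def moe_forward(x: float, gate_logits: Sequence[float], experts: Sequence[Expert], k: int = 1) -> float:
    """Route input x to the top-k experts and mix their outputs (generic MoE sketch)."""
    if len(gate_logits) != len(experts):
        raise ValueError("one gate logit per expert is required")
    # Indices of the k largest gate logits (ties keep the lower index first).
    top = sorted(range(len(experts)), key=lambda i: gate_logits[i], reverse=True)[:k]
    # Softmax over the selected logits only; subtract the max for numerical stability.
    m = max(gate_logits[i] for i in top)
    weights = [math.exp(gate_logits[i] - m) for i in top]
    total = sum(weights)
    # Only the selected experts are evaluated; unselected experts cost no compute.
    return sum(w / total * experts[i](x) for w, i in zip(weights, top))


# 16 toy experts: expert i multiplies its input by i.
experts = [lambda x, i=i: i * x for i in range(16)]
logits = [0.0] * 16
logits[3], logits[7] = 2.0, 2.0  # the gate prefers experts 3 and 7 equally
```

With `k=2`, `moe_forward(1.0, logits, experts, k=2)` evaluates only experts 3 and 7 and weights them equally, giving 5.0. This is why total parameter count can be much larger than the compute spent on each token.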

lucid-origin

leonardo

Lucid Origin is Leonardo.AI's most adaptable and prompt-responsive model to date. Whether you are generating images with sharp graphic design, stunning full-HD renders, or highly specific creative direction, it adheres closely to your prompts, renders text accurately, and supports a wide range of visual styles and aesthetics, from stylized concept art to crisp product mockups.

vision · image · free
Context: not listed · Input: $0.00/1M tokens

mistral-small-3.1-24b-instruct

mistralai

Building on Mistral Small 3 (2501), Mistral Small 3.1 (2503) adds state-of-the-art vision understanding and extends long-context capability to 128k tokens without compromising text performance. With 24 billion parameters, it achieves top-tier results on both text and vision tasks.

text · vision · instruct
Context: 128,000 tokens · Input: $0.35/1M tokens

kimi-k2.5

moonshotai

Kimi K2.5 is a frontier-scale open-source model with a 256k context window, multi-turn tool calling, vision inputs, and structured outputs for agentic workloads.

text · vision · agents
Context: 256,000 tokens · Input: $0.60/1M tokens

flux-1-schnell

black-forest-labs

FLUX.1 [schnell] is a 12-billion-parameter rectified flow transformer capable of generating images from text descriptions.

vision · image · free
Context: not listed · Input: $0.00/1M tokens
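
A rectified flow model learns a velocity field that carries noise to data along ideally straight paths, and sampling integrates that field with an ODE solver. The toy sketch below replaces the trained transformer with an oracle velocity for one known 1-D target, so it only shows why straight paths tolerate very few Euler steps. A trained model's field is only approximately straight. All names and the 1-D setting are assumptions for illustration.

```python
from typing import Callable

Velocity = Callable[[float, float], float]


def euler_sample(velocity: Velocity, x0: float, num_steps: int) -> float:
    """Integrate dx/dt = velocity(x, t) from t=0 (noise) to t=1 (data) with Euler steps."""
    x, dt = x0, 1.0 / num_steps
    for step in range(num_steps):
        t = step * dt  # t stays below 1, so the oracle below never divides by zero
        x = x + dt * velocity(x, t)
    return x


def make_oracle_velocity(target: float) -> Velocity:
    """Exact velocity for the straight path x_t = (1 - t) * noise + t * target.

    Along that path the velocity is constant: target - noise = (target - x_t) / (1 - t).
    """
    def v(x: float, t: float) -> float:
        return (target - x) / (1.0 - t)
    return v
```

Because the oracle path is a straight line, a single Euler step from any starting noise value already lands on the target (up to floating-point rounding). Curved paths would need more steps for the same accuracy.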

stable-diffusion-xl-lightning

bytedance

SDXL-Lightning is a lightning-fast text-to-image generation model. It can generate high-quality 1024px images in a few steps.

vision · image · free
Context: not listed · Input: $0.00/1M tokens

flux-2-klein-4b

black-forest-labs

FLUX.2 [klein] is an ultra-fast, distilled image model. It unifies image generation and editing in a single model and delivers state-of-the-art quality while enabling interactive workflows, real-time previews, and latency-critical applications.

vision · image · free
Context: not listed · Input: $0.00/1M tokens

phoenix-1.0

leonardo

Phoenix 1.0 is a model by Leonardo.Ai that generates images with exceptional prompt adherence and coherent text.

vision · image · free
Context: not listed · Input: $0.00/1M tokens

dreamshaper-8-lcm

lykon

A Stable Diffusion model fine-tuned to improve photorealism without sacrificing range.

vision · image · free
Context: not listed · Input: $0.00/1M tokens

stable-diffusion-xl-base-1.0

stabilityai

A diffusion-based text-to-image generative model by Stability AI that generates and modifies images based on text prompts.

vision · image · free
Context: not listed · Input: $0.00/1M tokens

stable-diffusion-v1-5-inpainting

runwayml

Stable Diffusion Inpainting is a latent text-to-image diffusion model capable of generating photo-realistic images given any text input, with the extra capability of inpainting the pictures by using a mask.

vision · image · free
Context: not listed · Input: $0.00/1M tokens
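
Mask-based inpainting regenerates only the masked region and keeps the rest of the input image. The sketch below shows just a per-pixel compositing step that combines an original and a generated image through a mask (1 = regenerate, 0 = keep, fractional values blend). Images are represented as nested lists of grayscale values. Real inpainting models typically also condition the denoiser on the mask itself, which is not shown. The function and type names are hypothetical.

```python
from typing import List

Image = List[List[float]]


def composite(original: Image, generated: Image, mask: Image) -> Image:
    """Blend per pixel: mask=1 takes the generated pixel, mask=0 keeps the original."""
    if not (len(original) == len(generated) == len(mask)):
        raise ValueError("images and mask must have the same height")
    out: Image = []
    for row_o, row_g, row_m in zip(original, generated, mask):
        if not (len(row_o) == len(row_g) == len(row_m)):
            raise ValueError("images and mask must have the same width")
        out.append([m * g + (1.0 - m) * o for o, g, m in zip(row_o, row_g, row_m)])
    return out
```

For a 2×2 image with mask `[[0, 1], [1, 0]]`, the top-right and bottom-left pixels come from the generated image and the other two keep their original values.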

kimi-k2.6

moonshotai

Kimi K2.6 is a frontier-scale, open-source 1T-parameter model with a 262.1k context window, multi-turn tool calling, vision inputs, and structured outputs for agentic workloads.

text · vision · agents
Context: 262,144 tokens · Input: $0.95/1M tokens

flux-2-klein-9b

black-forest-labs

FLUX.2 [klein] 9B is a 9-billion-parameter model that generates images from text descriptions and supports multi-reference editing.

vision · image · free
Context: not listed · Input: $0.00/1M tokens

llava-1.5-7b-hf

llava-hf

LLaVA is an open-source chatbot trained by fine-tuning LLaMA/Vicuna on GPT-generated multimodal instruction-following data. It is an auto-regressive language model, based on the transformer architecture.

text · vision · image
Context: not listed · Input: $0.00/1M tokens
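
"Auto-regressive" means the model produces its answer one token at a time, feeding each chosen token back in as input for the next step. The sketch below is a generic greedy decoding loop, not LLaVA's actual generation code. `toy_model` is a hypothetical lookup table that stands in for the language model, and `<image>` is only a placeholder for the visual tokens a real vision-language model would condition on.

```python
from typing import Callable, Dict, List

# Hypothetical stand-in for a language model: maps the token sequence so far
# to a score for each candidate next token.
NextTokenScores = Callable[[List[str]], Dict[str, float]]


def greedy_decode(model: NextTokenScores, prompt: List[str], eos: str = "</s>",
                  max_new_tokens: int = 20) -> List[str]:
    """Generate one token at a time, appending each choice to the input."""
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        scores = model(tokens)
        next_token = max(scores, key=scores.get)  # greedy: pick the highest score
        if next_token == eos:
            break
        tokens.append(next_token)
    return tokens[len(prompt):]  # only the newly generated tokens


def toy_model(tokens: List[str]) -> Dict[str, float]:
    """Toy table conditioned only on the last token (a real model sees the whole sequence)."""
    table = {
        "<image>": {"a": 0.9, "</s>": 0.1},
        "a": {"cat": 0.8, "</s>": 0.2},
        "cat": {"</s>": 0.7, "sleeps": 0.3},
    }
    return table.get(tokens[-1], {"</s>": 1.0})
```

Calling `greedy_decode(toy_model, ["<image>"])` yields `["a", "cat"]`. Generation stops because the end-of-sequence token outscores `"sleeps"` after `"cat"`.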

Stable Image Control Structure

stability

Stable Image Control Structure is available via AWS Bedrock (us-east-1).

text · vision · multimodal
Context: not listed · Input: free

Nova Premier

amazon

Nova Premier is available via AWS Bedrock (us-east-1).

text · vision · multimodal
Context: not listed · Input: free

Stable Image Style Transfer

stability

Stable Image Style Transfer is available via AWS Bedrock (us-east-1).

text · vision · multimodal
Context: not listed · Input: free

Titan Multimodal Embeddings G1

amazon

Titan Multimodal Embeddings G1 is available via AWS Bedrock (us-east-1).

text · vision · multimodal
Context: not listed · Input: free

Claude 3 Haiku

anthropic

Claude 3 Haiku is available via AWS Bedrock (us-east-1).

text · vision · multimodal
Context: not listed · Input: free

Claude Sonnet 4.5

anthropic

Claude Sonnet 4.5 is available via AWS Bedrock (us-east-1).

text · vision · multimodal
Context: not listed · Input: free

Gemma 3 4B IT

google

Gemma 3 4B IT is available via AWS Bedrock (us-east-1).

text · vision · multimodal
Context: 131,072 tokens · Input: free

Pixtral Large (25.02)

mistral

Pixtral Large (25.02) is available via AWS Bedrock (us-east-1).

text · vision · multimodal
Context: not listed · Input: free

Nova Reel

amazon

Nova Reel is available via AWS Bedrock (us-east-1).

text · vision · multimodal
Context: not listed · Input: free

Claude Opus 4.7

anthropic

Claude Opus 4.7 is available via AWS Bedrock (us-east-1).

text · vision · multimodal
Context: not listed · Input: free

Nova Canvas

amazon

Nova Canvas is available via AWS Bedrock (us-east-1).

text · vision · multimodal
Context: not listed · Input: free

Llama 4 Maverick 17B Instruct

meta

Llama 4 Maverick 17B Instruct is available via AWS Bedrock (us-east-1).

text · vision · multimodal
Context: not listed · Input: free