AI Model Catalogue

llama-3.2-11b-vision-instruct

uform-gen2-qwen-500m

unum

UForm-Gen is a small generative vision-language model primarily designed for Image Captioning and Visual Question Answering. The model was pre-trained on the internal image captioning dataset and fine-tuned on public instructions datasets: SVIT, LVIS, VQAs datasets.

textvisionimage

Output$0.0000/1M

flux-2-dev

FLUX.2 [dev] is an image model from Black Forest Labs where you can generate highly realistic and detailed images, with multi-reference support.

stable-diffusion-v1-5-img2img

runwayml

Stable Diffusion is a latent text-to-image diffusion model capable of generating photo-realistic images. Img2img generate a new image from an input image with Stable Diffusion.

lucid-origin

leonardo

Lucid Origin from Leonardo.AI is their most adaptable and prompt-responsive model to date. Whether you're generating images with sharp graphic design, stunning full-HD renders, or highly specific creative direction, it adheres closely to your prompts, renders text with accuracy, and supports a wide array of visual styles and aesthetics – from stylized concept art to crisp product mockups.

⚡81msp50

llama-4-scout-17b-16e-instruct

gemma-3-12b-it

google

Gemma 3 models are well-suited for a variety of text generation and image understanding tasks, including question answering, summarization, and reasoning. Gemma 3 models are multimodal, handling text and image input and generating text output, with a large, 128K context window, multilingual support in over 140 languages, and is available in more sizes than previous versions.

flux-2-klein-9b

FLUX.2 [klein] 9B is a 9 billion parameter model that can generate images from text descriptions and supports multi-reference editing capabilities.

⚡73msp50

kimi-k2.5

moonshotai

Kimi K2.5 is a frontier-scale open-source model with a 256k context window, multi-turn tool calling, vision inputs, and structured outputs for agentic workloads.

llava-1.5-7b-hf

llava-hf

LLaVA is an open-source chatbot trained by fine-tuning LLaMA/Vicuna on GPT-generated multimodal instruction-following data. It is an auto-regressive language model, based on the transformer architecture.

flux-1-schnell

FLUX.1 [schnell] is a 12 billion parameter rectified flow transformer capable of generating images from text descriptions.

stable-diffusion-xl-lightning

bytedance

SDXL-Lightning is a lightning-fast text-to-image generation model. It can generate high-quality 1024px images in a few steps.

dreamshaper-8-lcm

lykon

Stable Diffusion model that has been fine-tuned to be better at photorealism without sacrificing range.

⚡69msp50

phoenix-1.0

leonardo

Phoenix 1.0 is a model by Leonardo.Ai that generates images with exceptional prompt adherence and coherent text.

⚡570msp50

stable-diffusion-xl-base-1.0

stabilityai

Diffusion-based text-to-image generative model by Stability AI. Generates and modify images based on text prompts.

flux-2-klein-4b

FLUX.2 [klein] is an ultra-fast, distilled image model. It unifies image generation and editing in a single model, delivering state-of-the-art quality enabling interactive workflows, real-time previews, and latency-critical applications.

kimi-k2.6

moonshotai

Kimi K2.6 is a frontier-scale open-source 1T parameter model with a 262.1k context window, multi-turn tool calling, vision inputs, and structured outputs for agentic workloads.

stable-diffusion-v1-5-inpainting

runwayml

Stable Diffusion Inpainting is a latent text-to-image diffusion model capable of generating photo-realistic images given any text input, with the extra capability of inpainting the pictures by using a mask.