All Models
408 models · Page 2 of 12
Kimi K2 Thinking
Kimi K2 Thinking is Moonshot AI's most advanced open reasoning model to date, extending the K2 series into agentic, long-horizon reasoning. Built on the trillion-parameter Mixture-of-Experts (MoE) architecture introduced in...
Deepseek V3.2
Qwen3.5 9B Fp8
Nvidia Nemotron 3 Super 120B A12b Bf16
Meta Llama 3.2 1B Instruct
Llama 4 Maverick 17B 128E
Facebook CWM
Holo3 35B A3b
Qwen2.5 32B
Qwen2.5 32B — Alibaba's Qwen series language model with strong multilingual and coding capabilities.
Qwen3 235B A22B Thinking 2507 FP8
Qwen3-235B-A22B-Thinking-2507 is a high-performance, open-weight Mixture-of-Experts (MoE) language model optimized for complex reasoning tasks. It activates 22B of its 235B parameters per forward pass and natively supports up to 262,144...
Qwen3-VL-8B-Instruct
Qwen3-VL-8B-Instruct is a multimodal vision-language model from the Qwen3-VL series, built for high-fidelity understanding and reasoning across text, images, and video. It features improved multimodal fusion with Interleaved-MRoPE for long-horizon...
Cogito V1 Preview Llama 8B
Cogito V1 Preview Llama 8B — Meta's Llama open-source language model, one of the most widely deployed open models.
DeepSeek R1 Distill Qwen 1.5B
Cogito V1 Preview Qwen 32B
Deepseek V3
DeepSeek R1 Distill Qwen 14B
DeepSeek R1 Distill Qwen 7B
Cogito V1 Preview Qwen 14B
Deepseek V3.1 Base
Cogito V1 Preview Llama 70B
DeepSeek R1 Distill Llama 70B
GLM 5 Fp4
Cogito V1 Preview Llama 70B Turbo
Qwen3 Next 80B A3b Thinking
Qwen3-Next-80B-A3B-Thinking is a reasoning-first chat model in the Qwen3-Next line that outputs structured "thinking" traces by default. It's designed for hard multi-step problems: math proofs, code synthesis/debugging, logic, and agentic...
Llama Guard 4 12B
Llama Guard 4 is a Llama 4 Scout-derived multimodal pretrained model, fine-tuned for content safety classification. Similar to previous versions, it can be used to classify content in both LLM...
Qwen3-VL-32B-Instruct
Qwen3-VL-32B-Instruct is a large-scale multimodal vision-language model designed for high-precision understanding and reasoning across text, images, and video. With 32 billion parameters, it combines deep visual perception with advanced text...
Meta Llama 3.1 70B Instruct Turbo
GLM 4.7 Fp8
Qwen3 Coder 30B A3b Instruct
Qwen3-Coder-30B-A3B-Instruct is a 30.5B parameter Mixture-of-Experts (MoE) model with 128 experts (8 active per forward pass), designed for advanced code generation, repository-scale understanding, and agentic tool use. Built on the...
Cogito v2.1 671B
google/gemma-4-26B-A4B-it
Gemma 4 26B A4B IT is an instruction-tuned Mixture-of-Experts (MoE) model from Google DeepMind. Despite 25.2B total parameters, only 3.8B activate per token during inference — delivering near-31B quality at...
Qwen/Qwen3-Max
Qwen3-Max is an updated release built on the Qwen3 series, offering major improvements in reasoning, instruction following, multilingual support, and long-tail knowledge coverage compared to the January 2025 version. It...
Qwen/Qwen3-VL-235B-A22B-Instruct
Qwen3-VL-235B-A22B Instruct is an open-weight multimodal model that unifies strong text generation with visual understanding across images and video. The Instruct model targets general vision-language use (VQA, document parsing, chart/table...
openai/gpt-oss-120b
gpt-oss-120b is an open-weight, 117B-parameter Mixture-of-Experts (MoE) language model from OpenAI designed for high-reasoning, agentic, and general-purpose production use cases. It activates 5.1B parameters per forward pass and is optimized...
nvidia/Nemotron-3-Nano-30B-A3B
NVIDIA Nemotron 3 Nano 30B A3B is a small Mixture-of-Experts (MoE) language model offering high compute efficiency and accuracy for developers building specialized agentic AI systems. The model is fully...
