All Models
287 models · Page 2 of 8
DeepSeek R1 Distill Qwen 1.5B
Qwen3-VL-32B-Instruct
Qwen3-VL-32B-Instruct is a large-scale multimodal vision-language model designed for high-precision understanding and reasoning across text, images, and video. With 32 billion parameters, it combines deep visual perception with advanced text...
Arize AI Qwen 2 1.5B Instruct
Qwen/Qwen3-Max
Qwen3-Max is an updated release built on the Qwen3 series, offering major improvements in reasoning, instruction following, multilingual support, and long-tail knowledge coverage compared to the January 2025 version. It...
meta-llama/Llama-Guard-4-12B
Llama Guard 4 is a Llama 4 Scout-derived multimodal pretrained model, fine-tuned for content safety classification. Similar to previous versions, it can be used to classify content in both LLM...
Qwen/Qwen3-30B-A3B
Qwen3, the latest generation in the Qwen large language model series, features both dense and mixture-of-experts (MoE) architectures to excel in reasoning, multilingual support, and advanced agent tasks. Its unique...
Qwen/Qwen3-32B
Qwen3-32B is a dense 32.8B parameter causal language model from the Qwen3 series, optimized for both complex reasoning and efficient dialogue. It supports seamless switching between a "thinking" mode for...
Qwen/Qwen3-235B-A22B-Thinking-2507
Qwen3-235B-A22B-Thinking-2507 is a high-performance, open-weight Mixture-of-Experts (MoE) language model optimized for complex reasoning tasks. It activates 22B of its 235B parameters per forward pass and natively supports up to 262,144...
mistralai/Mistral-Small-24B-Instruct-2501
Mistral Small 3 is a 24B-parameter language model optimized for low-latency performance across common AI tasks. Released under the Apache 2.0 license, it features both pre-trained and instruction-tuned versions designed...
Qwen/Qwen3-VL-30B-A3B-Instruct
Qwen3-VL-30B-A3B-Instruct is a multimodal model that unifies strong text generation with visual understanding for images and videos. Its Instruct variant optimizes instruction-following for general multimodal tasks. It excels in perception...
microsoft/phi-4
Microsoft Phi-4 14B – small language model achieving state-of-the-art results on reasoning tasks.
Qwen/Qwen3-VL-235B-A22B-Instruct
Qwen3-VL-235B-A22B Instruct is an open-weight multimodal model that unifies strong text generation with visual understanding across images and video. The Instruct model targets general vision-language use (VQA, document parsing, chart/table...
Gryphe/MythoMax-L2-13b
One of the highest performing and most popular fine-tunes of Llama 2 13B, with rich descriptions and roleplay. #merge
Qwen/Qwen3-Max-Thinking
Qwen3-Max-Thinking is the flagship reasoning model in the Qwen3 series, designed for high-stakes cognitive tasks that require deep, multi-step reasoning. By significantly scaling model capacity and reinforcement learning compute, it...
Qwen/Qwen3-14B
Qwen3-14B is a dense 14.8B parameter causal language model from the Qwen3 series, designed for both complex reasoning and efficient dialogue. It supports seamless switching between a "thinking" mode for...
codestral-2508
Our cutting-edge language model for coding, released August 2025.
openai/gpt-oss-safeguard-20b
gpt-oss-safeguard-20b is a safety reasoning model from OpenAI built upon gpt-oss-20b. This open-weight, 21B-parameter Mixture-of-Experts (MoE) model offers lower latency for safety tasks like content classification, LLM filtering, and trust...
mistral-embed-2312
The official mistral-embed-2312 embedding model from Mistral AI.
mistral-small-2603
Mistral Small 4.
kimi-k2-thinking
Kimi K2 Thinking is the latest and most capable version of Moonshot AI's open-source thinking model.
qwen3-coder-next
qwen3-coder-next – available to run locally via Ollama on CPU and GPU hardware.
kimi-k2-thinking
kimi-k2-thinking – available to run locally via Ollama on CPU and GPU hardware.
minimax-m2
minimax-m2 – available to run locally via Ollama on CPU and GPU hardware.
Llama-3.1-70B-Instruct
Open-source Llama-3.1-70B-Instruct model from meta-llama – available for download and self-hosting on Hugging Face.
gemini-3-flash-preview
gemini-3-flash-preview – available to run locally via Ollama on CPU and GPU hardware.
Llama-3.2-1B-Instruct
Open-source Llama-3.2-1B-Instruct model from meta-llama – available for download and self-hosting on Hugging Face.
Llama-3.1-8B-Instruct
Open-source Llama-3.1-8B-Instruct model from meta-llama – available for download and self-hosting on Hugging Face.
Qwen3-30B-A3B-Instruct-2507
Open-source Qwen3-30B-A3B-Instruct-2507 model from qwen – available for download and self-hosting on Hugging Face.
Qwen3-Coder-30B-A3B-Instruct
Open-source Qwen3-Coder-30B-A3B-Instruct model from qwen – available for download and self-hosting on Hugging Face.
Qwen3-30B-A3B
Open-source Qwen3-30B-A3B model from qwen – available for download and self-hosting on Hugging Face.
Qwen3-14B
Open-source Qwen3-14B model from qwen – available for download and self-hosting on Hugging Face.
Qwen3-32B
Open-source Qwen3-32B model from qwen – available for download and self-hosting on Hugging Face.
Qwen3-8B
Open-source Qwen3-8B model from qwen – available for download and self-hosting on Hugging Face.
AI21 Jamba 1.6 Mini
AI21 Jamba 1.6 Mini is a lightweight Mamba-Transformer hybrid optimized for cost-effective, high-throughput inference with an impressive 256K context window. An excellent choice for document-heavy workloads on a budget.
AI21 Jamba 1.6 Large
AI21 Jamba 1.6 Large uses a hybrid Mamba-Transformer architecture offering low memory footprint and high throughput compared to equivalent Transformer models. Features 256K context at a fraction of the inference cost.
Microsoft Phi-4 Mini
Microsoft Phi-4 Mini is a 3.8B parameter compact model from Microsoft. Delivers impressive reasoning capabilities for edge and mobile deployment scenarios, with strong performance on math and coding tasks relative to its size.
