All Models
287 models · Page 2 of 8
DeepSeek R1 Distill Qwen 1.5B
Qwen3-VL-32B-Instruct
Qwen3-VL-32B-Instruct is a large-scale multimodal vision-language model designed for high-precision understanding and reasoning across text, images, and video. With 32 billion parameters, it combines deep visual perception with advanced text...
Arize AI Qwen 2 1.5B Instruct
Qwen/Qwen3-Max
Qwen3-Max is an updated release built on the Qwen3 series, offering major improvements in reasoning, instruction following, multilingual support, and long-tail knowledge coverage compared to the January 2025 version. It...
meta-llama/Llama-Guard-4-12B
Llama Guard 4 is a Llama 4 Scout-derived multimodal pretrained model, fine-tuned for content safety classification. Similar to previous versions, it can be used to classify content in both LLM...
Qwen/Qwen3-30B-A3B
Qwen3, the latest generation in the Qwen large language model series, features both dense and mixture-of-experts (MoE) architectures to excel in reasoning, multilingual support, and advanced agent tasks. Its unique...
Qwen/Qwen3-32B
Qwen3-32B is a dense 32.8B parameter causal language model from the Qwen3 series, optimized for both complex reasoning and efficient dialogue. It supports seamless switching between a "thinking" mode for...
Qwen/Qwen3-235B-A22B-Thinking-2507
Qwen3-235B-A22B-Thinking-2507 is a high-performance, open-weight Mixture-of-Experts (MoE) language model optimized for complex reasoning tasks. It activates 22B of its 235B parameters per forward pass and natively supports up to 262,144...
mistralai/Mistral-Small-24B-Instruct-2501
Mistral Small 3 is a 24B-parameter language model optimized for low-latency performance across common AI tasks. Released under the Apache 2.0 license, it features both pre-trained and instruction-tuned versions designed...
Qwen/Qwen3-VL-30B-A3B-Instruct
Qwen3-VL-30B-A3B-Instruct is a multimodal model that unifies strong text generation with visual understanding for images and videos. Its Instruct variant optimizes instruction-following for general multimodal tasks. It excels in perception...
microsoft/phi-4
Microsoft Phi-4 14B – small language model achieving state-of-the-art results on reasoning tasks.
Qwen/Qwen3-VL-235B-A22B-Instruct
Qwen3-VL-235B-A22B Instruct is an open-weight multimodal model that unifies strong text generation with visual understanding across images and video. The Instruct model targets general vision-language use (VQA, document parsing, chart/table...
Gryphe/MythoMax-L2-13b
One of the highest performing and most popular fine-tunes of Llama 2 13B, with rich descriptions and roleplay. #merge
Qwen/Qwen3-Max-Thinking
Qwen3-Max-Thinking is the flagship reasoning model in the Qwen3 series, designed for high-stakes cognitive tasks that require deep, multi-step reasoning. By significantly scaling model capacity and reinforcement learning compute, it...
Qwen/Qwen3-14B
Qwen3-14B is a dense 14.8B parameter causal language model from the Qwen3 series, designed for both complex reasoning and efficient dialogue. It supports seamless switching between a "thinking" mode for...
codestral-2508
Our cutting-edge language model for coding, released August 2025.
openai/gpt-oss-safeguard-20b
gpt-oss-safeguard-20b is a safety reasoning model from OpenAI built upon gpt-oss-20b. This open-weight, 21B-parameter Mixture-of-Experts (MoE) model offers lower latency for safety tasks like content classification, LLM filtering, and trust...
mistral-embed-2312
The official mistral-embed-2312 embedding model from Mistral AI.
mistral-small-2603
Mistral Small 4.
kimi-k2-thinking
Kimi K2 Thinking is the latest and most capable version of Moonshot AI's open-source thinking model.
qwen3-coder-next
qwen3-coder-next – available to run locally via Ollama on CPU and GPU hardware.
kimi-k2-thinking
kimi-k2-thinking – available to run locally via Ollama on CPU and GPU hardware.
minimax-m2
minimax-m2 – available to run locally via Ollama on CPU and GPU hardware.
Llama-3.1-70B-Instruct
Open-source Llama-3.1-70B-Instruct model from meta-llama – available for download and self-hosting on Hugging Face.
gemini-3-flash-preview
gemini-3-flash-preview – available to run locally via Ollama on CPU and GPU hardware.
Llama-3.2-1B-Instruct
Open-source Llama-3.2-1B-Instruct model from meta-llama – available for download and self-hosting on Hugging Face.
Llama-3.1-8B-Instruct
Open-source Llama-3.1-8B-Instruct model from meta-llama – available for download and self-hosting on Hugging Face.
Qwen3-30B-A3B-Instruct-2507
Open-source Qwen3-30B-A3B-Instruct-2507 model from qwen – available for download and self-hosting on Hugging Face.
Qwen3-Coder-30B-A3B-Instruct
Open-source Qwen3-Coder-30B-A3B-Instruct model from qwen – available for download and self-hosting on Hugging Face.
Qwen3-30B-A3B
Open-source Qwen3-30B-A3B model from qwen – available for download and self-hosting on Hugging Face.
Qwen3-14B
Open-source Qwen3-14B model from qwen – available for download and self-hosting on Hugging Face.
Qwen3-32B
Open-source Qwen3-32B model from qwen – available for download and self-hosting on Hugging Face.
Qwen3-8B
Open-source Qwen3-8B model from qwen – available for download and self-hosting on Hugging Face.
AI21 Jamba 1.6 Mini
AI21 Jamba 1.6 Mini is a lightweight Mamba-Transformer hybrid optimized for cost-effective, high-throughput inference with an impressive 256K context window. An excellent choice for document-heavy workloads on a budget.
AI21 Jamba 1.6 Large
AI21 Jamba 1.6 Large uses a hybrid Mamba-Transformer architecture offering low memory footprint and high throughput compared to equivalent Transformer models. Features 256K context at a fraction of the inference cost.
Microsoft Phi-4 Mini
Microsoft Phi-4 Mini is a 3.8B parameter compact model from Microsoft. Delivers impressive reasoning capabilities for edge and mobile deployment scenarios, with strong performance on math and coding tasks relative to its size.
