All Models
352 models
mistral-small-3.1-24b-instruct
Building upon Mistral Small 3 (2501), Mistral Small 3.1 (2503) adds state-of-the-art vision understanding and enhances long context capabilities up to 128k tokens without compromising text performance. With 24 billion parameters, this model achieves top-tier capabilities in both text and vision tasks.
llama-3-8b-instruct-awq
Quantized (int4) generative text model with 8 billion parameters from Meta.
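"int4" means each weight is stored as a 4-bit signed integer together with a floating-point scale. The model id indicates AWQ (activation-aware weight quantization), which chooses scales using activation statistics. The sketch below swaps in a simpler scheme, symmetric per-tensor round-to-nearest, purely to illustrate what int4 storage looks like. It is not AWQ, and the weight values are made up.

```python
def quantize_int4(weights: list[float]) -> tuple[list[int], float]:
    """Symmetric per-tensor round-to-nearest int4 quantization (illustrative, not AWQ)."""
    max_abs = max(abs(w) for w in weights)
    # Map the largest magnitude to 7; guard against an all-zero tensor.
    scale = max_abs / 7 if max_abs > 0 else 1.0
    # Signed 4-bit integers span [-8, 7]; clamp after rounding.
    q = [max(-8, min(7, round(w / scale))) for w in weights]
    return q, scale


def dequantize_int4(q: list[int], scale: float) -> list[float]:
    """Recover approximate float weights from int4 codes and the shared scale."""
    return [v * scale for v in q]


# Hypothetical weights for a tiny tensor.
weights = [0.7, -0.4, 0.1, 0.0]
q, scale = quantize_int4(weights)
restored = dequantize_int4(q, scale)
print(q, scale)
print(restored)
```

Each restored weight differs from the original by at most half a quantization step (`scale / 2`), which is the precision traded for a 4x reduction in storage relative to fp16.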
qwen3-30b-a3b-fp8
Qwen3 is the latest generation of large language models in the Qwen series, offering a comprehensive suite of dense and mixture-of-experts (MoE) models. Built upon extensive training, Qwen3 delivers groundbreaking advancements in reasoning, instruction-following, agent capabilities, and multilingual support.
gemma-sea-lion-v4-27b-it
SEA-LION (Southeast Asian Languages In One Network) is a collection of large language models (LLMs) that have been pretrained and instruction-tuned for the Southeast Asia (SEA) region.
bge-base-en-v1.5
BAAI general embedding (Base) model that transforms any given text into a 768-dimensional vector
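Embedding vectors like these are typically compared with cosine similarity, so that texts with similar meaning score close to 1. The sketch below assumes you already have two 768-dimensional vectors from the model; the placeholder values are hypothetical stand-ins for real model output.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length embedding vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("a zero vector has no direction")
    return dot / (norm_a * norm_b)


EMBEDDING_DIM = 768  # output size stated for bge-base-en-v1.5

# Hypothetical placeholder vectors standing in for real embeddings.
query_vec = [0.1] * EMBEDDING_DIM
doc_vec = [0.2] * EMBEDDING_DIM
print(round(cosine_similarity(query_vec, doc_vec), 6))
```

The two placeholders point in the same direction, so their similarity is 1.0 even though their magnitudes differ; cosine similarity ignores vector length by design.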
gemma-4-26b-a4b-it
Gemma 4 is Google's most intelligent family of open models, built from Gemini 3 research to maximize intelligence-per-parameter.
gemma-3-12b-it
Gemma 3 models are well suited for a variety of text generation and image understanding tasks, including question answering, summarization, and reasoning. They are multimodal, handling text and image input and generating text output, and they offer a large 128K context window, multilingual support in over 140 languages, and more sizes than previous versions.
gpt-oss-20b
Part of OpenAI's open-weight model family, designed for powerful reasoning, agentic tasks, and versatile developer use cases. gpt-oss-20b targets lower latency and local or specialized use cases.
bge-reranker-base
Unlike an embedding model, a reranker takes a question and a document as input and directly outputs a similarity score instead of an embedding. You can get a relevance score by passing a query and a passage to the reranker, and the score can be mapped to a float value in [0, 1] with the sigmoid function.
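The sigmoid mapping mentioned above is σ(x) = 1 / (1 + e^(−x)). A minimal sketch, assuming the reranker has already returned raw scores (the passage names and score values below are hypothetical):

```python
import math


def sigmoid(x: float) -> float:
    """Map a raw reranker score to the range [0, 1]."""
    # Numerically stable form: never call exp() on a large positive number.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


# Hypothetical raw scores for three query/passage pairs.
raw_scores = {"passage_a": 4.2, "passage_b": -1.3, "passage_c": 0.0}

ranked = sorted(raw_scores, key=raw_scores.get, reverse=True)
for name in ranked:
    print(name, round(sigmoid(raw_scores[name]), 3))
```

Because sigmoid is monotonically increasing, it rescales the scores without changing their order, so ranking by raw score and by mapped score gives the same result.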
llama-4-scout-17b-16e-instruct
Meta's Llama 4 Scout is a natively multimodal model with 17 billion active parameters and 16 experts. It uses a mixture-of-experts architecture to offer industry-leading performance in text and image understanding.
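In a mixture-of-experts layer, a router scores every expert for each token and only the highest-scoring experts actually run, which is how a model can contain many experts while activating only a fraction of its parameters per token. The sketch below is a generic top-k router over scalar inputs; the toy experts, the choice of `k`, and the softmax renormalization over selected experts are illustrative assumptions, not Llama 4 Scout's actual routing.

```python
import math
from typing import Callable


def softmax(xs: list[float]) -> list[float]:
    """Softmax with max-subtraction for numerical stability."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def moe_forward(
    x: float,
    router_logits: list[float],
    experts: list[Callable[[float], float]],
    k: int = 1,
) -> float:
    """Generic top-k mixture-of-experts combine step (illustrative only)."""
    # Select the k experts with the highest router scores.
    top = sorted(range(len(experts)), key=lambda i: router_logits[i], reverse=True)[:k]
    # Renormalize gate weights over the selected experts only.
    weights = softmax([router_logits[i] for i in top])
    # Only the selected experts are evaluated; the others are skipped.
    return sum(w * experts[i](x) for w, i in zip(weights, top))


# 16 toy "experts": expert i multiplies its input by (i + 1).
experts = [lambda v, s=i + 1: s * v for i in range(16)]
router_logits = [0.0] * 16
router_logits[3] = 2.0  # highest-scoring expert
router_logits[7] = 1.0  # second-highest expert

print(moe_forward(1.0, router_logits, experts, k=1))
print(moe_forward(1.0, router_logits, experts, k=2))
```

With `k=1` only expert 3 runs and the output is 4.0; with `k=2` the output is a weighted blend of experts 3 and 7.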
llama-3.2-11b-vision-instruct
The Llama 3.2-Vision instruction-tuned models are optimized for visual recognition, image reasoning, captioning, and answering general questions about an image.
llama-3.1-8b-instruct-awq
Quantized (int4) generative text model with 8 billion parameters from Meta.
qwq-32b
QwQ is the reasoning model of the Qwen series. Compared with conventional instruction-tuned models, QwQ, which is capable of thinking and reasoning, can achieve significantly enhanced performance in downstream tasks, especially hard problems. QwQ-32B is the medium-sized reasoning model and achieves competitive performance against state-of-the-art reasoning models such as DeepSeek-R1 and o1-mini.
bge-large-en-v1.5
BAAI general embedding (Large) model that transforms any given text into a 1024-dimensional vector
nemotron-3-120b-a12b
NVIDIA Nemotron 3 Super is a hybrid MoE model with leading accuracy for multi-agent applications and specialized agentic AI systems.
bge-small-en-v1.5
BAAI general embedding (Small) model that transforms any given text into a 384-dimensional vector
plamo-embedding-1b
PLaMo-Embedding-1B is a Japanese text embedding model developed by Preferred Networks, Inc. It can convert Japanese text input into numerical vectors and can be used for a wide range of applications, including information retrieval, text classification, and clustering.
granite-4.0-h-micro
Granite 4.0 instruct models deliver strong performance across benchmarks, achieving industry-leading results in key agentic tasks such as instruction following and function calling. This makes them well suited for a wide range of use cases, including retrieval-augmented generation (RAG), multi-agent workflows, and edge deployments.
indictrans2-en-indic-1B
IndicTrans2 is the first open-source transformer-based multilingual neural machine translation (NMT) model to support high-quality translation across all 22 scheduled Indic languages.
kimi-k2.6
Kimi K2.6 is a frontier-scale open-source 1T parameter model with a 262.1k context window, multi-turn tool calling, vision inputs, and structured outputs for agentic workloads.
mistral-7b-instruct-v0.1
Instruct fine-tuned version of the Mistral-7b generative text model with 7 billion parameters
llama-2-7b-chat-fp16
Unquantized, half-precision (fp16) generative text model with 7 billion parameters from Meta.
llama-3.1-8b-instruct-fp8
Llama 3.1 8B quantized to FP8 precision
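The precision suffixes in these model ids (fp16, fp8, int4/AWQ) largely determine how much memory the weights occupy: bytes ≈ parameters × bits per parameter / 8. The sketch below uses the nominal parameter counts from the descriptions; real counts differ slightly, and the estimate deliberately ignores the KV cache, activations, and quantization scale overhead.

```python
BITS_PER_PARAM = {"fp16": 16, "fp8": 8, "int4": 4}


def weight_memory_gb(num_params: float, precision: str) -> float:
    """Approximate weight storage in GB (10**9 bytes); weights only."""
    return num_params * BITS_PER_PARAM[precision] / 8 / 1e9


# Nominal sizes taken from the catalog descriptions.
catalog = [
    ("llama-2-7b-chat-fp16", 7e9, "fp16"),
    ("llama-3.1-8b-instruct-fp8", 8e9, "fp8"),
    ("llama-3.1-8b-instruct-awq", 8e9, "int4"),
]

for name, params, precision in catalog:
    print(f"{name}: ~{weight_memory_gb(params, precision):.1f} GB")
```

This prints roughly 14.0, 8.0, and 4.0 GB respectively, which is why an 8B model at int4 needs less weight memory than a 7B model at fp16.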
gpt-oss-120b
Part of OpenAI's open-weight model family, designed for powerful reasoning, agentic tasks, and versatile developer use cases. gpt-oss-120b targets production, general-purpose, high-reasoning use cases.
qwen3-embedding-0.6b
The Qwen3 Embedding model series is the latest model series in the Qwen family, designed specifically for text embedding and ranking tasks.
m2m100-1.2b
Multilingual encoder-decoder (seq-to-seq) model trained for Many-to-Many multilingual translation
deepseek-r1-distill-qwen-32b
DeepSeek-R1-Distill-Qwen-32B is a model distilled from DeepSeek-R1 based on Qwen2.5. It outperforms OpenAI-o1-mini across various benchmarks, achieving new state-of-the-art results for dense models.
glm-4.7-flash
GLM-4.7-Flash is a fast and efficient multilingual text generation model with a 131,072 token context window. Optimized for dialogue, instruction-following, and multi-turn tool calling across 100+ languages.
qwen2.5-coder-32b-instruct
Qwen2.5-Coder is the latest series of code-specific Qwen large language models (formerly known as CodeQwen). It currently covers six mainstream model sizes (0.5, 1.5, 3, 7, 14, and 32 billion parameters) to meet the needs of different developers. Qwen2.5-Coder brings the following improvements upon CodeQwen1.5:
llama-3.2-3b-instruct
The Llama 3.2 instruction-tuned text only models are optimized for multilingual dialogue use cases, including agentic retrieval and summarization tasks.
kimi-k2.5
Kimi K2.5 is a frontier-scale open-source model with a 256k context window, multi-turn tool calling, vision inputs, and structured outputs for agentic workloads.
bge-m3
Multi-Functionality, Multi-Linguality, and Multi-Granularity embedding model.
distilbert-sst-2-int8
Distilled BERT model that was fine-tuned on SST-2 for sentiment classification.
llama-3-8b-instruct
Generation over generation, Meta Llama 3 demonstrates state-of-the-art performance on a wide range of industry benchmarks and offers new capabilities, including improved reasoning.
llama-guard-3-8b
Llama Guard 3 is a Llama-3.1-8B pretrained model, fine-tuned for content safety classification. Similar to previous versions, it can be used to classify content in both LLM inputs (prompt classification) and LLM responses (response classification). It acts as an LLM: it generates text indicating whether a given prompt or response is safe or unsafe and, if unsafe, lists the content categories violated.
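Because the verdict arrives as generated text, applications need to parse it. The sketch below assumes the output's first line is `safe` or `unsafe` and that, when unsafe, the next line holds comma-separated category codes; the codes used here (`S1`, `S10`) are illustrative, so check the model card for the exact format and taxonomy.

```python
from dataclasses import dataclass, field


@dataclass
class GuardVerdict:
    safe: bool
    categories: list[str] = field(default_factory=list)


def parse_guard_output(text: str) -> GuardVerdict:
    """Parse a Llama Guard-style verdict (assumed format: 'safe' or 'unsafe' + categories)."""
    lines = [ln.strip() for ln in text.strip().splitlines() if ln.strip()]
    if not lines:
        raise ValueError("empty classifier output")
    verdict = lines[0].lower()
    if verdict == "safe":
        return GuardVerdict(safe=True)
    if verdict == "unsafe":
        # Assumption: violated category codes follow on the next line, comma-separated.
        cats = [c.strip() for c in lines[1].split(",")] if len(lines) > 1 else []
        return GuardVerdict(safe=False, categories=[c for c in cats if c])
    raise ValueError(f"unexpected verdict: {lines[0]!r}")


print(parse_guard_output("safe"))
print(parse_guard_output("unsafe\nS1,S10"))
```

Raising on an unrecognized first line, rather than defaulting to safe, keeps a malformed response from silently passing a safety check.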
llama-3.3-70b-instruct-fp8-fast
Llama 3.3 70B quantized to fp8 precision, optimized to be faster.
