modelstop.top

AI Model Catalogue

Browse 1,750 models across providers, modalities, and use cases.

🌐 All Models

1,750 models · Page 4 of 49

bge-large-en-v1.5

baai

BAAI General Embedding (BGE) large model that transforms any given text into a 1024-dimensional vector.

text · cheap
ctx not listed · $0.20/1M in
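A common way to use an embedding model like this for semantic search is to embed the query and each candidate text, then rank the candidates by cosine similarity. The minimal sketch below assumes the vectors have already been computed. The 4-dimensional toy vectors are made up for illustration (real bge-large-en-v1.5 outputs have 1024 dimensions), and `cosine_similarity` and `rank_by_similarity` are hypothetical helper names, not part of any model API.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("a zero vector has no direction")
    return dot / (norm_a * norm_b)


def rank_by_similarity(
    query: Sequence[float], docs: Sequence[Sequence[float]]
) -> List[Tuple[int, float]]:
    """Return (doc index, score) pairs, most similar document first."""
    scores = [(i, cosine_similarity(query, d)) for i, d in enumerate(docs)]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Toy 4-dimensional vectors standing in for real 1024-dimensional embeddings.
query = [0.1, 0.3, 0.5, 0.1]
docs = [
    [0.1, 0.3, 0.5, 0.1],   # same direction as the query
    [0.5, -0.2, 0.0, 0.3],  # mostly unrelated
]
print(rank_by_similarity(query, docs))
```

Cosine similarity ignores vector length and compares only direction, which is why it is a common default for comparing text embeddings.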

llama-3.2-11b-vision-instruct

meta

The Llama 3.2-Vision instruction-tuned models are optimized for visual recognition, image reasoning, captioning, and answering general questions about an image.

text · vision · reasoning
128,000 ctx · $0.05/1M in

gpt-oss-20b

openai

OpenAI's open-weight models are designed for powerful reasoning, agentic tasks, and versatile developer use cases; gpt-oss-20b is the variant for lower latency and for local or specialized use cases.

text · reasoning · agents
128,000 ctx · $0.20/1M in

qwq-32b

qwen

QwQ is the reasoning model of the Qwen series. Unlike conventional instruction-tuned models, QwQ can think and reason, which gives it significantly better performance on downstream tasks, especially hard problems. QwQ-32B is the medium-sized reasoning model and achieves performance competitive with state-of-the-art reasoning models such as DeepSeek-R1 and o1-mini.

text · reasoning · cheap
24,000 ctx · $0.66/1M in

llama-4-scout-17b-16e-instruct

meta

Meta's Llama 4 Scout is a natively multimodal model with 17 billion active parameters and 16 experts. The Llama 4 models use a mixture-of-experts (MoE) architecture to offer industry-leading performance in text and image understanding.

text · vision · instruct
131,000 ctx · $0.27/1M in
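In a mixture-of-experts layer, a router scores the experts for each token and only the top-scoring few are run, so only part of the model's parameters is active per token. The sketch below shows generic top-k gating over 16 toy experts. It is not Llama 4 Scout's actual router; the helper names, the choice of k=2, and the scalar "experts" are assumptions for illustration.

```python
import math
from typing import Callable, List, Sequence, Tuple


def route_top_k(logits: Sequence[float], k: int) -> List[Tuple[int, float]]:
    """Pick the k highest-scoring experts; softmax over just their logits gives the gate weights."""
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    m = max(logits[i] for i in top)  # subtract the max for numerical stability
    exps = [math.exp(logits[i] - m) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]


def moe_forward(
    x: float,
    logits: Sequence[float],
    experts: Sequence[Callable[[float], float]],
    k: int = 2,
) -> float:
    """Run only the selected experts and combine their outputs by gate weight."""
    return sum(w * experts[i](x) for i, w in route_top_k(logits, k))


# 16 toy experts: expert i simply scales its input by (i + 1).
experts = [lambda x, s=i + 1: s * x for i in range(16)]
# Hypothetical router scores for one token: experts 3 and 7 score highest.
logits = [0.0] * 16
logits[3], logits[7] = 2.0, 2.0
print(route_top_k(logits, k=2))           # [(3, 0.5), (7, 0.5)]
print(moe_forward(1.0, logits, experts))  # 0.5 * 4.0 + 0.5 * 8.0 = 6.0
```

Only 2 of the 16 experts are evaluated for this token; the other 14 are skipped entirely, which is the source of MoE's compute savings.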

llama-3-8b-instruct-awq

meta

Quantized (int4) generative text model with 8 billion parameters from Meta.

text · instruct · cheap
8,192 ctx · $0.12/1M in
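As a back-of-the-envelope check on what int4 quantization saves, weight storage is roughly parameter count times bits per weight. The sketch below estimates weights-only memory for an 8-billion-parameter model. It deliberately ignores quantization metadata (scales and zero points), activations, and the KV cache, so real memory use will be higher. `weight_memory_gb` is a hypothetical helper.

```python
def weight_memory_gb(num_params: float, bits_per_weight: float) -> float:
    """Weights-only storage in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_weight / 8 / 1e9


params = 8e9  # 8 billion parameters, as listed for this model
for label, bits in [("fp16/bf16", 16), ("int8", 8), ("int4", 4)]:
    print(f"{label:>9}: {weight_memory_gb(params, bits):.1f} GB")
```

For 8 billion parameters this prints 16.0 GB at 16-bit, 8.0 GB at int8, and 4.0 GB at int4.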

embeddinggemma-300m

google

EmbeddingGemma is a 300M-parameter open embedding model from Google that is state-of-the-art for its size. It is built from Gemma 3 (with T5Gemma initialization) and the same research and technology used to create the Gemini models. EmbeddingGemma produces vector representations of text, making it well suited for search and retrieval tasks, including classification, clustering, and semantic similarity search. The model was trained on data in 100+ spoken languages.

text · free
ctx not listed · $0.00/1M in

bge-reranker-base

baai

Unlike an embedding model, a reranker takes a question and a document as input and directly outputs a similarity score instead of an embedding. You get a relevance score by passing a query and a passage to the reranker, and the score can be mapped to a float value in [0, 1] with a sigmoid function.

text · cheap
ctx not listed · $0.00/1M in
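The sigmoid mapping mentioned above turns the reranker's raw relevance score (an unbounded real number) into a value in [0, 1]. A minimal sketch, assuming you already have raw scores from the model; the passages, scores, and the `rerank` helper are made up for illustration.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(x: float) -> float:
    """Map a real-valued score into [0, 1] without overflowing math.exp."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def rerank(
    passages: Sequence[str], raw_scores: Sequence[float]
) -> List[Tuple[str, float]]:
    """Sort passages by relevance, highest first, with scores mapped through a sigmoid."""
    scored = [(p, sigmoid(s)) for p, s in zip(passages, raw_scores)]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical raw scores a reranker might return for one query.
passages = ["The stock market closed higher.", "Pandas are bears native to China."]
raw_scores = [-3.1, 5.2]
for passage, score in rerank(passages, raw_scores):
    print(f"{score:.3f}  {passage}")
```

Because the sigmoid is monotonic, it changes the scale of the scores but not their order, so ranking by raw score and by mapped score gives the same result.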

phi-2

microsoft

Phi-2 is a Transformer-based model with a next-word prediction objective, trained on 1.4T tokens from multiple passes over a mixture of synthetic and web datasets for NLP and coding.

text · code · free
2,048 ctx · $0.00/1M in

bart-large-cnn

facebook

BART is a transformer encoder-decoder (seq2seq) model with a bidirectional (BERT-like) encoder and an autoregressive (GPT-like) decoder. You can use this model for text summarization.

text · free
ctx not listed · $0.00/1M in

qwen1.5-14b-chat-awq

qwen

Qwen1.5 is the improved version of Qwen, the large language model series developed by Alibaba Cloud. AWQ is an efficient, accurate and blazing-fast low-bit weight quantization method, currently supporting 4-bit quantization.

text · free
7,500 ctx · $0.00/1M in

gemma-7b-it-lora

google

This is a Gemma-7B base model that Cloudflare provides for inference with LoRA adapters. Gemma is a family of lightweight, state-of-the-art open models from Google, built from the same research and technology used to create the Gemini models.

text · free
3,500 ctx · $0.00/1M in
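With LoRA, the base weight matrix W stays frozen and each adapter contributes a low-rank update, so the effective weight is W + (alpha / r) · B A, where B is d × r, A is r × k, and r is the adapter rank. The sketch below merges one rank-1 adapter into a 3 × 3 matrix using plain Python lists; the shapes, values, and the `lora_merge` helper are illustrative assumptions, not Cloudflare's serving code.

```python
from typing import List

Matrix = List[List[float]]


def matmul(x: Matrix, y: Matrix) -> Matrix:
    """Plain-Python product of an (n x m) matrix and an (m x p) matrix."""
    return [
        [sum(x[i][t] * y[t][j] for t in range(len(y))) for j in range(len(y[0]))]
        for i in range(len(x))
    ]


def lora_merge(w: Matrix, b: Matrix, a: Matrix, alpha: float) -> Matrix:
    """Return W + (alpha / r) * B @ A, where r is the adapter rank (rows of A)."""
    r = len(a)
    delta = matmul(b, a)
    scale = alpha / r
    return [
        [w[i][j] + scale * delta[i][j] for j in range(len(w[0]))]
        for i in range(len(w))
    ]


# A frozen 3 x 3 base weight and a rank-1 adapter (B is 3 x 1, A is 1 x 3).
w = [[1.0, 0.0, 0.0],
     [0.0, 1.0, 0.0],
     [0.0, 0.0, 1.0]]
b = [[1.0], [0.0], [2.0]]
a = [[0.5, 0.5, 0.0]]
merged = lora_merge(w, b, a, alpha=2.0)
print(merged)  # [[2.0, 1.0, 0.0], [0.0, 1.0, 0.0], [2.0, 2.0, 1.0]]
```

Only B and A (d·r + r·k numbers) are trained and stored per adapter, which is what lets a single base model serve many adapters.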

gemma-4-26b-a4b-it

google

Gemma 4 is Google's most intelligent family of open models, built from Gemini 3 research to maximize intelligence-per-parameter.

text · cheap · long-context
256,000 ctx · $0.10/1M in

openchat-3.5-0106

openchat

OpenChat is an innovative library of open-source language models fine-tuned with C-RLFT, a strategy inspired by offline reinforcement learning.

text · free
8,192 ctx · $0.00/1M in

gemma-3-12b-it

google

Gemma 3 models are well suited for a variety of text generation and image understanding tasks, including question answering, summarization, and reasoning. Gemma 3 models are multimodal: they handle text and image input and generate text output. They offer a large 128K context window and multilingual support in over 140 languages, and they are available in more sizes than previous versions.

text · vision · reasoning
80,000 ctx · $0.35/1M in

qwen1.5-7b-chat-awq

qwen

Qwen1.5 is the improved version of Qwen, the large language model series developed by Alibaba Cloud. AWQ is an efficient, accurate and blazing-fast low-bit weight quantization method, currently supporting 4-bit quantization.

text · free
20,000 ctx · $0.00/1M in

deepseek-math-7b-instruct

deepseek-ai

DeepSeekMath-Instruct 7B is a math-focused, instruction-tuned model derived from DeepSeekMath-Base 7B. DeepSeekMath is initialized with DeepSeek-Coder-v1.5 7B and continues pre-training for 500B tokens on a mix of math-related data sourced from Common Crawl, natural language, and code.

text · code · instruct
4,096 ctx · $0.00/1M in

falcon-7b-instruct

tiiuae

Falcon-7B-Instruct is a 7B-parameter causal decoder-only model built by TII, based on Falcon-7B and fine-tuned on a mixture of chat and instruct datasets.

text · instruct · free
4,096 ctx · $0.00/1M in

bge-small-en-v1.5

baai

BAAI General Embedding (BGE) small model that transforms any given text into a 384-dimensional vector.

text · cheap
ctx not listed · $0.02/1M in

tinyllama-1.1b-chat-v1.0

tinyllama

The TinyLlama project aims to pretrain a 1.1B Llama model on 3 trillion tokens. This is the chat model, fine-tuned on top of TinyLlama/TinyLlama-1.1B-intermediate-step-1431k-3T.

text · free
2,048 ctx · $0.00/1M in

qwen2.5-coder-32b-instruct

qwen

Qwen2.5-Coder is the latest series of code-specific Qwen large language models (formerly known as CodeQwen). As of now, Qwen2.5-Coder covers six mainstream model sizes (0.5, 1.5, 3, 7, 14, and 32 billion parameters) to meet the needs of different developers. Qwen2.5-Coder brings the following improvements over CodeQwen1.5:

text · code · instruct
32,768 ctx · $0.66/1M in

nemotron-3-120b-a12b

nvidia

NVIDIA Nemotron 3 Super is a hybrid MoE model with leading accuracy for multi-agent applications and specialized agentic AI systems.

text · agents · cheap
256,000 ctx · $0.50/1M in

smart-turn-v2

pipecat-ai

An open-source, community-driven turn detection model that works natively on audio (version 2).

text · audio · free
ctx not listed · $0.00/1M in

m2m100-1.2b

meta

Multilingual encoder-decoder (seq-to-seq) model trained for many-to-many multilingual translation.

text · multilingual · cheap
ctx not listed · $0.34/1M in

qwen1.5-0.5b-chat

qwen

Qwen1.5 is the improved version of Qwen, the large language model series developed by Alibaba Cloud.

text · free
32,000 ctx · $0.00/1M in

llama-2-7b-chat-hf-lora

meta-llama

This is a Llama 2 base model that Cloudflare provides for inference with LoRA adapters. Llama 2 is a collection of pretrained and fine-tuned generative text models ranging in scale from 7 billion to 70 billion parameters. This is the 7B fine-tuned model, optimized for dialogue use cases and converted to the Hugging Face Transformers format.

text · free
8,192 ctx · $0.00/1M in

deepseek-r1-distill-qwen-32b

deepseek-ai

DeepSeek-R1-Distill-Qwen-32B is a model distilled from DeepSeek-R1 based on Qwen2.5. It outperforms OpenAI-o1-mini across various benchmarks, achieving new state-of-the-art results for dense models.

text · cheap
80,000 ctx · $0.50/1M in

llama-3.3-70b-instruct-fp8-fast

meta

Llama 3.3 70B quantized to fp8 precision, optimized to be faster.

text · instruct · cheap
24,000 ctx · $0.29/1M in

granite-4.0-h-micro

ibm-granite

Granite 4.0 instruct models deliver strong performance across benchmarks, achieving industry-leading results in key agentic tasks like instruction following and function calling. These strengths, combined with the models' efficiency, make them well suited for a wide range of use cases such as retrieval-augmented generation (RAG), multi-agent workflows, and edge deployments.

text · agents · instruct
131,000 ctx · $0.02/1M in

indictrans2-en-indic-1B

ai4bharat

IndicTrans2 is the first open-source transformer-based multilingual NMT model that supports high-quality translation across all 22 scheduled Indic languages.

text · multilingual · cheap
ctx not listed · $0.34/1M in

llama-3.1-8b-instruct-fp8

meta

Llama 3.1 8B quantized to FP8 precision

text · instruct · cheap
32,000 ctx · $0.15/1M in

plamo-embedding-1b

pfnet

PLaMo-Embedding-1B is a Japanese text embedding model developed by Preferred Networks, Inc. It can convert Japanese text input into numerical vectors and can be used for a wide range of applications, including information retrieval, text classification, and clustering.

text · cheap
ctx not listed · $0.02/1M in

discolm-german-7b-v1-awq

thebloke

DiscoLM German 7B is a Mistral-based large language model focused on German-language applications. AWQ is an efficient, accurate and blazing-fast low-bit weight quantization method, currently supporting 4-bit quantization.

text · free
4,096 ctx · $0.00/1M in

llama-2-7b-chat-int8

meta

Quantized (int8) generative text model with 7 billion parameters from Meta

text · free
8,192 ctx · $0.00/1M in
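Int8 quantization stores each weight as an 8-bit integer together with a floating-point scale. The sketch below shows symmetric absmax quantization of a single weight row, a common generic scheme. It is an assumption for illustration, not necessarily the method used to produce this checkpoint, and `quantize_int8` and `dequantize` are hypothetical names.

```python
from typing import List, Sequence, Tuple


def quantize_int8(weights: Sequence[float]) -> Tuple[List[int], float]:
    """Symmetric absmax quantization: map [-max|w|, max|w|] onto [-127, 127]."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q: Sequence[int], scale: float) -> List[float]:
    """Recover approximate float weights from int8 codes and the shared scale."""
    return [v * scale for v in q]


row = [0.02, -0.5, 0.31, 1.27, -1.0]
q, scale = quantize_int8(row)
approx = dequantize(q, scale)
print(q, scale)
# Rounding error per weight is at most about scale / 2.
print(max(abs(w - v) for w, v in zip(row, approx)))
```

Each weight now takes 1 byte instead of 2 (fp16), at the cost of storing one extra scale per row and a small rounding error.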

glm-4.7-flash

zai-org

GLM-4.7-Flash is a fast and efficient multilingual text generation model with a 131,072 token context window. Optimized for dialogue, instruction-following, and multi-turn tool calling across 100+ languages.

text · multilingual · cheap
131,072 ctx · $0.06/1M in

mistral-7b-instruct-v0.2-lora

mistral

The Mistral-7B-Instruct-v0.2 Large Language Model (LLM) is an instruction fine-tuned version of Mistral-7B-v0.2.

text · instruct · free
15,000 ctx · $0.00/1M in