theblokemodel

TinyLlama-1.1B-Chat-v0.3-GPTQ

Open-source TinyLlama-1.1B-Chat-v0.3-GPTQ model from thebloke — available for download and self-hosting on Hugging Face.

text free

Best for

General ChatSummarisationContent Generation

Try in Playground Compare →

Context Window

—

Input Cost

Free

Output Cost

—

Latency p50

—

Pricing Details

No pricing data. Model may be free or requires direct access.

Run it locally

Local run command

pip install transformers accelerate && python -c "from transformers import AutoModelForCausalLM, AutoTokenizer; tok = AutoTokenizer.from_pretrained('thebloke/TinyLlama-1.1B-Chat-v0.3-GPTQ'); model = AutoModelForCausalLM.from_pretrained('thebloke/TinyLlama-1.1B-Chat-v0.3-GPTQ'); print('Loaded thebloke/TinyLlama-1.1B-Chat-v0.3-GPTQ')"

Presumptive Specs:

GPU: 12+ GB VRAM; System RAM: 16+ GB; Disk: 20+ GB free

Reference links

huggingface.co/thebloke/TinyLlama-1.1B-Chat-v0.3-GPTQ huggingface.co/docs/transformers/installation

Hallucination Score™ (est.)

Community reliability estimate · not official

—

Not yet rated

About this score: Community-estimated based on user reports and publicly available benchmark data (e.g. TruthfulQA). This is not an official score from the model provider. Scores may be inaccurate — always verify with the official leaderboard before making production decisions.

Price History

Not enough historical data yet. Check back after the next pricing sync.

Provider

thebloke

Community Prompts

Proven prompts shared by the community for this model

Loading prompts…