uform-gen2-qwen-500m
unum
UForm-Gen is a small generative vision-language model designed primarily for image captioning and visual question answering. The model was pre-trained on an internal image captioning dataset and fine-tuned on public instruction datasets: SVIT, LVIS, and VQA datasets.
- Context window: not listed
- Input cost: $0.00 / 1M
- Output cost: $0.00 / 1M
- Latency (p50): not listed
