Learn

Vision / image input

Accepts image input

What is vision / image input?

Vision support means the model can process images alongside text. You send an image in the API call and the model can describe it, extract text from it, answer questions about it, or compare it to other images. GPT-4o, Claude 3.5 Sonnet, and Gemini all support vision. Image tokens are priced differently from text tokens — typically at a fixed rate per image or per tile.

Why it matters

Vision support unlocks use cases that text-only models cannot handle: document analysis, UI testing, image classification, accessibility descriptions, and visual question answering. If your application needs to process screenshots, receipts, product photos, or handwritten notes, you need a model with vision support. sourc.dev tracks this as a capability flag on every model.

Where models stand

12 models with vision / image input enabled:

Models with vision / image input

GPT-4, GPT-4 Turbo, GPT-4o, Claude 3 Opus, Claude 3.5 Sonnet, Claude Sonnet 4.6, Gemini 1.5 Pro, Gemini 1.5 Flash, Grok 2, GPT-4o mini, Gemini 1.0 Pro, Gemini 2.0 Flash

Data available for 12 of 271 tracked entities.

How sourc.dev tracks this

sourc.dev verifies vision / image input manually from official provider documentation, API responses, and published specifications. Every data point includes a source URL and verification date. When a value changes, the old value is preserved in the history table and the new value is recorded alongside it. Nothing is overwritten — the full timeline is always available.

Frequently asked questions

FAQ How does sourc.dev measure vision / image input?

sourc.dev verifies this attribute manually from provider documentation. Every data point includes a source URL and verification date. Changes are recorded in the history table — nothing is overwritten.

FAQ How often is vision / image input updated?

This attribute is verified periodically against provider documentation. When sourc.dev detects a change, the new value is recorded alongside the old one with full provenance.

FAQ Why does vision / image input matter for developers?

Understanding vision / image input helps developers make informed decisions when choosing between models and providers. Rather than relying on marketing claims, sourc.dev provides verified, dated, source-linked data so the data decides.