Vision / image input
Accepts image input
What is vision / image input?
Vision support means the model can process images alongside text. You send an image in the API call and the model can describe it, extract text from it, answer questions about it, or compare it to other images. GPT-4o, Claude 3.5 Sonnet, and Gemini all support vision. Image tokens are priced differently from text tokens — typically at a fixed rate per image or per tile.
Why it matters
Vision support unlocks use cases that text-only models cannot handle: document analysis, UI testing, image classification, accessibility descriptions, and visual question answering. If your application needs to process screenshots, receipts, product photos, or handwritten notes, you need a model with vision support. sourc.dev tracks this as a capability flag on every model.
Where models stand
12 models with vision / image input enabled:
Data available for 12 of 271 tracked entities.
How sourc.dev tracks this
sourc.dev verifies vision / image input manually from official provider documentation, API responses, and published specifications. Every data point includes a source URL and verification date. When a value changes, the old value is preserved in the history table and the new value is recorded alongside it. Nothing is overwritten — the full timeline is always available.
sourc.dev verifies this attribute manually from provider documentation. Every data point includes a source URL and verification date. Changes are recorded in the history table — nothing is overwritten.
This attribute is verified periodically against provider documentation. When sourc.dev detects a change, the new value is recorded alongside the old one with full provenance.
Understanding vision / image input helps developers make informed decisions when choosing between models and providers. Rather than relying on marketing claims, sourc.dev provides verified, dated, source-linked data so the data decides.