Drift Index

A behavioural consistency score from 0 to 100. Lower values mean more drift from the baseline.

What is drift index?

Drift index measures how much a model's output behaviour has changed over time, as detected by canary prompts sent on a fixed schedule. When a provider such as OpenAI updates GPT-4 silently, your application's behaviour can change even though you have deployed no code. Without drift detection, such a change usually goes unnoticed until something breaks downstream. Think of it as calibration drift in a scientific instrument: if you do not measure it, you cannot correct for it.

Why it matters

OpenAI updated GPT-4 multiple times in 2023-2024 without public announcement, and each update changed output patterns. Applications that depended on specific formatting, reasoning chains, or JSON structures broke silently. Drift detection guards against this by making the change visible as soon as it occurs. sourc.dev tracks drift via canary prompts: fixed inputs sent on a schedule, with outputs compared over time.
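
The canary idea can be sketched in a few lines. This is a minimal illustration, not sourc.dev's actual pipeline: the `Canary` class, the `call_model` callable, and the exact-match fingerprint are all assumptions made for the example. An exact hash is the strictest possible comparison, and because model outputs can vary between runs even with identical inputs, a real system would more likely use a similarity threshold.

```python
import hashlib
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Canary:
    """A fixed prompt that is re-sent unchanged on every run (hypothetical type)."""
    canary_id: str
    prompt: str


def fingerprint(text: str) -> str:
    """Hash the output after collapsing whitespace, so trivial spacing
    differences are not reported as changes."""
    normalized = " ".join(text.split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def changed_canaries(
    canaries: List[Canary],
    call_model: Callable[[str], str],  # assumption: prompt in, completion out
    baseline: Dict[str, str],          # canary_id -> fingerprint from an earlier run
) -> List[str]:
    """Return the ids of canaries whose output no longer matches the baseline."""
    changed = []
    for canary in canaries:
        current = fingerprint(call_model(canary.prompt))
        if baseline.get(canary.canary_id) != current:
            changed.append(canary.canary_id)
    return changed
```

On each scheduled run, any id returned by `changed_canaries` marks a prompt whose output should be inspected. The baseline is simply the set of fingerprints stored from an earlier run.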

Where models stand

No data available yet for this metric.

How sourc.dev tracks this

sourc.dev tracks drift index through its automated monitoring pipeline. Data is collected on a regular schedule, compared against previous values, and any changes are recorded in the history table with full provenance — source URL, effective date, and verification timestamp. Nothing is overwritten. The pipeline ensures this attribute stays current without manual intervention.
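
An append-only history of the kind described above might look like the following sketch. The table name, column names, and the `record_if_changed` helper are hypothetical, chosen only to mirror the provenance fields mentioned in the text (source URL, effective date, verification timestamp). The actual schema is not documented here.

```python
import sqlite3
from datetime import datetime, timezone

# Hypothetical schema: one row per observed change, never updated in place.
SCHEMA = """
CREATE TABLE IF NOT EXISTS attribute_history (
    id             INTEGER PRIMARY KEY AUTOINCREMENT,
    model          TEXT NOT NULL,
    attribute      TEXT NOT NULL,
    value          TEXT NOT NULL,
    source_url     TEXT NOT NULL,
    effective_date TEXT NOT NULL,
    verified_at    TEXT NOT NULL
)
"""


def record_if_changed(conn, model, attribute, value, source_url, effective_date):
    """Append a new row only when the value differs from the latest one.

    Existing rows are never modified, so the full history is preserved.
    Returns True if a row was written.
    """
    latest = conn.execute(
        "SELECT value FROM attribute_history "
        "WHERE model = ? AND attribute = ? ORDER BY id DESC LIMIT 1",
        (model, attribute),
    ).fetchone()
    if latest is not None and latest[0] == value:
        return False
    verified_at = datetime.now(timezone.utc).isoformat()
    conn.execute(
        "INSERT INTO attribute_history "
        "(model, attribute, value, source_url, effective_date, verified_at) "
        "VALUES (?, ?, ?, ?, ?, ?)",
        (model, attribute, value, source_url, effective_date, verified_at),
    )
    conn.commit()
    return True
```

Because the function only ever inserts, the current value of an attribute is the most recent row and every earlier value remains queryable.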

Frequently asked questions

How is drift detected?

sourc.dev sends canary prompts (fixed prompts whose wording never changes) to each model on a regular schedule. The responses are compared over time using semantic similarity and structural analysis. A significant change in output pattern indicates drift.
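
As a rough illustration of how the two comparisons could be combined into a 0-100 score, consider the sketch below. Everything in it is an assumption: the function names, the 50/50 weighting, and the scoring formula are invented for the example. Note also that `difflib.SequenceMatcher` measures character-level similarity, which is only a stand-in here for a true semantic comparison such as embedding distance. The structural check compares the sets of JSON key paths in the two outputs.

```python
import json
from difflib import SequenceMatcher


def key_paths(value, prefix=""):
    """Collect the set of key paths in a parsed JSON value, e.g. 'a.b' or 'c[].d'."""
    paths = set()
    if isinstance(value, dict):
        for key, child in value.items():
            path = f"{prefix}.{key}" if prefix else str(key)
            paths.add(path)
            paths |= key_paths(child, path)
    elif isinstance(value, list):
        for child in value:
            paths |= key_paths(child, prefix + "[]")
    return paths


def parse_json(text):
    """Return (True, parsed) if text is valid JSON, else (False, None)."""
    try:
        return True, json.loads(text)
    except json.JSONDecodeError:
        return False, None


def structural_similarity(baseline, current):
    """Jaccard overlap of JSON key paths; 0.0 if only one side is JSON."""
    base_ok, base_val = parse_json(baseline)
    cur_ok, cur_val = parse_json(current)
    if base_ok != cur_ok:
        return 0.0
    if not base_ok:
        return 1.0  # neither output is JSON: nothing structural to compare here
    base_keys, cur_keys = key_paths(base_val), key_paths(cur_val)
    if not base_keys and not cur_keys:
        return 1.0
    return len(base_keys & cur_keys) / len(base_keys | cur_keys)


def consistency_score(baseline, current, text_weight=0.5):
    """Blend textual and structural similarity into a 0-100 score.

    Lower means more drift. The weighting is an arbitrary choice for this sketch.
    """
    text_sim = SequenceMatcher(None, baseline, current).ratio()
    struct_sim = structural_similarity(baseline, current)
    return round(100 * (text_weight * text_sim + (1 - text_weight) * struct_sim), 1)
```

Under this scheme an unchanged output scores 100, while a response that switches from JSON to prose loses the entire structural half of the score regardless of how similar the wording is.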

Does drift mean the model got worse?

Not necessarily. Drift measures change, not quality. A model might drift toward better outputs, worse outputs, or simply different outputs. The value is knowing that change occurred, so you can evaluate its impact on your use case.

How often do models drift?

More often than most developers expect. Research has documented significant behavioural changes in GPT-4 and GPT-3.5 over periods as short as a few months, with no public announcement from the provider.