Monitoring Model Deprecation in Production
The model you pinned will be switched off one day, probably with less notice than you would like. Here is how to find out from CI instead of from your users.
The model you carefully pinned six months ago will be switched off one day, and the provider will give you less notice than you would like. When it happens, your app does not degrade gracefully. It starts returning hard errors (model_not_found) on every request that touches that model, and you find out from a support ticket instead of from a calendar reminder. This post is about flipping that around: detecting deprecations in CI, on a schedule, so the warning lands in your team's channel months before the shutdown.
I will use deprecations.info, a small open project that scrapes every major provider's deprecation page into one feed, and build the workflow you actually want around it: a single file listing the models your app uses, a scheduled GitHub Action that cross-references that file against the feed and pings each model to confirm it still answers, and an alert when something is wrong.
Why this matters more than it looks
Model deprecation feels like a non-problem until the first time it bites you. Here is why it is worth a small amount of infrastructure.
The failure mode is a hard outage, not a slow decline. When a provider retires a model ID, calls to it stop working. There is no fallback, no "slightly worse answer", just errors on the code path that used that model. If that path is your core feature, your core feature is down.
Pinned snapshots are exactly what gets retired. Good practice is to pin a dated snapshot like gpt-5-2025-08-07 or claude-haiku-4-5-20251001 so behavior stays stable. The tradeoff is that dated snapshots are the first things on the chopping block. The alternative, using a floating alias like gpt-5, trades a hard shutdown for silent behavior change, which is its own problem. Either way you need to know when the thing underneath you moves.
The notice is short and the migration is not. Providers often give a few months between announcement and shutdown. That sounds like plenty until you remember what migrating actually involves: swapping the ID, re-running your evals, checking that prompts still behave, confirming structured output still parses, and re-checking cost and latency because the new model prices and responds differently. If you discover the deprecation with three weeks left, that is a scramble. With three months, it is a calm ticket.
The announcement rarely reaches the right person. Providers email the account owner and update a docs page. The account owner is often not the engineer who maintains the code, and nobody reads every provider's changelog daily. Detection needs to live where your engineers already look: a pull request check, a red CI run, a Slack message. Not an inbox someone forwards three weeks late.
It is cheap insurance. A scheduled job that takes thirty seconds a day costs next to nothing compared with one 2am incident, one emergency migration, or one breached uptime promise to your own customers. This is the cheapest reliability work you will do all quarter.
This is the proactive half of Observability for LLM apps. Observability tells you something broke. Deprecation monitoring tells you something is going to break, with enough lead time to fix it on your terms.
What deprecations.info gives you
The site does one boring, useful thing. A GitHub Action runs daily (2 AM UTC), scrapes the deprecation pages of OpenAI, Anthropic, Google AI/Gemini, Google Vertex AI, AWS Bedrock, Cohere, Groq, xAI, and Azure, extracts each notice, and publishes the result in three formats: an RSS feed, a JSON Feed, and a raw JSON array. No API key, no auth. The whole thing is open source (deprecations/deprecations-rss, MIT, almost entirely Python) so you can read exactly how each provider page is parsed, or run it yourself.
For programmatic monitoring, skip the RSS and use the raw JSON. Each entry is structured and easy to match against:
```json
{
  "provider": "OpenAI",
  "model_id": "gpt-5-2025-08-07",
  "announcement_date": "2026-06-16",
  "shutdown_date": "2026-12-11",
  "deprecation_date": "2026-06-11",
  "replacement_models": ["gpt-5.5"],
  "deprecation_context": "On June 11, 2026, we notified developers using older GPT-5 and o3 model snapshots of their deprecation and removal from the API on December 11, 2026.",
  "url": "https://platform.openai.com/docs/deprecations#2026-06-11-gpt-5-and-o3-model-deprecations",
  "scraped_at": "2026-06-16T07:28:54Z",
  "first_observed": "2026-06-16",
  "last_observed": "2026-06-16"
}
```

The fields that matter for automation are model_id (to match against what you use), shutdown_date (to compute urgency), and replacement_models (to tell you where to go next). That is enough to build the whole workflow.
Treat the feed as an early-warning system, not the source of truth. It scrapes provider pages, so it can lag a page change or miss an oddly formatted notice, and it collapses region-specific dates into the earliest one. The provider's own docs are still authoritative. That is also why the workflow below adds a second, independent check: a liveness probe.
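A scraped feed can also fail silently. If the scraper stops running, a deprecation check keeps passing because it finds nothing new. One possible sanity check runs before any matching. It assumes that last_observed is refreshed each time the daily scraper still sees an entry, which I have not confirmed against the project's code. The name feed_problems and the three-day tolerance are hypothetical.

```python
from datetime import date

# Assumption: last_observed is bumped on every daily scrape that still sees an
# entry. If the feed does not behave that way, this check will misfire.
MAX_STALENESS_DAYS = 3  # hypothetical tolerance for a missed scrape or two


def feed_problems(feed: list[dict], today: date) -> list[str]:
    """Return reasons to distrust the feed itself; an empty list means it looks fresh."""
    if not feed:
        return ["feed is empty"]
    observed = [e["last_observed"] for e in feed if e.get("last_observed")]
    if not observed:
        return ["no entry has a last_observed date"]
    newest = max(date.fromisoformat(d) for d in observed)
    age = (today - newest).days
    if age > MAX_STALENESS_DAYS:
        return [f"newest last_observed is {age} days old ({newest})"]
    return []
```

If it returns anything, treat that as its own alert ("the monitor is blind") rather than as a green run.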
The workflow
Two ideas make this solid. First, one file is the single source of truth for every model your app uses, so the monitor never guesses. Second, you check two independent things: the feed (an announced future shutdown) and a live probe (is the model actually responding right now). The feed catches planned deprecations early; the probe catches surprises the feed missed, plus access changes, typos, and regional rollouts.
Step 1: one file of model constants
Stop scattering model strings across your codebase. Put them in one place and import from it everywhere. This file is what the monitor reads, and it is also just good hygiene.
```python
# app/models.py
# The single source of truth for every model this app calls.
# Import these constants from here everywhere; never hard-code a model string elsewhere.

GPT_5 = "gpt-5-2025-08-07"
CLAUDE_HAIKU = "claude-haiku-4-5-20251001"
EMBEDDING = "text-embedding-3-large"

# What the deprecation monitor reads. provider must match the feed's
# "provider" field (OpenAI, Anthropic, Google AI, AWS Bedrock, ...).
# The liveness probe also uses it to pick the right SDK client.
MODELS_IN_USE = [
    {"provider": "OpenAI", "model_id": GPT_5, "used_for": "main agent"},
    {"provider": "Anthropic", "model_id": CLAUDE_HAIKU, "used_for": "cheap classification"},
    {"provider": "OpenAI", "model_id": EMBEDDING, "used_for": "RAG embeddings"},
]
```

Step 2: the deprecation check
This pulls the feed, matches it against MODELS_IN_USE, and flags anything with a shutdown date inside a window you care about. I set the window generously (120 days) so the first alert arrives early.
```python
# scripts/check_deprecations.py
import json
import sys
import urllib.request
from datetime import date

sys.path.insert(0, ".")  # run from the repo root so app/ is importable
from app.models import MODELS_IN_USE

FEED_URL = "https://deprecations.info/v1/deprecations.json"
WARN_WINDOW_DAYS = 120  # start caring this many days before shutdown


def fetch_feed() -> list[dict]:
    req = urllib.request.Request(FEED_URL, headers={"User-Agent": "dep-monitor"})
    with urllib.request.urlopen(req, timeout=30) as r:
        return json.load(r)


def days_left(shutdown_date: str | None) -> int | None:
    if not shutdown_date:
        return None
    return (date.fromisoformat(shutdown_date) - date.today()).days


def check() -> list[dict]:
    feed = fetch_feed()

    # Index the feed by model ID so each lookup is a dict hit.
    by_model: dict[str, list[dict]] = {}
    for entry in feed:
        by_model.setdefault(entry["model_id"], []).append(entry)

    findings = []
    for m in MODELS_IN_USE:
        for entry in by_model.get(m["model_id"], []):
            left = days_left(entry.get("shutdown_date"))
            # Undated, already past, or inside the window: all worth flagging.
            if left is None or left <= WARN_WINDOW_DAYS:
                findings.append({
                    "model_id": m["model_id"],
                    "used_for": m["used_for"],
                    "provider": entry["provider"],
                    "shutdown_date": entry.get("shutdown_date") or "unknown",
                    "days_left": left,
                    "replacements": entry.get("replacement_models") or [],
                    "url": entry.get("url", ""),
                })
    return findings


if __name__ == "__main__":
    findings = check()
    if findings:
        print(json.dumps(findings, indent=2))
        sys.exit(1)  # non-zero so the GitHub Action goes red
    print("No deprecations affecting models in use.")
```

Matching on exact model_id is deliberate. You could also do a substring match on the model family to catch related snapshots, but exact matching keeps the signal clean: it only fires for a model you actually run.
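If you do want the family-level signal, you can report it separately from exact matches so it informs without turning the run red. Here is a sketch that assumes dated snapshots end in a trailing date stamp (-YYYYMMDD or -YYYY-MM-DD), as the two pinned IDs in this post do. The helpers family and related_entries are hypothetical, and other naming schemes would need more patterns.

```python
import re

# Assumption: snapshots carry a trailing date, as in gpt-5-2025-08-07 or
# claude-haiku-4-5-20251001. Suffixes like -001 or -preview are not handled.
SNAPSHOT_SUFFIX = re.compile(r"-(\d{8}|\d{4}-\d{2}-\d{2})$")


def family(model_id: str) -> str:
    """Strip a trailing date stamp to get the model family."""
    return SNAPSHOT_SUFFIX.sub("", model_id)


def related_entries(model_id: str, feed: list[dict]) -> list[dict]:
    """Feed entries in the same family as model_id, excluding the exact match."""
    fam = family(model_id)
    return [
        e for e in feed
        if e["model_id"] != model_id and family(e["model_id"]) == fam
    ]
```

Printed as an "FYI" section, this tells you that a sibling snapshot is going away, which often means yours is next, without paging anyone for it.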
Step 3: the liveness probe
The feed tells you what providers have announced. A liveness probe tells you the truth right now: does this exact model ID still answer when you call it? This catches the cases the feed cannot, like a model pulled earlier than announced, an entitlement you lost, or a typo in your constants file that has been silently failing.
Keep it cheap. One token is enough to learn whether the model is alive.
```python
# scripts/check_liveness.py
import sys

from anthropic import Anthropic
from anthropic import NotFoundError as AnthropicNotFoundError
from openai import NotFoundError as OpenAINotFoundError
from openai import OpenAI

sys.path.insert(0, ".")  # must run before the app import below
from app.models import MODELS_IN_USE

openai_client = OpenAI()        # reads OPENAI_API_KEY
anthropic_client = Anthropic()  # reads ANTHROPIC_API_KEY


def probe(model: dict) -> str | None:
    """Return an error string if the model is not callable, else None."""
    try:
        if model["provider"] == "OpenAI":
            if "embedding" in model["model_id"]:
                openai_client.embeddings.create(model=model["model_id"], input="ping")
            else:
                # Reasoning models such as GPT-5 reject max_tokens;
                # max_completion_tokens is accepted by current chat models.
                openai_client.chat.completions.create(
                    model=model["model_id"],
                    messages=[{"role": "user", "content": "ping"}],
                    max_completion_tokens=1,
                )
        elif model["provider"] == "Anthropic":
            anthropic_client.messages.create(
                model=model["model_id"],
                max_tokens=1,
                messages=[{"role": "user", "content": "ping"}],
            )
        else:
            # Fail loudly instead of silently reporting "ok".
            return f"no probe implemented for provider {model['provider']!r}"
        return None
    except (OpenAINotFoundError, AnthropicNotFoundError):
        return "model_not_found (already gone)"
    except Exception as e:  # auth, access, rate, network
        return f"{type(e).__name__}: {e}"


if __name__ == "__main__":
    dead = []
    for m in MODELS_IN_USE:
        err = probe(m)
        status = "ok" if err is None else err
        print(f"{m['provider']:10} {m['model_id']:30} {status}")
        if err:
            dead.append((m, err))
    sys.exit(1 if dead else 0)
```

The liveness probe costs a few tokens per model per run. Run it daily, not hourly. And run it against the same credentials your production app uses, so an access or entitlement problem shows up here instead of in front of a user.
Step 4: run it on a schedule in GitHub Actions
Now wire both checks into a scheduled Action. The feed refreshes once a day, so a daily run is the right cadence. On any failure, send an alert.
```yaml
# .github/workflows/model-deprecation-monitor.yml
name: Model Deprecation Monitor

on:
  schedule:
    - cron: "0 8 * * *"  # daily at 08:00 UTC, after the feed's 02:00 refresh
  workflow_dispatch: {}  # let me run it by hand too

jobs:
  monitor:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.12"
      - run: pip install openai anthropic

      - name: Check announced deprecations
        id: deprecations
        run: python scripts/check_deprecations.py
        continue-on-error: true

      - name: Check model liveness
        id: liveness
        env:
          OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
          ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
        run: python scripts/check_liveness.py
        continue-on-error: true

      - name: Alert if anything failed
        if: steps.deprecations.outcome == 'failure' || steps.liveness.outcome == 'failure'
        env:
          SLACK_WEBHOOK_URL: ${{ secrets.SLACK_WEBHOOK_URL }}
          # alert.py re-runs both checks, so the liveness probe needs its keys here too.
          OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
          ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
        run: python scripts/alert.py

      - name: Fail the run so it shows red
        if: steps.deprecations.outcome == 'failure' || steps.liveness.outcome == 'failure'
        run: exit 1
```

The continue-on-error: true on the checks lets the alert step run even when a check fails. The if conditions read outcome, which is a step's result before continue-on-error is applied; its conclusion would report success. The final step then marks the run red so it is visible in the Actions tab and on the commit.
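A red run tells you that something failed. The job summary can also tell you what failed, without anyone opening the logs. GitHub Actions exposes a file path in the GITHUB_STEP_SUMMARY environment variable, and Markdown appended to that file appears on the run's summary page. Here is a sketch that assumes the findings dicts produced by check(). The helper name write_step_summary is hypothetical, and you would call it just before sys.exit(1).

```python
import os


def write_step_summary(findings: list[dict]) -> None:
    """Append a Markdown table of findings to the Actions job summary, if available."""
    path = os.environ.get("GITHUB_STEP_SUMMARY")
    if not path:
        return  # running locally: there is no summary file to write to
    lines = [
        "| Model | Used for | Shutdown | Days left | Replacements |",
        "| --- | --- | --- | --- | --- |",
    ]
    for f in findings:
        replacements = ", ".join(f["replacements"]) or "-"
        lines.append(
            f"| {f['model_id']} | {f['used_for']} | {f['shutdown_date']} "
            f"| {f['days_left']} | {replacements} |"
        )
    with open(path, "a", encoding="utf-8") as fh:
        fh.write("\n".join(lines) + "\n")
```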
Step 5: send the alert where your team lives
The alert is just an HTTP POST. Slack with an incoming webhook is the simplest version, and the same shape works for Teams or anything else with a webhook. The script below re-runs both checks, captures their output, and posts a single message.
```python
# scripts/alert.py
import json
import os
import subprocess
import sys
import urllib.request


def run(cmd: list[str]) -> str:
    # Capture stdout; fall back to stderr so a crashed check still says something.
    result = subprocess.run(cmd, capture_output=True, text=True)
    return (result.stdout or result.stderr).strip()


def post_slack(text: str) -> None:
    webhook = os.environ["SLACK_WEBHOOK_URL"]
    body = json.dumps({"text": text}).encode()
    req = urllib.request.Request(
        webhook, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req, timeout=10):
        pass  # a non-2xx response raises HTTPError, which fails the step


if __name__ == "__main__":
    # sys.executable: the same Python that has openai and anthropic installed.
    dep = run([sys.executable, "scripts/check_deprecations.py"])
    live = run([sys.executable, "scripts/check_liveness.py"])
    message = (
        "*Model deprecation monitor flagged something*\n\n"
        f"*Announced deprecations:*\n```{dep}```\n"
        f"*Liveness:*\n```{live}```\n"
        "Plan the migration: swap the ID, re-run evals, re-check cost and latency."
    )
    post_slack(message)
```

For Microsoft Teams, swap the webhook URL for a Teams incoming webhook and adjust the payload to Teams' format. For email, post to a transactional email API the same way, or lean on the no-code path: subscribe a team alias to the deprecations.info RSS feed through a service like Blogtrottr, accepting that it will send every deprecation rather than only the ones affecting your models. The point is that the detection logic stays in code; only the last hop changes.
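Whatever the last hop is, the message reads better when you format the deprecation check's JSON instead of pasting it raw. Here is a sketch of a hypothetical format_findings helper that alert.py could apply to dep before posting. It assumes the finding fields produced by check().

```python
import json


def format_findings(raw: str) -> str:
    """Turn check_deprecations.py's JSON output into one readable line per model.

    Falls back to the raw text when the output is not a JSON list, for example
    the "No deprecations" message or a traceback.
    """
    try:
        findings = json.loads(raw)
    except json.JSONDecodeError:
        return raw
    if not isinstance(findings, list):
        return raw
    lines = []
    for f in findings:
        if f["days_left"] is None:
            when = "no shutdown date announced yet"
        else:
            when = f"shuts down {f['shutdown_date']} ({f['days_left']} days left)"
        replacement = ", ".join(f["replacements"]) or "none listed"
        lines.append(
            f"- {f['model_id']} ({f['used_for']}): {when}. "
            f"Replacement: {replacement}. {f['url']}"
        )
    return "\n".join(lines)
```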
What to do when it fires
An alert is the start of a migration, not the end of the world. A repeatable checklist beats panic:
- Confirm the deprecation on the provider's own docs page (the url in the feed entry links straight there).
- Find every call site. Because every model string lives in app/models.py, this is one grep, not a scavenger hunt.
- Pick the replacement from replacement_models and read its model card for behavior and pricing changes.
- Re-run your evals against the new model. This is where the work from Evaluating agents with LangSmith pays off: you change one constant, run the suite, and read the diff instead of guessing.
- Re-check cost and latency, since the new model prices and responds differently. See Cutting LLM cost and latency.
- Ship the swap well before the shutdown date, and update the constant so the monitor goes green again.
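The one-grep promise in the checklist only holds if nobody sneaks a model string back into the codebase. Here is a sketch of a hypothetical guard that could run in CI next to the monitor. It scans Python files for quoted literals that look like model IDs and reports any that appear outside app/models.py. The regex covers only the ID shapes used in this post and is an assumption you would extend for other providers.

```python
import re
from pathlib import Path

# Hypothetical pattern: quoted literals shaped like the model IDs in this post.
MODEL_ID = re.compile(r"""["'](gpt-[\w.-]+|claude-[\w.-]+|text-embedding-[\w.-]+)["']""")
ALLOWED = Path("app/models.py")  # the one file allowed to spell out model IDs
SKIP_DIRS = {"venv", "node_modules", "__pycache__"}


def stray_model_strings(root: Path) -> list[str]:
    """Return 'path:line: model_id' for model-ID literals found outside ALLOWED."""
    hits = []
    for path in sorted(root.rglob("*.py")):
        rel = path.relative_to(root)
        # Skip the constants file, hidden dirs such as .venv or .git, and vendored code.
        if rel == ALLOWED or any(p.startswith(".") or p in SKIP_DIRS for p in rel.parts):
            continue
        text = path.read_text(encoding="utf-8", errors="replace")
        for lineno, line in enumerate(text.splitlines(), start=1):
            for match in MODEL_ID.finditer(line):
                hits.append(f"{rel}:{lineno}: {match.group(1)}")
    return hits
```

Run it from the repo root with stray_model_strings(Path(".")) and fail the job if the list is non-empty.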
The takeaway
Model deprecation is a small, predictable risk with an outsized failure mode: a hard outage that arrives without warning if you are not watching. The fix is cheap. Keep every model string in one constant file, point a daily GitHub Action at deprecations.info to catch announced shutdowns, add a liveness probe to catch the surprises, and route both to wherever your team already pays attention. Thirty seconds of CI a day buys you months of warning instead of a 2am page.
If you want to go further, the same scheduled-job pattern is how I think about the rest of LLM ops in Observability for LLM apps: instrument the boring thing once, and let it shout only when it matters.

Folarin Akinloye is an AI Engineer based in London, UK. He builds production-ready agentic AI systems, multi-agent architectures, and sophisticated RAG implementations, and writes about the engineering decisions behind them.