Monitoring Model Deprecation in Production, Folarin Akinloye

The model you carefully pinned six months ago will be switched off one day, and the provider will give you less notice than you would like. When it happens, your app does not degrade gracefully. It starts returning hard errors (model_not_found) on every request that touches that model, and you find out from a support ticket instead of from a calendar reminder. This post is about flipping that around: detecting deprecations in CI, on a schedule, so the warning lands in your team's channel months before the shutdown.

I will use deprecations.info, a small open project that scrapes every major provider's deprecation page into one feed, and build the workflow you actually want around it: a single file listing the models your app uses, a scheduled GitHub Action that cross-references that file against the feed and pings each model to confirm it still answers, and an alert when something is wrong.

Why this matters more than it looks

Model deprecation feels like a non-problem until the first time it bites you. Here is why it is worth a small amount of infrastructure.

The failure mode is a hard outage, not a slow decline. When a provider retires a model ID, calls to it stop working. There is no fallback, no "slightly worse answer", just errors on the code path that used that model. If that path is your core feature, your core feature is down.

Pinned snapshots are exactly what gets retired. Good practice is to pin a dated snapshot like gpt-5-2025-08-07 or claude-haiku-4-5-20251001 so behavior stays stable. The tradeoff is that dated snapshots are the first things on the chopping block. The alternative, using a floating alias like gpt-5, trades a hard shutdown for silent behavior change, which is its own problem. Either way you need to know when the thing underneath you moves.

The notice is short and the migration is not. Providers often give a few months between announcement and shutdown. That sounds like plenty until you remember what migrating actually involves: swapping the ID, re-running your evals, checking that prompts still behave, confirming structured output still parses, and re-checking cost and latency because the new model is priced differently and may respond faster or slower. If you discover the deprecation with three weeks left, that is a scramble. With three months, it is a calm ticket.

The announcement rarely reaches the right person. Providers email the account owner and update a docs page. The account owner is often not the engineer who maintains the code, and nobody reads every provider's changelog daily. Detection needs to live where your engineers already look: a pull request check, a red CI run, a Slack message. Not an inbox someone forwards three weeks late.

It is cheap insurance. A scheduled job that runs in thirty seconds a day is nothing against one 2am incident, one emergency migration, or one breached uptime promise to your own customers. This is the cheapest reliability work you will do all quarter.

Note

This is the proactive half of Observability for LLM apps. Observability tells you something broke. Deprecation monitoring tells you something is going to break, with enough lead time to fix it on your terms.

What deprecations.info gives you

The site does one boring, useful thing. A GitHub Action runs daily (2 AM UTC), scrapes the deprecation pages of OpenAI, Anthropic, Google AI/Gemini, Google Vertex AI, AWS Bedrock, Cohere, Groq, xAI, and Azure, extracts each notice, and publishes the result in three formats: an RSS feed, a JSON Feed, and a raw JSON array. No API key, no auth. The whole thing is open source (deprecations/deprecations-rss, MIT, almost entirely Python) so you can read exactly how each provider page is parsed, or run it yourself.

For programmatic monitoring, skip the RSS and use the raw JSON. Each entry is structured and easy to match against:

{
  "provider": "OpenAI",
  "model_id": "gpt-5-2025-08-07",
  "announcement_date": "2026-06-16",
  "shutdown_date": "2026-12-11",
  "deprecation_date": "2026-06-11",
  "replacement_models": ["gpt-5.5"],
  "deprecation_context": "On June 11, 2026, we notified developers using older GPT-5 and o3 model snapshots of their deprecation and removal from the API on December 11, 2026.",
  "url": "https://platform.openai.com/docs/deprecations#2026-06-11-gpt-5-and-o3-model-deprecations",
  "scraped_at": "2026-06-16T07:28:54Z",
  "first_observed": "2026-06-16",
  "last_observed": "2026-06-16"
}

The fields that matter for automation are model_id (to match against what you use), shutdown_date (to compute urgency), and replacement_models (to tell you where to go next). That is enough to build the whole workflow.

Caution

Treat the feed as an early-warning system, not the source of truth. It scrapes provider pages, so it can lag a page change or miss an oddly formatted notice, and it collapses region-specific dates into the earliest one. The provider's own docs are still authoritative. That is also why the workflow below adds a second, independent check: a liveness probe.

The workflow

Two ideas make this solid. First, one file is the single source of truth for every model your app uses, so the monitor never guesses. Second, you check two independent things: the feed (an announced future shutdown) and a live probe (is the model actually responding right now). The feed catches planned deprecations early; the probe catches surprises the feed missed, plus access changes, typos, and regional rollouts.

Step 1: one models constant file

Stop scattering model strings across your codebase. Put them in one place and import from it everywhere. This file is what the monitor reads, and it is also just good hygiene.

# app/models.py
# The single source of truth for every model this app calls.
# Import these constants from here everywhere; never hard-code a model string elsewhere.
 
GPT_5 = "gpt-5-2025-08-07"
CLAUDE_HAIKU = "claude-haiku-4-5-20251001"
EMBEDDING = "text-embedding-3-large"
 
# What the deprecation monitor reads. provider must match the feed's
# "provider" field (OpenAI, Anthropic, Google AI, AWS Bedrock, ...).
MODELS_IN_USE = [
    {"provider": "OpenAI",    "model_id": GPT_5,        "used_for": "main agent"},
    {"provider": "Anthropic", "model_id": CLAUDE_HAIKU, "used_for": "cheap classification"},
    {"provider": "OpenAI",    "model_id": EMBEDDING,    "used_for": "RAG embeddings"},
]
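
The "never hard-code a model string elsewhere" rule is easier to keep if something enforces it. The following is a minimal sketch of a scanner you could run in CI; the function name, the regular expression, and the list of model-name prefixes are all assumptions for illustration, and a pattern this simple will miss IDs built by string concatenation.

```python
import re
import tempfile
from pathlib import Path

# Illustrative prefixes only (an assumption); extend for the providers you call.
MODEL_LITERAL = re.compile(
    r"""["']((?:gpt|claude|gemini|text-embedding)-[\w.\-]+)["']"""
)


def find_stray_model_strings(root: Path, allowed: Path) -> list[tuple[str, int, str]]:
    """Return (file, line number, literal) for model-like strings outside `allowed`."""
    hits = []
    for path in sorted(root.rglob("*.py")):
        if path.resolve() == allowed.resolve():
            continue  # the constants file is the one place literals belong
        text = path.read_text(encoding="utf-8")
        for lineno, line in enumerate(text.splitlines(), start=1):
            for match in MODEL_LITERAL.finditer(line):
                hits.append((path.relative_to(root).as_posix(), lineno, match.group(1)))
    return hits


if __name__ == "__main__":
    # Demo on a throwaway tree: one allowed constants file, one offender.
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "app").mkdir()
        (root / "app" / "models.py").write_text('GPT_5 = "gpt-5-2025-08-07"\n')
        (root / "app" / "agent.py").write_text('reply = call(model="gpt-5-2025-08-07")\n')
        for hit in find_stray_model_strings(root, root / "app" / "models.py"):
            print(hit)
```

In a real repository you would call it with the repo root and app/models.py, and fail the build when the list is non-empty.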

Step 2: the deprecation check

This pulls the feed, matches it against MODELS_IN_USE, and flags anything with a shutdown date inside a window you care about. I set the window generously (120 days) so the first alert arrives early.

# scripts/check_deprecations.py
import json
import sys
import urllib.request
from datetime import date
 
sys.path.insert(0, ".")
from app.models import MODELS_IN_USE
 
FEED_URL = "https://deprecations.info/v1/deprecations.json"
WARN_WINDOW_DAYS = 120  # start caring this many days before shutdown
 
 
def fetch_feed() -> list[dict]:
    req = urllib.request.Request(FEED_URL, headers={"User-Agent": "dep-monitor"})
    with urllib.request.urlopen(req, timeout=30) as r:
        return json.load(r)
 
 
def days_left(shutdown_date: str | None) -> int | None:
    if not shutdown_date:
        return None
    return (date.fromisoformat(shutdown_date) - date.today()).days
 
 
def check() -> list[dict]:
    feed = fetch_feed()
    by_model: dict[str, list[dict]] = {}
    for entry in feed:
        by_model.setdefault(entry["model_id"], []).append(entry)
 
    findings = []
    for m in MODELS_IN_USE:
        for entry in by_model.get(m["model_id"], []):
            left = days_left(entry.get("shutdown_date"))
            if left is None or left <= WARN_WINDOW_DAYS:
                findings.append({
                    "model_id": m["model_id"],
                    "used_for": m["used_for"],
                    "provider": entry["provider"],
                    # "or" also covers a key that is present but null
                    "shutdown_date": entry.get("shutdown_date") or "unknown",
                    "days_left": left,
                    "replacements": entry.get("replacement_models") or [],
                    "url": entry.get("url", ""),
                })
    return findings
 
 
if __name__ == "__main__":
    findings = check()
    if findings:
        print(json.dumps(findings, indent=2))
        sys.exit(1)  # non-zero so the GitHub Action goes red
    print("No deprecations affecting models in use.")

Matching on exact model_id is deliberate. You could also do a substring match on the model family to catch related snapshots, but exact matching keeps the signal clean: it only fires for a model you actually run.
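
If you do want the looser family signal as a secondary, informational check, here is one possible sketch. It is a variant of the substring idea: it strips a trailing date stamp and compares what is left, which avoids gpt-5 accidentally matching gpt-5-mini. The regular expression assumes the two date-suffix styles seen in the IDs above, and the second and third feed entries in the demo are made up.

```python
import re

# Trailing date stamp: "-2025-08-07" or "-20251001" (the two styles used above).
DATE_SUFFIX = re.compile(r"-(?:\d{4}-\d{2}-\d{2}|\d{8})$")


def family(model_id: str) -> str:
    """Strip a trailing date stamp, leaving the model family name."""
    return DATE_SUFFIX.sub("", model_id)


def related_entries(model_id: str, feed: list[dict]) -> list[dict]:
    """Feed entries in the same family as model_id, excluding exact matches."""
    fam = family(model_id)
    return [
        e for e in feed
        if e["model_id"] != model_id and family(e["model_id"]) == fam
    ]


if __name__ == "__main__":
    feed = [
        {"model_id": "gpt-5-2025-08-07"},
        {"model_id": "gpt-5-2099-01-01"},       # made-up sibling snapshot
        {"model_id": "gpt-5-mini-2099-01-01"},  # made-up ID in a different family
    ]
    print(related_entries("gpt-5-2025-08-07", feed))
```

I would print these as a warning, not fail the run on them, so the exact-match check stays the only thing that turns CI red.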

Step 3: the liveness probe

The feed tells you what providers have announced. A liveness probe tells you the truth right now: does this exact model ID still answer when you call it? This catches the cases the feed cannot, like a model pulled earlier than announced, an entitlement you lost, or a typo in your constants file that has been silently failing.

Keep it cheap. One token is enough to learn whether the model is alive.

# scripts/check_liveness.py
import sys

# Make the repo root importable before importing app.models.
sys.path.insert(0, ".")

from anthropic import Anthropic
from anthropic import NotFoundError as AnthropicNotFoundError
from openai import NotFoundError as OpenAINotFoundError
from openai import OpenAI

from app.models import MODELS_IN_USE

openai_client = OpenAI()
anthropic_client = Anthropic()


def probe(model: dict) -> str | None:
    """Return an error string if the model is not callable, else None."""
    try:
        if model["provider"] == "OpenAI":
            if "embedding" in model["model_id"]:
                openai_client.embeddings.create(model=model["model_id"], input="ping")
            else:
                # max_tokens is deprecated on this endpoint and rejected by
                # reasoning models; max_completion_tokens is the supported cap.
                openai_client.chat.completions.create(
                    model=model["model_id"],
                    messages=[{"role": "user", "content": "ping"}],
                    max_completion_tokens=1,
                )
        elif model["provider"] == "Anthropic":
            anthropic_client.messages.create(
                model=model["model_id"],
                max_tokens=1,
                messages=[{"role": "user", "content": "ping"}],
            )
        else:
            # Do not report "ok" for a provider this script cannot call.
            return f"no probe implemented for provider {model['provider']!r}"
        return None
    except (OpenAINotFoundError, AnthropicNotFoundError):
        # Each SDK has its own NotFoundError class, so catch both.
        return "model_not_found (already gone)"
    except Exception as e:  # auth, access, rate, network
        return f"{type(e).__name__}: {e}"


if __name__ == "__main__":
    dead = []
    for m in MODELS_IN_USE:
        err = probe(m)
        status = "ok" if err is None else err
        print(f"{m['provider']:10} {m['model_id']:30} {status}")
        if err:
            dead.append((m, err))
    sys.exit(1 if dead else 0)

Tip

The liveness probe costs a few tokens per model per run. Run it daily, not hourly. And run it against the same credentials your production app uses, so an access or entitlement problem shows up here instead of in front of a user.

Step 4: run it on a schedule in GitHub Actions

Now wire both checks into a scheduled Action. The feed refreshes once a day, so a daily run is the right cadence. On any failure, send an alert.

# .github/workflows/model-deprecation-monitor.yml
name: Model Deprecation Monitor
 
on:
  schedule:
    - cron: "0 8 * * *"   # daily at 08:00 UTC, after the feed's 02:00 refresh
  workflow_dispatch: {}    # let me run it by hand too
 
jobs:
  monitor:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.12"
      - run: pip install openai anthropic
 
      - name: Check announced deprecations
        id: deprecations
        run: python scripts/check_deprecations.py
        continue-on-error: true
 
      - name: Check model liveness
        id: liveness
        env:
          OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
          ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
        run: python scripts/check_liveness.py
        continue-on-error: true
 
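      - name: Alert if anything failed
        if: steps.deprecations.outcome == 'failure' || steps.liveness.outcome == 'failure'
        env:
          SLACK_WEBHOOK_URL: ${{ secrets.SLACK_WEBHOOK_URL }}
          # alert.py re-runs both checks to capture their output, so the
          # liveness probe needs the provider keys in this step as well.
          OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
          ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
        run: python scripts/alert.py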
 
      - name: Fail the run so it shows red
        if: steps.deprecations.outcome == 'failure' || steps.liveness.outcome == 'failure'
        run: exit 1

The continue-on-error: true on the checks lets the alert step run even when a check fails, and the final step still marks the run red so it is visible in the Actions tab and on the commit.

Step 5: send the alert where your team lives

The alert is just an HTTP POST. Slack with an incoming webhook is the simplest version, and the same shape works for Teams or anything else with a webhook. I capture both checks' output and post a single message.

# scripts/alert.py
import json
import os
import subprocess
import urllib.request
 
 
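def run(cmd: list[str]) -> str:
    # The checks exit non-zero on findings, so do not pass check=True.
    # Fall back to stderr so a crashed check is not reported as empty output.
    result = subprocess.run(cmd, capture_output=True, text=True)
    return (result.stdout or result.stderr).strip()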
 
 
def post_slack(text: str) -> None:
    webhook = os.environ["SLACK_WEBHOOK_URL"]
    body = json.dumps({"text": text}).encode()
    req = urllib.request.Request(
        webhook, data=body, headers={"Content-Type": "application/json"}
    )
    urllib.request.urlopen(req, timeout=10)
 
 
if __name__ == "__main__":
    dep = run(["python", "scripts/check_deprecations.py"])
    live = run(["python", "scripts/check_liveness.py"])
    message = (
        "*Model deprecation monitor flagged something*\n\n"
        f"*Announced deprecations:*\n```{dep}```\n"
        f"*Liveness:*\n```{live}```\n"
        "Plan the migration: swap the ID, re-run evals, re-check cost and latency."
    )
    post_slack(message)

For Microsoft Teams, swap the webhook URL for a Teams incoming webhook and adjust the payload to Teams' format. For email, post to a transactional email API the same way, or lean on the no-code path: subscribe a team alias to the deprecations.info RSS feed through a service like Blogtrottr. The point is that the detection logic stays in code; only the last hop changes.
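
Whichever last hop you choose, a raw JSON dump is harder to scan in a chat channel than one line per model. The sketch below shows one possible formatter for the findings list produced by check_deprecations.py; the format_findings name and the line layout are my own assumptions, and you would need to import it into alert.py yourself.

```python
def format_findings(findings: list[dict]) -> str:
    """Render findings as one line per model, soonest shutdown first."""
    # Entries without a date sort last; dated ones sort by days remaining.
    ordered = sorted(
        findings,
        key=lambda f: (f["days_left"] is None, f["days_left"] or 0),
    )
    lines = []
    for f in ordered:
        if f["days_left"] is None:
            when = "no shutdown date published"
        else:
            when = f"{f['days_left']} days left, shuts down {f['shutdown_date']}"
        replacement = ", ".join(f["replacements"]) or "none listed"
        lines.append(
            f"- {f['model_id']} ({f['used_for']}): {when}; replacement: {replacement}"
        )
    return "\n".join(lines)


if __name__ == "__main__":
    sample = [{
        "model_id": "gpt-5-2025-08-07",
        "used_for": "main agent",
        "shutdown_date": "2026-12-11",
        "days_left": 178,
        "replacements": ["gpt-5.5"],
    }]
    print(format_findings(sample))
```

Sorting by days remaining puts the model that needs attention first, which matters once more than one deprecation is in flight.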

What to do when it fires

An alert is the start of a migration, not the end of the world. A repeatable checklist beats panic:

1. Confirm the deprecation on the provider's own docs page (the url in the feed entry links straight there).
2. Find every call site. Because every model string lives in app/models.py, this is one grep, not a scavenger hunt.
3. Pick the replacement from replacement_models and read its model card for behavior and pricing changes.
4. Re-run your evals against the new model. This is where the work from Evaluating agents with LangSmith pays off: you change one constant, run the suite, and read the diff instead of guessing.
5. Re-check cost and latency, since the new model is priced differently and may respond faster or slower. See Cutting LLM cost and latency.
6. Ship the swap well before the shutdown date, and update the constant so the monitor goes green again.

The takeaway

Model deprecation is a small, predictable risk with an outsized failure mode: a hard outage on no notice if you are not watching. The fix is cheap. Keep every model string in one constant file, point a daily GitHub Action at deprecations.info to catch announced shutdowns, add a liveness probe to catch the surprises, and route both to wherever your team already pays attention. Thirty seconds of CI a day buys you months of warning instead of a 2am page.

If you want to go further, the same scheduled-job pattern is how I think about the rest of LLM ops in Observability for LLM apps: instrument the boring thing once, and let it shout only when it matters.
