Functional correctness and uptime are two different questions. An endpoint can be perfectly correct today and silently break tomorrow because an upstream free API changed its response shape, hit a rate limit, or went down entirely. We needed a way to know when that happened without manually re-testing every namespace.
One real sample call per namespace, not per endpoint
Checking all 400+ endpoints twice a day would be excessive — most failure modes (an upstream API going down, a shared library breaking) affect an entire namespace at once, not one endpoint in isolation. So the health check hits one real, valid sample call per live namespace — the same list our public /status page already uses for its client-side checks, now shared in one module so the two can never drift apart.
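As a rough sketch of what that shared module could look like: the `SAMPLE_CALLS` entries, the `runAllChecks` helper, and the example paths below are all hypothetical placeholders, not our real list. The checker is passed in as a parameter, on the assumption that the /status page and the server-side job each bring their own way of making the request but iterate the same array.

```typescript
// Hypothetical shared module. Namespaces and paths are illustrative placeholders.
export interface SampleCall {
  namespace: string;
  // One real, valid request path for this namespace.
  endpoint: string;
}

export const SAMPLE_CALLS: readonly SampleCall[] = [
  { namespace: "weather", endpoint: "/api/weather/current?city=London" },
  { namespace: "currency", endpoint: "/api/currency/convert?from=USD&to=EUR" },
];

// Runs one check per namespace in parallel and tags each result with its namespace.
// The checker is injected, so this module contains no fetch logic of its own.
export async function runAllChecks<R extends object>(
  baseUrl: string,
  check: (baseUrl: string, endpoint: string) => Promise<R>,
  calls: readonly SampleCall[] = SAMPLE_CALLS,
): Promise<Array<R & { namespace: string }>> {
  return Promise.all(
    calls.map(async ({ namespace, endpoint }) => ({
      namespace,
      ...(await check(baseUrl, endpoint)),
    })),
  );
}
```

Because the list is plain data, adding a namespace means adding one entry in one place. A server-side caller could pass in any function with the `(baseUrl, endpoint)` signature, such as the one shown next.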
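export async function checkServiceHealth(baseUrl: string, endpoint: string) {
  const t0 = Date.now();
  try {
    // Abort after 8 seconds and bypass any cache so each check hits the live endpoint.
    const res = await fetch(`${baseUrl}${endpoint}`, { signal: AbortSignal.timeout(8000), cache: "no-store" });
    const latencyMs = Date.now() - t0;
    // 503 counts as degraded; any other 5xx counts as an outage.
    if (res.status === 503) return { status: "degraded", latencyMs };
    if (res.status >= 500) return { status: "outage", latencyMs };
    // Everything below 500, including 4xx responses, is reported as operational.
    return { status: "operational", latencyMs };
  } catch {
    // Network errors and timeouts are reported as an outage.
    return { status: "outage", latencyMs: Date.now() - t0 };
  }
}
Twice a day, via Vercel cron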
A cron entry in vercel.json hits the health-check route at 6 AM and 6 PM UTC. When a CRON_SECRET environment variable is set, Vercel automatically sends an Authorization: Bearer $CRON_SECRET header on cron-triggered requests, and the route checks that header before doing anything else. The same route also accepts a manual trigger from an admin dashboard button, which is gated by a separate admin-token cookie check instead of the cron secret.
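The two-way gate can be sketched as a small pure function. Everything named here (`authorizeTrigger`, `Trigger`, the `secrets` shape) is hypothetical, and the plain equality check on the admin token is a stand-in, since how the real cookie is validated is not described above. For reference, in standard cron syntax a 6 AM and 6 PM schedule would be written `0 6,18 * * *`.

```typescript
// Hypothetical helper: decides which trigger, if any, is allowed to run the check.
export type Trigger = "cron" | "admin" | null;

export function authorizeTrigger(
  authorizationHeader: string | null,
  adminCookieToken: string | null,
  secrets: { cronSecret?: string; adminToken?: string },
): Trigger {
  // Never compare against an unset secret: otherwise the literal
  // header "Bearer undefined" would be accepted.
  if (secrets.cronSecret && authorizationHeader === `Bearer ${secrets.cronSecret}`) {
    return "cron";
  }
  // Assumption: the admin cookie carries a token compared by equality.
  if (secrets.adminToken && adminCookieToken === secrets.adminToken) {
    return "admin";
  }
  return null;
}
```

A route handler would read the Authorization header and the cookie from the incoming request, call this helper, and return a 401 when the result is `null`. Production code might also prefer a constant-time comparison for the secrets.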
Results go to two places
Every check writes a row to a history table (namespace, status, latency, timestamp), so there's a queryable record over time, not just a current snapshot. If a check comes back as anything other than operational, it also writes an alert into our admin alerts table, reusing infrastructure that already existed for payment and security events, and sends an email.
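One way to express that flow, as a minimal sketch: the `HealthSinks` interface, `recordResult`, and the alert `kind` value are assumptions made for illustration, and the storage and email calls are injected so the example does not presume any particular database or mail provider.

```typescript
// Hypothetical types: the real table columns and alert shape may differ.
export interface CheckResult {
  namespace: string;
  status: string;
  latencyMs: number;
}

export interface HealthSinks {
  insertHistory(row: CheckResult & { checkedAt: Date }): Promise<void>;
  insertAlert(alert: { kind: string; message: string }): Promise<void>;
  sendEmail(subject: string, body: string): Promise<void>;
}

// Always writes a history row; alerts and emails only for non-operational results.
// Returns true when an alert was raised.
export async function recordResult(
  result: CheckResult,
  sinks: HealthSinks,
  now: Date = new Date(),
): Promise<boolean> {
  await sinks.insertHistory({ ...result, checkedAt: now });
  if (result.status === "operational") return false;

  const message = `${result.namespace} is ${result.status} (${result.latencyMs} ms)`;
  await sinks.insertAlert({ kind: "health", message });
  await sinks.sendEmail(`Health check: ${result.namespace} ${result.status}`, message);
  return true;
}
```

Writing the history row first means a failed email or alert insert does not lose the record of the check itself.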
Why alert-only, not a routine digest
We deliberately did not build a “here's your twice-daily status report” email. On a normal day, every namespace is operational, and an email saying so twice a day forever is exactly the kind of notification that trains you to stop reading notifications. Silence means healthy. An email only arrives when a namespace actually fails its check — which means every email is real signal, not noise to filter out.