How we source real (not fake) data for 34 API namespaces

Every route file in this codebase that relies on anything beyond pure user input (a named formula, a third-party API, or a dataset) starts with a three-line comment block: DATA SOURCE, OWNERSHIP, REFRESH STRATEGY. It started as internal discipline and turned out to be a useful audit trail once the platform crossed 400 endpoints. Three categories fall out of it.
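To illustrate, here is what such a header might look like at the top of a route file, along with a small parser that an audit script could run over it. The `//` comment syntax, the `LABEL: value` layout, and the `parseProvenance` helper are assumptions made for this sketch. They are not the codebase's actual format.

```typescript
// Example header (hypothetical values) in the assumed `// LABEL: value` layout:
//
// DATA SOURCE: Open-Meteo (https://api.open-meteo.com), live pass-through
// OWNERSHIP: third-party upstream, no API key
// REFRESH STRATEGY: none; fetched on every request

interface Provenance {
  dataSource: string;
  ownership: string;
  refreshStrategy: string;
}

// Maps each header label to the Provenance field it populates.
const PROVENANCE_LABELS = {
  "DATA SOURCE": "dataSource",
  OWNERSHIP: "ownership",
  "REFRESH STRATEGY": "refreshStrategy",
} as const;

// Matches one header line, capturing the label and its value.
const HEADER_LINE = /^\s*\/\/\s*(DATA SOURCE|OWNERSHIP|REFRESH STRATEGY):\s*(.+?)\s*$/;

// Hypothetical helper: reads the header from a route file's source text.
// It scans only the first few lines, because the block sits at the top of the file,
// and returns null unless all three fields are present.
function parseProvenance(source: string, scanLines = 10): Provenance | null {
  const found: Partial<Provenance> = {};
  for (const line of source.split("\n").slice(0, scanLines)) {
    const match = HEADER_LINE.exec(line);
    if (match) {
      const label = match[1] as keyof typeof PROVENANCE_LABELS;
      found[PROVENANCE_LABELS[label]] = match[2];
    }
  }
  return found.dataSource && found.ownership && found.refreshStrategy
    ? (found as Provenance)
    : null;
}
```

A CI step could run this over every route file and fail the build whenever it returns null. That would keep the convention from eroding as new endpoints are added.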

1. Pure computation — no external dependency at all

This is the largest category by far: loan amortization, SIP future value, compound interest, net worth, break-even analysis, BMI, body fat (US Navy method), ideal body weight (Hamwi/Devine/Robinson/Miller formulas), one-rep max (Epley/Brzycki/Lombardi), training heart-rate zones (Karvonen method with Tanaka's maximum-heart-rate formula), timezone conversion (the IANA time-zone data bundled with Node.js, via Intl.DateTimeFormat), Haversine great-circle distance, CO2 emissions, and tyre size decoding. These endpoints have no upstream dependency and essentially no staleness risk. Each one is a named formula with no external state, so the same input always produces the same output. Timezone conversion is the one caveat. IANA rules change whenever a government changes its clocks, and those changes only reach us when the Node.js runtime's bundled time-zone data is updated.
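Because these endpoints are plain functions, a few of the named formulas are easy to sketch directly. The function names below are hypothetical, and this is not the codebase's implementation. The formulas are the standard ones: the amortizing-loan payment, Haversine on a spherical Earth with a 6371 km mean radius, Epley, and Karvonen with Tanaka's HRmax = 208 - 0.7 × age. The time-zone helper relies only on the runtime's built-in `Intl.DateTimeFormat`.

```typescript
// Amortizing-loan payment: M = P * r * (1 + r)^n / ((1 + r)^n - 1), where r is the monthly rate.
function monthlyPayment(principal: number, annualRatePct: number, months: number): number {
  const r = annualRatePct / 100 / 12;
  if (r === 0) return principal / months; // zero-interest loan: equal instalments
  const growth = Math.pow(1 + r, months);
  return (principal * r * growth) / (growth - 1);
}

// Haversine great-circle distance in km on a sphere of radius 6371 km.
function haversineKm(lat1: number, lon1: number, lat2: number, lon2: number): number {
  const toRad = (deg: number) => (deg * Math.PI) / 180;
  const dLat = toRad(lat2 - lat1);
  const dLon = toRad(lon2 - lon1);
  const a =
    Math.sin(dLat / 2) ** 2 +
    Math.cos(toRad(lat1)) * Math.cos(toRad(lat2)) * Math.sin(dLon / 2) ** 2;
  // Clamp guards against a slightly > 1 from floating-point error near antipodes.
  return 2 * 6371 * Math.asin(Math.min(1, Math.sqrt(a)));
}

// Epley one-rep-max estimate: 1RM = weight * (1 + reps / 30).
function epleyOneRepMax(weight: number, reps: number): number {
  return weight * (1 + reps / 30);
}

// Karvonen target heart rate, with Tanaka's estimate of maximum heart rate.
function karvonenTargetHr(age: number, restingHr: number, intensity: number): number {
  const maxHr = 208 - 0.7 * age; // Tanaka
  return restingHr + intensity * (maxHr - restingHr); // resting + intensity * heart-rate reserve
}

// Wall-clock "HH:MM" for an instant in an IANA zone, using the runtime's bundled tz data.
// en-GB is used because it formats hours on a 24-hour clock.
function wallClock(instant: Date, timeZone: string): string {
  const parts = new Intl.DateTimeFormat("en-GB", {
    timeZone,
    hour: "2-digit",
    minute: "2-digit",
  }).formatToParts(instant);
  const get = (type: string) => parts.find((p) => p.type === type)?.value ?? "";
  return `${get("hour")}:${get("minute")}`;
}
```

For example, `monthlyPayment(100000, 12, 12)` comes out at about 8884.88 per month. `wallClock(new Date(Date.UTC(2024, 0, 15, 12, 0)), "Asia/Kolkata")` gives `"17:30"`.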

2. Free, no-key proxies — live pass-through to public APIs

Weather comes from Open-Meteo. Forex comes from Frankfurter (which mirrors ECB reference rates). Reverse geocoding comes from OSM Nominatim. Stock and crypto quotes come from Yahoo Finance's unofficial endpoint. None of these require an API key on our end, which matters for a platform where developers pay us, not a third party — every paid upstream dependency is a margin risk and a reliability risk we'd be passing on to customers without their knowledge.
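In practice, a live pass-through mostly amounts to two steps: build the upstream URL, then refuse to forward any response whose shape has changed. The sketch below assumes the public endpoints and query parameters as the providers document them: Open-Meteo (`/v1/forecast`), Frankfurter (`/latest` with `from`/`to`), and Nominatim (`/reverse` with `format`, `lat`, `lon`). Hosts and parameters can change, so check each provider's current docs. Yahoo Finance is left out because its endpoint is unofficial. `upstreamUrl` and `parseFxRates` are hypothetical helpers.

```typescript
// Base URLs for three of the no-key upstreams named above (assumed from public docs).
const UPSTREAMS = {
  weather: "https://api.open-meteo.com/v1/forecast",
  forex: "https://api.frankfurter.app/latest",
  reverseGeocode: "https://nominatim.openstreetmap.org/reverse",
} as const;

// Builds an upstream URL, appending query parameters in insertion order.
function upstreamUrl(
  name: keyof typeof UPSTREAMS,
  params: Record<string, string | number>,
): string {
  const url = new URL(UPSTREAMS[name]);
  for (const [key, value] of Object.entries(params)) {
    url.searchParams.set(key, String(value));
  }
  return url.toString();
}

interface FxRates {
  base: string;
  date: string;
  rates: Record<string, number>;
}

// Shape guard: throws instead of passing a changed upstream response through to customers.
function parseFxRates(body: unknown): FxRates {
  const b = body as Partial<FxRates> | null;
  if (
    !b ||
    typeof b.base !== "string" ||
    typeof b.date !== "string" ||
    typeof b.rates !== "object" ||
    b.rates === null
  ) {
    throw new Error("Frankfurter response shape changed");
  }
  for (const [code, rate] of Object.entries(b.rates)) {
    if (typeof rate !== "number") throw new Error(`non-numeric rate for ${code}`);
  }
  return { base: b.base, date: b.date, rates: b.rates };
}
```

In a route handler, the URL would go to `fetch`, and the parsed body would be returned only if `parseFxRates` accepts it. Nominatim's usage policy also asks for an identifying User-Agent and at most one request per second. A pass-through has to respect those limits on its customers' behalf.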

3. Owned datasets — ingested once, served from our own database

Three namespaces ingest real, licensed datasets directly into Supabase rather than proxying live:

Cricket match data from Cricsheet (cricsheet.org), ODC-BY 1.0 licensed — ingested into our own cricket_matches and cricket_innings tables.
University data from Hipo's university-domains-list (MIT licensed) — ingested into our own universities table.
Food and nutrition data ingested into owned food_items and food_nutrients tables.
100 originally authored recipes, written specifically for APlicious, so unlike scraped recipe content they carry no third-party licensing risk at all.
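As a sketch of what ingestion involves, the function below rolls one Cricsheet match's ball-by-ball innings up into per-innings rows. It assumes Cricsheet's JSON layout: `innings[].team` and `overs[].deliveries[]`, where each delivery has `runs.total` plus optional `extras` and `wickets`. Verify this against Cricsheet's format documentation. The `InningsRow` columns and `toInningsRows` are hypothetical and do not describe the real `cricket_innings` schema.

```typescript
// Assumed subset of Cricsheet's JSON delivery and innings shapes.
interface CricsheetDelivery {
  runs: { batter: number; extras: number; total: number };
  extras?: { wides?: number; noballs?: number; byes?: number; legbyes?: number; penalty?: number };
  wickets?: unknown[];
}

interface CricsheetInnings {
  team: string;
  overs?: { over: number; deliveries: CricsheetDelivery[] }[];
}

// Hypothetical row shape for an owned cricket_innings table.
interface InningsRow {
  match_id: string;
  innings_number: number;
  team: string;
  runs: number;
  wickets: number;
  legal_balls: number;
}

// Aggregates each innings' deliveries into one row.
function toInningsRows(matchId: string, innings: CricsheetInnings[]): InningsRow[] {
  return innings.map((inn, i) => {
    let runs = 0;
    let wickets = 0;
    let legalBalls = 0;
    for (const over of inn.overs ?? []) {
      for (const d of over.deliveries) {
        runs += d.runs.total;
        wickets += d.wickets?.length ?? 0;
        // Wides and no-balls are re-bowled, so they do not count toward the over.
        if (!d.extras?.wides && !d.extras?.noballs) legalBalls += 1;
      }
    }
    return {
      match_id: matchId,
      innings_number: i + 1,
      team: inn.team,
      runs,
      wickets,
      legal_balls: legalBalls,
    };
  });
}
```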

Owned data means no upstream rate limit, no upstream outage risk, and — for the recipes specifically — no licensing exposure, at the cost of having to maintain and periodically refresh the dataset ourselves.
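The refresh side can be sketched for the universities table. This sketch assumes the entry shape of Hipo's university-domains-list JSON (`name`, `country`, `alpha_two_code`, `domains`, `web_pages`). It also assumes a hypothetical `UniversityRow` keyed on country code plus lower-cased name. The real table's key and columns may differ, and `normalizeUniversities` and `planRefresh` are illustrative names.

```typescript
// Assumed subset of an entry in Hipo's university-domains-list JSON.
interface HipoUniversity {
  name: string;
  country: string;
  alpha_two_code: string;
  domains: string[];
  web_pages: string[];
}

// Hypothetical columns for an owned universities table.
interface UniversityRow {
  key: string;
  name: string;
  country_code: string;
  domains: string[];
  web_pages: string[];
}

// Normalizes upstream entries and keeps the first occurrence of each country+name key.
function normalizeUniversities(entries: HipoUniversity[]): Map<string, UniversityRow> {
  const rows = new Map<string, UniversityRow>();
  for (const e of entries) {
    const name = e.name.trim();
    const countryCode = e.alpha_two_code.toUpperCase();
    const key = `${countryCode}:${name.toLowerCase()}`;
    if (rows.has(key)) continue; // duplicate upstream entry
    rows.set(key, {
      key,
      name,
      country_code: countryCode,
      domains: e.domains.map((d) => d.toLowerCase()),
      web_pages: e.web_pages,
    });
  }
  return rows;
}

// Refresh plan: rows to upsert, plus stored keys that no longer appear upstream.
function planRefresh(
  incoming: Map<string, UniversityRow>,
  storedKeys: Set<string>,
): { upsert: UniversityRow[]; remove: string[] } {
  const upsert = Array.from(incoming.values());
  const remove = Array.from(storedKeys).filter((k) => !incoming.has(k));
  return { upsert, remove };
}
```

The `upsert` list could be written in batches with supabase-js, for example `supabase.from("universities").upsert(rows, { onConflict: "key" })`. What `remove` should mean is a policy choice. Deleting those rows keeps the table faithful to upstream, while flagging them as stale keeps records from vanishing for API consumers between refreshes.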

Why this matters more than it sounds like it should

Right now, 45 distinct data sources are documented this way across the codebase. Without the discipline of recording provenance at the route level, a platform this size can easily drift into a state where nobody knows which endpoints depend on an external service that could disappear, rate-limit us, or change its response shape without notice. The comment costs three lines. The alternative, finding out that an endpoint silently broke because an undocumented upstream dependency went away, costs a lot more.
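Once every route carries the header, finding out which endpoints are load-bearing on an external service becomes a group-by. The minimal sketch below assumes the headers have already been parsed into hypothetical `RouteProvenance` records, with a `category` field that mirrors the three categories above. `summarizeSources` is an illustrative name.

```typescript
type Category = "computed" | "proxy" | "owned";

// Hypothetical parsed form of one route's provenance header.
interface RouteProvenance {
  route: string;
  dataSource: string;
  category: Category;
}

// Groups routes by data source and lists live upstreams, most depended-on first.
function summarizeSources(routes: RouteProvenance[]): {
  distinctSources: number;
  upstreamDependents: [string, string[]][];
} {
  const bySource = new Map<string, string[]>();
  const upstream = new Set<string>();
  for (const r of routes) {
    const list = bySource.get(r.dataSource) ?? [];
    list.push(r.route);
    bySource.set(r.dataSource, list);
    if (r.category === "proxy") upstream.add(r.dataSource);
  }
  const upstreamDependents = Array.from(bySource.entries())
    .filter(([source]) => upstream.has(source))
    .sort((a, b) => b[1].length - a[1].length);
  return { distinctSources: bySource.size, upstreamDependents };
}
```

The top of `upstreamDependents` is the list to check first when a provider announces a deprecation or starts rate-limiting.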