Labor and human capital

Hercules

A reproducible WDI warehouse for labor markets, schooling, and human development.

Summary

Hercules is the labor and human capital observatory companion to Sisyphus. It is built on the same FastAPI plus SQLite pattern but uses a different curated indicator pack. The warehouse holds 19 World Bank WDI indicators across roughly 195 economies from 1960 to 2025, spanning unemployment, labor force participation, school enrollment, literacy, life expectancy, mortality, poverty, and GDP per capita PPP. Every series is collected from the public v2 API at a paced rate, written to a single SQLite warehouse, and exposed through a read-only HTTP contract for raw slices, persistence diagnostics, shock features, and regression-ready panels.

What it is

Hercules is a small, opinionated slice of World Bank WDI focused on employability, schooling, longevity, and the basic welfare context that frames almost every cross-country labor question. The curated pack is named wb_wdi_human_capital and groups 19 indicators into six topics: labor (unemployment by sex and education, labor force participation, the employment-to-population ratio), education (primary, secondary, and tertiary gross enrollment, adult literacy), health (life expectancy by sex, under-five mortality), welfare (poverty headcount at the 2017 PPP $2.15 line), demographic (population), and levels (GDP per capita PPP). Internal series codes use the hc. prefix so cross-observatory tooling can distinguish the human capital pack from the macro pack served by Sisyphus.
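Here is a minimal sketch of how the curated pack could be declared as a topic registry. The pack name, the six topic names, and the WDI ids come from this page. The listing is only an illustrative subset of the 19 series: it includes just the ids named explicitly in the Data sources list. The `internal_code` helper, which builds `hc.` codes from lowercased WDI ids, is a hypothetical naming scheme and not the warehouse's actual one.

```python
# Hypothetical registry: topic -> WDI indicator ids (illustrative subset, not all 19).
PACK_NAME = "wb_wdi_human_capital"

TOPICS: dict[str, list[str]] = {
    "labor": ["SL.EMP.TOTL.SP.ZS"],
    "education": ["SE.PRM.ENRR", "SE.SEC.ENRR", "SE.TER.ENRR", "SE.ADT.LITR.ZS"],
    "health": ["SH.DYN.MORT"],
    "welfare": ["SI.POV.DDAY"],
    "demographic": ["SP.POP.TOTL"],
    "levels": ["NY.GDP.PCAP.PP.KD"],
}


def internal_code(wdi_id: str) -> str:
    """Assumed scheme: the 'hc.' prefix followed by the lowercased WDI id."""
    return "hc." + wdi_id.lower()


def series_catalog() -> list[dict[str, str]]:
    """Flatten the registry into rows shaped like a `series` table."""
    return [
        {"code": internal_code(wdi), "wdi_id": wdi, "topic": topic, "source": PACK_NAME}
        for topic, ids in TOPICS.items()
        for wdi in ids
    ]
```

Keeping the pack as data rather than code makes it straightforward to seed a series table from it. It also lets a startup check confirm that every topic is populated.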

For client engagements, Hercules functions as private research infrastructure rather than a dashboard product. Teams use it to build comparable labor and schooling panels without the ad hoc download cycle, to test persistence and shock structure in unemployment or enrollment series, and to assemble regression-ready extracts for cross-country work on employment, gender gaps, and human development. The collector paces and retries its calls to the public v2 API, the warehouse records gaps as missing rather than imputing them away, and every observation traces back to its World Bank indicator id and a row in collection_log. Engagements typically combine a custom data extract, a written brief, and a working session with the warehouse so the client team can extend the analysis and rerun the collector on its own cadence.
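Here is a minimal sketch of the gap-preserving load path. It assumes Python's standard `sqlite3` module and the JSON shape of a v2 API page, `[metadata, records]`, where each record carries `indicator.id`, `countryiso3code`, `date`, and `value`. Only two of the five warehouse tables are shown. All column names, the `month = 0` convention for annual rows, and the helpers `connect` and `load_page` are hypothetical. The sketch skips null values rather than imputing them, stores the World Bank indicator id with each row, and writes one collection_log row per page.

```python
import sqlite3
from datetime import datetime, timezone

# Hypothetical two-table subset of the five-table warehouse.
SCHEMA = """
CREATE TABLE IF NOT EXISTS collection_log (
    id        INTEGER PRIMARY KEY AUTOINCREMENT,
    collector TEXT NOT NULL,
    started   TEXT NOT NULL,
    status    TEXT NOT NULL,
    row_count INTEGER NOT NULL DEFAULT 0,
    error     TEXT
);
CREATE TABLE IF NOT EXISTS observations (
    series_code TEXT NOT NULL,
    iso3        TEXT NOT NULL,
    year        INTEGER NOT NULL,
    month       INTEGER NOT NULL DEFAULT 0,  -- assumed: 0 marks an annual value
    value       REAL NOT NULL,
    wdi_id      TEXT NOT NULL,               -- World Bank indicator id, kept for lineage
    log_id      INTEGER NOT NULL REFERENCES collection_log(id),
    UNIQUE (series_code, iso3, year, month)
);
"""


def connect(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA journal_mode=WAL")  # in-memory databases stay in 'memory' mode
    conn.executescript(SCHEMA)
    return conn


def load_page(conn: sqlite3.Connection, payload: list, series_code: str,
              collector: str = "wb_wdi_human_capital") -> int:
    """Write one [metadata, records] page; null values are skipped, never imputed."""
    cur = conn.execute(
        "INSERT INTO collection_log (collector, started, status) VALUES (?, ?, 'running')",
        (collector, datetime.now(timezone.utc).isoformat()),
    )
    log_id = cur.lastrowid
    conn.commit()  # persist the 'running' log row before touching observations
    try:
        records = payload[1] if len(payload) > 1 and payload[1] else []
        rows = [
            (series_code, r["countryiso3code"], int(r["date"]), 0,
             float(r["value"]), r["indicator"]["id"], log_id)
            for r in records
            if r.get("value") is not None and r.get("countryiso3code")
        ]
        conn.executemany(
            "INSERT INTO observations "
            "(series_code, iso3, year, month, value, wdi_id, log_id) "
            "VALUES (?, ?, ?, ?, ?, ?, ?) "
            "ON CONFLICT (series_code, iso3, year, month) "
            "DO UPDATE SET value = excluded.value, log_id = excluded.log_id",
            rows,
        )
        conn.execute(
            "UPDATE collection_log SET status = 'ok', row_count = ? WHERE id = ?",
            (len(rows), log_id),
        )
        conn.commit()
    except Exception as exc:
        conn.rollback()  # drop any partial writes from this page
        conn.execute(
            "UPDATE collection_log SET status = 'error', error = ? WHERE id = ?",
            (str(exc), log_id),
        )
        conn.commit()
        raise
    return len(rows)
```

Declaring `month` as `NOT NULL DEFAULT 0` matters here. SQLite treats NULLs as distinct in a UNIQUE constraint, so a nullable month column would let duplicate annual rows through the uniqueness key.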

Methodology

  • Curated indicator pack of 19 World Bank WDI series grouped into six topics: labor, education, health, welfare, demographic, and levels.
  • Single collector (wb_wdi_human_capital) pulling annual observations for roughly 195 economies across the 1960 to 2025 window from the v2 API.
  • Paced ingestion at 0.45 seconds between indicator calls, with up to five retries and exponential backoff on transient HTTP statuses (400, 429, 5xx).
  • SQLite warehouse in WAL mode with five tables (sources, series, observations, countries, collection_log) and uniqueness keyed on series, country ISO3, year, and month.
  • Series-level analytics: AR(1) persistence and half-life, rolling volatility, year-over-year and log-difference growth, shock flags at z-score thresholds, and CAGR over endpoint windows.
  • Cross-country comparators ranked by AR(1) persistence and volatility, with a minimum observation gate to avoid noisy short panels.
  • Long-format panel endpoint that accepts up to 30 series codes per request and reports panel quality counts (cells observed, cells meeting the minimum-series threshold, balanced ratio).
  • Weekly refresh every Sunday at 06:00 UTC through APScheduler, with collection_log capturing status, row counts, and error text for every run.
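The series-level and comparator analytics can be sketched with the standard library alone. This is an assumed implementation, not the warehouse's own code:

- `ar1` estimates the AR(1) coefficient rho by OLS of y_t on y_{t-1} with an intercept.
- `half_life` uses ln(0.5)/ln(rho).
- Shock flags are z-scores of first differences, with a hypothetical default threshold of 2.
- `min_obs=20` is a placeholder for the minimum observation gate.

Each input list is assumed to be a gap-free run of consecutive years. Rolling volatility and the growth measures are omitted.

```python
import math
import statistics


def ar1(values: list[float]) -> float:
    """AR(1) coefficient rho: OLS slope of y_t on y_{t-1}, with an intercept."""
    x, y = values[:-1], values[1:]
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    if sxx == 0:
        raise ValueError("lagged series is constant; AR(1) slope is undefined")
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx


def half_life(rho: float) -> float:
    """Periods for a shock to halve: ln(0.5) / ln(rho), defined for 0 < rho < 1."""
    if rho >= 1:
        return math.inf  # unit root or explosive: shocks never decay
    if rho <= 0:
        return math.nan  # oscillating or no persistence: half-life not meaningful
    return math.log(0.5) / math.log(rho)


def shock_flags(values: list[float], threshold: float = 2.0) -> list[bool]:
    """Flag changes with |z| >= threshold; flag i covers values[i] -> values[i + 1]."""
    diffs = [b - a for a, b in zip(values, values[1:])]
    mu, sd = statistics.fmean(diffs), statistics.stdev(diffs)
    return [sd > 0 and abs(d - mu) / sd >= threshold for d in diffs]


def cagr(first: float, last: float, years: int) -> float:
    """Compound annual growth rate between two endpoint values."""
    return (last / first) ** (1 / years) - 1


def rank_by_persistence(panel: dict[str, list[float]],
                        min_obs: int = 20) -> list[tuple[str, float]]:
    """Rank countries by AR(1) rho, most persistent first, gating out short series."""
    scored = []
    for iso3, values in panel.items():
        if len(values) < min_obs:
            continue  # too few observations for a stable estimate
        try:
            scored.append((iso3, ar1(values)))
        except ValueError:
            continue  # constant series carry no persistence information
    return sorted(scored, key=lambda item: item[1], reverse=True)
```

For example, a series that halves every year has rho = 0.5 and therefore a half-life of exactly one year.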

Data sources

  • World Bank WDI labor series (SL.UEM.* unemployment, SL.TLF.CACT.* labor force participation, SL.EMP.TOTL.SP.ZS employment-to-population ratio)
  • World Bank WDI education series (SE.PRM.ENRR, SE.SEC.ENRR, SE.TER.ENRR gross enrollment, SE.ADT.LITR.ZS adult literacy)
  • World Bank WDI health series (SP.DYN.LE00.* life expectancy by sex, SH.DYN.MORT under-five mortality)
  • World Bank WDI welfare series (SI.POV.DDAY poverty headcount at 2017 PPP $2.15 per day)
  • World Bank WDI demographic and levels series (SP.POP.TOTL population, NY.GDP.PCAP.PP.KD GDP per capita PPP, constant 2021 international dollars)
  • World Bank country and region reference (ISO3 codes, region, income group) seeded from the shared registry
  • World Bank Health, Nutrition and Population (HNP) variants exposed through WDI for mortality and life expectancy
  • World Bank v2 public API (GET /v2/country/all/indicator/{id}) under CC-BY 4.0 dataset terms
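Here is a sketch of the paced, retried collector call against the v2 endpoint, using only `urllib` from the standard library. The 0.45-second pace, the five retries, and the retried statuses follow the Methodology list. The query parameters (`format`, `per_page`, `date`, `page`) are public v2 API parameters. The 1-second base delay, the function names, and the injectable `fetch` and `sleep` arguments are assumptions; the two arguments exist only so the logic can be tested offline.

```python
import json
import time
import urllib.error
import urllib.request
from typing import Callable

BASE_URL = "https://api.worldbank.org/v2/country/all/indicator/{id}"
PACE_SECONDS = 0.45                                  # pause after each successful call
MAX_RETRIES = 5                                      # retries after the first attempt
RETRY_STATUSES = {400, 429} | set(range(500, 600))   # statuses retried per Methodology

Fetcher = Callable[[str], tuple[int, object]]


def http_get_json(url: str) -> tuple[int, object]:
    """Default fetcher: return (HTTP status, parsed JSON body or None)."""
    try:
        with urllib.request.urlopen(url, timeout=60) as resp:
            return resp.status, json.loads(resp.read())
    except urllib.error.HTTPError as err:
        return err.code, None


def fetch_page(indicator_id: str, page: int = 1, fetch: Fetcher = http_get_json,
               sleep: Callable[[float], None] = time.sleep,
               base_delay: float = 1.0) -> list:
    """Fetch one page as [metadata, records], retrying with exponential backoff."""
    url = (BASE_URL.format(id=indicator_id)
           + f"?format=json&per_page=1000&date=1960:2025&page={page}")
    for attempt in range(MAX_RETRIES + 1):
        status, body = fetch(url)
        if status == 200:
            sleep(PACE_SECONDS)
            return body
        if status not in RETRY_STATUSES or attempt == MAX_RETRIES:
            raise RuntimeError(
                f"{indicator_id} page {page}: HTTP {status} after {attempt + 1} attempt(s)")
        sleep(base_delay * 2 ** attempt)  # 1, 2, 4, 8, 16 s with the default base
    raise AssertionError("unreachable")


def fetch_all(indicator_id: str, **kwargs) -> list:
    """Walk every page reported in the metadata and return the combined records."""
    first = fetch_page(indicator_id, page=1, **kwargs)
    if len(first) < 2:
        raise RuntimeError(f"{indicator_id}: payload has no records element: {first[0]}")
    meta, records = first[0], list(first[1] or [])
    for page in range(2, int(meta.get("pages", 1)) + 1):
        records.extend(fetch_page(indicator_id, page=page, **kwargs)[1] or [])
    return records
```

A full run would call `fetch_all` once per indicator in the pack and hand each record list to the warehouse write step.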

Deliverables when used in engagements

  • Custom labor, schooling, and human development extracts as long-format panels across countries and years.
  • Persistence and shock diagnostics (AR(1), half-life, volatility, z-score flags) for any country and series.
  • Cross-country comparator rankings for unemployment, participation, enrollment, and life expectancy series.
  • Written briefs with every figure and claim cited to a row in the observations or collection_log tables.
  • Working session and warehouse handoff so the client team can rerun the collector and extend the analysis.
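The panel quality counts listed under Methodology could be computed from a long-format extract roughly as follows. The exact definitions are assumptions:

- A cell is one country-year pair.
- A cell counts as observed when any requested series has a non-null value.
- The balanced ratio is the share of all country-year cells that carry every requested series.

`panel_quality` and its `min_series=2` default are hypothetical.

```python
from collections import defaultdict

MAX_SERIES_PER_REQUEST = 30  # request cap stated for the panel endpoint


def panel_quality(rows: list[tuple], series_codes: list[str], min_series: int = 2) -> dict:
    """Quality counts for long-format (iso3, year, series_code, value) rows."""
    wanted = set(series_codes)
    if len(wanted) > MAX_SERIES_PER_REQUEST:
        raise ValueError(f"at most {MAX_SERIES_PER_REQUEST} series codes per request")
    relevant = [r for r in rows if r[2] in wanted]
    present = defaultdict(set)  # (iso3, year) -> series with a non-null value
    for iso3, year, code, value in relevant:
        if value is not None:
            present[(iso3, year)].add(code)
    n_cells = len({r[0] for r in relevant}) * len({r[1] for r in relevant})
    n_full = sum(1 for codes in present.values() if len(codes) == len(wanted))
    return {
        "cells_observed": len(present),
        "cells_min_series": sum(1 for codes in present.values() if len(codes) >= min_series),
        "balanced_ratio": n_full / n_cells if n_cells else 0.0,
    }
```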