How engagements actually run
Every engagement runs on the same six-phase research workflow we use for academic papers. The discipline travels intact from journal submissions to client memos.
The phase-based workflow
Every engagement, whether a two-week diagnostic or a six-month evaluation, runs through the same six phases: pre-analysis plan, data quality audit, estimation and robustness, writing, internal audit, and handoff. The structure comes from our open-source research toolkit, Delphi, where it governs how academic papers move from question to submission. We did not invent a separate consulting workflow. The same checks that a hostile referee would apply at the American Economic Review apply to a memo we hand a chief risk officer.
Phases are gated. We do not begin estimation until the pre-analysis plan is signed off and the data quality audit has produced a clean report. We do not write until estimation has cleared a robustness battery. We do not hand off until an internal audit, conducted by a colleague who did not touch the analysis, has walked the replication package end to end. Clients see the gate documents at every phase, so there are no surprises in the final readout. If a phase fails its check, we stop and re-scope rather than paper over the gap.
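The gating rule can be sketched in a few lines. This is a minimal illustration in Python, not code from Delphi: the `Engagement` class, the `GateError` exception, and the idea of tracking sign-offs as an ordered list are all hypothetical. The only thing taken from the text is the rule itself: a phase cannot start until every earlier phase has passed its gate.

```python
from dataclasses import dataclass, field

# The six phases, in the order the gates enforce them.
PHASES = (
    "pre-analysis plan",
    "data quality audit",
    "estimation and robustness",
    "writing",
    "internal audit",
    "handoff",
)


class GateError(RuntimeError):
    """Raised when a phase is started before every earlier gate has passed."""


@dataclass
class Engagement:
    # Phases whose gate documents have been signed off, in order.
    passed: list = field(default_factory=list)

    def start(self, phase: str) -> None:
        idx = PHASES.index(phase)
        missing = [p for p in PHASES[:idx] if p not in self.passed]
        if missing:
            raise GateError(f"cannot start {phase!r}; open gates: {missing}")

    def sign_off(self, phase: str) -> None:
        # A phase can only pass its gate if it was allowed to start.
        self.start(phase)
        if phase not in self.passed:
            self.passed.append(phase)
```

For example, after `sign_off("pre-analysis plan")`, calling `start("estimation and robustness")` raises `GateError` because the data quality audit is still open.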
Phase 1: pre-analysis plan
Before any data is touched, we produce a written pre-analysis plan. It states the contribution in one paragraph (what the question is, what is new, and who has already answered something close), specifies the identification strategy in formal notation, and pre-commits to the primary specification, the sample, and the inference rule. That contribution paragraph doubles as an audit: it forces us to defend why the engagement is worth running before we spend a client dollar on estimation.
The plan justifies the identification strategy on its own terms. For difference-in-differences, we document the parallel trends assumption with pre-period plots and event-study coefficients, and we name the credible threats. For regression discontinuity, we justify the bandwidth choice (Imbens-Kalyanaraman, Calonico-Cattaneo-Titiunik) and pre-register the polynomial order. For instrumental variables, we state the exclusion restriction in plain language and commit to first-stage diagnostics. Balance tests across treatment and control groups are reported before any outcome regression. The plan is timestamped and shared with the client so the analysis cannot drift toward the result anyone hoped to find.
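Below is a minimal Python sketch of one common balance diagnostic, the standardized mean difference (the gap in covariate means divided by the pooled standard deviation). The function names, the dict-per-row input format, and the 0.1 flag threshold are illustrative assumptions. The 0.1 cutoff is a widely used rule of thumb, not a threshold stated in this workflow.

```python
import math
import statistics


def standardized_mean_difference(treated, control):
    """Difference in means scaled by the pooled standard deviation."""
    mt, mc = statistics.fmean(treated), statistics.fmean(control)
    vt, vc = statistics.variance(treated), statistics.variance(control)
    pooled_sd = math.sqrt((vt + vc) / 2)
    return (mt - mc) / pooled_sd


def balance_table(rows, covariates, treat_key="treated", threshold=0.1):
    """Return {covariate: (smd, flagged)} for each pre-treatment covariate.

    rows is a list of dicts; treat_key marks treatment status (assumed layout).
    """
    out = {}
    for cov in covariates:
        t = [r[cov] for r in rows if r[treat_key]]
        c = [r[cov] for r in rows if not r[treat_key]]
        smd = standardized_mean_difference(t, c)
        out[cov] = (smd, abs(smd) > threshold)
    return out
```

Run on pre-period covariates only, this produces the table that is reported before any outcome regression.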
Phase 2: data quality audit
No mock data, ever. Every variable in every regression traces back to a primary source: World Bank WDI, FRED, IMF IFS, BACI, Comtrade, FDIC call reports, FATF mutual evaluations, BBS publications, or client-supplied administrative records with documented provenance. We pin the vintage (for example, WDI 2024Q2), record the access date, and store the raw file untouched in a read-only directory. The same standards we use in our Argus risk observatory apply here: ISO3 codes in uppercase, units stated in metadata, and legitimate zero values in zero-centered indices never silently filtered out.
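A minimal Python sketch of the provenance step, assuming each download lands as a single file: it hashes the file, records source, vintage, and access date in a JSON sidecar, and strips write permission. The function names `register_raw_file` and `normalize_iso3` and the `.provenance.json` sidecar convention are hypothetical, not the Argus implementation.

```python
import hashlib
import json
import os
import stat
from datetime import date
from pathlib import Path


def register_raw_file(path, source, vintage, accessed=None):
    """Hash a downloaded file, make it read-only, and write a provenance sidecar."""
    path = Path(path)
    digest = hashlib.sha256(path.read_bytes()).hexdigest()
    record = {
        "file": path.name,
        "source": source,      # e.g. "World Bank WDI"
        "vintage": vintage,    # e.g. "2024Q2"
        "accessed": (accessed or date.today()).isoformat(),
        "sha256": digest,      # lets anyone verify the file was never edited
    }
    sidecar = path.with_name(path.name + ".provenance.json")
    sidecar.write_text(json.dumps(record, indent=2))
    # Remove write permission so the raw file cannot be modified in place.
    os.chmod(path, stat.S_IRUSR | stat.S_IRGRP | stat.S_IROTH)
    return record


def normalize_iso3(code):
    """Store ISO3 country codes in uppercase, e.g. 'bgd' -> 'BGD'."""
    code = code.strip().upper()
    if len(code) != 3 or not code.isalpha():
        raise ValueError(f"not an ISO3 code: {code!r}")
    return code
```

The stored hash is what makes "untouched" checkable later: recomputing it on the archived file should reproduce the recorded digest.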
The audit produces a written data quality report before any estimation runs. It documents schema validation (column types, expected ranges, primary key uniqueness), missingness patterns by variable and panel cell, outlier flags with decisions recorded, and reconciliation checks against any known totals. Collection logs capture every download, every transformation, and every dropped observation with a stated reason. If a series we needed is unavailable, we say so. We do not impute, extrapolate, or substitute. The report ships to the client and into the replication package as a permanent artifact.
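The schema, missingness, and primary-key checks can be sketched as one pass over the rows. This is a minimal standard-library Python illustration under assumed conventions: rows are dicts, the schema maps each column to a (type, min, max) triple with `None` meaning "no bound", and `audit_rows` is a hypothetical name. Missing values are counted, never filled.

```python
from collections import Counter


def audit_rows(rows, schema, key):
    """Return a data quality report for a list of dict rows.

    schema maps column -> (type, min, max); min or max may be None.
    key is the tuple of primary-key columns.
    """
    report = {"type_errors": [], "range_errors": [], "missing": Counter(),
              "duplicate_keys": []}
    for i, row in enumerate(rows):
        for col, (typ, lo, hi) in schema.items():
            value = row.get(col)
            if value is None:
                report["missing"][col] += 1  # record missingness, never impute
                continue
            if not isinstance(value, typ):
                report["type_errors"].append((i, col, value))
            elif (lo is not None and value < lo) or (hi is not None and value > hi):
                report["range_errors"].append((i, col, value))
    key_counts = Counter(tuple(r.get(k) for k in key) for r in rows)
    report["duplicate_keys"] = [k for k, n in key_counts.items() if n > 1]
    return report
```

Each flagged row then needs a recorded decision in the collection log; the sketch only surfaces the problems.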
Phase 3: estimation and robustness
We always start with a baseline ordinary least squares (OLS) specification, even when a more sophisticated estimator is the eventual answer. Heteroskedasticity-robust HC1 standard errors are the default. For panel data, we cluster at the unit level unless theory or the sampling design says otherwise, and we always state the clustering level in the table notes. For instrumental variables, we report the first-stage F statistic against the Stock-Yogo and Olea-Pflueger weak-identification thresholds, not just the rule-of-thumb cutoff of 10. For fixed-effects models, we report the within R-squared. For cross-sectional regressions, we check variance inflation factors before claiming a coefficient is precisely estimated.
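For reference, HC1 is the HC0 sandwich estimator scaled by n / (n − k). The NumPy sketch below shows that arithmetic directly; it assumes the design matrix already contains a constant column, and `ols_hc1` is an illustrative name. In practice a library such as statsmodels exposes the same estimator through `OLS(...).fit(cov_type="HC1")`.

```python
import numpy as np


def ols_hc1(X, y):
    """OLS coefficients with HC1 heteroskedasticity-robust standard errors.

    X is an n-by-k design matrix that already includes a constant column.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, k = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y
    resid = y - X @ beta
    # Sandwich "meat": sum over observations of e_i^2 * x_i x_i'.
    meat = (X * resid[:, None] ** 2).T @ X
    # HC1 rescales the HC0 sandwich by the degrees-of-freedom factor n / (n - k).
    cov = n / (n - k) * XtX_inv @ meat @ XtX_inv
    return beta, np.sqrt(np.diag(cov))
```

A quick sanity check: regressing on a constant alone returns the sample mean, and its HC1 standard error equals the textbook s / sqrt(n).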
From that baseline, we add the identification strategy the question demands: difference-in-differences with Callaway-Sant'Anna or de Chaisemartin-D'Haultfoeuille corrections when treatment timing is staggered, regression discontinuity with robust bias-corrected inference, synthetic control with placebo permutation tests, double machine learning when the nuisance functions are high-dimensional, and causal forests for heterogeneous treatment effects. Every causal claim ships with a bounds analysis: Manski worst-case bounds where assumptions are minimal, Rambachan-Roth sensitivity analysis for parallel-trends violations in difference-in-differences designs, and Oster bounds for selection on unobservables. Placebo tests and randomization inference are run as a battery, not a single check. If a result does not survive, we say so in the memo.
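Randomization inference is simple to illustrate. The sketch below, in standard-library Python, assumes one outcome per unit (for instance each unit's post-minus-pre change, so the difference in means is a two-period difference-in-differences estimate) and re-draws placebo assignments that keep the number of treated units fixed. The function name, default draw count, and seed are illustrative choices; a fixed seed keeps the p-value reproducible.

```python
import random
import statistics


def randomization_p_value(outcomes, treated, n_draws=5000, seed=20240701):
    """Two-sided randomization-inference p-value for a difference in means.

    outcomes: one value per unit (e.g. each unit's post-minus-pre change).
    treated:  matching list of booleans giving the actual assignment.
    """
    def diff(assign):
        t = [y for y, d in zip(outcomes, assign) if d]
        c = [y for y, d in zip(outcomes, assign) if not d]
        return statistics.fmean(t) - statistics.fmean(c)

    observed = diff(treated)
    rng = random.Random(seed)  # fixed seed so the p-value is reproducible
    placebo = list(treated)
    extreme = 0
    for _ in range(n_draws):
        rng.shuffle(placebo)  # placebo assignment with the same number treated
        if abs(diff(placebo)) >= abs(observed):
            extreme += 1
    # Count the observed assignment itself so the p-value is never exactly zero.
    return observed, (extreme + 1) / (n_draws + 1)
```

With five treated and five control units there are only 252 possible assignments, so exact enumeration would also work; random draws scale to larger samples.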
Phase 4: writing
Writing follows a tested formula. The introduction delivers the question, the contribution, the data, the method, the result, and the policy implication in five paragraphs, each with at least one number. Active voice throughout. Acronyms are defined at first use in every document, with no exceptions for terms the writer considers obvious. In prose, numbers below ten are spelled out and numbers ten and above use digits; inside tables and equations, every number uses digits.
We observe an em-dash discipline. No em-dashes appear anywhere in our writing: not in memos, not in slide decks, not in client emails. Commas, colons, semicolons, parentheses, and short separate sentences carry the same load with no loss of clarity. Tables and figures are not embedded inline. Body sections are prose only. All exhibits are collected after the references on dedicated pages, each captioned, each readable on its own. This is not a stylistic preference. It is the convention our reports follow because it is the convention serious empirical work follows, and it forces the prose to stand up without leaning on a chart.
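Both mechanical rules, no em-dashes and no lone single-digit numerals in prose, can be checked automatically before a human edit. Below is a minimal Python sketch of such a linter; `lint_prose` is a hypothetical name, and the regular expression is an assumption that skips decimals, multi-digit numbers, and exhibit labels such as "Table 2". It is meant to run on body prose only, since tables and equations use digits throughout.

```python
import re

EM_DASH = "\u2014"
# A lone digit 0-9 in running prose: not part of a larger number or a decimal,
# and not an exhibit label such as "Table 3" or "Figure 2".
LONE_DIGIT = re.compile(r"(?<!Table )(?<!Figure )(?<![\d.])\b\d\b(?!\.\d)")


def lint_prose(text):
    """Return (line_number, message) pairs for house-style violations."""
    problems = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        if EM_DASH in line:
            problems.append((lineno, "em-dash"))
        for m in LONE_DIGIT.finditer(line):
            problems.append((lineno, f"spell out {m.group()!r}"))
    return problems
```

A linter like this catches only the mechanical slips; whether a comma, colon, or new sentence best replaces a flagged dash remains an editorial call.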
Phase 5: audit
Before anything reaches the client, a colleague who did not touch the analysis runs an internal audit. Coverage check: every table and every figure must be discussed in the text with a specific number interpreted, not just referenced in passing. Necessity check: every table and every figure must earn its place, with redundant exhibits cut and exhibits that could be a single sentence demoted to a sentence.
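The coverage check lends itself to partial automation. The Python sketch below flags exhibits that the body text never cites, and exhibits cited only in sentences that contain no number once the label itself is removed. The labelling scheme ("Table 1", "Figure 2"), the naive sentence splitter, and the function name are assumptions; the necessity check still needs human judgment.

```python
import re


def coverage_check(body_text, exhibit_labels):
    """Flag exhibits that are never cited, or cited only without a number.

    exhibit_labels: e.g. ["Table 1", "Figure 2"] (assumed labelling scheme).
    """
    sentences = re.split(r"(?<=[.!?])\s+", body_text)
    findings = {}
    for label in exhibit_labels:
        pattern = re.compile(rf"\b{re.escape(label)}\b")
        cites = [s for s in sentences if pattern.search(s)]
        if not cites:
            findings[label] = "never discussed"
            continue
        # Drop the label itself so "Table 1" does not count as an interpreted number.
        if not any(re.search(r"\d", pattern.sub("", s)) for s in cites):
            findings[label] = "cited without a specific number"
    return findings
```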
The hostile-referee pass simulates the worst-case reviewer across five dimensions: identification (is the strategy actually credible), data (could the result be a measurement artifact), econometrics (are the standard errors right, is the clustering right, is the bandwidth right), results (are the magnitudes plausible against external benchmarks), contribution (is this differentiated from prior work). Pre-emptive defenses are written into the memo before submission. The auditor then walks the replication package end to end on a clean machine, runs every script from raw data to final exhibit, and confirms byte-identical outputs. If anything fails to reproduce, the engagement does not ship.
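The byte-identical comparison amounts to hashing every output file from the original run and from the clean-machine rerun and diffing the two maps. Here is a minimal Python sketch, assuming both runs write their exhibits under a single output directory. The function names are hypothetical.

```python
import hashlib
from pathlib import Path


def hash_tree(root):
    """Map each file's path (relative to root) to its SHA-256 digest."""
    root = Path(root)
    return {
        str(p.relative_to(root)): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def compare_runs(original_dir, rerun_dir):
    """Return a sorted list of outputs that are missing, extra, or differ."""
    a, b = hash_tree(original_dir), hash_tree(rerun_dir)
    problems = []
    for name in sorted(a.keys() | b.keys()):
        if name not in b:
            problems.append(f"missing in rerun: {name}")
        elif name not in a:
            problems.append(f"unexpected in rerun: {name}")
        elif a[name] != b[name]:
            problems.append(f"differs: {name}")
    return problems
```

An empty list means every exhibit reproduced exactly. Some file formats embed timestamps, so byte-identical outputs may also require writing figures and tables with fixed metadata.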
Phase 6: handoff
Delivery consists of three artifacts. First, the memo: typically 12 to 25 pages, written in the American Economic Review house style we use for our own papers, with executive summary, methods, results, robustness, limitations, and policy implications. Second, an interactive dashboard for the metrics the client will keep tracking as the work passes from our hands to theirs, built on the same Plotly and FastAPI stack that powers our public observatories. Third, the replication package: code, data with provenance metadata, pinned package versions in pyproject.toml and renv.lock, random seeds documented, the full pre-analysis plan, the data quality report, and the audit log.
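Lock files pin what should be installed; a run manifest records what actually was. The Python sketch below captures the interpreter version, installed versions of named packages via the standard `importlib.metadata` module, and the seed used. The function name and manifest layout are assumptions, and the R side (covered by renv.lock) is not handled here.

```python
import json
import platform
from importlib import metadata


def environment_manifest(packages, seed):
    """Record interpreter version, installed package versions, and the run seed."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # flag the gap rather than guess a version
    return {
        "python": platform.python_version(),
        "packages": versions,
        "seed": seed,  # the seed passed to every random number generator in the run
    }
```

Writing `json.dumps(environment_manifest([...], seed))` next to the outputs lets an auditor compare the rerun environment against the original one.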
We walk the client through everything in a readout call that defaults to 90 minutes. Two follow-up sessions, scheduled at 30 and 90 days, are included at no extra cost, so the analysis stays useful as the client begins to act on it. After that, we offer an optional support retainer for clients who want a standing line into the team for interpretation, light extensions, or peer review of internal work. Most engagements end at the second follow-up. The replication package is yours to keep, modify, and rerun forever.