Research Unit Tests
Structured quality checks for academic research papers, analogous to unit tests in software engineering.
Tests range from deterministic checks (does the replication package run?) to judgment calls (is the contribution interesting?). Each test specifies what to check, how an agent should reason about it, and what constitutes a pass.
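A test specification can be written down as a small record that mirrors the Severity, Clarity, and Scope columns used in the tables that follow. The sketch below is a hypothetical encoding: `ResearchTest` and `review` are illustrative names. It also assumes a rule the tables do not state outright: any failed blocker fails the review, and failed warnings are only reported.

```python
from dataclasses import dataclass
from enum import Enum


class Severity(Enum):
    BLOCKER = "blocker"
    WARNING = "warning"


class Clarity(Enum):
    DETERMINISTIC = "deterministic"
    HEURISTIC = "heuristic"
    JUDGMENT = "judgment"


@dataclass(frozen=True)
class ResearchTest:
    name: str
    severity: Severity
    clarity: Clarity
    scope: frozenset[str]  # subset of {"paper", "proposal", "replication"}

    def applies_to(self, artifact: str) -> bool:
        """True if this test should run on the given kind of artifact."""
        return artifact in self.scope


def review(results: list[tuple[ResearchTest, bool]]) -> dict:
    """Summarize (test, passed) pairs.

    Assumed rule: any failed blocker fails the review; failed warnings are only listed.
    """
    failed = [test for test, passed in results if not passed]
    blockers = [t.name for t in failed if t.severity is Severity.BLOCKER]
    warnings = [t.name for t in failed if t.severity is Severity.WARNING]
    return {"passed": not blockers, "blockers": blockers, "warnings": warnings}
```

Here is a usage example. If the observation-count blocker passes and the robustness-check warning fails, `review` returns `passed: True` and lists the robustness check under `warnings`.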
Universal (10)
| Test | Severity | Clarity | Scope |
|---|---|---|---|
| Abstract, introduction, and results internally consistent | blocker | heuristic | paper |
| Project is feasible given stated resources and timeline | blocker | judgment | proposal |
| Contribution is interesting to the target audience | warning | judgment | paper, proposal |
| Contribution is new relative to existing literature | warning | judgment | paper, proposal |
| Effect sizes reported with an assessment of economic significance | blocker | heuristic | paper |
| OLS/correlational papers address omitted variable bias | blocker | judgment | paper |
| Replication package reproduces all main results | blocker | deterministic | replication |
| Main results accompanied by robustness checks | warning | heuristic | paper |
| Standard errors clustered at the right level | blocker | heuristic | paper |
| Regression tables report number of observations | blocker | deterministic | paper |
Difference-in-Differences (5)
| Test | Severity | Clarity | Scope |
|---|---|---|---|
| DiD: Control group selection justified and robust to alternatives | warning | judgment | paper |
| DiD: Event study (dynamic effects) reported | blocker | deterministic | paper |
| DiD: Pre-trends visualization shown and plausible | blocker | heuristic | paper |
| DiD: Placebo/falsification test reported | warning | heuristic | paper |
| DiD: Staggered adoption uses heterogeneity-robust estimator | blocker | heuristic | paper |
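Two of the DiD tests rest on simple bookkeeping. The first is whether treatment timing is staggered, which is what triggers the heterogeneity-robust estimator requirement. The second is how observations map to event time for the event study. The sketch assumes the panel is a list of `(unit, period, treated)` tuples with absorbing treatment. The `window` used for binning endpoints is a hypothetical default.

```python
def adoption_cohorts(panel):
    """Map each ever-treated unit to the first period in which it is treated.

    `panel` is a list of (unit, period, treated) tuples with treated in {0, 1}.
    """
    first = {}
    for unit, period, treated in panel:
        if treated and (unit not in first or period < first[unit]):
            first[unit] = period
    return first


def is_staggered(panel) -> bool:
    """Adoption is staggered when treated units start treatment in different periods."""
    return len(set(adoption_cohorts(panel).values())) > 1


def event_time(panel, window=(-3, 3)):
    """Event time (period minus adoption period) for every observation.

    Values outside `window` are binned into its endpoints; never-treated units get None.
    Event time -1 is the usual omitted reference period in the event-study regression.
    """
    first = adoption_cohorts(panel)
    lo, hi = window
    return {
        (unit, period): (min(max(period - first[unit], lo), hi) if unit in first else None)
        for unit, period, _ in panel
    }
```

If `is_staggered` returns True, a plain two-way fixed effects estimate would fail the staggered-adoption test. An estimator designed for heterogeneous effects would be expected instead, such as those of Callaway and Sant'Anna (2021) or Sun and Abraham (2021).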
Regression Discontinuity (3)
| Test | Severity | Clarity | Scope |
|---|---|---|---|
| RDD: Estimates robust to bandwidth choice | blocker | heuristic | paper |
| RDD: Pre-determined covariates smooth at cutoff | blocker | heuristic | paper |
| RDD: No manipulation of running variable (density test) | blocker | heuristic | paper |
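The bandwidth-robustness test amounts to re-estimating the discontinuity over a range of bandwidths and checking that the estimate is stable. The tables do not prescribe an estimator. This sketch assumes local linear regression with a triangular kernel, a common choice. It also assumes a hypothetical grid of multiples of a baseline bandwidth `h0`, which in practice would come from a data-driven selector such as the one in the rdrobust package. No standard errors are computed.

```python
def _wls_intercept(xs, ys, ws):
    """Weighted least-squares line through (xs, ys); returns its value at x = 0."""
    sw = sum(ws)
    xbar = sum(w * x for w, x in zip(ws, xs)) / sw
    ybar = sum(w * y for w, y in zip(ws, ys)) / sw
    sxx = sum(w * (x - xbar) ** 2 for w, x in zip(ws, xs))
    sxy = sum(w * (x - xbar) * (y - ybar) for w, x, y in zip(ws, xs, ys))
    return ybar - (sxy / sxx) * xbar


def rd_jump(x, y, cutoff, h):
    """Local linear RD estimate (right limit minus left limit), triangular kernel."""
    sides = {True: ([], [], []), False: ([], [], [])}
    for xi, yi in zip(x, y):
        d = xi - cutoff
        if abs(d) < h:
            xs, ys, ws = sides[d >= 0]  # points at the cutoff go to the right side
            xs.append(d)
            ys.append(yi)
            ws.append(1 - abs(d) / h)
    return _wls_intercept(*sides[True]) - _wls_intercept(*sides[False])


def bandwidth_sensitivity(x, y, cutoff, h0, factors=(0.5, 0.75, 1.0, 1.5, 2.0)):
    """RD estimates at several multiples of a baseline bandwidth h0."""
    return {f * h0: rd_jump(x, y, cutoff, f * h0) for f in factors}
```

Passing a pre-determined covariate in place of `y` gives a rough version of the covariate-smoothness test, where the estimated jump should be near zero. A formal density test for manipulation needs a dedicated procedure, such as the one in the rddensity package.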
Instrumental Variables (5)
| Test | Severity | Clarity | Scope |
|---|---|---|---|
| IV: Weak-instrument-robust inference used when F-statistic is borderline | warning | deterministic | paper |
| IV: Exclusion restriction explicitly argued | blocker | judgment | paper, proposal |
| IV: First-stage F-statistic reported and sufficient | blocker | deterministic | paper |
| IV: Monotonicity assumption explicitly argued | warning | judgment | paper |
| IV: Reduced form reported alongside IV estimates | warning | deterministic | paper |
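With a single instrument, no covariates, and conventional (homoskedastic) standard errors, three of the IV tests reduce to a few lines. The first-stage F-statistic equals the squared t-statistic on the instrument. The IV estimate equals the reduced-form coefficient divided by the first-stage coefficient. The threshold of 10 below is the familiar Staiger and Stock rule of thumb, used here only as an assumed default. It is not a substitute for weak-instrument-robust inference when F is near it.

```python
import math

RULE_OF_THUMB_F = 10  # conventional threshold; an assumption, not a universal standard


def ols_slope(x, y):
    """Bivariate OLS slope and its conventional (homoskedastic) standard error."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    a = ybar - b * xbar
    rss = sum((yi - a - b * xi) ** 2 for xi, yi in zip(x, y))
    return b, math.sqrt(rss / (n - 2) / sxx)


def iv_report(z, d, y):
    """First stage, reduced form, and IV (Wald) estimate for one instrument z."""
    fs, fs_se = ols_slope(z, d)  # first stage: treatment on instrument
    rf, _ = ols_slope(z, y)      # reduced form: outcome on instrument
    f_stat = (fs / fs_se) ** 2   # equals t^2 with a single excluded instrument
    return {
        "first_stage": fs,
        "F": f_stat,
        "weak_by_rule_of_thumb": f_stat < RULE_OF_THUMB_F,
        "reduced_form": rf,
        "iv": rf / fs,
    }
```

Reporting `reduced_form` next to `iv` satisfies the reduced-form test at no extra cost. The reduced form is also robust to weak instruments, which the IV ratio is not.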
Synthetic Control (3)
| Test | Severity | Clarity | Scope |
|---|---|---|---|
| Synth: Donor pool selection justified | blocker | judgment | paper |
| Synth: In-space and/or in-time placebo tests reported | blocker | heuristic | paper |
| Synth: Single-unit design limitations acknowledged | warning | heuristic | paper |
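A common form of the in-space placebo test reassigns treatment to each donor in turn. It compares post-treatment to pre-treatment fit using the ratio of root mean squared prediction errors (RMSPE), and reports the treated unit's rank as a permutation p-value. This sketch assumes the per-unit gaps (outcome minus its own synthetic control) have already been computed. Fitting the synthetic-control weights themselves is out of scope.

```python
import math


def rmspe(gaps):
    """Root mean squared prediction error of a sequence of gaps."""
    return math.sqrt(sum(g * g for g in gaps) / len(gaps))


def placebo_p_value(gaps_by_unit, treated):
    """Permutation p-value from post/pre RMSPE ratios (in-space placebos).

    gaps_by_unit maps unit -> (pre_gaps, post_gaps), where each unit, including
    every donor, has been fitted as if it were the treated unit.
    """
    ratios = {u: rmspe(post) / rmspe(pre) for u, (pre, post) in gaps_by_unit.items()}
    treated_ratio = ratios[treated]
    # Share of units whose ratio is at least as large as the treated unit's.
    return sum(r >= treated_ratio for r in ratios.values()) / len(ratios)
```

With J donors the smallest attainable p-value is 1/(J + 1). This is one concrete reason the single-unit limitation deserves acknowledgment when the donor pool is small.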
Lab & Online Experiments (3)
| Test | Severity | Clarity | Scope |
|---|---|---|---|
| Experiment: Attrition and differential attrition tested | blocker | heuristic | paper |
| Experiment: Baseline covariate balance table reported | blocker | deterministic | paper |
| Experiment: Power calculation reported or MDE stated | warning | heuristic | paper, proposal |
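For the power test, the standard minimum detectable effect (MDE) for a two-arm comparison of means is (z_{1-α/2} + z_{power}) · σ · sqrt(1 / (P(1 − P)N)). Here P is the share assigned to treatment and N is the total sample size. The sketch assumes individual-level randomization with no clustering or covariate adjustment, and its defaults (α = 0.05, power = 0.8) are conventional choices rather than requirements.

```python
import math
from statistics import NormalDist


def mde(n, sd, share_treated=0.5, alpha=0.05, power=0.8):
    """Minimum detectable effect for a two-arm, individually randomized experiment."""
    z = NormalDist()
    multiplier = z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)
    return multiplier * sd * math.sqrt(1 / (share_treated * (1 - share_treated) * n))
```

For example, `mde(1000, sd=1.0)` is about 0.18 standard deviations. Quadrupling the sample halves the MDE.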
Field Experiments (4)
| Test | Severity | Clarity | Scope |
|---|---|---|---|
| Field experiment: Spillover effects addressed | blocker | judgment | paper |
| Experiment: Attrition and differential attrition tested | blocker | heuristic | paper |
| Experiment: Baseline covariate balance table reported | blocker | deterministic | paper |
| Experiment: Power calculation reported or MDE stated | warning | heuristic | paper, proposal |
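The differential-attrition test asks whether outcomes go missing at different rates in treatment and control. A two-proportion z-test is one simple version. This sketch assumes a single binary attrition indicator per participant and a large enough sample for the normal approximation. Regressing attrition on treatment, or bounding approaches such as Lee bounds, are common alternatives.

```python
import math
from statistics import NormalDist


def differential_attrition(attrited_t, n_t, attrited_c, n_c):
    """Two-proportion z-test of equal attrition rates in treatment and control."""
    p_t, p_c = attrited_t / n_t, attrited_c / n_c
    pooled = (attrited_t + attrited_c) / (n_t + n_c)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_t + 1 / n_c))  # zero if no one attrits
    z = (p_t - p_c) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return {"rate_treated": p_t, "rate_control": p_c, "z": z, "p_value": p_value}
```

Both the overall attrition rate and the difference between arms belong in the report. A low p-value alone does not say how much the estimates could move.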
Theory (3)
| Test | Severity | Clarity | Scope |
|---|---|---|---|
| Theory: All model assumptions stated explicitly | blocker | heuristic | paper |
| Theory: Economic intuition provided for main results | warning | judgment | paper |
| Theory: Main results formally proven, not just stated | blocker | deterministic | paper |
Machine Learning / Prediction (5)
| Test | Severity | Clarity | Scope |
|---|---|---|---|
| ML: Performance compared to a simple benchmark | warning | deterministic | paper |
| ML: No feature leakage from outcome or future data | blocker | heuristic | paper |
| ML: Performance metric appropriate for task and outcome distribution | warning | heuristic | paper |
| ML: Prediction claims not confused with causal inference | blocker | judgment | paper |
| ML: Model performance evaluated on held-out data | blocker | deterministic | paper |
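Three of the ML tests fit into one evaluation routine: held-out evaluation, comparison with a simple benchmark, and avoiding leakage from future data. This sketch assumes rows of `(time, x, y)` and uses a time-based split, so no training row comes after a test row. The benchmark is the training-sample mean. `fit_line` is a hypothetical stand-in for whatever model the paper uses.

```python
def temporal_split(rows, cutoff):
    """Train on rows before `cutoff`, test on rows at or after it (no future data in training)."""
    train = [r for r in rows if r[0] < cutoff]
    test = [r for r in rows if r[0] >= cutoff]
    return train, test


def mse(pairs):
    """Mean squared error over (actual, predicted) pairs."""
    return sum((y - yhat) ** 2 for y, yhat in pairs) / len(pairs)


def fit_line(pairs):
    """Placeholder model: least-squares line through (x, y) pairs."""
    n = len(pairs)
    xbar = sum(x for x, _ in pairs) / n
    ybar = sum(y for _, y in pairs) / n
    b = sum((x - xbar) * (y - ybar) for x, y in pairs) / sum((x - xbar) ** 2 for x, _ in pairs)
    a = ybar - b * xbar
    return lambda x: a + b * x


def compare_to_benchmark(fit, rows, cutoff):
    """Held-out MSE of the fitted model versus a training-mean benchmark."""
    train, test = temporal_split(rows, cutoff)
    model = fit([(x, y) for _, x, y in train])          # fitted on training rows only
    mean_y = sum(y for _, _, y in train) / len(train)   # benchmark uses training data only
    return {
        "model_mse": mse([(y, model(x)) for _, x, y in test]),
        "benchmark_mse": mse([(y, mean_y) for _, _, y in test]),
    }
```

Computing the benchmark from training rows only matters. A benchmark that uses the test-set mean would itself leak information.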
Survey Methods (4)
| Test | Severity | Clarity | Scope |
|---|---|---|---|
| Survey: Key questions pre-tested or drawn from validated instruments | warning | judgment | paper, proposal |
| Survey: Measurement error in self-reported variables acknowledged | warning | judgment | paper |
| Survey: Response rate reported and non-response bias addressed | blocker | heuristic | paper |
| Survey: Complex sampling design accounted for in standard errors | blocker | deterministic | paper |
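Two back-of-the-envelope formulas show why ignoring a complex sampling design understates standard errors. Kish's effective sample size under unequal weights is (Σw)² / Σw². The approximate design effect from cluster sampling is 1 + (m − 1)ρ, where m is the average cluster size and ρ is the intraclass correlation. These are rough diagnostics only. Proper design-based variance estimation (linearization or replicate weights, as in Stata's `svy` prefix or R's survey package) also accounts for stratification.

```python
import math


def kish_effective_n(weights):
    """Kish's effective sample size under unequal weighting: (sum w)^2 / sum w^2."""
    return sum(weights) ** 2 / sum(w * w for w in weights)


def cluster_design_effect(avg_cluster_size, icc):
    """Approximate design effect from clustering: 1 + (m - 1) * rho."""
    return 1 + (avg_cluster_size - 1) * icc


def adjusted_se(naive_se, deff):
    """Inflate a simple-random-sampling standard error by sqrt(design effect)."""
    return naive_se * math.sqrt(deff)
```

With clusters of 21 respondents and an intraclass correlation of 0.05, the design effect is about 2. A naive standard error would then be too small by a factor of roughly 1.4.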