Free, research-grade, high-frequency U.S. equity data for academics and researchers. Documented, version-controlled, and updated weekly.
One-minute OHLCV bars for 1,391 U.S. equities and ETFs, from December 2002 through the present. Sourced from the consolidated tape (CTA/UTP) via PiTrading and IEX Exchange HIST.
Three cleaning versions so you can choose the level of processing appropriate for your research. Twenty-seven pre-computed academic variables per ticker per day. Full methodology documentation.
Updated every week. No subscription. No paywall. Licensed under CC BY 4.0.
# Python — load any ticker in seconds
import pandas as pd
df = pd.read_parquet("AAPL_clean.parquet")
print(df.head())
# datetime Open High Low Close Volume
# 2002-12-30 09:30:00 0.98 0.99 0.98 0.98 842900
# 2002-12-30 09:31:00 0.98 0.99 0.98 0.99 521400
# ...
Data as received from the source. No outlier removal, no gap-filling. Prices are split/dividend adjusted. 1,533,403,126 bars.
Best for: Market microstructure research, missingness analysis, studying the data itself.
A nine-step cleaning pipeline is applied, including outside-hours removal, removal of bars with non-positive prices, OHLC violations, and duplicate bars, and a Brownlees-Gallo outlier filter. Gaps are preserved. 1,533,014,567 bars.
Best for: Volatility estimation, spread measurement, jump detection — most empirical finance.
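As a rough illustration of the simpler filters named above, here is a minimal sketch, assuming a DataFrame with a `DatetimeIndex` in Eastern Time and the `Open`, `High`, `Low`, `Close`, `Volume` columns shown in the loading example. The function name `basic_clean` is hypothetical, the Brownlees-Gallo outlier filter is not included, and the library's actual pipeline may order or define these steps differently.

```python
import pandas as pd

def basic_clean(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical sketch: apply four simple filters to 1-minute OHLCV bars."""
    # Drop duplicate timestamps, keeping the first occurrence.
    df = df[~df.index.duplicated(keep="first")]
    # Keep regular-session bars only (09:30 through 15:59, both inclusive).
    df = df.between_time("09:30", "15:59")
    # Remove bars where any price is zero or negative.
    prices = df[["Open", "High", "Low", "Close"]]
    df = df[(prices > 0).all(axis=1)]
    # Remove OHLC violations: High must be the bar's maximum, Low its minimum.
    ok = (
        (df["High"] >= df[["Open", "Close", "Low"]].max(axis=1))
        & (df["Low"] <= df[["Open", "Close", "High"]].min(axis=1))
    )
    return df[ok]
```

Each filter only removes rows, so gaps left by deleted bars stay in the output rather than being filled.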
Clean data with last-observation-carried-forward (LOCF) gap-filling to produce a regular 390-bar daily grid (09:30–15:59 ET). Every bar is flagged as original or filled. 2,342,519,726 bars.
Best for: Machine learning, backtesting systems, time-series models requiring regular grids.
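The gap-filling idea can be sketched as follows, assuming one day of bars with a `DatetimeIndex` and the column names from the loading example. The function name `fill_to_grid`, the `filled` flag column, and the choice to give filled bars a flat price and zero volume are all assumptions for illustration, not the library's documented behavior.

```python
import pandas as pd

def fill_to_grid(day: pd.DataFrame, date: str) -> pd.DataFrame:
    """Hypothetical sketch: reindex one day's bars to a 390-bar grid with LOCF."""
    # 09:30 through 15:59 inclusive is 390 one-minute bars.
    grid = pd.date_range(f"{date} 09:30", f"{date} 15:59", freq="min")
    out = day.reindex(grid)
    # Flag bars that were absent from the input before filling them.
    out["filled"] = out["Close"].isna()
    # LOCF: carry the last observed close forward into each gap.
    out["Close"] = out["Close"].ffill()
    # Assumption: a filled bar is flat at the carried close with zero volume.
    for col in ["Open", "High", "Low"]:
        out[col] = out[col].fillna(out["Close"])
    out["Volume"] = out["Volume"].fillna(0)
    return out
```

If the first bars of the day are missing, there is nothing to carry forward, so this sketch leaves them as NaN (still flagged as filled). A production version would need an explicit rule for that case.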
Computed daily for each ticker in each cleaning version. Ready to use in your research.
Realized variance (1-min, 5-min), bipower variation, Parkinson range, Yang-Zhang OHLC
Roll (1984) implied spread, Corwin-Schultz (2012) high-low spread
First-order return AC(1), variance ratio VR(5), VR(10)
BNS z-statistic, jump indicators at 1% and 5% significance
Amihud illiquidity, daily dollar volume, share volume, observed trade count
Gap rate, observed/filled bar counts, longest gap, bars since last trade
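For readers who want to see what a few of these measures look like in code, the sketch below computes realized variance, bipower variation, the Parkinson range estimator, the Roll (1984) implied spread, and AC(1) for a single day of bars. It uses common textbook forms of each formula. The function name `daily_measures` and the output keys are hypothetical, and the library's own definitions (scaling, sampling scheme, handling of a non-negative Roll covariance) may differ.

```python
import numpy as np
import pandas as pd

def daily_measures(day: pd.DataFrame) -> dict:
    """Hypothetical sketch: a few intraday measures from one day's 1-minute bars."""
    close = day["Close"]
    r1 = np.log(close).diff().dropna()  # 1-minute log returns
    # 5-minute log returns from the last close in each 5-minute bin.
    r5 = np.log(close.resample("5min").last().dropna()).diff().dropna()

    # Realized variance: sum of squared intraday log returns.
    rv_1min = float((r1 ** 2).sum())
    rv_5min = float((r5 ** 2).sum())

    # Bipower variation: (pi / 2) * sum of |r_i| * |r_{i-1}|.
    abs_r = r1.abs()
    bipower = float(np.pi / 2 * (abs_r * abs_r.shift(1)).sum())

    # Parkinson estimator from the day's high and low: ln(H/L)^2 / (4 ln 2).
    log_range = np.log(day["High"].max() / day["Low"].min())
    parkinson = float(log_range ** 2 / (4 * np.log(2)))

    # Roll (1984): spread = 2 * sqrt(-cov(dp_t, dp_{t-1})), defined when cov < 0.
    dp = close.diff().dropna()
    cov = dp.cov(dp.shift(1))
    roll = float(2 * np.sqrt(-cov)) if cov < 0 else float("nan")

    # First-order autocorrelation of 1-minute returns.
    ac1 = float(r1.autocorr(lag=1))

    return {"rv_1min": rv_1min, "rv_5min": rv_5min, "bipower": bipower,
            "parkinson": parkinson, "roll_spread": roll, "ac1": ac1}
```

Comparing `rv_1min` with `bipower` is the basis of jump tests such as the BNS z-statistic, which this sketch does not compute.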
Download individual tickers or pre-packaged bundles (S&P 500, Nasdaq 100, by sector). Click and go — no account needed for basic downloads.
Programmatic access to any ticker, date range, and version. JSON, CSV, or Parquet. Free API key with 300 requests/minute. Python, R, and Stata examples provided.
Full dataset dump: all 1,391 tickers, all versions, all timeframes. Parquet format. Updated weekly.

| Feature | HF Data Library | CRSP/TAQ | Yahoo Finance | Polygon.io |
|---|---|---|---|---|
| Price | Free | $25,000+/yr | Free | $199+/mo |
| Frequency | 1-minute bars | Tick-level | Daily only | 1-minute bars |
| Cleaning versions | 3 versions | 1 version | None | None |
| Cleaning documentation | Full pipeline | Minimal | None | None |
| Academic variables | 27 measures | None | None | None |
| Data quality scores | Per-ticker | No | No | No |
| REST API | Free | No | Unofficial | Paid |
| DOI / Citable | Zenodo DOI | No | No | No |
| License | CC BY 4.0 | Restrictive | ToS restricted | Commercial |
| Updated | Weekly (automated) | Quarterly | Daily | Real-time |