Data source
All data is extracted from IRS electronic filings (e-files) published on Amazon S3 at s3://irs-form-990/. This is the same public dataset used by ProPublica, GuideStar, and academic researchers. The IRS releases new batches periodically throughout the year, typically monthly.
DataDawn currently covers filings from tax years 2015 through 2025. Earlier years have sparser coverage because e-filing adoption was lower before the IRS mandate took effect; coverage is most complete for tax years 2016 onward.
Form types parsed
| Form | Filed by | What we extract |
|---|---|---|
| 990 | Public charities (gross receipts ≥ $200K or assets ≥ $500K) | Revenue, expenses, assets, officers, contractors, program activities |
| 990-EZ | Small public charities (gross receipts < $200K and assets < $500K) | Revenue, expenses, assets, officers |
| 990-PF | Private foundations | Revenue, assets, grants paid (Schedule F/G), officers, investments, contributors |
| 990-T | Orgs with unrelated business income | Basic filing data |
Database tables
Raw filings are parsed into 12 structured tables. No editorial filtering is applied — if the IRS published it, we parsed it.
| Table | Records | Description |
|---|---|---|
| returns | ~5.0M | Core filing data: org name, EIN, state, revenue, expenses, assets, return type, tax year |
| grants | ~12.1M | Foundation grants from 990-PF filings: recipient, amount, purpose, date, location |
| schedule_i_grants | ~1.25M | Schedule I disbursements (DAF sponsors and public charities): recipient, amount, purpose |
| bmf | ~1.9M | IRS Business Master File: NTEE codes, ruling dates, asset codes, subsection codes |
| officers | varies | Officers, directors, trustees, and key employees with compensation |
| contractors | varies | Independent contractors receiving > $100K |
| contributors | varies | Contributors to private foundations (from 990-PF Schedule B) |
| top_employees | varies | Highest-compensated employees |
| investments | varies | Foundation investments (from 990-PF Part II) |
| program_investments | varies | Program-related investments (PRIs) |
| capital_gains | varies | Capital gains and losses from 990-PF |
| program_activities | varies | Program service accomplishments and expenses |
Extraction process
IRS e-files are XML documents following IRS-defined schemas that have evolved across filing years. DataDawn's extraction scripts handle schema variations across years, mapping different XML element paths to consistent database columns.
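The mapping idea can be illustrated with a minimal sketch. This is not DataDawn's actual extraction code: the column names, the `CANDIDATE_PATHS` table, and the element paths are assumptions chosen for illustration. Each output column lists several candidate paths, and the first one present in the filing wins.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping: output column -> candidate element paths, newest
# schema first. The paths are illustrative, not a verified list of IRS names.
CANDIDATE_PATHS = {
    "ein": ["ReturnHeader/Filer/EIN"],
    "total_revenue": [
        "ReturnData/IRS990/CYTotalRevenueAmt",
        "ReturnData/IRS990/TotalRevenueCurrentYear",
    ],
}


def strip_namespaces(root):
    """Drop the '{namespace}' prefix from every tag so plain paths match."""
    for element in root.iter():
        if isinstance(element.tag, str) and "}" in element.tag:
            element.tag = element.tag.split("}", 1)[1]
    return root


def extract_row(xml_text):
    """Return one dict per filing, using the first candidate path that exists."""
    root = strip_namespaces(ET.fromstring(xml_text))
    row = {}
    for column, paths in CANDIDATE_PATHS.items():
        row[column] = None  # stays None when no candidate path is present
        for path in paths:
            value = root.findtext(path)
            if value is not None:
                row[column] = value.strip()
                break
    return row


# Two made-up filings that differ only in the revenue element name.
OLDER = (
    '<Return xmlns="http://www.irs.gov/efile"><ReturnHeader><Filer>'
    "<EIN>000000001</EIN></Filer></ReturnHeader><ReturnData><IRS990>"
    "<TotalRevenueCurrentYear>100000</TotalRevenueCurrentYear>"
    "</IRS990></ReturnData></Return>"
)
NEWER = OLDER.replace("TotalRevenueCurrentYear", "CYTotalRevenueAmt")

older_row = extract_row(OLDER)
newer_row = extract_row(NEWER)
```

Both sample filings produce the same row, which is the point of the mapping: downstream tables do not need to know which schema version a filing used.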
Pipeline steps
- Download — New XML batches are synced from the IRS S3 bucket. Batch completion is tracked with marker files to prevent reprocessing.
- Parse — Three extraction scripts process 990/990-EZ returns, 990-PF detail filings (grants, investments, contributors), and Schedule I grants respectively.
- Deduplicate — Filings are keyed on a combination of EIN and object ID to prevent duplicate insertion from overlapping IRS releases.
- Index — Full-text search indexes (SQLite FTS5) are built on organization names and grant recipient names for instant search.
- Publish — The public database is built from an allowlist of raw data tables. No analysis or curated tables are included in the public release.
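The deduplication step above can be sketched with a composite primary key in SQLite. The table layout and column names here are assumptions for illustration, not the published schema. `INSERT OR IGNORE` skips any row whose (EIN, object ID) pair is already stored.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical, simplified returns table keyed on (ein, object_id).
conn.execute(
    """
    CREATE TABLE returns (
        ein       TEXT NOT NULL,
        object_id TEXT NOT NULL,
        org_name  TEXT,
        tax_year  INTEGER,
        PRIMARY KEY (ein, object_id)
    )
    """
)


def insert_filings(conn, filings):
    """Insert filings, skipping stored (ein, object_id) pairs; return rows added."""
    before = conn.total_changes
    conn.executemany(
        "INSERT OR IGNORE INTO returns (ein, object_id, org_name, tax_year) "
        "VALUES (?, ?, ?, ?)",
        filings,
    )
    conn.commit()
    return conn.total_changes - before


# Two made-up batches that overlap on one filing.
batch_a = [
    ("000000001", "OBJ-A", "Example Org", 2022),
    ("000000002", "OBJ-B", "Another Org", 2022),
]
batch_b = [
    ("000000002", "OBJ-B", "Another Org", 2022),  # already present in batch_a
    ("000000003", "OBJ-C", "Third Org", 2023),
]

added_a = insert_filings(conn, batch_a)
added_b = insert_filings(conn, batch_b)
total_rows = conn.execute("SELECT COUNT(*) FROM returns").fetchone()[0]
```

Letting the database enforce the key means an overlapping IRS release can be re-run without a separate "have we seen this?" lookup.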
DAF grant identification
Donor-advised fund (DAF) disbursements are extracted from Schedule I of 990 filings submitted by DAF sponsor organizations. DataDawn identifies and parses grants from major DAF sponsors including Vanguard Charitable, Fidelity Charitable, Schwab Charitable, National Philanthropic Trust, Silicon Valley Community Foundation, and others.
These are grants made by DAF sponsors to recipient nonprofits. They do not identify the individual donors who recommended the grants; that information is not available in any public filing. The schedule_i_grants table includes the sponsoring organization's name and EIN, the recipient organization, the amount, and the stated purpose.
Why this matters: DAF grants represent a large and growing share of philanthropic funding, but because they flow through intermediary sponsors, they are difficult to trace using traditional 990-PF data alone. Combining 990-PF grants with Schedule I DAF data provides a more complete picture of institutional funding flows.
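As a rough illustration of combining the two sources, the sketch below stacks rows from grants and schedule_i_grants with `UNION ALL` and totals them per source for one recipient. The column names (`filer_ein`, `recipient_name`, `amount`) and the sample rows are assumptions; the published tables may name these fields differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical, simplified versions of the two grant tables with made-up rows.
conn.executescript(
    """
    CREATE TABLE grants (filer_ein TEXT, recipient_name TEXT, amount INTEGER);
    CREATE TABLE schedule_i_grants (filer_ein TEXT, recipient_name TEXT, amount INTEGER);

    INSERT INTO grants VALUES ('000000010', 'EXAMPLE FOOD BANK', 10000);
    INSERT INTO grants VALUES ('000000010', 'OTHER RECIPIENT', 4000);
    INSERT INTO schedule_i_grants VALUES ('000000020', 'EXAMPLE FOOD BANK', 5000);
    INSERT INTO schedule_i_grants VALUES ('000000021', 'EXAMPLE FOOD BANK', 2500);
    """
)

# Stack both sources, tag each row with its origin, then total per source.
COMBINED_FUNDING = """
SELECT source, COUNT(*) AS n_grants, SUM(amount) AS total
FROM (
    SELECT '990-PF' AS source, recipient_name, amount FROM grants
    UNION ALL
    SELECT 'Schedule I' AS source, recipient_name, amount FROM schedule_i_grants
)
WHERE recipient_name = ?
GROUP BY source
ORDER BY source
"""

funding = conn.execute(COMBINED_FUNDING, ("EXAMPLE FOOD BANK",)).fetchall()
```

Keeping a `source` column in the result preserves the distinction between foundation grants and sponsor disbursements, so the two totals are never silently merged. The exact-name filter is only for brevity; recipient names vary across filings.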
Known limitations
E-file only
DataDawn includes only electronically filed returns. Paper filings, roughly one-third of all 990s, are not included. E-filing rates have increased over time, so recent years have better coverage than earlier years.
Filing lag
Organizations file 990s after their fiscal year ends, and the IRS publishes e-files on a rolling basis. The most recent tax year will always have incomplete data. For example, a tax year 2024 filing from an organization with a December fiscal year-end may not appear until mid-to-late 2025.
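The timing follows from the filing deadline: a Form 990 is generally due on the 15th day of the 5th month after the fiscal year ends, and an automatic six-month extension is available. The sketch below works through that arithmetic. The function name is hypothetical, and the calculation ignores weekend and holiday adjustments as well as any delay between filing and IRS publication.

```python
from datetime import date


def filing_deadlines(fy_end_year, fy_end_month):
    """Return (regular_due, extended_due) for a fiscal year ending in the given month.

    Regular due date: 15th day of the 5th month after fiscal year-end.
    Extended due date: six months after that. Weekend and holiday shifts
    are deliberately ignored in this sketch.
    """

    def shifted(months_after):
        index = (fy_end_month - 1) + months_after  # zero-based month offset
        return date(fy_end_year + index // 12, index % 12 + 1, 15)

    return shifted(5), shifted(11)


# December 2024 fiscal year-end (a calendar-year filer for tax year 2024).
regular_due, extended_due = filing_deadlines(2024, 12)

# June 2024 fiscal year-end, for comparison.
june_regular, june_extended = filing_deadlines(2024, 6)
```

For the December 2024 case this gives a regular due date of May 15, 2025 and an extended due date of November 15, 2025, which is why many filings for that year surface only in the second half of 2025.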
Sparse early years
Tax years 2014–2015 have limited coverage because the IRS e-filing mandate was not yet in full effect. These records are spillover from later IRS batch releases, not complete filing years. Coverage is most reliable from 2016 onward.
Grant dates
Foundation grant dates in the grants table come from the filer's reported grant date field. Some foundations report the approval date, others the payment date, and some leave it blank. Year-level analysis is more reliable than month-level.
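One way to do year-level analysis is to pull only the four-digit year out of each reported date and bucket blank or unparseable values separately. This is a sketch under assumptions: the date strings and formats below are invented examples, and the helper names are hypothetical.

```python
import re
from collections import defaultdict

# A four-digit year starting with 19 or 20, not embedded in a longer digit run.
YEAR_PATTERN = re.compile(r"(?<!\d)(?:19|20)\d{2}(?!\d)")


def grant_year(raw_date):
    """Return the year found in a reported grant date, or None if absent."""
    if not raw_date:
        return None
    match = YEAR_PATTERN.search(raw_date)
    return int(match.group(0)) if match else None


def totals_by_year(grants):
    """Sum grant amounts per year; undated grants are grouped under None."""
    totals = defaultdict(int)
    for raw_date, amount in grants:
        totals[grant_year(raw_date)] += amount
    return dict(totals)


# Made-up rows: (reported grant date, amount).
sample_grants = [
    ("2021-03-15", 5000),
    ("12/01/2021", 2500),
    ("", 1000),
    (None, 700),
    ("2022", 300),
]

yearly = totals_by_year(sample_grants)
```

Keeping undated grants in their own bucket, rather than dropping them, makes it visible how much of the total cannot be placed in any year.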
DAF attribution
DAF disbursements identify the sponsor organization (e.g., Vanguard Charitable) but not the individual donor who recommended the grant. Multiple donors may fund grants to the same recipient through the same sponsor. It is not possible to determine from public filings alone who directed a specific DAF grant.
Name matching
Organization names are as reported on the filing. The same organization may appear under slightly different names across years or filings (e.g., "ACLU" vs "American Civil Liberties Union"). DataDawn does not perform entity resolution — search results should be verified by checking the EIN.
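A simple way to verify search results by EIN is to group the reported names under each EIN and look at the variants together. The sketch below assumes rows of (EIN, reported name); the EINs and names are placeholders, not real records.

```python
from collections import defaultdict


def names_by_ein(rows):
    """Group reported name variants under each EIN.

    Names are uppercased and whitespace-collapsed so trivial differences
    do not show up as separate variants. No fuzzy matching is attempted.
    """
    variants = defaultdict(set)
    for ein, name in rows:
        variants[ein].add(" ".join(name.split()).upper())
    return {ein: sorted(names) for ein, names in variants.items()}


# Placeholder rows: the same EIN appears under two different reported names.
sample_rows = [
    ("000000001", "Example Civic League"),
    ("000000001", "EXAMPLE  CIVIC LEAGUE"),
    ("000000001", "ECL Inc"),
    ("000000002", "Example Civic League Foundation"),
]

grouped = names_by_ein(sample_rows)
```

Here the first EIN resolves to two distinct names after normalization, while the similarly named foundation stays separate because its EIN differs.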
Amount discrepancies
Financial figures reflect what was reported on the filing. Amended returns may not overwrite original filings. In rare cases, both an original and amended filing for the same tax year may appear in the database.
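Before summing financial figures, it can help to check for organizations with more than one filing of the same type in a tax year, since those may be an original plus an amendment. The query below is a sketch; the column names (`ein`, `object_id`, `tax_year`, `return_type`, `total_revenue`) are assumptions about the returns table, and the rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical, simplified returns table with made-up rows.
conn.execute(
    "CREATE TABLE returns (ein TEXT, object_id TEXT, tax_year INTEGER, "
    "return_type TEXT, total_revenue INTEGER)"
)
conn.executemany(
    "INSERT INTO returns VALUES (?, ?, ?, ?, ?)",
    [
        ("000000001", "OBJ-A", 2022, "990", 500000),
        ("000000001", "OBJ-B", 2022, "990", 520000),  # possible amended return
        ("000000001", "OBJ-C", 2023, "990", 610000),
        ("000000002", "OBJ-D", 2022, "990PF", 90000),
    ],
)

# Flag (ein, tax_year, return_type) groups that contain more than one filing.
MULTIPLE_FILINGS = """
SELECT ein, tax_year, return_type, COUNT(*) AS n_filings
FROM returns
GROUP BY ein, tax_year, return_type
HAVING COUNT(*) > 1
ORDER BY ein, tax_year
"""

possible_amendments = conn.execute(MULTIPLE_FILINGS).fetchall()
```

Flagged groups still need manual review: the query shows that several filings exist, not which one is authoritative.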
Update schedule
DataDawn updates its database as the IRS publishes new e-file batches, typically on a monthly cycle. Updates are processed through an automated pipeline that syncs new filings, parses them, and publishes the updated public database.
The current dataset was built in March 2026 from all available IRS e-file releases as of that date. Record counts on the site reflect the most recent published version of the database.
Independence statement
DataDawn is an independent project with no institutional affiliations. It receives no funding from any nonprofit, foundation, or organization represented in its datasets. All data is sourced exclusively from public records filed with federal government agencies.
DataDawn does not endorse, evaluate, or rank any organization. The platform provides raw data and search tools. Interpretation and analysis are the responsibility of the user.
Corrections and feedback
If you find a data quality issue or parsing error, or have questions about the methodology, you can reach DataDawn at [email protected].