Data source

All data is extracted from IRS electronic filings (e-files) published on Amazon S3 at s3://irs-form-990/. This is the same public dataset used by ProPublica, GuideStar, and academic researchers. The IRS releases new batches periodically throughout the year, typically monthly.

DataDawn currently covers filings from tax years 2015 through 2025. Earlier years have sparser coverage because fewer organizations e-filed before the IRS mandate took effect. Coverage is most complete for tax years 2016 onward.

Form types parsed

Form | Filed by | What we extract
990 | Public charities (revenue > $200K or assets > $500K) | Revenue, expenses, assets, officers, contractors, program activities
990-EZ | Small public charities (revenue < $200K and assets < $500K) | Revenue, expenses, assets, officers
990-PF | Private foundations | Revenue, assets, grants paid (Schedule F/G), officers, investments, contributors
990-T | Orgs with unrelated business income | Basic filing data

Database tables

Raw filings are parsed into 12 structured tables. No editorial filtering is applied — if the IRS published it, we parsed it.

Table | Records | Description
returns | ~5.0M | Core filing data: org name, EIN, state, revenue, expenses, assets, return type, tax year
grants | ~12.1M | Foundation grants from 990-PF filings: recipient, amount, purpose, date, location
schedule_i_grants | ~1.25M | Schedule I disbursements (DAF sponsors and public charities): recipient, amount, purpose
bmf | ~1.9M | IRS Business Master File: NTEE codes, ruling dates, asset codes, subsection codes
officers | varies | Officers, directors, trustees, and key employees with compensation
contractors | varies | Independent contractors receiving > $100K
contributors | varies | Contributors to private foundations (from 990-PF Schedule B)
top_employees | varies | Highest-compensated employees
investments | varies | Foundation investments (from 990-PF Part II)
program_investments | varies | Program-related investments (PRIs)
capital_gains | varies | Capital gains and losses from 990-PF
program_activities | varies | Program service accomplishments and expenses

Extraction process

IRS e-files are XML documents following IRS-defined schemas that have evolved across filing years. DataDawn's extraction scripts handle schema variations across years, mapping different XML element paths to consistent database columns.
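
As a rough illustration of this kind of mapping, the Python sketch below tries a list of candidate element paths for each output column and keeps the first one that is present. The element names, the COLUMN_PATHS table, and the extract_columns function are assumptions made for this example, not DataDawn's actual extraction code.

```python
import xml.etree.ElementTree as ET

# Namespace used by IRS e-file XML documents.
NS = {"irs": "http://www.irs.gov/efile"}

# Hypothetical mapping: each output column lists candidate element paths,
# tried in order. The paths are illustrative, not DataDawn's real mapping.
COLUMN_PATHS = {
    "ein": ["irs:ReturnHeader/irs:Filer/irs:EIN"],
    "total_revenue": [
        "irs:ReturnData/irs:IRS990/irs:CYTotalRevenueAmt",        # newer-style name
        "irs:ReturnData/irs:IRS990/irs:TotalRevenueCurrentYear",  # older-style name
    ],
}


def extract_columns(xml_text):
    """Return a dict of column -> text, using the first path that matches."""
    root = ET.fromstring(xml_text)
    row = {}
    for column, paths in COLUMN_PATHS.items():
        row[column] = None  # stays None if no candidate path is present
        for path in paths:
            value = root.findtext(path, namespaces=NS)
            if value is not None:
                row[column] = value.strip()
                break
    return row


# Two made-up filings that store the same figure under different element names.
OLD_STYLE = """<Return xmlns="http://www.irs.gov/efile">
  <ReturnHeader><Filer><EIN>000000001</EIN></Filer></ReturnHeader>
  <ReturnData><IRS990>
    <TotalRevenueCurrentYear>1000</TotalRevenueCurrentYear>
  </IRS990></ReturnData>
</Return>"""

NEW_STYLE = """<Return xmlns="http://www.irs.gov/efile">
  <ReturnHeader><Filer><EIN>000000002</EIN></Filer></ReturnHeader>
  <ReturnData><IRS990>
    <CYTotalRevenueAmt>2500</CYTotalRevenueAmt>
  </IRS990></ReturnData>
</Return>"""

old_row = extract_columns(OLD_STYLE)
new_row = extract_columns(NEW_STYLE)
```

Both sample filings end up with the same two columns even though the revenue element is named differently in each. A column whose candidate paths all miss comes back as None rather than raising an error.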

Pipeline steps

  1. Download — New XML batches are synced from the IRS S3 bucket. Batch completion is tracked with marker files to prevent reprocessing.
  2. Parse — Three extraction scripts process 990/990-EZ returns, 990-PF detail filings (grants, investments, contributors), and Schedule I grants respectively.
  3. Deduplicate — Filings are keyed on a combination of EIN and object ID to prevent duplicate insertion from overlapping IRS releases.
  4. Index — Full-text search indexes (SQLite FTS5) are built on organization names and grant recipient names for instant search.
  5. Publish — The public database is built from an allowlist of raw data tables. No analysis or curated tables are included in the public release.
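
Steps 3 and 4 can be sketched with Python's built-in sqlite3 module. This is a minimal illustration, assuming a simplified returns table with hypothetical column names (ein, object_id, org_name), made-up organizations, and a Python build whose SQLite includes FTS5. It is not DataDawn's actual schema or code.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE returns (
        ein       TEXT NOT NULL,
        object_id TEXT NOT NULL,
        org_name  TEXT,
        PRIMARY KEY (ein, object_id)  -- composite key rejects repeat filings
    )
    """
)


def insert_filings(conn, filings):
    """Insert filings, skipping (ein, object_id) pairs already stored.

    Returns the number of rows actually added.
    """
    before = conn.execute("SELECT COUNT(*) FROM returns").fetchone()[0]
    conn.executemany(
        "INSERT OR IGNORE INTO returns (ein, object_id, org_name) VALUES (?, ?, ?)",
        filings,
    )
    after = conn.execute("SELECT COUNT(*) FROM returns").fetchone()[0]
    return after - before


def build_name_index(conn):
    """Rebuild a full-text index over organization names."""
    conn.execute("DROP TABLE IF EXISTS returns_fts")
    conn.execute("CREATE VIRTUAL TABLE returns_fts USING fts5(org_name, ein UNINDEXED)")
    conn.execute(
        "INSERT INTO returns_fts (org_name, ein) "
        "SELECT DISTINCT org_name, ein FROM returns"
    )


def search_names(conn, query):
    """Return (ein, org_name) rows whose name matches the FTS5 query."""
    sql = (
        "SELECT ein, org_name FROM returns_fts "
        "WHERE returns_fts MATCH ? ORDER BY rank"
    )
    return conn.execute(sql, (query,)).fetchall()


# Two overlapping batches: the second repeats one filing from the first.
batch_1 = [
    ("000000001", "OBJ-1", "Example Liberties Fund"),
    ("000000002", "OBJ-2", "Example Food Bank"),
]
batch_2 = [
    ("000000002", "OBJ-2", "Example Food Bank"),
    ("000000003", "OBJ-3", "Example Arts Council"),
]

first_added = insert_filings(conn, batch_1)
second_added = insert_filings(conn, batch_2)
build_name_index(conn)
hits = search_names(conn, "liberties")
```

The repeated filing in the second batch is silently skipped because its (ein, object_id) pair already exists, so only one new row is added. The search is case-insensitive under the default FTS5 tokenizer.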

DAF grant identification

Donor-advised fund (DAF) disbursements are extracted from Schedule I of 990 filings submitted by DAF sponsor organizations. DataDawn identifies and parses grants from major DAF sponsors including Vanguard Charitable, Fidelity Charitable, Schwab Charitable, National Philanthropic Trust, Silicon Valley Community Foundation, and others.

These are grants made by DAF sponsors to recipient nonprofits. They do not identify the individual donors who recommended the grants — that information is not available in any public filing. The schedule_i_grants table includes the sponsoring organization's name and EIN, the recipient organization, the amount, and the stated purpose.

Why this matters: DAF grants represent a large and growing share of philanthropic funding, but because they flow through intermediary sponsors, they are difficult to trace using traditional 990-PF data alone. Combining 990-PF grants with Schedule I DAF data provides a more complete picture of institutional funding flows.
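
One way to combine the two sources is a UNION ALL query over the grants and schedule_i_grants tables. The sketch below assumes simplified, hypothetical columns (recipient_name, amount) and made-up rows. The real tables have more fields and may use different column names.

```python
import sqlite3


def funding_by_source(conn, recipient_name):
    """Sum grants to one recipient, split by the table they came from."""
    sql = """
        SELECT source, SUM(amount) AS total
        FROM (
            SELECT '990-PF' AS source, recipient_name, amount
            FROM grants
            UNION ALL
            SELECT 'Schedule I' AS source, recipient_name, amount
            FROM schedule_i_grants
        ) AS combined
        WHERE recipient_name = ?
        GROUP BY source
        ORDER BY source
    """
    return dict(conn.execute(sql, (recipient_name,)).fetchall())


# Tiny in-memory stand-in for the two tables, with made-up rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE grants (recipient_name TEXT, amount INTEGER)")
conn.execute("CREATE TABLE schedule_i_grants (recipient_name TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO grants VALUES (?, ?)",
    [("Example Food Bank", 10000), ("Example Food Bank", 5000), ("Other Org", 900)],
)
conn.executemany(
    "INSERT INTO schedule_i_grants VALUES (?, ?)",
    [("Example Food Bank", 7000), ("Other Org", 100)],
)

totals = funding_by_source(conn, "Example Food Bank")
```

The exact-match filter on recipient_name is a simplification. Recipient names are reported as free text, so a real query would need to account for spelling variants.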

Known limitations

E-file only

DataDawn only includes electronically filed returns. Paper filings — roughly one-third of all 990s — are not included. E-filing rates have increased over time, so recent years have better coverage than earlier years.

Filing lag

Organizations file 990s after their fiscal year ends, and the IRS publishes e-files on a rolling basis. The most recent tax year will always have incomplete data. For example, tax year 2024 filings from organizations with a December fiscal year-end may not appear until mid-to-late 2025.

Sparse early years

Tax years 2014–2015 have limited coverage because the IRS e-filing mandate was not yet in full effect. These records are spillover from later IRS batch releases, not complete filing years. Coverage is most reliable from 2016 onward.

Grant dates

Foundation grant dates in the grants table come from the filer's reported grant date field. Some foundations report the approval date, others the payment date, and some leave it blank. Year-level analysis is more reliable than month-level.
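
A year-level rollup that tolerates blank dates might look like the sketch below. It assumes grant dates are stored as text beginning with a four-digit year (for example, 2021-03-15), which may not match the actual storage format. The totals_by_year function and the sample rows are hypothetical.

```python
from collections import defaultdict


def totals_by_year(grants):
    """Sum grant amounts per year; blank or malformed dates go to 'unknown'.

    `grants` is an iterable of (grant_date, amount) pairs, where grant_date
    is a string such as '2021-03-15', an empty string, or None.
    """
    totals = defaultdict(int)
    for grant_date, amount in grants:
        year = (grant_date or "").strip()[:4]
        key = year if len(year) == 4 and year.isdigit() else "unknown"
        totals[key] += amount
    return dict(totals)


# Made-up rows, including a blank and a missing date.
sample = [
    ("2021-03-15", 5000),
    ("2021-12-01", 2500),
    ("", 1000),
    (None, 400),
    ("2022-01-10", 300),
]
yearly = totals_by_year(sample)
```

Keeping undated grants in an explicit "unknown" bucket, rather than dropping them, makes it visible how much of the total cannot be assigned to a year.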

DAF attribution

DAF disbursements identify the sponsor organization (e.g., Vanguard Charitable) but not the individual donor who recommended the grant. Multiple donors may fund grants to the same recipient through the same sponsor. It is not possible to determine from public filings alone who directed a specific DAF grant.

Name matching

Organization names are as reported on the filing. The same organization may appear under slightly different names across years or filings (e.g., "ACLU" vs "American Civil Liberties Union"). DataDawn does not perform entity resolution — search results should be verified by checking the EIN.
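
Because the EIN is the stable identifier, grouping by EIN is a simple way to see which names a single organization has filed under. The sketch below assumes a returns table with hypothetical ein and org_name columns and uses made-up rows.

```python
import sqlite3


def names_by_ein(conn):
    """Map each EIN to its distinct reported names, keeping only EINs
    that appear under more than one name."""
    sql = "SELECT DISTINCT ein, org_name FROM returns ORDER BY ein, org_name"
    variants = {}
    for ein, name in conn.execute(sql):
        variants.setdefault(ein, []).append(name)
    return {ein: names for ein, names in variants.items() if len(names) > 1}


# Made-up filings: one EIN reported under two names across years.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE returns (ein TEXT, org_name TEXT, tax_year INTEGER)")
conn.executemany(
    "INSERT INTO returns VALUES (?, ?, ?)",
    [
        ("000000001", "Example Liberties Fund", 2021),
        ("000000001", "Example Liberties Fund Inc", 2022),
        ("000000002", "Example Food Bank", 2021),
        ("000000002", "Example Food Bank", 2022),
    ],
)

variants = names_by_ein(conn)
```

This only surfaces variants that share an EIN. It does not link related entities that file under different EINs, which would require entity resolution.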

Amount discrepancies

Financial figures reflect what was reported on the filing. Amended returns may not overwrite original filings. In rare cases, both an original and amended filing for the same tax year may appear in the database.
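
Before summing figures across filings, it can help to check for EIN and tax-year pairs that have more than one return. The sketch below assumes a returns table with hypothetical ein, object_id, and tax_year columns and made-up rows. It only flags the duplicates and does not decide which filing is the amended one.

```python
import sqlite3


def multiple_filings(conn):
    """Return (ein, tax_year, count) for pairs with more than one filing."""
    sql = """
        SELECT ein, tax_year, COUNT(*) AS n_filings
        FROM returns
        GROUP BY ein, tax_year
        HAVING COUNT(*) > 1
        ORDER BY ein, tax_year
    """
    return conn.execute(sql).fetchall()


# Made-up filings: one organization has two returns for tax year 2022.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE returns (ein TEXT, object_id TEXT, tax_year INTEGER)")
conn.executemany(
    "INSERT INTO returns VALUES (?, ?, ?)",
    [
        ("000000001", "OBJ-1", 2021),
        ("000000001", "OBJ-2", 2022),
        ("000000001", "OBJ-3", 2022),
        ("000000002", "OBJ-4", 2022),
    ],
)

flagged = multiple_filings(conn)
```

Any pair this query returns should be inspected by hand before its revenue, expense, or asset figures are added together.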

Update schedule

DataDawn updates its database as the IRS publishes new e-file batches, typically on a monthly cycle. Updates are processed through an automated pipeline that syncs new filings, parses them, and publishes the updated public database.

The current dataset was built in March 2026 from all available IRS e-file releases as of that date. Record counts on the site reflect the most recent published version of the database.

Independence statement

DataDawn is an independent project with no institutional affiliations. It receives no funding from any nonprofit, foundation, or organization represented in its datasets. All data is sourced exclusively from public records filed with federal government agencies.

DataDawn does not endorse, evaluate, or rank any organization. The platform provides raw data and search tools. Interpretation and analysis are the responsibility of the user.

Corrections and feedback

If you find a data quality issue or parsing error, or have questions about the methodology, you can reach DataDawn at [email protected].