Data Connections

The real story isn't in any single dataset. It's in the links between them. These queries cross-reference public records that are rarely seen together.

How this works
1. We collect public data from 15+ federal sources: Congress.gov, Regulations.gov, FEC, Senate LDA, GovInfo, USAspending, and more.
2. We normalize names, IDs, and organizations across datasets. A senator's bioguide ID links their votes, trades, speeches, and campaign donors.
3. We run cross-reference queries that surface patterns no single dataset reveals on its own. Every query links to the live data so you can verify.
Influence & Testimony

Who testifies AND lobbies?

Organizations testify before Congress as expert witnesses while simultaneously paying lobbyists to influence the same committees. Both activities are legal and publicly disclosed — but the connection between them is almost never surfaced.

Hearings × Lobbying
Hearing Witnesses Who Also Lobby Congress
Organizations that testified as expert witnesses at congressional hearings while also paying lobbyists. They appear in the public record as neutral experts, but the lobbying record tells a different story.
hearing_witnesses lobbying_activities
2,500+ organizations overlap
Query live data
Data Sources
  • Hearing witnesses: GovInfo MODS XML metadata for congressional hearings. Each hearing lists witnesses with name and organization.
  • Lobbying clients: Senate Lobbying Disclosure Act (LDA) filings via lda.senate.gov API. Each quarterly LD-2 filing lists the client organization.
Matching Method
Exact case-insensitive string match: UPPER(TRIM(witness_organization)) = UPPER(TRIM(client_name)). This is a pre-computed materialized table (witness_lobby_overlap) rebuilt during each database update.
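As a rough sketch of the exact-match rule above, the overlap table can be rebuilt with a single join. This assumes a SQLite database (suggested by the julianday() call used elsewhere on this page). The column witness_name and all sample rows are invented for illustration, and the real tables have more columns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical minimal schemas; sample rows are illustrative only.
    CREATE TABLE hearing_witnesses (witness_name TEXT, witness_organization TEXT);
    CREATE TABLE lobbying_activities (client_name TEXT);

    INSERT INTO hearing_witnesses VALUES
        ('Witness A', 'American Petroleum Institute'),
        ('Witness B', '  Example Widgets Association '),
        ('Witness C', 'API');
    INSERT INTO lobbying_activities VALUES
        ('AMERICAN PETROLEUM INSTITUTE'),
        ('Example Widgets Association'),
        ('Unrelated Client LLC');

    -- Rebuild the overlap table from scratch, as a database update might.
    DROP TABLE IF EXISTS witness_lobby_overlap;
    CREATE TABLE witness_lobby_overlap AS
    SELECT DISTINCT UPPER(TRIM(w.witness_organization)) AS organization
    FROM hearing_witnesses AS w
    JOIN lobbying_activities AS l
      ON UPPER(TRIM(w.witness_organization)) = UPPER(TRIM(l.client_name));
""")

overlap = [row[0] for row in conn.execute(
    "SELECT organization FROM witness_lobby_overlap ORDER BY organization")]
print(overlap)
```

In this toy data the abbreviation "API" does not match the full organization name. That is the precision-over-recall trade-off the exact-match rule accepts.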
Limitations
Exact name matching means we miss cases where the same organization uses slightly different names across filings (e.g., "American Petroleum Institute" vs "API"). Fuzzy matching would increase coverage but also increase false positives. We chose precision over recall.
Comments × Lobbying
Organizations That Comment on Rules AND Lobby
When a company submits a public comment on an EPA rule while also paying a lobbyist on the same issue, that's dual-channel influence on the same regulatory process. Both are public record.
comment_details lobbying_activities
288+ organizations overlap (growing)
Query live data
Data Sources
  • Comment organizations: Regulations.gov API comment detail records. The organization field is self-reported by the commenter.
  • Lobbying clients: Senate LDA quarterly activity reports (LD-2 filings) with client_name.
Matching Method
Exact case-insensitive match: UPPER(TRIM(organization)) = UPPER(TRIM(client_name)). Materialized as commenter_lobby_overlap.
Limitations
Currently based on ~41K comment details (about 1% of 3.7M total comments). As we expand full comment text collection, this overlap count will grow significantly. Organization names are self-reported and unstandardized, so matches depend on commenters using their organization's exact legal name.
Money & Committees

Follow the money to the committee room

Committee assignments determine which industries a member of Congress oversees. Financial disclosures reveal what they trade and who funds them. We connect the two.

Committees × Stock Trades
Stock Trading by Committee Members
Members of Congress actively trading stocks while serving on committees that oversee related industries. Not necessarily illegal, but a pattern worth watching.
committee_memberships congress_members stock_trades
95,621 trades by sitting members
Query live data
Data Sources
  • Committee assignments: congress-legislators GitHub repository (unitedstates project). Maps bioguide_id to committee_id.
  • Stock trades: House Financial Disclosure PTR PDFs parsed via pdftotext, plus Senate eFD periodic transaction reports scraped from efdsearch.senate.gov. Both are government sources.
Matching Method
Members are linked via bioguide_id, the universal congressional identifier. This is deterministic — no fuzzy matching. The query joins committee_memberships to stock_trades through congress_members.
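The deterministic join described here might look like the following sketch. The table names come from this page. The column lists, IDs, and rows are assumptions made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical minimal schemas; sample rows are illustrative only.
    CREATE TABLE congress_members (bioguide_id TEXT PRIMARY KEY, full_name TEXT);
    CREATE TABLE committee_memberships (bioguide_id TEXT, committee_id TEXT);
    CREATE TABLE stock_trades (bioguide_id TEXT, ticker TEXT, transaction_date TEXT);

    INSERT INTO congress_members VALUES
        ('X000001', 'Member One'),
        ('X000002', 'Member Two');
    INSERT INTO committee_memberships VALUES
        ('X000001', 'CMTE_A'),
        ('X000002', 'CMTE_B');
    -- The last trade has no matched bioguide_id, so the join drops it.
    INSERT INTO stock_trades VALUES
        ('X000001', 'AAA', '2024-03-01'),
        ('X000001', 'BBB', '2024-03-05'),
        (NULL, 'CCC', '2024-03-07');
""")

# committee_memberships -> congress_members -> stock_trades, all on bioguide_id.
rows = conn.execute("""
    SELECT cm.committee_id, m.full_name, COUNT(*) AS trade_count
    FROM committee_memberships AS cm
    JOIN congress_members AS m ON m.bioguide_id = cm.bioguide_id
    JOIN stock_trades AS t ON t.bioguide_id = m.bioguide_id
    GROUP BY cm.committee_id, m.full_name
    ORDER BY trade_count DESC
""").fetchall()
print(rows)
```

Because the key is an identifier rather than a name, a trade either links to a member or it does not. Trades without a bioguide_id simply fall out of the result.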
Limitations
85.5% of stock trades have a matched bioguide_id. The unmatched 14.5% are mostly candidates who filed disclosures but never took office. Committee assignments reflect current membership only; historical rotations are not tracked.
Committees × Campaign Finance
Top Donors to Committee Members
Which PACs and organizations fund the members of specific committees? Cross-references FEC contribution records with current committee assignments and bioguide IDs.
committee_memberships fec_candidate_crosswalk fec_contributions
4.4M contribution records linked
Query live data
Data Sources
  • Contributions: FEC bulk data from fec.gov S3 bucket. Committee-to-candidate contributions across all election cycles.
  • Member linkage: fec_candidate_crosswalk maps FEC cand_id to congressional bioguide_id (1,711 matched members).
  • Committees: congress-legislators current membership data.
Matching Method
Four-table join: committee_memberships → fec_candidate_crosswalk (via bioguide_id) → fec_contributions (via cand_id) → fec_committees (via cmte_id). Pre-computed as committee_donor_summary (threshold: $10,000+ total).
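A minimal sketch of that four-table join, under stated assumptions: the columns amount and cmte_name are hypothetical, the rows are invented, and the $10,000 threshold is applied per congressional committee and donor pair, which is one plausible reading of the description rather than a confirmed detail.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical minimal schemas and made-up rows, for illustration only.
    CREATE TABLE committee_memberships (bioguide_id TEXT, committee_id TEXT);
    CREATE TABLE fec_candidate_crosswalk (bioguide_id TEXT, cand_id TEXT);
    CREATE TABLE fec_contributions (cmte_id TEXT, cand_id TEXT, amount INTEGER);
    CREATE TABLE fec_committees (cmte_id TEXT PRIMARY KEY, cmte_name TEXT);

    INSERT INTO committee_memberships VALUES ('X000001', 'CMTE_A'), ('X000002', 'CMTE_A');
    INSERT INTO fec_candidate_crosswalk VALUES ('X000001', 'CAND1'), ('X000002', 'CAND2');
    INSERT INTO fec_committees VALUES ('PAC1', 'Example PAC One'), ('PAC2', 'Example PAC Two');
    INSERT INTO fec_contributions VALUES
        ('PAC1', 'CAND1', 6000),
        ('PAC1', 'CAND2', 5000),
        ('PAC2', 'CAND1', 4000);

    -- Sum each donor's giving across all members of a congressional committee,
    -- then keep only totals at or above the threshold.
    CREATE TABLE committee_donor_summary AS
    SELECT cm.committee_id, fc.cmte_name AS donor, SUM(c.amount) AS total_amount
    FROM committee_memberships AS cm
    JOIN fec_candidate_crosswalk AS x ON x.bioguide_id = cm.bioguide_id
    JOIN fec_contributions AS c ON c.cand_id = x.cand_id
    JOIN fec_committees AS fc ON fc.cmte_id = c.cmte_id
    GROUP BY cm.committee_id, fc.cmte_id, fc.cmte_name
    HAVING SUM(c.amount) >= 10000;
""")

summary = conn.execute(
    "SELECT committee_id, donor, total_amount FROM committee_donor_summary").fetchall()
print(summary)
```

Note that "committee" means two different things here: committee_id is a congressional committee, while cmte_id identifies an FEC-registered committee such as a PAC.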
Limitations
The FEC-to-bioguide crosswalk covers 1,711 members. Contributions are PAC-to-candidate only (not individual donors). Committee assignments are current session only: we don't track historical rotations, so a member's past committee work won't show here.
Legislative Activity & Trading

Trading around legislation

Members of Congress trade stocks and also give floor speeches, sponsor bills, and vote on legislation that can move markets. We connect the timing.

Floor Speeches × Stock Trades
Floor Speeches Within 7 Days of Trades
Members who gave floor speeches within a week of making stock trades. This doesn't prove anything — but the temporal proximity is a pattern that researchers and journalists need to be able to see.
stock_trades crec_speakers congressional_record
7-day trade-to-speech window
Query live data
Data Sources
  • Stock trades: House PTR PDFs + Senate eFD reports. Transaction dates, tickers, amounts.
  • Floor speeches: GovInfo Congressional Record (CREC) daily packages, 1994–present. MODS XML provides speaker bioguide IDs for 99.6% of entries.
Matching Method
Join stock_trades to crec_speakers via bioguide_id, then filter where ABS(julianday(transaction_date) - julianday(speech_date)) <= 7. Pre-computed as speeches_near_trades.
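The window filter can be tried on toy data as below. This is a sketch that assumes dates are stored as ISO-8601 text, which julianday() accepts. The days_from_trade column and the sample rows are additions for illustration, not part of the described table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical minimal schemas; sample rows are illustrative only.
    CREATE TABLE stock_trades (bioguide_id TEXT, ticker TEXT, transaction_date TEXT);
    CREATE TABLE crec_speakers (bioguide_id TEXT, speech_date TEXT);

    INSERT INTO stock_trades VALUES ('X000001', 'AAA', '2024-03-10');
    INSERT INTO crec_speakers VALUES
        ('X000001', '2024-03-03'),  -- exactly 7 days before the trade: kept
        ('X000001', '2024-03-18'),  -- 8 days after the trade: dropped
        ('X000002', '2024-03-10');  -- same day, different member: dropped

    CREATE TABLE speeches_near_trades AS
    SELECT t.bioguide_id, t.ticker, t.transaction_date, s.speech_date,
           CAST(julianday(s.speech_date) - julianday(t.transaction_date) AS INTEGER)
               AS days_from_trade
    FROM stock_trades AS t
    JOIN crec_speakers AS s ON s.bioguide_id = t.bioguide_id
    WHERE ABS(julianday(t.transaction_date) - julianday(s.speech_date)) <= 7;
""")

near = conn.execute("SELECT * FROM speeches_near_trades").fetchall()
print(near)
```

The comparison uses <=, so the window is inclusive on both sides: a speech exactly 7 days before or after a trade counts.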
Limitations
Temporal proximity does not imply causation. Members give many speeches and make many trades — some overlap is expected by chance alone. We chose 7 days as a meaningful window, but this is an editorial choice. The speech content is not automatically matched to the traded stock's sector; users must evaluate relevance manually.
Committee Jurisdiction × Trades
Trading in the Sectors They Regulate
Members of Congress sit on committees with jurisdiction over specific industries. Using SEC EDGAR's SIC classifications for 2,027 traded tickers, we check whether members are trading stocks in sectors their committees regulate.
committee_memberships committee_jurisdiction (SIC ranges) ticker_sic (SEC EDGAR) stock_trades
2,027 tickers classified by SIC code via SEC EDGAR
Query live data
Data Sources
  • Committee jurisdiction mapping: A curated reference table that maps 27 congressional committees to the SIC code ranges under their primary jurisdiction.
  • Ticker SIC codes: Downloaded from SEC EDGAR — each company's CIK looked up via data.sec.gov/submissions/ to get its SIC industry classification. 2,027 traded tickers classified across 333 unique SIC codes.
  • Stock trades: Same House PTR + Senate eFD sources as above.
  • Committee assignments: congress-legislators GitHub data.
Matching Method
For each committee member, we look up their stock trades, classify each ticker by its SEC-assigned SIC code, then check if that SIC code falls within any of the committee's jurisdiction SIC ranges. The join condition is: CAST(ticker_sic.sic_code AS INTEGER) BETWEEN committee_sic_ranges.sic_start AND committee_sic_ranges.sic_end.
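The range check could be sketched as follows. The committee ID, tickers, and the 1300–1399 range are invented placeholders, not the page's actual jurisdiction mapping. SIC 1311 (crude petroleum and natural gas) and 6021 (national commercial banks) are real codes, used here only as sample values.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical minimal schemas; sample rows are illustrative only.
    CREATE TABLE committee_memberships (bioguide_id TEXT, committee_id TEXT);
    CREATE TABLE committee_sic_ranges (committee_id TEXT, sic_start INTEGER, sic_end INTEGER);
    CREATE TABLE ticker_sic (ticker TEXT PRIMARY KEY, sic_code TEXT);
    CREATE TABLE stock_trades (bioguide_id TEXT, ticker TEXT);

    INSERT INTO committee_memberships VALUES ('X000001', 'CMTE_ENERGY');
    INSERT INTO committee_sic_ranges VALUES ('CMTE_ENERGY', 1300, 1399);
    INSERT INTO ticker_sic VALUES ('OILX', '1311'), ('BANKX', '6021');
    -- FUNDX has no SIC row (like an ETF), so the join excludes it.
    INSERT INTO stock_trades VALUES
        ('X000001', 'OILX'),
        ('X000001', 'BANKX'),
        ('X000001', 'FUNDX');
""")

flagged = conn.execute("""
    SELECT cm.bioguide_id, cm.committee_id, t.ticker, s.sic_code
    FROM committee_memberships AS cm
    JOIN stock_trades AS t ON t.bioguide_id = cm.bioguide_id
    JOIN ticker_sic AS s ON s.ticker = t.ticker
    JOIN committee_sic_ranges AS r
      ON r.committee_id = cm.committee_id
     AND CAST(s.sic_code AS INTEGER) BETWEEN r.sic_start AND r.sic_end
""").fetchall()
print(flagged)
```

Only the trade whose SIC code falls inside the member's committee range is returned. The bank trade is outside the range, and the unclassified ticker never reaches the range check.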
Limitations
SIC codes cover publicly traded companies only — mutual funds, ETFs, and ADRs don't have SIC classifications and are excluded. The committee-to-SIC mapping currently covers primary jurisdiction only; some committees (Appropriations, Budget, Foreign Affairs) are omitted because their jurisdiction is cross-cutting rather than sector-specific. The mapping reflects editorial judgment about which SIC ranges fall under each committee.
Lobbying ↔ Legislation
Which Bills Get Lobbied the Most?
We parse bill numbers (H.R., S., etc.) from the "specific issues" text in 2.7M lobbying activity reports, then match them to legislation records. See which bills attract the most lobbying and from how many different clients.
lobbying_activities lobbying_bills legislation
2.7M lobbying reports text-mined
Query live data
Data Sources
  • Lobbying activities: Senate LDA quarterly LD-2 filings. Each filing has a specific_issues free-text field describing lobbying activities.
  • Legislation: Congress.gov BILLSTATUS bulk XML. 167K+ bills across Congresses 93–119.
Matching Method
Regex extraction of bill references from specific_issues text. Patterns: H.R., S., H.J.Res., S.J.Res., H.Con.Res., S.Con.Res., H.Res., S.Res. followed by a number. Congress number inferred from filing year using (year - 1789) // 2 + 1. Matched to legislation table via constructed bill_id.
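One way the extraction and Congress-number inference might be written is sketched below. The regular expression is an approximation of the listed patterns, not the production regex, and the bill_id format ("118-hr-1234") is a hypothetical choice for the example.

```python
import re

# Bill-type prefixes from the list above, each followed by a number.
# The lookbehind rejects matches glued to a preceding word or period,
# so "U.S. 50" is not read as Senate bill 50.
BILL_PATTERN = re.compile(
    r"(?<![\w.])"
    r"(H\.J\.Res\.|S\.J\.Res\.|H\.Con\.Res\.|S\.Con\.Res\.|H\.Res\.|S\.Res\.|H\.R\.|S\.)"
    r"\s*(\d+)"
)


def congress_for_year(year: int) -> int:
    """Infer the Congress number from a calendar year."""
    return (year - 1789) // 2 + 1


def extract_bill_ids(specific_issues: str, filing_year: int) -> list:
    """Return de-duplicated bill IDs in order of first appearance."""
    congress = congress_for_year(filing_year)
    bill_ids = []
    for prefix, number in BILL_PATTERN.findall(specific_issues):
        bill_type = prefix.replace(".", "").lower()  # "H.J.Res." -> "hjres"
        bill_id = f"{congress}-{bill_type}-{number}"  # hypothetical ID format
        if bill_id not in bill_ids:
            bill_ids.append(bill_id)
    return bill_ids


sample = "Issues related to H.R. 1234 and S. 567; also H.J.Res. 8. U.S. 50 highway funding."
print(extract_bill_ids(sample, 2023))
```

The formula maps both 2023 and 2024 to the 118th Congress and 2025 to the 119th. It works from the calendar year alone, which is why filings dated in the first days of an odd-numbered year can be assigned to the wrong Congress.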
Limitations
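Not all lobbying filings reference specific bills; many describe issues only in general terms. Inferring the Congress number from the filing year can be off by one near session boundaries: a new Congress convenes in early January of each odd-numbered year, so a filing dated at the start of that year may refer to a bill from the previous Congress. The regex doesn't catch informal references like "the infrastructure bill" or bill names without numbers.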
Coming Soon

More connections in progress

We're actively building more cross-references. As our comment detail coverage expands and new datasets come online (hearings, nominations, GAO reports), new patterns will emerge.

Lobbying ↔ Congress
The Revolving Door: Former Members Who Lobby
Lobbying filings include a "covered positions" disclosure, which records any government role a lobbyist previously held. Cross-referencing with congress_members reveals which former legislators now lobby their former colleagues.
lobbying_lobbyists covered_position filter congress_members
~199 former members matched to lobbying filings
Query live data
Data Sources
  • Lobbying disclosures: Senate LDA filings include a covered_position field where lobbyists disclose former government roles. 1.85M records have this field populated.
  • Congress members: 12,700+ historical and current members with bioguide IDs.
Matching Method
We filter lobbying records where the covered_position text indicates the lobbyist was a former member of Congress (matching patterns like "U.S. Senator", "U.S. Representative", "Member of Congress", "Former Member"). Then we join on UPPER(full_name) = lobbyist_name. For ambiguous names (e.g., multiple "Thomas Davis" in history), we pick the most recent member.
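The filter, name join, and tie-break could be sketched like this. The last_term_end column is a hypothetical stand-in for however "most recent member" is determined, and the names, IDs, and dates are placeholders.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical minimal schemas; sample rows are illustrative only.
    CREATE TABLE lobbying_lobbyists (lobbyist_name TEXT, covered_position TEXT);
    CREATE TABLE congress_members (bioguide_id TEXT, full_name TEXT, last_term_end TEXT);

    INSERT INTO lobbying_lobbyists VALUES
        ('PAT EXAMPLE', 'U.S. Representative'),
        ('SAM SAMPLE', 'Legislative Assistant');
    -- Two historical members share the name "Pat Example".
    INSERT INTO congress_members VALUES
        ('E000001', 'Pat Example', '1951-01-03'),
        ('E000002', 'Pat Example', '2011-01-03'),
        ('S000001', 'Sam Sample', '2015-01-03');
""")

matches = conn.execute("""
    SELECT DISTINCT l.lobbyist_name, m.bioguide_id
    FROM lobbying_lobbyists AS l
    JOIN congress_members AS m ON UPPER(m.full_name) = l.lobbyist_name
    WHERE (l.covered_position LIKE '%U.S. Senator%'
        OR l.covered_position LIKE '%U.S. Representative%'
        OR l.covered_position LIKE '%Member of Congress%'
        OR l.covered_position LIKE '%Former Member%')
      -- For names shared by several members, keep only the most recent one.
      AND m.last_term_end = (
          SELECT MAX(m2.last_term_end)
          FROM congress_members AS m2
          WHERE UPPER(m2.full_name) = l.lobbyist_name)
    ORDER BY l.lobbyist_name
""").fetchall()
print(matches)
```

Here "Sam Sample" matches a member by name but is excluded because the covered position does not indicate prior service as a legislator. The two-step design filters on position text first so that a name collision alone never produces a match.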
Limitations
Name matching misses ~50–80 members who go by nicknames in lobbying filings (e.g., "Billy Tauzin" vs. "William Tauzin", "Dick Gephardt" vs. "Richard Gephardt"). We plan to add a manual nickname mapping table to capture these. The position text is free-form and inconsistent, so some staffers who list their boss's title may be incorrectly included or excluded.
Coming Soon
GAO Oversight of Specific Laws
GAO reports reference specific public laws and U.S. Code sections. Matching these to legislation reveals which laws have been subject to the most government accountability scrutiny.
gao_reports legislation
Coming Soon
Foreign Agents Testifying at Hearings
FARA registrants are organizations representing foreign governments. Do any of them also appear as witnesses at congressional hearings? The data to answer that is in two tables.
fara_registrants hearing_witnesses