Data Sources

Code for Democracy's products all draw on the same backend dataset. To build it, multiple data sources are ingested, cleaned, processed, and combined.

This page documents the data sources, update frequencies, and processing methodology used to create Code for Democracy's data. You can also view the status of each dataset to better understand its coverage and update cadence.

News

News articles are indexed twice daily from the news sources rated by AllSides and Media Bias/Fact Check.

Social Media

We continuously index all tweets from a core group of Twitter users who are relevant political candidates, commentators, activists, fact checkers, or journalists.

We also continuously index all data related to "Issues, Elections or Politics" from the Facebook Ad Library.

Campaign Finance

Data from the FEC's bulk data download is indexed daily, along with financial reports from the FEC API.
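As a rough illustration of what indexing a bulk file can involve: the FEC's bulk files are pipe-delimited text with no header row, and the column names are published separately. The sketch below parses that kind of file into records. It is not taken from our pipeline; the function name, the three column names, and the sample rows are hypothetical and much simpler than any real FEC file layout.

```python
import csv
import io


def parse_pipe_delimited(text, columns):
    """Parse header-less, pipe-delimited rows into dicts keyed by column name.

    `columns` must be supplied by the caller because the file itself
    carries no header row.
    """
    # QUOTE_NONE treats quote characters as ordinary data rather than
    # as field delimiters, so names containing quotes survive intact.
    reader = csv.reader(io.StringIO(text), delimiter="|", quoting=csv.QUOTE_NONE)
    records = []
    for line_number, row in enumerate(reader, start=1):
        if not row:
            continue  # skip blank lines
        if len(row) != len(columns):
            raise ValueError(
                f"line {line_number}: expected {len(columns)} fields, got {len(row)}"
            )
        records.append(dict(zip(columns, row)))
    return records


# Hypothetical, simplified layout used only for this example.
SAMPLE_COLUMNS = ["committee_id", "committee_name", "state"]
SAMPLE_ROWS = (
    "C00000001|EXAMPLE COMMITTEE|CA\n"
    'C00000002|ANOTHER "EXAMPLE" PAC|NY\n'
)

records = parse_pipe_delimited(SAMPLE_ROWS, SAMPLE_COLUMNS)
```

Raising on a field-count mismatch, rather than silently padding or truncating, makes a changed upstream layout visible at ingestion time.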

Lobbying Disclosures

Lobbying activity and related contribution reports are continuously indexed from both the House and Senate websites.

IRS 990s

A subset of fields from IRS 990, IRS 990EZ, and IRS 990PF filings is continuously indexed from the AWS XML mirror.
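To make "a subset of fields" concrete, here is a minimal sketch of pulling a few named fields out of one filing's XML. It is an assumption-laden example rather than our actual extraction code: the field list, the element paths, and the sample document are illustrative, and real filings use element names that vary by schema version.

```python
import xml.etree.ElementTree as ET

# Namespace assumed for e-file XML; adjust if the filings you read differ.
NS = {"efile": "http://www.irs.gov/efile"}

# Hypothetical subset of fields, mapped to assumed element paths.
FIELD_PATHS = {
    "ein": "efile:ReturnHeader/efile:Filer/efile:EIN",
    "return_type": "efile:ReturnHeader/efile:ReturnTypeCd",
    "tax_year": "efile:ReturnHeader/efile:TaxYr",
}


def extract_fields(xml_text, field_paths=FIELD_PATHS):
    """Return only the requested fields; missing elements become None."""
    root = ET.fromstring(xml_text)
    extracted = {}
    for name, path in field_paths.items():
        node = root.find(path, NS)
        has_text = node is not None and node.text
        extracted[name] = node.text.strip() if has_text else None
    return extracted


# Simplified, made-up filing used only for this example.
SAMPLE_FILING = """<Return xmlns="http://www.irs.gov/efile">
  <ReturnHeader>
    <ReturnTypeCd>990EZ</ReturnTypeCd>
    <TaxYr>2019</TaxYr>
    <Filer><EIN>000000000</EIN></Filer>
  </ReturnHeader>
</Return>"""

fields = extract_fields(SAMPLE_FILING)
```

Mapping missing elements to None, instead of raising, lets one field list serve all three form types even when a given form omits some fields.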

Our data pipeline is open source! See the Data repository on GitHub for details on how we ingest and process each data source.