Methodology


Last updated 3 years ago


This page documents the data sources, update frequencies, and processing methodology behind Code for Democracy's data.

You can also view the status of each dataset to better understand its coverage and update cadence.

Government Data

We believe that open data provided by the government is still the best place to start for any search. As such, these datasets are core to our platform:

Campaign Finance

Data from the FEC's bulk data download is indexed daily, along with financial reports and more detailed Schedule A data from the FEC API. We use the bulk data as the primary source for our campaign finance data because the FEC processes it before release, so we consider it the cleanest source. In many cases, however, there can be a lag between when raw data is reported to the FEC and when the corresponding bulk data becomes available.
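The FEC API's Schedule A endpoint pages through itemized receipts with keyset ("seek") pagination rather than page numbers. The following is a minimal sketch of that loop, not the pipeline's actual code. It assumes a caller-supplied `fetch_page` function (hypothetical) that performs one request and returns decoded JSON, and it assumes each response carries a `results` list plus a `pagination.last_indexes` object whose values are sent back as query parameters to request the next page. Field names should be confirmed against the OpenFEC documentation; `C00000000` is a placeholder committee ID.

```python
from typing import Any, Callable, Dict, Iterator

# Hypothetical: fetch_page(params) performs one GET against the Schedule A
# endpoint and returns the decoded JSON response as a dict.
FetchPage = Callable[[Dict[str, Any]], Dict[str, Any]]


def iter_schedule_a(fetch_page: FetchPage, base_params: Dict[str, Any]) -> Iterator[Dict[str, Any]]:
    """Yield Schedule A rows page by page using seek (keyset) pagination."""
    params = dict(base_params)
    while True:
        page = fetch_page(params)
        results = page.get("results", [])
        if not results:
            return  # an empty page means there is nothing left to read
        yield from results
        last_indexes = (page.get("pagination") or {}).get("last_indexes")
        if not last_indexes:
            return  # assumed: no cursor values are returned on the final page
        # Pass the cursor values (e.g. last_index) back so the API resumes
        # immediately after the last row seen, keeping the original filters.
        params = {**base_params, **last_indexes}
```

Injecting `fetch_page` keeps the paging logic separate from HTTP details, so it can be exercised against canned responses, e.g. `list(iter_schedule_a(fetch_page, {"committee_id": "C00000000", "per_page": 100}))`.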

Lobbying Disclosures

Lobbying activity and related contribution reports are continuously indexed from both the House and Senate websites. For House data, we ingest records by paging through the front-end website; for Senate data, we ingest records directly from the Senate's API. As a result, data from the House may be less comprehensive than data from the Senate.
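One reason an API source is easier to trust is that it can report how many records exist, which gives an ingest job something to check against. The sketch below is illustrative only: it assumes the Senate API returns paginated JSON with `count`, `results`, and a `next` URL that is null on the last page, and `fetch_json` is a hypothetical helper that performs one GET request.

```python
from typing import Any, Callable, Dict, List, Optional, Tuple

# Hypothetical: fetch_json(url) performs one GET request and returns decoded JSON.
FetchJson = Callable[[str], Dict[str, Any]]


def fetch_all_filings(fetch_json: FetchJson, start_url: str) -> Tuple[List[Dict[str, Any]], bool]:
    """Follow `next` links to the end and report whether the row count matches `count`."""
    rows: List[Dict[str, Any]] = []
    expected: Optional[int] = None
    url: Optional[str] = start_url
    while url:
        page = fetch_json(url)
        if expected is None:
            # Assumed: every page reports the total number of matching records.
            expected = page.get("count")
        rows.extend(page.get("results", []))
        url = page.get("next")  # assumed to be null on the last page
    complete = expected is not None and len(rows) == expected
    return rows, complete
```

When paging through a front-end website instead, there may be no comparable total to check, so gaps in coverage are harder to detect.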

Tax Documents

A subset of fields from IRS Form 990, 990-EZ, and 990-PF filings is continuously indexed from the AWS XML mirror. We use the provided index listings of available filings for each year to page through the individual XML filings.
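The two steps described above (filter a yearly index, then pull a few fields out of each XML filing) can be sketched with the standard library alone. This is a minimal sketch under assumptions: the index is taken to be a CSV with `RETURN_TYPE` and `OBJECT_ID` columns, filings are taken to use the `http://www.irs.gov/efile` namespace, and the `FIELDS` paths are a hypothetical subset rather than the fields the pipeline actually extracts. Downloading each filing is omitted.

```python
import csv
import io
import xml.etree.ElementTree as ET
from typing import Dict, Iterator, Optional

# Assumed namespace of IRS e-file XML returns.
IRS_NS = {"irs": "http://www.irs.gov/efile"}
FORM_TYPES = {"990", "990EZ", "990PF"}

# Hypothetical subset of fields; the XPaths are assumptions about the schema.
FIELDS = {
    "ein": "irs:ReturnHeader/irs:Filer/irs:EIN",
    "name": "irs:ReturnHeader/irs:Filer/irs:BusinessName/irs:BusinessNameLine1Txt",
    "tax_year": "irs:ReturnHeader/irs:TaxYr",
}


def iter_index_rows(index_csv: str) -> Iterator[Dict[str, str]]:
    """Yield rows of a yearly index listing for the form types we index."""
    for row in csv.DictReader(io.StringIO(index_csv)):
        if row.get("RETURN_TYPE") in FORM_TYPES:
            yield row


def extract_fields(xml_text: str) -> Dict[str, Optional[str]]:
    """Pull the configured fields out of one filing; missing elements become None."""
    root = ET.fromstring(xml_text)  # root element is <Return>
    out: Dict[str, Optional[str]] = {}
    for key, path in FIELDS.items():
        node = root.find(path, IRS_NS)
        out[key] = node.text.strip() if node is not None and node.text else None
    return out
```

Filtering on the index first means filings of other types (for example 990-T) are never downloaded or parsed.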

Narrative Data

In addition to traditional open data sources, we also ingest a variety of datasets that help characterize the narratives occurring in political discourse:

News

News articles are indexed twice each day from the news sources rated by Allsides and Media Bias/Fact Check. Although we attempt to index all articles from each news source, in practice our coverage should be thought of as a collection of "front-page" articles.
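A toy model helps show why twice-daily indexing behaves like a front-page collection. It assumes, purely for illustration, that each crawl only sees articles currently linked from a source's front page, and it uses hypothetical crawl times of 06:00 and 18:00; the actual crawl schedule and discovery method may differ.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class FrontPageSpan:
    """Hypothetical record of when an article was linked from a source's front page."""
    url: str
    start_hour: float  # hours since midnight when the article appeared
    end_hour: float    # hours since midnight when it rotated off


def captured_urls(spans: Sequence[FrontPageSpan], crawl_hours: Sequence[float]) -> List[str]:
    """Return URLs that were on the front page during at least one crawl."""
    return [
        s.url
        for s in spans
        if any(s.start_hour <= h < s.end_hour for h in crawl_hours)
    ]
```

With spans `a` (05:00 to 09:00), `b` (10:00 to 14:00), and `c` (17:00 to 23:00), only `a` and `c` are captured: `b` was published and rotated off between crawls, so it is never indexed.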

Tweets

We continuously index all tweets from a core group of relevant Twitter users, including political candidates, commentators, activists, fact checkers, and journalists. This is our least comprehensive dataset and is subject to a number of potential misattribution and latency issues, so it should be used for exploratory purposes only.

Facebook Ads

We also index all data related to "Issues, Elections or Politics" from the Facebook Ads Library on a continual basis. Our data comes from the Facebook API, so it should be an exact mirror of the data available in the Ads Library. However, the universe of available data depends on the accuracy of Facebook's own classification algorithms.
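Keeping a continually updated mirror comes down to two pieces: requesting only the ads Facebook has classified as political or issue ads, and overwriting records on re-fetch instead of duplicating them. The sketch below assumes the Ad Library API's `ads_archive` endpoint accepts `ad_type=POLITICAL_AND_ISSUE_ADS` with an `after` cursor for paging; the field list, the country-list encoding, the page size, and the in-memory `store` are illustrative assumptions, and the HTTP call itself is omitted.

```python
from typing import Any, Dict, Iterable, Optional


def build_ads_archive_params(access_token: str, search_terms: str,
                             after: Optional[str] = None) -> Dict[str, str]:
    """Query parameters for one request to the `ads_archive` endpoint (assumed shape)."""
    params = {
        "access_token": access_token,
        "ad_type": "POLITICAL_AND_ISSUE_ADS",
        "ad_reached_countries": '["US"]',  # assumed encoding of the country list
        "search_terms": search_terms,
        "fields": "id,page_name,ad_delivery_start_time,spend,impressions",
        "limit": "100",
    }
    if after is not None:
        params["after"] = after  # cursor taken from the previous response's paging data
    return params


def upsert_ads(store: Dict[str, Dict[str, Any]], ads: Iterable[Dict[str, Any]]) -> int:
    """Insert or overwrite ads keyed by Ad Library id; return how many ids were new."""
    new_ids = 0
    for ad in ads:
        if ad["id"] not in store:
            new_ids += 1
        store[ad["id"]] = ad  # a later fetch replaces the earlier copy, e.g. updated spend
    return new_ids
```

Keying on the ad's own `id` is what keeps the local copy a mirror: re-fetching an ad whose spend or impressions changed updates one record rather than adding a second.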

Our data pipeline is open source! See the repository on GitHub for details on how we are ingesting and processing each data source.
