
# Methodology

This page documents the data sources, update frequencies, and processing methodology used to create Code for Democracy's data.

{% hint style="info" %}
You can also [view the status](https://api.codefordemocracy.org/view/status/) of each dataset to better understand its coverage and update cadence.
{% endhint %}

### Government Data

We believe that open data provided by the government is still the best place to start for any search. As such, these datasets are core to our platform:

#### Campaign Finance <a href="#campaign-finance" id="campaign-finance"></a>

Data from the [FEC's bulk data download](https://www.fec.gov/data/browse-data/?tab=bulk-data) is indexed daily, along with financial reports and more detailed Schedule A data from the [FEC API](https://api.open.fec.gov/developers/#/). We use the bulk data as the primary source for our campaign finance data because it is processed by the FEC before release, and therefore we consider it the cleanest source. However, in many cases, there may be a lag between when raw data is reported to the FEC and when the bulk data is available.

#### Lobbying Disclosures <a href="#social-media" id="social-media"></a>

Lobbying activity and related contribution reports are continuously indexed from both the [House](https://disclosurespreview.house.gov/) and [Senate](https://lda.senate.gov/system/public/) websites. We ingest House data by paging through the front-end website, and Senate data directly from the Senate's API. Because the House data is collected by paging through a website rather than through an API, it may be less comprehensive than the Senate data.
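As a rough illustration of API-style ingestion, the sketch below walks a paginated JSON API by following "next" links until none remain. It is not taken from our pipeline: the helper name `iter_paginated`, the `fetch_page` callable, and the `results`/`next` response keys are assumptions chosen for the example, and the pages are faked in memory so the snippet runs without network access.

```python
from typing import Callable, Dict, Iterator, Optional


def iter_paginated(fetch_page: Callable[[str], Dict], first_url: str) -> Iterator[Dict]:
    """Yield every record from a paginated JSON API by following `next` links.

    Assumes each page is a dict with a `results` list and a `next` URL
    (or None on the last page). Real APIs may use a different shape.
    """
    url: Optional[str] = first_url
    while url:
        page = fetch_page(url)
        yield from page.get("results", [])
        url = page.get("next")


# Stand-in for an HTTP client: maps a hypothetical URL to a parsed JSON page.
FAKE_PAGES = {
    "page-1": {"results": [{"id": 1}, {"id": 2}], "next": "page-2"},
    "page-2": {"results": [{"id": 3}], "next": None},
}

records = list(iter_paginated(FAKE_PAGES.__getitem__, "page-1"))
```

Passing the fetch step in as a callable keeps the paging logic independent of any particular HTTP library, so retries or rate limiting can be added without touching the loop.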

#### Tax Documents <a href="#social-media" id="social-media"></a>

A subset of fields from the IRS 990, IRS 990EZ, and IRS 990PF filings are continuously indexed from the [AWS XML mirror](https://docs.opendata.aws/irs-990/readme.html). We use the provided index listings of available filings for each year in order to page through the individual XML filings.
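The two steps described above, reading a yearly index listing and then pulling a subset of fields from each XML filing, might be sketched as follows. Everything specific here is an assumption for illustration: the index layout (a top-level key per year holding entries with a `URL` field), the XML namespace, the element paths in `FIELDS`, and the sample documents are hypothetical and should be checked against real filings before use.

```python
import json
import xml.etree.ElementTree as ET
from typing import Dict, List, Optional

# Assumed namespace for IRS e-file XML; verify against real filings.
NS = {"irs": "http://www.irs.gov/efile"}

# Hypothetical subset of fields, mapped to assumed element paths.
FIELDS = {
    "ein": "irs:ReturnHeader/irs:Filer/irs:EIN",
    "tax_year": "irs:ReturnHeader/irs:TaxYr",
    "return_type": "irs:ReturnHeader/irs:ReturnTypeCd",
}


def filing_urls(index_json: str) -> List[str]:
    """Collect the URL of every filing listed in a yearly index document."""
    index = json.loads(index_json)
    urls: List[str] = []
    for filings in index.values():  # assumed: one top-level key per year
        urls.extend(entry["URL"] for entry in filings)
    return urls


def extract_fields(xml_text: str) -> Dict[str, Optional[str]]:
    """Return the configured subset of fields; missing elements map to None."""
    root = ET.fromstring(xml_text)
    return {
        name: root.findtext(path, default=None, namespaces=NS)
        for name, path in FIELDS.items()
    }


# Made-up sample data so the sketch runs offline.
SAMPLE_INDEX = '{"Filings2019": [{"EIN": "000000000", "URL": "https://example.org/filing_1.xml"}]}'
SAMPLE_XML = """<Return xmlns="http://www.irs.gov/efile">
  <ReturnHeader>
    <ReturnTypeCd>990EZ</ReturnTypeCd>
    <TaxYr>2019</TaxYr>
    <Filer><EIN>000000000</EIN></Filer>
  </ReturnHeader>
</Return>"""

urls = filing_urls(SAMPLE_INDEX)
fields = extract_fields(SAMPLE_XML)
```

Keeping the field list in a single mapping makes it easy to see which parts of a filing are indexed and which are ignored.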

### Narrative Data <a href="#news" id="news"></a>

In addition to traditional open data sources, we also ingest a variety of datasets that are helpful for understanding the types of narratives occurring in political discourse:

#### News

News articles are indexed twice each day from the news sources rated by [Allsides](http://allsides.com/) and [Media Bias/Fact Check](https://mediabiasfactcheck.com/). Although we attempt to index all articles from each news source, in practice our coverage should be thought of as a collection of "front-page" articles.

#### Facebook Ads

We also index all data related to "Issues, Elections or Politics" from the [Facebook Ads Library](https://www.facebook.com/ads/library/) on a continual basis. Our data comes from the Facebook API, and therefore it should be an exact mirror of the data available in the Ads Library. However, the universe of data available here is dependent on the accuracy of Facebook's own classification algorithms.

#### Tweets <a href="#social-media" id="social-media"></a>

We continuously index all tweets from a core group of Twitter users who are relevant political candidates, commentators, activists, fact checkers, or journalists. This is our least comprehensive dataset and is subject to a multitude of potential misattribution and latency issues, so it should be used for exploratory purposes only.

{% hint style="info" %}
Our data pipeline is open source! See the [Data](https://github.com/codefordemocracy/data) repository on GitHub for details on how we are ingesting and processing each data source.
{% endhint %}