How it works

On this page, you will find more information about the contents of the Disinfodex database and about how to search and read it. You will also find a FAQ covering our source inclusion process, update frequency, and other considerations.

What you will find in the Disinfodex database

Disinfodex.org indexes, aggregates and makes it easier to search and analyze publicly available information about disinformation campaigns posted since 2017. As of Autumn 2020, it focuses on information released by the following technology companies:

  • Facebook
  • Twitter
  • Google/YouTube
  • Reddit

For these companies, Disinfodex typically indexes information posted on the company’s official channels or by its official representatives on social media, which specifically pertains to actions taken against networks of accounts, pages, etc. driving disinformation campaigns.

In addition, the database includes information released by the following organizations’ open source investigation teams (with more to come):

  • Graphika
  • DFR Lab
  • Stanford Internet Observatory (SIO)

For these organizations, Disinfodex typically indexes in-depth reports about actions taken by one or more of the technology companies mentioned above against disinformation networks. It does not index other reports these organizations might publish (for instance, reports analyzing trends in the disinformation landscape, or analyzing operations that are not connected to actions taken by one or more of the platforms outlined above).

We aim to make it easy for users to see when such reports are connected and relate to the same networks.

How to search and use the Disinfodex database

This section covers the different views of the database, the search box and filters for the database, and how to download a CSV version of the database.

Card view and Table view

As of May 2021, the Disinfodex database indexes action taken against more than 250 networks spanning more than 340 disclosures since 2017.

  • A “network” means: groups of accounts or entities coordinating in ways that are deceptive or inauthentic, typically resulting in violations of the policies of the platforms who report on these networks.
  • A “disclosure” means: the way a platform communicates about actions taken against one or more networks. Typically, disclosures take the form of blog posts in which platforms list actions taken against a number of networks and provide some information about their origin, activity, and/or policy violations.

⇒ One network may be involved in multiple disclosures. For instance, it may be that a platform notes that a disclosure it makes on date t+1 pertains to a network it already took action against on date t, but which has since deployed new tactics or renewed its efforts to create assets on the platform.

Networks and disclosures can be explored in two ways: a card view (default) and a table view. More on each view:

Each line in the Table view (default view) represents a network, which may be associated with one or more disclosures by one or more platforms. The default view is sorted chronologically, starting with the networks most recently featured in a disclosure. Where multiple entities have reported on a network (e.g. one platform and one third-party investigator), they are highlighted in the “source” column. Clicking on a line leads to the card view for further exploration of a network.

The Card view provides an easily readable recap of actions taken against each network in the database. Upon opening a card, you will see key information about the network and the actions taken against it, and you can open more detailed descriptions from each of the entities that reported on the network.

Searching Disinfodex

Both the Card and Table views are meant to be easily searched or filtered so you can find the exact information you need.

You can search each view by typing the keyword(s) you are interested in directly in the search bar that’s above the database. All relevant entries will be pulled dynamically.

Alternatively, you can use the filters displayed above the table to focus on specific companies, dates, named entities, types of removals, or countries.

Downloading Disinfodex

You can download a CSV of the database by clicking the ‘download CSV’ link that appears at the bottom left of the Table view. Downloading Disinfodex is free, as is using information from the database for your research or journalistic projects – we simply ask that you cite us if you do.

How to read the Disinfodex database

The Disinfodex database codes public disclosures alongside a number of attributes to make them easier for you to search and analyze. This section outlines what each of these attributes represents, starting with the arbitrary Network Codes that we generate for each network in the database.

Reading Network Codes in the Disinfodex database:

Disinfodex indexes actions or findings about disinformation networks, by which we mean groups of accounts or entities coordinating in ways that are deceptive or inauthentic, typically resulting in violations of the policies of the platforms who report on these networks. Each platform and investigator may have different criteria for determining what constitutes a network and Disinfodex reflects these determinations.

To that end, for each distinct network detailed in disclosures or reports that we index, we generate an arbitrary network identifier, or “Network ID”, which is structured as [ENTITY]-[COUNTRY]-[DATE] – where:

  • ENTITY refers to a 2- or 3-letter identifier for the entity releasing information about this network. The current list of entities includes:
      • DFR: DFR Lab
      • GRA: Graphika
      • GY: Google/YouTube
      • FB: Facebook
      • RD: Reddit
      • SIO: Stanford Internet Observatory
      • TW: Twitter

  • COUNTRY refers to the country of origin of a network, given as its 2-letter ISO country code (ISO 3166-1 alpha-2; for instance: FR for France, EG for Egypt). ‘Origin’ should be understood here as the country from which the platform and investigators have deemed the network was most likely operated, which does not always mean that the local government or local actors were involved. Where the country of origin is unclear in the reporting, we mark the country section as “UNKNOWN”. For instance, GRA-UNKNOWN would refer to a network reported by Graphika whose country of origin is unknown. Where there are multiple countries of origin, we include all of their 2-letter codes in a sequence. For instance, GY-EGFR would refer to a network reported by Google/YouTube originating from Egypt and France.

  • DATE refers to the month and year (written as MMYY) in which the entity that disclosed this network first reported on it. For instance, DFR-EE-0519 would refer to a network reported on by DFR Lab for the first time in May 2019. If an entity later reports more information or actions taken about the same network, we continue to use the same network ID. Occasionally, an entity first reports on multiple separate networks originating from the same country on the same date. In that case, we add letters (A, B, C…) at the end of the network IDs to differentiate these networks. For instance, DFR-EE-0519-A and DFR-EE-0519-B would refer to two separate networks reported on by DFR Lab for the first time in May 2019.
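
The structure described above can be sketched as a small parser. This is a minimal illustration, not part of Disinfodex: the names `parse_network_id` and `NetworkId` are hypothetical, the regular expression is an assumption inferred from the examples on this page, and the two-digit year is assumed to fall in the 2000s. It expects a full ID including the date, so abbreviated examples such as GRA-UNKNOWN are rejected.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

# Entity codes listed on this page.
ENTITIES = {
    "DFR": "DFR Lab",
    "GRA": "Graphika",
    "GY": "Google/YouTube",
    "FB": "Facebook",
    "RD": "Reddit",
    "SIO": "Stanford Internet Observatory",
    "TW": "Twitter",
}

# Assumed pattern: ENTITY-COUNTRY-MMYY with an optional -A, -B, ... suffix.
_NETWORK_ID = re.compile(
    r"^(?P<entity>[A-Z]{2,3})"
    r"-(?P<country>UNKNOWN|(?:[A-Z]{2})+)"
    r"-(?P<date>\d{4})"
    r"(?:-(?P<suffix>[A-Z]))?$"
)


@dataclass
class NetworkId:
    entity: str
    countries: List[str]  # empty when the country section is UNKNOWN
    month: int
    year: int
    suffix: Optional[str]  # "A", "B", ... or None


def parse_network_id(network_id: str) -> NetworkId:
    """Split a Network ID into its entity, country, date, and suffix parts."""
    match = _NETWORK_ID.match(network_id.strip())
    if match is None:
        raise ValueError(f"not a recognizable Network ID: {network_id!r}")
    entity = match["entity"]
    if entity not in ENTITIES:
        raise ValueError(f"unknown entity code: {entity!r}")
    country = match["country"]
    # Several countries of origin are written back to back, e.g. "EGFR".
    countries = (
        []
        if country == "UNKNOWN"
        else [country[i:i + 2] for i in range(0, len(country), 2)]
    )
    month = int(match["date"][:2])
    year = 2000 + int(match["date"][2:])  # assumption: years are 20YY
    if not 1 <= month <= 12:
        raise ValueError(f"invalid month in Network ID: {network_id!r}")
    return NetworkId(entity, countries, month, year, match["suffix"])
```

For example, `parse_network_id("DFR-EE-0519-A")` yields the entity `DFR`, the country list `["EE"]`, month 5, year 2019, and suffix `A`.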

Attributes in the Disinfodex database

The database comprises the following attributes – all of which are available in its downloadable CSV version; some may be removed from the online version for legibility:

  • Date: the date of publication of the report or disclosure
  • Source: the name of the organization releasing a report or disclosure
  • Network ID: the network identifiers of all networks related to this disclosure or report (see Network ID, above, for more information on identifiers).
      • This means that if, for instance, Twitter reports actions taken against a network of accounts that Graphika has also covered in a separate, third-party report, two network IDs will be displayed: one from Twitter, one from Graphika.
      • The purpose of showing all network IDs is to help Disinfodex users easily understand what connections exist between different public reports.
  • Destination country: any information shared about the country the network targeted (sometimes it may be more than one country; and sometimes it may be simply information about the languages the network used).
  • Origin country: any information shared about the country from which the network was operated (here too it may be more than one country).
  • Named entities: any information shared about a specific entity connected to a campaign (e.g. Government of country X, PR agency Y). There may be more than one entity.
  • Main URL: the primary URL associated with a disclosure or report (usually a blog post)
  • Secondary URL: any other relevant URL associated with a disclosure or report (usually a PDF)
  • Description: long-form text providing information about the network and actions taken.
      • On average, reports from open source investigators tend to be longer than those of platforms. As such, we include a smaller portion of those directly in Disinfodex and recommend clicking through to view full reports if they are of interest.
  • Other notes: any other information relevant about the network that does not fit in the prior categories (e.g. advertising spend, when available)
  • Screenshots: links to screenshots related to this disclosure, as provided by the platform or open source investigator. Note: screenshots that are published in PDF reports are not included.

In addition, the following attributes are captured specifically for reports provided by technology companies:

  • Removal type: in the case of platforms, this covers what they say they took action on (e.g. Facebook, Instagram, Twitter, or YouTube accounts).
  • Removal number: the number of removals that took place.
  • Engagement language: what the platform says, if anything, about the amount of engagement with the content or accounts that were removed (e.g. ‘more than X views’, ‘less than Y followers’)
  • Engagement number: raw numbers from the engagement language column, for ease of analysis by researchers.
  • Policy violations: the policy violation(s) that resulted in the action taken by the platform.
  • Archive URL: link to an archive of the content that was removed, where available
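
Because one row may carry several network IDs, a reader of the CSV might want to group rows by network to see which reports are connected. The following is a rough sketch under stated assumptions: the column headers (`Source`, `Network ID`) and the comma separator between IDs are guesses about the CSV layout, the function name `index_by_network` is hypothetical, and the sample rows are placeholders rather than real database entries.

```python
import csv
import io
from collections import defaultdict
from typing import Dict, Iterable, List


def index_by_network(
    rows: Iterable[Dict[str, str]],
    id_column: str = "Network ID",  # assumed column header
    separator: str = ",",           # assumed separator between IDs
) -> Dict[str, List[Dict[str, str]]]:
    """Map each network ID to the rows (disclosures or reports) that list it."""
    index = defaultdict(list)
    for row in rows:
        for network_id in row[id_column].split(separator):
            network_id = network_id.strip()
            if network_id:
                index[network_id].append(row)
    return dict(index)


# Placeholder rows for illustration only; not real database entries.
SAMPLE_CSV = '''Source,Network ID
Facebook,"FB-EE-0519, DFR-EE-0519-A"
DFR Lab,"DFR-EE-0519-A, FB-EE-0519"
DFR Lab,DFR-EE-0519-B
'''

rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
index = index_by_network(rows)

# Which sources have a row that lists this network ID?
sources = sorted({row["Source"] for row in index["DFR-EE-0519-A"]})
print(sources)
```

With a downloaded file, the `io.StringIO(SAMPLE_CSV)` argument would be replaced by a file object opened with `newline=""`, after checking the actual column headers in the file.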

FAQ

How did you select the organizations whose content you index in the Disinfodex database?

We included the disclosures of four major technology companies that have provided information about actions taken against disinformation campaigns on a regular basis since 2017. In addition, we are in the process of including open source investigators, selected with guidance from the Carnegie Endowment’s Partnership for Countering Influence Operations, who meet certain standards of methodology and transparency.

How often is the Disinfodex database updated?

We aim to include new updates within days of their release by the organizations whose content we index.

Who writes the content in the Disinfodex database?

All the content in the database comes directly from the entities that are indexed. Whether it is for raw numbers or full text (e.g. descriptions; naming countries or organizations…), we simply replicate the wording of the entity.

I have spotted a mistake in the database. How can I report it?

Thanks for spotting it. Please let us know by reaching out at teamdisinfodex@gmail.com.

What is your funding structure?

Disinfodex is a small-scale and mostly volunteer-led project. As of November 2020, our funding comes from the Miami Foundation and the Carnegie Endowment for International Peace’s Partnership for Countering Influence Operations. We have also benefited from support from the Harvard Berkman Klein Center’s Assembly program.