Description

These data accompany a report that explores social media platforms’ shortcomings in handling white supremacist speech, examines how that speech differs from general or nonextremist speech, and recommends ways to improve automated hate speech identification methods.
Data include 274,668 posts scraped from Stormfront and 509,982 comments collected from the Reddit API. The following files are included:
  • stormfront_posts.txt: one post per line, no post metadata
  • reddit_posts.txt: one comment per line, no comment metadata
  • stormfront_post_data_processed.json.gz: preprocessed posts from Stormfront, includes post metadata
  • reddit_sample.csv.gz: preprocessed comments from Reddit, includes comment metadata
Twitter data used in the report are not available for public reuse because of Twitter's terms of service and our data use agreement with VOX-Pol.
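
The files listed above could be loaded along the following lines. This is a minimal sketch rather than a documented loader. The helper names (read_lines, read_csv_gz, read_json_gz) are hypothetical, UTF-8 encoding is assumed, and because the layout of the JSON file is not described here, the JSON helper accepts either a single JSON document or one JSON object per line. No column or field names are assumed.

```python
import csv
import gzip
import json


def read_lines(path):
    """Return one string per line from a plain-text file (one post or comment per line)."""
    # Assumption: UTF-8 text; undecodable bytes are replaced rather than raising.
    with open(path, encoding="utf-8", errors="replace") as fh:
        return [line.rstrip("\n") for line in fh]


def read_csv_gz(path):
    """Return the rows of a gzip-compressed CSV as dicts keyed by the header row."""
    # newline="" lets the csv module handle line breaks inside quoted fields.
    with gzip.open(path, "rt", encoding="utf-8", newline="") as fh:
        return list(csv.DictReader(fh))


def read_json_gz(path):
    """Load a gzip-compressed JSON file.

    Assumption: the file is either a single JSON document or JSON Lines
    (one object per line); the actual layout is not stated in the description.
    """
    with gzip.open(path, "rt", encoding="utf-8") as fh:
        text = fh.read()
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        # Fall back to parsing each non-empty line as its own JSON value.
        return [json.loads(line) for line in text.splitlines() if line.strip()]
```

For example, read_lines("stormfront_posts.txt") would return a list of post strings, and read_csv_gz("reddit_sample.csv.gz") a list of row dicts. All three helpers read the whole file into memory, so a streaming variant may be preferable on constrained machines.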
