Description
These data were collected to explore social media platforms’ shortcomings in addressing white supremacist speech, to examine how that speech differs from general or nonextremist speech, and to recommend ways to improve automated hate speech identification methods.
Data include 274,668 posts scraped from Stormfront and 509,982 comments collected from the Reddit API. The following files are included:
- stormfront_posts.txt: one post per line, no post metadata
- reddit_posts.txt: one comment per line, no comment metadata
- stormfront_post_data_processed.json.gz: preprocessed posts from Stormfront, includes post metadata
- reddit_sample.csv.gz: preprocessed comments from Reddit, includes comment metadata
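The files listed above could be read along the following lines. This is a minimal sketch, not an official loader: the helper names `read_posts` and `peek_gzip` are hypothetical, UTF-8 encoding is an assumption, and because the internal layout of the .json.gz and .csv.gz files is not documented here, the second helper only previews the decompressed text rather than parsing it.

```python
import gzip


def read_posts(path):
    """Yield one post per line from a plain-text file, skipping blank lines.

    Assumes UTF-8; undecodable bytes are replaced rather than raising.
    """
    with open(path, encoding="utf-8", errors="replace") as fh:
        for line in fh:
            text = line.rstrip("\n")
            if text.strip():
                yield text


def peek_gzip(path, n_chars=500):
    """Return the first n_chars of a gzip-compressed text file.

    Useful for checking the layout (for example JSON array vs. one JSON
    object per line, or the CSV header row) before choosing a parser.
    """
    with gzip.open(path, "rt", encoding="utf-8", errors="replace") as fh:
        return fh.read(n_chars)
```

For example, `sum(1 for _ in read_posts("stormfront_posts.txt"))` counts the non-empty lines in the plain-text file, and `print(peek_gzip("reddit_sample.csv.gz"))` shows the start of the compressed CSV so that its columns can be inspected before loading it in full.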