This text dataset contains 3,481 social media user comments posted in response to political news posts and videos on Twitter, YouTube, and Reddit in August 2021. It also includes MTurk workers’ annotations of these comments as hateful, offensive, and/or toxic, and codes assigned by researchers that describe various rhetorical dimensions of the comments.
Details
Title
HOT Speech: Comments from Political News Posts and Videos that were Annotated for Hateful, Offensive, and Toxic Content
Creator
Wu, Siqi: Data Collector (University of Michigan School of Information)
Schöpke-Gonzalez, Angela: Researcher (University of Michigan School of Information)
Kumar, Sagar: Researcher (Northeastern University)
Hemphill, Libby: Supervisor (University of Michigan School of Information)
Resnick, Paul: Principal Investigator (University of Michigan School of Information)
Access Rights
This dataset has two levels of access: Public and Login Required.
Public files can be downloaded directly from the dataset record. Typically, documentation files such as READMEs and codebooks are made public.
Login Required files can be downloaded directly from the dataset record, once you have created and logged in to your SOMAR account. By downloading data files you agree to ICPSR’s Terms of Use.
Funding Information
University of Michigan School of Information
Citation
Wu, Siqi, Schöpke-Gonzalez, Angela, Kumar, Sagar, Hemphill, Libby, and Resnick, Paul. HOT Speech: Comments from Political News Posts and Videos that were Annotated for Hateful, Offensive, and Toxic Content. Inter-university Consortium for Political and Social Research [distributor], 2023-04-20. https://doi.org/10.3886/45fc-9c8f
Collection Modes
application programming interface (API)
web scraping
Data Formats
text
Additional Notes
collectionmethods.pdf describes how we collected and assembled this dataset.
dataset.csv contains all 3,481 comments, MTurk worker annotations, and researcher codes where applicable.
codebook.csv contains detailed descriptions of each code and all possible values for each code.
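As a starting point for working with these files, the following is a minimal Python sketch for reading dataset.csv and tallying the values in one column. It makes several assumptions that this record does not confirm: that the file has a header row, that it is UTF-8 encoded, and that it has been downloaded to the working directory. The column names are not listed here, so any name passed to value_counts is hypothetical and should be checked against codebook.csv.

```python
import csv
from collections import Counter


def load_rows(path):
    """Read a CSV file into a list of dicts keyed by the header row.

    Assumes a header row and UTF-8 text; "utf-8-sig" also tolerates
    a leading byte-order mark if one is present.
    """
    with open(path, newline="", encoding="utf-8-sig") as f:
        return list(csv.DictReader(f))


def value_counts(rows, column):
    """Count how often each value appears in the given column."""
    return Counter(row[column] for row in rows)


# Example usage (not run here; requires the downloaded file):
#   rows = load_rows("dataset.csv")
#   print(len(rows))                # number of comment rows read
#   print(list(rows[0].keys()))     # actual column names in the file
#   # "some_code" is a hypothetical column name; see codebook.csv
#   print(value_counts(rows, "some_code"))
```

Printing the column names first and comparing them with codebook.csv is the safer order of operations, since the codebook defines each code and its possible values.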