The COVID2020 dataset is a new, high-volume COVID-19 tweet dataset. It was collected from March 2020 to November 2020, covering eight months in the first year of the pandemic. The list of tracked COVID-19 keywords is taken from "Emily Chen, Kristina Lerman, and Emilio Ferrara. 2020. Tracking Social Media Discourse about the COVID-19 Pandemic: Development of a Public Coronavirus Twitter Data Set. JMIR Public Health and Surveillance (2020)". The keywords include not only generic terms such as "corona virus" and "covid", but also terms related to non-pharmaceutical interventions, such as "lockdown", "n95", and "social distancing".
This dataset consists of tweet IDs.
Title
The Shapes of the Fourth Estate During the Pandemic: Profiling COVID-19 News Consumption in Eight Countries
Variant Title
COVID2020 Twitter dataset
Issued Date
2023-08-03
Version
v1
Status
Published
Funding Information
Air Force Office of Scientific Research, FA2386-20-1-4064
Citation
Wu, Siqi, Yang, Cai, and Xie, Lexing. The Shapes of the Fourth Estate During the Pandemic: Profiling COVID-19 News Consumption in Eight Countries. Inter-university Consortium for Political and Social Research [distributor], 2023-08-03. https://doi.org/10.3886/jy2x-qc10
Time Period
2020-03 to 2020-11
Collection Date
2020-03 to 2020-11
Geographic Coverage
United States
United Kingdom
Australia
Canada
Germany
Spain
France
Turkey
Platform
Twitter
Collection Modes
application programming interface (API)
Data Formats
web platform data
text
Purpose
We want to answer two questions about news consumption on Twitter during the pandemic:
RQ1: For a given country, in terms of political leaning, what is the breadth of readership for the media outlets there?
RQ2: For a given media outlet, in terms of political leaning, how does its audience profile vary across different countries?
Design
Data were collected from the Twitter filtered streaming API.
To reduce the computational load, we processed one week of data out of every two weeks. The resulting dataset covers 18 calendar weeks over an eight-month period (March to November, or week 13 to week 47 of 2020). Server glitches during data collection caused the loss of two entire weeks (weeks 27 and 29) and another four days (in weeks 17 and 31). In total, we obtained 999,040,035 COVID-19 tweets posted by 62,687,121 users.
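The alternate-week schedule described above can be sketched in a few lines of Python. This is only an illustration, not the authors' collection code. It assumes that the sampled weeks are every second week from week 13 to week 47 (which yields the stated 18 weeks) and that the week numbers follow the ISO 8601 calendar; the record does not state which week-numbering convention was used. All variable and function names are hypothetical.

```python
from datetime import date

# Assumption: every second week from week 13 to week 47 of 2020 was processed.
FIRST_WEEK, LAST_WEEK, STEP = 13, 47, 2
sampled_weeks = list(range(FIRST_WEEK, LAST_WEEK + 1, STEP))

# Weeks reported in the record as entirely lost or partially lost.
fully_lost_weeks = {27, 29}
partially_lost_weeks = {17, 31}


def week_start(year: int, week: int) -> date:
    """Return the Monday of the given week, assuming ISO 8601 week numbering."""
    return date.fromisocalendar(year, week, 1)


# Totals reported in the record.
TOTAL_TWEETS = 999_040_035
TOTAL_USERS = 62_687_121
tweets_per_user = TOTAL_TWEETS / TOTAL_USERS

for week in sampled_weeks:
    if week in fully_lost_weeks:
        status = "lost"
    elif week in partially_lost_weeks:
        status = "partial"
    else:
        status = "ok"
    print(f"week {week:2d}  starts {week_start(2020, week)}  {status}")

print(f"average tweets per user: {tweets_per_user:.2f}")
```

Under these assumptions, the schedule spans late March to mid-November 2020, and the reported totals work out to roughly 16 tweets per user on average.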
Universe
Geolocated Twitter users in the following countries: United States, United Kingdom, Australia, Canada, Germany, Spain, France and Turkey
Sampling
nonprobability.availability