Have questions about SOMAR's data? See the following resources:
The Social Media Archive (SOMAR) at the Inter-university Consortium for Political and Social Research (ICPSR) is a collection of public and restricted data from various social media platforms, organized and stored for research and analysis purposes. By making these data available to the community, SOMAR aims to help researchers and community members better understand social media behavior and trends. In addition, the data can inform the development of new technologies and services.
SOMAR will provide access to social media data and develop a robust set of wraparound services, including training in social media data use and learning opportunities for the community.
Public data is information available to anyone without any restrictions on access.
Restricted data is information not available to the general public and may only be accessed by authorized users. To gain access to this type of data, a data user must complete a restricted data application.
The SOMAR project will democratize access to some of the most consequential information in contemporary society. By providing a reliable, unbiased resource to data users everywhere, ICPSR and SOMAR foster clarity and transparency during a time in which these qualities seem ever scarcer. Much of SOMAR's data will be available through approved restricted data applications, and the data will be accessed through a virtual data enclave.
The privacy of users is a top priority for SOMAR. Therefore, all data in the archive are de-identified, meaning they have been stripped of any information that could be used to identify individual users. In addition, access to restricted data is tightly controlled, and only authorized users who have agreed to strict confidentiality terms are granted access. Please visit our privacy policy for more information.
ICPSR and SOMAR are experienced in handling data with the utmost confidentiality and privacy. Stringent protections are in place for securing and accessing sensitive data and ensuring that any analyses of SOMAR data do not reveal sensitive information about individuals. This attention to ethical data use is irreplaceable when it comes to the data of millions of social media users.
Information on obtaining restricted data can be found on each study's home page. When you click the "Apply for Restricted Data" button, you will find instructions for accessing and preparing the restricted data application specific to that dataset.
Before you begin filling out the paperwork, you should know the following:
SOMAR does not discourage the use of restricted data. In fact, we've put a lot of effort into building systems that make these data available to data users. We are, however, very serious about protecting respondent confidentiality and ensuring that sensitive data are used appropriately.
Approved data users access the data via a remote desktop connection called the Virtual Data Enclave (VDE). Data users cannot move files from the remote desktop to their own computer or the Internet. To receive output from the VDE, data users must request that ICPSR conduct a disclosure review on the desired files.
Proper citation ensures that research data can be discovered, reused, replicated for verification, credited for recognition, and tracked to measure usage and impact.
Citing data is straightforward. Each citation must include the essential elements that allow a unique dataset to be identified over time:
Here are some examples of ICPSR data citations:
Barnes, Samuel H. Italian Mass Election Survey, 1968. Ann Arbor, MI: Inter-university Consortium for Political and Social Research [distributor], 1992-02-16. https://doi.org/10.3886/ICPSR07953.v1
Schneider, Barbara, and Waite, Linda J. The 500 Family Study [1998-2000: United States]. Ann Arbor, MI: Inter-university Consortium for Political and Social Research [distributor], 2008-06-03. https://doi.org/10.3886/ICPSR04549.v1
For more information, see ICPSR's Citing Data web page.
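The pattern shared by the two example citations above can be sketched in code. The following is a minimal illustration only, not an ICPSR or SOMAR tool: the `DataCitation` class, its field names, and the `render` method are hypothetical, and the layout is inferred solely from the examples shown here.

```python
from dataclasses import dataclass


@dataclass
class DataCitation:
    """Hypothetical container for the parts visible in the example citations."""
    authors: str        # e.g. "Barnes, Samuel H."
    title: str          # study title, including any bracketed scope note
    distributor: str    # organization that distributes the data
    release_date: str   # date as printed in the citation (YYYY-MM-DD)
    doi: str            # DOI without the https://doi.org/ prefix
    place: str = "Ann Arbor, MI"

    def render(self) -> str:
        # Drop a trailing period from the author string so that an initial
        # such as "H." does not produce a doubled period.
        authors = self.authors.rstrip(".")
        return (
            f"{authors}. {self.title}. {self.place}: "
            f"{self.distributor} [distributor], {self.release_date}. "
            f"https://doi.org/{self.doi}"
        )


citation = DataCitation(
    authors="Barnes, Samuel H.",
    title="Italian Mass Election Survey, 1968",
    distributor="Inter-university Consortium for Political and Social Research",
    release_date="1992-02-16",
    doi="10.3886/ICPSR07953.v1",
)
print(citation.render())
```

Run as written, this prints the first example citation. For an actual publication, copy the citation supplied with the dataset rather than generating one.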
While SOMAR is still in its early stages, there are already social media data held at ICPSR, which will be cross-listed when SOMAR is up and running. The datasets include:
SOMAR has been made possible by a $41,000 Propelling Original Data Science grant from the Michigan Institute for Data Science, called "Ensuring FAIRness in Social Media Archives". In 2022, Meta provided a $1.3 million gift to support SOMAR's vision and help build the archive so that it continues to exist and support research for years to come. Other funders have an opportunity to get involved, and potential supporters are encouraged to reach out to the ISR Development team.
SOMAR project lead Libby Hemphill directs the Resource Center for Minority Data at ICPSR and holds a joint appointment as an associate professor at the U-M School of Information. You may also contact the SOMAR team at somar-help@umich.edu.
SOMAR accepts data deposits from researchers and will build data-sharing partnerships with social media companies. Institutions are encouraged to contact ISR Development Director Henry Jewell to join the movement to democratize social media data. In addition, individual PIs are encouraged to email the SOMAR team at somar-help@umich.edu.