000000017 001__ 17 000000017 005__ 20240801202639.0 000000017 02470 $$2DOI 000000017 037__ $$aADMIN 000000017 245__ $$aMapping Language Literacy At Scale: A Case Study on Facebook 000000017 246__ $$aOnline Language Literacy Estimates (OLLE) Dataset 000000017 251__ $$av1 000000017 269__ $$a2023-04-07 000000017 336__ $$aDataset 000000017 500__ $$aLiteracy estimates are calculated for each geographically bounded community (e.g., a county or a region) with at least 1000 active users observed in the study period. Gender- or region-disaggregated population-level estimates likewise require a minimum of 1000 active users observed in the study period in each disaggregated group. 000000017 510__ $$aLin, Yu-Ru, Wu, Shaomei, and Mason, Winter. Mapping Language Literacy At Scale: A Case Study on Facebook. Inter-university Consortium for Political and Social Research [distributor], 2023-04-07. [DOI forthcoming] 000000017 520__ $$aLiteracy is one of the most fundamental skills for people to access and navigate today’s digital environment. This work systematically studies the language literacy skills of online populations for more than 160 countries and regions across the world, including many low-resourced countries where official literacy data are particularly sparse. Leveraging public data on Facebook, we develop a population-level literacy estimate for the online population that is based on aggregated and de-identified public posts written by adult Facebook users globally, significantly improving both the coverage and resolution of existing literacy tracking data. We found that, on Facebook, women collectively show higher language literacy than men in many countries, but substantial gaps remain in Africa and Asia. 
Further, our analysis reveals a considerable regional gap within a country that is associated with multiple socio-technical inequalities, suggesting an “inequality paradox”, in which the online language skill disparity interacts with offline socioeconomic inequalities in complex ways. These findings have implications for global women’s empowerment and socioeconomic inequalities. This dataset contains the replication data for the following paper: Lin, Yu-Ru, Wu, Shaomei, and Mason, Winter (in press). Mapping Language Literacy At Scale: A Case Study on Facebook. EPJ: Data Science. 000000017 540__ $$aThis dataset has two levels of access: Public and Login Required. <ul> <li>Public files can be downloaded directly from the dataset record. Typically, documentation files such as READMEs and codebooks are made public.</li> <li>Login Required files can be downloaded directly from the dataset record after you create and log in to your SOMAR account. By downloading data files, you agree to <a href="https://socialmediaarchive.org/pages/?page=Terms%20of%20Use&ln=en">ICPSR’s Terms of Use</a>.</li> </ul> 000000017 650__ $$aliteracy 000000017 650__ $$ainternet use 000000017 650__ $$ainequality 000000017 655__ $$aaggregate data 000000017 655__ $$atext 000000017 655__ $$aobservational data 000000017 720__ $$aLin, Yu-Ru $$eData Collector$$uUniversity of Pittsburgh$$7Personal 000000017 720__ $$aWu, Shaomei $$eData Collector$$uAImpower$$7Personal 000000017 720__ $$aMason, Winter$$eData Collector$$uMeta Platforms, Inc$$7Personal 000000017 791__ $$tLin, Yu-Ru, Wu, Shaomei, and Mason, Winter (in press). Mapping Language Literacy At Scale: A Case Study on Facebook. 
EPJ: Data Science.$$aJournal Article$$eIs Source Of$$2DOI$$whttps://doi.org/10.48550/arXiv.2303.12179 000000017 8564_ $$yThis is the final dataset that summarizes statistics for all 167 countries, including offline data (e.g., internet penetration) and on-platform data (e.g., average number of “Big Words” per post)$$9c81fd589-18d2-4fa9-b31f-ba62e1451508$$s24176$$uhttps://socialmediaarchive.org/record/17/files/all_country_measures.csv 000000017 8564_ $$yIncludes a description of all the data files$$9d7e70a9c-b02c-4b86-a58a-b04cfc5d9d4f$$s8208$$uhttps://socialmediaarchive.org/record/17/files/Data%20Summary.docx 000000017 8564_ $$yBreakdown of the OLLE measure by gender for each country, with relevant country-level covariates$$9a6dc4df5-7a39-4609-a79e-8c54eefebbb4$$s20752$$uhttps://socialmediaarchive.org/record/17/files/olle_gender_covariates.csv 000000017 8564_ $$yRegional variation of OLLE within each country, with relevant covariates$$9c96b59da-29d2-48f6-bcbd-511725a8042d$$s5186$$uhttps://socialmediaarchive.org/record/17/files/olle_subregional_covariates.csv 000000017 8564_ $$ySummary statistics about the UN subregions$$9cc59cb60-ae0a-46cf-8df7-678f56b1304f$$s1297$$uhttps://socialmediaarchive.org/record/17/files/summary_7region_OLLE.csv 000000017 8564_ $$yAnalysis code and data files$$9131e1202-329c-415b-9ac0-e90e63797736$$s96525$$uhttps://socialmediaarchive.org/record/17/files/mapping-language-literacy-data.zip 000000017 906__ $$a2020-04-20$$b2020-05-20 000000017 908__ $$aFacebook 000000017 910__ $$auser behavior tracking 000000017 911__ $$aThese data were collected as part of the first large-scale study of online language literacy (here, literacy means verbal acuity, i.e., the ability to use a diverse vocabulary) and its implications for 167 nations around the globe; the study introduces a new global measure that estimates online language literacy skills as captured on Facebook. 
This measure makes it possible to examine differences in language literacy at an unprecedented level of coverage and resolution, especially for low-income countries and regions. 000000017 912__ $$aTo obtain a written sample of online populations worldwide, we collected public posts, written in any of the 12 selected languages, by Facebook users who were at least 18 years old and active during the 30-day period from April 20 to May 20, 2020. We measure the aggregated use of lower-frequency words (“LoFF words”), i.e., secondary vocabulary words outside the high-frequency everyday lexicon, in public Facebook posts as a proxy for online populations’ literacy skills in a given language. Analyses are conducted by gender and region. Please see the corresponding paper and supplementary materials for more details about data collection and methodology. 000000017 913__ $$aData represent aggregate information derived from public posts on Facebook by individuals in 167 countries writing in 12 languages. The 12 languages are Arabic, German, English, Spanish, French, Italian, Malay, Dutch, Portuguese, Russian, Turkish, and Chinese. 000000017 921__ $$ageographic unit 000000017 921__ $$apolitical administrative area 000000017 921__ $$agroup 000000017 925__ $$av2 000000017 926__ $$aR 000000017 927__ $$dRequired packages are listed at the top of lit_utils_v2.R 000000017 928__ $$aSeven .RData files are required as input to this software. Files and software are available in the mapping-language-literacy-data.zip folder: <ul> <li>df_covid_vars</li> <li>df_gender_trans_vars</li> <li>df_region_trans_vars</li> <li>df_trans_vars</li> <li>df_vis</li> <li>df_word_freq</li> <li>valid_countries</li> </ul> 000000017 929__ $$aThe R script lit_utils_v2.R defines a set of functions that can be used to plot and analyze the data files. 000000017 980__ $$aFacebook 000000017 980__ $$aDatasets 000000017 981__ $$aPublished
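The record describes the literacy proxy only at a high level: aggregated use of lower-frequency ("LoFF") words in public posts, reported per community only when at least 1000 active users were observed. The Python sketch below illustrates that logic under stated assumptions and is not the authors' method (their analysis code is the R script lit_utils_v2.R). The input tuple format, the high-frequency lexicon, the letter-only tokenizer, and the choice of "average LoFF words per post" (echoing the "average number of 'Big Words' per post" described for all_country_measures.csv) are all hypothetical. The tokenizer also would not segment Chinese, which has no spaces between words.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, Optional, Set, Tuple

# Hypothetical input record: (community_id, user_id, post_text)
Post = Tuple[str, str, str]

# Minimum number of distinct active users per community, as stated in the dataset notes.
MIN_ACTIVE_USERS = 1000

# Assumption: a token is a run of Unicode letters (no digits or underscores).
TOKEN_RE = re.compile(r"[^\W\d_]+")


def count_loff_words(text: str, high_freq_lexicon: Set[str]) -> int:
    """Count tokens that fall outside an assumed high-frequency lexicon."""
    tokens = TOKEN_RE.findall(text.lower())
    return sum(1 for tok in tokens if tok not in high_freq_lexicon)


def community_estimates(
    posts: Iterable[Post],
    high_freq_lexicon: Set[str],
    min_users: int = MIN_ACTIVE_USERS,
) -> Dict[str, Optional[float]]:
    """Average LoFF words per post for each community.

    Communities with fewer than `min_users` distinct users are suppressed (None),
    mirroring the minimum-sample rule in the record.
    """
    loff_totals: Dict[str, int] = defaultdict(int)
    post_counts: Dict[str, int] = defaultdict(int)
    users: Dict[str, Set[str]] = defaultdict(set)

    for community, user, text in posts:
        loff_totals[community] += count_loff_words(text, high_freq_lexicon)
        post_counts[community] += 1
        users[community].add(user)

    estimates: Dict[str, Optional[float]] = {}
    for community, n_posts in post_counts.items():
        if len(users[community]) < min_users:
            estimates[community] = None  # too few active users to report
        else:
            estimates[community] = loff_totals[community] / n_posts
    return estimates
```

For a quick check, `min_users` can be lowered (e.g., `community_estimates(posts, lexicon, min_users=3)`) so that a toy sample passes the threshold; applied to real data, the default of 1000 would apply. The same suppression rule would carry over to gender- or region-disaggregated groups by keying the dictionaries on (community, group) instead of community alone.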