Enhancing FAIR compliance: A controlled vocabulary for mapping Social Sciences survey variables

Authors

J. Saldanha Bach Estevao, C.-P. Klas

DOI: https://doi.org/10.29173/iq1118

Keywords: Controlled vocabularies, longitudinal surveys, knowledge graphs, survey variables - Social Sciences

Abstract

In Social Sciences surveys, the relationships among survey instruments and study entities such as questionnaires, variables, questions, and response formats evolve over time. When reusing variables in survey design, researchers may need to modify variable attributes such as labels or names, question wording, or response scales. Making these relationships explicit across waves and studies is therefore necessary to track how variables relate to one another. Although standards such as the Data Documentation Initiative – Lifecycle (DDI-LC) and DataCite model these relationships, they fall short of capturing the complexity of variable relationships. The DDI Alliance Controlled Vocabulary for Commonality Type uses codes such as 'identical', 'some', and 'none' to describe changes in entities like variables. However, this approach cannot disambiguate these relationships, because the codes do not specify which variable attributes have changed. To bridge this gap, we introduce the GESIS Controlled Vocabulary (CV) for Variables in Social Sciences Research Data. The CV is designed to enhance semantic interoperability across organizations and systems. Explicit relationships facilitate harmonization across study waves, enrich data reuse, and support advanced search and browse functionality. Published via the CESSDA vocabulary manager, the CV aims to help build a semantically rich, interconnected knowledge graph tailored to Social Science research. This work aligns with the FAIR data principles and aims to foster a more integrated and accessible research landscape.
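The gap described above can be illustrated with a small Python sketch, assuming a variable is reduced to four attributes (`name`, `label`, `question_text`, `response_scale`). The `commonality` function collapses a comparison into a coarse 'identical'/'some'/'none' code in the spirit of the DDI Commonality Type vocabulary; the rule it uses (all, some, or no attributes shared) is an assumption for illustration, not the DDI definition. The `changed_attributes` function keeps the attribute-level detail. The class, functions, and example values are hypothetical and are not taken from the GESIS CV.

```python
from dataclasses import dataclass, fields, replace


@dataclass(frozen=True)
class VariableVersion:
    """Hypothetical, simplified view of one variable in one survey wave."""
    name: str
    label: str
    question_text: str
    response_scale: tuple[str, ...]


def commonality(a: VariableVersion, b: VariableVersion) -> str:
    """Coarse summary in the spirit of DDI Commonality Type (assumed rule)."""
    attrs = [f.name for f in fields(VariableVersion)]
    shared = [n for n in attrs if getattr(a, n) == getattr(b, n)]
    if len(shared) == len(attrs):
        return "identical"
    return "some" if shared else "none"


def changed_attributes(a: VariableVersion, b: VariableVersion) -> list[str]:
    """Attribute-level view: names of the attributes that differ."""
    return [f.name for f in fields(VariableVersion)
            if getattr(a, f.name) != getattr(b, f.name)]


# Hypothetical example: one variable reused in a later wave, changed in two ways.
wave1 = VariableVersion(
    name="v12",
    label="Trust in government",
    question_text="How much do you trust the government?",
    response_scale=("1", "2", "3", "4"),
)
relabelled = replace(wave1, label="Government trust")            # only the label changes
rescaled = replace(wave1, response_scale=("1", "2", "3", "4", "5"))  # only the scale changes

print(commonality(wave1, relabelled), changed_attributes(wave1, relabelled))
# some ['label']
print(commonality(wave1, rescaled), changed_attributes(wave1, rescaled))
# some ['response_scale']
```

Both changes collapse to 'some', while the attribute-level view separates a relabelled variable from a rescaled one, which is the kind of distinction the abstract argues coarse commonality codes cannot express.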

Author Biography

Claus-Peter Klas

Dr. Claus-Peter Klas is Team Leader of "Data & Service Engineering" in the department "Knowledge Technologies for the Social Sciences" at GESIS – Leibniz Institute for the Social Sciences and Measure Lead in the NFDI consortium KonsortSWD. He received his PhD in computer science from the University of Duisburg-Essen and was a postdoctoral researcher in the Department of Multimedia and Internet Applications, Faculty of Mathematics and Computer Science, University of Hagen, Germany. His research focuses on information retrieval, interactive information retrieval, information systems, databases, digital libraries, preservation, and grid and cloud architectures. He developed the software Daffodil within a national research project and worked in national and European research projects such as The European Film Gateway, SHAMAN (Sustaining Heritage Access through Multivalent ArchiviNg), and Smart Vortex (Scalable Semantic Product Data Stream Management for Collaboration and Decision Making in Engineering). He is currently responsible for several infrastructure projects within GESIS, such as da|ra, SowiDataNet, and Missy, all of which provide information and data for social scientists. In addition, he leads the measure PID Services in the national research infrastructure NFDI. His team is developing an open-source DDI suite to support putting DDI into operation.

Published: 2024-06-26

How to Cite

Saldanha Bach Estevao, J., & Klas, C.-P. (2024). Enhancing FAIR compliance: A controlled vocabulary for mapping Social Sciences survey variables. IASSIST Quarterly, 48(2). https://doi.org/10.29173/iq1118