IASSIST Quarterly https://iassistquarterly.com/index.php/iassist <p class="p1">The <strong>IASSIST Quarterly</strong> at https://iassistquarterly.com is an international, peer-reviewed, indexed, open access quarterly publication of articles dealing with social science information and data services, including relevant societal, legal, and ethical issues.</p> <p class="p1">The <strong>IASSIST Quarterly</strong> represents an international cooperative effort on the part of individuals managing, operating, or using machine-readable data archives, data libraries, and data services. The <strong>IASSIST Quarterly </strong>reports on activities related to the production, acquisition, preservation, processing, distribution, and use of machine-readable data carried out by its members and others in the international social science community. </p> International Association for Social Science Information Service and Technology en-US IASSIST Quarterly 0739-1137 <p>This license lets others remix, tweak, and build upon your work non-commercially, and although their new works must also acknowledge you and be non-commercial, they don’t have to license their derivative works on the same terms.</p> <p>The Creative Commons-Attribution-Noncommercial License 4.0 International applies to all works published by IASSIST Quarterly. Authors will retain copyright of the work. Your contribution will be available at the IASSIST Quarterly website when announced on the IASSIST list server.</p> Engineering a machine learning pipeline for automating metadata extraction from longitudinal survey questionnaires https://iassistquarterly.com/index.php/iassist/article/view/1023 <p>Data Documentation Initiative-Lifecycle (DDI-L) introduced a robust metadata model to support the capture of questionnaire content and flow and, through support for versioning and for provenance objects such as BasedOn, encouraged the reuse of existing question items. 
However, the dearth of questionnaire banks including both question text and response domains has meant that an ecosystem to support the development of DDI-ready Computer Assisted Interviewing (CAI) tools has been limited. Archives hold the information in PDFs associated with surveys, but extracting it efficiently into DDI-Lifecycle is a significant challenge.</p> <p>While CLOSER Discovery has been championing the provision of high-quality questionnaire metadata in DDI-Lifecycle, this has primarily been done manually. More automated methods need to be explored to ensure scalable metadata annotation and uplift.</p> <p>This paper presents initial results in engineering a machine learning (ML) pipeline to automate the extraction of questions from survey questionnaires held as PDFs. Using CLOSER Discovery as a ‘training and test dataset’, a number of machine learning approaches have been explored to classify parsed text from questionnaires to be output as valid DDI items for inclusion in a DDI-L compliant repository.</p> <p>The developed ML pipeline adopts a continuous build and integrate approach, with processes in place to keep track of various combinations of the structured DDI-L input metadata, ML models and model parameters against the defined evaluation metrics, thus enabling reproducibility and comparative analysis of the experiments. Tangible outputs include a map of the various metadata and model parameters with the corresponding evaluation metrics’ values, which enable model tuning as well as transparent management of data and experiments.</p> Suparna De Harry Moss Jon Johnson Jenny Li Haeron Pereira Sanaz Jabbari Copyright (c) 2022 Suparna De, Harry Moss, Jon Johnson, Jenny Li, Haeron Pereira, Sanaz Jabbari https://creativecommons.org/licenses/by-nc/4.0 2022-03-28 2022-03-28 46 1 10.29173/iq1023 A tool to promote research planning and conceptualization: SoDaNet research infrastructure’s scientific dictionary of social terms 
https://iassistquarterly.com/index.php/iassist/article/view/1021 <p>This article examines the contribution of SoDaNet research infrastructure’s Scientific Dictionary of Social Terms to empirical social research. The article records the dictionary’s functional specifications regarding terms, definitions and bibliographic records, and analyzes the management of user access in relation to the basic functions (search, import, modification and deletion of digital content). In addition, the article analyzes the functions of the dictionary as a tool for research planning (providing opportunities to search for the scientific information necessary to design new research), conceptualization (providing access to the different meanings of a term through the different definitions given) and scientific documentation. Finally, the function of the dictionary as an element of a research infrastructure is evaluated.</p> Ioannis Kallas Dimitra Kondyli Copyright (c) 2022 Ioannis Kallas, Dimitra Kondyli https://creativecommons.org/licenses/by-nc/4.0 2022-03-28 2022-03-28 46 1 10.29173/iq1021 Open geospatial data: A comparison of data cultures in local government https://iassistquarterly.com/index.php/iassist/article/view/1013 <p>Public geospatial data (geodata) is created at all levels of government, including federal, state, and local (county and municipal). Local governments, in particular, are critical sources of geodata because they produce foundational datasets, such as parcels, road centerlines, address points, land use, and elevation. These datasets are sought after by other public agencies for aggregation into state and national frameworks, by researchers for analysis, and by cartographers to serve as base map layers. Despite the importance of this data, policies about whether it is free and open to the public vary from place to place. 
As a result, some regions offer hundreds of free and open datasets to the public, while their neighbors may have zero, preferring to restrict them due to privacy, economic, or legal concerns.</p> <p>Minnesota relies on an approach that allows counties to choose for themselves if their geodata is free and open. By contrast, its neighboring state of Wisconsin has passed legislation requiring that specific foundational geospatial datasets created by counties must be freely available to the public. This paper compares the implications and outcomes of these diverging data cultures.</p> Karen Majewicz Jaime Martindale Melinda Kernik Copyright (c) 2022 Karen Majewicz, Jaime Martindale, Melinda Kernik https://creativecommons.org/licenses/by-nc/4.0 2022-03-28 2022-03-28 46 1 10.29173/iq1013 Openness in metadata, dictionaries, and data https://iassistquarterly.com/index.php/iassist/article/view/1034 Karsten Boye Rasmussen Copyright (c) 2022 Karsten Boye Rasmussen https://creativecommons.org/licenses/by-nc/4.0 2022-03-28 2022-03-28 46 1 10.29173/iq1034