Research Data Management Tools and Workflows: Experimental Work at the University of Porto

  • Cristina Ribeiro
  • João Rocha da Silva
  • João Aguiar Castro
  • Ricardo Carvalho Amorim
  • João Correia Lopes
  • Gabriel David


Research datasets include all kinds of objects, from web pages to sensor data, and originate in every domain. For data generated in large projects and well-funded research areas, the main concerns are exploration and analysis. For data in the long tail, the main issues are still how to make data visible, satisfactorily described, preserved, and searchable.

Our work aims to promote data publication in research institutions, on the premise that researchers are the core stakeholders and need straightforward workflows, and that multi-disciplinary tools can be designed and adapted to specific areas with reasonable effort. For small groups with interesting datasets but little time or funding for data curation, we have to focus on engaging researchers in the process of preparing data for publication, while providing them with measurable outputs. In larger groups, solutions have to be customized to satisfy the requirements of more specific research contexts.

We describe our experience at the University of Porto in two lines of enquiry. For the work with long-tail groups, we propose general-purpose tools for data description and for interfacing with multi-disciplinary data repositories. For areas with larger projects and more specific requirements, namely wind infrastructure, sensor data from concrete structures, and marine data, we define specialized workflows. In both cases, we present a preliminary evaluation of results and an estimate of the effort required to keep the proposed infrastructures running.

The tools available to researchers can be decisive for their commitment. We focus on data preparation, namely on dataset organization and metadata creation. For groups in the long tail, we propose Dendro, an open-source research data management platform, and explore automatic metadata creation with LabTablet, an electronic laboratory notebook. For groups demanding a domain-specific approach, our analysis has resulted in the development of models and applications to organize the data and support some of their use cases. Throughout, we have adopted ontologies for metadata modeling, with a view to disseminating metadata as Linked Open Data.

Jul 18, 2018