Discussing GitLab as a central NFDI service for code and data

3 Apr 2024

DataPLANT participated in the RSE meeting on March 25th to present “GitLab based DataHUB for RDM: Learning from open source software development”. The research software engineers working group is part of the common infrastructure section of the NFDI and meets monthly to discuss software-related topics.

Collaboration among researchers, whether local or worldwide, ranges from single researchers through groups and labs up to cross-institutional and cross-disciplinary cooperation, and it requires suitable working environments. Data hubs in the form of science gateways, which usually go beyond locally shared storage resources, have been discussed and explored for quite some time.

The DataPLANT team found that Git and the GitLab framework meet the key needs for a data hub in RDM: versioning, group collaboration, and easy access management. Git provides the versioning, which makes it possible to track and undo changes within a repository. Git LFS is used to store repository data, especially large files.

While it was en vogue to propose and develop discipline-specific gateways, we suggest relying on well-established standard software frameworks instead. Research data, considered over the entire data life cycle together with closely related activities like annotation, versioning and sharing, has a lot in common with (open source) software development. GitLab itself adds fine-grained access management: users can form collaborative groups and manage them themselves, which makes collaboration across institutions easy and flexible.

Git and GitLab could play a major role in general research data management. From DataPLANT's point of view, GitLab as a science gateway would be a valuable addition to the NFDI software landscape in the form of a basic service. It would therefore be beneficial to address it as a joint service in the cross-domain activities of all interested consortia.