Research worldwide churns out petabytes (10¹⁵ bytes) of data every year, and it is in this scientific data, often generated as a by-product of research, that great wealth lies. Proper handling, security and documentation of this data are crucial for its exploitation and thus for science itself. In medicine, for example, a major problem is that institutions themselves often do not know what data already exists on a given problem, where to look for it and what exactly it represents. This is one of today's major societal challenges, emerging with the development of informatics and increasing investment in digitisation.
Experts from FIT CTU are working to make data more accessible and usable for scientists through projects focused on the goals of the FAIR initiative. FAIR is an acronym for the initiative's aim: to make data findable, accessible, interoperable and reusable. One of the team's achievements is the Data Stewardship Wizard (DSW), a data management planning tool.
"We developed the DSW tool at FIT CTU in cooperation with Dutch colleagues within the ELIXIR infrastructure. It is groundbreaking in the sense that it facilitates data management planning for scientists, which is now required by all public funders. No scientist would start a challenging experiment without proper data management planning, which is often sold as a 'annoiyng duty'. One reason for this is the difficulty of creating a good plan. This tool provides an easy and efficient way to create good data management plans, guides the researcher, helps to use what is available for research and maximises the effect of research data, bringing value to researchers, institutions and society as a whole," says doc. Ing. Robert Pergl, Ph.D., Head of the Centre for Conceptual Modelling and Implementation (CCMi) at FIT CTU. DSW is a tool that is now mostly used in natural sciences, but it is also ready for application in other scientific fields. It is used not only for planning itself, but also for data management education.
"The FAIR principles significantly improve the reusability of data, especially with regard to linking data to each other, for example, linking data from clinical drug trials to genetics research. Effective use of data at a global level is also crucial for effective management of epidemiological situations, as demonstrated by the COVID-19 pandemic. In this context, we have been involved with the DSW project in the digitisation and FAIRification of patient data that can be used in further research, in a completely anonymised way," adds doc. Pergl on the wide applicability of DSW for current challenges in the society .
One of the goals of the FAIR initiative is better machine processability of data, which is important for artificial intelligence (AI) research. If AI systems have access to more well-described and interpretable datasets, they can interpret them more reliably and use them effectively to improve their results. It is the exactness and auditability of AI outputs that is key to moving from "creative AI" to "exact AI".
DSW was created within ELIXIR, the international infrastructure for data in the life sciences, in cooperation between the Dutch node (ELIXIR-NL) and the Czech node (ELIXIR-CZ), namely FIT CTU and the Institute of Organic Chemistry and Biochemistry of the Academy of Sciences. The project is funded from European and national sources, in particular from the Ministry of Education, Youth and Sports of the Czech Republic's projects for the support of large infrastructures. The follow-up LM project for 2023-2026 starts this year, and the development and deployment of DSW will continue within it. The ambition is to progressively integrate DSW with other tools that assist not only in planning but also in plan implementation, further fulfilling the authors' vision of contributing to effective and efficient data management.