These recommendations were grouped into 4 areas:
Connecting Existing Infrastructures
The primary finding confirmed the view set out in the Statement of Need, that the current data landscape is fragmented, making data analysis and reuse unduly complex, and that physical sciences research would be greatly accelerated by more integration of the systems that handle data. This would enable researchers not only to undertake their own analyses more effectively, but also to make their data products available as inputs for further research.
There is a strong view in the community that the major need is for a data infrastructure that connects existing systems, widening their applicability and adding value through aggregation, rather than for the development of new functions. Such integration would support data workflows, enabling researchers to concentrate on their science rather than spend time on data management activities.
Stakeholders were understandably concerned that a data infrastructure must be trustworthy and enduring: without assurance of its longevity, researchers would be reluctant to invest the time required to engage with a system that may be temporary or fail to gain traction as key infrastructure.
R1
R2
R3
Best Use of Data
There is a need to open up data for reuse and aggregation into collections that add value, and to link up with data sources from other domains for cross-disciplinary, multiscale modelling and multimodal research. It should be possible to readily access provenanced data, including reference quality data, and secondary data underpinning publications. Availability of data should support reproducibility and validation of research, in addition to application in further research including machine learning and AI.
There is also a crucial need to establish better data-level connectivity across the pillars, particularly bridging between experimental and computational activities.
For an integrated, distributed physical sciences data landscape to be realised, some new connecting functionality will need to be developed. The new infrastructure should support the overarching principles of data being as open and FAIR as possible, and drive international collaborations and interdisciplinary research through the use of open standards.
R4
R5
- reference quality data from commercial and open sources
- original data generated from experiments and simulations
- secondary data underpinning articles, theses and reports
- derived or analysed intermediate data
- collections of results data representing aggregations of properties or features of analysed data
R6
R7
Best Use of People
It was clearly recognised that an effective research ecosystem requires not only investment in technology but also support professionals to make it usable and appropriately trained people to fully exploit it. We observed wide variation in levels of data skills across different groups. This highlighted an opportunity for sharing knowledge and best practice between projects, disciplines and research domains.
Much of a physical sciences researcher’s time is spent finding, cleaning, transforming and importing/exporting data. There is a need for dedicated professionals who can either fully support researchers’ data workflows, enabling them to concentrate on research without being impeded by cumbersome data management, or provide streamlined tools for data-intensive research in the physical sciences, enabling researchers to support themselves more easily. The role of these professionals must be fully established, recognised and sustained.
The physical sciences research community is extensive and varied and therefore needs broad community participation in its governance, planning and development.
R8
R9
R10
R11
Best Use of Technology
Information technology for data management and data analysis is changing rapidly and often diverging, with important new functionality emerging continuously. Physical sciences researchers currently have to navigate a wide diversity of provision in a highly heterogeneous technological environment. Physical sciences research workflows should “ride the wave” of technology evolution and integrate the latest technological developments.
Providing an integrated infrastructure in which researchers can adopt diverse tools yet continue to work together will require agreement on, and maintenance of, the vocabulary, interfaces and tools that enable interoperability. An essential feature of a data infrastructure for the physical sciences should thus be to develop and maintain interoperability standards and the associated supporting tools that enable sharing and discovery of metadata and data.