PSDI Pilot Activities
The PSDI pilot phase was funded by the EPSRC Digital Research Infrastructure (DRI) programme and ran from November 2021 to March 2022, following on from the large infrastructures SoN submitted by our project team. This pilot phase was a rapid scoping exercise, designed to expand on the ambitions of the project set out in the SoN, to engage broadly with the potential user community to gather and analyse requirements, and to develop a plan for future phases.
Specifically, the objectives of the pilot were:
- Engage with the potential PSDI stakeholder community and build support for its creation
- Undertake some case studies to demonstrate the potential scientific benefits
- Trial some relevant technologies and investigate their interoperability
- Gather requirements arising from the case studies and trials and wider consultation
- Analyse these requirements, elucidate the necessary functionality, and propose a technology architecture for PSDI
- Produce a detailed plan for future phases of PSDI
- Create a governance structure for future activities
Pilot Report
The PSDI Pilot phase produced an overall high-level report summarising the activities carried out in the Pilot, together with recommendations and an outline for the future of PSDI. More detailed recommendations were also generated in many of the work packages and case studies. These findings can be explored in the additional reports produced through the pilot work.
Work Packages
The pilot activity was split into four work packages (WPs), outlined below, to capture the different elements of the pilot activities. These WPs worked across the pillars to engage with different scales of data infrastructure and disciplinary areas to solicit a broad spectrum of requirements. Because of the connected nature of the project, the work packages collaborated closely, and activities in one WP fed into the deliverables and outputs of others. You can find out more about each of the work packages in the sections below.
WP1. Coordination, Governance and Strategy
Work Package 1 was led by Jeremy Frey. This work package managed the initial phase and established the governance structure for the future development of PSDI. This required establishing a Management Board and Steering panel with representatives from all stakeholders. We liaised with other DRI pilots and defined detailed plans for the next phase of PSDI.
Work Package 1 also produced a strategy and plan for PSDI going forward. This work analysed and collated the results from all work packages and from the consultations undertaken, and prioritised the generic and community-specific functionality required to deliver a fully functioning PSDI, along with a programme of community activities to support it. A schedule was worked out, taking into account guidance and constraints regarding availability of funding.
WP2. Stakeholder Engagement
Work Package 2 was led by Barbara Montanari. Community involvement was crucial to evaluating the potential of PSDI. This work package carried out stakeholder engagement with the main objective of gathering requirements to inform the plan for phase 2 prepared under Work Package 1. It ran the majority of the project communications, including events, social media and community updates. It also undertook a programme of stakeholder engagement events to receive input from a wide range of stakeholders, including running workshops around different data themes and engaging with communities at group and individual level. These engagement activities allowed us to elicit requirements from communities, which also fed into the other work packages.
Nationally, the focus of this first engagement phase was mainly on the PS research and computing/software infrastructure domains. Engagement with data infrastructure initiatives in other domains and overseas was also undertaken in order to learn from their experiences and to design the PSDI so that it can be linked with data infrastructures across UKRI in a future phase. In addition to the categories of stakeholders represented by the headings of the four pillars, we also engaged with further potential users of the PSDI not currently organised in established networks, the EPSRC compute and software infrastructure (including ARCHER2, SSI, RSEs, etc.), relevant institutes such as the ATI, and industry (including data infrastructure vendors).
In WP2 we ran a variety of different activities to engage with the community in different ways. These included:
- Workshop events
- Presentations at community events
- Focus Groups
- Interviews
- Surveys
WP3. Architecture & Technology
Work Package 3 was led by Brian Matthews. The goal of this work package was to define a technological approach to provisioning a data infrastructure for Physical Sciences. The PSDI proposes to provide a state-of-the-art, enterprise-scale digital infrastructure for research that is distributed across centres and responsive to the needs of the user community. It should provide a scalable, adaptable and accessible infrastructure that is secure and reliable, while being compatible with related data infrastructure initiatives supporting research. This work package focused on scoping the design of the common infrastructure architecture for PSDI. This planning produced an options analysis for the core infrastructure, including a survey of current technology and a requirements analysis. We defined an initial architecture design and prototyped some example technologies for infrastructure provision. This also included undertaking small-scale technology trials evaluating usability, scalability, security and interaction with other technologies.
This work package worked closely with WP2 and WP4 to engage with the community on their technology requirements.
WP4. Case studies
Work Package 4 was led by Simon Coles. This work package worked with some of the key stakeholders identified to undertake case studies that scoped out or created prototype systems to demonstrate how proposed elements of PSDI could influence cutting-edge research. The objective of WP4 was to augment the community consultation and technology work (WPs 2/3) with focused, practical applications, testing particular aspects of the infrastructure. Through a combination of test implementations and desk-based analysis, this WP contributed domain-specific results to the overall recommendations and specifications of this pilot project. These case studies demonstrated the potential benefits of PSDI and indicated the work to be continued in later phases. Case studies were selected to represent as many elements of the infrastructure and its user communities as possible. Topics explored in the case studies included data pathways, combining data, surfacing data and more.
In the Pilot phase we undertook 8 separate case studies run by researchers across our collaborating partners. The 8 case studies covered a wide range of research areas, techniques and infrastructure requirements. The case studies were split into two categories: scientific disciplines and underpinning methods. They predominantly spanned pillars 1 to 3 (Facilities, institutes and hubs; National research facilities; Computational initiatives), but also touched on the more diverse 4th pillar (Institutions, groups and laboratories). The case studies formed the basis of a library of case studies, which was supplemented with examples arising from WP2. As the project progressed, the case studies were opened up so that anyone could contribute to a study and feed into the evolving requirements.
The 8 case studies are outlined below. You can read their individual reports by clicking on the case study name.
Scientific Disciplines
CS1: Data and Simulation driven understanding of catalytic activity
Aim: Demonstrate the practice and value of linking and combining data from across experimental data facilities.
CS2: Exploring CSD-Theory as a tool for assisting materials discovery
Aim: Assess the performance of the CCDC’s new CSD-Theory suite as a medium to link simulation and laboratory materials science in a multi-stage workflow involving computational crystal structure prediction (CSP) (Southampton) and high-throughput automated synthesis and analysis (Liverpool)
CS3: Combining data sources in Materials Physics
Aim: Evaluate the requirements for storing experimental and Natural Language Processing (NLP) mined data.
Underpinning methods
CS4: Spectroscopic data infrastructure
Aim: Evaluate technology and data requirements to underpin spectroscopy characterisation techniques across all disciplines using the infrastructure
CS5: Data curation and availability at instrument-based facilities
Aim: To understand facility data management necessary to publish standalone datasets, to support e.g. formal publishing routes or for machine learning within the National Research Facility for lab-based X-ray CT
CS6: Process Recording and Digital Research Notebooks
Aim: Assess process recording requirements and the associated digital landscape. Investigate Digital Research Notebooks (DRN) and evaluate their suitability as generic recording systems to support diverse workflows
CS7: Data trust, sharing & preservation
Aim: Explore data trust and sharing framework for applicability to PSDI and develop recommendations for preservation and curation approaches
CS8: The Role of Structure in Physical Sciences Data Management
Aim: To probe the requirements for structure-specific metadata to support data management of specific resources, and understand the potential for linking, discovery and machine learning