The PSDI launched a funding call in 2025 for short-term pilot studies aimed at developing innovative, data-related resources for the physical sciences community. Projects were selected for their ability to produce tangible outputs, specifically new services or tools, and for their commitment to community engagement and research transparency. The following partners are the successful applicants from this call, each contributing unique digital research objects that will be integrated into the growing PSDI ecosystem.

Queen’s University Belfast

Project Lead
Dr Josh J Bailey

Project Title: Open-source repositories for flow battery data to improve reproducibility and validate simulations.

Project Outputs: Two open-source GitHub repositories for experimental flow battery data and numerical/analytical models, along with Python-based data import, curation, and analysis tools.

Project Details
  • Partners: Massachusetts Institute of Technology, Friedrich Schiller University Jena, and University College Dublin. 
  • Overview: This project develops an open-source hub to manage large datasets from an international interlaboratory flow battery testing campaign involving over 20 global institutions. The goal is to assess the replicability of electrochemical results and provide computational researchers with experimentally derived datasets for model validation. 

University of Cambridge

Project Lead
Prof. Jacqueline Cole

Project Title: Creating a language model tailored for the magnetic materials domain 

Project Outputs: A language model specific to the magnetic materials domain, together with its codebase. Both will be made available via the PSDI platform and the Henry Royce Institute’s Digital Materials Foundry.

Project Details

Overview: This project focuses on delivering a “small,” energy-sustainable language model fine-tuned for conversational AI. It allows the PSDI community to ask broad questions about magnetic applications and receive accurate answers, removing the need to manually trawl through scientific literature and accelerating materials discovery.

Heriot-Watt University 

Project Lead
Prof. Susana Garcia

Project Title: The MOFevaluator: AI-Driven Holistic Assessment of Metal-Organic Frameworks for Carbon Capture and Beyond 

Project Outputs: A web interface for the MOFevaluator and a database containing MOF properties and key performance indicators for different capture processes. 

Project Details
  • Partners: EPFL – Ecole Polytechnique Fédérale de Lausanne. 
  • Overview: With millions of potential Metal-Organic Frameworks (MOFs) available, traditional brute-force screening is no longer feasible. This project develops an AI-driven “digital twin” of the PrISMa platform to provide a holistic evaluation of MOFs, allowing synthetic chemists to upload crystal structures and receive rapid performance assessments for carbon capture applications. 

University of Bath

Project Lead
Dr Elizaveta Suturina

Project Title: SimpNMR_data – A curated database of analysed solution paramagnetic NMR spectra and associated ab initio calculations 

Project Outputs: A dataset repository for experimental and simulated paramagnetic NMR (pNMR) data and a website hosting the SimpNMR software and training materials. 

Project Details

Overview: Interpreting pNMR spectra is often difficult due to the large and variable contributions from unpaired electrons. This project aims to make pNMR as accessible as standard diamagnetic NMR by creating a curated database that links experimental results with ab initio calculations, supported by workshops and video tutorials for the scientific community. 

University of Strathclyde

Project Lead
Dr Tahereh Nematiaram

Project Title: BenchmarkSet-500: High-Accuracy Excited-State Reference Data for Organic Semiconductors

Project Outputs: A FAIR-compliant benchmark dataset of 500 organic semiconductor molecules and modular, reproducible computational workflows for calculations.

Project Details

Overview: Standard computational methods often struggle to accurately model excited states in complex organic semiconductors. This project addresses this by applying high-level multireference methods to create a “gold standard” reference dataset, supporting the validation of computational methods and the training of next-generation machine learning models.

University of Warwick

Project Lead
Prof. James Kermode

Project Title: Universal Hyper Active Learning: A Data Pipeline to Accelerate Materials Discovery

Project Outputs: The ase-uhal Python package, which serves as a plugin for the Atomic Simulation Environment (ASE), alongside documentation, tutorials, and benchmark applications for realistic materials.

Project Details

Overview: Designing training datasets for machine learning interatomic potentials (MLIPs) is often a burdensome, multi-year process. This project creates an automated “Universal Hyper Active Learning” (uHAL) pipeline that allows researchers to quickly generate and evaluate datasets for fine-tuning foundation models, providing quantitative accuracy for materials discovery at a fraction of the traditional computational cost.

University of Southampton

Project Lead
Dr Sergio Vernuccio

Project Title: ORKIM: An Open Repository for Kinetic Models of Reacting Systems

Project Outputs: The ORKIM standardised kinetic model repository and a Python-based open-source tool for plug flow reactor simulations.

Project Details

Overview: As kinetic models grow in size and complexity, they become harder to reproduce and share without errors. This project establishes a centralised, community-endorsed repository to archive and retrieve detailed kinetic models and associated parameters, streamlining workflows and ensuring models are directly applicable to chemical reactor simulations.
