The PSDI launched a funding call in 2025 for short-term pilot studies aimed at developing innovative, data-related resources for the physical sciences community. Projects were selected for their ability to produce tangible outputs, specifically new services or tools, and for their commitment to community engagement and research transparency. The following partners are the successful applicants from this call, each contributing unique digital research objects that will be integrated into the growing PSDI ecosystem.
Queen’s University Belfast
Project Lead
Dr Josh J Bailey
Project Title: Open-source repositories for flow battery data to improve reproducibility and validate simulations
Project Outputs: Two open-source GitHub repositories, one for experimental flow battery data and one for numerical/analytical models, together with Python-based tools for data import, curation, and analysis.
Project Details
- Partners: Massachusetts Institute of Technology, Friedrich Schiller University Jena, and University College Dublin.
- Overview: This project develops an open-source hub to manage large datasets from an international interlaboratory flow battery testing campaign involving over 20 global institutions. The goal is to assess the replicability of electrochemical results and provide computational researchers with experimentally derived datasets for model validation.
University of Cambridge
Project Lead
Prof. Jacqueline Cole
Project Title: Creating a language model tailored for the magnetic materials domain
Project Outputs: A domain-specific language model for magnetic materials and its accompanying codebase. Both will be made available via the PSDI platform and the Henry Royce Institute’s Digital Materials Foundry.
Project Details
Overview: This project focuses on delivering a “small,” energy-sustainable language model fine-tuned for conversational AI. It allows the PSDI community to ask broad questions about magnetic applications and receive accurate answers, removing the need to trawl through the scientific literature manually and accelerating materials discovery.
Heriot-Watt University
Project Lead
Prof. Susana Garcia
Project Title: The MOFevaluator: AI-Driven Holistic Assessment of Metal-Organic Frameworks for Carbon Capture and Beyond
Project Outputs: A web interface for the MOFevaluator and a database containing MOF properties and key performance indicators for different capture processes.
Project Details
- Partners: École Polytechnique Fédérale de Lausanne (EPFL).
- Overview: With millions of potential Metal-Organic Frameworks (MOFs) available, traditional brute-force screening is no longer feasible. This project develops an AI-driven “digital twin” of the PrISMa platform to provide a holistic evaluation of MOFs, allowing synthetic chemists to upload crystal structures and receive rapid performance assessments for carbon capture applications.
University of Bath
Project Lead
Dr Elizaveta Suturina
Project Title: SimpNMR_data – A curated database of analysed solution paramagnetic NMR spectra and associated ab initio calculations
Project Outputs: A dataset repository for experimental and simulated paramagnetic NMR (pNMR) data and a website hosting the SimpNMR software and training materials.
Project Details
Overview: Interpreting pNMR spectra is often difficult due to the large and variable contributions from unpaired electrons. This project aims to make pNMR as accessible as standard diamagnetic NMR by creating a curated database that links experimental results with ab initio calculations, supported by workshops and video tutorials for the scientific community.
University of Strathclyde
Project Lead
Dr Tahereh Nematiaram
Project Title: BenchmarkSet-500: High-Accuracy Excited-State Reference Data for Organic Semiconductors
Project Outputs: A FAIR-compliant benchmark dataset of 500 organic semiconductor molecules, together with modular, reproducible computational workflows for the underlying calculations.
Project Details
Overview: Standard computational methods often struggle to model excited states in complex organic semiconductors accurately. This project addresses that gap by applying high-level multireference methods to create a “gold standard” reference dataset, supporting the validation of computational methods and the training of next-generation machine learning models.
University of Warwick
Project Lead
Prof. James Kermode
Project Title: Universal Hyper Active Learning: A Data Pipeline to Accelerate Materials Discovery
Project Outputs: The ase-uhal Python package, which serves as a plugin for the Atomic Simulation Environment (ASE), alongside documentation, tutorials, and benchmark applications for realistic materials.
Project Details
Overview: Designing training datasets for machine learning interatomic potentials (MLIPs) is often a burdensome, multi-year process. This project creates an automated “Universal Hyper Active Learning” (uHAL) pipeline that allows researchers to quickly generate and evaluate datasets for fine-tuning foundation models, providing quantitative accuracy for materials discovery at a fraction of the traditional computational cost.
University of Southampton
Project Lead
Dr Sergio Vernuccio
Project Title: ORKIM: An Open Repository for Kinetic Models of Reacting Systems
Project Outputs: The ORKIM standardised kinetic model repository and a Python-based open-source tool for plug flow reactor simulations.
Project Details
Overview: As kinetic models grow in size and complexity, they become harder to reproduce and share without errors. This project establishes a centralised, community-endorsed repository to archive and retrieve detailed kinetic models and associated parameters, streamlining workflows and ensuring models are directly applicable to chemical reactor simulations.