PSDI Updates: January 2025


Jan 28, 2025

This update gives a brief project update and highlights upcoming events and recent publications from the team!

News

Project Update

As we enter 2025, the PSDI team is hard at work ahead of an important project milestone. In spring, we will roll out our initial PSDI resources, spanning data services, tools and guidance developed with our network of partners. We will open access to our partner network first, then make these resources available to all UK academics shortly afterwards.

This initial release of resources will be a major milestone and will show the scope and potential of our infrastructure. But it is just one step on the journey, and we look forward to continuing to grow our community to extend the resources available through PSDI.  

We have recently expanded our communications team, so there is plenty in the pipeline for news, events and collaborations. Keep an eye out for what's coming from the team, and if you use social media, make sure to follow us (just search PSDI)!

Event Funding Call

The Physical Sciences Data Infrastructure (PSDI) is inviting proposals from the community to run events that engage people with issues around scientific data. Specific funding is available for events held before 31st March 2025.

This is a fantastic opportunity to bring your ideas to life, connect with the PSDI community, and support innovation and collaboration in the physical sciences data infrastructure space.

Applications need to be made by 2nd February 2025. Find out more and apply here: https://www.psdi.ac.uk/event-funding-call/

Upcoming Webinars

Green Algorithms, Green DiSC and GREENER Principles: Making Computational Science More Environmentally Sustainable

Thursday 30th January 2025 2pm-3pm GMT

This webinar, presented by Dr Loïc Lannelongue from the University of Cambridge, will discuss how to make computational science more environmentally sustainable and the challenges involved.

More information and registration here: https://www.psdi.ac.uk/event/webinar-green-algorithms/

From genetic studies and astrophysics simulations to AI, scientific computing has enabled amazing discoveries and there is no doubt it will continue to do so. However, the corresponding environmental impact is a growing concern in light of the urgency of the climate crisis, so what can we all do about it? Tackling this issue and making it easier for scientists to engage with sustainable computing is what motivated the Green Algorithms project. Through the prism of the GREENER principles for environmentally sustainable science, we will discuss what we learned along the way, how to estimate the impact of our work and what hurdles still exist. It will also be a chance to highlight how the new Green DiSC certification framework can support scientists and institutions in making their research more sustainable.

Upcoming Events

Research Data in the Physical Sciences: A Forum for Librarians and Research Support Professionals

5-6 March 2025

The Physical Sciences Data Infrastructure (PSDI) and the Digital Curation Centre (DCC) are delighted to invite you to this in-person forum designed for data librarians and research support professionals working within the physical sciences across the UK and EU.

Over two days, attendees will engage in knowledge exchange, community discussions, and networking opportunities, all centred on the challenges, opportunities, and emerging solutions in research data management for the physical sciences.

This event will be free to attend, but spaces are limited. The first round of applications will be assessed on Wednesday 29th January. Apply here: https://www.psdi.ac.uk/event/research-data-forum-2025/

Love Data Week 2025

International Love Data Week will take place from 10 to 14 February 2025. The theme for this year’s event is “Whose Data Is It, Anyway?” You can find lots of events online using the hashtag #Lovedata25 or on this event listing. The University of Southampton’s Hartley Library Research Data Team will hold an event on 12th February exploring this theme. Find out more and register here: https://library.soton.ac.uk/lovedata

Recent Events and Publications

You can find a collection of publications and presentations in the PSDI community on Zenodo: https://zenodo.org/communities/psdi/

Recent additions include:

Introduction to PSDI Metadata

Aileen Day gave a presentation, “Introducing PSDI Metadata”, at the 3rd Ontologies4Chem workshop. The presentation gives an overview of the metadata planned for PSDI version 1 and an idea of where it is headed beyond that. Slides are available on Zenodo: https://zenodo.org/records/14609504

Webinar: KnowLedger: An Open Ecosystem for Research Data Management

Professor Stuart Chalk held a hybrid webinar, “KnowLedger: An Open Ecosystem for Research Data Management”, on 12th December 2024. The presentation discusses the current approach to achieving KnowLedger, how researchers can get involved, and the project timeline. More details on the webinar, including a recording and slides, can be found here: https://www.psdi.ac.uk/event/webinar-knowledger/

Webinar: Introduction to NOMAD

Hampus Näsström gave the webinar “Introduction to NOMAD” on 5th December 2024. The webinar showcases NOMAD’s applications in materials synthesis, characterisation, simulations and AI-driven research, with a focus on solar cells, heterogeneous catalysis and metal-organic frameworks. More details on the webinar, including slides, can be found here: https://www.psdi.ac.uk/event/webinar-nomad/

PSDI Pathfinders – Data to Knowledge 

The recording of our webinar “Pathfinder – Data to Knowledge” is now available to watch on our YouTube channel. This webinar, presented by Alin Elena and Federica Zanca from the Science and Technology Facilities Council Daresbury Laboratory, discusses work undertaken in PSDI Pathfinder 5, which focuses on transforming data into knowledge by constructing workflows. In particular, the webinar looked at machine learning interatomic potentials (MLIPs). Watch the recording of this webinar on YouTube: https://www.youtube.com/watch?v=hUhns8GcA0E&t=6s

Modern scientific research workflows use a plethora of diverse software tools and file formats. Unfortunately, the file formats that one software tool can export are often incompatible with the formats another tool requires for import. Furthermore, existing ways of converting data between these formats are often slow, unclear and error-prone. Data formats vary in their structure and in the amount of information they can represent, so converting between specific formats is complex and can result in information loss. PSDI’s Data Conversion Service (DCS) was created to address this challenge. It offers researchers a single, trusted place to convert data formats and helps them understand the likely quality and limitations of different conversions.
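To make the information-loss problem concrete, the Python sketch below writes a small molecule to the plain XYZ format. That format holds only an atom count, a comment line and one element-and-coordinates line per atom. The dictionary layout used for the molecule and the `to_xyz` helper are hypothetical and chosen purely for illustration. The point is that explicit bonds and formal charges have nowhere to go in XYZ, so a careful converter should at least report what it dropped.

```python
# Hypothetical molecule layout: a dict with "atoms" (element + xyz),
# optional "bonds" (pairs of atom indices) and optional per-atom "charge".


def to_xyz(molecule: dict) -> tuple[str, list[str]]:
    """Write `molecule` as plain XYZ text and report fields that could not be kept."""
    atoms = molecule["atoms"]
    lines = [str(len(atoms)), molecule.get("name", "")]
    for atom in atoms:
        x, y, z = atom["xyz"]
        lines.append(f"{atom['element']} {x:.4f} {y:.4f} {z:.4f}")

    # Plain XYZ has no fields for connectivity or charges, so these are lost.
    lost = []
    if molecule.get("bonds"):
        lost.append(f"{len(molecule['bonds'])} bonds")
    charged = [a for a in atoms if a.get("charge", 0) != 0]
    if charged:
        lost.append(f"formal charges on {len(charged)} atom(s)")
    return "\n".join(lines) + "\n", lost


# Water, with its connectivity recorded in the source representation.
water = {
    "name": "water",
    "atoms": [
        {"element": "O", "xyz": (0.0, 0.0, 0.1173)},
        {"element": "H", "xyz": (0.0, 0.7572, -0.4692)},
        {"element": "H", "xyz": (0.0, -0.7572, -0.4692)},
    ],
    "bonds": [(0, 1), (0, 2)],
}

xyz_text, lost = to_xyz(water)
print(xyz_text)
print("Not representable in XYZ:", lost)
```

Some tools can re-infer bonds from interatomic distances when reading XYZ. That is a reconstruction rather than the original record, and it is the kind of caveat the Data Conversion Service aims to make visible.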

Where the idea came from

The need for a Data Conversion Service was first identified during research carried out for the PSDI pilot phase at the University of Southampton, published in Digital Discovery. This research identified a recurring issue across the physical sciences: researchers were working with data in many different formats, and the lack of interoperability made collaboration and reuse difficult. It highlighted a clear need for “data format conversion between different data types in order to facilitate data exchange between different services, and to allow users to collaborate using common formats.”

A key conclusion of this work was that this issue, alongside many other interoperability challenges, could best be addressed by identifying existing software that already offers relevant functionality and creating the infrastructure needed to allow these tools to work together.

The scientific community had already created several converters, such as Open Babel, to address some of these issues. In their current form, however, they were fragmented and offered little insight into conversion quality or potential information loss. Rather than creating another converter, PSDI therefore focused on making better use of these existing tools by bringing them together and exposing their capabilities more transparently.

As Dr. Samantha Pearman-Kanza, who was closely involved in shaping the early direction of the service, explains:

“Rather than simply creating another conversion tool, the focus was on making the best use of existing software and elevating their offerings. The aim was to help researchers understand what conversions were possible across different scientific data formats, which existing tools could be used, and where the use of these tools for certain conversions might involve compromises in data quality.”

From concept to working service

Early ideas explored a search interface that identified possible conversions and directed users to existing conversion software. This quickly evolved into a more researcher-friendly approach: integrating established converters directly into a single service and exposing their options in a consistent way.
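One way to picture a single entry point over several converters is a registry keyed by input and output format. Each entry in the registry names a tool and gives an indicative quality note. The sketch below assumes exactly that design. The tool names `tool_a` and `tool_b`, the quality labels and the notes are hypothetical placeholders, not the service's real capability data. In the service described here, such entries would correspond to established converters such as Open Babel, Atomsk or c2x.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    converter: str  # underlying tool that performs the conversion (hypothetical name)
    quality: str    # indicative quality label: "high", "medium" or "low"
    note: str       # what may be lost or approximated along this route


# Hypothetical registry keyed by (input format, output format).
REGISTRY: dict[tuple[str, str], list[Route]] = {
    ("cif", "xyz"): [
        Route("tool_a", "medium", "unit cell parameters are not kept"),
        Route("tool_b", "high", "cell written to an extended-XYZ comment line"),
    ],
    ("pdb", "xyz"): [
        Route("tool_a", "low", "bonds and residue names are dropped"),
    ],
}

QUALITY_RANK = {"high": 0, "medium": 1, "low": 2}


def find_routes(in_fmt: str, out_fmt: str) -> list[Route]:
    """Return every registered route for in_fmt -> out_fmt, best quality first."""
    routes = REGISTRY.get((in_fmt.lower(), out_fmt.lower()), [])
    return sorted(routes, key=lambda route: QUALITY_RANK[route.quality])


for route in find_routes("CIF", "XYZ"):
    print(f"{route.converter}: {route.quality} ({route.note})")
```

Normalising format names to lower case and sorting by quality are design choices of this sketch. An empty list is how it signals that no direct route exists.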

Development was carried out by Research Software Engineers Dr. Ray Whorley, Dr. Bryan Gillis and Dr. Don Cruickshank, who initially prototyped the service as a small Python application before expanding it into a fully-fledged web service and suite of downloadable tools.

Reflecting on this evolution, Dr. Whorley says:

The service now incorporates widely used converters such as Open Babel, Atomsk and c2x. Users can upload files, choose input and output formats, apply available conversion options, and download both the converted file and a detailed log. Accessibility has been built in throughout, with users able to customise fonts, sizes and colour schemes.

The Data Conversion Service interface showing format selection, available converters and indicative conversion quality.
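The upload, choose, convert and download steps described above can be sketched as a function that returns both the converted text and a log. This is a hypothetical stand-in, not the service's code. A toy converter from plain XYZ to a CSV table replaces the real tools, and `precision` is an invented example of a conversion option.

```python
import csv
import io


def xyz_to_csv(text: str, precision: int = 3) -> tuple[str, list[str]]:
    """Convert plain XYZ text to CSV; return (converted_text, log_lines)."""
    lines = text.strip().splitlines()
    n_atoms = int(lines[0])
    atom_rows = lines[2:2 + n_atoms]
    log = [f"input declares {n_atoms} atoms", f"option precision={precision}"]
    if len(lines) > 1 and lines[1].strip():
        log.append("warning: XYZ comment line is not carried into the CSV output")
    if len(atom_rows) != n_atoms:
        log.append(f"warning: found only {len(atom_rows)} atom lines")

    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["element", "x", "y", "z"])
    for row in atom_rows:
        element, *coords = row.split()
        # Round each coordinate to the requested number of decimal places.
        writer.writerow([element] + [f"{float(c):.{precision}f}" for c in coords[:3]])

    log.append(f"wrote {len(atom_rows)} atoms")
    return out.getvalue(), log


xyz = """3
water
O 0.0 0.0 0.1173
H 0.0 0.7572 -0.4692
H 0.0 -0.7572 -0.4692
"""
converted, log = xyz_to_csv(xyz, precision=2)
print(converted)
print("\n".join(log))
```

Returning the log alongside the output, rather than printing it inside the function, mirrors the service's downloadable log. It also keeps the function easy to call from automated workflows.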

Supporting real research workflows

Alongside the web application, the team developed three downloadable tools: a local browser-based version, a command-line tool and a Python library. These are proving particularly valuable for researchers working with sensitive data or automated workflows.

As Dr. Whorley explains:

“The downloadable tools give researchers confidence that their data remains local, and they can be dropped straight into automated workflows.”

This flexibility allows the Data Conversion Service to support everything from quick, one-off conversions to large-scale, repeatable processing pipelines.

Supporting FAIR data and PSDI’s wider ecosystem

Interoperability is a core part of FAIR data practice, and the Data Conversion Service plays a key role in enabling it. Researchers often need to convert the output of one tool into a format that can be used by the next, or to revive legacy data stored in outdated formats. Our service helps reduce the technical barriers to doing both.

Looking ahead

Now that the Data Conversion Service is established, its future direction will be strongly shaped by user feedback. Researchers can report missing formats and conversions directly through the service, and suggestions are already influencing planned enhancements.

Alongside this, there is clear scope for closer integration between the Data Conversion Service and other PSDI tools and services. For example, data transformed through the Data Revival Service (a service that converts scanned, handwritten paper lab notebooks into machine-readable data) could be converted into a wider range of usable formats. Likewise, chemical identifiers such as InChI or SMILES could be generated from a broader set of input formats for use in discovery services like Cross Data Search.

As Dr. Pearman-Kanza notes:

“The capacity to convert data between different formats is what really unlocks reuse across tools, across projects and across disciplines.”

Potential future developments also include support for conversions that require more than one input file, additional conversion tools, chained conversions where no direct route exists, data visualisation, and an API to enable integration with other platforms and services.
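Chained conversions can be framed as a shortest-path search over a graph. The nodes are formats and the edges are the available direct conversions. The sketch below uses breadth-first search, which finds a route with the fewest steps. The edge list is a hypothetical example, not the service's real set of conversions, and `find_chain` is an illustrative name.

```python
from collections import deque
from typing import Optional

# Hypothetical direct conversions: source format -> formats it converts to.
DIRECT = {
    "cif": ["xyz"],
    "xyz": ["pdb", "sdf"],
    "sdf": ["smi"],
    "pdb": ["mol2"],
}


def find_chain(start: str, goal: str, graph: dict) -> Optional[list]:
    """Breadth-first search for the shortest chain of formats, or None."""
    previous = {start: None}  # format -> format it was first reached from
    queue = deque([start])
    while queue:
        fmt = queue.popleft()
        if fmt == goal:
            # Walk back through `previous` to rebuild the route.
            chain = []
            while fmt is not None:
                chain.append(fmt)
                fmt = previous[fmt]
            return chain[::-1]
        for nxt in graph.get(fmt, []):
            if nxt not in previous:
                previous[nxt] = fmt
                queue.append(nxt)
    return None  # no route, direct or chained


print(find_chain("cif", "smi", DIRECT))  # ['cif', 'xyz', 'sdf', 'smi']
print(find_chain("smi", "cif", DIRECT))  # None
```

Each extra step is another chance to lose information. A real implementation would therefore probably want to weigh the indicative quality of every step rather than just counting them.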

A service built with researchers in mind

For the team, seeing the Data Conversion Service grow from an identified need into a live, widely usable tool has been deeply rewarding. The aim is to make data conversion clearer, more transparent and more inclusive, so researchers can spend less time wrestling with formats and software, and more time doing research.

As Dr. Pearman-Kanza puts it:

“If researchers can trust the conversion process and understand its limitations, they are better placed to make informed decisions about how their data can be used. This includes understanding when conversion is appropriate, what can be gained, and what might be lost, which is an important step towards better research practice overall.”


Try the Data Conversion Service

The Data Conversion Service is freely available to use and designed to fit a wide range of research needs, from quick, one-off conversions to integration within automated workflows. Researchers can explore the web-based service, download local tools, and provide feedback directly to help shape future development.

To get started, visit the live service, watch the short introduction video, explore the documentation, or download the tools to use locally within your own workflows.

Explore the Data Conversion Service and start converting your data with confidence.
