Scientific Integrity in the 1950s: A Story of Data and Discovery


Aug 15, 2024

Could the FAIR principles have enhanced Rosalind Franklin’s legacy?

By Saanvi Jiteendra

Let me take you back to the 1950s, when scientists across the world were racing to uncover the truth behind our genetic material, DNA. Famously, the scientists who won this race were Watson and Crick. They developed a model of DNA as a double helix composed of two strands running in opposite directions, joined together by hydrogen bonds. However, these specifics would not have been possible without another great scientist and her work with X-ray crystallography: Rosalind Franklin. Through her images of DNA, Franklin determined that DNA is a double helix and calculated its diameter, the distance between the strands, and their angles. When Watson visited his friend’s lab at King’s College, he saw Franklin’s report with her results and realised the true structure of DNA. It is well established that Watson and Crick relied on Franklin’s data to create their model.

So why am I telling you all of this? Did Watson and Crick steal Franklin’s data? While Watson and Crick clearly acted unprofessionally in using Franklin’s data without seeking her permission, her report was not confidential [4]. Could this situation have been avoided with the more standardised data-sharing practices we have today? Yes! Today scientists are encouraged to share their data openly and FAIRly. With the FAIR principles, scientists can ethically obtain someone else’s data and build on it to create new data [2]. FAIR stands for findability, accessibility, interoperability, and reusability. FAIR data should be easily machine-readable, as most researchers use computers to analyse their data. This is why everything, including metadata (information about the data), should be included.

To abide by the FAIR principles, researchers are recommended to deposit their data in a trusted repository [3] such as Harvard Dataverse (dataverse.harvard.edu), UniProt (www.uniprot.org), or the Protein Data Bank (www.rcsb.org), to name a few. They should ensure that their data and metadata have a unique persistent identifier (e.g. a DOI), which allows the data to be findable [1]. It should be stated how users can access a researcher’s data, sometimes via an authentication and authorisation procedure. Data should use a widely accepted file format so it can interoperate with different applications and workflows for analysis. Finally, metadata and data should be assigned a licence and clearly described so they can be reused and replicated by others.
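To make this concrete, here is a minimal sketch of the kind of machine-readable metadata record these recommendations point towards. The field names and values are illustrative assumptions only, not the required schema of any particular repository.

```python
import json

# Illustrative dataset metadata record; field names and values are
# examples only, not a specific repository's schema.
metadata = {
    "identifier": "https://doi.org/10.xxxx/example-dataset",  # persistent identifier (findable)
    "title": "X-ray diffraction images of DNA fibres",
    "creators": ["Franklin, Rosalind", "Gosling, Raymond"],
    "access": "open",                        # how users can access the data (accessible)
    "format": "imgCIF",                      # widely accepted file format (interoperable)
    "license": "CC-BY-4.0",                  # clear licence (reusable)
    "description": "Fibre diffraction patterns of the B form of DNA.",
}

# Serialise the record so that machines, not just humans, can read it.
print(json.dumps(metadata, indent=2))
```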

Had the FAIR principles existed in the 1950s, maybe Franklin would have been aware of the significant impact her data had on developing the model of DNA. The FAIR principles call for clear licensing and crediting of data sources, which are good data management practices, so Watson and Crick would have been compelled to credit Franklin. Clear licensing also lets scientists easily see where and how data can be reused. Research institutes would also have had strict data-sharing policies in place to ensure that data was shared ethically.

Situations like these from the past are the reason the FAIR principles were created, so that we can learn from others’ mistakes. Often an important discovery is made through the contributions of many people. Abiding by the FAIR principles ensures that every contributor can be acknowledged and that data is shared ethically. I am sure Rosalind Franklin, and many other scientists who have not been acknowledged for their discoveries, would be grateful for the progress we have made in the scientific community. It is important that we all, especially new scientists, start following the FAIR principles at college and university level, so we do not have to learn them at a later stage.

A headshot of Saanvi

About the Author

Hi! My name is Saanvi Jiteendra. I am a PSDI intern and a third-year BSc Biochemistry student at the University of Southampton. My role at PSDI is Scientific Communications and Engagement; as part of my time here, I am creating content (like this article) to help explain what PSDI does and to promote our engagement on social media.

Works Cited (Vancouver)

1. The Open University Library. FAIR Principles in under 60 seconds [Internet]. YouTube. The Open University Library; 2023 [cited 2024 Aug 1]. Available from: https://www.youtube.com/watch?v=tFFRtP6h_KQ&ab_channel=TheOpenUniversityLibrary 

2. GO FAIR. FAIR Principles [Internet]. GO FAIR; 2017 [cited 2024 Aug 1]. Available from: https://www.go-fair.org/fair-principles/

3. Wilkinson MD, Dumontier M, Aalbersberg IjJ, Appleton G, Axton M, Baak A, et al. The FAIR Guiding Principles for scientific data management and stewardship. Scientific Data [Internet]. 2016 Mar 15 [cited 2024 Aug 1];3(1). Available from: https://www.nature.com/articles/sdata201618

4. Cobb M. Sexism in science: did Watson and Crick really steal Rosalind Franklin’s data? [Internet]. The Guardian. Guardian News & Media Limited; 2015 [cited 2024 Aug 1]. Available from: https://www.theguardian.com/science/2015/jun/23/sexism-in-science-did-watson-and-crick-really-steal-rosalind-franklins-data

Modern scientific research workflows use a plethora of diverse software tools and file formats. Unfortunately, the file formats that one software tool can export are often incompatible with the formats required for import by another. Furthermore, current capabilities for converting data between these formats are often slow, unclear and error-prone: formats vary in their structure and in the amount of information they can represent, which makes conversion between specific formats complex and can result in information loss. PSDI’s Data Conversion Service (DCS) was created to address this challenge, offering researchers a single, trusted place to convert data formats while helping them understand the likely quality and limitations of different conversions.

Where the idea came from

The need for a Data Conversion Service was first identified during research carried out for the PSDI pilot phase at the University of Southampton, which was published in Digital Discovery. This research identified a recurring issue across the physical sciences: researchers were working with data in many different formats, making collaboration and reuse difficult due to a lack of interoperability. It highlighted a clear need for “data format conversion between different data types in order to facilitate data exchange between different services, and to allow users to collaborate using common formats.”

A key conclusion of this work was that this issue, alongside many other interoperability challenges, could best be addressed by identifying existing software that already offers relevant functionality and creating the infrastructure needed to allow these tools to work together.

Several converters, such as Open Babel, had already been created by the scientific community to address some of these issues, but they were fragmented and offered little insight into conversion quality or potential information loss. Therefore, rather than creating yet another converter, PSDI’s focus shifted towards making better use of these existing tools by bringing them together and exposing their capabilities more transparently.

As Dr. Samantha Pearman-Kanza, who was closely involved in shaping the early direction of the service, explains:

Rather than simply creating another conversion tool, the focus was on making the best use of existing software and elevating their offerings. The aim was to help researchers understand what conversions were possible across different scientific data formats, which existing tools could be used, and where the use of these tools for certain conversions might involve compromises in data quality.

From concept to working service

Early ideas explored a search interface that identified possible conversions and directed users to existing conversion software. This quickly evolved into a more researcher-friendly approach: integrating established converters directly into a single service and exposing their options in a consistent way.
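To illustrate what "exposing their options in a consistent way" might look like under the hood, here is a minimal, hypothetical sketch of a converter registry. The class and function names are my own assumptions for illustration, not PSDI's actual implementation.

```python
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical sketch: present different converters through one consistent
# interface. All names here are illustrative, not PSDI's code.

@dataclass
class Converter:
    name: str                                             # e.g. "Open Babel"
    convert: Callable[[str, str, str, str, dict], None]   # (in_path, in_fmt, out_path, out_fmt, options)
    quality: str = "unknown"                              # indicative conversion quality
    options: dict = field(default_factory=dict)           # supported options, exposed uniformly

# Maps (input format, output format) to the converters that can handle it.
REGISTRY: dict = {}

def register(in_fmt: str, out_fmt: str, converter: Converter) -> None:
    """Record that `converter` can turn `in_fmt` files into `out_fmt` files."""
    REGISTRY.setdefault((in_fmt, out_fmt), []).append(converter)

def available(in_fmt: str, out_fmt: str) -> list:
    """List converters able to perform the requested conversion."""
    return REGISTRY.get((in_fmt, out_fmt), [])

# Example registration (illustrative only):
# register("xyz", "pdb", Converter(name="Open Babel", convert=run_open_babel))
```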

Development was carried out by Research Software Engineers Dr. Ray Whorley, Dr. Bryan Gillis and Dr. Don Cruickshank, who initially prototyped the service as a small Python application before expanding it into a fully-fledged web service and suite of downloadable tools.

Reflecting on this evolution, Dr. Whorley says:

The service now incorporates widely used converters such as Open Babel, Atomsk and c2x. Users can upload files, choose input and output formats, apply available conversion options, and download both the converted file and a detailed log. Accessibility has been built in throughout, with users able to customise fonts, sizes and colour schemes.

The Data Conversion Service interface showing format selection, available converters and indicative conversion quality.
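For readers curious about what one of these underlying converters does, here is a minimal example using Open Babel's Python bindings directly, assuming Open Babel 3.x is installed and a placeholder input file named molecule.xyz exists. The Data Conversion Service wraps this kind of call behind its web interface.

```python
# Convert an XYZ structure file to MDL Molfile format with Open Babel's
# Python bindings (Open Babel 3.x). The filenames are placeholders.
from openbabel import pybel

mol = next(pybel.readfile("xyz", "molecule.xyz"))   # read the first structure in the file
mol.write("mol", "molecule.mol", overwrite=True)    # write it out in the new format
```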

Supporting real research workflows

Alongside the web application, the team developed three downloadable tools: a local browser-based version, a command-line tool and a Python library. These are proving particularly valuable for researchers working with sensitive data or automated workflows.

As Dr. Whorley explains:

“The downloadable tools give researchers confidence that their data remains local, and they can be dropped straight into automated workflows.”

This flexibility allows the Data Conversion Service to support everything from quick, one-off conversions to large-scale, repeatable processing pipelines.
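As a rough illustration of the kind of automated pipeline described above, the sketch below loops over a folder of structure files and converts each one from the command line. It calls the standard `obabel` command from Open Babel rather than PSDI's own command-line tool, whose exact invocation is not assumed here; the folder name and formats are placeholders.

```python
# Batch-convert every XYZ file in a folder to PDB format by calling the
# standard Open Babel CLI (`obabel`). This stands in for the kind of
# pipeline step the downloadable tools are designed to support.
import subprocess
from pathlib import Path

for xyz_file in Path("structures").glob("*.xyz"):
    pdb_file = xyz_file.with_suffix(".pdb")
    subprocess.run(
        ["obabel", str(xyz_file), "-O", str(pdb_file)],
        check=True,  # raise an error if a conversion fails
    )
    print(f"Converted {xyz_file} -> {pdb_file}")
```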

Supporting FAIR data and PSDI’s wider ecosystem

Interoperability is a core part of FAIR data practice, and the Data Conversion Service plays a key role in enabling it. Researchers often need to convert the output of one tool into a format that can be used by the next, or to revive legacy data stored in outdated formats. Our service helps reduce the technical barriers to doing both.

Looking ahead

Now that the Data Conversion Service is established, its future direction will be strongly shaped by user feedback. Researchers can report missing formats and conversions directly through the service, and suggestions are already influencing planned enhancements.

Alongside this, there is clear scope for closer integration between the Data Conversion Service and other PSDI tools and services: for example, enabling data transformed through the Data Revival Service (a service which takes scanned handwritten paper lab notebooks and converts them into machine-readable data) to be converted into a wider range of usable formats, or generating chemical identifiers such as InChI or SMILES from a broader set of input formats for use in discovery services like Cross Data Search.
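To give a flavour of the identifier generation mentioned here, this is how SMILES and InChI strings can be produced from a structure file with Open Babel's Python bindings. The input filename is a placeholder, and how such a step would be wired into Cross Data Search is not assumed.

```python
# Generate SMILES and InChI identifiers from a structure file using
# Open Babel's Python bindings; "caffeine.mol" is a placeholder filename.
from openbabel import pybel

mol = next(pybel.readfile("mol", "caffeine.mol"))
print("SMILES:", mol.write("smi").split()[0])   # drop the trailing title field
print("InChI: ", mol.write("inchi").strip())
```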

As Dr. Pearman-Kanza notes:

“The capacity to convert data between different formats is what really unlocks reuse across tools, across projects and across disciplines.”

Potential future developments also include support for conversions that require more than one input file, additional conversion tools, chained conversions where no direct route exists, data visualisation, and an API to enable integration with other platforms and services.

A service built with researchers in mind

For the team, seeing the Data Conversion Service grow from an identified need into a live, widely usable tool has been deeply rewarding. The aim is to make data conversion clearer, more transparent and more inclusive, so researchers can spend less time wrestling with formats and software, and more time doing research.

As Dr. Pearman-Kanza puts it:

“If researchers can trust the conversion process and understand its limitations, they are better placed to make informed decisions about how their data can be used. This includes understanding when conversion is appropriate, what can be gained, and what might be lost, which is an important step towards better research practice overall.”


Try the Data Conversion Service

The Data Conversion Service is freely available to use and designed to fit a wide range of research needs, from quick, one-off conversions to integration within automated workflows. Researchers can explore the web-based service, download local tools, and provide feedback directly to help shape future development.

To get started, visit the live service, watch the short introduction video, explore the documentation, or download the tools to use locally within your own workflows.

Explore the Data Conversion Service and start converting your data with confidence.

 
