Unlocking Hidden Chemical Knowledge Through AI
Modern scientific discovery depends on data, yet vast quantities of valuable research information remain trapped in unstructured and under-utilised formats. Handwritten lab notebooks, patents, academic papers, PDFs, scanned images, charts, and legacy databases contain decades of chemical knowledge that is difficult, or impossible, to search, analyse, or reuse. The Data Revival project addresses this challenge head-on, using advanced machine-learning techniques to transform inaccessible chemical data into structured, searchable, and reusable digital resources.
By focusing on chemistry and chemical research, Data Revival enables researchers to unlock insights buried in historical and contemporary datasets, accelerating discovery and innovation across academia and industry.
The Challenge of Unstructured Data
Across the chemical and physical sciences, unstructured data presents a critical barrier to innovation. Valuable chemical knowledge is frequently locked within diverse formats—ranging from handwritten laboratory notebooks and internal R&D reports to complex patent filings containing Markush structures and static PDFs of academic literature. This fragmentation renders critical data invisible to modern digital workflows, leading to wasted resources on redundant experimentation and the failure of AI initiatives that lack high-quality, structured training data.
Data Revival addresses this by transforming static information into structured, computable digital assets. Our multi-modal AI technology extracts complex chemical data (including text, images, and reaction schemes) and converts it into high-value formats tailored to specific research needs, such as the standardized Open Reaction Database (ORD) format for machine learning or structured inputs for Electronic Lab Notebooks (ELNs).
This capability is proven at scale. We successfully transformed over 2,000 complex, handwritten experimental records at the University of Southampton into accessible digital assets. Similarly, we have demonstrated our ability to process scientific literature by converting 7,000 academic papers into a structured dataset containing over 100,000 unique reactions.
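To make the idea of a "structured, computable digital asset" more concrete, the following is a minimal Python sketch of what a single extracted reaction could look like once it is held as structured data. It is an illustration only: the `ReactionRecord` class, its field names, the `find_by_product` helper, and the example values are all hypothetical and are not Data Revival's schema, the ORD schema, or real extracted results.

```python
from dataclasses import dataclass, field, asdict
from typing import Dict, List, Optional
import json


@dataclass
class ReactionRecord:
    """Hypothetical structured form of one extracted reaction (illustrative only)."""
    source: str                      # where the reaction was found
    reactants: List[str]             # SMILES strings
    products: List[str]              # SMILES strings
    conditions: Dict[str, str] = field(default_factory=dict)
    yield_percent: Optional[float] = None  # None when no yield was reported


def find_by_product(records: List[ReactionRecord], smiles: str) -> List[ReactionRecord]:
    """Return every record that lists the given SMILES string as a product."""
    return [r for r in records if smiles in r.products]


# Made-up example entries; the sources and numbers are placeholders.
records = [
    ReactionRecord(
        source="notebook-A, page 12",
        reactants=["CC(=O)O", "CCO"],
        products=["CC(=O)OCC", "O"],
        conditions={"catalyst": "H2SO4", "temperature": "reflux"},
        yield_percent=65.0,
    ),
    ReactionRecord(
        source="paper-B, scheme 3",
        reactants=["c1ccccc1", "BrBr"],
        products=["Brc1ccccc1", "Br"],
        conditions={"catalyst": "FeBr3"},
    ),
]

# Search the collection by product and serialise the matches as JSON.
hits = find_by_product(records, "CC(=O)OCC")
print(json.dumps([asdict(r) for r in hits], indent=2))
```

Once a reaction is held in a form like this, it can be searched, filtered, and exported, which is not possible while it sits in a scanned page. The lookup here is an exact string match; a real system would presumably canonicalise structures before comparing them.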
Reviving Chemical Knowledge with AI
Data Revival demonstrates how AI-driven approaches can reverse the trend of data loss. By digitising and analysing the Southampton chemistry archive, the project transformed a dormant collection into a FAIR-aligned (Findable, Accessible, Interoperable, Reusable) digital resource.
Using advanced techniques such as natural language processing, chemical structure recognition, and semantic data linking, Data Revival converted handwritten notes, diagrams, and experimental records into a structured, searchable database. The result was the recovery of an estimated 3,000 chemist-years’ worth of data, now accessible for modern research and analysis.
The platform has since proven its wider commercial impact through engagements with a number of large chemical, polymer, and pharmaceutical corporations, as well as specialized AI-chemistry companies. Data Revival has demonstrated clear operational value within these sectors and is actively expanding into routine industrial use. While currently focused on the physical sciences, the underlying approach holds transformative potential for other data-intensive fields, such as healthcare and law.
Partnership with PSDI
The collaboration between Data Revival and the Physical Sciences Data Infrastructure (PSDI) extends these advanced capabilities to the wider physical sciences community. Through PSDI’s national platform, researchers can directly engage with Data Revival’s workflows, lowering barriers to adoption and enabling scalable use across institutions. Crucially, the structured digital assets generated by Data Revival are designed to serve as high-quality inputs for other PSDI services, fostering an interconnected ecosystem of data enrichment and utility.
Data Revival operates as an independent commercial entity, established as a spin-out from the University of Southampton by founders with deep strategic involvement in the PSDI initiative. Backed by £565,000 in pre-seed investment, the company maintains its own robust infrastructure designed to serve both the industrial and academic sectors. The partnership functions as the dedicated delivery channel for academia; Data Revival offers its service free of charge to UK researchers via the PSDI platform, ensuring that the resulting enriched data is returned to the ecosystem to support the broader national research infrastructure.
Key Capabilities Delivered via PSDI
This capability enables researchers to move beyond static archives, turning disparate data into living resources ready for digital R&D initiatives that drive new hypotheses, support reproducible science, and fuel the next generation of discovery.
Enabling Discovery and Innovation
Data Revival distinguishes itself through a deep, chemistry-aware intelligence that outperforms generic digitization tools. By accurately interpreting complex scientific entities, from Markush structures to dense reaction schemes, the platform transforms fragmented information into high-fidelity, AI-ready data. This capability allows researchers to bridge the gap between static documents and dynamic discovery, converting isolated data points into a cohesive, computable knowledge base.
Through the PSDI partnership, the academic community gains direct access to this enterprise-grade technology. This collaboration democratizes access to industrial-strength data engineering, bringing cutting-edge AI tools into everyday research practice and empowering the wider scientific community to drive data-centric innovation.
Try Data Revival on PSDI
PSDI offers researchers the opportunity to explore Data Revival’s capabilities firsthand. Users can seamlessly convert handwritten lab book pages into machine-readable, searchable data and unlock the full potential of their chemical research records.
To access the service, either visit www.data-revival.com or go via the PSDI Data Revival Service. A PSDI account is required to get started.
The Data Revival Team
Upcoming Events
📅 26 February 2026
🔗 Webinar: Breaking Data Silos – From static documents to living data
Looking Ahead
As the collaboration between Data Revival and PSDI evolves, the focus will shift from initial deployment to deep ecosystem integration. The long-term vision is to establish a seamless national pipeline where unstructured research outputs—whether legacy archives or newly published literature—are routinely converted into high-value, AI-ready assets.
As Data Revival continues to advance its multi-modal AI capabilities for the commercial pharmaceutical and chemical sectors, these innovations will propagate through the PSDI platform. This ensures that the academic community gains continuous access to state-of-the-art data engineering tools. Ultimately, this partnership aims to close the loop on scientific data, creating a fully connected environment where every piece of generated knowledge is computable, reusable, and actively powering the next wave of scientific breakthroughs.