Authored by:

Ron Swart¹, Simon Coles²

¹Knowledge Centre for Materials Chemistry, Centre for Process Innovation

²School of Chemistry, Faculty of Engineering and Physical Sciences, University of Southampton, Southampton, SO17 1BJ



The UK Chemical Industry contributed more than £18 billion to the UK economy on a turnover of around £56 billion (as of the end of Q2 2021) and was responsible for more than 20% of total UK R&D spending. In the same period the Chemical Industries Association (CIA) reported that 94% of businesses had seen sales remain steady or increase in the second quarter compared with the first quarter of 2021. Steve Elliott, CEO of the CIA, commented: “Our focus now turns increasingly to the opportunities and challenges of net zero with our latest survey showing the industry well-placed to continue manufacturing innovative products that will help the country deliver a new future.”

However, it was also reported that significant challenges exist, not only in the development of technology but also in rising raw material costs, including energy, and in maintaining the supply chain. These challenges have probably only grown since, but PSDI could help reduce their impact by making data more available to industry. Many of PSDI's intended benefits are directly relevant, for example:

  • Leverage simulation data to drive experimental science and vice versa
  • Surface data from many sources
  • Enable data to be exploited by AI methods
  • Provide a common platform to run models and codes from different sources
  • Be a place for curation of legacy data beyond the lifetime of individual projects

With so much invested in R&D, the Chemical Industry is not only one of the first adopters of new technology but also one of the largest users and producers of physical data. It is consequently an important component of the data generation and data-driven discovery outlined in the introduction to the Statement of Need, and this activity is expected to continue.

The Chemical Industry understands the benefits and opportunities of AI and machine learning but is also aware that both need large, broad data sets. Recommendations from the National Infrastructure Commission are applicable to PSDI, namely:

  • A Digital Framework for secure sharing of infrastructure data
  • Coordination of key players including national facilities, academic institutes, and the Catapult network

As Bill Gates put it in his book The Road Ahead, “We always overestimate the change that will occur in the next two years and underestimate the change that will occur in the next ten. Don’t let yourself be lulled into inaction.”

Through effective gathering, curation and availability of physical data we can become more accurate in predicting the next two years and more adventurous in predicting the next ten.


A 2015 study from Imperial College Business School found that data contributed around £50bn a year to the UK economy in direct, indirect and induced impacts. Big data is predicted to contribute more to economic growth over the period 2012 to 2025 than the typical contribution from R&D.

Data is now as critical a component of our R&D infrastructure as laboratories, equipment and national facilities, and it needs maintenance in the same way that physical infrastructure does: it must be updated, housed, curated and kept secure yet available.

The National Infrastructure Commission study “Data for the Public Good” states: “Greater access to open data enables greater innovation. The Industrial Strategy White Paper sets out the Grand Challenge ‘we will put the UK at the forefront of the AI and data revolution’ and emphasises the gains that AI can bring to the UK economy… The increased sharing of high-quality data is crucial to the development of AI and what it can achieve for infrastructure and consumers.” This conclusion is equally applicable to R&D.

An initial search of several UK sector landscapes (mostly via the KTN) identified large numbers of companies that could be interested in accessing curated and trustworthy physical data. These counts may include several divisions within the same company, and clearly not all will want to access and use physical data, but the numbers indicate the size of the potential user group:

  • Satellite Technology
    • Data processing – 98 companies
    • Materials manufacturing – 84 companies
    • Test facilities – 74 companies
  • Chemistry
    • Chemical Manufacturing – 9,261 companies
      The Chemical Manufacturing subsector includes the transformation of organic and inorganic raw materials by a chemical process and the formulation of products. It is separate from the production of basic chemicals, which make up the group below. It does not include all industries that transform raw materials by a chemical process (e.g. mining and mineral beneficiation, or the petroleum industry).
    • Basic Chemicals Suppliers – 853 companies
  • Industrial Biotechnology – 43 companies
  • Sustainable aviation – 60 companies
  • Synthetic biology – 75 companies
  • Agri-food – 101 research capabilities
  • Process manufacturing – 101 companies
  • Compound semiconductor supply chain – 461 companies
  • Precision medicine – 59 companies
  • Photonics – 826 companies

Also reviewed briefly:

  • VIMMP (Virtual Materials Marketplace)
    Further analysis would help us understand its approach to industry engagement.

We have direct contacts with:

  • CPI and, via CPI, the wider Catapult network
  • KCMC (Knowledge Centre for Materials Chemistry)
  • Hartree High Performance Computing Centre
  • UKRI KTN chemistry Group
  • Henry Royce Institute

All of the above have broad industry links and could further expand the range of industry contacts who could become users of PSDI. They would also help focus the approach, allowing the different user groups and their particular needs to be defined more rapidly.

One final thought: using the data infrastructure as a “descending route” of data-intensive research could help identify hitherto unidentified needs.


There are several approaches to identifying and then engaging with industry. We recommend classifying industry engagement and likely utilisation according to the four pillars, as follows:

  • Pillar 1. Facilities, Institutes and Hubs – significant centralised national facilities and activities that serve a large number of researchers based on a common need
    Engage with the Catapult network, MIF and Royce. Their industrial contacts will cover a range of sectors, and their collaborative projects a range of TRLs. The generation and/or utilisation of data will vary across these sectors, giving a broad spectrum of applications. Based on these discussions, a range of needs could be defined and, from these, a list of MUSTs and WANTs developed for the data infrastructure. The MUSTs could then become core characteristics of the system, whereas the WANTs would be more sector-specific and may not always be achievable. A key question is how such an infrastructure could be exploited to enhance technology translation. This is particularly important to the Catapult network, and a clear explanation of how the PSDI would enhance translation would be a key outcome.
  • Pillar 2. National Research Facilities – medium-scale centralised facilities operating at a world-leading level to perform research that cannot be addressed in a standard laboratory
    Utilisation of the data generated at these facilities will be much easier if its existence, curation and availability are known and access is straightforward. Companies tend to access these medium-scale facilities via an academic partner or other institution. However, some spin-outs and SMEs may be more comfortable using data directly because of their technology base. Investigate how comparable national data infrastructures (e.g. NFDI in Germany or ARDC in Australia) have engaged industry in helping to define the types of data of interest and the most efficient methods of access.
  • Pillar 3. Computational Initiatives – uniting those performing simulations with the communities and tools they require
    There is a lot of interest in AI/ML but also a growing awareness of how much data is needed. Companies often have historic archives, but these can be difficult to access. Even when they can be accessed, data may be missing or of (now) dubious quality, so completing data sets from databases available via the PSDI would be very powerful. An alternative approach would be to compare a company's in-house data with data sets held on the PSDI. A good match with the PSDI data would build trust in the company's historical data, could help fill gaps or complete data sets, and could lead to more rapid implementation of AI or ML approaches.

    Discuss with companies known to be interested in this area, including some large corporates and medium-sized enterprises.

    We should also be aware of less technically challenging objectives, such as identifying available materials with specific properties that could bolster a supply chain and/or replace current materials, driven by carbon footprint, availability, price or toxicity/environmental concerns.

  • Pillar 4. Research Institutions, research groups and laboratories – the community of institutions with strong research profiles and a degree of data management and computing capabilities.
    Broaden access to and use of physical data sources across a wider variety of businesses. Involve the Catapult network as a conduit to these databases for companies seeking high-quality physical data to move their technology more rapidly along the TRL spectrum towards commercialisation. Make companies aware of the potential of the “descending route of data-intensive research” as a means of identifying new challenges (and market opportunities) that could be addressed by the physical properties of their current products, or where they have the technical background to develop completely new offerings to meet a new challenge.

Initial list of companies that could be approached:

  • GSK
  • AstraZeneca
  • Linde
  • UCB Celltech
  • Thermo Fisher
  • Inovyn (part of Ineos)
  • Unilever
  • Croda International
  • Cargill
  • Roche
  • Janssen
  • Novartis
  • BASF
  • PZ Cussons
  • Air Products
  • Victrex
  • Siemens
  • Reckitt-Benckiser
  • Syngenta
  • Dow Silicones
  • Argent Energy
  • Alkegen
  • PQ Silicas
  • M&I Materials
  • BAE Systems
  • Beckers
  • DuPont Teijin
  • Akzo Nobel
  • Infineum
  • NSG Pilkington
  • Tata Steel
  • PragmatIC Semiconductor
  • Johnson Matthey
  • Deragallera
  • HyNet North West
  • Neo Chemicals and Oxides
  • Ceres Power
  •  …