Teaching computers to read chemical structures

By Dr Matt Wilkinson, 01-Aug-2007

German scientists have developed software that can recognise images of chemical structures and convert them into a computer-readable format, enabling graphical searching of patents and scientific papers.

The new software, dubbed chemoCR, was developed at the Fraunhofer Institute for Algorithms and Scientific Computing (SCAI) in Sankt Augustin, Germany, to allow computers to trawl through scientific papers and patents without needing a trained operator to input the data manually.

The technology will now be further developed and marketed in collaboration with German cheminformatics company InfoChem, which will share the rights to the technology.

The need for manual data input has slowed the building of the structural databases that research chemists commonly use to look up reaction schemes and check whether processes are patented.

"With our software, for the first time, millions of patents can be searched using the chemical information contained in the pictures," said Professor Martin Hofmann-Apitius, director of the Bioinformatics Department at Fraunhofer Institute SCAI.

"This opens new possibilities for the investigation of patent claims on compounds and synthesis procedures; chemoCR addresses one of the most common challenges of the chemical and pharmaceutical industry."

Chemists readily understand the images as chemical structures, but computers see nothing but an accumulation of pixels.

This has made computerised indexing of chemical structures a major challenge for pharmaceutical and chemical companies, as many publications and patents contain structures as pictures rather than as easily searchable data files.

As a result, trained chemists have had to enter the structures depicted in the images into databases by hand to allow structure searching.

"Up to now, structures have been drawn by chemists in India, Russia and other low-wage countries, and entered manually in databases. With chemoCR we can now reconstruct chemical structures faster and more cost-effectively with computers," said Dr Peter Loew, InfoChem's CEO.

The chemoCR system uses pattern recognition techniques to identify the structural formulas from the symbols used to depict various types of bonds, atoms and reaction arrows.

The molecules are then reconstructed in a manner that allows them to be readily searched in a database before the results are validated.
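
The article does not describe how chemoCR works internally, so the following is only a rough, illustrative sketch of the general idea of reconstruction: turning recognised symbols and bond lines into a connection table that a database could index. The `Atom` and `Stroke` types, the `tolerance` parameter and the toy input are all hypothetical and are not taken from chemoCR.

```python
from collections import Counter
from dataclasses import dataclass
from math import dist


@dataclass(frozen=True)
class Atom:  # hypothetical output of a symbol-recognition step
    symbol: str
    x: float
    y: float


@dataclass(frozen=True)
class Stroke:  # hypothetical recognised bond line with its bond order
    x1: float
    y1: float
    x2: float
    y2: float
    order: int = 1


def nearest_atom(atoms, x, y, tolerance):
    """Return the index of the atom closest to (x, y), or None if too far."""
    if not atoms:
        return None
    best = min(range(len(atoms)),
               key=lambda i: dist((atoms[i].x, atoms[i].y), (x, y)))
    close = dist((atoms[best].x, atoms[best].y), (x, y)) <= tolerance
    return best if close else None


def build_graph(atoms, strokes, tolerance=5.0):
    """Snap each stroke's endpoints to atoms; return {(i, j): bond_order}."""
    bonds = {}
    for s in strokes:
        i = nearest_atom(atoms, s.x1, s.y1, tolerance)
        j = nearest_atom(atoms, s.x2, s.y2, tolerance)
        if i is None or j is None or i == j:
            continue  # unattached stroke: left for a later validation step
        bonds[(min(i, j), max(i, j))] = s.order
    return bonds


def heavy_atom_formula(atoms):
    """Element counts, carbon first then alphabetical: a crude search key."""
    counts = Counter(a.symbol for a in atoms)
    order = sorted(counts, key=lambda el: (el != "C", el))
    return "".join(f"{el}{counts[el] if counts[el] > 1 else ''}" for el in order)


# Toy input: a C=O fragment with hydrogens not drawn.
atoms = [Atom("C", 0.0, 0.0), Atom("O", 30.0, 0.0)]
strokes = [Stroke(2.0, 0.0, 28.0, 0.0, order=2)]
bonds = build_graph(atoms, strokes)
formula = heavy_atom_formula(atoms)
```

A real system would also have to cope with implicit hydrogens, stereo bonds, abbreviations and recognition errors, which is presumably where the validation step mentioned above comes in.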

The reconstructions can then be used by all chemical drawing and database programs to allow companies to efficiently add data to their cheminformatics systems.

This will no doubt be a useful tool for scientific publisher Springer-Verlag, which has held a majority shareholding in InfoChem since 1991.
