Virtual room makes knowledge out of data

A team at Purdue University in the US is developing a virtual 'data cave' for scientific discovery that uses high-performance computing and artificial intelligence software to display information and interact with researchers.

One of its earliest applications is expected to be in the pharmaceutical and chemical sectors, according to the team behind the work.

"If you were a chemist, you could walk right up to this display and move molecules and atoms around to see how the changes would affect a formulation or a material's properties," said James Caruthers, a professor of chemical engineering at Purdue.

The idea is similar to one employed in the recent science-fiction thriller Minority Report, in which actor Tom Cruise plays a detective who solves future crimes by being immersed in a virtual space in which he rapidly accesses all the relevant information about the identity, location and associates of the potential victim.

The method represents a fundamental shift from more conventional techniques in computer-aided scientific discovery, according to the researchers.

"Most current approaches to computer-aided discovery centre on mining data in a process that assumes there is a nugget of gold that needs to be found in a sea of irrelevant information," according to Caruthers.

But while this data-mining approach is appropriate for some scientific discovery problems, human scientists tend to work in a different way, using a process known as knowledge discovery.

"This is more like sifting through a warehouse filled with small gears and levers, none of which is particularly valuable by itself. After appropriate assembly, however, a Rolex watch emerges from the disparate parts," he said.

The system is specifically designed to cope with the vast amount of data being generated through high-throughput experimentation in many areas of research, including work aimed at creating new drugs, fuel additives, catalysts and rubber compounds.

Their method, dubbed 'discovery informatics', enables researchers to test new theories on the fly and literally see how well their concepts might work in real time via a three-dimensional display, said Venkat Venkatasubramanian, another professor of chemical engineering at Purdue working to develop the new system.

Discovery informatics depends on a two-part repeating cycle, made up of a "forward model" and an "inverse process", and on two types of artificial intelligence software: hybrid neural networks and genetic algorithms.

The forward model combines fundamental knowledge and rules of thumb with neural networks - software that mimics how the human brain thinks - to tell researchers how a particular material will perform.

In the forward model, a researcher postulates a molecular structure or a product's formulation and then wants to predict what properties that structure or formulation will have. The inverse process is just the opposite: Researchers enter the properties they are looking for, and the system gives them a molecular structure or formulation that will likely have those properties, said Nicholas Delgass, also a professor of chemical engineering at Purdue. The inverse process cannot begin until the forward model is completed because the former depends on information in the model.

Venkatasubramanian said: "The product design problem is this: I want some material that would have the following mechanical, chemical, electrical properties and so on.

"I know what properties I want in order to get my job done, but I don't know what material, what molecular combinations, will give me that. You know the answer, but you are looking for the question."

The inverse process may use genetic algorithms, software programs that mimic Darwinian survival of the fittest to find the best candidates. The algorithms select the best materials and eliminate the poor performers, then generate 'mutations' of the best materials to create even better versions over time. The software also determines the chemical structures of those mutations.
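The select-and-mutate idea described above can be illustrated with a small sketch. This is not the Purdue team's software: the forward_model function, its coefficients, the three-component formulation and every parameter below are hypothetical stand-ins chosen only to make the example runnable.

```python
import random


def forward_model(formulation):
    """Hypothetical forward model: maps three component fractions
    to one predicted property value. The coefficients are invented."""
    a, b, c = formulation
    return 2.0 * a + 0.5 * b - 1.0 * c


def fitness(formulation, target):
    # The closer the predicted property is to the target, the fitter.
    return -abs(forward_model(formulation) - target)


def mutate(formulation, rng, scale=0.05):
    # Nudge each component at random, clamped to the range [0, 1].
    return tuple(min(1.0, max(0.0, x + rng.gauss(0.0, scale)))
                 for x in formulation)


def inverse_search(target, generations=60, pop_size=40, keep=10, seed=0):
    """Search for a formulation whose predicted property matches target."""
    rng = random.Random(seed)
    population = [tuple(rng.random() for _ in range(3))
                  for _ in range(pop_size)]
    for _ in range(generations):
        # Keep the best performers and discard the rest.
        population.sort(key=lambda f: fitness(f, target), reverse=True)
        survivors = population[:keep]
        # Refill the population with mutated copies of the survivors.
        children = [mutate(rng.choice(survivors), rng)
                    for _ in range(pop_size - keep)]
        population = survivors + children
    return max(population, key=lambda f: fitness(f, target))


best = inverse_search(target=1.2)
print(best, forward_model(best))
```

Because the survivors are carried over unchanged into each new generation, the best candidate never gets worse from one generation to the next. A real system would also use crossover between candidates and a far richer representation of chemical structure.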

The resulting formulas are tested and used to improve the forward model, and the cycle starts over again, progressively creating better and better solutions.
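The repeating cycle can be sketched in the same spirit. Everything here is an assumption made for illustration: lab_test stands in for a real experiment, the forward model is plain linear interpolation over the points tested so far rather than a hybrid neural network, and the inverse step is a simple scan over candidate values rather than a genetic algorithm.

```python
def lab_test(x):
    """Hypothetical stand-in for a laboratory measurement."""
    return x * x


def forward_model(data, x):
    """Predict the property at x by linear interpolation of tested points."""
    points = sorted(data.items())
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if x0 <= x <= x1:
            return y0 + (y1 - y0) * (x - x0) / (x1 - x0)
    raise ValueError("x lies outside the tested range")


def inverse_process(data, target, candidates):
    """Return the candidate whose predicted property is closest to target."""
    return min(candidates, key=lambda x: abs(forward_model(data, x) - target))


def discovery_cycle(target, cycles=5):
    # Start from two tested formulations at the ends of the range.
    data = {0.0: lab_test(0.0), 2.0: lab_test(2.0)}
    candidates = [i / 1000 for i in range(2001)]
    history = []
    for _ in range(cycles):
        proposal = inverse_process(data, target, candidates)  # inverse step
        measured = lab_test(proposal)                         # test the proposal
        data[proposal] = measured                             # improve the model
        history.append((proposal, measured))
    return history


history = discovery_cycle(target=2.0)
for proposal, measured in history:
    print(proposal, measured)
```

Each measurement adds a point to the data behind the forward model, so the next proposal is made from a better prediction. This mirrors the test-and-refine loop the researchers describe, in a deliberately simplified form.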

"The opportunities are enormous for engineers who work in product design, which is now largely done as an art form by formulation chemists," Caruthers said. "We want to retain the creative aspects that can only come from the human mind, while reducing the amount of guesswork now needed to create new catalysts and other materials."

With conventional methods, it might take several years and thousands of tests before hitting on the right formulation, whereas discovery informatics dramatically speeds up the process by using a computer to sample potential materials and requires a fraction of the usual number of laboratory experiments.

The method will be tested in a new Centre for Catalyst Design. Catalysts in US industry account for billions of dollars in annual business revenues. That means even small improvements in catalyst performance can result in significant increases in profits, Delgass said.

Using the system, data is converted into interactive images and visualised on a three-dimensional, 12-foot-wide, 7-foot-high display. It is displayed in stereo, with the left and the right eye each getting its own picture, so you get a 3-D depth effect. This allows researchers to look at an entire problem, including chemical and atomic structures, graphs and charts.

"Discovery requires human beings making intuitive leaps," Caruthers said. "You try one thing. It doesn't work, you try something else. Sometimes you go off in an entirely new direction. But this process is very inefficient. What we are doing is enhancing the efficiency of this process, assisting the intuitive human mind by providing massive data and computing power."
