Data Mining in Chemistry
Markus Hemmer and Johann Gasteiger, University of Erlangen-Nürnberg, Germany
The analysis of data sets is one of the most important tasks in the investigation of the properties of chemical compounds. In drug design especially, methods are used that characterize complete sets of chemical compounds rather than describing individual molecules. Data mining, i.e., the exploration of large amounts of data in search of consistent patterns, correlations, and other systematic relationships, can be a helpful tool for uncovering "hidden" information in a set of molecules. Finding adequate descriptors for the representation of chemical structures is one of the most important problems in chemical data mining. A special descriptor of chemical structures has already been used successfully in the Internet project "TeleSpec" and will now be applied in a new project.
Data Mining Service - Chemistry (DMSC) is a project for the development of a centralized service for the exploration of chemical data sets. With this service it will be possible to analyze chemical data sets for molecular patterns and systematic relationships using the following methods:
- statistical analyses of individual molecules within a data set;
- self-organizing neural networks for the characterization of complex properties of molecules, e.g., biological activity;
- genetic algorithms for the optimization of fuzzy results of data analysis;
- expert systems that are able to provide proposals for a complex information space that is produced during data analysis.
The results of these data analyses are presented with modern Web technology using VRML, XML, and the special languages CML and MathML, and can be downloaded in several graphical formats as well as in more than 30 different chemical file formats. Data Mining Service - Chemistry opens a new way of chemical information processing, using the newest WWW techniques to visualize complex trends, patterns, and relationships in chemical data sets effectively.
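To illustrate one of the methods named in the list, a self-organizing (Kohonen) neural network, the following is a minimal sketch only. It is not the DMSC implementation: the three-component descriptor vectors, the function names train_som and best_matching_unit, the map size, and all training parameters are assumptions chosen for illustration. The sketch maps each molecule, represented by a numeric descriptor vector, onto a small two-dimensional grid of neurons so that molecules with similar descriptors fall on the same or neighboring neurons.

```python
import math
import random


def best_matching_unit(weights, x):
    """Return the grid position of the neuron whose weights are closest to x."""
    return min(
        weights,
        key=lambda pos: sum((w - v) ** 2 for w, v in zip(weights[pos], x)),
    )


def train_som(data, rows, cols, epochs=200, lr0=0.5, seed=0):
    """Train a small Kohonen map on equal-length descriptor vectors."""
    rng = random.Random(seed)
    dim = len(data[0])
    # One weight vector per neuron, initialized at random in [0, 1).
    weights = {
        (r, c): [rng.random() for _ in range(dim)]
        for r in range(rows)
        for c in range(cols)
    }
    sigma0 = max(rows, cols) / 2.0
    for epoch in range(epochs):
        frac = epoch / epochs
        lr = lr0 * (1.0 - frac)                  # learning rate decays linearly
        sigma = max(sigma0 * (1.0 - frac), 0.5)  # neighborhood radius shrinks
        for x in rng.sample(data, len(data)):    # present samples in random order
            win = best_matching_unit(weights, x)
            for pos, w in weights.items():
                # Gaussian neighborhood around the winning neuron on the grid.
                d2 = (pos[0] - win[0]) ** 2 + (pos[1] - win[1]) ** 2
                h = math.exp(-d2 / (2.0 * sigma * sigma))
                for i in range(dim):
                    w[i] += lr * h * (x[i] - w[i])
    return weights


# Hypothetical toy data: two groups of molecules with 3-component descriptors.
jitter = random.Random(1)
low = [[0.1 + jitter.uniform(-0.05, 0.05) for _ in range(3)] for _ in range(5)]
high = [[0.9 + jitter.uniform(-0.05, 0.05) for _ in range(3)] for _ in range(5)]

weights = train_som(low + high, rows=3, cols=3)
bmu_low = best_matching_unit(weights, low[0])
bmu_high = best_matching_unit(weights, high[0])
```

In a real application the descriptor vectors would be computed from the chemical structures, and the grid positions could be colored by a property such as biological activity to reveal groupings in the data set.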