I study the role of machine representation of knowledge and hypotheses in the life sciences, drug discovery, and toxicology, involving chemometrics and semantic web technologies. In the past, I have applied this research in QSAR, crystallography, and metabolomics. Open source programming is my main hobby, resulting in participation in, amongst many others, the Chemistry Development Kit, Bioclipse, and BridgeDb. See also my biography and below.

My Group

Being part of the BiGCaT department, I have now established my own research group. Future vacancies may be available.

Group Members

Past Members

Other students that I supervised include Harm (6-month M.Sc. practical, University of Nijmegen), Niels Out and Rob Schellhorn (3-month Programmeerzomer projects), and Alexandr Goncearenco, Rianne Feijten, and Answesha Dutta (3-month Google Summer of Code projects).


The research in my group is about reducing the error introduced in chemical data analysis by the analysis itself, a field called Molecular Chemometrics. This involves the study of errors introduced by improper handling of chemical knowledge, improper representation of the problem, and the statistical analysis method used. My group's approach to reducing that error involves explicit markup of knowledge using semantic technologies, development of new representation methods for molecular information, and the use of statistics to find, visualize, and validate new patterns.

In practice, however, proving my point requires that others adopt such methods too. As a result, I am also heavily involved in Applied Cheminformatics. This research field is about getting sound cheminformatics used in practice, and raising awareness of the problems studied in Molecular Chemometrics among people in other research fields, such as bioinformatics, metabolomics, and QSAR.

Each part of my research is exemplified by one or two key or recent papers.

Semantic Technologies

Technologies studied here include markup languages, like Chemical Markup Language, and the family of semantic approaches around the Resource Description Framework. These methods are being studied for use in cheminformatics and exchange of molecular data. This work has resulted in the development of CMLRSS, QSAR-ML, the Blue Obelisk Descriptor Ontology (BODO), and the CHEMINF ontology.

  • Hastings, J., Chepelev, L., Willighagen, E., Adams, N., Steinbeck, C., and Dumontier, M. (2011). The Chemical Information Ontology: Provenance and Disambiguation for Chemical Data on the Biological Semantic Web. PLoS ONE 6, e25513. doi:10.1371/journal.pone.0025513.
  • Willighagen, E. L., Alvarsson, J., Andersson, A., Eklund, M., Lampa, S., Lapins, M., Spjuth, O., and Wikberg, J. E. (2011). Linking the Resource Description Framework to cheminformatics and proteochemometrics. Journal of Biomedical Semantics 2, S6. doi:10.1186/2041-1480-2-S1-S6.

Molecular Representation

This branch of the research focuses on the representation of molecular data, such that statistical methods can extract the most information from the data or generate the best prediction models. This research resulted in the development of a new descriptor for molecular crystal structures, the debunking of the use of NMR spectra in some modeling approaches, and the development of applied tools, like the Chemistry Development Kit.

  • Willighagen, E. L., Wehrens, R., Verwer, P., De Gelder, R., and Buydens, L. M. C. (2005). Method for the computational comparison of crystal structures. Acta Crystallographica Section B: Structural Crystallography and Crystal Chemistry 61, 29-36. doi:10.1107/S0108768104028344.


Statistical methods help us find, understand, and visualize complex patterns in our molecular data. This research focuses on improving the expert validation of our statistical models by linking the models to external data. Doing that linking accurately brings us back to the semantic technologies. This work resulted in the application of supervised Self-Organizing Maps to classification problems with multiple end points.

  • Willighagen, E. L., Wehrens, R., Melssen, W., De Gelder, R., and Buydens, L. M. C. (2007). Supervised Self-Organizing Maps in Crystal Property and Structure Prediction. Crystal Growth & Design 7, 1738-1745. doi:10.1021/cg060872y.

Applied Cheminformatics

This part of the research is about getting the above methods used in other scientific fields. This involves the development of tools that (unfortunately) hide much of the research from the above three fields, so that they can be easily used elsewhere. This research, in particular, is well accepted by the scientific community. Solutions here include the Chemistry Development Kit, Jmol, Bioclipse, and Oscar, all of which make more fundamental research in Molecular Chemometrics more accessible.

  • Steinbeck, C., Han, Y., Kuhn, S., Horlacher, O., Luttmann, E., and Willighagen, E. (2003). The Chemistry Development Kit (CDK): an open-source Java library for Chemo- and Bioinformatics. Journal of Chemical Information and Computer Sciences 43, 493-500. doi:10.1021/ci025584y.

Research Topics

The topics that this research covers all involve small molecules and nanomaterials, and are currently directed at toxicology and the life sciences in general. Research is ongoing in the fields of drug discovery; QSAR/QSPR, among others in toxicology and nanotoxicology (and property prediction in general); and metabolite and small-compound pathways. Past topics include crystallography and polymorph prediction.

Research Collaborations

Collaborations are ongoing with: Dr Christoph Steinbeck (EBI/UK) on the CDK, Bioclipse, and JChemPaint; Prof. Bengt Fadeel, Dr Hanna Karlsson, and Prof. Roland Grafström on predictive toxicology; Dr Steffen Neumann (Halle/Germany) on open source metabolomics; Dr Ola Spjuth (Uppsala University) on Bioclipse and a Swedish data warehouse for toxicity; Dr Nina Jeliazkova and others of the OpenTox.org community; many people in the CDK community; various people in the HCLS interest group; and others.

EU Projects

I am currently involved in the eNanoMapper and Open PHACTS projects. The first is part of the European NanoSafety Cluster. Previously, I worked in the ToxBank project and as scientific advisor for the OpenTox project on the use of semantic web technologies in drug discovery.


For now, please find my publication list at Google Scholar, CiteULike, ORCID:0000-0001-7542-0286, Mendeley and the less populated researchid:C-6136-2008 and dai:308108485.


Post Address

Maastricht University
Department of Bioinformatics - BiGCaT
dr E.L. Willighagen
P.O. Box 616
UNS50 Box19
NL-6200 MD Maastricht

Visiting Address

Department of Bioinformatics - BiGCaT
Universiteitssingel 50
Room 1.314

Web Addresses

Email: egon.willighagen@Gmail. Blog: chem-bla-ics. Social Networking: egonw@SourceForge, egonw@LinkedIn, egonw@FriendFeed, egonw@GitHub, egonwillighagen@Twitter, egonw@CiteULike, 0000-0001-7542-0286@ORCID, egon-willighagen@Mendeley, chemblaics@Identi.ca, egonwillighagen@Lanyrd