ChEMBL database as linked open data

The eNanoMapper database for nanomaterial safety information

Bioclipse-R: integrating management and visualization of life science data with statistical analysis

WikiPathways WP2513: Nanoparticle triggered regulated necrosis

About me

I study the role of machine representation of knowledge and hypothesis in life sciences, metabolomics, drug discovery, and toxicology, involving cheminformatics, chemometrics and semantic web technologies. In the past, I have also applied this research to QSAR and crystallography. Open source programming and Open science are also my main hobby, resulting in participation in, amongst many others, the Chemistry Development Kit, WikiPathways, Bioclipse, and BridgeDb.


Egon Willighagen studied Chemistry at the University of Nijmegen in the Netherlands (1993-2001) where he did a minor in organic chemistry at the Department of Supramolecular Chemistry (Prof. R. Nolte) studying the supramolecular agglomeration of amphiphiles in relation to their DNA transfection properties, and a major in chemometrics at the Department of Analytical Chemistry (Prof. L.M.C. Buydens) studying the unsupervised classification of polymorphic organic crystal structures.

He continued his studies on the relation between the representation of molecular knowledge and machine learning during his PhD at the same Department of Analytical Chemistry at what is now called Radboud University (2002-2006). Taking advantage of his extracurricular research in cheminformatics, he worked on methods to optimize the amount of information gained from pattern recognition methods in the fields of Quantitative Structure-Activity Relationships (QSAR), supervised clustering and prediction of properties of organic crystal structures, the general reduction of error introduced in the exchange of chemical data, and on improving the reproducibility of data analysis in this field in general. This research was partly performed at the European Bioinformatics Institute in Hinxton (Prof. J. Thornton) and the nearby Cambridge University in the United Kingdom (Prof. P. Murray-Rust, 2003, 3 months). The work resulted in the thesis Representation of Molecules and Molecular Systems in Data Analysis and Modeling (2008, ISBN:978-90-9022806-8).

After his PhD research he continued his efforts on reducing the error introduced by data aggregation and cheminformatics toolkits during a year at the Cologne University Bioinformatics Institute in Germany (Dr C. Steinbeck) (2006-2007), participating in the bioinformatics education (Homology Modeling). He then returned to the Netherlands and spent a year at Wageningen University and Research Centre, embedded in the Netherlands Metabolomics Center, where he worked on the use of cheminformatics in accurate metabolite identification (2007-2008).

In 2008 Willighagen started a post-doc at the Department of Pharmaceutical Biosciences (Prof. E. Brittebo) strengthening the cheminformatics knowledge of the Bioclipse and Proteochemometrics Group (Prof. J. Wikberg) at Uppsala University. His research here focused on linking his cheminformatics research with pharmaceutical and drug discovery research. The goal was to make the data modeling more insightful by scaling up data analysis with e-Science approaches and by using semantic markup languages and ontologies, which make it possible to link statistical models to external, complementary data sources. This allows linking of patterns between data types and between domains.

After a post-doc in the area of text mining for chemistry back at the Unilever Center (Prof. P. Murray-Rust) in the last three months of 2010, Willighagen started a post-doc in 2011 at the Institute of Environmental Medicine of the Karolinska Institute (Prof. B. Fadeel and Prof. R. Grafström), working on applications of cheminformatics in toxicology.

In 2012 Willighagen returned to the Netherlands, and started a post-doc position at Maastricht University, to work on the EU/IMI project Open PHACTS. He acquired the 4M euro project eNanoMapper (with 800k euro to fund research in Maastricht) and started as assistant professor in the same group in 2014.

My Team

Being part of the BiGCaT Department of Bioinformatics of Prof. Chris Evelo, I have now established my own research group. Future vacancies may be available.

Group Members

Past Members

Past Undergraduate and Master Students

  • Miguel Correa (Maastricht University; M.Sc. student 2015; from Wageningen University)

  • Esmee Eeltink (Maastricht University; M.Sc. student 2014; from the University of Amsterdam)

  • Patricia Zaandam (Maastricht University; M.Sc. student 2014)

  • Ann-Sofie Andersson (Uppsala University; M.Sc. student 2010; blog; report:urn:nbn:se:uu:diva-155184)

  • Samuel Lampa (Uppsala University, M.Sc. student 2009-2010; blog; report:urn:nbn:se:uu:diva-146738)

Other students that I supervised include Harm (6 month M.Sc. practical, University of Nijmegen), Niels Out and Rob Schellhorn (3 month Programmeerzomer projects), and Alexandr Goncearenco, Rianne Feijten, and Answesha Dutta (3 month Google Summer of Code projects).


The topics that this research covers all involve small molecules and nanomaterials, and are currently directed at metabolite and small compound pathways, nanotoxicology, and life sciences in general. Research is applied to the fields of drug discovery and QSAR/QSPR, among others. Past topics include crystallography and polymorph prediction.

The research I am personally interested in is about reducing the error introduced in (chemical or molecular) data analysis by the analysis itself, a field termed Molecular Chemometrics. This involves the study of errors introduced by improper handling of chemical knowledge, improper representation of the problem, and the statistical analysis method used. The solution my group is using to reduce that error involves explicit markup of knowledge using semantic technologies, development of new representation methods for molecular information, and the use of statistics to find, visualize, and validate new patterns.

Practically, however, this also means that to prove my point, I need others to adopt such methods too. As a result, I am also heavily involved in Applied Cheminformatics. This research field is about getting sound cheminformatics used in practice, and raising awareness about the problems in Molecular Chemometrics among people in other research fields, such as bioinformatics, metabolomics, and QSAR.

Each part of my research is exemplified by one or two key or recent papers.

Semantic Technologies

Technologies studied here include markup languages, like Chemical Markup Language, and the family of semantic approaches around the Resource Description Framework. These methods are being studied for use in cheminformatics and exchange of molecular data. This work has resulted in the development of CMLRSS, QSAR-ML, the Blue Obelisk Descriptor Ontology (BODO), and the CHEMINF ontology.

  • Hastings, J., Chepelev, L., Willighagen, E., Adams, N., Steinbeck, C., and Dumontier, M. (2011). The Chemical Information Ontology: Provenance and Disambiguation for Chemical Data on the Biological Semantic Web. PLoS ONE 6, e25513. doi:10.1371/journal.pone.0025513
  • Willighagen, E. L., Alvarsson, J., Andersson, A., Eklund, M., Lampa, S., Lapins, M., Spjuth, O., and Wikberg, J. E. (2011). Linking the Resource Description Framework to cheminformatics and proteochemometrics. Journal of Biomedical Semantics 2, S6. doi:10.1186/2041-1480-2-S1-S6

Molecular Representation

This branch of the research focuses on the representation of molecular data, such that statistical methods can extract the most information from the data or generate the best prediction models. This research resulted in the development of a new descriptor for molecular crystal structures, the debunking of NMR spectra for some modeling approaches, and the development of applied tools, like the Chemistry Development Kit.

  • Willighagen, E. L., Wehrens, R., Verwer, P., De Gelder, R., and Buydens, L. M. C. (2005). Method for the computational comparison of crystal structures. Acta crystallographica Section B Structural crystallography and crystal chemistry 61, 29-36. doi:10.1107/S0108768104028344


Statistical methods help us find, understand and visualize complex patterns in our molecular data. This research focuses on improving the expert validation of our statistical models, by linking the models to external data. The latter brings us back to the semantic technologies to do that accurately. This work resulted in the application of supervised Self-Organizing Maps to classification problems with multiple end points.

  • Willighagen, E. L., Wehrens, R., Melssen, W., De Gelder, R., and Buydens, L. M. C. (2007). Supervised Self-Organizing Maps in Crystal Property and Structure Prediction. Crystal Growth & Design 7, 1738-1745. doi:10.1021/cg060872y

Applied Cheminformatics

This part of the research is about getting the above methods used in other scientific fields. This involves the development of tools that (unfortunately) hide much of the research from the above three fields, so that they can be easily used in other research fields. This research, particularly, is well accepted by the scientific community. Solutions here include the Chemistry Development Kit, Jmol, Bioclipse, and Oscar, which all make more fundamental research in Molecular Chemometrics more accessible.

  • Steinbeck, C., Han, Y., Kuhn, S., Horlacher, O., Luttmann, E., and Willighagen, E. (2003). The Chemistry Development Kit (CDK): an open-source Java library for Chemo- and Bioinformatics. Journal of Chemical Information and Computer Sciences 43, 493-500. doi:10.1021/ci025584y

Research Collaborations

Collaborations are ongoing with: Dr Christoph Steinbeck (EBI/UK) on the CDK, Bioclipse, and JChemPaint; Prof. Bengt Fadeel, Dr Hanna Karlsson, and Prof. Roland Grafström on predictive toxicology; Dr Steffen Neumann (Halle/Germany) on open source metabolomics; Dr Ola Spjuth (Uppsala University) on Bioclipse and a Swedish data warehouse for toxicity; with Dr Nina Jeliazkova and others of the community; with many people in the CDK community; with various people in the HCLS interest group; and others.

EU Projects

I am currently involved in the eNanoMapper and Open PHACTS projects. The first is part of the European NanoSafety Cluster, whose Database Working Group I chair.

Previously, I worked in the ToxBank project and as scientific advisor for the OpenTox project on the use of semantic web technologies in drug discovery.


For now, please find my publication list at Google Scholar, CiteULike, ORCID:0000-0001-7542-0286, Mendeley, ResearchGate, and the less populated researchid:C-6136-2008, and dai:308108485.


Image of the cover of my Groovy Cheminformatics with the Chemistry Development Kit book