cdk-cxsmiles

Introduction

At the Dagstuhl Computational Metabolomics meeting there was a session about core structures. The research question is how to strike the right balance when representing multiple structures that match the measured data. ChemAxon Extended SMILES, or CxSMILES, came up as one solution [1,2]. This repository contains code that uses the Chemistry Development Kit [3,4,5] for various tasks around this question. The following sections discuss them.
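To make the notation concrete: a CxSMILES string is an ordinary SMILES string followed by whitespace and an extension block enclosed in `|` characters. The following is a minimal sketch, using only the Java standard library, of how the two parts can be separated. The class `CxSmilesParts` and the example string are hypothetical and not part of this repository or of the CDK; the sketch also assumes that no raw `|` character occurs inside the extension block. For real work, a full parser such as the one in the CDK is the better choice.

```java
import java.util.Optional;

/** Hypothetical helper: splits a CxSMILES string into its SMILES core and its extension block. */
final class CxSmilesParts {
    final String core;                 // the plain SMILES part
    final Optional<String> extension;  // text between the two '|' characters, if present

    private CxSmilesParts(String core, Optional<String> extension) {
        this.core = core;
        this.extension = extension;
    }

    static CxSmilesParts split(String cxsmiles) {
        String s = cxsmiles.trim();

        // The SMILES core ends at the first whitespace character.
        int ws = -1;
        for (int i = 0; i < s.length(); i++) {
            if (Character.isWhitespace(s.charAt(i))) {
                ws = i;
                break;
            }
        }
        if (ws < 0) {
            return new CxSmilesParts(s, Optional.empty());
        }

        String core = s.substring(0, ws);
        String rest = s.substring(ws).trim();

        // Assumption: the extension block is delimited by the first two '|' characters.
        if (rest.startsWith("|")) {
            int close = rest.indexOf('|', 1);
            if (close > 0) {
                return new CxSmilesParts(core, Optional.of(rest.substring(1, close)));
            }
        }
        // Anything else after the core (for example a title) is ignored in this sketch.
        return new CxSmilesParts(core, Optional.empty());
    }
}
```

For example, `CxSmilesParts.split("*CCO |$_R1;;;$|")` yields the core `*CCO` and the extension `$_R1;;;$`, which in this illustrative string labels the first atom as an R-group.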

CxSMILES in Wikidata

Prior to the Computational Metabolomics 2022 meeting, a proposal was made to add a Wikidata property for CxSMILES. It was approved during the meeting and created as P10718. The growth in the use of CxSMILES can be monitored with a SPARQL query: https://w.wiki/58rF returns a list of Wikidata items with a CxSMILES value. At the time of writing, the property is mostly used for polymers and groups of compounds.
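The same kind of lookup can also be done programmatically. The sketch below builds a request against the public Wikidata Query Service endpoint using only the Java standard library (Java 11 or later). The SPARQL text is an assumed minimal query for items with a P10718 value, not necessarily the query behind the short link above, and the class name, `LIMIT` handling, and User-Agent string are hypothetical.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper: lists Wikidata items that have a CxSMILES (P10718) value. */
final class CxSmilesWikidataQuery {
    static final String ENDPOINT = "https://query.wikidata.org/sparql";

    /** Assumed minimal query; the query behind the short link may differ. */
    static String sparql(int limit) {
        return "SELECT ?item ?cxsmiles WHERE { ?item wdt:P10718 ?cxsmiles } LIMIT " + limit;
    }

    /** Builds the GET request URI with the URL-encoded query. */
    static URI requestUri(int limit) {
        String encoded = URLEncoder.encode(sparql(limit), StandardCharsets.UTF_8);
        return URI.create(ENDPOINT + "?format=json&query=" + encoded);
    }

    /** Sends the request and returns the raw JSON response body. */
    static String fetch(int limit) throws Exception {
        HttpRequest request = HttpRequest.newBuilder(requestUri(limit))
                .header("Accept", "application/sparql-results+json")
                // Hypothetical User-Agent; replace it with one that identifies your tool.
                .header("User-Agent", "cxsmiles-example/0.1")
                .build();
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        return response.body();
    }
}
```

Calling `CxSmilesWikidataQuery.fetch(10)` should return a JSON document with up to ten item and CxSMILES pairs, provided the endpoint is reachable.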

CDKDepict Gadget

There is a CDKDepict Gadget available for Wikidata that changes the Wikidata interface to depict the CxSMILES:

Screenshot of the CDKDepict Gadget entry.

References

  1. Alexandrov T, Böcker S, Dorrestein P, Schymanski E. Computational Metabolomics: Identification, Interpretation, Imaging (Dagstuhl Seminar 17491). 2018. doi:10.4230/DAGREP.7.12.1
  2. Ludwig M, Neumann S, Willighagen E. Cheminformatics for Users. In: Computational Metabolomics: From Cheminformatics to Machine Learning (Dagstuhl Seminar 20051). 2020.
  3. Steinbeck C, Han Y, Kuhn S, Horlacher O, Luttmann E, Willighagen E. The Chemistry Development Kit (CDK): an open-source Java library for Chemo- and Bioinformatics. JCICS. 2003 Feb 11;43(2):493–500. doi:10.1021/CI025584Y
  4. Steinbeck C, Hoppe C, Kuhn S, Floris M, Guha R, Willighagen E. Recent Developments of the Chemistry Development Kit (CDK) - An Open-Source Java Library for Chemo- and Bioinformatics. Curr Pharm Des [Internet]. 2006 Jun 1;12(17):2111–20. Available from: https://cdk.github.io/cdk-paper-2/ doi:10.2174/138161206777585274
  5. Willighagen E, Mayfield JW, Alvarsson J, Berg A, Carlsson L, Jeliazkova N, et al. The Chemistry Development Kit (CDK) v2.0: atom typing, depiction, molecular formulas, and substructure searching. J Cheminform. 2017 Jun 6;9(1). doi:10.1186/S13321-017-0220-4