Introduction
At the Dagstuhl Computational Metabolomics meeting, there was a session about core structures.
The research question is how to strike the right balance when representing multiple
structures that all match the measured data. ChemAxon Extended SMILES
(CxSMILES) came up as one solution [1,2].
This repository contains code using the Chemistry Development Kit [3,4,5]
for various tasks around this question. The following sections discuss these tasks.
CxSMILES in Wikidata
Prior to the Computational Metabolomics 2022 meeting, a proposal was made to add a Wikidata
property for CxSMILES; it was approved during the meeting and the property was created as
P10718. The growth in the use of CxSMILES can be monitored with a SPARQL query.
At the time of writing, it is mostly used for polymers and groups of compounds.
The SPARQL query https://w.wiki/58rF returns a list of Wikidata items with a
CxSMILES value.
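For readers who want to script this, the sketch below builds Wikidata Query Service (WDQS) request URLs for items with a CxSMILES (P10718) value, plus a simple item count that could be re-run over time to track growth. It assumes the public endpoint https://query.wikidata.org/sparql with `query` and `format=json` parameters. The two queries, the helper names `build_query_url` and `fetch_rows`, and the User-Agent string are illustrative assumptions and are not necessarily identical to the query behind https://w.wiki/58rF.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Public Wikidata Query Service (WDQS) SPARQL endpoint.
WDQS_ENDPOINT = "https://query.wikidata.org/sparql"

# Illustrative query (assumption: not necessarily the one behind https://w.wiki/58rF).
# P10718 is the Wikidata property for CxSMILES; WDQS predefines the wdt: prefix.
CXSMILES_QUERY = """SELECT ?item ?cxsmiles WHERE {
  ?item wdt:P10718 ?cxsmiles .
}"""

# Counts items instead of listing them; re-running this periodically
# gives a simple way to follow the growth in use of the property.
COUNT_QUERY = """SELECT (COUNT(DISTINCT ?item) AS ?count) WHERE {
  ?item wdt:P10718 ?cxsmiles .
}"""


def build_query_url(query: str, fmt: str = "json") -> str:
    """Hypothetical helper: GET URL that runs `query` on WDQS, results in `fmt`."""
    return WDQS_ENDPOINT + "?" + urlencode({"query": query, "format": fmt})


def fetch_rows(query: str, user_agent: str = "cxsmiles-example/0.1") -> list:
    """Hypothetical helper: run `query` on WDQS and flatten the JSON bindings.

    Requires network access. WDQS asks clients to send a descriptive
    User-Agent; the default value here is only a placeholder.
    """
    request = Request(build_query_url(query), headers={"User-Agent": user_agent})
    with urlopen(request, timeout=60) as response:
        data = json.load(response)
    # In SPARQL JSON results, each binding maps a variable name
    # to an object such as {"type": "literal", "value": "..."}.
    return [
        {name: cell["value"] for name, cell in binding.items()}
        for binding in data["results"]["bindings"]
    ]


if __name__ == "__main__":
    # Offline part: print a URL that can be opened in a browser.
    print(build_query_url(CXSMILES_QUERY))
```

Calling `fetch_rows(COUNT_QUERY)` would return a single row whose `count` value is a string. Building the URL separately from fetching keeps the query easy to inspect or paste into the WDQS web interface.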
CDKDepict Gadget
A CDKDepict Gadget is available for Wikidata that changes the Wikidata
interface to depict the CxSMILES.
References
- Alexandrov T, Böcker S, Dorrestein P, Schymanski E. Computational Metabolomics: Identification, Interpretation, Imaging (Dagstuhl Seminar 17491). 2018. doi:10.4230/DAGREP.7.12.1
- Ludwig M, Neumann S, Willighagen E. Cheminformatics for Users. In: Computational Metabolomics: From Cheminformatics to Machine Learning (Dagstuhl Seminar 20051). 2020.
- Steinbeck C, Han Y, Kuhn S, Horlacher O, Luttmann E, Willighagen E. The Chemistry Development Kit (CDK): an open-source Java library for Chemo- and Bioinformatics. J Chem Inf Comput Sci. 2003 Feb 11;43(2):493–500. doi:10.1021/CI025584Y
- Steinbeck C, Hoppe C, Kuhn S, Floris M, Guha R, Willighagen E. Recent Developments of the Chemistry Development Kit (CDK) - An Open-Source Java Library for Chemo- and Bioinformatics. Curr Pharm Des [Internet]. 2006 Jun 1;12(17):2111–20. Available from: https://cdk.github.io/cdk-paper-2/ doi:10.2174/138161206777585274
- Willighagen E, Mayfield JW, Alvarsson J, Berg A, Carlsson L, Jeliazkova N, et al. The Chemistry Development Kit (CDK) v2.0: atom typing, depiction, molecular formulas, and substructure searching. J Cheminform. 2017 Jun 6;9(1). doi:10.1186/S13321-017-0220-4