chempyformatics

Input/Output

Line Notations

Another common input mechanism in cheminformatics is the line notation. Several line notations have been proposed, including the Wiswesser Line Notation (WLN) [1] and the SYBYL Line Notation (SLN) [2], but the most popular is SMILES [3]. There is an open standard for this format, called OpenSMILES, available at http://www.opensmiles.org/.

SMILES

The CDK can both read and write SMILES, or at least a significant subset of the line notation. You can parse a SMILES string into an IAtomContainer with the SmilesParser class. The constructor of the parser takes an IChemObjectBuilder (see Section ??) because the parser needs to know which implementation of the CDK interfaces to use when it creates objects. This example uses the DefaultChemObjectBuilder:

Script code/ReadSMILES.py

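# Assumes a JVM bridge (e.g. JPype with jpype.imports) is running with the CDK on the classpath
from org.openscience.cdk import DefaultChemObjectBuilder as Builder
from org.openscience.cdk.smiles import SmilesParser
sp = SmilesParser(Builder.getInstance())
mol = sp.parseSmiles("CC(=O)OC1=CC=CC=C1C(=O)O")  # aspirin
print(f"Aspirin has {mol.getAtomCount()} atoms.")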

This outputs the following (the hydrogens in this SMILES are implicit, so only heavy atoms are counted):

Aspirin has 13 atoms.
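To make the role of the builder more concrete, here is a minimal pure-Python sketch. It is not the CDK API: the `Atom`, `AtomContainer`, `SimpleBuilder` and `ToySmilesParser` classes are hypothetical and exist only for illustration. The toy parser never instantiates atoms itself; it asks the builder it was given, which mirrors why the CDK SmilesParser constructor takes an IChemObjectBuilder. It only recognises atom symbols (organic-subset atoms and bracket atoms) and ignores bonds, ring closures, charges and stereochemistry. That is enough to reproduce the 13 heavy atoms counted for aspirin.

```python
import re
from dataclasses import dataclass, field


# Hypothetical stand-ins for CDK's IAtom / IAtomContainer (not the CDK API).
@dataclass
class Atom:
    symbol: str


@dataclass
class AtomContainer:
    atoms: list = field(default_factory=list)

    def add_atom(self, atom):
        self.atoms.append(atom)

    def get_atom_count(self):
        return len(self.atoms)


class SimpleBuilder:
    """Hypothetical builder: decides which classes the parser instantiates."""

    def new_container(self):
        return AtomContainer()

    def new_atom(self, symbol):
        return Atom(symbol)


# Bracket atoms first, then two-letter organic-subset symbols (Cl, Br) before
# one-letter ones, plus lowercase aromatic atoms. Bonds, parentheses and
# ring-closure digits never match, so only atoms are collected.
ATOM_TOKEN = re.compile(r"\[[^\]]+\]|Cl|Br|[BCNOSPFI]|[bcnops]")


class ToySmilesParser:
    """Counts atoms only; ignores bonds, rings, charges and stereochemistry."""

    def __init__(self, builder):
        self.builder = builder  # all object creation is delegated to this

    def parse_smiles(self, smiles):
        mol = self.builder.new_container()
        for token in ATOM_TOKEN.findall(smiles):
            mol.add_atom(self.builder.new_atom(token))
        return mol


parser = ToySmilesParser(SimpleBuilder())
mol = parser.parse_smiles("CC(=O)OC1=CC=CC=C1C(=O)O")
print(f"Aspirin has {mol.get_atom_count()} atoms.")  # Aspirin has 13 atoms.
```

Passing a different builder changes which classes the parser creates without touching the parser itself. This is the same design choice that lets the CDK swap DefaultChemObjectBuilder for another IChemObjectBuilder implementation.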

References

  1. Wiswesser WJ. How the WLN began in 1949 and how it might be in 1999. JCICS. 1982 May 1;22(2):88–93. doi:10.1021/CI00034A005
  2. Homer RW, Swanson J, Jilek RJ, Hurst T, Clark RD. SYBYL line notation (SLN): a single notation to represent chemical structures, queries, reactions, and virtual libraries. JCIM. 2008 Dec 1;48(12):2294–307. doi:10.1021/CI7004687
  3. Weininger D. SMILES, a chemical language and information system. 1. Introduction to methodology and encoding rules. JCICS. 1988 Feb 1;28(1):31–6. doi:10.1021/CI00057A005. Available from: http://organica1.org/seminario/weininger88.pdf