cdkbook

Groovy Cheminformatics with the Chemistry Development Kit

Edition 2.9-2

Egon L. Willighagen PhD
Long-time CDK developer

© E.L. Willighagen 2011-2024

License: CC-BY-SA 4.0 International

Warning

This book is being open-sourced [1]. This involves converting the LaTeX source to Markdown and updating all scripts so that the automation works well. I have made good progress, but it will take some time to iron things out. If you find issues, please report them here. If you like this book, please give the GitHub repository a star.

Most code snippets in this book are Groovy scripts, but the repository also has some Jupyter notebook examples. If you want to know how any of those examples translates to Python, please file a request here.

Contents

  1. Introduction
  2. Writing CDK Applications
    2.1. A (Very) Basic Java Application
    2.2. Groovy
    2.2.1. Closures
    2.2.2. Grabbing dependencies
    2.3. Python
    2.4. Other environments
    2.4.1. Bioclipse
    2.4.2. Cinfony
    2.4.3. R
  3. Cheminformatics
    3.1. Molecular Representations
    3.2. Chemical Graphs
    3.3. Quantum Chemistry
    3.4. Numerical Representations
    3.5. Chemometrics
  4. Atoms, Bonds and Molecules
    4.1. Atoms
    4.1.1. IElement
    4.1.2. IIsotope
    4.1.3. IAtomType
    4.1.4. Coordinates
    4.2. Bonds
    4.2.1. Electron counts
    4.2.2. Bond stereochemistry
    4.3. Molecules
    4.3.1. Iterating over atoms and bonds
    4.3.2. Neighboring atoms and bonds
    4.4. Molecular Formula
    4.5. Implicit and Explicit Hydrogens
    4.6. Chemical Objects
    4.7. Rings
  5. Stereochemistry
    5.1. Stereochemistry in a flat world
    5.2. Tetrahedral chirality
  6. Salts and other disconnected structures
    6.1. Salts
    6.2. Crystals
  7. Paired and unpaired electrons
    7.1. Lone Pairs
    7.2. Unpaired electrons
  8. Protein and DNA
    8.1. Protein From File
    8.2. Protein From Sequence
    8.3. Strands and Monomers
  9. Reactions
    9.1. A single reaction
    9.2. Reaction from File
    9.2.1. MDL RXN files
    9.3. CMLReact files
  10. From IChemObject to IChemFile
    10.1. IAtomContainerSet
    10.2. IReactionSet and IRingSet
    10.3. IChemModel
    10.4. IChemSequence
    10.5. IChemFile
  11. IChemObjectBuilders
    11.1. Implementations
    11.1.1. The Default Builder
    11.1.2. The Debug Builder
    11.1.3. The Silent Builder
  12. Input/Output
    12.1. File Format Detection
    12.1.1. Custom format matchers
    12.2. Reading from Readers and InputStreams
    12.2.1. Example: Downloading Domoic Acid from PubChem
    12.3. Input Validation
    12.3.1. Reading modes
    12.3.2. Validation
    12.4. Gzipped files
    12.5. Iterating Readers
    12.5.1. MDL SD files
    12.5.2. PubChem Compounds XML files
    12.6. Customizing the Output
    12.6.1. Setting Properties
    12.7. Example: creating unit tests for atom type perception
    12.8. Line Notations
    12.8.1. SMILES
    12.9. Recipes
    12.9.1. MDL molfile (V2000)
    12.9.2. SDF files with properties
  13. Atom types
    13.1. The CDK atom type model
    13.1.1. Hybridization Types
    13.2. Atom type perception
    13.2.1. Single atoms
    13.2.2. Full molecules
    13.2.3. Configuring the Atom
    13.2.4. No atom type perceived?!
    13.3. Sybyl atom types
  14. Graph Properties
    14.1. Partitioning
    14.2. Spanning Tree
    14.3. Ring counts
    14.3.1. Smallest Rings
    14.3.2. All Rings
    14.4. Graph matrices
    14.4.1. Adjacency matrix
    14.4.2. Distance matrix
    14.5. Atom Numbers
    14.5.1. Morgan Atom Numbers
    14.5.2. InChI Atom Numbers
  15. Missing Information
    15.1. Element and Isotope information
    15.1.1. Elements
    15.1.2. Isotopes
    15.2. Reconnecting Atoms
    15.3. Missing Bond Orders
    15.4. Missing Hydrogens
    15.4.1. Implicit Hydrogens
    15.4.2. Explicit Hydrogens
    15.5. 2D Coordinates
    15.6. Unknown Molecular Formula
  16. Depiction
    16.1. Molecules
    16.2. Background color
  17. Substructure Searching
    17.1. Fingerprints
    17.1.1. MACCS Fingerprints
    17.1.2. ECFP and FCFP Fingerprints
  18. Molecular Properties
    18.1. Molecular Mass
    18.1.1. Implicit Hydrogens
    18.2. LogP
    18.3. Total Polar Surface Area
    18.4. Van der Waals Volume
    18.5. Aromaticity
  19. Molecular Descriptors
    19.1. Descriptors and Specifications
    19.1.1. IImplementationSpecification
    19.2. IDescriptor
    19.3. IMolecularDescriptor
    19.4. IDescriptorResult
    19.5. Counting Nitrogens and Oxygens
  20. InChI
    20.1. Layers
    20.1.1. Fixed Hydrogens
    20.1.2. Stereoisomerism
  21. Chemistry Toolkit Rosetta
    21.1. Heavy atom counts from an SD file
    21.2. Depict a compound as an image
    21.3. Working with SD tag data
  22. Migration
    22.1. CDK 2.0 to 2.3
    22.2. CDK 1.4 to 2.0
    22.2.1. Removed classes
    22.2.2. Renamed classes and methods
    22.2.3. Changed behavior
    22.2.4. Constructors that now require a builder
    22.2.5. Static methods that are no longer
    22.2.6. IsotopeFactory
    22.2.7. IFingerPrinter
    22.2.8. SMILESGenerator
    22.2.9. Aromaticity calculations

Index

Appendix A
A.1 CDK Atom Types
A.2 Sybyl Atom Types
Appendix B
B.1 Isotope List
Appendix C
C.1 Molecular Descriptors
C.2 Atomic Descriptors
C.3 Atom-Pair Descriptors
C.4 Bond Descriptors
C.5 Protein Descriptors
Appendix D
D.1 Readers and Writers

References

  1. Willighagen E. Edition 1.4.1-0 of Groovy Cheminformatics with the Chemistry Development Kit. 2015. doi:10.6084/M9.FIGSHARE.2057790.V1