cdkbook

Groovy Cheminformatics with the Chemistry Development Kit

Edition 2.2-0

Egon L. Willighagen PhD
Long-time CDK developer

© E.L. Willighagen 2011-2019

License: CC-BY-SA 4.0 International

Warning

This book is being open-sourced. This involves converting the LaTeX source to Markdown and updating all scripts so that the automation works reliably. I have made good progress, but it will take some time to iron out the remaining issues.

Contents

  1. Introduction
  2. Cheminformatics
    2.1. Molecular Representations
    2.2. Chemical Graphs
    2.3. Quantum Chemistry
    2.4. Numerical Representations
    2.5. Chemometrics
  3. Atoms, Bonds and Molecules
    3.1. Atoms
    3.1.1. IElement
    3.1.2. IIsotope
    3.1.3. IAtomType
    3.1.4. Coordinates
    3.2. Bonds
    3.2.1. Electron counts
    3.2.2. Bond stereochemistry
    3.3. Molecules
    3.3.1. Iterating over atoms and bonds
    3.3.2. Neighboring atoms and bonds
    3.4. Molecular Formula
    3.5. Implicit and Explicit Hydrogens
    3.6. Chemical Objects
    3.7. Rings
  4. Stereochemistry
    4.1. Stereochemistry in a flat world
    4.2. Tetrahedral chirality
  5. Salts and other disconnected structures
    5.1. Salts
    5.2. Crystals
  6. Paired and unpaired electrons
    6.1. Lone Pairs
    6.2. Unpaired electrons
  7. Protein and DNA
    7.1. Protein From File
    7.2. Protein From Sequence
    7.3. Strands and Monomers
  8. Reactions
    8.1. A single reaction
    8.2. Reaction from File
    8.2.1. MDL RXN files
    8.3. CMLReact files
  9. From IChemObject to IChemFile
    9.1. IAtomContainerSet
    9.2. IReactionSet and IRingSet
    9.3. IChemModel
    9.4. IChemSequence
    9.5. IChemFile
  10. IChemObjectBuilders
    10.1. Implementations
    10.1.1. The Default Builder
    10.1.2. The Debug Builder
    10.1.3. The Silent Builder
  11. Input/Output
    11.1. File Format Detection
    11.1.1. Custom format matchers
    11.2. Reading from Readers and InputStreams
    11.2.1. Example: Downloading Domoic Acid from PubChem
    11.3. Input Validation
    11.3.1. Reading modes
    11.3.2. Validation
    11.4. Gzipped files
    11.5. Iterating Readers
    11.5.1. MDL SD files
    11.5.2. PubChem Compounds XML files
    11.6. Customizing the Output
    11.6.1. Setting Properties
    11.7. Example: creating unit tests for atom type perception
    11.8. Line Notations
    11.8.1. SMILES
    11.9. Recipes
    11.9.1. MDL molfile (V2000)
  12. Atom types
    12.1. The CDK atom type model
    12.1.1. Hybridization Types
    12.2. Atom type perception
    12.2.1. Single atoms
    12.2.2. Full molecules
    12.2.3. Configuring the Atom
    12.2.4. No atom type perceived?!
    12.3. Sybyl atom types
  13. Graph Properties
    13.1. Partitioning
    13.2. Spanning Tree
    13.3. Ring counts
    13.3.1. Smallest Rings
    13.3.2. All Rings
    13.4. Graph matrices
    13.4.1. Adjacency matrix
    13.4.2. Distance matrix
    13.5. Atom Numbers
    13.5.1. Morgan Atom Numbers
    13.5.2. InChI Atom Numbers
  14. Missing Information
    14.1. Element and Isotope information
    14.1.1. Elements
    14.1.2. Isotopes
    14.2. Reconnecting Atoms
    14.3. Missing Bond Orders
    14.4. Missing Hydrogens
    14.4.1. Implicit Hydrogens
    14.4.2. Explicit Hydrogens
    14.5. 2D Coordinates
    14.6. Unknown Molecular Formula
  15. Substructure Searching
    15.1. Fingerprints
    15.1.1. MACCS Fingerprints
    15.1.2. ECFP and FCFP Fingerprints
  16. Molecular Properties
    16.1. Molecular Mass
    16.1.1. Implicit Hydrogens
    16.2. LogP
    16.3. Total Polar Surface Area
    16.4. Van der Waals Volume
    16.5. Aromaticity
  17. Molecular Descriptors
    17.1. Descriptors and Specifications
    17.1.1. IImplementationSpecification
    17.2. IDescriptor
    17.3. IMolecularDescriptor
    17.4. IDescriptorResult
    17.5. Counting Nitrogens and Oxygens
  18. InChI
    18.1. Layers
  19. Chemistry Toolkit Rosetta
    19.1. Heavy atom counts from an SD file
    19.2. Depict a compound as an image
  20. Migration
    20.1. CDK 1.4 to 2.0
    20.1.1. Removed classes
    20.1.2. Renamed classes and methods
    20.1.3. Changed behavior
    20.1.4. Constructors that now require a builder
    20.1.5. Static methods that are no longer
    20.1.6. IsotopeFactory
    20.1.7. IFingerPrinter
    20.1.8. SMILESGenerator
    20.1.9. Aromaticity calculations

Index

Appendix A
A.1 CDK Atom Types
A.2 Sybyl Atom Types
Appendix B
B.1 Isotope List
Appendix C
C.1 Molecular Descriptors
C.2 Atomic Descriptors
C.3 Atom-Pair Descriptors
C.4 Bond Descriptors
C.5 Protein Descriptors
Appendix D
D.1 Readers and Writers