cdkbook

Groovy Cheminformatics with the Chemistry Development Kit

Edition 2.9-0

Egon L. Willighagen PhD
Long-time CDK developer

© E.L. Willighagen 2011-2023

License: CC-BY-SA 4.0 International

Warning

This book is being open-sourced [1]. This involves converting the LaTeX source to Markdown and updating all scripts so that the automation works well. I have made good progress, but it will take some time to iron things out. If you find issues, please report them here. If you like this book, please give the GitHub repository a star.

Contents

  1. Introduction
  2. Writing CDK Applications
    2.1. A (Very) Basic Java Application
    2.2. Groovy
    2.2.1. Closures
    2.2.2. Grabbing dependencies
    2.3. Python
    2.4. Other environments
    2.4.1. Bioclipse
    2.4.2. Cinfony
    2.4.3. R
  3. Cheminformatics
    3.1. Molecular Representations
    3.2. Chemical Graphs
    3.3. Quantum Chemistry
    3.4. Numerical Representations
    3.5. Chemometrics
  4. Atoms, Bonds and Molecules
    4.1. Atoms
    4.1.1. IElement
    4.1.2. IIsotope
    4.1.3. IAtomType
    4.1.4. Coordinates
    4.2. Bonds
    4.2.1. Electron counts
    4.2.2. Bond stereochemistry
    4.3. Molecules
    4.3.1. Iterating over atoms and bonds
    4.3.2. Neighboring atoms and bonds
    4.4. Molecular Formula
    4.5. Implicit and Explicit Hydrogens
    4.6. Chemical Objects
    4.7. Rings
  5. Stereochemistry
    5.1. Stereochemistry in a flat world
    5.2. Tetrahedral chirality
  6. Salts and other disconnected structures
    6.1. Salts
    6.2. Crystals
  7. Paired and unpaired electrons
    7.1. Lone Pairs
    7.2. Unpaired electrons
  8. Protein and DNA
    8.1. Protein From File
    8.2. Protein From Sequence
    8.3. Strands and Monomers
  9. Reactions
    9.1. A single reaction
    9.2. Reaction from File
    9.2.1. MDL RXN files
    9.3. CMLReact files
  10. From IChemObject to IChemFile
    10.1. IAtomContainerSet
    10.2. IReactionSet and IRingSet
    10.3. IChemModel
    10.4. IChemSequence
    10.5. IChemFile
  11. IChemObjectBuilders
    11.1. Implementations
    11.1.1. The Default Builder
    11.1.2. The Debug Builder
    11.1.3. The Silent Builder
  12. Input/Output
    12.1. File Format Detection
    12.1.1. Custom format matchers
    12.2. Reading from Readers and InputStreams
    12.2.1. Example: Downloading Domoic Acid from PubChem
    12.3. Input Validation
    12.3.1. Reading modes
    12.3.2. Validation
    12.4. Gzipped files
    12.5. Iterating Readers
    12.5.1. MDL SD files
    12.5.2. PubChem Compounds XML files
    12.6. Customizing the Output
    12.6.1. Setting Properties
    12.7. Example: creating unit tests for atom type perception
    12.8. Line Notations
    12.8.1. SMILES
    12.9. Recipes
    12.9.1. MDL molfile (V2000)
    12.9.2. SDF files with properties
  13. Atom types
    13.1. The CDK atom type model
    13.1.1. Hybridization Types
    13.2. Atom type perception
    13.2.1. Single atoms
    13.2.2. Full molecules
    13.2.3. Configuring the Atom
    13.2.4. No atom type perceived?!
    13.3. Sybyl atom types
  14. Graph Properties
    14.1. Partitioning
    14.2. Spanning Tree
    14.3. Ring counts
    14.3.1. Smallest Rings
    14.3.2. All Rings
    14.4. Graph matrices
    14.4.1. Adjacency matrix
    14.4.2. Distance matrix
    14.5. Atom Numbers
    14.5.1. Morgan Atom Numbers
    14.5.2. InChI Atom Numbers
  15. Missing Information
    15.1. Element and Isotope information
    15.1.1. Elements
    15.1.2. Isotopes
    15.2. Reconnecting Atoms
    15.3. Missing Bond Orders
    15.4. Missing Hydrogens
    15.4.1. Implicit Hydrogens
    15.4.2. Explicit Hydrogens
    15.5. 2D Coordinates
    15.6. Unknown Molecular Formula
  16. Substructure Searching
    16.1. Fingerprints
    16.1.1. MACCS Fingerprints
    16.1.2. ECFP and FCFP Fingerprints
  17. Molecular Properties
    17.1. Molecular Mass
    17.1.1. Implicit Hydrogens
    17.2. LogP
    17.3. Total Polar Surface Area
    17.4. Van der Waals Volume
    17.5. Aromaticity
  18. Molecular Descriptors
    18.1. Descriptors and Specifications
    18.1.1. IImplementationSpecification
    18.2. IDescriptor
    18.3. IMolecularDescriptor
    18.4. IDescriptorResult
    18.5. Counting Nitrogens and Oxygens
  19. InChI
    19.1. Layers
    19.1.1. Fixed Hydrogens
    19.1.2. Stereoisomerism
  20. Chemistry Toolkit Rosetta
    20.1. Heavy atom counts from an SD file
    20.2. Depict a compound as an image
    20.3. Working with SD tag data
  21. Migration
    21.1. CDK 2.0 to 2.3
    21.2. CDK 1.4 to 2.0
    21.2.1. Removed classes
    21.2.2. Renamed classes and methods
    21.2.3. Changed behavior
    21.2.4. Constructors that now require a builder
    21.2.5. Static methods that are no longer
    21.2.6. IsotopeFactory
    21.2.7. IFingerPrinter
    21.2.8. SMILESGenerator
    21.2.9. Aromaticity calculations

Index

Appendix A
A.1 CDK Atom Types
A.2 Sybyl Atom Types
Appendix B
B.1 Isotope List
Appendix C
C.1 Molecular Descriptors
C.2 Atomic Descriptors
C.3 Atom-Pair Descriptors
C.4 Bond Descriptors
C.5 Protein Descriptors
Appendix D
D.1 Readers and Writers

References

  1. Willighagen E. Edition 1.4.1-0 of Groovy Cheminformatics with the Chemistry Development Kit. 2015. doi:10.6084/M9.FIGSHARE.2057790.V1 (Scholia)