This book has given many small code snippets, which can easily be integrated into larger programs, but it has not yet shown what such a larger program can look like. Because this book is not about Java programming, it did not introduce those aspects of using the CDK. Nevertheless, this section gives a brief introduction on how to write a Java application and a Groovy script, and on how to use the CDK from Python.
Assuming you have already downloaded the CDK jar file, or compiled it from source, consider the following piece of Java source code:
import org.openscience.cdk.interfaces.IAtom;
import org.openscience.cdk.silent.Atom;
public class BasicProgram {
    public static void main(String[] args) throws Exception {
        // create a carbon atom with the "silent" CDK data classes
        IAtom atom = new Atom("C");
        // print a textual representation of the atom
        System.out.println(atom);
    }
}
This Java application can then be compiled into byte code with javac, creating a BasicProgram.class file:
$ javac -classpath cdk-2.9.jar BasicProgram.java
And then run with:
$ java -classpath .:cdk-2.9.jar BasicProgram
The downside of pure Java applications is the relative overhead needed to define an application: even a minimal program needs a class and a main method. Other programming languages that run on the Java Virtual Machine, such as BeanShell, Groovy, and Clojure, provide a simpler syntax; Groovy is described below.
Groovy (http://www.groovy-lang.org/) is a programming language that advertises itself as "an agile and dynamic language for the Java Virtual Machine". It provides an environment to quickly try out Java code, but it also makes more extensive changes to the Java language and adds some interesting syntactic sugar.
A simple script may look like:
Script code/IterateAtoms.groovy
for (IAtom atom : molecule.atoms()) {
  System.out.println(atom.getSymbol());
}
But in Groovy it can also look like:
Script code/IterateAtomsGroovy.groovy
for (atom in molecule.atoms()) {
  println atom.getSymbol()
}
One of the more interesting features of Groovy is something called closures. I have known this programming pattern from R and happily used it for a long time, but only recently learned that it is called a closure. Closures allow you to pass a block of code as a parameter to a function or method, which has many applications; I will show one situation here.
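Java itself has no Groovy-style closures, but since Java 8 lambda expressions play a similar role. The following is a minimal, CDK-independent sketch (assuming Java 9 or later for List.of) of the two ingredients mentioned above: a method that takes a block of code as a parameter, and a function that still remembers a variable from the scope where it was created. The names ClosureBasics, applyToAll and scale are hypothetical and chosen only for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.DoubleUnaryOperator;

public class ClosureBasics {

    // Takes a block of code as a parameter and runs it once per element.
    public static <T> void applyToAll(List<T> items, Consumer<T> block) {
        for (T item : items) {
            block.accept(item);
        }
    }

    // Returns a function that "closes over" the factor argument: the
    // returned lambda still sees factor after scale() has returned.
    public static DoubleUnaryOperator scale(double factor) {
        return x -> x * factor;
    }

    public static void main(String[] args) {
        List<String> seen = new ArrayList<>();
        // the lambda captures the local list 'seen' from main
        applyToAll(List.of("C", "H", "O"), symbol -> seen.add(symbol));
        System.out.println(seen);

        DoubleUnaryOperator triple = scale(3.0);
        System.out.println(triple.applyAsDouble(2.0));
    }
}
```

Running main prints [C, H, O] followed by 6.0.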
Consider the calculation of molecular properties that are a mere summation over atomic properties, such as the total charge or the molecular weight. Both calculations require an iteration over all atoms. If we need both properties at the same time, we can combine the calculations into a single iteration. However, for the purpose of this section, we will not combine them, but use closures instead.
We therefore have two pieces of code that share most of their statements:
Script code/CalculateTotalCharge.groovy
totalCharge = 0.0
for (atom in molecule.atoms()) {
  totalCharge += atom.getCharge()
}
and
Script 16.1 code/CalculateMolecularWeight.groovy
isotopeInfo = org.openscience.cdk.config.Isotopes.getInstance() // natural isotope masses
molWeight = 0.0
for (atom in molecule.atoms()) {
  molWeight += isotopeInfo.getNaturalMass(atom)
}
In both cases we want to apply a custom bit of code to every atom, while the iteration over the atoms is identical. Groovy allows us to share the common code by defining a forAllAtoms function into which we inject a code block using a closure:
Script code/GroovyClosureForAllAtoms.groovy
def forAllAtoms(molecule, block) {
  for (atom in molecule.atoms()) {
    block(atom) // run the injected code block for this atom
  }
}
totalCharge = 0.0
forAllAtoms(molecule, { totalCharge += it.getCharge() } )
totalCharge = String.format('%.2f', totalCharge)
println "Total charge: ${totalCharge}"
isotopeInfo = org.openscience.cdk.config.Isotopes.getInstance() // natural isotope masses
molWeight = 0.0
forAllAtoms(molecule, {
  molWeight += isotopeInfo.getNaturalMass(it)
} )
molWeight = String.format('%.2f', molWeight)
println "Molecular weight: ${molWeight}"
which gives the output:
Total charge: -0.00
Molecular weight: 16.04
This language feature makes it possible to write more compact code.
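For comparison, the same pattern can be sketched in plain Java, with a java.util.function.Consumer taking the place of the Groovy closure. This is a minimal sketch under stated assumptions, not CDK code: SimpleAtom is a hypothetical stand-in for IAtom, the masses are the standard atomic weights of carbon (12.011) and hydrogen (1.008), and the partial charges are made up for illustration. The bothInOnePass method also shows how the two calculations could share a single iteration by chaining the two blocks with Consumer.andThen.

```java
import java.util.List;
import java.util.Locale;
import java.util.function.Consumer;

public class ForAllAtomsSketch {

    // Hypothetical stand-in for a CDK atom; NOT the CDK IAtom interface.
    public static final class SimpleAtom {
        final String symbol;
        final double charge; // made-up partial charge
        final double mass;   // standard atomic weight

        SimpleAtom(String symbol, double charge, double mass) {
            this.symbol = symbol;
            this.charge = charge;
            this.mass = mass;
        }
    }

    // The shared iteration; the per-atom work is passed in as a block.
    public static void forAllAtoms(List<SimpleAtom> atoms, Consumer<SimpleAtom> block) {
        for (SimpleAtom atom : atoms) {
            block.accept(atom);
        }
    }

    public static double totalCharge(List<SimpleAtom> atoms) {
        // Lambdas cannot reassign captured locals, so the running sum
        // is kept in a one-element array.
        double[] sum = {0.0};
        forAllAtoms(atoms, atom -> sum[0] += atom.charge);
        return sum[0];
    }

    public static double molecularWeight(List<SimpleAtom> atoms) {
        double[] sum = {0.0};
        forAllAtoms(atoms, atom -> sum[0] += atom.mass);
        return sum[0];
    }

    // Both sums in a single iteration: the two blocks are chained.
    public static double[] bothInOnePass(List<SimpleAtom> atoms) {
        double[] charge = {0.0};
        double[] mass = {0.0};
        Consumer<SimpleAtom> addCharge = atom -> charge[0] += atom.charge;
        Consumer<SimpleAtom> addMass = atom -> mass[0] += atom.mass;
        forAllAtoms(atoms, addCharge.andThen(addMass));
        return new double[] {charge[0], mass[0]};
    }

    // Methane with explicit hydrogens and illustrative partial charges.
    public static List<SimpleAtom> methane() {
        return List.of(
            new SimpleAtom("C", -0.4, 12.011),
            new SimpleAtom("H", 0.1, 1.008),
            new SimpleAtom("H", 0.1, 1.008),
            new SimpleAtom("H", 0.1, 1.008),
            new SimpleAtom("H", 0.1, 1.008));
    }

    public static void main(String[] args) {
        List<SimpleAtom> atoms = methane();
        System.out.println(String.format(Locale.ROOT, "Total charge: %.2f", totalCharge(atoms)));
        System.out.println(String.format(Locale.ROOT, "Molecular weight: %.2f", molecularWeight(atoms)));
    }
}
```

Unlike Groovy closures, Java lambdas cannot reassign local variables of the enclosing method, which is why the running sums live in one-element arrays rather than in plain double variables.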
The introduction of this section showed how to use the -classpath option to tell Java where to find dependencies such as the CDK jar. Groovy, however, also offers a different approach: with @Grab annotations, a script can fetch (grab) its dependencies itself. For example:
@Grab(group='org.openscience.cdk', module='cdk-io', version='2.9')
@Grab(group='org.openscience.cdk', module='cdk-silent', version='2.9')
import org.openscience.cdk.silent.Atom
println(new Atom("C"))
Using projects like ScyJava, the CDK can also be used in Python, for example, in a Jupyter notebook on Google Colab. Most code snippets in this book are actually Groovy scripts, but this repository has some Jupyter notebook examples. If you want to know how any of those examples translates to Python, please file a request here.
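from scyjava import config, jimport
# declare the CDK bundle as a Maven dependency; it is resolved when the JVM starts
config.add_endpoints('org.openscience.cdk:cdk-bundle:2.9')
# import the Java classes into Python
SmilesParser = jimport('org.openscience.cdk.smiles.SmilesParser')
Builder = jimport('org.openscience.cdk.silent.SilentChemObjectBuilder')
# parse the SMILES of aspirin; implicit hydrogens are not counted as atoms
sp = SmilesParser(Builder.getInstance())
mol = sp.parseSmiles("CC(=O)OC1=CC=CC=C1C(=O)O")
print(f"Aspirin has {mol.getAtomCount()} atoms.")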
There are even more languages at your disposal for using the CDK library. This book mostly uses Groovy code snippets, but this section points to a few alternatives. These alternatives do not always provide access to the full CDK API, but they often offer a customized API that hides some of the more technical details.
Bioclipse has a custom scripting language with a JavaScript interface [1,2]. Functionality is provided by managers, and CDK functionality is provided by two such managers. Bioclipse 2.6.2 was the last release using the Eclipse UI, but Bacting allows you to run Bioclipse scripts from the command line [3].
Cinfony is a Python module that integrates with the CDK as well as with other cheminformatics toolkits [4]. Cinfony can be downloaded from https://cinfony.github.io/.
The statistical software R (http://www.r-project.org/) also provides access to the CDK functionality via the rcdk package [5,6]. This package is available from CRAN at https://cran.r-project.org/web/packages/rcdk/.