cdkbook

File Formats

This appendix lists the file formats the CDK knows about. For each format, it indicates whether the CDK has a reader (R) and/or a writer (W) for it. It also indicates which formats can be detected from the file content (M): the format definitions of those formats implement the IChemFormatMatcher interface.

This script was used to create this list:

Script 24.1 code/ListAllFileFormats.groovy

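import java.lang.reflect.Method
import org.openscience.cdk.io.formats.*

// NOTE: 'output' is not defined in this listing; it is assumed to be
// supplied by the environment that runs the book's scripts.
formats = new ArrayList<IChemFormat>();
// io-formats.set is read line by line; each line is a format class name
reader =
  this.getClass().getClassLoader().getResourceAsStream(
    "io-formats.set"
  )
reader.eachLine { formatName ->
  try {
    Class<? extends Object> formatClass =
      this.getClass().getClassLoader().
        loadClass(formatName);
    // every format class is a singleton with a static getInstance()
    Method getinstanceMethod =
      formatClass.getMethod(
        "getInstance", new Class[0]
      );
    format = getinstanceMethod.invoke(
      null, new Object[0]
    );
    formats.add(format);
  } catch (ClassNotFoundException exception) {
    // the format class is not on the classpath: skip it
  } catch (Exception exception) {
    // the format could not be instantiated: skip it
  }
}
formats.sort{ it.formatName }
for (format in formats) {
  // R: the format names a reader class
  if (format instanceof IChemFormat &&
      format.getReaderClassName() != null) {
    output.append("R");
  }
  // W: the format names a writer class
  if (format instanceof IChemFormat &&
      format.getWriterClassName() != null) {
    output.append("W");
  }
  // M: the format can be detected from the file content
  if (format instanceof IChemFormatMatcher) {
    output.append("M");
  }
  output.append(format.getFormatName())
}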
Read/Write  Matcher  File Format
-           M        ABINIT
-           M        ADF
-           M        Aces2
-           -        Alchemy
-           -        Ball and Stick
-           M        CAChe MolStruct
RW          M        CDK OWL (N3)
W           -        CDK Source Code
W           -        CML enriched RSS
R           M        CTX
-           -        Cacao Cartesian
-           -        Cacao Internal
-           -        Chem3D Cartesian 1
-           -        Chem3D Cartesian 2
-           -        ChemDraw eXchange file
RW          M        Chemical Markup Language
-           -        Chemical Resource Kit 2D
-           -        Chemical Resource Kit 3D
-           -        Chemtool
RW          M        CrystClust
R           M        Crystallographic Interchange Format
-           -        DMol3
-           M        Dalton
-           -        Dock 5 Box
-           -        Fenske-Hall Z-Matrix
-           -        Fingerprint
R           M        GAMESS log file
-           -        GROMOS96
R           M        Gaussian 2003
W           -        Gaussian Input
-           M        Gaussian90
-           M        Gaussian92
-           M        Gaussian94
-           M        Gaussian95
R           M        Gaussian98
R           M        Ghemical Quantum/Molecular Mechanics Model
R           M        Ghemical Simplified Protein Model
RW          M        HyperChem HIN
R           M        IUPAC-NIST Chemical Identifier (Plain Text)
R           M        IUPAC-NIST Chemical Identifier (XML)
-           -        JME
-           M        Jaguar
R           M        MDL Mol/SDF V3000
R           M        MDL Molfile
RW          M        MDL Molfile V2000
R           M        MDL RXN V2000
R           M        MDL RXN V3000
R           M        MDL Reaction format
RW          M        MDL Structure-data file
-           M        MOPAC 2002
-           M        MOPAC 2007
-           M        MOPAC 2009
-           M        MOPAC 2012
-           M        MOPAC 93
-           M        MOPAC 97
-           M        MOPAC7
-           M        MOPAC7 Input
-           -        MSI BGF
-           -        MacroModel
-           -        MacroModel
-           -        Massively Parallel Quantum Chemistry Program
-           M        MoSS Output Format
RW          M        Mol2 (Sybyl)
-           M        NWChem
-           -        PCModel
-           -        POV Ray
-           -        Parallel Quantum Solutions
R           M        PolyMorph Predictor (Cerius)
RW          M        Protein Brookhave Database (PDB)
-           -        Protein Data Bank Markup Language (PDBML)
-           -        PubChem
R           M        PubChem Compound ASN
R           M        PubChem Compound XML
-           M        PubChem Compounds XML
R           M        PubChem Substance XML
-           M        PubChem Substances ASN
-           M        PubChem Substances XML
-           M        Q-Chem
-           -        Raw Copy
-           -        SMARTS
RW          -        SMILES
-           -        SMILES FIX
-           -        Scalable Vector Graphics
RW          M        ShelXL
-           M        Spartan Quantum Mechanics Program
-           -        Sybyl descriptor
RW          M        Symyx Rgroup query files
-           -        Tinker MM2
-           -        Tinker XYZ
-           -        TurboMole
-           -        UniChemXYZ
R           M        VASP
-           -        Viewmol
-           -        XED
RW          -        XYZ
-           -        Yasara
R           M        ZMatrix
-           -        Zindo
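
Formats marked with M can be recognized from the content of a file. The following is a minimal sketch of how such detection might be used, not a definitive recipe. It assumes that the cdk-bundle artifact (version 2.9 is an arbitrary choice here) can be fetched with @Grab, and the methanol molfile is a made-up test input.

```groovy
@Grab('org.openscience.cdk:cdk-bundle:2.9') // assumed artifact and version
import org.openscience.cdk.io.FormatFactory

// A small V2000 molfile (methanol, hydrogens implicit), built line by
// line because the molfile layout is column sensitive. Made-up input.
molfile = [
  "methanol",
  "  example",
  "",
  "  2  1  0  0  0  0  0  0  0  0999 V2000",
  "    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0",
  "    1.4000    0.0000    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0",
  "  1  2  1  0  0  0  0",
  "M  END"
].join("\n") + "\n"

// A BufferedReader supports mark/reset, so the factory can peek at the
// start of the input without consuming it.
input = new BufferedReader(new StringReader(molfile))
format = new FormatFactory().guessFormat(input)
println(format == null ? "format not recognized" : format.formatName)
```

Formats without an M in the table, such as SMILES and XYZ, cannot be found this way; for those the reader class has to be chosen explicitly.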

The Readers and Writers

Additionally, for all formats we can list information about the readers and writers, again by iterating over all formats:

Script 24.2 code/ListAllIOClassesByFormat.groovy

for (format in formats) {
  if (format instanceof IChemFormat &&
      (format.getReaderClassName() != null ||
       format.getWriterClassName() != null)) {
    output.append(
      "## " + format.formatName + "\n"
    )
    // output some further format details
    if (format.readerClassName != null) {
      reader = format.readerClassName.substring(
        format.readerClassName.lastIndexOf(".") + 1
      )
      output.append(
        "### <topic type=\"class\">" + reader + "</topic>\n"
      )
    }
    if (format.writerClassName != null) {
      writer = format.writerClassName.substring(
        format.writerClassName.lastIndexOf(".") + 1
      )
      output.append(
        "### <topic type=\"class\">" + writer + "</topic>\n"
      )
    }
  }
}
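
The "further format details" in the sections below (preferred extension, MIME type, and whether the format is XML based) can be read from the format object itself. A minimal sketch for a single format follows; it assumes the cdk-bundle artifact (version 2.9 chosen arbitrarily) is available through @Grab, and MDLV2000Format is picked only as an example.

```groovy
@Grab('org.openscience.cdk:cdk-bundle:2.9') // assumed artifact and version
import org.openscience.cdk.io.formats.MDLV2000Format

// format classes are singletons, as in the listing script
format = MDLV2000Format.getInstance()

println "Format: " + format.getFormatName()
println "Preferred Extension: " + format.getPreferredNameExtension()
println "MIME type: " + format.getMIMEType()
println "XML Based?: " + (format.isXMLBased() ? "Yes" : "No")
```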

CDK OWL (N3)

Preferred Extension: n3 MIME type: text/n3 XML Based?: No

CDKOWLReader

This reader supports these data objects:

Class          Accepted
ChemFile       false
AtomContainer  true
Crystal        true

CDKOWLWriter

This writer supports these data objects:

Class          Accepted
ChemFile       false
AtomContainer  true
Crystal        true

CDK Source Code

Preferred Extension: java XML Based?: No

CDKSourceCodeWriter

This writer supports these data objects:

Class          Accepted
ChemFile       false
AtomContainer  true
Crystal        true

This writer has these IO settings:

Name                Description
write3DCoordinates  Should 3D coordinates be added? [Default: true]
builder             Which IChemObjectBuilder should be used? [Default: DefaultChemObjectBuilder]
write2DCoordinates  Should 2D coordinates be added? [Default: true]

CML enriched RSS

XML Based?: Yes

RssWriter

This writer supports these data objects:

Class                             Accepted
ChemFile                          true
AtomContainer                     true
ChemSequence                      true
Reaction                          true
ReactionSet                       true
isomorphism.matchers.RGroupQuery  true
Crystal                           true

CTX

Preferred Extension: ctx MIME type: chemical/x-ctx XML Based?: No

CTXReader

This reader supports these data objects:

Class          Accepted
ChemFile       true
AtomContainer  false

Chemical Markup Language

Extensions: [cml, xml] Preferred Extension: cml MIME type: chemical/x-cml XML Based?: Yes

CMLReader

This reader supports these data objects:

Class          Accepted
ChemFile       true
AtomContainer  false

CMLWriter

This writer supports these data objects:

Class          Accepted
ChemFile       true
AtomContainer  false
ChemSequence   true
Reaction       true
ReactionSet    true
Crystal        true

This writer has these IO settings:

Name              Description
CMLIDs            Should the output use CML identifiers? [Default: true]
NamespacedOutput  Should the output use namespaced output? [Default: true]
NamespacePrefix   What should the namespace prefix be? [empty is no prefix] [Default: ]
Indenting         Should the output be indented? [Default: true]
SchemaInstance    Should the output use the Schema-Instance attribute? [Default: false]
XMLDeclaration    Should the output contain an XML declaration? [Default: true]
InstanceLocation  Where is the schema found? [Default: ]

CrystClust

Preferred Extension: crystclust XML Based?: No

CrystClustReader

This reader supports these data objects:

Class          Accepted
ChemFile       true
AtomContainer  false

CrystClustWriter

This writer supports these data objects:

Class          Accepted
ChemFile       false
AtomContainer  false
ChemSequence   true
Crystal        true

Crystallographic Interchange Format

Preferred Extension: cif MIME type: chemical/x-cif XML Based?: No

CIFReader

This reader supports these data objects:

Class          Accepted
ChemFile       true
AtomContainer  false

GAMESS log file

Extensions: [gam, gamin, inp, gamout] Preferred Extension: gam MIME type: chemical/x-gamess-output XML Based?: No

GamessReader

This reader supports these data objects:

Class          Accepted
ChemFile       true
AtomContainer  false

Gaussian 2003

MIME type: chemical/x-gaussian-log XML Based?: No

Gaussian03Reader

This reader supports these data objects:

Class          Accepted
ChemFile       true
AtomContainer  false
ChemSequence   true

Gaussian Input

Extensions: [gau, com] Preferred Extension: gau MIME type: chemical/x-gaussian-input XML Based?: No

GaussianInputWriter

This writer supports these data objects:

Class          Accepted
ChemFile       false
AtomContainer  true
Crystal        true

This writer has these IO settings:

Name            Description
OpenShell       Should the calculation be open shell? [Default: false]
Comment         What comment should be put in the file? [Default: Created with CDK (http://cdk.sf.net/)]
Memory          How much memory do you want to use? [Default: unset]
Command         What kind of job do you want to perform? [Default: energy calculation]
ProcessorCount  How many processors should be used by Gaussian? [Default: 1]

Gaussian98

MIME type: chemical/x-gaussian-log XML Based?: No

Gaussian98Reader

This reader supports these data objects:

Class          Accepted
ChemFile       true
AtomContainer  false

This reader has these IO settings:

Name                        Description
ReadOptimizedStructureOnly  Should I only read the optimized structure from a geometry optimization? [Default: false]

Ghemical Quantum/Molecular Mechanics Model

Preferred Extension: gpr MIME type: application/x-ghemical XML Based?: No

GhemicalMMReader

This reader supports these data objects:

Class          Accepted
ChemFile       true
AtomContainer  false

Ghemical Simplified Protein Model

XML Based?: No

GhemicalMMReader

This reader supports these data objects:

Class          Accepted
ChemFile       true
AtomContainer  false

HyperChem HIN

Preferred Extension: hin MIME type: chemical/x-hin XML Based?: No

HINReader

This reader supports these data objects:

Class          Accepted
ChemFile       true
AtomContainer  false

HINWriter

This writer supports these data objects:

Class          Accepted
ChemFile       false
AtomContainer  true

IUPAC-NIST Chemical Identifier (Plain Text)

MIME type: chemical/x-inchi XML Based?: No

INChIPlainTextReader

This reader supports these data objects:

Class          Accepted
ChemFile       true
AtomContainer  false

IUPAC-NIST Chemical Identifier (XML)

Preferred Extension: inchi MIME type: chemical/x-inchi-xml XML Based?: Yes

INChIReader

This reader supports these data objects:

Class          Accepted
ChemFile       true
AtomContainer  false

MDL Mol/SDF V3000

MIME type: chemical/x-mdl-molfile XML Based?: No

MDLV3000Reader

This reader supports these data objects:

Class          Accepted
ChemFile       false
AtomContainer  true
Crystal        true

This reader has these IO settings:

Name                       Description
AddStereo0d                Allow stereo created from parity value when no coordinates [Default: true]
AddStereoElements          Detect and create IStereoElements for the input. [Default: true]
InterpretHydrogenIsotopes  Should D and T be interpreted as hydrogen isotopes? [Default: true]
ForceReadAs3DCoordinates   Should coordinates always be read as 3D? [Default: false]

MDL Molfile

Preferred Extension: mol MIME type: chemical/x-mdl-molfile XML Based?: No

MDLReader

This reader supports these data objects:

Class          Accepted
ChemFile       true
AtomContainer  true
Crystal        true

This reader has these IO settings:

Name                      Description
ForceReadAs3DCoordinates  Should coordinates always be read as 3D? [Default: false]

MDL Molfile V2000

Preferred Extension: mol MIME type: chemical/x-mdl-molfile XML Based?: No

MDLV2000Reader

This reader supports these data objects:

Class          Accepted
ChemFile       true
AtomContainer  true
Crystal        true

This reader has these IO settings:

Name                       Description
AddStereo0d                Allow stereo created from parity value when no coordinates [Default: true]
AddStereoElements          Detect and create IStereoElements for the input. [Default: true]
InterpretHydrogenIsotopes  Should D and T be interpreted as hydrogen isotopes? [Default: true]
ForceReadAs3DCoordinates   Should coordinates always be read as 3D? [Default: false]
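
The two tables for a reader can be reproduced by asking the reader directly which data objects it accepts and which IO settings it has. This is a minimal sketch, assuming the cdk-bundle artifact (version 2.9 chosen arbitrarily) is available through @Grab. It is not necessarily how the tables in this appendix were generated, and the concrete data classes are picked to match the table rows.

```groovy
@Grab('org.openscience.cdk:cdk-bundle:2.9') // assumed artifact and version
import org.openscience.cdk.AtomContainer
import org.openscience.cdk.ChemFile
import org.openscience.cdk.Crystal
import org.openscience.cdk.io.MDLV2000Reader

reader = new MDLV2000Reader()

// which data objects can this reader fill?
for (type in [ChemFile, AtomContainer, Crystal]) {
  println type.simpleName + " " + reader.accepts(type)
}

// which settings does it expose, and what are their defaults?
for (setting in reader.getIOSettings()) {
  println setting.getName() + " " + setting.getQuestion() +
    " [Default: " + setting.getDefaultSetting() + "]"
}
reader.close()
```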

MDLV2000Writer

This writer supports these data objects:

Class          Accepted
ChemFile       true
AtomContainer  true
Crystal        true

This writer has these IO settings:

Name                       Description
ProgramName                Program name to write at the top of the molfile header, should be exactly 8 characters long [Default: CDK]
ForceWriteAs2DCoordinates  Should coordinates always be written as 2D? [Default: false]
WriteAromaticBondTypes     Should aromatic bonds be written as bond type 4? [Default: false]
WriteMajorIsotopes         Write atomic mass of any non-null atomic mass including major isotopes (e.g. [12]C) [Default: true]
WriteDefaultProperties     Write trailing zero's on atom/bond property blocks even if they're not used. [Default: true]
WriteQueryFormatValencies  Should valencies be written in the MDL Query format? (deprecated) [Default: false]
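
A setting from this table can be overridden by name before writing. The sketch below uses a PropertiesListener for that, assuming the cdk-bundle artifact (version 2.9 chosen arbitrarily) is available through @Grab. The benzene input and the choice of WriteAromaticBondTypes are only illustrative.

```groovy
@Grab('org.openscience.cdk:cdk-bundle:2.9') // assumed artifact and version
import org.openscience.cdk.io.MDLV2000Writer
import org.openscience.cdk.io.listener.PropertiesListener
import org.openscience.cdk.silent.SilentChemObjectBuilder
import org.openscience.cdk.smiles.SmilesParser

// illustrative input: benzene parsed from SMILES (no coordinates)
benzene = new SmilesParser(
  SilentChemObjectBuilder.getInstance()
).parseSmiles("c1ccccc1")

// override one setting by name; all others keep their defaults
customSettings = new Properties()
customSettings.setProperty("WriteAromaticBondTypes", "true")

stringWriter = new StringWriter()
writer = new MDLV2000Writer(stringWriter)
writer.addChemObjectIOListener(
  new PropertiesListener(customSettings)
)
// let the listener answer the writer's setting questions
writer.customizeJob()
writer.write(benzene)
writer.close()

println stringWriter.toString()
```

Setting names are matched as plain strings, so a misspelled name is likely to be ignored without an error and the default will be used.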

MDL RXN V2000

Preferred Extension: rxn MIME type: chemical/x-mdl-rxnfile XML Based?: No

MDLRXNV2000Reader

This reader supports these data objects:

Class          Accepted
ChemFile       true
AtomContainer  false
Reaction       true

MDL RXN V3000

Preferred Extension: rxn MIME type: chemical/x-mdl-rxnfile XML Based?: No

MDLRXNV3000Reader

This reader supports these data objects:

Class          Accepted
ChemFile       false
AtomContainer  false
Reaction       true

MDL Reaction format

Preferred Extension: rxn MIME type: chemical/x-mdl-rxnfile XML Based?: No

MDLRXNReader

This reader supports these data objects:

Class          Accepted
ChemFile       true
AtomContainer  false
Reaction       true
ReactionSet    true

MDL Structure-data file

Extensions: [sdf, sd] Preferred Extension: sdf MIME type: chemical/x-mdl-sdfile XML Based?: No

MDLV2000Reader

This reader supports these data objects:

Class          Accepted
ChemFile       true
AtomContainer  true
Crystal        true

This reader has these IO settings:

Name                       Description
AddStereo0d                Allow stereo created from parity value when no coordinates [Default: true]
AddStereoElements          Detect and create IStereoElements for the input. [Default: true]
InterpretHydrogenIsotopes  Should D and T be interpreted as hydrogen isotopes? [Default: true]
ForceReadAs3DCoordinates   Should coordinates always be read as 3D? [Default: false]

SDFWriter

This writer supports these data objects:

Class          Accepted
ChemFile       true
AtomContainer  true
Crystal        true

This writer has these IO settings:

Name                       Description
WriteAromaticBondTypes     Should aromatic bonds be written as bond type 4? [Default: false]
WriteMajorIsotopes         Write atomic mass of any non-null atomic mass including major isotopes (e.g. [12]C) [Default: true]
writeProperties            Should molecule properties be written as non-structural data [Default: true]
WriteQueryFormatValencies  Should valencies be written in the MDL Query format? (deprecated) [Default: false]
TruncateLongData           Truncate long data files >200 characters [Default: false]
ProgramName                Program name to write at the top of the molfile header, should be exactly 8 characters long [Default: CDK]
ForceWriteAs2DCoordinates  Should coordinates always be written as 2D? [Default: false]
WriteDefaultProperties     Write trailing zero's on atom/bond property blocks even if they're not used. [Default: true]
writeV3000                 Write all records as V3000 [Default: false]

Mol2 (Sybyl)

Preferred Extension: mol2 MIME type: chemical/x-mol2 XML Based?: No

Mol2Reader

This reader supports these data objects:

Class          Accepted
ChemFile       true
AtomContainer  true
Crystal        true

Mol2Writer

This writer supports these data objects:

Class          Accepted
ChemFile       false
AtomContainer  true
Crystal        true

PolyMorph Predictor (Cerius)

Preferred Extension: pmp XML Based?: No

PMPReader

This reader supports these data objects:

Class          Accepted
ChemFile       true
AtomContainer  false

Protein Brookhave Database (PDB)

Extensions: [pdb, ent] Preferred Extension: pdb MIME type: chemical/x-pdb XML Based?: No

PDBReader

This reader supports these data objects:

Class          Accepted
ChemFile       true
AtomContainer  false

This reader has these IO settings:

Name                Description
UseRebondTool       Should the PDBReader deduce bonding patterns? [Default: false]
ReadConnectSection  Should the CONECT be read? [Default: true]
UseHetDictionary    Should the PDBReader use the HETATM dictionary for atom types? [Default: false]

PDBWriter

This writer supports these data objects:

Class          Accepted
ChemFile       true
AtomContainer  true
Crystal        true

This writer has these IO settings:

Name                        Description
WriteCONECT                 Should the bonds be written as CONECT records? [Default: true]
UseElementSymbolAsAtomName  Should the element symbol be written as the atom name [Default: false]
WriteTER                    Should a TER record be put at the end of the atoms? [Default: false]
WriteEND                    Should an END record be put at the end of the file? [Default: true]
WriteAsHET                  Should the output file use HETATM [Default: false]

PubChem Compound ASN

Preferred Extension: asn XML Based?: No

PCCompoundASNReader

This reader supports these data objects:

Class          Accepted
ChemFile       true
AtomContainer  false

PubChem Compound XML

Preferred Extension: xml XML Based?: Yes

PCCompoundXMLReader

This reader supports these data objects:

Class          Accepted
ChemFile       false
AtomContainer  true
Crystal        true

PubChem Substance XML

Preferred Extension: xml XML Based?: Yes

PCSubstanceXMLReader

This reader supports these data objects:

Class          Accepted
ChemFile       false
AtomContainer  true
Crystal        true

SMILES

Preferred Extension: smi MIME type: chemical/x-daylight-smiles XML Based?: No

SMILESReader

This reader supports these data objects:

Class          Accepted
ChemFile       true
AtomContainer  false

SMILESWriter

This writer supports these data objects:

Class          Accepted
ChemFile       false
AtomContainer  true
Crystal        true

This writer has these IO settings:

Name            Description
SmilesFlavor    Output SMILES flavor, binary option [Default: 12551944]
WriteTitle      Write the molecule title after the SMILES [Default: true]
UseAromaticity  Should aromaticity information be stored in the SMILES? [Default: false]

ShelXL

Extensions: [ins, res] Preferred Extension: ins MIME type: chemical/x-shelx XML Based?: No

ShelXReader

This reader supports these data objects:

Class          Accepted
ChemFile       true
AtomContainer  false
Crystal        true

ShelXWriter

This writer supports these data objects:

Class          Accepted
ChemFile       false
AtomContainer  false
Crystal        true

Symyx Rgroup query files

Extensions: [mol, rgp] Preferred Extension: mol XML Based?: No

RGroupQueryReader

This reader supports these data objects:

Class                             Accepted
ChemFile                          false
AtomContainer                     false
isomorphism.matchers.RGroupQuery  true

RGroupQueryWriter

This writer supports these data objects:

Class                             Accepted
ChemFile                          false
AtomContainer                     false
isomorphism.matchers.RGroupQuery  true

VASP

XML Based?: No

VASPReader

This reader supports these data objects:

Class          Accepted
ChemFile       true
AtomContainer  false

XYZ

Preferred Extension: xyz MIME type: chemical/x-xyz XML Based?: No

XYZReader

This reader supports these data objects:

Class          Accepted
ChemFile       true
AtomContainer  false

XYZWriter

This writer supports these data objects:

Class          Accepted
ChemFile       false
AtomContainer  true
Crystal        true

ZMatrix

XML Based?: No

ZMatrixReader

This reader supports these data objects:

Class          Accepted
ChemFile       true
AtomContainer  false