14.10.1. Introduction to the VCF2FHIR Python library
license: CC-BY-4.0
version: 1.0
creation-date: 2021.10.22
This Python library (at an early stage of development) by Dolin et al., 2021 [1] provides an initial capability to convert genetic variation information stored in a standard Variant Call Format (VCF) file into a JSON-based HL7 FHIR message, compliant with HL7 FHIR Genomics Report guidelines.
This notebook offers a simple way for anyone interested in how FAIR principles can be connected to the clinical world to try it out for themselves.
Main Features:
supports simple variants (SNVs, MNVs, Indels)
Limitations:
does not support structural variants
This software is not intended for use in production systems.
14.10.1.1. Let’s get going by importing all the necessary Python libraries
import os
import json
import logging
import vcf2fhir
14.10.1.1.1. The main class of the vcf2fhir Python library is called Converter and takes a number of arguments, most of which are optional.
Required arguments:
vcf_filename (required): the path to a text-based or bgzipped VCF file. IMPORTANT:
Valid path and filename without whitespace must be provided.
VCF file must conform to VCF Version 4.1 or later.
FORMAT.GT must be present.
Multi-sample VCFs are allowed, but only the first sample will be converted.
bgzipped VCF files are allowed, but then the additional argument has_tabix must be set to True and a tabix index file must be provided. The tabix file must have the same name as the bgzipped VCF file, with a ‘.tbi’ extension, and must be in the same folder.
ref_build (required): Genome Reference Consortium genome assembly to which variants in the VCF were called. IMPORTANT:
Must be one of ‘GRCh37’ or ‘GRCh38’.
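The constraints listed above can be checked before the converter is ever invoked. The following is a minimal sketch and not part of vcf2fhir: the helper name check_converter_inputs is hypothetical, and treating a '.gz' suffix as "bgzipped" is an assumption made purely for illustration. It does not inspect the VCF content, so the VCF version and the presence of FORMAT.GT are not verified.

```python
import os

ALLOWED_REF_BUILDS = ("GRCh37", "GRCh38")


def check_converter_inputs(vcf_filename, ref_build, has_tabix=False):
    """Return a list of problems; an empty list means the inputs look usable.

    Hypothetical helper (not part of vcf2fhir) mirroring the documented constraints.
    """
    problems = []
    if any(ch.isspace() for ch in vcf_filename):
        problems.append("path or filename contains whitespace")
    if not os.path.isfile(vcf_filename):
        problems.append("VCF file not found: " + vcf_filename)
    if ref_build not in ALLOWED_REF_BUILDS:
        problems.append("ref_build must be one of " + ", ".join(ALLOWED_REF_BUILDS))
    # Assumption: a '.gz' suffix signals a bgzipped VCF.
    if vcf_filename.endswith(".gz"):
        if not has_tabix:
            problems.append("bgzipped VCF requires has_tabix=True")
        # The index must sit next to the VCF and carry an extra '.tbi' extension.
        if not os.path.isfile(vcf_filename + ".tbi"):
            problems.append("tabix index not found: " + vcf_filename + ".tbi")
    return problems
```

Calling check_converter_inputs('vcftests.vcf', 'GRCh37') before building the converter returns an empty list when nothing looks wrong, which makes it easy to fail early with a readable message.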
Optional arguments are:
patient_id (optional)
conv_region_dict
conv_region_filename
annotation_filename (optional)
region_studied_filename (optional)
nocall_filename (optional)
ratio_ad_dp (optional) (default value = 0.99)
genomic_source_class (optional) (default value = somatic)
For more information about these options, refer to the library documentation.
14.10.1.1.1.1. Invoking the converter is as simple as the following command:
fhir = vcf2fhir.Converter('vcftests.vcf','GRCh37')
14.10.1.1.1.2. Invoking the convert() method to serialize the information as an HL7 FHIR JSON message to a default output file.
fhir.convert()
14.10.1.1.1.3. Performing both actions in one go while using an additional optional argument
vcf2fhir.Converter('vcftests.vcf','GRCh38', 'patient01').convert()
14.10.1.1.1.4. Invoking the conversion and writing to a user-defined file instead of the default file.
output=vcf2fhir.Converter('vcftests.vcf','GRCh37', 'patient01', ratio_ad_dp = 0.89).convert(output_filename='patient01.json')
14.10.1.1.2. Peeking at the resulting JSON file:
# 'infile' avoids shadowing the built-in input()
with open('patient01.json', 'r') as infile:
    fhirmsg = json.load(infile)
print(json.dumps(fhirmsg, indent=4, sort_keys=True))
{
"category": [
{
"coding": [
{
"code": "GE",
"system": "http://terminology.hl7.org/CodeSystem/v2-0074"
}
]
}
],
"code": {
"coding": [
{
"code": "81247-9",
"display": "Master HL7 genetic variant reporting panel",
"system": "http://loinc.org"
}
]
},
"contained": [
{
"category": [
{
"coding": [
{
"code": "laboratory",
"system": "http://terminology.hl7.org/CodeSystem/observation-category"
}
]
}
],
"code": {
"coding": [
{
"code": "69548-6",
"display": "Genetic variant assessment",
"system": "http://loinc.org"
}
]
},
"component": [
{
"code": {
"coding": [
{
"code": "48004-6",
"display": "DNA change (c.HGVS)",
"system": "http://loinc.org"
}
]
},
"valueCodeableConcept": {
"coding": [
{
"code": "NC_000023.10:60465:T:C",
"system": "http://varnomen.hgvs.org"
}
]
}
},
{
"code": {
"coding": [
{
"code": "48013-7",
"display": "Genomic reference sequence ID",
"system": "http://loinc.org"
}
]
},
"valueCodeableConcept": {
"coding": [
{
"code": "NC_000023.10",
"system": "http://www.ncbi.nlm.nih.gov/nuccore"
}
]
}
},
{
"code": {
"coding": [
{
"code": "48002-0",
"display": "Genomic Source Class",
"system": "http://loinc.org"
}
]
},
"valueCodeableConcept": {
"coding": [
{
"code": "LA6684-0",
"display": "Somatic",
"system": "http://loinc.org"
}
]
}
},
{
"code": {
"coding": [
{
"code": "69547-8",
"display": "Genomic Ref allele [ID]",
"system": "http://loinc.org"
}
]
},
"valueString": "T"
},
{
"code": {
"coding": [
{
"code": "69551-0",
"display": "Genomic Alt allele [ID]",
"system": "http://loinc.org"
}
]
},
"valueString": "C"
},
{
"code": {
"coding": [
{
"code": "92822-6",
"display": "Genomic coord system",
"system": "http://loinc.org"
}
]
},
"valueCodeableConcept": {
"coding": [
{
"code": "LA30102-0",
"display": "1-based character counting",
"system": "http://loinc.org"
}
]
}
},
{
"code": {
"coding": [
{
"code": "exact-start-end",
"display": "Variant exact start and end",
"system": "http://hl7.org/fhir/uv/genomics-reporting/CodeSystem/TbdCodes"
}
]
},
"valueRange": {
"low": {
"value": 60466
}
}
}
],
"id": "dv-506559af936d4",
"meta": {
"profile": [
"http://hl7.org/fhir/uv/genomics-reporting/StructureDefinition/variant"
]
},
"resourceType": "Observation",
"status": "final",
"subject": {
"reference": "Patient/patient01"
},
"valueCodeableConcept": {
"coding": [
{
"code": "LA9633-4",
"display": "present",
"system": "http://loinc.org"
}
]
}
},
{
"category": [
{
"coding": [
{
"code": "laboratory",
"system": "http://terminology.hl7.org/CodeSystem/observation-category"
}
]
}
],
"code": {
"coding": [
{
"code": "diagnostic-implication",
"display": "Diagnostic Implication",
"system": "http://hl7.org/fhir/uv/genomics-reporting/CodeSystem/TbdCodes"
}
]
},
"component": [
{
"code": {
"coding": [
{
"code": "53037-8",
"display": "Genetic variation clinical significance [Imp]",
"system": "http://loinc.org"
}
]
},
"valueCodeableConcept": {
"coding": [
{
"display": "not specified",
"system": "http://loinc.org"
}
]
}
}
],
"derivedFrom": [
{
"reference": "#dv-506559af936d4"
}
],
"id": "di-6da3099b6b204",
"meta": {
"profile": [
"http://hl7.org/fhir/uv/genomics-reporting/StructureDefinition/diagnostic-implication"
]
},
"resourceType": "Observation",
"status": "final",
"subject": {
"reference": "Patient/patient01"
}
},
{
"category": [
{
"coding": [
{
"code": "laboratory",
"system": "http://terminology.hl7.org/CodeSystem/observation-category"
}
]
}
],
"code": {
"coding": [
{
"code": "69548-6",
"display": "Genetic variant assessment",
"system": "http://loinc.org"
}
]
},
"component": [
{
"code": {
"coding": [
{
"code": "48004-6",
"display": "DNA change (c.HGVS)",
"system": "http://loinc.org"
}
]
},
"valueCodeableConcept": {
"coding": [
{
"code": "NC_000023.10:60578:G:A",
"system": "http://varnomen.hgvs.org"
}
]
}
},
{
"code": {
"coding": [
{
"code": "48013-7",
"display": "Genomic reference sequence ID",
"system": "http://loinc.org"
}
]
},
"valueCodeableConcept": {
"coding": [
{
"code": "NC_000023.10",
"system": "http://www.ncbi.nlm.nih.gov/nuccore"
}
]
}
},
{
"code": {
"coding": [
{
"code": "48002-0",
"display": "Genomic Source Class",
"system": "http://loinc.org"
}
]
},
"valueCodeableConcept": {
"coding": [
{
"code": "LA6684-0",
"display": "Somatic",
"system": "http://loinc.org"
}
]
}
},
{
"code": {
"coding": [
{
"code": "69547-8",
"display": "Genomic Ref allele [ID]",
"system": "http://loinc.org"
}
]
},
"valueString": "G"
},
{
"code": {
"coding": [
{
"code": "69551-0",
"display": "Genomic Alt allele [ID]",
"system": "http://loinc.org"
}
]
},
"valueString": "A"
},
{
"code": {
"coding": [
{
"code": "92822-6",
"display": "Genomic coord system",
"system": "http://loinc.org"
}
]
},
"valueCodeableConcept": {
"coding": [
{
"code": "LA30102-0",
"display": "1-based character counting",
"system": "http://loinc.org"
}
]
}
},
{
"code": {
"coding": [
{
"code": "exact-start-end",
"display": "Variant exact start and end",
"system": "http://hl7.org/fhir/uv/genomics-reporting/CodeSystem/TbdCodes"
}
]
},
"valueRange": {
"low": {
"value": 60579
}
}
}
],
"id": "dv-6f399c7fb0be4",
"meta": {
"profile": [
"http://hl7.org/fhir/uv/genomics-reporting/StructureDefinition/variant"
]
},
"resourceType": "Observation",
"status": "final",
"subject": {
"reference": "Patient/patient01"
},
"valueCodeableConcept": {
"coding": [
{
"code": "LA9633-4",
"display": "present",
"system": "http://loinc.org"
}
]
}
},
{
"category": [
{
"coding": [
{
"code": "laboratory",
"system": "http://terminology.hl7.org/CodeSystem/observation-category"
}
]
}
],
"code": {
"coding": [
{
"code": "diagnostic-implication",
"display": "Diagnostic Implication",
"system": "http://hl7.org/fhir/uv/genomics-reporting/CodeSystem/TbdCodes"
}
]
},
"component": [
{
"code": {
"coding": [
{
"code": "53037-8",
"display": "Genetic variation clinical significance [Imp]",
"system": "http://loinc.org"
}
]
},
"valueCodeableConcept": {
"coding": [
{
"display": "not specified",
"system": "http://loinc.org"
}
]
}
}
],
"derivedFrom": [
{
"reference": "#dv-6f399c7fb0be4"
}
],
"id": "di-12de791e725b4",
"meta": {
"profile": [
"http://hl7.org/fhir/uv/genomics-reporting/StructureDefinition/diagnostic-implication"
]
},
"resourceType": "Observation",
"status": "final",
"subject": {
"reference": "Patient/patient01"
}
},
{
"category": [
{
"coding": [
{
"code": "laboratory",
"system": "http://terminology.hl7.org/CodeSystem/observation-category"
}
]
}
],
"code": {
"coding": [
{
"code": "69548-6",
"display": "Genetic variant assessment",
"system": "http://loinc.org"
}
]
},
"component": [
{
"code": {
"coding": [
{
"code": "48004-6",
"display": "DNA change (c.HGVS)",
"system": "http://loinc.org"
}
]
},
"valueCodeableConcept": {
"coding": [
{
"code": "NC_000023.10:60582:G:C",
"system": "http://varnomen.hgvs.org"
}
]
}
},
{
"code": {
"coding": [
{
"code": "48013-7",
"display": "Genomic reference sequence ID",
"system": "http://loinc.org"
}
]
},
"valueCodeableConcept": {
"coding": [
{
"code": "NC_000023.10",
"system": "http://www.ncbi.nlm.nih.gov/nuccore"
}
]
}
},
{
"code": {
"coding": [
{
"code": "48002-0",
"display": "Genomic Source Class",
"system": "http://loinc.org"
}
]
},
"valueCodeableConcept": {
"coding": [
{
"code": "LA6684-0",
"display": "Somatic",
"system": "http://loinc.org"
}
]
}
},
{
"code": {
"coding": [
{
"code": "69547-8",
"display": "Genomic Ref allele [ID]",
"system": "http://loinc.org"
}
]
},
"valueString": "G"
},
{
"code": {
"coding": [
{
"code": "69551-0",
"display": "Genomic Alt allele [ID]",
"system": "http://loinc.org"
}
]
},
"valueString": "C"
},
{
"code": {
"coding": [
{
"code": "92822-6",
"display": "Genomic coord system",
"system": "http://loinc.org"
}
]
},
"valueCodeableConcept": {
"coding": [
{
"code": "LA30102-0",
"display": "1-based character counting",
"system": "http://loinc.org"
}
]
}
},
{
"code": {
"coding": [
{
"code": "exact-start-end",
"display": "Variant exact start and end",
"system": "http://hl7.org/fhir/uv/genomics-reporting/CodeSystem/TbdCodes"
}
]
},
"valueRange": {
"low": {
"value": 60583
}
}
}
],
"id": "dv-6175dec7e9904",
"meta": {
"profile": [
"http://hl7.org/fhir/uv/genomics-reporting/StructureDefinition/variant"
]
},
"resourceType": "Observation",
"status": "final",
"subject": {
"reference": "Patient/patient01"
},
"valueCodeableConcept": {
"coding": [
{
"code": "LA9633-4",
"display": "present",
"system": "http://loinc.org"
}
]
}
},
{
"category": [
{
"coding": [
{
"code": "laboratory",
"system": "http://terminology.hl7.org/CodeSystem/observation-category"
}
]
}
],
"code": {
"coding": [
{
"code": "diagnostic-implication",
"display": "Diagnostic Implication",
"system": "http://hl7.org/fhir/uv/genomics-reporting/CodeSystem/TbdCodes"
}
]
},
"component": [
{
"code": {
"coding": [
{
"code": "53037-8",
"display": "Genetic variation clinical significance [Imp]",
"system": "http://loinc.org"
}
]
},
"valueCodeableConcept": {
"coding": [
{
"display": "not specified",
"system": "http://loinc.org"
}
]
}
}
],
"derivedFrom": [
{
"reference": "#dv-6175dec7e9904"
}
],
"id": "di-cda284da44504",
"meta": {
"profile": [
"http://hl7.org/fhir/uv/genomics-reporting/StructureDefinition/diagnostic-implication"
]
},
"resourceType": "Observation",
"status": "final",
"subject": {
"reference": "Patient/patient01"
}
},
{
"category": [
{
"coding": [
{
"code": "laboratory",
"system": "http://terminology.hl7.org/CodeSystem/observation-category"
}
]
}
],
"code": {
"coding": [
{
"code": "69548-6",
"display": "Genetic variant assessment",
"system": "http://loinc.org"
}
]
},
"component": [
{
"code": {
"coding": [
{
"code": "48004-6",
"display": "DNA change (c.HGVS)",
"system": "http://loinc.org"
}
]
},
"valueCodeableConcept": {
"coding": [
{
"code": "NC_000023.10:60592:A:T",
"system": "http://varnomen.hgvs.org"
}
]
}
},
{
"code": {
"coding": [
{
"code": "48013-7",
"display": "Genomic reference sequence ID",
"system": "http://loinc.org"
}
]
},
"valueCodeableConcept": {
"coding": [
{
"code": "NC_000023.10",
"system": "http://www.ncbi.nlm.nih.gov/nuccore"
}
]
}
},
{
"code": {
"coding": [
{
"code": "48002-0",
"display": "Genomic Source Class",
"system": "http://loinc.org"
}
]
},
"valueCodeableConcept": {
"coding": [
{
"code": "LA6684-0",
"display": "Somatic",
"system": "http://loinc.org"
}
]
}
},
{
"code": {
"coding": [
{
"code": "69547-8",
"display": "Genomic Ref allele [ID]",
"system": "http://loinc.org"
}
]
},
"valueString": "A"
},
{
"code": {
"coding": [
{
"code": "69551-0",
"display": "Genomic Alt allele [ID]",
"system": "http://loinc.org"
}
]
},
"valueString": "T"
},
{
"code": {
"coding": [
{
"code": "92822-6",
"display": "Genomic coord system",
"system": "http://loinc.org"
}
]
},
"valueCodeableConcept": {
"coding": [
{
"code": "LA30102-0",
"display": "1-based character counting",
"system": "http://loinc.org"
}
]
}
},
{
"code": {
"coding": [
{
"code": "exact-start-end",
"display": "Variant exact start and end",
"system": "http://hl7.org/fhir/uv/genomics-reporting/CodeSystem/TbdCodes"
}
]
},
"valueRange": {
"low": {
"value": 60593
}
}
}
],
"id": "dv-c5a54b1cd5684",
"meta": {
"profile": [
"http://hl7.org/fhir/uv/genomics-reporting/StructureDefinition/variant"
]
},
"resourceType": "Observation",
"status": "final",
"subject": {
"reference": "Patient/patient01"
},
"valueCodeableConcept": {
"coding": [
{
"code": "LA9633-4",
"display": "present",
"system": "http://loinc.org"
}
]
}
},
{
"category": [
{
"coding": [
{
"code": "laboratory",
"system": "http://terminology.hl7.org/CodeSystem/observation-category"
}
]
}
],
"code": {
"coding": [
{
"code": "diagnostic-implication",
"display": "Diagnostic Implication",
"system": "http://hl7.org/fhir/uv/genomics-reporting/CodeSystem/TbdCodes"
}
]
},
"component": [
{
"code": {
"coding": [
{
"code": "53037-8",
"display": "Genetic variation clinical significance [Imp]",
"system": "http://loinc.org"
}
]
},
"valueCodeableConcept": {
"coding": [
{
"display": "not specified",
"system": "http://loinc.org"
}
]
}
}
],
"derivedFrom": [
{
"reference": "#dv-c5a54b1cd5684"
}
],
"id": "di-7771ce77a9a54",
"meta": {
"profile": [
"http://hl7.org/fhir/uv/genomics-reporting/StructureDefinition/diagnostic-implication"
]
},
"resourceType": "Observation",
"status": "final",
"subject": {
"reference": "Patient/patient01"
}
},
{
"category": [
{
"coding": [
{
"code": "laboratory",
"system": "http://terminology.hl7.org/CodeSystem/observation-category"
}
]
}
],
"code": {
"coding": [
{
"code": "69548-6",
"display": "Genetic variant assessment",
"system": "http://loinc.org"
}
]
},
"component": [
{
"code": {
"coding": [
{
"code": "48004-6",
"display": "DNA change (c.HGVS)",
"system": "http://loinc.org"
}
]
},
"valueCodeableConcept": {
"coding": [
{
"code": "NC_000023.10:60691:T:C",
"system": "http://varnomen.hgvs.org"
}
]
}
},
{
"code": {
"coding": [
{
"code": "48013-7",
"display": "Genomic reference sequence ID",
"system": "http://loinc.org"
}
]
},
"valueCodeableConcept": {
"coding": [
{
"code": "NC_000023.10",
"system": "http://www.ncbi.nlm.nih.gov/nuccore"
}
]
}
},
{
"code": {
"coding": [
{
"code": "48002-0",
"display": "Genomic Source Class",
"system": "http://loinc.org"
}
]
},
"valueCodeableConcept": {
"coding": [
{
"code": "LA6684-0",
"display": "Somatic",
"system": "http://loinc.org"
}
]
}
},
{
"code": {
"coding": [
{
"code": "69547-8",
"display": "Genomic Ref allele [ID]",
"system": "http://loinc.org"
}
]
},
"valueString": "T"
},
{
"code": {
"coding": [
{
"code": "69551-0",
"display": "Genomic Alt allele [ID]",
"system": "http://loinc.org"
}
]
},
"valueString": "C"
},
{
"code": {
"coding": [
{
"code": "92822-6",
"display": "Genomic coord system",
"system": "http://loinc.org"
}
]
},
"valueCodeableConcept": {
"coding": [
{
"code": "LA30102-0",
"display": "1-based character counting",
"system": "http://loinc.org"
}
]
}
},
{
"code": {
"coding": [
{
"code": "exact-start-end",
"display": "Variant exact start and end",
"system": "http://hl7.org/fhir/uv/genomics-reporting/CodeSystem/TbdCodes"
}
]
},
"valueRange": {
"low": {
"value": 60692
}
}
}
],
"id": "dv-b69d08e525b44",
"meta": {
"profile": [
"http://hl7.org/fhir/uv/genomics-reporting/StructureDefinition/variant"
]
},
"resourceType": "Observation",
"status": "final",
"subject": {
"reference": "Patient/patient01"
},
"valueCodeableConcept": {
"coding": [
{
"code": "LA9633-4",
"display": "present",
"system": "http://loinc.org"
}
]
}
},
{
"category": [
{
"coding": [
{
"code": "laboratory",
"system": "http://terminology.hl7.org/CodeSystem/observation-category"
}
]
}
],
"code": {
"coding": [
{
"code": "diagnostic-implication",
"display": "Diagnostic Implication",
"system": "http://hl7.org/fhir/uv/genomics-reporting/CodeSystem/TbdCodes"
}
]
},
"component": [
{
"code": {
"coding": [
{
"code": "53037-8",
"display": "Genetic variation clinical significance [Imp]",
"system": "http://loinc.org"
}
]
},
"valueCodeableConcept": {
"coding": [
{
"display": "not specified",
"system": "http://loinc.org"
}
]
}
}
],
"derivedFrom": [
{
"reference": "#dv-b69d08e525b44"
}
],
"id": "di-69f5a76124f04",
"meta": {
"profile": [
"http://hl7.org/fhir/uv/genomics-reporting/StructureDefinition/diagnostic-implication"
]
},
"resourceType": "Observation",
"status": "final",
"subject": {
"reference": "Patient/patient01"
}
},
{
"category": [
{
"coding": [
{
"code": "laboratory",
"system": "http://terminology.hl7.org/CodeSystem/observation-category"
}
]
}
],
"code": {
"coding": [
{
"code": "69548-6",
"display": "Genetic variant assessment",
"system": "http://loinc.org"
}
]
},
"component": [
{
"code": {
"coding": [
{
"code": "48004-6",
"display": "DNA change (c.HGVS)",
"system": "http://loinc.org"
}
]
},
"valueCodeableConcept": {
"coding": [
{
"code": "NC_000023.10:60881:T:G",
"system": "http://varnomen.hgvs.org"
}
]
}
},
{
"code": {
"coding": [
{
"code": "48013-7",
"display": "Genomic reference sequence ID",
"system": "http://loinc.org"
}
]
},
"valueCodeableConcept": {
"coding": [
{
"code": "NC_000023.10",
"system": "http://www.ncbi.nlm.nih.gov/nuccore"
}
]
}
},
{
"code": {
"coding": [
{
"code": "48002-0",
"display": "Genomic Source Class",
"system": "http://loinc.org"
}
]
},
"valueCodeableConcept": {
"coding": [
{
"code": "LA6684-0",
"display": "Somatic",
"system": "http://loinc.org"
}
]
}
},
{
"code": {
"coding": [
{
"code": "69547-8",
"display": "Genomic Ref allele [ID]",
"system": "http://loinc.org"
}
]
},
"valueString": "T"
},
{
"code": {
"coding": [
{
"code": "69551-0",
"display": "Genomic Alt allele [ID]",
"system": "http://loinc.org"
}
]
},
"valueString": "G"
},
{
"code": {
"coding": [
{
"code": "92822-6",
"display": "Genomic coord system",
"system": "http://loinc.org"
}
]
},
"valueCodeableConcept": {
"coding": [
{
"code": "LA30102-0",
"display": "1-based character counting",
"system": "http://loinc.org"
}
]
}
},
{
"code": {
"coding": [
{
"code": "exact-start-end",
"display": "Variant exact start and end",
"system": "http://hl7.org/fhir/uv/genomics-reporting/CodeSystem/TbdCodes"
}
]
},
"valueRange": {
"low": {
"value": 60882
}
}
}
],
"id": "dv-e18135b890434",
"meta": {
"profile": [
"http://hl7.org/fhir/uv/genomics-reporting/StructureDefinition/variant"
]
},
"resourceType": "Observation",
"status": "final",
"subject": {
"reference": "Patient/patient01"
},
"valueCodeableConcept": {
"coding": [
{
"code": "LA9633-4",
"display": "present",
"system": "http://loinc.org"
}
]
}
},
{
"category": [
{
"coding": [
{
"code": "laboratory",
"system": "http://terminology.hl7.org/CodeSystem/observation-category"
}
]
}
],
"code": {
"coding": [
{
"code": "diagnostic-implication",
"display": "Diagnostic Implication",
"system": "http://hl7.org/fhir/uv/genomics-reporting/CodeSystem/TbdCodes"
}
]
},
"component": [
{
"code": {
"coding": [
{
"code": "53037-8",
"display": "Genetic variation clinical significance [Imp]",
"system": "http://loinc.org"
}
]
},
"valueCodeableConcept": {
"coding": [
{
"display": "not specified",
"system": "http://loinc.org"
}
]
}
}
],
"derivedFrom": [
{
"reference": "#dv-e18135b890434"
}
],
"id": "di-ee4e4752910f4",
"meta": {
"profile": [
"http://hl7.org/fhir/uv/genomics-reporting/StructureDefinition/diagnostic-implication"
]
},
"resourceType": "Observation",
"status": "final",
"subject": {
"reference": "Patient/patient01"
}
},
{
"category": [
{
"coding": [
{
"code": "laboratory",
"system": "http://terminology.hl7.org/CodeSystem/observation-category"
}
]
}
],
"code": {
"coding": [
{
"code": "69548-6",
"display": "Genetic variant assessment",
"system": "http://loinc.org"
}
]
},
"component": [
{
"code": {
"coding": [
{
"code": "48004-6",
"display": "DNA change (c.HGVS)",
"system": "http://loinc.org"
}
]
},
"valueCodeableConcept": {
"coding": [
{
"code": "NC_012920.1:6017:A:C",
"system": "http://varnomen.hgvs.org"
}
]
}
},
{
"code": {
"coding": [
{
"code": "48013-7",
"display": "Genomic reference sequence ID",
"system": "http://loinc.org"
}
]
},
"valueCodeableConcept": {
"coding": [
{
"code": "NC_012920.1",
"system": "http://www.ncbi.nlm.nih.gov/nuccore"
}
]
}
},
{
"code": {
"coding": [
{
"code": "48002-0",
"display": "Genomic Source Class",
"system": "http://loinc.org"
}
]
},
"valueCodeableConcept": {
"coding": [
{
"code": "LA6684-0",
"display": "Somatic",
"system": "http://loinc.org"
}
]
}
},
{
"code": {
"coding": [
{
"code": "81258-6",
"display": "Sample VAF",
"system": "http://loinc.org"
}
]
},
"valueQuantity": {
"code": "1",
"system": "http://unitsofmeasure.org",
"value": 0.8
}
},
{
"code": {
"coding": [
{
"code": "69547-8",
"display": "Genomic Ref allele [ID]",
"system": "http://loinc.org"
}
]
},
"valueString": "A"
},
{
"code": {
"coding": [
{
"code": "69551-0",
"display": "Genomic Alt allele [ID]",
"system": "http://loinc.org"
}
]
},
"valueString": "C"
},
{
"code": {
"coding": [
{
"code": "92822-6",
"display": "Genomic coord system",
"system": "http://loinc.org"
}
]
},
"valueCodeableConcept": {
"coding": [
{
"code": "LA30102-0",
"display": "1-based character counting",
"system": "http://loinc.org"
}
]
}
},
{
"code": {
"coding": [
{
"code": "exact-start-end",
"display": "Variant exact start and end",
"system": "http://hl7.org/fhir/uv/genomics-reporting/CodeSystem/TbdCodes"
}
]
},
"valueRange": {
"low": {
"value": 6018
}
}
}
],
"id": "dv-3d5841401bfb4",
"meta": {
"profile": [
"http://hl7.org/fhir/uv/genomics-reporting/StructureDefinition/variant"
]
},
"resourceType": "Observation",
"status": "final",
"subject": {
"reference": "Patient/patient01"
},
"valueCodeableConcept": {
"coding": [
{
"code": "LA9633-4",
"display": "present",
"system": "http://loinc.org"
}
]
}
},
{
"category": [
{
"coding": [
{
"code": "laboratory",
"system": "http://terminology.hl7.org/CodeSystem/observation-category"
}
]
}
],
"code": {
"coding": [
{
"code": "diagnostic-implication",
"display": "Diagnostic Implication",
"system": "http://hl7.org/fhir/uv/genomics-reporting/CodeSystem/TbdCodes"
}
]
},
"component": [
{
"code": {
"coding": [
{
"code": "53037-8",
"display": "Genetic variation clinical significance [Imp]",
"system": "http://loinc.org"
}
]
},
"valueCodeableConcept": {
"coding": [
{
"display": "not specified",
"system": "http://loinc.org"
}
]
}
}
],
"derivedFrom": [
{
"reference": "#dv-3d5841401bfb4"
}
],
"id": "di-07dcac1e12104",
"meta": {
"profile": [
"http://hl7.org/fhir/uv/genomics-reporting/StructureDefinition/diagnostic-implication"
]
},
"resourceType": "Observation",
"status": "final",
"subject": {
"reference": "Patient/patient01"
}
},
{
"category": [
{
"coding": [
{
"code": "laboratory",
"system": "http://terminology.hl7.org/CodeSystem/observation-category"
}
]
}
],
"code": {
"coding": [
{
"code": "82120-7",
"display": "Allelic phase",
"system": "http://loinc.org"
}
]
},
"derivedFrom": [
{
"reference": "#dv-6f399c7fb0be4"
},
{
"reference": "#dv-6175dec7e9904"
}
],
"id": "sid-b582d59d887d4",
"meta": {
"profile": [
"http://hl7.org/fhir/uv/genomics-reporting/StructureDefinition/sequence-phase-relationship"
]
},
"resourceType": "Observation",
"status": "final",
"subject": {
"reference": "Patient/patient01"
},
"valueCodeableConcept": {
"coding": [
{
"code": "Cis",
"display": "Cis",
"system": "http://hl7.org/fhir/uv/genomics-reporting/CodeSystem/SequencePhaseRelationshipCS"
}
]
}
},
{
"category": [
{
"coding": [
{
"code": "laboratory",
"system": "http://terminology.hl7.org/CodeSystem/observation-category"
}
]
}
],
"code": {
"coding": [
{
"code": "82120-7",
"display": "Allelic phase",
"system": "http://loinc.org"
}
]
},
"derivedFrom": [
{
"reference": "#dv-6175dec7e9904"
},
{
"reference": "#dv-c5a54b1cd5684"
}
],
"id": "sid-87677dfd43394",
"meta": {
"profile": [
"http://hl7.org/fhir/uv/genomics-reporting/StructureDefinition/sequence-phase-relationship"
]
},
"resourceType": "Observation",
"status": "final",
"subject": {
"reference": "Patient/patient01"
},
"valueCodeableConcept": {
"coding": [
{
"code": "Cis",
"display": "Cis",
"system": "http://hl7.org/fhir/uv/genomics-reporting/CodeSystem/SequencePhaseRelationshipCS"
}
]
}
}
],
"id": "dr-0646393fe2044",
"issued": "2021-10-22T10:49:11+00:00",
"meta": {
"profile": [
"http://hl7.org/fhir/uv/genomics-reporting/StructureDefinition/genomics-report"
]
},
"resourceType": "DiagnosticReport",
"result": [
{
"reference": "#dv-506559af936d4"
},
{
"reference": "#di-6da3099b6b204"
},
{
"reference": "#dv-6f399c7fb0be4"
},
{
"reference": "#di-12de791e725b4"
},
{
"reference": "#dv-6175dec7e9904"
},
{
"reference": "#di-cda284da44504"
},
{
"reference": "#dv-c5a54b1cd5684"
},
{
"reference": "#di-7771ce77a9a54"
},
{
"reference": "#dv-b69d08e525b44"
},
{
"reference": "#di-69f5a76124f04"
},
{
"reference": "#dv-e18135b890434"
},
{
"reference": "#di-ee4e4752910f4"
},
{
"reference": "#dv-3d5841401bfb4"
},
{
"reference": "#di-07dcac1e12104"
},
{
"reference": "#sid-b582d59d887d4"
},
{
"reference": "#sid-87677dfd43394"
}
],
"status": "final",
"subject": {
"reference": "Patient/patient01"
}
}
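A report of this size is hard to review by eye. The following is a minimal sketch, assuming only the structure visible in the output above (a DiagnosticReport with contained Observations, each carrying a meta.profile URL and LOINC-coded components). The function names are hypothetical and nothing here comes from vcf2fhir itself.

```python
from collections import Counter

# LOINC codes as they appear in the output above.
DNA_CHANGE = "48004-6"
REF_ALLELE = "69547-8"
ALT_ALLELE = "69551-0"


def component_value(observation, code):
    """Return the value of the first component carrying the given code, or None."""
    for comp in observation.get("component", []):
        codings = comp.get("code", {}).get("coding", [])
        if any(c.get("code") == code for c in codings):
            if "valueString" in comp:
                return comp["valueString"]
            value_codings = comp.get("valueCodeableConcept", {}).get("coding", [])
            return value_codings[0].get("code") if value_codings else None
    return None


def summarize_report(report):
    """Count contained resources per profile and list the variants found."""
    profiles = Counter()
    variants = []
    for res in report.get("contained", []):
        for url in res.get("meta", {}).get("profile", []):
            # Keep only the last path segment of the profile URL, e.g. 'variant'.
            name = url.rsplit("/", 1)[-1]
            profiles[name] += 1
            if name == "variant":
                variants.append({
                    "id": res.get("id"),
                    "change": component_value(res, DNA_CHANGE),
                    "ref": component_value(res, REF_ALLELE),
                    "alt": component_value(res, ALT_ALLELE),
                })
    return profiles, variants


def dangling_references(report):
    """Return result references that do not resolve to a contained resource."""
    contained_ids = {"#" + res["id"] for res in report.get("contained", []) if "id" in res}
    return [r["reference"] for r in report.get("result", [])
            if r.get("reference") not in contained_ids]
```

With the message loaded as fhirmsg, summarize_report(fhirmsg) gives a per-profile count and a compact variant list, and dangling_references(fhirmsg) should come back empty if every entry in result points at a contained resource. Going by the listing above, the counts would be seven variant, seven diagnostic-implication and two sequence-phase-relationship observations.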
14.10.1.1.3. Tracking conversion errors by activating the logger function
As with all conversions, things can go awry. It is therefore always good practice to log any errors when executing code.
The authors of the vcf2fhir library provide two distinct loggers, which we’ll now use.
The vcf2fhir logging process simply builds on the well-established Python logging library, so using it is as simple as using said library:
14.10.1.1.3.1. i. instantiate a logger and set a logging level
general_logger = logging.getLogger('vcf2fhir.general')
general_logger.setLevel(logging.DEBUG)
14.10.1.1.3.2. ii. define a file as output and a formatter pattern
# create file handler and set level to debug
ch = logging.FileHandler('vcf2fhir-generic-errors.log')
ch.setLevel(logging.DEBUG)
# create formatter
formatter = logging.Formatter('%(asctime)s - %(name)s - %(levelname)s - %(message)s')
# add formatter to ch
ch.setFormatter(formatter)
# add ch to logger
general_logger.addHandler(ch)
14.10.1.1.4. Using the dedicated invalid_record_logger:
14.10.1.1.4.1. i. create a logger and pass it the specific vcf2fhir logger name as follows:
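invalid_record_logger = logging.getLogger('vcf2fhir.invalidrecord')
# the logger itself (not only its handler) must allow DEBUG records
invalid_record_logger.setLevel(logging.DEBUG)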
14.10.1.1.4.2. ii. configure the logger output file, logging level and output formatting
inv_ch = logging.FileHandler('vcf2fhir-invalid-record-errors.log')
inv_ch.setLevel(logging.DEBUG)
inv_ch.setFormatter(formatter)
Note: we reuse the formatter created previously.
14.10.1.1.4.3. iii. plug the handler into the logger and execute
invalid_record_logger.addHandler(inv_ch)
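Both loggers are configured with the same three steps, so the pattern can be folded into one helper. This is a minimal sketch using only the standard logging library. The function name attach_file_handler is hypothetical, and the duplicate-handler guard is an addition for notebook use rather than something the vcf2fhir authors prescribe.

```python
import logging


def attach_file_handler(logger_name, log_path, level=logging.DEBUG):
    """Attach a formatted FileHandler to the named logger and return the logger.

    Hypothetical convenience wrapper around the standard logging calls used above.
    """
    logger = logging.getLogger(logger_name)
    logger.setLevel(level)
    handler = logging.FileHandler(log_path)
    handler.setLevel(level)
    handler.setFormatter(
        logging.Formatter('%(asctime)s - %(name)s - %(levelname)s - %(message)s'))
    # Avoid stacking duplicate handlers when a notebook cell is re-run.
    for existing in list(logger.handlers):
        if getattr(existing, 'baseFilename', None) == handler.baseFilename:
            logger.removeHandler(existing)
            existing.close()
    logger.addHandler(handler)
    return logger
```

It would be called once per logger name, for example with 'vcf2fhir.general' and 'vcf2fhir.invalidrecord' and the two log file names used in this notebook. Re-running the cell then replaces the handler instead of writing every record twice.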
14.10.1.1.5. We can now read the log file to check what happened during the conversion from VCF to FHIR JSON.
This is an important quality control step, as the vcf2fhir library is still experimental and under active development.
Therefore, users of the tool need to exercise critical thinking, understand the parsing and conversion rules, and understand the limits of the tool’s operating envelope, as explained by the authors in their manuscript [1].
# 'logfile' avoids shadowing the built-in input()
with open('vcf2fhir-invalid-record-errors.log', 'r') as logfile:
    lines = logfile.readlines()
print(lines[0:10])
["2021-10-22 09:37:17,523 - vcf2fhir.invalidrecord - DEBUG - Reason: VCF INFO.SVTYPE must be in ['INS', 'DEL', 'DUP', 'CNV', 'INV']. Record: Record(CHROM=M, POS=11551, REF=T, ALT=[TN[M:16141[]), considered sample: CallData(GT=1, PS=None)\n", "2021-10-22 09:37:17,524 - vcf2fhir.invalidrecord - DEBUG - Reason: VCF INFO.SVTYPE must be in ['INS', 'DEL', 'DUP', 'CNV', 'INV']. Record: Record(CHROM=M, POS=11562, REF=T, ALT=[TN]11:49883566]]), considered sample: CallData(GT=1)\n", "2021-10-22 09:37:17,525 - vcf2fhir.invalidrecord - DEBUG - Reason: Mitochondrial DNA with GT = 0 or its diploid, Record: Record(CHROM=M, POS=6021, REF=A, ALT=[C]), considered sample: CallData(GT=0|1, PS=60003, DP=15, AD=['12', '3'], CGA_RDP=12)\n", "2021-10-22 09:37:17,526 - vcf2fhir.invalidrecord - DEBUG - Reason: Mitochondrial DNA with GT = 0 or its diploid, Record: Record(CHROM=M, POS=6027, REF=A, ALT=[C]), considered sample: CallData(GT=0|1, PS=60003, DP=17, AD=['13', '4'], CGA_RDP=13)\n", "2021-10-22 09:37:17,526 - vcf2fhir.invalidrecord - DEBUG - Reason: VCF FORMAT.GT is in ['0/0','0|0','0'], Record: Record(CHROM=M, POS=6028, REF=A, ALT=[C]), considered sample: CallData(GT=0, PS=60003, DP=17, AD=['13', '4'], CGA_RDP=13)\n", "2021-10-22 09:37:20,983 - vcf2fhir.invalidrecord - DEBUG - Reason: VCF INFO.SVTYPE must be in ['INS', 'DEL', 'DUP', 'CNV', 'INV']. Record: Record(CHROM=M, POS=11551, REF=T, ALT=[TN[M:16141[]), considered sample: CallData(GT=1, PS=None)\n", "2021-10-22 09:37:20,983 - vcf2fhir.invalidrecord - DEBUG - Reason: VCF INFO.SVTYPE must be in ['INS', 'DEL', 'DUP', 'CNV', 'INV']. 
Record: Record(CHROM=M, POS=11562, REF=T, ALT=[TN]11:49883566]]), considered sample: CallData(GT=1)\n", "2021-10-22 09:37:20,987 - vcf2fhir.invalidrecord - DEBUG - Reason: Mitochondrial DNA with GT = 0 or its diploid, Record: Record(CHROM=M, POS=6021, REF=A, ALT=[C]), considered sample: CallData(GT=0|1, PS=60003, DP=15, AD=['12', '3'], CGA_RDP=12)\n", "2021-10-22 09:37:20,987 - vcf2fhir.invalidrecord - DEBUG - Reason: Mitochondrial DNA with GT = 0 or its diploid, Record: Record(CHROM=M, POS=6027, REF=A, ALT=[C]), considered sample: CallData(GT=0|1, PS=60003, DP=17, AD=['13', '4'], CGA_RDP=13)\n", "2021-10-22 09:37:20,988 - vcf2fhir.invalidrecord - DEBUG - Reason: VCF FORMAT.GT is in ['0/0','0|0','0'], Record: Record(CHROM=M, POS=6028, REF=A, ALT=[C]), considered sample: CallData(GT=0, PS=60003, DP=17, AD=['13', '4'], CGA_RDP=13)\n"]
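Printing raw lines quickly becomes unreadable, so a tally of rejection reasons is often more useful as a quality control summary. The sketch below assumes the message layout seen in the excerpt above, where the reason sits between "Reason: " and a ". Record:" or ", Record:" separator. The function name tally_rejection_reasons is hypothetical, and the pattern may need adjusting if the library changes its message format.

```python
import re
from collections import Counter

# Captures the text between "Reason: " and the ". Record:" / ", Record:" separator
# seen in the log excerpt above.
REASON_PATTERN = re.compile(r"Reason: (?P<reason>.*?)[.,]\s+Record: ")


def tally_rejection_reasons(log_lines):
    """Count how often each rejection reason occurs in an iterable of log lines."""
    counts = Counter()
    for line in log_lines:
        match = REASON_PATTERN.search(line)
        if match:
            counts[match.group("reason")] += 1
    return counts
```

Applied to the lines read above, tally_rejection_reasons(lines).most_common() lists the most frequent causes first, which helps separate expected rejections (such as reference-only genotypes) from records that deserve a closer look.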
14.10.1.1.6. Conclusion
With this notebook, we’ve shown how to convert genetic variation information held in a VCF file (which must comply with v4.1 or later for this conversion to work) into a JSON-based HL7 FHIR Genomics Report message.
14.10.1.1.6.1. Why does this matter and how does it relate to FAIR:
The conversion from VCF to an HL7 FHIR JSON message has to do with the I and R of FAIR, that is, interoperability and reusability.
From a syntactic standpoint, the availability of genetic variation information at a granular level in an easily parseable form (JSON) is a gain for anyone looking at merging this information with other clinical messages.
From a semantic standpoint, the reliance on the LOINC vocabulary to mark up the patterns defined in the HL7 FHIR Genomics Reports enhances interoperation between systems by providing unambiguous annotations.
Finally, as more systems are able to produce FHIR messages from a variety of instruments or data sources, the availability of a FHIR message covering a subset of the genetic variation available from testing facilities makes investigating and mining phenotypic and genotypic relations more straightforward.
However, one needs to remember that the capability afforded by the vcf2fhir library is at an early stage and only supports simple cases. More effort is needed before the functionality reaches a Technology Readiness Level compatible with production systems.
14.10.1.1.7. Reference:
[1] Dolin, R.H., Gothi, S.R., Boxwala, A. et al. vcf2fhir: a utility to convert VCF files into HL7 FHIR format for genomics-EHR integration. BMC Bioinformatics 22, 104 (2021). https://doi.org/10.1186/s12859-021-04039-1
14.10.1.2. Authors
Name | ORCID | Affiliation | Type | ELIXIR Node | Contribution
---|---|---|---|---|---
 | | University of Oxford | | | Writing - Original Draft
 | | University of Oxford | | | Writing - Review & Editing, Funding Acquisition
 | | University of Luxembourg | | | Writing - Review & Editing