SARS-CoV-2-Queries

SARS-CoV-2

Genomes

Perhaps the first question should be which genomes have been sequenced for the SARS-CoV-2 virus:

SPARQL sparql/genomes.rq

SELECT ?genome WHERE {
  wd:Q82069695 wdt:P527/wdt:P6800 ?genome .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en,en". }
}

Which lists these genome URLs:

genome
https://gisaid.org/CoV2020
https://www.ncbi.nlm.nih.gov/assembly/GCF_009858895.2
https://www.ncbi.nlm.nih.gov/genome/86693
https://www.ncbi.nlm.nih.gov/nuccore/1798174254
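
The same query can also be sent from a script. The following is a minimal Python sketch, assuming the public Wikidata Query Service endpoint at https://query.wikidata.org/sparql and the standard SPARQL JSON results format. The helper names (build_request, binding_values, run_query) and the User-Agent string are illustrative assumptions, not part of this book's tooling.

```python
import json
import urllib.parse
import urllib.request

ENDPOINT = "https://query.wikidata.org/sparql"

# The genomes query from above, unchanged.
GENOMES_QUERY = """
SELECT ?genome WHERE {
  wd:Q82069695 wdt:P527/wdt:P6800 ?genome .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en,en". }
}
"""


def build_request(query, endpoint=ENDPOINT):
    """Build a GET request that asks the endpoint for SPARQL JSON results."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    headers = {
        "Accept": "application/sparql-results+json",
        # Hypothetical User-Agent; replace it with your own tool name and contact.
        "User-Agent": "sarscov2-queries-example/0.1 (example@example.org)",
    }
    return urllib.request.Request(url, headers=headers)


def binding_values(results, var):
    """Return the plain values bound to one variable in SPARQL JSON results."""
    return [
        row[var]["value"]
        for row in results["results"]["bindings"]
        if var in row
    ]


def run_query(query):
    """Send the query over the network and return the decoded JSON document."""
    with urllib.request.urlopen(build_request(query)) as response:
        return json.load(response)
```

Calling binding_values(run_query(GENOMES_QUERY), "genome") should then return the genome URLs, network permitting. The request is built separately from the network call so that it can be inspected without contacting the server.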

Variants

Multiple variants of the virus genome made it into the international news. Originally these were known as a Danish variant, a South African variant, and a South England variant. But the variants were only first discovered in those places; they are not caused by anything related to the region. The following variants are listed in Wikidata, together with their PANGO lineage code where one is recorded:

variant | pango
SARS-CoV-2 lineage XBB.1 | XBB.1
SARS-CoV-2 Lineage BA.2.75.2 | BA.2.75.2
SARS-CoV-2 Lineage BQ.1.1 | BQ.1.1
SARS-CoV-2 Lineage BA.1.1 | BA.1.1
SARS-CoV-2 Lineage BA.5.2.1 | BA.5.2.1
SARS-CoV-2 Lineage BA.2.75 | BA.2.75
SARS-CoV-2 lineage XBB.1.5 | XBB.1.5
SARS-CoV-2 Lineage BQ.1 | BQ.1
SARS-CoV-2 lineage XD | XD
SARS-CoV-2 Lineage BA.5.2 | BA.5.2
SARS-CoV-2 lineage XBB | XBB
Cluster 5 |
SARS-CoV-2 Lineage BF.7 | BF.7
SARS-CoV-2 Lineage BA.3 | BA.3
SARS-CoV-2 20C clade |
SARS-CoV-2 Delta plus variant | AY.1
SARS-CoV-2 Lineage R.1 | R.1
Deltacron |
SARS-CoV-2 Lineage AZ.5 | AZ.5
SARS-CoV-2 Lineage B.1.214.2 | B.1.214.2
SARS-CoV-2 Beta variant | B.1.351
SARS-CoV-2 Mu variant | B.1.621
Lineage B.1.427 | B.1.427
SARS-CoV-2 Theta variant | P.3
SARS-CoV-2 Iota variant | B.1.526
SARS-CoV-2 Eta variant | B.1.525
SARS-CoV-2 Lineage AY.4.2 | AY.4.2
SARS-CoV-2 lineage B.1.617 | B.1.617
SARS-CoV-2 Lineage BA.2.12 | BA.2.12
SARS-CoV-2 Zeta variant | P.2
SARS-CoV-2 Kappa variant | B.1.617.1
SARS-CoV-2 Lineage BA.1 | BA.1
Lineage B.1.618 | B.1.618
SARS-CoV-2 lineage XF | XF
SARS-CoV-2 Lineage C.36.3 | C.36.3
SARS-CoV-2 Lineage B.1.617.3 | B.1.617.3
SARS-CoV-2 Lineage B.1.630 | B.1.630
SARS-CoV-2 Lineage B.1.466.2 | B.1.466.2
SARS-CoV-2 Lineage B.1.1.519 | B.1.1.519
SARS-CoV-2 Lineage B.1.1.318 | B.1.1.318
SARS-CoV-2 Lambda variant | C.37
SARS-CoV-2 Lineage BA.5 | BA.5
SARS-CoV-2 Lineage C.1.2 | C.1.2
SARS-CoV-2 lineage B.1.640.2 | B.1.640.2
SARS-CoV-2 Lineage B.1.619 | B.1.619
SARS-CoV-2 Lineage BA.4 | BA.4
SARS-CoV-2 Lineage AY.20 | AY.20
SARS-CoV-2 Lineage BA.2 | BA.2
Q109046536 |
SARS-CoV-2 Delta variant | B.1.617.2
Lineage B.1.620 | B.1.620
SARS-CoV-2 lineage B.1.640 | B.1.640
SARS-CoV-2 Lineage AV.1 | AV.1
Lineage B.1.1.7 with E484K |
SARS-CoV-2 Omicron variant | B.1.1.529
SARS-CoV-2 Alpha variant | B.1.1.7
SARS-CoV-2 lineage XE | XE
Lineage B.1.1.207 | B.1.1.207
SARS-CoV-2 Lineage B.1.429 | B.1.429
SARS-CoV-2 Gamma variant | P.1
SARS-CoV-2 Lineage C.36 | C.36
Lineage B.1.616 | B.1.616
SARS-CoV-2 Lineage BA.2.12.1 | BA.2.12.1

These were found in Wikidata with this query:

SPARQL sparql/sarscov2Variants.rq

SELECT DISTINCT ?variant ?variantLabel ?pango WHERE {
  VALUES ?variantType { wd:Q15304597 wd:Q75913269 }
  { ?variant p:P31 [ ps:P31 ?variantType ; pq:P642 wd:Q82069695 ] . }
  UNION
  { ?variant wdt:P31 wd:Q104450895 }
  OPTIONAL { ?variant wdt:P9632 ?pango }
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en,en". }
}
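
Because ?pango sits inside an OPTIONAL clause, variants without a PANGO code (such as Cluster 5 or Deltacron in the list above) still appear in the results, just with that column left empty. In the standard SPARQL JSON results format, an unbound variable is simply absent from the row. The following Python sketch shows one way to flatten such results; the helper name rows_from_results and the sample data are illustrative assumptions, not real query output.

```python
def rows_from_results(results, variables):
    """Flatten SPARQL JSON results into dicts, using None for unbound variables."""
    rows = []
    for binding in results["results"]["bindings"]:
        rows.append({
            var: binding[var]["value"] if var in binding else None
            for var in variables
        })
    return rows


# Hand-written sample in the SPARQL JSON results format (not real query output).
SAMPLE = {
    "head": {"vars": ["variantLabel", "pango"]},
    "results": {"bindings": [
        {"variantLabel": {"type": "literal", "value": "SARS-CoV-2 Alpha variant"},
         "pango": {"type": "literal", "value": "B.1.1.7"}},
        # No "pango" key here: the OPTIONAL pattern did not match.
        {"variantLabel": {"type": "literal", "value": "Cluster 5"}},
    ]},
}

rows = rows_from_results(SAMPLE, ["variantLabel", "pango"])
without_pango = [row["variantLabel"] for row in rows if row["pango"] is None]
print(without_pango)  # ['Cluster 5']
```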

These variants are all SARS-CoV-2, but each carries a characteristic combination of sequence variants that gives it different properties. For example, VUI–202012/01 (also known as B.1.1.7) has a combination of 17 sequence variants; see this write up. It must be noted that many of these 17 sequence variants are found in other SARS-CoV-2 variants too.

We can list all sequence variants recorded in Wikidata (out of a few thousand!) with this query:

SPARQL sparql/sequenceVariants.rq

SELECT ?variant ?variantLabel ?sequence ?sequenceLabel ?taxon ?taxonLabel WHERE {
  ?variant wdt:P3433 ?sequence .
  ?sequence wdt:P703 / wdt:P171* wd:Q82069695 .
  ?variant  wdt:P703 ?taxon .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}

This gives us this list:

variant | sequence | taxon
C3267T | ORF1a polyprotein;ORF1ab polyprotein | SARS-CoV-2 Alpha variant
N501Y mutation | surface glycoprotein | SARS-CoV-2 Alpha variant
N501Y mutation | surface glycoprotein | SARS-CoV-2 Beta variant
N501Y mutation | surface glycoprotein | SARS-CoV-2 Gamma variant
N501Y mutation | surface glycoprotein | SARS-CoV-2 Theta variant
C5388A | ORF1a polyprotein;ORF1ab polyprotein | SARS-CoV-2 Alpha variant
C27972T | ORF8 protein | SARS-CoV-2 Alpha variant
P681H mutation | surface glycoprotein | SARS-CoV-2 Alpha variant
P681H mutation | surface glycoprotein | SARS-CoV-2 Theta variant
P681H mutation | surface glycoprotein | SARS-CoV-2 Lineage B.1.1.318
P681H mutation | surface glycoprotein | Lineage B.1.1.207
HV 69-70 deletion | spike glycoprotein [SARS-CoV-2] | SARS-CoV-2 Alpha variant
Y144 deletion | spike glycoprotein [SARS-CoV-2] | SARS-CoV-2 Alpha variant
ORF8 ∆382 | ORF8 protein | SARS-CoV-2
D614G mutation | spike glycoprotein [SARS-CoV-2] | SARS-CoV-2
D614G mutation | spike glycoprotein [SARS-CoV-2] | SARS-CoV-2 Alpha variant
D614G mutation | spike glycoprotein [SARS-CoV-2] | SARS-CoV-2 Beta variant
D614G mutation | spike glycoprotein [SARS-CoV-2] | Cluster 5
D614G mutation | spike glycoprotein [SARS-CoV-2] | SARS-CoV-2 Gamma variant
D614G mutation | spike glycoprotein [SARS-CoV-2] | Lineage B.1.427
D614G mutation | spike glycoprotein [SARS-CoV-2] | SARS-CoV-2 Lineage B.1.429
D614G mutation | spike glycoprotein [SARS-CoV-2] | SARS-CoV-2 Eta variant
D614G mutation | spike glycoprotein [SARS-CoV-2] | SARS-CoV-2 Lineage B.1.1.318
T6954C | ORF1a polyprotein;ORF1ab polyprotein | SARS-CoV-2 Alpha variant
11288-11296 deletion | ORF1a polyprotein;ORF1ab polyprotein | SARS-CoV-2 Alpha variant
C23271A | spike glycoprotein [SARS-CoV-2] | SARS-CoV-2 Alpha variant
C23709T | spike glycoprotein [SARS-CoV-2] | SARS-CoV-2 Alpha variant
T24506G | spike glycoprotein [SARS-CoV-2] | SARS-CoV-2 Alpha variant
G24914C | spike glycoprotein [SARS-CoV-2] | SARS-CoV-2 Alpha variant
G28048T | ORF8 protein [SARS-CoV-2] | SARS-CoV-2 Alpha variant
A28111G | ORF8 protein [SARS-CoV-2] | SARS-CoV-2 Alpha variant
C28977T | nucleocapsid protein [SARS-CoV-2] | SARS-CoV-2 Alpha variant
D3L | nucleocapsid protein [SARS-CoV-2] | SARS-CoV-2 Alpha variant
K417N mutation | spike glycoprotein [SARS-CoV-2] | SARS-CoV-2
K417N mutation | spike glycoprotein [SARS-CoV-2] | SARS-CoV-2 Beta variant
E484A mutation | spike glycoprotein [SARS-CoV-2] | SARS-CoV-2
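
The list makes the earlier point concrete: the same sequence variant often shows up in several virus variants. As a rough illustration, the Python sketch below groups a handful of rows copied from the list by sequence variant and keeps those found in more than one taxon. The function names and the min_taxa parameter are hypothetical, chosen only for this example.

```python
from collections import defaultdict

# A few (sequence variant, taxon) rows copied from the list above.
ROWS = [
    ("C3267T", "SARS-CoV-2 Alpha variant"),
    ("N501Y mutation", "SARS-CoV-2 Alpha variant"),
    ("N501Y mutation", "SARS-CoV-2 Beta variant"),
    ("N501Y mutation", "SARS-CoV-2 Gamma variant"),
    ("N501Y mutation", "SARS-CoV-2 Theta variant"),
    ("P681H mutation", "SARS-CoV-2 Alpha variant"),
    ("P681H mutation", "SARS-CoV-2 Theta variant"),
    ("P681H mutation", "SARS-CoV-2 Lineage B.1.1.318"),
    ("P681H mutation", "Lineage B.1.1.207"),
    ("K417N mutation", "SARS-CoV-2 Beta variant"),
]


def taxa_per_variant(rows):
    """Map each sequence variant to the set of taxa it was reported in."""
    grouped = defaultdict(set)
    for variant, taxon in rows:
        grouped[variant].add(taxon)
    return grouped


def shared_variants(rows, min_taxa=2):
    """Keep only the sequence variants found in at least min_taxa taxa."""
    return {
        variant: sorted(taxa)
        for variant, taxa in taxa_per_variant(rows).items()
        if len(taxa) >= min_taxa
    }


for variant, taxa in shared_variants(ROWS).items():
    print(variant, "->", ", ".join(taxa))
```

With these rows, only N501Y and P681H are reported as shared; C3267T and K417N each occur once in the sample.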

Each sequence variant is a change in a gene encoded by the viral RNA and may cause a change in the protein encoded by that gene. The following two sections list all genes and proteins. An interesting online book is available under the title A sequence alignment and analysis of SARS-CoV-2 spike glycoprotein [1].

Genes

The RNA of SARS-CoV-2 has been sequenced, so the open reading frames are known and identified. We can query for the gene information in Wikidata with this query:

SPARQL sparql/virusGenes.rq

SELECT ?gene ?geneLabel ?ncbigene WHERE {
  ?gene wdt:P703 wd:Q82069695 ; wdt:P31 wd:Q7187 .
  OPTIONAL { ?gene wdt:P351 ?ncbigene }
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en,en". }
}

Which gives us these genes:

gene | ncbigene
surface glycoprotein | 43740568
ORF1a polyprotein;ORF1ab polyprotein | 43740578
ORF3a protein-encoding gene | 43740569
envelope protein | 43740570
membrane glycoprotein | 43740571
ORF6 protein | 43740572
ORF7a protein | 43740573
ORF7b | 43740574
ORF8 protein | 43740577
nucleocapsid phosphoprotein | 43740575
ORF10 protein | 43740576
Record to support submission of GeneRIFs for a gene not in Gene (COVID-19 virus; HCoV-19; Human coronavirus 2019; SARS-2; SARS-CoV2; SARS2). | 43562271
ORF3d |

Proteins

Alternatively, we may be interested in the proteins of the coronavirus. We can get those with this query:

SPARQL sparql/virusProteins.rq

SELECT ?protein ?proteinLabel ?short ?refseq ?uniprot ?guideToPharma WHERE {
  ?protein wdt:P703 wd:Q82069695 ; wdt:P31 wd:Q8054 .
  OPTIONAL { ?protein wdt:P637 ?refseq }
  OPTIONAL { ?protein wdt:P352 ?uniprot }
  OPTIONAL { ?protein wdt:P5458 ?guideToPharma }
  OPTIONAL { ?protein wdt:P1813 ?short }
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en,en". }
} ORDER BY ASC(?protein) ASC(?uniprot)

Which gives us these proteins:

protein | short | refseq | uniprot | guideToPharma
putative protein ORF3c | | | P0DTG1 |
Non-structural protein 11 [SARS CoV-2] | nsp11 | YP_009725312 | P0DTC1-PRO_0000449645 |
ORF3b protein [SARS-CoV-2] | | | P0DTF1 |
ORF9c protein [SARS CoV-2] | | | P0DTD3 |
S1 Subunit of Spike Protein | S1 | | P0DTC2-PRO_0000449647 |
S2 Subunit of Spike Protein | | | P0DTC2-PRO_0000449648 |
S2' Subunit of Spike Protein | | | P0DTC2-PRO_0000449649 |
Non-structural protein 10 [SARS CoV-2] | nsp10 | YP_009725306 | P0DTC1-PRO_0000449644 |
Non-structural protein 10 [SARS CoV-2] | nsp10 | YP_009725306 | P0DTD1-PRO_0000449628 |
non-structural protein 15 [SARS-CoV-2] | nsp15 | YP_009725310 | P0DTD1-PRO_0000449632 |
non-structural protein 16 [SARS-CoV-2] | nsp16 | YP_009725311 | P0DTD1-PRO_0000449633 |
Papain-like proteinase [SARS-CoV-2] | nsp3 | YP_009725299.1 | P0DTC1-PRO_0000449637 |
Papain-like proteinase [SARS-CoV-2] | nsp3 | YP_009725299.1 | P0DTD1-PRO_0000449621 |
non-structural protein 5 [SARS-CoV-2] | nsp5 | YP_009725301 | P0DTC1-PRO_0000449639 | 3111
non-structural protein 5 [SARS-CoV-2] | nsp5 | YP_009725301 | P0DTD1-PRO_0000449623 | 3111
nucleocapsid protein [SARS-CoV-2] | N | YP_009724397.2 | P0DTC9 | 3121
spike glycoprotein [SARS-CoV-2] | S | YP_009724390.1 | P0DTC2 | 3114
orf1ab polyprotein [SARS-Cov 2] | | YP_009724389.1 | P0DTD1 | 3125
Viroporin 3a [SARS-CoV-2] | | YP_009724391.1 | P0DTC3 | 3115
envelope protein [SARS-CoV-2] | E | YP_009724392.1 | P0DTC4 | 3116
membrane protein [SARS-CoV-2] | M | YP_009724393.1 | P0DTC5 | 3117
non-structural protein 6 [SARS-CoV-2] | nsp6 | | P0DTC1-PRO_0000449640 | 3118
non-structural protein 6 [SARS-CoV-2] | nsp6 | | P0DTD1-PRO_0000449624 | 3118
ORF1a polyprotein | | YP_009725295.1 | P0DTC1 | 3124
Protein 7a [SARS-CoV-2] | | YP_009724395.1 | P0DTC7 | 3119
Protein non-structural 7b [SARS-CoV-2] | | YP_009725318.1 | P0DTD8 | 3123
ORF8 protein [SARS-CoV-2] | | YP_009724396.1 | P0DTC8 | 3120
Non-structural protein 2 [SARS CoV-2] | nsp2 | | P0DTC1-PRO_0000449636 |
Non-structural protein 2 [SARS CoV-2] | nsp2 | | P0DTD1-PRO_0000449620 |
ORF6 protein [SARS-CoV-2] | | YP_009724394.1 | P0DTC6 |
ORF10 protein [SARS-CoV-2] | | YP_009725255.1 | A0A663DJA2 |
Protein ORF9b [SARS-CoV-2] | | | P0DTD2 | 3122
Non-structural protein 9 [SARS-CoV-2] | nsp9 | | P0DTC1-PRO_0000449643 |
Non-structural protein 9 [SARS-CoV-2] | nsp9 | | P0DTD1-PRO_0000449627 |
Host translation inhibitor nsp1 [SARS-CoV-2] | nsp1 | | P0DTC1-PRO_0000449635 |
Host translation inhibitor nsp1 [SARS-CoV-2] | nsp1 | | P0DTD1-PRO_0000449619 |
Non-structural protein 4 [SARS-CoV-2] | nsp4 | | P0DTC1-PRO_0000449638 |
Non-structural protein 4 [SARS-CoV-2] | nsp4 | | P0DTD1-PRO_0000449622 |
Non-structural protein 7 [SARS-CoV-2] | nsp7 | | P0DTC1-PRO_0000449641 |
Non-structural protein 7 [SARS-CoV-2] | nsp7 | | P0DTD1-PRO_0000449625 |
RNA-directed RNA polymerase [SARS-CoV-2] | nsp12 | YP_009725307 | P0DTD1-PRO_0000449629 |
Non-structural protein 14 [SARS-CoV-2] | nsp14 | YP_009725309 | P0DTD1-PRO_0000449631 |
Helicase [SARS-CoV-2] | nsp13 | YP_009725308 | P0DTD1-PRO_0000449630 |
Non-structural protein nsp8 [SARS-CoV-2] | nsp8 | | P0DTC1-PRO_0000449642 |
Non-structural protein nsp8 [SARS-CoV-2] | nsp8 | | P0DTD1-PRO_0000449626 |
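
Several proteins appear twice in this list because they carry two uniprot values, one starting with P0DTC1 (the ORF1a polyprotein above) and one starting with P0DTD1 (the orf1ab polyprotein). These values look like a UniProt accession followed by a chain identifier. Assuming that reading is right, the Python sketch below splits the identifiers and reports which polyprotein entries each short name points into. The function names are hypothetical and the rows are a small sample copied from the list.

```python
from collections import defaultdict

# Accessions of the two polyproteins, as they appear in the list above.
POLYPROTEINS = {
    "P0DTC1": "ORF1a polyprotein",
    "P0DTD1": "orf1ab polyprotein",
}

# A few (short name, uniprot) pairs copied from the list above.
ROWS = [
    ("nsp5", "P0DTC1-PRO_0000449639"),
    ("nsp5", "P0DTD1-PRO_0000449623"),
    ("nsp12", "P0DTD1-PRO_0000449629"),
    ("S", "P0DTC2"),
]


def split_uniprot_id(identifier):
    """Split 'ACCESSION-CHAIN' into its parts; chain is None if there is no '-'."""
    accession, separator, chain = identifier.partition("-")
    return accession, (chain if separator else None)


def parent_polyproteins(rows):
    """Map each short name to the polyprotein entries its chain IDs refer to."""
    parents = defaultdict(list)
    for short, identifier in rows:
        accession, chain = split_uniprot_id(identifier)
        if chain is not None and accession in POLYPROTEINS:
            parents[short].append(POLYPROTEINS[accession])
    return dict(parents)


print(parent_polyproteins(ROWS))
```

In this sample, nsp5 maps to both polyproteins, nsp12 only to orf1ab, and the spike protein S is skipped because its identifier has no chain part.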

Protein complexes

Thanks to work done by a team at the online BioHackathon in April 2020, macromolecular structures from the Complex Portal [2,3] have been entering Wikidata:

SPARQL sparql/complexes.rq

SELECT ?cpx ?complex ?complexLabel WHERE {
  ?complex wdt:P7718 ?cpx ;
           wdt:P703 wd:Q82069695 .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en,en". }
}

Listing these complexes:

cpx | complex
CPX-5687 | SARS-CoV-2 NSP9 complex
CPX-5683 | SARS-CoV-2 Spike - human ACE2 receptor complex
CPX-5684 | SARS-CoV-2 Spike - human ACE2-SLC6A19 complex
CPX-5682 | SARS-CoV-2 cleaved Spike protein complex
CPX-5685 | SARS-CoV-2 main protease complex
CPX-5686 | SARS-CoV-2 nucleocapsid complex
CPX-5688 | SARS-CoV-2 NSP10-NSP16 2'-O-methyltransferase complex
CPX-5689 | SARS-CoV-2 NSP15 complex
CPX-5690 | SARS-CoV-2 primase complex
CPX-5691 | SARS-CoV-2 NSP3-NSP4-NSP6 complex
CPX-5692 | SARS-CoV-2 3'-5' exoribonuclease proof-reading complex
CPX-5742 | SARS-CoV-2 polymerase complex
CPX-6098 | SARS-CoV-2 3a complex
CPX-6100 | SARS-CoV-2 9b complex
CPX-6147 | SARS-CoV-2 ORF8 complex
CPX-6442 | SARS-CoV-2 replication and transcription complex
CPX-6761 | SARS-CoV-2 Spike - human CLEC4M lectin complex
CPX-7041 | SARS-CoV-2 Cap(0)-replication and transcription complex
CPX-7042 | SARS-CoV-2 uncleaved Spike protein complex
CPX-7043 | SARS-CoV-2 post-fusion S2 Spike complex
CPX-7083 | SARS-CoV-2 dimeric Cap(0)-replication and transcription complex

PDB structures

For the proteins, we can then query for the PDB structures [4]:

SPARQL sparql/virusProteinsPDB.rq

SELECT ?protein ?proteinLabel ?refseq ?uniprot ?pdb WHERE {
  ?protein wdt:P703 wd:Q82069695 ; wdt:P31 wd:Q8054 .
  ?protein wdt:P638 ?pdb .
  OPTIONAL { ?protein wdt:P637 ?refseq }
  OPTIONAL { ?protein wdt:P352 ?uniprot }
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en,en". }
}

The full list can be found on the linked sparql/virusProteinsPDB.rq page. It has become quite long, so here we just visualize the number of PDB entries per protein as a bubble chart, which was created with this query:

SPARQL sparql/virusProteinsPDBBubbleChart.rq

#defaultView:BubbleChart
SELECT ?protein ?proteinLabel (COUNT(?pdb) AS ?count) WHERE {
  ?protein wdt:P703 wd:Q82069695 ; wdt:P31 wd:Q8054 .
  ?protein wdt:P638 ?pdb .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en,en". }
} GROUP BY ?protein ?proteinLabel
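
The COUNT(?pdb) with GROUP BY in this query collapses the long protein and PDB list into one number per protein. For readers less familiar with SPARQL aggregation, the same step can be sketched in Python as follows. The PDB identifiers below are made-up placeholders, not real entries, and the function name is hypothetical.

```python
from collections import Counter


def count_pdb_per_protein(rows):
    """Mimic COUNT(?pdb) ... GROUP BY ?proteinLabel on (protein, pdb) pairs."""
    return Counter(protein for protein, _pdb in rows)


# Placeholder (protein label, PDB ID) pairs; the IDs are invented for illustration.
ROWS = [
    ("spike glycoprotein [SARS-CoV-2]", "pdb-placeholder-1"),
    ("spike glycoprotein [SARS-CoV-2]", "pdb-placeholder-2"),
    ("spike glycoprotein [SARS-CoV-2]", "pdb-placeholder-3"),
    ("non-structural protein 5 [SARS-CoV-2]", "pdb-placeholder-4"),
    ("non-structural protein 5 [SARS-CoV-2]", "pdb-placeholder-5"),
    ("nucleocapsid protein [SARS-CoV-2]", "pdb-placeholder-6"),
]

counts = count_pdb_per_protein(ROWS)
for protein, count in counts.most_common():
    print(count, protein)
```

Each bubble in the chart then corresponds to one protein, sized by its count.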

References

  1. Sgro J-Y. A sequence alignment and analysis of SARS-CoV-2 spike glycoprotein [Internet]. 2020. Available from: https://static-bcrf.biochem.wisc.edu/tutorials/COVID19/spikealignment/book/
  2. Meldal BHM, Bye-A-Jee H, Gajdoš L, Hammerová Z, Horácková A, Melicher F, et al. Complex Portal 2018: extended content and enhanced visualization tools for macromolecular complexes. NAR. 2019 Jan 1;47(D1):D550–8. doi:10.1093/NAR/GKY1001
  3. Meldal BHM, Perfetto L, Combe C, Lubiana T, Cavalcante JVF, Bye-A-Jee H, et al. Complex Portal 2022: new curation frontiers. NAR. 2021 Oct 29; doi:10.1093/NAR/GKAB991
  4. Burley SK, Berman HM, Kleywegt G, Markley JL, Nakamura H, Velankar S. Protein Data Bank (PDB): The Single Global Macromolecular Structure Archive. Methods Mol Biol. 2017 Jan 1;1607:627–41. doi:10.1007/978-1-4939-7000-1_26