#metawinterschool contribution by BiGCaT
This workshop consists of two parts: identifier mapping (BridgeDb) and pathway analysis (WikiPathways and PathVisio).
BridgeDb is a project, an application programming interface (API), a Java library, a webservice, a collection of identifier mapping databases, and more (see doi:10.1186/1471-2105-11-5).
In this practical you will use BridgeDb to map identifiers. Note that the R package requires a working rJava package, which can be hard to set up.
First, install the BridgeDbR package. This can be done with Bioconductor:
install.packages("BiocManager") # if not present already
BiocManager::install("BridgeDbR")
Alternatively, install the package directly from GitHub:
install.packages("rJava") # if not present already
install.packages("RCurl") # if not present already
install.packages("devtools") # if not present already
library(devtools)
install_github("bridgedb/BridgeDbR")
library(BridgeDbR)
A third alternative is the Docker container at https://hub.docker.com/r/bioconductor/devel_metabolomics2/, which has all packages in the Bioconductor Metabolomics BiocView pre-installed. From next week on, it should also contain BridgeDbR.
docker run -it bioconductor/devel_metabolomics2 bash
Alternatively, use the RStudio Server built into that Docker container; see http://bioconductor.org/help/docker/
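Because BridgeDbR depends on rJava, it can help to check Java on its own before debugging BridgeDbR itself. A minimal sketch using only rJava's `.jinit()` and `.jcall()`; the variable name `rjava_ok` is just illustrative:

```r
# Does rJava load, and can it start a Java virtual machine?
rjava_ok <- requireNamespace("rJava", quietly = TRUE) &&
  isTRUE(tryCatch({
    rJava::.jinit()  # start (or attach to) the JVM
    TRUE
  }, error = function(e) FALSE))

if (rjava_ok) {
  # Ask Java for its version via the static System.getProperty() method
  message("Java version: ",
          rJava::.jcall("java/lang/System", "S", "getProperty", "java.version"))
} else {
  message("rJava is not working; fix this before installing BridgeDbR")
}
```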
Besides a toolkit that can do identifier mapping, we also need the data that defines the mappings. The latest metabolite ID mapping database can be downloaded from Figshare.
The vignette on Bioconductor gives a short walkthrough of how to use the R package.
This identifier mapping database can then be loaded into R with the following (this assumes the file is in the current working directory):
mbmaps = loadDatabase("metabolites_20180201.bridge")
With the mapping data loaded, a single metabolite identifier (CHEBI:55) can be mapped to other identifiers with the following code:
map(mbmaps, "Ce", "CHEBI:55")
If you have a data matrix (like macs_glucose_challenge.tsv), a helper function is useful, for example to handle identifiers that map to more than one target identifier:
helper = function(x) {
  # map the HMDB ID ("Ch") to Wikidata ("Wd"); this can return several IDs
  mappings = map(mbmaps, "Ch", x, "Wd")
  # keep only the first mapping (for a single mapping, that is the mapping itself)
  return(mappings[1])
}
This function takes an HMDB identifier (system code "Ch") and returns the corresponding Wikidata identifier (system code "Wd"). See the BridgeDb documentation for the system codes of the other databases you can map to.
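The helper keeps only the first mapping and drops the rest. If you would rather keep every mapping, a sketch like the one below joins them with ";" and returns NA when there is none. So that it runs without the database file, `toy_map()` is a hypothetical stand-in for `map(mbmaps, "Ch", x, "Wd")`, and its identifiers are placeholders, not real HMDB or Wikidata mappings.

```r
# Hypothetical stand-in for map(mbmaps, "Ch", x, "Wd"): a toy lookup table
# with placeholder identifiers (not real HMDB or Wikidata mappings).
toy_table <- list(
  HMDB_A = c("Q_a1", "Q_a2"),  # one source ID with two mappings
  HMDB_B = "Q_b1",             # exactly one mapping
  HMDB_C = character(0)        # no mapping
)
toy_map <- function(x) {
  if (x %in% names(toy_table)) toy_table[[x]] else character(0)
}

# Alternative helper: keep all mappings joined by ";", and NA when none.
helper_all <- function(x) {
  mappings <- toy_map(x)
  if (length(mappings) == 0) {
    return(NA_character_)
  }
  paste(sort(unique(mappings)), collapse = ";")
}

helper_all("HMDB_A")  # "Q_a1;Q_a2"
helper_all("HMDB_C")  # NA
```

In the workshop setting you would replace `toy_map(x)` with the `map()` call from the helper above. Whether a joined string or only the first mapping is more useful depends on what you do with the column afterwards.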
You can load the data file mentioned above with:
data <- read.table("macs_glucose_challenge.tsv", sep='\t', header=TRUE)
The second column has the HMDB identifiers:
data[,2]
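Before mapping, it can be worth checking that the second column really contains HMDB-style identifiers, since empty or malformed entries will not map. A self-contained sketch on a small hypothetical file (its columns and values are made up for illustration, not taken from macs_glucose_challenge.tsv); the pattern `^HMDB[0-9]+$` is an assumption about the identifier format:

```r
# Toy stand-in for the real data file (hypothetical columns and values),
# written to a temporary file so the example runs on its own.
toy_file <- tempfile(fileext = ".tsv")
writeLines(c("name\thmdb\tvalue",
             "metabolite 1\tHMDB0000001\t1.5",
             "metabolite 2\tHMDB00002\t0.7",
             "metabolite 3\t\t2.1"), toy_file)

# stringsAsFactors = FALSE keeps the identifiers as plain character strings
data <- read.table(toy_file, sep = "\t", header = TRUE,
                   stringsAsFactors = FALSE)

ids <- data[, 2]
# Assumed pattern: "HMDB" followed by digits (matches both 5- and 7-digit forms)
looks_like_hmdb <- grepl("^HMDB[0-9]+$", ids)
ids[!looks_like_hmdb]  # entries to inspect before mapping
```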
We can convert all these identifiers with our helper function and sapply(); unlist() then turns the result into a character vector that is added to the data as a new first column:
wikidata = unlist(sapply(as.character(data[,2]), helper))
data2 = cbind(c(wikidata,""),data)
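If the helper ever returns nothing for an identifier, unlist() drops that entry and the new column no longer lines up with the rows of the data. One way to avoid this, sketched below, is a helper that always returns exactly one value (NA when there is no mapping) combined with vapply(), which enforces one result per row. The toy table, `toy_map()` and `first_or_na()` are hypothetical stand-ins with placeholder identifiers, not the workshop data or the BridgeDbR API.

```r
# Hypothetical stand-in for map(mbmaps, "Ch", x, "Wd") with placeholder IDs;
# in the workshop you would call the real map() here instead.
toy_table <- list(HMDB_A = c("Q_a1", "Q_a2"), HMDB_B = "Q_b1")
toy_map <- function(x) {
  if (x %in% names(toy_table)) toy_table[[x]] else character(0)
}

# First mapping, or NA when there is none: always exactly one value per input
first_or_na <- function(x) {
  mappings <- toy_map(x)
  if (length(mappings) == 0) NA_character_ else mappings[1]
}

# Toy data frame (hypothetical); the second column holds the source IDs
data <- data.frame(name = c("m1", "m2", "m3"),
                   hmdb = c("HMDB_A", "", "HMDB_B"),
                   stringsAsFactors = FALSE)

# vapply() requires a character(1) result for every row, so the new column
# always has nrow(data) entries and no padding is needed
wikidata <- vapply(data[, 2], first_or_na, FUN.VALUE = character(1),
                   USE.NAMES = FALSE)
data2 <- data.frame(wikidata = wikidata, data, stringsAsFactors = FALSE)
data2
```

A side benefit of vapply() is that it fails loudly if the helper returns something other than a single string, instead of silently producing a shorter or nested result.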
You can check that the mapping worked, for example by looking at the first rows and by mapping a Wikidata identifier back to HMDB:
data2[1:4,1:5]
map(mbmaps, "Wd", "Q27075135", "Ch")
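Spot checks like these can be extended into a round-trip check over all rows: map each source identifier forward, map the result back, and list the identifiers that are not recovered. A sketch in which two hypothetical lookup tables with placeholder identifiers stand in for the two map() calls:

```r
# Hypothetical forward (HMDB -> Wikidata) and reverse (Wikidata -> HMDB)
# lookups with placeholder IDs; in practice both would come from map().
forward <- c(HMDB_A = "Q_a1", HMDB_B = "Q_b1", HMDB_C = "Q_c1")
reverse <- list(Q_a1 = "HMDB_A",
                Q_b1 = c("HMDB_B", "HMDB_X"),  # reverse mapping may be one-to-many
                Q_c1 = character(0))           # no way back

# A source ID passes if mapping its target back recovers the original ID
round_trip_ok <- function(src) {
  target <- forward[[src]]
  src %in% reverse[[target]]
}

ok <- vapply(names(forward), round_trip_ok, logical(1))
names(ok)[!ok]  # source IDs whose mapping does not come back
```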
Pathway analysis tries to find the pathways in which the measured biology changes, for example pathways that contain more significantly changed metabolites than expected by chance.
WikiPathways (see its Scholia page and the user feedback with use cases) is a community project to develop an open knowledge base of biological pathways. Over the years, many communities have collaborated via the website's portals. The colorful WikiPathways Academy teaches you the ins and outs of WikiPathways.
PathVisio is one of the tools that can be used to explore pathways, map experimental data onto pathways, and do pathway enrichment (see doi:10.1371/journal.pcbi.1004085).
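To make "pathway enrichment" concrete, the sketch below scores a single pathway from four counts: N measured metabolites in total, R of them significantly changed, n measured metabolites in the pathway, and r of those changed. It computes the Z-score used by GenMAPP's MAPPFinder, which to our understanding is also what PathVisio's statistics report, and, as a common alternative that is not necessarily what PathVisio computes, a one-sided hypergeometric p-value. The function name `pathway_stats` and the example counts are hypothetical.

```r
# Enrichment of changed metabolites in one pathway (a generic sketch).
# N: measured overall, R: changed overall,
# n: measured in the pathway, r: changed in the pathway.
pathway_stats <- function(r, n, R, N) {
  # MAPPFinder-style Z-score: observed minus expected, scaled by the
  # standard deviation of a hypergeometric count
  expected <- n * R / N
  z <- (r - expected) /
    sqrt(n * (R / N) * (1 - R / N) * (1 - (n - 1) / (N - 1)))
  # One-sided hypergeometric p-value P(X >= r): draw n of N, R are "changed"
  p <- phyper(r - 1, R, N - R, n, lower.tail = FALSE)
  c(z = z, p = p)
}

# Hypothetical counts, not from the workshop data
res <- pathway_stats(r = 5, n = 10, R = 20, N = 200)
res
```

The hypergeometric p-value is the same as the one from a one-sided Fisher's exact test on the 2x2 table of (in pathway / not) by (changed / not).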
PathVisio has tutorials, but these are often aimed at transcriptomics data. Metabolomics data, however, has essentially the same structure: experimental values linked to gene, protein, or metabolite identifiers.
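Concretely, that structure is a tab-delimited table with one identifier column and one or more value columns. The sketch below writes mapped data in that shape, with an extra column holding the BridgeDb system code ("Wd" for Wikidata, as used above). As far as we know, PathVisio's data import lets you choose which column has the identifiers and which has the system code, but check its tutorials for the exact options. The column names and values here are hypothetical.

```r
# Toy mapped data (hypothetical values and placeholder IDs)
data2 <- data.frame(wikidata = c("Q_a1", NA, "Q_b1"),
                    log2fc = c(1.2, -0.4, 0.3),
                    stringsAsFactors = FALSE)

# Keep rows that have an identifier, and add a system code column so an
# import tool can be told which database the IDs come from
mapped <- data2[!is.na(data2$wikidata), ]
export <- data.frame(id = mapped$wikidata, syscode = "Wd",
                     log2fc = mapped$log2fc, stringsAsFactors = FALSE)

# Plain tab-delimited text: no quotes, no row names
out_file <- tempfile(fileext = ".txt")
write.table(export, out_file, sep = "\t", quote = FALSE, row.names = FALSE)
readLines(out_file)
```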
Please look through these general tutorials so that you know what information they contain; they may help if the tutorial below is not entirely clear. After that, proceed to today's metabolomics practical.
Copyright (C) 2018 Egon Willighagen, Creative Commons Attribution 4.0 International