metawinterschool-bigcat

#metawinterschool contribution by BiGCaT

This workshop consists of two parts: identifier mapping (BridgeDb) and pathway analysis (WikiPathways and PathVisio).

Identifier Mapping

BridgeDb

BridgeDb is a project, an application programming interface (API), a Java library, a webservice, a collection of identifier mapping databases, and more (see doi:10.1186/1471-2105-11-5).

In this practical you will use BridgeDb from R, via the BridgeDbR package, to map identifiers. Note that the R package requires a working rJava package, which is sometimes hard to get working.

Installation

First, install the BridgeDbR package. This can be done with Bioconductor:

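# biocLite() was retired with Bioconductor 3.8; BiocManager replaces it
if (!requireNamespace("BiocManager", quietly = TRUE))
  install.packages("BiocManager")
BiocManager::install("BridgeDbR")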

Alternatively, install the package directly from GitHub:

install.packages("rJava") # if not present already
install.packages("RCurl") # if not present already
install.packages("devtools") # if not present already
library(devtools)
install_github("bridgedb/BridgeDbR")
library(BridgeDbR)

As another alternative, use the Docker container from https://hub.docker.com/r/bioconductor/devel_metabolomics2/ which has all packages in the BioC View Metabolomics pre-installed. From next week on, that should also contain bridgedb.

docker run -it bioconductor/devel_metabolomics2 bash

or with the built-in rstudio-server in that docker container, see http://bioconductor.org/help/docker/
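
Whichever installation route you choose, it can help to confirm that rJava works before loading BridgeDbR. The following is a minimal sketch, not part of BridgeDbR: the helper name `rjava_ok` is made up for this example, and it only assumes that `rJava::.jinit()` raises an error when no Java virtual machine can be started.

```r
# Hypothetical helper (not part of BridgeDbR or rJava):
# returns TRUE if rJava is installed and a JVM can be started, FALSE otherwise.
rjava_ok <- function() {
  # requireNamespace() returns FALSE if the package is missing or fails to load
  if (!requireNamespace("rJava", quietly = TRUE)) {
    return(FALSE)
  }
  # .jinit() starts the JVM; treat any error as "not working"
  tryCatch({
    rJava::.jinit()
    TRUE
  }, error = function(e) FALSE)
}

rjava_ok()
```

If this returns FALSE, fix the Java/rJava setup first (or use the Docker container) before continuing.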

Identifier mapping data

Besides a toolkit that can perform identifier mapping, we also need the mapping data itself. The latest metabolite ID mapping database can be downloaded from Figshare.

Browsing the documentation

The vignette on Bioconductor is a short walkthrough of how to use the R package.

Mapping identifiers

This identifier database can then be loaded into R with (it assumes the file is located in the current working folder):

mbmaps = loadDatabase("metabolites_20180201.bridge")
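
Because the call above assumes the file is in the current working folder, a small check can give a clearer error message when it is not. This is only a sketch: `check_db_file` is a hypothetical helper written for this practical, not a BridgeDbR function, and the `loadDatabase` call is left commented out because it needs the downloaded file.

```r
# Hypothetical helper: stop with a readable message if the mapping
# database is missing, otherwise return its full path.
check_db_file <- function(path) {
  if (!file.exists(path)) {
    stop("Mapping database not found: ", path,
         " (current working folder: ", getwd(), ")")
  }
  normalizePath(path)
}

# Usage, once the file has been downloaded:
# mbmaps = loadDatabase(check_db_file("metabolites_20180201.bridge"))
```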

With the mapping data loaded, a single metabolite identifier (CHEBI:55) can be mapped to other identifiers with the following code:

map(mbmaps, "Ce", "CHEBI:55")

If you have a data matrix (like macs_glucose_challenge.tsv), then a helper function is useful, for example, to handle identifiers that map to multiple targets:

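helper = function(x) {
  # map one HMDB identifier ("Ch") to its Wikidata identifier(s) ("Wd")
  mappings = map(mbmaps, "Ch", x, "Wd")
  if (length(mappings) == 1) {
    result = mappings
  } else {
    # zero or several mappings: keep only the first element
    result = mappings[1]
  }
  return(result)
}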

This function takes an HMDB identifier (“Ch”) and returns the corresponding Wikidata identifier (“Wd”); if several are found, only the first is kept. Please check the BridgeDb documentation to find the other databases to which you can map.
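
To see the "keep the first mapping" logic without the mapping database, here is a self-contained sketch. Everything in it is made up for illustration: `fake_table`, `fake_map`, and `pick_first` are hypothetical names, the identifiers are not real, and `fake_map` merely stands in for the `map(mbmaps, "Ch", x, "Wd")` call, assuming that call returns a character vector of zero or more identifiers.

```r
# Made-up lookup table standing in for the BridgeDb mapping data
fake_table <- list(
  HMDB_A = c("Q_1"),          # exactly one mapping
  HMDB_B = c("Q_2", "Q_3"),   # two mappings
  HMDB_C = character(0)       # known identifier, no mapping
)

# Hypothetical stand-in for map(mbmaps, "Ch", x, "Wd")
fake_map <- function(x) {
  hits <- fake_table[[x]]
  if (is.null(hits)) character(0) else hits
}

# Keep the first mapping; use NA when there is none
pick_first <- function(mappings) {
  if (length(mappings) == 0) NA_character_ else mappings[1]
}

ids <- c("HMDB_A", "HMDB_B", "HMDB_C", "HMDB_D")
# vapply() guarantees exactly one character value per input identifier
result <- vapply(ids, function(x) pick_first(fake_map(x)), character(1))
result
```

Returning an explicit NA for unmapped identifiers keeps the result the same length as the input, which matters when the values are later added as a column.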

You can load that aforementioned data file with:

data <- read.table("macs_glucose_challenge.tsv", sep='\t', header=TRUE)

The second column has the HMDB identifiers:

data[,2]

We can convert all these identifiers with our helper function and the sapply function (and some R magic):

wikidata = unlist(sapply(as.character(data[,2]), helper))
data2 = cbind(c(wikidata,""),data)
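
Adding a column with cbind only gives a sensible result when the new vector has one entry per row. The sketch below shows one way to check this on a toy table; the table, the `lookup` vector, and all identifiers are invented for illustration and do not come from macs_glucose_challenge.tsv or the BridgeDb data.

```r
# Toy table standing in for the real data file (made-up values)
toy <- data.frame(
  name = c("a", "b", "c"),
  hmdb = c("HMDB_A", "HMDB_B", "HMDB_C"),
  stringsAsFactors = FALSE
)

# Hypothetical mapping results; HMDB_B has no mapping
lookup <- c(HMDB_A = "Q_1", HMDB_C = "Q_3")

# Indexing by name yields NA for identifiers without a mapping,
# so the result always has one entry per row
mapped <- unname(lookup[toy$hmdb])

# Fail early if the lengths do not match
stopifnot(length(mapped) == nrow(toy))

# Put the mapped identifiers in a named first column
toy2 <- cbind(wikidata = mapped, toy)
toy2
```

The explicit length check makes any mismatch visible immediately instead of silently shifting rows.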

You can verify that everything went fine, for example with:

data2[1:4,1:5]
map(mbmaps, "Wd", "Q27075135", "Ch")

Pathway Analysis

Pathway analysis tries to identify the pathways in which the most interesting biological changes take place.

WikiPathways

WikiPathways (see this Scholia page and this feedback with use cases) is a community project to develop an Open knowledge database of biological pathways. Over the years many communities have collaborated via the website’s portals. There is a colorful academy where you can learn about the ins and outs of WikiPathways.

PathVisio

PathVisio is one of the tools that can be used to explore pathways, map experimental data onto pathways, and do pathway enrichment (see doi:10.1371/journal.pcbi.1004085).

The PathVisio project provides tutorials, but these are often aimed at transcriptomics data. However, metabolomics data has basically the same structure: in both cases, experimental data is linked to gene, protein, or metabolite identifiers.

Please take note of these general tutorials so that you know what information you can find there. They may be helpful if the tutorial below is not entirely clear. After that, please proceed to the metabolomics practical for today.