What the pipeline does?
The pipeline will help you find and visualize significant Methylation Susceptible Loci (MSL) between the groups using 3 SIMPLE STEPS:
msap
package.find.sig.loci( )
that will take the output of msap
and return the significant loci. The function is based on multiple fisher exact tests followed by p-value adjustment using FDR.What do you need?
You will need to install the package msap
and prepare the data format that it needs. That is all you need for this pipeline.
Getting Started
Lets run the three steps for the pipeline. Before starting with the following command, copy and run the function sourcecode provided in the last section.
Use the msap
package and the input data to find the MSL.
library(msap)
msap.out=msap("your_data.csv",name = "any_name")
pass the output to the custom function find.sig.loci( )
(more info about the function in subsequent sections)
my.out=find.sig.loci(msap.out)
## analysing....
## Total loci: 153
## Total samples: 41
## Total number of groups: 2
## replacing types with numbers suitable for fisher test and heatmap..
## Performing multiple fisher tests
## Calculating FDR using benjamini and hochberg
## Warning in p.adjust(pval, method = "hochberg"): NAs introduced by coercion
## filtering significant loci
## 25 loci are found significant.
Note: The warning during coercion to NA is produced when the input data has only one variable and canโt be used for fisher test. The cistome function returns a na for such cases.
and now the output my.out
is a list of five elements:
names(my.out)
## [1] "heatmap.matrix" "Significant.Loci" "All.loci.results"
## [4] "complete.data" "group.cont"
where, *heatmap.matrix
is a output matrix of numbers significant loci suitable for creating the heatmap.
*significant.loci
is character vector containing the significant loci selected.
*All.loci.results
is the overall result showing the p-values and adjusted p-values for all the loci(not filtered). The pipeline uses a filter of FDR<0.05. While using this data you can use your own filter.
*complete.data
contains the matrix of all loci and samples together.
*group.count
can be used to know the sample count for each group for using it in heatmap cluster coloring.
#significant loci
my.out$Significant.Loci
## [1] "X104.372303" "X166.2043885" "X177.2979221" "X178.4825882"
## [5] "X189.2535821" "X247.2600556" "X264.6262937" "X306.2137821"
## [9] "X307.6917062" "X312.9926866" "X329.91875" "X347.8885"
## [13] "X365.7359649" "X373.8959322" "X375.1234703" "X376.7047115"
## [17] "X380.05265" "X384.7438168" "X394.4164319" "X439.0106726"
## [21] "X447.656391" "X450.4504651" "X456.8275" "X526.8863354"
## [25] "X547.6297468"
## attr(,"na.action")
## [1] 20
## attr(,"class")
## [1] "omit"
head(my.out$All.loci.results)
## loci pval p.adjust.pval..method....hochberg..
## 1 X98.61148718 0.0697636307412465 1.0000000000
## 2 X104.372303 1.09988172605173e-06 0.0001649823
## 3 X104.6896429 0.000479415113561449 0.0589680590
## 4 X106.0720225 0.0612799103669727 1.0000000000
## 5 X108.8681673 0.00862025252269153 0.8620252523
## 6 X110.0554777 0.0387238934603335 1.0000000000
#number of samples
my.out$group.cont
## grpNAME grpCOUNT
## 1 G1 21
## 2 G2 20
Now get the heatmap matrix data and plot the heatmap. I have used the heatmap.2( )
function from gplots
package. You can use any package or settings.
library(gplots)
##
## Attaching package: 'gplots'
## The following object is masked from 'package:stats':
##
## lowess
heat.data=my.out$heatmap.matrix
#for getting the group colors for the samples marked as G1,G2.. size of which obtained from group,cont dataframe
row.cluster.colors= c(rep("green",21),rep("red",20))
heatmap.2(as.matrix(heat.data),
Rowv = F,
trace = "none",
col = c('#eff3ff','#bdd7e7','#6baed6','#2171b5') ,
RowSideColors = ,
dendrogram="column")
NOTE: The four shades of blue corresponds to the four levels of loci TypeI-IV(Type I, unmethylated; Type II, inner cytosine methylation; Type III, hemi-methylation of the outer cytosine; Type IV, fully methylation.) in the order 1-4.
Function Sourcecode
Copy the following function code and run it in your R environment. This will create two functions, find.sig.loci( )
and tryCatchFisher( )
.
###########################################################################
########### STATISTICAL ANALYSIS FOR FINDING SIGNIFICANT LOCI BW GROUPS####
############## @author: Lakshay Anand #####################################
find.sig.loci<- function(msap.input){
out=list()
ptrn=msap.input[[2]]
grpNAME=c()
grpCOUNT=c()
grpALL=data.frame()
for(d in 1:length(ptrn)){
Grp=ptrn[[d]]
rownames(Grp)= c(paste("G",d,"-",c(1:nrow(Grp)),sep = ""))
grpNAME[d]=paste("G",d,sep = "")
grpCOUNT[d]=nrow(Grp)
grpALL=rbind.data.frame(grpALL,Grp)
}
cat("analysing.... \n")
cat("Total loci:",ncol(grpALL),"\n")
cat("Total samples:",nrow(grpALL),"\n")
cat("Total number of groups:",length(ptrn),"\n")
############## replace types with numbers
message("replacing types with numbers suitable for fisher test and heatmap.. \n")
for(i in 1:ncol(grpALL)){
for(j in 1:nrow(grpALL)){
if(grpALL[j,i]=="u"){
grpALL[j,i]=1
}
else if(grpALL[j,i]=="h"){
grpALL[j,i]=2
} else if(grpALL[j,i]=="i"){
grpALL[j,i]=3
} else if(grpALL[j,i]=="f"){
grpALL[j,i]=4
}
}
}
for(i in 1:ncol(grpALL)){
grpALL[,i]=as.integer(grpALL[,i])
}
#####################################################
#creating groups vector for fisher test
groups.vector=c()
for(i in 1:length(rownames(grpALL))){
groups.vector[i]=strsplit(rownames(grpALL)[i],"-")[[1]][1]
}
############### multiple fisher test ##########
message("Performing multiple fisher tests \n")
pval=c()
loci=c()
for(i in 1:ncol(grpALL)){
loci[i]=colnames(grpALL)[i]
pval[i]=tryCatchFisher(table(groups.vector,grpALL[,i]))
}
message("Calculating FDR using benjamini and hochberg \n")
#result of the fisher test wit p-vale adjustment using benjamini
grpALL_LOCI=data.frame(loci,pval,p.adjust(pval,method = "BH"))
message("filtering significant loci \n")
### filter the loci with FDR <0.05
grp_sigloci=grpALL_LOCI[grpALL_LOCI[,3]<0.05,]
###collect the significant loci
sig.loci=as.character(grp_sigloci[,1])
sig.loci=na.omit(sig.loci)
cat(length(sig.loci)," loci are found significant. \n")
##### filter the loci from the matrix
grpALL_with_sig_loci=grpALL[,sig.loci]
out[[1]]=as.matrix(grpALL_with_sig_loci)
out[[2]]=sig.loci
out[[3]]=grpALL_LOCI
out[[4]]=grpALL
out[[5]]=data.frame(grpNAME,grpCOUNT)
names(out)=c("heatmap.matrix","Significant.Loci","All.loci.results","complete.data","group.cont")
return(out)
}
#### the fisher test in try catch
tryCatchFisher<-function(v){
tryCatch(fisher.test(v)[[1]],
error = function(e) {"na"})
}