Searching (grep etc.)
1. Match all occurrences of a set of values (one vector) in another vector:
View(DF[which(DF$Gene %in% c("L1TD1","NANOG","POU5F1B","LIN28A","SOX2","YAP1","KRAS","TP53","BRAF","VHL")),])
The above line finds exact matches only: %in% compares whole elements, so it works only if each element of the searched vector holds a single value.
The following line finds ALL occurrences of the gene, including probes that have 1:many mapping.
idx <- grep("\\bL1TD1\\b|\\bNANOG\\b|\\bPOU5F1B\\b|\\bLIN28A\\b|\\bDNMT3B\\b", DF$symbol)
subset <- DF[idx,]
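A small sketch of the difference between the two approaches. The data frame `toy`, its columns and the " /// " separator are made up for illustration and are not from the original data:

```r
# Hypothetical toy data: the second probe maps to two genes (1:many).
toy <- data.frame(
  probe  = c("p1", "p2", "p3", "p4"),
  symbol = c("NANOG", "NANOG /// NANOGP1", "SOX2", "LIN28A"),
  stringsAsFactors = FALSE
)
genes <- c("NANOG", "LIN28A")

# Exact matching: only elements that equal a gene name are found.
exact_idx <- which(toy$symbol %in% genes)

# Word-boundary regex: also finds genes inside multi-gene entries.
pattern <- paste0("\\b", genes, "\\b", collapse = "|")
regex_idx <- grep(pattern, toy$symbol)

toy$probe[exact_idx]  # "p1" "p4"
toy$probe[regex_idx]  # "p1" "p2" "p4"
```

Building the pattern with paste0() avoids typing the "\\b" pairs by hand when the gene list is long.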
2. Search for a pattern in a string and get the substring from the first match to the end
searchI <- gregexpr("ins",var)
start.posI <- searchI[[1]][1]
##ins.frag.length <- attributes(searchI[[1]])$match.length[1]
substr(var,start.posI,nchar(var))
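A possible variant of the same idea that also handles the case where the pattern is absent. The value of `var` and the name `tail_part` are assumptions for illustration; regexpr() is used instead of gregexpr() because only the first match is needed:

```r
# Hypothetical input: a variant-like string containing "ins".
var <- "c.123_124insATG"

# regexpr() returns the position of the first match (-1 if none).
hit <- as.integer(regexpr("ins", var, fixed = TRUE))

if (hit > 0) {
  tail_part <- substr(var, hit, nchar(var))  # from the match to the end
} else {
  tail_part <- NA_character_                 # pattern not found
}
tail_part  # "insATG"
```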
3. Fetch a few columns from a dataframe by some keyword in the column names.
parameters <- c("P.Value","Score")
pattern <- paste("\\b",parameters,"\\b",sep="",collapse="|")
idx <- grep(pattern = pattern,x = colnames(masterDF))
subtab <- masterDF[, idx, drop = FALSE]
rm(idx, pattern, parameters)
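A worked sketch on a made-up data frame (the column names below are hypothetical). It shows that the "\\b" anchors keep partial words out, and that drop = FALSE keeps the result a data frame even when a single column matches:

```r
# Hypothetical data frame; the column names are made up for illustration.
masterDF <- data.frame(
  Gene       = c("SOX2", "TP53"),
  A.P.Value  = c(0.01, 0.20),
  A.Score    = c(2.5, 0.3),
  B.P.Values = c(0.03, 0.40)
)
parameters <- c("P.Value", "Score")
pattern <- paste0("\\b", parameters, "\\b", collapse = "|")

# "\\b" blocks partial words, so "B.P.Values" is not selected.
# Note: "." in the pattern is a regex wildcard, not a literal dot.
idx <- grep(pattern, colnames(masterDF))
subtab <- masterDF[, idx, drop = FALSE]
colnames(subtab)  # "A.P.Value" "A.Score"
```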
4. From a vector of strings, select from the first character up to (not including) a particular character, e.g. “_”
searchI <- gregexpr("_",var)
start.posI <- unlist(lapply(searchI, `[[`, 1))
temp <- substr(var,1,start.posI-1)
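A sketch of the same selection with the vectorised regexpr(), assuming some strings may not contain the character at all. The sample IDs and the name `prefix` are hypothetical:

```r
# Hypothetical sample IDs; the last one has no underscore.
var <- c("TCGA_01_A", "sample_2", "nounderscore")

# regexpr() is vectorised: one first-match position per string (-1 if none).
# as.integer() drops the match attributes so only positions remain.
pos <- as.integer(regexpr("_", var, fixed = TRUE))

# Keep the whole string when there is no "_" instead of returning "".
prefix <- ifelse(pos > 0, substr(var, 1, pos - 1), var)
prefix  # "TCGA" "sample" "nounderscore"
```

For this particular case, sub("_.*", "", var) should give the same result in one call.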
5. Select a specific number of characters (substring):
substr(x=colnames(mat),start=1,stop=12)
6. Find the integers contained within a pair of parentheses.
e.g. if the data is A(212)XUY and you want to extract 212.
unique(gsub("[\\(\\)]", "", unlist(regmatches(longDF1$Region, gregexpr("\\(.*?\\)", longDF1$Region)))))
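A worked sketch on made-up input modelled on the A(212)XUY example. The vector `region` and the digits-only pattern "[0-9]+" are assumptions; the pattern is stricter than ".*?" and skips parentheses that hold non-numeric text:

```r
# Hypothetical input: one, two and zero parenthesised integers.
region <- c("A(212)XUY", "B(7)C(212)", "no-parens")

# Match runs of digits enclosed in parentheses; each element may
# contribute zero or more hits, so the result is a list.
hits <- regmatches(region, gregexpr("\\([0-9]+\\)", region))

# Flatten the list first, then strip the parentheses and convert.
nums <- as.integer(gsub("[()]", "", unlist(hits)))
unique(nums)  # 212 7
```

Flattening with unlist() before gsub() matters: gsub() coerces a list with as.character(), which mangles elements that have several matches or none.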