Searching (grep etc.)

1. Match all occurrences of the values in one vector within another vector:

View(DF[which(DF$Gene %in% c("L1TD1","NANOG","POU5F1B","LIN28A","SOX2","YAP1","KRAS","TP53","BRAF","VHL")),])

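The above line works only if the search is done in an atomic vector, because %in% compares whole elements and matches only when an element equals a gene symbol exactly.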

The following line finds ALL occurrences of the genes, including probes that have a 1:many mapping, because the regular expression matches each symbol as a whole word anywhere in the string.

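# A single pattern string; \\b anchors each symbol as a whole word
idx <- grep("\\bL1TD1\\b|\\bNANOG\\b|\\bPOU5F1B\\b|\\bLIN28A\\b|\\bDNMT3B\\b",
            DF$symbol)
subsetDF <- DF[idx,]  # named subsetDF to avoid masking base::subset()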
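
A minimal sketch of the difference between the two approaches on made-up data. The data frame `toyDF`, its columns and the `///` separator are assumptions for illustration only, not part of the original data.

```r
# Hypothetical toy data: the third probe maps to two genes (1:many mapping)
toyDF <- data.frame(
  probe  = c("p1", "p2", "p3", "p4"),
  symbol = c("NANOG", "SOX2", "LIN28A /// NANOG", "NANOGP1"),
  stringsAsFactors = FALSE
)
genes <- c("NANOG", "LIN28A")

# Exact matching: only elements that equal a gene symbol in full
exact.idx <- which(toyDF$symbol %in% genes)

# Whole-word regex matching: also finds symbols inside multi-gene entries,
# but not longer symbols such as NANOGP1
pattern  <- paste0("\\b", genes, "\\b", collapse = "|")
word.idx <- grep(pattern, toyDF$symbol)
```

Here `exact.idx` picks up only the first row, while `word.idx` also picks up the multi-gene probe in the third row.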

2. Search for a pattern in a string and get the substring that starts at the first match:

 searchI <- gregexpr("ins",var)
 start.posI <- searchI[[1]][1]
 ##ins.frag.length <- attr(searchI[[1]],"match.length")[1]
 substr(var,start.posI,nchar(var))
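
A small worked sketch of the same idea that also guards against the no-match case. The input string is hypothetical, and `regexpr()` is used here instead of `gregexpr()` because only the first match is needed.

```r
# Hypothetical input string; "ins" marks the start of the part to keep
var <- "c.123_124insACGT"

# regexpr() returns the position of the first match, or -1 if there is none
searchI <- regexpr("ins", var, fixed = TRUE)

if (searchI[1] > 0) {
  ins.frag <- substr(var, searchI[1], nchar(var))
} else {
  ins.frag <- NA_character_  # no "ins" found in the string
}
```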

3. Fetch a few columns from a data frame by keywords in the column names.

 parameters <- c("P.Value","Score")
 pattern <- paste("\\b",parameters,"\\b",sep="",collapse="|")
 idx <- grep(pattern = pattern,x = colnames(masterDF))
 subtab <- masterDF[,idx,drop=FALSE]  # drop=FALSE keeps a data frame even if only one column matches
 rm(idx,pattern,parameters)
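
One detail worth checking in this pattern: the `.` in "P.Value" is a regex wildcard, so it matches any single character. A brief sketch with made-up column names (all names below are hypothetical) shows the effect and an escaped alternative.

```r
# Hypothetical column names
cols <- c("Gene", "GroupA.P.Value", "GroupA.Score", "P_Value", "Scores")

# Unescaped dot: "P.Value" also matches "P_Value"
loose  <- grep("\\bP.Value\\b|\\bScore\\b", cols, value = TRUE)

# Escaped dot: only a literal "." between "P" and "Value" matches
strict <- grep("\\bP\\.Value\\b|\\bScore\\b", cols, value = TRUE)
```

In both cases "Scores" is excluded, because `\\b` requires the keyword to end at a word boundary.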

4. From a vector of strings, extract everything from the first character up to (but not including) a particular character, e.g. “_”

 searchI <- gregexpr("_",var)
 start.posI <- unlist(lapply(searchI, `[[`, 1))
 temp <- substr(var,1,start.posI-1)  # R strings are indexed from 1
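
A possible variant, sketched on made-up strings, that keeps the whole string when the character is absent (without the check, `substr()` returns "" for such elements). The vector contents are hypothetical.

```r
# Hypothetical input; the last element has no underscore
var <- c("TCGA_01_A", "sample_2", "nounderscore")

# Position of the first "_" in each element (-1 when absent);
# as.integer() drops the match attributes that regexpr() attaches
pos <- as.integer(regexpr("_", var, fixed = TRUE))

# Keep the prefix before the first "_", or the whole string if there is none
prefix <- ifelse(pos > 0, substr(var, 1, pos - 1), var)
```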

5. Select a specific number of characters (substring):

 substr(x=colnames(mat),start=1,stop=12)

6. Find the integers contained within a pair of parentheses.

For example, if the data is A(212)XUY and you want to extract 212:

 # unlist() flattens the per-element list of matches before the parentheses are stripped
 unique(gsub("[()]", "", unlist(regmatches(longDF1$Region, gregexpr("\\(.*?\\)", longDF1$Region)))))
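
A short worked sketch on made-up input. The vector `region` is hypothetical, and the pattern is narrowed to `[0-9]+` as an assumption, so that only integers inside parentheses are captured.

```r
# Hypothetical input: one match, two matches, and no match
region <- c("A(212)XUY", "B(7)C(212)", "noparens")

# One list element per input string, each holding all "(digits)" matches
m <- regmatches(region, gregexpr("\\([0-9]+\\)", region))

# Flatten, strip the parentheses, convert to integer, drop duplicates
ints <- unique(as.integer(gsub("[()]", "", unlist(m))))
```

With this input, `ints` holds 212 and 7, in order of first appearance.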