Make a function using apply, stringr, stringi, and rbind run faster

Tag : r , By : Dov
Date : December 05 2020, 12:18 PM

The background: I'm going to provide the background for the application of this code and the programmatic background. Hopefully both help. I do genomics computational work. Yep, just another biologist posing as a computer scientist. I am working on a script that will allow me to integrate a bunch of data sets by each position in the human genome. This translates to a data frame that is over 3 billion rows by 12 columns. As a test dataset, I'm building my analysis pipeline using the yeast genome, which will generate a data frame with about 25 million rows and 12 columns.

You can vectorise all your operations:
# Generate vector of start positions
# Goes from 0 (minimal position in given data) to maximum base position in chromosome
foo <- 0:max(as.numeric(as.character(seqData$start)))
# Split sequence into a character vector
bar <- unlist(strsplit(as.character(seqData$sequence), ""))
# Generate final data frame
data.frame(start = foo, end = foo + 1, seq = bar)
#   start end seq
# 1     0   1   a
# 2     1   2   t
# 3     2   3   t
# 4     3   4   c
# 5     4   5   a
# 6     5   6   g
# 7     6   7   a
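To apply this per chromosome, wrap it in a function and combine the results:
library(foreach)

# Expand one chromosome's sequence into one row per base
wl <- function(data, chr) {
    startPos <- 0:max(as.numeric(as.character(data$start)))
    nucs     <- unlist(strsplit(as.character(data$sequence), ""))
    data.frame(chr, start = startPos, end = startPos + 1, seq = nucs)
}

# use %dopar% (with a registered parallel backend) for parallel computations
foreach(i = unique(seqData$chr), .combine = rbind) %do% {
    wl(subset(seqData, chr == i), i)
}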
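As a self-contained illustration of the same idea, here is a minimal base R sketch. It assumes a hypothetical toy input with one row per chromosome and the whole sequence stored as a single string; the real seqData layout may differ. The helper name expand_bases is also made up for this example.

```r
# Hypothetical toy input: one row per chromosome, sequence as a single string
seqData <- data.frame(
  chr      = c("chrI", "chrII"),
  sequence = c("attcaga", "ggc"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: expand one sequence string into one row per base
# (0-based start, end = start + 1, matching the output shown above)
expand_bases <- function(chr, sequence) {
  nucs  <- strsplit(sequence, "")[[1]]
  start <- seq_along(nucs) - 1L
  data.frame(chr = chr, start = start, end = start + 1L, seq = nucs,
             stringsAsFactors = FALSE)
}

# Build every piece first, then bind once instead of growing a data frame in a loop
pieces <- Map(expand_bases, seqData$chr, seqData$sequence)
result <- do.call(rbind, unname(pieces))
rownames(result) <- NULL
```

Deriving the positions from the split sequence itself, rather than from a separate start column, keeps the two vectors the same length by construction.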

R lapply using stringi and rbind

Tag : r , By : Andrew Mattie
Date : March 29 2020, 07:55 AM
You need to move the write.csv call out of the loop; otherwise each iteration overwrites the previously saved file and you end up with only the result of the final iteration. That also means you have to rbind the results outside lapply, since you can't modify the result variable from inside the function.
library(stringi)
library(data.table)

result <- do.call(rbind, lapply(list$list, function(x) {
    # Extract the first match of x in each row, then count rows per match
    t <- data.frame(words = stri_extract(df$data, coll = x))
    setDT(t)[, .(Count = .N), by = words]
}))

write.csv(result, "new.csv", row.names = FALSE)

using captured groups in str_replace / stri_replace - stringi vs stringr

Tag : r , By : glisignoli
Date : March 29 2020, 07:55 AM
If you look at the source for stringr::str_replace_all you'll see that it calls fix_replacement(replacement) to convert the \\# capture group references to $#. The help for stringi::stri_replace_all, on the other hand, clearly shows that you use $1, $2, etc. for the capture groups.
str <- "thisIsCamelCase aintIt"
stri_replace_all(str, regex="(?<=[a-z])([A-Z])", replacement=" $1")
## [1] "this Is Camel Case aint It"
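For comparison, here is a small sketch of the same replacement written both ways, assuming both packages are installed. The variable names via_stringr and via_stringi are only for illustration.

```r
library(stringr)
library(stringi)

str     <- "thisIsCamelCase aintIt"
pattern <- "(?<=[a-z])([A-Z])"

# stringr accepts the backslash form and translates it before calling stringi
via_stringr <- str_replace_all(str, pattern, " \\1")

# stringi expects the ICU-style dollar form directly
via_stringi <- stri_replace_all_regex(str, pattern, " $1")

# Both calls should give the same string
identical(via_stringr, via_stringi)
```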

Installation of packages ‘stringr’ and ‘stringi’ had non-zero exit status

Tag : r , By : s8k
Date : March 29 2020, 07:55 AM

Equivalent function to stringr::word in stringi package

Tag : r , By : John Phipps
Date : March 29 2020, 07:55 AM
Not exactly the same as stringr::word(), but this would seem to do the trick:
sentencas %>% 
  stri_split_coll("jose ", strength=1, simplify = TRUE) %>% 
  .[,2] %>% 
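A self-contained sketch of the same split-and-take-the-second-piece idea follows. The input sentences are hypothetical, and the final step (stri_extract_first_words) is an assumption about how to reduce the remainder to a single word, not necessarily what the original answer used.

```r
library(stringi)

# Hypothetical input; strength = 1 makes the collator ignore case and accents
sentencas <- c("meu nome e jose silva", "JOSE santos chegou")

# Split on "jose " and keep whatever follows the first occurrence
after_jose <- stri_split_coll(sentencas, "jose ", strength = 1,
                              simplify = TRUE)[, 2]

# Assumed final step: keep only the first word of the remainder
next_word <- stri_extract_first_words(after_jose)
next_word
```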

how to apply string_extract between two pattens in stringr or stringi in r

Tag : r , By : UnKnownUser
Date : March 29 2020, 07:55 AM
Use positive lookarounds like this.
(?<=ns_ap_dft=) matches a position preceded by ns_ap_dft=, .*? matches any characters but as few as possible (because you only want everything up to the first &), and (?=&) matches a position followed by &.
text <- c(

text %>%
#> [1] "0"      "305277"
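As a minimal runnable sketch of that pattern, using stringi and hypothetical query strings (the original input is not shown in full, so the urls vector below is made up):

```r
library(stringi)

# Hypothetical query strings containing the ns_ap_dft parameter
urls <- c("a=1&ns_ap_dft=0&b=2", "ns_ap_dft=305277&x=y")

# Lookbehind anchors the start, lazy .*? stops at the first "&" via lookahead
value <- stri_extract_first_regex(urls, "(?<=ns_ap_dft=).*?(?=&)")
value
```

Note that the lookahead requires a trailing &, so a parameter at the very end of a string would not match with this exact pattern.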