Efficient subsetting of data.table with greater-than, less-than using indices


Tag : r , By : user183954
Date : November 24 2020, 04:01 AM

You're doing it wrong. Calling [.data.table in a loop, which is what your lapply does, is going to be slow: that function has a lot of overhead, and the overhead is not worth it for the tiny operation performed in each iteration. The correct way is to do a single non-equi join:
table[data.table(x), on = .(min.x < x, max.x > x), rowname, by = .EACHI]
#          min.x    max.x rowname
#    1: 1.084668 1.084668       1
#    2: 1.293461 1.293461    7734
#    3: 1.293461 1.293461     739
#    4: 1.293461 1.293461       2
#    5: 1.293461 1.293461    3757
#   ---                          
#30216: 1.324366 1.324366    9999
#30217: 1.324366 1.324366    9635
#30218: 1.869469 1.869469    8740
#30219: 1.869469 1.869469    3302
#30220: 1.869469 1.869469   10000
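To see what the join does on a small scale, here is a minimal sketch with made-up data. The interval values and the query vector `x` are assumptions chosen for illustration, not the original poster's data. Note that in the result of a non-equi join the columns named `min.x` and `max.x` hold the values of `x` from the lookup table, which is why both columns show identical numbers in the output above.

```r
library(data.table)

# Hypothetical intervals: each row covers the open range (min.x, max.x).
# The name `table` follows the answer above (it shadows base::table only as data).
table <- data.table(rowname = 1:3,
                    min.x   = c(0, 2, 4),
                    max.x   = c(3, 5, 6))

# Hypothetical query points; every point falls inside at least one interval.
x <- c(1, 2.5, 4.5)

# One vectorised join instead of one `[.data.table` call per element of x.
res <- table[data.table(x), on = .(min.x < x, max.x > x), rowname, by = .EACHI]

# Collect the matching interval ids per query point.
matches <- split(res$rowname, res$min.x)
print(res)
```

Here 1 lies only in interval 1, 2.5 lies in intervals 1 and 2, and 4.5 lies in intervals 2 and 3, so `res` has five rows in total.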


Memory-efficient subsetting of large data.table


Tag : r , By : user119605
Date : March 29 2020, 07:55 AM
I have a SQLite database of 11 GB and 16 GB of RAM (shared with the OS and other processes), and I want to subset the data with data.table. One approach I have in mind is to compute the filter condition as a column in the SQL query, key the table on that column, and subset by key:
sql = "SELECT *, period >= stableStateStart AS tmpcol FROM inventory"
inventory = setDT(dbGetQuery(conn, sql), key="tmpcol")
inventory[.(TRUE)]

C++: Efficient way to check if elements in a vector are greater than elements in another having same indices?


Tag : cpp , By : user181945
Date : March 29 2020, 07:55 AM
This is called the maxima of a point set. For two and three dimensions, it can be solved in O(n log n) time. For more than three dimensions, it can be solved in O(n (log n)^(d − 3) log log n) time. For random points, a linear expected-time algorithm is available.
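For the two-dimensional case, the O(n log n) bound comes from sorting once and then sweeping. The following is a minimal sketch in R, not the original answer's code: `maxima_2d` is a hypothetical helper name, the x coordinates are assumed to be distinct, and a point counts as maximal when no other point is larger in both coordinates.

```r
# Return the indices of the maximal points of a 2-D point set.
# Assumes distinct x values (ties would need an extra rule).
maxima_2d <- function(x, y) {
  ord <- order(x, decreasing = TRUE)      # sort by x, largest first: O(n log n)
  y_sorted <- y[ord]
  # Largest y among all points with a strictly larger x.
  prev_best <- c(-Inf, head(cummax(y_sorted), -1))
  # A point is maximal if no point to its right has a larger y.
  keep <- y_sorted > prev_best
  sort(ord[keep])
}

pts_x <- c(1, 2, 3, 4)
pts_y <- c(4, 1, 3, 2)
idx <- maxima_2d(pts_x, pts_y)
print(idx)  # points 1, 3 and 4; point 2 is dominated by point 3
```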

Different results when subsetting data.table columns with numeric indices in different ways


Tag : r , By : user187301
Date : March 29 2020, 07:55 AM
By looking at the source code of [.data.table, we can simulate data.table's behaviour for different inputs. The relevant excerpt is:
if (!missing(j)) {
    jsub = replace_dot_alias(substitute(j))
    root = if (is.call(jsub)) as.character(jsub[[1L]])[1L] else ""
    if (root == ":" ||
        (root %chin% c("-","!") && is.call(jsub[[2L]]) && jsub[[2L]][[1L]]=="(" && is.call(jsub[[2L]][[2L]]) && jsub[[2L]][[2L]][[1L]]==":") ||
        ( (!length(av<-all.vars(jsub)) || all(substring(av,1L,2L)=="..")) &&
          root %chin% c("","c","paste","paste0","-","!") &&
          missing(by) )) {   # test 763. TODO: likely that !missing(by) iff with==TRUE (so, with can be removed)
      # When no variable names (i.e. symbols) occur in j, scope doesn't matter because there are no symbols to find.
      # If variable names do occur, but they are all prefixed with .., then that means look up in calling scope.
      # Automatically set with=FALSE in this case so that DT[,1], DT[,2:3], DT[,"someCol"] and DT[,c("colB","colD")]
      # work as expected.  As before, a vector will never be returned, but a single column data.table
      # for type consistency with >1 cases. To return a single vector use DT[["someCol"]] or DT[[3]].
      # The root==":" is to allow DT[,colC:colH] even though that contains two variable names.
      # root == "-" or "!" is for tests 1504.11 and 1504.13 (a : with a ! or - modifier root)
      # We don't want to evaluate j at all in making this decision because i) evaluating could itself
      # increment some variable and not intended to be evaluated a 2nd time later on and ii) we don't
      # want decisions like this to depend on the data or vector lengths since that can introduce
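      # inconsistency reminiscent of drop=TRUE in [.data.frame that we seek to avoid.
      with=FALSE

Wrapping the same condition in a helper function lets us test it against different inputs (%chin% comes from data.table, so the package must be loaded):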
is_satisfied <- function(...) {
  jsub <- substitute(...)
  root = if (is.call(jsub)) as.character(jsub[[1L]])[1L] else ""
  if (root == ":" ||
    (root %chin% c("-","!") && 
     is.call(jsub[[2L]]) && 
     jsub[[2L]][[1L]]=="(" && 
     is.call(jsub[[2L]][[2L]]) && 
     jsub[[2L]][[2L]][[1L]]==":") ||
    ( (!length(av<-all.vars(jsub)) || all(substring(av,1L,2L)=="..")) &&
      root %chin% c("","c","paste","paste0","-","!"))) TRUE else FALSE
}

is_satisfied("x")
# [1] TRUE
is_satisfied(c("x", "y"))
# [1] TRUE
is_satisfied(..x)
# [1] TRUE
is_satisfied(1:2)
# [1] TRUE
is_satisfied(c(1:2))
# [1] TRUE
is_satisfied((1:2))
# [1] FALSE
is_satisfied(y)
# [1] FALSE
is_satisfied(list(x, y))
# [1] FALSE
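The practical consequence can be checked directly on a small table. This is a sketch with a made-up `DT`, and the column names are assumptions. When the condition holds, `j` is treated as a column selection (`with=FALSE`); when it does not, `j` is evaluated as an ordinary expression.

```r
library(data.table)

# Hypothetical table for illustration.
DT <- data.table(a = 1:3, b = 4:6, c = 7:9)

# Condition satisfied (root is ":"): j selects columns 1 and 2.
by_index <- DT[, 1:2]

# Condition not satisfied (root is "("): j is evaluated as an expression,
# so the integer vector itself comes back.
as_expr <- DT[, (1:2)]

print(by_index)
print(as_expr)
```

To select columns by position stored in a variable, the `..` prefix mentioned in the source comments applies, e.g. `cols <- 1:2; DT[, ..cols]`.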

Subsetting a data frame depending if value in column of reference is greater or lower than 0


Tag : r , By : markku
Date : March 29 2020, 07:55 AM
This can also be done succinctly with rowSums() and sign():
mismatch = 1
df[rowSums(sign(df)) >= (ncol(df) - mismatch * 2), ]

     col1 col2 col3 col_Reference
[1,]    1    1    1            -5
[2,]    2    2    2             6
[3,]    4    4   -4             8
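The arithmetic behind the threshold: `sign()` maps each value to 1 or -1, so a row with no negative values sums to `ncol(df)`, and every negative value lowers that sum by 2. A minimal sketch with made-up data follows. The fourth row is an assumption added to show a rejected case, and the trick assumes there are no zeros, since `sign(0)` is 0.

```r
# Hypothetical data: the first three rows mirror the output above,
# the fourth row has two negative values.
df <- data.frame(col1 = c(1, 2, 4, -3),
                 col2 = c(1, 2, 4, -3),
                 col3 = c(1, 2, -4, 3),
                 col_Reference = c(-5, 6, 8, 7))

mismatch <- 1                           # negatives tolerated per row
sums <- rowSums(sign(df))               # 2, 4, 2, 0
keep <- sums >= ncol(df) - mismatch * 2 # threshold is 4 - 2 = 2
res <- df[keep, ]
print(res)                              # rows 1 to 3 are kept, row 4 is dropped
```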

Generate the output array A[] when the number of items greater than a[i] for indices greater than i is given


Tag : arrays , By : Simon Hogg
Date : March 29 2020, 07:55 AM