
Correlations between numerous variables grouped in dplyr


Tag : r , By : MJRider
Date : November 25 2020, 04:01 AM

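Question: Say I have a data frame with the numeric columns PC1, PC2, PC3, A and B, followed by the grouping columns loc and seas, and I want the correlations within each group. Answer: use by() like this: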
By <- by(df[1:5], df[-(1:5)], cor)
> By
loc: A
seas: S
            PC1        PC2        PC3          A           B
PC1  1.00000000 -0.3941583  0.1872622  0.4576316 -0.00925106
PC2 -0.39415826  1.0000000 -0.6797708  0.3522161  0.20916667
PC3  0.18726218 -0.6797708  1.0000000 -0.2003091  0.37414025
A    0.45763159  0.3522161 -0.2003091  1.0000000  0.57292305
B   -0.00925106  0.2091667  0.3741403  0.5729230  1.00000000
----------------------------------------------------------------------------------------------------------------------------- 
loc: B
seas: S
            PC1         PC2         PC3           A          B
PC1  1.00000000 -0.52651449  0.07120701 -0.01779813 -0.7432814
PC2 -0.52651449  1.00000000 -0.05448583 -0.35011878  0.4632416
PC3  0.07120701 -0.05448583  1.00000000  0.80342399  0.4580262
A   -0.01779813 -0.35011878  0.80342399  1.00000000  0.5558740
B   -0.74328144  0.46324158  0.45802622  0.55587404  1.0000000
----------------------------------------------------------------------------------------------------------------------------- 
loc: A
seas: W
           PC1         PC2        PC3          A           B
PC1  1.0000000 -0.79784422  0.0932317  0.7483545  0.49639477
PC2 -0.7978442  1.00000000 -0.3526315 -0.3994917 -0.05233889
PC3  0.0932317 -0.35263151  1.0000000 -0.5902400  0.36066898
A    0.7483545 -0.39949171 -0.5902400  1.0000000  0.18081316
B    0.4963948 -0.05233889  0.3606690  0.1808132  1.00000000
----------------------------------------------------------------------------------------------------------------------------- 
loc: B
seas: W
           PC1        PC2        PC3          A          B
PC1  1.0000000  0.3441459  0.1135686 -0.4502518 -0.6672104
PC2  0.3441459  1.0000000 -0.8447551 -0.9899521 -0.8098906
PC3  0.1135686 -0.8447551  1.0000000  0.7606430  0.3738706
A   -0.4502518 -0.9899521  0.7606430  1.0000000  0.8832408
B   -0.6672104 -0.8098906  0.3738706  0.8832408  1.0000000
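To reshape this into one row per group holding the correlations of PC1, PC2 and PC3 with A and B, flatten each matrix with a helper and combine the results with plyr:
library(plyr)

# Flatten rows A:B x columns PC1:PC3 of a correlation matrix into a named
# vector (PC1_A, PC1_B, PC2_A, ...). Given a data frame, compute cor() first.
onerow <- function(x) {
  if (is.data.frame(x)) x <- cor(x[1:5])
  dtab <- as.data.frame.table(x[4:5, 1:3])
  with(dtab, setNames(Freq, paste(Var2, Var1, sep = "_")))
}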

adply(By, 1:2, onerow)
  loc seas       PC1_A       PC1_B      PC2_A       PC2_B      PC3_A     PC3_B
1   A    S  0.45763159 -0.00925106  0.3522161  0.20916667 -0.2003091 0.3741403
2   B    S -0.01779813 -0.74328144 -0.3501188  0.46324158  0.8034240 0.4580262
3   A    W  0.74835455  0.49639477 -0.3994917 -0.05233889 -0.5902400 0.3606690
4   B    W -0.45025181 -0.66721038 -0.9899521 -0.80989058  0.7606430 0.3738706
Or, splitting df directly instead of going through by():
library(plyr)
ddply(df, -(1:5), onerow)
Or with dplyr (group_by_at() and do() are superseded as of dplyr 1.0 but still work):
library(dplyr)
df %>%
  group_by_at(-(1:5)) %>%
  do( onerow(.) %>% t %>% as.data.frame ) %>%
  ungroup
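For comparison, here is a sketch of the same one-row-per-group result using group_modify() (dplyr 1.0 or later) instead of do(). It runs on a small synthetic df of random numbers (hypothetical, same column layout as above), and flatten_cor() is a hypothetical stand-in for onerow() that orders its output as PC1_A, PC2_A, PC3_A, PC1_B, and so on.

```r
library(dplyr)

# Hypothetical stand-in for `df`: five numeric columns followed by the two
# grouping columns, the layout that df[1:5] / df[-(1:5)] above relies on.
set.seed(42)
df <- data.frame(
  PC1 = rnorm(40), PC2 = rnorm(40), PC3 = rnorm(40),
  A = rnorm(40), B = rnorm(40),
  loc = rep(c("A", "B"), each = 20),
  seas = rep(c("S", "W"), times = 20)
)

# Correlate PC1..PC3 (rows) with A and B (columns), then flatten the 3 x 2
# matrix column by column into a named vector: PC1_A, PC2_A, PC3_A, PC1_B, ...
flatten_cor <- function(x) {
  m <- cor(x[, c("PC1", "PC2", "PC3")], x[, c("A", "B")])
  labels <- outer(rownames(m), colnames(m), paste, sep = "_")
  setNames(as.vector(m), as.vector(labels))
}

res <- df %>%
  group_by(loc, seas) %>%
  # group_modify() hands each group's rows (without loc/seas) to the
  # function as .x and expects a data frame back
  group_modify(~ as.data.frame(as.list(flatten_cor(.x)))) %>%
  ungroup()

res
```

group_modify() keeps loc and seas as the leading columns, so no t() or extra reshaping is needed.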


Counts of grouped variables using dplyr


Tag : r , By : Ben Humphrys
Date : March 29 2020, 07:55 AM
Answer: I'm not quite sure which groups you want to compare, but in any case you have two grouping variables, X > 8 and Z. If you want to compare the rows with X > 8 and Z == "A" to all rows with X > 8, you can do it like this:
merge(
    dt %>%
        group_by(X > 8) %>%
        summarize(n.X = n()),
    dt %>%
        group_by(X > 8, Z) %>%
        summarise(n.XZ = n()),
    by = "X > 8"
) %>%
    mutate(prop.XZ = n.XZ/n.X) %>%
    mutate(SE = sqrt((prop.XZ*(1-prop.XZ))/n.X))%>%
    mutate(Lower_limit = prop.XZ-1.96 * SE) %>%
    mutate(Upper_limit = prop.XZ+1.96 * SE)
  X > 8 n.X Z n.XZ   prop.XZ         SE Lower_limit Upper_limit
1 FALSE  70 A   37 0.5285714 0.05966378   0.4116304   0.6455124
2 FALSE  70 B   33 0.4714286 0.05966378   0.3544876   0.5883696
3  TRUE  30 A   16 0.5333333 0.09108401   0.3548087   0.7118580
4  TRUE  30 B   14 0.4666667 0.09108401   0.2881420   0.6451913
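The SE and limit columns are the normal-approximation (Wald) interval for a proportion: p +/- 1.96 * sqrt(p * (1 - p) / n). As a quick check, the hypothetical helper wald_ci() below reproduces the first row of the table (37 of the 70 rows with X > 8 == FALSE have Z == "A").

```r
# Wald interval for a proportion k/n: the same formula as the mutate() calls above.
wald_ci <- function(k, n, z = 1.96) {
  p <- k / n
  se <- sqrt(p * (1 - p) / n)
  c(prop = p, SE = se, lower = p - z * se, upper = p + z * se)
}

# First row of the output above: 37 of 70
wald_ci(37, 70)
```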
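Conversely, to get the proportion of each X > 8 level within each level of Z, use Z totals as the denominator:
merge(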
    dt %>%
        group_by(Z) %>%
        summarize(n.Z = n()),
    dt %>%
        group_by(X > 8, Z) %>%
        summarise(n.XZ = n()),
    by = "Z"
) %>%
    mutate(prop.XZ = n.XZ/n.Z) %>%
    mutate(SE = sqrt((prop.XZ*(1-prop.XZ))/n.Z))%>%
    mutate(Lower_limit = prop.XZ-1.96 * SE) %>%
    mutate(Upper_limit = prop.XZ+1.96 * SE)
  Z n.Z X > 8 n.XZ   prop.XZ         SE Lower_limit Upper_limit
1 A  53 FALSE   37 0.6981132 0.06305900   0.5745176   0.8217088
2 A  53  TRUE   16 0.3018868 0.06305900   0.1782912   0.4254824
3 B  47 FALSE   33 0.7021277 0.06670743   0.5713811   0.8328742
4 B  47  TRUE   14 0.2978723 0.06670743   0.1671258   0.4286189
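The merge() of two summaries can also be replaced by a single count() followed by a grouped mutate(), because the denominator n.X is just the sum of the cell counts within each X > 8 group. A sketch, assuming dplyr 1.0 or later and a hypothetical random dt with the same X and Z columns (so its numbers will not match the tables above):

```r
library(dplyr)

# Hypothetical stand-in for `dt`: a numeric X and a two-level Z.
set.seed(1)
dt <- data.frame(X = runif(100, 0, 10),
                 Z = sample(c("A", "B"), 100, replace = TRUE))

props <- dt %>%
  count(gt8 = X > 8, Z, name = "n.XZ") %>%  # one row per (X > 8, Z) cell
  group_by(gt8) %>%                         # denominator: all rows sharing X > 8
  mutate(n.X = sum(n.XZ),
         prop.XZ = n.XZ / n.X,
         SE = sqrt(prop.XZ * (1 - prop.XZ) / n.X),
         Lower_limit = prop.XZ - 1.96 * SE,
         Upper_limit = prop.XZ + 1.96 * SE) %>%
  ungroup()

props
```

Grouping by Z instead of gt8 in the second step gives the proportions of the second table.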

Correlation matrix of grouped variables in dplyr


Tag : r , By : Tony Z
Date : March 29 2020, 07:55 AM
Question: I have a grouped data frame (using dplyr) with 50 numeric columns, split into groups by one of the columns. I want to calculate the correlation between every non-grouping column and one particular column. Answer: we could use do(). With mtcars, grouping by cyl and correlating columns 3 to 11 with column 3 (disp):
library(dplyr)
mtcars %>% 
       group_by(cyl) %>%
       do(data.frame(Cor=t(cor(.[,3:11], .[,3]))))
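Or with data.table:
library(data.table)
d1 <- copy(mtcars)
# correlate columns 3:11 with the first of them (disp) within each cyl group,
# then restore the column names
setnames(setDT(d1)[, as.list(cor(.SD, .SD[[1]])) , cyl, 
                            .SDcols=3:11],  names(d1)[2:11])[]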
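With dplyr 1.0 or later, summarise() with across() gives the same grouped correlations without do(). A sketch using disp as the reference column, as in the answer above. Inside the lambda, .x is the current column and disp is the current group's disp. Unlike cor(.[,3:11], .[,3]), it skips the trivial correlation of disp with itself. A column that is constant within a group gives NA with a warning, for example vs among the 8-cylinder cars.

```r
library(dplyr)

# Correlation of every column from hp to carb with disp, within each cyl group
cors <- mtcars %>%
  group_by(cyl) %>%
  summarise(across(hp:carb, ~ cor(.x, disp)))

cors
```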

Grouped time series lag on selected variables using dplyr


Tag : r , By : Hadley
Date : March 29 2020, 07:55 AM
Question: I am trying to use dplyr to lag some variables (all sharing a common naming convention) within each group of my data set. Answer: select the columns by name pattern and apply lag() within the groups:
library(dplyr)
# scoped-verb version (superseded in dplyr 1.0, but still works)
iris %>% 
     as_tibble() %>%
     group_by(Species) %>%
     slice(1:3) %>%
     mutate_at(vars(contains('Sepal')), lag)
# dplyr >= 1.0 version with across()
iris %>% 
     as_tibble() %>%
     group_by(Species) %>%
     slice(1:3) %>% 
     mutate(across(contains('Sepal'), lag))
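What lag() does inside group_by() can be spelled out in base R: within each group, values shift down one position and the first slot becomes NA, so nothing leaks across group boundaries. A minimal sketch with a hypothetical helper group_lag() built on ave(). Note also that library(dplyr) masks stats::lag(), which does not shift the values of a plain vector, so writing dplyr::lag() avoids surprises if the package load order is uncertain.

```r
# Per-group lag: within each group of g, shift x down by one and pad with NA.
group_lag <- function(x, g) ave(x, g, FUN = function(v) c(NA, head(v, -1)))

x <- c(1, 2, 3, 10, 20, 30)
g <- c("a", "a", "a", "b", "b", "b")
group_lag(x, g)
# returns NA 1 2 NA 10 20
```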

dplyr - compare grouped variables to a subset of grouped variables


Tag : r , By : Nate Bedortha
Date : March 29 2020, 07:55 AM
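Question: Let's say I have a table of purchases in long format, with the columns Item, Variable and Value. Answer: count each Item/Variable/Value combination, then divide by the total within each Item/Variable group: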
library(tidyverse)
purchases %>%
  count(Item, Variable, Value) %>%
  group_by(Item, Variable) %>%
  mutate(pct = n / sum(n)) %>%
  ungroup()

# A tibble: 7 x 5
  Item  Variable Value        n     pct
  <fct> <fct>    <fct>    <int>   <dbl>
1 Bike  Age      New          1     0.5
2 Bike  Age      Used         1     0.5
3 Bike  Price    Discount     1     0.5
4 Bike  Price    Full         1     0.5
5 Car   Age      New          1     0.5
6 Car   Age      Used         1     0.5
7 Car   Price    Discount     2     1 
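The question's purchases table is not shown, so the input below is a hypothetical one, chosen so that the code above reproduces the counts in the tibble (the car bought at a discount appears twice). Only dplyr is needed, since count() is part of it.

```r
library(dplyr)

# Hypothetical long-format purchases table: one row per purchase attribute.
purchases <- data.frame(
  Item     = c("Bike", "Bike", "Bike", "Bike", "Car", "Car", "Car", "Car"),
  Variable = c("Age", "Age", "Price", "Price", "Age", "Age", "Price", "Price"),
  Value    = c("New", "Used", "Discount", "Full", "New", "Used", "Discount", "Discount")
)

shares <- purchases %>%
  count(Item, Variable, Value) %>%  # n = rows per Item/Variable/Value
  group_by(Item, Variable) %>%
  mutate(pct = n / sum(n)) %>%      # share of n within each Item/Variable
  ungroup()

shares
```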

How to use dplyr to calculate a weighted mean of two grouped variables


Tag : r , By : Steve
Date : March 29 2020, 07:55 AM
Question: I know this must be easy, but I'm having trouble finding the right dplyr commands. I want to group a data set by two variables, summarize the count for each combination, and then compute a weighted mean from those counts. Answer: if I have understood you correctly, you need weighted.mean:
library(dplyr)
mtcars %>% 
   group_by(cyl, mpg) %>% 
   summarize(Count = n()) %>%
   group_by(cyl) %>%
   summarise(avg_mpg = weighted.mean(mpg, Count))

# A tibble: 3 x 2
#    cyl   avg_mpg
#  <dbl>   <dbl>
#1  4.00    26.7
#2  6.00    19.7
#3  8.00    15.1
Equivalently, writing out the weighted mean by hand:
mtcars %>% 
  group_by(cyl, mpg) %>% 
  summarize(Count = n()) %>%
  group_by(cyl) %>%
  summarise(avg_mpg = sum(mpg * Count)/sum(Count))
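Weighting each distinct mpg value by its count is the same as averaging mpg over all rows of the group, so the result equals the plain per-cyl mean of mpg. A base-R check of that identity (the names counts, weighted and plain are only illustrative):

```r
# Count rows per (cyl, mpg) combination, mirroring summarize(Count = n())
counts <- aggregate(Count ~ cyl + mpg, data = transform(mtcars, Count = 1), FUN = sum)

# Count-weighted mean of the distinct mpg values, per cyl
weighted <- sapply(split(counts, counts$cyl),
                   function(d) weighted.mean(d$mpg, d$Count))

# Plain mean of mpg over all rows, per cyl
plain <- tapply(mtcars$mpg, mtcars$cyl, mean)

weighted
plain
```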
Related questions:
  • Generating the sequence 111122222333334
  • Unable to use has_goog_key() in R
  • how to multiply each row with a scaler in corresponding column?
  • R is not recognizing levels of a factor as the same. Is there a way to do this?
  • Calculating mean of replicate experiment result values in a column based on multiple columns using R
  • Best method to extract the first instance of a string between specified keywords using data.table
  • ignore optional combination of alphanumeric characters in str_extract
  • Why tracemem shows two copies when modification occurs inside function body?
  • Can't use mppm on multitype point patterns
  • How to move selected matrix rows to top of matrix based on a selection vector of row names
  • Combining expressions with a common operator
  • Passing string through multiple filters for matching
  • Convert two columns in R to rows of unique occurrence
  • How to create a dataframe using a function based on user-input?
  • How to access the visited vertices in a given shortest path using R igraph
  • Differences in Unicode character output with print()
  • Extracting Function or Objects from a String and then Piping Them with Magrittr/Dplyr
  • renderUI not evaluated until it is rendered
  • Find the maximum absolute value by row in an R data frame
  • Extracting data from irregular lists using purrr:map()
  • transforming data based on range of column in r
  • Identify and subset rows with some similar information
  • converting character from mongolite to timestamp in R
  • Create list from two vectors with every combo of each
  • Error in running a spread because of unique 'key combinations'; combining rows of data
  • visualize numerical strings as a matrixed heatmap
  • how to make a blocked matrix?
  • How to summarize with two functions using with dplyr
  • Dataframe is no longer the same after being saved to Excel and read back in
  • Create duplicate rows using based on availability of data
  • Keep empty groups when grouping with data.table in R
  • Grouping of Event Time Data based on multiple, iterative conditions
  • Formatting Numbers in Flextable for Specific Columns
  • How to store results from for-loop into a dataframe
  • How to select the values in my dataframe which has logical operator "<" (less than), divide them by two, an
  • Rowwise extract data between two strings
  • Convert a string separate by . and +
  • stacking function for values in R
  • dplyr coerces characters to factors
  • How do I use spread and group_by on a single row dataset
  • Replacing values in one matrix with values from another
  • Aggregate data and exclude duplicates in one column
  • Perform an R data.table binary search with OR select
  • How can I include a function in the Standard Deviation parameter of pnorm
  • How to get a tidy excel output of P values from R
  • Rotate boxplot legend (R, ggplot2)
  • dplyr::n() returns “Error: Error: n() should only be called in a data context ”
  • Extract fix columns and one variable column from a list of df´s in R
  • A function that can translate DNA sequence to binary code
  • I want to extract 365 netcdf files using loop
  • rvest vs RSelenium results for text extracting
  • Converting wide data to tall data
  • How to remove vertical white lines when using ggsave in R?
  • R-Shiny error: "renderDataTable" and "server=FALSE"
  • Read csv file with selected rows using data.table's fread
  • how to resolve an error like non numeric argument to binary argument?
  • If value exists in environment
  • R get one value according to some rules in each group
  • Use any apply method to find difference between max and min score for each students
  • subsetting a dataframe by existing object