Using cast() or ddply() to summarise the mean for two continuous variables in one dataframe


Tag : r , By : iyogee
Date : December 05 2020, 12:18 PM

This is not a ddply() or cast() solution, but using tidyverse and reshape2 you can do:
library(dplyr)     # group_by(), summarise(), mutate(), arrange(), left_join()
library(reshape2)  # dcast()

df %>%
 group_by(Date, Independent_Variable) %>%
 summarise(Independent_Value = mean(Independent_Value)) %>% # Mean of each variable per month
 mutate(Independent_Variable = paste(Independent_Variable, "IV", sep = "_")) %>%
 dcast(Date~Independent_Variable, value.var = "Independent_Value") %>% # One column per variable
 arrange(factor(Date, levels = month.name)) %>% # Calendar order of months
 left_join(df %>% # Same steps for Sapflow, joined back on Date
            group_by(Date, Independent_Variable) %>%
            summarise(Sapflow = mean(Sapflow)) %>%
            mutate(Independent_Variable = paste(Independent_Variable, "Sapflow", sep = "_")) %>%
            dcast(Date~Independent_Variable, value.var = "Sapflow") %>%
            arrange(factor(Date, levels = month.name)),
           by = "Date")

       Date Humidity_IV Radiation_IV Temperature_IV Humidity_Sapflow
1      June    17.60733     263.6733       70.56133        16.067000
2      July    21.80065     270.9065       61.33065        23.356774
3    August    18.38968     178.9806       71.73355        22.941613
4 September    14.82200     152.2333       72.21367        19.309333
5   October    11.34867      93.6000       81.74300         6.700667
  Radiation_Sapflow Temperature_Sapflow
1         16.067000           16.067000
2         23.356774           23.356774
3         22.941613           22.941613
4         19.309333           19.309333
5          6.700667            6.700667
Alternatively, if a long format is acceptable, both variables can be summarised in one step:
df %>%
 group_by(Date, Independent_Variable) %>% # Grouping
 summarise_all(list(mean = mean)) %>% # Summarising all variables and adding "_mean" to the new names (funs() is deprecated as of dplyr 0.8.0)
 arrange(factor(Date, levels = month.name)) # Arranging according to the calendar order of months

   Date      Independent_Variable Independent_Value_mean Sapflow_mean
   <fct>     <fct>                                 <dbl>        <dbl>
 1 June      Humidity                               17.6        16.1 
 2 June      Radiation                             264.         16.1 
 3 June      Temperature                            70.6        16.1 
 4 July      Humidity                               21.8        23.4 
 5 July      Radiation                             271.         23.4 
 6 July      Temperature                            61.3        23.4
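
Since the question asked about ddply(), here is a minimal plyr sketch of the same long-format summary. The data frame `toy` and its values are made up for illustration; only the column names come from the answer above.

```r
library(plyr)

# Hypothetical toy data: two months, two variables, two readings each
toy <- data.frame(
  Date                 = rep(c("June", "July"), each = 4),
  Independent_Variable = rep(c("Humidity", "Temperature"), times = 4),
  Independent_Value    = c(10, 70, 20, 72, 30, 60, 40, 64),
  Sapflow              = c(1, 1, 3, 3, 5, 5, 7, 7)
)

# One row per Date x Independent_Variable, with the mean of both continuous columns
toy_means <- ddply(toy, .(Date, Independent_Variable), summarise,
                   Independent_Value_mean = mean(Independent_Value),
                   Sapflow_mean           = mean(Sapflow))

# Put the months in calendar order rather than alphabetical order
toy_means <- toy_means[order(match(toy_means$Date, month.name)), ]
toy_means
```

With these numbers the June/Humidity row should show a mean value of 15 and a mean sapflow of 2, which is easy to check by hand.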


for each group summarise means for all variables in dataframe (ddply? split?)


Tag : r , By : CookingCoder
Date : March 29 2020, 07:55 AM
Given the format you want for the result, the reshape package will be more efficient than plyr.
test_data <- data.frame(
  var0 = rnorm(100),
  var1 = rnorm(100, 1),
  var2 = rnorm(100, 2),
  var3 = rnorm(100, 3),
  var4 = rnorm(100, 4),
  group = sample(letters[1:10], 100, replace = TRUE),
  year = sample(c(2007, 2009), 100, replace = TRUE))

library(reshape)
Molten <- melt(test_data, id.vars = c("group", "year"))
cast(group + variable ~ year, data = Molten, fun = mean)
   group variable         2007         2009
1      a     var0  0.003767891  0.340989068
2      a     var1  2.009026385  1.162786943
3      a     var2  1.861061882  2.676524736
4      a     var3  2.998011426  3.311250399
5      a     var4  3.979255971  4.165715967
6      b     var0 -0.112883844 -0.179762343
7      b     var1  1.342447279  1.199554144
8      b     var2  2.486088196  1.767431740
9      b     var3  3.261451449  2.934903824
10     b     var4  3.489147597  3.076779626
11     c     var0  0.493591055 -0.113469315
12     c     var1  0.157424796 -0.186590644
13     c     var2  2.366594176  2.458204041
14     c     var3  3.485808031  2.817153628
15     c     var4  3.681576886  3.057915666
16     d     var0  0.360188789  1.205875725
17     d     var1  1.271541181  0.898973536
18     d     var2  1.824468264  1.944708165
19     d     var3  2.323315162  3.550719308
20     d     var4  3.852223640  4.647498956
21     e     var0 -0.556751465  0.273865769
22     e     var1  1.173899189  0.719520372
23     e     var2  1.935402724  2.046313047
24     e     var3  3.318669590  2.871462470
25     e     var4  4.374478734  4.522511874
26     f     var0 -0.258956555 -0.007729091
27     f     var1  1.424479454  1.175242755
28     f     var2  1.797948551  2.411030282
29     f     var3  3.083169793  3.324584667
30     f     var4  4.160641429  3.546527820
31     g     var0  0.189038036 -0.683028110
32     g     var1  0.429915866  0.827761101
33     g     var2  1.839982321  1.513104866
34     g     var3  3.106414330  2.755975622
35     g     var4  4.599340239  3.691478466
36     h     var0  0.015557352 -0.707257185
37     h     var1  0.933199148  1.037655156
38     h     var2  1.927442457  2.521369108
39     h     var3  3.246734239  3.703213646
40     h     var4  4.242387776  4.407960355
41     i     var0  0.885226638 -0.288221276
42     i     var1  1.216012653  1.502514588
43     i     var2  2.302815441  1.905731471
44     i     var3  2.026631277  2.836508446
45     i     var4  4.800676814  4.772964668
46     j     var0 -0.435661855  0.192703997
47     j     var1  0.836814185  0.394505861
48     j     var2  1.663523873  2.377640369
49     j     var3  3.489536343  3.457597835
50     j     var4  4.146020948  4.281599816
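
Because the data above is random, the means cannot be checked by eye. The sketch below repeats the melt-then-cast idea on a tiny made-up data frame whose means are easy to verify. It uses reshape2 (melt() and dcast()) instead of the original reshape package, which is my substitution; the name `small` and all values are hypothetical.

```r
library(reshape2)

# Hypothetical toy data with values chosen so the means are easy to verify
small <- data.frame(
  group = c("a", "a", "a", "a", "b", "b"),
  year  = c(2007, 2007, 2009, 2009, 2007, 2009),
  var0  = c(1, 3, 5, 7, 10, 20),
  var1  = c(2, 4, 6, 8, 30, 40)
)

# Long format: one row per (group, year, variable) measurement
molten <- melt(small, id.vars = c("group", "year"))

# Wide by year, averaging within each group/variable/year cell
wide <- dcast(molten, group + variable ~ year,
              value.var = "value", fun.aggregate = mean)
wide
```

For group a, var0 has the values 1 and 3 in 2007 and 5 and 7 in 2009, so that row should read 2 and 6.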

How to speed up summarise and ddply?


Tag : r , By : Joshua Johnson
Date : March 29 2020, 07:55 AM
If you're going to use your edit, why not use rowsum and save yourself a few minutes of execution time?
nr <- 2e6
nc <- 3
aggdf <- data.frame(matrix(rnorm(nr*nc),nr,nc),
                    matrix(sample(100,nr*nc,TRUE),nr,nc), rnorm(nr))
colnames(aggdf) <- c("col1","col2","col3","fac1","fac2","fac3","w")

system.time({
aggsums <- rowsum(data.frame(aggdf[,c("col1","col2","col3")]*aggdf$w,w=aggdf$w), 
  interaction(aggdf[,c("fac1","fac2","fac3")]))
agg_wtd_mean <- aggsums[,1:3]/aggsums[,4]
})
#   user  system elapsed 
#  16.21    0.77   16.99 
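
The rowsum trick above relies on the identity weighted mean = sum(w * x) / sum(w), computed per group. A small base R sketch with made-up numbers (the vectors `x`, `g` and `w` are hypothetical) shows the idea and cross-checks it against weighted.mean():

```r
# Hypothetical toy data: one value column, one grouping vector, one weight
x <- c(1, 3, 10, 20)
g <- c("a", "a", "b", "b")
w <- c(1, 3, 2, 2)

# Per-group sums of w * x and of w in a single call
sums <- rowsum(cbind(wx = x * w, w = w), g)

# Weighted mean per group = sum(w * x) / sum(w)
wtd_mean <- sums[, "wx"] / sums[, "w"]
wtd_mean

# Cross-check against weighted.mean() applied group by group
check <- sapply(split(seq_along(x), g), function(i) weighted.mean(x[i], w[i]))
stopifnot(isTRUE(all.equal(unname(wtd_mean), unname(check))))
```

Group a works out to (1*1 + 3*3) / 4 = 2.5 and group b to (10*2 + 20*2) / 4 = 15.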

plyr ddply and summarise use in R


Tag : r , By : Bharath
Date : March 29 2020, 07:55 AM
I want to avoid using loops, so I would like to use something from plyr to solve my problem.
Is this what you're after?
library(plyr)
ddply(df, .(f), colwise(sum))
    f          x           y          z
1   1 -0.4190284  2.61101681  1.2280026
2   2  1.1063977  2.40006922  4.9550079
3   3  0.4498366 -4.00610558  0.9964754
4   4  1.9325488 -2.81241212 -3.1185574
5   5 -4.1077670 -1.01232884 -3.9852388
6   6 -1.0488003 -2.42924689  3.5273636
7   7  2.2999306  0.85930085 -0.6245167
8   8 -4.8105311 -6.81352238 -2.1223436
9   9 -2.8187083  5.03391770  1.6433896
10 10  5.1323666 -0.06192382  1.8978994
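# Wrapper that returns the per-group sum of a single column;
# it now uses its df.obj argument instead of the global df
foo <- function(df.obj, colname) {
  ddply(df.obj, .(f), colwise(sum))[, c("f", colname)]
}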
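
A usage sketch for a wrapper of this kind, on a tiny made-up data frame. The names `toy` and `group_sum` are hypothetical, and I have used plyr's numcolwise() rather than colwise() so that only numeric columns are summed and the character grouping column is left alone; that swap is my choice, not part of the answer.

```r
library(plyr)

# Hypothetical toy data: a character grouping column f and numeric columns x, y, z
toy <- data.frame(f = c("g1", "g1", "g2", "g2"),
                  x = c(1, 2, 3, 4),
                  y = c(10, 20, 30, 40),
                  z = c(0, 0, 1, 1))

# Per-group sums of every numeric column, then keep only the requested one
group_sum <- function(df.obj, colname) {
  ddply(df.obj, .(f), numcolwise(sum))[, c("f", colname)]
}

res <- group_sum(toy, "y")
res
```

Here the y column sums to 30 for g1 and 70 for g2.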

ddply summarise on multiple variables


Tag : r , By : user183289
Date : March 29 2020, 07:55 AM
I don't know what plyr does internally, but data.table is only going to use the columns that are in the expression itself, effectively scanning the data only once (column by column):
library(data.table)
dt = data.table(df)

lapply(c('hw', 'app', 'srvc'), function(name) dt[, .N, by = name])
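
To see what that lapply() returns, here is the same call on a small made-up table. The column names are reused from the answer; the values and the name `counts` are hypothetical.

```r
library(data.table)

# Hypothetical toy data reusing the column names from the answer
dt <- data.table(hw   = c("a", "a", "b"),
                 app  = c("x", "y", "y"),
                 srvc = c("s", "s", "s"))

# One count table per grouping column; .N is the number of rows in each group
counts <- lapply(c("hw", "app", "srvc"), function(name) dt[, .N, by = name])
names(counts) <- c("hw", "app", "srvc")
counts$hw
```

Each list element is a two-column data.table: the grouping column and N. For hw, level a should have a count of 2 and level b a count of 1.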

Using ddply to summarise data in R


Tag : r , By : paolodm
Date : March 29 2020, 07:55 AM
Generally the point of bundling commands up into a function is so you don't have to worry about the intermediate steps. You've done that, but now you want the intermediate results too (your "interval"). I think the only good solution is to take your function apart.
Defining interval first, you can just use it as a grouping variable in ddply and use plain old mean, unless I'm misunderstanding the purpose of your average function.
df$interval <- with(df, cut(velocity, seq(min(velocity), max(velocity), by = 4.5)))
df <- ddply(df, c("class", "PrecVehClass", "interval"), summarise,
            avg.spacing = mean(spacing),
            avg.headway = mean(headway),
            avg.speed = mean(velocity))
df1 <- data.frame(x = rnorm(100))
df1$interval <- cut(df1$x, breaks=c(-10, -1, 1, 10))
ddply(df1, "interval", summarize, mean_within_interval = mean(x))
  interval mean_within_interval
1 (-10,-1]           -1.5262258
2   (-1,1]            0.0880585
3   (1,10]            1.4796220
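
One thing to watch when the breaks come from seq(min(velocity), max(velocity), by = 4.5): cut() intervals are right-closed by default, so the minimum value falls outside the first interval, and any value above the last break is left out as well. Both become NA. A small base R sketch with made-up velocities; extending the breaks by one step is just one possible remedy, not something the answer prescribes.

```r
# Hypothetical velocities; the range (0 to 10) is not a multiple of the step 4.5
velocity <- c(0, 1, 5, 9, 10)
breaks <- seq(min(velocity), max(velocity), by = 4.5)  # 0.0 4.5 9.0

# Default behaviour: the minimum (0) and the value above the last break (10) get NA
default_interval <- cut(velocity, breaks)
sum(is.na(default_interval))

# include.lowest = TRUE closes the first interval on the left, and one extra
# break past the maximum covers the top end of the range
breaks_full <- c(breaks, max(breaks) + 4.5)            # 0.0 4.5 9.0 13.5
interval <- cut(velocity, breaks_full, include.lowest = TRUE)
anyNA(interval)
```

With the default call two of the five values are dropped; with the adjusted breaks every value lands in an interval.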