Produce a precision weighted average among rows with repeated observations
Tag : r , By : Juan Pablo
Date : March 29 2020, 07:55 AM
I hope this helps. I like the plyr package for these sorts of problems. It should be functionally equivalent to aggregate, but I find it nicer and more convenient to use. There are lots of examples and a great ~20 page intro to plyr on the website. For this problem, since the data starts as a data.frame and you want another data.frame on the other end, we use ddply():
library(plyr)
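The example data frame the snippets operate on is not shown in the answer. Purely as an assumption to make the code runnable, it is taken to have an id column plus paired estimate/standard-error columns, roughly like this:
# Hypothetical reconstruction of `example` (not from the original answer);
# column names are inferred from the ddply()/data.table calls below.
set.seed(1)
example <- data.frame(
  id   = rep(c("Bob", "Jeff", "Joe", "Kim", "Sam", "Sara"), each = 3),
  var1 = rnorm(18, mean = 8, sd = 2),  # repeated point estimates per person
  SE1  = runif(18, min = 1, max = 4)   # standard error of each estimate
)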
#f1()
ddply(example, "id", summarize,
newMean = weighted.mean(x=var1, 1/SE1, na.rm = TRUE),
newSE = 1/sum(1/SE1, na.rm = TRUE)
)
id newMean newSE
1 Bob 8.8982 0.91917
2 Jeff 4.6375 2.51690
3 Joe 7.8734 1.05064
4 Kim 7.1984 1.04829
5 Sam 11.1130 2.71324
6 Sara 9.8376 1.95649
library(data.table)
dt <- data.table(example, key="id")
#f2()
dt[, list(newMean = weighted.mean(var1, 1/SE1, na.rm = TRUE),
newSE = 1/sum(1/SE1, na.rm = TRUE)),
by = "id"]
library(rbenchmark)
#f1 = plyr, #f2 = data.table
benchmark(f1(), f2(),
replications = 1000,
order = "elapsed",
columns = c("test", "elapsed", "relative"))
test elapsed relative
2 f2() 3.580 1.0000
1 f1() 6.398 1.7872
Insert average month values into another table, but prevent the present month's averages from being passed to the table
Date : March 29 2020, 07:55 AM
I hope this helps. The question: "This query works well, but I want to prevent this month's averages from being passed to avg_month_val1. How can I do that?" Maybe this does what you intend:
INSERT IGNORE INTO `clima_data`.`avg_month_val1` ( `year` , `month` ,
`evep` , `sunshine_hrs` , `rainfall` ,
`max_temp` , `min_temp` )
SELECT year(str_to_date(date, '%m/%d/%Y'))as year,
month(str_to_date(date, '%m/%d/%Y'))as month,
round(avg(evep),2),
round(Avg(sunshine_hrs),2),
round(sum(rainfall),2),
round(AVG(max_temp),2),
round(avg(min_temp),2)
FROM reg_data3
GROUP BY year(str_to_date(date, '%m/%d/%Y')),
month(str_to_date(date, '%m/%d/%Y'))
HAVING (year(str_to_date(date, '%m/%d/%Y')) <> year(CURRENT_TIMESTAMP)
OR month(str_to_date(date, '%m/%d/%Y')) <> month(CURRENT_TIMESTAMP) )
ORDER BY 1 Desc;
How to identify repeated variables within observations?
Date : March 29 2020, 07:55 AM
This seems to work fine. The question: "I'm new to R. I have a very long data set with presumably some repeated values (dates) in different variables, and I want to assess whether two or more variables are equal for each individual." It seems like a job for apply. Here's a possible solution:
mydata2 <- as.data.frame(t(apply(mydata, 1, function(x) { temp <- unique(x);
  c(temp, rep("", length(x) - length(temp))) })))
names(mydata2) <- names(mydata)
mydata2
# Id date1 date2 date3 date25
# 1 1 17/10/2002 25/01/2008
# 2 2 13/04/2009
# 3 3 07/02/2008
# 4 4 24/11/2006 09/06/2010
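For reference, the mydata the snippet operates on is not shown in the question; a hypothetical input with the same layout (an Id column followed by several date columns, purely illustrative) could be built like this:
# Hypothetical input (not from the original question): an Id plus several
# date columns, some holding the same date within a row.
mydata <- data.frame(
  Id    = 1:4,
  date1 = c("17/10/2002", "13/04/2009", "07/02/2008", "24/11/2006"),
  date2 = c("25/01/2008", "13/04/2009", "07/02/2008", "09/06/2010"),
  date3 = c("17/10/2002", "13/04/2009", "07/02/2008", "24/11/2006"),
  stringsAsFactors = FALSE
)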
Get an average per month in H2 that is one average regardless of which year or month
Date : March 29 2020, 07:55 AM
I hope one of these helps. I think you almost got it the first time. You need to: add grouping by year/month into your original query, to get an average per month; then perform a select on the result you already have and group it by truck, surrounding the first query with an extra select:
select truckcode, avg(avgPetrolQty) from (
SELECT t.truckCode,
COALESCE(year(orderDate),'Not Announced') as year,
COALESCE(monthname(orderDate),'Not Announced') as month,
IFNULL (avg(petrolQty),0) as avgPetrolQty
from truck t left join orderz o
on t.truckId = o.truckId
group by t.truckCode,
COALESCE(year(orderDate),'Not Announced'),
COALESCE(monthname(orderDate),'Not Announced')
) group by truckcode
summarize over repeated observations
Tag : r , By : SachinJadhav
Date : March 29 2020, 07:55 AM
To fix the issue: akrun's answer is more elegant, but as an alternative you can simply add the group variable to your group_by() call:
library(dplyr)
dat <- tibble(id = c(1, 1, 1, 2, 2, 2, 2, 3, 4, 4, 4, 4, 4),
group = c(1, 1, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0))
dat %>%
group_by(id, group) %>%
tally()
# A tibble: 4 x 3
# Groups: id [4]
id group n
<dbl> <dbl> <int>
1 1 1 3
2 2 0 4
3 3 1 1
4 4 0 5
dat2 <- tibble(id = c(1, 1, 1, 2, 2), group = c(1, 0, 0, 1, 0))
dat2 %>%
group_by(id, group) %>%
tally()
# A tibble: 4 x 3
# Groups: id [2]
id group n
<dbl> <dbl> <int>
1 1 0 2
2 1 1 1
3 2 0 1
4 2 1 1
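As a side note, tally() here is just shorthand for summarise(n = n()); the first result above could equivalently be produced with:
# Equivalent to the tally() call: count rows for each (id, group) combination.
dat %>%
  group_by(id, group) %>%
  summarise(n = n())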