R: reshaping a dataframe and creating proportions
Tag : r , By : littlefuzz
Date : March 29 2020, 07:55 AM
This is a pretty straightforward table operation when combined with prop.table(..., margin=). The margin= argument controls whether proportions are calculated over rows, columns, or the whole table (the default): prop.table(..., 1) does rows, 2 does columns, 3 does strata, and so on. Also, instead of data.frame, use as.data.frame.matrix to avoid the reshape requirement:
as.data.frame.matrix(prop.table(with(df,table(Product,Day)),1))
# Friday Monday Saturday Sunday Thursday Tuesday Wednesday
#A 0.1666667 0.1666667 0.1666667 0.0000000 0.1666667 0.1666667 0.1666667
#B 0.0000000 0.3333333 0.0000000 0.0000000 0.0000000 0.3333333 0.3333333
#C 0.0000000 0.3333333 0.3333333 0.3333333 0.0000000 0.0000000 0.0000000
as.data.frame.matrix(prop.table(with(df,table(Product,Day)),2))
# Friday Monday Saturday Sunday Thursday Tuesday Wednesday
#A 1 0.50 0.6666667 0 1 0.6666667 0.6666667
#B 0 0.25 0.0000000 0 0 0.3333333 0.3333333
#C 0 0.25 0.3333333 1 0 0.0000000 0.0000000
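For readers who want to see the arithmetic behind margin=, here is a minimal sketch in plain Python. The observations are made up for illustration (they are not the original df), and prop_table is a hypothetical helper that only mimics the idea of R's prop.table for a two-way count table; cells with a zero count are simply absent from the result.

```python
from collections import Counter

# Hypothetical (Product, Day) observations, not the data from the question.
observations = [("A", "Mon"), ("A", "Tue"), ("A", "Tue"), ("A", "Tue"), ("B", "Mon")]
counts = Counter(observations)  # two-way table as {(product, day): count}


def prop_table(counts, margin=None):
    """Proportions over the whole table (None), rows (1) or columns (2)."""
    if margin is None:
        total = sum(counts.values())
        return {key: n / total for key, n in counts.items()}
    idx = margin - 1  # 1 -> group by row label, 2 -> group by column label
    totals = Counter()
    for key, n in counts.items():
        totals[key[idx]] += n
    return {key: n / totals[key[idx]] for key, n in counts.items()}


row_props = prop_table(counts, 1)  # each product's row sums to 1
col_props = prop_table(counts, 2)  # each day's column sums to 1
print(row_props[("A", "Tue")])  # 3 of A's 4 observations fall on Tue
print(col_props[("A", "Mon")])  # 1 of Monday's 2 observations is A
```

The same counts give different answers depending on the margin, which is the only thing the 1 and 2 in the R calls above change.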
|
List all variables (and their proportions) in a subset of a dataframe
Date : March 29 2020, 07:55 AM
For what you are looking to do, the matrix above with blanks is a fairly clunky style and harder to create. The following might be more useful, and it captures the same information.
Setup data:
library(chron) # provides times()
bout$StartTime <- times(as.character(bout$Time))
bout$EndTime <- bout$StartTime + times("00:00:10")
library(data.table)
# Duration per bout and area, from the earliest StartTime to the latest EndTime
bout.result <- setDT(bout)[order(bout), list(Date = min(Date), Dur = max(EndTime) - min(StartTime)), by = c("bout","area")]
boutPro <- bout.result[, list(boutDur = times(sum(Dur))), by = "bout"]
bout.result <- merge(bout.result,boutPro, by = "bout")
bout.result$prop <- as.numeric(bout.result$Dur/bout.result$boutDur)
bout.result
bout area Date Dur boutDur prop
1: 0 E456 2013-02-02 00:00:20 00:01:20 0.25000000
2: 0 E461 2013-02-02 00:00:10 00:01:20 0.12500000
3: 0 E462 2013-02-02 00:00:10 00:01:20 0.12500000
4: 0 E469 2013-02-02 00:00:10 00:01:20 0.12500000
5: 0 E470 2013-02-02 00:00:10 00:01:20 0.12500000
6: 0 E471 2013-02-02 00:00:10 00:01:20 0.12500000
7: 0 E479 2013-02-02 00:00:10 00:01:20 0.12500000
8: 1 E457 2013-02-02 00:00:40 00:00:50 0.80000000
9: 1 E460 2013-02-02 00:00:10 00:00:50 0.20000000
10: 2 E463 2013-02-02 00:00:20 00:01:00 0.33333333
11: 2 E465 2013-02-02 00:00:30 00:01:00 0.50000000
12: 2 E468 2013-02-02 00:00:10 00:01:00 0.16666667
13: 3 E457 2013-02-02 00:00:20 00:01:50 0.18181818
14: 3 E463 2013-02-02 00:00:40 00:01:50 0.36363636
15: 3 E478 2013-02-02 00:00:10 00:01:50 0.09090909
16: 3 E479 2013-02-02 00:00:40 00:01:50 0.36363636
|
Calculate proportions group-wise from dataframe
Date : March 29 2020, 07:55 AM
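I have a dataframe of word frequencies (Freq) and want each row's proportion within its group (Pred). Sum Freq per group with aggregate(), then divide each row by its group total:
a=aggregate(df$Freq, by=list(df$Pred), FUN=sum)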
a1=a[,2]
names(a1)=as.character(a[,1])
# look up each row's group total by name, which also works when Pred is a factor
df$Props=df$Freq/a1[as.character(df$Pred)]
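The same two steps (total per group, then divide) can be sketched in plain Python. The rows below are invented for illustration; only the column roles (a word, its group Pred, its count Freq) follow the R code above.

```python
from collections import defaultdict

# Hypothetical rows: (word, Pred, Freq). Not the data from the question.
rows = [("cat", "noun", 3), ("dog", "noun", 1), ("run", "verb", 2), ("eat", "verb", 6)]

# Step 1: total Freq per group, the role aggregate(..., FUN=sum) plays in R.
totals = defaultdict(int)
for _word, pred, freq in rows:
    totals[pred] += freq

# Step 2: each row's share of its own group's total.
props = [freq / totals[pred] for _word, pred, freq in rows]
print(props)  # [0.75, 0.25, 0.25, 0.75]
```

Within each group the proportions sum to 1, which is a quick sanity check on the result.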
|
Getting pairwise proportions of concordance in a binary dataframe
Date : March 29 2020, 07:55 AM
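I have a dataframe with binary values. To get the proportion of rows on which each pair of columns agrees, build all pairs of columns with combn() and compare them:
#Get the combinations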
j = combn(x = df, m = 2, simplify = FALSE)
#Get the Proportions
sapply(j, function(x) length(which(x[1] == x[2]))/NROW(x))
#Or pass the function to combn directly
combn(x = df, m = 2, FUN=function(x) length(which(x[1] == x[2]))/NROW(x))
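As a rough cross-check of what the combn() calls compute, here is a plain-Python sketch using itertools.combinations. The three binary columns are hypothetical, and concordance is an illustrative helper name, not something from the original answer.

```python
from itertools import combinations

# Hypothetical binary columns of equal length, not the data from the question.
columns = {
    "a": [1, 0, 1, 1],
    "b": [1, 1, 1, 0],
    "c": [0, 0, 1, 1],
}


def concordance(x, y):
    """Proportion of positions where the two columns hold the same value."""
    return sum(xi == yi for xi, yi in zip(x, y)) / len(x)


# One entry per unordered pair of columns, like combn(df, 2) in R.
pairwise = {
    (p, q): concordance(columns[p], columns[q])
    for p, q in combinations(columns, 2)
}
print(pairwise)  # {('a', 'b'): 0.5, ('a', 'c'): 0.75, ('b', 'c'): 0.25}
```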
|
pandas dataframe row proportions
Tag : python , By : ChristianM
Date : March 29 2020, 07:55 AM
I have a dataframe with multiple columns and rows. Do you mean something like this? First, create test data:
import numpy as np
import pandas as pd
np.random.seed(42)
df = pd.DataFrame(np.random.randint(0, 20, [5, 3]), columns=['A', 'B', 'C'])
A B C
0 6 19 14
1 10 7 6
2 18 10 10
3 3 7 2
4 1 11 5
# Equal weights: halve every value, then sum each pair of consecutive rows
(df*.5).rolling(2).sum()
A B C
0 NaN NaN NaN
1 8.0 13.0 10.0
2 14.0 8.5 8.0
3 10.5 8.5 6.0
4 2.0 9.0 3.5
# Unequal weights: 25% for the earlier row, 75% for the later row
def weighted_mean(arr):
    return sum(arr*[.25, .75])
df.rolling(2).apply(weighted_mean, raw=True)
A B C
0 NaN NaN NaN
1 9.00 10.00 8.00
2 16.00 9.25 9.00
3 6.75 7.75 4.00
4 1.50 10.00 4.25
# General version: weights are normalized, so they need not sum to 1
def weighted_mean(arr, weights=[.5, .5]):
    return sum(arr*weights/sum(weights))
df.rolling(2).apply(weighted_mean, raw=True)
A B C
0 NaN NaN NaN
1 8.0 13.0 10.0
2 14.0 8.5 8.0
3 10.5 8.5 6.0
4 2.0 9.0 3.5
df.rolling(2).apply(weighted_mean, raw=True, args=[[.25, .75]])
A B C
0 NaN NaN NaN
1 9.00 10.00 8.00
2 16.00 9.25 9.00
3 6.75 7.75 4.00
4 1.50 10.00 4.25
# [1, 3] normalizes to [.25, .75], so the result matches the previous call
df.rolling(2).apply(weighted_mean, raw=True, args=[[1, 3]])
A B C
0 NaN NaN NaN
1 9.00 10.00 8.00
2 16.00 9.25 9.00
3 6.75 7.75 4.00
4 1.50 10.00 4.25
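To double-check the numbers without pandas, the weighted rolling mean can be recomputed by hand. This is only a sketch of the arithmetic: rolling_weighted_mean is a hypothetical helper, None stands in for the NaN that pandas shows for incomplete windows, and col_a is column A of the test data above.

```python
def rolling_weighted_mean(values, weights):
    """Weighted mean over a sliding window as long as `weights`."""
    n = len(weights)
    total = sum(weights)  # normalizing makes [1, 3] equivalent to [.25, .75]
    out = [None] * (n - 1)  # incomplete windows at the start
    for i in range(n - 1, len(values)):
        window = values[i - n + 1 : i + 1]
        out.append(sum(v * w for v, w in zip(window, weights)) / total)
    return out


col_a = [6, 10, 18, 3, 1]  # column A of the test data
print(rolling_weighted_mean(col_a, [0.5, 0.5]))  # [None, 8.0, 14.0, 10.5, 2.0]
print(rolling_weighted_mean(col_a, [1, 3]))      # [None, 9.0, 16.0, 6.75, 1.5]
```

Both lines agree with column A in the corresponding pandas outputs, which confirms that only the ratio between the weights matters once they are divided by their sum.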
|