
Creating a Dataframe of Proportions


Tag : python , By : unadopted
Date : November 27 2020, 04:01 AM

Question: I have a list of lists like the one below and want a dataframe of the proportion of each code per person. Answer, using pandas:
import pandas as pd

data = [
    ['person_a', 'code_1'],
    ['person_a', 'code_2'],
    ['person_a', 'code_3'],
    ['person_b', 'code_1'],
    ['person_b', 'code_1'],
    ['person_b', 'code_1'],
    ['person_a', 'code_4'],
    ['person_b', 'code_2']]

df = pd.DataFrame(data, columns=['person', 'code'])

df = df.assign(relative_frequency=1).groupby(['person', 'code']).count().unstack()
# >>> df
#          relative_frequency                     
# code                 code_1 code_2 code_3 code_4
# person                                          
# person_a                  1      1      1      1
# person_b                  3      1    NaN    NaN

proportions = df.div(df.sum(axis=1), axis=0)  # divide each count by its row (person) total
# >>> proportions
#          relative_frequency                     
# code                 code_1 code_2 code_3 code_4
# person                                          
# person_a               0.25   0.25   0.25   0.25
# person_b               0.75   0.25    NaN    NaN
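A shorter route, offered here as an alternative sketch rather than the answer's own method, is to let groupby(...).value_counts(normalize=True) compute each code's share within each person and then unstack the result. Passing fill_value=0 reports person/code combinations that never occur as 0 instead of NaN. The data list is repeated so the block runs on its own.

```python
import pandas as pd

data = [
    ['person_a', 'code_1'], ['person_a', 'code_2'], ['person_a', 'code_3'],
    ['person_b', 'code_1'], ['person_b', 'code_1'], ['person_b', 'code_1'],
    ['person_a', 'code_4'], ['person_b', 'code_2']]
df = pd.DataFrame(data, columns=['person', 'code'])

# Share of each code within each person; absent (person, code) pairs become 0.
props = (df.groupby('person')['code']
           .value_counts(normalize=True)
           .unstack(fill_value=0))
print(props)
```

Each row of props sums to 1, matching the relative_frequency table above except that the missing entries are 0.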

R: reshaping a dataframe and creating proportions


Tag : r , By : littlefuzz
Date : March 29 2020, 07:55 AM
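This is a straightforward table operation with prop.table(..., margin). The margin argument chooses what the proportions are relative to: prop.table(x, 1) divides each cell by its row total, 2 by its column total, 3 by its stratum total (the third dimension of a 3-way table), and so on; leaving margin out divides by the grand total of the whole table.
Also, convert the result with as.data.frame.matrix rather than data.frame: data.frame turns a table into long format (one row per cell), whereas as.data.frame.matrix keeps the wide layout, so no reshape step is needed: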
as.data.frame.matrix(prop.table(with(df,table(Product,Day)),1))
#     Friday    Monday  Saturday    Sunday  Thursday   Tuesday Wednesday
#A 0.1666667 0.1666667 0.1666667 0.0000000 0.1666667 0.1666667 0.1666667
#B 0.0000000 0.3333333 0.0000000 0.0000000 0.0000000 0.3333333 0.3333333
#C 0.0000000 0.3333333 0.3333333 0.3333333 0.0000000 0.0000000 0.0000000

as.data.frame.matrix(prop.table(with(df,table(Product,Day)),2))
#  Friday Monday  Saturday Sunday Thursday   Tuesday Wednesday
#A      1   0.50 0.6666667      0        1 0.6666667 0.6666667
#B      0   0.25 0.0000000      0        0 0.3333333 0.3333333
#C      0   0.25 0.3333333      1        0 0.0000000 0.0000000
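For comparison in pandas, pd.crosstab has a normalize argument that plays the role of prop.table's margin: 'index' for row proportions (margin 1), 'columns' for column proportions (margin 2), and True or 'all' for the whole table. The question's data frame is not shown, so the long-format rows below are hypothetical, with counts chosen only so that the proportions match the R output above.

```python
import pandas as pd

# Hypothetical long-format data: one row per (Product, Day) observation.
# Counts are chosen to reproduce the proportions shown above.
days_a = ['Friday', 'Monday', 'Saturday', 'Thursday', 'Tuesday', 'Wednesday']
rows = ([('A', d) for d in days_a for _ in range(2)]
        + [('B', d) for d in ['Monday', 'Tuesday', 'Wednesday']]
        + [('C', d) for d in ['Monday', 'Saturday', 'Sunday']])
df = pd.DataFrame(rows, columns=['Product', 'Day'])

# Each row sums to 1, like prop.table(..., 1)
by_row = pd.crosstab(df['Product'], df['Day'], normalize='index')
# Each column sums to 1, like prop.table(..., 2)
by_col = pd.crosstab(df['Product'], df['Day'], normalize='columns')
print(by_row.round(7))
print(by_col.round(7))
```

crosstab sorts the day labels alphabetically, which happens to give the same column order as the R table.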

List all variables (and their proportions) in a subset of a dataframe


Tag : r , By : mansoor
Date : March 29 2020, 07:55 AM
For what you are trying to do, the matrix with blanks shown in the question is a fairly clunky layout and harder to create. The long format below is probably more useful and captures the same information.
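Code (bout is the question's data frame, with columns bout, area, Date and Time):
library(chron)        # provides times()
library(data.table)
# Assumes each Time marks the start of a 10-second observation
bout$StartTime <- times(as.character(bout$Time))
bout$EndTime <- bout$StartTime + times("00:00:10")
# One row per (bout, area): first date and time spent in that area
bout.result <- setDT(bout)[order(bout), list(Date = min(Date), Dur = max(EndTime) - min(StartTime)), by = c("bout", "area")]
# Total duration of each bout, merged back so each area's share can be computed
boutPro <- bout.result[, list(boutDur = times(sum(Dur))), by = "bout"]
bout.result <- merge(bout.result, boutPro, by = "bout")
bout.result$prop <- as.numeric(bout.result$Dur / bout.result$boutDur)
bout.result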

    bout area       Date      Dur  boutDur       prop
 1:    0 E456 2013-02-02 00:00:20 00:01:20 0.25000000
 2:    0 E461 2013-02-02 00:00:10 00:01:20 0.12500000
 3:    0 E462 2013-02-02 00:00:10 00:01:20 0.12500000
 4:    0 E469 2013-02-02 00:00:10 00:01:20 0.12500000
 5:    0 E470 2013-02-02 00:00:10 00:01:20 0.12500000
 6:    0 E471 2013-02-02 00:00:10 00:01:20 0.12500000
 7:    0 E479 2013-02-02 00:00:10 00:01:20 0.12500000
 8:    1 E457 2013-02-02 00:00:40 00:00:50 0.80000000
 9:    1 E460 2013-02-02 00:00:10 00:00:50 0.20000000
10:    2 E463 2013-02-02 00:00:20 00:01:00 0.33333333
11:    2 E465 2013-02-02 00:00:30 00:01:00 0.50000000
12:    2 E468 2013-02-02 00:00:10 00:01:00 0.16666667
13:    3 E457 2013-02-02 00:00:20 00:01:50 0.18181818
14:    3 E463 2013-02-02 00:00:40 00:01:50 0.36363636
15:    3 E478 2013-02-02 00:00:10 00:01:50 0.09090909
16:    3 E479 2013-02-02 00:00:40 00:01:50 0.36363636
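A pandas sketch of the same long-format idea, assuming the per-area durations have already been computed: the dur_s column (seconds) is a hypothetical name, and its values are taken from bouts 1 and 2 of the output above. It mirrors the boutPro/merge steps: total duration per bout, joined back onto each area row, then divided.

```python
import pandas as pd

# Hypothetical per-area durations in seconds (values from bouts 1 and 2 above).
result = pd.DataFrame({
    'bout':  [1, 1, 2, 2, 2],
    'area':  ['E457', 'E460', 'E463', 'E465', 'E468'],
    'dur_s': [40, 10, 20, 30, 10],
})

# Total duration per bout (the equivalent of boutPro above) ...
totals = (result.groupby('bout', as_index=False)['dur_s'].sum()
                .rename(columns={'dur_s': 'bout_dur_s'}))
# ... merged back so every area row carries its bout's total.
result = result.merge(totals, on='bout')
result['prop'] = result['dur_s'] / result['bout_dur_s']
print(result)
```

The merge keeps the totals visible as a column, like boutDur in the R output; if only the proportions are needed, groupby('bout')['dur_s'].transform('sum') gives the same denominator without a join.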

Calculate proportions group-wise from dataframe


Tag : r , By : UpperLuck
Date : March 29 2020, 07:55 AM
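Question: I have a data frame of word frequencies, with a count column Freq and a grouping column Pred, and want each frequency as a proportion of its group's total. Answer: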
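a <- aggregate(df$Freq, by = list(df$Pred), FUN = sum)   # total Freq per Pred group
a1 <- a[, 2]
names(a1) <- as.character(a[, 1])
# Look totals up by name; indexing with a factor would use its integer codes instead
df$Props <- df$Freq / a1[as.character(df$Pred)]
# Equivalent one-liner: df$Props <- df$Freq / ave(df$Freq, df$Pred, FUN = sum)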
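In pandas, the same group-wise proportion is usually written with groupby(...).transform('sum'), which returns each row's group total aligned to the original index, so no lookup by name is needed. The column names Pred and Freq follow the R code; the Word column and all values are made up for illustration.

```python
import pandas as pd

# Hypothetical word-frequency table; Pred is the grouping column, as in the R code.
df = pd.DataFrame({
    'Pred': ['a', 'a', 'b', 'b', 'b'],
    'Word': ['x', 'y', 'x', 'y', 'z'],
    'Freq': [3, 1, 2, 2, 4],
})

# Divide each frequency by the total frequency of its Pred group.
df['Props'] = df['Freq'] / df.groupby('Pred')['Freq'].transform('sum')
print(df)
```

Here group a totals 4 and group b totals 8, so Props is 0.75, 0.25, 0.25, 0.25, 0.5.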

Getting pairwise proportions of concordance in a binary dataframe


Tag : r , By : jgood
Date : March 29 2020, 07:55 AM
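Question: I have a data frame of binary values and want, for each pair of columns, the proportion of rows on which the two columns agree (their concordance). Answer: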
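# All pairs of columns; each element is a two-column data frame
j = combn(x = df, m = 2, simplify = FALSE)

# Proportion of rows on which the two columns agree
sapply(j, function(x) length(which(x[1] == x[2]))/NROW(x))

# Or in one step, using combn's FUN argument
combn(x = df, m = 2, FUN=function(x) length(which(x[1] == x[2]))/NROW(x))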
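The same pairwise concordance in Python, as a sketch: itertools.combinations enumerates the column pairs in the same order as combn, and the mean of a boolean comparison is the proportion of agreeing rows. The binary data frame below is hypothetical, since the question's data is not shown.

```python
from itertools import combinations

import pandas as pd

# Hypothetical binary data.
df = pd.DataFrame({
    'v1': [1, 0, 1, 1],
    'v2': [1, 1, 1, 0],
    'v3': [0, 0, 1, 1],
})

# For every pair of columns, the share of rows where the two columns agree.
concordance = {
    (a, b): (df[a] == df[b]).mean()
    for a, b in combinations(df.columns, 2)
}
print(concordance)
```

For this data, v1 and v2 agree on 2 of 4 rows (0.5), v1 and v3 on 3 of 4 (0.75), and v2 and v3 on 1 of 4 (0.25).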

pandas dataframe row proportions


Tag : python , By : ChristianM
Date : March 29 2020, 07:55 AM
Question: I have a dataframe with multiple columns and rows. Answer: Do you mean something like this?
First, create some test data:
import numpy as np
import pandas as pd

np.random.seed(42)

df = pd.DataFrame(np.random.randint(0, 20, [5, 3]), columns=['A', 'B', 'C'])

    A   B   C
0   6  19  14
1  10   7   6
2  18  10  10
3   3   7   2
4   1  11   5

To average each row with the previous one (equal weights of 0.5):
(df*.5).rolling(2).sum()

      A     B     C
0   NaN   NaN   NaN
1   8.0  13.0  10.0
2  14.0   8.5   8.0
3  10.5   8.5   6.0
4   2.0   9.0   3.5
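
For unequal weights, apply a custom function to each 2-row window. With raw=True, arr is a NumPy array holding the previous row's value first and the current row's value second:
def weighted_mean(arr):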
    return sum(arr*[.25, .75])

df.rolling(2).apply(weighted_mean, raw=True)

       A      B     C
0    NaN    NaN   NaN
1   9.00  10.00  8.00
2  16.00   9.25  9.00
3   6.75   7.75  4.00
4   1.50  10.00  4.25
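
To pass the weights as a parameter, normalize them so they sum to 1; the default [.5, .5] reproduces the equal-weight average:
def weighted_mean(arr, weights=[.5, .5]):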
    return sum(arr*weights/sum(weights))
df.rolling(2).apply(weighted_mean, raw=True)

      A     B     C
0   NaN   NaN   NaN
1   8.0  13.0  10.0
2  14.0   8.5   8.0
3  10.5   8.5   6.0
4   2.0   9.0   3.5

Other weights are passed through args:
df.rolling(2).apply(weighted_mean, raw=True, args=[[.25, .75]])

       A      B     C
0    NaN    NaN   NaN
1   9.00  10.00  8.00
2  16.00   9.25  9.00
3   6.75   7.75  4.00
4   1.50  10.00  4.25

Because the weights are normalized, [1, 3] gives the same result as [.25, .75]:
df.rolling(2).apply(weighted_mean, raw=True, args=[[1, 3]])

       A      B     C
0    NaN    NaN   NaN
1   9.00  10.00  8.00
2  16.00   9.25  9.00
3   6.75   7.75  4.00
4   1.50  10.00  4.25
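Because the window is only two rows wide, the same weighted combination can also be written without rolling(...).apply: shift the frame down one row, so each row lines up with its predecessor, and add the two weighted copies. This is a sketch rather than part of the answer above; combine_consecutive is a hypothetical helper name. It avoids calling a Python function once per window, which matters on large frames.

```python
import numpy as np
import pandas as pd

def combine_consecutive(df, weights=(0.5, 0.5)):
    """Weighted combination of each row with the previous row (hypothetical helper).

    weights[0] applies to the previous row and weights[1] to the current row;
    they are normalized to sum to 1, as in weighted_mean above.
    """
    w_prev, w_curr = np.asarray(weights, dtype=float) / sum(weights)
    # shift(1) moves every row down one position, so row i is aligned with row i-1;
    # the first row has no predecessor and becomes NaN, like the rolling results.
    return df.shift(1) * w_prev + df * w_curr

np.random.seed(42)
df = pd.DataFrame(np.random.randint(0, 20, [5, 3]), columns=['A', 'B', 'C'])
print(combine_consecutive(df, [1, 3]))
```

With the default weights this matches (df*.5).rolling(2).sum(), and with [1, 3] it matches the weighted_mean results.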
© scrbit.com