How to groupby and sum if the cell value of certain columns fit specific conditions


Tag : python , By : UpperLuck
Date : November 28 2020, 04:01 AM

Sometimes it is clearest to join additional series to your dataframe, then group by:
import pandas as pd

df = pd.DataFrame({'NAME': ['CityA', 'CityB', 'CityA', 'CityB', 'CityA', 'CityB'],
                   'FATAL#': [5, 5, 3, 3, 3, 2],
                   'INJURY#': [1, 1, 1, 1, 0, 2],
                   'ALCOHOL': [0, 0, 1, 1, 1, 0],
                   'CELL': [0, 1, 0, 0, 0, 0]})

# construct fatals dataframe and join
fatals = df.iloc[:, -2:].mul(df['FATAL#'], axis=0).add_prefix('FATAL_')
df = df.join(fatals)

# define columns to sum and groupby
sum_cols = ['FATAL#', 'INJURY#'] + df.columns[-2:].tolist()
res = df.groupby('NAME')[sum_cols].sum().reset_index()

print(res)

    NAME  FATAL#  INJURY#  FATAL_ALCOHOL  FATAL_CELL
0  CityA      11        2              6           0
1  CityB      10        4              3           5
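
The answer above picks the flag columns by position (`iloc[:, -2:]` and `columns[-2:]`), which depends on column order. As a minimal sketch of the same idea, assuming the flag columns are always named ALCOHOL and CELL, the columns can be selected by name instead. The variable names `flags` and `weighted` are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({'NAME': ['CityA', 'CityB', 'CityA', 'CityB', 'CityA', 'CityB'],
                   'FATAL#': [5, 5, 3, 3, 3, 2],
                   'INJURY#': [1, 1, 1, 1, 0, 2],
                   'ALCOHOL': [0, 0, 1, 1, 1, 0],
                   'CELL': [0, 1, 0, 0, 0, 0]})

# 0/1 flag columns, chosen by name rather than by position
flags = ['ALCOHOL', 'CELL']

# fatalities count only on rows where the flag is 1
weighted = df[flags].mul(df['FATAL#'], axis=0).add_prefix('FATAL_')

# put the plain and the conditional columns side by side, then sum per city
res = (pd.concat([df[['NAME', 'FATAL#', 'INJURY#']], weighted], axis=1)
         .groupby('NAME', as_index=False)
         .sum())

print(res)
```

This should reproduce the table shown above without modifying `df` in place.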


pandas add columns conditions with groupby and on another column values


Tag : python , By : user177837
Date : March 29 2020, 07:55 AM
I have a pandas.DataFrame called companysubset like the one below, but the actual data is much longer. The age column can be generated with:
df.set_index(['conm'], inplace=True)
df['age'] = df.groupby(level=0).apply(
    lambda x: max(x.fyear) - round(x.ipodate.iloc[0]/10000-0.5))

Complete example with sample data:
from io import StringIO
import pandas as pd

df = pd.read_fwf(StringIO(
    u"""
        ID      conm                  fyear   ipodate
        46078   CAESARS ENTERTAINMENT 2003    19891213.0
        46079   CAESARS ENTERTAINMENT 2004    19891213.0
        46080   CAESARS ENTERTAINMENT 2005    19891213.0
        46091   CAESARS ENTERTAINMENT 2016    19891213.0
        114620  CAESARSTONE LTD       2010    20120322.0
        114621  CAESARSTONE LTD       2011    20120322.0
        114622  CAESARSTONE LTD       2012    20120322.0
        114623  CAESARSTONE LTD       2013    20120322.0
        114624  CAESARSTONE LTD       2014    20120322.0
        114625  CAESARSTONE LTD       2015    20120322.0
        114626  CAESARSTONE LTD       2016    20120322.0
        132524  CAFEPRESS INC         2010    20120329.0
        132525  CAFEPRESS INC         2011    20120329.0
        132526  CAFEPRESS INC         2012    20120329.0
        132527  CAFEPRESS INC         2013    20120329.0
        132528  CAFEPRESS INC         2014    20120329.0
        132529  CAFEPRESS INC         2015    20120329.0
        132530  CAFEPRESS INC         2016    20120329.0
        120049  CAI INTERNATIONAL INC 2005    20070516.0
        120050  CAI INTERNATIONAL INC 2006    20070516.0
        3897    CALAMP CORP           2000    NaN
        3898    CALAMP CORP           2001    NaN
        3896    CALAMP CORP           1999    NaN
        3899    CALAMP CORP           2002    NaN
        21120   CALATLANTIC GROUP INC 1995    NaN
        21121   CALATLANTIC GROUP INC 1996    NaN
        21122   CALATLANTIC GROUP INC 1997    NaN
        21123   CALATLANTIC GROUP INC 1998    NaN
        21124   CALATLANTIC GROUP INC 1999    NaN
        21125   CALATLANTIC GROUP INC 2000    NaN
        21126   CALATLANTIC GROUP INC 2001    NaN
        21127   CALATLANTIC GROUP INC 2002    NaN
        21128   CALATLANTIC GROUP INC 2003    NaN"""),
    header=1)

df.set_index(['conm'], inplace=True)
df['age'] = df.groupby(level=0).apply(
    lambda x: max(x.fyear) - round(x.ipodate.iloc[0]/10000-0.5))
print(df)
                           ID  fyear     ipodate   age
conm                                                  
CAESARS ENTERTAINMENT   46078   2003  19891213.0  27.0
CAESARS ENTERTAINMENT   46079   2004  19891213.0  27.0
CAESARS ENTERTAINMENT   46080   2005  19891213.0  27.0
CAESARS ENTERTAINMENT   46091   2016  19891213.0  27.0
CAESARSTONE LTD        114620   2010  20120322.0   4.0
CAESARSTONE LTD        114621   2011  20120322.0   4.0
CAESARSTONE LTD        114622   2012  20120322.0   4.0
CAESARSTONE LTD        114623   2013  20120322.0   4.0
CAESARSTONE LTD        114624   2014  20120322.0   4.0
CAESARSTONE LTD        114625   2015  20120322.0   4.0
CAESARSTONE LTD        114626   2016  20120322.0   4.0
CAFEPRESS INC          132524   2010  20120329.0   4.0
CAFEPRESS INC          132525   2011  20120329.0   4.0
CAFEPRESS INC          132526   2012  20120329.0   4.0
CAFEPRESS INC          132527   2013  20120329.0   4.0
CAFEPRESS INC          132528   2014  20120329.0   4.0
CAFEPRESS INC          132529   2015  20120329.0   4.0
CAFEPRESS INC          132530   2016  20120329.0   4.0
CAI INTERNATIONAL INC  120049   2005  20070516.0  -1.0
CAI INTERNATIONAL INC  120050   2006  20070516.0  -1.0
CALAMP CORP              3897   2000         NaN   NaN
CALAMP CORP              3898   2001         NaN   NaN
CALAMP CORP              3896   1999         NaN   NaN
CALAMP CORP              3899   2002         NaN   NaN
CALATLANTIC GROUP INC   21120   1995         NaN   NaN
CALATLANTIC GROUP INC   21121   1996         NaN   NaN
CALATLANTIC GROUP INC   21122   1997         NaN   NaN
CALATLANTIC GROUP INC   21123   1998         NaN   NaN
CALATLANTIC GROUP INC   21124   1999         NaN   NaN
CALATLANTIC GROUP INC   21125   2000         NaN   NaN
CALATLANTIC GROUP INC   21126   2001         NaN   NaN
CALATLANTIC GROUP INC   21127   2002         NaN   NaN
CALATLANTIC GROUP INC   21128   2003         NaN   NaN
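
A possible variation, sketched here on a few rows taken from the sample data, avoids `set_index` and `apply` by using `transform('max')`. It assumes that `ipodate` is constant within each company and stored as a YYYYMMDD number, so floor division by 10000 gives the IPO year (the same value the `round(... - 0.5)` expression yields for these dates). The name `ipo_year` is illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "conm": ["CAESARS ENTERTAINMENT"] * 2
            + ["CAI INTERNATIONAL INC"] * 2
            + ["CALAMP CORP"] * 2,
    "fyear": [2003, 2016, 2005, 2006, 1999, 2002],
    "ipodate": [19891213.0, 19891213.0, 20070516.0, 20070516.0, np.nan, np.nan],
})

# YYYYMMDD // 10000 keeps only the year; NaN stays NaN
ipo_year = df["ipodate"] // 10000

# latest fiscal year per company, broadcast back to every row of that company
df["age"] = df.groupby("conm")["fyear"].transform("max") - ipo_year

print(df)
```

Because `transform` returns a result aligned with the original rows, no index manipulation is needed before the assignment.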

Pandas Groupby Lambda function multiple conditions/columns


Tag : python , By : koder
Date : March 29 2020, 07:55 AM
I am trying to create a new column that groups df by Deal and Month, and applies a percentage (9%) to the Amount column. If all the Amount values for a particular Deal in a particular month add up to 20,000, then apply the percentage to the Amount; otherwise, if the TYPE is MONTHLY and the individual Amount is at least 1500, apply the percentage to the Amount; failing that, multiply by 0.

I tried to translate your description into this:
df['Sum'] = df.groupby(['Deal','Month'])['Amount'].transform('sum')

df['Desired Column'] = np.where(
    df['Sum'] > 20000,
    df['Sum'] * 0.09,
    np.where((df['Amount'] >= 1500) & (df['TYPE'] == 'MONTHLY'),
             df['Amount'] * 0.09,
             0))
   Deal     TYPE      Month    Amount       Sum  Desired Column
0     A   ANNUAL      April  10021.34  10057.20          0.0000
1     A  MONTHLY      April     35.86  10057.20          0.0000
2     B  MONTHLY      April  11150.05  11150.05       1003.5045
3     B   ANNUAL       July    661.65    661.65          0.0000
4     B   ANNUAL     August    303.63    303.63          0.0000
5     C   ANNUAL      April  25624.59  25624.59       2306.2131
6     D   ANNUAL       June  27309.26  27309.26       2457.8334
7     D   ANNUAL       July      0.00      0.00          0.0000
8     D   ANNUAL     August      0.00      0.00          0.0000
9     E   ANNUAL      April     10.65     10.65          0.0000
10    E  MONTHLY        May      0.00  18716.70          0.0000
11    E   ANNUAL        May  18716.70  18716.70          0.0000
12    E  MONTHLY       June      0.00    606.49          0.0000
13    E   ANNUAL       June    606.49    606.49          0.0000
14    E  MONTHLY       July      0.00   8890.17          0.0000
15    E  MONTHLY       July   8890.17   8890.17        800.1153
16    E  MONTHLY     August   4000.00  18000.00        360.0000
17    E   ANNUAL     August  14000.00  18000.00          0.0000
18    E   ANNUAL  September   2157.34   2157.34          0.0000
19    E   ANNUAL    October   3025.24   3025.24          0.0000
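
Nested `np.where` calls get hard to read as conditions accumulate. The following is a sketch of the same first-match logic written with `np.select`, run on a handful of rows copied from the table above. It follows the answer's code (9% of the group Sum when the Sum exceeds 20000), which may or may not be exactly what the question intended; the names `conditions` and `choices` are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Deal": ["B", "C", "E", "E", "E", "E"],
    "TYPE": ["MONTHLY", "ANNUAL", "MONTHLY", "ANNUAL", "MONTHLY", "MONTHLY"],
    "Month": ["April", "April", "August", "August", "July", "July"],
    "Amount": [11150.05, 25624.59, 4000.00, 14000.00, 0.00, 8890.17],
})

# total Amount per (Deal, Month), repeated on every row of the group
df["Sum"] = df.groupby(["Deal", "Month"])["Amount"].transform("sum")

# conditions are checked in order; the first one that holds picks the value
conditions = [
    df["Sum"] > 20000,
    (df["TYPE"] == "MONTHLY") & (df["Amount"] >= 1500),
]
choices = [
    df["Sum"] * 0.09,
    df["Amount"] * 0.09,
]
df["Desired Column"] = np.select(conditions, choices, default=0.0)

print(df)
```

Adding a further rule then means appending one entry to each list rather than nesting another call.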

Group by 2 columns simultaneously while add some conditions to the groupby


Tag : python , By : jch
Date : March 29 2020, 07:55 AM
I want to group my data by the Set and Parts columns. If they have the same parts, then group them all together. Please see the output column. I want to write a Python script to generate exactly what the output column shows.

This is more like a network problem:
import networkx as nx
from collections import ChainMap

# build a graph linking each Set to its Parts, then find the connected components
G = nx.from_pandas_edgelist(df, 'Set', 'Parts')
l = list(nx.connected_components(G))
# c1: the Set names in each component; c2: that component's Parts joined into one string
c1 = [[y for y in x if y in df['Set'].tolist()] for x in l]
c2 = [','.join(set([y for y in x if y in df['Parts'].tolist()])) for x in l]

# map every Set name to the Parts string of its component
df.Set.map(dict(ChainMap(*map(dict.fromkeys, c1, c2))))
Out[167]: 
0     f,a,b,c,d,g,e
1     f,a,b,c,d,g,e
2     f,a,b,c,d,g,e
3     f,a,b,c,d,g,e
4     f,a,b,c,d,g,e
5     f,a,b,c,d,g,e
6     f,a,b,c,d,g,e
7     f,a,b,c,d,g,e
8     f,a,b,c,d,g,e
9     f,a,b,c,d,g,e
10                z
11              u,y
12              u,y
13              u,y
Name: Set, dtype: object
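
The question's data is not shown, so here is a small self-contained sketch of the connected-components idea on hypothetical data. To keep it free of extra dependencies it swaps networkx for a plain union-find; this is a substitute technique, not the answer's code. The helpers `find` and `union`, the `set:`/`part:` node prefixes, and the column names `root` and `group` are all made up for illustration.

```python
import pandas as pd

# hypothetical data: s1 and s2 share part "b"; s3 and s4 share part "c"
df = pd.DataFrame({
    "Set": ["s1", "s1", "s2", "s3", "s4"],
    "Parts": ["a", "b", "b", "c", "c"],
})

parent = {}

def find(node):
    """Return the representative of the component that contains node."""
    parent.setdefault(node, node)
    while parent[node] != node:
        parent[node] = parent[parent[node]]  # shorten the path as we walk up
        node = parent[node]
    return node

def union(a, b):
    """Merge the components that contain a and b."""
    parent[find(a)] = find(b)

# every row is an edge between a Set node and a Parts node;
# the prefixes keep a Set and a Part with the same label apart
for s, p in zip(df["Set"], df["Parts"]):
    union(f"set:{s}", f"part:{p}")

# rows whose Set ends up under the same root belong to one group
df["root"] = [find(f"set:{s}") for s in df["Set"]]
df["group"] = df.groupby("root")["Parts"].transform(
    lambda parts: ",".join(sorted(set(parts))))
df = df.drop(columns="root")

print(df)
```

Sorting the parts before joining makes the group label deterministic, unlike joining an unordered `set` directly.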

How to groupby one column with 3 conditions in multiple columns


Tag : python-3.x , By : user186876
Date : March 29 2020, 07:55 AM
I have a dataframe where I need to apply the condition below.

Try using the rank function:
data['Rank'] = data.groupby('Temp')['output'].rank(method='dense',ascending=True)
data['Final'] = data.groupby('Temp')['Rank'].rank(method='first',ascending=True)
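
The question's dataframe is not shown, so the following sketch uses hypothetical values for `Temp` and `output` to illustrate what the two calls do: `method='dense'` gives tied values the same rank with no gaps, and `method='first'` then breaks those ties by order of appearance within each group.

```python
import pandas as pd

# hypothetical data: two groups, with one tie (10, 10) in the first group
data = pd.DataFrame({
    "Temp": [1, 1, 1, 2, 2],
    "output": [10, 20, 10, 5, 7],
})

# dense rank per group: ties share a rank, the next rank follows without a gap
data["Rank"] = data.groupby("Temp")["output"].rank(method="dense", ascending=True)

# rank the ranks again; ties are now ordered by their position in the frame
data["Final"] = data.groupby("Temp")["Rank"].rank(method="first", ascending=True)

print(data)
```

In the first group the two rows with `output` 10 both get `Rank` 1.0, while `Final` numbers them 1.0 and 2.0 in row order.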

Pandas - groupby columns with conditions from another column


Tag : python , By : Pancilobak
Date : March 29 2020, 07:55 AM
1. Set id and trigger as the index.
2. Since the index contains duplicate entries, append another index level with the groupwise cumcount, so that df has a MultiIndex with three levels in total.
3. Unstack on timestamp.
4. Find the difference between the columns in hours and assign the result back.
df['timestamp'] = pd.to_datetime(df['timestamp']) # if necessary

i = df.groupby(['id', 'trigger']).cumcount()
df.set_index(['id', i, 'trigger']).timestamp.unstack().assign(
       diff=lambda d: d.ended.sub(d.started).dt.total_seconds() / 3600
)

                  timestamp                      diff
trigger               ended             started      
id                                                   
1  0    2017-10-04 12:00:01 2017-10-01 14:00:01  70.0
   1    2017-10-05 16:00:01 2017-10-03 11:00:01  53.0
2  0    2017-10-04 12:00:01 2017-10-02 10:00:01  50.0
   1    2017-10-05 17:00:01 2017-10-05 15:00:01   2.0
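
The steps above can be sketched end to end as follows. The input frame is reconstructed from the timestamps in the output shown, and the long layout (one row per id and trigger) is an assumption since the question's data is not included. The helper column `pair` and the name `wide` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "id": [1, 1, 1, 1, 2, 2, 2, 2],
    "trigger": ["started", "ended", "started", "ended"] * 2,
    "timestamp": [
        "2017-10-01 14:00:01", "2017-10-04 12:00:01",
        "2017-10-03 11:00:01", "2017-10-05 16:00:01",
        "2017-10-02 10:00:01", "2017-10-04 12:00:01",
        "2017-10-05 15:00:01", "2017-10-05 17:00:01",
    ],
})
df["timestamp"] = pd.to_datetime(df["timestamp"])

# number the repeated (id, trigger) rows so each started/ended pair gets its own key
df["pair"] = df.groupby(["id", "trigger"]).cumcount()

# one row per (id, pair), with the trigger values spread into columns
wide = df.set_index(["id", "pair", "trigger"])["timestamp"].unstack()

# elapsed time between started and ended, in hours
wide["diff"] = (wide["ended"] - wide["started"]).dt.total_seconds() / 3600

print(wide)
```

Without the `pair` level the index would contain duplicates and `unstack` would raise an error, which is why the cumcount step comes first.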