Python/Pandas: combine columns from 2 dataframes based on match of values between columns, but can't use merge
Date : March 29 2020, 07:55 AM
I have two dataframes and I need to separate the rows where a value from pmdf matches one of the codes in jcrdf.All_codes.
One way would be to expand the jcrdf All_codes column into one row per code and then use merge:
jcrdf_temp = jcrdf.set_index(['jcode', 'jobtitle', 'location']).All_codes.str.split(',',expand = True)\
.stack().reset_index(3,drop = True).reset_index(name = 'All_codes')
new_df = pd.merge(pmdf, jcrdf_temp, left_on = 'code', right_on = 'All_codes')
code count jcode jobtitle location All_codes
0 0567-8315 6 3333-3333 technician loc3 0567-8315
1 0567-8315 6 3333-3333 technician loc3 0567-8315
2 0007-4977 7 2222-2222 noob loc4 0007-4977
3 0007-4977 7 2222-2222 noob loc4 0007-4977
4 0096-0225 10 4444-4444 manager loc1 0096-0225
5 0096-0225 10 4444-4444 manager loc1 0096-0225
6 1365-2133 2 1111-1111 retiree loc2 1365-2133
new_df = new_df.drop(columns='All_codes').groupby(['jcode', 'jobtitle', 'count', 'location']).code.apply(','.join).reset_index()
jcode jobtitle count location code
0 1111-1111 retiree 2 loc2 1365-2133
1 2222-2222 noob 7 loc4 0007-4977,0007-4977
2 3333-3333 technician 6 loc3 0567-8315,0567-8315
3 4444-4444 manager 10 loc1 0096-0225,0096-0225
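On newer pandas versions (0.25 or later), the same "expand, then merge" idea can probably be written more compactly with explode. The following is only a sketch: the sample rows are hypothetical, and only the column names are taken from the question.

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
pmdf = pd.DataFrame({
    "code": ["0567-8315", "1365-2133", "9999-9999"],
    "count": [6, 2, 1],
})
jcrdf = pd.DataFrame({
    "jcode": ["3333-3333", "1111-1111"],
    "jobtitle": ["technician", "retiree"],
    "location": ["loc3", "loc2"],
    "All_codes": ["0567-8315,0000-0001", "1365-2133"],
})

# Split the comma-separated string into a list, then emit one row per code.
jcrdf_long = (
    jcrdf.assign(All_codes=jcrdf["All_codes"].str.split(","))
         .explode("All_codes")
)

# The default inner merge keeps only pmdf rows whose code appears in All_codes.
matched = pmdf.merge(jcrdf_long, left_on="code", right_on="All_codes")
print(matched)
```

The pmdf row with code 9999-9999 has no partner in jcrdf, so it is dropped by the inner merge.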
|
TypeError from merge pandas DataFrame on columns when item in columns is list
Date : March 29 2020, 07:55 AM
I got TypeError: type object argument after * must be an iterable, not itertools.imap when calling pd.merge on two dataframes, df1 and df_iden, on the column 'allmzidx', whose values are lists.
I think you need to convert the lists to tuples (tuples are hashable, lists are not):
df1['allmzidx'] = df1['allmzidx'].apply(tuple)
for index, each_iden in enumerate(alliden_tuple2):
    df_iden = pd.DataFrame(each_iden, columns=['int','mztop3','allmzidx'])
    df_iden['allmzidx'] = df_iden['allmzidx'].apply(tuple)
    df_iden = pd.merge(df_iden, df1, how='left', on='allmzidx')
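To see the tuple conversion in isolation, here is a small sketch with made-up frames. The names left, right and label are hypothetical; only allmzidx comes from the question.

```python
import pandas as pd

# Hypothetical frames whose key column holds lists.
left = pd.DataFrame({"allmzidx": [[1, 2], [3, 4]], "int": [10, 20]})
right = pd.DataFrame({"allmzidx": [[1, 2], [5, 6]], "label": ["x", "y"]})

# Convert the list keys to tuples on BOTH sides so they can be hashed and compared.
left["allmzidx"] = left["allmzidx"].apply(tuple)
right["allmzidx"] = right["allmzidx"].apply(tuple)

# Left merge: every row of `left` is kept, unmatched rows get NaN in `label`.
merged = pd.merge(left, right, how="left", on="allmzidx")
print(merged)
```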
|
pandas merge: merge two dataframes on same column but keep different columns
Tag : python , By : Hitesh Prajapati
Date : March 29 2020, 07:55 AM
I have two pandas DataFrames that share one common column name. I would like to merge on the common column name but keep all the different columns from the second DataFrame where there's a match on the common column name.
If you have this setup:
import io
import pandas as pd
z=io.StringIO(""" A B C D E F G H
value2 value2 value2 value2 value2 value2 value2 value2
value3 value3 value3 value3 value3 value3 value3 value3
value value value value value value value value""")
df = pd.read_table(z, sep=r'\s+')
z2=io.StringIO(""" A I J K L
value value value value value
value2 value2 value2 value2 value2
value3 value3 value3 value3 value3""")
df2=pd.read_table(z2, sep=r'\s+')
pd.merge(df,df2, on="A",right_index=True, left_index=True)
A B C D E F G H I J K L
0 value value value value value value value value value value value value
1 value value value value value value value value value value value value
2 value value value value value value value value value value value value
pd.merge(df.set_index("A"),df2.set_index("A"), right_index=True, left_index=True).reset_index()
A B C D E F G H I J K L
0 value2 value2 value2 value2 value2 value2 value2 value2 value2 value2 value2 value2
1 value3 value3 value3 value3 value3 value3 value3 value3 value3 value3 value3 value3
2 value value value value value value value value value value value value
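If the shared column is an ordinary column on both sides, a plain merge with on="A" should give the same kind of result without touching the index. This is a sketch with shortened, hypothetical data (three columns per frame instead of the ones above):

```python
import io
import pandas as pd

# Hypothetical, shortened versions of the two frames; both share column "A".
left = pd.read_csv(
    io.StringIO("A B C\nvalue2 b2 c2\nvalue3 b3 c3\nvalue b1 c1"), sep=r"\s+"
)
right = pd.read_csv(
    io.StringIO("A I J\nvalue i1 j1\nvalue2 i2 j2\nvalue3 i3 j3"), sep=r"\s+"
)

# Rows are matched on the values in "A"; every other column from both
# frames is carried over into the result.
merged = pd.merge(left, right, on="A")
print(merged)
```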
|
How to merge columns after groupby and selecting first valid value of other columns in a pandas dataframe?
Tag : python , By : Longchao Dong
Date : March 29 2020, 07:55 AM
You don't need to loop, but you do need to "melt" your dataframe before your group-by operation. So starting with:
from io import StringIO
import pandas
f = StringIO("""\
ID,col_1,col_2,col_3,Date
1,,20,40,1/1/2018
1,10,,,1/2/2018
1,50,,60,1/3/2018
3,40,10,90,1/1/2018
4,,80,80,1/1/2018
""")
df = pandas.read_csv(f)
print(
    df.melt(id_vars=['ID', 'Date'], value_vars=['col_1', 'col_2', 'col_3'], value_name='first')
      .groupby(by=['ID', 'variable'])
      .first()
      .unstack(level='variable')
)
Date first
variable col_1 col_2 col_3 col_1 col_2 col_3
ID
1 1/1/2018 1/1/2018 1/1/2018 10.0 20.0 40.0
3 1/1/2018 1/1/2018 1/1/2018 40.0 10.0 90.0
4 1/1/2018 1/1/2018 1/1/2018 NaN 80.0 80.0
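def flatten_columns(df, sep='_'):
    # Join each (level0, level1) column tuple into a single flat name.
    newcols = [sep.join(_) for _ in df.columns]
    # set_axis returns a new DataFrame; the inplace argument was removed in pandas 2.0.
    return df.set_axis(newcols, axis='columns')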
print(
    df.melt(id_vars=['ID', 'Date'], value_vars=['col_1', 'col_2', 'col_3'], value_name='first')
      .groupby(by=['ID', 'variable'])
      .first()
      .unstack(level='variable')
      .sort_index(level='variable', axis='columns')
      .pipe(flatten_columns)
)
Date_col_1 first_col_1 Date_col_2 first_col_2 Date_col_3 first_col_3
ID
1 1/1/2018 10.0 1/1/2018 20.0 1/1/2018 40.0
3 1/1/2018 40.0 1/1/2018 10.0 1/1/2018 90.0
4 1/1/2018 NaN 1/1/2018 80.0 1/1/2018 80.0
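If a separate Date per column is not needed, a shorter route may be enough: groupby(...).first() already returns the first non-null entry of each column within a group. This is a sketch under that assumption and is not a replacement for the melt approach above, since it keeps only one Date per ID.

```python
from io import StringIO
import pandas as pd

csv = StringIO("""\
ID,col_1,col_2,col_3,Date
1,,20,40,1/1/2018
1,10,,,1/2/2018
1,50,,60,1/3/2018
3,40,10,90,1/1/2018
4,,80,80,1/1/2018
""")
df = pd.read_csv(csv)

# first() skips missing values, so each column gets its first valid entry per ID.
firsts = df.groupby("ID").first()
print(firsts)
```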
|
Merge Pandas DataFrame using apply() to only merge on partial match in two columns
Date : March 29 2020, 07:55 AM
I need to merge two pandas DataFrames, not only on exact column values but also on approximate ones.
Here is one way, using merge_asof:
pd.merge_asof(df,df2,left_on='col2',right_on='col2b',tolerance = 1,direction ='nearest').dropna()
Out[7]:
col1 col2 col1a col2b
0 a 3 aa 3.0
1 b 4 bb 4.0
2 c 66 cc 67.0
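For a runnable version of the above, here is a sketch with hypothetical input frames (the question's actual data is not shown, so the values, including the extra unmatched row d, are assumptions). Note that merge_asof expects both key columns to be sorted and of the same dtype.

```python
import pandas as pd

# Hypothetical inputs; both key columns are sorted integers.
df = pd.DataFrame({"col1": ["a", "b", "c", "d"], "col2": [3, 4, 66, 100]})
df2 = pd.DataFrame({"col1a": ["aa", "bb", "cc"], "col2b": [3, 4, 67]})

# Match each left row to the nearest right key that is at most 1 away.
# Rows without a partner inside the tolerance get NaN and are then dropped.
result = pd.merge_asof(
    df, df2,
    left_on="col2", right_on="col2b",
    tolerance=1, direction="nearest",
).dropna()
print(result)
```

Row d (col2 = 100) has no right-hand key within the tolerance, so dropna() removes it.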
|