Filtering multiple items in a multi-index Python Panda dataframe
Tag : python , By : Bjørn Lyngwa
Date : March 29 2020, 07:55 AM
I have the following table. You can use get_level_values in conjunction with Boolean slicing:
print(df[np.in1d(df.index.get_level_values(1), ['Lake', 'River', 'Upland'])])
Area
NSRCODE PBL_AWI
CM Lake 57124.819333
River 1603.906642
LBH Lake 258046.508310
River 44262.807900
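On recent pandas versions, Index.isin accepts a level argument, which expresses the same filter without going through NumPy. A minimal sketch, assuming a two-level index named NSRCODE/PBL_AWI like the table above (the Area values are illustrative):

```python
import pandas as pd

# Small frame mirroring the table above (values are illustrative)
idx = pd.MultiIndex.from_tuples(
    [("CM", "Lake"), ("CM", "River"), ("CM", "Bog"),
     ("LBH", "Lake"), ("LBH", "River")],
    names=["NSRCODE", "PBL_AWI"],
)
df = pd.DataFrame({"Area": [57124.8, 1603.9, 10.0, 258046.5, 44262.8]},
                  index=idx)

# Index.isin with level= filters on the second index level directly
result = df[df.index.isin(["Lake", "River", "Upland"], level="PBL_AWI")]
print(result)
```

The Bog row drops out; Upland simply matches nothing, just as with np.in1d.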
|
Filtering a panda dataframe based on value and time
Tag : python , By : Steven Weber
Date : March 29 2020, 07:55 AM
Create the test data first, then mark every row that falls within 10 seconds of an event:
import pandas as pd
import numpy as np
np.random.seed(1)
idx = pd.date_range("2016-03-01 10:00:00", "2016-03-01 20:00:00", freq="S")
idx = idx[np.random.randint(0, len(idx), 10000)].sort_values()
evt = np.array(["no event", "event"])[(np.random.rand(len(idx)) < 0.0005).astype(int)]
df = pd.DataFrame({"event":evt, "value":np.random.randint(0, 10, len(evt))}, index=idx)
event_time = df.index[df.event == "event"]
delta = pd.Timedelta(10, unit="s")
start_idx = df.index.searchsorted(event_time - delta).tolist()
end_idx = df.index.searchsorted(event_time + delta).tolist()
mask = np.zeros(df.shape[0], dtype=bool)
evt_id = np.zeros(df.shape[0], dtype=int)
for i, (s, e) in enumerate(zip(start_idx, end_idx)):
    mask[s:e] = True
    evt_id[s:e] = i
df_event = df[mask].copy()  # copy, so the next assignment does not hit a view
df_event["event_id"] = evt_id[mask]
event value event_id
2016-03-01 13:51:48 no event 0 0
2016-03-01 13:51:51 event 8 0
2016-03-01 13:51:53 no event 3 0
2016-03-01 13:52:00 no event 1 0
2016-03-01 14:21:00 no event 2 1
2016-03-01 14:21:00 no event 5 1
2016-03-01 14:21:00 no event 0 1
2016-03-01 14:21:02 no event 1 1
2016-03-01 14:21:04 no event 2 1
2016-03-01 14:21:06 no event 0 1
2016-03-01 14:21:07 event 1 1
2016-03-01 14:21:16 no event 1 1
2016-03-01 14:21:16 no event 9 1
2016-03-01 15:09:42 no event 1 2
2016-03-01 15:09:49 event 7 2
2016-03-01 15:09:54 no event 3 2
2016-03-01 15:09:55 no event 3 2
2016-03-01 15:09:58 no event 5 2
2016-03-01 15:09:58 no event 9 2
2016-03-01 17:36:44 no event 8 3
2016-03-01 17:36:44 no event 2 3
2016-03-01 17:36:44 no event 9 3
2016-03-01 17:36:45 no event 2 3
2016-03-01 17:36:49 event 9 3
2016-03-01 17:36:50 no event 6 3
2016-03-01 17:36:54 no event 1 3
2016-03-01 17:36:56 no event 1 3
2016-03-01 18:51:37 no event 5 4
2016-03-01 18:51:37 no event 3 4
2016-03-01 18:51:42 no event 0 4
2016-03-01 18:51:47 event 9 4
2016-03-01 18:51:55 no event 4 4
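The Python loop above can also be replaced with a pair of searchsorted calls against the event times. A sketch under the same setup (smaller data, same ±10-second window); note that where windows overlap, this version labels a row with the first covering event rather than the last:

```python
import numpy as np
import pandas as pd

# Same kind of data as above, smaller for brevity
np.random.seed(1)
idx = pd.date_range("2016-03-01 10:00:00", periods=2000, freq="s")
evt = np.where(np.random.rand(len(idx)) < 0.005, "event", "no event")
df = pd.DataFrame({"event": evt,
                   "value": np.random.randint(0, 10, len(idx))}, index=idx)

event_time = df.index[df.event == "event"]
delta = pd.Timedelta(10, unit="s")

# For each row at time t, count events before (t - delta) and up to
# (t + delta); the row lies inside some window exactly when they differ.
lo = event_time.searchsorted(df.index - delta)
hi = event_time.searchsorted(df.index + delta, side="right")
mask = lo < hi

df_event = df[mask].copy()
# id of the first event whose window covers the row
df_event["event_id"] = lo[mask]
```

Every event row is within delta of itself, so all events survive the mask by construction.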
|
Filtering out string in a Panda Dataframe
Date : March 29 2020, 07:55 AM
You could split off the string row, then compute the weight and standard deviation on the numeric rows as follows:
df_string = df.iloc[0]                   # first row (the strings) as a Series
df_numeric = df.iloc[1:].astype(float)   # all remaining rows, cast to float
cols = df_numeric.columns.values.tolist()
weight = pd.DataFrame([df_numeric[col] / df_numeric.sum(axis=1) for col in df_numeric],
index=cols).T
weight
std = pd.DataFrame([df_numeric.std(axis=1) for col in df_numeric],index=cols).T
std
df_string_std = pd.concat([df_string.to_frame().T, std])  # DataFrame.append was removed in pandas 2.x
df_string_std
df.info()
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 8 entries, 2006-04-27 to 2006-05-08
Data columns (total 5 columns):
A 8 non-null object
B 8 non-null object
C 8 non-null object
D 8 non-null object
E 8 non-null object
dtypes: object(5)
memory usage: 384.0+ bytes
df.index
DatetimeIndex(['2006-04-27', '2006-04-28', '2006-05-01', '2006-05-02',
'2006-05-03', '2006-05-04', '2006-05-05', '2006-05-08'],
dtype='datetime64[ns]', name='Date', freq=None)
df
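Since all five columns are stored as object (per the df.info() output above), the per-row division can also be written with DataFrame.div, which avoids the list-comprehension-plus-transpose dance. A sketch with made-up values:

```python
import numpy as np
import pandas as pd

# Made-up frame in the same shape: a string row on top of numeric rows
df = pd.DataFrame([["x", "y", "z"],
                   ["1", "2", "3"],
                   ["4", "5", "6"]], columns=["A", "B", "C"])

df_string = df.iloc[0]                  # the string row
df_numeric = df.iloc[1:].astype(float)  # numeric rows as floats

# Row-wise weights: divide every row by its own sum
weight = df_numeric.div(df_numeric.sum(axis=1), axis=0)
print(weight)
```

By construction each row of weight sums to 1, which is a quick sanity check on the axis arguments.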
|
Efficient way of filtering groupby data in a Panda DataFrame
Tag : python , By : Ohad Barzilay
Date : March 29 2020, 07:55 AM
Using map:
s = df_a.id.map(dict(df_b[['id', 'A']].values))
df_a[df_a.A <= s]
Out[35]:
id A B C D
0 123 2019-09-10 00:00:00 1 True False
1 123 2019-09-10 00:10:00 1 True False
3 456 2019-09-05 01:00:00 1 True False
5 789 2019-09-10 10:00:00 1 True False
6 789 2019-09-11 00:50:00 1 True False
7 789 2019-09-12 12:00:00 1 True False
(df_a.merge(df_b[['id', 'A']], on='id', how='left', suffixes=('', '_y'))
 .query('A <= A_y').drop(columns='A_y'))
Out[43]:
id A B C D
0 123 2019-09-10 00:00:00 1 True False
1 123 2019-09-10 00:10:00 1 True False
3 456 2019-09-05 01:00:00 1 True False
5 789 2019-09-10 10:00:00 1 True False
6 789 2019-09-11 00:50:00 1 True False
7 789 2019-09-12 12:00:00 1 True False
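The map variant can be exercised on a small hand-made pair of frames. The ids, timestamps, and cutoffs below are invented to match the shape of the output above, and the dict is replaced by an indexed Series, which Series.map accepts directly:

```python
import pandas as pd

# Invented data: df_b holds one cutoff timestamp per id; a row of
# df_a is kept when its A is at or before that id's cutoff
df_a = pd.DataFrame({
    "id": [123, 123, 123, 456, 789],
    "A": pd.to_datetime(["2019-09-10 00:00:00", "2019-09-10 00:10:00",
                         "2019-09-15 00:00:00", "2019-09-05 01:00:00",
                         "2019-09-10 10:00:00"]),
})
df_b = pd.DataFrame({
    "id": [123, 456, 789],
    "A": pd.to_datetime(["2019-09-12", "2019-09-06", "2019-09-13"]),
})

# map the per-id cutoff onto df_a, then compare column against cutoff
s = df_a.id.map(df_b.set_index("id").A)
out = df_a[df_a.A <= s]
print(out)
```

Only the 2019-09-15 row for id 123 exceeds its cutoff, so four of the five rows survive.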
|
Read 5 lines from a panda dataframe and insert it in one cell per line in another panda dataframe
Date : March 29 2020, 07:55 AM
I am reading data from an Excel file; the resulting dataframe has a single column and several thousand lines. You can try the following.
df['group'] = df.index // 5  # add extra column to hold the group value
new_df = df.groupby('group').identifier.apply(list).apply(pd.Series)
df = df.drop('group', axis=1)  # drop the extra column that was created
print(new_df.head())
df = pd.DataFrame(np.random.randint(0,1000,size=6026), columns=["identifier"])
df.head()
identifier
0 752
1 14
2 184
3 139
4 37
df['group'] = df.index//5
df1 = df.groupby('group').identifier.apply(list).apply(pd.Series).fillna(0)
df1 = df1.astype('int32')
df1.head()
0 1 2 3 4
group
0 752 14 184 139 37
1 716 499 902 54 565
2 74 427 939 380 244
3 651 803 97 78 492
4 169 376 737 342 616
df['group'] = df.index//5
df1 = pd.DataFrame(df.groupby('group').identifier.apply(list))
df1.head()
identifier
group
0 [752, 14, 184, 139, 37]
1 [716, 499, 902, 54, 565]
2 [74, 427, 939, 380, 244]
3 [651, 803, 97, 78, 492]
4 [169, 376, 737, 342, 616]
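Since the groups are simply consecutive runs of five rows, the same reshaping can also be done with NumPy alone once the column is padded to a multiple of five (zero-padding, matching the fillna(0) above). A sketch on invented data:

```python
import numpy as np
import pandas as pd

# Invented single-column frame, 12 rows
df = pd.DataFrame({"identifier": np.arange(12)})

# Pad with zeros up to a multiple of 5, then reshape to 5 columns
pad = (-len(df)) % 5
values = np.concatenate([df.identifier.to_numpy(),
                         np.zeros(pad, dtype=int)])
df_wide = pd.DataFrame(values.reshape(-1, 5))
print(df_wide)
```

This skips the groupby/apply chain entirely, which matters once the frame grows past a few thousand rows.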
|