
Complex dataframe filtering request on the last occurrence of a value in pandas/Python [EDIT]


Tag : python , By : Sinisa Ruzin
Date : November 27 2020, 03:01 PM

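Question: I am having a hard time with a complex dataframe filtering. Answer: here is a version that needs no extra variables. Within each imo group, it keeps the rows from the start of the last run of at least two consecutive 'FE' polygon values onward: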
df.groupby('imo').apply(lambda grp: grp[grp.index >= 
                                        ((grp.polygon.shift() != grp.polygon) & 
                                         (grp.polygon.shift(-1) == grp.polygon) & 
                                         (grp.polygon == 'FE')
                                        ).cumsum().idxmax()]
                       ).reset_index(level=0, drop=True)
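
To check what that expression keeps, here is a minimal sketch on made-up data (the imo and polygon values below are assumptions, not the asker's data). It rebuilds the mask with groupby shift, cumsum and transform instead of groupby.apply. That is a swapped-in formulation, not the answer's own code, but it should select the same rows.

```python
import pandas as pd

# Hypothetical sample data: two vessels (imo) with polygon labels in row order.
df = pd.DataFrame({
    "imo": [1, 1, 1, 1, 1, 1, 2, 2, 2, 2],
    "polygon": ["FE", "FE", "A", "FE", "FE", "B", "A", "FE", "FE", "C"],
})

grouped = df.groupby("imo")["polygon"]

# True on the first row of every run of at least two consecutive 'FE' rows.
run_start = (
    grouped.shift().ne(df["polygon"])        # previous row in the group differs
    & grouped.shift(-1).eq(df["polygon"])    # next row in the group is the same
    & df["polygon"].eq("FE")
)

# Number the runs within each group; the last run carries the group's maximum.
run_id = run_start.astype(int).groupby(df["imo"]).cumsum()
last_run = run_id.groupby(df["imo"]).transform("max")

# Keep every row from the start of the last 'FE' run onward.
result = df[run_id.eq(last_run)]
print(result)
```

On this sample the kept rows are 3, 4, 5 for imo 1 and 7, 8, 9 for imo 2. A group with no qualifying run is kept whole, because its running count never rises above zero.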


Filtering multiple items in a multi-index pandas dataframe


Tag : python , By : Bjørn Lyngwa
Date : March 29 2020, 07:55 AM
The question concerns a table with a two-level index (NSRCODE, PBL_AWI). You can use get_level_values in conjunction with Boolean indexing:
print(df[np.in1d(df.index.get_level_values(1), ['Lake', 'River', 'Upland'])])
                          Area
NSRCODE PBL_AWI               
CM      Lake      57124.819333
        River      1603.906642
LBH     Lake     258046.508310
        River     44262.807900

Filtering a pandas dataframe based on value and time


Tag : python , By : Steven Weber
Date : March 29 2020, 07:55 AM
Group by the Car column first and process every group as follows. The test data below has no Car column, so it shows the per-group step only.
Create the test data first:
import pandas as pd
import numpy as np

np.random.seed(1)
idx = pd.date_range("2016-03-01 10:00:00", "2016-03-01 20:00:00", freq="S")
idx = idx[np.random.randint(0, len(idx), 10000)].sort_values()
evt = np.array(["no event", "event"])[(np.random.rand(len(idx)) < 0.0005).astype(int)]
df = pd.DataFrame({"event":evt, "value":np.random.randint(0, 10, len(evt))}, index=idx)
event_time = df.index[df.event == "event"]
delta = pd.Timedelta(10, unit="s")

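# Locate each event's +/- 10 s window by binary search on the sorted index.
start_idx = df.index.searchsorted(event_time - delta).tolist()
end_idx = df.index.searchsorted(event_time + delta).tolist()
mask = np.zeros(df.shape[0], dtype=bool)   # rows that fall in any window
evt_id = np.zeros(df.shape[0], dtype=int)  # window number for each row
for i, (s, e) in enumerate(zip(start_idx, end_idx)):
    mask[s:e] = True
    evt_id[s:e] = i
df_event = df[mask].copy()  # copy so adding a column does not raise SettingWithCopyWarning
df_event["event_id"] = evt_id[mask]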
                        event  value  event_id
2016-03-01 13:51:48  no event      0         0
2016-03-01 13:51:51     event      8         0
2016-03-01 13:51:53  no event      3         0
2016-03-01 13:52:00  no event      1         0
2016-03-01 14:21:00  no event      2         1
2016-03-01 14:21:00  no event      5         1
2016-03-01 14:21:00  no event      0         1
2016-03-01 14:21:02  no event      1         1
2016-03-01 14:21:04  no event      2         1
2016-03-01 14:21:06  no event      0         1
2016-03-01 14:21:07     event      1         1
2016-03-01 14:21:16  no event      1         1
2016-03-01 14:21:16  no event      9         1
2016-03-01 15:09:42  no event      1         2
2016-03-01 15:09:49     event      7         2
2016-03-01 15:09:54  no event      3         2
2016-03-01 15:09:55  no event      3         2
2016-03-01 15:09:58  no event      5         2
2016-03-01 15:09:58  no event      9         2
2016-03-01 17:36:44  no event      8         3
2016-03-01 17:36:44  no event      2         3
2016-03-01 17:36:44  no event      9         3
2016-03-01 17:36:45  no event      2         3
2016-03-01 17:36:49     event      9         3
2016-03-01 17:36:50  no event      6         3
2016-03-01 17:36:54  no event      1         3
2016-03-01 17:36:56  no event      1         3
2016-03-01 18:51:37  no event      5         4
2016-03-01 18:51:37  no event      3         4
2016-03-01 18:51:42  no event      0         4
2016-03-01 18:51:47     event      9         4
2016-03-01 18:51:55  no event      4         4
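
The window lookup is easier to verify on a handful of rows. The sketch below uses six invented timestamps (an assumption, not the random test data above) and prints the half-open positional ranges that searchsorted returns for each event.

```python
import numpy as np
import pandas as pd

# Hypothetical, already sorted timestamps with two events.
idx = pd.to_datetime([
    "2016-03-01 10:00:00", "2016-03-01 10:00:05", "2016-03-01 10:00:12",
    "2016-03-01 10:00:30", "2016-03-01 10:00:38", "2016-03-01 10:01:00",
])
df = pd.DataFrame(
    {"event": ["no event", "no event", "event", "no event", "event", "no event"]},
    index=idx,
)

event_time = df.index[df["event"] == "event"]
delta = pd.Timedelta(10, unit="s")

# Positions of the first row at or after (event - 10 s) and (event + 10 s).
start_idx = df.index.searchsorted(event_time - delta)
end_idx = df.index.searchsorted(event_time + delta)
windows = list(zip(start_idx.tolist(), end_idx.tolist()))

# Mark every row that falls inside at least one window.
mask = np.zeros(len(df), dtype=bool)
for s, e in windows:
    mask[s:e] = True

print(windows)
print(df[mask])
```

Because searchsorted defaults to side="left", a row exactly 10 seconds after an event falls outside its window. If two windows overlap, the loop in the original answer assigns the shared rows to the later event.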

Filtering out strings in a pandas DataFrame


Tag : python , By : DK.
Date : March 29 2020, 07:55 AM
You could separate the string row from the numeric rows and then compute the weights and the standard deviation as follows:
df_string = df.iloc[0]                       # Assign First row to DF
df_numeric = df.iloc[1:].astype(float)       # Assign All rows after first row to DF

cols = df_numeric.columns.values.tolist()
weight = pd.DataFrame([df_numeric[col] / df_numeric.sum(axis=1) for col in df_numeric],    
                       index=cols).T
weight
std = pd.DataFrame([df_numeric.std(axis=1) for col in df_numeric],index=cols).T
std
df_string_std = pd.concat([df_string.to_frame().T, std])  # DataFrame.append was removed in pandas 2.0
df_string_std
df.info()
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 8 entries, 2006-04-27 to 2006-05-08
Data columns (total 5 columns):
A    8 non-null object
B    8 non-null object
C    8 non-null object
D    8 non-null object
E    8 non-null object
dtypes: object(5)
memory usage: 384.0+ bytes

df.index
DatetimeIndex(['2006-04-27', '2006-04-28', '2006-05-01', '2006-05-02',
               '2006-05-03', '2006-05-04', '2006-05-05', '2006-05-08'],
               dtype='datetime64[ns]', name='Date', freq=None)
df
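
A compact, runnable sketch of that approach is given below, assuming an object-dtype frame whose first row holds strings and whose remaining rows hold numbers stored as text. The column names and values are invented for illustration.

```python
import pandas as pd

# Hypothetical frame: first row is text, the rest are numeric strings.
df = pd.DataFrame({"A": ["x", "1", "3"], "B": ["y", "3", "1"]}, dtype=object)

labels = df.iloc[0]                   # the string row
numeric = df.iloc[1:].astype(float)   # everything after it, as floats

# Each value divided by its row total.
weight = numeric.div(numeric.sum(axis=1), axis=0)

# Row-wise sample standard deviation, repeated for every column.
row_std = numeric.std(axis=1)
std = pd.DataFrame({col: row_std for col in numeric.columns})

# Put the string row back on top of the computed values.
combined = pd.concat([labels.to_frame().T, std])
print(weight)
print(combined)
```

Using numeric.div with axis=0 replaces the list comprehension and transpose in the original answer and should give the same weights.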

Efficient way of filtering groupby data in a pandas DataFrame


Tag : python , By : Ohad Barzilay
Date : March 29 2020, 07:55 AM
Using map to look up each row's cutoff from df_b by id:
s = df_a.id.map(dict(df_b[['id', 'A']].values))
df_a[df_a.A <= s]

Out[35]:
    id                   A  B     C      D
0  123 2019-09-10 00:00:00  1  True  False
1  123 2019-09-10 00:10:00  1  True  False
3  456 2019-09-05 01:00:00  1  True  False
5  789 2019-09-10 10:00:00  1  True  False
6  789 2019-09-11 00:50:00  1  True  False
7  789 2019-09-12 12:00:00  1  True  False
# Alternative: merge the cutoff in, filter with query, then drop the helper column.
(df_a.merge(df_b[['id', 'A']], on='id', how='left', suffixes=('','_y'))
     .query('A <= A_y').drop(columns='A_y'))

Out[43]:
    id                   A  B     C      D
0  123 2019-09-10 00:00:00  1  True  False
1  123 2019-09-10 00:10:00  1  True  False
3  456 2019-09-05 01:00:00  1  True  False
5  789 2019-09-10 10:00:00  1  True  False
6  789 2019-09-11 00:50:00  1  True  False
7  789 2019-09-12 12:00:00  1  True  False
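
Both approaches can be compared on a small example. The frames below are invented stand-ins for df_a and df_b (only the id and A columns are modelled), and the lookup is built with set_index rather than dict(...values), which is a substitution made here for readability.

```python
import pandas as pd

# Hypothetical data: df_b holds one cutoff timestamp per id.
df_a = pd.DataFrame({
    "id": [123, 123, 123, 456, 456],
    "A": pd.to_datetime([
        "2019-09-10 00:00", "2019-09-10 00:10", "2019-09-10 00:20",
        "2019-09-05 01:00", "2019-09-05 02:00",
    ]),
})
df_b = pd.DataFrame({
    "id": [123, 456],
    "A": pd.to_datetime(["2019-09-10 00:10", "2019-09-05 01:00"]),
})

# Approach 1: map each id to its cutoff, then compare row by row.
cutoff = df_a["id"].map(df_b.set_index("id")["A"])
by_map = df_a[df_a["A"] <= cutoff]

# Approach 2: left merge, filter with query, drop the helper column.
by_merge = (
    df_a.merge(df_b[["id", "A"]], on="id", how="left", suffixes=("", "_y"))
        .query("A <= A_y")
        .drop(columns="A_y")
)

print(by_map)
print(by_merge)
```

The map version assumes ids are unique in df_b. With duplicate ids, the merge version would instead produce one row per match.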

Read 5 lines from a pandas dataframe and insert them, one cell per line, into another pandas dataframe


Tag : python , By : demize95
Date : March 29 2020, 07:55 AM
Question: I am reading data from an Excel file; the resulting dataframe has a single column and several rows. Answer: you can try the following.
df['group'] = df.index//5 # add extra column to hold the group value
new_df = df.groupby('group').identifier.apply(list).apply(pd.Series)
df = df.drop('group', axis=1) # drop the extra column; drop() returns a new frame, so reassign
print(new_df.head())
df = pd.DataFrame(np.random.randint(0,1000,size=6026), columns=["identifier"])
df.head()

identifier
0   752
1   14
2   184
3   139
4   37
df['group'] = df.index//5
df1 = df.groupby('group').identifier.apply(list).apply(pd.Series).fillna(0)
df1 = df1.astype('int32')
df1.head()

    0   1   2   3   4
group                   
0   752 14  184 139 37
1   716 499 902 54  565
2   74  427 939 380 244
3   651 803 97  78  492
4   169 376 737 342 616
df['group'] = df.index//5
df1 = pd.DataFrame(df.groupby('group').identifier.apply(list))
df1.head()

    identifier
group   
0   [752, 14, 184, 139, 37]
1   [716, 499, 902, 54, 565]
2   [74, 427, 939, 380, 244]
3   [651, 803, 97, 78, 492]
4   [169, 376, 737, 342, 616]
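
For a version that can be checked by eye, the sketch below applies the same grouping to twelve sequential integers instead of random values (an assumption made so the result is predictable). It groups directly on df.index // 5, so no helper column has to be added and dropped.

```python
import numpy as np
import pandas as pd

# Hypothetical single-column frame with a default 0..11 index.
df = pd.DataFrame({"identifier": np.arange(12)})

# Rows 0-4 form group 0, rows 5-9 group 1, and so on.
group = df.index // 5

# One list of identifiers per group.
lists = df.groupby(group)["identifier"].apply(list)

# Spread each list over columns; short groups are padded with NaN.
wide = pd.DataFrame(lists.tolist(), index=lists.index)

print(lists)
print(wide)
```

The last group holds only two values here, which is why the original answer follows up with fillna(0) before casting to an integer type.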