Pandas: access data from dataframe by row and column number
Date : March 29 2020, 07:55 AM
I have a simple program that confuses me. I read 3 × 10 data from a CSV file and want to access a particular value by its row and column number, but it fails and I don't know why.
Indexing starts from 0, and df[col][row] selects the column first, then the row:
In [8]:
df
Out[8]:
    0   1   2   3   4   5   6   7   8   9
0   1   2   3   4   5   6   7   8   9  10
1  11  12  13  14  15  16  17  18  19  20
2  21  22  23  24  25  26  27  28  29  30
In [11]:
df[2][2]
Out[11]:
23
In [13]:
df[3][2], df[5][2]
Out[13]:
(24, 26)
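df[3][3] raises a KeyError, because the frame only has rows 0 to 2. For purely positional access, df.iloc[row, col] takes the row first and the column second.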
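To make the difference concrete, here is a minimal sketch. It assumes the frame holds the values 1 to 30 with default integer labels, as in the output above; the CSV read is replaced by an in-memory array, and the variable names are only illustrative.

```python
import numpy as np
import pandas as pd

# Rebuild the 3 x 10 frame from the question (values 1..30, default labels).
df = pd.DataFrame(np.arange(1, 31).reshape(3, 10))

# Chained indexing: the first key is the column label, the second the row label.
col_then_row = df[2][2]

# Positional access takes (row, column) in a single call.
by_iloc = df.iloc[2, 3]
by_iat = df.iat[2, 5]

# Row label 3 does not exist, so this lookup fails with a KeyError.
try:
    df[3][3]
    missing_row_error = None
except KeyError as exc:
    missing_row_error = exc

print(col_then_row, by_iloc, by_iat, repr(missing_row_error))
```

iloc and iat avoid the column-first surprise because the order of the two positions is always row, then column.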
|
How I can speed up row column access to pandas dataframe?
Date : March 29 2020, 07:55 AM
You can use iat, which fetches a single scalar by integer position and avoids building the intermediate row Series that iloc[...] creates:
print(product_list.category_name.iat[int(prod)-1])
print(product_list.brand_name.iat[int(prod)-1])
Timings with a string index:
product_list = pd.DataFrame({'brand_name': {'r': 'r', 'g': 't', 'w': 'i'},
                             'category_name': {'r': 's', 'g': 'f', 'w': 'a'}})
print(product_list)
  brand_name category_name
g          t             f
r          r             s
w          i             a
In [242]: %timeit product_list.iloc[int(prod)-1]['category_name']
The slowest run took 8.27 times longer than the fastest. This could mean that an intermediate result is being cached
10000 loops, best of 3: 82.7 µs per loop
In [243]: %timeit product_list.brand_name.iat[int(prod)-1]
The slowest run took 16.01 times longer than the fastest. This could mean that an intermediate result is being cached
100000 loops, best of 3: 9.96 µs per loop
Timings with an integer index:
product_list = pd.DataFrame({'brand_name': {0: 't', 1: 'r', 2: 'i'},
                             'category_name': {0: 'f', 1: 's', 2: 'a'}})
print(product_list)
  brand_name category_name
0          t             f
1          r             s
2          i             a
In [250]: %timeit product_list.iloc[int(prod)-1]['category_name']
The slowest run took 8.24 times longer than the fastest. This could mean that an intermediate result is being cached
10000 loops, best of 3: 84.7 µs per loop
In [251]: %timeit product_list.brand_name.iat[int(prod)-1]
The slowest run took 24.17 times longer than the fastest. This could mean that an intermediate result is being cached
100000 loops, best of 3: 9.86 µs per loop
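If you want to reproduce the comparison outside IPython, something like the following should work. It is only a sketch: the value of prod is made up (the question does not show it), and the absolute timings will differ by machine and pandas version.

```python
import timeit

import pandas as pd

product_list = pd.DataFrame({'brand_name': ['t', 'r', 'i'],
                             'category_name': ['f', 's', 'a']})

prod = "2"           # hypothetical 1-based product number, as a string
pos = int(prod) - 1  # convert to a 0-based position

# Both expressions return the same scalar.
via_iloc = product_list.iloc[pos]['category_name']
via_iat = product_list['category_name'].iat[pos]

# Time each access pattern; iloc builds a whole row Series first, iat does not.
t_iloc = timeit.timeit(lambda: product_list.iloc[pos]['category_name'], number=1000)
t_iat = timeit.timeit(lambda: product_list['category_name'].iat[pos], number=1000)

print(f"iloc then label: {t_iloc:.4f} s per 1000 calls")
print(f"column then iat: {t_iat:.4f} s per 1000 calls")
```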
|
Proper way to access a column of a pandas dataframe
Tag : python , By : Lucas Thompson
Date : March 29 2020, 07:55 AM
Using . as a column accessor is a convenience, and it has many limitations beyond not working with spaces in the name. For example, if your column has the same name as an existing DataFrame attribute or method, you cannot reach it with the . accessor. A non-exhaustive list of such names is mean, sum, index, values, and to_dict. You also cannot reference columns with numeric headers via the . accessor. So, yes, ['col'] is strictly better than .col because it is more consistent and reliable.
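A small sketch of those limitations, using a made-up frame whose column names are chosen to collide with a method, contain a space, and be numeric:

```python
import pandas as pd

# Hypothetical frame: one column shadows a method, one has a space, one is numeric.
df = pd.DataFrame({'mean': [1, 2, 3], 'my col': [4, 5, 6], 0: [7, 8, 9]})

# Attribute access finds the DataFrame.mean method, not the column.
attr_is_method = callable(df.mean)

# Bracket access always returns the column.
mean_col = df['mean'].tolist()
spaced_col = df['my col'].tolist()  # df.my col would be a syntax error
numeric_col = df[0].tolist()        # df.0 would be a syntax error

print(attr_is_method, mean_col, spaced_col, numeric_col)
```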
|
Python Pandas NLTK Tokenize Column in Pandas Dataframe: expected string or bytes-like object
Tag : python , By : user134570
Date : March 29 2020, 07:55 AM
There is probably a non-string object (such as NaN) in your actual df['TEXT'] that is not shown in the data you posted. Here is how you might find the problematic values:
mask = df['TEXT'].map(lambda item: isinstance(item, (str, bytes)))
print(df.loc[~mask])
To drop those rows:
df = df.loc[mask]
Or, to convert every value to a string instead:
df['TEXT'] = df['TEXT'].astype(str)
For example:
import numpy as np
import pandas as pd
from nltk.tokenize import sent_tokenize, word_tokenize
df = pd.DataFrame({'ID': [1, 2, 3, 4],
                   'TEXT': ['cat, dog fish',
                            'turtle; cat; fish fish',
                            'hello book fish',
                            np.nan]})
# ID TEXT
# 0 1 cat, dog fish
# 1 2 turtle; cat; fish fish
# 2 3 hello book fish
# 3 4 NaN
# df['TEXT'].apply(word_tokenize)
# TypeError: expected string or buffer
mask = [isinstance(item, (str, bytes)) for item in df['TEXT']]
df = df.loc[mask]
# ID TEXT
# 0 1 cat, dog fish
# 1 2 turtle; cat; fish fish
# 2 3 hello book fish
In [108]: df['TEXT'].apply(word_tokenize)
Out[108]:
0 [cat, ,, dog, fish]
1 [turtle, ;, cat, ;, fish, fish]
2 [hello, book, fish]
Name: TEXT, dtype: object
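For a version you can run without downloading NLTK data, here is a rough sketch of the same filtering step. Note that str.split is only a stand-in for word_tokenize (it splits on whitespace and keeps punctuation attached), and the TOKENS column name is made up for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'ID': [1, 2, 3, 4],
                   'TEXT': ['cat, dog fish',
                            'turtle; cat; fish fish',
                            'hello book fish',
                            np.nan]})

# A boolean Series (rather than a plain list) supports ~ for inversion.
is_text = df['TEXT'].map(lambda item: isinstance(item, (str, bytes)))

bad_rows = df.loc[~is_text]     # rows that would break the tokenizer
clean = df.loc[is_text].copy()  # copy so the new column does not hit a view

# Stand-in tokenizer; swap in nltk's word_tokenize once its data is installed.
clean['TOKENS'] = clean['TEXT'].apply(str.split)

print(bad_rows)
print(clean)
```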
|
Access 1st column in Pandas dataframe
Date : March 29 2020, 07:55 AM
|