Load CSV Strings With Different Types into Pandas Dataframe, Split Columns, Parse Date
Date : March 29 2020, 07:55 AM
This fixes the issue. You can do all of the preprocessing inside read_csv, as shown:
import csv
import pandas as pd
from io import StringIO
data = StringIO(
'''
"XAU=,XAU=,XAG=,XAG="
"25/08/2014 6:00:05,1200.343,25/08/2014 6:00:03,19.44,"
"25/08/2014 6:00:05,1200,,,"
''')
df = pd.read_csv(data, quoting=csv.QUOTE_NONE, index_col=False, escapechar='"', \
parse_dates=[0, 2]).rename(columns=lambda x: x.split("=")[0])
df
df.dtypes
XAU datetime64[ns]
XAU float64
XAG datetime64[ns]
XAG float64
dtype: object
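Note that the timestamps are day-first (dd/mm/yyyy). Parsing happens to succeed here because 25 cannot be a month, but an ambiguous date such as 05/08/2014 would be read month-first, so it is safer to pass dayfirst=True. A minimal variant of the call above:
data.seek(0)  # rewind the in-memory buffer before re-reading it
df = pd.read_csv(data, quoting=csv.QUOTE_NONE, index_col=False, escapechar='"',
                 parse_dates=[0, 2], dayfirst=True).rename(columns=lambda x: x.split("=")[0])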
data.seek(0)  # rewind the buffer; the previous read_csv consumed it
df = pd.read_csv(data, quoting=csv.QUOTE_NONE, index_col=False, escapechar='"', \
                 parse_dates=[0], usecols=[0, 1]).rename(columns=lambda x: x.split("=")[0])
df
df.columns = df.columns + ['_%d' % i for i in range(len(df.columns))]
ser = pd.Series(data=df['XAU_1'].values, index=df['XAU_0'].values, name='XAU')
ser
2014-08-25 06:00:05 1200.343
2014-08-25 06:00:05 1200.000
Name: XAU, dtype: float64
type(ser)
pandas.core.series.Series
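The same pattern generalizes to every timestamp/value pair in the file. A minimal sketch, assuming df is the four-column frame produced by the first read_csv call above (the helper name paired_series is my own):
def paired_series(frame):
    # Columns come in (timestamp, value) pairs, one pair per instrument.
    out = {}
    for i in range(0, frame.shape[1], 2):
        name = frame.columns[i]  # 'XAU', 'XAG', ...
        out[name] = pd.Series(frame.iloc[:, i + 1].values,
                              index=frame.iloc[:, i].values,
                              name=name).dropna()
    return out

series_by_symbol = paired_series(df)
series_by_symbol['XAG']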
data = StringIO(
'''
"XAU=,XAU=,XAG=,XAG="
"25/08/2014 6:00:05,1200.343,25/08/2014 6:00:03,19.44,"
"25/08/2014 6:00:05,1200,,,"
",,25/08/2014 6:00:05,19.50,"
''')
df = pd.read_csv(data, quoting=csv.QUOTE_NONE, index_col=False, na_values=[""],
parse_dates=[2]).rename(columns=lambda x: x.strip('"').split("=")[0])
old_cols = df.columns
# Index(['XAU', 'XAU', 'XAG', 'XAG'], dtype='object')
new_cols = list(range(len(df.columns)))
# [0, 1, 2, 3]
df.columns = new_cols
# Converting first column to datetime dtype
df[0] = pd.to_datetime(df[0].str.replace('"', ''))
df.columns = old_cols
df
df.dtypes
XAU datetime64[ns]
XAU float64
XAG datetime64[ns]
XAG float64
dtype: object
|
A Pythonic way to reshape Pandas.DataFrame's
Tag : python , By : Fenix Drakken
Date : March 29 2020, 07:55 AM
Hope this helps. One way is to use cumcount and then pivot_table:
In [11]: df["count"] = df.groupby("label").cumcount()
In [12]: df
Out[12]:
label value count
0 a 0.2 0
1 a 0.1 1
2 a 0.4 2
3 b 0.5 0
4 b 0.2 1
5 b 0.6 2
6 c 0.7 0
7 c 0.9 1
8 c 0.3 2
In [13]: df.pivot_table("value", "count", "label")
Out[13]:
label a b c
count
0 0.2 0.5 0.7
1 0.1 0.2 0.9
2 0.4 0.6 0.3
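Note that pivot_table aggregates duplicates (mean by default); since each (count, label) pair is unique here, a plain pivot gives the same table without any aggregation:
In [14]: df.pivot(index="count", columns="label", values="value")
When every label has the same number of rows and they appear in the same order, the values can also be reshaped directly with NumPy, as in the next snippet.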
In [21]: df["value"].values.reshape((-1, 3)).T
Out[21]:
array([[ 0.2, 0.5, 0.7],
[ 0.1, 0.2, 0.9],
[ 0.4, 0.6, 0.3]])
In [22]: pd.DataFrame(df["value"].values.reshape((-1, 3)).T,
columns=df.loc[::3, "label"])
Out[22]:
label a b c
0 0.2 0.5 0.7
1 0.1 0.2 0.9
2 0.4 0.6 0.3
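The reshape shortcut assumes each label contributes exactly three contiguous rows. For completeness, a minimal setup frame that reproduces the session above (my own construction, with values copied from the printed output):
import pandas as pd

df = pd.DataFrame({
    "label": list("aaabbbccc"),
    "value": [0.2, 0.1, 0.4, 0.5, 0.2, 0.6, 0.7, 0.9, 0.3],
})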
|
A 'pythonic' way to generate a seasonal dataframe from a pandas timeseries dataframe
Date : March 29 2020, 07:55 AM
This will be helpful. You can use DatetimeIndex.strftime and DatetimeIndex.year, an ordered CategoricalIndex for correct month ordering, and finally reshape with pivot:
c = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']
df = pd.pivot(index=pd.CategoricalIndex(df.index.strftime('%b'), ordered=True, categories=c),
columns=df.index.year,
values=df['value'])
print (df)
2015 2016 2017
Jan 201.55 201.65 201.75
Feb 201.60 201.70 201.80
Mar 201.65 201.75 201.85
Apr 201.70 201.80 201.90
May 201.75 201.85 201.95
Jun 201.80 201.90 202.00
Jul 201.85 201.95 202.05
Aug 201.90 202.00 202.10
Sep 201.95 202.05 202.15
Oct 202.00 202.10 202.20
Nov 202.05 202.15 202.25
Dec 202.10 202.20 202.30
df.plot()
df['months'] = pd.CategoricalIndex(df.index.strftime('%b'), ordered=True, categories=c)
df['years'] = df.index.year
df = df.pivot(index='months', columns='years',values='value')
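A self-contained sketch of the same approach on made-up data (the sample series below is my own construction: monthly values for 2015-2017):
import pandas as pd

idx = pd.date_range('2015-01-31', periods=36, freq='M')  # month-end stamps
vals = [201.55 + 0.05 * m + 0.10 * y for y in range(3) for m in range(12)]
df = pd.DataFrame({'value': vals}, index=idx)

c = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']
seasonal = (df.assign(months=pd.Categorical(df.index.strftime('%b'), categories=c, ordered=True),
                      years=df.index.year)
              .pivot(index='months', columns='years', values='value'))
print(seasonal)
seasonal.plot()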
|
Parse/split URLs in a pandas dataframe using urllib
Date : March 29 2020, 07:55 AM
This might help you. The example you used assumes that the links are already in a dataframe; here is a working solution:
import urllib.parse
import pandas as pd
df = pd.DataFrame()
urls = ['https://www.google.com/something','https://mail.google.com/anohtersomething', 'https://www.amazon.com/yetanotherthing']
df['protocol'],df['domain'],df['path'],df['query'],df['fragment'] = zip(*[urllib.parse.urlsplit(x) for x in urls])
protocol domain path query fragment
0 https www.google.com /something
1 https mail.google.com /anohtersomething
2 https www.amazon.com /yetanotherthing
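If the URLs are already a column of an existing DataFrame, the same zip/urlsplit pattern can be applied to that column. A small sketch, assuming a column named url (my own name):
df = pd.DataFrame({'url': urls})  # hypothetical frame with the links as a column
df['protocol'], df['domain'], df['path'], df['query'], df['fragment'] = zip(
    *(urllib.parse.urlsplit(u) for u in df['url']))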
|
Parse / split many columns to multiple columns in pandas dataframe using custom function
Date : March 29 2020, 07:55 AM
This should help. Loop over the columns and call your parsing function on each column that needs splitting, for example every column whose name ends with _date:
for c in df.columns:
if c.endswith('_date'):
parse_column(df, c)
Or restrict it to an explicit list of column names:
for c in my_columns_list:
parse_column(df, c)
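parse_column itself is not shown in the excerpt; purely as an illustration, a hypothetical version that splits a date-like column into year/month/day columns in place might look like this:
import pandas as pd

def parse_column(df, c):
    # Hypothetical helper: split one date-like column into year/month/day columns.
    parsed = pd.to_datetime(df[c], errors='coerce')
    df[c + '_year'] = parsed.dt.year
    df[c + '_month'] = parsed.dt.month
    df[c + '_day'] = parsed.dt.day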
|