pythonic way to parse/split URLs in a pandas dataframe


Tag : python , By : user184975
Date : November 29 2020, 09:01 AM

You can use Series.map to accomplish the same in one line (on Python 3, urlsplit lives in urllib.parse; on Python 2 it was urlparse.urlsplit):
df['protocol'],df['domain'],df['path'],df['query'],df['fragment'] = zip(*df['url'].map(urlsplit))
A complete example:
from urllib.parse import urlsplit
import pandas as pd

urls = ['https://www.google.com/something','https://mail.google.com/anohtersomething','https://www.amazon.com/yetanotherthing'] # tested with list of 186 urls instead
df = pd.DataFrame({'url': urls})
df['protocol'],df['domain'],df['path'],df['query'],df['fragment'] = zip(*df['url'].map(urlsplit))
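
As a variation on the zip assignment above, the split results can also be collected into their own DataFrame and joined back onto the original. This is only a sketch: the query string and fragment on the third URL are made up so that those two columns are not empty, and the column names simply mirror the ones used above.

```python
from urllib.parse import urlsplit

import pandas as pd

df = pd.DataFrame({'url': [
    'https://www.google.com/something',
    'https://mail.google.com/anohtersomething',
    # Illustrative URL: the query and fragment were added for this example.
    'https://www.amazon.com/yetanotherthing?ref=home#top',
]})

columns = ['protocol', 'domain', 'path', 'query', 'fragment']

# Each urlsplit result is a 5-field named tuple, so a list of them
# maps directly onto five columns.
parts = pd.DataFrame(df['url'].map(urlsplit).tolist(),
                     columns=columns, index=df.index)

# Keep the original url column and add the parsed parts next to it.
df = df.join(parts)
print(df)
```

Building a separate frame keeps the parsing step reusable and avoids listing the five target columns on the left-hand side of an assignment.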


Load CSV Strings With Different Types into Pandas Dataframe, Split Columns, Parse Date


Tag : python , By : kaart
Date : March 29 2020, 07:55 AM
You can preprocess everything inside read_csv, as shown:
import csv
from io import StringIO

import pandas as pd

data = StringIO(
'''
"XAU=,XAU=,XAG=,XAG="  
"25/08/2014 6:00:05,1200.343,25/08/2014 6:00:03,19.44,"            
"25/08/2014 6:00:05,1200,,,"
''')

df = pd.read_csv(data, quoting=csv.QUOTE_NONE, index_col=False, escapechar='"',   \
                 parse_dates=[0, 2]).rename(columns=lambda x: x.split("=")[0])
df
df.dtypes

XAU    datetime64[ns]
XAU           float64
XAG    datetime64[ns]
XAG           float64
dtype: object
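To load only the XAU pair of columns and build a Series from them (rewind the buffer first, since the previous read_csv consumed it):
data.seek(0)
df = pd.read_csv(data, quoting=csv.QUOTE_NONE, index_col=False, escapechar='"',
                 parse_dates=[0], usecols=[0,1]).rename(columns=lambda x: x.split("=")[0])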

df
df.columns = df.columns + [str('_%d'%(i)) for i in list(range(len(df.columns)))]

ser = pd.Series(data=df['XAU_1'].values, index=df['XAU_0'].values, name='XAU')
ser

2014-08-25 06:00:05    1200.343
2014-08-25 06:00:05    1200.000
Name: XAU, dtype: float64

type(ser)
pandas.core.series.Series
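If some rows are missing the first timestamp, parse only the other date column in read_csv and convert the first column afterwards with pd.to_datetime:
data = StringIO(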
'''
"XAU=,XAU=,XAG=,XAG="   
"25/08/2014 6:00:05,1200.343,25/08/2014 6:00:03,19.44," 
"25/08/2014 6:00:05,1200,,,"
",,25/08/2014 6:00:05,19.50,"       
''')


df = pd.read_csv(data, quoting=csv.QUOTE_NONE, index_col=False, na_values=[""], 
                 parse_dates=[2]).rename(columns=lambda x: x.strip('"').split("=")[0])

old_cols = df.columns
# Index(['XAU', 'XAU', 'XAG', 'XAG'], dtype='object')

new_cols = list(range(len(df.columns)))
# [0, 1, 2, 3]
df.columns = new_cols

# Converting first column to datetime dtype
df[0] = pd.to_datetime(df[0].str.replace('"', ''))   
df.columns = old_cols

df
df.dtypes

XAU    datetime64[ns]
XAU           float64
XAG    datetime64[ns]
XAG           float64
dtype: object
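
The snippets above rely on quoting=csv.QUOTE_NONE and escapechar to cope with lines that are wrapped in quotes. A different route, sketched here as an assumption and not as the answer's own method, is to strip the wrapping quotes before the text reaches read_csv. The column names (XAU_time, XAG_time, _trailing) are invented for this example so that the duplicate XAU/XAG labels do not clash, and the date format string assumes day-first timestamps as in the sample data.

```python
from io import StringIO

import pandas as pd

raw = '''"XAU=,XAU=,XAG=,XAG="
"25/08/2014 6:00:05,1200.343,25/08/2014 6:00:03,19.44,"
"25/08/2014 6:00:05,1200,,,"
'''

# Drop blank lines and the quotes that wrap every line.
lines = [line.strip().strip('"') for line in raw.splitlines() if line.strip()]

# Skip the original header (lines[0]) and supply unique names instead.
# Each data row ends with a trailing comma, hence the extra dummy column.
names = ['XAU_time', 'XAU', 'XAG_time', 'XAG', '_trailing']
df = pd.read_csv(StringIO('\n'.join(lines[1:])), header=None, names=names)
df = df.drop(columns='_trailing')

# Convert both timestamp columns; empty fields become NaT.
for col in ['XAU_time', 'XAG_time']:
    df[col] = pd.to_datetime(df[col], format='%d/%m/%Y %H:%M:%S')

print(df.dtypes)
```

Unique column names make later selection unambiguous, at the cost of no longer matching the labels in the file.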

A Pythonic way to reshape Pandas.DataFrame's


Tag : python , By : Fenix Drakken
Date : March 29 2020, 07:55 AM
One way is to use cumcount and then pivot_table:
In [11]: df["count"] = df.groupby("label").cumcount()

In [12]: df
Out[12]:
  label  value  count
0     a    0.2      0
1     a    0.1      1
2     a    0.4      2
3     b    0.5      0
4     b    0.2      1
5     b    0.6      2
6     c    0.7      0
7     c    0.9      1
8     c    0.3      2

In [13]: df.pivot_table("value", "count", "label")
Out[13]:
label    a    b    c
count
0      0.2  0.5  0.7
1      0.1  0.2  0.9
2      0.4  0.6  0.3
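If every label has the same number of rows (three here), you can also reshape the underlying values directly:
In [21]: df["value"].values.reshape((-1, 3)).T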
Out[21]:
array([[ 0.2,  0.5,  0.7],
       [ 0.1,  0.2,  0.9],
       [ 0.4,  0.6,  0.3]])
In [22]: pd.DataFrame(df["value"].values.reshape((-1, 3)).T, 
                      columns=df.loc[::3, "label"])
Out[22]:
label    a    b    c
0      0.2  0.5  0.7
1      0.1  0.2  0.9
2      0.4  0.6  0.3
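
For reference, the cumcount and pivot_table steps can be put together as a self-contained script. This is a minimal sketch that rebuilds the sample frame shown in the session above; the variable name wide is an arbitrary choice.

```python
import pandas as pd

df = pd.DataFrame({
    "label": list("aaabbbccc"),
    "value": [0.2, 0.1, 0.4, 0.5, 0.2, 0.6, 0.7, 0.9, 0.3],
})

# Number the rows within each label: 0, 1, 2, ...
df["count"] = df.groupby("label").cumcount()

# One column per label, one row per position within the label.
# pivot_table aggregates with the mean by default, which is harmless
# here because each (count, label) pair occurs exactly once.
wide = df.pivot_table(values="value", index="count", columns="label")
print(wide)
```

Unlike the reshape variant, this does not require every label to have the same number of rows; missing positions simply come out as NaN.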

A 'pythonic' way to generate a seasonal dataframe from a pandas timeseries dataframe


Tag : python , By : RinKaMan
Date : March 29 2020, 07:55 AM
You can use DatetimeIndex.strftime and DatetimeIndex.year; for correct month ordering, use an ordered CategoricalIndex, and finally reshape with pivot:
c = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']

df = pd.pivot(index=pd.CategoricalIndex(df.index.strftime('%b'), ordered=True, categories=c),
              columns=df.index.year,
              values=df['value'])
print (df)

       2015    2016    2017
Jan  201.55  201.65  201.75
Feb  201.60  201.70  201.80
Mar  201.65  201.75  201.85
Apr  201.70  201.80  201.90
May  201.75  201.85  201.95
Jun  201.80  201.90  202.00
Jul  201.85  201.95  202.05
Aug  201.90  202.00  202.10
Sep  201.95  202.05  202.15
Oct  202.00  202.10  202.20
Nov  202.05  202.15  202.25
Dec  202.10  202.20  202.30

df.plot()
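Alternatively, starting again from the original df, add the month and year columns first and then call DataFrame.pivot (recent pandas versions expect a DataFrame as the first argument of pd.pivot, so this form is the safer choice there):
df['months'] = pd.CategoricalIndex(df.index.strftime('%b'), ordered=True, categories=c)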
df['years'] = df.index.year
df = df.pivot(index='months', columns='years',values='value')
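
A runnable sketch of the column-based variant follows. The input series is synthetic (36 month-start timestamps with made-up values), and the month labels are looked up from the month number instead of strftime('%b') so that the result does not depend on the system locale; both choices are assumptions of this example, not part of the answer.

```python
import pandas as pd

months = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun',
          'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']

# Synthetic monthly data for 2015 to 2017 (values are placeholders).
idx = pd.date_range("2015-01-01", periods=36, freq="MS")
ts = pd.DataFrame({"value": range(36)}, index=idx)

# An ordered categorical keeps the months in calendar order
# instead of alphabetical order.
month_labels = pd.Categorical([months[m - 1] for m in ts.index.month],
                              categories=months, ordered=True)

seasonal = (
    ts.assign(months=month_labels, years=ts.index.year)
      .pivot(index="months", columns="years", values="value")
)
print(seasonal)
```

Calling seasonal.plot() then draws one line per year across the twelve months.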

Parse/split URLs in a pandas dataframe using urllib


Tag : python , By : cthulhup
Date : March 29 2020, 07:55 AM
The example you used assumes that the links are already in a dataframe. Here is a solution that starts from a plain list:
import urllib.parse
import pandas as pd

df = pd.DataFrame()
urls = ['https://www.google.com/something','https://mail.google.com/anohtersomething', 'https://www.amazon.com/yetanotherthing']
df['protocol'],df['domain'],df['path'],df['query'],df['fragment'] = zip(*[urllib.parse.urlsplit(x) for x in urls])
  protocol           domain               path query fragment
0    https   www.google.com         /something
1    https  mail.google.com  /anohtersomething
2    https   www.amazon.com   /yetanotherthing

Parse / split many columns to multiple columns in pandas dataframe using custom function


Tag : python , By : ikey
Date : March 29 2020, 07:55 AM
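I've consulted a bunch of previous related SO posts, but I could not adapt them to solve my question.
You can loop over the columns and call your custom function (parse_column here) on those whose names match a pattern:
for c in df.columns:
    if c.endswith('_date'):
        parse_column(df, c)
Or loop over an explicit list of column names:
for c in my_columns_list:
    parse_column(df, c)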
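
The loops above call parse_column without showing it. The sketch below fills that gap with a purely hypothetical implementation that splits a date-string column into year, month and day columns in place; the real function from the question may do something quite different, and the sample frame and column names are invented for illustration.

```python
import pandas as pd


def parse_column(df, col):
    """Hypothetical helper: split df[col] into year/month/day columns in place."""
    parsed = pd.to_datetime(df[col])
    df[col + '_year'] = parsed.dt.year
    df[col + '_month'] = parsed.dt.month
    df[col + '_day'] = parsed.dt.day


# Invented sample data: two date columns and one unrelated column.
df = pd.DataFrame({
    'start_date': ['2020-03-29', '2020-11-29'],
    'end_date': ['2020-04-02', '2020-12-05'],
    'name': ['a', 'b'],
})

# Take a snapshot of the column names, since parse_column adds columns.
for c in list(df.columns):
    if c.endswith('_date'):
        parse_column(df, c)

print(df.columns.tolist())
```

Mutating the frame inside the helper matches the call style used in the loops; returning the new columns and assigning them outside would be an equally valid design.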