Pandas semi-structured JSON DataFrame to a simple Pandas DataFrame
Date : March 29 2020, 07:55 AM
Taking your input string above as a variable named 'data', this Python + pyparsing code will make some sense of it. Unfortunately, the content to the right of the fourth '|' isn't really JSON. Fortunately, it is well enough formatted that it can be parsed without undue discomfort. See the embedded comments in the program below:
from pyparsing import *
from datetime import datetime
# for the most part, we suppress punctuation - it's important at parse time
# but just gets in the way afterwards
LBRACE,RBRACE,COLON,DBLQ,LBRACK,RBRACK = map(Suppress, '{}:"[]')
DBLQ2 = DBLQ + DBLQ
# define some scalar value expressions, including parse-time conversion parse actions
realnum = Regex(r'[+-]?\d+\.\d*').setParseAction(lambda t:float(t[0]))
integer = Regex(r'[+-]?\d+').setParseAction(lambda t:int(t[0]))
timestamp = Regex(r'""\d{4}-\d{2}-\d{2}T\d{2}:\d{2}""')
timestamp.setParseAction(lambda t: datetime.strptime(t[0][2:-2],'%Y-%m-%dT%H:%M'))
string_value = QuotedString('""')
# define our base key ':' value expression; use a Forward() placeholder
# for now for value, since these things can be recursive
key = Optional(DBLQ2) + Word(alphas, alphanums+'_') + DBLQ2
value = Forward()
key_value = Group(key + COLON + value)
# objects can be values too - use the Dict class to capture keys as field names
obj = Group(Dict(LBRACE + OneOrMore(key_value) + RBRACE))
objlist = (LBRACK + ZeroOrMore(obj) + RBRACK)
# define expression for previously-declared value, using <<= operator
value <<= timestamp | string_value | realnum | integer | obj | Group(objlist)
# the outermost objects are enclosed in "s, and list of them can be given with '|' delims
quotedObj = DBLQ + obj + DBLQ
obsList = delimitedList(quotedObj, delim='|')
fields = data.split('|',4)
result = obsList.parseString(fields[-1])
# we get back a list of objects, dump them out
for r in result:
    print(r.dump())
    print()
[['currency', 'EUR'], ['item_id', '143'], ['type', 'FLIGHT'], ['name', 'PAR-FEZ'], ['price', 1111], ['origin', 'PAR'], ['destination', 'FEZ'], ['merchant', 'GOV'], ['flight_type', 'OW'], ['flight_segment', [[['origin', 'ORY'], ['destination', 'FEZ'], ['departure_date_time', datetime.datetime(2015, 8, 2, 7, 20)], ['arrival_date_time', datetime.datetime(2015, 8, 2, 9, 5)], ['carrier', 'AT'], ['f_class', 'ECONOMY']]]]]
- currency: EUR
- destination: FEZ
- flight_segment:
[0]:
[['origin', 'ORY'], ['destination', 'FEZ'], ['departure_date_time', datetime.datetime(2015, 8, 2, 7, 20)], ['arrival_date_time', datetime.datetime(2015, 8, 2, 9, 5)], ['carrier', 'AT'], ['f_class', 'ECONOMY']]
- arrival_date_time: 2015-08-02 09:05:00
- carrier: AT
- departure_date_time: 2015-08-02 07:20:00
- destination: FEZ
- f_class: ECONOMY
- origin: ORY
- flight_type: OW
- item_id: 143
- merchant: GOV
- name: PAR-FEZ
- origin: PAR
- price: 1111
- type: FLIGHT
[['type', 'FLIGHT'], ['name', 'FI_ORY-OUD'], ['item_id', 'FLIGHT'], ['currency', 'EUR'], ['price', 111], ['origin', 'ORY'], ['destination', 'OUD'], ['flight_type', 'OW'], ['flight_segment', [[['origin', 'ORY'], ['destination', 'OUD'], ['departure_date_time', datetime.datetime(2015, 8, 2, 13, 55)], ['arrival_date_time', datetime.datetime(2015, 8, 2, 15, 30)], ['flight_number', 'AT625'], ['carrier', 'AT'], ['f_class', 'ECONOMIC_DISCOUNTED']]]]]
- currency: EUR
- destination: OUD
- flight_segment:
[0]:
[['origin', 'ORY'], ['destination', 'OUD'], ['departure_date_time', datetime.datetime(2015, 8, 2, 13, 55)], ['arrival_date_time', datetime.datetime(2015, 8, 2, 15, 30)], ['flight_number', 'AT625'], ['carrier', 'AT'], ['f_class', 'ECONOMIC_DISCOUNTED']]
- arrival_date_time: 2015-08-02 15:30:00
- carrier: AT
- departure_date_time: 2015-08-02 13:55:00
- destination: OUD
- f_class: ECONOMIC_DISCOUNTED
- flight_number: AT625
- origin: ORY
- flight_type: OW
- item_id: FLIGHT
- name: FI_ORY-OUD
- origin: ORY
- price: 111
- type: FLIGHT
result[0].currency
result[0].price
result[0].destination
result[0].flight_segment[0].origin
len(result[0].flight_segment)  # gives how many segments
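If the end goal is a flat pandas DataFrame, a minimal sketch (assuming the parse above succeeded and each parsed object exposes the field names shown in the dump) flattens each record together with its first flight segment into one row:
import pandas as pd
# Hypothetical flattening step: collect the top-level scalar fields and the
# fields of the first flight segment into one flat dict per parsed object.
rows = []
for r in result:
    seg = r['flight_segment'][0]
    rows.append({
        'currency': r['currency'],
        'item_id': r['item_id'],
        'name': r['name'],
        'price': r['price'],
        'origin': r['origin'],
        'destination': r['destination'],
        'seg_departure': seg['departure_date_time'],
        'seg_arrival': seg['arrival_date_time'],
        'seg_carrier': seg['carrier'],
    })
df = pd.DataFrame(rows)
print(df)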
|
Pandas: how to multi-index an existing DataFrame whose data comes from JSON, and how to alter the JSON object through pandas
Tag : python , By : Fred Morrison
Date : March 29 2020, 07:55 AM
Given files containing JSON objects as below, I think you need:
print (df)
day_time sensor_id customer_id rssi advertiser_id
0 2017-03-17 4000068 76 352 1000001
0 2017-03-17 09:20:17.708 4000068 56 374 1000001
1 2017-03-17 09:20:42.561 4000068 60 392 1000001
0 2017-03-17 09:44:21.728 4000514 76 352 1000001
0 2017-03-17 10:32:45.227 4000461 76 332 1000001
0 2017-03-17 12:47:06.639 4000046 43 364 1000001
0 2017-03-17 12:49:34.438 4000046 62 423 1000001
0 2017-03-17 12:52:28.430 4000072 62 430 1000001
1 2017-03-17 12:52:32.593 4000072 62 394 1000001
0 2017-03-17 12:53:17.708 4000917 76 335 1000001
df['day_time'] = pd.to_datetime(df['day_time']).dt.date
df = df.set_index(['day_time','sensor_id']).sort_index()
print (df)
customer_id rssi advertiser_id
day_time sensor_id
2017-03-17 4000046 43 364 1000001
4000046 62 423 1000001
4000068 76 352 1000001
4000068 56 374 1000001
4000068 60 392 1000001
4000072 62 430 1000001
4000072 62 394 1000001
4000461 76 332 1000001
4000514 76 352 1000001
4000917 76 335 1000001
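For completeness, a minimal sketch of getting from the JSON files to that DataFrame, assuming each file holds one JSON object per line with the field names shown above (the 'data/*.json' pattern is only an example):
import glob
import pandas as pd
# Read every JSON-lines file and concatenate into one DataFrame.
frames = [pd.read_json(path, lines=True) for path in glob.glob('data/*.json')]
df = pd.concat(frames, ignore_index=True)
# Then apply the steps above: truncate the timestamp to a date and
# build the (day_time, sensor_id) MultiIndex.
df['day_time'] = pd.to_datetime(df['day_time']).dt.date
df = df.set_index(['day_time', 'sensor_id']).sort_index()
print(df)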
|
One-to-many joining of pandas DataFrames as JSON instead of as a pandas DataFrame
Date : March 29 2020, 07:55 AM
One possible solution is to define which columns go into the emps lists inside apply:
d = (pd.merge(dept, emp, on = 'dep_id')
.groupby('dep_name').apply(lambda x: x[['emp_name']]
.to_dict('r'))
.reset_index(name='emps'))
print (d)
dep_name emps
0 giraffes [{'emp_name': 'gigi'}]
1 shoes [{'emp_name': 'joe'}, {'emp_name': 'bo'}]
j = d.to_json(orient='records')
print (j)
[{"dep_name":"giraffes","emps":[{"emp_name":"gigi"}]},
{"dep_name":"shoes","emps":[{"emp_name":"joe"},{"emp_name":"bo"}]}]
d = (pd.merge(dept, emp, on = 'dep_id')
.groupby('dep_name').apply(lambda x: x[['emp_name', 'dep_id']]
.to_dict('r'))
.reset_index(name='emps'))
print (d)
dep_name emps
0 giraffes [{'dep_id': 2, 'emp_name': 'gigi'}]
1 shoes [{'dep_id': 1, 'emp_name': 'joe'}, {'dep_id': ...
j = d.to_json(orient='records')
print (j)
[{"dep_name":"giraffes","emps":[{"dep_id":2,"emp_name":"gigi"}]},
{"dep_name":"shoes","emps":[{"dep_id":1,"emp_name":"joe"},{"dep_id":1,"emp_name":"bo"}]}]
dept = pd.DataFrame({'dep_id': [1,2], 'dep_name':['shoes', 'giraffes'], 'def_size':[4,5]})
emp = pd.DataFrame({'dep_id': [1,1,2], 'emp_name': ['joe', 'bo', 'gigi']})
df = pd.merge(dept, emp, on = 'dep_id')
#single columns def_size and dep_name
d = (df.groupby(['def_size','dep_name']).apply(lambda x: x[['emp_name']]
.to_dict('r'))
.reset_index(name='emps'))
print (d)
def_size dep_name emps
0 4 shoes [{'emp_name': 'joe'}, {'emp_name': 'bo'}]
1 5 giraffes [{'emp_name': 'gigi'}]
j = d.to_json(orient='records')
print (j)
[{"def_size":4,"dep_name":"shoes","emps":[{"emp_name":"joe"},{"emp_name":"bo"}]},
{"def_size":5,"dep_name":"giraffes","emps":[{"emp_name":"gigi"}]}]
|
Export pandas dataframe to json and back to a dataframe with columns in the same order
Date : March 29 2020, 07:55 AM
You can use the parameter orient='split' in to_json/read_json, which also saves the column names in the JSON as a list, in their original order:
df = pd.DataFrame({
'C':list('abcdef'),
'B':[4,5,4,5,5,4],
'A':[7,8,9,4,2,3],
})
print (df.to_json(orient='split'))
{"columns":["C","B","A"],
"index":[0,1,2,3,4,5],
"data":[["a",4,7],["b",5,8],
["c",4,9],["d",5,4],["e",5,2],["f",4,3]]}
df.to_json('file.json', orient='split')
df = pd.read_json('file.json', orient='split')
print (df)
C B A
0 a 4 7
1 b 5 8
2 c 4 9
3 d 5 4
4 e 5 2
5 f 4 3
df.to_pickle('file')
df = pd.read_pickle('file')
import json
j = {'columns': df.columns.tolist(), 'data' : df.to_dict(orient='records')}
print (j)
{'columns': ['C', 'B', 'A'],
'data': [{'C': 'a', 'B': 4, 'A': 7},
{'C': 'b', 'B': 5, 'A': 8},
{'C': 'c', 'B': 4, 'A': 9},
{'C': 'd', 'B': 5, 'A': 4},
{'C': 'e', 'B': 5, 'A': 2},
{'C': 'f', 'B': 4, 'A': 3}]}
file = 'file.json'
with open(file, 'w') as f_obj:
json.dump(j, f_obj)
with open(file) as f_obj:
json_data = json.load(f_obj)
df = pd.DataFrame(json_data['data'], columns=json_data['columns'])
print(df)
C B A
0 a 4 7
1 b 5 8
2 c 4 9
3 d 5 4
4 e 5 2
5 f 4 3
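Another option worth knowing about is orient='table', which writes a JSON Table Schema alongside the data and therefore preserves both column order and dtypes on the round trip; a minimal sketch with the same df:
# Round trip through the 'table' orientation; the embedded schema restores
# the columns in their original order when reading back.
df.to_json('file_table.json', orient='table')
df = pd.read_json('file_table.json', orient='table')
print(df)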
|
Best way to transform JSON data within a Pandas dataframe into a dataframe itself
Date : March 29 2020, 07:55 AM
I would call the DataFrame constructor after converting the string to a dict (I think this would be faster):
import ast
pd.DataFrame(df.js.apply(ast.literal_eval).tolist())
import json
pd.DataFrame(df["js"].apply(json.loads).tolist())
k1 k2 k3 k4
0 1 A X NaN
1 2 B X NaN
2 3 A Y NaN
3 4 D NaN M
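If the original frame has other columns worth keeping, a minimal sketch (assuming the JSON strings live in a column named 'js', as above) joins the expanded columns back onto the rest of the frame:
import json
import pandas as pd
# Expand the JSON column into its own DataFrame, then concatenate it
# alongside the remaining columns of the original frame.
expanded = pd.DataFrame(df['js'].apply(json.loads).tolist(), index=df.index)
out = pd.concat([df.drop(columns='js'), expanded], axis=1)
print(out)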
|