Extract a data frame using model.frame and formula
Date : March 29 2020, 07:55 AM
I want to extract a data frame using a formula that specifies which columns to select and some interactions among columns.
You can use model.matrix:
> model.matrix(f, df)
  (Intercept) x y x:y
1           1 1 2   2
2           1 2 3   6
3           1 3 4  12
4           1 4 7  28
attr(,"assign")
[1] 0 1 2 3
> mat <- model.matrix(f, df)
> library(Matrix)
> Matrix(mat, sparse = TRUE)
4 x 4 sparse Matrix of class "dgCMatrix"
  (Intercept) x y x:y
1           1 1 2   2
2           1 2 3   6
3           1 3 4  12
4           1 4 7  28
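As a cross-check on what the x:y column contains, the same matrix can be rebuilt by hand. This is only a sketch in Python with NumPy, not part of the R answer: the values of x and y are read off the printed output (the original df and formula f are not shown), and the name design is made up for illustration.

```python
import numpy as np

# x and y are inferred from the printed model.matrix output;
# the original data frame is not shown in the post.
x = np.array([1, 2, 3, 4])
y = np.array([2, 3, 4, 7])

# Columns: intercept, x, y, and the x:y interaction (elementwise product).
design = np.column_stack([np.ones_like(x), x, y, x * y])
print(design)
```

The last column is simply x multiplied by y row by row, which is what the x:y term in the formula expands to for numeric columns.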
|
retrieve normal data frame after pivoting a data frame
Date : March 29 2020, 07:55 AM
When you call pivot_table, make sure you specify the values parameter:
df.pivot_table(index=['time', 'name'], columns=['feature_type'],
               values='feature_value')
Then call reset_index to turn the index levels back into ordinary columns:
result = df.pivot_table(index=['time', 'name'],
                        columns=['feature_type'],
                        values='feature_value').reset_index()
For example:
import numpy as np
import pandas as pd

np.random.seed(2016)
N = 10
df = pd.DataFrame(
    {'time': np.random.choice(pd.date_range('2016-05-10', '2016-05-12'), size=N),
     'name': np.random.choice(['Clay', 'John', 'Mary', 'Boby', 'Lucy'], size=N),
     'feature_type': np.random.choice(['f{}'.format(i) for i in range(1,6)], size=N),
     'feature_value': np.random.randint(100, size=N)})

# Without values=, the columns carry an extra 'feature_value' level
orig = df.pivot_table(index=['time', 'name'], columns=['feature_type'])
print(orig)

# With values= and reset_index(), the result is a flat data frame
alt = df.pivot_table(index=['time', 'name'],
                     columns=['feature_type'],
                     values='feature_value').reset_index()
alt.columns.name = None  # drop the leftover 'feature_type' label
print(alt)
                feature_value
feature_type               f1    f2    f3    f4    f5
time       name
2016-05-10 John           NaN  50.0   NaN   NaN  91.0
           Lucy           NaN   NaN   NaN  28.0   NaN
           Mary           NaN   NaN  19.0   NaN  27.0
2016-05-11 Clay           2.0   NaN   NaN   NaN   NaN
           Lucy          24.0   NaN   NaN   NaN   NaN
2016-05-12 Boby           NaN  16.0   NaN   NaN   NaN
           John           NaN   NaN   NaN   NaN  62.0
           Mary           NaN   NaN   NaN  84.0   NaN

        time  name    f1    f2    f3    f4    f5
0 2016-05-10  John   NaN  50.0   NaN   NaN  91.0
1 2016-05-10  Lucy   NaN   NaN   NaN  28.0   NaN
2 2016-05-10  Mary   NaN   NaN  19.0   NaN  27.0
3 2016-05-11  Clay   2.0   NaN   NaN   NaN   NaN
4 2016-05-11  Lucy  24.0   NaN   NaN   NaN   NaN
5 2016-05-12  Boby   NaN  16.0   NaN   NaN   NaN
6 2016-05-12  John   NaN   NaN   NaN   NaN  62.0
7 2016-05-12  Mary   NaN   NaN   NaN  84.0   NaN
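Because the example above depends on random data, a smaller deterministic sketch may make the effect of values= and reset_index() easier to verify. The three rows below are copied from the printed output (John's f2 and f5, Clay's f1); the names long_df and wide are made up for illustration.

```python
import pandas as pd

# Three rows taken from the printed output above, in long format.
long_df = pd.DataFrame({
    'time': ['2016-05-10', '2016-05-10', '2016-05-11'],
    'name': ['John', 'John', 'Clay'],
    'feature_type': ['f2', 'f5', 'f1'],
    'feature_value': [50, 91, 2],
})

# values= keeps the columns single-level; reset_index() moves
# 'time' and 'name' out of the index and back into ordinary columns.
wide = long_df.pivot_table(index=['time', 'name'],
                           columns='feature_type',
                           values='feature_value').reset_index()
wide.columns.name = None  # remove the leftover 'feature_type' axis label

print(list(wide.columns))
print(wide)
```

Note that pivot_table aggregates with the mean by default, so the values come back as floats even when the input is integer.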
|
Retrieve rows from data frame for partial matching in column of the data frame with elements in list
Date : March 29 2020, 07:55 AM
Another, simpler solution uses str.split and DataFrame.isin with boolean indexing:
gene_list = ['ARF3', 'ABC']
df1 = df.gene_name.str.split(',', expand=True)
mask = df1.isin(gene_list)
s = df1[mask].dropna(how='all').apply(lambda x: x[x.first_valid_index()], axis=1)
s.name='new'
print (s)
0 ARF3
1 ABC
2 ARF3
3 ARF3
4 ARF3
Name: new, dtype: object
print (df.join(s).dropna(subset=['new']))
   chr             gene_name   new
0    1                  ARF3  ARF3
1    1                   ABC   ABC
2    1          ARF3,ENSG123  ARF3
3    1  ENSG1245,ARF3,ENSG89  ARF3
4    1             ENSG,ARF3  ARF3
gene_list = ['ARF3', 'ABC']
# new DataFrame with the split values, one gene per column
df1 = df.gene_name.str.split(',', expand=True)
# mask: True where a cell holds one of the desired values
mask = df1.isin(gene_list)
# take the first valid (matching) value in each row and build a Series from them
s = df1[mask].dropna(how='all').apply(lambda x: x[x.first_valid_index()], axis=1)
s.name='new'
print (s)
0 ARF3
1 ABC
2 ARF3
3 ARF3
4 ARF3
Name: new, dtype: object
# join the Series to the filtered DataFrame, creating the new column
# (axis must be passed by keyword in recent pandas versions)
print (df[mask.any(axis=1)].join(s))
   chr             gene_name   new
0    1                  ARF3  ARF3
1    1                   ABC   ABC
2    1          ARF3,ENSG123  ARF3
3    1  ENSG1245,ARF3,ENSG89  ARF3
4    1             ENSG,ARF3  ARF3
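The snippets above assume an existing df. A self-contained version might look like the sketch below: the first five rows are reconstructed from the printed output, while the sixth row ('XYZ,ENSG7') is a hypothetical non-matching row added only to show that the filter actually drops something.

```python
import pandas as pd

# Rows 0-4 are reconstructed from the printed output; row 5 is a
# hypothetical non-matching row added for illustration.
df = pd.DataFrame({
    'chr': [1, 1, 1, 1, 1, 1],
    'gene_name': ['ARF3', 'ABC', 'ARF3,ENSG123',
                  'ENSG1245,ARF3,ENSG89', 'ENSG,ARF3', 'XYZ,ENSG7'],
})
gene_list = ['ARF3', 'ABC']

# One gene per column, then a boolean mask of cells that are in gene_list.
parts = df['gene_name'].str.split(',', expand=True)
mask = parts.isin(gene_list)

# Keep only matching cells, drop rows without any match, and take the
# first matching value in each remaining row.
first_match = (parts[mask]
               .dropna(how='all')
               .apply(lambda row: row[row.first_valid_index()], axis=1))
first_match.name = 'new'

# Filter the original rows with the mask and attach the matched gene.
result = df[mask.any(axis=1)].join(first_match)
print(result)
```

Matching on split values rather than with str.contains avoids partial hits such as 'ARF3' matching inside a longer gene name.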
|
How to Retrieve Specific data.frame combination by using another Index data.frame?
Date : March 29 2020, 07:55 AM
I'm doing a data validation project in R. After some calculations I have produced two data frames, registry and indices.
Here's an option:
i <- t(indices)
data.frame(Name = registry[i[,1],1], Grade = registry[i[,2],2])
# Name Grade
#1 Joshi 7
#2 Rahul 2
#3 Sharma 7
# Alternative: index each column of registry with the matching column of i
as.data.frame(Map(`[`, registry, as.data.frame(i)))
# Name Grade
#1 Joshi 7
#2 Rahul 2
#3 Sharma 7
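For comparison, the same positional lookup can be sketched in Python with pandas. The post does not show the contents of registry and indices, so the values below are hypothetical, chosen only so that the result matches the printed output; indices is assumed to hold 1-based positions as in R, with one row for names and one for grades.

```python
import pandas as pd

# Hypothetical data: the original registry and indices are not shown.
registry = pd.DataFrame({'Name': ['Rahul', 'Joshi', 'Sharma'],
                         'Grade': [2, 7, 5]})
indices = pd.DataFrame([[2, 1, 3],   # positions into registry['Name']
                        [2, 1, 2]])  # positions into registry['Grade']

# Transpose to one (name position, grade position) pair per row and
# shift from R-style 1-based positions to 0-based.
pos = indices.T.to_numpy() - 1

result = pd.DataFrame({
    'Name': registry['Name'].to_numpy()[pos[:, 0]],
    'Grade': registry['Grade'].to_numpy()[pos[:, 1]],
})
print(result)
```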
|
How to retrieve rows from data frame 1 that are not in data frame 2 in Scala
Tag : scala , By : antonio
Date : March 29 2020, 07:55 AM
There is an except function that should meet your requirement. Just do:
df1.except(df2)
+------------------------------------+------------------+
|REQ_ID |PRS_ID |
+------------------------------------+------------------+
|048022cc-9c26-4c0d-a9a8-551f4a364510|999999000185298297|
|d2824085-65d3-432f-a4dd-73e31453733a|999999000185266094|
|9c642932-7a95-4bfe-ae75-687af9151fc8|990000000061356494|
|999999000185425636asdasd12321312321 |999999000185425636|
|cd66629d-14db-42df-a558-49e78c3ae320|999999000185320831|
|dc8b5731-8d1a-4394-ae9d-f74098462be4|999999000185250909|
|be1e63ce-cdf6-407d-abf3-f818e0872e92|999999000185254510|
|999999000185392677asdasd12321312321 |999999000185392677|
+------------------------------------+------------------+
// To also drop duplicates on specific columns, either form works:
df1.except(df2).dropDuplicates("REQ_ID", "PRS_ID")
df1.except(df2).dropDuplicates(Seq("REQ_ID", "PRS_ID"))
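For readers working in pandas rather than Spark, a comparable result can be sketched with a left merge and indicator=True. This is a different technique (an anti-join), not the Spark API used above, and the small frames below are hypothetical; only the column names REQ_ID and PRS_ID come from the post.

```python
import pandas as pd

# Hypothetical data; only the column names come from the post.
df1 = pd.DataFrame({'REQ_ID': ['a', 'b', 'c', 'c'],
                    'PRS_ID': ['1', '2', '3', '3']})
df2 = pd.DataFrame({'REQ_ID': ['b'], 'PRS_ID': ['2']})

# A left merge with indicator=True marks each df1 row as 'both' or 'left_only'.
merged = df1.merge(df2.drop_duplicates(),
                   on=['REQ_ID', 'PRS_ID'],
                   how='left',
                   indicator=True)

# Keep rows found only in df1; drop_duplicates mirrors the distinct
# rows that Spark's except returns.
only_in_df1 = (merged[merged['_merge'] == 'left_only']
               .drop(columns='_merge')
               .drop_duplicates()
               .reset_index(drop=True))
print(only_in_df1)
```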
|