
selecting values from a df based on multiple percentages from a different dataframe


Tag : r , By : Frank Rotolo
Date : November 28 2020, 04:01 AM

I am trying to store values from one df into a new dataframe based on percentages from another df.

Using the dput output you provided:
df1 <- structure(list(
  Seq = structure(c(1L, 2L, 2L, 3L, 3L),
                  .Label = c("AAAAAACCAGTCCCAGTTCGGATTG",
                             "AAAAAACCAGTCTCAGTTCGGATTG",
                             "AAAAAACCGGTCACAGTTCAGATTG"),
                  class = "factor"),
  loc = structure(c(2L, 1L, 2L, 1L, 2L), .Label = c("b", "t"), class = "factor"),
  Ball = c(0, 0, 0, 0, 0),
  Cat = c(0, 0, 0, 16.6666666666667, 16.6666666666667),
  Square = c(0, 0, 0, 0, 0),
  Water = c(0, 0, 0, 33.3333333333333, 33.3333333333333)),
  row.names = c(NA, -5L),
  class = c("grouped_df", "tbl_df", "tbl", "data.frame"),
  vars = c("Seq", "loc"), drop = TRUE,
  indices = list(0L, 1L, 2L, 3L, 4L),
  group_sizes = c(1L, 1L, 1L, 1L, 1L),
  biggest_group_size = 1L,
  labels = structure(list(
    Seq = structure(c(1L, 2L, 2L, 3L, 3L),
                    .Label = c("AAAAAACCAGTCCCAGTTCGGATTG",
                               "AAAAAACCAGTCTCAGTTCGGATTG",
                               "AAAAAACCGGTCACAGTTCAGATTG"),
                    class = "factor"),
    loc = structure(c(2L, 1L, 2L, 1L, 2L), .Label = c("b", "t"), class = "factor")),
    row.names = c(NA, -5L), class = "data.frame",
    vars = c("Seq", "loc"), drop = TRUE))


df2 <- structure(list(
  Type = c("Ball", "Cat", "Square", "Water"),
  n = c(4L, 6L, 3L, 6L),
  `n/2` = c(50, 50, 66.6666666666667, 50),
  `1/n` = c(25, 16.6666666666667, 33.3333333333333, 16.6666666666667)),
  row.names = c(NA, -4L),
  class = c("tbl_df", "tbl", "data.frame"))
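library(tidyverse)

# Convert to plain data frames so that [, cols] indexing behaves predictably
df1 <- as.data.frame(df1)
df2 <- as.data.frame(df2)

# For each Type in df2, label every df1 value in that column:
# "1" if below `1/n`, "3" if above `n/2`, "2" otherwise
df3 <- sapply(seq_along(df2$Type), 
       function(y) sapply(df1[, df2$Type][, y], 
                          function(x) ifelse(x < df2[y, "1/n"], "1", 
                                             ifelse(x > df2[y, "n/2"], "3", 
                                                    "2"))))

# Name the columns after the types, then prepend the identifier columns
# (as_tibble() replaces the deprecated as_data_frame())
colnames(df3) <- df2$Type
df3 <- df3 %>% as_tibble() %>% 
  add_column(Seq = df1[, "Seq"], loc = df1[, "loc"], .before = 1)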

df3
# A tibble: 5 x 6
  Seq                       loc   Ball  Cat   Square Water
  <fct>                     <fct> <chr> <chr> <chr>  <chr>
1 AAAAAACCAGTCCCAGTTCGGATTG t     1     1     1      1    
2 AAAAAACCAGTCTCAGTTCGGATTG b     1     1     1      1    
3 AAAAAACCAGTCTCAGTTCGGATTG t     1     1     1      1    
4 AAAAAACCGGTCACAGTTCAGATTG b     1     2     1      2    
5 AAAAAACCGGTCACAGTTCAGATTG t     1     2     1      2 
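
For readers following the pandas posts further down, the same thresholding can be sketched in Python. This is only an illustration under stated assumptions, not the answer's method: it rebuilds df1 and df2 by hand from the dput values above and swaps the nested sapply/ifelse for numpy.select.

```python
import numpy as np
import pandas as pd

# Data re-typed by hand from the dput output above
df1 = pd.DataFrame({
    "Seq": ["AAAAAACCAGTCCCAGTTCGGATTG", "AAAAAACCAGTCTCAGTTCGGATTG",
            "AAAAAACCAGTCTCAGTTCGGATTG", "AAAAAACCGGTCACAGTTCAGATTG",
            "AAAAAACCGGTCACAGTTCAGATTG"],
    "loc": ["t", "b", "t", "b", "t"],
    "Ball": [0.0, 0.0, 0.0, 0.0, 0.0],
    "Cat": [0.0, 0.0, 0.0, 16.6666666666667, 16.6666666666667],
    "Square": [0.0, 0.0, 0.0, 0.0, 0.0],
    "Water": [0.0, 0.0, 0.0, 33.3333333333333, 33.3333333333333],
})
df2 = pd.DataFrame({
    "Type": ["Ball", "Cat", "Square", "Water"],
    "n/2": [50, 50, 66.6666666666667, 50],
    "1/n": [25, 16.6666666666667, 33.3333333333333, 16.6666666666667],
})

# Per-type thresholds, indexed by the column name they apply to
thresholds = df2.set_index("Type")
low, high = thresholds["1/n"], thresholds["n/2"]

# Value columns of df1, in the same order as the types in df2
vals = df1[list(low.index)]

# Compare each column against its own threshold (Series aligned on columns)
below = vals.lt(low, axis=1).to_numpy()
above = vals.gt(high, axis=1).to_numpy()

# "1" below `1/n`, "3" above `n/2`, "2" otherwise
codes = np.select([below, above], ["1", "3"], default="2")

df3 = pd.DataFrame(codes, columns=vals.columns, index=df1.index)
df3.insert(0, "loc", df1["loc"])
df3.insert(0, "Seq", df1["Seq"])
print(df3)
```

Values that sit exactly on a threshold (the 16.67 in Cat) are neither below `1/n` nor above `n/2`, so they get "2", matching the R output.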


selecting rows based on multiple column values in pandas dataframe


Tag : python , By : Lucyberad
Date : March 29 2020, 07:55 AM
I have a pandas DataFrame df. I think the code below should do it, though its elegance is up for debate.
new_df = old_df[((old_df['C1'] > 0) & (old_df['C1'] < 20)) & ((old_df['C2'] > 0) & (old_df['C2'] < 20)) & ((old_df['C3'] > 0) & (old_df['C3'] < 20))]
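
If the same range check applies to every column, one possibly tidier variant is to compare all columns at once and reduce with all(axis=1). The sketch below uses a made-up old_df, since the original DataFrame is not shown in the post.

```python
import pandas as pd

# Hypothetical data; the original post's DataFrame is not shown
old_df = pd.DataFrame({
    "C1": [5, 25, 10, 0, 19],
    "C2": [1, 2, 19, 3, 19],
    "C3": [10, 10, 20, 5, 1],
})

cols = ["C1", "C2", "C3"]

# Element-wise test: strictly between 0 and 20
in_range = (old_df[cols] > 0) & (old_df[cols] < 20)

# Keep only the rows where every listed column passes the test
new_df = old_df[in_range.all(axis=1)]
print(new_df)
```

This gives the same rows as chaining the six comparisons by hand, and adding a column only means extending cols.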

Python Pandas: Spread total values in dataframe based on percentages back to categories


Tag : python , By : Yst
Date : March 29 2020, 07:55 AM
I don't know if you can do this more elegantly, but you could do it like this:
>>> df_tot = df[df['Cat'] == 'tot cat'].reindex(index=df.index, method='backfill')
>>> for x in df.columns:
...     if 'val' in x:
...         df[x] = df['Percentage'] * df_tot[x] / 100
...
>>> df
   Country State City      Cat  Total_amount  Percentage    val1     val2    val3     val4
0       US    FL  MIA     cat1           100       10.00    20.0    3.000   40.00   12.000
1       US    FL  MIA     cat2           850       85.00   170.0   25.500  340.00  102.000
2       US    FL  MIA     cat3            50        5.00    10.0    1.500   20.00    6.000
3       US    FL  MIA  tot cat          1000      100.00   200.0   30.000  400.00  120.000
4       US    GA  ATL     cat1           200       40.00    40.0   20.000    8.00   12.000
5       US    GA  ATL     cat2           300       60.00    60.0   30.000   12.00   18.000
6       US    GA  ATL  tot cat           500      100.00   100.0   50.000   20.00   30.000
7       US    NY   NY  tot cat           100      100.00     0.0   20.000    5.00   15.000
8   Canada    MB  WPG     cat1           250       50.00    25.0  275.000   20.00   80.000
9   Canada    MB  WPG     cat2           250       50.00    25.0  275.000   20.00   80.000
10  Canada    MB  WPG  tot cat           500      100.00    50.0  550.000   40.00  160.000
11  Canada    QC  YUL     cat1           500       33.33   333.3   83.325  166.65   19.998
12  Canada    QC  YUL     cat2          1000       66.66   666.6  166.650  333.30   39.996
13  Canada    QC  YUL  tot cat          1500      100.00  1000.0  250.000  500.00   60.000
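
To see what the backfill step does, here is a minimal sketch on a made-up five-row frame. The data is illustrative only, and it assumes, as in the post, that each group's 'tot cat' row comes after its category rows.

```python
import pandas as pd

# Hypothetical data: two groups, each closed by a 'tot cat' row
df = pd.DataFrame({
    "Cat": ["cat1", "cat2", "tot cat", "cat1", "tot cat"],
    "Percentage": [25.0, 75.0, 100.0, 100.0, 100.0],
    "val1": [0.0, 0.0, 200.0, 0.0, 50.0],
})

# Keep only the total rows, then reindex to the full index.
# 'backfill' fills each missing row from the next total row below it.
df_tot = df[df["Cat"] == "tot cat"].reindex(index=df.index, method="backfill")

# Spread each total back to its categories by percentage
df["val1"] = df["Percentage"] * df_tot["val1"] / 100
print(df)
```

After the reindex, rows 0 to 2 carry the total 200 and rows 3 to 4 carry the total 50, so each category receives its percentage share of the right total.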

Selecting rows from a Dataframe based on values in multiple columns in pandas


Tag : python , By : Alex
Date : March 29 2020, 07:55 AM
Only a very small change is needed in your code: replace and with & (and add parentheses so that the comparisons are evaluated before the &):
In [104]: df.loc[(df['A'] == 'foo') & (df['B'] == 'one')]
Out[104]:
     A    B  C   D
0  foo  one  0   0
6  foo  one  6  12
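
A short sketch of why the change matters. The DataFrame here is an assumed reconstruction chosen to be consistent with the output shown above. Python's and asks each Series for a single truth value, which pandas refuses to give, while & combines the two masks element by element.

```python
import pandas as pd

# Assumed sample data, chosen to be consistent with the output above
df = pd.DataFrame({
    "A": ["foo", "bar", "foo", "bar", "foo", "bar", "foo", "foo"],
    "B": ["one", "one", "two", "three", "two", "two", "one", "three"],
    "C": list(range(8)),
    "D": [2 * i for i in range(8)],
})

# 'and' tries to reduce a whole Series to one bool and raises ValueError
try:
    df.loc[(df["A"] == "foo") and (df["B"] == "one")]
    and_failed = False
except ValueError:
    and_failed = True

# '&' works element-wise; the parentheses are required because
# & binds more tightly than ==
result = df.loc[(df["A"] == "foo") & (df["B"] == "one")]
print(result)
```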

Selecting rows from a Dataframe based on values from multiple columns in pandas


Tag : python , By : user87752
Date : March 29 2020, 07:55 AM
I think I understand your modified question. After sub-selecting rows on a condition on B, you can select the columns you want, for example:
In [1]: df.loc[df.B =='two'][['A', 'B']]
Out[1]: 
     A    B
2  foo  two
4  foo  two
5  bar  two
In [2]: df.loc[df.B =='two'].A.sum()  # <-- use .mean() for your quarterly data
Out[2]: 'foofoobar'
In [3]: df.groupby('B').apply(lambda x: x.A.sum())
Out[3]: 
B
one      foobarfoo
three       barfoo
two      foofoobar
dtype: object
Conditions on several columns can also be combined with np.logical_and:
In [1]: df.loc[np.logical_and(df.A == 'foo', df.B == 'two')]
Out[1]: 
     A    B  C  D
2  foo  two  2  4
4  foo  two  4  8
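
The row filter and the column selection can also be done in a single .loc call, which avoids chained indexing. This is a sketch on assumed data consistent with the session above, aggregating the numeric column C rather than the strings in A.

```python
import pandas as pd

# Assumed sample data, consistent with the outputs in the session above
df = pd.DataFrame({
    "A": ["foo", "bar", "foo", "bar", "foo", "bar", "foo", "foo"],
    "B": ["one", "one", "two", "three", "two", "two", "one", "three"],
    "C": list(range(8)),
    "D": [2 * i for i in range(8)],
})

mask = df["B"] == "two"

# Rows and columns selected in one step: .loc[row_mask, column_list]
subset = df.loc[mask, ["A", "B"]]

# Aggregate one column over the filtered rows
total_c = df.loc[mask, "C"].sum()

# The same aggregate for every value of B at once
per_group = df.groupby("B")["C"].sum()
print(subset, total_c, per_group, sep="\n")
```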

selecting values from a pandas dataframe based on row and column labels stored in a different dataframe


Tag : python , By : Anthony Eden
Date : March 29 2020, 07:55 AM
Use lookup (note that DataFrame.lookup was deprecated in pandas 1.2.0 and removed in pandas 2.0):
df1['PD'] = df2.lookup(df1.RATING,df1.TENOR.astype(str))
     RATING  TENOR        PD
ID                          
1234    BBB    2.0  0.005100
2345    BB+    1.5  0.009106
3456   BBB-    1.0  0.002800
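
On pandas versions without DataFrame.lookup, one possible replacement is to translate the labels to integer positions with Index.get_indexer and index the underlying NumPy array. The sketch below uses made-up probabilities in df2, since the post does not show the original table. Only the shape (ratings as the index, tenors as string column labels) is taken from the answer.

```python
import pandas as pd

# Hypothetical lookup table: ratings as index, tenors as string columns
df2 = pd.DataFrame(
    [[0.01, 0.02, 0.03],
     [0.04, 0.05, 0.06],
     [0.07, 0.08, 0.09]],
    index=["BBB", "BB+", "BBB-"],
    columns=["1.0", "1.5", "2.0"],
)

df1 = pd.DataFrame(
    {"RATING": ["BBB", "BB+", "BBB-"], "TENOR": [2.0, 1.5, 1.0]},
    index=pd.Index([1234, 2345, 3456], name="ID"),
)

# Translate row and column labels into integer positions
rows = df2.index.get_indexer(df1["RATING"])
cols = df2.columns.get_indexer(df1["TENOR"].astype(str))

# get_indexer returns -1 for labels it cannot find; fail loudly instead
# of silently picking the last row or column
if (rows < 0).any() or (cols < 0).any():
    raise KeyError("some RATING/TENOR pairs are missing from df2")

# Pick one value per (row, column) pair
df1["PD"] = df2.to_numpy()[rows, cols]
print(df1)
```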