
How to summarise taking a random value from a categorical column?


Tag : r , By : TheDave1022
Date : November 23 2020, 04:01 AM

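Since you are already using dplyr, you can group by the categorical column and use sample_n() to draw one random row per group. In dplyr 1.0.0 and later, slice_sample(n = 1) is the recommended replacement for sample_n(1).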
library(dplyr)

df %>%
   group_by(spp) %>%
   sample_n(1)
# A tibble: 2 x 2
# Groups:   spp [2]
#   spp   values
#   <chr>  <dbl>
# 1 a          2
# 2 b          9
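
For readers working in Python, the same idea (group the rows, then pick one value at random from each group) can be sketched with only the standard library. The data and the helper name pick_one_per_group are made up for illustration, and which values come back depends on the seed.

```python
import random
from collections import defaultdict


def pick_one_per_group(rows, rng):
    """Return {group: one randomly chosen value} for (group, value) pairs."""
    by_group = defaultdict(list)
    for group, value in rows:
        by_group[group].append(value)
    # One independent random draw per group.
    return {group: rng.choice(values) for group, values in by_group.items()}


# Hypothetical data with the same shape as df (spp, values).
rows = [("a", 1), ("a", 2), ("a", 3), ("b", 7), ("b", 8), ("b", 9)]
picked = pick_one_per_group(rows, random.Random(0))
print(picked)
```

Passing a seeded random.Random makes the draw repeatable; in R, calling set.seed() before sample_n() does the same job.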


How to plot stacked bar chart to summarise each categorical column for proportion of values


Tag : development , By : kgw
Date : March 29 2020, 07:55 AM
Drop the irrelevant columns, then map every value that is not 'Missing' or 'Unknown' to 'Other', so that all values fall in ('Missing', 'Unknown', 'Other'). Call value_counts(normalize=True) on each column to get proportions. A category that does not occur in a column comes back as NaN rather than 0, so finish with fillna(0). The result already holds the data you need, so transpose it and plot.
result = (df[['action', 'action_type', 'action_detail']]
 .where(df.isin(('Missing', 'Unknown')), 'Other')
 .apply(lambda x: x.value_counts(normalize=True))
 .fillna(0))
print(result)

         action  action_type  action_detail
Missing       0          0.5            0.5
Other         1          0.5            0.5

result.T.plot(kind='bar', stacked=True)

Summarise data between given value of a categorical variable


Tag : r , By : mobi phil
Date : March 29 2020, 07:55 AM
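You could use cumsum() to build the groups and then summarise within them. Each "A" row increments a running counter, so every row is labelled with the number of "A" rows seen so far; filtering to the "B" rows then leaves one group per stretch of "B"s: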
df %>% mutate(Agroups = cumsum(categoriesVector == "A")) %>%
    filter(categoriesVector == "B") %>%
    group_by(Agroups) %>%
    summarise(propertyStart = min(propertyVector),
              propertyEnd = max(propertyVector),
              dataTotal = sum(dataVector))

# A tibble: 3 x 4
  Agroups propertyStart propertyEnd dataTotal
    <int>         <dbl>       <dbl>     <dbl>
1       2             3           3       700
2       3             5           7      1200
3       4             9           9       100
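
To see what the cumsum() step is doing, here is a minimal Python sketch of the same logic using itertools.accumulate. The input rows are invented for illustration (they are not the question's data), and the comments only point back to the matching step in the R code.

```python
from itertools import accumulate

# Hypothetical rows: (category, property, data)
rows = [
    ("A", 1, 10),
    ("B", 2, 100),
    ("B", 3, 200),
    ("A", 4, 20),
    ("B", 5, 300),
    ("A", 6, 30),
    ("A", 7, 40),
    ("B", 8, 400),
]

# Running count of "A" rows: the equivalent of cumsum(categoriesVector == "A").
group_ids = list(accumulate(int(cat == "A") for cat, _, _ in rows))

summary = {}
for gid, (cat, prop, data) in zip(group_ids, rows):
    if cat != "B":  # same role as filter(categoriesVector == "B")
        continue
    start, end, total = summary.get(gid, (prop, prop, 0))
    summary[gid] = (min(start, prop), max(end, prop), total + data)

for gid, (start, end, total) in summary.items():
    print(gid, start, end, total)
```

A group with no "B" rows (group 3 here) drops out of the summary, which is why the group numbers in the R output need not start at 1 or be consecutive.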

dplyr, summarise categorical variable


Tag : r , By : Denis Chaykovskiy
Date : March 29 2020, 07:55 AM
You have at least two options to solve this. The first is to add the Category column to your group_by():
small %>% 
  group_by(Video.ID, cat = Category) %>% 
  summarise(sumr = sum(Partner.Revenue),
            len = mean(Video.Duration..sec.))

# A tibble: 1 x 4
# Groups:   Video.ID [?]
#     Video.ID    cat      sumr   len
#     <chr>       <chr>   <dbl> <dbl>
#   1 ---0zh9uzSE gadgets     0  1184

The second is to keep grouping by Video.ID only and return the category inside summarise() with unique(Category). This relies on each Video.ID having exactly one Category:
small %>% 
  group_by(Video.ID) %>% 
  summarise(sumr = sum(Partner.Revenue),
            len = mean(Video.Duration..sec.),
            cat = unique(Category))

# A tibble: 1 x 4
#   Video.ID     sumr   len cat    
#   <chr>       <dbl> <dbl> <chr>  
# 1 ---0zh9uzSE     0  1184 gadgets

Create new column filled with random elements based on a categorical column


Tag : python , By : codelurker
Date : March 29 2020, 07:55 AM
I could not find a vectorised solution, so this one iterates over the index and fills in New1 and New2 row by row. New1 is a random ID from a different row in the same category; New2 is a random ID from a different category.
I believe this gives the result you are looking for.
for i in df.index:
    # Grab the category variable for each row.
    cat = df.loc[i,'Cat']

    # Set column New1
    mask1 = df['Cat'] == cat
    mask2 = df.index != i
    df.at[i,'New1']= df[mask1 & mask2]["ID"].sample().iloc[0]

    # Set column New2
    mask3 = df['Cat'] != cat
    df.at[i,'New2']= df[mask3]["ID"].sample().iloc[0]

Because the picks are random, each run gives a different result. Two example runs:

   ID Cat  New1  New2
0  87   A  56.0  76.0
1  56   A  87.0  36.0
2  67   A  56.0  76.0
3  76   D  36.0  87.0
4  36   D  76.0  87.0

   ID Cat  New1  New2
0  87   A  67.0  36.0
1  56   A  87.0  36.0
2  67   A  87.0  76.0
3  76   D  36.0  67.0
4  36   D  76.0  67.0
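
The per-row rule can also be written without pandas, which may make the two masks easier to follow. The sketch below is plain Python over the five ID/Cat rows shown in the output; the helper name add_random_picks is hypothetical. Like the loop above, it assumes every category has at least two rows and that there are at least two categories, otherwise there is nothing to pick from.

```python
import random


def add_random_picks(records, rng):
    """Attach New1 (same category, different row) and New2 (other category)."""
    out = []
    for i, rec in enumerate(records):
        # Same category, excluding the current row (mask1 & mask2 above).
        same = [r["ID"] for j, r in enumerate(records)
                if r["Cat"] == rec["Cat"] and j != i]
        # Any row from a different category (mask3 above).
        other = [r["ID"] for r in records if r["Cat"] != rec["Cat"]]
        out.append({**rec, "New1": rng.choice(same), "New2": rng.choice(other)})
    return out


records = [
    {"ID": 87, "Cat": "A"},
    {"ID": 56, "Cat": "A"},
    {"ID": 67, "Cat": "A"},
    {"ID": 76, "Cat": "D"},
    {"ID": 36, "Cat": "D"},
]
result = add_random_picks(records, random.Random(42))
for row in result:
    print(row)
```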

Summarise based on categorical runs


Tag : r , By : Tim Coffman
Date : March 29 2020, 07:55 AM
We can build a run identifier with lag() and cumsum(), which increments each time fruit differs from the previous row, and then calculate statistics for each run.
library(dplyr)

test %>%
  group_by(group = cumsum(fruit != lag(fruit, default = first(fruit)))) %>%
  summarise(fruit = first(fruit), 
            duration = n(), 
            mean_temp = mean(temp)) %>%
  select(-group)

#  fruit  duration mean_temp
#  <fct>     <int>     <dbl>
#1 apple         2      91  
#2 banana        3     101  
#3 guava         4      94.8
#4 apple         3      92  
#5 banana        1      92  
#6 guava         1     101  

The grouping step can also be written with data.table::rleid():
group_by(group = data.table::rleid(fruit))

or in base R with rle():
group_by(group = with(rle(as.character(fruit)), rep(seq_along(values), lengths)))

A pure data.table version:
library(data.table)
setDT(test)[, .(duration = .N, fruit = fruit[1L], 
                mean_temp = mean(temp)), by = rleid(fruit)]
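
The run-length idea behind rleid() is not specific to R. As a rough illustration, itertools.groupby in Python groups consecutive equal values in the same way. The readings below are invented and are not the test data frame from the question.

```python
from itertools import groupby
from statistics import mean

# Hypothetical (fruit, temp) readings in time order.
readings = [
    ("apple", 90), ("apple", 92),
    ("banana", 100), ("banana", 101), ("banana", 102),
    ("apple", 95),
]

# groupby() starts a new group whenever the key changes, so a fruit that
# reappears later forms a separate run rather than merging with the first.
runs = []
for fruit, group in groupby(readings, key=lambda r: r[0]):
    temps = [temp for _, temp in group]
    runs.append((fruit, len(temps), mean(temps)))

for run in runs:
    print(run)
```

Each tuple holds the fruit, the run length, and the mean temperature, matching the fruit, duration, and mean_temp columns in the R output.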