Merging of 2 factors in R with large no. of levels
Date : March 29 2020, 07:55 AM
The asker has a 13 GB plain-text file, data_table_complete, with over 100 columns, one of which holds a color. Suppose this is your data frame:
df <- data.frame(color = factor(c(rep("red", 4), rep("OTHERS", 4), rep("blue", 5), rep("OTHETRS", 5))))
table(df$color)
# blue OTHERS OTHETRS red
# 5    4      5       4
# Merge the levels "OTHERS" and "OTHETRS" into a single level
df$color <- factor(ifelse(df$color == "OTHERS" | df$color == "OTHETRS", "OTHETRS", as.character(df$color)))
table(df$color)
# blue OTHETRS red
# 5    9       4
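The same merge can also be written by renaming levels directly, since giving two levels of a factor the same name combines them. A minimal sketch in base R, reusing the toy data frame from the answer; merging under the name "OTHERS" is an assumption about the intended spelling, not something the answer states.

```r
# Toy data from the answer: two spellings of what is assumed to be one level.
df <- data.frame(color = factor(c(rep("red", 4), rep("OTHERS", 4),
                                  rep("blue", 5), rep("OTHETRS", 5))))

# Assigning the same name to two levels merges them into one.
# The target name "OTHERS" is an assumption made for this sketch.
levels(df$color)[levels(df$color) %in% c("OTHERS", "OTHETRS")] <- "OTHERS"

counts <- table(df$color)
print(counts)
```

With many levels to merge, the vector passed to %in% can simply list all the names that should end up under the same label.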
|
Merging multiple rows with multiple factors to create a new row in a dataset
Date : March 29 2020, 07:55 AM
This answer uses aggregate() and does not require any other package. Replace df with the name of your data frame.
df$VegType <- factor(df$VegType)
# Collapse Shrub, Sapling and Vine into a single WoodyVeg level
levels(df$VegType) <- list(WoodyVeg = c("Shrub", "Sapling", "Vine"), Forb = c("Forb"), Grass = c("Grass"))
# Sum columns 4 to 13 within each TranID / PT / VegType group
df1 <- aggregate(df[, 4:13], by = list(df$TranID, df$PT, df$VegType), FUN = sum)
names(df1) <- names(df)
# Show the result ordered by PT
df1[order(df1$PT), ]
TranID PT VegType  Int1 Int2 Int3 Int4 Int5 Int6 Int7 Int8 Int9 Int10
1      1M WoodyVeg 1    0    2    1    6    9    0    0    5    0
1      1M Forb     0    1    0    0    0    0    0    0    0    0
1      1M Grass    1    1    1    0    0    0    0    0    0    0
1      2M WoodyVeg 1    0    2    1    6    9    0    0    5    0
1      2M Forb     0    1    0    0    0    0    0    0    0    0
1      2M Grass    1    1    1    0    0    0    0    0    0    0
1      3M WoodyVeg 1    0    2    1    6    9    0    0    5    0
1      3M Forb     0    1    0    0    0    0    0    0    0    0
1      3M Grass    1    1    1    0    0    0    0    0    0    0
1      4M WoodyVeg 1    0    2    1    6    9    0    0    5    0
1      4M Forb     0    1    0    0    0    0    0    0    0    0
1      4M Grass    1    1    1    0    0    0    0    0    0    0
1      5M WoodyVeg 1    0    2    1    6    9    0    0    5    0
1      5M Forb     0    1    0    0    0    0    0    0    0    0
1      5M Grass    1    1    1    0    0    0    0    0    0    0
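The answer does not show its input data, so here is a small self-contained sketch of the same approach. The data frame below is hypothetical (two Int columns instead of ten, invented values), and the by list is named so that aggregate() labels the grouping columns itself.

```r
# Hypothetical input: invented values, only two count columns.
df <- data.frame(
  TranID  = 1,
  PT      = rep(c("1M", "2M"), each = 4),
  VegType = rep(c("Shrub", "Vine", "Forb", "Grass"), times = 2),
  Int1    = c(1, 0, 0, 1, 2, 1, 0, 0),
  Int2    = c(0, 2, 1, 1, 0, 0, 3, 1)
)

# Merge the woody types into one level; levels missing from the data
# (here "Sapling") are simply ignored.
df$VegType <- factor(df$VegType)
levels(df$VegType) <- list(WoodyVeg = c("Shrub", "Sapling", "Vine"),
                           Forb = "Forb", Grass = "Grass")

# Sum the count columns within each TranID / PT / VegType group.
df1 <- aggregate(df[, c("Int1", "Int2")],
                 by = list(TranID = df$TranID, PT = df$PT, VegType = df$VegType),
                 FUN = sum)
df1 <- df1[order(df1$PT), ]
print(df1)
```

Naming the elements of the by list is a design choice for this sketch; it replaces the separate names(df1) <- names(df) step.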
|
Recalculate the new weighted mean when merging two factors by group, and keep original data
Date : March 29 2020, 07:55 AM
This was admittedly challenging; you should reconsider the data structure.
library(tidyverse)
set.seed(123)
df <- data.frame(ID = 1:20,
                 total_X = runif(20),
                 min_X = runif(20),
                 max_X = runif(20),
                 mean_X = runif(20),
                 total_Y = runif(20),
                 min_Y = runif(20),
                 max_Y = runif(20),
                 mean_Y = runif(20),
                 Counts = runif(20) * 1000,
                 category = rep(letters[1:5], 4),
                 file = as.factor(sort(rep(1:4, 5))))
x <- df %>% bind_rows(
  gather(df, metric, value, -ID, -file, -category, -Counts) %>%
    # split e.g. "mean_X" into metric "mean" and group "X"
    mutate(group = str_extract(metric, "[A-Z]$"), metric = str_replace(metric, "_.$", "")) %>%
    filter(category %in% c('a', 'b')) %>%
    spread(metric, value) %>%
    group_by(file, group) %>%
    # compute the weighted mean first: once Counts is overwritten by its
    # group mean, sum(Counts * mean) / sum(Counts) would reduce to sum(mean)
    summarise(mean = sum(Counts * mean) / sum(Counts),
              category = paste0(category, collapse = ''),
              max = max(max),
              min = min(min),
              total = sum(total),
              Counts = mean(Counts)) %>%
    ungroup() %>%
    # back to wide format with the original column names (mean_X, mean_Y, ...)
    gather(metric, value, -file, -group, -category, -Counts) %>%
    mutate(metric = paste(metric, group, sep = '_'), group = NULL) %>%
    spread(metric, value) %>%
    mutate(ID = 0)
) %>% mutate(ID = row_number())
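The key quantity in the pipeline is the count-weighted mean, sum(Counts * mean) / sum(Counts). As a small worked example with invented numbers (the values below are not from the answer), merging a category with mean 2 and 10 counts with one of mean 4 and 30 counts gives (10 * 2 + 30 * 4) / 40 = 3.5, which matches base R's weighted.mean().

```r
# Invented per-category means and counts, for illustration only.
means  <- c(2, 4)
counts <- c(10, 30)

# Weighted mean of the merged category: (10 * 2 + 30 * 4) / (10 + 30)
merged_mean <- sum(counts * means) / sum(counts)
print(merged_mean)

# Base R's weighted.mean() computes the same quantity.
same <- isTRUE(all.equal(merged_mean, weighted.mean(means, counts)))
print(same)
```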
|
merging incomplete duplicate rows
Date : March 29 2020, 07:55 AM
To sample one row from each group after grouping by 'dates' and 'co.name', use sample() inside slice():
library(dplyr)
df %>%
  group_by(dates, co.name) %>%
  slice(sample(row_number(), 1))
Alternatively, sample_n() does the same:
df %>%
  group_by(dates, co.name) %>%
  sample_n(1)
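In dplyr 1.0.0 and later, sample_n() is superseded by slice_sample(), so the same idea can be sketched as follows. The data frame is hypothetical (the question's data is not shown here), and the example assumes that keeping one random row per (dates, co.name) pair is acceptable.

```r
suppressPackageStartupMessages(library(dplyr))

# Hypothetical data with duplicate (dates, co.name) pairs.
df <- data.frame(
  dates   = c("2020-01-01", "2020-01-01", "2020-01-02", "2020-01-02", "2020-01-02"),
  co.name = c("A", "A", "B", "B", "B"),
  value   = c(1, NA, 3, 4, NA)
)

set.seed(1)  # make the random pick reproducible
one_per_group <- df %>%
  group_by(dates, co.name) %>%
  slice_sample(n = 1) %>%   # one random row from each group
  ungroup()
print(one_per_group)
```

Which row survives is random, so any values that differ between the duplicates are lost for the rows that are dropped.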
|
merging data frames while assigning factors to missing data
Date : March 29 2020, 07:55 AM
|