Comparing two columns in a data frame across many rows
Tag : r , By : adbanginwar
Date : March 29 2020, 07:55 AM
I have a data frame in which I'd like to compare a data point, Genotype, with two references, S288C and SK1. This comparison will be done across many rows (100+) of the data frame. Here are the first few lines of my data frame:
A nested ifelse should do it (take a look at help(ifelse) for usage). It returns 1 where Genotype matches S288C, 0 where it matches SK1, and NA where it matches neither:
ifelse(dat$Genotype==dat$S288C,1,ifelse(dat$Genotype==dat$SK1,0,NA))
> dat
     Genotype S288C SK1
[1,] "G"      "A"   "G"
[2,] "G"      "A"   "G"
[3,] "C"      "T"   "C"
[4,] "G"      "A"   "G"
[5,] "G"      "G"   "T"
[6,] "G"      "A"   "A"
> ifelse(dat$Genotype==dat$S288C,1,ifelse(dat$Genotype==dat$SK1,0,NA))
[1] 0 0 0 0 1 NA
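The printout above, with [1,] row labels and quoted values, looks like a character matrix, and $ does not work on a matrix. The following is a minimal sketch that assumes the six rows are stored as a data frame instead; the construction of dat and the column name match are illustrative and not part of the original post.

```r
# Hypothetical reconstruction of the six rows shown above, as a data frame
dat <- data.frame(
  Genotype = c("G", "G", "C", "G", "G", "G"),
  S288C    = c("A", "A", "T", "A", "G", "A"),
  SK1      = c("G", "G", "C", "G", "T", "A"),
  stringsAsFactors = FALSE
)

# 1 = matches S288C, 0 = matches SK1, NA = matches neither reference
dat$match <- ifelse(dat$Genotype == dat$S288C, 1,
                    ifelse(dat$Genotype == dat$SK1, 0, NA))
dat$match
```

If the data really is a matrix, as.data.frame(dat) converts it first, or the columns can be addressed as dat[, "Genotype"] and so on.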
|
Comparing two data.frames and deleting rows based on NA values in one data.frame
Date : March 29 2020, 07:55 AM
I have two data frames. One is considered a reference and has every value; the other may or may not be missing values. I want to compare both data frames, then delete the values from the reference data frame that have NA in the other. However, each row of the data frame that can have missing values must be treated as a separate comparison, so that every row gets its own version of the reference. For example the reference dataframe(1):
Try:
> ref<-data.frame(var1=c('a','q','z'),var2=c('b','w','x'),var3=c('c','e','n'))
> new<-data.frame(var1=c('p','u',NA,'l'),var2=c('o','y','e','k'),var3=c('i','t','w',NA))
> apply(new,1,function(x) ref[,which(!is.na(x))] )
[[1]]
var1 var2 var3
1 a b c
2 q w e
3 z x n
[[2]]
var1 var2 var3
1 a b c
2 q w e
3 z x n
[[3]]
var2 var3
1 b c
2 w e
3 x n
[[4]]
var1 var2
1 a b
2 q w
3 z x
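The per-row selection above can also be written with lapply over row indices. This is only a sketch of the same idea, not the original answer's code: the name per_row_ref is made up for illustration, and drop = FALSE is added so that a result with a single remaining column stays a data frame.

```r
ref <- data.frame(var1 = c("a", "q", "z"),
                  var2 = c("b", "w", "x"),
                  var3 = c("c", "e", "n"))
new <- data.frame(var1 = c("p", "u", NA, "l"),
                  var2 = c("o", "y", "e", "k"),
                  var3 = c("i", "t", "w", NA))

# One trimmed copy of ref per row of new: keep only the columns
# whose value in that row of new is not NA
per_row_ref <- lapply(seq_len(nrow(new)), function(i) {
  keep <- !is.na(unlist(new[i, ]))
  ref[, keep, drop = FALSE]
})

names(per_row_ref[[3]])  # columns kept for row 3 of new
```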
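# Variant: also drop the column paired with each NA column (1 with 2, 3 with 4, ...)
is.odd <- function(x) x %% 2 == 1
apply(new, 1, function(x) {
  toremove <- which(is.na(x))
  # the partner of an odd column is the next one; of an even column, the previous one
  toremove1 <- sapply(toremove, function(i) ifelse(is.odd(i), i + 1, i - 1))
  ref[, !(1:ncol(ref) %in% c(toremove, toremove1)), drop = FALSE]
})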
|
Comparing data frame rows containing NAs
Date : March 29 2020, 07:55 AM
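A comparison such as x != y returns NA when either value is NA, and subset drops rows whose condition is NA. One option is to add an | (OR) condition so that rows with NA in 'x' are kept as well:
subset(my.df, x != y | is.na(x))
To also keep rows where 'y' is NA:
subset(my.df, x != y | is.na(x)|is.na(y))
To keep rows where either value is NA but drop rows where both are NA:
subset(my.df, (x != y | is.na(x)|is.na(y)) & !(is.na(x) & is.na(y)))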
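To make the difference between the three conditions concrete, here is a small sketch on made-up data. The contents of my.df and the names keep_x_na, keep_any_na and keep_one_na are assumptions for illustration; the original post does not show its data.

```r
# Hypothetical data: row 3 has NA in x, row 4 has NA in y, row 5 has NA in both
my.df <- data.frame(x = c(1, 2, NA, 4, NA),
                    y = c(1, 3, 3, NA, NA))

# Differing rows plus rows where x is NA
keep_x_na <- subset(my.df, x != y | is.na(x))

# Differing rows plus rows where x or y is NA
keep_any_na <- subset(my.df, x != y | is.na(x) | is.na(y))

# As above, but without rows where both x and y are NA
keep_one_na <- subset(my.df, (x != y | is.na(x) | is.na(y)) &
                        !(is.na(x) & is.na(y)))

rownames(keep_x_na)    # rows 2, 3, 5
rownames(keep_any_na)  # rows 2, 3, 4, 5
rownames(keep_one_na)  # rows 2, 3, 4
```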
|
Comparing one value across multiple rows in one data frame with values across multiple rows in a second data frame
Date : March 29 2020, 07:55 AM
Here's an answer with dplyr: join the two tables on CHR, test each POS against every range on the same chromosome, then collapse to one row per (CHR, POS).
library(dplyr)
df1 <- tribble(
~CHR, ~POS,
1, 2000,
1, 3000,
2, 1500,
3, 3000
)
df2 <- tribble(
~CHR, ~POS_START, ~POS_END,
1, 1500, 2500,
1, 3200, 4000,
2, 1200, 1600,
2, 2000, 2200,
3, 5000, 5500,
4, 1000, 1200
)
df1 %>%
left_join(df2, by = 'CHR') %>%
mutate(IN_RANGE = POS >= POS_START & POS <= POS_END) %>%
group_by(CHR, POS) %>%
summarize(IN_RANGE = sum(IN_RANGE) > 0)
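For comparison, the same range check can be sketched in base R without a join. This is an alternative technique, not the original answer's method, and it rebuilds df1 and df2 as plain data frames. One difference to be aware of: a CHR with no rows in df2 yields FALSE here, whereas the left_join version would be expected to yield NA for it.

```r
df1 <- data.frame(CHR = c(1, 1, 2, 3),
                  POS = c(2000, 3000, 1500, 3000))
df2 <- data.frame(CHR       = c(1, 1, 2, 2, 3, 4),
                  POS_START = c(1500, 3200, 1200, 2000, 5000, 1000),
                  POS_END   = c(2500, 4000, 1600, 2200, 5500, 1200))

# For each (CHR, POS) in df1, check whether any range on the same
# chromosome in df2 contains POS
df1$IN_RANGE <- mapply(function(chr, pos) {
  any(df2$CHR == chr & pos >= df2$POS_START & pos <= df2$POS_END)
}, df1$CHR, df1$POS)

df1$IN_RANGE  # TRUE FALSE TRUE FALSE
```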
|
comparing each row with all other rows in data.frame
Date : March 29 2020, 07:55 AM
Here is an option using base R that makes use of table and crossprod. Set the lower-triangular values (including the diagonal) of the matrix returned by crossprod to NA, convert it to 'long' format with as.data.frame.table, and then keep the rows that are non-NA in the 'Freq' column:
out <- with(df, crossprod(table(paste(category, value), ID)))
out[lower.tri(out, diag = TRUE)] <- NA
subset(as.data.frame.table(out), !is.na(Freq))
# ID ID.1 Freq
#4 ID1 ID2 2
#7 ID1 ID3 1
#8 ID2 ID3 2
# Input data used above
df <- structure(list(
  ID = c("ID1", "ID1", "ID1", "ID2", "ID2", "ID2", "ID3", "ID3", "ID3"),
  category = c("length", "type", "color", "length", "type", "color",
               "length", "type", "color"),
  value = c("100", "L", "Blue", "100", "M", "Blue", "150", "M", "Blue")),
  class = "data.frame", row.names = c(NA, -9L))
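To see why crossprod counts shared values, it may help to look at the intermediate steps separately. The sketch below rebuilds the same data with data.frame; the names tab and shared are illustrative only.

```r
df <- data.frame(
  ID       = rep(c("ID1", "ID2", "ID3"), each = 3),
  category = rep(c("length", "type", "color"), times = 3),
  value    = c("100", "L", "Blue", "100", "M", "Blue", "150", "M", "Blue"),
  stringsAsFactors = FALSE
)

# Rows: each distinct "category value" pair; columns: IDs;
# cells: 1 if that ID has the pair, 0 otherwise
tab <- table(paste(df$category, df$value), df$ID)

# crossprod(tab) is t(tab) %*% tab, so entry [i, j] counts the
# "category value" pairs that ID i and ID j have in common
shared <- crossprod(tab)

shared["ID1", "ID2"]  # 2: both have "length 100" and "color Blue"
```

The diagonal holds each ID's own number of pairs, which is why the answer blanks it out together with the lower triangle before reshaping.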
|