How to create a function that splits only continuous variables into equal-size groups
Tag : r , By : Bart van Bragt
Date : March 29 2020, 07:55 AM
I would like to run a function over my data frame that finds only the continuous variables and adds new categorical variables by dividing each continuous variable into 2 equal-size groups. I have code that splits a variable into groups and adds it as a new categorical variable, but when I tried to use it inside a function it doesn't work. What could be the problem? Also, how can I avoid running it over non-continuous variables? Here is a toy data frame:

Here are some possible problems in your function: the loop body only defines an anonymous function and never calls it, so nothing is computed or assigned, and df$i looks for a column literally named "i" rather than the column whose name is stored in i (use df[[i]] or df[, i] instead).
for (i in names(df)) function (x) { as.factor( as.numeric( cut(df$i,2))) }
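# Option 1: loop over the columns by position and collect the results in a list
lst <- vector('list', ncol(df))
for(i in seq_along(df)) {
  lst[[i]] <- as.factor(as.numeric(cut(df[,i], 2)))
}
df[paste0(names(df), 'new')] <- lst

# Option 2 (use instead of option 1): the same with lapply
# Note: cut(x, 2) splits the range of x into two equal-width intervals
df[paste0(names(df), 'new')] <- lapply(df, function(x)
  factor(cut(x, 2, labels=FALSE)))

# To skip non-continuous columns, keep only numeric columns that are not just 0/1
indx <- vapply(df2, function(x) !all(x %in% 0:1) & is.numeric(x), logical(1L))
lst <- vector('list', ncol(df2[indx]))
for(i in seq_along(df2[indx])) {
  lst[[i]] <- as.factor(as.numeric(cut(df2[indx][,i], 2)))
}
df2[paste0(names(df2)[indx], 'new')] <- lst

# Or, again with lapply (use instead of the loop above)
df2[paste0(names(df2)[indx], 'new')] <- lapply(df2[indx],
  function(x) factor(cut(x, 2, labels=FALSE)))

# Data: a binary column, a continuous column and a character column
set.seed(24)
df1 <- data.frame(col1=sample(0:1, 10, replace=TRUE),
                  col2=rnorm(10), col3=letters[1:10])
#df - OP's dataset
df2 <- cbind(df1, df)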
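Because cut(x, 2) divides the range of a variable into two equal-width intervals, the two groups need not contain the same number of rows. If equal-size groups are what is wanted, a median split is one option. The following is a minimal Python sketch of that idea, not the answer's method: the helper names (is_continuous, median_split, add_split_columns) and the toy data are hypothetical, and it assumes a "table" is simply a dict of column name to list. Ties at the median can still leave the groups slightly unbalanced.

```python
from statistics import median


def is_continuous(values):
    """Assumption: a column is 'continuous' if it is all numeric and not just 0/1."""
    numeric = all(
        isinstance(v, (int, float)) and not isinstance(v, bool) for v in values
    )
    return numeric and not all(v in (0, 1) for v in values)


def median_split(values):
    """Label each value 1 (at or below the median) or 2 (above the median)."""
    m = median(values)
    return [1 if v <= m else 2 for v in values]


def add_split_columns(table):
    """Return a copy of `table` with a '<name>new' column for each continuous column."""
    out = dict(table)
    for name, values in table.items():
        if is_continuous(values):
            out[name + "new"] = median_split(values)
    return out


# Hypothetical toy data: a binary, a continuous and a character column.
data = {
    "col1": [0, 1, 1, 0, 1, 0],
    "col2": [2.5, -0.3, 1.1, 0.4, 3.2, -1.7],
    "col3": ["a", "b", "c", "d", "e", "f"],
}
result = add_split_columns(data)
print(result["col2new"])
```

Only col2 gets a new column here; col1 is skipped because it holds only 0 and 1, and col3 because it is not numeric.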
|
Matlab: zero groups of non-zero elements in a matrix based on group size
Tag : matlab , By : user183676
Date : March 29 2020, 07:55 AM
Essentially I have binary 3D image masks with the 1s in groups of various shapes and sizes spread throughout the mask. Working in Matlab, I have tools that let me convert this into a matrix, and what I want to do is go through the matrix and zero out blobs of 1s (i.e. adjacent sets of non-zero elements surrounded by 0s) if the total size of the group is less than a given number of elements (say 30). Is there a pre-existing function that will do this, or will I need to get involved with kernels and the like?

Fortunately, Matlab has a function for that: bwareaopen
maskWithOnlyBigObjects = bwareaopen(mask, 30);
% For a 3-D mask the default connectivity is 26; pass 6 to treat only face-adjacent voxels as connected
maskWithOnlyBigObjects = bwareaopen(mask, 30, 6);
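To make clear what this kind of small-object removal does, here is a minimal Python sketch of the same idea using a breadth-first search over face-adjacent voxels (6-connectivity). It is not how bwareaopen is implemented; the function name remove_small_blobs and the tiny nested-list mask are hypothetical, chosen only for illustration.

```python
from collections import deque

# The six face-adjacent neighbour offsets (6-connectivity in 3-D).
FACE_NEIGHBOURS = [
    (1, 0, 0), (-1, 0, 0),
    (0, 1, 0), (0, -1, 0),
    (0, 0, 1), (0, 0, -1),
]


def remove_small_blobs(mask, min_size):
    """Return a copy of a 3-D 0/1 nested-list mask with blobs smaller than min_size zeroed."""
    nz, ny, nx = len(mask), len(mask[0]), len(mask[0][0])
    out = [[row[:] for row in plane] for plane in mask]
    seen = set()
    for z in range(nz):
        for y in range(ny):
            for x in range(nx):
                if not mask[z][y][x] or (z, y, x) in seen:
                    continue
                # Breadth-first search collects one connected blob.
                blob, queue = [], deque([(z, y, x)])
                seen.add((z, y, x))
                while queue:
                    cz, cy, cx = queue.popleft()
                    blob.append((cz, cy, cx))
                    for dz, dy, dx in FACE_NEIGHBOURS:
                        n = (cz + dz, cy + dy, cx + dx)
                        inside = 0 <= n[0] < nz and 0 <= n[1] < ny and 0 <= n[2] < nx
                        if inside and mask[n[0]][n[1]][n[2]] and n not in seen:
                            seen.add(n)
                            queue.append(n)
                # Zero the blob if it has fewer than min_size voxels.
                if len(blob) < min_size:
                    for bz, by, bx in blob:
                        out[bz][by][bx] = 0
    return out


# Hypothetical 2 x 2 x 4 mask: one blob of four voxels and one isolated voxel.
mask = [
    [[1, 1, 0, 0],
     [1, 0, 0, 1]],
    [[1, 0, 0, 0],
     [0, 0, 0, 0]],
]
cleaned = remove_small_blobs(mask, 3)
print(cleaned)
```

With a threshold of 3 the four-voxel blob is kept and the isolated voxel is zeroed; the input mask itself is left unchanged.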
|
Hive - Create map columns type by aggregating values across groups
Tag : sql , By : TheDave1022
Date : March 29 2020, 07:55 AM
This can be accomplished using a series of self-joins to find other rooms in the same category before combining the results into 2 maps.
Code
CREATE TABLE `table` AS
SELECT 1 AS customer, 'A' AS category, 'aa' AS room, 'd1' AS `date` UNION ALL
SELECT 1 AS customer, 'A' AS category, 'bb' AS room, 'd2' AS `date` UNION ALL
SELECT 1 AS customer, 'B' AS category, 'cc' AS room, 'd3' AS `date` UNION ALL
SELECT 1 AS customer, 'C' AS category, 'aa' AS room, 'd1' AS `date` UNION ALL
SELECT 1 AS customer, 'C' AS category, 'bb' AS room, 'd2' AS `date` UNION ALL
SELECT 2 AS customer, 'A' AS category, 'aa' AS room, 'd3' AS `date` UNION ALL
SELECT 2 AS customer, 'A' AS category, 'bb' AS room, 'd4' AS `date` UNION ALL
SELECT 2 AS customer, 'C' AS category, 'bb' AS room, 'd4' AS `date` UNION ALL
SELECT 2 AS customer, 'C' AS category, 'ee' AS room, 'd5' AS `date` UNION ALL
SELECT 3 AS customer, 'D' AS category, 'ee' AS room, 'd6' AS `date`
;
SELECT
customer_rooms.customer,
collect(customer_rooms.room, customer_rooms.date) AS map_customer_room_date,
collect(
COALESCE(customer_category_rooms.room, category_rooms.room),
COALESCE(customer_category_rooms.date, category_rooms.date)) AS map_category_room_date
FROM `table` AS customer_rooms
JOIN `table` AS category_rooms ON customer_rooms.category = category_rooms.category
LEFT OUTER JOIN `table` AS customer_category_rooms ON customer_rooms.customer = customer_category_rooms.customer
AND category_rooms.category = customer_category_rooms.category
AND category_rooms.room = customer_category_rooms.room
WHERE (
customer_rooms.customer = customer_category_rooms.customer AND
customer_rooms.category = customer_category_rooms.category AND
customer_rooms.room = customer_category_rooms.room AND
customer_rooms.date = customer_category_rooms.date
)
OR (
customer_category_rooms.customer IS NULL AND
customer_category_rooms.category IS NULL AND
customer_category_rooms.room IS NULL AND
customer_category_rooms.date IS NULL
)
GROUP BY
customer_rooms.customer
;
Result
customer  map_customer_room_date  map_category_room_date
1  {"aa":"d1","bb":"d2","cc":"d3"}  {"aa":"d1","bb":"d2","cc":"d3","ee":"d5"}
2  {"aa":"d3","bb":"d4","ee":"d5"}  {"aa":"d3","bb":"d4","ee":"d5"}
3  {"ee":"d6"}  {"ee":"d6"}
|
Delete from PC table the computers having minimal hdd size or minimal ram size
Date : March 29 2020, 07:55 AM
Your EXISTS will just delete anything from the table whenever the EXISTS condition is true. You need to delete only the records you're after, which points to a window function.
BEGIN TRAN;
DELETE p FROM PC p
INNER JOIN
(
    SELECT Code,
           -- DESC ranks the largest hd/ram first within each model; use ASC to rank the smallest first
           ROW_NUMBER() OVER (PARTITION BY model ORDER BY hd DESC, ram DESC) [RNum]
    FROM PC
) m ON m.Code = p.Code AND m.RNum = 1;
--COMMIT TRAN;
--ROLLBACK TRAN;
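For comparison, the requirement in the title (delete the computers with the minimal hdd size or the minimal ram size) can also be written with two scalar subqueries instead of a window function. This is a different technique from the answer above, sketched in Python with the standard sqlite3 module so it can be run as-is. The table layout (code, model, hd, ram) follows the column names used in the answer, but the column types and the sample rows are assumptions made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE PC (code INTEGER PRIMARY KEY, model TEXT, hd REAL, ram INTEGER)"
)
# Hypothetical sample rows.
conn.executemany(
    "INSERT INTO PC (code, model, hd, ram) VALUES (?, ?, ?, ?)",
    [
        (1, "1232", 5.0, 64),
        (2, "1121", 14.0, 128),
        (3, "1233", 5.0, 128),
        (4, "1121", 20.0, 32),
        (5, "1260", 10.0, 64),
    ],
)

# Delete every row whose hd equals the table-wide minimum hd
# or whose ram equals the table-wide minimum ram.
with conn:
    conn.execute(
        """
        DELETE FROM PC
        WHERE hd  = (SELECT MIN(hd)  FROM PC)
           OR ram = (SELECT MIN(ram) FROM PC)
        """
    )

remaining = [row[0] for row in conn.execute("SELECT code FROM PC ORDER BY code")]
print(remaining)
```

Rows 1 and 3 share the smallest hd and row 4 has the smallest ram, so only codes 2 and 5 remain. Unlike the window-function version, the minimum here is taken over the whole table rather than per model.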
|
Python: Summarizing & Aggregating Groups and Sub-groups in DataFrame
Date : March 29 2020, 07:55 AM
Use DataFrame.melt with GroupBy.agg, passing (new column name, aggregate function) tuples:
df1 = (df.melt('interval', var_name='source')
         .groupby(['interval','source'])['value']
         .agg([('cnt','count'), ('average','mean')])
         .reset_index())
print(df1.head())
interval source cnt average
0 0 a 1 5.0
1 0 b 1 0.0
2 0 c 1 0.0
3 0 d 1 0.0
4 0 f 1 0.0
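To see what the melt and groupby steps compute, here is a dependency-free Python sketch of the same reshaping and aggregation. It does not use pandas and is not the answer's code: the input rows are hypothetical, and only the column names interval, source, cnt and average are taken from the output above.

```python
from collections import defaultdict

# Hypothetical wide-format rows: one key column plus one column per source.
rows = [
    {"interval": 0, "a": 5, "b": 0},
    {"interval": 0, "a": 7, "b": 2},
    {"interval": 1, "a": 1, "b": 4},
]

# "Melt": one (interval, source, value) record per non-key column.
long_rows = [
    (row["interval"], source, value)
    for row in rows
    for source, value in row.items()
    if source != "interval"
]

# Group the values by (interval, source).
groups = defaultdict(list)
for interval, source, value in long_rows:
    groups[(interval, source)].append(value)

# Aggregate each group into a count and a mean.
summary = [
    {"interval": i, "source": s, "cnt": len(v), "average": sum(v) / len(v)}
    for (i, s), v in sorted(groups.items())
]
for record in summary:
    print(record)
```

Each record in summary corresponds to one row of the pandas result: one (interval, source) pair with its count and mean.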
|