
Sum and Count Changes per Group for each Column in R


Tag : r , By : Chaz
Date : December 05 2020, 12:18 PM

This is the tidyverse approach to the first problem. Hopefully you can use this to approach the second part of your question.
First we convert the data from wide to long with gather from tidyr. After extracting the year, month and hour with lubridate, I also remove the Timestamp variable, but that's optional.
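library(lubridate); library(tidyverse)

# Reshape from wide to long: one row per Timestamp/ID pair
# (gather() is superseded in current tidyr; pivot_longer() is its replacement)
df_long <- df %>% 
  gather(ID, Val, -Timestamp)
head(df_long)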
            Timestamp                    ID Val
1 2018-08-13 00:00:00 1000 Sensor 2 Panel 1 0.0
2 2018-08-13 00:15:00 1000 Sensor 2 Panel 1 0.7
3 2018-08-13 00:30:00 1000 Sensor 2 Panel 1 1.0
4 2018-08-13 00:45:00 1000 Sensor 2 Panel 1 0.0
5 2018-08-13 01:00:00 1000 Sensor 2 Panel 1 0.7
6 2018-08-13 01:15:00 1000 Sensor 2 Panel 1 1.0

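# Extract the grouping keys from the timestamp, then drop it (optional)
df_long <- df_long %>% 
  mutate(Year = year(Timestamp),
         Month = month(Timestamp),
         Hour = hour(Timestamp)) %>% 
  select(-Timestamp)
# Flag a "turn on": the previous value in the group is 0 and the current one is not.
# lag() is NA for the first row of each group, so Turned can be NA there.
df_long <- df_long %>% 
  group_by(ID, Year, Month, Hour) %>% 
  mutate(Turned = ifelse(lag(Val) == 0 & Val != 0, 1, 0))
# Per-group total and number of turn-ons, ignoring the NA flags
df_long %>% 
  group_by(ID, Year, Month, Hour) %>% 
  summarise(Sum = sum(Val),
            NTurned = sum(Turned, na.rm = TRUE))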

  ID                     Year Month  Hour   Sum NTurned
  <chr>                 <dbl> <dbl> <int> <dbl>   <dbl>
1 1000 Sensor 2 Panel 1  2018     8     0   1.7       1
2 1000 Sensor 2 Panel 1  2018     8     1   2.4       1
3 1000 Sensor 2 Panel 1  2018     8     2   2.7       1
4 1000 Sensor 2 Panel 1  2018     8     3   1.7       1
5 1000 Sensor 2 Panel 2  2018     8     0   1.5       2
6 1000 Sensor 2 Panel 2  2018     8     1   1         2
7 1000 Sensor 2 Panel 2  2018     8     2   1         2
8 1000 Sensor 2 Panel 2  2018     8     3   2         1
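
As a cross-check of the Turned logic, here is a minimal plain-Python sketch. It assumes the readings of one sensor arrive as (hour, value) pairs already sorted by time; the names `summarise_hours` and `readings` are illustrative and not part of the original answer. As in the R code, the first reading of each hour has no predecessor inside its group and never counts as a turn-on.

```python
from collections import defaultdict

def summarise_hours(readings):
    """Return {hour: (sum_of_values, n_turned)} for time-ordered (hour, value) pairs."""
    by_hour = defaultdict(list)
    for hour, val in readings:
        by_hour[hour].append(val)

    result = {}
    for hour, vals in by_hour.items():
        # A turn-on is a non-zero value directly preceded by a zero in the same hour.
        n_turned = sum(1 for prev, cur in zip(vals, vals[1:]) if prev == 0 and cur != 0)
        # Round the sum to hide floating-point noise in the printed result.
        result[hour] = (round(sum(vals), 10), n_turned)
    return result

# The first four readings from head(df_long), all of which fall in hour 0.
readings = [(0, 0.0), (0, 0.7), (0, 1.0), (0, 0.0)]
summary = summarise_hours(readings)
print(summary)
```

For hour 0 this reproduces the first row of the summary table: a sum of 1.7 and one turn-on.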


Which is faster: COUNT(DISTINCT Column) vs. COUNT(*) ... GROUP BY Column


Tag : mysql , By : Saul
Date : March 29 2020, 07:55 AM
SELECT COUNT(*) FROM table GROUP BY column returns the number of rows for each value of the grouped-by column, not the number of groups.
SELECT COUNT(DISTINCT column) FROM table returns the number of groups (although you can also get this from the number of rows the GROUP BY query returns).
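
The difference between the two queries can be seen on a toy table. The sketch below uses Python's built-in sqlite3 module as a stand-in for MySQL; the table name `t`, the column name `col` and the sample values are made up for illustration, and it says nothing about which query is faster.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (col TEXT)")
conn.executemany(
    "INSERT INTO t (col) VALUES (?)",
    [("a",), ("a",), ("b",), ("c",), ("c",), ("c",)],
)

# One result row per group, each holding that group's row count.
per_group = conn.execute(
    "SELECT col, COUNT(*) FROM t GROUP BY col ORDER BY col"
).fetchall()

# A single result row holding the number of distinct values, i.e. the number of groups.
n_groups = conn.execute("SELECT COUNT(DISTINCT col) FROM t").fetchone()[0]

print(per_group)  # [('a', 2), ('b', 1), ('c', 3)]
print(n_groups)   # 3
conn.close()
```

The GROUP BY query returns three rows, and that row count equals the single number returned by COUNT(DISTINCT col).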

Rails group a table by :created_at, returning a count of the :status column, and then sub-group the :status column with


Tag : sql , By : Malikul
Date : March 29 2020, 07:55 AM
I'm adding a second answer with a different approach that I believe to be much better, in that it is efficient and can be translated into a DB view.
Any time I end up with lots of repeated hits on the DB, or large, complex queries that don't translate well, I look to use pure SQL, as that can then be used as a view in the DB. I asked this question because my SQL is poor. I think this can be adapted to your needs, especially if the "status" field is a known set of possible values. Here's how I would try it initially:
SELECT created_at, count(status) AS total,
sum(case when status = 'error' then 1 end) AS errors,
sum(case when status = 'pending' then 1 end) AS pending,
sum(case when status = 'sent' then 1 end) AS sent
FROM notifications
GROUP BY created_at;
| created_at       |total|errors|pending|sent|
----------------------------------------------
| Mon, 05 Oct 2015 |2572 |500   |12     |null|
| Tue, 06 Oct 2015 |555  |null  |12     |50  |
@stats = Notification.where(user: users).find_by_sql("SELECT created_at, count(status) 
  AS total,
  sum(case when status = 'error' then 1 end) AS errors,
  sum(case when status = 'pending' then 1 end) AS pending,
  sum(case when status = 'sent' then 1 end) AS sent
  FROM notifications
  GROUP BY created_at;")
=> [#< Notification id: nil, created_at: "2014-02-07 22:36:30">,
#< Notification id: nil, created_at: "2014-06-26 02:07:51">,
#< Notification id: nil, created_at: "2015-04-26 21:37:09">,
#< Notification id: nil, created_at: "2014-02-07 22:48:29">,
#< Notification id: nil, created_at: "2014-11-04 23:39:07">,
#< Notification id: nil, created_at: "2015-01-27 17:46:50">,...]
@stats.each do |daily_stats|
  puts daily_stats.attributes
end

#{"created_at" => "Mon, 05 Oct 2015", "total" => 2572, "errors" => 500, "pending" => 12, "sent" => nil}
#{"created_at" => "Tue, 06 Oct 2015", "total" => 555, "errors" => nil, "pending" => 12, "sent" => 50}
@stats[0].created_at
  #=> "Mon, 05 Oct 2015"

@stats[1].pending
  #=> 12
CREATE VIEW daily_stats AS
SELECT user_id, created_at, count(status) AS total,
   sum(case when status = 'error' then 1 end) AS errors,
   sum(case when status = 'pending' then 1 end) AS pending,
   sum(case when status = 'sent' then 1 end) AS sent
FROM notifications
GROUP BY user_id, created_at;
Select * FROM daily_stats;
class DailyStat < ActiveRecord::Base
  belongs_to :user
  # This is a model for a view in the DB called daily_stats.
  # The class name is singular and will automatically look for the table "daily_stats", which is snake_case and plural.
end
class User < ActiveRecord::Base
  has_many :daily_stats
end
users = [2]
DailyStat.where(user: users)
   => AllStat Load (2.8ms)  SELECT "all_stats".* FROM "all_stats" WHERE "all_stats"."category_id" = 2
   => [ #<AllStat user_id: 2, created_at: "2014-02-14 00:30:24", total: 300, errors: 23, pending: nil, sent: 3>,
        #<AllStat user_id: 2, created_at: "2014-11-29 00:18:28", total: 2454, errors: 3, pending: 45, sent: 323>,
        #<AllStat user_id: 2, created_at: "2014-02-07 22:46:59", total: 589, errors: 33, pending: 240, sent: 68>...]
user = User.first
user.daily_stats
 # returns an array of that user's DailyStat objects.
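
The conditional-aggregation query from this answer can be tried without a Rails app. The sketch below runs it against an in-memory SQLite database through Python's sqlite3 module; the five sample rows are invented for illustration, and SQLite stands in for whatever database the original application used.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE notifications (created_at TEXT, status TEXT)")
rows = [
    ("2015-10-05", "error"),
    ("2015-10-05", "error"),
    ("2015-10-05", "pending"),
    ("2015-10-06", "sent"),
    ("2015-10-06", "pending"),
]
conn.executemany("INSERT INTO notifications VALUES (?, ?)", rows)

# One output column per status value; CASE without ELSE yields NULL for
# non-matching rows, so a status that never occurs on a day sums to NULL.
query = """
SELECT created_at,
       count(status) AS total,
       sum(CASE WHEN status = 'error'   THEN 1 END) AS errors,
       sum(CASE WHEN status = 'pending' THEN 1 END) AS pending,
       sum(CASE WHEN status = 'sent'    THEN 1 END) AS sent
FROM notifications
GROUP BY created_at
ORDER BY created_at
"""
stats = conn.execute(query).fetchall()
for row in stats:
    print(row)
conn.close()
```

The None values in the result correspond to the null cells in the table shown earlier. Writing `THEN 1 ELSE 0 END` instead would return 0 for a status with no rows on a given day.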

Group by multiple columns, get group total count and specific column from last two rows in each group


Tag : sql , By : Jonathan
Date : March 29 2020, 07:55 AM
For the SQL Server table described in the question, I would attempt this by using the following WITH clause:
WITH RUL AS (
select
  UserId,
  Area,
  Action,
  ObjectId,
  RelatedUserLink as RelatedUserLink1,

  LAG(RelatedUserLink) OVER (PARTITION BY UserId, Area, Action, ObjectId ORDER BY Created) as RelatedUserLink2,

  ROW_NUMBER() OVER (PARTITION BY UserId, Area, Action, ObjectId ORDER BY Created DESC) latest_to_earliest,

  MAX(Created) OVER (PARTITION BY UserId, Area, Action, ObjectId) as Created,

  COUNT(*) OVER (PARTITION BY UserId, Area, Action, ObjectId) as Count

from
  Notification
where UserId = 10
)
select 
  UserId,
  Area,
  Action,
  ObjectId,
  RelatedUserLink1,
  RelatedUserLink2,
  Created,
  Count
from 
  RUL 
where 
  latest_to_earliest = 1;
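
To see what the window functions return, here is a reduced sketch of the same pattern run on SQLite through Python's sqlite3 module. It is not the SQL Server query itself: the table is cut down to four columns, the partition is simplified to (UserId, ObjectId), the count alias is renamed `Cnt`, and the sample rows are invented. It also assumes the bundled SQLite is version 3.25 or newer, which is when window functions were added.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Notification "
    "(UserId INTEGER, ObjectId INTEGER, RelatedUserLink TEXT, Created TEXT)"
)
conn.executemany(
    "INSERT INTO Notification VALUES (?, ?, ?, ?)",
    [
        (10, 1, "u1", "2020-01-01"),
        (10, 1, "u2", "2020-01-02"),
        (10, 1, "u3", "2020-01-03"),
        (10, 2, "u4", "2020-01-04"),
    ],
)

query = """
WITH RUL AS (
  SELECT ObjectId,
         RelatedUserLink AS RelatedUserLink1,
         -- previous link in chronological order within the group
         LAG(RelatedUserLink) OVER (PARTITION BY UserId, ObjectId
                                    ORDER BY Created) AS RelatedUserLink2,
         -- 1 marks the most recent row of each group
         ROW_NUMBER() OVER (PARTITION BY UserId, ObjectId
                            ORDER BY Created DESC) AS latest_to_earliest,
         COUNT(*) OVER (PARTITION BY UserId, ObjectId) AS Cnt
  FROM Notification
  WHERE UserId = 10
)
SELECT ObjectId, RelatedUserLink1, RelatedUserLink2, Cnt
FROM RUL
WHERE latest_to_earliest = 1
ORDER BY ObjectId
"""
latest = conn.execute(query).fetchall()
print(latest)  # [(1, 'u3', 'u2', 3), (2, 'u4', None, 1)]
conn.close()
```

Each group collapses to its most recent row, carrying the last two links and the group's total count. A group with a single row has no previous link, so its second link is None.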

How to do group by and take Count of one column divide by count of unique of second column of data frame in python panda


Tag : python , By : FriendL
Date : March 29 2020, 07:55 AM
It seems you need to aggregate by size and nunique, and then divide the output columns with div:
import pandas as pd

df = pd.DataFrame({'col1':[1,1,1],
                   'col2':[4,4,6],
                   'col3':[7,7,9],
                   'col4':[3,3,5]})

print (df)
   col1  col2  col3  col4
0     1     4     7     3
1     1     4     7     3
2     1     6     9     5

df1 = df.groupby(['col1','col2']).agg({'col3':'size','col4':'nunique'})
df1['result_col'] = df1['col3'].div(df1['col4'])
print (df1)
           col4  col3  result_col
col1 col2                        
1    4        1     2         2.0
     6        1     1         1.0
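
For readers who want to check the arithmetic without pandas, the following standard-library sketch computes the same ratio: rows per (col1, col2) group divided by the number of distinct col4 values in that group. The names `rows`, `sizes` and `uniques` are illustrative only.

```python
from collections import defaultdict

rows = [
    {"col1": 1, "col2": 4, "col3": 7, "col4": 3},
    {"col1": 1, "col2": 4, "col3": 7, "col4": 3},
    {"col1": 1, "col2": 6, "col3": 9, "col4": 5},
]

sizes = defaultdict(int)    # number of rows per (col1, col2) group
uniques = defaultdict(set)  # distinct col4 values per group
for r in rows:
    key = (r["col1"], r["col2"])
    sizes[key] += 1
    uniques[key].add(r["col4"])

# Group size divided by the count of unique col4 values.
result = {key: sizes[key] / len(uniques[key]) for key in sizes}
print(result)  # {(1, 4): 2.0, (1, 6): 1.0}
```

The values match the result_col column in the pandas output above.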

Count unique values of a column by pairwise combinations of another column and group by third column in R


Tag : r , By : afarouk
Date : March 29 2020, 07:55 AM
This idea is not much tested, but it is what comes to mind first with data.table:
library(data.table)
dt <- data.table(Reg.ID = c(1,1,2,2,2,3,3), Location = c("X","X","Y","Y","Y","X","X"), Product = c("A","B","A","B","C","B","A"))
# Self-join on Location to get every pair of rows within the same location
dt.cj <- merge(dt, dt, by ="Location", all = TRUE, allow.cartesian = TRUE)
# Keep each unordered product pair once (x < y) and count distinct Reg.ID.x values per pair
dt.res <- dt.cj[Product.x < Product.y, .(cnt = length(unique(Reg.ID.x))),by = .(Location, Product.x, Product.y)]


#    Location Product.x Product.y cnt
# 1:        X         A         B  2
# 2:        Y         A         B  1
# 3:        Y         A         C  1
# 4:        Y         B         C  1
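
A related way to think about the problem, sketched here in plain Python, is to collect the products of each registration and then enumerate product pairs with itertools.combinations. Note that this is a slightly different technique from the self-join above: it counts registrations that contain both products of a pair, whereas the join counts distinct Reg.ID.x values among rows that share a location. On this sample the two agree; the variable names are illustrative.

```python
from collections import defaultdict
from itertools import combinations

# (Reg.ID, Location, Product) rows copied from the data.table example.
records = [
    (1, "X", "A"), (1, "X", "B"),
    (2, "Y", "A"), (2, "Y", "B"), (2, "Y", "C"),
    (3, "X", "B"), (3, "X", "A"),
]

# Collect the set of products for each (location, registration) pair.
products = defaultdict(set)
for reg_id, location, product in records:
    products[(location, reg_id)].add(product)

# Each registration contributes one count to every unordered pair of its products.
counts = defaultdict(int)
for (location, _reg_id), prods in products.items():
    for a, b in combinations(sorted(prods), 2):
        counts[(location, a, b)] += 1

for key in sorted(counts):
    print(key, counts[key])
```

Sorting the products before pairing plays the same role as the Product.x < Product.y filter: each pair is listed once, in alphabetical order.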