Summarising Rows by Maximising Two Columns in R data.table

When working with grouped data, you sometimes need to pick one row per group where two numeric columns are simultaneously as large as possible. Imagine a data.table like this:

library(data.table)

# sample data
DT <- data.table(
  group = c("a", "a", "a", "b", "b", "c"),
  a = c(1, 10, 9, 9, 1, 10),
  b = c(10, 1, 9, 9, 1, 10)
)

DT
#>    group  a  b
#> 1:     a  1 10
#> 2:     a 10  1
#> 3:     a  9  9
#> 4:     b  9  9
#> 5:     b  1  1
#> 6:     c 10 10

Your goal is to summarise this into one row per group with the “best” combination of a and b, i.e. the row where both columns are high. Taking the maximum of a and the maximum of b separately doesn’t guarantee that the two values come from the same row. A simple alternative is to score each row by the sum a + b and keep the highest-scoring row in each group, which works as long as the two columns are on comparable scales.
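To see why column-wise maxima are not enough, here is a small sketch that computes them per group and then checks which of the resulting pairs are missing from the data. The names col_max and phantom are illustrative only.

```r
library(data.table)

DT <- data.table(
  group = c("a", "a", "a", "b", "b", "c"),
  a = c(1, 10, 9, 9, 1, 10),
  b = c(10, 1, 9, 9, 1, 10)
)

# Column-wise maxima per group: each value is right on its own,
# but the pair is not guaranteed to exist as a row
col_max <- DT[, .(a = max(a), b = max(b)), by = group]

# Rows of col_max that do not appear anywhere in DT
phantom <- fsetdiff(col_max, DT)
phantom
```

For group a the column-wise maxima are a = 10 and b = 10, but no row in that group holds both values, so that pair is reported as missing from the data.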

Solution using `.SD[which.max()]`

A concise way is to compute the row sum of a and b and select the row with the maximum sum for each group. In data.table, .SD holds the subset for each group, and which.max() picks the index of the maximum value. Here’s the full solution:

# Select the row with the largest sum of a and b for each group
DT[, .SD[which.max(a + b)], by = group]
#>    group  a  b
#> 1:     a  9  9
#> 2:     b  9  9
#> 3:     c 10 10

In the example above, group a has rows (1,10), (10,1) and (9,9). Summing a + b gives (11, 11, 18) respectively, so the (9,9) row has the highest combined value and is returned. The same logic applies to groups b and c.
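If you want to inspect the sums before selecting, one option is to store them in a helper column first. This is only a sketch: the column name total and the object names scored and best are made up for illustration, and copy() is used so the original table is left unchanged.

```r
library(data.table)

DT <- data.table(
  group = c("a", "a", "a", "b", "b", "c"),
  a = c(1, 10, 9, 9, 1, 10),
  b = c(10, 1, 9, 9, 1, 10)
)

# Add the per-row score to a copy so DT itself is not modified by reference
scored <- copy(DT)[, total := a + b]
scored$total   # 11 11 18 18 2 20

# Pick the row with the largest score in each group; total is kept in the result
best <- scored[, .SD[which.max(total)], by = group]
best
```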

Generalising to more columns

If your table has more numeric columns, you can use rowSums(.SD) to sum all of them:

# Summarise by the maximum row sum across all numeric columns
DT[, .SD[which.max(rowSums(.SD))], by = group, .SDcols = is.numeric]

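The .SDcols = is.numeric argument restricts .SD to the numeric columns, so rowSums() only sees those. The pattern works for any number of numeric columns. Two caveats: because .SD is restricted, any non-numeric column other than the grouping column is dropped from the result, and subsetting .SD once per group can become slow when the table has very many groups.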
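If you need to keep every column of the winning rows, a common alternative is to collect row numbers with .I and subset the full table afterwards. The sketch below assumes a hypothetical extra character column called label, which is not part of the original example.

```r
library(data.table)

DT2 <- data.table(
  group = c("a", "a", "a", "b", "b", "c"),
  label = c("r1", "r2", "r3", "r4", "r5", "r6"),  # hypothetical non-numeric column
  a = c(1, 10, 9, 9, 1, 10),
  b = c(10, 1, 9, 9, 1, 10)
)

# .I holds row numbers in the full table; take the winning row number per group.
# The unnamed result column is called V1 by default.
idx <- DT2[, .I[which.max(rowSums(.SD))], by = group, .SDcols = is.numeric]$V1

# Subset the original table once, so label is retained
winners <- DT2[idx]
winners
```

Because the subset happens once on the whole table rather than once per group, this form also tends to be the faster of the two when there are many groups.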

Why this works

.SD stands for “Subset of Data” and contains the data for the current group. It behaves like a mini data frame for each group.
which.max() returns the index of the first maximum value. When applied to a + b or rowSums(.SD), it tells data.table which row to return.
Using the row sum ensures you consider both columns simultaneously, which is important when the maxima of individual columns do not occur in the same row.
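The “first maximum” behaviour matters when two rows tie. The following sketch uses a small made-up table, ties, to show that which.max() keeps only the first tied row, and that comparing against max() is one way to keep all of them.

```r
library(data.table)

which.max(c(11, 11, 18))  # 3
which.max(c(11, 11))      # 1, the first of the tied maxima

# Hypothetical group in which both rows have the same sum
ties <- data.table(group = c("g", "g"), a = c(1, 10), b = c(10, 1))

# which.max() returns a single index, so only the first tied row survives
first_only <- ties[, .SD[which.max(a + b)], by = group]

# A logical comparison against the group maximum keeps every tied row
all_ties <- ties[, .SD[a + b == max(a + b)], by = group]
```

Here first_only has one row (a = 1, b = 10) while all_ties has both rows, so the choice depends on whether you need exactly one row per group.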

Conclusion

To summarise grouped data by maximising multiple unrelated columns in R, compute the row sum of those columns and use .SD[which.max(rowSums(.SD))] within data.table. This pattern elegantly picks the row with the highest combined value and generalises to any number of numeric columns.
