0

It looks like tidyr's drop_na will drop rows if any of the specified columns contain missing values.

Example:

> library(tidyverse)
> df <- data.frame(a=c(1,NA,2,NA), b=c(3,4,NA,NA))
> df
   a  b
1  1  3
2 NA  4
3  2 NA
4 NA NA
> df %>% drop_na()
  a b
1 1 3

Is there a straightforward way to drop rows where all columns are missing?

pandas' dropna has a how argument that specifies whether to drop rows with any missing value or only rows where all values are missing.

  • 1
    Thank you, @M--. I am surprised so many well-reputed users do not identify it as such.
    – Friede
    Commented 12 hours ago
  • 2
    @Friede If one is here to help people but also enjoys fake internet points, then there's no incentive for them to identify dupes. It's been asked multiple times, and I've brought it up recently that we need a form of appreciation for duplicate finding, but it is yet to be implemented years after it was requested. As of now, unless you curate the site because you value the knowledge base, there's no reason not to answer duplicates and snatch a couple of extra reps. Not a stab at people who answered here; the system is broken.
    – M--
    Commented 12 hours ago
  • 1
    Occasionally (though rare) I get a reason for the downvote, and yes I've been told the dv was because it was an answer on a dupe question. If I cared much about points I'd be more upset, but really it's just the principle (and a little rude in my opinion). Meh, it's just points, it doesn't help with retirement.
    – r2evans
    Commented 11 hours ago
  • 2
    @r2evans It is frowned upon to answer and close a question. I have seen/heard of examples that got in "trouble" for that, especially if they used their mjolnir (you can search and read about it on Meta). Regarding dv's on answers provided on dupe questions, I should say it really depends. In some instances the duplicate question can be a good target (for search engines), but when the dupe-target has your answer and a bunch more, it is actually bad to provide an answer, since instead of letting others get redirected to the dupe-target, you bring them to the dupe and leave them with one answer...
    – M--
    Commented 11 hours ago

3 Answers

2

You can use dplyr's filter with if_all to keep every row that is not entirely missing:

library(dplyr)
df %>% filter(!if_all(everything(), is.na)) 

or in base R:

df[rowSums(is.na(df)) != ncol(df), ]
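
As a rough sketch, the base R idiom above can be wrapped in a small helper that mimics the how argument of pandas' dropna. The name drop_na_rows and its how and cols arguments are hypothetical, chosen here for illustration only; they are not part of tidyr or dplyr.

```r
# Hypothetical helper (not part of any package): drop rows by missingness,
# in the spirit of the `how` argument of pandas' dropna.
drop_na_rows <- function(data, how = c("any", "all"), cols = names(data)) {
  how <- match.arg(how)
  # Count missing values per row, looking only at the selected columns
  n_missing <- rowSums(is.na(data[cols]))
  keep <- if (how == "any") {
    n_missing == 0             # keep rows with no missing value
  } else {
    n_missing < length(cols)   # keep rows with at least one non-missing value
  }
  data[keep, , drop = FALSE]
}

df <- data.frame(a = c(1, NA, 2, NA), b = c(3, 4, NA, NA))
drop_na_rows(df, how = "all")  # keeps rows 1 to 3
drop_na_rows(df, how = "any")  # keeps row 1 only
```

Passing cols restricts the check to a subset of columns, so drop_na_rows(df, how = "all", cols = "a") would drop every row where a is missing.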
2

I usually recommend collapse::na_omit for that. The oddly-specific prop argument will remove rows with a proportion of missing values >= prop. It's the fastest and most flexible option:

collapse::na_omit(df, prop = 1)

#    a  b
# 1  1  3
# 2 NA  4
# 3  2 NA
0

drop_na does not have that functionality, but we can add it. With the modified drop_na defined further down, the call looks like this:

library(dplyr)

df %>% drop_na(method = "all")
   a  b
1  1  3
2 NA  4
3  2 NA

The original function uses vctrs::vec_detect_complete. Adding vctrs::vec_detect_missing to get the extra functionality yields something like

drop_na <- function(data, ..., method = "any") {
  # "any" drops rows with at least one missing value (the original behaviour);
  # "all" drops only rows where every selected column is missing
  stopifnot(method == "any" | method == "all")
  dots <- rlang::enquos(...)

  if (rlang::is_empty(dots)) {
    # Use all columns if no `...` are supplied
    cols <- data
  } else {
    vars <- tidyselect::eval_select(rlang::expr(c(!!!dots)), data, allow_rename = FALSE)
    cols <- data[vars]
  }

  if (method == "any") {
    # Keep rows that are complete in the selected columns
    loc <- vctrs::vec_detect_complete(cols)
  } else {
    # Keep rows that are not missing in every selected column
    loc <- !vctrs::vec_detect_missing(cols)
  }
  vctrs::vec_slice(data, loc)
}

A short version without modifying drop_na

library(dplyr)
library(vctrs)

df %>% filter(!vec_detect_missing(.))
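
To see why negating vec_detect_missing gives the "all" behaviour, it may help to compare the two vctrs helpers on the example data. This is only an illustrative sketch that assumes the vctrs package is installed; the variable names is_complete and is_missing are made up for the example.

```r
library(vctrs)

df <- data.frame(a = c(1, NA, 2, NA), b = c(3, 4, NA, NA))

# A data frame row is "complete" only if none of its values is missing
is_complete <- vec_detect_complete(df)
is_complete   # TRUE FALSE FALSE FALSE

# A data frame row is "missing" only if all of its values are missing
is_missing <- vec_detect_missing(df)
is_missing    # FALSE FALSE FALSE TRUE

df[is_complete, ]  # the rows drop_na() keeps by default
df[!is_missing, ]  # the rows kept when only all-NA rows are dropped
```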
