Hadley Wickham hadley @rstudio Houston, TX http://hadley.nz Chief Scientist at @RStudio

hadley/adv-r 1644

Advanced R: a book

hadley/bigvis 274

Exploratory data analysis for large datasets (10-100 million observations)

hadley/assertthat 143

User friendly assertions for R

hadley/babynames 93

An R package containing US baby names from the SSA

garrettgman/ggsubplot 75

Embed subplots in ggplot2 graphics in R

cwickham/munsell 55

munsell colour system for R

dcl-2017-04/curriculum 48

Curriculum for Data Challenge Lab (2017-04)

dicook/nullabor 40

Tools for doing statistical inference using data plots

dcl-2017-01/curriculum 24

Draft syllabus for the Data Challenge Lab

issue closedtidyverse/dplyr

Order seems important when using !grepl in filter

I've encountered a problem using !grepl in the filter command in dplyr. In the work I am doing, the use of ! is important. I'm removing the Chatham Islands from a data.frame, which will be linked later in my code

Works (grepl): SIAreasNoDups2013 <- AreasNoDups2013 %>% filter(AU2013_V1_ >= 580200 | grepl("Chatham", AU2013_label))

Doesn't work (!grepl): SIAreasNoDups2013 <- AreasNoDups2013 %>% filter(AU2013_V1_ >= 580200 | !grepl("Chatham", AU2013_label))

Doesn't work (!grepl): SIAreasNoDups2013 <- AreasNoDups2013 %>% filter(!grepl("Chatham", AU2013_label) | AU2013_V1_ >= 580200)

Works: SIAreasNoDups2013 <- AreasNoDups2013 %>% filter(!(grepl("Chatham", AU2013_label) | AU2013_V1_ < 580200))
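The difference between the variants above comes down to De Morgan's law rather than argument order: `A | !B` keeps a row when either condition holds, while `!(B | !A)` is equivalent to `A & !B`. A minimal base R sketch follows; the `areas` data frame and its values are invented stand-ins for the real file, and only the column names come from the issue.

```r
# Hypothetical stand-in data: column names mirror the issue, values are invented.
areas <- data.frame(
  AU2013_V1_ = c(500000, 580200, 600000, 617000),
  AU2013_label = c("Auckland", "Dunedin", "Chatham Islands", "Invercargill"),
  stringsAsFactors = FALSE
)

keep_code <- areas$AU2013_V1_ >= 580200
is_chatham <- grepl("Chatham", areas$AU2013_label)

# OR: a row survives if EITHER condition is TRUE, so Chatham (code >= 580200)
# and Auckland (not Chatham) both stay.
or_version <- areas[keep_code | !is_chatham, ]

# De Morgan: !(is_chatham | !keep_code) is the same as !is_chatham & keep_code,
# so a row survives only if BOTH conditions hold.
and_version <- areas[!(is_chatham | !keep_code), ]

print(or_version$AU2013_label)
print(and_version$AU2013_label)
```

If the goal is "code at least 580200 and not Chatham", combining the two conditions with `&` states that intent directly.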

The data is from a public file, so I have uploaded it to my repository. I'm new to using GitHub so let me know if you can't access this file: https://github.com/programgirl/TestData/blob/master/AreasNoDups2013.rds

closed time in 4 hours

programgirl

issue commenttidyverse/dplyr

Order seems important when using !grepl in filter

This sort of question is a better fit for https://community.rstudio.com. Do you mind asking it over there? (You might want to read https://www.tidyverse.org/help/ first to maximise your chances of getting a good answer)

programgirl

comment created time in 4 hours

pull request commentr-lib/vctrs

(Proof of concept) ALTREP compact integer reps

So with this change, recycling consistently takes ~1 µs. How small does size have to be before this is slower than without ALTREP?

DavisVaughan

comment created time in 4 hours

push eventtidyverse/dplyr

Travis CI User

commit sha 96947d5e7093023157d51521ea0fb3c3902c16d6

Built site for dplyr: 0.8.99.9000@b30c2a4

view details

push time in 18 hours

issue openedr-lib/usethis

Set RStudio defaults

https://blog.rstudio.com/2020/02/18/rstudio-1-3-preview-configuration/ — eg to not save/load workspace

created time in a day

pull request commentr-lib/vctrs

`vec_unchop()`

If you needed an example for the docs, I suspect a simple implementation of ave() would be interesting.
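For readers unfamiliar with `ave()`, the split, apply, and put-back pattern behind it can be sketched in base R. This is not the vctrs implementation and does not call `vec_unchop()`; `simple_ave()` is a hypothetical helper that only illustrates the shape of the operation.

```r
# Hypothetical helper: a base R sketch of the pattern behind ave().
simple_ave <- function(x, g, fun = mean) {
  # Row positions belonging to each group
  idx <- split(seq_along(x), g)
  # One summary per group, repeated to the size of the group
  pieces <- lapply(idx, function(i) rep(fun(x[i]), length(i)))
  # Put every piece back at its original positions
  out <- x
  out[unlist(idx, use.names = FALSE)] <- unlist(pieces, use.names = FALSE)
  out
}

x <- c(1, 10, 3, 20, 5, 7)
g <- c("a", "b", "a", "b", "a", "c")
result <- simple_ave(x, g)
print(result)
```

The final step, writing pieces back by position, is the part a function like `vec_unchop()` would presumably take over in a vctrs-based version.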

DavisVaughan

comment created time in a day

issue closedtidyverse/dplyr

case_when bug

In the case_when function, if any RHS option is an empty vector, then that vector is automatically returned.

case_when(TRUE  ~ "Hello",
          FALSE ~ character(0))

closed time in a day

mac-pfa

issue commenttidyverse/dplyr

case_when bug

@mac-pfa This sort of question is a better fit for https://community.rstudio.com. Do you mind asking it over there? (You might want to read https://www.tidyverse.org/help/ first to maximise your chances of getting a good answer)

mac-pfa

comment created time in a day

pull request commenttidyverse/dplyr

Prevent collapse of data.frame to vector in across (#4867)

For future reference, if you include Fixes #xyz in the text of the PR, the corresponding issue will be automatically closed when the PR is merged.

courtiol

comment created time in a day

Pull request review commenttidyverse/dplyr

Prevent collapse of data.frame to vector in across (#4867)

+test_that("across() works on one column data.frame", {
+  df <- data.frame(x = 1)
+
+  out <- df %>% mutate(across(x, identity))

I think you could just use across() here, and can you please use expect_equal() to test that output is the same as the input?

courtiol

comment created time in a day

issue commenttidyverse/lubridate

Update testthat style

@vspinu we have a convention (and tooling) where tests for (e.g.) R/foo.R live in tests/testthat/test-foo.R. That allows (e.g.) devtools::test_coverage_file() to work so that you can easily test the code coverage for just the R file that you are working on.
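The naming convention can be written down as a one-line mapping. The helper below is a hypothetical illustration in base R, not a function from devtools or testthat.

```r
# Hypothetical helper: map an R source file to its conventional test file.
test_file_for <- function(r_file) {
  file.path("tests", "testthat", paste0("test-", basename(r_file)))
}

mapped <- test_file_for("R/foo.R")
print(mapped)
```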

hadley

comment created time in a day

issue closedtidyverse/dplyr

sample_n() is slow because of slice()?

I realized that sample_n was the bottleneck of my simulations, and was shocked by how much slower it is than base sample+subsetting. Turns out this is due to slice (which I understand is being called) being much slower than base [ for me.

Example with a tibble since this issue suggested it as a fix

df <- data.frame(A = rep('A', 20), C = seq(1:20))
df2 <- dplyr::as_tibble(df)

microbenchmark::microbenchmark(
  
  'sample_n'     = {dplyr::sample_n(df2,size = 1,weight = C)},
  
  'sample+['     = {df2[sample(1:20, size = 1, prob = df2$C),]},
  
  'sample+slice' = {dplyr::slice(df2, sample(1:20, size = 1, prob = df2$C))}
  
)

# Unit: microseconds
# expr         min    lq    mean   median   uq   max   neval
# sample_n     149.1 155.5 162.925 159.70 163.65 342.9   100
# sample+[     31.0  37.3  42.335  42.65  44.70  80.3    100
# sample+slice 127.5 132.9 138.887 135.45 138.50 324.1   100

Also happens without sampling

microbenchmark::microbenchmark(
  
  '['     = {df2[5,]},
  'slice' = {dplyr::slice(df2, 5)}
  
)

# Unit: microseconds
# expr  min    lq    mean median     uq   max neval
# [     22.6  25.4  29.572  30.55  32.55  60.1   100
# slice 104.9 108.2 114.078 110.40 112.80 280.3  100

Using R 3.6.1, dplyr 0.8.3. Am I missing something? Thanks!

closed time in a day

vlepori

push eventtidyverse/tidyr

Travis CI User

commit sha 7ce98e956c1d551bd7e71b171b2b7b06281dd593

Built site for tidyr: 1.0.2.9000@44d0ec7

view details

push time in a day

push eventtidyverse/tidyr

Travis CI User

commit sha ec3f80bad0fde14b69ff7ac94f291cb334aa9a43

Built site for tidyr: 1.0.2.9000@ad2f6b1

view details

push time in a day

push eventtidyverse/dplyr

Travis CI User

commit sha 6cd9fd08f37740a3c7443cd0afde112ba9d4237e

Built site for dplyr: 0.8.99.9000@62950c1

view details

push time in 2 days

pull request commenttidyverse/dplyr

join_mutate() using vec_ptype_common() instead of vec_ptype2()

Yeah I think so

romainfrancois

comment created time in 2 days

issue openedtidyverse/design

Use delayed binding to create expensive functions

Because they're run once

Use for backports etc
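A minimal sketch of the idea using base R's `delayedAssign()`: the expression that builds the function runs only on first use, and the result is cached afterwards. The names `trim_ws` and `state` are hypothetical, and the `trimws()` fallback is just one possible shape for a backport.

```r
# Counter kept in an environment so we can observe when the builder runs.
state <- new.env()
state$builds <- 0L

# The braces are a promise: nothing inside runs until trim_ws is first used.
delayedAssign(
  "trim_ws",
  {
    state$builds <- state$builds + 1L
    if (exists("trimws", envir = baseenv(), inherits = FALSE)) {
      base::trimws                            # use the built-in when available
    } else {
      function(x) gsub("^\\s+|\\s+$", "", x)  # backport-style fallback
    }
  },
  eval.env = environment(),
  assign.env = environment()
)

builds_before <- state$builds   # promise not forced yet
first <- trim_ws("  hello  ")   # first use forces the promise
second <- trim_ws(" world ")    # later uses reuse the cached function
builds_after <- state$builds
print(c(builds_before, builds_after))
```

The builder runs at most once per session, so an expensive check such as feature detection does not slow down package loading.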

created time in 2 days

pull request commenttidyverse/dplyr

[stopgap] avoid setting rows when not necessary

What if we did something more like:

types <- vec_ptype2(x_key, y_key)
types <- discard(types, is_unspecified)
out[names(types)] <- vec_cast(out[names(types)], types)

Or is the fix actually to switch from vec_ptype2() to vec_ptype_common()?

romainfrancois

comment created time in 2 days

pull request commenttidyverse/tidyr

Improve `unchop()` performance

If R-level loops don't work, is there some other underlying pattern that could move to vctrs?

DavisVaughan

comment created time in 2 days

Pull request review commenttidyverse/tidyr

Create list-columns with `list_of()` in `chop()`

 chop <- function(data, cols) {
   split <- vec_split(vals, keys)

   if (length(split$val)) {
-    chopped_vals <- map(split$val, ~ new_data_frame(map(.x, list), n = 1L))
-    vals <- vec_rbind(!!!chopped_vals)
+    split_vals <- transpose(split$val)

Nice!

lionel-

comment created time in 2 days

Pull request review commenttidyverse/tidyr

Create list-columns with `list_of()` in `chop()`

 # tidyr (development version)

+* `chop()` now creates list-columns of class `vctrs::list_of()`. This
+  helps keep track of the type in case the chopped data frame is
+  emptied. This allows `unchop()` to reconstitute a data frame with
+  the correct column types even when there is no observation.

  the correct column types even when there are no observations.
lionel-

comment created time in 2 days

Pull request review commenttidyverse/tidyr

Create list-columns with `list_of()` in `chop()`

 # tidyr (development version)

+* `chop()` now creates list-columns of class `vctrs::list_of()`. This
+  helps keep track of the type in case the chopped data frame is
+  emptied. This allows `unchop()` to reconstitute a data frame with

  empty. This allows `unchop()` to reconstitute a data frame with
lionel-

comment created time in 2 days

pull request commenttidyverse/dplyr

[stopgap] avoid setting rows when not necessary

Is this causing a problem?

romainfrancois

comment created time in 2 days

Pull request review commentr-lib/vctrs

Support row names in `vec_cbind()`

 * `vec_slice()` and `vec_chop()` now work correctly with `bit64::integer64()`
   objects when an `NA` subscript is supplied. By extension, this means that
   `vec_init()` now works with these objects as well (#813).
-  
+
 * `vec_rbind()` now binds row names. When named inputs are supplied
   and `names_to` is `NULL`, the names define row names. If `names_to`
   is supplied, they are assigned in the column name as before.

+* `vec_cbind()` now binds row names if they are congruent across
+  inputs. If the row names are not identical that's an error. If some
+  inputs do not have row names, they are propagated.

What does propagated mean in this context?

lionel-

comment created time in 2 days

push eventtidyverse/tidyr

Travis CI User

commit sha 815893f5b8be445d56584df6aa4a0574771ea4e8

Built site for tidyr: 1.0.2.9000@0946cdc

view details

push time in 2 days

push eventtidyverse/tidyr

Travis CI User

commit sha cdda5e1aa2277046677c3c02c99d8ed5b72b4ecb

Built site for tidyr: 1.0.2.9000@3fb9b9a

view details

push time in 2 days

push eventtidyverse/lubridate

Travis CI User

commit sha 5996f2fe57ac29fac40d1ec74a5a3be714b1151b

Built site for lubridate: 1.7.4.9000@20bf809

view details

push time in 2 days

push eventtidyverse/lubridate

Travis CI User

commit sha c79e03bec8407ca049862358d37c108cfe46a094

Built site for lubridate: 1.7.4.9000@289cf45

view details

push time in 2 days

issue commentr-lib/vctrs

Record-style objects incompatible with dplyr::mutate ?

Are you using the development version of dplyr? The released version does not have vctrs support.

davidchall

comment created time in 3 days

fork hadley/qs

Quick serialization of R objects

fork in 3 days

issue commenttidyverse/dplyr

add a complete scoped -> across list change to documentation

We've included the translation in the old documentation, so that if you look at (e.g.) ?mutate_all() you'll see how to use across() instead. But the bulk of the documentation will be in a new vignette, #4704.

msberends

comment created time in 3 days

Pull request review commenttidyverse/dplyr

Prevent slice_min/max from crashing when there are NA's

 sample_int <- function(n, size, replace = FALSE, wt = NULL) {
   }
 }

+nr_ranks_lesser_or_equal_to <- function(x, y) {

How about smaller_ranks()?

wfjvdham

comment created time in 3 days

issue commenttidyverse/dplyr

Need equivalent of scoped select/rename helpers

Would be nice if df %>% rename_if("Count of (.*?) [Ff]emales", "\\1 count") could work (i.e. special case string inputs so that you could reuse regexp groups in the output).
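Until something like that exists, the same renaming can be done in base R with `sub()` and a backreference. This is a sketch with invented column names; it is not a dplyr API.

```r
# Invented example columns matching the pattern from the comment above.
df <- data.frame(a = 1:2, b = 3:4, c = 5:6)
names(df) <- c("Count of adult Females", "Count of young females", "Region")

pattern <- "Count of (.*?) [Ff]emales"

# sub() leaves non-matching names untouched; "\\1" reuses the captured group.
names(df) <- sub(pattern, "\\1 count", names(df), perl = TRUE)
print(names(df))
```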

hadley

comment created time in 3 days

push eventtidyverse/tidyr

Travis CI User

commit sha 122f620f99af20028cc6d88066e97e170c8df4e1

Built site for tidyr: 1.0.2.9000@3fb9b9a

view details

push time in 3 days

push eventtidyverse/tidyr

Eric Gunnink

commit sha 3fb9b9a5066babdeb4eab492792bde52aded8b67

Update pivot.Rmd (#881) Typo. `sp` listed twice.

view details

push time in 3 days

pull request commenttidyverse/tidyr

Update pivot.Rmd

Thanks!

ericgunnink

comment created time in 3 days

PR merged tidyverse/tidyr

Update pivot.Rmd

Typo. sp listed twice.

+1 -1

0 comment

1 changed file

ericgunnink

pr closed time in 3 days

pull request commenthadley/adv-r

Back-ref to attributes for pairlists

You only need to know that attributes are pairlists when working with them in C, so I'd rather remove the backlink than try to explain it here.

CorradoLanera

comment created time in 3 days

push eventhadley/mastering-shiny

Karandeep Singh

commit sha 0bd2bdb6f797f70057efe32ef998286969228335

Fixed “characater” to “character” misspelling (#139)

view details

push time in 3 days

pull request commenthadley/mastering-shiny

Fixed “characater” to “character” misspelling

Thanks!

kdpsingh

comment created time in 3 days

Pull request review commenttidyverse/lubridate

Update final tests

 test_that("parsers throw on invalid tz argument", {
 })

 test_that("ymd functions correctly parse dates separated by -", {
-  expect_that(ymd("2010-01-02"),
-              equals(as.Date("2010-01-02")))
-  expect_that(ymd("10-01-02"),
-              equals(as.Date("2010-01-02")))
-  expect_that(ydm("2010-02-01"),
-              equals(as.Date("2010-01-02")))
-  expect_that(mdy("01-02-2010"),
-              equals(as.Date("2010-01-02")))
-  expect_that(myd("01-2010-02"),
-              equals(as.Date("2010-01-02")))
-  expect_that(dmy("02-01-2010"),
-              equals(as.Date("2010-01-02")))
-  expect_that(dym("02-2010-01"),
-              equals(as.Date("2010-01-02")))
-  expect_that(ymd(c("2010-01-02", "2010-01-03")),
-              equals(as.Date(c("2010-01-02", "2010-01-03"))))
+  expect_equal(ymd("2010-01-02"),

Would you mind reformatting these tests to fit on one line?

sushmitavgopalan16

comment created time in 3 days

issue commenttidyverse/lubridate

Update testthat style

The break up looks good but the file names should match the R/ names, e.g. accessors-minute.R, accessors-month.R, etc

hadley

comment created time in 3 days

issue commentr-lib/vctrs

Revisit special casing of `vec_ptype2.logical.list()`

Yeah, agreed

DavisVaughan

comment created time in 5 days

push eventr-lib/pkgdown

Jim Hester

commit sha a78e04c8516894904c16da95c62f78907df29f36

Use git worktree in deploy_local (#1221)

* Use git worktree in deploy_local, deprecating repo_slug argument
* Rename deploy_local() to deploy_to_branch() and export

view details

push time in 5 days

PR merged r-lib/pkgdown

Use git worktree in deploy_local

This avoids the need to determine the remote url, as we can reuse the existing one in the checked out repository.

This also allows us to use deploy_local() in GitHub Actions

+72 -47

3 comments

5 changed files

jimhester

pr closed time in 5 days

push eventr-lib/usethis

Travis CI User

commit sha e125ca2402f0bf2838341173ee069e1d51ab26bc

Built site for usethis: 1.5.1.9000@2a3d134

view details

push time in 5 days

push eventtidyverse/tibble

Kirill Müller

commit sha 91b999c86052e50b7d2f446aa8106391445dc9f5

Deploy from Travis build 2355 [ci skip] Build URL: https://travis-ci.org/tidyverse/tibble/builds/650429922 Commit: b90dfe24b4bf6e544c189bfcf185622e790838df

view details

push time in 5 days

push eventtidyverse/tibble

GitHub

commit sha 5ae0281f2ab2474273673fffa9b56389b7325eaa

Deploy from Travis build 2354 [ci skip] Build URL: https://travis-ci.org/tidyverse/tibble/builds/650423151 Commit: cba28e93ef5a11510a82c4b5802861d0f7438ea5

view details

push time in 6 days

push eventtidyverse/tibble

Kirill Müller

commit sha a25c1e2a2e570735619327bf28dbb44d9c20bd88

Deploy from Travis build 2348 [ci skip] Build URL: https://travis-ci.org/tidyverse/tibble/builds/650401321 Commit: 9789707f5b01066eb318bd8a7bc897fda6c3520e

view details

push time in 6 days

push eventtidyverse/tibble

Kirill Müller

commit sha e6434d34d6fd6eddaa830fabb1a4af80f6fced57

Deploy from Travis build 2346 [ci skip] Build URL: https://travis-ci.org/tidyverse/tibble/builds/650304634 Commit: 19484f9cb0b8eb44fe4dc16357d2e7cb9e1e20b8

view details

push time in 6 days

push eventtidyverse/tibble

GitHub

commit sha 5c76df545e07078bf713a413446d1882ef184c47

Deploy from Travis build 2345 [ci skip] Build URL: https://travis-ci.org/tidyverse/tibble/builds/650302279 Commit: ebc37028cf0ee317de1c284f99378e0bdede6675

view details

push time in 6 days

issue closedtidyverse/dtplyr

Summarise does not throw an error for summary values greater than length 1

summarise unexpectedly allows summary values of length greater than 1. This can be troublesome if the error is behaviour relied upon for sanity checking values set in summarise. Furthermore, I think most people would reasonably assume that the grouping variables can function as unique identifiers after a call to summarise, but this will no longer be the case when using dtplyr.

Reprex below:

library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
library(data.table)
#> 
#> Attaching package: 'data.table'
#> The following objects are masked from 'package:dplyr':
#> 
#>     between, first, last
library(dtplyr)

a <- tibble(A = c(1,2,3), B = c(4,5,6))

# expected behaviour: Error
a %>%
  group_by(A) %>%
  summarise(C = c(1,2))
#> Error: Column `C` must be length 1 (a summary value), not 2

# unexpected behaviour: continues silently
a %>%
  lazy_dt() %>%
  group_by(A) %>%
  summarise(C = c(1,2)) %>%
  as_tibble()
#> # A tibble: 6 x 2
#>       A     C
#>   <dbl> <dbl>
#> 1     1     1
#> 2     1     2
#> 3     2     1
#> 4     2     2
#> 5     3     1
#> 6     3     2

Created on 2020-02-13 by the reprex package (v0.3.0)

closed time in 6 days

tylerferguson

issue commenttidyverse/dtplyr

Summarise does not throw an error for summary values greater than length 1

This is a deliberate change in the development version of dplyr. I think this is a reasonable decision because it enables the use of many more summary functions (e.g. quantile()).
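To see why multi-value summaries are useful, here is a base R sketch of a per-group `quantile()` that yields two rows per group. It does not use dplyr or dtplyr, and the data are invented for illustration.

```r
# Invented data: two groups of four values each.
values <- data.frame(
  g = rep(c("a", "b"), each = 4),
  x = c(1, 2, 3, 4, 10, 20, 30, 40)
)
probs <- c(0.25, 0.75)

# quantile() returns one value per probability, so each group yields two values.
per_group <- lapply(split(values$x, values$g), quantile, probs = probs, names = FALSE)

# Stack the results: the grouping variable repeats, so it is no longer a unique key.
result <- data.frame(
  g = rep(names(per_group), each = length(probs)),
  prob = rep(probs, times = length(per_group)),
  q = unlist(per_group, use.names = FALSE)
)
print(result)
```

The repeated `g` values in the result are exactly the trade-off raised in the issue: more expressive summaries, but the grouping columns alone no longer identify a row.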

tylerferguson

comment created time in 6 days

Pull request review commentr-lib/pkgdown

Use git worktree in deploy_local

 deploy_site_github <- function(
   cat_line("Setting private key permissions to 0600")
   fs::file_chmod(ssh_id_file, "0600")

-  deploy_local(pkg, repo_slug = repo_slug, host = host, commit_message = commit_message, ...)
+  deploy_local(pkg, commit_message = commit_message, ...)

   rule("Deploy completed", line = 2)
 }

-deploy_local <- function(
-                         pkg = ".",
-                         repo_slug = NULL,
-                         host,
+#' Build and deploy a site locally

deploy_to_branch() maybe? deploy_site_github() isn't named very well either, unfortunately.

jimhester

comment created time in 6 days

Pull request review commentr-lib/pkgdown

Use git worktree in deploy_local

 deploy_site_github <- function(
   tarball = Sys.getenv("PKG_TARBALL", ""),
   ssh_id = Sys.getenv("id_rsa", ""),
   repo_slug = Sys.getenv("TRAVIS_REPO_SLUG", ""),
-  host = "github.com",

Oh I missed that; that makes sense.

jimhester

comment created time in 6 days

push eventtidyverse/lubridate

Travis CI User

commit sha 4625e1bd79cf19aaea9ef759cca98b0cd05a078a

Built site for lubridate: 1.7.4.9000@05e4b71

view details

push time in 7 days

pull request commenttidyverse/lubridate

update more tests to modern style

Thanks! It's slightly better to not restyle (saving it to do one restyling pass at the end), but it's not bad enough to make it worth undoing. Thanks for continuing to work on these tests!

sushmitavgopalan16

comment created time in 7 days

push eventtidyverse/lubridate

Sushmita V Gopalan

commit sha 05e4b71a88bbe792639b46c5bee1276c01ff5902

update more tests to modern style (#859) * update tests in test-ops-multiplication.R * update tests in test-ops-subtraction.R * update tests in test-periods.R and ran styler * update tests in test-timespans.R and ran styler * update tests in test-timezones.R

view details

push time in 7 days

PR merged tidyverse/lubridate

update more tests to modern style

partially addresses #810

+384 -252

1 comment

5 changed files

sushmitavgopalan16

pr closed time in 7 days

Pull request review commentr-lib/vctrs

Add native handling of `unspecified()`

 vec_ptype2.list <- function(x, y, ..., x_arg = "x", y_arg = "y") {
 #' @method vec_ptype2.logical logical
 #' @export
 vec_ptype2.logical.logical <- function(x, y, ..., x_arg = "x", y_arg = "y") {
+  # Special case `vec_ptype2(NA, NA)` to ensure that

I think this comment should go in the block below

DavisVaughan

comment created time in 7 days

issue commenttidyverse/dplyr

Coercion rules section in two table vignettes needs an update

Tracked in #4828

hadley

comment created time in 7 days

issue commenttidyverse/dplyr

Row order after right_join() no longer matches original order

Ok, you haven't read the docs — we are now very clear that joins always preserve the ordering (and other properties) of x.

courtiol

comment created time in 7 days

Pull request review commentr-lib/pkgdown

Use git worktree in deploy_local

 deploy_site_github <- function(
   tarball = Sys.getenv("PKG_TARBALL", ""),
   ssh_id = Sys.getenv("id_rsa", ""),
   repo_slug = Sys.getenv("TRAVIS_REPO_SLUG", ""),
-  host = "github.com",

And can repo_slug be deprecated now too?

jimhester

comment created time in 7 days

Pull request review commentr-lib/pkgdown

Use git worktree in deploy_local

 deploy_site_github <- function(
   cat_line("Setting private key permissions to 0600")
   fs::file_chmod(ssh_id_file, "0600")

-  deploy_local(pkg, repo_slug = repo_slug, host = host, commit_message = commit_message, ...)
+  deploy_local(pkg, commit_message = commit_message, ...)

   rule("Deploy completed", line = 2)
 }

-deploy_local <- function(
-                         pkg = ".",
-                         repo_slug = NULL,
-                         host,
+#' Build and deploy a site locally

Probably worth updating the deploy_site_github() to mention deploy_local().

And maybe since we're exporting it, it's worth considering the name a bit more?

jimhester

comment created time in 7 days

Pull request review commentr-lib/pkgdown

Use git worktree in deploy_local

 deploy_site_github <- function(
   tarball = Sys.getenv("PKG_TARBALL", ""),
   ssh_id = Sys.getenv("id_rsa", ""),
   repo_slug = Sys.getenv("TRAVIS_REPO_SLUG", ""),
-  host = "github.com",

To be safe, maybe deprecate this argument?

jimhester

comment created time in 7 days

Pull request review commentr-lib/vctrs

Add internal FAQ about identity element of `vec_ptype2()`

+
+```{r, child = "setup.Rmd", include = FALSE}
+```
+
+## Promotion monoid
+
+`vec_ptype2()` returns the _promotion_ type of two vectors, if any. This is typically the richer type of the two so that automatic coercions are never lossy. For example, the promotion type of integer and double is the latter because double covers a larger range of values than integer.

Can we just use richer type instead of promotion type?
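Base R's `c()` follows a similar "richer type wins" idea, which may make the term concrete. This is only an analogy using base coercion; it does not call `vec_ptype2()` and says nothing about how vctrs implements promotion.

```r
# Combining integer and double yields double: the richer of the two types.
int_dbl <- c(1L, 2.5)
# Combining logical and integer yields integer.
lgl_int <- c(TRUE, 1L)

# Double can hold values beyond the integer range, so nothing is lost
# when an integer is promoted to double.
beyond_int <- as.double(.Machine$integer.max) + 1

print(typeof(int_dbl))
print(typeof(lgl_int))
print(beyond_int)
```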

lionel-

comment created time in 7 days

Pull request review commenttidyverse/dplyr

Prevent slice_min/max from crashing when there are NA's

 slice_min.data.frame <- function(.data, order_by, ..., n, prop, with_ties = TRUE
   size <- check_slice_size(n, prop)
   if (with_ties) {
     idx <- switch(size$type,
-      n =    function(x, n) head(order(x), sum(min_rank(x) <= size$n)),
-      prop = function(x, n) head(order(x), sum(min_rank(x) <= size$prop * n)),
+      n =    function(x, n) head(order(x), sum(min_rank(x) <= size$n, na.rm = TRUE)),

I think at this point it's probably worth extracting out the repeated code into a function like:

something <- function(x, y) {
  sum(min_rank(x) <= y, na.rm = TRUE)
}

But I'm not sure what to call it
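A runnable base R sketch of such a helper, to show why `na.rm = TRUE` matters. It assumes `rank(ties.method = "min", na.last = "keep")` as a stand-in for dplyr's `min_rank()`, uses the `<=` comparison from the diff, and the names `min_rank_base()` and `smaller_ranks()` are hypothetical.

```r
# Stand-in for dplyr::min_rank(): ties share the lowest rank, NA stays NA.
min_rank_base <- function(x) rank(x, ties.method = "min", na.last = "keep")

# Hypothetical helper: how many values have a rank of at most y, ignoring NA.
smaller_ranks <- function(x, y) {
  sum(min_rank_base(x) <= y, na.rm = TRUE)
}

x <- c(3, 1, NA, 1, 2)
n_keep <- smaller_ranks(x, 2)

# Without na.rm the count is NA, which head() cannot use as a size.
n_without_na_rm <- sum(min_rank_base(x) <= 2)

# order() puts NA last, so the first n_keep positions are the tied minima.
rows <- head(order(x), n_keep)
print(n_keep)
print(rows)
```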

wfjvdham

comment created time in 7 days

pull request commentr-lib/vctrs

Fix name inconsistency in `vec_rbind()`

Yeah, it makes sense for external names to become row names

lionel-

comment created time in 7 days

issue commenttidyverse/dplyr

performance of dplyr_col_modify()

Yeah, we can definitely have a faster version for data.frames. The idea of these functions was to extract out as much common code as possible so we can optimise in one place. Do you want to take a stab, or do you want me to take a look next week?

romainfrancois

comment created time in 7 days

issue closedtidyverse/dplyr

right_join() no longer rearranges the rows correctly!

The current output of right_join() is no longer what it used to be. I believe this is a bug:

In dplyr 0.8.3:

dplyr::right_join(data.frame(x = 2:1), data.frame(x = 1:2, y = 1:2))
#> Joining, by = "x"
#>   x y
#> 1 1 1
#> 2 2 2

In current devel:

dplyr::right_join(data.frame(x = 2:1), data.frame(x = 1:2, y = 1:2))
#> Joining, by = "x"
#>   x y
#> 1 2 2
#> 2 1 1

closed time in 7 days

courtiol

issue commenttidyverse/dplyr

right_join() no longer rearranges the rows correctly!

This works for me:

dplyr::right_join(data.frame(x = 2:1), data.frame(x = 1:2, y = 1:2))
#> Joining, by = "x"
#>   x y
#> 1 2 2
#> 2 1 1

Created on 2020-02-12 by the reprex package (v0.3.0)

courtiol

comment created time in 7 days

issue commenttidyverse/dplyr

right_join() no longer rearranges the rows correctly!

Please provide a reprex

courtiol

comment created time in 7 days

push eventtidyverse/dplyr

Travis CI User

commit sha 4692c465c781769bb66823ec16b24b2d8c8bbadb

Built site for dplyr: 0.8.99.9000@30363bf

view details

push time in 8 days

push eventtidyverse/dplyr

Travis CI User

commit sha 334a776526e7f11e77769c839c31d2f0e9ed4f0b

Built site for dplyr: 0.8.99.9000@8e0b4f5

view details

push time in 8 days

issue closedtidyverse/dplyr

Batch creation of variable with mutate_over

When doing data-wrangling, a common task is to create new variables on the basis of several other variables. In dplyr this can be done using mutate & case_when. Often (but not as common maybe) several variables have to be created in similar ways.

At the moment, it seems mutate_at does not allow this kind of batch creation of variables with case_when (or at least I am not aware of it - on SO similar questions have been unanswered or at least do not seem to offer a way to do it with mutate_at - see for example here and here).

It would be great, if dplyr had an official way of dealing with this kind of data-wrangling task, this could be a function called mutate_over, which mutates a tibble using a string vector. Or it could be just an argument within mutate_at which would allow this kind of operations.

There is a pipe-friendly workaround for this kind of batch creation of variables using purrr::reduce. However, I think this approach is not very obvious and, being a pure data-wrangling task, the functionality would fit well in dplyr, although I understand that having a mutate_over function but no filter_over etc. would be inconsistent.

Below I provide three examples with increasing complexity.

library(tidyverse)

iris <- as_tibble(iris)

# example 1
# generate product of width and length for each string variable ("Sepal", "Petal") 
gen_vars1 <- function(df, x) {
  
  mutate(df,
         !! x := !! sym(paste0(x, ".Length")) * !! sym(paste0(x, ".Width")))
}

iris %>% 
  reduce(c("Sepal", "Petal"), gen_vars1, .init = .)
#> # A tibble: 150 x 7
#>    Sepal.Length Sepal.Width Petal.Length Petal.Width Species Sepal Petal
#>           <dbl>       <dbl>        <dbl>       <dbl> <fct>   <dbl> <dbl>
#>  1          5.1         3.5          1.4         0.2 setosa   17.8 0.280
#>  2          4.9         3            1.4         0.2 setosa   14.7 0.280
#>  3          4.7         3.2          1.3         0.2 setosa   15.0 0.26 
#>  4          4.6         3.1          1.5         0.2 setosa   14.3 0.3  
#>  5          5           3.6          1.4         0.2 setosa   18   0.280
#>  6          5.4         3.9          1.7         0.4 setosa   21.1 0.68 
#>  7          4.6         3.4          1.4         0.3 setosa   15.6 0.42 
#>  8          5           3.4          1.5         0.2 setosa   17   0.3  
#>  9          4.4         2.9          1.4         0.2 setosa   12.8 0.280
#> 10          4.9         3.1          1.5         0.1 setosa   15.2 0.15 
#> # … with 140 more rows


#  example 2
# generate logical vector based on two conditions
gen_vars2 <- function(df, x) {
  
  mutate(df,
         "{x}.Ratio" := case_when( 
          !! sym(paste0(x, ".Length")) >= 4 & !! sym(paste0(x, ".Width")) <= 2.2 ~ T,
          T ~ F
           )
         )
}

iris %>% 
  reduce(c("Sepal", "Petal"), gen_vars2, .init = .)
#> # A tibble: 150 x 7
#>    Sepal.Length Sepal.Width Petal.Length Petal.Width Species Sepal.Ratio
#>           <dbl>       <dbl>        <dbl>       <dbl> <fct>   <lgl>      
#>  1          5.1         3.5          1.4         0.2 setosa  FALSE      
#>  2          4.9         3            1.4         0.2 setosa  FALSE      
#>  3          4.7         3.2          1.3         0.2 setosa  FALSE      
#>  4          4.6         3.1          1.5         0.2 setosa  FALSE      
#>  5          5           3.6          1.4         0.2 setosa  FALSE      
#>  6          5.4         3.9          1.7         0.4 setosa  FALSE      
#>  7          4.6         3.4          1.4         0.3 setosa  FALSE      
#>  8          5           3.4          1.5         0.2 setosa  FALSE      
#>  9          4.4         2.9          1.4         0.2 setosa  FALSE      
#> 10          4.9         3.1          1.5         0.1 setosa  FALSE      
#> # … with 140 more rows, and 1 more variable: Petal.Ratio <lgl>



# example 3
# generate logical vector based on two conditions with different conditions per input
gen_vars3 <- function(df, x) {
  
  l = switch(x,
            "Sepal" = 4,
            "Petal" = 2) 
  
  w = switch(x,
            "Sepal" = 3,
            "Petal" = 1) 
  
  mutate(df,
         "{x}.Ratio" := case_when( 
           !! sym(paste0(x, ".Length")) >= l & !! sym(paste0(x, ".Width")) <= w ~ T,
           T ~ F
         )
  )
}

iris %>% 
  reduce(c("Sepal", "Petal"), gen_vars3, .init = .)
#> # A tibble: 150 x 7
#>    Sepal.Length Sepal.Width Petal.Length Petal.Width Species Sepal.Ratio
#>           <dbl>       <dbl>        <dbl>       <dbl> <fct>   <lgl>      
#>  1          5.1         3.5          1.4         0.2 setosa  FALSE      
#>  2          4.9         3            1.4         0.2 setosa  TRUE       
#>  3          4.7         3.2          1.3         0.2 setosa  FALSE      
#>  4          4.6         3.1          1.5         0.2 setosa  FALSE      
#>  5          5           3.6          1.4         0.2 setosa  FALSE      
#>  6          5.4         3.9          1.7         0.4 setosa  FALSE      
#>  7          4.6         3.4          1.4         0.3 setosa  FALSE      
#>  8          5           3.4          1.5         0.2 setosa  FALSE      
#>  9          4.4         2.9          1.4         0.2 setosa  TRUE       
#> 10          4.9         3.1          1.5         0.1 setosa  FALSE      
#> # … with 140 more rows, and 1 more variable: Petal.Ratio <lgl>

Created on 2020-02-05 by the reprex package (v0.3.0)

Feature wishlist:

  1. It would be great to have this functionality in dplyr as a function (for example, mutate_over).
  2. Such a function could provide further regex helpers, which would help extract strings from a tibble's colnames.
  3. Not a direct dplyr request, but it would be great to make the !! sym less verbose, maybe by extending the functionality of the curly-curly operator or by introducing another similar operator.
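For comparison, the first example (one product column per prefix) can also be sketched in base R with a plain loop over the prefixes. This is not a proposed dplyr API, and `measurements` is just a hypothetical copy of `iris`.

```r
# Work on a copy of the built-in iris data.
measurements <- datasets::iris

# For each prefix, build the Length and Width column names and multiply them.
for (part in c("Sepal", "Petal")) {
  len <- measurements[[paste0(part, ".Length")]]
  wid <- measurements[[paste0(part, ".Width")]]
  measurements[[part]] <- len * wid
}

print(head(measurements[, c("Sepal", "Petal")], 3))
```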

closed time in 8 days

TimTeaFan

issue commenttidyverse/dplyr

Batch creation of variable with mutate_over

That feels pretty special purpose to me — so I don’t think it belongs in dplyr, but it would be a great function for another package.

TimTeaFan

comment created time in 8 days

push eventtidyverse/ggplot2

Travis CI User

commit sha 5bf8f446ff908b7220db58adf6459d14f5b49634

Built site for ggplot2: 3.3.0.9000@b434351

view details

push time in 8 days

issue openedtidyverse/dplyr

Flag record as questioning

Since the arguments are in the wrong order

created time in 8 days

push eventtidyverse/ggplot2

Travis CI User

commit sha ca97f45afb9c1521617c19ded2024f09d4f86734

Built site for ggplot2: 3.3.0.9000@2ae36a2

push time in 8 days

issue closedtidyverse/dplyr

curly has unexpected behavior

Curly does not return the expected result from the code below. However, if I uncomment print(y), it returns the expected result. Is it a bug, or am I using curly the wrong way?

UpDownCount <- function(x, y) {
  # print(y)
  df <- x %>%
    dplyr::mutate(
      Up = sum({{ y }} >= 6),
      Down = sum({{ y }} <= 6)
    )

  lst <- list(
    Up = unique(df$Up),
    Down = unique(df$Down),
    Total = unique(df$Up) + unique(df$Down)
  )
  return(lst)
}

mtcars %>% UpDownCount(sym("cyl"))
#> Error in mtcars %>% UpDownCount(sym("cyl")): could not find function "%>%"

<sup>Created on 2020-02-11 by the reprex package (v0.3.0)</sup>

closed time in 8 days

gadepallivs

issue commenttidyverse/dplyr

curly has unexpected behavior

There are a few problems with your code:

  • you didn't load dplyr so you get a weird error message
  • the first argument is x but you use df in your code
  • you're using sym("cyl") instead of just cyl
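
A minimal sketch of how the function could look with those points addressed; it is an illustration, not the poster's final code. The names up_down_count, data, and var are hypothetical, and summarise() is swapped in for the original mutate() plus unique() so that each count is computed once.

```r
library(dplyr)  # provides %>%, summarise() and the {{ }} operator

# Hypothetical rewrite of UpDownCount(); `data` and `var` are illustrative names.
up_down_count <- function(data, var) {
  # {{ var }} forwards the bare column name supplied by the caller
  counts <- data %>%
    summarise(
      Up = sum({{ var }} >= 6),
      Down = sum({{ var }} <= 6)
    )

  list(
    Up = counts$Up,
    Down = counts$Down,
    Total = counts$Up + counts$Down
  )
}

# Pass the column bare (cyl), not as sym("cyl")
result <- mtcars %>% up_down_count(cyl)
```

Because var is wrapped in {{ }}, the caller writes the column name directly, as in any other dplyr verb.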

gadepallivs

comment created time in 8 days

issue commenttidyverse/dplyr

Batch creation of variable with mutate_over

Can you please try explaining what you want another way? I’m not currently understanding what you want.

TimTeaFan

comment created time in 9 days

push eventr-lib/usethis

Travis CI User

commit sha 953b5927ef5c1d090cfc0082921d87d93d39ef5d

Built site for usethis: 1.5.1.9000@7d824f1

push time in 9 days

push eventr-lib/roxygen2

Travis CI User

commit sha 05452bd94b0be74a7c666e223025402c3478bba3

Built site for roxygen2: 7.0.2.9000@0d67307

push time in 9 days

issue commentr-lib/vdiffr

Shiny app should exit when all cases validated

Yeah, that makes sense to me

hadley

comment created time in 9 days

push eventr-lib/usethis

Travis CI User

commit sha 3f0b6c9f5310718140c19f8f4caf0c8c64133bdc

Built site for usethis: 1.5.1.9000@937f704

push time in 9 days

push eventtidyverse/dplyr

Travis CI User

commit sha cd7622c083b497cbf0da1050a79853b60a3a6860

Built site for dplyr: 0.8.99.9000@5f599a0

push time in 9 days

push eventtidyverse/dplyr

Eric Stern

commit sha 5f599a0c8fc73b2e179f8f0fdaa89de75a35ce03

Refresh `bind()` docs (#4833) Fixes #4794

push time in 9 days

issue closedtidyverse/dplyr

Refresh `bind()` docs

  • [ ] Proofread title, description, and parameters
  • [ ] Proofread comments in examples
  • [ ] Ensure comment headings used to divide into scannable sections
  • [ ] Ensure there's one example that shows tidy eval and links to ?dplyr_tidy_eval
  • [ ] Ensure there's one example that shows across()
  • [ ] Flag examples that seem overly complex for further review
  • [ ] Switch out mtcars/iris for something more interesting, or a smaller dataset constructed specifically to illustrate one point.
  • [ ] Review size of example outputs to ensure that you can easily see important rows and cols.
  • [ ] Datasets should always be tibbles to avoid use of as_tibble() distracting from the main point

closed time in 9 days

batpigandme

PR merged tidyverse/dplyr

Refresh `bind()` docs #4794 partial refresh tidy-dev-day :nerd_face:

As a user, I find the following examples unclear/could use some rethinking:

#' # Note that for historical reasons, lists containing vectors are
#' # always treated as data frames. Thus their vectors are treated as
#' # columns rather than rows, and their inner names are ignored:
#' ll <- list(
#'   a = c(A = 1, B = 2),
#'   b = c(A = 3, B = 4)
#' )
#' bind_rows(ll)
#'
#' # You can circumvent that behaviour with explicit splicing:
#' bind_rows(!!!ll)

I am not fully clear on the naming convention in this example (ll). It also looks like the code has changed: bind_rows(ll) and bind_rows(!!!ll) now return the same result.
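
One way to check that claim against whichever dplyr version is installed is to compare the two calls directly. This is only a sketch: the names as_is, spliced, and same_result are hypothetical, and the answer may differ between dplyr releases, so no output is shown.

```r
library(dplyr)

ll <- list(
  a = c(A = 1, B = 2),
  b = c(A = 3, B = 4)
)

as_is   <- bind_rows(ll)     # the list passed as a single argument
spliced <- bind_rows(!!!ll)  # each vector passed as its own argument

# TRUE only when both calls produce equivalent data frames
same_result <- isTRUE(all.equal(as_is, spliced))
same_result
```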

The example below no longer breaks, since length-one vectors are recycled to the full length. I adjusted it in a pull request.

#' \dontrun{
#' # Rows do need to match when column-binding
#' bind_cols(data.frame(x = 1), data.frame(y = 1:2))
#' }
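
Whether that call errors or recycles depends on the installed dplyr version. A hedged way to see which applies locally is to wrap the call in tryCatch(); the name result is hypothetical, and it holds either the bound data frame or the error message.

```r
library(dplyr)

# Either a data frame (the one-row input was recycled)
# or a character string holding the error message.
result <- tryCatch(
  bind_cols(data.frame(x = 1), data.frame(y = 1:2)),
  error = function(e) conditionMessage(e)
)

if (is.data.frame(result)) {
  message("bind_cols() recycled the one-row input to ", nrow(result), " rows")
} else {
  message("bind_cols() failed: ", result)
}
```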

+5 -18

1 comment

1 changed file

estern95

pr closed time in 9 days

push eventtidyverse/tidyr

Travis CI User

commit sha a19f01f00cb80e7c9ab185b5fc03435342b4ea5b

Built site for tidyr: 1.0.2.9000@6717810

push time in 9 days

push eventtidyverse/tidyr

Will Beasley

commit sha 67178106703f230941ad082e9ea79f3d08650ac6

consistent capitalization of GitHub (#875)

push time in 9 days

pull request commenttidyverse/tidyr

consistent capitalization of GitHub

Thanks!

wibeasley

comment created time in 9 days

PR merged tidyverse/tidyr

consistent capitalization of GitHub

This commit is so trivial. Sorry if it's not even worth the time to read the PR.

+1 -1

0 comment

1 changed file

wibeasley

pr closed time in 9 days

push eventtidyverse/dplyr

Travis CI User

commit sha d7263dc070eb6556c40a3ac0114a621f0e46feb0

Built site for dplyr: 0.8.99.9000@481268c

push time in 9 days

push eventhadley/ggplot2-book

Zhuoer Dong

commit sha 384dbd9a1fe36084b3fa281db883b8f65d77fbd7

class column of mpg is already shown in "Getting starded > Introduction" (#174)

push time in 9 days
