Sam Firke (sfirke)
City of Ann Arbor, Ann Arbor, MI
samfirke.com
Data scientist, caring human. Currently focused on municipal data analysis and business intelligence.

sfirke/janitor 1034

simple tools for data cleaning in R

sfirke/packagemetrics 128

A Package for Helping You Choose Which Package to Use

sfirke/predicting-march-madness 26

Machine learning tutorial to create an entry for the Kaggle March Mania contest

tntp/surveymonkey 18

Access your SurveyMonkey data directly from R!

seanofahey/Rmonkey 2

A Survey Monkey R Client

sfirke/annarborvotes 2

Search Twitter for firstname/lastname combinations from voter rolls, then tweet reminders to vote at those who match name & location.

sfirke/a2_police_calls 1

Using recent police calls to demonstrate how the tabyl() function works

sfirke/surveymonkey_getresponses 1

Get survey lists and responses from SurveyMonkey

allhandsactive/site-test 0

AHA Test Site

issue comment sfirke/janitor

suggestion: option to silence warnings in row_to_names

Fascinating, I had no idea about this concept. So we'd give the warning a class, and then a user could call suppressWarnings() in such a way that it would only apply to that class, which makes it safer, since they won't risk inadvertently suppressing a separate, potentially important warning? Sounds good to me.

mgacc0

comment created time in 5 hours
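
The idea in this comment can be sketched in base R. This is a minimal illustration, not janitor's implementation: the function noisy_rename(), the helper suppress_class(), and the class name "janitor_warn_example" are all invented for the example. In R 3.6.0 and later, suppressWarnings(expr, classes = ...) offers the same kind of class-based filtering directly.

```r
# Hypothetical stand-in for a function that signals a classed warning.
# The class name "janitor_warn_example" is invented for this sketch.
noisy_rename <- function(x) {
  warning(warningCondition(
    "Row 1 does not provide unique names.",
    class = "janitor_warn_example"
  ))
  toupper(x)
}

# Muffle only warnings that inherit from `class`; all others still propagate.
suppress_class <- function(expr, class) {
  withCallingHandlers(
    expr,
    warning = function(w) {
      if (inherits(w, class)) invokeRestart("muffleWarning")
    }
  )
}

# The classed warning is silenced and the value is still returned.
result <- suppress_class(noisy_rename("a"), "janitor_warn_example")

# An unrelated warning (NA introduced by coercion) is not silenced.
other <- tryCatch(
  suppress_class(as.integer("x"), "janitor_warn_example"),
  warning = function(w) "still warned"
)
```

Because the handler only muffles conditions of the named class, a user who silences this one warning does not also hide unrelated warnings raised inside the same call.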

issue comment sfirke/janitor

feature suggestion: parameter to specify decimal separator on adorn_pct_formatting

Yes, that sounds like a good idea. This may be a blind spot for me as an American - is this for countries where what I know as 99.1% would be written 99,1% ?

mgacc0

comment created time in 17 hours
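
For illustration only, here is how a caller-chosen decimal separator could look in base R. The helper format_pct() and its arguments are hypothetical and are not part of janitor; the sketch relies on the decimal.mark argument of formatC().

```r
# Hypothetical helper, not part of janitor: format proportions as
# percentages with a caller-chosen decimal mark.
format_pct <- function(x, digits = 1, decimal.mark = ".") {
  paste0(
    formatC(x * 100, format = "f", digits = digits, decimal.mark = decimal.mark),
    "%"
  )
}

us_style <- format_pct(0.991)                      # "99.1%"
eu_style <- format_pct(0.991, decimal.mark = ",")  # "99,1%"
```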

pull request comment sfirke/janitor

Add a find_header() function to help with row_to_names()

I fixed a conflict in NEWS introduced by edits I'd made on the main branch in the meantime, so this is again merge-able.

billdenney

comment created time in 5 days

push event billdenney/janitor

Bill Denney

commit sha e7540d6835b0ab7643ebdccf1b4d4cd6395b669d

Warn if mu/micro is in make_clean_names() input but not handled by `replace` (#449)

Sam Firke

commit sha e22e8a58fbce0c4935fcd9393e57476443674a37

address OSX CI failure due to suggested pkg

Sam Firke

commit sha ff173db0eb9d6eaa0c832a6f2179aa50e5809c33

avoid name collision when input variable to tabyl is "n" (#450)

Sam Firke

commit sha 9c0d3cda3509cb308653feae865d8fed303580f6

note yesterday's bug patch and tweak some old items

Sam Firke

commit sha 6cd6b8ea1b03990402b048de336902a043692193

Try to stop false CI failure due to sf pkg not available for mac

Sam Firke

commit sha f54950cd1299f6e756e48661f6f9f35b3a3fdd26

Merge branch 'main' into fix-429

push time in 5 days

started moodymudskipper/boomer

started time in 6 days

push event sfirke/janitor

Sam Firke

commit sha 6cd6b8ea1b03990402b048de336902a043692193

Try to stop false CI failure due to sf pkg not available for mac

push time in 6 days

push event sfirke/janitor

Sam Firke

commit sha 9c0d3cda3509cb308653feae865d8fed303580f6

note yesterday's bug patch and tweak some old items

push time in 6 days

delete branch sfirke/janitor

delete branch : fix_445

delete time in 7 days

push event sfirke/janitor

Sam Firke

commit sha ff173db0eb9d6eaa0c832a6f2179aa50e5809c33

avoid name collision when input variable to tabyl is "n" (#450)

push time in 7 days

PR merged sfirke/janitor

avoid name collision when input variable to tabyl is "n"

fixes #445

+2 -2

1 comment

1 changed file

sfirke

pr closed time in 7 days

issue closed sfirke/janitor

Two-way tabyl where the column 'n' is being spread yields weird result

Good

> mtcars %>% group_by(cyl, am) %>% summarise(x = n()) %>% tabyl(cyl, x)
`summarise()` has grouped output by 'cyl'. You can override using the `.groups` argument.
 cyl 12 2 3 4 8
   4  0 0 1 0 1
   6  0 0 1 1 0
   8  1 1 0 0 0

Same thing but with different column name, bad result contains a new meaningless column:

> mtcars %>% group_by(cyl, am) %>% summarise(n = n()) %>% tabyl(cyl, n)
`summarise()` has grouped output by 'cyl'. You can override using the `.groups` argument.
Storing counts in `nn`, as `n` already present in input
i Use `name = "new_name"` to pick a new name.
 cyl nn 12 2 3 4 8
   4  1  0 0 3 0 8
   4 NA 12 2 0 4 0
   6  1  0 0 3 4 0
   6 NA 12 2 0 0 8
   8  1 12 2 0 0 0
   8 NA  0 0 3 4 8

I thought the n -> nn renaming was an artifact of one-way tabyl defensiveness, but no, that behavior turns into n_n and does not print a warning:

> mtcars %>% group_by(cyl, am) %>% summarise(n = n()) %>% tabyl(n)
`summarise()` has grouped output by 'cyl'. You can override using the `.groups` argument.
  n n_n   percent
  2   1 0.1666667
  3   2 0.3333333
  4   1 0.1666667
  8   1 0.1666667
 12   1 0.1666667

closed time in 7 days

sfirke

PR opened sfirke/janitor

avoid name collision when input variable to tabyl is "n"

fixes #445

+2 -2

0 comments

1 changed file

pr created time in 7 days

push event sfirke/janitor

Sam Firke

commit sha e9dae1e6114e21451fdd5e1dbd04278f9c8b40b9

avoid name collision when input variable to tabyl is "n"

push time in 7 days

issue opened tidyverse/dplyr

Inaccurate documentation for `name` argument of `count()`

The documentation for count() says of the argument name:

If omitted, it will default to n. If there's already a column called n, it will error, and require you to specify the name.

But when I try calling count on a variable named n, that's not what happens - it succeeds with a warning:

mtcars %>% group_by(cyl, am) %>% summarise(n = n(), .groups = "drop") %>% count(cyl, n)
Storing counts in `nn`, as `n` already present in input
i Use `name = "new_name"` to pick a new name.
# A tibble: 6 x 3
    cyl     n    nn
  <dbl> <int> <int>
1     4     3     1
2     4     8     1
3     6     3     1
4     6     4     1
5     8     2     1
6     8    12     1

This is trivial, I think, and I like the current behavior of count() warning instead of erroring - I'm just noting the documentation mismatch.

created time in 7 days

create branch sfirke/janitor

branch : fix_445

created branch time in 7 days

Pull request review comment sfirke/janitor

Add a find_header() function to help with row_to_names()

 test_that("row_to_names works on matrices (Fix #320)", {
     matrix(c("B", "D"), nrow=1, dimnames=list(NULL, c("A", "C")))
   )
 })
+
+test_that("find_header works", {
+  no_complete <-
+    data.frame(
+      A=NA_character_,
+      stringsAsFactors=FALSE
+    )
+  expect_error(
+    find_header(no_complete, "A", "B"),
+    regexp="Either zero or one arguments other than 'dat' may be provided.",
+    fixed=TRUE
+  )
+  expect_error(
+    find_header(no_complete),
+    regexp="No complete rows (rows with zero NA values) were found.",
+    fixed=TRUE
+  )
+  all_partial <-
+    data.frame(
+      A=c(NA_character_, "A"),
+      B=c("B", NA_character_),
+      stringsAsFactors=FALSE
+    )
+  expect_error(
+    find_header(all_partial),
+    regexp="No complete rows (rows with zero NA values) were found.",
+    fixed=TRUE
+  )
+  single_complete <-
+    data.frame(
+      A=c(NA_character_, "A"),
+      B=c("B", "B"),
+      stringsAsFactors=FALSE
+    )
+  expect_equal(find_header(single_complete), 2)
+  expect_equal(find_header(single_complete, "A"), 2)
+  expect_error(
+    find_header(single_complete, "C"),
+    regexp="The string 'C' was not found in column 1",
+    fixed=TRUE
+  )
+  expect_equal(
+    expect_warning(
+      find_header(single_complete, "B"=2),
+      regexp="The string 'B' was found 2 times in column 2, using the first row where it was found"
+    ),
+    1
+  )
+  multiple_complete <-
+    data.frame(
+      A=c("A", "A"),
+      B=c("B", "B"),
+      stringsAsFactors=FALSE
+    )
+  expect_equal(find_header(multiple_complete), 1)
+})
+
+test_that("find_header works within row_to_names", {
+  single_complete <-
+    data.frame(
+      A=c(NA_character_, "C"),
+      B=c("D", "D"),
+      stringsAsFactors=FALSE
+    )
+  expect_equal(
+    row_to_names(dat=single_complete, row_number="find_header"),
+    data.frame(C=NA_character_, D=NA_character_, stringsAsFactors=FALSE)[-1,]
+  )
+
+  find_correct <-
+    data.frame(
+      A=c(NA_character_, "C", "D", "E"),
+      B=c("D", "D", "E", "F"),
+      stringsAsFactors=FALSE
+    )
+  expect_equal(
+    row_to_names(dat=find_correct, row_number="find_header"),
+    setNames(find_correct[3:nrow(find_correct),], c("C", "D"))
+  )
+  expect_equal(
+    row_to_names(dat=find_correct, row_number="find_header", "D"),
+    setNames(find_correct[4:nrow(find_correct),], c("D", "E"))
+  )
+  expect_equal(
+    row_to_names(dat=find_correct, row_number="find_header", "E"=2),
+    setNames(find_correct[4:nrow(find_correct),], c("D", "E"))
+  )
+})

I reviewed all of these tests; they look good.

billdenney

comment created time in 7 days

Pull request review comment sfirke/janitor

Add a find_header() function to help with row_to_names()

 #' Elevate a row to be the column names of a data.frame.
 #'
 #' @param dat The input data.frame
-#' @param row_number The row of \code{dat} containing the variable names
-#' @param remove_row Should the row \code{row_number} be removed from the resulting data.frame?
-#' @param remove_rows_above If \code{row_number != 1}, should the rows above \code{row_number} - that is, between
-#'   \code{1:(row_number-1)} - be removed from the resulting data.frame?
+#' @param row_number The row of \code{dat} containing the variable names or the
+#'   string \code{"find_header"} to use \code{find_header(dat=dat, ...)} to find
+#'   the row_number.
+#' @param ... Sent to \code{find_header()}, if
+#'   \code{row_number = "find_header"}.  Otherwise, ignored.
+#' @param remove_row Should the row \code{row_number} be removed from the
+#'   resulting data.frame?
+#' @param remove_rows_above If \code{row_number != 1}, should the rows above
+#'   \code{row_number} - that is, between \code{1:(row_number-1)} - be removed
+#'   from the resulting data.frame?
 #' @return A data.frame with new names (and some rows removed, if specified)
+#' @family Set names
 #' @examples
 #' x <- data.frame(X_1 = c(NA, "Title", 1:3),
 #'                 X_2 = c(NA, "Title2", 4:6))
 #' x %>%
 #'   row_to_names(row_number = 2)
+#'
+#' x %>%
+#'   row_to_names(row_number = "find_header")
 #' @export
-row_to_names <- function(dat, row_number, remove_row = TRUE, remove_rows_above = TRUE) {
+row_to_names <- function(dat, row_number, ..., remove_row = TRUE, remove_rows_above = TRUE) {
   # Check inputs
-  if (length(row_number) != 1 | !is.numeric(row_number)) {
-    stop("row_number must be a numeric of length 1")
+  if (!is.logical(remove_row) & length(remove_row) == 1) {
+    stop("remove_row must be a logical scalar, not ", as.character(remove_row))
+  } else if (!is.logical(remove_rows_above) & length(remove_rows_above) == 1) {
+    stop("remove_rows_above must be a logical scalar, not ", as.character(remove_rows_above))

Same as my first comment above about making it if (!(...: do we want to fail if either condition isn't met? If so, wrap them in parentheses.

billdenney

comment created time in 7 days

Pull request review comment sfirke/janitor

Add a find_header() function to help with row_to_names()

 #' Elevate a row to be the column names of a data.frame.
 #'
 #' @param dat The input data.frame
-#' @param row_number The row of \code{dat} containing the variable names
-#' @param remove_row Should the row \code{row_number} be removed from the resulting data.frame?
-#' @param remove_rows_above If \code{row_number != 1}, should the rows above \code{row_number} - that is, between
-#'   \code{1:(row_number-1)} - be removed from the resulting data.frame?
+#' @param row_number The row of \code{dat} containing the variable names or the
+#'   string \code{"find_header"} to use \code{find_header(dat=dat, ...)} to find
+#'   the row_number.
+#' @param ... Sent to \code{find_header()}, if
+#'   \code{row_number = "find_header"}.  Otherwise, ignored.
+#' @param remove_row Should the row \code{row_number} be removed from the
+#'   resulting data.frame?
+#' @param remove_rows_above If \code{row_number != 1}, should the rows above
+#'   \code{row_number} - that is, between \code{1:(row_number-1)} - be removed
+#'   from the resulting data.frame?
 #' @return A data.frame with new names (and some rows removed, if specified)
+#' @family Set names
 #' @examples
 #' x <- data.frame(X_1 = c(NA, "Title", 1:3),
 #'                 X_2 = c(NA, "Title2", 4:6))
 #' x %>%
 #'   row_to_names(row_number = 2)
+#'
+#' x %>%
+#'   row_to_names(row_number = "find_header")
 #' @export
-row_to_names <- function(dat, row_number, remove_row = TRUE, remove_rows_above = TRUE) {
+row_to_names <- function(dat, row_number, ..., remove_row = TRUE, remove_rows_above = TRUE) {
   # Check inputs
-  if (length(row_number) != 1 | !is.numeric(row_number)) {
-    stop("row_number must be a numeric of length 1")
+  if (!is.logical(remove_row) & length(remove_row) == 1) {

Should this be if (!(is.logical(remove_row) & length(remove_row) == 1)) { ?

billdenney

comment created time in 7 days
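
The question raised in this review comment comes down to operator precedence: in R, ! binds more tightly than &, so the condition as written is parsed as (!is.logical(x)) & (length(x) == 1). The sketch below compares the two forms on a few inputs; the function names check_as_written() and check_proposed() are made up for the illustration.

```r
# Condition as written in the diff: parsed as (!is.logical(x)) & (length(x) == 1).
check_as_written <- function(x) !is.logical(x) & length(x) == 1

# Condition as proposed in the review comment: negate the whole conjunction.
check_proposed <- function(x) !(is.logical(x) & length(x) == 1)

# A valid input, a wrong type, and a logical vector of the wrong length.
inputs <- list(TRUE, "yes", c(TRUE, FALSE))

# TRUE here means "this input would trigger stop()".
as_written <- vapply(inputs, check_as_written, logical(1))
proposed   <- vapply(inputs, check_proposed, logical(1))
```

Both forms accept TRUE and reject "yes", but only the parenthesized form rejects c(TRUE, FALSE): the check as written lets a length-2 logical vector through.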

Pull request review comment sfirke/janitor

Add a find_header() function to help with row_to_names()

 #' Elevate a row to be the column names of a data.frame.
 #'
 #' @param dat The input data.frame
-#' @param row_number The row of \code{dat} containing the variable names
-#' @param remove_row Should the row \code{row_number} be removed from the resulting data.frame?
-#' @param remove_rows_above If \code{row_number != 1}, should the rows above \code{row_number} - that is, between
-#'   \code{1:(row_number-1)} - be removed from the resulting data.frame?
+#' @param row_number The row of \code{dat} containing the variable names or the
+#'   string \code{"find_header"} to use \code{find_header(dat=dat, ...)} to find
+#'   the row_number.
+#' @param ... Sent to \code{find_header()}, if
+#'   \code{row_number = "find_header"}.  Otherwise, ignored.
+#' @param remove_row Should the row \code{row_number} be removed from the
+#'   resulting data.frame?
+#' @param remove_rows_above If \code{row_number != 1}, should the rows above
+#'   \code{row_number} - that is, between \code{1:(row_number-1)} - be removed
+#'   from the resulting data.frame?
 #' @return A data.frame with new names (and some rows removed, if specified)
+#' @family Set names
 #' @examples
 #' x <- data.frame(X_1 = c(NA, "Title", 1:3),
 #'                 X_2 = c(NA, "Title2", 4:6))
 #' x %>%
 #'   row_to_names(row_number = 2)
+#'
+#' x %>%
+#'   row_to_names(row_number = "find_header")
 #' @export
-row_to_names <- function(dat, row_number, remove_row = TRUE, remove_rows_above = TRUE) {
+row_to_names <- function(dat, row_number, ..., remove_row = TRUE, remove_rows_above = TRUE) {
   # Check inputs
-  if (length(row_number) != 1 | !is.numeric(row_number)) {
-    stop("row_number must be a numeric of length 1")
+  if (!is.logical(remove_row) & length(remove_row) == 1) {
+    stop("remove_row must be a logical scalar, not ", as.character(remove_row))

Would it be clearer to people who don't know what a scalar is to say "must be either TRUE or FALSE"? This also applies to the subsequent uses of "logical scalar".

billdenney

comment created time in 7 days

Pull request review comment sfirke/janitor

Add a find_header() function to help with row_to_names()

 row_to_names <- function(dat, row_number, remove_row = TRUE, remove_rows_above =
     dat
   }
 }
+
+#' Find the header row in a data.frame
+#'
+#' @details
+#' If \code{...} is missing, then the first row with no missing values is used.
+#' If \code{...} has a single character argument, then the first column is
+#' searched for that value.  If \code{...} has a named numeric argument, then
+#' the value of the argument is searched for the name (see the examples).  If
+#' more than one row is found matching a value that is searched for, the first
+#' matching row will be returned (with a warning).
+#'

Maybe we note that when searching for a string, the complete row requirement no longer applies. We could add in a new short paragraph here:

When searching for a specified column name, the first row with a match will be returned, regardless of the completeness of the rest of that row.

Is that an improvement, or just more wordy and confusing? It might be able to replace the last sentence in the preceding paragraph about what happens with multiple matches.

billdenney

comment created time in 7 days

Pull request review comment sfirke/janitor

Add a find_header() function to help with row_to_names()

 row_to_names <- function(dat, row_number, remove_row = TRUE, remove_rows_above =
     dat
   }
 }
+
+#' Find the header row in a data.frame
+#'
+#' @details
+#' If \code{...} is missing, then the first row with no missing values is used.
+#' If \code{...} has a single character argument, then the first column is
+#' searched for that value.  If \code{...} has a named numeric argument, then
+#' the value of the argument is searched for the name (see the examples).  If
+#' more than one row is found matching a value that is searched for, the first
+#' matching row will be returned (with a warning).
+#'
+#' @inheritParams row_to_names
+#' @param ... See details
+#' @return The row number for the header row
+#' @family Set names
+#' @examples
+#' # the first row
+#' find_header(data.frame(A="B"))
+#' # the second row
+#' find_header(data.frame(A=c(NA, "B")))
+#' # the second row since the first has an empty value
+#' find_header(data.frame(A=c(NA, "B"), B=c("C", "D")))
+#' # The second row because the second column was searched for the text "D"
+#' find_header(data.frame(A=c(NA, "B", "C", "D"), B=c("C", "D", "E", "F")), "D"=2)

Would this find the second row anyway because of the NA value in the first? In which case maybe make it "E" = 2

billdenney

comment created time in 7 days
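
The point of this comment can be checked with base R alone, without calling find_header() itself. The sketch below approximates the two documented lookups (first complete row, and first match of a string in a given column) using complete.cases() and match(); it is an approximation of the documented behavior, not the function's actual code.

```r
# The data frame from the roxygen example under discussion.
dat <- data.frame(
  A = c(NA, "B", "C", "D"),
  B = c("C", "D", "E", "F"),
  stringsAsFactors = FALSE
)

# Default lookup: first row with no missing values.
first_complete_row <- which(stats::complete.cases(dat))[1]

# String lookup in column 2: first row where the value appears.
row_of_D <- match("D", dat[[2]])
row_of_E <- match("E", dat[[2]])
```

Searching column 2 for "D" lands on row 2, the same row the default lookup already returns, so the example does not show the search doing anything. Searching for "E" lands on row 3, which does distinguish the two modes.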

Pull request review comment sfirke/janitor

Add a find_header() function to help with row_to_names()

 row_to_names <- function(dat, row_number, remove_row = TRUE, remove_rows_above =
     dat
   }
 }
+
+#' Find the header row in a data.frame

This function looks great 👍

billdenney

comment created time in 7 days

push event sfirke/janitor

Sam Firke

commit sha e22e8a58fbce0c4935fcd9393e57476443674a37

address OSX CI failure due to suggested pkg

push time in 11 days

issue opened tidyverse/tidyr

replace_na does not warn or error when called on non-existent column

Continuing with the reprex from #356:

library(tidyr)
df <- tibble::tibble(a = c(1, NA))
replace_na(df, list(a = 100, b = 0))

This returns df as-is, with no warning. I would prefer it to warn that I called on a non-existent column, as is the case with other tidyverse functions. Erroring would be fine too.

Context: I called replace_na and mis-named my variable, then was surprised to find the NAs were still there.

created time in 13 days
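
Until such a warning exists, a caller could guard against the mistyped-name case with a small pre-check. The helper below, warn_unknown_columns(), is hypothetical and not part of tidyr; it only compares the names in the replacement list against the columns of the data frame.

```r
# Hypothetical pre-check, not a tidyr function: warn about names in `replace`
# that do not match any column of `data`, and return them invisibly.
warn_unknown_columns <- function(data, replace) {
  unknown <- setdiff(names(replace), names(data))
  if (length(unknown) > 0) {
    warning(
      "Columns not found in data: ", paste(unknown, collapse = ", "),
      call. = FALSE
    )
  }
  invisible(unknown)
}

df <- data.frame(a = c(1, NA))

# Same replacement list as in the reprex; "b" is not a column of df.
unknown <- suppressWarnings(warn_unknown_columns(df, list(a = 100, b = 0)))
```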

push event sfirke/janitor

Bill Denney

commit sha e7540d6835b0ab7643ebdccf1b4d4cd6395b669d

Warn if mu/micro is in make_clean_names() input but not handled by `replace` (#449)

push time in 13 days

PR merged sfirke/janitor

Warn if mu/micro is in make_clean_names() input but not handled by `replace`

Fix #448

This issues a warning when mu or micro is in the input and may need to be handled. It will generate some false positives, but the false positives seem worth the potential issues caused by the false negatives. (And I think that the code needed to prevent false positives would likely be significant.)

+188 -28

5 comments

3 changed files

billdenney

pr closed time in 13 days