mayer79/missRanger 41

R package "missRanger" for fast imputation of missing values by random forests.

mayer79/flashlight 14

Machine learning explanations

mayer79/MetricsWeighted 7

R package for weighted model metrics

mayer79/ml_lecture 7

Intro to ML

mayer79/outForest 7

Outlier detection based on random forest models

mayer79/Bootstrap-p-values 6

Jupyter notebook showing how to get bootstrap p-values in Python in the two-sample t-test setting

mayer79/data_preparation_r 5

base R vs. tidyverse vs. data.table vs. sqldf

mayer79/partialPlot 5

Partial dependence plots in R for xgboost, LightGBM, and ranger objects

mayer79/foodDetector 4

Deep learning in R on Windows...

issue comment microsoft/LightGBM

[New idea - Feature Request] an option to exclude features from sampling

I think it is a good idea. The fast random forest implementation ranger in R offers this, and I use it quite frequently in situations with many redundant columns and one important "golden feature".
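For illustration, a minimal R sketch of the ranger feature referred to above (its always.split.variables argument keeps chosen columns in every split candidate set; this shows the ranger side only, not a proposed LightGBM API):

library(ranger)

# Simulated data: one "golden feature" plus 20 redundant noise columns
set.seed(1)
n <- 1000
X <- data.frame(golden = runif(n), matrix(runif(n * 20), ncol = 20))
X$y <- 2 * X$golden + rnorm(n, sd = 0.1)

# always.split.variables forces "golden" into the candidate set of every
# split, in addition to the mtry randomly sampled columns
fit <- ranger(y ~ ., data = X, mtry = 3, always.split.variables = "golden")
fit$r.squared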

TpTheGreat

comment created time in 3 hours

issue comment r-lib/devtools

`Error: callr subprocess failed: Vignette re-building failed.` in build_vignettes()

@jcubic: maybe you can try updating pandoc; sometimes this magically solves such issues.
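If helpful, the pandoc version that R actually picks up can be checked from within R (a quick diagnostic, not part of devtools itself):

# Which pandoc does R see? Compare before and after updating.
rmarkdown::pandoc_version()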

matthiasgomolka

comment created time in 4 days

push event mayer79/image_art

mayer79

commit sha f9e417fbf375cb018f0bfbec76a279ed7108b441

jpg instead of svg

view details

push time in 11 days

started mayer79/image_art

started time in 11 days

issue comment microsoft/LightGBM

Histogram algorithm behaviour not as expected

I think LightGBM uses 0 as a fixed bin boundary, so shifting x or taking its log can change the binning and hence the fit. I can't find a reference for this, so it would be good if one of the core developers could clarify.
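A small hypothetical experiment in R to probe this conjecture (a sketch only; the shift value, max_bin, and simulated data are arbitrary assumptions):

library(lightgbm)

# Fit the same 1-D regression on x and on a shifted copy of x.
# If 0 acted as a fixed bin boundary, the two fits could differ.
set.seed(1)
x <- runif(1000, -1, 1)
y <- sin(3 * x) + rnorm(1000, sd = 0.1)

fit_one <- function(feature) {
  dtrain <- lgb.Dataset(matrix(feature, ncol = 1), label = y,
                        params = list(max_bin = 16))
  lgb.train(params = list(objective = "regression", learning_rate = 0.1,
                          verbose = -1),
            data = dtrain, nrounds = 50)
}

m1 <- fit_one(x)
m2 <- fit_one(x + 100)  # identical data, shifted by a constant

# Any nonzero difference on matching rows would support the conjecture
p1 <- predict(m1, matrix(x, ncol = 1))
p2 <- predict(m2, matrix(x + 100, ncol = 1))
max(abs(p1 - p2))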

kapytaine

comment created time in 21 days

pull request comment microsoft/LightGBM

[R-package] added argument eval_train_metric to lgb.cv() (fixes #4911)

Thanks as always for the help @mayer79 !

Thanks go to you, sir :-)

mayer79

comment created time in 24 days

delete branch mayer79/LightGBM

delete branch : eval_train_metric

delete time in 24 days

push event mayer79/LightGBM

Michael Mayer

commit sha a82e0631733623477d4d8215d0d1259b69ec1bb6

Update R-package/tests/testthat/test_basic.R

Co-authored-by: James Lamb <jaylamb20@gmail.com>

view details

push time in 24 days

push event mayer79/LightGBM

mayer79

commit sha 987c51d344a29abfbe651098d602552bffcafe89

move new argument to the last position

view details

mayer79

commit sha 2b0524334e8fe178987ce8f2247305e9da1d85da

update R docu

view details

mayer79

commit sha 21dc2d776ae42517679474afeb709e4b2c7b6040

unit tests for eval_train_metric

view details

push time in 24 days

Pull request review comment microsoft/LightGBM

[R-package] added argument eval_train_metric to lgb.cv()

 lgb.cv <- function(params = list()
                    , record = TRUE
                    , eval_freq = 1L
                    , showsd = TRUE
+                   , eval_train_metric = FALSE

Okay.

mayer79

comment created time in 24 days


push event mayer79/LightGBM

mayer79

commit sha 1c156639e016dac975b7cad85f098ba0408adff3

removed further trailing whitespace

view details

push time in 25 days

push event mayer79/LightGBM

mayer79

commit sha 86d90777d79d34f2de9181caec08d6ce45046d0b

remove unnecessary whitespace

view details

push time in 25 days

PR opened microsoft/LightGBM

added argument eval_train_metric

Attempt to solve https://github.com/microsoft/LightGBM/issues/4911

If we set eval_train_metric = TRUE in lgb.cv(), then we get output like

[1] "[50]: train's binary_logloss:0.248276+0.00109575 valid's binary_logloss:0.248655+0.00448549"

The resulting object cvm contains the information in its record_evals slot, and the average training performance of the best round can be found via cvm$record_evals$train$binary_logloss$eval[[cvm$best_iter]].

There is no unit test yet, but that does not mean we shouldn't write one.
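For reference, intended usage would look like this (a sketch; the eval_train_metric argument is the one proposed in this PR, the rest is the standard lgb.cv() interface):

library(lightgbm)

data(agaricus.train, package = "lightgbm")
dtrain <- lgb.Dataset(agaricus.train$data, label = agaricus.train$label)

cvm <- lgb.cv(
  params = list(objective = "binary", metric = "binary_logloss"),
  data = dtrain,
  nrounds = 50L,
  nfold = 5L,
  early_stopping_rounds = 5L,
  eval_train_metric = TRUE  # the new argument proposed here
)

# Average training logloss at the best CV round
cvm$record_evals$train$binary_logloss$eval[[cvm$best_iter]]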

+12 -4

0 comment

1 changed file

pr created time in 25 days

push event mayer79/LightGBM

mayer79

commit sha 4272cecf5ed5e379143645c5fcc65c0ff702bed9

added argument eval_train_metric

view details

push time in 25 days

create branch mayer79/LightGBM

branch : eval_train_metric

created branch time in 25 days

fork mayer79/LightGBM

A fast, distributed, high performance gradient boosting (GBT, GBDT, GBRT, GBM or MART) framework based on decision tree algorithms, used for ranking, classification and many other machine learning tasks.

https://lightgbm.readthedocs.io/en/latest/

fork in 25 days

push event mayer79/ml_lecture

mayer79

commit sha 64074f4a28b55ac2f99c7ed37cf3dde4369e5a4c

slight improvements in py code

view details

push time in a month

push event mayer79/ml_lecture

mayer79

commit sha 0aebf69953f7ed8c4ba66b6a107b603f4d9b3573

py dalex update with facet_scales="free"

view details

push time in a month

push event mayer79/ml_lecture

mayer79

commit sha 71291c39e0990206de37791335f87381b38f6c11

be less specific about what the "best" model is

view details

push time in a month

push event mayer79/ml_lecture

mayer79

commit sha cc85ff7dacf6f3a2fd601e56e59c7627e7b4d403

added training performance to XGB/LGB grid search

view details

push time in a month

issue opened microsoft/LightGBM

[R] add flag of displaying train loss for lgb.cv()

In https://github.com/microsoft/LightGBM/issues/1105, it was suggested to add the option eval_train_metric to lgb.cv() in order to see not only CV scores but also training scores. This is very useful to monitor overfitting.

In Python, it was implemented in https://github.com/microsoft/LightGBM/pull/2089.

I would love to see something similar for R.

created time in a month

push event mayer79/image_art

Michael Mayer

commit sha ba64505110766f19fa51be7c60eaa332b463966e

added second img

view details

push time in a month

push event lorentzenchr/notebooks

Michael Mayer

commit sha 877699fe689500943d66e70cb86775e55a4aca29

Add files via upload

view details

push time in a month

started mayer79/xmas_tree_r

started time in a month

issue closed mayer79/missRanger

How to save the trained Random Forest model and use it to impute new data set?

Hi, I would appreciate it if you could explain how to use missRanger to train a random forest model on a training data set and then use it to impute a test data set. Thank you very much!

closed time in a month

DrRoad

issue comment mayer79/missRanger

How to save the trained Random Forest model and use it to impute new data set?

It is currently not possible. The reason is simple: there is no obvious way to use random forests for imputation in test/production.
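A pragmatic workaround (a sketch, not a missRanger feature): stack the old and new data, impute them jointly, then split again.

library(missRanger)

# Simulate a train/test split with missing values
set.seed(1)
train <- generateNA(iris[1:100, ], p = 0.1)
test  <- generateNA(iris[101:150, ], p = 0.1)

# Impute both parts together, then separate again
combined  <- rbind(train, test)
imputed   <- missRanger(combined, num.trees = 100, verbose = 0)
train_imp <- imputed[seq_len(nrow(train)), ]
test_imp  <- imputed[nrow(train) + seq_len(nrow(test)), ]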

DrRoad

comment created time in a month

push event mayer79/xmastreer

Michael Mayer

commit sha b1ae264251fa98dadba574b183aa02f835ef7907

Update README.md

view details

push time in a month

push event mayer79/xmastreer

Michael Mayer

commit sha 6062f91a6bcd49dcd3d0a4be184a3cdd50a33bf6

Update README.md

view details

push time in a month

push event mayer79/xmastreer

Michael Mayer

commit sha de80539b5469fd88e42eee41871aec0176de3eb3

Update README.md

view details

push time in a month
