Epstein Ben-Epstein Splice Machine Washington, DC https://www.linkedin.com/in/ben-epstein-2317a610a Machine Learning Engineer at Splice Machine passionate about Machine Learning accessibility | WashU Stl 2019 | https://www.linkedin.com/in/ben-epstein/

gilakos/astra-money-survey 1

nano web app to visualize our money survey

Ben-Epstein/api 0

Coinigy API Examples

Ben-Epstein/beakerx 0

Beaker Extensions for Jupyter Notebook

Ben-Epstein/beakerx-feedstock 0

A conda-smithy repository for beakerx.

Ben-Epstein/mlflow 0

Open source platform for the machine learning lifecycle

Ben-Epstein/pysplicec 0

Creating a SpliceMachineContext using Python and the JVM

create branch splicemachine/pysplice

branch : DBAAS-4100

created branch time in 6 days

delete branch splicemachine/pysplice

delete branch : DBAAS-3992

delete time in 6 days

delete branch splicemachine/pysplice

delete branch : DBAAS-4021

delete time in 6 days

delete branch splicemachine/pysplice

delete branch : DBAAS-4099

delete time in 6 days

push event splicemachine/pysplice

Epstein

commit sha acd10a7bc714b53a5e70e2e0737f19d23c073592

Dbaas 4099 (#68) * file name extension * whitespace

view details

push time in 7 days

PR merged splicemachine/pysplice

Reviewers
Dbaas 4099

Fix artifact file extension name and added iframe for mlflow UI

+7 -3

0 comment

2 changed files

Ben-Epstein

pr closed time in 7 days

push event splicemachine/pysplice

Epstein

commit sha 2cc2716d193d7d37ea9379f4e3ae64f83bed9c74

function for getting the spark UI (#67)

view details

Epstein

commit sha 0737228bb335850e0dbb87255a2d1ea8141fdb2e

Merge branch 'master' into DBAAS-4099

view details

push time in 7 days

PR opened splicemachine/pysplice

Reviewers
Dbaas 4099
+8 -4

0 comment

2 changed files

pr created time in 7 days

delete branch splicemachine/pysplice

delete branch : bump

delete time in 7 days

delete branch splicemachine/pysplice

delete branch : NSDS_PATCH

delete time in 7 days

delete branch splicemachine/pysplice

delete branch : eNSDS-3.1.0.1951

delete time in 7 days

delete branch splicemachine/pysplice

delete branch : DBAAS-3682

delete time in 7 days

push event splicemachine/pysplice

Epstein

commit sha 2cc2716d193d7d37ea9379f4e3ae64f83bed9c74

function for getting the spark UI (#67)

view details

push time in 7 days

push event splicemachine/pysplice

Ben Epstein

commit sha 71fada50a1d4470e6d796fe797757a2af0d5dc71

whitespace

view details

push time in 8 days

issue comment krishnan-r/sparkmonitor

Does this support multiple spark notebooks ?

Hello, I've modified SparkMonitor to work with multiple Spark sessions here: https://github.com/ben-epstein/sparkmonitor

@krishnan-r If you'd like to merge it into your repo just let me know.

For anyone interested in using it, you can install it with

pip install sparkmonitor-s==0.0.13
jupyter nbextension install sparkmonitor --py --user --symlink 
jupyter nbextension enable sparkmonitor --py --user            
jupyter serverextension enable --py --user sparkmonitor
ipython profile create && echo "c.InteractiveShellApp.extensions.append('sparkmonitor.kernelextension')" >>  $(ipython profile locate default)/ipython_kernel_config.py

If you've already installed the original sparkmonitor, you'll have to remove it, along with its Jupyter extension (which I'm not actually sure how to do...). These changes only take effect once the old extension is fully removed and this one is enabled. If you just want to test it, you can clone the repo and run

docker build -t sparkmonitor .
docker run -it -p 8888:8888 sparkmonitor
AbdealiJK

comment created time in 8 days

PR opened splicemachine/pysplice

Reviewers
function for getting the spark UI
+18 -1

0 comment

1 changed file

pr created time in 8 days

push event Ben-Epstein/sparkmonitor

Epstein

commit sha 97e41028f34da69cef4372df8e838ac435c6f58f

Update README.md

view details

push time in 8 days

push event Ben-Epstein/sparkmonitor

Ben Epstein

commit sha 35e112898dc648d97f82303159633e593a79e4b7

multi spark session working

view details

Ben Epstein

commit sha 36f6af1a101aeb49b3ab2bb7da5529cf44347a07

multi spark session working

view details

Ben Epstein

commit sha e1af231c4ab969a9b1c5afb2dce8a979981e1ddb

node modules

view details

Ben Epstein

commit sha 995418522330d3ca3a8c93bc4379226531e5d681

cleanup

view details

Ben Epstein

commit sha 1a9a03716e75be3cf044d7185d584c4b61568195

bump version'

view details

Epstein

commit sha 61f49373d53f14ae987ee56484b257ed2705adac

Merge pull request #1 from Ben-Epstein/working_multi_spark_session Working multi spark session

view details

push time in 8 days

push event Ben-Epstein/sparkmonitor

Ben Epstein

commit sha 1a9a03716e75be3cf044d7185d584c4b61568195

bump version'

view details

push time in 8 days

create branch splicemachine/pysplice

branch : DBAAS-4099

created branch time in 8 days

create branch splicemachine/pysplice

branch : DBAAS-3682

created branch time in 10 days

push event Ben-Epstein/sparkmonitor

Ben Epstein

commit sha 995418522330d3ca3a8c93bc4379226531e5d681

cleanup

view details

push time in 12 days

push event Ben-Epstein/sparkmonitor

Ben Epstein

commit sha e1af231c4ab969a9b1c5afb2dce8a979981e1ddb

node modules

view details

push time in 12 days

create branch Ben-Epstein/sparkmonitor

branch : working_multi_spark_session

created branch time in 12 days

issue comment twosigma/beakerx

Heatmap/Data bars Don't work on tables from SQL

@jaroslawmalekcodete Hello, thanks for checking in. If you go to the binder and run the SQL notebook under Languages, the 3rd cell is a select cell. If you click the dropdown for the numeric columns and choose heatmap or databars, neither works as expected. The same features work fine with Pandas display tables. I've seen the same behavior on beakerx 1.2.0, 1.3.0 and 1.5.0 (haven't tested 1.4.0). Thanks!

Ben-Epstein

comment created time in 13 days

push event splicemachine/pysplice

jpanko1

commit sha 148731d54855f91f592a7093a388990b2bc22faf

DB-8674 Sync with Scala version, and support NSDS v2 (#54) * Changes for eNSDS. * Merge master. * DB-8627 Changed _generateDBSchema to expect types dictionary to have column name keys instead of dataType keys. * Added PySpliceContext2 to init file. * Synced api with scala version. * DB-8674 Changed PySpliceContext2 to ExtPySpliceContext * DB-8674 Changed to single function each for tableExists and dropTable. * DB-9687 Updated createTable function. * DB-9687 Updated createTable functions and _dropTableIfExists. Co-authored-by: Epstein <bepstein@splicemachine.com> Co-authored-by: Ben Epstein <ben.epstein97@gmail.com>

view details

push time in 15 days

PR merged splicemachine/pysplice

Reviewers
DB-8674 Sync with Scala version, and support NSDS v2

DO NOT MERGE until the ENSDS is available in the DB version that the cloud is on (currently 3.0.0.1958)

+272 -30

0 comment

2 changed files

jpanko1

pr closed time in 15 days

Pull request review comment splicemachine/pysplice

DB-8674 Sync with Scala version, and support NSDS v2

 def createTable(self, schema_table_name, dataframe, keys=None, create_table_opti
         """
         if to_upper:
             dataframe = self.toUpper(dataframe)
-        # Need to convert List (keys) to scala seq
-        keys_seq = self.jvm.PythonUtils.toSeq(keys)
-        self.context.createTable(schema_table_name, dataframe._jdf.schema(), keys_seq, create_table_options)
+        self.context.createTableWithSchema(schema_table_name, df.schema, keys=keys, create_table_options=create_table_options)
+
+    def createTableWithSchema(self, schema_table_name, schema, keys=None, create_table_options=None):

Wondering if this function is necessary since we have a createTable which takes a dataframe and does the same thing

jpanko1

comment created time in 2 months

issue closed scikit-learn/scikit-learn

Question: What is the recommended way to serialize Pipelines with custom transformers?

When building a Pipeline with custom transformers, what is the best way to serialize that for later use?

If you use pickle, you need to define those functions in the new environment, so that doesn't seem like a solution to me. I ran into the same issue with dill and joblib.

What is the best practice here?

Thanks!

closed time in 16 days

Ben-Epstein

issue comment scikit-learn/scikit-learn

Question: What is the recommended way to serialize Pipelines with custom transformers?

@avinashpancham I figured it out: you can use cloudpickle to serialize the Pipeline, then unpickle it with regular pickle when you want to use it.
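The approach described in this comment can be sketched as follows. This is only an illustration, not code from the issue: the `AddConstant` transformer, the toy data, and the pipeline step names are hypothetical, and it assumes `cloudpickle`, `numpy`, and `scikit-learn` are installed. cloudpickle serializes classes defined in `__main__` (for example, in a notebook) by value, which is why the standard `pickle` module can load the result; the loading environment still needs scikit-learn and numpy themselves.

```python
import pickle

import cloudpickle
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline


class AddConstant(TransformerMixin, BaseEstimator):
    """Hypothetical custom transformer: adds a fixed offset to every feature."""

    def __init__(self, offset=1.0):
        self.offset = offset

    def fit(self, X, y=None):
        # Nothing to learn; the transformer is stateless.
        return self

    def transform(self, X):
        return np.asarray(X) + self.offset


# Toy data, made up for the example.
X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([1.0, 3.0, 5.0, 7.0])

pipeline = Pipeline([
    ("shift", AddConstant(offset=1.0)),
    ("model", LinearRegression()),
])
pipeline.fit(X, y)

# Serialize with cloudpickle ...
payload = cloudpickle.dumps(pipeline)

# ... and load the bytes back with the standard pickle module.
restored = pickle.loads(payload)
```

The restored pipeline should give the same predictions as the original one on the same input.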

Ben-Epstein

comment created time in 16 days

push event splicemachine/pysplice

Epstein

commit sha cde40f186fbfdd289570e33dfff82a129ce6834d

int to long (#57)

view details

Ben Epstein

commit sha acd307666b8cf0ada9fbd4b5714793af11412cb0

drop_table function :

view details

Christopher Maahs

commit sha ec6433e9ea2d723322c34ee5dad80b8b05d4729f

Merge pull request #58 from splicemachine/add_drop_option drop_table function

view details

Epstein

commit sha 4a88cdeb9c45e110ebfe7eb5bb758b01a4be6380

Update context.py (#59) Fix for df insert

view details

Epstein

commit sha 9ebfda02cb45226903f6f96364f66c4127147a3b

Dbaas 3949 (#60) * more keras logic * circular imports * circular imports * . instead of , * syntax * indentation * missing conditional * sloppy coding * missing reference * python float to java conversion failing * pass a string to be safe * longer prediction column * cleanup H2O model comparison * cleanup * cleanup * revert bad change * interesting toUpper bug

view details

Epstein

commit sha 389caa07e096396305e2f56f0ceae24eefccc040

cloudpickle (#61) * cloudpickle * testing recursion limit for sklearn * testing different write for sklearn * open params * found another way, reverting and testing * cleanup * remove unused imports

view details

Epstein

commit sha c05d67ada30e5d06b0c62046d642ed18d8346c33

Dbaas 3269 (#62) * initial logic for single table deployment done * typing issue * param fixes * going to need to replace splicectx with a single connection * weird import issues * param and func same name * param and func same name * sql syntax * referencing to secondary table * missing newrow reference * missing newrow reference * missing newrow reference * splice issue * truth value of df is ambiguous * splice-properties is picky * splice-properties is reallly picky * move error checking earlier * better checking for existing model deployed * checking wrong value * case insensitive checking * code cleanup * case insensitive checking * replace upsert with update

view details

Epstein

commit sha 29d0ed25b81fcc7a1c8bd1a4b327e3c64628b01c

Dbaas 3992 (#63) * working on metadata, cleanup other code * added metadata function * forgot to call * syntax * syntax * case sensitive * syntax * case sensitive * new table design * syntax change

view details

Epstein

commit sha 1b426489ef36eeaf6ffbef8bccb38b846482a390

check param length (#64)

view details

Epstein

commit sha 5d1a20ea09092fd36b38cd0d9afe6dd0c9281159

patch (#66)

view details

Epstein

commit sha 504f8f289f06bdbddf9c41717bbd95a6b26a427b

Update setup.py (#65)

view details

Ben Epstein

commit sha ae9770ba986a5a941ddd7fe85d53dcf0dd1478e6

Merge branch 'master' into eNSDS-3.1.0.1951

view details

push time in 20 days

created tag splicemachine/pysplice

tag 2.2.0-k8

created time in 22 days

release splicemachine/pysplice

2.2.0-k8

released time in 22 days

push event splicemachine/pysplice

Epstein

commit sha 504f8f289f06bdbddf9c41717bbd95a6b26a427b

Update setup.py (#65)

view details

push time in 22 days

PR merged splicemachine/pysplice

Reviewers
Update setup.py
+1 -1

0 comment

1 changed file

Ben-Epstein

pr closed time in 22 days

create branch splicemachine/pysplice

branch : NSDS_PATCH

created branch time in 22 days

PR opened splicemachine/pysplice

Reviewers
patch
+1 -1

0 comment

1 changed file

pr created time in 22 days

push event splicemachine/pysplice

Ben Epstein

commit sha 7a8e30f7be7604b4ae48cd591fa199b942368831

patch

view details

push time in 22 days

PR opened splicemachine/pysplice

Reviewers
Update setup.py
+1 -1

0 comment

1 changed file

pr created time in 22 days

create branch splicemachine/pysplice

branch : bump

created branch time in 22 days

push event splicemachine/pysplice

Epstein

commit sha 1b426489ef36eeaf6ffbef8bccb38b846482a390

check param length (#64)

view details

push time in 22 days

PR merged splicemachine/pysplice

Reviewers
check param length
+8 -2

2 comments

1 changed file

Ben-Epstein

pr closed time in 22 days

pull request comment splicemachine/pysplice

check param length

@liprais

F-string is a feature that only python 3.6 or higher supports.

It's a good point, and worth considering further, thank you. f-strings are currently used throughout the repo. We also control the Python version (3.7.3), because this is only used within the context of the Splice Machine k8s cluster. If that were to change, it would be worth looking into backwards compatibility.
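For context on the compatibility point, here is a minimal sketch of the same message built with an f-string and with `str.format`. The variable names and the message text are hypothetical and are not taken from the pull request.

```python
# Hypothetical values for illustration only.
name, length, limit = "run_name", 300, 250

# f-string form: only parses on Python 3.6 or higher.
msg_fstring = f"Parameter {name} has length {length}, max is {limit}"

# Equivalent str.format form, which also works on older interpreters.
msg_format = "Parameter {} has length {}, max is {}".format(name, length, limit)
```

Both forms produce the same string, so a switch to `str.format` would only matter if the code ever had to run outside the controlled Python 3.7.3 environment.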

Ben-Epstein

comment created time in 22 days

push event splicemachine/pysplice

Epstein

commit sha 29d0ed25b81fcc7a1c8bd1a4b327e3c64628b01c

Dbaas 3992 (#63) * working on metadata, cleanup other code * added metadata function * forgot to call * syntax * syntax * case sensitive * syntax * case sensitive * new table design * syntax change

view details

push time in 22 days

PR merged splicemachine/pysplice

Reviewers
Dbaas 3992

model metadata tracking

+129 -42

0 comment

3 changed files

Ben-Epstein

pr closed time in 22 days

push event Ben-Epstein/sparkmonitor

Epstein

commit sha 7043078b289846cdca4aff22072956bcdb5943e7

Update README.md

view details

push time in 22 days

push event Ben-Epstein/sparkmonitor

Epstein

commit sha 6c05e82330a7083f2c4e27878fe47a58b82d1c3c

Update README.md

view details

push time in 22 days

PR opened splicemachine/pysplice

Reviewers
check param length
+8 -2

0 comment

1 changed file

pr created time in 22 days

create branch splicemachine/pysplice

branch : DBAAS-4021

created branch time in 22 days

push event Ben-Epstein/sparkmonitor

Ben Epstein

commit sha 22cf2c9d278531e7e2fd6eef78f64e3c900003b2

cleanup

view details

push time in 25 days

push event Ben-Epstein/sparkmonitor

Ben Epstein

commit sha c43917518820470c56bb95b1bdaa59bd91237ca6

cleanup

view details

push time in 25 days

create branch Ben-Epstein/sparkmonitor

branch : MULTI_SPARK_SESSION

created branch time in 25 days

PR opened splicemachine/pysplice

Reviewers
Dbaas 3992

model metadata tracking

+129 -42

0 comment

3 changed files

pr created time in a month

push event splicemachine/pysplice

Ben Epstein

commit sha 88895e80d53237ab4c4150b5d4af560c26a1ccbb

syntax change

view details

push time in a month

push event splicemachine/pysplice

Ben Epstein

commit sha 07d6bd343f21aca3fcef2578ba50fa55b1d0339d

new table design

view details

push time in a month

push event splicemachine/pysplice

Ben Epstein

commit sha d213662dc4fecc5de3dc0310d0a86154da3a23fd

case sensitive

view details

push time in a month

push event splicemachine/pysplice

Ben Epstein

commit sha 4a8a443fa54785e1b816410b7ec994d32285bc0f

syntax

view details

push time in a month

push event splicemachine/pysplice

Ben Epstein

commit sha 70e6ef91d6c54457b9018a805c4cf4a4b19628e3

case sensitive

view details

push time in a month

push event splicemachine/pysplice

Ben Epstein

commit sha 240a4c463bc79aefc2a891316db8b26c14588f59

syntax

view details

push time in a month

push event splicemachine/pysplice

Ben Epstein

commit sha 582b378c09ddcdb24cc3409366fc72da70a10793

syntax

view details

push time in a month

push event splicemachine/pysplice

Ben Epstein

commit sha 234c63c6e05199fe5e0cc30c67243434bc1b0bec

forgot to call

view details

push time in a month

create branch splicemachine/pysplice

branch : DBAAS-3992

created branch time in a month

push event splicemachine/pysplice

Ben Epstein

commit sha 961cbebedfb074f67de2c9c29976c10c14fa8d39

autocommit

view details

push time in a month

push event splicemachine/pysplice

Ben Epstein

commit sha 51659724d27a3f93a5f936f42bf2efa6c1e1ea1a

missing auto commit

view details

push time in a month

create branch splicemachine/pysplice

branch : DBAAS-4004

created branch time in a month

issue closed mlflow/mlflow

[BUG] When logging metrics over time, the first value is shown in the UI

Please fill in this bug report template to ensure a timely and thorough response.

Willingness to contribute

The MLflow Community encourages bug fix contributions. Would you or another member of your organization be willing to contribute a fix for this bug to the MLflow code base?

  • [ ] Yes. I can contribute a fix for this bug independently.
  • [ X] Yes. I would be willing to contribute a fix for this bug with guidance from the MLflow community.
  • [ ] No. I cannot contribute a bug fix at this time.

System information

  • Have I written custom code (as opposed to using a stock example script provided in MLflow): Yes
  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Linux Ubuntu 16.04
  • MLflow installed from (source or binary): Source
  • MLflow version (run mlflow --version): 1.6.0
  • Python version: 3.7.3
  • Exact command to reproduce:
import mlflow

with mlflow.start_run(run_name='example of bad metrics'):
    for i in range(10):
        mlflow.log_metric('quality', i*0.5, step=i)

Describe the problem

With the above code, the first metric value (0) is shown in the UI, not the last (4.5, logged at step 9). The UI should show the last value so that, when comparing multiple runs, you can compare how the "final" models performed.
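To make the expected behavior concrete, the values the reproduction loop logs can be recomputed without MLflow. This is only a sketch of the arithmetic, not of MLflow's internals; the variable names are hypothetical.

```python
# Recreate the (step, value) pairs that the reproduction loop would log.
logged = [(i, i * 0.5) for i in range(10)]

# Value recorded at the first step (what the report says the UI shows).
first_step, first_value = logged[0]

# Value recorded at the highest step (what the report expects the UI to show).
last_step, last_value = max(logged, key=lambda pair: pair[0])
```

Since `range(10)` stops at 9, the final logged value is 4.5 at step 9, while the first is 0.0 at step 0.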

Code to reproduce issue

As above

What component(s), interfaces, languages, and integrations does this bug affect?

Components

  • [ ] area/artifacts: Artifact stores and artifact logging
  • [ ] area/build: Build and test infrastructure for MLflow
  • [ ] area/docs: MLflow documentation pages
  • [ ] area/examples: Example code
  • [ ] area/model-registry: Model Registry service, APIs, and the fluent client calls for Model Registry
  • [ ] area/models: MLmodel format, model serialization/deserialization, flavors
  • [ ] area/projects: MLproject format, project running backends
  • [ ] area/scoring: Local serving, model deployment tools, spark UDFs
  • [X ] area/tracking: Tracking Service, tracking client APIs, autologging

Interface

  • [ X] area/uiux: Front-end, user experience, JavaScript, plotting
  • [ ] area/docker: Docker use across MLflow's components, such as MLflow Projects and MLflow Models
  • [ ] area/sqlalchemy: Use of SQLAlchemy in the Tracking Service or Model Registry
  • [ ] area/windows: Windows support

Language

  • [ ] language/r: R APIs and clients
  • [ ] language/java: Java APIs and clients

Integrations

  • [ ] integrations/azure: Azure and Azure ML integrations
  • [ ] integrations/sagemaker: SageMaker integrations

closed time in a month

Ben-Epstein

issue comment mlflow/mlflow

[BUG] When logging metrics over time, the first value is shown in the UI

@dmatrix I am still experiencing the problem, but I think that is due to my Tracking Store plugin. I will close this, thanks for the help!

Ben-Epstein

comment created time in a month

create branch splicemachine/pysplice

branch : pandas_help

created branch time in a month

delete branch splicemachine/pysplice

delete branch : DBAAS-3990

delete time in a month

push event splicemachine/pysplice

Ben Epstein

commit sha 25f8192f8d096c2ce013c5f3caa6c756c71b2b3e

handle varchar

view details

push time in a month

push event splicemachine/pysplice

Ben Epstein

commit sha 7e689ce2d3439fa9307795d01314de0fdd5ee467

sklearn is annoying

view details

push time in a month

create branch splicemachine/pysplice

branch : SK_PIP_PATCH

created branch time in a month

delete branch splicemachine/pysplice

delete branch : DBAAS-3269

delete time in a month

push event splicemachine/pysplice

Epstein

commit sha c05d67ada30e5d06b0c62046d642ed18d8346c33

Dbaas 3269 (#62) * initial logic for single table deployment done * typing issue * param fixes * going to need to replace splicectx with a single connection * weird import issues * param and func same name * param and func same name * sql syntax * referencing to secondary table * missing newrow reference * missing newrow reference * missing newrow reference * splice issue * truth value of df is ambiguous * splice-properties is picky * splice-properties is reallly picky * move error checking earlier * better checking for existing model deployed * checking wrong value * case insensitive checking * code cleanup * case insensitive checking * replace upsert with update

view details

push time in a month

PR merged splicemachine/pysplice

Reviewers
Dbaas 3269

Code for single table DB deployment New option to deploy to an existing table

+328 -187

0 comment

2 changed files

Ben-Epstein

pr closed time in a month

push event splicemachine/pysplice

Ben Epstein

commit sha 984a106a111974bc619070fc9240243207cec4ab

replace upsert with update

view details

push time in a month

GollumEvent

push event splicemachine/pysplice

Ben Epstein

commit sha 19b1f51f32647c3382fcb6b6a2831a772f2a018f

case insensitive checking

view details

push time in a month

PR opened splicemachine/pysplice

Reviewers
Dbaas 3269

Code for single table DB deployment New option to deploy to an existing table

+321 -181

0 comment

2 changed files

pr created time in a month

push event splicemachine/pysplice

Ben Epstein

commit sha f052e501a96b15557c5032bacb5cfe9f3f7128ca

code cleanup

view details

push time in a month

push event splicemachine/pysplice

Ben Epstein

commit sha d66e68c21caa79002bb04e41acc423edc08917c3

case insensitive checking

view details

push time in a month

push event splicemachine/pysplice

Ben Epstein

commit sha 63962e15ece7f290ff9e9826a9552cfaf960c04e

checking wrong value

view details

push time in a month

push event splicemachine/pysplice

Ben Epstein

commit sha 6c82d5d6d1d9288c8d1d32daf39e54bf364ad138

better checking for existing model deployed

view details

push time in a month

push event splicemachine/pysplice

Ben Epstein

commit sha 4040e293d8d85c6d17f9dd0b92bef7f8320239b9

move error checking earlier

view details

push time in a month

push event splicemachine/pysplice

Ben Epstein

commit sha 7011a91938f720101e88751d5d52ee23fc04a963

splice-properties is reallly picky

view details

push time in a month

push event splicemachine/pysplice

Ben Epstein

commit sha 5a21b17ad37f3eb66b0e31832fc21f5f685055cd

splice-properties is picky

view details

push time in a month

push event splicemachine/pysplice

Ben Epstein

commit sha ee7f28953c0d424b6d38446206fbb6af66905641

truth value of df is ambiguous

view details

push time in a month

push event splicemachine/pysplice

Ben Epstein

commit sha 8ff5e379b7ce59adeddb57c8989b041503140a5f

splice issue

view details

push time in a month

push event splicemachine/pysplice

Ben Epstein

commit sha 506eb75d695bc52a3a65fe2e6a6ef652ec52cd2e

missing newrow reference

view details

push time in a month

push event splicemachine/pysplice

Ben Epstein

commit sha c8e72f8325b09b16f0a56169aeb3a03ef1fcb068

missing newrow reference

view details

push time in a month

push event splicemachine/pysplice

Ben Epstein

commit sha 71a8616ca3458d50e76bf92c0e11450b3ebbc317

missing newrow reference

view details

push time in a month

push event splicemachine/pysplice

Ben Epstein

commit sha 3a7fd819dbd57d1fde12255949e18e0fbb5c2939

referencing to secondary table

view details

push time in a month

push event splicemachine/pysplice

Ben Epstein

commit sha 79afe360a3311c6de75b6e25a8f9dc06952bacd9

sql syntax

view details

push time in a month

push event splicemachine/pysplice

Ben Epstein

commit sha bd95e3ad059877b44415c1aecdc6d60fcfb28aca

param and func same name

view details

push time in a month

push event splicemachine/pysplice

Ben Epstein

commit sha 61bfb921d6c436553c9517eee490212fc66340c5

param and func same name

view details

push time in a month
