Yu Ishikawa (yu-iskw)
San Francisco, CA
https://www.linkedin.com/in/yuishikawa0301/
Machine Learning/Deep Learning/Data Science/Self Driving Car Engineering/Apache Spark/Tensorflow/Keras/Apache Beam/Google Cloud

yu-iskw/bigquery-to-datastore 48

Export a whole BigQuery table to Google Datastore with Apache Beam/Google Dataflow

yu-iskw/auto-sklearn-examples 12

auto-sklearn examples on Jupyter notebooks

yu-iskw/bisecting-kmeans 11

An implementation of Bisecting KMeans Clustering which is a kind of Hierarchical Clustering algorithm on Spark

yu-iskw/aas 0

Code to accompany Advanced Analytics with Spark from O'Reilly Media

yu-iskw/algebird 0

Abstract Algebra for Scala

yu-iskw/als-benchmark-scripts 0

ALS benchmark scripts

yu-iskw/ambrose 0

A platform for visualization and real-time monitoring of data workflows

started sungchun12/dbt_bigquery_example

started time in 10 days

started mikekaminsky/dbt-helper

started time in 10 days

issue opened fishtown-analytics/dbt

Insert a BigQuery table only when the last modified timestamp is after a given timestamp

Describe the feature

The idea is probably very close to dbt_utils.insert_by_period_materialization, but I would like to decide whether to insert into a BigQuery table based on the table's last modified timestamp. A BigQuery table does not necessarily have a timestamp field, and even running a query just to get the latest value is potentially costly.

Also, consider a project with a hundred models where dbt run takes a long time to execute, and the run is accidentally killed or fails midway. We then have to rerun dbt run, but we want to resume from where it stopped as much as possible in order to reduce cost and elapsed time.

For instance, if we want to insert into a table only when its last modified timestamp is more than 24 hours old, a possible config looks like this:

{{
  config(
    materialized="insert_by_last_modified",
    period="24 hours",
    enabled=(target.type == 'bigquery')
  )
}}

...
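The decision this materialization would make can be sketched in Python. This is only an illustration of the proposed rule, not a dbt materialization: the function names `parse_period` and `should_insert` are hypothetical, the period format mirrors the `period="24 hours"` config above, and the last modified timestamp is assumed to come from table metadata rather than from a query over the table's rows.

```python
from datetime import datetime, timedelta, timezone

# Units accepted by this sketch; the real config format is an assumption.
ALLOWED_UNITS = {"minutes", "hours", "days"}


def parse_period(period: str) -> timedelta:
    """Parse a period such as "24 hours" into a timedelta."""
    amount, unit = period.split()
    key = unit if unit.endswith("s") else unit + "s"
    if key not in ALLOWED_UNITS:
        raise ValueError(f"unsupported period unit: {unit}")
    return timedelta(**{key: int(amount)})


def should_insert(last_modified: datetime, now: datetime, period: str) -> bool:
    """Return True when the table was last modified longer ago than `period`."""
    return now - last_modified > parse_period(period)


now = datetime(2020, 8, 2, 12, 0, tzinfo=timezone.utc)
stale = datetime(2020, 8, 1, 9, 0, tzinfo=timezone.utc)  # 27 hours before `now`
fresh = datetime(2020, 8, 2, 6, 0, tzinfo=timezone.utc)  # 6 hours before `now`

print(should_insert(stale, now, "24 hours"))  # True: rebuild the table
print(should_insert(fresh, now, "24 hours"))  # False: skip it on a rerun
```

With a rule like this, rerunning after an interrupted run would skip the tables that were already rebuilt within the period.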

Describe alternatives you've considered

No idea.

Additional context

NA

Who will this benefit?

I think the feature is probably very useful for many BigQuery users.

Are you interested in contributing this feature?

Yes, I am interested in implementing it, but I am not sure what the best approach is, since I am still very new to dbt.

created time in 10 days

issue closed fishtown-analytics/dbt

tag based inclusion and exclusion with `dbt run`

Describe the feature

I would like to select which models to run by tag. Suppose a dbt project has both daily and hourly scheduled models, each tagged with daily or hourly. I want to run only the hourly models, separately from the daily ones. One possible scenario is to select models by the hourly tag.

dbt run --tags hourly
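The inclusion and exclusion semantics requested here can be sketched in Python. This is a minimal illustration with hypothetical model names and a hypothetical `select_models` helper; it makes no claim about how dbt implements tag selection.

```python
from typing import Dict, Iterable, List, Set


def select_models(
    models: Dict[str, Set[str]],
    include: Iterable[str],
    exclude: Iterable[str] = (),
) -> List[str]:
    """Return the names of models that carry any `include` tag and no `exclude` tag."""
    include_set, exclude_set = set(include), set(exclude)
    return sorted(
        name
        for name, tags in models.items()
        if tags & include_set and not tags & exclude_set
    )


# Hypothetical project: model name -> tags. Names need no daily_/hourly_ prefix.
models = {
    "sessions": {"hourly"},
    "revenue": {"daily"},
    "users": {"hourly", "daily"},
}

print(select_models(models, include=["hourly"]))
# ['sessions', 'users']
print(select_models(models, include=["hourly"], exclude=["daily"]))
# ['sessions']
```

Because selection depends only on tags, the table names stay free of scheduling prefixes.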

Describe alternatives you've considered

We can achieve something similar with dbt run -m by adopting a naming convention for models, for instance daily_xxx and hourly_xxx. But that has a negative impact on the table names.

dbt run -m {dbt_project}.hourly_*

Additional context

No specific database(s)

Who will this benefit?

As mentioned above, when we schedule dbt run, the proposed feature lets us specify which models run based on their interval.

Are you interested in contributing this feature?

Yes, I am interested in implementing it.

closed time in 11 days

yu-iskw

issue comment fishtown-analytics/dbt

tag based inclusion and exclusion with `dbt run`

Sorry, I figured out the solution. https://docs.getdbt.com/reference/resource-configs/tags/

yu-iskw

comment created time in 11 days

issue opened fishtown-analytics/dbt

tag based inclusion and exclusion with `dbt run`

created time in 11 days

pull request comment yu-iskw/kuromoji-for-bigquery

Replace the deprecated method to read BigQuery table

@Komei22 I have replaced the deprecated method to read BigQuery. It also uses the DIRECT_READ method and limits the columns read to improve performance. The merged code has been tagged as 0.2.1.

yu-iskw

comment created time in 21 days

created tag yu-iskw/kuromoji-for-bigquery

tag 0.2.1

Tokenize Japanese text on BigQuery with Kuromoji in Apache Beam/Google Dataflow at scale

created time in 21 days

push event yu-iskw/kuromoji-for-bigquery

Yu Ishikawa

commit sha 516ec3a9e8c0359f982242556beeda51f2daf99f

Replace the deprecated method to read BigQuery table (#10) * Replace the deprecated method to read BigQuery * Change the version to 0.2.1 * Modify README * Fix

push time in 21 days

PR merged yu-iskw/kuromoji-for-bigquery

Replace the deprecated method to read BigQuery table

Overview

As of the upgrade to Apache Beam 2.20.0, the method used to read a BigQuery table is deprecated, so it would be great to switch to the current one. Moreover, using the DIRECT_READ method with the new BigQuery reader improves the performance of reading a table.

+60 -3

0 comment

4 changed files

yu-iskw

pr closed time in 21 days

push event yu-iskw/kuromoji-for-bigquery

Yu ISHIKAWA

commit sha 0330acc66a0b1d0001887ef952dd7d4a93786a6d

Fix

push time in 21 days

PR opened yu-iskw/kuromoji-for-bigquery

Replace the deprecated method to read BigQuery table

Overview

As of the upgrade to Apache Beam 2.20.0, the method used to read a BigQuery table is deprecated, so it would be great to switch to the current one.

+60 -4

0 comment

4 changed files

pr created time in 21 days

push event yu-iskw/kuromoji-for-bigquery

Yu ISHIKAWA

commit sha bc185d9f7ab4d8f581a0cc69bb5c10fd50ef3c13

Modify README

push time in 21 days

create branch yu-iskw/kuromoji-for-bigquery

branch : change-bq-read

created branch time in 21 days

issue opened yu-iskw/kuromoji-for-bigquery

Upgrade kuromoji

Overview

At the time of writing this issue, the project uses kuromoji 0.7.7. That seems outdated, since kuromoji has since been bumped to 0.9.0. So, it would be great to upgrade the dependency.

  • https://github.com/atilika/kuromoji/releases/tag/0.9.0

created time in 24 days

pull request comment yu-iskw/kuromoji-for-bigquery

Bump up 0.2.0

FYI: @Komei22 I have changed the version in pom.xml to 0.2.0, since we upgraded the Beam version. The version in the name of the built jar file changes accordingly.

yu-iskw

comment created time in 24 days

issue closed yu-iskw/kuromoji-for-bigquery

Change the version in pom.xml

Overview

Thanks to the contribution, we have upgraded the Beam version. So, it would be nice to change the version in pom.xml as well.

closed time in 24 days

yu-iskw

issue comment yu-iskw/kuromoji-for-bigquery

Change the version in pom.xml

Tagged https://github.com/yu-iskw/kuromoji-for-bigquery/releases/tag/0.2.0

yu-iskw

comment created time in 24 days

created tag yu-iskw/kuromoji-for-bigquery

tag 0.2.0

Tokenize Japanese text on BigQuery with Kuromoji in Apache Beam/Google Dataflow at scale

created time in 24 days

push event yu-iskw/kuromoji-for-bigquery

Yu Ishikawa

commit sha 16508e04db2a3a3866c5c3b7f0097944beb122fb

Bump up 0.2.0 (#7)

push time in 24 days

PR merged yu-iskw/kuromoji-for-bigquery

Bump up 0.2.0

https://github.com/yu-iskw/kuromoji-for-bigquery/issues/8

+2 -2

0 comment

2 changed files

yu-iskw

pr closed time in 24 days

issue opened yu-iskw/kuromoji-for-bigquery

Change the version in pom.xml

created time in 24 days

PR opened yu-iskw/kuromoji-for-bigquery

Bump up 0.2.0

+2 -2

0 comment

2 changed files

pr created time in 24 days

create branch yu-iskw/kuromoji-for-bigquery

branch : v0.2.0

created branch time in 24 days

created tag yu-iskw/kuromoji-for-bigquery

tag 0.1.0

Tokenize Japanese text on BigQuery with Kuromoji in Apache Beam/Google Dataflow at scale

created time in 24 days

pull request comment yu-iskw/kuromoji-for-bigquery

Using new apache beam version

@Komei22 Merged. Thank you for the contribution. That would be super helpful to me, since I haven't maintained the repository for a while.

Komei22

comment created time in 25 days

push event yu-iskw/kuromoji-for-bigquery

Komei22

commit sha c70e74f05a0e35e6fa22e41984cb0f268ddfd13c

Using new apache beam version (#5) * update apache beam and dependent packages * update readme

push time in 25 days

PR merged yu-iskw/kuromoji-for-bigquery

Using new apache beam version

This pull request was recreated following this discussion: https://github.com/yu-iskw/kuromoji-for-bigquery/pull/1#issuecomment-658542841

This pull request updates Apache Beam and the versions of its dependent packages, because the Apache Beam version currently used in kuromoji-for-bigquery is deprecated.

ref: https://cloud.google.com/dataflow/docs/support/sdk-version-support-status?hl=ja#apache-beam-sdks-2x

+5 -5

4 comments

2 changed files

Komei22

pr closed time in 25 days

pull request comment yu-iskw/kuromoji-for-bigquery

Using new apache beam version

@Komei22 Can you rebase your branch with the master branch? I have removed the test environment with openjdk7.

Komei22

comment created time in 25 days

push event yu-iskw/kuromoji-for-bigquery

Yu Ishikawa

commit sha 2e629cd1a926621f259a5ef033bd03f085370d93

Remove openjdk7 (#6)

push time in 25 days

pull request comment yu-iskw/kuromoji-for-bigquery

Using new apache beam version

I will stop running the tests with `openjdk7` in https://github.com/yu-iskw/kuromoji-for-bigquery/pull/6. The CI for this PR should pass once that is merged.

Komei22

comment created time in 25 days

create branch yu-iskw/kuromoji-for-bigquery

branch : remove-jdk7

created branch time in 25 days

pull request comment yu-iskw/kuromoji-for-bigquery

Using new apache beam version

@Komei22 Thank you for creating the PR! I will look into the CI error.

Komei22

comment created time in 25 days

issue closed yu-iskw/kuromoji-for-bigquery

test

closed time in a month

yu-iskw

issue opened yu-iskw/kuromoji-for-bigquery

test

created time in a month

pull request comment yu-iskw/kuromoji-for-bigquery

Using new apache beam version

@Komei22 Thank you so much for the PR. I have a request: could you open a new PR so that the CI runs? I haven't maintained the repository for a while, and the CI was no longer working because its configuration was outdated. I have changed the GitHub repository settings and fixed the Travis issue, so if you create another PR from your branch, the CI should work. https://github.com/yu-iskw/kuromoji-for-bigquery/pull/3

Komei22

comment created time in a month

push event yu-iskw/kuromoji-for-bigquery

Yu Ishikawa

commit sha 15d349407e21c20049bf321d8e531d2c971b78ac

Update travis (#3) * Add dist: trusty * Add oraclejdk11

push time in a month

push event yu-iskw/kuromoji-for-bigquery

Yu ISHIKAWA

commit sha 4be789765f00ad08ace35f896d8d19aae5ee176b

Add oraclejdk11

push time in a month

create branch yu-iskw/kuromoji-for-bigquery

branch : update-travis

created branch time in a month

issue comment polyaxon/polyaxon

cannot login user "root" with default password "rootpassword" while no config changed

@mouradmourafiq I have the same issue with polyaxon 0.6.1. I just rebuilt polyaxon 0.6.1, not v1.1, and I can't log in as root. I also looked into the db_user table to check whether root exists. Surprisingly, it doesn't. Do you have any idea why root was not created?

polyaxon=> select * from db_user;
 id | password | last_login | is_superuser | username | first_name | last_name | email | is_staff | is_active | date_joined
----+----------+------------+--------------+----------+------------+-----------+-------+----------+-----------+-------------
(0 rows)

yh-xu

comment created time in a month

issue opened MarquezProject/marquez

Add other source types for Google Cloud

We would like to support other source types for Google Cloud.

  • Google Cloud Storage
  • BigQuery
  • Spanner
  • BigTable
  • Firestore

created time in 2 months

started github/super-linter

started time in 2 months

started heartexlabs/label-studio

started time in 2 months

started tpope/vim-rhubarb

started time in 2 months

push event yu-iskw/.vim

Yu ISHIKAWA

commit sha 3bea7130fef3f2e1952eb4e8364c2073812c6758

Update

push time in 2 months

started MarquezProject/marquez

started time in 3 months

started facebookresearch/spreadingvectors

started time in 3 months
