Han Xiao hanxiao @jina-ai Berlin, Germany https://hanxiao.io Founder @jina-ai | Creator of Fashion-MNIST & bert-as-service | We're hiring 👐

hanxiao/bert-as-service 8976

Mapping a variable-length sentence to a fixed-length vector using BERT model

gnes-ai/gnes 1177

GNES is Generic Neural Elastic Search, a cloud-native semantic search system based on deep neural network.

hanxiao/daanet 145

DAANet: Dual Ask-Answer Network for Machine Reading Comprehension

gnes-ai/hub 33

GNES Hub ship AI/ML models as Docker containers and use Docker containers as plugins.

hanxiao/demo-gnes-flow 11

Demo of building a flower image search using GNES Flow API

hanxiao/benchmark 8

Benchmarking GNES on its network latency

hanxiao/bert 2

TensorFlow code and pre-trained models for BERT

hanxiao/activitywatch 1

Log what you do on your computer. Simple (yet powerful), extensible, no third parties.

hanxiao/all-contributors-cli 1

Tool to help automate adding contributor acknowledgements according to the all-contributors specification ✨

push event jina-ai/jina

Jina Dev Bot

commit sha a2b62ff411b4a2269ce7fedf0e2c2d4ea741b15e

chore(contributor): update contributors

push time in 2 hours

Pull request review comment jina-ai/jina

Improve Formatter

 def format(self, record):
         :param record: A LogRecord object
         :returns: Formatted LogRecord with level-colour MAPPING to add corresponding colour.
         """
-        cr = copy(record)
+        cr = deepcopy(record)

Sure. Since the formatter is used in the logger, which is used extensively, I'm not sure whether this will harm performance; I haven't had time to look into it. To my understanding, we should use deepcopy only when necessary.

And it's good that you raised this suggestion.

chunyuema

comment created time in 2 hours
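
The copy-versus-deepcopy question in the comment above can be explored with a small standalone sketch. It does not use Jina's formatter; the record contents and the colour escape code are made up for illustration. It shows that a shallow copy is enough when the formatter only rebinds top-level attributes such as levelname, and it gives a rough way to time both calls.

```python
import copy
import logging
import timeit

# Hypothetical record; the args tuple holds a mutable list on purpose.
record = logging.LogRecord(
    name="demo", level=logging.INFO, pathname="demo.py", lineno=1,
    msg="payload: %s", args=([1, 2, 3],), exc_info=None,
)

shallow = copy.copy(record)
deep = copy.deepcopy(record)

# Rebinding an attribute on a copy leaves the original untouched,
# which is all a formatter needs if it only rewrites levelname or msg.
shallow.levelname = "\033[32mINFO\033[0m"
print(record.levelname)  # INFO

# The shallow copy still shares nested objects with the original;
# the deep copy owns its own.
print(shallow.args[0] is record.args[0])  # True
print(deep.args[0] is record.args[0])     # False

# Rough timing; absolute numbers depend on the machine and record contents.
n = 10_000
print("copy:    ", timeit.timeit(lambda: copy.copy(record), number=n))
print("deepcopy:", timeit.timeit(lambda: copy.deepcopy(record), number=n))
```

Whether the extra cost matters in practice depends on how often the formatter runs and on what the records carry, so the timing lines are only a starting point for measurement.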

Pull request review comment jina-ai/jina-hub

feat: enable Sentencizer to segment text besides English

 class Sentencizer(BaseSegmenter):
     The text is split by the punctuation characters listed in ``punct_chars``.
     The sentences that are shorter than the ``min_sent_len``
     or longer than the ``max_sent_len`` after stripping will be discarded.
-
+    
+    :param lang: language of the input text, by default "en".

@JoanFM New commit that removes the filter. CI/hub-build-test is failing, any pointers why? I am a new contributor so not sure why it might be happening :/

haroonrashid235

comment created time in 2 hours

push event jina-ai/jina

Jina Dev Bot

commit sha 2f51048c3b2a1c7fb3cf4fe7c4d0fea19581fe5c

chore(style): reformatted by jina-dev-bot

Jina Dev Bot

commit sha 4b6af96dec04ac17a9375f44c73ff72ac13c36ec

chore(contributor): update contributors

cristian

commit sha cb40b44f05212dbf69f8ef40792d094e51553048

ci: include docstr linter (#2045)

cristian

commit sha 3af29051e8e9e35eca5c76c52f86e60df515c834

feat(binarypb): delete on dump (#2102)

Maximilian Werk

commit sha 9bbb0769b0474ddb5a0682b518f41f7e9643ff43

fix: expose env variable for workspace (#2114)

CatStark

commit sha 7169fb56ad2fa3a919c0225e35fdb0ed08b910e4

fix: fix traversal_path, change from c to r (#2116)

cristian

commit sha 640daf4d389768be216dac4125ef4e837ee65d23

ci: add black (#2036) * ci: add black * ci: add git blame

Jina Dev Bot

commit sha e01e57df00deda8ea7bbda1f0a26ba25c60782a6

chore(contributor): update contributors

cristian

commit sha dc2be2f009b8e82be8241363efd71ce3f32cbf84

ci: reenable docstrings lint (#2118)

cristian

commit sha c258e4aa22495d3809ecbcb0ee9966938ccdbfe5

docs: update black docs and sha (#2117)

Jina Dev Bot

commit sha 7dd876d0a1fbfca3818c13a68521b80e43a1c617

chore(contributor): update contributors

Jina Dev Bot

commit sha 47ac7b0a8d55faf8032579cb6e114e9b02bf392f

chore(version): the next version will be 1.0.9 build(hanxiao): Sunday night weekly patch release

Jina Dev Bot

commit sha 3e91a1faf4680e055aff4cf1d6db637024aa5612

chore(contributor): update contributors

Jina Dev Bot

commit sha 5e32eddc940b3259fe712d3fffeecc96c8e23afb

Merge remote-tracking branch 'origin/master'

Jina Dev Bot

commit sha 666d302ef490d35f7eb080f108994e4582c59dc2

chore(docs): update TOC

Han Xiao

commit sha dd687735bb2c569f8dee51ff262d88b3f271b681

refactor(cli): rename silent to quiet (#2122)

Deepankar Mahapatro

commit sha b429d2215475e56a8808b5687db9e90c2d1e133e

feat(schema): generate pydantic based jsonschema for any jina proto (#2121) * feat(schema): genereate pydantic based jsonschema for any jina proto * docs: fix return type * docs: fix docstrings * feat(schema): camel case support for all fields * test(schema): jina document to pydantic document * fix(schema): remove proto name check

Florian Hönicke

commit sha caae3f6d9ba9e29583f08a7d721f8a1629e171fa

refactor: crud delete types (#2014)

Joan Fontanals

commit sha f0b6a44045f7fccca05a34020fd42981ce34dc4e

refactor: prepare changes to have batching for every executor (#2110) Co-authored-by: Nan Wang <nan.wang@jina.ai>

Florian Hönicke

commit sha 24ff01d30f0d6af7ed752d00a545e806108c333b

refactor: merge master

push time in 2 hours

Pull request review comment jina-ai/jina-hub

feat: enable Sentencizer to segment text besides English

 class Sentencizer(BaseSegmenter):
     The text is split by the punctuation characters listed in ``punct_chars``.
     The sentences that are shorter than the ``min_sent_len``
     or longer than the ``max_sent_len`` after stripping will be discarded.
-
+    
+    :param lang: language of the input text, by default "en".

I agree it is better to remove the check than to add this language parameter

haroonrashid235

comment created time in 2 hours

PR opened jina-ai/jina-hub

feat: enable Sentencizer to segment text besides English

Fix for issue #5141

PR Summary:

  • Updated the Sentencizer to switch off the filter for languages other than English
  • Added a small test for French and German and fixed the Chinese unit test
  • Added a test to check the filter in the case of non-English characters in an English sentence

Questions:

  • How important is the filter on the following line? A potential solution could be to remove the filter altogether: https://github.com/jina-ai/jina-hub/blob/cb7a176535eed6c86973465e8c6bfc977a4c572e/segmenters/nlp/Sentencizer/init.py#L66

Currently, there is no test case that demonstrates the importance of the filter. A test is added to filter the emoji, but I can't think of any significant test case that makes the filter a must.

+42 -6

0 comment

2 changed files

pr created time in 2 hours
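
The behaviour described in the Sentencizer docstring quoted in this PR (split on the characters in punct_chars, strip, then discard sentences shorter than min_sent_len or longer than max_sent_len) can be sketched as follows. This is a minimal illustration and not the jina-hub implementation: the function name segment, the default punctuation set, and the default length limits are assumptions, and no character filter is applied, so non-English text passes through unchanged.

```python
import re
from typing import List, Sequence


def segment(
    text: str,
    punct_chars: Sequence[str] = ("!", ".", "?", "。", "！", "？"),  # assumed defaults
    min_sent_len: int = 1,    # assumed default
    max_sent_len: int = 512,  # assumed default
) -> List[str]:
    """Split text on punctuation and keep sentences within the length bounds."""
    # One or more consecutive punctuation characters act as a single separator.
    pattern = "[" + re.escape("".join(punct_chars)) + "]+"
    sentences = []
    for piece in re.split(pattern, text):
        sentence = piece.strip()
        # Empty pieces have length 0 and are dropped whenever min_sent_len >= 1.
        if min_sent_len <= len(sentence) <= max_sent_len:
            sentences.append(sentence)
    return sentences


print(segment("Hello world. Bonjour le monde! 你好。"))
# ['Hello world', 'Bonjour le monde', '你好']
```

Because the length check counts characters, a short Chinese sentence can fall below a min_sent_len that was tuned for English, which is worth keeping in mind when writing multilingual tests.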

pull request comment jina-ai/cookiecutter-jina

feat: use version matching clause

@ddelange The illustration in this PR is really good and well organized, and I learned a lot from it! Looking forward to more of your contributions! ❤️

ddelange

comment created time in 2 hours

push event jina-ai/jina

YongxuanZhang

commit sha 282b402b0be2b93dd48e034336bafe9b1829d6aa

fix: test

push time in 2 hours

Pull request review comment jina-ai/jina

Improve Formatter

 def format(self, record):
         :param record: A LogRecord object
         :returns: Formatted LogRecord with level-colour MAPPING to add corresponding colour.
         """
-        cr = copy(record)
+        cr = deepcopy(record)

Hi, this is a good question. I think deepcopy will definitely be slower than copy. I need to study the code base further to see how much using deepcopy affects performance.

chunyuema

comment created time in 2 hours

Pull request review comment jina-ai/jina

test: refactor rank driver test

 def create_document_to_score():
     # |- matches: (id: 3, parent_id: 1, score.value: 3),
     # |- matches: (id: 4, parent_id: 1, score.value: 4),
     # |- matches: (id: 5, parent_id: 1, score.value: 5),
-    doc = Document()
+    doc = jina_pb2.DocumentProto()

Then we need to change the other jina_pb2.DocumentProto() calls to Document() in the other tests, right?

Yongxuanzhang

comment created time in 2 hours

Pull request review comment jina-ai/jina

test: refactor rank driver test

 def create_document_to_score():
     # |- matches: (id: 3, parent_id: 1, score.value: 3),
     # |- matches: (id: 4, parent_id: 1, score.value: 4),
     # |- matches: (id: 5, parent_id: 1, score.value: 5),
-    doc = Document()
+    doc = jina_pb2.DocumentProto()

Ok I know what to do

Yongxuanzhang

comment created time in 2 hours

Pull request review comment jina-ai/jina

test: refactor rank driver test

 def create_document_to_score():
     # |- matches: (id: 3, parent_id: 1, score.value: 3),
     # |- matches: (id: 4, parent_id: 1, score.value: 4),
     # |- matches: (id: 5, parent_id: 1, score.value: 5),
-    doc = Document()
+    doc = jina_pb2.DocumentProto()

What's the reason? Then which should we use? Document() is not correct here, because this line won't set the value: https://github.com/jina-ai/jina/blob/f0b6a44045f7fccca05a34020fd42981ce34dc4e/tests/unit/drivers/rank/test_matches2doc_rank_drivers.py#L61

Yongxuanzhang

comment created time in 2 hours

pull request comment jina-ai/jina

test: refactor rank driver test

Latency summary

Current PR yields:

  • 😶 index QPS at 1251, delta to last 3 avg.: +2%
  • 😶 query QPS at 20, delta to last 3 avg.: -2%

Breakdown

Version   Index QPS   Query QPS
current   1251        20
1.0.7     1224        20

Backed by latency-tracking. Further commits will update this comment.

Yongxuanzhang

comment created time in 2 hours

pull request comment jina-ai/jina

test: refactor rank driver test

Codecov Report

Merging #2127 (b0760db) into refactor-rankers (c93f59a) will decrease coverage by 31.81%. The diff coverage is n/a.

Impacted file tree graph

@@                  Coverage Diff                  @@
##           refactor-rankers    #2127       +/-   ##
=====================================================
- Coverage             82.61%   50.79%   -31.82%     
=====================================================
  Files                   208      189       -19     
  Lines                 11182    10516      -666     
=====================================================
- Hits                   9238     5342     -3896     
- Misses                 1944     5174     +3230     

Flag     Coverage Δ
daemon   ?
jina     50.79% <ø> (-32.07%) :arrow_down:

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files                       Coverage Δ
jina/parsers/ping.py                 0.00% <0.00%> (-100.00%) :arrow_down:
jina/docker/helper.py                0.00% <0.00%> (-100.00%) :arrow_down:
jina/parsers/hub/new.py              0.00% <0.00%> (-100.00%) :arrow_down:
jina/parsers/hub/list.py             0.00% <0.00%> (-100.00%) :arrow_down:
jina/parsers/hub/build.py            0.00% <0.00%> (-100.00%) :arrow_down:
jina/parsers/hub/login.py            0.00% <0.00%> (-100.00%) :arrow_down:
jina/parsers/optimizer.py            0.00% <0.00%> (-100.00%) :arrow_down:
jina/parsers/hub/pushpull.py         0.00% <0.00%> (-100.00%) :arrow_down:
jina/types/request/common.py         0.00% <0.00%> (-100.00%) :arrow_down:
jina/types/ndarray/sparse/numpy.py   0.00% <0.00%> (-100.00%) :arrow_down:
... and 136 more

Continue to review full report at Codecov.

Legend - Click here to learn more Δ = absolute <relative> (impact), ø = not affected, ? = missing data Powered by Codecov. Last update c93f59a...2a90694. Read the comment docs.

Yongxuanzhang

comment created time in 2 hours

Pull request review comment jina-ai/jina

refactor: refactor rankers, move logic to driver

 def __init__(self, *args, **kwargs):
             **kwargs
         )
 
-    def score(self, query_meta, old_match_scores, match_meta):
-        new_scores = [
-            (match_id, -abs(match_meta[match_id]['length'] - query_meta['length']))
-            for match_id, old_score in old_match_scores.items()
-        ]
-        return np.array(
-            new_scores,
-            dtype=[(self.COL_MATCH_ID, np.object), (self.COL_SCORE, np.float64)],
-        )
+    def score(self, old_match_scores, query_meta, match_meta):
+        new_scores = [-abs(m['length'] - query_meta['length']) for m in match_meta]
+        return new_scores
 
 
 def create_document_to_score():

there may be an error in this test, you are welcome to refactor

The refactor is in this PR; I think this makes more sense. We compare the absolute difference between the length of each match and the length of the query. Matches with the same value are sorted by id ascending. https://github.com/jina-ai/jina/pull/2127

JoanFM

comment created time in 2 hours
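
The ranking rule described in the comment above (score each match by the negative absolute length difference to the query, then break ties by ascending id) can be sketched in a few lines. The score function follows the refactored signature shown in the diff; the rank helper and the sample ids and lengths are hypothetical and only illustrate the ordering, not how the Jina driver actually sorts matches.

```python
from typing import Dict, List, Tuple


def score(old_match_scores: List[float], query_meta: Dict, match_meta: List[Dict]) -> List[float]:
    """Return one score per match; closer lengths give higher (less negative) scores."""
    return [-abs(m['length'] - query_meta['length']) for m in match_meta]


def rank(match_ids: List[int], new_scores: List[float]) -> List[Tuple[int, float]]:
    """Hypothetical helper: sort by score descending, then by id ascending."""
    return sorted(zip(match_ids, new_scores), key=lambda pair: (-pair[1], pair[0]))


# Made-up data: four matches with different lengths, query length 5.
match_ids = [1, 2, 3, 4]
match_meta = [{'length': 3}, {'length': 5}, {'length': 7}, {'length': 4}]
query_meta = {'length': 5}

new_scores = score([0.0] * 4, query_meta, match_meta)
print(rank(match_ids, new_scores))
# [(2, 0), (4, -1), (1, -2), (3, -2)]
```

Matches 1 and 3 both differ from the query by two, so they share a score and fall back to id order.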

Pull request review comment jina-ai/jina

test: refactor rank driver test

 def create_document_to_score():
     # |- matches: (id: 3, parent_id: 1, score.value: 3),
     # |- matches: (id: 4, parent_id: 1, score.value: 4),
     # |- matches: (id: 5, parent_id: 1, score.value: 5),
-    doc = Document()
+    doc = jina_pb2.DocumentProto()

avoid the usage of Proto

Yongxuanzhang

comment created time in 3 hours

PR opened jina-ai/jina

test: refactor rank driver test

refactor test_matches2doc_rank_drivers.py

+13 -16

0 comment

1 changed file

pr created time in 3 hours

create branch jina-ai/jina

branch : refactor-rankers-test

created branch time in 3 hours

Pull request review comment jina-ai/jina

refactor: refactor rankers, move logic to driver

 def __init__(self, *args, **kwargs):
             **kwargs
         )
 
-    def score(self, query_meta, old_match_scores, match_meta):
-        new_scores = [
-            (match_id, -abs(match_meta[match_id]['length'] - query_meta['length']))
-            for match_id, old_score in old_match_scores.items()
-        ]
-        return np.array(
-            new_scores,
-            dtype=[(self.COL_MATCH_ID, np.object), (self.COL_SCORE, np.float64)],
-        )
+    def score(self, old_match_scores, query_meta, match_meta):
+        new_scores = [-abs(m['length'] - query_meta['length']) for m in match_meta]
+        return new_scores
 
 
 def create_document_to_score():

There may be an error in this test; you are welcome to refactor.

JoanFM

comment created time in 3 hours

Pull request review comment jina-ai/jina

refactor: refactor rankers, move logic to driver

 def __init__(self, *args, **kwargs):
             **kwargs
         )
 
-    def score(self, query_meta, old_match_scores, match_meta):
-        new_scores = [
-            (match_id, -abs(match_meta[match_id]['length'] - query_meta['length']))
-            for match_id, old_score in old_match_scores.items()
-        ]
-        return np.array(
-            new_scores,
-            dtype=[(self.COL_MATCH_ID, np.object), (self.COL_SCORE, np.float64)],
-        )
+    def score(self, old_match_scores, query_meta, match_meta):
+        new_scores = [-abs(m['length'] - query_meta['length']) for m in match_meta]
+        return new_scores
 
 
 def create_document_to_score():

The value of the score cannot be set this way? In the test, this value is 0, thus old_match_scores is [0.0, 0.0, 0.0, 0.0].

JoanFM

comment created time in 3 hours

Pull request review comment jina-ai/jina

refactor: refactor rankers, move logic to driver

 def __init__(self, *args, **kwargs):
             **kwargs
         )
 
-    def score(self, query_meta, old_match_scores, match_meta):
-        new_scores = [
-            (match_id, -abs(match_meta[match_id]['length'] - query_meta['length']))
-            for match_id, old_score in old_match_scores.items()
-        ]
-        return np.array(
-            new_scores,
-            dtype=[(self.COL_MATCH_ID, np.object), (self.COL_SCORE, np.float64)],
-        )
+    def score(self, old_match_scores, query_meta, match_meta):
+        new_scores = [-abs(m['length'] - query_meta['length']) for m in match_meta]
+        return new_scores
 
 
 def create_document_to_score():

In the test, https://github.com/jina-ai/jina/blob/f0b6a44045f7fccca05a34020fd42981ce34dc4e/tests/unit/drivers/rank/test_matches2doc_rank_drivers.py#L61

JoanFM

comment created time in 3 hours

Pull request review comment jina-ai/jina

refactor: refactor rankers, move logic to driver

 def __init__(self, *args, **kwargs):
             **kwargs
         )
 
-    def score(self, query_meta, old_match_scores, match_meta):
-        new_scores = [
-            (match_id, -abs(match_meta[match_id]['length'] - query_meta['length']))
-            for match_id, old_score in old_match_scores.items()
-        ]
-        return np.array(
-            new_scores,
-            dtype=[(self.COL_MATCH_ID, np.object), (self.COL_SCORE, np.float64)],
-        )
+    def score(self, old_match_scores, query_meta, match_meta):
+        new_scores = [-abs(m['length'] - query_meta['length']) for m in match_meta]
+        return new_scores
 
 
 def create_document_to_score():

where is this code?

JoanFM

comment created time in 3 hours

Pull request review comment jina-ai/jina

refactor: refactor rankers, move logic to driver

 def __init__(self, *args, **kwargs):
             **kwargs
         )
 
-    def score(self, query_meta, old_match_scores, match_meta):
-        new_scores = [
-            (match_id, -abs(match_meta[match_id]['length'] - query_meta['length']))
-            for match_id, old_score in old_match_scores.items()
-        ]
-        return np.array(
-            new_scores,
-            dtype=[(self.COL_MATCH_ID, np.object), (self.COL_SCORE, np.float64)],
-        )
+    def score(self, old_match_scores, query_meta, match_meta):
+        new_scores = [-abs(m['length'] - query_meta['length']) for m in match_meta]
+        return new_scores
 
 
 def create_document_to_score():

  1. The *20 responds to changes that have been applied to Document ID and continuous refactoring.
  2. I think this is by test design, but this test is too complicated, to be honest

😂

JoanFM

comment created time in 3 hours

Pull request review comment jina-ai/jina

refactor: refactor rankers, move logic to driver

 def __init__(self, *args, **kwargs):
             **kwargs
         )
 
-    def score(self, query_meta, old_match_scores, match_meta):
-        new_scores = [
-            (match_id, -abs(match_meta[match_id]['length'] - query_meta['length']))
-            for match_id, old_score in old_match_scores.items()
-        ]
-        return np.array(
-            new_scores,
-            dtype=[(self.COL_MATCH_ID, np.object), (self.COL_SCORE, np.float64)],
-        )
+    def score(self, old_match_scores, query_meta, match_meta):
+        new_scores = [-abs(m['length'] - query_meta['length']) for m in match_meta]
+        return new_scores
 
 
 def create_document_to_score():

match.score.value = match_score: this line doesn't seem to work?

JoanFM

comment created time in 3 hours

pull request comment jina-ai/jina

refactor: refactor rankers, move logic to driver

what's the motivation for moving methods into drivers?

To have a much simpler interface. Every executor has a very neat and clear interface except Chunk2DocRanker.

Also, this way we can hide some of the complexity of grouping from the executor developer.

JoanFM

comment created time in 4 hours

Pull request review comment jina-ai/jina

refactor: refactor rankers, move logic to driver

 def _apply_all(self, docs: 'DocumentSet', *args, **kwargs) -> None:
             )
 
             matches = doc.matches
-            old_match_scores = {match.id: match.score.value for match in matches}
-            match_meta = (
-                {match.id: match.get_attrs(*self._exec_match_keys) for match in matches}
-                if self._exec_match_keys
-                else None
-            )
+            num_matches = len(matches)
+            old_match_scores = []
+            needs_match_meta = self._exec_match_keys is not None
+            match_meta = [] if needs_match_meta else None
+            for match in matches:
+                old_match_scores.append(match.score.value)
+                if needs_match_meta:
+                    match_meta.append(match.get_attrs(*self._exec_match_keys))
 
             # if there are no matches, no need to sort them
             if not old_match_scores:
                 continue
 
-            new_match_scores = self.exec_fn(query_meta, old_match_scores, match_meta)
-            self._sort_matches_in_place(doc, new_match_scores)
+            new_scores = self.exec_fn(old_match_scores, query_meta, match_meta)

yes, we could call it back to nee

JoanFM

comment created time in 4 hours
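
The diff under review replaces id-keyed dictionaries with parallel lists, so that position i in every list refers to the same match. A minimal standalone sketch of that collection step is shown here; it uses plain dicts in place of Jina's match objects, and the collect helper and field names are hypothetical.

```python
from typing import Dict, List, Optional, Sequence, Tuple


def collect(
    matches: Sequence[Dict],
    match_keys: Optional[Sequence[str]] = None,
) -> Tuple[List[float], Optional[List[Dict]]]:
    """Gather old scores and, only if keys are requested, per-match metadata."""
    old_scores: List[float] = []
    # Metadata stays None when no keys are requested, mirroring the diff.
    match_meta: Optional[List[Dict]] = [] if match_keys is not None else None
    for match in matches:
        old_scores.append(match['score'])
        if match_meta is not None:
            match_meta.append({key: match[key] for key in match_keys})
    return old_scores, match_meta


# Made-up matches; index i in both returned lists refers to matches[i].
matches = [
    {'id': 3, 'score': 0.3, 'length': 4},
    {'id': 4, 'score': 0.4, 'length': 9},
]
print(collect(matches, match_keys=['length']))
# ([0.3, 0.4], [{'length': 4}, {'length': 9}])
print(collect(matches))
# ([0.3, 0.4], None)
```

With this layout the executor no longer needs to know about match ids; the caller pairs the returned scores back with the matches by position.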

Pull request review comment jina-ai/jina

refactor: refactor rankers, move logic to driver

 def __init__(self, *args, **kwargs):
             **kwargs
         )
 
-    def score(self, query_meta, old_match_scores, match_meta):
-        new_scores = [
-            (match_id, -abs(match_meta[match_id]['length'] - query_meta['length']))
-            for match_id, old_score in old_match_scores.items()
-        ]
-        return np.array(
-            new_scores,
-            dtype=[(self.COL_MATCH_ID, np.object), (self.COL_SCORE, np.float64)],
-        )
+    def score(self, old_match_scores, query_meta, match_meta):
+        new_scores = [-abs(m['length'] - query_meta['length']) for m in match_meta]
+        return new_scores
 
 
 def create_document_to_score():

  1. The *20 responds to changes that have been applied to Document ID and continuous refactoring.
  2. I think this is by test design, but this test is too complicated, to be honest

JoanFM

comment created time in 4 hours

push event jina-ai/jina-hub

Roshanjossey

commit sha 53b02090f45c554f3e202e94acc41d74a7514f4f

Deploying to gh-pages from @ 1ef38a1d5f84d0a9b4325cdad55e4d0970f7d931 🚀

push time in 4 hours

push event jina-ai/dashboard

Roshanjossey

commit sha 06f7fa85e0f8877908cfe5105ac4fe2b5820eec5

Deploying to gh-pages from @ 1ef38a1d5f84d0a9b4325cdad55e4d0970f7d931 🚀

push time in 4 hours

push event jina-ai/dashboard

rjgallego

commit sha a67c8f1252e27969c2802f0d0f6451efd3f36a8c

Merge pull request #1 from jina-ai/master Pulling changes since original fork

rjgallego

commit sha 2068cfda7bd5474933f12fa6fce8e0d234b05c63

Updating button background color to use theme from props instead of hard-coded hex color

rjgallego

commit sha 2cb81ba06e75bb6c32ee876bd833f05a27853c7e

Revert "Updating button background color to use theme from props instead of hard-coded hex color" This reverts commit 2068cfda7bd5474933f12fa6fce8e0d234b05c63.

rjgallego

commit sha 9ea11f15dec604fe06734efce199bb19a37bce80

maintenance: change button style to use props instead of hard coded hex for color - change background property of Button and ButtonGroup to use props.theme.palette.primary from props instead of hard-coded hex value Closes #209

Roshan Jossy

commit sha 1ef38a1d5f84d0a9b4325cdad55e4d0970f7d931

Merge pull request #242 from rjgallego/enhancement/209_command_bar_theme_palette Enhancement/209 command bar theme palette

push time in 4 hours