If you are wondering where the data of this site comes from, please visit https://api.github.com/users/alexcg1/events. GitMemory does not store any data; it only uses NGINX to cache data for a period of time. The idea behind GitMemory is simply to give users a better reading experience.
Alex Cureton-Griffiths (alexcg1)
Jina AI, Berlin, Germany
http://alexcg1.github.io
Developer Relations Lead at Jina.AI, caffeine addict and vim-masochist

alexcg1/easy_text_generator 8

Generate text from machine-learning models right in your browser

alexcg1/jina-wikipedia-sentences 3

Using Jina to search through sentences from English-language Wikipedia

alexcg1/jina-streamlit-frontend 1

A simple front-end for Jina neural search framework, written in Streamlit, that supports querying with image, text, or drawing on a canvas.

alexcg1/mediawiki2book 1

Convert mediawiki pages into beautiful PDF books

alexcg1/alexcg1 0

Personal README

alexcg1/alexcg1.github.io 0

Alex C-G's personal website

alexcg1/Assembler 0

Repository for OpenSCAD scripts for the e-nable project

alexcg1/awesome-diversity 0

A curated list of amazingly awesome articles, websites and resources about diversity in technology.

started redisson/redisson

started 5 minutes ago

started redis/jedis

started 6 minutes ago

issue comment florisboard/florisboard

Add multi copy paste (Clipboard history)

Just so you know, I've delayed the merge of the input-logic-rework branch to tomorrow because I've found some ultra nasty bugs in the EditorInstance, which get annoying when typing fast. I've fixed most of them, but there's still an open bug which currently prevents me from merging the branch.

angriness-sleek

comment created 36 minutes ago

push event florisboard/florisboard

Patrick Goldinger

commit sha 058be7a169be7beae88574058f3b77cc3669b452

Fix editor instance commit text logic

pushed 40 minutes ago

issue opened jina-ai/jina

Serve as the document store backend for Haystack

Describe the feature

Mirroring the ticket: https://github.com/deepset-ai/haystack/issues/82

Your proposal <!-- copy-paste your code/pull request link -->


<!-- Optional, but really helps us locate the problem faster -->

Environment <!-- Run jina --version-full and copy paste the output here -->

Screenshots <!-- If applicable, add screenshots to help explain your problem. -->

created an hour ago

push event jina-ai/jina

Jina Dev Bot

commit sha a2b62ff411b4a2269ce7fedf0e2c2d4ea741b15e

chore(contributor): update contributors

pushed 2 hours ago

Pull request review comment jina-ai/jina

Improve Formatter

 def format(self, record):
         :param record: A LogRecord object
         :returns: Formatted LogRecord with level-colour MAPPING to add corresponding colour.
         """
-        cr = copy(record)
+        cr = deepcopy(record)

Sure. The formatter is used by the logger, which is used extensively, so I'm not sure whether this will hurt performance; I haven't had time to look into it. To my understanding, we should use deepcopy only when necessary.

And it's good that you raised this suggestion.

chunyuema

comment created 3 hours ago
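For context on the copy vs. deepcopy question above, here is a minimal sketch of a colouring formatter, assuming the formatter only rebinds `levelname` on its copy. The `LEVEL_COLOURS` mapping and the `ColourFormatter` class are hypothetical and are not Jina's actual `MAPPING` or formatter. Because the code only reassigns an attribute on the copy, a shallow `copy` already leaves the original `LogRecord` untouched for any other handlers.

```python
import logging
from copy import copy

# Hypothetical level-to-ANSI-colour mapping (not Jina's MAPPING).
LEVEL_COLOURS = {
    "DEBUG": "\033[36m",
    "INFO": "\033[32m",
    "WARNING": "\033[33m",
    "ERROR": "\033[31m",
    "CRITICAL": "\033[35m",
}
RESET = "\033[0m"


class ColourFormatter(logging.Formatter):
    """Colours the level name without mutating the shared LogRecord."""

    def format(self, record: logging.LogRecord) -> str:
        # A shallow copy gets its own attribute dict, so rebinding
        # levelname here does not touch the original record.
        cr = copy(record)
        colour = LEVEL_COLOURS.get(cr.levelname)
        if colour:
            cr.levelname = f"{colour}{cr.levelname}{RESET}"
        return super().format(cr)


record = logging.LogRecord(
    "demo", logging.WARNING, "demo.py", 1, "disk at %d%%", (91,), None
)
out = ColourFormatter("%(levelname)s %(message)s").format(record)
print(repr(out))              # '\x1b[33mWARNING\x1b[0m disk at 91%'
print(record.levelname)       # WARNING (original record is unchanged)
```

A deep copy would only become necessary if the formatter mutated nested objects on the record (for example, editing the contents of `record.args` in place) rather than rebinding attributes.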

push event jina-ai/jina

Jina Dev Bot

commit sha 2f51048c3b2a1c7fb3cf4fe7c4d0fea19581fe5c

chore(style): reformatted by jina-dev-bot

Jina Dev Bot

commit sha 4b6af96dec04ac17a9375f44c73ff72ac13c36ec

chore(contributor): update contributors

cristian

commit sha cb40b44f05212dbf69f8ef40792d094e51553048

ci: include docstr linter (#2045)

cristian

commit sha 3af29051e8e9e35eca5c76c52f86e60df515c834

feat(binarypb): delete on dump (#2102)

Maximilian Werk

commit sha 9bbb0769b0474ddb5a0682b518f41f7e9643ff43

fix: expose env variable for workspace (#2114)

CatStark

commit sha 7169fb56ad2fa3a919c0225e35fdb0ed08b910e4

fix: fix traversal_path, change from c to r (#2116)

cristian

commit sha 640daf4d389768be216dac4125ef4e837ee65d23

ci: add black (#2036)
* ci: add black
* ci: add git blame

Jina Dev Bot

commit sha e01e57df00deda8ea7bbda1f0a26ba25c60782a6

chore(contributor): update contributors

cristian

commit sha dc2be2f009b8e82be8241363efd71ce3f32cbf84

ci: reenable docstrings lint (#2118)

cristian

commit sha c258e4aa22495d3809ecbcb0ee9966938ccdbfe5

docs: update black docs and sha (#2117)

Jina Dev Bot

commit sha 7dd876d0a1fbfca3818c13a68521b80e43a1c617

chore(contributor): update contributors

Jina Dev Bot

commit sha 47ac7b0a8d55faf8032579cb6e114e9b02bf392f

chore(version): the next version will be 1.0.9
build(hanxiao): Sunday night weekly patch release

Jina Dev Bot

commit sha 3e91a1faf4680e055aff4cf1d6db637024aa5612

chore(contributor): update contributors

Jina Dev Bot

commit sha 5e32eddc940b3259fe712d3fffeecc96c8e23afb

Merge remote-tracking branch 'origin/master'

Jina Dev Bot

commit sha 666d302ef490d35f7eb080f108994e4582c59dc2

chore(docs): update TOC

Han Xiao

commit sha dd687735bb2c569f8dee51ff262d88b3f271b681

refactor(cli): rename silent to quiet (#2122)

Deepankar Mahapatro

commit sha b429d2215475e56a8808b5687db9e90c2d1e133e

feat(schema): generate pydantic based jsonschema for any jina proto (#2121)
* feat(schema): generate pydantic based jsonschema for any jina proto
* docs: fix return type
* docs: fix docstrings
* feat(schema): camel case support for all fields
* test(schema): jina document to pydantic document
* fix(schema): remove proto name check

Florian Hönicke

commit sha caae3f6d9ba9e29583f08a7d721f8a1629e171fa

refactor: crud delete types (#2014)

Joan Fontanals

commit sha f0b6a44045f7fccca05a34020fd42981ce34dc4e

refactor: prepare changes to have batching for every executor (#2110)
Co-authored-by: Nan Wang <nan.wang@jina.ai>

Florian Hönicke

commit sha 24ff01d30f0d6af7ed752d00a545e806108c333b

refactor: merge master

pushed 3 hours ago

pull request comment jina-ai/cookiecutter-jina

feat: use version matching clause

@ddelange The illustration in this PR is really good and well organized. I learned a lot from it! Looking forward to more of your contributions! ❤️

ddelange

comment created 3 hours ago

push event jina-ai/jina

YongxuanZhang

commit sha 282b402b0be2b93dd48e034336bafe9b1829d6aa

fix: test

pushed 3 hours ago

started norchen/terraform-talks

started 3 hours ago

Pull request review comment jina-ai/jina

Improve Formatter

 def format(self, record):
         :param record: A LogRecord object
         :returns: Formatted LogRecord with level-colour MAPPING to add corresponding colour.
         """
-        cr = copy(record)
+        cr = deepcopy(record)

Hi, this is a good question. I think deepcopy will definitely be slower than a plain copy. I need to study the code base more to see how much using deepcopy impacts performance.

chunyuema

comment created 3 hours ago
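One rough way to answer the performance question is a micro-benchmark, sketched below under the assumption that the record's `args` are representative of real log calls. The `make_record` and `time_copies` helpers are hypothetical. The sketch also shows the behavioural difference: a shallow copy shares nested objects such as `args`, while `deepcopy` duplicates them, which only matters if the formatter mutates those nested objects. Absolute timings depend on the machine and on what the record carries, so none are claimed here.

```python
import logging
import timeit
from copy import copy, deepcopy


def make_record() -> logging.LogRecord:
    # Hypothetical record whose args hold a mutable list.
    return logging.LogRecord(
        "demo", logging.INFO, "demo.py", 1, "payload: %s", ([1, 2, 3],), None
    )


def time_copies(record: logging.LogRecord, number: int = 10_000) -> tuple:
    # Total seconds for `number` shallow copies vs. `number` deep copies.
    t_copy = timeit.timeit(lambda: copy(record), number=number)
    t_deep = timeit.timeit(lambda: deepcopy(record), number=number)
    return t_copy, t_deep


record = make_record()
shallow = copy(record)
deep = deepcopy(record)

# The shallow copy shares the very same args tuple (and the list inside it) ...
print(shallow.args is record.args)                                       # True
# ... while the deep copy holds an equal but distinct list.
print(deep.args[0] == record.args[0], deep.args[0] is record.args[0])   # True False

t_copy, t_deep = time_copies(record)
print(f"copy: {t_copy:.4f}s  deepcopy: {t_deep:.4f}s")
```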

Pull request review comment jina-ai/jina

test: refactor rank driver test

 def create_document_to_score():
     # |- matches: (id: 3, parent_id: 1, score.value: 3),
     # |- matches: (id: 4, parent_id: 1, score.value: 4),
     # |- matches: (id: 5, parent_id: 1, score.value: 5),
-    doc = Document()
+    doc = jina_pb2.DocumentProto()

Then we need to change jina_pb2.DocumentProto() to Document() in the other tests too, right?

Yongxuanzhang

comment created 3 hours ago

Pull request review comment jina-ai/jina

test: refactor rank driver test

 def create_document_to_score():
     # |- matches: (id: 3, parent_id: 1, score.value: 3),
     # |- matches: (id: 4, parent_id: 1, score.value: 4),
     # |- matches: (id: 5, parent_id: 1, score.value: 5),
-    doc = Document()
+    doc = jina_pb2.DocumentProto()

OK, I know what to do.

Yongxuanzhang

comment created 3 hours ago

Pull request review comment jina-ai/jina

test: refactor rank driver test

 def create_document_to_score():
     # |- matches: (id: 3, parent_id: 1, score.value: 3),
     # |- matches: (id: 4, parent_id: 1, score.value: 4),
     # |- matches: (id: 5, parent_id: 1, score.value: 5),
-    doc = Document()
+    doc = jina_pb2.DocumentProto()

What's the reason? Which should we use, then? Document() is not correct here, because this line won't set the value: https://github.com/jina-ai/jina/blob/f0b6a44045f7fccca05a34020fd42981ce34dc4e/tests/unit/drivers/rank/test_matches2doc_rank_drivers.py#L61

Yongxuanzhang

comment created 3 hours ago

pull request comment jina-ai/jina

test: refactor rank driver test

Latency summary

Current PR yields:

  • 😶 index QPS at 1251, delta to last 3 avg.: +2%
  • 😶 query QPS at 20, delta to last 3 avg.: -2%

Breakdown

Version   Index QPS   Query QPS
current   1251        20
1.0.7     1224        20

Backed by latency-tracking. Further commits will update this comment.

Yongxuanzhang

comment created 3 hours ago

pull request comment jina-ai/jina

test: refactor rank driver test

Codecov Report

Merging #2127 (b0760db) into refactor-rankers (c93f59a) will decrease coverage by 31.81%. The diff coverage is n/a.

@@                  Coverage Diff                  @@
##           refactor-rankers    #2127       +/-   ##
=====================================================
- Coverage             82.61%   50.79%   -31.82%     
=====================================================
  Files                   208      189       -19     
  Lines                 11182    10516      -666     
=====================================================
- Hits                   9238     5342     -3896     
- Misses                 1944     5174     +3230     
Flag    Coverage Δ
daemon  ?
jina    50.79% <ø> (-32.07%) ↓

Flags with carried-forward coverage won't be shown.

Impacted Files                      Coverage Δ
jina/parsers/ping.py                0.00% <0.00%> (-100.00%) ↓
jina/docker/helper.py               0.00% <0.00%> (-100.00%) ↓
jina/parsers/hub/new.py             0.00% <0.00%> (-100.00%) ↓
jina/parsers/hub/list.py            0.00% <0.00%> (-100.00%) ↓
jina/parsers/hub/build.py           0.00% <0.00%> (-100.00%) ↓
jina/parsers/hub/login.py           0.00% <0.00%> (-100.00%) ↓
jina/parsers/optimizer.py           0.00% <0.00%> (-100.00%) ↓
jina/parsers/hub/pushpull.py        0.00% <0.00%> (-100.00%) ↓
jina/types/request/common.py        0.00% <0.00%> (-100.00%) ↓
jina/types/ndarray/sparse/numpy.py  0.00% <0.00%> (-100.00%) ↓
... and 136 more

Continue to review full report at Codecov.

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data. Powered by Codecov. Last update c93f59a...2a90694.

Yongxuanzhang

comment created 3 hours ago
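The Codecov report above gives a 31.81% decrease in its headline but -31.82% in the diff table. A worked check, assuming coverage is computed as hits / lines and truncated (rounded down) to two decimals, which I believe is Codecov's `round: down` setting, reproduces both numbers: the headline truncates the unrounded difference, whereas the table subtracts the already-truncated percentages. The helper names are illustrative only.

```python
import math
from fractions import Fraction


def coverage_pct(hits: int, lines: int) -> Fraction:
    # Exact coverage percentage, avoiding floating-point error.
    return Fraction(hits * 100, lines)


def truncate_2dp(value: Fraction) -> float:
    # Round down to two decimals (only applied to non-negative values here).
    return math.floor(value * 100) / 100


base = coverage_pct(9238, 11182)   # refactor-rankers (c93f59a)
head = coverage_pct(5342, 10516)   # #2127 (b0760db)

base_shown = truncate_2dp(base)              # 82.61
head_shown = truncate_2dp(head)              # 50.79
drop_headline = truncate_2dp(base - head)    # 31.81, from unrounded values
drop_table = (math.floor(base * 100) - math.floor(head * 100)) / 100  # 31.82

print(base_shown, head_shown, drop_headline, drop_table)
```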

Pull request review comment jina-ai/jina

refactor: refactor rankers, move logic to driver

 def __init__(self, *args, **kwargs):
             **kwargs
         )
 
-    def score(self, query_meta, old_match_scores, match_meta):
-        new_scores = [
-            (match_id, -abs(match_meta[match_id]['length'] - query_meta['length']))
-            for match_id, old_score in old_match_scores.items()
-        ]
-        return np.array(
-            new_scores,
-            dtype=[(self.COL_MATCH_ID, np.object), (self.COL_SCORE, np.float64)],
-        )
+    def score(self, old_match_scores, query_meta, match_meta):
+        new_scores = [-abs(m['length'] - query_meta['length']) for m in match_meta]
+        return new_scores
 
 
 def create_document_to_score():

There may be an error in this test; you are welcome to refactor it.

The refactor is in this PR; I think this makes more sense. We compare the absolute difference between the length of each match and the length of the query. Matches with the same value are sorted by ID in ascending order. https://github.com/jina-ai/jina/pull/2127

JoanFM

comment created 3 hours ago
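The ranking rule described in the comment above (score each match by the negative absolute length difference to the query, then break ties by ascending ID) can be sketched in plain Python. `length_diff_scores` mirrors the body of the refactored `score()` from the hunk; `rank_matches` is a hypothetical helper standing in for whatever sorting the driver actually does, and integer IDs are an assumption for readability (string IDs would sort lexicographically).

```python
from typing import Dict, List, Sequence


def length_diff_scores(
    query_meta: Dict[str, int], match_meta: Sequence[Dict[str, int]]
) -> List[int]:
    # Same expression as the refactored score(): a length closer to the
    # query's gives a higher (less negative) score.
    return [-abs(m["length"] - query_meta["length"]) for m in match_meta]


def rank_matches(match_ids: Sequence[int], scores: Sequence[int]) -> List[int]:
    # Highest score first; equal scores fall back to ascending match ID.
    pairs = sorted(zip(match_ids, scores), key=lambda p: (-p[1], p[0]))
    return [match_id for match_id, _ in pairs]


query_meta = {"length": 3}
match_ids = [4, 2, 3, 5]
match_meta = [{"length": 3}, {"length": 5}, {"length": 1}, {"length": 4}]

scores = length_diff_scores(query_meta, match_meta)
print(scores)                            # [0, -2, -2, -1]
print(rank_matches(match_ids, scores))   # [4, 5, 2, 3]
```

Matches 2 and 3 both score -2, so the ID tie-break decides their order.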

Pull request review comment jina-ai/jina

test: refactor rank driver test

 def create_document_to_score():
     # |- matches: (id: 3, parent_id: 1, score.value: 3),
     # |- matches: (id: 4, parent_id: 1, score.value: 4),
     # |- matches: (id: 5, parent_id: 1, score.value: 5),
-    doc = Document()
+    doc = jina_pb2.DocumentProto()

Avoid using Proto.

Yongxuanzhang

comment created 3 hours ago

PR opened jina-ai/jina

test: refactor rank driver test

refactor test_matches2doc_rank_drivers.py

+13 -16

0 comments

1 changed file

PR created 4 hours ago

create branch jina-ai/jina

branch: refactor-rankers-test

branch created 4 hours ago

Pull request review comment jina-ai/jina

refactor: refactor rankers, move logic to driver

 def __init__(self, *args, **kwargs):
             **kwargs
         )
 
-    def score(self, query_meta, old_match_scores, match_meta):
-        new_scores = [
-            (match_id, -abs(match_meta[match_id]['length'] - query_meta['length']))
-            for match_id, old_score in old_match_scores.items()
-        ]
-        return np.array(
-            new_scores,
-            dtype=[(self.COL_MATCH_ID, np.object), (self.COL_SCORE, np.float64)],
-        )
+    def score(self, old_match_scores, query_meta, match_meta):
+        new_scores = [-abs(m['length'] - query_meta['length']) for m in match_meta]
+        return new_scores
 
 
 def create_document_to_score():

There may be an error in this test; you are welcome to refactor it.

JoanFM

comment created 4 hours ago

Pull request review comment jina-ai/jina

refactor: refactor rankers, move logic to driver

 def __init__(self, *args, **kwargs):
             **kwargs
         )
 
-    def score(self, query_meta, old_match_scores, match_meta):
-        new_scores = [
-            (match_id, -abs(match_meta[match_id]['length'] - query_meta['length']))
-            for match_id, old_score in old_match_scores.items()
-        ]
-        return np.array(
-            new_scores,
-            dtype=[(self.COL_MATCH_ID, np.object), (self.COL_SCORE, np.float64)],
-        )
+    def score(self, old_match_scores, query_meta, match_meta):
+        new_scores = [-abs(m['length'] - query_meta['length']) for m in match_meta]
+        return new_scores
 
 
 def create_document_to_score():

Can't the score value be set this way? In the test, this value is 0, so old_match_scores is [0.0, 0.0, 0.0, 0.0].

JoanFM

comment created 4 hours ago

Pull request review comment jina-ai/jina

refactor: refactor rankers, move logic to driver

 def __init__(self, *args, **kwargs):
             **kwargs
         )
 
-    def score(self, query_meta, old_match_scores, match_meta):
-        new_scores = [
-            (match_id, -abs(match_meta[match_id]['length'] - query_meta['length']))
-            for match_id, old_score in old_match_scores.items()
-        ]
-        return np.array(
-            new_scores,
-            dtype=[(self.COL_MATCH_ID, np.object), (self.COL_SCORE, np.float64)],
-        )
+    def score(self, old_match_scores, query_meta, match_meta):
+        new_scores = [-abs(m['length'] - query_meta['length']) for m in match_meta]
+        return new_scores
 
 
 def create_document_to_score():

In the test, https://github.com/jina-ai/jina/blob/f0b6a44045f7fccca05a34020fd42981ce34dc4e/tests/unit/drivers/rank/test_matches2doc_rank_drivers.py#L61

JoanFM

comment created 4 hours ago

Pull request review comment jina-ai/jina

refactor: refactor rankers, move logic to driver

 def __init__(self, *args, **kwargs):
             **kwargs
         )
 
-    def score(self, query_meta, old_match_scores, match_meta):
-        new_scores = [
-            (match_id, -abs(match_meta[match_id]['length'] - query_meta['length']))
-            for match_id, old_score in old_match_scores.items()
-        ]
-        return np.array(
-            new_scores,
-            dtype=[(self.COL_MATCH_ID, np.object), (self.COL_SCORE, np.float64)],
-        )
+    def score(self, old_match_scores, query_meta, match_meta):
+        new_scores = [-abs(m['length'] - query_meta['length']) for m in match_meta]
+        return new_scores
 
 
 def create_document_to_score():

Where is this code?

JoanFM

comment created 4 hours ago

Pull request review comment jina-ai/jina

refactor: refactor rankers, move logic to driver

 def __init__(self, *args, **kwargs):
             **kwargs
         )
 
-    def score(self, query_meta, old_match_scores, match_meta):
-        new_scores = [
-            (match_id, -abs(match_meta[match_id]['length'] - query_meta['length']))
-            for match_id, old_score in old_match_scores.items()
-        ]
-        return np.array(
-            new_scores,
-            dtype=[(self.COL_MATCH_ID, np.object), (self.COL_SCORE, np.float64)],
-        )
+    def score(self, old_match_scores, query_meta, match_meta):
+        new_scores = [-abs(m['length'] - query_meta['length']) for m in match_meta]
+        return new_scores
 
 
 def create_document_to_score():

1. The *20 reflects changes that have been applied to the Document ID and continuous refactoring.
2. I think it is by test design, but to be honest this test is too complicated.

😂

JoanFM

comment created 4 hours ago

Pull request review comment jina-ai/jina

refactor: refactor rankers, move logic to driver

 def __init__(self, *args, **kwargs):
             **kwargs
         )
 
-    def score(self, query_meta, old_match_scores, match_meta):
-        new_scores = [
-            (match_id, -abs(match_meta[match_id]['length'] - query_meta['length']))
-            for match_id, old_score in old_match_scores.items()
-        ]
-        return np.array(
-            new_scores,
-            dtype=[(self.COL_MATCH_ID, np.object), (self.COL_SCORE, np.float64)],
-        )
+    def score(self, old_match_scores, query_meta, match_meta):
+        new_scores = [-abs(m['length'] - query_meta['length']) for m in match_meta]
+        return new_scores
 
 
 def create_document_to_score():

This line, match.score.value = match_score, doesn't seem to work?

JoanFM

comment created 4 hours ago

pull request comment jina-ai/jina

refactor: refactor rankers, move logic to driver

What's the motivation for moving methods into drivers?

To have a much simpler interface. Every executor has a very neat and clear interface except Chunk2DocRanker.

Also, this way we can hide some of the grouping complexity from the executor developer.

JoanFM

comment created 5 hours ago
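To illustrate the motivation above, here is a hypothetical sketch of the split: a driver-side function groups chunk-level match scores by parent document, so the executor-facing `score_fn` only ever receives a flat list of floats. The `(parent_id, score)` tuples, `group_by_parent` and `apply_ranker` are illustrative stand-ins, not Jina's `Chunk2DocRanker` API.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Tuple

# Hypothetical chunk match: (parent_id, score).
ChunkMatch = Tuple[str, float]


def group_by_parent(matches: List[ChunkMatch]) -> Dict[str, List[float]]:
    # Driver-side bookkeeping: collect chunk scores per parent document.
    groups: Dict[str, List[float]] = defaultdict(list)
    for parent_id, score in matches:
        groups[parent_id].append(score)
    return dict(groups)


def apply_ranker(
    matches: List[ChunkMatch], score_fn: Callable[[List[float]], float]
) -> List[Tuple[str, float]]:
    # The executor-side score_fn sees one flat list per parent; grouping
    # and sorting stay in the driver.
    ranked = [
        (pid, score_fn(scores)) for pid, scores in group_by_parent(matches).items()
    ]
    ranked.sort(key=lambda p: (-p[1], p[0]))
    return ranked


chunk_matches = [("a", 0.2), ("b", 0.9), ("a", 0.7), ("b", 0.1)]
print(apply_ranker(chunk_matches, max))   # [('b', 0.9), ('a', 0.7)]
```

With this split, swapping `max` for any other aggregation changes the ranking without the executor ever touching parent IDs.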

Pull request review comment jina-ai/jina

refactor: refactor rankers, move logic to driver

 def _apply_all(self, docs: 'DocumentSet', *args, **kwargs) -> None:
             )
 
             matches = doc.matches
-            old_match_scores = {match.id: match.score.value for match in matches}
-            match_meta = (
-                {match.id: match.get_attrs(*self._exec_match_keys) for match in matches}
-                if self._exec_match_keys
-                else None
-            )
+            num_matches = len(matches)
+            old_match_scores = []
+            needs_match_meta = self._exec_match_keys is not None
+            match_meta = [] if needs_match_meta else None
+            for match in matches:
+                old_match_scores.append(match.score.value)
+                if needs_match_meta:
+                    match_meta.append(match.get_attrs(*self._exec_match_keys))
 
             # if there are no matches, no need to sort them
             if not old_match_scores:
                 continue
 
-            new_match_scores = self.exec_fn(query_meta, old_match_scores, match_meta)
-            self._sort_matches_in_place(doc, new_match_scores)
+            new_scores = self.exec_fn(old_match_scores, query_meta, match_meta)

Yes, we could call it back to nee

JoanFM

comment created 5 hours ago
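The refactored loop in the `_apply_all` hunk builds two position-aligned lists instead of ID-keyed dicts. Below is a self-contained sketch of that flow, with a hypothetical `Match` dataclass in place of Jina's Document and a plain `return` in place of the loop's `continue`. The hunk stops at the `exec_fn` call, so writing the new scores back and sorting by descending score (ties by ascending ID) is an assumption about what follows, not the actual driver code; `num_matches` from the hunk is omitted.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Optional, Sequence


@dataclass
class Match:
    # Hypothetical stand-in for a match Document.
    id: int
    score: float
    attrs: Dict[str, Any] = field(default_factory=dict)

    def get_attrs(self, *keys: str) -> Dict[str, Any]:
        return {k: self.attrs[k] for k in keys}


def rerank_in_place(
    matches: List[Match],
    query_meta: Dict[str, Any],
    exec_fn: Callable[..., Sequence[float]],
    exec_match_keys: Optional[Sequence[str]],
) -> None:
    old_match_scores: List[float] = []
    needs_match_meta = exec_match_keys is not None
    match_meta: Optional[List[Dict[str, Any]]] = [] if needs_match_meta else None
    for match in matches:
        old_match_scores.append(match.score)
        if needs_match_meta:
            match_meta.append(match.get_attrs(*exec_match_keys))

    # if there are no matches, no need to sort them
    if not old_match_scores:
        return

    new_scores = exec_fn(old_match_scores, query_meta, match_meta)
    # Assumed follow-up: the lists are aligned by position, so zip pairs
    # each match with its new score before sorting.
    for match, new_score in zip(matches, new_scores):
        match.score = new_score
    matches.sort(key=lambda m: (-m.score, m.id))


def length_score(old_match_scores, query_meta, match_meta):
    # Same formula as the refactored score() in the earlier hunk.
    return [-abs(m["length"] - query_meta["length"]) for m in match_meta]


matches = [Match(1, 0.0, {"length": 5}), Match(2, 0.0, {"length": 3}), Match(3, 0.0, {"length": 2})]
rerank_in_place(matches, {"length": 3}, length_score, ["length"])
print([m.id for m in matches])   # [2, 3, 1]
```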

Pull request review comment jina-ai/jina

refactor: refactor rankers, move logic to driver

 def __init__(self, *args, **kwargs):
             **kwargs
         )
 
-    def score(self, query_meta, old_match_scores, match_meta):
-        new_scores = [
-            (match_id, -abs(match_meta[match_id]['length'] - query_meta['length']))
-            for match_id, old_score in old_match_scores.items()
-        ]
-        return np.array(
-            new_scores,
-            dtype=[(self.COL_MATCH_ID, np.object), (self.COL_SCORE, np.float64)],
-        )
+    def score(self, old_match_scores, query_meta, match_meta):
+        new_scores = [-abs(m['length'] - query_meta['length']) for m in match_meta]
+        return new_scores
 
 
 def create_document_to_score():

1. The *20 reflects changes that have been applied to the Document ID and continuous refactoring.
2. I think it is by test design, but to be honest this test is too complicated.

JoanFM

comment created 5 hours ago