Kon Rybnikov (k-bx) · Kyiv, Ukraine · http://twitter.com/ko_bx · Mostly Haskell, Elm, Rust, Agda, Idris, HoTT

informatikr/hedis 302

A Redis client library for Haskell.

haskell-servant/servant-elm 152

Automatically derive Elm functions to query servant webservices

k-bx/boilerpipe 26

Automatically exported from code.google.com/p/boilerpipe

iand675/datadog 13

Haskell DataDog client library

k-bx/aeson-migrate 2

Type-level data migration for aeson values

k-bx/bit-protocol 2

Encode bit protocols not aligned by 8

k-bx/corenlp-parser 2

Launches CoreNLP and parses the JSON output

dfoxfranke/haskell-cld2 1

Haskell bindings to Google's Compact Language Detector 2

k-bx/accelerate-play 1

Something that I couldn't build

k-bx/antilorak 1

Project aimed to improve data from Pantheon

started aristocratos/btop

started time in 20 hours

started joehillen/sysz

started time in 7 days

pull request comment informatikr/hedis

Support cases when response of xstreamInfo does not contain entries.

Thank you @qnikst! Released as 0.15.0.

qnikst

comment created time in 21 days

push event informatikr/hedis

Kon Rybnikov

commit sha 02440c1e6a2d77d5d2cff76eda00d95f820d86fb

0.15.0 changelog

view details

push time in 21 days

push event informatikr/hedis

Kon Rybnikov

commit sha ae271d142662eea430e309bf53cfe3ddb9e888b9

Update GHC 8.10

view details

push time in 21 days

push event informatikr/hedis

Ilya Kopeshtianski

commit sha b647cdaf604b7c8325bdf6ff7b284e5a974612cd

Support cases when the response of xstreamInfo does not contain entries. When a Redis stream does not contain any entries, the 'first-entry' and 'last-entry' fields may be missing. Before this patch hedis failed to read the response and threw a `Bulk Nothing` exception. In this patch we provide a means to support that by introducing an extra constructor to XInfoStreamResponse. We introduced a constructor instead of wrapping the fields in Maybe because this change requires less refactoring and, in addition, maintains the invariant that the first and last entries are either both present or both absent.

view details

Kon Rybnikov

commit sha 3f1199227273dec297fe840fdd04f88ce1b92592

Merge pull request #174 from cheopslab/master Support cases when response of xstreamInfo does not contain entries.

view details

push time in 21 days

PR merged informatikr/hedis

Support cases when response of xstreamInfo does not contain entries.

When a Redis stream does not contain any entries, the 'first-entry' and 'last-entry' fields may be missing. Before this patch hedis failed to read the response and threw a Bulk Nothing exception.

In this patch we provide a means to support that by introducing an extra constructor to XInfoStreamResponse. We introduced a constructor instead of wrapping the fields in Maybe because this change requires less refactoring and, in addition, maintains the invariant that the first and last entries are either both present or both absent.

Fixes #173.

This commit breaks backwards compatibility, but the current solution seems to be the smallest possible breakage we could find.

+29 -2

0 comments

1 changed file

qnikst

pr closed time in 21 days

issue closed informatikr/hedis

Hedis fails to decode the xstreamInfo response when the stream is empty

If you try to run xstreamInfo on a newly created stream, hedis fails with a Bulk Nothing error instead of providing the result.

It happens because there is no data in the first-entry and the last-entry fields of the response.

closed time in 21 days

qnikst

push event k-bx/openrtb-rust

Kon Rybnikov

commit sha d04d9b391e698cebb4b527d616f65d037cbee087

Privacy in response is a string

view details

push time in 25 days

started skoltech-nlp/parallel_detoxification_dataset

started time in a month

pull request comment timescale/timescaledb-toolkit

Aggregate HyperLogLogs into another HyperLogLog

@JLockerman yeah, it's just that this is my first pgx/Postgres thing. I might be ready to upgrade this PR to make this an aggregate, but I don't think I'll have time in the coming days, so if the TimescaleDB team can get to it sooner, that'd be great.

k-bx

comment created time in a month

started SamSchott/maestral

started time in a month

PR opened timescale/timescaledb-toolkit

Aggregate HyperLogLogs into another HyperLogLog

Hi. I'm making a PR implementing https://github.com/timescale/timescaledb-toolkit/issues/202. I'm not sure how useful it would be in terms of documenting and merging into TimescaleDB. It allows you to solve the problem described in the issue, but it does so:

  • With code that's probably rough. I didn't really get into the details of how things work and essentially just copy-pasted a bunch of things, extending a 2-parameter function into a vec-function. This is minor, as it can be improved in this very PR
  • With an approach that doesn't work directly as described in the issue, and requires an additional layer of an array_agg call, which collects the group-by-streamed data into an array (sketched below)

If there is no clear vision or understanding of exactly how to implement this feature properly, I'm happy to document this and propose merging it, since it would help users solve specific problems they will have. If not -- let's discuss how we want to reimplement this properly.
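
A rough illustration of that extra array_agg layer (reusing the table and function names from the companion issue; the exact API in this PR may differ):

-- Hypothetical shape of the workaround: array_agg collects the hourly
-- HLLs into an array, which the aggregate function then merges.
select time_bucket('1 day', bucket),
       toolkit_experimental.hyperloglog_count(
         toolkit_experimental.aggregate(array_agg(val)))
from test_hll_mv_devices_hourly
group by 1;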

Thanks!

+40 -2

0 comments

1 changed file

pr created time in a month

create branch k-bx/timescaledb-toolkit

branch : 202-aggregate-hlls

created branch time in a month

fork k-bx/timescaledb-toolkit

Extension for more hyperfunctions, fully compatible with TimescaleDB and PostgreSQL 📈

https://www.timescale.com/

fork in a month

issue opened timescale/timescaledb-toolkit

Aggregate HyperLogLogs into another HyperLogLog

Hi. There is a common scenario where you have a per-hour materialized view with your HyperLogLog distinct-count data, and you want to further "merge" it into per-day or per-month results.

For example, let's consider an "events" table which has a "device_id" field, and we want a materialized view that shows how many unique device IDs you've had every hour. Furthermore, you'd like to show per-day and per-month unique-device stats on your portal.

The data can look something like this:

CREATE TABLE IF NOT EXISTS test_hll_events (
    time TIMESTAMP WITHOUT TIME ZONE NOT NULL,
    device_id text,
    game_id int8
);
SELECT create_hypertable('test_hll_events','time');
insert into test_hll_events values 
  ('2021-08-01 01:00:00', 'dev01', 1)
, ('2021-08-01 02:00:00', 'dev01', 2)
, ('2021-08-02 01:00:00', 'dev01', 2)
, ('2021-08-02 02:00:00', 'dev03', 3)
;
CREATE MATERIALIZED view test_hll_mv_devices_hourly  WITH (timescaledb.continuous) AS
  select time_bucket('1 hour', time) as bucket, toolkit_experimental.hyperloglog(32, device_id) as val from test_hll_events group by 1 
  with no data;

Now, what we want to see to get a daily view of unique devices is something like this:

select time_bucket('1 day', bucket), toolkit_experimental.hyperloglog_count(toolkit_experimental.aggregate(val)) from test_hll_mv_devices_hourly group by 1 order by time_bucket asc;

This is currently not possible.

I will also make a PR implementing a solution that works but that I think is not good enough (though it is totally usable); that should be discussed separately in the PR.

created time in a month

started qdrant/qdrant

started time in 2 months

started datafuselabs/datafuse

started time in 2 months

issue opened timescale/timescaledb-toolkit

`lru_mapping` function for a fixed-size continuous-aggregate-friendly join analog

I was thinking about adding this myself, but realised that I probably won't have time in the near future, so it's maybe better to spec out the idea first, since I think it could be a very handy one.

The problem I faced yesterday: we have a very small "price" table which maps a country code to its price (US => 0.123, UA => 0.045), and we have a large impressions hypertable. I wanted to build a materialised view that joins the price table and shows a sum of those prices (per hour and per game_id, but that doesn't matter here).

The error I received was that only one hyperfunction is supported in materialized views. I then tried doing this differently (in an inferior way) via a subquery, which errored too.

I'm sure there's a good architectural reason why allowing joins in aggregates is a very hard thing to implement as a general feature. So I wondered: what if, instead, there were a function like this (roughly):

lru_mapping(pricing.price, this.country == pricing.country, '1 mb')

which would create a 1-megabyte LRU cache that essentially stores mappings by country, used like this:

select time_bucket('1 hour') as bucket, game_id, sum(lru_mapping(pricing.price, this.country == pricing.country, '1 mb'))
from impression
where 

I see there are two cases:

  • you know that 1 MB is enough for the whole pricing table; then it'll probably work (although then you don't need an LRU cache, and it could be simplified to in_memory_mapping or something that asks to load the whole table; see the sketch after this list)
  • you have a pricing table that's larger than 1 MB (in the LRU mapping); then I have no idea whether something like this could work in theory, and that's where I'd appreciate feedback
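
For the first case, a minimal sketch of what that simplified variant could look like (in_memory_mapping is purely hypothetical pseudocode, mirroring the lru_mapping sketch above):

-- Hypothetical: load the whole pricing table into memory once,
-- then resolve each impression's price through that in-memory mapping.
select time_bucket('1 hour', time) as bucket, game_id,
       sum(in_memory_mapping(pricing.price, this.country == pricing.country))
from impression
group by 1, 2;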

The function's signature is obviously not going to look like what I sketched, but I think it's good enough to convey the essence of my goal here.

What scale is this useful at?

Large hypertables with small joined lookup tables; especially useful for LRU-friendly traversals. Maybe rethinking this as a two-step aggregate allowing multiple parallelized LRUs would also work.

Drawbacks

It's another way of doing something that already has a standard form (an inner join), and it could confuse people once joins are supported.

Open Questions

  • is this even possible
  • precise api
  • two-part aggregate thing

created time in 2 months

pull request comment timescale/timescaledb-toolkit

Use HyperLogLog++ implementation

@JLockerman thanks for pointing this out, I'm happy to test it out this week on both my local machine and our prod data

JLockerman

comment created time in 2 months

started susam/muboard

started time in 2 months

started yamadapc/augmented-audio

started time in 2 months

pull request comment timescale/timescaledb-toolkit

Use HyperLogLog++ implementation

@JLockerman I'm asking about continuous aggregates. Yeah, I don't mind rebuilding them. I was just making sure that having old (HLL, not HLL++) aggregates would not break TimescaleDB, or whether I'd need to drop those materialized views before upgrading the toolkit.

JLockerman

comment created time in 2 months

started forked-from-1kasper/castle_bravo

started time in 2 months

pull request comment timescale/timescaledb-toolkit

Use HyperLogLog++ implementation

Apologies if this is already covered somewhere or I've missed it in the PR, but what exactly would the upgrade path be for those of us already using HLL? Will it "just work", or are there steps needed? Thanks!

JLockerman

comment created time in 2 months

create branch k-bx/obriz

branch : main

created branch time in 2 months

created repository k-bx/obriz

A small wasm website to crop your photos to be immediately uploadable for print at https://www.fotovramke.com/

created time in 2 months

started kahst/BirdNET

started time in 2 months

started PistonDevelopers/resize

started time in 2 months

started quickwit-inc/quickwit

started time in 2 months