If you are wondering where the data of this site comes from, please visit https://api.github.com/users/oranagra/events. GitMemory does not store any data; it only uses NGINX to cache data for a period of time. The idea behind GitMemory is simply to give users a better reading experience.

sripathikrishnan/redis-rdb-tools 4226

Parse Redis dump.rdb files, Analyze Memory, and Export Data to JSON

RedisGears/RedisGears 193

Dynamic execution framework for your Redis data

RedisLabs/mbdirector 3

Memtier benchmark front-end

oranagra/memtier_benchmark 1

NoSQL Redis and Memcache traffic generation and benchmarking tool.

oranagra/redis-rdb-tools 1

Parse Redis dump.rdb files, Analyze Memory, and Export Data to JSON

oranagra/M2Crypto 0

M2Crypto for 2013+: Python 2.6+, OpenSSL 0.9.8+, SWIG 2.0+, modern POSIX or Windows Vista+

oranagra/redis 0

Redis is an in-memory database that persists on disk. The data model is key-value, but many different kinds of values are supported: Strings, Lists, Sets, Sorted Sets, Hashes, HyperLogLogs, Bitmaps.

oranagra/redis-doc 0

Redis documentation source code for markdown and metadata files, conversion scripts, and so forth

push event redis/redis

Ozan Tezcan

commit sha 3ff56a6dde57b0109d16a186b65a3d2cdb4a7616

Fix crash due to free() call for a string literal in redis-benchmark (#9546)


push time in an hour

PR merged redis/redis

Fix crash due to free() call for a string literal in redis-benchmark

Hostname is freed here, but the code expects an sds string:

https://github.com/redis/redis/blob/9967a53f4c8e9f50a4e018aee469bf8c5a4647e9/src/redis-benchmark.c#L2010
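
The bug pattern is easy to reproduce in isolation: a config field defaults to a string literal, and the cleanup path later hands it to a deallocator that assumes heap memory. A minimal sketch of that pattern (illustrative names, not the actual redis-benchmark code):

#include <stdlib.h>
#include <string.h>

static char *hostname = "127.0.0.1";    /* default points at a string literal */

static void parse_args(int argc, char **argv) {
    if (argc > 1)
        hostname = strdup(argv[1]);     /* heap-allocated only when overridden */
}

int main(int argc, char **argv) {
    parse_args(argc, argv);
    /* ... run the benchmark ... */
    free(hostname);   /* crashes (undefined behavior) when hostname still
                       * points at the literal; the fix is to allocate the
                       * default too, e.g. with strdup()/sdsnew(). */
    return 0;
}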

+2 -1

1 comment

1 changed file

tezc

pr closed time in an hour

pull request comment redis/redis

Fix crash due to free() call for a string literal in redis-benchmark

Broken by #9314

tezc

comment created time in 2 hours


pull request comment redis/redis

[Improve] Use fcntl(fd,F_FULLFSYNC) instead of fsync on OSX platform

I must say it looks odd. If they have a proper way to implement fsync, why leave it in a way that doesn't fulfill its (POSIX) purpose, and introduce another call that does just that?

The Linux man page says: "This includes writing through or flushing a disk cache if present."

I found a similar old suggestion in RocksDB that seemed to have been ignored, and one in golang where it's mentioned that it caused performance degradation.

Luckily MacOS isn't the real target platform for production workloads of Redis, just for development, so I don't think we care much either way.

However, regarding the patch, I think it's odd that random C files now include fcntl.h; calling redis_fsync from another file will cause a compilation error on just one platform. I think we should either include it in config.h, or at the very least add a comment next to the include stating that it's needed for redis_fsync on MacOS.
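
For reference, a sketch of the config.h-style shim the comment is suggesting, so the platform logic and the fcntl.h dependency live in one place (illustrative, not the merged patch):

/* config.h sketch: one portable fsync wrapper. */
#include <unistd.h>
#include <fcntl.h>

#if defined(__APPLE__)
/* On macOS, fsync() may leave data in the drive's cache; F_FULLFSYNC
 * asks the drive to flush it as well. */
#define redis_fsync(fd) fcntl((fd), F_FULLFSYNC)
#elif defined(__linux__)
#define redis_fsync(fd) fdatasync(fd)
#else
#define redis_fsync(fd) fsync(fd)
#endif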

guoxiangCN

comment created time in 3 hours

pull request comment redis/redis

Minor optimize getMaxmemoryState, when server.maxmemory is not set

ok, i did see that HRANDFIELD with RESP3 error before, and it was on the same run that @huangzhw reported later (above), but we also see the same failure here: https://github.com/sundb/redis/runs/3686678983 so that's two occurrences, not three.

huangzhw

comment created time in 5 hours

push event redis/redis

sundb

commit sha 9967a53f4c8e9f50a4e018aee469bf8c5a4647e9

Use dictGetFairRandomKey() for HRANDFIELD,SRANDMEMBER,ZRANDMEMBER (#9538) In the `HRANDFIELD`, `SRANDMEMBER` and `ZRANDMEMBER` commands, there are some strategies that could in some rare cases return an unfair random result. These cases are where a small dict happens to be hashed unevenly. Specifically, when `count*ZRANDMEMBER_SUB_STRATEGY_MUL > size`, using `dictGetRandomKey` to randomize from a dict will result in an unfair random result.


push time in 6 hours

PR merged redis/redis

Use dictGetFairRandomKey() for HRANDFIELD,SRANDMEMBER,ZRANDMEMBER (state:to-be-merged)

In the HRANDFIELD, SRANDMEMBER and ZRANDMEMBER commands, there are some strategies that could in some rare cases return an unfair random result. These cases are where a small dict happens to be hashed unevenly.

Specifically when count*ZRANDMEMBER_SUB_STRATEGY_MUL > size, using dictGetRandomKey to randomize from a dict will result in an unfair random result.
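
The unfairness comes from bucket-first sampling: picking a random bucket and then a random entry in its chain gives entries in short chains a higher probability than entries in long chains. A toy demonstration of the bias (not the dict.c implementation):

#include <stdio.h>
#include <stdlib.h>

/* Two buckets: "A" alone in one, "B","C","D" chained in the other.
 * Sampling "random bucket, then random entry in its chain" returns A
 * ~50% of the time instead of the fair 25%. */
int main(void) {
    const char *keys[4] = {"A", "B", "C", "D"};
    int chain_len[2] = {1, 3};
    long hits[4] = {0};

    srand(42);
    for (long i = 0; i < 100000; i++) {
        int b = rand() % 2;             /* pick a bucket uniformly */
        int e = rand() % chain_len[b];  /* then an entry in its chain */
        hits[b == 0 ? 0 : 1 + e]++;
    }
    for (int k = 0; k < 4; k++)
        printf("%s: %ld\n", keys[k], hits[k]);
    return 0;
}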

+3 -3

12 comments

3 changed files

sundb

pr closed time in 6 hours

pull request comment redis/redis

Minor optimize getMaxmemoryState, when server.maxmemory is not set

ohh wait.. maybe i'm mixing different issues.

  • SRANDMEMBER histogram distribution (the one in #9538)
  • HRANDFIELD with RESP3 (i think i saw it recently and discussed it, but can't recall where)
huangzhw

comment created time in 6 hours

push event redis/redis

Huang Zhw

commit sha bdecbd30df83b838bce1cb06743611abc129b208

Fix test randstring, comparing string and int is wrong. (#9544) This would cause the generated string to contain "\". Fixes a broken change in #8687


push time in 6 hours

PR merged redis/redis

Fix test randstring, comparing string and int is wrong.

This would cause the generated string to contain "\".

fix https://github.com/redis/redis/runs/3688648146

Fixes a broken change in #8687 discussed in #9533

+3 -2

1 comment

1 changed file

huangzhw

pr closed time in 6 hours

pull request comment redis/redis

Minor optimize getMaxmemoryState, when server.maxmemory is not set

I saw it in a different place.. Let's discuss that in #9538

huangzhw

comment created time in 7 hours


pull request comment redis/redis

Minor optimize getMaxmemoryState, when server.maxmemory is not set

@huangzhw where did you see the failure of HRANDFIELD with RESP3? It's discussed in https://github.com/redis/redis/pull/9538 but it's odd that we see it a lot all of a sudden. Unless we're looking at the same execution and not at different failures...

huangzhw

comment created time in 7 hours

pull request comment redis/redis

Minor optimize getMaxmemoryState, when server.maxmemory is not set

I see, so @yoav-steinberg changed the type of rr from int to string and kept the condition I recently introduced that compares it to 92, so this check fails.

For a moment I was thinking that this is odd, since the use of randstring has been there since forever and my fix for char 92 is quite recent, but now I notice that the use of randstring inside list unpacking is new, so that one depends on the fix for char 92.

Anyway, enough chatter.. Can one of you make a quick PR to fix this? (I can't merge my own PRs).

huangzhw

comment created time in 7 hours

issue comment redis/redis

A perfect implementation of a hashtable with a very large number of keys

The idea of moving malloc and free to a background thread is about the original dict with huge allocations, a simple hack. That's why I put the comment in that other PR first.

Ohh got you.. So it's not even about my realloc idea (since in that case we do need it in the main thread immediately), it's just about avoiding a freeze with the current rehash code we have in unstable.

I suppose it's just timing.. Had I seen that comment there a month ago I would have got it, but now after the realloc efficiency discussion here, I had the wrong context.

It still has some complexity (since dict.c is generic and it'll need help from Redis and some global flags).

Anyway, I don't particularly like that idea, and I also wanna solve the memory issue, not just the freeze, so I wanna proceed with the idea discussed here (which is more complex, but local to dict.c).
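
For context, the "simple hack" under discussion is just: when a huge bucket array has to be released, hand the pointer to a background thread instead of free()ing it on the main thread. A generic sketch of that pattern (Redis would route this through its bio.c job queue; the helper below is hypothetical):

#include <pthread.h>
#include <stdlib.h>

static void *free_in_background(void *ptr) {
    free(ptr);   /* the potentially slow free runs off the main thread */
    return NULL;
}

/* Hypothetical helper: a real implementation would push onto a queue
 * consumed by a long-lived worker rather than spawn a thread per call. */
static void async_free(void *ptr) {
    pthread_t tid;
    if (pthread_create(&tid, NULL, free_in_background, ptr) == 0)
        pthread_detach(tid);
    else
        free(ptr);  /* fall back to freeing synchronously */
}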

GuoZhaoran

comment created time in 11 hours

pull request comment redis/redis

All replicas use one global shared replication buffer

i think it's ok to mark comments as resolved as soon as you address them in your local working copy (or even put a TODO in the code), specifically if the comment is a trivial suggestion. i think this is important for you to be able to keep track of what you handled and what's still pending.

i think it would also be a good idea to comment or add an emoji reaction indicating that you decided to take it or skip it. since emojis don't trigger notifications, it's a good idea to use them when the suggestion is trivial or non-critical. but if you decide to reject a more complicated one, it's best to comment back as to why, so we can have a discussion and make sure there's no misunderstanding.

thanks for the great work on this PR. i wanna hammer it and merge it quickly before it gets outdated or we all lose focus and go elsewhere. it was pending for quite a while, but now that we picked it up, let's keep pushing to get it merged soon.

ShooterIT

comment created time in 12 hours

Pull request review comment redis/redis

All replicas use one global shared replication buffer

 void feedReplicationBacklogWithObject(robj *o) {
         len = sdslen(o->ptr);
         p = o->ptr;
     }
-    feedReplicationBacklog(p,len);
+    feedReplicationBuffer(p,len);
 }
 
-int canFeedReplicaReplBuffer(client *replica) {
-    /* Don't feed replicas that only want the RDB. */
-    if (replica->flags & CLIENT_REPL_RDBONLY) return 0;
+/* Generally, we only have one replication buffer block to trim when replication backlog
+ * size exceeds our setting and no replica reference it. But if replica clients
+ * disconnect, we need to free many replication buffer blocks that are referenced.
+ * It would cost much time if there are a lots blocks to free, that will
+ * freeze server, so we trim replication backlog incrementally. */
+void incrementalTrimReplicationBacklog(size_t max_blocks) {
+    serverAssert(server.repl_backlog != NULL);
 
-    /* Don't feed replicas that are still waiting for BGSAVE to start. */
-    if (replica->replstate == SLAVE_STATE_WAIT_BGSAVE_START) return 0;
+    size_t trimmed_blocks = 0, trimmed_bytes = 0;
+    while (server.repl_backlog_histlen > server.repl_backlog_size &&
+           trimmed_blocks < max_blocks)
+    {
+        /* We never trim backlog to less than one block. */
+        if (listLength(server.repl_buffer_blocks) <= 1) break;
+
+        /* Replicas increment the refcount of the first replication buffer block
+         * they refer to, in that case, we don't trim the backlog even if
+         * backlog_histlen exceeds backlog_size. This implicitly makes backlog
+         * bigger than our setting, but makes the master accept partial resync as
+         * much as possible. So that backlog must be the last reference of
+         * replication buffer blocks. */
+        listNode *first = listFirst(server.repl_buffer_blocks);
+        serverAssert(first == server.repl_backlog->ref_repl_buf_node);
+        replBufBlock *fo = listNodeValue(first);
+        if (fo->refcount != 1) break;
+
+        /* We don't try trim backlog if backlog valid size will be lessen than
+         * setting backlog size once we release the first repl buffer block. */
+        if (server.repl_backlog_histlen - (long long)fo->size <=
+            server.repl_backlog_size) break;
+
+        /* Decr refcount and release the first block later. */
+        fo->refcount--;
+        trimmed_bytes += fo->size;
+        trimmed_blocks++;
+
+        /* Go to use next replication buffer block node. */
+        listNode *next = listNextNode(first);
+        server.repl_backlog->ref_repl_buf_node = next;
+        serverAssert(server.repl_backlog->ref_repl_buf_node != NULL);
+        /* Incr reference count to keep the new head node. */
+        ((replBufBlock *)listNodeValue(next))->refcount++;
+
+        /* Remove the node in recorded blocks. */
+        uint64_t encoded_offset = htonu64(fo->repl_offset);
+        raxRemove(server.repl_backlog->blocks_index,
+            (unsigned char*)&encoded_offset, sizeof(uint64_t), NULL);
+
+        /* Delete the first node from global replication buffer. */
+        serverAssert(fo->refcount == 0 && fo->used == fo->size);
+        server.repl_buffer_mem -= (fo->size +
+            sizeof(listNode) + sizeof(replBufBlock));
+        listDelNode(server.repl_buffer_blocks, first);
+    }
+
+    server.repl_backlog_histlen -= trimmed_bytes;
+    /* Set the offset of the first byte we have in the backlog. */
+    server.repl_backlog_off = server.master_repl_offset -
+                              server.repl_backlog_histlen + 1;
+}
 
-    return 1;
+/* Append bytes into the global replication buffer list, replication backlog and
+ * all replica clients use replication buffers collectively, this function replace
+ * 'addReply*', 'feedReplicationBacklog' for replicas and replication backlog,
+ * First we add buffer into global replication buffer block list, and then
+ * update replica / replication-backlog referenced node and block position. */
+void feedReplicationBuffer(char *s, size_t len) {
+    static long long repl_block_id = 0;
+
+    if (server.repl_backlog == NULL) return;
+    server.master_repl_offset += len;
+    server.repl_backlog_histlen += len;
+
+    /* Install write handler for all replicas. */
+    prepareReplicasToWrite();
+
+    size_t start_pos = 0;
+    listNode *start_node = NULL;
+    int add_new_block = 0;
+    replBufBlock *used_last_block = NULL;
+    listNode *ln = listLast(server.repl_buffer_blocks);
+    replBufBlock *tail = ln? listNodeValue(ln): NULL;
+
+    /* Append to tail string when possible. */
+    if (tail && tail->size > tail->used) {
+        start_node = listLast(server.repl_buffer_blocks);
+        start_pos = tail->used;
+        used_last_block = tail;
+        /* Copy the part we can fit into the tail, and leave the rest for a
+         * new node */
+        size_t avail = tail->size - tail->used;
+        size_t copy = avail >= len? len: avail;
+        memcpy(tail->buf + tail->used, s, copy);
+        tail->used += copy;
+        s += copy;
+        len -= copy;
+    }
+    if (len) {
+        /* Create a new node, make sure it is allocated to at
+         * least PROTO_REPLY_CHUNK_BYTES */
+        size_t usable_size;
+        size_t size = len < PROTO_REPLY_CHUNK_BYTES? PROTO_REPLY_CHUNK_BYTES: len;
+        tail = zmalloc_usable(size + sizeof(replBufBlock), &usable_size);
+        /* Take over the allocation's internal fragmentation */
+        tail->size = usable_size - sizeof(replBufBlock);
+        tail->used = len;
+        tail->refcount = 0;
+        tail->repl_offset = server.master_repl_offset - tail->used + 1;
+        tail->id = repl_block_id++;
+        memcpy(tail->buf, s, len);
+        listAddNodeTail(server.repl_buffer_blocks, tail);
+        /* We also count the list node memory into replication buffer memroy. */
+        server.repl_buffer_mem += (usable_size + sizeof(listNode));
+        add_new_block = 1;
+        if (start_node == NULL) {
+            start_node = listLast(server.repl_buffer_blocks);
+            start_pos = 0;
+        }
+    }
+
+    /* For output buffer of replicas. */
+    listIter li;
+    listRewind(server.slaves,&li);
+    while((ln = listNext(&li))) {
+        client *slave = ln->value;
+        if (!canFeedReplicaReplBuffer(slave)) continue;
+
+        /* Update shared replication buffer start position. */
+        if (slave->ref_repl_buf_node == NULL) {
+            slave->ref_repl_buf_node = start_node;
+            slave->ref_block_pos = start_pos;
+            /* Only increase the start block reference count. */
+            ((replBufBlock *)listNodeValue(start_node))->refcount++;
+        }
+
+        /* Check output buffer limit only when add new block. */
+        if (add_new_block) closeClientOnOutputBufferLimitReached(slave, 1);

p.s. we can't enforce one config to be greater than the other in config.c, because we don't know in which order they'll be changed. what we can / should do, as Yoav suggested in another comment, is that when we consider the replica-buffer limit, we max() it with the backlog size, so that setting it to a lower value is meaningless. plus a big comment in the code explaining why it's invalid (psync will succeed and then drop the replica), and maybe a mention of that fact in redis.conf.
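
In code form, the suggested enforcement amounts to a one-line clamp at the place the limit is checked (a sketch under assumed names, not the PR's actual code):

#include <stdint.h>

/* A replica output-buffer limit below the backlog size is invalid:
 * psync would succeed and the replica would then be dropped for
 * exceeding its buffer while the shared backlog still holds the data.
 * So the backlog size acts as a floor for the effective limit. */
static uint64_t effective_obuf_limit(uint64_t configured_limit,
                                     uint64_t repl_backlog_size) {
    return configured_limit < repl_backlog_size ? repl_backlog_size
                                                : configured_limit;
}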

ShooterIT

comment created time in 12 hours


Pull request review comment redis/redis

All replicas use one global shared replication buffer

 void createReplicationBacklog(void) {
 
 /* This function is called when the user modifies the replication backlog
  * size at runtime. It is up to the function to both update the
- * server.cfg_repl_backlog_size and to resize the buffer and setup it so that
+ * server.repl_backlog_size and to resize the buffer and setup it so that
  * it contains the same data as the previous one (possibly less data, but
  * the most recent bytes, or the same data and more free space in case the
  * buffer is enlarged). */
 void resizeReplicationBacklog(long long newsize) {
     if (newsize < CONFIG_REPL_BACKLOG_MIN_SIZE)
         newsize = CONFIG_REPL_BACKLOG_MIN_SIZE;
-    if (server.cfg_repl_backlog_size == newsize) return;
+    if (server.repl_backlog_size == newsize) return;
 
-    server.cfg_repl_backlog_size = newsize;
-    if (server.repl_backlog != NULL) {
-        /* What we actually do is to flush the old buffer and realloc a new
-         * empty one. It will refill with new data incrementally.
-         * The reason is that copying a few gigabytes adds latency and even
-         * worse often we need to alloc additional space before freeing the
-         * old buffer. */
-        zfree(server.repl_backlog);
-        server.repl_backlog = zmalloc(server.cfg_repl_backlog_size);
-        server.repl_backlog_size = zmalloc_usable_size(server.repl_backlog);
-        server.repl_backlog_histlen = 0;
-        server.repl_backlog_idx = 0;
-        /* Next byte we have is... the next since the buffer is empty. */
-        server.repl_backlog_off = server.master_repl_offset+1;
-    }
+    server.repl_backlog_size = newsize;
+    if (server.repl_backlog)
+        incrementalTrimReplicationBacklog(TRIM_REPL_BUF_BLOCKS_PER);
 }
 
 void freeReplicationBacklog(void) {
     serverAssert(listLength(server.slaves) == 0);
+    if (server.repl_backlog == NULL) return;
+
+    /* Decrease the start buffer node reference count. */
+    if (server.repl_backlog->ref_repl_buf_node) {
+        replBufBlock *o = listNodeValue(
+            server.repl_backlog->ref_repl_buf_node);
+        serverAssert(o->refcount == 1); /* Last reference. */
+        o->refcount--;
+    }
+    freeReplicationBacklogRefMemAsync(server.repl_buffer_blocks,
+                           server.repl_backlog->blocks_index);
+    server.repl_buffer_size = 0;
+    server.repl_buffer_blocks = listCreate();
+    listSetFreeMethod(server.repl_buffer_blocks, (void (*)(void*))zfree);
     zfree(server.repl_backlog);
     server.repl_backlog = NULL;
 }
 
-/* Add data to the replication backlog.
- * This function also increments the global replication offset stored at
- * server.master_repl_offset, because there is no case where we want to feed
- * the backlog without incrementing the offset. */
-void feedReplicationBacklog(void *ptr, size_t len) {
-    unsigned char *p = ptr;
+int canFeedReplicaReplBuffer(client *replica) {
+    /* Don't feed replicas that only want the RDB. */
+    if (replica->flags & CLIENT_REPL_RDBONLY) return 0;

i always prefer smaller diffs and less damage to the blame log. try the re-order i suggested above and compare to the merge-base, see if that re-order makes a smaller diff (don't change unchanged lines). if it works, keep it; if not, let's forget about it.

ShooterIT

comment created time in 12 hours


Pull request review comment redis/redis

All replicas use one global shared replication buffer

 proc test_slave_buffers {test_name cmd_count payload_len limit_memory pipeline}
                     $rd_master setrange key:0 0 [string repeat A $payload_len]
                 }
                 for {set k 0} {$k < $cmd_count} {incr k} {
-                    #$rd_master read
+                    $rd_master read

ohh, i'm reading it backwards. you didn't comment this line, you uncommented it. why was it commented??? looks like it was commented in the original commit that added that test: bf680b6f8c. i have no clue why; probably a mistake.

ShooterIT

comment created time in 12 hours


Pull request review comment redis/redis

All replicas use one global shared replication buffer

 void loadDataFromDisk(void) {
         rdbSaveInfo rsi = RDB_SAVE_INFO_INIT;
         errno = 0; /* Prevent a stale value from affecting error checking */
         int rdb_flags = RDBFLAGS_NONE;
-        if (iAmMaster()) {
+        if (iAmMaster() && access(server.rdb_filename, F_OK) == 0) {

ok. one could argue that this test is over-sensitive. if the test was using a replicated pair of servers, we would have had to change the test. but since the test isn't using replication, and isn't loading an RDB (like many others), let's try to make the memory usage more predictable for them. i.e. in the past (before Soloestoy's recent change), there was no replication backlog in these tests. then the backlog was added, but its memory consumption was constant, so this test didn't notice. now its memory consumption gradually grows, and that breaks the test. let's make sure to avoid creating the backlog in that case, or create it and then destroy it if not needed.

but i don't like the current solution. the better one IMHO is that if we didn't set server.master_repl_offset from rsi.repl_offset (or didn't load an rdb file at all), then we should destroy the backlog.
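
A sketch of that proposal's shape (hypothetical names; the flag would be set wherever rsi.repl_offset is actually consumed):

#include <stddef.h>

struct backlog;   /* opaque stand-in for the replication backlog */

/* Destroy a pre-created backlog unless loading restored a replication
 * offset; this keeps memory usage predictable for tests that never use
 * replication. */
static void maybe_drop_backlog(struct backlog **bl,
                               int repl_offset_restored,
                               void (*destroy)(struct backlog *)) {
    if (*bl != NULL && !repl_offset_restored) {
        destroy(*bl);
        *bl = NULL;
    }
}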

ShooterIT

comment created time in 12 hours


issue comment redis/redis

A perfect implementation of a hashtable with a very large number of keys

@zuiderkwast maybe I got lost, but this whole discussion about realloc being slow was related to the previous design (the one in which I suggested a realloc of x2 rather than a malloc of x2 and memory usage of x3), but in the new design suggested here there's no huge realloc (and even if realloc was fast, that idea is still better due to gradual memory increase rather than sudden).

P.S. even if we went with that old idea, or targeted the smaller realloc of the new idea, I don't see how we can send the realloc to be executed in the bg thread (considering a very slow realloc), since we need the ht to serve clients. If I understand the suggestion above, it's only valid for a realloc that doesn't take that long (up to a few milliseconds, to be run in parallel to network / io), and considering that this is a rare event, I don't think the complexity would be worth it.
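
To spell out the memory argument: doubling via malloc+copy always holds the old and new tables at once (a 3x peak), while realloc can often extend in place and keep the peak at 2x. A toy sketch of the two strategies (illustrative, not dict.c):

#include <stdlib.h>
#include <string.h>

/* Peak is old + 2*old = 3x: both tables are live during the copy. */
void *grow_by_malloc(void *old, size_t old_sz) {
    void *bigger = malloc(old_sz * 2);
    if (bigger == NULL) return NULL;
    memcpy(bigger, old, old_sz);
    free(old);
    return bigger;
}

/* Peak is often just 2x: the allocator may extend the block in place,
 * and only degrades to the 3x move-and-copy case when it can't. */
void *grow_by_realloc(void *old, size_t old_sz) {
    return realloc(old, old_sz * 2);
}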

GuoZhaoran

comment created time in 12 hours

pull request comment redis/redis

Use dictGetFairRandomKey() for HRANDFIELD,SRANDMEMBER,ZRANDMEMBER

So to sum up, we conclude that siphash behaves ok, but since this test doesn't use a lot of fields, once in a blue moon it happens that many of them fall into the same bucket, and that fails the test if we don't use fair random. What's odd is that we've had that test for quite a while, it never failed, and all of a sudden we've seen several failures in close proximity.

Is that about right?

sundb

comment created time in 13 hours

pull request comment redis/redis

Use dictGetFairRandomKey() for HRANDFIELD,SRANDMEMBER,ZRANDMEMBER

Ohh, you mean a while loop in the TCL code.. I thought you meant a while loop in bash.

sundb

comment created time in 13 hours

pull request comment redis/redis

Use dictGetFairRandomKey() for HRANDFIELD,SRANDMEMBER,ZRANDMEMBER

@sundb but using --loop also spins up a new server for each test.

sundb

comment created time in 15 hours

pull request comment redis/redis

Use dictGetFairRandomKey() for HRANDFIELD,SRANDMEMBER,ZRANDMEMBER

@sundb can you elaborate? this seems much more severe than just hrandomkey: if keys are falling into one hash slot, it causes severe performance issues.

Do you know why siphash gives different results in the two loop approaches?

Does the bad distribution happen a lot? Is it because we use similar field names?

Do you happen to know why this started to appear recently? (what did we change that caused it?)

sundb

comment created time in 16 hours