If you are wondering where the data of this site comes from, please visit https://api.github.com/users/jasone/events. GitMemory does not store any data, but only uses NGINX to cache data for a period of time. The idea behind GitMemory is simply to give users a better reading experience.

jasone/Hemlock 0

Programming language

jasone/jemalloc-ci 0

Continuous integration for jemalloc

jasone/skip 0

A programming language to skip the things you have already computed

pull request comment jemalloc/jemalloc

Add m4 to .cirrus.yml.

Doesn't seem to work, even though the log shows m4 being installed:

[3/5] Fetching m4-1.4.18_1,1.txz: .......... done
[4/5] Fetching perl5-5.32.1_1.txz: .......... done
[5/5] Fetching autoconf-wrapper-20131203.txz: . done
Checking integrity... done (0 conflicting)
[1/5] Installing m4-1.4.18_1,1...
[1/5] Extracting m4-1.4.18_1,1: .......... done

Somehow the path is not expected or set properly?

davidtgoldblatt

comment created time in 2 days

PR opened jemalloc/jemalloc

WIP / CI-only
+45 -18

0 comments

5 changed files

pr created time in 2 days

pull request comment KinesisCorporation/SmartSetApps

CI & command-line build

After rebasing against the latest commits on master, all four current apps (Advantage, Master, FSEdge, and Savant Elite) are building successfully!

So, @KinesisCorporation this would be a great time to merge the PR ;)

Are you interested at all in continuous integration?

  • With every push, GitHub will build the apps on each platform.
  • It's a great way to keep track of when your changes inadvertently break another app or another platform.
  • Eventually, if you adopt a practice of using pull requests, you could set a policy so that pull requests require passing builds before they can be merged.
  • Who knows, maybe one day there could even be unit tests!

And you could have one of these build badges, like all the cool kids these days.
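As a sketch of what such a workflow could look like (hypothetical: the matrix entries, action versions, and the lazbuild invocation are illustrative placeholders, not this PR's actual config):

```yaml
# Hypothetical GitHub Actions workflow sketch; the setup action,
# project file name, and build command are placeholders.
name: build
on: [push, pull_request]
jobs:
  build:
    strategy:
      matrix:
        os: [windows-latest, macos-latest, ubuntu-latest]
    runs-on: ${{ matrix.os }}
    steps:
      - uses: actions/checkout@v2
      - name: Install Lazarus
        uses: gcarreno/setup-lazarus@v3   # placeholder setup action
      - name: Build
        run: lazbuild SmartSetApp.lpi     # placeholder project file
```

With something like this in place, every push builds on all three platforms, and a branch protection rule can later require the builds to pass before merging.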

jrr

comment created time in 2 days

issue comment jemalloc/jemalloc

Larson memory test failure with sized allocations

I changed from mallocx to plain malloc, and now I can see the difference. This looks better for dev:

sdallocx (dev), n=10: 2.6773, stddev 0.095249439
free (dev), n=10: 2.8577, stddev 0.019448
sdallocx (5.2.1), n=10: 2.9049, stddev 0.025856226
free (5.2.1), n=10: 2.9007, stddev 0.0190616
jq-rs

comment created time in 2 days

issue comment jemalloc/jemalloc

Larson memory test failure with sized allocations

Awesome, thanks for the fix! There aren't that many sized-allocation tests; larson looked like the easiest one to transform. For a jemalloc user (which I am myself), it is tempting to move to sized deallocation if it is faster. Below are some results from my environment. It seems the performance benefit of sdallocx over free does not show up in this larson test:

Relative time (smaller is better):
sdallocx (dev), n=10: 2.9557, stddev 0.020477901
free (dev), n=10: 2.8577, stddev 0.019448
sdallocx (5.2.1), n=10: 3.1222, stddev 0.017357675
free (5.2.1), n=10: 2.9007, stddev 0.0190616
jq-rs

comment created time in 2 days

Pull request review comment BranchTaken/Hemlock

Implement the scanner.

+open Basis
+include Basis.Rudiments
+open Hmc
+
+let scan_file path =
+  let open Format in
+  let rec fn scanner = begin
+    let scanner', ctoken = Scan.next scanner in
+    let atoken = Scan.ConcreteToken.atoken ctoken in
+    let source = Scan.ConcreteToken.source ctoken in
+    printf "  %a : %s\n"
+      Scan.Source.pp_loc source
+      (Scan.AbstractToken.to_string atoken)
+    ;
+    match atoken with
+    | Scan.AbstractToken.Tok_end_of_input -> ()
+    | _ -> fn scanner'
+  end in
+  printf "@[<h>";
+  let () = match File.of_path ~flag:File.Flag.R_O path with

It looks like I set the default to RW, and that it combines O_RDWR | O_CREAT. I kinda wonder if the default should be R_O.

jasone

comment created time in 2 days

issue comment jemalloc/jemalloc

Larson memory test failure with sized allocations

BTW, poked at this a little more -- here's the fix, in the warmup loop (with size_t tmp_sz; declared in the loop header):

  /* generate a random permutation of the chunks */
  for( cblks=num_chunks; cblks > 0 ; cblks--){
    victim = lran2(&rgen)%cblks ;
    tmp = blkp[victim] ;
    tmp_sz = blksize[victim];
    blkp[victim]  = blkp[cblks-1] ;
    blksize[victim] = blksize[cblks-1];
    blkp[cblks-1] = (char *) tmp ;
    blksize[cblks-1] = tmp_sz;
  }

Also, this is sort of a pessimizing usage of mallocx/sdallocx. The way I'd write the wrappers if I cared about perf would be to make sized_malloc just be return malloc(size);, and make sized_free just be sdallocx(ptr, size, 0); (which matches what your program is probably going to do in practice).
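The suggested wrappers, sketched below. The local sdallocx here is a free()-based stand-in so the sketch builds without jemalloc; in the benchmark itself you would drop it and link with -ljemalloc to get real sized deallocation:

```c
#include <stdlib.h>

/* Stand-in for jemalloc's sdallocx(ptr, size, flags) so this sketch
 * builds without jemalloc; in the benchmark you would drop this and
 * link with -ljemalloc instead. */
static void sdallocx(void *ptr, size_t size, int flags) {
    (void)size;
    (void)flags;
    free(ptr);
}

/* Allocation side: plain malloc, since the caller tracks the size. */
static void *sized_malloc(size_t size) {
    return malloc(size);
}

/* Free side: sized deallocation, matching what a size-tracking
 * application would actually call. */
static void sized_free(void *ptr, size_t size) {
    sdallocx(ptr, size, 0);
}
```

This matches what a real program is likely to do in practice: the size is known at free time, so only the deallocation path needs the sized call.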

jq-rs

comment created time in 3 days

issue closed jemalloc/jemalloc

Larson memory test failure with sized allocations

While testing the jemalloc performance difference between free and sdallocx with a modified Larson test program, I managed to get a seg fault.

Please note that this may well be a user error; I am not familiar with jemalloc internals, so it may be an issue with the introduced test program changes. It may be that the alignment settings are now somehow invalid. Nevertheless, it does not happen with the Ubuntu 20.04 system allocator, or with another allocator with sized free. If it is a real problem, it is perhaps timing-related, as it does not happen with certain printfs even with the sdallocx call. I tested with dev-branch head 73ca4b8ef81d2a54970804182c010b8c95a93587 (cs below). It seems to happen even with stable 5.2.1, so I decided to open this issue. HW is an Intel server with 40 hyperthreaded CPUs; gcc version 9.3.0 (Ubuntu 9.3.0-17ubuntu1~20.04). Please see the details of the problem below:

/usr/bin/c++ -w -O3 -Wno-unused-result -std=gnu++17 -o larson.cpp.o -c larson.cpp
/usr/bin/c++ -w -O3 -rdynamic larson.cpp.o -o larson -lpthread -ljemalloc
./larson 5 8 1000 5000 100 4141 40
Segmentation fault (core dumped)

Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".

Program received signal SIGSEGV, Segmentation fault.
0x00007ffff7d64540 in bitmap_unset (binfo=<optimized out>, bit=<optimized out>, bitmap=<optimized out>) at include/jemalloc/internal/bitmap.h:344
344             *gp = g;
(gdb) bt
#0  0x00007ffff7d64540 in bitmap_unset (binfo=<optimized out>, bit=<optimized out>, bitmap=<optimized out>) at include/jemalloc/internal/bitmap.h:344
#1  arena_dalloc_bin_locked_step (ptr=<optimized out>, slab=<optimized out>, binind=<optimized out>, info=<synthetic pointer>, bin=<optimized out>, arena=<optimized out>, tsdn=0x7ffffffe8960) at include/jemalloc/internal/arena_inlines_b.h:514
#2  tcache_bin_flush_impl (small=true, rem=<optimized out>, binind=0, cache_bin=<optimized out>, tcache=<optimized out>, tsd=0x7ffffffe8960) at src/tcache.c:449
#3  je_tcache_bin_flush_small (tsd=tsd@entry=0x7ffff778b748, tcache=<optimized out>, cache_bin=<optimized out>, binind=binind@entry=17, rem=<optimized out>) at src/tcache.c:511
#4  0x00007ffff7cf738c in tcache_dalloc_small (slow_path=false, binind=17, ptr=0x7ffff2878dc0, tcache=<optimized out>, tsd=<optimized out>) at include/jemalloc/internal/tcache_inlines.h:137
#5  arena_sdalloc (slow_path=<optimized out>, caller_alloc_ctx=<optimized out>, tcache=<optimized out>, size=<optimized out>, ptr=<optimized out>, tsdn=<optimized out>) at include/jemalloc/internal/arena_inlines_b.h:417
#6  isdalloct (slow_path=<optimized out>, alloc_ctx=<optimized out>, tcache=<optimized out>, size=<optimized out>, ptr=<optimized out>, tsdn=<optimized out>) at include/jemalloc/internal/jemalloc_internal_inlines_c.h:133
#7  isfree (slow_path=false, tcache=<optimized out>, usize=<optimized out>, ptr=0x7ffff2878dc0, tsd=<optimized out>) at src/jemalloc.c:2841
#8  je_sdallocx_default (ptr=0x7ffff2878dc0, size=<optimized out>, flags=<optimized out>) at src/jemalloc.c:3736
#9  0x0000555555556232 in runthreads(long, int, int, int, int) ()
#10 0x000055555555545d in main ()

The test code is located here. I hope this information helps you track it down, if it is a real problem. Let me know if you need anything else, thanks!

closed time in 3 days

jq-rs

issue comment jemalloc/jemalloc

Larson memory test failure with sized allocations

Thanks for your quick reply! I'll check the app more closely and close this.

jq-rs

comment created time in 3 days

push event KinesisCorporation/SmartSetApps

KinesisCorporation

commit sha 4cfb5f5d9718f37f9a26552ac456ee54e8904bc7

Update buttons for Mac Big Sur FS Edge/Pro, minor fixes

view details

push time in 3 days

issue comment jemalloc/jemalloc

Larson memory test failure with sized allocations

Also, in general, I'd recommend against using this sort of microbenchmark to inform allocator choices; they tend to emphasize entirely the wrong things. It's fairly easy to write an allocator that performs really well in benchmarks and then falls over in "real" programs. Better to just test directly against the prod workload you're interested in.

jq-rs

comment created time in 3 days

issue comment jemalloc/jemalloc

Larson memory test failure with sized allocations

This looks like application corruption -- if you configure with --enable-debug --enable-log, and run with MALLOC_CONF="log:core" in the environment, it'll dump all allocation requests and results. On my test run, mallocx's logging indicates that the pointer was allocated with size 631, but passed to sdallocx with size 759.
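Concretely, the recipe above (the configure flags and MALLOC_CONF value are as given in the comment; the larson invocation is the one from this issue):

```shell
# Build jemalloc with debugging and allocation logging enabled,
# then run the failing test with logging turned on.
./configure --enable-debug --enable-log
make
MALLOC_CONF="log:core" ./larson 5 8 1000 5000 100 4141 40
```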

jq-rs

comment created time in 3 days

issue opened jemalloc/jemalloc

Larson memory test failure with sized allocations

While testing the jemalloc performance difference between free and sdallocx with a modified Larson test program, I managed to get a seg fault.

Please note that this may well be a user error; I am not familiar with jemalloc internals, so it may be an issue with the introduced test program changes. It may be that the alignment settings are now somehow invalid. Nevertheless, it does not happen with the Ubuntu 20.04 system allocator, or with another allocator with sized free. If it is a real problem, it is perhaps timing-related, as it does not happen with certain printfs even with the sdallocx call. I tested with dev-branch head 73ca4b8ef81d2a54970804182c010b8c95a93587 (cs below). It seems to happen even with stable 5.2.1, so I decided to open this issue. HW is an Intel server with 40 hyperthreaded CPUs; gcc version 9.3.0 (Ubuntu 9.3.0-17ubuntu1~20.04). Please see the details of the problem below:

/usr/bin/c++ -w -O3 -Wno-unused-result -std=gnu++17 -o larson.cpp.o -c larson.cpp
/usr/bin/c++ -w -O3 -rdynamic larson.cpp.o -o larson -lpthread -ljemalloc
./larson 5 8 1000 5000 100 4141 40
Segmentation fault (core dumped)

Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".

Program received signal SIGSEGV, Segmentation fault.
0x00007ffff7d64540 in bitmap_unset (binfo=<optimized out>, bit=<optimized out>, bitmap=<optimized out>) at include/jemalloc/internal/bitmap.h:344
344             *gp = g;
(gdb) bt
#0  0x00007ffff7d64540 in bitmap_unset (binfo=<optimized out>, bit=<optimized out>, bitmap=<optimized out>) at include/jemalloc/internal/bitmap.h:344
#1  arena_dalloc_bin_locked_step (ptr=<optimized out>, slab=<optimized out>, binind=<optimized out>, info=<synthetic pointer>, bin=<optimized out>, arena=<optimized out>, tsdn=0x7ffffffe8960) at include/jemalloc/internal/arena_inlines_b.h:514
#2  tcache_bin_flush_impl (small=true, rem=<optimized out>, binind=0, cache_bin=<optimized out>, tcache=<optimized out>, tsd=0x7ffffffe8960) at src/tcache.c:449
#3  je_tcache_bin_flush_small (tsd=tsd@entry=0x7ffff778b748, tcache=<optimized out>, cache_bin=<optimized out>, binind=binind@entry=17, rem=<optimized out>) at src/tcache.c:511
#4  0x00007ffff7cf738c in tcache_dalloc_small (slow_path=false, binind=17, ptr=0x7ffff2878dc0, tcache=<optimized out>, tsd=<optimized out>) at include/jemalloc/internal/tcache_inlines.h:137
#5  arena_sdalloc (slow_path=<optimized out>, caller_alloc_ctx=<optimized out>, tcache=<optimized out>, size=<optimized out>, ptr=<optimized out>, tsdn=<optimized out>) at include/jemalloc/internal/arena_inlines_b.h:417
#6  isdalloct (slow_path=<optimized out>, alloc_ctx=<optimized out>, tcache=<optimized out>, size=<optimized out>, ptr=<optimized out>, tsdn=<optimized out>) at include/jemalloc/internal/jemalloc_internal_inlines_c.h:133
#7  isfree (slow_path=false, tcache=<optimized out>, usize=<optimized out>, ptr=0x7ffff2878dc0, tsd=<optimized out>) at src/jemalloc.c:2841
#8  je_sdallocx_default (ptr=0x7ffff2878dc0, size=<optimized out>, flags=<optimized out>) at src/jemalloc.c:3736
#9  0x0000555555556232 in runthreads(long, int, int, int, int) ()
#10 0x000055555555545d in main ()

The test code is located here. I hope this information helps you track it down, if it is a real problem. Let me know if you need anything else, thanks!

created time in 3 days

PR opened jemalloc/jemalloc

Add m4 to .cirrus.yml.

This is a precondition for building.

I'm not actually sure this will fix the CI errors, but we'll see.

+1 -1

0 comments

1 changed file

pr created time in 3 days

pull request comment jemalloc/jemalloc

Darwin malloc_size override support proposal.

I think this makes sense overall -- probably, though, we should keep malloc_usable_size (since it's a jemalloc extension on OS X at this point, even if it wasn't really meant to be), and just have both malloc_usable_size and malloc_size (probably both just being thin wrappers around some shared implementation function).
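As a sketch of that shape (the shared helper impl_usable_size and its stub body are illustrative assumptions, not jemalloc's actual internals):

```c
#include <stddef.h>

/* Shared implementation; the body here is a stub for illustration
 * only -- real code would look up the allocation's usable size. */
static size_t impl_usable_size(const void *ptr) {
    (void)ptr;
    return 0;
}

/* jemalloc's existing extension, kept as a thin wrapper. */
size_t malloc_usable_size(const void *ptr) {
    return impl_usable_size(ptr);
}

/* Darwin's introspection entry point, wrapping the same helper. */
size_t malloc_size(const void *ptr) {
    return impl_usable_size(ptr);
}
```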

devnexen

comment created time in 3 days

push event KinesisCorporation/SmartSetApps

KinesisCorporation

commit sha 9b67e3ec48b9ddb1ff7277d7b7c78b4d337febdd

Advantage2 fixes for Mac Big Sur

view details

push time in 4 days

push event KinesisCorporation/SmartSetApps

KinesisCorporation

commit sha fd342210ccd5fed67531ccd35ccac001c58e6526

MGA changes for Mac OS

view details

push time in 4 days

push event KinesisCorporation/SmartSetApps

KinesisCorporation

commit sha d605cf01b7a606b850faef7e4f93b11e3db7e368

Advantage2 fix for tap and hold

view details

KinesisCorporation

commit sha e363e2ac9636c6ae56e12a462ac14c7fc7051e5c

Merge branch 'master' of https://github.com/KinesisCorporation/SmartSetApps

view details

push time in 4 days

issue comment jemalloc/jemalloc

Jemalloc 5.3 estimated release time frame

It would be nice to get it into FreeBSD too. We are going to miss the 13.0 release, but it should be possible to MFC it for the next point release.

LifeIsStrange

comment created time in 6 days

issue comment jemalloc/jemalloc

Jemalloc 5.3 estimated release time frame

If there is the possibility of getting a decent application speedup just by upgrading to a new version of the memory allocator, I am all for it. So an official release, or at least an ETA for one, would be highly appreciated from my side too!

I would guess that is true for many users who ship applications that bundle or link jemalloc. Thanks a lot!

LifeIsStrange

comment created time in 6 days

issue comment jemalloc/jemalloc

Deadlock in child process after fork (source: mutex_pool locks during decay)

Good point. Found a few more cases where the higher-level locks were dropped before acquiring the emap locks, e.g. within extent_recycle_split, where the ecache lock was only held during extent_recycle_extract right before.

So it looks like a quick fix might not be feasible without causing other unwanted side effects. I'll shoot for removing the mutex_pool directly then; capturing the cases here does help though, especially for sanity-checking the extent state transitions. Hopefully I'll be able to turn all emap lock operations into assertions.

chk-jxcn

comment created time in 7 days

issue comment jemalloc/jemalloc

Deadlock in child process after fork (source: mutex_pool locks during decay)

Note that we acquire emap mutexes in other places too, without holding anything else -- e.g. extent_register when called by ecache_alloc_grow

chk-jxcn

comment created time in 8 days

issue comment jemalloc/jemalloc

Deadlock in child process after fork (source: mutex_pool locks during decay)

One possible quick fix is to hold the decay mutex during purging, i.e. move this line up a bit: https://github.com/jemalloc/jemalloc/blob/dev/src/pac.c#L342

I'll check to make sure that holding the decay mutex longer won't block anything important.

chk-jxcn

comment created time in 8 days

issue comment jemalloc/jemalloc

Deadlock in child process after fork (source: mutex_pool locks during decay)

One additional thought: this hasn't popped up in our environment, likely because we have background threads enabled by default, in which case the per-bg-thread mutex is held during decay (and those are included in the pre-fork operations).

chk-jxcn

comment created time in 8 days

issue comment jemalloc/jemalloc

Deadlock in child process after fork

Added some more assertions during my attempt to get rid of the mutex_pool, and discovered the case causing the deadlock here: it's on the decay path, where we drop all higher-level locks (including the decay and ecache locks during pac_decay_stashed) but then invoke extent_dalloc_wrapper, which later triggers coalescing and takes mutex_pool locks. One such example:

#5  0x00007ffff78ec972 in je_emap_lock_edata2 (tsdn=0x7ffff7fe6748, emap=0x7ffff7bc3400 <je_arena_emap_global>, edata1=0x7ffff61ff580, 
    edata2=0x7ffff61ff600) at src/emap.c:53
#6  0x00007ffff78f5c88 in extent_merge_impl (tsdn=0x7ffff7fe6748, pac=0x7ffff6403390, ehooks=0x7ffff64000c0, a=0x7ffff61ff580, 
    b=0x7ffff61ff600, growing_retained=false) at src/extent.c:1272
#7  0x00007ffff78f46da in extent_coalesce (tsdn=0x7ffff7fe6748, pac=0x7ffff6403390, ehooks=0x7ffff64000c0, ecache=0x7ffff6405b20, 
    inner=0x7ffff61ff580, outer=0x7ffff61ff600, forward=true, growing_retained=false) at src/extent.c:841
#8  0x00007ffff78f47e4 in extent_try_coalesce_impl (tsdn=0x7ffff7fe6748, pac=0x7ffff6403390, ehooks=0x7ffff64000c0, 
    ecache=0x7ffff6405b20, edata=0x7ffff61ff580, coalesced=0x0, growing_retained=false, inactive_only=false) at src/extent.c:883
#9  0x00007ffff78f4975 in extent_try_coalesce (tsdn=0x7ffff7fe6748, pac=0x7ffff6403390, ehooks=0x7ffff64000c0, ecache=0x7ffff6405b20, 
    edata=0x7ffff61ff580, coalesced=0x0, growing_retained=false) at src/extent.c:940
#10 0x00007ffff78f4bbc in extent_record (tsdn=0x7ffff7fe6748, pac=0x7ffff6403390, ehooks=0x7ffff64000c0, ecache=0x7ffff6405b20, 
    edata=0x7ffff61ff580, growing_retained=false) at src/extent.c:989
#11 0x00007ffff78f5190 in je_extent_dalloc_wrapper (tsdn=0x7ffff7fe6748, pac=0x7ffff6403390, ehooks=0x7ffff64000c0, edata=0x7ffff61ff580)
    at src/extent.c:1093
#12 0x00007ffff7917a36 in pac_decay_stashed (tsdn=0x7ffff7fe6748, pac=0x7ffff6403390, decay=0x7ffff6406fa0, decay_stats=0x7ffff64009f8, 
    ecache=0x7ffff64033c0, fully_decay=true, decay_extents=0x7fffffffcf18) at src/pac.c:287

I'm still thinking about the fix; the long-term one is still removing the mutex_pool. However, I'll see if a quick one can be done, in case we want an upstream release sooner.

chk-jxcn

comment created time in 8 days

push event jemalloc/jemalloc

David Goldblatt

commit sha 4b8870c7dbfaeea7136a8e0b9f93a2ad85d31a55

SEC: Fix a comment typo.

view details

David Goldblatt

commit sha f47b4c2cd8ed3e843b987ee972d187df45391b69

PAI/SEC: Add a dalloc_batch function. This lets the SEC flush all of its items in a single call, rather than flushing everything at once.

view details

David Goldblatt

commit sha 1944ebbe7f079e79fbeda836dc0333f7a049ac26

HPA: Implement batch deallocation. This saves O(n) mutex locks/unlocks during SEC flush.

view details

David Goldblatt

commit sha bf448d7a5a4c2aecbda7ef11767a75829d9aaf77

SEC: Reduce lock hold times. Only flush a subset of extents during flushing, and drop the lock while doing so.

view details

David Goldblatt

commit sha 480f3b11cd61c1cf37c90d61701829a0cebc98da

Add a batch allocation interface to the PAI. For now, no real allocator actually implements this interface; this will change in subsequent diffs.

view details

David Goldblatt

commit sha cdae6706a6dbe6ab75688ea24a82ef4165c3b0b1

SEC: Use batch fills. Currently, this doesn't help much, since no PAI implementation supports flushing. This will change in subsequent commits.

view details

David Goldblatt

commit sha ce9386370ad67d4b12dc167600080fe17fcf3113

HPA: Implement batch allocation.

view details

David Goldblatt

commit sha fb327368db39a2edca5f9659a70a53bd3bb0ed6c

SEC: Expand option configurability. This change pulls the SEC options into a struct, which simplifies their handling across various modules (e.g. PA needs to forward on SEC options from the malloc_conf string, but it doesn't really need to know their names). While we're here, make some of the fixed constants configurable, and unify naming from the configuration options to the internals.

view details

David Goldblatt

commit sha d21d5b46b607542398440d77b5f5ba22116dad5a

Edata: Move sn into its own field. This lets the bins use a fragmentation avoidance policy that matches the HPA's (without affecting the PAC).

view details

David Goldblatt

commit sha 271a676dcd2d5ff863e8f6996089680f56fa0656

hpdata: early bailout for longest free range. A number of common special cases allow us to stop iterating through an hpdata's bitmap earlier rather than later.

view details

David Goldblatt

commit sha 154aa5fcc102172fcac0e111ff79df9d5ced7973

Use the flat bitmap for eset and psset bitmaps. This is simpler (note that the eset field comment was actually incorrect!), and slightly faster.

view details

David Goldblatt

commit sha 6bddb92ad64ee096a34c0d099736c237d46f1065

psset: Rename "bitmap" to "pageslab_bitmap". It tracks pageslabs. Soon, we'll have another bitmap (to track dirty pages) that we want to disambiguate. While we're here, fix an out-of-date comment.

view details

David Goldblatt

commit sha 0f6c420f83a52c3927cc1c78d155622de05e3ba5

HPA: Make purging/hugifying more principled. Before this change, purge/hugify decisions had several sharp edges that could lead to pathological behavior if tuning parameters weren't carefully chosen. It's the first of a series; this introduces basic "make every hugepage with dirty pages purgeable" functionality, and the next commit expands that functionality to have a smarter policy for picking hugepages to purge. Previously, the dehugify logic would *never* dehugify a hugepage unless it was dirtier than the dehugification threshold. This can lead to situations in which these pages (which themselves could never be purged) would push us above the maximum allowed dirty pages in the shard. This forces immediate purging of any pages deallocated in non-hugified hugepages, which in turn places nonobvious practical limitations on the relationships between various config settings. Instead, we make our preference not to dehugify to purge a soft one rather than a hard one. We'll avoid purging them, but only so long as we can do so by purging non-hugified pages. If we need to purge them to satisfy our dirty page limits, or to hugify other, more worthy candidates, we'll still do so.

view details

David Goldblatt

commit sha 73ca4b8ef81d2a54970804182c010b8c95a93587

HPA: Use dirtiest-first purging. This seems to be practically beneficial, despite some pathological corner cases.

view details

push time in 9 days

PR merged jemalloc/jemalloc

Hpa purging and scalability improvements

This is a few logical pieces of functionality, stacked together per reviewer preference.

The initial (SEC-centered commits) stack of commits reduces SEC shard lock hold times (by dropping the lock while flushing), increases SEC hit rates (by only flushing individual bins in the SEC until we get down below some threshold), and adds batch allocation/deallocation facilities to the hpa shard.

We then have a few miscellaneous cleanups and performance improvements (edata change through the psset bitmap renaming).

The last two commits rework purging; first by doing a "so-so but principled" approach (in which every huge extent with dirty pages goes on a FIFO purging list, segregated by hugification status so that we purge hugified huge extents last). This is mostly to set up the framework with something simple.

In the last commit, we switch to a dirtiest-first selection strategy. We still purge non-hugified extents before hugified ones, but only within a given size class bucket. This seems to work much better in practice. (I think there's lots more tuning to do here, though -- maybe linear dirtiness bucketing is better than exponential, for example.)

+1023 -435

0 comments

31 changed files

davidtgoldblatt

pr closed time in 9 days

Pull request review comment jemalloc/jemalloc

Hpa purging and scalability improvements

 psset_alloc_container_remove(psset_t *psset, hpdata_t *ps) {
 	}
 }
 
+static size_t
+psset_purge_list_ind(hpdata_t *ps) {
+	size_t ndirty = hpdata_ndirty_get(ps);
+	/* Shouldn't have something with no dirty pages purgeable. */
+	assert(ndirty > 0);
+	pszind_t pind = sz_psz2ind(sz_psz_quantize_floor(ndirty << LG_PAGE));
+	/*
+	 * Higher indices correspond to lists we'd like to purge earlier;
+	 * increment the index for the nonhugified hpdatas first, so that we'll
+	 * pick them before picking hugified ones.
+	 */
+	return (size_t)pind * 2 + (hpdata_huge_get(ps) ? 0 : 1);
+}
+
+static void
+psset_maybe_remove_purge_list(psset_t *psset, hpdata_t *ps) {
+	/*
+	 * Remove the hpdata from its purge list (if it's in one).  Even if it's
+	 * going to stay in the same one, by appending it during
+	 * psset_update_end, we move it to the end of its queue, so that we
+	 * purge LRU within a given dirtiness bucket.
+	 */
+	if (hpdata_purge_allowed_get(ps)) {

Sounds good. Thanks!

davidtgoldblatt

comment created time in 9 days

Pull request review comment jemalloc/jemalloc

Hpa purging and scalability improvements

 psset_alloc_container_remove(psset_t *psset, hpdata_t *ps) {
 	}
 }
 
+static size_t
+psset_purge_list_ind(hpdata_t *ps) {
+	size_t ndirty = hpdata_ndirty_get(ps);
+	/* Shouldn't have something with no dirty pages purgeable. */
+	assert(ndirty > 0);
+	pszind_t pind = sz_psz2ind(sz_psz_quantize_floor(ndirty << LG_PAGE));
+	/*
+	 * Higher indices correspond to lists we'd like to purge earlier;
+	 * increment the index for the nonhugified hpdatas first, so that we'll
+	 * pick them before picking hugified ones.
+	 */
+	return (size_t)pind * 2 + (hpdata_huge_get(ps) ? 0 : 1);
+}
+
+static void
+psset_maybe_remove_purge_list(psset_t *psset, hpdata_t *ps) {
+	/*
+	 * Remove the hpdata from its purge list (if it's in one).  Even if it's
+	 * going to stay in the same one, by appending it during
+	 * psset_update_end, we move it to the end of its queue, so that we
+	 * purge LRU within a given dirtiness bucket.
+	 */
+	if (hpdata_purge_allowed_get(ps)) {

Not for purging (I don't think there's, in general, an easy way to check whether a pageslab is in a list in our linked-list implementation). So the "can purge" bool tells us whether something is in the list when it's not being updated.

davidtgoldblatt

comment created time in 9 days

Pull request review comment jemalloc/jemalloc

Hpa purging and scalability improvements

 psset_alloc_container_remove(psset_t *psset, hpdata_t *ps) {
 	}
 }
 
+static size_t
+psset_purge_list_ind(hpdata_t *ps) {
+	size_t ndirty = hpdata_ndirty_get(ps);
+	/* Shouldn't have something with no dirty pages purgeable. */
+	assert(ndirty > 0);
+	pszind_t pind = sz_psz2ind(sz_psz_quantize_floor(ndirty << LG_PAGE));
+	/*
+	 * Higher indices correspond to lists we'd like to purge earlier;
+	 * increment the index for the nonhugified hpdatas first, so that we'll
+	 * pick them before picking hugified ones.
+	 */
+	return (size_t)pind * 2 + (hpdata_huge_get(ps) ? 0 : 1);
+}
+
+static void
+psset_maybe_remove_purge_list(psset_t *psset, hpdata_t *ps) {
+	/*
+	 * Remove the hpdata from its purge list (if it's in one).  Even if it's
+	 * going to stay in the same one, by appending it during
+	 * psset_update_end, we move it to the end of its queue, so that we
+	 * purge LRU within a given dirtiness bucket.
+	 */
+	if (hpdata_purge_allowed_get(ps)) {

Discussed offline; I forgot that the LRU nature of the purge list means that every time a pageslab gets updated, it will also need to be adjusted for the LRU.

Do we have assertions to check for that case, i.e. a pageslab not being on the purge list during an update (I seem to recall we do)?

davidtgoldblatt

comment created time in 9 days