profile
viewpoint

graydon/awesome-c 2

A curated list of awesome C frameworks, libraries and software.

graydon/awesome-scalability 1

The Patterns of Scalable, Reliable, and Performant Large-Scale Systems

davidungar/swift 0

The Swift Programming Language

graydon/1ml 0

1ML prototype interpreter

graydon/8086tiny 0

Official repository for 8086tiny: a tiny PC emulator/virtual machine

graydon/adapton.rust 0

Nominal Adapton in Rust

graydon/aliceml 0

A functional programming language based on Standard ML, extended with support for concurrent, distributed, and constraint programming

graydon/alternative-internet 0

A collection of interesting new networks and tech aiming at decentralisation (in some form).

graydon/antidote 0

SyncFree Reference Platform

graydon/art-rs 0

Adaptive Radix Tree in Rust

startedgoogle/libnop

started time in 16 hours

startedu2zv1wx/neut

started time in 16 hours

push eventgraydon/concorde

Graydon Hoare

commit sha e528f618625cf7faffff1b745c672594650081e4

add history variables for model checking

view details

Graydon Hoare

commit sha 20ebd8d04719a2ce2182ce7f5804d4b97bd6bf33

expand stateright tests to nontrivial properties

view details

push time in 2 days

pull request commentstellar/stellar-core

Bump to fmt library 6.2.1

r+ 25b0057

MonsieurNicolas

comment created time in 4 days

pull request commentstellar/stellar-core

clarify build instructions for Ubuntu 16.04

r+ 2cab93b

MonsieurNicolas

comment created time in 4 days

pull request commentstateright/stateright

Add support for "eventually" (at least acyclic liveness) properties.

(@jonnadal this is the only other change I had pending in the near term!)

graydon

comment created time in 4 days

PR opened stateright/stateright

Add support for "eventually" (at least acyclic liveness) properties.

This PR adds a simple form of liveness-proerty checking -- "eventually" properties. They're implemented by assigning a bit-number to each such property and propagating a compact bitset (IdSet from the id-set crate) representing properties not-yet-established on a given path, along with the state, in the queue of states. Any property still not-yet-established when we arrive at a terminal state is marked as a counterexample discovery, as with "always" properties, and reported the same way.

These properties suffer from two possible false negatives:

  • Cyclic paths are not disambiguated from DAG joins, so neither is considered terminal. If an eventually property is not met by the cycle-closing edge of a cyclic path, it is forgotten.
  • DAG joins with different sets of liveness properties at the join point are not differentiated. If an eventually property was met on the first visit through a state but not on the second visit that joins it, it is not reported.

In theory both of these could be converted to a false positive by considering revisited states as terminal. I'd be open to the argument that that's more-useful / more-correct. Not sure what's more annoying to users in practice; the code is anyways factored in such a way that you'd only have to add one more call to note_terminal_state to switch to that way, if you prefer.

+144 -36

0 comment

4 changed files

pr created time in 4 days

push eventgraydon/stateright

Graydon Hoare

commit sha c009317235ed4c8d4eaf152dd35f74dea2ab79f5

Add support for "eventually" (at least acyclic liveness) properties.

view details

push time in 4 days

create barnchgraydon/stateright

branch : eventually-properties

created branch time in 4 days

pull request commentstateright/stateright

Add duplicating_network flag.

@jonnadal I do have one more I'll file presently! then I'll pause until you land your refactoring :)

graydon

comment created time in 4 days

push eventgraydon/stateright

Graydon Hoare

commit sha 422d92fda2e7f73f4a98f70f4709b8ba4cab564d

Add duplicating_network flag.

view details

push time in 5 days

fork graydon/whitenoise-core

Differential privacy validator and runtime

fork in 5 days

startedopendifferentialprivacy/whitenoise-core

started time in 5 days

push eventstellar/medida

marta-lokhova

commit sha b804108bab235827149f4d1c39027fec301edf14

fix reporting bug

view details

Graydon Hoare

commit sha 2422f03f1892d250d6a441686b774e266f8e610a

Merge pull request #19 from marta-lokhova/console_reporter_fix fix reporting bug

view details

push time in 7 days

PR merged stellar/medida

fix reporting bug
+5 -4

1 comment

1 changed file

marta-lokhova

pr closed time in 7 days

pull request commentstellar/medida

fix reporting bug

Thanks!

marta-lokhova

comment created time in 7 days

pull request commentstellar/stellar-core

Bug 2304 scheduling cleanup

@MonsieurNicolas I believe this is now ready to land -- squashed as tidily as I can make it, and passes tests locally.

graydon

comment created time in 7 days

push eventgraydon/stellar-core

Graydon Hoare

commit sha 16e6a9f90ca1679a575a1ce440ed3d37fcfe2028

Add lib/util/finally.h

view details

Graydon Hoare

commit sha fbae26484482f9cfefc872a7d31039423fe0d44a

Switch VirtualClock::time_point to steady_clock, add system_time_point.

view details

Graydon Hoare

commit sha eda64ca7184f43f6422cdfcc602cefce060ebd1e

Add Scheduler class.

view details

Graydon Hoare

commit sha 5085ad0b7b42935c74b5e5a38d73ffa5f85d7d29

Rewrite VirtualClock::crank to use Scheduler, eliminate YieldTimer.

view details

MonsieurNicolas

commit sha 956de65e32930b61881c7cf0b5151c620de22340

Update Visual Studio project

view details

Graydon Hoare

commit sha 510b6a328fcb85ba37c6b915706388741b43852f

const-ify some Timer methods.

view details

Graydon Hoare

commit sha ea203659606b20f34a647f2d07cac22ec4c4ed86

clang-format PendingEnvelopes, somehow it drifted

view details

push time in 7 days

push eventgraydon/stellar-core

MonsieurNicolas

commit sha 62149d534edd20742706c245899c6dbeb96b10b8

bump sqlite to 3.30.1

view details

MonsieurNicolas

commit sha a2f9cddbbfb97b1a6c50e39b2b447b2891923d38

add an explicit error message when running into a crypto error

view details

Latobarita

commit sha b1f7d30e607c8231ead07f6d65f2880897daf049

Merge pull request #2524 from MonsieurNicolas/bumpSqlite3.30.1 bump sqlite to 3.30.1 Reviewed-by: MonsieurNicolas

view details

Latobarita

commit sha 48700ffc02026994b5b2fba02e470a0c8f19babc

Merge pull request #2532 from MonsieurNicolas/cryptoCleanup add an explicit error message when running into a crypto error Reviewed-by: jonjove

view details

MonsieurNicolas

commit sha 826c2621297d2baafa3f7746811af0f000803a53

herder: adjust clamp down (was overly aggressive) this only effects fresh networks

view details

MonsieurNicolas

commit sha b28fca6fa71d11fd644d4e0d5e8de9bdc3a764d3

herder: schedule `ledgerClosed` instead of calling "inline" this avoids closing many ledgers that may sit in the SCP queue all at once

view details

Latobarita

commit sha bd7be0105ac28e8149797887cbf51f10237e0e82

Merge pull request #2530 from MonsieurNicolas/avoidAgressiveProcessingSCPQueue Avoid too aggressive processing of scp queue Reviewed-by: marta-lokhova

view details

marta-lokhova

commit sha 5d01f39cb75d073f82bd2a0a72163828d04e73fc

Clear metrics when running simulation replay

view details

MonsieurNicolas

commit sha fdca00b3a6d75a0ab0fdbb29027fad40572abea8

update MinGW instructions

view details

Latobarita

commit sha 69a5a88a8aff6e28f51c5e9a8d71eeb25177d401

Merge pull request #2523 from marta-lokhova/clear_metrics_simulation Clear metrics when running simulation replay Reviewed-by: MonsieurNicolas

view details

MonsieurNicolas

commit sha 52ebdcd13f947141b64be2c0b4aaf1dd7d85d0f6

Merge pull request #2534 from stellar/minGWupdate update MinGW instructions

view details

Graydon Hoare

commit sha 49ea0f052c93a0df168a3112018c5c95311125e1

Add scripts/Dockerfile.testing for one-off test images

view details

Graydon Hoare

commit sha 7e1b7243ea916cffc84191271a4d3eb5600ea7ac

const-ify methods on RandomEvictionCache

view details

Graydon Hoare

commit sha e9c4cdee07dcb789c53a0b4997af976e613aa8db

add functional to RandomEvictionCache

view details

Graydon Hoare

commit sha 23effdb4ded65be50f084732adae8a20a6a81fa0

Add Scheduler class.

view details

Graydon Hoare

commit sha f20f078cd560eb9c0551f1c30f5c43dbb813f462

Rewrite VirtualClock::crank and clean up supporting machinery.

view details

Graydon Hoare

commit sha f2f6b1491e07a90cbdb6d86efa5ced01f93c5726

Switch Application.postOnMainThread callers to new interface.

view details

Graydon Hoare

commit sha 72a545f25c3ccd1b149ae18cd40754110a660565

Replace YieldTimer with simpler, uniform-quantum VirtualClock::shouldYield.

view details

Graydon Hoare

commit sha 80f11b35dc068e4bea51d180d9dc6609ee821f20

Add FLOOD_MESSAGE_DEADLINE_MS config var.

view details

Graydon Hoare

commit sha c19b62451e9fe07d33cb680b70a36114a94ea640

Rename serviceTime to totalService

view details

push time in 7 days

push eventgraydon/stellar-core

MonsieurNicolas

commit sha 62149d534edd20742706c245899c6dbeb96b10b8

bump sqlite to 3.30.1

view details

MonsieurNicolas

commit sha a2f9cddbbfb97b1a6c50e39b2b447b2891923d38

add an explicit error message when running into a crypto error

view details

Latobarita

commit sha b1f7d30e607c8231ead07f6d65f2880897daf049

Merge pull request #2524 from MonsieurNicolas/bumpSqlite3.30.1 bump sqlite to 3.30.1 Reviewed-by: MonsieurNicolas

view details

Latobarita

commit sha 48700ffc02026994b5b2fba02e470a0c8f19babc

Merge pull request #2532 from MonsieurNicolas/cryptoCleanup add an explicit error message when running into a crypto error Reviewed-by: jonjove

view details

MonsieurNicolas

commit sha 826c2621297d2baafa3f7746811af0f000803a53

herder: adjust clamp down (was overly aggressive) this only effects fresh networks

view details

MonsieurNicolas

commit sha b28fca6fa71d11fd644d4e0d5e8de9bdc3a764d3

herder: schedule `ledgerClosed` instead of calling "inline" this avoids closing many ledgers that may sit in the SCP queue all at once

view details

Latobarita

commit sha bd7be0105ac28e8149797887cbf51f10237e0e82

Merge pull request #2530 from MonsieurNicolas/avoidAgressiveProcessingSCPQueue Avoid too aggressive processing of scp queue Reviewed-by: marta-lokhova

view details

marta-lokhova

commit sha 5d01f39cb75d073f82bd2a0a72163828d04e73fc

Clear metrics when running simulation replay

view details

MonsieurNicolas

commit sha fdca00b3a6d75a0ab0fdbb29027fad40572abea8

update MinGW instructions

view details

Latobarita

commit sha 69a5a88a8aff6e28f51c5e9a8d71eeb25177d401

Merge pull request #2523 from marta-lokhova/clear_metrics_simulation Clear metrics when running simulation replay Reviewed-by: MonsieurNicolas

view details

MonsieurNicolas

commit sha 52ebdcd13f947141b64be2c0b4aaf1dd7d85d0f6

Merge pull request #2534 from stellar/minGWupdate update MinGW instructions

view details

Graydon Hoare

commit sha 49ea0f052c93a0df168a3112018c5c95311125e1

Add scripts/Dockerfile.testing for one-off test images

view details

Graydon Hoare

commit sha 7e1b7243ea916cffc84191271a4d3eb5600ea7ac

const-ify methods on RandomEvictionCache

view details

Graydon Hoare

commit sha e9c4cdee07dcb789c53a0b4997af976e613aa8db

add functional to RandomEvictionCache

view details

Graydon Hoare

commit sha 23effdb4ded65be50f084732adae8a20a6a81fa0

Add Scheduler class.

view details

Graydon Hoare

commit sha f20f078cd560eb9c0551f1c30f5c43dbb813f462

Rewrite VirtualClock::crank and clean up supporting machinery.

view details

Graydon Hoare

commit sha f2f6b1491e07a90cbdb6d86efa5ced01f93c5726

Switch Application.postOnMainThread callers to new interface.

view details

Graydon Hoare

commit sha 72a545f25c3ccd1b149ae18cd40754110a660565

Replace YieldTimer with simpler, uniform-quantum VirtualClock::shouldYield.

view details

Graydon Hoare

commit sha 80f11b35dc068e4bea51d180d9dc6609ee821f20

Add FLOOD_MESSAGE_DEADLINE_MS config var.

view details

Graydon Hoare

commit sha c19b62451e9fe07d33cb680b70a36114a94ea640

Rename serviceTime to totalService

view details

push time in 7 days

Pull request review commentstellar/stellar-core

Bug 2304 scheduling cleanup

 WorkScheduler::scheduleOne(std::weak_ptr<WorkScheduler> weak)         return;     } -    // Note: at the moment we're using a timer (when not in STANDALONE mode) to-    // throttle _down_ the amount of "work" we do statically: we set a yield-    // timer to 1ms so that in case we're a long-running step, we don't hog the-    // CPU and incur IO debt.-    //-    // Long-running work steps are a problem in practice: specifically the-    // bucket-apply work tends to have a long pause when it commits -- longer-    // than the presumed quantum in the central IO-vs-RR-queues time-slicing-    // scheme -- and this can cause a node that is catching up to starve IO and-    // desync.-    //-    // In other words, while we _could_ avoid trying to explicitly time-slice-    // ourselves here, and instead post ourselves to the RR queues, the problem-    // with this is that the RR queues do not currently account for IO debt very-    // well and we may well go "over budget" every time we get scheduled. The RR-    // queues really need to track how much a callback exceeds its yield-timer-    // loop and treat that as over-budget debt to pay off in subsequent-    // iterations, to maintain a static schedule. This is TBD.-    //-    // See-    // https://github.com/stellar/stellar-core/issues/2304#issuecomment-614953677--    auto cb = [weak]() {-        auto innerSelf = weak.lock();-        if (!innerSelf)-        {-            return;-        }+    self->mScheduled = true;+    self->mApp.postOnMainThread(+        [weak]() {+            auto innerSelf = weak.lock();+            if (!innerSelf)+            {+                return;+            } -        try-        {-            // loop as to perform some meaningful amount of work-            YieldTimer yt(innerSelf->mApp.getClock(),-                          std::chrono::milliseconds(1));-            do+            try             {-                innerSelf->crankWork();-            } while (innerSelf->getState() == State::WORK_RUNNING &&-                     yt.shouldKeepGoing());-        }-        catch (...)-        {+                // loop as to perform some meaningful amount of work+                do+                {+                    innerSelf->crankWork();+                } while (innerSelf->getState() == State::WORK_RUNNING &&+                         !innerSelf->mApp.getClock().shouldYield());+            }+            catch (...)

Yes, fixed.

graydon

comment created time in 7 days

Pull request review commentstellar/stellar-core

Bug 2304 scheduling cleanup

 namespace stellar using namespace std;  static const uint32_t RECENT_CRANK_WINDOW = 1024;+static const std::chrono::milliseconds CRANK_TIME_SLICE(500);+static const size_t CRANK_EVENT_SLICE = 100;+static const std::chrono::seconds SCHEDULER_LATENCY_WINDOW(5); -VirtualClock::VirtualClock(Mode mode) : mMode(mode), mRealTimer(mIOContext)+VirtualClock::VirtualClock(Mode mode)+    : mMode(mode)+    , mActionScheduler(+          std::make_unique<Scheduler>(*this, SCHEDULER_LATENCY_WINDOW))+    , mRealTimer(mIOContext) {-    mExecutionIterator = mExecutionQueue.begin();     resetIdleCrankPercent(); }  VirtualClock::time_point-VirtualClock::now() noexcept+VirtualClock::now() const noexcept {     if (mMode == REAL_TIME)     {-        return std::chrono::system_clock::now();+        return std::chrono::steady_clock::now();     }     else     {         return mVirtualNow;     } } +VirtualClock::system_time_point+VirtualClock::system_now() const noexcept+{+    if (mMode == REAL_TIME)+    {+        return std::chrono::system_clock::now();+    }+    else+    {+        auto offset = mVirtualNow.time_since_epoch();+        return std::chrono::system_clock::time_point(offset);

I don't think there's any overflow risk here; if I'm reading correctly:

std::nano == std::ratio<1, 1000000000>`
system_clock::duration::period
      == std::ratio_multiply<std::ratio<100,1>, std::nano>
      == std::ratio<1, 10000000>

That is, they're defining a system_clock::duration with period = 100ns rather than 1ns; and doing a duration_cast from steady_clock with 1ns period to system_clock with 100ns period is just going to cause a division-by-100.

graydon

comment created time in 7 days

issue openedstellar/stellar-core

LoadManager could/should probably use overload signal from Scheduler

The overload signal in the Scheduler class introduced in #2511 is probably worth trying to integrate into (or possibly replace) the load estimate currently driving overlay/LoadManager's decisions.

created time in 7 days

Pull request review commentstellar/stellar-core

Bug 2304 scheduling cleanup

 ApplicationImpl::syncOwnMetrics()     mMetrics->NewCounter({"process", "memory", "handles"})         .set_count(mProcessManager->getNumRunningProcesses()); -    // Update ioservice related metrics-    mMetrics->NewCounter({"process", "ioservice", "queue"})-        .set_count(static_cast<int64_t>(getClock().getExecutionQueueSize()));+    // Update action-queue related metrics

I'll add a metric here, yeah.

I would love to redo the code in overlay/LoadManager.cpp and/or connect it too this new load signal, but I feel like this PR is already getting pretty overgrown; I'll file a followup bug.

graydon

comment created time in 7 days

Pull request review commentstellar/stellar-core

Bug 2304 scheduling cleanup

 VirtualClock::crank(bool block)     {         return 0;     }--    size_t nWorkDone = 0;-+    size_t progressCount = 0;     {-        std::lock_guard<std::recursive_mutex> lock(mDelayExecutionMutex);-        // Adding to mDelayedExecutionQueue is now restricted to main thread.--        mDelayExecution = true;-+        std::lock_guard<std::recursive_mutex> lock(mDispatchingMutex);+        mDispatching = true;+        mLastDispatchStart = now();         nRealTimerCancelEvents = 0;-         if (mMode == REAL_TIME)         {-            // Fire all pending timers.-            // May add work to mDelayedExecutionQueue-            nWorkDone += advanceToNow();+            // Dispatch all pending timers.+            progressCount += advanceToNow();         } -        // Pick up some work off the IO queue.-        // Calling mIOContext.poll() here may introduce unbounded delays-        // to trigger timers.-        size_t lastPoll;-        // to trigger timers so we instead put bounds on the amount of work we-        // do at once both in size and time. As MAX_CRANK_WORK_DURATION is-        // not enforced in virtual mode, we use a smaller batch size.-        // We later process items queued up in the execution queue, that has-        // similar parameters.-        size_t const WORK_BATCH_SIZE = 100;-        std::chrono::milliseconds const MAX_CRANK_WORK_DURATION{200};--        YieldTimer ioYt(*this, MAX_CRANK_WORK_DURATION, WORK_BATCH_SIZE);-        do-        {-            // May add work to mDelayedExecutionQueue.-            lastPoll = mIOContext.poll_one();-            nWorkDone += lastPoll;-        } while (lastPoll != 0 && ioYt.shouldKeepGoing());+        // Dispatch some IO event completions.+        mLastDispatchStart = now();+        size_t ioDivisor = mActionScheduler->isOverloaded() ? 2 : 1;

Ok. Accepting the exponential-backoff you added. Looks fine.

graydon

comment created time in 8 days

Pull request review commentstellar/stellar-core

Bug 2304 scheduling cleanup

+#pragma once++// Copyright 2020 Stellar Development Foundation and contributors. Licensed+// under the Apache License, Version 2.0. See the COPYING file at the root+// of this distribution or at http://www.apache.org/licenses/LICENSE-2.0++#include <chrono>+#include <functional>+#include <list>+#include <map>+#include <memory>+#include <queue>+#include <set>++// This class implements a multi-queue scheduler for "actions" (deferred-work+// callbacks that some subsystem wants to run "soon" on the main thread),+// attempting to satisfy a variety of constraints and goals simultaneously:+//+//   0. Non-preemption: We have no ability to preempt actions while they're+//      running so this is a hard constraint, not just a goal.+//+//   1. Serial execution: within a queue, actions must run in the order they are+//      enqueued (or a subsequence thereof, if there are dropped actions) so+//      that clients can use queue-names to sequence logically-sequential+//      actions. Scheduling happens between queues but not within them.+//+//   2. Non-starvation: Everything enqueued (and not dropped) should run+//      eventually and the longer something waits, generally the more likely it+//      is to run.+//+//   3. Fairness: time given to each queue should be roughly equal, over time.+//+//   4. Load-shedding and back-pressure: we want to be able to define a load+//      limit (in terms of worst-case time actions are delayed in the queue)+//      beyond which we consider the system to be "overloaded" and both shed+//      load where we can (dropping non-essential actions) and exert+//      backpressure on our called (eg. by having them throttle IO that+//      ultimately drives queue growth).+//+//   5. Simplicity: clients of the scheduler shouldn't need to adjust a lot of+//      knobs, and the implementation should be as simple as possible and+//      exhibit as fixed a behaviour as possible. We don't want surprises in+//      dynamics.+//+//   6. Non-anticipation: many scheduling algorithms require more information+//      than we have, or are so-called "anticipation" algorithms that need to+//      know (or guess) the size or duration of the next action. We certainly+//      don't know these, and while we _could_ try to estimate them, the+//      algorithms that need anticipation can go wrong if fed bad estimates;+//      we'd prefer a non-anticipation (or "blind") approach that lacks this+//      risk.+//+// Given these goals and constraints, our current best guess is a lightly+// customized algorithm in a family called FB ("foreground-background" or+// "feedback") or LAS ("least attained service") or SET ("shortest elapsed+// time").+//+// For an idea with so many names, the algorithm is utterly simple: each queue+// tracks the accumulated runtime of all actions it has run, and on each step we+// run the next action in the queue with the lowest accumulated runtime (the+// queues themselves are therefore stored in an outer priority queue to enable+// quick retrieval of the next lowest queue).+//+// This has a few interesting properties:+//+//   - A low-frequency action (eg. a ledger close) will usually be scheduled+//     immediately, as it has built up some "credit" in its queue in the form of+//     zero new runtime in the period since its last run, lowering its+//     accumulation relative to other queues.+//+//   - A continuously-rescheduled multipart action (eg. bucket-apply or+//     catchup-replay) will quickly consume any "credit" it might have and be+//     throttled back to an equal time-share with other queues: since it spent a+//     long time on-CPU it will have to wait at least until everyone else has+//     had a similar amount of time before going again.+//+//   - If a very-short-duration action occurs it has little affect on anything+//     else, either its own queue or others, in the relative scheduling order. A+//     queue that's got lots of very-small actions (eg. just issuing a pile of+//     async IOs or writing to in-memory buffers) may run them _all_ before+//     anyone else gets to go, but that's ok precisely because they're very+//     small actions. The scheduler will shift to other queues exactly when a+//     queue uses up a _noticeable amount of time_ relative to others.+//+// This is an old algorithm that was not used for a long time out of fear that+// it would starve long-duration actions; but it's received renewed study in+// recent years based on the observation that such starvation only occurs in+// certain theoretically-tidy but practically-rare distributions of action+// durations, and the distributions that occur in reality behave quite well+// under it.+//+// The customizations we make are minor:+//+//   - We put a floor on the cumulative durations; low cumulative durations+//     represent a form of "credit" that a queue might use in a burst if it were+//     to be suddenly full of ready actions, or continuously-reschedule itself,+//     so we make sure no queue can have less than some (steadily rising) floor.+//+//   - We record the enqueue time and "droppability" of an action, to allow us+//     to measure load level and perform load shedding.++namespace stellar+{++class VirtualClock;+using Action = std::function<void()>;+enum class ActionType

Fixed.

graydon

comment created time in 8 days

Pull request review commentstellar/stellar-core

Bug 2304 scheduling cleanup

+// Copyright 2020 Stellar Development Foundation and contributors. Licensed+// under the Apache License, Version 2.0. See the COPYING file at the root+// of this distribution or at http://www.apache.org/licenses/LICENSE-2.0++#include "util/Scheduler.h"+#include "lib/util/finally.h"+#include "util/Timer.h"+#include <cassert>++namespace stellar+{+using nsecs = std::chrono::nanoseconds;++class Scheduler::ActionQueue+    : public std::enable_shared_from_this<Scheduler::ActionQueue>+{+    struct Element+    {+        Action mAction;+        VirtualClock::time_point mEnqueueTime;+        Element(VirtualClock& clock, Action&& action)+            : mAction(std::move(action)), mEnqueueTime(clock.now())+        {+        }+    };++    std::string mName;+    ActionType mType;+    nsecs mTotalService{0};+    std::chrono::steady_clock::time_point mLastService;+    std::deque<Element> mActions;++    // mIdleList is a reference to the mIdleList member of the Scheduler that+    // owns this ActionQueue. mIdlePosition is an iterator to the position in+    // that list that this ActionQueue occupies, or mIdleList.end() if it's+    // not in mIdleList.+    std::list<Qptr>& mIdleList;+    std::list<Qptr>::iterator mIdlePosition;++  public:+    ActionQueue(std::string const& name, ActionType type,+                std::list<Qptr>& idleList)+        : mName(name)+        , mType(type)+        , mLastService(std::chrono::steady_clock::time_point::max())+        , mIdleList(idleList)+        , mIdlePosition(mIdleList.end())+    {+    }++    bool+    isInIdleList() const+    {+        return mIdlePosition != mIdleList.end();+    }++    void+    addToIdleList()+    {+        assert(!isInIdleList());+        assert(isEmpty());+        mIdleList.push_front(shared_from_this());+        mIdlePosition = mIdleList.begin();+    }++    void+    removeFromIdleList()+    {+        assert(isInIdleList());+        assert(isEmpty());+        mIdleList.erase(mIdlePosition);+        mIdlePosition = mIdleList.end();+    }++    std::string const&+    name() const+    {+        return mName;+    }++    ActionType+    type() const+    {+        return mType;+    }++    nsecs+    totalService() const+    {+        return mTotalService;+    }++    std::chrono::steady_clock::time_point+    lastService() const+    {+        return mLastService;+    }++    size_t+    size() const+    {+        return mActions.size();+    }++    bool+    isEmpty() const+    {+        return mActions.empty();+    }++    bool+    isOverloaded(nsecs latencyWindow, VirtualClock::time_point now) const+    {+        if (!mActions.empty())+        {+            auto timeInQueue = now - mActions.front().mEnqueueTime;+            return timeInQueue > latencyWindow;+        }+        return false;+    }++    size_t+    tryTrim(nsecs latencyWindow, VirtualClock::time_point now)+    {+        size_t n = 0;+        while (mType == ActionType::DROPPABLE_ACTION && !mActions.empty() &&+               isOverloaded(latencyWindow, now))+        {+            mActions.pop_front();+            n++;+        }+        return n;+    }++    void+    enqueue(VirtualClock& clock, Action&& action)+    {+        auto elt = Element(clock, std::move(action));+        mActions.emplace_back(std::move(elt));+    }++    void+    runNext(VirtualClock& clock, nsecs minTotalService)+    {+        auto before = clock.now();+        Action action = std::move(mActions.front().mAction);+        mActions.pop_front();++        auto fini = gsl::finally([&]() {+            auto after = clock.now();+            nsecs duration = std::chrono::duration_cast<nsecs>(after - before);+            mTotalService = std::max(mTotalService + duration, minTotalService);+            mLastService = after;+        });++        action();+    }+};++Scheduler::Scheduler(VirtualClock& clock,+                     std::chrono::nanoseconds latencyWindow)+    : mRunnableActionQueues([](Qptr a, Qptr b) -> bool {+        return a->totalService() > b->totalService();+    })+    , mClock(clock)+    , mLatencyWindow(latencyWindow)+{+}++void+Scheduler::trimSingleActionQueue(Qptr q, VirtualClock::time_point now)+{+    size_t trimmed = q->tryTrim(mLatencyWindow, now);+    mStats.mActionsDroppedDueToOverload += trimmed;+    mSize -= trimmed;+}++void+Scheduler::trimIdleActionQueues(VirtualClock::time_point now)+{+    if (mIdleActionQueues.empty())+    {+        return;+    }+    Qptr old = mIdleActionQueues.back();+    if (old->lastService() + mLatencyWindow < now)+    {+        assert(old->isEmpty());+        mAllActionQueues.erase(std::make_pair(old->name(), old->type()));+        mOverloadedActionQueues.erase(old);+        old->removeFromIdleList();+    }+}++void+Scheduler::enqueue(std::string&& name, Action&& action, ActionType type)+{+    auto key = std::make_pair(name, type);+    auto qi = mAllActionQueues.find(key);+    if (qi == mAllActionQueues.end())+    {+        mStats.mQueuesActivatedFromFresh++;+        auto q = std::make_shared<ActionQueue>(name, type, mIdleActionQueues);+        qi = mAllActionQueues.emplace(key, q).first;+        mRunnableActionQueues.push(qi->second);+    }+    else+    {+        if (qi->second->isInIdleList())+        {+            assert(qi->second->isEmpty());+            mStats.mQueuesActivatedFromIdle++;+            qi->second->removeFromIdleList();+            mRunnableActionQueues.push(qi->second);+        }+    }+    mStats.mActionsEnqueued++;+    qi->second->enqueue(mClock, std::move(action));+    mSize += 1;+}++size_t+Scheduler::runOne()+{+    auto start = mClock.now();+    trimIdleActionQueues(start);+    if (mRunnableActionQueues.empty())+    {+        assert(mSize == 0);+        return 0;+    }+    else+    {+        auto q = mRunnableActionQueues.top();+        mRunnableActionQueues.pop();+        trimSingleActionQueue(q, start);++        auto putQueueBackInIdleOrActive = gsl::finally([&]() {+            if (q->isOverloaded(mLatencyWindow, mClock.now()))+            {+                mOverloadedActionQueues.insert(q);

throwing from within a noexcept-qualified function is specified to call std::terminate.

$ cat t.cc 
int bar() { throw 1; }
int foo() noexcept { bar(); } 
int main() { foo(); }
$ c++ t.cc && ./a.out 
terminate called after throwing an instance of 'int'
Aborted (core dumped)
graydon

comment created time in 8 days

fork graydon/dolt

Dolt – It's Git for Data

fork in 8 days

startedliquidata-inc/dolt

started time in 8 days

PR opened stateright/stateright

Add duplicating_network flag.

Hi! Thanks so much for building this library, it's great!

As a first contribution, here's a change that lets users prune the portion of search space that consists of duplicate messages -- some systems have trivially idempotent messages, where exploring all the extra delivery schedules is just a waste of model-checking time.

+17 -1

0 comment

1 changed file

pr created time in 8 days

create barnchgraydon/stateright

branch : non-duplicating-network

created branch time in 8 days

startedAntidoteDB/antidote

started time in 8 days

push eventgraydon/concorde

Graydon Hoare

commit sha 49476f1e2b31fa1bdae8a1b51db29a1bb37d2af9

pretty-print stateright paths

view details

push time in 8 days

startededwinb/Idris2-SH

started time in 9 days

startedclux/kube-rs

started time in 9 days

startedstrega-nil/garnet

started time in 9 days

push eventgraydon/concorde

Graydon Hoare

commit sha 452c45a934eb3a27a692bbf081c23da41ad127e0

Finish initial stateright hookup

view details

Graydon Hoare

commit sha d68537c2c12bfad36849183b66b24e70402b4077

simple property with expected cex

view details

Graydon Hoare

commit sha 64dac77980eb622c892e814953542b74564c0de7

avoid racing on pretty_env_logger::init

view details

push time in 9 days

create barnchgraydon/concorde

branch : im-sets

created branch time in 9 days

push eventgraydon/pergola

Graydon Hoare

commit sha ea5e6ccf0f3a2ed0ec87b50cbf9823b5bfb07c5f

support im, im-rc, bit-set as features

view details

Graydon Hoare

commit sha 4678db16d50e5c790d1284957498a16b5a97d6ee

cargo fmt

view details

push time in 9 days

startedservo/servo

started time in 9 days

push eventgraydon/concorde

Graydon Hoare

commit sha 99539e76b2417ccc805739cf0f0925170a132c1c

cargo fmt

view details

push time in 10 days

push eventgraydon/concorde

Graydon Hoare

commit sha 4c63c340959ec4f7965b10bdcf3606f0b33985c7

Split out modules, add many impls, hook up to stateright

view details

push time in 10 days

startedbodil/smartstring

started time in 10 days

push eventgraydon/concorde

Graydon Hoare

commit sha e2c7458d7386bc832a1753abc23b7bfd4ccdf2d5

revive test in explicit-state-machine form, no async/await.

view details

push time in 11 days

push eventgraydon/concorde

Graydon Hoare

commit sha 3b31757bc0a050027795292283b9da417e2cf78f

cleanup/refactor

view details

Graydon Hoare

commit sha 13c7328f69a34e2c12ee754ab6b2d08f31a2b185

No async/await for us, sadly.

view details

push time in 11 days

push eventgraydon/stellar-core

Graydon Hoare

commit sha 5de532ba631b8ccdee8da1766fe63d9c08c2e9ca

Expand comment, s/totalServiceWindow/latencyWindow/, lower to 5s

view details

push time in 12 days

push eventgraydon/stellar-core

marta-lokhova

commit sha a302a624f215258ab561d2fdefed1ef29775e1d8

Command line publish: forget referenced buckets when finished

view details

marta-lokhova

commit sha f19cc59ec859825a833e0f6e6b3fb67a35923eea

Allow sorting signers for AccountEntry

view details

marta-lokhova

commit sha f24c49d51ac91dc0accada92e1d889230f116d86

Extract curr merge logic into a separate function

view details

marta-lokhova

commit sha 9f45d023da1dd99743a5e7878854b0b76ed12e92

Make toMuxedAccount utility function available outside of tests

view details

marta-lokhova

commit sha 8cb8bf72ef7adcfbb4e196fd727a7d9f806076d5

Introduce a variety of tx simulation utility functions

view details

marta-lokhova

commit sha eee08268950c24f2724f64a544471fc602ad3205

Introduce new works to simulate scaled bucketlist

view details

marta-lokhova

commit sha 35b52a4743c52b310665c3089f16037f85ea3559

Consistent naming: rename SimulationMergeOpFrame

view details

marta-lokhova

commit sha dfdda61cafbe69a990f17e1005afdfed21701726

Consistent naming: rename ApplyTransactionsWork

view details

marta-lokhova

commit sha cb9044cdce602d95f7d12505089c6cdfe1b753aa

Consistent naming: rename SimulationTransactionFrame

view details

marta-lokhova

commit sha ac054cf789861e055d583fa110148c6af87f7eaa

Consistent naming: rename SimulationTxSetFrame

view details

marta-lokhova

commit sha 6f2c25d520fc6a6ac637f499dc829ada48981c30

Transaction bridge: add helper functions

view details

marta-lokhova

commit sha b6686707fc532317eb755386a9ae164374492c68

Implement simulated fee bump tx frame

view details

marta-lokhova

commit sha 6ac2293bb229113fb73a1e0fe707ab20f2d96699

Implement simulation op frames for ManageSellOffer, ManageBuyOffer and CreatePassiveSellOffer

view details

marta-lokhova

commit sha 9bffd15cc659306a2637dfbc72f5d902628635f0

Signature utils: allow signature verification with PublicKey

view details

marta-lokhova

commit sha df83637eee1c106f8f45fdda47f80acae94c2201

SecretKey: implement operator< to allow using secret keys in std::set

view details

marta-lokhova

commit sha 86b31ae7fc2407a1fb8ad7b80d54431f8d698511

TxSimApplyTransactionsWork overhaul: extend functionality to support scaled ledger

view details

marta-lokhova

commit sha 1609577c10557b8abaf6605fca7b5c5c4ce0fc24

Ledger replay: reduce allowed gap between replay and publishing

view details

marta-lokhova

commit sha 370d3fc93766838f9723add67adcfa57845ccefc

Command line: implement bucketlist and transaction simulation commands

view details

marta-lokhova

commit sha ab0145ddf55be9f7848befefc3339b5356f4c3eb

Tx simulation README

view details

marta-lokhova

commit sha 98830c830c650e7616c817a7586c9e4ad8baa568

Put transaction simulation files under txsimulation namespace

view details

push time in 12 days

push eventgraydon/stellar-core

Graydon Hoare

commit sha 9e25df5e13236d271488042964c317174bc71ce5

Redo overloaded-queue tracking with set.

view details

push time in 12 days

push eventgraydon/stellar-core

Graydon Hoare

commit sha beaaa67b9f1466f5b6995956b4700b5f46195725

Redo overloaded-queue tracking with set.

view details

push time in 12 days

Pull request review commentstellar/stellar-core

Bug 2304 scheduling cleanup

+#pragma once++// Copyright 2020 Stellar Development Foundation and contributors. Licensed+// under the Apache License, Version 2.0. See the COPYING file at the root+// of this distribution or at http://www.apache.org/licenses/LICENSE-2.0++#include "util/RandomEvictionCache.h"++#include <chrono>+#include <functional>+#include <map>+#include <memory>+#include <queue>++// This class implements a multi-queue scheduler for "actions" (deferred-work+// callbacks that some subsystem wants to run "soon" on the main thread),+// attempting to satisfy a variety of constraints and goals simultaneously:+//+//   0. Non-preemption: We have no ability to preempt actions while they're+//      running so this is a hard constraint, not just a goal.+//+//   1. Serial execution: within a queue, actions must run in the order they are+//      enqueued (or a subsequence thereof, if there are dropped actions) so+//      that clients can use queue-names to sequence logically-sequential+//      actions. Scheduling happens between queues but not within them.+//+//   2. Non-starvation: Everything enqueued (and not dropped) should run+//      eventually and the longer something waits, generally the more likely it+//      is to run.+//+//   3. Fairness: time given to each queue should be roughly equal, over time.+//+//   4. Deadlines and load-shedding: some actions are best-effort and should be+//      dropped when the system is under load, and others have deadlines after+//      which there's no point running them, they're just a waste. We want to+//      be able to drop either so they don't interfere with necessary actions.+//+//   5. Simplicity: clients of the scheduler shouldn't need to adjust a lot of+//      knobs, and the implementation should be as simple as possible and+//      exhibit as fixed a behaviour as possible. We don't want surprises in+//      dynamics.+//+//   6. Non-anticipation: many scheduling algorithms require more information+//      than we have, or are so-called "anticipation" algorithms that need to+//      know (or guess) the size or duration of the next action. We certainly+//      don't know these, and while we _could_ try to estimate them, the+//      algorithms that need anticipation can go wrong if fed bad estimates;+//      we'd prefer a non-anticipation (or "blind") approach that lacks this+//      risk.+//+// Given these goals and constraints, our current best guess is a lightly+// customized algorithm in a family called FB ("foreground-background" or+// "feedback") or LAS ("least attained service") or SET ("shortest elapsed+// time").+//+// For an idea with so many names, the algorithm is utterly simple: each queue+// tracks the accumulated runtime of all actions it has run, and on each step we+// run the next action in the queue with the lowest accumulated runtime (the+// queues themselves are therefore stored in an outer priority queue to enable+// quick retrieval of the next lowest queue).+//+// This has a few interesting properties:+//+//   - A low-frequency action (eg. a ledger close) will usually be scheduled+//     immediately, as it has built up some "credit" in its queue in the form of+//     zero new runtime in the period since its last run, lowering its+//     accumulation relative to other queues.+//+//   - A continuously-rescheduled multipart action (eg. bucket-apply or+//     catchup-replay) will quickly consume any "credit" it might have and be+//     throttled back to an equal time-share with other queues: since it spent a+//     long time on-CPU it will have to wait at least until everyone else has+//     had a similar amount of time before going again.+//+//   - If a very-short-duration action occurs it has little affect on anything+//     else, either its own queue or others, in the relative scheduling order. A+//     queue that's got lots of very-small actions (eg. just issuing a pile of+//     async IOs or writing to in-memory buffers) may run them _all_ before+//     anyone else gets to go, but that's ok precisely because they're very+//     small actions. The scheduler will shift to other queues exactly when a+//     queue uses up a _noticable amount of time_ relative to others.+//+// This is an old algorithm that was not used for a long time out of fear that+// it would starve long-duration actions; but it's received renewed study in+// recent years based on the observation that such starvation only occurs in+// certain theoretically-tidy but practically-rare distributions of action+// durations, and the distributions that occur in reality behave quite well+// under it.+//+// The customizations we make are minor:+//+//   - We put a floor on the cumulative durations; low cumulative durations+//     represent a form of "credit" that a queue might use in a burst if it were+//     to be suddenly full of ready actions, or continuously-reschedule itself,+//     so we make sure no queue can have less than some (steadily rising) floor.+//+//   - We encode deadlines in actions: those with a positive deadline are always+//     dropped -- unconditionally -- if they are ready to run after their+//     deadline, and may also be dropped conditionally if the system is under+//     load / the queue is too long. The sentinel "minimum deadline" value is+//     reserved to indicate a never-droppable action. To encode a "best effort"+//     action with no particular deadline that will be dropped only under load,+//     we set the deadline to the the maximal duration value.++namespace stellar+{++using Action = std::function<void()>;++class Scheduler+{+  public:+    struct Stats+    {+        size_t mActionsEnqueued{0};+        size_t mActionsDequeued{0};+        size_t mActionsDroppedDueToOverload{0};+        size_t mActionsDroppedDueToDeadline{0};+        size_t mQueuesActivatedFromFresh{0};+        size_t mQueuesActivatedFromCache{0};+        size_t mQueuesSuspended{0};+    };++    using RelativeDeadline = std::chrono::nanoseconds;+    using AbsoluteDeadline = std::chrono::steady_clock::time_point;+    static constexpr RelativeDeadline NEVER_DROP = RelativeDeadline::min();+    static constexpr RelativeDeadline DROP_ONLY_UNDER_LOAD =+        RelativeDeadline::max();+    static constexpr AbsoluteDeadline ABS_NEVER_DROP = AbsoluteDeadline::min();+    static constexpr AbsoluteDeadline ABS_DROP_ONLY_UNDER_LOAD =+        AbsoluteDeadline::max();++  private:+    class Queue;+    using Qptr = std::shared_ptr<Queue>;+    std::map<std::string, Qptr> mQueues;+    std::priority_queue<Qptr, std::vector<Qptr>,+                        std::function<bool(Qptr, Qptr)>>+        mQueueQueue;++    Stats mStats;++    // A queue is considered "overloaded" if its size is above the load limit.+    // This is a per-queue limit.

Implemented a basic version of this using the same queue window-duration (time) parameter for overload detection, and throttling IO (switching from 1:1 to 1:2 ratio) for shedding/backpressure. Doesn't have a two-threshold scheme like you're suggesting, though could possibly be modified in that direction to oscilate less? Unsure.

graydon

comment created time in 12 days

Pull request review commentstellar/stellar-core

Bug 2304 scheduling cleanup

+// Copyright 2020 Stellar Development Foundation and contributors. Licensed+// under the Apache License, Version 2.0. See the COPYING file at the root+// of this distribution or at http://www.apache.org/licenses/LICENSE-2.0++#include "util/Scheduler.h"+#include <cassert>++namespace stellar+{+using nsecs = std::chrono::nanoseconds;++const Scheduler::RelativeDeadline Scheduler::NEVER_DROP;+const Scheduler::RelativeDeadline Scheduler::DROP_ONLY_UNDER_LOAD;+const Scheduler::AbsoluteDeadline Scheduler::ABS_NEVER_DROP;+const Scheduler::AbsoluteDeadline Scheduler::ABS_DROP_ONLY_UNDER_LOAD;++class Scheduler::Queue+{+    struct Element+    {+        Action mAction;+        Scheduler::AbsoluteDeadline mDeadline;+        Element(Action&& action, Scheduler::RelativeDeadline rel)+            : mAction(std::move(action))+            , mDeadline(rel == Scheduler::NEVER_DROP+                            ? Scheduler::ABS_NEVER_DROP+                            : (rel == Scheduler::DROP_ONLY_UNDER_LOAD+                                   ? Scheduler::ABS_DROP_ONLY_UNDER_LOAD+                                   : std::chrono::steady_clock::now() + rel))+        {+        }+        bool+        shouldDrop(bool overloaded, Scheduler::AbsoluteDeadline now,+                   Scheduler::Stats& stats) const+        {+            if (overloaded)+            {+                if (mDeadline != Scheduler::ABS_NEVER_DROP)+                {+                    stats.mActionsDroppedDueToOverload++;+                    return true;+                }+            }+            else+            {+                if (mDeadline != Scheduler::ABS_NEVER_DROP &&+                    mDeadline != Scheduler::ABS_DROP_ONLY_UNDER_LOAD &&+                    now > mDeadline)+                {+                    stats.mActionsDroppedDueToDeadline++;+                    return true;+                }+            }+            return false;+        }+    };++    std::string mName;+    nsecs mServiceTime{0};+    std::deque<Element> mActions;++  public:+    Queue(std::string const& name) : mName(name)+    {+    }++    std::string const&+    name() const+    {+        return mName;+    }++    nsecs+    serviceTime() const+    {+        return mServiceTime;+    }++    size_t+    size() const+    {+        return mActions.size();+    }++    bool+    isEmpty() const+    {+        return mActions.empty();+    }++    size_t+    tryTrim(size_t loadLimit, Scheduler::AbsoluteDeadline now,+            Scheduler::Stats& stats)+    {+        if (!mActions.empty())+        {+            bool overloaded = size() > loadLimit;+            if (mActions.front().shouldDrop(overloaded, now, stats))

Removed deadlines as a concept when adding queue-linger time; moved back to droppable-ness being a property of a queue.

graydon

comment created time in 12 days

push eventgraydon/stellar-core

Graydon Hoare

commit sha d3a7b37afbb189c5894cc2901030da185d3625db

const-ify some Timer methods.

view details

Graydon Hoare

commit sha 9dd09521e9ea39b7214b605d02c9c1b7b1e40813

clang-format PendingEnvelopes, somehow it drifted

view details

Graydon Hoare

commit sha baa76080672a8dd6b4debe35114070a8833bd54a

mop up more direct uses of realtime clock in Timer

view details

Graydon Hoare

commit sha aa97efe1143fec9bc6cbe17e9a4560da4cb6b1cc

Switch back to ActionTypes, not deadlines; redo overload logic to be time-based

view details

Graydon Hoare

commit sha a23682458337f6c133166b04ecca149341e62beb

Shed load by throttling IO when Scheduler is overloaded.

view details

push time in 12 days

Pull request review commentstellar/stellar-core

Bug 2304 scheduling cleanup

+// Copyright 2020 Stellar Development Foundation and contributors. Licensed+// under the Apache License, Version 2.0. See the COPYING file at the root+// of this distribution or at http://www.apache.org/licenses/LICENSE-2.0++#include "util/Scheduler.h"+#include <cassert>++namespace stellar+{+using nsecs = std::chrono::nanoseconds;++const Scheduler::RelativeDeadline Scheduler::NEVER_DROP;+const Scheduler::RelativeDeadline Scheduler::DROP_ONLY_UNDER_LOAD;+const Scheduler::AbsoluteDeadline Scheduler::ABS_NEVER_DROP;+const Scheduler::AbsoluteDeadline Scheduler::ABS_DROP_ONLY_UNDER_LOAD;++class Scheduler::Queue+{+    struct Element+    {+        Action mAction;+        Scheduler::AbsoluteDeadline mDeadline;+        Element(Action&& action, Scheduler::RelativeDeadline rel)+            : mAction(std::move(action))+            , mDeadline(rel == Scheduler::NEVER_DROP+                            ? Scheduler::ABS_NEVER_DROP+                            : (rel == Scheduler::DROP_ONLY_UNDER_LOAD+                                   ? Scheduler::ABS_DROP_ONLY_UNDER_LOAD+                                   : std::chrono::steady_clock::now() + rel))+        {+        }+        bool+        shouldDrop(bool overloaded, Scheduler::AbsoluteDeadline now,+                   Scheduler::Stats& stats) const+        {+            if (overloaded)+            {+                if (mDeadline != Scheduler::ABS_NEVER_DROP)+                {+                    stats.mActionsDroppedDueToOverload++;+                    return true;+                }+            }+            else+            {+                if (mDeadline != Scheduler::ABS_NEVER_DROP &&+                    mDeadline != Scheduler::ABS_DROP_ONLY_UNDER_LOAD &&+                    now > mDeadline)+                {+                    stats.mActionsDroppedDueToDeadline++;+                    return true;+                }+            }+            return false;+        }+    };++    std::string mName;+    nsecs mServiceTime{0};+    std::deque<Element> mActions;++  public:+    Queue(std::string const& name) : mName(name)+    {+    }++    std::string const&+    name() const+    {+        return mName;+    }++    nsecs+    serviceTime() const+    {+        return mServiceTime;+    }++    size_t+    size() const+    {+        return mActions.size();+    }++    bool+    isEmpty() const+    {+        return mActions.empty();+    }++    size_t+    tryTrim(size_t loadLimit, Scheduler::AbsoluteDeadline now,+            Scheduler::Stats& stats)+    {+        if (!mActions.empty())+        {+            bool overloaded = size() > loadLimit;+            if (mActions.front().shouldDrop(overloaded, now, stats))+            {+                mActions.pop_front();+                return 1;+            }+            if (mActions.back().shouldDrop(overloaded, now, stats))+            {+                mActions.pop_back();+                return 1;+            }+        }+        return 0;+    }++    void+    enqueue(Action&& action, Scheduler::RelativeDeadline deadline)+    {+        auto elt = Element(std::move(action), deadline);+        mActions.emplace_back(std::move(elt));+    }++    void+    runNext(nsecs minServiceTime)+    {+        auto before = std::chrono::steady_clock::now();+        Action action = std::move(mActions.front().mAction);+        mActions.pop_front();+        action();+        auto after = std::chrono::steady_clock::now();+        nsecs duration = std::chrono::duration_cast<nsecs>(after - before);+        mServiceTime = std::max(mServiceTime + duration, minServiceTime);+    }+};++Scheduler::Scheduler(size_t loadLimit,+                     std::chrono::nanoseconds serviceTimeWindow)+    : mQueueQueue(+          [](std::shared_ptr<Queue> a, std::shared_ptr<Queue> b) -> bool {+              return a->serviceTime() > b->serviceTime();+          })+    , mLoadLimit(loadLimit)+    , mServiceTimeWindow(serviceTimeWindow)+{+}++void+Scheduler::trim(std::shared_ptr<Queue> q)+{+    Scheduler::AbsoluteDeadline now = std::chrono::steady_clock::now();

Reworked VirtualClock::time_point to be steady_clock::time_point and use steady_clock::now, added VirtualClock::system_time_point and changed callers that need system times (eg. herder/SCP) to use it. Then switched Scheduler to use VirtualClock. This seems to work.

graydon

comment created time in 13 days

Pull request review commentstellar/stellar-core

Bug 2304 scheduling cleanup

+// Copyright 2020 Stellar Development Foundation and contributors. Licensed+// under the Apache License, Version 2.0. See the COPYING file at the root+// of this distribution or at http://www.apache.org/licenses/LICENSE-2.0++#include "util/Scheduler.h"+#include <cassert>++namespace stellar+{+using nsecs = std::chrono::nanoseconds;++const Scheduler::RelativeDeadline Scheduler::NEVER_DROP;+const Scheduler::RelativeDeadline Scheduler::DROP_ONLY_UNDER_LOAD;+const Scheduler::AbsoluteDeadline Scheduler::ABS_NEVER_DROP;+const Scheduler::AbsoluteDeadline Scheduler::ABS_DROP_ONLY_UNDER_LOAD;++class Scheduler::Queue+{+    struct Element+    {+        Action mAction;+        Scheduler::AbsoluteDeadline mDeadline;+        Element(Action&& action, Scheduler::RelativeDeadline rel)+            : mAction(std::move(action))+            , mDeadline(rel == Scheduler::NEVER_DROP+                            ? Scheduler::ABS_NEVER_DROP+                            : (rel == Scheduler::DROP_ONLY_UNDER_LOAD+                                   ? Scheduler::ABS_DROP_ONLY_UNDER_LOAD+                                   : std::chrono::steady_clock::now() + rel))+        {+        }+        bool+        shouldDrop(bool overloaded, Scheduler::AbsoluteDeadline now,+                   Scheduler::Stats& stats) const+        {+            if (overloaded)+            {+                if (mDeadline != Scheduler::ABS_NEVER_DROP)+                {+                    stats.mActionsDroppedDueToOverload++;+                    return true;+                }+            }+            else+            {+                if (mDeadline != Scheduler::ABS_NEVER_DROP &&+                    mDeadline != Scheduler::ABS_DROP_ONLY_UNDER_LOAD &&+                    now > mDeadline)+                {+                    stats.mActionsDroppedDueToDeadline++;+                    return true;+                }+            }+            return false;+        }+    };++    std::string mName;+    nsecs mServiceTime{0};+    std::deque<Element> mActions;++  public:+    Queue(std::string const& name) : mName(name)+    {+    }++    std::string const&+    name() const+    {+        return mName;+    }++    nsecs+    serviceTime() const+    {+        return mServiceTime;+    }++    size_t+    size() const+    {+        return mActions.size();+    }++    bool+    isEmpty() const+    {+        return mActions.empty();+    }++    size_t+    tryTrim(size_t loadLimit, Scheduler::AbsoluteDeadline now,+            Scheduler::Stats& stats)+    {+        if (!mActions.empty())+        {+            bool overloaded = size() > loadLimit;+            if (mActions.front().shouldDrop(overloaded, now, stats))+            {+                mActions.pop_front();+                return 1;+            }+            if (mActions.back().shouldDrop(overloaded, now, stats))+            {+                mActions.pop_back();+                return 1;+            }+        }+        return 0;+    }++    void+    enqueue(Action&& action, Scheduler::RelativeDeadline deadline)+    {+        auto elt = Element(std::move(action), deadline);+        mActions.emplace_back(std::move(elt));+    }++    void+    runNext(nsecs minServiceTime)+    {+        auto before = std::chrono::steady_clock::now();+        Action action = std::move(mActions.front().mAction);+        mActions.pop_front();+        action();+        auto after = std::chrono::steady_clock::now();+        nsecs duration = std::chrono::duration_cast<nsecs>(after - before);+        mServiceTime = std::max(mServiceTime + duration, minServiceTime);+    }+};++Scheduler::Scheduler(size_t loadLimit,+                     std::chrono::nanoseconds serviceTimeWindow)+    : mQueueQueue(+          [](std::shared_ptr<Queue> a, std::shared_ptr<Queue> b) -> bool {+              return a->serviceTime() > b->serviceTime();+          })+    , mLoadLimit(loadLimit)+    , mServiceTimeWindow(serviceTimeWindow)+{+}++void+Scheduler::trim(std::shared_ptr<Queue> q)+{+    Scheduler::AbsoluteDeadline now = std::chrono::steady_clock::now();+    while (true)+    {+        size_t trimmed = q->tryTrim(mLoadLimit, now, mStats);+        if (trimmed == 0)+        {+            return;+        }+        mSize -= trimmed;+    }+}++void+Scheduler::enqueue(std::string&& name, Action&& action,+                   Scheduler::RelativeDeadline deadline)+{+    auto qi = mQueues.find(name);+    if (qi == mQueues.end())+    {+        if (mQueueCache.exists(name))+        {+            mStats.mQueuesActivatedFromCache++;+            qi = mQueues.emplace(name, mQueueCache.get(name)).first;+        }+        else+        {+            mStats.mQueuesActivatedFromFresh++;+            auto q = std::make_shared<Queue>(name);+            mQueueCache.put(name, q);+            qi = mQueues.emplace(name, q).first;+        }+        mQueueQueue.push(qi->second);+    }+    mStats.mActionsEnqueued++;+    qi->second->enqueue(std::move(action), deadline);+    mSize += 1;+    trim(qi->second);+}++size_t+Scheduler::runOne()+{+    if (mQueueQueue.empty())+    {+        return 0;+    }+    else+    {+        auto q = mQueueQueue.top();+        mQueueQueue.pop();+        trim(q);+        if (!q->isEmpty())+        {+            // We pass along a "minimum service time" floor that the service+            // time of the queue will be incremented to, at minimum.+            auto minServiceTime = mMaxServiceTime - mServiceTimeWindow;+            q->runNext(minServiceTime);

Very astute! And also a thing I wasted 2 days hunting down the absence of. Fixed.

graydon

comment created time in 13 days

pull request commentstellar/stellar-core

bump sqlite to 3.30.1

LGTM! (not r+'ing because I think you want to sequence these landings)

MonsieurNicolas

comment created time in 13 days

push eventgraydon/stellar-core

marta-lokhova

commit sha a302a624f215258ab561d2fdefed1ef29775e1d8

Command line publish: forget referenced buckets when finished

view details

marta-lokhova

commit sha f19cc59ec859825a833e0f6e6b3fb67a35923eea

Allow sorting signers for AccountEntry

view details

marta-lokhova

commit sha f24c49d51ac91dc0accada92e1d889230f116d86

Extract curr merge logic into a separate function

view details

marta-lokhova

commit sha 9f45d023da1dd99743a5e7878854b0b76ed12e92

Make toMuxedAccount utility function available outside of tests

view details

marta-lokhova

commit sha 8cb8bf72ef7adcfbb4e196fd727a7d9f806076d5

Introduce a variety of tx simulation utility functions

view details

marta-lokhova

commit sha eee08268950c24f2724f64a544471fc602ad3205

Introduce new works to simulate scaled bucketlist

view details

marta-lokhova

commit sha 35b52a4743c52b310665c3089f16037f85ea3559

Consistent naming: rename SimulationMergeOpFrame

view details

marta-lokhova

commit sha dfdda61cafbe69a990f17e1005afdfed21701726

Consistent naming: rename ApplyTransactionsWork

view details

marta-lokhova

commit sha cb9044cdce602d95f7d12505089c6cdfe1b753aa

Consistent naming: rename SimulationTransactionFrame

view details

marta-lokhova

commit sha ac054cf789861e055d583fa110148c6af87f7eaa

Consistent naming: rename SimulationTxSetFrame

view details

marta-lokhova

commit sha 6f2c25d520fc6a6ac637f499dc829ada48981c30

Transaction bridge: add helper functions

view details

marta-lokhova

commit sha b6686707fc532317eb755386a9ae164374492c68

Implement simulated fee bump tx frame

view details

marta-lokhova

commit sha 6ac2293bb229113fb73a1e0fe707ab20f2d96699

Implement simulation op frames for ManageSellOffer, ManageBuyOffer and CreatePassiveSellOffer

view details

marta-lokhova

commit sha 9bffd15cc659306a2637dfbc72f5d902628635f0

Signature utils: allow signature verification with PublicKey

view details

marta-lokhova

commit sha df83637eee1c106f8f45fdda47f80acae94c2201

SecretKey: implement operator< to allow using secret keys in std::set

view details

marta-lokhova

commit sha 86b31ae7fc2407a1fb8ad7b80d54431f8d698511

TxSimApplyTransactionsWork overhaul: extend functionality to support scaled ledger

view details

marta-lokhova

commit sha 1609577c10557b8abaf6605fca7b5c5c4ce0fc24

Ledger replay: reduce allowed gap between replay and publishing

view details

marta-lokhova

commit sha 370d3fc93766838f9723add67adcfa57845ccefc

Command line: implement bucketlist and transaction simulation commands

view details

marta-lokhova

commit sha ab0145ddf55be9f7848befefc3339b5356f4c3eb

Tx simulation README

view details

marta-lokhova

commit sha 98830c830c650e7616c817a7586c9e4ad8baa568

Put transaction simulation files under txsimulation namespace

view details

push time in 13 days

fork graydon/uftrace

Function (graph) tracer for user-space

https://uftrace.github.io/slide/

fork in 15 days

startednamhyung/uftrace

started time in 15 days

create barnchgraydon/stellar-core

branch : Scheduler

created branch time in 15 days

Pull request review commentstellar/stellar-core

Bug 2304 scheduling cleanup

+// Copyright 2020 Stellar Development Foundation and contributors. Licensed+// under the Apache License, Version 2.0. See the COPYING file at the root+// of this distribution or at http://www.apache.org/licenses/LICENSE-2.0++#include "util/Scheduler.h"++#include "lib/catch.hpp"+#include "util/Logging.h"+#include <chrono>++using namespace stellar;

Oh weird! Yeah that must be some rebase fallout. Will clean up, sorry!

graydon

comment created time in 16 days

starteddoctorn/micro-mitten

started time in 16 days

Pull request review commentstellar/stellar-core

Bug 2304 scheduling cleanup

+// Copyright 2020 Stellar Development Foundation and contributors. Licensed+// under the Apache License, Version 2.0. See the COPYING file at the root+// of this distribution or at http://www.apache.org/licenses/LICENSE-2.0++#include "util/Scheduler.h"+#include <cassert>++namespace stellar+{+using nsecs = std::chrono::nanoseconds;++const Scheduler::RelativeDeadline Scheduler::NEVER_DROP;+const Scheduler::RelativeDeadline Scheduler::DROP_ONLY_UNDER_LOAD;+const Scheduler::AbsoluteDeadline Scheduler::ABS_NEVER_DROP;+const Scheduler::AbsoluteDeadline Scheduler::ABS_DROP_ONLY_UNDER_LOAD;++class Scheduler::Queue+{+    struct Element+    {+        Action mAction;+        Scheduler::AbsoluteDeadline mDeadline;+        Element(Action&& action, Scheduler::RelativeDeadline rel)+            : mAction(std::move(action))+            , mDeadline(rel == Scheduler::NEVER_DROP

(If/elses -- you can't switch on std::chrono::duration values)

graydon

comment created time in 17 days

Pull request review commentstellar/stellar-core

Bug 2304 scheduling cleanup

+// Copyright 2020 Stellar Development Foundation and contributors. Licensed+// under the Apache License, Version 2.0. See the COPYING file at the root+// of this distribution or at http://www.apache.org/licenses/LICENSE-2.0++#include "util/Scheduler.h"+#include <cassert>++namespace stellar+{+using nsecs = std::chrono::nanoseconds;++const Scheduler::RelativeDeadline Scheduler::NEVER_DROP;+const Scheduler::RelativeDeadline Scheduler::DROP_ONLY_UNDER_LOAD;+const Scheduler::AbsoluteDeadline Scheduler::ABS_NEVER_DROP;+const Scheduler::AbsoluteDeadline Scheduler::ABS_DROP_ONLY_UNDER_LOAD;++class Scheduler::Queue+{+    struct Element+    {+        Action mAction;+        Scheduler::AbsoluteDeadline mDeadline;+        Element(Action&& action, Scheduler::RelativeDeadline rel)+            : mAction(std::move(action))+            , mDeadline(rel == Scheduler::NEVER_DROP+                            ? Scheduler::ABS_NEVER_DROP+                            : (rel == Scheduler::DROP_ONLY_UNDER_LOAD+                                   ? Scheduler::ABS_DROP_ONLY_UNDER_LOAD+                                   : std::chrono::steady_clock::now() + rel))+        {+        }+        bool+        shouldDrop(bool overloaded, Scheduler::AbsoluteDeadline now,+                   Scheduler::Stats& stats) const+        {+            if (overloaded)+            {+                if (mDeadline != Scheduler::ABS_NEVER_DROP)+                {+                    stats.mActionsDroppedDueToOverload++;+                    return true;+                }+            }+            else+            {+                if (mDeadline != Scheduler::ABS_NEVER_DROP &&+                    mDeadline != Scheduler::ABS_DROP_ONLY_UNDER_LOAD &&+                    now > mDeadline)+                {+                    stats.mActionsDroppedDueToDeadline++;+                    return true;+                }+            }+            return false;+        }+    };++    std::string mName;+    nsecs mServiceTime{0};+    std::deque<Element> mActions;++  public:+    Queue(std::string const& name) : mName(name)+    {+    }++    std::string const&+    name() const+    {+        return mName;+    }++    nsecs+    serviceTime() const+    {+        return mServiceTime;+    }++    size_t+    size() const+    {+        return mActions.size();+    }++    bool+    isEmpty() const+    {+        return mActions.empty();+    }++    size_t+    tryTrim(size_t loadLimit, Scheduler::AbsoluteDeadline now,+            Scheduler::Stats& stats)+    {+        if (!mActions.empty())+        {+            bool overloaded = size() > loadLimit;+            if (mActions.front().shouldDrop(overloaded, now, stats))+            {+                mActions.pop_front();+                return 1;+            }+            if (mActions.back().shouldDrop(overloaded, now, stats))+            {+                mActions.pop_back();+                return 1;+            }+        }+        return 0;+    }++    void+    enqueue(Action&& action, Scheduler::RelativeDeadline deadline)+    {+        auto elt = Element(std::move(action), deadline);+        mActions.emplace_back(std::move(elt));+    }++    void+    runNext(nsecs minServiceTime)+    {+        auto before = std::chrono::steady_clock::now();+        Action action = std::move(mActions.front().mAction);+        mActions.pop_front();+        action();+        auto after = std::chrono::steady_clock::now();+        nsecs duration = std::chrono::duration_cast<nsecs>(after - before);+        mServiceTime = std::max(mServiceTime + duration, minServiceTime);+    }+};++Scheduler::Scheduler(size_t loadLimit,+                     std::chrono::nanoseconds serviceTimeWindow)+    : mQueueQueue(+          [](std::shared_ptr<Queue> a, std::shared_ptr<Queue> b) -> bool {+              return a->serviceTime() > b->serviceTime();+          })+    , mLoadLimit(loadLimit)+    , mServiceTimeWindow(serviceTimeWindow)+{+}++void+Scheduler::trim(std::shared_ptr<Queue> q)+{+    Scheduler::AbsoluteDeadline now = std::chrono::steady_clock::now();+    while (true)+    {+        size_t trimmed = q->tryTrim(mLoadLimit, now, mStats);+        if (trimmed == 0)+        {+            return;+        }+        mSize -= trimmed;+    }+}++void+Scheduler::enqueue(std::string&& name, Action&& action,+                   Scheduler::RelativeDeadline deadline)+{+    auto qi = mQueues.find(name);+    if (qi == mQueues.end())+    {+        if (mQueueCache.exists(name))+        {+            mStats.mQueuesActivatedFromCache++;+            qi = mQueues.emplace(name, mQueueCache.get(name)).first;+        }+        else+        {+            mStats.mQueuesActivatedFromFresh++;+            auto q = std::make_shared<Queue>(name);+            mQueueCache.put(name, q);+            qi = mQueues.emplace(name, q).first;+        }+        mQueueQueue.push(qi->second);+    }+    mStats.mActionsEnqueued++;+    qi->second->enqueue(std::move(action), deadline);+    mSize += 1;+    trim(qi->second);+}++size_t+Scheduler::runOne()+{+    if (mQueueQueue.empty())+    {+        return 0;+    }+    else+    {+        auto q = mQueueQueue.top();+        mQueueQueue.pop();+        trim(q);+        if (!q->isEmpty())+        {+            // We pass along a "minimum service time" floor that the service+            // time of the queue will be incremented to, at minimum.+            auto minServiceTime = mMaxServiceTime - mServiceTimeWindow;+            q->runNext(minServiceTime);+            mMaxServiceTime = std::max(q->serviceTime(), mMaxServiceTime);+            mSize -= 1;+            mStats.mActionsDequeued++;+            trim(q);+        }+        if (q->isEmpty())

Fixed thought differently (see my other comment -- it can't be in the runnable queue). I added a list-based secondary idle queue with expiry from the end.

graydon

comment created time in 17 days

Pull request review commentstellar/stellar-core

Bug 2304 scheduling cleanup

+// Copyright 2020 Stellar Development Foundation and contributors. Licensed+// under the Apache License, Version 2.0. See the COPYING file at the root+// of this distribution or at http://www.apache.org/licenses/LICENSE-2.0++#include "util/Scheduler.h"+#include <cassert>++namespace stellar+{+using nsecs = std::chrono::nanoseconds;++const Scheduler::RelativeDeadline Scheduler::NEVER_DROP;+const Scheduler::RelativeDeadline Scheduler::DROP_ONLY_UNDER_LOAD;+const Scheduler::AbsoluteDeadline Scheduler::ABS_NEVER_DROP;+const Scheduler::AbsoluteDeadline Scheduler::ABS_DROP_ONLY_UNDER_LOAD;++class Scheduler::Queue+{+    struct Element+    {+        Action mAction;+        Scheduler::AbsoluteDeadline mDeadline;+        Element(Action&& action, Scheduler::RelativeDeadline rel)+            : mAction(std::move(action))+            , mDeadline(rel == Scheduler::NEVER_DROP+                            ? Scheduler::ABS_NEVER_DROP+                            : (rel == Scheduler::DROP_ONLY_UNDER_LOAD+                                   ? Scheduler::ABS_DROP_ONLY_UNDER_LOAD+                                   : std::chrono::steady_clock::now() + rel))+        {+        }+        bool+        shouldDrop(bool overloaded, Scheduler::AbsoluteDeadline now,+                   Scheduler::Stats& stats) const+        {+            if (overloaded)+            {+                if (mDeadline != Scheduler::ABS_NEVER_DROP)+                {+                    stats.mActionsDroppedDueToOverload++;+                    return true;+                }+            }+            else+            {+                if (mDeadline != Scheduler::ABS_NEVER_DROP &&+                    mDeadline != Scheduler::ABS_DROP_ONLY_UNDER_LOAD &&+                    now > mDeadline)+                {+                    stats.mActionsDroppedDueToDeadline++;+                    return true;+                }+            }+            return false;+        }+    };++    std::string mName;+    nsecs mServiceTime{0};+    std::deque<Element> mActions;++  public:+    Queue(std::string const& name) : mName(name)+    {+    }++    std::string const&+    name() const+    {+        return mName;+    }++    nsecs+    serviceTime() const+    {+        return mServiceTime;+    }++    size_t+    size() const+    {+        return mActions.size();+    }++    bool+    isEmpty() const+    {+        return mActions.empty();+    }++    size_t+    tryTrim(size_t loadLimit, Scheduler::AbsoluteDeadline now,+            Scheduler::Stats& stats)+    {+        if (!mActions.empty())+        {+            bool overloaded = size() > loadLimit;+            if (mActions.front().shouldDrop(overloaded, now, stats))+            {+                mActions.pop_front();+                return 1;+            }+            if (mActions.back().shouldDrop(overloaded, now, stats))+            {+                mActions.pop_back();+                return 1;+            }+        }+        return 0;+    }++    void+    enqueue(Action&& action, Scheduler::RelativeDeadline deadline)+    {+        auto elt = Element(std::move(action), deadline);+        mActions.emplace_back(std::move(elt));+    }++    void+    runNext(nsecs minServiceTime)+    {+        auto before = std::chrono::steady_clock::now();+        Action action = std::move(mActions.front().mAction);+        mActions.pop_front();+        action();+        auto after = std::chrono::steady_clock::now();+        nsecs duration = std::chrono::duration_cast<nsecs>(after - before);+        mServiceTime = std::max(mServiceTime + duration, minServiceTime);+    }+};++Scheduler::Scheduler(size_t loadLimit,+                     std::chrono::nanoseconds serviceTimeWindow)+    : mQueueQueue(+          [](std::shared_ptr<Queue> a, std::shared_ptr<Queue> b) -> bool {+              return a->serviceTime() > b->serviceTime();+          })+    , mLoadLimit(loadLimit)+    , mServiceTimeWindow(serviceTimeWindow)+{+}++void+Scheduler::trim(std::shared_ptr<Queue> q)+{+    Scheduler::AbsoluteDeadline now = std::chrono::steady_clock::now();+    while (true)+    {+        size_t trimmed = q->tryTrim(mLoadLimit, now, mStats);+        if (trimmed == 0)+        {+            return;+        }+        mSize -= trimmed;+    }+}++void+Scheduler::enqueue(std::string&& name, Action&& action,+                   Scheduler::RelativeDeadline deadline)+{+    auto qi = mQueues.find(name);+    if (qi == mQueues.end())+    {+        if (mQueueCache.exists(name))+        {+            mStats.mQueuesActivatedFromCache++;+            qi = mQueues.emplace(name, mQueueCache.get(name)).first;+        }+        else+        {+            mStats.mQueuesActivatedFromFresh++;+            auto q = std::make_shared<Queue>(name);+            mQueueCache.put(name, q);+            qi = mQueues.emplace(name, q).first;+        }+        mQueueQueue.push(qi->second);+    }+    mStats.mActionsEnqueued++;+    qi->second->enqueue(std::move(action), deadline);+    mSize += 1;+    trim(qi->second);+}++size_t+Scheduler::runOne()+{+    if (mQueueQueue.empty())+    {+        return 0;+    }+    else+    {+        auto q = mQueueQueue.top();+        mQueueQueue.pop();+        trim(q);+        if (!q->isEmpty())+        {+            // We pass along a "minimum service time" floor that the service+            // time of the queue will be incremented to, at minimum.+            auto minServiceTime = mMaxServiceTime - mServiceTimeWindow;+            q->runNext(minServiceTime);+            mMaxServiceTime = std::max(q->serviceTime(), mMaxServiceTime);+            mSize -= 1;+            mStats.mActionsDequeued++;+            trim(q);

Fixed.

graydon

comment created time in 17 days

Pull request review commentstellar/stellar-core

Bug 2304 scheduling cleanup

+// Copyright 2020 Stellar Development Foundation and contributors. Licensed+// under the Apache License, Version 2.0. See the COPYING file at the root+// of this distribution or at http://www.apache.org/licenses/LICENSE-2.0++#include "util/Scheduler.h"+#include <cassert>++namespace stellar+{+using nsecs = std::chrono::nanoseconds;++const Scheduler::RelativeDeadline Scheduler::NEVER_DROP;+const Scheduler::RelativeDeadline Scheduler::DROP_ONLY_UNDER_LOAD;+const Scheduler::AbsoluteDeadline Scheduler::ABS_NEVER_DROP;+const Scheduler::AbsoluteDeadline Scheduler::ABS_DROP_ONLY_UNDER_LOAD;++class Scheduler::Queue+{+    struct Element+    {+        Action mAction;+        Scheduler::AbsoluteDeadline mDeadline;+        Element(Action&& action, Scheduler::RelativeDeadline rel)+            : mAction(std::move(action))+            , mDeadline(rel == Scheduler::NEVER_DROP+                            ? Scheduler::ABS_NEVER_DROP+                            : (rel == Scheduler::DROP_ONLY_UNDER_LOAD+                                   ? Scheduler::ABS_DROP_ONLY_UNDER_LOAD+                                   : std::chrono::steady_clock::now() + rel))+        {+        }+        bool+        shouldDrop(bool overloaded, Scheduler::AbsoluteDeadline now,+                   Scheduler::Stats& stats) const+        {+            if (overloaded)+            {+                if (mDeadline != Scheduler::ABS_NEVER_DROP)+                {+                    stats.mActionsDroppedDueToOverload++;+                    return true;+                }+            }+            else+            {+                if (mDeadline != Scheduler::ABS_NEVER_DROP &&+                    mDeadline != Scheduler::ABS_DROP_ONLY_UNDER_LOAD &&+                    now > mDeadline)+                {+                    stats.mActionsDroppedDueToDeadline++;+                    return true;+                }+            }+            return false;+        }+    };++    std::string mName;+    nsecs mServiceTime{0};+    std::deque<Element> mActions;++  public:+    Queue(std::string const& name) : mName(name)+    {+    }++    std::string const&+    name() const+    {+        return mName;+    }++    nsecs+    serviceTime() const+    {+        return mServiceTime;+    }++    size_t+    size() const+    {+        return mActions.size();+    }++    bool+    isEmpty() const+    {+        return mActions.empty();+    }++    size_t+    tryTrim(size_t loadLimit, Scheduler::AbsoluteDeadline now,+            Scheduler::Stats& stats)+    {+        if (!mActions.empty())+        {+            bool overloaded = size() > loadLimit;+            if (mActions.front().shouldDrop(overloaded, now, stats))+            {+                mActions.pop_front();+                return 1;+            }+            if (mActions.back().shouldDrop(overloaded, now, stats))+            {+                mActions.pop_back();+                return 1;+            }+        }+        return 0;+    }++    void+    enqueue(Action&& action, Scheduler::RelativeDeadline deadline)+    {+        auto elt = Element(std::move(action), deadline);+        mActions.emplace_back(std::move(elt));+    }++    void+    runNext(nsecs minServiceTime)+    {+        auto before = std::chrono::steady_clock::now();+        Action action = std::move(mActions.front().mAction);+        mActions.pop_front();+        action();+        auto after = std::chrono::steady_clock::now();+        nsecs duration = std::chrono::duration_cast<nsecs>(after - before);+        mServiceTime = std::max(mServiceTime + duration, minServiceTime);+    }+};++Scheduler::Scheduler(size_t loadLimit,+                     std::chrono::nanoseconds serviceTimeWindow)+    : mQueueQueue(+          [](std::shared_ptr<Queue> a, std::shared_ptr<Queue> b) -> bool {+              return a->serviceTime() > b->serviceTime();+          })+    , mLoadLimit(loadLimit)+    , mServiceTimeWindow(serviceTimeWindow)+{+}++void+Scheduler::trim(std::shared_ptr<Queue> q)+{+    Scheduler::AbsoluteDeadline now = std::chrono::steady_clock::now();

Ok. This is a little delicate because of the difference between std::chrono::system_clock and std::chrono::steady_clock. We use system_clock in VirtualClock, but .. it's only appropriate for certain uses (eg. NTP-driven synchronization with the consensus time of the network), not others (eg. timer delays and scheduling, which should really be monotonic). I will take a look at cleaning this up, but it might get a little involved.

(I don't think we want to switch the entire system to monotonic time since it might drift out of consensus time with the network; maybe we can split the interfaces in VirtualClock between those that deal with consensus times and those that deal with monotonic times? Hmm..)

graydon

comment created time in 17 days

Pull request review commentstellar/stellar-core

Bug 2304 scheduling cleanup

+#pragma once++// Copyright 2020 Stellar Development Foundation and contributors. Licensed+// under the Apache License, Version 2.0. See the COPYING file at the root+// of this distribution or at http://www.apache.org/licenses/LICENSE-2.0++#include "util/RandomEvictionCache.h"++#include <chrono>+#include <functional>+#include <map>+#include <memory>+#include <queue>++// This class implements a multi-queue scheduler for "actions" (deferred-work+// callbacks that some subsystem wants to run "soon" on the main thread),+// attempting to satisfy a variety of constraints and goals simultaneously:+//+//   0. Non-preemption: We have no ability to preempt actions while they're+//      running so this is a hard constraint, not just a goal.+//+//   1. Serial execution: within a queue, actions must run in the order they are+//      enqueued (or a subsequence thereof, if there are dropped actions) so+//      that clients can use queue-names to sequence logically-sequential+//      actions. Scheduling happens between queues but not within them.+//+//   2. Non-starvation: Everything enqueued (and not dropped) should run+//      eventually and the longer something waits, generally the more likely it+//      is to run.+//+//   3. Fairness: time given to each queue should be roughly equal, over time.+//+//   4. Deadlines and load-shedding: some actions are best-effort and should be+//      dropped when the system is under load, and others have deadlines after+//      which there's no point running them, they're just a waste. We want to+//      be able to drop either so they don't interfere with necessary actions.+//+//   5. Simplicity: clients of the scheduler shouldn't need to adjust a lot of+//      knobs, and the implementation should be as simple as possible and+//      exhibit as fixed a behaviour as possible. We don't want surprises in+//      dynamics.+//+//   6. Non-anticipation: many scheduling algorithms require more information+//      than we have, or are so-called "anticipation" algorithms that need to+//      know (or guess) the size or duration of the next action. We certainly+//      don't know these, and while we _could_ try to estimate them, the+//      algorithms that need anticipation can go wrong if fed bad estimates;+//      we'd prefer a non-anticipation (or "blind") approach that lacks this+//      risk.+//+// Given these goals and constraints, our current best guess is a lightly+// customized algorithm in a family called FB ("foreground-background" or+// "feedback") or LAS ("least attained service") or SET ("shortest elapsed+// time").+//+// For an idea with so many names, the algorithm is utterly simple: each queue+// tracks the accumulated runtime of all actions it has run, and on each step we+// run the next action in the queue with the lowest accumulated runtime (the+// queues themselves are therefore stored in an outer priority queue to enable+// quick retrieval of the next lowest queue).+//+// This has a few interesting properties:+//+//   - A low-frequency action (eg. a ledger close) will usually be scheduled+//     immediately, as it has built up some "credit" in its queue in the form of+//     zero new runtime in the period since its last run, lowering its+//     accumulation relative to other queues.+//+//   - A continuously-rescheduled multipart action (eg. bucket-apply or+//     catchup-replay) will quickly consume any "credit" it might have and be+//     throttled back to an equal time-share with other queues: since it spent a+//     long time on-CPU it will have to wait at least until everyone else has+//     had a similar amount of time before going again.+//+//   - If a very-short-duration action occurs it has little affect on anything+//     else, either its own queue or others, in the relative scheduling order. A+//     queue that's got lots of very-small actions (eg. just issuing a pile of+//     async IOs or writing to in-memory buffers) may run them _all_ before+//     anyone else gets to go, but that's ok precisely because they're very+//     small actions. The scheduler will shift to other queues exactly when a+//     queue uses up a _noticable amount of time_ relative to others.+//+// This is an old algorithm that was not used for a long time out of fear that+// it would starve long-duration actions; but it's received renewed study in+// recent years based on the observation that such starvation only occurs in

I don't pretend to be a queueing theory expert, but that's how I interpret the commentary in this survey paper anyway: http://users.cms.caltech.edu/~adamw/papers/fb-survey.pdf

graydon

comment created time in 17 days

Pull request review commentstellar/stellar-core

Bug 2304 scheduling cleanup

+#pragma once++// Copyright 2020 Stellar Development Foundation and contributors. Licensed+// under the Apache License, Version 2.0. See the COPYING file at the root+// of this distribution or at http://www.apache.org/licenses/LICENSE-2.0++#include "util/RandomEvictionCache.h"++#include <chrono>+#include <functional>+#include <map>+#include <memory>+#include <queue>++// This class implements a multi-queue scheduler for "actions" (deferred-work+// callbacks that some subsystem wants to run "soon" on the main thread),+// attempting to satisfy a variety of constraints and goals simultaneously:+//+//   0. Non-preemption: We have no ability to preempt actions while they're+//      running so this is a hard constraint, not just a goal.+//+//   1. Serial execution: within a queue, actions must run in the order they are+//      enqueued (or a subsequence thereof, if there are dropped actions) so+//      that clients can use queue-names to sequence logically-sequential+//      actions. Scheduling happens between queues but not within them.+//+//   2. Non-starvation: Everything enqueued (and not dropped) should run+//      eventually and the longer something waits, generally the more likely it+//      is to run.+//+//   3. Fairness: time given to each queue should be roughly equal, over time.+//+//   4. Deadlines and load-shedding: some actions are best-effort and should be+//      dropped when the system is under load, and others have deadlines after+//      which there's no point running them, they're just a waste. We want to+//      be able to drop either so they don't interfere with necessary actions.+//+//   5. Simplicity: clients of the scheduler shouldn't need to adjust a lot of+//      knobs, and the implementation should be as simple as possible and+//      exhibit as fixed a behaviour as possible. We don't want surprises in+//      dynamics.+//+//   6. Non-anticipation: many scheduling algorithms require more information+//      than we have, or are so-called "anticipation" algorithms that need to+//      know (or guess) the size or duration of the next action. We certainly+//      don't know these, and while we _could_ try to estimate them, the+//      algorithms that need anticipation can go wrong if fed bad estimates;+//      we'd prefer a non-anticipation (or "blind") approach that lacks this+//      risk.+//+// Given these goals and constraints, our current best guess is a lightly+// customized algorithm in a family called FB ("foreground-background" or+// "feedback") or LAS ("least attained service") or SET ("shortest elapsed+// time").+//+// For an idea with so many names, the algorithm is utterly simple: each queue+// tracks the accumulated runtime of all actions it has run, and on each step we+// run the next action in the queue with the lowest accumulated runtime (the+// queues themselves are therefore stored in an outer priority queue to enable+// quick retrieval of the next lowest queue).+//+// This has a few interesting properties:+//+//   - A low-frequency action (eg. a ledger close) will usually be scheduled+//     immediately, as it has built up some "credit" in its queue in the form of+//     zero new runtime in the period since its last run, lowering its+//     accumulation relative to other queues.+//+//   - A continuously-rescheduled multipart action (eg. bucket-apply or+//     catchup-replay) will quickly consume any "credit" it might have and be+//     throttled back to an equal time-share with other queues: since it spent a+//     long time on-CPU it will have to wait at least until everyone else has+//     had a similar amount of time before going again.+//+//   - If a very-short-duration action occurs it has little affect on anything+//     else, either its own queue or others, in the relative scheduling order. A+//     queue that's got lots of very-small actions (eg. just issuing a pile of+//     async IOs or writing to in-memory buffers) may run them _all_ before+//     anyone else gets to go, but that's ok precisely because they're very+//     small actions. The scheduler will shift to other queues exactly when a+//     queue uses up a _noticable amount of time_ relative to others.

Fixed.

graydon

comment created time in 17 days

Pull request review commentstellar/stellar-core

Bug 2304 scheduling cleanup

+// Copyright 2020 Stellar Development Foundation and contributors. Licensed+// under the Apache License, Version 2.0. See the COPYING file at the root+// of this distribution or at http://www.apache.org/licenses/LICENSE-2.0++#include "util/Scheduler.h"+#include <cassert>++namespace stellar+{+using nsecs = std::chrono::nanoseconds;++const Scheduler::RelativeDeadline Scheduler::NEVER_DROP;+const Scheduler::RelativeDeadline Scheduler::DROP_ONLY_UNDER_LOAD;+const Scheduler::AbsoluteDeadline Scheduler::ABS_NEVER_DROP;+const Scheduler::AbsoluteDeadline Scheduler::ABS_DROP_ONLY_UNDER_LOAD;++class Scheduler::Queue+{+    struct Element+    {+        Action mAction;+        Scheduler::AbsoluteDeadline mDeadline;+        Element(Action&& action, Scheduler::RelativeDeadline rel)+            : mAction(std::move(action))+            , mDeadline(rel == Scheduler::NEVER_DROP+                            ? Scheduler::ABS_NEVER_DROP+                            : (rel == Scheduler::DROP_ONLY_UNDER_LOAD+                                   ? Scheduler::ABS_DROP_ONLY_UNDER_LOAD+                                   : std::chrono::steady_clock::now() + rel))+        {+        }+        bool+        shouldDrop(bool overloaded, Scheduler::AbsoluteDeadline now,+                   Scheduler::Stats& stats) const+        {+            if (overloaded)+            {+                if (mDeadline != Scheduler::ABS_NEVER_DROP)+                {+                    stats.mActionsDroppedDueToOverload++;+                    return true;+                }+            }+            else+            {+                if (mDeadline != Scheduler::ABS_NEVER_DROP &&+                    mDeadline != Scheduler::ABS_DROP_ONLY_UNDER_LOAD &&+                    now > mDeadline)+                {+                    stats.mActionsDroppedDueToDeadline++;+                    return true;+                }+            }+            return false;+        }+    };++    std::string mName;+    nsecs mServiceTime{0};+    std::deque<Element> mActions;++  public:+    Queue(std::string const& name) : mName(name)+    {+    }++    std::string const&+    name() const+    {+        return mName;+    }++    nsecs+    serviceTime() const+    {+        return mServiceTime;+    }++    size_t+    size() const+    {+        return mActions.size();+    }++    bool+    isEmpty() const+    {+        return mActions.empty();+    }++    size_t+    tryTrim(size_t loadLimit, Scheduler::AbsoluteDeadline now,+            Scheduler::Stats& stats)+    {+        if (!mActions.empty())+        {+            bool overloaded = size() > loadLimit;+            if (mActions.front().shouldDrop(overloaded, now, stats))+            {+                mActions.pop_front();+                return 1;+            }+            if (mActions.back().shouldDrop(overloaded, now, stats))+            {+                mActions.pop_back();+                return 1;+            }+        }+        return 0;+    }++    void+    enqueue(Action&& action, Scheduler::RelativeDeadline deadline)+    {+        auto elt = Element(std::move(action), deadline);+        mActions.emplace_back(std::move(elt));+    }++    void+    runNext(nsecs minServiceTime)+    {+        auto before = std::chrono::steady_clock::now();+        Action action = std::move(mActions.front().mAction);+        mActions.pop_front();+        action();+        auto after = std::chrono::steady_clock::now();+        nsecs duration = std::chrono::duration_cast<nsecs>(after - before);+        mServiceTime = std::max(mServiceTime + duration, minServiceTime);+    }+};++Scheduler::Scheduler(size_t loadLimit,+                     std::chrono::nanoseconds serviceTimeWindow)+    : mQueueQueue(+          [](std::shared_ptr<Queue> a, std::shared_ptr<Queue> b) -> bool {+              return a->serviceTime() > b->serviceTime();+          })+    , mLoadLimit(loadLimit)+    , mServiceTimeWindow(serviceTimeWindow)+{+}++void+Scheduler::trim(std::shared_ptr<Queue> q)+{+    Scheduler::AbsoluteDeadline now = std::chrono::steady_clock::now();+    while (true)+    {+        size_t trimmed = q->tryTrim(mLoadLimit, now, mStats);+        if (trimmed == 0)+        {+            return;+        }+        mSize -= trimmed;+    }+}++void+Scheduler::enqueue(std::string&& name, Action&& action,+                   Scheduler::RelativeDeadline deadline)+{+    auto qi = mQueues.find(name);+    if (qi == mQueues.end())+    {+        if (mQueueCache.exists(name))+        {+            mStats.mQueuesActivatedFromCache++;+            qi = mQueues.emplace(name, mQueueCache.get(name)).first;+        }+        else+        {+            mStats.mQueuesActivatedFromFresh++;+            auto q = std::make_shared<Queue>(name);+            mQueueCache.put(name, q);+            qi = mQueues.emplace(name, q).first;+        }+        mQueueQueue.push(qi->second);+    }+    mStats.mActionsEnqueued++;+    qi->second->enqueue(std::move(action), deadline);+    mSize += 1;+    trim(qi->second);

Fixed.

graydon

comment created time in 17 days

Pull request review commentstellar/stellar-core

Bug 2304 scheduling cleanup

+#pragma once++// Copyright 2020 Stellar Development Foundation and contributors. Licensed+// under the Apache License, Version 2.0. See the COPYING file at the root+// of this distribution or at http://www.apache.org/licenses/LICENSE-2.0++#include "util/RandomEvictionCache.h"++#include <chrono>+#include <functional>+#include <map>+#include <memory>+#include <queue>++// This class implements a multi-queue scheduler for "actions" (deferred-work+// callbacks that some subsystem wants to run "soon" on the main thread),+// attempting to satisfy a variety of constraints and goals simultaneously:+//+//   0. Non-preemption: We have no ability to preempt actions while they're+//      running so this is a hard constraint, not just a goal.+//+//   1. Serial execution: within a queue, actions must run in the order they are+//      enqueued (or a subsequence thereof, if there are dropped actions) so+//      that clients can use queue-names to sequence logically-sequential+//      actions. Scheduling happens between queues but not within them.+//+//   2. Non-starvation: Everything enqueued (and not dropped) should run+//      eventually and the longer something waits, generally the more likely it+//      is to run.+//+//   3. Fairness: time given to each queue should be roughly equal, over time.+//+//   4. Deadlines and load-shedding: some actions are best-effort and should be+//      dropped when the system is under load, and others have deadlines after+//      which there's no point running them, they're just a waste. We want to+//      be able to drop either so they don't interfere with necessary actions.+//+//   5. Simplicity: clients of the scheduler shouldn't need to adjust a lot of+//      knobs, and the implementation should be as simple as possible and+//      exhibit as fixed a behaviour as possible. We don't want surprises in+//      dynamics.+//+//   6. Non-anticipation: many scheduling algorithms require more information+//      than we have, or are so-called "anticipation" algorithms that need to+//      know (or guess) the size or duration of the next action. We certainly+//      don't know these, and while we _could_ try to estimate them, the+//      algorithms that need anticipation can go wrong if fed bad estimates;+//      we'd prefer a non-anticipation (or "blind") approach that lacks this+//      risk.+//+// Given these goals and constraints, our current best guess is a lightly+// customized algorithm in a family called FB ("foreground-background" or+// "feedback") or LAS ("least attained service") or SET ("shortest elapsed+// time").+//+// For an idea with so many names, the algorithm is utterly simple: each queue+// tracks the accumulated runtime of all actions it has run, and on each step we+// run the next action in the queue with the lowest accumulated runtime (the+// queues themselves are therefore stored in an outer priority queue to enable+// quick retrieval of the next lowest queue).+//+// This has a few interesting properties:+//+//   - A low-frequency action (eg. a ledger close) will usually be scheduled+//     immediately, as it has built up some "credit" in its queue in the form of+//     zero new runtime in the period since its last run, lowering its+//     accumulation relative to other queues.+//+//   - A continuously-rescheduled multipart action (eg. bucket-apply or+//     catchup-replay) will quickly consume any "credit" it might have and be+//     throttled back to an equal time-share with other queues: since it spent a+//     long time on-CPU it will have to wait at least until everyone else has+//     had a similar amount of time before going again.+//+//   - If a very-short-duration action occurs it has little affect on anything+//     else, either its own queue or others, in the relative scheduling order. A+//     queue that's got lots of very-small actions (eg. just issuing a pile of+//     async IOs or writing to in-memory buffers) may run them _all_ before+//     anyone else gets to go, but that's ok precisely because they're very+//     small actions. The scheduler will shift to other queues exactly when a+//     queue uses up a _noticable amount of time_ relative to others.+//+// This is an old algorithm that was not used for a long time out of fear that+// it would starve long-duration actions; but it's received renewed study in+// recent years based on the observation that such starvation only occurs in+// certain theoretically-tidy but practically-rare distributions of action+// durations, and the distributions that occur in reality behave quite well+// under it.+//+// The customizations we make are minor:+//+//   - We put a floor on the cumulative durations; low cumulative durations+//     represent a form of "credit" that a queue might use in a burst if it were+//     to be suddenly full of ready actions, or continuously-reschedule itself,+//     so we make sure no queue can have less than some (steadily rising) floor.+//+//   - We encode deadlines in actions: those with a positive deadline are always+//     dropped -- unconditionally -- if they are ready to run after their+//     deadline, and may also be dropped conditionally if the system is under+//     load / the queue is too long. The sentinel "minimum deadline" value is+//     reserved to indicate a never-droppable action. To encode a "best effort"+//     action with no particular deadline that will be dropped only under load,+//     we set the deadline to the the maximal duration value.++namespace stellar+{++using Action = std::function<void()>;++class Scheduler+{+  public:+    struct Stats+    {+        size_t mActionsEnqueued{0};+        size_t mActionsDequeued{0};+        size_t mActionsDroppedDueToOverload{0};+        size_t mActionsDroppedDueToDeadline{0};+        size_t mQueuesActivatedFromFresh{0};+        size_t mQueuesActivatedFromCache{0};+        size_t mQueuesSuspended{0};+    };++    using RelativeDeadline = std::chrono::nanoseconds;+    using AbsoluteDeadline = std::chrono::steady_clock::time_point;+    static constexpr RelativeDeadline NEVER_DROP = RelativeDeadline::min();+    static constexpr RelativeDeadline DROP_ONLY_UNDER_LOAD =+        RelativeDeadline::max();+    static constexpr AbsoluteDeadline ABS_NEVER_DROP = AbsoluteDeadline::min();+    static constexpr AbsoluteDeadline ABS_DROP_ONLY_UNDER_LOAD =+        AbsoluteDeadline::max();++  private:+    class Queue;+    using Qptr = std::shared_ptr<Queue>;+    std::map<std::string, Qptr> mQueues;+    std::priority_queue<Qptr, std::vector<Qptr>,+                        std::function<bool(Qptr, Qptr)>>+        mQueueQueue;++    Stats mStats;++    // A queue is considered "overloaded" if its size is above the load limit.+    // This is a per-queue limit.+    size_t const mLoadLimit;++    // The serviceTime of any queue will always be advanced to at least this+    // duration behind mMaxServiceTime, to limit the amount of "suplus" service

Fixed.

graydon

comment created time in 17 days

Pull request review commentstellar/stellar-core

Bug 2304 scheduling cleanup

+#pragma once++// Copyright 2020 Stellar Development Foundation and contributors. Licensed+// under the Apache License, Version 2.0. See the COPYING file at the root+// of this distribution or at http://www.apache.org/licenses/LICENSE-2.0++#include "util/RandomEvictionCache.h"++#include <chrono>+#include <functional>+#include <map>+#include <memory>+#include <queue>++// This class implements a multi-queue scheduler for "actions" (deferred-work+// callbacks that some subsystem wants to run "soon" on the main thread),+// attempting to satisfy a variety of constraints and goals simultaneously:+//+//   0. Non-preemption: We have no ability to preempt actions while they're+//      running so this is a hard constraint, not just a goal.+//+//   1. Serial execution: within a queue, actions must run in the order they are+//      enqueued (or a subsequence thereof, if there are dropped actions) so+//      that clients can use queue-names to sequence logically-sequential+//      actions. Scheduling happens between queues but not within them.+//+//   2. Non-starvation: Everything enqueued (and not dropped) should run+//      eventually and the longer something waits, generally the more likely it+//      is to run.+//+//   3. Fairness: time given to each queue should be roughly equal, over time.+//+//   4. Deadlines and load-shedding: some actions are best-effort and should be+//      dropped when the system is under load, and others have deadlines after+//      which there's no point running them, they're just a waste. We want to+//      be able to drop either so they don't interfere with necessary actions.+//+//   5. Simplicity: clients of the scheduler shouldn't need to adjust a lot of+//      knobs, and the implementation should be as simple as possible and+//      exhibit as fixed a behaviour as possible. We don't want surprises in+//      dynamics.+//+//   6. Non-anticipation: many scheduling algorithms require more information+//      than we have, or are so-called "anticipation" algorithms that need to+//      know (or guess) the size or duration of the next action. We certainly+//      don't know these, and while we _could_ try to estimate them, the+//      algorithms that need anticipation can go wrong if fed bad estimates;+//      we'd prefer a non-anticipation (or "blind") approach that lacks this+//      risk.+//+// Given these goals and constraints, our current best guess is a lightly+// customized algorithm in a family called FB ("foreground-background" or+// "feedback") or LAS ("least attained service") or SET ("shortest elapsed+// time").+//+// For an idea with so many names, the algorithm is utterly simple: each queue+// tracks the accumulated runtime of all actions it has run, and on each step we+// run the next action in the queue with the lowest accumulated runtime (the+// queues themselves are therefore stored in an outer priority queue to enable+// quick retrieval of the next lowest queue).+//+// This has a few interesting properties:+//+//   - A low-frequency action (eg. a ledger close) will usually be scheduled+//     immediately, as it has built up some "credit" in its queue in the form of+//     zero new runtime in the period since its last run, lowering its+//     accumulation relative to other queues.+//+//   - A continuously-rescheduled multipart action (eg. bucket-apply or+//     catchup-replay) will quickly consume any "credit" it might have and be+//     throttled back to an equal time-share with other queues: since it spent a+//     long time on-CPU it will have to wait at least until everyone else has+//     had a similar amount of time before going again.+//+//   - If a very-short-duration action occurs it has little affect on anything+//     else, either its own queue or others, in the relative scheduling order. A+//     queue that's got lots of very-small actions (eg. just issuing a pile of+//     async IOs or writing to in-memory buffers) may run them _all_ before+//     anyone else gets to go, but that's ok precisely because they're very+//     small actions. The scheduler will shift to other queues exactly when a+//     queue uses up a _noticable amount of time_ relative to others.+//+// This is an old algorithm that was not used for a long time out of fear that+// it would starve long-duration actions; but it's received renewed study in+// recent years based on the observation that such starvation only occurs in+// certain theoretically-tidy but practically-rare distributions of action+// durations, and the distributions that occur in reality behave quite well+// under it.+//+// The customizations we make are minor:+//+//   - We put a floor on the cumulative durations; low cumulative durations+//     represent a form of "credit" that a queue might use in a burst if it were+//     to be suddenly full of ready actions, or continuously-reschedule itself,+//     so we make sure no queue can have less than some (steadily rising) floor.+//+//   - We encode deadlines in actions: those with a positive deadline are always+//     dropped -- unconditionally -- if they are ready to run after their+//     deadline, and may also be dropped conditionally if the system is under+//     load / the queue is too long. The sentinel "minimum deadline" value is+//     reserved to indicate a never-droppable action. To encode a "best effort"+//     action with no particular deadline that will be dropped only under load,+//     we set the deadline to the the maximal duration value.++namespace stellar+{++using Action = std::function<void()>;++class Scheduler+{+  public:+    struct Stats+    {+        size_t mActionsEnqueued{0};+        size_t mActionsDequeued{0};+        size_t mActionsDroppedDueToOverload{0};+        size_t mActionsDroppedDueToDeadline{0};+        size_t mQueuesActivatedFromFresh{0};+        size_t mQueuesActivatedFromCache{0};+        size_t mQueuesSuspended{0};+    };++    using RelativeDeadline = std::chrono::nanoseconds;+    using AbsoluteDeadline = std::chrono::steady_clock::time_point;+    static constexpr RelativeDeadline NEVER_DROP = RelativeDeadline::min();+    static constexpr RelativeDeadline DROP_ONLY_UNDER_LOAD =+        RelativeDeadline::max();+    static constexpr AbsoluteDeadline ABS_NEVER_DROP = AbsoluteDeadline::min();+    static constexpr AbsoluteDeadline ABS_DROP_ONLY_UNDER_LOAD =+        AbsoluteDeadline::max();++  private:+    class Queue;+    using Qptr = std::shared_ptr<Queue>;+    std::map<std::string, Qptr> mQueues;+    std::priority_queue<Qptr, std::vector<Qptr>,+                        std::function<bool(Qptr, Qptr)>>+        mQueueQueue;++    Stats mStats;++    // A queue is considered "overloaded" if its size is above the load limit.+    // This is a per-queue limit.+    size_t const mLoadLimit;++    // The serviceTime of any queue will always be advanced to at least this+    // duration behind mMaxServiceTime, to limit the amount of "suplus" service+    // time any given queue can acucmulate if it happens to go idle a long time.+    std::chrono::nanoseconds const mServiceTimeWindow;++    // Largest serviceTime seen in any queue. This number will continuously+    // advance as queues are serviced; it exists to serve as the upper limit+    // of the window, from which mServiceTimeWindow is subtracted to derive+    // the lower limit.+    std::chrono::nanoseconds mMaxServiceTime{0};++    // Sum of sizes of all the active queues. Maintained as items are enqueued+    // or run.+    size_t mSize{0};++    // We cache recent queues for a while after we empty them, so that we can+    // maintain a service-level estimate spanning their repeated activations.+    RandomEvictionCache<std::string, Qptr> mQueueCache{1024};

Fixed.

graydon

comment created time in 17 days

Pull request review commentstellar/stellar-core

Bug 2304 scheduling cleanup

+#pragma once++// Copyright 2020 Stellar Development Foundation and contributors. Licensed+// under the Apache License, Version 2.0. See the COPYING file at the root+// of this distribution or at http://www.apache.org/licenses/LICENSE-2.0++#include "util/RandomEvictionCache.h"++#include <chrono>+#include <functional>+#include <map>+#include <memory>+#include <queue>++// This class implements a multi-queue scheduler for "actions" (deferred-work+// callbacks that some subsystem wants to run "soon" on the main thread),+// attempting to satisfy a variety of constraints and goals simultaneously:+//+//   0. Non-preemption: We have no ability to preempt actions while they're+//      running so this is a hard constraint, not just a goal.+//+//   1. Serial execution: within a queue, actions must run in the order they are+//      enqueued (or a subsequence thereof, if there are dropped actions) so+//      that clients can use queue-names to sequence logically-sequential+//      actions. Scheduling happens between queues but not within them.+//+//   2. Non-starvation: Everything enqueued (and not dropped) should run+//      eventually and the longer something waits, generally the more likely it+//      is to run.+//+//   3. Fairness: time given to each queue should be roughly equal, over time.+//+//   4. Deadlines and load-shedding: some actions are best-effort and should be+//      dropped when the system is under load, and others have deadlines after+//      which there's no point running them, they're just a waste. We want to+//      be able to drop either so they don't interfere with necessary actions.+//+//   5. Simplicity: clients of the scheduler shouldn't need to adjust a lot of+//      knobs, and the implementation should be as simple as possible and+//      exhibit as fixed a behaviour as possible. We don't want surprises in+//      dynamics.+//+//   6. Non-anticipation: many scheduling algorithms require more information+//      than we have, or are so-called "anticipation" algorithms that need to+//      know (or guess) the size or duration of the next action. We certainly+//      don't know these, and while we _could_ try to estimate them, the+//      algorithms that need anticipation can go wrong if fed bad estimates;+//      we'd prefer a non-anticipation (or "blind") approach that lacks this+//      risk.+//+// Given these goals and constraints, our current best guess is a lightly+// customized algorithm in a family called FB ("foreground-background" or+// "feedback") or LAS ("least attained service") or SET ("shortest elapsed+// time").+//+// For an idea with so many names, the algorithm is utterly simple: each queue+// tracks the accumulated runtime of all actions it has run, and on each step we+// run the next action in the queue with the lowest accumulated runtime (the+// queues themselves are therefore stored in an outer priority queue to enable+// quick retrieval of the next lowest queue).+//+// This has a few interesting properties:+//+//   - A low-frequency action (eg. a ledger close) will usually be scheduled+//     immediately, as it has built up some "credit" in its queue in the form of+//     zero new runtime in the period since its last run, lowering its+//     accumulation relative to other queues.+//+//   - A continuously-rescheduled multipart action (eg. bucket-apply or+//     catchup-replay) will quickly consume any "credit" it might have and be+//     throttled back to an equal time-share with other queues: since it spent a+//     long time on-CPU it will have to wait at least until everyone else has+//     had a similar amount of time before going again.+//+//   - If a very-short-duration action occurs it has little affect on anything+//     else, either its own queue or others, in the relative scheduling order. A+//     queue that's got lots of very-small actions (eg. just issuing a pile of+//     async IOs or writing to in-memory buffers) may run them _all_ before+//     anyone else gets to go, but that's ok precisely because they're very+//     small actions. The scheduler will shift to other queues exactly when a+//     queue uses up a _noticable amount of time_ relative to others.+//+// This is an old algorithm that was not used for a long time out of fear that+// it would starve long-duration actions; but it's received renewed study in+// recent years based on the observation that such starvation only occurs in+// certain theoretically-tidy but practically-rare distributions of action+// durations, and the distributions that occur in reality behave quite well+// under it.+//+// The customizations we make are minor:+//+//   - We put a floor on the cumulative durations; low cumulative durations+//     represent a form of "credit" that a queue might use in a burst if it were+//     to be suddenly full of ready actions, or continuously-reschedule itself,+//     so we make sure no queue can have less than some (steadily rising) floor.+//+//   - We encode deadlines in actions: those with a positive deadline are always+//     dropped -- unconditionally -- if they are ready to run after their+//     deadline, and may also be dropped conditionally if the system is under+//     load / the queue is too long. The sentinel "minimum deadline" value is+//     reserved to indicate a never-droppable action. To encode a "best effort"+//     action with no particular deadline that will be dropped only under load,+//     we set the deadline to the the maximal duration value.++namespace stellar+{++using Action = std::function<void()>;++class Scheduler+{+  public:+    struct Stats+    {+        size_t mActionsEnqueued{0};+        size_t mActionsDequeued{0};+        size_t mActionsDroppedDueToOverload{0};+        size_t mActionsDroppedDueToDeadline{0};+        size_t mQueuesActivatedFromFresh{0};+        size_t mQueuesActivatedFromCache{0};+        size_t mQueuesSuspended{0};+    };++    using RelativeDeadline = std::chrono::nanoseconds;+    using AbsoluteDeadline = std::chrono::steady_clock::time_point;+    static constexpr RelativeDeadline NEVER_DROP = RelativeDeadline::min();+    static constexpr RelativeDeadline DROP_ONLY_UNDER_LOAD =+        RelativeDeadline::max();+    static constexpr AbsoluteDeadline ABS_NEVER_DROP = AbsoluteDeadline::min();+    static constexpr AbsoluteDeadline ABS_DROP_ONLY_UNDER_LOAD =+        AbsoluteDeadline::max();++  private:+    class Queue;+    using Qptr = std::shared_ptr<Queue>;+    std::map<std::string, Qptr> mQueues;

With changes to idle-expiry, this now does hold all action queues, so renamed.

graydon

comment created time in 17 days

Pull request review commentstellar/stellar-core

Bug 2304 scheduling cleanup

+#pragma once++// Copyright 2020 Stellar Development Foundation and contributors. Licensed+// under the Apache License, Version 2.0. See the COPYING file at the root+// of this distribution or at http://www.apache.org/licenses/LICENSE-2.0++#include "util/RandomEvictionCache.h"++#include <chrono>+#include <functional>+#include <map>+#include <memory>+#include <queue>++// This class implements a multi-queue scheduler for "actions" (deferred-work+// callbacks that some subsystem wants to run "soon" on the main thread),+// attempting to satisfy a variety of constraints and goals simultaneously:+//+//   0. Non-preemption: We have no ability to preempt actions while they're+//      running so this is a hard constraint, not just a goal.+//+//   1. Serial execution: within a queue, actions must run in the order they are+//      enqueued (or a subsequence thereof, if there are dropped actions) so+//      that clients can use queue-names to sequence logically-sequential+//      actions. Scheduling happens between queues but not within them.+//+//   2. Non-starvation: Everything enqueued (and not dropped) should run+//      eventually and the longer something waits, generally the more likely it+//      is to run.+//+//   3. Fairness: time given to each queue should be roughly equal, over time.+//+//   4. Deadlines and load-shedding: some actions are best-effort and should be+//      dropped when the system is under load, and others have deadlines after+//      which there's no point running them, they're just a waste. We want to+//      be able to drop either so they don't interfere with necessary actions.+//+//   5. Simplicity: clients of the scheduler shouldn't need to adjust a lot of+//      knobs, and the implementation should be as simple as possible and+//      exhibit as fixed a behaviour as possible. We don't want surprises in+//      dynamics.+//+//   6. Non-anticipation: many scheduling algorithms require more information+//      than we have, or are so-called "anticipation" algorithms that need to+//      know (or guess) the size or duration of the next action. We certainly+//      don't know these, and while we _could_ try to estimate them, the+//      algorithms that need anticipation can go wrong if fed bad estimates;+//      we'd prefer a non-anticipation (or "blind") approach that lacks this+//      risk.+//+// Given these goals and constraints, our current best guess is a lightly+// customized algorithm in a family called FB ("foreground-background" or+// "feedback") or LAS ("least attained service") or SET ("shortest elapsed+// time").+//+// For an idea with so many names, the algorithm is utterly simple: each queue+// tracks the accumulated runtime of all actions it has run, and on each step we+// run the next action in the queue with the lowest accumulated runtime (the+// queues themselves are therefore stored in an outer priority queue to enable+// quick retrieval of the next lowest queue).+//+// This has a few interesting properties:+//+//   - A low-frequency action (eg. a ledger close) will usually be scheduled+//     immediately, as it has built up some "credit" in its queue in the form of+//     zero new runtime in the period since its last run, lowering its+//     accumulation relative to other queues.+//+//   - A continuously-rescheduled multipart action (eg. bucket-apply or+//     catchup-replay) will quickly consume any "credit" it might have and be+//     throttled back to an equal time-share with other queues: since it spent a+//     long time on-CPU it will have to wait at least until everyone else has+//     had a similar amount of time before going again.+//+//   - If a very-short-duration action occurs it has little affect on anything+//     else, either its own queue or others, in the relative scheduling order. A+//     queue that's got lots of very-small actions (eg. just issuing a pile of+//     async IOs or writing to in-memory buffers) may run them _all_ before+//     anyone else gets to go, but that's ok precisely because they're very+//     small actions. The scheduler will shift to other queues exactly when a+//     queue uses up a _noticable amount of time_ relative to others.+//+// This is an old algorithm that was not used for a long time out of fear that+// it would starve long-duration actions; but it's received renewed study in+// recent years based on the observation that such starvation only occurs in+// certain theoretically-tidy but practically-rare distributions of action+// durations, and the distributions that occur in reality behave quite well+// under it.+//+// The customizations we make are minor:+//+//   - We put a floor on the cumulative durations; low cumulative durations+//     represent a form of "credit" that a queue might use in a burst if it were+//     to be suddenly full of ready actions, or continuously-reschedule itself,+//     so we make sure no queue can have less than some (steadily rising) floor.+//+//   - We encode deadlines in actions: those with a positive deadline are always+//     dropped -- unconditionally -- if they are ready to run after their+//     deadline, and may also be dropped conditionally if the system is under+//     load / the queue is too long. The sentinel "minimum deadline" value is+//     reserved to indicate a never-droppable action. To encode a "best effort"+//     action with no particular deadline that will be dropped only under load,+//     we set the deadline to the the maximal duration value.++namespace stellar+{++using Action = std::function<void()>;++class Scheduler+{+  public:+    struct Stats+    {+        size_t mActionsEnqueued{0};+        size_t mActionsDequeued{0};+        size_t mActionsDroppedDueToOverload{0};+        size_t mActionsDroppedDueToDeadline{0};+        size_t mQueuesActivatedFromFresh{0};+        size_t mQueuesActivatedFromCache{0};+        size_t mQueuesSuspended{0};+    };++    using RelativeDeadline = std::chrono::nanoseconds;+    using AbsoluteDeadline = std::chrono::steady_clock::time_point;+    static constexpr RelativeDeadline NEVER_DROP = RelativeDeadline::min();+    static constexpr RelativeDeadline DROP_ONLY_UNDER_LOAD =+        RelativeDeadline::max();+    static constexpr AbsoluteDeadline ABS_NEVER_DROP = AbsoluteDeadline::min();+    static constexpr AbsoluteDeadline ABS_DROP_ONLY_UNDER_LOAD =+        AbsoluteDeadline::max();++  private:+    class Queue;

Fixed.

graydon

comment created time in 17 days

Pull request review commentstellar/stellar-core

Bug 2304 scheduling cleanup

+// Copyright 2020 Stellar Development Foundation and contributors. Licensed+// under the Apache License, Version 2.0. See the COPYING file at the root+// of this distribution or at http://www.apache.org/licenses/LICENSE-2.0++#include "util/Scheduler.h"+#include <cassert>++namespace stellar+{+using nsecs = std::chrono::nanoseconds;++const Scheduler::RelativeDeadline Scheduler::NEVER_DROP;+const Scheduler::RelativeDeadline Scheduler::DROP_ONLY_UNDER_LOAD;+const Scheduler::AbsoluteDeadline Scheduler::ABS_NEVER_DROP;+const Scheduler::AbsoluteDeadline Scheduler::ABS_DROP_ONLY_UNDER_LOAD;++class Scheduler::Queue+{+    struct Element+    {+        Action mAction;+        Scheduler::AbsoluteDeadline mDeadline;+        Element(Action&& action, Scheduler::RelativeDeadline rel)+            : mAction(std::move(action))+            , mDeadline(rel == Scheduler::NEVER_DROP

Fixed.

graydon

comment created time in 17 days

Pull request review commentstellar/stellar-core

Bug 2304 scheduling cleanup

 #include "util/Math.h" #include "util/NonCopyable.h" +#include <functional>

There's a use of std::function in the file. It was only compiling before because all the places that used it happened to include <functional> before hand. I can add a comment to the commit or make it a separate commit if you like.

graydon

comment created time in 17 days

Pull request review commentstellar/stellar-core

Bug 2304 scheduling cleanup

+// Copyright 2020 Stellar Development Foundation and contributors. Licensed+// under the Apache License, Version 2.0. See the COPYING file at the root+// of this distribution or at http://www.apache.org/licenses/LICENSE-2.0++#include "util/Scheduler.h"++#include "lib/catch.hpp"+#include "util/Logging.h"+#include <chrono>++using namespace stellar;

No, I added a test file along with the class. Is this wrong?

graydon

comment created time in 17 days

Pull request review commentstellar/stellar-core

Bug 2304 scheduling cleanup

 VirtualClock::crank(bool block)     {         std::lock_guard<std::recursive_mutex> lock(mDispatchingMutex);         mDispatching = true;+        mLastDispatchStart = std::chrono::steady_clock::now();

Fixed.

graydon

comment created time in 17 days

Pull request review commentstellar/stellar-core

Bug 2304 scheduling cleanup

 Peer::recvMessage(StellarMessage const& stellarMsg)         fmt::format("Error RecvMessage T:{} cat:{} {} @{}", stellarMsg.type(),                     cat, toString(), mApp.getConfig().PEER_PORT); -    mApp.postOnMainThread(-        [ err, weak, sm = StellarMessage(stellarMsg) ]() {-            auto self = weak.lock();-            if (self)-            {-                self->recvRawMessage(sm);-            }-            else-            {-                CLOG(TRACE, "Overlay") << err;-            }-        },-        {catType, fmt::format("{}-{} recvMessage", cat, toString())});+    mApp.postOnMainThread([ err, weak, sm = StellarMessage(stellarMsg) ]() {+        auto self = weak.lock();+        if (self)+        {+            self->recvRawMessage(sm);+        }+        else+        {+            CLOG(TRACE, "Overlay") << err;+        }+    },+                          fmt::format("{}-{} recvMessage", cat, toString()),

Outdated comment, resolving

graydon

comment created time in 17 days

Pull request review commentstellar/stellar-core

Bug 2304 scheduling cleanup

 VirtualClock::sleep_for(std::chrono::microseconds us)     } } -void-VirtualClock::advanceExecutionQueue()-{-    // limit in terms of number of items-    // ideally we would just pick up everything we can up to the duration.-    // There is a potential issue with picking up too many items all the time,-    // in that round robin gets defeated if work is submitted in batches.-    size_t EXECUTION_QUEUE_BATCH_SIZE = 100;--    // DURATION here needs to be small enough that timers get to fire often-    // enough but it needs to be big enough so that we do enough work generated-    // by pulling items from the IO queue-    std::chrono::milliseconds MAX_EXECUTION_QUEUE_DURATION{500};--    // safety in case we can't keep up at all with work-    const size_t QUEUE_SIZE_DROP_THRESHOLD = 10000;-    if (mExecutionQueueSize > QUEUE_SIZE_DROP_THRESHOLD)-    {-        // 1. drain elements that can be dropped-        for (auto it = mExecutionQueue.begin(); it != mExecutionQueue.end();)-        {-            if (it->first.mExecutionType ==-                ExecutionCategory::Type::DROPPABLE_EVENT)-            {-                mExecutionQueueSize -= it->second.size();-                it = mExecutionQueue.erase(it);-            }-            else-            {-                ++it;-            }-        }-        // 2. update parameters so that we help drain the queue-        // this may lock up the instance for a while, but this is the best thing-        // we can do-        EXECUTION_QUEUE_BATCH_SIZE = mExecutionQueueSize;-        MAX_EXECUTION_QUEUE_DURATION = std::chrono::seconds(30);--        mExecutionIterator = mExecutionQueue.begin();-    }--    YieldTimer queueYt(*this, MAX_EXECUTION_QUEUE_DURATION,-                       EXECUTION_QUEUE_BATCH_SIZE);--    // moves any future work into the main execution queue-    mergeExecutionQueue();--    // consumes items in a round robin fashion-    // note that mExecutionIterator stays valid outside of this function as only-    // insertions are performed on mExecutionQueue-    while (mExecutionQueueSize != 0 && queueYt.shouldKeepGoing())-    {-        if (mExecutionIterator == mExecutionQueue.end())-        {-            mExecutionIterator = mExecutionQueue.begin();-        }--        auto& q = mExecutionIterator->second;-        auto elem = std::move(q.front());-        --mExecutionQueueSize;-        q.pop_front();-        if (q.empty())-        {-            mExecutionIterator = mExecutionQueue.erase(mExecutionIterator);-        }-        else-        {-            ++mExecutionIterator;-        }--        elem();-    }-}--void-VirtualClock::mergeExecutionQueue()-{-    while (!mDelayedExecutionQueue.empty())-    {-        auto& d = mDelayedExecutionQueue.front();-        mExecutionQueue[std::move(d.first)].emplace_back(std::move(d.second));-        ++mExecutionQueueSize;-        mDelayedExecutionQueue.pop_front();-    }-}- size_t VirtualClock::crank(bool block) {     if (mDestructing)     {         return 0;     }--    size_t nWorkDone = 0;-+    size_t progressCount = 0;     {-        std::lock_guard<std::recursive_mutex> lock(mDelayExecutionMutex);-        // Adding to mDelayedExecutionQueue is now restricted to main thread.--        mDelayExecution = true;-+        std::lock_guard<std::recursive_mutex> lock(mDispatchingMutex);+        mDispatching = true;         nRealTimerCancelEvents = 0;-         if (mMode == REAL_TIME)         {-            // Fire all pending timers.-            // May add work to mDelayedExecutionQueue-            nWorkDone += advanceToNow();+            // Dispatch all pending timers.+            progressCount += advanceToNow();         }--        // Pick up some work off the IO queue.-        // Calling mIOContext.poll() here may introduce unbounded delays-        // to trigger timers.-        size_t lastPoll;-        // to trigger timers so we instead put bounds on the amount of work we-        // do at once both in size and time. As MAX_CRANK_WORK_DURATION is-        // not enforced in virtual mode, we use a smaller batch size.-        // We later process items queued up in the execution queue, that has-        // similar parameters.-        size_t const WORK_BATCH_SIZE = 100;-        std::chrono::milliseconds const MAX_CRANK_WORK_DURATION{200};--        YieldTimer ioYt(*this, MAX_CRANK_WORK_DURATION, WORK_BATCH_SIZE);-        do-        {-            // May add work to mDelayedExecutionQueue.-            lastPoll = mIOContext.poll_one();-            nWorkDone += lastPoll;-        } while (lastPoll != 0 && ioYt.shouldKeepGoing());--        nWorkDone -= nRealTimerCancelEvents;--        if (!mExecutionQueue.empty() || !mDelayedExecutionQueue.empty())-        {-            // If any work is added here, we don't want to advance VIRTUAL_TIME-            // and also we don't need to block, as next crank will have-            // something to execute.-            nWorkDone++;-            advanceExecutionQueue();-        }--        if (mMode == VIRTUAL_TIME && nWorkDone == 0)+        // Dispatch 0-or-1 IO completions.+        progressCount += mIOContext.poll_one();+        // Dispatch 0-or-1 queued actions.+        progressCount += mActionScheduler->runOne();

Outdated comment, resolving.

graydon

comment created time in 17 days

Pull request review commentstellar/stellar-core

Bug 2304 scheduling cleanup

 VirtualClock::sleep_for(std::chrono::microseconds us)     } } -void-VirtualClock::advanceExecutionQueue()-{-    // limit in terms of number of items-    // ideally we would just pick up everything we can up to the duration.-    // There is a potential issue with picking up too many items all the time,-    // in that round robin gets defeated if work is submitted in batches.-    size_t EXECUTION_QUEUE_BATCH_SIZE = 100;--    // DURATION here needs to be small enough that timers get to fire often-    // enough but it needs to be big enough so that we do enough work generated-    // by pulling items from the IO queue-    std::chrono::milliseconds MAX_EXECUTION_QUEUE_DURATION{500};--    // safety in case we can't keep up at all with work-    const size_t QUEUE_SIZE_DROP_THRESHOLD = 10000;-    if (mExecutionQueueSize > QUEUE_SIZE_DROP_THRESHOLD)-    {-        // 1. drain elements that can be dropped-        for (auto it = mExecutionQueue.begin(); it != mExecutionQueue.end();)-        {-            if (it->first.mExecutionType ==-                ExecutionCategory::Type::DROPPABLE_EVENT)-            {-                mExecutionQueueSize -= it->second.size();-                it = mExecutionQueue.erase(it);-            }-            else-            {-                ++it;-            }-        }-        // 2. update parameters so that we help drain the queue-        // this may lock up the instance for a while, but this is the best thing-        // we can do-        EXECUTION_QUEUE_BATCH_SIZE = mExecutionQueueSize;-        MAX_EXECUTION_QUEUE_DURATION = std::chrono::seconds(30);--        mExecutionIterator = mExecutionQueue.begin();-    }--    YieldTimer queueYt(*this, MAX_EXECUTION_QUEUE_DURATION,-                       EXECUTION_QUEUE_BATCH_SIZE);--    // moves any future work into the main execution queue-    mergeExecutionQueue();--    // consumes items in a round robin fashion-    // note that mExecutionIterator stays valid outside of this function as only-    // insertions are performed on mExecutionQueue-    while (mExecutionQueueSize != 0 && queueYt.shouldKeepGoing())-    {-        if (mExecutionIterator == mExecutionQueue.end())-        {-            mExecutionIterator = mExecutionQueue.begin();-        }--        auto& q = mExecutionIterator->second;-        auto elem = std::move(q.front());-        --mExecutionQueueSize;-        q.pop_front();-        if (q.empty())-        {-            mExecutionIterator = mExecutionQueue.erase(mExecutionIterator);-        }-        else-        {-            ++mExecutionIterator;-        }--        elem();-    }-}--void-VirtualClock::mergeExecutionQueue()-{-    while (!mDelayedExecutionQueue.empty())-    {-        auto& d = mDelayedExecutionQueue.front();-        mExecutionQueue[std::move(d.first)].emplace_back(std::move(d.second));-        ++mExecutionQueueSize;-        mDelayedExecutionQueue.pop_front();-    }-}- size_t VirtualClock::crank(bool block) {     if (mDestructing)     {         return 0;     }--    size_t nWorkDone = 0;-+    size_t progressCount = 0;     {-        std::lock_guard<std::recursive_mutex> lock(mDelayExecutionMutex);-        // Adding to mDelayedExecutionQueue is now restricted to main thread.--        mDelayExecution = true;-+        std::lock_guard<std::recursive_mutex> lock(mDispatchingMutex);+        mDispatching = true;         nRealTimerCancelEvents = 0;-         if (mMode == REAL_TIME)         {-            // Fire all pending timers.-            // May add work to mDelayedExecutionQueue-            nWorkDone += advanceToNow();+            // Dispatch all pending timers.+            progressCount += advanceToNow();         }--        // Pick up some work off the IO queue.-        // Calling mIOContext.poll() here may introduce unbounded delays-        // to trigger timers.-        size_t lastPoll;-        // to trigger timers so we instead put bounds on the amount of work we-        // do at once both in size and time. As MAX_CRANK_WORK_DURATION is-        // not enforced in virtual mode, we use a smaller batch size.-        // We later process items queued up in the execution queue, that has-        // similar parameters.-        size_t const WORK_BATCH_SIZE = 100;-        std::chrono::milliseconds const MAX_CRANK_WORK_DURATION{200};--        YieldTimer ioYt(*this, MAX_CRANK_WORK_DURATION, WORK_BATCH_SIZE);-        do-        {-            // May add work to mDelayedExecutionQueue.-            lastPoll = mIOContext.poll_one();-            nWorkDone += lastPoll;-        } while (lastPoll != 0 && ioYt.shouldKeepGoing());--        nWorkDone -= nRealTimerCancelEvents;--        if (!mExecutionQueue.empty() || !mDelayedExecutionQueue.empty())-        {-            // If any work is added here, we don't want to advance VIRTUAL_TIME-            // and also we don't need to block, as next crank will have-            // something to execute.-            nWorkDone++;-            advanceExecutionQueue();-        }--        if (mMode == VIRTUAL_TIME && nWorkDone == 0)+        // Dispatch 0-or-1 IO completions.+        progressCount += mIOContext.poll_one();+        // Dispatch 0-or-1 queued actions.+        progressCount += mActionScheduler->runOne();+        // Subtract out any timer cancellations from the above two steps.+        progressCount -= nRealTimerCancelEvents;+        if (mMode == VIRTUAL_TIME && progressCount == 0)         {             // If we did nothing and we're in virtual mode, we're idle and can-            // skip time forward.-            // May add work to mDelayedExecutionQueue for next crank.-            nWorkDone += advanceToNext();+            // skip time forward, dispatching all timers at the next time-step.+            progressCount += advanceToNext();         }--        mDelayExecution = false;+        mDispatching = false;     }-     // At this point main and background threads can add work to next crank.-    if (block && nWorkDone == 0)+    if (block && progressCount == 0)     {-        nWorkDone += mIOContext.run_one();+        // If we didn't make progress and caller wants blocking, block now.+        progressCount += mIOContext.run_one();     }--    noteCrankOccurred(nWorkDone == 0);--    return nWorkDone;+    noteCrankOccurred(progressCount == 0);+    return progressCount; }  void-VirtualClock::postToExecutionQueue(std::function<void()>&& f,-                                   ExecutionCategory&& id)+VirtualClock::postAction(std::function<void()>&& f, std::string&& name,+                         ActionType ty) {-    std::lock_guard<std::recursive_mutex> lock(mDelayExecutionMutex);--    if (!mDelayExecution)+    std::lock_guard<std::recursive_mutex> lock(mDispatchingMutex);+    if (!mDispatching)     {         // Either we are waiting on io_context().run_one, or by some chance         // run_one was woken up by network activity and postToExecutionQueue was         // called from a background thread.          // In any case, all we need to do is ensure that we wake up `crank`, we         // do this by posting an empty event to the main IO queue-        mDelayExecution = true;+        mDispatching = true;

Fixed.

graydon

comment created time in 17 days

Pull request review commentstellar/stellar-core

Bug 2304 scheduling cleanup

+// Copyright 2020 Stellar Development Foundation and contributors. Licensed+// under the Apache License, Version 2.0. See the COPYING file at the root+// of this distribution or at http://www.apache.org/licenses/LICENSE-2.0++#include "util/DRRScheduler.h"+#include "medida/timer.h"+#include <queue>++namespace stellar+{+using usecs = std::chrono::microseconds;++class DRRQueue+{+    // Each queue retains DURATION_MEMORY worth of action execution-durations,+    // and estimates the next action durtion as the average over this memory.+    static constexpr const size_t DURATION_MEMORY = 8;+    size_t mNextDuration{0};+    std::vector<usecs> mRecentDurations;+    usecs mNextDurationEstimate;++    usecs mDeficit;+    std::deque<Action> mActions;++    void+    updateDurationEstimate(usecs dur)+    {+        if (mRecentDurations.size() < DURATION_MEMORY)+        {+            mRecentDurations.push_back(dur);+        }+        else+        {+            mRecentDurations.at(mNextDuration) = dur;+            mNextDuration = (mNextDuration + 1) % DURATION_MEMORY;+        }+        usecs sum;+        for (auto const& d : mRecentDurations)+        {+            sum += d;+        }+        mNextDurationEstimate = sum / mRecentDurations.size();+    }++  public:+    size_t+    size() const+    {+        return mActions.size();+    }++    bool+    isEmpty() const+    {+        return mActions.empty();+    }++    size_t+    enqueueOrDrop(ActionType ty, size_t dropLimit, Action&& action)+    {+        if (ty != ActionType::DROPPABLE_ACTION || mActions.size() < dropLimit)+        {+            mActions.emplace_back(std::move(action));+            return 1;+        }+        return 0;

Outdated comment, resolving

graydon

comment created time in 17 days

Pull request review commentstellar/stellar-core

Bug 2304 scheduling cleanup

+// Copyright 2020 Stellar Development Foundation and contributors. Licensed+// under the Apache License, Version 2.0. See the COPYING file at the root+// of this distribution or at http://www.apache.org/licenses/LICENSE-2.0++#include "util/DRRScheduler.h"+#include "medida/timer.h"+#include <queue>++namespace stellar+{+using usecs = std::chrono::microseconds;++class DRRQueue+{+    // Each queue retains DURATION_MEMORY worth of action execution-durations,+    // and estimates the next action durtion as the average over this memory.+    static constexpr const size_t DURATION_MEMORY = 8;+    size_t mNextDuration{0};+    std::vector<usecs> mRecentDurations;+    usecs mNextDurationEstimate;++    usecs mDeficit;+    std::deque<Action> mActions;++    void+    updateDurationEstimate(usecs dur)+    {+        if (mRecentDurations.size() < DURATION_MEMORY)+        {+            mRecentDurations.push_back(dur);+        }+        else+        {+            mRecentDurations.at(mNextDuration) = dur;+            mNextDuration = (mNextDuration + 1) % DURATION_MEMORY;+        }+        usecs sum;+        for (auto const& d : mRecentDurations)+        {+            sum += d;+        }+        mNextDurationEstimate = sum / mRecentDurations.size();+    }++  public:+    size_t+    size() const+    {+        return mActions.size();+    }++    bool+    isEmpty() const+    {+        return mActions.empty();+    }++    size_t+    enqueueOrDrop(ActionType ty, size_t dropLimit, Action&& action)+    {+        if (ty != ActionType::DROPPABLE_ACTION || mActions.size() < dropLimit)+        {+            mActions.emplace_back(std::move(action));+            return 1;+        }+        return 0;+    }++    bool+    bumpByQuantumAndMaybeRun(usecs quantum)+    {+        if (mActions.empty())+        {+            return false;+        }+        mDeficit += quantum;

Outdated comment, resolving.

graydon

comment created time in 17 days

Pull request review commentstellar/stellar-core

Bug 2304 scheduling cleanup

+#pragma once++// Copyright 2020 Stellar Development Foundation and contributors. Licensed+// under the Apache License, Version 2.0. See the COPYING file at the root+// of this distribution or at http://www.apache.org/licenses/LICENSE-2.0++#include "util/RandomEvictionCache.h"++#include <chrono>+#include <functional>+#include <map>+#include <memory>+#include <queue>++// This class implements a multi-queue scheduler for "actions" (deferred-work+// callbacks that some subsystem wants to run "soon" on the main thread),+// attempting to satisfy a variety of constraints and goals simultaneously:+//+//   0. Non-preemption: We have no ability to preempt actions while they're+//      running so this is a hard constraint, not just a goal.+//+//   1. Serial execution: within a queue, actions must run in the order they are+//      enqueued (or a subsequence thereof, if there are dropped actions) so+//      that clients can use queue-names to sequence logically-sequential+//      actions. Scheduling happens between queues but not within them.+//+//   2. Non-starvation: Everything enqueued (and not dropped) should run+//      eventually and the longer something waits, generally the more likely it+//      is to run.+//+//   3. Fairness: time given to each queue should be roughly equal, over time.+//+//   4. Deadlines and load-shedding: some actions are best-effort and should be+//      dropped when the system is under load, and others have deadlines after+//      which there's no point running them, they're just a waste. We want to+//      be able to drop either so they don't interfere with necessary actions.+//+//   5. Simplicity: clients of the scheduler shouldn't need to adjust a lot of+//      knobs, and the implementation should be as simple as possible and+//      exhibit as fixed a behaviour as possible. We don't want surprises in+//      dynamics.+//+//   6. Non-anticipation: many scheduling algorithms require more information+//      than we have, or are so-called "anticipation" algorithms that need to+//      know (or guess) the size or duration of the next action. We certainly+//      don't know these, and while we _could_ try to estimate them, the+//      algorithms that need anticipation can go wrong if fed bad estimates;+//      we'd prefer a non-anticipation (or "blind") approach that lacks this+//      risk.+//+// Given these goals and constraints, our current best guess is a lightly+// customized algorithm in a family called FB ("foreground-background" or+// "feedback") or LAS ("least attained service") or SET ("shortest elapsed+// time").+//+// For an idea with so many names, the algorithm is utterly simple: each queue+// tracks the accumulated runtime of all actions it has run, and on each step we+// run the next action in the queue with the lowest accumulated runtime (the+// queues themselves are therefore stored in an outer priority queue to enable+// quick retrieval of the next lowest queue).+//+// This has a few interesting properties:+//+//   - A low-frequency action (eg. a ledger close) will usually be scheduled+//     immediately, as it has built up some "credit" in its queue in the form of+//     zero new runtime in the period since its last run, lowering its+//     accumulation relative to other queues.+//+//   - A continuously-rescheduled multipart action (eg. bucket-apply or+//     catchup-replay) will quickly consume any "credit" it might have and be+//     throttled back to an equal time-share with other queues: since it spent a+//     long time on-CPU it will have to wait at least until everyone else has+//     had a similar amount of time before going again.+//+//   - If a very-short-duration action occurs it has little affect on anything+//     else, either its own queue or others, in the relative scheduling order. A+//     queue that's got lots of very-small actions (eg. just issuing a pile of+//     async IOs or writing to in-memory buffers) may run them _all_ before+//     anyone else gets to go, but that's ok precisely because they're very+//     small actions. The scheduler will shift to other queues exactly when a+//     queue uses up a _noticable amount of time_ relative to others.+//+// This is an old algorithm that was not used for a long time out of fear that+// it would starve long-duration actions; but it's received renewed study in+// recent years based on the observation that such starvation only occurs in+// certain theoretically-tidy but practically-rare distributions of action+// durations, and the distributions that occur in reality behave quite well+// under it.+//+// The customizations we make are minor:+//+//   - We put a floor on the cumulative durations; low cumulative durations+//     represent a form of "credit" that a queue might use in a burst if it were+//     to be suddenly full of ready actions, or continuously-reschedule itself,+//     so we make sure no queue can have less than some (steadily rising) floor.+//+//   - We encode deadlines in actions: those with a positive deadline are always+//     dropped -- unconditionally -- if they are ready to run after their+//     deadline, and may also be dropped conditionally if the system is under+//     load / the queue is too long. The sentinel "minimum deadline" value is+//     reserved to indicate a never-droppable action. To encode a "best effort"+//     action with no particular deadline that will be dropped only under load,+//     we set the deadline to the the maximal duration value.++namespace stellar+{++using Action = std::function<void()>;++class Scheduler+{+  public:+    struct Stats+    {+        size_t mActionsEnqueued{0};+        size_t mActionsDequeued{0};+        size_t mActionsDroppedDueToOverload{0};+        size_t mActionsDroppedDueToDeadline{0};+        size_t mQueuesActivatedFromFresh{0};+        size_t mQueuesActivatedFromCache{0};+        size_t mQueuesSuspended{0};+    };++    using RelativeDeadline = std::chrono::nanoseconds;+    using AbsoluteDeadline = std::chrono::steady_clock::time_point;+    static constexpr RelativeDeadline NEVER_DROP = RelativeDeadline::min();+    static constexpr RelativeDeadline DROP_ONLY_UNDER_LOAD =+        RelativeDeadline::max();+    static constexpr AbsoluteDeadline ABS_NEVER_DROP = AbsoluteDeadline::min();+    static constexpr AbsoluteDeadline ABS_DROP_ONLY_UNDER_LOAD =+        AbsoluteDeadline::max();++  private:+    class Queue;+    using Qptr = std::shared_ptr<Queue>;+    std::map<std::string, Qptr> mQueues;+    std::priority_queue<Qptr, std::vector<Qptr>,+                        std::function<bool(Qptr, Qptr)>>+        mQueueQueue;++    Stats mStats;++    // A queue is considered "overloaded" if its size is above the load limit.+    // This is a per-queue limit.+    size_t const mLoadLimit;++    // The serviceTime of any queue will always be advanced to at least this+    // duration behind mMaxServiceTime, to limit the amount of "suplus" service+    // time any given queue can acucmulate if it happens to go idle a long time.+    std::chrono::nanoseconds const mServiceTimeWindow;++    // Largest serviceTime seen in any queue. This number will continuously+    // advance as queues are serviced; it exists to serve as the upper limit+    // of the window, from which mServiceTimeWindow is subtracted to derive+    // the lower limit.+    std::chrono::nanoseconds mMaxServiceTime{0};++    // Sum of sizes of all the active queues. Maintained as items are enqueued+    // or run.+    size_t mSize{0};++    // We cache recent queues for a while after we empty them, so that we can+    // maintain a service-level estimate spanning their repeated activations.+    RandomEvictionCache<std::string, Qptr> mQueueCache{1024};

(IOW in the time between made-idle and idle-so-long-we-need-to-drop, they need to be stored somewhere and that somewhere has to allow us to make them non-idle in O(small) and allow us to drop the longest-idle one in O(small). I'd use a priority queue for that also but you can't pull random elements out of the middle of a priority queue, alas. I think probably the only way to do this is to make a secondary intrusive doubly-linked list running through the ActionQueues indicating their position in the idleness-order. Which is a bit gross but maybe not more gross than the random eviction cache, idk. I'll do a sketch to show what it looks like.)

graydon

comment created time in 17 days

Pull request review commentstellar/stellar-core

Bug 2304 scheduling cleanup

+#pragma once++// Copyright 2020 Stellar Development Foundation and contributors. Licensed+// under the Apache License, Version 2.0. See the COPYING file at the root+// of this distribution or at http://www.apache.org/licenses/LICENSE-2.0++#include "util/RandomEvictionCache.h"++#include <chrono>+#include <functional>+#include <map>+#include <memory>+#include <queue>++// This class implements a multi-queue scheduler for "actions" (deferred-work+// callbacks that some subsystem wants to run "soon" on the main thread),+// attempting to satisfy a variety of constraints and goals simultaneously:+//+//   0. Non-preemption: We have no ability to preempt actions while they're+//      running so this is a hard constraint, not just a goal.+//+//   1. Serial execution: within a queue, actions must run in the order they are+//      enqueued (or a subsequence thereof, if there are dropped actions) so+//      that clients can use queue-names to sequence logically-sequential+//      actions. Scheduling happens between queues but not within them.+//+//   2. Non-starvation: Everything enqueued (and not dropped) should run+//      eventually and the longer something waits, generally the more likely it+//      is to run.+//+//   3. Fairness: time given to each queue should be roughly equal, over time.+//+//   4. Deadlines and load-shedding: some actions are best-effort and should be+//      dropped when the system is under load, and others have deadlines after+//      which there's no point running them, they're just a waste. We want to+//      be able to drop either so they don't interfere with necessary actions.+//+//   5. Simplicity: clients of the scheduler shouldn't need to adjust a lot of+//      knobs, and the implementation should be as simple as possible and+//      exhibit as fixed a behaviour as possible. We don't want surprises in+//      dynamics.+//+//   6. Non-anticipation: many scheduling algorithms require more information+//      than we have, or are so-called "anticipation" algorithms that need to+//      know (or guess) the size or duration of the next action. We certainly+//      don't know these, and while we _could_ try to estimate them, the+//      algorithms that need anticipation can go wrong if fed bad estimates;+//      we'd prefer a non-anticipation (or "blind") approach that lacks this+//      risk.+//+// Given these goals and constraints, our current best guess is a lightly+// customized algorithm in a family called FB ("foreground-background" or+// "feedback") or LAS ("least attained service") or SET ("shortest elapsed+// time").+//+// For an idea with so many names, the algorithm is utterly simple: each queue+// tracks the accumulated runtime of all actions it has run, and on each step we+// run the next action in the queue with the lowest accumulated runtime (the+// queues themselves are therefore stored in an outer priority queue to enable+// quick retrieval of the next lowest queue).+//+// This has a few interesting properties:+//+//   - A low-frequency action (eg. a ledger close) will usually be scheduled+//     immediately, as it has built up some "credit" in its queue in the form of+//     zero new runtime in the period since its last run, lowering its+//     accumulation relative to other queues.+//+//   - A continuously-rescheduled multipart action (eg. bucket-apply or+//     catchup-replay) will quickly consume any "credit" it might have and be+//     throttled back to an equal time-share with other queues: since it spent a+//     long time on-CPU it will have to wait at least until everyone else has+//     had a similar amount of time before going again.+//+//   - If a very-short-duration action occurs it has little affect on anything+//     else, either its own queue or others, in the relative scheduling order. A+//     queue that's got lots of very-small actions (eg. just issuing a pile of+//     async IOs or writing to in-memory buffers) may run them _all_ before+//     anyone else gets to go, but that's ok precisely because they're very+//     small actions. The scheduler will shift to other queues exactly when a+//     queue uses up a _noticable amount of time_ relative to others.+//+// This is an old algorithm that was not used for a long time out of fear that+// it would starve long-duration actions; but it's received renewed study in+// recent years based on the observation that such starvation only occurs in+// certain theoretically-tidy but practically-rare distributions of action+// durations, and the distributions that occur in reality behave quite well+// under it.+//+// The customizations we make are minor:+//+//   - We put a floor on the cumulative durations; low cumulative durations+//     represent a form of "credit" that a queue might use in a burst if it were+//     to be suddenly full of ready actions, or continuously-reschedule itself,+//     so we make sure no queue can have less than some (steadily rising) floor.+//+//   - We encode deadlines in actions: those with a positive deadline are always+//     dropped -- unconditionally -- if they are ready to run after their+//     deadline, and may also be dropped conditionally if the system is under+//     load / the queue is too long. The sentinel "minimum deadline" value is+//     reserved to indicate a never-droppable action. To encode a "best effort"+//     action with no particular deadline that will be dropped only under load,+//     we set the deadline to the the maximal duration value.++namespace stellar+{++using Action = std::function<void()>;++class Scheduler+{+  public:+    struct Stats+    {+        size_t mActionsEnqueued{0};+        size_t mActionsDequeued{0};+        size_t mActionsDroppedDueToOverload{0};+        size_t mActionsDroppedDueToDeadline{0};+        size_t mQueuesActivatedFromFresh{0};+        size_t mQueuesActivatedFromCache{0};+        size_t mQueuesSuspended{0};+    };++    using RelativeDeadline = std::chrono::nanoseconds;+    using AbsoluteDeadline = std::chrono::steady_clock::time_point;+    static constexpr RelativeDeadline NEVER_DROP = RelativeDeadline::min();+    static constexpr RelativeDeadline DROP_ONLY_UNDER_LOAD =+        RelativeDeadline::max();+    static constexpr AbsoluteDeadline ABS_NEVER_DROP = AbsoluteDeadline::min();+    static constexpr AbsoluteDeadline ABS_DROP_ONLY_UNDER_LOAD =+        AbsoluteDeadline::max();++  private:+    class Queue;+    using Qptr = std::shared_ptr<Queue>;+    std::map<std::string, Qptr> mQueues;

No this only stores the names of the queues that are in the runnable priority queue. The non-runnable ones are kept in the cache (under their names). I'll rename this for clarity.

graydon

comment created time in 17 days

Pull request review commentstellar/stellar-core

Bug 2304 scheduling cleanup

+#pragma once++// Copyright 2020 Stellar Development Foundation and contributors. Licensed+// under the Apache License, Version 2.0. See the COPYING file at the root+// of this distribution or at http://www.apache.org/licenses/LICENSE-2.0++#include "util/RandomEvictionCache.h"++#include <chrono>+#include <functional>+#include <map>+#include <memory>+#include <queue>++// This class implements a multi-queue scheduler for "actions" (deferred-work+// callbacks that some subsystem wants to run "soon" on the main thread),+// attempting to satisfy a variety of constraints and goals simultaneously:+//+//   0. Non-preemption: We have no ability to preempt actions while they're+//      running so this is a hard constraint, not just a goal.+//+//   1. Serial execution: within a queue, actions must run in the order they are+//      enqueued (or a subsequence thereof, if there are dropped actions) so+//      that clients can use queue-names to sequence logically-sequential+//      actions. Scheduling happens between queues but not within them.+//+//   2. Non-starvation: Everything enqueued (and not dropped) should run+//      eventually and the longer something waits, generally the more likely it+//      is to run.+//+//   3. Fairness: time given to each queue should be roughly equal, over time.+//+//   4. Deadlines and load-shedding: some actions are best-effort and should be+//      dropped when the system is under load, and others have deadlines after+//      which there's no point running them, they're just a waste. We want to+//      be able to drop either so they don't interfere with necessary actions.+//+//   5. Simplicity: clients of the scheduler shouldn't need to adjust a lot of+//      knobs, and the implementation should be as simple as possible and+//      exhibit as fixed a behaviour as possible. We don't want surprises in+//      dynamics.+//+//   6. Non-anticipation: many scheduling algorithms require more information+//      than we have, or are so-called "anticipation" algorithms that need to+//      know (or guess) the size or duration of the next action. We certainly+//      don't know these, and while we _could_ try to estimate them, the+//      algorithms that need anticipation can go wrong if fed bad estimates;+//      we'd prefer a non-anticipation (or "blind") approach that lacks this+//      risk.+//+// Given these goals and constraints, our current best guess is a lightly+// customized algorithm in a family called FB ("foreground-background" or+// "feedback") or LAS ("least attained service") or SET ("shortest elapsed+// time").+//+// For an idea with so many names, the algorithm is utterly simple: each queue+// tracks the accumulated runtime of all actions it has run, and on each step we+// run the next action in the queue with the lowest accumulated runtime (the+// queues themselves are therefore stored in an outer priority queue to enable+// quick retrieval of the next lowest queue).+//+// This has a few interesting properties:+//+//   - A low-frequency action (eg. a ledger close) will usually be scheduled+//     immediately, as it has built up some "credit" in its queue in the form of+//     zero new runtime in the period since its last run, lowering its+//     accumulation relative to other queues.+//+//   - A continuously-rescheduled multipart action (eg. bucket-apply or+//     catchup-replay) will quickly consume any "credit" it might have and be+//     throttled back to an equal time-share with other queues: since it spent a+//     long time on-CPU it will have to wait at least until everyone else has+//     had a similar amount of time before going again.+//+//   - If a very-short-duration action occurs it has little affect on anything+//     else, either its own queue or others, in the relative scheduling order. A+//     queue that's got lots of very-small actions (eg. just issuing a pile of+//     async IOs or writing to in-memory buffers) may run them _all_ before+//     anyone else gets to go, but that's ok precisely because they're very+//     small actions. The scheduler will shift to other queues exactly when a+//     queue uses up a _noticable amount of time_ relative to others.+//+// This is an old algorithm that was not used for a long time out of fear that+// it would starve long-duration actions; but it's received renewed study in+// recent years based on the observation that such starvation only occurs in+// certain theoretically-tidy but practically-rare distributions of action+// durations, and the distributions that occur in reality behave quite well+// under it.+//+// The customizations we make are minor:+//+//   - We put a floor on the cumulative durations; low cumulative durations+//     represent a form of "credit" that a queue might use in a burst if it were+//     to be suddenly full of ready actions, or continuously-reschedule itself,+//     so we make sure no queue can have less than some (steadily rising) floor.+//+//   - We encode deadlines in actions: those with a positive deadline are always+//     dropped -- unconditionally -- if they are ready to run after their+//     deadline, and may also be dropped conditionally if the system is under+//     load / the queue is too long. The sentinel "minimum deadline" value is+//     reserved to indicate a never-droppable action. To encode a "best effort"+//     action with no particular deadline that will be dropped only under load,+//     we set the deadline to the the maximal duration value.++namespace stellar+{++using Action = std::function<void()>;++class Scheduler+{+  public:+    struct Stats+    {+        size_t mActionsEnqueued{0};+        size_t mActionsDequeued{0};+        size_t mActionsDroppedDueToOverload{0};+        size_t mActionsDroppedDueToDeadline{0};+        size_t mQueuesActivatedFromFresh{0};+        size_t mQueuesActivatedFromCache{0};+        size_t mQueuesSuspended{0};+    };++    using RelativeDeadline = std::chrono::nanoseconds;+    using AbsoluteDeadline = std::chrono::steady_clock::time_point;+    static constexpr RelativeDeadline NEVER_DROP = RelativeDeadline::min();+    static constexpr RelativeDeadline DROP_ONLY_UNDER_LOAD =+        RelativeDeadline::max();+    static constexpr AbsoluteDeadline ABS_NEVER_DROP = AbsoluteDeadline::min();+    static constexpr AbsoluteDeadline ABS_DROP_ONLY_UNDER_LOAD =+        AbsoluteDeadline::max();++  private:+    class Queue;+    using Qptr = std::shared_ptr<Queue>;+    std::map<std::string, Qptr> mQueues;+    std::priority_queue<Qptr, std::vector<Qptr>,+                        std::function<bool(Qptr, Qptr)>>+        mQueueQueue;++    Stats mStats;++    // A queue is considered "overloaded" if its size is above the load limit.+    // This is a per-queue limit.+    size_t const mLoadLimit;++    // The serviceTime of any queue will always be advanced to at least this+    // duration behind mMaxServiceTime, to limit the amount of "suplus" service+    // time any given queue can acucmulate if it happens to go idle a long time.+    std::chrono::nanoseconds const mServiceTimeWindow;++    // Largest serviceTime seen in any queue. This number will continuously+    // advance as queues are serviced; it exists to serve as the upper limit+    // of the window, from which mServiceTimeWindow is subtracted to derive+    // the lower limit.+    std::chrono::nanoseconds mMaxServiceTime{0};++    // Sum of sizes of all the active queues. Maintained as items are enqueued+    // or run.+    size_t mSize{0};++    // We cache recent queues for a while after we empty them, so that we can+    // maintain a service-level estimate spanning their repeated activations.+    RandomEvictionCache<std::string, Qptr> mQueueCache{1024};

Well, if they're idle we have to put them somewhere that's not the runnable queue, otherwise they'll quickly move to the front of the runnable queue -- since they're not accumulating new service time -- and sit there not-accumulating service time.

The cache was just a way to keep them around over idle periods so we mostly preserve accounting information. I can try to rework this to be time based.

graydon

comment created time in 17 days

Pull request review commentstellar/stellar-core

Bug 2304 scheduling cleanup

+#pragma once++// Copyright 2020 Stellar Development Foundation and contributors. Licensed+// under the Apache License, Version 2.0. See the COPYING file at the root+// of this distribution or at http://www.apache.org/licenses/LICENSE-2.0++#include "util/RandomEvictionCache.h"++#include <chrono>+#include <functional>+#include <map>+#include <memory>+#include <queue>++// This class implements a multi-queue scheduler for "actions" (deferred-work+// callbacks that some subsystem wants to run "soon" on the main thread),+// attempting to satisfy a variety of constraints and goals simultaneously:+//+//   0. Non-preemption: We have no ability to preempt actions while they're+//      running so this is a hard constraint, not just a goal.+//+//   1. Serial execution: within a queue, actions must run in the order they are+//      enqueued (or a subsequence thereof, if there are dropped actions) so+//      that clients can use queue-names to sequence logically-sequential+//      actions. Scheduling happens between queues but not within them.+//+//   2. Non-starvation: Everything enqueued (and not dropped) should run+//      eventually and the longer something waits, generally the more likely it+//      is to run.+//+//   3. Fairness: time given to each queue should be roughly equal, over time.+//+//   4. Deadlines and load-shedding: some actions are best-effort and should be+//      dropped when the system is under load, and others have deadlines after+//      which there's no point running them, they're just a waste. We want to+//      be able to drop either so they don't interfere with necessary actions.+//+//   5. Simplicity: clients of the scheduler shouldn't need to adjust a lot of+//      knobs, and the implementation should be as simple as possible and+//      exhibit as fixed a behaviour as possible. We don't want surprises in+//      dynamics.+//+//   6. Non-anticipation: many scheduling algorithms require more information+//      than we have, or are so-called "anticipation" algorithms that need to+//      know (or guess) the size or duration of the next action. We certainly+//      don't know these, and while we _could_ try to estimate them, the+//      algorithms that need anticipation can go wrong if fed bad estimates;+//      we'd prefer a non-anticipation (or "blind") approach that lacks this+//      risk.+//+// Given these goals and constraints, our current best guess is a lightly+// customized algorithm in a family called FB ("foreground-background" or+// "feedback") or LAS ("least attained service") or SET ("shortest elapsed+// time").+//+// For an idea with so many names, the algorithm is utterly simple: each queue+// tracks the accumulated runtime of all actions it has run, and on each step we+// run the next action in the queue with the lowest accumulated runtime (the+// queues themselves are therefore stored in an outer priority queue to enable+// quick retrieval of the next lowest queue).+//+// This has a few interesting properties:+//+//   - A low-frequency action (eg. a ledger close) will usually be scheduled+//     immediately, as it has built up some "credit" in its queue in the form of+//     zero new runtime in the period since its last run, lowering its+//     accumulation relative to other queues.+//+//   - A continuously-rescheduled multipart action (eg. bucket-apply or+//     catchup-replay) will quickly consume any "credit" it might have and be+//     throttled back to an equal time-share with other queues: since it spent a+//     long time on-CPU it will have to wait at least until everyone else has+//     had a similar amount of time before going again.+//+//   - If a very-short-duration action occurs it has little affect on anything+//     else, either its own queue or others, in the relative scheduling order. A+//     queue that's got lots of very-small actions (eg. just issuing a pile of+//     async IOs or writing to in-memory buffers) may run them _all_ before+//     anyone else gets to go, but that's ok precisely because they're very+//     small actions. The scheduler will shift to other queues exactly when a+//     queue uses up a _noticable amount of time_ relative to others.+//+// This is an old algorithm that was not used for a long time out of fear that+// it would starve long-duration actions; but it's received renewed study in+// recent years based on the observation that such starvation only occurs in+// certain theoretically-tidy but practically-rare distributions of action+// durations, and the distributions that occur in reality behave quite well+// under it.+//+// The customizations we make are minor:+//+//   - We put a floor on the cumulative durations; low cumulative durations+//     represent a form of "credit" that a queue might use in a burst if it were+//     to be suddenly full of ready actions, or continuously-reschedule itself,+//     so we make sure no queue can have less than some (steadily rising) floor.+//+//   - We encode deadlines in actions: those with a positive deadline are always+//     dropped -- unconditionally -- if they are ready to run after their+//     deadline, and may also be dropped conditionally if the system is under+//     load / the queue is too long. The sentinel "minimum deadline" value is+//     reserved to indicate a never-droppable action. To encode a "best effort"+//     action with no particular deadline that will be dropped only under load,+//     we set the deadline to the the maximal duration value.++namespace stellar+{++using Action = std::function<void()>;++class Scheduler+{+  public:+    struct Stats+    {+        size_t mActionsEnqueued{0};+        size_t mActionsDequeued{0};+        size_t mActionsDroppedDueToOverload{0};+        size_t mActionsDroppedDueToDeadline{0};+        size_t mQueuesActivatedFromFresh{0};+        size_t mQueuesActivatedFromCache{0};+        size_t mQueuesSuspended{0};+    };++    using RelativeDeadline = std::chrono::nanoseconds;+    using AbsoluteDeadline = std::chrono::steady_clock::time_point;+    static constexpr RelativeDeadline NEVER_DROP = RelativeDeadline::min();+    static constexpr RelativeDeadline DROP_ONLY_UNDER_LOAD =+        RelativeDeadline::max();+    static constexpr AbsoluteDeadline ABS_NEVER_DROP = AbsoluteDeadline::min();+    static constexpr AbsoluteDeadline ABS_DROP_ONLY_UNDER_LOAD =+        AbsoluteDeadline::max();++  private:+    class Queue;+    using Qptr = std::shared_ptr<Queue>;+    std::map<std::string, Qptr> mQueues;+    std::priority_queue<Qptr, std::vector<Qptr>,+                        std::function<bool(Qptr, Qptr)>>+        mQueueQueue;

Ok.

graydon

comment created time in 17 days

Pull request review commentstellar/stellar-core

Bug 2304 scheduling cleanup

+#pragma once++// Copyright 2020 Stellar Development Foundation and contributors. Licensed+// under the Apache License, Version 2.0. See the COPYING file at the root+// of this distribution or at http://www.apache.org/licenses/LICENSE-2.0++#include "util/RandomEvictionCache.h"++#include <chrono>+#include <functional>+#include <map>+#include <memory>+#include <queue>++// This class implements a multi-queue scheduler for "actions" (deferred-work+// callbacks that some subsystem wants to run "soon" on the main thread),+// attempting to satisfy a variety of constraints and goals simultaneously:+//+//   0. Non-preemption: We have no ability to preempt actions while they're+//      running so this is a hard constraint, not just a goal.+//+//   1. Serial execution: within a queue, actions must run in the order they are+//      enqueued (or a subsequence thereof, if there are dropped actions) so+//      that clients can use queue-names to sequence logically-sequential+//      actions. Scheduling happens between queues but not within them.+//+//   2. Non-starvation: Everything enqueued (and not dropped) should run+//      eventually and the longer something waits, generally the more likely it+//      is to run.+//+//   3. Fairness: time given to each queue should be roughly equal, over time.+//+//   4. Deadlines and load-shedding: some actions are best-effort and should be+//      dropped when the system is under load, and others have deadlines after+//      which there's no point running them, they're just a waste. We want to+//      be able to drop either so they don't interfere with necessary actions.+//+//   5. Simplicity: clients of the scheduler shouldn't need to adjust a lot of+//      knobs, and the implementation should be as simple as possible and+//      exhibit as fixed a behaviour as possible. We don't want surprises in+//      dynamics.+//+//   6. Non-anticipation: many scheduling algorithms require more information+//      than we have, or are so-called "anticipation" algorithms that need to+//      know (or guess) the size or duration of the next action. We certainly+//      don't know these, and while we _could_ try to estimate them, the+//      algorithms that need anticipation can go wrong if fed bad estimates;+//      we'd prefer a non-anticipation (or "blind") approach that lacks this+//      risk.+//+// Given these goals and constraints, our current best guess is a lightly+// customized algorithm in a family called FB ("foreground-background" or+// "feedback") or LAS ("least attained service") or SET ("shortest elapsed+// time").+//+// For an idea with so many names, the algorithm is utterly simple: each queue+// tracks the accumulated runtime of all actions it has run, and on each step we+// run the next action in the queue with the lowest accumulated runtime (the+// queues themselves are therefore stored in an outer priority queue to enable+// quick retrieval of the next lowest queue).+//+// This has a few interesting properties:+//+//   - A low-frequency action (eg. a ledger close) will usually be scheduled+//     immediately, as it has built up some "credit" in its queue in the form of+//     zero new runtime in the period since its last run, lowering its+//     accumulation relative to other queues.+//+//   - A continuously-rescheduled multipart action (eg. bucket-apply or+//     catchup-replay) will quickly consume any "credit" it might have and be+//     throttled back to an equal time-share with other queues: since it spent a+//     long time on-CPU it will have to wait at least until everyone else has+//     had a similar amount of time before going again.+//+//   - If a very-short-duration action occurs it has little affect on anything+//     else, either its own queue or others, in the relative scheduling order. A+//     queue that's got lots of very-small actions (eg. just issuing a pile of+//     async IOs or writing to in-memory buffers) may run them _all_ before+//     anyone else gets to go, but that's ok precisely because they're very+//     small actions. The scheduler will shift to other queues exactly when a+//     queue uses up a _noticable amount of time_ relative to others.+//+// This is an old algorithm that was not used for a long time out of fear that+// it would starve long-duration actions; but it's received renewed study in+// recent years based on the observation that such starvation only occurs in+// certain theoretically-tidy but practically-rare distributions of action+// durations, and the distributions that occur in reality behave quite well+// under it.+//+// The customizations we make are minor:+//+//   - We put a floor on the cumulative durations; low cumulative durations+//     represent a form of "credit" that a queue might use in a burst if it were+//     to be suddenly full of ready actions, or continuously-reschedule itself,+//     so we make sure no queue can have less than some (steadily rising) floor.+//+//   - We encode deadlines in actions: those with a positive deadline are always+//     dropped -- unconditionally -- if they are ready to run after their+//     deadline, and may also be dropped conditionally if the system is under+//     load / the queue is too long. The sentinel "minimum deadline" value is+//     reserved to indicate a never-droppable action. To encode a "best effort"+//     action with no particular deadline that will be dropped only under load,+//     we set the deadline to the the maximal duration value.++namespace stellar+{++using Action = std::function<void()>;++class Scheduler+{+  public:+    struct Stats+    {+        size_t mActionsEnqueued{0};+        size_t mActionsDequeued{0};+        size_t mActionsDroppedDueToOverload{0};+        size_t mActionsDroppedDueToDeadline{0};+        size_t mQueuesActivatedFromFresh{0};+        size_t mQueuesActivatedFromCache{0};+        size_t mQueuesSuspended{0};+    };++    using RelativeDeadline = std::chrono::nanoseconds;+    using AbsoluteDeadline = std::chrono::steady_clock::time_point;+    static constexpr RelativeDeadline NEVER_DROP = RelativeDeadline::min();+    static constexpr RelativeDeadline DROP_ONLY_UNDER_LOAD =+        RelativeDeadline::max();+    static constexpr AbsoluteDeadline ABS_NEVER_DROP = AbsoluteDeadline::min();+    static constexpr AbsoluteDeadline ABS_DROP_ONLY_UNDER_LOAD =+        AbsoluteDeadline::max();++  private:+    class Queue;

It's a private class inside Scheduler, but I guess, sure.

graydon

comment created time in 17 days

fork graydon/stateright

A model checker for implementing distributed systems.

https://docs.rs/stateright

fork in 19 days

startedstateright/stateright

started time in 19 days

startedbtorpey/clocks

started time in 19 days

fork graydon/clocks

Code to test various clocks

fork in 19 days

push eventgraydon/stellar-core

Graydon Hoare

commit sha 985a432695e41b8fbb15f293c7e3b0686c560be0

Add FLOOD_MESSAGE_DEADLINE_MS config var.

view details

push time in 19 days

startedgimli-rs/object

started time in 19 days

fork graydon/Gradualizer

Supporting tool for Gradual Typing

fork in 19 days

startedmcimini/Gradualizer

started time in 19 days

fork graydon/speedscope

🔬 A fast, interactive web-based viewer for performance profiles.

https://www.speedscope.app

fork in 19 days

startedjlfwong/speedscope

started time in 19 days

issue commentgraydon/newel

Loop fusion

Yes I understand the general principle of loop fusion. I'm saying that "an iteration" in this design is "32kb of data", which you process entirely in one opertor, and then hand to another operator (while it's still in cache). And the hand-off between those operators is done dynamically -- using interpretation, virtual methods, etc. -- not statically.

Obviously if you tried to fuse operators dynamically like that on a scalar-by-scalar basis, it'd be prohibitively costly: you'd spend as much time calculating the next operator to apply as you spend applying it. But the whole point of this crate is that if you batch your operators to a relatively large scale (all the way up to the size of the L1 cache) you can amortize out the cost of dynamic composition, making one dynamic-composition decision every 32,000 useful-work operations. This is cheap enough that there's no big benefit to specializing statically.

And this matters for implementation complexity. If you try to fuse at a fine grain, you have to do it statically (it's too costly at runtime to do so dynamically); but then if you try to fuse statically ahead of time you can't: you wind up having to make too many static combinations. Like you make a combined operator for add+compare, then add+div, then add+mul, then div+compare, then div+mul, ... you wind up having to AOT-compile (nOps * nTypes) ** (nFuseDepth) fused-operators. It's overwhelming very quickly. So instead you get forced to JIT, generating only the fusions the user wants, but at runtime. And writing a JIT is expensive and complicated.

The idea of this crate (which is widely practiced by other systems!) is to go with the simpler route: dynamically compose your operator graph, and amortize the dynamic composition costs by choosing a large-ish chunk size. Not so large that you start spilling out of the cache, but large enough to amortize.

There's empirical work to back this up. If you look at the referenced paper (http://www.vldb.org/pvldb/vol11/p2209-kersten.pdf) you'll see that their "TectorWise" implementation (a clone of VectorWise -- which does what this crate does) runs almost as fast on most queries as their "Typer" implementation (a clone of HyPer -- which implements a much more complex specializing and op-fusing JIT). Because the composition points are rare relative to the very-fast vectorized inner loops. You do have to be careful to choose the chunk size to have good cache-affinity, but that's really the only delicate part. See figure 5 in the paper, somewhere between 1kb-32kb (the size of the L1 cache) is ideal: you get the amortization effects without bringing in new cache-miss effects as chunks move through the graph.

Does that all make sense?

DemiMarie-parity

comment created time in 20 days

push eventgraydon/stellar-core

marta-lokhova

commit sha 704d89a2084dbcad02c98187becea2cdb409aa3e

implement shutdown in ApplyBufferedLedgersWork

view details

Latobarita

commit sha ff6c9b0e8ba1be1eb94a2c4e14a90628ca0c0fce

Merge pull request #2510 from marta-lokhova/apply_buffered_ledger_fix Add shutdown in ApplyBufferedLedgersWork Reviewed-by: MonsieurNicolas

view details

Graydon Hoare

commit sha 617dff3c7a773503a3c75abb5075651a12cef2ba

Add scripts/Dockerfile.testing for one-off test images

view details

Graydon Hoare

commit sha c78e968d0b7b92426d4c0a717e113d854a5d87d0

const-ify methods on RandomEvictionCache

view details

Graydon Hoare

commit sha 020ac06edc0a2a77cacb61679c841a0dfc41f5ba

add functional to RandomEvictionCache

view details

Graydon Hoare

commit sha 6dc17b98fbb4f25b7819673eed73d00996033615

Add Scheduler class.

view details

Graydon Hoare

commit sha 568affb8232cda69020a28b25eda6c5bce5b5b97

Rewrite VirtualClock::crank and clean up supporting machinery.

view details

Graydon Hoare

commit sha 68517818e4a49c69dda23251c7f95e5ca1d27b2d

Switch Application.postOnMainThread callers to new interface.

view details

Graydon Hoare

commit sha 91d3aa0aa096135c246f3506a7d73fd4fb83853e

Replace YieldTimer with simpler, uniform-quantum VirtualClock::shouldYield.

view details

push time in 20 days

push eventgraydon/stellar-core

marta-lokhova

commit sha 704d89a2084dbcad02c98187becea2cdb409aa3e

implement shutdown in ApplyBufferedLedgersWork

view details

Latobarita

commit sha ff6c9b0e8ba1be1eb94a2c4e14a90628ca0c0fce

Merge pull request #2510 from marta-lokhova/apply_buffered_ledger_fix Add shutdown in ApplyBufferedLedgersWork Reviewed-by: MonsieurNicolas

view details

Graydon Hoare

commit sha 617dff3c7a773503a3c75abb5075651a12cef2ba

Add scripts/Dockerfile.testing for one-off test images

view details

push time in 20 days

issue commentgraydon/newel

Loop fusion

@DemiMarie-parity Hmm, I'm not sure. I think the general idea with vectorized interpreters is you pipeline a single L1-cache-worth of data through multiple operators before recycling the storage. Fusing loops at a code level would cause significant combinatorial expansion of the operator set. Keep in mind this is not a JIT, that's the whole point -- we're AOT compiling all the loops.

Or am I misunderstanding? What sort of loop fusion do you have in mind?

DemiMarie-parity

comment created time in 20 days

more