
cablelabs/docsis-ns3 4

NS3 module for simulating DOCSIS 3.1 links

bbriscoe/single-delay-metric 0

A single common delay metric for the communications industry

gwhiteCL/home-assistant 0

:house_with_garden: Open-source home automation platform running on Python 3

gwhiteCL/home-assistant.github.io 0

:blue_book: Jekyll project to generate home-assistant.io

gwhiteCL/NQBdraft 0

IETF draft on Non-Queue-Building Per-Hop-Behavior

push event gwhiteCL/NQBdraft

Greg White

commit sha fb07c7bcc4d7786531ce90d4ffc4f3f8c6546a7c

initial draft-08

view details

push time in 8 days

push event bbriscoe/single-delay-metric

gwhiteCL

commit sha 075f2f86b36d72428f17165b1807130d47ef524a

last set of suggested edits

There are a few portions that aren't perfect, but I think this is close enough for a discussion paper (and the deadline is looming!).

view details

push time in a month

Pull request review comment bbriscoe/single-delay-metric

additional comments and edits

 As a strawman, we propose **the 99th percentile (P99)** as a lowest common denom

 ## The 'Benchmark Effect'

-As explained in the introduction, defining a delay metric is not just about choosing a percentile. The layer to measure and the background traffic pattern to use also have to be defined. As soon as these have been settled on, researchers, product engineers, etc. tend to optimize around this set of conditions---the so-called 'benchmark effect'. It is possible that harmonizing around one choice of percentile will lead to a benchmark effect. However, a percentile metric seems robust against such perverse incentives, because it seems hard to contrive performance results that fall off a cliff just beyond a certain percentile. Nonetheless, even if there were a benchmark effect, it would be harmless if the percentile chosen for the benchmark realistically reflected the needs of most applications. {ToDo: better wording for last sentence.}
+As explained in the introduction, defining a delay metric is not just about choosing a percentile. The layer to measure and the background traffic pattern to use also have to be defined. As soon as these have been settled on, researchers, product engineers, etc. tend to optimize around this set of conditions---the so-called 'benchmark effect'. It is possible that harmonizing around one choice of percentile will lead to a benchmark effect. However, a percentile metric seems robust against such perverse incentives, because it seems hard to contrive performance results that fall off a cliff just beyond a certain percentile. [This is a little weak. An intermittent outage on a link that causes packets to queue up would cause delay spikes that might only show up in percentiles greater than P99. These outages would be ignored if P99 were the metric.] Nonetheless, even if there were a benchmark effect, it would be harmless if the percentile chosen for the benchmark realistically reflected the needs of most applications. {ToDo: better wording for last sentence.}

I guess the contrivance would involve ensuring that the glitch affected less than 1% of packets, with no concern for how badly those <1% were affected, e.g. having a timer that reboots the device every so often to avoid the effects of a memory leak showing up in the P99 score (see the sketch below).

I wonder whether we need a strong argument that P99 is more robust against the benchmark effect than other metrics would be. It doesn't seem to me that it is any more or any less robust. The final point you make is the more important one. To date, the de facto single metric that gets quoted is average latency, which we argue is pretty worthless in predicting QoE. So the industry has been suffering terribly with the benchmark effect of using average latency for years (and people ignored bufferbloat, etc.).
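To make that contrivance concrete, here is a minimal Python sketch (entirely synthetic numbers, not from the paper): an intermittent glitch that delays 0.5% of packets leaves P99 essentially untouched while P99.9 exposes it.

```python
import numpy as np

rng = np.random.default_rng(1)

# Baseline: well-behaved queueing delay, roughly 10 ms on average.
delay_ms = rng.gamma(shape=4.0, scale=2.5, size=100_000)

# Contrived glitch: an intermittent outage adds ~200 ms to 0.5% of packets.
glitch = rng.random(delay_ms.size) < 0.005
delay_ms[glitch] += 200.0

for p in (50, 99, 99.9):
    print(f"P{p}: {np.percentile(delay_ms, p):.1f} ms")
# P99 barely moves because the glitch stays under 1% of packets,
# but P99.9 reveals the spikes a P99 benchmark would reward hiding.
```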

gwhiteCL

comment created time in a month


Pull request review comment bbriscoe/single-delay-metric

additional comments and edits

 Arguments can be made for more than one delay metric to better characterize the

 The factors that influence the choice of percentile are:

-* The degree of late packet discard that can be efficiently concealed by real-time media coding (both that which is typical today and that which could be typical in future).
+* The degree of late packet discard that can be efficiently concealed by real-time media coding (both that which is typical today and that which could be typical in future). [Again, I think there are two dimensions to this.]
 * The lag before results can be produced.
-  For instance, to measure 99th percentile delay requires of the order of 1,000 packets minimum (an order of magnitude greater than 1/(1 - 0.99) = 100). In contrast 99.999th percentile requires 1,000,000 packets. At a packet rate of say 10k packet/s, they would take respectively 100 ms or 100 s.
+  For instance, to measure 99th percentile delay requires of the order of 1,000 packets minimum (an order of magnitude greater than 1/(1 - 0.99) = 100) [I think I've seen others use this rule-of-thumb as well, but I don't think it has any basis. The number of samples needed is highly dependent on the shape of the distribution]. In contrast 99.999th percentile requires 1,000,000 packets. At a packet rate of say 10k packet/s [many real-time applications send more like 30-100 pps; would it be better to use a number in that ballpark?], they would take respectively 100 ms or 100 s.

Agreed on the PPS question.

On the percentile estimation rule of thumb, I'll send you via Slack the slides summarizing my conclusions from when I looked into it last fall.
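For what it's worth, a minimal Monte Carlo sketch of the point (my own illustration, not the slides): at the rule-of-thumb sample size of 1,000, the spread of the P99 estimate depends strongly on the tail of the delay distribution.

```python
import numpy as np

rng = np.random.default_rng(0)
n, trials = 1_000, 2_000  # the "10x" rule-of-thumb sample size for P99

def p99_rel_spread(sampler, true_p99):
    """Std. deviation of the P99 estimate across trials, relative to truth."""
    estimates = np.percentile(sampler((trials, n)), 99, axis=1)
    return np.std(estimates) / true_p99

# Light-tailed vs heavy-tailed delay models (arbitrary units).
light = lambda size: rng.exponential(1.0, size)   # P99 = ln(100)
heavy = lambda size: rng.pareto(1.5, size)        # P99 = 100**(1/1.5) - 1

print("exponential:", p99_rel_spread(light, np.log(100)))
print("pareto(1.5):", p99_rel_spread(heavy, 100**(1 / 1.5) - 1))
# The same n gives noticeably different estimation error for the two
# shapes, so a fixed sample-count rule can't hold in general.
```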

gwhiteCL

comment created time in a month


Pull request review comment bbriscoe/single-delay-metric

additional comments and edits

  ## Introduction

-A real-time application invariably discards any packet that arrives after the jitter buffer's play-out deadline. This applies for any real-time application, whether streamed video, interactive media or online gaming. For this broad class of applications a median delay metric is a distraction---waiting for the median delay before play-out would discard half the packets. To characterize the delay experienced by an application it would be more useful to quote the delay of a high percentile of packets. But which percentile? The 99th? The 99.99th? The 98th? The answer is application- and implementation-dependent, because it depends on how much discard can effectively be concealed (1%, 0.01% or 2% in the above examples, assuming no other losses). Nonetheless, it would be useful to settle on a single industry-wide percentile to characterize delay, even if it isn't perfect in every case.
+A real-time application invariably discards any packet that arrives after the jitter buffer's play-out deadline. This applies for any real-time application, whether streamed video, interactive media or online gaming. [Not all games use jitter buffers] For this broad class of applications a median delay metric is a distraction---waiting for the median delay before play-out would discard half the packets. To characterize the delay experienced by an application it would be more useful to quote the delay of a high percentile of packets. But which percentile? The 99th? The 99.99th? The 98th? The answer is application- and implementation-dependent, because it depends on how much discard can effectively be concealed (1%, 0.01% or 2% in the above examples, assuming no other losses). [This text might be sufficient, but jitter buffers are usually adaptive, and my understanding is that they aim to maximize QoE. At any particular operating point they convert variable delay into a fixed delay and residual loss. Since QoE typically depends on both of those resulting attributes, I don't think that they generally ignore the former.] Nonetheless, it would be useful to settle on a single industry-wide percentile to characterize delay, even if it isn't perfect in every case.

 This brief discussion paper aims to start a debate on whether a percentile is the best single delay metric and, if so, which percentile the industry should converge on.

 Note that the question addressed here is how to characterize a varying delay metric. That is orthogonal to the questions of what delay to measure and where to measure it. For instance, whether delay is measured in the application, at the transport layer or just at the bottleneck queue depends on the topic of interest and is an orthogonal question to the focus of this paper. Similarly, for delay under load, the question of which background traffic pattern to use depends on the scenario of interest and is an orthogonal question to the focus of this paper; which is solely about how to characterize delay variability most succinctly and usefully in *any* of these cases.

 ## Don't we need two metrics?

-In systems that aim for a certain delay, it has been common to quote mean delay and jitter. The distribution of delay is usually asymmetric, mostly clustered around the lower end, but with a long tail of higher delays. A traditional jitter metric is insensitive to the shape of this tail, because it is dominated by the *average* variability in the bulk of the traffic around the mean. However, it doesn't matter how little or how much variability there is in all the traffic that arrives before the play-out time. It only matters how much traffic arrives too late. The size of all the lower-than-average delay should not be allowed to counterbalance a long tail of above-average delay.
+In systems that aim for a certain delay, it has been common to quote mean delay and jitter. The distribution of delay is usually asymmetric, mostly clustered around the lower end, but with a long tail of higher delays. Many jitter metrics are insensitive to the shape of this tail, because they are dominated by the *average* variability in the bulk of the traffic around the mean. [FYI, in our SCTE paper we quoted 9 different definitions of jitter in common usage in the industry] However, it doesn't matter how little or how much variability there is in all the traffic that arrives before the play-out time. It only matters how much traffic arrives too late. The size of all the lower-than-average delay should not be allowed to counterbalance a long tail of above-average delay.

Yes, ok to cite the SCTE paper.

I guess I exaggerated when I said there were 9 different metrics :), but in any case there are 5 different ones.

Not sure I agree that most (or many) of the metrics "are dominated by the average variability in the bulk of the traffic around the mean." Two of them probably are (the IPDV metric and the std.dev(PDV) metric), but the other three (P99 PDV, max(PDV), RMS(PDV)) are not. My guess is that std.dev or IPDV are used more commonly than the others, so perhaps we could say that in predominant usage, your statement is true.

FYI, we ran one sequence of packet latency measurements from an ns3 application scenario through all 5 of those jitter definitions, and got results that ranged from 0.06 ms with one metric to 142.6 ms with another. So, I would rather say something that equates to "jitter is largely meaningless, unless a precise definition is given".
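To illustrate, a minimal sketch of how such divergent numbers arise; the five formulas below are plausible readings of the definitions named above (IPDV and PDV in the RFC 5481 sense), not necessarily the exact ones used in the SCTE paper.

```python
import numpy as np

rng = np.random.default_rng(2)
# Synthetic one-way delay trace: ~13 ms typical, with occasional spikes.
d = 10 + rng.gamma(2.0, 1.5, 50_000)
d[rng.random(d.size) < 0.01] += 120

pdv = d - d.min()   # PDV (RFC 5481): delay above the observed minimum
ipdv = np.diff(d)   # IPDV (RFC 5481): difference between successive delays

for name, value in {
    "mean |IPDV|":  np.mean(np.abs(ipdv)),
    "std.dev(PDV)": np.std(pdv),
    "P99(PDV)":     np.percentile(pdv, 99),
    "max(PDV)":     pdv.max(),
    "RMS(PDV)":     np.sqrt(np.mean(pdv**2)),
}.items():
    print(f"{name:13s} {value:7.2f} ms")
# One trace, five "jitter" values spanning more than an order of
# magnitude -- the term is meaningless without a precise definition.
```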

gwhiteCL

comment created time in a month

Pull request review comment bbriscoe/single-delay-metric

additional comments and edits

  ## Introduction

-A real-time application invariably discards any packet that arrives after the jitter buffer's play-out deadline. This applies for any real-time application, whether streamed video, interactive media or online gaming. For this broad class of applications a median delay metric is a distraction---waiting for the median delay before play-out would discard half the packets. To characterize the delay experienced by an application it would be more useful to quote the delay of a high percentile of packets. But which percentile? The 99th? The 99.99th? The 98th? The answer is application- and implementation-dependent, because it depends on how much discard can effectively be concealed (1%, 0.01% or 2% in the above examples, assuming no other losses). Nonetheless, it would be useful to settle on a single industry-wide percentile to characterize delay, even if it isn't perfect in every case.
+A real-time application invariably discards any packet that arrives after the jitter buffer's play-out deadline. This applies for any real-time application, whether streamed video, interactive media or online gaming. [Not all games use jitter buffers] For this broad class of applications a median delay metric is a distraction---waiting for the median delay before play-out would discard half the packets. To characterize the delay experienced by an application it would be more useful to quote the delay of a high percentile of packets. But which percentile? The 99th? The 99.99th? The 98th? The answer is application- and implementation-dependent, because it depends on how much discard can effectively be concealed (1%, 0.01% or 2% in the above examples, assuming no other losses). [This text might be sufficient, but jitter buffers are usually adaptive, and my understanding is that they aim to maximize QoE. At any particular operating point they convert variable delay into a fixed delay and residual loss. Since QoE typically depends on both of those resulting attributes, I don't think that they generally ignore the former.] Nonetheless, it would be useful to settle on a single industry-wide percentile to characterize delay, even if it isn't perfect in every case.

FWIW I think you are either misunderstanding my comment or we have a very different view as to how this works, because what I wrote is exactly the point. I don't agree with your statement that an implementation has to ensure a certain percentage of packets arrive within the deadline. In this context, I think QoE is a function of both factors, e.g. QoE = Qmax - k1*latency - k2*log(loss) (this is a simplification of the Cole & Rosenbluth formula for VoIP).

The network provides a certain delay distribution (likely isn't stationary, but let's assume it is).

The job of the adaptive jitter buffer is to tune to the operating point that produces the fixed latency and residual loss that maximizes the QoE function.

Consider the case where the network provides a P95 latency of 50ms and a P96 latency of 200ms. In that case, the optimal point might be 50ms and 5% loss.

Or, if the network provides P99 latency of 60ms and P99.999 latency of 62ms, the optimal point might be 62ms and 0.001% loss.

So, even for a specific application, you can't simply say that the metric should match the residual loss target of its jitter buffer, because I don't think there is a set loss target. But, I still think that something like a P99 metric is useful.
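A minimal sketch of that adaptive jitter-buffer logic (my own illustration; Qmax, k1 and k2 are uncalibrated placeholders standing in for a Cole & Rosenbluth-style model): sweep candidate play-out deadlines over the delay distribution and keep the operating point with the highest QoE.

```python
import numpy as np

def best_playout(delays_ms, qmax=4.4, k1=0.01, k2=0.5):
    """Toy adaptive jitter buffer: each candidate deadline turns the
    variable network delay into fixed latency plus residual loss, and
    we keep whichever pair maximizes QoE = Qmax - k1*latency - k2*log(loss)."""
    best_deadline, best_qoe = None, -np.inf
    for deadline in np.percentile(delays_ms, np.arange(90.0, 100.0, 0.1)):
        loss_pct = max(np.mean(delays_ms > deadline) * 100, 1e-4)  # floor avoids log(0)
        qoe = qmax - k1 * deadline - k2 * np.log10(loss_pct)
        if qoe > best_qoe:
            best_deadline, best_qoe = deadline, qoe
    return best_deadline, best_qoe

rng = np.random.default_rng(3)
delays = 20 + rng.gamma(2.0, 3.0, 20_000)  # synthetic delay distribution
deadline, qoe = best_playout(delays)
print(f"chosen deadline {deadline:.1f} ms, residual loss "
      f"{np.mean(delays > deadline):.2%}, QoE {qoe:.2f}")
# Note there is no fixed loss target: the chosen percentile depends
# entirely on the shape of the distribution, as in the examples above.
```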

gwhiteCL

comment created time in a month


Pull request review comment bbriscoe/single-delay-metric

additional comments and edits

 As a strawman, we propose **the 99th percentile (P99)** as a lowest common denom

 ## The 'Benchmark Effect'

-As explained in the introduction, defining a delay metric is not just about choosing a percentile. The layer to measure and the background traffic pattern to use also have to be defined. As soon as these have been settled on, researchers, product engineers, etc. tend to optimize around this set of conditions---the so-called 'benchmark effect'. It is possible that harmonizing around one choice of percentile will lead to a benchmark effect. However, a percentile metric seems robust against such perverse incentives, because it seems hard to contrive performance results that fall off a cliff just beyond a certain percentile. Nonetheless, even if there were a benchmark effect, it would be harmless if the percentile chosen for the benchmark realistically reflected the needs of most applications. {ToDo: better wording for last sentence.}
+As explained in the introduction, defining a delay metric is not just about choosing a percentile. The layer to measure and the background traffic pattern to use also have to be defined. As soon as these have been settled on, researchers, product engineers, etc. tend to optimize around this set of conditions---the so-called 'benchmark effect'. It is possible that harmonizing around one choice of percentile will lead to a benchmark effect. However, a percentile metric seems robust against such perverse incentives, because it seems hard to contrive performance results that fall off a cliff just beyond a certain percentile. [This is a little weak. An intermittent outage on a link that causes packets to queue up would cause delay spikes that might only show up in percentiles greater than P99. These outages would be ignored if P99 were the metric.] Nonetheless, even if there were a benchmark effect, it would be harmless if the percentile chosen for the benchmark realistically reflected the needs of most applications. {ToDo: better wording for last sentence.}

 ## How to articulate a percentile to the public?

 Delay is not an easy metric for public consumption, because it exhibits the following undesirable features:

 * Larger is not better.
-  * It might be possible to invert the metric [[RPM21](#RPM21)], but rounds per minute carries an implication that it is only for repetitive tasks, which would limit the scope of the metric
+  * It might be possible to invert the metric [[RPM21](#RPM21)], but rounds per minute carries an implication that it is only for repetitive tasks for which average delay is important, which would limit the scope of the metric
 * it is measured in time units (ms) that seem too small to matter, and which are not common currency for a lay person
   * This might also be addressed by inverting the metric

-A delay percentile is expressed as a delay, so it shares the same failings, and the same potential mitigations. But a percentile carries additional baggage as well:
+A delay percentile is expressed as a delay, so it shares the same failings, and the same potential mitigation [huh? it's hard for me to see how one would invert a P99 metric in a useful way]. But a percentile carries additional baggage as well:

That is a great comment! I agree. Anyone with a 3rd grade education understands that for some metrics (like cost, for example), lower is better. Let's strike any concern about latency being a 'smaller is better' metric. The other concern (units are really small) is a true concern and could be addressed more directly.

gwhiteCL

comment created time in a month


create branch bbriscoe/single-delay-metric

branch : gwhiteCL-patch-2

created branch time in a month

create branch bbriscoe/single-delay-metric

branch : gwhiteCL-patch-1

created branch time in a month

issue comment gwhiteCL/NQBdraft

Reference for AQM limitations wrt harmonising latency-sensitive and capacity-seeking requirements

I did not look for a reference yet, but I added some text to explain it better in draft-07:

Active Queue Management (AQM) mechanisms (such as PIE [RFC8033], DOCSIS-PIE [RFC8034], or CoDel [RFC8289]) can improve the quality of experience for latency sensitive applications, but there are practical limits to the amount of improvement that can be achieved without impacting the throughput of capacity-seeking applications. For example, AQMs generally allow a significant amount of queue depth variation in order to accommodate the behaviors of congestion control algorithms such as Reno and Cubic. If the AQM attempted to control the queue much more tightly, applications using those algorithms would not perform well.
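For reference, the queue-depth variation being described falls directly out of the PIE control law; below is a simplified Python rendering of the RFC 8033 drop-probability update (my sketch, omitting burst allowance and the probability-dependent gain scaling of real implementations).

```python
# Simplified PIE drop-probability update, after RFC 8033.
ALPHA, BETA = 0.125, 1.25  # control gains suggested in RFC 8033 (Hz)
QDELAY_REF = 0.015         # 15 ms target queueing delay (seconds)

def update_drop_prob(p, qdelay, qdelay_old):
    """Run once per update interval: p tracks both the error against the
    target and the trend, steering the queue toward QDELAY_REF rather
    than clamping it there -- which is why Reno/Cubic sawtooths still
    show up as queue-depth variation around the target."""
    p += ALPHA * (qdelay - QDELAY_REF) + BETA * (qdelay - qdelay_old)
    return min(max(p, 0.0), 1.0)
```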

thomas-fossati

comment created time in 2 months

issue closed gwhiteCL/NQBdraft

David B -05 review -- 1 Introduction

First paragraph – I suggest removing interactive video, as it's not always low data rate.  Interactive voice is a great example to keep.

closed time in 2 months

thomas-fossati

issue comment gwhiteCL/NQBdraft

David B -05 review -- 1 Introduction

done in draft-06

thomas-fossati

comment created time in 2 months

issue closed gwhiteCL/NQBdraft

David B -05 review -- 3 Non Queue-Building Behavior

> These applications do not cause queues to form in network buffers, but nonetheless can be subjected to

That's not strictly correct, but it's close. A word such as "standing" or "persistent" ought to be inserted between "cause" and "queues to form", as NQB traffic can form short-lived queues that quickly drain. A similar word insertion change is also wanted here (before "queues"):

> ... as a result of sharing a network buffer with applications that do cause queues to form

closed time in 2 months

thomas-fossati

issue comment gwhiteCL/NQBdraft

David B -05 review -- 3 Non Queue-Building Behavior

addressed in draft-06 & draft-07

thomas-fossati

comment created time in 2 months

issue closed gwhiteCL/NQBdraft

David B -05 review -- 4 The NQB PHB and its Relationship to the DiffServ Architecture

s/DiffServ/Diffserv/

Top of p.5:

> performance "requirements" are not hard ones (e.g. applications will

s/hard/strict/ or s/hard/absolute/ to avoid "hard" being misread as "difficult". Not every reader will understand that the meaning of "hard" in "hard real time" is intended here.

A few paragraphs further down:

> The intent of the NQB DSCP is that it signals verifiable behavior as opposed to wants and needs.

Hmm – perhaps s/as opposed to/in addition to/ ?? The NQB PHB definitely signals desired traffic treatment/conditioning, but it also signals verifiable behavior that justifies applying that treatment/conditioning. FWIW, EF and VOICE-ADMIT go part of the way there, but explaining that in this draft is probably a diversion.

> As a result, the NQB PHB does not aim to meet specific application performance requirements, nor does it aim to provide a differentiated service class as defined in [RFC4594]. Instead the goal of the NQB PHB is

Suggest changing that text to:

> As a result, the goal of the NQB PHB is

The statement about RFC 4594 is probably wrong, and it may be simpler just to remove that statement rather than debate it.

> These attributes eliminate the inherent value judgments that underlie the handling of differentiated service classes in the DiffServ architecture as it has traditionally been defined, they also significantly simplify access control and admission control functions, reducing them to simple verification of behavior.

s/the inherent value judgements/many of the tradeoffs/ Otherwise, the draft will likely have to explain what an "inherent value judgement" is – better not to go there.

closed time in 2 months

thomas-fossati

issue comment gwhiteCL/NQBdraft

David B -05 review -- 4 The NQB PHB and its Relationship to the DiffServ Architecture

addressed in draft-06 & draft-07

thomas-fossati

comment created time in 2 months

issue closed gwhiteCL/NQBdraft

David B -05 review -- 5 DSCP Marking of NQB Traffic

> (e.g. they are NQB on higher-speed links, but QB on lower-speed links)

s/are NQB on/exhibit NQB behavior on/ and s/QB on/exhibit QB behavior on/

> If there is uncertainty as to whether an application's traffic aligns with the description of NQB behavior in the preceding section, the application SHOULD NOT mark its traffic with the NQB DSCP.

Well, the previous section (4) is somewhat conversational and informal, whereas this "SHOULD NOT" is a very important requirement. It would be better to spell that requirement out crisply, e.g., in the form of: "If <X>, <Y> or <Z>, then the application SHOULD NOT mark its traffic with the NQB DSCP." That reduces the opportunity for different readers to reach different interpretations about what is required.

> In such a case, the application SHOULD instead implement a congestion control mechanism, for example as described in [RFC8085] or [I-D.ietf-tsvwg-ecn-l4s-id].

Add a specific section number or numbers to the citation of RFC 8085, as there's a lot of other material in that RFC.

> It is worthwhile to note again that the NQB designation and marking is intended to convey verifiable traffic behavior, not needs or wants.

Hmm – perhaps s/not needs/in addition to needs/ ??

> Thus, a useful property of nodes that support separate queues for NQB and QB flows would be that ...

I suggest providing a forward reference at the end of this sentence to a section or sections later in this draft that explain more about how that property can be achieved, perhaps to Section 14 (Security considerations)?

closed time in 2 months

thomas-fossati

issue comment gwhiteCL/NQBdraft

David B -05 review -- 5 DSCP Marking of NQB Traffic

addressed in draft-06 & draft-07

thomas-fossati

comment created time in 2 months

issue closed gwhiteCL/NQBdraft

David B -05 review -- 5.1 End-to-end usage and DSCP Re-marking

Material on observed DSCP remappings (last 3 paragraphs in Section 5.1) is important, but seems to detract from the flow of the draft at this point. Perhaps move that to a later section or Appendix?

closed time in 2 months

thomas-fossati

issue comment gwhiteCL/NQBdraft

David B -05 review -- 5.1 End-to-end usage and DSCP Re-marking

Moved to appendix in draft-06

thomas-fossati

comment created time in 2 months

issue closed gwhiteCL/NQBdraft

David B -05 review -- 5.2 Aggregation of the NQB PHB with other DiffServ PHBs

> In these cases it is recommended that NQB-marked traffic be aggregated with standard, elastic, best-effort traffic, although in some cases a network operator may instead choose to aggregate NQB traffic with Real-Time traffic.

Need more clarity on exactly what the NQB traffic is being aggregated with. I suggest using terminology from RFC 4594, 5127 and/or 8100 and referencing the relevant RFC(s) for precision.

> In either case, [RFC5127] requires that such aggregations preserve the notion of each end-to-end service class that is aggregated, and recommends preservation of the DSCP as a way of accomplishing this. Compliance with this recommendation would serve to limit the negative impact that such networks would have on end-to-end performance for NQB traffic.

Phrasing is a bit peculiar. It might be better to rewrite the sentence to say that the identity of the NQB traffic should be preserved when that traffic is aggregated, and use RFC 5127 as an example. Something to keep in mind is that RFC 5127 and RFC 8100 were both motivated by MPLS limitations, and have limited applicability to non-MPLS networks.

closed time in 2 months

thomas-fossati

issue closed gwhiteCL/NQBdraft

David B -05 review -- 6 Non-Queue-Building PHB Requirements

> A node supporting the NQB PHB MUST provide a queue for non-queue-building traffic separate from the queue used for queue-building traffic.

s/separate from the queue/separate from any queue/ as there may be more than one such queue.

> The NQB queue SHOULD be given equal priority compared to queue-building traffic of equivalent importance.

I understand the intent, but I don't think that "equal priority" is the right choice of words, because more than priority is involved. Something like "equivalent forwarding preference" seems closer to the mark.

> A node supporting the NQB PHB SHOULD treat traffic marked as Default (DSCP=0) as QB traffic having equivalent importance to the NQB marked traffic.

[**] That may not work out well for the 42 DSCP. Needs some rethinking.

> A node supporting the NQB PHB SHOULD treat traffic marked as Default (DSCP=0) as QB traffic having equivalent importance to the NQB marked traffic. A node supporting the NQB DSCP MUST support the ability to configure the classification criteria that are used to identify QB and NQB traffic having equivalent importance.

Second sentence doesn't parse for me. Do the criteria have equivalent importance or do the two types of traffic?

> The NQB queue SHOULD have a buffer size that is significantly smaller than the buffer provided for QB traffic. It is expected that most QB traffic is optimized to make use of a relatively deep buffer (e.g. on the order of tens or hundreds of ms) in nodes where support for the NQB PHB is advantageous (i.e. bottleneck nodes). Providing a similarly deep buffer for the NQB queue would be at cross purposes to providing very low queueing delay, and would erode the incentives for QB traffic to be marked correctly.

Add an example of time-duration size of NQB queue – e.g., single-digit ms, sub-ms?
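For concreteness, a toy sketch (my illustration, not the draft's) of the structure these requirements describe: two queues given equivalent forwarding preference via round-robin, with a far shallower, delay-based cap on the NQB side (the 5 ms / 100 ms figures are example values only).

```python
from collections import deque

class DualQueue:
    """Toy NQB/QB dual queue: equal forwarding preference, shallow NQB buffer."""

    def __init__(self, link_rate_bps=100e6):
        # Buffer caps expressed as queueing delay at the link rate:
        # deep (~100 ms) for QB traffic, shallow (~5 ms) for NQB.
        self.caps = {"qb": 0.100 * link_rate_bps, "nqb": 0.005 * link_rate_bps}
        self.queues = {"qb": deque(), "nqb": deque()}
        self.depth_bits = {"qb": 0, "nqb": 0}
        self.order = ["nqb", "qb"]  # round-robin pointer

    def enqueue(self, size_bits, queue):
        # Tail drop at the cap: the NQB queue overflows quickly by design,
        # which preserves the incentive for traffic to be marked correctly.
        if self.depth_bits[queue] + size_bits > self.caps[queue]:
            return False
        self.queues[queue].append(size_bits)
        self.depth_bits[queue] += size_bits
        return True

    def dequeue(self):
        # Equal priority: alternate between the queues when both are
        # backlogged, rather than giving NQB strict precedence.
        for _ in range(2):
            q = self.order[0]
            self.order.reverse()
            if self.queues[q]:
                size = self.queues[q].popleft()
                self.depth_bits[q] -= size
                return q, size
        return None
```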

closed time in 2 months

thomas-fossati