Bryan Boreham bboreham Weaveworks London, UK Distinguished Engineer, Weaveworks

bboreham/coatl 2

running, conducting, directing

bboreham/alertmanager 0

Prometheus Alertmanager

bboreham/amazon-vpc-cni-k8s 0

Networking plugin repository for pod networking in Kubernetes using Elastic Network Interfaces on AWS

bboreham/argo 0

Quest in pursuit of the Golden Fleece in Forex chaos

bboreham/arping 0

native go library to ping a host per arp datagram, or query a host mac address

bboreham/avalanche 0

Prometheus/OpenMetrics endpoint series generator for load testing.

bboreham/aws-workshop-for-kubernetes 0

AWS Workshop for Kubernetes

bboreham/cadvisor 0

Analyzes resource usage and performance characteristics of running containers.

bboreham/capnproto 0

Windows/MSVC port of Cap'n Proto serialization/RPC system

PR opened cortexproject/cortex

Drop alertmanager message about blank config to debug

We get one of these messages for every tenant every 15 seconds, and none of them are very interesting in normal operation.

Checklist

  • NA Tests updated
  • NA Documentation added
  • [x] CHANGELOG.md updated
+2 -1

0 comments

2 changed files

pr created time in 2 days

push event cortexproject/cortex

Bryan Boreham

commit sha 6e6da035e505b8dfc28c46f6041f755f7c550278

Drop alertmanager message about blank config to debug We get one of these messages for every tenant every 15 seconds, and none of them are very interesting in normal operation. Signed-off-by: Bryan Boreham <bjboreham@gmail.com>

view details

push time in 2 days

create branch cortexproject/cortex

branch : am-blank-debug

created branch time in 2 days


issue opened thanos-io/thanos

Incorrect comment in DNS provider

This line is the only occurrence of defaultPort in this repo: https://github.com/thanos-io/thanos/blob/00510b5ae7fbcfc42c735532069afead30c3bb01/pkg/discovery/dns/provider.go#L111

I believe what actually happens is it returns an error.

created time in 2 days

issue comment kubernetes/kubernetes

Would like `ClusterCIDR` to be fetchable by pods.

Hopefully it’s clear that my original request was for a network plugin (daemon).

bboreham

comment created time in 2 days

issue comment cortexproject/cortex

Slow ingester Push operations

OK now I think I've got it. Result of rolling out #3191:

[screenshot]

bboreham

comment created time in 4 days

issue comment cortexproject/cortex

Uniform validation errors type

I would like to see an error type that neatly expresses what went wrong without mentioning the text "http" or "grpc", neither of which are relevant.

pracucci

comment created time in 4 days

Pull request review comment cortexproject/cortex

Fix the querier returning 500 status code on query range limit errors

 type Config struct {
 	UseSecondStoreBeforeTime flagext.Time `yaml:"use_second_store_before_time"`
 }
+// UserError are errors caused by user input as opposed to errors executing the query.
+type UserError string

I don't understand what point you are making in "can't control the status code". You added a case to the type switch not far above the line where it already decodes httpgrpc errors: https://github.com/cortexproject/cortex/blob/f1ac40e05e9e9d8d4d16c61121b3a91848d4ec56/pkg/api/queryable.go#L42-L49

gotjosh

comment created time in 4 days


Pull request review comment containernetworking/plugins

flannel: allow input ipam parameters as basis for delegate

 const (
 type NetConf struct {
 	types.NetConf
+	// IPAM field "replaces" that of types.NetConf which is incomplete
+	IPAM          map[string]interface{} `json:"ipam,omitempty"`

Why use a string map rather than a struct? (It came as a surprise to me that containernetworking/cni does not return any useful IPAM struct, but even so, why don't you?)

dverbeir

comment created time in 4 days


delete branch containernetworking/plugins

delete branch : fix-windows-ginko

delete time in 4 days

push event containernetworking/plugins

Bryan Boreham

commit sha 1ea19f921386905010d09711af3198f241a33714

Remove extraneous test file in Windows plugin We already have a function to run all tests in the package, in netconf_suite_windows_test.go Signed-off-by: Bryan Boreham <bjboreham@gmail.com>

view details

Bryan Boreham

commit sha e78e6aa5b9fd7e3e66f0cb997152c44c2a4e43df

Merge pull request #529 from containernetworking/fix-windows-ginko Remove extraneous test file in Windows plugin

view details

push time in 4 days

PR merged containernetworking/plugins

Remove extraneous test file in Windows plugin

Fixes #528 (I hope)

We already have a function to run all tests in the package, in netconf_suite_windows_test.go; a second function doing the same thing leads to a spurious error report.

+0 -13

1 comment

1 changed file

bboreham

pr closed time in 4 days

issue closed containernetworking/plugins

Windows CI is failing

As noted at https://github.com/containernetworking/plugins/pull/521#issuecomment-669442930:

I don't understand the windows failures, it's complaining that "you may only call It inside Describe, Context, or When" yet that's exactly where the testcase calls it from...

Example: https://travis-ci.org/github/containernetworking/plugins/jobs/725216498

[1] + Failure [0.000 seconds]
[1] HNS NetConf ApplyPortMappingPolicy when portMappings is activated [It] creates NAT policies 
[1] C:/Users/travis/AppData/Local/Temp/tmp.UPynH6KdcZ/src/github.com/containernetworking/plugins/pkg/hns/netconf_windows_test.go:144
[1] 
[1]   You may only call It from within a Describe, Context or When

Looking at the code for Ginkgo, the check is `if suite.running`, which suggests to me that some other test has corrupted Ginkgo's internal data structures, leading to this message. `suite` is a global and `running` is never set to `false`.

Given it started happening when various things were updated, perhaps Ginkgo didn't check so carefully before?

closed time in 4 days

bboreham

Pull request review comment containernetworking/cni

types changes for 1.0.0

 func GetResult(r types.Result) (*Result, error) {
 	return result, nil
 }
+func copyIPConfig(from *IPConfig) *IPConfig {
+	if from == nil {
+		return nil
+	}
+	return from.Copy()
+}

I haven't tried it, but I can't currently see why not. In Go, nil can have a type, but I think all the calls to this one would be statically dispatched.

dcbw

comment created time in 4 days


pull request comment cortexproject/cortex

Smooth out spikes in rate of chunk flush ops

TestIngesterSpreadFlush() is failing consistently in CI, though not on my local machine. I will try to figure that out soon.

bboreham

comment created time in 4 days

push event cortexproject/cortex

Bryan Boreham

commit sha d42043003972af39c94b886d04f8b4f89631a43f

Wait longer for flush to complete The rate-limiter will delay flushing, so wait 80ms instead of 40ms Signed-off-by: Bryan Boreham <bjboreham@gmail.com>

view details

push time in 4 days

pull request comment kubernetes-sigs/cluster-api

:bug: Fix yaml file write issue

Like I said, we should get someone who knows Python to comment.

Jaakko-Os

comment created time in 4 days

issue comment kubernetes-sigs/cluster-api

hack/create-local-repository.py write yaml file fails

Which version of Python do you have?

Jaakko-Os

comment created time in 4 days

pull request comment cortexproject/cortex

Configurable active user series

In the current state of the PR, this is only wired into the blocks engine, and not chunks engine.

Is this still true?

pstibrany

comment created time in 4 days

push event cortexproject/cortex

Bryan Boreham

commit sha dd1f3d3d604517ba7eb303edee38c44186e7adf3

Smooth out spikes in rate of chunk flush ops Ingester chunk flushes run periodically, by default every minute. Add a rate-limiter so we spread out calls to the DB across the period, avoiding spikes of intense activity which can slow down other operations such as incoming Push() calls. Signed-off-by: Bryan Boreham <bjboreham@gmail.com>

view details

push time in 4 days

PR opened cortexproject/cortex

Smooth out spikes in rate of chunk flush ops

Ingester chunk flushes run periodically, by default every minute. Add a rate-limiter so we spread out calls to the DB across the period, avoiding spikes of intense activity which can slow down other operations such as incoming Push() calls.

Fixes #3171

Checklist

  • NA Tests updated
  • NA Documentation added
  • [x] CHANGELOG.md updated
+35 -5

0 comments

3 changed files

pr created time in 4 days

create branch cortexproject/cortex

branch : spread-flushes

created branch time in 4 days

push event cortexproject/cortex

Bryan Boreham

commit sha c21b829e9f1da629b782f6e5d8c13dc59877d78c

Update CHANGELOG Signed-off-by: Bryan Boreham <bjboreham@gmail.com>

view details

push time in 4 days

PR opened cortexproject/cortex

Turn memcached circuit-breaker on by default

With these settings it will trip after 10 failures within 10 seconds.

Also change the circuit-breaker log fields to avoid a clash: the names 'from' and 'to' are used elsewhere as dates, so avoid re-using them here as strings.

Checklist

  • NA Tests updated
  • NA Documentation added
  • [ ] CHANGELOG.md updated
+2 -2

0 comments

1 changed file

pr created time in 4 days

create branch cortexproject/cortex

branch : circuit-breaker-default-on

created branch time in 4 days

issue comment cortexproject/cortex

Slow ingester Push operations

This symptom was considerably improved by #3177 when I tried it. Suggests that Go is blocking until memory is made available by GC: although Go's stop-the-world pauses are very short, individual goroutines can get hit.

bboreham

comment created time in 5 days

push event cortexproject/cortex

Ed Welch

commit sha a4dd78927276394b1d0218e5449968604f2427ca

Improve the error message when an ingester finds an unhealthy entry in the ring during a readyness check. (#3158) Signed-off-by: Edward Welch <edward.welch@grafana.com>

view details

Marco Pracucci

commit sha 458e443ad325e56f387616c195774f5f1c113bfc

Fixed double tracing spans (#3175) Signed-off-by: Marco Pracucci <marco@pracucci.com>

view details

gotjosh

commit sha 7a4bf0bcb4044134cfcf2aa3e30f2202723c9bee

Fix the querier returning 500 status code on query range limit errors (#3167) * Fix the querier returning 500 status code on query range limit errors Signed-off-by: gotjosh <josue@grafana.com> * Address review feedback Signed-off-by: gotjosh <josue@grafana.com>

view details

Peter Štibraný

commit sha 41c87cfb84e7136509c86431e835b2d2630bc25b

AwaitHealthy now returns more descriptive error message, including failures of failed services. (#3125) Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com>

view details

Peter Štibraný

commit sha 1b7ae09e4c8f21b8a434641669a0cba11eefe0df

Make it possible for user to discover which modules are included in "All" target. (#3155) * Make it possible for user to discover which modules are part of "All" target. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Use "module" terminology. Added CHANGELOG.md. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Added PR number. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com>

view details

Marco Pracucci

commit sha 83230d75cc8d78fcde425d8c9b794ad6427fefb2

Move Cassandra and Blocks storage to GA (#3180) * Move Cassandra and Blocks storage to GA Signed-off-by: Marco Pracucci <marco@pracucci.com> * Updated architecture doc Signed-off-by: Marco Pracucci <marco@pracucci.com> * Fixed integration tests Signed-off-by: Marco Pracucci <marco@pracucci.com>

view details

Jacob Lisi

commit sha f78d08842f7083c2c6e328719aa22ab467496729

feat(e2e): set network name in e2e tests using env variable (#3176) Signed-off-by: Jacob Lisi <jacob.t.lisi@gmail.com>

view details

Peter Štibraný

commit sha bdd0a0f231d97df8f139e00b2a9db8572df3b1d0

Issue3139 flushing (#3140) * Add unit test simulating the issue. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Adding overrides fixes the panic. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Don't run ingester loops in "for flusher" mode. Only flush loops are running, started separately. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * CHANGELOG.md Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> Co-authored-by: Marco Pracucci <marco@pracucci.com>

view details

Peter Štibraný

commit sha cbaf36e4fa61c4c7a92495185e26da12881bc504

Improvements to blocksconvert (#3127) * Export start time of current plan. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * When deleting obsolete progress files, create error. Otherwise builders will try to build same plan again, and possibly crash again, in a loop. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Export more metrics about plans from scheduler. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Added scanner.allowed-users flag to only generate plans for selected users. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Fixes metric names. Fixed error file uploaded when deleting obsolete progress file. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Invert If statement to reduce indentation. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Enhance description for allowed-users and ignore-users-regex. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com>

view details

Marco Pracucci

commit sha 195bac9900f8456003ef16560328c0719f342d8c

Fail the config validation if an empty node is set at root level (#3080) Signed-off-by: Marco Pracucci <marco@pracucci.com>

view details

Peter Štibraný

commit sha 0685271c53b8fa64c19256be015885c289f0b827

Follow symlinks and ignore target directories. (#3137) * Follow symlinks and ignore target directories. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * CHANGELOG.md Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com>

view details

Marco Pracucci

commit sha c76689da753e7f376fd8fea6d97748bb375d3c7a

Fixed doc about Gossip/memberlist support no more experimental (#3183) Signed-off-by: Marco Pracucci <marco@pracucci.com>

view details

Ganesh Vernekar

commit sha 9e8575c6a8520cb8d5b965ebe6abc4c8c2b459f7

Removed deprecated untyped record from chunks WAL (#3115) Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in> Co-authored-by: Marco Pracucci <marco@pracucci.com>

view details

Peter Štibraný

commit sha fa627426062890c1e6f74b9a274c9138e91b044d

Release the lock when loading of cache gen number fails. (#3182) * Release the lock when loading of cache gen number fails. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Release the lock when loading of cache gen number fails. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * CHANGELOG.md Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Fixed linter Signed-off-by: Marco Pracucci <marco@pracucci.com> Co-authored-by: Marco Pracucci <marco@pracucci.com>

view details

Bryan Boreham

commit sha bb6784c6e467ec554ae544992482a330cc9888f0

Re-use memory for chunks in ingester QueryStream (#3177) * Refactor: extract function to set up ingester benchmark data Signed-off-by: Bryan Boreham <bjboreham@gmail.com> * Add benchmark for ingester QueryStream() Signed-off-by: Bryan Boreham <bjboreham@gmail.com> * Re-use memory for chunks in ingester QueryStream This improves performance of queries and reduces garbage-collection. Signed-off-by: Bryan Boreham <bjboreham@gmail.com> * Add CHANGELOG entry Signed-off-by: Bryan Boreham <bjboreham@gmail.com> Co-authored-by: Marco Pracucci <marco@pracucci.com>

view details

Peter Štibraný

commit sha 8de38309c60881fb2dc82bb55e87d8b14ab53d1d

Document blocksconvert tools. (#3162) * Document blocksconvert tools. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Fix white space. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Small fixes to chunks conversion doc Signed-off-by: Marco Pracucci <marco@pracucci.com> * More small fixes to blocksconvert doc Signed-off-by: Marco Pracucci <marco@pracucci.com> Co-authored-by: Marco Pracucci <marco@pracucci.com>

view details

Marco Pracucci

commit sha 3ef425d741a1a093c3de8b3073251a7c105a24b4

Micro fixes to CHANGELOG and API doc (#3184) * Micro fixes to CHANGELOG and API doc Signed-off-by: Marco Pracucci <marco@pracucci.com> * Renamed function in e2e tests client Signed-off-by: Marco Pracucci <marco@pracucci.com>

view details

Christian Simon

commit sha d24c388ceb72484dc7747d9f838f70e2d5e4795c

Update golang to 1.14.9 in build-image (#3179) * Update golang to 1.14.9 in build-image Signed-off-by: Christian Simon <simon@swine.de> * Updated build-image Signed-off-by: Marco Pracucci <marco@pracucci.com> Co-authored-by: Marco Pracucci <marco@pracucci.com>

view details

Bryan Boreham

commit sha a696c50e7ee750d2ce9ba7735c5e5355e4ae833c

Merge branch 'master' into cancel-rs-do

view details

push time in 5 days

Pull request review comment cortexproject/cortex

Cancel abandoned operations in ReplicationSet.Do()

 * [BUGFIX] Index page no longer shows links that are not valid for running Cortex instance. #3133
 * [BUGFIX] Configs: prevent validation of templates to fail when using template functions. #3157
 * [BUGFIX] Configuring the S3 URL with an `@` but without username and password doesn't enable the AWS static credentials anymore. #3170
+* [BUGFIX] No-longer-needed ingester operations from queries are now canceled. #3178

I realise it's a small thing, but I meant to write "queries". The same code is called from the ruler, for instance.

bboreham

comment created time in 5 days


pull request comment cortexproject/cortex

Cancel abandoned operations in ReplicationSet.Do()

we can't do the same for the write path, but isn't touched

Note the top-level context does get canceled when the request returns at top level, so the write path takes precautions to avoid being impacted by that: https://github.com/cortexproject/cortex/blob/3ef425d741a1a093c3de8b3073251a7c105a24b4/pkg/distributor/distributor.go#L535-L536

The reason why I wanted this PR in the read path is that requests in the querier can run for a while after ingester requests, e.g. evaluating PromQL over the data.

bboreham

comment created time in 5 days

issue closed weaveworks/weave

Preserve old docker weaveworks images

This is a FEATURE REQUEST

What you expected to happen?

Docker announced that, starting from November 1st 2020:

If an image has not been pulled or pushed within 6 months, the image will be
marked “inactive.” Any images that are marked as “inactive” will be scheduled for
deletion. Only accounts that are on the Free individual or organization plans will be
subject to image retention limits. 

This may affect the images from https://hub.docker.com/u/weaveworks, and it would be great if the old images can be preserved.

The same docker announcement mentions:

If you have a Free account, you can easily upgrade to a Pro or Team account
starting at $5 per month with an annual plan.

closed time in 5 days

marcindulak

issue comment weaveworks/weave

Preserve old docker weaveworks images

OK, thanks for clarification.

marcindulak

comment created time in 5 days


Pull request review comment cortexproject/cortex

Fix the querier returning 500 status code on query range limit errors

 type Config struct {
 	UseSecondStoreBeforeTime flagext.Time `yaml:"use_second_store_before_time"`
 }
+// UserError are errors caused by user input as opposed to errors executing the query.
+type UserError string

httpgrpc.Errorf() is used in non-gRPC situations, e.g. in Distributor.Push(). To be clear, I hate httpgrpc, but I also hate inconsistency.

gotjosh

comment created time in 5 days

issue comment weaveworks/weave

Preserve old docker weaveworks images

If they are pulled by kops then surely they do not meet the condition “not been pulled or pushed within 6 months”.

marcindulak

comment created time in 6 days


PR opened cortexproject/cortex

Cancel abandoned operations in ReplicationSet.Do()

Generally it will start a set of operations in parallel, and return once enough of them have succeeded. Pass down a context to each one, and cancel that context when ReplicationSet.Do() returns.

Fixes #3169

Checklist

  • NA Tests updated
  • NA Documentation added
  • [x] CHANGELOG.md updated
+16 -17

0 comments

4 changed files

pr created time in 6 days

push event cortexproject/cortex

Peter Štibraný

commit sha df0954055c0598f3b6691328d20a9f12b58672ed

blocksconvert: Set compaction sources correctly (#3122) * Set compaction sources correctly, otherwise compactor deletes these blocks. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Added test showing that deduplication filter doesn't remove the block anymore. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Make lint happy. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com>

view details

Marco Pracucci

commit sha 5ede3635eb550e57ed22a8d0d48c8fe783036b54

Honor configured Cassandra consistency when creating the keyspace (#3105) Signed-off-by: Marco Pracucci <marco@pracucci.com> Co-authored-by: Peter Štibraný <peter.stibrany@grafana.com>

view details

Roger Steneteg

commit sha 93f8f1f262afa1c3a282f8764e4ac0c07adbb98b

Added default template funcs to templates validation (#3157) Signed-off-by: Roger Steneteg <rsteneteg@ea.com>

view details

Sandeep Sukhani

commit sha 523cc1b1af38cf928eab6abd27fa79846e8193d2

add store method for getting fetcher for a chunk (#3164) * add store method for getting fetcher for a chunk Signed-off-by: Sandeep Sukhani <sandeep.d.sukhani@gmail.com> * add validation in schema to have from time in increasing other, accept a single timestamp for getting chunk fetcher Signed-off-by: Sandeep Sukhani <sandeep.d.sukhani@gmail.com> * suggested change from PR review Signed-off-by: Sandeep Sukhani <sandeep.d.sukhani@gmail.com> * Update pkg/chunk/composite_store.go Signed-off-by: Marco Pracucci <marco@pracucci.com> Co-authored-by: Marco Pracucci <marco@pracucci.com>

view details

Peter Štibraný

commit sha 9b5087313a2639fae474899516d2be41a2dbcf88

Index page should only show links that are valid for running Cortex (#3133) * Make index page dynamic, only show links that are valid for running Cortex. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * CHANGELOG.md Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com>

view details

Dmitry Shmulevich

commit sha cea6d0f1ca2271f135f5f3500429947c6ab987f0

Added support for Redis Cluster and Redis Sentinel (#2961) * Added support for Redis Cluster and Redis Sentinel (#2959) Signed-off-by: Dmitry Shmulevich <dmitry.shmulevich@sysdig.com> * addressed comments Signed-off-by: Dmitry Shmulevich <dmitry.shmulevich@sysdig.com> * fixed 'make doc' Signed-off-by: Dmitry Shmulevich <dmitry.shmulevich@sysdig.com> * fixed 'make lint' Signed-off-by: Dmitry Shmulevich <dmitry.shmulevich@sysdig.com> * updated Changelog Signed-off-by: Dmitry Shmulevich <dmitry.shmulevich@sysdig.com> * updated Changelog Signed-off-by: Dmitry Shmulevich <dmitry.shmulevich@sysdig.com> * updated go.mod Signed-off-by: Dmitry Shmulevich <dmitry.shmulevich@sysdig.com> * addressed comments Signed-off-by: Dmitry Shmulevich <dmitry.shmulevich@sysdig.com> * removed deprecated flags in redis config Signed-off-by: Dmitry Shmulevich <dmitry.shmulevich@sysdig.com> * updated modules Signed-off-by: Dmitry Shmulevich <dmitry.shmulevich@sysdig.com> * updated modules Signed-off-by: Dmitry Shmulevich <dmitry.shmulevich@sysdig.com> * added warning when Redis sentinel returns unexpected master info Signed-off-by: Dmitry Shmulevich <dmitry.shmulevich@sysdig.com> * fixed 'make lint' Signed-off-by: Dmitry Shmulevich <dmitry.shmulevich@sysdig.com> * updated unit test Signed-off-by: Dmitry Shmulevich <dmitry.shmulevich@sysdig.com> * added master group name to Redis Sentinel config Signed-off-by: Dmitry Shmulevich <dmitry.shmulevich@sysdig.com> * updated config validation Signed-off-by: Dmitry Shmulevich <dmitry.shmulevich@sysdig.com> * use redis universal client Signed-off-by: Dmitry Shmulevich <dmitry.shmulevich@sysdig.com> * updated dependencies Signed-off-by: Dmitry Shmulevich <dmitry.shmulevich@sysdig.com> * remove obsolete interface Signed-off-by: Dmitry Shmulevich <dmitry.shmulevich@sysdig.com> * addressed comments Signed-off-by: Dmitry Shmulevich <dmitry.shmulevich@sysdig.com> * add Redis DB index selection Signed-off-by: Dmitry Shmulevich <dmitry.shmulevich@sysdig.com> 
* updated CHANGELOG Signed-off-by: Dmitry Shmulevich <dmitry.shmulevich@sysdig.com> * Fixed CHANGELOG Signed-off-by: Marco Pracucci <marco@pracucci.com> Co-authored-by: Marco Pracucci <marco@pracucci.com>

view details

Marco Pracucci

commit sha 604a34d997168d3b68580e2edf2d15f13736f900

Fix @ in the S3 URL without username and password (#3170) Signed-off-by: Marco Pracucci <marco@pracucci.com>

view details

Andrew Seigner

commit sha c3a344784a0c8ce70ef2521f543033dee3dce6c6

Doc fixes related to block storage (#3160) Fix a link to `block storage` on: https://cortexmetrics.io/docs/production/running-in-production/ Also fix a spelling error in store-gateway docs, and a port name in `query-frontend-svc.yaml`. Signed-off-by: Andrew Seigner <andrew@sig.gy>

view details

Bryan Boreham

commit sha 0b8e357751c9443da1d07b81baad834bccf0d461

Cancel abandoned operations in ReplicationSet.Do() Generally it will start a set of operations in parallel, and return once enough of them have succeeded. Pass down a context to each one, and cancel that context when ReplicationSet.Do() returns. Signed-off-by: Bryan Boreham <bjboreham@gmail.com>

view details

push time in 6 days

create branch cortexproject/cortex

branch : cancel-rs-do

created branch time in 6 days

PR opened cortexproject/cortex

Re-use memory for chunks in ingester QueryStream

This improves performance of queries and reduces garbage-collection.

Benchmark:

before:
BenchmarkIngester_QueryStream-4              154           7448975 ns/op        16886320 B/op      32108 allocs/op
after:
BenchmarkIngester_QueryStream-4              187           6027117 ns/op         3771120 B/op      28647 allocs/op

Relates to #3171, but I don't think this will fix the issue.

Checklist

  • [x] Tests updated
  • NA Documentation added
  • [x] CHANGELOG.md updated
+67 -14

0 comments

3 changed files

pr created time in 6 days

push event cortexproject/cortex

Bryan Boreham

commit sha 95adc02c91a21fe978d25d4c94fb71a7afdff00f

Add CHANGELOG entry Signed-off-by: Bryan Boreham <bjboreham@gmail.com>

view details

push time in 6 days


push event cortexproject/cortex

Bryan Boreham

commit sha c9e93c8e1d62c92f9eee4d20d0ff3cb227cc0daf

Refactor: extract function to set up ingester benchmark data Signed-off-by: Bryan Boreham <bjboreham@gmail.com>

view details

Bryan Boreham

commit sha 98421d058c730bb3d2e6cfac89899fca3f2233e9

Add benchmark for ingester QueryStream() Signed-off-by: Bryan Boreham <bjboreham@gmail.com>

view details

Bryan Boreham

commit sha 04f676cacd30c09b71fb0a2b17f28ec38fb167fd

Re-use memory for chunks in ingester QueryStream This improves performance of queries and reduces garbage-collection. Signed-off-by: Bryan Boreham <bjboreham@gmail.com>

view details

push time in 6 days

Pull request review comment weaveworks/common

Rate limited errors

+package logging
+
+import "golang.org/x/time/rate"
+
+type rateLimitedLogger struct {
+	next    Interface
+	limiter *rate.Limiter
+}
+
+// NewRateLimitedLogger returns a logger.Interface that is limited to a number
+// of logs per second
+func NewRateLimitedLogger(logger Interface, logsPerSecond rate.Limit) Interface {
+	return &rateLimitedLogger{
+		next:    logger,
+		limiter: rate.NewLimiter(logsPerSecond, 1),
+	}
+}
+
+func (l *rateLimitedLogger) Debugf(format string, args ...interface{}) {
+	if l.limiter.Allow() {
+		l.next.Debugf(format, args...)
+	}
+}
+
+func (l *rateLimitedLogger) Debugln(args ...interface{}) {
+	if l.limiter.Allow() {
+		l.next.Debugln(args...)
+	}
+}
+
+func (l *rateLimitedLogger) Infof(format string, args ...interface{}) {
+	if l.limiter.Allow() {
+		l.next.Infof(format, args...)
+	}
+}
+
+func (l *rateLimitedLogger) Infoln(args ...interface{}) {
+	if l.limiter.Allow() {
+		l.next.Infoln(args...)
+	}
+}
+
+func (l *rateLimitedLogger) Errorf(format string, args ...interface{}) {
+	if l.limiter.Allow() {
+		l.next.Errorf(format, args...)
+	}
+}
+
+func (l *rateLimitedLogger) Errorln(args ...interface{}) {
+	if l.limiter.Allow() {
+		l.next.Errorln(args...)
+	}
+}
+
+func (l *rateLimitedLogger) Warnf(format string, args ...interface{}) {
+	if l.limiter.Allow() {
+		l.next.Warnf(format, args...)
+	}
+}
+
+func (l *rateLimitedLogger) Warnln(args ...interface{}) {
+	if l.limiter.Allow() {
+		l.next.Warnln(args...)
+	}
+}
+
+func (l *rateLimitedLogger) WithField(key string, value interface{}) Interface {
+	return &rateLimitedLogger{
+		next:    l.next.WithField(key, value),
+		limiter: rate.NewLimiter(l.limiter.Limit(), 0),
+	}
+}
+
+func (l *rateLimitedLogger) WithFields(f Fields) Interface {
+	return &rateLimitedLogger{
+		next:    l.next.WithFields(f),
+		limiter: rate.NewLimiter(l.limiter.Limit(), 0),

Thing is, code often calls WithFields() on every operation, e.g. the example you gave in the description, or this one: https://github.com/weaveworks/common/blob/61ffdd448099f94ec3dc9aaf93705399b97a2bb2/middleware/grpc_logging.go#L30

I expect the case you're trying to hit is where a high volume of logs comes from lots of operations, so you would want the same rate limit to be applied across them all. Try it!

joe-elliott

comment created time in 6 days


create branch cortexproject/cortex

branch : querystream-reuse-mem

created branch time in 6 days

issue comment weaveworks/scope

Feature: a 'cordon' button for Kubernetes hosts

@goku321 please go ahead.

bboreham

comment created time in 6 days

issue closed cortexproject/cortex

Metric for chunks stored should take account of de-duplication

Currently cortex_ingester_chunks_stored_total reports all chunks that were sent to the store; below that level the store may avoid the write and return success, but this isn't visible.

Similarly cortex_ingester_chunk_stored_bytes_total

(Note these metrics were renamed in April 2020)

(Note that even when the Cortex code doesn't detect duplication the underlying store may do so; there isn't much we can do about that in real-time)

closed time in 6 days

bboreham

issue comment cortexproject/cortex

Metric for chunks stored should take account of de-duplication

Yes I believe it was solved by #2463

bboreham

comment created time in 6 days

Pull request review comment cortexproject/cortex

Fix the querier returning 500 status code on query range limit errors

 type Config struct {
 	UseSecondStoreBeforeTime flagext.Time `yaml:"use_second_store_before_time"`
 }
+// UserError are errors caused by user input as opposed to errors executing the query.
+type UserError string

I think it's great to add a specific type here, but most similar errors are currently using httpgrpc.Errorf(). https://github.com/cortexproject/cortex/blob/5ede3635eb550e57ed22a8d0d48c8fe783036b54/pkg/querier/queryrange/query_range.go#L34-L36

Therefore I think you should either do the same here, or replace all similar ones with your new type.
Perhaps it should go in package validation?

gotjosh

comment created time in 6 days


issue opened cortexproject/cortex

Slow ingester Push operations

This screengrab is from Jaeger, requesting all Push operations longer than 100ms on a single ingester in an hour:

[screenshot]

The vertical clustering suggests to me that this is caused by blocking, in one case over a second. I'm not yet sure of the cause, but my preferred theory is a lock held by a query. Hence #3169

Considering #3093 as a possible cause, I added a span for userState.createSeriesWithFingerprint() but it hasn't shown up in any of the slow traces I've looked at.

created time in 6 days

Pull request review commentweaveworks/common

Rate limited errors

+package logging
+
+import "golang.org/x/time/rate"
+
+type rateLimitedLogger struct {
+	next    Interface
+	limiter *rate.Limiter
+}
+
+// NewRateLimitedLogger returns a logger.Interface that is limited to a number
+// of logs per second
+func NewRateLimitedLogger(logger Interface, logsPerSecond rate.Limit) Interface {
+	return &rateLimitedLogger{
+		next:    logger,
+		limiter: rate.NewLimiter(logsPerSecond, 1),
+	}
+}
+
+func (l *rateLimitedLogger) Debugf(format string, args ...interface{}) {
+	if l.limiter.Allow() {
+		l.next.Debugf(format, args...)
+	}
+}
+
+func (l *rateLimitedLogger) Debugln(args ...interface{}) {
+	if l.limiter.Allow() {
+		l.next.Debugln(args...)
+	}
+}
+
+func (l *rateLimitedLogger) Infof(format string, args ...interface{}) {
+	if l.limiter.Allow() {
+		l.next.Infof(format, args...)
+	}
+}
+
+func (l *rateLimitedLogger) Infoln(args ...interface{}) {
+	if l.limiter.Allow() {
+		l.next.Infoln(args...)
+	}
+}
+
+func (l *rateLimitedLogger) Errorf(format string, args ...interface{}) {
+	if l.limiter.Allow() {
+		l.next.Errorf(format, args...)
+	}
+}
+
+func (l *rateLimitedLogger) Errorln(args ...interface{}) {
+	if l.limiter.Allow() {
+		l.next.Errorln(args...)
+	}
+}
+
+func (l *rateLimitedLogger) Warnf(format string, args ...interface{}) {
+	if l.limiter.Allow() {
+		l.next.Warnf(format, args...)
+	}
+}
+
+func (l *rateLimitedLogger) Warnln(args ...interface{}) {
+	if l.limiter.Allow() {
+		l.next.Warnln(args...)
+	}
+}
+
+func (l *rateLimitedLogger) WithField(key string, value interface{}) Interface {
+	return &rateLimitedLogger{
+		next:    l.next.WithField(key, value),
+		limiter: rate.NewLimiter(l.limiter.Limit(), 0),
+	}
+}
+
+func (l *rateLimitedLogger) WithFields(f Fields) Interface {
+	return &rateLimitedLogger{
+		next:    l.next.WithFields(f),
+		limiter: rate.NewLimiter(l.limiter.Limit(), 0),

Why do we get a new limiter here? Also, does it work with a burst size of zero?

joe-elliott

comment created time in 6 days

PullRequestReviewEvent

push eventweaveworks/common

Michel Hollands

commit sha 61ffdd448099f94ec3dc9aaf93705399b97a2bb2

Make HTTP middleware optional (#194) * Make HTTP middleware optional * Allow the Router to be set by caller Signed-off-by: Michel Hollands <michel.hollands@grafana.com>

view details

push time in 6 days

PullRequestReviewEvent

push eventweaveworks/common

Marco Pracucci

commit sha c5bd04a3d559f7d5858591e74bab56cbffef3145

Fixed aws.ConfigFromURL() when URL contains @ but no user/pass Signed-off-by: Marco Pracucci <marco@pracucci.com>

view details

Bryan Boreham

commit sha cef15bccf23b6c0a8da38fb13157f453e121c32f

Merge pull request #198 from pracucci/fix-aws-config-from-url Fixed aws.ConfigFromURL() when URL contains @ but no user/pass

view details

push time in 6 days

PR merged weaveworks/common

Fixed aws.ConfigFromURL() when URL contains @ but no user/pass

Starting from an issue opened in Cortex https://github.com/cortexproject/cortex/issues/3034, I've realised there's an edge case in aws.ConfigFromURL(). If the URL is in the format scheme://@host/path, the static credentials are configured in the S3 config but with an empty ID / secret, because no user/pass is specified in the URL before the @.

In this PR I'm proposing a fix for that.

+18 -5

1 comment

2 changed files

pracucci

pr closed time in 6 days

PullRequestReviewEvent

issue openedcortexproject/cortex

Abandoned QueryStream calls to ingesters should get canceled

Queries are satisfied when (N-1) of the calls return, but the Nth one can still be doing work and chewing up resources.

Illustrative trace: image

I think the cancellation could be done in ReplicationSet.Do()

created time in 6 days

issue commentcortexproject/cortex

Ingester emitting two identical trace spans for every Push call

I suspect this, since taking it out makes the symptom go away: https://github.com/cortexproject/cortex/blob/f6122f4252a76c444a20528557d100caf93d8c7d/pkg/cortex/cortex.go#L275-L276

bboreham

comment created time in 7 days

PullRequestReviewEvent

issue openedcortexproject/cortex

Ingester emitting two identical trace spans for every Push call

Identical except for slight differences in timing. Do we add the middleware twice?

image

created time in 7 days

issue commentweaveworks/scope

Weave-scope CrashLoopBackOff - standard_init_linux.go:190 on armv7

No plans at present.

This is an Open Source project: no plans are necessary, just someone showing up to do the work.

fulvi0

comment created time in 9 days

issue commentweaveworks/weave

Kubernetes daemonset weave-npc container fails on ulogd.pcap file not found with containerd

Thanks for the report. The file /var/log/ulogd.pcap is a named pipe (aka fifo) so perhaps some new restriction is preventing it from being created from the image.

Could you please post the "previous containerd v1 config" where the issue goes away? I cannot see anything relevant in the implementation of containerd's config specifically, so I wonder if something else changes as a result. For instance, can you please show the versions of containerd and runc used in each case.

Also, please look in the logs for containerd and runc to see if they are reporting any issues relating to /var/log/ulogd.pcap.

rockholla

comment created time in 9 days

issue commentweaveworks/weave

After removing weave-npc container (part of weave-net DaemonSet) WEAVE-NPC-* iptables chains are kept

run_iptables runs the iptables program, with two different styles:

https://github.com/weaveworks/weave/blob/fb7fc7ddae2064ba3631f162f43eb2ae230f8c57/weave#L312-L320

awh

comment created time in 10 days

issue commentweaveworks/scope

Weave-scope CrashLoopBackOff - standard_init_linux.go:190 on armv7

We actually do build it for ARM; what we don't do is run the test suite and release the images. Or do any of the multi-arch manifest stuff.

https://github.com/weaveworks/scope/blob/bf6af9cfcc86c6cc023c440c313c6c69b586c54f/.circleci/config.yml#L116

If you'd like to figure out how to do any of that and submit a PR, let us know.

fulvi0

comment created time in 10 days

issue commentcortexproject/cortex

UserState metrics are not updated if a transfer fails

Same as #1705

jtlisi

comment created time in 10 days

issue commentweaveworks/weave

After removing weave-npc container (part of weave-net DaemonSet) WEAVE-NPC-* iptables chains are kept

@ensonic these lines should do the trick: https://github.com/weaveworks/weave/blob/fb7fc7ddae2064ba3631f162f43eb2ae230f8c57/weave#L498-L512

However, as the previous comment noted, they don't clean up everything. (To disable, it should be enough to remove the DROP rules and the -j jumps to other chains.)

awh

comment created time in 10 days

issue commentcortexproject/cortex

New ingesters not ready if there's a faulty ingester in the ring

To clarify my complaint: "this instance cannot become ready until this problem is resolved" would be ok. "this instance cannot complete joining" is incorrect: the ingester has joined the ring and is in ACTIVE state.

pracucci

comment created time in 10 days

issue commentcortexproject/cortex

Ring state will be inconsistent between memory and consul after a CAS error

@pstibrany explained the last point: the next heartbeat will save the in-memory state to Consul.

So, maybe all we need is a better check in Ingester.transfer() ?

bboreham

comment created time in 10 days

issue openedcortexproject/cortex

Ring state will be inconsistent between memory and consul after a CAS error

The change in memory state is made before updating Consul, and no attempt is made to revert the former if the latter fails:

https://github.com/cortexproject/cortex/blob/a87c25fd994a2eec5ab5af0a920cc529b17c0030/pkg/ring/lifecycler.go#L710-L711

I noticed this because I got this log message:

level=warn ts=2020-09-09T19:59:32.324235593Z caller=grpc_logging.go:55 duration=15.010918473s method=/cortex.Ingester/TransferChunks err="Transfer: ChangeState: failed to CAS collectors/ring" msg="gRPC\n"

That's coming from here: https://github.com/cortexproject/cortex/blob/f27cef893d92e38395de0504f922231fc15bb7d8/pkg/ingester/transfer.go#L204

The defer in that function should then log "TransferChunks failed" and go back to PENDING state, but I don't see that log, which is explained by this line checking the in-memory state: https://github.com/cortexproject/cortex/blob/f27cef893d92e38395de0504f922231fc15bb7d8/pkg/ingester/transfer.go#L185

(Also odd: metrics show it did go to ACTIVE state)

created time in 10 days

issue commentcortexproject/cortex

New ingesters not ready if there's a faulty ingester in the ring

Today I see a message was added as part of the gossiping changes:

level=warn ts=2020-09-09T22:14:14.839462614Z caller=lifecycler.go:230 msg="found an existing instance(s) with a problem in the ring, this instance cannot complete joining and become ready until this problem is resolved. The /ring http endpoint on the distributor (or single binary) provides visibility into the ring." ring=ingester err="instance ingester-5c45496986-k74wz past heartbeat timeout"

which is lying because the ingester printing this message is ACTIVE in the ring (although it is not ready).

pracucci

comment created time in 10 days

issue openedcortexproject/cortex

delay in ingester hand-over

On the receiving side it took about 15 seconds:

level=info ts=2020-09-09T19:56:26.181265072Z caller=transfer.go:56 msg="processing TransferChunks request" from_ingester=ingester-5c45496986-79mpw
level=info ts=2020-09-09T19:56:41.284612893Z caller=transfer.go:145 msg="Successfully transferred chunks" from_ingester=ingester-5c45496986-79mpw series_received=505456

On the sending side it took over a minute:

level=info ts=2020-09-09T19:56:26.169178429Z caller=transfer.go:315 msg="sending chunks" to_ingester=10.244.228.80:9095
level=info ts=2020-09-09T19:57:38.204960219Z caller=transfer.go:370 msg="successfully sent chunks" to_ingester=10.244.228.80:9095

The joining ingester seems to recover after this, although it prints a few warnings:

level=warn ts=2020-09-09T19:57:25.118710034Z caller=lifecycler.go:230 msg="found an existing instance(s) with a problem in the ring, this instance cannot complete joining and become ready until this problem is resolved. The /ring http endpoint on the distributor (or single binary) provides visibility into the ring." ring=ingester err="instance ingester-5c45496986-79mpw in state LEAVING"

created time in 10 days

delete branch bboreham/prometheus

delete branch : raise-default-msps

delete time in 10 days

push eventbboreham/prometheus

Bryan Boreham

commit sha 771266b12017f45aa9c2875ce9466709bfe8843e

Change default Capacity to 2500 To maintain ratio with MaxSamplesPerSend Signed-off-by: Bryan Boreham <bjboreham@gmail.com>

view details

push time in 11 days

push eventbboreham/prometheus

Bryan Boreham

commit sha 4a99fae310a1311ea0fccb80f3ea5fdbbb63cf69

Change default Capacity to 2500 to maintain ratio with MaxSamplesPerSend

view details

push time in 11 days

Pull request review commentprometheus/prometheus

Trim benchmarkIterator() memory usage

 func testChunk(t *testing.T, c Chunk) {
 }
 
 func benchmarkIterator(b *testing.B, newChunk func() Chunk) {
+	const samplesPerChunk = 250
 	var (
 		t   = int64(1234123324)
 		v   = 1243535.123
 		exp []pair
 	)
-	for i := 0; i < b.N; i++ {
+	for i := 0; i < samplesPerChunk; i++ {
 		// t += int64(rand.Intn(10000) + 1)

Not relevant to this PR.

bboreham

comment created time in 11 days

PullRequestReviewEvent

Pull request review commentprometheus/prometheus

Trim benchmarkIterator() memory usage

 func testChunk(t *testing.T, c Chunk) {
 }
 
 func benchmarkIterator(b *testing.B, newChunk func() Chunk) {
+	const samplesPerChunk = 250

The number 250 comes from this line: https://github.com/prometheus/prometheus/pull/5983/files#diff-37dd1fb1ccb424c10039dd47e623d671L136

Probably I should remove that line since it doesn't do anything now.

Possibly I should replace the other 250 in the file.

bboreham

comment created time in 11 days

PullRequestReviewEvent

push eventbboreham/cluster-api

Bryan Boreham

commit sha eb54308618c15350699e4df98bf9044ff2df6f6b

expand CRD error message Co-authored-by: Vince Prignano <vince@vincepri.com>

view details

push time in 11 days

Pull request review commentkubernetes-sigs/cluster-api

🐛 Improve logging for workload connection error

 func (r *KubeadmControlPlaneReconciler) Reconcile(req ctrl.Request) (res ctrl.Re
 		if err := r.updateStatus(ctx, kcp, cluster); err != nil {
 			var connFailure *internal.RemoteClusterConnectionError
 			if errors.As(err, &connFailure) {
-				logger.Info("Could not connect to workload cluster to fetch status", "err", err)
+				logger.Info("Could not connect to workload cluster to fetch status", "err", err.Error())

Yes I tested it.

I don't see why connFailure would work any better - it doesn't support .String().

I agree the logger should call .Error(), and asked in the description where that code lives.

bboreham

comment created time in 11 days

PullRequestReviewEvent

pull request commentcontainernetworking/plugins

Flannel log

It looks like this PR is actually two PRs: one matching the title and one matching the description.

On the routes side, I wonder if it would be nicer to use the ipam format for specifying them, and copy that through to the host-local delegate. This would save inventing a new identifier extraRoutes.

Either way, it feels like the new feature should be documented in the README.

dverbeir

comment created time in 11 days

issue commentonsi/ginkgo

"You may only call It from within a Describe, Context or When" isn't always clear

Another way this can happen is if you call RunSpecs() twice in the same directory. https://github.com/containernetworking/plugins/pull/529

williammartin

comment created time in 11 days

PR opened containernetworking/plugins

Remove extraneous test file in Windows plugin

Fixes #528 (I hope)

We already have a function to run all tests in the package, in netconf_suite_windows_test.go; a second function doing the same thing leads to a spurious error report.

+0 -13

0 comment

1 changed file

pr created time in 11 days

create branchcontainernetworking/plugins

branch : fix-windows-ginko

created branch time in 11 days

issue openedcontainernetworking/plugins

Windows CI is failing

As noted at https://github.com/containernetworking/plugins/pull/521#issuecomment-669442930:

I don't understand the windows failures, it's complaining that "you may only call It inside Describe, Context, or When" yet that's exactly where the testcase calls it from...

Example: https://travis-ci.org/github/containernetworking/plugins/jobs/725216498

[1] + Failure [0.000 seconds]
[1] HNS NetConf ApplyPortMappingPolicy when portMappings is activated [It] creates NAT policies 
[1] C:/Users/travis/AppData/Local/Temp/tmp.UPynH6KdcZ/src/github.com/containernetworking/plugins/pkg/hns/netconf_windows_test.go:144
[1] 
[1]   You may only call It from within a Describe, Context or When

Looking at the code for Ginkgo, the test is `if suite.running`, which suggests to me that some other test has corrupted Ginkgo's internal data structures, leading to this message. `suite` is a global and `running` is never set to `false`.

Given it started happening when various things were updated, perhaps Ginkgo didn't check so carefully before?

created time in 11 days

issue closedcontainernetworking/cni

flannel: Allow for additional routes to be specified

I would like to use the flannel plugin and specify custom routes. As issue #333 outlined, it's not possible via the delegate field in the flannel CNI configuration file.

Background: I would like to bring up a container with multiple network interfaces. For example:

  • eth0: flannel interface, not the default route.
  • eth1: macvlan interface, default route.

The Kubernetes services are on 192.168.128.0/17. Flannel internal network is on 192.168.0.0/17.

Relevant configurations:

$ cat /run/flannel/subnet.env
FLANNEL_NETWORK=192.168.0.0/17
FLANNEL_SUBNET=192.168.55.1/24
FLANNEL_MTU=1450
FLANNEL_IPMASQ=false

$ cat /etc/cni/net.d/flannel.conf
{
  "name": "flannel",
  "type": "flannel",
  "delegate": {
    "isDefaultGateway": false
  }
}

The routes from within the container (with the eth1 routes x'd out):

$ ip route
default via 10.x.x.x dev eth1
10.x.x.x/29 dev eth1  proto kernel  scope link  src 10.x.x.x
192.168.0.0/17 via 192.168.55.1 dev eth0
192.168.55.0/24 dev eth0  proto kernel  scope link  src 192.168.55.96

I don't know of a way to setup the routes so that there's a route to the Kubernetes services via eth0. As a result, the containers are unable to reach Kubernetes service endpoints.

Proposal:

Allow for additional routes to be specified via the flannel configuration file (/etc/cni/net.d/flannel.conf).

I prototyped a modified version of the flannel plugin, and was successfully able to add an additional route (relevant flannel CNI plugin code). I'd be willing to submit a PR if accepted.

One option would be with a new optional field under delegate. Example:

{
  "name": "flannel",
  "type": "flannel",
  "delegate": {
    "additionalRoutes": ["192.168.128.0/17"],   // optional array of CIDRs
    "isDefaultGateway": false
  }
}

Note: #337 looks like it might be related to the issue I describe above.

closed time in 11 days

awilliams

issue commentcontainernetworking/cni

flannel: Allow for additional routes to be specified

Flannel plugin has moved to https://github.com/containernetworking/plugins/

awilliams

comment created time in 11 days

Pull request review commentcontainernetworking/cni

types changes for 1.0.0

 func GetResult(r types.Result) (*Result, error) {
 	return result, nil
 }
 
+func copyIPConfig(from *IPConfig) *IPConfig {
+	if from == nil {
+		return nil
+	}
+	return from.Copy()
+}

I'm not very clear why this function exists. The if nil bit could go in IPConfig.Copy.

dcbw

comment created time in 11 days

Pull request review commentcontainernetworking/cni

types changes for 1.0.0

 import (
 	"os"
 
 	"github.com/containernetworking/cni/pkg/types"
-	types020 "github.com/containernetworking/cni/pkg/types/020"
+	types040 "github.com/containernetworking/cni/pkg/types/040"
 	"github.com/containernetworking/cni/pkg/types/convert"
 )
 
-const ImplementedSpecVersion string = "0.4.0"
+const ImplementedSpecVersion string = "1.0.0"
 
-var SupportedVersions = []string{"0.3.0", "0.3.1", ImplementedSpecVersion}
+var SupportedVersions = []string{ImplementedSpecVersion}

Is this right? Looks like we would still support 0.3.x and 0.4.0.

dcbw

comment created time in 11 days

Pull request review commentcontainernetworking/cni

types changes for 1.0.0

+// Copyright 2016 CNI authors
+//
+// Licensed under the Apache License, Version 2.0 (the "License");
+// you may not use this file except in compliance with the License.
+// You may obtain a copy of the License at
+//
+//     http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an "AS IS" BASIS,
+// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// See the License for the specific language governing permissions and
+// limitations under the License.
+
+package convert
+
+import (
+	"fmt"
+
+	"github.com/containernetworking/cni/pkg/types"
+)
+
+// ConvertFn should convert from the given arbitrary Result type into a
+// Result implementing CNI specification version passed in toVersion.
+// The function is guaranteed to be passed a Result type matching the
+// fromVersion it was registered with, and is guaranteed to be
+// passed a toVersion matching one of the toVersions it was registered with.
+type ConvertFn func(from types.Result, toVersion string) (types.Result, error)
+
+type converter struct {
+	// fromVersion is the CNI Result spec version that convertFn accepts
+	fromVersion string
+	// toVersions is a list of versions that convertFn can convert to
+	toVersions []string
+	convertFn  ConvertFn
+}
+
+var converters []*converter
+
+func findConverter(fromVersion, toVersion string) *converter {
+	for _, c := range converters {
+		if c.fromVersion == fromVersion {
+			for _, v := range c.toVersions {
+				if v == toVersion {
+					return c
+				}
+			}
+		}
+	}
+	return nil
+}
+
+// Convert converts a CNI Result to the requested CNI specification version,
+// or returns an error if the converstion could not be performed or failed
+func Convert(from types.Result, toVersion string) (types.Result, error) {
+	fromVersion := from.Version()
+
+	// Shortcut for same version
+	if fromVersion == toVersion {
+		return from, nil
+	}
+
+	// Otherwise find the right converter
+	c := findConverter(fromVersion, toVersion)
+	if c == nil {
+		return nil, fmt.Errorf("no converter for CNI result version %s to %s",
+			fromVersion, toVersion)
+	}
+	return c.convertFn(from, toVersion)
+}
+
+// Register registers a CNI Result converter. SHOULD NOT BE CALLED
+// EXCEPT FROM CNI ITSELF.

Should this be enforced by an internal package?

dcbw

comment created time in 11 days
