Tom Wilkie (tomwilkie) · @grafana · London · https://grafana.com · @grafana
VP Product, @prometheus & @cortexproject maintainer. Previously @kausalco, @weaveworks, @google, @acunu

grafana/loki 9527

Like Prometheus, but for logs.

cortexproject/cortex 2831

A horizontally scalable, highly available, multi-tenant, long term Prometheus.

grafana/tanka 912

Flexible, reusable and concise configuration for Kubernetes

pavoni/pywemo 90

Lightweight Python module to discover and control WeMo devices.

tomwilkie/awesomation 25

Home Awesomation; python home automation system.

tomwilkie/frankenstein 12

A multitenant, horizontally scalable Prometheus as a Service

tomwilkie/cubienetes 11

Cubienetes: A Kubernetes Cluster on Cubieboard2s

grafana/cortex 10

A multitenant, horizontally scalable Prometheus as a Service

tomwilkie/aws-self-serve 4

A self service portal for ec2 instances

tomwilkie/boto 1

Python interface to Amazon Web Services

pull request comment grafana/cortex-jsonnet

Make job names configurable

Can we generate & install two versions of the mixin then?

The challenge with this approach is it won't work for people not on k8s. And I'd rather avoid having 3 versions of the mixin if possible.

jtlisi

comment created time in a day

pull request comment grafana/cortex-jsonnet

Make job names configurable

Looks like you’re trying to make this work for the single binary - does my previous PR not do the trick?

jtlisi

comment created time in 2 days


PR closed grafana/cortex-jsonnet

Make job names configurable

This PR allows the operator to configure the job names for the generated dashboards. This allows users to configure the mixin for non-default deployments of cortex and for use with the single binary deployment of cortex. A single binary deployment of cortex will not have unique jobs for all of its modules. Instead, $namespace/cortex will be the job name in most deployments. Exposing the job constant used in the cortex dashboards allows the user to configure and generate dashboards that work best for them.

For example: assuming a default microservice deployment of cortex with the standard job names and a single binary deployment with the job name cortex, the following config could be set to allow for dashboards that accommodate both deployments.

{
  _config: {
    job_names: {
      ingester: '(ingester|cortex$)',
      distributor: '(distributor|cortex$)',
      querier: '(querier|cortex$)',
      query_frontend: '(query-frontend|cortex$)',
      table_manager: '(table-manager|cortex$)',
      store_gateway: '(store-gateway|cortex$)',
    },
  }
}
  • Later this can ideally be simplified into a smaller selection of configs
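As an editorial aside (not part of the PR), this is roughly how such a per-component job regex would end up in a generated panel: the configured pattern is spliced into a job=~ selector so one dashboard template covers both microservices and single-binary deployments. The metric name and the namespace/job layout below are assumptions for the sketch, written as a small Go program.

package main

import "fmt"

// jobNames mirrors the hypothetical _config.job_names block above: each
// component maps to a regex matching its job label in either a microservices
// or a single-binary deployment.
var jobNames = map[string]string{
    "ingester":    "(ingester|cortex$)",
    "distributor": "(distributor|cortex$)",
}

// selector builds the kind of PromQL selector a generated dashboard panel
// would use; the metric and label layout are illustrative only.
func selector(namespace, component string) string {
    return fmt.Sprintf(`cortex_request_duration_seconds_count{namespace="%s", job=~"%s/%s"}`,
        namespace, namespace, jobNames[component])
}

func main() {
    // Matches job="prod/ingester" in microservices mode and job="prod/cortex"
    // in single-binary mode.
    fmt.Println(selector("prod", "ingester"))
}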
+116 -106

1 comment

6 changed files

jtlisi

pr closed time in 2 days

pull request comment grafana/cortex-jsonnet

Make job names configurable

Looks like you're trying to make this work for the single binary - does my previous effort not work here?

jtlisi

comment created time in 2 days

Pull request review comment grafana/loki

Keep scrape config in line with the new Prometheus scrape config

 config {
         replacement: '$1',
       },
-      // But also include the namespace as a separate label, for routing alerts
+      // But also include the namespace, container, pod as separate labels,
+      // for routing alerts and joining with cAdvisor metrics.
       {
         source_labels: ['__meta_kubernetes_namespace'],
         action: 'replace',
         target_label: 'namespace',
       },
-
-      // Rename instances to be the pod name
       {
         source_labels: ['__meta_kubernetes_pod_name'],
         action: 'replace',
-        target_label: 'instance',
+        target_label: 'pod',  // Not 'pod_name', which disappeared in K8s 1.16.
+      },
+      {
+        source_labels: ['__meta_kubernetes_container_name'],
+        action: 'replace',
+        target_label: 'container',  // Not 'container_name', which disappeared in K8s 1.16.
       },
 
-      // Include container_name label
+      // Rename instances to the concatenation of pod:container:port.
+      // All three components are needed to guarantee a unique instance label.

So how about just not having an instance label in Loki?

Makes sense, as long as namespace/pod/container is consistent.

beorn7

comment created time in 5 days

issue comment grafana/grafana

Invalid JSON: Unexpected token { in JSON at position 171

The query was working in 6.x, so I'm not sure it was a problem handling the error...

tomwilkie

comment created time in 8 days

issue opened grafana/grafana

Invalid JSON: Unexpected token { in JSON at position 171

Running an elastic query in explore on v7.0.0 (aee1438ff2), I got the following error:

Invalid JSON: Unexpected token { in JSON at position 171

Looks like the query sent to elastic was invalid perhaps?

created time in 8 days

pull request comment cortexproject/cortex

Remove requirement for a cluster label value

@uepoch I added the changelog, docs and tests, and had to update your signed-off-by line to your new email; hope that's okay.

@jtlisi can you do final review please?

uepoch

comment created time in 9 days

push event uepoch/cortex

Martin Conraux

commit sha 4ebff2057dd0fa9aa0c554e43a548c7ac4612f35

Add an option to keep HA-tracker cluster as empty string to only check replica - improve findHALabels to break when finding required values - remove newline - Pr comments + fix - Simplify findHaLabels - Commit configuration Signed-off-by: Martin Conraux <m.conraux@criteo.com>

view details

Tom Wilkie

commit sha 69fc292796c5489c23ddaafed7cb6ac01e73d2e5

Rebase, add changelog, update docs and add test. Signed-off-by: Tom Wilkie <tom@grafana.com>

view details

push time in 9 days

push event uepoch/cortex

Martin Conraux

commit sha 5d1063c868c2ae4fb78a5923f7198e0603e6760f

Add an option to keep HA-tracker cluster as empty string to only check replica - improve findHALabels to break when finding required values - remove newline - Pr comments + fix - Simplify findHaLabels - Commit configuration Signed-off-by: Martin Conraux <m.conraux@adevinta.com>

view details

Tom Wilkie

commit sha 2167240c8c38664812ee616e81a6335f81a87228

Rebase, add changelog, update docs and add test. Signed-off-by: Tom Wilkie <tom@grafana.com>

view details

push time in 9 days

push event uepoch/cortex

Martin Conraux

commit sha 1b0d354765d842d7a6a38eeb9e61dcd6ec3cc2ea

Add an option to keep HA-tracker cluster as empty string to only check replica Signed-off-by: Martin Conraux <martin.conraux@adevinta.com>

view details

Martin Conraux

commit sha f3a91ec720d185064f84d318f033bea28d0a7ef3

improve findHALabels to break when finding required values Signed-off-by: Martin Conraux <martin.conraux@adevinta.com>

view details

Martin Conraux

commit sha abff2c20751c2939aa54d2d944eca8b3e3344daf

remove newline Signed-off-by: Martin Conraux <m.conraux@criteo.com> Signed-off-by: Martin Conraux <martin.conraux@adevinta.com>

view details

Martin Conraux

commit sha 3da079e461b403afbd048970fbc8b4795f909be9

Pr comments + fix Signed-off-by: Martin Conraux <martin.conraux@adevinta.com>

view details

Martin Conraux

commit sha 7103107815bd9cf823f8c273281c35e2f74afa08

Simplify findHaLabels Signed-off-by: Martin Conraux <martin.conraux@adevinta.com>

view details

Martin Conraux

commit sha 22fda4a5ce4555f8c7440e11604628ece38ff3c7

Typo Signed-off-by: Martin Conraux <martin.conraux@adevinta.com>

view details

Martin Conraux

commit sha a7ae1991409be2b276c04e62168266f8077d1ba1

Commit configuration Signed-off-by: Martin Conraux <martin.conraux@adevinta.com>

view details

Tom Wilkie

commit sha 02ac37e94277544927f09d9dfcd087fb6c103aa8

Rebase, add changelog, update docs and add test. Signed-off-by: Tom Wilkie <tom@grafana.com>

view details

push time in 9 days

push event uepoch/cortex

Marco Pracucci

commit sha fa5b104e1bf3cf2258dd03ed7e286f74884d217c

Add config file support to integration tests (#2167) * Added query-frontend support to local tsdb-blocks-storage-s3 dev env Signed-off-by: Marco Pracucci <marco@pracucci.com> * Introduced CortexService to have cleaner integration tests Signed-off-by: Marco Pracucci <marco@pracucci.com> * Added to writeFileToSharedDir() the ability to create the path of directories Signed-off-by: Marco Pracucci <marco@pracucci.com> * Enhanced query-frontend tests to run with config file too Signed-off-by: Marco Pracucci <marco@pracucci.com>

view details

Tom Wilkie

commit sha fc47e09c1beb77d72ed5d86fdba99243498f8b7c

Allow HTTP pushes directly to ingesters, remove old billing code. (#1491) Signed-off-by: Tom Wilkie <tom.wilkie@gmail.com>

view details

Peter Štibraný

commit sha 5e91b1f1347cadeed83209a36bcd4389d4e0089e

Added integration build tag to integration tests. (#2164) * Added integration build tag to integration tests. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Use tags= syntax Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Fix confused linter complaining about unused symbols. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Added integration tag. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com>

view details

Peter Štibraný

commit sha 8cc77490d2f86ca8a6b11f8cbac016e469f9e2b8

Added services package. (#2188) * Added services package. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com>

view details

Cody Boggs

commit sha 26073129186251c4d3533693e0f5a3492f1f27e1

Remove Cody Boggs from maintainers (#2180) Signed-off-by: Cody Boggs <strofcon@gmail.com>

view details

Peter Štibraný

commit sha 44a6290c94b5452b367a61052fcdc3d361767331

Pass logger with userID to per-user TSDB components. (#2190) * Pass logger with userID to per-user TSDB components. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com>

view details

Marco Pracucci

commit sha 28e6f45b105258e71355f34911f6fd4480098529

Generate blocks storage config file doc (#2186) * Generate blocks storage config file doc Signed-off-by: Marco Pracucci <marco@pracucci.com>

view details

Peter Štibraný

commit sha 92ab6cbe0995e98929643aa387b6d0206f608bc8

Fix flaky test for AwaitRunning. (#2194) AwaitRunning can observe either Stopping or Terminated, but we only checked for error message containing Terminated state before. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com>

view details

Marco Pracucci

commit sha 151c577b50e75b6a48a95ac8b31f0d8f184ca239

Shared in-memory index cache for queriers with blocks storage (#2189) * Shift to a shared in-memory index cache for queriers with blocks storage Signed-off-by: Marco Pracucci <marco@pracucci.com>

view details

Bryan Boreham

commit sha c0db39b4d856b0f31c1e7695c8d5715483ba778f

Break background cache writes into batches of 100 (#2135) * Break background cache writes into batches of 100 This improves parallelism and observability. Fixes https://github.com/cortexproject/cortex/issues/2134 Signed-off-by: Bryan Boreham <bryan@weave.works>

view details

Bryan Boreham

commit sha 9d696081cd804375cafa824ff4c081cceb348ad4

Comment unsafe memory usage in ingester push path (#2004) * Wrap ingester Push errors to avoid retaining any reference to unsafe data * Comment unsafe memory usage in ingester push path Signed-off-by: Bryan Boreham <bryan@weave.works>

view details

Thor

commit sha fd91ac84ffe7bdf484e7cc0080cf905eb05e68c3

tsdb: expose stripe size option to reduce tsdb memory footprint (#2185) Signed-off-by: Thor <thansen@digitalocean.com>

view details

Marco Pracucci

commit sha 0634cd376f56f1d65875991f9170b36503806024

Fixed doc (#2195) * Fixed doc Signed-off-by: Marco Pracucci <marco@pracucci.com> * Updated TSDB stripe size config option description Signed-off-by: Marco Pracucci <marco@pracucci.com>

view details

Jacob Lisi

commit sha 3c6875dd433a34faaed604cca6b73ac5b63faea8

add per-tenant alertmanager metrics (#2124) * add per-tenant alertmanager metrics Signed-off-by: Jacob Lisi <jacob.t.lisi@gmail.com> * fix linting error Signed-off-by: Jacob Lisi <jacob.t.lisi@gmail.com> * update in regards to PR comments Signed-off-by: Jacob Lisi <jacob.t.lisi@gmail.com> * fix double lock Signed-off-by: Jacob Lisi <jacob.t.lisi@gmail.com> * adjust cortex_alertmanager_alerts_received_total to be per user Signed-off-by: Jacob Lisi <jacob.t.lisi@gmail.com> * make isActive public function and fix typo Signed-off-by: Jacob Lisi <jacob.t.lisi@gmail.com> * comment IsActive Signed-off-by: Jacob Lisi <jacob.t.lisi@gmail.com>

view details

Peter Štibraný

commit sha 0afd196c642b793a4e38923780a3f0a215b88abd

Add comment and check for chunkSeries implementation of SeriesWithChunks (#2197) * Add comment and check for chunkSeries implementation of SeriesWithChunks Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com>

view details

Tom Wilkie

commit sha 34b93866a75e448d5add520e9ac04116a5388765

Add documentation on configuring caching. (#2199) * Add documentation on configuring caching. Signed-off-by: Tom Wilkie <tom@grafana.com> * Fix headers Signed-off-by: Tom Wilkie <tom@grafana.com> * Add header Signed-off-by: Tom Wilkie <tom@grafana.com> * Review feedback. Signed-off-by: Tom Wilkie <tom@grafana.com> * Review feedback from Marco. Signed-off-by: Tom Wilkie <tom@grafana.com>

view details

Marco Pracucci

commit sha 5a725f61e6e83596dcb5ec6bf43d30ba1df57ec0

Documented how to add a new maintainer (#2200) Signed-off-by: Marco Pracucci <marco@pracucci.com>

view details

Marco Pracucci

commit sha a51e7756a68766f8fd8bc3fd6504849146e5a7b8

Documented how to run the website locally (#2201) * Documented how to run the website locally Signed-off-by: Marco Pracucci <marco@pracucci.com> * Update CONTRIBUTING.md Signed-off-by: Marco Pracucci <marco@pracucci.com> Co-Authored-By: gotjosh <josue.abreu@gmail.com> * Update CONTRIBUTING.md Signed-off-by: Marco Pracucci <marco@pracucci.com> Co-Authored-By: gotjosh <josue.abreu@gmail.com> * Update docs/contributing/how-to-run-website-locally.md Signed-off-by: Marco Pracucci <marco@pracucci.com> Co-Authored-By: gotjosh <josue.abreu@gmail.com> * Update docs/contributing/how-to-run-website-locally.md Signed-off-by: Marco Pracucci <marco@pracucci.com> Co-Authored-By: gotjosh <josue.abreu@gmail.com> * Fixed Node.js instructions Signed-off-by: Marco Pracucci <marco@pracucci.com> Co-authored-by: gotjosh <josue.abreu@gmail.com>

view details

Marco Pracucci

commit sha de30605150acd7abd8456d4c67bd05544b7c7f79

Add Peter as maintainer (#2198) * Add Peter as maintainer Signed-off-by: Marco Pracucci <marco@pracucci.com>

view details

Sandeep Sukhani

commit sha 1a9d54638f3067d65052f2dbd27148ec4726de02

delete series api and purger to purge data requested for deletion (#2103) * delete series api and purger to purge data requested for deletion delete_store manages delete requests and purge plan records in stores purger builds delete plans(delete requests sharded by day) and executes them paralelly only one requests per user would be in execution phase at a time delete requests gets picked up for deletion after they get older by more than a day Signed-off-by: Sandeep Sukhani <sandeep.d.sukhani@gmail.com> * moved delete store creation from initStore to initPurger, which is the only component that needs it Signed-off-by: Sandeep Sukhani <sandeep.d.sukhani@gmail.com> * implemented new methods in MockStorage for writing tests Signed-off-by: Sandeep Sukhani <sandeep.d.sukhani@gmail.com> * removed DeleteClient and using IndexClient in DeleteStore Signed-off-by: Sandeep Sukhani <sandeep.d.sukhani@gmail.com> * refactored some code, added some tests for chunk store and purger Signed-off-by: Sandeep Sukhani <sandeep.d.sukhani@gmail.com> * add some tests and fixed some issues found during tests Signed-off-by: Sandeep Sukhani <sandeep.d.sukhani@gmail.com> * changes suggested in PR review Signed-off-by: Sandeep Sukhani <sandeep.d.sukhani@gmail.com> * rebased and fixed conflicts Signed-off-by: Sandeep Sukhani <sandeep.d.sukhani@gmail.com> * updated route for delete handler to look same as prometheus Signed-off-by: Sandeep Sukhani <sandeep.d.sukhani@gmail.com> * added test for purger restarts and fixed some issues Signed-off-by: Sandeep Sukhani <sandeep.d.sukhani@gmail.com> * suggested changes from PR review and fixed linter, tests Signed-off-by: Sandeep Sukhani <sandeep.d.sukhani@gmail.com> * fixed panic in modules when stopping purger Signed-off-by: Sandeep Sukhani <sandeep.d.sukhani@gmail.com> * changes suggested from PR review Signed-off-by: Sandeep Sukhani <sandeep.d.sukhani@gmail.com> * config changes suggested in PR review Signed-off-by: Sandeep Sukhani <sandeep.d.sukhani@gmail.com> * changes suggested from PR review Signed-off-by: Sandeep Sukhani <sandeep.d.sukhani@gmail.com> * updated config doc Signed-off-by: Sandeep Sukhani <sandeep.d.sukhani@gmail.com> * updated changelog Signed-off-by: Sandeep Sukhani <sandeep.d.sukhani@gmail.com> * some changes suggested from PR review Signed-off-by: Sandeep Sukhani <sandeep.d.sukhani@gmail.com> * made init in Purger public to call it from modules to fail early Signed-off-by: Sandeep Sukhani <sandeep.d.sukhani@gmail.com>

view details

push time in 9 days

PR closed cortexproject/cortex

Automatic recording rule substitution for large queries stale

Implements https://github.com/cortexproject/cortex/issues/1162

Some points not mentioned in the design doc:

  • The info about query-to-recording-rule map and the modifiedAt of a recording rule is being read from a file. The file will be manually generated for now.

TODO:

  • [ ] End to end test (Is TestReplaceQueryWithRecordingRule enough for this?)
  • [x] API to reload the query-to-recording-rule map file; reload the map file when modified; reload the config at regular intervals
  • [ ] Docs for the query-to-recording-rule map file format
+926 -74

5 comments

8 changed files

codesome

pr closed time in 9 days

pull request comment cortexproject/cortex

Automatic recording rule substitution for large queries

It's time to kill this; if we decide to use it in the future we can re-implement.

codesome

comment created time in 9 days

issue closed cortexproject/cortex

cortex_ingester_chunk_size_bytes: why 8k boundary for last bucket?

The highest non-inf upper inclusive bound (le) value for the histogram metric cortex_ingester_chunk_size_bytes is currently set to 8000:

https://github.com/cortexproject/cortex/blob/55a4fcfbc07903fd722e9c72659fd8ff9a873f84/pkg/ingester/metrics.go#L177

What's the rationale for that?

Not knowing much about Cortex internals, when I look at the following graph I'd love to have more resolution within that last bucket between 8000 and infinity: [screenshot: 2020-04-27 15:19:24]

Maybe let's add one or more extra buckets?

Is there an explicit, predictable, known upper bound on chunk size? (https://github.com/cortexproject/cortex/blob/master/docs/configuration/config-file-reference.md seems to focus on length and age properties, not so much actual data size in bytes.)

Any feedback is welcome, especially if I've misunderstood something. Thanks!
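For context only (this isn't a change Cortex necessarily made), adding resolution above the 8000-byte boundary with the Go Prometheus client would look something like this; the bucket layout is purely illustrative:

package main

import "github.com/prometheus/client_golang/prometheus"

// chunkSizeBytes sketches a histogram with buckets extending past the current
// 8000-byte upper bound, so sizes between 8 KB and ~2 MB are no longer lumped
// into the +Inf bucket. Bucket values are illustrative only.
var chunkSizeBytes = prometheus.NewHistogram(prometheus.HistogramOpts{
    Name:    "cortex_ingester_chunk_size_bytes",
    Help:    "Distribution of stored chunk sizes.",
    Buckets: prometheus.ExponentialBuckets(500, 2, 13), // 500 B up to ~2 MiB
})

func main() {
    prometheus.MustRegister(chunkSizeBytes)
    chunkSizeBytes.Observe(12000) // now lands in a real bucket, not +Inf
}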

closed time in 9 days

jgehrcke

issue comment cortexproject/cortex

Doesn't create chunks/indexes tables properly in Cassandra

I'll note that stage_chunks_18394 should have been created on the 12th, which was when you created this issue. Does this identifier appear anywhere in the logs?

williansvi

comment created time in 9 days

issue comment cortexproject/cortex

Doesn't create chunks/indexes tables properly in Cassandra

Can you attach the full logs from when you run Cortex? You should see logs saying which tables were created / synced and why.

williansvi

comment created time in 9 days

pull request comment grafana/jsonnet-libs

Change the instance name for standard pod scraping to be unique

@beorn7 can you follow up to ensure the Loki scrape config is consistent?

beorn7

comment created time in 9 days

issue opened grafana/grafana

Explore table panel gets Value #A and Value #B the wrong way round

When running with two queries in explore, I'd expect Value #A to be the first and Value #B to be the second:

[screenshot]

Version Grafana v7.1.0-91219227pre (bd42407)

created time in 12 days

issue opened grafana/grafana

Undo doesn't work in explore with Prometheus

I accidentally typed the wrong thing, and ctrl-z (or cmd-z on a mac) didn't undo it.

Version Grafana v7.1.0-91219227pre (bd42407620)

created time in 12 days

create branch cortexproject/cortex

branch : add-tls-support

created branch time in 15 days

issue opened grafana/cortex-jsonnet

Cortex mixin query frontend cache should name the cache

https://github.com/grafana/cortex-jsonnet/blob/69538f6319a4bd4db8afc12562c9747548c8d115/cortex-mixin/dashboards/queries.libsonnet#L28

Otherwise in single binary mode you get all the caches.

created time in 20 days

push event cortexproject/cortex

Tom Wilkie

commit sha 6a6e6b6d1934ed2bfc742974dab5f1a3f4d93936

Review feedback. Signed-off-by: Tom Wilkie <tom@grafana.com>

view details

push time in 20 days

push event cortexproject/cortex

Tom Wilkie

commit sha 2bacde9a885e7ad033431a10b7e68d88a01b8412

Add changelog entry. Signed-off-by: Tom Wilkie <tom@grafana.com>

view details

push time in 20 days

push event cortexproject/cortex

Tom Wilkie

commit sha 8f7ade0d8a45e1d1ed481ee35bd074f78ae960fc

go mod vendor Signed-off-by: Tom Wilkie <tom@grafana.com>

view details

push time in 22 days

push event cortexproject/cortex

Tom Wilkie

commit sha c82bdcb4fbaaa5cad3e9f4a009529ab631e5c6d4

Add option to limit concurrent queries to Cassandra. Signed-off-by: Tom Wilkie <tom@grafana.com>

view details

push time in 22 days

PR opened cortexproject/cortex

Add option to limit concurrent queries to Cassandra.

Signed-off-by: Tom Wilkie tom@grafana.com

What this PR does:

Add an option to limit concurrent queries to Cassandra, to prevent us from DoSing the underlying clusters.
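The PR diff isn't shown in this feed; as a rough sketch of the general technique (not the actual Cortex code), a buffered channel can act as a counting semaphore that caps in-flight Cassandra queries. All names here are invented:

package cassandra

import "context"

// queryLimiter caps the number of concurrent Cassandra queries. A buffered
// channel acts as a counting semaphore: acquire by sending, release by
// receiving.
type queryLimiter chan struct{}

func newQueryLimiter(max int) queryLimiter { return make(queryLimiter, max) }

func (l queryLimiter) acquire(ctx context.Context) error {
    select {
    case l <- struct{}{}:
        return nil
    case <-ctx.Done():
        return ctx.Err()
    }
}

func (l queryLimiter) release() { <-l }

// withLimit wraps a single query so at most cap(l) run at once.
func withLimit(ctx context.Context, l queryLimiter, run func(context.Context) error) error {
    if err := l.acquire(ctx); err != nil {
        return err
    }
    defer l.release()
    return run(ctx)
}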

Checklist

  • [ ] Tests updated
  • [ ] Documentation added
  • [ ] CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]
+37 -7

0 comment

4 changed files

pr created time in 22 days

create branch cortexproject/cortex

branch : cassandra-limit-queries

created branch time in 22 days

issue opened cortexproject/cortex

Cassandra: allow HostSelectionPolicy to be set to TokenAwarePolicy

We should allow the HostSelectionPolicy to be configured. https://godoc.org/github.com/gocql/gocql#PoolConfig

The TokenAwarePolicy will yield the best performance.
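For reference, wiring that up with gocql looks roughly like the following; the hosts, keyspace and fallback policy are placeholders:

package main

import (
    "log"

    "github.com/gocql/gocql"
)

func main() {
    cluster := gocql.NewCluster("cassandra-1", "cassandra-2")
    cluster.Keyspace = "cortex" // placeholder keyspace

    // Route each query to a replica owning the partition's token, falling
    // back to round-robin across hosts when token metadata isn't available.
    cluster.PoolConfig.HostSelectionPolicy = gocql.TokenAwareHostPolicy(gocql.RoundRobinHostPolicy())

    session, err := cluster.CreateSession()
    if err != nil {
        log.Fatal(err)
    }
    defer session.Close()
}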

created time in 22 days

issue comment grafana/cortex-jsonnet

Query ingesters within 12h too long?

Is it because BigTable is not able to process that many write requests sometimes and hence there might be a queue of up to 4h or what's the reason that it takes so much longer sometimes?

I wouldn't characterise it like that - I'd say it's more that users' series churn is not always constant, so the number of chunks to flush is highly variable - and our BigTable is statically provisioned.

weeco

comment created time in 22 days

PR opened cortexproject/cortex

Don't write empty values to the index.

This causes tombstones to be written to Cassandra, which causes excessive heap usage.

Signed-off-by: Tom Wilkie tom@grafana.com

What this PR does:

We write empty values (carrying the data in the row and column keys) for metric name and series rows. These empty values cause tombstones to be written to Cassandra (https://thelastpickle.com/blog/2018/07/05/undetectable-tombstones-in-apache-cassandra.html). Tombstones make Cassandra slow(er).
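Not the PR's actual five-line diff, just a sketch of the general idea: guard the write path so an empty column value never reaches Cassandra (skip it, or substitute a placeholder). The batch type and field names below are invented for illustration:

package index

// entry is a hypothetical index entry: hash/range keys plus an optional value.
type entry struct {
    hashKey  string
    rangeKey []byte
    value    []byte
}

// batcher is a stand-in for a real gocql batch.
type batcher interface {
    add(stmt string, args ...interface{})
}

// appendWrites adds entries to a CQL batch, never writing an empty value
// (which is what ends up stored as a tombstone and bloats the heap on reads).
func appendWrites(b batcher, table string, entries []entry) {
    for _, e := range entries {
        if len(e.value) == 0 {
            // Skip (or substitute a placeholder) rather than writing an
            // empty value to the index.
            continue
        }
        b.add("INSERT INTO "+table+" (hash, range, value) VALUES (?, ?, ?)",
            e.hashKey, e.rangeKey, e.value)
    }
}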

Checklist

  • [ ] Tests updated
  • [ ] Documentation added
  • [ ] CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]
+5 -0

0 comment

1 changed file

pr created time in 23 days

create branch cortexproject/cortex

branch : dont-write-null-values-to-index

created branch time in 23 days

issue comment grafana/cortex-jsonnet

Query ingesters within 12h too long?

This is indeed a good question - your reasoning makes sense. Except it can take some time for the chunks to be flushed, and if you look at our alerts you'll see that chunks sometimes take up to 10hrs to flush - so I suspect this is put in here to ensure users / customers never see gaps. Gaps are bad as they will be cached in the frontend.

weeco

comment created time in 23 days

issue comment grafana/grafana

Dashboard drop down on home page broken

Oh sorry, I didn't realise!

I literally couldn't find out how to get to the list of dashboards - it felt like a bug to me.

Given the dashboard list is still accessible by clicking the header on a dashboard, this also feels inconsistent to me.

tomwilkie

comment created time in a month

issue opened grafana/grafana

Dashboard drop down on home page broken

On Grafana v7.1.0-b8976df5pre (55ac97dccc), clicking on Home no longer offers the dashboard dropdown...

[attachment fz9iQ5gC9Y]

created time in a month

issue comment grafana/grafana

Explore table view doesn't show data

Is it correct that the response from prometheus doesn't have any metrics for certain instant queries?

Yes, the metric name is erased when you do aggregation in Prometheus. Can confirm: when you don't do an aggregation (and the metric name is returned), the instant view works.

I will also move this down in our Backlog bugs priority list as it affects only a very small amount of queries.

I'd argue this is a pretty major regression; instant queries and aggregation are two very frequently used features.

tomwilkie

comment created time in a month

pull request comment grafana/loki

WIP: production/promtail-mixin: Make dashboard queries configurable

I support this change! We recently did a similar thing for the Cortex mixin: https://github.com/grafana/cortex-jsonnet/pull/43

We stopped globally merging mixins a few weeks ago: https://github.com/grafana/jsonnet-libs/blob/master/prometheus-ksonnet/mixins.libsonnet#L3, but it's kinda up to the user of the mixin.

I think we should still use the _config for individual mixin config.

brancz

comment created time in a month

push event tomwilkie/sudugo

Tom Wilkie

commit sha e712a42d11c554895eb55ed8a73d1ca9d2cce998

Add some more examples. Signed-off-by: Tom Wilkie <tom@grafana.com>

view details

push time in a month

issue opened grafana/grafana

Trace view doesn't render in split view

See video; I have to exit split view to get the trace to show:

[video s7ogUolTSD]

Can we not open in split view by default? I don't find it useful.

created time in a month

issue opened grafana/grafana

Persistent floating tooltip

See video; the tooltip seems to persist when entering explore from a panel while it's showing:

[video s7ogUolTSD]

created time in a month

push event cortexproject/cortex

Joe Elliott

commit sha 75c5eb5958746eb2d794a2851f2c66ebbe14feaa

Total Worker Parallelism in the Querier (#2456) * Added param Signed-off-by: Joe Elliott <number101010@gmail.com> * First pass new structure Signed-off-by: Joe Elliott <number101010@gmail.com> * Split out manager into its own code Signed-off-by: Joe Elliott <number101010@gmail.com> * created interface to help with testing Signed-off-by: Joe Elliott <number101010@gmail.com> * Cleaned up notes. Addressed concurrency issues Signed-off-by: Joe Elliott <number101010@gmail.com> * Added parallelism reset Signed-off-by: Joe Elliott <number101010@gmail.com> * Added total parallelism support Signed-off-by: Joe Elliott <number101010@gmail.com> * Mirrored all tests for total parallelism Signed-off-by: Joe Elliott <number101010@gmail.com> * Added clarity Signed-off-by: Joe Elliott <number101010@gmail.com> * Removed upstream interface Signed-off-by: Joe Elliott <number101010@gmail.com> * Added first worker test Signed-off-by: Joe Elliott <number101010@gmail.com> * Added concurrency test Signed-off-by: Joe Elliott <number101010@gmail.com> * TestTable for Concurrency Signed-off-by: Joe Elliott <number101010@gmail.com> * Added series of test cases Signed-off-by: Joe Elliott <number101010@gmail.com> * Added test for failed receive. Fixed graceful quit shutdown Signed-off-by: Joe Elliott <number101010@gmail.com> * Added tests for number of calls made to the handler Signed-off-by: Joe Elliott <number101010@gmail.com> * Added test for cancelling service context cancelling processes Signed-off-by: Joe Elliott <number101010@gmail.com> * Added changelog entry and updated docs Signed-off-by: Joe Elliott <number101010@gmail.com> * lint Signed-off-by: Joe Elliott <number101010@gmail.com> * lint part deux Signed-off-by: Joe Elliott <number101010@gmail.com> * make doc Signed-off-by: Joe Elliott <number101010@gmail.com> * Added resetParallelism tests and fixed bug Signed-off-by: Joe Elliott <number101010@gmail.com> * Added stop test to resetParallelism Signed-off-by: Joe Elliott <number101010@gmail.com> * Added test names Signed-off-by: Joe Elliott <number101010@gmail.com> * Cleaned up resetParallelism Signed-off-by: Joe Elliott <number101010@gmail.com> * Added DNS Watch tests Signed-off-by: Joe Elliott <number101010@gmail.com> * lint Signed-off-by: Joe Elliott <number101010@gmail.com> * Removed racey dns watcher tests Signed-off-by: Joe Elliott <number101010@gmail.com> * Changed to worker-match-max-concurrent Signed-off-by: Joe Elliott <number101010@gmail.com> * Removed unnecessary param Signed-off-by: Joe Elliott <number101010@gmail.com> * Swapped to client config instead of bare param Signed-off-by: Joe Elliott <number101010@gmail.com> * Use DialContext() Signed-off-by: Joe Elliott <number101010@gmail.com> * Added comments/logs around concurrency distribution Signed-off-by: Joe Elliott <number101010@gmail.com> * Swapped to force cancel via context Signed-off-by: Joe Elliott <number101010@gmail.com> * Removed outdated comment Signed-off-by: Joe Elliott <number101010@gmail.com> * Added match_max_concurrent to example single binary configs Signed-off-by: Joe Elliott <number101010@gmail.com> * Improved comments Signed-off-by: Joe Elliott <number101010@gmail.com> * Added log for scenario where we can't find in a map Signed-off-by: Joe Elliott <number101010@gmail.com> * Removed unnecessary nesting Signed-off-by: Joe Elliott <number101010@gmail.com> * Moved concurrency to its own method Signed-off-by: Joe Elliott <number101010@gmail.com> * Removed managerCtx/Cancel Signed-off-by: Joe Elliott <number101010@gmail.com> * 
Fixed expected value on test Signed-off-by: Joe Elliott <number101010@gmail.com> * Removed WithBlock() option Signed-off-by: Joe Elliott <number101010@gmail.com> * Remove deleted address from manager map Signed-off-by: Joe Elliott <number101010@gmail.com>

view details

push time in a month

PR merged cortexproject/cortex

Total Worker Parallelism in the Querier size/XL

What this PR does:

  • Adds a parameter -querier.worker-total-parallelism that controls a total number of concurrent requests allowed spread across workers.
  • Adds workerFrontendManager to help encapsulate functionality related to maintaining a set of concurrent workers per frontend.
  • Related tests/docs
  • Extends existing frontend tests to cover both the existing and new parallelism modes

Which issue(s) this PR fixes: Fixes #1883

Checklist

  • [x] Tests updated
  • [x] Documentation added
  • [x] CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]
+555 -122

4 comments

14 changed files

joe-elliott

pr closed time in a month

issue closed cortexproject/cortex

querier.worker-parallelism should be derivable

Currently, querier.worker-parallelism is the number of parallel queries that can be processed per querier per frontend. In an effort to simplify operating cortex, this could be changed to a max parallelism configuration per querier irrespective of the number of frontends. Changing this would alleviate the common practice of determining querier parallelism -> dividing by n_frontends -> setting querier.worker-parallelism.

Additionally, the queriers already have a DNS watch loop running in order to add/drop ephemeral frontends, which can be used to readjust the derived per-frontend limits.

To maintain backwards compatibility, this change could be effected by adding a new flag and marking querier.worker-parallelism deprecated if necessary.
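A toy sketch of the derivation being asked for (not the implementation that eventually landed): re-divide a per-querier total across whatever frontends the DNS watch currently knows about, giving the remainder to the first few so the configured total is preserved:

package worker

// distribute splits totalParallelism across the currently known frontend
// addresses; it would be re-run whenever the DNS watch adds or drops a
// frontend. The first (total % n) frontends get one extra slot.
func distribute(totalParallelism int, frontends []string) map[string]int {
    out := make(map[string]int, len(frontends))
    if len(frontends) == 0 {
        return out
    }
    base := totalParallelism / len(frontends)
    extra := totalParallelism % len(frontends)
    for i, addr := range frontends {
        n := base
        if i < extra {
            n++
        }
        out[addr] = n
    }
    return out
}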

closed time in a month

owen-d

pull request comment grafana/grafana

Logs: Derived fields link design

on hover

Yes please :-)

On Thu, Apr 23, 2020 at 9:30 AM Andrej Ocenas notifications@github.com wrote:

@davkal https://github.com/davkal Yeah I can play with something like that. Could be also done as link icon and on hover of that it would show all the links that way we could show the icon in relevant rows without occluding any of the text (but you would not see what kind of links there are until you hover).

In any case, if that's ok I would merge this and, as I said, I'm a bit strapped on time so would take a look at the improvement later on.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/grafana/grafana/pull/23695#issuecomment-618260343, or unsubscribe https://github.com/notifications/unsubscribe-auth/AADMNBMA7XFWGBUJ7ZZZRTTRN74CPANCNFSM4MMJ6UOA .

aocenas

comment created time in a month

delete branch cortexproject/cortex

delete branch : rf-three-with-two-ingesters

delete time in a month

issue closed cortexproject/cortex

Gaps when querying ingesters with replication factor = 3 and 2 healthy ingesters in the cluster

Yesterday I noticed that, in a Cortex cluster with 3 ingesters and a replication factor of 3, there are gaps in queried series during the ingesters rollout (for the series not flushed to storage yet). My gut feeling is that this is a bug in the ring replication strategy, but further investigation needs to be done.

closed time in a month

pracucci

push event cortexproject/cortex

Tom Wilkie

commit sha fa0fc64235063d770d6aac223b11e016cf64fcf2

Ensure queries return correctly during rolling upgrades of stateful cluster with RF 3 and only 3 nodes. (#2503) * Use a real ring with mock KV when testing distributor. This is to teast out errors in the replication logic. Signed-off-by: Tom Wilkie <tom@grafana.com> * Extend distributor test to cover the case RF=3 with 2 ingesters. Signed-off-by: Tom Wilkie <tom@grafana.com> * Ensure ring correctly calculates the number of allowed failures when RF=3 and #ingesters=2. Signed-off-by: Tom Wilkie <tom@grafana.com> * Add changelog and review feedback. Signed-off-by: Tom Wilkie <tom@grafana.com> * Refactor some distributor tests to try and get them to pass. Signed-off-by: Tom Wilkie <tom@grafana.com> * Speed up tests but polling more frequently. Signed-off-by: Tom Wilkie <tom@grafana.com> * Fix same bug on the write path. Signed-off-by: Tom Wilkie <tom@grafana.com> * Tidy up the distributor tests. Signed-off-by: Tom Wilkie <tom@grafana.com> * Make test correctly handle RF3 and 2 ingesters - previously was succeeding when it shouldn't Signed-off-by: Tom Wilkie <tom@grafana.com> * Update pkg/ring/ring.go Co-Authored-By: Jacob Lisi <jacob.t.lisi@gmail.com> Signed-off-by: Tom Wilkie <tom@grafana.com> Co-authored-by: Jacob Lisi <jacob.t.lisi@gmail.com>

view details

push time in a month

PR merged cortexproject/cortex

Ensure queries return correctly during rolling upgrades of stateful cluster with RF 3 and only 3 nodes. size/L

Signed-off-by: Tom Wilkie tom@grafana.com

What this PR does:

  • Use a real ring with mock KV when testing distributor. This is to tease out errors in the replication logic.
  • Extend the distributor tests to cover the case of RF=3 with 2 ingesters.
  • Fix ring.GetAll to correctly calculate maxErrors when RF=3 and 2 ingesters.
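For intuition only (this is not the Cortex ring code): with replication factor RF a read tolerates a minority of the replication set failing, but when fewer ingesters than RF are healthy the set shrinks and the tolerated-error count must shrink with it, otherwise a query can "succeed" while silently missing series.

package ring

// maxErrors is a toy version of the quorum arithmetic: how many replica
// failures a read can tolerate given the replication factor and the number
// of healthy ingesters actually available.
func maxErrors(replicationFactor, healthyIngesters int) int {
    setSize := replicationFactor
    if healthyIngesters < setSize {
        // The replication set can't be larger than the cluster: with RF=3
        // and only 2 ingesters, both must answer, so zero failures are OK.
        setSize = healthyIngesters
    }
    return (setSize - 1) / 2 // RF=3, 3 healthy -> 1; RF=3, 2 healthy -> 0
}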

Which issue(s) this PR fixes: Fixes #2504

Checklist

  • [x] Tests updated
  • [ ] Documentation added
  • [x] CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]
+271 -171

0 comment

8 changed files

tomwilkie

pr closed time in a month

push event cortexproject/cortex

Tom Wilkie

commit sha 2619f251363559835496e11b200aa6932fbd2582

Update pkg/ring/ring.go Co-Authored-By: Jacob Lisi <jacob.t.lisi@gmail.com> Signed-off-by: Tom Wilkie <tom@grafana.com>

view details

push time in a month

push event cortexproject/cortex

Tom Wilkie

commit sha 3eb03c01352d5890412b95f9718b6764860a0fcd

Update pkg/ring/ring.go Co-Authored-By: Jacob Lisi <jacob.t.lisi@gmail.com>

view details

push time in a month

issue comment grafana/grafana

Explore table view doesn't show data

This is happening again:

[screenshot]

Grafana v7.0.0-67c235cfpre (66d405acab)

Could it be time related?

tomwilkie

comment created time in a month


push event cortexproject/cortex

Tom Wilkie

commit sha 41f8ca37b40f9d26ef55ed571e4b35023aa77893

Speed up tests but polling more frequently. Signed-off-by: Tom Wilkie <tom@grafana.com>

view details

Tom Wilkie

commit sha ee636654f6114c5ba90952f430dbb7a94cf81ecf

Fix same bug on the write path. Signed-off-by: Tom Wilkie <tom@grafana.com>

view details

Tom Wilkie

commit sha 88b91e21a5342c27606c5f893d49f445816642dd

Tidy up the distributor tests. Signed-off-by: Tom Wilkie <tom@grafana.com>

view details

Tom Wilkie

commit sha cc699543f3cc943c73c3e6154e416817b792444a

Make test correctly handle RF3 and 2 ingesters - previously was succeeding when it shouldn't Signed-off-by: Tom Wilkie <tom@grafana.com>

view details

push time in a month

push event cortexproject/cortex

Tom Wilkie

commit sha 56419b8dc0aee6afff3875a377611fbf7d404fe6

Refactor some distributor tests to try and get them to pass. Signed-off-by: Tom Wilkie <tom@grafana.com>

view details

push time in a month

push event cortexproject/cortex

Tom Wilkie

commit sha e150c3d7b8b8c1d5e1fe6f3b9f2460cde3cd7602

Add changelog and review feedback. Signed-off-by: Tom Wilkie <tom@grafana.com>

view details

push time in a month

push event cortexproject/cortex

Tom Wilkie

commit sha c1395877d946f2c74e77103209a03ceb0b61e601

Ensure ring correctly calculates the number of allowed failures when RF=3 and #ingesters=2. Signed-off-by: Tom Wilkie <tom@grafana.com>

view details

push time in a month

push event cortexproject/cortex

Tom Wilkie

commit sha deeaec2561eb9f2c2f3a4091c0acd9c70e4826cb

Extend distributor test to cover the case RF=3 with 2 ingesters. Signed-off-by: Tom Wilkie <tom@grafana.com>

view details

push time in a month

PR opened cortexproject/cortex

Ensure queries return correctly during rolling upgrades of stateful cluster with RF 3 and only 3 nodes.

Signed-off-by: Tom Wilkie tom@grafana.com

What this PR does:

  • Use a real ring with mock KV when testing distributor. This is to tease out errors in the replication logic.

Which issue(s) this PR fixes: Fixes #<issue number>

Checklist

  • [ ] Tests updated
  • [ ] Documentation added
  • [ ] CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]
+43 -54

0 comment

3 changed files

pr created time in a month

create branch cortexproject/cortex

branch : rf-three-with-two-ingesters

created branch time in a month

Pull request review comment cortexproject/cortex

Add disk space requirement in WAL doc

 _The WAL is currently considered experimental._
 
 2. As there are no transfers between ingesters, the tokens are stored and recovered from disk between rollout/restarts. This is [not a new thing](https://github.com/cortexproject/cortex/pull/1750) but it is effective when using statefulsets.
 
+## Disk space requirement
## Disk space requirements
codesome

comment created time in a month

Pull request review comment cortexproject/cortex

Add disk space requirement in WAL doc

 _The WAL is currently considered experimental._
 
 2. As there are no transfers between ingesters, the tokens are stored and recovered from disk between rollout/restarts. This is [not a new thing](https://github.com/cortexproject/cortex/pull/1750) but it is effective when using statefulsets.
 
+## Disk space requirement
+
+Based on tests in real worl
+
+* Numbers from an ingester with 1.2M series, ~80k samples/s ingested and ~15s scrape interval.
+* Checkpoint period was 20mins, so we need to scale up the number of WAL files to account for the default of 30mins.
+* At any given point, we have 2 complete checkpoints present on the disk and a 2 sets of WAL files between checkpoints (and now).
+* This peaks at 3 checkpoints and 3 lots of WAL momentarily, as we remove the old checkpoints.
+
+|  |  |
+|:-|:-|
+| Size of 1 checkpoint for 1.2M series | 1410 MiB |
+| Avg checkpoint size per series | 1.2 KiB |
+| No. of WAL files between checkpoints (30m checkpoint) | 30 mins x 87 / 20mins = 130 |
+| Size per WAL file | 32 MiB (reduced from Prometheus) |
+| Total size of WAL | 4160 MiB |
+| Steady state usage | 2 x 1410 + 2 x 4160 = ~11 GiB |
+| Peak usage | 3 x 1410 + 3 x 4160 = ~17 GiB |
+
+For 1M series at 15s scrape interval with checkpoint duration of 30m
+
+|  |  |
+|:-|:-|
+| Steady state usage | 11 / 1.2 = ~9.2 GiB |
| Steady state usage | 11 GiB / 1.2 = ~9.2 GiB |
codesome

comment created time in a month

Pull request review comment cortexproject/cortex

Add disk space requirement in WAL doc

 _The WAL is currently considered experimental._
 
 2. As there are no transfers between ingesters, the tokens are stored and recovered from disk between rollout/restarts. This is [not a new thing](https://github.com/cortexproject/cortex/pull/1750) but it is effective when using statefulsets.
 
+## Disk space requirement
+
+Based on tests in real worl
+
+* Numbers from an ingester with 1.2M series, ~80k samples/s ingested and ~15s scrape interval.
+* Checkpoint period was 20mins, so we need to scale up the number of WAL files to account for the default of 30mins.
+* At any given point, we have 2 complete checkpoints present on the disk and a 2 sets of WAL files between checkpoints (and now).
+* This peaks at 3 checkpoints and 3 lots of WAL momentarily, as we remove the old checkpoints.
+
+|  |  |
+|:-|:-|
+| Size of 1 checkpoint for 1.2M series | 1410 MiB |
+| Avg checkpoint size per series | 1.2 KiB |
+| No. of WAL files between checkpoints (30m checkpoint) | 30 mins x 87 / 20mins = 130 |
+| Size per WAL file | 32 MiB (reduced from Prometheus) |
+| Total size of WAL | 4160 MiB |
+| Steady state usage | 2 x 1410 + 2 x 4160 = ~11 GiB |
+| Peak usage | 3 x 1410 + 3 x 4160 = ~17 GiB |
+
+For 1M series at 15s scrape interval with checkpoint duration of 30m
+
+|  |  |
+|:-|:-|
+| Steady state usage | 11 / 1.2 = ~9.2 GiB |
+| Peak usage | 17 / 1.2 = ~14.2 |
| Peak usage | 17 GiB / 1.2 = ~14.2 GiB |
codesome

comment created time in a month

Pull request review comment cortexproject/cortex

Add disk space requirement in WAL doc

 _The WAL is currently considered experimental._
 
 2. As there are no transfers between ingesters, the tokens are stored and recovered from disk between rollout/restarts. This is [not a new thing](https://github.com/cortexproject/cortex/pull/1750) but it is effective when using statefulsets.
 
+## Disk space requirement
+
+Based on tests in real worl
Based on tests in real world:
codesome

comment created time in a month

pull request comment grafana/grafana

Logs: Derived fields link design

My comment to @aocenas was that it takes too much clicking and scrolling to get to the trace, the link is too small, and generally this feature is very undiscoverable:

[screenshot]

(It also seems to be broken in master, but that's unrelated.)

aocenas

comment created time in a month

pull request comment grafana/jsonnet-libs

support for generating slack receivers with buttons for alerts

That looks really good, thanks for sticking with this Sandeep. LGTM!

sandeepsukhani

comment created time in a month

issue opened cortexproject/cortex

Update remote read to the new streaming interface.

@bwplotka introduced a new streaming remote read in Prometheus; we should add the same to Cortex for people who point Prometheus and Thanos at Cortex.

created time in a month

Pull request review comment grafana/jsonnet-libs

support for generating slack receivers with buttons for alerts

     ]
   else [],
 
+  slackAlertTitle:: '{{ template "__alert_title" . }}',
+  slackAlertText:: '{{ template "__alert_text" . }}',
+
+  build_slack_receiver(name, slack_channel)::
+    {
+      name: name,
+      slack_configs: [{
+        api_url: $._config.slack_url,
+        channel: slack_channel,
+        send_resolved: true,
+        title: $.slackAlertTitle,
+        text: $.slackAlertText,
+        actions: [
+          {
+            type: 'button',
+            text: 'Runbook :green_book:',
+            url: '{{ (index .Alerts 0).Annotations.runbook_url }}',

Nice, thanks.

sandeepsukhani

comment created time in a month

Pull request review comment grafana/jsonnet-libs

support for generating slack receivers with buttons for alerts

     ]
   else [],
 
+  slackAlertTitle:: '{{ template "__alert_title" . }}',
+  slackAlertText:: '{{ template "__alert_text" . }}',

Do we actually override it though? I don't think this library has many users other than us. Plus, they can specify their own template if need be.

sandeepsukhani

comment created time in a month

push event grafana/cortex-jsonnet

Tom Wilkie

commit sha 62c70cec3efdea40b28d90ffa079c479b0062698

Small refactors: - Put all the mixin config in one place. - Make dashboardWithTagsAndLinks just an override on the dashboard constructor. - Factor out the helper functions. Signed-off-by: Tom Wilkie <tom@grafana.com>

view details

Tom Wilkie

commit sha 78349a89c5c920dd3f97a663e4f9043749777211

More refactoring: - Move the dashboards to separate files and namespace them off so they only have access to the config. - Add a AddRowIf helper to massively rationalise the inclusion / exclusion of rates. - Alter how we conditionally include dashboards themselves. Should be a no-op change to the dashboards themselve. Signed-off-by: Tom Wilkie <tom@grafana.com>

view details

Tom Wilkie

commit sha d299d64d24785d3dd1a5074a8487f212388a808d

Use set-style selector for bigtable/cassandra/dynamodb. Signed-off-by: Tom Wilkie <tom@grafana.com>

view details

Tom Wilkie

commit sha 2f71235a1a04d95fd52758108e4b6e511e3429da

Make selectors in PromQL queries on the writes dashboard support single process mode. Signed-off-by: Tom Wilkie <tom@grafana.com>

view details

Tom Wilkie

commit sha ef468e356c8731fed10682b891c13f57a8a5273c

Get all the panels on the write dashboard working with the single binary. Signed-off-by: Tom Wilkie <tom@grafana.com>

view details

Tom Wilkie

commit sha c706d9a21f5ce0954b8dfcc1f7211834b6c5c82a

Get read dashboard working for single process. Signed-off-by: Tom Wilkie <tom@grafana.com>

view details

Tom Wilkie

commit sha a3c106c03a9098becb188cc8528e7ed35a10dc10

Apply suggestions from code review Co-Authored-By: Jacob Lisi <jacob.t.lisi@gmail.com>

view details

Tom Wilkie

commit sha 848899a23367acf8628e14ae806c46959db9ad8b

Update chunks and queries dashboards. Signed-off-by: Tom Wilkie <tom@grafana.com>

view details

Tom Wilkie

commit sha bb0b966bca569e31dba81734e9d8563583fc73fc

Only add the links if addClusterSelectorTemplates is called. Signed-off-by: Tom Wilkie <tom@grafana.com>

view details

Tom Wilkie

commit sha 4539b86432795e4358d067ecdec08182bc2e6166

Only tag dashboards with cluster selectors. Signed-off-by: Tom Wilkie <tom@grafana.com>

view details

Tom Wilkie

commit sha b6263cdfef2ba750704fcc90fc67762dbd3f138f

Merge pull request #43 from grafana/optional-k8s Make the cortex-mixin work on non-k8s deployments.

view details

push time in a month

delete branch grafana/cortex-jsonnet

delete branch : optional-k8s

delete time in a month

PR merged grafana/cortex-jsonnet

Make the cortex-mixin work on non-k8s deployments.

But first, a bunch of refactoring to tidy the dashboards up:

  • Put all the mixin config in one place.
  • Make dashboardWithTagsAndLinks just an override on the dashboard constructor.
  • Factor out the helper functions.
  • Move the dashboards to separate files and namespace them off so they only have access to the config.
  • Add an AddRowIf helper to massively rationalise the inclusion / exclusion of rows.
  • Alter how we conditionally include dashboards themselves.
  • Use set-style selector for bigtable/cassandra/dynamodb.
  • Make the mixin by default not need any config, and generate dashboards for all possible permutations of tsdb, chunks, bigtable, dynamodb, cassandra, gcs, s3 etc.
+1211 -1013

1 comment

15 changed files

tomwilkie

pr closed time in a month

push event grafana/cortex-jsonnet

Tom Wilkie

commit sha 4539b86432795e4358d067ecdec08182bc2e6166

Only tag dashboards with cluster selectors. Signed-off-by: Tom Wilkie <tom@grafana.com>

view details

push time in a month

Pull request review comment grafana/jsonnet-libs

support for generating slack receivers with buttons for alerts

     ]
   else [],
 
+  slackAlertTitle:: '{{ template "__alert_title" . }}',
+  slackAlertText:: '{{ template "__alert_text" . }}',
+
+  build_slack_receiver(name, slack_channel)::
+    {
+      name: name,
+      slack_configs: [{
+        api_url: $._config.slack_url,
+        channel: slack_channel,
+        send_resolved: true,
+        title: $.slackAlertTitle,
+        text: $.slackAlertText,
+        actions: [
+          {
+            type: 'button',
+            text: 'Runbook :green_book:',
+            url: '{{ (index .Alerts 0).Annotations.runbook_url }}',

A thought: what happens when there is no runbook url on the alert?

sandeepsukhani

comment created time in a month

Pull request review comment grafana/jsonnet-libs

support for generating slack receivers with buttons for alerts

     ]
   else [],
 
+  slackAlertTitle:: '{{ template "__alert_title" . }}',
+  slackAlertText:: '{{ template "__alert_text" . }}',

Now that these are much simpler, is there a reason not to inline them below?

sandeepsukhani

comment created time in a month

Pull request review comment grafana/jsonnet-libs

support for generating slack receivers with buttons for alerts

+# This builds the silence URL.  We exclude the alertname in the range

Might be worth adding a comment saying what this was inspired by.

sandeepsukhani

comment created time in a month

Pull request review comment grafana/jsonnet-libs

support for generating slack receivers with buttons for alerts

     ]
   else [],
 
+  slackAlertTitle:: '[{{ .Status | toUpper }}{{ if eq .Status "firing" }}:{{ .Alerts.Firing | len }}{{ end }}] {{ .GroupLabels.cluster }}: {{ .GroupLabels.alertname }} ({{ .GroupLabels.namespace }})',
+  slackAlertText:: |||
+    {{ .CommonAnnotations.summary }}
+    {{ if .Alerts.Firing | len }}Firing alerts:
+    {{ range .Alerts.Firing }}- {{ .Annotations.message }}{{ .Annotations.description }}
+    {{ end }}{{ end }}{{ if .Alerts.Resolved | len }}Resolved alerts:
+    {{ range .Alerts.Resolved }}- {{ .Annotations.message }}{{ .Annotations.description }}
+    {{ end }}{{ end }}
+  |||,

Can we move this into the template too?

sandeepsukhani

comment created time in a month

Pull request review comment grafana/cortex-jsonnet

Make the cortex-mixin work on non-k8s deployments.

+(import 'grafana-builder/grafana.libsonnet') {
+
+  // Override the dashboard constructor to add:
+  // - default tags,
+  // - some links that propagate the selectred cluster.
+  dashboard(title)::

@pstibrany good catch - I've made the links get added automatically if addClusterSelectorTemplate is called.

tomwilkie

comment created time in a month

push event grafana/cortex-jsonnet

Tom Wilkie

commit sha bb0b966bca569e31dba81734e9d8563583fc73fc

Only add the links if addClusterSelectorTemplates is called. Signed-off-by: Tom Wilkie <tom@grafana.com>

view details

push time in a month

push event cortexproject/cortex

Ganesh Vernekar

commit sha b94b3b1d2cbd92e7b6d25f51b15e29b116feb69e

Additional notes on running WAL in production (#2487) * Additional notes on running WAL in production Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in> * Address feedbacks Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>

view details

push time in a month

PR merged cortexproject/cortex

Additional notes on running WAL in production size/S

The changes are mostly geared towards the non-kubernetes scenario.

/cc @tomwilkie

+11 -0

0 comment

1 changed file

codesome

pr closed time in a month

issue closed grafana/grafana

Explore table view doesn't show data

Running a Prometheus query, the instant query in explore table view says "returned 0 rows" when it did in fact return rows:

[screenshot: 2020-04-17 17:35]

Version: Grafana v7.0.0-254070a2pre (43fc6c3a17)

closed time in a month

tomwilkie

issue comment grafana/grafana

Explore table view doesn't show data

Can confirm, works for me now too.

tomwilkie

comment created time in a month

push event grafana/cortex-jsonnet

Tom Wilkie

commit sha 848899a23367acf8628e14ae806c46959db9ad8b

Update chunks and queries dashboards. Signed-off-by: Tom Wilkie <tom@grafana.com>

view details

push time in a month

pull request comment cortexproject/cortex

Update golang to 1.14.2 and other build deps

@gouthamve Why? What's the problem with updating the build-image in this PR?

I think Goutham prefers to use the master-xyz tag for the build image.

gouthamve

comment created time in a month

push event cortexproject/cortex

Giedrius Statkevičius

commit sha 3c1a32422e7c1370ba0fec5d31e1f6ce781519d1

frontend: also log the POST data of long queries (#2481) * frontend: also log the POST data of long queries The feature of query-frontend to log slower queries is already amazing but we can make it even better by logging the POST body as well. It is not uncommon nowadays to use POST with the Prometheus API. All of the data that is passed via POST is not visible via the URL. Thus, let's also print the POST data if it is available. Small testing shows that the message now looks like the following: ``` level=info ts=2020-04-17T16:34:51.200200445Z caller=frontend.go:174 msg="slow query" org_id=fake url=http://localhost:9009/api/prom/api/v1/query_range time_taken=1.638201ms body="end=1587141285&query=prometheus_http_requests_total&start=1587140385&step=15" ``` Signed-off-by: Giedrius Statkevičius <giedriuswork@gmail.com> * CHANGELOG: add item Signed-off-by: Giedrius Statkevičius <giedriuswork@gmail.com> * Update CHANGELOG.md Added PR number. Signed-off-by: Giedrius Statkevičius <giedriuswork@gmail.com>

view details

push time in a month

PR merged cortexproject/cortex

frontend: also log the POST data of long queries size/S

The feature of query-frontend to log slower queries is already amazing but we can make it even better by logging the POST body as well. It is not uncommon nowadays to use POST with the Prometheus API. All of the data that is passed via POST is not visible via the URL. Thus, let's also print the POST data if it is available.

Small testing shows that the message now looks like the following:

level=info ts=2020-04-17T16:34:51.200200445Z caller=frontend.go:174 msg="slow query" org_id=fake url=http://localhost:9009/api/prom/api/v1/query_range time_taken=1.638201ms body="end=1587141285&query=prometheus_http_requests_total&start=1587140385&step=15"

Signed-off-by: Giedrius Statkevičius giedriuswork@gmail.com

+11 -1

1 comment

2 changed files

GiedriusS

pr closed time in a month

pull request comment grafana/cortex-jsonnet

Make the cortex-mixin work on non-k8s deployments.

Thanks for the feedback Jacob!

I noticed some dashboards and the recording rules still have cluster hardcoded but I assume that will come later.

I think we can get away with still having the cluster label in recording rules, even when the underlying metric doesn't have a cluster label. It's basically just ignored, and I prefer having consistent recording rules across deployments. We might even have multiple clusters of single process Cortex, and re-introduce the cluster label...

tomwilkie

comment created time in a month

push event grafana/cortex-jsonnet

Tom Wilkie

commit sha a3c106c03a9098becb188cc8528e7ed35a10dc10

Apply suggestions from code review Co-Authored-By: Jacob Lisi <jacob.t.lisi@gmail.com>

view details

push time in a month

Pull request review comment grafana/cortex-jsonnet

Make the cortex-mixin work on non-k8s deployments.

+local utils = import 'mixin-utils/utils.libsonnet';
+
+(import 'grafana-builder/grafana.libsonnet') {
+
+  _config:: error 'must provide _config',
+
+  // Override the dashboard constructor to add:
+  // - default tags,
+  // - some links that propagate the selectred cluster.
+  dashboard(title)::
+    super.dashboard(title) + {
+      tags: $._config.tags,
+
+      links: [
+        {
+          asDropdown: true,
+          icon: 'external link',
+          includeVars: true,
+          keepTime: true,
+          tags: $._config.tags,
+          targetBlank: false,
+          title: 'Cortex Dashboards',
+          type: 'dashboards',
+        },
+      ],
+
+      addRowIf(condition, row)::
+        if condition
+        then self.addRow(row)
+        else self,
+
+      addClusterSelectorTemplates()::
+        if $._config.singleBinary
+        then self.addMultiTemplate('job', 'cortex_build_info', 'job')
+        else self
+             .addMultiTemplate('cluster', 'cortex_build_info', 'cluster')
+             .addMultiTemplate('namespace', 'cortex_build_info', 'namespace'),
+    },
+
+  // The ,ixin allow specialism of the job selector depending on if its a single binary
+  // deployment or a namespaced one.
+  jobMatcher(job)::

I'm not sure this is actually much of a problem - redundant metrics exported by jobs (ie ingester metrics from the distributor) don't influence the final results (ie qps or latency). In almost all cases, we can target specific labels (eg method) to get the qps/latency of a specific component. In some cases (kv stores) the metrics aren't broken out like that yet[1], but for eg caches they are. So I think we'll be okay.

Do you have a specific case in mind?

[1]https://github.com/cortexproject/cortex/issues/2484

tomwilkie

comment created time in a month

push event grafana/cortex-jsonnet

Tom Wilkie

commit sha c706d9a21f5ce0954b8dfcc1f7211834b6c5c82a

Get read dashboard working for single process. Signed-off-by: Tom Wilkie <tom@grafana.com>

view details

push time in a month

push event grafana/cortex-jsonnet

Tom Wilkie

commit sha ef468e356c8731fed10682b891c13f57a8a5273c

Get all the panels on the write dashboard working with the single binary. Signed-off-by: Tom Wilkie <tom@grafana.com>

view details

push time in a month

push event grafana/jsonnet-libs

Tom Wilkie

commit sha b7d0399a4c8b9fe3ee381b3dc8752e7c778b3f1a

Sometimes you want to not specify the matcher, but you need the label in the recording rule name. Signed-off-by: Tom Wilkie <tom@grafana.com>

view details

push time in a month

push event grafana/cortex-jsonnet

Tom Wilkie

commit sha 2f71235a1a04d95fd52758108e4b6e511e3429da

Make selectors in PromQL queries on the writes dashboard support single process mode. Signed-off-by: Tom Wilkie <tom@grafana.com>

view details

push time in a month

issue opened cortexproject/cortex

Need a way to differentiate between KV metrics for different rings and dedupers in single binary mode.

Currently it's all in one histogram - cortex_kv_request_duration_seconds_count. We need to add a name label or something.
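One way to do that (purely illustrative, not necessarily what Cortex ended up shipping) is to wrap the registerer with a constant name label per ring / dedupe client, so the shared histogram gains a distinguishing dimension. The extra label names here are assumptions:

package kv

import "github.com/prometheus/client_golang/prometheus"

// newRequestDuration registers the shared KV request-duration histogram under
// a registerer wrapped with a constant "name" label (e.g. "ingester-ring",
// "ha-dedupe"), so single-binary deployments can tell the clients apart.
func newRequestDuration(reg prometheus.Registerer, name string) *prometheus.HistogramVec {
    wrapped := prometheus.WrapRegistererWith(prometheus.Labels{"name": name}, reg)
    h := prometheus.NewHistogramVec(prometheus.HistogramOpts{
        Name:    "cortex_kv_request_duration_seconds",
        Help:    "Time spent on KV store requests.",
        Buckets: prometheus.DefBuckets,
    }, []string{"operation", "status_code"})
    wrapped.MustRegister(h)
    return h
}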

created time in a month

PR opened grafana/cortex-jsonnet

Make the cortex-mixin work on non-k8s deployments.

But first, a bunch of refactoring to tidy the dashboards up:

  • Put all the mixin config in one place.
  • Make dashboardWithTagsAndLinks just an override on the dashboard constructor.
  • Factor out the helper functions.
  • Move the dashboards to separate files and namespace them off so they only have access to the config.
  • Add an AddRowIf helper to massively rationalise the inclusion / exclusion of rows.
  • Alter how we conditionally include dashboards themselves.
  • Use set-style selector for bigtable/cassandra/dynamodb.
+1143 -988

0 comment

14 changed files

pr created time in a month

create branch grafana/cortex-jsonnet

branch : optional-k8s

created branch time in a month

push event tomwilkie/sudugo

Tom Wilkie

commit sha 1c17f7c36f654701d00ae91ee66a7fd2fcb095bc

Spelling, and another example. Signed-off-by: Tom Wilkie <tom@grafana.com>

view details

push time in a month

create branch tomwilkie/sudugo

branch : master

created branch time in a month

created repository tomwilkie/sudugo

A sudoku solver in Go

created time in a month
