profile
Lucas Servén Marín (squat) · Red Hat · Berlin · https://squat.ai · working on Kubernetes, Prometheus, and Thanos

coreos/terraform-aws-kubernetes 115

Install a Kubernetes cluster the CoreOS Tectonic Way: HA, self-hosted, RBAC, etcd Operator, and more

coreos/issue-sync 80

A tool for synchronizing issue tracking between GitHub and JIRA

coreos/terraform-azurerm-kubernetes 19

Install a Kubernetes cluster the CoreOS Tectonic Way: HA, self-hosted, RBAC, etcd Operator, and more

slrtbtfs/promql-lsp 19

PromQL language server

squat/drae 18

A RESTful API for el Diccionario de la Real Academia Española

squat/darkapi 2

An API for Darknet image detection neural networks like YOLO

observatorium/up 1

simple Prometheus remote-write client

squat/acatiris 1

An ASCII Art Middleware for Express

squat/berlinstrength 1

an RFID scanner application for Berlin Strength

push event thanos-io/thanos

Ben Ye

commit sha cf4e4500588ac4f5874e8969a39d37e315d4176a

fix ruler ui alerts click (#2040) Signed-off-by: yeya24 <yb532204897@gmail.com>

view details

push time in a day

PR merged thanos-io/thanos

Fix ruler ui alerts click

Signed-off-by: yeya24 yb532204897@gmail.com


  • [ ] I added CHANGELOG entry for this change.
  • [ ] Change is not relevant to the end user.

Changes

Fixes https://github.com/thanos-io/thanos/issues/2024

In https://github.com/thanos-io/thanos/pull/1854, I mistakenly changed the click behavior of the alerts section in Ruler UI.

Now it is changed back to queryURL and it works. I have already tested it in my local env.

Verification


+71 -71

1 comment

2 changed files

yeya24

pr closed time in a day

issue closed thanos-io/thanos

Thanos-rule : Bad query url in Alerts view


Thanos, Prometheus and Golang version used: Thanos: 0.10.0, Go: go1.13.1, Prometheus: 2.15.2. Docker image: quay.io/thanos/thanos:v0.10.0 on all components.

Object Storage Provider: Min.io

What happened: On the Ruler web UI, in the Alerts section, all URLs point to thanos-rule itself, e.g. https://thanos-rule.ihm/graph?g0.expr=... instead of https://thanos-query.ihm/graph?g0.expr=...

In the Rules section all URLs are OK.

What you expected to happen: the URLs should point to Thanos Query so the query can be viewed there.

How to reproduce it (as minimally and precisely as possible): Go to the Thanos Rule UI, Alerts section, and click an expression or alert name.

Full logs to relevant components:


Anything else we need to know:


closed time in a day

zenman94

pull request comment thanos-io/thanos

Fix ruler ui alerts click

Nice :)

yeya24

comment created time in a day

pull request comment openshift/telemeter

Add missing jsonnet binary to builds

/lgtm /retest

kakkoyun

comment created time in 2 days

Pull request review comment openshift/telemeter

Add missing jsonnet binary to builds

 $(PROMETHEUS_BIN): $(BIN_DIR)
 $(EMBEDMD_BIN): $(BIN_DIR)
 	GO111MODULE=on go build -mod=vendor -o $@ github.com/campoy/embedmd
+$(JSONNET): $(BIN_DIR)

This should be $(JSONNET_BIN) right?

kakkoyun

comment created time in 2 days

Pull request review comment thanos-io/thanos

Cut 0.10.1

-0.10.0

:p

bwplotka

comment created time in 3 days

push event squat/squat.github.io

Lucas Servén Marín

commit sha 490616e1a6fa13a80a69478f8759dad1e4b8c6a7

cv: widen at 1300px

view details

push time in 3 days

Pull request review comment openshift/telemeter

Add missing jsonnet binary to builds

 UP_BIN=$(BIN_DIR)/up
 MEMCACHED_BIN=$(BIN_DIR)/memched
 PROMETHEUS_BIN=$(BIN_DIR)/prometheus
 GOJSONTOYAML_BIN=$(BIN_DIR)/gojsontoyaml
+JSONNET?=$(BIN_DIR)/jsonnet
 # We need jsonnet on CI; here we default to the user's installed jsonnet binary; if nothing is installed, then install go-jsonnet.
-JSONNET_BIN=$(if $(shell which jsonnet 2>/dev/null),$(shell which jsonnet 2>/dev/null),$(BIN_DIR)/jsonnet)
+JSONNET_BIN=$(if $(shell which jsonnet 2>/dev/null),$(shell which jsonnet 2>/dev/null),$(JSONNET))

to have this be a bit more consistent, let's keep the other one as $(JSONNET_BIN) make this special variable something explicit like $(JSONNET_LOCAL_OR_INSTALLED)

kakkoyun

comment created time in 3 days

pull request comment thanos-io/thanos

cmd/thanos/compact: add bucket UI

ack :+1: :+1: :+1: :+1: :+1:

squat

comment created time in 3 days

pull request comment openshift/telemeter

Makefile: execute built `jb` from correct path

/lgtm

cben

comment created time in 3 days

pull request comment openshift/cluster-monitoring-operator

pkg/manifests: don't overwrite telemetry remote write

Yes, that test covers what I was asking about.

s-urbaniak

comment created time in 3 days

pull request comment openshift/cluster-monitoring-operator

pkg/manifests: don't overwrite telemetry remote write

The question is:

I am running OpenShift and dependencies on-prem. How do I point telemeter remote write ONLY at my local stack? Appending is not valid, because then Prometheus would fail to write to the canonical URL.

Can I still override the canonical remote-write url using the TelemeterServerURL config option? Or do I have to resolve the canonical DNS name using my internal DNS?

s-urbaniak

comment created time in 3 days

pull request comment openshift/cluster-monitoring-operator

pkg/manifests: don't overwrite telemetry remote write

@s-urbaniak yes, but with this change the URL is no longer replaceable. That is the question I am raising: how do users running their telemetry on-prem point remote write at their on-prem stack? Do they have to make their internal DNS resolve the canonical telemetry name and point it internally?

s-urbaniak

comment created time in 3 days

pull request comment openshift/cluster-monitoring-operator

pkg/manifests: don't overwrite telemetry remote write

Then how do customers running OpenShift on prem or in some offline environment override telemeter? Do they have to override DNS?

s-urbaniak

comment created time in 3 days

pull request comment openshift/telemeter

Fix remote write metrics

/retest

squat

comment created time in 3 days

pull request comment openshift/telemeter

Fix remote write metrics

/retest

squat

comment created time in 3 days

pull request comment openshift/telemeter

dockerfiles: simplify app-sre build tooling

/retest

squat

comment created time in 3 days

pull request comment openshift/telemeter

Capture total CAM app workload migrations by migration state.

/lgtm /retest

djwhatle

comment created time in 4 days

Pull request review comment openshift/telemeter

Fix remote write metrics

 const (
 var (
 	forwardSamples = prometheus.NewCounter(prometheus.CounterOpts{
-		Name: "telemeter_forward_samples_total",
-		Help: "Total amount of samples successfully forwarded",
-	})
-	forwardErrors = prometheus.NewCounter(prometheus.CounterOpts{
-		Name: "telemeter_forward_request_errors_total",

Ack, confirmed we don't have any alerts or recording rules referencing these metrics today. Eventually we'll want to add some alerts based on the error rates, though.

squat

comment created time in 4 days

Pull request review comment openshift/telemeter

Fix remote write metrics

 const (
 var (
 	forwardSamples = prometheus.NewCounter(prometheus.CounterOpts{
-		Name: "telemeter_forward_samples_total",
-		Help: "Total amount of samples successfully forwarded",
-	})
-	forwardErrors = prometheus.NewCounter(prometheus.CounterOpts{
-		Name: "telemeter_forward_request_errors_total",

I'll investigate. In any case, there are no other references to those metrics in this repo, so we'll need to make adjustments in either obs/cfg or the deployment repos.

squat

comment created time in 4 days

PR opened openshift/telemeter

dockerfiles: simplify app-sre build tooling

This commit eliminates the dockerfiles directory at the root of the repository and updates the build_deploy.sh script used by App-SRE tooling to directly use the main Dockerfile.

Signed-off-by: Lucas Servén Marín lserven@gmail.com

cc @jfchevrette @jmelis @brancz

+1 -28

0 comments

4 changed files

pr created time in 4 days

create branch squat/telemeter

branch : simplify-app-sre-build

created branch time in 4 days

PR opened openshift/telemeter

Fix remote write metrics

Today, we only track the number of errors encountered when forwarding requests from the v1 upload endpoint to Thanos. We don't track the total number of requests made from the v1 upload endpoint handler, and we don't track any requests made when forwarding requests from the v2 endpoint to Thanos.

This PR adds these metrics to give us better visibility during future incidents.
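As a rough illustration of the kind of instrumentation described here, below is a minimal client_golang sketch that counts forwarded requests by result. The package, metric name, and forward helper are illustrative assumptions, not the actual telemeter code or metric names.

package forwardsketch

import (
	"net/http"

	"github.com/prometheus/client_golang/prometheus"
)

// Hypothetical counter for forwarded remote-write requests, labeled by result.
var forwardRequests = prometheus.NewCounterVec(prometheus.CounterOpts{
	Name: "example_forward_requests_total",
	Help: "Total number of requests forwarded to the downstream endpoint.",
}, []string{"result"})

func init() {
	prometheus.MustRegister(forwardRequests)
}

// forward wraps a downstream call and records success/error totals.
func forward(do func() (*http.Response, error)) (*http.Response, error) {
	resp, err := do()
	if err != nil {
		forwardRequests.WithLabelValues("error").Inc()
		return nil, err
	}
	forwardRequests.WithLabelValues("success").Inc()
	return resp, nil
}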

cc @metalmatze @brancz @bwplotka @kakkoyun

+35 -15

0 comments

3 changed files

pr created time in 4 days

create branch squat/telemeter

branch : fix-remote-write-metrics

created branch time in 4 days

pull request comment openshift/telemeter

pkg/receive: return correct status code

/retest

squat

comment created time in 4 days

pull request comment openshift/telemeter

pkg/receive: return correct status code

/benchmark

squat

comment created time in 4 days

pull request comment openshift/telemeter

pkg/receive: return correct status code

If there's a gateway timeout, then err is not nil, so we hit this line of code: https://github.com/openshift/telemeter/blob/95621237a985b078efbb26d5c4f12500a40e4cd3/pkg/receive/handler.go#L63

squat

comment created time in 5 days

PR opened openshift/telemeter

pkg/receive: return correct status code

This ensures that the handler returns the correct status code, i.e. it does not overwrite the status code returned by the downstream handler.
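A minimal sketch of the idea, assuming a standard net/http middleware: the wrapper records whatever status the downstream handler set (e.g. 504) and never writes its own. The types and names here are illustrative, not the actual telemeter handler.

package receivesketch

import "net/http"

// statusRecorder remembers the status code the downstream handler wrote.
type statusRecorder struct {
	http.ResponseWriter
	status int
}

func (r *statusRecorder) WriteHeader(code int) {
	r.status = code
	r.ResponseWriter.WriteHeader(code)
}

// instrument observes the downstream status code without overwriting it.
func instrument(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, req *http.Request) {
		rec := &statusRecorder{ResponseWriter: w, status: http.StatusOK}
		next.ServeHTTP(rec, req)
		// rec.status now holds the downstream status (e.g. 504 Gateway
		// Timeout); it is only read here, never replaced.
		_ = rec.status
	})
}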

Signed-off-by: Lucas Servén Marín lserven@gmail.com

cc @metalmatze @brancz @kakkoyun @bwplotka

+2 -1

0 comments

1 changed file

pr created time in 5 days

create branch squat/telemeter

branch : correctstatuscode

created branch time in 5 days

pull request comment thanos-io/thanos

mixin/thanos: fix typo in alert name

Seems we got a flake. Re-running

simonpasquier

comment created time in 5 days

Pull request review comment observatorium/thanos-receive-controller

main.go: enable cleanup of old PVCs

[flattened review diff from main.go omitted: it spans the podWorker/stsWorker refactor, resolvePodOwnerRef, the generateHelper Pod builder, and cleanUp, ending at the klient.CoreV1().Pods(...).Create call that this comment refers to]

done :)

squat

comment created time in 6 days

push event squat/thanos-receive-controller

Lucas Servén Marín

commit sha f86fe053bbc6fbae4182a21f99b944faf8843af5

jsonnet: add role for pods and jobs

view details

Lucas Servén Marín

commit sha 3da631aab2d72535c8950e05250e8acc5e047f7e

main.go: enable cleanup of old PVCs This commit enables the controller to clean up the PVCs of Thanos Receive Pods that are watched by it. Whenever a receiver is deleted, the controller will spawn a helper container that mounts all PVCs specified by the StatefulSet for that container and `rm -rf`s the contents of them. Tested on a kind cluster. A follow up PR will add E2E tests and verify this functionality.

view details

push time in 6 days

push event squat/thanos-receive-controller

Lucas Servén Marín

commit sha c1a799a995aab829bca7265d063952cfde511f25

main.go: enable cleanup of old PVCs This commit enables the controller to clean up the PVCs of Thanos Receive Pods that are watched by it. Whenever a receiver is deleted, the controller will spawn a helper container that mounts all PVCs specified by the StatefulSet for that container and `rm -rf`s the contents of them. Tested on a kind cluster. A follow up PR will add E2E tests and verify this functionality.

view details

push time in 6 days

Pull request review comment observatorium/thanos-receive-controller

main.go: enable cleanup of old PVCs

[flattened review diff from main.go omitted: the comment refers to the initContainerTemplate at the start of generateHelper]

Init containers by design all execute in series; the rm -rf container could even be a second init container, so I don't think this is much of an abuse. Another option is to have multiple Jobs that are dispatched one after another, i.e. one backup job, then a cleanup job. However, this seems like a lot of book-keeping for not much benefit.

squat

comment created time in 6 days

pull request comment observatorium/thanos-receive-controller

main.go: enable cleanup of old PVCs

This PR is currently blocked on https://github.com/observatorium/thanos-replicate/blob/eb8dd2444ac8a24b1cf2d458055d1e1c9f727591/scheme.go#L197.

Because there are no labels on the block before the receive shipper uploads them, thanos-replicate skips the block when running in the cleanup job.

Will make a PR to move this forward

squat

comment created time in 6 days

Pull request review comment observatorium/thanos-receive-controller

main.go: enable cleanup of old PVCs

[flattened review diff from main.go omitted: the comment refers to the "Inject extra environment variables into the cleanup Pod if provided" block in generateHelper]

I considered this but I am a bit hesitant. I would really want to keep the scope of customizations minimal, especially because the controller has to override several fields:

  • volumes (to specify the pvcs)
  • container volume mounts ( to mount the pvcs)
  • container env (to set the 2 objstore configs, one of which points to the local pvc)
  • container cmd + args (to pass the env var to the cmd)

In order to override those fields we need to select the container index in a stable way (see the sketch below).
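As referenced above, a minimal sketch of selecting the container by name rather than by position, assuming the standard k8s.io/api types; the helper name is illustrative and not part of the controller.

package controllersketch

import corev1 "k8s.io/api/core/v1"

// containerByName returns a pointer to the named container, so that
// overrides don't depend on the container's position in the slice.
func containerByName(spec *corev1.PodSpec, name string) *corev1.Container {
	for i := range spec.Containers {
		if spec.Containers[i].Name == name {
			return &spec.Containers[i]
		}
	}
	return nil
}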
squat

comment created time in 6 days

push event squat/thanos-receive-controller

Lucas Servén Marín

commit sha ba8dadb7402a955ed8b04c3127005b43cfa4fdc1

main.go: enable cleanup of old PVCs This commit enables the controller to clean up the PVCs of Thanos Receive Pods that are watched by it. Whenever a receiver is deleted, the controller will spawn a helper container that mounts all PVCs specified by the StatefulSet for that container and `rm -rf`s the contents of them. Tested on a kind cluster. A follow up PR will add E2E tests and verify this functionality.

view details

push time in 6 days

Pull request review comment observatorium/thanos-receive-controller

main.go: enable cleanup of old PVCs

[flattened review diff from main.go omitted: the comment refers to the "Inject extra environment variables into the cleanup Pod if provided" block in generateHelper]

So we can inject e.g. AWS_SECRET_ACCESS_KEY and AWS_ACCESS_KEY_ID, which we don't put inside of the Thanos objstore secret.

squat

comment created time in 6 days

Pull request review comment observatorium/thanos-receive-controller

main.go: enable cleanup of old PVCs

[flattened review diff from main.go omitted: the comment refers to the klient.CoreV1().Pods(...).Create call at the end of cleanUp]

We have to watch Pods anyway; this keeps the number of informer stores down. We can definitely use a Job instead, though.

squat

comment created time in 6 days

push event squat/thanos-receive-controller

Lucas Servén Marín

commit sha abea992db91189867d13ca6d8e2f1c463a25fda8

main.go: enable cleanup of old PVCs This commit enables the controller to clean up the PVCs of Thanos Receive Pods that are watched by it. Whenever a receiver is deleted, the controller will spawn a helper container that mounts all PVCs specified by the StatefulSet for that container and `rm -rf`s the contents of them. Tested on a kind cluster. A follow up PR will add E2E tests and verify this functionality.

view details

push time in 6 days

push event squat/thanos-receive-controller

Lucas Servén Marín

commit sha 58ba741e06892d746a619ba20a01405e63dacfb5

main.go: enable cleanup of old PVCs This commit enables the controller to clean up the PVCs of Thanos Receive Pods that are watched by it. Whenever a receiver is deleted, the controller will spawn a helper container that mounts all PVCs specified by the StatefulSet for that container and `rm -rf`s the contents of them. Tested on a kind cluster. A follow up PR will add E2E tests and verify this functionality.

view details

push time in 6 days

push event squat/thanos-receive-controller

Lucas Servén Marín

commit sha 3af3eda34dc8d65880fd39416c69294f28607fc7

main.go: enable cleanup of old PVCs This commit enables the controller to clean up the PVCs of Thanos Receive Pods that are watched by it. Whenever a receiver is deleted, the controller will spawn a helper container that mounts all PVCs specified by the StatefulSet for that container and `rm -rf`s the contents of them. Tested on a kind cluster. A follow up PR will add E2E tests and verify this functionality.

view details

push time in 6 days

push event squat/thanos-receive-controller

Lucas Servén Marín

commit sha cd42c5a1391155db11ec601655a3a7015dbc7e93

go.* revendor

view details

Lucas Servén Marín

commit sha f4b54005c715c5800737442fce0ce7e0a80d5363

jsonnet: add role for pods

view details

Lucas Servén Marín

commit sha 9e540da6626ab3c7d14ae0934e0218194c5e0ac4

main.go: enable cleanup of old PVCs This commit enables the controller to clean up the PVCs of Thanos Receive Pods that are watched by it. Whenever a receiver is deleted, the controller will spawn a helper container that mounts all PVCs specified by the StatefulSet for that container and `rm -rf`s the contents of them. Tested on a kind cluster. A follow up PR will add E2E tests and verify this functionality.

view details

push time in 9 days

PR opened observatorium/thanos-receive-controller

main.go: enable cleanup of old PVCs

This commit enables the controller to clean up the PVCs of Thanos Receive Pods that are watched by it. Whenever a receiver is deleted, the controller will spawn a helper container that mounts all PVCs specified by the StatefulSet for that container and `rm -rf`s their contents.

Tested on a kind cluster. A follow up PR will add E2E tests and verify this functionality.
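For illustration, here is a condensed sketch of the helper-Pod idea this PR describes, assuming the standard k8s.io/api types; the image, naming, and single-PVC shape are simplifications and not the controller's actual generateHelper logic.

package controllersketch

import (
	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// cleanupPod builds a one-shot Pod that mounts an existing PVC and removes
// its contents with rm -rf.
func cleanupPod(podName, pvcName string) *corev1.Pod {
	const mountPath = "/pvc"
	return &corev1.Pod{
		ObjectMeta: metav1.ObjectMeta{Name: "cleanup-" + podName},
		Spec: corev1.PodSpec{
			RestartPolicy: corev1.RestartPolicyNever,
			Volumes: []corev1.Volume{{
				Name: pvcName,
				VolumeSource: corev1.VolumeSource{
					PersistentVolumeClaim: &corev1.PersistentVolumeClaimVolumeSource{ClaimName: pvcName},
				},
			}},
			Containers: []corev1.Container{{
				Name:         "cleanup",
				Image:        "busybox", // illustrative; the controller uses a configurable cleanup image
				Command:      []string{"sh", "-c", "rm -rf " + mountPath + "/*"},
				VolumeMounts: []corev1.VolumeMount{{Name: pvcName, MountPath: mountPath}},
			}},
		},
	}
}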

+355 -39

0 comments

5 changed files

pr created time in 9 days

push event squat/thanos-receive-controller

Lucas Servén Marín

commit sha 18212ad866b580ebdfe68b506c8d31401ec1d632

jsonnet: add role for pods

view details

Lucas Servén Marín

commit sha 46ff0c8a939ad76bbf604bcfa73b4d9669432f97

main.go: enable cleanup of old PVCs This commit enables the controller to clean up the PVCs of Thanos Receive Pods that are watched by it. Whenever a receiver is deleted, the controller will spawn a helper container that mounts all PVCs specified by the StatefulSet for that container and `rm -rf`s the contents of them. Tested on a kind cluster. A follow up PR will add E2E tests and verify this functionality.

view details

push time in 9 days

push event squat/thanos-receive-controller

Lucas Servén Marín

commit sha 7d9a2979f6cd88b376104c548e2928f9ca6f32e4

main.go: enable cleanup of old PVCs This commit enables the controller to clean up the PVCs of Thanos Receive Pods that are watched by it. Whenever a receiver is deleted, the controller will spawn a helper container that mounts all PVCs specified by the StatefulSet for that container and `rm -rf`s the contents of them. Tested on a kind cluster. A follow up PR will add E2E tests and verify this functionality.

view details

push time in 9 days

push event squat/thanos-receive-controller

Lucas Servén Marín

commit sha 198744aca6895a1fb3b4969ae9ad16d9403356ea

main.go: enable cleanup of old PVCs This commit enables the controller to clean up the PVCs of Thanos Receive Pods that are watched by it. Whenever a receiver is deleted, the controller will spawn a helper container that mounts all PVCs specified by the StatefulSet for that container and `rm -rf`s the contents of them. Tested on a kind cluster. A follow up PR will add E2E tests and verify this functionality.

view details

push time in 9 days

create branch squat/thanos-receive-controller

branch : cleanup-pvcs

created branch time in 9 days

issue comment thanos-io/thanos

receive: can not load WAL data when restart

This entire code path is simply leveraging Prometheus TSDB packages. Do we see similar spikes in memory when Prometheus restarts and replays a large WAL?

GuyCheung

comment created time in 9 days

pull request comment openshift/telemeter

Consume authorize_url from Secret

/approve

maorfr

comment created time in 10 days

pull request comment openshift/telemeter

Consume authorize_url from Secret

/retest

maorfr

comment created time in 11 days

Pull request review comment observatorium/configuration

Fix namespace issues with service DNS

-local kt = (import 'kube-thanos.libsonnet');
-
 (import 'observatorium/observatorium-api.libsonnet') {
   observatorium+:: {
-    namespace:: 'observatorium',
+    local namespace = 'observatorium',

Yes please. I’d prefer we not establish any more unneeded jsonnet patterns. I think in some of those other uses we actually used the local variable more than once.

kakkoyun

comment created time in 11 days

Pull request review comment observatorium/configuration

Fix namespace issues with service DNS

-local kt = (import 'kube-thanos.libsonnet');
-
 (import 'observatorium/observatorium-api.libsonnet') {
   observatorium+:: {
-    namespace:: 'observatorium',
+    local namespace = 'observatorium',

Why is this local necessary, vs namespace:: 'observatorium'?

kakkoyun

comment created time in 11 days

push event squat/squat.github.io

Lucas Servén Marín

commit sha 21299c5197bc251f589c880fca16a5cd2e0aae57

fix scaling

view details

push time in 11 days

PR closed openshift/telemeter

DO NOT MERGE make test fail to read logs (labels: approved, size/XS)

make the integration test fail so that we can read the logs from prow

+4 -2

3 comments

1 changed file

squat

pr closed time in 11 days

pull request comment openshift/telemeter

test: use UP for querying in integration test

/retest

squat

comment created time in 11 days

pull request comment openshift/telemeter

DO NOT MERGE make test fail to read logs

/test integration

squat

comment created time in 11 days

pull request comment openshift/telemeter

test: use UP for querying in integration test

/retest

squat

comment created time in 11 days

push event squat/telemeter

Lucas Servén Marín

commit sha 4f44aac558222dc9362f65c6cd9bb7576ae06863

make test fail to read logs

view details

push time in 11 days

push event squat/telemeter

Lucas Servén Marín

commit sha 6e06a6358fffb1b96af2313eecaca540a0230bda

make,test: fix running memcached

view details

push time in 11 days

PR opened openshift/telemeter

DO NOT MERGE make test fail to read logs

make the integration test fail so that we can read the logs from prow

+2 -2

0 comments

1 changed file

pr created time in 11 days

create branch squat/telemeter

branch : testfail

created branch time in 11 days

push event squat/telemeter

Lucas Servén Marín

commit sha 3134e125a41bd94487ef2ac24b46cbcbf62306e6

test: use UP for querying in integration test This commit updates the telemter v2 integration test to use UP for the querying functionality, rather than using a bash loop. Signed-off-by: Lucas Servén Marín <lserven@gmail.com>

view details

Lucas Servén Marín

commit sha 036c2fe7aecb302f957fd2cff3f293ee074cb836

vendor: re-vendor Signed-off-by: Lucas Servén Marín <lserven@gmail.com>

view details

push time in 11 days

pull request comment openshift/telemeter

test: use UP for querying in integration test

/retest

squat

comment created time in 11 days

push event observatorium/configuration

Kemal Akkoyun

commit sha a479fdb756dba23887c7259e99252e28e9a6eff0

Add observatorium api resources

view details

Kemal Akkoyun

commit sha a4bdeaf5e4c4d3a34023589ba2dbbf7744b00457

Add observatorium api openshift resources

view details

Kemal Akkoyun

commit sha bc6e17dfcf84a293814ad316f755368d875419af

Add observatorium api service monitor

view details

Kemal Akkoyun

commit sha f9354721b8744620e8b0607609c1b063e3d5c636

Separate observatorium-api template

view details

Kemal Akkoyun

commit sha 3997e4d8e829b40eedea6b2eefbddb6384c4a4d1

Separate concerns more clearly

view details

Lucas Servén Marín

commit sha ac03e86f2c294007357c517609355522ed09c2d0

Merge pull request #153 from kakkoyun/obs_api_stage Add Observatorium API manifests

view details

push time in 11 days

push event squat/squat.github.io

Lucas Servén Marín

commit sha 31863fe0eb54af11a3b4a2cfa21b8453653db124

fix typo

view details

push time in 11 days

issue comment thanos-io/thanos

Thanos Receiver Design

Can you please explain why? Working with receive is a lot simpler in a multi-cluster case - you just need one store per storage on each cluster. When using the side-car, you need to be able to query both the store and Prometheus from each cluster...

It’s a bit subjective but in the end it comes down to the relative difficulties of:

  • ensuring open and secure ports to read metrics from sidecars of multiple clusters and configuring the querier with every endpoint

vs

  • running and maintaining an ingestion hashring of receivers and scaling as needed

If all of the clusters are owned by you, then the former can be quite easy and so the latter may not make sense. On the other hand, if you don’t control the clusters, then the former is likely not even an option.

tdinucci

comment created time in 11 days

Pull request review comment thanos-io/thanos

receive: add receive writer metrics.

 func runReceive(
 					}
 					level.Info(logger).Log("msg", "tsdb started")
 					localStorage.Set(db.Get(), startTimeMargin)
-					webHandler.SetWriter(receive.NewWriter(log.With(logger, "component", "receive-writer"), localStorage))
+					webHandler.SetWriter(receive.NewWriter(log.With(logger, "component", "receive-writer"), reg, localStorage))

Have you tested this on a receive that flushes its database? I don't think this will work. Receive creates a new writer, and thus would register metrics, every time it flushes its DB. But if you try to register the same metrics multiple times, the registerer will panic. I think you will need to use the receive.UnRegisterer here.
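For illustration, a minimal sketch of the failure mode and one common client_golang workaround: Register returns prometheus.AlreadyRegisteredError when a collector with the same fully-qualified name already exists, which can be used to reuse the existing collector instead of panicking. This is the generic pattern only, not the receive.UnRegisterer mentioned above.

package metricsketch

import "github.com/prometheus/client_golang/prometheus"

// registerOrReuse registers c and, if an identical collector is already
// registered, returns the existing one instead of panicking the way
// MustRegister would on the second registration.
func registerOrReuse(reg prometheus.Registerer, c prometheus.Counter) prometheus.Counter {
	if err := reg.Register(c); err != nil {
		if are, ok := err.(prometheus.AlreadyRegisteredError); ok {
			return are.ExistingCollector.(prometheus.Counter)
		}
		panic(err)
	}
	return c
}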

johncming

comment created time in 11 days

Pull request review comment thanos-io/thanos

receive: add receive writer metrics.

 type Appendable interface {
 type Writer struct {
 	logger log.Logger
 	append Appendable
+
+	numOutOfOrder  prometheus.Gauge
+	numDuplicates  prometheus.Gauge

These look like they should probably all be counters instead of gauges.
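A small sketch of the distinction being pointed out, with illustrative metric names: counters only ever increase (Inc/Add), which suits running totals such as out-of-order samples seen, while gauges can go up and down.

package metricsketch

import "github.com/prometheus/client_golang/prometheus"

var (
	// Counter: monotonically increasing total.
	outOfOrderTotal = prometheus.NewCounter(prometheus.CounterOpts{
		Name: "example_out_of_order_samples_total",
		Help: "Total number of out-of-order samples rejected (illustrative).",
	})
	// Gauge: a level that can rise and fall.
	inflightRequests = prometheus.NewGauge(prometheus.GaugeOpts{
		Name: "example_inflight_requests",
		Help: "Current number of in-flight requests (illustrative).",
	})
)

func observe() {
	outOfOrderTotal.Inc()  // counters: Inc/Add only
	inflightRequests.Inc() // gauges: Inc/Dec/Set
	inflightRequests.Dec()
}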

johncming

comment created time in 11 days

push event thanos-io/thanos

Kemal Akkoyun

commit sha 46a97fdc9765bbe7eb9d75d801abf186015e1b01

mixin: Add Thanos Ruler alerts (#1963) * Add Thanos Ruler alerts Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com> * Fix wrong job selector Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com> * Add pod label as aggregator for evaluation latency query Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com> * Fix review issues Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com>

view details

push time in 11 days

PR merged thanos-io/thanos

mixin: Add Thanos Ruler alerts

This PR adds missing Ruler alerts to thanos-mixin.

Alerts are inspired by https://thanos.io/components/rule.md/#must-have-essential-ruler-alerts

Signed-off-by: Kemal Akkoyun kakkoyun@gmail.com

  • [ ] I added CHANGELOG entry for this change.
  • [ ] Change is not relevant to the end user.

Changes

  • Adds Ruler alerts
  • Adds alert queue metric panels to Thanos Rule dashboards

Verification

For rules:

  • make example-rules-lint
  • Manual tests using Query UI against running cluster.

For dashboards:

  • Imported generated dashboard JSON to a Grafana with live data.
+467 -38

3 comments

6 changed files

kakkoyun

pr closed time in 11 days

push event squat/squat.github.io

Lucas Servén Marín

commit sha 214a65380e84d9ddfaa4bb27e747db2364104ece

use png

view details

push time in 12 days

Pull request review comment observatorium/configuration

Add Observatorium API manifests

[flattened review diff omitted: the comment refers to the "parameters:" list of the OpenShift Template object in the observatorium-api jsonnet]

This part about adding the parameters is specific to templating; should it go in observatorium.jsonnet, as that file is for generating the template?

kakkoyun

comment created time in 12 days

push event squat/squat.github.io

Lucas Servén Marín

commit sha 38b812f24ebacfa675173452fde7c2f6b35e93ed

style index

view details

push time in 12 days

push event squat/squat.github.io

Lucas Servén Marín

commit sha c1c12dce093b5358caca079d97c48b390f5ce2d4

add index page

view details

push time in 12 days

push event squat/squat.github.io

Lucas Servén Marín

commit sha 47544d38b512c6ccc1a5cb6cccf9084a9e86f741

simplify contact layout

view details

push time in 12 days

push event squat/squat.github.io

Lucas Servén Marín

commit sha df17be9a9390b12553ea9d797c61392a90716aa1

fix for small widths

view details

push time in 12 days

started arxiv-vanity/engrafo

started time in 12 days

push event squat/squat.github.io

Lucas Servén Marín

commit sha 88f950d1059b671e0b1fb550060dbb05bbe47cb3

Create CNAME

view details

push time in 12 days

create branch squat/squat.github.io

branch : master

created branch time in 12 days

created repository squat/squat.github.io

created time in 12 days

pull request comment observatorium/observatorium

Provide example k8s manifests

looks good to me generally :+1:

kakkoyun

comment created time in 13 days

pull request comment observatorium/observatorium

Provide example k8s manifests

It would be easier to review this PR if it were broken down into multiple smaller PRs, e.g.

  • introduce image tagging
  • add jsonnet tooling
  • add k8s manifests
kakkoyun

comment created time in 13 days

push event squat/telemeter

Lucas Servén Marín

commit sha fab7a88322d3d522c810e68b005b445bfc16e51b

vendor: re-vendor Signed-off-by: Lucas Servén Marín <lserven@gmail.com>

view details

push time in 13 days

push event squat/telemeter

Lucas Servén Marín

commit sha b7dbf5186635018263e907e301f4d062c0331c31

test: use UP for querying in integration test This commit updates the telemter v2 integration test to use UP for the querying functionality, rather than using a bash loop. Signed-off-by: Lucas Servén Marín <lserven@gmail.com>

view details

push time in 13 days

issue comment thanos-io/thanos

Thanos Receiver Design

Ack, at the risk of hindsight bias, would it be clearer to say simply:

Instead of directly scraping metrics, however, the Thanos receiver accepts Prometheus remote-write requests and writes these into a local TSDB.

tdinucci

comment created time in 13 days

pull request comment openshift/telemeter

test: use UP for querying in integration test

/unhold

squat

comment created time in 13 days

pull request comment openshift/telemeter

test: use UP for querying in integration test

/hold-cancel

squat

comment created time in 13 days

push event squat/telemeter

Rafael Porres Molina

commit sha a78bbe37edd7d113c3fe546ca3e908aa67bfe7a7

OWNERS: add rporres

view details

Kemal Akkoyun

commit sha 6bffe61c61626b5001a8dc76ff2a88b93a75e472

Use specific version of UP

view details

OpenShift Merge Robot

commit sha d37e44d98dbb2c9e91b7e79f13a2c992cad7f8b6

Merge pull request #294 from kakkoyun/fix_up test: Use specific version of UP to fix broken integration tests

view details

Lucas Servén Marín

commit sha 91d116b469b649e46e0503c5ee9673fee4aeabb0

vendor: revendor

view details

Lucas Servén Marín

commit sha a5ff789104c7469e795c43f56ab6274cad194d56

Makefile,tools.go: use tools for build binaries This commit switches to using the tools pattern for installing the go build dependencies. This allows us to more easily pin our dependency on UP so that we can fix the integration test.

view details

OpenShift Merge Robot

commit sha c54ca978aed9394790056ab8948020dcca855675

Merge pull request #291 from rporres/add-rporres-to-owners OWNERS: add rporres

view details

OpenShift Merge Robot

commit sha 46439efe387e426ede37ad2a87cf08d578e5ea84

Merge pull request #293 from squat/tools tools.go: fix integration test by pinning deps

view details

Lucas Servén Marín

commit sha f459b6f1a65f90479608cabb0a82bb98e3773113

test: use UP for querying in integration test This commit updates the telemter v2 integration test to use UP for the querying functionality, rather than using a bash loop. Signed-off-by: Lucas Servén Marín <lserven@gmail.com>

view details

push time in 13 days

issue comment thanos-io/thanos

Thanos Receiver Design

@tdinucci there are two complementary strategies to avoid losing data that is sent by Prometheus but has not yet been uploaded to S3.

  1. Putting the TSDB on a persistent volume. Thanos Receive uses the same TSDB package as Prometheus, which features a WAL. This means that even if the process dies suddenly, all samples that have been ACK'd but not yet written to a full block in the TSDB are saved in the WAL. If the server uses some type of persistent storage, then next time the process starts back up, it can re-read the WAL and pick up again from where it left off.
  2. Enable replication on Thanos Receive. With this feature you can enable dynamo-style replication of all data received. If you set the replication factor to 3, then Thanos Receive will only ACK a request when at least 2 nodes have successfully written the data to their local TSDB. Now, even if you lose a node AND its storage, you can guarantee that the data will be stored on at least one more node.
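A tiny sketch of the quorum arithmetic behind point 2 above, assuming the usual majority rule of (n / 2) + 1; with a replication factor of 3 this yields the "at least 2 nodes" mentioned there. This is an illustration, not the exact Thanos Receive implementation.

package receivesketch

// writeQuorum returns the minimum number of successful local writes needed
// before a replicated request is ACK'd, using a simple majority:
// writeQuorum(3) == 2.
func writeQuorum(replicationFactor int) int {
	return replicationFactor/2 + 1
}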
tdinucci

comment created time in 13 days

issue comment thanos-io/thanos

Thanos Receiver Design

Hi @tdinucci, regarding question 1, Thanos Receive does not require a new Prometheus instance per Prometheus on the tenant side. All metrics received by Thanos Receive are written to a local TSDB, and the completed blocks are uploaded. The flow is: app <--scrape-- Prometheus --remote-write--> Thanos Receive Hashring --write--> local TSDB on disk --upload--> S3

Please let us know what part of the doc is confusing so we can clarify it going forward.

tdinucci

comment created time in 13 days

delete branch squat/thanos-receive-controller

delete branch : gRPC

delete time in 13 days

push event observatorium/thanos-receive-controller

Lucas Servén Marín

commit sha c3125b9bf4dfd9f9e7f7648422961a5e4a6010e5

main.go: controller generate gRPC endpoints This commit updates the controller to generate gRPC endpoints rather than HTTP endpoints, following https://github.com/thanos-io/thanos/pull/1970. Signed-off-by: Lucas Servén Marín <lserven@gmail.com>

view details

Lucas Servén Marín

commit sha c4e9017b07f93690506fc675d19f3a52a9ee7b6b

Merge pull request #37 from squat/gRPC main.go: controller generate gRPC endpoints

view details

push time in 13 days

PR merged observatorium/thanos-receive-controller

main.go: controller generate gRPC endpoints

This commit updates the controller to generate gRPC endpoints rather than HTTP endpoints, following https://github.com/thanos-io/thanos/pull/1970.

Signed-off-by: Lucas Servén Marín lserven@gmail.com

cc @kakkoyun @metalmatze

+19 -26

0 comments

2 changed files

squat

pr closed time in 13 days

push event squat/thanos-receive-controller

Lucas Servén Marín

commit sha c3125b9bf4dfd9f9e7f7648422961a5e4a6010e5

main.go: controller generate gRPC endpoints This commit updates the controller to generate gRPC endpoints rather than HTTP endpoints, following https://github.com/thanos-io/thanos/pull/1970. Signed-off-by: Lucas Servén Marín <lserven@gmail.com>

view details

push time in 13 days

PR opened observatorium/thanos-receive-controller

main.go: controller generate gRPC endpoints

This commit updates the controller to generate gRPC endpoints rather than HTTP endpoints, following https://github.com/thanos-io/thanos/pull/1970.

Signed-off-by: Lucas Servén Marín lserven@gmail.com

cc @kakkoyun @metalmatze

+18 -19

0 comments

2 changed files

pr created time in 13 days

create branch squat/thanos-receive-controller

branch : gRPC

created branch time in 13 days
