Shea Stewart stewartshea @ArctiqTeam Canada www.arctiq.ca Hanging out with @ArctiqTeam helping teams build great software through adoption of devops cultural and automation practices.

BCDevOps/openshift-wiki 9

Gitbook URL of WIKI

BCDevOps/platform-services 7

Collection of platform related tools and configurations

BCDevOps/devops-platform-workshops 4

OCP Training Workshop Material and Labs

BCDevOps/openshift-components 4

A trove of apps/components/stuff/things that are usable by anyone running OpenShift - in particular BC Gov teams "doing" Agile/DevOps.

BCDevOps/developer-experience 2

This repository is used to track all work for the BCGov Platform Services Team (This includes work for: 1. Platform Experience, 2. Developer Experience 3. Platform Operations/OCP 3)

bcgov/ocp-sso 2

Single Sign-on

ArctiqTeam/openshift-gocd 1

Tools for running GoCD on OpenShift

BCDevOps/OpenShift4-RollOut 1

This is the primary board for all activities related to the roll out of OpenShift 4

issue closed BCDevOps/developer-experience

Jenkins Optimization

Based on the dashboard results, we can gain some headroom by reducing Jenkins CPU requests:

  • [x] Create dashboard of Jenkins usage
  • [x] Test Jenkins startup time with various configurations
  • [x] Write up report on utilization
  • [x] Write up recommendations in devhub for Jenkins resource configurations
  • [x] Send note to teams around results

This is all about reducing load on OCP3; source ticket here: https://github.com/BCDevOps/OpenShift4-RollOut/issues/201

closed time in a day

stewartshea

issue closed hashicorp/vault-k8s

Single Injector Pod: remote error: tls: bad certificate

Describe the bug

The Vault injector pod is rejecting injection calls due to a certificate validation error:

2020/08/07 13:05:45 http: TLS handshake error from 10.162.0.20:55478: remote error: tls: bad certificate
2020/08/07 13:06:08 http: TLS handshake error from 10.162.0.22:58500: remote error: tls: bad certificate
2020/08/07 13:06:08 http: TLS handshake error from 10.162.0.28:52266: remote error: tls: bad certificate

This feels unrelated to #141, since this deployment is configured with only a single replica. It seemingly just stopped working about a day ago, and debugging with different versions and recreating the objects has not led to any additional insights.

In the latest attempt, one pod actually got the agent; after the pods were recreated, injection failed consistently.

  • Delete pods from namespace
$ kubectl delete pods --all -n metrics
pod "grafana-6dddc59d7c-nsbfn" deleted
pod "influxdb-5bb4c44c4f-mpgnq" deleted
  • Review logs from injector
$ kubectl logs vault-agent-injector-6596487b7d-9b6j6
Listening on ":8080"...
2020-08-07T13:05:12.307Z [INFO]  handler: Starting handler..
Updated certificate bundle received. Updating certs...
2020-08-07T13:05:45.801Z [INFO]  handler: Request received: Method=POST URL=/mutate?timeout=30s
2020-08-07T13:05:45.807Z [DEBUG] handler: checking if should inject agent..
2020-08-07T13:05:45.807Z [DEBUG] handler: checking namespaces..
2020-08-07T13:05:45.807Z [DEBUG] handler: setting default annotations..
2020-08-07T13:05:45.807Z [DEBUG] handler: creating new agent..
2020-08-07T13:05:45.807Z [DEBUG] handler: validating agent configuration..
2020-08-07T13:05:45.807Z [DEBUG] handler: creating patches for the pod..
2020/08/07 13:05:45 http: TLS handshake error from 10.162.0.20:55478: remote error: tls: bad certificate
  • Check status of injected pods (notice 50% success)
[sheastewart@sheawei ~]$ kubectl get pods -n metrics
NAME                        READY   STATUS             RESTARTS   AGE
grafana-6dddc59d7c-hv9dn    2/2     Running            0          45s
influxdb-5bb4c44c4f-59tcj   0/1     CrashLoopBackOff   2          45s
  • Delete pods again
$ kubectl delete pods --all -n metrics
pod "grafana-6dddc59d7c-hv9dn" deleted
pod "influxdb-5bb4c44c4f-59tcj" deleted
  • Review logs from injector
$ kubectl logs vault-agent-injector-6596487b7d-9b6j6
Listening on ":8080"...
2020-08-07T13:05:12.307Z [INFO]  handler: Starting handler..
Updated certificate bundle received. Updating certs...
2020-08-07T13:05:45.801Z [INFO]  handler: Request received: Method=POST URL=/mutate?timeout=30s
2020-08-07T13:05:45.807Z [DEBUG] handler: checking if should inject agent..
2020-08-07T13:05:45.807Z [DEBUG] handler: checking namespaces..
2020-08-07T13:05:45.807Z [DEBUG] handler: setting default annotations..
2020-08-07T13:05:45.807Z [DEBUG] handler: creating new agent..
2020-08-07T13:05:45.807Z [DEBUG] handler: validating agent configuration..
2020-08-07T13:05:45.807Z [DEBUG] handler: creating patches for the pod..
2020/08/07 13:05:45 http: TLS handshake error from 10.162.0.20:55478: remote error: tls: bad certificate
2020/08/07 13:06:08 http: TLS handshake error from 10.162.0.22:58500: remote error: tls: bad certificate
2020/08/07 13:06:08 http: TLS handshake error from 10.162.0.28:52266: remote error: tls: bad certificate
2020/08/07 13:06:37 http: TLS handshake error from 10.162.0.20:55824: remote error: tls: bad certificate
2020/08/07 13:06:37 http: TLS handshake error from 10.162.0.22:58798: remote error: tls: bad certificate
  • Check status of injected pods (0% success)
$ kubectl get pods -n metrics
NAME                        READY   STATUS             RESTARTS   AGE
grafana-6dddc59d7c-zhn7t    0/1     CrashLoopBackOff   1          17s
influxdb-5bb4c44c4f-7wrzw   0/1     Error              2          17s
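
The log review above can be sketched as a small script that tallies the `bad certificate` handshake errors per calling address, which makes it easier to see whether every caller is affected. This is a minimal sketch, not part of the original report: the function name `count_bad_cert_errors` and the `SAMPLE_LOG` constant are hypothetical, and the sample lines are copied from the injector logs shown above.

```python
import re
from collections import Counter

# Matches injector log lines such as:
# 2020/08/07 13:05:45 http: TLS handshake error from 10.162.0.20:55478: remote error: tls: bad certificate
TLS_ERROR = re.compile(
    r"TLS handshake error from (?P<ip>\d+\.\d+\.\d+\.\d+):\d+: "
    r"remote error: tls: bad certificate"
)


def count_bad_cert_errors(log_text: str) -> Counter:
    """Return how many 'bad certificate' handshake errors each client IP produced."""
    counts = Counter()
    for line in log_text.splitlines():
        match = TLS_ERROR.search(line)
        if match:
            counts[match.group("ip")] += 1
    return counts


# Excerpt of the injector log from the report (hypothetical constant name).
SAMPLE_LOG = """\
2020-08-07T13:05:45.807Z [DEBUG] handler: creating patches for the pod..
2020/08/07 13:05:45 http: TLS handshake error from 10.162.0.20:55478: remote error: tls: bad certificate
2020/08/07 13:06:08 http: TLS handshake error from 10.162.0.22:58500: remote error: tls: bad certificate
2020/08/07 13:06:08 http: TLS handshake error from 10.162.0.28:52266: remote error: tls: bad certificate
2020/08/07 13:06:37 http: TLS handshake error from 10.162.0.20:55824: remote error: tls: bad certificate
2020/08/07 13:06:37 http: TLS handshake error from 10.162.0.22:58798: remote error: tls: bad certificate
"""

if __name__ == "__main__":
    for ip, errors in count_bad_cert_errors(SAMPLE_LOG).most_common():
        print(f"{ip}: {errors}")
```

In practice the input would be the saved output of `kubectl logs` for the injector pod rather than an inline string.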

To Reproduce

Steps to reproduce the behavior:

  1. Delete pods with injection annotations
  2. Watch logs of vault-agent injector
  3. See error from vault injector logs

Application deployment:

  • Injector deployment (in namespace vault-injector)
---
# Source: vault/templates/injector-serviceaccount.yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: vault-agent-injector
---
# Source: vault/templates/injector-service.yaml
apiVersion: v1
kind: Service
metadata:
  name: vault-agent-injector-svc
spec:
  ports:
  - port: 443
    targetPort: 8080
  selector:
    app: vault-agent-injector
---
# Source: vault/templates/injector-deployment.yaml
# Deployment for the injector
apiVersion: apps/v1
kind: Deployment
metadata:
  name: vault-agent-injector
  labels:
    app: vault-agent-injector
spec:
  replicas: 1
  selector:
    matchLabels:
      app: vault-agent-injector
  template:
    metadata:
      labels:
        app: vault-agent-injector
    spec:
      serviceAccountName: "vault-agent-injector"
      securityContext:
        runAsNonRoot: true
        runAsGroup: 1000
        runAsUser: 100
      containers:
        - name: sidecar-injector
          
          image: "hashicorp/vault-k8s:0.4.0"
          imagePullPolicy: "IfNotPresent"
          env:
            - name: AGENT_INJECT_LISTEN
              value: ":8080"
            - name: AGENT_INJECT_LOG_LEVEL
              value: "trace"
            - name: AGENT_INJECT_VAULT_ADDR
              value: "[REMOTE SERVER]"
            - name: AGENT_INJECT_VAULT_IMAGE
              value: "vault:1.4.2"
            - name: AGENT_INJECT_TLS_AUTO
              value: vault-agent-injector-cfg
            - name: AGENT_INJECT_TLS_AUTO_HOSTS
              value: vault-agent-injector-svc,vault-agent-injector-svc.vault-injector,vault-agent-injector-svc.vault-injector.svc
          args:
            - agent-inject
            - 2>&1
          livenessProbe:
            httpGet:
              path: /health/ready
              port: 8080
              scheme: HTTPS
            failureThreshold: 2
            initialDelaySeconds: 1
            periodSeconds: 2
            successThreshold: 1
            timeoutSeconds: 5
          readinessProbe:
            httpGet:
              path: /health/ready
              port: 8080
              scheme: HTTPS
            failureThreshold: 2
            initialDelaySeconds: 2
            periodSeconds: 2
            successThreshold: 1
            timeoutSeconds: 5
---
# Source: vault/templates/injector-clusterrole.yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: vault-agent-injector-clusterrole
rules:
- apiGroups: ["admissionregistration.k8s.io"]
  resources: ["mutatingwebhookconfigurations"]
  verbs: 
    - "get"
    - "list"
    - "watch"
    - "patch"
---
# Source: vault/templates/injector-clusterrolebinding.yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: vault-agent-injector-binding
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: vault-agent-injector-clusterrole
subjects:
- kind: ServiceAccount
  name: vault-agent-injector
  namespace: vault-injector
---
# Source: vault/templates/injector-mutating-webhook.yaml
apiVersion: admissionregistration.k8s.io/v1beta1
kind: MutatingWebhookConfiguration
metadata:
  name: vault-agent-injector-cfg
webhooks:
  - name: vault.hashicorp.com
    clientConfig:
      service:
        name: vault-agent-injector-svc
        namespace: vault-injector
        path: "/mutate"
      caBundle: 
    rules:
      - operations: ["CREATE", "UPDATE"]
        apiGroups: [""]
        apiVersions: ["v1"]
        resources: ["pods"]


Expected behavior

A clear and concise description of what you expected to happen.

Environment

  • Kubernetes version: v1.15.12-gke.2
    • Distribution or cloud vendor (OpenShift, EKS, GKE, AKS, etc.): GKE
    • Other configuration options or runtime services (istio, etc.): preemptible nodes
  • vault-k8s version: 0.4.0 (tested on 0.3.0 as well)

Additional context

Add any other context about the problem here.

  • This is currently being deployed via Anthos Configuration Management (similar to GKE Config-Sync). We are investigating whether the deployment tool is somehow related to this issue. From a timeline perspective, this issue aligns with an update to the ACM components.
  • The GKE cluster uses preemptible nodes; however, no node replacement events occurred at that time.

closed time in 2 days

stewartshea

issue comment hashicorp/vault-k8s

Single Injector Pod: remote error: tls: bad certificate

Well.. after a day of digging, a team member pointed me in the right direction to help determine the cause:

  • The upgrade to ACM 1.4 was indeed causing the error: ACM was stuck in a loop, repeatedly re-applying the mutatingwebhookconfiguration

  • From the syncer logs

$ kubectl logs -f syncer-7b7654c7b5-68gk9 -n config-management-system
I0807 13:46:18.244289       1 apply.go:238] Patched [admissionregistration.k8s.io/v1beta1 kind=MutatingWebhookConfiguration name=vault-agent-injector-cfg]
I0807 13:46:18.261382       1 util.go:71] Status for ClusterConfig "config-management-cluster-config" is already up-to-date.
I0807 13:46:19.204569       1 apply.go:238] Patched [admissionregistration.k8s.io/v1beta1 kind=MutatingWebhookConfiguration name=vault-agent-injector-cfg]
I0807 13:46:19.223145       1 util.go:71] Status for ClusterConfig "config-management-cluster-config" is already up-to-date.
I0807 13:46:19.266502       1 apply.go:238] Patched [admissionregistration.k8s.io/v1beta1 kind=MutatingWebhookConfiguration name=vault-agent-injector-cfg]
  • Reviewing the mutatingwebhookconfiguration we can see the version increasing
$ kubectl describe  mutatingwebhookconfiguration vault-agent-injector-cfg
Name:         vault-agent-injector-cfg
Namespace:    
Labels:       app.kubernetes.io/managed-by=configmanagement.gke.io
Annotations:  configmanagement.gke.io/cluster-name: 
              configmanagement.gke.io/declared-config:
                {"apiVersion":"admissionregistration.k8s.io/v1beta1","kind":"MutatingWebhookConfiguration","metadata":{"annotations":{"configmanagement.gk...
              configmanagement.gke.io/managed: enabled
              configmanagement.gke.io/source-path: cluster/vault-injector-cluster.yaml
API Version:  admissionregistration.k8s.io/v1beta1
Kind:         MutatingWebhookConfiguration
Metadata:
  Creation Timestamp:  2020-08-07T13:23:13Z
  Generation:          2843
  Resource Version:    121732734
...

$ kubectl describe  mutatingwebhookconfiguration vault-agent-injector-cfg
Name:         vault-agent-injector-cfg
Namespace:    
Labels:       app.kubernetes.io/managed-by=configmanagement.gke.io
Annotations:  configmanagement.gke.io/cluster-name: 
              configmanagement.gke.io/declared-config:
                {"apiVersion":"admissionregistration.k8s.io/v1beta1","kind":"MutatingWebhookConfiguration","metadata":{"annotations":{"configmanagement.gk...
              configmanagement.gke.io/managed: enabled
              configmanagement.gke.io/source-path: cluster/vault-injector-cluster.yaml
API Version:  admissionregistration.k8s.io/v1beta1
Kind:         MutatingWebhookConfiguration
Metadata:
  Creation Timestamp:  2020-08-07T13:23:13Z
  Generation:          2873
  Resource Version:    121732860
...

I'm unsure why ACM is continuously patching the mutatingwebhookconfiguration at this time, but our resolution for now is to pull this configuration out of ACM.
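
The two signals used above, repeated `Patched` entries in the syncer log and a climbing `Generation` value, can be checked with a short script. The following is a minimal sketch under stated assumptions: the helper names `count_patches` and `generation_of` and the sample constants are hypothetical, and the regular expressions are written only against the log and `kubectl describe` excerpts quoted in this comment.

```python
import re
from collections import Counter

# Matches syncer lines such as:
# I0807 13:46:18.244289  1 apply.go:238] Patched [<group/version> kind=<Kind> name=<name>]
PATCHED = re.compile(r"Patched \[\S+ kind=(?P<kind>\S+) name=(?P<name>[^\]\s]+)\]")
# Matches the "Generation:" field in `kubectl describe` output.
GENERATION = re.compile(r"^\s*Generation:\s+(?P<gen>\d+)\s*$", re.MULTILINE)


def count_patches(syncer_log: str) -> Counter:
    """Count 'Patched' log entries per (kind, name) pair."""
    counts = Counter()
    for match in PATCHED.finditer(syncer_log):
        counts[(match.group("kind"), match.group("name"))] += 1
    return counts


def generation_of(describe_output: str) -> int:
    """Extract the Generation value from `kubectl describe` text."""
    match = GENERATION.search(describe_output)
    if match is None:
        raise ValueError("no Generation field found")
    return int(match.group("gen"))


# Excerpts copied from the comment above (hypothetical constant names).
SYNCER_LOG = """\
I0807 13:46:18.244289       1 apply.go:238] Patched [admissionregistration.k8s.io/v1beta1 kind=MutatingWebhookConfiguration name=vault-agent-injector-cfg]
I0807 13:46:18.261382       1 util.go:71] Status for ClusterConfig "config-management-cluster-config" is already up-to-date.
I0807 13:46:19.204569       1 apply.go:238] Patched [admissionregistration.k8s.io/v1beta1 kind=MutatingWebhookConfiguration name=vault-agent-injector-cfg]
I0807 13:46:19.266502       1 apply.go:238] Patched [admissionregistration.k8s.io/v1beta1 kind=MutatingWebhookConfiguration name=vault-agent-injector-cfg]
"""
DESCRIBE_FIRST = "Metadata:\n  Generation:          2843\n  Resource Version:    121732734\n"
DESCRIBE_SECOND = "Metadata:\n  Generation:          2873\n  Resource Version:    121732860\n"

if __name__ == "__main__":
    for (kind, name), patches in count_patches(SYNCER_LOG).items():
        print(f"{kind}/{name} patched {patches} times")
    delta = generation_of(DESCRIBE_SECOND) - generation_of(DESCRIBE_FIRST)
    print(f"generation increased by {delta} between the two describe calls")
```

A resource that is patched several times within a second or two while its generation keeps rising is a reasonable hint that a controller is fighting over the object, which is consistent with what was observed here.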

stewartshea

comment created time in 2 days

issue opened hashicorp/vault-k8s

Single Injector Pod: remote error: tls: bad certificate


created time in 2 days

issue comment BCDevOps/developer-experience

RocketChat Flatpak Login Issue with GitHub 2FA

I'm unaware of any other issues with this, but I've also stopped using the flatpak app. I'd suggest closing for now if nobody else has reported the issue.

stewartshea

comment created time in 4 days

issue comment hashicorp/vault-k8s

Need secrets as key and value pair available in pod environment directly

@hoantran3108 Here is the command instruction that I used to do this with an influxdb image. Of course the final command will depend on the configuration of your particular image.

        command: ["/bin/bash", "-c", "source /vault/secrets/influxdb_env && ./entrypoint.sh influxd"]
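
For images where wrapping the entrypoint in a shell is not an option, the application could read the rendered file itself. The sketch below is an assumption-heavy illustration: the contents of `/vault/secrets/influxdb_env` are not shown in this thread, so it assumes shell-style `export KEY=value` lines (which is what `source` would expect), and the function name `parse_env_file` and the variable names in the sample are hypothetical.

```python
import shlex


def parse_env_file(text: str) -> dict:
    """Parse shell-style `export KEY=value` lines into a dictionary.

    Assumes simple assignments only; shell expansions are not evaluated.
    """
    env = {}
    for line in text.splitlines():
        # shlex handles quoting and drops `# comments` when comments=True.
        for token in shlex.split(line, comments=True):
            if token == "export" or "=" not in token:
                continue
            key, _, value = token.partition("=")
            env[key] = value
    return env


# Hypothetical file contents, for illustration only.
SAMPLE_ENV = """\
# rendered by the agent template
export INFLUXDB_ADMIN_USER="admin"
export INFLUXDB_ADMIN_PASSWORD='example value'
"""

if __name__ == "__main__":
    for key, value in parse_env_file(SAMPLE_ENV).items():
        print(key, "=", value)
```

The resulting dictionary could then be merged into `os.environ` before the application initializes its configuration.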

arunporwal

comment created time in 5 days

issue comment hashicorp/vault-k8s

put secret generated from annotations template to existing path / like consul destination directive

In my case I still needed to create arbitrary directory structures to support the way grafana is configured, but it looks like #158 may solve this for me.

ch0mik

comment created time in 7 days

started argoproj/argo-rollouts

started time in 7 days

issue comment hashicorp/vault-k8s

put secret generated from annotations template to existing path / like consul destination directive

As a workaround, I was able to use the command feature like this:

        vault.hashicorp.com/agent-inject-command-datasource.yaml: /bin/sh -c 'mkdir
          -p /vault/secrets/provisioning/datasources && ln -s /vault/secrets/datasource.yaml
          /vault/secrets/provisioning/datasources/datasource.yaml'
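
What that command does can be sketched as follows: create the nested directory, then symlink the rendered secret into it. This is a minimal sketch of the same idea, not a replacement for the annotation; the function name `link_secret` is hypothetical, and the demo runs against a temporary directory instead of `/vault/secrets`.

```python
import os
import tempfile
from pathlib import Path


def link_secret(secrets_root: Path, rendered_name: str, target_subdir: str) -> Path:
    """Create target_subdir under secrets_root and symlink the rendered file into it."""
    rendered = secrets_root / rendered_name
    target_dir = secrets_root / target_subdir
    target_dir.mkdir(parents=True, exist_ok=True)  # equivalent of `mkdir -p`
    link = target_dir / rendered_name
    if link.is_symlink() or link.exists():
        link.unlink()  # unlike a plain `ln -s`, tolerate being run again
    link.symlink_to(rendered)  # equivalent of `ln -s <rendered> <link>`
    return link


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "datasource.yaml").write_text("apiVersion: 1\n")
        link = link_secret(root, "datasource.yaml", "provisioning/datasources")
        print(link, "->", os.readlink(link))
```

One design note: the removal of an existing link is an addition in this sketch. A plain `ln -s` fails if the link already exists, which could matter if the command is run more than once.
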
ch0mik

comment created time in 9 days

issue comment hashicorp/vault-k8s

put secret generated from annotations template to existing path / like consul destination directive

I too could use a feature like this, or even the ability to create subdirectories under the /vault/secrets shared path. Grafana needs to access configuration files under a couple of different directories, which could possibly work with a layout like this:

  • /vault/secrets/provisioning/datasources/<keys>
  • /vault/secrets/provisioning/dashboards/<keys>
ch0mik

comment created time in 9 days

issue comment BCDevOps/developer-experience

Jenkins Optimization

Everything from my side has been completed on this task @lukegonis

I would suggest that we discuss the last item with @mitovskaol: sending a note to teams about reducing their Jenkins configurations per the docs and guidelines we've created.

stewartshea

comment created time in 11 days

push event BCDevOps/platform-services

stewartshea

commit sha 6fa734176d538f35035f4bf1c7a2f3b06674eb2b

initial load of azure key vault code

view details

push time in 12 days

issue comment BCDevOps/OpenShift4-RollOut

Vault Automation Design for OCP 4

I've had to reach out to Andy to get a service principal (SP) created in Azure, which will then allow us to use Terraform Cloud to build the key vaults.

j-pye

comment created time in 12 days

push event stewartshea/envoy-simple-test

stewartshea

commit sha 7437e05f9f50250cdbc7dcad15aad9ed4749b334

minor doc updates

view details

push time in 17 days

push event stewartshea/envoy-simple-test

stewartshea

commit sha 5688b96c0b652b1cd89c16779055487107b833f0

fix image

view details

push time in 17 days

push event stewartshea/envoy-simple-test

stewartshea

commit sha 9cf638f1ecddc6f72559b97e14e21162cc8dda88

update docs and sidecar details

view details

push time in 17 days

issue comment BCDevOps/developer-experience

Isolate DB access without Aporeto

That's not true. I'm documenting the second sidecar approach and then sending it to him. The process works, but we need a clearer set of example documentation


On Thu, Jul 16, 2020 at 1:17 PM Jeff notifications@github.com wrote:

Sidecar option is with Wade to test/validate.


jefkel

comment created time in 23 days

push event stewartshea/envoy-simple-test

stewartshea

commit sha 4dba8bdd5ce5e5cfd5697a9f232ba93150e69431

doc touchups

view details

stewartshea

commit sha 041bf91c87c87cc29f7eefe4006d82202fac4464

update docs

view details

stewartshea

commit sha 3446237eec01627ea60c60e3c1803804af548c7b

commit changes with envoy sidecar

view details

push time in 24 days

issue comment BCDevOps/platform-services

Sysdig Teams Operator - Alerts are global?!?

It looks like this feature has been added now and teams have access to their own alerts. Further testing is required but at first glance it all looks good.

stewartshea

comment created time in a month

issue closed BCDevOps/platform-services

Sysdig teams operator bug/error with updating existing team

The Sysdig teams operator produces the following error when an existing team is updated. It appears to modify the team correctly, but fails when trying to add a default dashboard (a step that should ideally be skipped).

Status:
  Conditions:
    Last Transition Time:  2020-01-27T20:12:15Z
    Message:               Running reconciliation
    Reason:                Running
    Status:                False
    Type:                  Running
    Ansible Result:
      Changed:             0
      Completion:          2020-01-27T20:12:53.233935
      Failures:            1
      Ok:                  32
      Skipped:             10
    Last Transition Time:  2020-01-27T20:12:53Z
    Message:               The task includes an option with an undefined variable. The error was: 'dict object' has no attribute 'json'

The error appears to be in '/opt/ansible/roles/monitoring/tasks/set_default_dashboard.yml': line 13, column 3, but may
be elsewhere in the file depending on the exact syntax problem.

The offending line appears to be:


- name: Get token from Admin user of team
  ^ here

    Reason:  Failed
    Status:  True

closed time in a month

stewartshea

delete branch BCDevOps/platform-services

delete branch : bug/sysdig-teams

delete time in a month

push event BCDevOps/platform-services

stewartshea

commit sha aa76ef124e02da30dd8aeb255265b86aeac11101

test gh-actions

view details

stewartshea

commit sha 518e736673e90f0c2c4b2324e46f4031b0b9349d

added name

view details

stewartshea

commit sha 23ac36a66208db26bf630ad0da889a3301d96d13

update workflow

view details

stewartshea

commit sha d0d96277f9ee9bf15a62371cefddca65c6c235a8

test update to actions

view details

stewartshea

commit sha 9096a3d48da093d9e094d4bce16fa3a5d268a38a

update workflow

view details

stewartshea

commit sha 06151d75db211749c937f5460d2c24f6c1a431f3

syntax fix

view details

stewartshea

commit sha 131917d4f24f950dd27fc65453711a63b604c4ad

add runs-on

view details

stewartshea

commit sha a427c2095ad71942fafdad07d3bfeda8e035342d

update path

view details

stewartshea

commit sha 7c52411c7c68858a048e16af0828c47efba333be

update workflow with branch

view details

stewartshea

commit sha 2120d05fbc38174149d8eeb65b46b4c08f9724e4

fix typo

view details

stewartshea

commit sha aeb5caa6f4fbdedd6214346b70ce9323f7d3c8e4

update openshift action definition

view details

stewartshea

commit sha 84033b073180dafe5b2ba2f458531c69143155c8

test workflow mod

view details

stewartshea

commit sha 77cfe50a76e2c890555007e541f889a1717b27ea

update workflow

view details

stewartshea

commit sha dc458469ff63de931b8b1fb83d63f70b6f2351d2

fix secret ref

view details

stewartshea

commit sha 78dbb0fd66e8dd489a1c6d1451874e3903ca6f51

update workflow

view details

stewartshea

commit sha 903a7af2a60b3ad8a7ce740174ff59c282e005a0

change workflow version

view details

stewartshea

commit sha 4c329dde3ade10ffe37a2347b6515c546a095771

mv dockerfile to root of operator

view details

stewartshea

commit sha c91d7b8b411a94ef95abc594858a80f46e9d81c7

update workflow

view details

stewartshea

commit sha 4c560587a32bf9d977cd2a327b0fb2f8ebfcadd1

dummy tax

view details

stewartshea

commit sha 2c43ee770a7c2e06c4084ca88822d2e645c84a3b

double dummy tax

view details

push time in a month

pull request comment BCDevOps/platform-services

bug/sysdig-teams

fixes #613 and part of #614

stewartshea

comment created time in a month

PR opened BCDevOps/platform-services

bug/sysdig-teams
+69 -4

0 comment

7 changed files

pr created time in a month

push event BCDevOps/platform-services

stewartshea

commit sha 591cb755a7964bc48b74c30714f8ab16d18e6b81

fix up wildcard

view details

push time in a month

push event BCDevOps/platform-services

stewartshea

commit sha 82b2f87afd0d406e87e0c033df5afaae96a28ec2

debugging lab workflow

view details

push time in a month

push event BCDevOps/platform-services

stewartshea

commit sha a3d6f13beada67e352297107b1dbe367b8d2fe1d

reduce delete scope in github actions

view details

push time in a month

push event BCDevOps/platform-services

stewartshea

commit sha 1dd4bb195a8de173dd8927d74b60739dea479b20

still debugging actions filter

view details

push time in a month

push event BCDevOps/platform-services

stewartshea

commit sha 99a589a1349c1e2a4b91f433bb32620b866e8697

debugging actions filter

view details

push time in a month

push event BCDevOps/platform-services

stewartshea

commit sha 78e2295d32c655b28cc8f6624ec5dfa0aadd71db

remove wildcard from actions

view details

push time in a month

push event BCDevOps/platform-services

stewartshea

commit sha 9dd01426825161fda77df99615820b507afd92f6

add workflows to path

view details

push time in a month

push event BCDevOps/platform-services

stewartshea

commit sha e19773b7fb06639db2184d0663c479391b72d8e6

test path kick

view details

push time in a month

push event BCDevOps/platform-services

stewartshea

commit sha fb93efb187e87ae73bb3e731183c3767beb667b6

update github actions path

view details

push time in a month

push event BCDevOps/platform-services

stewartshea

commit sha 21ad80be1f4442359903765d897de6354254d7c7

update build pipelin

view details

push time in a month

push event BCDevOps/platform-services

stewartshea

commit sha c95a6f0e0ad33c60b19f1845d796dc5da96c036b

add lab build and change pathfinder build. Fix playbook

view details

stewartshea

commit sha 164e174b16f39dd5893f7c8686f3d1fabc4459a2

update lab build pipeline

view details

push time in a month

push eventBCDevOps/platform-services

stewartshea

commit sha 10f80f490fd8936c3268bf8c62c65f4392693223

update actions path

view details

push time in a month

push eventBCDevOps/platform-services

stewartshea

commit sha 8d8bd5e0564d7dcbbdd579b2f493d83874bcb7ba

change build name and image in manifest

view details

push time in a month

push eventBCDevOps/platform-services

stewartshea

commit sha 88f9449b1b1439de4f2448f4aff43ba9b26aeac9

fixup final command

view details

push time in a month

push eventBCDevOps/platform-services

stewartshea

commit sha 3feae6358a3f0d7238ce44fad283140dbebef40d

test new branchref idea

view details

push time in a month

push eventBCDevOps/platform-services

stewartshea

commit sha 7129537d2dff27686b58b223fba71fbba2318105

test new branch ref

view details

push time in a month

push eventBCDevOps/platform-services

stewartshea

commit sha 2c43ee770a7c2e06c4084ca88822d2e645c84a3b

double dummy tax

view details

push time in a month

push eventBCDevOps/platform-services

stewartshea

commit sha 4c560587a32bf9d977cd2a327b0fb2f8ebfcadd1

dummy tax

view details

push time in a month

push eventBCDevOps/platform-services

stewartshea

commit sha c91d7b8b411a94ef95abc594858a80f46e9d81c7

update workflow

view details

push time in a month

push eventBCDevOps/platform-services

stewartshea

commit sha 4c329dde3ade10ffe37a2347b6515c546a095771

mv dockerfile to root of operator

view details

push time in a month

push eventBCDevOps/platform-services

stewartshea

commit sha 903a7af2a60b3ad8a7ce740174ff59c282e005a0

change workflow version

view details

push time in a month

push eventBCDevOps/platform-services

stewartshea

commit sha 78dbb0fd66e8dd489a1c6d1451874e3903ca6f51

update workflow

view details

push time in a month

push eventBCDevOps/platform-services

stewartshea

commit sha dc458469ff63de931b8b1fb83d63f70b6f2351d2

fix secret ref

view details

push time in a month

push eventBCDevOps/platform-services

stewartshea

commit sha 77cfe50a76e2c890555007e541f889a1717b27ea

update workflow

view details

push time in a month

push eventBCDevOps/platform-services

stewartshea

commit sha 84033b073180dafe5b2ba2f458531c69143155c8

test workflow mod

view details

push time in a month

push eventBCDevOps/platform-services

stewartshea

commit sha aeb5caa6f4fbdedd6214346b70ce9323f7d3c8e4

update openshift action definition

view details

push time in a month

push eventBCDevOps/platform-services

stewartshea

commit sha 2120d05fbc38174149d8eeb65b46b4c08f9724e4

fix typo

view details

push time in a month

push eventBCDevOps/platform-services

stewartshea

commit sha 7c52411c7c68858a048e16af0828c47efba333be

update workflow with branch

view details

push time in a month

push eventBCDevOps/platform-services

stewartshea

commit sha a427c2095ad71942fafdad07d3bfeda8e035342d

update path

view details

push time in a month

push eventBCDevOps/platform-services

stewartshea

commit sha 131917d4f24f950dd27fc65453711a63b604c4ad

add runs-on

view details

push time in a month

push eventBCDevOps/platform-services

stewartshea

commit sha 06151d75db211749c937f5460d2c24f6c1a431f3

syntax fix

view details

push time in a month

push eventBCDevOps/platform-services

stewartshea

commit sha 9096a3d48da093d9e094d4bce16fa3a5d268a38a

update workflow

view details

push time in a month

push eventBCDevOps/platform-services

stewartshea

commit sha d0d96277f9ee9bf15a62371cefddca65c6c235a8

test update to actions

view details

push time in a month

push eventBCDevOps/platform-services

stewartshea

commit sha 23ac36a66208db26bf630ad0da889a3301d96d13

update workflow

view details

push time in a month

push eventBCDevOps/platform-services

stewartshea

commit sha 518e736673e90f0c2c4b2324e46f4031b0b9349d

added name

view details

push time in a month

create branch BCDevOps/platform-services

branch : bug/sysdig-teams

created branch time in a month

delete branch BCDevOps/platform-services

delete branch : dashboard-update

delete time in a month

push eventBCDevOps/platform-services

stewartshea

commit sha a9d2a414fe3954d9b6d1470bb28ebebfefe50706

dashboard update

view details

stewartshea

commit sha 6839695d4dae243caef683e88e3e72ae4d23e7ca

update templates

view details

stewartshea

commit sha fded64f293b0c42800b123a77487b33dd72fcc3a

cleanup file

view details

stewartshea

commit sha 5ce3ccf34a8d0996976490d521366f222e42807b

fix filenames

view details

stewartshea

commit sha 5cefc8e1b7e1e784eb9dadf92edeab05613ffa50

Change time interval to 15 minutes

view details

Shea Stewart

commit sha d5b258d8047d0d2a8b77ac5970fc5a9e69f40194

Merge pull request #699 from BCDevOps:dashboard-update dashboard-update

view details

push time in a month

PR opened BCDevOps/platform-services

dashboard-update
+216 -1475

0 comment

3 changed files

pr created time in a month

push eventBCDevOps/platform-services

stewartshea

commit sha 5cefc8e1b7e1e784eb9dadf92edeab05613ffa50

Change time interval to 15 minutes

view details

push time in a month

issue commentBCDevOps/developer-experience

Sysdig Dashboard: CPU/Mem Capacity

The other changes have been applied to the dashboard.

sbarre-esit

comment created time in a month

issue commentBCDevOps/developer-experience

Sysdig Dashboard: CPU/Mem Capacity

I'm getting some feedback from Sysdig on the longer timeframe, since it keeps increasing over longer time periods, which doesn't make much sense.

sbarre-esit

comment created time in a month

push eventBCDevOps/platform-services

stewartshea

commit sha 5ce3ccf34a8d0996976490d521366f222e42807b

fix filenames

view details

push time in a month

create branch BCDevOps/platform-services

branch : dashboard-update

created branch time in a month

issue commentBCDevOps/developer-experience

Sysdig Dashboard: CPU/Mem Capacity

Following the latest review, a few updates are required;

  • [ ] Add links to the resiliency guidelines
  • [ ] Node description
  • [ ] # of Prod Projects
  • [ ] Remove Limits
  • [ ] Stable threshold at 75% < Red >
  • [ ] Look into the graphing of the number - 24 - 72 hour period.
  • [ ] Simplify time view (or remove time picker)

sbarre-esit

comment created time in a month

issue openedBCDevOps/platform-services

Change StatusPage Auth to DevHub Realm

The statuspage is currently configured to the next gen security realm, and should be the devhub realm.

Definition of done: reconfigure the auth configuration and validate that users can access via IDIR and GitHub.

created time in a month

issue openedBCDevOps/OpenShift4-RollOut

Integrate ARO with config management

The ARO Canadian Cluster needs additional configurations and exploration; for this task, integrate the cluster into a gitops based configuration framework.

Definition of Done: Demonstrate configuration changes via a GitOps approach to the platform-services ARO cluster.

created time in a month

issue closedBCDevOps/OpenShift4-RollOut

Build ARO (azure) 4.x Cluster for Training and Learning

We would like to understand the ARO offering better and need to offload training use cases from the OCP 3.11 cluster:

  • [x] Build ARO Cluster
  • [x] Configure ARO Cluster
  • [x] Test training guides on new cluster
  • [x] Handover to Platform Experience team to retrofit OCP 101 training to OCP4 material
  • [x] Move to Canada

closed time in a month

stewartshea

issue commentBCDevOps/developer-experience

Isolate DB access without Aporeto

One approach has been documented. A second approach, with sidecar configurations, is also being developed.

jefkel

comment created time in a month

issue commentBCDevOps/developer-experience

Enable PromQL

Waiting on a timeline from Sysdig TAM

stewartshea

comment created time in a month

push eventstewartshea/envoy-simple-test

stewartshea

commit sha 9d9b13f37093286264dcd3af7c673353532bfd5d

reorg

view details

push time in a month

create branch stewartshea/envoy-simple-test

branch : master

created branch time in a month

created repositorystewartshea/envoy-simple-test

created time in a month

create branch BCDevOps/platform-services

branch : update-sysdig-resources

created branch time in a month

issue commentBCDevOps/developer-experience

INC0032789 Sysdig agents disconnecting

I'm not even sure it's CPU related since those pods don't seem to have that type of spike.

Let's follow the guidelines from the updated support ticket: update the version and collect some logs again.

sbarre-esit

comment created time in a month

create branch BCDevOps/platform-services

branch : capacity-dashboard-update

created branch time in 2 months

pull request commentBCDevOps/platform-services

disable appcheck and jmx and statsd

Further related to https://github.com/BCDevOps/developer-experience/issues/327 @sbarre-esit can you apply the configmap changes to prod app nodes and reload the agents?

stewartshea

comment created time in 2 months

issue commentBCDevOps/developer-experience

INC0032789 Sysdig agents disconnecting

Created PR https://github.com/BCDevOps/platform-services/pull/697 to disable additional metric checks to see if we can fully reduce the crashing. @sbarre-esit can you apply the configmap changes and reload the agents in prod?

sbarre-esit

comment created time in 2 months

create branch BCDevOps/platform-services

branch : sysdig-debug-configmap-test

created branch time in 2 months

issue commentBCDevOps/developer-experience

Modify the default % report on the Sysdig dashboard to show % of requests instead of limits

Additional updates;

  • [ ] total pods
  • [ ] total namespaces
  • [ ] limits capacity
  • [ ] look into removing non long-running pods from resource quota?

stewartshea

comment created time in 2 months

issue commentBCDevOps/devops-platform-workshops

OCP4 ARO Cluster requires RWX storage class

@patricksimonian I've created a storageclass called azure-file. Take it for a spin.

patricksimonian

comment created time in 2 months

issue commentBCDevOps/platform-services-status-page-notifications

why do we delete our old cluster advisories? Wouldn't it be helpful to have that history somewhere?

@wenzowski We purge the old notices to keep the folder clean and to reduce the notices that are posted to the online app. There are other ways to clean this up, but we felt no need to keep them live in the folder since we have a full commit history.

stewartshea

comment created time in 2 months

pull request commentBCDevOps/platform-services-status-page-notifications

Initial draft of comms for PROD master cert renewal

@wenzowski I'll respond to your question in issue #85

wmhutchison

comment created time in 2 months

issue openedBCDevOps/platform-services-status-page-notifications

why do we delete our old cluster advisories? Wouldn't it be helpful to have that history somewhere?

Reposting from the pull request comment:

@mitovskaol why do we delete our old cluster advisories? Wouldn't it be helpful to have that history somewhere? /cc @Maralsotoudehnia

created time in 2 months

issue commentBCDevOps/devops-platform-workshops

OCP4 ARO Cluster requires RWX storage class

@patricksimonian I need to investigate the right approach to provisioning azure-file and investigate the permission structure. I'm actively working on this and will have some findings for you within a day or two.

patricksimonian

comment created time in 2 months

issue commentBCDevOps/OpenShift4-RollOut

Determine new DNS FQDN for platform-services cluster

This can be created under aro.devops.gov.bc.ca when deployed in Azure

stewartshea

comment created time in 2 months

issue commentBCDevOps/developer-experience

Sysdig configuration tuning to fix agent crashes

We should likely close this in favour of #327

stewartshea

comment created time in 2 months

issue commentBCDevOps/OpenShift4-RollOut

ARO DNS Decision for Canadian Deployment

@sbarre-esit I don't think it's about the specific cluster name yet, but more about the subdomain it will belong to. I'm thinking something like aro.devops.gov.bc.ca?

stewartshea

comment created time in 2 months

issue openedBCDevOps/OpenShift4-RollOut

ARO DNS Decision for Canadian Deployment

We will need a subdomain created and delegated to Azure for ARO related clusters.

created time in 2 months

issue commentBCDevOps/developer-experience

INC0032789 Sysdig agents disconnecting

Possibly figuring out how to remove other stats from collection that we aren't using today based on feedback from support.

sbarre-esit

comment created time in 2 months

issue closedBCDevOps/developer-experience

Sysdig Dashboard: CPU/Mem Capacity

https://trello.com/c/M5zL2PME/104-sysdig-dashboard-cpu-mem-capacity

Mirror AdvSol Grafana Compute cluster top 6 panels

  • CPU and Mem - Usage, Requests, Limits (over time)

closed time in 2 months

sbarre-esit

issue commentBCDevOps/developer-experience

Enable PromQL

The new dashboard format has been enabled, but it looks like PromQL is still in the queue. I'll follow up.

stewartshea

comment created time in 2 months
