Gonzalo Peci pecigonzalo Palma de Mallorca, Spain linkedin.com/in/pecig

pecigonzalo/cadvisor 1

Analyzes resource usage and performance characteristics of running containers.

pecigonzalo/agent 0

The Buildkite Agent is an open-source toolkit written in Golang for securely running build jobs on any device or network

pecigonzalo/amazon-ecr-credential-helper 0

Automatically gets credentials for Amazon ECR on docker push/docker pull

pecigonzalo/amazon-ecs-agent 0

Amazon EC2 Container Service Agent

pecigonzalo/amazon-ecs-cli 0

The Amazon ECS CLI enables users to run their applications on ECS/Fargate using the Docker Compose file format, quickly provision resources, push/pull images in ECR, and monitor running applications on ECS/Fargate.

pecigonzalo/amplify-cli 0

A CLI toolchain for simplifying serverless web and mobile development.

pecigonzalo/application_python 0

A Chef cookbook to deploy Python applications.

pecigonzalo/atom-project-shell-env 0

Atom package to load shell env variables from project directory

delete branch sourcegraph/deploy-sourcegraph-docker

delete branch : feature/readinessProbeforAll

delete time in 2 hours

push event sourcegraph/deploy-sourcegraph-docker

Gonzalo Peci

commit sha c2bf177bfb25a66768d16c098ae5cdf14a3d872e

Increase minimum interval and timeout for all services (#130)

view details

push time in 2 hours

PR merged sourcegraph/deploy-sourcegraph-docker

Increase minimum interval and timeout for all services

Increase interval and timeout of all services to be less strict.

Sister deploy-sourcegraph change: https://github.com/sourcegraph/deploy-sourcegraph/pull/817
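For context, loosening a Docker Compose healthcheck means raising its `interval` and `timeout` fields. A rough sketch — the service name, port, and values are illustrative assumptions, not the actual diff:

```yaml
# Hypothetical docker-compose healthcheck; values are illustrative only.
services:
  searcher:
    healthcheck:
      test: ["CMD", "wget", "-q", "-O-", "http://localhost:3181/healthz"]
      interval: 10s   # raised so probes fire less often
      timeout: 10s    # raised so slow responses are not counted as failures
      retries: 3
```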

+9 -9

0 comment

1 changed file

pecigonzalo

pr closed time in 2 hours

push event sourcegraph/deploy-sourcegraph

Gonzalo Peci

commit sha 552d2e6f55ada3fe32bb5f62fde7d238c84068ca

Increase minimum failureThreshold and periodSeconds (#817)

This affects searcher and indexed-search

view details

push time in 2 hours

delete branch sourcegraph/deploy-sourcegraph

delete branch : feature/readinessProbeforAll

delete time in 2 hours

PR merged sourcegraph/deploy-sourcegraph

Increase minimum failureThreshold and periodSeconds

Change the failureThreshold and periodSeconds of searcher and indexed-searcher to be less strict.

Sister deploy-sourcegraph-docker change: https://github.com/sourcegraph/deploy-sourcegraph-docker/pull/130
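In Kubernetes terms, the two fields being loosened live on the pod's readiness probe. A rough sketch — only `failureThreshold` and `periodSeconds` are named in the PR; the path, port, and values are illustrative assumptions:

```yaml
# Hypothetical readinessProbe; only the two tuned fields come from the PR.
readinessProbe:
  httpGet:
    path: /healthz
    port: 3182
  periodSeconds: 5      # probe less frequently
  failureThreshold: 3   # tolerate more consecutive failures before marking unready
```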

+4 -4

0 comment

2 changed files

pecigonzalo

pr closed time in 2 hours

pull request comment sourcegraph/deploy-sourcegraph

repo-updater: Add readinessProbe

Rerunning tests, as the old tests ran with the old image. I have also verified this locally and probes work correctly. Hopefully this will reduce the deployment downtime for this service.

pecigonzalo

comment created time in 16 hours

issue comment sourcegraph/sourcegraph

Precise code intel API server error during background config update

We asked the customer to verify if this is a cluster connectivity issue by trying to reach the service from other pods in the cluster running in the same node.

dadlerj

comment created time in 17 hours

PR opened sourcegraph/deploy-sourcegraph-docker

Increase minimum interval and timeout for all services

<!-- Kubernetes and Docker Compose MUST be kept in sync. You should not merge a change here without a corresponding change in the other repository, unless it truly is specific to this repository. -->

Sister deploy-sourcegraph change:

<!-- add link or explanation of why it is not needed here -->

+9 -9

0 comment

1 changed file

pr created time in 2 days

PR opened sourcegraph/deploy-sourcegraph

Increase minimum failureThreshold and periodSeconds

Change the failureThreshold and periodSeconds of searcher and indexed-searcher to be less strict.

<!-- Kubernetes and Docker Compose MUST be kept in sync. You should not merge a change here without a corresponding change in the other repository, unless it truly is specific to this repository. -->

Sister deploy-sourcegraph-docker change:

<!-- add link or explanation of why it is not needed here -->

+4 -4

0 comment

2 changed files

pr created time in 2 days

create branch sourcegraph/deploy-sourcegraph

branch : feature/readinessProbeforAll

created branch time in 2 days

pull request comment sourcegraph/deploy-sourcegraph

overlays: collect PVCs into a separate base

Wouldn't this be a problem anyway because we specify the PVC in the volume definition of the Deployment?

uwedeportivo

comment created time in 3 days

push event sourcegraph/sourcegraph

Gonzalo Peci

commit sha 2add748b30069ad2770629285712a776dbf23bf5

repo-updater: Add /healthz endpoint (#12880)

* repo-updater: Add /healthz endpoint

  Currently, it does not check for anything, but it will help ensure the http service is up before rolling new images

* Update cmd/repo-updater/repoupdater/server.go

  Co-authored-by: ᴜɴᴋɴᴡᴏɴ <joe@sourcegraph.com>

* Update cmd/repo-updater/repoupdater/server.go

  Co-authored-by: ᴜɴᴋɴᴡᴏɴ <joe@sourcegraph.com>

* Use simple function for /healthz as in enterprise

Co-authored-by: ᴜɴᴋɴᴡᴏɴ <joe@sourcegraph.com>

view details

push time in 3 days

delete branch sourcegraph/sourcegraph

delete branch : feature/repo-updater-healthcheck

delete time in 3 days

PR merged sourcegraph/sourcegraph

repo-updater: Add /healthz endpoint

Currently the healthz endpoint does not check for anything, but it will help ensure the HTTP service is up before rolling new images. I'll make another change to deploy-sourcegraph to add the K8s probe.

Notes

handleHealthz does not need to be a method of Server, but it seems all handlers are methods of that type, so I assumed we attach them there by convention.

<!-- Reminder: Have you updated the changelog and relevant docs (user docs, architecture diagram, etc) ? -->

+3 -0

2 comments

1 changed file

pecigonzalo

pr closed time in 3 days

issue comment sourcegraph/sourcegraph

Distribution: 3.19 Tracking issue

Last week

Kicked off the 360 review cycle and focused on that. I paired with Geoffrey to get more familiar with our Dhall implementation and architecture. I met with Eric to talk about running code intel on Firecracker VMs and how we would deploy those.

This week

I'll be working mainly on our 360 reviews and 3.20 planning. On Dhall, I would like to do more testing now that I understand its structure better. I'm also working on improving our incidents pipeline so it's easier to track the status and number of active incidents.

Team update

Given the current number of customer issues, we are over our original estimates for support time, which might impact the delivery of all listed items. We started planning 3.20 last week and should finish by the end of the week.

pecigonzalo

comment created time in 3 days

pull request comment sourcegraph/deploy-sourcegraph

repo-updater: Add readinessProbe

@bobheadxi Yep, I will not merge that until the image is released, I created the PR so I could link it to the other one and show what I'm changing.

pecigonzalo

comment created time in 3 days

Pull request review comment sourcegraph/about

Update customer issues process

# Filing customer issues

Read [the support overview](index.md) before filing an issue.

## Create a customer issue

Customer support tickets should be translated to GitHub issues. In such cases, [create a new issue for the request](https://github.com/sourcegraph/customer/issues/new).

I believe incidents are for the most part customer specific: a customer having a problem related, in many cases, to their particular environment or deployment. There might be a set of related general issues to be created that must be linked to the incident, and those should be created in /sourcegraph where they can be :+1:'d and publicly discussed. This works in a similar way to incident response workflows, where the immediate issue is worked from the generated alert (and in some cases an automated issue) and later on we create any follow-up issues for the permanent or general fixes.

The main concern is that right now there does not seem to be a clear funnel: a lot of this flows through a Slack thread or conversation, or is linked from Jira in Slack, which makes it impossible to get an overview of the current issues or the past issues we worked on over an iteration. Additionally, because we can't label issues in /sourcegraph or add any customer information, it's quite hard to group by customer and get an overview of a customer's incidents.

I think the tradeoff of having two issue trackers in GitHub (one for most of our issues and one for customer-specific ones) for the issues that engineering has to work on is fairly minimal, and it creates a simple funnel for all incidents, which at present are distributed between many places (Slack, Jira, etc). Ideally, we would have a single issue tracker, but unfortunately that is not possible when mixing customer and public incidents. Additionally, I think having two trackers makes it harder to leak information by incorrectly setting a message or setting an issue to private (if we had a tracker that allowed that).

pecigonzalo

comment created time in 3 days

push event sourcegraph/sourcegraph

Gonzalo Peci

commit sha ab4df58ff99645a9b534ad3047af9a720d5e9d9f

Use simple function for /healthz as in enterprise

view details

push time in 3 days

pull request comment sourcegraph/about

Update customer issues process

I think we can work out the details here and then create a new draft PR.

pecigonzalo

comment created time in 3 days

pull request comment sourcegraph/sourcegraph

repo-updater: Add /healthz endpoint

What do you think about just doing something like it's done in enterprise?

mux.HandleFunc("/healthz", func(w http.ResponseWriter, _ *http.Request) {
		w.WriteHeader(http.StatusOK)
	})

Instead of making it a method of Server, given we are not checking anything?

pecigonzalo

comment created time in 3 days

Pull request review comment sourcegraph/sourcegraph

repo-updater: Add /healthz endpoint

 func (s *Server) Handler() http.Handler {
 	return mux
 }

+func (s *Server) handleHealthz(w http.ResponseWriter, r *http.Request) {
+	w.WriteHeader(200)
+	_, err := w.Write([]byte("ok"))
+	if err != nil {
+		log15.Info("Error checking /healthz: " + err.Error())

No, I copy/pasted this from one of our other services to keep it consistent.

pecigonzalo

comment created time in 3 days

push event sourcegraph/sourcegraph

Gonzalo Peci

commit sha f8794299bbc9f823b6fc4145e6c1b04faa1a204c

Update cmd/repo-updater/repoupdater/server.go Co-authored-by: ᴜɴᴋɴᴡᴏɴ <joe@sourcegraph.com>

view details

push time in 3 days

push event sourcegraph/sourcegraph

Gonzalo Peci

commit sha a6090e80d8199133625e68c681c5a4d628b4cfda

Update cmd/repo-updater/repoupdater/server.go Co-authored-by: ᴜɴᴋɴᴡᴏɴ <joe@sourcegraph.com>

view details

push time in 3 days

PR opened sourcegraph/deploy-sourcegraph

repo-updater: Add readinessProbe

<!-- Kubernetes and Docker Compose MUST be kept in sync. You should not merge a change here without a corresponding change in the other repository, unless it truly is specific to this repository. -->

Sister deploy-sourcegraph-docker change:

<!-- add link or explanation of why it is not needed here -->

+8 -0

0 comment

1 changed file

pr created time in 3 days

push event sourcegraph/deploy-sourcegraph

Gonzalo Peci

commit sha 0252dd9eec9fc39b579e7941a9cfe3e2dec337d5

fixup! repo-updater: Add readinessProbe

view details

push time in 3 days

PR opened sourcegraph/sourcegraph

repo-updater: Add /healthz endpoint

Currently, it does not check for anything, but it will help ensure the HTTP service is up before rolling new images.

<!-- Reminder: Have you updated the changelog and relevant docs (user docs, architecture diagram, etc) ? -->

+10 -0

0 comment

1 changed file

pr created time in 3 days

create branch sourcegraph/sourcegraph

branch : feature/repo-updater-healthcheck

created branch time in 3 days

delete branch sourcegraph/sourcegraph

delete branch : feature/repo-updater-healthcheck

delete time in 3 days

PR closed sourcegraph/sourcegraph

repo-updater: Add /healthz endpoint

<!-- Reminder: Have you updated the changelog and relevant docs (user docs, architecture diagram, etc) ? -->

+108 -5

0 comment

8 changed files

pecigonzalo

pr closed time in 3 days

PR opened sourcegraph/sourcegraph

repo-updater: Add /healthz endpoint

<!-- Reminder: Have you updated the changelog and relevant docs (user docs, architecture diagram, etc) ? -->

+108 -5

0 comment

8 changed files

pr created time in 3 days

create branch sourcegraph/sourcegraph

branch : feature/repo-updater-healthcheck

created branch time in 3 days

pull request comment sourcegraph/about

Update customer issues process

@slimsag I have merged this with the updated text. Please let me know if this does not fix your concern and I'll address it in a separate PR.

pecigonzalo

comment created time in 3 days

push event sourcegraph/about

Gonzalo Peci

commit sha 0c192a40f349f85aedde91d7098035ba6fd5d9a2

Update customer issues process (#1371)

* Create Github issues for customer issues

  Guide CE to create Github issues in our private tracker and use labels to group customers

* fixup! Create Github issues for customer issues
* fixup! Create Github issues for customer issues
* Improve the descrpition of incidents and fix a link
* Update the process for handling customer issues
* Clarify where to remove private information from
* Clarify when to create GitHub issues

view details

push time in 3 days

delete branch sourcegraph/about

delete branch : gp/issues

delete time in 3 days

PR merged sourcegraph/about

Update customer issues process

I would like to ensure all customer issues are tracked and created in GitHub. Right now, some issues are created there and some are in other tracking systems, which makes it difficult to analyze the types of issues we have, how many there are, etc.

As most of our workflow is already in GitHub, I think it is a good idea to continue doing it there, as we can benefit from adding issues to labels, projects, and milestones, as well as cross-linking them to PRs or other issues. We could, for example, use labels to categorize issues.

The downside of using GitHub is that there is no out-of-the-box tool to analyze the data for MTTR or other types of reports. Given those are not required at the moment, I don't think it will be a problem.

Relates to: https://github.com/sourcegraph/sourcegraph/issues/11904

+43 -48

1 comment

7 changed files

pecigonzalo

pr closed time in 3 days

started liljencrantz/crush

started time in 6 days

issue opened sourcegraph/sourcegraph

WIP: Distribution 3.20 Tracking issue

Plan

<!-- Summarize what the team wants to achieve this iteration.

  • What are the problems we want to solve or what information do we want to gather?
  • Why is solving those problems or gathering that information important?
  • How do we plan to solve those problems or gather that information? -->

Availability

If you have planned unavailability this iteration (e.g., vacation), you can note that here.

Tracked issues

<!-- BEGIN WORK --> <!-- END WORK -->

Legend

  • 👩 Customer issue
  • 🐛 Bug
  • 🧶 Technical debt
  • 🛠️ Roadmap
  • 🕵️ Spike
  • 🔒 Security issue
  • :shipit: Pull Request

created time in 6 days

Pull request review comment sourcegraph/about

Update customer issues process

# Filing customer issues

Read [the support overview](index.md) before filing an issue.

## Create a customer issue

Customer support tickets should be translated to GitHub issues. In such cases, [create a new issue for the request](https://github.com/sourcegraph/customer/issues/new).

@slimsag updated, please review

pecigonzalo

comment created time in 6 days

Pull request review comment sourcegraph/about

Update customer issues process

# Filing customer issues

Read [the support overview](index.md) before filing an issue.

## Create a customer issue

Customer support tickets should be translated to GitHub issues. In such cases, [create a new issue for the request](https://github.com/sourcegraph/customer/issues/new).

Provide the appropriate context and add a label with the affected customer as `customer/$name`. Once its created, sharing it with the required [team](routing_questions.md).
If necessary, link to the appropriate JIRA Service Desk ticket or [HubSpot](#find-the-unique-company-url) notes.

### General issues

General issues are those that affect more users than those of a particular deployment. In such cases, create a [new issue for the request](https://github.com/sourcegraph/sourcegraph/issues/new/choose) describing it. If there was a previous [customer issue](##create-a-customer-issue), please link the issue in its description.

Remove any potentially private information (e.g. individual people's names, company names, self-hosted Sourcegraph URLs, repo names, screenshots, etc.)

@uwedeportivo updated, please review

pecigonzalo

comment created time in 6 days

push event sourcegraph/about

Gonzalo Peci

commit sha e67792a509745a3b2fe94cccfd6e75df009dc7e2

Clarify where to remove private information from

view details

Gonzalo Peci

commit sha f35a86076f74de8ea77f2ba1125243fc3b3eafbd

Clarify when to create GitHub issues

view details

push time in 6 days

Pull request review comment sourcegraph/about

Update customer issues process

# Filing customer issues

Read [the support overview](index.md) before filing an issue.

## Create a customer issue

Customer support tickets should be translated to GitHub issues. In such cases, [create a new issue for the request](https://github.com/sourcegraph/customer/issues/new).

I'll rephrase it to indicate when it requires other engineers. While the intention is not for every issue they manage to get an associated GitHub issue, I think it would be a useful metric for CE to be able to analyze what we should document more or what the frequent types of questions we get are.

pecigonzalo

comment created time in 6 days

Pull request review comment sourcegraph/about

Update customer issues process

# Filing customer issues

Read [the support overview](index.md) before filing an issue.

## Create a customer issue

Customer support tickets should be translated to GitHub issues. In such cases, [create a new issue for the request](https://github.com/sourcegraph/customer/issues/new).

Provide the appropriate context and add a label with the affected customer as `customer/$name`. Once its created, sharing it with the required [team](routing_questions.md).
If necessary, link to the appropriate JIRA Service Desk ticket or [HubSpot](#find-the-unique-company-url) notes.

### General issues

General issues are those that affect more users than those of a particular deployment. In such cases, create a [new issue for the request](https://github.com/sourcegraph/sourcegraph/issues/new/choose) describing it. If there was a previous [customer issue](##create-a-customer-issue), please link the issue in its description.

Remove any potentially private information (e.g. individual people's names, company names, self-hosted Sourcegraph URLs, repo names, screenshots, etc.)

I'll rephrase it; I meant to ensure no private information is in the general issue filed on sourcegraph/sourcegraph.

pecigonzalo

comment created time in 7 days

PR opened sourcegraph/about

Update customer issues process

I would like to ensure all customer issues are tracked and created in GitHub. Right now, some issues are created there and some are in other tracking systems, which makes it difficult to analyze the types of issues we have, how many there are, etc.

As most of our workflow is already in GitHub, I think it is a good idea to continue doing it there, as we can benefit from adding issues to labels, projects, and milestones, as well as cross-linking them to PRs or other issues. We could, for example, use labels to categorize issues.

The downside of using GitHub is that there is no out-of-the-box tool to analyze the data for MTTR or other types of reports. Given those are not required at the moment, I don't think it will be a problem.

+43 -48

0 comment

7 changed files

pr created time in 7 days

create branch sourcegraph/about

branch : gp/issues

created branch time in 7 days

pull request comment sourcegraph/sourcegraph

Document when to introduce new services or not

I think many of the items listed in the "additional complexity" section could be misleading, as most apply to any new feature or service. Metrics, alerts, docs, deployment updates, how it scales, etc. need to be thought about, updated, and/or created regardless of whether it is a new service or developed as part of an existing one.

slimsag

comment created time in 7 days

pull request comment sourcegraph/about

distribution roadmap

@sqs It is, but we have to update it to match our current goals.

slimsag

comment created time in 8 days

issue comment sourcegraph/sourcegraph

Disable low resource utilization alerts

We have some changes already in 3.19 that might mitigate this, and we will re-review this issue after that release.

pecigonzalo

comment created time in 8 days

pull request comment sourcegraph/sourcegraph

monitoring: encourage silencing, render entry for alerts w/o solutions

@bobheadxi I don't share that concern. If they have a link or a clear relationship to how to silence alerts, I think it's actually more likely that someone will search for "silence alerts sourcegraph" than wait for an alert to pop up. If we want to point them in the right direction, we could link to the "how to silence" page/section from the alert. If it's hard to find the sections and understand our documentation on how to monitor and manage alerts, we should fix that instead.

In general, I would actually not encourage silencing without an expiry, as it's likely the silence will remain there after the issue is fixed.

bobheadxi

comment created time in 8 days

pull request comment sourcegraph/sourcegraph

monitoring: encourage silencing, render entry for alerts w/o solutions

I think it would be simpler to have a header that says "silencing alerts" and tells you how to do it in a generic fashion and we can reference/link that instead.

Silence an alert

If you are aware of an alert and want to silence notifications for it, add the following to your site configuration:

{
  "observability.silenceAlerts": [
    "ALERT_NAME"
  ]
}

You can find the ALERT_NAME on lorem ipsum

bobheadxi

comment created time in 8 days

pull request comment sourcegraph/about

cloud: document manual migrations we're performing

I think there are several reasons to avoid supporting that in the service. Security-wise, if the service needs to create the user, it means the service has admin permissions to create users and other functionality, as it requires them for migration, which is in most cases undesired.

Aside from that, while you can perform any action in a migration, as it can execute any SQL command, I don't think this is a migration, as the scope of the change is outside of its own database. Migrations traditionally manage the schema of a database, but not the database itself or any other part of the system. I would say this is provisioning and configuration, which, in my opinion, is not in the scope of the service. Allowing admins to provision and manage their system using their own tools favors composability and reduces the number of permutations we have to account for when validating that we can provision a database, because we don't allow customers to do it on their own. We already have customers with restricted environments who will most likely hit this issue.

Just to clarify: we already require admins to provision a database for Sourcegraph; performing these migrations will require them to have a dedicated database server, not just a database for Sourcegraph.

slimsag

comment created time in 9 days

started samber/awesome-prometheus-alerts

started time in 9 days

push event pecigonzalo/mothership

push time in 9 days

push event pecigonzalo/mothership

Gonzalo Peci

commit sha 829fd8a9ce82429ced4ab684a3713ff781cc6f29

--wip-- [skip ci]

view details

Gonzalo Peci

commit sha 7c76159efc71727d3ef247124f780b76d40f2dc4

--wip-- [skip ci]

view details

Gonzalo Peci

commit sha a2dad90b6a71eea14386e1b314156d4939616856

--wip-- [skip ci]

view details

Gonzalo Peci

commit sha 3f8d4786efdb1ac3d78a9b2e3a0a954bdc097623

--wip-- [skip ci]

view details

Gonzalo Peci

commit sha 2a0eecc6843c2b6186ffb61a9afa6bbf4f28ffd3

Fix network compat for ansible

view details

push time in 9 days

issue opened sourcegraph/sourcegraph

Disable low resource utilization alerts

We currently have multiple [WARNING] "less than X" alerts that notify about services or resources that are over-provisioned. As we are not periodically reviewing and actioning these alerts, we want to remove their notifications and re-assess how to implement these alerts in the future.

Task

  • [ ] Disable low resource utilization notifications via Slack
  • [ ] Disable low resource utilization notifications via site-admin
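Disabling the notifications could presumably reuse the `observability.silenceAlerts` site-configuration setting discussed elsewhere in this feed; the alert name below is a placeholder, not a real alert identifier:

```json
{
  "observability.silenceAlerts": [
    "warning_container_cpu_usage_low"
  ]
}
```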

created time in 10 days

issue opened sourcegraph/src-cli

Disable low resource utilization alerts

We currently have multiple [WARNING] "less than X" alerts, which fire at different thresholds and levels, that notify about services that are over-provisioned. As we are not periodically reviewing and actioning the alerts, we want to remove their notifications and re-assess how to implement these alerts in the future.

Task

  • [ ] Disable low resource utilization notifications via Slack
  • [ ] Disable low resource utilization notifications via site-admin

created time in 10 days

pull request comment sourcegraph/about

cloud: document manual migrations we're performing

@tsenart I don't think there should be any for indexes, unless we are testing something, and even then they could be done in code and feature-flagged or reverted.

There are other cases being referenced here, which are non-application tasks, such as the read-only user creation. Those types of changes or settings should not be part of the application migration, as they are not part of the application; they are more on the side of provisioning and deployment, and are not relevant to all environments and deployments.

Let's say we would like to require one or multiple read-only users: we should not impose how those are created, as different environments will deploy and configure their databases differently; some might not even grant the Sourcegraph migration script the permissions to create users. As an administrator, I should be able to choose to use my own database and administer it following my internal requirements and guidelines, as long as it meets the requirements of the Sourcegraph service.

slimsag

comment created time in 10 days

issue comment sourcegraph/sourcegraph

Distribution: 3.19 Tracking issue

Last week

We finished our initial team goals, and I also finalized the review of RFC-199. We will run a test using microVMs with Ignite for a v0 and will have to review the outcome of that testing before we can move to v1 and define how we deploy/support/HA/etc.

This week

We will kick off our 360 review cycle and I will focus on that. I'll be working on the roadmap and a product readiness document with Stephen, and will pair with Geoffrey to get more familiar with our Dhall implementation. I have not been able to progress RFC-202; if time allows, I would like to finish that up.

Team update

The high priority items from last week seem to be resolved, and we will return to our tracking issue priorities.

I'll update this again after I confirm those issues are resolved.

pecigonzalo

comment created time in 10 days

delete branch sourcegraph/about

delete branch : gp/commitments

delete time in 13 days

push event sourcegraph/about

Gonzalo Peci

commit sha 539795f5b3b8b05f85d88ab44eb63fc2f822e9fd

distribution: Creating GCP commitments (#1312)

view details

push time in 13 days

PR merged sourcegraph/about

distribution: Creating GCP commitments

Document current commitments and the commitment creation process.

+47 -0

0 comment

2 changed files

pecigonzalo

pr closed time in 13 days

PR opened sourcegraph/about

distribution: Creating GCP commitments

Document current commitments and the commitment creation process.

+47 -0

0 comment

2 changed files

pr created time in 13 days

create branch sourcegraph/about

branch : gp/commitments

created branch time in 13 days

delete branch sourcegraph/about

delete branch : gp/distribution-goals

delete time in 15 days

push event sourcegraph/about

Gonzalo Peci

commit sha a601f73e85e97ac3fd1c0b1ec32e8b467e344e78

Add distribution team goals (#1294)

* Add distribution team goals
* fixup! Add distribution team goals
* Guide goals update
* Update planning processs
* fixup! Guide goals update
* Update handbook/engineering/distribution/goals.md

  Co-authored-by: uwedeportivo <534011+uwedeportivo@users.noreply.github.com>

* fixup! Guide goals update

Co-authored-by: uwedeportivo <534011+uwedeportivo@users.noreply.github.com>

view details

push time in 15 days

PR merged sourcegraph/about

Add distribution team goals

Adds our initial team goals

+49 -3

0 comment

3 changed files

pecigonzalo

pr closed time in 15 days

push event sourcegraph/about

Gonzalo Peci

commit sha f76df3a324fd67d4e8aef355cc592ed72cd77354

fixup! Guide goals update

view details

push time in 15 days

Pull request review comment sourcegraph/about

Add distribution team goals

# Goals

Goals are continuously updated and reviewed. If you find these goals do not reflect our current priorities or are out of date, please update them as soon as possible or add it as a topic to our [weekly sync](recurring_processes.md#weekly-distribution-team-sync).

## Medium-term goals

### Any engineer at Sourcegraph can create a release for all of our supported deployment types by running a single command

Creating a new release for our deployments is currently a semi-automated process, which requires several manual steps and synchronizing our versioned artifacts (Sourcegraph, Kubernetes manifests, docker-compose manifests, etc). We want to enable any engineer to perform a release as often as needed, to enable this we want to make releasing Sourcegraph a simple, automated process.

- **Owner**: Distribution Team
- **Status**: In Progress
- **Outcomes**:
  - Releases can be triggered by a single manual step
  - All supported deployment types are released at the same time with the same command
  - Support documentation enables any engineer to perform a release with confidence

### Upgrades between releases are easy to perform

Performing upgrades to deployments is currently a complicated process that requires keeping a fork of our configuration and resolving diff conflicts when performing upgrades which are often complicated as the configuration might contain environment-specific customization.
This process creates a bad experience for our customers because of the unknown amount of effort of the upgrade process.
We will start by looking at our Kubernetes deployment and working on an easier update process.

- **Owner**: Distribution Team
- **Status**: In Progress
- **Outcomes**:
  - Upgrades to deployments do not require resolving diff conflicts from upstream
  - Upgrading a deployment configuration requires less than 2 hours of work

### Improve the debugging and troubleshooting process

As we deploy Sourcegraph to multiple dissimilar environments, we need to provide a consistent and straight forward process. We will initially focus on reducing the time it takes to collect troubleshooting information.

As we deploy Sourcegraph to multiple dissimilar environments, we need to provide a consistent and straight forward process to debug issues. We are currently lacking tools to collect debugging information (configuration, type, size, diff from upstream, etc) consistently and a process to capture the output of debugging sessions to feed back into our priorities and documentation. We will initially focus on reducing the time it takes to collect troubleshooting information.

I think this might reflect it better.

pecigonzalo

comment created time in 15 days

delete branch sourcegraph/about

delete branch : gp/terraform-state-guide

delete time in 15 days

push event sourcegraph/about

Gonzalo Peci

commit sha fcef7b5a2ad197b3f9e0448f5e5acf5caa024103

Document Terraform state styleguide (#1297)

* Document Terraform state styleguide
* Set @sourcegraph/distribution as CODEOWNERS for terraform style

view details

push time in 15 days

PR merged sourcegraph/about

Reviewers
Document Terraform state styleguide

This will add information about our standard for terraform state configuration.

+31 -0

1 comment

2 changed files

pecigonzalo

pr closed time in 15 days

push event sourcegraph/about

Gonzalo Peci

commit sha 2cd48c9576861d3ade8a99e4e5d335a2cf3557a5

Update handbook/engineering/distribution/goals.md Co-authored-by: uwedeportivo <534011+uwedeportivo@users.noreply.github.com>

view details

push time in 15 days

push event sourcegraph/about

Gonzalo Peci

commit sha b2e3f6f4cb7fb6121a5a8c946ef0d6edd59cd579

Set @sourcegraph/distribution as CODEOWNERS for terraform style

view details

push time in 15 days

Pull request review comment sourcegraph/about

Document Terraform state styleguide

- General Terraform [styleguide](https://www.terraform.io/docs/configuration/style.html)

## State

State must be stored using a [GCS Terraform state backend](https://www.terraform.io/docs/backends/types/gcs.html).

Example configuration:

```
terraform {
  required_version = "0.12.26"

  backend "gcs" {
    bucket = "sourcegraph-tfstate"
    prefix = "infrastructure/dns"
  }
}
```

### State for state buckets

Because we create state buckets as code, we also need to store the state of the code that creates the state buckets. Given that this code rarely changes and that moving its state to a remote location creates a chicken-and-egg situation, we will store the state for state bucket creation in Git.

I have that on a shirt :D

pecigonzalo

comment created time in 15 days

PR opened sourcegraph/about

Reviewers
Document Terraform state styleguide

This will add information about our standard for terraform state configuration.

+30 -0

0 comment

1 changed file

pr created time in 15 days

create branch sourcegraph/about

branch : gp/terraform-state-guide

created branch time in 15 days

issue closed sourcegraph/sourcegraph

Migrate terraform state to GCP

Currently, the following terraform deployments rely on local state, with developers running a terraform apply and then checking in their code plus a state file into the repo. In doing so we assume the risk that developers could corrupt a state file or forget to check it in.

The following terraform deployments should be migrated to use remote state in GCP:

- [ ] https://github.com/sourcegraph/infrastructure/tree/master/cloud
- [ ] https://github.com/sourcegraph/infrastructure/tree/master/dns
- [ ] https://github.com/sourcegraph/infrastructure/tree/master/site24x7

TODO

- [ ] Determine a naming scheme for each deployment

closed time in 15 days

davejrt

push event sourcegraph/about

Gonzalo Peci

commit sha 7556590c718852da480ed2cf79bffc151c99dd83

fixup! Guide goals update

view details

push time in 15 days

push event sourcegraph/about

Gonzalo Peci

commit sha 454a2ae757d1e8d290c194cf9c488065457804c6

Guide goals update

view details

Gonzalo Peci

commit sha e58d6b0d53f21c70438410f1aae98bdbe40b064f

Update planning processs

view details

push time in 15 days

push event sourcegraph/about

Gonzalo Peci

commit sha 155a608b272bb9d9c063dc288cb4db668aa9d6c2

fixup! Add distribution team goals

view details

push time in 15 days

PR opened sourcegraph/about

Add distribution team goals
+43 -0

0 comment

2 changed files

pr created time in 15 days

create branch sourcegraph/about

branch : gp/distribution-goals

created branch time in 15 days

pull request comment sourcegraph/sourcegraph

search: trace and observe each zoekt host

I would be careful with this sort of metric, as it can create a cardinality explosion in the metrics DB. In most cases, I believe the downstream service should provide the metrics itself if possible, and this service should only expose its own health (general latency to the upstream, etc.)

keegancsmith

comment created time in 16 days

started KartikChugh/Otto

started time in 16 days

pull request comment sourcegraph/sourcegraph

search: trace and observe each zoekt host

In theory, each service should already have an identifier, as Prometheus adds one from discovery. I'll verify.

keegancsmith

comment created time in 16 days

fork pecigonzalo/beancount

Official Beancount repository.

fork in 16 days

started beancount/beancount

started time in 16 days

fork pecigonzalo/fava

Fava - web interface for Beancount

https://beancount.github.io/fava/

fork in 16 days

started beancount/fava

started time in 16 days

issue comment sourcegraph/sourcegraph

Distribution: 3.19 Tracking issue

Week July 20

Last week's focus was working with the team to set our team goals. The experiment with using GitHub projects to track progress seems to be working, and I'll continue with it during the rest of the iteration. I have also been talking with Chayim about the secrets loading implementation.

Week July 27

I'll continue to focus on our team goals; we have settled on them, but we are still working out the details. I will also try to finalize RFC 202 and the review of RFC 199.

Team update

Issue https://github.com/sourcegraph/customer/issues/65 has been resolved, but our focus remains on the sub-issues created from it: https://github.com/sourcegraph/customer/issues/69 and https://github.com/sourcegraph/customer/issues/70.

pecigonzalo

comment created time in 17 days

Pull request review comment sourcegraph/about

update values

```diff
 # Sourcegraph values

-Our values are:
+These values are some of the beliefs and principles that help us achieve our [goals](goals/index.md) and [vision](strategy.md#vision).

-## People
+This list isn't intended to cover everything we care about; instead, it lists the values that we frequently find useful and refer to. We'll keep this list up to date with the frequently used beliefs and principles (adding, editing, and removing entries as needed). Our hope is that this makes this list more accurate and useful than if it were a list of stale, vague, aspirational, or obvious values.

-Together we are advancing technology for the good of people all around the world. We will attract, hire and retain the best teammates in the world and treat everyone in a first-class manner.
+## High quality

-## Journey
+Every person on our team is individually responsible for knowing what high-quality work looks like and producing high-quality work.
```

I like this point, but its description seems to be aimed more towards self-management than towards delivering high quality.

sqs

comment created time in 17 days

pull request comment sourcegraph/sourcegraph

monitoring: implement owner routing for alerts

> I'll add something to check on the rendered routes

Exactly, we should not test Alertmanager itself or that it routes properly, only that we render the expected config given X input.

bobheadxi

comment created time in 17 days

pull request comment sourcegraph/sourcegraph

monitoring: implement owner routing for alerts

I would use `owners` similarly to how we use `level`, rather than `onLevel`. Could we add some tests to ensure it behaves as expected without doing a full deployment?

bobheadxi

comment created time in 17 days

started hwayne/awesome-cold-showers

started time in 18 days

issue comment sourcegraph/sourcegraph

Bare-metal Buildkite agents capable of running Docker and VMs

Maybe we could use https://github.com/firecracker-microvm/firecracker in a similar way to RFC 199

slimsag

comment created time in 20 days

issue comment sourcegraph/sourcegraph

Proposal: Move monitoring configuration closer to the service code, use TOML

It would be great to define or link the original problem we are trying to fix. I believe there was some talk about it in relation to https://github.com/sourcegraph/about/pull/1221, but I can't find the content to link it directly; maybe it was discussed in a meeting.

The main benefit I see with the Go-based generator is generating Grafana dashboards from the same code we use to define rules, plus all the wrapping we had to do because of alert_count. Ideally, services would define their dashboards and alerts automagically as part of their service definition/code, just as they define their metrics, although this could end up being even more complex.

The main downsides for me are readability and how easy it is to grok the output from the code, given the abstraction; this was made more complicated for me because parts of its config are scattered across places (site config, generator, static files, configmap). Maybe this is not a problem of the generator itself, and it's just about making it simpler to onboard to by cleaning up old configs, reducing the number of places this is spread across, and documenting what goes where and how.

As an example, leaving Grafana aside, an alert rule without all the wrapping is quite simple to understand and even write:

```yaml
groups:
- name: replacer
  rules:
  - alert: replacer_frontend_internal_api_error_responses
    expr: sum by(category) (increase(src_frontend_internal_request_duration_seconds_count{job="replacer",code!~"2.."}[5m])) > X
    labels:
      level: warning
      service_name: replacer
    annotations:
      summary: SOME summary
      description: SOME description
```
Or even "simpler" if we use generic alerts instead, as we will not have multiple copies of this same alert

```yaml
groups:
- name: api_error_responses
  rules:
  - alert: frontend_internal_api_error_responses
    expr: sum by(category) (increase(src_frontend_internal_request_duration_seconds_count{code!~"2.."}[5m])) > X
    labels:
      level: warning
      service_name: "{{ $labels.job }}"
    annotations:
      summary: "SOME {{ $labels.foo }} summary"
      description: |
        SOME {{ $labels.bar }} description
```

Both are easier to follow than the following Go definition, where I also need to go to each type and function to understand what it will do:

```go
package main

func Replacer() *Container {
	return &Container{
		Name:        "replacer",
		Title:       "Replacer",
		Description: "Backend for find-and-replace operations.",
		Groups: []Group{
			{
				Title: "General",
				Rows: []Row{
					{
						sharedFrontendInternalAPIErrorResponses("replacer"),
					},
				},
			},
		},
	}
}
```
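For context, the `Container`/`Group`/`Row` types used above might look roughly like the following. This is a hypothetical sketch inferred only from the `Replacer()` example, not the generator's actual definitions, and `sharedFrontendInternalAPIErrorResponses` is sketched here as an assumption:

```go
package main

import "fmt"

// Hypothetical shapes, inferred from the Replacer() usage above; the
// real generator's types and fields may differ.
type Observable struct {
	Name  string
	Query string
}

// A Row groups one or more observables (panels/alerts) side by side.
type Row []Observable

type Group struct {
	Title string
	Rows  []Row
}

type Container struct {
	Name        string
	Title       string
	Description string
	Groups      []Group
}

// sharedFrontendInternalAPIErrorResponses sketches a shared alert
// definition parameterized by service name (an assumption about how
// the helper works, based on its call site above).
func sharedFrontendInternalAPIErrorResponses(service string) Observable {
	return Observable{
		Name:  service + "_frontend_internal_api_error_responses",
		Query: fmt.Sprintf(`sum by(category) (increase(src_frontend_internal_request_duration_seconds_count{job=%q,code!~"2.."}[5m]))`, service),
	}
}

func main() {
	obs := sharedFrontendInternalAPIErrorResponses("replacer")
	fmt.Println(obs.Name) // replacer_frontend_internal_api_error_responses
}
```

The point of the sketch is that the shared helper hides the query entirely, which is where the readability cost comes from.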

A rule with all the wrapping would be really ugly, because we have to repeat the wrapper everywhere:

```
clamp_max(clamp_min(floor(
      max((((
THE_REAL_QUERY
) OR on() vector(0)) >= 0) OR on() vector(1))
      ), 0), 1) OR on() vector(1)
```

Therefore, I assume the generator is mainly there to help us resolve three things:

- Generating Grafana dashboards, because they are cumbersome to write and read
- Generating the wrapping required by alert_count and other current requirements (this could be changed/fixed)
- Ensuring alerts/rules conform to certain requirements
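The wrapping point is exactly what a generator handles well. As a minimal sketch (the `wrapForAlertCount` helper below is hypothetical, not the actual generator code), the ugly clamp/floor wrapper can be applied mechanically so humans only ever write the real query:

```go
package main

import "fmt"

// wrapForAlertCount is a hypothetical helper showing how a generator
// can mechanically apply the clamp/floor wrapper quoted above, so the
// hand-written part is only THE_REAL_QUERY.
func wrapForAlertCount(query string) string {
	// Clamp the result to 0 or 1 so it can be aggregated into an
	// alert_count-style metric, defaulting to firing (1) when the
	// inner query returns no data.
	return fmt.Sprintf(
		"clamp_max(clamp_min(floor(\n      max((((\n%s\n) OR on() vector(0)) >= 0) OR on() vector(1))\n      ), 0), 1) OR on() vector(1)",
		query,
	)
}

func main() {
	q := `sum by(category) (increase(src_frontend_internal_request_duration_seconds_count{job="replacer",code!~"2.."}[5m])) > 5`
	fmt.Println(wrapForAlertCount(q))
}
```

With something like this, the verbose wrapper lives in one place and the review surface stays as small as the plain YAML examples above.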

> I strongly feel Dhall would not be a good choice here, that feels like raising the barrier even further for most on the team.

I think this is a valid concern for Dhall right now, but if we use it for Kubernetes then it will be required anyway; otherwise, the same concern applies there.

slimsag

comment created time in 20 days
