pull request comment openshift/machine-config-operator

Bug 1775009: pkg/controller: do not enqueue a nil MCP

/cherrypick release-4.2

runcom

comment created time in an hour

PR opened openshift/machine-config-operator

Bug 1775009: pkg/controller: do not enqueue a nil MCP

If a node, for any reason, belongs only to a custom pool, do not enqueue a nil worker MCP, as it would panic when processed.
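A minimal sketch of the kind of guard this describes, not the actual patch: the `MachineConfigPool` struct and the `poolsToEnqueue` helper below are hypothetical stand-ins, and the only point is that a nil pool is filtered out before anything gets enqueued.

```go
package main

import "fmt"

// MachineConfigPool is a hypothetical, simplified stand-in for the real MCP type.
type MachineConfigPool struct {
	Name string
}

// poolsToEnqueue (hypothetical helper) returns only the pools that are non-nil,
// so a node that belongs only to a custom pool never yields a nil worker pool.
func poolsToEnqueue(worker, custom *MachineConfigPool) []*MachineConfigPool {
	var pools []*MachineConfigPool
	for _, p := range []*MachineConfigPool{worker, custom} {
		if p != nil {
			pools = append(pools, p)
		}
	}
	return pools
}

func main() {
	// The node is only in a custom pool: the worker pool lookup returned nil.
	custom := &MachineConfigPool{Name: "infra"}
	for _, p := range poolsToEnqueue(nil, custom) {
		fmt.Println("enqueue", p.Name)
	}
}
```

With a nil worker pool, only the custom pool is handed to the queue, so nothing downstream dereferences a nil pointer.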

Signed-off-by: Antonio Murdaca <runcom@linux.com>

+5 -1

0 comment

1 changed file

pr created time in an hour

pull request comment openshift/machine-config-operator

Bug 1772440: OpenStack: decrease infra pods resource consumption

/bugzilla refresh

Fedosin

comment created time in an hour

create branch runcom/machine-config-operator

branch : worker-nil

created branch time in an hour

pull request comment openshift/machine-config-operator

Add manifest annotations for hosted deployment exclusions

/hold cancel

keeping the hold till @derekwaynecarr comes back since we're past freeze

csrwng

comment created time in 2 days

pull request comment openshift/machine-config-operator

baremetal: Add external DNS entries to coredns

/hold

since freeze we now need a BZ for it

cybertron

comment created time in 2 days

pull request comment openshift/machine-config-operator

[Baremetal] Haproxy add support for IPv6 frontend

/hold

since freeze

yboaron

comment created time in 2 days

pull request comment openshift/machine-config-operator

[Baremetal] Haproxy add support for IPv6 frontend

/approve

yboaron

comment created time in 2 days

issue comment openshift/machine-config-operator

Add an option to cancel current deployment

TL DR: any way to cancel the current deployment/sync?

delete the MC? if the rollout isn't finished, it'll stop since you no longer want that.

Imagine you added a validated but wrong file config that causes workers to fail in any way

if the worker fails at a system/kube level, the MCO won't roll out to other nodes, this is how it works.

gmontalvoy

comment created time in 3 days

pull request comment openshift/machine-config-operator

Bug 1770273: Verify all containers are dead in stop_all_containers() function

/approve

@hexfusion can you take a last pass on this?

retroflexer

comment created time in 6 days

push event runcom/machine-config-operator

Antonio Murdaca

commit sha 5b97432531b9e56c0a598c911a1390443fd37a07

pkg/controller: allow kubelet config and runtime changes for custom pools Having kubelet config or runtime MCs for custom pools isn't possible today. The reason for that was to avoid risking drift between workers when it comes to kubelet and runtime configs. This patch changes that behavior by allowing custom pools to use the worker base templates in order to generate MCs for kubelet and runtime configs. Signed-off-by: Antonio Murdaca <runcom@linux.com>

push time in 6 days

pull request comment openshift/machine-config-operator

Bug 1755558: Ensure ETCD_INITIAL_CLUSTER is preserved during restore

re-echoing the lgtm and leaving to patch manager

/lgtm

smarterclayton

comment created time in 6 days

pull request comment openshift/machine-config-operator

etcd: Add initial support for an IPv6 control plane

This...seems safer to do in 4.4 to me. But I know that's annoying as it makes development now harder.

I'm agreeing with this actually, we cannot take this PR anyway at this point tho. It looks pretty safe to me tho, so when 4.4 opens it can go in

russellb

comment created time in 6 days

delete branch runcom/machine-config-operator

delete branch : dockefile-fixes

delete time in 18 days

PR opened kikisdeliveryservice/telemeter

update manifests

Signed-off-by: Antonio Murdaca <runcom@linux.com>

+6 -1

0 comment

4 changed files

pr created time in 21 days

create branch runcom/telemeter

branch : generate-manfiests

created branch time in 21 days

fork runcom/telemeter

Prometheus push federation

fork in 21 days

pull request comment openshift/telemeter

[WIP] metrics: add telemetry to track mcd_host_os_and_version

So, using Docker (on osx) yields the same as @LiliC:

➜  telemeter git:(pr/origin/257) ✗ git st
On branch pr/origin/257
Changes not staged for commit:
  (use "git add <file>..." to update what will be committed)
  (use "git restore <file>..." to discard changes in working directory)
	modified:   docs/data-collection.md
	modified:   docs/sample-metrics.md
	modified:   manifests/benchmark/statefulSetTelemeterServer.yaml
	modified:   manifests/client/deployment.yaml

no changes added to commit (use "git add" and/or "git commit -a")

@kikisdeliveryservice are you using podman on fedora? any chance you'd install docker from upstream and retry? (Dan Walsh is gonna kill me but 🤷‍♂ )

kikisdeliveryservice

comment created time in 21 days

pull request comment openshift/machine-config-operator

install/telemetry: add prometheus alerts

/skip /lgtm

kikisdeliveryservice

comment created time in 21 days

pull request comment openshift/machine-config-operator

daemon: Refuse to disable FIPS mode

This makes sense to me. Should we refuse to let them change FIPS at all since Day 2 wouldn't be valid?

yeah, I was actually thinking that if day 2 isn't valid at all, we might want to drop FIPS from the MC completely: you either install with FIPS or you don't. That would be the safest option to me, w/o providing any way to disable it through the MCO

cgwalters

comment created time in 21 days

PR opened openshift/ocp-build-data

MCO: add runcom and steve to MCO build

Adding myself and Steve for notifications (so that when it breaks we're aware instantly)

cc @ashcrow

Signed-off-by: Antonio Murdaca <runcom@linux.com>

+2 -0

0 comment

1 changed file

pr created time in 21 days

create branch runcom/ocp-build-data

branch : add-runcom-steve

created branch time in 21 days

fork runcom/ocp-build-data

Configuration data used to build OCP images

fork in 21 days

PR opened openshift/machine-config-operator

remove make binaries from Dockerfiles and use -mod=vendor when building

Fixes an issue with builds in ART as well

cc @cgwalters

+1 -3

0 comment

3 changed files

pr created time in 21 days

create branch runcom/machine-config-operator

branch : dockefile-fixes

created branch time in 21 days

pull request comment openshift/machine-config-operator

install/telemetry: add prometheus alerts

/approve /lgtm

kikisdeliveryservice

comment created time in 21 days

pull request comment openshift/machine-config-operator

[fcos] Add vrutkovs to OWNERS

/override ci/prow/build-rpms-from-tar /override ci/prow/e2e-aws

vrutkovs

comment created time in 21 days

pull request comment openshift/machine-config-operator

[fcos] Add vrutkovs to OWNERS

I doubt I can, but trying

/override

vrutkovs

comment created time in 21 days

pull request comment openshift/machine-config-operator

[fcos] Add vrutkovs to OWNERS

/approve /lgtm

vrutkovs

comment created time in 21 days

pull request comment openshift/machine-config-operator

give etcd-metrics container privilege

@runcom updated. I am pretty sure it's kube that is doing this. CRI-O has no understanding of static pods, and only gives containers privilege when it's told to by kubelet

yeah, s/CRIO/kube/g sounds fine to me 👍

/lgtm

haircommander

comment created time in 23 days

pull request comment openshift/machine-config-operator

give etcd-metrics container privilege

so talked with @mrunalp - here's why we need this PR now (@haircommander would be nice to update the commit message with this):

  • CRI-O 1.14 would turn privileged on for every container in a static pod as long as just one had privileged on. That's why etcd was still running in privileged in 1.14 even w/o the flag, see #526
  • CRI-O 1.16 (kube?) has disabled that now and we need to set that for every container in a static pod
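The two behaviors described in the bullets above can be sketched as follows. This is only an illustration under stated assumptions: the `Container` struct and both functions are hypothetical and model the behavior as the comment describes it, not actual CRI-O or kubelet code.

```go
package main

import "fmt"

// Container is a hypothetical, simplified view of a container in a static pod.
type Container struct {
	Name       string
	Privileged bool
}

// effectiveOld models the 1.14 behavior as described in the comment:
// if any container in the pod is privileged, every container ends up privileged.
func effectiveOld(containers []Container) map[string]bool {
	anyPrivileged := false
	for _, c := range containers {
		if c.Privileged {
			anyPrivileged = true
		}
	}
	out := make(map[string]bool, len(containers))
	for _, c := range containers {
		out[c.Name] = anyPrivileged
	}
	return out
}

// effectiveNew models the 1.16 behavior as described in the comment:
// each container is privileged only if it asks for it itself.
func effectiveNew(containers []Container) map[string]bool {
	out := make(map[string]bool, len(containers))
	for _, c := range containers {
		out[c.Name] = c.Privileged
	}
	return out
}

func main() {
	pod := []Container{
		{Name: "etcd-member", Privileged: true},
		{Name: "etcd-metrics", Privileged: false},
	}
	fmt.Println("old etcd-metrics privileged:", effectiveOld(pod)["etcd-metrics"])
	fmt.Println("new etcd-metrics privileged:", effectiveNew(pod)["etcd-metrics"])
}
```

Under the per-container model, a container that needs privilege has to set the flag itself, which is what this PR does for etcd-metrics.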
haircommander

comment created time in 23 days

pull request comment openshift/machine-config-operator

give etcd-metrics container privilege

cross linking the original change that dropped the privileged flag https://github.com/openshift/machine-config-operator/pull/526

haircommander

comment created time in 23 days

pull request comment openshift/machine-config-operator

give etcd-metrics container privilege

@rphillips ptal (Sam's out also, not sure who to ping from etcd, but at some point this privileged bit was dropped because it had a typo and now it's coming back)

haircommander

comment created time in 23 days

pull request comment openshift/machine-config-operator

templates: Let m-c-d binary on host process encapsulated MC on firstboot

I'm about to drop the hold once I run a 4.1->4.2 upgrade

sinnykumari

comment created time in 23 days

pull request comment openshift/machine-config-operator

[release-4.1] pkg/daemon: remove force validation file if it exists

/retest

need to get BZs for this to get in for 4.1 and 4.2

openshift-cherrypick-robot

comment created time in 23 days

pull request comment openshift/installer

data/rhcos: Bump to rhcos-4.3/ 43.81.20191028.2

/approve

this is a prereq to land kargs day1 support in MCO also

sinnykumari

comment created time in 23 days

pull request comment openshift/installer

data/rhcos: Bump to rhcos-4.3/ 43.81.20191028.2

scaleup-rhel7 job is flaky as hell, and it's not related to the kargs change as the workers are rhel7 workers and we don't apply kargs there.

I'm not sure about metal and azure failures tho but aws works 👍

sinnykumari

comment created time in 23 days

pull request comment openshift/installer

data/rhcos: Bump to rhcos-4.3/ 43.81.20191028.2

/retest

sinnykumari

comment created time in 23 days

pull request comment openshift/machine-config-operator

daemon/telemetry: implement basic prometheus metrics

/skip /approve

leaving to @yuqi-zhang @ericavonb for a final lgtm

kikisdeliveryservice

comment created time in 23 days

pull request comment openshift/installer

data/rhcos: Bump to rhcos-4.3/ 43.81.20191025.3

/retest /approve

sinnykumari

comment created time in 24 days

pull request comment openshift/machine-config-operator

Set correct labels to ovirt and kni infra pods

/approve

Fedosin

comment created time in 24 days

pull request comment openshift/machine-config-operator

build-sys: Remove dead code in Makefile, fix HACKING.md typo

needs rebase

/approve

cgwalters

comment created time in 25 days

push event runcom/machine-config-operator

Antonio Murdaca

commit sha fef22819c7618cacb25afd2fb93cc4b5ee1ecd95

pkg/daemon: rollback dropins Signed-off-by: Antonio Murdaca <runcom@linux.com>

push time in a month

push event runcom/machine-config-operator

Antonio Murdaca

commit sha 0afa0d44e87b80fcadbabc235e3c42c188f7be16

templates: rename our dropins to include the mco string Mainly to avoid people shipping something which could override the MCO files. Signed-off-by: Antonio Murdaca <runcom@linux.com>

push time in a month

pull request comment openshift/machine-config-operator

Bug 1764116: pkg/daemon: validate on-disk when in desired config

About this PR and the bug tho: the issue was mainly the inability to roll back, and I think that's because the pools are hitting their maxUnavailable, so bumping that to 2 will reconcile the cluster. I'm verifying that. We can then get the rename in and postpone any further discussion about validation until spec 3 is in the MCO, maybe (?)

runcom

comment created time in a month

push event runcom/machine-config-operator

Antonio Murdaca

commit sha 7ef963637eb3a4c8a266af3755c6a8a2408806d2

templates: rename our dropins to include the mco string Mainly to avoid people shipping something which could override the MCO files. Signed-off-by: Antonio Murdaca <runcom@linux.com>

push time in a month

pull request comment openshift/machine-config-operator

Bug 1764116: pkg/daemon: validate on-disk when in desired config

I'm updating this PR to move to -mco- for our dropins meanwhile - still doesn't solve the issue tho :(

runcom

comment created time in a month

pull request comment openshift/machine-config-operator

Bug 1764116: pkg/daemon: validate on-disk when in desired config

and hopefully users would not include -mco in their names.

so if they do instead, I think we need to avoid rendering and communicate that, how does that sound?

runcom

comment created time in a month

pull request comment openshift/machine-config-operator

Bug 1764116: pkg/daemon: validate on-disk when in desired config

But the clear shorter term fix is to change our validation to check the last one for consistency with what we actually write.

I don't believe this is the right thing to do now as a short term hack - the cause here is writing a file at a location and then writing a dropin config which later writes there. Allowing the validation to pass means that any configuration already shipped (for things like crio and kubelet) can be firstly overridden and secondly skipped from validation. I believe this is a broader issue also: how does someone ship a crio dropin? should we allow that since crio is controlled by the CRC crd? hence it makes sense to just error out when someone provides a configuration that would overwrite anything specified before.

What do you all think?

runcom

comment created time in a month

pull request comment openshift/machine-config-operator

templates: Let m-c-d binary on host process encapsulated MC on firstboot

/retest

can't wait to test this out 👍

sinnykumari

comment created time in a month

issue comment openshift/machine-config-operator

Kubelet failed to join the cluster

@deads2k fyi (not sure who else to ping?)

liqlin2015

comment created time in a month

pull request comment openshift/machine-config-operator

Bug 1764116: pkg/daemon: validate on-disk when in desired config

Can you clarify? Do you mean a systemd drop in or an appended config?

yep, a systemd dropin.

runcom

comment created time in a month

pull request comment openshift/machine-config-operator

Bug 1764720: [release-4.2] kubelet: add dependency on network-online.target

/cherrypick release-4.1

BZ for 4.1 is here https://bugzilla.redhat.com/show_bug.cgi?id=1764719

openshift-cherrypick-robot

comment created time in a month

pull request comment openshift/machine-config-operator

Bug TODO: [release-4.2] kubelet: add dependency on network-online.target

cc @miabbott for the 4.2 BZ attachment

openshift-cherrypick-robot

comment created time in a month

pull request comment openshift/machine-config-operator

Bug 1763700: kubelet: add dependency on network-online.target

/cherrypick release-4.2

rphillips

comment created time in a month

pull request comment openshift/machine-config-operator

Bug 1763695: [release-4.2] pkg/daemon: drain before applying changes

Patch manager will add the label to pick this up, we're waiting on this to let it soak a bit in master

runcom

comment created time in a month

pull request comment openshift/machine-config-operator

Bug 1764116: pkg/daemon: validate on-disk when in desired config

The last one listed "wins" but only if overwrite is true

here the scenario is "we write a file at a dropin location" then "a dropin writes on that location again" ouch - does spec v3 disallow this?

runcom

comment created time in a month

pull request comment openshift/machine-config-operator

[release-4.2] Bug 1763205: revert #1177 and fix common templates in MCs

/retest

investigating the failure meanwhile

openshift-cherrypick-robot

comment created time in a month

pull request comment openshift/machine-config-operator

Include candidate IP in log message.

/lgtm

russellb

comment created time in a month

pull request comment openshift/machine-config-operator

Bug 1764116: pkg/daemon: validate on-disk when in desired config

I believe it's a hard error to have duplicate files in Ignition spec 3. We should also indeed disallow this in the MCO I'd say.

uhm, so then the MCC has to learn to generate rendered MCs by always using the last entry in alphabetical order? 🤔

runcom

comment created time in a month

pull request comment openshift/machine-config-operator

Bug 1764116: pkg/daemon: validate on-disk when in desired config

maybe we should completely avoid having users override what we ship?

runcom

comment created time in a month

pull request comment openshift/machine-config-operator

Bug 1764116: pkg/daemon: validate on-disk when in desired config

(This PR isn't working as intended also, so keep holding)

runcom

comment created time in a month

pull request comment openshift/machine-config-operator

Bug 1764116: pkg/daemon: validate on-disk when in desired config

@cgwalters wondering why this wasn't the case till the beginning :/

the actual issue here is that the rendered machineconfigs can contain duplicate entries for e.g. a service or unit. If that's the case, the validate routine can validate and fail only the first entry but what we have written on disk is the second one - should we change the validate routine to always check the last entry if there's a duplicate? This PR just makes sure we can rollback but maybe the fix to validation is needed as well.
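To make the "always check the last entry" idea concrete, here is a rough sketch assuming a simplified, hypothetical `Unit` type and a hypothetical `lastEntries` helper; it is not the MCO's validation code, just the deduplication step such a change would need before comparing against what is on disk.

```go
package main

import "fmt"

// Unit is a hypothetical, simplified unit entry from a rendered machineconfig.
type Unit struct {
	Name     string
	Contents string
}

// lastEntries (hypothetical helper) keeps, for each unit name, only the last
// entry in the list, preserving the order in which names first appear.
// The last entry is the one that ends up written on disk.
func lastEntries(units []Unit) []Unit {
	index := make(map[string]int)
	var out []Unit
	for _, u := range units {
		if i, seen := index[u.Name]; seen {
			out[i] = u // a later duplicate replaces the earlier entry
			continue
		}
		index[u.Name] = len(out)
		out = append(out, u)
	}
	return out
}

func main() {
	units := []Unit{
		{Name: "kubelet.service", Contents: "first"},
		{Name: "crio.service", Contents: "only"},
		{Name: "kubelet.service", Contents: "second"},
	}
	// Validation would then compare only these entries with the on-disk state.
	for _, u := range lastEntries(units) {
		fmt.Printf("%s => %s\n", u.Name, u.Contents)
	}
}
```

Validating the deduplicated list means the check and the on-disk state refer to the same entry, instead of failing on a first entry that was never the one written.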

runcom

comment created time in a month

pull request comment openshift/machine-config-operator

Bug 1764116: pkg/daemon: validate on-disk when in desired config

@cgwalters wondering why this wasn't the case till the beginning :/

runcom

comment created time in a month

push event runcom/machine-config-operator

Antonio Murdaca

commit sha bf106a89e9af43990ab9b14a523b1ee700822d95

pkg/daemon: validate on-disk when in desired config Validate what we have on disk once we know that we're indeed in our desired config. Otherwise we can get into a deadlock where we can't even rollback a bad machineconfig. Signed-off-by: Antonio Murdaca <runcom@linux.com>

push time in a month

PR opened openshift/machine-config-operator

Bug 1764001: pkg/daemon: validate on-disk when in desired config do-not-merge/hold

Validate what we have on disk once we know that we're indeed in our desired config. Otherwise we can get into a deadlock where we can't even rollback a bad machineconfig.

holding pending CI and further manual testing

Signed-off-by: Antonio Murdaca <runcom@linux.com>

+20 -18

0 comment

1 changed file

pr created time in a month

create branch runcom/machine-config-operator

branch : validate-indesired

created branch time in a month

pull request comment openshift/machine-config-operator

[release-4.2] Bug 1763205: revert #1177 and fix common templates in MCs

/approve

@mrunal you might need to override the bz label somehow I think

openshift-cherrypick-robot

comment created time in a month

delete branch runcom/machine-config-operator

delete branch : cc-versioninig

delete time in a month

delete branch runcom/machine-config-operator

delete branch : 42-fix-mco-images-race

delete time in a month

delete branch runcom/machine-config-operator

delete branch : master-fix-mco-images-race

delete time in a month

delete branch runcom/machine-config-operator

delete branch : 41-fix-mco-images-race

delete time in a month

Pull request review comment openshift/machine-config-operator

Bug 1763635: pkg/operator: fix race between images CM and MCO

 metadata:
 data:
   images.json: >
     {
+      "releaseVersion": "0.0.1-snapshot",

also, grep for 0.0.1-snapshot in openshift/origin - that's where the substitution is

runcom

comment created time in a month

Pull request review comment openshift/machine-config-operator

Bug 1763635: pkg/operator: fix race between images CM and MCO

 metadata:
 data:
   images.json: >
     {
+      "releaseVersion": "0.0.1-snapshot",

oc adm release new is substituting it according to my manual testing; if you can double check that, it would be amazing

runcom

comment created time in a month

pull request comment openshift/machine-config-operator

Build dockerfile cleanup

/skip

cgwalters

comment created time in a month

pull request comment openshift/machine-config-operator

Bug 1763635: pkg/operator: fix race between images CM and MCO

lifting the hold as there is enough evidence that this is the right approach and fix

/hold cancel

runcom

comment created time in a month