armon/bloomd 1167

C network daemon for bloom filters

armon/go-socks5 980

SOCKS5 server in Golang

armon/go-metrics 948

A Golang library for exporting performance and runtime metrics to external metrics systems (e.g. statsite, statsd)

armon/go-radix 598

Golang implementation of Radix trees

armon/libart 551

Adaptive Radix Trees implemented in C

armon/hlld 432

C network daemon for HyperLogLogs

armon/go-proxyproto 180

Golang package to handle HAProxy Proxy Protocol

armon/circbuf 139

Golang circular (ring) buffer

armon/consul-api 117

Golang API client for Consul

armon/go-chord 113

Golang implementation of the Chord protocol

fork yeshuibo/go-socks5

SOCKS5 server in Golang

fork time in an hour

issue comment hashicorp/consul

Consul node enters endless election loop after restart during network outage.

I was looking at the wrong rejoin earlier. That function in serf is attempting to reconnect to the existing nodes. The retry-join handling is https://github.com/hashicorp/consul/blob/master/agent/retry_join.go. It does retry multiple times.

I think the problem may be that -retry-join=127.0.0.1 stops the retry attempts because it sees connecting to itself as a successful join.
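A minimal sketch of the suspected behavior (hypothetical code; retryJoin and join here stand in for the real logic in agent/retry_join.go):

package main

import (
	"fmt"
	"time"
)

// retryJoin is a hypothetical stand-in for the retry-join loop: a join
// that reaches at least one node counts as success. With 127.0.0.1 in
// the list the agent always reaches itself, so the loop exits on the
// first pass and never retries the real peers.
func retryJoin(join func(addr string) error, addrs []string, interval time.Duration) {
	for {
		joined := 0
		for _, addr := range addrs {
			if err := join(addr); err == nil {
				joined++
			}
		}
		if joined > 0 {
			fmt.Printf("join completed: %d node(s) reached, stopping retries\n", joined)
			return
		}
		time.Sleep(interval)
	}
}

func main() {
	// Simulated outage: only the loopback "join" succeeds, yet retries stop.
	join := func(addr string) error {
		if addr == "127.0.0.1" {
			return nil // connecting to yourself always "works"
		}
		return fmt.Errorf("network unreachable: %s", addr)
	}
	retryJoin(join, []string{"consul1", "consul2", "127.0.0.1"}, time.Second)
}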

s-matyukevich

comment created time in 2 hours

Pull request review comment hashicorp/consul

Allow consul version/consul download url to be inputted via Terraform

 set -e
 # From: https://alestic.com/2010/12/ec2-user-data-output/
 exec > >(tee /var/log/user-data.log|logger -t user-data -s 2>/dev/console) 2>&1
 
+# Install Consul
+if [[ -n "${consul_download_url}" ]]; then
+/home/ubuntu/scripts/install-consul --download-url "${consul_download_url}"

This might look better indented to show the logic.

s-christoff

comment created time in 4 hours

issue comment hashicorp/consul

Configure time to cleanup failed consul clients when working with AWS Spot instances

@mkeeler how is it possible to advertise a timeout for only one specific agent? Isn't it a global setting?

sebamontini

comment created time in 4 hours

issue closed hashicorp/vault

Agent Auth Failure When Namespace in ENV and Config file

Describe the bug When starting Vault in agent mode with a config file and directing it to auto-auth an approle, specifying the namespace in the configuration file as well as the environment causes an HTTP 400/HTTP 403 error (depending on whether you're on Windows or Linux)

Specifying the namespace in only one place, ENV or config file, works as expected.

To Reproduce Steps to reproduce the behavior:

  1. export VAULT_NAMESPACE='my_namespace'
  2. vault agent -config ./agent-config.hcl
==> Vault server started! Log data will stream in below:

==> Vault agent configuration:

                     Cgo: disabled
               Log Level: trace
                 Version: Vault v1.3.2

2020-03-11T11:07:44.214-0700 [INFO]  sink.file: creating file sink
2020-03-11T11:07:44.214-0700 [TRACE] sink.file: enter write_token: path=./token
2020-03-11T11:07:44.214-0700 [TRACE] sink.file: exit write_token: path=./token
2020-03-11T11:07:44.214-0700 [INFO]  sink.file: file sink configured: path=./token mode=-rw-r-----
2020-03-11T11:07:44.215-0700 [INFO]  auth.handler: starting auth handler
2020-03-11T11:07:44.215-0700 [INFO]  auth.handler: authenticating
2020-03-11T11:07:44.215-0700 [INFO]  template.server: starting template server
2020-03-11T11:07:44.215-0700 [INFO]  sink.server: starting sink server
2020-03-11T11:07:44.215-0700 [INFO]  template.server: no templates found
2020-03-11T11:07:44.215-0700 [INFO]  template.server: template server stopped
2020-03-11T11:07:45.256-0700 [ERROR] auth.handler: error authenticating: error="Error making API request.

URL: PUT https://vault.addr/v1/my_namespace/auth/approle/login
Code: 403. Errors:

* 1 error occurred:
	* permission denied

" backoff=1.158592927

Expected behavior Specifying the namespace in two places should trigger precedence logic where one negates the other. It appears that something is making this additive and producing an invalid request.
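For illustration, the precedence the reporter expects might look like this sketch (hypothetical; pickNamespace is not Vault agent code), where one source simply wins rather than both being applied:

package main

import (
	"fmt"
	"os"
)

// pickNamespace sketches the expected precedence: the config file value
// wins and the environment is only a fallback, so the two settings are
// never combined into a doubly-namespaced request.
func pickNamespace(configNS string) string {
	if configNS != "" {
		return configNS
	}
	return os.Getenv("VAULT_NAMESPACE")
}

func main() {
	os.Setenv("VAULT_NAMESPACE", "my_namespace")
	fmt.Println(pickNamespace("my_namespace")) // both set: still one namespace
	fmt.Println(pickNamespace(""))             // env only: falls back
}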

Environment:

  • Vault Server Version (retrieve with vault status): Vault v1.3.2 (I don't operate the server, but this is what I'm told)
  • Vault CLI Version (retrieve with vault version): Vault v1.3.2
  • Server Operating System/Architecture: Linux x64

Vault AGENT configuration file(s):

pid_file = "./vault-agent.pid"

vault {
    address = "https://vault.addr"
}

auto_auth {
    method "approle" {
        namespace = "my_namespace"
        config = {
            role_id_file_path = "./role-id"
            secret_id_file_path = "./secret-id"
        }
    }

    sink "file" {
        config = {
            path = "./token"
        }
    }
}

closed time in 4 hours

acilate

push event hashicorp/nomad

davemay99

commit sha 215a67f5295de1675bd742de4e296ff25cfb98f2

ensure files are unable to escape the capture directory

view details

push time in 5 hours

issue closed hashicorp/raft

Use of break in replication.go

https://github.com/hashicorp/raft/blob/ae3f4f2e0ae5e88dab8c9f6dc2521a45121f9a2a/replication.go#L426 https://github.com/hashicorp/raft/blob/ae3f4f2e0ae5e88dab8c9f6dc2521a45121f9a2a/replication.go#L432

Why use a break statement in these two lines when setting shouldStop = true achieves the same thing?
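One general Go nuance that may be relevant here (not a claim about the raft code itself): inside a select, a bare break exits only the select statement, not the enclosing for loop, so a flag like shouldStop and a break are not always interchangeable:

package main

import "fmt"

func main() {
	stopCh := make(chan struct{})
	workCh := make(chan int, 2)
	workCh <- 1
	workCh <- 2
	close(stopCh)

	shouldStop := false
	for !shouldStop {
		select {
		case <-stopCh:
			// A bare `break` here would only exit the select and the
			// loop would spin again; the flag ends the loop itself.
			shouldStop = true
		case v := <-workCh:
			fmt.Println("processed", v)
		}
	}
	fmt.Println("loop exited")
}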

closed time in 5 hours

aprimadi

issue closed hashicorp/raft

TCPTransport maxPool

what's a good value for this setting?

closed time in 5 hours

jkassis

issue comment hashicorp/raft

Use of break in replication.go

Hey there, This issue has been automatically closed because there hasn't been any activity for a while. If you are still experiencing problems, or still have questions, feel free to open a new one :+1:

aprimadi

comment created time in 5 hours

issue comment hashicorp/raft

TCPTransport maxPool

Hey there, This issue has been automatically closed because there hasn't been any activity for a while. If you are still experiencing problems, or still have questions, feel free to open a new one :+1:

jkassis

comment created time in 5 hours

Pull request review comment hashicorp/vault

Backport 1.6.1: "vault operator usage" CLI for client count reporting (#10365)

+package command
+
+import (
+	"encoding/json"
+	"errors"
+	"fmt"
+	"sort"
+	"strings"
+	"time"
+
+	"github.com/hashicorp/vault/api"
+	"github.com/mitchellh/cli"
+	"github.com/posener/complete"
+	"github.com/ryanuber/columnize"
+)
+
+var _ cli.Command = (*OperatorUsageCommand)(nil)
+var _ cli.CommandAutocomplete = (*OperatorUsageCommand)(nil)
+
+type OperatorUsageCommand struct {
+	*BaseCommand
+	flagStartTime time.Time
+	flagEndTime   time.Time
+}
+
+func (c *OperatorUsageCommand) Synopsis() string {
+	return "Lists historical client counts"
+}
+
+func (c *OperatorUsageCommand) Help() string {
+	helpText := `
+Usage: vault operator usage
+
+  List the client counts for the default reporting period.
+
+	  $ vault operator usage
+
+  List the client counts for a specific time period.
+
+          $ vault operator usage -start-time=2020-10 -end-time=2020-11
+
+` + c.Flags().Help()
+
+	return strings.TrimSpace(helpText)
+}
+
+func (c *OperatorUsageCommand) Flags() *FlagSets {
+	set := c.flagSet(FlagSetHTTP | FlagSetOutputFormat)
+
+	f := set.NewFlagSet("Command Options")
+
+	f.TimeVar(&TimeVar{
+		Name:       "start-time",
+		Usage:      "Start of report period. Defaults to 'default_reporting_period' before end time.",
+		Target:     &c.flagStartTime,
+		Completion: complete.PredictNothing,
+		Default:    time.Time{},
+		Formats:    TimeVar_TimeOrDay | TimeVar_Month,
+	})
+	f.TimeVar(&TimeVar{
+		Name:       "end-time",
+		Usage:      "End of report period. Defaults to end of last month.",
+		Target:     &c.flagEndTime,
+		Completion: complete.PredictNothing,
+		Default:    time.Time{},
+		Formats:    TimeVar_TimeOrDay | TimeVar_Month,
+	})
+
+	return set
+}
+
+func (c *OperatorUsageCommand) AutocompleteArgs() complete.Predictor {
+	return complete.PredictAnything
+}
+
+func (c *OperatorUsageCommand) AutocompleteFlags() complete.Flags {
+	return c.Flags().Completions()
+}
+
+func (c *OperatorUsageCommand) Run(args []string) int {
+	f := c.Flags()
+
+	if err := f.Parse(args); err != nil {
+		c.UI.Error(err.Error())
+		return 1
+	}
+
+	data := make(map[string][]string)
+	if !c.flagStartTime.IsZero() {
+		data["start_time"] = []string{c.flagStartTime.Format(time.RFC3339)}
+	}
+	if !c.flagEndTime.IsZero() {
+		data["end_time"] = []string{c.flagEndTime.Format(time.RFC3339)}
+	}
+
+	client, err := c.Client()
+	if err != nil {
+		c.UI.Error(err.Error())
+		return 2
+	}
+
+	resp, err := client.Logical().ReadWithData("sys/internal/counters/activity", data)
+	if err != nil {
+		c.UI.Error(fmt.Sprintf("Error retrieving client counts: %v", err))

oh okay, wasn't sure what the UX was

mgritter

comment created time in 5 hours

Pull request review comment hashicorp/vault

"vault operator usage" CLI for client count reporting (#10365)

(diff context omitted: identical to the excerpt shown in the review comment above)

This is how the client handles most errors from Vault, I think. If there's a better example I'll copy it (but that should be addressed in the main branch, not this backport PR.)

mgritter

comment created time in 5 hours

Pull request review comment hashicorp/vault

"vault operator usage" CLI for client count reporting (#10365)

(diff context omitted: same excerpt as above, truncated at the c.UI.Error(err.Error()) call in Run's flag-parse error handling)

I'm assuming that c.UI.<loglevel> calls output strings to the user? If so, do we want to expose the internal workings of the system by printing the error given back from function calls?

mgritter

comment created time in 6 hours

Pull request review comment hashicorp/vault

"vault operator usage" CLI for client count reporting (#10365)

(diff context omitted: identical to the excerpt shown in the review comment above)

same as above

mgritter

comment created time in 6 hours

issue comment hashicorp/consul

Consul node enters endless election loop after restart during network outage.

I believe I understand the problem now. Looking at serf.handleRejoin, it only attempts to join the LAN gossip pool once at startup. Since the network was down at that time, it failed to rejoin the pool and gave up.

    2020-11-24T22:50:13.073Z [INFO]  agent.server.raft: entering follower state: follower="Node at 172.18.0.4:8300 [Follower]" leader=
    2020-11-24T22:50:13.097Z [INFO]  agent.server.serf.wan: serf: EventMemberJoin: consul3.dc1 172.18.0.4
    2020-11-24T22:50:13.098Z [INFO]  agent.server.serf.wan: serf: Attempting re-join to previously known node: consul1.dc1: 172.18.0.2:8302
    2020-11-24T22:50:13.098Z [INFO]  agent.server.serf.wan: serf: Attempting re-join to previously known node: consul2.dc1: 172.18.0.3:8302
    2020-11-24T22:50:13.098Z [WARN]  agent.server.serf.wan: serf: Failed to re-join any previously known node
    2020-11-24T22:50:13.106Z [INFO]  agent.server.serf.lan: serf: EventMemberJoin: consul3 172.18.0.4
    2020-11-24T22:50:13.106Z [INFO]  agent.server.serf.lan: serf: Attempting re-join to previously known node: consul1: 172.18.0.2:8301
    2020-11-24T22:50:13.106Z [INFO]  agent.server: Adding LAN server: server="consul3 (Addr: tcp/172.18.0.4:8300) (DC: dc1)"
    2020-11-24T22:50:13.106Z [INFO]  agent.server: Raft data found, disabling bootstrap mode
    2020-11-24T22:50:13.106Z [INFO]  agent.server: Handled event for server in area: event=member-join server=consul3.dc1 area=wan
    2020-11-24T22:50:13.106Z [INFO]  agent.server.serf.lan: serf: Attempting re-join to previously known node: consul2: 172.18.0.3:8301
    2020-11-24T22:50:13.109Z [INFO]  agent: Retry join is supported for the following discovery methods: cluster=LAN discovery_methods="aliyun aws azure digitalocean gce k8s linode mdns os packet scaleway softlayer tencentcloud triton vsphere"
    2020-11-24T22:50:13.109Z [INFO]  agent: Joining cluster...: cluster=LAN
    2020-11-24T22:50:13.109Z [INFO]  agent: (LAN) joining: lan_addresses=[consul1, consul2, 127.0.0.1]
    2020-11-24T22:50:20.530Z [ERROR] agent.anti_entropy: failed to sync remote state: error="No cluster leader"
    2020-11-24T22:50:21.294Z [WARN]  agent.server.raft: heartbeat timeout reached, starting election: last-leader=
    2020-11-24T22:50:21.294Z [INFO]  agent.server.raft: entering candidate state: node="Node at 172.18.0.4:8300 [Candidate]" term=3
    2020-11-24T22:50:21.422Z [WARN]  agent.server.raft: unable to get address for sever, using fallback address: id=575bbaa9-7a22-886e-c431-bb6d762ebafc fallback=172.18.0.3:8300 error="Could not find address for server id 575bbaa9-7a22-886e-c431-bb6d762ebafc"
    2020-11-24T22:50:21.422Z [WARN]  agent.server.raft: unable to get address for sever, using fallback address: id=19067efc-dfc8-c8b9-dd27-6fe28fb96097 fallback=172.18.0.2:8300 error="Could not find address for server id 19067efc-dfc8-c8b9-dd27-6fe28fb96097"
    2020-11-24T22:50:21.422Z [ERROR] agent.server.raft: failed to make requestVote RPC: target="{Voter 19067efc-dfc8-c8b9-dd27-6fe28fb96097 172.18.0.2:8300}" error="dial tcp <nil>->172.18.0.2:8300: connect: network is unreachable"
    2020-11-24T22:50:21.422Z [ERROR] agent.server.raft: failed to make requestVote RPC: target="{Voter 575bbaa9-7a22-886e-c431-bb6d762ebafc 172.18.0.3:8300}" error="dial tcp <nil>->172.18.0.3:8300: connect: network is unreachable"
    2020-11-24T22:50:23.111Z [WARN]  agent.server.memberlist.lan: memberlist: Failed to resolve consul1: lookup consul1 on 127.0.0.11:53: read udp 127.0.0.1:52217->127.0.0.11:53: i/o timeout

The raft heartbeat timed out, so it started its own leader election. That seems correct.

After the network is reconnected I checked the member view from consul3 using consul members and it only had itself listed. The other two nodes showed consul3 as left. It seems that serf never reconnected at that point. I guess because there was no new event to inform raft of the change in state, it continued to attempt to elect a leader, even though it would have been able to rejoin at this point.

The logs may be a bit misleading, because it is raft complaining. Raft uses the addresses it gets from serf to find the other servers. In the logs we see consul3 failing to get the addresses (because serf doesn't have them), but it is using a fallback address which was still correct, which is why the other nodes keep receiving leader vote requests.

Running consul join consul1 from the consul3 node resolves the problem. Once consul3 re-joins the LAN gossip pool it logs:

    2020-11-24T23:08:36.815Z [INFO]  agent: (LAN) joining: lan_addresses=[consul1]
    2020-11-24T23:08:36.820Z [INFO]  agent.server.serf.lan: serf: EventMemberJoin: consul1 172.18.0.2
    2020-11-24T23:08:36.820Z [INFO]  agent.server.serf.lan: serf: EventMemberJoin: consul2 172.18.0.3
    2020-11-24T23:08:36.820Z [INFO]  agent: (LAN) joined: number_of_nodes=1
    2020-11-24T23:08:36.820Z [INFO]  agent.server: Adding LAN server: server="consul1 (Addr: tcp/172.18.0.2:8300) (DC: dc1)"
    2020-11-24T23:08:36.820Z [INFO]  agent.server: Adding LAN server: server="consul2 (Addr: tcp/172.18.0.3:8300) (DC: dc1)"
    2020-11-24T23:08:36.822Z [INFO]  agent.server.serf.wan: serf: EventMemberJoin: consul1.dc1 172.18.0.2
    2020-11-24T23:08:36.822Z [INFO]  agent.server.serf.wan: serf: EventMemberJoin: consul2.dc1 172.18.0.3
    2020-11-24T23:08:36.822Z [INFO]  agent.server: Handled event for server in area: event=member-join server=consul1.dc1 area=wan
    2020-11-24T23:08:36.822Z [INFO]  agent.server: Handled event for server in area: event=member-join server=consul2.dc1 area=wan
    2020-11-24T23:08:39.184Z [WARN]  agent.server.raft: Election timeout reached, restarting election
    2020-11-24T23:08:39.185Z [INFO]  agent.server.raft: entering candidate state: node="Node at 172.18.0.4:8300 [Candidate]" term=148
    2020-11-24T23:08:41.069Z [ERROR] agent: Coordinate update error: error="No cluster leader"
    2020-11-24T23:08:44.624Z [WARN]  agent.server.raft: failed to get previous log: previous-index=132 last-index=17 error="log not found"
    2020-11-24T23:08:44.624Z [INFO]  agent.server.raft: entering follower state: follower="Node at 172.18.0.4:8300 [Follower]" leader=172.18.0.3:8300
    2020-11-24T23:08:44.728Z [INFO]  agent.server: New leader elected: payload=consul2

As far as I can tell, raft on consul3 required some event to make it understand that the other nodes are available once again. I'm not sure what that is supposed to be. Clearly serf can signal it once it rejoins, but I'm not sure if it would be appropriate for retry-join to keep retrying, or if it should be some other event.
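A minimal sketch of the once-only rejoin described above (hypothetical; serf's actual handleRejoin differs in detail):

package main

import "fmt"

// handleRejoin sketches the reported behavior: one pass over the
// previously known members at startup, then give up, with no retry if
// the network happens to be down at that moment.
func handleRejoin(previous []string, join func(string) error) {
	for _, addr := range previous {
		if err := join(addr); err == nil {
			fmt.Println("re-joined via", addr)
			return
		}
	}
	fmt.Println("Failed to re-join any previously known node")
}

func main() {
	down := func(addr string) error { return fmt.Errorf("unreachable: %s", addr) }
	handleRejoin([]string{"172.18.0.2:8301", "172.18.0.3:8301"}, down)
}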

s-matyukevich

comment created time in 6 hours

Pull request review comment hashicorp/nomad

client: always wait 200ms before sending updates

 func (c *Client) allocSync() {
 			}
 
 			var resp structs.GenericResponse
-			if err := c.RPC("Node.UpdateAlloc", &args, &resp); err != nil {
+			err := c.RPC("Node.UpdateAlloc", &args, &resp)
+			if err != nil {
+				// Error updating allocations, do *not* clear
+				// updates and retry after backoff
 				c.logger.Error("error updating allocations", "error", err)
-				syncTicker.Stop()
-				syncTicker = time.NewTicker(c.retryIntv(allocSyncRetryIntv))
-				staggered = true
-			} else {
-				updates = make(map[string]*structs.Allocation)
-				if staggered {
-					syncTicker.Stop()
-					syncTicker = time.NewTicker(allocSyncIntv)
-					staggered = false
-				}
+				syncTicker.Reset(c.retryIntv(allocSyncRetryIntv))
+				continue
 			}
+
+			// Successfully updated allocs, reset map and ticker.
+			// Always reset ticker to give loop time to receive
+			// alloc updates. If the RPC took the ticker interval
+			// we may call it in a tight loop before draining
+			// buffered updates.
+			updates = make(map[string]*structs.Allocation, len(updates))

What's the reason for specifying len(updates) here? If the updates were successfully applied, shouldn't this reset to the same state as line 1908?
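For context on the first question (general Go behavior, not project-specific): the second argument to make for a map is only a capacity hint that pre-sizes the internal buckets; it carries over no entries, so both forms yield an empty map:

package main

import "fmt"

func main() {
	updates := map[string]int{"a": 1, "b": 2, "c": 3}

	// Both are empty; the hint just pre-allocates on the assumption the
	// next batch will be roughly the same size, avoiding regrowth.
	fresh := make(map[string]int)
	hinted := make(map[string]int, len(updates))

	fmt.Println(len(fresh), len(hinted)) // 0 0
}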

schmichael

comment created time in 6 hours

Pull request review comment hashicorp/nomad

client: always wait 200ms before sending updates

 func (c *Client) allocSync() {
 			}
 
 			var resp structs.GenericResponse
-			if err := c.RPC("Node.UpdateAlloc", &args, &resp); err != nil {
+			err := c.RPC("Node.UpdateAlloc", &args, &resp)
+			if err != nil {
+				// Error updating allocations, do *not* clear
+				// updates and retry after backoff
 				c.logger.Error("error updating allocations", "error", err)
-				syncTicker.Stop()
-				syncTicker = time.NewTicker(c.retryIntv(allocSyncRetryIntv))
-				staggered = true
-			} else {
-				updates = make(map[string]*structs.Allocation)
-				if staggered {
-					syncTicker.Stop()
-					syncTicker = time.NewTicker(allocSyncIntv)
-					staggered = false
-				}
+				syncTicker.Reset(c.retryIntv(allocSyncRetryIntv))

What is the functional difference between using Reset here vs Stop and re-creating the ticker, as was done here before, and is still done below?
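For reference (standard library behavior, not specific to this PR): time.Ticker.Reset, added in Go 1.15, changes the period in place and keeps the same channel, so a tick already buffered in the channel survives the Reset, whereas Stop plus time.NewTicker yields a fresh, empty channel that any select must be re-pointed at. A small demonstration of the buffered-tick difference:

package main

import (
	"fmt"
	"time"
)

func main() {
	t := time.NewTicker(50 * time.Millisecond)
	defer t.Stop()

	time.Sleep(120 * time.Millisecond) // a tick is now buffered in t.C
	t.Reset(time.Hour)                 // period changes; the buffered tick remains

	select {
	case <-t.C:
		fmt.Println("tick buffered before Reset was still delivered")
	case <-time.After(200 * time.Millisecond):
		fmt.Println("no pending tick")
	}
}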

schmichael

comment created time in 6 hours

pull request comment hashicorp/vault

Using changelog tool to update changelog [VAULT-1055]

I think it's either an improvement or a feature. I kind of lean "improvement" since I think we've mostly used "feature" for introducing new features as part of a major release.

HridoyRoy

comment created time in 6 hours

issue comment hashicorp/consul

Consul node enters endless election loop after restart during network outage.

I was able to reproduce the problem using the steps in this issue description. The logs are attached to this comment. consul3.log consul2.log consul1.log

s-matyukevich

comment created time in 6 hours

Pull request review comment hashicorp/nomad

nomad operator debug - add pprof duration / csi details

 func (c *OperatorDebugCommand) collectAgentHost(path, id string, client *api.Cli
 		host, err = client.Agent().Host("", id, nil)
 	}
 
-	path = filepath.Join(path, id)
-	c.mkdir(path)
+	if err != nil {
+		c.Ui.Error(fmt.Sprintf("%s/%s: Failed to retrieve agent host data, err: %v", path, id, err))
 
+		if strings.Contains(err.Error(), structs.ErrPermissionDenied.Error()) {
+			// Drop a hint to help the operator resolve the error
+			c.Ui.Warn(fmt.Sprintf("Agent host retrieval requires agent:read ACL or enable_debug=true.  See https://www.nomadproject.io/api-docs/agent#host for more information."))

Thanks for the note Tim, I'll rebase to pull that in. There is one additional permission requirement though: to capture pprofs you need agent:write, and in all cases enable_debug=true will override the ACL requirement. Do you think we should mention enable_debug in the help text as well?

davemay99

comment created time in 6 hours

PR opened hashicorp/consul

merge: release/1.9.0

Updates changelog entries, versions and website banner from changes made directly on release/1.9.0 branch.

+21 -20

0 comments

12 changed files

pr created time in 6 hours

create branch hashicorp/consul

branch: merge/release-1.9.0

created branch time in 6 hours

push event hashicorp/consul

Paul Banks

commit sha b4cb9155d8ed86b8d61a9759a981e5b347c681d0

Update ui-visualization.mdx

view details

Nitya Dhanushkodi

commit sha b6459fe725fc334d5419ef5d20b2f5c9a5791ac6

Merge pull request #9179 from hashicorp/ndhanushkodi-patch-1 Update Helm compatibility matrix

view details

Kyle Schochenmaier

commit sha ba82eab3fbdee4ed22c2c5897ebdc2403da41c75

Docs: for consul-k8s health checks (#8819) * docs for consul-k8s health checks Co-authored-by: Derek Strickland <1111455+DerekStrickland@users.noreply.github.com> Co-authored-by: Ashwin Venkatesh <ashwin@hashicorp.com> Co-authored-by: Iryna Shustava <ishustava@users.noreply.github.com> Co-authored-by: Luke Kysow <1034429+lkysow@users.noreply.github.com>

view details

Iryna Shustava

commit sha 251841b759fbc50265f46bd960d21b0ee37778eb

docs: add link to the OpenShift platform guide to k8s docs (#9177)

view details

Luke Kysow

commit sha 9050263072b87a60026edebeca61093e118e553f

Docs for upgrading to CRDs (#9176) * Add Upgrading to CRDs docs

view details

Matt Keeler

commit sha 1f0007d3f31db6049af2f03ef5f906d353183293

[docs] Change links to the DNS information to the right place (#8675) The redirects were working in many situations but some (INTERNALS.md) was not. This just flips everything over to using the real link.

view details

Mike Morris

commit sha f3108c4901e47e3ed7ea99f13bee74d27f46da0b

changelog: fixup changelog.tmpl formatting

view details

Mike Morris

commit sha 883ba66bed97735297a78ba58cb82a94dc6a9054

Merge branch 'release/1.9.0-rc1' of github.com:hashicorp/consul into release/1.9.0-rc1

view details

Daniel Nephin

commit sha c6381b7e2bf15a18fb2bc1f72acfad3ebaf1407e

agent: fix bug with multiple listeners Previously the listener was being passed to a closure in a loop without capturing the loop variable. The result is only the last listener is used, so the http/https servers only listen on one address. This problem is fixed by capturing the variable by passing it into a function.

view details

Daniel Nephin

commit sha b2c5e2d0592efb4e46c56e22ce4b22b87f810d4b

Use freeport To prevent other tests which already use freeport from flaking when port 0 steals their reserved port.

view details

Daniel Nephin

commit sha 02314a50471edc45441ec89220878fcdcf21f841

Merge pull request #9225 from hashicorp/dnephin/1.9.0-fix-multiple-http-listeners [1.9.0] agent: fix bug with multiple listeners

view details

Mike Morris

commit sha 54fcfec78ca12c08eda07b18fe002e278c296f9d

Merge branch 'stable-website' into website/1.9.0-rc1

view details

Mike Morris

commit sha c2c85280733d26cce53d91292b0eff876ed990b5

website: update download callout for v1.9.0-rc1

view details

Kit Patella

commit sha f3380b1c430bbac8a0cb9e095e8d5cc13a2cbcc4

Merge pull request #9091 from scellef/correct-upgrade-guide Correcting text on when default was changed in Consul

view details

John Cowen

commit sha efe29ed5e7cefd0ef478ed20910bb18ceea74880

ui: Remove ghost healthcheck from the service instance healthcheck list (#9220) * ui: Fixup service instance healthcheck list not to show ghost check If the proxy is undefined, then an undefined vaule is appended to the list of checks * There are only 6 checks in the mocks so only expect 6

view details

John Cowen

commit sha ae049b7b965b66770cf4ada5956f8e37ec02a43b

ui: All metrics cards should default to the default nspace if not set (#9223) * ui: All metrics cards should default to the default nspace if not set * Use the up/downstream as the data/nspace for up/downstreams not the service

view details

John Cowen

commit sha d830f76bfe637bf7c0cfb7f569972c8837545e3d

ui: Sort lists with health by unhealthy/healthy by default (#9234) * ui: Update lists with Health to sort by unhealthy/healthy by default * Fix up tests for new sorting * Make specific services page-navigation test

view details

John Cowen

commit sha 6b3d403c7bf354397ef7ca15e45dd16244f7696d

ui: ACL Tokens > Roles and Policy search and sort (#9236) * ui: Ensure search is enabled for child items in the ACLs area * Refactor comparators to reuse some utility functions * Add search and sorting to the ACLs child selector * Add tests for searching within child selectors * Allow sorting by CreateIndex

view details

John Cowen

commit sha 84fd590930b360eaf0ff5c091f28a7036d0fefc6

ui: Surface 'detail' of API errors in the error page (#9237) * ui: Surface 'detail' of API errors in the error page * Make UI generated 404s look less bare

view details

John Cowen

commit sha 727a1053be1e0d9ac978eefa3d98c3dba67fc5cf

ui: Alter background color of filter bars (#9238)

view details

push time in 7 hours

PR merged hashicorp/consul

merge: release/1.9.0 back into 1.9.x
+125 -169

0 comments

15 changed files

mikemorris

pr closed time in 7 hours

issue opened hashicorp/vault

Vault Audit Logs include javascript as field name

Describe the bug Vault audit logs for sys/capabilities-self sometimes include an invalid JavaScript-based field name.

To Reproduce Steps to reproduce the behavior:

  1. Login via the UI (we use Google OIDC)
  2. Open an auth configuration page (e.g. ui/vault/access/gcp/configuration)
  3. See audit log:
# grep 'sys/capabilities-self' /var/log/vault/audit.json | tail -1 | jq '.response'
{
  "mount_type": "system",
  "data": {
    "capabilities": [
      "hmac-sha256:deadbeef"
    ],
    "function(t){var a=t||{},s=[e[0]]\nreturn n.forEach(function(t,n){s.push(a[t],e[n+1])}),s.join(\"\")}": [
      "hmac-sha256:deadbeef"
    ]
  }
}

Expected behavior Vault Audit logs do not include invalid JSON fields. This causes major headaches with ELK centralized logging infrastructure.

Environment:

  • Vault Server Version (retrieve with vault status): v1.5.4
  • Vault CLI Version (retrieve with vault version): v1.5.4
  • Server Operating System/Architecture: Ubuntu 18.04

created time in 7 hours

push event hashicorp/nomad

Tim Gross

commit sha 2188df361bdb1f8e4a0d389f0cff63eea082b206

belt-and-suspenders nil check

view details

push time in 7 hours

issue comment hashicorp/nomad

Host-Volumes + SELinux result in permission denied.

Hi @Tetha I got a chance to dig into this a bit and it looks like we're running into a Docker limitation, but one that appears to be intentional.

Any volume we mount with the volume_mount flag (host volumes or CSI volumes) gets passed as part of the Docker driver MountConfig. This is the same as if you were using the mounts block in the Docker driver, as opposed to the volumes block like you're doing above.

The Docker container resulting from a job that has a volume_mount, a volumes block, and a mounts block looks like the following:

$ docker inspect a224
[
    {
        ...
        "HostConfig": {
            "Binds": [
                "/var/nomad/data/allocs/d730cdde-d062-ddc3-d33e-a0240e4e8ebc/alloc:/alloc",
                "/var/nomad/data/allocs/d730cdde-d062-ddc3-d33e-a0240e4e8ebc/redis/local:/local",
                "/var/nomad/data/allocs/d730cdde-d062-ddc3-d33e-a0240e4e8ebc/redis/secrets:/secrets",
                "/srv/volumeSource0:/local/srv"
            ],
            ...
            "Mounts": [
                {
                    "Type": "bind",
                    "Source": "/srv/volumeSource1",
                    "Target": "/local/vagrant",
                    "ReadOnly": true,
                    "BindOptions": {}
                },
                {
                    "Type": "bind",
                    "Source": "/srv/volumeSource2",
                    "Target": "/test",
                    "ReadOnly": true,
                    "BindOptions": {
                        "Propagation": "rprivate"
                    }
                }
            ],
            ...
        },
        ...
        },
        "Mounts": [
            {
                "Type": "bind",
                "Source": "/var/nomad/data/allocs/d730cdde-d062-ddc3-d33e-a0240e4e8ebc/redis/local",
                "Destination": "/local",
                "Mode": "",
                "RW": true,
                "Propagation": "rprivate"
            },
            {
                "Type": "bind",
                "Source": "/var/nomad/data/allocs/d730cdde-d062-ddc3-d33e-a0240e4e8ebc/redis/secrets",
                "Destination": "/secrets",
                "Mode": "",
                "RW": true,
                "Propagation": "rprivate"
            },
            {
                "Type": "bind",
                "Source": "/srv/volumeSource0",
                "Destination": "/local/srv",
                "Mode": "",
                "RW": true,
                "Propagation": "rprivate"
            },
            {
                "Type": "bind",
                "Source": "/srv/volumeSource1",
                "Destination": "/local/vagrant",
                "Mode": "",
                "RW": false,
                "Propagation": "rprivate"
            },
            {
                "Type": "bind",
                "Source": "/srv/volumeSource2",
                "Destination": "/test",
                "Mode": "",
                "RW": false,
                "Propagation": "rprivate"
            },
            {
                "Type": "bind",
                "Source": "/var/nomad/data/allocs/d730cdde-d062-ddc3-d33e-a0240e4e8ebc/alloc",
                "Destination": "/alloc",
                "Mode": "",
                "RW": true,
                "Propagation": "rprivate"
            }
        ],
        ....

So the mounts block maps to the Docker command line's --mount flag, about which the Docker docs say:

The --mount flag does not support z or Z options for modifying selinux labels.

It looks like their reasoning for this can be found in places like: https://github.com/moby/moby/issues/36282 https://github.com/moby/moby/issues/30934 https://github.com/docker/cli/pull/832/files

For Nomad, we define the relabelling in the client configuration, which is privileged so the destructive possibilities here are lessened (although it still could be a nasty footgun for someone). I'm still trying to figure out what the right way to handle this problem is and what we can do with it in the Nomad driver. So I just wanted to check in and let you know it's been at least looked at, but it's probably not going to get fixed in Nomad 1.0.0.

Tetha

comment created time in 7 hours

push event hashicorp/consul

David Yu

commit sha 2a0555407c086bf861dcf49854b6b50257ab0bc6

Consul 1.9 GA Banner (#9272)

view details

push time in 7 hours

pull request comment hashicorp/consul

Consul 1.9 GA Banner

:cherries::white_check_mark: Cherry pick of commit 347d75934372b56153e4cbbdd93ecf495c519650 onto release/1.9.x succeeded!

david-yu

comment created time in 7 hours

push event hashicorp/consul

David Yu

commit sha a15f99d74b84d38a810eae9140c5547fd4b933fb

Consul 1.9 GA Banner (#9272)

view details

push time in 7 hours
