imron/drush 0

Drush is a command-line shell and scripting interface for Drupal, a veritable Swiss Army knife designed to make life easier for those who spend their working hours hacking away at the command prompt.

imron/parseint 0

Rust port of https://kholdstare.github.io/technical/2020/05/26/faster-integer-parsing.html

imron/pywin32 0

Python for Windows (pywin32) Extensions

imron/Rocket 0

A web framework for Rust.

imron/rust 0

Empowering everyone to build reliable and efficient software.

imron/s3bench 0

AWS S3 benchmarking tool

imron/scalyr-agent-2 0

The source code for Scalyr Agent 2, the daemon process Scalyr customers run on their servers to collect metrics and logs.

imron/scalyr-agent-chef 0

Chef cookbook for the scalyr agent

imron/scalyr-aws 0

AWS spot instance launcher for s3 benchmarks

imron/scalyr-tool 0

Command-line tool for accessing Scalyr services

push event scalyr/scalyr-agent-2

Steven Czerwinski

commit sha 1fb46bc20f2de5098c09dc41d273b35b3143dc0b

Added testcase for bug causing duplicate log upload issue

Imron Alston

commit sha a3116e9d9fa4464759d9afaaf71a9b6c372cc8c1

Use a unique_id for each processor

push time in 10 days

push event scalyr/scalyr-agent-2

Imron Alston

commit sha 7381aed128e3d11704c527a82f4071008110febd

Use a unique_id for each processor

push time in 10 days

create branch scalyr/scalyr-agent-2

branch : duplicateLogs-2.1.6

created branch time in 11 days

pull request comment scalyr/scalyr-agent-2

DO NOT MERGE AGENT-425 Add debug log stmts for uploading duplicate logs

Added extra debug statements to see why we are closing all the files. Can someone please create a new debug file for release? (Note: the failing test is a problem with uploading codecov results to s3 and is unrelated to these changes).

oliverhsu77

comment created time in 11 days

push event scalyr/scalyr-agent-2

Imron Alston

commit sha 0d15a940e4c9445d69d860ff454f9541b5fa92f0

add extra debug statements in callback handler to say why we are closing the processor

push time in 11 days

push event scalyr/scalyr-agent-2

Imron Alston

commit sha 8f037f2c6204c2f553a6e8d888ee94588802b64e

add extra debug statements in callback handler to say why we are closing the processor

push time in 11 days

pull request comment scalyr/scalyr-fluentd

Re-format the code with rubocop, enable rubocop check on CI

👍 Looks good to me. Feel free to merge.

Kami

comment created time in 14 days

Pull request review comment scalyr/scalyr-fluentd

Add a Dockerfile for fluentd with out plugin installed

+FROM fluent/fluentd:v1.11-1
+
+# Use root account to use apk
+USER root
+
+RUN apk add --no-cache --update --virtual .build-deps \

I wonder if it might be worth doing a multi-stage build, and then just copying the final gem into the fluentd container when done?

yanscalyr

comment created time in 16 days

push event scalyr/scalyr-fluentd

Imron Alston

commit sha aee82e0f2dec83ee5d03a6e08b79e6e99f364ee4

start of multi_worker support

Imron Alston

commit sha 9f0212a3f94a83ed1717437ac040992ab7fc14ed

add worker record. use tag for thread_id

Imron Alston

commit sha c46fca7ff743863910dd7670acd7300472facd83

remove monotonically increasing timestamp restriction

Imron Alston

commit sha df49a9f980203498378f89e1cc677214c01f066c

version bump, and updated defaults

Imron Alston

commit sha 977760f6a5e453b00f5c80ee79d546664269b57d

remove worker attribute

Imron

commit sha f41b7a35bd97dedc8503bb3a6b0fcc60c47e4b9a

Merge pull request #15 from scalyr/multi_worker

Multi worker

push time in 16 days

PR merged scalyr/scalyr-fluentd

Multi worker

This PR adds multi-worker support to our fluentd plugin. Testing under load has shown that logs from a single docker container are always routed to the same fluentd worker.

This PR also removes the monotonically increasing timestamp restriction that was previously required by the server, and uses the tag name as the "thread_id", so that we can completely avoid the need for synchronization.

Finally, it updates the defaults to take into account the increased maximum buffer size on the server.

Here is an example config file using 4 workers. With this config, I was able to transfer up to 20MB/s from a single Azure server based in US-East.

<system>
  workers 4
</system>

<source>
  @type forward
  port 24224
  bind 0.0.0.0
</source>

<match **> 
  @type scalyr
  api_write_token XXXXXXXXX
  use_hostname_for_serverhost true
  scalyr_server https://agent.scalyr.com/
  ssl_ca_bundle_path /etc/ssl/certs/ca-certificates.crt
  ssl_verify_peer true
  ssl_verify_depth 5
  max_request_buffer 5900000
  message_field log 
  force_message_encoding UTF-8
  replace_invalid_utf8 true
  compression_type deflate
  compression_level 3
  <buffer>
    compress text
    flush_mode interval
    flush_interval 5s
    flush_thread_count 1
    delayed_commit_timeout 30
    overflow_action throw_exception
  </buffer>
</match>

+15 -30

1 comment

4 changed files

imron

pr closed time in 16 days

push event scalyr/scalyr-fluentd

Imron Alston

commit sha 977760f6a5e453b00f5c80ee79d546664269b57d

remove worker attribute

push time in 16 days

pull request comment scalyr/scalyr-tool

Fixes issue #23

Any update on this? It will soon be my last day at scalyr (31 July), and it might be a good idea to merge before then.

imron

comment created time in 17 days

pull request comment scalyr/scalyr-fluentd

Multi worker

Tests have not yet been updated.

imron

comment created time in 17 days

PR opened scalyr/scalyr-fluentd

Multi worker

This PR adds multi-worker support to our fluentd plugin. Testing under load has shown that logs from a single docker container are always routed to the same fluentd worker.

This PR also removes the monotonically increasing timestamp restriction that was previously required by the server, and uses the tag name as the "thread_id", so that we can completely avoid the need for synchronization.

Finally, it updates the defaults to take into account the increased maximum buffer size on the server.

Here is an example config file using 4 workers. With this config, I was able to transfer up to 20MB/s from a single Azure server based in US-East.

<system>
  workers 4
</system>

<source>
  @type forward
  port 24224
  bind 0.0.0.0
</source>

<match **> 
  @type scalyr
  api_write_token XXXXXXXXX
  use_hostname_for_serverhost true
  scalyr_server https://agent.scalyr.com/
  ssl_ca_bundle_path /etc/ssl/certs/ca-certificates.crt
  ssl_verify_peer true
  ssl_verify_depth 5
  max_request_buffer 5900000
  message_field log 
  force_message_encoding UTF-8
  replace_invalid_utf8 true
  compression_type deflate
  compression_level 3
  <buffer>
    compress text
    flush_mode interval
    flush_interval 5s
    flush_thread_count 1
    delayed_commit_timeout 30
    overflow_action throw_exception
  </buffer>
</match>

+17 -30

0 comments

4 changed files

pr created time in 17 days

push event scalyr/scalyr-fluentd

Imron Alston

commit sha df49a9f980203498378f89e1cc677214c01f066c

version bump, and updated defaults

push time in 17 days

push event scalyr/scalyr-fluentd

Imron Alston

commit sha c46fca7ff743863910dd7670acd7300472facd83

remove monotonically increasing timestamp restriction

push time in 17 days

push event scalyr/scalyr-fluentd

Imron Alston

commit sha 9f0212a3f94a83ed1717437ac040992ab7fc14ed

add worker record. use tag for thread_id

push time in 17 days

create branch scalyr/scalyr-fluentd

branch : multi_worker

created branch time in 18 days

pull request comment scalyr/scalyr-agent-2

DO NOT MERGE AGENT-425 Add debug log stmts for uploading duplicate logs

Hi Oliver,

I've added new debug statements and tested them for coverage, and they all appear good.

Are you able to review the changes, and then prepare a debug build?

oliverhsu77

comment created time in 19 days

push event scalyr/scalyr-agent-2

Imron Alston

commit sha 6988057c0f85bc23c3572d51e4777b3ecd90d397

extra debug info

push time in 19 days

push event scalyr/scalyr-agent-2

Imron Alston

commit sha 78d7b53ed339c9847e6307464b22cac5f7ba392d

more debug messages

push time in 25 days

Pull request review comment scalyr/scalyr-agent-2

AGENT-425 Add debug log stmts for uploading duplicate logs

 def prepare_for_inactivity(self, current_time=None):
         if close_file:
             for pending in self.__pending_files:
                 self.__close_file(pending)
+                log.info(

Worst case scenario is that it will be something like <FileState object at 0x105e764a8>

oliverhsu77

comment created time in 25 days

Pull request review comment scalyr/scalyr-agent-2

AGENT-425 Add debug log stmts for uploading duplicate logs

 def prepare_for_inactivity(self, current_time=None):
         if close_file:
             for pending in self.__pending_files:
                 self.__close_file(pending)
+                log.info(

It should, if using the %s placeholder and you use % for formatting, then python will invoke str() on the object, which will print a string representation.

oliverhsu77

comment created time in 25 days
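The behaviour described in the comment above can be checked with a short sketch. The FileState class here is a hypothetical stand-in, not the agent's actual class:

```python
class FileState(object):
    """Hypothetical stand-in for the object being logged."""

    def __init__(self, path):
        self.path = path

    def __str__(self):
        # %-formatting with %s calls str() on the argument, which lands here
        return "FileState(path=%s)" % self.path

state = FileState("/var/log/app.log")
message = "closing %s" % state
print(message)  # closing FileState(path=/var/log/app.log)
```

Without a `__str__` method, the default representation is used instead, which is the `<FileState object at 0x...>` worst case mentioned in the other comment.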

Pull request review comment scalyr/scalyr-agent-2

AGENT-425 Add debug log stmts for uploading duplicate logs

 def prepare_for_inactivity(self, current_time=None):
         if close_file:
             for pending in self.__pending_files:
                 self.__close_file(pending)
+                log.info(

Can you add information about what the values are? Both the fields themselves, and also the results of the calculation used to trigger this branch, i.e. the value of delta, current_time, self.__modification_time_raw and self.__max_modification_duration.

This way we can not only tell what is happening, but also why.

oliverhsu77

comment created time in 25 days
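A sketch of what the requested log line could look like. The variable names come from the review comment above; the concrete values here are made up for illustration, and the real code would use the instance attributes:

```python
import logging

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("agent")

# Made-up values standing in for the real instance attributes
current_time = 1000.0
modification_time_raw = 400.0
max_modification_duration = 500.0
delta = current_time - modification_time_raw

if delta > max_modification_duration:
    # Log both the inputs and the derived value, so the logs show
    # not only what happened, but why this branch was triggered
    log.info(
        "Closing file: delta=%s current_time=%s modification_time_raw=%s "
        "max_modification_duration=%s",
        delta,
        current_time,
        modification_time_raw,
        max_modification_duration,
    )
```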

Pull request review comment scalyr/scalyr-agent-2

AGENT-425 Add debug log stmts for uploading duplicate logs

 def find_matches(
                     checkpoint_state = None
                     # Get the last checkpoint state if it exists.
                     if matched_file in previous_state:
+                        log.info(
+                            "%s has previous state %s" % (matched_file, previous_state)

This was the log message I was using in my test version, but when I was looking through the logs, I found it difficult to pick it out because it starts with the filename. I'd change this to say something like:

"Copying %s from checkpoint %s", then we can just search the logs for "Copying" and it will match either of those lines.

oliverhsu77

comment created time in 25 days

push event imron/parseint

Imron Alston

commit sha b62bd669728e4413deba911b63a2528be68fe4bb

Set target-cpu=native

push time in 25 days

PR opened pickfire/parseint

Set target-cpu=native

Setting target-cpu to native lets the compiler use every instruction set extension available on the build machine, which can yield noticeable performance improvements.

Here are the before and after results on my laptop (a MacBook with an i9 CPU).

Before:

test bench_naive_bytes         ... bench:           7 ns/iter (+/- 0) = 2285 MB/s
test bench_naive_bytes_and     ... bench:           6 ns/iter (+/- 0) = 2666 MB/s
test bench_naive_bytes_and_c16 ... bench:           6 ns/iter (+/- 2) = 2666 MB/s
test bench_naive_bytes_iter    ... bench:           7 ns/iter (+/- 1) = 2285 MB/s
test bench_naive_chars         ... bench:          10 ns/iter (+/- 0) = 1600 MB/s
test bench_naive_chars_and     ... bench:           9 ns/iter (+/- 0) = 1777 MB/s
test bench_naive_chars_iter    ... bench:          10 ns/iter (+/- 0) = 1600 MB/s
test bench_str_parse           ... bench:          19 ns/iter (+/- 5) = 842 MB/s
test bench_trick               ... bench:           3 ns/iter (+/- 0) = 5333 MB/s
test bench_trick_128           ... bench:           4 ns/iter (+/- 0) = 4000 MB/s
test bench_trick_simd          ... bench:           3 ns/iter (+/- 0) = 5333 MB/s
test bench_trick_simd_c16      ... bench:          13 ns/iter (+/- 2) = 1230 MB/s
test bench_unrolled            ... bench:          12 ns/iter (+/- 1) = 1333 MB/s
test bench_unrolled_safe       ... bench:          11 ns/iter (+/- 0) = 1454 MB/s
test bench_unrolled_unsafe     ... bench:          11 ns/iter (+/- 1) = 1454 MB/s

After:

test bench_naive_bytes         ... bench:           7 ns/iter (+/- 1) = 2285 MB/s
test bench_naive_bytes_and     ... bench:           6 ns/iter (+/- 0) = 2666 MB/s
test bench_naive_bytes_and_c16 ... bench:           6 ns/iter (+/- 0) = 2666 MB/s
test bench_naive_bytes_iter    ... bench:           7 ns/iter (+/- 2) = 2285 MB/s
test bench_naive_chars         ... bench:          10 ns/iter (+/- 0) = 1600 MB/s
test bench_naive_chars_and     ... bench:           8 ns/iter (+/- 0) = 2000 MB/s
test bench_naive_chars_iter    ... bench:          11 ns/iter (+/- 1) = 1454 MB/s
test bench_str_parse           ... bench:          19 ns/iter (+/- 2) = 842 MB/s
test bench_trick               ... bench:           3 ns/iter (+/- 0) = 5333 MB/s
test bench_trick_128           ... bench:           4 ns/iter (+/- 0) = 4000 MB/s
test bench_trick_simd          ... bench:           1 ns/iter (+/- 0) = 16000 MB/s
test bench_trick_simd_c16      ... bench:           2 ns/iter (+/- 0) = 8000 MB/s
test bench_unrolled            ... bench:           7 ns/iter (+/- 0) = 2285 MB/s
test bench_unrolled_safe       ... bench:           6 ns/iter (+/- 1) = 2666 MB/s
test bench_unrolled_unsafe     ... bench:           6 ns/iter (+/- 0) = 2666 MB/s

The fastest time is now 1ns!

+3 -0

0 comments

1 changed file

pr created time in 25 days
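For reference, target-cpu=native is typically applied either per-project in Cargo's config file or via RUSTFLAGS. The file location below is the conventional one and is an assumption here, since the PR's actual diff isn't shown:

```
# .cargo/config.toml (conventional location; assumed, as the diff isn't shown)
[build]
rustflags = ["-C", "target-cpu=native"]
```

The same effect for a one-off run:

    RUSTFLAGS="-C target-cpu=native" cargo bench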

fork imron/parseint

Rust port of https://kholdstare.github.io/technical/2020/05/26/faster-integer-parsing.html

fork in 25 days

fork imron/rust

Empowering everyone to build reliable and efficient software.

https://www.rust-lang.org

fork in a month

pull request comment scalyr/scalyr-agent-2

Log redis cluster replication offsets

Oh, also, can you rebase off master before you merge?

huchengming

comment created time in 2 months

PR merged scalyr/scalyr-agent-2

CT-25 - add support for k8s_logs in configuration file

This commit adds support for k8s_logs in the agent config file (or config directories).

Specifically, config files support a new field, k8s_logs, which is identical to a regular logs stanza except:

  • There is no path field supported
  • There are 3 new optional fields, k8s_pod_glob, k8s_namespace_glob and k8s_container_glob, which are globs that default to *.

If any of those fields are present for a k8s_logs config then any container log files added by the kubernetes_monitor for a given pod are filtered against all of those globs, and if all of them match, then the config is applied to the logs for that container.

Only the first matching config will be applied to a given log.

This means that users should order k8s_logs entries from most specific to least specific.

Example config usage:

{
  "k8s_logs": [
    {
      // most specific, filtering on pod name and namespace
      "k8s_pod_glob": "nginx*",
      "k8s_namespace_glob": "web",
      "attributes" : { "parser": "nginxLog" }
    },
    {
      // filtering on just the container name defined in the k8s yaml
      "k8s_container_glob": "*redis*",
      "attributes" : { "parser": "redisLog" }
    },
    {
      // filtering just on namespaces - won't match anything already
      // matched by the previous config stanzas
      "k8s_namespace_glob": "*monitoring*",
      "attributes" : { "parser": "monitoringLogs" }
    },
    {
      // least specific, `k8s_pod_glob`, `k8s_namespace_glob` and
      // `k8s_container_glob` all default to `*`, so this will match
      // everything not matched in the previous stanzas
      "attributes" : { "parser": "defaultParser" }
    },

  ]
}

Note: You can match against a deployment by setting k8s_pod_glob to "deployment*", because pod names for a given deployment will always start with the deployment name.

With the exception of path, all log configuration settings supported by logs are supported by k8s_logs, including parsers, attributes, redaction_rules and so on.

Configuration options set by any k8s annotations will override the values specified in the k8s_logs.
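The matching rules above can be sketched in a few lines of Python. This is a simplified illustration, not the agent's real implementation, which lives in the kubernetes_monitor and handles more than shown here:

```python
import fnmatch

def first_matching_config(k8s_logs, pod, namespace, container):
    """Return the first k8s_logs entry whose globs all match, or None."""
    for config in k8s_logs:
        if (
            # each glob defaults to "*", so an absent field matches everything
            fnmatch.fnmatch(pod, config.get("k8s_pod_glob", "*"))
            and fnmatch.fnmatch(namespace, config.get("k8s_namespace_glob", "*"))
            and fnmatch.fnmatch(container, config.get("k8s_container_glob", "*"))
        ):
            return config
    return None

k8s_logs = [
    {"k8s_pod_glob": "nginx*", "k8s_namespace_glob": "web",
     "attributes": {"parser": "nginxLog"}},
    {"attributes": {"parser": "defaultParser"}},
]

# An nginx pod in the "web" namespace hits the most specific entry first;
# anything else falls through to the catch-all stanza.
match = first_matching_config(k8s_logs, "nginx-7d9f", "web", "nginx")
print(match["attributes"]["parser"])  # nginxLog
```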

+673 -76

1 comment

5 changed files

imron

pr closed time in 2 months

push event scalyr/scalyr-agent-2

Imron Alston

commit sha b8a11657d56e68e681fbd90bed3bb1b278e1945a

CT-25 - Add support for k8s_logs in config file

This commit adds support for k8s_logs in the agent config file (or config directories).

Specifically config files support a new field `k8s_logs` which is identical to a regular `logs` stanza except:

* There is no `path` field supported
* There are 3 new optional fields, `k8s_pod_glob`, `k8s_namespace_glob` and `k8s_container_glob`, which are globs that default to `*`.

If any of those fields are present for a `k8s_logs` config then any container log files added by the kubernetes_monitor for a given pod are filtered against all of those globs, and if all of them match, then the config is applied to the logs for that container.

Only the first matching config will be applied to a given log. This means that users should order `k8s_logs` entries from most specific to least specific.

Example config usage:

```
{
  "k8s_logs": [
    {
      // most specific, filtering on pod name and namespace
      "k8s_pod_glob": "nginx*",
      "k8s_namespace_glob": "web",
      "attributes" : { "parser": "nginxLog" }
    },
    {
      // filtering on just the container name defined in the k8s yaml
      "k8s_container_glob": "*redis*",
      "attributes" : { "parser": "redisLog" }
    },
    {
      // filtering just on namespaces - won't match anything already
      // matched by the previous config stanzas
      "k8s_namespace_glob": "*monitoring*",
      "attributes" : { "parser": "monitoringLogs" }
    },
    {
      // least specific, `k8s_pod_glob`, `k8s_namespace_glob` and
      // `k8s_container_glob` all default to `*`, so this will match
      // everything not matched in the previous stanzas
      "attributes" : { "parser": "defaultParser" }
    },
  ]
}
```

Note: You can match against a deployment by setting `k8s_pod_glob` to `"deployment*"` because pod names for a given deployment will always start with the deployment name.

With the exception of `path`, all log configuration settings supported by `logs` are supported by `k8s_logs`, including parsers, attributes, redaction_rules and so on.

Configuration options set by any k8s annotations will override the values specified in the `k8s_logs`.

Imron Alston

commit sha c89ab9dcba6d772313c09dbd897cdb102bc0236f

Fix failing smoke test

push time in 2 months

delete branch scalyr/scalyr-agent-2

delete branch : imron/ct-25

delete time in 2 months

push event scalyr/scalyr-agent-2

Imron Alston

commit sha a1a7919f30f65d274506548678d2c044bd4a07ed

Fix failing smoke test

push time in 2 months

push event scalyr/scalyr-agent-2

Tomaz Muraus

commit sha ceb809fddae527327f25b051cbdddbfa881561b3

Fix codecov coverage data upload (#558)

* Update codecov dependency to v2.1.3 since current version is broken and is returning 400 on coverage report upload.
* Make sure submit coverage data step fails the build if codecov coverage data upload fails.

Tomaz Muraus

commit sha d9d09f2837064cf0873509bd53c4e2520c0e8468

Add a script which verifies /addEvents API returns correct response headers (#555)

* Add a script which verifies that /addEvents API endpoint returns correct response headers on 401 and 200 response.
* Hook it up to the CI.

Tomaz Muraus

commit sha adc6b59cea560f1ac24b8fcc1860acc31e534412

Add SonarCloud config, various small code fixes (#560)

* Use mkstemp instead of mktemp which is safer.
* Add sonar cloud config file.
* Fix a couple of more issues detected by sonarcloud.

Tomaz Muraus

commit sha 04688522a66c2ae054c32947ff617592f8e352d3

Update sonar cloud settings. (#562)

Tomaz Muraus

commit sha b789d75a762c2bf8d20caa60d7f8a4edbf19712e

Fix small style related issues detected by Sonar Cloud (#563) * Fix various small (mostly style) violations detected by sonar cloud.

yanscalyr

commit sha d364ded0210490c1d40ce00cfd3479f314ee172a

AGENT-397: Improved logging for investigating skipped bytes (#561)

* AGENT-397: Improved logging for investigating skipped bytes
* Fix overwriting passed in message param
* Fix typo
* Fix incorrect rate_limited_time in warning message
* Easy review responses
* Add new fields to status __add__ test
* Add unit test for new skipped bytes warnings

yanscalyr

commit sha 099a789ff1147a77d71610e7268638313943d1e9

AGENT-406: Add rate limiting related changes to CHANGELOG and RELEASE_NOTES (#564)

* AGENT-406: Add rate limiting related changes to CHANGELOG and RELEASE_NOTES
* Review comments
* Review comments

czerwingithub

commit sha 5ad9d16cafc48112c88be171d75608f5520976fa

Update RELEASE_NOTES.md (#565) Tweak documentation about maximum send rate enforcement to provide some specific examples.

czerwingithub

commit sha 78f4d921e74306bfdf42174ebcd11e5d715652eb

Fix/undo sonar cloud changes (#566)

* Revert "Fix small style related issues detected by Sonar Cloud (#563)" This reverts commit b789d75a762c2bf8d20caa60d7f8a4edbf19712e.
* Revert "Update sonar cloud settings. (#562)" This reverts commit 04688522a66c2ae054c32947ff617592f8e352d3.
* Revert "Add SonarCloud config, various small code fixes (#560)" This reverts commit adc6b59cea560f1ac24b8fcc1860acc31e534412.

Tomaz Muraus

commit sha d25a89a50c65b8dbad12584387a98f0ac020d128

Add back reverted tempfile.mkstemp changes, fix Windows compatibility (#568)

* Revert "Fix/undo sonar cloud changes (#566)" This reverts commit 78f4d921e74306bfdf42174ebcd11e5d715652eb.
* Update code which utilizes mkstemp so it works correctly on Windows if we try to delete that file later on. We do that either by immediately closing the open fd which is returned by the function (this is done in places where code is not security sensitive and the actual file is re-opened later by some other code) or by directly using fd returned by the function.

Jenkins Automation

commit sha 67460c81933c44d57ee3a2a272b79de4c9fc7205

Agent release 2.1.6

Steven Czerwinski

commit sha a84820cb17b1e4d445628c8198c9005a3b12f246

Merge branch 'release' of github.com:scalyr/scalyr-agent-2

Tomaz Muraus

commit sha cbca9bc092573b0e35921f4602c76d0cb1b06142

Add lint check which will fail if any of the bundled certs will expire in 24 months or sooner (#553) * Add lint check which will fail if any of the bundled certs will expire in 24 months or sooner.

Tomaz Muraus

commit sha 25d6d4ed0a45f5869fd10348f9a0a381fdc279fd

Also run unit tests on Windows (#569)

* Revert "Fix/undo sonar cloud changes (#566)" This reverts commit 78f4d921e74306bfdf42174ebcd11e5d715652eb.
* Try running unit tests on Windows.
* Skip Linux / Posix specific tests on Windows.
* Try workaround for test logging mess on Windows.
* Try a workaround.
* Also install compression algorithm libs on Windows.
* Update code which utilizes mkstemp so it works correctly on Windows if we try to delete that file later on. We do that either by immediately closing the open fd which is returned by the function (this is done in places where code is not security sensitive and the actual file is re-opened later by some other code) or by directly using fd returned by the function.
* Fix more tests so they pass on Windows.
* slack orb doesn't work on Windows :/
* Revert testing changes.
* Use different choco cache path.
* Use correct pip cache directory.
* Log response on assert failure to make it easier to troubleshoot.
* Ignore bandit false positive.
* Also try run coverage with Windows unit tests.
* Update comments.
* Try using custom orb which works on Windows executor.
* Test that specifying "shell: bash" also works for OS X and Linux jobs.
* Move more of the orb code in-line until my fix pr has been merged.
* Specify Windows slack orb configuration so it matches configuration for other jobs.
* Update .gitignore.
* Make sure we close FDs which are opened by the log handlers on tests tearDown.
* Make sure we correctly close all the FDs used by loggers inside all the tests.

ArthurKamalov

commit sha 419d33baa770bbaecda4fc2a81ec2e3bfb62effb

Windows packaging with pyinstaller. (#538) Pyinstaller library now is used in the Windows msi package build process.

Oliver Hsu

commit sha a56e842634b1899895c64a81350c09cdaca38b7a

AGENT-413 K8S Scalyr agent uploads duplicate logs on restart/config change (#572) * Remove `kubernetes_monitor` calling copying_manager.remove_log_path in `stop`

Tomaz Muraus

commit sha 92890d0e1247d332fae7aac542f1152a8349c9e0

Also run install only tests inside the AMI tests job (#573)

* Update AMI tests script to also run raw install tests for Linux Distributions which test installing latest stable package using the latest installer script.
* Also run install and upgrade tests on CentOS 7 and CentOS 8.

Co-authored-by: Arthur Kamalov <artur.kamalov@akvelon.com>

Tomaz Muraus

commit sha e5f949799e3e99b7d0cdbd7f1b617f10dba631b0

SonarCloud integration though our CI system (Circle CI) (#571) * Run sonar cloud analysis as part of our Circle CI job. This way we have more control of when it runs and we can pass additional data to it (e.g. coverage, etc).

Tomaz Muraus

commit sha 09303ec7f87b385ed83ab572cf5bf070f26a0eed

Also run AMI end to end package tests in Debian 10 & Amazon Linux 2 (#575)

* Add support for Debian to the script which runs AMI tests on Circle CI.
* Also run AMI tests on Amazon Linux 2.
* Update readme.
* Use up to 12 attempts when trying to delete a volume - sometimes it takes a while for a volume to transition into "available" state when we can delete it.
* Allow user to pass "installer_script_url" for AMI tests Circle CI job. This change makes it trivial to test installer script changes against all the distros we support in our AMI tests.

Oliver Hsu

commit sha 124478b3ac1fec9a883c2eca0e0baf55abd2dc56

AGENT-413 Update Docker README for instructions to build agent K8S image (#576)

view details

push time in 2 months

Pull request review comment scalyr/scalyr-agent-2

Log redis cluster replication offsets

 def _initialize(self):
             self.__redis_hosts.append(redis_host)

     def gather_sample(self):
-
         for host in self.__redis_hosts:
             new_connection = not host.valid()
             try:
                 host.log_slowlog_entries(self._logger, self.__lines_to_fetch)
+                host.log_cluster_replication_info(self._logger)

Could you put this behind a configuration option defaulting to False? This way existing customers already using the Redis monitor won't start getting extra logs.

See here for an example of how to define configuration options. You can then enable it in the monitor snippet of agent.json, e.g.

{
  monitors:[
    {
      "module": "scalyr_agent.builtin_monitors.redis_monitor",
      "log_cluster_replication_info": true

    }
  ]
}

huchengming

comment created time in 2 months

Pull request review commentscalyr/scalyr-agent-2

Log redis cluster replication offsets

 def log_slowlog_entries(self, logger, lines_to_fetch):
         for entry in reversed(unseen_entries):
             self.log_entry(logger, entry)

+    def log_cluster_replication_info(self, logger):
+        # Gather replication information from Redis
+        replication_info = parse_info(self.redis.execute_command("INFO REPLICATION"))
+
+        # We only care about information from Redis master, which contains offsets from both master and replica
+        if replication_info["role"] != "master":
+            return
+
+        master_repl_offset = replication_info["master_repl_offset"]
+
+        if master_repl_offset == 0:
+            return
+
+        master_replid = replication_info["master_replid"]
+        connected_replicas = replication_info["connected_slaves"]
+
+        # If there are more than one replicas, log the most up-to-date
+        max_replica_offset = 0
+        for n in range(connected_replicas):
+            max_replica_offset = max(
+                max_replica_offset, replication_info["slave{}".format(n)]["offset"]

For consistency reasons with the rest of the agent, can you use % formatting instead of .format?

huchengming

comment created time in 2 months

push event scalyr/scalyr-agent-2

Imron Alston

commit sha 61b5db0489ce27f1b9ffc73e5752748cfe3137a3

Use variable substitution for rename_logfile

Imron Alston

commit sha 80603fae35fd65beca1a4b250ba060f3823c0695

No longer merge annotation and log config attributes. If specified, annotations will override the entire attributes dict

push time in 2 months

Pull request review comment scalyr/scalyr-agent-2

CT-25 - add support for k8s_logs in configuration file

 def query_stats(self):
         return self.query_api("/stats/summary")

+class K8sConfigBuilder(object):
+    """Builds log configs for containers based on config snippets found in the `k8s_logs` field of
+    the config file.
+    """
+
+    def __init__(
+        self, k8s_log_configs, logger, rename_no_original, parse_format="json"
+    ):
+        """
+        @param k8s_log_configs: The config snippets from the configuration
+        @param logger: A scalyr logger
+        @param rename_no_original: A bool, used to prevent the original log file name from being added to the attributes.
+        @param parse_format: The parse format of this log config
+        """
+        self.__k8s_log_configs = k8s_log_configs
+        self._logger = logger
+        self.__rename_no_original = rename_no_original
+        self.__parse_format = parse_format
+
+    def _check_match(self, element, name, value, glob):
+        """
+        Checks to see if we have a match against the glob for a certain value
+        @param element: The index number of the element in the k8s_config list
+        @param name: A string containing the name of the field we are evaluating (used for debug log purposes)
+        @param value: The value of the field to evaluate the glob against
+        @param glob: A string containing a glob to evaluate
+        """
+        result = False
+        if glob is not None and value is not None:
+            # ignore this config if value doesn't match the glob
+            if fnmatch.fnmatch(value, glob):
+                result = True
+            else:
+                self._logger.log(
+                    scalyr_logging.DEBUG_LEVEL_2,
+                    "Ignoring k8s_log item %d because %s '%s' doesn't match '%s'"
+                    % (element, name, value, glob,),
+                )
+        return result
+
+    def get_log_config(
+        self, info, k8s_info, container_attributes, parser, rename_logfile,
+    ):
+        """
+        Creates a log_config from various attributes and then applies any `k8s_logs` configs that
+        might apply to this log
+        @param info: A dict containing docker information about the container we are creating a config for
+        @param k8s_info: A dict containing k8s information about the container we are creating a config for
+        @param container_attributes: A set of attributes to add to the log config of this container
+        @param parser: A string containing the name of the parser to use for this log config
+        @param rename_logfile: A string containing the name to use for the renamed log file
+        @return: A dict containing a log_config, or None if we couldn't create a valid config (i.e. log_path was empty)
+        """
+
+        # Make sure we have a log_path for the log config
+        path = info.get("log_path", None)
+        if not path:
+            return None
+
+        # Build up the default config that we will use
+        result = {
+            "parser": parser,
+            "path": path,
+            "parse_format": self.__parse_format,
+            "attributes": container_attributes,
+            "rename_logfile": rename_logfile,
+            "rename_no_original": self.__rename_no_original,

I think that makes sense. I'll write some longer comments once I have made some changes.

imron

comment created time in 2 months

Pull request review comment scalyr/scalyr-agent-2

AGENT-413 Update Docker README for instructions building agent K8S image

Use the following commands to build the respective images:

     ./scalyr-docker-agent-syslog --extract-packages
     docker build -t scalyr/scalyr-docker-agent-syslog .

+#### scalyr-k8s-agent
+
+    cd scalyr-agent-2/docker
+    python ../build_package.py --no-versioned-file-name k8s_builder
+    ./scalyr-docker-agent-syslog --extract-packages
+    docker build -t scalyr/scalyr-k8s-agent .

Need to add -f Dockerfile.k8s

oliverhsu77

comment created time in 2 months

Pull request review comment scalyr/scalyr-agent-2

AGENT-413 Update Docker README for instructions building agent K8S image

Use the following commands to build the respective images:

     ./scalyr-docker-agent-syslog --extract-packages
     docker build -t scalyr/scalyr-docker-agent-syslog .

+#### scalyr-k8s-agent
+
+    cd scalyr-agent-2/docker
+    python ../build_package.py --no-versioned-file-name k8s_builder
+    ./scalyr-docker-agent-syslog --extract-packages

this should be ./scalyr-k8s-agent --extract-packages

oliverhsu77

comment created time in 2 months

Pull request review comment scalyr/scalyr-agent-2

CT-25 - add support for k8s_logs in configuration file

 def query_stats(self):
     return self.query_api("/stats/summary")

+class K8sConfigBuilder(object):
+    """Builds log configs for containers based on config snippets found in the `k8s_logs` field of
+    the config file.
+    """
+
+    def __init__(
+        self, k8s_log_configs, logger, rename_no_original, parse_format="json"
+    ):
+        """
+        @param k8s_log_configs: The config snippets from the configuration
+        @param logger: A scalyr logger
+        @param rename_no_original: A bool, used to prevent the original log file name from being added to the attributes.
+        @param parse_format: The parse format of this log config
+        """
+        self.__k8s_log_configs = k8s_log_configs
+        self._logger = logger
+        self.__rename_no_original = rename_no_original
+        self.__parse_format = parse_format
+
+    def _check_match(self, element, name, value, glob):
+        """
+        Checks to see if we have a match against the glob for a certain value
+        @param element: The index number of the element in the k8s_config list
+        @param name: A string containing the name of the field we are evaluating (used for debug log purposes)
+        @param value: The value of the field to evaluate the glob against
+        @param glob: A string containing a glob to evaluate
+        """
+        result = False
+        if glob is not None and value is not None:
+            # ignore this config if value doesn't match the glob
+            if fnmatch.fnmatch(value, glob):
+                result = True
+            else:
+                self._logger.log(
+                    scalyr_logging.DEBUG_LEVEL_2,
+                    "Ignoring k8s_log item %d because %s '%s' doesn't match '%s'"
+                    % (element, name, value, glob,),
+                )
+        return result
+
+    def get_log_config(
+        self, info, k8s_info, container_attributes, parser, rename_logfile,
+    ):
+        """
+        Creates a log_config from various attributes and then applies any `k8s_logs` configs that
+        might apply to this log
+        @param info: A dict containing docker information about the container we are creating a config for
+        @param k8s_info: A dict containing k8s information about the container we are creating a config for
+        @param container_attributes: A set of attributes to add to the log config of this container
+        @param parser: A string containing the name of the parser to use for this log config
+        @param rename_logfile: A string containing the name to use for the renamed log file
+        @return: A dict containing a log_config, or None if we couldn't create a valid config (i.e. log_path was empty)
+        """
+
+        # Make sure we have a log_path for the log config
+        path = info.get("log_path", None)
+        if not path:
+            return None
+
+        # Build up the default config that we will use
+        result = {
+            "parser": parser,
+            "path": path,
+            "parse_format": self.__parse_format,
+            "attributes": container_attributes,
+            "rename_logfile": rename_logfile,
+            "rename_no_original": self.__rename_no_original,
+        }
+
+        # If we don't have any k8s information then we don't match against k8s_log_configs
+        if k8s_info is None:
+            return result
+
+        # Now apply log configs
+        for i, config in enumerate(self.__k8s_log_configs):
+            # We check for glob matches against `k8s_pod_glob`, `k8s_namespace_glob` and `k8s_container_glob`
+
+            # Check for the pod glob
+            pod_glob = config.get("k8s_pod_glob", None)
+            pod_name = k8s_info.get("pod_name", None)
+            if not self._check_match(i, "pod_name", pod_name, pod_glob):

The main reason is that I also wanted those values for use in the following debug statement saying that we passed.

imron

comment created time in 2 months
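The matching discussed in this thread boils down to a guarded `fnmatch` call; a minimal standalone sketch of that pattern (hedged — in the real agent missing globs are handled by config defaults, which this sketch does not model):

```python
import fnmatch

def check_match(value, glob):
    # A value matches only when both the glob and the value are present
    # and the value satisfies the shell-style wildcard pattern.
    if glob is None or value is None:
        return False
    return fnmatch.fnmatch(value, glob)

print(check_match("nginx-5b4f8d7", "nginx*"))  # True
print(check_match("redis-master", "nginx*"))   # False
print(check_match(None, "nginx*"))             # False
```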

push event scalyr/scalyr-agent-2

Imron Alston

commit sha de6191aa3e42651e4b774bf12f25b1035b6c44a9

Refactor __create_log_config to make it more testable + tests

view details

push time in 2 months

Pull request review comment scalyr/scalyr-agent-2

CT-25 - add support for k8s_logs in configuration file

 def __get_last_request_for_log(self, path):
     return scalyr_util.seconds_since_epoch(result)

-    def __create_log_config(self, parser, path, attributes, parse_format="raw"):
-        """Convenience function to create a log_config dict from the parameters"""
+    def __create_log_config(
+        self, info, k8s_info, container_attributes, parser, rename_logfile
+    ):
+        """
+        Creates a log_config from various attributes and then applies any `k8s_logs` configs that
+        might apply to this log
+        @return: A dict containing a log_config, or None if we couldn't create a valid config (i.e. log_path was empty)
+        """

-        return {
+        # Make sure we have a log_path for the log config
+        path = info.get("log_path", None)
+        if not path:
+            return None
+
+        # Build up the default config that we will use
+        result = {
             "parser": parser,
             "path": path,
-            "parse_format": parse_format,
-            "attributes": attributes,
+            "parse_format": self.__parse_format,
+            "attributes": container_attributes,
+            "rename_logfile": rename_logfile,
         }

+        # This is for a hack to prevent the original log file name from being added to the attributes.
+        if self.__use_v2_attributes and not self.__use_v1_and_v2_attributes:
+            result["rename_no_original"] = True
+
+        # If we don't have any k8s information then we don't match against k8s_log_configs
+        if k8s_info is None:
+            return result
+
+        # Now apply log configs
+        for i, config in enumerate(self.__k8s_log_configs):
+            # We check for glob matches against `k8s_pod_glob`, `k8s_namespace_glob` and `k8s_container_glob`
+
+            # Check for the pod glob
+            pod_glob = config.get("k8s_pod_glob", None)
+            pod_name = k8s_info.get("pod_name", None)
+            if pod_glob is not None and pod_name is not None:
+                # ignore this config if pod_name doesn't match the glob
+                if not fnmatch.fnmatch(pod_name, pod_glob):
+                    self._logger.log(
+                        scalyr_logging.DEBUG_LEVEL_2,
+                        "Ignoring k8s_log item %d because pod_name '%s' doesn't match '%s'"
+                        % (i, pod_name, pod_glob,),
+                    )
+                    continue
+
+            # Check for the namespace glob
+            namespace_glob = config.get("k8s_namespace_glob", None)
+            pod_namespace = k8s_info.get("pod_namespace", None)
+            if namespace_glob is not None and pod_namespace is not None:
+                # ignore this config if pod_namespace doesn't match the glob
+                if not fnmatch.fnmatch(pod_namespace, namespace_glob):
+                    self._logger.log(
+                        scalyr_logging.DEBUG_LEVEL_2,
+                        "Ignoring k8s_log item %d because pod_namespace '%s' doesn't match '%s'"
+                        % (i, pod_namespace, namespace_glob,),
+                    )
+                    continue
+
+            # Check for the k8s container name glob
+            container_glob = config.get("k8s_container_glob", None)
+            k8s_container = k8s_info.get("k8s_container_name", None)
+            if container_glob is not None and k8s_container is not None:
+                # ignore this config if k8s_container doesn't match the glob
+                if not fnmatch.fnmatch(k8s_container, container_glob):
+                    self._logger.log(
+                        scalyr_logging.DEBUG_LEVEL_2,
+                        "Ignoring k8s_log item %d because k8s_container_name '%s' doesn't match '%s'"
+                        % (i, k8s_container, container_glob,),
+                    )
+                    continue
+
+            self._logger.log(
+                scalyr_logging.DEBUG_LEVEL_2,
+                "Applying k8s_log config item %d.  Matched pod_name ('%s', '%s'), pod_namespace ('%s', '%s') and k8s_container_name ('%s', '%s')"
+                % (
+                    i,
+                    pod_name,
+                    pod_glob,
+                    pod_namespace,
+                    namespace_glob,
+                    k8s_container,
+                    container_glob,
+                ),
+            )
+            # We have the first matching config.  Apply the log config and break
+            # Note, we can't just .update() because otherwise the attributes dict
+            # may get overridden, plus we also need to exclude `path`
+            for key, value in six.iteritems(config):
+                # Ignore `path` so people can't override it

After further thought, we need to be doing this, because it's not the default attributes we are merging but the container-specific attributes (pod_name, pod_namespace, etc.). We need to do that for every container anyway, so I figure it's better to do it once rather than holding state.

imron

comment created time in 2 months

Pull request review comment scalyr/scalyr-agent-2

CT-25 - add support for k8s_logs in configuration file

 def __get_last_request_for_log(self, path):
     return scalyr_util.seconds_since_epoch(result)

-    def __create_log_config(self, parser, path, attributes, parse_format="raw"):
-        """Convenience function to create a log_config dict from the parameters"""
+    def __create_log_config(
+        self, info, k8s_info, container_attributes, parser, rename_logfile
+    ):
+        """
+        Creates a log_config from various attributes and then applies any `k8s_logs` configs that
+        might apply to this log
+        @return: A dict containing a log_config, or None if we couldn't create a valid config (i.e. log_path was empty)
+        """

-        return {
+        # Make sure we have a log_path for the log config
+        path = info.get("log_path", None)
+        if not path:
+            return None
+
+        # Build up the default config that we will use
+        result = {
             "parser": parser,
             "path": path,
-            "parse_format": parse_format,
-            "attributes": attributes,
+            "parse_format": self.__parse_format,
+            "attributes": container_attributes,
+            "rename_logfile": rename_logfile,
         }

+        # This is for a hack to prevent the original log file name from being added to the attributes.
+        if self.__use_v2_attributes and not self.__use_v1_and_v2_attributes:
+            result["rename_no_original"] = True
+
+        # If we don't have any k8s information then we don't match against k8s_log_configs
+        if k8s_info is None:
+            return result
+
+        # Now apply log configs
+        for i, config in enumerate(self.__k8s_log_configs):
+            # We check for glob matches against `k8s_pod_glob`, `k8s_namespace_glob` and `k8s_container_glob`
+
+            # Check for the pod glob
+            pod_glob = config.get("k8s_pod_glob", None)
+            pod_name = k8s_info.get("pod_name", None)
+            if pod_glob is not None and pod_name is not None:

Done.

imron

comment created time in 2 months

Pull request review comment scalyr/scalyr-agent-2

CT-25 - add support for k8s_logs in configuration file

 def __get_last_request_for_log(self, path):
     return scalyr_util.seconds_since_epoch(result)

-    def __create_log_config(self, parser, path, attributes, parse_format="raw"):
-        """Convenience function to create a log_config dict from the parameters"""
+    def __create_log_config(
+        self, info, k8s_info, container_attributes, parser, rename_logfile
+    ):
+        """
+        Creates a log_config from various attributes and then applies any `k8s_logs` configs that
+        might apply to this log
+        @return: A dict containing a log_config, or None if we couldn't create a valid config (i.e. log_path was empty)
+        """

-        return {
+        # Make sure we have a log_path for the log config
+        path = info.get("log_path", None)
+        if not path:
+            return None
+
+        # Build up the default config that we will use
+        result = {
             "parser": parser,
             "path": path,
-            "parse_format": parse_format,
-            "attributes": attributes,
+            "parse_format": self.__parse_format,
+            "attributes": container_attributes,
+            "rename_logfile": rename_logfile,
         }

+        # This is for a hack to prevent the original log file name from being added to the attributes.
+        if self.__use_v2_attributes and not self.__use_v1_and_v2_attributes:
+            result["rename_no_original"] = True
+
+        # If we don't have any k8s information then we don't match against k8s_log_configs
+        if k8s_info is None:
+            return result
+
+        # Now apply log configs
+        for i, config in enumerate(self.__k8s_log_configs):

Unless we are instantiating this abstraction for every single container, the only one of these it can take is rename_no_original.

Every other field will contain values specific to the container.

So, the abstraction I made takes the k8s_logs from the config, the logger, the log file parse format (json or cri) and the rename_no_original value (all of which are consistent across all containers).

There is then a get_log_config method that takes the remaining container-specific items and returns the appropriate config.

imron

comment created time in 2 months
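The builder described in this thread can be condensed into a standalone sketch (a simplified, hypothetical version: field names come from the quoted diff, but the merge logic here is my own condensation, not the agent's code):

```python
import fnmatch

def get_log_config(k8s_log_configs, info, k8s_info, parser="docker"):
    # Default config for this container's log file.
    path = info.get("log_path")
    if not path:
        return None
    result = {"parser": parser, "path": path}
    if k8s_info is None:
        return result
    # Apply the first snippet whose pod/namespace/container globs all match
    # (missing globs default to "*", i.e. match everything).
    checks = [
        ("k8s_pod_glob", "pod_name"),
        ("k8s_namespace_glob", "pod_namespace"),
        ("k8s_container_glob", "k8s_container_name"),
    ]
    for config in k8s_log_configs:
        if all(
            fnmatch.fnmatch(k8s_info.get(field) or "", config.get(glob_key, "*"))
            for glob_key, field in checks
        ):
            for key, value in config.items():
                # `path` can't be overridden and the globs aren't log settings.
                if key != "path" and not key.endswith("_glob"):
                    result[key] = value
            break
    return result
```

Because only the first matching snippet is applied, ordering entries from most specific to least specific matters, exactly as the commit message for this PR later spells out.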

Pull request review comment scalyr/scalyr-agent-2

CT-25 - add support for k8s_logs in configuration file

 def __get_last_request_for_log(self, path):
     return scalyr_util.seconds_since_epoch(result)

-    def __create_log_config(self, parser, path, attributes, parse_format="raw"):
-        """Convenience function to create a log_config dict from the parameters"""
+    def __create_log_config(
+        self, info, k8s_info, container_attributes, parser, rename_logfile
+    ):
+        """
+        Creates a log_config from various attributes and then applies any `k8s_logs` configs that+        might apply to this log
+        @return: A dict containing a log_config, or None if we couldn't create a valid config (i.e. log_path was empty)
+        """

-        return {
+        # Make sure we have a log_path for the log config
+        path = info.get("log_path", None)
+        if not path:
+            return None
+
+        # Build up the default config that we will use
+        result = {
             "parser": parser,
             "path": path,
-            "parse_format": parse_format,
-            "attributes": attributes,
+            "parse_format": self.__parse_format,
+            "attributes": container_attributes,
+            "rename_logfile": rename_logfile,
         }

+        # This is for a hack to prevent the original log file name from being added to the attributes.
+        if self.__use_v2_attributes and not self.__use_v1_and_v2_attributes:
+            result["rename_no_original"] = True

This is now done when creating the config builder abstraction.

imron

comment created time in 2 months

Pull request review comment scalyr/scalyr-agent-2

Add lint check which will fail if any of the bundled certs will expire in 6 months or sooner

+#!/usr/bin/env python
+# Copyright 2014-2020 Scalyr Inc.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+"""
+Script which errors out if any of the bundled certs will expire in 24 months or sooner.
+"""
+
+from __future__ import absolute_import
+from __future__ import print_function
+
+if False:
+    from typing import List

If this is never True, do we need to include it?

Kami

comment created time in 2 months

push event scalyr/scalyr-agent-2

yanscalyr

commit sha 62932f1f627e27c8f4df3123bd1b8e644e77f196

CT-52: Increase default max_line_size (#550)

* CT-52: Increase default max_line_size
* Changelog note
* Changelog format
* Changelog format
* Changelog format

view details

ArthurKamalov

commit sha 32c7a922a7f9b00e7d3de3977426042a59882679

Fix syslog strings incompatibility (#552)

* SyslogHandler now accepts unicode strings, and all intermediate handlers have to convert their data before passing it to 'SyslogHandler.handle'.

Co-authored-by: Tomaz Muraus <tomaz@tomaz.me>

view details

ArthurKamalov

commit sha a2958240a46c4fa9ac0e7ffda0834cdcc546d49c

Prepare documentation for 2.1.6 release (#557) * Prepare documentation for 2.1.6 release

view details

yanscalyr

commit sha 135bbc30d5af03da4c96907ce5d10dfe92d08656

AGENT-402: Use `max_send_rate_enforcement` to rate limit sent log bytes (#551)

* AGENT-401: Implement new send rate defaults and fallback switches
* Update copying manager tests
* Test to see if defaults are causing smoketests to fail
* Revert test to see if defaults are causing smoketests to fail
* To make it easier to troubleshoot, log collectors which were found.
* Make sure we only perform agent version check once. This check operates against the local agent.log file so there is no need to wait on the Scalyr API and retry on potential missing data (which indicates that the agent may not have sent the data yet or it hasn't been fully processed by the API yet).
* Log how many matches were found to make it easier to troubleshoot.
* Test a change - if file exists already, re-open it instead of modifying mtime + atime.
* Test another change - don't pre-create the file.
* Remove testing change, reduce sleep.
* Print agent status on timeout / failure. This may help with troubleshooting.
* Test another change.
* Remove testing change.
* Change max_allowed_request_size to test smoketest issues
* Continue trying to narrow down problematic config option
* Continue trying to narrow down problematic config option
* Continue trying to narrow down problematic config option
* Continue trying to narrow down problematic config option
* Continue trying to narrow down problematic config option
* Continue trying to narrow down problematic config option
* Testing change
* Try to narrow down where `--no-fork` causes issues
* Try sleeping on fork to prove race condition
* Make sure the fork is the actual cause
* Make sure the fork is the actual cause
* Test if flushing is enough to fix the issue
* Try a sleep to test race condition
* Test closing streams
* Test closing streams
* Test stream redirecting
* Test stream redirecting
* Test stream redirecting
* Test disabling debug logging
* Test lower but still enabled debug level
* Test not uploading debug logs
* Test not closing fds
* Test None stdin
* Test a change.
* Log a message when we successfully write status data to file.
* Set one new default back to old value for test
* Test old defaults
* Test old defaults
* Test old defaults (Disable a lot of CI tests for now)
* Test old defaults
* Test old defaults
* Test old defaults
* Add AgentLogFilter to every log handler. That's the approach we used before #466 and #495 and it appears to fix the test failure issue.
* Test handler change
* Test handler change
* Cleanup
* Cleaner values for base 2 values
* Remove test changes we don't need anymore.
* Don't call close() in case handler is None (aka if __recreate_handler() returns None).
* Fix StdoutFilter. The filter wouldn't filter correctly if a custom value was provided for the stdout_severity configuration option, because getattr(logging, 'LEVEL NAME') returns a reference to the logger function for that log level and not the actual number which maps to that log level name. For that, we need to use logging.getLevelName().
* Add TODO annotation.
* handler can be None if __recreate_handler() method returns None.
* Add changelog entry.
* Update SIGTERM signal handler so we only try to log a message if the log handlers are still open. By default, the agent_main.py stop command tries to send the SIGTERM signal to the agent for up to 5 seconds with a short sleep in between each attempt. On SIGTERM we invoke a termination handler which also closes all the log handlers. This means that subsequent SIGTERM signal handlers will try to log a message when all the handlers are already closed, which will cause users to see many of the following errors in stdout: IOError: [Errno 0] Error Logged from file platform_posix.py, line 637 Traceback (most recent call last): File "/usr/local/lib/python2.7/logging/__init__.py", line 889, in emit stream.write(fs % msg)
* Make sure close_handlers() is called at the very end before shutting down. If we call it as part of run state stop callbacks, close_handlers() will be called early on, before stopping the worker thread, which means that some messages produced after that function is called and before the agent actually shuts down will be lost.
* Make sure we call .upper() on the stdout severity string. This way it works correctly when using getattr(logging, level) and we don't accidentally retrieve a reference to some other logging module variable which doesn't represent a log name. For example, if the user passed in info, we would assume the level is correct, but we would actually get a reference to the logging.info function and not the corresponding log level number.
* Add a very simple release note
* AGENT-402: Use `max_send_rate_enforcement` to rate limit sent log bytes
* Test updates and minor fix
* Leave notes to be in separate PR AGENT-406
* Review comments
* Test issues
* Test issues
* Test issues
* Add test for legacy overrides
* Review comments
* Use new configuration option
* Limit output of override warning
* Comment on RateLimiter bucket size
* Nicer comment format
* Spelling
* Better override values
* Test updates
* Add @return comment

Co-authored-by: Tomaz Muraus <tomaz@tomaz.me>

view details

Imron Alston

commit sha d6165ecb1fde1e0c996b3486167ced357bbc25a1

CT-25 - Add support for k8s_logs in config file

This commit adds support for k8s_logs in the agent config file (or config directories). Specifically, config files support a new field `k8s_logs`, which is identical to a regular `logs` stanza except:

* There is no `path` field supported
* There are 3 new optional fields, `k8s_pod_glob`, `k8s_namespace_glob` and `k8s_container_glob`, which are globs that default to `*`.

If any of those fields are present for a `k8s_logs` config, then any container log files added by the kubernetes_monitor for a given pod are filtered against all of those globs, and if all of them match, then the config is applied to the logs for that container.

Only the first matching config will be applied to a given log. This means that users should order `k8s_logs` entries from most specific to least specific.

Example config usage:

```
{
  "k8s_logs": [
    {
      // most specific, filtering on pod name and namespace
      "k8s_pod_glob": "nginx*",
      "k8s_namespace_glob": "web",
      "attributes": { "parser": "nginxLog" }
    },
    {
      // filtering on just the container name defined in the k8s yaml
      "k8s_container_glob": "*redis*",
      "attributes": { "parser": "redisLog" }
    },
    {
      // filtering just on namespaces - won't match anything already
      // matched by the previous config stanzas
      "k8s_namespace_glob": "*monitoring*",
      "attributes": { "parser": "monitoringLogs" }
    },
    {
      // least specific, `k8s_pod_glob`, `k8s_namespace_glob` and
      // `k8s_container_glob` all default to `*`, so this will match
      // everything not matched in the previous stanzas
      "attributes": { "parser": "defaultParser" }
    },
  ]
}
```

Note: You can match against a deployment by setting `k8s_pod_glob` to `"deployment*"`, because pod names for a given deployment will always start with the deployment name.

With the exception of `path`, all log configuration settings supported by `logs` are supported by `k8s_logs`, including parsers, attributes, redaction_rules and so on. Configuration options set by any k8s annotations will override the values specified in `k8s_logs`.

view details

push time in 2 months
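The deployment-matching note in the commit above works because pod names carry the deployment name as a prefix; for example:

```python
import fnmatch

# Pods created by a Deployment carry the Deployment name as a prefix
# (e.g. Deployment "frontend" -> pod "frontend-7d4b9cbbd-x2x9z"),
# so "frontend*" as the pod glob effectively selects the whole Deployment.
pods = ["frontend-7d4b9cbbd-x2x9z", "frontend-7d4b9cbbd-q8p1k", "redis-0"]
matched = [p for p in pods if fnmatch.fnmatch(p, "frontend*")]
print(matched)  # ['frontend-7d4b9cbbd-x2x9z', 'frontend-7d4b9cbbd-q8p1k']
```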

Pull request review comment scalyr/scalyr-agent-2

AGENT-402: Use `max_send_rate_enforcement` to rate limit sent log bytes

 def handle_completed_callback(result):
                 else:
                     processor.close()

+            return total_bytes_copied

Can you add a @return: value to the function comments.

yanscalyr

comment created time in 2 months
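The `max_send_rate_enforcement` feature discussed in this PR is a rate limiter over bytes sent per unit time. A generic token-bucket sketch (not the agent's actual RateLimiter API; names and the injectable clock are hypothetical):

```python
import time

class TokenBucket(object):
    """Allow up to `rate` bytes/sec, with bursts up to `bucket_size` bytes."""

    def __init__(self, rate, bucket_size, clock=time.time):
        self.rate = float(rate)
        self.bucket_size = float(bucket_size)
        self.tokens = float(bucket_size)  # start full so the first burst passes
        self.clock = clock
        self.last = clock()

    def try_consume(self, num_bytes):
        # Refill tokens based on elapsed time, capped at the bucket size.
        now = self.clock()
        self.tokens = min(self.bucket_size, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if num_bytes <= self.tokens:
            self.tokens -= num_bytes
            return True
        return False
```

The bucket size bounds the burst: a request is allowed only when enough tokens have accumulated, which over time limits throughput to the configured rate.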

PR opened scalyr/scalyr-agent-2

[WIP] add support for k8s_logs in configuration file

The PR adds a k8s_logs field to the configuration file, which is like a regular log config snippet, except that it matches against a certain k8s field rather than a file path.

Valid k8s fields are k8s_pod, k8s_namespace, k8s_controller and k8s_container. Each entry in the k8s_logs field must use the same key, e.g. they must all be k8s_pod, or all k8s_namespace, otherwise a BadConfiguration exception is raised.

The value for the field is a string containing a glob pattern to match against.

+69 -4

0 comments

1 changed file

pr created time in 2 months
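The same-key restriction described above could be enforced with a check along these lines (a hypothetical sketch, not the PR's actual code):

```python
class BadConfiguration(Exception):
    pass

VALID_KEYS = ("k8s_pod", "k8s_namespace", "k8s_controller", "k8s_container")

def validate_k8s_logs(k8s_logs):
    # Every entry must filter on the same single k8s field.
    keys = set()
    for entry in k8s_logs:
        keys.update(k for k in entry if k in VALID_KEYS)
    if len(keys) > 1:
        raise BadConfiguration(
            "k8s_logs entries must all use the same key, got: %s" % sorted(keys)
        )
```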

create branch scalyr/scalyr-agent-2

branch : imron/ct-25

created branch time in 2 months

Pull request review comment scalyr/scalyr-agent-2

Add lint check which will fail if any of the bundled certs will expire in 6 months or sooner

+#!/usr/bin/env python
+# Copyright 2014-2020 Scalyr Inc.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+"""
+Script which errors out if any of the bundled certs will expire in 6 months or sooner.
+"""
+
+from __future__ import absolute_import
+from __future__ import print_function
+
+import os
+import sys
+import glob
+import datetime
+from io import open
+
+from cryptography import x509
+from cryptography.hazmat.backends import default_backend
+
+
+def main():
+    cwd = os.path.abspath(os.getcwd())
+
+    for file_name in glob.glob("certs/*"):
+        file_path = os.path.join(cwd, file_name)
+
+        with open(file_path, "rb") as fp:
+            content = fp.read()
+
+        cert = x509.load_pem_x509_certificate(content, default_backend())
+
+        now_dt = datetime.datetime.utcnow()
+        expire_in_days = (cert.not_valid_after - now_dt).days
+
+        if now_dt + datetime.timedelta(days=30 * 6) >= cert.not_valid_after:

We should also hook it up to build_package.py and fail the build if the certs are due to expire within the same timeframe.

Kami

comment created time in 2 months

Pull request review comment scalyr/scalyr-agent-2

Add lint check which will fail if any of the bundled certs will expire in 6 months or sooner

+#!/usr/bin/env python
+# Copyright 2014-2020 Scalyr Inc.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+"""
+Script which errors out if any of the bundled certs will expire in 6 months or sooner.
+"""
+
+from __future__ import absolute_import
+from __future__ import print_function
+
+import os
+import sys
+import glob
+import datetime
+from io import open
+
+from cryptography import x509
+from cryptography.hazmat.backends import default_backend
+
+
+def main():
+    cwd = os.path.abspath(os.getcwd())
+
+    for file_name in glob.glob("certs/*"):
+        file_path = os.path.join(cwd, file_name)
+
+        with open(file_path, "rb") as fp:
+            content = fp.read()
+
+        cert = x509.load_pem_x509_certificate(content, default_backend())
+
+        now_dt = datetime.datetime.utcnow()
+        expire_in_days = (cert.not_valid_after - now_dt).days
+
+        if now_dt + datetime.timedelta(days=30 * 6) >= cert.not_valid_after:

I'd increase this to 2 years or something. We have customers that don't upgrade for years and years.

Kami

comment created time in 2 months
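The threshold under discussion (6 months in the script vs. the suggested 2 years) is a simple timedelta comparison; a standard-library-only sketch of that arithmetic:

```python
import datetime

def expires_within(not_valid_after, now, months):
    # Approximate a month as 30 days, matching the 30 * 6 check in the script.
    return now + datetime.timedelta(days=30 * months) >= not_valid_after

now = datetime.datetime(2020, 6, 1)
# A cert expiring in ~1 year trips a 24-month check but not a 6-month one.
cert_expiry = datetime.datetime(2021, 6, 1)
print(expires_within(cert_expiry, now, 24))  # True
print(expires_within(cert_expiry, now, 6))   # False
```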

CommitCommentEvent

Pull request review comment scalyr/scalyr-agent-2

[WIP] Shard Logs Across Multiple Copying Managers

 def __init__(self, add_events_request, completion_callback):
     self.next_pipelined_task = None

+class CopyingManagerFacade(StoppableThread, LogWatcher):
+    """
+    The CopyingManagerFacade is used to hide the complexity of having sharded CopyingManagers.
+
+    It conforms to the same interface as the original CopyingManager, and is responsible for creating
+    and managing multiple copying managers and splitting logs between them.

for example, it won't need a _send_events method and a bunch of the private methods.

Initially it needed them for the tests: I had made CopyingManager private to the module and just exposed CopyingManagerFacade, which passed through to an internal copying manager, and all the tests failed because the tests override CopyingManager.

I'm moving away from that and will just keep the tests for the CopyingManager as is (but using the private class).

I have no preference for using Facade, just thought it made the intent clearer in the design if other people read it. Happy to switch to ShardedCopyManager as we both agree on the general approach.

I'm leaving the checkpoints decision until after I've proven whether it works or not.

imron

comment created time in 2 months

pull request comment mhammond/pywin32

Add support for EvtFormatMessage and EvtCreateRenderContext

@ofek, thanks for picking this up!

ofek

comment created time in 2 months

PR opened scalyr/scalyr-agent-2

[WIP] Shard Logs Across Multiple Copying Managers

Problem

Due to limitations with the current ingest API, the Scalyr Agent can only have a single request in flight for a given Scalyr Agent session, and a single request can have a maximum of 6MB of data after decompression.

This places a limit on the amount of data a single agent can upload to the server, and a number of users are now running in to this limitation.

This branch is a draft to experiment with having multiple CopyingManagers, each monitoring a subset of logs, and each with its own connection and session.

This should allow us to have multiple requests in flight - each from a different session.

Proposed Solution

Create a CopyingManagerFacade with an interface identical to the current CopyingManager. The rest of the application interacts with the facade as normal, and the facade is then responsible for managing the complexity of managing multiple CopyingManagers and sharding logs between them.

This initial commit doesn't contain anything except a stub for the CopyManagerFacade. The main purpose of this commit is to create a draft PR for design discussion.

+334 -0

0 comment

1 changed file

pr created time in 3 months

create barnchscalyr/scalyr-agent-2

branch : imron/sharded-copymanager

created branch time in 3 months

push eventscalyr/scalyr-agent-2

Imron

commit sha 26f8e0a2385c3ed3dede1d204b9086344cc6ee85

AGENT-388 Add extra_config_path as an option (#541) Under kubernetes, it is not convenient to specify custom configuration snippets in the agent.d directory without overriding core configuration snippets. This PR allows users to keep the default configuration snippets we provide, but also provide their own configuration snippets in an additional configuration directory. This additional configuration directory is specified via an environment variable or a command line parameter. If present, the agent will process all .json files in this directory and add them to its current configuration. This only applies to the main agent, and not the agent config scripts.

view details

push time in 3 months

delete branch scalyr/scalyr-agent-2

delete branch : imron/agent-388

delete time in 3 months

PR merged scalyr/scalyr-agent-2

Add additional config path as an option

Under kubernetes, it is not convenient to specify custom configuration snippets in the agent.d directory without overriding core configuration snippets.

This PR allows users to keep the default configuration snippets we provide, but also provide their own configuration snippets in an additional configuration directory.

This additional configuration directory is specified via an environment variable or a command line parameter.

If present, the agent will process all .json files in this directory and add them to its current configuration.

This only applies to the main agent, and not the agent config scripts.

+139 -7

0 comment

4 changed files

imron

pr closed time in 3 months

push eventscalyr/scalyr-agent-2

Imron Alston

commit sha 314e123a5b014158fe09e95550552fe2e2e39ac7

AGENT-388 Add extra_config_path as an option Under kubernetes, it is not convenient to specify custom configuration snippets in the agent.d directory without overriding core configuration snippets. This PR allows users to keep the default configuration snippets we provide, but also provide their own configuration snippets in an additional configuration directory. This additional configuration directory is specified via an environment variable or a command line parameter. If present, the agent will process all .json files in this directory and add them to its current configuration. This only applies to the main agent, and not the agent config scripts.

view details

push time in 3 months

push eventscalyr/scalyr-agent-2

czerwingithub

commit sha eba43ecdac1d31e960882a8505499adfd9beecbc

Fix invalid code coverage config (#544) The `verify-codecov-config.sh` script is failing. I believe this is the change that will fix it, but not completely confident.

view details

Tomaz Muraus

commit sha 4a387d009f1e3214075d5bb13cffb7085bed5921

Fix unit and smoke tests failure under Python 2.6 (#546) * Don't install various deb and Python dependencies when running unit and smoke tests under Python 2.6 and skip some tests which depend on those libraries. For the last couple of days the tests have been failing due to conflicting versions of some packages. Sadly we can't easily resolve those version conflicts (it's hard to get up to date Docker image with Python 2.6 and we don't want maintain our own at this point).

view details

yanscalyr

commit sha 2fbecb895a663590f32b1dc9e41a90d6dc866093

AGENT-398: Add a method to RateLimiter that will block until there is capacity (#543) * Add a method to RateLimiter that will block until there is capacity * Actually write the test I wanted instead of copying the method and forgetting * Review comments * Review comments * Change to unit test so it passes under python2.6

view details

yanscalyr

commit sha a0672d38fb9e3c96e98a9ebb7a2128528d51827a

Method to parse strings representing data rates (#545) * Method to parse strings representing data rates * Review comments

view details

Imron

commit sha bd9ff040d1df8fa6ae138024f77d98cc281d02a9

AGENT-270/373 - Ensure we don't ignore short-lived pods (#542) Problem -- When the k8s monitor is using the docker socket for querying containers, it misses out on short-lived pods. The reason this happens is that when we query the docker socket, we query all containers (running and stopped), but filter them by containers started after the previous time we queried the containers. This allows us to pick up short-lived containers that started and stopped since our previous query. However, since the k8s ratelimiter changes were added, if no matching pod is found in the k8s cache then the container is also skipped. This will always happen the first time a pod is queried on the cache, and so it means that short-lived pods are now ignored, because they won't show up when we next query the list of running containers, because it will only show containers created after our previous call. Solution -- I've added a field `has_been_used` to the WarmingEntry of the cache warmer, which is set to False upon creation of the WarmingEntry, and a mechanism for querying this field `pending_first_use`, which returns the opposite of `has_been_used`. Now when we query containers, if a container is "old and dead" but the warming entry for the container is "pending first use" then we continue processing the container until it has been marked as used. We then only mark the entry as 'used' when we have read it from the cache. This ensures that we don't ignore any containers of interest until they have been queried from the cache at least once. Co-authored-by: Imron Alston <imron@scalyr.com>

view details

Imron Alston

commit sha 0733c59f61cbe4ada4b4ada9d582d8aaa4589afc

add additional config path as an option

view details

Imron Alston

commit sha 49a0058d9ff18137a61aeed95f86ffa21c63c032

rename addition_config_dir to extra_config_dir

view details

Imron Alston

commit sha 378033f548a6307afc20ee225709ef01725b20e3

Add unit tests for extra_config_dir

view details

push time in 3 months

push eventscalyr/scalyr-agent-2

Imron

commit sha bd9ff040d1df8fa6ae138024f77d98cc281d02a9

AGENT-270/373 - Ensure we don't ignore short-lived pods (#542) Problem -- When the k8s monitor is using the docker socket for querying containers, it misses out on short-lived pods. The reason this happens is that when we query the docker socket, we query all containers (running and stopped), but filter them by containers started after the previous time we queried the containers. This allows us to pick up short-lived containers that started and stopped since our previous query. However, since the k8s ratelimiter changes were added, if no matching pod is found in the k8s cache then the container is also skipped. This will always happen the first time a pod is queried on the cache, and so it means that short-lived pods are now ignored, because they won't show up when we next query the list of running containers, because it will only show containers created after our previous call. Solution -- I've added a field `has_been_used` to the WarmingEntry of the cache warmer, which is set to False upon creation of the WarmingEntry, and a mechanism for querying this field `pending_first_use`, which returns the opposite of `has_been_used`. Now when we query containers, if a container is "old and dead" but the warming entry for the container is "pending first use" then we continue processing the container until it has been marked as used. We then only mark the entry as 'used' when we have read it from the cache. This ensures that we don't ignore any containers of interest until they have been queried from the cache at least once. Co-authored-by: Imron Alston <imron@scalyr.com>

view details

push time in 3 months

delete branch scalyr/scalyr-agent-2

delete branch : imron/agent-373

delete time in 3 months

PR merged scalyr/scalyr-agent-2

AGENT-373 K8 agent doesn't collect logs from a CronJob

When the k8s monitor is using the docker socket for querying containers, it misses out on short-lived pods.

The reason this happens is that when we query the docker socket, we query all containers (running and stopped), but filter them by containers started after the previous time we queried the containers.

This allows us to pick up short-lived containers that started and stopped since our previous query.

However, since the k8s ratelimiter changes were added, if no matching pod is found in the k8s cache then the container is also skipped. This will always happen the first time a pod is queried on the cache, and so it means that short-lived pods are now ignored, because they won't show up when we next query the list of running containers, because it will only show containers created after our previous call.

There is no clean way to address this issue, however after investigating a number of different options and discussions with Steven, the way we will handle this is to extend the amount of time we allow for dead containers.

+119 -4

2 comments

2 changed files

imron

pr closed time in 3 months

push eventscalyr/scalyr-agent-2

Imron Alston

commit sha 8f5d8e779f6428d0fc858c1aab108b45c40c1b8a

rename addition_config_dir to extra_config_dir

view details

push time in 3 months

Pull request review commentmhammond/pywin32

Add support for EvtFormatMessage and EvtCreateRenderContext

 static LPWSTR FormatMessageInternal(EVT_HANDLE metadata, EVT_HANDLE event, DWORD 		char buf[2048]; 		sprintf(buf, "EvtFormatMessage: allocated %d, need buffer of size %d", allocated_size, returned_size); 		PyWin_SetAPIError(buf, err);+		free(buf);

This isn't freeing the allocated buffer, because buf is redefined as an array on the stack within this scope, so you'll be freeing the stack variable!

Maybe rename char buf[2048] to to char errorMessage[2048] and then have sprintf and PyWin_SetAPIError use errorMessage rather than buf.

This way buf will still refer to the buf defined in the outer scope.

ofek

comment created time in 3 months

push eventscalyr/scalyr-agent-2

czerwingithub

commit sha eba43ecdac1d31e960882a8505499adfd9beecbc

Fix invalid code coverage config (#544) The `verify-codecov-config.sh` script is failing. I believe this is the change that will fix it, but not completely confident.

view details

Tomaz Muraus

commit sha 4a387d009f1e3214075d5bb13cffb7085bed5921

Fix unit and smoke tests failure under Python 2.6 (#546) * Don't install various deb and Python dependencies when running unit and smoke tests under Python 2.6 and skip some tests which depend on those libraries. For the last couple of days the tests have been failing due to conflicting versions of some packages. Sadly we can't easily resolve those version conflicts (it's hard to get up to date Docker image with Python 2.6 and we don't want maintain our own at this point).

view details

yanscalyr

commit sha 2fbecb895a663590f32b1dc9e41a90d6dc866093

AGENT-398: Add a method to RateLimiter that will block until there is capacity (#543) * Add a method to RateLimiter that will block until there is capacity * Actually write the test I wanted instead of copying the method and forgetting * Review comments * Review comments * Change to unit test so it passes under python2.6

view details

yanscalyr

commit sha a0672d38fb9e3c96e98a9ebb7a2128528d51827a

Method to parse strings representing data rates (#545) * Method to parse strings representing data rates * Review comments

view details

Imron Alston

commit sha b2d0863e34c8444eaf97ce3d33af13cd0ea7d4c2

AGENT-270/373 - Ensure we don't ignore short-lived pods Problem -- When the k8s monitor is using the docker socket for querying containers, it misses out on short-lived pods. The reason this happens is that when we query the docker socket, we query all containers (running and stopped), but filter them by containers started after the previous time we queried the containers. This allows us to pick up short-lived containers that started and stopped since our previous query. However, since the k8s ratelimiter changes were added, if no matching pod is found in the k8s cache then the container is also skipped. This will always happen the first time a pod is queried on the cache, and so it means that short-lived pods are now ignored, because they won't show up when we next query the list of running containers, because it will only show containers created after our previous call. Solution -- I've added a field `has_been_used` to the WarmingEntry of the cache warmer, which is set to False upon creation of the WarmingEntry, and a mechanism for querying this field `pending_first_use`, which returns the opposite of `has_been_used`. Now when we query containers, if a container is "old and dead" but the warming entry for the container is "pending first use" then we continue processing the container until it has been marked as used. We then only mark the entry as 'used' when we have read it from the cache. This ensures that we don't ignore any containers of interest until they have been queried from the cache at least once.

view details

push time in 3 months

push eventscalyr/agent-poc

Imron Alston

commit sha 316f860edb5f5bcb383152bcc5b0b68e86f93ed1

more formatting

view details

push time in 3 months

push eventscalyr/agent-poc

Imron Alston

commit sha dbe496a5a343922281f1c9f349c2e001c884ee10

formatting changes

view details

push time in 3 months

push eventscalyr/agent-poc

Imron Alston

commit sha 49fcf324e11054834dde0849bbd68f92c39e607e

Added README for standalone version

view details

push time in 3 months

push eventscalyr/agent-poc

Imron Alston

commit sha 355ae289a1b3524d202d7a2d6b73ce6dfbcde1b4

add k8s config map

view details

Imron Alston

commit sha 3e439c85092a5f46626f495f4cb1ce3f457d9e23

Add README and other other config files

view details

push time in 3 months

Pull request review commentscalyr/scalyr-agent-2

WIP: AGENT-373 K8 agent doesn't collect logs from a CronJob

 def _get_containers(                                             ignore_k8s_api_exception=True,                                         )                                         if pod:+                                            # We've read the pod from the cache, so any WarmingEntry associated+                                            # with this is no longer new.  This ensures we pick up short lived+                                            # logs that finished before they had a warm entry in the k8s_cache+                                            if controlled_warmer is not None:+                                                controlled_warmer.mark_not_new(cid)

Added a parameter to mark_has_been_used to say whether or not to also unmark from the warmer.

imron

comment created time in 3 months

Pull request review commentscalyr/scalyr-agent-2

WIP: AGENT-373 K8 agent doesn't collect logs from a CronJob

 def _get_containers(                 if ignore_container is not None and cid == ignore_container:                     continue -                # Note we need to *include* results that were created after the 'running_or_created_after' time.-                # that means we need to *ignore* any containers created before that-                # hence the reason 'create_before' is assigned to a value named '...created_after'-                if _ignore_old_dead_container(-                    container, created_before=running_or_created_after-                ):-                    continue+                # AGENT-373/270 - we don't want to ignore containers that are still+                # waiting to be warmed, otherwise we will miss out on short-lived+                # logs

Done, but note that I didn't use the ternary syntax pending_first_use = controlled_warmer.pending_first_use(container) if controlled_warmer is not None else False because the Black formatter made it just as many lines as the original but made it more awkward to read.

imron

comment created time in 3 months

Pull request review commentscalyr/scalyr-agent-2

WIP: AGENT-373 K8 agent doesn't collect logs from a CronJob

 def _prepare_to_stop(self):         finally:             self.__condition_var.release() +    def mark_not_new(self, container_id):+        """+        Marks a WarmingEntry as no longer new+        @param container_id: the container id of the warming entry to mark as not new+        """+        self.__lock.acquire()+        try:+            entry = self.__active_pods.get(container_id, None)+            if entry:+                entry.is_new = False++        finally:+            self.__lock.release()++    def is_new(self, container_id):

As per conversation earlier, this was changed to pending_first_use.

imron

comment created time in 3 months

push eventscalyr/scalyr-agent-2

Imron Alston

commit sha f5e22460ed09e175cd238e927db524231f838198

address code review issues

view details

push time in 3 months

create barnchscalyr/agent-poc

branch : customer_c

created branch time in 3 months

push eventscalyr/scalyr-agent-2

Imron Alston

commit sha b34a6510b055ff0c113542e3ab778b657c8829f8

AGENT-270/373 - Ensure we don't ignore short-lived pods Problem -- When the k8s monitor is using the docker socket for querying containers, it misses out on short-lived pods. The reason this happens is that when we query the docker socket, we query all containers (running and stopped), but filter them by containers started after the previous time we queried the containers. This allows us to pick up short-lived containers that started and stopped since our previous query. However, since the k8s ratelimiter changes were added, if no matching pod is found in the k8s cache then the container is also skipped. This will always happen the first time a pod is queried on the cache, and so it means that short-lived pods are now ignored, because they won't show up when we next query the list of running containers, because it will only show containers created after our previous call. Solution -- I've added a field `is_new` to the WarmingEntry of the cache warmer, which is set to True upon creation of the WarmingEntry. Now when we query containers, if the warming entry for the container "is new" then we don't ignore them if they were created before the previous iteration of the loop. We then only mark the entry as 'not new' when we have read it from the cache. This ensures that we don't ignore any containers of interest until they have been queried from the cache at least once.

view details

push time in 3 months

pull request commentscalyr/scalyr-agent-2

WIP: AGENT-373 K8 agent doesn't collect logs from a CronJob

Ok. This approach doesn't work either.

Imagine:

  1. 0s - first iteration of loop
  2. 2s - container 123 created and finished
  3. 5s - first iteration of the loop where 123 is detected previous iteration was at 0s, and 123 was created after 0s so it is included but... 123 isn't in cache so it is marked as 'waiting to warm' and dropped from this iteration cachewarmer then sends query to api for container 123
  4. 7s - cachewarmer receives response from api, and container 123 is no longer pending because it is now warm
  5. 10s - second iteration through the loop where 123 is detected we see container 123 was created before 5s so plan to ignore it we check cachewarmer for pending containers, but 123 is no longer pending, so we do ignore it

Instead I've gone with a variation on this approach

  • When a WarmingEntry is created, it is marked as new.
  • We only ignore dead containers if the WarmingEntry for the container is not new
  • We don't set the WarmingEntry to 'not new' until we have read it from the cache at least once.

There's no need to worry about building up dead containers because, because k8s will eventually remove them from docker, and then we'll stop trying to query them.

I've tested this and it works, and will be uploading it in a moment.

imron

comment created time in 3 months

push eventscalyr/scalyr-agent-2

Imron Alston

commit sha ff51fb3c8feccc065fa213deecf4a72b104ea28d

Ensure we don't ignore short-lived pods

view details

push time in 3 months

Pull request review commentscalyr/scalyr-agent-2

[WIP] add additional config path as an option

 def stop(self):         help="Read configuration from FILE",         metavar="FILE",     )+    parser.add_option(+        "--additional-config-dir",

I also imagine the most common usage of this (k8s) will be done with environment vars, and for non-k8s usage it's not really that necessary (this feature came about because k8s configmaps can't map sub-directories to files, but no such problem exists in standalone world).

imron

comment created time in 3 months

Pull request review commentscalyr/scalyr-agent-2

[WIP] add additional config path as an option

 def stop(self):         help="Read configuration from FILE",         metavar="FILE",     )+    parser.add_option(+        "--additional-config-dir",

The problem with having it as a list is that someone could redefine the list - and accidentally leave of agent.d, which would then break the agent (at least in k8s world where we have a bunch of essential files in agent.d).

Happy to rename to extra-config-dir?

imron

comment created time in 3 months

Pull request review commentmhammond/pywin32

Add support for EvtFormatMessage and EvtCreateRenderContext

 static PyObject *RenderEventValues(EVT_HANDLE render_context, EVT_HANDLE event) 		if (err != ERROR_INSUFFICIENT_BUFFER) { 			PyWin_SetAPIError("EvtRender", err); 			goto cleanup;-		} else {-			// allocate buffer size-			allocated_size = returned_size;-			variants = (PEVT_VARIANT)malloc(allocated_size);-			if (variants == NULL) {-				PyErr_NoMemory();-				goto cleanup;-			}--			Py_BEGIN_ALLOW_THREADS-			bsuccess = EvtRender(render_context, event, EvtRenderEventValues, allocated_size, variants, &returned_size, &prop_count);-			Py_END_ALLOW_THREADS

That part looks fine. I'll leave it to mark to comment on the rest of the PR.

ofek

comment created time in 3 months

Pull request review commentmhammond/pywin32

Add support for EvtFormatMessage and EvtCreateRenderContext

 static PyObject *RenderEventValues(EVT_HANDLE render_context, EVT_HANDLE event) 		if (err != ERROR_INSUFFICIENT_BUFFER) { 			PyWin_SetAPIError("EvtRender", err); 			goto cleanup;-		} else {-			// allocate buffer size-			allocated_size = returned_size;-			variants = (PEVT_VARIANT)malloc(allocated_size);-			if (variants == NULL) {-				PyErr_NoMemory();-				goto cleanup;-			}--			Py_BEGIN_ALLOW_THREADS-			bsuccess = EvtRender(render_context, event, EvtRenderEventValues, allocated_size, variants, &returned_size, &prop_count);-			Py_END_ALLOW_THREADS

By remove else block, I think it just means the literal else and not the contents of the block.

The goto statement on the line before the else means that having an else to control the flow is unnecessary.

The contents of the else block are essential to the correct behaviour of this function, otherwise memory won't be allocated and also the bsuccess field won't be set to the result of the EvtRender call.

ofek

comment created time in 3 months

pull request commentmhammond/pywin32

Add support for EvtFormatMessage and EvtCreateRenderContext

Not at all!

imron

comment created time in 3 months

pull request commentscalyr/scalyr-agent-2

WIP: AGENT-373 K8 agent doesn't collect logs from a CronJob

I had spotted the same flaw and was trying to think of a good way out of it and I think what you've suggested is good.

The code in this commit is because I'm trying to get in to the habit of creating [WIP] draft PRs once I start working on a ticket, and I couldn't find a way to do that without having at least one commit on the branch, hence that initial commit.

imron

comment created time in 3 months

PR opened scalyr/scalyr-agent-2

WIP: AGENT-373 K8 agent doesn't collect logs from a CronJob

When the k8s monitor is using the docker socket for querying containers, it misses out on short-lived pods.

The reason this happens is that when we query the docker socket, we query all containers (running and stopped), but filter them by containers started after the previous time we queried the containers.

This allows us to pick up short-lived containers that started and stopped since our previous query.

However, since the k8s ratelimiter changes were added, if no matching pod is found in the k8s cache then the container is also skipped. This will always happen the first time a pod is queried on the cache, and so it means that short-lived pods are now ignored, because they won't show up when we next query the list of running containers, because it will only show containers created after our previous call.

There is no clean way to address this issue, however after investigating a number of different options and discussions with Steven, the way we will handle this is to extend the amount of time we allow for dead containers.

+21 -1

0 comment

1 changed file

pr created time in 3 months

push eventscalyr/scalyr-agent-2

Imron Alston

commit sha f020f1b5f72bc949488a482e278cb0182bfdba9e

WIP: allow extra time when querying short lived containers

view details

push time in 3 months

delete branch scalyr/scalyr-agent-2

delete branch : v2.0.53

delete time in 3 months

delete branch scalyr/scalyr-agent-2

delete branch : tests

delete time in 3 months

delete branch scalyr/scalyr-agent-2

delete branch : imron/third_party_tls

delete time in 3 months

delete branch scalyr/scalyr-agent-2

delete branch : imron/namespace-whitelist

delete time in 3 months

delete branch scalyr/scalyr-agent-2

delete branch : imron/monotonic

delete time in 3 months

delete branch scalyr/scalyr-agent-2

delete branch : imron/cpu-metrics

delete time in 3 months

delete branch scalyr/scalyr-agent-2

delete branch : imron/agent-374

delete time in 3 months

delete branch scalyr/scalyr-agent-2

delete branch : imron/agent-239-mysql

delete time in 3 months

delete branch scalyr/scalyr-agent-2

delete branch : imron/agent-115-container-blacklist

delete time in 3 months

create barnchscalyr/scalyr-agent-2

branch : tests

created branch time in 3 months

more