The number of connected Agents has returned to expected levels and the number of Agents running the affected nightly build is decreasing, so we consider this incident resolved.
Posted Dec 15, 2022 - 06:58 UTC
Monitoring
The new build (1.37.0-55) has completed for most platforms. If you are on the affected version (1.37.0-48) and want to upgrade your Agents manually, please follow the instructions at https://learn.netdata.cloud/docs/agent/packaging/installer/update. If you have automatic updates configured, you can instead wait for the update to run during your local night-time.
We will be monitoring the progress of Agents as they reconnect.
Posted Dec 14, 2022 - 19:02 UTC
Update
The new build (1.37.0-55) has been triggered and we will post an update when it is ready. We will include instructions on how to update manually, or you can wait for the automatic update to run during your local night-time.
Note:
* If you are running a nightly build older than 1.37.0-48, you are not affected and no action is required.
* If you are running a stable build, you are not affected and no action is required. However, we do strongly recommend upgrading to 1.37.1 because of two security vulnerabilities in older versions.
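If you manage many Agents, the rule in the note above can be checked in a script. The following is a minimal sketch, not an official tool: it assumes the version string looks like `1.37.0-48-nightly` (an optional leading `v`, then `MAJOR.MINOR.PATCH`, then an optional `-N` build number and suffix), and the function names `parse_version` and `is_affected` are hypothetical. How you obtain the version string from each Agent is left to you.

```python
import re

# The only build this notice names as affected: nightly 1.37.0-48.
AFFECTED_BUILD = (1, 37, 0, 48)

# Assumed format: optional "v", MAJOR.MINOR.PATCH, optional "-N" build number.
# Anything after that (for example "-nightly") is ignored.
_VERSION_RE = re.compile(r"^v?(\d+)\.(\d+)\.(\d+)(?:-(\d+))?")


def parse_version(text):
    """Return (major, minor, patch, build); build is None for stable builds."""
    match = _VERSION_RE.match(text.strip())
    if match is None:
        raise ValueError(f"unrecognized version string: {text!r}")
    major, minor, patch, build = match.groups()
    return (
        int(major),
        int(minor),
        int(patch),
        int(build) if build is not None else None,
    )


def is_affected(text):
    """True only for the exact nightly build named in this notice."""
    return parse_version(text) == AFFECTED_BUILD
```

For example, `is_affected("1.37.0-48-nightly")` is true, while an older nightly such as `1.37.0-47-nightly`, the fixed build `1.37.0-55-nightly`, and the stable `1.37.1` are all reported as not affected.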
Posted Dec 14, 2022 - 17:17 UTC
Identified
We have identified the offending change in the Agent.
Only the latest nightly build (1.37.0-48-nightly) of the Agent is affected. The problem only occurs when the Agent tries to reconnect after having lost its first connection to Cloud. This means that restarting your Agent avoids the problem until its connection to Cloud next drops.
We will issue a new nightly build that removes the offending change.
Posted Dec 14, 2022 - 15:56 UTC
Update
We are able to reproduce the issue and are attempting to pinpoint the cause.
Posted Dec 14, 2022 - 14:38 UTC
Investigating
We are seeing an increasing number of Agents that cannot (properly) connect to Cloud. We are investigating the cause, but initial indications are that it may be related to the latest nightly release of the Agent (version 1.37.0-48-nightly).