Startup issue in latest Agent nightly (1.40.0-6-nightly)
Incident Report for Netdata
Resolved
All packages have been published. If your nodes are still on 1.40.0-6, please refer to the instructions to upgrade: https://learn.netdata.cloud/docs/maintaining/update-netdata-agents#updates-for-most-systems. We are now closing this incident, but please let us know if things are still not working on your nodes.
Posted Jun 16, 2023 - 17:58 UTC
Update
The source tarballs with the fix for native builds are now available. Packages for ARM systems are still building but should be fully published and available by 17:00 UTC at the latest.
Posted Jun 16, 2023 - 14:57 UTC
Monitoring
The native packages for x86-based distributions have been published. The ARM packages and the static builds are still building and should follow shortly. We're watching Netdata Cloud and the various social networking tools to track the outcome of the new builds.
Posted Jun 16, 2023 - 14:42 UTC
Update
The fix has been merged and we've kicked off the build process for the packages. We will provide another update when the packages for the affected systems have been pushed.
Posted Jun 16, 2023 - 13:24 UTC
Update
We have created a fix for this issue. It combines two changes: systemd no longer changes the ownership and permissions of the directories the Agent uses, and the Agent now changes permissions recursively to recover from the effects of the bad version. As soon as we've tested the fix and the packages have been built, we will trigger an explicit push to the nightlies repos.
Posted Jun 16, 2023 - 11:43 UTC
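To illustrate the recovery half of the fix described in this update, here is a minimal Python sketch of a recursive ownership and permission reset. It is not the Agent's actual code: the function name `restore_tree`, the choice to skip symlinks, and the rule "owner must be able to read and write files and traverse directories" are all assumptions made for the example.

```python
import os
import stat


def restore_tree(root, uid, gid):
    """Recursively reset owner/group under `root` and make sure the owner
    can read/write files and traverse directories.

    Hypothetical helper for illustration only. Returns the list of paths
    that were modified.
    """
    changed = []

    def fix(path):
        st = os.lstat(path)
        if stat.S_ISLNK(st.st_mode):
            return  # assumption: leave symlinks untouched
        touched = False
        if (st.st_uid, st.st_gid) != (uid, gid):
            os.chown(path, uid, gid)
            touched = True
        # Directories need rwx for the owner, regular files need rw.
        if stat.S_ISDIR(st.st_mode):
            wanted = stat.S_IRWXU
        else:
            wanted = stat.S_IRUSR | stat.S_IWUSR
        mode = stat.S_IMODE(st.st_mode)
        if mode & wanted != wanted:
            os.chmod(path, mode | wanted)
            touched = True
        if touched:
            changed.append(path)

    # Fix the root first, then fix each directory's children while visiting
    # the parent, so os.walk can descend into them afterwards.
    fix(root)
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            fix(os.path.join(dirpath, name))
    return changed
```

Run as root, a call such as `restore_tree("/var/lib/netdata", uid, gid)` with the IDs of the Agent's service user would apply this to one directory. The path and the service user are assumptions here and may differ per installation; the published packages already include the real fix, so this is only a sketch of the idea.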
Update
While we are working on a fix, which requires a new package to be built, we have developed a workaround. It requires downgrading the Agent to 1.40.0-2-nightly and fixing the permissions. For Debian-based systems, this script, run as root, should work: https://gist.github.com/ralphm/1326498c474aaacf0a12f9e569dac863
Posted Jun 16, 2023 - 08:11 UTC
Identified
Agents running the most recent nightly (1.40.0-6-nightly) fail to start on some platforms because of a permissions issue. We believe the culprit is this change: https://github.com/netdata/netdata/pull/14890, and are working on a fix. As the failure happens early in Agent startup, it affects Cloud and non-Cloud users alike.
Posted Jun 16, 2023 - 06:53 UTC
Investigating
We are currently investigating an issue with Agent connectivity to Netdata Cloud.
Posted Jun 16, 2023 - 05:55 UTC
This incident affected: Agent - Cloud Connection (ACLK) and Agent (all platforms).