TrueNAS Scale: UPS (NUT) service fails to start correctly
Description
Problem/Justification
Impact
SmartDraw Connector
Katalon Manual Tests (BETA)
Activity
Scott Harris March 19, 2021 at 6:18 AM
Im happy to provide diagnostic stuff if craig can't as this was affecting me also. Although i have fixed using the following command
systemctl enable nut-server
On reboot everything is happy.
Waqar March 18, 2021 at 7:08 PM
ping @Craig King
Waqar March 9, 2021 at 6:00 PM
I wonder if it has something to do with the fact of NUT trying to start before networking is configured and then failing in the process. Can you please share the file created by "systemd-analyze plot > bootorder.svg" ?
Craig King February 23, 2021 at 7:19 PM(edited)
Here you go @Waqar, with a bit more information for good measure.
Immediately after restart:
The UI shows the service is running:
The output you requested:
truenas# midclt call ups.config | jq .
{ "id": 1,
"mode": "MASTER",
"identifier": "ups",
"remotehost": "192.168.0.2",
"remoteport": 3493,
"driver": "usbhid-ups$Back-UPS USB",
"port": "auto",
"options": "",
"optionsupsd": "",
"description": "",
"shutdown": "BATT",
"shutdowntimer": 30,
"monuser": "upsmon",
"monpwd": "UbPZV3Nr4NmUp7P",
"extrausers": "",
"rmonitor": false,
"emailnotify": false,
"toemail": [],
"subject": "UPS report generated by %h",
"powerdown": false,
"nocommwarntime": null,
"hostsync": 15,
"shutdowncmd": null,
"complete_identifier":"ups@localhost:3493" }
(Test system, so password is random and not sensitive)
truenas# journalctl -u nut-server
–- Journal begins at Tue 2021-02-23 11:01:43 PST, ends at Tue 2021-02-23 11:04:48 PST. --
-– No entries --
truenas# journalctl -u nut-upsmon
-- Journal begins at Tue 2021-02-23 11:01:43 PST, ends at Tue 2021-02-23 11:04:53 PST. --
-- No entries –-
Note that nut-server is not running...
truenas# systemctl status nut-server
● nut-server.service - Network UPS Tools - power devices information server
Loaded: loaded (/lib/systemd/system/nut-server.service; disabled; vendor preset: disabled)
Active: inactive (dead)
... so nut-monitor can't connect.
truenas# systemctl status nut-monitor
● nut-monitor.service - Network UPS Tools - power device monitor and shutdown controller
Loaded: loaded (/lib/systemd/system/nut-monitor.service; enabled; vendor preset: disabled)
Active: active (running) since Tue 2021-02-23 11:02:07 PST; 3min 35s ago
Main PID: 1605 (upsmon)
Tasks: 2 (limit: 9485)
Memory: 3.3M
CGroup: /system.slice/nut-monitor.service
├─1604 /lib/nut/upsmon
└─1605 /lib/nut/upsmon}}Feb 23 11:04:53 truenas.local upsmon[1605]: UPS [ups@localhost:3493]: connect failed: Connection failure: Connection refused
Feb 23 11:04:58 truenas.local upsmon[1605]: UPS [ups@localhost:3493]: connect failed: Connection failure: Connection refused
Feb 23 11:05:03 truenas.local upsmon[1605]: UPS [ups@localhost:3493]: connect failed: Connection failure: Connection refused
Feb 23 11:05:08 truenas.local upsmon[1605]: UPS [ups@localhost:3493]: connect failed: Connection failure: Connection refused
Feb 23 11:05:13 truenas.local upsmon[1605]: UPS [ups@localhost:3493]: connect failed: Connection failure: Connection refused
Feb 23 11:05:18 truenas.local upsmon[1605]: UPS [ups@localhost:3493]: connect failed: Connection failure: Connection refused
Feb 23 11:05:23 truenas.local upsmon[1605]: UPS [ups@localhost:3493]: connect failed: Connection failure: Connection refused
Feb 23 11:05:28 truenas.local upsmon[1605]: UPS [ups@localhost:3493]: connect failed: Connection failure: Connection refused
Feb 23 11:05:33 truenas.local upsmon[1605]: UPS [ups@localhost:3493]: connect failed: Connection failure: Connection refused
Feb 23 11:05:38 truenas.local upsmon[1605]: UPS [ups@localhost:3493]: connect failed: Connection failure: Connection refused
... and the UPS cannot be listed.
truenas# upsc -l
Error: Connection failure: Connection refused
Then, restarting the nut-server service:
truenas# systemctl restart nut-server
... happy days:
truenas# journalctl -u nut-server
–- Journal begins at Tue 2021-02-23 11:01:43 PST, ends at Tue 2021-02-23 11:08:26 PST. –-
Feb 23 11:07:44 truenas.local systemd[1]: Starting Network UPS Tools - power devices information server...
Feb 23 11:07:44 truenas.local upsd[4490]: fopen /run/nut/upsd.pid: No such file or directory
Feb 23 11:07:44 truenas.local upsd[4490]: listening on ::1 port 3493
Feb 23 11:07:44 truenas.local upsd[4490]: listening on 127.0.0.1 port 3493
Feb 23 11:07:44 truenas.local upsd[4490]: listening on ::1 port 3493
Feb 23 11:07:44 truenas.local upsd[4490]: listening on 127.0.0.1 port 3493
Feb 23 11:07:44 truenas.local upsd[4490]: Connected to UPS [ups]: usbhid-ups-ups
Feb 23 11:07:44 truenas.local upsd[4490]: Connected to UPS [ups]: usbhid-ups-ups
Feb 23 11:07:44 truenas.local systemd[1]: Started Network UPS Tools - power devices information server.
Feb 23 11:07:44 truenas.local upsd[4491]: Startup successful
Feb 23 11:07:48 truenas.local upsd[4491]: User upsmon@::1 logged into UPS [ups]
truenas# journalctl -u nut-upsmon
–- Journal begins at Tue 2021-02-23 11:01:43 PST, ends at Tue 2021-02-23 11:08:57 PST. –-
-– No entries –-
truenas# upsc -l
Init SSL without certificate database
ups
Edit: formatting
Edit #2: formatting, for real this time
After configuring the UPS service and setting it to automatically start, it does not start up correctly after a full power cycle.
Instead, an alert is SOMETIMES generated, reflecting communication loss with the UPS:
And the system logs show the following:
At this point, the UPS cannot be seen via "upsc -l".
Issuing "systemctl restart nut-monitor" fixes the situation and the UPS is again visible via "upsc -l", implying that the NUT configuration being saved is correctly but that configuration is perhaps not available to the service at the correct time.
Related to https://ixsystems.atlassian.net/browse/NAS-109513#icft=NAS-109513 and https://ixsystems.atlassian.net/browse/NAS-109449#icft=NAS-109449 (debug available on the latter).