Divide by Zero Error on Boot

Description

I upgraded from TrueNAS SCALE 21.06 BETA to 21.08 using the GUI. Upon reboot I got a red screen of death showing a "divide by zero" error, and TrueNAS would no longer boot.

To fix it, I downloaded and booted from the 21.08 USB image and tried the upgrade again when prompted. This failed. I then repeated the process, and the only option left was a clean install, which completed successfully. I then restored the settings from the backup file I had made just prior to the reboot, and all is working fine again.

Later I tried rebooting again and the same divide by zero error popped up. I repeated the install and restore to fix it.
So the system works fine until I reboot, and then fails catastrophically, as it can't boot or recover without a fresh install/restore. I've attached a picture of the error message shown.

The system it is running on is an HP Smartserver Gen8 with 16GB of RAM, using 5x 4TB HDDs in RAIDZ1 as one vdev, plus an 8TB USB3 HDD as a single stripe (for temporary/non-critical data).
Prior to TrueNAS SCALE 21.08 this was running Ubuntu 20.04 LTS and then SCALE 21.06 without any issues.
Happy to provide more details/logs if needed, but as I can't boot when it happens I'm not sure what else I can provide.

Problem/Justification

None

Impact

None

Activity

Paul Hawker 
October 22, 2021 at 2:43 PM

Hi Ryan,
As requested, I've now found the time to gather the additional data you asked for.
I can now confirm that after the server boots, it shows the boot menu briefly, then the black "Welcome to Grub" screen appears for a fraction of a second. You have to be fast to catch it, as it's so quick. Normally it then switches to the blue TrueNAS bootloader screen for a second and starts booting Linux. However, once the crash has occurred, it goes straight from the very brief Grub screen to the red "Divide By Zero" screen instead of the usual blue TrueNAS bootloader screen.

As far as I can tell, this additional information suggests the issue is not in the server's firmware, as was suspected earlier, but I'm happy to be corrected if you think that is still possible even after the Welcome to Grub screen has shown.

A few more relevant pieces of information I have found:
1) This error only occurs if I select Reboot from the TrueNAS shutdown menu. If I select Shutdown and then turn the server back on again, all is fine (I have tested this over half a dozen times now, so I'm confident that it is safe).
2) This error also affects the new 21.08 BETA2 in the same way.

So to summarise, this issue is usually avoidable if I remember never to use the Reboot option. Unfortunately, upgrading to later releases from the GUI is out of the question, as this automatically reboots afterwards instead of shutting down, which causes the issue.
A fresh install from a USB stick is fine, as that gives the option to shut down once completed instead of rebooting.

N.B. Once the Divide By Zero error has occurred, the system never boots again, even after a complete power down/up; the Divide By Zero error comes back every single time. Also, once you have rebooted and got that error, the USB boot option to upgrade the previous installation always fails, so you can only install fresh while choosing the option to format the boot device.
I have not yet tried a full shutdown, then booting the new version from USB and upgrading via that method, but I suspect it would work. I will try it when 21.10 is out.

Thanks!

Ryan Moeller 
September 20, 2021 at 5:49 PM

Unfortunately I don't think there's anything we can do to resolve this issue other than hope it is an issue that gets fixed upstream somewhere. There's nothing useful for me to go off of and it doesn't seem to be a common issue other people experience either.

Paul Hawker 
September 8, 2021 at 4:55 PM

Hi,
Up to June 2021 it had been running Ubuntu 20.04 fine since around mid 2020. Prior to that it ran Windows years ago, then Linux Mint for a few years, never with any issues. At that point I switched to SCALE 21.06, and that was also fine the few times I rebooted it since.

I've not tried another OS since upgrading to 21.08 and seeing this error, as this system is in use and it's difficult to find the time to do that, but I will do so if it would be useful at some point.
Interesting that you say it's not from Linux. I hadn't considered that; I assumed it was the OS, as that was the only thing that had changed, and the machine had rebooted fine many times, including a few times right before the upgrade. But it could be true and just a case of coincidental timing, I guess?
The server goes through the usual POST checks and all pass, and then it gets to the point where the bootloader would normally show briefly. However, the screen goes black right before the RSOD, and it is too quick to read, so there is no chance of switching the boot option.

If it is a firmware bug, could it be something that 21.08 is doing differently from past OSes that could explain it? Can I provide any logs from before the reboot to help?

Ryan Moeller 
September 8, 2021 at 3:22 PM

Does this machine still work with other operating systems? You say you ran Ubuntu and a previous SCALE version fine before, but what about after?

This error doesn't come from the Linux kernel, it looks like it comes from the machine's firmware.

At what point do you see this message? Is it before or after the GRUB boot menu? Can you choose the 21.06 boot environment from the boot menu to boot successfully?

Third Party to Resolve

Created September 3, 2021 at 8:59 AM
Updated July 6, 2022 at 9:01 PM
Resolved September 20, 2021 at 5:49 PM