Failed to check for alert BootPoolStatus

Description

Hello.

I'm getting this alert on my system:

New alerts:

  • smartd is not running.

Current alerts:

  • smartd is not running.

And after 30 minutes this one:

New alerts:

  • Failed to check for alert BootPoolStatus:
    concurrent.futures.process._RemoteTraceback:
    """
    Traceback (most recent call last):
    File "/usr/local/lib/python3.7/concurrent/futures/process.py", line 239, in _process_worker
    r = call_item.fn(*call_item.args, **call_item.kwargs)
    File "/usr/local/lib/python3.7/site-packages/middlewared/worker.py", line 97, in main_worker
    res = loop.run_until_complete(coro)
    File "/usr/local/lib/python3.7/asyncio/base_events.py", line 579, in run_until_complete
    return future.result()
    File "/usr/local/lib/python3.7/site-packages/middlewared/worker.py", line 53, in _run
    return await self._call(name, serviceobj, methodobj, params=args, job=job)
    File "/usr/local/lib/python3.7/site-packages/middlewared/worker.py", line 36, in _call
    with Client('ws+unix:///var/run/middlewared-internal.sock', py_exceptions=True) as c:
    File "/usr/local/lib/python3.7/site-packages/middlewared/client/client.py", line 250, in __init__
    self._ws.connect()
    File "/usr/local/lib/python3.7/site-packages/middlewared/client/client.py", line 93, in connect
    rv = super(WSClient, self).connect()
    File "/usr/local/lib/python3.7/site-packages/ws4py/client/__init__.py", line 223, in connect
    bytes = self.sock.recv(128)
    socket.timeout: timed out
    """

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
File "/usr/local/lib/python3.7/site-packages/middlewared/plugins/alert.py", line 674, in __run_source
alerts = (await alert_source.check()) or []
File "/usr/local/lib/python3.7/site-packages/middlewared/plugins/../alert/source/boot_pool.py", line 15, in check
pool = await self.middleware.call("zfs.pool.query", [["id", "=", "freenas-boot"]])
File "/usr/local/lib/python3.7/site-packages/middlewared/main.py", line 1141, in call
app=app, pipes=pipes, job_on_progress_cb=job_on_progress_cb, io_thread=True,
File "/usr/local/lib/python3.7/site-packages/middlewared/main.py", line 1081, in _call
return await self._call_worker(name, *args)
File "/usr/local/lib/python3.7/site-packages/middlewared/main.py", line 1101, in _call_worker
return await self.run_in_proc(main_worker, name, args, job)
File "/usr/local/lib/python3.7/site-packages/middlewared/main.py", line 1036, in run_in_proc
return await self.run_in_executor(self.__procpool, method, *args, **kwargs)
File "/usr/local/lib/python3.7/site-packages/middlewared/main.py", line 1010, in run_in_executor
return await loop.run_in_executor(pool, functools.partial(method, *args, **kwargs))
socket.timeout: timed out

Current alerts:

  • Failed to check for alert BootPoolStatus:
    concurrent.futures.process._RemoteTraceback:
    """
    Traceback (most recent call last):
    File "/usr/local/lib/python3.7/concurrent/futures/process.py", line 239, in _process_worker
    r = call_item.fn(*call_item.args, **call_item.kwargs)
    File "/usr/local/lib/python3.7/site-packages/middlewared/worker.py", line 97, in main_worker
    res = loop.run_until_complete(coro)
    File "/usr/local/lib/python3.7/asyncio/base_events.py", line 579, in run_until_complete
    return future.result()
    File "/usr/local/lib/python3.7/site-packages/middlewared/worker.py", line 53, in _run
    return await self._call(name, serviceobj, methodobj, params=args, job=job)
    File "/usr/local/lib/python3.7/site-packages/middlewared/worker.py", line 36, in _call
    with Client('ws+unix:///var/run/middlewared-internal.sock', py_exceptions=True) as c:
    File "/usr/local/lib/python3.7/site-packages/middlewared/client/client.py", line 250, in __init__
    self._ws.connect()
    File "/usr/local/lib/python3.7/site-packages/middlewared/client/client.py", line 93, in connect
    rv = super(WSClient, self).connect()
    File "/usr/local/lib/python3.7/site-packages/ws4py/client/__init__.py", line 223, in connect
    bytes = self.sock.recv(128)
    socket.timeout: timed out
    """

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
File "/usr/local/lib/python3.7/site-packages/middlewared/plugins/alert.py", line 674, in __run_source
alerts = (await alert_source.check()) or []
File "/usr/local/lib/python3.7/site-packages/middlewared/plugins/../alert/source/boot_pool.py", line 15, in check
pool = await self.middleware.call("zfs.pool.query", [["id", "=", "freenas-boot"]])
File "/usr/local/lib/python3.7/site-packages/middlewared/main.py", line 1141, in call
app=app, pipes=pipes, job_on_progress_cb=job_on_progress_cb, io_thread=True,
File "/usr/local/lib/python3.7/site-packages/middlewared/main.py", line 1081, in _call
return await self._call_worker(name, *args)
File "/usr/local/lib/python3.7/site-packages/middlewared/main.py", line 1101, in _call_worker
return await self.run_in_proc(main_worker, name, args, job)
File "/usr/local/lib/python3.7/site-packages/middlewared/main.py", line 1036, in run_in_proc
return await self.run_in_executor(self.__procpool, method, *args, **kwargs)
File "/usr/local/lib/python3.7/site-packages/middlewared/main.py", line 1010, in run_in_executor
return await loop.run_in_executor(pool, functools.partial(method, *args, **kwargs))
socket.timeout: timed out

Maybe this is relevant: the alerts occurred while the scheduled scrubs of 5 pools were running, all 5 at the same time.

Debug attached.

Problem/Justification

None

Impact

None

Activity

William Gryzbowski 
July 28, 2020 at 10:59 AM

Any luck?

William Gryzbowski 
June 26, 2020 at 12:24 PM

Yes, that's what I meant.

The middleware is stuck on OS syscalls, and there is not enough data to investigate unless we look at it while it's happening. The USB stack is "fragile", which could sort of explain this.
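
For illustration only, here is a minimal sketch of the failure mode seen at the bottom of the remote traceback: a client calls `recv(128)` with a timeout on a connected socket, and the peer never answers. It does not use middlewared or ws4py; the function name `handshake_times_out`, the placeholder request bytes and the timeout values are assumptions made up for this example, and `socket.socketpair()` merely stands in for the internal middleware socket.

```python
import socket


def handshake_times_out(timeout_s: float = 0.2, peer_replies: bool = False) -> bool:
    """Return True if recv() times out because the peer never answers.

    Hypothetical helper: `server` stands in for a process that has accepted
    the connection but is too busy (e.g. blocked in a syscall) to respond.
    """
    client, server = socket.socketpair()
    try:
        client.settimeout(timeout_s)
        # Placeholder request, not a real WebSocket handshake.
        client.sendall(b"GET / HTTP/1.1\r\n\r\n")
        if peer_replies:
            # A responsive peer answers before the client's timeout expires.
            server.sendall(b"HTTP/1.1 101 Switching Protocols\r\n\r\n")
        try:
            client.recv(128)  # the same kind of call that fails in the traceback
        except socket.timeout:
            return True
        return False
    finally:
        client.close()
        server.close()


if __name__ == "__main__":
    print("silent peer times out:", handshake_times_out())
    print("responsive peer times out:", handshake_times_out(peer_replies=True))
```

If this picture is right, the `socket.timeout` in the alert says only that the main middleware process did not answer in time, not why it was blocked.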

vicmarto2 
June 25, 2020 at 10:24 PM

Yes, the boot pool is on an SSD attached via USB.

What is your suggestion? To use an SSD attached via SATA/SAS?

William Gryzbowski 
June 25, 2020 at 5:13 PM

I don't see anything obvious; however, the boot pool being on USB is a red flag.

Any chance you could try with a regular disk/SSD?

vicmarto2 
June 25, 2020 at 1:51 AM

Hi William,

New debug attached, thanks.

Need additional information

Created June 21, 2020 at 3:27 AM
Updated July 1, 2022 at 4:52 PM
Resolved October 1, 2020 at 6:14 PM