Mail alerts report pool as OFFLINE although it is not

Description

I get frequent mail alerts claiming that my pool is OFFLINE. When I check the GUI, nothing seems to be offline. Below is an example email text; the messages differ from time to time. I also get alerts claiming that my main pool is offline.

I am unable to figure out what is causing these messages.

Example email text:

FreeNAS @ freenas.WORKGRP

The following alert has been cleared:

  • Failed to check for alert BootPoolStatus:
    concurrent.futures.process._RemoteTraceback:
    """
    Traceback (most recent call last):
    File "/usr/local/lib/python3.7/concurrent/futures/process.py", line 239, in _process_worker
    r = call_item.fn(*call_item.args, **call_item.kwargs)
    File "/usr/local/lib/python3.7/site-packages/middlewared/worker.py", line 95, in main_worker
    res = loop.run_until_complete(coro)
    File "/usr/local/lib/python3.7/asyncio/base_events.py", line 579, in run_until_complete
    return future.result()
    File "/usr/local/lib/python3.7/site-packages/middlewared/worker.py", line 51, in _run
    return await self._call(name, serviceobj, methodobj, params=args, job=job)
    File "/usr/local/lib/python3.7/site-packages/middlewared/worker.py", line 34, in _call
    with Client('ws+unix:///var/run/middlewared-internal.sock', py_exceptions=True) as c:
    File "/usr/local/lib/python3.7/site-packages/middlewared/client/client.py", line 360, in __init__
    self._ws.connect()
    File "/usr/local/lib/python3.7/site-packages/middlewared/client/client.py", line 181, in connect
    rv = super(WSClient, self).connect()
    File "/usr/local/lib/python3.7/site-packages/ws4py/client/__init__.py", line 239, in connect
    self.protocols, self.extensions = self.process_handshake_header(headers)
    File "/usr/local/lib/python3.7/site-packages/ws4py/client/__init__.py", line 332, in process_handshake_header
    raise HandshakeError("Invalid challenge response: %s" % value)
    ws4py.exc.HandshakeError: Invalid challenge response: b'3chy7ewd0p3reftzzk0xpfrygm4='
    """


The above exception was the direct cause of the following exception:

Traceback (most recent call last):
File "/usr/local/lib/python3.7/site-packages/middlewared/plugins/alert.py", line 650, in __run_source
alerts = (await alert_source.check()) or []
File "/usr/local/lib/python3.7/site-packages/middlewared/plugins/../alert/source/boot_pool.py", line 15, in check
pool = await self.middleware.call("zfs.pool.query", [["id", "=", "freenas-boot"]])
File "/usr/local/lib/python3.7/site-packages/middlewared/main.py", line 1127, in call
app=app, pipes=pipes, job_on_progress_cb=job_on_progress_cb, io_thread=True,
File "/usr/local/lib/python3.7/site-packages/middlewared/main.py", line 1074, in _call
return await self._call_worker(name, *args)
File "/usr/local/lib/python3.7/site-packages/middlewared/main.py", line 1094, in _call_worker
return await self.run_in_proc(main_worker, name, args, job)
File "/usr/local/lib/python3.7/site-packages/middlewared/main.py", line 1029, in run_in_proc
return await self.run_in_executor(self.__procpool, method, *args, **kwargs)
File "/usr/local/lib/python3.7/site-packages/middlewared/main.py", line 1003, in run_in_executor
return await loop.run_in_executor(pool, functools.partial(method, *args, **kwargs))
ws4py.exc.HandshakeError: Invalid challenge response: b'3chy7ewd0p3reftzzk0xpfrygm4='

Current alerts:

  • Scrub of pool 'freenas-boot' finished.

Problem/Justification

None

Impact

None

Activity

Vladimir Vinogradenko 
February 28, 2020 at 9:55 AM

It's already closed.

The simple test I proposed (of SHA1, which is used as part of the websocket protocol we use for internal inter-process communication) did not show any obvious errors; probably several circumstances have to coincide for the fault to occur.
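
For reference, the websocket opening handshake (RFC 6455) has the server answer the client's key with base64(SHA1(key + fixed GUID)); a mismatch in that value is what a client reports as an invalid challenge response. The following is a minimal Python sketch of that computation plus a repeated SHA1 check against a published test vector. It is not the test that was actually run on this ticket, and the function names `expected_accept` and `sha1_is_stable` are hypothetical, chosen only for illustration.

```python
import base64
import hashlib

# GUID fixed by RFC 6455 for the websocket opening handshake.
WS_GUID = b"258EAFA5-E914-47DA-95CA-C5AB0DC85B11"


def expected_accept(key: bytes) -> bytes:
    """Return the Sec-WebSocket-Accept value for a given Sec-WebSocket-Key."""
    digest = hashlib.sha1(key + WS_GUID).digest()
    return base64.b64encode(digest)


def sha1_is_stable(rounds: int = 10000) -> bool:
    """Hash a known input repeatedly and compare with the published digest.

    An intermittent fault would show up as at least one mismatch.
    """
    known = "a9993e364706816aba3e25717850c26c9cd0d89d"  # SHA1 of b"abc"
    return all(
        hashlib.sha1(b"abc").hexdigest() == known for _ in range(rounds)
    )


if __name__ == "__main__":
    # Sample key and expected answer taken from RFC 6455, section 1.3.
    print(expected_accept(b"dGhlIHNhbXBsZSBub25jZQ=="))
    print(sha1_is_stable())
```

A passing run only shows that SHA1 gave correct results during that run; it does not rule out a fault that needs other conditions to trigger.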

mistermanko 
February 28, 2020 at 9:46 AM

Former user no worries. What really makes me wonder is what you mentioned above: "cryptographic functions are misbehaving on your system"

I'm beginning to wonder if the CPU is at fault when checking hashes. Is there a way to test whether AES-NI is working properly in FreeBSD?

I also propose to close this issue and switch over to the other ticket.

Thank you!

Vladimir Vinogradenko 
February 28, 2020 at 9:33 AM

Sorry, I was not aware of that ticket. It's very likely that your hardware is OK and that these two issues are related.

mistermanko 
February 28, 2020 at 9:20 AM

Former user memtest finished successfully - as expected - see attached screenshot.

If it turns out to be a hardware fault, what kind of faulty hardware are we talking about? HDD? Mainboard? Processor? Is it possible that this issue is related to my other open ticket? https://jira.ixsystems.com/browse/NAS-105046

Vladimir Vinogradenko 
February 28, 2020 at 6:14 AM

freenas-report should just say "All Files, Directories and Symlinks in the system were verified successfully"; however, on your system the list of inconsistent files is pretty big. I suggest you re-install your system (preserving the config) and see if that helps. Let's hope it's not the result of faulty hardware.

User Configuration Error

Created February 24, 2020 at 9:12 AM
Updated July 1, 2022 at 4:50 PM
Resolved February 28, 2020 at 6:14 AM