Spare Shows Offline And Online Simultaneously
Activity
As of now, the pool status still shows /dev/ada5p2 in use as a spare, but if I try to detach it, it says the device doesn't exist. And the pool still reports as "healthy" in the status.
Thank you for the report, @Steven Wormuth. Can you please attach a debug file to the "Private Attachments" section of this ticket? To generate a debug file on TrueNAS CORE, log in to the TrueNAS web interface, go to System > Advanced, then click SAVE DEBUG and wait for the file to download to your local system.
Error: concurrent.futures.process._RemoteTraceback:
"""
Traceback (most recent call last):
  File "/usr/local/lib/python3.9/concurrent/futures/process.py", line 243, in _process_worker
    r = call_item.fn(*call_item.args, **call_item.kwargs)
  File "/usr/local/lib/python3.9/site-packages/middlewared/worker.py", line 94, in main_worker
    res = MIDDLEWARE._run(*call_args)
  File "/usr/local/lib/python3.9/site-packages/middlewared/worker.py", line 45, in _run
    return self._call(name, serviceobj, methodobj, args, job=job)
  File "/usr/local/lib/python3.9/site-packages/middlewared/worker.py", line 39, in _call
    return methodobj(*params)
  File "/usr/local/lib/python3.9/site-packages/middlewared/worker.py", line 39, in _call
    return methodobj(*params)
  File "/usr/local/lib/python3.9/site-packages/middlewared/schema.py", line 977, in nf
    return f(*args, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/middlewared/plugins/zfs.py", line 235, in detach
    self.__zfs_vdev_operation(name, label, lambda target: target.detach())
  File "/usr/local/lib/python3.9/site-packages/middlewared/plugins/zfs.py", line 226, in __zfs_vdev_operation
    op(target, *args)
  File "libzfs.pyx", line 391, in libzfs.ZFS.__exit__
  File "/usr/local/lib/python3.9/site-packages/middlewared/plugins/zfs.py", line 226, in __zfs_vdev_operation
    op(target, *args)
  File "/usr/local/lib/python3.9/site-packages/middlewared/plugins/zfs.py", line 235, in <lambda>
    self.__zfs_vdev_operation(name, label, lambda target: target.detach())
  File "libzfs.pyx", line 2070, in libzfs.ZFSVdev.detach
AttributeError: 'NoneType' object has no attribute 'type'
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/local/lib/python3.9/site-packages/middlewared/main.py", line 138, in call_method
    result = await self.middleware._call(message['method'], serviceobj, methodobj, params, app=self,
  File "/usr/local/lib/python3.9/site-packages/middlewared/main.py", line 1205, in _call
    return await methodobj(*prepared_call.args)
  File "/usr/local/lib/python3.9/site-packages/middlewared/schema.py", line 973, in nf
    return await f(*args, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/middlewared/plugins/pool.py", line 1062, in detach
    await self.middleware.call('zfs.pool.detach', pool['name'], found[1]['guid'])
  File "/usr/local/lib/python3.9/site-packages/middlewared/main.py", line 1248, in call
    return await self._call(
  File "/usr/local/lib/python3.9/site-packages/middlewared/main.py", line 1213, in _call
    return await self._call_worker(name, *prepared_call.args)
  File "/usr/local/lib/python3.9/site-packages/middlewared/main.py", line 1219, in _call_worker
    return await self.run_in_proc(main_worker, name, args, job)
  File "/usr/local/lib/python3.9/site-packages/middlewared/main.py", line 1146, in run_in_proc
    return await self.run_in_executor(self.__procpool, method, *args, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/middlewared/main.py", line 1120, in run_in_executor
    return await loop.run_in_executor(pool, functools.partial(method, *args, **kwargs))
AttributeError: 'NoneType' object has no attribute 'type'
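The inner traceback ends inside libzfs.ZFSVdev.detach dereferencing something that is None; a plausible reading is that the spare vdev's parent (the spare-N group) was already gone by the time detach ran. Below is a minimal, purely illustrative Python sketch of that failure mode; FakeVdev is a hypothetical stand-in, not the real libzfs class:

    class FakeVdev:
        """Hypothetical stand-in for libzfs.ZFSVdev (illustration only)."""

        def __init__(self, parent=None):
            self.parent = parent  # the containing vdev, e.g. the spare-N group

        def detach(self):
            # detach() has to validate the parent vdev's type before detaching;
            # if the parent is gone (None), the attribute access itself fails.
            if self.parent.type not in ('mirror', 'replacing', 'spare'):
                raise ValueError('can only detach from mirror/replacing/spare')


    orphan = FakeVdev(parent=None)  # spare whose spare-N group no longer exists
    try:
        orphan.detach()
    except AttributeError as exc:
        print(exc)  # -> 'NoneType' object has no attribute 'type'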
Generally, after a hot spare is activated, it should either detach automatically once the failed disk is replaced (as I suppose was done here), or be promoted to a permanent pool member when the original disk is detached. I've just run a quick test from the command line: simulating disk removal activated the spare, I replaced the removed disk with another, and the spare detached automatically after the resilver completed, so I am not sure how to reproduce this issue.
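For reference, that test sequence roughly reconstructed as a script; the pool and device names are hypothetical, and this should only ever be run against a disposable test pool:

    import subprocess

    POOL = 'tank'  # hypothetical throwaway test pool


    def zpool(*args):
        """Thin wrapper around the zpool CLI; raises on non-zero exit."""
        subprocess.run(['zpool', *args], check=True)


    zpool('offline', POOL, 'da1')         # simulate failure; hot spare activates
    zpool('replace', POOL, 'da1', 'da3')  # replace the "failed" disk with another
    # Once the resilver finishes, the spare should return to AVAIL on its own.
    zpool('status', POOL)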
In the attached debug I see the pool in a more reasonable state:
  pool: HURTLOCKER
 state: ONLINE
  scan: scrub repaired 0B in 01:31:40 with 0 errors on Sat Oct 30 18:20:53 2021
config:

	NAME                                              STATE     READ WRITE CKSUM
	HURTLOCKER                                        ONLINE       0     0     0
	  raidz2-0                                        ONLINE       0     0     0
	    gptid/085068be-7cba-11e8-803f-4ccc6a933c0a    ONLINE       0     0     0
	    gptid/cf5ca541-2e6e-11e7-ab3a-4ccc6a933c0a    ONLINE       0     0     0
	    spare-2                                       ONLINE       0     0     0
	      gptid/e434f685-399f-11ec-a6ce-6805ca8fe536  ONLINE       0     0     0
	      ada5p2                                      ONLINE       0     0     0
	    gptid/d14ca354-2e6e-11e7-ab3a-4ccc6a933c0a    ONLINE       0     0     0
	spares
	  ada5p2                                          INUSE     currently in use
The only remaining issue there is that the spare was not detached after the original disk was replaced. In the linked forum thread the reporter was able to successfully detach the spare disk from the command line, so the backtraces above look like some middleware problem. Or maybe some activity, like reboots or scrubs, brought the pool from the state in the screenshot to the latest one in the debug, where the detach succeeded.
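For completeness, the command-line workaround from the forum thread presumably amounts to a single zpool detach (which also removes an active hot spare from service); a trivial sketch, with the pool and device names taken from the status output above:

    import subprocess

    # Detach the stuck spare; 'zpool detach' returns it to the spares list.
    subprocess.run(['zpool', 'detach', 'HURTLOCKER', 'ada5p2'], check=True)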