UNMAP delays possibly causing FC errors
Description
Activity
I am closing this as duplicate of NAS-107364.
The only errors I see in the last few days are related to SAS, not FC. It looks like your enclosure device disappears and reappears, but for that I'd blame the hardware/firmware. At least for your HBAs there should be newer firmware versions. Whether there are any for your expander/backplane/JBOD, I don't know.
Latest debug after "clean" scrub
@Alexander Motin OK, so I think the theory about dedup + scrub is right. I nuked the dataset/zvol that had dedup enabled, ran a scrub on that pool, and did not see any of the previous errors. I do see some other odd crap in dmesg, so I will throw a newer debug up for your viewing pleasure.
This does make me wonder, though: the deduped dataset/zvol only had about 40G of data, and I'm pretty sure I can hold the DDT for 40G of data in 768G of RAM. I wonder whether for some reason it was forcing reads of the DDT off disk, or whether it really is some hashing issue.
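For a rough sense of scale, here is a back-of-the-envelope sketch of the in-memory DDT size for 40G of deduped data. It assumes the commonly quoted rule of thumb of about 320 bytes per DDT entry and one entry per unique block; the two block sizes are illustrative guesses, not values taken from this system, and `estimate_ddt_bytes` is a hypothetical helper.

```python
GIB = 1024 ** 3
KIB = 1024


def estimate_ddt_bytes(data_bytes, block_bytes, entry_bytes=320):
    """Upper-bound DDT size, assuming every block is unique.

    entry_bytes=320 is a rule-of-thumb assumption, not a measured value.
    """
    blocks = -(-data_bytes // block_bytes)  # ceiling division
    return blocks * entry_bytes


data = 40 * GIB
for block in (16 * KIB, 128 * KIB):  # assumed block sizes
    size = estimate_ddt_bytes(data, block)
    print(f"{block // KIB}K blocks: ~{size / 1024 ** 2:.0f} MiB of DDT")
```

Under these assumptions the table comes to roughly 800 MiB with 16K blocks and 100 MiB with 128K blocks, which is tiny next to 768G of RAM. The actual figures for a pool can be checked with `zdb -DD <pool>`.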
@Alexander Motin OK, sorry for the delay, I had to wait for a workload to finish. I moved the dedup dataset off the affected pool and nuked it. The scrub seems to be going OK at the moment. I am going to watch it for a bit and see what happens with a light load, then introduce some more pressure in a day or so. Will report back.
Just upgraded from 11.3-U2 (~180 days of uptime, no issues that I recall regarding this). Now on 11.3-U4.1, it started a scrub after it came up, and that is causing the systems that run on it to lock up fairly frequently. I am sure I am missing info you want, so please ask me for anything that you don't see in the debug or want to know.
Sep 13 23:50:53 The-Archive (0:4:10/10): UNMAP. CDB: 42 00 00 00 00 00 00 03 e8 00
Sep 13 23:50:53 The-Archive (0:4:10/10): Tag: 0x12a6f0, type 1
Sep 13 23:50:53 The-Archive (0:4:10/10): ctl_process_done: 288 seconds
Sep 13 23:50:53 The-Archive isp1: CTIO7 completed with Invalid RX_ID 0x12a6f0
Sep 13 23:50:54 The-Archive isp1: CTIO7 completed with Invalid RX_ID 0x12a6f0
Sep 13 23:50:54 The-Archive isp1: isp_handle_platform_ctio: CTIO7[12a6f0] seq 0 nc 1 sts 0x8 flg 0x1 sns 0 resid 0 MID
Sep 13 23:50:54 The-Archive isp1: CTIO7 completed with Invalid RX_ID 0x12a6f0
Sep 13 23:50:54 The-Archive isp1: isp_handle_platform_ctio: CTIO7[12a6f0] seq 1 nc 1 sts 0x8 flg 0x8040 sns 0 resid 0 FIN
Sep 13 23:50:54 The-Archive isp1: CTIO7 completed with Invalid RX_ID 0x12a6f0
Sep 13 23:50:54 The-Archive isp1: CTIO7 completed with Invalid RX_ID 0x12a6f0
Sep 13 23:50:54 The-Archive isp1: isp_handle_platform_ctio: CTIO7[12a6f0] seq 0 nc 1 sts 0x8 flg 0x1 sns 0 resid 0 MID
Sep 13 23:50:54 The-Archive isp1: CTIO7 completed with Invalid RX_ID 0x12a6f0
Sep 13 23:50:54 The-Archive isp1: isp_handle_platform_ctio: CTIO7[12a6f0] seq 1 nc 1 sts 0x8 flg 0x8040 sns 0 resid 0 FIN
Sep 13 23:50:54 The-Archive isp1: CTIO7 completed with Invalid RX_ID 0x12a6f0
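To see how often this happens across a whole log, something like the following sketch could pair each `ctl_process_done: N seconds` line with the command name reported just before it for the same nexus. The regular expressions are inferred only from the excerpt above, so treat the line format as an assumption; `slow_commands` is a hypothetical helper, not part of any FreeBSD tool.

```python
import re

# Patterns inferred from the excerpt above; the real format may vary.
CDB_LINE = re.compile(r"\(([\d:/]+)\): (.+?)\. CDB: ")
DONE_LINE = re.compile(r"\(([\d:/]+)\): ctl_process_done: (\d+) seconds")


def slow_commands(lines, threshold=1):
    """Return (nexus, command, seconds) for each slow CTL completion."""
    last_cmd = {}  # nexus -> most recent command name seen
    found = []
    for line in lines:
        m = CDB_LINE.search(line)
        if m:
            last_cmd[m.group(1)] = m.group(2)
            continue
        m = DONE_LINE.search(line)
        if m and int(m.group(2)) >= threshold:
            nexus = m.group(1)
            found.append((nexus, last_cmd.get(nexus, "?"), int(m.group(2))))
    return found


sample = [
    "Sep 13 23:50:53 The-Archive (0:4:10/10): UNMAP. CDB: 42 00 00 00 00 00 00 03 e8 00",
    "Sep 13 23:50:53 The-Archive (0:4:10/10): Tag: 0x12a6f0, type 1",
    "Sep 13 23:50:53 The-Archive (0:4:10/10): ctl_process_done: 288 seconds",
]
print(slow_commands(sample))
```

Run against the full `/var/log/messages`, this would show whether the long completions are limited to UNMAP or also hit other commands.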