Refactor command ordering/blocking mechanism in CTL

Description

We're running FreeNAS 11.1U4 in production as 'secondary' storage for our VMware environment (running vSphere 6.5).

In the last few weeks we've noticed that ESXi has been losing connectivity to the iSCSI LUNs at around the same time every day (07:25 CET). It usually recovers by itself after a few minutes, although on one occasion we were forced to restart the iSCSI service on FreeNAS to restore connectivity.

Here are some logs I've captured on the FreeNAS box while experiencing the issue.

/var/log/messages
Jan 14 07:25:54 freenas ctl_datamove: tag 0xd689705 on (46:3:2) aborted
Jan 14 07:25:54 freenas ctl_datamove: tag 0xd68970d on (46:3:2) aborted
Jan 14 07:26:01 freenas ctl_datamove: tag 0x126780cc on (14:3:2) aborted
Jan 14 07:26:01 freenas ctl_datamove: tag 0x1274dc09 on (15:3:2) aborted
Jan 14 07:26:01 freenas ctl_datamove: tag 0x1274dc08 on (15:3:2) aborted
Jan 14 07:26:01 freenas ctl_datamove: tag 0x1274dc0a on (15:3:2) aborted
Jan 14 07:26:01 freenas ctl_datamove: tag 0x1274dc0b on (15:3:2) aborted
Jan 14 07:26:02 freenas ctl_datamove: tag 0x8b047e1 on (30:3:2) aborted
Jan 14 07:26:14 freenas ctl_datamove: tag 0xd689753 on (46:3:2) aborted
Jan 14 07:26:14 freenas ctl_datamove: tag 0xd686179 on (47:3:2) aborted
Jan 14 07:26:14 freenas ctl_datamove: tag 0xd68617a on (47:3:2) aborted
Jan 14 07:26:14 freenas ctl_datamove: tag 0xd68617c on (47:3:2) aborted
Jan 14 07:26:14 freenas ctl_datamove: tag 0xd68617d on (47:3:2) aborted
Jan 14 07:26:14 freenas ctl_datamove: tag 0xd68617e on (47:3:2) aborted
Jan 14 07:26:14 freenas ctl_datamove: tag 0xd686176 on (47:3:2) aborted
Jan 14 07:26:14 freenas ctl_datamove: tag 0xd686177 on (47:3:2) aborted
Jan 14 07:26:14 freenas ctl_datamove: tag 0xd68617f on (47:3:2) aborted
Jan 14 07:26:35 freenas ctl_datamove: tag 0xd6861aa on (47:3:2) aborted
Jan 14 07:26:55 freenas ctl_datamove: tag 0x4ee77b18 on (40:3:2) aborted
Jan 14 07:26:55 freenas ctl_datamove: tag 0x4ee77b19 on (40:3:2) aborted
Jan 14 07:26:55 freenas ctl_datamove: tag 0x4ee77b1a on (40:3:2) aborted
Jan 14 07:26:56 freenas ctl_datamove: tag 0x4ee77b4d on (40:3:2) aborted
Jan 14 07:26:57 freenas ctl_datamove: tag 0x4ee77b4e on (40:3:2) aborted
Jan 14 07:26:57 freenas ctl_datamove: tag 0x4ee77b50 on (40:3:2) aborted
Jan 14 07:26:58 freenas ctl_datamove: tag 0x4ede56e0 on (41:3:2) aborted
Jan 14 07:26:58 freenas ctl_datamove: tag 0x4ee77b8b on (40:3:2) aborted
Jan 14 07:26:58 freenas ctl_datamove: tag 0x4ede56d0 on (41:3:2) aborted
Jan 14 07:27:03 freenas ctl_datamove: tag 0x16a531a9 on (55:3:2) aborted
Jan 14 07:27:03 freenas ctl_datamove: tag 0x1bd49455 on (3:3:2) aborted
Jan 14 07:27:03 freenas ctl_datamove: tag 0x13922263 on (56:3:2) aborted
Jan 14 07:27:03 freenas ctl_datamove: tag 0x16a531ab on (55:3:2) aborted
Jan 14 07:27:35 freenas ctl_datamove: tag 0x24ab153b on (35:3:2) aborted
Jan 14 07:27:35 freenas ctl_datamove: tag 0x22d49897 on (9:3:2) aborted
Jan 14 07:27:35 freenas ctl_datamove: tag 0x22d49896 on (9:3:2) aborted
Jan 14 07:27:35 freenas ctl_datamove: tag 0x24a6fe38 on (34:3:2) aborted
Jan 14 07:27:35 freenas ctl_datamove: tag 0x24a6fe39 on (34:3:2) aborted
Jan 14 07:27:35 freenas ctl_datamove: tag 0x24a6fe37 on (34:3:2) aborted
Jan 14 07:27:35 freenas ctl_datamove: tag 0x24a6fe36 on (34:3:2) aborted
Jan 14 07:27:35 freenas ctl_datamove: tag 0x24a6fe35 on (34:3:2) aborted
Jan 14 07:27:49 freenas ctl_datamove: tag 0x4b7c761b on (8:3:2) aborted
Jan 14 07:27:49 freenas ctl_datamove: tag 0x4b7468a0 on (10:3:2) aborted
Jan 15 00:00:00 freenas syslog-ng[2116]: Configuration reload request received, reloading configuration;
Jan 15 07:25:47 freenas ctl_datamove: tag 0x180dbe3c on (48:3:2) aborted
Jan 15 07:25:47 freenas ctl_datamove: tag 0x180dbe41 on (48:3:2) aborted
Jan 15 07:25:47 freenas ctl_datamove: tag 0x1393c32c on (6:3:2) aborted
Jan 15 07:25:47 freenas ctl_datamove: tag 0x1393c32d on (6:3:2) aborted
Jan 15 07:25:47 freenas ctl_datamove: tag 0x12881093 on (15:3:2) aborted
Jan 15 07:25:47 freenas ctl_datamove: tag 0x1ad9ef6d on (19:3:2) aborted
Jan 15 07:25:47 freenas ctl_datamove: tag 0x1ad9ef6e on (19:3:2) aborted
Jan 15 07:25:47 freenas ctl_datamove: tag 0x12881092 on (15:3:2) aborted
Jan 15 07:25:47 freenas ctl_datamove: tag 0x1ad9ef6f on (19:3:2) aborted
Jan 15 07:25:47 freenas ctl_datamove: tag 0x24b7e06d on (35:3:2) aborted

ctladm dumpooa taken on Jan 18 07:25:
Dumping OOA queues
LUN 0 tag 0x8bbdf78 RTR: WRITE(16). CDB: (6 ms)
LUN 1 tag 0xa84f9d9 RTR: READ(16). CDB: (3964 ms)
LUN 1 tag 0xa84f9da BLOCKED: READ(16). CDB: (3963 ms)
LUN 1 tag 0xa8e5f34 RTR: READ(16). CDB: (2837 ms)
LUN 1 tag 0xa8e5f35 BLOCKED: READ(16). CDB: (2837 ms)
LUN 1 tag 0xa8e5f36 BLOCKED: READ(16). CDB: (2837 ms)
LUN 1 tag 0xa8e5f37 BLOCKED: READ(16). CDB: (2837 ms)
LUN 1 tag 0xa8e5f38 BLOCKED: READ(16). CDB: (2837 ms)
LUN 1 tag 0xa8e5f39 BLOCKED: READ(16). CDB: (2837 ms)
LUN 1 tag 0xa84f9ee BLOCKED: READ(16). CDB: (2838 ms)
LUN 1 tag 0xa84f9ef BLOCKED: READ(16). CDB: (2838 ms)
LUN 1 tag 0x2327010c RTR: COMPARE AND WRITE. CDB: (2218 ms)
LUN 1 tag 0x2327010d BLOCKED: COMPARE AND WRITE. CDB: (2218 ms)
LUN 1 tag 0x2327010e BLOCKED: COMPARE AND WRITE. CDB: (2218 ms)
LUN 1 tag 0x24dad821 RTR: COMPARE AND WRITE. CDB: (1846 ms)
LUN 1 tag 0x16bcf81c RTR: COMPARE AND WRITE. CDB: (1799 ms)
LUN 1 tag 0x1205469c RTR: WRITE(10). CDB: 2a 00 00 0c 7c 08 00 00 08 00 (1798 ms)
LUN 1 tag 0x5406c4e2 RTR: READ(16). CDB: (1754 ms)
LUN 1 tag 0xa8e5f9a RTR: COMPARE AND WRITE. CDB: (1745 ms)
LUN 1 tag 0x5406c4e3 BLOCKED: READ(16). CDB: (1744 ms)
LUN 1 tag 0x5406c4e4 BLOCKED: READ(16). CDB: (1708 ms)
LUN 1 tag 0x6e38ee4 RTR: COMPARE AND WRITE. CDB: (1657 ms)
LUN 1 tag 0x3daeb53d RTR: COMPARE AND WRITE. CDB: (1516 ms)
LUN 1 tag 0x3e54ea7e RTR: WRITE(16). CDB: (1423 ms)
LUN 1 tag 0xa84fa6b RTR: WRITE(10). CDB: 2a 00 14 61 00 f0 00 00 08 00 (1415 ms)
LUN 1 tag 0x3e6152fb RTR: WRITE(16). CDB: (1411 ms)
LUN 1 tag 0x3e6152fc RTR: WRITE(16). CDB: (1411 ms)
LUN 1 tag 0x3e6152fd RTR: WRITE(16). CDB: (1395 ms)
LUN 1 tag 0x3e6152fe RTR: WRITE(16). CDB: (1395 ms)
LUN 1 tag 0x24dad83d RTR: WRITE(10). CDB: 2a 00 bf 24 0b 58 00 00 18 00 (1395 ms)
LUN 1 tag 0x3e6152ff RTR: WRITE(16). CDB: (1395 ms)
LUN 1 tag 0x3e615300 RTR: WRITE(16). CDB: (1395 ms)
LUN 1 tag 0x24dad83e RTR: WRITE(10). CDB: 2a 00 bf 24 0b 78 00 00 48 00 (1388 ms)
LUN 1 tag 0x3e615301 RTR: WRITE(16). CDB: (1386 ms)
LUN 1 tag 0x24d6b82d RTR: WRITE(10). CDB: 2a 00 bf 24 0c 90 00 00 28 00 (1385 ms)
LUN 1 tag 0x3e54ea80 RTR: WRITE(16). CDB: (1385 ms)
LUN 1 tag 0x3e54ea81 RTR: WRITE(16). CDB: (1385 ms)
LUN 1 tag 0x24d6b82e RTR: WRITE(10). CDB: 2a 00 bf 24 0c c8 00 00 10 00 (1385 ms)
LUN 1 tag 0x3e54ea82 RTR: WRITE(16). CDB: (1385 ms)
LUN 1 tag 0x13ad791c RTR: WRITE(10). CDB: 2a 00 04 26 8a 60 00 00 08 00 (1383 ms)
LUN 1 tag 0x24d6b82f RTR: WRITE(10). CDB: 2a 00 bf 24 0c e0 00 00 08 00 (1379 ms)
LUN 1 tag 0x24d6b830 RTR: WRITE(10). CDB: 2a 00 bf 24 0c f0 00 00 a8 00 (1378 ms)
LUN 1 tag 0x24d6b831 RTR: WRITE(16). CDB: (1378 ms)
LUN 1 tag 0x24d6b832 RTR: WRITE(16). CDB: (1378 ms)
LUN 1 tag 0x13b2e42c RTR: WRITE(10). CDB: 2a 00 0a b6 2b f8 00 00 40 00 (1377 ms)
LUN 1 tag 0x24d6b833 RTR: WRITE(16). CDB: (1375 ms)
LUN 1 tag 0x24d6b834 RTR: WRITE(16). CDB: (1372 ms)
LUN 1 tag 0xa8e5fb7 RTR: WRITE(10). CDB: 2a 00 14 61 01 18 00 00 10 00 (1370 ms)
LUN 1 tag 0x24dad840 RTR: WRITE(16). CDB: (1369 ms)
LUN 1 tag 0x24dad841 RTR: WRITE(16). CDB: (1369 ms)
LUN 1 tag 0x24dad842 RTR: WRITE(16). CDB: (1369 ms)
LUN 1 tag 0x24dad843 RTR: WRITE(16). CDB: (1368 ms)
LUN 1 tag 0x24dad844 RTR: READ(16). CDB: (1368 ms)
LUN 1 tag 0x24dad845 RTR: READ(16). CDB: (1367 ms)
LUN 1 tag 0x24dad846 RTR: WRITE(10). CDB: 2a 00 53 27 23 b8 00 00 20 00 (1367 ms)
LUN 1 tag 0xa8e5fb8 RTR: WRITE(10). CDB: 2a 00 14 61 01 48 00 00 08 00 (1367 ms)
LUN 1 tag 0xa8e5fb9 RTR: WRITE(10). CDB: 2a 00 14 61 01 58 00 00 40 00 (1367 ms)
LUN 1 tag 0x24dad847 RTR: WRITE(10). CDB: 2a 00 53 27 24 20 00 00 10 00 (1366 ms)
LUN 1 tag 0x3e54ea83 RTR: WRITE(16). CDB: (1366 ms)
LUN 1 tag 0x3e54ea84 RTR: WRITE(16). CDB: (1366 ms)
LUN 1 tag 0x3e54ea85 RTR: WRITE(16). CDB: (1366 ms)
LUN 1 tag 0x24d6b835 RTR: WRITE(10). CDB: 2a 00 53 27 24 68 00 00 08 00 (1365 ms)
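
Since the window is fairly predictable (~07:25), a small loop can grab repeated OOA snapshots around it for later comparison. This is only a rough sketch: ctladm dumpooa is the command used above, while the output directory, interval, duration, and script path are arbitrary assumptions.

#!/bin/sh
# Hypothetical capture script: take an OOA snapshot every 10 seconds for ~10 minutes.
mkdir -p /var/tmp/ooa
i=0
while [ "$i" -lt 60 ]; do
    ctladm dumpooa > "/var/tmp/ooa/dumpooa-$(date +%Y%m%d-%H%M%S).txt"
    i=$((i + 1))
    sleep 10
done

Started from cron a few minutes before the window (e.g. a "20 7 * * *" entry pointing at the script), it covers the whole 07:25 episode without anyone needing to be at the console.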

TOP:
last pid: 89309; load averages: 2.30, 2.64, 2.12 up 50+22:20:40 07:25:45
2351 processes:35 running, 2220 sleeping, 96 waiting
CPU: 0.0% user, 0.0% nice, 9.8% system, 0.2% interrupt, 89.9% idle
Mem: 18M Active, 261M Inact, 244M Laundry, 123G Wired, 1404M Free
ARC: 102G Total, 12G MFU, 86G MRU, 23M Anon, 4333M Header, 15M Other
98G Compressed, 141G Uncompressed, 1.44:1 Ratio
Swap: 10G Total, 422M Used, 9818M Free, 4% Inuse

PID USERNAME PRI NICE SIZE RES STATE C TIME WCPU COMMAND
5 root -16 - 0K 160K - 3 30.0H 62.40% ctl{work1}
5 root -16 - 0K 160K - 13 328:11 26.41% ctl{work4}
5 root -16 - 0K 160K - 0 551:23 24.05% ctl{work0}
5 root -16 - 0K 160K - 22 310:47 23.62% ctl{work7}
5 root -16 - 0K 160K - 24 397:27 23.60% ctl{work3}
5 root -16 - 0K 160K - 19 321:57 11.78% ctl{work6}
5 root -16 - 0K 160K - 10 266:04 7.30% ctl{work5}
5 root -16 - 0K 160K - 26 348:31 7.28% ctl{work2}
0 root 8 - 0K 25408K - 9 326:48 4.62% kernel{cblk1 taskq_3}
0 root 8 - 0K 25408K CPU6 6 326:48 4.45% kernel{cblk1 taskq_12}
0 root 8 - 0K 25408K - 10 326:42 4.25% kernel{cblk1 taskq_2}
0 root 8 - 0K 25408K - 20 326:34 4.23% kernel{cblk1 taskq_4}
0 root 8 - 0K 25408K - 27 326:55 4.12% kernel{cblk1 taskq_6}
0 root 8 - 0K 25408K - 29 326:50 4.10% kernel{cblk1 taskq_5}
0 root 8 - 0K 25408K CPU16 16 326:35 3.93% kernel{cblk1 taskq_11}
0 root 8 - 0K 25408K - 27 326:40 3.76% kernel{cblk1 taskq_10}
0 root 8 - 0K 25408K - 19 326:42 3.70% kernel{cblk1 taskq_9}
0 root 8 - 0K 25408K - 10 326:58 3.64% kernel{cblk1 taskq_0}

FreeNAS debug attached to issue.

From the ESXi side we see:
2019-01-14T06:26:48.401Z cpu26:66648)WARNING: ScsiDeviceIO: 1188: Device naa.6589cfc000000e67968a2515f597169d performance has deteriorated. I/O latency increased from average value of 3360 microseconds to 3566124 microseconds.
2019-01-14T06:26:48.609Z cpu45:66648)WARNING: ScsiDeviceIO: 1188: Device naa.6589cfc000000e67968a2515f597169d performance has deteriorated. I/O latency increased from average value of 3360 microseconds to 7150418 microseconds.
2019-01-14T06:26:54.989Z cpu4:65664)ScsiDeviceIO: 2912: Cmd(0x43955e9a0d00) 0x28, CmdSN 0xe4 from world 5369729 to dev "naa.6589cfc000000e67968a2515f597169d" failed H:0x8 D:0x0 P:0x0
2019-01-14T06:26:55.316Z cpu4:65664)ScsiDeviceIO: 2912: Cmd(0x43955bc78680) 0x28, CmdSN 0xf7 from world 5369729 to dev "naa.6589cfc000000e67968a2515f597169d" failed H:0x8 D:0x0 P:0x0

2019-01-15T06:26:11.722Z: [vmfsCorrelator] 4897457694402us: [vob.vmfs.heartbeat.timedout] 5af55092-7c0ff165-deb1-0cc47a58dbec freenas-lun02
2019-01-15T06:26:11.723Z: [vmfsCorrelator] 4897422699991us: [esx.problem.vmfs.heartbeat.timedout] 5af55092-7c0ff165-deb1-0cc47a58dbec freenas-lun02
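
For reference, opcode 0x28 in the ScsiDeviceIO lines is READ(10), and H:0x8 is ESXi's host-status code for a reset, i.e. the host side terminated the command rather than the target completing it with an error, which lines up with the ctl_datamove aborts on the FreeNAS side. A quick way to pull the related events out of a host (a sketch; /var/log/vmkernel.log is the standard ESXi vmkernel log, and the pattern simply matches the lines above):

grep -E 'performance has deteriorated|failed H:0x8' /var/log/vmkernel.log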

Other useful information:

  • We have approximately 130 VMs spread across the LUNs; most of them are idle.

  • Total IOPS from ESXi to FreeNAS is around 2000-3000.

  • There is nothing special that happens at 07:25 that we're aware of (see the cron check after this list).

  • The workload (in terms of IOPS and throughput) is lower at this time than at other times during the day.

  • The LUNs are formatted with VMFS6 and UNMAP is disabled from the ESXi side.

  • We have disabled scrub on ZFS as part of troubleshooting.
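
To rule out a scheduled job on the FreeNAS side, the usual cron locations can be checked for anything firing at that time. A minimal sketch, assuming the standard FreeBSD paths (/etc/crontab and the per-user crontabs under /var/cron/tabs); the pattern just matches "minute 25, hour 7" entries:

grep -n '^[[:space:]]*25[[:space:]][[:space:]]*7[[:space:]]' /etc/crontab /var/cron/tabs/* 2>/dev/null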

Problem/Justification

None

Impact

None

Activity

Alexander Motin 
March 13, 2019 at 8:50 PM

I've just merged the change into FreeNAS 11.3 nightly. (https://github.com/freenas/os/commit/2ddca8835e6194ddb993ab3433e1e229b8cfb6e8)

QE: The main thing I would test, aside from just iSCSI under heavy load, is ALUA on TrueNAS HA with an iSCSI client connected to the passive node and traffic going through the interlink. I had to modify that part of the code in the process.

Alexander Motin 
February 27, 2019 at 9:46 PM

I've just made a commit to FreeBSD head (https://svnweb.freebsd.org/changeset/base/344636) that should fix this pathological case, where a slow backend accumulates zillions of blocked and partially aborted requests, wasting tons of CPU time and making the general situation even worse. It is quite a significant and invasive change, so I'll let it soak for some time before merging it into FreeNAS.

James McEwan 
February 8, 2019 at 9:26 AM

I've performed a packet capture of iSCSI traffic on the FreeNAS box to try to identify where the I/O causing the issue is originating from.

However, I'm having trouble matching the information provided in 'ctladm dumpooa' to the pcap file.

The following information is provided by ctladm dumpooa:
LUN 1 tag 0x24e2423d ABORT RTR: READ(16). CDB: (58376 ms)

I've tried searching for the tag identifier in the pcap, but there are no matches. I suppose this is an internal tag only?

Surely there must be some way of identifying which initiator generated the IO with the highest latency in dumpooa?
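
One possible cross-check (a sketch; the capture file name is hypothetical, the field name comes from the Wireshark iSCSI dissector, and the second value is simply the byte-swapped form of the tag, in case ctladm and the wire representation differ in byte order):

tshark -r iscsi-capture.pcap -Y 'iscsi.initiatortasktag == 0x24e2423d || iscsi.initiatortasktag == 0x3d42e224'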

James McEwan 
February 5, 2019 at 7:32 AM

Doesn't look like that setting exists in 11.1U4 unfortunately...

root@gs1203:~ # sysctl kern.cam.ctl.time_io_secs
sysctl: unknown oid 'kern.cam.ctl.time_io_secs'

root@gs1203:~ # sysctl -a | grep kern.cam.ctl
kern.cam.ctl2cam.max_sense: 252
kern.cam.ctl.ha_role: 0
kern.cam.ctl.ha_link: 0
kern.cam.ctl.ha_id: 0
kern.cam.ctl.ha_mode: 0
kern.cam.ctl.iscsi.maxtags: 256
kern.cam.ctl.iscsi.login_timeout: 60
kern.cam.ctl.iscsi.ping_timeout: 5
kern.cam.ctl.iscsi.debug: 1
kern.cam.ctl.block.num_threads: 14
kern.cam.ctl.lun_map_size: 1024
kern.cam.ctl.debug: 0
kern.cam.ctl.worker_threads: 8

Alexander Motin 
February 4, 2019 at 9:42 PM

James McEwan wrote:
> Thanks, do you know if the system requires a reboot for that sysctl to take effect?

No, it is changeable live:
mav@mavlab:/root# sysctl kern.cam.ctl.time_io_secs
kern.cam.ctl.time_io_secs: 90
mav@mavlab:/root# sysctl kern.cam.ctl.time_io_secs=10
kern.cam.ctl.time_io_secs: 90 -> 10
mav@mavlab:/root# sysctl kern.cam.ctl.time_io_secs
kern.cam.ctl.time_io_secs: 10

Just note that it reports an I/O not 10 seconds after it starts running, but only when it finally completes.

> Is there any further information you need from us with regards to the issue with CTL? Or do you have everything you need?

I'd say it is enough for now to see that aborted requests can be a problem. Whether there is anything else is hard to say without tracking this live in real time.

> Are there any other suggestions you have (apart from rebuilding the pool) to help us deal with this issue? I wonder for example if reducing the queue size on the initiator side may minimise the impact this issue is having.

If you don't have many VM hosts, limiting the queue depth on each may indeed help a bit, but as I said, aborted requests are not visible in the initiator's queue while still being a burden for the target.
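
For completeness, on the ESXi side the per-LUN queue depth of the software iSCSI adapter is controlled by a module parameter. A sketch only: the value 64 is an arbitrary example, and a host reboot is required for module parameter changes to take effect.

esxcli system module parameters set -m iscsi_vmk -p iscsivmk_LunQDepth=64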

Created January 18, 2019 at 8:43 AM
Updated December 19, 2019 at 12:40 PM
Resolved October 30, 2019 at 6:36 PM