Third Party to Resolve
Details
Details
Assignee
Triage Team
Triage TeamReporter
Troy Fontaine
Troy FontaineLabels
Components
Fix versions
Affects versions
Priority
More fields
More fields
Katalon Platform
Katalon Platform
Created March 20, 2022 at 3:47 PM
Updated July 6, 2022 at 8:58 PM
Resolved May 18, 2022 at 6:10 PM
I stumbled across this issue while copying large files between 6 and 18 Gigabytes from another host on the same network as my TrueNAS host (running on an HP Microserver 8th Gen) over gigabit ethernet (w/ jumbo frames enabled). I have been trying to move over to SCALE from Core on this host in order to leverage the container benefits w/ SCALE. After moving the host to SCALE, there was a noticeable sequential write performance issue when copying files. It manifested as the initial transfer starting at the expected 115 MB/s - 120 MB/s and then dropping after about 30 seconds (I didn't time this specifically, so it might be faster) to below 20 MB/s and staying there for the remainder of the transfer. The ZFS cache did not appear to be used as reviewing its status from the dashboard while a transfer was taking place did not show it expanding as you would expect.
The HP Microserver 8th Gen is using the onboard B120i storage controller. This controller supports both software RAID and two HBA modes (AHCI and Legacy) and looks to just be an Intel SATA controller that is rebranded. The lspci output shows the following identifier for the controller:
00:1f.2 SATA controller: Intel Corporation 6 Series/C200 Series Chipset Family 6 port Desktop SATA AHCI Controller (rev 05)
After reaching out to the community for assistance, I was asked to run fio benchmarks to test the sequential write performance to try and determine if the problem was with Samba or with ZFS. This lead to the suspicion that the issue is actually with the driver used for this controller on Linux vs. the one used on FreeBSD. The performance impact is most dramatic when using the controllers "Legacy" mode (set in the system's BIOS) which splits the controller into two so that you can boot from the optical drive SATA port. There is also an additional "Drive Write Cache" setting which, based on the test results, may be ignored on FreeBSD but may actually be obeyed with the Linux driver.
Below are comparisons of the results from fio:
TrueNAS Core (Controller Mode: Legacy w/ Drive Write Cache disabled):
random-write: (g=0): rw=write, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=posixaio, iodepth=32
fio-3.27
Starting 1 process
random-write: Laying out IO file (1 file / 2048MiB)
Jobs: 1 (f=1): [W(1)][19.4%][w=182MiB/s][w=46.7k IOPS][eta 00m:29s]
Jobs: 1 (f=1): [W(1)][25.0%][w=172MiB/s][w=43.9k IOPS][eta 00m:36s]
Jobs: 1 (f=1): [W(1)][32.7%][w=161MiB/s][w=41.2k IOPS][eta 00m:35s]
Jobs: 1 (f=1): [W(1)][42.6%][w=127MiB/s][w=32.6k IOPS][eta 00m:31s]
Jobs: 1 (f=1): [W(1)][51.8%][w=124MiB/s][w=31.7k IOPS][eta 00m:27s]
Jobs: 1 (f=1): [W(1)][60.7%][w=177MiB/s][w=45.3k IOPS][eta 00m:22s]
Jobs: 1 (f=1): [W(1)][70.2%][w=178MiB/s][w=45.6k IOPS][eta 00m:17s]
Jobs: 1 (f=1): [W(1)][79.3%][w=150MiB/s][w=38.5k IOPS][eta 00m:12s]
Jobs: 1 (f=1): [W(1)][88.1%][w=185MiB/s][w=47.3k IOPS][eta 00m:07s]
Jobs: 1 (f=1): [W(1)][98.3%][w=177MiB/s][w=45.2k IOPS][eta 00m:01s]
Jobs: 1 (f=1): [W(1)][100.0%][w=174MiB/s][w=44.6k IOPS][eta 00m:00s]
random-write: (groupid=0, jobs=1): err= 0: pid=1696: Fri Mar 18 20:33:27 2022
write: IOPS=44.7k, BW=175MiB/s (183MB/s)(10.0GiB/58680msec); 0 zone resets
slat (nsec): min=1248, max=48665k, avg=6192.59, stdev=43464.73
clat (usec): min=22, max=450050, avg=669.19, stdev=2116.12
lat (usec): min=30, max=450067, avg=675.38, stdev=2116.74
clat percentiles (usec):
1.00th=[ 72], 5.00th=[ 109], 10.00th=[ 147], 20.00th=[ 289],
30.00th=[ 537], 40.00th=[ 660], 50.00th=[ 709], 60.00th=[ 742],
70.00th=[ 775], 80.00th=[ 816], 90.00th=[ 881], 95.00th=[ 988],
99.00th=[ 1401], 99.50th=[ 1663], 99.90th=[10814], 99.95th=[32637],
99.99th=[83362]
bw ( KiB/s): min=44776, max=477011, per=100.00%, avg=178846.33, stdev=57449.87, samples=116
iops : min=11194, max=119252, avg=44711.30, stdev=14362.40, samples=116
lat (usec) : 50=0.14%, 100=3.70%, 250=13.83%, 500=10.66%, 750=33.91%
lat (usec) : 1000=33.09%
lat (msec) : 2=4.38%, 4=0.13%, 10=0.06%, 20=0.03%, 50=0.04%
lat (msec) : 100=0.02%, 250=0.01%, 500=0.01%
fsync/fdatasync/sync_file_range:
sync (usec): min=38, max=6428, avg=506.85, stdev=513.63
sync percentiles (usec):
1.00th=[ 55], 5.00th=[ 72], 10.00th=[ 89], 20.00th=[ 145],
30.00th=[ 251], 40.00th=[ 343], 50.00th=[ 441], 60.00th=[ 553],
70.00th=[ 652], 80.00th=[ 750], 90.00th=[ 857], 95.00th=[ 971],
99.00th=[ 2737], 99.50th=[ 4621], 99.90th=[ 6128], 99.95th=[ 6325],
99.99th=[ 6456]
cpu : usr=21.87%, sys=51.07%, ctx=733769, majf=0, minf=1
IO depths : 1=0.1%, 2=0.5%, 4=1.8%, 8=4.0%, 16=66.3%, 32=27.6%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=95.7%, 8=2.9%, 16=1.0%, 32=0.5%, 64=0.0%, >=64=0.0%
issued rwts: total=0,2621440,0,8581 short=0,0,0,0 dropped=0,0,0,0
latency : target=0, window=0, percentile=100.00%, depth=32
Run status group 0 (all jobs):
WRITE: bw=175MiB/s (183MB/s), 175MiB/s-175MiB/s (183MB/s-183MB/s), io=10.0GiB (10.7GB), run=58680-58680msec
TrueNAS SCALE (Controller Mode: Legacy w/ Drive Write Cache disabled):
random-write: (g=0): rw=write, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=posixaio, iodepth=32
fio-3.25
Starting 1 process
Jobs: 1 (f=1): [W(1)][11.7%][w=13.5MiB/s][w=3456 IOPS][eta 00m:53s]
Jobs: 1 (f=1): [W(1)][21.7%][w=29.9MiB/s][w=7655 IOPS][eta 00m:47s]
Jobs: 1 (f=1): [W(1)][31.7%][w=17.9MiB/s][w=4576 IOPS][eta 00m:41s]
Jobs: 1 (f=1): [W(1)][41.7%][w=11.5MiB/s][w=2946 IOPS][eta 00m:35s]
Jobs: 1 (f=1): [W(1)][51.7%][w=17.1MiB/s][w=4384 IOPS][eta 00m:29s]
Jobs: 1 (f=1): [W(1)][61.7%][w=22.6MiB/s][w=5792 IOPS][eta 00m:23s]
Jobs: 1 (f=1): [W(1)][71.7%][w=15.4MiB/s][w=3939 IOPS][eta 00m:17s]
Jobs: 1 (f=1): [W(1)][81.7%][w=29.9MiB/s][w=7660 IOPS][eta 00m:11s]
Jobs: 1 (f=1): [W(1)][91.7%][w=11.5MiB/s][w=2944 IOPS][eta 00m:05s]
Jobs: 1 (f=1): [W(1)][100.0%][w=14.4MiB/s][w=3680 IOPS][eta 00m:00s]
random-write: (groupid=0, jobs=1): err= 0: pid=9875: Fri Mar 18 20:43:18 2022
write: IOPS=7222, BW=28.2MiB/s (29.6MB/s)(1693MiB/60004msec); 0 zone resets
slat (nsec): min=122, max=219996, avg=376.58, stdev=1030.53
clat (usec): min=9, max=168436, avg=4412.21, stdev=9600.21
lat (usec): min=10, max=168436, avg=4412.59, stdev=9600.25
clat percentiles (usec):
1.00th=[ 562], 5.00th=[ 594], 10.00th=[ 619], 20.00th=[ 644],
30.00th=[ 676], 40.00th=[ 717], 50.00th=[ 758], 60.00th=[ 816],
70.00th=[ 914], 80.00th=[ 3490], 90.00th=[ 13566], 95.00th=[ 24773],
99.00th=[ 44827], 99.50th=[ 57410], 99.90th=[ 83362], 99.95th=[ 99091],
99.99th=[120062]
bw ( KiB/s): min= 3840, max=145152, per=100.00%, avg=28992.54, stdev=29636.75, samples=119
iops : min= 960, max=36288, avg=7248.13, stdev=7409.19, samples=119
lat (usec) : 10=0.01%, 20=0.05%, 50=0.04%, 100=0.03%, 250=0.01%
lat (usec) : 500=0.01%, 750=47.55%, 1000=25.85%
lat (msec) : 2=3.33%, 4=3.61%, 10=4.71%, 20=8.14%, 50=5.91%
lat (msec) : 100=0.72%, 250=0.04%
fsync/fdatasync/sync_file_range:
sync (usec): min=8, max=35020, avg=967.96, stdev=4139.02
sync percentiles (usec):
1.00th=[ 12], 5.00th=[ 18], 10.00th=[ 21], 20.00th=[ 26],
30.00th=[ 31], 40.00th=[ 38], 50.00th=[ 48], 60.00th=[ 61],
70.00th=[ 76], 80.00th=[ 578], 90.00th=[ 717], 95.00th=[ 3032],
99.00th=[15795], 99.50th=[34866], 99.90th=[34866], 99.95th=[34866],
99.99th=[34866]
cpu : usr=1.25%, sys=0.21%, ctx=22896, majf=6, minf=48
IO depths : 1=0.1%, 2=1.0%, 4=7.0%, 8=23.8%, 16=57.6%, 32=11.0%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=96.9%, 8=0.2%, 16=0.1%, 32=2.8%, 64=0.0%, >=64=0.0%
issued rwts: total=0,433377,0,1666 short=0,0,0,0 dropped=0,0,0,0
latency : target=0, window=0, percentile=100.00%, depth=32
Run status group 0 (all jobs):
WRITE: bw=28.2MiB/s (29.6MB/s), 28.2MiB/s-28.2MiB/s (29.6MB/s-29.6MB/s), io=1693MiB (1775MB), run=60004-60004msec
TrueNAS SCALE (Controller Mode: AHCI w/ Disk Write Cache enabled):
random-write: (g=0): rw=write, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=posixaio, iodepth=32
fio-3.25
Starting 1 process
random-write: Laying out IO file (1 file / 2048MiB)
Jobs: 1 (f=1): [W(1)][17.9%][w=151MiB/s][w=38.7k IOPS][eta 00m:32s]
Jobs: 1 (f=1): [W(1)][25.5%][w=143MiB/s][w=36.7k IOPS][eta 00m:38s]
Jobs: 1 (f=1): [W(1)][32.8%][w=72.4MiB/s][w=18.5k IOPS][eta 00m:39s]
Jobs: 1 (f=1): [W(1)][41.7%][w=134MiB/s][w=34.4k IOPS][eta 00m:35s]
Jobs: 1 (f=1): [W(1)][51.7%][w=148MiB/s][w=37.8k IOPS][eta 00m:29s]
Jobs: 1 (f=1): [W(1)][61.7%][w=143MiB/s][w=36.7k IOPS][eta 00m:23s]
Jobs: 1 (f=1): [W(1)][71.7%][w=149MiB/s][w=38.1k IOPS][eta 00m:17s]
Jobs: 1 (f=1): [W(1)][81.7%][w=132MiB/s][w=33.8k IOPS][eta 00m:11s]
Jobs: 1 (f=1): [W(1)][91.7%][w=120MiB/s][w=30.7k IOPS][eta 00m:05s]
Jobs: 1 (f=1): [W(1)][100.0%][w=150MiB/s][w=38.4k IOPS][eta 00m:00s]
random-write: (groupid=0, jobs=1): err= 0: pid=9716: Fri Mar 18 23:19:16 2022
write: IOPS=37.7k, BW=147MiB/s (155MB/s)(8846MiB/60001msec); 0 zone resets
slat (nsec): min=123, max=245296, avg=434.23, stdev=925.35
clat (usec): min=10, max=14211, avg=838.19, stdev=673.83
lat (usec): min=10, max=14211, avg=838.62, stdev=673.85
clat percentiles (usec):
1.00th=[ 155], 5.00th=[ 180], 10.00th=[ 202], 20.00th=[ 273],
30.00th=[ 461], 40.00th=[ 644], 50.00th=[ 783], 60.00th=[ 889],
70.00th=[ 979], 80.00th=[ 1090], 90.00th=[ 1418], 95.00th=[ 1844],
99.00th=[ 3982], 99.50th=[ 4146], 99.90th=[ 4359], 99.95th=[ 4490],
99.99th=[ 5276]
bw ( KiB/s): min=29208, max=644352, per=100.00%, avg=150996.71, stdev=86341.93, samples=119
iops : min= 7302, max=161088, avg=37749.21, stdev=21585.48, samples=119
lat (usec) : 20=0.01%, 50=0.01%, 100=0.06%, 250=18.20%, 500=13.09%
lat (usec) : 750=14.92%, 1000=25.27%
lat (msec) : 2=24.32%, 4=3.21%, 10=0.92%, 20=0.01%
fsync/fdatasync/sync_file_range:
sync (usec): min=4, max=3950, avg=292.27, stdev=389.20
sync percentiles (usec):
1.00th=[ 12], 5.00th=[ 19], 10.00th=[ 24], 20.00th=[ 37],
30.00th=[ 63], 40.00th=[ 101], 50.00th=[ 163], 60.00th=[ 217],
70.00th=[ 314], 80.00th=[ 474], 90.00th=[ 717], 95.00th=[ 947],
99.00th=[ 2057], 99.50th=[ 2671], 99.90th=[ 3261], 99.95th=[ 3359],
99.99th=[ 3949]
cpu : usr=10.58%, sys=2.24%, ctx=379917, majf=6, minf=46
IO depths : 1=0.1%, 2=0.1%, 4=0.6%, 8=9.6%, 16=67.2%, 32=22.9%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=95.4%, 8=1.6%, 16=1.5%, 32=1.5%, 64=0.0%, >=64=0.0%
issued rwts: total=0,2264641,0,8014 short=0,0,0,0 dropped=0,0,0,0
latency : target=0, window=0, percentile=100.00%, depth=32
Run status group 0 (all jobs):
WRITE: bw=147MiB/s (155MB/s), 147MiB/s-147MiB/s (155MB/s-155MB/s), io=8846MiB (9276MB), run=60001-60001msec
TrueNAS SCALE (Controller Mode: Legacy w/ Drive Write Cache enabled):
random-write: (g=0): rw=write, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=posixaio, iodepth=32
fio-3.25
Starting 1 process
Jobs: 1 (f=1): [W(1)][11.7%][eta 00m:53s]
Jobs: 1 (f=1): [W(1)][21.7%][w=177MiB/s][w=45.3k IOPS][eta 00m:47s]
Jobs: 1 (f=1): [W(1)][31.7%][eta 00m:41s]
Jobs: 1 (f=1): [W(1)][41.0%][w=301MiB/s][w=77.1k IOPS][eta 00m:36s]
Jobs: 1 (f=1): [W(1)][50.8%][w=173MiB/s][w=44.3k IOPS][eta 00m:30s]
Jobs: 1 (f=1): [W(1)][60.0%][w=183MiB/s][w=46.7k IOPS][eta 00m:24s]
Jobs: 1 (f=1): [W(1)][70.5%][w=179MiB/s][w=45.9k IOPS][eta 00m:18s]
Jobs: 1 (f=1): [W(1)][80.3%][w=175MiB/s][w=44.9k IOPS][eta 00m:12s]
Jobs: 1 (f=1): [W(1)][90.2%][w=174MiB/s][w=44.5k IOPS][eta 00m:06s]
Jobs: 1 (f=1): [W(1)][100.0%][w=171MiB/s][w=43.9k IOPS][eta 00m:00s]
random-write: (groupid=0, jobs=1): err= 0: pid=11691: Fri Mar 18 22:37:21 2022
write: IOPS=37.0k, BW=148MiB/s (155MB/s)(8897MiB/60001msec); 0 zone resets
slat (nsec): min=120, max=230996, avg=394.22, stdev=772.39
clat (usec): min=9, max=1900.0k, avg=833.37, stdev=13955.21
lat (usec): min=9, max=1900.0k, avg=833.77, stdev=13955.21
clat percentiles (usec):
1.00th=[ 167], 5.00th=[ 194], 10.00th=[ 212], 20.00th=[ 265],
30.00th=[ 404], 40.00th=[ 529], 50.00th=[ 570], 60.00th=[ 627],
70.00th=[ 758], 80.00th=[ 906], 90.00th=[ 1106], 95.00th=[ 1385],
99.00th=[ 3720], 99.50th=[ 4047], 99.90th=[ 9241], 99.95th=[ 19530],
99.99th=[566232]
bw ( KiB/s): min= 256, max=563200, per=100.00%, avg=167112.44, stdev=90222.92, samples=108
iops : min= 64, max=140800, avg=41778.11, stdev=22555.73, samples=108
lat (usec) : 10=0.01%, 20=0.03%, 50=0.01%, 100=0.05%, 250=18.38%
lat (usec) : 500=15.99%, 750=35.21%, 1000=15.70%
lat (msec) : 2=12.72%, 4=1.37%, 10=0.44%, 20=0.04%, 50=0.03%
lat (msec) : 100=0.01%, 250=0.01%, 500=0.01%, 750=0.01%, 2000=0.01%
fsync/fdatasync/sync_file_range:
sync (usec): min=5, max=3665, avg=217.48, stdev=267.10
sync percentiles (usec):
1.00th=[ 12], 5.00th=[ 18], 10.00th=[ 21], 20.00th=[ 28],
30.00th=[ 40], 40.00th=[ 64], 50.00th=[ 92], 60.00th=[ 169],
70.00th=[ 229], 80.00th=[ 408], 90.00th=[ 594], 95.00th=[ 717],
99.00th=[ 1237], 99.50th=[ 1385], 99.90th=[ 1614], 99.95th=[ 1647],
99.99th=[ 3654]
cpu : usr=8.59%, sys=1.72%, ctx=348556, majf=5, minf=49
IO depths : 1=0.1%, 2=0.2%, 4=2.1%, 8=13.2%, 16=64.6%, 32=20.3%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=95.9%, 8=1.0%, 16=1.3%, 32=1.8%, 64=0.0%, >=64=0.0%
issued rwts: total=0,2277556,0,8243 short=0,0,0,0 dropped=0,0,0,0
latency : target=0, window=0, percentile=100.00%, depth=32
Run status group 0 (all jobs):
WRITE: bw=148MiB/s (155MB/s), 148MiB/s-148MiB/s (155MB/s-155MB/s), io=8897MiB (9329MB), run=60001-60001msec