Excessive space prediction alerts

Description

<p>More info in the forum post <a href="https://www.truenas.com/community/threads/truecommand-2-0-issues.93712/post-648672">https://www.truenas.com/community/threads/truecommand-2-0-issues.93712/post-648672</a> </p>

Activity

Basil Hendroff
June 18, 2021 at 2:53 AM
(edited)

This morning, I received a single prediction alert (see attached image tc70). In the context of this ticket, I'm now satisfied that the number of predictions has dropped substantially and that TC 2 nightly is now 'less chatty' than its predecessor.

I'd like to take this opportunity to explore predictions a little more and stimulate some discussion, which may lead to improvements for predictions in a future release of TC.

When I saw the prediction alert this morning, I took a moment to reflect on what might have led to this alert. There was only one instance of a large data transfer. Here's a chronology of events.

To test out the updating wheel issue TC-1763, I set up a replication task early yesterday morning local time.
The pull replication started at 1:36 AM and completed around 4:21 AM on 17/6/2021. Around 200 GB was replicated over 2.75 hours.
This morning at 5:15 AM on 18/6/2021 (a day later), I received the prediction alert.

A few observations:

Interestingly, the alert was logged as critical rather than as a warning (see image tc70).
The alert relates to the system shown in image tc71. It shows that I've used 6TB (actually 5.66TB) out of 20.5TB of total usable capacity. That's a little over a quarter of the space used. It will take me a very long time to use up 80% of the total space. My gut feel is that I'm unlikely to reach this figure by the end of the year on this particular server.
The alert suggests that 82% of the total usable capacity will be consumed in just over two weeks, which is way off the mark given the previous point. I suspect the intent is to say 'if consumption continues at rate X, Y% will be consumed in Z days'. FWIW, I feel what's missing from the alert text is the rate X, whereas the percentage consumed Y doesn't appear to be particularly useful information. I wonder if something along the following lines might be more accurate alert text: 'If storage consumption continues at X rate, it will take Z days to reach 80% of usable storage capacity'. The 80% rule is a familiar rule of thumb within the broader ZFS community, and I believe TC 1.x used it for its future predictions.
I had to try to figure out what event triggered the alert. When the replication occurred, the TC updating wheel showed the progress of the replication. However, once the replication completed, the only evidence in TC that the replication occurred seems to be the prediction alert. I wonder whether it's worthwhile logging something in the alert system whenever the updating wheel is activated e.g. updates, scrubs, replication, etc. This might be useful intel and, in the case of replications, provide some context for prediction alerts.
I feel the prediction alert should have been sent in a more timely manner rather than a day later. The further away from the event that triggered the alert, the harder it is to figure out what led to it.

EDIT: I'd forgotten I have scheduled TV shows being recorded to this pool. That's likely the reason for the bursty prediction alerts.
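
To make the suggested alert wording concrete, here is a minimal sketch of the linear extrapolation it implies. This is not TrueCommand's actual prediction logic, which isn't described in this ticket. The function names, the decimal TB units, and the two averaging windows are assumptions for illustration; only the 5.66TB used, 20.5TB usable, 200 GB in 2.75 hours, and the 80% threshold come from the comment above.

```python
def days_to_threshold(used_tb, capacity_tb, rate_tb_per_day, threshold=0.80):
    """Days until usage reaches threshold * capacity at a constant growth rate.

    Returns None when the rate is zero or negative (threshold never reached).
    """
    if rate_tb_per_day <= 0:
        return None
    remaining_tb = threshold * capacity_tb - used_tb
    # Clamp at zero so a pool already past the threshold reports 0 days.
    return max(remaining_tb, 0.0) / rate_tb_per_day


def alert_text(used_tb, capacity_tb, rate_tb_per_day, threshold=0.80):
    """Build alert wording along the lines proposed in the comment (hypothetical)."""
    days = days_to_threshold(used_tb, capacity_tb, rate_tb_per_day, threshold)
    if days is None:
        return "Storage consumption is flat or falling; no prediction."
    return (
        f"If storage consumption continues at {rate_tb_per_day:.2f} TB/day, "
        f"it will take {days:.0f} days to reach {threshold:.0%} "
        f"of usable storage capacity."
    )


# Figures from the ticket (assuming decimal units, so 200 GB = 0.2 TB).
USED_TB, CAPACITY_TB = 5.66, 20.5

# Rate measured only over the 2.75-hour replication window.
burst_rate = 0.2 / (2.75 / 24)
# The same 200 GB averaged over a full day.
daily_rate = 0.2 / 1.0

print(alert_text(USED_TB, CAPACITY_TB, burst_rate))
print(alert_text(USED_TB, CAPACITY_TB, daily_rate))
```

Under these assumptions, the same transfer gives very different horizons depending on the window the rate is averaged over, which is one reason stating the rate X in the alert text would help a reader judge how seriously to take the prediction.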

Basil Hendroff
June 16, 2021 at 5:51 PM

TC 2 nightly 20210616 - Inconclusive

At this stage, I can't draw any conclusions. Without knowing what the trigger is to initiate a prediction alert, I'm unable to test this.

Done

Details

Assignee

Ken Moore (Deactivated)

Reporter

Basil Hendroff

Fix versions

2.0.1

Affects versions

2.0-Release

Priority

Lowest

Created June 14, 2021 at 3:28 PM

Updated July 6, 2022 at 8:57 PM

Resolved June 15, 2021 at 6:14 PM