Create a troubleshooting wizard

Description

Ticket created as suggested by Kris Moore

The question arose of which areas of the documentation/help would benefit most from improvement. Several users (Morgan Littlewood, Kjeld Schouten-lebbing (ornias), and I think by implication @JGreco) picked adding a troubleshooter of some kind to the software as a beneficial idea. Since troubleshooting covers a vast, unending range of issues, I set out to consider what a feasible, practical level of troubleshooter might look like. The idea was discussed over a few posts, reposted here from the forum as requested.

----------------------------------------------------

Me:

Troubleshooting is a big area. Often the user can't say "I was setting ACLs and now I can't reach my files"; sometimes it's just "I can't reach my files and I don't know why".

Given the huge range of issues and situations, how can this be addressed systematically in some kind of troubleshooter framework?

It feels almost like it needs a guided online flowchart thing, which asks relevant questions depending on previous answers, or something. Even if various lines of troubleshooting end up at "ask the community".

In other words, I don't think a document-based or table-based format is a good idea for a troubleshooter, for a product like TrueNAS. But I do think it's the first place minds will go, both at IX and here.

It feels like the first questions should be metaquestions, like "how to approach it" and "what framework/methodology suits the needs?"

How can a good troubleshooter be created? At least for common use cases, and for ruling basic matters in or out.
A good troubleshooter feels fundamentally different from a docs page/table. What tools exist specifically for that purpose? Is there existing open source software that can provide a more intelligent or helpful guided troubleshooter, where community members can add, as PRs, rules and statements that guide the user on new issues, and thereby update the troubleshooter easily enough that it fills out and becomes a good aid for solving many common symptoms?

There need to be realistic limits. Two kinds of approach come to mind:

A kind of basic troubleshooter that starts with simple symptom questions (you see this symptom? Check out these things).
Something that aims to utilise the 80/20 rule, and tries to maximise the useful case-outcomes (common symptoms/issues) for minimum work.

------------------------------------

JoeSchmuck:

As someone who has written a lot of troubleshooting guides and other support documentation for the military, I can tell you that this can be a huge undertaking, no exaggeration at all. One thing you folks may want to think about is, as @Stilez mentioned, a flow chart. Create a graphical flow chart, maybe in PDF, and then have decision blocks punt the user to a related Resource (or smaller troubleshooting guides) in the hope that they can fix their problem. If you decide to make a full-blown troubleshooting chapter then it's going to be a nightmare to maintain. Also set realistic expectations for what the troubleshooting section is to achieve, and stick to them. You can't solve all the various problems that people encounter, and you should expect a certain level of skill/knowledge from the users. Define assumptions, because wrong assumptions will mess troubleshooting all up.

-------------------------------------

MorganL:

I like the idea of a flow chart and a guide on how to report issues to the forum. In other words, the initial focus is on a methodology and a way of engaging all the expertise in the community forum. We are a slow AI engine... with thousands of years of Human Learning.

From the flow chart you could then have specific docs that discuss how to diagnose drive issues or RAM issues etc. These docs can be improved over time.

-------------------------------------

Me:

A reporting template approach often seems to get good compliance - one with slots for the user to put specific information, and report if they did certain tests. I'd perhaps do it this way:

Make it driven from the web UI, with an online version in case the UI is inaccessible
Web UI is wizard style, asks key questions, compiles a report, and asks the user to download it and attach it to their forum post.
This means it can also interrogate the platform itself, to automate answers on the basic hardware, system, VM/jails, and pool structures. These can be presented in a text area, with the user asked to blank anything sensitive. (Logs are useful, but it is very hard to isolate private data in them, so automated inclusion is not suitable for forum help.)

So the process would be a bit like this...

The troubleshooter takes the user through a bunch of standard questions; it acts more as a collator than an AI. We don't need to ask about hardware, software, or enabled services, since the report gathers that data. We only need to ask the user to review and edit for privacy.
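
As a rough illustration of the "gather, show in a text area, let the user blank things" step, here is a minimal Python sketch. Everything in it is an assumption for illustration: the section names, the sample values, and the helpers build_summary and blank_sensitive are hypothetical and do not correspond to any existing TrueNAS API.

```python
# Hypothetical sketch: section names, sample data, and helper names are
# illustrative only and are not part of TrueNAS.
REDACTED = "[removed]"


def build_summary(sections):
    """Render {title: text} sections as one plain-text block for user review."""
    parts = []
    for title, text in sections.items():
        parts.append(f"== {title} ==\n{text.strip()}")
    return "\n\n".join(parts)


def blank_sensitive(summary, sensitive_terms):
    """Replace each term the user marked as private with a placeholder.

    Longer terms are replaced first, so a short term that is part of a
    longer one does not leave fragments of the longer one behind.
    """
    for term in sorted(sensitive_terms, key=len, reverse=True):
        if term:  # skip empty strings, which would match everywhere
            summary = summary.replace(term, REDACTED)
    return summary


# Made-up example data standing in for what the platform would report.
sections = {
    "Hardware": "CPU: example 4-core, RAM: 16 GiB",
    "Pools": "tank: 2 x mirror",
    "Network": "hostname nas01.example.lan",
}
summary = build_summary(sections)
reviewed = blank_sensitive(summary, ["nas01.example.lan", "nas01"])
print(reviewed)
```

In a real wizard the user would do the blanking by hand in the text area; the function only shows that the reviewed text, not the raw summary, is what ends up in the report.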

Introduction:
"Welcome to the TrueNAS troubleshooter. This component will ask questions and collect basic information that can be useful if you need to request help. It may suggest actions to try, or provide a report that can be attached to a forum post, as a good starting point for requesting help from the community or ixSystems support."
"All information collected is private. If a report is created, you will have an opportunity to remove any sensitive information before deciding whether to share it."
Type of problem:
Is the problem 1) hardware, 2) software, 3) initial install, 4) suggestion for improvement, or 5) not sure/other?
Appropriate followup questions -
If hardware: disk, ram, boot, network, cpu, not sure/other.
If software: boot, TrueNAS error, pool data issues, VM/jail issue, client ability to use or connect, networking, administration, not sure/other
Problem details:
What is the problem?
Is this a new or recent problem?
When did the problem start?
Did the problem seem to start after anything changed?
What do you have to do, to reproduce the problem (make it happen)?
Does the problem occur every time? If not, when does it seem to happen?
What have you tried so far, to solve the problem?
What is the effect of the issue:
Choose one of these:
Entire platform won't work or is unreliable. Platform seems to work but won't boot. Platform boots but can't access user interface/console. Platform UI can be reached but errors occur, or specific features don't work as expected. Platform works but there is data or storage corruption/risk. Platform works correctly but network clients, VMs or jails aren't working or connecting as expected. Other.
Basic troubleshooting report:
Do you want to create a basic system summary to include with your report? You will be able to review this and remove any private information before using it.
If yes:
Creates basic system summary - HW, SW, VM/jails, services, ZFS structures, SMART, and dumps it in a textarea for the user to review and "ok". It'll be added to the report.
About you:
How would you describe your technical ability and experience?
Reporting/wrapup:
"The troubleshooter has identified the articles listed below, that may be relevant."
"You can download a report of this issue, based on your answers, which may help if you need to seek help."
"You can read next, about the different ways to obtain support, and when these are likely to be appropriate. These include asking the community for help with an issue, obtaining professional support from ixSystems, or reporting a bug or suggestion if you are sure that this is the issue."

(List of links to troubleshooting articles with matching tags to those the troubleshooter thinks relevant, or something)

(Links to articles: "How to obtain community and professional support" "How to report a confirmed bug or suggestion")
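
The question flow above could be held as plain data, with the "collator" being little more than a function that formats the answers. The sketch below is one possible shape in Python, not a proposed implementation: the names FOLLOW_UPS, DETAIL_QUESTIONS, follow_up_options and compile_report are hypothetical, and the report layout is made up for illustration.

```python
# Hypothetical sketch: none of these names come from TrueNAS itself.
FOLLOW_UPS = {
    "hardware": ["disk", "ram", "boot", "network", "cpu", "not sure/other"],
    "software": [
        "boot", "TrueNAS error", "pool data issues", "VM/jail issue",
        "client ability to use or connect", "networking",
        "administration", "not sure/other",
    ],
}

DETAIL_QUESTIONS = [
    "What is the problem?",
    "Is this a new or recent problem?",
    "When did the problem start?",
    "Did the problem seem to start after anything changed?",
    "What do you have to do, to reproduce the problem (make it happen)?",
    "Does the problem occur every time? If not, when does it seem to happen?",
    "What have you tried so far, to solve the problem?",
]


def follow_up_options(problem_type):
    """Return the follow-up choices for a problem type (empty if none apply)."""
    return FOLLOW_UPS.get(problem_type, [])


def compile_report(problem_type, area, effect, answers, summary=""):
    """Collate the user's answers into plain text to attach to a forum post."""
    lines = [
        "TROUBLESHOOTING REPORT",
        f"Type of problem: {problem_type}",
        f"Area: {area or 'not asked'}",
        f"Effect: {effect}",
        "",
    ]
    for question in DETAIL_QUESTIONS:
        lines.append(f"Q: {question}")
        lines.append(f"A: {answers.get(question, '(not answered)')}")
    if summary:
        # Only the summary the user has already reviewed should be passed in.
        lines += ["", "System summary (reviewed by user):", summary]
    return "\n".join(lines)


answers = {"What is the problem?": "Jail will not start after upgrade"}
report = compile_report(
    "software",
    "VM/jail issue",
    "Platform works correctly but network clients, VMs or jails "
    "aren't working or connecting as expected.",
    answers,
)
print(report)
```

Keeping the questions as data rather than code is what would let community members add or reword questions through ordinary PRs.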

Activity

Kris Moore 
July 18, 2024 at 6:06 PM

Thank you for submitting this feature request! To better accommodate and gauge community interest for future versions of TrueNAS we have moved the submission process to our TrueNAS Community Forums. If this feature is still important and relevant for consideration, please refer to the links below on how to submit it for community voting and TrueNAS roadmap review.

Feature Requests Forum:
https://forums.truenas.com/c/features/12

Feature Requests FAQ:
https://forums.truenas.com/t/about-the-feature-requests-category-readme-first/8802

Stilez 
January 30, 2021 at 10:53 AM
(edited)

I had that in mind, but I think the way I've suggested seems effective enough, needs less change (hence a more realistic/practical implementation), and incorporates that. Taking your situation as an example, I'd have the web UI Troubleshooting Wizard present the user at the end with a results page like this:

"The Troubleshooter has identified the articles listed below, that may be relevant:

Alternatively, you can download a troubleshooting report of this session, and post a request for help on the forum, or seek professional Enterprise/Priority support from ixSystems.

How to: forum support
How to: professional support"

Those solutions are then a tabular Q&A like you suggest ("problem/possible causes/suggested steps"). But the Q&A are provided area by area in standard docs pages, not (for example) individually stored and then selected by the troubleshooter and collated as output.

The advantage is that, yes, the "solutions" pages might overlap, but it's much, much simpler to design and maintain, because you just need one docs page per area of the NAS in a "Possible issue -> steps to try" tabular format, which is easy to arrange and even accommodates sections with their own tables if needed, instead of a full-blown intelligent selector.

If a "new" symptom/problem occurs related to (say) jails, it can be added by anyone to the jails solutions page and any other relevant solutions page/s, with extreme ease, and without any new tools. The troubleshooting wizard can easily find relevant solutions pages to offer, by searching for tags "[Troubleshooting] & ([OR-ed list of tags suggested by the problem types the user selected])". For example [Troubleshooting] & ([Jails] | [Upgrades]), so that's easy as well.
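
The tag search described above is small enough to sketch. This Python example assumes each docs page carries a set of tags; the page titles, the sample tags, and the function name matching_pages are hypothetical, and a real docs site would more likely use its own search or tag index.

```python
# Hypothetical sketch of "[Troubleshooting] & ([Jails] | [Upgrades])".
def matching_pages(pages, problem_tags, required="Troubleshooting"):
    """Return titles of pages that have the required tag AND any problem tag."""
    wanted = set(problem_tags)
    return [
        title
        for title, tags in pages.items()
        if required in tags and wanted & set(tags)
    ]


# Made-up pages and tags, one "solutions" page per area of the NAS.
pages = {
    "Jails: possible issues and steps to try": {"Troubleshooting", "Jails"},
    "Upgrades: possible issues and steps to try": {"Troubleshooting", "Upgrades"},
    "Jails: overview": {"Jails"},
    "Networking: possible issues and steps to try": {"Troubleshooting", "Networking"},
}

# Tags suggested by the problem types the user selected in the wizard.
result = matching_pages(pages, ["Jails", "Upgrades"])
print(result)
```

Here the "Jails: overview" page is skipped because it lacks the Troubleshooting tag, and the Networking page is skipped because it matches none of the selected problem tags.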

Kjeld Schouten-lebbing 
January 30, 2021 at 10:00 AM

I think we, as a community, can get this started by creating a standardised format for this.
I would suggest something that is an in-between of Q&A style and troubleshooting guide.

For easier problems this creates a nice simple format, while leaving room for more steps:
Q: I updated TrueNAS but my jail is still the old version / How do I update my jail
A: You need to upgrade the jail manually
1. Go to the console
2. Enter command: dfrtdfhygdfh

Unresolved

Created January 29, 2021 at 4:22 PM
Updated July 18, 2024 at 6:08 PM