All systems have weaknesses, even those that appear robust and well tested.
That is what I ran into: a system that had been in use for 10 years suddenly started giving problems. Having ruled out hardware and software faults, the question became: what has changed in the way it is used?
From that moment an extensive analysis began, based on daily monitoring and statistics: many variables, many false positives ...

Here we discover temporal conflicts between different events that use the same peripheral. The first chart is perplexing, though: the problem occurs even when there are few conflicts. So why does the problem exist, and why does it persist?
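To make the idea of a "temporal conflict" concrete, here is a minimal sketch that assumes a conflict means two events overlapping in time on the same peripheral. The `Event` fields, the event names, and the peripheral names are all invented for illustration; the real system's log format is not shown here.

```python
from collections import defaultdict
from typing import NamedTuple


class Event(NamedTuple):
    name: str        # hypothetical event label
    peripheral: str  # hypothetical peripheral identifier
    start: float     # start time, in seconds
    end: float       # end time, in seconds


def find_conflicts(events):
    """Return pairs of events that overlap in time on the same peripheral."""
    by_peripheral = defaultdict(list)
    for ev in events:
        by_peripheral[ev.peripheral].append(ev)

    conflicts = []
    for group in by_peripheral.values():
        group.sort(key=lambda ev: ev.start)
        for i, first in enumerate(group):
            for second in group[i + 1:]:
                if second.start >= first.end:
                    break  # sorted by start: no later event can overlap `first`
                conflicts.append((first, second))
    return conflicts


# Invented sample data, not taken from the real system.
events = [
    Event("read_sensor", "uart0", 0.0, 2.0),
    Event("send_report", "uart0", 1.5, 3.0),
    Event("poll_status", "uart0", 3.0, 4.0),
    Event("log_flush", "spi1", 1.0, 2.5),
]
conflicts = find_conflicts(events)
for first, second in conflicts:
    print(f"{first.peripheral}: {first.name} overlaps {second.name}")
```

Events that merely touch (one ends exactly when the next starts) are not counted as conflicts in this sketch; whether they should be depends on the peripheral.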

A second analysis distinguishes between the types of conflict, but here too without much success, apart from identifying the types precisely and intervening surreptitiously in the ...

To simplify the work and make it accessible to everyone, I created a graph that reads and displays all the conflicts in real time. The thresholds are checked against the statistics and are still adjusted often, but it is a basis to work from.
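As an illustration of a threshold checked against statistics, the sketch below flags a conflict count that exceeds the historical mean by more than `k` standard deviations. This rule is an assumption made for the example, not necessarily the one used on the real graph, and the numbers are invented.

```python
from statistics import mean, stdev


def conflict_threshold(history, k=2.0):
    """Threshold = mean + k * sample standard deviation of past counts."""
    return mean(history) + k * stdev(history)


def is_alarming(current_count, history, k=2.0):
    """True when the current count is above the statistical threshold."""
    return current_count > conflict_threshold(history, k)


# Invented daily conflict counts, for illustration only.
history = [4, 6, 5, 7, 5, 6, 4]

print(round(conflict_threshold(history), 2))
print(is_alarming(12, history))
print(is_alarming(6, history))
```

Because the thresholds are still being adjusted, `k` is kept as a parameter: raising it reduces false positives at the cost of reacting later.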

Thanks to this monitoring system and the statistics, we can see that the system accumulates errors over several days, but we still don't know exactly how many. We started from 3 days and are now at 5 ... the monitoring continues!
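One way to compare a 3-day and a 5-day accumulation window is a simple rolling sum over the daily error counts. The sketch below shows only the arithmetic; the data is invented and `rolling_sums` is a hypothetical helper, not part of the monitoring system.

```python
def rolling_sums(daily_errors, window):
    """Sum of errors over each run of `window` consecutive days."""
    return [
        sum(daily_errors[i:i + window])
        for i in range(len(daily_errors) - window + 1)
    ]


# Invented daily error counts, for illustration only.
daily_errors = [2, 1, 3, 0, 4, 2, 1]

three_day = rolling_sums(daily_errors, 3)
five_day = rolling_sums(daily_errors, 5)

print(three_day)  # [6, 4, 7, 6, 7]
print(five_day)   # [10, 10, 10]
```

If the problem appears only when the sum over some window crosses a limit, trying several window lengths against the days on which the problem actually occurred is one way to narrow down how many days are involved.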