If you've run a Cestus node, you've seen it: the dashboard says "no models available" while your GPU is clearly busy and the node is serving requests. Or it shows a node as up, with no way to tell whether it just spent the last ten minutes disconnected and scrambling to rejoin. The dashboard is bare-bones. It does not show download progress, it does not show connectivity history, and it does not give you a straight answer to the only question an operator actually has: is my node healthy and serving right now?
The information exists. It's just buried in the node's log — thousands of lines of raw text scrolling past faster than anyone can read.
So I built a small tool that reads that log and answers the question.
The blind spot
A Rebus node does a lot on startup that the dashboard hides. It pulls six models (on my run: two image models and four text models, including a ~17 GB uncensored Qwen), all downloading in parallel over many minutes. Then, during normal operation, it can silently drop off the swarm and reconnect: a {:remote, :closed} disconnect followed by a retry loop. None of this is surfaced cleanly. If you only watch the dashboard, you are flying blind on the two things that matter most: did my models actually finish, and is my node actually connected?
The approach
rebus-watch is a single Python script, standard library only, no dependencies. It reads the node's own log and turns it into a readable status report. It runs in two modes:
- Report mode parses a saved log and prints a one-shot summary — every model and its state (downloading, downloaded, serving), plus a full list of disconnects and reconnects with timestamps.
- Watch mode follows a live log and prints each operator-relevant event the moment it happens: a download finishing, a model joining the serving pool, a disconnect, a reconnect.
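Watch mode boils down to following a file the way `tail -f` does. Here is a minimal sketch of that idea, not the code from the repository: the function name `follow` and the `poll` and `max_idle_polls` parameters are my own illustrative choices, and the only thing it assumes about the log is that events arrive as newline-terminated lines.

```python
import time
from typing import Iterator, Optional, TextIO


def follow(fh: TextIO, poll: float = 0.5,
           max_idle_polls: Optional[int] = None) -> Iterator[str]:
    """Yield complete lines as they are appended to fh, like `tail -f`.

    Hypothetical helper for illustration. With max_idle_polls=None it
    runs forever; a number makes it stop after that many empty polls,
    which is handy for tests and for saved logs.
    """
    idle = 0
    buf = ""
    while True:
        chunk = fh.readline()
        if chunk:
            idle = 0
            buf += chunk
            # Only emit a line once its newline has arrived, so a
            # half-written line is never reported as an event.
            if buf.endswith("\n"):
                yield buf.rstrip("\n")
                buf = ""
            continue
        if max_idle_polls is not None and idle >= max_idle_polls:
            return
        idle += 1
        time.sleep(poll)
```

A watch loop would then be something like `for line in follow(open(path)): ...`, with each line checked against the event patterns. Buffering partial lines matters here because the node may still be writing a line when the reader catches up.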
Here is the part that makes it real. The single most useful line in my whole log was this one:
10:16:06.011 [warning] rebus: disconnected ({:remote, :closed}), retry 1/10 in 10s
That's the node dropping off the network. Ten seconds later it rejoined. On the dashboard, that entire event is invisible. The tool catches it with one small piece of code:
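import re
# TS is the timestamp pattern defined earlier in the script (it matches values like 10:16:06.011)
RE_DISCONNECT = re.compile(
    TS + r'.*?disconnected \((?P<reason>[^)]*)\), retry (?P<retry>\d+/\d+)')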
Feed it the log and the disconnect and reconnect show up in the connectivity section, timestamped, every time. That's the difference between "I think my node is fine" and "I know my node dropped at 10:16 and was back by 10:16:16."
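To see what that pattern pulls out of the line above, here is a small self-contained sketch. The `RE_DISCONNECT` pattern is the one quoted above; the `TS` definition and the `parse_disconnect` helper are assumptions made for this example (the real `TS` lives in the repository and may differ).

```python
import re
from typing import Dict, Optional

# Assumption: timestamps look like 10:16:06.011, as in the sample line.
# The script's actual TS pattern may be defined differently.
TS = r'(?P<ts>\d{2}:\d{2}:\d{2}\.\d{3})'

RE_DISCONNECT = re.compile(
    TS + r'.*?disconnected \((?P<reason>[^)]*)\), retry (?P<retry>\d+/\d+)')


def parse_disconnect(line: str) -> Optional[Dict[str, str]]:
    """Return the timestamp, reason and retry counter, or None if the
    line is not a disconnect event. Hypothetical helper for illustration."""
    m = RE_DISCONNECT.search(line)
    return m.groupdict() if m else None


SAMPLE = ("10:16:06.011 [warning] rebus: disconnected "
          "({:remote, :closed}), retry 1/10 in 10s")

event = parse_disconnect(SAMPLE)
print(event)
# {'ts': '10:16:06.011', 'reason': '{:remote, :closed}', 'retry': '1/10'}
```

Named groups keep the report code readable: the connectivity section can refer to `event['reason']` and `event['retry']` instead of positional indexes.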
Why this is reusable
The tool ships with a real sample log from an actual node run, so anyone can verify it against real data without standing up their own node. And because it reads the log rather than the dashboard, it keeps working regardless of what the dashboard does or doesn't show. Any Cestus/Rebus operator can clone it and get real visibility in under a minute.
The code, the README, and the sample log are all here:
GitHub: https://github.com/walkonwayvs/rebus-watch
If you're running one of these nodes, stop trusting the dashboard and start reading the log. This makes that painless.