Most facilities have a rough sense of what breaks most — it's usually the thing that makes the most noise or causes the most headaches. But "rough sense" isn't data. Data is what gets budget approved, what identifies the asset that's costing you more than its replacement price, and what turns a reactive maintenance team into a proactive one.
Downtime tracking is the foundation of that data. Here's how to do it in a way that's actually usable.
What downtime tracking means in practice
Downtime is any period when a piece of equipment is unavailable to do its job. That includes unplanned failures (the pump seized), planned maintenance (the annual overhaul), and partial failures (the machine runs but at reduced capacity or requires constant babysitting).
Tracking it means recording, at minimum:
- Which asset went down
- When it went down (timestamp)
- Why it went down (failure reason or category)
- When it came back up (resolution timestamp)
- What was done to resolve it
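To make the shape of that core log concrete, here is a minimal sketch of a downtime event record. The class and field names are illustrative, not a real schema — the point is that the five fields above are enough to derive duration and everything downstream.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional

@dataclass
class DowntimeEvent:
    """One row in the downtime log (illustrative field names)."""
    asset_id: str                             # which asset went down
    went_down: datetime                       # when it went down
    reason: str                               # why it went down
    came_back_up: Optional[datetime] = None   # None while still down
    resolution_notes: str = ""                # what was done to resolve it

    def duration_hours(self) -> Optional[float]:
        """Downtime duration in hours, or None if the event is still open."""
        if self.came_back_up is None:
            return None
        return (self.came_back_up - self.went_down).total_seconds() / 3600

event = DowntimeEvent(
    "pump-07",
    datetime(2024, 3, 1, 8, 0),
    "mechanical failure",
    datetime(2024, 3, 1, 11, 30),
    "replaced seized bearing",
)
print(event.duration_hours())  # 3.5
```

Note that `came_back_up` defaults to `None`: the record can be created in 15 seconds when the asset goes down and completed later, which matches the "log first, diagnose second" workflow.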
Everything else — cost analysis, pattern detection, budgeting — is derived from that core log. If the log is inconsistent or lives in people's heads, none of the downstream analysis works.
The most common reason downtime doesn't get tracked
It's not that maintenance teams don't understand why it matters. It's that logging downtime takes time, and time is the one thing a maintenance team dealing with active downtime doesn't have.
The fix is making the logging fast and frictionless. If your team can mark equipment down on a phone in 15 seconds — before they even start diagnosing — you'll get consistent data. If they have to find a computer, log into a system, navigate to the right asset, and fill out a six-field form, the log will happen "later," which usually means never.
What to log — and what not to overthink
Start simple. For each downtime event, you need: asset, start time, end time, reason. That's it. You can add fields later — parts used, technician, root cause category — but the core four get you 80% of the value immediately.
Reason codes are worth a small upfront investment. A free-text "why did it fail" field sounds flexible but produces data you can't analyze. A short list of categories — mechanical failure, electrical failure, operator error, scheduled maintenance, waiting on parts — lets you filter and aggregate. Keep the list short enough that it gets used consistently.
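One way to enforce a short reason-code list is to make it an enumeration and validate input against it. The categories below are the examples from this article; adapt them to your facility.

```python
from enum import Enum

class ReasonCode(str, Enum):
    """Illustrative reason-code list -- keep it short enough to be used consistently."""
    MECHANICAL = "mechanical failure"
    ELECTRICAL = "electrical failure"
    OPERATOR = "operator error"
    SCHEDULED = "scheduled maintenance"
    WAITING_PARTS = "waiting on parts"

def normalize_reason(text: str) -> ReasonCode:
    """Map free input onto the fixed list, rejecting anything unrecognized."""
    try:
        return ReasonCode(text.strip().lower())
    except ValueError:
        valid = ", ".join(c.value for c in ReasonCode)
        raise ValueError(f"Unknown reason {text!r}; pick one of: {valid}")

print(normalize_reason("Mechanical Failure").name)  # MECHANICAL
```

Rejecting unrecognized values at entry time is what keeps the field aggregatable later — a free-text field drifts into dozens of spellings of the same failure.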
The three numbers that actually matter
Once you have downtime data, these are the metrics worth watching:
- MTBF (Mean Time Between Failures) — average time between breakdowns for a given asset. A declining MTBF tells you an asset is degrading and heading toward more frequent failure. That's the signal that triggers a deeper inspection or a replacement conversation.
- MTTR (Mean Time to Repair) — average time from failure to resolution. High MTTR can mean parts availability problems, diagnostic difficulty, or technician skill gaps on a specific asset type. It's where you find bottlenecks.
- Downtime by reason code — a simple breakdown of why equipment is going down. If 40% of your downtime events are "waiting on parts," that's a procurement problem, not a maintenance problem. If it's 60% "mechanical failure" on one asset category, that's a preventive maintenance (PM) gap.
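All three metrics fall straight out of the core log. A sketch, assuming events are stored as `(asset, start, end, reason)` tuples (the sample data is invented for illustration), and approximating MTBF as the mean gap between consecutive failure starts:

```python
from datetime import datetime
from collections import defaultdict

# Illustrative downtime log for one asset: (asset, start, end, reason)
events = [
    ("pump-07", datetime(2024, 1, 5, 8, 0), datetime(2024, 1, 5, 12, 0), "mechanical failure"),
    ("pump-07", datetime(2024, 2, 4, 9, 0), datetime(2024, 2, 4, 11, 0), "mechanical failure"),
    ("pump-07", datetime(2024, 3, 5, 7, 0), datetime(2024, 3, 5, 10, 0), "waiting on parts"),
]

def mtbf_hours(evts):
    """Mean hours between consecutive failure starts (a simple approximation)."""
    starts = sorted(e[1] for e in evts)
    gaps = [(b - a).total_seconds() / 3600 for a, b in zip(starts, starts[1:])]
    return sum(gaps) / len(gaps) if gaps else None

def mttr_hours(evts):
    """Mean hours from failure to resolution."""
    durations = [(end - start).total_seconds() / 3600 for _, start, end, _ in evts]
    return sum(durations) / len(durations)

def downtime_by_reason(evts):
    """Total downtime hours per reason code."""
    totals = defaultdict(float)
    for _, start, end, reason in evts:
        totals[reason] += (end - start).total_seconds() / 3600
    return dict(totals)

print(mttr_hours(events))          # 3.0
print(downtime_by_reason(events))  # {'mechanical failure': 6.0, 'waiting on parts': 3.0}
```

A declining `mtbf_hours` value over successive quarters is the degradation signal described above; a rising `mttr_hours` points at parts, diagnostics, or skills.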
How downtime history prevents future failures
A downtime log is also a failure pattern database. When the same asset fails for the same reason three times, you have a pattern. That pattern tells you something the manufacturer's PM schedule might not — that this specific machine, in your specific environment, under your specific load conditions, needs more attention at a particular point in its cycle.
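Spotting that "three times" pattern is a counting exercise over the same log. A minimal sketch, assuming each entry reduces to an `(asset, reason)` pair and using an invented threshold of three:

```python
from collections import Counter

# Illustrative (asset, reason) pairs pulled from the downtime history.
log = [
    ("pump-07", "mechanical failure"),
    ("pump-07", "mechanical failure"),
    ("conveyor-2", "electrical failure"),
    ("pump-07", "mechanical failure"),
]

def recurring_failures(entries, threshold=3):
    """Return (asset, reason) pairs that have recurred at least `threshold` times."""
    counts = Counter(entries)
    return {pair: n for pair, n in counts.items() if n >= threshold}

print(recurring_failures(log))  # {('pump-07', 'mechanical failure'): 3}
```

Each pair this surfaces is a candidate for a targeted PM addition — the "fix the thing that fails most" starting point.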
That's how reactive maintenance teams start to turn proactive: not by overhauling their whole process at once, but by using their own failure history to target the highest-impact PM additions first. Fix the thing that fails most. Then the next. Over 12 months, the emergency work order count drops visibly.
Getting downtime into a system
A spreadsheet works for a small operation. The limitation is that a spreadsheet doesn't push notifications, doesn't automatically calculate MTBF, and doesn't connect your downtime data to your PM schedule. You're always one step behind.
Shiftlyio logs downtime events from any device, ties them to the asset's full history, and surfaces patterns automatically — so you're not digging through rows to find the signal. When an asset crosses a downtime threshold, the data is already there to make the case for a PM change or a replacement. See how it works →
