Incident Title: Forum Down
Date & Time of Incident: 2025-02-23 02:01:00 (UTC) to 2025-02-23 10:43:00 (UTC)
Prepared by: Q
1. Summary
- The hard drive filled up, so every write operation was rejected.
- The web server stopped operating normally and returned "502 Bad Gateway" errors.
- The Forum was down.
2. Root Cause Analysis
- Logging and backups filled the hard drive, which had been close to its limit for some time.
3. Timeline
Time (UTC)/Event Description
- 02:01 dxasmodeus reported the issue via an external communication tool.
- 10:31 Q started working on the server.
- 10:37 Q identified the full hard drive as the cause.
- 10:43 Q freed space on the hard drive and restarted the server.
4. Resolution & Actions Taken
- Old backups were deleted.
- Server was restarted.
5. Immediate Preventative Measures
- Free additional space on the web server.
- Add a command to the backend login that reports the remaining free space in megabytes.
- Add a file `buffer-storage-blocker-for-emergencies.dat` to the root of the filesystem, so that in the future space can be freed even more quickly by deleting it.
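The free-space command and the emergency buffer file could look roughly like the following. This is a minimal sketch, not the script actually deployed on the server: the function names `free_megabytes` and `create_buffer_file`, the buffer size, and the choice of 1024 * 1024 bytes per megabyte are all assumptions made for illustration.

```python
import os
import shutil

# Assumption: "megabyte" is taken as 1024 * 1024 bytes.
MEGABYTE = 1024 * 1024


def free_megabytes(path="/"):
    """Return the free space on the filesystem containing `path`, in megabytes."""
    return shutil.disk_usage(path).free // MEGABYTE


def create_buffer_file(path, size_mb):
    """Create a placeholder file of `size_mb` megabytes and return its size in bytes.

    Real zero bytes are written instead of just extending the file, because
    a sparse file would typically reserve no disk blocks.
    """
    chunk = b"\0" * MEGABYTE
    with open(path, "wb") as handle:
        for _ in range(size_mb):
            handle.write(chunk)
        handle.flush()
        os.fsync(handle.fileno())  # make sure the data reaches the disk
    return os.path.getsize(path)


if __name__ == "__main__":
    # Hypothetical login hook: print the remaining space on the root filesystem.
    print(f"Free space on /: {free_megabytes('/')} MB")
```

In an emergency, deleting the buffer file gives back its space immediately, which leaves room to clean up properly. On filesystems with transparent compression, a file of zeros may occupy far less space than its nominal size, so random data might be the safer choice there.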
6. Further Possible Mitigations
- Create an alerting capability that notifies the responsible admins, e.g. by email.
- Mount variable-size directories on a separate volume (e.g. the uploads, the log files and/or the database).
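The alerting idea could be sketched as a small check that runs periodically (for example from cron) and prepares an email once disk usage crosses a threshold. Everything specific here is an assumption for illustration: the function name `build_disk_alert`, the 90 percent threshold, the example addresses, and the local SMTP relay mentioned in the comment.

```python
import shutil
from email.message import EmailMessage


def build_disk_alert(path, threshold_percent, sender, recipient, usage=None):
    """Return an EmailMessage if usage of `path` is at or above the threshold, else None.

    `usage` may be any object with `total` and `free` attributes (in bytes);
    if omitted, it is read from the filesystem with shutil.disk_usage.
    """
    if usage is None:
        usage = shutil.disk_usage(path)
    percent = 100.0 * (usage.total - usage.free) / usage.total
    if percent < threshold_percent:
        return None

    message = EmailMessage()
    message["Subject"] = f"Disk usage on {path} at {percent:.0f}%"
    message["From"] = sender
    message["To"] = recipient
    message.set_content(
        f"Only {usage.free // (1024 * 1024)} MB are left on {path}.\n"
        "Please free space before writes start to fail."
    )
    return message


if __name__ == "__main__":
    alert = build_disk_alert("/", 90, "server@example.org", "admin@example.org")
    if alert is not None:
        # Assumption: a local mail relay exists; sending would then be e.g.
        #   import smtplib
        #   with smtplib.SMTP("localhost") as smtp:
        #       smtp.send_message(alert)
        print(alert["Subject"])
```

Keeping the threshold check separate from the actual sending makes the logic easy to test without a mail server, and the same message could later be routed to another notification channel.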
7. Lessons Learned
- Communication among staff worked well.
- There was no monitoring or alerting, so the low disk space was not noticed quickly.
- There is a permanent lack of storage on the server.
8. Action Items
Task/Owner/Deadline/Status
- Space on Webserver / Q / 2025-02-23 / Done
- Add Command / Q / 2025-03-05 / Done
- Add File / Q / 2025-02-23 / Done
- Separate Data to a separate Volume/Mount Point / swammy+Q / 2025-02-27 / Done
- Complete Post-Mortem-Analysis / Q / 2025-03-02 / Done
- Consider further Actions & Mitigations / Futanari Staff / 2025-03-09 / Open