ADR-0002: Observability Stack Choice
Status
Accepted
Context
As the HomeOps Platform evolves toward hosting services in a self-hosted server environment, visibility into system behavior becomes increasingly important. Even at small scale, operating backend services and containerized applications requires insight into resource usage, service health, and runtime behavior.
The available server hardware has limited CPU and memory, and the platform prioritizes learning value, operational clarity, and maintainability over enterprise-scale observability features. As a result, heavyweight SIEM or log analytics platforms are not considered appropriate at this stage.
An observability solution is needed to support basic monitoring, logging, and alerting for self-hosted services, while remaining lightweight enough to run reliably on constrained hardware.
Decision
The HomeOps Platform will adopt a lightweight observability stack based on the following components:
- Metrics collection using Prometheus
- Log aggregation using Loki
- Visualization and alerting using Grafana
This stack will be deployed in a modular manner and treated as a supporting platform service rather than as a core application component.
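To illustrate how a self-hosted service could feed this stack, the following Python sketch exposes counters over HTTP in the Prometheus text exposition format, which Prometheus collects by periodically pulling (scraping) the endpoint. It is a minimal, standard-library-only sketch under stated assumptions: the metric name `homeops_http_requests_total`, the `REQUESTS_TOTAL` dictionary, the `serve` helper, and port 8000 are all hypothetical, and a real service would more likely use an official client library such as `prometheus_client`.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical in-process counters, keyed by HTTP method (illustration only).
REQUESTS_TOTAL: dict[str, int] = {"GET": 0, "POST": 0}


def render_metrics(requests_total: dict[str, int]) -> str:
    """Render counters in the Prometheus text exposition format."""
    lines = [
        "# HELP homeops_http_requests_total Total HTTP requests handled.",
        "# TYPE homeops_http_requests_total counter",
    ]
    # One sample line per label value; sorting keeps the output stable.
    for method, count in sorted(requests_total.items()):
        lines.append(f'homeops_http_requests_total{{method="{method}"}} {count}')
    # The format requires every line, including the last, to end with "\n".
    return "\n".join(lines) + "\n"


class MetricsHandler(BaseHTTPRequestHandler):
    """Serves /metrics for Prometheus to scrape; any other path returns 404."""

    def do_GET(self) -> None:
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = render_metrics(REQUESTS_TOTAL).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port: int = 8000) -> None:
    """Block and serve metrics on all interfaces (hypothetical default port)."""
    HTTPServer(("0.0.0.0", port), MetricsHandler).serve_forever()
```

Calling `serve()` would block and answer on `/metrics`; a Prometheus scrape job pointed at that host and port would then collect the counter, and Grafana would query Prometheus rather than the service itself. This keeps collection, storage, and visualization in separate components, in line with the modular deployment described above.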
Rationale
The selected observability stack provides the following advantages:
- Low resource footprint compared to traditional SIEM platforms
- Clear separation between metrics, logs, and visualization
- Web-based user interface suitable for headless server environments
- Strong ecosystem support and documentation
- Compatibility with containerized workloads
The stack covers essential observability needs, such as system metrics, application logs, and basic alerting, without introducing unnecessary complexity or operational overhead. This aligns with the platform’s incremental approach to deployment and automation.
More advanced security analytics or centralized event correlation can be considered in later phases if system scale or requirements change.
Consequences
Positive
- Improved visibility into system and application behavior
- Early detection of resource constraints and runtime issues
- Lightweight deployment suitable for limited hardware
- Observability capabilities available before enabling continuous deployment
Negative
- Limited native support for advanced security analytics
- Long-term log retention is not available out of the box and requires additional configuration
- Manual tuning required to balance data retention and resource usage
These limitations are considered acceptable given the current scope and goals of the platform.
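The retention trade-off listed among the negative consequences can be approximated for metrics with the capacity rule of thumb from the Prometheus storage documentation: needed disk space ≈ retention time in seconds × ingested samples per second × bytes per sample, with roughly 1 to 2 bytes per sample after compression. The following Python sketch applies that rule and its inverse. The function names are hypothetical, and the series count, scrape interval, and 2-byte default are illustrative assumptions rather than measurements of this platform.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def estimate_metrics_disk_bytes(
    retention_days: float,
    active_series: int,
    scrape_interval_s: float,
    bytes_per_sample: float = 2.0,  # assumption: upper end of the ~1-2 B/sample guidance
) -> float:
    """Rough Prometheus TSDB size: retention_seconds * samples_per_second * bytes_per_sample."""
    # Each active series yields one sample per scrape.
    samples_per_second = active_series / scrape_interval_s
    return retention_days * SECONDS_PER_DAY * samples_per_second * bytes_per_sample


def max_retention_days(
    disk_budget_bytes: float,
    active_series: int,
    scrape_interval_s: float,
    bytes_per_sample: float = 2.0,
) -> float:
    """Invert the estimate: how many days of metrics fit in a given disk budget."""
    samples_per_second = active_series / scrape_interval_s
    return disk_budget_bytes / (samples_per_second * bytes_per_sample * SECONDS_PER_DAY)
```

For example, a hypothetical 10,000 active series scraped every 15 s and kept for 15 days (the default of Prometheus's `--storage.tsdb.retention.time` flag) comes to about 1.7 GB at 2 bytes per sample. Prometheus also accepts `--storage.tsdb.retention.size` to cap disk usage directly. The estimate covers metrics only; Loki's storage needs depend on log volume and compression and would have to be measured separately.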
Related Decisions
- ADR-0001: Decision to introduce a self-hosted server environment
- Future decisions regarding security monitoring and intrusion detection
- Future decisions regarding network segmentation and trust boundaries
- Future decisions regarding continuous deployment enablement