ADR-0002: Observability Stack Choice

Status

Accepted

Context

As the HomeOps Platform evolves toward hosting services on a self-hosted server environment, visibility into system behavior becomes increasingly important. Even at small scale, operating backend services and containerized applications requires insight into resource usage, service health, and runtime behavior.

The available server hardware is limited in terms of CPU and memory, and the platform prioritizes learning value, operational clarity, and maintainability over enterprise-scale observability features. As a result, heavyweight SIEM or log analytics platforms are not considered appropriate at this stage.

An observability solution is needed to support basic monitoring, logging, and alerting for self-hosted services, while remaining lightweight enough to run reliably on constrained hardware.

Decision

The HomeOps Platform will adopt a lightweight observability stack based on the following components:

  • Metrics collection using Prometheus
  • Log aggregation using Loki
  • Visualization and alerting using Grafana

This stack will be deployed in a modular manner and treated as a supporting platform service rather than as a core application component.
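Because the three components are deployed as separate modules, each can be probed independently. The following is a minimal sketch, not part of the decision itself, that polls the health endpoint each project exposes (`/-/healthy` for Prometheus, `/ready` for Loki, `/api/health` for Grafana). The `localhost` addresses and default ports are assumptions, and the names `HEALTH_ENDPOINTS`, `http_status`, and `check_stack` are hypothetical.

```python
"""Probe the health endpoints of a Prometheus / Loki / Grafana stack."""
from typing import Callable, Dict
from urllib.error import HTTPError, URLError
from urllib.request import urlopen

# Assumption: all components run locally on their default ports.
# Adjust hosts and ports to match the actual deployment.
HEALTH_ENDPOINTS: Dict[str, str] = {
    "prometheus": "http://localhost:9090/-/healthy",
    "loki": "http://localhost:3100/ready",
    "grafana": "http://localhost:3000/api/health",
}


def http_status(url: str, timeout: float = 3.0) -> int:
    """Return the HTTP status code of a GET request, or 0 if unreachable."""
    try:
        with urlopen(url, timeout=timeout) as response:
            return response.status
    except HTTPError as err:
        # The server answered, but with an error status such as 503.
        return err.code
    except (URLError, OSError):
        # Connection refused, DNS failure, timeout, and similar.
        return 0


def check_stack(fetch: Callable[[str], int] = http_status) -> Dict[str, bool]:
    """Map each component name to True when its endpoint answers 200."""
    return {name: fetch(url) == 200 for name, url in HEALTH_ENDPOINTS.items()}


if __name__ == "__main__":
    for name, healthy in check_stack().items():
        print(f"{name:<12}{'ok' if healthy else 'UNREACHABLE'}")
```

The `fetch` parameter is injectable so the check can be exercised without a running stack, which may be useful while the server environment is still being set up.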

Rationale

The selected observability stack provides the following advantages:

  • Low resource footprint compared to traditional SIEM platforms
  • Clear separation between metrics, logs, and visualization
  • Web-based user interface suitable for headless server environments
  • Strong ecosystem support and documentation
  • Compatibility with containerized workloads

The stack supports essential observability needs, such as system metrics, application logs, and basic alerting, without introducing unnecessary complexity or operational overhead. This aligns with the platform’s incremental approach to deployment and automation.

More advanced security analytics or centralized event correlation can be considered in later phases if system scale or requirements change.

Consequences

Positive

  • Improved visibility into system and application behavior
  • Early detection of resource constraints and runtime issues
  • Lightweight deployment suitable for limited hardware
  • Observability capabilities available before enabling continuous deployment

Negative

  • Limited native support for advanced security analytics
  • Long-term log retention requires additional configuration
  • Manual tuning required to balance data retention and resource usage

These limitations are considered acceptable given the current scope and goals of the platform.
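The trade-off between data retention and resource usage can be estimated before any tuning is done. The sketch below applies the capacity rule of thumb from the Prometheus storage documentation (disk space ≈ retention time in seconds × ingested samples per second × bytes per sample, with roughly 1 to 2 bytes per sample). It covers Prometheus metrics only, not Loki logs. The function names and the example figures (5,000 series, a 15-second scrape interval, a 10 GB budget) are hypothetical and not taken from this platform.

```python
"""Rough Prometheus disk sizing based on the documented rule of thumb."""

SECONDS_PER_DAY = 24 * 60 * 60


def samples_per_second(active_series: int, scrape_interval_s: float) -> float:
    """Each active series yields one sample per scrape interval."""
    return active_series / scrape_interval_s


def disk_bytes(retention_days: float, ingest_rate: float,
               bytes_per_sample: float = 2.0) -> float:
    """Estimated disk usage for a given retention period.

    bytes_per_sample defaults to 2.0, the pessimistic end of the
    1-2 bytes per sample quoted in the Prometheus documentation.
    """
    return retention_days * SECONDS_PER_DAY * ingest_rate * bytes_per_sample


def max_retention_days(disk_budget_bytes: float, ingest_rate: float,
                       bytes_per_sample: float = 2.0) -> float:
    """Longest retention period that fits inside a fixed disk budget."""
    return disk_budget_bytes / (ingest_rate * bytes_per_sample * SECONDS_PER_DAY)


if __name__ == "__main__":
    # Hypothetical workload: 5,000 active series scraped every 15 seconds.
    rate = samples_per_second(active_series=5_000, scrape_interval_s=15)
    usage_gb = disk_bytes(retention_days=15, ingest_rate=rate) / 1e9
    print(f"ingest rate: {rate:.0f} samples/s")
    print(f"15-day retention: about {usage_gb:.2f} GB")
    print(f"10 GB budget: about {max_retention_days(10e9, rate):.0f} days")
```

The resulting retention period would then be applied with the Prometheus `--storage.tsdb.retention.time` flag. Actual usage depends on how well the series compress, so the estimate is best treated as a starting point for the manual tuning mentioned above.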

Related Decisions

  • ADR-0001: Decision to introduce a self-hosted server environment
  • Future decisions regarding security monitoring and intrusion detection
  • Future decisions regarding network segmentation and trust boundaries
  • Future decisions regarding continuous deployment enablement