Health and Metrics

Ash exposes health and metrics endpoints for monitoring, alerting, and integration with orchestration systems. Neither endpoint requires authentication.

Health Check

GET /health

Returns the server's current status, active session and sandbox counts, uptime, and detailed sandbox pool statistics.

Request

No request body. No authentication required.

Response

200 OK

{
  "status": "ok",
  "activeSessions": 3,
  "activeSandboxes": 5,
  "uptime": 86400,
  "pool": {
    "total": 5,
    "cold": 0,
    "warming": 1,
    "warm": 1,
    "waiting": 2,
    "running": 1,
    "maxCapacity": 1000,
    "resumeWarmHits": 42,
    "resumeColdHits": 7
  }
}

Field	Type	Description
`status`	string	Always `"ok"` if the server is reachable
`activeSessions`	integer	Number of sessions in `active` status
`activeSandboxes`	integer	Number of live sandbox processes
`uptime`	integer	Seconds since server start
`pool`	PoolStats	Sandbox pool breakdown

Pool Stats

The pool object provides a detailed view of sandbox states:

Field	Type	Description
`total`	integer	Total sandboxes in the pool
`cold`	integer	Sandboxes not yet started
`warming`	integer	Sandboxes currently starting up
`warm`	integer	Sandboxes ready but not assigned to a session
`waiting`	integer	Sandboxes assigned to a session, idle between messages
`running`	integer	Sandboxes actively processing a message
`maxCapacity`	integer	Maximum number of sandboxes allowed (configured via `ASH_MAX_SANDBOXES`)
`resumeWarmHits`	integer	Total warm resumes (sandbox was still alive)
`resumeColdHits`	integer	Total cold resumes (new sandbox created, state restored)

Prometheus Metrics

GET /metrics

Returns metrics in Prometheus text exposition format. No authentication required.

Request

No request body.

Response

200 OK with Content-Type: text/plain; version=0.0.4; charset=utf-8

# HELP ash_up Whether the Ash server is up (always 1 if reachable).
# TYPE ash_up gauge
ash_up 1

# HELP ash_uptime_seconds Seconds since server start.
# TYPE ash_uptime_seconds gauge
ash_uptime_seconds 86400

# HELP ash_active_sessions Number of active sessions.
# TYPE ash_active_sessions gauge
ash_active_sessions 3

# HELP ash_active_sandboxes Number of live sandbox processes.
# TYPE ash_active_sandboxes gauge
ash_active_sandboxes 5

# HELP ash_pool_sandboxes Sandbox count by state.
# TYPE ash_pool_sandboxes gauge
ash_pool_sandboxes{state="cold"} 0
ash_pool_sandboxes{state="warming"} 1
ash_pool_sandboxes{state="warm"} 1
ash_pool_sandboxes{state="waiting"} 2
ash_pool_sandboxes{state="running"} 1

# HELP ash_pool_max_capacity Maximum sandbox capacity.
# TYPE ash_pool_max_capacity gauge
ash_pool_max_capacity 1000

# HELP ash_resume_total Total session resumes by path (warm=sandbox alive, cold=new sandbox).
# TYPE ash_resume_total counter
ash_resume_total{path="warm"} 42
ash_resume_total{path="cold"} 7

Metric Reference

Metric	Type	Labels	Description
`ash_up`	gauge	--	Always `1` if the server is reachable
`ash_uptime_seconds`	gauge	--	Seconds since server process started
`ash_active_sessions`	gauge	--	Number of sessions in `active` status
`ash_active_sandboxes`	gauge	--	Number of live sandbox processes
`ash_pool_sandboxes`	gauge	`state`	Sandbox count broken down by state: `cold`, `warming`, `warm`, `waiting`, `running`
`ash_pool_max_capacity`	gauge	--	Configured maximum sandbox capacity
`ash_resume_total`	counter	`path`	Cumulative session resume count by path: `warm` (sandbox still alive) or `cold` (new sandbox created)

Prometheus Configuration

Add the following scrape config to your prometheus.yml:

scrape_configs:
  - job_name: 'ash'
    scrape_interval: 15s
    static_configs:
      - targets: ['localhost:4100']
    metrics_path: '/metrics'

Kubernetes Probes

The /health endpoint is suitable for both liveness and readiness probes:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: ash-server
spec:
  template:
    spec:
      containers:
        - name: ash
          livenessProbe:
            httpGet:
              path: /health
              port: 4100
            initialDelaySeconds: 5
            periodSeconds: 10
          readinessProbe:
            httpGet:
              path: /health
              port: 4100
            initialDelaySeconds: 5
            periodSeconds: 5

The liveness probe verifies the server process is responsive. The readiness probe can be used to gate traffic until the server has completed initialization. Both return 200 with {"status": "ok", ...} when the server is healthy.

Health Check​

Request​

Response​

Pool Stats​

Prometheus Metrics​

Request​

Response​

Metric Reference​

Prometheus Configuration​

Kubernetes Probes​

Health Check

Request

Response

Pool Stats

Prometheus Metrics

Request

Response

Metric Reference

Prometheus Configuration

Kubernetes Probes