Observability
Platform Observability
The Archestra platform exposes Prometheus metrics and OpenTelemetry traces for monitoring system health, tracking HTTP requests, and analyzing LLM API performance.
Health Check
The endpoint http://localhost:9000/health returns basic service status:
{
"status": "Archestra Platform API",
"version": "0.0.1"
}
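In Kubernetes, this endpoint can back liveness and readiness probes. The snippet below is a minimal sketch, assuming the platform container listens on port 9000; it belongs under the container definition in your pod spec, and the timings are placeholders to tune for your deployment:

livenessProbe:
  httpGet:
    path: /health
    port: 9000
  initialDelaySeconds: 10
  periodSeconds: 15
readinessProbe:
  httpGet:
    path: /health
    port: 9000
  periodSeconds: 10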
Metrics
The endpoint http://localhost:9000/metrics exposes Prometheus-formatted metrics including:
HTTP Metrics
- http_request_duration_seconds_count - Total HTTP requests by method, route, and status
- http_request_duration_seconds_bucket - Request duration histogram buckets
- http_request_summary_seconds - Request duration summary with quantiles
LLM Metrics
- llm_request_duration_seconds - LLM API request duration by provider, agent_id, agent_name, and status code
- llm_tokens_total - Token consumption by provider, agent_id, agent_name, and type (input/output)
Process Metrics
- process_cpu_user_seconds_total - CPU time in user mode
- process_cpu_system_seconds_total - CPU time in system mode
- process_resident_memory_bytes - Physical memory usage
- process_start_time_seconds - Process start timestamp
Node.js Runtime Metrics
- nodejs_eventloop_lag_seconds - Event loop lag (latency indicator)
- nodejs_heap_size_used_bytes - V8 heap memory usage
- nodejs_heap_size_total_bytes - Total V8 heap size
- nodejs_external_memory_bytes - External memory usage
- nodejs_active_requests_total - Currently active async requests
- nodejs_active_handles_total - Active handles (file descriptors, timers)
- nodejs_gc_duration_seconds - Garbage collection timing by type
- nodejs_version_info - Node.js version information
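To see the raw exposition format, curl the endpoint directly and filter for a metric of interest. The output line below is illustrative only; actual label values depend on your traffic:

curl -s http://localhost:9000/metrics | grep http_request_duration_seconds_count

# Illustrative output:
# http_request_duration_seconds_count{method="GET",route="/health",status_code="200"} 42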
Distributed Tracing
The platform exports OpenTelemetry traces to help you understand request flows and identify performance bottlenecks. Traces can be consumed by any OTLP-compatible backend (Jaeger, Tempo, Honeycomb, Grafana Cloud, etc.).
Configuration
Configure the OpenTelemetry Collector endpoint via environment variable:
OTEL_EXPORTER_OTLP_ENDPOINT=http://your-collector:4318/v1/traces
If not specified, the platform defaults to http://localhost:4318/v1/traces.
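If you do not already run a collector, the following is a minimal sketch of an OpenTelemetry Collector configuration that accepts OTLP over HTTP on port 4318 and forwards traces to a downstream OTLP backend. The backend address otel-backend:4317 is a placeholder; point it at Jaeger, Tempo, or your vendor's endpoint:

receivers:
  otlp:
    protocols:
      http:
        endpoint: 0.0.0.0:4318

processors:
  batch: {}

exporters:
  otlp:
    endpoint: otel-backend:4317 # placeholder backend address
    tls:
      insecure: true

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [batch]
      exporters: [otlp]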
What's Traced
The platform automatically traces:
- HTTP requests - All API requests with method, route, and status code
- LLM API calls - External calls to OpenAI, Anthropic, and Gemini with dedicated spans showing exact response time
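For local experimentation you can also skip the collector and point the platform straight at a backend that speaks OTLP. As a sketch, a Jaeger all-in-one container accepts OTLP over HTTP on port 4318:

docker run --rm -d --name jaeger \
  -e COLLECTOR_OTLP_ENABLED=true \
  -p 16686:16686 \
  -p 4318:4318 \
  jaegertracing/all-in-one:latest

With this running, the default endpoint of http://localhost:4318/v1/traces works unchanged and traces appear in the Jaeger UI at http://localhost:16686.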
LLM Request Spans
Each LLM API call includes detailed attributes for filtering and analysis:
Span Attributes:
- route.category=llm-proxy - All LLM proxy requests
- llm.provider - Provider name (openai, anthropic, gemini)
- llm.model - Model name (e.g., gpt-4, claude-3-5-sonnet-20241022)
- llm.stream - Whether the request was streaming (true/false)
- route.category - The category of the route (e.g., llm-proxy, mcp-gateway, api)
- agent.id - The ID of the agent handling the request
- agent.name - The name of the agent handling the request
- agent.<label_key> - Custom agent labels (e.g., agent.environment=production, agent.team=data-science)
Span Names:
- openai.chat.completions - OpenAI chat completion calls
- anthropic.messages - Anthropic message calls
- gemini.generateContent - Gemini content generation calls
These dedicated spans show the exact duration of external LLM API calls, separate from your application's processing time.
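How you query these attributes depends on your backend. For instance, if traces land in Grafana Tempo, a TraceQL filter along these lines (the agent name here is a made-up example) isolates slow Anthropic calls for a single agent:

{ span.llm.provider = "anthropic" && span.agent.name = "support-bot" && duration > 2s }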
Setting Up Prometheus
The following instructions assume you are familiar with Grafana and Prometheus and have them already set up.
Add the following to your prometheus.yml:
scrape_configs:
  - job_name: 'archestra-backend'
    static_configs:
      - targets: ['localhost:9000'] # Platform API base URL
    scrape_interval: 15s
    metrics_path: /metrics
If you are unsure what the Platform API base URL is, check the Platform UI's Settings.
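Before reloading Prometheus, you can validate the configuration with promtool, and afterwards confirm the archestra-backend job shows as UP on Prometheus's Status -> Targets page:

promtool check config prometheus.yml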
Chart Examples
Here are some PromQL queries for Grafana charts to get you started:
HTTP Metrics
- Request rate by route:
  rate(http_request_duration_seconds_count[5m])
- Error rate by route (percentage):
  sum(rate(http_request_duration_seconds_count{status_code=~"4..|5.."}[5m])) by (route, method) / sum(rate(http_request_duration_seconds_count[5m])) by (route, method) * 100
- Response time percentiles (p95):
  histogram_quantile(0.95, rate(http_request_duration_seconds_bucket[5m]))
- Memory usage (MiB):
  process_resident_memory_bytes / 1024 / 1024
LLM Metrics
- LLM requests per second by agent and provider:
  sum(rate(llm_request_duration_seconds_count[5m])) by (agent_name, provider)
- LLM error rate by provider (percentage):
  sum(rate(llm_request_duration_seconds_count{status_code!="200"}[5m])) by (provider) / sum(rate(llm_request_duration_seconds_count[5m])) by (provider) * 100
- LLM token usage rate (tokens/sec) by agent name:
  sum(rate(llm_tokens_total[5m])) by (provider, agent_name, type)
- Total tokens by agent name over the dashboard time range:
  sum(increase(llm_tokens_total[$__range])) by (agent_name, type)
- Request duration by agent name and provider (p95):
  histogram_quantile(0.95, sum(rate(llm_request_duration_seconds_bucket[5m])) by (agent_name, provider, le))
- Error rate by agent (fraction of requests):
  sum(rate(llm_request_duration_seconds_count{status_code!~"2.."}[5m])) by (agent_name) / sum(rate(llm_request_duration_seconds_count[5m])) by (agent_name)
The screenshot below shows the request rate and duration charts, as well as the rate of LLM calls and their token usage:
