Single LLM Call Tracing: input/output capture per call (sketch below)

🔵 Dynatrace (GA)
- Full prompt/response logging via OpenLLMetry & OTel GenAI conventions
- Token usage, latency, and error capture per call
- Grail data lakehouse stores all call data

🟡 Elastic (GA)
- OTLP tracing via EDOT (Python, Java, Node.js)
- Integrates LangTrace, OpenLIT, OpenLLMetry
- Captures model used, duration, errors, tokens, prompt/response

🟠 Cisco/Splunk (GA)
- LLM service traces via Splunk APM with OTel
- AI Interactions tab in trace view
- AI Events tab for parsed LLM response quality logs

🟣 Datadog (GA)
- Auto-instruments OpenAI, LangChain, Bedrock, Anthropic
- Latency, token usage, and error capture without code changes
- Correlated alongside APM data

🟢 New Relic (GA)
- AI Monitoring with auto-instrumentation for Python & Node.js
- Correlates LLM call data with backend service traces

🔴 LangSmith (GA)
- "Run" is a native core primitive
- Captures full prompt context, tool availability, and decision state per step
- Enables single-step isolation for debugging

⚪ Arize (GA)
- LLM call tracing with embedding-level visibility
- Drift detection on LLM output distributions
- Arize Phoenix: OTel-native, open-source option
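To make this row concrete: a minimal, vendor-neutral sketch of what single-call tracing emits, using the OpenTelemetry Python SDK. The `gen_ai.*` attribute names follow the still-incubating OTel GenAI semantic conventions; the response and token counts are stand-ins for a real provider call.

```python
# Hedged sketch, not any vendor's SDK: one span per LLM call carrying
# model, token usage, and (via span timing/status) latency and errors.
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import ConsoleSpanExporter, SimpleSpanProcessor

provider = TracerProvider()
provider.add_span_processor(SimpleSpanProcessor(ConsoleSpanExporter()))
trace.set_tracer_provider(provider)
tracer = trace.get_tracer("llm-call-demo")

def traced_chat(model: str, prompt: str) -> str:
    with tracer.start_as_current_span(f"chat {model}") as span:
        span.set_attribute("gen_ai.operation.name", "chat")
        span.set_attribute("gen_ai.request.model", model)
        response = "stub response"  # replace with a real provider call
        # In practice, token counts come back on the provider response.
        span.set_attribute("gen_ai.usage.input_tokens", 12)
        span.set_attribute("gen_ai.usage.output_tokens", 48)
        return response

traced_chat("gpt-4o-mini", "Summarize the Q3 incident report.")
```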
Tool Call Visibility: which tools the agent invoked, with what arguments (sketch below)

🔵 Dynatrace (GA)
- Tool invocations tracked via agentic framework instrumentation
- Supports MCP protocol monitoring
- A2A (agent-to-agent) communication tracing

🟡 Elastic (GA)
- LangChain tool call tracing via EDOT
- Agentic workflow tracing captures tool interactions

🟠 Cisco/Splunk (GA)
- Tool call spans with runtime & memory details
- Execution paths for agent workflows in AI Agent Monitoring

🟣 Datadog (GA)
- Tool call tracing integrated with LLM spans
- Evaluates tool selection quality

🟢 New Relic (Preview)
- Agent Monitoring release targets multi-agent tool visibility
- Tool invocation data within trace view

🔴 LangSmith (GA)
- Every tool call captured with arguments, results, timing
- Used natively in single-step evaluations

⚪ Arize (GA)
- Tool selection quality as a scored evaluation metric
- Arize AX tracks tool usage patterns
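A sketch of the span shape behind these cells, assuming a tracer is configured as in the previous sketch: each tool invocation becomes a child span recording name, JSON-encoded arguments, duration, and a truncated result. The `tool.*` attribute names are illustrative, not a published schema.

```python
import json
import time

from opentelemetry import trace

tracer = trace.get_tracer("tool-call-demo")

def invoke_tool(tool_name, tool_fn, **kwargs):
    # The child span nests under whatever agent/LLM span is currently active.
    with tracer.start_as_current_span(f"tool {tool_name}") as span:
        span.set_attribute("tool.name", tool_name)
        span.set_attribute("tool.arguments", json.dumps(kwargs))
        start = time.perf_counter()
        result = tool_fn(**kwargs)
        span.set_attribute("tool.duration_ms", (time.perf_counter() - start) * 1000)
        span.set_attribute("tool.result", str(result)[:500])  # cap large outputs
        return result

invoke_tool("get_weather", lambda city: f"22 °C in {city}", city="Berlin")
```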
Cost & Token Monitoring: token usage, cost-per-request tracking (sketch below)

🔵 Dynatrace (GA)
- Token usage, service fees, resource cost monitoring
- Intelligent detection for cost spikes and usage changes
- A/B model comparison for cost decisions

🟡 Elastic (GA)
- Pre-built dashboards: total invocations & tokens per model/endpoint
- PTU (provisioned throughput units) tracking
- Billing cost visualization for Azure OpenAI, Bedrock

🟠 Cisco/Splunk (GA)
- Token consumption & request volume in AI Agent Monitoring dashboard
- AI Infrastructure Monitoring for GPU/compute cost
- LLM cost management aligned to business goals

🟣 Datadog (GA)
- Per-request token cost tracking and aggregation
- Cost dashboards correlated to model/deployment version

🟢 New Relic (GA)
- Token and cost tracking in AI Monitoring
- Cost metrics tied to model and workload type

🔴 LangSmith (GA)
- Token usage and latency per run and trace
- Cost aggregated per thread/dataset

⚪ Arize (GA)
- Token cost monitoring with model comparison
- Cost-per-query tracking for production agents
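Cost tracking in every column reduces to the same arithmetic: token counts from each response multiplied by a per-model price table, then aggregated per request, thread, or endpoint. A minimal sketch; the model names and prices are placeholders, not current list prices.

```python
# USD per million tokens, as (input, output); placeholder values.
PRICE_PER_MTOK = {
    "small-model": (0.15, 0.60),
    "large-model": (3.00, 15.00),
}

def call_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    p_in, p_out = PRICE_PER_MTOK[model]
    return (input_tokens * p_in + output_tokens * p_out) / 1_000_000

# Aggregate per thread/session, as the dashboards above do per model/endpoint.
calls = [("small-model", 1_200, 350), ("small-model", 800, 90)]
print(f"thread cost: ${sum(call_cost(m, i, o) for m, i, o in calls):.6f}")
```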
End-to-End Agent Trace: multi-step trajectory from input to final output (sketch below)

🔵 Dynatrace (GA)
- End-to-end traces from user request through LLM → orchestration → tools
- Nested structure across all AI stack layers
- Supports LangChain, LlamaIndex, Amazon Bedrock, Strands SDK

🟡 Elastic (GA)
- LangChain request tracing with full execution path
- APM trace view with dependency mapping
- Covers frontend → backend → LLM chain

🟠 Cisco/Splunk (Preview, Alpha)
- Agent Conversations & AI Trace Views (Alpha today, GA targeted for Q1 2026)
- Trace view: span details, tool call runtime, agent workflow paths
- Integrated APM + AI Agent Monitoring for full-stack trace

🟣 Datadog (GA)
- LLM traces alongside existing APM data
- Google ADK integration for agent trace visualization
- Trace correlates LLM calls with DB queries and infra metrics

🟢 New Relic (GA)
- 2025 Agentic AI Monitoring: multi-agent systems visibility
- Full-stack trace correlating AI calls with infra

🔴 LangSmith (GA)
- Native "Trace" primitive: a complete multi-step agent execution
- Nested run structure with parent-child relationships
- Handles 100 MB+ traces for long-horizon agents

⚪ Arize (GA)
- End-to-end LLM + agent tracing via Arize Phoenix (OTel-based)
- Trace visualization with step-by-step breakdown
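The "nested structure" these cells describe is plain span nesting: one root span per user request, one child span per step. A minimal sketch with stubbed steps, reusing the tracer setup from the first sketch.

```python
from opentelemetry import trace

tracer = trace.get_tracer("agent-trace-demo")

def handle_request(question: str) -> str:
    with tracer.start_as_current_span("agent.run"):        # root: full trajectory
        with tracer.start_as_current_span("llm.plan"):     # step 1: pick a tool
            plan = "search"
        with tracer.start_as_current_span(f"tool.{plan}"): # step 2: tool call
            docs = ["doc-1", "doc-2"]
        with tracer.start_as_current_span("llm.answer"):   # step 3: final synthesis
            return f"answer grounded in {len(docs)} docs"

handle_request("What changed in the last deploy?")
```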
Topology & Dependency Mapping: how agents, tools, and services relate to each other

🔵 Dynatrace (GA)
- Smartscape real-time dependency graph includes AI agent nodes
- Agentic Topology View (roadmap: Smartscape-grade for agent flows)
- Maps agent-to-agent, agent-to-tool, agent-to-service relationships

🟡 Elastic (GA)
- APM service map includes AI/LLM services
- Dependency isolation for bottleneck detection

🟠 Cisco/Splunk (GA)
- Enhanced flowmaps for AI agent topology
- Service-to-AI dependency visualization in AppDynamics

🟣 Datadog (GA)
- Agent service maps within LLM Observability
- Google ADK integration maps agent decision graphs

🟢 New Relic (Preview)
- Service maps extended to show interconnected agent relationships

🔴 LangSmith (GA)
- Trace hierarchy shows nested agent/tool relationships
- Thread view groups traces by session

⚪ Arize (GA)
- Visual trace explorer with agent flow graphs
- Embedding cluster maps for semantic drift
RAG / Retrieval Observability: vector DB, retrieval quality, context grounding (sketch below)

🔵 Dynatrace (GA)
- Vector DB monitoring: Milvus, Weaviate, Chroma
- Semantic cache tracking
- RAG pipeline instrumentation via LangChain/LlamaIndex

🟡 Elastic (GA)
- Integrates with RAG orchestration frameworks
- Prompt/response logging for hallucination detection
- Document transparency in context dashboards

🟠 Cisco/Splunk (GA)
- Vector DB dashboards: Milvus, Pinecone in AI Infra Monitoring
- Document reliability classification (green/yellow/red)
- Retrieval-to-generation trace for RAG pipelines

🟣 Datadog (GA)
- LangChain + LlamaIndex auto-instrumentation for RAG
- Context relevance and groundedness as evaluation metrics

🟢 New Relic (GA)
- LLM Monitoring includes retrieval pipeline tracing
- RAG context and source tracking in AI Monitoring

🔴 LangSmith (GA)
- Full LangChain/LangGraph instrumentation includes retrieval steps
- Each retrieval documented as a child run within the trace

⚪ Arize (GA)
- TruLens integration for RAG-specific metrics
- Context relevance, groundedness, answer relevance scoring
- Purpose-built hallucination detection
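What retrieval observability means at the instrumentation level, as a hedged sketch: record which chunks a query retrieved and their similarity scores on a span, so retrieval quality can be judged after the fact. The vector store is stubbed and the `rag.*` attribute names are illustrative.

```python
from opentelemetry import trace

tracer = trace.get_tracer("rag-demo")

def retrieve(query: str, k: int = 3):
    with tracer.start_as_current_span("rag.retrieve") as span:
        span.set_attribute("rag.query", query)
        # Stand-in for a Milvus/Pinecone/Chroma similarity search.
        hits = [("chunk-17", 0.91), ("chunk-04", 0.83), ("chunk-56", 0.42)][:k]
        span.set_attribute("rag.document_ids", [doc_id for doc_id, _ in hits])
        span.set_attribute("rag.scores", [score for _, score in hits])
        # A low top score is a cheap leading indicator of ungrounded answers.
        span.set_attribute("rag.top_score", hits[0][1] if hits else 0.0)
        return hits

retrieve("refund policy for enterprise plans")
```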
Guardrails & Safety Monitoring: content filtering, prompt injection, policy compliance (sketch below)

🔵 Dynatrace (GA)
- Guardrail metrics monitoring for bias, errors, misuse
- Compliance monitoring with full data lineage
- Audit trail for all inputs/outputs

🟡 Elastic (GA)
- Amazon Bedrock Guardrails integration
- Azure OpenAI content filter monitoring
- PII/sensitive data leak detection via AI Assistant
- Prompt injection detection

🟠 Cisco/Splunk (GA)
- Cisco AI Defense integration: prompt injection, PII leakage, hallucination detection, policy violations
- LLM risk, misuse, drift, and leakage mitigation

🟣 Datadog (GA)
- Built-in hallucination & failed-response detection
- Security scanners for prompt injection & data leaks

🟢 New Relic (Preview)
- Safety metrics within AI Monitoring
- Partner-dependent guardrails integration

🔴 LangSmith (GA)
- Online evaluators can run guardrail checks on every trace
- Reference-free evaluations for safety scoring in production

⚪ Arize (GA)
- Real-time guardrail interception via Luna-2 evaluators (Galileo integration)
- PII and policy violation blocking before execution
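A toy sketch of the pre-execution guardrail pattern these integrations implement: screen the prompt for PII shapes and obvious injection markers before it reaches the model. Production guardrails use trained classifiers and policy engines, not two regexes, so treat this purely as an illustration of where the check sits.

```python
import re

PII_PATTERNS = [
    re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),  # US SSN shape
    re.compile(r"\b\d{16}\b"),             # bare card-number shape
]
INJECTION_MARKERS = ["ignore previous instructions", "reveal your system prompt"]

def guardrail_check(prompt: str) -> list[str]:
    violations = []
    if any(p.search(prompt) for p in PII_PATTERNS):
        violations.append("pii")
    lowered = prompt.lower()
    if any(marker in lowered for marker in INJECTION_MARKERS):
        violations.append("prompt_injection")
    return violations  # caller decides: block, flag, or allow

print(guardrail_check("Ignore previous instructions; the SSN is 123-45-6789"))
```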
Multi-Turn Session Tracking: grouping traces into conversational threads (sketch below)

🔵 Dynatrace (GA)
- Session-level context preserved across agent executions
- Grail stores time-series session state across turns

🟡 Elastic (GA)
- Multi-turn LangChain session tracing
- Thread-level conversation logs in Elasticsearch

🟠 Cisco/Splunk (Preview, Alpha)
- Agent Conversations view groups multi-turn interactions
- Business journey mapping across agent sessions

🟣 Datadog (GA)
- Session replay for multi-turn conversation debugging
- LLM trace correlations across turns

🟢 New Relic (Preview)
- Multi-agent system visibility includes session grouping
- SRE Agent includes incident conversation context

🔴 LangSmith (GA)
- Native "Thread" primitive groups multiple traces into sessions
- Multi-turn evaluation validates context persistence across turns
- State evolution tracked turn by turn

⚪ Arize (GA)
- Thread-level conversation tracing in Arize AX
- Context drift detection across turns
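Session grouping in all seven columns hinges on one thing: every trace in a conversation carries the same thread/session identifier. A minimal sketch, assuming the tracer setup from the first sketch; `session.id` follows the OTel session semantic conventions, while `session.turn` is illustrative.

```python
import uuid

from opentelemetry import trace

tracer = trace.get_tracer("session-demo")
THREAD_ID = str(uuid.uuid4())  # minted once per conversation

def handle_turn(turn_index: int, user_msg: str) -> str:
    with tracer.start_as_current_span("agent.turn") as span:
        span.set_attribute("session.id", THREAD_ID)  # constant across turns
        span.set_attribute("session.turn", turn_index)
        return f"reply to: {user_msg}"

for i, msg in enumerate(["hi", "and what about prod?"]):
    handle_turn(i, msg)
```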
State & Memory Tracking: how agent memory and artifacts change across turns (sketch below)

🔵 Dynatrace (GA)
- Agent state captured via the Grail unified lakehouse
- Continuous context mapping via Smartscape

🟡 Elastic (Preview)
- State stored in Elasticsearch; queryable across sessions
- No dedicated agent memory diff view yet

🟠 Cisco/Splunk (Preview)
- Agent state changes tracked within the conversation view
- AppDynamics: business journey mapping captures state context

🟣 Datadog (Preview)
- State changes viewable through trace spans
- LLM Experiments for testing prompt/state changes against production

🟢 New Relic (Roadmap)
- Announced as part of AI agent monitoring expansion

🔴 LangSmith (GA)
- State changes (file writes, memory updates) tracked as part of full-turn evaluation
- Artifacts and memory files inspectable per thread turn

⚪ Arize (GA)
- Session state monitoring and semantic memory drift detection
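The memory "diff view" the Elastic cell notes is still missing can be approximated anywhere: snapshot agent memory each turn and diff the snapshots. A hedged, vendor-free sketch:

```python
def diff_state(before: dict, after: dict) -> dict:
    """Return {key: (old, new)} for every key a turn added, changed, or removed."""
    changed = {k: (before.get(k), v) for k, v in after.items() if before.get(k) != v}
    removed = {k: (v, None) for k, v in before.items() if k not in after}
    return {**changed, **removed}

turn_1 = {"scratchpad": "", "open_files": []}
turn_2 = {"scratchpad": "user wants a refund", "open_files": ["policy.md"]}
print(diff_state(turn_1, turn_2))
# {'scratchpad': ('', 'user wants a refund'), 'open_files': ([], ['policy.md'])}
```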
Single-Step Evaluation: did the agent make the right decision at a specific step? (sketch below)

🔵 Dynatrace (GA)
- Regression tests per model call
- LLM-as-judge scoring integrated (planned: full prompt lifecycle)
- Step-level anomaly detection via Davis AI

🟡 Elastic (GA)
- LLM response evaluation via AI Assist
- Prompt/response sampling for quality review

🟠 Cisco/Splunk (Preview)
- Quality Evaluations (Alpha in Observability Cloud)
- AGNTCY Metric Compute Engine: relevance and hallucination scoring per step
- LLM-as-judge evaluators in AI Agent Monitoring

🟣 Datadog (GA)
- Built-in hallucination & quality evaluations per trace span
- Tool selection quality as a metric
- LLM Experiments tests prompt changes vs. production

🟢 New Relic (Preview)
- Business impact analysis for AI app decisions
- AI Monitoring includes decision quality tracking

🔴 LangSmith (GA)
- Core single-step eval workflow: set state → run one step → assert decision
- Production run states extractable as offline test cases

⚪ Arize (GA)
- Per-call scoring with custom LLM-as-judge or human feedback
- Evaluations on runs with built-in metrics (correctness, relevance)
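The LangSmith cell names the canonical workflow, and it generalizes to any stack: reconstruct the agent's state at one step, run only that step, and assert on the decision. A sketch where `choose_action` is a hypothetical stand-in for "the LLM picks the next tool given context".

```python
def choose_action(state: dict) -> str:
    # Hypothetical policy: in reality this is the LLM + prompt under test.
    return "search_docs" if state["question_type"] == "factual" else "respond"

def test_factual_questions_route_to_search():
    # State as extracted from a production trace at the step under test.
    state = {"question_type": "factual", "tools": ["search_docs", "respond"]}
    assert choose_action(state) == "search_docs"

test_factual_questions_route_to_search()
```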
Full-Turn (Trajectory) Evaluation: did the agent execute the full task correctly end-to-end? (sketch below)

🔵 Dynatrace (GA)
- End-to-end trace evaluation via Davis causal analysis
- Trajectory anomaly detection (tool call sequences)
- A/B model comparisons for trajectory efficiency

🟡 Elastic (GA)
- APM trace-level analysis of LangChain agent flows
- Error and bottleneck identification across the full trajectory

🟠 Cisco/Splunk (GA)
- Agent Scorecard (Alpha) for end-to-end performance
- Trajectory checks via AI Agent Monitoring dashboards
- Error rate + performance tracking across full runs

🟣 Datadog (GA)
- Full trace evaluation with latency, error, and quality scoring
- LLM Experiments for offline trajectory testing

🟢 New Relic (GA)
- Agentic AI Monitoring with end-to-end flow assessment
- Business impact analysis per agent execution

🔴 LangSmith (GA)
- Full-turn evaluation on traces: trajectory, final response, and state-change assertions
- The easiest granularity to build evaluations against

⚪ Arize (GA)
- Trace-level scoring with hallucination, context adherence, tool selection
- Dataset-based offline evaluation workflows
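Full-turn evaluation asserts on the whole run at once: the ordered tool-call trajectory plus the final answer. A sketch with a stubbed agent; a real harness would pull both from the trace.

```python
def run_agent(question: str):
    # Stub returning (final_answer, ordered tool-call trajectory).
    return "Paris", ["search_docs", "respond"]

def test_full_turn():
    answer, trajectory = run_agent("What is the capital of France?")
    assert trajectory == ["search_docs", "respond"]  # no redundant loops
    assert "Paris" in answer                         # final response correct

test_full_turn()
```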
Multi-Turn Evaluation: does the agent maintain context correctly over a full session? (sketch below)

🔵 Dynatrace (Preview)
- Agentic Topology View (roadmap) targets multi-turn context visualization
- Davis AI correlates anomalies spanning multiple interactions

🟡 Elastic (Preview)
- Multi-turn session logs queryable via Elasticsearch
- No native automated multi-turn eval framework yet

🟠 Cisco/Splunk (Preview, Alpha)
- Agent Conversations view supports multi-turn context review
- AGNTCY metrics propagate across multi-turn sessions

🟣 Datadog (Preview)
- Session replay enables multi-turn inspection
- Cross-turn correlation in LLM Observability

🟢 New Relic (Roadmap)
- Announced multi-agent & session-based monitoring expansion

🔴 LangSmith (GA)
- Native "Thread" evaluation validates context persistence across turns
- Conditional eval logic per turn keeps tests on-rails

⚪ Arize (GA)
- Thread-level semantic drift detection
- Multi-turn session evaluations with per-turn scoring
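Multi-turn evaluation replays a scripted conversation and checks, turn by turn, that earlier context still shapes later answers. A toy sketch; `agent_reply` fakes context carry-over by scanning the session history.

```python
def agent_reply(history: list[str], user_msg: str) -> str:
    # Fake context persistence: reuse any capitalized place name seen earlier.
    city = next(
        (w for m in history for w in m.split() if w.istitle() and len(w) > 1),
        "somewhere",
    )
    return f"{user_msg} -> forecast for {city}"

history: list[str] = ["I live in Berlin"]
reply = agent_reply(history, "What's the weather like?")
assert "Berlin" in reply  # turn 2 must still remember turn 1
print(reply)
```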
Online (Production) Evaluation: continuous quality checks on live agent traffic (sketch below)

🔵 Dynatrace (GA)
- Davis AI continuously evaluates production behavior
- Intelligent anomaly detection on every trace ingested
- Real-time cost, latency, and quality alerting

🟡 Elastic (GA)
- Real-time dashboards for prompt/response quality
- Guardrail alerting on production traffic
- Anomaly detection via Elastic ML

🟠 Cisco/Splunk (GA)
- AGNTCY quality metrics as streaming telemetry
- Real-time prompt injection, drift, and PII leakage alerts via Cisco AI Defense
- AI Troubleshooting Agent auto-correlates MELT signals

🟣 Datadog (GA)
- Continuous hallucination & injection detection on all production traces
- Watchdog AI anomaly detection on LLM metrics

🟢 New Relic (GA)
- Real-time AI workload monitoring with SRE Agent analysis
- AI Monitoring ingests and scores production traces

🔴 LangSmith (GA)
- Online evaluators run as traces are ingested
- Reference-free evaluators (no ground truth needed)
- Trajectory flags, efficiency monitoring, quality scoring in production

⚪ Arize (GA)
- Real-time guardrail scoring with sub-200 ms latency (Luna-2)
- Continuous production monitoring with LLM-as-judge
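The shared pattern behind online evaluation: sample a slice of live traffic and score each response with a reference-free check, alerting on low scores. A sketch where `llm_judge` is a hypothetical hook for a judge-model call.

```python
import random

def llm_judge(question: str, answer: str) -> float:
    # Placeholder for a judge-model call returning a 0-1 quality score.
    return 0.0 if not answer.strip() else 0.9

def maybe_score(trace: dict, sample_rate: float = 0.1):
    if random.random() > sample_rate:  # sampling keeps judge cost bounded
        return None
    score = llm_judge(trace["input"], trace["output"])
    if score < 0.5:  # alert path, analogous to the quality alerting above
        print(f"low-quality trace {trace['id']}: {score:.2f}")
    return score

maybe_score({"id": "t-123", "input": "q", "output": "a"}, sample_rate=1.0)
```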
Offline Evaluation / Datasets: building test suites from production traces; pre-deployment testing (sketch below)

🔵 Dynatrace (GA)
- Holdout evaluation sets for model drift comparison
- Custom regression tests per model version

🟡 Elastic (Preview)
- Evaluation via LangTrace/OpenLIT integrations
- No native dataset management for offline eval

🟠 Cisco/Splunk (Preview, Alpha)
- Quality Evaluations Alpha supports test set creation from traces
- AppDynamics: compliance-focused offline evaluation

🟣 Datadog (GA)
- LLM Experiments: test prompt changes vs. production baseline
- Offline evaluation integrated with trace replay

🟢 New Relic (Roadmap)
- No-code agent builder will support offline evaluation flows

🔴 LangSmith (GA)
- Production traces → datasets (automated pipeline)
- Run offline evals on commit or pre-deployment
- Prompt caching avoids redundant model calls during eval

⚪ Arize (GA)
- Dataset management for offline evaluations
- Experiment tracking (Arize AX) with version comparison
- Human annotation workflows for ground-truth labeling
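The traces-to-dataset pipeline, reduced to its essence: snapshot vetted production traces as (input, reference output) pairs, then replay a candidate agent against them before deploying. All names here are illustrative.

```python
def traces_to_dataset(traces):
    # Keep only traces a human (or an online evaluator) marked as good.
    return [(t["input"], t["output"]) for t in traces if t.get("label") == "good"]

def run_offline_eval(agent_fn, dataset) -> float:
    passed = sum(agent_fn(question) == reference for question, reference in dataset)
    return passed / len(dataset)

traces = [
    {"input": "2+2?", "output": "4", "label": "good"},
    {"input": "Capital of France?", "output": "Paris", "label": "good"},
    {"input": "asdf", "output": "?", "label": "bad"},
]
dataset = traces_to_dataset(traces)
candidate = lambda q: {"2+2?": "4"}.get(q, "?")
print(run_offline_eval(candidate, dataset))  # 0.5: regression on the France case
```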
Ad-Hoc Insights / AI-Assisted Analysis: querying traces at scale; pattern discovery; LLM-as-judge

🔵 Dynatrace (GA)
- Davis AI + Dynatrace Intelligence: causal root-cause analysis
- Natural language querying via DQL / notebooks
- Agentic ops system fuses deterministic and agentic AI reasoning

🟡 Elastic (GA)
- Elastic AI Assistant for anomaly investigation
- ES|QL queries across trace data at scale

🟠 Cisco/Splunk (GA)
- AI Troubleshooting Agent: correlates MELT, surfaces root cause, generates remediation plans
- Splunk MCP Server: query Observability Cloud via AI agents/LLMs
- Splunk platform for ad-hoc log querying at scale

🟣 Datadog (GA)
- Watchdog AI for pattern discovery across LLM metrics
- Dashboards + analytics for failure-mode identification

🟢 New Relic (GA)
- SRE Agent for conversational incident investigation
- AI-assisted root-cause analysis in the observability platform

🔴 LangSmith (GA)
- Insights Agent: AI-assisted analysis of large trace datasets
- Query threads to surface failure patterns, inefficiencies, and decision explanations

⚪ Arize (GA)
- Cluster analysis on embeddings for behavioral pattern discovery
- Natural language querying on trace data
OpenTelemetry & Framework Support (config sketch below)

🔵 Dynatrace (GA)
- OTel + OpenLLMetry (20+ AI/agent frameworks)
- Amazon Bedrock, Azure AI Foundry, Strands, AgentCore, Vertex AI, OpenAI, Gemini, DeepSeek, NVIDIA NIM, MCP protocol

🟡 Elastic (GA)
- EDOT (Elastic Distributions of OpenTelemetry) for Python, Java, Node.js
- Amazon Bedrock, Azure OpenAI, Azure AI Foundry, Google Vertex AI, OpenAI
- LangTrace, OpenLIT, OpenLLMetry as third-party options

🟠 Cisco/Splunk (GA)
- Major OTel contributor; AGNTCY donated to the Linux Foundation
- LangChain, OpenAI, AWS Bedrock, GCP Vertex AI, NVIDIA NIMs, LiteLLM, Milvus, Pinecone

🟣 Datadog (GA)
- OpenAI, LangChain, AWS Bedrock, Anthropic, LlamaIndex, Google ADK
- ddtrace SDK auto-instrumentation

🟢 New Relic (GA)
- OTel-native with Pixie for Kubernetes
- Python & Node.js LLM auto-instrumentation
- MCP server integrations via partner agents

🔴 LangSmith (GA)
- Purpose-built for LangChain/LangGraph (single env var setup)
- Supports 50+ frameworks via SDK
- OTel export for piping into other observability stacks

⚪ Arize (GA)
- Arize Phoenix: fully OTel-native, open source
- OpenAI, LangChain, LlamaIndex, Bedrock, CrewAI, AutoGen
- Interoperates with Datadog, Honeycomb, Grafana via OpenLLMetry
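Regardless of vendor, pointing generic OTel instrumentation at a backend usually comes down to the standard OTLP environment variables. A sketch with placeholder endpoint and token; the variable names come from the OpenTelemetry specification, and per-vendor headers will differ.

```python
import os

# Placeholder values; real endpoints/tokens come from the chosen backend.
os.environ["OTEL_SERVICE_NAME"] = "my-agent"
os.environ["OTEL_EXPORTER_OTLP_ENDPOINT"] = "https://collector.example.com:4318"
os.environ["OTEL_EXPORTER_OTLP_HEADERS"] = "authorization=Bearer%20<token>"

# Then launch under auto-instrumentation, e.g.:
#   opentelemetry-instrument python agent.py
```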
Key Differentiator / Unique Strength

🔵 Dynatrace: Causal AI + deterministic agents. Davis AI provides causal root-cause analysis grounded in real-time Smartscape topology, and Dynatrace Intelligence fuses deterministic and agentic AI for trusted autonomous operations; the company claims 12x better problem resolution than pure LLM agents.

🟡 Elastic: Search + observability + security unified. Elastic combines LLM observability, security (SIEM), and search in one platform, with a strong OTel ecosystem via EDOT. Named a Leader in the 2025 Gartner Magic Quadrant for Observability Platforms.

🟠 Cisco/Splunk: AI Defense + AGNTCY standards. A network/security heritage unique in this group: the Cisco AI Defense integration enables AI risk detection at the infrastructure level. Strong OpenTelemetry contributions, plus the vendor-neutral AGNTCY standard for agent quality metrics.

🟣 Datadog: Breadth + APM correlation. LLM traces sit directly alongside existing APM, infrastructure, and security data. LLM Experiments allows prompt testing pre-deployment, Watchdog AI provides continuous anomaly detection, and Datadog was a first mover on Google ADK integration.

🟢 New Relic: Application-centric depth + pricing. Strong APM heritage with code-level diagnostics and predictable data-ingestion pricing. The SRE Agent integrates with ServiceNow, PagerDuty, and GitHub for agentic remediation; the company reports 30% quarter-over-quarter growth in AI Monitoring adoption.

🔴 LangSmith: Purpose-built for agent evaluation. The only vendor here where Runs, Traces, and Threads are first-class primitives. Production traces automatically become offline test datasets, the LangChain/LangGraph integration is the deepest available, and the Insights Agent provides AI-assisted trace analysis at scale.

⚪ Arize: ML pedigree + open source. The only vendor here whose traditional ML model monitoring (drift, bias) converges with LLM agent observability. Arize Phoenix is open source and OTel-native, RAG evaluation is strong via TruLens, and its embedding-level drift detection is the best in this comparison.