Monitoring & Observability

Learn how to monitor agent performance, track metrics, debug issues, and gain visibility into agent behavior in production.

Overview

Effective monitoring enables you to:

  • Track performance - Measure latency, throughput, and success rates
  • Debug issues - Identify and diagnose problems quickly
  • Optimize costs - Monitor token usage and API costs
  • Ensure reliability - Detect and respond to failures
  • Improve quality - Analyze agent behavior and outputs

Core Metrics

1. Performance Metrics

Track execution time and throughput:

typescript
import { createReActAgent } from '@agentforge/patterns';

class MetricsCollector {
  private metrics: Map<string, number[]> = new Map();
  
  record(metric: string, value: number) {
    if (!this.metrics.has(metric)) {
      this.metrics.set(metric, []);
    }
    this.metrics.get(metric)!.push(value);
  }
  
  getStats(metric: string) {
    const values = this.metrics.get(metric) || [];
    if (values.length === 0) return null;
    
    const sorted = [...values].sort((a, b) => a - b);
    return {
      count: values.length,
      min: sorted[0],
      max: sorted[sorted.length - 1],
      avg: values.reduce((a, b) => a + b, 0) / values.length,
      p50: sorted[Math.floor(sorted.length * 0.5)],
      p95: sorted[Math.floor(sorted.length * 0.95)],
      p99: sorted[Math.floor(sorted.length * 0.99)]
    };
  }
}

const metrics = new MetricsCollector();

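// Measure agent execution time. `agent` and `input` are assumed to exist
// already (for example, an agent built with createReActAgent).
const startTime = Date.now();
try {
  const result = await agent.invoke(input);
  metrics.record('agent.success', 1);
} finally {
  // Record the duration for failed invocations too, so latency stats are not skewed
  metrics.record('agent.duration', Date.now() - startTime);
}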

console.log('Performance stats:', metrics.getStats('agent.duration'));
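
The `getStats` method above picks percentiles by indexing at `Math.floor(length * p)`. A common alternative is the nearest-rank definition, sketched below. The helper name `percentile` is an assumption made for illustration and is not part of AgentForge.

```typescript
// Nearest-rank percentile: the smallest sample such that at least
// p * 100 percent of all samples are less than or equal to it.
// `percentile` is an illustrative helper, not an AgentForge API.
function percentile(samples: number[], p: number): number | null {
  if (samples.length === 0) return null;
  if (p <= 0 || p > 1) throw new RangeError('p must be in (0, 1]');
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.ceil(p * sorted.length); // 1-based rank
  return sorted[rank - 1];
}

// Example: 100 synthetic durations of 1..100 ms
const durations = Array.from({ length: 100 }, (_, i) => i + 1);
console.log({
  p50: percentile(durations, 0.5),
  p95: percentile(durations, 0.95),
  max: percentile(durations, 1)
});
```

With small sample counts the two definitions can pick different samples, so it helps to use one definition consistently across dashboards and alerts.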

2. Token Usage Metrics

Track token consumption and costs:

typescript
class TokenMetrics {
  private totalTokens = 0;
  private totalCost = 0;
  private requestCount = 0;
  
  record(usage: { promptTokens: number; completionTokens: number }, model: string) {
    const total = usage.promptTokens + usage.completionTokens;
    this.totalTokens += total;
    this.requestCount++;
    
    // Price in USD per 1,000 tokens. Example values only: check your
    // provider's current pricing.
    const pricing: Record<string, { input: number; output: number }> = {
      'gpt-4': { input: 0.03, output: 0.06 },
      'gpt-3.5-turbo': { input: 0.0015, output: 0.002 }
    };
    
    const price = pricing[model];
    if (!price) return; // Unknown model: count the tokens but skip the cost
    
    const cost = 
      (usage.promptTokens / 1000) * price.input +
      (usage.completionTokens / 1000) * price.output;
    
    this.totalCost += cost;
  }
  
  getStats() {
    // Avoid dividing by zero before the first request is recorded
    const requests = Math.max(this.requestCount, 1);
    return {
      totalTokens: this.totalTokens,
      totalCost: this.totalCost.toFixed(4),
      requestCount: this.requestCount,
      avgTokensPerRequest: Math.round(this.totalTokens / requests),
      avgCostPerRequest: (this.totalCost / requests).toFixed(4)
    };
  }
}

const tokenMetrics = new TokenMetrics();

const result = await agent.invoke(input, {
  callbacks: [{
    handleLLMEnd: (output) => {
      // Some providers omit usage data, so guard before recording
      const usage = output.llmOutput?.tokenUsage;
      if (usage) tokenMetrics.record(usage, 'gpt-4');
    }
  }]
});

console.log('Token stats:', tokenMetrics.getStats());
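
When an application calls more than one model, a per-model breakdown is often easier to act on than a single total. The following is a minimal sketch; the class name `PerModelUsage` and its method names are assumptions made for illustration.

```typescript
type Usage = { promptTokens: number; completionTokens: number };
type ModelTotals = { requests: number; promptTokens: number; completionTokens: number };

// Accumulates token usage separately for each model name.
// `PerModelUsage` is an illustrative helper, not an AgentForge API.
class PerModelUsage {
  private totals = new Map<string, ModelTotals>();

  record(model: string, usage: Usage): void {
    const current = this.totals.get(model) ?? { requests: 0, promptTokens: 0, completionTokens: 0 };
    current.requests += 1;
    current.promptTokens += usage.promptTokens;
    current.completionTokens += usage.completionTokens;
    this.totals.set(model, current);
  }

  // Returns a copy of the totals for one model, or null if it was never seen.
  get(model: string): ModelTotals | null {
    const found = this.totals.get(model);
    return found ? { ...found } : null;
  }

  models(): string[] {
    return Array.from(this.totals.keys());
  }
}

const perModel = new PerModelUsage();
perModel.record('gpt-4', { promptTokens: 800, completionTokens: 200 });
perModel.record('gpt-4', { promptTokens: 400, completionTokens: 100 });
perModel.record('gpt-3.5-turbo', { promptTokens: 50, completionTokens: 25 });

for (const model of perModel.models()) {
  console.log(model, perModel.get(model));
}
```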

3. Error Metrics

Track failures and error rates:

typescript
class ErrorMetrics {
  private errors: Map<string, number> = new Map();
  private totalRequests = 0;
  
  recordSuccess() {
    this.totalRequests++;
  }
  
  recordError(errorType: string) {
    this.totalRequests++;
    this.errors.set(errorType, (this.errors.get(errorType) || 0) + 1);
  }
  
  getStats() {
    const totalErrors = Array.from(this.errors.values()).reduce((a, b) => a + b, 0);
    const errorRate = this.totalRequests > 0 ? (totalErrors / this.totalRequests) * 100 : 0;
    
    return {
      totalRequests: this.totalRequests,
      totalErrors,
      errorRate: errorRate.toFixed(2) + '%',
      errorsByType: Object.fromEntries(this.errors)
    };
  }
}

const errorMetrics = new ErrorMetrics();

try {
  const result = await agent.invoke(input);
  errorMetrics.recordSuccess();
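} catch (error) {
  // In strict TypeScript `error` is `unknown`, so narrow it before reading `name`
  errorMetrics.recordError(error instanceof Error ? error.name : 'UnknownError');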
  throw error;
}
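
Grouping errors by `error.name` alone tends to put most failures under a generic `Error` label. One option is to map thrown values to a small set of stable categories first. The sketch below assumes the categories `timeout` and `rate_limit` and matches on message text; both the category names and the matching rules are illustrative and should be adapted to the errors your provider actually throws.

```typescript
// Maps an arbitrary thrown value to a stable label for error metrics.
// The categories and matching rules are illustrative assumptions.
function classifyError(err: unknown): string {
  if (!(err instanceof Error)) return 'non_error_throw';
  const message = err.message.toLowerCase();
  if (err.name === 'AbortError' || message.includes('timeout')) return 'timeout';
  if (message.includes('rate limit') || message.includes('429')) return 'rate_limit';
  return err.name || 'Error';
}

console.log(classifyError(new Error('Request timeout after 30s')));
console.log(classifyError(new Error('429 Too Many Requests')));
console.log(classifyError(new TypeError('input.messages is undefined')));
console.log(classifyError('a plain string was thrown'));
```

The returned label can be passed straight to `errorMetrics.recordError(...)`.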

Logging

Structured Logging

Use structured logs for better analysis:

typescript
import winston from 'winston';

const logger = winston.createLogger({
  level: 'info',
  format: winston.format.combine(
    winston.format.timestamp(),
    winston.format.json()
  ),
  transports: [
    new winston.transports.File({ filename: 'error.log', level: 'error' }),
    new winston.transports.File({ filename: 'combined.log' })
  ]
});

// Log agent execution
const startTime = Date.now(); // used for the duration in the completion log below
logger.info('Agent invocation started', {
  agentType: 'ReActAgent',
  input: input.messages[0].content,
  timestamp: new Date().toISOString()
});

const result = await agent.invoke(input, {
  callbacks: [{
    handleLLMStart: (llm, prompts) => {
      logger.debug('LLM call started', {
        model: llm.modelName,
        promptLength: prompts[0].length
      });
    },
    handleLLMEnd: (output) => {
      logger.info('LLM call completed', {
        tokens: output.llmOutput?.tokenUsage,
        duration: output.llmOutput?.estimatedDuration
      });
    },
    handleToolStart: (tool, input) => {
      logger.info('Tool execution started', {
        tool: tool.name,
        input
      });
    },
    handleToolEnd: (output) => {
      logger.info('Tool execution completed', {
        outputLength: JSON.stringify(output).length
      });
    },
    handleLLMError: (error) => {
      logger.error('LLM error', {
        error: error.message,
        stack: error.stack
      });
    }
  }]
});

logger.info('Agent invocation completed', {
  success: true,
  duration: Date.now() - startTime
});
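
Structured logs are easiest to analyze when every line from one invocation shares a correlation field. The sketch below shows the idea without any logging library; `createRequestLogger`, the `requestId` field, and the sink function are assumptions made for illustration. winston's child loggers can serve a similar purpose.

```typescript
type Fields = Record<string, unknown>;
type Sink = (line: string) => void;
type LogLevel = 'debug' | 'info' | 'warn' | 'error';

// Returns a logging function that stamps every entry with the same requestId,
// so all lines from one agent invocation can be grouped later.
// `createRequestLogger` and the field names are illustrative, not a winston API.
function createRequestLogger(requestId: string, sink: Sink = console.log) {
  return (level: LogLevel, message: string, fields: Fields = {}): void => {
    sink(JSON.stringify({
      timestamp: new Date().toISOString(),
      level,
      message,
      requestId,
      ...fields
    }));
  };
}

// Collect lines in memory here; in production the sink would be a transport.
const lines: string[] = [];
const log = createRequestLogger('req-123', (line) => { lines.push(line); });

log('info', 'Agent invocation started', { agentType: 'ReActAgent' });
log('info', 'Tool execution started', { tool: 'search' });
console.log(lines.join('\n'));
```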

Log Levels

Use appropriate log levels:

typescript
// ERROR: System errors, failures
logger.error('Agent execution failed', { error: error.message });

// WARN: Degraded performance, retries
logger.warn('Token limit approaching', { usage: tokenUsage });

// INFO: Important events, completions
logger.info('Agent task completed', { duration, tokens });

// DEBUG: Detailed execution flow
logger.debug('Tool selected', { tool: toolName, reasoning });

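// TRACE: Very detailed, token-level. winston's default npm levels have no
// `trace`; the most verbose level is `silly`, so use it or define custom levels
logger.silly('LLM token generated', { token, position });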
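
The levels above form an ordering: a logger configured with a threshold emits a message only when the message is at least as severe as that threshold. A minimal sketch of that rule follows; the `LEVELS` array and `shouldLog` helper are assumptions made for illustration.

```typescript
// Severity order from most to least severe, mirroring the levels listed above.
const LEVELS = ['error', 'warn', 'info', 'debug', 'trace'] as const;
type Level = (typeof LEVELS)[number];

// A message is emitted when its level is at least as severe as the threshold.
function shouldLog(messageLevel: Level, threshold: Level): boolean {
  return LEVELS.indexOf(messageLevel) <= LEVELS.indexOf(threshold);
}

// With a threshold of 'info', debug and trace messages are dropped.
for (const level of LEVELS) {
  console.log(level, shouldLog(level, 'info'));
}
```

This is why the `logger.debug(...)` calls in a logger created with `level: 'info'` produce no output until the level is lowered.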

Tracing

Distributed Tracing

Track requests across services:

typescript
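import { trace, context, SpanStatusCode } from '@opentelemetry/api';
import type { Span } from '@opentelemetry/api';
import { NodeTracerProvider } from '@opentelemetry/sdk-trace-node';
import { BatchSpanProcessor } from '@opentelemetry/sdk-trace-base';
import { JaegerExporter } from '@opentelemetry/exporter-jaeger';

// Setup tracing
const provider = new NodeTracerProvider();
provider.addSpanProcessor(
  new BatchSpanProcessor(new JaegerExporter())
);
provider.register();

const tracer = trace.getTracer('agentforge');

// Trace agent execution
async function tracedAgentInvoke(input: any) {
  const span = tracer.startSpan('agent.invoke');
  // Child spans are linked to a parent through a context, not a `parent` option
  const parentContext = trace.setSpan(context.active(), span);
  // Child spans still in flight, keyed by run ID. This assumes each callback
  // receives a runId as its last argument, as in the LangSmith example below.
  const openSpans = new Map<string, Span>();
  const endSpan = (runId: string) => {
    openSpans.get(runId)?.end();
    openSpans.delete(runId);
  };
  
  try {
    span.setAttribute('agent.type', 'ReActAgent');
    span.setAttribute('input.length', input.messages[0].content.length);
    
    const result = await agent.invoke(input, {
      callbacks: [{
        handleLLMStart: (llm, prompts, runId) => {
          openSpans.set(runId, tracer.startSpan('llm.call', undefined, parentContext));
        },
        // End the span when the call finishes so it records a real duration
        handleLLMEnd: (output, runId) => endSpan(runId),
        handleToolStart: (tool, toolInput, runId) => {
          const toolSpan = tracer.startSpan('tool.execute', undefined, parentContext);
          toolSpan.setAttribute('tool.name', tool.name);
          openSpans.set(runId, toolSpan);
        },
        handleToolEnd: (output, runId) => endSpan(runId)
      }]
    });
    
    span.setStatus({ code: SpanStatusCode.OK });
    return result;
  } catch (error) {
    span.setStatus({
      code: SpanStatusCode.ERROR,
      message: error instanceof Error ? error.message : String(error)
    });
    throw error;
  } finally {
    span.end();
  }
}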
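
To make the parent and child relationship concrete, here is a small in-memory model of a span tree. It is a teaching sketch only: `MiniTracer` and `SpanRecord` are hypothetical names, and nothing here is an OpenTelemetry API. The clock is injected so the example is deterministic.

```typescript
interface SpanRecord {
  id: number;
  parentId: number | null;
  name: string;
  startMs: number;
  endMs: number | null;
}

// A toy tracer that records spans in memory and prints them as a tree.
class MiniTracer {
  private spans: SpanRecord[] = [];
  private nextId = 1;
  private now: () => number;

  constructor(now: () => number = Date.now) {
    this.now = now;
  }

  start(name: string, parent?: SpanRecord): SpanRecord {
    const span: SpanRecord = {
      id: this.nextId++,
      parentId: parent ? parent.id : null,
      name,
      startMs: this.now(),
      endMs: null
    };
    this.spans.push(span);
    return span;
  }

  end(span: SpanRecord): void {
    span.endMs = this.now();
  }

  // Renders children of `parentId` as indented lines, depth first.
  render(parentId: number | null = null, depth = 0): string[] {
    const out: string[] = [];
    for (const span of this.spans.filter((s) => s.parentId === parentId)) {
      const duration = span.endMs === null ? 'open' : `${span.endMs - span.startMs}ms`;
      out.push(`${'  '.repeat(depth)}${span.name} (${duration})`);
      out.push(...this.render(span.id, depth + 1));
    }
    return out;
  }
}

// Simulated invocation with a fake clock
let clock = 0;
const miniTracer = new MiniTracer(() => clock);
const root = miniTracer.start('agent.invoke');
clock = 5;
const llmCall = miniTracer.start('llm.call', root);
clock = 45;
miniTracer.end(llmCall);
const toolCall = miniTracer.start('tool.execute', root);
clock = 60;
miniTracer.end(toolCall);
clock = 70;
miniTracer.end(root);

console.log(miniTracer.render().join('\n'));
```

A child span that is ended immediately after it starts would show a duration of 0ms here, which is why spans should be closed in the matching end callback.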

LangSmith Integration

Use LangSmith for comprehensive tracing:

typescript
import { Client } from 'langsmith';

const client = new Client({
  apiKey: process.env.LANGSMITH_API_KEY
});

const result = await agent.invoke(input, {
  callbacks: [{
    handleChainStart: (chain, inputs, runId) => {
      client.createRun({
        id: runId,
        name: chain.name,
        run_type: 'chain',
        inputs
      });
    },
    handleChainEnd: (outputs, runId) => {
      client.updateRun(runId, {
        outputs,
        end_time: Date.now()
      });
    }
  }]
});

Metrics Exporters

Prometheus Integration

Export metrics to Prometheus:

typescript
import { register, Counter, Histogram, Gauge } from 'prom-client';

// Define metrics
const agentInvocations = new Counter({
  name: 'agent_invocations_total',
  help: 'Total number of agent invocations',
  labelNames: ['agent_type', 'status']
});

const agentDuration = new Histogram({
  name: 'agent_duration_seconds',
  help: 'Agent execution duration',
  labelNames: ['agent_type'],
  buckets: [0.1, 0.5, 1, 2, 5, 10, 30, 60]
});

const tokenUsage = new Counter({
  name: 'agent_tokens_total',
  help: 'Total tokens used',
  labelNames: ['agent_type', 'model']
});

const activeAgents = new Gauge({
  name: 'agent_active_count',
  help: 'Number of currently active agents'
});

// Instrument agent
async function monitoredAgentInvoke(input: any) {
  activeAgents.inc();
  const startTime = Date.now();

  try {
    const result = await agent.invoke(input, {
      callbacks: [{
        handleLLMEnd: (output) => {
          tokenUsage.inc({
            agent_type: 'ReActAgent',
            model: 'gpt-4'
          }, output.llmOutput?.tokenUsage?.totalTokens || 0);
        }
      }]
    });

    agentInvocations.inc({ agent_type: 'ReActAgent', status: 'success' });
    return result;
  } catch (error) {
    agentInvocations.inc({ agent_type: 'ReActAgent', status: 'error' });
    throw error;
  } finally {
    const duration = (Date.now() - startTime) / 1000;
    agentDuration.observe({ agent_type: 'ReActAgent' }, duration);
    activeAgents.dec();
  }
}

// Expose metrics endpoint
import express from 'express';
const app = express();

app.get('/metrics', async (req, res) => {
  res.set('Content-Type', register.contentType);
  res.end(await register.metrics());
});

app.listen(9090);
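
The `buckets` option determines how precisely latency quantiles can be estimated later. Prometheus histograms count observations into cumulative "less than or equal" buckets plus a final `+Inf` bucket. The sketch below reproduces that counting in plain TypeScript to show where sample durations land; `cumulativeBuckets` is a hypothetical helper, not a prom-client function.

```typescript
// Counts observations into cumulative "less than or equal" buckets, the way a
// Prometheus histogram exposes them. The final +Inf bucket counts everything.
function cumulativeBuckets(observations: number[], upperBounds: number[]): Map<string, number> {
  const counts = new Map<string, number>();
  for (const bound of upperBounds) {
    counts.set(String(bound), observations.filter((v) => v <= bound).length);
  }
  counts.set('+Inf', observations.length);
  return counts;
}

// Durations in seconds, using the same bucket bounds as the histogram above
const sampleDurations = [0.3, 0.7, 4.2, 12, 75];
const bucketCounts = cumulativeBuckets(sampleDurations, [0.1, 0.5, 1, 2, 5, 10, 30, 60]);
console.log(Array.from(bucketCounts.entries()));
```

The 75 second sample is counted only in `+Inf`, so any quantile that falls in that range cannot be resolved beyond "more than 60 seconds". If long-running agents are expected, larger upper bounds may be worth adding.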

DataDog Integration

Send metrics to DataDog:

typescript
import { StatsD } from 'node-dogstatsd';

const dogstatsd = new StatsD();

async function datadogMonitoredInvoke(input: any) {
  const startTime = Date.now();

  try {
    const result = await agent.invoke(input, {
      callbacks: [{
        handleLLMEnd: (output) => {
          dogstatsd.increment('agent.llm.calls', 1, ['model:gpt-4']);
          dogstatsd.histogram('agent.llm.tokens',
            output.llmOutput?.tokenUsage?.totalTokens || 0,
            ['model:gpt-4']
          );
        },
        handleToolStart: (tool) => {
          dogstatsd.increment('agent.tool.calls', 1, [`tool:${tool.name}`]);
        }
      }]
    });

    dogstatsd.increment('agent.invocations', 1, ['status:success']);
    return result;
  } catch (error) {
    dogstatsd.increment('agent.invocations', 1, ['status:error']);
    throw error;
  } finally {
    const duration = Date.now() - startTime;
    dogstatsd.histogram('agent.duration', duration, ['agent:react']);
  }
}
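
Under the hood, a DogStatsD client sends short text datagrams of the form `name:value|type|#tags`. Seeing that format can help when debugging missing metrics or tags. The formatter below is a sketch for illustration; `formatDogStatsd` is a hypothetical helper, it omits the optional sample rate, and it is not part of any client library.

```typescript
// DogStatsD metric types: counter, gauge, histogram, timer
type MetricType = 'c' | 'g' | 'h' | 'ms';

// Builds one DogStatsD datagram line. Tags are "key:value" strings.
function formatDogStatsd(name: string, value: number, type: MetricType, tags: string[] = []): string {
  const tagPart = tags.length > 0 ? `|#${tags.join(',')}` : '';
  return `${name}:${value}|${type}${tagPart}`;
}

console.log(formatDogStatsd('agent.invocations', 1, 'c', ['status:success']));
console.log(formatDogStatsd('agent.duration', 1532, 'h', ['agent:react', 'model:gpt-4']));
console.log(formatDogStatsd('agent.active', 3, 'g'));
```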

Dashboards

Grafana Dashboard

Create a comprehensive monitoring dashboard:

json
{
  "dashboard": {
    "title": "AgentForge Monitoring",
    "panels": [
      {
        "title": "Agent Invocations",
        "targets": [{
          "expr": "rate(agent_invocations_total[5m])"
        }]
      },
      {
        "title": "Success Rate",
        "targets": [{
          "expr": "sum(rate(agent_invocations_total{status=\"success\"}[5m])) / sum(rate(agent_invocations_total[5m]))"
        }]
      },
      {
        "title": "P95 Latency",
        "targets": [{
          "expr": "histogram_quantile(0.95, sum(rate(agent_duration_seconds_bucket[5m])) by (le))"
        }]
      },
      {
        "title": "Token Usage",
        "targets": [{
          "expr": "rate(agent_tokens_total[5m])"
        }]
      },
      {
        "title": "Active Agents",
        "targets": [{
          "expr": "agent_active_count"
        }]
      }
    ]
  }
}

Custom Dashboard

Build a real-time dashboard:

typescript
import express from 'express';
import { createServer } from 'http';
import { Server } from 'socket.io';

const app = express();
const server = createServer(app);
const io = new Server(server);

// Serve dashboard
app.get('/dashboard', (req, res) => {
  res.sendFile(__dirname + '/dashboard.html');
});

// Emit metrics to connected clients
setInterval(() => {
  io.emit('metrics', {
    timestamp: Date.now(),
    performance: metrics.getStats('agent.duration'),
    tokens: tokenMetrics.getStats(),
    errors: errorMetrics.getStats()
  });
}, 1000);

server.listen(3000);

html
<!-- dashboard.html -->
<!DOCTYPE html>
<html>
<head>
  <title>Agent Dashboard</title>
  <script src="/socket.io/socket.io.js"></script>
  <script src="https://cdn.jsdelivr.net/npm/chart.js"></script>
</head>
<body>
  <h1>AgentForge Dashboard</h1>

  <div>
    <h2>Performance</h2>
    <canvas id="performanceChart"></canvas>
  </div>

  <div>
    <h2>Token Usage</h2>
    <canvas id="tokenChart"></canvas>
  </div>

  <script>
    const socket = io();

    socket.on('metrics', (data) => {
      updateCharts(data);
    });

    function updateCharts(data) {
      // Update performance chart
      // Update token chart
      // Update error chart
    }
  </script>
</body>
</html>
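
The `updateCharts` stub receives one metrics snapshot per second, so the page needs to keep a bounded history rather than an ever-growing array. One way to do that is a fixed-size rolling series, sketched here in TypeScript. `RollingSeries` is a hypothetical helper; wiring its `labels()` and `values()` into a chart is left to the charting library in use.

```typescript
interface Point {
  timestamp: number; // milliseconds since the Unix epoch
  value: number;
}

// Keeps only the most recent `maxPoints` samples for a live chart.
class RollingSeries {
  private points: Point[] = [];
  private maxPoints: number;

  constructor(maxPoints: number) {
    this.maxPoints = maxPoints;
  }

  push(point: Point): void {
    this.points.push(point);
    if (this.points.length > this.maxPoints) {
      this.points.shift(); // drop the oldest sample
    }
  }

  // HH:MM:SS labels in UTC, suitable for an x-axis
  labels(): string[] {
    return this.points.map((p) => new Date(p.timestamp).toISOString().slice(11, 19));
  }

  values(): number[] {
    return this.points.map((p) => p.value);
  }
}

// Keep the last 3 samples of a p95 latency series
const latencySeries = new RollingSeries(3);
for (let i = 0; i < 5; i++) {
  latencySeries.push({ timestamp: i * 1000, value: 100 + i });
}
console.log(latencySeries.labels(), latencySeries.values());
```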

Alerting

Alert Rules

Define alert conditions:

typescript
class AlertManager {
  private rules: Array<{
    name: string;
    condition: (metrics: any) => boolean;
    action: (metrics: any) => void;
  }> = [];

  addRule(name: string, condition: (metrics: any) => boolean, action: (metrics: any) => void) {
    this.rules.push({ name, condition, action });
  }

  check(metrics: any) {
    for (const rule of this.rules) {
      if (rule.condition(metrics)) {
        console.log(`🚨 Alert triggered: ${rule.name}`);
        rule.action(metrics);
      }
    }
  }
}

const alertManager = new AlertManager();

// High error rate alert
alertManager.addRule(
  'High Error Rate',
  (metrics) => parseFloat(metrics.errorRate) > 5,
  (metrics) => {
    sendSlackAlert(`Error rate is ${metrics.errorRate}`);
  }
);

// High latency alert
alertManager.addRule(
  'High Latency',
  (metrics) => metrics.p95 > 10000,
  (metrics) => {
    sendSlackAlert(`P95 latency is ${metrics.p95}ms`);
  }
);

// High token usage alert
alertManager.addRule(
  'High Token Usage',
  (metrics) => metrics.totalTokens > 100000,
  (metrics) => {
    sendSlackAlert(`Token usage: ${metrics.totalTokens}`);
  }
);

// Check alerts periodically
setInterval(() => {
  alertManager.check({
    errorRate: errorMetrics.getStats().errorRate,
    p95: metrics.getStats('agent.duration')?.p95,
    totalTokens: tokenMetrics.getStats().totalTokens
  });
}, 60000);
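
Because the check runs every minute, a condition that stays true fires the same alert on every pass. A cooldown per rule name is one simple way to limit repeats. The sketch below assumes a hypothetical `AlertCooldown` class with an injectable clock; it is not part of the `AlertManager` above.

```typescript
// Suppresses repeat firings of the same alert within a cooldown window.
// `AlertCooldown` is an illustrative helper; the clock is injectable for testing.
class AlertCooldown {
  private lastFired = new Map<string, number>();
  private cooldownMs: number;
  private now: () => number;

  constructor(cooldownMs: number, now: () => number = Date.now) {
    this.cooldownMs = cooldownMs;
    this.now = now;
  }

  // Returns true when the named alert may fire, and records the firing time.
  tryFire(name: string): boolean {
    const current = this.now();
    const last = this.lastFired.get(name);
    if (last !== undefined && current - last < this.cooldownMs) {
      return false;
    }
    this.lastFired.set(name, current);
    return true;
  }
}

// Simulated checks at 0, 1, and 5 minutes with a 5 minute cooldown
let fakeNow = 0;
const cooldown = new AlertCooldown(5 * 60 * 1000, () => fakeNow);
const firedAtStart = cooldown.tryFire('High Error Rate');
fakeNow = 60 * 1000;
const firedAfterOneMinute = cooldown.tryFire('High Error Rate');
fakeNow = 5 * 60 * 1000;
const firedAfterFiveMinutes = cooldown.tryFire('High Error Rate');
console.log({ firedAtStart, firedAfterOneMinute, firedAfterFiveMinutes });
```

Inside `check`, the call to `rule.action(metrics)` could be wrapped in `if (cooldown.tryFire(rule.name))`.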

Notification Channels

Send alerts to various channels:

typescript
import { WebClient } from '@slack/web-api';
import nodemailer from 'nodemailer';

// Slack notifications
const slack = new WebClient(process.env.SLACK_TOKEN);

async function sendSlackAlert(message: string) {
  await slack.chat.postMessage({
    channel: '#alerts',
    text: `🚨 AgentForge Alert: ${message}`
  });
}

// Email notifications
const transporter = nodemailer.createTransport({
  service: 'gmail',
  auth: {
    user: process.env.EMAIL_USER,
    pass: process.env.EMAIL_PASS
  }
});

async function sendEmailAlert(subject: string, message: string) {
  await transporter.sendMail({
    from: 'alerts@agentforge.com',
    to: 'team@company.com',
    subject: `[AgentForge] ${subject}`,
    text: message
  });
}

// PagerDuty integration
import { event } from '@pagerduty/pdjs';

async function sendPagerDutyAlert(severity: string, message: string) {
  await event({
    data: {
      routing_key: process.env.PAGERDUTY_KEY,
      event_action: 'trigger',
      payload: {
        summary: message,
        severity,
        source: 'agentforge'
      }
    }
  });
}

Debugging Tools

Agent Execution Visualizer

Visualize agent execution flow:

typescript
class ExecutionVisualizer {
  private steps: Array<{
    type: string;
    timestamp: number;
    data: any;
  }> = [];

  recordStep(type: string, data: any) {
    this.steps.push({
      type,
      timestamp: Date.now(),
      data
    });
  }

  generateMermaidDiagram(): string {
    let diagram = 'graph TD\n';

    this.steps.forEach((step, i) => {
      const nodeId = `step${i}`;
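      // Quote the label and escape inner double quotes so JSON punctuation
      // (braces, brackets, quotes) does not break Mermaid's parser
      const raw = `${step.type}: ${JSON.stringify(step.data).substring(0, 30)}`;
      const label = raw.replace(/"/g, '#quot;');
      diagram += `  ${nodeId}["${label}"]\n`;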

      if (i > 0) {
        diagram += `  step${i-1} --> ${nodeId}\n`;
      }
    });

    return diagram;
  }

  generateTimeline(): string {
    const startTime = this.steps[0]?.timestamp || Date.now();

    return this.steps.map((step, i) => {
      const elapsed = step.timestamp - startTime;
      return `${elapsed}ms: ${step.type} - ${JSON.stringify(step.data)}`;
    }).join('\n');
  }
}

const visualizer = new ExecutionVisualizer();

const result = await agent.invoke(input, {
  callbacks: [{
    handleLLMStart: () => visualizer.recordStep('LLM Start', {}),
    handleLLMEnd: (output) => visualizer.recordStep('LLM End', { tokens: output.llmOutput?.tokenUsage }),
    handleToolStart: (tool, input) => visualizer.recordStep('Tool Start', { tool: tool.name, input }),
    handleToolEnd: (output) => visualizer.recordStep('Tool End', { output })
  }]
});

console.log('Execution Timeline:');
console.log(visualizer.generateTimeline());

console.log('\nExecution Diagram:');
console.log(visualizer.generateMermaidDiagram());

Performance Profiler

Profile agent performance:

typescript
class PerformanceProfiler {
  private profiles: Map<string, { count: number; totalTime: number; samples: number[] }> = new Map();

  async profile<T>(name: string, fn: () => Promise<T>): Promise<T> {
    const startTime = Date.now();

    try {
      return await fn();
    } finally {
      const duration = Date.now() - startTime;

      if (!this.profiles.has(name)) {
        this.profiles.set(name, { count: 0, totalTime: 0, samples: [] });
      }

      const profile = this.profiles.get(name)!;
      profile.count++;
      profile.totalTime += duration;
      profile.samples.push(duration);
    }
  }

  getReport() {
    const report: any = {};

    for (const [name, profile] of this.profiles.entries()) {
      const sorted = [...profile.samples].sort((a, b) => a - b);
      report[name] = {
        calls: profile.count,
        totalTime: profile.totalTime,
        avgTime: profile.totalTime / profile.count,
        minTime: sorted[0],
        maxTime: sorted[sorted.length - 1],
        p50: sorted[Math.floor(sorted.length * 0.5)],
        p95: sorted[Math.floor(sorted.length * 0.95)]
      };
    }

    return report;
  }
}

const profiler = new PerformanceProfiler();

// Profile different operations
await profiler.profile('agent.invoke', async () => {
  return await agent.invoke(input);
});

await profiler.profile('tool.webScraper', async () => {
  return await webScraper.invoke({ url: 'https://example.com' });
});

console.log('Performance Report:');
console.log(JSON.stringify(profiler.getReport(), null, 2));

Best Practices

1. Monitor Key Metrics

Always track these essential metrics:

  • Request rate and throughput
  • Latency (p50, p95, p99)
  • Error rate and types
  • Token usage and costs
  • Resource utilization

2. Set Up Alerts

Configure alerts for critical conditions:

  • Error rate > 5%
  • P95 latency > 10s
  • Token usage > budget
  • Memory usage > 80%
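
The thresholds listed above can be kept in one place and evaluated together. The following is a minimal sketch, assuming a hypothetical `HealthSnapshot` shape; the field names and breach labels are illustrative.

```typescript
// One point-in-time reading of the values the alerts depend on.
// The shape and field names are assumptions made for this example.
interface HealthSnapshot {
  errorRatePercent: number;
  p95LatencyMs: number;
  tokensUsed: number;
  tokenBudget: number;
  memoryUsedPercent: number;
}

// Returns the name of every threshold the snapshot exceeds.
function breachedThresholds(s: HealthSnapshot): string[] {
  const breaches: string[] = [];
  if (s.errorRatePercent > 5) breaches.push('error_rate');
  if (s.p95LatencyMs > 10000) breaches.push('p95_latency');
  if (s.tokensUsed > s.tokenBudget) breaches.push('token_budget');
  if (s.memoryUsedPercent > 80) breaches.push('memory');
  return breaches;
}

const snapshot: HealthSnapshot = {
  errorRatePercent: 7.5,
  p95LatencyMs: 4200,
  tokensUsed: 120000,
  tokenBudget: 100000,
  memoryUsedPercent: 63
};
console.log(breachedThresholds(snapshot));
```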

3. Use Structured Logging

Always use structured, searchable logs:

typescript
logger.info('event', { key: 'value', timestamp: Date.now() });

4. Implement Distributed Tracing

Track requests across services for complex systems.

5. Create Dashboards

Build real-time dashboards for visibility.

Released under the MIT License.