Deployment Strategies

Learn how to deploy AgentForge applications to production with best practices for scalability, reliability, and security.

Overview

Production deployment requires:

Scalability - Handle varying loads efficiently
Reliability - Ensure high availability and fault tolerance
Security - Protect API keys and sensitive data
Performance - Optimize for low latency and high throughput
Observability - Monitor and debug production issues

Deployment Architectures

1. Serverless Deployment

Deploy agents as serverless functions:

typescript

// Vercel Edge Function
import { createReActAgent } from '@agentforge/patterns';
import { ChatOpenAI } from '@langchain/openai';

export const config = {
  runtime: 'edge'
};

const agent = createReActAgent({
  model: new ChatOpenAI({
    model: 'gpt-4',
    apiKey: process.env.OPENAI_API_KEY
  }),
  tools: [webScraper, calculator]
});

export default async function handler(req: Request) {
  const { query } = await req.json();
  
  try {
    const result = await agent.invoke({
      messages: [{ role: 'user', content: query }]
    });
    
    return new Response(JSON.stringify(result), {
      headers: { 'Content-Type': 'application/json' }
    });
  } catch (error) {
    return new Response(JSON.stringify({ error: error.message }), {
      status: 500,
      headers: { 'Content-Type': 'application/json' }
    });
  }
}

Pros:

Auto-scaling
Pay per use
No infrastructure management
Global distribution

Cons:

Cold starts
Execution time limits
Limited memory
Stateless

2. Container Deployment

Deploy with Docker and Kubernetes:

dockerfile

# Dockerfile
FROM node:18-alpine

WORKDIR /app

# Copy package files
COPY package*.json ./

# Install dependencies
RUN npm ci --only=production

# Copy application code
COPY . .

# Build TypeScript
RUN npm run build

# Expose port
EXPOSE 3000

# Health check
HEALTHCHECK --interval=30s --timeout=3s --start-period=5s --retries=3 \
  CMD node healthcheck.js

# Start application
CMD ["node", "dist/index.js"]

yaml

# kubernetes.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: agentforge-api
spec:
  replicas: 3
  selector:
    matchLabels:
      app: agentforge-api
  template:
    metadata:
      labels:
        app: agentforge-api
    spec:
      containers:
      - name: api
        image: agentforge-api:latest
        ports:
        - containerPort: 3000
        env:
        - name: OPENAI_API_KEY
          valueFrom:
            secretKeyRef:
              name: api-secrets
              key: openai-api-key
        resources:
          requests:
            memory: "512Mi"
            cpu: "500m"
          limits:
            memory: "1Gi"
            cpu: "1000m"
        livenessProbe:
          httpGet:
            path: /health
            port: 3000
          initialDelaySeconds: 30
          periodSeconds: 10
        readinessProbe:
          httpGet:
            path: /ready
            port: 3000
          initialDelaySeconds: 5
          periodSeconds: 5
---
apiVersion: v1
kind: Service
metadata:
  name: agentforge-api
spec:
  selector:
    app: agentforge-api
  ports:
  - port: 80
    targetPort: 3000
  type: LoadBalancer
---
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: agentforge-api-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: agentforge-api
  minReplicas: 3
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization
        averageUtilization: 80

Pros:

Full control
Stateful support
No execution limits
Custom scaling

Cons:

Infrastructure management
Higher baseline cost
More complex setup

3. Hybrid Deployment

Combine serverless and containers:

typescript

// API Gateway (Serverless)
export default async function handler(req: Request) {
  const { query, priority } = await req.json();
  
  if (priority === 'high') {
    // Route to dedicated container cluster
    return await fetch('https://agents.company.com/invoke', {
      method: 'POST',
      body: JSON.stringify({ query })
    });
  } else {
    // Handle in serverless function
    return await agent.invoke({ messages: [{ role: 'user', content: query }] });
  }
}

Environment Configuration

Environment Variables

Manage configuration securely:

typescript

// config.ts
import { z } from 'zod';

const envSchema = z.object({
  NODE_ENV: z.enum(['development', 'staging', 'production']),
  OPENAI_API_KEY: z.string().min(1),
  ANTHROPIC_API_KEY: z.string().optional(),
  REDIS_URL: z.string().url(),
  DATABASE_URL: z.string().url(),
  LOG_LEVEL: z.enum(['error', 'warn', 'info', 'debug']).default('info'),
  MAX_CONCURRENT_AGENTS: z.coerce.number().default(10),
  TOKEN_BUDGET_PER_REQUEST: z.coerce.number().default(10000)
});

export const config = envSchema.parse(process.env);

bash

# .env.production
NODE_ENV=production
OPENAI_API_KEY=sk-...
REDIS_URL=redis://redis:6379
DATABASE_URL=postgresql://...
LOG_LEVEL=info
MAX_CONCURRENT_AGENTS=20
TOKEN_BUDGET_PER_REQUEST=15000

Secrets Management

Use secure secret management:

typescript

// AWS Secrets Manager
import { SecretsManagerClient, GetSecretValueCommand } from '@aws-sdk/client-secrets-manager';

async function getSecret(secretName: string): Promise<string> {
  const client = new SecretsManagerClient({ region: 'us-east-1' });
  
  const response = await client.send(
    new GetSecretValueCommand({ SecretId: secretName })
  );
  
  return response.SecretString!;
}

// Initialize with secrets
const openaiKey = await getSecret('openai-api-key');
const llm = new ChatOpenAI({ apiKey: openaiKey });

typescript

// HashiCorp Vault
import vault from 'node-vault';

const vaultClient = vault({
  endpoint: process.env.VAULT_ADDR,
  token: process.env.VAULT_TOKEN
});

async function getVaultSecret(path: string): Promise<any> {
  const result = await vaultClient.read(path);
  return result.data;
}

const secrets = await getVaultSecret('secret/agentforge');
const llm = new ChatOpenAI({ apiKey: secrets.openai_key });

Load Balancing

Application Load Balancer

Distribute traffic across instances:

typescript

// Express.js with clustering
import cluster from 'cluster';
import os from 'os';
import express from 'express';

if (cluster.isPrimary) {
  const numCPUs = os.cpus().length;
  console.log(`Primary ${process.pid} is running`);
  
  // Fork workers
  for (let i = 0; i < numCPUs; i++) {
    cluster.fork();
  }
  
  cluster.on('exit', (worker, code, signal) => {
    console.log(`Worker ${worker.process.pid} died`);
    cluster.fork();  // Replace dead worker
  });
} else {
  const app = express();
  
  app.post('/api/agent', async (req, res) => {
    const result = await agent.invoke(req.body);
    res.json(result);
  });
  
  app.listen(3000, () => {
    console.log(`Worker ${process.pid} started`);
  });
}

Queue-Based Load Balancing

Use message queues for async processing:

typescript

import { Queue, Worker } from 'bullmq';
import Redis from 'ioredis';

const connection = new Redis(process.env.REDIS_URL);

// Producer: Add jobs to queue
const agentQueue = new Queue('agent-tasks', { connection });

app.post('/api/agent/async', async (req, res) => {
  const job = await agentQueue.add('invoke', {
    query: req.body.query,
    userId: req.user.id
  });
  
  res.json({ jobId: job.id });
});

// Consumer: Process jobs
const worker = new Worker('agent-tasks', async (job) => {
  const result = await agent.invoke({
    messages: [{ role: 'user', content: job.data.query }]
  });
  
  return result;
}, { connection, concurrency: 5 });

worker.on('completed', (job) => {
  console.log(`Job ${job.id} completed`);
});

worker.on('failed', (job, err) => {
  console.error(`Job ${job.id} failed:`, err);
});

Caching Layer

Redis Caching

Implement distributed caching:

typescript

import { Redis } from 'ioredis';

class DistributedCache {
  private redis: Redis;

  constructor(redisUrl: string) {
    this.redis = new Redis(redisUrl);
  }

  async get(key: string): Promise<any | null> {
    const cached = await this.redis.get(key);
    return cached ? JSON.parse(cached) : null;
  }

  async set(key: string, value: any, ttl: number = 3600): Promise<void> {
    await this.redis.setex(key, ttl, JSON.stringify(value));
  }

  async invalidate(pattern: string): Promise<void> {
    const keys = await this.redis.keys(pattern);
    if (keys.length > 0) {
      await this.redis.del(...keys);
    }
  }
}

const cache = new DistributedCache(process.env.REDIS_URL!);

// Cached agent endpoint
app.post('/api/agent', async (req, res) => {
  const cacheKey = `agent:${hashQuery(req.body.query)}`;

  // Check cache
  const cached = await cache.get(cacheKey);
  if (cached) {
    return res.json({ ...cached, cached: true });
  }

  // Invoke agent
  const result = await agent.invoke(req.body);

  // Cache result
  await cache.set(cacheKey, result, 3600);

  res.json({ ...result, cached: false });
});

CDN Integration

Cache responses at the edge:

typescript

// Cloudflare Workers
export default {
  async fetch(request: Request, env: Env): Promise<Response> {
    const cache = caches.default;

    // Check cache
    let response = await cache.match(request);
    if (response) {
      return response;
    }

    // Invoke agent
    const { query } = await request.json();
    const result = await agent.invoke({ messages: [{ role: 'user', content: query }] });

    // Create cacheable response
    response = new Response(JSON.stringify(result), {
      headers: {
        'Content-Type': 'application/json',
        'Cache-Control': 'public, max-age=3600'
      }
    });

    // Store in cache
    await cache.put(request, response.clone());

    return response;
  }
};

Database Integration

PostgreSQL with Connection Pooling

typescript

import { Pool } from 'pg';

const pool = new Pool({
  connectionString: process.env.DATABASE_URL,
  max: 20,  // Maximum pool size
  idleTimeoutMillis: 30000,
  connectionTimeoutMillis: 2000
});

// Store agent results
async function saveAgentResult(userId: string, query: string, result: any) {
  const client = await pool.connect();

  try {
    await client.query(
      'INSERT INTO agent_results (user_id, query, result, created_at) VALUES ($1, $2, $3, NOW())',
      [userId, query, JSON.stringify(result)]
    );
  } finally {
    client.release();
  }
}

// Retrieve history
async function getAgentHistory(userId: string, limit: number = 10) {
  const result = await pool.query(
    'SELECT * FROM agent_results WHERE user_id = $1 ORDER BY created_at DESC LIMIT $2',
    [userId, limit]
  );

  return result.rows;
}

MongoDB for Unstructured Data

typescript

import { MongoClient } from 'mongodb';

const client = new MongoClient(process.env.MONGODB_URL!);
await client.connect();

const db = client.db('agentforge');
const results = db.collection('agent_results');

// Create indexes
await results.createIndex({ userId: 1, createdAt: -1 });
await results.createIndex({ query: 'text' });

// Store result
async function saveResult(data: any) {
  await results.insertOne({
    ...data,
    createdAt: new Date()
  });
}

// Search results
async function searchResults(userId: string, searchQuery: string) {
  return await results.find({
    userId,
    $text: { $search: searchQuery }
  }).toArray();
}

Health Checks

Comprehensive Health Endpoint

typescript

import express from 'express';

const app = express();

app.get('/health', async (req, res) => {
  const health = {
    status: 'healthy',
    timestamp: new Date().toISOString(),
    uptime: process.uptime(),
    checks: {
      database: await checkDatabase(),
      redis: await checkRedis(),
      llm: await checkLLM()
    }
  };

  const isHealthy = Object.values(health.checks).every(check => check.status === 'ok');

  res.status(isHealthy ? 200 : 503).json(health);
});

async function checkDatabase(): Promise<{ status: string; latency?: number }> {
  const start = Date.now();
  try {
    await pool.query('SELECT 1');
    return { status: 'ok', latency: Date.now() - start };
  } catch (error) {
    return { status: 'error' };
  }
}

async function checkRedis(): Promise<{ status: string; latency?: number }> {
  const start = Date.now();
  try {
    await redis.ping();
    return { status: 'ok', latency: Date.now() - start };
  } catch (error) {
    return { status: 'error' };
  }
}

async function checkLLM(): Promise<{ status: string; latency?: number }> {
  const start = Date.now();
  try {
    await llm.invoke([{ role: 'user', content: 'test' }]);
    return { status: 'ok', latency: Date.now() - start };
  } catch (error) {
    return { status: 'error' };
  }
}

Readiness Probe

typescript

app.get('/ready', async (req, res) => {
  // Check if application is ready to serve traffic
  const ready = {
    initialized: agentInitialized,
    database: await checkDatabase(),
    cache: await checkRedis()
  };

  const isReady = Object.values(ready).every(check =>
    typeof check === 'boolean' ? check : check.status === 'ok'
  );

  res.status(isReady ? 200 : 503).json(ready);
});

Security

API Authentication

typescript

import jwt from 'jsonwebtoken';

// JWT middleware
function authenticateToken(req: Request, res: Response, next: NextFunction) {
  const authHeader = req.headers['authorization'];
  const token = authHeader && authHeader.split(' ')[1];

  if (!token) {
    return res.sendStatus(401);
  }

  jwt.verify(token, process.env.JWT_SECRET!, (err, user) => {
    if (err) return res.sendStatus(403);
    req.user = user;
    next();
  });
}

app.post('/api/agent', authenticateToken, async (req, res) => {
  const result = await agent.invoke(req.body);
  res.json(result);
});

Rate Limiting

typescript

import rateLimit from 'express-rate-limit';

const limiter = rateLimit({
  windowMs: 15 * 60 * 1000,  // 15 minutes
  max: 100,  // Limit each IP to 100 requests per windowMs
  message: 'Too many requests from this IP',
  standardHeaders: true,
  legacyHeaders: false
});

app.use('/api/', limiter);

// Per-user rate limiting
const userLimiter = rateLimit({
  windowMs: 60 * 1000,  // 1 minute
  max: async (req) => {
    // Different limits based on user tier
    const user = await getUserTier(req.user.id);
    return user.tier === 'premium' ? 100 : 10;
  },
  keyGenerator: (req) => req.user.id
});

app.use('/api/agent', authenticateToken, userLimiter);

Input Validation

typescript

import { z } from 'zod';

const agentRequestSchema = z.object({
  query: z.string().min(1).max(1000),
  options: z.object({
    maxTokens: z.number().min(100).max(10000).optional(),
    temperature: z.number().min(0).max(2).optional()
  }).optional()
});

app.post('/api/agent', async (req, res) => {
  try {
    const validated = agentRequestSchema.parse(req.body);
    const result = await agent.invoke({
      messages: [{ role: 'user', content: validated.query }]
    });
    res.json(result);
  } catch (error) {
    if (error instanceof z.ZodError) {
      return res.status(400).json({ error: error.errors });
    }
    throw error;
  }
});

Error Handling

Global Error Handler

typescript

import { ErrorRequestHandler } from 'express';

const errorHandler: ErrorRequestHandler = (err, req, res, next) => {
  console.error('Error:', err);

  // Log to monitoring service
  logger.error('Request failed', {
    error: err.message,
    stack: err.stack,
    path: req.path,
    method: req.method,
    userId: req.user?.id
  });

  // Send appropriate response
  if (err.name === 'ValidationError') {
    return res.status(400).json({ error: 'Invalid input' });
  }

  if (err.name === 'UnauthorizedError') {
    return res.status(401).json({ error: 'Unauthorized' });
  }

  if (err.name === 'RateLimitError') {
    return res.status(429).json({ error: 'Rate limit exceeded' });
  }

  // Generic error
  res.status(500).json({
    error: process.env.NODE_ENV === 'production'
      ? 'Internal server error'
      : err.message
  });
};

app.use(errorHandler);

Graceful Shutdown

typescript

let server: any;

async function gracefulShutdown(signal: string) {
  console.log(`${signal} received, starting graceful shutdown`);

  // Stop accepting new requests
  server.close(() => {
    console.log('HTTP server closed');
  });

  // Close database connections
  await pool.end();
  console.log('Database pool closed');

  // Close Redis connection
  await redis.quit();
  console.log('Redis connection closed');

  // Wait for ongoing requests to complete (max 30s)
  setTimeout(() => {
    console.log('Forcing shutdown');
    process.exit(0);
  }, 30000);
}

process.on('SIGTERM', () => gracefulShutdown('SIGTERM'));
process.on('SIGINT', () => gracefulShutdown('SIGINT'));

server = app.listen(3000, () => {
  console.log('Server started on port 3000');
});

CI/CD Pipeline

GitHub Actions

yaml

# .github/workflows/deploy.yml
name: Deploy to Production

on:
  push:
    branches: [main]

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3

      - name: Setup Node.js
        uses: actions/setup-node@v3
        with:
          node-version: '18'

      - name: Install dependencies
        run: npm ci

      - name: Run tests
        run: npm test

      - name: Run linter
        run: npm run lint

  build:
    needs: test
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3

      - name: Build Docker image
        run: docker build -t agentforge-api:${{ github.sha }} .

      - name: Push to registry
        run: |
          echo ${{ secrets.DOCKER_PASSWORD }} | docker login -u ${{ secrets.DOCKER_USERNAME }} --password-stdin
          docker push agentforge-api:${{ github.sha }}

  deploy:
    needs: build
    runs-on: ubuntu-latest
    steps:
      - name: Deploy to Kubernetes
        run: |
          kubectl set image deployment/agentforge-api \
            api=agentforge-api:${{ github.sha }}
          kubectl rollout status deployment/agentforge-api

Monitoring in Production

Application Metrics

typescript

import { register, collectDefaultMetrics } from 'prom-client';

// Collect default metrics
collectDefaultMetrics({ prefix: 'agentforge_' });

// Expose metrics endpoint
app.get('/metrics', async (req, res) => {
  res.set('Content-Type', register.contentType);
  res.end(await register.metrics());
});

Logging

Built-in Logger (Recommended):

typescript

import { createLogger, LogLevel } from '@agentforge/core';

// Use environment variable for log level
const logLevel = (process.env.LOG_LEVEL?.toLowerCase() as LogLevel) || LogLevel.INFO;

const logger = createLogger('production-agent', {
  level: logLevel,
  format: 'json', // JSON format for log aggregation
  includeTimestamp: true,
  includeContext: true
});

// Use throughout your application
logger.info('Server started', { port: 3000, env: process.env.NODE_ENV });
logger.error('Request failed', { error: err.message, requestId });

Advanced Logging with Winston (for remote logging):

typescript

import winston from 'winston';
import { Logtail } from '@logtail/node';

const logtail = new Logtail(process.env.LOGTAIL_TOKEN!);

const logger = winston.createLogger({
  level: process.env.LOG_LEVEL || 'info',
  format: winston.format.combine(
    winston.format.timestamp(),
    winston.format.json()
  ),
  transports: [
    new winston.transports.Console(),
    new winston.transports.File({ filename: 'error.log', level: 'error' }),
    new winston.transports.File({ filename: 'combined.log' }),
    new winston.transports.Stream({ stream: logtail })
  ]
});

Best Practices

1. Use Environment-Specific Configurations

typescript

const config = {
  development: {
    logLevel: 'debug',
    cacheEnabled: false,
    maxConcurrentAgents: 2
  },
  production: {
    logLevel: 'info',
    cacheEnabled: true,
    maxConcurrentAgents: 20
  }
}[process.env.NODE_ENV || 'development'];

2. Implement Circuit Breakers

typescript

import CircuitBreaker from 'opossum';

const breaker = new CircuitBreaker(agent.invoke, {
  timeout: 30000,  // 30 seconds
  errorThresholdPercentage: 50,
  resetTimeout: 30000
});

breaker.fallback(() => ({ error: 'Service temporarily unavailable' }));

app.post('/api/agent', async (req, res) => {
  const result = await breaker.fire(req.body);
  res.json(result);
});

3. Use Blue-Green Deployments

Deploy new versions without downtime:

yaml

# Deploy green version
kubectl apply -f deployment-green.yaml

# Wait for green to be ready
kubectl wait --for=condition=available deployment/agentforge-api-green

# Switch traffic to green
kubectl patch service agentforge-api -p '{"spec":{"selector":{"version":"green"}}}'

# Remove blue version
kubectl delete deployment agentforge-api-blue

4. Implement Canary Releases

Gradually roll out new versions:

yaml

apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: agentforge-api
spec:
  hosts:
  - agentforge-api
  http:
  - match:
    - headers:
        canary:
          exact: "true"
    route:
    - destination:
        host: agentforge-api
        subset: v2
  - route:
    - destination:
        host: agentforge-api
        subset: v1
      weight: 90
    - destination:
        host: agentforge-api
        subset: v2
      weight: 10

Next Steps

Monitoring - Production monitoring
Resource Management - Optimize resources
Streaming - Real-time deployment
Production Deployment Tutorial - Step-by-step deployment guide

Deployment Strategies ​

Overview ​

Deployment Architectures ​

1. Serverless Deployment ​

2. Container Deployment ​

3. Hybrid Deployment ​

Environment Configuration ​

Environment Variables ​

Secrets Management ​

Load Balancing ​

Application Load Balancer ​

Queue-Based Load Balancing ​

Caching Layer ​

Redis Caching ​

CDN Integration ​

Database Integration ​

PostgreSQL with Connection Pooling ​

MongoDB for Unstructured Data ​

Health Checks ​

Comprehensive Health Endpoint ​

Readiness Probe ​

Security ​

API Authentication ​

Rate Limiting ​

Input Validation ​

Error Handling ​

Global Error Handler ​

Graceful Shutdown ​

CI/CD Pipeline ​

GitHub Actions ​

Monitoring in Production ​

Application Metrics ​

Logging ​

Best Practices ​

1. Use Environment-Specific Configurations ​

2. Implement Circuit Breakers ​

3. Use Blue-Green Deployments ​

4. Implement Canary Releases ​

Next Steps ​

Further Reading ​

Deployment Strategies

Overview

Deployment Architectures

1. Serverless Deployment

2. Container Deployment

3. Hybrid Deployment

Environment Configuration

Environment Variables

Secrets Management

Load Balancing

Application Load Balancer

Queue-Based Load Balancing

Caching Layer

Redis Caching

CDN Integration

Database Integration

PostgreSQL with Connection Pooling

MongoDB for Unstructured Data

Health Checks

Comprehensive Health Endpoint

Readiness Probe

Security

API Authentication

Rate Limiting

Input Validation

Error Handling

Global Error Handler

Graceful Shutdown

CI/CD Pipeline

GitHub Actions

Monitoring in Production

Application Metrics

Logging

Best Practices

1. Use Environment-Specific Configurations

2. Implement Circuit Breakers

3. Use Blue-Green Deployments

4. Implement Canary Releases

Next Steps

Further Reading