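Health Checks and Terminus

In modern distributed systems and microservice architectures, health checks are an essential component. They let operations teams monitor application state, confirm that services are running, and respond quickly when something fails. NestJS provides health checks through the @nestjs/terminus package, which makes it straightforward to expose a /health endpoint and verify that dependencies such as databases, Redis, and external APIs are available. This article walks through building a comprehensive health check setup with Terminus.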

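1. Health Check Fundamentals

1.1 What Is a Health Check?

A health check is a monitoring mechanism that verifies whether an application and the services it depends on are working correctly: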

typescript
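// Types of health checks
// 1. Liveness: is the application process running?
// 2. Readiness: is the application ready to receive traffic?
// 3. Startup: has the application finished starting?

// Statuses reported by a Terminus endpoint
// - "ok" (HTTP 200): every indicator reported "up"
// - "error" (HTTP 503): at least one indicator reported "down"
// - "shutting_down" (HTTP 503): the application is shutting down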
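
To make the relationship between individual indicators and the endpoint's overall answer concrete, here is a minimal sketch of the aggregation rule: one `down` indicator is enough to turn the whole response into an error with HTTP 503. The types and the `summarize` function are hypothetical and only modeled on the JSON a Terminus endpoint returns; they are not the library's internals.

```typescript
// Hypothetical types modeled on the JSON body of a Terminus endpoint.
type IndicatorStatus = 'up' | 'down';
type IndicatorResult = Record<string, { status: IndicatorStatus; message?: string }>;

interface HealthSummary {
  status: 'ok' | 'error';
  httpStatus: 200 | 503;
  failing: string[];
}

// Collapse per-indicator results into one overall status and HTTP code.
function summarize(results: IndicatorResult[]): HealthSummary {
  const failing: string[] = [];
  for (const result of results) {
    for (const name of Object.keys(result)) {
      if (result[name]?.status === 'down') {
        failing.push(name);
      }
    }
  }
  const healthy = failing.length === 0;
  return {
    status: healthy ? 'ok' : 'error',
    httpStatus: healthy ? 200 : 503,
    failing,
  };
}

const summary = summarize([
  { database: { status: 'up' } },
  { redis: { status: 'down', message: 'connection refused' } },
]);
console.log(summary.status, summary.httpStatus, summary.failing);
```

Load balancers and orchestrators usually look only at the HTTP status code, so the 200/503 split matters more in practice than the JSON body.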

1.2 Introducing Terminus

Terminus is the health check library for NestJS. It provides the following:

typescript
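// Main features of Terminus
// 1. Built-in health indicators: databases (TypeORM, Prisma, and others), HTTP, disk, memory,
//    and microservice transports such as Redis
// 2. Custom health checks: plug in your own check logic
// 3. Multiple endpoints: expose several health check endpoints
// 4. Detailed output: per-indicator status information in the response
// 5. Kubernetes integration: endpoints that work as Kubernetes probes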

2. Basic Terminus Setup

2.1 Installation and Configuration

bash
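# Install Terminus; HttpHealthIndicator additionally needs @nestjs/axios and axios
npm install @nestjs/terminus @nestjs/axios axios

typescript
// app.module.ts
import { Module } from '@nestjs/common';
import { HttpModule } from '@nestjs/axios';
import { TerminusModule } from '@nestjs/terminus';
import { HealthController } from './health/health.controller';

@Module({
  imports: [
    TerminusModule,
    HttpModule, // required by HttpHealthIndicator
    // other modules...
  ],
  controllers: [HealthController],
})
export class AppModule {}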

2.2 A Basic Health Check Controller

typescript
// health/health.controller.ts
import { Controller, Get } from '@nestjs/common';
import { HealthCheck, HealthCheckService, HttpHealthIndicator } from '@nestjs/terminus';

@Controller('health')
export class HealthController {
  constructor(
    private readonly health: HealthCheckService,
    private readonly http: HttpHealthIndicator,
  ) {}
  
  @Get()
  @HealthCheck()
  async check() {
    return await this.health.check([
      () => this.http.pingCheck('nestjs-docs', 'https://docs.nestjs.com'),
    ]);
  }
}

3. Built-in Health Indicators

3.1 Database Health Checks

typescript
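// Database health checks
import { Controller, Get } from '@nestjs/common';
import {
  HealthCheck,
  HealthCheckService,
  TypeOrmHealthIndicator,
  PrismaHealthIndicator,
} from '@nestjs/terminus';
// Assumption: PrismaService is your own injectable that extends PrismaClient
import { PrismaService } from '../prisma/prisma.service';

@Controller('health')
export class HealthController {
  constructor(
    private readonly health: HealthCheckService,
    private readonly db: TypeOrmHealthIndicator,
    private readonly prisma: PrismaHealthIndicator,
    private readonly prismaService: PrismaService,
  ) {}

  @Get()
  @HealthCheck()
  async check() {
    return await this.health.check([
      // TypeORM check: pings the default connection
      () => this.db.pingCheck('database'),

      // Prisma check: the client instance must be passed explicitly
      () => this.prisma.pingCheck('prisma', this.prismaService),
    ]);
  }
}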

3.2 Redis Health Checks

typescript
// Redis health checks
// Terminus ships no dedicated Redis indicator. Redis is reached through
// MicroserviceHealthIndicator, which needs @nestjs/microservices and ioredis installed.
import { Injectable } from '@nestjs/common';
import { RedisOptions, Transport } from '@nestjs/microservices';
import {
  HealthIndicatorFunction,
  MicroserviceHealthIndicator,
} from '@nestjs/terminus';

@Injectable()
export class HealthService {
  constructor(private readonly microservice: MicroserviceHealthIndicator) {}

  getRedisChecks(): HealthIndicatorFunction[] {
    return [
      () => this.microservice.pingCheck<RedisOptions>('redis-cache', {
        transport: Transport.REDIS,
        options: {
          host: process.env.REDIS_HOST || 'localhost',
          port: parseInt(process.env.REDIS_PORT ?? '', 10) || 6379,
        },
      }),

      () => this.microservice.pingCheck<RedisOptions>('redis-session', {
        transport: Transport.REDIS,
        options: {
          host: process.env.REDIS_SESSION_HOST || 'localhost',
          port: parseInt(process.env.REDIS_SESSION_PORT ?? '', 10) || 6380,
        },
      }),
    ];
  }
}
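
Reading a port from the environment is a small but easy place to go wrong, because the variable may be unset, malformed, or out of range. The following is a minimal sketch of a stricter parser. `parsePort` is a hypothetical helper, not part of Terminus or NestJS.

```typescript
// Parse a TCP port from an environment-style string, falling back when the
// value is missing, not a number, or outside the valid port range.
function parsePort(raw: string | undefined, fallback: number): number {
  const parsed = Number.parseInt(raw ?? '', 10);
  const valid = Number.isInteger(parsed) && parsed > 0 && parsed <= 65535;
  return valid ? parsed : fallback;
}

console.log(parsePort('6380', 6379));
console.log(parsePort(undefined, 6379));
console.log(parsePort('not-a-port', 6379));
```

Falling back silently is convenient for local development. In production it may be preferable to fail fast on a malformed value instead, so that a typo in the deployment config does not end up pointing the check at the wrong Redis instance.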

3.3 HTTP Health Checks

typescript
// HTTP health checks (requires HttpModule from @nestjs/axios in the module imports)
import { Controller, Get } from '@nestjs/common';
import { HealthCheck, HealthCheckService, HttpHealthIndicator } from '@nestjs/terminus';

@Controller('health')
export class HealthController {
  constructor(
    private readonly health: HealthCheckService,
    private readonly http: HttpHealthIndicator,
  ) {}

  @Get()
  @HealthCheck()
  async check() {
    return await this.health.check([
      // Check an external API
      () => this.http.pingCheck('google', 'https://google.com'),

      // Check an internal service
      () => this.http.pingCheck('auth-service', 'http://auth-service:3000/health'),

      // Custom timeout and accepted status codes (options are passed through to axios)
      () => this.http.pingCheck('api-docs', 'https://api.example.com/docs', {
        timeout: 5000,
        validateStatus: (status) => [200, 201, 301, 302].includes(status),
      }),
    ]);
  }
}
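
Axios-style HTTP clients decide whether a response counts as a success through a `validateStatus` predicate that receives the status code. As a small illustration, the sketch below builds such a predicate from an allow-list of codes. `acceptStatuses` is a hypothetical helper name introduced here for the example.

```typescript
// Build a predicate that accepts only the listed HTTP status codes.
// The returned function has the shape axios expects for `validateStatus`.
function acceptStatuses(allowed: readonly number[]): (status: number) => boolean {
  const allowedSet = new Set(allowed);
  return (status) => allowedSet.has(status);
}

const isAcceptable = acceptStatuses([200, 201, 301, 302]);
console.log(isAcceptable(200));
console.log(isAcceptable(503));
```

Keeping the allow-list explicit makes it obvious which responses the check treats as healthy. A redirect to a login page, for example, would only pass if 301 or 302 is deliberately listed.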

3.4 Disk Space Health Checks

typescript
// Disk space health checks
import { Controller, Get } from '@nestjs/common';
import { DiskHealthIndicator, HealthCheck, HealthCheckService } from '@nestjs/terminus';

@Controller('health')
export class HealthController {
  constructor(
    private readonly health: HealthCheckService,
    private readonly disk: DiskHealthIndicator,
  ) {}

  @Get()
  @HealthCheck()
  async check() {
    return await this.health.check([
      // Check overall disk usage
      () => this.disk.checkStorage('disk', {
        path: '/',
        thresholdPercent: 0.9, // healthy while at most 90% of the disk is used
      }),

      // Check a specific directory
      () => this.disk.checkStorage('tmp-directory', {
        path: '/tmp',
        thresholdPercent: 0.8, // healthy while at most 80% is used
      }),
    ]);
  }
}
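
The `thresholdPercent` values above are fractions between 0 and 1, not percentages. As a worked example of the arithmetic, the sketch below assumes the check compares used space divided by total size against the threshold. The `DiskUsage` type and both functions are hypothetical and are not the library's implementation.

```typescript
// Hypothetical snapshot of a volume, in bytes.
interface DiskUsage {
  sizeBytes: number;
  freeBytes: number;
}

// Fraction of the volume that is in use, between 0 and 1.
function usedRatio(usage: DiskUsage): number {
  if (usage.sizeBytes <= 0) {
    throw new RangeError('sizeBytes must be positive');
  }
  return (usage.sizeBytes - usage.freeBytes) / usage.sizeBytes;
}

// Healthy while the used fraction does not exceed the threshold.
function isBelowThreshold(usage: DiskUsage, thresholdPercent: number): boolean {
  return usedRatio(usage) <= thresholdPercent;
}

const GIB = 1024 * 1024 * 1024;
const volume: DiskUsage = { sizeBytes: 100 * GIB, freeBytes: 15 * GIB };

console.log(usedRatio(volume));
console.log(isBelowThreshold(volume, 0.9));
console.log(isBelowThreshold(volume, 0.8));
```

With 15 GiB free out of 100 GiB, 85% of the volume is in use, so the volume passes a 0.9 threshold and fails a 0.8 threshold.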

4. Custom Health Checks

4.1 A Custom Health Indicator

typescript
// Custom health indicator
import { Injectable } from '@nestjs/common';
import { ConfigService } from '@nestjs/config';
import {
  HealthCheckError,
  HealthIndicator,
  HealthIndicatorResult,
} from '@nestjs/terminus';

@Injectable()
export class CustomHealthIndicator extends HealthIndicator {
  constructor(private readonly configService: ConfigService) {
    super();
  }

  async isHealthy(key: string): Promise<HealthIndicatorResult> {
    try {
      // Run the custom check logic
      const isConfigValid = this.validateConfiguration();
      const customMetric = await this.getCustomMetric();

      if (!isConfigValid) {
        throw new Error('Configuration validation failed');
      }

      return this.getStatus(key, true, {
        configValid: isConfigValid,
        customMetric,
        timestamp: new Date().toISOString(),
      });
    } catch (error) {
      const message = error instanceof Error ? error.message : String(error);
      // Throw HealthCheckError so Terminus reports this indicator as "down"
      throw new HealthCheckError(
        message,
        this.getStatus(key, false, {
          message,
          timestamp: new Date().toISOString(),
        }),
      );
    }
  }

  private validateConfiguration(): boolean {
    // Verify that the critical configuration keys are set
    const requiredConfigs = [
      'DATABASE_URL',
      'JWT_SECRET',
      'API_KEY',
    ];

    return requiredConfigs.every(config =>
      this.configService.get(config) !== undefined
    );
  }

  private async getCustomMetric(): Promise<number> {
    // Placeholder: replace with a real metric such as active users or queue length
    return Math.floor(Math.random() * 1000);
  }
}
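
A boolean "configuration is valid" result tells an operator that something is wrong but not what. One possible refinement, sketched below, is to report which required keys are missing so they can be included in the indicator's details. `findMissingKeys` and `ConfigLookup` are hypothetical names, and unlike the check above this sketch also treats empty strings as missing.

```typescript
// Any function that resolves a config key, e.g. a wrapper around ConfigService.get.
type ConfigLookup = (key: string) => string | undefined;

// Return the required keys that are unset or blank.
function findMissingKeys(required: readonly string[], lookup: ConfigLookup): string[] {
  return required.filter((key) => {
    const value = lookup(key);
    return value === undefined || value.trim() === '';
  });
}

// Illustrative values only; real code would read from the config service.
const fakeEnv: Record<string, string | undefined> = {
  DATABASE_URL: 'postgres://localhost/app',
  JWT_SECRET: '',
};

const missing = findMissingKeys(
  ['DATABASE_URL', 'JWT_SECRET', 'API_KEY'],
  (key) => fakeEnv[key],
);
console.log(missing);
```

Only key names are reported, never values, so the health endpoint does not leak secrets.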

4.2 Business Logic Health Checks

typescript
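// Business logic health checks
import { Injectable } from '@nestjs/common';
import { HealthCheckError, HealthIndicator, HealthIndicatorResult } from '@nestjs/terminus';
// Assumption: UserService and OrderService are application services; adjust the paths
import { UserService } from '../user/user.service';
import { OrderService } from '../order/order.service';

@Injectable()
export class BusinessHealthIndicator extends HealthIndicator {
  constructor(
    private readonly userService: UserService,
    private readonly orderService: OrderService,
  ) {
    super();
  }

  // Report "down" by throwing HealthCheckError, which Terminus turns into a 503
  private fail(key: string, message: string, details: Record<string, unknown> = {}): never {
    throw new HealthCheckError(message, this.getStatus(key, false, { message, ...details }));
  }

  async checkUserService(key: string): Promise<HealthIndicatorResult> {
    try {
      // Exercise the critical business functions
      const userCount = await this.userService.getActiveUserCount();
      const canCreateUser = await this.userService.testUserCreation();

      if (userCount < 0 || !canCreateUser) {
        throw new Error('User service health check failed');
      }

      return this.getStatus(key, true, {
        activeUsers: userCount,
        createUserTest: canCreateUser,
      });
    } catch (error) {
      return this.fail(key, error instanceof Error ? error.message : String(error));
    }
  }

  async checkOrderService(key: string): Promise<HealthIndicatorResult> {
    let pendingOrders: number;
    let canProcessOrder: boolean;
    try {
      // Check order processing capacity
      pendingOrders = await this.orderService.getPendingOrderCount();
      canProcessOrder = await this.orderService.testOrderProcessing();
    } catch (error) {
      return this.fail(key, error instanceof Error ? error.message : String(error));
    }

    // Alert thresholds
    const warningThreshold = 1000;
    const criticalThreshold = 5000;
    const details = { pendingOrders, processOrderTest: canProcessOrder };

    if (pendingOrders > criticalThreshold) {
      return this.fail(key, 'Critical: Too many pending orders', details);
    }
    if (pendingOrders > warningThreshold) {
      // A warning keeps the indicator "up" but surfaces the message
      return this.getStatus(key, true, {
        ...details,
        message: 'Warning: High number of pending orders',
      });
    }
    return this.getStatus(key, true, details);
  }
}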
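
The order check above uses two thresholds, where only the critical one marks the service as unhealthy. The classification itself is easy to isolate and test on its own. The following is a minimal sketch under the same threshold values; `Severity` and `classifyBacklog` are hypothetical names.

```typescript
type Severity = 'ok' | 'warning' | 'critical';

// Classify a pending-order backlog. Thresholds are exclusive: a backlog must
// exceed a threshold, not merely reach it, to move to the next level.
function classifyBacklog(
  pendingOrders: number,
  warningThreshold = 1000,
  criticalThreshold = 5000,
): Severity {
  if (pendingOrders > criticalThreshold) {
    return 'critical';
  }
  if (pendingOrders > warningThreshold) {
    return 'warning';
  }
  return 'ok';
}

console.log(classifyBacklog(250));
console.log(classifyBacklog(1200));
console.log(classifyBacklog(8000));
```

Only `critical` would map to a failing indicator. `warning` is better routed to logs or metrics, because failing a readiness probe on a soft limit removes capacity exactly when the backlog is growing.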

5. Multiple Health Check Endpoints

5.1 Separate Endpoints

typescript
// Multiple health check endpoints
import { Controller, Get } from '@nestjs/common';
import { RedisOptions, Transport } from '@nestjs/microservices';
import {
  HealthCheck,
  HealthCheckService,
  HttpHealthIndicator,
  MicroserviceHealthIndicator,
  TypeOrmHealthIndicator,
} from '@nestjs/terminus';

@Controller('health')
export class HealthController {
  constructor(
    private readonly health: HealthCheckService,
    private readonly db: TypeOrmHealthIndicator,
    private readonly microservice: MicroserviceHealthIndicator,
    private readonly http: HttpHealthIndicator,
  ) {}

  // Redis is checked through the microservice indicator
  private redisCheck(key: string) {
    return this.microservice.pingCheck<RedisOptions>(key, {
      transport: Transport.REDIS,
      options: {
        host: process.env.REDIS_HOST || 'localhost',
        port: parseInt(process.env.REDIS_PORT ?? '', 10) || 6379,
      },
    });
  }

  // GET /health: basic check of the application itself
  @Get()
  @HealthCheck()
  async healthCheck() {
    return await this.health.check([]);
  }

  // GET /health/ready: readiness check covering the dependencies
  @Get('ready')
  @HealthCheck()
  async readinessCheck() {
    return await this.health.check([
      () => this.db.pingCheck('database'),
      () => this.redisCheck('redis'),
    ]);
  }

  // GET /health/alive: liveness check covering only critical services
  @Get('alive')
  @HealthCheck()
  async livenessCheck() {
    return await this.health.check([
      () => this.http.pingCheck('internal-service', 'http://localhost:3001/health'),
    ]);
  }

  // GET /health/detail: detailed check covering every item
  @Get('detail')
  @HealthCheck()
  async detailedHealthCheck() {
    return await this.health.check([
      () => this.db.pingCheck('database'),
      () => this.redisCheck('redis-cache'),
      () => this.http.pingCheck('external-api', 'https://api.external.com/health'),
      () => this.http.pingCheck('internal-service', 'http://internal-service:3000/health'),
    ]);
  }
}

5.2 Conditional Health Checks

typescript
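// Conditional health checks
import { Controller, Get, Query } from '@nestjs/common';
import { ConfigService } from '@nestjs/config';
import { RedisOptions, Transport } from '@nestjs/microservices';
import {
  HealthCheck,
  HealthCheckService,
  HealthIndicatorFunction,
  HttpHealthIndicator,
  MicroserviceHealthIndicator,
  TypeOrmHealthIndicator,
} from '@nestjs/terminus';

@Controller('health')
export class HealthController {
  constructor(
    private readonly health: HealthCheckService,
    private readonly db: TypeOrmHealthIndicator,
    private readonly microservice: MicroserviceHealthIndicator,
    private readonly http: HttpHealthIndicator,
    private readonly configService: ConfigService,
  ) {}

  @Get()
  @HealthCheck()
  async check(@Query('level') level: 'basic' | 'detailed' = 'basic') {
    const checks: HealthIndicatorFunction[] = [];

    // Basic check (always runs)
    checks.push(() => this.db.pingCheck('database'));

    // Detailed checks (only when ?level=detailed is passed)
    if (level === 'detailed') {
      if (this.configService.get('REDIS_ENABLED') === 'true') {
        checks.push(() => this.microservice.pingCheck<RedisOptions>('redis', {
          transport: Transport.REDIS,
          options: {
            host: process.env.REDIS_HOST || 'localhost',
            port: parseInt(process.env.REDIS_PORT ?? '', 10) || 6379,
          },
        }));
      }

      // Skip the external check when no URL is configured
      const externalApiUrl = this.configService.get<string>('EXTERNAL_API_URL');
      if (this.configService.get('EXTERNAL_API_HEALTH_CHECK') === 'true' && externalApiUrl) {
        checks.push(() => this.http.pingCheck('external-api', externalApiUrl));
      }
    }

    return await this.health.check(checks);
  }
}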
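
The selection logic in this controller can be separated from the framework, which makes it easier to unit test. The sketch below is one possible way to do that under assumed names (`Level`, `CheckFlags`, `parseLevel`, `selectChecks` are all hypothetical). It also normalizes the query parameter, since a client can send any string regardless of the TypeScript type on the handler.

```typescript
type Level = 'basic' | 'detailed';

interface CheckFlags {
  redisEnabled: boolean;
  externalApiEnabled: boolean;
}

// Query parameters arrive as arbitrary strings, so anything other than
// "detailed" is treated as "basic".
function parseLevel(raw: string | undefined): Level {
  return raw === 'detailed' ? 'detailed' : 'basic';
}

// Decide which named checks should run for a given level and feature flags.
function selectChecks(level: Level, flags: CheckFlags): string[] {
  const checks = ['database'];
  if (level === 'detailed') {
    if (flags.redisEnabled) {
      checks.push('redis');
    }
    if (flags.externalApiEnabled) {
      checks.push('external-api');
    }
  }
  return checks;
}

const flags: CheckFlags = { redisEnabled: true, externalApiEnabled: false };
console.log(selectChecks(parseLevel('detailed'), flags));
console.log(selectChecks(parseLevel('verbose'), flags));
```

A controller could map each returned name to the corresponding indicator call, keeping the decision logic free of injected dependencies.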

6. Kubernetes Integration

6.1 Configuring Kubernetes Probes

yaml
# Kubernetes Deployment configuration
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nestjs-app
spec:
  replicas: 3
  selector:
    matchLabels:
      app: nestjs-app
  template:
    metadata:
      labels:
        app: nestjs-app
    spec:
      containers:
      - name: nestjs-app
        image: nestjs-app:latest
        ports:
        - containerPort: 3000
        # Liveness probe
        livenessProbe:
          httpGet:
            path: /health/alive
            port: 3000
          initialDelaySeconds: 30
          periodSeconds: 10
          timeoutSeconds: 5
          failureThreshold: 3
        # Readiness probe
        readinessProbe:
          httpGet:
            path: /health/ready
            port: 3000
          initialDelaySeconds: 5
          periodSeconds: 5
          timeoutSeconds: 3
          failureThreshold: 3
        # Startup probe (Kubernetes 1.18+)
        startupProbe:
          httpGet:
            path: /health
            port: 3000
          initialDelaySeconds: 10
          periodSeconds: 5
          timeoutSeconds: 3
          failureThreshold: 30
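
The probe settings above translate into time budgets that are worth computing explicitly. As a rough approximation, a probe that never succeeds is declared failed after about `initialDelaySeconds + periodSeconds * failureThreshold` seconds. This ignores `timeoutSeconds` and scheduling jitter, so treat the numbers as estimates. The `ProbeTiming` type and the function name are hypothetical.

```typescript
interface ProbeTiming {
  initialDelaySeconds: number;
  periodSeconds: number;
  failureThreshold: number;
}

// Approximate seconds until a probe that never succeeds is declared failed.
function worstCaseFailureSeconds(probe: ProbeTiming): number {
  return probe.initialDelaySeconds + probe.periodSeconds * probe.failureThreshold;
}

// Values copied from the Deployment manifest above.
const startup: ProbeTiming = { initialDelaySeconds: 10, periodSeconds: 5, failureThreshold: 30 };
const liveness: ProbeTiming = { initialDelaySeconds: 30, periodSeconds: 10, failureThreshold: 3 };
const readiness: ProbeTiming = { initialDelaySeconds: 5, periodSeconds: 5, failureThreshold: 3 };

console.log(worstCaseFailureSeconds(startup));   // 10 + 5 * 30
console.log(worstCaseFailureSeconds(liveness));  // 30 + 10 * 3
console.log(worstCaseFailureSeconds(readiness)); // 5 + 5 * 3
```

Under these settings the application has roughly 160 seconds to start before the container is restarted. Because liveness and readiness probes only begin once the startup probe has succeeded, the liveness `initialDelaySeconds` of 30 is added on top of the startup time rather than overlapping with it.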

6.2 Environment-Specific Health Checks

typescript
// Environment-specific health checks
import { Injectable } from '@nestjs/common';
import { ConfigService } from '@nestjs/config';
import { HealthCheckError, HealthIndicator, HealthIndicatorResult } from '@nestjs/terminus';

interface SecurityCheckReport {
  passed: boolean;
  checks: { name: string; status: string }[];
}

@Injectable()
export class EnvironmentAwareHealthIndicator extends HealthIndicator {
  constructor(
    private readonly configService: ConfigService,
  ) {
    super();
  }

  async isHealthy(key: string): Promise<HealthIndicatorResult> {
    const environment = this.configService.get<string>('NODE_ENV');

    try {
      // `return await` keeps rejections from the helpers inside this try block
      switch (environment) {
        case 'development':
          return await this.checkDevelopmentEnvironment(key);
        case 'staging':
          return await this.checkStagingEnvironment(key);
        case 'production':
          return await this.checkProductionEnvironment(key);
        default:
          return this.getStatus(key, true);
      }
    } catch (error) {
      const message = error instanceof Error ? error.message : String(error);
      // Throw HealthCheckError so Terminus reports this indicator as "down"
      throw new HealthCheckError(
        message,
        this.getStatus(key, false, { message, environment }),
      );
    }
  }

  private async checkDevelopmentEnvironment(key: string): Promise<HealthIndicatorResult> {
    // Development environment check
    return this.getStatus(key, true, {
      environment: 'development',
      features: ['hot-reload', 'debug-mode'],
    });
  }

  private async checkStagingEnvironment(key: string): Promise<HealthIndicatorResult> {
    // Staging environment check
    return this.getStatus(key, true, {
      environment: 'staging',
      features: ['performance-monitoring'],
    });
  }

  private async checkProductionEnvironment(key: string): Promise<HealthIndicatorResult> {
    // Production environment check
    const securityChecks = await this.performSecurityChecks();
    if (!securityChecks.passed) {
      throw new Error('Security checks failed');
    }
    return this.getStatus(key, true, {
      environment: 'production',
      features: ['ssl-enabled', 'rate-limiting'],
      security: securityChecks,
    });
  }

  private async performSecurityChecks(): Promise<SecurityCheckReport> {
    // Placeholder: replace with real security-related checks
    return {
      passed: true,
      checks: [
        { name: 'ssl-certificate', status: 'valid' },
        { name: 'firewall', status: 'active' },
        { name: 'rate-limiting', status: 'enabled' },
      ],
    };
  }
}

7. Monitoring and Alerting

7.1 Health Check Logging

typescript
// Health check logging
import { Injectable, Logger } from '@nestjs/common';
import { HealthCheckResult } from '@nestjs/terminus';
// Assumption: the application runs on the default Express adapter
import { Request } from 'express';

@Injectable()
export class HealthCheckLogger {
  private readonly logger = new Logger(HealthCheckLogger.name);

  logHealthCheck(result: HealthCheckResult, request: Request) {
    const { status, info, error } = result;

    this.logger.log({
      message: 'Health check performed',
      status,
      endpoint: request.url,
      ip: request.ip,
      userAgent: request.headers['user-agent'],
      timestamp: new Date().toISOString(),
      details: {
        info: Object.keys(info || {}).length,
        errors: Object.keys(error || {}).length,
      },
    });

    // Log the details of any failing indicators
    if (error && Object.keys(error).length > 0) {
      this.logger.error({
        message: 'Health check failed',
        errors: error,
        timestamp: new Date().toISOString(),
      });
    }
  }
}

7.2 Collecting Health Check Metrics

typescript
// Health check metrics collection (uses the prom-client package)
import { Injectable } from '@nestjs/common';
import { Gauge, Histogram } from 'prom-client';

@Injectable()
export class HealthMetricsCollector {
  private readonly gauge = new Gauge({
    name: 'app_health_status',
    help: 'Application health status (1=healthy, 0=unhealthy)',
    labelNames: ['service', 'type'],
  });

  private readonly histogram = new Histogram({
    name: 'health_check_duration_seconds',
    help: 'Duration of health checks in seconds',
    labelNames: ['endpoint', 'status'],
  });

  async collectHealthMetrics(
    endpoint: string,
    status: string,
    duration: number, // in seconds, to match the metric name
    details?: Record<string, { status: string }>,
  ) {
    // Update the overall health gauge
    const isHealthy = status === 'ok' ? 1 : 0;
    this.gauge.set({ service: 'nestjs-app', type: 'overall' }, isHealthy);

    // Record how long the check took
    this.histogram.observe({ endpoint, status }, duration);

    // Record one gauge value per indicator
    if (details) {
      Object.keys(details).forEach(service => {
        const serviceStatus = details[service].status === 'up' ? 1 : 0;
        this.gauge.set({ service: 'nestjs-app', type: service }, serviceStatus);
      });
    }
  }
}
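
The mapping from a health check result to gauge values can be expressed as a pure function, which keeps it testable without a metrics registry. The following is a minimal sketch. `GaugeSample` and `toGaugeSamples` are hypothetical names, and the input shape mirrors the `status` and `details` fields used by the collector above.

```typescript
type IndicatorDetails = Record<string, { status: string }>;

interface GaugeSample {
  type: string;
  value: 0 | 1;
}

// Convert an overall status plus per-indicator details into gauge samples:
// 1 for healthy ("ok" / "up"), 0 for anything else.
function toGaugeSamples(overallStatus: string, details: IndicatorDetails): GaugeSample[] {
  const samples: GaugeSample[] = [
    { type: 'overall', value: overallStatus === 'ok' ? 1 : 0 },
  ];
  for (const name of Object.keys(details)) {
    samples.push({ type: name, value: details[name]?.status === 'up' ? 1 : 0 });
  }
  return samples;
}

const samples = toGaugeSamples('error', {
  database: { status: 'up' },
  redis: { status: 'down' },
});
console.log(samples);
```

Each sample can then be passed to a gauge's `set` call with the `type` label, so an alert can fire on a single dependency without waiting for the overall status to flip.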

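8. Summary

The health check features Terminus provides include:

  1. Built-in indicators: health checks for common dependencies such as databases, HTTP endpoints, disk space, and Redis (through the microservice indicator)
  2. Custom checks: support for implementing your own health check logic
  3. Multiple endpoints: health check endpoints at different levels of detail
  4. Kubernetes integration: endpoints that map directly onto Kubernetes liveness, readiness, and startup probes
  5. Detailed output: per-indicator status information in every response
  6. Monitoring integration: straightforward hooks for logging and metrics collection

Best practices for health checks:

  1. Layered checks: keep liveness, readiness, and startup checks separate
  2. Fast responses: a health check should return quickly
  3. Minimal dependencies: avoid pulling too many dependencies into a health check
  4. Detailed logging: record health check results and error information
  5. Metrics collection: collect health check metrics for monitoring
  6. Environment awareness: adjust the health check strategy for each environment

With well-configured health checks, we can:

  1. Improve system reliability: detect and handle service failures promptly
  2. Improve operational efficiency: automate failure detection and recovery
  3. Improve user experience: reduce the time a service is unavailable
  4. Support container orchestration: integrate with Kubernetes and similar container platforms

In the next article, we will look at configuration management: how ConfigService loads .env files, covering multi-environment configuration, validation, and injection timing.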