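Health Checks and Terminus
In modern distributed systems and microservice architectures, health checks are an essential component. They let operations teams monitor application state, confirm that services are running, and respond promptly when something goes wrong. NestJS provides health checks through the @nestjs/terminus package, which makes it straightforward to expose a /health endpoint and verify the availability of dependencies such as databases, Redis, and external APIs. This article walks through how to build a complete health check setup with Terminus.
1. Health Check Fundamentals
1.1 What Is a Health Check?
A health check is a monitoring mechanism that verifies whether an application and the services it depends on are working properly: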
typescript
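// Types of health checks
// 1. Liveness: is the application running?
// 2. Readiness: is the application ready to receive traffic?
// 3. Startup: has the application finished starting?

// Common health states and the HTTP status code typically returned for each
// - UP (200): the service is healthy
// - DOWN (503): the service is unavailable
// - OUT_OF_SERVICE (503): the service is paused
// - UNKNOWN (503): the state cannot be determined
// Terminus itself reports "up" or "down" per indicator, and "ok", "error",
// or "shutting_down" for the overall result.
1.2 Introduction to Terminus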
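Terminus is the health check library for NestJS. It provides the following:
typescript
// Main features of Terminus
// 1. Built-in health indicators: databases, HTTP, disk, memory, and microservice transports such as Redis
// 2. Custom health checks: you can implement your own check logic
// 3. Multiple endpoints: several health check endpoints can coexist
// 4. Detailed output: each response carries per-indicator status information
// 5. Kubernetes integration: the endpoints can back Kubernetes probes
2. Basic Terminus Configuration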
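2.1 Installation and Setup
bash
# Install Terminus
npm install @nestjs/terminus
# HttpHealthIndicator additionally requires @nestjs/axios and axios
npm install @nestjs/axios axios
typescript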
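// app.module.ts
import { Module } from '@nestjs/common';
import { HttpModule } from '@nestjs/axios';
import { TerminusModule } from '@nestjs/terminus';
import { HealthController } from './health/health.controller';

@Module({
  imports: [
    TerminusModule,
    HttpModule, // needed because the controller below uses HttpHealthIndicator
    // other modules...
  ],
  controllers: [HealthController],
})
export class AppModule {}
2.2 A Basic Health Check Controller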
typescript
// health/health.controller.ts
import { Controller, Get } from '@nestjs/common';
import { HealthCheck, HealthCheckService, HttpHealthIndicator } from '@nestjs/terminus';

@Controller('health')
export class HealthController {
  constructor(
    private readonly health: HealthCheckService,
    private readonly http: HttpHealthIndicator,
  ) {}

  @Get()
  @HealthCheck()
  check() {
    // Reports "up" when the NestJS documentation site responds successfully
    return this.health.check([
      () => this.http.pingCheck('nestjs-docs', 'https://docs.nestjs.com'),
    ]);
  }
}
3. Built-in Health Indicators
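Every indicator contributes one keyed entry to the health response, and the overall status turns to "error" (HTTP 503) as soon as any entry is down. The following is a simplified, dependency-free model of that roll-up. The names `IndicatorResult`, `AggregatedResult`, and `aggregate` are illustrative assumptions and are not the @nestjs/terminus implementation.

```typescript
// Simplified model of how per-indicator results roll up into one response.
// All names here are hypothetical; this is not the Terminus source code.
type IndicatorStatus = 'up' | 'down';
type IndicatorResult = Record<string, { status: IndicatorStatus; [detail: string]: unknown }>;

interface AggregatedResult {
  status: 'ok' | 'error';
  httpStatus: 200 | 503;
  info: IndicatorResult;    // indicators that reported "up"
  error: IndicatorResult;   // indicators that reported "down"
  details: IndicatorResult; // every indicator, regardless of status
}

function aggregate(results: IndicatorResult[]): AggregatedResult {
  const info: IndicatorResult = {};
  const error: IndicatorResult = {};
  for (const result of results) {
    for (const [key, value] of Object.entries(result)) {
      (value.status === 'up' ? info : error)[key] = value;
    }
  }
  const healthy = Object.keys(error).length === 0;
  return {
    status: healthy ? 'ok' : 'error',
    httpStatus: healthy ? 200 : 503,
    info,
    error,
    details: { ...info, ...error },
  };
}

const report = aggregate([
  { database: { status: 'up' } },
  { 'redis-cache': { status: 'down', message: 'connection refused' } },
]);
console.log(report.status, report.httpStatus); // prints "error 503"
```

A single failing dependency is enough to flip the whole endpoint, which is why the choice of indicators per endpoint matters later in this article.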
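3.1 Database Health Checks
typescript
// Database health checks
import { Controller, Get } from '@nestjs/common';
import {
  HealthCheck,
  HealthCheckService,
  TypeOrmHealthIndicator,
  PrismaHealthIndicator,
} from '@nestjs/terminus';
// Assumption: PrismaService is the application's PrismaClient wrapper; the path is illustrative.
import { PrismaService } from '../prisma/prisma.service';

@Controller('health')
export class HealthController {
  constructor(
    private readonly health: HealthCheckService,
    private readonly db: TypeOrmHealthIndicator,
    private readonly prisma: PrismaHealthIndicator,
    private readonly prismaClient: PrismaService,
  ) {}

  @Get()
  @HealthCheck()
  check() {
    return this.health.check([
      // TypeORM health check
      () => this.db.pingCheck('database'),
      // Prisma health check: pingCheck also needs the client instance to query with
      () => this.prisma.pingCheck('prisma', this.prismaClient),
    ]);
  }
}
3.2 Redis Health Check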
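The Redis examples in this article read the port from an environment variable, where a missing or malformed value silently becomes `NaN`. A small helper makes the fallback explicit. This is a sketch only; `parsePort` is a hypothetical name, not part of NestJS or Terminus.

```typescript
// Hypothetical helper: parse a TCP port from an environment variable value,
// falling back to a default when the value is missing or not a valid port.
function parsePort(raw: string | undefined, fallback: number): number {
  const port = Number.parseInt(raw ?? '', 10);
  return Number.isInteger(port) && port > 0 && port <= 65535 ? port : fallback;
}

console.log(parsePort('6380', 6379));    // prints 6380
console.log(parsePort(undefined, 6379)); // prints 6379
console.log(parsePort('redis', 6379));   // prints 6379
```

In the examples that follow, a call such as `parsePort(process.env.REDIS_PORT, 6379)` could stand in for the inline `parseInt(...)` expressions.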
typescript
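// Redis health check
// @nestjs/terminus does not export a RedisHealthIndicator. This version uses its
// MicroserviceHealthIndicator with the Redis transport instead, which requires
// @nestjs/microservices and ioredis to be installed.
import { Injectable } from '@nestjs/common';
import { RedisOptions, Transport } from '@nestjs/microservices';
import { HealthIndicatorFunction, MicroserviceHealthIndicator } from '@nestjs/terminus';

@Injectable()
export class HealthService {
  constructor(private readonly microservice: MicroserviceHealthIndicator) {}

  getRedisChecks(): HealthIndicatorFunction[] {
    return [
      // Cache instance
      () => this.microservice.pingCheck<RedisOptions>('redis-cache', {
        transport: Transport.REDIS,
        options: {
          host: process.env.REDIS_HOST || 'localhost',
          port: parseInt(process.env.REDIS_PORT ?? '', 10) || 6379,
        },
      }),
      // Session-store instance
      () => this.microservice.pingCheck<RedisOptions>('redis-session', {
        transport: Transport.REDIS,
        options: {
          host: process.env.REDIS_SESSION_HOST || 'localhost',
          port: parseInt(process.env.REDIS_SESSION_PORT ?? '', 10) || 6380,
        },
      }),
    ];
  }
}
3.3 HTTP Health Checks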
typescript
// HTTP health checks
import { Controller, Get } from '@nestjs/common';
import { HealthCheck, HealthCheckService, HttpHealthIndicator } from '@nestjs/terminus';

@Controller('health')
export class HealthController {
  constructor(
    private readonly health: HealthCheckService,
    private readonly http: HttpHealthIndicator,
  ) {}

  @Get()
  @HealthCheck()
  check() {
    return this.health.check([
      // Check an external API
      () => this.http.pingCheck('google', 'https://google.com'),
      // Check an internal service
      () => this.http.pingCheck('auth-service', 'http://auth-service:3000/health'),
      // Custom timeout and accepted status codes. The options are passed through to
      // Axios, which has no statusCodes option, so validateStatus is used instead.
      () => this.http.pingCheck('api-docs', 'https://api.example.com/docs', {
        timeout: 5000,
        validateStatus: (status) => [200, 201, 301, 302].includes(status),
      }),
    ]);
  }
}
3.4 Disk Space Health Checks
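The `thresholdPercent` option used in this section is a ratio between 0 and 1 of used space to total space. As a rough illustration of what a 0.9 threshold means, here is a dependency-free sketch. `DiskUsage`, `usedRatio`, and `exceedsThreshold` are hypothetical names, and the comparison is an assumption about the semantics rather than the library's code.

```typescript
// Hypothetical model of a percentage-based disk threshold.
interface DiskUsage {
  totalBytes: number;
  freeBytes: number;
}

// Fraction of the disk that is currently in use (0 = empty, 1 = full).
function usedRatio({ totalBytes, freeBytes }: DiskUsage): number {
  if (totalBytes <= 0) throw new RangeError('totalBytes must be positive');
  return (totalBytes - freeBytes) / totalBytes;
}

// True when usage is above the allowed ratio, i.e. the check should fail.
function exceedsThreshold(usage: DiskUsage, thresholdPercent: number): boolean {
  return usedRatio(usage) > thresholdPercent;
}

const GiB = 1024 ** 3;
const usage: DiskUsage = { totalBytes: 100 * GiB, freeBytes: 5 * GiB };
console.log(usedRatio(usage));             // prints 0.95
console.log(exceedsThreshold(usage, 0.9)); // prints true
```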
typescript
// Disk space health checks
import { Controller, Get } from '@nestjs/common';
import { DiskHealthIndicator, HealthCheck, HealthCheckService } from '@nestjs/terminus';

@Controller('health')
export class HealthController {
  constructor(
    private readonly health: HealthCheckService,
    private readonly disk: DiskHealthIndicator,
  ) {}

  @Get()
  @HealthCheck()
  check() {
    return this.health.check([
      // Check overall disk usage
      () => this.disk.checkStorage('disk', {
        path: '/',
        thresholdPercent: 0.9, // 90% threshold
      }),
      // Check a specific directory
      () => this.disk.checkStorage('tmp-directory', {
        path: '/tmp',
        thresholdPercent: 0.8, // 80% threshold
      }),
    ]);
  }
}
4. Custom Health Checks
4.1 Custom Health Indicator
typescript
// Custom health indicator
import { Injectable } from '@nestjs/common';
import { ConfigService } from '@nestjs/config';
import { HealthCheckError, HealthIndicator, HealthIndicatorResult } from '@nestjs/terminus';

@Injectable()
export class CustomHealthIndicator extends HealthIndicator {
  constructor(private readonly configService: ConfigService) {
    super();
  }

  async isHealthy(key: string): Promise<HealthIndicatorResult> {
    try {
      // Run the custom check logic
      const isConfigValid = this.validateConfiguration();
      const customMetric = await this.getCustomMetric();
      if (!isConfigValid) {
        throw new Error('Configuration validation failed');
      }
      return this.getStatus(key, true, {
        configValid: isConfigValid,
        customMetric,
        timestamp: new Date().toISOString(),
      });
    } catch (error) {
      const result = this.getStatus(key, false, {
        message: error instanceof Error ? error.message : String(error),
        timestamp: new Date().toISOString(),
      });
      // With this class-based API, failure is signalled by throwing HealthCheckError.
      // A "down" result that is merely returned may still be reported as healthy,
      // depending on the Terminus version.
      throw new HealthCheckError(`${key} check failed`, result);
    }
  }

  private validateConfiguration(): boolean {
    // Verify that the required configuration keys are present
    const requiredConfigs = ['DATABASE_URL', 'JWT_SECRET', 'API_KEY'];
    return requiredConfigs.every(
      (config) => this.configService.get(config) !== undefined,
    );
  }

  private async getCustomMetric(): Promise<number> {
    // Placeholder for a real metric such as active users or queue length
    return Math.floor(Math.random() * 1000);
  }
}
4.2 Business Logic Health Checks
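The order check in this section distinguishes a warning level from a critical level, and only the critical level marks the service as down. That classification can be isolated into a pure function, which keeps it easy to unit test. This is a sketch; `classifyBacklog` and `BacklogLevel` are hypothetical names, and the default thresholds mirror the 1000 and 5000 used in this section.

```typescript
// Hypothetical helper that classifies a pending-order backlog.
type BacklogLevel = 'ok' | 'warning' | 'critical';

function classifyBacklog(
  pending: number,
  warningThreshold = 1000,
  criticalThreshold = 5000,
): BacklogLevel {
  if (pending > criticalThreshold) return 'critical';
  if (pending > warningThreshold) return 'warning';
  return 'ok';
}

// Only a critical backlog should make the health check report "down".
function isBacklogHealthy(level: BacklogLevel): boolean {
  return level !== 'critical';
}

console.log(classifyBacklog(200));  // prints "ok"
console.log(classifyBacklog(1500)); // prints "warning"
console.log(classifyBacklog(6000)); // prints "critical"
console.log(isBacklogHealthy(classifyBacklog(1500))); // prints true
```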
typescript
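// Business logic health checks
// UserService and OrderService are application services assumed to exist elsewhere in the project.
import { Injectable } from '@nestjs/common';
import { HealthCheckError, HealthIndicator, HealthIndicatorResult } from '@nestjs/terminus';

@Injectable()
export class BusinessHealthIndicator extends HealthIndicator {
  constructor(
    private readonly userService: UserService,
    private readonly orderService: OrderService,
  ) {
    super();
  }

  async checkUserService(key: string): Promise<HealthIndicatorResult> {
    try {
      // Exercise the critical user operations
      const userCount = await this.userService.getActiveUserCount();
      const canCreateUser = await this.userService.testUserCreation();
      if (userCount < 0 || !canCreateUser) {
        throw new Error('User service health check failed');
      }
      return this.getStatus(key, true, {
        activeUsers: userCount,
        createUserTest: canCreateUser,
      });
    } catch (error) {
      return this.fail(key, error);
    }
  }

  async checkOrderService(key: string): Promise<HealthIndicatorResult> {
    try {
      // Check order-processing capacity
      const pendingOrders = await this.orderService.getPendingOrderCount();
      const canProcessOrder = await this.orderService.testOrderProcessing();
      // Alert thresholds
      const warningThreshold = 1000;
      const criticalThreshold = 5000;
      const info: Record<string, unknown> = {
        pendingOrders,
        processOrderTest: canProcessOrder,
      };
      if (pendingOrders > criticalThreshold) {
        info.message = 'Critical: Too many pending orders';
        throw new HealthCheckError('Order backlog is critical', this.getStatus(key, false, info));
      }
      if (pendingOrders > warningThreshold) {
        // A warning is included in the payload but does not mark the check as down
        info.message = 'Warning: High number of pending orders';
      }
      return this.getStatus(key, true, info);
    } catch (error) {
      return this.fail(key, error);
    }
  }

  // Reports a failure by throwing HealthCheckError, so that HealthCheckService
  // counts the check as failed instead of receiving a returned "down" result.
  private fail(key: string, error: unknown): never {
    if (error instanceof HealthCheckError) throw error;
    const message = error instanceof Error ? error.message : String(error);
    throw new HealthCheckError(`${key} check failed`, this.getStatus(key, false, { message }));
  }
}
5. Multi-Endpoint Health Checks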
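5.1 Separate Health Check Endpoints
typescript
// Multiple health check endpoints
import { Controller, Get } from '@nestjs/common';
import { RedisOptions, Transport } from '@nestjs/microservices';
import {
  HealthCheck,
  HealthCheckService,
  HttpHealthIndicator,
  MicroserviceHealthIndicator,
  TypeOrmHealthIndicator,
} from '@nestjs/terminus';

@Controller()
export class HealthController {
  constructor(
    private readonly health: HealthCheckService,
    private readonly db: TypeOrmHealthIndicator,
    // @nestjs/terminus has no RedisHealthIndicator; Redis is pinged through the
    // microservice indicator (requires @nestjs/microservices and ioredis).
    private readonly microservice: MicroserviceHealthIndicator,
    private readonly http: HttpHealthIndicator,
  ) {}

  // Basic health check: only the application itself
  @Get('health')
  @HealthCheck()
  healthCheck() {
    return this.health.check([]);
  }

  // Readiness check: all dependencies (path matches the readiness probe in section 6.1)
  @Get('health/ready')
  @HealthCheck()
  readinessCheck() {
    return this.health.check([
      () => this.db.pingCheck('database'),
      () => this.pingRedis('redis'),
    ]);
  }

  // Liveness check: critical services only (path matches the liveness probe in section 6.1)
  @Get('health/alive')
  @HealthCheck()
  livenessCheck() {
    return this.health.check([
      () => this.http.pingCheck('internal-service', 'http://localhost:3001/health'),
    ]);
  }

  // Detailed health check: every check
  @Get('health/detail')
  @HealthCheck()
  detailedHealthCheck() {
    return this.health.check([
      () => this.db.pingCheck('database'),
      () => this.pingRedis('redis-cache'),
      () => this.http.pingCheck('external-api', 'https://api.external.com/health'),
      () => this.http.pingCheck('internal-service', 'http://internal-service:3000/health'),
    ]);
  }

  // Shared Redis ping with explicit fallbacks for missing environment variables
  private pingRedis(key: string) {
    return this.microservice.pingCheck<RedisOptions>(key, {
      transport: Transport.REDIS,
      options: {
        host: process.env.REDIS_HOST || 'localhost',
        port: parseInt(process.env.REDIS_PORT ?? '', 10) || 6379,
      },
    });
  }
}
5.2 Conditional Health Checks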
typescript
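// Conditional health checks
import { Controller, Get, Query } from '@nestjs/common';
import { ConfigService } from '@nestjs/config';
import { RedisOptions, Transport } from '@nestjs/microservices';
import {
  HealthCheck,
  HealthCheckService,
  HealthIndicatorFunction,
  HttpHealthIndicator,
  MicroserviceHealthIndicator,
  TypeOrmHealthIndicator,
} from '@nestjs/terminus';

@Controller('health')
export class HealthController {
  constructor(
    private readonly health: HealthCheckService,
    private readonly db: TypeOrmHealthIndicator,
    // Redis is pinged through the microservice indicator (see section 3.2)
    private readonly microservice: MicroserviceHealthIndicator,
    private readonly http: HttpHealthIndicator,
    private readonly configService: ConfigService,
  ) {}

  @Get()
  @HealthCheck()
  check(@Query('level') level: 'basic' | 'detailed' = 'basic') {
    const checks: HealthIndicatorFunction[] = [];
    // Basic checks (always run)
    checks.push(() => this.db.pingCheck('database'));
    // Detailed checks (run only when requested)
    if (level === 'detailed') {
      if (this.configService.get('REDIS_ENABLED') === 'true') {
        checks.push(() => this.microservice.pingCheck<RedisOptions>('redis', {
          transport: Transport.REDIS,
          options: {
            host: process.env.REDIS_HOST || 'localhost',
            port: parseInt(process.env.REDIS_PORT ?? '', 10) || 6379,
          },
        }));
      }
      // Skip the external API check when no URL is configured
      const externalApiUrl = this.configService.get<string>('EXTERNAL_API_URL');
      if (this.configService.get('EXTERNAL_API_HEALTH_CHECK') === 'true' && externalApiUrl) {
        checks.push(() => this.http.pingCheck('external-api', externalApiUrl));
      }
    }
    return this.health.check(checks);
  }
}
6. Kubernetes Integration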
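Probe settings translate into time budgets: roughly `periodSeconds × failureThreshold` of consecutive failures are tolerated before Kubernetes acts, and a startup probe additionally waits for `initialDelaySeconds` first. The sketch below works through that arithmetic for the probe values used in section 6.1. The helper names are hypothetical, and the results are approximations that ignore `timeoutSeconds` and scheduling jitter.

```typescript
// Rough time budgets implied by Kubernetes probe settings (approximate).
interface ProbeTiming {
  initialDelaySeconds: number;
  periodSeconds: number;
  failureThreshold: number;
}

// Approximate span of consecutive failures tolerated before the probe gives up.
function failureWindowSeconds(probe: ProbeTiming): number {
  return probe.periodSeconds * probe.failureThreshold;
}

// Approximate longest startup time a startup probe tolerates.
function startupBudgetSeconds(probe: ProbeTiming): number {
  return probe.initialDelaySeconds + failureWindowSeconds(probe);
}

const liveness: ProbeTiming = { initialDelaySeconds: 30, periodSeconds: 10, failureThreshold: 3 };
const readiness: ProbeTiming = { initialDelaySeconds: 5, periodSeconds: 5, failureThreshold: 3 };
const startup: ProbeTiming = { initialDelaySeconds: 10, periodSeconds: 5, failureThreshold: 30 };

console.log(failureWindowSeconds(liveness));  // prints 30
console.log(failureWindowSeconds(readiness)); // prints 15
console.log(startupBudgetSeconds(startup));   // prints 160
```

With these values the application has about 160 seconds to start, is removed from load balancing after about 15 seconds of failed readiness checks, and is restarted after about 30 seconds of failed liveness checks.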
6.1 Kubernetes Probe Configuration
yaml
# Kubernetes Deployment configuration
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nestjs-app
spec:
  replicas: 3
  selector:
    matchLabels:
      app: nestjs-app
  template:
    metadata:
      labels:
        app: nestjs-app
    spec:
      containers:
        - name: nestjs-app
          image: nestjs-app:latest
          ports:
            - containerPort: 3000
          # Liveness probe
          livenessProbe:
            httpGet:
              path: /health/alive
              port: 3000
            initialDelaySeconds: 30
            periodSeconds: 10
            timeoutSeconds: 5
            failureThreshold: 3
          # Readiness probe
          readinessProbe:
            httpGet:
              path: /health/ready
              port: 3000
            initialDelaySeconds: 5
            periodSeconds: 5
            timeoutSeconds: 3
            failureThreshold: 3
          # Startup probe (Kubernetes 1.18+)
          startupProbe:
            httpGet:
              path: /health
              port: 3000
            initialDelaySeconds: 10
            periodSeconds: 5
            timeoutSeconds: 3
            failureThreshold: 30
6.2 Environment-Specific Health Checks
typescript
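// Environment-specific health checks
import { Injectable } from '@nestjs/common';
import { ConfigService } from '@nestjs/config';
import { HealthCheckError, HealthIndicator, HealthIndicatorResult } from '@nestjs/terminus';

interface SecurityCheckReport {
  passed: boolean;
  checks: { name: string; status: string }[];
}

@Injectable()
export class EnvironmentAwareHealthIndicator extends HealthIndicator {
  constructor(private readonly configService: ConfigService) {
    super();
  }

  async isHealthy(key: string): Promise<HealthIndicatorResult> {
    const environment = this.configService.get<string>('NODE_ENV');
    try {
      // The caller's key is passed through so every branch reports under the same name
      switch (environment) {
        case 'development':
          return await this.checkDevelopmentEnvironment(key);
        case 'staging':
          return await this.checkStagingEnvironment(key);
        case 'production':
          return await this.checkProductionEnvironment(key);
        default:
          return this.getStatus(key, true);
      }
    } catch (error) {
      if (error instanceof HealthCheckError) throw error;
      const message = error instanceof Error ? error.message : String(error);
      // Failure is signalled by throwing HealthCheckError rather than returning "down"
      throw new HealthCheckError(
        `${key} check failed`,
        this.getStatus(key, false, { message, environment }),
      );
    }
  }

  private async checkDevelopmentEnvironment(key: string): Promise<HealthIndicatorResult> {
    // Health check for the development environment
    return this.getStatus(key, true, {
      environment: 'development',
      features: ['hot-reload', 'debug-mode'],
    });
  }

  private async checkStagingEnvironment(key: string): Promise<HealthIndicatorResult> {
    // Health check for the staging environment
    return this.getStatus(key, true, {
      environment: 'staging',
      features: ['performance-monitoring'],
    });
  }

  private async checkProductionEnvironment(key: string): Promise<HealthIndicatorResult> {
    // Health check for the production environment
    const securityChecks = await this.performSecurityChecks();
    const result = this.getStatus(key, securityChecks.passed, {
      environment: 'production',
      features: ['ssl-enabled', 'rate-limiting'],
      security: securityChecks,
    });
    if (!securityChecks.passed) {
      throw new HealthCheckError('Security checks failed', result);
    }
    return result;
  }

  private async performSecurityChecks(): Promise<SecurityCheckReport> {
    // Placeholder for real security-related checks
    return {
      passed: true,
      checks: [
        { name: 'ssl-certificate', status: 'valid' },
        { name: 'firewall', status: 'active' },
        { name: 'rate-limiting', status: 'enabled' },
      ],
    };
  }
}
7. Health Check Monitoring and Alerting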
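Both logging and metrics need to know how long a health check took and whether it succeeded, including the case where the check rejects. One way to capture that is a small timing wrapper, sketched below without any NestJS dependency. `timed` and `TimedOutcome` are hypothetical names, and the wrapper assumes the task reports failure through a rejected promise.

```typescript
// Hypothetical wrapper that measures a check and captures success or failure.
interface TimedOutcome<T> {
  ok: boolean;
  value?: T;
  error?: unknown;
  durationSeconds: number;
}

function timed<T>(task: () => Promise<T>): Promise<TimedOutcome<T>> {
  const startedAt = Date.now();
  const elapsed = (): number => (Date.now() - startedAt) / 1000;
  // Both branches resolve, so callers can record metrics without try/catch.
  return task().then(
    (value): TimedOutcome<T> => ({ ok: true, value, durationSeconds: elapsed() }),
    (error: unknown): TimedOutcome<T> => ({ ok: false, error, durationSeconds: elapsed() }),
  );
}

// Usage: wrap any async check and forward the outcome to a logger or metrics collector.
timed(async () => 'ok').then((outcome) => {
  console.log(outcome.ok, outcome.value); // prints "true ok"
});
```

The `durationSeconds` value is in the unit that the histogram in section 7.2 expects.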
7.1 Health Check Logging
typescript
// Health check logging
import { Injectable, Logger } from '@nestjs/common';
import { HealthCheckResult } from '@nestjs/terminus';

@Injectable()
export class HealthCheckLogger {
  private readonly logger = new Logger(HealthCheckLogger.name);

  // request is the underlying HTTP request object (for example an Express request)
  logHealthCheck(result: HealthCheckResult, request: any) {
    const { status, info, error } = result;
    this.logger.log({
      message: 'Health check performed',
      status,
      endpoint: request.url,
      ip: request.ip,
      userAgent: request.headers['user-agent'],
      timestamp: new Date().toISOString(),
      details: {
        info: Object.keys(info || {}).length,    // number of healthy indicators
        errors: Object.keys(error || {}).length, // number of failing indicators
      },
    });
    // Log the error details
    if (error && Object.keys(error).length > 0) {
      this.logger.error({
        message: 'Health check failed',
        errors: error,
        timestamp: new Date().toISOString(),
      });
    }
  }
}
7.2 Health Check Metrics Collection
typescript
// Health check metrics collection (assumes the prom-client package is installed)
import { Injectable } from '@nestjs/common';
import { Gauge, Histogram } from 'prom-client';

@Injectable()
export class HealthMetricsCollector {
  private readonly gauge = new Gauge({
    name: 'app_health_status',
    help: 'Application health status (1=healthy, 0=unhealthy)',
    labelNames: ['service', 'type'],
  });

  private readonly histogram = new Histogram({
    name: 'health_check_duration_seconds',
    help: 'Duration of health checks in seconds',
    labelNames: ['endpoint', 'status'],
  });

  async collectHealthMetrics(
    endpoint: string,
    status: string,
    duration: number, // in seconds, to match the metric name
    details: any,
  ) {
    // Update the overall health gauge
    const isHealthy = status === 'ok' ? 1 : 0;
    this.gauge.set({ service: 'nestjs-app', type: 'overall' }, isHealthy);
    // Record how long the check took
    this.histogram.observe({ endpoint, status }, duration);
    // Record one gauge value per indicator
    if (details) {
      Object.keys(details).forEach((service) => {
        const serviceStatus = details[service].status === 'up' ? 1 : 0;
        this.gauge.set({ service: 'nestjs-app', type: service }, serviceStatus);
      });
    }
  }
}
8. Summary
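The health check features Terminus provides include:
- Built-in indicators: health checks for common dependencies such as databases, Redis, HTTP, and disk
- Custom checks: support for your own health check logic
- Multiple endpoints: health check endpoints at different levels of detail
- Kubernetes integration: endpoints that work with Kubernetes liveness, readiness, and startup probes
- Detailed output: rich per-indicator status information
- Monitoring integration: straightforward hooks for logging and metrics collection
Best practices for health checks:
- Layered checks: distinguish liveness, readiness, and startup checks
- Fast responses: a health check should return quickly
- Minimal dependencies: avoid pulling too many dependencies into a health check
- Detailed logging: record the results and errors of each health check
- Metrics collection: gather health check metrics for monitoring
- Environment awareness: adjust the health check strategy per environment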
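The "fast responses" practice can be enforced by putting an upper bound on each individual check, so that one slow dependency cannot stall the whole endpoint past the probe timeout. The following is a minimal sketch of such a bound. `withTimeout` is a hypothetical helper, not a Terminus API, and it only stops waiting: the underlying operation is not cancelled.

```typescript
// Hypothetical helper: reject if the task does not settle within `ms` milliseconds.
function withTimeout<T>(task: Promise<T>, ms: number, label: string): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    const timer = setTimeout(
      () => reject(new Error(`${label} timed out after ${ms} ms`)),
      ms,
    );
    task.then(
      (value) => { clearTimeout(timer); resolve(value); },
      (error) => { clearTimeout(timer); reject(error); },
    );
  });
}

// Usage: a check that takes 200 ms is cut off after 50 ms.
const slowCheck = new Promise<string>((resolve) => setTimeout(() => resolve('late'), 200));
withTimeout(slowCheck, 50, 'database ping').catch((error: Error) => {
  console.log(error.message); // prints "database ping timed out after 50 ms"
});
```

Keeping the per-check limit below the probe's `timeoutSeconds` leaves room for the endpoint to respond with a failure instead of timing out itself.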
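With well-configured health checks, we can:
- Improve system reliability: detect and handle service failures promptly
- Streamline operations: automate failure detection and recovery
- Improve the user experience: reduce the time a service is unavailable
- Support container orchestration: integrate with Kubernetes and similar container platforms
The next article covers configuration management: how ConfigService loads .env files, along with multi-environment configuration, validation, and injection timing.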