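Kubernetes HPA/VPA Complete Guide: Autoscaling Design and Implementation That Holds Up in Production

Introduction

Running Kubernetes in production means dealing with resource management and scaling. Over-allocating resources wastes money, while under-allocating leads to degraded performance or outages.

This article covers the two Kubernetes autoscaling mechanisms, the Horizontal Pod Autoscaler (HPA) and the Vertical Pod Autoscaler (VPA), from basic concepts through practical know-how for running them in production.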

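HPA vs. VPA

Horizontal Pod Autoscaler (HPA)

HPA scales a workload horizontally by changing the number of Pods: it adds Pods as load rises and removes them as load falls.

Where it fits:

Stateless web applications
API servers
Worker processes

Vertical Pod Autoscaler (VPA)

VPA scales vertically by adjusting the resource requests (CPU and memory) of individual Pods. Based on observed usage, it recommends a more suitable allocation and, depending on the update mode, applies it.

Where it fits:

Stateful applications
Databases and cache servers
Batch jobs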

HPA in Practice

Basic HPA Configuration

Let's start with a simple resource-based HPA that targets CPU and memory utilization.

# deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-api
  namespace: production
spec:
  replicas: 3
  selector:
    matchLabels:
      app: web-api
  template:
    metadata:
      labels:
        app: web-api
    spec:
      containers:
      - name: web-api
        image: myregistry/web-api:v1.2.3
        resources:
          requests:
            cpu: 200m
            memory: 256Mi
          limits:
            cpu: 1000m
            memory: 512Mi
        ports:
        - containerPort: 8080
        readinessProbe:
          httpGet:
            path: /health
            port: 8080
          initialDelaySeconds: 10
          periodSeconds: 5
---
# hpa.yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-api-hpa
  namespace: production
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-api
  minReplicas: 3
  maxReplicas: 50
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization
        averageUtilization: 80
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300
      policies:
      - type: Percent
        value: 10
        periodSeconds: 60
    scaleUp:
      stabilizationWindowSeconds: 0
      policies:
      - type: Percent
        value: 100
        periodSeconds: 15
      - type: Pods
        value: 4
        periodSeconds: 15
      selectPolicy: Max
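
To get a feel for what the controller does with these targets, here is a minimal Python sketch of the replica calculation described in the Kubernetes documentation: desiredReplicas = ceil(currentReplicas * currentMetricValue / desiredMetricValue), skipped when the ratio is within the default tolerance of 0.1. The function name and the sample utilization figures (90% CPU, 60% memory) are made up for illustration, and the real controller additionally handles missing metrics, unready Pods, and the behavior policies.

```python
import math


def desired_replicas(current_replicas, current_value, target_value, tolerance=0.1):
    """Core HPA formula: ceil(current_replicas * current_value / target_value)."""
    ratio = current_value / target_value
    # If the ratio is close enough to 1.0, HPA leaves the replica count alone.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    return math.ceil(current_replicas * ratio)


# Hypothetical readings for the manifest above: 3 replicas,
# CPU at 90% (target 70%) and memory at 60% (target 80%).
cpu_proposal = desired_replicas(3, 90, 70)
memory_proposal = desired_replicas(3, 60, 80)

# With several metrics, HPA takes the largest proposal,
# then clamps it to minReplicas/maxReplicas (3 and 50 here).
proposal = max(3, min(50, max(cpu_proposal, memory_proposal)))
print(cpu_proposal, memory_proposal, proposal)
```

Because the largest proposal wins, a single hot metric is enough to scale out, while scaling in requires every metric to agree.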

HPA with Custom Metrics

Beyond CPU and memory, HPA can also scale on application-specific metrics.

Configuring Prometheus Adapter

# prometheus-adapter-config.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: prometheus-adapter-config
  namespace: monitoring
data:
  config.yaml: |
    rules:
    - seriesQuery: 'http_requests_total{namespace!="",pod!=""}'
      resources:
        overrides:
          namespace: {resource: "namespace"}
          pod: {resource: "pod"}
      name:
        matches: "^(.*)_total$"
        as: "${1}_per_second"
      metricsQuery: 'sum(rate(<<.Series>>{<<.LabelMatchers>>}[2m])) by (<<.GroupBy>>)'
    
    - seriesQuery: 'request_queue_length{namespace!="",pod!=""}'
      resources:
        overrides:
          namespace: {resource: "namespace"}
          pod: {resource: "pod"}
      name:
        matches: "^(.*)$"
        as: "${1}"
      metricsQuery: 'avg(<<.Series>>{<<.LabelMatchers>>}) by (<<.GroupBy>>)'
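
The name block in the first rule is what turns the Prometheus series http_requests_total into the metric name http_requests_per_second that an HPA can reference. As a rough illustration of that rename only, the same regular expression can be tried in Python. The helper name is hypothetical, and the adapter itself uses Go regular expressions with "${1}" where Python uses "\1".

```python
import re


def exposed_metric_name(series_name):
    """Apply the first rule's rename: '^(.*)_total$' becomes '<prefix>_per_second'."""
    # Python's r"\1" plays the role of the adapter's "${1}" backreference.
    return re.sub(r"^(.*)_total$", r"\1_per_second", series_name)


print(exposed_metric_name("http_requests_total"))
# A series without the _total suffix is left untouched by this pattern.
print(exposed_metric_name("request_queue_length"))
```

If the HPA reports that a metric cannot be found, comparing the name in the HPA spec against the output of a rename like this is a quick first check.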

Defining the Custom-Metrics HPA

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-api-custom-hpa
  namespace: production
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-api
  minReplicas: 3
  maxReplicas: 100
  metrics:
  # Scale on request rate
  - type: Pods
    pods:
      metric:
        name: http_requests_per_second
      target:
        type: AverageValue
        averageValue: 1000  # 1000 RPS per Pod
  # Scale on queue length
  - type: Pods
    pods:
      metric:
        name: request_queue_length
      target:
        type: AverageValue
        averageValue: 10  # keep queue length at or below 10 per Pod

Scaling on External Metrics

HPA can also scale on metrics from systems outside the cluster, such as an SQS queue or a Pub/Sub subscription.

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: queue-worker-hpa
  namespace: production
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: queue-worker
  minReplicas: 1
  maxReplicas: 50
  metrics:
  - type: External
    external:
      metric:
        name: sqs_queue_messages_visible
        selector:
          matchLabels:
            queue_name: "production-tasks"
      target:
        type: AverageValue
        averageValue: 20  # 20 messages per worker
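
With an AverageValue target, the external metric is divided by the current number of Pods before it is compared with the target, so the result works out to roughly "total backlog divided by the per-worker target". The sketch below shows that arithmetic for the manifest above. The function name and backlog figures are invented for illustration, and the tolerance band and behavior policies are deliberately left out.

```python
import math


def workers_for_queue(visible_messages, per_worker_target, min_replicas, max_replicas):
    """Approximate replica count for an External metric with an AverageValue target."""
    wanted = math.ceil(visible_messages / per_worker_target)
    # Clamp to the HPA's minReplicas/maxReplicas.
    return max(min_replicas, min(max_replicas, wanted))


# averageValue: 20, minReplicas: 1, maxReplicas: 50 as in the manifest above.
for backlog in (0, 150, 5000):
    print(backlog, workers_for_queue(backlog, 20, 1, 50))
```

The last case is worth noticing: once the backlog exceeds maxReplicas times the target, the HPA is pinned at its ceiling and the queue can only drain as fast as 50 workers allow.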

Controlling HPA Scaling with behavior

Since Kubernetes 1.18, the behavior field gives you fine-grained control over how quickly HPA scales in each direction.

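behavior:
  scaleDown:
    # Stabilization window before scaling down (guards against abrupt shrinking)
    stabilizationWindowSeconds: 300
    policies:
    # Remove at most 10% of Pods per 60 seconds
    - type: Percent
      value: 10
      periodSeconds: 60
    # or at most 2 Pods per 60 seconds
    - type: Pods
      value: 2
      periodSeconds: 60
    # With multiple policies, pick the one that allows the smallest change
    selectPolicy: Min

  scaleUp:
    # React to scale-up immediately
    stabilizationWindowSeconds: 0
    policies:
    # Add at most 100% of current Pods per 15 seconds
    - type: Percent
      value: 100
      periodSeconds: 15
    # or at most 4 Pods per 15 seconds
    - type: Pods
      value: 4
      periodSeconds: 15
    # With multiple policies, pick the one that allows the largest change
    selectPolicy: Max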
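
How Percent and Pods policies interact under selectPolicy is easier to see with numbers. The following is a simplified model, not the controller's implementation: it computes how many Pods may be added or removed within a single period, ignores the stabilization window and scale events earlier in the period, and rounds percentage changes up, which is my reading of the controller's behavior and should be treated as an assumption.

```python
def allowed_change(start_replicas, policies, select_policy):
    """Largest replica change permitted in one period for one scaling direction."""
    changes = []
    for policy_type, value in policies:
        if policy_type == "Percent":
            # Integer ceiling of start_replicas * value / 100.
            changes.append(-(-start_replicas * value // 100))
        else:  # "Pods"
            changes.append(value)
    # selectPolicy: Max takes the most permissive policy, Min the most restrictive.
    return max(changes) if select_policy == "Max" else min(changes)


scale_up = [("Percent", 100), ("Pods", 4)]
scale_down = [("Percent", 10), ("Pods", 2)]

print(allowed_change(3, scale_up, "Max"))     # small deployment: the Pods policy dominates
print(allowed_change(10, scale_up, "Max"))    # larger deployment: the Percent policy dominates
print(allowed_change(40, scale_down, "Min"))  # Pods policy is the stricter one
print(allowed_change(10, scale_down, "Min"))  # Percent policy is the stricter one
```

Pairing a Percent policy with a Pods policy this way keeps small deployments from scaling too timidly and large ones from shrinking too fast.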

VPA in Practice

Installing VPA

# Install the VPA components
git clone https://github.com/kubernetes/autoscaler.git
cd autoscaler/vertical-pod-autoscaler
./hack/vpa-up.sh

# Verify the installation
kubectl get pods -n kube-system | grep vpa

Configuring VPA

apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: web-api-vpa
  namespace: production
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-api
  updatePolicy:
    updateMode: "Auto"  # one of: Off, Initial, Recreate, Auto
  resourcePolicy:
    containerPolicies:
    - containerName: web-api
      minAllowed:
        cpu: 100m
        memory: 128Mi
      maxAllowed:
        cpu: 4
        memory: 8Gi
      controlledResources: ["cpu", "memory"]
      controlledValues: RequestsAndLimits

VPA Update Modes

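Mode	Description	Use case
Off	Computes recommendations only; never applies them	Initial evaluation, resource analysis
Initial	Applies recommendations only when a Pod is created	Cautious rollout
Recreate	Applies recommendations by evicting and recreating Pods	Workloads that tolerate restarts
Auto	Applies recommendations automatically using the preferred mechanism (currently equivalent to Recreate)	Hands-off operation in production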

Checking VPA Recommendations

# Check VPA status
kubectl describe vpa web-api-vpa -n production

# Example output
Status:
  Recommendation:
    Container Recommendations:
    - Container Name: web-api
      Lower Bound:
        Cpu:     150m
        Memory:  180Mi
      Target:
        Cpu:     250m
        Memory:  320Mi
      Uncapped Target:
        Cpu:     250m
        Memory:  320Mi
      Upper Bound:
        Cpu:     500m
        Memory:  640Mi
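
In this output Target and Uncapped Target are identical because 250m and 320Mi already fall inside the minAllowed/maxAllowed range from the VPA manifest. As I understand the VPA documentation, Uncapped Target is the recommendation before those bounds are applied and Target is the value after. The sketch below models only that clamping step. The function name and the 6000m and 40m inputs are hypothetical, and other limits the recommender may honor are not modeled.

```python
def cap_recommendation(uncapped, min_allowed, max_allowed):
    """Clamp an uncapped recommendation into [min_allowed, max_allowed]."""
    return max(min_allowed, min(max_allowed, uncapped))


# CPU in millicores: minAllowed 100m, maxAllowed 4 cores (4000m).
print(cap_recommendation(250, 100, 4000))   # inside the bounds, so it is unchanged
print(cap_recommendation(6000, 100, 4000))  # hypothetical spike, capped at maxAllowed
print(cap_recommendation(40, 100, 4000))    # hypothetical idle service, raised to minAllowed
```

When Target and Uncapped Target diverge in a real cluster, that is a hint that minAllowed or maxAllowed is constraining the recommendation and may deserve a second look.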

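Using HPA and VPA Together

Take care when applying HPA and VPA to the same Deployment. If VPA changes the CPU or memory requests that HPA's utilization targets are computed from, the two controllers end up working against each other and scaling can behave unpredictably.

Recommended Approach 1: Run VPA in Off or Initial Mode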

# VPA only computes recommendations
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: web-api-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-api
  updatePolicy:
    updateMode: "Off"  # do not apply automatically
  resourcePolicy:
    containerPolicies:
    - containerName: web-api
      controlledResources: ["memory"]  # VPA handles memory only

Recommended Approach 2: Split Responsibility by Resource Type

# VPA manages memory only
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: web-api-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-api
  updatePolicy:
    updateMode: "Auto"
  resourcePolicy:
    containerPolicies:
    - containerName: web-api
      controlledResources: ["memory"]  # CPU is not managed by VPA
---
# HPA scales on CPU
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-api-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-api
  minReplicas: 3
  maxReplicas: 50
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70

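Production Design Best Practices

1. Set Appropriate Resource Requests

For Utilization targets, HPA computes usage as a percentage of each Pod's requests. If the requests are off, the scaling decisions will be off too.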

resources:
  requests:
    # Set values close to actual average usage
    cpu: 200m     # observed average: 180m
    memory: 256Mi # observed average: 220Mi
  limits:
    # Upper bound for bursts
    cpu: 1000m
    memory: 512Mi
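
It helps to run the numbers before settling on requests. The utilization that HPA compares against its target is simply usage divided by requests, so the figures above interact with the 70% CPU target used earlier in this article. The sketch below is plain arithmetic with hypothetical helper names. It assumes load is spread evenly across Pods.

```python
def utilization_percent(usage_millicores, request_millicores):
    """Utilization as HPA sees it: usage relative to the Pod's request."""
    return 100 * usage_millicores / request_millicores


def steady_state_usage(request_millicores, target_percent):
    """Per-Pod usage at which utilization sits exactly on the target."""
    return request_millicores * target_percent / 100


print(utilization_percent(180, 200))  # observed average vs. request
print(steady_state_usage(200, 70))    # per-Pod usage HPA steers toward
```

With an observed average of 180m against a 200m request, utilization is 90%, already above a 70% target, so HPA would tend to add Pods until per-Pod usage settles near 140m. If three replicas at 180m each is the intended baseline, either the request or the target needs adjusting.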

2. Configure a Pod Disruption Budget

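Set a PDB so that voluntary disruptions such as node drains, cluster-autoscaler scale-in, and VPA updater evictions cannot take down too many Pods at once. Note that a PDB only governs evictions; when HPA scales down, Pods are removed through the Deployment and the PDB does not limit that.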

apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: web-api-pdb
  namespace: production
spec:
  minAvailable: 2  # or maxUnavailable: 1
  selector:
    matchLabels:
      app: web-api

3. Tune the Readiness Probe

Give new Pods enough warm-up time before they start receiving traffic.

readinessProbe:
  httpGet:
    path: /health/ready
    port: 8080
  initialDelaySeconds: 30  # allow for application startup time
  periodSeconds: 10
  successThreshold: 1
  failureThreshold: 3

4. Test Scaling

Before going to production, verify scaling behavior with a load test.

# Example load test with k6
k6 run --vus 100 --duration 10m load-test.js

# Watch the HPA status
watch kubectl get hpa web-api-hpa -n production

Troubleshooting

When HPA Does Not Scale

# Check HPA status
kubectl describe hpa web-api-hpa -n production

# Common errors
# 1. "unable to get metrics for resource cpu"
#    → metrics-server is not running
kubectl get pods -n kube-system | grep metrics-server

# 2. "missing request for cpu"
#    → the Pod has no resources.requests set

# 3. ScalingActive: False
#    → metrics cannot be retrieved
kubectl get --raw "/apis/metrics.k8s.io/v1beta1/pods" | jq

When VPA Recommendations Are Not Applied

# Check the VPA components
kubectl logs -n kube-system -l app=vpa-recommender
kubectl logs -n kube-system -l app=vpa-updater

# Inspect the Pod's annotations
kubectl get pod <pod-name> -o jsonpath='{.metadata.annotations}'

When Scaling Is Slow

# Tune the HPA scaleUp settings
behavior:
  scaleUp:
    stabilizationWindowSeconds: 0  # scale up immediately
    policies:
    - type: Percent
      value: 200  # scale more aggressively
      periodSeconds: 10

Metrics Collection and Monitoring

Dashboards with Prometheus + Grafana

# Alerting rules defined with PrometheusRule
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: hpa-alerts
  namespace: monitoring
spec:
  groups:
  - name: hpa.rules
    rules:
    - alert: HPAAtMaxReplicas
      expr: |
        kube_horizontalpodautoscaler_status_current_replicas
        == kube_horizontalpodautoscaler_spec_max_replicas
      for: 15m
      labels:
        severity: warning
      annotations:
        summary: "HPA {{ $labels.horizontalpodautoscaler }} is at max replicas"
        
    - alert: HPAScalingFailed
      expr: |
        kube_horizontalpodautoscaler_status_condition{condition="ScalingActive",status="false"} == 1
      for: 5m
      labels:
        severity: critical
      annotations:
        summary: "HPA {{ $labels.horizontalpodautoscaler }} scaling is inactive"

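Summary

The key points for getting the most out of Kubernetes autoscaling:

HPA vs. VPA: HPA suits stateless applications, VPA suits stateful ones
Choose metrics deliberately: consider application-specific metrics, not just CPU and memory
Control scaling with behavior: prevent abrupt scaling and keep operations stable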
Combine with a PDB: keep voluntary evictions such as node drains and VPA updates from causing outages
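Load-test first: verify scaling behavior before going to production

Autoscaling is not a set-and-forget feature. Reviewing metrics regularly and tuning the configuration to match real load patterns is the key to running Kubernetes efficiently.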
