Troubleshooting log: mounting NAS storage into a PostgreSQL Pod

개발허재 2023. 8. 19. 14:54

Background

I was setting up Airflow on Kubernetes.

While installing Airflow with its Helm chart, I ran into a problem when PostgreSQL was being created.

The StatefulSet that creates PostgreSQL mounts its storage through a PVC. To keep the database permanently, I used a PV that maps the container path /bitnami/postgresql to the path /a/b/c on the NAS.

However, the PostgreSQL Pod ended up in the Error state.

Process

The first log from the PostgreSQL Pod looked like this.

postgresql 07:19:25.61
postgresql 07:19:25.61 Welcome to the Bitnami postgresql container
postgresql 07:19:25.61 Subscribe to project updates by watching https://github.com/bitnami/containers
postgresql 07:19:25.61 Submit issues and feature requests at https://github.com/bitnami/containers/issues
postgresql 07:19:25.61
postgresql 07:19:25.63 INFO  ==> ** Starting PostgreSQL setup **
postgresql 07:19:25.64 INFO  ==> Validating settings in POSTGRESQL_* env vars..
postgresql 07:19:25.65 INFO  ==> Loading custom pre-init scripts...
postgresql 07:19:25.66 INFO  ==> Initializing PostgreSQL database...
postgresql 07:19:25.68 INFO  ==> pg_hba.conf file not detected. Generating it...
postgresql 07:19:25.68 INFO  ==> Generating local authentication configuration
postgresql 07:19:25.69 INFO  ==> Deploying PostgreSQL with persisted data...
postgresql 07:19:25.70 INFO  ==> Configuring replication parameters
postgresql 07:19:25.73 INFO  ==> Configuring fsync
postgresql 07:19:25.74 INFO  ==> Configuring synchronous_replication
postgresql 07:19:25.78 INFO  ==> Loading custom scripts...
postgresql 07:19:25.78 INFO  ==> Enabling remote connections
postgresql 07:19:25.79 INFO  ==> ** PostgreSQL setup finished! **

postgresql 07:19:25.81 INFO  ==> ** Starting PostgreSQL **
2023-08-18 07:19:25.827 GMT [1] FATAL:  data directory "/bitnami/postgresql/data" has invalid permissions
2023-08-18 07:19:25.827 GMT [1] DETAIL:  Permissions should be u=rwx (0700) or u=rwx,g=rx (0750).

The data directory /bitnami/postgresql/data had a permission problem: PostgreSQL accepts only 0700 or 0750 on it. So I set the permissions of /a/b/c/data on the NAS to 0700.
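
The same permission fix can, in principle, be applied from inside the cluster instead of by hand on the NAS. The fragment below is only a sketch under assumptions: the init container name fix-data-permissions and the busybox image are hypothetical choices, not something the chart generates, and UID 1001 is taken from the runAsUser of the PostgreSQL container. On an NFS export with root squashing the chown/chmod may be refused, in which case the permissions still have to be set on the NAS side.

```yaml
# Sketch only: would go under spec.template.spec of the PostgreSQL StatefulSet.
initContainers:
- name: fix-data-permissions          # hypothetical name
  image: busybox:1.36                 # assumed image; any image with sh, chown and chmod works
  command:
  - /bin/sh
  - -c
  # PostgreSQL requires the data directory to be owned by its own user
  # and to have mode 0700 or 0750.
  - mkdir -p /bitnami/postgresql/data && chown 1001:1001 /bitnami/postgresql/data && chmod 0700 /bitnami/postgresql/data
  securityContext:
    runAsUser: 0                      # assumption: root is allowed to change ownership on this export
  volumeMounts:
  - mountPath: /bitnami/postgresql    # the mount point used at this stage of the troubleshooting
    name: data
```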

 

 

The second issue:

postgresql 07:19:43.78
postgresql 07:19:43.78 Welcome to the Bitnami postgresql container
postgresql 07:19:43.78 Subscribe to project updates by watching https://github.com/bitnami/containers
postgresql 07:19:43.78 Submit issues and feature requests at https://github.com/bitnami/containers/issues
postgresql 07:19:43.78
postgresql 07:19:43.80 INFO  ==> ** Starting PostgreSQL setup **
postgresql 07:19:43.82 INFO  ==> Validating settings in POSTGRESQL_* env vars..
postgresql 07:19:43.82 INFO  ==> Loading custom pre-init scripts...
postgresql 07:19:43.83 INFO  ==> Initializing PostgreSQL database...
mkdir: cannot create directory ‘/bitnami/postgresql’: Permission denied

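PostgreSQL tries to create the /bitnami/postgresql directory itself inside the container. As I understood it, because I had mounted the NAS path /a/b/c at /bitnami/postgresql, that mkdir landed on the mounted directory c and was denied. So I changed the PV mount point to /bitnami.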

 

 

The third issue:

postgresql 07:21:35.19
postgresql 07:21:35.19 Welcome to the Bitnami postgresql container
postgresql 07:21:35.19 Subscribe to project updates by watching https://github.com/bitnami/containers
postgresql 07:21:35.19 Submit issues and feature requests at https://github.com/bitnami/containers/issues
postgresql 07:21:35.20
postgresql 07:21:35.21 INFO  ==> ** Starting PostgreSQL setup **
postgresql 07:21:35.23 INFO  ==> Validating settings in POSTGRESQL_* env vars..
postgresql 07:21:35.24 INFO  ==> Loading custom pre-init scripts...
postgresql 07:21:35.24 INFO  ==> Initializing PostgreSQL database...
mkdir: cannot create directory ‘/bitnami’: Permission denied

 

This time it tries to mkdir the /bitnami directory itself... and I could hardly mount the volume at / (the root directory) either.


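Resolution

I then remembered the subPath field that can be set next to mountPath in a container's volumeMounts.

With subPath, a subdirectory inside the volume, rather than the volume's root, is mounted at mountPath.

For example, if the PostgreSQL StatefulSet spec in this situation uses

"mountPath: /bitnami, subPath: data"

then /a/b/c/data on the NAS is mounted at /bitnami in the PostgreSQL container, and the mkdir /bitnami problem goes away.

With this change I was able to run PostgreSQL normally.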
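
To make the path mapping concrete, here is a minimal fragment of the container spec with the resulting paths written out as comments. It assumes the PV exports /a/b/c on the NAS, as described above; the mapping in the comments follows from how subPath works and is not taken from any log.

```yaml
# Fragment of the PostgreSQL container spec.
volumeMounts:
- name: data
  mountPath: /bitnami    # path seen inside the container
  subPath: data          # directory inside the volume, i.e. /a/b/c/data on the NAS
# Resulting mapping (container path -> NAS path):
#   /bitnami                  -> /a/b/c/data
#   /bitnami/postgresql/data  -> /a/b/c/data/postgresql/data
```

Because /bitnami is now an existing mount point backed by a subdirectory of the volume, the container can create postgresql and postgresql/data underneath it.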

 

Manifests

PostgreSQL StatefulSet

apiVersion: apps/v1
kind: StatefulSet
metadata:
  annotations:
    meta.helm.sh/release-name: airflow
    meta.helm.sh/release-namespace: airflow
  labels:
    app.kubernetes.io/component: primary
    app.kubernetes.io/instance: airflow
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/name: postgresql
    helm.sh/chart: postgresql-10.5.3
  name: airflow-postgresql
  namespace: airflow
spec:
  podManagementPolicy: OrderedReady
  replicas: 1
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app.kubernetes.io/instance: airflow
      app.kubernetes.io/name: postgresql
      role: primary
  serviceName: airflow-postgresql-headless
  template:
    metadata:
      creationTimestamp: null
      labels:
        app.kubernetes.io/component: primary
        app.kubernetes.io/instance: airflow
        app.kubernetes.io/managed-by: Helm
        app.kubernetes.io/name: postgresql
        helm.sh/chart: postgresql-10.5.3
        role: primary
      name: airflow-postgresql
    spec:
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
          - podAffinityTerm:
              labelSelector:
                matchLabels:
                  app.kubernetes.io/component: primary
                  app.kubernetes.io/instance: airflow
                  app.kubernetes.io/name: postgresql
              namespaces:
              - airflow
              topologyKey: kubernetes.io/hostname
            weight: 1
      containers:
      - env:
        - name: BITNAMI_DEBUG
          value: "false"
        - name: POSTGRESQL_PORT_NUMBER
          value: "5432"
        - name: POSTGRESQL_VOLUME_DIR
          value: /bitnami/postgresql
        - name: PGDATA
          value: /bitnami/postgresql/data
        - name: POSTGRES_USER
          value: postgres
        - name: POSTGRES_PASSWORD
          valueFrom:
            secretKeyRef:
              key: postgresql-password
              name: airflow-postgresql
        - name: POSTGRESQL_ENABLE_LDAP
          value: "no"
        - name: POSTGRESQL_ENABLE_TLS
          value: "no"
        - name: POSTGRESQL_LOG_HOSTNAME
          value: "false"
        - name: POSTGRESQL_LOG_CONNECTIONS
          value: "false"
        - name: POSTGRESQL_LOG_DISCONNECTIONS
          value: "false"
        - name: POSTGRESQL_PGAUDIT_LOG_CATALOG
          value: "off"
        - name: POSTGRESQL_CLIENT_MIN_MESSAGES
          value: error
        - name: POSTGRESQL_SHARED_PRELOAD_LIBRARIES
          value: pgaudit
        image: docker.io/bitnami/postgresql:11.12.0-debian-10-r44
        imagePullPolicy: IfNotPresent
        livenessProbe:
          exec:
            command:
            - /bin/sh
            - -c
            - exec pg_isready -U "postgres" -h 127.0.0.1 -p 5432
          failureThreshold: 6
          initialDelaySeconds: 30
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 5
        name: airflow-postgresql
        ports:
        - containerPort: 5432
          name: tcp-postgresql
          protocol: TCP
        readinessProbe:
          exec:
            command:
            - /bin/sh
            - -c
            - -e
            - |
              exec pg_isready -U "postgres" -h 127.0.0.1 -p 5432
              [ -f /opt/bitnami/postgresql/tmp/.initialized ] || [ -f /bitnami/postgresql/.initialized ]
          failureThreshold: 6
          initialDelaySeconds: 5
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 5
        resources:
          requests:
            cpu: 250m
            memory: 256Mi
        securityContext:
          runAsUser: 1001
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: /dev/shm
          name: dshm
        - mountPath: /bitnami
          name: data
          subPath: data
      dnsPolicy: ClusterFirst
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext:
        fsGroup: 1001
      terminationGracePeriodSeconds: 30
      volumes:
      - emptyDir:
          medium: Memory
        name: dshm
  updateStrategy:
    type: RollingUpdate
  volumeClaimTemplates:
  - apiVersion: v1
    kind: PersistentVolumeClaim
    metadata:
      creationTimestamp: null
      name: data
    spec:
      accessModes:
      - ReadWriteOnce
      resources:
        requests:
          storage: 8Gi
      volumeMode: Filesystem

 

NFS mount PersistentVolume

apiVersion: v1
kind: PersistentVolume
metadata:
  name: nfs-pv
spec:
  accessModes:
  - ReadWriteOnce
  capacity:
    storage: 8Gi
  nfs:
    path: /a/b/c
    server: abc.example.com
  persistentVolumeReclaimPolicy: Retain
  volumeMode: Filesystem
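
The manifests above do not show how the StatefulSet's claim ends up bound to nfs-pv. One common way to make that explicit is to reserve the PV for the claim with claimRef. The variant below is a sketch under assumptions: the claim name data-airflow-postgresql-0 is derived from the usual StatefulSet naming pattern (volumeClaimTemplate name, StatefulSet name, ordinal) and was not confirmed in this setup. Kubernetes still checks that storage class, access modes, and requested size are compatible, so a default StorageClass in the cluster may need to be accounted for.

```yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: nfs-pv
spec:
  accessModes:
  - ReadWriteOnce
  capacity:
    storage: 8Gi
  claimRef:                            # reserves this PV for one specific PVC
    namespace: airflow
    name: data-airflow-postgresql-0    # assumed name: <template>-<statefulset>-<ordinal>
  nfs:
    path: /a/b/c
    server: abc.example.com
  persistentVolumeReclaimPolicy: Retain
  volumeMode: Filesystem
```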