
Working with Knative Serving

개발허재 2024. 5. 20. 22:17

Knative documentation

https://knative.dev/docs/serving/

 


Pinning http.route.match.headers.host in the Knative Serving VirtualService

The Istio VirtualService that Knative Serving (integrated with Istio) deploys by default automatically declares hosts in http.route.match.headers.host based on the default Kubernetes DNS naming or a custom domain.

 

However, I needed to pin the request header's Host in order to pass through our in-house API gateway, and this turned out to be possible with a configmap change.

Knative Serving's networking and domain behavior is configured mainly through the config-network and config-domain configmaps.

k edit cm -n knative-serving config-network

...
apiVersion: v1
data:
  domain-template: "my.domain.com" # add this entry
  _example: |
    ################################
    #                              #
    #    EXAMPLE CONFIGURATION     #
    #                              #
    ################################
...

 

 

Then, redeploying the ksvc:

➜  ~ k get vs -n llm-engine llm-engine-ksvc-ingress
NAME                      GATEWAYS                                                                              HOSTS                                                                                                                             AGE
llm-engine-ksvc-ingress   ["knative-serving/knative-ingress-gateway","knative-serving/knative-local-gateway"]   ["my.domain.com","llm-engine-ksvc.llm-engine","llm-engine-ksvc.llm-engine.svc","llm-engine-ksvc.llm-engine.svc.cluster.local"]   39s

 

As shown above, my.domain.com has been added to HOSTS, and requests using that host now succeed.
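
A minimal way to check this from outside, assuming the Istio ingress gateway is directly reachable (the gateway address is a placeholder and the path is only illustrative, so adjust both for your environment):

# send a request with the pinned Host header through the Istio ingress gateway
curl -v -H "Host: my.domain.com" "http://<istio-ingressgateway-address>/llmagent/alivecheck"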

Conclusion

This works, but because Istio and Knative Serving are fundamentally designed to separate and route services by Host as part of the service mesh, I plan to move to a more refined approach later.

 

Alternative approach

(Editing only the config-domain configmap keeps hosts separated per service, in the form {service-name}.{namespace}.{domain}.)

k edit cm -n knative-serving config-domain

...
apiVersion: v1
data:
  my.domain.com: "" # add this entry
  _example: |
    ################################
    #                              #
    #    EXAMPLE CONFIGURATION     #
    #                              #
    ################################
...

 

llm-engine-ksvc-ingress   ["knative-serving/knative-ingress-gateway","knative-serving/knative-local-gateway"]   ["llm-engine-ksvc.llm-engine","llm-engine-ksvc.llm-engine.svc","llm-engine-ksvc.llm-engine.svc.cluster.local","llm-engine-ksvc.llm-engine.my.domain.com"]   17d
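
For reference, this per-service form comes from the fact that config-domain only supplies the {{.Domain}} part of the domain template; the default domain-template in config-network is left as-is. Assuming Knative defaults, that template looks like this:

apiVersion: v1
kind: ConfigMap
metadata:
  name: config-network
  namespace: knative-serving
data:
  # Knative's default template; {{.Domain}} is taken from config-domain
  domain-template: "{{.Name}}.{{.Namespace}}.{{.Domain}}"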

Using PVCs and Node Selectors

k edit --namespace knative-serving configmap/config-features

...
apiVersion: v1
data:
  #### add the three keys below ####
  kubernetes.podspec-nodeselector: enabled
  kubernetes.podspec-persistent-volume-claim: enabled
  kubernetes.podspec-persistent-volume-write: enabled
  ###########################
  _example: |-  
...
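
With these feature flags enabled, a ksvc can declare a nodeSelector and mount a PVC directly in its pod spec. A minimal sketch (the service name, node label, image, and claim name below are hypothetical):

apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: llm-engine-ksvc        # hypothetical name for illustration
  namespace: llm-engine
spec:
  template:
    spec:
      nodeSelector:            # requires kubernetes.podspec-nodeselector: enabled
        gpu: "true"            # hypothetical node label
      containers:
      - image: registry.example.com/llm-engine:latest   # hypothetical image
        volumeMounts:
        - name: model-store
          mountPath: /models
      volumes:                 # requires kubernetes.podspec-persistent-volume-claim: enabled
      - name: model-store
        persistentVolumeClaim:
          claimName: model-store-pvc   # hypothetical PVC; readOnly: false needs the write flag
          readOnly: false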

Calling through Istio with the /model-api route

 

Create one additional VirtualService besides the one Knative generates automatically.

apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: llm-engine-service
  namespace: llm-engine
spec:
  gateways:
  - knative-serving/knative-ingress-gateway
  hosts:
  - '*'
  http:
  - match:
    - uri:
        prefix: /model-api/llmagent/
    rewrite:
      authority: llm-engine-ksvc.llm-engine.svc.cluster.local
      uri: /llmagent/
    route:
    - destination:
        host: knative-local-gateway.istio-system.svc.cluster.local
        port:
          number: 80
      weight: 100

 

Its gateways field points at Knative's gateway, the same as the auto-generated VirtualService.
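
With this in place, an external request to /model-api/llmagent/... has its path rewritten to /llmagent/..., its authority set to the ksvc's cluster-local host, and is forwarded through knative-local-gateway. A rough call (using the same path as the load test below) might look like:

curl "http://my.domain.com/model-api/llmagent/alivecheck"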

 

Load testing

  • Load test with hey (hey is a tiny program that sends some load to a web application)
    1. brew install hey
    2. hey -z 3s -c 50 "http://my.domain.com/model-api/llmagent/alivecheck"
    3. The command above sends requests continuously for 3 seconds with 50 concurrent workers.
Summary:
  Total:	3.1023 secs
  Slowest:	0.3858 secs
  Fastest:	0.0135 secs
  Average:	0.0960 secs
  Requests/sec:	509.6224

  Total data:	23374 bytes
  Size/request:	14 bytes

Response time histogram:
  0.014 [1]	|
  0.051 [268]	|■■■■■■■■■■■■■■■■■■■■
  0.088 [544]	|■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■
  0.125 [424]	|■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■
  0.162 [152]	|■■■■■■■■■■■
  0.200 [136]	|■■■■■■■■■■
  0.237 [38]	|■■■
  0.274 [10]	|■
  0.311 [2]	|
  0.349 [5]	|
  0.386 [1]	|


Latency distribution:
  10% in 0.0454 secs
  25% in 0.0574 secs
  50% in 0.0860 secs
  75% in 0.1186 secs
  90% in 0.1765 secs
  95% in 0.1922 secs
  99% in 0.2389 secs

Details (average, fastest, slowest):
  DNS+dialup:	0.0006 secs, 0.0135 secs, 0.3858 secs
  DNS-lookup:	0.0001 secs, 0.0000 secs, 0.0026 secs
  req write:	0.0000 secs, 0.0000 secs, 0.0007 secs
  resp wait:	0.0952 secs, 0.0134 secs, 0.3857 secs
  resp read:	0.0001 secs, 0.0000 secs, 0.0032 secs

Status code distribution:
  [200]	377 responses
  [429]	1204 responses

 

  • Load test with a shell script
$ vi stress.sh


#!/bin/bash

# fire 300 requests concurrently
for ((i=1;i<=300;i++))
do
   curl "http://dev.innerapi.wehago.com/model-api/llmagent/alivecheck" &
done
wait  # wait for all background curl processes to finish

:wq

$ chmod +x stress.sh
$ ./stress.sh


{"resultCode":200,"resultMsg":"요청 성공","resultData":{}}{"resultCode":200,"resultMsg":"요청 성공","resultData":{}}{"resultCode":200,"resultMsg":"요청 성공","resultData":{}}{"resultCode":200,"resultMsg":"요청 성공","resultData":{}}{"resultCode":200,"resultMsg":"요청 성공","resultData":{}}{"resultCode":200,"resultMsg":"요청 성공","resultData":{}}{"resultCode":200,"resultMsg":"요청 성공","resultData":{}}{"resultCode":200,"resultMsg":"요청 성공","resultData":{}}{"resultCode":200,"resultMsg":"요청 성공","resultData":{}}{"resultCode":200,"resultMsg":"요청 성공","resultData":{}}{"resultCode":200,"resultMsg":"요청 성공","resultData":{}}{"resultCode":20...

 

Pod Autoscaling Result

$ k get pods -n llm-engine -o wide -w
NAME                                                READY   STATUS        RESTARTS   AGE   IP             NODE         NOMINATED NODE   READINESS GATES
llm-engine-ksvc-00001-deployment-57889fcf78-4fvzm   1/2     Terminating   0          94s   10.244.5.205   ds-dev-005   <none>           <none>
llm-engine-ksvc-00001-deployment-57889fcf78-4x6p6   2/2     Running       0          94s   10.244.3.87    ds-dev-004   <none>           <none>
llm-engine-ksvc-00001-deployment-57889fcf78-hmdkl   2/2     Running       0          23m   10.244.7.46    ds-dev-006   <none>           <none>



llm-engine-ksvc-00001-deployment-57889fcf78-4x6p6   2/2     Terminating   0          112s   10.244.3.87    ds-dev-004   <none>           <none>
llm-engine-ksvc-00001-deployment-57889fcf78-4x6p6   1/2     Terminating   0          2m21s   10.244.3.87    ds-dev-004   <none>           <none>
llm-engine-ksvc-00001-deployment-57889fcf78-4fvzm   0/2     Terminating   0          6m39s   10.244.5.205   ds-dev-005   <none>           <none>
llm-engine-ksvc-00001-deployment-57889fcf78-4fvzm   0/2     Terminating   0          6m40s   10.244.5.205   ds-dev-005   <none>           <none>
llm-engine-ksvc-00001-deployment-57889fcf78-4fvzm   0/2     Terminating   0          6m40s   10.244.5.205   ds-dev-005   <none>           <none>

 

Knative Autoscaler Log

...
{"severity":"INFO","timestamp":"2024-05-21T05:06:47.081258906Z","logger":"autoscaler","caller":"kpa/kpa.go:154","message":"SKS should be in Proxy mode: want = 1, ebc = -206, #act's = 3 PA Inactive? = false","commit":"6ec4509","knative.dev/controller":"knative.dev.serving.pkg.reconciler.autoscaling.kpa.Reconciler","knative.dev/kind":"autoscaling.internal.knative.dev.PodAutoscaler","knative.dev/traceid":"e582fd0a-9a21-4c69-b361-53a33bd3ac96","knative.dev/key":"llm-engine/llm-engine-ksvc-00001"}
{"severity":"INFO","timestamp":"2024-05-21T05:06:47.081331457Z","logger":"autoscaler","caller":"kpa/kpa.go:174","message":"PA scale got=1, want=1, desiredPods=0 ebc=-206","commit":"6ec4509","knative.dev/controller":"knative.dev.serving.pkg.reconciler.autoscaling.kpa.Reconciler","knative.dev/kind":"autoscaling.internal.knative.dev.PodAutoscaler","knative.dev/traceid":"e582fd0a-9a21-4c69-b361-53a33bd3ac96","knative.dev/key":"llm-engine/llm-engine-ksvc-00001"}
{"severity":"INFO","timestamp":"2024-05-21T05:06:47.081361011Z","logger":"autoscaler","caller":"kpa/kpa.go:184","message":"Observed pod counts=kpa.podCounts{want:1, ready:1, notReady:0, pending:0, terminating:0}","commit":"6ec4509","knative.dev/controller":"knative.dev.serving.pkg.reconciler.autoscaling.kpa.Reconciler","knative.dev/kind":"autoscaling.internal.knative.dev.PodAutoscaler","knative.dev/traceid":"e582fd0a-9a21-4c69-b361-53a33bd3ac96","knative.dev/key":"llm-engine/llm-engine-ksvc-00001"}
{"severity":"INFO","timestamp":"2024-05-21T05:06:47.081418582Z","logger":"autoscaler","caller":"controller/controller.go:550","message":"Reconcile succeeded","commit":"6ec4509","knative.dev/controller":"knative.dev.serving.pkg.reconciler.autoscaling.kpa.Reconciler","knative.dev/kind":"autoscaling.internal.knative.dev.PodAutoscaler","knative.dev/traceid":"e582fd0a-9a21-4c69-b361-53a33bd3ac96","knative.dev/key":"llm-engine/llm-engine-ksvc-00001","duration":"449.98µs"}
...

 


containerConcurrency

The maximum number of requests a single container (pod) may handle at the same time (a hard limit, enforced by the queue-proxy sidecar).

autoscaling.knative.dev/target

The autoscaler aims for each pod to handle roughly n concurrent requests on average (a soft target).

 

Example

A Knative Service (ksvc) with containerConcurrency set to 100 and autoscaling.knative.dev/target set to 10.
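
A sketch of such a ksvc spec (the name and image are hypothetical; the two settings are the ones under test):

apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: llm-engine-ksvc                        # hypothetical name
  namespace: llm-engine
spec:
  template:
    metadata:
      annotations:
        autoscaling.knative.dev/target: "10"   # soft target: ~10 concurrent requests per pod
    spec:
      containerConcurrency: 100                # hard limit: 100 concurrent requests per pod
      containers:
      - image: registry.example.com/llm-engine:latest   # hypothetical image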

 

[Based on the health-check API]

When 100 requests came in, a single pod handled all 100, even though I expected scaling to spread them out at roughly 10 per pod.

500 requests were likewise all handled by one pod.

At 1,000 requests, autoscaling kicked in (but on a second run of 1,000 a single pod handled them all; is some kind of caching at play? More likely, the health check responds almost instantly, so the observed in-flight concurrency stays below the target).

 

[Based on an API with ~1 second response time]

10 requests were handled by a single pod.

20 requests triggered autoscaling of 1 additional pod.

30 requests triggered autoscaling of 2 additional pods (3 pods in total).

 


To be continued...