Working with Knative Serving
Knative Serving documentation: https://knative.dev/docs/serving/
Pinning http.route.match.headers.host in the Knative Serving VirtualService
The Istio VirtualService that Knative Serving deploys by default when integrated with Istio automatically declares hosts in http.route.match.headers.host based on the default Kubernetes DNS naming or a custom domain.
However, I needed to pin the HOST request header so that traffic could pass through our in-house API gateway, and this turned out to be possible by editing a ConfigMap.
Knative Serving's networking and domain behavior is tuned mainly through the config-network and config-domain ConfigMaps.
k edit cm -n knative-serving config-network
...
apiVersion: v1
data:
  domain-template: "my.domain.com"  # add this line
  _example: |
    ################################
    #                              #
    #    EXAMPLE CONFIGURATION     #
    #                              #
    ################################
...
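The same key can also be set non-interactively; a minimal sketch with kubectl patch, assuming the default knative-serving namespace:
kubectl patch configmap/config-network \
  -n knative-serving \
  --type merge \
  -p '{"data":{"domain-template":"my.domain.com"}}'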
Then redeploy the ksvc:
➜ ~ k get vs -n llm-engine llm-engine-ksvc-ingress
NAME GATEWAYS HOSTS AGE
llm-engine-ksvc-ingress ["knative-serving/knative-ingress-gateway","knative-serving/knative-local-gateway"] ["my.domain.com","llm-engine-ksvc.llm-engine","llm-engine-ksvc.llm-engine.svc","llm-engine-ksvc.llm-engine.svc.cluster.local"] 39s
As shown above, my.domain.com has been added to HOSTS, and requests now go through successfully.
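A minimal way to reproduce that check by hand (ISTIO_INGRESS_IP is a placeholder for the istio-ingressgateway address, and /llmagent/alivecheck is the health endpoint used later in this post):
curl -H "Host: my.domain.com" "http://ISTIO_INGRESS_IP/llmagent/alivecheck"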

Conclusion
This works, but Istio and Knative Serving are fundamentally designed to separate and control services by Host as part of the service mesh, so I plan to apply a more refined approach later on.
Alternative approach
(By editing only the config-domain ConfigMap, services can instead be exposed in the {service-name}.{namespace}.{domain} format.)
k edit cm -n knative-serving config-domain
...
apiVersion: v1
data:
  my.domain.com: ""  # add this line
  _example: |
    ################################
    #                              #
    #    EXAMPLE CONFIGURATION     #
    #                              #
    ################################
...
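As before, a minimal non-interactive sketch with kubectl patch:
kubectl patch configmap/config-domain \
  -n knative-serving \
  --type merge \
  -p '{"data":{"my.domain.com":""}}'
After redeploying, the VirtualService hosts then include the per-service domain: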
llm-engine-ksvc-ingress ["knative-serving/knative-ingress-gateway","knative-serving/knative-local-gateway"] ["llm-engine-ksvc.llm-engine","llm-engine-ksvc.llm-engine.svc","llm-engine-ksvc.llm-engine.svc.cluster.local","llm-engine-ksvc.llm-engine.my.domain.com"] 17d
Using PVCs and Node Selectors
k edit --namespace knative-serving configmap/config-features
...
apiVersion: v1
data:
  # add the following three flags
  kubernetes.podspec-nodeselector: enabled
  kubernetes.podspec-persistent-volume-claim: enabled
  kubernetes.podspec-persistent-volume-write: enabled
  _example: |-
...
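With those flags enabled, the ksvc revision template may carry a nodeSelector and mount a PVC. A minimal sketch, where the node label, image, and PVC name are placeholders:
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: llm-engine-ksvc
  namespace: llm-engine
spec:
  template:
    spec:
      nodeSelector:
        gpu: "true"                                        # placeholder node label
      containers:
        - image: registry.example.com/llm-engine:latest    # placeholder image
          volumeMounts:
            - name: model-store
              mountPath: /models
      volumes:
        - name: model-store
          persistentVolumeClaim:
            claimName: model-store-pvc                     # placeholder PVC name
            readOnly: false                                # needs podspec-persistent-volume-write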
Calling through Istio with routes under the /model-api endpoint
Create one more VirtualService in addition to the one Knative generates automatically.
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: llm-engine-service
  namespace: llm-engine
spec:
  gateways:
    - knative-serving/knative-ingress-gateway
  hosts:
    - '*'
  http:
    - match:
        - uri:
            prefix: /model-api/llmagent/
      rewrite:
        authority: llm-engine-ksvc.llm-engine.svc.cluster.local
        uri: /llmagent/
      route:
        - destination:
            host: knative-local-gateway.istio-system.svc.cluster.local
            port:
              number: 80
          weight: 100
The gateways field points at the Knative gateway, the same one used by the auto-generated VirtualService.
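With this in place, a request under the /model-api prefix is rewritten and forwarded through the local gateway to the ksvc, for example:
curl "http://my.domain.com/model-api/llmagent/alivecheck"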
Load testing
- Load testing with hey (hey is a tiny program that sends some load to a web application)
- brew install hey
- hey -z 3s -c 50 "http://my.domain.com/model-api/llmagent/alivecheck"
- This command keeps sending HTTP requests for 3 seconds with 50 concurrent workers.
Summary:
  Total:        3.1023 secs
  Slowest:      0.3858 secs
  Fastest:      0.0135 secs
  Average:      0.0960 secs
  Requests/sec: 509.6224

  Total data:   23374 bytes
  Size/request: 14 bytes

Response time histogram:
  0.014 [1]   |
  0.051 [268] |■■■■■■■■■■■■■■■■■■■■
  0.088 [544] |■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■
  0.125 [424] |■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■
  0.162 [152] |■■■■■■■■■■■
  0.200 [136] |■■■■■■■■■■
  0.237 [38]  |■■■
  0.274 [10]  |■
  0.311 [2]   |
  0.349 [5]   |
  0.386 [1]   |

Latency distribution:
  10% in 0.0454 secs
  25% in 0.0574 secs
  50% in 0.0860 secs
  75% in 0.1186 secs
  90% in 0.1765 secs
  95% in 0.1922 secs
  99% in 0.2389 secs

Details (average, fastest, slowest):
  DNS+dialup:   0.0006 secs, 0.0135 secs, 0.3858 secs
  DNS-lookup:   0.0001 secs, 0.0000 secs, 0.0026 secs
  req write:    0.0000 secs, 0.0000 secs, 0.0007 secs
  resp wait:    0.0952 secs, 0.0134 secs, 0.3857 secs
  resp read:    0.0001 secs, 0.0000 secs, 0.0032 secs

Status code distribution:
  [200] 377 responses
  [429] 1204 responses
- Load testing with a shell script
$ vi stress.sh
#!/bin/bash
# fire 300 requests concurrently
for ((i=1;i<=300;i++))
do
  curl "http://dev.innerapi.wehago.com/model-api/llmagent/alivecheck" &
done
wait  # wait for all background curl processes to finish
:wq
$ chmod 777 stress.sh
$ ./stress.sh
{"resultCode":200,"resultMsg":"요청 성공","resultData":{}}{"resultCode":200,"resultMsg":"요청 성공","resultData":{}}{"resultCode":200,"resultMsg":"요청 성공","resultData":{}}{"resultCode":200,"resultMsg":"요청 성공","resultData":{}}{"resultCode":200,"resultMsg":"요청 성공","resultData":{}}{"resultCode":200,"resultMsg":"요청 성공","resultData":{}}{"resultCode":200,"resultMsg":"요청 성공","resultData":{}}{"resultCode":200,"resultMsg":"요청 성공","resultData":{}}{"resultCode":200,"resultMsg":"요청 성공","resultData":{}}{"resultCode":200,"resultMsg":"요청 성공","resultData":{}}{"resultCode":200,"resultMsg":"요청 성공","resultData":{}}{"resultCode":20...
Pod Autoscaling Result
$ k get pods -n llm-engine -o wide -w
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
llm-engine-ksvc-00001-deployment-57889fcf78-4fvzm 1/2 Terminating 0 94s 10.244.5.205 ds-dev-005 <none> <none>
llm-engine-ksvc-00001-deployment-57889fcf78-4x6p6 2/2 Running 0 94s 10.244.3.87 ds-dev-004 <none> <none>
llm-engine-ksvc-00001-deployment-57889fcf78-hmdkl 2/2 Running 0 23m 10.244.7.46 ds-dev-006 <none> <none>
llm-engine-ksvc-00001-deployment-57889fcf78-4x6p6 2/2 Terminating 0 112s 10.244.3.87 ds-dev-004 <none> <none>
llm-engine-ksvc-00001-deployment-57889fcf78-4x6p6 1/2 Terminating 0 2m21s 10.244.3.87 ds-dev-004 <none> <none>
llm-engine-ksvc-00001-deployment-57889fcf78-4fvzm 0/2 Terminating 0 6m39s 10.244.5.205 ds-dev-005 <none> <none>
llm-engine-ksvc-00001-deployment-57889fcf78-4fvzm 0/2 Terminating 0 6m40s 10.244.5.205 ds-dev-005 <none> <none>
llm-engine-ksvc-00001-deployment-57889fcf78-4fvzm 0/2 Terminating 0 6m40s 10.244.5.205 ds-dev-005 <none> <none>
Knative Autoscaler Log
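The entries below can be tailed from the autoscaler deployment (assuming the default Knative Serving install layout):
kubectl logs -n knative-serving deployment/autoscaler -f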
...
{"severity":"INFO","timestamp":"2024-05-21T05:06:47.081258906Z","logger":"autoscaler","caller":"kpa/kpa.go:154","message":"SKS should be in Proxy mode: want = 1, ebc = -206, #act's = 3 PA Inactive? = false","commit":"6ec4509","knative.dev/controller":"knative.dev.serving.pkg.reconciler.autoscaling.kpa.Reconciler","knative.dev/kind":"autoscaling.internal.knative.dev.PodAutoscaler","knative.dev/traceid":"e582fd0a-9a21-4c69-b361-53a33bd3ac96","knative.dev/key":"llm-engine/llm-engine-ksvc-00001"}
{"severity":"INFO","timestamp":"2024-05-21T05:06:47.081331457Z","logger":"autoscaler","caller":"kpa/kpa.go:174","message":"PA scale got=1, want=1, desiredPods=0 ebc=-206","commit":"6ec4509","knative.dev/controller":"knative.dev.serving.pkg.reconciler.autoscaling.kpa.Reconciler","knative.dev/kind":"autoscaling.internal.knative.dev.PodAutoscaler","knative.dev/traceid":"e582fd0a-9a21-4c69-b361-53a33bd3ac96","knative.dev/key":"llm-engine/llm-engine-ksvc-00001"}
{"severity":"INFO","timestamp":"2024-05-21T05:06:47.081361011Z","logger":"autoscaler","caller":"kpa/kpa.go:184","message":"Observed pod counts=kpa.podCounts{want:1, ready:1, notReady:0, pending:0, terminating:0}","commit":"6ec4509","knative.dev/controller":"knative.dev.serving.pkg.reconciler.autoscaling.kpa.Reconciler","knative.dev/kind":"autoscaling.internal.knative.dev.PodAutoscaler","knative.dev/traceid":"e582fd0a-9a21-4c69-b361-53a33bd3ac96","knative.dev/key":"llm-engine/llm-engine-ksvc-00001"}
{"severity":"INFO","timestamp":"2024-05-21T05:06:47.081418582Z","logger":"autoscaler","caller":"controller/controller.go:550","message":"Reconcile succeeded","commit":"6ec4509","knative.dev/controller":"knative.dev.serving.pkg.reconciler.autoscaling.kpa.Reconciler","knative.dev/kind":"autoscaling.internal.knative.dev.PodAutoscaler","knative.dev/traceid":"e582fd0a-9a21-4c69-b361-53a33bd3ac96","knative.dev/key":"llm-engine/llm-engine-ksvc-00001","duration":"449.98µs"}
...
containerConcurrency
The hard limit on how many requests a single container may handle at the same time.
autoscaling.knative.dev/target
The soft target: the autoscaler aims for each pod to handle an average of n concurrent requests.
Example
A Knative Service (ksvc) with containerConcurrency set to 100 and autoscaling.knative.dev/target set to 10.
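A minimal sketch of where those two settings live on the ksvc (the image is a placeholder):
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: llm-engine-ksvc
  namespace: llm-engine
spec:
  template:
    metadata:
      annotations:
        autoscaling.knative.dev/target: "10"               # soft per-pod scaling target
    spec:
      containerConcurrency: 100                            # hard per-pod concurrency limit
      containers:
        - image: registry.example.com/llm-engine:latest    # placeholder image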
[Based on the health-check API]
When 100 requests came in, scaling should have spread them at roughly 10 per pod, yet a single pod handled all 100 requests.
500 requests were likewise all handled by one pod.
At 1000 requests autoscaling kicked in (but when 1000 were sent again, a single pod handled them; is some kind of caching at play?).
[Based on an API with roughly 1-second response time]
10 requests were handled by a single pod.
20 requests triggered 1 pod to be autoscaled.
30 requests triggered 2 pods to be autoscaled (3 pods total).
And so on...