Building a Triton Inference Server with Docker

개발허재 2022. 4. 14. 15:35

Triton Inference Server GitHub repository: https://github.com/triton-inference-server/server

Once you have pulled the TRTIS (Triton Inference Server) Docker image, set up the models for TRTIS before you run docker run.

TRTIS expects a fixed model folder format. The model repository must follow the layout shown below.

Create a folder on the host server that holds your models in this layout, then mount it into the container with Docker's -v option when you run docker run.

<model-repository-path>/
	<model-name>/
		[config.pbtxt]
		[<output-labels-file> ...]
		<version>/
			<model-definition-file>
		<version>/
			<model-definition-file>
		...
	<model-name>/
		[config.pbtxt]
		[<output-labels-file> ...]
		<version>/
			<model-definition-file>
		<version>/
			<model-definition-file>
		...
	...

For example:

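data/model_repo        <user-chosen folder name>
	models/             <user-chosen model name>
		config.pbtxt		<this file name is required>
		1/              <version directory; the name must be a positive integer>
			model.plan		<the extension is .plan for TensorRT, .onnx for ONNX, and so on; the file name must be model unless default_model_filename is set in config.pbtxt>
		2/
			model.plan
		...
	...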
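
As a rough illustration of the layout above, here is a minimal Python sketch that copies a model file into a repository folder under the default file name. The function name add_model_version and the placeholder file engine.plan are hypothetical and only serve this example; the sketch assumes the default model file name (model plus the original extension) and a positive integer version.

```python
import shutil
import tempfile
from pathlib import Path


def add_model_version(repo_root, model_name, version, model_file, config_text=None):
    """Copy model_file to <repo_root>/<model_name>/<version>/model<ext>.

    Hypothetical helper for illustration; it is not part of Triton.
    """
    # Version directories are expected to be positive integers.
    if not isinstance(version, int) or version < 1:
        raise ValueError("version must be a positive integer")

    src = Path(model_file)
    model_dir = Path(repo_root) / model_name
    version_dir = model_dir / str(version)
    version_dir.mkdir(parents=True, exist_ok=True)

    # Keep the extension (.plan, .onnx, ...) but rename the file to "model".
    dest = version_dir / f"model{src.suffix}"
    shutil.copyfile(src, dest)

    # config.pbtxt sits next to the version directories, not inside them.
    if config_text is not None:
        (model_dir / "config.pbtxt").write_text(config_text)
    return dest


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        plan = Path(tmp) / "engine.plan"
        plan.write_bytes(b"")  # empty placeholder, not a real TensorRT engine
        repo = Path(tmp) / "model_repo"
        add_model_version(repo, "models", 1, plan,
                          config_text='platform: "tensorrt_plan"\n')
        # Print the resulting tree relative to the repository root.
        for path in sorted(repo.rglob("*")):
            print(path.relative_to(repo))
```

Calling the function again with version 2 would add a second version folder next to the first while leaving config.pbtxt shared by both.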

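A model repository contains one folder per model. Each model folder holds a config.pbtxt file describing that model plus one subfolder per version, and each version folder contains the model file itself. Place the attached data/model_repo folder on the host server.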

config.pbtxt is the model's initial configuration file. It declares the input and output sizes, tells Triton whether the model is TensorRT, PyTorch, or TensorFlow, and fixes settings such as the maximum batch size in advance.

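Note that if you pass the --strict-model-config=false option to tritonserver when you later run docker run, you do not need to provide a config.pbtxt for ONNX or TensorFlow models, because Triton can derive the required settings from the model itself. In this post, the TensorRT model is given an explicit config.pbtxt.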

Below is an example config.pbtxt file for a TensorRT model.

platform: "tensorrt_plan"
max_batch_size: 1
input [
  {
    name: "input"
    data_type: TYPE_FP32
    dims: [ 3, 540, 960 ]
  }
]
output [
  {
    name: "output"
    data_type: TYPE_FP32
    dims: [ 3, 1080, 1920 ]
  }
]
dynamic_batching { }
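
If you maintain several models, it can help to generate this file rather than edit it by hand. The following is a minimal Python sketch that renders the same fields as the example above; the helper names render_tensor and render_config are hypothetical, and the sketch covers only the fields used in this post, not the full model configuration schema.

```python
def render_tensor(name, data_type, dims):
    """Render one input or output entry in config.pbtxt text format."""
    dims_text = ", ".join(str(d) for d in dims)
    return (
        "  {\n"
        f'    name: "{name}"\n'
        f"    data_type: {data_type}\n"
        f"    dims: [ {dims_text} ]\n"
        "  }"
    )


def render_config(platform, max_batch_size, inputs, outputs, dynamic_batching=False):
    """Build config.pbtxt text from (name, data_type, dims) tuples.

    Hypothetical helper for illustration; it is not part of Triton.
    """
    lines = [f'platform: "{platform}"', f"max_batch_size: {max_batch_size}"]
    for section, tensors in (("input", inputs), ("output", outputs)):
        lines.append(f"{section} [")
        # Multiple tensors in one section are separated by commas.
        lines.append(",\n".join(render_tensor(*t) for t in tensors))
        lines.append("]")
    if dynamic_batching:
        # An empty block enables dynamic batching with default settings.
        lines.append("dynamic_batching { }")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_config(
        platform="tensorrt_plan",
        max_batch_size=1,
        inputs=[("input", "TYPE_FP32", [3, 540, 960])],
        outputs=[("output", "TYPE_FP32", [3, 1080, 1920])],
        dynamic_batching=True,
    ), end="")
```

Because max_batch_size is greater than 0, dims describes a single item and Triton adds the batch dimension in front, so a request for this model would carry an input of shape [1, 3, 540, 960].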

For a detailed description of the model repository structure, see https://github.com/triton-inference-server/server/blob/master/docs/model_repository.md

I started TRTIS from the Docker image with the following command.

sudo docker run -it --gpus all --ipc=host --rm -p 8000:8000 -p 8001:8001 -p 8002:8002 -v /data/model_repo:/models nvcr.io/nvidia/tritonserver:21.10-py3 tritonserver --model-repository=/models
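
To make the role of each flag easier to see, here is a minimal Python sketch that assembles the same command as an argument list. The function name triton_docker_command and its parameters are hypothetical; the sketch leaves out sudo and only builds the command, it does not run Docker.

```python
import shlex


def triton_docker_command(host_repo, image_tag="21.10-py3",
                          container_repo="/models", extra_server_args=()):
    """Assemble the docker run command from this post as a list of arguments.

    Hypothetical helper for illustration; pass the result to subprocess.run
    if you actually want to start the container.
    """
    cmd = ["docker", "run", "-it", "--gpus", "all", "--ipc=host", "--rm"]
    # 8000 = HTTP, 8001 = gRPC, 8002 = metrics (as in the startup log).
    for port in (8000, 8001, 8002):
        cmd += ["-p", f"{port}:{port}"]
    # Mount the host model repository into the container.
    cmd += ["-v", f"{host_repo}:{container_repo}"]
    cmd.append(f"nvcr.io/nvidia/tritonserver:{image_tag}")
    # Everything after the image name is the command run inside the container,
    # so tritonserver options such as --strict-model-config=false go here.
    cmd += ["tritonserver", f"--model-repository={container_repo}",
            *extra_server_args]
    return cmd


if __name__ == "__main__":
    print(shlex.join(triton_docker_command("/data/model_repo")))
```

Passing extra_server_args=("--strict-model-config=false",) appends that option after --model-repository, which is where tritonserver options belong.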

When the TRTIS server starts successfully, the last three lines of the output look like this.

I0414 04:54:03.284260 1 grpc_server.cc:4117] Started GRPCInferenceService at 0.0.0.0:8001
I0414 04:54:03.284517 1 http_server.cc:2815] Started HTTPService at 0.0.0.0:8000
I0414 04:54:03.325577 1 http_server.cc:167] Started Metrics Service at 0.0.0.0:8002
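
Besides reading the log, you can ask the server whether it is ready over HTTP. The following is a minimal Python sketch that calls the /v2/health/ready endpoint on port 8000; the function name triton_is_ready is hypothetical, localhost is an assumption about where the container runs, and the opener parameter exists only so the function can be exercised without a live server.

```python
import urllib.error
import urllib.request


def triton_is_ready(base_url="http://localhost:8000", timeout=2.0,
                    opener=urllib.request.urlopen):
    """Return True if GET <base_url>/v2/health/ready answers with HTTP 200.

    Hypothetical helper for illustration; `opener` is a hook for testing.
    """
    url = base_url.rstrip("/") + "/v2/health/ready"
    try:
        with opener(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or a non-2xx response all count
        # as "not ready" in this sketch.
        return False


if __name__ == "__main__":
    print("ready" if triton_is_ready() else "not ready")
```

A check like this is handy in a startup script that should wait until the models are loaded before sending inference requests.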