Chapter 6.5 Implementing o11y with OpenTelemetry

This post covers the OpenTelemetry Collector. The book does not include any Collector examples, so this blog provides three examples instead, accompanied by a YouTube video.

Observability can be built without the Collector. Using OpenTelemetry together with the Collector, however, makes it easier and faster to build.

The Collector can be used in several ways.

  1. It handles routing and transformation of OpenTelemetry messages.
  2. It supports many protocols, which makes integration easy.
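
As a rough illustration of what "routing and transformation" means, the following plain-Python toy model sends each item to a pipeline chosen by its signal type and adds an attribute on the way. This is not Collector code: the class names, the `deployment.environment` attribute, and the in-memory exporters are all hypothetical stand-ins.

```python
class ListExporter:
    """Hypothetical exporter that keeps items in memory instead of sending them to a backend."""

    def __init__(self, name):
        self.name = name
        self.items = []

    def export(self, item):
        self.items.append(item)


def add_environment(item):
    """Transformation step: return a copy of the item with one extra attribute."""
    enriched = dict(item)
    enriched["attributes"] = {
        **item.get("attributes", {}),
        "deployment.environment": "demo",  # illustrative value
    }
    return enriched


class Pipeline:
    """Processors run in order, then every exporter receives the result."""

    def __init__(self, processors, exporters):
        self.processors = processors
        self.exporters = exporters

    def consume(self, item):
        for process in self.processors:
            item = process(item)
        for exporter in self.exporters:
            exporter.export(item)


def route(pipelines, item):
    """Routing step: pick the pipeline that matches the item's signal type."""
    pipelines[item["signal"]].consume(item)


traces_backend = ListExporter("traces-backend")
logs_backend = ListExporter("logs-backend")
pipelines = {
    "traces": Pipeline([add_environment], [traces_backend]),
    "logs": Pipeline([add_environment], [logs_backend]),
}

route(pipelines, {"signal": "traces", "name": "web request",
                  "attributes": {"location": "europe"}})
route(pipelines, {"signal": "logs", "body": "add orange to cart"})

print(traces_backend.items[0]["attributes"])
print(logs_backend.items[0]["attributes"])
```

In a real deployment the same roles are played by the receivers, processors, and exporters declared in the Collector configuration file.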

In OpenTelemetry projects I always use the Collector, and it is an important component.

Building Observability Quickly and Easily

Assuming you are applying OpenTelemetry and the Collector to a Java application, I recommend proceeding in the following order.

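  1. Use OpenTelemetry tracing to generate traces and logs.
  2. Generate metrics from the traces using span metrics.
  3. Typically, bind Spring Boot and Log4j to OpenTelemetry logs to enrich the log context. Generating logs from traces is also possible.
  4. Implement additional metrics with the Prometheus Java API (for example, exemplars).
  5. Build the various correlations in Grafana.
  6. Perform telemetry routing (AIOps, anomaly detection pipelines) and transformation in the Collector.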
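
Step 2 may be easier to picture with a small example. The Collector has a span metrics component that derives metrics from spans; the sketch below only illustrates the underlying idea in plain Python and is not that component's code. The bucket boundaries, the span dictionaries, and the field names (`service`, `name`, `start_ns`, `end_ns`) are assumptions made for this illustration.

```python
from bisect import bisect_left
from collections import defaultdict

# Histogram bucket upper bounds in milliseconds (illustrative values).
BUCKETS_MS = [5, 10, 25, 50, 100]


def span_metrics(spans):
    """Aggregate spans into a call count and a latency histogram per (service, span name).

    Each histogram has one slot per bucket plus a final overflow slot.
    Counts are per bucket here, not cumulative as in a Prometheus histogram.
    """
    calls = defaultdict(int)
    latency = defaultdict(lambda: [0] * (len(BUCKETS_MS) + 1))
    for span in spans:
        key = (span["service"], span["name"])
        duration_ms = (span["end_ns"] - span["start_ns"]) / 1_000_000
        calls[key] += 1
        # bisect_left returns the first bucket whose upper bound is >= duration.
        latency[key][bisect_left(BUCKETS_MS, duration_ms)] += 1
    return dict(calls), dict(latency)


# Hypothetical spans; timestamps are in nanoseconds.
spans = [
    {"service": "shopper", "name": "web request", "start_ns": 0, "end_ns": 14_000_000},
    {"service": "shopper", "name": "web request", "start_ns": 0, "end_ns": 8_000_000},
    {"service": "grocery-store", "name": "/products", "start_ns": 0, "end_ns": 7_400_000},
]

calls, latency = span_metrics(spans)
for key, count in calls.items():
    print(key, "calls:", count, "buckets:", latency[key])
```

Because request counts and latencies fall out of the spans, the application does not need separate metric instrumentation for them.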

I validated the procedure above through a good deal of trial and error and hands-on experience.

Collector Example #3

This example demonstrates the following.

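  1. OpenTelemetry traces are sent to the OpenTelemetry Collector.
  2. Prometheus scrapes the metrics of the OpenTelemetry Collector.
  3. The Collector integrates with Prometheus and Jaeger.
  4. A client written in Go calls the server periodically and generates telemetry.
  5. Request count and latency metrics are provided.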

Run the Collector.

./otelcol-contrib --config ./otel-collector-config.yaml

Run Jaeger.

./jaeger-all-in-one --collector.zipkin.host-port=:9411

Run the server.

./server

Run the client.

./client

The server and client exchange messages and generate transactions automatically.

The service list is displayed as shown below.

The distribution of trace processing times is displayed.

A Grafana dashboard can also be used to display traces.

Grafana displays the Jaeger traces.

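The OpenTelemetry Collector automatically generates metrics such as the ones below. Because the Collector handles a large volume of transactions, monitor it continuously to make sure it keeps up with the load.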
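
One way to watch these metrics outside Prometheus is to read the Collector's own metrics endpoint directly. The sketch below is a minimal parser for the Prometheus text format using only the standard library. It assumes the Collector serves its metrics at `http://localhost:8888/metrics`, and the sample text and metric names are illustrative: exact names vary by Collector version. The parser also assumes samples carry no trailing timestamp.

```python
import urllib.request

# Assumed address of the Collector's own metrics endpoint; adjust for your setup.
COLLECTOR_METRICS_URL = "http://localhost:8888/metrics"


def parse_metrics(text):
    """Parse Prometheus text-format samples into {series: value}.

    The series key is the metric name including its label set.
    Comment lines (# HELP, # TYPE) and blank lines are skipped.
    """
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        series, _, value = line.rpartition(" ")
        samples[series] = float(value)
    return samples


def fetch_metrics(url=COLLECTOR_METRICS_URL):
    """Download and parse the metrics; requires a running Collector."""
    with urllib.request.urlopen(url, timeout=5) as response:
        return parse_metrics(response.read().decode("utf-8"))


# Illustrative sample, not captured from a real Collector.
SAMPLE = """\
# TYPE otelcol_receiver_accepted_spans counter
otelcol_receiver_accepted_spans{receiver="otlp",transport="grpc"} 7
# TYPE otelcol_exporter_sent_spans counter
otelcol_exporter_sent_spans{exporter="logging"} 7
"""

for series, value in parse_metrics(SAMPLE).items():
    print(series, value)
```

Comparing accepted and sent counts over time is a simple way to notice when the Collector starts falling behind or dropping data.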

Collector Example #1

The Collector demonstrated in this post integrates with Prometheus, Tempo, and Loki.

Activate a Python virtual environment, then install the libraries.

pip install opentelemetry-api==1.9.0
pip install opentelemetry-sdk==1.9.0
pip install opentelemetry-propagator-b3==1.9.0
pip install opentelemetry-instrumentation==0.28b0
pip install opentelemetry-instrumentation-wsgi==0.28b0
pip install opentelemetry-semantic-conventions==0.28b0
pip install flask
pip install requests
pip install opentelemetry-exporter-otlp
pip install protobuf==3.20.*

If a library error occurs, reinstall the libraries.

pip uninstall opentelemetry-exporter-otlp-proto-http
pip uninstall opentelemetry-exporter-otlp-proto-grpc
pip install opentelemetry-sdk==1.9.0
pip install opentelemetry-exporter-otlp==1.9.0
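
Library errors of this kind often come from mismatched package versions. As a quick check, the standard-library sketch below compares the installed versions with the pins used in the commands above. The helper names `installed_version` and `find_mismatches` are made up for this example.

```python
from importlib import metadata

# Pins taken from the install commands above.
EXPECTED = {
    "opentelemetry-api": "1.9.0",
    "opentelemetry-sdk": "1.9.0",
    "opentelemetry-exporter-otlp": "1.9.0",
    "opentelemetry-instrumentation-wsgi": "0.28b0",
}


def installed_version(package):
    """Return the installed version of a package, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def find_mismatches(expected, lookup=installed_version):
    """Return {package: (expected, installed)} for every package whose version differs."""
    mismatches = {}
    for name, want in expected.items():
        have = lookup(name)
        if have != want:
            mismatches[name] = (want, have)
    return mismatches


if __name__ == "__main__":
    for name, (want, have) in find_mismatches(EXPECTED).items():
        print(f"{name}: expected {want}, found {have or 'not installed'}")
```

Run it inside the same virtual environment as the application; no output means every pinned package matches.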

Start the Collector.

./otelcol-contrib --config ./config/collector/config.yml

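The Collector output is shown below. It includes the OpenTelemetry traces and logs from the applications. Application metrics are not included; the only metrics shown come from the hostmetrics receiver.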

2022-12-08T20:30:55.756+0900 info builder/exporters_builder.go:255 Exporter was built. {"kind": "exporter", "name": "logging"}
2022-12-08T20:30:55.756+0900 info builder/pipelines_builder.go:223 Pipeline was built. {"name": "pipeline", "name": "traces"}
2022-12-08T20:30:55.756+0900 info builder/pipelines_builder.go:223 Pipeline was built. {"name": "pipeline", "name": "logs"}
2022-12-08T20:30:55.756+0900 info filterprocessor@v0.43.0/filter_processor.go:78 Metric filter configured {"kind": "processor", "name": "filter/network-connections", "include match_type": "", "include expressions": [], "include metric names": [], "include metrics with resource attributes": null, "exclude match_type": "strict", "exclude expressions": [], "exclude metric names": ["system.network.connections"], "exclude metrics with resource attributes": null, "checksMetrics": true, "checkResouces": false}
2022-12-08T20:30:55.756+0900 info builder/pipelines_builder.go:223 Pipeline was built. {"name": "pipeline", "name": "metrics"}
2022-12-08T20:30:55.756+0900 info builder/receivers_builder.go:226 Receiver was built. {"kind": "receiver", "name": "hostmetrics", "datatype": "metrics"}
2022-12-08T20:30:55.756+0900 info builder/receivers_builder.go:226 Receiver was built. {"kind": "receiver", "name": "otlp", "datatype": "traces"}
2022-12-08T20:30:55.756+0900 info builder/receivers_builder.go:226 Receiver was built. {"kind": "receiver", "name": "otlp", "datatype": "logs"}
2022-12-08T20:30:55.756+0900 info builder/receivers_builder.go:226 Receiver was built. {"kind": "receiver", "name": "otlp", "datatype": "metrics"}
2022-12-08T20:30:55.757+0900 info service/service.go:82 Starting extensions...
2022-12-08T20:30:55.757+0900 info service/service.go:87 Starting exporters...
2022-12-08T20:30:55.757+0900 info builder/exporters_builder.go:40 Exporter is starting... {"kind": "exporter", "name": "logging"}
2022-12-08T20:30:55.757+0900 info builder/exporters_builder.go:48 Exporter started. {"kind": "exporter", "name": "logging"}
2022-12-08T20:30:55.757+0900 info service/service.go:92 Starting processors...
2022-12-08T20:30:55.757+0900 info builder/pipelines_builder.go:54 Pipeline is starting... {"name": "pipeline", "name": "traces"}
2022-12-08T20:30:55.757+0900 info builder/pipelines_builder.go:65 Pipeline is started. {"name": "pipeline", "name": "traces"}
2022-12-08T20:30:55.757+0900 info builder/pipelines_builder.go:54 Pipeline is starting... {"name": "pipeline", "name": "logs"}
2022-12-08T20:30:55.757+0900 info builder/pipelines_builder.go:65 Pipeline is started. {"name": "pipeline", "name": "logs"}
2022-12-08T20:30:55.757+0900 info builder/pipelines_builder.go:54 Pipeline is starting... {"name": "pipeline", "name": "metrics"}
2022-12-08T20:30:55.757+0900 info builder/pipelines_builder.go:65 Pipeline is started. {"name": "pipeline", "name": "metrics"}
2022-12-08T20:30:55.757+0900 info service/service.go:97 Starting receivers...
2022-12-08T20:30:55.757+0900 info builder/receivers_builder.go:68 Receiver is starting... {"kind": "receiver", "name": "hostmetrics"}
2022-12-08T20:30:55.758+0900 info builder/receivers_builder.go:73 Receiver started. {"kind": "receiver", "name": "hostmetrics"}
2022-12-08T20:30:55.758+0900 info builder/receivers_builder.go:68 Receiver is starting... {"kind": "receiver", "name": "otlp"}
2022-12-08T20:30:55.758+0900 info otlpreceiver/otlp.go:69 Starting GRPC server on endpoint 0.0.0.0:4317 {"kind": "receiver", "name": "otlp"}
2022-12-08T20:30:55.758+0900 info builder/receivers_builder.go:73 Receiver started. {"kind": "receiver", "name": "otlp"}
2022-12-08T20:30:55.758+0900 info service/telemetry.go:95 Setting up own telemetry...
2022-12-08T20:30:55.759+0900 info service/telemetry.go:115 Serving Prometheus metrics {"address": ":8888", "level": "basic", "service.instance.id": "765bbc8e-22bd-41bb-ae4e-572852a38a1f", "service.version": "latest"}
2022-12-08T20:30:55.759+0900 info service/collector.go:229 Starting otelcol-contrib... {"Version": "0.43.0", "NumCPU": 8}
2022-12-08T20:30:55.759+0900 info service/collector.go:124 Everything is ready. Begin running and processing data.
2022-12-08T20:31:05.758+0900 INFO loggingexporter/logging_exporter.go:69 LogsExporter {"#logs": 2}
2022-12-08T20:31:05.758+0900 DEBUG loggingexporter/logging_exporter.go:79 ResourceLog #0
Resource SchemaURL:
Resource labels:
-> telemetry.sdk.language: STRING(python)
-> telemetry.sdk.name: STRING(opentelemetry)
-> telemetry.sdk.version: STRING(1.9.0)
-> net.host.name: STRING(philip-virtual-machine)
-> net.host.ip: STRING(127.0.1.1)
-> service.name: STRING(grocery-store)
-> service.version: STRING(0.1.2)
InstrumentationLibraryLogs #0
InstrumentationLibraryMetrics SchemaURL:
InstrumentationLibrary opentelemetry.sdk._logs
LogRecord #0
Timestamp: 2022-12-08 11:30:50.181229824 +0000 UTC
Severity: INFO
ShortName:
Body: WARNING: This is a development server. Do not use it in a production deployment. Use a production WSGI server instead.
* Running on http://127.0.0.1:5000
Trace ID:
Span ID:
Flags: 0
LogRecord #1
Timestamp: 2022-12-08 11:30:50.181906688 +0000 UTC
Severity: INFO
ShortName:
Body: Press CTRL+C to quit
Trace ID:
Span ID:
Flags: 0

2022-12-08T20:31:55.763+0900 INFO loggingexporter/logging_exporter.go:69 LogsExporter {"#logs": 2}
2022-12-08T20:31:55.763+0900 DEBUG loggingexporter/logging_exporter.go:79 ResourceLog #0
Resource SchemaURL:
Resource labels:
-> telemetry.sdk.language: STRING(python)
-> telemetry.sdk.name: STRING(opentelemetry)
-> telemetry.sdk.version: STRING(1.9.0)
-> net.host.name: STRING(philip-virtual-machine)
-> net.host.ip: STRING(127.0.1.1)
-> service.name: STRING(shopper)
-> service.version: STRING(0.1.2)
InstrumentationLibraryLogs #0
InstrumentationLibraryMetrics SchemaURL:
InstrumentationLibrary opentelemetry.sdk._logs
LogRecord #0
Timestamp: 2022-12-08 11:31:46.349606656 +0000 UTC
Severity: INFO
ShortName:
Body: add orange to cart
Trace ID: c904b0da1a7d96ba0c5dc5e5229d07f5
Span ID: c6c9020d9e3ce949
Flags: 1
ResourceLog #1
Resource SchemaURL:
Resource labels:
-> telemetry.sdk.language: STRING(python)
-> telemetry.sdk.name: STRING(opentelemetry)
-> telemetry.sdk.version: STRING(1.9.0)
-> net.host.name: STRING(philip-virtual-machine)
-> net.host.ip: STRING(127.0.1.1)
-> service.name: STRING(grocery-store)
-> service.version: STRING(0.1.2)
InstrumentationLibraryLogs #0
InstrumentationLibraryMetrics SchemaURL:
InstrumentationLibrary opentelemetry.sdk._logs
LogRecord #0
Timestamp: 2022-12-08 11:31:46.348048128 +0000 UTC
Severity: INFO
ShortName:
Body: 127.0.0.1 - - [08/Dec/2022 20:31:46] "GET /products HTTP/1.1" 200 -
Trace ID: c904b0da1a7d96ba0c5dc5e5229d07f5
Span ID: a72316a5ee626b42
Flags: 1

2022-12-08T20:31:55.763+0900 INFO loggingexporter/logging_exporter.go:40 TracesExporter {"#spans": 7}
2022-12-08T20:31:55.763+0900 DEBUG loggingexporter/logging_exporter.go:49 ResourceSpans #0
Resource SchemaURL:
Resource labels:
-> telemetry.sdk.language: STRING(python)
-> telemetry.sdk.name: STRING(opentelemetry)
-> telemetry.sdk.version: STRING(1.9.0)
-> net.host.name: STRING(philip-virtual-machine)
-> net.host.ip: STRING(127.0.1.1)
-> service.name: STRING(shopper)
-> service.version: STRING(0.1.2)
InstrumentationLibrarySpans #0
InstrumentationLibraryMetrics SchemaURL:
InstrumentationLibrary shopper 0.1.2
Span #0
Trace ID : c904b0da1a7d96ba0c5dc5e5229d07f5
Parent ID : fb2c8fd93ca6b41e
ID : b40e53112018e22d
Name : web request
Kind : SPAN_KIND_CLIENT
Start time : 2022-12-08 11:31:46.33546731 +0000 UTC
End time : 2022-12-08 11:31:46.349429499 +0000 UTC
Status code : STATUS_CODE_OK
Status message :
Attributes:
-> http.method: STRING(GET)
-> http.flavor: STRING(1.1)
-> http.url: STRING(http://localhost:5000/products)
-> net.peer.ip: STRING(127.0.0.1)
-> http.status_code: INT(200)
-> location: STRING(europe)
Events:
SpanEvent #0
-> Name: about to send a request
-> Timestamp: 2022-12-08 11:31:46.335615138 +0000 UTC
-> DroppedAttributesCount: 0
SpanEvent #1
-> Name: request sent
-> Timestamp: 1970-01-01 00:00:00 +0000 UTC
-> DroppedAttributesCount: 0
-> Attributes:
-> url: STRING(http://localhost:5000/products)
Span #1
Trace ID : c904b0da1a7d96ba0c5dc5e5229d07f5
Parent ID : fb2c8fd93ca6b41e
ID : c6c9020d9e3ce949
Name : europe:orange:5
Kind : SPAN_KIND_INTERNAL
Start time : 2022-12-08 11:31:46.349510167 +0000 UTC
End time : 2022-12-08 11:31:46.349765843 +0000 UTC
Status code : STATUS_CODE_UNSET
Status message :
Attributes:
-> item: STRING(orange)
-> quantity: INT(5)
-> location: STRING(europe)
Span #2
Trace ID : c904b0da1a7d96ba0c5dc5e5229d07f5
Parent ID : 4d914b846b6f7334
ID : fb2c8fd93ca6b41e
Name : browse
Kind : SPAN_KIND_INTERNAL
Start time : 2022-12-08 11:31:46.335307894 +0000 UTC
End time : 2022-12-08 11:31:46.349820949 +0000 UTC
Status code : STATUS_CODE_UNSET
Status message :
Attributes:
-> location: STRING(europe)
Span #3
Trace ID : c904b0da1a7d96ba0c5dc5e5229d07f5
Parent ID :
ID : 4d914b846b6f7334
Name : visit store
Kind : SPAN_KIND_INTERNAL
Start time : 2022-12-08 11:31:46.335092783 +0000 UTC
End time : 2022-12-08 11:31:46.349833825 +0000 UTC
Status code : STATUS_CODE_UNSET
Status message :
Attributes:
-> location: STRING(europe)
ResourceSpans #1
Resource SchemaURL:
Resource labels:
-> telemetry.sdk.language: STRING(python)
-> telemetry.sdk.name: STRING(opentelemetry)
-> telemetry.sdk.version: STRING(1.9.0)
-> net.host.name: STRING(philip-virtual-machine)
-> net.host.ip: STRING(127.0.1.1)
-> service.name: STRING(grocery-store)
-> service.version: STRING(0.1.2)
InstrumentationLibrarySpans #0
InstrumentationLibraryMetrics SchemaURL:
InstrumentationLibrary grocery-store 0.1.2
Span #0
Trace ID : c904b0da1a7d96ba0c5dc5e5229d07f5
Parent ID : cd4c80fbf31d6d82
ID : 71283789c7591254
Name : inventory request
Kind : SPAN_KIND_INTERNAL
Start time : 2022-12-08 11:31:46.340301935 +0000 UTC
End time : 2022-12-08 11:31:46.347509007 +0000 UTC
Status code : STATUS_CODE_UNSET
Status message :
Attributes:
-> http.method: STRING(GET)
-> http.flavor: STRING(HttpFlavorValues.HTTP_1_1)
-> http.url: STRING(http://localhost:5001/inventory)
-> net.peer.ip: STRING(127.0.0.1)
-> location: STRING(europe)
Span #1
Trace ID : c904b0da1a7d96ba0c5dc5e5229d07f5
Parent ID : b40e53112018e22d
ID : cd4c80fbf31d6d82
Name : /products
Kind : SPAN_KIND_SERVER
Start time : 2022-12-08 11:31:46.340153365 +0000 UTC
End time : 2022-12-08 11:31:46.347549802 +0000 UTC
Status code : STATUS_CODE_UNSET
Status message :
Attributes:
-> http.flavor: STRING(HTTP/1.1)
-> http.method: STRING(GET)
-> http.user_agent: STRING(python-requests/2.28.1)
-> http.host: STRING(localhost:5000)
-> http.scheme: STRING(http)
-> http.target: STRING(/products)
-> http.client_ip: STRING(127.0.0.1)
-> location: STRING(europe)
InstrumentationLibrarySpans #1
InstrumentationLibraryMetrics SchemaURL:
InstrumentationLibrary opentelemetry.instrumentation.wsgi 0.28b0
Span #0
Trace ID : c904b0da1a7d96ba0c5dc5e5229d07f5
Parent ID : b40e53112018e22d
ID : a72316a5ee626b42
Name : HTTP GET
Kind : SPAN_KIND_SERVER
Start time : 2022-12-08 11:31:46.339725696 +0000 UTC
End time : 2022-12-08 11:31:46.348410324 +0000 UTC
Status code : STATUS_CODE_UNSET
Status message :
Attributes:
-> http.method: STRING(GET)
-> http.server_name: STRING(127.0.0.1)
-> http.scheme: STRING(http)
-> net.host.port: INT(5000)
-> http.host: STRING(localhost:5000)
-> http.target: STRING(/products)
-> net.peer.ip: STRING(127.0.0.1)
-> http.user_agent: STRING(python-requests/2.28.1)
-> net.peer.port: INT(42684)
-> http.flavor: STRING(1.1)
-> http.status_code: INT(200)
-> location: STRING(europe)

2022-12-08T20:32:05.763+0900 INFO loggingexporter/logging_exporter.go:54 MetricsExporter {"#metrics": 1}
2022-12-08T20:32:05.763+0900 DEBUG loggingexporter/logging_exporter.go:64 ResourceMetrics #0
Resource SchemaURL: https://opentelemetry.io/schemas/v1.5.0
InstrumentationLibraryMetrics #0
InstrumentationLibraryMetrics SchemaURL:
InstrumentationLibrary
Metric #0
Descriptor:
-> Name: system.memory.usage
-> Description: Bytes of memory in use.
-> Unit: By
-> DataType: Sum
-> IsMonotonic: false
-> AggregationTemporality: AGGREGATION_TEMPORALITY_CUMULATIVE
NumberDataPoints #0
Data point attributes:
-> state: STRING(used)
StartTimestamp: 1970-01-01 00:00:00 +0000 UTC
Timestamp: 2022-12-08 11:31:55.775525231 +0000 UTC
Value: 1451200512
NumberDataPoints #1
Data point attributes:
-> state: STRING(free)
StartTimestamp: 1970-01-01 00:00:00 +0000 UTC
Timestamp: 2022-12-08 11:31:55.775525231 +0000 UTC
Value: 14068133888
NumberDataPoints #2
Data point attributes:
-> state: STRING(buffered)
StartTimestamp: 1970-01-01 00:00:00 +0000 UTC
Timestamp: 2022-12-08 11:31:55.775525231 +0000 UTC
Value: 117788672
NumberDataPoints #3
Data point attributes:
-> state: STRING(cached)
StartTimestamp: 1970-01-01 00:00:00 +0000 UTC
Timestamp: 2022-12-08 11:31:55.775525231 +0000 UTC
Value: 1736159232
NumberDataPoints #4
Data point attributes:
-> state: STRING(slab_reclaimable)
StartTimestamp: 1970-01-01 00:00:00 +0000 UTC
Timestamp: 2022-12-08 11:31:55.775525231 +0000 UTC
Value: 168185856
NumberDataPoints #5
Data point attributes:
-> state: STRING(slab_unreclaimable)
StartTimestamp: 1970-01-01 00:00:00 +0000 UTC
Timestamp: 2022-12-08 11:31:55.775525231 +0000 UTC
Value: 218247168
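
The seven spans in the TracesExporter output above share one trace ID, and their parent IDs describe the call hierarchy. As a reading aid, the sketch below rebuilds that hierarchy from the span ID, parent ID, service, and name values copied from the output. The helper functions are written for this illustration only.

```python
from collections import defaultdict

# (span ID, parent ID, service, name) copied from the TracesExporter output above.
SPANS = [
    ("b40e53112018e22d", "fb2c8fd93ca6b41e", "shopper", "web request"),
    ("c6c9020d9e3ce949", "fb2c8fd93ca6b41e", "shopper", "europe:orange:5"),
    ("fb2c8fd93ca6b41e", "4d914b846b6f7334", "shopper", "browse"),
    ("4d914b846b6f7334", "", "shopper", "visit store"),
    ("71283789c7591254", "cd4c80fbf31d6d82", "grocery-store", "inventory request"),
    ("cd4c80fbf31d6d82", "b40e53112018e22d", "grocery-store", "/products"),
    ("a72316a5ee626b42", "b40e53112018e22d", "grocery-store", "HTTP GET"),
]


def build_tree(spans):
    """Group spans by parent ID; the root span has an empty parent ID."""
    children = defaultdict(list)
    for span_id, parent_id, service, name in spans:
        children[parent_id].append((span_id, service, name))
    return children


def render(children, parent_id="", depth=0):
    """Return the tree as indented lines, visiting children depth-first."""
    lines = []
    for span_id, service, name in children.get(parent_id, []):
        lines.append("  " * depth + f"{name} [{service}]")
        lines.extend(render(children, span_id, depth + 1))
    return lines


lines = render(build_tree(SPANS))
print("\n".join(lines))
```

The rendered tree starts at "visit store" in the shopper service and crosses into grocery-store under "web request", which is the same relationship a trace viewer such as Jaeger or Tempo draws.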

Collector Example #2

Start minikube.

minikube start --vm-driver=none --kubernetes-version=v1.20.0 --memory=26000 --cpus=4

Start Prometheus.

kubectl apply -f prometheus-service.yaml
kubectl apply -f prometheus-deployment.yaml
kubectl apply -f prometheus-claim0-persistentvolumeclaim.yaml
cp prometheus.yml /tmp/hostpath-provisioner/default/prometheus-claim0

Start Loki.

kubectl apply -f loki-service.yaml
kubectl apply -f loki-deployment.yaml

Start Jaeger.

kubectl apply -f jaeger-service.yaml
kubectl apply -f jaeger-deployment.yaml

Start the OpenTelemetry Collector.

kubectl apply -f opentelemetry-collector-service.yaml
kubectl apply -f opentelemetry-collector-deployment.yaml
kubectl apply -f opentelemetry-collector-claim0-persistentvolumeclaim.yaml
kubectl apply -f opentelemetry-collector-claim1-persistentvolumeclaim.yaml
cp opentelemetry-collector.yml /tmp/hostpath-provisioner/default/opentelemetry-collector-claim0
cp opentelemetry-collector.yml /tmp/hostpath-provisioner/default/opentelemetry-collector-claim1

Start Grafana.

kubectl apply -f grafana-service.yaml
kubectl apply -f grafana-deployment.yaml
kubectl apply -f grafana-claim0-persistentvolumeclaim.yaml
mkdir -p /tmp/hostpath-provisioner/default/grafana-claim0/datasources
cp datasources.yaml /tmp/hostpath-provisioner/default/grafana-claim0/datasources

In the Loki data source, the trace link is http://jaeger:16686/trace/${__value.raw}
In the Jaeger data source, the URL is http://jaeger:16686
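
In Grafana, a Loki derived field applies a regular expression to each log line and substitutes the captured value for `${__value.raw}` in the link. The sketch below mimics that substitution in plain Python to show how a log line turns into a Jaeger URL. The `traceID=` log format and the regular expression are assumptions for this example; match them to the format your logs actually use.

```python
import re

# Assumption: log lines carry the trace ID as "traceID=<hex>"; adjust to your format.
TRACE_ID_PATTERN = re.compile(r"traceID=(\w+)")

# Link template from the Loki data source setting above.
URL_TEMPLATE = "http://jaeger:16686/trace/${__value.raw}"


def derived_link(log_line):
    """Return the Jaeger URL for a log line, or None if it carries no trace ID."""
    match = TRACE_ID_PATTERN.search(log_line)
    if match is None:
        return None
    return URL_TEMPLATE.replace("${__value.raw}", match.group(1))


# Hypothetical log line using a trace ID seen in the Collector output.
line = 'level=info traceID=c904b0da1a7d96ba0c5dc5e5229d07f5 msg="add orange to cart"'
print(derived_link(line))
print(derived_link("level=info msg=no-trace-here"))
```

This is the mechanism that lets you jump from a log line in Loki straight to the matching trace in Jaeger.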

Start Promtail.

kubectl apply -f promtail-deployment.yaml
kubectl apply -f promtail-claim0-persistentvolumeclaim.yaml

Start the grocery-store application.

kubectl apply -f grocery-store-service.yaml
kubectl apply -f grocery-store-deployment.yaml

Start the legacy-inventory application.

kubectl apply -f legacy-inventory-service.yaml
kubectl apply -f legacy-inventory-deployment.yaml

Start the shopper application.

kubectl apply -f shopper-deployment.yaml

When you open the dashboard, you will see the screens below.

--

--

모니터링의 새로운 미래 관측 가능성 (Observability, the New Future of Monitoring)

This part explains the source code for 모니터링의 새로운 미래 관측 가능성.