Containerized SkyWalking deployment: building the Docker image and running it on k8s, from testing to usable

2022-03-01 10:00:12

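Preface

SkyWalking is a very good APM product, but one problem is a real pain in practice. With ES-based storage, any problem with the ES data makes the whole SkyWalking web UI unavailable. Recovery then means stopping the agent on every service, one by one, and redeploying each service in turn. The same problem comes up when upgrading SkyWalking versions. APM process data like this is also allowed to be discarded: by default SkyWalking keeps trace records for only 90 minutes. So I decided to containerize the SkyWalking deployment and make deployment and upgrades a one-step operation. The rest of this post walks through the whole containerization process.

Goal: run the SkyWalking Docker image in a k8s cluster and serve from there.

Building the Docker image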

FROM registry.cn-xx.xx.com/keking/jdk:1.8
ADD apache-skywalking-apm-incubating/  /opt/apache-skywalking-apm-incubating/
# Set the timezone, make the scripts executable, and append a blocking tail to startup.sh
RUN ln -sf /usr/share/zoneinfo/Asia/Shanghai  /etc/localtime \
    && echo 'Asia/Shanghai' >/etc/timezone \
    && chmod +x /opt/apache-skywalking-apm-incubating/config/setApplicationEnv.sh \
    && chmod +x /opt/apache-skywalking-apm-incubating/webapp/setWebAppEnv.sh \
    && chmod +x /opt/apache-skywalking-apm-incubating/bin/startup.sh \
    && echo "tail -fn 100 /opt/apache-skywalking-apm-incubating/logs/webapp.log" >> /opt/apache-skywalking-apm-incubating/bin/startup.sh

EXPOSE 8080 10800 11800 12800
# Render the runtime settings first, then start SkyWalking
CMD /opt/apache-skywalking-apm-incubating/config/setApplicationEnv.sh \
     && sh /opt/apache-skywalking-apm-incubating/webapp/setWebAppEnv.sh \
     && /opt/apache-skywalking-apm-incubating/bin/startup.sh

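Two questions need to be considered when writing the Dockerfile. Which SkyWalking settings must be set dynamically (at runtime)? And how do we keep the process running (SkyWalking's startup.sh is similar to Tomcat's startup.sh)?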

application.yml

#cluster:
#  zookeeper:
#    hostPort: localhost:2181
#    sessionTimeout: 100000
naming:
  jetty:
    #OS real network IP(binding required), for agent to find collector cluster
    host: 0.0.0.0
    port: 10800
    contextPath: /
cache:
#  guava:
  caffeine:
remote:
  gRPC:
    # OS real network IP(binding required), for collector nodes communicate with each other in cluster. collectorN --(gRPC) --> collectorM
    host: #real_host
    port: 11800
agent_gRPC:
  gRPC:
    #os real network ip(binding required), for agent to uplink data(trace/metrics) to collector. agent--(grpc)--> collector
    host: #real_host
    port: 11800
    # Set these two settings to enable SSL
    #sslCertChainFile: $path
    #sslPrivateKeyFile: $path

    # Set your own token to activate auth
    #authentication: xxxxxx
agent_jetty:
  jetty:
    # OS real network IP(binding required), for agent to uplink data(trace/metrics) to collector through HTTP. agent--(HTTP)--> collector
    # SkyWalking native Java/.Net/node.js agents don't use this.
    # Open this for other implementor.
    host: 0.0.0.0
    port: 12800
    contextPath: /
analysis_register:
  default:
analysis_jvm:
  default:
analysis_segment_parser:
  default:
    bufferFilePath: ../buffer/
    bufferOffsetMaxFileSize: 10M
    bufferSegmentMaxFileSize: 500M
    bufferFileCleanWhenRestart: true
ui:
  jetty:
    # Stay in `localhost` if UI starts up in default mode.
    # Change it to OS real network IP(binding required), if deploy collector in different machine.
    host: 0.0.0.0
    port: 12800
    contextPath: /
storage:
  elasticsearch:
    clusterName: #elasticsearch_clusterName
    clusterTransportSniffer: true
    clusterNodes: #elasticsearch_clusterNodes
    indexShardsNumber: 2
    indexReplicasNumber: 0
    highPerformanceMode: true
    # Batch process setting, refer to https://www.elastic.co/guide/en/elasticsearch/client/java-api/5.5/java-docs-bulk-processor.html
    bulkActions: 2000 # Execute the bulk every 2000 requests
    bulkSize: 20 # flush the bulk every 20mb
    flushInterval: 10 # flush the bulk every 10 seconds whatever the number of requests
    concurrentRequests: 2 # the number of concurrent requests
    # Set a timeout on metric data. After the timeout has expired, the metric data will automatically be deleted.
    traceDataTTL: 2880 # Unit is minute
    minuteMetricDataTTL: 90 # Unit is minute
    hourMetricDataTTL: 36 # Unit is hour
    dayMetricDataTTL: 45 # Unit is day
    monthMetricDataTTL: 18 # Unit is month
#storage:
#  h2:
#    url: jdbc:h2:~/memorydb
#    userName: sa
configuration:
  default:
    #namespace: xxxxx
    # alarm threshold
    applicationApdexThreshold: 2000
    serviceErrorRateThreshold: 10.00
    serviceAverageResponseTimeThreshold: 2000
    instanceErrorRateThreshold: 10.00
    instanceAverageResponseTimeThreshold: 2000
    applicationErrorRateThreshold: 10.00
    applicationAverageResponseTimeThreshold: 2000
    # thermodynamic
    thermodynamicResponseTimeStep: 50
    thermodynamicCountOfResponseTimeSteps: 40
    # Max size of the worker cache collection; set it smaller if the collector crashes with OutOfMemory.
    workerCacheMaxSize: 10000
#receiver_zipkin:
#  default:
#    host: localhost
#    port: 9411
#    contextPath: /

webapp.yml

server:
  port: 8080
collector:
  path: /graphql
  ribbon:
    ReadTimeout: 10000
    listOfServers: #real_host:10800
security:
  user:
    admin:
      password: #skywalking_password

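Dynamic configuration: the password and the IPs that services such as gRPC must bind to all have to be set at runtime. Before running SkyWalking's startup.sh, we first run two configuration scripts, which use the environment variables that k8s sets at runtime to replace the parameters that need dynamic values.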

setApplicationEnv.sh

#!/usr/bin/env sh
sed -i "s/#elasticsearch_clusterNodes/${elasticsearch_clusterNodes}/g" /opt/apache-skywalking-apm-incubating/config/application.yml
sed -i "s/#elasticsearch_clusterName/${elasticsearch_clusterName}/g" /opt/apache-skywalking-apm-incubating/config/application.yml
sed -i "s/#real_host/${real_host}/g" /opt/apache-skywalking-apm-incubating/config/application.yml

setWebAppEnv.sh

#!/usr/bin/env sh
sed -i "s/#skywalking_password/${skywalking_password}/g" /opt/apache-skywalking-apm-incubating/webapp/webapp.yml
sed -i "s/#real_host/${real_host}/g" /opt/apache-skywalking-apm-incubating/webapp/webapp.yml
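
The placeholder substitution performed by the two scripts can be tried locally on a throwaway file. The sketch below is not part of the original image: `render_placeholder` is a hypothetical helper, and switching the sed delimiter to `|` and escaping the value are assumptions added here so that a value containing `/` or `&` (a password, for example) does not break the expression. It assumes a sed that accepts `-i` without an argument, such as GNU or BusyBox sed.

```shell
#!/usr/bin/env sh
# Hypothetical helper (not part of SkyWalking): replace "#<name>" in a file with a literal value.
render_placeholder() {
    # $1 = placeholder name without '#', $2 = value, $3 = file to edit in place
    # Escape backslash, '&' and the '|' delimiter so sed inserts the value literally.
    escaped=$(printf '%s' "$2" | sed 's/[\\&|]/\\&/g')
    sed -i "s|#$1|$escaped|g" "$3"
}

# Demo on a temporary file that mimics the placeholder lines in the two YAML files.
demo=$(mktemp)
printf 'host: #real_host\nlistOfServers: #real_host:10800\n' > "$demo"
render_placeholder real_host 10.1.2.3 "$demo"
cat "$demo"
```

After the call, both `#real_host` tokens in the demo file are replaced by `10.1.2.3`, which is the same effect the scripts above have on application.yml and webapp.yml.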

Keeping the process alive: appending "tail -fn 100 /opt/apache-skywalking-apm-incubating/logs/webapp.log" to the end of SkyWalking's startup.sh keeps the process running and continuously prints the webapp.log log.
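
One possible caveat with this approach: if webapp.log does not exist yet when `tail` starts, a plain `tail -f` exits at once and the container stops with it. Whether that race actually occurs with this image is an assumption, not something reported here. A minimal sketch of a guard, where `wait_for_file` is a hypothetical helper:

```shell
#!/usr/bin/env sh
# Hypothetical helper: block until a file exists, or give up after a timeout in seconds.
wait_for_file() {
    file=$1
    timeout=${2:-30}
    waited=0
    while [ ! -e "$file" ]; do
        if [ "$waited" -ge "$timeout" ]; then
            return 1    # file never appeared
        fi
        sleep 1
        waited=$((waited + 1))
    done
    return 0
}

# Possible use at the end of startup.sh instead of the bare tail (assumption, untested in the image):
# wait_for_file /opt/apache-skywalking-apm-incubating/logs/webapp.log 60 \
#     && exec tail -n 100 -f /opt/apache-skywalking-apm-incubating/logs/webapp.log
```

Using `exec` makes `tail` replace the shell, so the container's foreground process is the one that holds it open.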

Deploying on Kubernetes

# Originally extensions/v1beta1, which no longer serves Deployments as of Kubernetes 1.16
apiVersion: apps/v1
kind: Deployment
metadata:
  name: skywalking
  namespace: uat
spec:
  replicas: 1
  selector:
    matchLabels:
      app: skywalking
  template:
    metadata:
      labels:
        app: skywalking
    spec:
      imagePullSecrets:
      - name: registry-pull-secret
      nodeSelector:
        apm: skywalking
      containers:
      - name: skywalking
        image: registry.cn-xx.xx.com/keking/kk-skywalking:5.2
        imagePullPolicy: Always
        env:
        - name: elasticsearch_clusterName
          value: elasticsearch
        - name: elasticsearch_clusterNodes
          value: 172.16.16.129:31300
        - name: skywalking_password
          value: xxx
        - name: real_host
          valueFrom:
            fieldRef:
              fieldPath: status.podIP
        resources:
          limits:
            cpu: 1000m
            memory: 4Gi
          requests:
            cpu: 700m
            memory: 2Gi

---
apiVersion: v1
kind: Service
metadata:
  name: skywalking
  namespace: uat
  labels:
    app: skywalking
spec:
  selector:
    app: skywalking
  ports:
  - name: web-a
    port: 8080
    targetPort: 8080
    nodePort: 31180
  - name: web-b
    port: 10800
    targetPort: 10800
    nodePort: 31181
  - name: web-c
    port: 11800
    targetPort: 11800
    nodePort: 31182
  - name: web-d
    port: 12800
    targetPort: 12800
    nodePort: 31183
  type: NodePort

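The only thing to watch in the Kubernetes manifests is how the pod IP is obtained in env. Several addresses in SkyWalking must be bound to the container's real IP, and that IP can be passed into the container through an environment variable.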
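
Since everything depends on `real_host` carrying a concrete address, a pre-start guard can catch an empty or wildcard value early. The sketch below is an illustration only: `valid_real_host` is a hypothetical helper that is not in the original image, and the check is deliberately loose (digits and at least three dots), not a full IPv4 validation.

```shell
#!/usr/bin/env sh
# Hypothetical pre-start check: real_host should look like a concrete IPv4 address.
valid_real_host() {
    case "$1" in
        ''|0.0.0.0) return 1 ;;   # unset, or the wildcard address
        *[!0-9.]*)  return 1 ;;   # anything other than digits and dots
        *.*.*.*)    return 0 ;;   # loose dotted-quad shape
        *)          return 1 ;;
    esac
}

# Possible use before the sed scripts run (assumption):
# valid_real_host "${real_host:-}" || { echo "real_host is not a usable pod IP" >&2; exit 1; }
```

With the Deployment above, `real_host` is filled from `status.podIP`, so the guard would only trip if that env entry were missing or overridden.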

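Closing remarks

The whole containerized deployment, from testing to usable, took about one day. Over an hour of that went into trying Tan's skywalking-docker image (https://hub.docker.com/r/wutang/skywalking-docker/). I found that one script had a permission problem (Tan reports that it is fixed, but I have not had time to test it yet), and a few things were hard for me to control, so I built my own Docker image. The biggest problem was network communication inside the cluster. At first I set all the service IPs in SkyWalking to 0.0.0.0 and exposed them through the cluster's nodePort. At that point an agent could reach the naming service via the cluster IP plus port 31181, but the collector gRPC address it got back from the naming service was 0.0.0.0:11800, which the agent obviously cannot reach. Binding the pod IP solved this.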
