在kubernetes中部署skywalking(elasticsearch7集群)
根据skywalking不同版本,需要不同版本es集群,es6集群部署 ,这次使用elasticsearch7的集群作为skywalking的存储,pv的创建同样使用在kubernetes中statefu方式安装elasticsearch-6集群步骤
下面是elasticsearch 7 rbac配置
设置RBAC
---
apiVersion: v1
kind: ServiceAccount
metadata:
name: skywalking-sa-cluster
---
kind: Role
apiVersion: rbac.authorization.k8s.io/v1
metadata:
name: skywalking-sa-role
namespace: default
rules:
- apiGroups: [""]
resources: ["endpoints"]
verbs: ["get", "list", "watch", "create", "update", "patch"]
---
kind: RoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
name: skywalking-sa-role-binding
namespace: default
subjects:
- kind: ServiceAccount
name: skywalking-sa-cluster
roleRef:
apiGroup: rbac.authorization.k8s.io
kind: Role
name: skywalking-sa-role
---
kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1
metadata:
name: skywalking-sa-cluster-role
rules:
- apiGroups: [ "" ]
resources:
- "pods" # @feature: cluster; OAP needs to read other OAP Pods information to form a cluster
# @feature: als; OAP needs to read Pods metadata to analyze the access logs
- "services" # @feature: als; OAP needs to read services metadata to analyze the access logs
- "endpoints" # @feature: als; OAP needs to read endpoints metadata to analyze the access logs
- "nodes" # @feature: als; OAP needs to read nodes metadata to analyze the access logs
- "configmaps"
verbs: [ "get", "watch", "list" ]
- apiGroups: [ "batch" ]
resources:
- "jobs" # @feature: cluster; OAP needs to wait for the init job to complete
verbs: [ "get", "watch", "list" ]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
name: skywalking-sa-cluster-role-binding
roleRef:
apiGroup: rbac.authorization.k8s.io
kind: ClusterRole
name: skywalking-sa-cluster-role
subjects:
- kind: ServiceAccount
name: skywalking-sa-cluster
namespace: default
elasticsearch 7的参数配置,statefulset 部署配置
---
apiVersion: v1
kind: ConfigMap
metadata:
name: es-cm
namespace: default
data:
elasticsearch.yml: |
node.data: ${NODE_DATA:true}
node.master: ${NODE_MASTER:true}
node.ingest: ${NODE_INGEST:true}
node.name: ${HOSTNAME}
network.host: 0.0.0.0
bootstrap.memory_lock: ${BOOTSTRAP_MEMORY_LOCK:false}
processors: ${PROCESSORS:}
gateway.expected_master_nodes: ${EXPECTED_MASTER_NODES:2}
gateway.expected_data_nodes: ${EXPECTED_DATA_NODES:1}
gateway.recover_after_time: ${RECOVER_AFTER_TIME:5m}
gateway.recover_after_master_nodes: ${RECOVER_AFTER_MASTER_NODES:2}
gateway.recover_after_data_nodes: ${RECOVER_AFTER_DATA_NODES:1}
http.cors.enabled: true
http.cors.allow-origin: "*"
xpack.security.enabled: false
ingest.geoip.downloader.enabled: false
log4j2.properties: |-
status = error
appender.console.type = Console
appender.console.name = console
appender.console.layout.type = PatternLayout
appender.console.layout.pattern = [%d{ISO8601}][%-5p][%-25c{1.}] %marker%m%n
rootLogger.level = info
rootLogger.appenderRef.console.ref = console
logger.searchguard.name = com.floragunn
logger.searchguard.level = info
---
kind: StatefulSet
apiVersion: apps/v1
metadata:
name: elasticsearch
namespace: default
spec:
serviceName: es-svc
replicas: 3
selector:
matchLabels:
app: elasticsearch
template:
metadata:
labels:
app: elasticsearch
spec:
nodeSelector:
deploytype: normal
initContainers:
- name: init
image: busybox:1.32.0
command: ['/bin/sh','-c','sysctl -w vm.max_map_count=262144 && sleep 1']
imagePullPolicy: IfNotPresent
securityContext:
privileged: true
- name: chown
image: busybox:1.32.0
command: ['/bin/sh','-c','chown -R 1000:1000 /usr/share/elasticsearch/data']
imagePullPolicy: IfNotPresent
securityContext:
privileged: true
volumeMounts:
- name: es-data
mountPath: /usr/share/elasticsearch/data
serviceAccount: skywalking-sa-cluster
containers:
- name: elasticsearch
image: elasticsearch:7.16.2
imagePullPolicy: IfNotPresent
resources:
limits:
cpu: 2
memory: 2G
requests:
cpu: 500m
memory: 512Mi
ports:
- containerPort: 9200
name: http
- containerPort: 9300
name: transport
readinessProbe:
httpGet:
path: /_cluster/health?local=true
port: 9200
initialDelaySeconds: 10
periodSeconds: 20
timeoutSeconds: 10
env:
- name: node.name
valueFrom:
fieldRef:
apiVersion: v1
fieldPath: metadata.name
- name: cluster.name
value: es-cluster
- name: DISCOVERY_SERVICE
value: "es-svc"
- name: ES_JAVA_OPTS
value: "-Djava.net.preferIPv4Stack=true -Xms1400m -Xmx1400m"
- name: discovery.seed_hosts
value: "elasticsearch-0.es-svc,elasticsearch-1.es-svc,elasticsearch-2.es-svc"
- name: cluster.initial_master_nodes
value: "elasticsearch-0,elasticsearch-1,elasticsearch-2 "
volumeMounts:
- name: es-data
mountPath: /usr/share/elasticsearch/data
- mountPath: /usr/share/elasticsearch/config/elasticsearch.yml
name: config
subPath: elasticsearch.yml
- mountPath: /usr/share/elasticsearch/config/log4j2.properties
name: log4j2
subPath: log4j2.properties
imagePullSecrets:
- name: harbor-bbotte
volumes:
- name: config
configMap:
name: es-cm
- name: log4j2
configMap:
name: es-cm
volumeClaimTemplates:
- metadata:
name: es-data
spec:
accessModes: [ "ReadWriteMany" ]
resources:
requests:
storage: 50Gi
---
kind: Service
apiVersion: v1
metadata:
name: es-svc
namespace: default
labels:
app: elasticsearch
spec:
type: NodePort
clusterIP: 172.30.8.248
selector:
app: elasticsearch
ports:
- port: 9200
name: es9200
targetPort: 9200
nodePort: 30920
- port: 9300
name: es9300
targetPort: 9300
注意这里es使用了clusterIP,是对es服务一个固定ip,因为skywalking要用。现在es集群已经部署完毕,查看es的接口确认服务是否正常,
curl -XGET 'http://localhost:9200/_cluster/stats?human&pretty'
最后部署skywalking,测试的时候倒是其中几个参数耽误挺长时间,比如SW_STORAGE是elasticsearch7,不是elasticsearch;SW_STORAGE_ES_CLUSTER_NODES的值是ip:port,不能是elasticsearch的服务名:port。初始化的job官方还有个探测es是否就绪的判断,这里省了。
apiVersion: v1
kind: Service
metadata:
name: oap
spec:
selector:
app: skywalking
ports:
- name: metrics
port: 1234
- name: grpc
port: 11800
- name: http
port: 12800
---
apiVersion: batch/v1
kind: Job
metadata:
name: oap-init-job # @feature: cluster; set up an init job to initialize ES templates and indices
spec:
template:
metadata:
name: oap-init-job
spec:
serviceAccountName: skywalking-sa-cluster
restartPolicy: Never
containers:
- name: oap-init
image: skywalking.docker.scarf.sh/apache/skywalking-oap-server:8.7.0-es7
imagePullPolicy: IfNotPresent
env: # @feature: cluster; make sure all env vars are the same with the cluster nodes as this will affect templates / indices
- name: JAVA_OPTS
value: "-Dmode=init " # @feature: cluster; set the OAP mode to "init" so the job can complete
- name: SW_STORAGE
value: elasticsearch7 #这里必须是6或者7,要加上
- name: SW_STORAGE_ES_CLUSTER_NODES
value: "172.30.8.248:9200" #这里必须写ip,不能写es的服务名
---
apiVersion: apps/v1
kind: Deployment
metadata:
name: skywalking
labels:
app: skywalking
spec:
replicas: 2 # @feature: cluster; set OAP replicas to >1
selector:
matchLabels:
app: skywalking
template:
metadata:
labels:
app: skywalking
spec:
initContainers:
- name: wait-for-oap-init
image: bitnami/kubectl:1.20.12
command:
- 'kubectl'
- 'wait'
- '--for=condition=complete'
- 'job/oap-init-job'
serviceAccountName: skywalking-sa-cluster
containers:
- name: skywalking
image: skywalking.docker.scarf.sh/apache/skywalking-oap-server:8.7.0-es7
imagePullPolicy: IfNotPresent
resources:
limits:
cpu: 4000m
memory: "4096Mi"
requests:
cpu: 1000m
memory: "1024Mi"
ports:
- name: metrics # @feature: so11y; set a name for the metrics port that can be referenced in otel config
containerPort: 1234
- name: grpc
containerPort: 11800
- name: http
containerPort: 12800
livenessProbe:
tcpSocket:
port: 12800
initialDelaySeconds: 15
periodSeconds: 20
env:
- name: JAVA_OPTS
value: "-Dmode=no-init"
- name: SW_CLUSTER
value: kubernetes # @feature: cluster; set cluster coordinator to kubernetes
- name: SW_CLUSTER_K8S_NAMESPACE
value: default
- name: SW_CLUSTER_K8S_LABEL
value: "app=skywalking,release=skywalking,component=oap" # @feature: cluster; set label selectors to select OAP Pods as a cluster
- name: SKYWALKING_COLLECTOR_UID
valueFrom:
fieldRef:
fieldPath: metadata.uid
- name: SW_OTEL_RECEIVER
value: default # @feature: so11y;vm;kubernetes-monitor enable OpenTelemetry receiver to receive OpenTelemetry metrics
- name: SW_OTEL_RECEIVER_ENABLED_OC_RULES
# @feature: vm; enable vm rules to analyze VM metrics
# @feature: so11y; enable oap rules to analyze OAP metrics
# @feature: kubernetes-monitor; enable rules to analyze Kubernetes Cluster/Node/Service metrics
# @feature: istiod-monitor; enable rules to analyze Istio control plane metrics
value: oap,k8s-cluster,k8s-node,k8s-service
- name: SW_STORAGE
value: elasticsearch7
- name: SW_STORAGE_ES_CLUSTER_NODES
value: "172.30.8.248:9200" #这里必须写ip,不能写es的服务名,所以es中用了clusterIP
---
apiVersion: apps/v1
kind: Deployment
metadata:
name: skywalking-ui
labels:
app: skywalking-ui
spec:
replicas: 1
selector:
matchLabels:
app: skywalking-ui
template:
metadata:
labels:
app: skywalking-ui
spec:
affinity:
containers:
- name: skywalking-ui
image: skywalking.docker.scarf.sh/apache/skywalking-ui:8.7.0
imagePullPolicy: IfNotPresent
ports:
- containerPort: 8080
name: page
env:
- name: SW_OAP_ADDRESS
value: http://oap:12800
---
apiVersion: v1
kind: Service
metadata:
name: skywalking-ui
labels:
app: skywalking-ui
spec:
type: NodePort
ports:
- port: 80
targetPort: 8080
protocol: TCP
nodePort: 30080
selector:
app: skywalking-ui
安装skywalking遇到奇奇怪怪的错误,多是因为配置的问题,还有skywalking init缺少表的,先把skysalking的Deployment containers 中 -Dmode=no-init改为 -Dmode=init,kubectl apply 这个yaml文件后,等pod启动完毕再把配置修改回来,再次kubectl apply yaml文件,可以解决问题
Elasticsearch built-in security features are not enabled. Without authentication, your cluster could be accessible to anyone. See https://www.elastic.co/guide/en/elasticsearch/reference/7.16/security-minimal-setup.html to enable security."],[299 Elasticsearch-7.16.2-2b937c44140b6559905130a8650c64dbd0879cfb "[ignore_throttled] parameter is deprecated because frozen indices have been deprecated. Consider cold or frozen tiers in place of frozen indices."
org.apache.skywalking.oap.server.core.storage.model.ModelInstaller -330978 [main] INFO [] - table: meter_oap_instance_trace_latency_percentile does not exist. OAP is running in 'no-init' mode, waiting... retry 3s later.
table: alarm_record does not exist. OAP is running in 'no-init' mode, waiting... retry 3s later
对于elasticsearch提示安全的问题,可以在deployment env中加配置项:
env:
- name: 'xpack.security.enabled'
value: 'false'
参考
skywalking-kubernetes这里是helm的模板,可以导出为yaml配置,比如:
export SKYWALKING_RELEASE_NAME=skywalking
helm template "${SKYWALKING_RELEASE_NAME}" ./chart/skywalking/ --set ui.image.tag=8.8.1 --set oap.storageType=elasticsearch --set oap.image.tag=8.8.1 > skywalking.yaml
最后,业务客户端连skywalking的服务名为 oap:11800
2021年01月24日 于 linux工匠 发表