Quickly deploying the victoria-metrics-k8s-stack bundle with sealos
sealos run docker.io/labring/victoria-metrics-k8s-stack:v1.96.0
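Unlike the Prometheus stack, the VM stack needs manual configuration for metrics endpoints that require certificates.
After deployment, the vmagent log shows errors like the following: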
warn VictoriaMetrics/lib/promscrape/scrapework.go:387 cannot scrape target "https://192.168.0.55:2379/metrics" ({endpoint="http-metrics",instance="192.168.0.55:2379",job="kube-etcd",namespace="kube-system",pod="etcd-dev-55",service="victoria-metrics-k8s-stack-kube-etcd"}) 1 out of 1 times during -promscrape.suppressScrapeErrorsDelay=0s; the last error: cannot read data: cannot scrape "https://192.168.0.55:2379/metrics": Get "https://192.168.0.55:2379/metrics": remote error: tls: bad certificate
cannot scrape target "https://192.168.0.55:10257/metrics" ({endpoint="http-metrics",instance="192.168.0.55:10257",job="kube-controller-manager",namespace="kube-system",pod="kube-controller-manager-dev-55",service="victoria-metrics-k8s-stack-kube-controller-manager"}) 1 out of 1 times during -promscrape.suppressScrapeErrorsDelay=0s; the last error: cannot read data: cannot scrape "https://192.168.0.55:10257/metrics": Get "https://192.168.0.55:10257/metrics": tls: failed to verify certificate: x509: certificate is valid for localhost, localhost, not kubernetes
Scraping fails for the cluster's etcd, controller-manager and scheduler, so each of them needs extra configuration before its metrics can be collected.
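When there are many targets the log gets noisy, so it can help to reduce it to the list of jobs that are failing. The following is a minimal sketch that assumes the log format shown above; failing_jobs is a helper name made up for this illustration, and the deployment name in the comment is the one restarted later in this post.

```shell
# failing_jobs: read vmagent log lines on stdin and print the distinct
# job names that appear in "cannot scrape target" messages.
failing_jobs() {
    sed -n 's/.*cannot scrape target.*job="\([^"]*\)".*/\1/p' | sort -u
}

# Possible usage (needs cluster access):
#   kubectl -n vm logs deploy/vmagent-victoria-metrics-k8s-stack --all-containers | failing_jobs
```

Lines that do not contain a scrape error are dropped, so the output is just one job name per line.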
Etcd
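Start with etcd, which requires client certificates to access its metrics endpoint.
Our cluster was installed with kubeadm, so etcd runs as a static pod on the master node (/etc/kubernetes/manifests/etcd.yaml). Running the following command on a master node fetches the metrics manually: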
curl --cacert /etc/kubernetes/pki/etcd/ca.crt --cert /etc/kubernetes/pki/etcd/healthcheck-client.crt --key /etc/kubernetes/pki/etcd/healthcheck-client.key https://192.168.0.55:2379/metrics
1.> Create a Secret from the certificates so they can be mounted
kubectl -n vm create secret generic etcd-certs --from-file=/etc/kubernetes/pki/etcd/healthcheck-client.crt --from-file=/etc/kubernetes/pki/etcd/healthcheck-client.key --from-file=/etc/kubernetes/pki/etcd/ca.crt
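A note on file names: when --from-file is given a path without an explicit key, kubectl uses the file's base name as the Secret key, and a mounted Secret exposes each key as a file. The sketch below only prints the paths that would result, assuming the Secret is mounted at /etc/etcd-certs as in the next step; expected_mount_paths is a helper name made up for this illustration.

```shell
# expected_mount_paths MOUNT_DIR FILE...
# Print where each --from-file source would appear once the Secret
# is mounted at MOUNT_DIR.
expected_mount_paths() {
    mount_dir="$1"
    shift
    for src in "$@"; do
        # With no explicit key, the Secret key is the file's base name.
        echo "$mount_dir/$(basename "$src")"
    done
}

expected_mount_paths /etc/etcd-certs \
    /etc/kubernetes/pki/etcd/healthcheck-client.crt \
    /etc/kubernetes/pki/etcd/healthcheck-client.key \
    /etc/kubernetes/pki/etcd/ca.crt
```

The three printed paths are the ones that tlsConfig has to reference in step 3.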
2.> Edit the VMAgent custom resource to mount the certificate Secret
kubectl edit vmagent -n vm victoria-metrics-k8s-stack -o yaml
volumeMounts:
- mountPath: /etc/etcd-certs
  name: etcd-certs
  readOnly: true
volumes:
- name: etcd-certs
  secret:
    secretName: etcd-certs
apiVersion: operator.victoriametrics.com/v1beta1
kind: VMAgent
metadata:
  annotations:
    meta.helm.sh/release-name: victoria-metrics-k8s-stack
    meta.helm.sh/release-namespace: vm
  creationTimestamp: "2024-04-07T08:20:54Z"
  finalizers:
  - apps.victoriametrics.com/finalizer
  generation: 4
  labels:
    app.kubernetes.io/instance: victoria-metrics-k8s-stack
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/name: victoria-metrics-k8s-stack
    app.kubernetes.io/version: v1.96.0
    helm.sh/chart: victoria-metrics-k8s-stack-0.18.11
  name: victoria-metrics-k8s-stack
  namespace: vm
  resourceVersion: "87502302"
  uid: eb563784-5e3d-4949-af3b-ec6f3fb137e7
spec:
  additionalScrapeConfigs:
    key: prometheus-additional.yaml
    name: additional-scrape-configs
  arbitraryFSAccessThroughSMs: {}
  externalLabels:
    cluster: cluster-name
  extraArgs:
    promscrape.streamParse: "true"
  image:
    tag: v1.96.0
  remoteWrite:
  - url: http://vminsert-victoria-metrics-k8s-stack.vm.svc:8480/insert/0/prometheus/api/v1/write
  resources: {}
  scrapeInterval: 20s
  selectAllByDefault: true
  volumeMounts:
  - mountPath: /etc/etcd-certs
    name: etcd-certs
    readOnly: true
  volumes:
  - name: etcd-certs
    secret:
      secretName: etcd-certs
status:
  availableReplicas: 0
  replicas: 1
  selector: ""
  shards: 0
  unavailableReplicas: 0
  updatedReplicas: 0
Add vmagent.spec.volumes and vmagent.spec.volumeMounts.
The field definitions are documented in the operator API reference: https://docs.victoriametrics.com/operator/api/#vmagentspec
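The same change can probably be applied without an interactive edit. This is a sketch, not something tested on the cluster in this post: it assumes the VMAgent has no other volumes or volumeMounts (a JSON merge patch replaces the whole list), and vmagent_volume_patch is a hypothetical helper that only prints the patch.

```shell
# vmagent_volume_patch SECRET MOUNT_PATH
# Print a JSON merge patch that sets spec.volumes and spec.volumeMounts
# so that SECRET is mounted read-only at MOUNT_PATH.
vmagent_volume_patch() {
    secret="$1"
    mount_path="$2"
    printf '{"spec":{"volumes":[{"name":"%s","secret":{"secretName":"%s"}}],' "$secret" "$secret"
    printf '"volumeMounts":[{"name":"%s","mountPath":"%s","readOnly":true}]}}\n' "$secret" "$mount_path"
}

# Print the patch for review.
vmagent_volume_patch etcd-certs /etc/etcd-certs

# To apply it (needs cluster access):
#   kubectl -n vm patch vmagent victoria-metrics-k8s-stack --type merge \
#     -p "$(vmagent_volume_patch etcd-certs /etc/etcd-certs)"
```

Printing the patch first makes it easy to compare against the YAML above before touching the resource.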
3.> Edit the VMServiceScrape scrape configuration:
tlsConfig:
  caFile: /etc/etcd-certs/ca.crt
  certFile: /etc/etcd-certs/healthcheck-client.crt
  keyFile: /etc/etcd-certs/healthcheck-client.key
Run kubectl edit vmservicescrapes.operator.victoriametrics.com -n vm victoria-metrics-k8s-stack-kube-etcd -o yaml and set tlsConfig.caFile, certFile and keyFile. The result looks like this:
kubectl get vmservicescrapes.operator.victoriametrics.com -n vm victoria-metrics-k8s-stack-kube-etcd -o yaml
apiVersion: operator.victoriametrics.com/v1beta1
kind: VMServiceScrape
metadata:
  annotations:
    meta.helm.sh/release-name: victoria-metrics-k8s-stack
    meta.helm.sh/release-namespace: vm
  creationTimestamp: "2024-04-07T08:20:54Z"
  generation: 4
  labels:
    app.kubernetes.io/instance: victoria-metrics-k8s-stack
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/name: victoria-metrics-k8s-stack
    app.kubernetes.io/version: v1.96.0
    helm.sh/chart: victoria-metrics-k8s-stack-0.18.11
  name: victoria-metrics-k8s-stack-kube-etcd
  namespace: vm
  resourceVersion: "87505107"
  uid: 89bb5105-bcc3-4201-bf15-e2fb05408235
spec:
  endpoints:
  - bearerTokenFile: /var/run/secrets/kubernetes.io/serviceaccount/token
    port: http-metrics
    scheme: https
    tlsConfig:
      caFile: /etc/etcd-certs/ca.crt
      certFile: /etc/etcd-certs/healthcheck-client.crt
      keyFile: /etc/etcd-certs/healthcheck-client.key
  jobLabel: jobLabel
  namespaceSelector:
    matchNames:
    - kube-system
  selector:
    matchLabels:
      app: victoria-metrics-k8s-stack-kube-etcd
      app.kubernetes.io/instance: victoria-metrics-k8s-stack
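Step 3 could also be done with a JSON patch instead of kubectl edit. A rough sketch under two assumptions: the etcd endpoint is the first (index 0) entry in spec.endpoints, as in the output above, and replacing the whole tlsConfig of that endpoint is acceptable. etcd_tls_patch is a hypothetical helper that only prints the patch.

```shell
# etcd_tls_patch [CERT_DIR]
# Print a JSON patch that sets tlsConfig on the first endpoint to the
# client certificate files under CERT_DIR (default: /etc/etcd-certs).
etcd_tls_patch() {
    cert_dir="${1:-/etc/etcd-certs}"
    printf '[{"op":"add","path":"/spec/endpoints/0/tlsConfig","value":{'
    printf '"caFile":"%s/ca.crt","certFile":"%s/healthcheck-client.crt","keyFile":"%s/healthcheck-client.key"}}]\n' \
        "$cert_dir" "$cert_dir" "$cert_dir"
}

# Print the patch for review.
etcd_tls_patch

# To apply it (needs cluster access):
#   kubectl -n vm patch vmservicescrapes.operator.victoriametrics.com \
#     victoria-metrics-k8s-stack-kube-etcd --type json -p "$(etcd_tls_patch)"
```

An "add" operation on an existing object member replaces its value, so any tlsConfig that was there before is overwritten.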
4.> Restart vmagent:
kubectl rollout restart deploy -n vm vmagent-victoria-metrics-k8s-stack
Controller-manager, scheduler
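Unlike etcd, controller-manager and scheduler do not need client certificates. They accept the in-cluster credentials, so skipping certificate verification is enough: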
kubectl edit vmservicescrapes.operator.victoriametrics.com -n vm victoria-metrics-k8s-stack-kube-controller-manager -o yaml
kubectl edit vmservicescrapes.operator.victoriametrics.com -n vm victoria-metrics-k8s-stack-kube-scheduler -o yaml
Add tlsConfig.insecureSkipVerify: true to both. The controller-manager resource then looks like this:
kubectl get vmservicescrapes.operator.victoriametrics.com -n vm victoria-metrics-k8s-stack-kube-controller-manager -o yaml
apiVersion: operator.victoriametrics.com/v1beta1
kind: VMServiceScrape
metadata:
  annotations:
    meta.helm.sh/release-name: victoria-metrics-k8s-stack
    meta.helm.sh/release-namespace: vm
  creationTimestamp: "2024-04-07T08:20:54Z"
  generation: 2
  labels:
    app.kubernetes.io/instance: victoria-metrics-k8s-stack
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/name: victoria-metrics-k8s-stack
    app.kubernetes.io/version: v1.96.0
    helm.sh/chart: victoria-metrics-k8s-stack-0.18.11
  name: victoria-metrics-k8s-stack-kube-controller-manager
  namespace: vm
  resourceVersion: "87433090"
  uid: 338ebc6b-bd7c-41e0-91ba-4969cc35be62
spec:
  endpoints:
  - bearerTokenFile: /var/run/secrets/kubernetes.io/serviceaccount/token
    port: http-metrics
    scheme: https
    tlsConfig:
      caFile: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
      insecureSkipVerify: true
      serverName: kubernetes
  jobLabel: jobLabel
  namespaceSelector:
    matchNames:
    - kube-system
  selector:
    matchLabels:
      app: victoria-metrics-k8s-stack-kube-controller-manager
      app.kubernetes.io/instance: victoria-metrics-k8s-stack
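Since the change is identical for both resources, it may be easier to patch them in a loop than to edit each by hand. This is only a sketch: it assumes each resource has a single endpoint that already contains a tlsConfig block (as in the output above), and print_skip_verify_cmds is a made-up helper that prints the commands for review rather than running them.

```shell
# JSON patch that adds insecureSkipVerify to the first endpoint's tlsConfig.
skip_verify_patch='[{"op":"add","path":"/spec/endpoints/0/tlsConfig/insecureSkipVerify","value":true}]'

# print_skip_verify_cmds: print one kubectl patch command per component.
print_skip_verify_cmds() {
    for name in kube-controller-manager kube-scheduler; do
        echo "kubectl -n vm patch vmservicescrapes.operator.victoriametrics.com victoria-metrics-k8s-stack-$name --type json -p '$skip_verify_patch'"
    done
}

print_skip_verify_cmds
```

After checking the printed commands, they can be pasted into a shell with cluster access. Keep in mind that insecureSkipVerify disables server certificate checks for these two targets.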