Example support bundle specs
This topic includes example support bundle specifications. For more examples, see the Troubleshoot example repository on GitHub.
Check application health
This example shows how to use the deploymentStatus analyzer to check an application's health.
The deploymentStatus analyzer uses data from the default clusterResources collector to check if liveness probes are passing and if the required number of replicas are ready for an application deployment.
Replicated recommends using the deploymentStatus analyzer to check application health because it doesn't require any additional collectors besides clusterResources. It also requires no additional RBAC permissions. The http, exec, and runPod collectors can probe health endpoints directly, but each has environment-specific limitations:
- `http` cannot resolve in-cluster DNS from the CLI.
- `exec` requires `pods/exec` RBAC and tools such as `curl`, `wget`, or `nc` in the container.
- `runPod` requires `pods/create` RBAC. In air-gapped environments, the image must already be present on the node.
For more information, see Deployment Status and Cluster Resources in the Troubleshoot documentation.
```yaml
apiVersion: v1
kind: Secret
metadata:
  name: example
  labels:
    troubleshoot.sh/kind: support-bundle
stringData:
  support-bundle-spec: |-
    apiVersion: troubleshoot.sh/v1beta2
    kind: SupportBundle
    metadata:
      name: example
    spec:
      collectors: []
      analyzers:
        - deploymentStatus:
            name: api
            namespace: default
            outcomes:
              - fail:
                  when: "< 1"
                  message: The api deployment does not have any ready replicas. Check that the application's liveness probes are passing and that the application is not in a crash loop.
              - warn:
                  when: "= 1"
                  message: The api deployment has only a single ready replica.
              - pass:
                  message: The api deployment is healthy and has multiple ready replicas.
```
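If you do choose to probe a health endpoint directly despite the caveats above, an exec collector might look like the following. This is a sketch only: the `app=api` label, the `/healthz` path, and the port are assumptions for illustration, and the target container must include `curl`.

```yaml
collectors:
  - exec:
      collectorName: api-healthz
      name: api-healthz            # directory name inside the bundle
      selector:
        - app=api                  # assumed Pod label
      namespace: default
      command: ["curl"]
      args: ["-fsS", "http://localhost:3000/healthz"]  # assumed endpoint and port
      timeout: 5s
```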
Check HTTP requests
If your application has its own API that serves status, metrics, performance data, and so on, this information can be collected and analyzed.
The example below uses the http collector and the textAnalyze analyzer to check that an HTTP request made from the cluster to the Slack API at https://api.slack.com/methods/api.test returns a response containing `"status": 200,`.
For more information, see HTTP and Regular Expression in the Troubleshoot documentation.
```yaml
apiVersion: v1
kind: Secret
metadata:
  name: example
  labels:
    troubleshoot.sh/kind: support-bundle
stringData:
  support-bundle-spec: |-
    apiVersion: troubleshoot.sh/v1beta2
    kind: SupportBundle
    metadata:
      name: example
    spec:
      collectors:
        - http:
            collectorName: slack
            get:
              url: https://api.slack.com/methods/api.test
      analyzers:
        - textAnalyze:
            checkName: Slack Accessible
            fileName: slack.json
            regex: '"status": 200,'
            outcomes:
              - pass:
                  when: "true"
                  message: "Can access the Slack API"
              - fail:
                  when: "false"
                  message: "Cannot access the Slack API. Check that the server can reach the internet and check [status.slack.com](https://status.slack.com)."
```
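You can sanity-check a textAnalyze regex outside the cluster before shipping the spec. The Python sketch below uses an invented sample of the JSON that the http collector writes to `slack.json`; the exact payload shape here is an assumption for illustration.

```python
import re

# Invented sample of the file the http collector writes; the real file
# wraps the response status, body, and headers in a "response" object.
sample = '{"response": {"status": 200, "body": "{\\"ok\\":false}"}}'

# The same regex used in the textAnalyze analyzer above
pattern = re.compile(r'"status": 200,')

print(bool(pattern.search(sample)))  # True when the request succeeded
```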
Check Kubernetes version
The example below uses the clusterVersion analyzer to check the version of Kubernetes running in the cluster. The clusterVersion analyzer uses data from the default clusterInfo collector.
For more information, see Cluster Version and Cluster Info in the Troubleshoot documentation.
```yaml
apiVersion: v1
kind: Secret
metadata:
  name: example
  labels:
    troubleshoot.sh/kind: support-bundle
stringData:
  support-bundle-spec: |-
    apiVersion: troubleshoot.sh/v1beta2
    kind: SupportBundle
    metadata:
      name: example
    spec:
      collectors: []
      analyzers:
        - clusterVersion:
            outcomes:
              - fail:
                  message: This application relies on Kubernetes features only present in 1.16.0 and later.
                  uri: https://kubernetes.io
                  when: "< 1.16.0"
              - warn:
                  message: Your cluster is running a version of Kubernetes that is out of support.
                  uri: https://kubernetes.io
                  when: "< 1.24.0"
              - pass:
                  message: Your cluster meets the recommended and required versions of Kubernetes.
```
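Outcomes are evaluated in order and the first matching outcome wins, so the strictest condition goes first. The logic of the spec above can be sketched roughly in Python (a simplification, not the actual analyzer implementation; pre-release version suffixes are ignored):

```python
def parse(version):
    # "1.16.0" -> (1, 16, 0); tuples compare component by component
    return tuple(int(part) for part in version.split("."))

def check_cluster_version(version):
    """Sketch of ordered outcome evaluation: the first matching outcome wins."""
    v = parse(version)
    if v < parse("1.16.0"):
        return "fail"
    if v < parse("1.24.0"):
        return "warn"
    return "pass"

print(check_cluster_version("1.22.3"))  # warn
```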
Check node resources
The example below uses the nodeResources analyzer to check that the minimum requirements are met for memory, CPU cores, number of nodes, and ephemeral storage. The nodeResources analyzer uses data from the default clusterResources collector.
For more information, see Cluster Resources and Node Resources in the Troubleshoot documentation.
```yaml
apiVersion: v1
kind: Secret
metadata:
  name: example
  labels:
    troubleshoot.sh/kind: support-bundle
stringData:
  support-bundle-spec: |-
    apiVersion: troubleshoot.sh/v1beta2
    kind: SupportBundle
    metadata:
      name: example
    spec:
      collectors: []
      analyzers:
        - nodeResources:
            checkName: One node must have 2 GB RAM and 1 CPU core
            filters:
              allocatableMemory: 2Gi
              cpuCapacity: "1"
            outcomes:
              - fail:
                  when: "count() < 1"
                  message: Cannot find a node with sufficient memory and CPU
              - pass:
                  message: Sufficient CPU and memory is available
        - nodeResources:
            checkName: Must have at least 3 nodes in the cluster
            outcomes:
              - fail:
                  when: "count() < 3"
                  message: This application requires at least 3 nodes.
              - warn:
                  when: "count() < 5"
                  message: This application recommends at least 5 nodes.
              - pass:
                  message: This cluster has enough nodes.
        - nodeResources:
            checkName: Each node must have at least 40 GB of ephemeral storage
            outcomes:
              - fail:
                  when: "min(ephemeralStorageCapacity) < 40Gi"
                  message: Nodes in this cluster do not have at least 40 GB of ephemeral storage.
                  uri: https://kurl.sh/docs/install-with-kurl/system-requirements
              - warn:
                  when: "min(ephemeralStorageCapacity) < 100Gi"
                  message: Nodes in this cluster are recommended to have at least 100 GB of ephemeral storage.
                  uri: https://kurl.sh/docs/install-with-kurl/system-requirements
              - pass:
                  message: The nodes in this cluster have enough ephemeral storage.
```
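The `min(ephemeralStorageCapacity) < 40Gi` condition compares the smallest node capacity against a Kubernetes quantity. A rough Python sketch of that check (a simplification: it handles only binary suffixes, not the full Kubernetes quantity grammar):

```python
def parse_quantity(q):
    # Minimal parser for binary-suffix quantities ("40Gi", "512Mi");
    # the real Kubernetes format also supports decimal suffixes and notation.
    units = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40}
    for suffix, factor in units.items():
        if q.endswith(suffix):
            return int(q[: -len(suffix)]) * factor
    return int(q)

def min_ephemeral_storage_ok(capacities, threshold="40Gi"):
    # Mirrors the fail condition above: the smallest node capacity
    # must be at least the threshold
    return min(parse_quantity(c) for c in capacities) >= parse_quantity(threshold)

print(min_ephemeral_storage_ok(["100Gi", "32Gi"]))  # False: one node is below 40Gi
```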
Check node status
The following example uses the nodeResources analyzer to check the status of the nodes in the cluster. The nodeResources analyzer uses data from the default clusterResources collector.
For more information, see Node Resources and Cluster Resources in the Troubleshoot documentation.
```yaml
apiVersion: v1
kind: Secret
metadata:
  name: example
  labels:
    troubleshoot.sh/kind: support-bundle
stringData:
  support-bundle-spec: |-
    apiVersion: troubleshoot.sh/v1beta2
    kind: SupportBundle
    metadata:
      name: example
    spec:
      collectors: []
      analyzers:
        - nodeResources:
            checkName: Node status check
            outcomes:
              - fail:
                  when: "nodeCondition(Ready) == False"
                  message: "Not all nodes are online."
              - warn:
                  when: "nodeCondition(Ready) == Unknown"
                  message: "Not all nodes are online."
              - pass:
                  message: "All nodes are online."
```
Collect logs using multiple selectors
The example below uses the logs collector to collect logs from various Pods where application workloads are running. It also uses the textAnalyze analyzer to analyze the logs for a known error.
For more information, see Pod Logs and Regular Expression in the Troubleshoot documentation.
You can use the selector attribute of the logs collector to find Pods that have the specified labels. Depending on the complexity of an application's labeling schema, you might need a few different declarations of the logs collector, as shown in this example. You can include the logs collector as many times as needed.
```yaml
apiVersion: v1
kind: Secret
metadata:
  name: example
  labels:
    troubleshoot.sh/kind: support-bundle
stringData:
  support-bundle-spec: |-
    apiVersion: troubleshoot.sh/v1beta2
    kind: SupportBundle
    metadata:
      name: example
    spec:
      collectors:
        - logs:
            namespace: {{ .Release.Namespace }}
            selector:
              - app=slackernews-nginx
        - logs:
            namespace: {{ .Release.Namespace }}
            selector:
              - app=slackernews-api
        - logs:
            namespace: {{ .Release.Namespace }}
            selector:
              - app=slackernews-frontend
        - logs:
            selector:
              - app=postgres
      analyzers:
        - textAnalyze:
            checkName: Axios Errors
            fileName: slackernews-frontend-*/slackernews.log
            regex: "error - AxiosError"
            outcomes:
              - pass:
                  when: "false"
                  message: "Axios errors not found in logs"
              - fail:
                  when: "true"
                  message: "Axios errors found in logs"
```
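Each selector entry uses the `key=value` label form, and a Pod is collected only if it matches every entry in that collector's list. A rough Python sketch of that matching (a simplification of Kubernetes label selectors, which also support set-based expressions):

```python
def matches(pod_labels, selector):
    # selector entries use the "key=value" form shown in the spec above;
    # a Pod must match every entry to be selected
    pairs = (entry.split("=", 1) for entry in selector)
    return all(pod_labels.get(key) == value for key, value in pairs)

print(matches({"app": "slackernews-api", "tier": "backend"},
              ["app=slackernews-api"]))  # True
```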
Collect logs using limits
The example below uses the logs collector to collect Pod logs from the Pod where the application is running. This specification uses the limits field to set a maxAge and maxLines to limit the output provided.
For more information, see Pod Logs in the Troubleshoot documentation.
```yaml
apiVersion: v1
kind: Secret
metadata:
  name: example
  labels:
    troubleshoot.sh/kind: support-bundle
stringData:
  support-bundle-spec: |-
    apiVersion: troubleshoot.sh/v1beta2
    kind: SupportBundle
    metadata:
      name: example
    spec:
      collectors:
        - logs:
            selector:
              - app.kubernetes.io/name=myapp
            namespace: {{ .Release.Namespace }}
            limits:
              maxAge: 720h
              maxLines: 10000
```
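The effect of these limits can be sketched in Python: `maxAge: 720h` drops log lines older than 30 days, and `maxLines: 10000` keeps only the most recent lines. This is an illustration of the cutoff semantics, not the collector's actual implementation.

```python
from datetime import datetime, timedelta, timezone

def within_max_age(line_timestamp, max_age_hours=720):
    # maxAge: 720h keeps only lines newer than 30 days
    cutoff = datetime.now(timezone.utc) - timedelta(hours=max_age_hours)
    return line_timestamp >= cutoff

def apply_max_lines(lines, max_lines=10000):
    # maxLines keeps only the most recent N lines
    return lines[-max_lines:]

print(len(apply_max_lines(list(range(15000)), max_lines=10000)))  # 10000
```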
Collect Redis and MySQL server information
The following example uses the redis and mysql collectors to collect information about Redis and MySQL servers running in the cluster.
For more information, see Redis and MySQL in the Troubleshoot documentation.
```yaml
apiVersion: v1
kind: Secret
metadata:
  name: example
  labels:
    troubleshoot.sh/kind: support-bundle
stringData:
  support-bundle-spec: |-
    apiVersion: troubleshoot.sh/v1beta2
    kind: SupportBundle
    metadata:
      name: example
    spec:
      collectors:
        - mysql:
            collectorName: mysql
            uri: 'root:my-secret-pw@tcp(localhost:3306)/mysql'
            parameters:
              - character_set_server
              - collation_server
              - init_connect
              - innodb_file_format
              - innodb_large_prefix
              - innodb_strict_mode
              - log_bin_trust_function_creators
        - redis:
            collectorName: my-redis
            uri: rediss://default:replicated@server:6380
```
Run and analyze a pod
The example below uses the textAnalyze analyzer to check that a command successfully executes in a Pod running in the cluster. The Pod specification is defined in the runPod collector.
For more information, see Run Pods and Regular Expression in the Troubleshoot documentation.
```yaml
apiVersion: v1
kind: Secret
metadata:
  name: example
  labels:
    troubleshoot.sh/kind: support-bundle
stringData:
  support-bundle-spec: |-
    apiVersion: troubleshoot.sh/v1beta2
    kind: SupportBundle
    metadata:
      name: example
    spec:
      collectors:
        - runPod:
            collectorName: "static-hi"
            podSpec:
              containers:
                - name: static-hi
                  image: alpine:3
                  command: ["echo", "hi static!"]
      analyzers:
        - textAnalyze:
            checkName: Said hi!
            fileName: /static-hi.log
            regex: 'hi static'
            outcomes:
              - fail:
                  message: Didn't say hi.
              - pass:
                  message: Said hi!
```