Last updated: September 17, 2024
In this tutorial, we’ll learn different methods to clean up Kubernetes Job resources to prevent cluttering up the cluster.
Job and CronJob are Kubernetes APIs for declaratively creating short-lived tasks that run to completion. At a high level, the Job resource creates Pods that execute a certain workload to completion, while the CronJob resource builds on top of the Job resource to let us manage a Job's lifecycle with time-based scheduling.
In Kubernetes, a Job is a resource that spawns and executes Pods to completion. It's typically used for running short-lived tasks, such as resource cleanup or batch processing. A Job is considered complete when its Pods terminate successfully without any error.
To create a Job resource, we define its specification using the Kubernetes Job API in the YAML format:
$ cat standalone-job.yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: hello-job
spec:
  template:
    spec:
      containers:
      - name: hello-container
        image: alpine:latest
        command: ["echo", "Hello, Kubernetes!"]
      restartPolicy: Never
  backoffLimit: 4
The Job specification above defines a simple Job resource that starts a container using the alpine:latest container image. Then, we instruct the container to run the command echo “Hello, Kubernetes!” using the command field.
Additionally, we use the restartPolicy field to specify that the Pods in our Job should never be restarted; if a Pod exits with a non-zero status, it's considered failed.
When Pods fail, the Job resource can decide whether a retry is needed. This is controlled by the backoffLimit field. In our example, we've set backoffLimit to four, so Kubernetes retries the execution up to four times before marking the Job as failed.
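As a point of comparison, a Job's pod template accepts only one other restartPolicy value, OnFailure. A minimal sketch of that variant (the manifest below is illustrative, not part of the running example):

```yaml
# Hypothetical variant of the pod template: with OnFailure, the kubelet
# restarts the failed container inside the same Pod, instead of the Job
# controller creating a replacement Pod as it does with Never.
spec:
  template:
    spec:
      containers:
      - name: hello-container
        image: alpine:latest
        command: ["echo", "Hello, Kubernetes!"]
      restartPolicy: OnFailure
```

Each in-place container restart still counts against the Job's backoffLimit.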
A CronJob is a higher-level abstraction of a Job in Kubernetes that offers various APIs for managing a Job’s lifecycle. Primarily, a CronJob resource can control the Job creation schedule using cron syntax. Besides that, a CronJob also allows us to control the amount of Job execution history we want to keep declaratively.
For example, we can create a CronJob that creates a Job resource once every minute:
$ cat hello-cronjob.yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: hello-cronjob
spec:
  schedule: "* * * * *"
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: hello-job
            image: alpine
            command: ["echo", "Hello, Kubernetes!"]
          restartPolicy: Never
From the CronJob manifest above, we can see that a CronJob resource definition requires a jobTemplate that describes the Job it should create. Additionally, the schedule field allows us to define the Job creation schedule using cron syntax.
When we apply this manifest, the CronJob controller creates a Job resource from the jobTemplate of our hello-cronjob resource according to the schedule, in this case once every minute.
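The schedule field uses standard five-field cron syntax: minute, hour, day of month, month, and day of week. A few illustrative values we could substitute into the manifest above:

```yaml
# Illustrative schedule values (five-field cron syntax):
#   "*/5 * * * *"  - every five minutes
#   "0 0 * * *"    - daily at midnight
#   "0 9 * * 1"    - every Monday at 09:00
schedule: "*/5 * * * *"
```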
By default, the Kubernetes controller does not automatically remove Job resources even after completion. This applies to both successful and failed Jobs. Additionally, Kubernetes retains the Pods created by these Jobs with their status marked as Completed or Error.
Let’s demonstrate this behavior with a practical example. Firstly, we create the Job resource in our cluster as defined by the standalone-job.yaml using the kubectl apply command:
$ kubectl apply -f standalone-job.yaml
job.batch/hello-job created
When we create a Job resource, the Kubernetes controller will spawn the relevant Pods as defined in the manifest and run them to completion.
We can get a list of Job resources in our cluster using the kubectl get jobs command:
$ kubectl get jobs
NAME        COMPLETIONS   DURATION   AGE
hello-job   1/1           17s        32s
We can see from the output that our hello-job is completed. Notably, Kubernetes keeps the Job resource in the cluster even after it’s completed. Similarly, the Pods associated with the Jobs are retained in the cluster:
$ kubectl get pods
NAME              READY   STATUS      RESTARTS   AGE
hello-job-8kspm   0/1     Completed   0          23s
This history of executions is invaluable for providing visibility into the outcome of each Job resource. Additionally, it enables us to debug and diagnose Job executions, since the logs and status information of failed Jobs are persisted.
However, these histories can clutter up the cluster if it frequently creates short-lived Jobs without periodic cleanup of the completed Job resources. To keep our cluster tidy, we should define and implement cleanup policies for completed Job resources.
Fortunately, Kubernetes provides two main ways of cleaning up completed Job resources. We can either rely on the CronJob construct to clean up completed Jobs, or we can clean up manually using the kubectl delete jobs command.
The CronJob resource defines two fields for controlling the cleanup policy of past Job executions: failedJobsHistoryLimit and successfulJobsHistoryLimit. As the name implies, failedJobsHistoryLimit defines the number of failed Job executions the CronJob retains; its default value of one keeps only the most recent failed Job execution in the cluster.
Similarly, successfulJobsHistoryLimit specifies the number of successful Job executions to retain, with a default value of three.
Let's extend our hello-cronjob CronJob resource so we retain only the two most recent successful Job executions using the successfulJobsHistoryLimit field:
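The two fields can of course be set together. As a sketch, a CronJob that discards successful executions immediately but keeps a longer failed history for debugging might look like this (the values here are illustrative assumptions, not recommendations):

```yaml
# Illustrative fragment of a CronJob spec:
spec:
  schedule: "* * * * *"
  successfulJobsHistoryLimit: 0   # remove successful Jobs as soon as they finish
  failedJobsHistoryLimit: 5       # keep the last five failed Jobs for debugging
  jobTemplate:
    # ... same jobTemplate as in the earlier manifest
```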
$ cat hello-cronjob.yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: hello-cronjob
spec:
  schedule: "* * * * *"
  successfulJobsHistoryLimit: 2
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: hello-job
            image: alpine
            command: ["echo", "Hello, Kubernetes!"]
          restartPolicy: Never
Let’s test it out by applying the manifest to our Kubernetes cluster using the kubectl command:
$ kubectl apply -f hello-cronjob.yaml
cronjob.batch/hello-cronjob created
After our CronJob resource is created, we can observe that Kubernetes creates a hello-cronjob Job resource once every minute. As expected, Kubernetes only keeps the last two successful executions of the Job managed by this CronJob in the cluster:
$ kubectl get jobs
NAME                     COMPLETIONS   DURATION   AGE
hello-cronjob-28761583   1/1           6s         67s
hello-cronjob-28761584   1/1           5s         7s
To clean up Job resources that are not managed by a CronJob resource, we can use the kubectl delete command. The kubectl delete command is a general-purpose command that deletes resources from the cluster.
To delete a Job by its name, we can use the kubectl delete jobs command, followed by the Job resource name:
$ kubectl delete job hello-job
job.batch "hello-job" deleted
Besides that, the kubectl delete jobs command supports the --field-selector flag for targeting Job resources by certain attributes. For example, we can delete all the successful Job executions from our cluster using the status.successful=1 selector:
$ kubectl delete jobs --field-selector status.successful=1
job.batch "hello-cronjob-28761592" deleted
job.batch "hello-cronjob-28761593" deleted
Additionally, we can target Job resources by their label using the -l option. For instance, we can delete Job resources with the label job-type=cleanup-after-done:
$ kubectl delete jobs -l job-type=cleanup-after-done --field-selector status.successful=1
The command above combines the -l and --field-selector options to target Jobs that have the label job-type=cleanup-after-done and have exited successfully.
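For the label selector to match, the Job (or the CronJob's jobTemplate) must carry that label in its metadata. A hypothetical labeled Job manifest, assuming the same job-type=cleanup-after-done convention, might look like this:

```yaml
# Hypothetical Job carrying the label targeted by the -l selector above;
# the name, image, and command are illustrative assumptions.
apiVersion: batch/v1
kind: Job
metadata:
  name: cleanup-job
  labels:
    job-type: cleanup-after-done
spec:
  template:
    spec:
      containers:
      - name: cleanup-container
        image: alpine:latest
        command: ["echo", "cleaning up"]
      restartPolicy: Never
```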
In this tutorial, we've learned that a Job is a Kubernetes resource that runs Pods to completion, and that the CronJob resource builds on top of Job to schedule Job executions and clean up Job execution history.
We then saw that the CronJob resource offers the failedJobsHistoryLimit and successfulJobsHistoryLimit fields for limiting the amount of Job execution history to keep. Finally, we looked at the kubectl delete jobs command for deleting Job resources manually.