In this tutorial, we'll look at how to run cron jobs in Kubernetes.
2. What Is a Cron Job?
For background, a cron job refers to any task that is repeated on a schedule. Unix and most related operating systems offer some cron job functionality.
A typical use case for cron jobs is to automatically perform important tasks on a recurring basis. For example:
- Cleaning disk space
- Backing up files or directories
- Generating metrics or reports
Cron jobs run on a schedule. Using a standard notation, we can define a wide range of schedules to execute a job:
- Every night at 10:00 PM
- The 1st day of every month at 6:00 AM
- Every day at 8:00 AM and 6:00 PM
- Every Tuesday at 7:00 AM
3. Defining Cron Jobs in Kubernetes
Starting with version 1.21, Kubernetes provides first-class support for cron jobs through the stable batch/v1 CronJob API. First, let's look at how to define a cron job and how to set its schedule.
3.1. Defining Cron Jobs
Cron jobs in Kubernetes are similar to other workloads such as deployments or daemon sets. In fact, the YAML for defining cron jobs looks very similar:
apiVersion: batch/v1
kind: CronJob
metadata:
  name: cleanup-job
spec:
  schedule: "0 2 * * *"
  concurrencyPolicy: Allow
  suspend: false
  successfulJobsHistoryLimit: 10
  failedJobsHistoryLimit: 3
  startingDeadlineSeconds: 60
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: cleanup-job
            image: busybox:1.28
            imagePullPolicy: IfNotPresent
            command:
            - /bin/sh
            - -c
            # Run through a shell so that the /tmp/* glob is expanded
            - rm -f /tmp/*
          restartPolicy: OnFailure
The above YAML defines a cron job that runs every day at 2:00 AM and cleans up files in the /tmp directory.
As mentioned earlier, the YAML for cron jobs is nearly the same as for other workloads in Kubernetes. The jobTemplate portion of the configuration has the same structure as a regular Job, and the pod template nested inside it is the same pod template used by deployments, replica sets, and other types of workloads.
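To make the nesting concrete, here is a minimal sketch of the three layers in a cron job manifest. The name report-job, the echo command, and the specific limits are assumptions chosen for illustration. The fields backoffLimit, activeDeadlineSeconds, and ttlSecondsAfterFinished belong to the Job level:

```yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: report-job                  # hypothetical name
spec:                               # CronJob level: when and how runs are started
  schedule: "0 6 1 * *"
  jobTemplate:
    spec:                           # Job level: retries and run limits
      backoffLimit: 2               # retry a failed run at most twice
      activeDeadlineSeconds: 600    # stop a run that takes longer than 10 minutes
      ttlSecondsAfterFinished: 3600 # remove the finished job after one hour
      template:
        spec:                       # Pod level: the usual pod spec
          restartPolicy: Never      # jobs allow only Never or OnFailure
          containers:
          - name: report
            image: busybox:1.28
            command: ["/bin/sh", "-c", "echo generating report"]
```

One difference from a deployment's pod template is worth keeping in mind: pods created by a job must use a restartPolicy of Never or OnFailure.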
The main difference is that cron job specifications contain additional fields for defining cron job execution behavior:
- schedule: Required field that specifies the cron job schedule using standard cron syntax.
- concurrencyPolicy: Optional field that specifies how to handle concurrent jobs. The default is Allow, meaning more than one job can run simultaneously. Other possible values are Forbid (a new run is skipped while the previous one is still running) or Replace (a new run replaces any job that is still running).
- suspend: Optional field that specifies whether future executions of a job should be skipped. Executions that have already started are not affected. Default is false.
- successfulJobsHistoryLimit: Optional field that specifies how many successful executions to track in history. Default is 3.
- failedJobsHistoryLimit: Optional field that specifies how many failed executions to track in history. Default is 1.
- startingDeadlineSeconds: Optional field that specifies how many seconds a job is allowed to miss its scheduled start time before it is considered failed. The default is to not enforce any such deadline.
Note that only one of the fields, schedule, is required. We'll take a closer look at this field later on.
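As an illustration of how these fields combine, the following sketch defines a job that runs every 15 minutes but never overlaps with itself. The name backup-job, the echo command, and the chosen limits are assumptions made for this example:

```yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: backup-job                  # hypothetical name
spec:
  schedule: "*/15 * * * *"          # every 15 minutes
  concurrencyPolicy: Forbid         # skip a run while the previous one is still active
  startingDeadlineSeconds: 120      # a run that cannot start within 2 minutes counts as missed
  successfulJobsHistoryLimit: 1     # keep only the latest successful run
  failedJobsHistoryLimit: 5         # keep more failed runs for troubleshooting
  suspend: false                    # set to true to pause future runs
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: OnFailure
          containers:
          - name: backup
            image: busybox:1.28
            command: ["/bin/sh", "-c", "echo backing up"]
```

Forbid is a reasonable choice for tasks such as backups, where two overlapping runs could interfere with each other.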
3.2. Managing Cron Jobs
Now that we've seen how to define cron jobs, let's look at how to manage them in Kubernetes.
First, let's assume we put the cron job YAML definition into a file named cronjob.yaml. We can then create the cron job using the kubectl command:
kubectl create -f /path/to/cronjob.yaml
Additionally, we can list all cron jobs with the following command:
kubectl get cronjob
NAME          SCHEDULE    SUSPEND   ACTIVE   LAST SCHEDULE   AGE
cleanup-job   0 2 * * *   False     0        15s             32s
We can also use the describe command to see details of a specific cron job, including run history:
kubectl describe cronjob cleanup-job
Name:                          cleanup-job
Namespace:                     default
Labels:                        <none>
Annotations:                   <none>
Schedule:                      0 2 * * *
Concurrency Policy:            Allow
Suspend:                       False
Successful Job History Limit:  10
Failed Job History Limit:      3
Starting Deadline Seconds:     60s
Selector:                      <unset>
Parallelism:                   <unset>
Completions:                   <unset>
Pod Template:
  Labels:  <none>
  Containers:
   cleanup-job:
    Image:      busybox:1.28
    Port:       <none>
    Host Port:  <none>
    Command:
      /bin/sh
      -c
      rm -f /tmp/*
    Environment:     <none>
    Mounts:          <none>
  Volumes:           <none>
Last Schedule Time:  Mon, 30 May 2022 02:00:00 -0600
Active Jobs:         <none>
Events:
  Type    Reason            Age   From                Message
  ----    ------            ----  ----                -------
  Normal  SuccessfulCreate  16m   cronjob-controller  Created job cleanup-job-27565242
  Normal  SawCompletedJob   16m   cronjob-controller  Saw completed job: cleanup-job-27565242, status: Complete
Finally, when we no longer need a cron job, we can remove it with the following command:
kubectl delete cronjob cleanup-job
3.3. Cron Schedule Syntax
The cron job schedule syntax contains five fields separated by spaces. Each field can be an asterisk, a number, a comma-separated list of numbers, a range, or a step expression.
The order of the fields is the same as traditional Unix cron syntax. The fields, from left to right, have the following meanings and possible values:
- Minute (0 – 59)
- Hour (0 – 23)
- Day of Month (1 – 31)
- Month (1 – 12)
- Day of Week (0 – 6)
Note that for the Day of Week parameter, some systems treat 0 as Sunday, and others treat it as Monday. Kubernetes treats 0 as Sunday.
In addition to the possible values identified above, any parameter can also be an asterisk, meaning it applies to all possible values of that field.
Let's look at some examples. First, we can schedule a job to run every day at 8:00 AM:
0 8 * * *
Let's see the schedule parameters that would run a job every Tuesday at 5:00 PM:
0 17 * * 2
Finally, let's see how to run a job on the 15th day of every month at the 30-minute mark of every other hour:
30 0,2,4,6,8,10,12,14,16,18,20,22 15 * *
Note that the above schedule can also be simplified using the step syntax, where the value after the slash is the interval applied to the range before it:
30 0-23/2 15 * *
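As a further sketch, the example schedules listed in the introduction can be written as follows. Only the spec fragment is shown, and since a cron job accepts a single schedule, the alternatives appear as comments:

```yaml
spec:
  # Every night at 10:00 PM
  schedule: "0 22 * * *"

  # Alternatives (only one schedule may be set per cron job):
  # schedule: "0 6 1 * *"      # the 1st day of every month at 6:00 AM
  # schedule: "0 8,18 * * *"   # every day at 8:00 AM and 6:00 PM
  # schedule: "0 7 * * 2"      # every Tuesday at 7:00 AM
  # schedule: "30 */2 15 * *"  # same as "30 0-23/2 15 * *"
```

Quoting the schedule keeps YAML from interpreting the asterisks as anything other than plain text.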
3.4. Special Cron Job Entries
In addition to standard schedule syntax, cron jobs can also specify their schedule using a number of special identifiers:
- @yearly / @annually – Runs a job at midnight on January 1st of every year
- @monthly – Runs a job at midnight on the 1st day of every month
- @weekly – Runs a job at midnight Sunday every week
- @daily / @midnight – Runs a job every day at midnight
- @hourly – Runs a job at the start of every hour of every day
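In a manifest, a special identifier simply takes the place of the five-field expression. The fragment below is a small sketch, with the equivalent standard expressions noted in comments:

```yaml
spec:
  schedule: "@daily"           # equivalent to "0 0 * * *"

  # Other identifiers and their five-field equivalents:
  # schedule: "@hourly"        # "0 * * * *"
  # schedule: "@weekly"        # "0 0 * * 0"
  # schedule: "@monthly"       # "0 0 1 * *"
  # schedule: "@yearly"        # "0 0 1 1 *"
```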
3.5. Cron Job Time Zones
By default, all Kubernetes cron jobs run in the time zone of the kube-controller-manager. In some cases, it's possible to specify a time zone by placing the variable CRON_TZ or TZ at the start of the schedule string. However, this is not officially supported. These variables are considered internal implementation details and, therefore, are subject to change without warning. Starting with Kubernetes version 1.29, a schedule that contains one of these variables is rejected with a validation error.
As of Kubernetes version 1.24, it's possible to specify a time zone as part of the cron job spec:
spec:
  schedule: "0 2 * * *"
  timeZone: "GMT"
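The timeZone field can be any valid IANA time zone name, such as Etc/UTC or America/New_York.
In version 1.24, this feature is still experimental, so we must first enable the CronJobTimeZone feature gate prior to using it. The feature gate is enabled by default starting with version 1.25, and the field is stable as of version 1.27.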
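For a complete picture, here is a sketch of a cron job that runs on weekday mornings in a specific time zone. The name weekday-report, the echo command, and the chosen zone are assumptions for illustration:

```yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: weekday-report              # hypothetical name
spec:
  schedule: "0 8 * * 1-5"           # 8:00 AM, Monday through Friday
  timeZone: "Europe/Berlin"         # the schedule is evaluated in this zone
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: OnFailure
          containers:
          - name: report
            image: busybox:1.28
            command: ["/bin/sh", "-c", "echo good morning"]
```

With a named zone, the job keeps running at 8:00 AM local time across daylight saving changes, which a fixed offset from the controller's time zone would not do.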
4. Conclusion
Cron jobs are a great way to repeatedly execute important system tasks.
In this article, we've looked at how to use cron jobs inside a Kubernetes cluster. First, we saw the YAML required to define a cron job and how to manage its lifecycle using the kubectl command. Then, we looked at various ways to define its schedule and how to handle time zones.