
Last updated: March 18, 2024
In this tutorial, we’ll give a brief introduction to backup, its types, strategies, and policies. For everyone who relies on digital information systems, one of the first questions that comes to mind is: “is our data safe?” or, better yet: “how safe is our data?”. Those are very good questions. Many threats endanger information systems nowadays: hardware failures, attacks (such as ransomware, logic bombs, and viruses), software bugs, and usage mistakes, to name a few.
By backup, we mean a copy of data that can be used to recover the system in the event of data loss. Since there are so many ways for data to be lost, it is impossible to anticipate them all. So, even though there are great solutions for protecting data against specific failures, such as RAID for disk failures, the best option is to safeguard the data by making as many copies as necessary. Backups also let us set objectives such as recovering the data as it was on a previous date.
Let us say that we have a payroll system whose availability we must ensure. The system receives daily changes as people are hired, retire, or are laid off. It also calculates paychecks, taxes, and so on every month. Therefore, its data is constantly changing. So, in the event of a data loss, we would ideally want to recover as soon as possible and still be able to manage all employees, even the ones hired just before the event. If that is not possible, we’ll try to minimize the need for manual data re-entry. That exemplifies the main concepts behind backup objectives: how long it’ll take to resume normal operations (the Recovery Time Objective) and what data will be recovered (the Recovery Point Objective).
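As a rough illustration of those two questions, the sketch below measures, for a hypothetical incident, how much recent work would be lost and how long operations would stay down. The function name, the timestamps, and the restore duration are all assumptions made up for this example.

```python
from datetime import datetime, timedelta

def recovery_outcome(backup_times, failure_time, restore_duration):
    """Return (data_loss_window, downtime) for a single failure.

    backup_times: datetimes of completed backups (hypothetical data)
    failure_time: moment the data loss happened
    restore_duration: how long a restore takes (assumed to be known)
    """
    usable = [t for t in backup_times if t <= failure_time]
    if not usable:
        raise ValueError("no backup exists before the failure")
    last_backup = max(usable)
    # Everything changed after the last backup has to be re-entered by hand.
    data_loss_window = failure_time - last_backup
    return data_loss_window, restore_duration

# Nightly backups at 23:00, failure in the middle of the next afternoon.
backups = [datetime(2024, 3, 17, 23, 0), datetime(2024, 3, 18, 23, 0)]
failure = datetime(2024, 3, 19, 15, 30)
loss, downtime = recovery_outcome(backups, failure, timedelta(hours=4))
print("data to re-enter covers:", loss)
print("time until operations resume:", downtime)
```

Comparing the two returned values against the business targets tells us whether the backup schedule is frequent enough and whether the restore is fast enough.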
As we can see, these objectives are closely tied to actual business needs. They also directly affect the backup system’s sizing, cost, and viability. Furthermore, they can differ a lot between domains. For instance, tax data needs to be available for as long as the revenue service may require it for auditing purposes, usually a few years. On the other hand, employee information may be needed for decades after the employment contract has ended. To make things worse, for multinational companies, each country’s legal requirements will lead to different objectives for similar data. If the data is not partitioned by country, we should adapt the policy to the most demanding goals: covering all the Recovery Point Objectives required in any situation, with the shortest Recovery Time Objective and the longest Data Retention.
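The "most demanding goals" rule can be sketched as a small merge step. This is only an illustration: the `Objectives` class, its fields, and the sample values for the two countries are hypothetical and do not reflect any real legal requirement.

```python
from dataclasses import dataclass
from datetime import timedelta

@dataclass(frozen=True)
class Objectives:
    rpos: frozenset        # recovery points that must be restorable, e.g. {"daily"}
    rto: timedelta         # maximum acceptable time to resume operations
    retention_years: int   # how long backups must be kept

def merge_objectives(requirements):
    """Combine several sets of objectives into the most demanding one."""
    requirements = list(requirements)
    if not requirements:
        raise ValueError("at least one set of objectives is required")
    return Objectives(
        rpos=frozenset().union(*(r.rpos for r in requirements)),       # all RPOs
        rto=min(r.rto for r in requirements),                          # shortest RTO
        retention_years=max(r.retention_years for r in requirements),  # longest retention
    )

# Made-up requirements for two countries sharing one data set.
country_a = Objectives(frozenset({"daily", "monthly"}), timedelta(hours=8), 5)
country_b = Objectives(frozenset({"daily", "yearly"}), timedelta(hours=24), 30)
merged = merge_objectives([country_a, country_b])
print(sorted(merged.rpos), merged.rto, merged.retention_years)
```

The merged policy is more expensive than either country needs on its own, which is the price of not partitioning the data.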
Now, let’s review some of the major classifications for backups.
First, regarding the recovery point objectives we may have:
Now, regarding backup strategies, we can also choose from:
The figure shows how they differ from each other:
In practice, we will most likely create a policy that mixes strategies, so that the needed Recovery Point Objectives are achievable within the Backup Window and recovery can be completed within the required Recovery Time Objectives.
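To see why the choice of strategy affects recovery time, here is a minimal sketch that works out which backups a restore would need. It assumes the commonly used full, differential, and incremental strategies, where a differential backup holds the changes since the last full backup and an incremental backup holds the changes since the previous backup. The function name and the weekly schedules are hypothetical, and the sketch deliberately does not handle schedules that mix differential and incremental backups after the same full backup.

```python
def restore_chain(backups):
    """Return the backups needed to restore to the most recent point.

    backups: list of (label, kind) tuples in chronological order, where
    kind is "full", "differential", or "incremental".
    """
    full_positions = [i for i, (_, kind) in enumerate(backups) if kind == "full"]
    if not full_positions:
        raise ValueError("a restore needs at least one full backup")
    last_full = full_positions[-1]
    chain = [backups[last_full]]
    later = backups[last_full + 1:]
    kinds = {kind for _, kind in later}
    if kinds == {"differential"}:
        chain.append(later[-1])   # only the newest differential is needed
    elif kinds == {"incremental"}:
        chain.extend(later)       # every incremental is needed, in order
    elif kinds:
        raise ValueError("mixed schedules are not handled in this sketch")
    return chain

with_incrementals = [("Sun", "full"), ("Mon", "incremental"),
                     ("Tue", "incremental"), ("Wed", "incremental")]
with_differentials = [("Sun", "full"), ("Mon", "differential"),
                      ("Tue", "differential"), ("Wed", "differential")]
print(restore_chain(with_incrementals))
print(restore_chain(with_differentials))
```

Under these assumptions, incremental backups are small and fit more easily in the Backup Window but lengthen the restore chain, while differential backups grow during the week but keep the restore to two steps.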
A backup policy is a document that defines, for a set of systems, how their data will be protected, taking into account their specific objectives. It should cover:
The key considerations for correctly sizing a backup solution are the total amount of data it must store, how much the data changes over time, the backup window it must operate within, and the recovery time objectives. Those metrics will help establish the media types, the throughput, and the system’s total capacity. Regarding total storage, to keep costs lower, most solutions can apply data compression and data deduplication. For external backup solutions, cloud-based ones for instance, we must also consider encrypting the data.
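A back-of-the-envelope sizing calculation might look like the sketch below. The formulas are simplifications and every input (data size, change rate, window length, number of retained copies, and the combined compression and deduplication ratio) is an assumed figure, not a recommendation.

```python
def required_throughput_mb_s(data_gb, window_hours):
    """Average throughput (MB/s) needed to copy data_gb within the backup window."""
    return data_gb * 1024 / (window_hours * 3600)

def estimated_storage_gb(full_gb, daily_change_rate, retained_fulls,
                         retained_incrementals, reduction_ratio):
    """Rough capacity estimate for a schedule of full plus daily incremental backups.

    daily_change_rate: fraction of the data that changes per day (assumed constant)
    reduction_ratio: combined compression and deduplication ratio (assumed)
    """
    raw_gb = (full_gb * retained_fulls
              + full_gb * daily_change_rate * retained_incrementals)
    return raw_gb / reduction_ratio

# Hypothetical system: 2 TB of data, 5% daily change, 8-hour backup window,
# 4 full and 24 incremental backups retained, 2:1 data reduction.
throughput = required_throughput_mb_s(2000, 8)
capacity = estimated_storage_gb(2000, 0.05, 4, 24, 2.0)
print(f"throughput needed for a full backup: {throughput:.1f} MB/s")
print(f"estimated capacity: {capacity:.0f} GB")
```

The same throughput formula, applied to the restore path and the recovery time objective instead of the backup window, gives a first estimate of whether a full restore can finish in time.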
The need for a reliable backup system nowadays is pressing. In fact, given the many risks and threats we face, no single measure may be enough. However, with proper backup policies, we can anticipate incidents and be prepared to respond quickly and mitigate them by recovering the affected systems. In this tutorial, we discussed one of the most valuable tools for our systems’ overall resilience. Backup is the last resort for avoiding major damage and catastrophic data loss.