Hit and Miss Ratio in Cache for Content Delivery Networks

1. Introduction

Video streaming has become an essential part of modern life, and Content Delivery Networks (CDNs) have become indispensable for providing high-quality video content to a worldwide audience.

Two essential metrics, the CDN cache hit ratio and the cache miss ratio, gauge the effectiveness of a CDN. These metrics help us evaluate how well the CDN caches frequently accessed content and reduces the number of slow requests to the origin server.

In this tutorial, we’ll explore the CDN cache hit ratio and cache miss ratio metrics and how they relate to video streaming. We’ll also learn some strategies to tune these metrics to improve the performance of CDNs.

2. What Is a Cache

The term cache comes from the French verb cacher, which means to hide. Caching is the process of storing recently accessed information for future use. This involves making copies of files and storing them in a temporary storage location called a cache.

In short, caches help to speed up access times. While a cache can be any temporary storage location for data or files, the term is often used in the context of Internet technologies. For example, web browsers cache HTML files, JavaScript, and images to load websites faster. Similarly, DNS servers cache DNS records to speed up lookups, and CDN servers cache content to reduce latency.

Caching relies on the principle of locality of reference, which suggests that recently accessed data is likely to be accessed again. However, the cached data may or may not be used again in the near future. Therefore, a cache is only beneficial if the cost of storing the data is lower than the cost of retrieving or computing it again.

3. Caching in CDNs

A CDN caches content data, such as images, videos, or web pages, in proxy servers closer to the end users than the origin servers. This scenario is shown in the diagram below:

By having servers closer to the user making the request, a CDN is able to provide content more rapidly. When a user requests content that the CDN hasn’t cached yet, the CDN retrieves the content from the origin server and saves a copy for future requests. Cached content typically remains in the CDN cache as long as users keep requesting it and it hasn’t expired.
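
The request flow described above can be sketched in a few lines of Python. This is only a toy model under simplifying assumptions: the `EdgeCache` class and the `origin` function are hypothetical names made up for illustration, the cache is an unbounded in-memory dictionary, and nothing ever expires.

```python
from typing import Callable, Dict


class EdgeCache:
    """Toy CDN edge server: serve from the local cache, else fetch from the origin."""

    def __init__(self, fetch_from_origin: Callable[[str], bytes]) -> None:
        self._fetch_from_origin = fetch_from_origin
        self._store: Dict[str, bytes] = {}
        self.hits = 0
        self.misses = 0

    def get(self, url: str) -> bytes:
        if url in self._store:
            # Cache hit: the content is served directly from the edge.
            self.hits += 1
            return self._store[url]
        # Cache miss: fetch from the origin and keep a copy for future requests.
        self.misses += 1
        content = self._fetch_from_origin(url)
        self._store[url] = content
        return content


def origin(url: str) -> bytes:
    # Stand-in for a slow request to the origin server.
    return f"content of {url}".encode()


edge = EdgeCache(origin)
edge.get("/video/seg1.ts")  # first request: miss, fetched from the origin
edge.get("/video/seg1.ts")  # second request: hit, served from the cache
print(edge.hits, edge.misses)  # 1 1
```

The `hits` and `misses` counters are exactly the quantities that the hit and miss ratios are computed from.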

4. Cache Hit Ratio

A cache hit occurs when a client device requests content from a cache, and the cache already has that content stored. The cache hit ratio represents the percentage of data requests successfully served by the cache, which means the requested content was found in the cache and delivered to the client device without retrieving it from the origin server.

The formula to calculate the cache hit ratio is given below:

$\text{Cache hit ratio}=\frac{\text{Number of cache hits}}{\text{Number of cache hits}+\text{Number of cache misses}}$

For example, let’s assume a CDN with 18 cache hits and 2 cache misses over a given timeframe. Then the cache hit ratio is 18 divided by 20, or 0.9. We can also express the cache hit ratio as a percentage by multiplying this result by 100. As a percentage, this would be a cache hit ratio of 90%.
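
As a quick check, the same calculation can be written as a small Python function. The function name `cache_hit_ratio` is our own choice for this sketch, and we assume that the hit and miss counters come from the CDN's logs or analytics.

```python
def cache_hit_ratio(hits: int, misses: int) -> float:
    """Return the fraction of requests that were served from the cache."""
    total = hits + misses
    if total == 0:
        raise ValueError("no requests recorded")
    return hits / total


ratio = cache_hit_ratio(hits=18, misses=2)
print(f"{ratio:.2f} ({ratio:.0%})")  # 0.90 (90%)
```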

A high cache hit ratio means that the frequently accessed video content or data is effectively stored in the caches. Thus, it reduces requests to the origin server and indicates that the CDN is effective.

On the other hand, a low cache hit ratio indicates that the CDN cache is not effective. More requests miss the cache and have to be forwarded to the origin server. Therefore, monitoring the cache hit ratio is crucial to ensure that a CDN is effective.

5. Cache Miss Ratio

The definition of a cache miss is the opposite of a cache hit. A cache miss occurs when the cache does not have the requested content. So, the system will retrieve the requested content from the origin server. The cache miss ratio represents the percentage of cache misses compared to the total number of data requests made to the cache.

The formula to calculate the cache miss ratio is given below:

$\text{Cache miss ratio} =\frac{\text{Number of cache misses}}{\text{Number of cache hits}+\text{Number of cache misses}} =1-\text{Cache hit ratio}$

Taking the same example presented in the previous section, the respective miss ratio of the given CDN is 10%.
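
Spelled out with the numbers from that example (18 cache hits and 2 cache misses), the calculation is:

$\text{Cache miss ratio}=\frac{2}{18+2}=0.1=1-0.9$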

A high cache miss ratio indicates that the CDN cache is not effectively storing frequently accessed video segments or data, resulting in many requests not being satisfied by the cache. This can lead to slower access times and increased network traffic, negatively affecting the user experience.

On the other hand, a low cache miss ratio, ideally zero, is desirable as it indicates that the cache is effectively storing frequently accessed data, resulting in fewer slow requests to the origin server. This improves the user experience by providing faster access times and reducing network traffic. Therefore, reducing the cache miss ratio is critical for optimizing the performance of a CDN.

6. What Is a Good CDN Cache Hit Ratio?

A website that mostly provides static content can have a high cache hit ratio, typically about 95-99%. However, the cache hit ratio may be lower for websites with a lot of dynamic content. While caching is an essential part of a CDN, the location of CDN cache servers is also critical to improve the performance of the CDN system.

For this reason, global CDN providers commonly place caching servers in data centers worldwide to ensure that content is as close as possible to end users. This strategy helps to reduce network latency and provide faster delivery of content. For instance, Cloudflare has CDN servers in 285 locations spread out worldwide.

7. How to Improve Cache Hit Ratio?

Let’s take a look at some ideas to improve the cache hit ratio (or reduce the cache miss ratio) to improve the CDN’s performance.

The first strategy is fine-tuning the Time-To-Live (TTL) parameter. In CDNs, the TTL parameter determines how long the data stays in the cache. A TTL that is too small may lead to a high cache miss ratio since the life cycle of data in the CDN cache is too short. On the other hand, if the TTL is too high, stale data will stay on the CDN longer, and the cache will fill up and run out of space, also leading to a high cache miss ratio. A well-balanced TTL configuration is thus required. Usually, we can set a longer TTL for static content, such as CSS files and images, which rarely change.
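
To make the effect of the TTL concrete, here is a minimal sketch of a cache whose entries expire after a fixed number of seconds. The `TTLCache` class is hypothetical and isn't how any particular CDN implements expiry. In practice, a CDN usually derives the TTL from its configuration or from HTTP caching headers such as `Cache-Control: max-age`. The injectable `clock` parameter is an assumption we add so the example can simulate the passage of time.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class TTLCache:
    """Toy cache in which every entry expires ttl_seconds after it was stored."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock
        self._store: Dict[str, Tuple[float, bytes]] = {}

    def put(self, key: str, value: bytes) -> None:
        # Remember the value together with its expiry time.
        self._store[key] = (self._clock() + self._ttl, value)

    def get(self, key: str) -> Optional[bytes]:
        entry = self._store.get(key)
        if entry is None:
            return None  # miss: never cached
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._store[key]  # miss: the entry outlived its TTL
            return None
        return value  # hit


# Simulated clock so that the example doesn't have to sleep.
now = [0.0]
cache = TTLCache(ttl_seconds=60, clock=lambda: now[0])
cache.put("/style.css", b"body {}")

now[0] = 30.0
print(cache.get("/style.css"))  # b'body {}' (still fresh after 30 seconds)

now[0] = 61.0
print(cache.get("/style.css"))  # None (expired after 60 seconds)
```

With a very small `ttl_seconds`, almost every lookup ends in the expired branch, which is the high-miss-ratio case described above.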

The second strategy is optimizing video encoders and packagers. Adaptive video streaming protocols require storing the same video content at several different bitrates. This multiplies the number of objects to cache and can also fill up the CDN caches. Optimizing the video encoder and packager to reduce the number of variations in video content is thus needed.
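
A back-of-the-envelope calculation shows why the number of variations matters. The sketch below assumes that every combination of segment, bitrate, and packaging format is stored as a separate cache object. The function name and all the numbers are hypothetical and chosen only for illustration.

```python
def cached_objects(segments: int, bitrates: int, packaging_formats: int) -> int:
    """Number of distinct objects the cache has to hold for one video."""
    return segments * bitrates * packaging_formats


# Hypothetical video split into 600 segments.
print(cached_objects(segments=600, bitrates=6, packaging_formats=2))  # 7200
print(cached_objects(segments=600, bitrates=4, packaging_formats=1))  # 2400
```

Under these assumptions, trimming the bitrate ladder and packaging the video once cuts the number of objects competing for cache space to a third.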

Another strategy is using Multi-CDN. With multiple CDNs, we can distribute the data load, and video traffic can be dynamically routed to the CDN with the lowest latency and cache miss ratio.
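
One way to picture such routing is a selector that scores each CDN by its measured latency plus a penalty proportional to its miss ratio. This is only a sketch under stated assumptions: the `CdnStats` type, the `pick_cdn` function, the CDN names, the measurements, and the 200 ms miss penalty are all hypothetical. Real multi-CDN switching typically relies on a monitoring service and on DNS-based or client-side steering.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CdnStats:
    name: str
    latency_ms: float   # measured latency when the content is cached
    miss_ratio: float   # between 0.0 and 1.0


def pick_cdn(candidates: List[CdnStats], miss_penalty_ms: float = 200.0) -> CdnStats:
    """Pick the CDN with the lowest expected delay.

    The score is the measured latency plus an assumed extra delay for
    an origin fetch, weighted by how often that CDN misses its cache.
    """
    if not candidates:
        raise ValueError("no CDN candidates")
    return min(candidates,
               key=lambda c: c.latency_ms + c.miss_ratio * miss_penalty_ms)


cdns = [
    CdnStats("cdn-a", latency_ms=40.0, miss_ratio=0.30),  # score: 40 + 60 = 100
    CdnStats("cdn-b", latency_ms=55.0, miss_ratio=0.05),  # score: 55 + 10 = 65
]
print(pick_cdn(cdns).name)  # cdn-b
```

Here the CDN with the higher raw latency still wins because it misses far less often.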

8. Conclusion

CDNs are an important part of the Internet’s infrastructure. By locating caching servers closer to end-users, CDNs are able to deliver content more quickly to them. Together with a high cache hit ratio, this reduces latency and improves the quality of service for end-users.

In this brief article, we covered the definition of CDN caching and two important metrics: the cache hit ratio and the cache miss ratio. We also explored some strategies to increase the cache hit ratio to improve the performance of a CDN.
