How to Warm Up the Cache of the Cloudflare CDN - Part II


Upcoming Version of the Kitt Cache Crawler for Cloudflare


In a previous post, we analyzed whether it is possible to warm up the CDN cache of Cloudflare and Quic.cloud. Despite the challenges specific to a Content Delivery Network, a CDN cache warmup works fundamentally like any other cache warmup: it always requires an HTTP request, and it does not matter how that request is executed as long as it uses the GET method. However, the effort required for a CDN cache warmup is considerably higher, so much so that it would generally rule out warming the cache at all. After many tests, we have found a way to drastically reduce that effort, especially the time required. We are now working on a special version of the Kitt Cache Crawler that acts as a cache warmer for Cloudflare.
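As a minimal sketch of what "warming" a single URL means, the following Python function sends one GET request and reports the HTTP status together with Cloudflare's `CF-Cache-Status` response header. The function name `warm_url`, the `fetch` parameter, and the User-Agent string are assumptions made for this illustration; this is not the implementation of the Kitt Cache Crawler.

```python
from urllib.request import Request, urlopen


def warm_url(url, fetch=urlopen, timeout=10.0):
    """Send one GET request for `url` and return (status, cache_status).

    `fetch` defaults to urllib's urlopen; it is a parameter only so the
    function can be exercised without network access (an assumption of
    this sketch, not part of any real crawler).
    """
    request = Request(
        url,
        method="GET",  # a warmup must use GET
        headers={"User-Agent": "cache-warmup-sketch/0.1"},  # hypothetical UA
    )
    with fetch(request, timeout=timeout) as response:
        response.read()  # read the body so the full response is transferred
        # Cloudflare reports its cache decision in the CF-Cache-Status header.
        return response.status, response.headers.get("CF-Cache-Status")
```

Calling `warm_url("https://example.com/")` twice in a row would typically show whether the second request was answered from the cache. Note that the request only reaches the node closest to the machine that sends it.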

What is a Content Delivery Network (CDN) and How Do You Benefit from It?



Before we discuss whether and how you can warm up the CDN cache, especially with Cloudflare, we first need to clarify some of the biggest misunderstandings regarding CDNs.

A key misunderstanding is that many users do not know what a CDN actually is. Although the term "Content Delivery Network" may seem self-explanatory, a service is not automatically a CDN just because a provider calls it one. For a CDN to truly deliver content across a globally distributed network of nodes, certain minimum requirements must be met. The essential prerequisite is to replace the nameservers registered with the domain registrar with those of the CDN provider. This seemingly small change often brings a noticeable improvement, though not universally. The reason is that resolving the origin host’s IP address takes longest when the browser does not yet know the IP address behind the requested URL. Since CDN nodes are typically closer to the user’s location and often hold more IP address information, the lookup is usually faster, though not always, and not for subsequent requests to the same host. In other words, the advantage of faster IP resolution only applies to the first request to a host. With each subsequent request, the browser already knows the IP address and bypasses the CDN to some extent.

Although this mechanism of nameservers and IP address resolution may seem special, a CDN simply follows the general operating principles of the internet; CDN providers use nothing beyond what the internet needs to work anyway. The same behavior can be reproduced on any standard computer or mobile device by adding the origin host’s IP address and the corresponding domain to the "hosts" file. Because this entry takes priority over a nameserver lookup, this "hack" can be used, for example, to bypass the CDN. The significant difference and advantage of a CDN lies in the availability of globally distributed nodes. Metaphorically speaking, a CDN makes the "hosts" file available through globally distributed nodes.
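To illustrate the "hosts" file mechanism described above, here is a small Python sketch that looks up a hostname in hosts-file text, the same kind of lookup the operating system performs before it asks a nameserver. The function name `lookup_in_hosts`, the sample domain, and the IP address (taken from a range reserved for documentation) are assumptions for illustration only.

```python
def lookup_in_hosts(hosts_text, hostname):
    """Return the first IP address mapped to `hostname` in hosts-file text, or None."""
    wanted = hostname.lower()
    for line in hosts_text.splitlines():
        # Drop everything after '#', then split the rest on whitespace.
        entry = line.split("#", 1)[0].split()
        if len(entry) >= 2 and wanted in (name.lower() for name in entry[1:]):
            return entry[0]
    return None


# Hypothetical hosts-file content; 203.0.113.10 is a documentation-only address.
SAMPLE_HOSTS = """\
127.0.0.1      localhost
# Pin the site to its origin server, skipping the nameserver lookup
203.0.113.10   example.com www.example.com
"""

print(lookup_in_hosts(SAMPLE_HOSTS, "www.example.com"))  # the pinned origin IP
print(lookup_in_hosts(SAMPLE_HOSTS, "other.example"))    # None: falls back to DNS
```

A hostname found this way never reaches the CDN's nameservers, which is why such an entry can bypass the CDN.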

Another benefit, often misunderstood, comes from using a proxy. A proxy presents itself as a server in front of the origin host, or as a CDN node. When you access a URL through this proxy service, the first request and all subsequent ones are routed through the proxy because it hides the origin host’s IP address. The browser is thus forced to always go through the proxy, although this can be circumvented if you know the origin host’s IP address and, for example, define it in the "hosts" file. The proxy functionality has two main benefits. Firstly, it often provides access to the "Web Application Firewall" (WAF) offered by many CDNs. Secondly, it is essential for a CDN to distribute content effectively. The emphasis is on "essential." The WAF and caching are optional uses of a proxy, but both always require the proxy feature. In other words, no proxy means no WAF and no cache. A proxy can have many other functions and is not restricted to use within a CDN.

Another misconception is that a CDN automatically brings benefits. A CDN can also have disadvantages, and this is not uncommon. The perceived benefit of a CDN lies in the worldwide distribution of its nodes. The many nodes and their shorter distance to a user’s location can shorten the time it takes to resolve the origin host’s IP address. In addition, because distance also affects transmission, a nearby node reduces the data transfer time when a source is already cached on it. This describes the ideal scenario, but this ideal rarely occurs. One can already conclude that while a CDN could improve load times, the ideal scenario can only be achieved in theory, or with considerable effort. You could also say that CDN providers promise more than they can deliver.

The proof of this lies in the way the internet itself works, though CDN providers often gloss over it. CDN providers emphasize the number of globally distributed nodes and the resulting shorter distance between the node and the user. Major CDN providers like Cloudflare typically operate nodes in most countries. In the USA or Europe (EU), the number of nodes is much larger, leading one to believe that the CDN benefit is guaranteed there. However, this apparent guarantee has a crucial flaw. Requests to a URL whose host is in the same country as the user are rarely faster when routed through a CDN node and can even take longer than without the CDN, because local internet service providers are often better connected within the country than large CDN providers like Cloudflare. This limitation affects both the time it takes to resolve the origin host’s IP address and the data transfer time for cached content. The terms "fast" and "faster" must be understood relatively. Data travels over the internet at close to the speed of light, so almost in real time. The transmission time for a webpage is mainly affected when a request has to pass through multiple "hops," or routers, on its way. Even though each hop is fast, together they often noticeably extend the time it takes for a web server’s response to arrive. Ultimately, you can save yourself the question of whether a CDN is beneficial if most of your website’s visitors come from the same country as the host. For these users, the advantage turns into a disadvantage. Put simply, a CDN only makes sense if your website serves a predominantly international audience.

When is a CDN Really a CDN?



Although the answer to this question follows almost directly from the previous descriptions, let’s clarify it further. To live up to the name "Content Delivery," a CDN, or a CDN node, must deliver the content of a URL in place of the origin host. This, in turn, means that the content must be cached by the CDN. If it is not, the service cannot technically be called a CDN. In reality, it is "just" anycast, though anycast providers can sometimes ensure faster IP address resolution than traditional CDN providers, who also use anycast technology.
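Whether a particular response was actually delivered by the CDN or by the origin host can be read from Cloudflare's `CF-Cache-Status` response header. The sketch below groups the status values into "cdn" and "origin". The grouping reflects our reading of Cloudflare's documented values and should be checked against the current documentation; the function name `delivered_by` is an assumption for this example.

```python
# Status values where the body came from the edge cache (assumed grouping).
EDGE_SERVED = {"HIT", "STALE", "REVALIDATED", "UPDATING"}
# Status values where the request was passed on to the origin host.
ORIGIN_SERVED = {"MISS", "EXPIRED", "BYPASS", "DYNAMIC"}


def delivered_by(cache_status):
    """Classify a CF-Cache-Status header value as 'cdn', 'origin' or 'unknown'."""
    if cache_status is None:
        return "unknown"  # header missing: the response may not have passed Cloudflare
    value = cache_status.strip().upper()
    if value in EDGE_SERVED:
        return "cdn"
    if value in ORIGIN_SERVED:
        return "origin"
    return "unknown"


print(delivered_by("HIT"))      # cdn
print(delivered_by("dynamic"))  # origin
print(delivered_by(None))       # unknown
```

Only responses classified as "cdn" are "content delivery" in the sense used here; everything else still travels to the origin host.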

Interim Conclusion: A CDN is a great tool, but only in theory, because in practice many factors must align, and in most cases they do not. It requires both a predominantly international audience and cached content for most of the countries your visitors come from. Both conditions must be met simultaneously. Otherwise, the investment is wasted and the money is better spent elsewhere.

If you believe you can meet these conditions with your website, then you’ll be pleased to know that we will soon offer you the Kitt Cache Crawler, which will help you maximize the benefits of a CDN.

What Makes CDN Cache Warmup So Complicated?


The complexity of CDN cache warmup follows almost directly from the operating principle of any CDN. A Content Delivery Network is defined primarily by a large number of globally distributed nodes, which shorten the distance between the user and the node that serves them. The shorter distance means lower latency and thus faster load times. This often significant advantage only comes into play when the content of a URL is already cached on the respective node. If it is not, the request is forwarded to the origin host. Regularly purging the cache before it expires therefore drastically reduces the benefits of a CDN. You may still benefit from faster IP resolution of the origin host, but the widely promoted advantage of faster load times is significantly restricted or lost entirely. To maintain that advantage, the CDN cache would need to be warmed up almost continuously. That sounds logical, but it is not so easy to implement.

The difficulties in warming up a CDN cache stem from the very advantages of a CDN. When the content of a source is cached, the cached copy is not stored centrally for all nodes but is only available on the CDN node nearest to the user who requested it. To fully leverage a CDN’s benefits, the cache would therefore need to be warmed up not just once per source but once per source on every CDN node. Since Cloudflare operates a very dense network of nodes, the number of necessary requests multiplies accordingly: every URL has to be requested once for each node.
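The growth in the number of requests can be made concrete with a little arithmetic: a full warmup needs roughly one GET request per URL per node. The following sketch uses made-up figures; the URL and node counts are assumptions for illustration, not measurements of Cloudflare's network.

```python
def warmup_request_count(url_count, node_count):
    """Requests needed to cache every URL once on every node."""
    if url_count < 0 or node_count < 0:
        raise ValueError("counts must not be negative")
    return url_count * node_count


# Hypothetical site with 500 URLs.
print(warmup_request_count(500, 1))    # 500: warming a single cache
print(warmup_request_count(500, 300))  # 150000: every URL on 300 nodes
print(warmup_request_count(500, 10))   # 5000: every URL on 10 selected nodes
```

The last line hints at why limiting the warmup to a handful of nodes reduces the effort so sharply.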

Despite these contradictions, there is not just hope but a real chance of overcoming them. Since the biggest challenge is the sheer number of CDN nodes, it makes sense to reduce that number by selecting only the countries and regions with the most users. With any type of cache, the goal is the best possible efficiency. Since the efficiency of a CDN cache is measured by whether a source is already cached when it is requested, it is only logical to make the warmup process efficient as well. Even though the effort can be significantly reduced through such a selection, warming up a CDN cache remains a major challenge! If you’re up for the challenge, prepare yourself now. Soon, we will offer a special version of the Kitt Cache Crawler that will enable CDN cache warmup for Cloudflare.
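One possible way to make such a selection is to sort regions by visitor count and keep adding them until a chosen share of all visits is covered. The sketch below shows this idea; the function name `select_regions`, the 75% target, and the visitor figures are assumptions for illustration and do not describe how the Kitt Cache Crawler selects regions.

```python
def select_regions(visits_by_region, target_share=0.8):
    """Pick the busiest regions until `target_share` of all visits is covered."""
    if not 0 < target_share <= 1:
        raise ValueError("target_share must be greater than 0 and at most 1")
    total = sum(visits_by_region.values())
    selected, covered = [], 0
    # Walk through the regions from most to fewest visits.
    ranked = sorted(visits_by_region.items(), key=lambda item: item[1], reverse=True)
    for region, visits in ranked:
        if total == 0 or covered >= target_share * total:
            break  # the target is reached (or there is no traffic at all)
        selected.append(region)
        covered += visits
    return selected


# Hypothetical visits per country, e.g. taken from web analytics.
visits = {"DE": 500, "US": 300, "FR": 100, "JP": 60, "BR": 40}
print(select_regions(visits, target_share=0.75))  # ['DE', 'US']
```

In this made-up example, warming the cache for two regions instead of five already covers 80% of the visits.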

The preparations for the "Kitt Cache Crawler for Cloudflare" are already well advanced. The unique features of this cache crawler are currently being tailored to the specific requirements of CDN cache warmup and expanded accordingly. These expansions include cache warmup for static sources, such as images, CSS files, and JavaScript.
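To give an idea of what warming up static sources involves, the following sketch collects the URLs of images, stylesheets, and scripts referenced by an HTML page, so that each one could then be requested with GET. It uses only Python's standard library; the names `StaticAssetCollector` and `collect_static_assets` are assumptions for this example, and the approach is a simplification that is not taken from the Kitt Cache Crawler.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class StaticAssetCollector(HTMLParser):
    """Collect absolute URLs of images, stylesheets and scripts in a page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.assets = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag in ("img", "script"):
            target = attributes.get("src")
        elif tag == "link" and "stylesheet" in (attributes.get("rel") or "").lower().split():
            target = attributes.get("href")
        else:
            target = None
        if target:
            url = urljoin(self.base_url, target)  # resolve relative references
            if url not in self.assets:            # keep each asset only once
                self.assets.append(url)


def collect_static_assets(html, base_url):
    """Return the static asset URLs referenced in `html`, in document order."""
    collector = StaticAssetCollector(base_url)
    collector.feed(html)
    collector.close()
    return collector.assets


SAMPLE_PAGE = (
    '<html><head><link rel="stylesheet" href="/css/site.css">'
    '<script src="js/app.js"></script></head>'
    '<body><img src="https://cdn.example.com/logo.png">'
    '<a href="/about">About</a><script>var x = 1;</script></body></html>'
)
for asset in collect_static_assets(SAMPLE_PAGE, "https://example.com/blog/"):
    print(asset)
```

Ordinary links and inline scripts are skipped because they do not point to a static file. Assets referenced in other ways, such as `srcset` attributes or URLs inside CSS files, are not covered by this sketch.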
