Results of our Polyfill Hack Investigation

We crawled websites to understand what the current status of the polyfill hack is and how many sites are still affected.

We're The Crawl Tool - a smart, low cost or free, SEO crawler that can help you find user experience and on-site SEO issues on your site.


Polyfill Hack Current State Investigation

In June 2024, reports started surfacing about an attack on websites that redirected visitors to, amongst others, gambling sites. This has become known as the polyfill attack. The nature of the attack means that it is ongoing, and so we pose the question: what is the current status?

What is a Supply Chain Attack and What Happened

A supply chain attack targets one of the less secure components that a product depends on, rather than the product itself. In this case the component was the polyfill code, which provides more modern features in older browsers. By using it on a site, the site developer could use these features safe in the knowledge that they would also work if the user had an older browser.

It is, unfortunately in our view, common practice for developers to load code like this from elsewhere - a CDN. This comes with some advantages, such as automatic updates that reduce maintenance requirements, but also with disadvantages, such as a slower initial load and, of course, a single point of failure and, in this case, of ingress.

Sites doing this are relying on a third party to continue to provide trustworthy code. In the case of polyfill, the domain on which it was served was ultimately sold to an untrustworthy party (through no fault of the code author, it seems) and that party modified the code nefariously. While in this case only a proportion of traffic was redirected, a script loaded into a page can do anything on that page, so the net effect is that the entire site is under the control of the bad actor.

Initial Reported Scale

Initial reports put the number of usages of polyfill io at 110,000+. Although the figure carries a plus, it appears to have been calculated from a database of only 478M web pages. Given the high likelihood that the code is repeated across nearly all pages of a particular website, that database covers only a tiny fraction of websites.

Initial Responses

Initial responses were varied. One issue is that, because the point of linking to a CDN-hosted version of a library is to reduce the need for developers to maintain it, many sites will simply not be aware that they have the issue. Notably, Google Ads blocked sites running the polyfill script. It is unclear why, but presumably they worried about the script being used nefariously in some way.

The CDNs Cloudflare and Fastly provided their own safe versions of the script. Cloudflare has since started replacing it automatically, with no apparent sense of irony: a CDN that represents a single point of failure/ingress in the supply chain is automatically modifying websites because a script on a CDN that represents a single point of failure/ingress in the supply chain was compromised.

Furthermore, the domain registrar has suspended the domain in question.

Our Experiment

We took the Common Crawl and searched for mentions of the affected polyfill script. This crawl was performed in July, August, and September. It contains 95.4 million domains, with the original data coming from over 2.5 billion pages. In other words, it is considerably larger than the dataset on which the original estimate was made. Given the time difference and the fact that some sites will have fixed the issue, we would expect this number to be somewhat lower than the original number of affected sites. However, we must also keep in mind that people are talking about the issue, so some of these mentions may not be actual code loading the script but simply people discussing it. Additionally, quite a large number of these seem to consist of the original script being commented out in the code and replaced with Cloudflare/Fastly code.

However, this initial data set yielded 2,506,159 sites, strongly suggesting that the original number of 110,000+ is a severe underestimate.

We created a web crawler to crawl the root pages of these 2.5M sites, with the intent of extracting only those that are currently running the polyfill CDN code.
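The filtering step described above can be sketched roughly as follows. This is not our crawler's actual code, only a minimal illustration of telling a live script tag apart from a commented-out one or a plain-text mention. The hostnames in `POLYFILL_HOSTS` and the names `PolyfillScriptFinder` and `find_polyfill_scripts` are assumptions made for the example.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse

# Assumed hostnames for illustration; adjust to whatever you want to detect.
POLYFILL_HOSTS = {"polyfill.io", "cdn.polyfill.io"}


class PolyfillScriptFinder(HTMLParser):
    """Collects the src of every live <script> tag served from POLYFILL_HOSTS."""

    def __init__(self):
        super().__init__()
        self.matches = []

    def handle_starttag(self, tag, attrs):
        if tag != "script":
            return
        src = dict(attrs).get("src")
        if not src:
            return
        # urlparse also handles protocol-relative URLs such as "//host/path".
        host = urlparse(src).hostname or ""
        if host in POLYFILL_HOSTS:
            self.matches.append(src)


def find_polyfill_scripts(html):
    """Return the src values of script tags that load from POLYFILL_HOSTS."""
    finder = PolyfillScriptFinder()
    finder.feed(html)
    finder.close()
    return finder.matches
```

Because the parser hands HTML comments and text to separate handlers, a script tag that has been commented out, or a paragraph that merely talks about the domain, is not counted. Only tags that a browser would actually try to load are.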

This yields 29,251 sites.
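To put these figures in proportion, here is a small calculation using only the numbers quoted in this article. Treat the percentages as rough: the 2.5M figure counts mentions, which, as noted, includes commented-out code and discussion.

```python
# Figures quoted in the article.
domains_in_crawl = 95_400_000   # domains in the Common Crawl data set
sites_mentioning = 2_506_159    # sites with a mention of the script
sites_still_loading = 29_251    # sites whose root page still loads it


def pct(part, whole):
    """Return part as a percentage of whole."""
    return 100 * part / whole


mention_share = pct(sites_mentioning, domains_in_crawl)
live_share = pct(sites_still_loading, sites_mentioning)

print(f"{mention_share:.2f}% of crawled domains mention the script")
print(f"{live_share:.2f}% of those still load it on their root page")
```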

What Can We Say From This?

Whilst we cannot place an exact number on it, it does seem likely that the original reported number of affected sites was a severe underestimate. Of the sites that were affected, the publicity and the actions of a few large CDNs have mitigated the issue on a large number. However, the issue remains present on at least 29,251 websites.

The Situation for the 29,251 Websites

Because the domain registrar, Namecheap, suspended the domain, the name does not currently resolve to anything and the script will not be loaded on these sites. This means that traffic will not be redirected from them, but also that some functionality of their sites will not work. It's also important to note that this fix is temporary: at some point the domain will presumably be released, and future owners would then have the potential to control these websites. This is therefore not something that these site owners can simply ignore.
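Whether a hostname currently resolves can be checked with a few lines of Python. This is a minimal sketch using the standard library's `socket.getaddrinfo`. The function name `resolves` and the injectable `resolver` parameter are assumptions for illustration, and the result naturally depends on when and where you run it.

```python
import socket


def resolves(hostname, resolver=socket.getaddrinfo):
    """Return True if hostname currently resolves to at least one address."""
    try:
        return len(resolver(hostname, 443)) > 0
    except socket.gaierror:
        # Raised when the name cannot be resolved.
        return False


# Example call (performs a real DNS lookup, so it is left commented out):
# print(resolves("cdn.polyfill.io"))
```

A result of False only tells you that the script cannot load right now. It says nothing about what will happen if the domain is released again.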

Using The Crawl Tool to Check Your Site

The simplest way to find out if your site is using the library in question is to look for cdn polyfill io in the Offsite JS Scripts report. Alternatively, the Offsite JS By Page report will tell you which scripts are on which pages. You can use the "filter text..." filter and enter "polyfill" to narrow it down.

Here's a video of the process.
