# The Crawl Tool - the smartest SEO crawler [The Crawl Tool - the smartest SEO crawler](https://www.thecrawltool.com/) ## Unlock your SEO potential Stop guessing and start knowing with our in-depth SEO crawler. We provide you with in-depth analysis of your website, identifying areas for improvement and opportunities for growth. Our tool gives you the data you need to outrank your competition. Sign up for your free trial and take control of your SEO today. No credit card necessary. ### Sign in to The Crawl Tool ### Hassle-Free Website Crawling #### Web-Based Platform Say goodbye to complicated installations and software updates with The Crawl Tool. Our platform is entirely web-based, meaning you can access powerful website crawling and SEO reporting features directly from your browser. No need to download or run any programs on your computer. #### Enhanced Workflow Our platform allows you to assign statuses to issues, making it easy to track progress and prioritize tasks. With multi-user support, team members can work together seamlessly, sharing insights and updates. #### Experience the Power Discover the full potential of The Crawl Tool with our generous free trial, allowing you to crawl up to 1,000 URLs at no cost. This trial gives you a firsthand experience of our SEO crawl tool's capabilities, enabling you to identify and fix issues, optimize your site, and see the benefits. ### Pricing Plans #### Free Trial #### Pay Monthly #### Pay As You Go ### Latest Blog Posts #### Weekly Feature Update 3 #### Is Low Effort Content Always Bad? #### Weekly Feature Update 2 #### What is a Soft 404? #### Weekly Feature Update 1 #### World's Smallest AI Text Detection Model ### Frequently Asked Questions We use credits to charge for service usage. One credit crawls one URL. We charge credits in batches of a minimum of 100. In the future it will also be possible to use credits for additional services. The credit allowance for monthly plans is generous, but if you run short you can always buy more credits for that month. Most users will only have one sort of credit on their account. In the case of the free plan, the credits expire after 3 months. If you want to crawl sites after that, you'd need to consider a paid plan. Pay As You Go credits expire after one year. You should plan to use them up by then; after a year you will need to buy more if you want to crawl. A monthly plan gives credits each month. They don't roll over to the next month, so credits from the previous month expire when you get your allocation of credits for the next month. After running a crawl, the tool generates reports that take up space on our servers. Each plan comes with a data retention timeframe. If the last crawl of a project/site is older than this timeframe, we reserve the right to delete the reports if we feel it will help enhance performance for other users. You can always download the reports as xlsx files beforehand. For each site you want to crawl, you start a project. This project will house the crawl data/reports and (apart from the Free version) allow you to manage employee or third party access to a project's reports. Depending on your plan, you may have a different number of projects. You can always download the data and delete projects if you need to make space for new ones. The tool works as a system that has one account.
This account can set up projects and request crawls. You might want co-workers or third parties to look at and access those reports, though. For this reason we allow you to create access keys. Each access key can be given a name and set to which projects/sites it can access. By appending the access key to the report URL and sending it to the other party, they can access the reports and action them (such as changing statuses). In order to preserve your credit balance and to protect against unexpectedly large usage, we cap each individual crawl at a maximum of 50,000 URLs. This will cover nearly all sites, but get in touch to discuss your requirements if you need more. # The Crawl Tool Blog - Everything you must know [The Crawl Tool Blog - Everything you must know](https://www.thecrawltool.com/blog) #### Weekly Feature Update 3 #### Is Low Effort Content Always Bad? #### Weekly Feature Update 2 #### What is a Soft 404? #### Weekly Feature Update 1 #### World's Smallest AI Text Detection Model #### Understanding Semantic Vector Space for SEO Professionals #### The Importance of Topic Relevance for SEO #### The SEO Title is too Short #### Why We Support Data Driven SEO #### 4 Free Tools That Will Revolutionize Your SEO Strategy Overnight #### SEO Content Optimization: How to Write for Search Engines and Readers #### Is SEO Really Dead? #### Avoiding the Temptation to Over-SEO #### What is a Web Crawler and How Does it Work #### Basics of Technical SEO for New Website Owners #### Introduction of Linking Domains Functionality #### Using AI to Keep a Tightly Themed Website #### What Do Too Short and Too Long Titles Mean in SEO Tools? #### Enabling Devs to Check Sites for Errors Before Production #### Discovering Cookies With The Crawl Tool #### Achieving Perfect Robots.txt Functionality #### Introducing The New Crawl Tool Project Dashboard #### Results of Our Polyfill Hack Investigation #### An Introduction to The New Header Tag Report #### How to List All Offsite Javascript Scripts #### Amazon AWS Startup Founders Programme Credits #### Introducing the Smartest SEO Crawler - The Crawl Tool # Free Credits [Free Credits](https://www.thecrawltool.com/free-credits) ## Free Credits At The Crawl Tool we give 1,000 credits just for signing up; this lets everyone try out our systems and lets small websites use the service for quite a long time. But we're also not shy about raiding the marketing budget to give free credits to good causes and those that help us out. Here's how you can get your hands on some of those sweet free credits... ### Give us Feedback The Crawl Tool exists for its users and plans its future development around them. So naturally we want to hear what you have to say. What is good? What can be improved? What do you wish it could do? Email your feedback to feedback@thecrawltool.com and if we find it helpful you could find a few more credits in your account than you had. ### Shout us out We love it so much when people mention us on social media. As a small, growing business, the way to make us happy is to follow us on social media and give us a shout out. If you do, let us know at social@thecrawltool.com so we know which account to drop a 1,000 credit thank-you on. You can shout us out as much as you like (but unfortunately we can only award the credits once. Don't let that stop you, though!). ### Be a Good Cause If you're the sort of organization that changes the world for the better, then The Crawl Tool wants to help you out.
Drop us an email at grants@thecrawltool.com telling us what you do and how it impacts the world. If we like what you do and say, there could be 50,000 credits in your account. You don't need to be a charity for this, but you do need to be an organization that spends the majority of its time doing good. If your organization is really good and the website size requires it, we might, in some cases, give more. ### Ask Us One of the principles of The Crawl Tool is that we follow user feedback. We don't believe that we, alone, can think of all the good reasons we should give out our free credits. If you think you've got a good one, email us at goodreason@thecrawltool.com. # Guide [Guide](https://www.thecrawltool.com/guide) ## Guide To Using The Crawl Tool Our goal with The Crawl Tool is to make it effortless and easy to use. But you may still wish to have some instructions. If so, this section of the website is for you. ### Starting Out Taking your first steps with The Crawl Tool? These pages are probably of most interest to you. How To Start Your First Crawl ### The Reports The Crawl Tool can provide a variety of useful reports. With the exception of the dashboard, these can be accessed from the report dropdown at the top of the project pages. Each link below describes the report. The Project Dashboard The Broken Links Report The Crawl Log The Insecure Content Report The Internal Links Report The Meta Descriptions Report The Meta Keywords Report The Meta Open Graph Report The Meta Twitter Report The Missing Alt Tags Report The Offsite Links Report (with AI) The Page Links Overview Report The Pages Linking Redirects Report The Redirects Report The Titles Report The Offsite JS Scripts Report The Offsite JS By Page Report Header Tags Report The Linking Domains Report The Cookies Report The Site Information Report The Theme Report (AI) ### Project Management Choose the columns that display in a report Schedule Crawls Edit Project Delete Project ### Task Management One major difference with The Crawl Tool is that we have a lot of functionality to help manage fixing issues with a site. Making Access Keys for Third Parties Assigning Status and Assignee Setting Due Dates and Notes Bulk Assignment ### Filters and Exports The Crawl Tool provides a number of ways to filter your data within the system, or to export the data as Excel or Google Sheets. Filters Applying Robots.txt to filter data Exporting Data ### Public API For programmatic data access, use our public API. Public API # Login to The Crawl Tool [Login to The Crawl Tool](https://www.thecrawltool.com/login) ## Unlock Your Website's Full Potential Ensuring your website is optimized for search engines is crucial for attracting and retaining visitors. The Crawl Tool offers a comprehensive solution for identifying and fixing issues that could be holding your site back. With our in-depth reports, you can uncover hidden problems, streamline your site's performance, and enhance its visibility on search engines. Don't let technical issues or overlooked details undermine your site's success. ### Sign in to The Crawl Tool # The Crawl Tool Privacy Policy and Cookies [The Crawl Tool Privacy Policy and Cookies](https://www.thecrawltool.com/privacy) This Privacy Policy describes the policies of The Crawl Tool, Haringstraat 27H, Zuid-Holland 2586XT, Netherlands, email: support@thecrawltool.com, phone: +441483983024, on the collection, use and disclosure of your information that we collect when you use our website (https://www.thecrawltool.com)
(the "Service"). By accessing or using the Service, you are consenting to the collection, use and disclosure of your information in accordance with this Privacy Policy. If you do not consent to the same, please do not access or use the Service. We may modify this Privacy Policy at any time without any prior notice to you and will post the revised Privacy Policy on the Service. The revised Policy will be effective 180 days from when the revised Policy is posted in the Service and your continued access or use of the Service after such time will constitute your acceptance of the revised Privacy Policy. We therefore recommend that you periodically review this page. #### 1. Information We Collect We will collect and process the following personal information about you: #### 2. How We Use Your Information We will use the information that we collect about you for the following purposes: If we want to use your information for any other purpose, we will ask you for consent and will use your information only on receiving your consent and then, only for the purpose(s) for which grant consent unless we are required to do otherwise by law. #### 3. Retention Of Your Information We will retain your personal information with us until account termination or for as long as we need it to fulfill the purposes for which it was collected as detailed in this Privacy Policy. We may need to retain certain information for longer periods such as record-keeping / reporting in accordance with applicable law or for other legitimate reasons like enforcement of legal rights, fraud prevention, etc. Residual anonymous information and aggregate information, neither of which identifies you (directly or indirectly), may be stored indefinitely. #### 4. Your Rights Depending on the law that applies, you may have a right to access and rectify or erase your personal data or receive a copy of your personal data, restrict or object to the active processing of your data, ask us to share (port) your personal information to another entity, withdraw any consent you provided to us to process your data, a right to lodge a complaint with a statutory authority and such other rights as may be relevant under applicable laws. To exercise these rights, you can write to us at support@thecrawltool.com. We will respond to your request in accordance with applicable law. Do note that if you do not allow us to collect or process the required personal information or withdraw the consent to process the same for the required purposes, you may not be able to access or use the services for which your information was sought. #### 5. Cookies The Crawl Tool uses only functional cookies to enable users to log in and use their account. #### 6. Security The security of your information is important to us and we will use reasonable security measures to prevent the loss, misuse or unauthorized alteration of your information under our control. However, given the inherent risks, we cannot guarantee absolute security and consequently, we cannot ensure or warrant the security of any information you transmit to us and you do so at your own risk. #### 7. Grievance / Data Protection Officer If you have any queries or concerns about the processing of your information that is available with us, you may email our Grievance Officer at The Crawl Tool, Haringstraat 27H, email: support@thecrawltool.com. We will address your concerns in accordance with applicable law. 
# Register for The Crawl Tool [Register for The Crawl Tool](https://www.thecrawltool.com/register) ## Register for Success You're one step away from SEO success using The Crawl Tool's advanced tools. Don't let technical issues or overlooked details undermine your site's success. Create your account now. ### Register for The Crawl Tool # The Crawl Tool - Our Terms and Conditions [The Crawl Tool - Our Terms and Conditions](https://www.thecrawltool.com/terms) These terms and conditions outline the rules and regulations for the use of The Crawl Tool's Website, located at https://www.thecrawltool.com. By accessing this website we assume you accept these terms and conditions. Do not continue to use https://www.thecrawltool.com if you do not agree to all of the terms and conditions stated on this page. The following terminology applies to these Terms and Conditions, Privacy Statement and Disclaimer Notice and all Agreements: "Client", "You" and "Your" refers to you, the person logging on to this website and compliant with the Company's terms and conditions. "The Company", "Ourselves", "We", "Our" and "Us", refers to our Company. "Party", "Parties", or "Us", refers to both the Client and ourselves. All terms refer to the offer, acceptance and consideration of payment necessary to undertake the process of our assistance to the Client in the most appropriate manner for the express purpose of meeting the Client's needs in respect of provision of the Company's stated services, in accordance with and subject to, the prevailing law of the Netherlands. Any use of the above terminology or other words in the singular, plural, capitalization and/or he/she or they, are taken as interchangeable and therefore as referring to the same. #### Cookies We employ the use of cookies. By accessing https://www.thecrawltool.com, you agree to use cookies in agreement with The Crawl Tool's Privacy Policy. Most interactive websites use cookies to let us retrieve the user's details for each visit. Cookies are used by our website to enable the functionality of certain areas to make it easier for people visiting our website. Some of our affiliate/advertising partners may also use cookies. #### License Unless otherwise stated, The Crawl Tool and/or its licensors own the intellectual property rights for all material on https://www.thecrawltool.com. All intellectual property rights are reserved. You may access this from https://www.thecrawltool.com for your own personal use subject to restrictions set in these terms and conditions. You must not: Only one trial account is permitted per customer. Signup with so-called "disposable", "temporary", or other fraudulent details is prohibited. In such cases any fraudulently obtained free credits will be billable at the advertised rates. #### Content Liability We shall not be held responsible for any content that appears on your Website. You agree to protect and defend us against all claims arising on your Website. No link(s) should appear on any Website that may be interpreted as libelous, obscene or criminal, or which infringes, otherwise violates, or advocates the infringement or other violation of, any third party rights. #### Your Privacy Please read our Privacy Policy. #### Reservation of Rights We reserve the right to request that you remove all links or any particular link to our Website. You agree to immediately remove all links to our Website upon request. We also reserve the right to amend these terms and conditions and our linking policy at any time.
By continuously linking to our Website, you agree to be bound to and follow these linking terms and conditions. #### Removal of links from our website If you find any link on our Website that is offensive for any reason, you are free to contact and inform us at any moment. We will consider requests to remove links but we are not obligated to do so or to respond to you directly. We do not ensure that the information on this website is correct, we do not warrant its completeness or accuracy; nor do we promise to ensure that the website remains available or that the material on the website is kept up to date. #### Disclaimer To the maximum extent permitted by applicable law, we exclude all representations, warranties and conditions relating to our website and the use of this website. Nothing in this disclaimer will: The limitations and prohibitions of liability set in this Section and elsewhere in this disclaimer: As long as the website and the information and services on the website are provided free of charge, we will not be liable for any loss or damage of any nature. # Why The Crawl Tool? [Why The Crawl Tool?](https://www.thecrawltool.com/why-the-crawl-tool) ## Why The Crawl Tool? A genuine question you may have is "Why The Crawl Tool?" The Crawl Tool is an innovative tool that helps you find issues on your website, improving your on-site SEO and user experience, and helping you manage those issues. But what makes The Crawl Tool such a useful tool and how does it differ from its competitors? ### The SEO Spider Market is Broken The Crawl Tool sits in the category of SEO spiders/crawlers. It started out by looking at what is happening in that market in order to create a product that is better for our users. The inescapable conclusion is that the market is broken. There are two categories of competitors. Installed Software You can rent software to install on your computer. The software is based on old technologies and uses your own resources, but you pay yearly for it. It provides a vast amount of data to try to justify the price, but 99% of that data you will never use and it just makes things confusing and cluttered. Because you are using your own resources for the crawl, your IP gives you away if you do an analysis of a competitor site and they may feed you false information. For less established businesses it may also slow their internet connection down and sometimes break their ISP's terms of service. As your team gets larger there's another problem that surfaces - it gets difficult to share data without more licenses. If you want to do things like schedule crawls then you need a master's degree in IT. Updates? Run them yourself. Whilst a crawler as an installable software item seems like a good idea at first (our platform started with that as a concept but scrapped it), in practice it's not so practical. You can see this for yourself: take their software, apply it to their own sites, and see if they use it themselves in practice. You'll be surprised! If they don't see value in their own products and use them themselves, why should customers pay for them? Online Software There are online sites where you'd need to mortgage your grandmother to pay their monthly charges. That may be okay for the big boys, but what about the rest of us? What about the average company looking to improve their website, or the individual SEO or small agency? These sites are generally resellers of data gathered elsewhere, with their own on top. Like installed software, there is a huge amount of unnecessary data to try to justify the cost.
However, they at least try to simplify the presentation of this by combining it into proprietary scoring systems. However, this simplification also throws out a lot of useful information. What does having a 7/10 on-site SEO score even mean?! These sites may argue over who has the fastest crawler, etc, but not tell you that having the fastest crawler is only an advantage for the busiest websites and a disadvantage for everyone else. It's not that these tools are bad, but each really only fulfills the needs of a tiny fraction of websites, SEOs, and agencies. The majority are forced to pick an option that doesn't fit their needs. ### The Three Principles of "The Crawl Tool" In designing The Crawl Tool, we took a look at the state of the market and designed some principles to ensure that our product fits market needs. For every decision, including every design and feature decision, we ask the question "does this fit and fulfill the principles?". In that way we can ensure that The Crawl Tool is not about overwhelming you with data you won't use or vaguely defined meaningless statistics, but real, actionable data you can use. Our three principles are: Simplicity - that's not to say that it's missing things, but that we strive to present what is important in an easy and actionable way, so anybody can use it - whether that's yourself, a tech team, or your work experience intern. Low Cost - we charge for what you use and we keep our costs low so we can charge less. We want to make things accessible to small businesses and starters too. Ease of Management - presenting you with data is great, but we want to make it easy for you to manage fixing any problems. By applying these three customer focused principles to our product each and every time, we will fix the broken SEO crawler market. ### The Tool On these principles, The Crawl Tool is a web based tool. You register, and it's immediately ready for you to use. Add a site as a project, click Crawl, and it fires up a crawler to crawl the site and sends you an email when it's done. We handle the specifics of the crawl behind the scenes for you. When the crawl is done, the data is ready for you. We present it as some graphs and tables. We don't make up meaningless pretend scores! You can use our interface, but we don't hold your data hostage; you can export it to Excel or Google Sheets if you want. You can set deadlines for yourself to fix issues. Or if you have a team - deadlines for others. As well as assign statuses, and add notes. You pay us not for work we did over a decade ago, or for reformatting mostly public data, but in line with your resource usage. If your usage is tiny, then so is the cost. It's only fair. It's the tool 99% of the market needs. Check out The Crawl Tool with 1,000 free credits when you register. We can't wait to help improve your website. # Is Low Effort Content Always Bad? [Is Low Effort Content Always Bad?](https://www.thecrawltool.com/blog/is-low-effort-content-always-bad) ##### The Crawl Tool Team ### Why are you asking this? I recently came across this post by John Mueller on Bluesky. For background, this is how Google describes his position: John coordinates Google Search Relations efforts. He and his team connect the Google-internal world of Search engineering to those who create and optimize public websites.
Though I often find myself feeling that there is a huge disconnect between what he states publicly on social media and the actual work: How common is it in non-SEO circles that "technical" / "expert" articles use AI-generated images? I totally love seeing them [*]. [*] Because I know I can ignore the article that they ignored while writing. And, why not should block them on social too ### Is This True? No We can summarise the post as saying that when he sees AI generated images, he knows that the article was low effort - the implication being that it was AI generated. This is, of course, one aspect of modern AI that we can't ignore. It is possible to use AI to generate an article and images entirely programmatically with zero input at all. There are also clearly people that do this. The other side of the coin is that being a subject matter and technical expert doesn't make you an image/photo expert (unless that's the subject!). One could argue that for the majority of articles on the internet, subject matter and technical experts use an alternative lazy option - licensing stock images. From large to small businesses and organizations, they're all doing it. They have put very little to practically zero effort into the image. Really small entities may not have the budget for this. The Crawl Tool, for example, tries to keep costs low for our customers. Paying to license images would increase prices. Luckily there are free sources of stock images that we use. An alternative to free stock images that has become available over recent years is AI images. Just like there are tools that programmatically do the entire thing for you, there are tools that you can use to just help you add AI images (much like there are also tools to help add stock photos; there were even entirely automated ones that existed before generative AI). In short, the overwhelming majority of image use on the internet has been extraordinarily low effort. AI usage is just a modern aspect of that. Most topical experts are not experts in making their own images! The idea that the image determines the accuracy and usefulness of the text is fallacious. I'm no fan of AI images, but that's as true for AI as it has always been for other forms of low effort images. ### But Why I Still Wouldn't Use Them There are two reasons I still wouldn't use them. Fallacious argument or not, it is unfortunately the case that AI images co-occur with AI text in a larger proportion of cases than would otherwise be normal, because of the fully programmatic tools. The second reason is connected - whilst the core argument he is making is fallacious, that doesn't mean that a lot of people don't share that same belief and fallacy. People's willingness to overlook fallacies often depends on their own belief systems and what they are for or against. And boy have we learned that in recent years. There's no claim from anyone that Mueller is talking from a Google viewpoint here, but if we take his statement as that of a potential visitor to our website with a belief system that is anti-AI, then we have to accept that a proportion of visitors hold beliefs so strong that they would completely overlook and actively try to avoid your content if there was even a whiff of AI. ### Where Does That Bias Come From If we follow the thread, Mueller continues: I struggle with the "but our low-effort work actually looks good" comments. Realistically, cheap & fast will reign when it comes to mass content production, so none of this is going away anytime soon, probably never.
"Low-effort, but good" is still low-effort. It's that programmatic mass produced content we mentioned earlier. I think we can all relate, have seen stuff like that, and that indicates that the number of people who share his opinion would be very large. ### But Isn't This The Pot Calling The Kettle Black? Why yes it is. Google started as a search engine. It programatically scraped web pages and programatically enabled that to be searched and served up. Google is, and has long been, the king of programatically produced content and it's done so on an extraordinary mass scale that nobody had seen before. It is the king of low effort content. Google expanded since then and has things like Youtube. That's a website that users contribute virtually all the meaningful content to. For Google, it's as low effort as you get. If a user, for example, doesn't chose a thumbnail image for their video then it's programmatically determined for them. Take their copyright systems - they are virtually entirely programmatic, leading to a vast majority of complaints. In literally everything they have done, Google have aimed for low effort content and systems for themselves. But this is not "AI" you say. Well a lot of it is, but it's not generative AI. But take a look at all of those systems and try to find one that doesn't now have a generative AI "summary" prominently attached to it. People are literally having to type swear words into search to turn them off. Literally everything Google does is low effort content, and that is often forced upon users. If the logos were only AI generated, then Meuller would know they're not worth his time! I kid of course, but it is weird to be a representative of the world's biggest producer of low effort, programmatic content, publically discussing his disdain for low effort content! And granted, Google products are good (less good recently, but still good) - but as Meuller says "Low-effort, but good is still low-effort" That said, I'm just pointing out the comparison - please don't take that as an excuse to produce even more trash. ### I Can't write In Conclusion Here Because It Will Trigger AI Detectors That an image is low effort says nothing about the text. Just the same as the fact that Google pays logo designers doesn't say their pages aren't low effort programmatic content. Indeed, it has long been the way of the internet and before that printed media, that because experts in something aren't necessarily experts in images then they take a low effort approach and outsource that - formerly to stock photo providers, and now more to generative AI. And if you want an image of a six fingered queen waving, why not? Just beware the Meuller is not alone in his opinion and it may have a negative effect on some traffic. #### Ready to find and fix your website's SEO issues? Start with a free crawl of up to 1,000 URLs and get actionable insights today. ### Recent Posts #### Weekly Feature Update 3 In the third of our weekly update on new features, we introduce the Public API. A lot of effort has been put into the use... #### Is Low Effort Content Always Bad? Why are you asking this? I recently came across this post by John Meuller on bluesky. For backfill on information this is ho... #### Weekly Feature Update 2 It's the second of our new weekly videos showcasing the improvements to The Crawl Tool in the last week. It seemed like there ... #### What is a Soft 404? 
# Weekly Feature Update 1 [Weekly Feature Update 1](https://www.thecrawltool.com/blog/weekly-feature-update-1) ##### The Crawl Tool Team The Crawl Tool is not only the best SEO tool for small to medium websites and the web professionals working with them, it also regularly adds new features and functionality. For that reason, we've decided to do a weekly update so you can assess whether you want to use the new features and learn a bit about how to use them. ### Favicons and Show All Button We've added favicons to the projects list. It's a small thing, but it's not just a nice visual touch; it helps you find the project you want to look at quicker. With the SEO Dashboard for a project you get a lot of information. But perhaps you want to simplify it? If there are no broken links then a broken links box that says 0 may be reassuring, but it doesn't tell you much. At the top of the SEO dashboard there is now a "Show All" toggle. This lets you decide for yourself if you want to see all the boxes, or only those with actionable information. ### Fresh The Crawl Tool's crawler can now remember when it first found a URL. That's enabled us to add a new tab to the projects called "Fresh". This lists the pages found in the last 7 days for a site. If you're the SEO for a site but not necessarily the content producer, or don't produce all of the content, then this is useful to see what new content has been placed on the site. It's also useful to watch competitors to see what new pages they are producing. It's helpful to use this with the Scheduled Crawl feature to make sure the site is crawled every week. If it's a competitor you may want to set the crawl speed to "slow" in the project settings. We've also added an option to only crawl on-site pages, which doesn't check whether offsite links are broken or how relevant they are, but it's handy for saving credits if you just want to monitor a competitor for basic information and fresh pages. The first time a site is crawled everything will be "fresh", of course. But after a week it will show only pages found in the last 7 days. ### Page Information When you click on a link in any report on The Crawl Tool, it used to take you straight to the page. Now we've introduced a step in between. If we have information on that page then it presents a "Page Information" page. You can still click the "Open Page" link to get to the page, or you can check out the wealth of information about the page, its problems, search data, and traffic. We also use vector analysis from the crawls to find and show similar pages here. These are the pages you might want to consider linking to this page or linking from this page. It's a great opportunity to discover internal linking opportunities.
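If you're curious what "vector analysis" means in practice, here is a minimal sketch of the general idea: pages are represented as embedding vectors and the most similar ones are ranked by cosine similarity. This is only an illustration of the technique, not The Crawl Tool's actual implementation, and the types and function names are made up for the example.

```typescript
// Illustrative only: ranking "similar pages" by cosine similarity over page embeddings.
// A real system would obtain the embeddings from a text embedding model during the crawl.

type PageVector = { url: string; embedding: number[] };

function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Return the topN pages most similar to the target page, as internal linking candidates.
function similarPages(target: PageVector, pages: PageVector[], topN = 5): PageVector[] {
  return pages
    .filter((p) => p.url !== target.url)
    .sort(
      (x, y) =>
        cosineSimilarity(target.embedding, y.embedding) -
        cosineSimilarity(target.embedding, x.embedding)
    )
    .slice(0, topN);
}
```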
### World's Smallest AI Detector We've talked about the world's smallest AI text detector in a blog post before. It's a tiny AI text detection model that can run in the browser, meaning there's no cost to us to run it, which in turn means you can use it as much as you like. It may not be as accurate as its bigger brothers, but it's pretty good and certainly useful. Please check the previous blog post for the reasoning and what it tells us about various myths about AI text and SEO in general. # Weekly Feature Update 2 [Weekly Feature Update 2](https://www.thecrawltool.com/blog/weekly-feature-update-2) ##### The Crawl Tool Team It's the second of our new weekly videos showcasing the improvements to The Crawl Tool in the last week. It seemed like there were a million things in the first update, and only two in this second one. But they are big things. ### Webpage Design The crawling and reporting aspects of The Crawl Tool are the exciting part and so they've been the focus of most of the work. This left things like the home page, the guide, and the blog looking a bit dated. That's been more noticeable since the beginning of these weekly posts, which weren't that easy to read. So all those pages have had a fresh, new, more modern, and clean redesign that will make everything easier to see, read, and digest. ### Backlinks A clear request from users is for backlink information. There are lots of ways we can use that - from a simple report that you can search or export if you wish, to backlink info by page on the page information pages, to a backlink research tool. We've been unable to do these because access to backlink databases is expensive and it would compromise our low cost approach. Until now...we've started building our own backlinks database. This database building process will last a couple of weeks more before we can start adding the features we just mentioned. Except for one - we currently have a partial database. When you crawl a website it'll query that partial database and fill in a "Back Links" report for you on the dropdown. Until we've built the full database, it's just a fraction of the backlink information that we'll be able to provide users with, but we figured we may as well give what we have now. We're considering opening an API to this data and are interested in hearing any ideas for potential use cases, requirements, etc, to sound out if that's a worthwhile effort.
# Weekly Feature Update 3 [Weekly Feature Update 3](https://www.thecrawltool.com/blog/weekly-feature-update-3) ##### The Crawl Tool Team In the third of our weekly updates on new features, we introduce the Public API. A lot of effort has been put into the user interface of The Crawl Tool to make it easy to use and access the data on your crawled projects. But you might want to create your own little tool or do a simple task programmatically. Now you can with our public API. At the moment we have 3 functions you can call: This is the start of our desire to enable all types of website owners, SEOs, and agencies to improve their sites in whatever way they want. We'll keep working on this. For example, there's a huge backlink project we're working on that we hope to add to the API in the near future. # What is a Soft 404? [What is a Soft 404?](https://www.thecrawltool.com/blog/what-is-a-soft-404) ##### The Crawl Tool Team There you are, doing your daily ritual of checking the "Google Page Data" report in The Crawl Tool and you come across this: We all know that 404 means page not found, but what exactly is a Soft 404? ### Time Travelling to 2008 HTTP is the protocol behind serving web pages. It was later expanded to include HTTPS, but the differences in that expansion aren't important here. It has always been the case that a web server serves status codes with web pages in response to an HTTP request. Many of these you will know. 200 for "okay". 301 for a permanent redirect. 302 for a temporary redirect. 404 for page not found.
This was originally so that web browsers could read the status and take the appropriate action or inform the user. Later, web crawlers like Google's used them to guide their crawls. After all, all of that is helpful for them to know. The first instance I can find of talk about "Soft 404s" is in 2008. We should be clear here that a "Soft 404" isn't an official thing. It's not part of the standards that the internet is based upon. Google appears to have simply made it up. ### So What Is It? As we've mentioned, when a requested page doesn't exist, the server is meant to return a 404 status code. In the old days of the internet the browser would often choose the text to display to the user. But nowadays it is far more common for websites to send a web page along with the 404 to inform the user, like those funny messages that the website owner thinks are cute but aren't funny. The standards allow that, and that text could technically be anything. It is possible that a website is misconfigured. In that case it could return the content of a 404 page, but without the 404 status code. A visitor to such a website would very likely not notice, because the status code isn't rendered anywhere, so they see exactly the same visual 404 page as they would otherwise see. A web crawler, however, is dumb. It does not really "see" the web page and is therefore pretty reliant on the 404 status code being sent correctly. This is what the made up term "Soft 404" refers to - pages that don't deliver a 404 status code but do deliver a page that basically says the page could not be found. It's relevant only to automated systems like Google. ### But Why? Running a web crawler is actually expensive and everything you want to store in the search index takes up costly space. If servers are returning page not found errors but not the 404 codes then a crawler is crawling them and storing them as search results, which is essentially useless from the search engine's perspective. It's as simple as that. ### Google's Soft 404 Detection is Really Bad In years gone by, search engines used to compete by proudly announcing the number of web pages in their index. That's not really a metric that defines usefulness to the searcher, and it is no more. That's a good shift, as it encourages search engines to think more about their users, and ultimately these so called "Soft 404s" aren't useful. Observationally though, in recent years, with the advancement of AI, Google has become more aggressive at pruning which pages it wants to crawl. We can guess they want to use the resources used for crawling for AI instead. Efforts to ramp up crawler efficiency therefore seem to be in progress. Detecting a so called "Soft 404" is not an easy thing to do, and it has a CPU usage cost associated with doing the actual check. There's a catch 22 in there - if you try too hard to accurately detect such pages then you end up spending more resources than you gain by being able to filter them out. Any balanced level will always have a lot of false positives - pages that aren't so called Soft 404s but are detected as such. If you're a search engine with a lot of pages in your database, you can afford that. In fact, if you're trying to save costs you can make the filter really aggressive and accept lots of false positives, because you probably already cover the topic those pages were about. Good for search engines, not so great for website owners.
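As a rough illustration of what a soft 404 check involves, and why it is so prone to false positives, here is a minimal sketch assuming a Node 18+ environment with the built-in fetch. It flags a URL as a possible soft 404 when the server answers with a non-404 status but the body looks like a not-found page. The phrase list and length threshold are made-up heuristics, exactly the kind of guesswork that produces the false positives described above; this is not how Google or The Crawl Tool actually detect soft 404s.

```typescript
// Minimal soft-404 heuristic sketch. Assumes Node 18+ (built-in fetch).
// The phrase list and the 2000-character threshold are illustrative guesses only.

const NOT_FOUND_PHRASES = ["page not found", "404", "doesn't exist", "nothing to see here"];

async function checkSoft404(url: string): Promise<"ok" | "hard-404" | "possible-soft-404"> {
  const response = await fetch(url);

  // A real 404: the server correctly tells crawlers the page is gone.
  if (response.status === 404) return "hard-404";

  // A success status whose content looks like an error page is the "soft 404" case.
  const body = (await response.text()).toLowerCase();
  const looksLikeErrorPage =
    body.length < 2000 && NOT_FOUND_PHRASES.some((phrase) => body.includes(phrase));

  return looksLikeErrorPage ? "possible-soft-404" : "ok";
}

// Example usage:
// checkSoft404("https://example.com/some-page").then(console.log);
```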
### Fixing a Soft 404 Okay, so one of your pages has been hit by Google's made up term and aggressive cost cutting, and a genuine page of yours is marked as a soft 404. What are your options? Well, the answer is kind of easy. Think about how you would set up a 404 page. It tends to have a standard template and very little text, perhaps a heading and a sentence or two. Now do the opposite. Add some paragraphs of text in. Now it doesn't look like a 404 page! That was easy. Given it has been marked as a kind of 404, it may take Google's crawler some time to come back to read and update it. With Google being ever stricter on crawl budget allocations for the same reasons mentioned above, it could take some time. Patience is key. Or just give it a new URL, because crawlers are stupid; a new URL is a new page according to a crawler. # World's Smallest AI Text Detection Model [World's Smallest AI Text Detection Model](https://www.thecrawltool.com/blog/worlds-smallest-ai-text-detection-model) ##### The Crawl Tool Team ### Why Create The World's Smallest AI Text Detection Model? At just 85MB the AI text detection model we created really is tiny, but before we get into that we need to address the context and the why. In the last few years, Large Language Models have become really good at generating human-like text. For web professionals and those interested in websites, that has led to several obvious effects. Around these things a number of myths have also appeared. AI models take a lot of resources, and therefore money, to run. The tools to detect AI generated text generally use costly methods that mean you can use them for free for a small amount, but then they want ... they need ... you to pay. A more silent revolution in the world of AI is that small models can now be run in a web browser. A really small AI text detection model can therefore be run at very low cost, enabling it to be provided for free for as many uses as someone wants. ### Building the Model Building an AI model is a process that has been getting easier. But it's still overly complicated, especially if it's an area you're less familiar with. So I used Claude 3.5 in Cursor AI to give me some assistance. This is an experience where it constantly gets it wrong, but if you know just enough to keep trying to prompt it in the right direction then you eventually get there.
I'm not usually a fan of AI coding because of the amount of errors and corrections you need to do, which normally take more time than actually coding it yourself. But here, where it would otherwise take weeks to learn about the topic, it worked effectively. It is weird though that an AI Large Language Model (LLM) was helping me create an AI to detect AI LLM text! The Hugging Face website has some datasets we can choose from, making the process a bit simpler. I chose MAGE. I'd also then effectively decided to train it only for English, which I think was a reasonable compromise for such a small model. Before I knew it, my GPU was on a 10 hour training run. This is the first learning lesson - for a small, simple model it really doesn't take that long to train. I just left it overnight. ### Initial Observations and "Accuracy" After waking up to a newly built AI model, it was time to test it. Against samples in the MAGE dataset that it hadn't seen, it scored just under 90% accuracy. With text stripped from an AI text dataset and an equivalent human text dataset it scored an accuracy of 83.64%. Here is our first "myths" learning point. Tools that test for AI generated text will often quote this kind of figure. Indeed, scientific papers often do this. But in a real world context that figure probably isn't suited to the task. If you're using these tools to find paragraphs of text that somebody may have created with AI, for example if you're a teacher checking a student's essay, then what is probably important to you is how often they misclassify human text as AI. The same is probably true if you're analyzing a piece of text paragraph by paragraph. When my tiny model makes an error in classifying something as AI or human text, it has a strong tendency to misclassify human text as AI text, rather than the other way around. If other models have the same tendency, which seems logical, then paragraph by paragraph analysis just doesn't make much sense. Because of the way the AI works we do need to feed in text in chunks, and using paragraphs makes sense. Then we can calculate a score across the whole text. But what I'm saying is that the model, and probably such models in general, is way better at forming an opinion on longer texts than it is at the paragraph or shorter level. Following these automated tests, I did some manual testing. I took some known AI generated articles from web pages and some known non-AI ones and tested them. This is, by nature, a much more limited form of testing, but the model seemed to perform pretty well for standard website articles. One interesting observation is that it classified virtually all news articles I tried as AI, even when I could be sure they weren't. This suggests the patterns in LLM generated text and news site text are more similar, and therefore more nuanced, than a small model can handle. It would probably take a larger model and some specific training to handle this. It also suggests where a lot of the training of LLMs may have come from! ### Search Engines Can't Detect AI Content Myth An advantage of the previous tests is that they were on my machine and I could use the GPU, so they were quick to do. Because it's such a small model, it runs really fast on the GPU. One common myth I read is that search engines can't detect AI content. This is based on the idea that it would simply take too much computing resource to run scans across so many pages. Whilst we don't know if they're doing it in practice, the ability to create such a small model puts that idea in serious question. If you were a search engine, you could use a small model like this to test a fairly large number of web pages. Whilst it may not be the most accurate model in the world, it has enough accuracy to give an indication that a page might be AI. Following this you could take a second, larger and more detailed model, and test just the pages that test positive. This would significantly reduce the amount of resources you would need to expend, as sketched below. Regardless of whether they're doing it at the moment or not, building the tiny AI has shown that the potential to detect AI content is not just a theory but something that would very likely be implementable today. You should not expect AI content to perform well in search engines in anything but the very short term.
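To make that two-stage idea concrete, here is a minimal sketch of the kind of screening cascade described above. Both "models" are dummy stand-ins so the example runs; they, and the thresholds, are made up for illustration and do not describe what any search engine actually does.

```typescript
// Illustrative two-stage screening cascade. The two "models" below are placeholder
// stand-ins; a real pipeline would call a tiny classifier and a larger one instead.

async function cheapSmallModel(text: string): Promise<number> {
  // Stand-in scoring: pretend longer texts are slightly more suspicious.
  return Math.min(1, text.length / 10000);
}

async function expensiveLargeModel(text: string): Promise<number> {
  // Stand-in for a slower, more accurate second-stage classifier.
  return Math.min(1, text.length / 8000);
}

async function isProbablyAi(pageText: string): Promise<boolean> {
  // Stage 1: screen every page with the cheap model.
  const roughScore = await cheapSmallModel(pageText);
  if (roughScore < 0.5) return false; // most pages stop here, costing almost nothing

  // Stage 2: only suspicious pages pay for the detailed check.
  const detailedScore = await expensiveLargeModel(pageText);
  return detailedScore > 0.8;
}
```

The point of the cascade is purely economic: the expensive model only ever sees the small fraction of pages the cheap model flags.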
### Building an Interface Whilst search engines would want to run a tiny AI model as fast as possible, we want to run it as widely available and at as low a cost as possible. To do this I used some new web technologies and traded in "time". The most important of these was the ONNX runtime, but transformers.js is also a worthy mention. I spent some time with Claude 3.5 working out how to turn the model into an ONNX format model and also how to make it a bit smaller, then implementing it into the web interface that I've put here. This allowed me to test texts more quickly. Notably it is slower, because it is now using the browser and the CPU rather than the GPU. But it came with a nicer interface and a cost to run of practically zero. So I spent some more time testing articles. Originally it listed every paragraph and the results, but for the reasons given earlier I didn't like that so much and wanted the focus to be on the Overall Document Analysis. I didn't want to get rid of it completely though, so I collapsed it under "View Paragraphs". Feel free to play with the live AI Text Detection Tool to see how it works. You can use it as much as you like, no limits. The speed was not great, so I worked on making it work with web workers. This basically just means that we can run multiple threads in the background and therefore increase the speed at which we get a result. However - all this proved to be too much for mobile phones. Firstly a library bug meant I had to use an older version of a library, and secondly the web workers weren't playing well. Ultimately I decided to make a second, simplified interface for mobile. This is slower, but it works, and I think the tradeoff is worth it as this is really a tool where most people will copy and paste documents on a desktop computer.
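For anyone curious what this kind of in-browser inference looks like, here is a minimal sketch using transformers.js to classify a document paragraph by paragraph and average the results into an overall score, as described above. It is only an illustration of the approach, not the tool's actual code: the model id is a placeholder for whatever exported ONNX text-detection model you have, and the "AI" label name is an assumption that depends on how the model was trained.

```typescript
// Minimal sketch of in-browser, paragraph-by-paragraph classification with transformers.js.
// 'your-org/your-ai-text-detector' is a placeholder model id, and the 'AI' label is assumed.
import { pipeline } from '@xenova/transformers';

async function overallAiScore(documentText: string): Promise<number> {
  // Loads the ONNX model once and runs it in the browser via WebAssembly on the CPU.
  const classify = await pipeline('text-classification', 'your-org/your-ai-text-detector');

  const paragraphs = documentText.split(/\n\s*\n/).filter((p) => p.trim().length > 0);
  let total = 0;

  for (const paragraph of paragraphs) {
    const [result] = await classify(paragraph); // e.g. { label: 'AI', score: 0.91 }
    total += result.label === 'AI' ? result.score : 1 - result.score;
  }

  // Average the per-paragraph scores into one overall document score.
  return paragraphs.length > 0 ? total / paragraphs.length : 0;
}
```

Moving the per-paragraph loop into a web worker, as the post describes, would keep the page responsive while the classification runs in the background.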
### The Humanizers Myth If you look on the internet you will see no shortage of tools that claim to "Humanize" AI text. Even some of The Crawl Tool's competitors have them. To understand these we first need to understand that an AI model that detects AI text is just looking for patterns and indicators in the text it is given in order to try to classify it as human or AI. If you have ever used one of these "Humanizer" tools, you might have been surprised to see how often they create text that doesn't seem human at all. This is because they don't make things human, they simply try to obfuscate the patterns and indicators in the text in a way that doesn't get detected by AI text detection models. The problem is that it is a program doing this, not a human. As it's a program it will put its own patterns and indicators into the text. It's just that no detector available to the public is looking for those particular patterns, as none has been trained on them. Content from these humanizers doesn't read any more "human", often less so. So there's only one reason for using them - to trick search engines. The issue here, of course, is that if it is a successful strategy and these techniques were used, then it is somewhat trivial to also train models on that "humanizer" output. A simple model can be built in a day! That it passes known AI text detection tools now doesn't mean it passes the non-public ones companies may own, and it doesn't mean it will forever. It's not "humanized"; at best it's short term content waiting to be wiped out in the future. ### General Uses of The Tiny Model Traditional AI text detection tools take lots of resources to run, making them costly. Whilst they have free plans, they eventually need to charge you for access. I'm not sure those costs are in line with the amount of utility they provide. Whilst it is possible to get a fairly accurate overall score for an article, a lot of the other features (especially at a more granular level in terms of the text) are essentially worthless. This tiny model is virtually cost free to run because it runs in the web browser, though its size means it is likely less accurate than commercial models, doesn't do languages other than English (which other models may) and underperforms in some cases (e.g. news). This makes its main benefits: ### Going Forwards... I like the idea I've mentioned a few times in this article of using a multi model approach, where the smaller model determines if analysis goes on at a deeper level. For that reason, I might consider making a larger model. There are a couple of stages where we might be able to integrate this into The Crawl Tool. But this isn't without some hurdles to be solved before that's possible. So I'll be looking into the how and the what The Crawl Tool can do with this. Once again, check it out here!
# SEO Content Optimization: How to Write for Search Engines and Readers [SEO Content Optimization: How to Write for Search Engines and Readers](https://www.thecrawltool.com/blog/seo-content-optimization-how-to-write-for-search-engines-and-readers) ##### The Crawl Tool Team ## SEO Content Optimization: How To Write For Search Engines and Readers Ensuring your content is aligned with the topic and keyword phrases your audience is looking for is essential to SEO. Many will suggest other things, such as links, are important. These people are not wrong, but they do tend to underestimate the importance of aligning your content in the mix. At play is also a balance between how much are you writing for search engines and how much are you writing for users. Because search engines are not perfect estimators of the applicability of a web page to the audience, this will always be a factor that you're going to have to deal with. To what extent do you write for the search engines and to what extent do you write for users? Is there a way to write for both without sacrificing quality? ### Write What You Know About Traditionally for SEO content optimization you would pick a keyword or keyword phrase you are using and write your content around it, using that keyword phrase a few times but not too often that it would be considered spam . Quite where the boundary of spam is has always been vague. For computers this is difficult to set, but as humans we can get a view for it. This gave rise to E-E-A-T: Experience, Expertise, Authoritativeness, Trustworthiness . This is used, for example, by human assessors at search engines to check a page. In recent years it has been picked up as the standard of what you should be working to as if this is what search engine algorithms judge. But like when is "spam" actually spam, these things are difficult for algorithms to judge (with the possible exception of Authoritativeness which can be done on links). If you think about it both "Expertise" and "Trustworthiness" aren't something that can even be judged by humans. Expertise is really only judgeable if you are also an expert in the same topic and there are countless examples of large organizations that appear trustworthy that turned out not to be. Whilst the extent to which search engines use E-E-A-T is therefore questionable, it should be pointed out that they are a great way to make pages for users. It's therefore important not to ignore this but to use it as a set of guiding principles. If your articles can show experience, expertise, for example, then you are likely to gain authoritativeness in the form of people linking to you. Arguably we could simplify this to the author should be an expert in the topic, or write about what you know about. A natural extension of that is that unless you have vast amounts of Authoritativeness (links) then your site should be focused on a topic, preferably as tightly as possible. You need to cover a variety of areas in your topic or your site isn't going to be interesting for users (and we want them to read your blog post or article and click onto another one) but you want to steer clear of things that are outliers if you aren't heavily linked yet, there'll be time for that later. ### Keyword Phrases Still in these times there's a lot of talk about people targeting some or other keyword phrase for a web page. But times have moved on. It's not entirely wrong but it's not the correct way to do it. 
With the topic of the article being more important, it's unlikely that repeating a keyword or phrase works anywhere near as well as some people seem to think. Traditionally many tools, always with inaccurate estimates, will encourage you to pick the keyword phrase that brings the most traffic in comparison to an estimate of the competition and how many links your site has, and then use it. This, we would suggest, is a mistake, as it is writing for the search engines. Your first step should be to define your audience. Who are they? Why are they interested in reading this article or blog post? What keywords and phrases are these specific people likely to search for if they are interested in learning about the topic of the article? Pick a handful of the best keywords, some shorter and some long-tail, and use all of these. Why? Your aim isn't to simply attract visitors but to attract those interested in the topic you're writing about. Because these keyword phrases are topically close, you're helping the search engines to know what the page is about so they can send the correct audience, but the focus of your writing is still entirely for that audience. Double win! The only disadvantage is that you're not really going to have one of those highly inaccurate traffic estimates. ### Structure and Style It goes without saying that your article or blog post should be well structured and easy to read. This means using appropriate headings, something that helps both search engines and your audience. But it also means using appropriate language for the topic. Trying to sound overly clever is going to make it difficult for your audience to understand and for search engines to determine what it's about. ### Easy! That's it. It's that easy. The key to writing for search engines without sacrificing quality is to have an expert in the topic writing the text. The article or blog post should fit with others on the website, complementing them. It should focus on a particular, defined, audience and use a handful of phrases that they are likely to search for in the text. It should be well laid out and readable, and use language that is appropriate to the intended audience. If you do those things, you can't help but have content that is optimized for both search engines and visitors!
# The Importance of Topic Relevance for SEO [The Importance of Topic Relevance for SEO](https://www.thecrawltool.com/blog/the-importance-of-topic-relevance-for-seo) ##### The Crawl Tool Team ### Understanding Topic Relevance in SEO What is topic relevance in SEO? An article, such as a blog post, should be about a certain subject - known as a topic. Keeping the article tightly focused on a specific topic enables search engines to use semantic algorithms to work out what the article is about and whether the keywords used are relevant. Over time the idea of keyword stuffing articles has fallen out of practice. The development of semantic algorithms means that ranking systems are able to understand more than just a direct match between the words a search engine user types and those listed on a page; they take meaning into account. This fundamentally changed the way that search worked. The intent of the search and the semantic meaning became more important. While keywords and keyword phrases are still important, the topic in general and their relevance to it (and therefore to other phrases on the page) has taken over. ### How to Enhance Topic Relevance in Your Content Understanding what your users are searching for is key here. You want not just the keyword phrase you will be targeting, which will still be the main focus of your article, but all the relevant keyword phrases. You will want to use your main phrase, but also sprinkle in these other phrases to help keep your article focused on your topic. This should be both part of and feed into your methodology of creating content. Understanding the associated keyword phrases people might be looking for should give you ideas for what you should be writing about. Using these keyword phrases can therefore expand your content, whilst keeping it on topic. Because of the semantic connection, this helps your main keyword phrase be more relevant to the article as a whole. ### The Impact of Topic Relevance on User Experience This semantic topic relevance is a win-win. Your article will cover the topic more fully whilst maintaining a tightness and topic relevance that means that visiting readers will be getting what they expected. The reader ends up with a positive view of your site and what you have to offer. That can have immediate benefits - they may decide to buy something! Or it can have long term benefits where it improves their perception of the site as a brand and increases the possibility they'll return - hopefully by clicking on search results in the future. Not only that, but well written articles about a specific topic are more likely to be shared on social media or to get those precious backlinks. ### Measuring the Effectiveness of Topic Relevance for SEO There are few tools to help measure this topical relevance. The Crawl Tool does it in several ways. The Crawl Tool provides AI based similarity scores comparing pages to a site average, which allows you to see how broad you are generally. That's great for an overview (the sketch below gives a rough idea of how this kind of similarity scoring can work). The new Writer feature works with WordPress. From within The Crawl Tool you can edit or create posts. Clicking the "Calculate Page Info" button will highlight keywords that an AI algorithm finds topically relevant, calculate the relevance of the title for you, and indicate "Low Relevance" content in your article.
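To make the "similarity to a site average" idea more concrete, here is a hedged sketch of how such a score can be computed with sentence embeddings. This is not The Crawl Tool's actual algorithm - the embedding model named below is simply a commonly used small one, chosen for illustration.

```typescript
// Illustrative sketch only - not The Crawl Tool's implementation.
import { pipeline } from '@xenova/transformers';

async function siteRelevanceScores(pages: string[]): Promise<number[]> {
  // A small general-purpose embedding model, used here purely as an example.
  const embedder: any = await pipeline('feature-extraction', 'Xenova/all-MiniLM-L6-v2');

  const vectors: number[][] = [];
  for (const text of pages) {
    const tensor = await embedder(text, { pooling: 'mean', normalize: true });
    vectors.push(Array.from(tensor.data as Float32Array));
  }

  const dot = (a: number[], b: number[]) => a.reduce((s, v, i) => s + v * b[i], 0);

  // The "site average" is simply the mean of all page vectors.
  const avg = vectors[0].map((_, i) => vectors.reduce((s, v) => s + v[i], 0) / vectors.length);
  const norm = Math.sqrt(dot(avg, avg)) || 1;

  // Cosine similarity to the site average: low scores hint at off-topic pages.
  return vectors.map((v) => dot(v, avg) / norm);
}
```

A page scoring well below the rest of the site is a candidate for tightening up or, if it really doesn't belong, for removal.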
Cleaning up that "Low Relevance" content is a great start, as it nearly always leads to an increase in the relevance of the article as a whole. #### Ready to find and fix your website's SEO issues? Start with a free crawl of up to 1,000 URLs and get actionable insights today. ### Recent Posts #### Weekly Feature Update 3 In the third of our weekly update on new features, we introduce the Public API. A lot of effort has been put into the use... #### Is Low Effort Content Always Bad? Why are you asking this? I recently came across this post by John Meuller on bluesky. For backfill on information this is ho... #### Weekly Feature Update 2 It's the second of our new weekly videos showcasing the improvements to The Crawl Tool in the last week. It seemed like there ... #### What is a Soft 404? There you are, doing your daily ritual of checking the "Google Page Data" report in The Crawl Tool and you come across this: ... #### Weekly Feature Update 1 The Crawl Tool is not only the best SEO tool for small to medium websites and web professionals working with them, it also reg... #### World's Smallest AI Text Detection Model Why Create The World's Smallest AI Text Detection Model? At just 85MB the AI text detection model we created really is tiny,... # The Offsite JS Scripts Report [The Offsite JS Scripts Report](https://www.thecrawltool.com/the-offsite-js-scripts-report) ## The Offsite JS Scripts Report This report lists all .js urls listed on all pages across the site. This is useful if you want an overview of what scripts the site is running or if you want to see if it is running a particular script that may have a security issue. For performance reasons, it is recommend to keep the number of offsite scripts as low as possible. # Avoiding the Temptation to Over-SEO [Avoiding the Temptation to Over-SEO](https://www.thecrawltool.com/blog/avoiding-the-temptation-to-over-seo) ##### The Crawl Tool Team ## Avoiding the Temptation to Over-SEO There's a certain balance between optimizing your content and over optimizing it. One that's important to get right if you want to enjoy success with your SEO. Just because you can do something, doesn't always mean you should. Overdoing things can sometimes lead to penalties. You can generally notice these points because what you're doing often also leads to a bad user experience. Here's a guide to five areas where you should strive for a good balance. ### Prioritize Quality Content for Your Audience Think about it - why are you creating web pages to start with? For your audience and because you want your audience to learn something or do something. You should focus your SEO practices to your audience too. This is a long way of saying - write your content for your audience. Use relevant keywords and don't use misleading ones just because they're higher traffic. Don't keyword stuff. Just write well researched and engaging content and use keywords naturally in your pages. ### Keep Your Website’s Footer Clean and Organized You've seen websites with about 10 million links in the footer right? Did you read them all? Were they of any interest to you? The print size is often so tiny that you'll need to get a magnifying glass out just to be able to read them and decide if they're relevant to you, which they're invariably not. Don't stuff your footer (or your head) with links. Keep it for the most important links that you're visitor is most likely to want to visit next. ### Use Natural and Descriptive Anchor Texts Anchor text is the text inside a link. 
It's a key indicator to search engines about what the topic of the linked page is. It's tempting to stuff these with exact-match keywords as the search engines will then rank them right? Wrong. Over time the theme has become more and more important in comparison to the exact text. Sure, use your keywords in your link anchor texts but use them when they fit naturally. The text in the anchor text should help users know a bit about the page they'll be sent to if they click the link. ### Focus on High-Quality Link Sources We all know that links are the magic juice that pumps up your web pages search engine rankings. But do you really need every link you can find? Should you get a bulk link creation service to create you a million links from anywhere? Increasingly the quality and topic of a link is important. Focussing your time on high quality links that are relevant to the topic of your site/page is a far less risky strategy, with more benefit for the time spent. ### Make Every Page Valuable Every page should have a purpose. It should provide information, add value, and/or solve a problem. Creating pages just to add more pages is an unproductive time waste. Instead find the gaps in your content and fill them in. Write on-topic, helpful, content. Your audience will love it, and the search engines will too. Please don't think here that we're saying don't do anything SEO. If you log into The Crawl Tool and do an audit then you'll find plenty that needs doing. What we are saying is that often these things go hand in hand with improving your site's user experience. For example, when you fix some broken links. It's a good way to guide your work in order to not waste time or push things to far - ask yourself "does this improve the user experience of my visitors?" #### Ready to find and fix your website's SEO issues? Start with a free crawl of up to 1,000 URLs and get actionable insights today. ### Recent Posts #### Weekly Feature Update 3 In the third of our weekly update on new features, we introduce the Public API. A lot of effort has been put into the use... #### Is Low Effort Content Always Bad? Why are you asking this? I recently came across this post by John Meuller on bluesky. For backfill on information this is ho... #### Weekly Feature Update 2 It's the second of our new weekly videos showcasing the improvements to The Crawl Tool in the last week. It seemed like there ... #### What is a Soft 404? There you are, doing your daily ritual of checking the "Google Page Data" report in The Crawl Tool and you come across this: ... #### Weekly Feature Update 1 The Crawl Tool is not only the best SEO tool for small to medium websites and web professionals working with them, it also reg... #### World's Smallest AI Text Detection Model Why Create The World's Smallest AI Text Detection Model? At just 85MB the AI text detection model we created really is tiny,... # Introducing the Smartest SEO Crawler - The Crawl Tool [Introducing the Smartest SEO Crawler - The Crawl Tool](https://www.thecrawltool.com/blog/introducing-the-crawl-tool) ##### The Crawl Tool Team ## Introducing The Crawl Tool Websites are difficult to manage. Very quickly they have lots of links that might break, and there are ever changing demands from web browsers developers, search engines, and social networks. This can quickly lead to there being thousands, tens of thousands, of even hundreds of thousands, of things to check. 
If you want to give your user's a good user experience, avoid browsers telling them your website is insecure, and rank well in the search engines then you have to be on top of these things. ### A Broken Market The obvious answer to this problem is to use software tools to help find these issues. But looking at the market it is filled with tools using archaic technology that you pay for the pleasure of using your own resources, or tools that you need a second mortgage on your house for that consist mostly of re-worked public data. To justify the costs these tools flood their users with un-necessary and unactionable data and when that becomes too much often resort to undefined out of 10 scoring systems that reduce any informational utility even further. Some claim to offer the fastest crawlers, but don't tell you that for the average site you don't want that as it will take your site down. Arguably the biggest, most trafficked websites, are catered for. But your regular small to average sized site, or person managing those sites, is forced to shoe-horn itself into one of the offerings. ### A Year in Development Enter "The Crawl Tool". The Crawl Tool takes a different approach. The approach is to define who the users are (the SEOs, the agencies, the webmasters, the business, run small to medium websites, your average website) and ask the question at every step if the development is meeting a user need. Each developed feature is also tested against The 3 Principles . * Simplicity - that's not to say that it's missing things, but that The Crawl Tool strives to present what is important in an easy and actionable way. Anybody can use it - whether that's yourself, a tech team, or your work experience intern. * Low Cost - The Crawl Tool charges for what you use and keeps costs low so you pay less. The philosophy is that it should be accessible to small business and starters too. * Ease of Management - presenting you with data is great, but The Crawl Tool wants to make it easy for you to manage fixing any problems. ### Like No Other By leveraging modern technologies The Crawl Tool is able to abstract the technicalities of crawls and building the reports from the user and simplify it to a website where you can easily runs these crawls and reports with a button click. The reports contain useful, actionable, data with no fluff. They're simple to use both for yourself or if you want to delegate the work. And we provide functionality to help you manage actioning any fixes you need to do. You want your website or the websites you manage to have great user experience and on-site SEO. So does The Crawl Tool. And the big bonus is that by leveraging modern technologies, The Crawl Tool drives down the cost of creating these reports and providing this functionality. A cost reduction that is passed on to customers with the lowest prices amongst competitors. Significantly so. You pay for what you use, not marketing. ### Check It Out If you've not already registered, then you can get 1000 crawl credits for free on registration. So check it out at https://www.thecrawltool.com/ . As a user focused business any suggestions and feedback you have are important feedback@thecrawltool.com . Genuinely, your feed back will be listened to. There might even be a little reward for helpful feedback. #### Ready to find and fix your website's SEO issues? Start with a free crawl of up to 1,000 URLs and get actionable insights today. 
### Recent Posts #### Weekly Feature Update 3 In the third of our weekly update on new features, we introduce the Public API. A lot of effort has been put into the use... #### Is Low Effort Content Always Bad? Why are you asking this? I recently came across this post by John Meuller on bluesky. For backfill on information this is ho... #### Weekly Feature Update 2 It's the second of our new weekly videos showcasing the improvements to The Crawl Tool in the last week. It seemed like there ... #### What is a Soft 404? There you are, doing your daily ritual of checking the "Google Page Data" report in The Crawl Tool and you come across this: ... #### Weekly Feature Update 1 The Crawl Tool is not only the best SEO tool for small to medium websites and web professionals working with them, it also reg... #### World's Smallest AI Text Detection Model Why Create The World's Smallest AI Text Detection Model? At just 85MB the AI text detection model we created really is tiny,... # Introduction of Linking Domains Functionality [Introduction of Linking Domains Functionality](https://www.thecrawltool.com/blog/introduction-of-linking-domains) ##### The Crawl Tool Team ## Introduction of Linking Domains Every few months the common crawl releases data on how domains link to each other. We call that "linking domains". The Crawl Tool is primarily an on-site SEO tool, because there's an argument that if you're looking to acquire links that are on topic then simply searching is a good way (perhaps even the best way). However, an overview of linking domain data can be handy in understanding why a site ranks the way it does and to understand which aspects of the linking of the site are most important. We, therefore, don't want to rule out having link related functionality completely. A new feature in The Crawl Tool is therefore a list of linking domains. This will show on crawling a project, assuming that our database of linking domains contains one item to the site in the project. Keep in mind here that if the site in the project is less than three months old, it may not have any backlinks in the database yet. When we refresh the database from the crawl in a few months time, if there are links that have been found then they'll start to show after subsequent crawls. In most cases, this means if you're an existing user it will show once you've recrawled your project by clicking the crawl button, or after the next scheduled crawl (if you don't have scheduled crawling set up, what a great time to do so !). Scheduling a crawl is the easiest way to keep your data up to date and to get new reports automatically. New users will see the report available after the first crawl. #### Ready to find and fix your website's SEO issues? Start with a free crawl of up to 1,000 URLs and get actionable insights today. ### Recent Posts #### Weekly Feature Update 3 In the third of our weekly update on new features, we introduce the Public API. A lot of effort has been put into the use... #### Is Low Effort Content Always Bad? Why are you asking this? I recently came across this post by John Meuller on bluesky. For backfill on information this is ho... #### Weekly Feature Update 2 It's the second of our new weekly videos showcasing the improvements to The Crawl Tool in the last week. It seemed like there ... #### What is a Soft 404? There you are, doing your daily ritual of checking the "Google Page Data" report in The Crawl Tool and you come across this: ... 
#### Weekly Feature Update 1 The Crawl Tool is not only the best SEO tool for small to medium websites and web professionals working with them, it also reg... #### World's Smallest AI Text Detection Model Why Create The World's Smallest AI Text Detection Model? At just 85MB the AI text detection model we created really is tiny,... # What Do Too Short and Too Long Titles Mean in SEO Tools? [What Do Too Short and Too Long Titles Mean in SEO Tools?](https://www.thecrawltool.com/blog/what-do-too-short-and-too-long-titles-mean-in-seo-tools) ##### The Crawl Tool Team ## What Do Too Short and Too Long Titles Mean in SEO Tools? Most SEO tools, The Crawl Tool included, will tell you when titles are too short or too long. But what does this mean? If you've been using The Crawl Tool for even a little while you may have noticed comments about issues on a site in the Titles report or the "Title Best Practice" box on the project dashboard. At The Crawl Tool we're about giving actionable information to improve your site; to action "too short" and "too long" titles means understanding what SEO tools are talking about here. The Title tag has long been known to be a significant on-site ranking factor. And why wouldn't it be? It's the very first description of the page. It's shown on the browser tab to the user, but in a world where people increasingly have about 10 million tabs open it is arguably less seen. More importantly it's shown, mostly, as it is written in search engine results and so can impact how many people click through to your page. ### Too Long Let's start with too long. The general consensus for a maximum number of characters in a title is 60 characters. But why? It's unlikely that title length is a ranking factor in itself, though maybe if a title is absurdly long it might be. What we are talking about here is title clipping. Beyond 60 characters the search engine may choose to cut off the rest of the title, and the way they do so may vary between engines and over time. If we think back to what we said about search engines using the title on your search result, then we can see that in the overwhelming majority of cases this will make your title less clickable. For this reason, you will virtually never want your title to be more than 60 characters. There are perhaps times when you want clipping for mystery, but in that circumstance you would want to control it. Another circumstance is where you're only a little over the limit but you feel that the long title contributes to describing the topic, whilst the beginning of the title is enough to convince a user to click even when clipped. In The Crawl Tool you can click on the number next to "Too Long" or go to the Titles report and choose "Ignore", to - well - ignore it. But these should be the rare cases. ### Too Short At The Crawl Tool we set a too short title as being less than 40 characters. Where a title is too short, search engines have been known to experiment with things such as using a heading tag instead. So there is a small aspect of controlling how your search engine listing appears here. However, on the whole this criterion is less about visible things than a too long title is. Because the title of a page gives search engines clues about what a page is about, you want your title to be long enough to contain that information. A "Too Short" title is then, on the whole, an indicator that you should consider lengthening it to add more keywords and information, and that you are probably wasting an opportunity to rank better.
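As a quick illustration of those thresholds, here is a minimal sketch of the checks being described: more than 60 characters flags "Too Long", fewer than 40 flags "Too Short". The function name and labels are made up for the example.

```typescript
// A minimal sketch of the title length checks described above.
type TitleVerdict = 'Missing' | 'Too Short' | 'Too Long' | 'OK';

function checkTitle(title: string | null): TitleVerdict {
  const text = title?.trim() ?? '';
  if (text.length === 0) return 'Missing';  // no title tag, or an empty one
  if (text.length > 60) return 'Too Long';  // likely to be clipped in search results
  if (text.length < 40) return 'Too Short'; // probably room for more descriptive keywords
  return 'OK';
}

console.log(checkTitle('Privacy Policy')); // "Too Short" - usually fine to ignore
console.log(checkTitle('What Do Too Short and Too Long Titles Mean in SEO Tools?')); // "OK"
```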
"Too Short" is therefore much more of a suggestion than "Too Long". You should feel free to choose "Ignore" in The Crawl Tool if you feel that the title is already descriptive enough. ### A Suggested Process for Title Issues There's a third issue - "Missing" - which we should include in the process. If a title tag is missing then that's your first job - give it one! From there, we'd suggest that you look at titles which are "Too Long". Virtually all of these you will want to shorten whilst keeping the same meaning. Your thought process should also be about making them as attractive and clickable as possible - if they were shown to your target visitor in search listings, would they click on them? Finally you should look at "Too Short" titles. We'd suggest that you start by ignoring ones that can't really be changed or whose ranking you are unconcerned about - for example, "Privacy Policy" and "Cookie Policy" both come in too short, but giving them a different title isn't going to help and you're normally not too concerned about ranking those pages in search engines. Then, for each remaining title with a "Too Short" issue attached, think about whether there's a longer, more descriptive, title that could replace it. If you can think of one - great. But if you feel the title is already descriptive enough and you can't improve it, then select ignore. # Public API [Public API](https://www.thecrawltool.com/public-api) ## The Public API Sometimes you may not want to use The Crawl Tool to access your data, but may want to access it programmatically instead. It's your data - so why not? The Crawl Tool public API allows you to do exactly that. ### Forming Requests All requests need to pass your The Crawl Tool API key as the bearer token on the request. You can find this by logging into The Crawl Tool, dropping down the menu with your name at the top left, and selecting settings. The API key is on the settings screen (currently under the title "Wordpress Connection" but this will change soon). Or email support@thecrawltool.com for assistance. The API is a JSON REST API. The API endpoint is https://www.thecrawltool.com/public_api/ followed by the function name.
We currently have three functions: getProjects, getReports, and getReportRows. Each API request costs 1 credit from your The Crawl Tool account to make (this nominal amount is just to encourage efficiency in making requests). ### getProjects The getProjects call will return the projects on the account that matches the API key in the bearer token.

curl -X GET https://www.thecrawltool.com/public_api/getProjects -H 'Authorization: Bearer ' -H 'Accept: application/json'

This will return JSON like:

{"success":true,"data":[{"id":x,"name":"Sitename","baseurl":"https:\/\/somesitenamehere","crawling":0},{"id":x,"name":"Sitename","baseurl":"https:\/\/somesitenamehere\/","crawling":0}]}

The key fields are:
* id - a numeric identifier for the project.
* name - the name of the project.
* baseurl - the project base URL (the URL it starts crawling from).
* crawling - 0 or 1. If this is set to 1 then the project is currently crawling. Report data may be changing whilst crawling, so you should probably wait!

### getReports The getReports call will return the reports available on a given project id (from getProjects). The caller must pass the project_id in the request JSON body.

curl -X POST https://www.thecrawltool.com/public_api/getReports -H 'Authorization: Bearer ' -H 'Accept: application/json' -H 'Content-Type: application/json' -d '{"project_id": }'

This will return JSON like:

{"success":true,"data":[{"id":x,"name":"Broken Links"},{"id":x,"name":"Meta Descriptions"},{"id":20,"name":"Meta Keywords"}]}

The key fields are:
* id - a numeric identifier for the report.
* name - the name of the report as it appears on the dropdown at the top in The Crawl Tool.

### getReportRows The getReportRows call will return the rows from a report for a given report id. The caller must pass the report_id in the request JSON body. The return has a limit of 100,000 rows. The caller can optionally pass the page parameter in the request JSON body.

curl -X POST https://www.thecrawltool.com/public_api/getReportRows -H 'Authorization: Bearer ' -H 'Accept: application/json' -H 'Content-Type: application/json' -d '{"report_id": }'

This will return JSON like:

{"success":true,"data":{"rows":[{"cell_1":"Source URL","cell_2":"Broken Link","cell_3":"Anchor","cell_4":"","cell_5":"","cell_6":"","cell_7":"","cell_8":"","cell_9":"","cell_10":"","cell_11":"","cell_12":"","cell_13":"","cell_14":"","cell_15":"","cell_16":"","cell_17":"","cell_18":"","cell_19":"","cell_20":""},{"cell_1":"data1","cell_2":"data2","cell_3":"data3","cell_4":"","cell_5":"","cell_6":"","cell_7":"","cell_8":"","cell_9":"","cell_10":"","cell_11":"","cell_12":"","cell_13":"","cell_14":"","cell_15":"","cell_16":"","cell_17":"","cell_18":"","cell_19":"","cell_20":""}],"total_rows":3,"current_page":1,"total_pages":1}}

The key fields are:
* total_rows, current_page, total_pages - allow you to calculate whether you need to send the optional page parameter to fetch more rows.
* rows - each column is labelled cell_1, cell_2, cell_3 ... cell_20. rows contains all rows and these columns, making up the report. Each report varies by what data is in which column. The first row will always contain the headers of the report columns.

### Future and Support This API will likely expand over time; for now it should allow access to all the data from your crawled projects.
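As a worked example of the pagination fields above, here is a hedged sketch of fetching every row of a report using the documented calls. It assumes the optional page parameter starts at 1, mirroring the current_page value shown in the example response, and remember that each request costs one credit.

```typescript
// Sketch of paging through getReportRows; field names follow the documentation above.
const API_BASE = 'https://www.thecrawltool.com/public_api';

type ReportRow = Record<string, string>; // cell_1 ... cell_20

async function fetchAllReportRows(apiKey: string, reportId: number): Promise<ReportRow[]> {
  const rows: ReportRow[] = [];
  let page = 1;
  let totalPages = 1;

  do {
    const response = await fetch(`${API_BASE}/getReportRows`, {
      method: 'POST',
      headers: {
        Authorization: `Bearer ${apiKey}`,
        Accept: 'application/json',
        'Content-Type': 'application/json',
      },
      body: JSON.stringify({ report_id: reportId, page }), // each request costs 1 credit
    });
    const json = await response.json();
    if (!json.success) throw new Error(`getReportRows failed on page ${page}`);

    rows.push(...json.data.rows);
    totalPages = json.data.total_pages;
    page += 1;
  } while (page <= totalPages);

  return rows; // the first row contains the report's column headers
}
```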
If you have any issues or questions, please contact support@thecrawltool.com # Scheduling Crawls [Scheduling Crawls](https://www.thecrawltool.com/scheduling-crawls) ## Scheduling Crawls By default you have to click the "Start Crawl" button in a project to start a crawl running. When you have a project up and running, you may wish to crawl the site more often. A scheduled crawl allows you to pick a day of the week to crawl the site automatically. You'll get an email when the crawl is done. If you work on problems with your site on a fixed schedule, this is a great way of having data available in time without doing anything. Within a project you want to choose the "Schedule Crawls" option from the three dots menu at the top right. Simply choose a day of the week and click Save. If you want to cancel it later, come back and choose "Never". # The Cookies Report [The Cookies Report](https://www.thecrawltool.com/the-cookies-report) ## The Cookies Report Modern cookie law and requirements is a problem for website owners because it's difficult to know exactly what cookies your website is pushing. You've probably installed a cookie manager to get permissions but how do you know that the website still isn't pushing cookies when, for example, a CDN like cloudflare can push cookies without you knowing? During a crawl with The Crawl Tool, it logs all cookies that a web browser would have been asked to store. It doesn't agree to cookie manager permissions, so it shows you the cookies that are being pushed outside of that. Giving you that picture of what is going on. ### Cookie The name of the cookie. ### Value The value that the cookie contains ### Domain The domain for which this cookie is in scope (applies to). There could, for example, be multiple cookies with the same Cookie name but on different domains. ### Path The path the cookie applies to. ### Expires The expiry date of the cookie. When the crawler starts it has no cookies stored, so when taken relatively to the crawl time this can give you an idea of how long a cookie is set to last for. # The Meta Keywords Report [The Meta Keywords Report](https://www.thecrawltool.com/the-meta-keywords-report) ## The Meta Keywords Report Practically nobody uses the meta keywords tag anymore, the value of spending time on doing one is questionable. But when doing competitor analysis they may have one which gives invaluable insight into what keywords they are targetting. ### Source URL The URL that the meta keywords are to be found on. ### Keywords The contents of the meta keywords tag. For your site, this will likely be blank. For competitors sites you hope they're stuffed with helpful keywords. Tip: you can export them to xlsx or Google Sheets to post-process how you wish. ### Issue Any issues, such as two meta keywords tags. # The Offsite JS By Page Report [The Offsite JS By Page Report](https://www.thecrawltool.com/the-offsite-js-by-page-report) ## The Offsite JS By Page Report This report lists each Source URL found and the scripts that exist on them. A common way to use this report, for example, would be to type a script name into the "filter text.." filter in order to limit it to one of those scripts and the source URLs will be a list of every page on which it occurs. ### Source URL The source page the script is loaded from ### Script The URL to the remote script # AI Text Detector [AI Text Detector](https://www.thecrawltool.com/ai-text-detector/) ## AI Text Detector Welcome to the AI Text Dectector. 
This is a small AI model that is trained to detect text that is not human written. Uniquely it runs entirely in your web browser. Traditionally such models are run on expensive GPU servers, but the size of this model means it can run on the GPU in your computer, phone, or table. The size means it may not be as accurate as its bigger brothers, but it also means that you can use it as much as you like to check texts, free of charge. Paste any text you would like to check into the box above and click the analyze button. The model will take a few seconds to analyze the text and then make its assessment. On the whole, the model is fairly accurate - scoring 90%+ in tests. But there are areas, such as news articles, where it will often report something human written as AI. # Applying Robots.txt [Applying Robots.txt](https://www.thecrawltool.com/applying-robots-txt) ## Filtering Using Robots.txt On reports where the Source URL is present, The Crawl Tool provides a special kind of filter to allow you to see what pages would be allowed or blocked by a robots.txt, according to the code from Google's googlebot. A button titled "Robots.txt" appears at the top. If you click on it then a side panel will open on the left hand side for the various options. ### Enter robots.txt The "Enter robots.txt" textbox allows you to write or paste in a robots.txt file to use. This is useful to check if a robots.txt file or changes will work as expected before actually making it live on your site. ### Fetch robots.txt The fetch robots.txt button grabs the robots.txt from the live site and fills it into the "Enter robots.txt" textbox for you. Useful to analyse the current situation and it saves time when you want to test changes. ### Filter by Allowed Clicking this button applies a filter to the data that will only show the lines where robots are allowed to crawl the pages. ### Filter by Blocked Clicking this button applies a filter to the data that will only show lines that robots are blocked from crawling. This is useful to easily see what you are excluding. ### Show All Click this button will turn off any previous "Filter by Allowed" or "Filter by Blocked" to show all data lines irrespective of the robots.txt # Assigning Status and Assignee [Assigning Status and Assignee](https://www.thecrawltool.com/assigning-status-and-assignee) ## Assigning Status and Assignee A unique offering of The Crawl Tool is that it helps manage tasks, whether you're an individual working on it or a team. ### Setting Status Clicking on a cell in the Status column will bring up the status menu. These are mostly self explanatory. The statuses will show up elsewhere (such as on the summaries on the project page). Because of this some slightly different function for two of the statuses is worth mentioning. "Ignore" - will cause the row not to show on the default "All (except ignore)" or count on the summaries on the project page. Essentially hiding it unless you set the status filter to "Ignore". You can use this for rows you aren't interested in and want to hide. "Done" - marks a task as Done, indicating you have fixed something, so the Due Date would not count on the project dashboard summaries. ### Setting Assignee Setting the assignee is very similar to setting the status. If you click on a cell in the column then a menu will appear. This menu will also have your name on it, so you can assign tasks to yourself. 
Additionally it will have the names listed on any Access Keys you have granted to other others if they can access this project, so you can assign it to them. If you don't see one on this list that you are expecting, it's worth checking the access key has access to this project. With the Assignee set, the assignee filter can be used to show items for that person. A handy way to see your own tasks! # 4 Free Tools That Will Revolutionize Your SEO Strategy Overnight [4 Free Tools That Will Revolutionize Your SEO Strategy Overnight](https://www.thecrawltool.com/blog/4-free-tools-that-will-revolutionize-your-seo-strategy-overnight) ##### The Crawl Tool Team ## 4 Free Tools That Will Revolutionize Your SEO Strategy Overnight Being competitive when it comes to SEO means having the right information at your fingertips, and having the right information means having the right tools. If you're not using them, then your competitors likely are. In this blog post we'll take a look at free SEO tools that can help you improve your site SEO and start competing for those top spots in the search engine rankings. By using these tools you can improve your SEO strategy and start improving your site and rankings today. ### 1. The Crawl Tool Naturally we start this article with the best. Easily find technical issues such as broken links and missing titles. Analyse metadata, fix duplicate titles, perform audits in no time, and schedule them to happen automatically. The Crawl Tool is the best software for technical SEO and on-site SEO. And best of all is that it's cloud based, so there's nothing to install. The generous free plan keeps small and medium sites going for months. Visit The Crawl Tool . ### 2. UberSuggest When using Keyword suggestion/research tools it's important to be aware that these are, at best, estimates based on historical data, inaccurate collected data, or by using PPC data. In our opinion it's better to take an approach that's more based on what the intended visitor might search for than the old fashioned idea of difficulty/audience size. But if you're going to use a keyword research tool, UberSuggest is a good option. As long as you can tolerate the popup, you can get the keyword difficulty and volume estimates you're fairly simply. For what it does, that's a great tool. It bases the data on PPC data, which is probably the least worse option. Visit UberSuggest . ### 3. Answer the Public: Content Ideas and Keyword Insights Answer the public shows you what questions people are asking when given a keyword. It provides up to 3 of those again. It's similar to the keywords suggestions report in The Crawl Tool if you filtered by question words. But it doesn't need too gather data first so if you've got a small amount of keywords to generate questions for it might be worth a try. Visit Answer the Public . ### 4. Rank Math: Advanced On-Page SEO for WordPress We have plans for a The Crawl Tool wordpress plugin to help connect WordPress to The Crawl Tool for SEO. But until that point what can you do? The Rank Math pluging for WordPress could be your temporary measure to help. As well as integrating with Analytics, it can give you Google data and various statistics about your posts. It can take care of Schema's, etc. Like many plugins, the basics are free but for the most useful functions you need to cough up some cash :( Visit Rank Math . So there you go - 4 free tools you can use to improve your SEO overnight. 
We reckon that one of those is far superior to the rest and are constantly working on adding new features. But where we don't have the functionality quite yet, why not check them out in the meantime? #### Ready to find and fix your website's SEO issues? Start with a free crawl of up to 1,000 URLs and get actionable insights today. ### Recent Posts #### Weekly Feature Update 3 In the third of our weekly update on new features, we introduce the Public API. A lot of effort has been put into the use... #### Is Low Effort Content Always Bad? Why are you asking this? I recently came across this post by John Meuller on bluesky. For backfill on information this is ho... #### Weekly Feature Update 2 It's the second of our new weekly videos showcasing the improvements to The Crawl Tool in the last week. It seemed like there ... #### What is a Soft 404? There you are, doing your daily ritual of checking the "Google Page Data" report in The Crawl Tool and you come across this: ... #### Weekly Feature Update 1 The Crawl Tool is not only the best SEO tool for small to medium websites and web professionals working with them, it also reg... #### World's Smallest AI Text Detection Model Why Create The World's Smallest AI Text Detection Model? At just 85MB the AI text detection model we created really is tiny,... # Amazon AWS Startup Founders Programme Credits [Amazon AWS Startup Founders Programme Credits](https://www.thecrawltool.com/blog/aws-startup-founders-programme-credits) ##### The Crawl Tool Team ## Amazon AWS Startup Founders Programme Credits If you're a small startup or business then things are tough and it often seems like a continuous struggle to get big tech to recognize you. That's the case, also, with The Crawl Tool . But it's not all bad news as some big tech does have ways to support people. The Crawl Tool joined Amazon AWS's Activate Founders programme and is pleased and grateful to have received credits for their services under their "AWS credits packages for early-stage startups" offering. The Crawl Tool makes it easy to find website user experience or on-site SEO issues and to provide that ease and keep costs low it leverages cloud based infrastructure. This enables the provision of a better, easier to use, and cheaper way of finding website issues and fixing them. Consequently the overwhelming majority of the costs are in the tech stack. These credits are, therefore, a major factor in enabling The Crawl Tool to charge low prices to customers and to run it's own promotional credits and credits for good causes programmes. With the added benefit that, as the costs are covered, customer's won't be learning to use a tool that might disappear tomorrow. #### Ready to find and fix your website's SEO issues? Start with a free crawl of up to 1,000 URLs and get actionable insights today. ### Recent Posts #### Weekly Feature Update 3 In the third of our weekly update on new features, we introduce the Public API. A lot of effort has been put into the use... #### Is Low Effort Content Always Bad? Why are you asking this? I recently came across this post by John Meuller on bluesky. For backfill on information this is ho... #### Weekly Feature Update 2 It's the second of our new weekly videos showcasing the improvements to The Crawl Tool in the last week. It seemed like there ... #### What is a Soft 404? There you are, doing your daily ritual of checking the "Google Page Data" report in The Crawl Tool and you come across this: ... 
#### Weekly Feature Update 1 The Crawl Tool is not only the best SEO tool for small to medium websites and web professionals working with them, it also reg... #### World's Smallest AI Text Detection Model Why Create The World's Smallest AI Text Detection Model? At just 85MB the AI text detection model we created really is tiny,... # Basics of Technical SEO for New Website Owners [Basics of Technical SEO for New Website Owners](https://www.thecrawltool.com/blog/basics-of-technical-seo-for-new-website-owners) ##### The Crawl Tool Team ## Basics of Technical SEO for New Website Owners For new website owners it can be confusing. What are all these different types of SEO? While content and backlinks are important, in this article we want to get you familiar with the aspects that make up Technical SEO. The Crawl Tool covers many types of SEO, but a large part of it is technical SEO, so this will help with your understanding and use of the tool if SEO is new to you. ### What is Technical SEO? Technical SEO is about ensuring your website functions well for both search engines and users. It's a win-win in the sense that technical SEO helps with ranking in the search engines but also improves your users' experience of your site. Rather than focusing on content and backlinks, technical SEO focuses on infrastructure, navigation, and speed. ### Why is Technical SEO Important? It's common to hear that one part of SEO is more important than another (particularly from people who make tools that cover that one part), whereas the reality is that good SEO is a blend of all these techniques. What technical SEO brings to the mix is: It improves how crawlable your pages are, resulting in more opportunities to rank in the search engines and bring in visitors. It enhances user experience. Who doesn't like a fast site? Slow sites are proven to put users off. It boosts rankings. ### What is a crawler and How does it work? To better understand why these things matter, it's important to understand what a crawler is. A crawler is also known as a spider, or simply a bot. It follows links from pages, gathering data. Search engines, therefore, use crawlers to build the search databases they present results from. The Crawl Tool uses crawlers to gather useful information to help you make your site better. ### What is a robots.txt file? A robots.txt file lets you give crawlers some directions when they crawl your site. You can specify what pages can be crawled and what pages should not be crawled. This is useful for technical SEO as you can ask crawlers not to crawl things you don't want them to, as well as things there is little benefit in having them crawl (thereby making them focus on crawling your important pages and adding those to the search engine database). ### What is an XML Sitemap An XML Sitemap can be viewed as the opposite of a robots.txt. Where a robots.txt generally asks crawlers not to crawl something, an XML Sitemap lists all the URLs that you do want a search engine crawler to find. This is advantageous for search engines because they can fetch this list and know what there is to crawl, without necessarily having to follow each link. For you the advantage is that a search engine is likely to list new content more quickly. ### The Importance of Speed One important aspect of Technical SEO is website speed. You might ask yourself - how can I improve my website's speed? There are a few easy wins here: Choose a fast hosting provider!
It sounds obvious, but it is one of the biggest causes of slow or better speed for a website. Use gzip compression - hopefully your hosting provider has this set for you. Some aspects of the design/coding of the website can help: Use browser caching whenever possible. Minify CSS and Javascript files. Compress images to reduce their size. ### Mobile Friendly Websites As time has gone on, having a mobile friendly website has become an important aspect of technical SEO. The majority of search engine crawl requests now come from their "mobile" crawler. It's not an actual mobile phone, but it's looking at pages from that perspective. Things that are critical for mobile - like small file sizes, and a UI that is easily tappable on mobile screens, are therefore also very important for SEO. ### What is HTTPS and Why Should I Use It HTTPS is the secure version of HTTP, the method where browsers fetch and receiving web pages. As time has progressed, nearly all sites have switched to HTTPS. This is because it encrypts data from the client to the server. Even in cases where this encryption doesn't serve any real world purpose (no confidential data is being sent), people tend to use HTTPS as search engines and web browsers often ignore or block pages without. There is, of course, much more to it. But as a new website owner, understanding these basic terms and principles will take you a long way. #### Ready to find and fix your website's SEO issues? Start with a free crawl of up to 1,000 URLs and get actionable insights today. ### Recent Posts #### Weekly Feature Update 3 In the third of our weekly update on new features, we introduce the Public API. A lot of effort has been put into the use... #### Is Low Effort Content Always Bad? Why are you asking this? I recently came across this post by John Meuller on bluesky. For backfill on information this is ho... #### Weekly Feature Update 2 It's the second of our new weekly videos showcasing the improvements to The Crawl Tool in the last week. It seemed like there ... #### What is a Soft 404? There you are, doing your daily ritual of checking the "Google Page Data" report in The Crawl Tool and you come across this: ... #### Weekly Feature Update 1 The Crawl Tool is not only the best SEO tool for small to medium websites and web professionals working with them, it also reg... #### World's Smallest AI Text Detection Model Why Create The World's Smallest AI Text Detection Model? At just 85MB the AI text detection model we created really is tiny,... # Discovering Cookies With The Crawl Tool [Discovering Cookies With The Crawl Tool](https://www.thecrawltool.com/blog/discovering-cookies-with-the-crawl-tool) ##### The Crawl Tool Team ## What Cookies is Your site Pushing? You've set up your website, worked on your cookie and privacy policies, added a cookie permissions script, and you're all set and compliant with cookie regulations right? European regulation, in particular, is strict and sites and tools often tell you they are GDPR compliant when they aren't. A common thought amongst web analytics tools, for example, is that if they just store identifications in a local database then it's compliant. Not true. The same temptation occurs with cookies. If you just add a cookie permission script then are you not pushing cookies anymore? Many sites use, for example, a CDN like Cloudflare to speed up their site. It might surprise you to learn, as it did us, that CDNs like Cloudflare can add cookies to a user's browser that you might be unaware of. 
These cookies are to enable various Cloudflare features and services, which require tracking of users. Under EU law, these should show in your cookie policy. But because, in this case, it's your CDN that's doing it - these cookies have no awareness of whether a visitor has agreed to them or not. That's just one example, there could be other cookies sailing past your cookie permissions script without you knowing. ### Enter The Crawl Tool When The Crawl Tool crawls a site it starts by not knowing any cookies. You can think of it as a browser that has just been installed or cleared to its default state without cookies, that then visits every page on the site. To help with the problem we've just described, one feature of The Crawl Tool is that we store a list of each cookie that something tries to install into the browser's cookies during our crawl. The crawler doesn't agree to cookie permissions, so this is a list of cookies that are being pushed anyway. The Cookies Report in The Crawl Tool them shows these. In some cases, such as XSRF tokens (for security) and session cookies for operation, we can consider these essential cookies for site operations. But in an ideal world this would be empty. In any case it allows you to look at the cookie names, the data pushed, for what domains and to decide if these are essential cookies that should've been set or not. If they shouldn't then the domain column should give you information about where to start looking if it is an external script that has caused this. The Offsite JS Scripts Report may useful in combination with this to isolate which script. #### Ready to find and fix your website's SEO issues? Start with a free crawl of up to 1,000 URLs and get actionable insights today. ### Recent Posts #### Weekly Feature Update 3 In the third of our weekly update on new features, we introduce the Public API. A lot of effort has been put into the use... #### Is Low Effort Content Always Bad? Why are you asking this? I recently came across this post by John Meuller on bluesky. For backfill on information this is ho... #### Weekly Feature Update 2 It's the second of our new weekly videos showcasing the improvements to The Crawl Tool in the last week. It seemed like there ... #### What is a Soft 404? There you are, doing your daily ritual of checking the "Google Page Data" report in The Crawl Tool and you come across this: ... #### Weekly Feature Update 1 The Crawl Tool is not only the best SEO tool for small to medium websites and web professionals working with them, it also reg... #### World's Smallest AI Text Detection Model Why Create The World's Smallest AI Text Detection Model? At just 85MB the AI text detection model we created really is tiny,... # Enabling Devs to Check Sites for Errors Before Production [Enabling Devs to Check Sites for Errors Before Production](https://www.thecrawltool.com/blog/enabling-devs-to-check-sites-for-errors-before-production) ##### The Crawl Tool Team ## Enabling Devs to Check Sites for Errors Before Production When developers go live on production then the site should be working properly, this includes user experience and on-site SEO issues. But here you often hit an issue - how do you check a site when it's not live on the production server? Often times sites are in a staging environment first but these are generally hidden behind http authentication to keep nosey web surfers and crawling bots out. 
A new feature in The Crawl Tool solves this problem by adding the ability to set a username and password for HTTP authentication. It can then use these to crawl your staging site and check it for issues. Here's how... When you're creating a project (through the "Add Project" left menu item), fill in a name and the URL as usual. When you click "Create Project" The Crawl Tool will check the URL you entered. Because the server will respond asking for a username and password, The Crawl Tool will adjust the form to ask for these. It's as easy as filling in the correct username and password for the site and then clicking the Create Project button again. This time the tool will perform the check using this username and password, which should create the project. When a crawl is performed, The Crawl Tool will also use this username and password to crawl the site and gather the data. Bingo! You get The Crawl Tool user experience and on-site SEO reports even on a staging site behind a username and password. # An Introduction to The New Header Tag Report [An Introduction to The New Header Tag Report](https://www.thecrawltool.com/blog/introducing-the-header-tag-report) ##### The Crawl Tool Team ## What Are Header Tags Header tags allow web designers to title sections of a page using headings, subheadings, etc. In web development we have tags for this ranging from H1 to H6, where H1 is normally the main theme of the page and sections are structured in a hierarchy through H2, H3, and so on as necessary. Outside of title tags, they're recognized as one of the most important factors for user experience, making text digestible, and on-site SEO. This new report in The Crawl Tool was developed to give you an easy overview of how a site is using header tags. Rather than visiting every page and checking the source code, it's now as simple as opening a report on a crawled site. The report shows you the page title, which is useful to know as, depending on your keyword targets, you may or may not want to repeat the title in the H1. It then lists the contents of the first H1, H2, and H3 on the page. Useful for knowing what the page is about. A unique feature of header tags is that they are a non-enforced hierarchy. There's nothing to stop you jumping from an H2 to an H6, for example - it's just not as effective. The HTML source code itself does nothing to enforce the structure.
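To make the idea of a header hierarchy concrete, here is a small sketch (assuming Python with `requests` and BeautifulSoup, and a placeholder URL) that pulls a page's h1-h6 tags in document order - essentially the raw material behind the "structure" field described below.

```python
import requests
from bs4 import BeautifulSoup

def header_structure(url: str) -> str:
    """Return the page's header tags in document order, e.g. 'h1 h2 h3 h2'."""
    html = requests.get(url, timeout=10).text
    soup = BeautifulSoup(html, "html.parser")
    headers = soup.find_all(["h1", "h2", "h3", "h4", "h5", "h6"])
    return " ".join(tag.name for tag in headers)

print(header_structure("https://example.com/"))  # placeholder URL
```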
Because the hierarchy isn't enforced, dynamic sites in particular often have trouble producing a good structure. For this reason we developed the "structure" field. This scans through each page and lists its h1, h2, ..., h6 tags in order as a string, providing an at-a-glance view of the page's header structure. Additionally, the "filter text" box is useful here to look for lost opportunities - for example, you could try searching "h1 h3", and in any results that h3 could very likely be better as an h2! This is one of The Crawl Tool's uniquely powerful features: finding missed opportunities with header tags that are otherwise very difficult to spot. # Is SEO Really Dead? [Is SEO Really Dead?](https://www.thecrawltool.com/blog/is-seo-really-dead) ##### The Crawl Tool Team ## Is SEO Really Dead? You know the scenario. It's January and all of a sudden several people start proclaiming "SEO is Dead". There are so many people doing this that you even start to believe them. Just like brand recall in marketing, this is partly the same people saying it over and over again, because repetition makes you more likely to believe it. But it's also partly a number of people jumping on the controversy bandwagon. In recent times the latest clickbait has been to declare something "dead", often with an attempt to sell an alternative. This is also known as "controversy marketing". Controversy marketing is a high risk approach which is deliberately controversial in order to generate chat and buzz around a statement, in the hope of generating publicity for a brand or sales. But there's a risk with controversy marketing. Those using it, who have built up a trusting base of social media followers, are deliberately telling them untruths to generate this buzz. That audience, of course, likely contains their customers and certainly potential customers. If caught out, the trust they have built up is lost. They're no longer believable. And in the world of SEO, trust and believability are everything. ### The Constant Evolution of Search Algorithms Search engine algorithms are not static; they change over time. This variable nature is less predictable than a fixed algorithm and system. In one update some sites do well and others poorly; in another the opposite may be true. Of late this variability does seem to be extreme, which makes the long term benefit of any SEO change difficult to measure.
An issue then is that statements like "SEO is dead" are clearly false, but simultaneously hard to categorically show as wrong. That said, we don't need to, as those claiming "SEO is dead" rarely offer justification for their claims. At best you will see one misinterpretation of cause and effect, most often nothing. Your brain fills in the gaps: "well, there's certainly a lot of variability, and there do seem to be a lot of sites losing traffic, so it must be true". ### The Birth of AI Technologies are advancing and one thing we've seen a lot of lately is more capable AI systems. I say "more capable" rather than good here, because the performance of things like Large Language Models is impressive but ultimately not much better than asking a 5 year old. To understand this, we just need to understand the very basics of how they work. The neural networks that make up AI simply encode mathematical functions. What makes them unique is that these functions are trained, rather than explicitly programmed as a sequence of steps by a programmer. There is still some debate over whether this is intelligence, but an interesting characteristic is that the result is a function whose inner workings even the programmers don't fully understand! That's pretty cool. A Large Language Model splits text into tokens. Then, essentially, all it is doing is asking "given the previous tokens, what tokens have the best chance of coming next", and it picks one. Rinse and repeat and you have some output text. By using a vast amount of training data, it turns out they can get very good at predicting what tokens likely come next. But there are some interesting things about this that make even a 5 year old better. It can never be truly creative. Further, in the context of what you type in to start it going (the prompt), it can never be particularly original: the output is always going to be somewhere around the average of what would normally come after the prompt. Unfortunately this averageness and lack of creativity is self-reinforcing: as these models are used to create text that is placed on the internet, they will inevitably end up training on other AIs' output - reinforcing that average. When we consider AI and SEO, there are two potential claims that are relevant. The first is that human created pages aren't necessary because these AI LLMs can answer everything. They can't; at best they're frozen at the point in time they were trained. To overcome this, and the problem of the self-reinforcing averageness of AI output, it's in the AI creators' interests to ensure that webmasters can and want to generate new content. The second claim you'll see is that AI is replacing websites in the search results. Because, as we've discussed, AI LLMs need new content to improve, these AI answers link to the original sources. That won't kill SEO, but it may change it. It's highly advantageous to be linked from those AI answers. Think about it - you're making better than average human written pages, while the AI response in the search engine is at best limited to average because of the way the algorithms work. The interest level of the user that clicks on those links is astronomical, and they've landed on your far superior page. Developments in technology, especially AI, tend to happen in big jumps. Each jump is successively harder to make than the last one. That we've just made a big jump doesn't mean the next jump is imminent. ### User Experience The basics of SEO have been around for ages. Write good content. Ensure your site is technically good. Get organic links.
These all still work. They're about creating a great user experience for your users and being the sort of site people want to link to and search engines want to show. Even if the search engines sometimes get their algorithms wrong, in the long term this work is what benefits you. Sure, it may sometimes seem like it isn't working in the short term, but SEO is and always has been a long term benefit. ### Dangers of Believing SEO is Dead The harm these false claims do is also not to be underestimated. Often the claim is used to sell non-SEO tools, often AI tools, that simply put don't work, and in those cases there is obvious direct harm. But such claims also breach the trust of the audience they are speaking to, and those who believe them eliminate one of the best ways for small and medium sites to build traffic that compounds over the long term. An issue is that the SEO field isn't devoid of predatory people. There are companies, agencies, and tool builders that charge many times what their products and services are worth, and many of those are very well known and very popular. It's easy to suggest that SEO is expensive and doesn't drive traffic in the short term, so why bother? But that ignores the number of SEOs and tool builders (like us) that genuinely want to help small and medium sites drive traffic, where the costs are reasonable and nearly always work out to be the cheapest form of traffic in the long term. The potential for SEO to grow small and medium businesses is, and always has been, huge. Play with whatever these "SEO is Dead" proclaimers are selling if you must, but be wary of missing out on something that is actually, you know, proven to work in the long term. Because, well, your competitors probably aren't using what the "SEO is Dead" proclaimers are selling, but they are using SEO! ### Controversy Marketing is Dead Far from SEO being dead, let me suggest that controversy marketing is dead! One of the glimmers of hope in this whole situation is that "SEO is dead" and similar controversy marketing techniques still work, but they're getting less effective as people get wiser to them. It's like the silly people who just ask dumb questions on twitter to get engagement: over time everyone else works out what is happening and gets bored of it. Betraying the trust of your users, potential users, and followers is always going to be a short term strategy to oblivion.
# Introducing The New Crawl Tool Project Dashboard [Introducing The New Crawl Tool Project Dashboard](https://www.thecrawltool.com/blog/new-project-dashboard) ##### The Crawl Tool Team ## New Project Dashboard Not everyone likes, or needs, to do a deep spreadsheet style analysis every time they check their site with The Crawl Tool. For this reason a lot of work has been put into improving the project dashboards. The idea of the project dashboard is to put the most important summary information in an easy to view and read report. This should save time for those occasions when you just want to get an idea of the current state of the site. We're calling this version 1 of the new Projects Dashboard as we've still got plenty of ideas and enhancements we'd like to make to it. But it's so useful we've made the first version live now. If you have any suggestions, get in touch: feedback@thecrawltool.com # How to List All Offsite Javascript Scripts [How to List All Offsite Javascript Scripts](https://www.thecrawltool.com/blog/offsite-javascript-scripts) ##### The Crawl Tool Team ## Offsite Javascript A new feature on The Crawl Tool is the ability to see what offsite Javascript a website is using. There is a report "Offsite JS Scripts" and another "Offsite JS By Page", the difference being that the former lists all scripts used across the site - which is useful for a general overview - and the latter lists them by page to help isolate where they are used. There are several reasons you might want to know this information. ### Performance It has become common practice amongst many web developers to link to offsite CDN copies of the scripts they use. This has the advantage that those scripts can automatically update without someone having to manage the site and keep it up to date. It's a kind of lazy man's security benefit. Combined with the CDN likely being closer to the end user, this is often cited as being faster. That would probably have been true in the early days of the internet. There are various things that slow down web requests, starting with converting any domain name to an IP address, continuing with the latency in setting up a connection to the server, and going through to the overhead in requesting pages. Modern web protocols minimize these by batching requests together to a particular server and minimizing the overhead, mostly connecting just once to each server.
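As a rough illustration of the data behind these reports, the sketch below (Python with `requests` and BeautifulSoup; the URL is a placeholder) lists the external hosts a single page loads scripts from - each one is an extra server the browser has to open a connection to.

```python
import requests
from bs4 import BeautifulSoup
from urllib.parse import urlparse

page_url = "https://example.com/"  # placeholder URL
soup = BeautifulSoup(requests.get(page_url, timeout=10).text, "html.parser")

site_host = urlparse(page_url).netloc
offsite_hosts = set()
for script in soup.find_all("script", src=True):
    host = urlparse(script["src"]).netloc
    if host and host != site_host:   # skip inline and same-host scripts
        offsite_hosts.add(host)

for host in sorted(offsite_hosts):
    print(host)  # every extra host means another DNS lookup and connection
```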
The key phrase is "mostly connecting just once to each server". If there is just one server, it is very likely to be quicker than if a browser must connect to multiple servers to fetch the data, particularly when that data isn't very large - as is the case with many scripts. Moving scripts from being remotely hosted to being hosted on the website itself can therefore have performance advantages, so the reports allow you to see which scripts you might want to move. ### Security Having your website run code hosted on a remote site means that you have to trust that remote site not only to deliver safe code now, but to continue to deliver safe code in the future. A recent well known example of why this is a problem is the case of Polyfill (we won't link to the actual site for reasons that will be obvious). This was a popular script that added functionality to older browsers and that people often loaded from the CDN. A company bought the GitHub repository and domain for this project, and started injecting code into some websites to redirect users to malicious and scam websites. Linking to a remote script is equivalent to giving its owner complete control of your site, which you may not want to do. ### Administration With privacy policies and cookie policies being something that virtually every website needs nowadays, it's important to know exactly which third party services you are relying on and quite likely sharing data with. A scan for scripts can give you an instant overview of where the website frontend is communicating and, thus, where it is likely sending data. # Achieving Perfect Robots.txt Functionality [Achieving Perfect Robots.txt Functionality](https://www.thecrawltool.com/blog/perfect-robots-txt-functionality) ##### The Crawl Tool Team ## Perfect Robots.txt Functionality The file robots.txt in the root of your website allows you to specify which search engines can crawl what. This is useful for focusing crawler attention on content pages, reducing duplicate content, and reducing the load on your website. The robots.txt standard is a very old one, and what many do not know is that quite a lot of it is subject to interpretation. The problem with most robots.txt checkers that don't come from the search engines themselves is that it is difficult to be accurate when, at best, you're working from a written description to implement them.
We've added robots.txt functionality to The Crawl Tool, but to overcome these problems we've done it a little differently. ### Code From Google In this GitHub repository, Google provides code from their Googlebot crawler in the form of a library that can be integrated with other software. Theoretically you can integrate this code with your software and get a reliable interpretation of robots.txt files using the same assumptions Googlebot makes about how to interpret them. Perfect! Except this is C++ code. This means it can easily be integrated into C or C++ software, but none of the main SEO tools are using that. Certainly in the cloud based tools space, which The Crawl Tool occupies, it isn't usable as it stands. Really, the code is made primarily for writing a program in C/C++ that then uses the library. ### Enter WASM WASM is a technology that enables web browsers to run more traditional code - such as C++ code! In principle, then, there is a way to run it. First, somebody has to write a program that provides functionality to call Google's code. Second, they must package it all up as a WASM file to make it usable in the browser. That's easier said than done, but with much trial and error, that's exactly what we've done at The Crawl Tool. ### Functionality This provides The Crawl Tool with unique functionality. Not only is it the only web based SEO tool that can be confident of its robots.txt implementation's accuracy, but it can also check large numbers of web page links in parallel. A "robots.txt" filter will now appear in the appropriate reports. This allows you to fetch the robots.txt file from the site, or type one in, and filter the reports by Allowed or Blocked pages. This is a remarkably easy way to see which pages are blocked or not. We'd go as far as saying the best way. But it gets even better: because you can essentially test and view the results of any robots.txt file, it provides the opportunity to modify the robots.txt file and to test and review what the results would be in a highly accurate way, before you risk changes to the actual robots.txt file in real life. ### Using The Technology The absolute best way to use this technology is to sign up to The Crawl Tool - remember you get 1000 credits for FREE - and crawl your site. Go to the reports and we'd suggest choosing the Titles report. Click the robots.txt button, then the "Fetch robots.txt" button, and your choice of "Filter by Allowed" or "Filter by Blocked". This way you're checking against your entire site's worth of pages at once. However, if you want to test just one URL, then the form below will let you paste in a robots.txt file and test it against a URL. It's useful for quick checks and to demonstrate that our technology works. But again, to check multiple pages at once you're best off signing up for a FREE account.
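For comparison, Python's standard library ships its own robots.txt parser, and a quick Allowed/Blocked check looks something like the sketch below (the site and user agent are placeholders). Bear in mind this is the standard library's interpretation of the rules rather than the Google library described above - which is exactly the kind of difference in interpretation this article is about.

```python
from urllib import robotparser

parser = robotparser.RobotFileParser()
parser.set_url("https://example.com/robots.txt")  # placeholder site
parser.read()

for url in ["https://example.com/", "https://example.com/private/page"]:
    allowed = parser.can_fetch("ExampleBot", url)  # placeholder user agent
    print(url, "Allowed" if allowed else "Blocked")
```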
# Results of Our Polyfill Hack Investigation [Results of Our Polyfill Hack Investigation](https://www.thecrawltool.com/blog/polyfill-hack-investigation) ##### The Crawl Tool Team ## Polyfill Hack Current State Investigation In June of 2024 reports started surfacing about an attack on websites that redirected visitors to, amongst others, gambling sites. This has become known as the polyfill attack. The nature of the attack means that it is ongoing, and so we pose the question: what is the current status? ### What is a Supply Chain Attack and What Happened A supply chain attack is an attack on the less secure elements of a chain that goes into something. In this case the polyfill code provided more modern features in older browsers. By using it on a site, the site developer could use those more modern features safe in the knowledge they would also work if the user had an older browser. It is, in our view unfortunately, common practice for developers to load code like this from elsewhere - a CDN. This comes with some advantages - such as the code being able to automatically update, which reduces maintenance requirements - but also with disadvantages, such as slowing the initial load and, of course, creating a single point of failure and, in this case, ingress. Sites doing this are relying on a third party to continue to provide trustworthy code. In the case of polyfill, the domain on which it was served was ultimately sold to an untrustworthy party (through no fault of the code author, it seems) and that party modified the code nefariously. While in this case only a proportion of traffic was redirected, the net effect is that the entire site is under the control of the bad actor. ### Initial Reported Scale Initial reports put the number of instances of polyfill io usage at 110,000+. Whilst it does have the plus after it, this appears to have been calculated on a database of only 478M web pages. Given the high likelihood that the code is repeated across nearly all pages of a particular website, this represents a tiny fraction of websites. ### Initial Responses Initial responses were varied. One issue is that, because the idea of linking to a CDN-hosted version of a library is to reduce the need for a developer to maintain it, many sites will simply not be aware they even have the issue. Notably, Google Ads blocked sites running the polyfill script - it is unclear why, but presumably they worried about the script being used nefariously in some way. The CDNs Cloudflare and Fastly provided their own safe versions of the scripts. Cloudflare has since started replacing the script automatically, with no sense of irony that they are automatically modifying websites from a CDN that represents a single point of failure/ingress in the supply chain because a script on a CDN that represents a single point of failure/ingress in the supply chain has been compromised. Furthermore, the domain registrar has suspended the domain in question. ### Our Experiment We took the Common Crawl and searched for mentions of the affected polyfill script.
This crawl was performed in July, August, and September. It contains 95.4 million domains, with the original data coming from over 2.5 billion pages - in other words, considerably larger than the dataset the original estimate was made on. Given the time differences and the fact that some sites will have fixed the issue, we would expect this number to be somewhat lower than the original number of affected sites. However, we must also keep in mind that people are talking about the issue, so some of these mentions may not be actual code loading but instead pages discussing it. Additionally, quite a large number of these seem to consist of the original script being commented out in the code and replaced with Cloudflare/Fastly code. Even so, from this initial data set we found 2,506,159 sites, strongly suggesting that the original number of 110,000+ is a severe underestimate. We created a web crawler to crawl those 2.5 million sites' root pages with the intent of extracting only those that are currently running the polyfill CDN code. This comes up with the number of 29,251. ### What Can We Say From This? In this case, whilst we cannot place an exact number on it, it does seem likely that the original reported number of affected sites was severely underestimated. Of those sites that were affected, the publicity and the actions of a few large CDNs have mitigated the issue on a large number of sites. However, the issue still remains present on at least 29,251 websites. ### The Situation for the 29,251 Websites Because the domain registrar, Namecheap, suspended the domain, the name does not currently resolve to anything and the script will not be loaded on these sites. This means that traffic will not be redirected from them, but also that some functionality of their sites will not work. It's also important to note that this fix is temporary: at some point the domain will presumably be released, and future owners would then have the potential to control these websites. This is not something that these site owners can simply ignore. ### Using The Crawl Tool to Check Your Site The simplest way to find out if your site is using the library in question is to look for cdn polyfill io in the Offsite JS Scripts report; alternatively, the Offsite JS By Page report will tell you which scripts are on which pages. You can use the "filter text..." filter and enter "polyfill" to isolate it down. Here's a video of the process.
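If you'd rather script a quick spot check yourself, something along these lines works (Python with `requests`; the URL list and the compromised host name are placeholders to substitute): it simply flags any page whose HTML still references the host in question.

```python
import requests

SUSPECT_HOST = "cdn.compromised-example.com"  # substitute the host named in this post
URLS_TO_CHECK = [                             # placeholder list of your site's pages
    "https://example.com/",
    "https://example.com/about",
]

for url in URLS_TO_CHECK:
    try:
        html = requests.get(url, timeout=10).text
    except requests.RequestException as exc:
        print(url, "could not be fetched:", exc)
        continue
    if SUSPECT_HOST in html:
        print(url, "still references", SUSPECT_HOST)
```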
# The SEO Title is too Short [The SEO Title is too Short](https://www.thecrawltool.com/blog/the-seo-title-is-too-short) ##### The Crawl Tool Team ## The SEO Title is too Short What does it mean when an SEO tool tells you the SEO title is too short? For example, in our Titles report. One of the key parts of on-page SEO is the title tag. But a common mistake amongst website owners and writers is to set the title of the web page to be too short. Don't get me wrong, there are many times when a short title is fine. Think, for example, about your privacy policy or contact us page. But you're also, generally, not interested in driving search engine traffic to those pages. They're more for the user to visit when they're already on your site. The difference is context. If the user is already on your site then they know the context. If they're coming from a search engine, then a short title is a missed opportunity to provide context to search engines and visitors about the page you want them to visit. ### Why the Title is so Important While Google sometimes changes them a little bit, the title is generally the first thing that users will see about the page you want them to visit. It's right up there at the top of the block of information in your search result! There's extra focus on it because it's the clickable part of the result. A lot of a user's focus, and therefore their decision on whether they'll visit your web page, is based on that title. A well-crafted title will greatly benefit you. ### Why a Short Title is a Problem As indicated previously, it's not always a problem. But in most cases it is. A short title fails to take advantage of the ability to convey what the page is about to search engines and users. You miss out on opportunities. ### What Should a Good Title be Like? From the above we can set some rules for what constitutes a good title. # Understanding Semantic Vector Space for SEO Professionals [Understanding Semantic Vector Space for SEO Professionals](https://www.thecrawltool.com/blog/understanding-semantic-vector-space-for-seo-professionals) ##### The Crawl Tool Team ### What is Semantic Vector Space? The concept of semantic vector space is important for SEOs to understand. But surprisingly, in the modern world, many SEOs still work on an exact-match basis when it comes to keywords.
This exact-match idea doesn't include the concept of how related keywords are, whereas a semantic approach does. With the realization that modern search engines now look at words on a semantic level, we can see that a web page can be found relevant to a variety of search terms, even if they're not directly listed on the page. This explains the observation of many that successful web pages can rank in search engines for hundreds, or even thousands, of search queries. The thing that makes all this possible is the concept of a semantic vector space. By understanding this, we understand better how search engines work and therefore how to rank web pages. Let's take a deeper dive. ### How a Semantic Vector Space Works Vector spaces can seem complex. That's because they are multidimensional, and our understanding of things tends to be limited to two or three dimensions, based on our 3 dimensional experience. For example, take the screen you're looking at. It shows a two dimensional image. It goes up and down, and left and right. Easy enough. In the real world there's another dimension: you can go forwards and backwards. It's 3 dimensional! We could add time as a 4th dimension, but our understanding generally gets very fuzzy. How about a 5th one? It could exist, but we live in a goldfish bowl of 3 dimensions, so it's hard for us to understand as it doesn't equate to anything we know. Unfortunately for us, vector spaces work with a many-dimensional representation; luckily for us, the precise way they work is exactly the same regardless of the number of dimensions. So we can simplify. Let's say we have a web page about cats. We can imagine a line that on one end reads "not relevant to cats" and on the other end reads "relevant to cats". Our web page sits somewhere on that line, and hopefully close to the "relevant to cats" end. We could put all web pages on that line. Let's give them a score from 0 to 1. Our page might score 0.92. A more relevant page may score 0.96. A page about dogs might score 0.3, because dogs are slightly related to cats. A page about newtonian physics might score 0.01. This is a one dimensional encoding of how closely related a web page is to cats. A one dimensional vector. We can use AI/NLP to classify how closely related each web page is to cats. Now something interesting happens, because we've just invented a search engine. It's not a very useful search engine because it only really covers one topic, but it's a search engine. How does it work? When a user enters a query, we can use the exact same AI to give that query a number (a one dimensional vector). If they type "cats" then they will get a number around 1.0, and the search engine should return the pages that score closest to this number. If they type "newtonian physics" it will come out around 0, and the search engine should return the pages that score closest to this number. This works, if you think about it, because the search query was essentially not about cats, so it returned pages that weren't about cats. Similarly, if the user searches for "dogs" it would return pages close to the score of dogs. When we say "closest to", we don't care if the page is more to the left or right on the line than the query. We care solely about the distance. Let's imagine another dimension! Or, in other words, give our vector a second dimension/element. Let's call this second dimension "furry". We could imagine this as a second line with "not furry" on one end and "furry" on the other.
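Before adding that second dimension, here's the one-dimensional search engine written out as a tiny sketch (the scores are made up for illustration). "Search" here is nothing more than returning the pages whose score sits closest to the query's score on the same line.

```python
# Made-up one-dimensional "relevance to cats" scores for a handful of pages.
pages = {
    "all-about-cats.html":    0.92,
    "best-cat-breeds.html":   0.96,
    "dog-training-tips.html": 0.30,
    "newtonian-physics.html": 0.01,
}

def search(query_score: float, top_n: int = 2):
    # Rank pages purely by distance from the query's score on the line.
    return sorted(pages, key=lambda page: abs(pages[page] - query_score))[:top_n]

print(search(0.98))  # a query scored as "about cats" -> the cat pages come first
print(search(0.05))  # a query scored as "not about cats" -> physics comes first
```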
If something is high on the "cats" line then it is likely to be high on the "furry" line. But at the other end it's not so clear. If something is not a cat then it could be furry, or it might not. It's probably better for us to mark this as a two dimensional graph. What this immediately allows us to do is to cater for cats that aren't furry, like a sphynx. Whereas previously with our search engine, typing in the query "sphynx" might bring up a one dimensional score close to "cat" and return pages close to that, now it returns a score close to cat but far from furry in our two dimensional vector space. What our imaginary search engine has started to do is not just understand the words "cats" and "furry", but to understand the meaning and the extent of the connection between them. We could keep adding more and more dimensions to make it more and more specific. These vectors map a position in an imaginary space, and the distance between that position and something else mapped into the space (say a query) is how relevant they are. And that's exactly what is done in practice. By adding more and more dimensions, the vector space can consider more and more concepts. In practice these won't exactly relate to terms like "cats" and "furry" but will be calculated, probably by AI, to maximally encode the concepts in whatever number of dimensions somebody has chosen to use. That number is often in the hundreds, sometimes thousands (the trade-off being that the more dimensions in the vector, the more time it takes for whatever application is using this technique to calculate the distance between vectors to find the similarity). ### How This Applies to SEO In practice, in search engines, there does still appear to be an element of exact match queries counting for quite a lot. This seems to be the case when pages have low link equity. But when link equity is high, or as it increases, the pages start to rank for a broader array of search terms. This basic understanding of how that comes about hopefully suggests to you why using a variety of connected search terms for keywords is a better tactic than always using the same keyword or phrase, and in particular always internal linking with that. With low link equity you have more opportunity to rank for different terms, and with high link equity (or as it improves) your page is more likely to sit at the centre of that topic cluster and appear in more searches if you use a variety of terms. Whilst it might seem to work initially, the days of picking a keyword or keyword phrase and heavily targeting it are over.
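As a toy illustration of measuring closeness once there is more than one dimension, here is a sketch using cosine similarity (one common closeness measure) on made-up two-dimensional (cats, furry) vectors. Real embeddings have hundreds of dimensions, but the calculation is identical.

```python
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Made-up two-dimensional (cats, furry) vectors for a few pages.
pages = {
    "persian-cat-care.html": (0.95, 0.90),
    "sphynx-cat-care.html":  (0.93, 0.10),
    "angora-rabbits.html":   (0.20, 0.95),
}
query = (0.90, 0.05)  # something like "hairless cats": very cat, barely furry

for page, vector in sorted(pages.items(),
                           key=lambda item: cosine_similarity(item[1], query),
                           reverse=True):
    print(f"{page}: {cosine_similarity(vector, query):.2f}")
```

The sphynx page comes out on top, which is exactly the behaviour the two-dimensional example above describes.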
# Using AI to Keep a Tightly Themed Website [Using AI to Keep a Tightly Themed Website](https://www.thecrawltool.com/blog/using-ai-to-keep-a-tightly-themed-website) ##### The Crawl Tool Team ## Using AI to Keep a Tightly Themed Website Having a clear theme to a website can help both users and search rankings. There are several ways to detect the theme of a website, and one of them is its linking. If a site links to or from other websites or pages about a particular theme, then it is more likely the site itself is about that theme. The effect is not one sided, though. In terms of links, we can think of this as putting a rubber band between the two sites or pages: it pulls both sides closer together towards a common theme. Of course, all the other links on your site and on the linked site also have rubber bands pulling them towards other things - so the effect of that single rubber band on each site can vary. In this process you're not completely out of control, though. You choose who you link to! This is why we introduced site relevance and page relevance into the "Offsite Links" report. When The Crawl Tool crawls your website it uses a modern AI neural network to create a representation of the theme of each page on the site and each page it links to. This representation is known as an embedding or vector (as it takes the form of a multi-dimensional vector). It also calculates one of these vectors for the entire website you're crawling. These representations of theme have an interesting property: themes that are similar also have similar representations. By using a method known as cosine similarity, we can calculate how close each representation is. A score of 1 would indicate that they are the same, but this will rarely occur except in the case of duplicates. As mentioned previously, we use this to calculate two figures for offsite links. Site Relevance - a comparison of how closely the theme of the linked page matches the theme of your site. Page Relevance - a comparison of how closely the theme of the linked page matches the theme of the page on your site that links to it. In the case where there is too little data to calculate a theme, we set these to 1000. So a suggested first step when using these is to filter the "Offsite Links" report to exclude rows containing that value, by going to the filter text box and typing !1000.00. Your next step is to sort the column you're interested in (e.g. site relevance). Most likely you will want this in ascending order, as that will list the least relevant (lowest numbers) first. There's no specific number below which a link is bad, but in general we've found that numbers below 0.5 merit consideration. You should consider whether these links are necessary and if there's anything that can be done to improve them. When doing that, keep in mind that the report is showing similarity to the page you are linking to (not the entire domain). If you control the page on the remote site then you might consider changing it by adding more text to make it more relevant. If you don't, then maybe the webmaster would consider it. Otherwise you should consider whether it is truly useful for your website users. If not, then it's a good plan to remove that link.
If so, then you should probably keep it (remember the rubber band example we gave earlier in this article - a single link only shifts the theme a little bit, and you should prefer a good user experience over a tiny change). Hopefully that's given you an insight, and another path to tighten the theme of your website. We're excited at The Crawl Tool, as it's the first time we've used AI in the product, with more to come. In line with our philosophy that everything needs to be transparent for it to be actionable, we'll describe all future features with articles like this too. # What is a Web Crawler and How Does it Work [What is a Web Crawler and How Does it Work](https://www.thecrawltool.com/blog/what-is-a-web-crawler-and-how-does-it-work) ##### The Crawl Tool Team ## What is a Web Crawler and How Does it Work? The internet has billions of web pages. Have you ever wondered how these are turned into a database that you can search? Or how The Crawl Tool gathers information to help improve user experience and search engine rankings? This post delves into the world of website crawlers: what are they? How do they work? What are the benefits? What is a crawler in the search engine world? ### What is a Website Crawler A website crawler traverses the internet or a website systematically. They are also sometimes referred to as a web crawler, spider, or a bot (short for robot). A crawler acts like a digital librarian, browsing and indexing websites so that information can easily be retrieved in the future. It will follow the links from one page to other pages, collecting the information and building a database (otherwise known as an "index"). ### The Role of Crawlers in SEO For search engine optimisation we want our pages to rank higher. But we also want them to be found. Crawlers are therefore integral to the SEO process. We want our pages to be found! When a crawler visits a website it examines not only the links, but also the content, keywords, meta data, and other SEO elements. By understanding this information the search engine can rank the pages. ### How a Web Crawler Works Understanding how a web crawler works helps you understand their importance and how various aspects of SEO affect the visibility of your pages. A crawler's process can be broken down into several stages: Discover - The crawler starts with a list of URLs.
These can be URLs it has found from a previous crawl, URLs from another source (such as a sitemap), or simply a list of URLs that the programmers assume are a good starting point. These form an initial queue of URLs to crawl. Crawling - The crawler visits each URL on the list. It downloads the content of the page and extracts any hyperlinks it finds. Assuming it is allowed to follow a link, it adds it to the queue of URLs to crawl. Parsing - The downloaded page is parsed for data that the crawler developers consider could be useful later for search results. Think of things like title tags and meta tags, all the way through to the words on the actual page. Indexing - Using the parsed data, an index is built. The index is all the useful data gathered in the parsing phase, but organized in a way that is very quick for the search engine to query. ### The Importance of Crawlers for SEO Strategy Understanding the behavior of website crawlers helps website owners understand their connection and importance to SEO. Many aspects are related; some key ones are: Technical SEO - Having good internal linking and a fast site helps crawlers access and index your site quicker. Not only does it help with ranking, but a site that is easier and faster for a crawler to navigate helps it index more pages. Sitemap - Having a sitemap helps the crawler identify pages on your website with less work, helping to make your pages more visible. Robots.txt - With a robots.txt you can control which pages a crawler should or should not visit. This is useful for directing the crawler so it crawls the pages you want, rather than crawling pages that don't have much value in appearing in search engine results. ### Identifying Site Issues with SEO Crawlers An SEO crawler like The Crawl Tool, also known as an SEO audit tool, essentially simulates the crawling process on the level of a single website and builds its own database of data. This makes such tools instrumental in finding issues that may be affecting your search rankings. Broken Links and 404 Errors - Clicking on a link and getting a 404 error is annoying for users. Remember that search engines want to rank sites with great user experience. Crawlers also waste time following broken links when they could be crawling another great page on your site. Broken links are, therefore, something you should be fixing regularly on your site, and tools like The Crawl Tool with its Broken Link report are instrumental to this. Duplicate Content - Duplicate content confuses search engines. They obviously don't want two results that are the same, but where duplicate content exists it is very difficult for them to know which page they should include and which they should not. Also, as only one of the pages can be included, it is a waste of crawl budget to have them crawl two pages that are the same. You should change one of them. Meta Tag Issues - While the meta keywords tag isn't so important anymore, the meta description tag is used, and the various social media tags are too. An SEO crawler can help you identify problems with these. Internal Linking - An SEO crawler tool can help you find issues with internal linking on your site. Does a page not show up that you were expecting? Then the page is not discoverable and you need to give it an internal link from somewhere. Are pages overlinked? Are there internal linking opportunities?
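Putting the discover, crawl, parse, and index stages together, here is a deliberately simplified sketch of a same-site crawler (Python with `requests` and BeautifulSoup; the start URL is a placeholder, and a real crawler would also respect robots.txt, rate limits, and much more).

```python
import requests
from bs4 import BeautifulSoup
from urllib.parse import urljoin, urlparse

def crawl(start_url: str, max_pages: int = 50) -> dict:
    """A toy crawler: discover -> crawl -> parse -> index, same host only."""
    site_host = urlparse(start_url).netloc
    queue, seen, index = [start_url], {start_url}, {}

    while queue and len(index) < max_pages:
        url = queue.pop(0)                             # Discover: take the next queued URL
        try:
            html = requests.get(url, timeout=10).text  # Crawling: download the page
        except requests.RequestException:
            continue
        soup = BeautifulSoup(html, "html.parser")

        # Parsing: keep the bits a search engine might care about (just the title here).
        title = soup.title.string.strip() if soup.title and soup.title.string else ""
        index[url] = title                             # Indexing: store it for later lookup

        for link in soup.find_all("a", href=True):
            absolute = urljoin(url, link["href"]).split("#")[0]
            if urlparse(absolute).netloc == site_host and absolute not in seen:
                seen.add(absolute)                     # queue newly discovered same-site URLs
                queue.append(absolute)
    return index

for url, title in crawl("https://example.com/").items():  # placeholder start URL
    print(title or "(no title)", "->", url)
```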
This has been a very short, high level overview of search engine and SEO crawlers, but hopefully a better understanding of how they work helps to show their usefulness and importance for search engine optimization. # Why We Support Data Driven SEO [Why We Support Data Driven SEO](https://www.thecrawltool.com/blog/why-we-support-data-driven-seo) ##### The Crawl Tool Team ## Why We Support Data Driven SEO Why do we support data driven SEO? I've been spending some time on the SEO subreddit lately and there are a few things that strike me as worthy of note. The first is that, as ever, there are many differing opinions on SEO. The second is that answers to questions are always stated as absolutes. That's nothing new, of course, but it does seem to be more and more the case. It also reflects both social media trends and trends of SEO personalities. All this occurs while the conflicting theories show that, whilst there is agreement on some things, there's no agreement amongst SEOs over a lot of things. For people trying to find out about SEO this causes a lot of confusion. ### Google Stokes Confusion As a search engine, Google wants to deliver the best possible results to users. There's an argument that if SEOs are producing great pages then there's a somewhat symbiotic relationship there. The issue here is that if people knew precisely which ranking factors Google used to determine what great pages are, then it would be possible to craft the quickest possible page that meets those factors without worrying about it being the best page. Google has a case of split interests here - it's in their interests to encourage SEOs to produce great pages, but strongly against their interests for people to understand the ranking factors too well. Because of this, Google has always had a practice of not exactly telling lies, but also not being exactly clear when they say things. By this I mean their wording is almost universally "tricky" when speaking about these things. For example, they'll suggest something isn't a ranking factor. That thing may not be directly coded into the algorithms as a ranking factor. But they'll fail to mention that it does influence another ranking factor. For all practical purposes, if something influences another ranking factor then it IS a ranking factor. But you could never actually argue that Google lied, because it isn't coded in there.
In May of 2024 the Google search documentation leak surprised a lot of SEOs. It led to a lot of analysis and theories, many of which we don't subscribe to. But it did reveal some things, such as references to the use of Chrome traffic, which Google had previously denied using, and to things like dwell time, which Google has also denied. A key question here is whether the precise wording around "dwell time" is enough to deny it. To be fair here, some of this may have changed over time. Google could have stated something, but then a change to the algorithms means that's no longer the case. That would make sense with an algorithm that necessarily needs to constantly change. But for some reason some SEOs still happily quote very old statements as fact. Still, in our view, information from Google (that they haven't released by accident) has to be considered as biased and must be weighed along with the intent behind it. Yes, Google have people who interface with SEOs - but news flash - their job isn't to tell you the ranking algorithm. Yet it is common to find SEOs repeating things Google have said, even from years ago, even things that have since been disproved, as gospel! ### Why a Data Based Approach is Better On Reddit there was a question that we can summarize as: does linking your blog posts on social media help with indexing/discoverability? Let's set some context around this - lately a topic initiated by Google that SEOs have been talking about is "crawl budget". In the old days a search engine was almost judged on the size of its index (how many pages it contained). Bigger was better! In more enlightened times, search engines seem to have realised that having a large number of good pages is better than beating competitors on sheer index size by filling the index with an even larger number of bad pages. This means they can reduce the cost of building their search index if they just work out how to crawl the good pages more often and the bad pages less, or not at all. That's huge if, for example, you want to direct resources into things like AI. So crawl budget is the topic of the moment. Anecdotally, it seems, people are getting crawled less and naturally want to know how to improve that. So does linking your blog posts help with that? In The Crawl Tool, if you link up your account with Google Search Console then it will pull data from the API. It does this to get searches, clicks, keyword data, and page data. Along with the page data, Google will occasionally provide some links that get pulled into your "Back Links" report. One notable thing about those links is that they are sometimes simply a social media link. This occurs on indexed pages. So in those cases it is telling you that it has indexed the page, but the only thing it knows about the page is a social media link. This doesn't tell us anything about whether social media helps the page rank (one would expect not very much, given it is a link likely created by the site owner themselves). But what it does tell us is that, for indexing/discoverability, that social media link almost certainly played a part. Data like this is unbiased, doesn't have an opinion, and doesn't have business interests to protect! But the interesting thing is that when I mentioned this in the Reddit conversation, the categorical answers based on something Google once said continued. ### Is Data Always Absolute? No. In the example above we used the words "almost certainly played a part".
We know, for example, that Google removes some search and click data from the query and page views in Google Search Console, so that it under-reports. You can see this by adding up the values for all pages or all queries and comparing the total to the main headline figures - you'll see the under-reporting. It's possible that Google is also removing or replacing the links, but it's unlikely.

The data doesn't necessarily have to indicate a fact; it can also be strong support for a theory. What's important, though, is to use it to underpin unbiased theories (or facts) with a realistic assessment of their accuracy. It can also be the case that there is not enough data to indicate something, only to suggest it. But by using actual data and a realistic assessment of what it indicates, we can separate out:

"I believe this because a biased commercial entity with an interest in people not knowing the ranking factors once said it" (Google said)

vs. "I believe this because observationally it always seems to be the case" (every time I put a keyword phrase into the title and H1 it seems to do well)

vs. "I believe this because of X" (data)

That list goes from the least trustworthy source to the most trustworthy.

### But You Would Say That

We would. The Crawl Tool gathers and presents data to help you improve your site and your search engine rankings. It stands to reason that we have a data-centric approach both in how we do that and in how we assess various SEO theories. That's how our users win and outperform their competitors.
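If you want to check the under-reporting described above for your own property, the comparison is easy to script. The sketch below is a minimal illustration, assuming you already have OAuth credentials with read access to the Search Console API; the property URL, date range, and the `clicks_gap` helper are placeholders for the example rather than anything The Crawl Tool itself uses.

```python
# Minimal sketch: compare the Search Console headline clicks figure with the
# sum of the per-page rows for the same date range, to make the gap visible.
# Assumes OAuth credentials with a read-only Search Console scope already exist.
from googleapiclient.discovery import build

def clicks_gap(credentials, site_url, start_date, end_date):
    service = build("searchconsole", "v1", credentials=credentials)

    def query(body):
        return service.searchanalytics().query(siteUrl=site_url, body=body).execute()

    # Headline figure: no dimensions returns a single aggregate row.
    totals = query({"startDate": start_date, "endDate": end_date})
    headline_clicks = totals["rows"][0]["clicks"] if totals.get("rows") else 0

    # Per-page figures: one row per URL, capped at the API's row limit.
    by_page = query({
        "startDate": start_date,
        "endDate": end_date,
        "dimensions": ["page"],
        "rowLimit": 25000,
    })
    per_page_clicks = sum(row["clicks"] for row in by_page.get("rows", []))

    return headline_clicks, per_page_clicks

# Example usage (placeholder property and dates):
# headline, per_page = clicks_gap(creds, "https://www.example.com/", "2024-05-01", "2024-05-31")
# print(f"Headline clicks: {headline}, sum of per-page rows: {per_page}")
```

Running both queries for the same period and comparing the two numbers shows the gap directly, without relying on anyone's opinion about whether it exists.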
# Bulk Assignment

[Bulk Assignment](https://www.thecrawltool.com/bulk-assignment)

## Bulk Assignment

Sometimes you want to assign statuses, assignees, due dates, and/or notes to several rows at a time. To do this you can use bulk assignment, which works slightly differently to assigning one at a time. On the left of each grid you will notice checkboxes. You can use these to individually choose the rows you wish to assign to. The "..." column header will toggle all the rows on and off. TIP: you can use the filters to select the group of records that you want to bulk assign to, in order to make the selection easier.

Once you have selected some items, you will notice some new buttons appear in the header at the top that allow you to Bulk Set Status, Bulk Assign, Bulk Set Date, and Bulk Set Note. These function the same way as single row assignment, except that when you use them they apply to every row you have checked. For example, if we set the status to "To Do", then as soon as To Do is clicked every checked row receives that status.

# Choose The Columns That Display in A Report

[Choose The Columns That Display in A Report](https://www.thecrawltool.com/choose-the-columns-that-display-in-a-report)

## Choosing The Columns That Display in a Report

You may not wish every column to display in a report. From any report you can click on the three dots menu at the top right and select the "Columns" option to change these. Choosing it will display the columns in the left-hand menu, where you can toggle them on and off. When you're done toggling columns on and off, simply click the cross to close the column options menu.

# Delete A Project

[Delete A Project](https://www.thecrawltool.com/delete-a-project)

## Delete A Project

If you want to create more space to have another project, an option is to delete a project. You may want to go through the reports and export your data first, as deleting a project will delete the data for it. From within a project, go to the three dots menu at the top right and select Delete Project.

# Editing a Project

[Editing a Project](https://www.thecrawltool.com/editing-a-project)

## Editing a Project

From within a project you can click on the three dots menu at the top right and choose "Edit Project". Here you can change the Name and URL. If you're changing the URL we suggest that you actually delete the project and set up a new one. But you can change it here if you insist.

# Exports

[Exports](https://www.thecrawltool.com/exports)

## Exporting Data

In any report with a data grid, you have the option to export the data so you can work with it in Excel or Google Sheets. If you have applied filters to the data then only the filtered rows (those visible on the screen) will be exported.

### Excel

To use the data in Excel, The Crawl Tool will export an xlsx file. Clicking on the "Export xlsx" button at the top will download an Excel xlsx file.

### Google Sheets

Initially you will not have a button to export to Google Sheets. This is because Google Sheets works differently and you need to link your Google Sheets account to The Crawl Tool. To do this, click your name at the top right and choose Settings from the little dropdown menu. This will take you to your personal settings screen. On that screen you will see a Connect Google button. Clicking this will start Google's process for confirming the permissions The Crawl Tool needs to create Google Sheets. Agree to the permissions and it will send you back to The Crawl Tool.

You will now have an "Export Google" button shown above reports. Clicking on this will export the report into your Google Drive/Sheets with the site URL and the report name. Should you ever wish to disconnect, you can return to the settings section, where the button now reads "Disconnect Google".

# Filters

[Filters](https://www.thecrawltool.com/filters)

## Filters

Filters in The Crawl Tool help you to narrow the data shown down to the data you're most interested in. They can be found at the top of any of the reports. Filters are additive.

### Status

By default the status filter is set to "All (except ignore)". Using this filter you can show items that are in a particular status. For example, we can show only rows with the To Do status.

### Assignee

Filters by an assignee.

### Filter text...

By typing in this text box, the rows will be limited to those that contain the text in one of their columns. For example, if we typed the word password into this box, only rows containing "password" in one of their columns would remain.

Additionally, there are some special filters you can type in. If you start the text with an exclamation mark (!) then it will show all rows that do not contain the text. e.g. !example will show every row that does not include the word example. If in the text box you write empty, followed by a colon (:), followed by a column name, then it will show every row where that column is empty. e.g. empty:Status will show every row with an empty Status column. The reverse is also possible by writing notempty. e.g. notempty:Status will show every row where the Status column is not empty.
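The same three patterns are easy to reproduce offline on an exported report. The sketch below is an illustration only, not The Crawl Tool's own code: it assumes a report exported to xlsx and loaded with pandas, and the file name, column names, and the `filter_rows` helper are made up for the example.

```python
# Hypothetical offline version of the "Filter text..." box: plain text means
# "the row contains the text in any column", a leading ! negates the match,
# and empty:Column / notempty:Column test a single named column.
import pandas as pd

def filter_rows(df: pd.DataFrame, expression: str) -> pd.DataFrame:
    expression = expression.strip()
    if expression.startswith("empty:"):
        column = expression[len("empty:"):]
        return df[df[column].isna() | (df[column].astype(str).str.strip() == "")]
    if expression.startswith("notempty:"):
        column = expression[len("notempty:"):]
        return df[df[column].notna() & (df[column].astype(str).str.strip() != "")]
    negate = expression.startswith("!")
    text = expression[1:] if negate else expression
    # True for every row that contains the text in at least one column.
    contains = df.apply(
        lambda row: row.astype(str).str.contains(text, case=False, regex=False).any(),
        axis=1,
    )
    return df[~contains] if negate else df[contains]

# Example usage with an exported report (placeholder file name):
# df = pd.read_excel("broken-links.xlsx")
# print(filter_rows(df, "!example"))      # rows that do not contain "example"
# print(filter_rows(df, "empty:Status"))  # rows with an empty Status column
```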
# Header Tags Report

[Header Tags Report](https://www.thecrawltool.com/header-tags-report)

## Header Tags

Header tags give search engines vital information about what a page is about. After the title tag, they are considered probably the most important. Header tags are defined by levels. Normally you want one H1 tag per page, with lower level tags nested underneath it - such as H2.

### Source URL

This is the URL on which the header tags appear.

### Title

This is the title tag of the page. It can be a good idea to target different keywords in your header tags than in your title tag, or you may want to duplicate them to strengthen the effect. The title is shown here so you can adjust to whichever style you prefer.

### H1

This is the contents of the first H1 tag on the page.

### H2

The contents of the first H2 on the page.

### H3

The contents of the first H3 on the page.

### Structure

The structure column looks through the page, finds all the header tags in order, and lists them out, such as H1 H2 H3 H2 H3 H4. This allows you to quickly see whether the tags follow a good structure and to optimize them.
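If you want to sanity-check a structure string like "H1 H2 H3 H2" outside the tool, it can be derived by listing a page's heading tags in document order. The snippet below is a rough sketch along those lines, not the crawler's own implementation; the `header_structure` helper and the example URL are hypothetical.

```python
# Illustration only: derive a header "structure" string (e.g. "H1 H2 H3 H2")
# for a single page by listing its heading tags in document order.
import requests
from bs4 import BeautifulSoup

def header_structure(url: str) -> str:
    html = requests.get(url, timeout=10).text
    soup = BeautifulSoup(html, "html.parser")
    # find_all with a list of tag names returns the headings in document order.
    headings = soup.find_all(["h1", "h2", "h3", "h4", "h5", "h6"])
    return " ".join(tag.name.upper() for tag in headings)

# Example (placeholder URL):
# print(header_structure("https://www.example.com/"))
# A well-structured page usually starts with a single H1, with H2/H3 nested beneath it.
```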
# Managing Access Keys

[Managing Access Keys](https://www.thecrawltool.com/managing-access-keys)

## Managing Access Keys

To manage your workload, you might wish to share reports for a project with someone else within your organization or with a third party. Access Keys enable this. Paid plans for The Crawl Tool come with a generous number of access keys to use.

### Creating an Access Key

Creating an Access Key is simple. Click "Add Access Key" on the side menu and you will be presented with the screen to add an access key. In the "Name" box you fill in the name for the key. This is the name that will be used to refer to this user in the system. For example, if you assign a task in a project to someone, this name will show as an option. Under scope you have two choices: "All Projects" or "Specific Projects". All Projects will automatically enable access to every project you have for this access key. If you set it to "Specific Projects" then you will be able to set which projects it can access (you can change this later). Clicking on the "Create Key" button will create the key and send you back to the timeline.

### Managing the Key

On the side menu under Access Keys you will now have a key listed with the name you gave it. If you click on it, it will show the details. At the top you will see a line of text like:

Unique link to give to others: https://www.thecrawltool.com/?access_key=mylvev0s7gauhqpp1zkiw5gxmp

You can give this link to others that you want to access your projects. Because of the access_key at the end, they will not need to log in. You should keep this link private between you and them.

### Managing Key Project Access

On the same screen you will see a list of checkboxes with all the projects you have. By checking/unchecking them and clicking Save changes you can adjust which projects the key has access to - for example, if you add a new project you would like it to access. Additionally there will be a Delete Key button - this will remove access for that key.

# Setting Due Dates and Notes

[Setting Due Dates and Notes](https://www.thecrawltool.com/setting-due-dates-and-notes)

## Setting Due Dates and Notes

Due dates and notes allow you to manage task timescales for yourself or others, and to add relevant notes to remind you later or inform someone else.

### Setting a Due Date

Clicking on a cell in the Due Date column will pop up the Date/Time picker. Set your date and time and click Save.

### Setting a Note

In a similar way, clicking on a cell in the Notes column will allow you to set a note for the row.

# The Broken Links Report

[The Broken Links Report](https://www.thecrawltool.com/the-broken-links-report)

## The Broken Links Report

The broken links report is probably one of the most essential reports. Broken links on a site are extremely poor for user experience, and a lost SEO opportunity.

### Source URL

The Source URL indicates the URL that the broken link originates from. This is the page you will want to change to fix it. If you click on a URL in this field it will open up the page. You can then find the link and repair it. If it isn't obvious where the link is, most browsers have a view-source option (normally CTRL-U), and by searching the source for the broken link you can often get more information about where it sits.

### Broken Link

Self explanatory! This is the destination URL that isn't working. These could be internal links or external links. Importantly, these are links that were not working at the time of the crawl. Again, if you click on them it will open up the link for you so you can double check. If it's an external site then it may have had trouble at the time, or, in rare cases, a site like LinkedIn always tells crawlers that the page isn't working(!) - you can safely change the status to "Ignore" for these and they won't show on future reports.

### Anchor

This is the content of the broken link - i.e. what is linked on the web page. This is just to aid you in finding the link on the source page to fix it.

# The Crawl Log

[The Crawl Log](https://www.thecrawltool.com/the-crawl-log)

## The Crawl Log

The Crawl Log report lists every URL that was crawled in the last crawl and the response code. It's so simple that we probably don't need to explain each column here. But it's deceptively powerful. Try typing a certain response code into the "Filter text..." filter box and you get an instant list of URLs with that response. Or type a part of the URL to check whether pages are crawlable.

# The Insecure Content Report

[The Insecure Content Report](https://www.thecrawltool.com/the-insecure-content-report)

## The Insecure Content Report

Using http:// content in an https:// website often causes mixed content errors. But even when it doesn't, having insecure content is bad, and pointing your users to insecure content elsewhere isn't great either.

### Source URL

The source URL is the URL that the insecure content is on or linked from.

### Insecure Content

This will contain the HTML of the insecure content. If this is an img tag then your users are likely also experiencing mixed content errors or warnings. If it is an a tag then it is something that has been linked to. In both cases you should find a secure https: alternative and change the reference to that instead.
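The kind of check this report automates can be approximated for a single page by scanning its HTML for http:// values in src and href attributes. The sketch below is an illustration under that assumption, not The Crawl Tool's implementation; the `insecure_references` helper and the example URL are placeholders.

```python
# Rough sketch: list http:// (non-HTTPS) references found in a page's
# src and href attributes, similar in spirit to the Insecure Content report.
import requests
from bs4 import BeautifulSoup

def insecure_references(page_url: str) -> list[str]:
    html = requests.get(page_url, timeout=10).text
    soup = BeautifulSoup(html, "html.parser")
    findings = []
    # Check any tag carrying a src attribute (img, script, iframe, ...) and
    # any tag carrying an href attribute (a, link, ...).
    for tag in soup.find_all(attrs={"src": True}) + soup.find_all(attrs={"href": True}):
        value = tag.get("src") or tag.get("href")
        if value and value.startswith("http://"):
            findings.append(f"<{tag.name}> -> {value}")
    return findings

# Example (placeholder URL):
# for item in insecure_references("https://www.example.com/"):
#     print(item)
```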
# The Internal Links Report

[The Internal Links Report](https://www.thecrawltool.com/the-internal-links-report)

## The Internal Links Report

The internal links report shows every internal link within the website and what its anchor text is. Try typing a keyword into the "Filter text..." box to filter down to a list of all links using that particular keyword.

### Source URL

The page on which the internal link originates.

### Internal Link

The URL being linked to.

### Anchor

The anchor used in the internal link. Try using the "Filter text..." box to search for keywords to see where they are being used. Or sort by "Anchor" by clicking the Anchor title on the grid, so that anchors cluster together, in order to see how much something is being used.

# The Linking Domains Report

[The Linking Domains Report](https://www.thecrawltool.com/the-linking-domains-report)

## The Linking Domains Report

Every 3 months the Common Crawl releases data about which domains link to other domains. This can be useful for considering why a website ranks the way it does and which domains/links may be important to that ranking. While The Crawl Tool is primarily about on-site SEO, we include this because of that utility.

### Source Domain

This is the domain that is linking to the website in the project.

### Domain Rank

The rank of the domain based on the PageRank calculation. 1 is the top-ranking domain; the higher the number, the less important the domain is considered.

### Domain PageRank

A calculation of PageRank based on which domains link to which domains.

### Want to know more?

You can also read our blog post about the linking domains report.

# The Meta Descriptions Report

[The Meta Descriptions Report](https://www.thecrawltool.com/the-meta-descriptions-report)

## The Meta Descriptions

Having a meta description tag in your page helps when a relevant snippet can't be found on the page. The meta description report helps you find where these are missing, duplicated, or where there are issues.

### Source URL

Like all reports, this is the URL of the page the description is found on.

### Description

This is the text extracted from the meta description on the page. It's handy to sort these, for example, to look for duplicates, or to use the "Filter text..." filter to look for keywords being used.

### Issues

This will list any issues the crawler found. Because length is quite subjective, this doesn't focus on things like length but on things like having two meta descriptions on a page.

# The Meta Open Graph Report

[The Meta Open Graph Report](https://www.thecrawltool.com/the-meta-open-graph-report)

## The Meta Open Graph Report

The meta open graph report helps by showing what meta open graph tags there are on each page.

### Source URL

The URL the open graph tags were found on.

### og:title

The og:title tag of the page. Like regular titles, you want this to be filled in and not a duplicate of others.

### og:description

The og:description of the page. It should give a good, unique description of your page.

### og:image

The og:image tag. The image that sites linking to this page should use. It should point to a good, representative image.

### og:url

The og:url tag will generally contain the same URL as "Source URL".

### og:type

The og:type tag contains the type of the content according to Open Graph. In most cases this will be "website" or "article".

### og:sitename

The og:sitename tag contains the site's name, so generally this will list the same thing in each row.

### og:locale

The og:locale tag contains the site's locale. Again, generally this will be the same thing in each row.

### og:image_alt

The og:image_alt tag holds the alt text for the og:image image. This should be filled in.

### Hint

A number of these columns are the same on every row. Once you know what the value is, you might want to hide those columns to make the screen easier to read. Click on the three dots menu at the top right, choose "Columns", and turn off any columns you don't want.
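For a quick spot check of a single page, the Open Graph tags described above can be pulled out with a few lines of Python. This is an illustration only, not how the crawler itself works; the `open_graph_tags` helper and the example URL are placeholders.

```python
# Illustration: extract the og:* meta tags from a single page.
import requests
from bs4 import BeautifulSoup

def open_graph_tags(url: str) -> dict[str, str]:
    html = requests.get(url, timeout=10).text
    soup = BeautifulSoup(html, "html.parser")
    tags = {}
    for meta in soup.find_all("meta"):
        prop = meta.get("property", "")
        if prop.startswith("og:"):
            tags[prop] = meta.get("content", "")
    return tags

# Example (placeholder URL):
# for name, value in open_graph_tags("https://www.example.com/").items():
#     print(name, "=", value)
```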
# The Meta Twitter Report

[The Meta Twitter Report](https://www.thecrawltool.com/the-meta-twitter-report)

## The Meta Twitter Report

The meta twitter report covers the twitter meta tags on pages. These are generally only used to determine how links to your pages show on Twitter, although some other sites may use them.

### Source URL

The page the twitter tags are found on.

### twitter:card

The twitter card type. Often it makes most sense for this to be 'summary_large_image'.

### twitter:title

The twitter title.

### twitter:description

The twitter description.

### twitter:image

The image to display on any card.

### twitter:image:alt

The alt text for the image.

### twitter:url

The URL of this page (generally the same as "Source URL").

### twitter:site

The site name.

### twitter:creator

Creator details.

# The Missing Alt Tags Report

[The Missing Alt Tags Report](https://www.thecrawltool.com/the-missing-alt-tags-report)

## The Missing Alt Tags Report

Alt tags on images are vital for screen readers and accessibility, and they provide useful metadata about images for search engine crawlers.

### Source URL

The URL the image is on.

### Src

This is the URL of the image being pointed to that has no alt tag.

# The Offsite Links Report

[The Offsite Links Report](https://www.thecrawltool.com/the-offsite-links-report)

## The Offsite Links Report

The offsite links report is the opposite of the internal links report. It shows all links that are leaving the site.

### Source URL

This is the page that the offsite link originates from. It's the page where you would want to look for it.

### Outbound Link

This is the target URL of the link - or in other words the URL of the offsite page.

### Anchor

This is the anchor used in the link. One idea is to try using the "Filter text..." box to find keywords that may often be in these links.

### Site Relevance

The theme similarity between the page being linked to and the crawled site. 1 means they are the same.

### Page Relevance

The theme similarity between the page being linked to and the page linking to it. 1 means they are the same.

# The Page Links Overview Report

[The Page Links Overview Report](https://www.thecrawltool.com/the-page-links-overview-report)

## The Page Links Overview Report

The page links overview report is similar to the other links reports, but in summary form. This allows you to quickly and easily get an idea of site and link structure.

### Source URL

The URL that links originate from or point to.

### Title

The title of the page.

### Inlinks / Inlinks Unique

This is the number of links found in the site crawl that point to this page. Another page may link to this page several times (this is often the case if it is in a main menu and then, for example, mentioned in the body). Inlinks counts all these links. Inlinks Unique will only count a maximum of one from each page.
### Outlinks / Outlinks Unique

Works exactly the same as Inlinks / Inlinks Unique, but for links from the page to other pages.

### Tips

You can sort columns by clicking on the header. If you sort Inlinks Unique to be descending, you get a good ranked overview of how important pages are considered internally on the site. Similarly, you could sort by Outlinks to find pages that are perhaps linking out too much.

# The Pages Linking Redirects Report

[The Pages Linking Redirects Report](https://www.thecrawltool.com/the-pages-linking-redirects-report)

## The Pages Linking Redirects Report

Often a quick win for user experience and site speed is to fix internal links that point to redirects. The pages linking redirects report helps with this.

### Source URL

This is the URL doing the linking. You can click on the cell and it will open the URL for you. This is the page where you'd want to fix the link.

### Target Redirect

This is the URL that the link points to but that redirects somewhere else. This is the URL you will find in the Source URL page's HTML code.

### Redirects To

This is where the Target Redirect actually goes. In the Source URL page you want to change the link from the Target Redirect URL to this URL instead.

### Anchor

The anchor text of the link, to help you find it.

# The Project Dashboard

[The Project Dashboard](https://www.thecrawltool.com/the-project-dashboard)

## The Project Dashboard

The project dashboard shows key information about your project/the site you are interested in. We consider this a report in itself, as it gives you a quick overview of the health of the most important aspects of the site. At the top we have crawl information such as the number of URLs crawled, when they were crawled, and what the site URL was. As sites often change over time, this is important contextual information.

### Response Codes

The Response Codes block shows the response codes the crawler received for each URL it crawled. Ideally you want this to show all 200 responses, which indicates the web server did not have any problems serving the web pages, the pages were found, and there weren't any redirects. In the real world, that's hard to achieve, and a great site will often show a couple of redirects but no 404s or 500s. This overview is a quick look; you can view the specific URLs and the response codes returned in the "Crawl Log" report.

### Due...

Within the reports the owner can assign issues either to themselves or to a third party; additionally they can set a due date on these items. The Due... block serves as an overview and shows how many items in the project fall due within specific amounts of time. This helps you manage your expected upcoming workload.

### Statuses

Like due dates, the owner and authorised third parties can assign statuses to items in the reports. The Statuses block gives information about how many of each type of status there is. This can help, for example, with knowing how many items still need discussion, or as a quick look at how much there is "To Do" in total.

### Unique Titles

The Unique Titles bar is a quick look at how many of your titles are unique or not. Your goal should be for every title to be unique and descriptive of the page it is on.

### Internal Redirect Links

This reports the number of times the site links to a redirect within itself. It is the number of links that a site owner could stop sending through redirects simply by adjusting each link to point to the final target itself. This makes the link quicker to navigate for a user.
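If you export the Crawl Log, the dashboard's response-code summary is easy to rebuild for your own analysis. The sketch below assumes an xlsx export with "URL" and "Response Code" columns; those column names and the file name are assumptions about the export layout rather than guaranteed values, so adjust them to match your file.

```python
# Sketch: rebuild the dashboard's response-code summary from an exported
# Crawl Log. File and column names are assumptions about the export layout.
import pandas as pd

crawl_log = pd.read_excel("crawl-log.xlsx")

# How many URLs returned each response code.
print(crawl_log["Response Code"].value_counts().sort_index())

# Anything that isn't a plain 200 is worth a closer look.
problems = crawl_log[crawl_log["Response Code"] != 200]
print(problems[["URL", "Response Code"]])
```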
# The Redirects Report

[The Redirects Report](https://www.thecrawltool.com/the-redirects-report)

## The Redirects Report

The Redirects Report can be viewed as a lightweight version of the "Pages Linking Redirects" report. It shows all redirects, where those redirects come from, and where they go to. It's useful for seeing whether a site has a lot of redirects, what type they are, and for looking for circular redirects.

### Source URL

This is the URL that is redirecting to something else.

### Type

The type of the redirect. 301 = permanent, 302 = temporary.

### Redirect URL

Where the URL redirects to. If this is the same as the Source URL then there is a circular redirect and there's a problem!

# The Site Information Report

[The Site Information Report](https://www.thecrawltool.com/the-site-information-report)

## The Site Information Report

During the crawl our SEO crawler tool gathers other related information that is useful when performing SEO tasks on your site. Here you can find, amongst other things, DNS records and SSL information. This enables you to easily see when an SSL certificate expires or whether a site is using a CDN such as Cloudflare. Additionally, because TXT records are commonly used for service verification, it is often possible to see what services a site uses.

### Type

The type of the information, e.g. DNS or SSL.

### Data1...Data2...Data3

Because the information can vary in type, these 3 fields hold further details, and their meaning or contents can vary per type. For a type of DNS, Data1 is the record type, Data2 is the contents, and Data3 holds TXT contents. For SSL, usually only the first two fields are used, with Data1 representing the SSL field name and Data2 representing the contents.

# The Theme Report

[The Theme Report](https://www.thecrawltool.com/the-theme-report)

## The Theme Report

The Crawl Tool Theme Report helps you keep your site focused on a theme by using Artificial Intelligence to assess the theme of each page and give a measure of how far each page is from the website's overall theme.

### Source URL

The URL of the page the theme relevance measure refers to.

### Title

The title of the page the theme relevance measure refers to.

### Relevance

How relevant the page's theme is to the overall site theme. 1.0 is completely relevant, 0.0 completely irrelevant. So higher is better.

# The Titles Report

[The Titles Report](https://www.thecrawltool.com/the-titles-report)

## The Titles Report

The titles report lists the titles of all the pages of the website, along with any issues found (such as two titles on a page).

### Source URL

The page on which the title is found.

### Title

The title of the page. Every page should have a unique title. You can try ordering the Title column by clicking on the Title header to look for duplicates, or use the "Filter text..." filter to look for keywords.

### Issue

Any issues found, such as having two title tags.

# Your First Crawl

[Your First Crawl](https://www.thecrawltool.com/your-first-crawl)

## Your First Crawl

Using The Crawl Tool is easy. Let's run through a crawl scenario to show you just how simple it is. Let's say we want to analyze The Crawl Tool website itself. The Crawl Tool can give you lots of information, but for our scenario let's say we're interested in knowing what the most important pages are. After signing up and logging in, you can click the Add Project link on the left menu. You'll see the Add Project screen.
For our purposes we'll write "The Crawl Tool" in the Project Name box, https://www.thecrawltool.com/ in the Site URL box, and click Create Project. You might want to put your own website details in there instead. After you click "Create Project", it'll send you to the project page. You haven't crawled anything yet, so this page will just give you some handy information about the crawl process. It's worth taking the time to read it.

Click on the "Start A Crawl" button at the top. Before long you'll notice some notifications come up at the top right of the screen. These notifications will change as the various stages of the crawl process complete. The stages are "in the crawl queue", "crawling", and "crawl completed". In the example we're doing there aren't a lot of pages to crawl, so it will crawl quickly. The tool is kind to the sites it crawls, so it doesn't overload them. Larger sites will therefore crawl more slowly. You don't have to wait around or keep this page open. The Crawl Tool will email you when the crawl is done. Here we've waited, and the crawl completed notification has arrived.

Because the project has now been crawled, the data has changed. If you followed the link from the email that says the crawl has finished, you'll see it straight away. Because we've stayed and watched the notifications for this one, we need to click "The Crawl Tool" in the menu on the left. This will take you to the project dashboard; we'll go over this in later sections of the guide. For our demonstration now, we're interested in the Page Links Overview report. You can get to that from the "Choose Report" dropdown button at the top left of the page.

This report gives us data about all internal and external links. The data grid works like a regular spreadsheet - so first I'm going to resize some columns so we can see the data we're interested in a bit better. We're interested in how many internal pages link to other internal pages; these are known as inlinks. Our theory is that the pages with the most inlinks are the most important for the website. But pages can link to other pages multiple times, and often do. So in this scenario we'll choose to use "Inlinks Unique", which only counts one link from each page to another.

Clicking on a column header sorts the data. So if we click "Inlinks Unique" twice it will sort from highest value to lowest value, putting the most important pages at the top. For us, the home page and blog page seem the most important. That seems right. There's not a lot of data here, but it serves as a good example. If you used your own site then you will probably have more.

If you followed along using our site, you might want to delete the crawl of our site and try your own. To do that, click on "The Crawl Tool" under Projects to go to the project dashboard. Then, from the three dots menu at the top right, choose "Delete Project". After that, start a new project and follow along from the top of this guide.
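The same "most important pages" view can also be produced from an exported copy of the Page Links Overview report, which is handy if you want to work with the data in a notebook. The sketch below assumes an xlsx export with "Source URL" and "Inlinks Unique" columns; the file and column names are assumptions about the export layout, so adjust them to whatever your export contains.

```python
# Sketch: rank pages by unique inlinks using an exported copy of the
# Page Links Overview report. File and column names are assumptions.
import pandas as pd

overview = pd.read_excel("page-links-overview.xlsx")

# Sort by unique inlinks, highest first, and show the ten most-linked pages.
top_pages = (
    overview.sort_values("Inlinks Unique", ascending=False)
            .head(10)[["Source URL", "Inlinks Unique"]]
)
print(top_pages)
```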