A Reddit user asked whether their “crawl budget” was being drained by a large number of 301 redirects that lead to 410 responses. In reply, Google’s John Mueller explained the likely reasons behind the lackluster crawling and clarified how crawl budgets work in general.
The concept of a crawl budget, widely embraced within the SEO community, was introduced to account for instances where certain websites aren’t crawled as extensively as desired. It suggests that each site is allocated a finite number of crawls, setting a limit on the extent of crawling it receives.
Knowing where the concept came from helps in understanding what it actually is. Google has consistently said there is no single thing called a “crawl budget,” even though Googlebot’s crawling behavior can make it look as if one exists.
Matt Cutts, a prominent Google engineer at the time, hinted at this complexity surrounding the crawl budget in a 2010 interview. He clarified that the notion of an indexation cap, often assumed by SEO practitioners, isn’t entirely accurate:
“Firstly, there isn’t really a concept of an indexation cap. Many believed that a domain would only have a certain number of pages indexed, but that’s not the case.
Furthermore, there isn’t a strict limit imposed on our crawling activities.”
In 2017, Google published a detailed explanation of crawl budget that consolidated the various crawling-related processes the SEO community had been lumping together under that term, offering considerably more clarity than the catch-all phrase ever did. (Search Engine Journal summarized the Google crawl budget document.)
The key points from Google’s documentation can be summarized as follows:

- Crawl budget combines a crawl capacity limit (how much Googlebot can fetch without overloading the server) with crawl demand (how much Google wants to crawl, driven largely by a URL’s popularity and how stale Google’s copy is).
- Server health matters: fast, error-free responses raise the capacity limit, while server errors and slow responses lower it.
- Large numbers of low-value URLs, such as faceted navigation, session identifiers, duplicate content, and soft-error pages, can soak up crawling that would otherwise go to pages that matter.
- Crawl budget is mainly a concern for very large sites or sites that generate pages automatically; most publishers don’t need to worry about it.
The Reddit user’s question is whether the creation of what they see as low-value pages is eating into Google’s crawl budget. Specifically, they describe a scenario in which a request for the non-secure (HTTP) URL of a page that no longer exists redirects to the secure (HTTPS) version of the same missing page, which then returns a 410 Gone response (indicating the page has been permanently removed).
Their inquiry is as follows:
“I’m attempting to stop Googlebot from crawling certain very old non-HTTPS URLs that have been crawled for six years. I’ve implemented a 410 response on the HTTPS side for these outdated URLs.
When Googlebot encounters one of these URLs, it first encounters a 301 redirect (from HTTP to HTTPS), followed by a 410 error.
Two questions: Is Googlebot content with this combination of 301 and 410 responses?
I’m facing ‘crawl budget’ issues, and I’m unsure if these two responses are contributing to Googlebot’s exhaustion.
Is the 410 response effective? In other words, should I skip the initial 301 redirect and return the 410 error directly?”
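For anyone facing a similar setup, the chain the crawler actually sees can be checked directly. The sketch below is a Python example using the requests library; the URL is a placeholder standing in for one of the retired pages, not something from the thread. In the scenario described it would print a 301 hop followed by a final 410.

```python
import requests

# Placeholder URL standing in for one of the retired non-HTTPS pages.
old_url = "http://example.com/retired-page"

# Follow redirects the way a crawler would and print every hop in the chain.
response = requests.get(old_url, allow_redirects=True, timeout=10)

for hop in response.history:
    print(hop.status_code, hop.url)        # e.g. 301 http://example.com/retired-page
print(response.status_code, response.url)  # e.g. 410 https://example.com/retired-page
```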
John Mueller from Google responded to the Reddit user’s query:
Regarding the combination of 301 redirects and 410 error responses, Mueller stated that it is acceptable.
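In other words, the extra hop is not a problem. Site owners who still prefer to answer with a single response can return the 410 before any HTTP-to-HTTPS redirect fires. The snippet below is a minimal sketch assuming a Flask application and a hypothetical list of retired paths; the thread does not say what server stack the site actually runs.

```python
from flask import Flask, abort, request

app = Flask(__name__)

# Hypothetical set of retired paths; in practice this might come from a config file or database.
RETIRED_PATHS = {"/retired-page", "/old-category/old-product"}

@app.before_request
def answer_gone_before_redirecting():
    # Return 410 Gone immediately, before any HTTP-to-HTTPS redirect logic runs,
    # so the crawler sees a single response instead of a 301 followed by a 410.
    if request.path in RETIRED_PATHS:
        abort(410)

@app.route("/")
def index():
    return "Live page"
```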
In terms of crawl budget, Mueller clarified that it primarily becomes a concern for exceptionally large websites. He directed the user to Google’s documentation on managing crawl budget for large sites. If the user is experiencing crawl budget issues despite not having a massive site, Mueller suggested that Google might simply not perceive much value in crawling more of the site’s content, emphasizing that it’s not necessarily a technical problem.
Mueller suggested that Google “probably” doesn’t see the benefit in crawling additional webpages. This implies that those webpages may require scrutiny to determine why Google deems them unworthy of crawling.
Some prevalent SEO strategies tend to result in the creation of low-value webpages lacking original content. For instance, a common approach involves analyzing top-ranked webpages to discern the factors contributing to their ranking, then replicating those elements to enhance one’s own pages.
While this approach seems logical, it does not create genuine value. If the choice is reduced to binary terms, where “zero” stands for what is already in the search results and “one” stands for originality, imitating existing content just produces more zeros: websites that offer nothing beyond what is already in the search engine results pages (SERPs).
Undoubtedly, technical issues like server health can impact crawl rates, among other factors. However, regarding the concept of crawl budget, Google has consistently stated that it primarily pertains to large-scale websites, rather than smaller to medium-sized ones.
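For readers who want to gauge how much of Googlebot’s activity a pattern like the redirect-then-gone chain is absorbing, server logs are the most direct evidence. The sketch below is a rough example that assumes an access log in the combined log format at a hypothetical path; it simply tallies Googlebot requests by the status code served.

```python
import re
from collections import Counter

# Hypothetical log location and combined-log-format layout; adjust to the actual server setup.
LOG_PATH = "access.log"
LINE_RE = re.compile(r'" (?P<status>\d{3}) .*"(?P<agent>[^"]*)"\s*$')

status_counts = Counter()
with open(LOG_PATH, encoding="utf-8", errors="replace") as log:
    for line in log:
        match = LINE_RE.search(line)
        # Count only Googlebot hits, grouped by the HTTP status code served.
        if match and "Googlebot" in match.group("agent"):
            status_counts[match.group("status")] += 1

# A large share of 301 and 410 responses shows how much of Googlebot's
# activity the redirect-then-gone URLs are taking up.
for status, count in status_counts.most_common():
    print(status, count)
```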
Original news from SearchEngineJournal