Google Explains Reasons For Crawled Not Indexed

seo@optimus42.com

2 years ago

In May, Google's Gary Illyes was interviewed at SERP Conf 2024 in Bulgaria, where he addressed the issue of pages that are crawled but not indexed. He described several reasons behind it, which are useful for diagnosing and fixing the problem.

Although the interview took place in May, it received little coverage, and few people have watched the video. I only learned about it through a Facebook post by the insightful Olesia Korobka (@Giridja).

Even months later, the insights from the interview remain relevant and useful.

Reasons For Crawled – Currently Not Indexed

"Crawled – currently not indexed" is a status shown in the Page Indexing report of Google Search Console. It means Google crawled a page but did not add it to its index.

During the live interview, Illyes was asked:

“Could ‘crawled but not indexed’ occur because a page is too similar to other content already indexed? Is Google implying there’s sufficient similar content already, and thus your content isn’t unique enough?”

Google's Search Console documentation doesn't explain why Google might crawl a page without indexing it, which makes this question especially relevant.

Gary Illyes acknowledged that similar content already in the index could indeed be one reason a page is crawled but not indexed. However, he noted that other factors are also at play.

He explained:

“Yeah, that could be one interpretation. ‘Crawled but not indexed’ ideally should be broken down into more specific categories, but it’s challenging due to the way our internal data is structured.

There are various reasons for this situation. Duplicate content elimination is one example where we crawl a page but decide not to index it because there’s already a version or a very similar version of that content in our index with stronger signals.

So, yes, it can involve multiple factors.”

General Quality Of Site Can Impact Indexing

Gary then pointed to another reason Google might crawl pages on a site but choose not to index them: the overall quality of the site.

He added:

“And the overall quality of the site can play a significant role in how many ‘crawled but not indexed’ URLs you observe in Search Console. If there’s a high number of these URLs, it could indicate broader quality concerns.

I’ve noticed this frequently since February, where there’s been a sudden decision to index a large number of URLs on a site simply because our perception of the site has evolved.”

Other Reasons For Crawled Not Indexed

Gary then discussed other factors that can lead to URLs being crawled but not indexed, including a shift in how Google perceives the site and technical problems on the site itself.

He elaborated:

“…When you notice that number increasing, it could indicate a shift in Google’s perception of the site.

Alternatively, there could be technical errors on the site where the same page is served for every URL, causing the number to rise.

So, there are multiple potential reasons for observing this phenomenon.”
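For the technical error Illyes describes, where the same page is served for every URL, one rough self-check is to compare the response bodies of URLs you have crawled yourself. The following minimal Python sketch assumes you already have the HTML for each URL (it does not fetch anything); the function names, the example.com URLs, and the simple whitespace normalization are hypothetical choices for illustration, not anything Google has described.

```python
import hashlib
import re
from collections import defaultdict


def fingerprint(html: str) -> str:
    """Hash a page body after light normalization (assumption: whitespace
    between tags and runs of whitespace are not meaningful differences)."""
    normalized = re.sub(r">\s+<", "><", html)  # drop whitespace between tags
    normalized = re.sub(r"\s+", " ", normalized).strip()  # collapse the rest
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def find_duplicate_responses(pages: dict[str, str], min_urls: int = 2) -> list[list[str]]:
    """Group URLs whose bodies are identical after normalization.

    `pages` maps URL -> HTML body, e.g. from your own crawl (hypothetical
    input). Returns groups of at least `min_urls` URLs, largest first.
    """
    groups: dict[str, list[str]] = defaultdict(list)
    for url, html in pages.items():
        groups[fingerprint(html)].append(url)
    dupes = [sorted(urls) for urls in groups.values() if len(urls) >= min_urls]
    return sorted(dupes, key=len, reverse=True)


if __name__ == "__main__":
    # Simulated crawl: a misconfigured site returns its homepage for every path.
    home = "<html><body><h1>Welcome</h1></body></html>"
    crawl = {
        "https://example.com/": home,
        "https://example.com/blue-widgets": home,
        "https://example.com/red-widgets": "<html><body>\n<h1>Welcome</h1>\n</body></html>",
        "https://example.com/about": "<html><body><h1>About us</h1></body></html>",
    }
    for group in find_duplicate_responses(crawl):
        print(len(group), "URLs share one body:", group)
```

A large group of unrelated URLs sharing one body is worth investigating, but it only suggests this kind of technical error; it does not prove that this is why Google isn't indexing those pages.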

Takeaways

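Gary offered several possible reasons a webpage might be crawled but not indexed by Google:

Content that is similar to content already in Google's index
Duplicate content, where another version of the page with stronger signals is already indexed
Concerns about the overall quality of the site
A change in Google's perception of the site
Technical issues, such as the same page being served for every URL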

Illyes didn't explain what he meant by "stronger signals," but one likely scenario is a site syndicating its content to another site, with Google choosing to rank the site that republished the content instead of the original publisher.

Original news from SearchEngineJournal