In May, Gary Illyes from Google was interviewed at the SERP Conf 2024 conference in Bulgaria, where he addressed the issue of crawled but not indexed pages. He highlighted several reasons behind this issue, which are valuable for diagnosing and resolving the problem.
Despite taking place in May, the interview received minimal coverage, and few have viewed the video. I only learned about it through a Facebook post by the insightful Olesia Korobka (@Giridja).
Even so, its insights remain relevant and useful.
“Crawled but not indexed” refers to a status shown in the Google Search Console Page Indexing report (labeled “Crawled – currently not indexed”), indicating that Google crawled a page but did not add it to its index.
During the live interview, the following question was posed:
“Could ‘crawled but not indexed’ occur because a page is too similar to other content already indexed? Is Google implying there’s sufficient similar content already, and thus your content isn’t unique enough?”
Google’s Search Console documentation doesn’t explicitly explain why Google might crawl a page without indexing it, so the question was both highly relevant and, until this interview, largely unanswered.
Gary Illyes acknowledged that one reason for pages being crawled but not indexed could indeed be the existence of similar content already indexed. However, he also noted that there are other factors at play.
He explained:
“Yeah, that could be one interpretation. ‘Crawled but not indexed’ ideally should be broken down into more specific categories, but it’s challenging due to the way our internal data is structured.
There are various reasons for this situation. Duplicate content elimination is one example where we crawl a page but decide not to index it because there’s already a version or a very similar version of that content in our index with stronger signals.
So, yes, it can involve multiple factors.”
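If you suspect this kind of deduplication is affecting your own pages, a rough first check is to compare how similar the visible text of two pages actually is. The snippet below is a minimal sketch, not a reflection of how Google measures duplication: it assumes the requests and beautifulsoup4 packages are installed, and the two URLs are placeholders for pages on your own site.

```python
# Rough near-duplicate check between two pages on your own site.
# Assumes the requests and beautifulsoup4 packages are installed;
# the URLs below are placeholders, not real pages.
import difflib

import requests
from bs4 import BeautifulSoup


def visible_text(url: str) -> str:
    """Fetch a page and return its visible text, stripped of markup."""
    html = requests.get(url, timeout=10).text
    soup = BeautifulSoup(html, "html.parser")
    for tag in soup(["script", "style", "noscript"]):
        tag.decompose()
    return " ".join(soup.get_text(separator=" ").split())


page_a = visible_text("https://example.com/topic-guide")
page_b = visible_text("https://example.com/topic-guide-2024")

# difflib gives a crude 0..1 similarity ratio; a very high value
# suggests the two pages may compete with each other as duplicates.
ratio = difflib.SequenceMatcher(None, page_a, page_b).ratio()
print(f"Text similarity: {ratio:.2%}")
```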
Gary then highlighted another reason Google might crawl a page but opt not to index it, suggesting it could relate to the overall quality of the site.
He elaborated further:
“And the overall quality of the site can play a significant role in how many ‘crawled but not indexed’ URLs you observe in Search Console. If there’s a high number of these URLs, it could indicate broader quality concerns.
I’ve noticed this frequently since February, where there’s been a sudden decision to index a large number of URLs on a site simply because our perception of the site has evolved.”
Gary then discussed additional factors contributing to URLs being crawled but not indexed, suggesting that changes in Google’s perception of the site or technical issues could be at play.
He elaborated:
“…When you notice that number increasing, it could indicate a shift in Google’s perception of the site.
Alternatively, there could be technical errors on the site where the same page is served for every URL, causing the number to rise.
So, there are multiple potential reasons for observing this phenomenon.”
Taken together, Gary’s comments point to several things to check when diagnosing why a page might be crawled but not indexed by Google: duplicate content elimination (a version of the content, or something very similar, already sits in the index with stronger signals), the overall quality of the site, a shift in how Google perceives the site, and technical errors such as the same page being served for every URL.
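That last failure mode is straightforward to spot-check yourself by fetching a handful of URLs and comparing the response bodies. The following is a minimal sketch, assuming the requests package is installed; the sample URLs are placeholders for pages on your own site.

```python
# Spot-check whether different URLs on a site return identical content,
# which can happen with misconfigured routing or soft-error handling.
# Assumes the requests package; the sample URLs are placeholders.
import hashlib

import requests

sample_urls = [
    "https://example.com/",
    "https://example.com/products/widget",
    "https://example.com/blog/some-post",
]

seen = {}
for url in sample_urls:
    body = requests.get(url, timeout=10).content
    digest = hashlib.sha256(body).hexdigest()
    seen.setdefault(digest, []).append(url)

for digest, urls in seen.items():
    if len(urls) > 1:
        print("Identical response served for:", ", ".join(urls))
```

If every URL in the sample hashes to the same digest, that matches the technical error Gary described, where the same page is served for every URL.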
While Illyes didn’t go into detail about what he meant by a version of the content with “stronger signals,” it seems likely he was referring to situations such as content syndication, where a site syndicates its content to another site and Google opts to rank the syndicated copy over the original publisher for that content.
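For syndicated content specifically, one quick check is whether the republished copy declares a rel=canonical link back to your original article; a missing or self-referencing canonical on the copy can make it harder for Google to consolidate signals to the original. The sketch below assumes the requests and beautifulsoup4 packages, and both URLs are placeholders.

```python
# Check whether a syndicated copy declares a rel=canonical link and,
# if so, whether it points back to the original article.
# Assumes requests and beautifulsoup4; both URLs are placeholders.
import requests
from bs4 import BeautifulSoup

original_url = "https://your-site.example/original-article"
syndicated_url = "https://partner.example/republished-article"

html = requests.get(syndicated_url, timeout=10).text
soup = BeautifulSoup(html, "html.parser")
canonical = soup.find("link", rel="canonical")

if canonical and canonical.get("href") == original_url:
    print("Syndicated copy canonicalizes to the original.")
elif canonical:
    print("Canonical points elsewhere:", canonical.get("href"))
else:
    print("No rel=canonical found on the syndicated copy.")
```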
Original news from SearchEngineJournal