In a recent LinkedIn update, Google Analyst Gary Illyes questioned a longstanding assumption regarding the location of robots.txt files.
It has long been assumed that a website’s robots.txt file must live at the root of its domain (e.g., example.com/robots.txt).
However, Illyes has clarified that this is not strictly mandatory and highlighted a lesser-known facet of the Robots Exclusion Protocol (REP).
Under the REP, the file does not have to be served exclusively from the site’s own domain (example.com/robots.txt).
According to Illyes, it is acceptable to have two separate robots.txt files hosted on different domains—one on the primary website and another on a content delivery network (CDN).
Illyes suggests that websites can centralize their crawl directives in a single robots.txt file hosted on the CDN while still controlling crawling of the main site.
For example, a website might maintain two robots.txt files: one located at https://cdn.example.com/robots.txt and another at https://www.example.com/robots.txt.
With this setup, the site maintains one unified robots.txt file on the CDN and redirects requests for the main domain’s robots.txt to that centralized file.
Illyes points out that crawlers compliant with RFC 9309 will follow the redirect and use the target file as the robots.txt for the original domain.
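To make the mechanics concrete, here is a minimal sketch of the redirect approach, written in Go for illustration and assuming a hypothetical cdn.example.com host; in practice the same effect is usually achieved with a redirect rule in the web server or CDN configuration rather than application code.

```go
package main

import (
	"log"
	"net/http"
)

func main() {
	// Hypothetical CDN location for the centralized robots.txt;
	// substitute your own CDN hostname.
	const cdnRobots = "https://cdn.example.com/robots.txt"

	// Redirect requests for the main domain's robots.txt to the CDN copy.
	// RFC 9309-compliant crawlers follow the redirect and apply the
	// target file's rules to the original domain.
	http.HandleFunc("/robots.txt", func(w http.ResponseWriter, r *http.Request) {
		http.Redirect(w, r, cdnRobots, http.StatusMovedPermanently)
	})

	log.Fatal(http.ListenAndServe(":8080", nil))
}
```

A permanent (301) redirect signals that the CDN copy is the canonical location, and RFC 9309 asks crawlers to follow at least five consecutive redirects when fetching robots.txt.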
As the Robots Exclusion Protocol marks its 30th anniversary this year, Illyes’ disclosure underscores the ongoing evolution of web standards.
He raises questions about whether the file must retain its traditional name “robots.txt,” suggesting potential shifts in how crawl instructions could be handled in the future.
Following Illyes’ recommendations can offer practical benefits: a streamlined, centralized approach to robots.txt files can simplify both site administration and SEO efforts.
Original news from SearchEngineJournal