In a recent LinkedIn update, Google Analyst Gary Illyes questioned a longstanding assumption regarding the location of robots.txt files.
Traditionally, it has been widely believed that a website’s robots.txt file must be placed at the root domain (e.g., example.com/robots.txt).
However, Illyes has clarified that this is not strictly mandatory and highlighted a lesser-known facet of the Robots Exclusion Protocol (REP).
Robots.txt File Flexibility
The robots.txt file does not have to be hosted solely at the root domain (example.com/robots.txt).
According to Illyes, it is acceptable to have two separate robots.txt files hosted on different domains: one on the primary website and another on a content delivery network (CDN).
Illyes suggests that websites can centralize their robots.txt rules on the CDN while still controlling crawling for their main site.
For example, a website might maintain two robots.txt files: one located at https://cdn.example.com/robots.txt and another at https://www.example.com/robots.txt.
With this approach, a site maintains a single, comprehensive robots.txt file on the CDN and redirects requests for the main domain’s robots.txt to that centralized file.
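As a rough illustration of that redirect, the sketch below serves the main domain’s /robots.txt path as a permanent redirect to the CDN-hosted copy. It uses only Python’s standard library and the placeholder domains from the example above; in practice this rule would normally live in the web server or CDN configuration rather than in application code.

```python
# Minimal sketch: redirect requests for the main domain's /robots.txt
# to the centralized copy hosted on the CDN (placeholder URL).
from http.server import BaseHTTPRequestHandler, HTTPServer

CDN_ROBOTS_URL = "https://cdn.example.com/robots.txt"  # assumed centralized file

class RobotsRedirectHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == "/robots.txt":
            # A 301 sends crawlers to the single robots.txt file on the CDN.
            self.send_response(301)
            self.send_header("Location", CDN_ROBOTS_URL)
            self.end_headers()
        else:
            self.send_error(404)

if __name__ == "__main__":
    HTTPServer(("", 8080), RobotsRedirectHandler).serve_forever()
```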
Illyes points out that crawlers compliant with RFC 9309 will follow the redirect and treat the target file as the robots.txt for the original domain.
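One way to see that behavior from the crawler’s side is to fetch the main domain’s robots.txt with a client that follows redirects, as RFC 9309-compliant crawlers do, and apply the resulting rules to the original domain. Below is a minimal sketch using Python’s standard urllib.robotparser, with the placeholder domains from the example above and a hypothetical user-agent name.

```python
# Minimal sketch: fetch the main domain's robots.txt, let the client
# follow the redirect to the CDN-hosted file, and evaluate the rules
# against URLs on the original domain.
from urllib import robotparser

parser = robotparser.RobotFileParser()
parser.set_url("https://www.example.com/robots.txt")  # placeholder domain
parser.read()  # urllib follows the redirect to the CDN-hosted file

# Rules read from the redirect target are applied to the original domain.
print(parser.can_fetch("MyCrawler", "https://www.example.com/some/page"))
```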
Looking Back At 30 Years Of Robots.txt
As the Robots Exclusion Protocol marks its 30th anniversary this year, Illyes’ disclosure underscores the ongoing evolution of web standards.
He raises questions about whether the file must retain its traditional name “robots.txt,” suggesting potential shifts in how crawl instructions could be handled in the future.
How This Can Help You
Following Illyes’ recommendations can offer several benefits:
- Centralized Control: By consolidating robots.txt rules in a single location, you can efficiently manage and update crawl directives across your entire web presence.
- Enhanced Uniformity: Having a unified source for robots.txt rules minimizes the chance of conflicting directives between your primary site and CDN.
- Increased Flexibility: This strategy enables more versatile configurations, which is particularly useful for websites with complex architectures or multiple subdomains and CDNs.
Adopting a streamlined approach to managing robots.txt files can improve both site administration and SEO efforts.
Original news from SearchEngineJournal