
This queue is for tickets about the libwww-perl CPAN distribution.

Report information
The Basics
Id: 42420
Status: resolved
Priority: 0/
Queue: libwww-perl

People
Owner: Nobody in particular
Requestors: wiml [...] hhhh.org
Cc:
AdminCc:

Bug Information
Severity: (no value)
Broken in:
  • 5.822
  • 5.823
Fixed in: (no value)



Subject: WWW::RobotRules/LWP::RobotUA should not complain about Sitemap:
RobotRules emits a warning if a robots.txt file contains a Sitemap: line. Although RobotRules/RobotUA doesn't have much use for the sitemap, it shouldn't print warnings about a valid entry.

Example error message:

  "RobotRules <http://www.adobe.com/robots.txt>: Unexpected line: Sitemap: http://www.adobe.com/sitemap.xml"

Site maps: http://www.sitemaps.org/protocol.php#submit_robots

(See also ticket number 19539)
commit a07c032942af65e68c4483a4d6e9b95c53f69c53
Author: Gisle Aas <gisle@aas.no>
Date:   Wed Jan 14 22:51:07 2009 +0100

    Ignore Sitemap: lines in robots.txt [RT#42420]

diff --git a/lib/WWW/RobotRules.pm b/lib/WWW/RobotRules.pm
index 1ec7300..867952f 100644
--- a/lib/WWW/RobotRules.pm
+++ b/lib/WWW/RobotRules.pm
@@ -105,6 +105,9 @@ sub parse {
                 push(@anon_disallowed, $disallow);
             }
         }
+        elsif (/^\s*Sitemap\s*:/i) {
+            # ignore
+        }
         else {
             warn "RobotRules <$robot_txt_uri>: Unexpected line: $_\n" if $^W;
         }
diff --git a/t/robot/rules.t b/t/robot/rules.t
index 6125e85..26b1025 100644
--- a/t/robot/rules.t
+++ b/t/robot/rules.t
@@ -62,6 +62,8 @@ User-Agent: SvartEnke2
 Disallow: ftp://foo
 Disallow: http://foo:8080/
 Disallow: http://bar/
+
+Sitemap: http://www.adobe.com/sitemap.xml
 EOM
 
 my $content5 = <<EOM;
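The key point of the fix is branch order: a Sitemap: line is matched and silently skipped before it can fall through to the catch-all branch that emits "Unexpected line". The following is a minimal, self-contained sketch of that idea, assuming a simplified line classifier. The helper classify_line and the sample directive "Bogus-Directive" are hypothetical and are not part of WWW::RobotRules; the real parse() also strips comments, tracks User-Agent records and only warns when $^W is set, none of which is reproduced here.

```perl
use strict;
use warnings;

# Hypothetical, simplified classifier for a single robots.txt line.
# It is NOT the WWW::RobotRules code; it only mirrors the patched
# branch order, where Sitemap: is recognised before the catch-all.
sub classify_line {
    my ($line) = @_;
    return 'blank'      if $line =~ /^\s*$/;
    return 'comment'    if $line =~ /^\s*#/;
    return 'user-agent' if $line =~ /^\s*User-Agent\s*:/i;
    return 'disallow'   if $line =~ /^\s*Disallow\s*:/i;
    return 'sitemap'    if $line =~ /^\s*Sitemap\s*:/i;  # recognised, then ignored
    return 'unexpected';                                 # catch-all, like the old else branch
}

my @lines = (
    'User-Agent: *',
    'Disallow: /private/',
    'Sitemap: http://www.adobe.com/sitemap.xml',
    'Bogus-Directive: x',    # hypothetical unknown directive
);

# Collect would-be warnings instead of calling warn, so they can be inspected.
my @warnings;
for my $line (@lines) {
    push @warnings, "Unexpected line: $line"
        if classify_line($line) eq 'unexpected';
}
print "$_\n" for @warnings;
```

Only the hypothetical unknown directive is reported; the Sitemap: line no longer produces a warning, which is the behaviour the patch and the new test case in t/robot/rules.t aim for.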

