This queue is for tickets about the libwww-perl CPAN distribution.

Report information
The Basics
Id: 19539
Status: resolved
Priority: 0
Queue: libwww-perl

People
Owner: Nobody in particular
Requestors: imacat [...] mail.imacat.idv.tw
Cc:
AdminCc:

Bug Information
Severity: Important
Broken in:
  • 5.805
  • 5.822
  • 5.823
Fixed in: (no value)



Subject: WWW::RobotRules/LWP::RobotUA Does Not Respect Crawl-delay:
Hi. This is imacat from Taiwan. I was trying LWP::RobotUA and found that WWW::RobotRules does not respect Crawl-delay:. The test script (an exact copy from WWW::RobotRules's POD) is:

==========
#! /usr/bin/perl -w
use WWW::RobotRules;
my $rules = WWW::RobotRules->new('MOMspider/1.0');

use LWP::Simple qw(get);

my $url = "http://sourceforge.net/robots.txt";
my $robots_txt = get $url;
$rules->parse($url, $robots_txt) if defined $robots_txt;
==========

The result I got is:

==========
imacat@rinse ~/tmp % ./test.pl
RobotRules <http://sourceforge.net/robots.txt>: Unexpected line: Crawl-delay: 10
RobotRules <http://sourceforge.net/robots.txt>: Unexpected line: Crawl-delay: 2
RobotRules <http://sourceforge.net/robots.txt>: Unexpected line: Crawl-delay: 2
imacat@rinse ~/tmp %
==========

Crawl-delay: is a widely used directive, obeyed by Yahoo, MSN, and many other robots. A robot built on LWP::RobotUA emits this warning every time it meets such a robots.txt, which makes LWP::RobotUA hard to use in practice. Besides, if a website specifies Crawl-delay:, LWP::RobotUA should respect it instead of its own $ua->delay(). Could you look into this and fix it soon? Thank you.
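In the meantime, one possible workaround is to pull Crawl-delay: out of robots.txt by hand and feed it to LWP::RobotUA's delay(), which is documented to take minutes while Crawl-delay: is given in seconds. A minimal sketch follows; the regular expression and the example From address are my own assumptions, and the naive pattern ignores per-User-agent: sections, so this illustrates the idea rather than fixing WWW::RobotRules itself:

==========
#!/usr/bin/perl -w
use strict;
use LWP::RobotUA;
use LWP::Simple qw(get);

# Workaround sketch: parse Crawl-delay: ourselves, since
# WWW::RobotRules currently warns about it and discards it.
my $ua = LWP::RobotUA->new('MOMspider/1.0', 'momspider@example.com');

my $robots_txt = get('http://sourceforge.net/robots.txt');
if (defined $robots_txt and $robots_txt =~ /^Crawl-delay:\s*(\d+)/mi) {
    # Crawl-delay: is in seconds; LWP::RobotUA's delay() takes minutes.
    $ua->delay($1 / 60);
}

my $response = $ua->get('http://sourceforge.net/');
print $response->status_line, "\n";
==========

Note that this picks up the first Crawl-delay: in the file regardless of which User-agent: section it belongs to; a proper fix belongs in WWW::RobotRules's parsing.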

