
This queue is for tickets about the WWW-RobotRules CPAN distribution.

Report information
The Basics
Id: 68219
Status: new
Priority: 0/
Queue: WWW-RobotRules

People
Owner: Nobody in particular
Requestors: yannick.simon [...] gmail.com
Cc:
AdminCc:

Bug Information
Severity: (no value)
Broken in: (no value)
Fixed in: (no value)



Subject: WWW-RobotRules parsing rules of robots.txt like googlebot
Date: Sun, 15 May 2011 23:32:37 +0200
To: bug-WWW-RobotRules [...] rt.cpan.org
From: Yannick Simon <yannick.simon [...] gmail.com>
Hello,

Thank you for this great library, WWW-RobotRules. The is_allowed function works correctly for pure robots.txt rules. However, Googlebot supports two extensions:

1 - Googlebot accepts rules containing the * wildcard character, for instance:

Disallow: /path/*/10

For Googlebot, /path/sgsdfg/10 is disallowed, and /path/sdfgsdfgzegz/10222D2 is also disallowed (take a look at http://www.google.com/robots.txt for real examples).

2 - Googlebot supports the "Allow" directive.

It would be great if there were another function, for instance is_allowed_extended, that behaves like Googlebot. If you don't have time, perhaps I could try to develop the "is_allowed_extended" function? ;)

Thank you, regards
Yannick
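The proposed behavior could be sketched roughly as follows. This is a hypothetical illustration only, not part of WWW-RobotRules: the function names pattern_to_regex and is_allowed_extended are invented here, and it assumes Googlebot's documented semantics (patterns may contain * wildcards and an optional trailing $ anchor, and the longest matching pattern decides between Allow and Disallow).

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: convert a robots.txt path pattern, which may
# contain '*' wildcards and an optional trailing '$' anchor, into a
# Perl regex. Name and interface are illustrative, not from the module.
sub pattern_to_regex {
    my ($pattern) = @_;
    my $anchored = ($pattern =~ s/\$$//);    # trailing '$' anchors at end of URL
    # quotemeta each literal piece, rejoin with '.*' for each '*'
    my $re = join '.*', map { quotemeta } split /\*/, $pattern, -1;
    $re .= '$' if $anchored;
    return qr/^$re/;
}

# Hypothetical sketch of the proposed is_allowed_extended:
# @rules is a list of [ 'allow' | 'disallow', $pattern ] pairs;
# the longest matching pattern wins, as Googlebot does.
sub is_allowed_extended {
    my ($path, @rules) = @_;
    my ($best_len, $allowed) = (-1, 1);      # no matching rule => allowed
    for my $rule (@rules) {
        my ($type, $pattern) = @$rule;
        next unless $path =~ pattern_to_regex($pattern);
        if (length($pattern) > $best_len) {  # longer pattern takes precedence
            $best_len = length($pattern);
            $allowed  = ($type eq 'allow');
        }
    }
    return $allowed;
}
```

Under these assumptions, a Disallow: /path/*/10 rule would block both /path/sgsdfg/10 and /path/sdfgsdfgzegz/10222D2, while an Allow rule with a longer pattern would override a shorter Disallow, matching the examples above.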


This service is sponsored and maintained by Best Practical Solutions and runs on Perl.org infrastructure.

Please report any issues with rt.cpan.org to rt-cpan-admin@bestpractical.com.