From yannick.simon@gmail.com Sun May 15 17:32:46 2011
Subject: WWW-RobotRules parsing rules of robots.txt like googlebot
Date: Sun, 15 May 2011 23:32:37 +0200
To: bug-WWW-RobotRules@rt.cpan.org
From: Yannick Simon <yannick.simon@gmail.com>
Hello,
Thank you for this great library, WWW-RobotRules.
The is_allowed function works fine for plain robots.txt rules.
However:
1 - Googlebot supports rules that contain the * wildcard character.
For instance, given the rule
Disallow: /path/*/10
Googlebot treats both of these paths as disallowed:
/path/sgsdfg/10
/path/sdfgsdfgzegz/10222D2
(See http://www.google.com/robots.txt for real examples.)
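To make the expected matching concrete, here is a minimal sketch in Python (used only for illustration; the module itself is Perl). It assumes that * matches any run of characters, including none, and that a rule matches as a prefix of the path, which is my understanding of how Googlebot behaves. The name rule_matches is hypothetical and is not part of WWW-RobotRules.

```python
import re


def rule_matches(pattern: str, path: str) -> bool:
    """Return True if a robots.txt path pattern matches the start of path.

    Assumption: '*' stands for any sequence of characters (possibly empty),
    and every other character in the pattern is taken literally.
    """
    # Escape each literal piece, then join the pieces with '.*' for each '*'.
    regex = ".*".join(re.escape(piece) for piece in pattern.split("*"))
    # re.match anchors at the start of the path only, giving prefix semantics.
    return re.match(regex, path, re.DOTALL) is not None
```

With this sketch, the rule /path/*/10 matches both /path/sgsdfg/10 and /path/sdfgsdfgzegz/10222D2, but not /path/10 or /other/x/10.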
2 - Googlebot supports the "Allow" directive.
It would be great if there were a second function alongside "is_allowed",
for instance "is_allowed_extended",
that behaves like Googlebot.
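A rough sketch of the behaviour such a function could have, again in Python purely for illustration. It assumes the precedence Google documents for its crawler: among all matching rules the one with the longest pattern wins, and "Allow" wins a tie. The rules argument (a list of directive/pattern pairs for one user-agent group) and the helper _matches are hypothetical; nothing here is existing WWW-RobotRules API.

```python
import re


def _matches(pattern, path):
    # Assumption: '*' matches any run of characters; the rule is a prefix match.
    regex = ".*".join(re.escape(piece) for piece in pattern.split("*"))
    return re.match(regex, path, re.DOTALL) is not None


def is_allowed_extended(rules, path):
    """Decide whether path may be fetched under Googlebot-style rules.

    rules is a list of (directive, pattern) pairs, e.g.
    [("Disallow", "/path/*/10"), ("Allow", "/path/public/10")].
    """
    best_len = -1
    allowed = True  # a path that no rule matches is allowed
    for directive, pattern in rules:
        kind = directive.strip().lower()
        if kind not in ("allow", "disallow"):
            continue  # ignore directives this sketch does not handle
        if not pattern or not _matches(pattern, path):
            continue  # an empty pattern matches nothing
        is_allow = kind == "allow"
        # The longest pattern wins; on equal length, Allow takes precedence.
        if len(pattern) > best_len or (len(pattern) == best_len and is_allow):
            best_len = len(pattern)
            allowed = is_allow
    return allowed
```

With the two example rules from the docstring, /path/sgsdfg/10 is disallowed, while /path/public/10 is allowed because the "Allow" pattern is longer than the "Disallow" pattern.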
If you don't have time, perhaps I could try to develop the
"is_allowed_extended" function myself? ;)
Thank you.
Regards,
Yannick