This queue is for tickets about the HTML-Selector-XPath CPAN distribution.

Report information

The Basics
Id: 81735
Status: rejected
Priority: Low/Low

People
Owner: Nobody in particular
Requestors: parlay [...] yopmail.com
Cc:
AdminCc:

BugTracker
Severity: (no value)
Broken in: (no value)
Fixed in: (no value)



Subject: workaround bug in HTML::TreeBuilder::XPath
As reported here: https://rt.cpan.org/Public/Bug/Display.html?id=81722

HTML::TreeBuilder lowercases the attribute names in the resulting HTML tree, but HTML::TreeBuilder::XPath does not lowercase the attribute names in the selector. So if the user selects on the attribute name as it was cased in the original document, and that name was anything other than all-lowercase, the match fails.

The author of HTML::TreeBuilder::XPath apparently doesn't consider this a bug. He thinks it will suffice to document it in HTML::TreeBuilder, which isn't going to help somebody using a higher-level module like Web::Scraper.

Here is a test case demonstrating the issue in HTML::TreeBuilder::XPath:
https://rt.cpan.org/Ticket/Attachment/1149252/604410/d.pl

This can be worked around in HTML::Selector::XPath with the attached patch.
Subject: lcattr.diff
diff -Naur lib/HTML/Selector/XPath.pm /tmp/lib/HTML/Selector/XPath.pm
--- lib/HTML/Selector/XPath.pm	2012-10-01 17:18:02.000000000 +0000
+++ /tmp/lib/HTML/Selector/XPath.pm	2012-12-06 06:36:07.000000000 +0000
@@ -50,6 +50,8 @@
 
 sub convert_attribute_match {
     my ($left,$op,$right) = @_;
+    $left = lc $left;
+
     # negation (e.g. [input!="text"]) isn't implemented in CSS, but include it anyway:
     if ($op eq '!=') {
         "\@$left!='$right'";
@@ -166,7 +168,7 @@
                 push @parts, '*';
                 $tag_index = $#parts;
             };
-            push @parts, "[\@$1]";
+            push @parts, "[\@\L$1]";
         } elsif ($rule =~ $reg->{badattr}) {
             Carp::croak "Invalid attribute-value selector '$rule'";
         }
@@ -177,7 +179,7 @@
                 if ($sub_rule =~ s/$reg->{attr2}//) {
                     push @parts, "[not(", convert_attribute_match( $1, $2, $^N ), ")]";
                 } elsif ($sub_rule =~ s/$reg->{attr1}//) {
-                    push @parts, "[not(\@$1)]";
+                    push @parts, "[not(\@\L$1)]";
                 } elsif ($rule =~ $reg->{badattr}) {
                     Carp::croak "Invalid attribute-value selector '$rule'";
                 } else {
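The following is a minimal standalone Perl sketch of what the patch does at its three spots. It is not the module's actual code. The helpers attr_value_predicate, attr_exists_predicate and attr_absent_predicate are hypothetical. They only mimic the XPath fragments that convert_attribute_match and the [attr] / :not([attr]) branches would emit after the patch. The sketch relies on two core Perl features: lc, and the \L escape, which lowercases the rest of a double-quoted string (up to \E or the end of the string).

```perl
use strict;
use warnings;

# Hypothetical helpers: NOT part of HTML::Selector::XPath's API.
# They only illustrate the lowercasing added by lcattr.diff.

# Like the patched convert_attribute_match: lowercase the attribute
# name (but not the value) before building "@name='value'".
sub attr_value_predicate {
    my ($name, $value) = @_;
    $name = lc $name;    # same effect as `$left = lc $left;` in the patch
    return "\@${name}='${value}'";
}

# Like the patched [attr] branch: "\L" lowercases everything after it
# in the double-quoted string, i.e. the interpolated attribute name.
sub attr_exists_predicate {
    my ($name) = @_;
    return "[\@\L$name]";
}

# Like the patched :not([attr]) branch.
sub attr_absent_predicate {
    my ($name) = @_;
    return "[not(\@\L$name)]";
}

# HTML::TreeBuilder stores attribute names lowercased, so a selector
# written as [dataId] can only match once its name is lowercased too.
print attr_exists_predicate('dataId'), "\n";              # [@dataid]
print attr_absent_predicate('dataId'), "\n";              # [not(@dataid)]
print "[", attr_value_predicate('dataId', 'x'), "]\n";    # [@dataid='x']
```

Lowercasing unconditionally suits HTML, whose attribute names are case-insensitive. It would change the meaning of a selector applied to case-sensitive XML, however.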
Subject: Re: [rt.cpan.org #81735] workaround bug in HTML::TreeBuilder::XPath
Date: Wed, 5 Dec 2012 22:41:23 -0800
To: "bug-HTML-Selector-XPath@rt.cpan.org" <bug-HTML-Selector-XPath@rt.cpan.org>
From: Tatsuhiko Miyagawa <miyagawa@gmail.com>
use github https://github.com/miyagawa/HTML-Selector-XPath and make a pull request there.


On Wed, Dec 5, 2012 at 10:39 PM, parlay via RT <bug-HTML-Selector-XPath@rt.cpan.org> wrote:
Thu Dec 06 01:39:44 2012: Request 81735 was acted upon.
Transaction: Ticket created by parlay
       Queue: HTML-Selector-XPath
     Subject: workaround bug in HTML::TreeBuilder::XPath
   Broken in: (no value)
    Severity: (no value)
       Owner: Nobody
  Requestors: parlay@yopmail.com
      Status: new
 Ticket <URL: https://rt.cpan.org/Ticket/Display.html?id=81735 >


--
Tatsuhiko Miyagawa
From: parlay@yopmail.com
On Thu Dec 06 01:41:55 2012, miyagawa@gmail.com wrote:
> use github https://github.com/miyagawa/HTML-Selector-XPath and make a pull
> request there.
I'll copy this over there, but you should update the distribution metadata to point to GitHub issues as the bugtracker, because MetaCPAN and search.cpan.org link to RT, which is why I posted here.
As per discussion in https://github.com/miyagawa/HTML-Selector-XPath/issues/12 , this module is the wrong place to fix this.

