This queue is for tickets about the HTML-Selector-XPath CPAN distribution.

Report information

The Basics
Id: 117127
Status: open
Priority: Low/Low

People
Owner: Nobody in particular
Requestors: kosmichal [...] gmail.com
Cc:
AdminCc:

BugTracker
Severity: (no value)
Broken in: (no value)
Fixed in: (no value)



Subject: selector_to_xpath wth css selector using :contains
Date: Mon, 22 Aug 2016 14:19:45 +0000
To: "bug-HTML-Selector-XPath@rt.cpan.org" <bug-HTML-Selector-XPath@rt.cpan.org>
From: Michal Kos <kosmichal@gmail.com>
Hi,
There seems to be a problem when converting CSS selectors to XPath if they use the :contains pseudo-class.
Consider the following HTML:

#####
<label ><span >Title</span>
                        <span >*</span>                  </label>
#####

Now, if I try the following CSS selector (say, in the Chrome console), "label:contains('Title')", it returns <label>...</label>.

If I use selector_to_xpath("label:contains('Title')"), it returns the following XPath:
//label[text()[contains(string(.),"Title")]]
Using this XPath gives an empty result, [].
However, if I modify the XPath and remove text(), so that it reads //label[contains(string(.),"Title")], it returns the same element as the CSS selector does.
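
To illustrate why the two predicates differ on this markup, here is a minimal Python sketch (standard library only, not the Perl module itself). It assumes the snippet parses as XML; the helper names own_text_nodes and string_value are hypothetical and stand in for what XPath text() and string(.) select on the <label> element.

```python
import xml.etree.ElementTree as ET

# The markup from the report, whitespace preserved.
HTML = """<label ><span >Title</span>
                        <span >*</span>                  </label>"""


def own_text_nodes(el):
    """Text nodes that are direct children of el (roughly what text() selects)."""
    nodes = [el.text] + [child.tail for child in el]
    return [t for t in nodes if t]


def string_value(el):
    """Concatenated text of el and all its descendants (what string(.) returns)."""
    return "".join(el.itertext())


label = ET.fromstring(HTML)

# Old predicate: [text()[contains(string(.),"Title")]]
# Only whitespace sits directly inside <label>, so no text node matches.
matches_old = any("Title" in t for t in own_text_nodes(label))

# Proposed predicate: [contains(string(.),"Title")]
# The string value includes the text of the nested <span>.
matches_new = "Title" in string_value(label)

print(matches_old, matches_new)
```

The direct text children of <label> are whitespace only, because "Title" belongs to the nested <span>. That is why the text() form selects nothing here while the string(.) form matches.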

The fix would be to edit HTML/Selector/XPath.pm lines 221 and 223 and change:
push @parts, qq{[text()[contains(string(.),"$1")]]};
to
push @parts, qq{[contains(string(.),"$1")]};

Could you please apply it?

 * Distribution name: HTML-Selector-XPath 0.20

 * Perl version:
$ perl -v

This is perl 5, version 20, subversion 2 (v5.20.2) built for x86_64-linux-gnu-thread-multi
(with 81 registered patches, see perl -V for more detail)

 * OS:
$ uname -a
Linux deb-vm 3.16.0-4-amd64 #1 SMP Debian 3.16.7-ckt25-2 (2016-04-08) x86_64 GNU/Linux

### test case ###
use strict;
use warnings;
use HTML::Selector::XPath 'selector_to_xpath';
use Test::More tests => 1;

my $css = "label:contains('Title')";
# Currently returns: //label[text()[contains(string(.),"Title")]]
my $xpath = selector_to_xpath($css);
# The module emits double quotes around the text, so expect them here too.
my $expected = '//label[contains(string(.),"Title")]';
is($xpath, $expected, "selector_to_xpath(label:contains('Title'))");
### test case ###
### html example ###
<!DOCTYPE html>
<html>
<head>
<title>
test
</title>
<script src="/resources/jquery-3.1.0.min.js"></script>
</head>
<body>



<label ><span >Title</span>
                        <span >*</span>                  </label>

</body>
</html>
### html example ###  

Thank you.
Subject: Re: [rt.cpan.org #117127] selector_to_xpath wth css selector using :contains
Date: Tue, 30 Aug 2016 21:16:02 +0200
To: bug-HTML-Selector-XPath@rt.cpan.org
From: Max Maischein <corion@corion.net>
Hello Michal, thank you very much for analyzing the issue and sending a patch.
> There seems to be a problem when converting css selectors to xpath if they
> use :contains pseudoclass.
I think part of the problem is that the :contains() selector was never really specified and is now deprecated. As implemented in HTML::Selector::XPath and its test suite, :contains only applies to the immediate node, not its child nodes.
> Consider following html:
>
> #####
> <label ><span >Title</span>
> <span >*</span> </label>
> #####
>
> now if I try following css selector (say in Chrome console)
> "label:contains('Title')" it returns <label>...</label>
Neither Firefox nor Chrome implements the :contains() selector natively. The jQuery documentation supports your use case of selecting all nodes which themselves, or whose children, contain a given text [1], but it also claims that the text may even span nodes, which your approach does not support...
> Could you please apply it?

I have to think about a backwards compatible way that allows the also very useful old way of only selecting nodes whose immediate text contains the search text, so users have a simple way to adapt their queries to the changed semantics.

Also, I have to see if/how jQuery supports text spanning across nodes and how it still matches that. My test case is the following HTML:

<a href="Other"><p>Yes No No</p></a>
<a href="No">Yes</a><a href="Yes">No</a>
<a href="No"><p>Yes</p></a><a href="Yes">No</a>

with this selector:

a:contains("YesNo")

And I expect the two following tags to match:

<a href="No">Yes</a><a href="Yes">No</a>
<a href="No"><p>Yes</p></a><a href="Yes">No</a>

Basically this addition to t/02_html.t

===
--- input
<a href="Other"><p>Yes No No</p></a>
<a href="No">Yes</a><a href="Yes">No</a>
<a href="No"><p>Yes</p></a><a href="Yes">No</a>
--- selector
a:contains("YesNo")
--- expected
<a href="No">Yes</a><a href="Yes">No</a>
<a href="No"><p>Yes</p></a><a href="Yes">No</a>

And currently, that fails and I'm not really sure how to fix it.

-max

[1] https://api.jquery.com/contains-selector/
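
The test case in this reply can be checked with a small Python sketch (standard library only; it does not run HTML::Selector::XPath or jQuery). It assumes the three input lines are wrapped in a hypothetical <div> root so they parse as XML, and the helper own_text_nodes is likewise hypothetical. It suggests that "YesNo" is found in no single <a> element under either reading of :contains and appears only when the text of adjacent siblings is concatenated.

```python
import xml.etree.ElementTree as ET

# Test input from the reply, wrapped in a hypothetical <div> root
# (an assumption, needed only so the fragment parses as XML).
DOC = """<div>
<a href="Other"><p>Yes No No</p></a>
<a href="No">Yes</a><a href="Yes">No</a>
<a href="No"><p>Yes</p></a><a href="Yes">No</a>
</div>"""

NEEDLE = "YesNo"


def own_text_nodes(el):
    """Text nodes that are direct children of el."""
    nodes = [el.text] + [child.tail for child in el]
    return [t for t in nodes if t]


root = ET.fromstring(DOC)
anchors = root.findall("a")

# Reading 1: only the immediate text nodes of each <a> are searched.
immediate_hits = [
    a for a in anchors if any(NEEDLE in t for t in own_text_nodes(a))
]

# Reading 2: the text of each <a> and all of its descendants is searched.
descendant_hits = [a for a in anchors if NEEDLE in "".join(a.itertext())]

# "YesNo" only shows up once sibling elements are joined together.
spans_siblings = NEEDLE in "".join(root.itertext())

print(len(anchors), len(immediate_hits), len(descendant_hits), spans_siblings)
```

Under these assumptions, a predicate evaluated per element cannot produce the expected matches, whichever of the two text tests it uses. Matching text that spans sibling nodes would need some way of looking at the neighbours of each <a> as well.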

