
This queue is for tickets about the HTML-Tree CPAN distribution.

Report information
The Basics
Id: 26436
Status: resolved
Priority: 0/
Queue: HTML-Tree

People
Owner: Jeff.Fearn [...] gmail.com
Requestors: eharrison [...] realestate.com.au
Cc:
AdminCc:

Bug Information
Severity: Important
Broken in: 3.23
Fixed in: (no value)



Subject: as_trimmed_text in HTML::Element does not trim &nbsp;
sub as_trimmed_text {
    my $text = shift->as_text(@_);
    $text =~ s/[\n\r\f\t ]+$//s;
    $text =~ s/^[\n\r\f\t ]+//s;
    $text =~ s/[\n\r\f\t ]+/ /g;
    return $text;
}

This fails to trim &nbsp; from $text, which is commonly used in HTML.

The following would resolve the problem:

sub as_trimmed_text {
    my $text = shift->as_text(@_);
    $text =~ s/[\n\r\f\t\xA0 ]+$//s;
    $text =~ s/^[\n\r\f\t\xA0 ]+//s;
    $text =~ s/[\n\r\f\t\xA0 ]+/ /g;
    return $text;
}
From: perl [...] cjmweb.net
On Mon Apr 16 22:41:10 2007, gzminiz wrote:
> sub as_trimmed_text {
> This fails to trim &nbsp; from $text which is commonly used in HTML
> The following would resolve the problem:

This behavior is as designed. U+00A0 (&nbsp;) is not considered whitespace in the HTML specification; see http://www.w3.org/TR/html4/struct/text.html#h-9.1

That said, it wouldn't hurt if this were mentioned in the docs for as_trimmed_text.
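To make the as-designed behavior concrete, here is a minimal standalone sketch. The helper name trim_html_ws is hypothetical; it applies the same three substitutions quoted in the original report to a plain string, rather than calling as_text on an element, so it only approximates what HTML::Element does.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of HTML::Element): the same three
# substitutions as the as_trimmed_text quoted in the report.
sub trim_html_ws {
    my ($text) = @_;
    $text =~ s/[\n\r\f\t ]+$//s;    # strip trailing whitespace
    $text =~ s/^[\n\r\f\t ]+//s;    # strip leading whitespace
    $text =~ s/[\n\r\f\t ]+/ /g;    # collapse internal runs to one space
    return $text;
}

# Ordinary whitespace is trimmed and collapsed.
my $plain = trim_html_ws("  foo \n\t bar  ");

# U+00A0 is not in the character class, so it survives at both ends;
# only the trailing ASCII space is removed.
my $nbsp = trim_html_ws("\xA0foo bar\xA0 ");

print "plain: [$plain]\n";
print "nbsp kept at start: ", ($nbsp =~ /^\xA0/ ? "yes" : "no"), "\n";
print "nbsp kept at end:   ", ($nbsp =~ /\xA0$/ ? "yes" : "no"), "\n";
```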
Updated docs to be clearer on what white space will be cleaned.
From: dma_k [...] mail.ru
On Fri Apr 20 02:31:18 2007, CJM wrote:
> This behavior is as designed. U+00A0 (&nbsp;) is not considered
> whitespace in the HTML specification; see
> http://www.w3.org/TR/html4/struct/text.html#h-9.1

Pity. Trimming it would be useful in many cases, and it is what API consumers expect. Could another helper be introduced that also trims non-breaking spaces? Alternatively, an additional option could be passed as an argument, e.g. as_trimmed_text('trim_nbsp' => 1).
Hi, what I did was add a parameter, extra_chars, that allows the user to supply a string of additional characters to be used in the regexes, e.g. to remove the encoded or un-encoded &nbsp;:

$h->as_trimmed_text(extra_chars => '&nbsp;\xA0');
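As a rough illustration of how such a parameter could work, the sketch below interpolates the extra_chars string into each character class. This is an assumption about the approach, not the HTML::Element source, and trim_with_extra is a hypothetical standalone helper that operates on a plain string.

```perl
use strict;
use warnings;

# Hypothetical helper, assumed design: extra_chars is interpolated
# verbatim into the character class of each substitution.
sub trim_with_extra {
    my ($text, %opt) = @_;
    my $extra = defined $opt{extra_chars} ? $opt{extra_chars} : '';
    $text =~ s/[\n\r\f\t${extra} ]+$//s;    # strip trailing characters
    $text =~ s/^[\n\r\f\t${extra} ]+//s;    # strip leading characters
    $text =~ s/[\n\r\f\t${extra} ]+/ /g;    # collapse internal runs
    return $text;
}

my $input = "\xA0 foo\xA0\xA0bar \xA0";

# Without the option, the non-breaking spaces are left untouched.
my $default = trim_with_extra($input);

# The single-quoted '\xA0' reaches the regex engine as an escape
# sequence, so U+00A0 joins the set of trimmed characters.
my $trimmed = trim_with_extra($input, extra_chars => '\xA0');

print "trimmed: [$trimmed]\n";
```

Two consequences of this design are worth noting: internal runs of U+00A0 are collapsed to a regular space along with other whitespace, and because the string lands inside a character class, every character in it is matched individually rather than as a sequence.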
Subject: 4.0 released
Hi,

HTML::Tree version 4.0 has been released, which includes a fix for this issue.

Cheers, Jeff.

