
This queue is for tickets about the HTML-Tree CPAN distribution.

Report information
The Basics
Id: 18571
Status: resolved
Priority: 0/
Queue: HTML-Tree

People
Owner: Nobody in particular
Requestors: mjd [...] plover.com
Cc:
AdminCc:

Bug Information
Severity: (no value)
Broken in: 3.1901
Fixed in: (no value)



Subject: More encoding bizarrities
    use HTML::TreeBuilder;
    use Data::Dumper;

    my $TB = HTML::TreeBuilder->new();
    my $html = $TB->parse("This &sim; is a twiddle")->eof->elementify();
    for my $pack qw(HTML::Parser HTML::TreeBuilder HTML::Element) {
      print qq{$pack: ${"$pack\::VERSION"}\n};
    }

    print $html->as_HTML("\x0");
    print $html->as_HTML("");

The output is:

    HTML::Parser: 3.51
    HTML::TreeBuilder: 3.13
    HTML::Element: 3.16
    Wide character in print at /tmp/tb2 line 10.
    <html><head></head><body>This \342? is a twiddle</body></html>
    <html><head></head><body>This &sim; is a twiddle</body></html>

This can't be right.
On Thu Apr 06 13:10:58 2006, guest wrote:
> <html><head></head><body>This \342? is a twiddle</body></html>
> <html><head></head><body>This &sim; is a twiddle</body></html>
>
> This can't be right.
The difference between the two is the length of the argument: "\0" is non-empty and "" is empty. With "\0", HTML::Entities escapes nothing (assuming you have no nulls in your document), while "" makes it fall back to its default list of things to escape, and &sim; is in that list, assuming a Perl of 5.7 or newer.

I'm not sure what I can do about this.
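
The behaviour described above can be reproduced against HTML::Entities directly, without building a tree. This is only a sketch: it assumes HTML::Entities is installed, uses U+223C as the character that &sim; decodes to, and the variable names are illustrative. On a reasonably recent HTML::Entities the first call should leave the string untouched and the second should turn the character back into &sim;, but the exact result depends on the installed version.

```perl
use strict;
use warnings;
use HTML::Entities qw(encode_entities);

# U+223C TILDE OPERATOR, the character that &sim; decodes to.
my $text = "This \x{223C} is a twiddle";

# A non-empty "unsafe characters" argument: only NUL would be escaped.
my $only_nulls = encode_entities($text, "\0");

# An empty argument: HTML::Entities falls back to its default set.
my $defaults = encode_entities($text, "");

# Avoid the "Wide character in print" warning from the report.
binmode STDOUT, ':encoding(UTF-8)';
print "$only_nulls\n";
print "$defaults\n";
```

The binmode call is what silences the "Wide character in print" warning seen in the original output; it is unrelated to which characters get escaped.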
Subject: Re: [rt.cpan.org #18571] More encoding bizarrities
Date: Sat, 11 Nov 2006 23:20:19 -0500
To: bug-HTML-Tree [...] rt.cpan.org
From: Mark Jason Dominus <mjd [...] plover.com>
> <URL: http://rt.cpan.org/Ticket/Display.html?id=18571 >
>
> On Thu Apr 06 13:10:58 2006, guest wrote:
> > <html><head></head><body>This \342? is a twiddle</body></html>
> > <html><head></head><body>This &sim; is a twiddle</body></html>
> >
> > This can't be right.
>
> The difference in the two is that "\0" and "" have differences in
> length, so HTML::Entities escapes nothing (assuming you have no nulls in
> your document) and "" uses HTML::Entities' default list of things to
> escape, of which &sim; is in that included list, assuming a Perl of 5.7
> or newer.
>
> I'm not sure what I can do about this.
What is the problem?  Why not something like this:

    my @escape_chars;
    if (defined $_[0]) {
      @escape_chars = split //, shift();
    } else {
      @escape_chars = @default_escape_chars;
    }

And now @escape_chars contains a list of characters to escape.  "" now results in an empty list of characters, as it should, but a missing or undefined argument results in the default, as documented.
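
The suggestion above can be tried in isolation. The following is a minimal, self-contained sketch, not the code that shipped in HTML::Element: the helper name escape_chars_for and the contents of @default_escape_chars are made up for illustration. It shows the three cases the ticket distinguishes: a missing argument, an empty string, and "\0".

```perl
use strict;
use warnings;

# Hypothetical default list, for illustration only.
my @default_escape_chars = ('<', '>', '&', '"');

# Hypothetical helper: decide which characters to escape based on
# whether the argument is defined, not on whether it is true or empty.
sub escape_chars_for {
    my ($entities) = @_;
    return @default_escape_chars unless defined $entities;
    return split //, $entities;    # "" yields an empty list
}

my @missing = escape_chars_for();        # no argument: defaults
my @empty   = escape_chars_for("");      # empty string: escape nothing
my @nul     = escape_chars_for("\0");    # one character: NUL only

printf "%d %d %d\n", scalar @missing, scalar @empty, scalar @nul;
# prints "4 0 1"
```

Testing with defined rather than truth or length is what separates "" from a missing argument, which is the distinction the report asks for.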
On Sat Nov 11 23:20:35 2006, mjd@plover.com wrote:
> What is the problem?
The problem was that I wasn't thinking about simply NOT calling the encoding functionality when nothing should be encoded, and was blaming the whole problem on HTML::Entities. I have updated as_HTML to do the Right Thing, as opposed to codifying the wrong thing. 3.23 is on its way to CPAN with this fix. Thank you very much.

