This queue is for tickets about the HTML-Tree CPAN distribution.

Report information
The Basics
Id: 18568
Status: resolved
Priority: 0/
Queue: HTML-Tree

People
Owner: Nobody in particular
Requestors: mjd [...] plover.com
Cc:
AdminCc:

Bug Information
Severity: Critical
Broken in: 3.13
Fixed in: 3.22



Subject: HTML::TreeBuilder mangles decimal number entities
    use HTML::TreeBuilder;
    my $TB = HTML::TreeBuilder->new();
    my $html = $TB->parse("This &#x17f;oftware has &#383;ome bugs")->eof->elementify();
    print $html->as_HTML("");

The content output from this program is not the same as the input. The input contains "&#17f" and "&#383". The output has erroneously translated this to "&amp;#17f" and "&amp;#383".
On Thu Apr 06 12:55:24 2006, guest wrote:
> The content output from this program is not the same as the input. The
> input contains "&#17f" and "&#383". The output has erroneously
> translated this to "&amp;#17f" and "&amp;#383".
There are two things going on here.

One is that HTML::TreeBuilder was erroneously re-encoding entities such as the one for ſ by escaping the &. This has been fixed in 3.22, which will be released on CPAN this weekend as part of the Chicago Hackathon.

The other, unfixable in HTML::TreeBuilder, is that HTML::Parser re-encodes both of the above to ſ instead of their original forms. Since HTML::TreeBuilder's parse method comes from HTML::Parser, this would have to be changed in the XS for HTML::Parser. However, I'm not convinced it's a bug, since they're the same entity when decoded.

Will mark as resolved when 3.22 hits CPAN.
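
To illustrate why the two forms are "the same entity when decoded", here is a minimal, dependency-free sketch. It assumes the two references in the report were the hexadecimal and decimal spellings of ſ (U+017F). The helper decode_numeric_refs is hypothetical and written only for this illustration; it is not part of HTML::Parser or HTML::TreeBuilder and does not reproduce their actual decoding code.

```perl
use strict;
use warnings;

# Hypothetical helper for illustration only (not an HTML::Parser API):
# decode hexadecimal and decimal numeric character references.
sub decode_numeric_refs {
    my ($text) = @_;
    $text =~ s/&#[xX]([0-9a-fA-F]+);/chr(hex($1))/ge;   # hex form, e.g. &#x17f;
    $text =~ s/&#([0-9]+);/chr($1)/ge;                  # decimal form, e.g. &#383;
    return $text;
}

my $hex     = decode_numeric_refs('&#x17f;');
my $decimal = decode_numeric_refs('&#383;');

# Both spellings name the same code point, so once decoded there is no
# way to tell which spelling the input used.
printf "hex -> U+%04X, decimal -> U+%04X, same: %s\n",
    ord($hex), ord($decimal), ($hex eq $decimal ? 'yes' : 'no');

# Escaping the ampersand of an already-encoded reference gives the kind of
# double-encoded output described in the report.
(my $escaped = '&#383;') =~ s/&/&amp;/g;
print "$escaped\n";
```

Under these assumptions, any serializer working from the decoded character has to pick one spelling on output, which is why preserving the original form would need a change at the parsing layer rather than in the tree code.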

