This queue is for tickets about the HTML-Tree CPAN distribution.

Report information

The Basics
Id: 18568
Status: resolved
Priority: Low/Low
Queue: HTML-Tree

People
Owner: Nobody in particular
Requestors: mjd [...] plover.com
Cc:
AdminCc:

BugTracker
Severity: Critical
Broken in: 3.13
Fixed in: 3.22

Subject: HTML::TreeBuilder mangles decimal number entities
use HTML::TreeBuilder;
my $TB = HTML::TreeBuilder->new();
my $html = $TB->parse("This &#x17f;oftware has &#383;ome bugs")->eof->elementify();
print $html->as_HTML("");

The content output from this program is not the same as the input. The input contains "&#17f" and "&#383". The output has erroneously translated this to "&amp;#17f" and "&amp;#383".

On Thu Apr 06 12:55:24 2006, guest wrote:
> The content output from this program is not the same as the input. The
> input contains "&#17f" and "&#383". The output has erroneously
> translated this to "&amp;#17f" and "&amp;#383".

There are two things going on here.

One is that HTML::TreeBuilder was erroneously re-encoding entities such as &#383; by escaping the &. This has been fixed in 3.22, which will be released on CPAN this weekend as part of the Chicago Hackathon.

The other, unfixable in HTML::TreeBuilder, is that HTML::Parser re-encodes both of the above to ſ instead of their original forms. Since HTML::TreeBuilder's parse method comes from HTML::Parser, this would have to be changed in the XS for HTML::Parser. However, I'm not convinced it's a bug, since they're the same entity when decoded.

Will mark as resolved when 3.22 hits CPAN.
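
To illustrate the two points in the reply, here is a minimal sketch in core Perl only. It is not the code from HTML::TreeBuilder or HTML::Parser: decode_numeric_refs and escape_ampersands are hypothetical helpers written for this example. It shows that the hexadecimal and decimal spellings decode to the same character, and that escaping the & of an already-encoded reference is the kind of step that would mangle it.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of HTML::Parser): decode numeric character
# references written as &#NNN; (decimal) or &#xHHH; (hexadecimal).
sub decode_numeric_refs {
    my ($text) = @_;
    $text =~ s/&#(?:[xX]([0-9a-fA-F]+)|([0-9]+));/chr(defined $1 ? hex($1) : $2)/ge;
    return $text;
}

# Hypothetical helper: naively escape every ampersand, including those
# that already start a character reference.
sub escape_ampersands {
    my ($text) = @_;
    $text =~ s/&/&amp;/g;
    return $text;
}

my $hex_form = '&#x17f;';
my $dec_form = '&#383;';

# Both spellings name code point U+017F (LATIN SMALL LETTER LONG S).
my $same = decode_numeric_refs($hex_form) eq decode_numeric_refs($dec_form);
printf "same character: %s\n", $same ? 'yes' : 'no';
printf "code point: %d\n", ord(decode_numeric_refs($hex_form));

# Escaping the ampersand of an existing reference turns it into literal text.
print escape_ampersands($dec_form), "\n";
```

Because both spellings decode to one character, a parser that stores only the decoded text cannot know which spelling the input used, which is consistent with the reply's view that the second behaviour is arguably not a bug.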

