This queue is for tickets about the HTML-HTML5-Entities CPAN distribution.

Report information

The Basics
Id: 97659
Status: resolved
Priority: Low/Low

People
Owner: Nobody in particular
Requestors: pagenyon [...] gmail.com
Cc:
AdminCc:

BugTracker
Severity: (no value)
Broken in: (no value)
Fixed in: (no value)



Subject: doesn't actually work for most characters
test case attached demonstrating encoding and decoding don't work for majority of characters:

# Looks like you failed 3958 tests of 4250.

Subject: a.pl
#!/usr/bin/env perl
use strict;
use warnings;
use HTML::HTML5::Entities;
use Test::More;

while (my ($ent, $chr) = each %HTML::HTML5::Entities::entity2char) {
    next unless ';' eq substr $ent, -1, 1;
    $ent = "&$ent";
    is decode_entities($ent), $chr, "decoding entity";
    is encode_entities($chr), $ent, "encoding character";
}

done_testing;

On Thu Jul 31 17:50:15 2014, pagenyon wrote:
> test case attached demonstrating encoding and decoding don't work for
> majority of characters:
>
> # Looks like you failed 3958 tests of 4250.

So the encoding is working, it's just returning numeric entities instead of the named entities.

But the decoding is really broken. I attached a script that shows that the decode_entities from HTML::Entities can properly replace the entities using your %entity2char hash, whereas the decode_entities from your module doesn't have any effect.

Subject: b.pl
#!/usr/bin/env perl
use strict;
use warnings;
use HTML::Entities;
use HTML::HTML5::Entities ();
use Test::More;

*HTML::Entities::entity2char = \ %HTML::HTML5::Entities::entity2char;

while (my ($ent, $chr) = each %HTML::HTML5::Entities::entity2char) {
    next unless ';' eq substr $ent, -1, 1;
    $ent = "&$ent";
    is decode_entities($ent), $chr, "decoding entity";
}

done_testing;

On Sat Aug 02 02:13:26 2014, pagenyon wrote:
> On Thu Jul 31 17:50:15 2014, pagenyon wrote:
> > test case attached demonstrating encoding and decoding don't work for
> > majority of characters:
> >
> > # Looks like you failed 3958 tests of 4250.
>
> So the encoding is working, it's just returning numeric entities
> instead of the named entities.
>
> But the decoding is really broken. I attached a script that shows that
> the decode_entities from HTML::Entities can properly replace the
> entities using your %entity2char hash, whereas the decode_entities
> from your module doesn't have any effect.

The problem is in your regular expression. I have a patch for decode_entities, I didn't look at _decode_entities:

--- lib/HTML/HTML5/Entities.pm 2012-06-26 20:35:25.000000000 +0000
+++ /tmp/Entities.pm 2014-08-10 23:44:27.000000000 +0000
@@ -2526,7 +2526,7 @@
     for (@$array)
     {
         s/
-            (&
+            &(
                 (?:
                     \#(\d+) | \#[xX]([0-9a-fA-F]+) | (\w+)
                 )
@@ -2538,7 +2538,7 @@
             elsif (defined $3)
                 { chr(hex $3); }
             else
-                { $entity2char{$4} || $1; }
+                { $entity2char{"$4;"} || $1; }
         /xeg;
     }

On Sun Aug 10 19:51:39 2014, pagenyon wrote:

this

    { $entity2char{"$4;"} || $1; }

should actually be this:

    { $entity2char{"$4;"} || "&$1"; }
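
For anyone who wants to try the idea without installing the module, here is a minimal standalone sketch of what the patched substitution appears to do. It is not the module's code: the sub name decode_sketch and the three-entry hash are made up for illustration, and because the patch does not show the part of the regex that follows the alternation, the sketch simply assumes a required ";" terminator. What it does take from the ticket is that the hash keys carry the trailing semicolon, and that the "&" has to be put back by hand in the fallback once it sits outside the capture group.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical stand-in for %HTML::HTML5::Entities::entity2char.
# As in the ticket, the keys include the trailing semicolon.
my %entity2char = (
    'amp;' => '&',
    'lt;'  => '<',
    'gt;'  => '>',
);

# Illustrative helper, not part of any module.
sub decode_sketch {
    my ($text) = @_;
    $text =~ s{
        &(                              # $1: everything after the "&"
            (?:
                \#(\d+)                 # $2: decimal code point
              | \#[xX]([0-9a-fA-F]+)    # $3: hexadecimal code point
              | (\w+)                   # $4: entity name
            )
            ;                           # assumption: terminator is required
        )
    }{
        defined $2 ? chr($2)
      : defined $3 ? chr(hex $3)
      :              ($entity2char{"$4;"} || "&$1")   # unknown name: restore input
    }xeg;
    return $text;
}

print decode_sketch('&lt;b&gt; &amp; &#65; &bogus;'), "\n";   # prints "<b> & A &bogus;"
```

With the unpatched lookup key ($4 without the semicolon) every named entity in the sample would miss the hash and be left undecoded, which matches the behaviour reported above.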
Fixed in 0.004 I think.

