On Sat Aug 02 02:13:26 2014, pagenyon wrote:
> On Thu Jul 31 17:50:15 2014, pagenyon wrote:
> > test case attached demonstrating encoding and decoding don't work for
> > majority of characters:
> >
> > # Looks like you failed 3958 tests of 4250.
>
> So the encoding is working, it's just returning numeric entities
> instead of the named entities.
>
> But the decoding is really broken. I attached a script that shows that
> the decode_entities from HTML::Entities can properly replace the
> entities using your %entity2char hash, whereas the decode_entities
> from your module doesn't have any effect.
The problem is in your regular expression. Here is a patch for decode_entities; I didn't look at _decode_entities:
--- lib/HTML/HTML5/Entities.pm 2012-06-26 20:35:25.000000000 +0000
+++ /tmp/Entities.pm 2014-08-10 23:44:27.000000000 +0000
@@ -2526,7 +2526,7 @@
for (@$array)
{
s/
- (&
+ &(
(?:
\#(\d+) | \#[xX]([0-9a-fA-F]+) | (\w+)
)
@@ -2538,7 +2538,7 @@
elsif (defined $3)
{ chr(hex $3); }
else
- { $entity2char{$4} || $1; }
+ { $entity2char{"$4;"} || $1; }
/xeg;
}
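
For anyone who wants to try the lookup change outside the module, here is a rough standalone sketch. It is not the module's code: the three-entry %entity2char table and the decode_sketch name are made up for illustration, the "name;" key shape is only what the patch above implies, and the sketch assumes every entity ends in a semicolon. It keeps the leading "&" inside the first capture group so that unknown entities are passed through unchanged.

```perl
use strict;
use warnings;

# Hypothetical miniature table. The trailing ";" in the keys is an
# assumption inferred from the patch, not copied from the module.
my %entity2char = (
    'amp;' => '&',
    'lt;'  => '<',
    'gt;'  => '>',
);

# Illustrative helper (not part of HTML::HTML5::Entities).
sub decode_sketch {
    my ($text) = @_;
    $text =~ s{
        (&
            (?:
                \#(\d+) | \#[xX]([0-9a-fA-F]+) | (\w+)
            )
        ;)
    }{
        defined $2                   ? chr($2)
      : defined $3                   ? chr(hex $3)
      : exists $entity2char{"$4;"}   ? $entity2char{"$4;"}
      :                                $1
    }xeg;
    return $text;
}

print decode_sketch('1 &lt; 2 &amp;&#38;&#x26; &bogus;'), "\n";
```

Looking up the bare name ($entity2char{$4}) never matches a table keyed this way, so the replacement falls back to the original text, which would explain decoding appearing to have no effect.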