This queue is for tickets about the HTML-HTML5-Parser CPAN distribution.

Report information
The Basics
Id: 79019
Status: resolved
Priority: 0/
Queue: HTML-HTML5-Parser

People
Owner: perl [...] toby.ink
Requestors: karavelov [...] mail.bg
Cc:
AdminCc:

Bug Information
Severity: Normal
Broken in: 0.206
Fixed in: (no value)



Subject: Failure mode of TagSoupParser
The parser dies when trying to parse broken xhtml with namespaced attributes. This is around line 2529. Putting the condition in 'eval' fixes the problem for me.
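As a rough illustration of the workaround described above, the sketch below wraps a failing call in a block `eval` so that the error is captured instead of aborting the whole parse. The actual code near line 2529 of TagSoupParser.pm is not shown in this ticket, so `set_attribute_ns` and `try_set_attribute_ns` are hypothetical stand-ins, and the condition that triggers the failure is an assumption made only for the demonstration.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the failing call inside TagSoupParser.pm.
# The real code is not shown in this ticket; the "no colon" condition
# is an assumption used only to trigger a failure.
sub set_attribute_ns {
    my ($name) = @_;
    die "NAMESPACE ERROR: Attribute without a prefix cannot be in a namespace\n"
        if $name !~ /:/;
    return "set $name";
}

# Run the call inside a block eval. On success return (result, undef);
# on failure return (undef, error message) instead of dying.
sub try_set_attribute_ns {
    my ($name) = @_;
    my $result;
    my $ok = eval { $result = set_attribute_ns($name); 1 };
    return $ok ? ($result, undef) : (undef, $@);
}

my ($res, $err) = try_set_attribute_ns('xmlns');
print defined $err ? "skipped: $err" : "ok: $res\n";
```

Catching the error this way lets parsing continue, but it hides why the attribute was rejected, so it only works around the symptom.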
On 2012-08-16T15:47:33+01:00, KARAVELOV wrote:
> The parser dies when trying to parse broken xhtml with namespaced
> attributes. This is around line 2529. Putting the condition in 'eval'
> fixes the problem for me.
Do you have an example document that triggers the failure? Can you attach it to this bug report?
Subject: RE: [rt.cpan.org #79019] Failure mode of TagSoupParser
Date: Sat, 18 Aug 2012 13:54:29 +0300
To: bug-HTML-HTML5-Parser [...] rt.cpan.org
From: karavelov [...] mail.bg
----- Quote from Toby Inkster via RT (bug-HTML-HTML5-Parser@rt.cpan.org), on 18.08.2012 at 09:54 -----
>> On 2012-08-16T15:47:33+01:00, KARAVELOV wrote:
>> The parser dies when trying to parse broken xhtml with namespaced
>> attributes. This is around line 2529. Putting the condition in 'eval'
>> fixes the problem for me.
>
> Do you have an example document that triggers the failure? Can you attach
> it to this bug report?
Here is my test case:

    perl -MURI -MHTML::HTML5::Parser -E '
        my $uri = URI->new("http://www.blitz.bg/news/article/151210");
        my $parser = HTML::HTML5::Parser->new;
        my $doc = $parser->parse_html_file($uri);'

And here is the error in TagSoupParser:

    NAMESPACE ERROR: Attribute without a prefix cannot be in a namespace at /usr/share/perl5/HTML/HTML5/Parser/TagSoupParser.pm line 2524

All the articles at www.blitz.bg are severely broken. The error is on the second line, "html xmlns:fb=....". Attached is a minimal test case document.

--
Luben Karavelov
Attachment: test.html (text/html, 53b; message body not shown inline)

Confirmed. I'll try to sort out a fix for this in the next few days. Your suggestion of wrapping the offending line in an eval is noted, but if possible I'd like to address the underlying cause.
On 2012-08-18T15:56:55+01:00, TOBYINK wrote:
> Confirmed. I'll try to sort out a fix for this in the next few days.
>
> Your suggestion of wrapping the offending line in an eval is noted, but
> if possible I'd like to address the underlying cause.
I've just uploaded a development release (0.207_01) to CPAN. It seems to work both for the minimal test document and for the blitz.bg page.

https://metacpan.org/release/TOBYINK/HTML-HTML5-Parser-0.207_01

If you have the time, please give it a try and let me know if it works for you. Assuming all is well, a stable 0.208 will be out in a few days.

