This queue is for tickets about the HTML-Parser CPAN distribution.

Report information
The Basics
Id: 55629
Status: open
Priority: 0/
Queue: HTML-Parser

People
Owner: Nobody in particular
Requestors: NIKOLAS [...] cpan.org
Cc:
AdminCc:

Bug Information
Severity: Important
Broken in: 3.64
Fixed in: (no value)



Subject: Wrong parse
HTML:
<iframe/**/src="http://mail.ru" name="poc iframe jacking" width="100%" height="100%" scrolling="auto" frameborder="no"></iframe>

$parser = HTML::Parser->new(
    api_version => 3,
    start_h => [ sub{
        my ($Self, $Text, $Tag, $Attr) = @_;
        print "Tag is: ".$Tag;
    }, "self, text, tagname, attr" ]
);
$parser->ignore_elements( qw( iframe ));
$parser->ignore_tags( qw( iframe ));

output:
Tag is: iframe/**/src="http://mail.ru"
Tue Mar 16 11:09:51 2010, NIKOLAS wrote:
> HTML:
> <iframe/**/src="http://mail.ru" name="poc iframe jacking" width="100%"
> height="100%" scrolling="auto" frameborder="no"></iframe>
>
> $parser = HTML::Parser->new(
>     api_version => 3,
>     start_h => [ sub{
>         my ($Self, $Text, $Tag, $Attr) = @_;
>         print "Tag is: ".$Tag;
>     }, "self, text, tagname, attr" ]
> );
> $parser->ignore_elements( qw( iframe ));
> $parser->ignore_tags( qw( iframe ));
>
> output:
> Tag is: iframe/**/src="http://mail.ru"

The same wrong parse occurs with this input:
HTML: <script/src="ya.ru">
I don't understand what rules you propose that HTML::Parser should follow to parse this kind of bogus HTML. You think it should treat "/**/" and "/" as whitespace?
Here are three regular expressions that, applied to the input text before parsing, correct these problems:

s{(/\*)}{ $1}g;
s{(\*/)}{$1 }g;
s{(<[^/\s<>]+)/}{$1 /}g;

You will probably find a more architecturally correct solution.
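For reference, here is a minimal sketch of what those three substitutions do to the two inputs from this ticket. The helper name normalize_tag_separators is hypothetical and not part of HTML::Parser. The sketch only rewrites the string; it does not show what the parser reports afterwards.

```perl
use strict;
use warnings;

# Hypothetical helper (not an HTML::Parser API): applies the three
# substitutions from this ticket to the raw input, in the same order.
sub normalize_tag_separators {
    my ($html) = @_;
    $html =~ s{(/\*)}{ $1}g;            # insert a space before every "/*"
    $html =~ s{(\*/)}{$1 }g;            # insert a space after every "*/"
    $html =~ s{(<[^/\s<>]+)/}{$1 /}g;   # insert a space between a tag name and a following "/"
    return $html;
}

# Shortened versions of the two inputs reported above.
for my $sample ('<iframe/**/src="http://mail.ru"></iframe>',
                '<script/src="ya.ru">') {
    print normalize_tag_separators($sample), "\n";
}
# prints:
# <iframe /**/ src="http://mail.ru"></iframe>
# <script /src="ya.ru">
```

Note that the first two substitutions are not tag-aware: they also add spaces around any "/*" or "*/" in text content, such as comments inside inline scripts or style sheets, so this is a workaround to apply with care, not a general fix.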

