
This queue is for tickets about the HTML-Parser CPAN distribution.

Report information

The Basics
Id: 55629
Status: resolved
Priority: Low/Low
Queue:

People
Owner: Nobody in particular
Requestors: NIKOLAS [...] cpan.org
Cc:
AdminCc:

BugTracker
Severity: Important
Broken in: 3.64
Fixed in: (no value)



Subject: Wrong parse
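HTML:

    <iframe/**/src="http://mail.ru" name="poc iframe jacking" width="100%" height="100%" scrolling="auto" frameborder="no"></iframe>

Code:

    use strict;
    use warnings;
    use HTML::Parser;

    my $html = q{<iframe/**/src="http://mail.ru" name="poc iframe jacking" width="100%" height="100%" scrolling="auto" frameborder="no"></iframe>};

    my $parser = HTML::Parser->new(
        api_version => 3,
        start_h     => [ sub {
            my ($Self, $Text, $Tag, $Attr) = @_;
            print "Tag is: " . $Tag . "\n";
        }, "self, text, tagname, attr" ],
    );
    # Either call alone should keep the start handler from seeing <iframe>
    $parser->ignore_elements( qw( iframe ) );
    $parser->ignore_tags( qw( iframe ) );
    $parser->parse($html);
    $parser->eof;

Output:

    Tag is: iframe/**/src="http://mail.ru"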
On Tue Mar 16 11:09:51 2010, NIKOLAS wrote:
The same wrong parse happens with this HTML: <script/src="ya.ru">
I don't understand what rules you propose that HTML::Parser should follow to parse this kind of bogus HTML. Do you think it should treat "/**/" and "/" as whitespace?
Applying these three regular expressions to the input text corrects these problems:

    s{(/\*)}{ $1}g;
    s{(\*/)}{$1 }g;
    s{(<[^/\s<>]+)/}{$1 /}g;

You will probably find a more correct architectural solution.
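The three substitutions can be tried in isolation with a small sketch like the one below. It wraps them in a helper named `normalize_html`, which is a hypothetical name and not part of HTML::Parser, and it only shows the textual rewriting; it does not feed the result to HTML::Parser.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of HTML::Parser): applies the three
# substitutions from the report to raw HTML before it reaches a parser.
sub normalize_html {
    my ($html) = @_;
    $html =~ s{(/\*)}{ $1}g;            # insert a space before every "/*"
    $html =~ s{(\*/)}{$1 }g;            # insert a space after every "*/"
    $html =~ s{(<[^/\s<>]+)/}{$1 /}g;   # split "<name/" into "<name /"
    return $html;
}

my $iframe = q{<iframe/**/src="http://mail.ru"></iframe>};
my $script = q{<script/src="ya.ru">};

print normalize_html($iframe), "\n";   # <iframe /** /src="http://mail.ru"></iframe>
print normalize_html($script), "\n";   # <script /src="ya.ru">
```

Note that the substitutions act on the whole input, so any "/*" or "*/" inside script bodies, comments, or attribute values is rewritten too, and the iframe case becomes `<iframe /** /src=...>` rather than clean whitespace. Whether HTML::Parser then reports the tag name as plain `iframe` is not checked by this sketch.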

