This queue is for tickets about the HTML-Parser CPAN distribution.

Report information

The Basics
Id: 47748
Status: resolved
Priority: Low/Low
Queue:

People
Owner: Nobody in particular
Requestors: JMEHNLE [...] cpan.org
Cc:
AdminCc:

BugTracker
Severity: Wishlist
Broken in: (no value)
Fixed in: (no value)



Subject: Handle <unclosed </tags
The other day, I received a spam e-mail with a text/html body part like this:

==============================================================
blah blah<br><br <a href=http://domain/path.html target=_blank>Go!</a><br><p>blah
==============================================================

My spam filter failed to parse the href URL from the message body because of the unclosed "<br" tag. Closing the tag lets HTML::Parser parse the URL correctly.

I noticed that http://search.cpan.org/dist/HTML-Parser/Parser.pm#BUGS says:

«Unclosed start or end tags, e.g. "<tt<b>...</b</tt>" are not recognized.»

I don't understand what this implies, however. Is it a conscious decision not to support unclosed tags, or has there just been no use case for a fix?

I tested how various browsers handle the HTML code from the spam message above.

At least the following do render the link despite the preceding broken "<br" tag: Firefox 3, Konqueror from KDE 3.5.9, Safari 3 & 4, Mail.app

At least the following do NOT render the link: IE 6, Opera 9.63

I'd appreciate it if an option could be added to HTML::Parser to recognize unclosed tags.
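
Until such an option exists, one possible stopgap is to repair the markup before handing it to the parser. The following is only a minimal sketch, written in Python for illustration; the function name `close_unclosed_tags` and the regular expression are assumptions made for this example, not part of HTML::Parser. It inserts the missing ">" whenever a tag is cut off by the next "<". The pattern uses only constructs that Perl regular expressions also support.

```python
import re

# Hypothetical helper, not an HTML::Parser feature.
# Matches "<" plus an optional "/" and a tag name, followed by any run of
# characters that are neither "<" nor ">", ending right before another "<".
# Such a tag was never closed, because a new tag starts before any ">".
_UNCLOSED_TAG = re.compile(r"<(/?[A-Za-z][^<>]*?)\s*(?=<)")


def close_unclosed_tags(html: str) -> str:
    """Return html with a '>' appended to every tag cut off by the next '<'."""
    # Trailing whitespace inside the broken tag is dropped by the match.
    return _UNCLOSED_TAG.sub(r"<\1>", html)


if __name__ == "__main__":
    spam = ("blah blah<br><br <a href=http://domain/path.html "
            "target=_blank>Go!</a><br><p>blah")
    print(close_unclosed_tags(spam))
    print(close_unclosed_tags("<tt<b>...</b</tt>"))
```

This heuristic has clear limits: it will also "close" a literal less-than sign followed by a letter in plain text (for example "x<y and y<z"), and it does not account for "<" inside quoted attribute values, so it suits a spam-filter pre-pass better than general-purpose HTML processing.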

