Subject: Handle <unclosed </tags
The other day, I received a spam e-mail with a text/html body part like this:

==============================================================
blah blah<br><br <a href=target=_blank>Go!</a><br><p>blah
==============================================================

My spam filter failed to extract the href URL from the message body because of the unclosed "<br" tag. Closing the tag lets HTML::Parser parse the URL correctly.

I noticed that the HTML::Parser documentation says:

«Unclosed start or end tags, e.g. "<tt<b>...</b</tt>" are not recognized.»

I don't understand the implication of this, however. Is it a conscious decision not to support unclosed tags, or has there just been no use case for a fix?

I tested how various browsers handle the HTML code from the spam message above.

At least the following do render the link despite the preceding broken "<br" tag:
Firefox 3, Konqueror from KDE 3.5.9, Safari 3 & 4, Mail.app

At least the following do NOT render the link:
IE 6, Opera 9.63

I'd appreciate it if an option could be added to HTML::Parser to recognize unclosed tags.
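
For reference, the workaround mentioned above (closing the broken tag before parsing) can be sketched as a small pre-processing pass. This is only an illustration, written in Python for brevity, and it is not part of HTML::Parser. The helper name `close_unclosed_tags` and the regular expression are assumptions made for this sketch. It treats any tag that runs into the next "<" without a ">" as unclosed.

```python
import re

# A "<" or "</" plus a tag name, then any characters that are neither "<"
# nor ">", immediately followed by another "<": the tag never got its ">".
# Hypothetical pattern for this sketch; it is not taken from HTML::Parser.
_UNCLOSED_TAG = re.compile(r"(</?[A-Za-z][^<>]*)(?=<)")


def close_unclosed_tags(html: str) -> str:
    """Append the missing ">" to tags that run straight into the next "<"."""
    return _UNCLOSED_TAG.sub(r"\1>", html)


if __name__ == "__main__":
    sample = "blah blah<br><br <a href=target=_blank>Go!</a><br><p>blah"
    print(close_unclosed_tags(sample))
    # prints: blah blah<br><br ><a href=target=_blank>Go!</a><br><p>blah
```

The same substitution should translate directly to a Perl `s///g` run on the body before it is handed to the parser. The sketch is deliberately naive. It would also rewrite a literal "<" inside a quoted attribute value or inside script content, and it does not handle a tag left unclosed at the very end of the input.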