Report information
Id: 5472
Status: resolved
Left: 0 min
Priority: 0/0
Queue: HTML-Parser

Owner: Nobody
Requestors: html-parser [...] duffek.com
Cc:
AdminCc:

Severity: Normal
Broken in: 3.35
Fixed in: (no value)

History
#   Fri Feb 27 14:42:41 2004 guest - Ticket created  
Subject: Feature request + patch: return first attribute occurrence instead of last
[text/plain 812b]
Hi,

This is more of a behavior-change request than a bug report.

When the same HTML attribute is specified multiple times in a single element, Internet Explorer and Mozilla both honor the first occurrence, but HTML::Parser honors the last.

For example, if a spammer puts "<body background=white text=white text=black>random garbage<font color=black>advertisement</font></body>" in an HTML-formatted email message, most Windows users won't see the random garbage, because IE honors text=white, but my Perl-based anti-spam filter will, because HTML::Parser reports text=black.

The attached patch emulates IE/Mozilla behavior by storing the first rather than the last occurrence of each attribute in the hash passed as the "attr" argument to event handlers.

Incidentally, I didn't find any mention of this ambiguity in a quick scan of the HTML 4.01 specification.

Thanks!

Nick Duffek
html-parser[...]duffek.com
[text/x-patch 537b]
diff -r -u -p HTML-Parser-3.35.orig/hparser.c HTML-Parser-3.35/hparser.c
--- HTML-Parser-3.35.orig/hparser.c 2003-10-27 16:14:24.000000000 -0500
+++ HTML-Parser-3.35/hparser.c 2004-02-27 14:20:59.000000000 -0500
@@ -414,7 +414,8 @@ report_event(PSTATE* p_state,
 sv_lower(aTHX_ attrname);
 
 if (argcode == ARG_ATTR) {
- if (!hv_store_ent(hv, attrname, attrval, 0)) {
+ if (hv_exists_ent(hv, attrname, 0) ||
+ !hv_store_ent(hv, attrname, attrval, 0)) {
 SvREFCNT_dec(attrval);
 }
 SvREFCNT_dec(attrname);
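
The effect of the patch can be illustrated outside Perl with a small standalone C sketch. It is not HTML::Parser's code: it assumes a hypothetical fixed-size attribute table (struct attr_table, attr_find, attr_store, text_color are all illustrative names) standing in for the Perl hash built in report_event. With first_wins set, a name that is already present is skipped, mirroring the added hv_exists_ent check; without it, later values overwrite earlier ones, as the unconditional hv_store_ent call in 3.35 does. It replays the <body> attributes from the spam example above.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the Perl hash (HV) that HTML::Parser fills
   for the "attr" argument; a fixed-size array keeps the sketch small. */
#define MAX_ATTRS 16

struct attr { const char *name; const char *value; };

struct attr_table {
    struct attr items[MAX_ATTRS];
    int count;
};

/* Returns the index of NAME in the table, or -1 if it is absent. */
static int attr_find(const struct attr_table *t, const char *name)
{
    for (int i = 0; i < t->count; i++)
        if (strcmp(t->items[i].name, name) == 0)
            return i;
    return -1;
}

/* Stores NAME=VALUE. With first_wins set, an existing NAME is left alone
   (like the patch's hv_exists_ent test); otherwise the new VALUE
   overwrites the old one (like an unconditional hv_store_ent). */
static void attr_store(struct attr_table *t, const char *name,
                       const char *value, int first_wins)
{
    int i = attr_find(t, name);
    if (i >= 0) {
        if (!first_wins)
            t->items[i].value = value;
        return;
    }
    if (t->count < MAX_ATTRS) {     /* sketch: silently drop overflow */
        t->items[t->count].name = name;
        t->items[t->count].value = value;
        t->count++;
    }
}

/* Attributes of <body background=white text=white text=black>,
   already lowercased, in source order. */
static const struct attr body_attrs[] = {
    { "background", "white" },
    { "text", "white" },
    { "text", "black" },
};

/* Builds the table under the chosen policy and returns the "text" value.
   The returned pointer refers to a string literal, so it stays valid. */
const char *text_color(int first_wins)
{
    struct attr_table t = { .count = 0 };
    size_t n = sizeof body_attrs / sizeof body_attrs[0];

    for (size_t k = 0; k < n; k++)
        attr_store(&t, body_attrs[k].name, body_attrs[k].value, first_wins);

    int i = attr_find(&t, "text");
    return i >= 0 ? t.items[i].value : NULL;
}
```

Calling text_color(0) yields "black" (last occurrence wins, the 3.35 behavior), while text_color(1) yields "white" (first occurrence wins, the IE/Mozilla behavior described in the report).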

#   Thu Apr 01 07:00:09 2004 GAAS - Status changed from 'new' to 'resolved'