This queue is for tickets about the HTML-Parser CPAN distribution.

Report information

The Basics
Id: 5472
Status: resolved
Priority: Low/Low
Queue:

People
Owner: Nobody in particular
Requestors: html-parser [...] duffek.com
Cc:
AdminCc:

BugTracker
Severity: Normal
Broken in: 3.35
Fixed in: (no value)

Attachments


Subject: Feature request + patch: return first attribute occurrence instead of last
Hi,

This is more of a behavior change request than a bug report.

When the same HTML attribute is specified multiple times in a single element, Internet Explorer and Mozilla both honor the first occurrence, but HTML::Parser honors the last. For example, if a spammer specifies

<body background=white text=white text=black>random garbage<font color=black>advertisement</font></body>

in an HTML-formatted email message, most Windows users won't see the random garbage (the browser renders it white on white), but my Perl-based anti-spam filter will, because it sees text=black.

The attached patch emulates IE/Mozilla behavior by storing the first rather than the last occurrence of an attribute in the hash passed as the "attr" argument to event handlers.

Incidentally, I didn't find any mention of this ambiguity in a quick scan of the HTML 4.01 spec.

Thanks!

Nick Duffek
html-parser@duffek.com
diff -r -u -p HTML-Parser-3.35.orig/hparser.c HTML-Parser-3.35/hparser.c
--- HTML-Parser-3.35.orig/hparser.c 2003-10-27 16:14:24.000000000 -0500
+++ HTML-Parser-3.35/hparser.c 2004-02-27 14:20:59.000000000 -0500
@@ -414,7 +414,8 @@ report_event(PSTATE* p_state,
                         sv_lower(aTHX_ attrname);
 
                     if (argcode == ARG_ATTR) {
-                        if (!hv_store_ent(hv, attrname, attrval, 0)) {
+                        if (hv_exists_ent(hv, attrname, 0) ||
+                            !hv_store_ent(hv, attrname, attrval, 0)) {
                             SvREFCNT_dec(attrval);
                         }
                         SvREFCNT_dec(attrname);
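
To illustrate the difference the patch makes, here is a minimal, self-contained sketch in plain C. It does not use the Perl API: `struct attr_table`, `attr_store_last`, `attr_store_first`, and `attr_get` are hypothetical stand-ins for the Perl hash and for the `hv_store_ent` / `hv_exists_ent` calls in the patch. The sketch only models the "skip the store when the key already exists" idea and assumes a small fixed-size table.

```c
#include <stddef.h>
#include <string.h>

#define MAX_ATTRS 16

/* Hypothetical stand-in for the Perl hash that hparser.c fills. */
struct attr_table {
    const char *name[MAX_ATTRS];
    const char *value[MAX_ATTRS];
    size_t count;
};

/* Return the slot index holding `name`, or -1 if it is absent. */
static int attr_find(const struct attr_table *t, const char *name)
{
    for (size_t i = 0; i < t->count; i++) {
        if (strcmp(t->name[i], name) == 0)
            return (int)i;
    }
    return -1;
}

/* Last occurrence wins: an existing entry is overwritten.
 * This models the unpatched behaviour. Returns 1 if stored, 0 if full. */
static int attr_store_last(struct attr_table *t,
                           const char *name, const char *value)
{
    int i = attr_find(t, name);
    if (i >= 0) {
        t->value[i] = value;
        return 1;
    }
    if (t->count == MAX_ATTRS)
        return 0;
    t->name[t->count] = name;
    t->value[t->count] = value;
    t->count++;
    return 1;
}

/* First occurrence wins: a duplicate name is ignored.
 * This models the patched behaviour. Returns 1 if stored, 0 otherwise. */
static int attr_store_first(struct attr_table *t,
                            const char *name, const char *value)
{
    if (attr_find(t, name) >= 0)
        return 0; /* duplicate: keep the value already stored */
    return attr_store_last(t, name, value);
}

/* Look up a value by name; NULL if the attribute was never stored. */
static const char *attr_get(const struct attr_table *t, const char *name)
{
    int i = attr_find(t, name);
    return i >= 0 ? t->value[i] : NULL;
}
```

Feeding the attributes of the example tag (background=white, text=white, text=black) through `attr_store_last` leaves "text" mapped to "black", while `attr_store_first` leaves it mapped to "white", which is the value the report says IE and Mozilla honor.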

