This queue is for tickets about the HTML-Parser CPAN distribution.

Report information
Owner: Nobody in particular
Requestors: haarg [...]
Broken in: (no value)
Fixed in: (no value)

Subject: <meta> parsing registers invalid HTTP headers
<meta> tags can have name attributes with colons in them, and this is perfectly valid. But HTML::HeadParser then tries to register these as X-Meta-<name> headers using HTTP::Headers. Newer versions of HTTP::Headers (since 6.05) check header field names more strictly and refuse any name that contains a colon. I would think that HTML::HeadParser should either skip adding headers like this, or further reformat the name attribute before using it in an HTTP header name.
I second this. This is currently an issue for anyone running this tool against a website whose home page includes the Pinterest domain verification code, since its meta tag name is "p:domain_verify". After upgrading LWP, my related code started failing. Here are the Pinterest docs for confirmation:
One solution would be to limit the <meta> tag parsing to http-equiv headers; since those values are meant to be HTTP header names, the set of valid characters should then be the same. Another solution would be to translate any characters in the meta tag's name that aren't valid in an HTTP header name into something that is valid, for example by translating all such characters into dashes.
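The following is a minimal sketch of the dash-translation option. It is written in Python for illustration only and is not HTML::HeadParser's actual Perl code. It assumes that a valid field name follows the RFC 7230 "token" grammar. That rule may be stricter than the colon check this report attributes to HTTP::Headers. The helpers `is_valid_field_name` and `meta_header_name` are hypothetical names.

```python
import re

# Assumption: valid header field names follow the RFC 7230 "token" rule
# (one or more tchar). HTTP::Headers' own check may differ.
TCHARS = r"!#$%&'*+\-.^_`|~0-9A-Za-z"
TOKEN_RE = re.compile(f"[{TCHARS}]+")
NON_TCHAR_RE = re.compile(f"[^{TCHARS}]")


def is_valid_field_name(name: str) -> bool:
    """True if name is a non-empty RFC 7230 token."""
    return TOKEN_RE.fullmatch(name) is not None


def meta_header_name(meta_name: str) -> str:
    """Hypothetical helper: build an X-Meta-* header name from a <meta name>,
    replacing every character that is not a valid tchar with '-'."""
    safe = NON_TCHAR_RE.sub("-", meta_name)
    # Upper-case the first letter to follow the X-Meta-<Name> style.
    return "X-Meta-" + safe[:1].upper() + safe[1:]


for raw in ["p:domain_verify", "twitter:card", "description"]:
    header = meta_header_name(raw)
    print(raw, "->", header, is_valid_field_name(header))
# p:domain_verify -> X-Meta-P-domain_verify True
# twitter:card -> X-Meta-Twitter-card True
# description -> X-Meta-Description True
```

This approach has one trade-off: distinct names such as `p:domain_verify` and `p-domain_verify` collapse onto the same header name.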
Even if there are invalid characters in an http-equiv header, it shouldn't cause parsing of the document to fail. I think the two options are transforming the headers into a format that will be accepted, or simply not registering such headers and continuing with parsing.
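The skip-and-continue option could look roughly like the following sketch. It is again in Python, using the standard-library `html.parser` module rather than HTML::HeadParser. The `MetaHeaderCollector` class is hypothetical. It derives header names the way this thread describes: the http-equiv value if present, otherwise X-Meta-<name>. It assumes RFC 7230 tokens as the validity rule, drops any name that fails it, and keeps parsing.

```python
import re
from html.parser import HTMLParser

# Assumption: a valid header field name is an RFC 7230 token.
TOKEN_RE = re.compile(r"[!#$%&'*+\-.^_`|~0-9A-Za-z]+")


class MetaHeaderCollector(HTMLParser):
    """Hypothetical sketch: collect <meta> tags as (header, value) pairs,
    skipping any whose header name is invalid instead of aborting."""

    def __init__(self):
        super().__init__()
        self.headers = []  # accepted (name, value) pairs
        self.skipped = []  # header names that were dropped

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        a = dict(attrs)
        if a.get("http-equiv"):
            name = a["http-equiv"]
        elif a.get("name"):
            n = a["name"]
            name = "X-Meta-" + n[:1].upper() + n[1:]
        else:
            return
        if TOKEN_RE.fullmatch(name):
            self.headers.append((name, a.get("content") or ""))
        else:
            # Do not raise: record the bad name and continue parsing.
            self.skipped.append(name)


doc = """<html><head>
<meta name="p:domain_verify" content="abc123">
<meta name="description" content="Example page">
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
</head><body></body></html>"""

collector = MetaHeaderCollector()
collector.feed(doc)
collector.close()
print(collector.headers)
# [('X-Meta-Description', 'Example page'), ('Content-Type', 'text/html; charset=utf-8')]
print(collector.skipped)
# ['X-Meta-P:domain_verify']
```

The sketch records the skipped names rather than discarding them silently, so that they can still be logged for debugging.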
Yeah, Dropbox downloads fail because of Twitter meta tags during the HTTP redirects to the download file:

    ... START[meta] 4.24 KB received
    Illegal field name 'X-Meta-Twitter:card' at /home/jmates/perl-5.16.3/lib/site_perl/5.16.3/OpenBSD.amd64-openbsd/HTML/ line 207.
    Transfer aborted. Delete ...? [n]
