Subject: Parsing of undecoded UTF-8 will give garbage when decoding entities at /usr/share/perl5/HTTP/Message.pm line 264
$ grep '$VERSION =' /usr/share/perl5/HTTP/Message.pm
$VERSION = "5.828";

$ perl -wMLWP::Simple -e 'get ("")'
Parsing of undecoded UTF-8 will give garbage when decoding entities at /usr/share/perl5/HTTP/Message.pm line 264.

Oops. That's not so good. Line 264 is in content_charset, which enlists HTML::Parser to parse the header.

I have attached the HTTP::Response object so the problem can be demonstrated without needing to connect to the website:

$ perl -wMLWP::UserAgent -MData::Dumper -e '
    my $ua = new LWP::UserAgent;
    my $r = $ua->get(" ");
    print Dumper $r;
  ' > /tmp/HTTP::Response-object

(I've attached the file)

$ perl -wMLWP -e '
    my $r = do "/tmp/HTTP::Response-object";
    $r->content_charset;
  '

I've got the mad idea that stripping/killing all 8-bit chars for the parser, along the lines of a "tr [\200-\377] [\000-\177];", might work if we're only looking for headers that are ASCII-encoded, but I am convinced that that's not really the right way. I am also not sure I truly understand what HTML::Parser is trying to tell HTTP::Message.
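To make the tr idea concrete, here is a minimal sketch, not a proposed patch. The helper names strip_high_bytes and fold_high_bytes and the sample HTML are made up for illustration; nothing here is taken from HTTP::Message itself. It compares the literal mapping quoted above with a variant that simply deletes the high bytes, and, if HTML::Parser happens to be installed, scans the stripped copy for a <meta> charset.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of LWP): delete every byte >= 0x80 so that
# only 7-bit ASCII reaches the parser.
sub strip_high_bytes {
    my ($bytes) = @_;
    (my $ascii = $bytes) =~ tr/\x80-\xFF//d;
    return $ascii;
}

# Hypothetical helper: the literal mapping from the report, which folds each
# high byte onto a low one (0x80 -> 0x00, 0xC3 -> 0x43, and so on).
sub fold_high_bytes {
    my ($bytes) = @_;
    (my $folded = $bytes) =~ tr/\200-\377/\000-\177/;
    return $folded;
}

# Made-up sample document: raw UTF-8 bytes, not decoded into characters.
my $html = qq{<meta http-equiv="Content-Type" content="text/html; charset=utf-8">}
         . qq{<title>caf\xC3\xA9</title>};

my $stripped = strip_high_bytes($html);
my $folded   = fold_high_bytes($html);

# Optional: look for a <meta> charset in the stripped copy. This part only
# runs when HTML::Parser is available.
my $charset;
if (eval { require HTML::Parser; 1 }) {
    my $p = HTML::Parser->new(
        api_version => 3,
        start_h     => [
            sub {
                my ($tag, $attr) = @_;
                return unless $tag eq 'meta';
                my $content = $attr->{content} or return;
                $charset = $1 if $content =~ /charset\s*=\s*([\w.:-]+)/i;
            },
            'tagname, attr',
        ],
        report_tags => [qw(meta)],
    );
    $p->parse($stripped);
    $p->eof;
}
print "charset: ", (defined $charset ? $charset : "(not determined)"), "\n";
```

One thing the sketch makes visible: the folding variant turns high bytes into arbitrary ASCII, so "caf\xC3\xA9" becomes "cafC)", and a byte such as 0xBC would become "<", which could fabricate markup. Deleting the bytes avoids that, but either way the parser no longer sees the real document. HTML::Parser also documents a utf8_mode attribute intended for parsing raw undecoded UTF-8, which may be worth checking as an alternative; whether it is the right fix for content_charset is not established here.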