
This queue is for tickets about the Web-Scraper CPAN distribution.

Report information
The Basics
Id: 54240
Status: resolved
Priority: 0/
Queue: Web-Scraper

People
Owner: Nobody in particular
Requestors: whatson [...] gmail.com
Cc:
AdminCc:

Bug Information
Severity: (no value)
Broken in: (no value)
Fixed in: (no value)



Subject: Content decoding
Date: Wed, 3 Feb 2010 15:49:19 +1000
To: bug-Web-Scraper [...] rt.cpan.org
From: Andrew Whatson <whatson [...] gmail.com>
Hi,

I've noticed that Web::Scraper doesn't handle HTTP::Response objects with a 'content-encoding' of gzip (and presumably others as well). Poking through the code, it seems to be because an attempt is made at decoding the content manually instead of using $http_response->decoded_content, and this manual decoding checks 'content-type' but ignores 'content-encoding'.

A patch is attached that removes all attempts to decode content inside Web::Scraper and instead trusts the HTTP::Response object to decode its content accurately.

Thanks,
Andrew
Attachment: encode.patch (text/x-diff, 976b). The patch body is not shown inline.
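
For readers without access to the patch, the distinction the report draws can be sketched outside Web::Scraper. The snippet below is only an illustration, not the attached patch: it builds a made-up gzip-compressed response by hand (the HTML string and headers are assumptions) and compares the raw `content` with what `decoded_content` returns.

```perl
use strict;
use warnings;
use Encode qw(encode);
use HTTP::Response;
use IO::Compress::Gzip qw(gzip $GzipError);

# Hypothetical page body, used only for this illustration.
my $html  = "<html><body><p>caf\x{e9}</p></body></html>";
my $bytes = encode('UTF-8', $html);

# Compress the UTF-8 bytes the way a server sending
# "Content-Encoding: gzip" would.
gzip(\$bytes => \my $gzipped) or die "gzip failed: $GzipError";

my $res = HTTP::Response->new(
    200, 'OK',
    [
        'Content-Type'     => 'text/html; charset=utf-8',
        'Content-Encoding' => 'gzip',
    ],
    $gzipped,
);

# content() returns the body untouched: still gzip-compressed bytes.
my $raw = $res->content;

# decoded_content() first undoes Content-Encoding, then decodes the
# charset named in Content-Type into a Perl character string.
my $decoded = $res->decoded_content;

# With charset => 'none' only the Content-Encoding step is applied,
# leaving the uncompressed UTF-8 bytes.
my $bytes_only = $res->decoded_content(charset => 'none');

print "raw body is gzip data\n"       if substr($raw, 0, 2) eq "\x1f\x8b";
print "decoded body matches source\n" if defined $decoded && $decoded eq $html;
```

Code that inspects only 'content-type' and decodes `$res->content` itself would be working on the compressed bytes in `$raw`, which matches the failure described above.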

Subject: Re: [rt.cpan.org #54240] Content decoding
Date: Wed, 3 Feb 2010 11:27:10 -0800
To: bug-Web-Scraper [...] rt.cpan.org
From: Tatsuhiko Miyagawa <miyagawa [...] gmail.com>
Hi, thanks for the patch.

Is it possible for you to fork on GitHub (http://github.com/miyagawa/web-scraper) and also add a unit test if there isn't one yet?

Thanks!
-- Tatsuhiko Miyagawa
Fixed in 0.32

