This queue is for tickets about the WWW-Scraper-ISBN-GoogleBooks_Driver CPAN distribution.

Report information

The Basics
Id: 93222
Status: new
Priority: Low/Low

People
Owner: Nobody in particular
Requestors: lyon.lemmens [...] redlemon.nl
Cc:
AdminCc:

BugTracker
Severity: (no value)
Broken in: (no value)
Fixed in: (no value)



Subject: GoogleBooks ISBN Scraper fail test (+solution)
Date: Fri, 21 Feb 2014 11:50:20 +0100
To: bug-WWW-Scraper-ISBN-GoogleBooks_Driver@rt.cpan.org
From: Lyon Lemmens <lyon.lemmens@redlemon.nl>

LS,

 

I've been playing around with the excellent ISBN scrapers, but I couldn't install the GoogleBooks one because it failed its tests: the number of pages was not captured correctly.

 

With a bit of digging I found that Google Books redirected me to the Dutch site, books.google.nl. Your code detected this and adapted the language correctly, but for some reason it didn't capture the length of the book. Looking at the HTML source of the page, I could not immediately see what was wrong.

 

However, I did notice a flag in the URL that indicates whether a redirect has taken place (redir_esc=y). Setting this flag to 'n' from the start prevented the redirection completely.

 

This means that by setting this flag, you would always go to the main site and you wouldn't need to jump through the language hoops. That would probably simplify the code a bit.

 

Anyway, for now I made one change to the code:

 

124c124
< $data->{url} = $code->{'ISBN:'.$isbn}{info_url};
---
> $data->{url} = $code->{'ISBN:'.$isbn}{info_url} . '&redir_esc=n';

 

This makes it always use the main site and all tests now run OK.
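
The one-line change above appends '&redir_esc=n' directly, which works as long as info_url already carries a query string and does not already contain a redir_esc value. A slightly more defensive variant might look like the sketch below. The helper name with_redir_esc_off and the example URL are hypothetical and not part of the driver; only the redir_esc flag itself comes from the observation above.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the driver): force redir_esc=n on a URL,
# overwriting any existing redir_esc value and picking '?' or '&' as needed.
sub with_redir_esc_off {
    my ($url) = @_;

    # Split off a fragment, if any, so the flag lands in the query part.
    my ($base, $fragment) = split /#/, $url, 2;

    if ($base =~ /[?&]redir_esc=[^&]*/) {
        # Flag already present: overwrite its value in place.
        $base =~ s/([?&])redir_esc=[^&]*/${1}redir_esc=n/;
    }
    else {
        # Flag absent: append it with the right separator.
        $base .= ($base =~ /\?/ ? '&' : '?') . 'redir_esc=n';
    }

    return defined $fragment ? "$base#$fragment" : $base;
}

# Illustrative URL only; the real value would come from info_url.
print with_redir_esc_off('http://books.google.com/books?id=abc123'), "\n";
# prints http://books.google.com/books?id=abc123&redir_esc=n
```

In the driver, the assignment would then become something like $data->{url} = with_redir_esc_off($code->{'ISBN:'.$isbn}{info_url}), assuming the helper is defined in the same package.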

 

--

Regards

 

Lyon Lemmens


