

This queue is for tickets about the Web-Scraper CPAN distribution.

Report information
The Basics
Id: 68258
Status: rejected
Priority: 0/
Queue: Web-Scraper

People
Owner: Nobody in particular
Requestors: hayzer [...] gmail.com
Cc:
AdminCc:

Bug Information
Severity: Normal
Broken in: 0.34
Fixed in: (no value)



Subject: Failed to get text of script tag
Hi,

I'm trying to rip off a plain JS code from the attached file.
This is my code:
{{{
my $scraper = scraper {
    process "script", "scripts[]" => "TEXT";
};

my $jscode = $scraper->scrape($htmlfile);
}}}

The result is:
{{{
$VAR1 = {
    'scripts' => [
        '',
        '',
        ''
    ]
};
}}}

Sorry if it is my mistake and not a bug.

Web::Scraper => 0.34
HTML::Element => 4.2
HTML::Selector::XPath => 0.07
HTML::Entities => 3.68
HTML::Tagset => 3.20
HTML::TreeBuilder::XPath => 0.12

perl -v =>
This is perl 5, version 12, subversion 3 (v5.12.3) built for i386-linux-thread-multi
uname -a =>
Linux 2.6.35.12-90.fc14.i686 #1 SMP Fri Apr 22 16:14:44 UTC 2011 i686 i686 i386 GNU/Linux
Attachment: index4.html (text/html, 2.1k)
You should be able to use 'RAW' instead of 'TEXT' to get the JS.

On Tue May 17 01:49:24 2011, hayzer@gmail.com wrote:
> Hi,
>
> I'm trying to rip off a plain JS code from the attached file.
> This is my code:
> {{{
> my $scraper = scraper {
>     process "script", "scripts[]" => "TEXT";
> };
>
> my $jscode = $scraper->scrape($htmlfile);
> }}}
>
> The result is:
> {{{
> $VAR1 = {
>     'scripts' => [
>         '',
>         '',
>         ''
>     ]
> };
> }}}
>
> Sorry if it is my mistake and not a bug.
>
> Web::Scraper => 0.34
> HTML::Element => 4.2
> HTML::Selector::XPath => 0.07
> HTML::Entities => 3.68
> HTML::Tagset => 3.20
> HTML::TreeBuilder::XPath => 0.12
>
> perl -v =>
> This is perl 5, version 12, subversion 3 (v5.12.3) built for
> i386-linux-thread-multi
> uname -a =>
> Linux 2.6.35.12-90.fc14.i686 #1 SMP Fri Apr 22 16:14:44 UTC 2011 i686
> i686 i386 GNU/Linux
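
For anyone landing on this ticket, here is a minimal sketch of the 'RAW' suggestion. It assumes Web::Scraper is installed and uses a small inline HTML string as a hypothetical stand-in for the attached index4.html (the variable names and the sample script body are made up for illustration). The empty 'TEXT' results are most likely because the underlying HTML::Element text extraction skips the contents of script and style elements; treat that as an explanation to verify against your installed versions rather than a confirmed diagnosis.

```perl
use strict;
use warnings;
use Web::Scraper;

# Hypothetical sample document; not the original attachment.
my $html = <<'HTML';
<html>
<head>
<script type="text/javascript">var answer = 42;</script>
</head>
<body><p>Hello</p></body>
</html>
HTML

my $scraper = scraper {
    # 'RAW' asks for the element's inner markup instead of its text value,
    # which came back empty for <script> in the report above.
    process "script", "scripts[]" => "RAW";
};

# scrape() accepts the HTML content as a string.
my $result = $scraper->scrape($html);

# Print each extracted script body on its own line.
print "$_\n" for @{ $result->{scripts} || [] };
```

Because 'RAW' is built from serialized markup, characters such as < or & inside the JS may come back entity-escaped, so it is worth checking the output before feeding it to anything that expects plain JavaScript.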

