
This queue is for tickets about the WWW-UsePerl-Journal CPAN distribution.

Report information
The Basics
Id: 13748
Status: resolved
Priority: 0/
Queue: WWW-UsePerl-Journal

People
Owner: BARBIE [...] cpan.org
Requestors: simonw [...] digitalcraftsmen.net
Cc:
AdminCc:

Bug Information
Severity: Critical
Broken in: 0.12
Fixed in: (no value)

Attachments
www-useperl-journal-patch.txt



Subject: Fails tests with new use.perl layout
The new use.perl XHTML layout breaks the tests. The attached patch fixes them.
--- WWW-UsePerl-Journal-0.12/lib/WWW/UsePerl/Journal.pm	Sun Apr 17 07:20:19 2005
+++ WWW-UsePerl-Journal-0.12-new/lib/WWW/UsePerl/Journal.pm	Mon Jul 18 13:07:16 2005
@@ -80,7 +80,7 @@
             "/journal.pl?op=list&uid=$uid")->content;
     die "Cannot connect to " . UP_URL unless $content;
 
-    $content =~ m#<HTML><HEAD><TITLE>Journal of (.*?) \(\d+\)</TITLE>#
+    $content =~ m#<title>Journal of (.*?) \(\d+\)</title>#
         or die "$uid does not exist";
     $1;
 }
@@ -122,7 +122,25 @@
 
     my @entries;
 
-# Sample of this on 04/10/2002
+# Sample of this on 18/07/2005
+# <div class="search-results">
+# <h4>
+# <a href="//use.perl.org/~pjf/journal/25733">Losing money internationally</a>
+# </h4>
+# <div class="data">
+# On Saturday July 16, @01:44AM
+# </div>
+# <div class="intro">
+# Losing money internationally
+#I deal with banks regularly, and while issues...
+# </div>
+# <div class="author">
+# Author: <a href="//use.perl.org/~pjf/">pjf</a>
+# </div>
+#
+#</div>
+
+# Old sample from 04/10/2002
 #<B><A HREF="//use.perl.org/~davorg/journal/8165">Buy More Books</A></B><BR>
 # <FONT SIZE="-1">On 2002.10.04 6:24</FONT><BR>
 # Yesterday I got my royalty statement for sales of Data Munging with Perl in the...<BR>
@@ -133,19 +151,11 @@
 # <P>
 
     while ( $content =~ m#
-        <B><A\s*HREF="$site/~(\w+)/journal/(\d+)">(.+?)</A></B><BR>
-        \s*
-        <FONT\s*SIZE="-1">On\s*(.+?)</FONT><BR>
-        \s*
-        .+?<BR>
-        \s*
-        <FONT\s*SIZE="-1">
-        \s*
-        Author:\s*<A\s*HREF="$site/~(\w+)/">(\w+)</A>
-        \s*
-        </FONT>
+        <h4>\s*<a\s*href="$site/~(\w+)/journal/(\d+)">(.+?)</a>\s*</h4>
         \s*
-        <P>
+        <div\sclass="data">\s*On\s*(.+?)\s+</div>
+        .+?
+        Author:\s*<a\s*href="$site/~(\w+)/">(\w+)</a>
         #migxs ) {
         die "$5 is not $6" if $5 ne $6;
         my $time = Time::Piece->strptime($4, '%Y.%m.%d %H:%M');
@@ -176,10 +186,7 @@
         s/^.*\Q<!-- start template: ID 251, journalsearch;search;default -->\E//sm;
 
     $content =~
-        s/<A HREF=\"$site\/search\.pl\?threshold=0&op=journals
-        &sort=1&amp;start=30">Next 30 matches&gt;
-        <\/A>\s*<P>\s*
-        <!-- end template: ID 251, journalsearch;search;default -->.*$//sm;
+        s/<div class="pagination.*$//sm;
 
     return $content;
 }
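For anyone wanting to check the new entry-list pattern outside the module, here is a minimal standalone sketch that runs the regex from the patch against the 18/07/2005 sample quoted in its comments. The value of `$site`, the `m{...}` delimiters, the added `\Q...\E` quoting, and the `@entries` hash layout are assumptions made for illustration; they are not taken from Journal.pm.

```perl
use strict;
use warnings;

# Assumption: $site is the protocol-relative host seen in the sample links.
my $site = '//use.perl.org';

# Sample markup copied from the comment block added by the patch
# (indentation is illustrative only).
my $content = <<'HTML';
<div class="search-results">
  <h4>
    <a href="//use.perl.org/~pjf/journal/25733">Losing money internationally</a>
  </h4>
  <div class="data">
    On Saturday July 16, @01:44AM
  </div>
  <div class="intro">
    Losing money internationally
I deal with banks regularly, and while issues...
  </div>
  <div class="author">
    Author: <a href="//use.perl.org/~pjf/">pjf</a>
  </div>

</div>
HTML

my @entries;
while ( $content =~ m{
        <h4>\s*<a\s*href="\Q$site\E/~(\w+)/journal/(\d+)">(.+?)</a>\s*</h4>
        \s*
        <div\sclass="data">\s*On\s*(.+?)\s+</div>
        .+?
        Author:\s*<a\s*href="\Q$site\E/~(\w+)/">(\w+)</a>
    }migxs ) {
    # Same sanity check as the patched loop: both author captures must agree.
    die "$5 is not $6" if $5 ne $6;

    # Hypothetical record layout, used here only to show the captures.
    push @entries, { user => $1, id => $2, subject => $3, date => $4 };
}

printf "%s #%d: %s (%s)\n", @{$_}{qw(user id subject date)} for @entries;
```

Note that the fourth capture now has the form of the sample's "Saturday July 16, @01:44AM" rather than "2002.10.04 6:24", while the context line in the patch still passes `$4` to `strptime` with the `'%Y.%m.%d %H:%M'` format, so the date handling probably needs a follow-up change as well.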
I have gone through the screen-scraping regexes extensively and converted them. Several changes are still to be done, but this is the first working version. Fixed in 0.13.

