This queue is for tickets about the WWW-Google-SiteMap CPAN distribution.

Report information
The Basics
Id: 30592
Status: resolved
Priority: 0/
Queue: WWW-Google-SiteMap

People
Owner: Nobody in particular
Requestors: bryn.dole [...] gmail.com
Cc:
AdminCc:

Bug Information
Severity: (no value)
Broken in: (no value)
Fixed in: (no value)



Subject: WWW-Google-SiteMap bug: duplicate URLs
Date: Thu, 8 Nov 2007 15:37:50 -0800
To: bug-WWW-Google-SiteMap [...] rt.cpan.org
From: "Bryn Dole" <bryn.dole [...] gmail.com>
I get duplicate URLs in my sitemap. Here is an easy fix to urls() in WWW-Google-SiteMap-1.09/lib/WWW/Google/SiteMap.pm

Maybe the bug is in the crawler part, but this is an easy fix.

Bryn

    sub urls {
        my $self = shift;
        $self->{urls} = \@_ if @_;
        # Keep only the first URL object seen for each loc.
        my %hist;
        my @urls = grep { ref($_) && defined $_->loc && !$hist{$_->loc}++ }
            @{$self->{urls}};
        return wantarray ? @urls : \@urls;
    }
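For readers who want to try the filter from the proposed urls() outside the module, here is a minimal, self-contained sketch of the same first-seen idiom. `Demo::URL` and `dedupe_by_loc` are hypothetical names made up for this illustration; they are not part of WWW::Google::SiteMap, and the only thing assumed about the real URL objects is that they have a `loc` method, as the patch above implies.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the module's URL objects; only loc() matters here.
package Demo::URL;
sub new { my ($class, %args) = @_; return bless { %args }, $class }
sub loc { return $_[0]{loc} }

package main;

# Keep the first object seen for each loc, preserving input order.
# Non-references and objects with an undefined loc are dropped,
# mirroring the grep in the proposed urls().
sub dedupe_by_loc {
    my %seen;
    my @unique = grep { ref($_) && defined $_->loc && !$seen{ $_->loc }++ } @_;
    return @unique;
}

my @urls = map { Demo::URL->new(loc => $_) } qw(
    http://example.com/
    http://example.com/a
    http://example.com/
);

my @unique = dedupe_by_loc(@urls);
print $_->loc, "\n" for @unique;
```

The `%seen` counter is incremented after it is tested, so the first object with a given `loc` passes the filter and later ones do not. Note that this only hides duplicates at read time; it does not explain how they got into the list in the first place.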
Subject: Re: [rt.cpan.org #30592] AutoReply: WWW-Google-SiteMap bug: duplicate URLs
Domainkey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=beta; h=received:message-id:date:from:to:subject:in-reply-to:mime-version:content-type:references; b=KRIfX+H8/rxJQh+CELUc0mvgUPn8MNBBn425qjw2u2kirmUuH2G8hvN4SdbeCkT5A7vnk1qwjXL/GqDPdIX1RFAC+r1j3HV5WWouYojB04/pVxvx3O6KDAJxsNsYniv3zVSXtJboDKCbL38HVaeR2EXK2dZOCPaBmSisxNa6l2c=
Return-Path: <bryn.dole [...] gmail.com>
Dkim-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=beta; h=domainkey-signature:received:received:message-id:date:from:to:subject:in-reply-to:mime-version:content-type:references; bh=i1d5JiEM9E/WW6yafHPOC5lOJ8rvn1OavQndr4gwwmA=; b=Fs1k4JoiXMYd0rlxDR7hAMTMOn25B/ILb9MuFUpmqnyp/3mrPMX8KH0isxHoHnhnwXKX2Yblfe0/bueihx/eY9+l/6skssmh3cCn4RdDWWRP6A2EhbcAi9U7Gzpa7i9r0zao29spPDws2hNE0SJ2JGbkI+XUGCO8XEDJDc7ovLk=
X-Spam-Check-BY: la.mx.develooper.com
X-Original-To: bug-WWW-Google-SiteMap [...] rt.cpan.org
Date: Thu, 8 Nov 2007 16:28:45 -0800
To: bug-WWW-Google-SiteMap [...] rt.cpan.org
From: "Bryn Dole" <bryn.dole [...] gmail.com>
Never mind. It was pilot error. It looks like the sitemap.gz file gets appended to, which is why I was seeing dups.

Bryn

On Nov 8, 2007 3:38 PM, Bugs in WWW-Google-SiteMap via RT <bug-WWW-Google-SiteMap@rt.cpan.org> wrote:
> Greetings,
>
> This message has been automatically generated in response to the
> creation of a trouble ticket regarding:
> "WWW-Google-SiteMap bug: duplicate URLs",
> a summary of which appears below.
>
> There is no need to reply to this message right now. Your ticket has been
> assigned an ID of [rt.cpan.org #30592]. Your ticket is accessible
> on the web at:
>
> http://rt.cpan.org/Ticket/Display.html?id=30592
>
> Please include the string:
>
> [rt.cpan.org #30592]
>
> in the subject line of all future correspondence about this issue. To do
> so, you may reply to this message.
>
> Thank you,
> bug-WWW-Google-SiteMap@rt.cpan.org
>
> -------------------------------------------------------------------------
> I get duplicate URLs in my sitemap. Here is an easy fix to urls() in
> WWW-Google-SiteMap-1.09/lib/WWW/Google/SiteMap.pm
>
> Maybe the bug is in the crawler part, but this is an easy fix.
>
> Bryn
>
>
> sub urls {
>     my $self = shift;
>     $self->{urls} = \@_ if @_;
>     my %hist;
>     my @urls = grep { ref($_) && defined $_->loc && !$hist{$_->loc}++ }
>         @{$self->{urls}};
>     return wantarray ? @urls : \@urls;
> }

