This queue is for tickets about the CGI-RSS CPAN distribution.

Report information
The Basics
Id: 71851
Status: resolved
Worked: 5 min
Priority: 0/
Queue: CGI-RSS

People
Owner: jettero [...] cpan.org
Requestors: dynot [...] JUNKMAIL.ATH.CX
Cc:
AdminCc:

Bug Information
Severity: Normal
Broken in: 0.9600
Fixed in: (no value)



Subject: Does not do rfc822 date processing
You use the wrong format to generate an RFC 822 date. Instead of %Z you should use %z. `man 3 strftime` describes the correct format:

RFC 2822-compliant date format (with an English locale for %a and %b):
"%a, %d %b %Y %T %z"

RFC 822-compliant date format (with an English locale for %a and %b):
"%a, %d %b %y %T %z"

With the capital Z, the feed is invalid at http://validator.w3.org/feed/

Additionally, I don't like that all dates are converted to the local timezone. Why is this happening? I think the module should only parse the date if it is not already in RFC 822 format. See the attached script for a fix and for testing.
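The %Z versus %z difference can be reproduced with Perl's POSIX::strftime. This is a minimal sketch, not code from the ticket: it assumes a glibc-style system that understands the POSIX TZ rule string CET-1CEST,M3.5.0,M10.5.0/3 (standing in for the reporter's GMT+2 machine), and it uses epoch 1318715103, which is the ticket's sample date Sun, 16 Oct 2011 06:45:03 +0900.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use POSIX qw(strftime setlocale LC_TIME tzset);

# Assumption: a POSIX TZ rule string for Central European Time, so the
# sketch behaves like a CEST machine without needing a tz database.
$ENV{TZ} = 'CET-1CEST,M3.5.0,M10.5.0/3';
tzset();

# English day and month names for %a and %b.
setlocale(LC_TIME, 'C');

# The ticket's sample date "Sun, 16 Oct 2011 06:45:03 +0900" as epoch seconds.
my @t = localtime(1318715103);

my $with_name   = strftime('%a, %d %b %Y %T %Z', @t);    # zone abbreviation
my $with_offset = strftime('%a, %d %b %Y %T %z', @t);    # numeric offset

print "%Z: $with_name\n";
print "%z: $with_offset\n";
```

On such a system, %Z should yield the abbreviation CEST, while %z yields the numeric +0200 form.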
Subject: cgi-rss.pl
#!/usr/bin/perl -w

use strict;
use CGI::RSS;
use POSIX qw(strftime);

# Run the script as is to see the RSS output, or uncomment the
# "print $rss->date($_) while (<>); exit;" line further down to enter
# a few dates on standard input and test them.

{
    package CGI::RSS;

    # ParseDate and UnixDate come from Date::Manip.
    use Date::Manip qw(ParseDate UnixDate);

    sub valid_rfc822_date ($) {
        $_[0] =~ m!^
            (?:
                (?: Mon | Tue | Wed | Thu | Fri | Sat | Sun )   # day
                ,\s\s?                                           # comma, space or two
            )?                                                   # (these were optional)
            \d\d?\s                                              # day with 1 or 2 digits, space
            (?: Jan | Feb | Mar | Apr | May | Jun |
                Jul | Aug | Sep | Oct | Nov | Dec )              # month
            \s(?:\d{4}|\d{2})\s                                  # space, 2 or 4 digit year, space
            \d\d:\d\d:\d\d\s                                     # hr:min:sec, space
            (?:
                [\+\-]\d\d\d\d |                                 # time zone with digits, or
                UT | GMT | EST | EDT | CST | CDT |
                MST | MDT | PST | PDT |
                Z | A | M | N | Y                                # time zone with characters
            )$
        !ix;
    }

    sub date {
        my $this = shift;
        my $date = shift;

        # Only reformat dates that are not already valid RFC 822 dates,
        # so a valid date keeps its original time zone.
        if ( !valid_rfc822_date($date) and my $pd = ParseDate($date) ) {
            warn "parsing date: $date\n";
            my $rfc822_date = UnixDate($pd, '%a, %d %b %Y %H:%M:%S %z');
            return $this->pubDate($rfc822_date);
        }

        $this->pubDate($date);
    }
}

my $rss = CGI::RSS->new;

my @feed = (
    {
        title => "first item",
        link  => "http://localhost/directory/1",
        guid  => "http://localhost/directory/1",
        desc  => "this is the first item",
        date  => "Sun, 16 Oct 2011 06:45:03 +0900",
    },
    {
        title => "second item",
        link  => "http://localhost/directory/2",
        guid  => "http://localhost/directory/2",
        desc  => "this is the second item",
        date  => "15 Oct 2011 12:15:06 +0200",
    },
    {
        title => "third item",
        link  => "http://localhost/directory/3",
        guid  => "http://localhost/directory/3",
        desc  => "this is the third item",
        date  => "2011-08-11 07:02:26",
    },
    {
        title => "fourth item",
        link  => "http://localhost/directory/4",
        guid  => "http://localhost/directory/4",
        desc  => "this is the fourth item",
        date  => "Sat, 15 Oct 2011 09:31:01 +0800 (CST)",
    },
);

print $rss->header;
print $rss->begin_rss(
    title => "My Feed!",
    link  => "http://localhost/directory",
    desc  => "My feed is cool!",
);

#print $rss->date($_) while (<>); exit;

foreach my $h (@feed) {
    print $rss->item(
        $rss->title( $h->{title} ),
        $rss->link( $h->{link} ),
        $rss->guid( $h->{guid} ),           # unique identifier, usually a permalink
        $rss->description( $h->{desc} ),
        $rss->date( $h->{date} ),           # does rfc822 date processing
    );
}

print $rss->finish_rss;
On Sat Oct 22 13:13:04 2011, dynot wrote:
> You use a wrong format to generate an rfc822 date. Instead of %Z you
> should use %z. `man 3 strftime` describes the correct format:

That may be the case, and I'm willing to change it, but I have a test in t/ that feeds the rss to the w3 validator and it checks out. The only recommendation that it gives is to add atom:link with rel="self".

I went ahead and tried it though (https://github.com/jettero/cgi--rss/tree/_proposed_fix). If I used your proposed format then w3 recommends I use rfc822:

"Problematical RFC 822 date-time value: Sat, 22 Mar 08 00:00:00 -0400"

If you check the RFC under §5 (http://www.w3.org/Protocols/rfc822/), I think you'll find that RFC 822 is indeed %a, %d %b %Y %H:%M:%S %Z.

I'm not at all opposed to adding options to disable the format check. I have also made a change (github only) to allow you to change the format to whatever you like with $CGI::RSS::RFC822F = "%z". Will this help?

-Paul
--
If riding in an airplane is flying, then riding in a boat is swimming.
116 jumps, 48.6 minutes of freefall, 92.9 freefall miles.
From: dynot [...] JUNKMAIL.ATH.CX
On Sun Oct 23 10:23:54 2011, JETTERO wrote:
> On Sat Oct 22 13:13:04 2011, dynot wrote:
> > You use a wrong format to generate an rfc822 date. Instead of %Z you
> > should use %z. `man 3 strftime` describes the correct format:
>
> That may be the case, and I'm willing to change it, but I have a test in
> t/ that feeds the rss to the w3 validator and it checks out. The only
> recommendation that it gives is to add atom:link with rel="self".
That's interesting. Without any modification, your module produced dates such as:

Sat, 15 Oct 2011 03:31:01 CEST

The problem is that CEST is not standards compliant according to the w3 validator:

pubDate must be an RFC-822 date-time: Sat, 15 Oct 2011 23:45:03 CEST

By keeping the 4-digit year (capital %Y) and changing the capital %Z to lowercase %z, the feed validates (Sat, 15 Oct 2011 23:45:03 +0200).
> I went ahead and tried it though
> (https://github.com/jettero/cgi--rss/tree/_proposed_fix).
>
> If I used your proposed format then w3 recommends I use rfc822:
>
> "Problematical RFC 822 date-time value: Sat, 22 Mar 08 00:00:00 -0400"

That's funny, because RFC 822 specifies 2 digits for the year, so this format is indeed RFC 822 compliant:

date = 1*2DIGIT month 2DIGIT    ; day month year
                                ;  e.g. 20 Jun 82

As far as I can tell, the w3 validator checks the date against RFC 2822 (4-digit year), not RFC 822 (2-digit year).
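To illustrate the year-length point, here is a small Perl sketch of the RFC 822 `date` production quoted above, next to a variant that accepts a 4-digit year instead. The function names are hypothetical, and single spaces between tokens are assumed for simplicity.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Month names as listed in RFC 822.
my $month = qr/(?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)/;

# RFC 822 "date": 1*2DIGIT month 2DIGIT (single spaces assumed).
sub is_rfc822_date_part {
    my ($s) = @_;
    return $s =~ /^\d{1,2} $month \d{2}$/ ? 1 : 0;
}

# Same shape, but with a 4-digit year as RSS validators expect.
sub is_four_digit_year_date_part {
    my ($s) = @_;
    return $s =~ /^\d{1,2} $month \d{4}$/ ? 1 : 0;
}

for my $d ('20 Jun 82', '15 Oct 2011') {
    printf "%-12s RFC 822: %d  4-digit year: %d\n",
        $d, is_rfc822_date_part($d), is_four_digit_year_date_part($d);
}
```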
> If you check the RFC under §5 (http://www.w3.org/Protocols/rfc822/), I
> think you'll find that RFC 822 is indeed %a, %d %b %Y %H:%M:%S %Z.

Yes, I read that, and CEST is not among the allowed time zones: only the listed one-, two-, or three-letter names, or a +/- 4-digit offset.

> I'm not at all opposed to adding options to disable the format check.

Good to hear, thanks.

> I have also made a change (github only) to allow you to change the
> format to whatever you like with $CGI::RSS::RFC822F = "%z". Will
> this help?

Nice to have options, but I'm not interested in customizing the date format; what I would like is to pass RSS feed validation.

> -Paul

Thank you,
Peter
From: dynot [...] JUNKMAIL.ATH.CX
Oh, found it on the validator website:

"The value specified must meet the Date and Time specifications as defined by RFC822, with the exception that the year should be expressed as four digits."

But an RFC 822 date with a 4-digit year is RFC 2822, isn't it? :)

RFC 2822-compliant date format (with an English locale for %a and %b):
"%a, %d %b %Y %T %z"

RFC 822-compliant date format (with an English locale for %a and %b):
"%a, %d %b %y %T %z"
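The two man-page formats differ only in %y versus %Y. A minimal Perl sketch, assuming TZ can be set to the POSIX string UTC0 so that the offset prints as +0000, formats the same instant (epoch 1318715103, i.e. 2011-10-15 21:45:03 UTC) both ways:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use POSIX qw(strftime setlocale LC_TIME tzset);

# Assumption: run in UTC so the numeric offset is a predictable +0000.
$ENV{TZ} = 'UTC0';
tzset();
setlocale(LC_TIME, 'C');    # English names for %a and %b

my @t = localtime(1318715103);    # 2011-10-15 21:45:03 UTC

# The two formats quoted from `man 3 strftime`; only the year differs.
my $rfc822  = strftime('%a, %d %b %y %T %z', @t);    # 2-digit year
my $rfc2822 = strftime('%a, %d %b %Y %T %z', @t);    # 4-digit year

print "RFC 822:  $rfc822\n";
print "RFC 2822: $rfc2822\n";
```

The 2-digit form is the one the validator flags as "problematical", while the 4-digit form is what it expects.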
> Sat, 15 Oct 2011 03:31:01 CEST
> pubDate must be an RFC-822 date-time: Sat, 15 Oct 2011 23:45:03 CEST

These dates are the same, and I agree that they are correct. I may not understand the problem. The date format I'm using passes the validator and the one you proposed does not.
I see the problem now. Yes, the current RSS best practice is to use the 4-digit year, despite the fact that it's not RFC822. The w3 validator recognizes only the best practice, with a bad explanation in the error.
From: dynot [...] JUNKMAIL.ATH.CX
On Tue Oct 25 14:41:50 2011, JETTERO wrote:
> > Sat, 15 Oct 2011 03:31:01 CEST
> > pubDate must be an RFC-822 date-time: Sat, 15 Oct 2011 23:45:03 CEST
>
> These dates are the same. And I agree that they are correct.
>
> I may not understand the problem. The date format I'm using passes the
> validator and the one you proposed does not.

I'm sorry, but I have no idea what you're talking about. I guess your version passes validation because you only tried one date, 2008-03-22, which is transformed to

Sat, 22 Mar 2008 00:00:00 CET

and this is valid. However, when I try other dates like

Sun, 16 Oct 2011 06:45:03 +0900

they become

Sat, 15 Oct 2011 23:45:03 CEST

and this DOES NOT validate. I am in timezone GMT+2, which is equivalent to CEST, I guess.

The format I proposed just changes the capital %Z to lowercase %z. How does this break validation? I get:

2008-03-22 => Sat, 22 Mar 2008 00:00:00 +0100
Sun, 16 Oct 2011 06:45:03 +0900 => Sat, 15 Oct 2011 23:45:03 +0200

which are all valid.
> which are all valid.

I cut and pasted your date format into a temporary branch at github and linked to it. It did not pass validation. This whole %z vs %Z discussion is silly, since they're actually both valid.

In any case, I'm going to release a version where you can set it to anything you like. I don't particularly care what you use, but the format I have selected is textbook, correct, and passes validation.
From: dynot [...] JUNKMAIL.ATH.CX
On Tue Oct 25 17:07:08 2011, JETTERO wrote:
> > which are all valid.
>
> I cut and pasted your date format into a temporary branch at github and
> linked to it. It did not pass validation. This whole %z vs %Z
> discussion is silly, since they're actually both valid.
>
> In any case, I'm going to release a version where you can set it to
> anything you like, I don't particularly care what you use, but the
> format I have selected is textbook, correct, and passes validation.
Ok, thanks. This customizable date format will be a good solution. I have no idea why I get this invalid CEST timezone, or why we get different validation results.
On Tue Oct 25 17:15:00 2011, dynot wrote:
> Ok, thanks. This customizable date format will be a good solution.

I'm still trying to make up my mind...

    ->new(date_format=>"$blarg")

or should I just leave it as a package variable:

    $CGI::RSS::DATE_FORMAT = "$blarg";

I'll probably release today when I make up my mind.
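The two configuration styles being weighed can be sketched in Perl. My::Feed, its date_format method, and $DATE_FORMAT here are hypothetical names for illustration only, not the API that CGI::RSS actually shipped.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical package, not CGI::RSS itself: it combines a package-wide
# default with an optional per-object override.
package My::Feed;

# Package variable: affects every object that has no override of its own.
our $DATE_FORMAT = '%a, %d %b %Y %H:%M:%S %z';

sub new {
    my ($class, %args) = @_;
    # Constructor option: stored per object; undef means "use the default".
    return bless { date_format => $args{date_format} }, $class;
}

sub date_format {
    my ($self) = @_;
    return defined $self->{date_format} ? $self->{date_format} : $DATE_FORMAT;
}

package main;

my $default  = My::Feed->new;
my $override = My::Feed->new(date_format => '%a, %d %b %y %T %z');

printf "%s\n", $default->date_format;
printf "%s\n", $override->date_format;
```

In this sketch the package variable is a single global default that callers can temporarily change with `local`, while a constructor argument keeps each object's format independent; when both exist, the per-object value wins.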
> I have no idea why I get this invalid CEST timezone, or why we get
> different validation results.

I have some idea. Perl is getting that timezone from your operating system. Shell out and issue "date +%z/%Z" and I bet you see that CEST there too. Probably a bad localization? All my googles show that CEST is just Central European Summer Time. Maybe w3 just has a really out-of-date tzinfo database? Is it new?
From: dynot [...] JUNKMAIL.ATH.CX
On Wed Oct 26 07:23:10 2011, JETTERO wrote:
> > I have no idea why I get this invalid CEST timezone, or why we get
> > different validation results.
>
> I have some idea. Perl is getting that timezone from your operating
> system. Shell out and issue "date +%z/%Z" and I bet you see that CEST
> there too. Probably a bad localization? All my googles show that CEST
> is just Central European summer time. Maybe w3 just has a really out of
> date tzinfo database? Is it new?

I have three Debian systems, two with a GMT+2 timezone:

$ date +%z
+0200
$ date +%Z
CEST

and one with a GMT timezone:

$ date +%z
+0000
$ date +%Z
UTC

Neither CEST nor UTC validates as an RFC 822 date!

zone =  "UT"  / "GMT"             ; Universal Time
                                  ; North American : UT
     /  "EST" / "EDT"             ;  Eastern:  - 5/ - 4
     /  "CST" / "CDT"             ;  Central:  - 6/ - 5
     /  "MST" / "MDT"             ;  Mountain: - 7/ - 6
     /  "PST" / "PDT"             ;  Pacific:  - 8/ - 7
     /  1ALPHA                    ; Military: Z = UT;
                                  ;  A:-1; (J not used)
                                  ;  M:-12; N:+1; Y:+12
     / ( ("+" / "-") 4DIGIT )     ; Local differential
                                  ;  hours+min. (HHMM)

Maybe this is Debian specific? I can test tomorrow on SUSE Linux.
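A minimal Perl sketch of the zone production above shows why CEST and UTC are rejected while numeric offsets pass. is_rfc822_zone is a hypothetical helper, and the match is case-sensitive for simplicity.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# RFC 822 "zone": UT, GMT, the eight US zone names, a single military
# letter other than J, or a +/- 4-digit HHMM offset.
sub is_rfc822_zone {
    my ($zone) = @_;
    return $zone =~ /^(?:UT|GMT|[ECMP][SD]T|[A-IK-Z]|[+-]\d{4})$/ ? 1 : 0;
}

for my $z (qw(CEST UTC UT GMT EDT Z J +0200 -0400)) {
    printf "%-6s %s\n", $z, is_rfc822_zone($z) ? 'valid' : 'invalid';
}
```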
oic, US-centric. I read that about 500 times in the last two days and never noticed that it's only 5 timezones.
From: dynot [...] JUNKMAIL.ATH.CX
It's not Debian specific; I tried on SUSE Linux and got the same results:

$ date +%z
+0200
$ date +%Z
CEST
Right, no, I said a post ago that the real problem is my lack of understanding that the standard only accepts US timezones. I had no idea about that. I just assumed they were examples and it would take any timezone. Basically... I'll be changing to %z today.
From: dynot [...] JUNKMAIL.ATH.CX
OK, thanks for your help.
Forgot to close this, I guess. Please only respond if it's still broken.


This service is sponsored and maintained by Best Practical Solutions and runs on Perl.org infrastructure.

Please report any issues with rt.cpan.org to rt-cpan-admin@bestpractical.com.