Skip Menu |
 

Preferred bug tracker

Please visit the preferred bug tracker to report your issue.

This queue is for tickets about the CGI CPAN distribution.

Report information
The Basics
Id: 67100
Status: resolved
Priority: 0/
Queue: CGI

People
Owner: MARKSTOS [...] cpan.org
Requestors: cpan [...] jibsheet.com
Cc: trs [...] bestpractical.com
AdminCc:

Bug Information
Severity: (no value)
Broken in:
  • 3.51
  • 3.52
  • 3.55
Fixed in: (no value)

Attachments
0001-Failing-test-for-an-over-zealous-default-charset-rt..patch



CC: trs [...] bestpractical.com
Subject: charset applied to every content-type
Download (untitled) / with headers
text/plain 636b
Successfully installed CGI.pm-3.50 $ perl -MCGI -le 'my $q = CGI->new; print $q->header("image/gif");' Content-Type: image/gif Successfully installed CGI.pm-3.52 (upgraded from 3.50) $ perl -MCGI -le 'my $q = CGI->new; print $q->header("image/gif");' Content-Type: image/gif; charset=ISO-8859-1 falcone@transom:~$ perl -v This is perl 5, version 12, subversion 3 (v5.12.3) built for darwin-2level serving the ISO-8859-1 on images seems.... wrong. This actually fails for us on some application/javascript which wasn't forcing a charset, and I'll fix that, but I'm not sure why CGI started adding that on between 3.50 and 3.51
Subject: Re: [rt.cpan.org #67100] charset applied to every content-type
Date: Thu, 31 Mar 2011 09:07:11 -0400
To: bug-CGI [...] rt.cpan.org
From: Mark Stosberg <mark [...] summersault.com>
Download (untitled) / with headers
text/plain 115b
Kevin, Thanks for the report. I agree it looks like a mistake to set a charset for a non "text/*" type. Mark
Download (untitled) / with headers
text/plain 773b
Show quoted text
> This actually fails for us on some application/javascript which wasn't > forcing a charset, and I'll > fix that, but I'm not sure why CGI started adding that on between 3.50 > and 3.51
The reason for the change was documented in "Changes", which referenced the related RT ticket, which in turn referenced HTTP RFCs to justify the change. Here's the entry from "Changes" [BUG FIXES] - Setting charset() now works for all content types, not just "text/*". (RT#57945, Thanks to Yanick and Gerv.) Unless there are other RFC references which counter what we found through the previous ticket, the current behavior will stay. For now I'm marking this change request as "rejected", but feel free to re-open if you find RFC support for reverting the behavior.
Subject: Re: [rt.cpan.org #67100] charset applied to every content-type
Date: Thu, 16 Aug 2012 09:36:02 -0700
To: bug-CGI [...] rt.cpan.org
From: Thomas Sibley <trs [...] bestpractical.com>
Download (untitled) / with headers
text/plain 1.8k
On 08/15/2012 07:41 PM, MARKSTOS via RT wrote: Show quoted text
> <URL: https://rt.cpan.org/Ticket/Display.html?id=67100 > >
>> This actually fails for us on some application/javascript which >> wasn't forcing a charset, and I'll fix that, but I'm not sure why >> CGI started adding that on between 3.50 and 3.51
> > The reason for the change was documented in "Changes", which > referenced the related RT ticket, which in turn referenced HTTP RFCs > to justify the change. Here's the entry from "Changes" > > [BUG FIXES] - Setting charset() now works for all content types, not > just "text/*". (RT#57945, Thanks to Yanick and Gerv.)
I think there's some confusion here. The bug fix and ticket you mention above is about *allowing* a charset to be sent on non-text/ types. This bug is that CGI.pm is sending a default charset of ISO-8859-1 on *any* content-type even when *no charset* is set by the code. It seems, from first glance, that the change you quote above to always respect the charset has caused the side effect that the default textual charset is always sent, regardless of content-type. By example: tom@whaam ~ $ perl -MV=CGI CGI /opt/perlbrew/perls/perl-5.14.1/lib/5.14.1/CGI.pm: 3.50 tom@whaam ~ $ perl -MCGI -le 'my $q = CGI->new; print $q->header("text/plain");' Content-Type: text/plain; charset=ISO-8859-1 tom@whaam ~ $ perl -MCGI -le 'my $q = CGI->new; print $q->header("image/gif");' Content-Type: image/gif tom@whaam ~ $ cpanm -q M/MA/MARKSTOS/CGI.pm-3.51.tar.gz Successfully installed CGI.pm-3.51 1 distribution installed tom@whaam ~ $ perl -MV=CGI CGI /opt/perlbrew/perls/perl-5.14.1/lib/5.14.1/CGI.pm: 3.51 tom@whaam ~ $ perl -MCGI -le 'my $q = CGI->new; print $q->header("text/plain");' Content-Type: text/plain; charset=ISO-8859-1 tom@whaam ~ $ perl -MCGI -le 'my $q = CGI->new; print $q->header("image/gif");' Content-Type: image/gif; charset=ISO-8859-1
Subject: Re: [rt.cpan.org #67100] charset applied to every content-type
Date: Thu, 16 Aug 2012 12:51:41 -0400
To: bug-CGI [...] rt.cpan.org
From: Mark Stosberg <mark [...] summersault.com>
Download (untitled) / with headers
text/plain 334b
Thanks for the clarification, Tom. I see what you are saying. The two bug reports were not opposite of each other as I first thought. As you've clarified, it appears the last change "over-fixed" issue, and it should be dialed back some. Tests and a patch would be welcome. Otherwise, I'll get to it as time permits myself. Mark
Subject: Re: [rt.cpan.org #67100] charset applied to every content-type
Date: Thu, 16 Aug 2012 10:10:06 -0700
To: bug-CGI [...] rt.cpan.org
From: Thomas Sibley <trs [...] bestpractical.com>
Download (untitled) / with headers
text/plain 463b
On 08/16/2012 09:52 AM, mark@summersault.com via RT wrote: Show quoted text
> I see what you are saying. The two bug reports were not opposite of each > other as I first thought. As you've clarified, it appears the last > change "over-fixed" issue, and it should be dialed back some.
Agreed. Show quoted text
> Tests and a patch would be welcome. Otherwise, I'll get to it as time > permits myself.
Here's a couple tests. I don't have time at the moment to track down and fix the bug itself.

Message body is not shown because sender requested not to inline it.

Subject: Re: [rt.cpan.org #67100] charset applied to every content-type
Date: Thu, 16 Aug 2012 13:52:33 -0400
To: bug-CGI [...] rt.cpan.org
From: Mark Stosberg <mark [...] summersault.com>
Download (untitled) / with headers
text/plain 141b
Thanks for the tests! Maybe there's a college CS student out there somewhere who would like to apprentice as a CGI.pm maintainer... Mark
Download (untitled) / with headers
text/plain 1.7k
Show quoted text
> >> This actually fails for us on some application/javascript which > >> wasn't forcing a charset, and I'll fix that, but I'm not sure why > >> CGI started adding that on between 3.50 and 3.51
> > > > The reason for the change was documented in "Changes", which > > referenced the related RT ticket, which in turn referenced HTTP RFCs > > to justify the change. Here's the entry from "Changes" > > > > [BUG FIXES] - Setting charset() now works for all content types, not > > just "text/*". (RT#57945, Thanks to Yanick and Gerv.)
> > I think there's some confusion here. The bug fix and ticket you > mention > above is about *allowing* a charset to be sent on non-text/ types. > > This bug is that CGI.pm is sending a default charset of ISO-8859-1 on > *any* content-type even when *no charset* is set by the code. > > It seems, from first glance, that the change you quote above to always > respect the charset has caused the side effect that the default > textual > charset is always sent, regardless of content-type.
I reviewed this again tonight. I also agree with your point that it's a waste of bandwidth to send charset on a image/gif type by default where it doesn't apply. Reading the old ticket one more time, the request then wasn't that the charset be set for every content-type by default, it was that it was *allowed to be changed* via $self->charset() for non "text/*" types. It already worked for non-text types if the charset was supplied in every call to header(). Internally, perhaps the problem is the way the code is structured, it can't tell whether $self->charset() contains a value that is the global default, or one set by the user. Thus, some kind of re-working is required to provide the ideal solution.
Subject: Re: [rt.cpan.org #67100] charset applied to every content-type
Date: Tue, 15 Jan 2013 10:45:58 -0800
To: bug-CGI [...] rt.cpan.org
From: Thomas Sibley <trs [...] bestpractical.com>
Download (untitled) / with headers
text/plain 1.3k
On 01/11/2013 06:20 PM, MARKSTOS via RT wrote: Show quoted text
> I reviewed this again tonight. > > I also agree with your point that it's a waste of bandwidth to send > charset on a image/gif type by default where it doesn't apply. > > Reading the old ticket one more time, the request then wasn't that the > charset be set for every content-type by default, it was that it was > *allowed to be changed* via $self->charset() for non "text/*" types. It > already worked for non-text types if the charset was supplied in every > call to header().
Yep, that was my understanding as well, and what I was trying to express in my mail you quoted. I'm glad we're on the same page. :) Show quoted text
> Internally, perhaps the problem is the way the code is structured, it > can't tell whether $self->charset() contains a value that is the global > default, or one set by the user. Thus, some kind of re-working is > required to provide the ideal solution.
Correct. Back when this ticket was previously active, I dug into the CGI internals to try and come up with a quick patch. Unfortunately, I came to the same conclusion: the global default is indistinguishable from a user set default, so the entire method of ->charset() handling needs reworking. It's unfortunately not enough to simply compare the current ->charset() to the global default, since the user may have explicitly specified the global default which should be respected.
Download (untitled) / with headers
text/plain 934b
I think header() should be a subset of HTTP::Headers, and so charset() attribute should be read-only. In CGI.pm, the default value of the Content-Type header is set to "text/html; charset=ISO-8859-1". That's why charset() *returns* "ISO-8859-1" by default. By the way, why is HTTP::Headers::content_type_charset() read-only? In other words, how can we make CGI::charset() writable? In my opinion, charset() attribute of CGI.pm isn't writable. In fact, you can't set charset() in the following situation: my $q = CGI->new; $q->charset( 'utf-8' ); # set print $q->header( -type => 'text/plain; charSet=EUC-JP' ); # [STDOUT] # Content-Type: text/plain; charSet=EUC-JP; charset=utf-8 # In CGI.pm, the Content-Type header is separated into two properties: '-type' and '-charset'. To begin with, I don't think we can't *separate* the Content-Type header into two properties, while I don't know how to fix this problem ;)
Download (untitled) / with headers
text/plain 240b
This issue has been copied to: https://github.com/leejo/CGI.pm/issues/84 please take all future correspondence there. This ticket will remain open but please do not reply here. This ticket will be closed when the github issue is dealt with.
Download (untitled) / with headers
text/plain 865b
I've taken the path of least disruption here and amended the perldoc to headers to explain that passing -charset => '' is the workaround to nonsensical charsets. commit 37ad1ed2164635c9ed358493c33ae90c91448bfc Author: Lee Johnson <lee@givengain.ch> Date: Fri Oct 17 15:49:06 2014 +0200 resolve #84 [rt.cpan.org #67100] - document default charset being nonsensical in some situations and the workaround being to pass in -charset => '' in those cases. this is the path of least disruption as there is currently no way to tell if the default charset is one set by the user or the module, and the alternative of checking the content type against the charset is a no go IMO Changes | 7 +++++++ lib/CGI.pm | 8 +++++++- t/headers/type.t | 30 ++++++++++++++++++++++++++++++ 3 files changed, 44 insertions(+), 1 deletion(-)


This service is sponsored and maintained by Best Practical Solutions and runs on Perl.org infrastructure.

Please report any issues with rt.cpan.org to rt-cpan-admin@bestpractical.com.