This queue is for tickets about the FCGI CPAN distribution.

Report information
The Basics
Id: 56437
Status: resolved
Priority: 0/
Queue: FCGI

People
Owner: bobtfish [...] bobtfish.net
Requestors: eric_a_benson [...] yahoo.com
Cc:
AdminCc:

Bug Information
Severity: (no value)
Broken in: (no value)
Fixed in: (no value)



Subject: Unicode trouble with FCGI-0.69 and higher using Mason with CGI::Fast
Date: Fri, 9 Apr 2010 11:38:16 -0700 (PDT)
To: mason-users [...] lists.sourceforge.net, bug-FCGI [...] rt.cpan.org, bug-CGI [...] rt.cpan.org
From: Eric Benson <eric_a_benson [...] yahoo.com>
I'm sending this bug report to the mason-users list as well as the bug reporting mail addresses for the CGI and FCGI Perl modules.

A change in Unicode string handling in the FCGI module between version 0.68 and 0.69 is causing problems for Mason applications deployed with FastCGI, using the CGI::Fast module. I observed this problem in my own application and discovered by searching the Internet that it had been reported previously.

The ChangeLog for FCGI shows this entry for version 0.68_01:

    o Fix UTF-8 double encoding when FCGI is passed octets by downgrading
      them into bytes correctly. Fixes RT#52400 <chansen@cpan.org>

This appears to be the source of the trouble. A problem report was sent to the mason-users list on March 10 as well as the Request Tracker Users list on March 15 by Vitaly Tskhovrebov:

----------------------------------------------------------------------------------------------
Hello.

I'm trying to use mason with RT 3.8.7 and when I starting it with
FCGI_SOCKET_PATH=/opt/rt3/var/nginx/rt3.fcgi.socket perl /opt/rt3/bin/mason_handler.fcgi &

and trying to browse to, it may return this error:
Wide character in FCGI::Stream::PRINT at /usr/lib64/perl5/site_perl/5.8.8/HTML/Mason/CGIHandler.pm line 106, line 323.

I believe that's related to utf-8 handling, but I Can't understand how to turn on utf-8 by default at whole mason's engine.
-----------------------------------------------------------------------------------------------

He received this response on the Request Tracker Users list from Ruslan Zakirov:

-----------------------------------------------------------------------------------------------
Hello Vitaly.

Downgrade FCGI module to version 0.68.
-----------------------------------------------------------------------------------------------

Vitaly reported that this solved his problem and I also observed that it solved the problem in my application.

However, obviously we cannot rely on an obsolete version of a Perl module forever, especially since subsequent versions of FCGI have fixes needed for upcoming Perl version 5.12. Clearly the change described in version 0.68_01 was desired to fix some application, but it seems to have broken other applications. It seems that either that change to FCGI should be modified so that Mason applications with UTF-8 work, or Mason itself should be modified, or the CGI::Fast module should be modified. Perhaps a new parameter is required to be supplied to CGI::Fast or HTML::Mason::CGIHandler to make it UTF-8 aware.
RT-Send-CC: chansen [...] cpan.org
Hi Eric

On Fri Apr 09 14:38:31 2010, eric_a_benson@yahoo.com wrote:
> A change in Unicode string handling in the FCGI module between version
> 0.68 and 0.69 is causing problems for Mason applications deployed
> with FastCGI, using the CGI::Fast module. I observed this problem
> in my own application and discovered by searching the Internet that
> it had been reported previously.
>
> The ChangeLog for FCGI shows this entry for version 0.68_01:
>
> o Fix UTF-8 double encoding when FCGI is passed octets by
> downgrading
> them into bytes correctly. Fixes RT#52400 <chansen@cpan.org>

<snip>

I've copied chansen into this mail also.

> I'm trying to use mason with RT 3.8.7 and when I starting it with
> FCGI_SOCKET_PATH=/opt/rt3/var/nginx/rt3.fcgi.socket perl
> /opt/rt3/bin/mason_handler.fcgi &
>
> and trying to browse to, it may return this error:
> Wide character in FCGI::Stream::PRINT at
> /usr/lib64/perl5/site_perl/5.8.8/HTML/Mason/CGIHandler.pm line 106,
> line 323.
>
> I believe that's related to utf-8 handling, but I Can't understand how
> to turn on utf-8 by default at whole mason's engine.

<snip>

> He received this response on the Request Tracker Users list from
> Ruslan Zakirov:

<snip>

> Downgrade FCGI module to version 0.68.

<snip>

> Vitaly reported that this solved his problem and I also observed that
> it solved the problem in my application. However, obviously we
> cannot rely on an obsolete version of a Perl module forever,
> especially since subsequent versions of FCGI have fixes needed for
> upcoming Perl version 5.12. Clearly the change described in version
> 0.68_01 was desired to fix some application, but it seems to have
> broken other applications. It seems that either that change to FCGI
> should be modified so that Mason applications with UTF-8 work, or
> Mason itself should be modified, or the CGI::Fast module should be
> modified. Perhaps a new parameter is required to be supplied to
> CGI::Fast or HTML::Mason::CGIHandler to make it UTF-8 aware.

I'm afraid that there is no intention of changing FCGI at this point.

What was previously happening is that FCGI had no concept of perl's UTF-8 handling whatsoever. Therefore, if you output character data which was in the high bit range (i.e. utf8 upgraded characters), then you would silently get data corruption... Not a good thing. :(

FCGI was changed such that if perl knows how to / is able to downgrade the character string into a byte string, then this is done (giving correct output), otherwise an error is thrown.

Whilst, unfortunately, this causes some things which used to silently "work" (for values of work meaning silently corrupting your data) to fail now, resulting in the issue that you're seeing, I'm 100% convinced that the behavior change is entirely correct.

I'll keep this ticket open as a reference to track the issues with Mason/CGI::Fast, but someone would need to convince me (in the form of patches and tests) to change the code...

I _would_ support and take a patch to add a SILENTLY_CORRUPT_UNENCODED_CHARACTER_DATA_I_KNOW_WHAT_I_AM_DOING_AND_LIKE_MOJIBAKE flag (or something similar), which would optionally disable the new behavior, if the user was prepared to specify very plainly that they explicitly want the broken behavior.

Cheers
t0m
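
The "downgrade if possible, otherwise fail" rule described above can be illustrated with Perl's core utf8::downgrade function. This is only a sketch of the general mechanism, not FCGI's actual code; the variable names are made up for the example, and whether FCGI calls this exact function internally is an assumption.

```perl
use strict;
use warnings;

# A character string whose code points all fit in one byte (U+00E9).
my $latin = "caf\x{e9}";
utf8::upgrade($latin);    # force Perl's internal upgraded representation

# The second argument asks utf8::downgrade to return false instead of dying.
my $ok = utf8::downgrade($latin, 1);

# A character string containing U+263A, which cannot fit in one byte.
my $wide   = "smile \x{263A}";
my $failed = !utf8::downgrade($wide, 1);

printf "latin string downgraded: %s\n", $ok     ? 'yes' : 'no';
printf "wide string rejected:    %s\n", $failed ? 'yes' : 'no';
```

The first string can be turned back into plain bytes, so it could be written out unchanged. The second cannot, which corresponds to the "Wide character" failure reported in this ticket: the application has to encode such a string (for example to UTF-8) before printing it.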
Subject: Re: [rt.cpan.org #56437] Unicode trouble with FCGI-0.69 and higher using Mason with CGI::Fast
Date: Sat, 10 Apr 2010 17:11:36 +0200
To: bug-FCGI [...] rt.cpan.org
From: Christian Hansen <christian.hansen [...] mac.com>
On 10 Apr 2010, at 09:45, Tomas Doran via RT wrote:

> <URL: https://rt.cpan.org/Ticket/Display.html?id=56437 >
>
> Hi Eric
>
> On Fri Apr 09 14:38:31 2010, eric_a_benson@yahoo.com wrote:
>> A change in Unicode string handling in the FCGI module between
>> version
>> 0.68 and 0.69 is causing problems for Mason applications deployed
>> with FastCGI, using the CGI::Fast module. I observed this problem
>> in my own application and discovered by searching the Internet that
>> it had been reported previously.
>>
>> The ChangeLog for FCGI shows this entry for version 0.68_01:
>>
>> o Fix UTF-8 double encoding when FCGI is passed octets by
>> downgrading
>> them into bytes correctly. Fixes RT#52400 <chansen@cpan.org>
>
> <snip>
>
> I've copied chansen into this mail also.
>
>> I'm trying to use mason with RT 3.8.7 and when I starting it with
>> FCGI_SOCKET_PATH=/opt/rt3/var/nginx/rt3.fcgi.socket perl
>> /opt/rt3/bin/mason_handler.fcgi &
>>
>> and trying to browse to, it may return this error:
>> Wide character in FCGI::Stream::PRINT at
>> /usr/lib64/perl5/site_perl/5.8.8/HTML/Mason/CGIHandler.pm line 106,
>> line 323.
>>
>> I believe that's related to utf-8 handling, but I Can't understand
>> how
>> to turn on utf-8 by default at whole mason's engine.
>
> <snip>
>
>> He received this response on the Request Tracker Users list from
>> Ruslan Zakirov:
>
> <snip>
>
>> Downgrade FCGI module to version 0.68.
>
> <snip>
>
>> Vitaly reported that this solved his problem and I also observed that
>> it solved the problem in my application. However, obviously we
>> cannot rely on an obsolete version of a Perl module forever,
>> especially since subsequent versions of FCGI have fixes needed for
>> upcoming Perl version 5.12. Clearly the change described in version
>> 0.68_01 was desired to fix some application, but it seems to have
>> broken other applications. It seems that either that change to FCGI
>> should be modified so that Mason applications with UTF-8 work, or
>> Mason itself should be modified, or the CGI::Fast module should be
>> modified. Perhaps a new parameter is required to be supplied to
>> CGI::Fast or HTML::Mason::CGIHandler to make it UTF-8 aware.
>

The problem is that UTF-X may or may not be well-formed UTF-8, it could also be UTF-EBCDIC, but you have to understand that you are trying to output Perl's internal encoding and *not* UTF-8.

Current behavior is desirable, if you wish to change this you need to convince p5p as it's beyond the scope for an extension module.

<snip>

> I _would_ support and take a patch to add a
> SILENTLY_CORRUPT_UNENCODED_CHARACTER_DATA_I_KNOW_WHAT_I_AM_DOING_AND_LIKE_MOJIBAKE
> flag (or something similar), which would optionally disable the new
> behavior, if the user was prepared to specify very plainly that they
> explicitly want the broken behavior.

No need for a patch, users who prefer the previously broken behavior can add "use bytes;" which FCGI.XS respects.

--
chansen
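
For readers unfamiliar with the pragma mentioned above, here is a minimal sketch of what "use bytes" changes in plain Perl, independent of FCGI. It does not exercise FCGI at all; that FCGI honours the pragma is taken from the message above, not demonstrated here. The byte count assumes an ASCII-based (non-EBCDIC) platform.

```perl
use strict;
use warnings;

my $str = "\x{263A}";    # one character: WHITE SMILING FACE

# Character semantics (the default): length counts characters.
my $chars = length $str;

my $octets;
{
    use bytes;           # lexically scoped: byte semantics inside this block only
    # length now counts the bytes of Perl's internal representation.
    $octets = length $str;
}

print "characters: $chars, internal bytes: $octets\n";
```

Under the pragma the string is treated as whatever bytes Perl happens to store internally, which is why it restores the old output behaviour. The bytes documentation itself discourages using the pragma for anything other than debugging, so encoding explicitly is probably the safer long-term route.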
Subject: Re: [rt.cpan.org #56437] Unicode trouble with FCGI-0.69 and higher using Mason with CGI::Fast
Date: Sat, 10 Apr 2010 11:04:58 -0700 (PDT)
To: bug-FCGI [...] rt.cpan.org, mason-users [...] lists.sourceforge.net
From: Eric Benson <eric_a_benson [...] yahoo.com>
> The problem is that UTF-X may or may not be well-formed UTF-8 , it could
> also be UTF-EBCDIC, but you have to understand that you are trying to output
> Perl's internal encoding and *not* UTF-8.
>
> Current behavior is desirable, if you wish to change this you need to convince
> p5p as it's beyond the scope for an extension module.

I understand why this is desirable, I am just trying to understand how to fix my application that previously worked, even if it only worked accidentally. I don't think there's any change available to me at the application level that will solve this.

My application doesn't use FCGI directly. It creates a CGI::Fast object and passes that to HTML::Mason::CGIHandler::handle_cgi_object. As far as I can tell, neither CGI::Fast nor Mason has any special handling for Unicode. Mason just passes strings to the print method of the STDOUT stream.

Most Mason applications are deployed using mod_perl. I am not using mod_perl, but I assume that those applications are currently working with Unicode. What is the change required to enable Mason to work with FastCGI with Unicode strings? Do I need to re-implement or create a subclass of CGI::Fast that doesn't use FCGI directly but uses a wrapper object that does the Perl string to raw bytes conversion itself?
RT-Send-CC: mason-users [...] lists.sourceforge.net, christian.hansen [...] mac.com, bug-CGI [...] rt.cpan.org
On Sat, 10 Apr 2010 at 14:05:13, eric_a_benson@yahoo.com wrote:
> > The problem is that UTF-X may or may not be well-formed UTF-8 , it could
> > also be UTF-EBCDIC, but you have to understand that you are trying to output
> > Perl's internal encoding and *not* UTF-8.
>
> > Current behavior is desirable, if you wish to change this you need to convince
> > p5p as it's beyond the scope for an extension module.
>
> I understand why this is desirable, I am just trying to understand how
> to fix my application that previously worked, even if it only
> worked accidentally. I don't think there's any change available to
> me at the application level that will solve this. My application
> doesn't use FCGI directly. It creates a CGI::Fast object and passes
> that to HTML::Mason::CGIHandler::handle_cgi_object. As far as I can
> tell, neither CGI::Fast nor Mason has any special handling for
> Unicode. Mason just passes strings to the print method of the
> STDOUT stream. Most Mason applications are deployed using mod_perl.
> I am not using mod_perl, but I assume that those applications are
> current working with Unicode. What is the change required to enable
> Mason to work with FastCGI with Unicode strings? Do I need to re-
> implement or create a subclass of CGI::Fast that doesn't use FCGI
> directly but uses a wrapper object that does the Perl string to raw
> bytes conversion itself?

Mason developers should consider implementing support for encodings, that's the proper place to fix it IMO.

I can think of two short term solutions:

1) Monkey/Hot patch FCGI::Stream

$ cat cgi-hotpatch.pl
#!/usr/bin/perl

use strict;
use warnings;

use CGI::Fast qw[];
use FCGI      qw[];
use Encode    qw[];

my $enc = Encode::find_encoding('UTF-8');
my $org = \&FCGI::Stream::PRINT;

no warnings 'redefine';

local *FCGI::Stream::PRINT = sub {
    for (my $i = 1; $i < @_; $i++) {
        $_[$i] = $enc->encode($_[$i], Encode::FB_CROAK|Encode::LEAVE_SRC);
    }
    goto $org;
};

while (my $q = CGI::Fast->new) {
    print $q->start_html, $q->p("\x{263A}"), $q->end_html;
}

2) Implement a stream class which does the encoding. untie() current STDOUT and tie() it with new stream class.

$ cat cgi-tiehandle.pl
#!/usr/bin/perl

use strict;
use warnings;

{
    package MyStream;

    use Encode qw[FB_CROAK LEAVE_SRC];

    sub PRINT {
        my $self = shift;
        $_ = $self->{encoding}->encode($_, FB_CROAK|LEAVE_SRC) for @_;
        return $self->{stream}->PRINT(@_);
    }

    sub TIEHANDLE {
        my ($class, $stream, $encoding) = @_;
        return bless { stream => $stream, encoding => $encoding }, $class;
    }
}

{
    package MyFast;
    use base 'CGI::Fast';

    sub new {
        my $class   = shift;
        my $charset = shift;
        my $self    = $class->SUPER::new(@_);
        my $stream  = tied *STDOUT;

        $self->charset($charset);

        no warnings 'untie';
        tie *STDOUT, 'MyStream', $stream, Encode::find_encoding($charset);

        return $self;
    }
}

while (my $q = MyFast->new('UTF-8')) {
    print $q->start_html, $q->p("\x{263A}"), $q->end_html;
}

We (the FCGI developers) should look into moving away from tiehandle to PerlIO for streams, this would make it easier for users as they can use the encoding pragma:

use encoding 'UTF-8';

--
chansen
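
The second approach above (a tied handle that encodes before passing data on) can be tried without a FastCGI server. The following is a minimal sketch of the same idea using an in-memory filehandle as the underlying stream, so the encoded octets can be inspected. The package name EncodingHandle and the handle name OUT are hypothetical, chosen only for this illustration; it is not the FCGI or CGI::Fast implementation.

```perl
use strict;
use warnings;
use Encode ();

{
    package EncodingHandle;    # hypothetical name, for illustration only

    # Remember the real output handle and the encoding object to apply.
    sub TIEHANDLE {
        my ($class, $sink, $encoding) = @_;
        return bless {
            sink => $sink,
            enc  => Encode::find_encoding($encoding),
        }, $class;
    }

    # Encode copies of the arguments to octets, then print them to the sink.
    sub PRINT {
        my ($self, @chunks) = @_;
        my $check  = Encode::FB_CROAK() | Encode::LEAVE_SRC();
        my @octets = map { $self->{enc}->encode($_, $check) } @chunks;
        return print { $self->{sink} } @octets;
    }
}

# An in-memory filehandle stands in for the FastCGI output stream.
my $captured = '';
open my $sink, '>', \$captured or die "cannot open in-memory handle: $!";

tie *OUT, 'EncodingHandle', $sink, 'UTF-8';
print OUT "smile: \x{263A}\n";    # a character string with a wide character
untie *OUT;
close $sink;

printf "%d octets written\n", length $captured;
```

Because PRINT works on copies of its arguments, the caller's strings are left untouched, which avoids modifying values passed in by reference through @_.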
Subject: Re: [rt.cpan.org #56437] Unicode trouble with FCGI-0.69 and higher using Mason with CGI::Fast
Date: Thu, 15 Apr 2010 19:19:48 -0700 (PDT)
To: bug-FCGI [...] rt.cpan.org
From: Eric Benson <eric_a_benson [...] yahoo.com>
> my $enc = Encode::find_encoding('UTF-8');
> my $org = \&FCGI::Stream::PRINT;
>
> no warnings 'redefine';
>
> local *FCGI::Stream::PRINT = sub {
>     for (my $i = 1; $i < @_; $i++) {
>         $_[$i] = $enc->encode($_[$i], Encode::FB_CROAK|Encode::LEAVE_SRC);
>     }
>     goto $org;
> };

Thanks for that! My app is working great again with the latest FCGI and this hot patch.
I'm going to resolve this as there isn't anything to fix in FCGI here.

