
This queue is for tickets about the FCGI CPAN distribution.

Report information
The Basics
Id: 59328
Status: rejected
Priority: 0/
Queue: FCGI

People
Owner: Nobody in particular
Requestors: bitcard [...] cfs.parliant.com
Cc:
AdminCc:

Bug Information
Severity: Important
Broken in: 0.68_01
Fixed in: (no value)



Subject: FCGI won't accept unicode on output, fails with "Wide character in FCGI::Stream::PRINT"
When using the FCGI module, it seems that all output containing UTF8 data needs to have encode_utf8() called on the strings before calling print().

The attached sample script demonstrates the problem -- it prints a smiley face character twice -- once after calling encode_utf8(), and then again afterwards by directly printing the string with the unicode literal in it. The second smiley does not print, and instead Apache logs this error:

[Tue Jul 13 17:28:34 2010] [error] [client 192.168.1.222] FastCGI: server "/usr/local/concom/cgi-bin/fcgiunitest" stderr: Wide character in FCGI::Stream::PRINT at /usr/local/concom/cgi-bin/fcgiunitest line 29.

This change seems to have started being a problem in version 0.68_01 where code was added to FCGI.XL like this:

    #ifdef DO_UTF8
        if (DO_UTF8(ST(n)) && !sv_utf8_downgrade(ST(n), 1))
            croak("Wide character in FCGI::Stream::PRINT");
    #endif

In previous versions, the Unicode data passes through OK, but now I'm assuming that the sv_utf8_downgrade() call is failing and therefore the script fails. It is bad that all perl strings would need to have encode_utf8() called on them to make them safe for FCGI.

It seems that the fix described in the ChangeLog for this is not correct for all cases -- it seems that FCGI should allow print() calls using Unicode strings that were correctly created in normal perl ways. Is it possible that sv_utf8_downgrade() is not supposed to be used in this way?

I tried to put in a call to binmode(STDOUT,":encoding(utf8)") but this seems to have no effect (but it does work for CGI or command-line output). Is it possible that FCGI's stream handling is bypassing the perl IO filtering that is enabled by binmode?

The test server I'm working with is a current FreeBSD 8.0 with FCGI 0.71 and the perl 5.8.9_3 port. This problem has been verified on machines running both i386 and amd64 versions of FreeBSD on different machines.

perl -v reports:
This is perl, v5.8.9 built for amd64-freebsd
(with 1 registered patch, see perl -V for more detail)
Attachment: fcgiunitest (application/octet-stream, 883b)

On Tue, 13 Jul 2010 at 17:42:48, csaldanh wrote:
> When using the FCGI module, it seems that all output containing UTF8
> data needs to have encode_utf8() called on the strings before calling
> print().
This is true for all Unicode strings in Perl. It's possible to produce Unicode strings without an encoding, but you can't interchange them without one.
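
The distinction between a character string and its encoded octets can be sketched with the core Encode module. This is an illustrative example rather than code from the ticket, and the variable names are made up:

```perl
use strict;
use warnings;
use Encode qw(encode_utf8);

# A character string: a single code point (U+263A), no encoding chosen yet.
my $chars = "\x{263A}";

# The same text as UTF-8 octets, suitable for a byte-oriented output handle.
my $octets = encode_utf8($chars);

printf "characters: %d, octets: %d\n", length($chars), length($octets);
# prints "characters: 1, octets: 3"
```

Printing `$octets` rather than `$chars` is what the requestor's first, working smiley does.
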
> The attached sample script demonstrates the problem -- it prints a
> smiley face character twice -- once after calling encode_utf8(), and
> then again afterwards by directly printing the string with the unicode
> literal in it. The second smiley does not print, and instead Apache
> logs this error:
>
> [Tue Jul 13 17:28:34 2010] [error] [client 192.168.1.222] FastCGI:
> server "/usr/local/concom/cgi-bin/fcgiunitest" stderr: Wide character in
> FCGI::Stream::PRINT at /usr/local/concom/cgi-bin/fcgiunitest line 29.
>
>
> This change seems to have started being a problem in version 0.68_01
> where code was added to FCGI.XL like this:
>
> #ifdef DO_UTF8
> if (DO_UTF8(ST(n)) && !sv_utf8_downgrade(ST(n), 1))
> croak("Wide character in FCGI::Stream::PRINT");
> #endif
>
Correct.
> In previous versions, the Unicode data passes through OK, but now I'm
> assuming that the sv_utf8_downgrade() call is failing and therefore the
> script fails. It is bad that all perl strings would need to have
> encode_utf8() called on them to make them safe for FCGI.
>
This is incorrect. Previous versions passed perl's internal representation of Unicode (UTF-X), which may or may not be UTF-8 depending on the data and platform.
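
To illustrate why the internal representation is not a reliable wire format, here is a small sketch (not from the ticket, and assuming an ASCII-based platform) in which two strings holding the same character are stored with different byte lengths:

```perl
use strict;
use warnings;
require bytes;    # load bytes::length without enabling the pragma

my $native   = "\x{E9}";     # e-acute, stored as one native byte
my $upgraded = "\x{E9}";
utf8::upgrade($upgraded);    # same character, now stored as two bytes internally

my $same_text = $native eq $upgraded ? 1 : 0;
my @internal  = (bytes::length($native), bytes::length($upgraded));

print "same text: $same_text, internal bytes: @internal\n";
# prints "same text: 1, internal bytes: 1 2"
```

Code that writes out the internal bytes would emit different output for these two equal strings, which is the behavior the reply calls incorrect.
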
>
> It seems that the fix described in the ChangeLog for this is not correct
> for all cases -- it seems that FCGI should allow print() calls using
> Unicode strings that were correctly created in normal perl ways. Is it
> possible that sv_utf8_downgrade() is not supposed to be used in this way?
Usage of sv_utf8_downgrade() is correct: it attempts to downgrade the string to octets and fails if the string contains characters above 0xFF ("Wide character in %s").
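
The same check can be tried from Perl space with the built-in utf8::downgrade() function, passing a true second argument so that failure returns false instead of dying. This is a sketch for illustration, not the FCGI code itself:

```perl
use strict;
use warnings;

my $latin = "\x{E9}";
utf8::upgrade($latin);                        # force the UTF-8 internal form
my $latin_ok = utf8::downgrade($latin, 1);    # true: U+00E9 fits in one octet

my $wide    = "\x{263A}";
my $wide_ok = utf8::downgrade($wide, 1);      # false: U+263A is above 0xFF

printf "latin: %s, wide: %s\n",
    $latin_ok ? "ok" : "fails",
    $wide_ok  ? "ok" : "fails";
# prints "latin: ok, wide: fails"
```
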
> I tried to put in a call to binmode(STDOUT,":encoding(utf8)") but this
> seems to have no effect (but it does work for CGI or command-line
> output). Is it possible that FCGI's stream handling is bypassing the
> perl IO filtering that is enabled by binmode?
>
FCGI.pm uses the TIEHANDLE API for streams, not PerlIO.
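
A minimal sketch of what that means in practice follows. The CaptureHandle class is hypothetical and stands in for any tied handle; it is not how FCGI::Stream is implemented. On a tied handle, binmode() and print() are dispatched to the class's BINMODE and PRINT methods, so no PerlIO encoding layer sits between the caller and PRINT:

```perl
use strict;
use warnings;

# Hypothetical tied-handle class, for illustration only.
package CaptureHandle;

sub TIEHANDLE { my ($class) = @_; return bless { seen => [] }, $class }
sub BINMODE   { return 1 }    # accepts the request, installs no layer
sub PRINT     { my ($self, @args) = @_; push @{ $self->{seen} }, @args; return 1 }

package main;

tie *OUT, 'CaptureHandle';
binmode(OUT, ':encoding(UTF-8)');    # dispatched to CaptureHandle::BINMODE
print OUT "\x{263A}";                # dispatched to CaptureHandle::PRINT

my $seen = tied(*OUT)->{seen}[0];
printf "PRINT received %d character(s), code point U+%04X\n",
    length($seen), ord($seen);
# prints "PRINT received 1 character(s), code point U+263A"
```

PRINT sees the unencoded character string, so any encoding has to happen before the string reaches print().
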
>
> The test server I'm working with is a current FreeBSD 8.0 with FCGI 0.71
> and the perl 5.8.9_3 port. This problem has been verified on machines
> running both i386 and amd64 versions of FreeBSD on different machines.
>
> perl -v reports;
> This is perl, v5.8.9 built for amd64-freebsd
> (with 1 registered patch, see perl -V for more detail)
If you want the previous (FCGI.pm <= 0.68) incorrect behavior, you can disable the exception by using the C<bytes> pragma:

    {
        use bytes;
        print "\x{263A}";
    }

--
chansen
I'm rejecting this ticket as I don't believe there is anything to fix here. We've also arranged to patch the documentation to make this clearer in subsequent versions.

Thanks
t0m

