|Subject:||CGI.pm - misleading documentation|
|Date:||Fri, 5 Feb 2010 15:20:16 +0100 (CET)|
|To:||bug-CGI.pm [...] rt.cpan.org|
|From:||Helmut Richter <Helmut.Richter [...] lrz.de>|
Hello,

this is not a report on a bug in CGI.pm (in fact it works perfectly, although the documentation warns against a very useful feature!) but on a problem in its documentation. If this is not the right address for such comments, please forward.

Mit besten Grüßen / Best regards
Helmut Richter

====================================================
Dr. Helmut Richter
Leibniz-Rechenzentrum        Tel: +49-89-35831-8785
Boltzmannstraße 1            Fax: +49-89-35831-9700
85748 Garching / Germany
====================================================

Problem
-------

The documentation says the following about the -utf8 pragma:

| -utf8
|
| This makes CGI.pm treat all parameters as UTF-8 strings. Use this with care,
| as it will interfere with the processing of binary uploads. It is better to
| manually select which fields are expected to return utf-8 strings and
| convert them using code like this:
|
|   use Encode;
|   my $arg = decode utf8=>param('foo');

I have the following qualms with it:

1. It is not at all obvious what exactly is meant by "treat all parameters as UTF-8 strings", or what the consequences for the user of CGI.pm are. The term "UTF-8 string" could mean "binary string containing UTF-8 encoded data"; this is not what is meant (and it is very fortunate that this is not what happens).

2. It is not true that it "interferes with the processing of binary uploads". Quite the contrary: it is a special feature of the -utf8 pragma that parameters are decoded from UTF-8 *without* interfering with binary uploads (I guess by first extracting the binary data and decoding only the remaining text). At least, I was not able to provoke any errors in binary uploads when using the -utf8 pragma: it decoded the form input data correctly without touching the binary upload data.

3. The unnecessary work-around in the last line is *not* a functional substitute for the effect of the -utf8 pragma. If one uses it, one still has to keep track of the encoding of parameters used as defaults, e.g.

     textfield(-name=>'field_name',
               -value=>'starting value',
               -size=>50,
               -maxlength=>80);

   will only work if the string for the starting value is ASCII; otherwise it must be replaced by "encode ('utf8', 'starting value')". Also, input parameter values can only be compared with constants after proper decoding. All this complicated and error-prone wizardry is unnecessary when using the -utf8 pragma. There is no reason to warn against it.

Again: there is no need to modify the implementation of CGI.pm -- it does exactly what is needed. Only the documentation needs to be updated to tell the user what CGI.pm really does.

Suggested new wording
---------------------

-utf8

This makes CGI.pm treat all parameters as text strings rather than binary strings (see *perlunitut* for the distinction), assuming UTF-8 for the encoding of input/output from/to the form. This is typically used in conjunction with a <form> tag containing the option 'accept-charset="UTF-8"' to ensure UTF-8 input from the form and with 'binmode (STDOUT, ":utf8")' to ensure UTF-8 output to the form, while all handling of the data within the perl script manipulates only text strings. CGI.pm does the decoding from the UTF-8 encoded input data, restricting this decoding to input text as distinct from binary upload data, which are left untouched. Therefore, a ':utf8' layer must *not* be used on STDIN.
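
For illustration, the text/binary distinction that the suggested wording relies on can be sketched with the core Encode module alone. This is only a minimal sketch and not CGI.pm's actual implementation: the sample value and all variable names are hypothetical, and the raw bytes merely stand in for what a form submitted as UTF-8 might deliver. It shows why comparing an undecoded parameter with a constant fails, and why decoding makes it work.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical raw form value: the UTF-8 bytes for "Gruesse" written with
# u-umlaut and sharp s (7 bytes, 5 characters).
my $raw_bytes = "Gr\xC3\xBC\xC3\x9Fe";

# The same word as a text string constant, written with code point escapes
# so that this file itself stays pure ASCII.
my $constant = "Gr\x{FC}\x{DF}e";

# On a binary string, length counts bytes.
my $byte_length = length $raw_bytes;

# Decoding yields a text string, where length counts characters.
my $text        = decode('UTF-8', $raw_bytes);
my $char_length = length $text;

# Comparison with a text constant only succeeds after decoding.
my $equal_before = ($raw_bytes eq $constant) ? 1 : 0;
my $equal_after  = ($text      eq $constant) ? 1 : 0;

# Encoding the text string again gives back the original bytes.
my $round_trip_ok = (encode('UTF-8', $text) eq $raw_bytes) ? 1 : 0;

print "bytes: $byte_length, characters: $char_length\n";
print "equal before decoding: $equal_before, after decoding: $equal_after\n";
print "round trip ok: $round_trip_ok\n";
```

With the -utf8 pragma, the decoding step would be done by CGI.pm for the text parameters, so the script would only ever see values like $text.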