
This queue is for tickets about the Encode CPAN distribution.

Report information
The Basics
Id: 88592
Status: open
Priority: 0/
Queue: Encode

People
Owner: Nobody in particular
Requestors: cpan [...] chmrr.net
dagolden [...] cpan.org
Cc:
AdminCc:

Bug Information
Severity: (no value)
Broken in: (no value)
Fixed in: (no value)



Subject: Can't turn off "does not map to Unicode" warnings
When using ":encoding(UTF-8)", I can't seem to turn off the "does not map to Unicode" warning with C<< no warnings "utf8" >> the way I can with the ":utf8" layer.

For example, trying to slurp a Latin-1-encoded file with :encoding(UTF-8):

$ perl -E 'use warnings; my $t = do { no warnings "utf8"; open my $fh, "<:encoding(UTF-8)", "t/lib/Latin1.pm"; local $/; <$fh> }; binmode (STDOUT, ":utf8"); say $t'
utf8 "\xE1" does not map to Unicode at -e line 1.
...blah blah Latin 1 text ...

But with the :utf8 layer I get no warning:

$ perl -E 'use warnings; my $t = do { no warnings "utf8"; open my $fh, "<:utf8", "t/lib/Latin1.pm"; local $/; <$fh> }; binmode (STDOUT, ":utf8"); say $t'
...blah blah Latin 1 text ...

Any ideas what could be going on?

Regards,
David
Because 'UTF-8' is NOT an alias of utf8. It is an alias of utf-8-strict, which checks for malformed byte sequences.

http://perldoc.perl.org/Encode.html#UTF-8-vs.-utf8-vs.-UTF8

Dan the Encode Maintainer

On Tue Sep 10 17:54:16 2013, DAGOLDEN wrote:
> When using ":encoding(UTF-8)", I can't seem to turn off the "does not
> map to Unicode" warning with C<< no warnings "utf8" >> the way I can
> with the ":utf8" layer.
> [...]
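Dan's point can be checked directly. According to the Encode documentation section linked above, 'UTF-8' and 'utf-8' (names are case-insensitive) resolve to the strict utf-8-strict encoding, while 'UTF8' and 'utf8' resolve to Perl's lax utf8. A minimal sketch that prints the canonical name Encode picks for a few spellings (the list of spellings is just an illustrative choice):

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use Encode qw(find_encoding);

# Resolve each spelling through Encode's alias table and record the
# canonical name of the encoding object it maps to.
my %canonical;
for my $name ('UTF-8', 'utf-8', 'UTF8', 'utf8') {
    my $enc = find_encoding($name) or die "Unknown encoding: $name";
    $canonical{$name} = $enc->name;
    printf "%-6s => %s\n", $name, $canonical{$name};
}
```

So ":encoding(UTF-8)" selects the strict decoder, which is why malformed Latin-1 bytes are reported at all.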
On Tue Sep 10 20:32:37 2013, DANKOGAI wrote:
> Because 'UTF-8' is NOT an alias of utf8. It is an alias of utf-8-strict
> which checks malformed byte sequence.
I understand that. I'm not complaining that it complains. I'm complaining that the warning can't be suppressed. It can be fatalized. It can be caught by $SIG{__WARN__}.

The docs are unclear about what exactly is supposed to happen. They say that "Encode::Unicode" (whatever that is) ignores CHECK and always croaks. That is *not* currently happening. Then the FB_DEFAULT documentation says that it uses a lexical warning category of "utf8" if the data is supposed to be UTF-8. But in my example, I say C<< no warnings 'utf8' >> with ":encoding(UTF-8)" yet still get warnings. That's the bug.

Would you mind looking into it a little further, please?

Thank you,
David
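For comparison with relying on lexical warnings, Encode::decode also accepts an explicit CHECK argument. The sketch below is not from the thread and does not exercise the :encoding layer; it only shows, on an assumed byte string, that FB_DEFAULT substitutes U+FFFD for malformed bytes while FB_CROAK turns them into an exception that eval can trap. LEAVE_SRC is added so decode does not consume the caller's buffer when CHECK is set.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use Encode qw(decode);

# Illustrative input (an assumption): Latin-1 "caf\xE9" followed by "!",
# which is not well-formed UTF-8.
my $bytes = "caf\xE9!";

# FB_DEFAULT (CHECK = 0): malformed bytes are replaced with U+FFFD.
my $lenient = decode('UTF-8', $bytes, Encode::FB_DEFAULT());

# FB_CROAK: malformed bytes make decode die; eval traps the error.
my $strict_ok = eval {
    decode('UTF-8', $bytes, Encode::FB_CROAK() | Encode::LEAVE_SRC());
    1;
};
my $error = $strict_ok ? '' : $@;

printf "lenient result contains U+FFFD: %s\n",
    (index($lenient, "\x{FFFD}") >= 0 ? 'yes' : 'no');
print "strict decode failed: $error" unless $strict_ok;
```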
Sorry. I just re-read what I wrote and realized it was still probably unclear. Reading from a layer with :encoding(UTF-8) throws a warning. It can be caught by $SIG{__WARN__}, so it's clearly being issued via the usual warnings code path. It can be made fatal with C<< use warnings FATAL => 'utf8' >>. But C<< no warnings 'utf8' >> doesn't suppress the warning. I can't figure out why fatalization would work but turning off the warning lexically wouldn't, since I would think that both should work if the category is being set correctly.
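Since the warning does reach $SIG{__WARN__}, one possible stopgap (not proposed in the thread) is to filter it there. This is a minimal sketch with a hypothetical helper, slurp_utf8_quietly, that drops only "does not map to Unicode" messages and re-emits everything else; the in-memory Latin-1 input is an assumption standing in for t/lib/Latin1.pm.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper, not part of Encode: slurp a source through
# :encoding(UTF-8), discarding only "does not map to Unicode" warnings.
# $source may be a file path or a reference to an in-memory byte string.
sub slurp_utf8_quietly {
    my ($source) = @_;
    local $SIG{__WARN__} = sub {
        my ($msg) = @_;
        return if $msg =~ /does not map to Unicode/;
        warn $msg;    # re-emit any other warning unchanged
    };
    open my $fh, '<:encoding(UTF-8)', $source
        or die "Cannot open source: $!";
    local $/;    # slurp mode
    my $text = <$fh>;
    close $fh;
    return $text;
}

# Latin-1 bytes for "caf\xE9\n"; 0xE9 starts an invalid UTF-8 sequence here.
my $latin1 = "caf\xE9\n";
my $text   = slurp_utf8_quietly(\$latin1);
print length($text), " characters read\n";
```

Because the handler is installed with local, the filtering only applies while the helper runs; it is a workaround, not a fix for the lexical-scope problem.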
On 2013-09-10T23:46:43-04:00, DAGOLDEN wrote:
> Sorry. I just re-read what I wrote and realized it was still probably
> unclear.
I just ran across this as well. The attached test file may help make the problem explicit, and it shows a very odd wrinkle as well: it is the additional stack frame from Encode::decode that causes the lexical warnings to be ignored.

- Alex
Subject: encoding-warning.pl
#!/usr/bin/env perl

use strict;
use warnings;

use Encode;
use Test::More;

my $valid   = "\x61\x00\x00\x00";
my $invalid = "\x78\x56\x34\x12";

my @warnings;
$SIG{__WARN__} = sub {push @warnings, "@_"};

my $enc = find_encoding("UTF32-LE");

diag "This is perl $^V, Encode $Encode::VERSION";

{
    @warnings = ();
    my $ret = Encode::Unicode::decode( $enc, $valid );
    is("@warnings", "", "Calling decode in Encode::Unicode on valid string produces no warnings");
}

{
    @warnings = ();
    my $ret = Encode::Unicode::decode( $enc, $invalid );
    like("@warnings", qr/is not Unicode/, "Calling decode in Encode::Unicode on invalid string warns");
}

{
    no warnings 'utf8';
    @warnings = ();
    my $ret = Encode::Unicode::decode( $enc, $invalid );
    is("@warnings", "", "Warning from decode in Encode::Unicode can be silenced via no warnings 'utf8'");
}

{
    no warnings;
    @warnings = ();
    my $ret = Encode::Unicode::decode( $enc, $invalid );
    is("@warnings", "", "Warning from decode in Encode::Unicode can be silenced via no warnings");
}

{
    @warnings = ();
    my $ret = Encode::decode( $enc, $invalid );
    like("@warnings", qr/is not Unicode/, "Calling decode in Encode on invalid string warns");
}

{
    no warnings 'utf8';
    @warnings = ();
    my $ret = Encode::decode( $enc, $invalid );
    is("@warnings", "", "Warning from decode in Encode can be silenced via no warnings 'utf8'");
}

{
    no warnings;
    @warnings = ();
    my $ret = Encode::decode( $enc, $invalid );
    is("@warnings", "", "Warning from decode in Encode can be silenced via no warnings");
}

done_testing;

__END__
# This is perl v5.20.0, Encode 2.63
ok 1 - Method call on valid string produces no warnings
ok 2 - Method call on invalid string warns
ok 3 - Warning from method call can be silenced via no warnings 'utf8'
ok 4 - Warning from method call can be silenced via no warnings
ok 5 - Function call on invalid string warns
not ok 6 - Warning from function call can be silenced via no warnings 'utf8'
#   Failed test 'Warning from function call can be silenced via no warnings 'utf8''
#   at wat.pl line 55.
#          got: 'Code point 0x12345678 is not Unicode, may not be portable at /home/chmrr/prog/perlbrew/perls/perl-5.20.0/lib/site_perl/5.20.0/x86_64-linux/Encode.pm line 175.
# '
#     expected: ''
not ok 7 - Warning from function call can be silenced via no warnings 'utf8'
#   Failed test 'Warning from function call can be silenced via no warnings 'utf8''
#   at wat.pl line 62.
#          got: 'Code point 0x12345678 is not Unicode, may not be portable at /home/chmrr/prog/perlbrew/perls/perl-5.20.0/lib/site_perl/5.20.0/x86_64-linux/Encode.pm line 175.
# '
#     expected: ''
1..7
# Looks like you failed 2 tests of 7.
On Tue Oct 21 00:59:52 2014, ALEXMV wrote:
> I just ran across this as well. The attached test file may help make
> the problem explicit, and shows a very odd wrinkle as well -- namely,
> that it is the additional stack frame from Encode::decode is causing
> the lexical warnings to not be observed.
Great insight! I took your test file and patched Encode.pm to catch and re-issue the warning. Pull request is here: https://github.com/dankogai/p5-encode/pull/26
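The pull request's actual code is not reproduced here. As a rough, self-contained illustration of the general catch-and-re-issue idea, the sketch below uses hypothetical packages Noisy and Quiet: Noisy::work emits a plain warn that no caller's `no warnings` can suppress (standing in for the decoder), and Quiet::decode_quietly traps it locally, then re-issues it with warnings::warnif('utf8', ...), which consults the lexical warnings of the code that called the wrapper.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical stand-in for the decoder: a plain warn ignores the
# caller's lexical `no warnings 'utf8'`.
package Noisy;
sub work {
    warn "simulated: \"\\xE1\" does not map to Unicode\n";
    return 'decoded';
}

# Hypothetical wrapper: trap warnings while calling the noisy code, then
# re-issue them so the caller's lexical 'utf8' setting decides.
package Quiet;
sub decode_quietly {
    my @caught;
    my $ret;
    {
        local $SIG{__WARN__} = sub { push @caught, $_[0] };
        $ret = Noisy::work();
    }
    for my $msg (@caught) {
        $msg =~ s/ at .+ line \d+\.\n\z//;   # drop any inner "at FILE line N."
        chomp $msg;
        warnings::warnif('utf8', $msg);      # reported at the caller's line
    }
    return $ret;
}

package main;

my @seen;
$SIG{__WARN__} = sub { push @seen, $_[0] };

{
    no warnings 'utf8';
    Quiet::decode_quietly();
}
my $quiet_count = @seen;

{
    Quiet::decode_quietly();
}
my $loud_count = @seen - $quiet_count;

print "with no warnings 'utf8': $quiet_count warning(s)\n";
print "with warnings enabled:   $loud_count warning(s)\n";
```

The intent is that the first call is silenced by the caller's `no warnings 'utf8'` while the second still warns, which is the behaviour the test file above expects from Encode::decode.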


This service is sponsored and maintained by Best Practical Solutions and runs on Perl.org infrastructure.

Please report any issues with rt.cpan.org to rt-cpan-admin@bestpractical.com.