
This queue is for tickets about the Search-Dict CPAN distribution.

Report information
The Basics
Id: 97188
Status: open
Priority: 0/
Queue: Search-Dict

People
Owner: Nobody in particular
Requestors: SREZIC [...] cpan.org
Cc:
AdminCc:

Bug Information
Severity: (no value)
Broken in: 1.07
Fixed in: (no value)



Subject: Warnings when filehandle with utf8 layer is used
If look() is used with a filehandle that has a utf8 layer, and the file actually contains codepoints >= 128, then it's likely that warnings of the form

    # utf8 "\xBC" does not map to Unicode at /usr/share/perl/5.10/Search/Dict.pm line 76, <$fh> line 2.

are generated. See the attached test file for an example.

The reason for this problem: when doing the seek(), the file pointer can end up in the middle of a UTF-8 sequence, causing the (mandatory?) warning.

Regards,
    Slaven

Subject: search-dict-utf8.t
#!/usr/bin/perl

use strict;
use File::Temp 'tempfile';
use Search::Dict;
use Test::More 'no_plan';

my @warnings;
$SIG{__WARN__} = sub { push @warnings, @_ };

my $encoding = 'utf8';
#my $encoding = 'iso-8859-1';

my($tmpfh,$tmpfile) = tempfile(UNLINK => 1);
binmode $tmpfh, ":encoding($encoding)";
for (qw(abc def ghi jkl mno pqr stu vwx yz)) {
    print $tmpfh $_ . ("\x{fc}"x4096) . "\n";
}
close $tmpfh or die $!;

open my $fh, "<:encoding($encoding)", $tmpfile or die $!;
look $fh, 'vwx';
like scalar(<$fh>), qr{^vwx};
is_deeply join("\n",@warnings), "", "no warnings";

__END__

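The cause named in the report (a seek() that lands inside a multi-byte UTF-8 sequence) can be illustrated without Search::Dict at all. The following is a minimal sketch, not part of the ticket: it writes two "ü" characters (bytes C3 BC C3 BC), seeks to byte offset 1, which is the continuation byte, and reads through the same kind of :encoding(utf8) layer. The two-character file and the variable names are assumptions chosen for illustration.

```perl
use strict;
use warnings;
use File::Temp 'tempfile';

# Write a tiny UTF-8 file: each "\x{fc}" (u-umlaut) is stored as the bytes C3 BC.
my ($tmpfh, $tmpfile) = tempfile(UNLINK => 1);
binmode $tmpfh, ':encoding(utf8)';
print $tmpfh "\x{fc}\x{fc}\n";
close $tmpfh or die $!;

# Collect warnings instead of printing them to STDERR.
my @warnings;
local $SIG{__WARN__} = sub { push @warnings, @_ };

open my $fh, '<:encoding(utf8)', $tmpfile or die $!;

# seek() works on byte offsets, so offset 1 is the continuation byte 0xBC,
# i.e. the middle of the first character.
seek $fh, 1, 0 or die $!;
my $line = <$fh>;
close $fh;

print scalar(@warnings), " warning(s) captured\n";
print $warnings[0] if @warnings;
```

Search::Dict's binary search seeks to block boundaries, which are byte offsets too, so the same situation can arise whenever a block boundary falls inside a multi-byte character.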
On 2014-07-13 09:17:31, SREZIC wrote:
> If look() is used with a filehandle with a utf8 layer, and the file
> has actually codepoints >= 128, then it's likely that warnings in the
> form of
>
> # utf8 "\xBC" does not map to Unicode at
> /usr/share/perl/5.10/Search/Dict.pm line 76, <$fh> line 2.
>
> are generated. See the attached test file for an example.
>
> The reason for this problem: when doing the seek() it can happen that
> the file pointer ends up in the middle of the UTF-8 sequence, causing
> the (mandatory?) warning.
>
> Regards,
> Slaven

Currently my workaround is to suppress these warnings before calling look():

    local $SIG{__WARN__} = sub { push @warnings, grep { !/utf8 .* does not map to Unicode/ } @_ };
    Search::Dict::look(...)

Great bug! I wonder if temporarily setting the filehandle to "raw" would be a good solution. I hate messing with layers, though. Narrowing the scope of disabling warnings into the look-up might be a less invasive solution.
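A caller-side variant of the "raw" idea avoids changing layers on an existing handle: open a second handle in :raw mode, search with the UTF-8 encoded key, and decode the matching line afterwards. This is only a sketch under assumptions. The helper name look_raw_utf8 is hypothetical and not part of Search::Dict, the file is assumed to be valid UTF-8 sorted by code point, and no $dict or $fold options are used. It relies on the fact that comparing UTF-8 byte strings gives the same order as comparing the decoded code points.

```perl
use strict;
use warnings;
use Encode qw(encode decode);
use Search::Dict ();

# Hypothetical helper (not part of Search::Dict): look up $key in a sorted
# UTF-8 file through a :raw handle, so no decoding layer is active while
# look() seeks to arbitrary byte offsets.
sub look_raw_utf8 {
    my ($file, $key) = @_;

    open my $fh, '<:raw', $file or die "Cannot open $file: $!";

    # Compare bytes against bytes: encode the key the same way as the file.
    my $pos = Search::Dict::look($fh, encode('UTF-8', $key));
    return if $pos < 0;    # look() returns -1 on error

    # look() leaves the handle at the first line that is >= the key.
    my $line = <$fh>;
    return unless defined $line;

    # Decode only the complete line that was found.
    return decode('UTF-8', $line);
}
```

Because the decoding layer never sees a partial character, the warning cannot be triggered, at the cost of opening the file a second time.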
On 2015-09-06 06:36:28, SREZIC wrote:
> On 2014-07-13 09:17:31, SREZIC wrote:
> > If look() is used with a filehandle with a utf8 layer, and the file
> > has actually codepoints >= 128, then it's likely that warnings in the
> > form of
> >
> > # utf8 "\xBC" does not map to Unicode at
> > /usr/share/perl/5.10/Search/Dict.pm line 76, <$fh> line 2.
> >
> > are generated. See the attached test file for an example.
> >
> > The reason for this problem: when doing the seek() it can happen that
> > the file pointer ends up in the middle of the UTF-8 sequence, causing
> > the (mandatory?) warning.
> >
> > Regards,
> > Slaven
>
> Currently my workaround is to cease these warnings before calling
> look():
>
> local $SIG{__WARN__} = sub { push @warnings, grep { !/utf8 .* does not
> map to Unicode/ } @_ };
> Search::Dict::look(...)

Just for the record: since perl 5.28 the warning message is slightly different ("UTF-8" instead of "utf8"), so the workaround now looks like this:

    local $SIG{__WARN__} = sub { push @warnings, grep { !/(?:utf8|UTF-8) .* does not map to Unicode/ } @_ };
    Search::Dict::look(...)
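
To keep the suppression scoped to the lookup itself, the workaround can be wrapped in a small helper. The sketch below is one possible shape, not an API of Search::Dict: quiet_look is a hypothetical name, and it assumes that any previously installed __WARN__ handler is a code reference. Only the "does not map to Unicode" message (in both the old and the perl 5.28+ spelling) is dropped; every other warning is passed on.

```perl
use strict;
use warnings;
use Search::Dict ();

# Hypothetical wrapper (not part of Search::Dict): call look() with the
# "does not map to Unicode" warning filtered out for the duration of the call.
sub quiet_look {
    my @args = @_;

    # Remember the handler that was active before this call, if any.
    my $outer = $SIG{__WARN__};

    local $SIG{__WARN__} = sub {
        my ($msg) = @_;

        # Drop only the warning caused by seeking into a UTF-8 sequence.
        return if $msg =~ /(?:utf8|UTF-8) .* does not map to Unicode/;

        # Forward everything else to the previous handler, or to STDERR.
        if (ref $outer eq 'CODE') {
            $outer->($msg);
        }
        else {
            warn $msg;
        }
    };

    return Search::Dict::look(@args);
}
```

A call such as quiet_look($fh, 'vwx') then replaces Search::Dict::look($fh, 'vwx'), and the original handler is restored automatically when the helper returns.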

