This queue is for tickets about the Encode CPAN distribution.

Report information
The Basics
Id: 44523
Status: resolved
Priority: 0/
Queue: Encode

People
Owner: Nobody in particular
Requestors: jquelin [...] cpan.org
Cc:
AdminCc:

Bug Information
Severity: (no value)
Broken in: (no value)
Fixed in: (no value)



Subject: files containing a NULL byte reported as UTF-16LE by Encode::Guess
The attached file contains "foo<null>bar", where <null> is the null byte (ctrl+v ctrl+0 in vim, or ctrl+q in emacs).

This file is detected as UTF-16LE by Encode::Guess, as demonstrated by this snippet:

    $ perl -MEncode::Guess -E '$a=qx{cat null}; say guess_encoding($a,"ascii")->name;'
    UTF-16LE

And of course, using this detected encoding to decode the file yields very strange results:

    $ perl -MEncode -E '$a=qx{cat null}; $b=decode("UTF-16LE",$a); say $b'
    Wide character in print at -e line 1.
    潦o慢ੲ

This happens with Encode 2.32, which provides Encode::Guess 2.03.
Attachment: null
application/octet-stream, 8 bytes (not shown because it is not plain text)
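You can reproduce the behaviour without the attachment. The sketch below assumes the 8-byte file is "foo", a NUL byte, "bar" and a trailing newline. That layout is our inference from the 8b size and from the final ੲ (U+0A72, which comes from the bytes 0x72 0x0A) in the decoded output. The report does not state it explicitly.

```perl
use strict;
use warnings;
use Encode::Guess;    # exports guess_encoding()

# Assumed content of the 8-byte attachment: "foo", NUL, "bar", newline.
my $octets = "foo\x00bar\n";

# Same call as the report's one-liner, with "ascii" as the extra suspect.
# guess_encoding() returns an encoding object on success and a plain
# error string when it cannot decide, so check ref() before ->name.
my $enc  = guess_encoding($octets, 'ascii');
my $name = ref($enc) ? $enc->name : "no guess: $enc";
print "$name\n";
```

With Encode 2.32 the report shows UTF-16LE for this input. The ref() check matters because guess_encoding returns a string, not an object, when it cannot decide. In that case the one-liner in the report would die with a "Can't locate object method" error instead of printing a name.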

On Tue Mar 24 12:00:26 2009, JQUELIN wrote:
> this file is detected as UTF-16LE by Encode::Guess
No, that's not a bug. That's what UTF-(16|32)(LE|BE) is all about, i.e. \x20\x00 is VALID UTF-16LE and it means \x{0020}.

Dan the Maintainer Thereof
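You can check Dan's point one byte pair at a time. In UTF-16LE, every two bytes form one code unit, with the low byte first. This sketch decodes the same assumed 8 bytes ("foo\x00bar\n") as UTF-16LE. It lists the resulting code points rather than printing the characters, which also avoids the "Wide character in print" warning seen in the report.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Assumed content of the attachment: "foo", NUL, "bar", newline (8 bytes).
my $octets = "foo\x00bar\n";

# UTF-16LE reads each byte pair low byte first:
# 66 6F -> U+6F66, 6F 00 -> U+006F, 62 61 -> U+6162, 72 0A -> U+0A72
my $text = decode('UTF-16LE', $octets);

# Print code points instead of the characters themselves, so no
# "Wide character in print" warning is triggered.
my @points = map { sprintf 'U+%04X', ord($_) } split //, $text;
print "@points\n";    # U+6F66 U+006F U+6162 U+0A72

# Dan's example: bytes 0x20 0x00 decode to U+0020, a plain space.
printf "U+%04X\n", ord(decode('UTF-16LE', "\x20\x00"));    # U+0020
```

All four units are valid, non-surrogate code points, so the decode succeeds. They correspond to the 潦o慢ੲ shown in the report.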
I understand that the sequence is valid UTF-16. What I'm objecting to is that it's not the best guess in this case...

What should I do to get a correct guess?
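Why LE and not BE? The following is our reading of the Encode::Guess source. It is an implementation detail, not documented interface, and may differ between versions. On that reading, any NUL byte in BOM-less input makes the module assume UTF-16 or UTF-32 and then vote on byte order. The suspects passed in (such as "ascii") are never consulted on that path. guess_utf16_order is a hypothetical, simplified re-implementation of the UTF-16 branch of that vote, not a function the module exports.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Encode::Guess): a simplified sketch of
# the byte-order vote we believe Encode::Guess runs on BOM-less input that
# contains a NUL byte. The real module first checks for "\x00\x00" and then
# votes on 32-bit units instead; that UTF-32 branch is omitted here.
sub guess_utf16_order {
    my ($octets) = @_;
    return unless $octets =~ /\x00/;    # no NUL: this heuristic does not apply

    my ($be, $le) = (0, 0);
    # Read the input as big-endian 16-bit units; an odd trailing byte is dropped.
    for my $unit (unpack 'n*', $octets) {
        $be++ if $unit & 0x00FF;        # second byte non-zero: looks like BE text
        $le++ if $unit & 0xFF00;        # first byte non-zero: looks like LE text
    }
    return if $be == $le;               # tie: byte order is ambiguous
    return $be > $le ? 'UTF-16BE' : 'UTF-16LE';
}

my $order = guess_utf16_order("foo\x00bar\n");
print defined $order ? $order : 'ambiguous', "\n";    # UTF-16LE (be = 3, le = 4)
```

For the attachment, only the unit 0x6F00 has a zero byte, so LE wins the vote 4 to 3. Nothing in this vote can ever produce "ascii", which matches the report even though "ascii" was passed as a suspect.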
On Tue Mar 24 13:28:34 2009, JQUELIN wrote:
> what i'm objecting is that it's not the best guess in this case...
Of course it is not the best. After all, it is guessing, and so long as the data appears valid, it returns the only valid guess. Read perldoc Encode::Guess one more time.

Dan the Encode Maintainer
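One way to get the result the reporter expected assumes you know which encodings are plausible for your files. Try those strictly with Encode's decode first, and use guess_encoding only as a fallback. decode_preferring below is a hypothetical helper, not part of Encode or Encode::Guess. It relies only on the documented FB_CROAK and LEAVE_SRC check flags.

```perl
use strict;
use warnings;
use Encode qw(decode FB_CROAK LEAVE_SRC);
use Encode::Guess;

# Hypothetical helper, not part of Encode: strictly try the encodings you
# expect, in order, and fall back to guess_encoding() only if none fits.
sub decode_preferring {
    my ($octets, @expected) = @_;
    for my $name (@expected) {
        # FB_CROAK dies on malformed input; LEAVE_SRC keeps $octets untouched.
        my $text = eval { decode($name, $octets, FB_CROAK | LEAVE_SRC) };
        return ($name, $text) if defined $text;
    }
    my $enc = guess_encoding($octets);
    die "cannot guess encoding: $enc\n" unless ref $enc;
    return ($enc->name, $enc->decode($octets));
}

# A NUL byte is valid in both ASCII and UTF-8, so the attachment's bytes
# are accepted as ascii before Encode::Guess is ever consulted.
my ($name, $text) = decode_preferring("foo\x00bar\n", 'ascii', 'UTF-8');
print "$name\n";    # ascii
```

The trade-off is that genuine BOM-less UTF-16 text made up only of ASCII-range characters also consists entirely of bytes below 0x80. Such text would be accepted as ascii too, so list the candidate encodings in the order you trust them.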


This service is sponsored and maintained by Best Practical Solutions and runs on Perl.org infrastructure.

Please report any issues with rt.cpan.org to rt-cpan-admin@bestpractical.com.