This queue is for tickets about the Encode-Detect CPAN distribution.

Report information
The Basics
Id: 15399
Status: rejected
Priority: 0/
Queue: Encode-Detect

People
Owner: Nobody in particular
Requestors: vskytta [...] gmail.com
Cc:
AdminCc:

Bug Information
Severity: Normal
Broken in: 0.01
Fixed in: (no value)



Subject: Some UTF-8 detected as EUC-JP/EUC-KR
With 0.01 and Perl 5.8.6 on Fedora Core 4, some UTF-8 data appears to be detected as EUC-*. For example, "München" in UTF-8 gets detected as EUC-JP, and "Skyttä" in UTF-8 as EUC-KR.
Subject: Some UTF-8 detected as EUC-JP/EUC-KR/gb18030
The attached utf-8 file is incorrectly detected as gb18030

$ perl -MEncode::Detect::Detector -E 'say Encode::Detect::Detector::detect(`cat ~/prueba.html`);'
gb18030
$
Subject: prueba.html

Lotería Canción

On Thu Nov 11 14:59:40 2010, DMUEY wrote:
> The attached utf-8 file is incorrectly detected as gb18030
>
> $ perl -MEncode::Detect::Detector -E 'say
> Encode::Detect::Detector::detect(`cat ~/prueba.html`);'
> gb18030
> $
There are exactly two bytes in the file which are not ASCII, so it's a bit ridiculous to expect a correct guess here. Unless you have evidence that the bytes c3 b3 are not valid GB18030, it's not a mistake.
Misdetections on short files will happen.
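
To illustrate the point about validity, here is a minimal sketch in Python (not the Perl module itself, and not how Encode::Detect works internally) that checks whether the short samples mentioned in this ticket are byte-for-byte valid in several encodings. The helper names `decodes_cleanly` and `valid_encodings` are hypothetical, and the candidate list is an assumption chosen to match the encodings named above.

```python
# Samples from this ticket, encoded as UTF-8.
SAMPLES = {
    "München": "München".encode("utf-8"),
    "Skyttä": "Skyttä".encode("utf-8"),
    "ó": "ó".encode("utf-8"),  # the c3 b3 byte pair mentioned in the reply
}

# Candidate encodings (Python codec names); an assumption for illustration.
CANDIDATES = ["utf-8", "euc_jp", "euc_kr", "gb18030"]


def decodes_cleanly(data: bytes, encoding: str) -> bool:
    """Return True if `data` is a valid byte sequence in `encoding`."""
    try:
        data.decode(encoding)
        return True
    except UnicodeDecodeError:
        return False


def valid_encodings(data: bytes) -> list:
    """List every candidate encoding under which `data` decodes without error."""
    return [enc for enc in CANDIDATES if decodes_cleanly(data, enc)]


if __name__ == "__main__":
    for label, data in SAMPLES.items():
        print(label, data.hex(" "), valid_encodings(data))
```

When more than one encoding accepts the bytes, validity alone cannot settle the question, and a detector has to fall back on statistics, which a handful of non-ASCII bytes does not provide.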

