
This queue is for tickets about the YAML-Syck CPAN distribution.

Report information
The Basics
Id: 25436
Status: resolved
Priority: 0/
Queue: YAML-Syck

People
Owner: Nobody in particular
Requestors: SREZIC [...] cpan.org
Cc:
AdminCc:

Bug Information
Severity: Normal
Broken in: 0.82
Fixed in: (no value)



Subject: Warning when using wide characters
The following script:

#!/usr/bin/perl
use strict;
use warnings;
use YAML::Syck qw();
$YAML::Syck::ImplicitUnicode = 1;
YAML::Syck::DumpFile("/tmp/yaml.yml", "\x{20ac}");

would cause the warning:

Wide character in print at /usr/perl5.8.7/lib/site_perl/5.8.7/i686-linux/YAML/Syck.pm line 51.

I guess that something like binmode(":utf8") in DumpFile would fix the warning.

Regards,
Slaven
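The warning itself can be reproduced without YAML::Syck. The following is a minimal sketch using only core modules; the temporary files and variable names are illustrative assumptions, not part of the distribution. It prints the same character to a handle with no encoding layer and to one where binmode(":utf8") was applied, and counts the warnings each print raises.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Collect warnings instead of letting them go to STDERR.
my @warnings;
local $SIG{__WARN__} = sub { push @warnings, $_[0] };

# Handle without an encoding layer: printing U+20AC triggers
# "Wide character in print".
my ($plain_fh, $plain_path) = tempfile(UNLINK => 1);
print {$plain_fh} "\x{20ac}";
close $plain_fh or die "Cannot close $plain_path: $!";
my $plain_warnings = scalar @warnings;

# Same print after binmode(":utf8"): the handle now accepts characters.
my ($utf8_fh, $utf8_path) = tempfile(UNLINK => 1);
binmode($utf8_fh, ':utf8');
print {$utf8_fh} "\x{20ac}";
close $utf8_fh or die "Cannot close $utf8_path: $!";
my $utf8_warnings = @warnings - $plain_warnings;

print "warnings without layer: $plain_warnings\n";
print "warnings with :utf8:    $utf8_warnings\n";
```

This only illustrates where the warning comes from; whether DumpFile should set the layer itself is the open question in this ticket.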
On Wed Mar 14 13:00:48 2007, SREZIC wrote:
> The following script:
>
> #!/usr/bin/perl
> use strict;
> use warnings;
> use YAML::Syck qw();
> $YAML::Syck::ImplicitUnicode = 1;
> YAML::Syck::DumpFile("/tmp/yaml.yml", "\x{20ac}");
>
> would cause the warning:
>
> Wide character in print at
> /usr/perl5.8.7/lib/site_perl/5.8.7/i686-linux/YAML/Syck.pm line 51.
>
> I guess that something like binmode(":utf8") in DumpFile would fix the
> warning.
>
The issue is still in 1.05. Thinking again about it, it seems that the binmode call must not be unconditional, but should only be used if $ImplicitUnicode is set.

Also, it is not clear what to do if DumpFile operates on an open filehandle. Let the user set binmode on the filehandle? Push the utf8 layer before writing/reading and pop it after?

Regards,
Slaven
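The "push the layer, write, pop it again" option for an already-open filehandle could look roughly like the sketch below. It uses only core Perl; the helper name print_utf8_temporarily is hypothetical and is not part of YAML::Syck. It pushes :encoding(UTF-8) rather than :utf8, because the PerlIO documentation notes that :pop only removes real layers and does not undo flags such as :utf8.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper (not a YAML::Syck function): write $text to an
# already-open handle through a temporary UTF-8 encoding layer, then
# remove that layer so the caller gets the handle back as it was.
sub print_utf8_temporarily {
    my ($fh, $text) = @_;
    binmode($fh, ':encoding(UTF-8)');   # push a real encoding layer
    print {$fh} $text;
    binmode($fh, ':pop');               # remove the layer pushed above
    return;
}

my ($fh, $path) = tempfile(UNLINK => 1);
binmode($fh);                           # start from a plain byte handle
print_utf8_temporarily($fh, "--- \x{20ac}\n");
print {$fh} "\xE9";                     # written as one raw byte again
close $fh or die "Cannot close $path: $!";

# Read the file back as raw bytes to see what was actually written.
open(my $in, '<:raw', $path) or die "Cannot read $path: $!";
my $bytes = do { local $/; <$in> };
close $in;

printf "%d bytes written\n", length $bytes;
```

A real fix would still have to decide when to apply this, for example only when $ImplicitUnicode is set, as suggested above.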
I don't know enough about the YAML spec to know if non-UTF-8 wide chars are supported. An easy solution, though one that would not round-trip well, would be this patch:
Subject: patch1.txt
diff --git a/lib/YAML/Syck.pm b/lib/YAML/Syck.pm
index 1353866..8badaac 100644
--- a/lib/YAML/Syck.pm
+++ b/lib/YAML/Syck.pm
@@ -96,21 +96,22 @@ sub _is_openhandle {
 sub DumpFile {
     my $file = shift;
+    require Encode;
     if ( _is_openhandle($file) ) {
         if ($#_) {
-            print {$file} YAML::Syck::DumpYAML($_) for @_;
+            print {$file} Encode::encode_utf8(YAML::Syck::DumpYAML($_)) for @_;
         }
         else {
-            print {$file} YAML::Syck::DumpYAML($_[0]);
+            print {$file} Encode::encode_utf8(YAML::Syck::DumpYAML($_[0]));
         }
     }
     else {
         open(my $fh, '>', $file) or die "Cannot write to $file: $!";
         if ($#_) {
-            print $fh YAML::Syck::DumpYAML($_) for @_;
+            print $fh Encode::encode_utf8(YAML::Syck::DumpYAML($_)) for @_;
         }
         else {
-            print $fh YAML::Syck::DumpYAML($_[0]);
+            print $fh Encode::encode_utf8(YAML::Syck::DumpYAML($_[0]));
         }
         close $fh;
     }
I spoke with Avar about this. The plan is to update the documentation to clarify that you are expected to open the file handle as UTF-8 if you expect wide chars to be in the structure:

open(my $fh, ">:encoding(UTF-8)", "out.yml") or die $!;
DumpFile($fh, $hashref);
On 2010-07-20 00:11:55, TODDR wrote:
> I spoke with Avar about this. The plan is to update the documentation
> to clarify that if you are
> expected to open the file handle as UTF8 if you expect wide chars to
> be in the structure:
>
> open(my $fh, ">:encoding(UTF-8)", "out.yml") or die
> DumpFile($fh, $hashref);
>
Sorry, I have to re-open this ticket. Using this is not enough to get a dump/load roundtrip working (see below).

Also, I don't like it that the user has to do something special to have wide character serialization correct. I think there should be a way to detect the presence of wide characters automatically and do the right thing?

Regards,
Slaven

#!/usr/bin/perl -w

use strict;
use Test::More 'no_plan';
use YAML::Syck qw(DumpFile LoadFile);

my $test = ["\x{20ac}"];
open(my $fh, ">:encoding(UTF-8)", "/tmp/test.yml");
DumpFile $fh, $test;
close $fh or die $!;
my $test2 = LoadFile "/tmp/test.yml";
is_deeply($test2,$test);
__END__

$ perl5.12.0 /tmp/yamlsyck.pl
not ok 1
# Failed test at /tmp/yamlsyck.pl line 12.
Wide character in print at /usr/perl5.12.0/lib/5.12.0/Test/Builder.pm line 1753.
# Structures begin differing at:
# $got->[0] = 'âÃÂì'
# $expected->[0] = 'â¬'
1..1
# Looks like you failed 1 test of 1.
Exitcode 1
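One plausible reading of the mangled $got value, offered as an assumption and not as a confirmed diagnosis of YAML::Syck, is that text which is already UTF-8 bytes gets encoded a second time on its way through the :encoding(UTF-8) layer. The core-only sketch below shows what such double encoding does to U+20AC; the variable names are illustrative.

```perl
use strict;
use warnings;
use Encode qw(encode_utf8 decode_utf8);

my $euro  = "\x{20ac}";            # one character
my $once  = encode_utf8($euro);    # three bytes: E2 82 AC
my $twice = encode_utf8($once);    # each byte re-encoded as if it were a Latin-1 character
my $back  = decode_utf8($twice);   # a single decode only undoes the outer encoding

printf "encoded once:  %d bytes\n", length $once;
printf "encoded twice: %d bytes\n", length $twice;
print  "single decode restores the character: ",
       ($back eq $euro ? "yes" : "no"), "\n";
```

If this is what happens in the failing test, the loaded value would need two decoding steps to match the original, which a plain LoadFile does not perform.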
> Also, I don't like it that the user has to do something special to have
> wide character serialization correct. I think there should be a way to
> detect the presence of wide characters automatically and do the right thing?
As an English speaker, my wide character ignorance is vast. I'm open to suggestions, but the little I know is that auto-detection algorithms for UTF-8 are buggy at best.

What do you suggest?
On 2010-08-30 13:13:12, TODDR wrote:
> > Also, I don't like it that the user has to do something special to have
> > wide character serialization correct. I think there should be a way to
> > detect the presence of wide characters automatically and do the right thing?
>
> As an english speaker, my wide character ignorance is vast. I'm open
> to suggestions but the little
> I know is that auto-detection algorithms for UTF8 are buggy at best.
>
> What do you suggest?
I had a very brief look into the source code of YAML::Syck. Probably the root problem is the usage of SvPV and newSVpvn in perl_syck.h. It should rather use SvPV_utf8 and newSVpvn_utf8. I think in this case all the hacks with ImplicitUnicode and suggesting an encoding layer when doing IO may be removed.

Regards,
Slaven
Ticket migrated to github as https://github.com/toddr/YAML-Syck/issues/28

