This queue is for tickets about the MARC-Record CPAN distribution.

Report information
The Basics
Id: 48120
Status: new
Priority: 0/
Queue: MARC-Record

People
Owner: Nobody in particular
Requestors: henridamien.laurent [...] biblibre.com
Cc:
AdminCc:

Bug Information
Severity: (no value)
Broken in: (no value)
Fixed in: (no value)



Received: from la.mx.develooper.com (x1.develooper.com [207.171.7.70]) by diesel.bestpractical.com (Postfix) with SMTP id 89FF019B830A for <bug-MARC-Record [...] rt.cpan.org>; Thu, 23 Jul 2009 12:35:01 -0400 (EDT)
Received: (qmail 23747 invoked by uid 103); 23 Jul 2009 16:35:00 -0000
Received: from x16.dev (10.0.100.26) by x1.dev with QMQP; 23 Jul 2009 16:35:00 -0000
Received: from smtp-104-thursday.nerim.net (HELO kraid.nerim.net) (62.4.16.104) by 16.mx.develooper.com (qpsmtpd/0.80) with ESMTP; Thu, 23 Jul 2009 09:34:54 -0700
Received: from [192.168.1.69] (hdlaurent.pck.nerim.net [62.212.120.129]) by kraid.nerim.net (Postfix) with ESMTP id 428ADCFF8B for <bug-MARC-Record [...] rt.cpan.org>; Thu, 23 Jul 2009 18:34:50 +0200 (CEST)
Delivered-To: cpan-bug+MARC-Record [...] diesel.bestpractical.com
Subject: UTF8 diacritics management problems
MIME-Version: 1.0
User-Agent: Thunderbird 2.0.0.22 (X11/20090608)
X-Spam-Status: No, hits=0.0 required=8.0 tests=
Return-Path: <henridamien.laurent [...] biblibre.com>
X-Spam-Check-BY: 16.mx.develooper.com
X-Original-To: bug-MARC-Record [...] rt.cpan.org
Date: Thu, 23 Jul 2009 18:34:50 +0200
X-Spam-Level: *
X-Virus-Checked: Checked by ClamAV on 16.mx.develooper.com
Content-Type: multipart/mixed; boundary="------------090701010007020700020604"
Message-ID: <4A68912A.40002 [...] biblibre.com>
To: bug-MARC-Record [...] rt.cpan.org
From: LAURENT Henri-Damien <henridamien.laurent [...] biblibre.com>
Content-Length: 0
content-type: text/plain; charset="utf-8"; format="flowed"
Content-Transfer-Encoding: 7bit
X-RT-Original-Encoding: ISO-8859-1
Content-Length: 983
Attached is a test script showing that Perl's UTF-8 handling can sometimes break the encoding of diacritics in MARC::Record data. The fix is to set the UTF8 flag on all the subfields.

Running

    perl testMARCRecord.pl fichierTestUTF8.2709

shows the data before and after adding a simple field.

The fix can be done with a function like this:

    sub SetUTF8Flag {
        my ($record) = @_;
        return unless ( $record && $record->fields() );
        foreach my $field ( $record->fields() ) {
            # Only data fields (tag 010 and above) carry subfields.
            if ( $field->tag() >= 10 ) {
                my @subfields;
                foreach my $subfield ( $field->subfields() ) {
                    my ( $code, $value ) = @$subfield;
                    # utf8::decode() works in place and sets the UTF8 flag
                    # when the bytes are valid UTF-8; utf8::encode() would do
                    # the reverse and returns nothing.
                    utf8::decode($value) unless utf8::is_utf8($value);
                    push @subfields, $code, $value;
                }
                my $newfield = MARC::Field->new(
                    $field->tag(),
                    $field->indicator(1),
                    $field->indicator(2),
                    @subfields
                );
                $field->replace_with($newfield);
            }
        }
    }
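The following is a minimal sketch of the mechanism this report appears to describe, using only the core Encode module. It assumes the record data arrives as undecoded UTF-8 bytes (the test script opens the file itself to avoid the ':utf8' layer) while the added field is a character string (because of `use utf8`). The variable names are illustrative and do not come from MARC::Record.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# "Billé" as raw UTF-8 bytes, e.g. read from an ISO 2709 file without an
# encoding layer: 6 bytes, UTF8 flag off.
my $bytes = "Bill\xC3\xA9";

# The same text as a decoded character string: 5 characters, UTF8 flag on.
my $chars = decode( 'UTF-8', "Bill\xC3\xA9" );

# Concatenating the two makes Perl upgrade $bytes as if it were Latin-1,
# so its "é" is encoded twice when the result is written out as UTF-8.
my $broken = encode( 'UTF-8', $bytes . $chars );

# Decoding the byte string first keeps both halves consistent.
my $fixed = encode( 'UTF-8', decode( 'UTF-8', $bytes ) . $chars );

# Hex dumps make the doubled "é" visible regardless of terminal encoding.
printf "broken: %s\n", unpack( 'H*', $broken );
printf "fixed:  %s\n", unpack( 'H*', $fixed );
```

In the hex dump, the first "é" of the broken string takes four bytes instead of two, which is the kind of corruption that would show up after adding a field to a record whose subfields lack the UTF8 flag.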
content-type: application/octet-stream; name="fichierTestUTF8.2709"
content-disposition: attachment; filename="fichierTestUTF8.2709"
Content-Transfer-Encoding: base64
Content-Length: 2651

content-type: application/x-perl; name="testMARCRecord.pl"
content-disposition: inline; filename="testMARCRecord.pl"
Content-Transfer-Encoding: 8bit
Content-Length: 1073
    #!/usr/bin/perl

    use strict;
    use warnings;

    # Koha modules used
    use MARC::File::USMARC;
    use MARC::File::XML;
    use MARC::Record;
    use MARC::Batch;
    use MARC::Charset;
    use C4::Charset;
    use IO::File;    # needed for IO::File->new below

    use utf8;
    use open qw( :std :utf8 );
    use Encode;

    my ($input_marc_file) = ('');
    $| = 1;
    my $debug = $ENV{DEBUG};
    my $batch;
    # don't let MARC::Batch open the file, as it applies the ':utf8' IO layer
    my $fh = IO::File->new( $ARGV[0] );
    $batch = MARC::Batch->new( 'USMARC', $fh );
    $batch->warnings_off();
    $batch->strict_off();
    my $i         = 0;
    my $commitnum = 50;

    RECORD: while (1) {
        my $record;
        # get records
        eval { $record = $batch->next() };
        if ($@) {
            print "Bad MARC record: skipped\n";
            next;
        }
        # stop if we get an empty record (that is MARC valid, but will result in AddBiblio failure)
        last unless ($record);
        my $record2 = $record->clone;
        warn "Original :", $record->as_formatted;
        $record->insert_fields_ordered(
            MARC::Field->new( '700', '', '', a => "Billé", b => 'Louis' ) );
        warn "Modified :", $record->as_formatted;
        $i++;
    }

