
This queue is for tickets about the MARC-Record CPAN distribution.

Report information
The Basics
Id: 48120
Status: new
Priority: 0/
Queue: MARC-Record

Owner: Nobody in particular
Requestors: henridamien.laurent [...]

Bug Information
Severity: (no value)
Broken in: (no value)
Fixed in: (no value)

Subject: UTF8 diacritics management problems
Date: Thu, 23 Jul 2009 18:34:50 +0200
From: LAURENT Henri-Damien <henridamien.laurent [...]>
Here is a test script showing that Perl's handling of UTF-8 can sometimes break the encoding of diacritics in MARC::Record data. The solution is to set the UTF8 flag on all the subfields.

Running the attached script on fichierTestUTF8.2709 shows the data before and after adding a simple field.

The solution comes with a function like this:

    sub SetUTF8Flag {
        my ($record) = @_;
        return unless ( $record && $record->fields() );
        foreach my $field ( $record->fields() ) {
            # Control fields (tags below 010) carry no subfields
            if ( $field->tag() >= 10 ) {
                my @subfields;
                foreach my $subfield ( $field->subfields() ) {
                    my ( $code, $value ) = @$subfield;
                    # utf8::decode works in place and returns a status, not the
                    # string; it turns the UTF8 flag on when the value holds
                    # valid multi-byte UTF-8 sequences
                    utf8::decode($value);
                    push @subfields, ( $code, $value );
                }
                my $newfield = MARC::Field->new(
                    $field->tag(),
                    $field->indicator(1),
                    $field->indicator(2),
                    @subfields
                );
                $field->replace_with($newfield);
            }
        }
    }
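To illustrate why the flag matters, here is a minimal sketch that uses only core Perl and no MARC modules. It assumes the existing record data arrives as raw UTF-8 octets while the newly added field is a decoded character string. The variable and function names are hypothetical and are not part of MARC::Record.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(decode encode);

# "Billé" as raw UTF-8 octets, as read from a file without a decoding layer.
# Assumption: this stands in for subfield data already present in the record.
my $bytes = "Bill\xC3\xA9";

# The same text as a decoded character string.
# Assumption: this stands in for the field added by the script.
my $chars = decode( 'UTF-8', $bytes );

# Hypothetical helper: join two values and serialise the result as UTF-8.
sub join_and_encode {
    my ( $existing, $added ) = @_;
    return encode( 'UTF-8', "$existing / $added" );
}

# Mixing octets with characters: Perl treats each octet of $bytes as a
# Latin-1 character, so the "é" is encoded a second time (C3 83 C2 A9).
my $mixed = join_and_encode( $bytes, $chars );

# Decoding the octets first keeps a single, correct encoding (C3 A9).
my $fixed = join_and_encode( decode( 'UTF-8', $bytes ), $chars );

printf "flag on bytes: %s, flag on chars: %s\n",
    ( utf8::is_utf8($bytes) ? 'on' : 'off' ),
    ( utf8::is_utf8($chars) ? 'on' : 'off' );
printf "mixed: %s\n", unpack( 'H*', $mixed );
printf "fixed: %s\n", unpack( 'H*', $fixed );
```

The hex dumps make the difference visible without depending on the terminal encoding: the first occurrence of the name in the mixed output is double encoded, while both occurrences in the fixed output are identical.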
content-type: application/octet-stream; name="fichierTestUTF8.2709"
content-disposition: attachment; filename="fichierTestUTF8.2709"

content-type: application/x-perl; name=""
content-disposition: inline; filename=""
Content-Transfer-Encoding: 8bit
Content-Length: 1073
#!/usr/bin/perl
use strict;
use warnings;

# Koha modules used
use MARC::File::USMARC;
use MARC::File::XML;
use MARC::Record;
use MARC::Batch;
use MARC::Charset;
use C4::Charset;
use IO::File;
use utf8;
use open qw( :std :utf8 );
use Encode;

my ($input_marc_file) = ('');
$| = 1;
my $debug = $ENV{DEBUG};

# don't let MARC::Batch open the file, as it applies the ':utf8' IO layer
my $fh = IO::File->new( $ARGV[0] ) or die "Cannot open input file: $!";
my $batch = MARC::Batch->new( 'USMARC', $fh );
$batch->warnings_off();
$batch->strict_off();
my $i = 0;

RECORD: while (1) {
    my $record;

    # get records
    eval { $record = $batch->next() };
    if ($@) {
        print "Bad MARC record: skipped\n";
        next;
    }

    # skip if we get an empty record (that is MARC valid, but will result in AddBiblio failure)
    last unless ($record);
    my $record2 = $record->clone;
    warn "Original :", $record->as_formatted;
    $record->insert_fields_ordered(
        MARC::Field->new( '700', '', '', a => "Billé", b => 'Louis' ) );
    warn "Modified :", $record->as_formatted;
    $i++;
}
