
This queue is for tickets about the XML-Tidy CPAN distribution.

Report information
The Basics
Id: 24113
Status: resolved
Priority: 0/
Queue: XML-Tidy

People
Owner: Pip [...] CPAN.Org
Requestors: Frank.G.Goss [...] aphis.usda.gov
Cc:
AdminCc:

Bug Information
Severity: (no value)
Broken in: (no value)
Fixed in: (no value)



From: Frank.G.Goss [...] aphis.usda.gov
To: bug-XML-Tidy [...] rt.cpan.org
Date: Wed, 27 Dec 2006 10:29:05 -0700
Subject: XML-Tidy changes encoding

version: XML-Tidy 1.2.43HJnFa
Perl version: v5.8.8 build for MSWin32-x86-multi-thread

This is the code fragment that I am running.

    opendir (DIR, $sourceDir) || die "Could not open directory, $sourceDir: $!\n";
    while (defined($file = readdir(DIR))) {
        next if $file =~ /^\.\.?$/; # skip . and ..
        print "processing file, $file\n";
        my $sourceFile = $sourceDir."/".$file;
        my $tidyObj = XML::Tidy->new('filename' => $sourceFile);
        $tidyObj->tidy(' ');
        $tidyObj->write('filename' => $sourceFile.".BAK");
    }
    closedir(DIR);

I have a number of XML files to tidy up. The original files have the following declaration:

    <?xml version="1.0" encoding="ISO-8859-1"?>

After tidying up, the declaration changes to:

    <?xml version="1.0" encoding="utf-8"?>

This causes errors with the validation since there are some accented characters not in the UTF-8 character set.

How can Tidy be changed to preserve the declaration and the encoding?

Regards,
Frank Goss
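
For context on the validation errors described above: the encoding named in the XML declaration has to match the bytes that are actually in the file. The sketch below is not from the ticket. It uses only Perl's core Encode module to show that an accented character such as é is one byte in ISO-8859-1 but two bytes in UTF-8, and that ISO-8859-1 bytes are rejected by a strict UTF-8 decoder. Whether this is exactly what happened to the reporter's files is an assumption, and the variable names are illustrative.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# U+00E9 (e with acute accent), used here as an example accented character.
my $char = "\x{E9}";

# The same character serialized under the two encodings from the ticket.
my $latin1 = encode('ISO-8859-1', $char);   # single byte 0xE9
my $utf8   = encode('UTF-8',      $char);   # two bytes 0xC3 0xA9

printf "ISO-8859-1: %d byte(s), UTF-8: %d byte(s)\n",
    length($latin1), length($utf8);

# Strictly decoding the ISO-8859-1 byte as UTF-8 fails, which is what a
# validator sees when the declaration says utf-8 but the bytes do not agree.
# decode() with a CHECK flag may modify its input, so work on a copy.
my $copy = $latin1;
my $decoded_ok = eval { decode('UTF-8', $copy, Encode::FB_CROAK); 1 };

print $decoded_ok
    ? "bytes are valid UTF-8\n"
    : "bytes are NOT valid UTF-8\n";
```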

From: "Pip Stuart" <pipstuart [...] gmail.com>
To: bug-XML-Tidy [...] rt.cpan.org, Frank.G.Goss [...] aphis.usda.gov
Date: Wed, 27 Dec 2006 09:49:41 -0800
Subject: Re: [rt.cpan.org #24113] XML-Tidy changes encoding

Hello Frank,

Thanks for reporting this oversight. I'm not sure if the XML declaration header is exposed from the XML::XPath module that my XML::Tidy inherits from but... I'll try to use whatever is there unmolested (or include some work-around code) the next time I can get to packaging a new release. Sorry for any inconvenience caused by my bug.

Sincerely,
-Pip@CPAN.Org
I've just released XML-Tidy-1.4.A7QCvHw to the CPAN which resolves this issue (exclusively for the 'filename' constructor case).

--
-Pip@CPAN.Org
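
For cases the release does not cover, one possible stopgap is to post-process the written output outside of XML::Tidy. The following is a minimal sketch, not the fix shipped in the release: it assumes the tidied output really is UTF-8 encoded, and the helper names `declared_encoding` and `reencode_xml` are hypothetical. It reads the encoding from the original declaration, then transcodes the tidied bytes back to that encoding and rewrites the declaration to match.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical helper: return the encoding named in the XML declaration at
# the start of $xml, or undef if there is no encoding pseudo-attribute.
sub declared_encoding {
    my ($xml) = @_;
    return $xml =~ /\A<\?xml\b[^>]*?\bencoding\s*=\s*(["'])([^"']+)\1/
        ? $2
        : undef;
}

# Hypothetical helper: take XML as UTF-8 bytes, rewrite the declaration to
# name $target, and return the document encoded as $target. Dies if the
# input is not valid UTF-8 or a character cannot be represented in $target.
sub reencode_xml {
    my ($utf8_bytes, $target) = @_;
    my $text = decode('UTF-8', $utf8_bytes, Encode::FB_CROAK);
    $text =~ s/\A(<\?xml\b[^>]*?\bencoding\s*=\s*)(["'])[^"']*\2/$1$2$target$2/;
    return encode($target, $text, Encode::FB_CROAK);
}

# Usage sketch with in-memory documents standing in for the source file and
# the tidied output file.
my $original = qq{<?xml version="1.0" encoding="ISO-8859-1"?>\n<a>\xE9</a>\n};
my $tidied   = qq{<?xml version="1.0" encoding="utf-8"?>\n<a>\xC3\xA9</a>\n};

my $wanted   = declared_encoding($original);
my $restored = defined $wanted ? reencode_xml($tidied, $wanted) : $tidied;
```

Transcoding the bytes and rewriting the declaration together keeps the two consistent; changing only the declaration would reintroduce the mismatch the ticket describes.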

