This queue is for tickets about the Perl-Tidy CPAN distribution.

Report information

Subject: Broken handling of UTF-8 strings
Hi.

I can't get Perl::Tidy to do the right thing with UTF-8 strings.

I've attached a test case with a source file (test.pl), the expected output (want.pl.tdy) and the actual output (got.pl.tdy).

The problem arises because Perl::Tidy considers "Für" to be 4 characters rather than 3. This comes from length(), which returns the number of bytes rather than the number of characters unless the string "is in Unicode".

perl 5.8.8
Linux 2.6.22
Perl::Tidy 20071205
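The sketch below makes the reporter's point concrete. It is not Perl::Tidy's own alignment code. It uses Perl's core Encode module and spells "Für" with escapes twice: once as raw UTF-8 bytes and once as a decoded character string. The column width of 6 is an arbitrary assumption, chosen only to show how a byte count shortens the padding by one.

```perl
use strict;
use warnings;
use Encode qw(decode);

# "Für" as it arrives from a UTF-8 file read without decoding:
# four bytes (F, 0xC3, 0xBC, r).
my $bytes = "F\xc3\xbcr";

# The same text after decoding: three characters (F, U+00FC, r).
my $chars = decode( 'UTF-8', $bytes );

my $byte_len = length $bytes;    # counts bytes
my $char_len = length $chars;    # counts characters

# Padding needed to start a second column at (assumed) width 6.
# The byte-based count adds one space too few.
my $pad_bytes = 6 - $byte_len;
my $pad_chars = 6 - $char_len;

print "byte length: $byte_len, char length: $char_len\n";
print "padding from bytes: $pad_bytes, padding from chars: $pad_chars\n";
```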
Subject: test.pl
my $test = [ "Für Elise" => "Beethoven", "Eine kleine Nachtmusik" => "Mozart" ];
Subject: want.pl.tdy

Message body not shown because it is not plain text.

Subject: got.pl.tdy

Message body not shown because it is not plain text.

On Sun Feb 03 23:42:09 2008, CHOCOLATE wrote:
> Hi.
>
> I can't get Perl::Tidy to do the right thing with UTF-8 strings.
>
> I've attached a test case with a source file (test.pl), the expected
> output (want.pl.tdy) and the actual output (got.pl.tdy).
>
> The command was perltidy -fnl test.pl
>
> The problem arises because Perl::Tidy considers "Für" to be 4 characters
> rather than 3, courtesy of length(), which returns the number of bytes
> rather than the number of characters, unless the string "is in Unicode".
>
> perl 5.8.8
> Linux 2.6.22
> Perl::Tidy 20071205
I think this is not a bug as long as you do not "use utf8", because in that case the strings should be interpreted as byte strings.

However, perltidy does not seem to honor "use utf8" yet. The indentation will therefore still be incorrect even when this pragma is in effect, which I consider a bug, too. :-)

Regards,
fany
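The sketch below illustrates the distinction fany draws. The same source bytes for "ü" compile to a two-byte string without the pragma and to a one-character string with it. String eval is used only so that a single script can compile the illustrative literal both ways.

```perl
use strict;
use warnings;

# Perl source for a string literal containing the raw UTF-8 bytes
# of "ü" (0xC3 0xBC).
my $literal = qq{"\xc3\xbc"};

# Compiled without the pragma: the literal is a two-byte string.
my $no_pragma = eval "no utf8; $literal";
die $@ if $@;

# Compiled under "use utf8": perl decodes the literal as UTF-8.
my $with_pragma = eval "use utf8; $literal";
die $@ if $@;

my $len_no_pragma   = length $no_pragma;
my $len_with_pragma = length $with_pragma;
print "without use utf8: $len_no_pragma, with use utf8: $len_with_pragma\n";
```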
From: sebastian [...] podjasek.pl
On Sun 07 Dec 2008, 09:10:15, FANY wrote:
> perltidy, however, does not seem to honor "use utf8" yet, so the
> indentation will still be incorrect despite this pragma being in effect,
> which I consider a bug, too. :-)

Just a tiny little change which solves this bug - at least for me.
Subject: 0001-Fix-indentation-of-wide-characters.patch
From 203ea7adfd66a00ebb104af882c4d96faf244230 Mon Sep 17 00:00:00 2001
From: Sebastian Podjasek <sebastian@podjasek.pl>
Date: Tue, 13 May 2014 08:43:07 +0200
Subject: [PATCH] Fix indentation of wide-characters

---
 lib/Perl/Tidy.pm | 1 +
 1 file changed, 1 insertion(+)

diff --git a/lib/Perl/Tidy.pm b/lib/Perl/Tidy.pm
index c326a8f..19b92bd 100644
--- a/lib/Perl/Tidy.pm
+++ b/lib/Perl/Tidy.pm
@@ -165,6 +165,7 @@ EOM
     }
     $fh = $New->( $filename, $mode )
       or Warn("Couldn't open file:$filename in mode:$mode : $!\n");
+    $fh->binmode(':utf8');
     return $fh, ( $ref or $filename );
 }
 
--
1.9.1
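The patch relies on an I/O layer to decode input as it is read. The sketch below is an illustration only. It writes the UTF-8 bytes of a string from test.pl to a temporary file, then reads the file back twice: once without a decoding layer and once with one. It uses `:encoding(UTF-8)`, which also validates the input, unlike the `:utf8` layer in the patch. The helper name `first_line_length` is hypothetical.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Write the UTF-8 bytes of '"Für Elise"' to a temporary file.
my ( $out, $path ) = tempfile( UNLINK => 1 );
binmode $out, ':raw';
print {$out} "\"F\xc3\xbcr Elise\"\n";
close $out or die "close $path: $!";

# Hypothetical helper: length of the first line when read through $layer.
sub first_line_length {
    my ($layer) = @_;
    open my $in, "<$layer", $path or die "open $path: $!";
    my $line = <$in>;
    close $in;
    chomp $line;
    return length $line;
}

my $raw_len     = first_line_length(':raw');                # bytes
my $decoded_len = first_line_length(':encoding(UTF-8)');    # characters
print "raw: $raw_len, decoded: $decoded_len\n";
```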
Subject: Re: [rt.cpan.org #32905] Broken handling of UTF-8 strings
From: Steven Hancock <s7078hancock [...] gmail.com>
Date: Tue, 13 May 2014 06:37:10 -0700
Sebastian, thanks.

Steve

On Tuesday, May 13, 2014, Sebastian Podjasek via RT <bug-Perl-Tidy@rt.cpan.org> wrote:
> Queue: Perl-Tidy
> Ticket <URL: https://rt.cpan.org/Ticket/Display.html?id=32905 >
>
> On Sun 07 Dec 2008, 09:10:15, FANY wrote:
> > perltidy, however, does not seem to honor "use utf8" yet, so the
> > indentation will still be incorrect despite this pragma being in effect,
> > which I consider a bug, too. :-)
>
> Just a tiny little change which solves this bug - at least for me.
From: sebastian [...] podjasek.pl
The previous patch would cause problems when Perl::Tidy is called with a string as an argument, because Perl::Tidy::IOScalar does not implement a binmode method. I also added some tests for this issue.
Subject: 0001-Fix-indentation-of-wide-characters.patch
From c572d03d038eb2c9cb663b2484ae48cef35073ce Mon Sep 17 00:00:00 2001
From: Sebastian Podjasek <sebastian.podjasek@intelliway.pl>
Date: Tue, 13 May 2014 08:43:07 +0200
Subject: [PATCH] Fix indentation of wide-characters

---
 lib/Perl/Tidy.pm  |  1 +
 t/testwide.pl.src |  4 ++++
 t/testwide.t      | 48 ++++++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 53 insertions(+)
 create mode 100644 t/testwide.pl.src
 create mode 100644 t/testwide.t

diff --git a/lib/Perl/Tidy.pm b/lib/Perl/Tidy.pm
index c326a8f..69d2788 100644
--- a/lib/Perl/Tidy.pm
+++ b/lib/Perl/Tidy.pm
@@ -165,6 +165,7 @@ EOM
     }
     $fh = $New->( $filename, $mode )
       or Warn("Couldn't open file:$filename in mode:$mode : $!\n");
+    $fh->binmode(':utf8') if $fh->can('binmode');
     return $fh, ( $ref or $filename );
 }
 
diff --git a/t/testwide.pl.src b/t/testwide.pl.src
new file mode 100644
index 0000000..10eec3a
--- /dev/null
+++ b/t/testwide.pl.src
@@ -0,0 +1,4 @@
+%pangrams=("Plain","ASCII",
+"Zwölf große Boxkämpfer jagen Vik quer über den Sylter.","DE",
+"Jeż wlókł gęś. Uf! Bądź choć przy nim, stań!","PL",
+"Любя, съешь щипцы, — вздохнёт мэр, — кайф жгуч.","RU");
\ No newline at end of file
diff --git a/t/testwide.t b/t/testwide.t
new file mode 100644
index 0000000..e96dfdd
--- /dev/null
+++ b/t/testwide.t
@@ -0,0 +1,48 @@
+use strict;
+use utf8;
+use Test;
+use Carp;
+use FindBin;
+BEGIN {plan tests => 2}
+use Perl::Tidy;
+
+
+my $source = <<'EOM';
+%pangrams=("Plain","ASCII",
+"Zwölf große Boxkämpfer jagen Vik quer über den Sylter.","DE",
+"Jeż wlókł gęś. Uf! Bądź choć przy nim, stań!","PL",
+"Любя, съешь щипцы, — вздохнёт мэр, — кайф жгуч.","RU");
+EOM
+
+my $expected_output=<<'EOM';
+%pangrams = (
+ "Plain", "ASCII",
+ "Zwölf große Boxkämpfer jagen Vik quer über den Sylter.", "DE",
+ "Jeż wlókł gęś. Uf! Bądź choć przy nim, stań!", "PL",
+ "Любя, съешь щипцы, — вздохнёт мэр, — кайф жгуч.", "RU"
+ );
+EOM
+
+my $perltidyrc = <<'EOM';
+-gnu
+EOM
+
+my $output;
+
+Perl::Tidy::perltidy(
+    source      => \$source,
+    destination => \$output,
+    perltidyrc  => \$perltidyrc,
+    argv        => '-nsyn',
+);
+
+ok($output, $expected_output);
+
+Perl::Tidy::perltidy(
+    source      => $FindBin::Bin . '/testwide.pl.src',
+    destination => \$output,
+    perltidyrc  => \$perltidyrc,
+    argv        => '-nsyn',
+);
+
+ok($output, $expected_output);
--
1.9.1
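The `$fh->can('binmode')` guard in the revised patch skips source objects that lack the method. The sketch below shows the same idea under stated assumptions. `My::LineSource` is a hypothetical stand-in for an in-memory source such as Perl::Tidy::IOScalar, and `maybe_binmode` is a hypothetical helper. A real IO::File handle does provide `binmode` through IO::Handle.

```perl
use strict;
use warnings;
use IO::File;

# Hypothetical in-memory line source with no binmode method.
package My::LineSource;
sub new     { my ( $class, @lines ) = @_; return bless [@lines], $class }
sub getline { my ($self) = @_; return shift @$self }

package main;

# Push a decoding layer only if the object knows how to take one.
sub maybe_binmode {
    my ($fh) = @_;
    return 0 unless $fh->can('binmode');
    $fh->binmode(':encoding(UTF-8)') or die "binmode failed: $!";
    return 1;
}

my $file_fh = IO::File->new_tmpfile or die "new_tmpfile: $!";
my $memory  = My::LineSource->new("line 1\n");

my $file_applied   = maybe_binmode($file_fh);
my $memory_applied = maybe_binmode($memory);
print "IO::File: $file_applied, in-memory object: $memory_applied\n";
```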
Subject: Re: [rt.cpan.org #32905] Broken handling of UTF-8 strings
From: Steven Hancock <s7078hancock [...] gmail.com>
Date: Sat, 26 Jul 2014 16:38:52 -0700
Sebastian,

Thanks for the updated patch. I would be interested in putting this into the next release on an experimental basis. At least for now, though, it would have to be turned on with a flag, since otherwise it will cause problems with some existing scripts. Do you have a suggested name for the flag?

Thanks,
Steve

On Fri, Jul 25, 2014 at 12:59 AM, Sebastian Podjasek via RT <bug-Perl-Tidy@rt.cpan.org> wrote:
> Queue: Perl-Tidy
> Ticket <URL: https://rt.cpan.org/Ticket/Display.html?id=32905 >
>
> Previous patch would cause problems when using Perl::Tidy with string as
> an argument (Perl::Tidy::IOScalar does not implement binmode method). Also
> added some tests for this issue.
From: sebastian [...] podjasek.pl
On Sat 26 Jul 2014, 19:39:07, s7078hancock@gmail.com wrote:
> Do you have a suggested name for the flag?

-wide / --wide-chars seems to fit the issue being solved. As far as I can see, -wc is already taken.
Subject: Re: [rt.cpan.org #32905] Broken handling of UTF-8 strings
From: Steven Hancock <s7078hancock [...] gmail.com>
Date: Sat, 8 Nov 2014 05:30:35 -0800
That seems okay, or even -wch / --wide-characters.

Steve

On Fri, Nov 7, 2014 at 4:00 PM, Sebastian Podjasek via RT <bug-Perl-Tidy@rt.cpan.org> wrote:
> Queue: Perl-Tidy
> Ticket <URL: https://rt.cpan.org/Ticket/Display.html?id=32905 >
>
> On Sat 26 Jul 2014, 19:39:07, s7078hancock@gmail.com wrote:
> > Do you have a suggested name for the flag?
>
> -wide / --wide-chars
>
> seems to be related to issue solved. As I've seen -wc is already taken.
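The sketch below shows roughly how an opt-in switch under the names proposed here could be declared with Perl's core Getopt::Long. It is hypothetical and does not reflect how Perl::Tidy actually parses its options. `parse_args` is an illustrative name. The option is off by default, so existing scripts are unaffected unless the switch is given.

```perl
use strict;
use warnings;
use Getopt::Long qw(GetOptionsFromArray);

# Hypothetical option spec: long name plus short alias. The trailing "!"
# also accepts a negated form such as --nowide-characters.
sub parse_args {
    my (@argv) = @_;
    my %opt = ( 'wide-characters' => 0 );    # off unless requested
    GetOptionsFromArray( \@argv, \%opt, 'wide-characters|wch!' )
      or die "bad options\n";
    return \%opt;
}

my $default = parse_args();
my $long    = parse_args('--wide-characters');
my $short   = parse_args('-wch');
print "default: $default->{'wide-characters'}, ",
  "long: $long->{'wide-characters'}, short: $short->{'wide-characters'}\n";
```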
Here is a more portable solution that works with any source type (ScalarRef, handle, ArrayRef).

--- "d:\\downloads\\Tidy.pm" 2015-08-15 03:18:51.000000000 +0300
+++ "d:\\devel\\perl\\perl\\site\\lib\\Perl\\Tidy.pm" 2015-08-15 07:47:21.570789700 +0300
@@ -173,12 +173,6 @@
     $fh = $New->( $filename, $mode )
       or Warn("Couldn't open file:$filename in mode:$mode : $!\n");
 
-    # The first call here will be to read the config file, which is before
-    # the --encoding has been set, so the config file cannot be read as utf8
-    $fh->binmode(':encoding(utf8)')
-      if ( $rOpts_character_encoding
-        && $rOpts_character_encoding eq 'utf8'
-        && $fh->can('binmode') );
     return $fh, ( $ref or $filename );
 }
 
@@ -811,12 +805,23 @@
         # Prefilters and postfilters: The prefilter is a code reference
         # that will be applied to the source before tidying, and the
         # postfilter is a code reference to the result before outputting.
-        if ($prefilter) {
+        if ( $prefilter || ( $rOpts_character_encoding && $rOpts_character_encoding eq 'utf8' ) ) {
             my $buf = '';
             while ( my $line = $source_object->get_line() ) {
                 $buf .= $line;
             }
-            $buf = $prefilter->($buf);
+
+            $buf = $prefilter->($buf) if $prefilter;
+
+            if ( $rOpts_character_encoding && $rOpts_character_encoding eq 'utf8' && !utf8::is_utf8($buf) ) {
+                require Encode;
+
+                eval {
+                    $buf = Encode::decode('UTF-8', $buf, Encode::FB_CROAK | Encode::LEAVE_SRC);
+                };
+
+                Die "unable to decode source\n" if $@;
+            }
 
             $source_object = Perl::Tidy::LineSource->new( \$buf, $rOpts,
                 $rpending_logfile_message );
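The core of this approach is a strict, all-at-once decode of the source buffer. A standalone sketch of that step follows; `decode_source` is a hypothetical helper. It loads Encode at compile time and calls the Encode constants with explicit parentheses, so they are never mistaken for barewords. FB_CROAK makes malformed input fatal, and LEAVE_SRC leaves the caller's byte string unmodified.

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical helper mirroring the patch: decode once, die on bad input.
sub decode_source {
    my ($buf) = @_;
    return $buf if utf8::is_utf8($buf);    # already a character string
    my $decoded = eval {
        Encode::decode( 'UTF-8', $buf,
            Encode::FB_CROAK() | Encode::LEAVE_SRC() );
    };
    die "unable to decode source\n" unless defined $decoded;
    return $decoded;
}

# Valid UTF-8 for: my $x = "Für";  (16 bytes, 15 characters)
my $good = decode_source("my \$x = \"F\xc3\xbcr\";\n");

# 0xFF can never appear in UTF-8, so this input is rejected.
my $bad = eval { decode_source("\xff\xfe") };
my $err = $@;
```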
Subject: Re: [rt.cpan.org #32905] Broken handling of UTF-8 strings
From: Steven Hancock <s7078hancock [...] gmail.com>
Date: Sat, 15 Aug 2015 06:58:21 -0700
Dmytro,

Thanks for the patch; this looks like a better way to go. I changed the 'require' to 'use' to get it to work without complaints.

When I tested it on a file, I noticed that the output file was not UTF-8 on my system. I therefore think the previous coding may need to be retained when $mode is write.

Any other comments are very welcome. I assume that I will be getting more suggestions for improvement once this version starts to get used, so I will probably need to do another release fairly soon.

Steve

On Fri, Aug 14, 2015 at 9:56 PM, Dmytro Zagashev via RT <bug-Perl-Tidy@rt.cpan.org> wrote:
> Queue: Perl-Tidy > Ticket <URL: https://rt.cpan.org/Ticket/Display.html?id=32905 > > > Here is the more portable solution, that works with any source type > (ScalarRef, handle, ArrayRef). > > --- "d:\\downloads\\Tidy.pm" 2015-08-15 03:18:51.000000000 +0300 > +++ "d:\\devel\\perl\\perl\\site\\lib\\Perl\\Tidy.pm" 2015-08-15 > 07:47:21.570789700 +0300 > @@ -173,12 +173,6 @@ > $fh = $New->( $filename, $mode ) > or Warn("Couldn't open file:$filename in mode:$mode : $!\n"); > > - # The first call here will be to read the config file, which is before > - # the --encoding has been set, so the config file cannot be read as > utf8 > - $fh->binmode(':encoding(utf8)') > - if ( $rOpts_character_encoding > - && $rOpts_character_encoding eq 'utf8' > - && $fh->can('binmode') ); > return $fh, ( $ref or $filename ); > } > > @@ -811,12 +805,23 @@ > # Prefilters and postfilters: The prefilter is a code reference > # that will be applied to the source before tidying, and the > # postfilter is a code reference to the result before outputting. 
> - if ($prefilter) { > + if ( $prefilter || ( $rOpts_character_encoding && > $rOpts_character_encoding eq 'utf8' ) ) { > my $buf = ''; > while ( my $line = $source_object->get_line() ) { > $buf .= $line; > } > - $buf = $prefilter->($buf); > + > + $buf = $prefilter->($buf) if $prefilter; > + > + if ( $rOpts_character_encoding && > $rOpts_character_encoding eq 'utf8' && !utf8::is_utf8($buf) ) { > + require Encode; > + > + eval { > + $buf = Encode::decode('UTF-8', > $buf, Encode::FB_CROAK | Encode::LEAVE_SRC); > + }; > + > + Die "unable to decode source\n" if $@; > + } > > $source_object = Perl::Tidy::LineSource->new( \$buf, $rOpts, > $rpending_logfile_message ); > > --- "d:\\downloads\\Tidy.pm" 2015-08-15 03:18:51.000000000 +0300 > +++ "d:\\devel\\perl\\perl\\site\\lib\\Perl\\Tidy.pm" 2015-08-15 > 07:47:21.570789700 +0300 > @@ -173,12 +173,6 @@ > $fh = $New->( $filename, $mode ) > or Warn("Couldn't open file:$filename in mode:$mode : $!\n"); > > - # The first call here will be to read the config file, which is before > - # the --encoding has been set, so the config file cannot be read as > utf8 > - $fh->binmode(':encoding(utf8)') > - if ( $rOpts_character_encoding > - && $rOpts_character_encoding eq 'utf8' > - && $fh->can('binmode') ); > return $fh, ( $ref or $filename ); > } > > @@ -811,12 +805,23 @@ > # Prefilters and postfilters: The prefilter is a code reference > # that will be applied to the source before tidying, and the > # postfilter is a code reference to the result before outputting. 
> - if ($prefilter) { > + if ( $prefilter || ( $rOpts_character_encoding && > $rOpts_character_encoding eq 'utf8' ) ) { > my $buf = ''; > while ( my $line = $source_object->get_line() ) { > $buf .= $line; > } > - $buf = $prefilter->($buf); > + > + $buf = $prefilter->($buf) if $prefilter; > + > + if ( $rOpts_character_encoding && > $rOpts_character_encoding eq 'utf8' && !utf8::is_utf8($buf) ) { > + require Encode; > + > + eval { > + $buf = Encode::decode('UTF-8', > $buf, Encode::FB_CROAK | Encode::LEAVE_SRC); > + }; > + > + Die "unable to decode source\n" if $@; > + } > > $source_object = Perl::Tidy::LineSource->new( \$buf, $rOpts, > $rpending_logfile_message ); > >
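On the input side, Dmytro's change reads the whole source into one buffer. When -utf8 is in effect and the buffer is not already a character string, it decodes the buffer with Encode::decode. The sketch below isolates that step outside Perl::Tidy to show why it addresses the original report: after decoding, length() counts characters rather than bytes. The helper name decode_source is hypothetical, and the sketch assumes the source bytes are UTF-8.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode ();

# Hypothetical helper (not part of Perl::Tidy) mirroring the patch: decode a
# byte buffer as UTF-8 unless Perl already flags it as a character string.
# FB_CROAK makes malformed input fatal instead of substituting U+FFFD;
# LEAVE_SRC stops decode() from modifying its argument in place.
sub decode_source {
    my ($buf) = @_;
    return $buf if utf8::is_utf8($buf);

    my $chars;
    eval {
        $chars = Encode::decode( 'UTF-8', $buf,
            Encode::FB_CROAK() | Encode::LEAVE_SRC() );
        1;
    } or die "unable to decode source\n";
    return $chars;
}

# "Für Elise" as the raw UTF-8 bytes that reading a file would produce.
my $bytes = "F\xc3\xbcr Elise";
my $chars = decode_source($bytes);

# length() counts bytes before decoding and characters after it.
printf "bytes: %d, characters: %d\n", length($bytes), length($chars);
```

Note that utf8::is_utf8 only inspects Perl's internal string flag; it does not check whether the bytes are valid UTF-8. That is why the patch pairs it with FB_CROAK, so malformed input stops with "unable to decode source" instead of being silently altered.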
From: qsimpleq
Updated Dmytro's patch to add utf8 support for output files too.

On Sat Aug 15 09:58:34 2015, s7078hancock@gmail.com wrote:
> Dmytro,
> Thanks for the patch, this looks like a better way to go. I changed the
> 'require' to 'use' to get it to work without complaints. When I tested it
> on a file I noticed that the output file was not UTF-8 on my system, so I
> think that the previous coding may need to be retained when the $mode is
> write.
>
> Any other comments are very welcome. I assume that I will be getting more
> suggestions for improvement once this version starts to get used, so I will
> probably need to do another release fairly soon.
Subject: fix-utf8-output-for-std-and-files.patch
--- Tidy.pm.orig 2015-12-10 15:20:42.236631385 +0500
+++ Tidy.pm 2015-12-10 15:19:01.052440351 +0500
@@ -74,6 +74,7 @@
 @EXPORT = qw( &perltidy );

 use Cwd;
+use Encode ();
 use IO::File;
 use File::Basename;
 use File::Copy;
@@ -173,12 +174,6 @@
     $fh = $New->( $filename, $mode )
       or Warn("Couldn't open file:$filename in mode:$mode : $!\n");

-    # The first call here will be to read the config file, which is before
-    # the --encoding has been set, so the config file cannot be read as utf8
-    $fh->binmode(':encoding(utf8)')
-      if ( $rOpts_character_encoding
-        && $rOpts_character_encoding eq 'utf8'
-        && $fh->can('binmode') );
     return $fh, ( $ref or $filename );
 }

@@ -811,12 +806,20 @@
         # Prefilters and postfilters: The prefilter is a code reference
         # that will be applied to the source before tidying, and the
         # postfilter is a code reference to the result before outputting.
-        if ($prefilter) {
+        if ( $prefilter || ( $rOpts_character_encoding && $rOpts_character_encoding eq 'utf8' ) ) {
             my $buf = '';
             while ( my $line = $source_object->get_line() ) {
                 $buf .= $line;
             }
-            $buf = $prefilter->($buf);
+
+            $buf = $prefilter->($buf) if $prefilter;
+
+            if ( $rOpts_character_encoding && $rOpts_character_encoding eq 'utf8' && !utf8::is_utf8($buf) ) {
+                eval {
+                    $buf = Encode::decode('UTF-8', $buf, Encode::FB_CROAK | Encode::LEAVE_SRC);
+                };
+                Die "unable to decode source\n" if $@;
+            }

             $source_object = Perl::Tidy::LineSource->new( \$buf, $rOpts,
                 $rpending_logfile_message );
@@ -3935,7 +3938,10 @@
         $output_file_open = 1;
         if ($binmode) {
             if ( ref($fh) eq 'IO::File' ) {
-                binmode $fh;
+                if ( $rOpts->{'character-encoding'} && $rOpts->{'character-encoding'} eq 'utf8' ) {
+                    binmode $fh, ":encoding(UTF-8)";
+                }
+                else { binmode $fh }
             }
             if ( $output_file eq '-' ) { binmode STDOUT }
         }
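Steve's observation that the output file was not UTF-8 is a side effect of the input fix. Once the source has been decoded, the tidied text is a character string. A handle in plain binary mode writes characters below U+0100, such as ü, as single Latin-1 bytes. The sketch below compares plain binmode with the :encoding(UTF-8) layer that this patch applies. It uses in-memory IO::File handles as a stand-in for real output files; that substitution is an assumption made only so the example is self-contained.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use IO::File;

# The tidied text after the source has been decoded: a character string.
my $text = "F\x{fc}r Elise\n";

# Old behaviour: plain binmode. Characters below U+0100 such as "ü" are
# written as single Latin-1 bytes, so the result is not valid UTF-8.
my $plain    = '';
my $fh_plain = IO::File->new( \$plain, '>' ) or die "open: $!";
$fh_plain->binmode;
$fh_plain->print($text);
$fh_plain->close;

# Patched behaviour: push the :encoding(UTF-8) layer before writing.
my $encoded = '';
my $fh_utf8 = IO::File->new( \$encoded, '>' ) or die "open: $!";
$fh_utf8->binmode(':encoding(UTF-8)');
$fh_utf8->print($text);
$fh_utf8->close;

printf "plain: %d bytes, UTF-8: %d bytes\n", length($plain), length($encoded);
```

The layer has to be pushed before the first print; closing the handle flushes the encoded bytes into the target.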
Version 20160301 of perltidy has the latest patch, thanks. I think this issue is closed now but will leave the RT ticket open for a while.
From: qsimpleq
On Sat Feb 27 12:50:24 2016, SHANCOCK wrote:
> this issue is closed now but will leave the RT ticket open for a
> while.

And it was the right decision: -st -utf8 was still broken. Fixed by the attached patch.
Subject: utf8.diff
--- Tidy.pm_old 2016-03-01 19:46:22.000000000 +0500
+++ Tidy.pm 2016-04-25 16:04:09.984426450 +0500
@@ -3955,15 +3959,13 @@
         unless ($fh) { Perl::Tidy::Die "Cannot write to output stream\n";
         }
         $output_file_open = 1;
         if ($binmode) {
-            if ( ref($fh) eq 'IO::File' ) {
-                if ( $rOpts->{'character-encoding'}
-                    && $rOpts->{'character-encoding'} eq 'utf8' )
+            if ( $rOpts->{'character-encoding'}
+                && $rOpts->{'character-encoding'} eq 'utf8' )
             {
-                    binmode $fh, ":encoding(UTF-8)";
+                if ( ref($fh) eq 'IO::File' ) { $fh->binmode(":encoding(UTF-8)"); }
+                elsif ( $output_file eq '-' ) { binmode STDOUT, ":encoding(UTF-8)"; }
             }
-                else { binmode $fh }
-            }
-            if ( $output_file eq '-' ) { binmode STDOUT }
+            elsif ( $output_file eq '-' ) { binmode STDOUT }
         }
     }
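The remaining -st problem came from how the 20160301 code split the output handles. Only handles for which ref($fh) eq 'IO::File' received the :encoding(UTF-8) layer. Output to standard output ('-', as with -st) went through a separate plain binmode STDOUT call, so it was never UTF-8 encoded. The sketch below uses a hypothetical helper, apply_output_layer, to show that binmode() itself accepts either kind of handle. The in-memory handles stand in for STDOUT and a named output file and are assumptions for illustration only. The patch keeps two explicit branches; the helper is just one alternative way to express the same decision.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use IO::File;

# Hypothetical helper (not Perl::Tidy's API): choose the output layer from
# the encoding option and apply it to any kind of handle. CORE::binmode
# accepts glob references and IO::File objects alike, so no
# ref($fh) eq 'IO::File' test is needed.
sub apply_output_layer {
    my ( $fh, $encoding ) = @_;
    my $layer =
      ( defined $encoding && $encoding eq 'utf8' ) ? ':encoding(UTF-8)' : ':raw';
    binmode( $fh, $layer ) or die "binmode failed: $!";
    return $fh;
}

my $text = "F\x{fc}r Elise\n";

# A plain glob reference, the same kind of handle as \*STDOUT.
open( my $glob_fh, '>', \my $from_glob ) or die "open: $!";

# An IO::File object, assumed to resemble the handle for a named output file.
my $from_obj = '';
my $obj_fh   = IO::File->new( \$from_obj, '>' ) or die "open: $!";

for my $fh ( $glob_fh, $obj_fh ) {
    apply_output_layer( $fh, 'utf8' );
    print {$fh} $text;
    close($fh) or die "close: $!";
}

printf "%s: %d bytes, %s: %d bytes\n",
  ref($glob_fh), length($from_glob), ref($obj_fh), length($from_obj);
```

Calling apply_output_layer(\*STDOUT, 'utf8') would cover the -st case in the same way.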
Subject: Re: [rt.cpan.org #32905] Broken handling of UTF-8 strings
Date: Mon, 25 Apr 2016 12:22:01 -0700
From: Steven Hancock <perltidy [...] users.sourceforge.net>
Thanks for the report and patch.

Steve

On Mon, Apr 25, 2016 at 4:40 AM, http://mrakobes86reg.id.bk.ru/ via RT <bug-Perl-Tidy@rt.cpan.org> wrote:

> Queue: Perl-Tidy
> Ticket <URL: https://rt.cpan.org/Ticket/Display.html?id=32905 >
>
> On Sat Feb 27 12:50:24 2016, SHANCOCK wrote:
> > this issue is closed now but will leave the RT ticket open for a
> > while.
>
> And it was the right decision. -st -utf8 continued to be broken. fixed by
> patch
>
> --- Tidy.pm_old 2016-03-01 19:46:22.000000000 +0500
> +++ Tidy.pm 2016-04-25 16:04:09.984426450 +0500
> @@ -3955,15 +3959,13 @@
>          unless ($fh) { Perl::Tidy::Die "Cannot write to output stream\n";
>          }
>          $output_file_open = 1;
>          if ($binmode) {
> -            if ( ref($fh) eq 'IO::File' ) {
> -                if ( $rOpts->{'character-encoding'}
> -                    && $rOpts->{'character-encoding'} eq 'utf8' )
> +            if ( $rOpts->{'character-encoding'}
> +                && $rOpts->{'character-encoding'} eq 'utf8' )
>              {
> -                    binmode $fh, ":encoding(UTF-8)";
> +                if ( ref($fh) eq 'IO::File' ) { $fh->binmode(":encoding(UTF-8)"); }
> +                elsif ( $output_file eq '-' ) { binmode STDOUT, ":encoding(UTF-8)"; }
>              }
> -                else { binmode $fh }
> -            }
> -            if ( $output_file eq '-' ) { binmode STDOUT }
> +            elsif ( $output_file eq '-' ) { binmode STDOUT }
>          }
>      }
This patch is implemented in version 20170521.

