This queue is for tickets about the PathTools CPAN distribution.

Report information
The Basics
Id: 107856
Status: open
Priority: 0/
Queue: PathTools

People
Owner: Nobody in particular
Requestors: HAKONH [...] cpan.org
Cc:
AdminCc:

Bug Information
Severity: (no value)
Broken in: (no value)
Fixed in: (no value)



Subject: abs2rel problem with unicode paths
Using Perl version 5.20.1 on a Linux laptop. When running the following script:

    use feature qw(say);
    use strict;
    use utf8;
    use warnings;

    use Env qw(HOME);
    use File::Spec::Functions qw(abs2rel);

    my $tdir = 'ø';
    my $path = "$HOME/$tdir/b/æ";
    my $base = "$HOME/$tdir";
    chdir $base;
    binmode STDOUT, ":utf8";
    say abs2rel( $path, $base );
    say abs2rel( $path );

I get output:

    b/æ
    ../ø/b/æ

Expected output:

    b/æ
    ../ø/b/æ

Assumed problem:

Line 409 in Unix.pm ( https://metacpan.org/source/SMUELLER/PathTools-3.47/lib/File/Spec/Unix.pm )

    $base = $self->_cwd() unless defined $base and length $base;

calls Cwd::getcwd() which returns bytes, this causes $base not to be recognized as a prefix for $path.

Fix: _cwd() should return unicode in this case.
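A minimal sketch of the mismatch described above, assuming file names are stored as UTF-8 bytes. It does not touch the filesystem or call abs2rel(); the byte string simply stands in for what getcwd() would return, and the sample paths are made up for illustration. The prefix check with index() is only roughly analogous to the comparison abs2rel() performs internally.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# Character string, like a literal written under "use utf8" ("\x{F8}" is ø).
my $path_chars = "/home/user/\x{F8}/b/\x{E6}";

# Byte string standing in for the value getcwd() would return
# (ASSUMPTION: the directory name is stored as UTF-8 on disk).
my $base_bytes = encode('UTF-8', "/home/user/\x{F8}");

# The byte string is not found as a prefix of the character string,
# because each UTF-8 byte is compared as if it were its own character.
my $pos_raw = index($path_chars, $base_bytes);

# After decoding the bytes back to characters the prefix matches again.
my $pos_decoded = index($path_chars, decode('UTF-8', $base_bytes));

print "prefix position with raw bytes:     $pos_raw\n";      # -1
print "prefix position with decoded bytes: $pos_decoded\n";  # 0
```

The same effect would explain why the one-argument call falls back to a "../" path: the base no longer looks like a parent of the path.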
> Expected output:
>
> b/æ
> ../ø/b/æ

Sorry that was a typo, should be:

Expected output:

    b/æ
    b/æ
On 2015-10-19 08:13:57, HAKONH wrote:
> $base = $self->_cwd() unless defined $base and length $base;
>
> calls Cwd::getcwd() which returns bytes, this causes $base not to be
> recognized as a prefix for $path..
>
> Fix: _cwd() should return unicode in this case.
I'm not sure that the code should do any utf8 decoding of filenames, at least not without being requested to -- there is no standardization for filesystems to use a specific encoding (some use UTF-16, some use latin1, some use utf-8..) and there is no way for us to tell which one is in use.
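To illustrate the ambiguity, here is a small sketch showing that one and the same byte sequence yields different character strings depending on which encoding is assumed. The two-byte sample is an arbitrary choice for illustration, not something read from a real filesystem.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Two raw bytes, as a system call might return them for a file name.
my $raw = "\xC3\xB8";

# Interpreted as UTF-8 this is a single character, U+00F8 (ø).
my $as_utf8 = decode('UTF-8', $raw);

# Interpreted as Latin-1 it is two characters, U+00C3 and U+00B8.
my $as_latin1 = decode('ISO-8859-1', $raw);

printf "UTF-8:   %d character(s): %s\n", length $as_utf8,
    join ' ', map { sprintf 'U+%04X', ord($_) } split //, $as_utf8;
printf "Latin-1: %d character(s): %s\n", length $as_latin1,
    join ' ', map { sprintf 'U+%04X', ord($_) } split //, $as_latin1;
```

Nothing in the bytes themselves says which reading is intended, so a library that decodes unconditionally has to guess.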
Subject: Re: [rt.cpan.org #107856] abs2rel problem with unicode paths
Date: Mon, 19 Oct 2015 14:04:06 -0500
From: Ken Williams <kwilliams [...] cpan.org>
Filesystems use encodings at all? I thought they just used byte sequences.

On Mon, Oct 19, 2015 at 11:13 AM, Karen Etheridge via RT <bug-PathTools@rt.cpan.org> wrote:
> Queue: PathTools
> Ticket <URL: https://rt.cpan.org/Ticket/Display.html?id=107856 >
>
> On 2015-10-19 08:13:57, HAKONH wrote:
> > $base = $self->_cwd() unless defined $base and length $base;
> >
> > calls Cwd::getcwd() which returns bytes, this causes $base not to be
> > recognized as a prefix for $path..
> >
> > Fix: _cwd() should return unicode in this case.
>
> I'm not sure that the code should do any utf8 decoding of filenames, at
> least not without being requested to -- there is no standardization for
> filesystems to use a specific encoding (some use UTF-16, some use latin1,
> some use utf-8..) and there is no way for us to tell which one is in use.
Maybe the function should then croak if the user uses the one-argument call and $path has the UTF-8 flag set? Since in this case unexpected results may occur as shown.

Accordingly, a workaround seems to be to encode $path before passing it on:

    use Encode ();   # needed for encode(), decode() and the flag constants

    my $encode_flags = Encode::FB_CROAK | Encode::LEAVE_SRC;
    $path = Encode::encode( 'UTF-8', $path, $encode_flags );
    say Encode::decode( 'UTF-8', abs2rel( $path ), $encode_flags );

Output:

    b/æ
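The workaround could be wrapped in a small helper, sketched below. The name abs2rel_utf8 is hypothetical and not part of PathTools, and the sketch assumes that file names on the system are UTF-8 encoded, which (as noted earlier in this ticket) cannot be detected in general. The sample paths are made up for illustration.

```perl
use strict;
use warnings;
use Encode ();
use File::Spec::Functions qw(abs2rel);

# Hypothetical helper (not part of PathTools): accepts character strings,
# passes UTF-8 bytes to abs2rel(), and decodes the result to characters.
# ASSUMPTION: file names on this system are stored as UTF-8.
sub abs2rel_utf8 {
    my ($path, $base) = @_;
    my $flags = Encode::FB_CROAK() | Encode::LEAVE_SRC();

    # Encode the path, and the base only if one was actually supplied,
    # so that the one-argument form still falls back to the cwd.
    my @args = ( Encode::encode('UTF-8', $path, $flags) );
    push @args, Encode::encode('UTF-8', $base, $flags)
        if defined $base && length $base;

    return Encode::decode('UTF-8', abs2rel(@args), $flags);
}

# Usage with an explicit base, so the result does not depend on the cwd.
my $rel = abs2rel_utf8("/home/user/\x{F8}/b/\x{E6}", "/home/user/\x{F8}");
binmode STDOUT, ':encoding(UTF-8)';
print "$rel\n";
```

On a Unix-like system the usage line should print b/æ. Because everything handed to abs2rel() is a byte string, the value returned by getcwd() in the one-argument case is compared like with like.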

