This queue is for tickets about the File-Slurp CPAN distribution.

Report information
The Basics
Id: 84918
Status: open
Priority: 0/
Queue: File-Slurp

People
Owner: cwhitener [...] gmail.com
Requestors: corion [...] corion.net
Cc:
AdminCc:

Bug Information
Severity: Important
Broken in: (no value)
Fixed in: (no value)



From corion [...] corion.net Mon Apr 29 15:50:44 2013
MIME-Version: 1.0
content-type: text/plain; charset="utf-8"; format="flowed"
Message-ID: <517ECEFA.7030708 [...] corion.net>
Subject: read_file() ignores binmode option for short files
User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:17.0) Gecko/20130328 Thunderbird/17.0.5
Date: Mon, 29 Apr 2013 21:50:18 +0200
To: bug-File-Slurp [...] rt.cpan.org
Content-Transfer-Encoding: 7bit
From: Max Maischein <corion [...] corion.net>
X-RT-Original-Encoding: iso-8859-15
X-RT-Interface: Email
Content-Length: 762
Hello,

thanks for writing File::Slurp. I noticed a bug in File::Slurp which leads to bad data being read. The binmode option is ignored in the code path for short files. Especially when reading and writing text files on Windows using {binmode => ':raw'}, but also when processing UTF-8 files, this is quite bad.

The quick workaround is to simply delete that wrong optimization at the start of read_file().

If you want to keep the code path for short files, you will have to come up with your own way of reimplementing IO layers, or at least detect :raw and likely :utf-8 layers and act on them appropriately. Especially the line to "fix" Windows input does not seem prudent:

    $buf =~ s/\015\012/\n/g if $is_win32 ;

Thanks for looking at this,
-max
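To illustrate why that substitution is risky for data requested with :raw, here is a minimal sketch. The helper name fixup_newlines and the sample payload are assumptions made for this example; it is not File::Slurp's actual code path, only the same regex applied without looking at the requested layers.

```perl
use strict;
use warnings;

# Hypothetical helper: applies the substitution quoted in the report
# unconditionally, the way a short-file fast path would if it never
# consulted the binmode option.
sub fixup_newlines {
    my ($buf) = @_;
    $buf =~ s/\015\012/\n/g;
    return $buf;
}

# A binary payload that happens to contain the byte pair 0x0D 0x0A.
my $raw   = "\x89PNG\x0D\x0A\x1A\x0A";
my $fixed = fixup_newlines($raw);

# Show both versions as hex so the dropped byte is visible.
printf "before: %s (%d bytes)\n", unpack('H*', $raw),   length $raw;
printf "after:  %s (%d bytes)\n", unpack('H*', $fixed), length $fixed;
```

A caller who asked for :raw expects the bytes back unchanged, so any rewrite of the buffer has to depend on the layers that were actually requested.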
MIME-Version: 1.0
In-Reply-To: <517ECEFA.7030708 [...] corion.net>
X-Mailer: MIME-tools 5.504 (Entity 5.504)
X-RT-Interface: Web
References: <517ECEFA.7030708 [...] corion.net>
Content-Type: multipart/mixed; boundary="----------=_1540073465-10014-2"
Message-ID: <rt-4.0.18-10014-1540073465-1771.84918-0-0 [...] rt.cpan.org>
X-RT-Original-Encoding: utf-8
X-RT-Encrypt: 0
X-RT-Sign: 0
Content-Length: 0
Content-Disposition: inline
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: binary
X-RT-Original-Encoding: utf-8
Content-Length: 714
> I noticed a bug in File::Slurp which leads to bad data being read. The
> binmode option is ignored in the code path for short files. Especially
> when reading and writing text files on Windows using {binmode =>
> ':raw'}, but also when processing UTF-8 files, this is quite bad.

Reviewing this bug, the problem is not in the short path, but in the long path, which does not cope with

    read_file( 'file.txt', { binmode => ':crlf' });

or

    read_file( 'file.txt', { binmode => ':encoding(Latin-1)' });

on Windows, due to the hand-rolled "fixup" of newlines under the assumption that all binmode arguments need to trigger this.

I've attached a test file for this. The tests fail under Windows currently.
MIME-Version: 1.0
Subject: newline_binmode.t
X-Mailer: MIME-tools 5.504 (Entity 5.504)
Content-Type: application/octet-stream; name="newline_binmode.t"
Content-Disposition: inline; filename="newline_binmode.t"
Content-Transfer-Encoding: base64
Content-Length: 1352
use strict;
use warnings;
use IO::Handle ();

use File::Basename ();
use File::Spec ();
use lib File::Spec->catdir(File::Spec->rel2abs(File::Basename::dirname(__FILE__)), 'lib');
use FileSlurpTest qw(temp_file_path);

use File::Slurp qw(read_file write_file);
use Test::More;

plan tests => 6;

my $binmode;
for (':encoding(Latin-1)', ':crlf', ':raw') {
    $binmode = $_;
    my $data = "\n\n\n";
    my $file_name = temp_file_path();

    stdio_write_file($file_name, $data);
    my $slurped_data = read_file($file_name, { binmode => $binmode });

    my $stdio_slurped_data = stdio_read_file( $file_name ) ;

    print 'data ', unpack( 'H*', $data), "\n",
        'slurp ', unpack('H*', $slurped_data), "\n",
        'stdio slurp ', unpack('H*', $stdio_slurped_data), "\n";

    is($data, $slurped_data, "slurp ($binmode)");

    write_file($file_name, { binmode => $binmode }, $data );
    $slurped_data = stdio_read_file($file_name);

    is($data, $slurped_data, "spew ($binmode)");

    unlink $file_name;
};

sub stdio_write_file {
    my ($file_name, $data) = @_;

    open (my $fh, '>', $file_name) || die "Couldn't create $file_name: $!";
    binmode $fh, $binmode;
    $fh->print($data);
}

sub stdio_read_file {
    my ($file_name) = @_;

    open (my $fh, '<', $file_name ) || die "Couldn't open $file_name: $!";
    binmode $fh, $binmode;
    local $/;
    my $data = <$fh>;
    return $data;
}
MIME-Version: 1.0
In-Reply-To: <517ECEFA.7030708 [...] corion.net>
X-Mailer: MIME-tools 5.504 (Entity 5.504)
Content-Disposition: inline
X-RT-Interface: Web
References: <517ECEFA.7030708 [...] corion.net>
Content-Type: text/plain; charset="utf-8"
Message-ID: <rt-4.0.18-17095-1549152500-1748.84918-0-0 [...] rt.cpan.org>
Content-Transfer-Encoding: binary
X-RT-Original-Encoding: utf-8
X-RT-Encrypt: 0
X-RT-Sign: 0
Content-Length: 108
Hi Corion,

I believe this to be resolved in the current release that refactored read_file().

Thanks,
Chase
MIME-Version: 1.0
In-Reply-To: <rt-4.0.18-17095-1549152500-300.84918-6-0 [...] rt.cpan.org>
X-RT-Interface: API
References: <RT-Ticket-84918 [...] rt.cpan.org> <517ECEFA.7030708 [...] corion.net> <rt-4.0.18-17095-1549152500-300.84918-6-0 [...] rt.cpan.org>
Message-ID: <87b4abc2-fffe-ab77-81e8-8695cd05b321 [...] corion.net>
content-type: text/plain; charset="utf-8"
Subject: Re: [rt.cpan.org #84918] read_file() ignores binmode option for short files
User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:60.0) Gecko/20100101 Thunderbird/60.5.0
Date: Sun, 3 Feb 2019 07:40:33 +0100
To: bug-File-Slurp [...] rt.cpan.org
Content-Transfer-Encoding: 7bit
From: Max Maischein <corion [...] corion.net>
RT-Message-ID: <rt-4.0.18-19702-1549176047-332.84918-0-0 [...] rt.cpan.org>
Content-Length: 625
Hello Chase,

thank you very much for working on File::Slurp!

Unfortunately, the problem is not fixed (see the test attached to this ticket and to https://github.com/perhunter/slurp/pull/19 ).

The new version does not cope properly with {binmode => ':encoding(Latin-1)'} for example, because it does _not_ apply the :crlf handling in that situation when normal reading would.

-max

On 03.02.2019 at 01:08, Chase Whitener via RT wrote:
> <URL: https://rt.cpan.org/Ticket/Display.html?id=84918 >
>
> Hi Corion,
>
> I believe this to be resolved in the current release that refactored read_file().
>
> Thanks,
> Chase
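The "normal reading" behavior referred to above can be sketched with plain Perl I/O. This is only an illustration under stated assumptions: the handle is opened with an explicit :crlf layer to stand in for the Windows default, so that the sketch behaves the same on Unix, and the helper name stdio_read is invented for this example.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Write the bytes E9 0D 0A (Latin-1 "e acute" followed by CRLF)
# without any newline translation.
my ($tmp_fh, $file_name) = tempfile(UNLINK => 1);
binmode $tmp_fh, ':raw';
print {$tmp_fh} "\xE9\015\012";
close $tmp_fh or die "Couldn't close $file_name: $!";

# Hypothetical helper. Opening with :crlf is an assumption that mimics the
# default layer on Windows; binmode then pushes the requested layer on top
# of it instead of replacing it.
sub stdio_read {
    my ($path, $binmode) = @_;
    open(my $fh, '<:crlf', $path) or die "Couldn't open $path: $!";
    binmode $fh, $binmode if defined $binmode;
    local $/;    # slurp mode
    my $data = <$fh>;
    close $fh;
    return $data;
}

my $text = stdio_read($file_name, ':encoding(Latin-1)');
print join(' ', map { sprintf '%02X', ord } split //, $text), "\n";
```

The point of the sketch is that pushing an encoding layer leaves the :crlf translation underneath it in place, which is the behavior the attached test expects read_file() to match.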
MIME-Version: 1.0
In-Reply-To: <517ECEFA.7030708 [...] corion.net>
X-Mailer: MIME-tools 5.504 (Entity 5.504)
Content-Disposition: inline
X-RT-Interface: Web
References: <517ECEFA.7030708 [...] corion.net>
Content-Type: text/plain; charset="utf-8"
Message-ID: <rt-4.0.18-20746-1552139659-1580.84918-0-0 [...] rt.cpan.org>
Content-Transfer-Encoding: binary
X-RT-Original-Encoding: utf-8
X-RT-Encrypt: 0
X-RT-Sign: 0
Content-Length: 308
This problem still exists. Please see the attached test.

How can I help so that instead of closing tickets you run the tests I include? I have already created a pull request to add the test at https://github.com/perhunter/slurp/pull/19 , which remains unmerged. I'm at a loss how to support you better here.
MIME-Version: 1.0
In-Reply-To: <rt-4.0.18-20746-1552139659-1580.84918-0-0 [...] rt.cpan.org>
X-Mailer: MIME-tools 5.504 (Entity 5.504)
Content-Disposition: inline
X-RT-Interface: Web
References: <517ECEFA.7030708 [...] corion.net> <rt-4.0.18-20746-1552139659-1580.84918-0-0 [...] rt.cpan.org>
Content-Type: text/plain; charset="utf-8"
Message-ID: <rt-4.0.18-19663-1552153473-1680.84918-0-0 [...] rt.cpan.org>
Content-Transfer-Encoding: binary
X-RT-Original-Encoding: utf-8
X-RT-Encrypt: 0
X-RT-Sign: 0
Content-Length: 417
Apologies for closing the ticket out prematurely.

On Sat Mar 09 08:54:19 2019, CORION wrote:
> This problem still exists. Please see the attached test.
>
> How can I help so that instead of closing tickets you run the tests I
> include? I have already created a pull request to add the test at
> https://github.com/perhunter/slurp/pull/19 , which remains unmerged.
> I'm at a loss how to support you better here.
MIME-Version: 1.0
In-Reply-To: <517ECEFA.7030708 [...] corion.net>
X-Mailer: MIME-tools 5.504 (Entity 5.504)
Content-Disposition: inline
X-RT-Interface: Web
References: <517ECEFA.7030708 [...] corion.net>
Content-Type: text/plain; charset="utf-8"
Message-ID: <rt-4.0.18-7374-1554412657-1055.84918-0-0 [...] rt.cpan.org>
Content-Transfer-Encoding: binary
X-RT-Original-Encoding: utf-8
X-RT-Encrypt: 0
X-RT-Sign: 0
Content-Length: 6298
9:11 PM <genio> Does anyone quite know how I should go about https://rt.cpan.org/Public/Bug/Display.html?id=84918 ?
9:11 PM <dipsy> [ Bug #84918 for File-Slurp: read_file() ignores binmode option for short files ]
9:11 PM <genio> No matter what I do to accommodate that problem, I break binmode.t on Windows
9:14 PM <Grinnz> so if i'm reading correctly the problem is that it opens handles with :raw by default
9:14 PM <Grinnz> thus default layers never get applied
9:15 PM <genio> yea
9:15 PM <shadowpaste> "genio" at 217.168.150.38 pasted "attempt_one.diff" (24 lines) at http://paste.scsys.co.uk/583733
9:15 PM <dipsy> [ magnet_web paste from "genio" at 217.168.150.38... ]
9:16 PM <Grinnz> do you know why :raw is even used?
9:16 PM <genio> That allows the current test suite to pass, but makes his attached test fail on the :encoding(Latin-1) test
9:16 PM <Grinnz> you know what better question, what does File::Slurper do
9:16 PM <genio> Grinnz: to not break back-compat with the old way of doing things and to deal with file handles and file paths similarly
9:17 PM <Grinnz> right, File::Slurper has a separate read_text that can care about default layers
9:17 PM <genio> I don't know that it's possible to fix without killing bugwards compatibility
9:17 PM <Grinnz> though actually even there it has a special case for the crlf layer on windows
9:18 PM <Grinnz> so, i guess you just gotta do that
9:18 PM <Grinnz> but yeah, it will result in different results on windows so it would be backwards incompatible
9:19 PM <Grinnz> oh, maybe not since it did that postprocessing to emulate crlf
9:19 PM <Grinnz> but only without layers passed
9:20 PM <Grinnz> i'd say it's impossible to know in File::Slurper whether the user desires text or binary processing of the file in general
9:20 PM <Grinnz> er, in File::Slurp
9:21 PM <Grinnz> i guess if they pass :utf8 or any :encoding layer you can assume text processing
9:22 PM <Grinnz> but according to the workaround that was there, basically any binmode passed would assume binary processing
9:23 PM <Grinnz> so tldr: yes it would be incompatible, tell them to use File::Slurper for a less buggy module
9:24 PM <genio> Either that or just go ahead and break bugwards compatibility on Windows because who the hell's been using windows!? :)
9:24 PM <genio> I kid. mostly
9:25 PM <Grinnz> you've got a few hours left in the states for an april fools release that just forwards to File::Slurper ;)
9:40 PM <genio> so, I guess I'm at a stopping point for getting any work done on File::Slurp. anything I do at this point breaks something
9:43 PM <Grinnz> i think "making it not completely break" is plenty
9:52 PM <haarg> binmode.t tests that providing a binmode at all causes it to avoid newline translation
9:53 PM <haarg> so yeah i don't think there's anything you can do without breaking one of the existing tests
9:53 PM <haarg> aside from documenting it
9:55 PM <Grinnz> i assume that "feature" stems from the misconception of binmode being used only to set the raw layer
9:58 PM <haarg> well, that test is testing :utf8
9:59 PM <Grinnz> that's a whole other ball of fun
10:04 PM <haarg> otherwise https://gist.github.com/haarg/60d51a5c675e2076b2b8aac6b3adb3bc would be a sensible fix
10:04 PM <Grinnz> an incompatible one, but yeah
10:05 PM <haarg> breaks binmode.t on windows, yeah
10:06 PM <Grinnz> it would also slow down a lot of cases that don't pass binmode but should
10:13 PM <haarg> perl and io on windows are already slow enough that i doubt anyone would notice
10:13 PM <Grinnz> i was talking about on unix
10:14 PM <haarg> :raw does nothing on unix
10:14 PM <Grinnz> isn't it what binmode with no argument applies?
10:15 PM <haarg> yes
10:15 PM <haarg> and it does nothing
10:17 PM <Grinnz> i guess i'm conflating the effect on non-slurpy readline with performance
10:18 PM <haarg> it can turn off other layers that were applied, but used on its own it does nothing
10:19 PM <haarg> there is the :unix 'layer' that can be used and provide some speed improvements
10:21 PM <Grinnz> maybe that's what i'm thinking of
10:21 PM <Grinnz> layers are a fucking mess tbh
10:21 PM <haarg> yeah
10:25 PM <haarg> by default there is a unix layer and a perlio layer. the perlio layer ignores the unix layer. applying the unix layer removes the perlio layer. the utf8 layer sets a flag on the perlio layer. i think the default windows behavior works similarly but doesn't have a name that can be applied.
10:29 PM <haarg> ah on windows it gets the crlf layer instead of the perlio layer
10:29 PM <haarg> but still has the unix layer
10:29 PM <Grinnz> naturally
10:29 PM <haarg> makes perfect sense
10:29 PM <Grinnz> File::Spec does it, why not
10:30 PM <haarg> and if you use :raw, it turns off the crlf flag on the crlf layer
10:30 PM <Grinnz> 🤨
10:30 PM <haarg> it's still there though, because it's the thing actually doing the io work
...
10:32 PM <haarg> genio: document it as needing to use :raw:encoding(Latin-1) if you need consistency
10:32 PM <Grinnz> i think for File::Slurp backcompat is all it has going for it
10:33 PM <haarg> ^
10:34 PM <genio> leave it as is has been voted for. anyone want to write up a bit of documentation on how the current approach is broken for write_file() ?
...
10:36 PM <Grinnz> i thought we were talking about read_file
...
10:36 PM <genio> gah. yes. I errantly said write_file. meant read_file
10:40 PM <haarg> it's essentially: without binmode is default perl behavior (crlf translation on windows), with binmode is starting from :raw. if you want crlf translation and unicode handling, use :encoding(UTF-8):crlf
10:41 PM <Grinnz> read_file currently skips CRLF translation on Windows if any binmode is passed, even text encoding layers. For a more straightforward way to read and decode a file as text, try L<File::Slurper/read_text>.
10:41 PM <Grinnz> or yeah mention the manual use of :crlf
10:43 PM <Grinnz> applying :crlf manually won't be portable to unix though right
10:44 PM <haarg> crlf works fine on unix
10:44 PM <Grinnz> right but i mean, it will still function
10:44 PM <Grinnz> where on unix you don't want it to generally
10:45 PM <haarg> if it's a text file it won't really do any harm
10:45 PM <haarg> for read_file it's perfectly sensible to use on both windows and unix
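The rule of thumb from the end of the log (start from :raw, then add :encoding(UTF-8):crlf when text handling is wanted) can be tried with core Perl I/O alone. This is a rough sketch rather than File::Slurp's implementation; the helper names read_with_layers and code_points and the sample bytes are assumptions made for the example.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# UTF-8 bytes for U+00E4 (C3 A4) followed by CRLF, written untranslated.
my ($tmp_fh, $file_name) = tempfile(UNLINK => 1);
binmode $tmp_fh, ':raw';
print {$tmp_fh} "\xC3\xA4\015\012";
close $tmp_fh or die "Couldn't close $file_name: $!";

# Hypothetical helper: open starting from :raw, push the given layers,
# and return the slurped data plus the layer stack Perl reports.
sub read_with_layers {
    my ($path, $layers) = @_;
    open(my $fh, "<:raw$layers", $path) or die "Couldn't open $path: $!";
    my @stack = PerlIO::get_layers($fh);
    local $/;    # slurp mode
    my $data = <$fh>;
    close $fh;
    return ($data, \@stack);
}

# Render a string as space-separated code points for inspection.
sub code_points { join ' ', map { sprintf '%04X', ord } split //, $_[0] }

for my $layers (':encoding(UTF-8)', ':encoding(UTF-8):crlf') {
    my ($text, $stack) = read_with_layers($file_name, $layers);
    print "$layers\n  layers: @$stack\n  chars:  ", code_points($text), "\n";
}
```

The layer names printed by PerlIO::get_layers differ between platforms and Perl builds, so the sketch prints them for inspection instead of relying on a particular stack.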
MIME-Version: 1.0
In-Reply-To: <517ECEFA.7030708 [...] corion.net>
X-Mailer: MIME-tools 5.504 (Entity 5.504)
Content-Disposition: inline
X-RT-Interface: Web
References: <517ECEFA.7030708 [...] corion.net>
Content-Type: text/plain; charset="utf-8"
Message-ID: <rt-4.0.18-26174-1554412733-249.84918-0-0 [...] rt.cpan.org>
Content-Transfer-Encoding: binary
X-RT-Original-Encoding: utf-8
X-RT-Encrypt: 0
X-RT-Sign: 0
Content-Length: 279
Corion,

I don't think there's much we can do here. I've asked around in the hopes that it was just me not being creative enough, but it doesn't appear that there's a solution that won't break some already defined behavior. I'd be happy to entertain ideas, though.

Thanks,
Chase
MIME-Version: 1.0
In-Reply-To: <rt-4.0.18-26174-1554412734-1025.84918-6-0 [...] rt.cpan.org>
X-RT-Interface: API
References: <RT-Ticket-84918 [...] rt.cpan.org> <517ECEFA.7030708 [...] corion.net> <rt-4.0.18-26174-1554412734-1025.84918-6-0 [...] rt.cpan.org>
Message-ID: <6d812278-60c5-cafe-6e27-b99d787de843 [...] corion.net>
content-type: text/plain; charset="utf-8"
Subject: Re: [rt.cpan.org #84918] read_file() ignores binmode option for short files
User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:60.0) Gecko/20100101 Thunderbird/60.6.1
Date: Fri, 5 Apr 2019 08:26:43 +0200
To: bug-File-Slurp [...] rt.cpan.org
Content-Transfer-Encoding: 8bit
From: Max Maischein <corion [...] corion.net>
RT-Message-ID: <rt-4.0.18-4173-1554445631-1997.84918-0-0 [...] rt.cpan.org>
Content-Length: 1120
Hello Chase,

> I don't think there's much we can do here. I've asked around in the hopes that it was just me not being creative enough, but it doesn't appear that there's a solution that won't break some already defined behavior. I'd be happy to entertain ideas, though.

Yes - personally, I think that read_file() should behave like

    do { local $/; <> }

does, since originally File::Slurp had been intended as "simply" a faster implementation. But I think the design flaws in File::Slurp (and our adherence to bugwards compatibility) will prevent us from fixing this flaw on Windows. The bug has always been there on Windows, but on the upside, there is no proclamation in the module that it is the best module to read data from a file anymore.

I haven't spent time with the reworked code, but maybe simply replacing all that code with:

    if( $binmode ) {
        binmode $fh => $binmode;
    };
    { local $/; <$fh> }

makes the test suite pass, but I consider the (original) test suite potentially flawed, most likely due to unfamiliarity of the original author with Windows.

Thanks for investigating this,
-max
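Wrapped into a complete function, the suggestion above might look like the following sketch. The name simple_read_file, the options hash, and the sample file are assumptions for illustration; this is not code from File::Slurp, and it leaves out everything else read_file() supports.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical function: open the file with Perl's default layers, apply
# the caller's binmode if one was given, and slurp the whole handle.
sub simple_read_file {
    my ($file_name, $opts) = @_;
    my $binmode = $opts && $opts->{binmode};

    open(my $fh, '<', $file_name) or die "Couldn't open $file_name: $!";
    if ($binmode) {
        binmode($fh, $binmode)
            or die "Couldn't apply $binmode to $file_name: $!";
    }

    local $/;    # undefined record separator: read everything at once
    my $data = <$fh>;
    close $fh;
    return $data;
}

# Sample input: one line terminated by CRLF, written untranslated.
my ($tmp_fh, $file_name) = tempfile(UNLINK => 1);
binmode $tmp_fh, ':raw';
print {$tmp_fh} "line one\015\012";
close $tmp_fh or die "Couldn't close $file_name: $!";

for my $mode (':raw', ':crlf') {
    my $data = simple_read_file($file_name, { binmode => $mode });
    printf "%-6s %s\n", $mode, unpack('H*', $data);
}
```

With this shape, newline handling is left entirely to the layers Perl applies, so the result for a given binmode should match what a plain open, binmode and readline sequence returns on the same platform.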
MIME-Version: 1.0
In-Reply-To: <517ECEFA.7030708 [...] corion.net>
X-Mailer: MIME-tools 5.504 (Entity 5.504)
Content-Disposition: inline
X-RT-Interface: Web
References: <517ECEFA.7030708 [...] corion.net>
Content-Type: text/plain; charset="utf-8"
Message-ID: <rt-4.0.18-23503-1555771684-1703.84918-0-0 [...] rt.cpan.org>
Content-Transfer-Encoding: binary
X-RT-Original-Encoding: utf-8
X-RT-Encrypt: 0
X-RT-Sign: 0
Content-Length: 1735
Well, I don't think that it was that. We have to remember that this module was written well before Perl layers and Unicode and ... It was doing the right thing for a very long time.

So, we're still at somewhat of an impasse. While a large part of me sees it as kind of OK to break current practice in favor of doing the right thing on Windows as well, the other part of me does not agree.

Two options:

1) Keep the current functionality and document the bug on Windows. This documentation would need to explain the problem and the reasoning for not fixing it.

2) Break back-compat on Windows and let the Perl layers do the line ending conversions for us on the various user-supplied layers via binmode. This would break some assumptions about how the _module_ works on Windows, but would comply with most people's assumptions about what the code _should_ be doing. This would also need a heaping helping of documentation.

I don't want to make a BDFL, fist-on-the-table declaration about what to do here. This may be vote time. Currently, there are two maintainers, myself and Uri. I am not sure how Uri feels about this topic as we haven't yet discussed it. My work thus far has been strongly focusing on _NOT_ breaking any backwards compatibility. I would not want to do anything at all here without Uri's input and maybe some sort of other vote process in place. It's my opinion that Uri's vote overrides whatever vote I may cast and even that of any type of community vote.

Uri, what are your opinions? There's a lot going on in this ticket and much to digest. Also, just because I can only think of the two options above doesn't mean that there isn't a third or nth. If you have other options, I'd love to hear them.

Thanks,
Chase