Skip Menu |
 

This queue is for tickets about the Module-Build CPAN distribution.

Report information
The Basics
Id: 106813
Status: new
Priority: 0/
Queue: Module-Build

People
Owner: Nobody in particular
Requestors: ntyni [...] iki.fi
Cc:
AdminCc:

Bug Information
Severity: Wishlist
Broken in: 0.4214
Fixed in: (no value)

Attachments
0001-Sort-file-lists-generated-by-rscan_dir.patch
0003-Preprocess-file-lists-generated-by-rscan_dir-to-sort.patch



Subject: Deterministic linking order
MIME-Version: 1.0
X-Mailer: MIME-tools 5.504 (Entity 5.504)
X-RT-Interface: Web
Message-ID: <rt-4.0.18-13726-1441136258-1644.0-0-0 [...] rt.cpan.org>
X-RT-Original-Encoding: utf-8
Content-Type: multipart/mixed; boundary="----------=_1441136258-13726-2"
X-RT-Encrypt: 0
X-RT-Sign: 0
Content-Length: 0
Content-Disposition: inline
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: binary
Content-Length: 1193
Download (untitled) / with headers
text/plain 1.1k
While working on the "reproducible builds" effort [0], we have noticed that the linking order of object files in Module::Build::c_link() depends on readdir() order, which is nondeterministic. This affects the generated binary, rendering it non-reproducible. The nondeterminism originates in rscan_dir(). The attached patch makes it return its file lists in sorted order. Some alternative fixes would be to call File::Find with the "preprocess" argument to sort the list, or sort the list of object files in process_support_files() or later in c_link(). It's not clear to me if the latter options are safe, or if a distribution might inject its own list of object files and expect their order to be preserved. In contrast, since there's no existing guarantee of the order of rscan_dir() results, it's clearly safe. The downside is a number of probably unnecessary sort() calls when rscan_dir() gets called in other contexts. I'm happy to rework the patch to one of the other alternatives, and make a GitHub pull request out of that if you like. Please just let me know what kind of fix you would prefer. Thanks for your work on Module-Build! [0] https://wiki.debian.org/ReproducibleBuilds
Subject: 0001-Sort-file-lists-generated-by-rscan_dir.patch
MIME-Version: 1.0
Content-Type: text/x-diff; name="0001-Sort-file-lists-generated-by-rscan_dir.patch"
X-Mailer: MIME-tools 5.504 (Entity 5.504)
Content-Disposition: inline; filename="0001-Sort-file-lists-generated-by-rscan_dir.patch"
Content-Transfer-Encoding: binary
Content-Length: 1091
From 7bfcb26d8e314bce37aeeef4048f99b66fcdfbbc Mon Sep 17 00:00:00 2001 From: Niko Tyni <ntyni@debian.org> Date: Tue, 1 Sep 2015 22:05:27 +0300 Subject: [PATCH] Sort file lists generated by rscan_dir() The rscan_dir() function traverses a directory with File::Find, which returns files in readdir() order. This order is nondeterministic and depends on the file system. The lists are used, among other things, to find C files to compile (in process_support_files()) and later to link (in c_link()). The linking order affects the generated binary, essentially rendering it nondeterministic and breaking reproducibility. --- lib/Module/Build/Base.pm | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/lib/Module/Build/Base.pm b/lib/Module/Build/Base.pm index 1cbc61e..d9ea82f 100644 --- a/lib/Module/Build/Base.pm +++ b/lib/Module/Build/Base.pm @@ -5255,7 +5255,7 @@ sub rscan_dir { die "Unknown pattern type"; File::Find::find({wanted => $subr, no_chdir => 1}, $dir); - return \@result; + return [ sort @result ]; } sub delete_filetree { -- 2.1.4
MIME-Version: 1.0
X-Spam-Status: No, score=-7.98 tagged_above=-99.9 required=10 tests=[AWL=1.509, BAYES_00=-1.9, FROM_OUR_RT=-4, RCVD_IN_DNSWL_MED=-2.3, RP_MATCHES_RCVD=-1.289] autolearn=ham
In-Reply-To: <rt-4.0.18-13726-1441136258-891.106813-3-0 [...] rt.cpan.org>
Content-Disposition: inline
X-Spam-Flag: NO
X-RT-Interface: API
References: <RT-Ticket-106813 [...] rt.cpan.org> <rt-4.0.18-13726-1441136258-891.106813-3-0 [...] rt.cpan.org>
X-Virus-Scanned: Debian amavisd-new at bestpractical.com
Message-ID: <20150913200758.GA3228 [...] estella.local.invalid>
content-type: text/plain; charset="utf-8"
X-RT-Original-Encoding: utf-8
X-Spam-Score: -7.98
Received: from localhost (localhost [127.0.0.1]) by hipster.bestpractical.com (Postfix) with ESMTP id BBAE72403B5 for <cpan-bug+Module-Build [...] hipster.bestpractical.com>; Sun, 13 Sep 2015 16:08:15 -0400 (EDT)
Received: from hipster.bestpractical.com ([127.0.0.1]) by localhost (hipster.bestpractical.com [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id NCVqCB-eXBEX for <cpan-bug+Module-Build [...] hipster.bestpractical.com>; Sun, 13 Sep 2015 16:08:14 -0400 (EDT)
Received: from la.mx.develooper.com (x1.develooper.com [207.171.7.70]) by hipster.bestpractical.com (Postfix) with SMTP id E0D42240029 for <bug-Module-Build [...] rt.cpan.org>; Sun, 13 Sep 2015 16:08:13 -0400 (EDT)
Received: (qmail 14306 invoked by alias); 13 Sep 2015 20:08:12 -0000
Received: from smtp-rs1-vallila2.fe.helsinki.fi (HELO smtp-rs1-vallila2.fe.helsinki.fi) (128.214.173.75) by la.mx.develooper.com (qpsmtpd/0.28) with ESMTP; Sun, 13 Sep 2015 13:08:04 -0700
Received: from estella.local.invalid (hy-vpn2-5.vpn.helsinki.fi [128.214.129.5]) by smtp-rs1.it.helsinki.fi (8.14.4/8.14.4) with ESMTP id t8DK7wv5004631 (version=TLSv1/SSLv3 cipher=AES128-GCM-SHA256 bits=128 verify=NOT) for <bug-Module-Build [...] rt.cpan.org>; Sun, 13 Sep 2015 23:07:59 +0300
Received: from niko by estella.local.invalid with local (Exim 4.86) (envelope-from <ntyni [...] cc.helsinki.fi>) id 1ZbDZ8-00017A-8A for bug-Module-Build [...] rt.cpan.org; Sun, 13 Sep 2015 23:07:58 +0300
Delivered-To: cpan-bug+Module-Build [...] hipster.bestpractical.com
Subject: Re: [rt.cpan.org #106813] AutoReply: Deterministic linking order
User-Agent: Mutt/1.5.23 (2014-03-12)
Return-Path: <ntyni [...] cc.helsinki.fi>
X-Spam-Check-BY: la.mx.develooper.com
X-Original-To: cpan-bug+Module-Build [...] hipster.bestpractical.com
X-RT-Mail-Extension: module-build
Date: Sun, 13 Sep 2015 23:07:58 +0300
X-Spam-Level:
To: Bugs in Module-Build via RT <bug-Module-Build [...] rt.cpan.org>
From: Niko Tyni <ntyni [...] iki.fi>
RT-Message-ID: <rt-4.0.18-32458-1442174896-1620.106813-0-0 [...] rt.cpan.org>
Content-Length: 1369
Download (untitled) / with headers
text/plain 1.3k
Show quoted text
> While working on the "reproducible builds" effort [0], we have noticed that the linking order of object files in Module::Build::c_link() depends on readdir() order, which is nondeterministic. This affects the generated binary, rendering it non-reproducible. > > The nondeterminism originates in rscan_dir(). The attached patch makes it return its file lists in sorted order. Some alternative fixes would be to call File::Find with the "preprocess" argument to sort the list, or sort the list of object files in process_support_files() or later in c_link(). > > It's not clear to me if the latter options are safe, or if a distribution might inject its own list of object files and expect their order to be preserved. In contrast, since there's no existing guarantee of the order of rscan_dir() results, it's clearly safe. The downside is a number of probably unnecessary sort() calls when rscan_dir() gets called in other contexts.
Unfortunately it turns out that my proposed change wasn't safe after all. Apparently File::Find::find() will always list regular files before subdirectories, and the sorting destroys this property. We've tested the patch on Debian unstable, and at least one package build broke because of this. I'll try to come up with a better patch and follow up on this ticket. Apologies for the inconvenience. -- Niko Tyni ntyni@debian.org
MIME-Version: 1.0
In-Reply-To: <rt-4.0.18-32458-1442174896-1620.106813-0-0 [...] rt.cpan.org>
X-Mailer: MIME-tools 5.504 (Entity 5.504)
X-RT-Interface: API
References: <RT-Ticket-106813 [...] rt.cpan.org> <rt-4.0.18-13726-1441136258-891.106813-3-0 [...] rt.cpan.org> <20150913200758.GA3228 [...] estella.local.invalid> <rt-4.0.18-32458-1442174896-1620.106813-0-0 [...] rt.cpan.org>
Content-Type: multipart/mixed; boundary="----------=_1442858322-7726-2"
Message-ID: <rt-4.0.18-7726-1442858322-1812.0-0-0 [...] rt.cpan.org>
Message-ID: <rt-4.0.18-7726-1442858322-250.106813-0-0 [...] rt.cpan.org>
X-RT-Original-Encoding: utf-8
From: ntyni [...] iki.fi
Content-Length: 0
Content-Disposition: inline
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: binary
X-RT-Original-Encoding: utf-8
Content-Length: 1310
Download (untitled) / with headers
text/plain 1.2k
On Sun Sep 13 16:08:16 2015, ntyni@iki.fi wrote: Show quoted text
> > While working on the "reproducible builds" effort [0], we have > > noticed that the linking order of object files in > > Module::Build::c_link() depends on readdir() order, which is > > nondeterministic. This affects the generated binary, rendering it > > non-reproducible. > > > > The nondeterminism originates in rscan_dir(). The attached patch > > makes it return its file lists in sorted order. Some alternative > > fixes would be to call File::Find with the "preprocess" argument to > > sort the list, or sort the list of object files in > > process_support_files() or later in c_link(). > > > > It's not clear to me if the latter options are safe, or if a > > distribution might inject its own list of object files and expect > > their order to be preserved. In contrast, since there's no existing > > guarantee of the order of rscan_dir() results, it's clearly safe. The > > downside is a number of probably unnecessary sort() calls when > > rscan_dir() gets called in other contexts.
> > Unfortunately it turns out that my proposed change wasn't safe after > all.
Show quoted text
> I'll try to come up with a better patch and follow up on this ticket.
Revised patch attached, this should work better. We've tested this for a week now without any regressions.
MIME-Version: 1.0
Subject: 0003-Preprocess-file-lists-generated-by-rscan_dir-to-sort.patch
X-Mailer: MIME-tools 5.504 (Entity 5.504)
Content-Type: text/x-diff; name="0003-Preprocess-file-lists-generated-by-rscan_dir-to-sort.patch"
Content-Disposition: inline; filename="0003-Preprocess-file-lists-generated-by-rscan_dir-to-sort.patch"
Content-Transfer-Encoding: binary
Content-Length: 1239
From: Niko Tyni <ntyni@debian.org> Date: Tue, 1 Sep 2015 22:05:27 +0300 Subject: [PATCH] Preprocess file lists generated by rscan_dir() to sort them The rscan_dir() function traverses a directory with File::Find, which returns files in readdir() order. This order is nondeterministic and depends on the file system. The lists are used, among other things, to find C files to compile (in process_support_files()) and later to link (in c_link()). The linking order affects the generated binary, essentially rendering it nondeterministic and breaking reproducibility. Bug-Debian: https://bugs.debian.org/797709 Bug: https://rt.cpan.org/Public/Bug/Display.html?id=106813 --- lib/Module/Build/Base.pm | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/lib/Module/Build/Base.pm b/lib/Module/Build/Base.pm --- a/lib/Module/Build/Base.pm +++ b/lib/Module/Build/Base.pm @@ -5254,7 +5254,7 @@ sub rscan_dir { ref($pattern) eq 'CODE' ? sub {push @result, $File::Find::name if $pattern->()} : die "Unknown pattern type"; - File::Find::find({wanted => $subr, no_chdir => 1}, $dir); + File::Find::find({wanted => $subr, no_chdir => 1, preprocess => sub { sort @_ }}, $dir); return \@result; }


This service is sponsored and maintained by Best Practical Solutions and runs on Perl.org infrastructure.

Please report any issues with rt.cpan.org to rt-cpan-admin@bestpractical.com.