 

This queue is for tickets about the Module-Build CPAN distribution.

Report information
The Basics
Id: 106813
Status: new
Priority: 0/
Queue: Module-Build

People
Owner: Nobody in particular
Requestors: ntyni [...] iki.fi
Cc:
AdminCc:

Bug Information
Severity: Wishlist
Broken in: 0.4214
Fixed in: (no value)

Attachments
0001-Sort-file-lists-generated-by-rscan_dir.patch
0003-Preprocess-file-lists-generated-by-rscan_dir-to-sort.patch



Subject: Deterministic linking order
While working on the "reproducible builds" effort [0], we have noticed that the linking order of object files in Module::Build::c_link() depends on readdir() order, which is nondeterministic. This affects the generated binary, rendering it non-reproducible.

The nondeterminism originates in rscan_dir(). The attached patch makes it return its file lists in sorted order. Some alternative fixes would be to call File::Find with the "preprocess" argument to sort the list, or sort the list of object files in process_support_files() or later in c_link().

It's not clear to me if the latter options are safe, or if a distribution might inject its own list of object files and expect their order to be preserved. In contrast, since there's no existing guarantee of the order of rscan_dir() results, it's clearly safe. The downside is a number of probably unnecessary sort() calls when rscan_dir() gets called in other contexts.

I'm happy to rework the patch to one of the other alternatives, and make a GitHub pull request out of that if you like. Please just let me know what kind of fix you would prefer.

Thanks for your work on Module-Build!

[0] https://wiki.debian.org/ReproducibleBuilds
Subject: 0001-Sort-file-lists-generated-by-rscan_dir.patch
From 7bfcb26d8e314bce37aeeef4048f99b66fcdfbbc Mon Sep 17 00:00:00 2001
From: Niko Tyni <ntyni@debian.org>
Date: Tue, 1 Sep 2015 22:05:27 +0300
Subject: [PATCH] Sort file lists generated by rscan_dir()

The rscan_dir() function traverses a directory with File::Find,
which returns files in readdir() order. This order is nondeterministic
and depends on the file system.

The lists are used, among other things, to find C files to compile
(in process_support_files()) and later to link (in c_link()). The
linking order affects the generated binary, essentially rendering
it nondeterministic and breaking reproducibility.
---
 lib/Module/Build/Base.pm | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/lib/Module/Build/Base.pm b/lib/Module/Build/Base.pm
index 1cbc61e..d9ea82f 100644
--- a/lib/Module/Build/Base.pm
+++ b/lib/Module/Build/Base.pm
@@ -5255,7 +5255,7 @@ sub rscan_dir {
     die "Unknown pattern type";
 
   File::Find::find({wanted => $subr, no_chdir => 1}, $dir);
-  return \@result;
+  return [ sort @result ];
 }
 
 sub delete_filetree {
-- 
2.1.4
Subject: Re: [rt.cpan.org #106813] AutoReply: Deterministic linking order
Date: Sun, 13 Sep 2015 23:07:58 +0300
To: Bugs in Module-Build via RT <bug-Module-Build [...] rt.cpan.org>
From: Niko Tyni <ntyni [...] iki.fi>
Unfortunately it turns out that my proposed change wasn't safe after all. Apparently File::Find::find() will always list regular files before subdirectories, and the sorting destroys this property. We've tested the patch on Debian unstable, and at least one package build broke because of this.

I'll try to come up with a better patch and follow up on this ticket.

Apologies for the inconvenience.
-- 
Niko Tyni ntyni@debian.org
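To illustrate the problem described above, here is a minimal sketch that needs no file system access. The path names are hypothetical, and the "traversal order" list is simply written out by hand to mimic the files-before-subdirectories behaviour reported in this ticket; it is not output captured from the failing package.

```perl
use strict;
use warnings;

# Hypothetical traversal result: the plain files of a directory come
# first, followed by the contents of its subdirectories.
my @found = (
    'lib/Foo.pm',
    'lib/Zed.pm',
    'lib/Foo/Bar.pm',
    'lib/Foo/Baz.pm',
);

# What the first patch effectively did: sort the full path names.
my @sorted = sort @found;

# Returns true if no file directly inside $dir appears after an entry
# that lives in one of its subdirectories.
sub files_before_subdirs {
    my ($dir, @paths) = @_;
    my $seen_deeper = 0;
    for my $path (@paths) {
        (my $rel = $path) =~ s{^\Q$dir\E/}{};
        if ($rel =~ m{/}) {
            $seen_deeper = 1;      # entry from a subdirectory
        }
        elsif ($seen_deeper) {
            return 0;              # top-level file after a deeper entry
        }
    }
    return 1;
}

printf "traversal order keeps files first: %s\n",
    files_before_subdirs('lib', @found)  ? 'yes' : 'no';
printf "sorted order keeps files first:    %s\n",
    files_before_subdirs('lib', @sorted) ? 'yes' : 'no';
```

Sorting whole path names places lib/Foo/Bar.pm between lib/Foo.pm and lib/Zed.pm, so any caller that relied on seeing a directory's own files before descending would observe a different order.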
From: ntyni [...] iki.fi
On Sun Sep 13 16:08:16 2015, ntyni@iki.fi wrote:
> Unfortunately it turns out that my proposed change wasn't safe after all.
> I'll try to come up with a better patch and follow up on this ticket.
Revised patch attached; this should work better. We've tested it for a week now without any regressions.
Subject: 0003-Preprocess-file-lists-generated-by-rscan_dir-to-sort.patch
From: Niko Tyni <ntyni@debian.org>
Date: Tue, 1 Sep 2015 22:05:27 +0300
Subject: [PATCH] Preprocess file lists generated by rscan_dir() to sort them

The rscan_dir() function traverses a directory with File::Find,
which returns files in readdir() order. This order is nondeterministic
and depends on the file system.

The lists are used, among other things, to find C files to compile
(in process_support_files()) and later to link (in c_link()). The
linking order affects the generated binary, essentially rendering
it nondeterministic and breaking reproducibility.

Bug-Debian: https://bugs.debian.org/797709
Bug: https://rt.cpan.org/Public/Bug/Display.html?id=106813
---
 lib/Module/Build/Base.pm | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/lib/Module/Build/Base.pm b/lib/Module/Build/Base.pm
--- a/lib/Module/Build/Base.pm
+++ b/lib/Module/Build/Base.pm
@@ -5254,7 +5254,7 @@ sub rscan_dir {
     ref($pattern) eq 'CODE' ? sub {push @result, $File::Find::name if $pattern->()} :
     die "Unknown pattern type";
 
-  File::Find::find({wanted => $subr, no_chdir => 1}, $dir);
+  File::Find::find({wanted => $subr, no_chdir => 1, preprocess => sub { sort @_ }}, $dir);
   return \@result;
 }
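For anyone who wants to try the "preprocess" approach outside of Module::Build, the following is a rough stand-alone sketch. The scan_sorted() function is a simplified, hypothetical stand-in for rscan_dir() (it only matches *.c files), and the temporary tree and its file names are invented for the example. The File::Find options used (wanted, no_chdir, preprocess) are the same ones that appear in the patch.

```perl
use strict;
use warnings;
use File::Find ();
use File::Spec;
use File::Temp qw(tempdir);

# Build a small throwaway tree; the files are created in a deliberately
# unsorted order.
my $root = tempdir(CLEANUP => 1);
mkdir(File::Spec->catdir($root, 'sub')) or die "mkdir: $!";
for my $rel ('c.c', 'a.c', 'b.c', 'sub/z.c', 'sub/y.c') {
    my $path = File::Spec->catfile($root, split m{/}, $rel);
    open my $fh, '>', $path or die "open $path: $!";
    close $fh;
}

# Simplified stand-in for rscan_dir(): collect every *.c file below $dir.
sub scan_sorted {
    my ($dir) = @_;
    my @result;
    File::Find::find(
        {
            wanted => sub {
                push @result, $File::Find::name if -f $_ && /\.c\z/;
            },
            no_chdir   => 1,
            # Sort the entries of each directory before File::Find
            # processes them, instead of sorting the final path list.
            preprocess => sub { sort @_ },
        },
        $dir,
    );
    return \@result;
}

my @relative = map { File::Spec->abs2rel($_, $root) } @{ scan_sorted($root) };
print "$_\n" for @relative;
```

Because the sort is applied to each directory's entries rather than to the combined list of full paths, the traversal strategy of File::Find is left alone and only the readdir() order within a directory is replaced by a fixed one.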

