This queue is for tickets about the Regexp-Grammars CPAN distribution.

Report information
Id: 125105
Status: open
Priority: Low/Low
Owner: Nobody in particular
Requestors: alexchandel [...] gmail.com

Subject: Segmentation fault at 2 GB of memory
Parsing a large file (about 317 lines, about 19670 bytes) results in a segmentation fault as soon as Perl hits ~2 GB of memory usage. It's hard to tell, but memory allocation seemed to accelerate the further along the parser got in the file.

Is there any way to force Regexp::Grammars to treat a list-like subrule as non-backtracking or atomic, like "<[statements=block_stmt]>*+" instead of "<[statements=block_stmt]>*"? Wouldn't that save on memory if Regexp::Grammars knew to fail instead of saving backtracking locations?

Tested in Perl 5.26.1, macOS 10.13.4, Regexp::Grammars 1.048.
On Tue Apr 17 16:34:13 2018, alexchandel@gmail.com wrote:
> Parsing a large file (about 317 lines, about 19670 bytes) results in a
> segmentation fault as soon as Perl hits ~2 GB of memory usage. It's
> hard to tell, but memory allocation seemed to accelerate the further
> along the parser got in the file.

Note that this happens *even with* <nocontext:>, plus the liberal use of possessive & atomic expressions in tokens to limit backtracking at the small scale.
Subject: Re: [rt.cpan.org #125105] Segmentation fault at 2 GB of memeory
Date: Wed, 18 Apr 2018 22:33:22 +1000
To: bug-Regexp-Grammars@rt.cpan.org
From: Damian Conway <damian@conway.org>
Hi Alex,

I agree that it would certainly be better if non-backtracking repetitions could be
applied to subrule calls. It would indeed reduce the parser's memory footprint
in some cases.

Unfortunately, I am not aware of any way to make something like <subrule>*+
work correctly.

There is a long-standing issue with how in-regex variable localizations are implemented
which means that they do not unwind correctly when a call to an independent
subpattern is part of a non-backtracking repetition. And because R::G uses
localized variables to implement its parse-tree stack, this issue makes it
impossible to implement <subrule>*+ properly.
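
For background, the backtracking-safe behaviour that localization normally provides inside a regex code block can be seen in a small sketch modelled on the counting example in perlre. The variable names `$cnt` and `$res` are only illustrative, and nothing here is taken from Regexp::Grammars itself.

```perl
use strict;
use warnings;

our $cnt = 0;   # package variable, so it can be localized inside the regex
our $res;       # receives the count once the whole match has succeeded

my $matched = ('a' x 8) =~ m{
    (
        a
        (?{ local $cnt = $cnt + 1 })   # undone automatically on backtracking
    )*
    aaaa
    (?{ $res = $cnt })                 # copy to a non-localized variable
}x;

print "matched=", ($matched ? 1 : 0), " count=$res\n";   # count=4
```

The greedy loop first consumes all eight characters and then gives four back so that `aaaa` can match, and each step back undoes one localized increment, so the count that survives is 4. It is this unwinding that reportedly goes wrong once the repeated item is a call to an independent subpattern inside a non-backtracking repetition.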

I raised this issue with P5P several years ago but have been unable to convince
them to change the current behaviour in such a way as to ensure that repeated
localizations unwind correctly.

I certainly understand your frustration, and I am very sorry that I have no better option
to offer you. :-(

Damian
Subject: Re: [rt.cpan.org #125105] Segmentation fault at 2 GB of memeory
Date: Wed, 18 Apr 2018 15:16:19 +0200
To: bug-Regexp-Grammars@rt.cpan.org
From: demerphq <demerphq@gmail.com>
On 18 April 2018 at 14:34, damian@conway.org via RT <bug-Regexp-Grammars@rt.cpan.org> wrote:
> I raised this issue with P5P several years ago but have been unable to
> convince them to change the current behaviour in such a way as to ensure
> that repeated localizations unwind correctly.

Can you point me at any of this discussion?

Yves

--
perl -Mre=debug -e "/just|another|perl|hacker/"
Subject: Re: [rt.cpan.org #125105] Segmentation fault at 2 GB of memeory
Date: Wed, 18 Apr 2018 23:34:32 +1000
To: bug-Regexp-Grammars@rt.cpan.org
From: Damian Conway <damian@conway.org>
Hi Yves,

> Can you point me at any of this discussion?

Most of it was in person between Rik Signes (who was Pumpking at the time)
and myself.

The formal report was much more recent, via Hugo van der Sanden:
https://rt.perl.org/Public/Bug/Display.html?id=132277

Damian
Subject: Re: [rt.cpan.org #125105] Segmentation fault at 2 GB of memeory
Date: Wed, 18 Apr 2018 13:44:47 +0000
To: bug-Regexp-Grammars@rt.cpan.org
From: demerphq <demerphq@gmail.com>
This is the first I have heard of this. Perhaps there is something we can do.

Yves

Subject: Re: [rt.cpan.org #125105] Segmentation fault at 2 GB of memeory
Date: Wed, 18 Apr 2018 23:48:45 +1000
To: bug-Regexp-Grammars@rt.cpan.org
From: Damian Conway <damian@conway.org>
> This is the first I have heard of this. 
> Perhaps there is something we can do.

It would be awesome if there were.

Hugo’s sample code demonstrates one manifestation
of the problem, but I’ll be happy to provide further examples
of the issue if you need them.

Thanks, Yves.

Damian
Subject: Re: [rt.cpan.org #125105] Segmentation fault at 2 GB of memeory
Date: Wed, 18 Apr 2018 15:30:15 +0000
To: bug-Regexp-Grammars@rt.cpan.org
From: demerphq <demerphq@gmail.com>
Yes, some tests would be very helpful.

Subject: Re: [rt.cpan.org #125105] Segmentation fault at 2 GB of memeory
Date: Thu, 19 Apr 2018 02:23:47 +1000
To: bug-Regexp-Grammars@rt.cpan.org
From: Damian Conway <damian@conway.org>
Yves,

Here is the simplest test suite I could devise that demonstrates
the problem. In my testing, it fails identically under every release
from 5.10 to 5.26.

Damian

-----cut----------cut----------cut----------cut----------cut----------cut----------cut-----


#! /usr/bin/env perl
use warnings;

#
# The pattern matching code in each pair of subtests is identical,
# except that the second subtest replaces a backtracking quantifier
# with its non-backtracking equivalent.
#
# In each case, the non-backtracking version should pass, but it fails
#

use Test::More;
plan tests => 12;

our $count;

################

subtest 'Backtracking *' => sub {
    $count = 0;
    ok "aaa" =~ m{
                    \A
                    (?&A)*    # This quantifier is backtracking
                    \z
                    (?{ is $count, 3 => "Found 3 a's" })

                    (?(DEFINE)  (?<A>  a  (?{ local $count = $count + 1 }) ) )
                }x => 'Regex matched';
};

subtest 'Non-backtracking *+' => sub {
    $count = 0;
    ok "aaa" =~ m{
                    \A
                    (?&A)*+   # This quantifier is non-backtracking
                    \z
                    (?{ is $count, 3 => "Found 3 a's" })

                    (?(DEFINE)  (?<A>  a  (?{ local $count = $count + 1 }) ) )
                }x => 'Regex matched';
};

################

subtest 'Backtracking (?:*)' => sub {
    $count = 0;
    ok "aaa" =~ m{
                    \A
                    (?: (?&A)* )   # This quantifier is backtracking
                    \z
                    (?{ is $count, 3 => "Found 3 a's" })

                    (?(DEFINE)  (?<A>  a  (?{ local $count = $count + 1 }) ) )
                }x => 'Regex matched';
};

subtest 'Non-backtracking (?>*)' => sub {
    $count = 0;
    ok "aaa" =~ m{
                    \A
                    (?> (?&A)* )   # This quantifier is non-backtracking
                    \z
                    (?{ is $count, 3 => "Found 3 a's" })

                    (?(DEFINE)  (?<A>  a  (?{ local $count = $count + 1 }) ) )
                }x => 'Regex matched';
};

################

subtest 'Backtracking +' => sub {
    $count = 0;
    ok "aaa" =~ m{
                    \A
                    (?&A)+    # This quantifier is backtracking
                    \z
                    (?{ is $count, 3 => "Found 3 a's" })

                    (?(DEFINE)  (?<A>  a  (?{ local $count = $count + 1 }) ) )
                }x => 'Regex matched';
};

subtest 'Non-backtracking ++' => sub {
    $count = 0;
    ok "aaa" =~ m{
                    \A
                    (?&A)++   # This quantifier is non-backtracking
                    \z
                    (?{ is $count, 3 => "Found 3 a's" })

                    (?(DEFINE)  (?<A>  a  (?{ local $count = $count + 1 }) ) )
                }x => 'Regex matched';
};

################

subtest 'Backtracking {3}' => sub {
    $count = 0;
    ok "aaa" =~ m{
                    \A
                    (?&A){3}    # This quantifier is backtracking
                    \z
                    (?{ is $count, 3 => "Found 3 a's" })

                    (?(DEFINE)  (?<A>  a  (?{ local $count = $count + 1 }) ) )
                }x => 'Regex matched';
};

subtest 'Non-backtracking {3}+' => sub {
    $count = 0;
    ok "aaa" =~ m{
                    \A
                    (?&A){3}+   # This quantifier is non-backtracking
                    \z
                    (?{ is $count, 3 => "Found 3 a's" })

                    (?(DEFINE)  (?<A>  a  (?{ local $count = $count + 1 }) ) )
                }x => 'Regex matched';
};

################

subtest 'Backtracking {1,3}' => sub {
    $count = 0;
    ok "aaa" =~ m{
                    \A
                    (?&A){1,3}    # This quantifier is backtracking
                    \z
                    (?{ is $count, 3 => "Found 3 a's" })

                    (?(DEFINE)  (?<A>  a  (?{ local $count = $count + 1 }) ) )
                }x => 'Regex matched';
};

subtest 'Non-backtracking {1,3}+' => sub {
    $count = 0;
    ok "aaa" =~ m{
                    \A
                    (?&A){1,3}+   # This quantifier is non-backtracking
                    \z
                    (?{ is $count, 3 => "Found 3 a's" })

                    (?(DEFINE)  (?<A>  a  (?{ local $count = $count + 1 }) ) )
                }x => 'Regex matched';
};

################

subtest 'Backtracking ?' => sub {
    $count = 0;
    ok "a" =~ m{
                    \A
                    (?&A)?    # This quantifier is backtracking
                    \z
                    (?{ is $count, 1 => "Found 1 a" })

                    (?(DEFINE)  (?<A>  a  (?{ local $count = $count + 1 }) ) )
                }x => 'Regex matched';
};

subtest 'Non-backtracking ?+' => sub {
    $count = 0;
    ok "a" =~ m{
                    \A
                    (?&A)?+   # This quantifier is non-backtracking
                    \z
                    (?{ is $count, 1 => "Found 1 a" })

                    (?(DEFINE)  (?<A>  a  (?{ local $count = $count + 1 }) ) )
                }x => 'Regex matched';
};

################


Subject: Re: [rt.cpan.org #125105] Segmentation fault at 2 GB of memeory
Date: Wed, 18 Apr 2018 18:50:25 +0200
To: bug-Regexp-Grammars@rt.cpan.org
From: demerphq <demerphq@gmail.com>
On 18 April 2018 at 18:24, damian@conway.org via RT <bug-Regexp-Grammars@rt.cpan.org> wrote:
> Here is the simplest test suite I could devise that demonstrates
> the problem. In my testing, it fails identically under every release
> from 5.10 to 5.26.

That is awesome. Thanks Damian.

Sorry for the terse replies earlier, was on a dratted phone.

Hope you well!

Yves

-- perl -Mre=debug -e "/just|another|perl|hacker/"
Subject: Re: [rt.cpan.org #125105] Segmentation fault at 2 GB of memeory
Date: Thu, 19 Apr 2018 04:21:52 +1000
To: bug-Regexp-Grammars@rt.cpan.org
From: Damian Conway <damian@conway.org>
> Sorry for the terse replies earlier, was on a dratted phone.

No problem at all. Thanks for your interest in the problem!


> Hope you well!

Very well. Hope you are likewise. :-)

Damian

On Wed Apr 18 14:22:44 2018, damian@conway.org wrote:
> > Sorry for the terse replies earlier, was on a dratted phone.
>
> No problem at all. Thanks for your interest in the problem!
>
> > Hope you well!
>
> Very well. Hope you are likewise. :-)
>
> Damian

Possessive quantifiers and atomic grouping would be nice. But in the meantime, is there any way I can cut down on Regexp::Grammars' explosive memory usage?

The grammar that triggers this has binary expression trees, with different rules used to encode operator precedence, and each rule is an objtoken. The object tree often looks something like or(xor(and(compare(add(mult(factor(power(2))), 2))))). Would "optimizing" it to add(2,2), by deleting unnecessary children in the constructor that Regexp::Grammars calls, save significant memory? Or does Regexp::Grammars keep references to these objects?

Where do these 100s of megabytes of memory come from, in matching a couple hundred short lines?
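
The kind of tree "optimization" asked about here can be sketched roughly as follows. The node shape `[ $rule_name, @children ]` and the helper names `collapse` and `to_string` are hypothetical, chosen only for illustration; they are not the objects that Regexp::Grammars or the reporter's grammar actually builds.

```perl
use strict;
use warnings;

# Hypothetical node shape: [ $rule_name, @children ]; leaves are plain scalars.
sub collapse {
    my ($node) = @_;
    return $node unless ref $node eq 'ARRAY';      # leaf value, nothing to do
    my ($name, @kids) = @$node;
    @kids = map { collapse($_) } @kids;            # simplify the children first
    return $kids[0] if @kids == 1;                 # pass-through rule: drop it
    return [ $name, @kids ];
}

# Render a tree in the rule(child,child) notation used in this ticket.
sub to_string {
    my ($node) = @_;
    return $node unless ref $node eq 'ARRAY';
    my ($name, @kids) = @$node;
    return "$name(" . join(',', map { to_string($_) } @kids) . ")";
}

my $tree =
    [ 'or', [ 'xor', [ 'and', [ 'compare',
        [ 'add', [ 'mult', [ 'factor', [ 'power', 2 ] ] ], 2 ] ] ] ] ];

print to_string(collapse($tree)), "\n";            # add(2,2)
```

Every rule that merely wraps a single child disappears, which leaves only the nodes that carry real structure.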
Subject: Re: [rt.cpan.org #125105] Segmentation fault at 2 GB of memeory
Date: Wed, 25 Apr 2018 22:43:25 +1000
To: bug-Regexp-Grammars@rt.cpan.org
From: Damian Conway <damian@conway.org>
Hi Alex,

> Where do these 100s of megabytes of memory come from,
> in matching a couple hundred short lines?

That's the huge mystery here. I simply don't know.

You reported falling over at 2GB, which strongly implies
a serious memory leak, not a huge object tree.

Each node object is likely to be less than 1kB, so that would
imply a tree of well over a million nodes, which is absurd.
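
As a rough check of that estimate, here is the arithmetic spelled out. It assumes exactly 1024 bytes per node and takes 2 GB as 2**31 bytes; both figures are simplifying assumptions rather than measurements.

```perl
use strict;
use warnings;

my $total_bytes    = 2**31;   # roughly the 2 GB at which the segfault was reported
my $bytes_per_node = 1024;    # assumed upper bound on the size of one node

my $nodes = $total_bytes / $bytes_per_node;
print "nodes needed to fill 2 GB: $nodes\n";   # 2097152
```

With smaller nodes the count would be higher still, which is why a plain object tree seems an unlikely explanation for input of a few hundred lines.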

Optimizing the tree would, of course, reduce the memory usage,
but I doubt it will solve the problem, because I doubt a big tree
is causing the problem in the first place.

If you can send me a self-contained example of the grammar
and the data that are causing this issue, I'll take a look at it
myself and see if I can see anything obviously amiss.

But it's more likely, I fear, that we're tripping some internal
issue that's resulting in a huge memory leak. So I can't promise
I'll be able to solve this for you.

Damian

On Wed Apr 25 08:44:37 2018, damian@conway.org wrote:
> If you can send me a self-contained example of the grammar
> and the data that are causing this issue, I'll take a look at it
> myself and see if I can see anything obviously amiss.

Damian,

I've sent you a simplified grammar and a test file that result in this issue. Running it invariably results in an error similar to:

    [1]    12343 segmentation fault  ./simple.pl test.cl

Alex
Subject: Re: [rt.cpan.org #125105] Segmentation fault at 2 GB of memeory
Date: Fri, 27 Apr 2018 23:53:23 +1000
To: bug-Regexp-Grammars@rt.cpan.org
From: Damian Conway <damian@conway.org>
Thanks for the example file, Alex.

I've confirmed that it segfaults under any version of Perl >= 5.18
and runs perfectly on any version of Perl between 5.10 and 5.16.

So it's an internal issue of some kind in the revamped regex engine
(the revamping process began in 5.18).

The bad news is that this is not something I can fix in the module's
code.

The less bad news is that this is definitely something I can report
as a Perl bug, and which someone else may eventually be able to fix.
I will do this as soon as I can reduce the problem to something
small enough to make an actionable bug report.

For the moment, the only workaround seems to be to run it
under Perl 5.16 or earlier (via perlbrew, for example).

The alternative would be to look at porting your grammar to Marpa::R2
(https://metacpan.org/pod/distribution/Marpa-R2/pod/Marpa_R2.pod).

Or, less painfully, to use one of the Marpa helper modules:

    https://metacpan.org/pod/MarpaX::Simple
    https://metacpan.org/pod/Grammar::Marpa

I'm sorry I don't have a better answer for you than this, Alex.
If I happen to come across a simpler workaround whilst I'm
creating the bug report, I'll certainly post it here.

Damian
Subject: Re: [rt.cpan.org #125105] Segmentation fault at 2 GB of memeory
Date: Sat, 28 Apr 2018 00:40:50 +1000
To: bug-Regexp-Grammars@rt.cpan.org
From: Damian Conway <damian@conway.org>
On further investigation, this grammar segfaults
under pre-5.18 versions of Perl as well, if it is fed
longer input.

Therefore the problem is inherent in either the way
the grammar is written (the tail-recursions might be an issue,
maybe try <[subrule]> % SEPARATOR instead?)
or else it is intrinsic to the regex engine itself, but
not able to be bisected (which means it almost
certainly can't be fixed).

Once again, the only Perl 5 solution I can currently suggest is
to try Marpa::R2 (via one of the helper modules).

Alternatively, you could perhaps look at converting it to a
Perl 6 grammar.

I freely admit that neither of these is a particularly
easy option, and I apologize again that Regexp::Grammars
doesn't seem to be able to handle this task as we would
both wish.

Damian
On Fri Apr 27 10:41:41 2018, damian@conway.org wrote:
> Therefore the problem is inherent in either the way
> the grammar is written (the tail-recursions might be an issue,
> maybe try <[subrule]> % SEPARATOR instead?)
> or else it is intrinsic to the regex engine itself, but
> not able to be bisected (which means it almost
> certainly can't be fixed).

I don't know of a public debugging interface, but is there any way you could step through Regexp::Grammars' matching to see what triggers the segfault?

Even if it's inherent in how the program is written, memory shouldn't be a problem, as my computer (& address space) have far more than 2GB of memory.

I can't use separators in most cases, because I have multiple separators of equal precedence that I need to preserve. For example, <[factor]> % [-+] doesn't preserve whether the tail factors were added or subtracted, and I don't see anything in the documentation on how to preserve them.

But I'm not sure this is the problem. First, the trees are generally narrow. A value might parse to or(xor(and(comp(add(mult(factor(power(value(number(42)))))))))). However, more importantly, in code removed from the simplified case I sent you, I optimize the entire tree to just number(42).

You can test this yourself by modifying the new() sub in the Foo class to delete $self->{head} and $self->{tail} if they're present. The same segmentation fault occurs.
Subject: Re: [rt.cpan.org #125105] Segmentation fault at 2 GB of memeory
Date: Sat, 28 Apr 2018 09:16:05 +0000
To: bug-Regexp-Grammars@rt.cpan.org
From: demerphq <demerphq@gmail.com>


On Fri, 27 Apr 2018, 20:16 Alex via RT, <bug-Regexp-Grammars@rt.cpan.org> wrote:
> Even if it's inherent in how the program is written, memory shouldn't be a
> problem, as my computer (& address space) have far more than 2GB of memory.

The 2gb thing is suspicious, it suggests some 32bit counter, possibly signed.
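
The coincidence behind that suspicion is easy to verify with a few lines of arithmetic. This is only an illustration of the numbers involved, not a diagnosis of where such a counter might live.

```perl
use strict;
use warnings;

my $int32_max = 2**31 - 1;     # largest value a signed 32-bit integer can hold
my $two_gb    = 2 * 1024**3;   # 2 GB expressed in bytes

print "signed 32-bit maximum: $int32_max\n";   # 2147483647
print "2 GB in bytes:         $two_gb\n";      # 2147483648
print "2 GB is exactly one more than the signed 32-bit maximum\n"
    if $two_gb == $int32_max + 1;
```

A signed 32-bit byte count or offset would therefore overflow at precisely the point where usage crosses 2 GB.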

But the problem you are seeing probably has to do with when temporaries are freed. It may simply be we are missing a call to free things at the right time.

Yves
Subject: Re: [rt.cpan.org #125105] Segmentation fault at 2 GB of memeory
Date: Sun, 29 Apr 2018 01:13:40 +1000
To: bug-Regexp-Grammars@rt.cpan.org
From: Damian Conway <damian@conway.org>
> I don't know of a public debugging interface, but is there any way you
> could step through Regexp::Grammars' matching to see what triggers the
> segfault?

As Yves subsequently suggested, it's likely this issue is down in the guts of Perl
and not a response to any particular component of Regexp::Grammars. So
stepping through the grammar parse is unlikely to help determine the
problem. Unless someone stepped through it with gdb or some other
interpreter-level debugger.


> Even if it's inherent in how the program is written, memory shouldn't
> be a problem, as my computer (& address space) have far more than 2GB
> of memory.

Agreed. As does mine. But, again as Yves suggested, something is hitting
the 32-bit limit even if it's not malloc.


> I can't use separators in most cases, because I have multiple
> separators of equal precedence that I need to preserve. For example,
> <[factor]> % [-+] doesn't preserve whether the tail factors were added
> or subtracted, and I don't see anything in the documentation on how to
> preserve them.

You can remove the recursion by named-capturing the separators as well (as a list):

    <[factor]>+ % <[operator=([-+])]>

But, yes, this will only defer the problem, not prevent it.
It will cause fewer subrule calls, but eventually the 2GB
limit will still be reached and the segfault will occur.
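
To illustrate how two captured lists could be recombined afterwards, here is a minimal sketch. It assumes the match result holds parallel arrays under the keys `factor` and `operator`, with one fewer operator than factors. A hand-written hash stands in for a real Regexp::Grammars result, plain numbers stand in for factor subtrees, and the helper names `fold_left` and `evaluate` are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the result of matching "10 - 3 + 4".
my %match = (
    factor   => [ 10, 3, 4 ],
    operator => [ '-', '+' ],
);

# Rebuild a left-associative tree: ((10 - 3) + 4)
sub fold_left {
    my ($factors, $operators) = @_;
    my @rest = @$factors;
    my $tree = shift @rest;
    for my $i (0 .. $#rest) {
        $tree = { op => $operators->[$i], left => $tree, right => $rest[$i] };
    }
    return $tree;
}

# Walk the rebuilt tree, applying each preserved operator.
sub evaluate {
    my ($node) = @_;
    return $node unless ref $node;
    my ($l, $r) = (evaluate($node->{left}), evaluate($node->{right}));
    return $node->{op} eq '+' ? $l + $r : $l - $r;
}

my $tree = fold_left(@match{qw(factor operator)});
print evaluate($tree), "\n";   # 11
```

Because each operator keeps its position in the list, the distinction between added and subtracted tail factors is preserved without a recursive rule.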

Damian

Subject: Re: [rt.cpan.org #125105] Segmentation fault at 2 GB of memeory
Date: Sun, 29 Apr 2018 01:16:48 +1000
To: bug-Regexp-Grammars@rt.cpan.org
From: Damian Conway <damian@conway.org>
Thanks for the insights, Yves.

Would you like me to send you Alex's example
for you to explore?

I completely understand that you may have no interest
in doing so, but I thought I should at least beg^H^H^Hoffer ;-)

Damian
Subject: Re: [rt.cpan.org #125105] Segmentation fault at 2 GB of memeory
Date: Sat, 28 Apr 2018 16:46:52 +0000
To: bug-Regexp-Grammars@rt.cpan.org
From: demerphq <demerphq@gmail.com>
On Sat, 28 Apr 2018, 17:17 damian@conway.org via RT, <bug-Regexp-Grammars@rt.cpan.org> wrote:
> Would you like me to send you Alex's example
> for you to explore?

Can't hurt. But it'll be some days before I get to it. On a laptop free holiday in Rome just now....

Yves
Subject: Re: [rt.cpan.org #125105] Segmentation fault at 2 GB of memeory
Date: Sun, 29 Apr 2018 03:12:26 +1000
To: bug-Regexp-Grammars@rt.cpan.org
From: Damian Conway <damian@conway.org>
> Can't hurt.

Much obliged! I'll send it directly to you.


> But it'll be some days before I get to it. On a laptop free
> holiday in Rome just now....

Excellent. Have a great time and forget all about those
annoying regexes. :-)

Damian

