This queue is for tickets about the PPI CPAN distribution.

Report information
The Basics
Id: 16952
Status: open
Priority: 0/
Queue: PPI

People
Owner: adamk [...] cpan.org
Requestors: nospam-abuse [...] bloodgate.com
Cc:
AdminCc:

Bug Information
Severity: Wishlist
Broken in: (no value)
Fixed in: (no value)



From: Tels <nospam-abuse [...] bloodgate.com>
To: bug-PPI [...] rt.cpan.org
Subject: [PATCH] Speed up tokenizer char-by-char
Date: Sat, 7 Jan 2006 13:24:36 +0100
-----BEGIN PGP SIGNED MESSAGE-----

Hi,

Profiling PPI showed that lines which are not recognized completely are processed char-by-char. Unfortunately, this happened in an empty while loop that called a subroutine for each character. :)

The attached patch moves the loop inside the subroutine. This lets us bypass the per-character calls, the empty while body and the repeated checks for a valid cursor position. I also eliminated duplicate code inside the loop.

As a side effect, the patch also fixes a bug: the process_next_char() routine did not localize $_. I have not attempted to add a test for that, though.

The speedup is a few percent. It depends heavily on how many lines need to be processed char-by-char and how long they are.

Example: parsing Graph::Easy.pm 5 times, to keep start-up overhead from skewing the results. The results are still skewed by DESTROY, i.e. parsing itself is sped up more than shown here. Lowest of three runs:

    te@linux:~/perl/PPI> time perl d.pl
    real    0m5.376s
    user    0m5.288s
    sys     0m0.066s

    te@linux:~/perl/PPI> time perl -IPPI-1.109.e/lib/ d.pl
    real    0m5.181s
    user    0m5.110s
    sys     0m0.054s

On this particular data, PPI is now about 3-4% faster. All tests still pass.

Also attached are two profile runs. The .pm file has 2489 lines, and the test parses 12490 lines in 5.18 seconds. That works out to about 2400 lines/s for PPI on my 2.0 GHz AMD Athlon. Not bad :)

Further ideas:

  * recognize more constructs entirely, so the char-by-char overhead is reduced
  * use fewer subroutines (to concentrate code hot spots)
  * find out what calls __ANON__ (it smells like something is needlessly triggering an overload)

Hope you like this work,

Tels

- -- 
Signed on Sat Jan 7 12:51:00 2006 with key 0x93B84C15.
Visit my photo gallery at http://bloodgate.com/photos/
PGP key on http://bloodgate.com/tels.asc or per email.

If you are bald, and comb some of your hair over the bald spot, you are violating US Patent #4,022,227: <http://tinyurl.com/6qxl7>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.4 (GNU/Linux)

iQEVAwUBQ7+zBHcLPEOTuEwVAQHFrQf+Oxf3qwPF+OPi8ZMgPSa+h2oTCbS11B2Y
IivSO3SuOp3okljWi8eEmLEdJa1tVYIw+kcXp+7/TUhS8XOKhy1LPHUAV6fKUbHE
MtXu+EJ5/zYk3Xh2GwWRuK7IG7KiggnoteuonjGwVW2Ry5mMn+9wxAoN9bjRo/cf
Jo6JKVdKss/Asq2yFL4p66YiK6FxPcohq8EhEkBUFYNoGaBzMxemNDDcha5Zhg/s
pbc1u2tJjtxJU/tyR0T112i4Ay+H7gyuag3ah4j97Ltjasd7qPOMEiZ3TvTmws8A
U1dDiH9tQzPRIbIAJAi9py2yEf4dBALn1bp13OppbYlm+vErUYwxgw==
=LRYP
-----END PGP SIGNATURE-----
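The patch itself is attached rather than inlined, so what follows is only a sketch of the idea described above. It uses a hypothetical, stripped-down tokenizer: SketchTokenizer, next_char_old and process_remaining_chars are illustrative names, not PPI's real internals, and the `1 while ...` driver loop is an assumed form of "an empty while loop". The sketch contrasts one method call per character, with the cursor bounds check repeated on every call, against a single call that runs the loop inside the method and localizes $_.

```perl
use strict;
use warnings;

# Hypothetical, simplified tokenizer state for illustration only;
# these are NOT PPI's real field or method names.
package SketchTokenizer;

sub new {
    my ($class, $line) = @_;
    return bless {
        line        => $line,
        line_cursor => -1,
        line_length => length $line,
        chars       => [],
    }, $class;
}

# Before: the caller drives this from an empty loop, one call per character.
sub next_char_old {
    my $self = shift;
    # Cursor bounds check, repeated on every single call.
    return 0 unless ++$self->{line_cursor} < $self->{line_length};
    $_ = substr $self->{line}, $self->{line_cursor}, 1;   # also clobbers the caller's $_
    push @{ $self->{chars} }, $_;
    return 1;
}

# After: the loop lives inside the method, so there is one call per line
# instead of one per character, and $_ is localized.
sub process_remaining_chars {
    my $self   = shift;
    my $line   = $self->{line};
    my $length = $self->{line_length};
    my $chars  = $self->{chars};
    local $_;    # restored to the caller's value when the method returns
    while (++$self->{line_cursor} < $length) {
        $_ = substr $line, $self->{line_cursor}, 1;
        push @$chars, $_;
    }
    return scalar @$chars;
}

package main;

my $old = SketchTokenizer->new('my $x = 1;');
1 while $old->next_char_old;        # empty loop body, one call per character

my $new = SketchTokenizer->new('my $x = 1;');
$new->process_remaining_chars;      # a single call for the whole line
```

Both variants collect the same characters. The saving comes from removing Perl's per-call overhead (argument setup, stack frame, repeated hash lookups) from the innermost loop, which is consistent with the few-percent gain reported above.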
Attachment: 1.09_1.txt (text/plain, 3.5k)

Message body is not shown because the sender requested not to inline it.

Attachment: 1.09_2.txt (text/plain, 3.3k)

Message body is not shown because the sender requested not to inline it.

Message body is not shown because the sender requested not to inline it.

On Sat Jan 07 07:18:10 2006, nospam-abuse@bloodgate.com wrote:
> The patch also fixes a bug as a side-effect, the process_next_char()
> routine did not localize $_. I have not attempted to add a test for that,
> though.

This part of the patch is fixed in SVN revision 1052.

-- Chris
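Why the missing `local $_` counts as a bug: a subroutine that assigns to the global $_ without localizing it writes through to whatever $_ currently refers to in the caller. Inside a `for (@array)` loop, $_ is aliased to each array element, so the caller's data gets overwritten. The following is a minimal sketch with two hypothetical helpers. They are not PPI code and only mimic "read one character into $_".

```perl
use strict;
use warnings;

# Hypothetical helper: assigns to the global $_ without localizing it.
sub next_char_unlocalized {
    my ($str, $pos) = @_;
    $_ = substr($str, $pos, 1);          # writes through the caller's alias
    return $_;
}

# Hypothetical helper: same work, but $_ is localized first.
sub next_char_localized {
    my ($str, $pos) = @_;
    local $_ = substr($str, $pos, 1);    # caller's $_ is restored on return
    return $_;
}

# In a foreach loop $_ is an alias for each element of the array.
my @words = ('foo', 'bar');
for (@words) { next_char_unlocalized('xyz', 0) }
print "unlocalized: @words\n";           # the elements were overwritten

my @kept = ('foo', 'bar');
for (@kept) { next_char_localized('xyz', 0) }
print "localized:   @kept\n";            # the elements are untouched
```

The first loop leaves @words as ('x', 'x'), while @kept keeps its original values.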
Subject: Re: [rt.cpan.org #16952] [PATCH] Speed up tokenizer char-by-char
Date: Fri, 22 Sep 2006 00:22:32 +0200
To: bug-PPI [...] rt.cpan.org
From: Tels <nospam-abuse [...] bloodgate.com>
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Hi,

On Thursday 21 September 2006 17:19, via RT wrote:
> <URL: http://rt.cpan.org/Ticket/Display.html?id=16952 >
>
> On Sat Jan 07 07:18:10 2006, nospam-abuse@bloodgate.com wrote:
> > The patch also fixes a bug as a side-effect, the process_next_char()
> > routine did not localize $_. I have not attempted to add a test for
> > that, though.
>
> This part of the patch is fixed in SVN revision 1052.
> -- Chris

Heya Chris, nice to meet you :-)

So, did my patch get applied?

Best wishes from my holiday,

Tels

- -- 
Signed on Fri Sep 22 00:22:02 2006 with key 0x93B84C15.
Visit my photo gallery at http://bloodgate.com/photos/
PGP key on http://bloodgate.com/tels.asc or per email.

"Only experts and companies will probably recognize the true value of this software." -- "That's how it is. A certain standard has to be maintained!" -- Kabe (http://tinyurl.com/3kucx)
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.2 (GNU/Linux)

iQEVAwUBRRMQqHcLPEOTuEwVAQLH4Af/QYopP/QJ9mOBk3wrSUrGlk7TZUnPKOta
GYwou4U0IeUmy7OM7lTjbB86LmkqejxGqdRfD0OcNIfBHknjWEx2RjIrYyxs/kRf
aM8XqdkyMQHekw3DRTFn3HSOawhdoVLa+Tk7nxZQQ0xCKnKLKtUDEKLo5Vd0CkG3
ajFpssLZt7shRSSQFjV4fdUd0al9Pw32gqFJElPyduInHFloa+XCrLFa7eF60k8B
u0yCCK+toPasHgSV73GQIuGI8HmYf7OQsZ4Ih3xuNMgS3ZE4FacgEgR4oUxYVRYg
pDuZc2eBklzPIWXXmXoKwHnxP3e1v3uwXhc8XxeryQIk09px93dcGQ==
=vRHr
-----END PGP SIGNATURE-----


This service is sponsored and maintained by Best Practical Solutions and runs on Perl.org infrastructure.

Please report any issues with rt.cpan.org to rt-cpan-admin@bestpractical.com.