
This queue is for tickets about the Encode CPAN distribution.

Report information
The Basics
Id: 129086
Status: new
Priority: 0/
Queue: Encode

People
Owner: Nobody in particular
Requestors: TONYC [...] cpan.org
Cc:
AdminCc:

Bug Information
Severity: (no value)
Broken in: (no value)
Fixed in: (no value)



Subject: STOP_AT_PARTIAL forced on for renew()ed encodings (for some anyway)
This causes PerlIO::encoding to loop when a partial character is found at eof.

PerlIO::encoding calls into Encode roughly like the loop in the following code:

    use strict;
    use Encode qw(encode decode);
    use constant BUFSIZ => 8192;

    my $flags = Encode::PERLQQ(); # this can't be zero
    my $encoding = "UTF-8";
    my $filesrc = "\x{100c}" x 10000;
    my $filedata = encode($encoding, $filesrc, Encode::FB_CROAK);
    #chop $filedata; # LINE A
    my $expect = decode($encoding, (my $temp = $filedata), $flags);

    my $enc = Encode::find_encoding("UTF-8")
      or die;
    # this seems to have been added for PerlIO::encoding
    my $dup = $enc->renew
      or die;

    my $out = "";
    my $buf = "";
    while (length $filedata || length $buf) {
        # refill the buffer from the file
        my $fillsize = BUFSIZ - length($buf);
        $buf .= substr($filedata, 0, $fillsize, "");
        my $eof = $filedata eq "";
        # current behaviour
        my $mflags = $flags | Encode::STOP_AT_PARTIAL; # LINE B
        # try to avoid looping over a partial at eof
        # my $mflags = $eof ? $flags : $flags | Encode::STOP_AT_PARTIAL; # LINE C
        # decode our buffer, consuming some/all of it
        my $result = $dup->decode($buf, $mflags);
        print length $buf, "\n";
        $out .= $result;
    }

    print $out eq $expect ? "ok\n" : "not ok\n";

This works fine if there's no partial at eof, but if there is (uncomment the chop at LINE A) it will loop until terminated.

Ok, that's fine since we pass in STOP_AT_PARTIAL, but even if we make that conditional on eof (uncomment LINE C and comment LINE B) it continues to loop.

For UTF-8 at least this occurs because Method_decode() in Encode.xs passes a true value to process_utf8() for the stop_at_partial parameter if the encoding has been "renewed".

If I replace in Method_decode():

    s = process_utf8(aTHX_ dst, s, e, check_sv, 0, strict_utf8(aTHX_ obj), renewed);

with:

    s = process_utf8(aTHX_ dst, s, e, check_sv, 0, strict_utf8(aTHX_ obj), 0);

the modified code above works correctly.
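As a smaller illustration of the behaviour described above, the following sketch decodes a five-byte buffer that ends in a partial character, once with a plain encoding object and once with a renewed one, with and without STOP_AT_PARTIAL. The helper decode_tail() and the test bytes are made up for this example and are not part of Encode or PerlIO::encoding. What the renewed object reports depends on the installed Encode version, so the script only prints the counts rather than assuming them.

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical helper for this illustration: decode a copy of $bytes and
# return the decoded text plus whatever bytes decode() left unconsumed.
sub decode_tail {
    my ($enc, $bytes, $flags) = @_;
    my $buf  = $bytes;                      # decode() consumes its argument in place
    my $text = $enc->decode($buf, $flags);
    return ($text, $buf);
}

my $plain   = Encode::find_encoding("UTF-8") or die;
my $renewed = $plain->renew or die;

# One complete U+100C followed by the first two bytes of another one.
my $bytes = "\xe1\x80\x8c\xe1\x80";
my $qq    = Encode::PERLQQ();
my $stop  = $qq | Encode::STOP_AT_PARTIAL();

for my $case (
    [ "plain, PERLQQ",             $plain,   $qq   ],
    [ "plain, STOP_AT_PARTIAL",    $plain,   $stop ],
    [ "renewed, PERLQQ",           $renewed, $qq   ],
    [ "renewed, STOP_AT_PARTIAL",  $renewed, $stop ],
) {
    my ($label, $enc, $flags) = @$case;
    my ($text, $left) = decode_tail($enc, $bytes, $flags);
    printf "%-26s decoded %d chars, %d bytes left\n",
        $label, length($text), length($left);
}
```

If the "renewed, PERLQQ" line still reports bytes left over, the installed Encode forces STOP_AT_PARTIAL for renewed objects as described in this ticket, and a read loop that waits for the buffer to drain at eof cannot terminate.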
This particular change would break PerlIO::encoding on older perls[1], but it would be useful if Encode could provide a way for PerlIO::encoding to prevent that behaviour.

I understand the renew() is needed to ensure the state of the encoding is kept, e.g. for byte ordering for UTF-16 encodings, so I don't think that can be removed.

This might be the cause of https://rt.cpan.org/Ticket/Display.html?id=124094

Any ideas?

Tony

[1] at least for files that don't end with a partially encoded character
On Mon Apr 08 21:00:51 2019, TONYC wrote:
> This particular change would break PerlIO::encoding on older perls[1],
> but it would be useful if Encode could provide a way for
> PerlIO::encoding to prevent that behaviour.
>
> I understand the renew() is needed to ensure the state of the encoding
> is kept, eg. for byte ordering for UTF-16 encodings, so I don't think
> that can be removed.
>
> This might be the cause of
> https://rt.cpan.org/Ticket/Display.html?id=124094
>
> Any ideas?
Possible solutions:

a) add a stop_at_partial parameter to renew() that defaults to 1, so older perls will see the old behaviour, and new perls can supply zero to make the STOP_AT_PARTIAL flag significant

b) add a REALLY_NO_STOP_AT_PARTIAL flag that overrides the renew-controlled flag

c) make it dependent on perl version.

c) would be hard to test

b) is just ugly

I think a) is the best solution, it can be tested in any perl version.

If that makes sense to you I can work on a patch.

Tony
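To make option a) concrete, here is a rough pure-Perl model of the proposed semantics. Sketch::Renewed and drain() are hypothetical names invented for this illustration; Encode has no such class, and the real change would live in renew() and Encode.xs. The model wraps a plain (non-renewed) encoding object and ORs in STOP_AT_PARTIAL only when the object was created with a true stop_at_partial value, defaulting to 1. It does not model the per-stream state that renew() exists to preserve.

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical stand-in for a renewed encoding object; not part of Encode.
package Sketch::Renewed {
    # $stop_at_partial defaults to 1, i.e. the old "always stop" behaviour.
    sub new {
        my ($class, $name, $stop_at_partial) = @_;
        my $enc = Encode::find_encoding($name) or die "unknown encoding $name";
        return bless { enc => $enc, stop => $stop_at_partial // 1 }, $class;
    }

    # Decode $_[1] in place, forcing STOP_AT_PARTIAL only if requested at creation.
    sub decode {
        my ($self, undef, $flags) = @_;
        $flags |= Encode::STOP_AT_PARTIAL() if $self->{stop};
        return $self->{enc}->decode($_[1], $flags);
    }
}

# The read loop from the ticket with LINE C active, plus a guard so that a
# stuck loop dies instead of spinning forever.
sub drain {
    my ($dec, $filedata, $flags) = @_;
    my ($out, $buf, $rounds) = ("", "", 0);
    while (length($filedata) || length($buf)) {
        die "no progress\n" if ++$rounds > 100;
        $buf .= substr($filedata, 0, 8192 - length($buf), "");
        my $eof    = $filedata eq "";
        my $mflags = $eof ? $flags : $flags | Encode::STOP_AT_PARTIAL();
        $out .= $dec->decode($buf, $mflags);
    }
    return $out;
}

my $src   = "\x{100c}" x 10000;
my $bytes = Encode::encode("UTF-8", $src);
chop $bytes;                                   # leave a partial character at eof

my $new_style = Sketch::Renewed->new("UTF-8", 0);
my $old_style = Sketch::Renewed->new("UTF-8");

my $out = drain($new_style, $bytes, Encode::PERLQQ());
print "stop_at_partial => 0: loop finished with ", length($out), " characters\n";

my $ok = eval { drain($old_style, $bytes, Encode::PERLQQ()); 1 };
print "stop_at_partial => 1: ", ($ok ? "loop finished" : "loop stuck: $@");
```

In this model the default keeps the behaviour older PerlIO::encoding versions rely on, while a caller that passes zero regains control through the STOP_AT_PARTIAL flag and can drop it at eof.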

