This queue is for tickets about the Lingua-StopWords CPAN distribution.

Report information
The Basics
Id: 52330
Status: open
Priority: 0/
Queue: Lingua-StopWords

Owner: Nobody in particular
Requestors: tburtonw [...]

Bug Information
Severity: (no value)
Broken in: (no value)
Fixed in: (no value)

Subject: Lingua-StopWords does not produce utf8 for russian when utf8 argument is set
Date: Tue, 1 Dec 2009 14:52:39 -0500
To: "bug-Lingua-StopWords [...]" <bug-Lingua-StopWords [...]>
From: "Burton-West, Tom" <tburtonw [...]>
Hello,

I am using Lingua::StopWords 0.9 with perl 5.8.8. When I give the UTF-8 argument to getStopWords, I do not get correct UTF-8 out. It seems to ignore the UTF-8 argument.

    use Lingua::StopWords qw( getStopWords );
    my $stopwords = {};
    $stopwords = getStopWords('ru', 'UTF-8');
    my @words = keys %{$stopwords};
    binmode STDOUT, ":utf8";
    foreach my $word (@words) {
        print "$word\n";
    }

If I run the above program without setting STDOUT to utf8, I can verify that I am getting the koi8-r encoding whether or not the 'UTF-8' argument is included in the call to getStopWords.

Tom Burton-West
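
One way to check the encoding claim above is to test whether the raw output bytes are well-formed UTF-8. The sketch below is only an illustration: is_valid_utf8 is a hypothetical helper, not part of Lingua::StopWords, and it expects raw bytes (for example, output captured without the ":utf8" layer), not decoded character strings. It relies on the fact that Cyrillic letters in KOI8-R occupy bytes 0xC0-0xFF, so a Russian word in KOI8-R contains no UTF-8 continuation bytes and never passes a strict UTF-8 check.

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical helper (not part of Lingua::StopWords):
# returns 1 if $bytes is well-formed UTF-8, 0 otherwise.
sub is_valid_utf8 {
    my ($bytes) = @_;
    # decode() may modify its argument when a CHECK value is given,
    # so work on a copy.
    my $copy = $bytes;
    my $ok = eval {
        Encode::decode('UTF-8', $copy, Encode::FB_CROAK());
        1;
    };
    return $ok ? 1 : 0;
}

# Two sample byte strings for the Russian letter pair "он":
my $koi8_sample = "\xCF\xCE";            # KOI8-R bytes
my $utf8_sample = "\xD0\xBE\xD0\xBD";    # UTF-8 bytes

printf "KOI8-R sample valid UTF-8? %d\n", is_valid_utf8($koi8_sample);
printf "UTF-8 sample valid UTF-8?  %d\n", is_valid_utf8($utf8_sample);
```

Passing the helper does not prove the text is correct Russian: a file that was mis-decoded and then re-saved is still valid UTF-8, so this only separates raw KOI8-R bytes from UTF-8 bytes.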
Subject: [ #52330] How to fix
Date: Mon, 27 Mar 2017 18:27:12 +0300
To: bug-Lingua-StopWords [...]
From: Ivan Krylov <krylov.r00t [...]>
Lines 17-39 of Lingua/StopWords/ need to be re-encoded in two steps:

1) utf-8 -> latin1
2) koi8-r -> utf-8

This transformation fixes the file and stores the correct UTF-8 representation of Cyrillic characters in the file.

-- 
Best regards, Ivan
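
The two re-encoding steps above can be sketched with the core Encode module. This is a minimal sketch, not the patch applied to the distribution: repair_double_encoding is a hypothetical helper name, and it assumes the affected lines hold KOI8-R bytes that were at some point read as Latin-1 and saved as UTF-8, which is what the proposed fix implies.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical helper: takes the mis-encoded UTF-8 bytes of a line and
# returns the correct UTF-8 bytes for the Cyrillic text.
sub repair_double_encoding {
    my ($broken_utf8_bytes) = @_;

    # Step 1: utf-8 -> latin1. Decoding the UTF-8 and encoding it as
    # Latin-1 recovers the original single-byte KOI8-R data.
    my $koi8_bytes = encode('iso-8859-1', decode('UTF-8', $broken_utf8_bytes));

    # Step 2: koi8-r -> utf-8. Interpret those bytes as KOI8-R and
    # write them out as UTF-8.
    return encode('UTF-8', decode('koi8-r', $koi8_bytes));
}

# The letter U+0438 is byte 0xC9 in KOI8-R. Read as Latin-1 that byte is
# U+00C9, whose UTF-8 form is C3 89; this is the broken input.
my $broken = "\xC3\x89";
my $fixed  = repair_double_encoding($broken);
printf "%s\n", join ' ', map { sprintf '%02X', ord } split //, $fixed;
```

Applied to a whole file, the helper would be called only on the affected lines, since running it over text that is already correct UTF-8 would corrupt it.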
