rt.cpan.org will be shut down on March 1st, 2021.

This queue is for tickets about the Lingua-EN-NamedEntity CPAN distribution.

Report information
The Basics
Id: 133019
Status: open
Priority: 0/
Queue: Lingua-EN-NamedEntity

People
Owner: Nobody in particular
Requestors: amead [...] alanmead.org
Cc:
AdminCc:

Bug Information
Severity: (no value)
Broken in: (no value)
Fixed in: (no value)



Subject: _spurn_dictionary_words() excludes common names like Mark
Date: Sat, 18 Jul 2020 00:47:57 -0500
To: bug-Lingua-EN-NamedEntity [...] rt.cpan.org
From: Alan Mead <amead [...] alanmead.org>
As well as May, June, Joy, etc. It also excludes less common names like Star, Candy, Hope, etc. This is a significant issue for my use of this module.

--
Alan D. Mead, Ph.D.
President, Talent Algorithms Inc.

science + technology = better workers

http://www.alanmead.org

Courage is resistance to fear, mastery of fear - not absence of fear.

-- Mark Twain

Subject: Re: [rt.cpan.org #133019] _spurn_dictionary_words() excludes common names like Mark
Date: Sat, 18 Jul 2020 18:42:42 +0000
To: "bug-Lingua-EN-NamedEntity [...] rt.cpan.org" <bug-Lingua-EN-NamedEntity [...] rt.cpan.org>
From: Alberto Simões <asimoes [...] protonmail.com>
Hi

Indeed, it can produce a lot of false negatives. This will all depend on your corpus, and what you prefer (false positives or false negatives).

In any case, I am no longer managing this module, and I am not sure if Runar Buvik is still interested in doing so (last release five years ago).

My suggestion would be to add one of two options:
- allow the user to specify the common word lexicon (where the user can remove those words from the list)
- create a method to add exceptions.

This should be quite straightforward. If anyone is willing to provide a patch, I am happy to apply it, and release a new version. If anyone is willing to adopt the module, I am happy to share that responsibility.

Kindest regards,
ambs

‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
On Saturday, July 18, 2020 7:06 AM, amead@alanmead.org via RT <bug-Lingua-EN-NamedEntity@rt.cpan.org> wrote:

> Sat Jul 18 02:06:41 2020: Request 133019 was acted upon.
> Transaction: Ticket created by amead@alanmead.org
> Queue: Lingua-EN-NamedEntity
> Subject: _spurn_dictionary_words() excludes common names like Mark
> Broken in: (no value)
> Severity: (no value)
> Owner: Nobody
> Requestors: amead@alanmead.org
> Status: new
> Ticket <URL: https://rt.cpan.org/Ticket/Display.html?id=133019 >
>
> As well as May, June, Joy, etc. It also excludes less common names like
> Star, Candy, Hope, etc. This is a significant issue for my use of this
> module.
>
> --
> Alan D. Mead, Ph.D.
> President, Talent Algorithms Inc.
>
> science + technology = better workers
>
> http://www.alanmead.org
>
> Courage is resistance to fear, mastery of fear - not absence
> of fear.
>
> -- Mark Twain
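
The two options suggested in the reply above (a user-supplied common-word lexicon, or a method to add exceptions) could look roughly like the sketch below. It is written in Python only to illustrate the idea and is not the module's Perl code or API: the class `DictionaryFilter`, the method `add_exceptions`, and the tiny `DEFAULT_COMMON_WORDS` list are all hypothetical, and the module's real word list and matching rules are not shown in this ticket.

```python
# Hypothetical stand-in for a common-word lexicon. The real list used by
# _spurn_dictionary_words() is not shown in this ticket.
DEFAULT_COMMON_WORDS = {"mark", "may", "june", "joy", "star", "candy", "hope", "the"}


class DictionaryFilter:
    """Drops candidate entities that are ordinary dictionary words."""

    def __init__(self, common_words=None):
        # Option 1: the caller may pass their own lexicon, for example
        # the default list with given names already removed.
        source = DEFAULT_COMMON_WORDS if common_words is None else common_words
        self.common_words = {w.lower() for w in source}
        self.exceptions = set()

    def add_exceptions(self, *words):
        # Option 2: keep the lexicon as-is, but never spurn these words.
        self.exceptions.update(w.lower() for w in words)

    def spurn_dictionary_words(self, candidates):
        # Keep a candidate unless it is a common word with no exception.
        kept = []
        for candidate in candidates:
            key = candidate.lower()
            if key in self.common_words and key not in self.exceptions:
                continue
            kept.append(candidate)
        return kept
```

With a filter like this, a caller who cares more about false negatives could register `Mark`, `May`, `June` and similar names as exceptions, while a caller who prefers fewer false positives would leave the default lexicon untouched.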

