|CC:||hinrik [...] cpan.org|
|Subject:||Enable Custom (User Implemented) Tokenizers with FTS3 tables|
We (Hinrik and I) have an application (Hailo on CPAN) that could use SQLite FTS3 tables. The problem is that the default tokenizer SQLite provides is too naïve for the sort of text we're processing: it's ASCII-only.

SQLite supports custom tokenizers: you implement one in C and pass a pointer to that implementation as a BLOB via fts3_tokenizer(). SQLite's default tokenizer is defined in its fts3_tokenizer1.c.

I haven't tested it yet, but it should be possible to do this with the current DBD::SQLite interface by creating an XS module which includes the SQLite headers, creates a sqlite3_tokenizer_module, and returns a pointer to that struct to Perl as an IV. That IV could then be passed to DBD::SQLite by converting it to a SQLite BLOB: 1.29/lib/DBD/SQLite.pm#Blobs

But it would be much simpler if DBD::SQLite did all the heavy lifting, so you could simply pass Perl subroutine callbacks, similar to how 'sqlite_create_function' works now.

Have the maintainers looked into this, and do they perhaps have some idea about how best to do it?
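To illustrate the pointer-as-BLOB handoff described above, here is a minimal, untested-against-SQLite sketch in plain C. It deliberately does not include the SQLite headers: demo_tokenizer_module, pointer_to_blob and blob_to_pointer are hypothetical names made up for this example, and the struct is only a stand-in for the real sqlite3_tokenizer_module. The point is just that the BLOB carries the bytes of the pointer value, not a copy of the struct.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for sqlite3_tokenizer_module. The real struct is
 * declared in SQLite's FTS3 headers and holds the tokenizer callbacks. */
typedef struct demo_tokenizer_module {
    int iVersion;
    const char *name;
} demo_tokenizer_module;

/* A module instance with static storage, so its address stays valid. */
const demo_tokenizer_module demo_module = { 0, "demo" };

/* Sending side: copy the pointer value itself (not the struct it points
 * to) into a byte buffer. These bytes are what would be bound as the BLOB.
 * Returns the number of bytes written, or 0 if the buffer is too small. */
size_t pointer_to_blob(const demo_tokenizer_module *p,
                       unsigned char *blob, size_t cap)
{
    if (cap < sizeof p) {
        return 0;
    }
    memcpy(blob, &p, sizeof p);
    return sizeof p;
}

/* Receiving side: recover the pointer from the BLOB bytes. Returns NULL
 * if the BLOB is not exactly the size of a pointer. */
const demo_tokenizer_module *blob_to_pointer(const unsigned char *blob,
                                             size_t len)
{
    const demo_tokenizer_module *p = NULL;
    if (len != sizeof p) {
        return NULL;
    }
    memcpy(&p, blob, sizeof p);
    return p;
}
```

In the XS approach, the sending side would live in the XS module and the resulting bytes would be bound as a BLOB parameter from Perl. The pointer is only meaningful inside the same process, and the module it points to has to outlive every table that uses the tokenizer, which is part of why having DBD::SQLite manage this itself would be less error-prone.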