Lucene 6.2 and ISO-8859-1 (Latin) characters

I'm trying out Lucene's index searcher on a project.

The content of the indexed documents has Latin (ISO-8859-1) characters, and users can (and will) search using that charset.

As far as I know, Lucene generates index files using UTF-8.

Questions:

1) Is there a way to specify the charset when searching with Lucene? Or do I have to manually convert the query to UTF-8 and then run the search?

2) The IndexSearcher.search() method is not ignoring whitespace, so I have to guess the "tokens" right for meaningful results to show up. If the user forgets to add whitespace in the searched term, no results are shown. Is there a way to configure the searcher (or the QueryParser) to ignore whitespace?

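I'm not quite sure why you're running into trouble here. I presume you're reading the user input as a String, in which case I don't know why this issue would come up; providing some code would clarify that. If you are, indeed, reading a byte array from user input, then yes, converting is necessary. It's not a laborious process to convert a byte[] to a String, though: just use the String constructor that takes a charset.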
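
A minimal sketch of that conversion, assuming the raw input bytes are known to be ISO-8859-1 encoded (the class and method names are hypothetical). Once the bytes are decoded into a String, Lucene's analyzer works on that String and encodes index terms as UTF-8 itself, so there is no need to convert the query to UTF-8 by hand. The second decode shows what goes wrong when the same bytes are read with the wrong charset.

```java
import java.nio.charset.StandardCharsets;

public class Latin1Input {
    // Hypothetical helper: decode raw user-input bytes known to be ISO-8859-1.
    // The resulting String can be handed straight to QueryParser.
    static String decodeLatin1(byte[] raw) {
        return new String(raw, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // "café" in ISO-8859-1: 0xE9 is the single byte for 'é'
        byte[] raw = {0x63, 0x61, 0x66, (byte) 0xE9};

        String correct = decodeLatin1(raw);
        // Decoding the same bytes as UTF-8: a lone 0xE9 is not valid UTF-8,
        // so it is replaced by the replacement character U+FFFD.
        String wrong = new String(raw, StandardCharsets.UTF_8);

        System.out.println(correct.equals("caf\u00e9")); // true
        System.out.println(wrong.equals("caf\uFFFD"));   // true
    }
}
```
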
QueryParser tokenizes at whitespace if the analyzer you have passed to it does. StandardAnalyzer is the typical choice.
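
A minimal sketch of parsing user input with QueryParser and StandardAnalyzer, assuming Lucene 6.2 with the lucene-core, lucene-analyzers-common and lucene-queryparser jars on the classpath; the field name "content" and the class name are hypothetical. Printing the parsed query shows which terms will actually be searched: words separated by a space become two terms, while the same words typed without a space stay one token, which only matches if that exact token was indexed.

```java
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.queryparser.classic.ParseException;
import org.apache.lucene.queryparser.classic.QueryParser;
import org.apache.lucene.search.Query;

public class ParseDemo {
    // Hypothetical default field; use the field your documents were indexed into.
    static final String FIELD = "content";

    // Returns the parsed query as text so the resulting terms can be inspected.
    static String parsedTerms(String userInput) {
        // StandardAnalyzer splits on word boundaries (including whitespace) and
        // lowercases each token. Ideally the same analyzer is used at index time.
        try (StandardAnalyzer analyzer = new StandardAnalyzer()) {
            QueryParser parser = new QueryParser(FIELD, analyzer);
            Query query = parser.parse(userInput);
            // toString(FIELD) omits the "content:" prefix for the default field
            return query.toString(FIELD);
        } catch (ParseException e) {
            throw new IllegalArgumentException("Unparseable query: " + userInput, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(parsedTerms("Caf\u00e9 Ol\u00e9")); // two terms
        System.out.println(parsedTerms("Caf\u00e9Ol\u00e9"));  // a single term
    }
}
```
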
