Lucene 6.2 and ISO-8859-1 (Latin) characters
I'm trying out the Lucene IndexSearcher on a project.
The content of the indexed documents has Latin (ISO-8859-1) characters, and users can (and will) search using that charset.
As far as I know, Lucene generates its index files using UTF-8.
Questions:
1) Is there a way to specify the charset when searching in Lucene? Or do I have to manually convert the query to UTF-8 and then run the search?
2) The IndexSearcher.search() method is not ignoring whitespace, so I have to guess the "tokens" right for meaningful results to show up. If the user forgets to add whitespace in the searched term, no results are shown. Is there a way to configure the searcher (or QueryParser) to ignore whitespace?
I'm not quite sure where you are running into trouble here. I presume you are reading the user input as a String, in which case I don't know where the issue would come up. Providing some code would clarify that. If you are, indeed, reading a byte array as user input, then yes, converting it is necessary. It's not a laborious process to convert a byte[] to a String, though. Just use the String constructor.
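As a minimal sketch of that conversion, assuming the input really does arrive as raw ISO-8859-1 bytes (the class name Latin1QueryInput and the helper decodeLatin1 are made up for illustration, not part of Lucene):

```java
import java.nio.charset.StandardCharsets;

class Latin1QueryInput {
    // Hypothetical helper: decode raw ISO-8859-1 bytes into a Java String.
    // Once decoded, the String is charset-agnostic and can be handed to
    // whatever builds the query.
    public static String decodeLatin1(byte[] raw) {
        return new String(raw, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // "café" in ISO-8859-1: the accented e is the single byte 0xE9.
        byte[] raw = {0x63, 0x61, 0x66, (byte) 0xE9};
        String queryText = decodeLatin1(raw);

        // The decoded text has four characters.
        System.out.println(queryText.length());

        // Re-encoding as UTF-8 is only needed if bytes must be written out
        // again; here the accented e takes two bytes.
        System.out.println(queryText.getBytes(StandardCharsets.UTF_8).length);
    }
}
```

If the input is already a String (for example, from a framework that has decoded the request), no such step should be needed.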
The QueryParser tokenizes at whitespace if the analyzer you have passed to it does. StandardAnalyzer is a typical choice.