indexing - Solr index: removing stopwords does not seem to work


I remove stopwords both during indexing and at query time, but somehow the words listed in stopwords.txt do not seem to be removed from the index (I can still use them in a query, and the results contain hits for them).

Here is my schema.xml:

    <fieldType name="text" class="solr.TextField" positionIncrementGap="100">
      <analyzer type="index">
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <!-- in this example, we only use synonyms at query time
        <filter class="solr.SynonymFilterFactory" synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/>
        -->
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
        <filter class="solr.WordDelimiterFilterFactory"
                generateWordParts="1" generateNumberParts="1" catenateWords="1"
                catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
        <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
        <filter class="solr.KStemFilterFactory"/>
        <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
      </analyzer>
      <analyzer type="query">
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
        <filter class="solr.WordDelimiterFilterFactory"
                generateWordParts="1" generateNumberParts="1" catenateWords="0"
                catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
        <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
        <filter class="solr.KStemFilterFactory"/>
        <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
      </analyzer>
    </fieldType>

    <field name="_version_" type="long" indexed="true" stored="true"/>
    <field name="id" type="string" indexed="true" stored="true" required="true" multiValued="false"/>
    <field name="post_content" type="text" indexed="true" stored="true"/>
    <field name="post_title" type="text" indexed="true" stored="true"/>
    <field name="post_date" type="date" indexed="true" stored="true"/>
    <field name="_text_" type="text" indexed="true" stored="false" multiValued="true"
           termVectors="true" termPositions="true" termOffsets="true"/>

I am using Solr 6.0.

Thanks for any advice,

Sabine

By default, the stopwords.txt file does not have any stop words in it.

You can check this in any of the configsets shipped with Solr.

But if you check the conf/lang folder, you will find many stopword files.

You can use whichever one is applicable for your language.

For testing purposes, you can copy the stop words from the stopwords_en.txt file and paste them into the stopwords.txt file in the path configsets/basic_configs/conf/. The configset may be different in your case; it depends on which one you used.
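
Instead of copying the words over, the filter can also point at the language-specific file directly. This is only a minimal sketch: it assumes your configset really contains conf/lang/stopwords_en.txt (the sample configsets shipped with Solr do) and that English is the language you want. The same change would go into both the index and the query analyzer of the field type.

```xml
<!-- Assumption: conf/lang/stopwords_en.txt exists in your configset. -->
<!-- The path in "words" is resolved relative to the conf/ directory. -->
<filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_en.txt"/>
```

Either way, reload the core (or restart Solr) and reindex your documents afterwards. Index-time analysis only applies to documents indexed after the change, so stop words already stored in the index stay searchable until you reindex.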

