indexing - Solr: removing stopwords at index time does not seem to work
I want to remove stopwords from the index during indexing and at query time, but somehow the words listed in stopwords.txt do not seem to be removed from the index (I can still use them in a query, and the results contain hits for them).
Here is my schema.xml:
<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <!-- in this example, we will only use synonyms at query time
    <filter class="solr.SynonymFilterFactory" synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/>
    -->
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
    <filter class="solr.KStemFilterFactory"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
    <filter class="solr.KStemFilterFactory"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>

<field name="_version_" type="long" indexed="true" stored="true"/>
<field name="id" type="string" indexed="true" stored="true" required="true" multiValued="false"/>
<field name="post_content" type="text" indexed="true" stored="true"/>
<field name="post_title" type="text" indexed="true" stored="true"/>
<field name="post_date" type="date" indexed="true" stored="true"/>
<field name="_text_" type="text" indexed="true" stored="false" multiValued="true" termVectors="true" termPositions="true" termOffsets="true"/>
I am using Solr 6.0.
Thanks for any advice,
Sabine
By default, the file stopwords.txt does not have any stop words in it. You can check this in any of the configsets that ship with Solr.
However, if you look in the conf/lang folder, you will find many stopword files. You can use whichever one applies to your language.
For testing purposes, you can copy the stop words from the file stopwords_en.txt and paste them into the file stopwords.txt in the path configsets/basic_configs/conf/. The configset may be different in your case; it depends on which one you used.
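As an alternative to copying the words over, the stop filter can point directly at one of the language-specific files. The fragment below is only a sketch of that idea: it assumes the configset in use contains lang/stopwords_en.txt under its conf directory and that English is the right list for the data. The same change would apply to the query analyzer.

```xml
<!-- Sketch only: index-time analyzer reading the English stop word list.
     Assumes conf/lang/stopwords_en.txt exists in the configset being used. -->
<analyzer type="index">
  <tokenizer class="solr.StandardTokenizerFactory"/>
  <filter class="solr.LowerCaseFilterFactory"/>
  <!-- "words" is resolved relative to the conf directory -->
  <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_en.txt"/>
  <!-- remaining filters stay as in the original schema.xml -->
</analyzer>
```

Whichever file is used, documents that were indexed before the change keep their old terms, so the core has to be reloaded and the data reindexed before the stop words disappear from the index.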