Tuesday, April 10, 2012

Can't search for all generated terms in a Lucene index

I'm indexing and searching code using a custom analyzer. Given the text "will wi-fi work", the following tokens are generated ('will' is a stop word, so it is eliminated).



wi-fi {position:2 start:5 end:10}
wifi {position:2 start:5 end:10}
wi {position:2 start:5 end:7}
fi {position:2 start:8 end:10}
work {position:3 start:11 end:15}
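To make the token listing concrete, here is a minimal sketch of a positional inverted index in plain Python. It is a toy model, not Lucene's implementation or on-disk format; the names `build_postings` and `term_hits` and the document id `1` are hypothetical. It only illustrates that, if all five tokens really reach the index at these positions, a single-term lookup for any of them would be expected to find the document.

```python
from collections import defaultdict

# Tokens as listed above: (term, position, start_offset, end_offset).
TOKENS = [
    ("wi-fi", 2, 5, 10),
    ("wifi", 2, 5, 10),
    ("wi", 2, 5, 7),
    ("fi", 2, 8, 10),
    ("work", 3, 11, 15),
]

def build_postings(doc_id, tokens):
    """Map each term to {doc_id: [positions]} (toy model, not Lucene's format)."""
    postings = defaultdict(lambda: defaultdict(list))
    for term, position, _start, _end in tokens:
        postings[term][doc_id].append(position)
    return postings

def term_hits(postings, term):
    """Return the sorted ids of documents that contain the term."""
    return sorted(postings.get(term, {}))

# Hypothetical document id 1 holds the text "will wi-fi work".
postings = build_postings(1, TOKENS)
for term in ("wi-fi", "wifi", "wi", "fi", "work", "will"):
    print(term, term_hits(postings, term))
```

In this model several terms simply share position 2, so the stacked tokens are not a problem in themselves. That does not rule out a difference between the index-time and query-time analysis in the real setup.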


When I search for the terms wi-fi or work, I get search results. However, when I issue any query (phrase or non-phrase) for wifi, wi, or fi, I don't get any results. Is there anything wrong with the generated tokens?



Parsed search queries:



For wi-fi (works fine)



Lucene's: +matchAllDocs:true +(alltext:wi-fi alltext:wifi alltext:wi alltext:fi)


For wifi (no results returned)



Lucene's: +matchAllDocs:true +alltext:wifi


For "will wi-fi work" (works fine)



Lucene's: +matchAllDocs:true +alltext:"(wi-fi wifi wi fi) work"


For "will wifi work" (no results returned)



Lucene's: +matchAllDocs:true +alltext:"? wifi work"
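The two phrase queries can be checked against the token positions with a small sketch in plain Python. This is a toy matcher, not Lucene's PhraseQuery; the name `phrase_matches` and the slot representation are hypothetical, and it assumes the `?` in the parsed query stands for a position gap left by the removed stop word. Under those assumptions both phrases line up with the listed positions, which suggests the positions themselves are consistent.

```python
# Positions for one document, taken from the token listing in the question.
POSITIONS = {
    "wi-fi": [2],
    "wifi": [2],
    "wi": [2],
    "fi": [2],
    "work": [3],
}

def phrase_matches(positions, slots):
    """Toy phrase check.

    slots is a list of (relative_offset, alternative_terms). The phrase
    matches if some base position has, for every slot, at least one of the
    alternative terms at base + relative_offset.
    """
    first_offset, first_terms = slots[0]
    candidates = {
        pos - first_offset
        for term in first_terms
        for pos in positions.get(term, [])
    }
    for base in candidates:
        if all(
            any(base + offset in positions.get(term, []) for term in terms)
            for offset, terms in slots
        ):
            return True
    return False

# "(wi-fi wifi wi fi) work": four alternatives at offset 0, then work at 1.
multi = [(0, ["wi-fi", "wifi", "wi", "fi"]), (1, ["work"])]
# "? wifi work": assumed gap at offset 0, so wifi is at 1 and work at 2.
gap = [(1, ["wifi"]), (2, ["work"])]

print(phrase_matches(POSITIONS, multi))
print(phrase_matches(POSITIONS, gap))
```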



