Tuesday, April 10, 2012

Can't search for all generated terms in Lucene index

I'm indexing and searching code using a custom analyzer. Given the text "will wi-fi work", the following tokens are generated ('will' is a stop word and is eliminated):

wi-fi {position:2 start:5 end:10}
wifi {position:2 start:5 end:10}
wi {position:2 start:5 end:7}
fi {position:2 start:8 end:10}
work {position:3 start:11 end:15}
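
To make the listing above easier to reason about, here is a minimal sketch in plain Java (no Lucene involved) of how position increments could produce those positions and what a term lookup over them would look like. It is a toy model under stated assumptions: the class `TokenPositions`, its `Token` type, and the increments (2 for wi-fi because of the removed stop word, 0 for the stacked variants, 1 for work) are hypothetical and inferred from the positions listed, not taken from the analyzer itself. Positions are counted from 1 to match the listing.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model only: this is not Lucene code, and all names are hypothetical.
class TokenPositions {

    // A term plus its position increment relative to the previous token.
    static final class Token {
        final String term;
        final int increment;

        Token(String term, int increment) {
            this.term = term;
            this.increment = increment;
        }
    }

    // Tokens for "will wi-fi work" after stop-word removal, as listed above.
    static List<Token> sampleTokens() {
        List<Token> tokens = new ArrayList<>();
        tokens.add(new Token("wi-fi", 2)); // assumed increment 2: skips the removed "will"
        tokens.add(new Token("wifi", 0));  // increment 0: stacked on the same position
        tokens.add(new Token("wi", 0));
        tokens.add(new Token("fi", 0));
        tokens.add(new Token("work", 1));
        return tokens;
    }

    // Builds term -> positions by accumulating the increments.
    static Map<String, List<Integer>> index(List<Token> tokens) {
        Map<String, List<Integer>> postings = new LinkedHashMap<>();
        int position = 0; // assumption: 1-based positions, as in the listing
        for (Token t : tokens) {
            position += t.increment;
            postings.computeIfAbsent(t.term, k -> new ArrayList<>()).add(position);
        }
        return postings;
    }

    public static void main(String[] args) {
        // prints {wi-fi=[2], wifi=[2], wi=[2], fi=[2], work=[3]}
        System.out.println(index(sampleTokens()));
    }
}
```

In this toy model every generated term, including wifi, wi and fi, is present at position 2, so a single-term lookup for any of them would succeed. If the real index behaves differently, that may point at what was actually written at index time rather than at the token list itself, though this sketch cannot confirm that.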

When I search for the terms wi-fi or work, I get results. However, any query (phrase or non-phrase) for wifi, wi, or fi returns no results. Is there anything wrong with the generated tokens?

Parsed search queries:

For wi-fi (works fine)

Lucene's: +matchAllDocs:true +(alltext:wi-fi alltext:wifi alltext:wi alltext:fi)

For wifi (no results returned)

Lucene's: +matchAllDocs:true +alltext:wifi

For "will wi-fi work" (works fine)

Lucene's: +matchAllDocs:true +alltext:"(wi-fi wifi wi fi) work"

For "will wifi work" (no results returned)

Lucene's: +matchAllDocs:true +alltext:"? wifi work"
