5 MINUTE READ | January 5, 2016
Elasticsearch Term or Terms Query Not Working? Start Here.
Christopher Davis has written this article.
Summary: if the term(s) being searched contain spaces or special characters, you'll need to search a not_analyzed field to make a term or terms query work.
By default, Elasticsearch runs incoming string data through a set of analyzers. You can specify what sort of analysis you want done on a string property with its index parameter when you set up the mapping.
This analysis turns the raw data into a set of tokens that are stored in an inverted index (here's a more in-depth guide).
When you search for something, the inverted index is queried and documents that match are returned.
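To make the inverted index idea concrete, here is a toy model in Python. It is only a sketch: the analyze function is a hypothetical stand-in that lowercases text and splits on non-word characters, which roughly approximates (but is not) Elasticsearch's standard analyzer.

```python
import re
from collections import defaultdict


def analyze(text):
    """Hypothetical stand-in for an analyzer: lowercase, then split on non-word characters."""
    return re.findall(r"\w+", text.lower())


def build_inverted_index(docs):
    """Map every token to the set of document IDs whose text produced it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in analyze(text):
            index[token].add(doc_id)
    return index


docs = {1: "test", 2: "test two"}
index = build_inverted_index(docs)
print(sorted(index["test"]))  # [1, 2]
print(sorted(index["two"]))   # [2]
print("test two" in index)    # False: the original string is never stored as a token
```

Note that the index only ever holds tokens, never the original "test two" string, which is the root of the behavior described in the rest of this article.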
When you search with something like a query string or match query, Elasticsearch uses its analyzers again to tokenize the query and look up matching documents in the inverted index. You can control which analyzer is used with the analyzer parameter in the query object. You can see how Elasticsearch tokenizes a term with the analyze endpoint:
curl 'http://localhost:9200/_analyze?pretty&text=test%20two'
{
  "tokens" : [
    { "token" : "test", "start_offset" : 0, "end_offset" : 4, "type" : "", "position" : 1 },
    { "token" : "two", "start_offset" : 5, "end_offset" : 8, "type" : "", "position" : 2 }
  ]
}
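The response lists each token with its character offsets and position. As a rough sketch, the Python function below produces the same fields for simple input. The regex tokenizer is an assumption and will not agree with the standard analyzer on every input (punctuation, Unicode, and so on), and the type field is omitted.

```python
import re


def analyze_with_offsets(text):
    """Approximate the token list returned by _analyze for simple, whitespace-separated input."""
    tokens = []
    # Positions start at 1 here to mirror the response above.
    for position, match in enumerate(re.finditer(r"\w+", text), start=1):
        tokens.append({
            "token": match.group().lower(),
            "start_offset": match.start(),
            "end_offset": match.end(),
            "position": position,
        })
    return tokens


for token in analyze_with_offsets("test two"):
    print(token)
# {'token': 'test', 'start_offset': 0, 'end_offset': 4, 'position': 1}
# {'token': 'two', 'start_offset': 5, 'end_offset': 8, 'position': 2}
```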
The term and terms queries do no analysis: they look for values that match exactly what’s given to them. This makes all kinds of sense: you’re trying to look up the values exactly as you pass them in.
But there’s a catch: term and terms queries still search the inverted index.
This goes unnoticed if your term queries use single lowercase words or numbers, since the default analyzer stores those values unchanged in the inverted index (with no spaces or punctuation to split on, there is nothing to tokenize). But term values with spaces, punctuation, or capital letters will appear not to work unless the field you're searching is set to not_analyzed.
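The difference between a term-style lookup and a match-style lookup can be sketched with a toy inverted index in Python. This is an illustration under stated assumptions, not Elasticsearch's implementation: analyze is a hypothetical regex stand-in for the standard analyzer, term_query looks the value up verbatim, and match_query analyzes the query text first and ORs the hits, roughly like a match query's default behavior.

```python
import re
from collections import defaultdict


def analyze(text):
    """Hypothetical stand-in for the standard analyzer: lowercase, split on non-word characters."""
    return re.findall(r"\w+", text.lower())


def build_index(docs):
    """Inverted index of analyzed tokens -> document IDs."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in analyze(text):
            index[token].add(doc_id)
    return index


def term_query(index, value):
    # Term-style lookup: the value is NOT analyzed, it must exist verbatim as a token.
    return set(index.get(value, set()))


def match_query(index, text):
    # Match-style lookup: analyze the query text, then OR together the hits for each token.
    hits = set()
    for token in analyze(text):
        hits |= index.get(token, set())
    return hits


index = build_index({1: "test", 2: "test two"})
print(sorted(term_query(index, "test")))       # [1, 2]: single lowercase word is found
print(sorted(term_query(index, "Test")))       # []: the index only holds "test"
print(sorted(term_query(index, "test two")))   # []: no single token "test two" exists
print(sorted(match_query(index, "test two")))  # [1, 2]
```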
First, let's create an index with a single type and property:
curl -XPUT http://localhost:9200/analyzed_example -d '{ "mappings": {
  "mytype": { "_source": {"enabled": true}, "properties": { "content": { "type": "string" } } } } }'
Then we’ll index some documents:
curl -XPOST http://localhost:9200/analyzed_example/mytype -d '{"content": "test"}'
curl -XPOST http://localhost:9200/analyzed_example/mytype -d '{"content": "test two"}'
Now let's try a term query for test, which should return just one document but actually returns two:
curl -XPOST 'http://localhost:9200/analyzed_example/mytype/_search?pretty' -d '{ "query": {"term": {"content": "test"}} }'
{
  "took" : 3, "timed_out" : false,
  "_shards" : { "total" : 5, "successful" : 5, "failed" : 0 },
  "hits" : {
    "total" : 2, "max_score" : 0.5945348,
    "hits" : [
      { "_index" : "analyzed_example", "_type" : "mytype", "_id" : "AVHotWCgWVxYklVnp_0-", "_score" : 0.5945348, "_source":{"content": "test"} },
      { "_index" : "analyzed_example", "_type" : "mytype", "_id" : "AVHotYZ9WVxYklVnp_0_", "_score" : 0.37158427, "_source":{"content": "test two"} }
    ]
  }
}
Why two documents? Because analysis of the content field in the second document put both test and two into the inverted index, so our term query matches it. But what happens when we do a term query for test two? No results:
curl -XPOST 'http://localhost:9200/analyzed_example/mytype/_search?pretty' -d '{ "query": {"term": {"content": "test two"}} }'
{
  "took" : 1, "timed_out" : false,
  "_shards" : { "total" : 5, "successful" : 5, "failed" : 0 },
  "hits" : { "total" : 0, "max_score" : null, "hits" : [ ] }
}
We can get around this by setting the field we want to “not_analyzed”:
curl -XPUT http://localhost:9200/nonanalyzed_example -d '{"mappings": {
  "mytype": { "_source": {"enabled": true}, "properties": { "content": { "type": "string", "index": "not_analyzed" } } } } }'
curl -XPOST http://localhost:9200/nonanalyzed_example/mytype -d '{"content": "test"}'
curl -XPOST http://localhost:9200/nonanalyzed_example/mytype -d '{"content": "test two"}'
And now both of our queries turn out as expected:
curl -XPOST 'http://localhost:9200/nonanalyzed_example/mytype/_search?pretty' -d '{ "query": {"term": {"content": "test"}} }'
{
  "took" : 1, "timed_out" : false,
  "_shards" : { "total" : 5, "successful" : 5, "failed" : 0 },
  "hits" : {
    "total" : 1, "max_score" : 1.0,
    "hits" : [
      { "_index" : "nonanalyzed_example", "_type" : "mytype", "_id" : "AVHov1xVWVxYklVnp_1H", "_score" : 1.0, "_source":{"content": "test"} }
    ]
  }
}
curl -XPOST 'http://localhost:9200/nonanalyzed_example/mytype/_search?pretty' -d '{ "query": {"term": {"content": "test two"}} }'
{
  "took" : 1, "timed_out" : false,
  "_shards" : { "total" : 5, "successful" : 5, "failed" : 0 },
  "hits" : {
    "total" : 1, "max_score" : 1.0,
    "hits" : [
      { "_index" : "nonanalyzed_example", "_type" : "mytype", "_id" : "AVHov4K7WVxYklVnp_1I", "_score" : 1.0, "_source":{"content": "test two"} }
    ]
  }
}
Whether a field should be not_analyzed is up to your application's needs. Good candidates include document properties that map to identifiers external to Elasticsearch, or values like URL slugs.
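URL slugs show the problem well: the standard analyzer splits on hyphens, so a slug like my-first-post (made-up example data) never exists as one token in an analyzed field. Below is a toy Python model of both mappings. It is a sketch under assumptions: analyze is a hypothetical regex stand-in for the standard analyzer, and not_analyzed is modeled simply as storing each raw value as a single token.

```python
import re
from collections import defaultdict


def analyze(text):
    """Hypothetical stand-in for the standard analyzer: lowercase, split on non-word characters."""
    return re.findall(r"\w+", text.lower())


def build_index(docs, analyzed):
    """Inverted index for one field; not_analyzed stores each raw value as a single token."""
    index = defaultdict(set)
    for doc_id, value in docs.items():
        tokens = analyze(value) if analyzed else [value]
        for token in tokens:
            index[token].add(doc_id)
    return index


def term_query(index, value):
    # Exact lookup with no analysis of the value.
    return set(index.get(value, set()))


slugs = {1: "my-first-post", 2: "my-second-post"}  # made-up example data
analyzed = build_index(slugs, analyzed=True)
not_analyzed = build_index(slugs, analyzed=False)

print(sorted(term_query(analyzed, "my-first-post")))      # []
print(sorted(term_query(not_analyzed, "my-first-post")))  # [1]
print(sorted(term_query(analyzed, "post")))               # [1, 2]
```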
An application at PMG needed exact matching on certain fields as well as the normal search functionality Elasticsearch provides. We ended up creating a specially named, not_analyzed field used specifically for the term and terms queries we needed.
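The article doesn't show PMG's actual mapping, so the following is only a toy Python sketch of the general idea: each document's value is indexed twice, once analyzed for normal search and once as a single untouched token for exact lookups. The field name content_exact and the regex analyze function are hypothetical. (Elasticsearch's multi-fields feature is one way to set this up in a real mapping, though the article does not say which approach PMG used.)

```python
import re
from collections import defaultdict


def analyze(text):
    """Hypothetical stand-in for the standard analyzer: lowercase, split on non-word characters."""
    return re.findall(r"\w+", text.lower())


def index_documents(docs):
    """Build two inverted indexes: analyzed 'content' and not-analyzed 'content_exact' (hypothetical name)."""
    fields = {"content": defaultdict(set), "content_exact": defaultdict(set)}
    for doc_id, text in docs.items():
        for token in analyze(text):
            fields["content"][token].add(doc_id)
        fields["content_exact"][text].add(doc_id)  # whole value stored as one token
    return fields


def term_query(fields, field, value):
    # Exact, unanalyzed lookup against one field's inverted index.
    return set(fields[field].get(value, set()))


fields = index_documents({1: "test", 2: "test two", 3: "another test"})
print(sorted(term_query(fields, "content", "test")))            # [1, 2, 3]
print(sorted(term_query(fields, "content_exact", "test")))      # [1]
print(sorted(term_query(fields, "content_exact", "test two")))  # [2]
```

Keeping the analyzed field means ordinary full-text queries still behave as before, while exact lookups go to the separate field.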
Elasticsearch is a trademark of Elasticsearch BV, registered in the U.S. and in other countries.