Query Types

Couchbase Full Text Search supports multiple types of query.

Introduction to Query Types

Full Text Search allows text-data to be queried. Multiple options are provided for ensuring the right kinds of match. This page describes the purpose of each query-type, and provides sample JSON objects that indicate how queries can be constructed.

Available query-types include:

  • Simple Queries: Accept input-text in the form of words and phrases, and attempt to find matches across bodies of text that have been indexed. Analyzers are applied to both input and target, potentially to strip out unnecessary characters, reduce words to the basic stems on which matching should occur, handle punctuation, and more. Additionally, the required level of match accuracy can be specified; and multiple queries can be expressed together, with their respective priorities boosted to ensure their results' prominence in the eventual result-set.

  • Compound Queries: Accept multiple queries simultaneously, and return either the conjunction of results from the result-sets, or a disjunction.

  • Range Queries: Accept ranges for dates and numbers, and return documents that contain values within those ranges.

  • Query String Queries: Accept query strings, which express query-requirements in a special syntax.

  • Geospatial Queries: Accept longitude-latitude coordinate pairs, in order to return documents that specify a geographical location.

  • Non-Analytic Queries: Accept words and phrases, and return only exact matches. No analysis is performed.

  • Special Queries: For testing purposes, return either all of the documents in an index, or none.

These query-types are explained in greater detail below. Examples are provided, using the Couchbase REST API query-syntax. (Note that Full Text Search can also be performed with the Couchbase Web Console and the Couchbase SDK.) The JSON data refers to the travel-sample bucket, and assumes the existence of a Full Text Index named hotel.

To run the examples using curl, use the following syntax:

curl -u Administrator:password -X POST -H "Content-Type: application/json" \
  -d '{your query in JSON here...}' \
  http://localhost:8094/api/index/index_name/query

Note that the examples below show only the JSON fragments that constitute non-generic parts of the queries they describe. For actual use in a Full Text Search, these JSON fragments should be wrapped in the following generic configuration:

{
  "explain": false,
  "fields": [
    "*"
  ],
  "highlight": {},
  "query":{ your_query_details_here }
}

For more information on using the REST API to perform queries, see Searching with the REST API.
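
The wrapping described above can be sketched in Python. This is a minimal illustration rather than an official client: the function names wrap_query and build_request are hypothetical, the host, port, and URL path are taken from the curl example, and the request is only constructed, not sent (sending it would also require the credentials that curl passes with -u).

```python
import json
import urllib.request


def wrap_query(query_fragment):
    """Wrap a query fragment in the generic configuration shown above."""
    return {
        "explain": False,
        "fields": ["*"],
        "highlight": {},
        "query": query_fragment,
    }


def build_request(index_name, query_fragment, host="localhost", port=8094):
    """Build, but do not send, a POST request for the index's query endpoint."""
    url = f"http://{host}:{port}/api/index/{index_name}/query"
    body = json.dumps(wrap_query(query_fragment)).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Example: a match query fragment against the index named hotel.
request = build_request("hotel", {"match": "location", "field": "reviews.content"})
print(request.full_url)
print(request.data.decode("utf-8"))
```

Keeping the wrapper separate from the fragment means that each fragment shown on this page can be passed in unchanged.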

Simple Queries

Match Query

A match query analyzes input text, and uses the results to query an index. Options include specifying an analyzer, performing a fuzzy match, and performing a prefix match. By default, the analyzer used for the input-text is that previously used on the target-text, during index-creation. For information on analyzers, see Understanding Analyzers.

When fuzzy matching is used, if the fuzziness parameter is set to a non-zero integer, the analyzed text is matched with a corresponding level of fuzziness.

When a prefix match is used, the prefix_length parameter specifies that for a match to occur, a prefix of specified length must be shared by the input-term and the target text-element.

The following JSON object demonstrates specification of a match query:

{
 "match": "location hostel",
 "field": "reviews.content",
 "analyzer": "standard",
 "fuzziness": 2,
 "prefix_length": 4
}

A match query is also demonstrated by means of the Java SDK, in java-sdk::full-text-searching-with-sdk.adoc.

Match Phrase Query

The input text is analyzed, and a phrase query is built with the terms resulting from the analysis. This type of query searches for terms in the target that occur in the positions and offsets indicated by the input: this depends on term vectors, which must have been included in the creation of the index used for the search.

For example, a match phrase query for location for functions is matched with locate the function, if the standard analyzer is used: this analyzer uses a stemmer, which reduces location and locate to locat, and functions and function to function. Additionally, the analyzer employs stop-word removal, which removes small and less significant words from input and target text, so that matches are attempted on only the more significant elements of vocabulary: in this case, for and the are removed. Following this processing, the tokens locat and function are recognized as common to both input and target, and as occurring in the same sequence and at the same distance from one another in each; therefore, a match is made.

{
 "match_phrase": "very nice",
 "field": "reviews.content"
}
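
The analysis steps described for location for functions can be imitated with a toy Python sketch. It is not the analyzer that Full Text Search uses: the stop-word list and the stem table are hard-coded assumptions that cover only this example, and the function names are hypothetical. Token positions are kept after stop-word removal, so that the distance between the remaining tokens can be compared.

```python
STOP_WORDS = {"for", "the"}  # assumed stop words, sufficient for this example only
STEMS = {"location": "locat", "locate": "locat", "functions": "function"}  # toy stemmer


def analyze(text):
    """Return (token, position) pairs after stop-word removal and stemming."""
    tokens = []
    for position, word in enumerate(text.lower().split(), start=1):
        if word in STOP_WORDS:
            continue  # the word is dropped; later positions are left unchanged
        tokens.append((STEMS.get(word, word), position))
    return tokens


def phrase_matches(input_text, target_text):
    """True if the input tokens occur in the target in the same order and spacing."""
    wanted = analyze(input_text)
    found = analyze(target_text)
    if not wanted:
        return False
    first_term, first_position = wanted[0]
    for term, position in found:
        if term != first_term:
            continue
        offset = position - first_position
        if all((t, p + offset) in found for t, p in wanted):
            return True
    return False


print(analyze("location for functions"))
print(phrase_matches("location for functions", "locate the function"))
```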

A match phrase query is also demonstrated by means of the Java SDK, in java-sdk::full-text-searching-with-sdk.adoc.

Fuzzy Query

A fuzzy query matches terms within a specified edit (or Levenshtein) distance: meaning that terms are considered to match when they are similar to a specified degree, rather than identical. A common prefix of a stated length may also be specified as a requirement for matching.

Fuzziness is specified by means of a single integer. For example:

{
 "term": "interest",
 "field": "reviews.content",
 "fuzziness": 2
}

Fuzziness is demonstrated by means of the Java SDK, in the context of the term query (see below), in java-sdk::full-text-searching-with-sdk.adoc. Note that two such queries are specified, with the difference in fuzziness between them resulting in different forms of match, and different sizes of result-sets.
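
To make the notion of edit distance concrete, the following Python sketch computes the Levenshtein distance between two terms and applies a fuzziness limit, with an optional shared-prefix requirement. It illustrates the concept only and makes no claim about how the search engine evaluates fuzzy queries internally; the function names are hypothetical.

```python
def edit_distance(a, b):
    """Levenshtein distance: the minimum number of single-character
    insertions, deletions, and substitutions that turn a into b."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute, or keep if equal
            ))
        previous = current
    return previous[-1]


def fuzzy_match(term, candidate, fuzziness, prefix_length=0):
    """True if candidate shares the required prefix with term and lies
    within the given edit distance of it."""
    if term[:prefix_length] != candidate[:prefix_length]:
        return False
    return edit_distance(term, candidate) <= fuzziness


for candidate in ["interest", "interests", "internet", "pinterest"]:
    print(candidate, edit_distance("interest", candidate))
```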

Prefix Query

A prefix query finds documents containing terms that start with the specified prefix.

{
 "prefix": "inter",
 "field": "reviews.content"
}

Regexp Query

A regexp query finds documents containing terms that match the specified regular expression.

{
 "regexp": "inter.+",
 "field": "reviews.content"
}

A regexp query is also demonstrated by means of the Java SDK, in java-sdk::full-text-searching-with-sdk.adoc.

Wildcard Query

A wildcard query uses a wildcard expression to search within individual terms for matches. A wildcard expression can match any single character (?) or zero to many characters (*). Wildcard expressions can appear in the middle or at the end of a term, but not at the beginning.

{
 "wildcard": "inter*",
 "field": "reviews.content"
}

A wildcard query is also demonstrated by means of the Java SDK, in java-sdk::full-text-searching-with-sdk.adoc.
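
The difference between the prefix, regexp, and wildcard examples above can be seen by applying equivalent tests to a small list of terms in Python. This is an approximation under stated assumptions: the term list is invented for illustration, the regular expression is assumed to be matched against the whole term, and Python's re and fnmatch modules stand in for the search engine's own pattern syntax, which may differ in detail (fnmatch, for instance, also accepts bracket expressions).

```python
import fnmatch
import re

# Hypothetical indexed terms, chosen only to show the differences.
terms = ["inter", "interest", "interesting", "winter", "internal"]

# Prefix "inter": the term must start with the prefix.
prefix_hits = [t for t in terms if t.startswith("inter")]

# Regexp "inter.+": at least one character must follow "inter".
regexp_hits = [t for t in terms if re.fullmatch(r"inter.+", t)]

# Wildcard "inter*": * stands for zero or more characters.
wildcard_hits = [t for t in terms if fnmatch.fnmatchcase(t, "inter*")]

# Wildcard "inte?": ? stands for exactly one character.
single_char_hits = [t for t in terms if fnmatch.fnmatchcase(t, "inte?")]

print(prefix_hits)
print(regexp_hits)
print(wildcard_hits)
print(single_char_hits)
```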

Boolean Field Query

A boolean field query searches fields that contain boolean true or false values. A boolean field query searches the actual content of the field, and should not be confused with the boolean queries (described below, in the section on compound queries) that modify whether a query must, should, or must not be present.

{
 "bool": true,
 "field": "free_breakfast"
}

Compound Queries

Conjunction Query (AND)

A conjunction query contains multiple child queries. Its result documents must satisfy all of the child queries.

{
 "conjuncts":[
   {"field":"reviews.content", "match": "location"},
   {"field":"free_breakfast", "bool": true}
 ]
}

A conjunction query is also demonstrated by means of the Java SDK, in java-sdk::full-text-searching-with-sdk.adoc.

Disjunction Query (OR)

A disjunction query contains multiple child queries. Its result documents must satisfy at least a configurable minimum number of the child queries, specified as min. By default, min is set to 1. For example, if three child queries (A, B, and C) are specified, a min of 1 returns every document that satisfies at least one of A, B, or C, whereas a min of 2 returns only those documents that satisfy at least two of them.

{
 "disjuncts":[
   {"field":"reviews.content", "match": "location"},
   {"field":"free_breakfast", "bool": true}
 ]
}

A disjunction query is also demonstrated by means of the Java SDK, in java-sdk::full-text-searching-with-sdk.adoc.
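
The effect of the minimum on a disjunction can be sketched in Python with plain sets of document IDs. The IDs and the function name are hypothetical, and the sketch models only which documents qualify, not how they are scored.

```python
from collections import Counter


def disjunction(result_sets, minimum=1):
    """Return the document IDs that appear in at least `minimum` of the
    child result sets."""
    counts = Counter(doc_id for result in result_sets for doc_id in set(result))
    return {doc_id for doc_id, count in counts.items() if count >= minimum}


# Hypothetical result sets for three child queries A, B, and C.
a = {"hotel_1", "hotel_2"}
b = {"hotel_2", "hotel_3"}
c = {"hotel_2", "hotel_3"}

print(sorted(disjunction([a, b, c], minimum=1)))
print(sorted(disjunction([a, b, c], minimum=2)))
print(sorted(disjunction([a, b, c], minimum=3)))
```

With a minimum equal to the number of child queries, the disjunction behaves like a conjunction.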

Boolean Query

A boolean query is a combination of conjunction and disjunction queries. A boolean query takes three lists of queries:

  • must: Result documents must satisfy all of these queries.

  • should: Result documents need not satisfy these queries, but those that do score higher than those that do not.

  • must_not: Result documents must not satisfy any of these queries.

{
 "must": {
   "conjuncts":[{"field":"reviews.content", "match": "location"}]},
 "must_not": {
   "disjuncts": [{"field":"free_breakfast", "bool": false}]},
 "should": {
   "disjuncts": [{"field":"free_breakfast", "bool": true}]}
}

Doc ID Query

A doc ID query returns the indexed document or documents among the specified set. This is typically used in conjunction queries, to restrict the scope of other queries’ output.

{ "ids": [ "hotel_10158", "hotel_10159" ] }

A doc ID query is demonstrated by means of the Java SDK, in java-sdk::full-text-searching-with-sdk.adoc.
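
As an illustration of restricting another query's scope, the following Python sketch combines an arbitrary query fragment with a doc ID query inside a conjunction. It uses only the conjuncts and ids syntax shown on this page; the helper name restrict_to_ids is hypothetical.

```python
import json


def restrict_to_ids(query, doc_ids):
    """Wrap a query in a conjunction, so that only the listed documents
    can appear in its results."""
    return {"conjuncts": [query, {"ids": list(doc_ids)}]}


fragment = restrict_to_ids(
    {"field": "reviews.content", "match": "location"},
    ["hotel_10158", "hotel_10159"],
)
print(json.dumps(fragment, indent=1))
```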

Range Queries

Date Range Query

A date range query finds documents containing a date value in the specified field, within the specified range. Dates should be in the format specified by RFC-3339, which is a specific profile of ISO-8601. Define the endpoints using the fields start and end. One endpoint can be omitted, but not both. The inclusive_start and inclusive_end properties in the query JSON control whether the endpoints are included or excluded.
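
When the endpoints originate in application code, they can be formatted from timezone-aware datetime values, as in the Python sketch below. The helper name date_range_query is hypothetical. The sketch relies on datetime.isoformat(), which for timezone-aware values produces strings with an explicit UTC offset, and it enforces the rule that at least one endpoint must be present.

```python
from datetime import datetime, timedelta, timezone


def date_range_query(field, start=None, end=None,
                     inclusive_start=None, inclusive_end=None):
    """Build a date range query fragment from timezone-aware datetimes."""
    if start is None and end is None:
        raise ValueError("at least one of start and end must be given")
    query = {"field": field}
    for key, value in (("start", start), ("end", end)):
        if value is None:
            continue
        if value.tzinfo is None:
            raise ValueError(f"{key} must be timezone-aware")
        query[key] = value.isoformat()
    for key, flag in (("inclusive_start", inclusive_start),
                      ("inclusive_end", inclusive_end)):
        if flag is not None:
            query[key] = flag  # left out when not specified
    return query


minus_eight = timezone(timedelta(hours=-8))
query = date_range_query(
    "review_date",
    start=datetime(2001, 10, 9, 10, 20, 30, tzinfo=minus_eight),
    inclusive_start=False,
)
print(query)
```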

{
 "start": "2001-10-09T10:20:30-08:00",
 "end": "2016-10-31",
 "inclusive_start": false,
 "inclusive_end": false,
 "field": "review_date"
}

Numeric Range Query

A numeric range query finds documents containing a numeric value in the specified field, within the specified range. Define the endpoints using the fields min and max. You can omit one endpoint, but not both. The inclusive_min and inclusive_max properties control whether the endpoints are included or excluded. By default, min is inclusive and max is exclusive.

{
 "min": 100, "max": 1000,
 "inclusive_min": false,
 "inclusive_max": false,
 "field": "id"
}

A numeric range query is also demonstrated by means of the Java SDK, in java-sdk::full-text-searching-with-sdk.adoc.
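
The endpoint rules for a numeric range, including the defaults of an inclusive min and an exclusive max, can be expressed as a small Python predicate. This is a sketch of the semantics described above, not of the engine's implementation, and the function name is hypothetical.

```python
def in_numeric_range(value, minimum=None, maximum=None,
                     inclusive_min=True, inclusive_max=False):
    """True if value lies within the range, honoring endpoint inclusivity."""
    if minimum is None and maximum is None:
        raise ValueError("at least one of minimum and maximum must be given")
    if minimum is not None:
        if value < minimum or (value == minimum and not inclusive_min):
            return False
    if maximum is not None:
        if value > maximum or (value == maximum and not inclusive_max):
            return False
    return True


# With the defaults, 100 is inside the range 100..1000, but 1000 is not.
print(in_numeric_range(100, minimum=100, maximum=1000))
print(in_numeric_range(1000, minimum=100, maximum=1000))
```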

Query String Query

A query string can be used to express a given query by means of a special syntax.

{ "query": "+nice +view" }

A query string query is demonstrated by means of the Java SDK, in java-sdk::full-text-searching-with-sdk.adoc. Note also that the Full Text Searches conducted with the Couchbase Web Console themselves use query strings. (See Searching from the UI.)

Certain queries supported by FTS are not yet supported by the query string syntax. This includes wildcards, regexp, and date range queries.

Using the query string syntax, the following query types can be performed:

Match

A term without any other syntax is interpreted as a match query for the term in the default field. The default field is _all.

For example, water performs a Match Query for the term water.

Match Phrases

Placing the search terms in quotes performs a match phrase query. For example, "light beer" performs a match phrase query for the phrase light beer.

Field Scoping

The field in which to search can be specified by prefixing the term with a field-name, separated by a colon. For example, description:water performs a match query for the term water, in the description field.

Required, Optional, and Exclusion

When a query string includes multiple items, by default these are placed into the SHOULD clause of a boolean query. This can be adjusted, by prefixing items with either + or -. Prefixing with + places that item in the MUST portion of the boolean query. Prefixing with - places that item in the MUST NOT portion of the boolean query.

For example, +description:water -light beer performs a boolean query that MUST satisfy the match query for the term water in the description field, MUST NOT satisfy the match query for the term light in the default field, and SHOULD satisfy the match query for the term beer in the default field. Result documents satisfying the SHOULD clause score higher than those that do not.
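
The way that prefixes assign items to clauses can be imitated with a toy Python function. It is a deliberately simplified sketch and not the real query string parser: it splits on whitespace only, so it does not handle quoted phrases, boosts, or numeric ranges, and the function name is hypothetical.

```python
def classify_clauses(query_string):
    """Assign each whitespace-separated item to must, should, or must_not,
    according to its + or - prefix."""
    clauses = {"must": [], "should": [], "must_not": []}
    for item in query_string.split():
        if item.startswith("+"):
            clauses["must"].append(item[1:])
        elif item.startswith("-"):
            clauses["must_not"].append(item[1:])
        else:
            clauses["should"].append(item)  # the default for unprefixed items
    return clauses


print(classify_clauses("+description:water -light beer"))
```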

Boosting

When multiple query-clauses are specified, the relative importance of a given clause can be specified by suffixing it with the ^ operator, followed by a number. For example, description:water name:water^5 performs match queries for water in both the name and description fields, but documents having the term in the name field score higher.

Numeric Ranges

Numeric ranges can be specified with the >, >=, <, and <= operators, each followed by a numeric value. For example, abv:>10 performs a numeric range query on the abv field, for values greater than 10.

A numeric range query is demonstrated by means of the Java SDK, in java-sdk::full-text-searching-with-sdk.adoc.

Non-Analytic Queries

Term and Phrase queries support no analysis on their inputs. This means that only exact matches are returned.

In most cases, given the benefits of using analyzers, use of match and match phrase queries is preferable to that of term and phrase. For information on analyzers, see Understanding Analyzers.

Term Query

A term query is the simplest possible query. It performs an exact match in the index for the provided term.

{
  "term": "locate",
  "field": "reviews.content"
}

Term queries are also demonstrated by means of the Java SDK, in java-sdk::full-text-searching-with-sdk.adoc.

Phrase Query

A phrase query searches for terms occurring in the specified positions and offsets. It performs an exact term-match for all the phrase-constituents, without using an analyzer.

{
  "terms": ["nice", "view"],
  "field": "reviews.content"
}

A phrase query is also demonstrated by means of the Java SDK, in java-sdk::full-text-searching-with-sdk.adoc.

Geospatial Queries

Geospatial queries return documents that each specify a geographical location. Each query contains either:

  • A single longitude-latitude coordinate pair; and a distance value, in miles, which determines a radius measured from the location specified by the coordinate pair. Documents are returned if they specify (by means of a longitude-latitude coordinate pair) a location that lies within the radius.

  • Two longitude-latitude coordinate pairs. These are respectively taken to indicate the upper left and lower right corners of a bounding box. Documents are returned if they specify a location that lies within the bounding box.

A geospatial query must be applied to an index that applies the geopoint type mapping to the document-field that contains the target longitude-latitude coordinate pair.

More detailed information is provided in Geospatial Queries.
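
The two forms of geospatial test, radius and bounding box, can be sketched in Python as follows. This is a conceptual illustration only: it uses the haversine formula with an assumed mean Earth radius of 3958.8 miles, the bounding-box check does not handle boxes that cross the 180-degree meridian, the function names are hypothetical, and none of this is claimed to reflect how the search engine computes distances.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # assumed mean radius; adequate for a sketch


def distance_miles(lon1, lat1, lon2, lat2):
    """Great-circle distance between two points, by the haversine formula."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def within_radius(point, center, radius_miles):
    """True if point (lon, lat) lies within radius_miles of center (lon, lat)."""
    return distance_miles(*point, *center) <= radius_miles


def within_box(point, top_left, bottom_right):
    """True if point (lon, lat) lies inside the box given by its upper left
    and lower right corners, each also (lon, lat)."""
    lon, lat = point
    return (top_left[0] <= lon <= bottom_right[0]
            and bottom_right[1] <= lat <= top_left[1])


london = (-0.1276, 51.5072)
paris = (2.3522, 48.8566)
print(round(distance_miles(*london, *paris)))
print(within_radius(paris, london, 300))
print(within_box((0.0, 50.0), top_left=(-1.0, 51.0), bottom_right=(1.0, 49.0)))
```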

Special Queries

Special queries are usually employed either in combination with other queries, or to test the system.

Match All Query

Matches all documents in an index, irrespective of terms. For example, if an index is created on the travel-sample bucket for documents of type zucchini, the match all query returns all document IDs from the travel-sample bucket, even though the bucket contains no documents of type zucchini.

{ "match_all": {} }

Match None Query

Matches no documents in the index.

{ "match_none": {} }