## Implementation of Search Modes

Search vendors generally use different naming conventions and search mode implementations (1 - 5), so the following documentation does not apply to third-party search services.

A search mode is a particular way of instructing a search engine how to interpret a query. The instruction is given through commands known as search operators. Some operators are boolean, corresponding to the logic operations NOT, OR, XOR, and AND.

In Minerazzi, all queries are of the form

query = search operator:search terms (1)

where a colon (:) must be used to separate the search operator from the search terms. If no operator is declared, the default AND operator is used. So a query consisting of three terms (w1, w2, and w3) like this

query = w1 w2 w3 (2)

is interpreted like this

query = AND:w1 w2 w3 (3)

Queries can be in upper, lower, or mixed case, as Minerazzi is case-insensitive.
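The parsing rule described above can be sketched in a few lines. This is only an illustration, not Minerazzi's actual parser: the function name `parse_query` and the `KNOWN_OPERATORS` set are hypothetical, and the sketch assumes that text before a colon counts as an operator only when it is one of the four boolean operators named so far.

```python
# Hypothetical operator set; only the four boolean operators named above.
KNOWN_OPERATORS = {"NOT", "OR", "XOR", "AND"}


def parse_query(query: str) -> tuple[str, list[str]]:
    """Split 'operator:terms' into (operator, terms), defaulting to AND."""
    head, sep, tail = query.partition(":")
    if sep and head.strip().upper() in KNOWN_OPERATORS:
        operator, terms = head.strip().upper(), tail
    else:
        # No recognized operator declared: fall back to the default AND mode.
        operator, terms = "AND", query
    # Lowercasing models case-insensitive matching.
    return operator, terms.lower().split()


print(parse_query("w1 w2 w3"))      # ('AND', ['w1', 'w2', 'w3'])
print(parse_query("xor:W1 w2 W3"))  # ('XOR', ['w1', 'w2', 'w3'])
```

Under these assumptions, expressions (2) and (3) parse to the same operator and term list.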

### Implementation

An easy way to search without having to memorize search operators is through our Match Previews interface. The interface supports the four boolean modes mentioned above plus four distance-based modes. It also supports the corresponding complement modes (non-matches), for a total of 16 search modes. The eight base modes are listed in the following tables.

**Boolean-based Modes**

| Mode | Operator | Alias | What it does |
|---|---|---|---|
| NOT | NOT: | Exclude | Excludes a term while matching the rest of the terms in any order or proximity. |
| XOR | XOR: | Uneven; Odd | Matches an odd number of terms (1, 3, 5, ...) in any order or proximity. |
| OR | OR: | Any | Matches any term in any order or proximity. |
| AND | AND: | All | Matches all terms in any order or proximity. |
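The AND, OR, and XOR rows above can be read as conditions on how many distinct query terms a document contains. The following minimal sketch assumes documents are already tokenized into lowercase word sets; `boolean_match` is a hypothetical name. NOT is left out because the table does not say which term is the excluded one.

```python
def boolean_match(mode: str, terms: list[str], doc_tokens: set[str]) -> bool:
    """Decide whether a tokenized document matches under AND, OR, or XOR."""
    unique_terms = set(terms)
    hits = len(unique_terms & doc_tokens)  # distinct query terms found
    if mode == "AND":
        return hits == len(unique_terms)   # all terms present
    if mode == "OR":
        return hits >= 1                   # any term present
    if mode == "XOR":
        return hits % 2 == 1               # odd number of terms present
    raise ValueError(f"mode not covered by this sketch: {mode}")


doc = {"a", "b", "x"}
print(boolean_match("AND", ["a", "b"], doc))       # True
print(boolean_match("XOR", ["a", "b", "c"], doc))  # False (two terms found)
```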

**Distance-based Modes**

| Mode | Operator | Alias | What it does |
|---|---|---|---|
| NEAR | NR: or n: | Around | Matches the first two search terms in any order and no more than a given number of terms from one another or themselves. Use it to find passages that start and end with any two terms in any order. |
| XNEAR | XNR: or ~n: | Exclusive near; Proximity | Matches the first two search terms in any order and no more than a given number of terms from one another. Use it to find passages that start and end with different terms in any order. |
| ADJACENCY | AD: or _n: | Block | Matches the first two search terms, in a strict order, and no more than a given number of terms from one another. Use it to find nonoverlapping passages (term blocks) that start and end with different terms in a strict order. |
| EXACT | EX: | Sequence | Matches an exact sequence of terms, phrases, and clauses. Use it to find an exact passage. |

As Minerazzi is case-insensitive, the operators can be in upper or lower case. Again, a colon (:) follows a search operator. For distance-based operators, we can use shortcuts by declaring the number n of words separating the search terms. The default value is n = 5. As expected, n = 0 reduces to an EXACT search. Thus,

- n: finds matches separated by n words as in a NEAR search.
- ~n: finds matches separated by n words as in an XNEAR search.
- _n: finds matches separated by n words as in an ADJACENCY search.
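The distance test behind these shortcuts can be sketched as follows. This is a simplified reading, not the engine's implementation: `within_distance` is a hypothetical name, n is taken to be the maximum number of words allowed between the two terms, and the sketch covers only the unordered (XNEAR-like) and ordered (ADJACENCY-like) cases. It does not attempt NEAR's same-term passages or the nonoverlapping-block rule.

```python
def within_distance(tokens: list[str], w1: str, w2: str,
                    n: int = 5, ordered: bool = False) -> bool:
    """True if w1 and w2 occur with at most n words between them."""
    pos1 = [i for i, t in enumerate(tokens) if t == w1]
    pos2 = [i for i, t in enumerate(tokens) if t == w2]
    for i in pos1:
        for j in pos2:
            if i == j:
                continue                # same token occurrence
            if ordered and j < i:
                continue                # strict order: w1 must precede w2
            if abs(j - i) - 1 <= n:     # words strictly between the pair
                return True
    return False


tokens = "the quick brown fox jumps over the lazy dog".split()
print(within_distance(tokens, "fox", "quick", n=1))                # True
print(within_distance(tokens, "fox", "quick", n=1, ordered=True))  # False
```

With n = 0 the two terms must be adjacent, which is consistent with the remark that n = 0 reduces to an EXACT search for a two-term query.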

### Understanding Complement Search Modes

Every search mode has a complement mode that finds its "nonmatches". The complements of boolean-based modes are the results of applying a boolean operation followed by a NOT inversion operation. The result is a logical complement. By contrast, the complements of distance-based modes are not logical but subtractive; i.e., they are computed relative to a reference mode. We currently use AND as our reference mode.

In C-like programming languages (PHP, JavaScript,...), NOT is represented with the exclamation character ("!"). In logic, negation is also called the *logical complement*.

So NOT, OR, XOR, and AND, each followed by NOT, yield the NNOT, NOR, XNOR, and NAND complement(ary) modes. Here NNOT means NOT NOT or a double negation ("!!").
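The difference between a logical and a subtractive complement can be illustrated with sets of document IDs. The sketch below is one possible reading of "relative to a reference mode": it assumes the subtractive complement of a distance-based mode is the set of AND matches that the distance-based mode did not return. The function names and the toy IDs are hypothetical.

```python
def logical_complement(collection: set[int], matches: set[int]) -> set[int]:
    """Boolean modes: everything in the collection that did not match."""
    return collection - matches


def subtractive_complement(reference: set[int], matches: set[int]) -> set[int]:
    """Distance modes (assumed reading): reference-mode matches not returned."""
    return reference - matches


collection = {1, 2, 3, 4, 5, 6}   # toy document IDs
or_matches = {1, 2, 3, 4}
and_matches = {1, 2, 3}           # AND is the reference mode
near_matches = {1, 2}             # both terms present and close together

print(logical_complement(collection, or_matches))         # {5, 6}
print(subtractive_complement(and_matches, near_matches))  # {3}
```

Both operations are set differences; what changes is the universe they are taken against (the whole collection versus the reference mode's matches).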

All these additional search modes enhance the user's search experience. To illustrate, if a query consists of 5 terms, an XOR search can retrieve documents matching 1, 3, or 5 of these terms while an XNOR search can find documents matching 0, 2, or 4 of them.

To achieve similar results with a search engine lacking these modes, a user would need to combine search modes in a rather cumbersome way or submit several queries individually and then merge the results. With Minerazzi's Match Previews interface, all that hassle is avoided by submitting a single search, like this

XOR:w1 w2 w3 w4 w5 (4)

or like this

XNOR:w1 w2 w3 w4 w5 (5)
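The five-term example can be checked with a short parity test. This sketch assumes a document is represented by the set of terms it contains; `parity_mode` is a hypothetical name used only to show which of the two modes would return a given document.

```python
def parity_mode(terms: list[str], doc_tokens: set[str]) -> str:
    """Return which of XOR / XNOR would retrieve the document."""
    hits = len(set(terms) & doc_tokens)  # distinct query terms in the document
    return "XOR" if hits % 2 == 1 else "XNOR"


terms = ["w1", "w2", "w3", "w4", "w5"]
for k in range(len(terms) + 1):
    doc = set(terms[:k])                 # toy document containing k of the terms
    print(k, parity_mode(terms, doc))
```

Documents with 1, 3, or 5 of the terms fall under XOR and those with 0, 2, or 4 under XNOR, so every document lands in exactly one of the two result sets.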

### Drawbacks of Complement Modes: NOR Dumps

The total number of matches and nonmatches from a search mode and its logic complement can be used to dump all records from a collection, at least in theory.

This is easy to demonstrate.

Assume that we search a collection with an OR query consisting of three terms w1, w2, and w3. The following drum-shaped figure represents this collection. The colored regions represent the number of OR matches while the white region represents the number of nonmatches or NOR results.

We may refer to this "NOR white region" as an empty subset, { }, to indicate that this is a subset of documents matching none of the search terms.

While the { } set frequently consists of irrelevant documents, it should be stressed that it might include relevant documents. For instance, documents with synonyms of the search terms or containing non-query terms with high-order co-occurrences with the terms being searched can be relevant. Semantic-based algorithms like latent semantic indexing (LSI) have been used to extract these types of documents from the { } set, but so far this is an open IR problem.

Said "NOR white region" or { } set is also present in the following figures, whose colored regions correspond from left to right to search results from the NOT, XOR, and AND modes; i.e., assuming the same search terms. In a truth table representation, { } corresponds to the row where all entries are 0 (i.e., false).

In each case, { } plus the white regions of the Venn diagrams represent the corresponding NNOT, XNOR, and NAND results. Thus, given a search mode and its complement, it is theoretically possible to dump an entire collection. We refer to these as **Database Dumps**. In the case of search engines that allow search modes to be combined, an easy combination would be something equivalent to an OR negation (NOT OR), in which case we may refer to them as **NOR Dumps**.
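The dump argument can be reproduced on a toy collection. The sketch below is illustrative only: the four documents are invented, matching is done on whitespace-separated lowercase words, and `or_hits` and `nor_hits` are hypothetical helpers.

```python
def or_hits(collection: dict[int, str], terms: list[str]) -> set[int]:
    """IDs of documents containing at least one query term (OR mode)."""
    wanted = {t.lower() for t in terms}
    return {
        doc_id
        for doc_id, text in collection.items()
        if wanted & set(text.lower().split())
    }


def nor_hits(collection: dict[int, str], terms: list[str]) -> set[int]:
    """IDs of documents containing none of the query terms (the { } set)."""
    return set(collection) - or_hits(collection, terms)


# Toy collection, invented for illustration only.
collection = {
    1: "w1 appears here",
    2: "w2 and w3 appear here",
    3: "nothing relevant",
    4: "also unrelated text",
}
terms = ["w1", "w2", "w3"]

dumped = or_hits(collection, terms) | nor_hits(collection, terms)
print(dumped == set(collection))  # True: the two queries cover every record
```

Because a mode and its logical complement partition the collection, two queries are enough to enumerate every record unless the engine restricts complement results.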

Fortunately, query-driven database dumps can be avoided in many ways; for instance by

1. disabling complement modes.
2. limiting complement mode results.
3. removing or ignoring all { } results.

Option 1 eliminates any benefit derived from complement mode searches. Option 2, adopted in early versions of Match Previews, limits those benefits. Option 3 effectively disables NOR searches.

We have adopted a combination of Options 2 and 3: removing { } results from the search results of all complement modes while limiting NOR results to the first 30 results. This is a *happy medium* solution and a small price to pay. Actually, resetting complement mode results in this way simplifies their interpretation.
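One way to read this combined policy is sketched below. It is an interpretation, not the production logic: the sketch assumes results arrive as a ranked list of document IDs together with the number of query terms each document matched, that NOR results (which consist only of { } documents) are capped at 30, and that every other complement mode simply drops documents matching none of the terms. The names `NOR_LIMIT` and `filter_complement` are hypothetical.

```python
NOR_LIMIT = 30  # cap stated in the text for NOR results


def filter_complement(mode: str, results: list[int],
                      hit_counts: dict[int, int]) -> list[int]:
    """Apply the anti-dump policy to ranked complement-mode results.

    results    -- ranked document IDs returned by the complement mode
    hit_counts -- document ID -> number of query terms it contains
    """
    if mode == "NOR":
        # NOR results are all { } documents, so cap them instead of removing.
        return results[:NOR_LIMIT]
    # Other complement modes: drop documents matching none of the terms.
    return [doc_id for doc_id in results if hit_counts.get(doc_id, 0) > 0]


nor_results = list(range(100))
print(len(filter_complement("NOR", nor_results, {})))            # 30
print(filter_complement("XNOR", [1, 2, 3], {1: 0, 2: 2, 3: 0}))  # [2]
```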

### Final Remarks

As search vendors frequently have contradictory naming conventions (1 - 5), some of the conventions given in this article might differ from others found on the Web. At the time of writing, some of the complement modes herein described are not yet widely implemented by commercial search engines.

### References

1. Kostoff, R. N., Rigsby, J. T., and Barth, R. B. (2006). Adjacency and Proximity Searching in the Science Citation Index and Google.
2. University of Leeds, UK. The University Library (2013). Advanced Medline.
3. Alliant Library. (Accessed on 9-19-2013). Proximity Operators.
4. Wikipedia. (Accessed on 9-18-2013). Proximity Search.
5. Exalead. (Accessed on 9-18-2013). Web Search Syntax.