NR-NNR

nr, xnr

NR and NNR Searches

Results from a NEAR (NR) search are a subset of AND results.

{AND} ⊇ {NR} (1)

A subtractive complement relative to AND is also defined.

{NNR} = {AND} - {NR} (2)

This is depicted in the next figure.

Figure 1. NR results as a subset of AND results.

The blue region represents the number of NR matches and the pink one the number of NR nonmatches or NNR. The sum of both regions represents the total number of AND results. So NNR is the subtractive complement of NR, relative to the AND set.

In other words, the blue region corresponds to the number of AND results that match an NR search whereas the pink region corresponds to the number of AND results that do not match said search.
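Equations (1) and (2) can be illustrated with ordinary set operations. The following is a minimal sketch, not the platform's implementation: the function name nnr_results and the document identifiers are hypothetical, and each result set is assumed to be a set of document IDs.

```python
def nnr_results(and_results: set[str], nr_results: set[str]) -> set[str]:
    """Return the AND results that do not match the NR search, as in Eq. (2)."""
    # Eq. (1): every NR result must also be an AND result.
    if not nr_results <= and_results:
        raise ValueError("NR results must be a subset of AND results")
    return and_results - nr_results


# Hypothetical document IDs, for illustration only.
and_docs = {"d1", "d2", "d3", "d4", "d5"}
nr_docs = {"d2", "d5"}
nnr_docs = nnr_results(and_docs, nr_docs)

# The two regions of Figure 1 add up to the AND total.
total_matches = len(nr_docs) + len(nnr_docs)
```

Here nnr_docs holds the three documents that satisfy the AND search but not the NR search, and total_matches equals the size of the AND set.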

Applications to IR

In our implementation, a near search matches occurrences of the first two search terms, in any order, that lie no more than NR terms from one another or from another occurrence of themselves.

When declared in a query, the 'NR' abbreviation is interpreted both as a placeholder for a numeric value and as a distance operator. Thus, the query

NR:w1 w2 w3 ... (3)

with NR = 10 is written as

10:w1 w2 w3 ... (4)

and instructs the platform to find documents where

  1. w1 is separated by no more than 10 words from w2
  2. w2 is separated by no more than 10 words from w1
  3. w1 is separated by no more than 10 words from another occurrence of w1
  4. w2 is separated by no more than 10 words from another occurrence of w2

As a result, a near search can find documents with passages (portions of text) that start with w1 or w2 and end with w1 or w2.
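The query form in (4) and the matching conditions listed above can be sketched as follows. This is an illustration under stated assumptions, not the platform's actual code: the function names are hypothetical, text is tokenized by whitespace, distance is measured as the difference between token positions, and a document is taken to match when it contains every query term and at least one qualifying passage.

```python
from itertools import combinations


def parse_near_query(query: str) -> tuple[int, list[str]]:
    """Split a query such as '10:w1 w2 w3' into (NR, terms)."""
    prefix, _, rest = query.partition(":")
    return int(prefix), rest.lower().split()


def near_passages(tokens: list[str], w1: str, w2: str, nr: int) -> list[tuple[int, int]]:
    """Return (start, end) token positions of passages that start and end
    with w1 or w2 and whose endpoints are at most nr positions apart."""
    hits = [i for i, tok in enumerate(tokens) if tok in (w1, w2)]
    # combinations() yields pairs with i < j, so j - i is the distance.
    return [(i, j) for i, j in combinations(hits, 2) if j - i <= nr]


def matches_near(text: str, query: str) -> bool:
    """Assumed rule: all terms present (AND) plus at least one near passage."""
    nr, terms = parse_near_query(query)
    tokens = text.lower().split()
    if not all(term in tokens for term in terms):
        return False  # not even an AND result
    # Only the first two terms take part in the distance test.
    return bool(near_passages(tokens, terms[0], terms[1], nr))


doc = "cats chase mice while dogs watch cats"
passages = near_passages(doc.split(), "cats", "dogs", 3)
```

In this example the only qualifying passage for NR = 3 is "dogs watch cats" (positions 4 to 6), so matches_near(doc, "3:cats dogs") is true, while the same query with NR = 1 finds no passage and the document falls into the NNR set.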

Unlike the window passages described in the IR literature (1 - 4), which are defined by fragmenting documents, these passages are defined by the query and can be overlapping or nonoverlapping. We define an overlapping passage as one that starts or ends within other passages of similar word lengths. This is illustrated in Figure 2.

Figure 2. Representation of overlapping and nonoverlapping near passages starting with w1 or w2 and ending with w1 or w2.
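The distinction between overlapping and nonoverlapping passages can be sketched in a few lines. The sketch below rests on assumptions: passages are given as hypothetical (start, end) token positions, endpoints are treated as inclusive, and the "similar word lengths" qualifier is ignored for simplicity.

```python
Passage = tuple[int, int]


def split_by_overlap(passages: list[Passage]) -> tuple[list[Passage], list[Passage]]:
    """Separate passages that start or end within another passage
    from those that do not."""

    def inside(pos: int, other: Passage) -> bool:
        # Inclusive bounds: sharing an endpoint counts as overlapping.
        return other[0] <= pos <= other[1]

    overlapping, nonoverlapping = [], []
    for p in passages:
        touches_other = any(
            q != p and (inside(p[0], q) or inside(p[1], q)) for q in passages
        )
        (overlapping if touches_other else nonoverlapping).append(p)
    return overlapping, nonoverlapping


# Hypothetical passage positions, for illustration only.
overlapping, nonoverlapping = split_by_overlap([(0, 3), (2, 5), (8, 10)])
```

Here (0, 3) ends inside (2, 5) and (2, 5) starts inside (0, 3), so both are overlapping, while (8, 10) is nonoverlapping. Note that, following the definition literally, a passage that fully encloses another one without starting or ending inside it is not itself flagged.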

Final Remarks

Minerazzi supports the NR search mode and its subtractive complement, NNR, relative to the AND mode. These modes help users discriminate among AND results based on a word distance criterion that is defined at query time.

References

  1. Kaszkiel, M. and Zobel, J. (2001). Effective ranking with arbitrary passages. Journal of the American Society For Information Science and Technology, 52(4):344-364.
  2. Callan, J.P. (1994). Passage-level evidence in document retrieval. In B.W. Croft & C.J. van Rijsbergen (Eds.), Proceedings of the 17th annual international ACM-SIGIR conference on research and developments in information retrieval, Dublin, Ireland, July (pp. 302-310), New York: ACM.
  3. Kaszkiel, M. and Zobel, J. (1997). Passage retrieval revisited. In N. J. Belkin, D. Narasimhalu, & P. Willett (Eds.), Proceedings of the 20th annual international ACM-SIGIR conference on research and development in information retrieval, Philadelphia, PA (pp. 178-185).
  4. Liu, X. and Croft, B. (2002). Language models for information retrieval: Passage retrieval based on language models. In Proceedings of the eleventh international conference on Information and knowledge management, November (pp. 375-382).