AD-NAD


AD and NAD Searches

Results from an adjacency (AD) search are a subset of the AND, NR, and XNR results.

{AND} ⊇ {NR} ⊇ {XNR} ⊇ {AD} (1)

A subtractive complement, relative to AND, is also defined.

{NAD} = {AND} - {AD} (2)

This is depicted in Figure 1. There is a clear distinction between AND, near (NR), proximity (XNR), and adjacency (AD) searches (1).


Figure 1. AD results as a subset of AND results.

The blue region represents the number of AD matches and the pink region the number of AD nonmatches, or NAD results. The sum of both regions is the total number of AND results. So NAD is the subtractive complement of AD, relative to the AND set.

In other words, the blue region corresponds to the number of AND results that match an AD search whereas the pink region corresponds to the number of AND results that do not match said search. We could have computed this complement relative to XNR, but we elected not to do that.
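As a minimal sketch of equation (2), the complement can be computed with ordinary set subtraction. The document IDs and the function name subtractive_complement are hypothetical and only illustrate the relationship between the sets; they are not taken from our implementation.

```python
def subtractive_complement(and_results: set, ad_results: set) -> set:
    """Return {NAD} = {AND} - {AD}, per equation (2)."""
    # Equation (1) requires every AD result to also be an AND result.
    if not ad_results <= and_results:
        raise ValueError("AD results must be a subset of AND results")
    return and_results - ad_results


# Hypothetical document IDs returned by each search mode.
and_results = {1, 2, 3, 4, 5, 6}
ad_results = {2, 5}

nad_results = subtractive_complement(and_results, ad_results)

# The blue (AD) and pink (NAD) regions together make up the AND set.
print(len(ad_results) + len(nad_results) == len(and_results))
```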

Applications to IR

In our implementation, an AD search matches the first two search terms, in strict order, when they are no more than AD terms apart.

When declared in a query, the 'AD' abbreviation is interpreted both as a placeholder for a value and as a distance operator. Thus, the query

_AD:w1 w2 w3 ... (3)

with AD = 10, i.e.,

_10:w1 w2 w3 ... (4)

instructs the platform to find documents where w1 is separated by no more than 10 words from w2.

The leading underscore is required so that, when the query is parsed, it is recognized as an AD search and not as an NR or XNR search.
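The query syntax in (4) could be recognized along the following lines. This is a sketch under stated assumptions, not the platform's actual parser: the names ADQuery and parse_ad_query are hypothetical, and we assume the distance is a nonnegative integer placed between the underscore and the colon.

```python
import re
from typing import NamedTuple, Optional, Tuple


class ADQuery(NamedTuple):
    distance: int            # the AD value, e.g. 10
    terms: Tuple[str, ...]   # w1, w2, w3, ...


# Assumed syntax: underscore, digits, colon, then the search terms.
_AD_PATTERN = re.compile(r"^_(\d+):(.*)$")


def parse_ad_query(query: str) -> Optional[ADQuery]:
    """Return an ADQuery, or None if the query is not an AD search."""
    match = _AD_PATTERN.match(query.strip())
    if match is None:
        return None  # no leading underscore: not an AD search
    terms = tuple(match.group(2).split())
    if len(terms) < 2:
        return None  # an AD search needs at least w1 and w2
    return ADQuery(int(match.group(1)), terms)


print(parse_ad_query("_10:w1 w2 w3"))
```

Only the first two terms of the parsed query take part in the distance test described above.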

As a result, an AD search can find documents with passages (portions of text) starting with w1 and ending with w2.
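To illustrate how such passages might be located, here is a minimal sketch. It assumes whitespace tokenization, case-insensitive matching, and that "separated by no more than AD words" means at most AD intervening words between w1 and w2; the function name find_ad_passages is hypothetical, and these choices are assumptions rather than a description of our implementation.

```python
from typing import List, Tuple


def find_ad_passages(text: str, w1: str, w2: str,
                     max_distance: int) -> List[Tuple[int, int]]:
    """Return (start, end) token positions of passages that start with
    w1 and end with w2, with at most max_distance words in between."""
    tokens = text.lower().split()  # assumed tokenizer: whitespace split
    w1, w2 = w1.lower(), w2.lower()
    passages = []
    for i, token in enumerate(tokens):
        if token != w1:
            continue
        # Strict order: only look for w2 after w1, within the window.
        last = min(len(tokens), i + max_distance + 2)
        for j in range(i + 1, last):
            if tokens[j] == w2:
                passages.append((i, j))
    return passages


text = "a b x a y b"
print(find_ad_passages(text, "a", "b", 1))  # [(0, 1), (3, 5)]
print(find_ad_passages(text, "a", "b", 0))  # [(0, 1)]
```

A document would then count as an AD match when the returned list is not empty.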

Unlike the window passages described in the IR literature (2-5), which are defined by fragmenting documents, these passages are defined by the query and can be overlapping or nonoverlapping. We define an overlapping passage as one that starts or ends within other passages of similar word lengths. This is illustrated in Figure 2.


Figure 2. Representation of overlapping and nonoverlapping adjacency passages starting with w1 and ending with w2.
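The overlap definition above can be sketched as follows. Passages are represented as hypothetical (start, end) token positions, a passage is labeled overlapping when its start or end falls inside another passage, and the "similar word lengths" qualifier is not modeled. The function name classify_passages is an assumption made for illustration.

```python
from typing import Dict, List, Tuple

Passage = Tuple[int, int]  # (start, end) token positions, inclusive


def classify_passages(passages: List[Passage]) -> Dict[Passage, str]:
    """Label each passage as 'overlapping' or 'nonoverlapping'."""
    labels = {}
    for idx, (start, end) in enumerate(passages):
        # Overlapping: starts or ends within some other passage.
        overlaps = any(
            k != idx and (qs <= start <= qe or qs <= end <= qe)
            for k, (qs, qe) in enumerate(passages)
        )
        labels[(start, end)] = "overlapping" if overlaps else "nonoverlapping"
    return labels


# Hypothetical passages: the first two share tokens 3 and 4.
print(classify_passages([(0, 4), (3, 7), (10, 12)]))
```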

Final Remarks

Minerazzi supports the AD search mode and its subtractive complement, relative to the AND mode. This mode helps users discriminate between AND results based on a word distance criterion that is defined at query time.

References

  1. Kostoff, R. N., Rigsby, J. T., and Barth, R. B. (2006). Adjacency and Proximity Searching in the Science Citation Index and Google.
  2. Kaszkiel, M. and Zobel, J. (2001). Effective ranking with arbitrary passages. Journal of the American Society For Information Science and Technology, 52(4):344-364.
  3. Callan, J.P. (1994). Passage-level evidence in document retrieval. In B.W. Croft & C.J. van Rijsbergen (Eds.), Proceedings of the 17th annual international ACM-SIGIR conference on research and developments in information retrieval, Dublin, Ireland, July (pp. 302-310), New York: ACM.
  4. Kaszkiel, M. and Zobel, J. (1997). Passage retrieval revisited. In N. J. Belkin, D. Narasimhalu, & P. Willett (Eds.), Proceedings of the 20th annual international ACM-SIGIR conference on research and development in information retrieval, Philadelphia, PA (pp. 178-185).
  5. Liu, X. and Croft, B. (2002). Language models for information retrieval: Passage retrieval based on language models. In Proceedings of the eleventh international conference on Information and knowledge management, November (pp. 375-382).