Turning searchers into data miners

What is Minerazzi?

Minerazzi is a platform for building miners.
  • Mission Statement: To turn searchers into data miners, big data predators, and collection curators.
  • Vision Statement: To develop a new class of productivity search engines.

What is a miner?

A miner is a topic-specific search engine with a novel search paradigm: letting users recrawl and mine search results while they search. The result is a new generation of more efficient and productive search engines that go beyond lists of links and text snippets.

Miners can be built from individual web pages (from the bottom up) or from a pre-existing directory or sitemap (from the top down), turning that directory or sitemap into a data mining machine.

Because each miner is a microindex loaded with data extraction tools, users can deploy these to enhance or build their own curated collections.

What is a microindex?

We define a microindex as a small collection of primary URLs about a given topic or knowledge domain. Recrawling these allows users to discover new, secondary URLs with fresh and somewhat related content.

In many cases, a single primary record lets users discover dozens or hundreds of secondary records with front- and back-end information waiting to be consumed. The total number of primary and secondary URLs defines the reach of a microindex.

Since December 1, 2014, we have been the first search solution that allows users to recrawl their own search results. This is done with several complementary tools:

  • A URL crawler. This tool allows users to qualitatively extract resources from multiple markup fields and elements.
  • A link crawler. This tool allows users to quantitatively extract resources by walking the link structure of sites.
  • Dozens of element-specific crawling tools for discovering front- and back-end information.
  • Dozens of extraction/mining tools and tutorials.
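To make the idea of recrawling concrete, here is a minimal Python sketch of how secondary URLs might be pulled out of a primary page and how the reach of a microindex could be counted. It is an illustration only, not Minerazzi's implementation: the names LinkCollector, secondary_urls, and reach, and the choice of markup elements inspected, are assumptions made for this example.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urldefrag


class LinkCollector(HTMLParser):
    """Collect URLs from several markup elements, not only <a href>."""

    # Hypothetical choice of (tag, attribute) pairs to inspect.
    URL_ATTRS = {("a", "href"), ("link", "href"), ("img", "src"), ("script", "src")}

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.urls = set()

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if value and (tag, name) in self.URL_ATTRS:
                # Resolve relative URLs and drop #fragments.
                absolute, _fragment = urldefrag(urljoin(self.base_url, value))
                if absolute.startswith(("http://", "https://")):
                    self.urls.add(absolute)


def secondary_urls(primary_url, html):
    """Return the distinct URLs found in the HTML of one primary record."""
    collector = LinkCollector(primary_url)
    collector.feed(html)
    collector.urls.discard(primary_url)
    return collector.urls


def reach(pages):
    """pages maps primary URL -> HTML; reach counts primary plus secondary URLs."""
    found = set(pages)
    for url, html in pages.items():
        found |= secondary_urls(url, html)
    return len(found)
```

The sketch works on HTML that has already been fetched, so it can be tried without network access. A real crawler would also need to fetch pages, respect robots.txt, and limit crawl depth.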

To do the above, users only need to access one or a few links, either from a microindex or while recrawling. To illustrate, submit a query and then run the tool located within a search result. Try any of the following queries and miners:

Queries, grouped by miner:

  • [ stanford ], [ mit ], [ cornell ], etc.
  • [ wikileaks ], [ indymedia ], [ investigative reporting ], etc.
  • [ most wanted ], [ background checks ], [ fbi ], etc.

Why is recrawling so important?

Recrawling lets users walk the link graph of a site and discover hidden or fresh goldmines that search engines might not have discovered. In addition, recrawling exposes users to new content and involves them in learning through discovery.

By recrawling search results, users can build curated collections, self-guide investigative work, or gather link intelligence from sites, directories, blogs, forums, or social networks.

At this time, Minerazzi recrawls files with the most common formats (.php, .asp, .aspx, .html, .htm, .js, etc.). For recrawling to be useful, however, the content of a file should not be obfuscated or blocked.


Why microindexing?

Don't underestimate or misread the term micro in microindexing. Microindexing is a curation strategy that simplifies the building of large third-party collections. If you already have a curated collection, building a miner out of it allows you to mine your carefully collected resources. This can be done without investing in expensive web scraping services or human resources.

When querying a microindex, users can find relevant results with fewer keywords. For instance, if a microindex is about a particular viral disease, searching for [ vaccine ], [ treatment ], or [ remedies ] should return relevant results with less typing. If you are a researcher or librarian, you will love microindexing.

In general, Minerazzi turns searching into a data mining activity. This makes more sense than limiting the search experience to browsing through zillions of cached records or staring at a list of links. The problem with the latter is that those records are frequently outdated or irrelevant, not to mention that users essentially become passive spectators.

What can you do with Minerazzi?


Search

Build a search engine about a given topic like news, music, health, legal, human resources...

Index

Index hard-to-find documents. Help others to find what they want. Be a leader instead of a follower.

Mine

Extract contact information and Web Intelligence from search result pages. Be a data miner.

Who can use it and how?

Teachers: Build a search engine about disciplines, journals, lecture materials...

Researchers: Deploy a search engine about company resources, projects, tools...

Business Intelligence: Collect network and user contact information, keywords...

Anyone: Build search engines about popular topics like sports, shopping, social styles, recipes, games...

The Minerazzi Difference


A User-Centered Experience

Minerazzi places users at the center of the action. Instead of reducing their search experience to staring at a list of results, Minerazzi allows them to interact with those results. Users actually become paparazzi of information, chasing down and mining data; hence the name of our platform.

While searching with Minerazzi, users can extract all sorts of contact information (phone numbers, email addresses, ...), query-driven data (keywords, tracking codes, ...), and server configuration records. The data gathered can then be used for any marketing or research purpose.
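As an illustration of this kind of extraction, the following Python sketch pulls email addresses and phone numbers out of page text with regular expressions. It is a simplified example under stated assumptions, not the platform's own tools: the patterns are deliberately loose, the phone pattern only targets North American style numbers, and the function name extract_contacts is hypothetical.

```python
import re

# Loose email pattern; real-world address validation is more involved.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+")

# Assumed format: optional +1 prefix, then 3-3-4 digits with common separators.
PHONE_RE = re.compile(r"(?:\+?1[\s.-]?)?\(?\d{3}\)?[\s.-]?\d{3}[\s.-]?\d{4}")


def extract_contacts(text):
    """Return the distinct email addresses and phone numbers found in text."""
    return {
        "emails": sorted(set(EMAIL_RE.findall(text))),
        "phones": sorted(set(PHONE_RE.findall(text))),
    }


sample = "Write to admin@minerazzi.com or call (555) 123-4567. Again: admin@minerazzi.com."
print(extract_contacts(sample))
```

Duplicates are collapsed with a set, so an address repeated across a page is reported once.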

From Searching to Mining

This approach, wherein users are engaged in learning through discovery, data gathering, and analysis, is a natural evolution of the traditional concept of searching.

Titles, descriptions, emails, phones, query tracks, search signals, co-occurrence, keywords matrix, title tags, description tags, keyword tags, all tags.

Mine it all!

Doing More, Effortlessly

Moreover, with Minerazzi users can:

  • Reformulate queries by clicking on keywords. No need to waste time with keyword brainstorming sessions.
  • Modify search modes by clicking on match counts. No need to memorize or manipulate search commands.
  • Accept URL submissions by regular email, and thus from practically anywhere.
  • Submit queries that require diacritics such as tildes and accents.

As an added feature, Minerazzi allows you to follow query-relevant sites across the top social networks and search engines.


Pinterest, Google Maps, Bing, Flickr, Facebook, Twitter, LinkedIn, Tumblr, Instagram, Google, YouTube, Google+.

Extraction Tools

Because Minerazzi is loaded with about 40 extraction tools, users can extract and mine all sorts of data while searching. A sample of these tools is given below.

  • News - Access news headlines relevant to a miner.
  • Word Statistics - Extract statistics and candidate keywords.
  • Directory Exploits - Find exploits through robots.txt files.
  • Configuration Exploits - Test possible misconfigurations.
  • Email Tool - Extract email addresses.
  • Phone Tool - Extract phone numbers.
  • Geolocation Tool - Extract geolocation data.
  • URL Tool - Extract all kinds of URLs, not just from links.
  • Image Tool - View images from web sites.
  • CSS Tool - Get external and internal .css files.
  • Colors - Extract colors from external and internal .css files.
  • Scripts - Get all JavaScript files from web pages.
  • HTTP Headers - Examine server configuration headers.
  • Mail Exchangers - Spot email remote servers.
  • DNS Checks - Check available DNS services.
  • DNS Records - Determine DNS records.
  • Web Plugins - Identify third-party tracking codes.
  • Source Codes - Read file source codes.
  • Meta Tags - Get OpenGraph, Twitter, DC, traditional tags.
  • and a lot more.
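To give a flavor of what one of these tools involves, here is a minimal Python sketch of a meta tag extractor that groups tags into the four families named in the list above. It is an assumption-laden illustration, not the Meta Tags tool itself: the class and function names are hypothetical, and the grouping relies only on the common prefix conventions (og: for OpenGraph, twitter: for Twitter, DC. or DCTERMS. for Dublin Core).

```python
from html.parser import HTMLParser


class MetaTagCollector(HTMLParser):
    """Group <meta> tags into OpenGraph, Twitter, DC, and traditional families."""

    def __init__(self):
        super().__init__()
        self.tags = {"opengraph": {}, "twitter": {}, "dc": {}, "traditional": {}}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        # OpenGraph uses property="og:..."; most other families use name="...".
        key = attr.get("property") or attr.get("name")
        content = attr.get("content")
        if not key or content is None:
            return  # e.g. <meta charset="utf-8"> carries no name/content pair
        lowered = key.lower()
        if lowered.startswith("og:"):
            family = "opengraph"
        elif lowered.startswith("twitter:"):
            family = "twitter"
        elif lowered.startswith(("dc.", "dcterms.")):
            family = "dc"
        else:
            family = "traditional"
        self.tags[family][key] = content


def extract_meta_tags(html):
    """Return a dict of tag families, each mapping tag name to content."""
    collector = MetaTagCollector()
    collector.feed(html)
    return collector.tags
```

Like the other sketches, this one operates on HTML that has already been retrieved, which keeps fetching and parsing concerns separate.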

Without a doubt, our platform encourages users to spend more time researching and mining instead of merely searching.

New Search Paradigms


Match Previews

Minerazzi is the first search engine that allows users to preview the total number of matches and nonmatches from all available search modes.

This is achieved through our unique Match Previews interface. The interface makes multimodal searches possible, reduces query costs, and helps users adopt a search strategy based on self-generated feedback.


X Searches

Our Match Previews interface also provides full native support for X Searches. These are searches based on the XOR and XNOR search modes.

When combined with other IR algorithms and techniques (LSI, LDA, semantics, ...), these search modes provide new information retrieval paradigms. If you are into research and data mining, you will love X Searches.
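The idea of previewing matches and nonmatches per search mode can be sketched in a few lines of Python. This is only an illustration under assumptions: it handles two-term queries, uses the standard Boolean definitions (XOR matches documents containing exactly one of the terms, XNOR matches documents containing both or neither), and the function names are hypothetical. It does not claim to reproduce how Match Previews is implemented.

```python
def tokenize(text):
    """Lowercase, whitespace-split bag of distinct terms (a deliberate simplification)."""
    return set(text.lower().split())


def match_preview(documents, term_a, term_b):
    """Count matches and nonmatches for each search mode over a small collection.

    documents maps a document id to its text.
    """
    all_ids = set(documents)
    has_a = {doc_id for doc_id, text in documents.items() if term_a in tokenize(text)}
    has_b = {doc_id for doc_id, text in documents.items() if term_b in tokenize(text)}
    modes = {
        "AND": has_a & has_b,               # both terms
        "OR": has_a | has_b,                # at least one term
        "XOR": has_a ^ has_b,               # exactly one term
        "XNOR": all_ids - (has_a ^ has_b),  # both terms or neither
    }
    return {
        mode: {"matches": len(ids), "nonmatches": len(all_ids) - len(ids)}
        for mode, ids in modes.items()
    }


documents = {
    1: "vaccine treatment",
    2: "vaccine trial",
    3: "treatment options",
    4: "home remedies",
}
for mode, counts in match_preview(documents, "vaccine", "treatment").items():
    print(mode, counts)
```

Seeing all counts side by side is what lets a user pick a mode before running the query, for example choosing XOR when AND returns too few results and OR returns too many.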

A Call to Action

Goodbye One-Way Searching

Did you realize that the days of one-way, machine-centered searching and staring at lists of search results ended with the last century? Two-way, user-centered searches are here to stay.

Why keep sleeping with the past or teaching outdated textbook stuff? Be part of new information retrieval paradigms.

You are at the right place, at the right time. As we make the platform widely available, there will surely be bumps along the way and things to improve.

Let's grow and learn together.


Contact us

You may contact us with any business inquiries or general questions at admin@minerazzi.com.