XML & Feed URLs Extractor
- Other URL mining tools:
- RAR Parser
- URL Cleaner
- URL Query Parser
- XML and Feed Flattener
- This tool was originally designed to extract URLs from Web feeds. However, as of 06-23-2018, the tool also extracts URLs from XML files so we changed its name to indicate this. The content of this page is still valid.
- This change broadens what the tool can do. For instance, you can now extract URLs from files such as sitemaps.xml.
- To use it, submit the full URL of the file or feed you are interested in, including its http(s) scheme.
- Because this tool parses documents in the XML, RSS, Atom, and RDF formats, it is suitable for extracting URLs from files other than Web feeds. Candidate files may include site maps, inventory files, and similar files, as long as they have the .xml, .rss, .atom, or .rdf extension.
- A web feed is a data format used for providing users with frequently updated content. Blog feeds aimed at online communities impart a social component to the technology. In general, web syndication technology is a form of social communication (Wikipedia, 2017a; 2017b) that is rich in URLs waiting to be extracted and mined.
- This tool is derived from a previous one: the XML & Web Feed Flattener.
- Once a document with the above extensions is flattened, the tool extracts all of its URLs, effectively working as a URL extractor.
- As of 05-20-2018, the tool converts URLs to links, deduplicates URLs generated by hash (#) characters (e.g., due to comments), and lets users exclude those pointing to images.
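The flatten, extract, deduplicate, and filter steps described above can be sketched in Python. This is a rough illustration only, not the tool's actual implementation: the function name `extract_urls`, the URL regular expression, the list of image extensions, and the choice to deduplicate by dropping everything after the hash (#) are all assumptions made for the example.

```python
import re
import xml.etree.ElementTree as ET
from urllib.parse import urldefrag, urlparse

# Assumed pattern: anything starting with http(s):// up to whitespace, quotes, or angle brackets.
URL_RE = re.compile(r"https?://[^\s<>\"']+")
# Assumed list of image extensions; the real tool may use a different one.
IMAGE_EXTS = (".png", ".jpg", ".jpeg", ".gif", ".svg", ".webp", ".ico")


def extract_urls(xml_text, exclude_images=False):
    """Return unique URLs found in element text and attribute values, in document order."""
    root = ET.fromstring(xml_text)
    seen = set()
    urls = []
    for elem in root.iter():
        # "Flatten" the tree: inspect every text node and attribute value.
        chunks = [elem.text or "", elem.tail or ""] + list(elem.attrib.values())
        for chunk in chunks:
            for match in URL_RE.findall(chunk):
                # Drop the "#..." fragment so variants of one page collapse into one URL.
                url, _fragment = urldefrag(match)
                if exclude_images and urlparse(url).path.lower().endswith(IMAGE_EXTS):
                    continue
                if url not in seen:
                    seen.add(url)
                    urls.append(url)
    return urls


# Made-up sample feed (example.com is a placeholder domain).
SAMPLE = """<rss version="2.0"><channel>
<link>https://example.com/</link>
<item><link>https://example.com/post</link>
<comments>https://example.com/post#comments</comments>
<enclosure url="https://example.com/cover.PNG" type="image/png"/>
</item></channel></rss>"""

print(extract_urls(SAMPLE, exclude_images=True))
```

In the sample, the `#comments` link collapses into the post URL it points to, and the enclosure is skipped because its path ends in an image extension.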
- Anyone who needs to extract URLs from feeds.
- Extract URLs from the site maps of Google at
- https://www.google.com/sitemap.xml (returns 21 records, all .xml site maps).
- http://www.gstatic.com/s2/sitemaps/profiles-sitemap.xml (warning: returns 50,000 records, all .gz compressed files).
- Extract feed URLs from a news source like Google, Bing, MIT News, and similar sources.
- Interacting with third-party tools: Queryfeed searches social networks, such as Twitter, Google Plus, Instagram, and others, returning query results as RSS feeds. For instance, the following RSS URLs are obtained by querying [donald trump] or [trump] in Queryfeed:
- Twitter search field: https://queryfeed.net/twitter?q=donald+trump&title-type=user-name-both&geocode=
- Instagram search field: https://queryfeed.net/instagram?q=trump
- Google Plus search field: https://queryfeed.net/plus?q=donald+trump
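The Queryfeed URLs listed above follow a simple pattern (base address, network name, URL-encoded query string), so they can be assembled programmatically before being submitted to the extractor. The sketch below only rebuilds those strings with Python's standard library. The helper name `queryfeed_url` is hypothetical, the parameter names are copied from the example URLs rather than from any Queryfeed documentation, and nothing here contacts Queryfeed or checks that the endpoints respond.

```python
from urllib.parse import urlencode

# Base address taken from the example URLs above.
QUERYFEED_BASE = "https://queryfeed.net"


def queryfeed_url(network, params):
    """Build a Queryfeed-style RSS URL for one network path, e.g. "twitter"."""
    # urlencode turns spaces into "+" and keeps empty values as "name=".
    return f"{QUERYFEED_BASE}/{network}?{urlencode(params)}"


twitter = queryfeed_url(
    "twitter",
    {"q": "donald trump", "title-type": "user-name-both", "geocode": ""},
)
instagram = queryfeed_url("instagram", {"q": "trump"})

print(twitter)
print(instagram)
```

Each resulting string can then be pasted into the tool as the feed URL, including its https scheme.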
- Wikipedia (2017a). Web Feeds.
- Wikipedia (2017b). Web Syndication.
Feedback
Contact us for any suggestion or question regarding this tool.