XML & Web Feed Flattener
- This tool was originally designed to flatten the tree structure of a Web feed. However, as of 06-23-2018, it also flattens the tree structure of XML files, so we renamed it accordingly. The rest of this page remains valid.
- This change broadens the tool's scope. For instance, you can now flatten the tree structure of sitemap files such as sitemap.xml and similar files.
- To use it, submit the full URL of the file or feed you are interested in, including its http(s) scheme.
- Because this tool parses documents in the XML, RSS, Atom, and RDF formats, it can also flatten document trees from files other than Web feeds. Candidates include sitemaps, inventory files, and similar documents, provided they have the .xml, .rss, .atom, or .rdf extension.
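The submission rules above (a full URL with an http or https scheme, and one of the listed extensions for non-feed files) can be expressed as a quick pre-check. This is a minimal sketch of that rule as we read it, not the tool's actual validation; the helper name `is_candidate_url` and the `ACCEPTED_EXTENSIONS` tuple are hypothetical.

```python
from urllib.parse import urlparse

# Hypothetical list mirroring the extensions named in the text.
ACCEPTED_EXTENSIONS = (".xml", ".rss", ".atom", ".rdf")


def is_candidate_url(url):
    """Return True if the URL has an http(s) scheme, a host, and an accepted extension.

    Hypothetical helper: the extension is checked on the path only, so a query
    string such as '?page=2' does not affect the result.
    """
    parts = urlparse(url)
    if parts.scheme not in ("http", "https") or not parts.netloc:
        return False
    return parts.path.lower().endswith(ACCEPTED_EXTENSIONS)


if __name__ == "__main__":
    for url in ("https://www.google.com/sitemap.xml", "www.google.com/sitemap.xml"):
        print(url, is_candidate_url(url))
```

A URL typed without its scheme, such as `www.google.com/sitemap.xml`, is rejected, which is why the full URL is requested.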
- A web feed is a data format used to provide users with frequently updated content. Blog feeds aimed at online communities impart a social component to the technology; in general, web syndication is a form of social communication (Wikipedia, 2017a; 2017b).
- One way of mining web feeds is to adopt the following two-step strategy:
- Step 1: Convert a web feed into a multidimensional associative array, A.
- Step 2: Flatten A into a one-dimensional array, B.
- The rest is a matter of reading and manipulating the key-value pairs of A and B and using that information for other text mining purposes, such as document validation or the design of a feed parser capable of discriminating between formats (XML, Atom, RSS, and RDF).
- This tool does precisely that. It works fairly well with local and remote feeds, but it may fail to convert a feed if access to it is blocked or if the feed is not a valid XML document. The tool has been tested with several blog feeds and with news sites that offer web feeds, such as the news service at MIT (MIT News, 2017). We have also found other uses for the tool; our XML & Feed URLs Extractor is one example.
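The two-step strategy described above can be sketched in Python with the standard `xml.etree.ElementTree` module. This is an illustration under our own assumptions, not the tool's implementation: the function names (`to_array`, `flatten`, `flatten_xml`, `flatten_url`), the `@name` and `#text` key conventions, the stripping of namespace prefixes, and the slash-separated keys of B are all hypothetical choices. Step 1 builds the nested associative array A as dicts and lists; Step 2 walks A and records every leaf under its full path.

```python
import urllib.request
import xml.etree.ElementTree as ET


def local_name(tag):
    """Drop a '{namespace}' prefix: '{http://www.w3.org/2005/Atom}feed' -> 'feed'."""
    return tag.rsplit("}", 1)[-1]


def to_array(elem):
    """Step 1: convert an element into nested dicts/lists (the array A).

    Leaf elements without attributes become their stripped text. Attributes go
    under '@name' keys, text next to child elements under '#text', and repeated
    sibling tags (e.g. several <item>) are collected into a list.
    """
    children = list(elem)
    text = (elem.text or "").strip()
    if not children and not elem.attrib:
        return text
    node = {}
    for name, value in elem.attrib.items():
        node["@" + local_name(name)] = value
    for child in children:
        key = local_name(child.tag)
        value = to_array(child)
        if key not in node:
            node[key] = value                 # first occurrence: plain value
        elif isinstance(node[key], list):
            node[key].append(value)           # third and later occurrences
        else:
            node[key] = [node[key], value]    # second occurrence: start a list
    if text:
        node["#text"] = text
    return node


def flatten(value, prefix="", sep="/"):
    """Step 2: flatten nested dicts/lists into a one-dimensional dict (the array B)."""
    if isinstance(value, dict):
        items = value.items()
    elif isinstance(value, list):
        items = enumerate(value)              # list positions become 0, 1, 2, ...
    else:
        return {prefix: value}                # leaf: record it under its full path
    flat = {}
    for key, child in items:
        path = f"{prefix}{sep}{key}" if prefix else str(key)
        flat.update(flatten(child, path, sep))
    return flat


def flatten_xml(root):
    """Run both steps on a parsed root element; the root name is the first key of A."""
    nested = {local_name(root.tag): to_array(root)}
    return nested, flatten(nested)


def flatten_url(url, timeout=30):
    """Fetch a remote document and flatten it.

    Raises an error if access is blocked or the document is not valid XML.
    """
    with urllib.request.urlopen(url, timeout=timeout) as response:
        root = ET.parse(response).getroot()
    return flatten_xml(root)


if __name__ == "__main__":
    sample = """<rss version="2.0"><channel><title>News</title>
<item><title>A</title></item><item><title>B</title></item></channel></rss>"""
    A, B = flatten_xml(ET.fromstring(sample))
    for key, value in B.items():
        print(key, "=", value)
    # rss/@version = 2.0
    # rss/channel/title = News
    # rss/channel/item/0/title = A
    # rss/channel/item/1/title = B
```

One design consequence worth noting: a single `<item>` or `<entry>` stays a plain dict while two or more become a list, so the same feed can yield `item/title` or `item/0/title` depending on how many items it carries. Code that reads B should allow for both shapes.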
- Anyone who needs to mine web feeds and their tree structures.
- Flatten the site maps of Google at
- https://www.google.com/sitemap.xml (returns 21 records).
- http://www.gstatic.com/s2/sitemaps/profiles-sitemap.xml (warning: returns 100,000 records).
- Flatten a web feed obtained from a news source like MIT News (MIT, 2017).
- Flatten several web feeds in the Atom, RSS, and RDF formats and compare the results.
- Using the output from the previous exercise, propose object notation forks that allow you to discriminate between these feed formats. Then rewrite these forks using array notation.
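As a possible starting point for the last exercise, here is a Python sketch of such a fork working on a flattened array B. It assumes B's keys are slash-separated paths whose first segment is the root element's name with any namespace prefix removed (for example `rss/channel/title`); the function names `root_names` and `feed_format` are hypothetical. The fork relies on the root elements of each format: RSS 2.0 uses `<rss>`, Atom uses `<feed>` in the `http://www.w3.org/2005/Atom` namespace, and RSS 1.0 uses `<rdf:RDF>`.

```python
def root_names(flat, sep="/"):
    """Collect the first path segment of every flattened key (normally just the root)."""
    return {key.split(sep, 1)[0] for key in flat}


def feed_format(flat, sep="/"):
    """Guess the document format from the root element name (hypothetical helper).

    Assumes keys such as 'rss/channel/title', with namespace prefixes stripped.
    """
    roots = root_names(flat, sep)
    if "rss" in roots:    # RSS 2.0: <rss><channel>...</channel></rss>
        return "RSS"
    if "feed" in roots:   # Atom: <feed xmlns="http://www.w3.org/2005/Atom">
        return "Atom"
    if "RDF" in roots:    # RSS 1.0: <rdf:RDF> wrapping <channel> and <item>
        return "RDF"
    return "XML"          # anything else, e.g. a sitemap's <urlset>


if __name__ == "__main__":
    print(feed_format({"rss/@version": "2.0", "rss/channel/title": "News"}))
    print(feed_format({"urlset/url/0/loc": "https://www.google.com/"}))
```

A fork can also test for a specific key rather than the root, for instance `"rss/@version" in B`, which is closer to the array notation the exercise asks for.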
Contact us for any suggestion or question regarding this tool.