I am trying to scrape some data from a podcast feed (such as the "Title" or the link to the mp3 file).
Intuitively, this should be very easy because the structure of podcast XML files is very simple.
So I used the "Load Page" action with the URL http://radiofrance-podcast.net/podcast/rss_13305.xml
I expected to see this structure in the tree view:
- channel
  + title
  + link
  + ...
- item
  + title
  + link
  + enclosure
  + guid
With that structure, getting the mp3 file would be very easy: just identify the tag with the path .*.item.guid
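For comparison, outside RoboMaker the same lookup is short when the feed is handled as plain XML. The following is a minimal Python sketch using the standard library's `xml.etree.ElementTree` on a small, made-up RSS sample (the titles and URLs are placeholders, not the real contents of the Radio France feed). The query `.//item` plays roughly the role of the `.*.item` part of the RoboMaker path; it is not a RoboMaker feature.

```python
import xml.etree.ElementTree as ET

# Hypothetical RSS sample with the channel/item structure described above.
# Titles and URLs are placeholders, not real feed data.
SAMPLE_RSS = """<rss version="2.0">
  <channel>
    <title>Example podcast</title>
    <link>http://example.com/</link>
    <item>
      <title>Episode 1</title>
      <link>http://example.com/ep1</link>
      <enclosure url="http://example.com/ep1.mp3" type="audio/mpeg" length="0"/>
      <guid>http://example.com/ep1.mp3</guid>
    </item>
    <item>
      <title>Episode 2</title>
      <link>http://example.com/ep2</link>
      <enclosure url="http://example.com/ep2.mp3" type="audio/mpeg" length="0"/>
      <guid>http://example.com/ep2.mp3</guid>
    </item>
  </channel>
</rss>"""


def extract_episodes(rss_text):
    """Return (title, mp3_url) pairs for every <item> in an RSS document."""
    root = ET.fromstring(rss_text)
    episodes = []
    # ".//item" matches <item> elements at any depth below the root.
    for item in root.findall(".//item"):
        title = item.findtext("title")
        # Prefer the enclosure URL; fall back to <guid> if there is no enclosure.
        enclosure = item.find("enclosure")
        url = enclosure.get("url") if enclosure is not None else item.findtext("guid")
        episodes.append((title, url))
    return episodes


if __name__ == "__main__":
    for title, url in extract_episodes(SAMPLE_RSS):
        print(title, url)
```

The channel-level `<title>` is ignored because only `<item>` elements are visited.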
However, I was very surprised to see that RoboMaker adds a comment to the XML code:
Kapow RoboSuite: This is an HTML representation of an XML document
Looking at the tree view, the structure is much more complicated: each tag is surrounded by span tags such as <span class="start-tag">.
If I use the tag path that RoboMaker generates automatically, it is not robust: if the structure of the podcast changes slightly, it can no longer find the right tag. Making it more robust takes a lot of effort, mainly because this is an HTML representation of the XML.
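To illustrate why a name-based path tends to survive small changes better than one tied to the exact nesting, here is a hypothetical Python sketch (again plain `xml.etree.ElementTree`, not RoboMaker). Two invented feeds are queried; the second adds a made-up `<section>` wrapper to simulate a slight structural change.

```python
import xml.etree.ElementTree as ET

# Two hypothetical feeds: the second wraps <item> in an invented <section>
# element, simulating a slight change in the podcast's structure.
FEED_V1 = "<rss><channel><item><guid>http://example.com/a.mp3</guid></item></channel></rss>"
FEED_V2 = ("<rss><channel><section><item><guid>http://example.com/a.mp3</guid>"
           "</item></section></channel></rss>")


def guids(xml_text, path):
    """Return the text of every element matching an ElementTree path."""
    root = ET.fromstring(xml_text)
    return [el.text for el in root.findall(path)]


if __name__ == "__main__":
    for feed in (FEED_V1, FEED_V2):
        # Name-based query: matches <item> at any depth.
        print("robust:", guids(feed, ".//item/guid"))
        # Position-based query: assumes <item> is a direct child of <channel>.
        print("brittle:", guids(feed, "./channel/item/guid"))
```

On the second feed the position-based query returns an empty list, while the name-based one still finds the guid.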
So what am I doing wrong? Is there a way to load the page in its native XML format, something like a "Load XML" action instead of "Load Page"?