

Is "Web Scraping" robust toward site changes?

The #1 question people ask when they first hear about using "Web Scraping" technology to turn web sites into APIs is robustness: will the robots break when a site changes? In reality it is almost a non-issue.

Let me explain why. The technology on openkapow is the same technology that Kapow Technologies has sold to more than 200 customers worldwide, who together must have deployed at least one hundred thousand robots. Although this is the first question every customer asks, none of them has raised it again once they started using the product, because in practice it is a non-issue.

 There are several reasons for this:

  • Web sites change layout very rarely in practice; on average I would say once per year.
  • The fully customizable "tag finder" technology used in openkapow, which locates objects in the DOM tree using smart relative pattern matching, only breaks when a web site undergoes a major change of layout and functionality, which is very rare.
  • In the rare event that a change causes a robot to break, the "debug path" in the error report from running the robot leads directly to the broken step in the robot, so locating and fixing the bug is most often a 2-minute job.
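
To make the second point concrete, here is a minimal sketch of why a relative pattern survives layout changes that break a fixed path. It is not the openkapow tag finder; it only illustrates the general idea using Python's standard library. The two page snippets, the "Price" label, and both function names are made up for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical page before a redesign.
OLD_LAYOUT = """
<html><body>
  <div id="header">Shop</div>
  <div id="content">
    <table>
      <tr><td>Name</td><td>Widget</td></tr>
      <tr><td>Price</td><td>9.99</td></tr>
    </table>
  </div>
</body></html>
"""

# Hypothetical page after a redesign: a banner was added, the content
# div was renamed, and an extra table row was inserted.
NEW_LAYOUT = """
<html><body>
  <div id="banner">Sale!</div>
  <div id="header">Shop</div>
  <div id="main">
    <table>
      <tr><td>Name</td><td>Widget</td></tr>
      <tr><td>Stock</td><td>12</td></tr>
      <tr><td>Price</td><td>9.99</td></tr>
    </table>
  </div>
</body></html>
"""


def price_by_absolute_path(page):
    """Locate the price by its fixed position in the tree (brittle)."""
    root = ET.fromstring(page.strip())
    cell = root.find("./body/div[2]/table/tr[2]/td[2]")
    return cell.text if cell is not None else None


def price_by_relative_pattern(page):
    """Locate the price relative to its label: the cell right after 'Price'."""
    root = ET.fromstring(page.strip())
    for row in root.iter("tr"):
        cells = row.findall("td")
        for label, value in zip(cells, cells[1:]):
            if (label.text or "").strip() == "Price":
                return value.text
    return None


for name, page in (("old", OLD_LAYOUT), ("new", NEW_LAYOUT)):
    print(name, price_by_absolute_path(page), price_by_relative_pattern(page))
```

In this sketch the fixed path finds the price only on the old layout, while the label-relative lookup finds it on both. It would break only if the "Price" label itself were renamed or the value moved away from its label, which is the kind of major change the bullet above describes as rare.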

openkapow web scraping works through the visual interface of a website, which is often a lot easier to access and understand than even a real API. After using openkapow for a while, users realize that quickly creating a perfect-fit API with openkapow is most often easier even for websites that publish real APIs of their own, since those APIs are unlikely to fit the desired API signature and result set perfectly.

It takes a few hours to learn how to use RoboMaker, and running through the tutorials is a must, but it's worth it.

Try it out!!!

Published Saturday, February 24, 2007 10:24 AM by stefan

Comments

 

redsoda said:

hi guys - how funny you went all web 2.0.. nice to see openkapow.com - but what's the point of the sign in process to do comments? is it some kind of ROII or whatever?

March 8, 2007 12:39 PM
 

stefus said:

Sorry, I think it is a "feature" of the communityserver.org software that openkapow is built on. We will look into it.

March 23, 2007 4:33 PM
 

Erik Weegenaar1 said:

I work with Kapow Robosuite 6.1 very often. The sites that I scrape change very often. Most changes are minor (such as a change in a tag), but others are major (like changing the language used to name a tag into another language, or changing URLs within the site (to redirect to non-Flash pages, for instance)).

It costs me time (between minutes and hours) every week!

April 11, 2007 3:13 AM
 

stefan said:

Of course you cannot completely overcome this issue, but we have tried our best to make the support burden minimal. When we originally ran the kapow.net marketplace, we maintained more than 4000 robots with less than one person, and that was back when we had version 4.11.

It all boils down to what problem you need to solve and whether "web scraping" is the right solution. If self-service, needing to do it now, not controlling the data, etc., are essential, then you can use it with great success.

August 7, 2007 2:31 PM
Copyright 2006, 2007 KapowTech.com All Rights Reserved Company | Contact | Terms | Privacy