
Scrape Email from a Page - Best Practices

  •  03-20-2007, 3:02 PM


    Hi All,

    I'm sure a lot of people out there would be interested in a Robot that scrapes email addresses from a page and returns them (zero to many) as an RSS or CSV string over REST.

    With that in mind, I'd like to use this thread to develop such a Robot in the open, for public use.
    I would fire it while crawling certain sites and maybe save the output to my DB.

    Anyway, my take on it would be to set it up as follows:


    Input Value (baseurl) -> LoadPage(baseurl) -> SetGlobalVar("") -> For Each URL -> Test URL (email regex) -> If Match, Extract(RESTOutput) -> Return Output
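
    For anyone who wants to try the logic outside of a Robot first, here is a rough Python sketch of the same flow. It is only an illustration under a few assumptions of mine: "For Each URL" is read as "each link on the loaded page", the regex is a deliberately simple one that will miss some valid addresses, duplicates are dropped, and the function names (scrape_emails, to_csv, scrape_url) are made up for this example, not anything from Kapow.

```python
import csv
import io
import re
from html.parser import HTMLParser
from urllib.request import urlopen

# Assumption: a deliberately simple pattern, not a full address validator.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag on the page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def scrape_emails(html):
    """For each link, test it against the email regex and keep the matches."""
    parser = LinkCollector()
    parser.feed(html)
    found = []  # stands in for the global variable in the flow above
    for href in parser.hrefs:
        match = EMAIL_RE.search(href)
        # Assumption: duplicates are dropped; the original flow does not say.
        if match and match.group(0) not in found:
            found.append(match.group(0))
    return found


def to_csv(emails):
    """Return the addresses as one CSV line (empty string for zero matches)."""
    buf = io.StringIO()
    csv.writer(buf).writerow(emails)
    return buf.getvalue().strip()


def scrape_url(baseurl):
    """Load the page at baseurl and return its addresses as a CSV string."""
    with urlopen(baseurl) as response:
        html = response.read().decode("utf-8", errors="replace")
    return to_csv(scrape_emails(html))
```

    Calling scrape_url(baseurl) with a page address returns the CSV string. Addresses that appear only in the page text, and not in a link, would be missed by this version; running the regex over the whole page body instead would catch those.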

    Any better ideas, or pointers to existing email-scraping Robots, would be appreciated.

    (I'll post a link to the Robot here when it's finished.)

    Thanks & Kapow!,

    CM

     
