November 1st, 2007 by NSS

Uses for the digging scraper

Today I added a little scraper script to NSS2.  It lets you scrape articles from Ezine Articles, images from Google Images, and the embed code for YouTube videos.

By default it pulls in 15 results of each type for a keyword.  You can increase this, but if you overdo it, Ezine Articles will ban you.  I’ve found I can scrape one result set an hour or so and have zero problems.  If I try to do 4 an hour, I usually don’t get much further.  I think they catch you at about 50 pulls from Ezine Articles over a short time period.
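If you drive the scraper from your own script, a small pacing helper keeps you at the one-pull-an-hour rate that has worked for me. This is only a sketch: the function names are made up for illustration, and the one hour gap is my own experience from above, not a limit anyone has published.

```python
import time
from typing import Optional

# Assumption: one result set per hour, the rate that worked in the post.
MIN_SECONDS_BETWEEN_PULLS = 60 * 60


def seconds_to_wait(last_pull: Optional[float], now: float,
                    min_gap: float = MIN_SECONDS_BETWEEN_PULLS) -> float:
    """Return how long to wait before the next pull is allowed."""
    if last_pull is None:
        # Nothing has been pulled yet, so there is no need to wait.
        return 0.0
    return max(0.0, min_gap - (now - last_pull))


def pause_before_pull(last_pull: Optional[float]) -> float:
    """Sleep until the gap has passed, then return the new pull time."""
    time.sleep(seconds_to_wait(last_pull, time.time()))
    return time.time()
```

Call pause_before_pull with the time of your previous pull (or None for the first one) right before each scrape, and keep the value it returns for the next round.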

The structure is thus:

images  (the url the image came from)
images/gfx (The actual thumbnail of the image)

Articles (the article from Ezine Articles, with H2 tags for the title and p tags for the body)

video (the embed code from YouTube)

To add that content to the “foundation” sites in NSS2, you need to make a couple of changes.

1.) Articles need to be reformatted a tad.  The following replacement works for me:

Replace every closing p tag in the articles as follows:

Replace:   </p>

With:        </p><p>&nbsp;</p>

I use a program called Simple Search and Replace to do this.  It’s free and can be found with a quick Google search.
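If you would rather script the replacement than use a separate program, something like the following should do the same job. It is a rough sketch, not part of NSS2: the function names are hypothetical, and it assumes the articles are .html files saved as UTF-8 with lowercase closing p tags.

```python
from pathlib import Path

CLOSING_P = "</p>"
SPACER = "</p><p>&nbsp;</p>"  # closing tag plus an empty spacer paragraph


def add_spacers(html: str) -> str:
    """Put a spacer paragraph after every closing p tag."""
    return html.replace(CLOSING_P, SPACER)


def reformat_folder(folder: str) -> int:
    """Rewrite each .html file in the folder in place; return how many changed."""
    changed = 0
    for path in Path(folder).glob("*.html"):
        original = path.read_text(encoding="utf-8")
        updated = add_spacers(original)
        if updated != original:
            path.write_text(updated, encoding="utf-8")
            changed += 1
    return changed
```

Run it once per batch, for example reformat_folder("Articles"). The replacement text itself contains closing p tags, so a second run over the same files would add the spacers again.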

Then you need to make three folders:

video
articles
rotateimage

The HTML files in the video and Articles folders can just be moved over as is.  But you want to put the files from the images/gfx folder in the rotateimage folder.
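The folder shuffle can also be scripted. The sketch below is just one way to do it: the function name and the two directory arguments are hypothetical, the destination names come from the folder list above (change them if your foundation site expects different ones), and it copies rather than moves so the scraper output stays intact.

```python
import shutil
from pathlib import Path

# (folder in the scraper output, folder in the site).
# Assumption: destination names follow the folder list in the post.
FOLDER_MAP = [
    ("video", "video"),
    ("Articles", "articles"),
    ("images/gfx", "rotateimage"),
]


def copy_scraped_content(scrape_dir: str, site_dir: str) -> dict:
    """Copy scraped files into the site folders; return a count per folder."""
    copied = {}
    for src_name, dest_name in FOLDER_MAP:
        src = Path(scrape_dir) / src_name
        dest = Path(site_dir) / dest_name
        dest.mkdir(parents=True, exist_ok=True)  # creates the three folders
        count = 0
        if src.is_dir():
            for item in src.iterdir():
                if item.is_file():
                    shutil.copy2(item, dest / item.name)
                    count += 1
        copied[dest_name] = count
    return copied
```

Reformat the articles first, then run it against your scraper output and site directories. The returned counts are a quick check that each folder received files.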

Obviously you can use this for things other than the foundation sites.


blackhat White hat


Posted in Dig Scraper
