
Web scraping in Ruby



Hpricot is an HTML parser for Ruby: a fantastic library that is easy to install and easy to use.
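To install it:
sudo gem install hpricot
open-uri does not need to be installed as a gem: it is part of Ruby's standard library. It lets you open http and https URLs and read them as network streams, just as you would read a local file.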
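To make the open-uri step concrete, here is a minimal sketch of reading a page into a string, assuming Ruby 2.5 or later (where URI.open exists). fetch_page is a hypothetical helper name, not part of open-uri. Note that since Ruby 3.0, plain Kernel#open no longer accepts URLs, so URI.open has to be called explicitly.

```ruby
require 'open-uri'

# Hypothetical helper: read a whole page into a String.
# URI.open opens http/https/ftp URLs; anything that does not look like
# a URL (e.g. a local file path) is handed on to Kernel#open.
# The block form closes the stream after reading and returns the block's value.
def fetch_page(url)
  URI.open(url) { |io| io.read }
end

# Example (needs network access; the address is only a placeholder):
# html = fetch_page("https://example.com/")
```

Using the block form means the network stream is closed even when many pages are fetched in a loop.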
Here is a simple web scraping script.
It fetches the exam results for a group of students from the Anna University website.
# FETCH MY CLASS STUDENTS' EXAM RESULTS FROM THE ANNA UNIVERSITY SITE
# PROGRAM NAME: scraping_exam_results.rb
# AUTHOR : RAJKUMAR.S
# VERSION : 0.01
# LICENSE: GNU GPL 3

require 'rubygems'
require 'open-uri'
require 'hpricot'

# BASE URL OF THE RESULTS PAGE; EACH EXAM NUMBER IS APPENDED TO IT.
# PLACEHOLDER: THE REAL ADDRESS WAS NOT PRESERVED IN THIS LISTING.
url = "..."

# EXAM_NO IS A RANGE OF REGISTER NUMBERS
exam_no = "52108621001".."52108621039"

exam_no.each do |each_number|
  # URI.OPEN (RUBY 2.5+) FETCHES THE PAGE; PLAIN open() NO LONGER READS URLS ON RUBY 3
  doc = Hpricot(URI.open(url + each_number))
  data = doc.search('table')
  # APPEND THE RAW TABLES TO AN HTML FILE TO VIEW ALL RESULTS IN ONE PAGE
  File.open("result.html", "a") { |f| f.puts(data) }
  # GET THE INNER CONTENT OF THE TABLE TAGS
  x = data.inner_html
  # REMOVE THE HTML TAGS (A SPACE KEEPS ADJACENT CELLS FROM MERGING)
  a = x.gsub(/<\/?[^>]*>/, " ")
  # SPLIT ON WHITESPACE AND PUT EACH WORD ON ITS OWN LINE
  b = a.split.join("\n")
  puts b + "\n" + "======================="

  File.open("result.txt", "a") { |f| f.puts(b + "\n\n" + "=================") }
end
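The two text-handling steps in the loop (walking a range of register numbers and turning table HTML into one token per line) can be tried without Hpricot or a network connection. This is a sketch only: table_text is a hypothetical helper, and the sample HTML is made up for illustration. Tags are replaced with a space so that cells written back to back, such as <td>A</td><td>B</td>, do not merge into a single token.

```ruby
# Hypothetical helper: plain text of a results table, one token per line.
def table_text(html)
  # Replace every opening or closing tag (e.g. <td>, </tr>) with a space.
  text = html.gsub(/<\/?[^>]*>/, " ")
  # split with no argument splits on runs of whitespace and drops
  # leading/trailing blanks; join("\n") puts each token on its own line.
  text.split.join("\n")
end

# A range of digit strings of equal length yields every number in between,
# keeping the same width, so no register number is skipped.
exam_numbers = ("52108621001".."52108621039").to_a

sample = "<table><tr><td>52108621001</td><td>PASS</td></tr></table>"
puts table_text(sample)   # prints 52108621001 and PASS on separate lines
puts exam_numbers.length  # prints 39
```

With a replacement of "" instead of " ", the sample above would collapse to the single token 52108621001PASS.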
