Hpricot is an HTML parser for Ruby: a fantastic library that is easy to install and easy to use.
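To install it:
sudo gem install hpricot
open-uri does not need a separate install. It is part of Ruby's standard library and lets you read a URL as a network stream, much like opening a local file.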
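A minimal sketch of what open-uri provides, assuming Ruby 2.5 or later (where `URI.open` exists; since Ruby 3.0 plain `open` no longer fetches URLs). The helper name `fetch_page` is hypothetical. Because `URI.open` falls back to ordinary file opening for strings without a URL scheme, the same helper can also read a saved local copy of a page, which is handy for testing a scraper offline.

```ruby
require 'open-uri'

# Hypothetical helper: read a whole page (URL or local file path) into a String.
# URI.open returns an IO-like stream; the block form closes it automatically.
def fetch_page(location)
  URI.open(location) { |io| io.read }
end
```

For example, `fetch_page("http://example.com/")` returns the page's HTML as a String.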
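Here is a simple web-scraping script. It fetches the results of a group of students from the Anna University website, one exam number at a time, appending the raw result tables to result.html and a plain-text version to result.txt.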
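```ruby
require 'hpricot'
require 'open-uri'

url = "..."  # result-page URL prefix (not shown in the original post)
exam_no = "52108621001".."52108621039"  # range of exam numbers

exam_no.each do |each_number|
  doc = Hpricot(URI.open(url + each_number))  # Ruby 2.5+; older Rubies: open(...)
  data = doc.search('table')
  File.open("result.html", "a") { |f| f.puts(data) }  # append the raw tables
  x = data.inner_html
  a = x.gsub(/<\/?[^>]*>/, "")  # strip the HTML tags
  b = a.gsub(/^[ \t]*\n/, "")   # assumed cleanup step: drop blank lines
  puts b + "\n" + "======================="
  File.open("result.txt", "a") { |f| f.puts(b + "\n\n" + "=================") }
end
```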
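The tag-stripping step can be pulled out into a plain function and checked without touching the network. This is a sketch: the name `clean_result` is hypothetical, and the blank-line removal is an assumed cleanup, since the original post does not show how `b` was derived from `a`. It also shows how the exam-number range expands: Ruby iterates a String range with `String#succ` semantics, so `"52108621001".."52108621039"` yields 39 numbers.

```ruby
# Hypothetical helper: turn the inner HTML of the result tables into plain text.
def clean_result(html)
  text = html.gsub(/<\/?[^>]*>/, "")  # drop every opening and closing tag
  text.gsub(/^[ \t]*\n/, "")          # remove blank or whitespace-only lines
end

# A String range of equal-length digit strings enumerates them in numeric order.
exam_numbers = ("52108621001".."52108621039").to_a
```

Inside the loop, `b = clean_result(x)` would replace the two `gsub` lines.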