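Download the one-click installer from the official website and you are set. For installing on Linux, a quick Google search turns up plenty of tutorials. As for an IDE, people online say NetBeans supports Ruby very well, but since I prefer Eclipse I still recommend EasyEclipse for Ruby and Rails. Of course, you can also install just the RoR plugin instead of setting up a brand-new Eclipse.
I used to write crawlers in Java to grab images. Wrapping HttpClient and handling regular expressions was exhausting, and even after I had built the utility classes I would sometimes forget where I had put them. Ruby wraps this kind of work very well: a few dozen lines of code take care of everything.
Methods for fetching a page and downloading a file.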
util.rb:
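require 'net/http'
require 'open-uri'
require 'fileutils'

# Fetch a page and return its body as a string.
def query_url(url)
  Net::HTTP.get(URI.parse(url))
end

# Download url into dir. When filename is nil or empty,
# the last path segment of the URL is used instead.
def save_url(url, dir, filename)
  filename = File.basename(URI.parse(url).path) if filename.nil? || filename.empty?
  has_dir = !dir.nil? && !dir.empty?
  FileUtils.mkdir_p(dir) if has_dir
  path = has_dir ? File.join(dir, filename) : filename
  # URI.open comes from open-uri; a bare open(url) no longer fetches URLs on Ruby 3.0+.
  URI.open(url) do |fin|
    File.open(path, 'wb') do |fout|
      # Copy in 1 KB chunks so large files are never held in memory at once.
      while (buf = fin.read(1024))
        fout.write(buf)
      end
    end
  end
end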
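As a small offline illustration of the file-name step in save_url, the sketch below derives a default name from a URL. The helper name default_filename and the example.com URLs are made up for this example and are not part of util.rb. Parsing the URL first, rather than slicing after the last '/', keeps any query string out of the file name.

```ruby
require 'uri'

# Hypothetical helper (not part of util.rb): take the last path
# segment of a URL as the default file name.
def default_filename(url)
  File.basename(URI.parse(url).path)
end

puts default_filename('http://img01.example.com/pics/shoe.jpg')
puts default_filename('http://img01.example.com/pics/shoe.jpg?size=big')
```

Both calls print shoe.jpg, since the query string is not part of the URL path.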
A concrete application that scrapes images:
require_relative 'util'

start_url = 'http://list.mall.taobao.com/1424/g-d-----40-0--1424.htm'
while start_url && !start_url.empty?
  puts "Downloading #{start_url}"
  content = query_url(start_url)
  # Link to the next results page; there is none on the last page, which ends the loop.
  # The original pattern also matched the link text 下一页 ("next page"); it is left out
  # here so the pattern does not depend on the page's character encoding.
  next_page = content.scan(/ <a href="(.*?)" class="next-page">/)
  next_url = next_page.empty? ? nil : next_page[0][0]
  # Every image whose host starts with "img" followed by a digit.
  imgs = content.scan(%r{<img src="(http://img\d.*?)" />})
  imgs.each do |img|
    save_url(img[0], 'd:\\mall\\', nil)
  end
  start_url = next_url
end
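The script relies on how String#scan behaves with a capture group: it returns an array of one-element arrays, which is why the code reads img[0] and next_page[0][0]. The sketch below shows this on a made-up HTML fragment, so it runs without network access. The helper names image_urls and next_page_url and the example.com URLs are assumptions for illustration. It also resolves the next-page link against the current URL with URI.join, in case the captured href is relative; the script above passes the href to query_url unchanged, which works only when the link is absolute.

```ruby
require 'uri'

# Made-up page fragment, for illustration only.
sample_html = <<~HTML
  <img src="http://img1.example.com/a.jpg" />
  <img src="http://img2.example.com/b.jpg" />
   <a href="page2.htm" class="next-page"><span>next</span></a>
HTML

# With one capture group, scan yields [[url1], [url2], ...];
# map(&:first) flattens that into a plain list of URLs.
def image_urls(html)
  html.scan(%r{<img src="(http://img\d.*?)" />}).map(&:first)
end

# Returns the absolute URL of the next page, or nil when there is none.
def next_page_url(html, base_url)
  href = html[/<a href="(.*?)" class="next-page">/, 1]
  href && URI.join(base_url, href).to_s
end

p image_urls(sample_html)
p next_page_url(sample_html, 'http://www.example.com/list/page1.htm')
```

Returning nil when no link is found gives the same stopping condition as the while loop in the script.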
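After a day of using it, I find Ruby's syntax natural and easy to understand, and it is fairly easy to pick up. The related libraries are also well wrapped. It really is a good fit for playing around with small programs.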

]]>