using System.IO;
using System.Text;
using System.Text.RegularExpressions;
using System.Net;
1. First, fetch the page's HTML source:
            Uri url = new Uri("http://www.tkk7.com/wujun");
            HttpWebRequest request = (HttpWebRequest)WebRequest.Create(url);
            string str;
            // using blocks close the response, stream and reader even if reading throws
            using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
            using (Stream stream = response.GetResponseStream())
            using (StreamReader sr = new StreamReader(stream))
            {
                str = sr.ReadToEnd();
            }
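The reader above decodes the response with StreamReader's default encoding (UTF-8, unless the stream starts with a byte-order mark), so a page served in another charset such as GB2312 comes out garbled. A minimal sketch, assuming the server names the charset in its Content-Type header; the class and method names (CharsetHelper, EncodingFromContentType) are hypothetical:

```csharp
using System;
using System.Text;

static class CharsetHelper
{
    // Hypothetical helper: picks a decoder from a Content-Type header value such as
    // "text/html; charset=gb2312", falling back to UTF-8 when no usable charset is named.
    public static Encoding EncodingFromContentType(string contentType)
    {
        if (!string.IsNullOrEmpty(contentType))
        {
            foreach (string part in contentType.Split(';'))
            {
                string p = part.Trim();
                if (p.StartsWith("charset=", StringComparison.OrdinalIgnoreCase))
                {
                    // Strip surrounding whitespace and optional quotes: charset="utf-8"
                    string name = p.Substring("charset=".Length).Trim().Trim('"');
                    try
                    {
                        return Encoding.GetEncoding(name);
                    }
                    catch (ArgumentException)
                    {
                        // Unknown or unsupported charset name: keep looking, then fall back
                    }
                }
            }
        }
        return Encoding.UTF8;
    }
}
```

The result could be passed straight to the reader, e.g. `new StreamReader(stream, CharsetHelper.EncodingFromContentType(response.ContentType))`. On .NET Core and .NET 5+, legacy code pages such as gb2312 are only available after `Encoding.RegisterProvider(CodePagesEncodingProvider.Instance)`; without that registration this helper falls back to UTF-8.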
Once we have the page's HTML source, we scan it for every <a href="url"> tag and extract the URL that follows href.
The regular expression:
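    // Group 1 captures the href value (double-quoted, single-quoted or unquoted); group 2 captures the link text.
    // The pattern must stay on one line: a line break inside a verbatim string becomes part of the regex.
    // Note: in .NET, \a is the bell character (U+0007), so [/\a-z0-9_] is a range from U+0007 to 'z'
    // and accepts most ASCII characters as the first character of the URL.
    Regex RegExFindHref = new Regex(
        @"<a\s+([^>]*\s*)?href\s*=\s*(?:""(?<1>[/\a-z0-9_][^""]*)""|'(?<1>[/\a-z0-9_][^']*)'|(?<1>[/\a-z0-9_]\S*))(\s[^>]*)?>(?<2>.*?)</a>",
        RegexOptions.Singleline | RegexOptions.IgnoreCase | RegexOptions.Compiled);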
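To check the pattern without a network call, here is a self-contained sketch that runs the same expression over an HTML string and returns both groups. The LinkExtractor class, the Extract method and the sample markup in the test are illustrative, not part of the original code:

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

static class LinkExtractor
{
    // Same pattern as above: Groups[1] ends up holding the href value, Groups[2] the link text
    static readonly Regex RegExFindHref = new Regex(
        @"<a\s+([^>]*\s*)?href\s*=\s*(?:""(?<1>[/\a-z0-9_][^""]*)""|'(?<1>[/\a-z0-9_][^']*)'|(?<1>[/\a-z0-9_]\S*))(\s[^>]*)?>(?<2>.*?)</a>",
        RegexOptions.Singleline | RegexOptions.IgnoreCase | RegexOptions.Compiled);

    // Returns (href, link text) for every <a ...>...</a> element found in the HTML
    public static List<(string Href, string Text)> Extract(string html)
    {
        var links = new List<(string Href, string Text)>();
        foreach (Match m in RegExFindHref.Matches(html))
        {
            links.Add((m.Groups[1].Value, m.Groups[2].Value));
        }
        return links;
    }
}
```

Iterating `Matches` with `foreach` visits the same matches as the `Match`/`NextMatch` loop used in the original.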
Loop through the matches to read out each link address:
     for (Match m = RegExFindHref.Match(str); m.Success; m = m.NextMatch())
            {
               // Groups[1] holds the href value of each <a> tag
               TextBox1.Text += m.Groups[1].ToString() + "\n";
            }
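The loop appends every href exactly as written in the HTML: relative values such as /wujun/archive/2006/10/20.html stay relative, and a link that appears twice on the page is listed twice. A minimal post-processing sketch, assuming you want absolute, de-duplicated URLs; LinkNormalizer and Normalize are hypothetical names, and resolution relies on System.Uri's standard relative-reference rules:

```csharp
using System;
using System.Collections.Generic;

static class LinkNormalizer
{
    // Hypothetical helper: resolves raw href values against the page URL and drops duplicates,
    // keeping the order in which links first appear
    public static List<string> Normalize(Uri baseUri, IEnumerable<string> hrefs)
    {
        var seen = new HashSet<string>();
        var result = new List<string>();
        foreach (string href in hrefs)
        {
            // Relative values are resolved against baseUri; absolute ones are kept as-is
            if (!Uri.TryCreate(baseUri, href, out Uri absolute))
                continue; // skip values that cannot form a valid URI

            string s = absolute.ToString();
            if (seen.Add(s)) // HashSet.Add returns false for a URL already seen
                result.Add(s);
        }
        return result;
    }
}
```

In the loop you would collect `m.Groups[1].ToString()` into a list and pass it with `url` as the base. Relative paths without a leading slash resolve against the base URL's directory, so against http://www.tkk7.com/wujun (no trailing slash) the value archive/x.html becomes http://www.tkk7.com/archive/x.html.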
After running it, TextBox1 shows every link parsed from the page:
http://www.dotlucene.net/
http://www.castleproject.org/
http://www.codeplex.com/
http://www.codeproject.com/
http://www.asp.net/
http://www.nhibernate.org/
http://www.tkk7.com/wujun/CommentsRSS.aspx
http://www.tkk7.com/wujun/archive/2006/10/23/47150.html#76745
http://www.tkk7.com/wujun/archive/2006/10/23.html
http://www.tkk7.com/wujun/archive/2006/10/23/76769.html
http://www.tkk7.com/wujun/archive/2006/10/23/76769.html
http://www.tkk7.com/wujun/archive/2006/10/23/76769.html#FeedBack
http://www.tkk7.com/wujun/admin/EditPosts.aspx?postid=76769
http://www.tkk7.com/wujun/AddToFavorite.aspx?id=76769
http://www.tkk7.com/wujun/archive/2006/10/20.html
...and so on.
I just came across a Java version of this at:
http://www.tkk7.com/ekinglong/archive/2006/10/27/77688.html