
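    Fisherman's Wharf (漁人碼頭)

    As heaven moves with vigor, the noble person strives unceasingly to improve; as the earth is receptive, the noble person carries all things with great virtue.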

    While using Java's HttpURLConnection to download web pages, I found that requests to Google's sites were rejected by Google.

        try
        {
            url = new URL(urlStr);
            httpConn = (HttpURLConnection) url.openConnection();
            HttpURLConnection.setFollowRedirects(true);

            // logger.info(httpConn.getResponseMessage());
            in = httpConn.getInputStream();
            out = new FileOutputStream(new File(outPath));

            // copy the response to the file one byte at a time
            chByte = in.read();
            while (chByte != -1)
            {
                out.write(chByte);
                chByte = in.read();
            }
        }
        catch (MalformedURLException e)
        {
            e.printStackTrace();
        }
        catch (IOException e)
        {
            e.printStackTrace();
        }



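    After some research and digging through references, I found that the code above was missing some necessary request information. Setting more detailed properties on the connection fixes it: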

            httpConn.setRequestMethod("GET");
            httpConn.setRequestProperty("User-Agent","Mozilla/4.0 (compatible; MSIE 6.0; Windows 2000)");
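
    Without an explicit User-Agent, HttpURLConnection identifies itself with a default `Java/<version>` string, which is presumably what the server was rejecting. The header can be read back from the connection before any network traffic happens. A minimal sketch, assuming a hypothetical helper class `UserAgentCheck` and the placeholder URL `http://example.com/` (neither is from the original post):

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;

public class UserAgentCheck {
    // Same User-Agent string as in the post.
    static final String IE6_UA = "Mozilla/4.0 (compatible; MSIE 6.0; Windows 2000)";

    // Opens (but does not connect) an HTTP connection with a browser-like User-Agent.
    static HttpURLConnection open(String urlStr) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(urlStr).openConnection();
        conn.setRequestMethod("GET");
        conn.setRequestProperty("User-Agent", IE6_UA);
        return conn;
    }

    public static void main(String[] args) throws IOException {
        // Placeholder URL; openConnection() performs no network I/O by itself.
        HttpURLConnection conn = open("http://example.com/");
        // Prints the header value that will be sent once the connection is used.
        System.out.println(conn.getRequestProperty("User-Agent"));
    }
}
```

    Request properties have to be set before the connection is actually used (for example before `getInputStream()`); setting them afterwards throws an IllegalStateException.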

    The complete code is as follows:

    import java.io.File;
    import java.io.FileOutputStream;
    import java.io.IOException;
    import java.io.InputStream;
    import java.net.HttpURLConnection;
    import java.net.MalformedURLException;
    import java.net.URL;

    public static void DownLoadPages(String urlStr, String outPath)
    {
        int chByte = 0;
        URL url = null;
        HttpURLConnection httpConn = null;
        InputStream in = null;
        FileOutputStream out = null;

        try
        {
            url = new URL(urlStr);
            httpConn = (HttpURLConnection) url.openConnection();
            HttpURLConnection.setFollowRedirects(true);
            // identify as a browser; without this Google rejects the request
            httpConn.setRequestMethod("GET");
            httpConn.setRequestProperty("User-Agent","Mozilla/4.0 (compatible; MSIE 6.0; Windows 2000)");

            // logger.info(httpConn.getResponseMessage());
            in = httpConn.getInputStream();
            out = new FileOutputStream(new File(outPath));

            // copy the response to the output file one byte at a time
            chByte = in.read();
            while (chByte != -1)
            {
                out.write(chByte);
                chByte = in.read();
            }
        }
        catch (MalformedURLException e)
        {
            e.printStackTrace();
        }
        catch (IOException e)
        {
            e.printStackTrace();
        }
        finally
        {
            // any of these may still be null if an earlier step failed
            try
            {
                if (out != null) out.close();
                if (in != null) in.close();
            }
            catch (IOException ex)
            {
                ex.printStackTrace();
            }
            if (httpConn != null) httpConn.disconnect();
        }
    }
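
    The loop above reads and writes a single byte per call. A common variation copies in chunks and lets try-with-resources close the streams. The sketch below is not the author's code: the class name `StreamCopy`, the 8 KB buffer size, and the in-memory streams standing in for the HTTP response and the output file are all assumptions for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

public class StreamCopy {
    // Copies everything from in to out in 8 KB chunks and returns the byte count.
    static long copy(InputStream in, OutputStream out) throws IOException {
        byte[] buffer = new byte[8192];
        long total = 0;
        int n;
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n);
            total += n;
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        // In-memory stand-in for httpConn.getInputStream().
        byte[] page = "<html>hello</html>".getBytes("UTF-8");
        // try-with-resources closes both streams even if copy() throws.
        try (InputStream in = new ByteArrayInputStream(page);
             ByteArrayOutputStream out = new ByteArrayOutputStream()) {
            long copied = copy(in, out);
            System.out.println(copied + " bytes copied");
        }
    }
}
```

    In DownLoadPages the same `copy` call could take `httpConn.getInputStream()` and the `FileOutputStream` directly.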

    There is also a second way to access Google's sites: use HttpClient, a tool from Apache, to mimic a browser.

        Document document = null;
        HttpClient httpClient = new HttpClient();

        GetMethod getMethod = new GetMethod(url);
        getMethod.setFollowRedirects(true);
        int statusCode = httpClient.executeMethod(getMethod);

        if (statusCode == HttpStatus.SC_OK)
        {
            InputStream in = getMethod.getResponseBodyAsStream();
            InputSource is = new InputSource(in);

            DOMParser domParser = new DOMParser();   // NekoHTML: converts the fetched page into a DOM
            domParser.parse(is);
            document = domParser.getDocument();

            System.out.println(getMethod.getURI());
        }
        return document;

    I recommend the first approach: HttpURLConnection is more lightweight, and it is also faster than the second approach with HttpClient.


    Comments

    # re: A solution for simulating IE from Java to access web sites

    2006-12-11 16:08 by Fisher
    Reposting some code that uses HttpURLConnection to simulate an IE form login to a web site:


    On simulating an IE form login to a web site from Java

    HttpURLConnection urlConn=(HttpURLConnection)(new URL(url).openConnection());
    urlConn.addRequestProperty("Cookie",cookie);
    urlConn.setRequestMethod("POST");
    urlConn.setRequestProperty("User-Agent","Mozilla/4.0 (compatible; MSIE 6.0; Windows 2000)");
    urlConn.setInstanceFollowRedirects(true); // setFollowRedirects is static; this is the per-connection setter
    urlConn.setDoOutput(true); // we need to write data to the server
    urlConn.setDoInput(true); // and read the response
    urlConn.setUseCaches(false); // always get the latest data from the server
    urlConn.setAllowUserInteraction(false);
    urlConn.setRequestProperty("Content-Type","application/x-www-form-urlencoded");
    urlConn.setRequestProperty("Content-Language","en-US" );
    // data is the URL-encoded form body; length() equals the byte count only for ASCII data
    urlConn.setRequestProperty("Content-Length", ""+data.length());

    DataOutputStream outStream = new DataOutputStream(urlConn.getOutputStream());
    outStream.writeBytes(data);
    outStream.flush();
    outStream.close();

    // keep the session cookie for later requests, then read the response as gb2312 text
    cookie=urlConn.getHeaderField("Set-Cookie");
    BufferedReader br=new BufferedReader(new InputStreamReader(urlConn.getInputStream(),"gb2312"));
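
    The snippet above never shows how `data` is built. For an `application/x-www-form-urlencoded` body it would be `key=value` pairs joined with `&`, each part URL-encoded. A minimal sketch, assuming a hypothetical helper class `FormBody` and made-up field names and values (the real names depend on the login form being imitated):

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

public class FormBody {
    // Builds an application/x-www-form-urlencoded body: key=value pairs joined by '&'.
    static String encode(Map<String, String> fields) throws UnsupportedEncodingException {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : fields.entrySet()) {
            if (sb.length() > 0) sb.append('&');
            sb.append(URLEncoder.encode(e.getKey(), "UTF-8"))
              .append('=')
              .append(URLEncoder.encode(e.getValue(), "UTF-8"));
        }
        return sb.toString();
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        // Hypothetical field names and values, for illustration only.
        Map<String, String> fields = new LinkedHashMap<String, String>();
        fields.put("username", "fisher");
        fields.put("password", "p@ss word");
        String data = encode(fields);
        System.out.println(data);
        // Content-Length should count bytes, not characters.
        System.out.println(data.getBytes("UTF-8").length);
    }
}
```

    Because the encoded result contains only ASCII characters, `data.length()` and the byte count agree, which is why the Content-Length line in the snippet works for encoded input.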


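    # re: A solution for simulating IE from Java to access web sites

    2007-04-09 17:03 by dongle
    Great post, it solved a big problem for me.

    # Does this really solve the problem?

    2007-05-31 22:12 by Rachel
    I wrote a program that extracts web page content: it batch-requests the detail pages under this site and scrapes them (http://cn.made-in-china.com).

    It ran fine during testing.
    After a few dozen requests, every response came back empty.
    After that it stopped working altogether.
    My code for accessing the URL is almost identical to yours.
    This is what I get back: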


    <p>Due to network security, your access to Made-in-China.com
    has been temporarily denied.</p>
    <p>In order to provide you with safe and stable web services,
    we have to prevent abuse of Made-in-China.com by implementing
    additional security measures. We hope you understand
    and cooperate with us.</p>

    # re: A solution for simulating IE from Java to access web sites

    2007-05-31 22:35 by Rachel
    In the end I unplugged the router and plugged it back in, and parsing works normally again :-)
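
    The "temporarily denied" page appearing only after a few dozen requests might point to rate limiting on the server side, though nothing in this thread confirms that. If it is the cause, spacing requests out is one thing to try. A minimal sketch, assuming a hypothetical `RequestThrottle` class and an arbitrary 2-second minimum interval (neither comes from the post or from the site's actual policy):

```java
public class RequestThrottle {
    private final long minIntervalMillis;
    private long lastRequestAt = -1; // -1 means no request has been sent yet

    public RequestThrottle(long minIntervalMillis) {
        this.minIntervalMillis = minIntervalMillis;
    }

    // How long (ms) a request at time `now` still has to wait; 0 if it may go immediately.
    public long waitTimeMillis(long now) {
        if (lastRequestAt < 0) return 0;
        long elapsed = now - lastRequestAt;
        return Math.max(0, minIntervalMillis - elapsed);
    }

    // Records that a request was sent at time `now`.
    public void markSent(long now) {
        lastRequestAt = now;
    }

    // Blocks until the minimum interval has passed, then records the send time.
    public void acquire() throws InterruptedException {
        long wait = waitTimeMillis(System.currentTimeMillis());
        if (wait > 0) Thread.sleep(wait);
        markSent(System.currentTimeMillis());
    }

    public static void main(String[] args) throws InterruptedException {
        RequestThrottle throttle = new RequestThrottle(2000); // assumed interval
        for (int i = 0; i < 3; i++) {
            throttle.acquire();
            // A real crawler would call DownLoadPages(...) here.
            System.out.println("request " + i + " at " + System.currentTimeMillis());
        }
    }
}
```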

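    # re: A solution for simulating IE from Java to access web sites

    2007-07-04 16:51 by smalltiger
    Thank you so much for this article! It was a huge help. I'd like to be friends; if that's OK, please add me on QQ: 109030035 or MSN: 109030035@qq.com

    # re: A solution for simulating IE from Java to access web sites

    2008-02-25 10:00 by Fisher
    I haven't done any Java for a long time, and I didn't expect so many people to read my post, haha.
    I'm glad it helped the poster above.

    Recently I noticed there are open-source web crawler components; they should work better than my approach.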

    # re: A solution for simulating IE from Java to access web sites

    2008-02-26 22:51 by qyxxpd.com
    @Rachel
    I wrote a program that extracts web page content: it batch-requests the detail pages under this site and scrapes them (http://cn.made-in-china.com

    Actually, you can just call .MainWebFetcher.DownLoadPages("http://cn.made-in-china.com/", "C://tmp//test.txt");

    Adding a trailing / after http://cn.made-in-china.com is all it takes.

    # re: A solution for simulating IE from Java to access web sites

    2008-05-08 09:43 by abyer
    How do you authenticate with a username and password using this?

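    # re: A solution for simulating IE from Java to access web sites

    2011-03-09 09:27 by whs
    Is this really simulating IE? You're simulating Firefox, aren't you? Even the title is wrong.

    # re: A solution for simulating IE from Java to access web sites

    2011-08-26 11:57 by noname
    Fool, don't assume it's Firefox just because you see Mozilla/4.0. Learn the basics properly before embarrassing yourself in public. @whs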