The UTF-8 Encoding Problem with HttpClient POST
Apache HttpClient ( http://jakarta.apache.org/commons/httpclient/ ) is a pure-Java client-side programming toolkit for the HTTP protocol, and its support for HTTP is quite comprehensive. For more details, see the article "HttpClient入門" (Getting Started with HttpClient) on the IBM website ( http://www-128.ibm.com/developerworks/cn/opensource/os-httpclient/ ).
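Problem Analysis
In practice, however, I found that when HttpClient is called in the most basic way, POST parameters are not sent as UTF-8. I looked through a number of articles online without getting anywhere, so I read some of the commons-httpclient-3.0.1 source code. The first thing I found was the generateRequestEntity() method in PostMethod: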
    /**
     * Generates a request entity from the post parameters, if present.  Calls
     * {@link EntityEnclosingMethod#generateRequestBody()} if parameters have not been set.
     *
     * @since 3.0
     */
    protected RequestEntity generateRequestEntity() {
        if (!this.params.isEmpty()) {
            // Use a ByteArrayRequestEntity instead of a StringRequestEntity.
            // This is to avoid potential encoding issues.  Form url encoded strings
            // are ASCII by definition but the content type may not be.  Treating the content
            // as bytes allows us to keep the current charset without worrying about how
            // this charset will effect the encoding of the form url encoded string.
            String content = EncodingUtil.formUrlEncode(getParameters(), getRequestCharSet());
            ByteArrayRequestEntity entity = new ByteArrayRequestEntity(
                EncodingUtil.getAsciiBytes(content),
                FORM_URL_ENCODED_CONTENT_TYPE
            );
            return entity;
        } else {
            return super.generateRequestEntity();
        }
    }
So the HTTP request parameters added through NameValuePair are all eventually converted into a RequestEntity and submitted to the HTTP server. Next, in EntityEnclosingMethod, the parent class of PostMethod, I found the following code:
    /**
     * Returns the request's charset.  The charset is parsed from the request entity's
     * content type, unless the content type header has been set manually.
     *
     * @see RequestEntity#getContentType()
     *
     * @since 3.0
     */
    public String getRequestCharSet() {
        if (getRequestHeader("Content-Type") == null) {
            // check the content type from request entity
            // We can't call getRequestEntity() since it will probably call
            // this method.
            if (this.requestEntity != null) {
                return getContentCharSet(
                    new Header("Content-Type", requestEntity.getContentType()));
            } else {
                return super.getRequestCharSet();
            }
        } else {
            return super.getRequestCharSet();
        }
    }
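Solution
The two code fragments above show how HttpClient derives the request encoding (charset) from "Content-Type", and how that encoding is then applied when the submitted content is encoded. Given this mechanism, all we need to do is override getRequestCharSet() so that it returns the name of the encoding (charset) we want. That resolves the garbled-text problem when submitting POST requests in UTF-8 or any other non-default encoding.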
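The effect of the charset on the encoded form body can be reproduced with the JDK alone. The sketch below uses java.net.URLEncoder as a stand-in for EncodingUtil.formUrlEncode (an assumption: the two are not guaranteed to behave identically in every detail), and it assumes the non-overridden request charset is ISO-8859-1, which is HttpClient 3.x's documented default as far as I know. The class name FormEncodingDemo and the helper encode() are made up for illustration.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical demo class; not part of HttpClient.
class FormEncodingDemo {

    // Converts the value to bytes in the given charset, then percent-escapes it.
    static String encode(String value, String charset) throws UnsupportedEncodingException {
        return URLEncoder.encode(value, charset);
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        // The two characters of the test value "中文", written as Unicode
        // escapes so the result does not depend on the source file encoding.
        String text = "\u4e2d\u6587";

        // UTF-8 can represent both characters (three bytes each).
        System.out.println(encode(text, "UTF-8"));      // %E4%B8%AD%E6%96%87

        // ISO-8859-1 (assumed default) cannot represent them, so each
        // character is replaced by '?' before escaping and the text is lost.
        System.out.println(encode(text, "ISO-8859-1")); // %3F%3F
    }
}
```

With the wrong charset the damage happens before the request leaves the client, so no server-side setting can recover the original text. That is why the fix has to change what getRequestCharSet() returns.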
Testing
First, deploy a page named test.jsp under Tomcat's ROOT web app to serve as the test page. The main code fragment is as follows:
    <%@ page contentType="text/html;charset=UTF-8"%>
    <%@ page session="false" %>
    <%
    request.setCharacterEncoding("UTF-8");
    String val = request.getParameter("TEXT");
    System.out.println(">>>> The result is " + val);
    %>
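The request.setCharacterEncoding("UTF-8") call matters because the server must decode the form body with the same charset the client used to encode it. The following sketch illustrates this with java.net.URLDecoder standing in for the container's parameter parsing (an assumption: Tomcat's actual parsing code is different). The class name FormDecodingDemo and the helper decode() are hypothetical.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

// Hypothetical demo class; not part of Tomcat or HttpClient.
class FormDecodingDemo {

    // Un-escapes the percent-encoded bytes and interprets them in the given charset.
    static String decode(String encoded, String charset) throws UnsupportedEncodingException {
        return URLDecoder.decode(encoded, charset);
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        // The value "中文" as it travels on the wire when the client encodes it as UTF-8.
        String wire = "%E4%B8%AD%E6%96%87";

        // Decoding with the matching charset restores the two original characters.
        String good = decode(wire, "UTF-8");
        System.out.println(good.equals("\u4e2d\u6587")); // true

        // Decoding with ISO-8859-1 turns each of the six bytes into its own
        // character, which is the typical garbled output.
        String bad = decode(wire, "ISO-8859-1");
        System.out.println(bad.length()); // 6
    }
}
```

Both sides have to agree: the client fix only helps if the page also decodes the body as UTF-8.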
Next, write a test class. The main code is as follows:
    public static void main(String[] args) throws Exception {
        String url = "http://localhost:8080/test.jsp";
        PostMethod postMethod = new UTF8PostMethod(url);
        // Fill in the value of each form field
        NameValuePair[] data = {
                new NameValuePair("TEXT", "中文"),
        };
        // Put the form values into postMethod
        postMethod.setRequestBody(data);
        // Execute postMethod
        HttpClient httpClient = new HttpClient();
        httpClient.executeMethod(postMethod);
    }

    // Inner class for UTF-8 support
    public static class UTF8PostMethod extends PostMethod {
        public UTF8PostMethod(String url) {
            super(url);
        }

        @Override
        public String getRequestCharSet() {
            // return super.getRequestCharSet();
            return "UTF-8";
        }
    }
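Run this test program, and Tomcat's console output correctly prints ">>>> The result is 中文".
Code Download
All of the code mentioned in this article, together with the test program (which can be imported directly into Eclipse), is available as a packaged download: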
httpClientUTF8.tar.bz2