Original source: The UTF-8 encoding problem with HttpClient POST

Apache HttpClient ( http://jakarta.apache.org/commons/httpclient/ ) is a pure-Java client-side programming toolkit for the HTTP protocol, and its support for HTTP is quite comprehensive. For more details, see the article "Introduction to HttpClient" on the IBM site ( http://www-128.ibm.com/developerworks/cn/opensource/os-httpclient/ ).

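Problem analysis

In practice, however, I found that when HttpClient is called in the most basic way, form parameters submitted by POST are not encoded as UTF-8. I read a few articles online, but none of them got to the point, so I looked through some of the commons-httpclient-3.0.1 source. First, I found the generateRequestEntity() method in PostMethod: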
    /**
     * Generates a request entity from the post parameters, if present.  Calls
     * {@link EntityEnclosingMethod#generateRequestBody()} if parameters have not been set.
     *
     * @since 3.0
     */
    protected RequestEntity generateRequestEntity() {
        if (!this.params.isEmpty()) {
            // Use a ByteArrayRequestEntity instead of a StringRequestEntity.
            // This is to avoid potential encoding issues.  Form url encoded strings
            // are ASCII by definition but the content type may not be.  Treating the content
            // as bytes allows us to keep the current charset without worrying about how
            // this charset will effect the encoding of the form url encoded string.
            String content = EncodingUtil.formUrlEncode(getParameters(), getRequestCharSet());
            ByteArrayRequestEntity entity = new ByteArrayRequestEntity(
                EncodingUtil.getAsciiBytes(content),
                FORM_URL_ENCODED_CONTENT_TYPE
            );
            return entity;
        } else {
            return super.generateRequestEntity();
        }
    }
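
To make the role of the charset argument concrete, here is a minimal sketch of how the same form value is percent-encoded under two different charsets. It uses java.net.URLEncoder from the JDK as a stand-in for EncodingUtil.formUrlEncode(), so it illustrates the principle rather than HttpClient's own implementation; the class name FormEncodingDemo is made up for this example.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical demo class; URLEncoder stands in for HttpClient's EncodingUtil.
final class FormEncodingDemo {

    // Percent-encodes one form value using the named charset.
    static String encode(String value, String charset) {
        try {
            return URLEncoder.encode(value, charset);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unknown charset: " + charset, e);
        }
    }

    public static void main(String[] args) {
        // Two Chinese characters, written as escapes so the source file stays ASCII.
        String text = "\u4E2D\u6587";
        System.out.println("UTF-8:      TEXT=" + encode(text, "UTF-8"));
        System.out.println("ISO-8859-1: TEXT=" + encode(text, "ISO-8859-1"));
    }
}
```

With UTF-8 the value becomes %E4%B8%AD%E6%96%87. With ISO-8859-1 the characters cannot be represented, so each one is replaced by a question mark and the value becomes %3F%3F. Either way the encoded string is pure ASCII, which is why the library can safely turn it into bytes with getAsciiBytes().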

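So the HTTP request parameters added through NameValuePair are all eventually converted into a RequestEntity and submitted to the HTTP server. Next, in EntityEnclosingMethod, the parent class of PostMethod, I found the following code: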
    /**
     * Returns the request's charset.  The charset is parsed from the request entity's
     * content type, unless the content type header has been set manually.
     *
     * @see RequestEntity#getContentType()
     *
     * @since 3.0
     */
    public String getRequestCharSet() {
        if (getRequestHeader("Content-Type") == null) {
            // check the content type from request entity
            // We can't call getRequestEntity() since it will probably call
            // this method.
            if (this.requestEntity != null) {
                return getContentCharSet(
                    new Header("Content-Type", requestEntity.getContentType()));
            } else {
                return super.getRequestCharSet();
            }
        } else {
            return super.getRequestCharSet();
        }
    }
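
The lookup performed here can be approximated in plain Java. The sketch below is not HttpClient's code: the class ContentTypeCharset and the method charsetOf() are invented for illustration, and the ISO-8859-1 fallback is an assumption based on my understanding of the HttpClient 3.x default content charset.

```java
import java.util.Locale;

// Hypothetical helper, not part of HttpClient.
final class ContentTypeCharset {

    // Assumed fallback; HttpClient 3.x takes its default from the method parameters.
    static final String DEFAULT_CHARSET = "ISO-8859-1";

    // Returns the charset parameter of a Content-Type value, or the fallback.
    static String charsetOf(String contentType) {
        if (contentType != null) {
            String[] parts = contentType.split(";");
            // parts[0] is the media type; the rest are name=value parameters.
            for (int i = 1; i < parts.length; i++) {
                String param = parts[i].trim();
                if (param.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                    String value = param.substring("charset=".length()).trim();
                    // Strip optional surrounding double quotes.
                    if (value.length() >= 2 && value.startsWith("\"") && value.endsWith("\"")) {
                        value = value.substring(1, value.length() - 1);
                    }
                    if (!value.isEmpty()) {
                        return value;
                    }
                }
            }
        }
        return DEFAULT_CHARSET;
    }

    public static void main(String[] args) {
        System.out.println(charsetOf("application/x-www-form-urlencoded"));
        System.out.println(charsetOf("application/x-www-form-urlencoded; charset=UTF-8"));
    }
}
```

A content type without a charset parameter, such as a bare application/x-www-form-urlencoded, falls through to the default, which is where the non-UTF-8 encoding of the parameters comes from.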


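Solution

The two snippets above show how HttpClient derives the request's encoding (character set) from "Content-Type", and how that encoding is then applied when the submitted content is encoded. Following this principle, we only need to override the getRequestCharSet() method so that it returns the name of the encoding (character set) we want. That resolves the garbled text seen when submitting POST requests in UTF-8 or any other non-default encoding.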

Test

First, deploy a page named test.jsp under Tomcat's ROOT web application to serve as the test page. The main code fragment is as follows:
<%@ page contentType="text/html;charset=UTF-8"%>
<%@ page session="false" %>
<%
request.setCharacterEncoding("UTF-8");
String val = request.getParameter("TEXT");
System.out.println(">>>> The result is " + val);
%>


Next, write a test class. The main code is as follows:
    // Imports needed by the test class
    import java.io.IOException;
    import org.apache.commons.httpclient.HttpClient;
    import org.apache.commons.httpclient.NameValuePair;
    import org.apache.commons.httpclient.methods.PostMethod;

    public static void main(String[] args) throws IOException {
        String url = "http://localhost:8080/test.jsp";
        PostMethod postMethod = new UTF8PostMethod(url);
        // Fill in the value of each form field
        NameValuePair[] data = {
                new NameValuePair("TEXT", "中文"),
        };
        // Put the form values into postMethod
        postMethod.setRequestBody(data);
        // Execute postMethod, releasing the connection afterwards
        HttpClient httpClient = new HttpClient();
        try {
            httpClient.executeMethod(postMethod);
        } finally {
            postMethod.releaseConnection();
        }
    }

    // Inner class for UTF-8 support
    public static class UTF8PostMethod extends PostMethod {
        public UTF8PostMethod(String url) {
            super(url);
        }

        // Always report UTF-8 instead of the charset derived from Content-Type
        @Override
        public String getRequestCharSet() {
            return "UTF-8";
        }
    }


Run the test program. Tomcat's console output then correctly prints ">>>> The result is 中文".
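
The same round trip can be checked without a server. The following is a rough sketch under stated assumptions: URLEncoder and URLDecoder from the JDK stand in for the client-side encoding and for the decoding the servlet container performs after request.setCharacterEncoding(), and the class RoundTripDemo is invented for this example.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

// Hypothetical demo class; JDK codecs stand in for HttpClient and the servlet container.
final class RoundTripDemo {

    // Encodes with the client's charset, then decodes with the server's charset.
    static String roundTrip(String value, String clientCharset, String serverCharset) {
        try {
            String wire = URLEncoder.encode(value, clientCharset);
            return URLDecoder.decode(wire, serverCharset);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unknown charset", e);
        }
    }

    public static void main(String[] args) {
        // Two Chinese characters, written as escapes so the source file stays ASCII.
        String text = "\u4E2D\u6587";
        // Both sides agree on UTF-8: the value survives.
        System.out.println(text.equals(roundTrip(text, "UTF-8", "UTF-8")));
        // Client encodes with ISO-8859-1: the characters are lost before sending.
        System.out.println(text.equals(roundTrip(text, "ISO-8859-1", "UTF-8")));
        // Server decodes with ISO-8859-1: the bytes arrive but are misread.
        System.out.println(text.equals(roundTrip(text, "UTF-8", "ISO-8859-1")));
    }
}
```

Only the first combination preserves the text, which matches the observation above: the client and the page both have to use UTF-8 for the parameter to arrive intact.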

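Code download

All the code mentioned in this article, together with the test program (which can be imported directly into Eclipse), is available as a packaged download: att:HttpClient POST 的 UTF-8 編碼問題.httpClientUTF8.tar.bz2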
