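This is a real case from last weekend: an error caused by a thread-safety problem on the CXF client side. I am writing it up in the hope that others who use CXF will watch out for it.
First, some background. Put simply, we use CXF as a web service client, running on WebLogic and connecting to an external server. For testing, we developed a simple simulator to stand in for the server side, and planned to run stability tests with it before the release.
Then problems appeared. After ruling out distractions such as the network environment and configuration, the problem was still there. The system is complex, and the WS simulator itself was built with an approach we had not tried before, so tracking the problem down took a lot of time. The process was tedious, and many people have probably been through it: finding the source of a small error in a large system is something of an art, and it takes skill, patience, and luck. Skipping over that process: the problem showed up in the web service's network connection, and the server-side simulator's log contained a large number of exceptions like this one: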
2009-07-24 19:23:22,898 DEBUG ( : ) (tomcat-exec-56) [Http11NioProcessor] - Error parsing HTTP request header
java.io.EOFException: Unexpected EOF read on the socket
at org.apache.coyote.http11.InternalNioInputBuffer.readSocket(InternalNioInputBuffer.java:589)
at org.apache.coyote.http11.InternalNioInputBuffer.parseRequestLine(InternalNioInputBuffer.java:425)
at org.apache.coyote.http11.Http11NioProcessor.process(Http11NioProcessor.java:825)
at org.apache.coyote.http11.Http11NioProtocol$Http11ConnectionHandler.process(Http11NioProtocol.java:719)
at org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.run(NioEndpoint.java:2080)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:885)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:907)
at java.lang.Thread.run(Thread.java:619)
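This was the first time we had used Tomcat and Coyote for the server-side simulator, so we suspected Tomcat. After going through the code repeatedly without result, we decided to switch to a different server-side simulator to find out where the problem really was: in the CXF client or in the server-side simulator. We wrote a simple simulator, plus a small test program that skips all business logic and calls the CXF client implementation code directly. After testing, the problem was still there, so we turned our attention to the CXF client.
During testing we noticed a pattern: before the server-side exception above occurred, the following exception always appeared on the client: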
Jul 24, 2009 10:36:18 PM org.apache.cxf.phase.PhaseInterceptorChain doIntercept
INFO: Interceptor has thrown exception, unwinding now null
tps = 25
Exception in thread "Thread-41" 2009-07-24 22:36:19,925 149585 [Thread-41] (********Impl.java:459) ERROR junit.framework.Test - Got an exception when invoking **** service:javax.xml.ws.WebServiceException: java.lang.NullPointerException

(The information here is business-related and cannot be shown; in any case it has nothing to do with the problem under discussion.)
at test.TestMci.execute(TestMci.java:84)
at test.TestMci.access$1(TestMci.java:81)
at test.TestMci$TestThread.run(TestMci.java:90)
at java.lang.Thread.run(Thread.java:595)
Caused by: javax.xml.ws.WebServiceException: java.lang.NullPointerException
at org.apache.cxf.jaxws.JaxWsClientProxy.invoke(JaxWsClientProxy.java:142)
at $Proxy40.authorizeAndPurchase(Unknown Source)
at *********************
6 more
Caused by: java.lang.NullPointerException
at org.apache.cxf.transport.http.HTTPConduit.prepare(HTTPConduit.java:483)
at org.apache.cxf.interceptor.MessageSenderInterceptor.handleMessage(MessageSenderInterceptor.java:46)
at org.apache.cxf.phase.PhaseInterceptorChain.doIntercept(PhaseInterceptorChain.java:226)
at org.apache.cxf.endpoint.ClientImpl.invoke(ClientImpl.java:469)
at org.apache.cxf.endpoint.ClientImpl.invoke(ClientImpl.java:299)
at org.apache.cxf.endpoint.ClientImpl.invoke(ClientImpl.java:251)
at org.apache.cxf.frontend.ClientProxy.invokeSync(ClientProxy.java:73)
at org.apache.cxf.jaxws.JaxWsClientProxy.invoke(JaxWsClientProxy.java:124)
8 more
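So the problem appears to be here.
Further testing showed that under low load, for example 2-3 worker threads, there were essentially no problems, which explains why functional testing never hit it. With 10 worker threads it was still mostly fine, with the exception showing up only once or twice, very rarely. We then increased the number of worker threads further. Because of the limits of the test machine, 10 and 100 worker threads gave roughly the same throughput, about 300 TPS (the client ran on a laptop, with the CPU already above 90%). The result was that the exception became regular, amusingly so: it occurred roughly once every 10,000 requests, steadily and persistently. An error this regular and this reproducible is a rare sight.
Analyzing the problem: with TPS essentially unchanged, raising the client threads from 10 to 100 made the problem obvious. That points straight at thread safety, the perennial troublemaker. Re-examining our code that uses CXF, we found this: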
/**
* the server service
*/
MessagingChannelServiceImplService mcsis = null;
/**
* the server service port
*/
MessagingChannelService mcs = null;
mcsis = new MessagingChannelServiceImplService(serviceWsdlUrl);
mcs = mcsis.getMessagingChannelServiceImplPort();
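MessagingChannelService is generated by CXF and annotated with @WebService; all other business code calls the business methods on it. Since serviceWsdlUrl does not change, we reused these objects to avoid initializing them on every call. The problem appeared to be here, so we tried changing the code to use a ThreadLocal, so that each thread initializes its own instance once and keeps it for its own use.
In subsequent tests the modified client code was very stable and the exception above did not recur, so the problem can be considered solved.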
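The ThreadLocal change described above might look roughly like the following. This is a minimal sketch, not our actual code: PerThreadClient is a hypothetical helper name, it assumes Java 8 or later for ThreadLocal.withInitial, and the factory passed in stands for whatever creates the port.

```java
import java.util.function.Supplier;

/**
 * Hypothetical helper (not part of CXF): hands each thread its own
 * lazily created client proxy instead of sharing one instance.
 */
final class PerThreadClient<T> {
    // One value per thread; the factory runs on a thread's first get().
    private final ThreadLocal<T> holder;

    PerThreadClient(Supplier<T> factory) {
        this.holder = ThreadLocal.withInitial(factory);
    }

    /** Returns the calling thread's own instance, creating it if needed. */
    T get() {
        return holder.get();
    }

    /** Drops the calling thread's instance so the next get() recreates it. */
    void release() {
        holder.remove();
    }
}
```

In the setting of this post, the factory would be something like () -> new MessagingChannelServiceImplService(serviceWsdlUrl).getMessagingChannelServiceImplPort(). On an application server whose worker threads are pooled and long-lived, it is probably worth calling release() when a thread is done with the client, so that instances do not stay attached to pooled threads indefinitely.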
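I am not very familiar with CXF, and I could not find exactly what causes the thread-unsafety. A Google search turned up a few leads, though they do not seem to fully explain the problem. I am listing them here for further study: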
1) Are JAX-WS client proxies thread safe?
Quoted from:
http://cxf.apache.org/faq.html#FAQ-AreJAXWSclientproxiesthreadsafe%253F
Official JAX-WS answer: No. According to the JAX-WS spec, the client proxies are NOT thread safe. To write portable code, you should treat them as non-thread safe and synchronize access or use a pool of instances or similar.
CXF answer: CXF proxies are thread safe for MANY use cases. The exceptions are:
* Use of ((BindingProvider)proxy).getRequestContext() - per JAX-WS spec, the request context is PER INSTANCE. Thus, anything set there will affect requests on other threads. With CXF, you can do:
((BindingProvider)proxy).getRequestContext().put("thread.local.request.context", "true");
and future calls to getRequestContext() will use a thread local request context. That allows the request context to be threadsafe. (Note: the response context is always thread local in CXF)
* Settings on the conduit - if you use code or configuration to directly manipulate the conduit (like to set TLS settings or similar), those are not thread safe. The conduit is per-instance and thus those settings would be shared.
* Session support - if you turn on sessions support (see jaxws spec), the session cookie is stored in the conduit. Thus, it would fall into the above rules on conduit settings and thus be shared across threads.
For the conduit issues, you COULD install a new ConduitSelector that uses a thread local or similar. That's a bit complex though.
For most "simple" use cases, you can use CXF proxies on multiple threads. The above outlines the workarounds for the others.
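The "pool of instances" that the FAQ mentions as the portable option could be sketched as below. This is only an illustration under assumptions: ProxyPool is a hypothetical name, nothing here comes from CXF itself, and the pool simply guarantees that a proxy is used by one thread at a time.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.function.Function;
import java.util.function.Supplier;

/**
 * Hypothetical fixed-size pool of client proxies. A proxy is handed to
 * exactly one caller at a time and returned when the call finishes.
 */
final class ProxyPool<T> {
    private final BlockingQueue<T> idle;

    ProxyPool(int size, Supplier<T> factory) {
        idle = new ArrayBlockingQueue<>(size);
        for (int i = 0; i < size; i++) {
            idle.add(factory.get()); // create all proxies up front
        }
    }

    /** Borrows a proxy (waiting if none is idle), runs the work, returns it. */
    <R> R call(Function<T, R> work) {
        T proxy;
        try {
            proxy = idle.take();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for a proxy", e);
        }
        try {
            return work.apply(proxy);
        } finally {
            idle.add(proxy); // cannot overflow: we only put back what we took
        }
    }
}
```

Compared with one instance per thread, a pool caps the number of proxies regardless of how many worker threads exist, at the cost of callers waiting when all proxies are in use.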
2) The CXF wiki on the Proxy-based API in the Client API
Wiki page:
http://cwiki.apache.org/CXF20DOC/jax-rs.html
Limitations
Proxy methods can not have @Context method parameters and subresource methods returning Objects can not be invoked - perhaps it is actually not too bad at all - please inject contexts as field or bean properties and have subresource methods returning typed classes : interfaces, abstract classes or concrete implementations.
Proxies are currently not thread-safe.
3) thread safe issue caused by XMLOutputFactoryImpl
A CXF bug I found:
https://issues.apache.org/jira/browse/CXF-2229
Description
Currently CXF calls StaxUtils.getXMLOutputFactory() to get the cached instance of XMLOutputFactoryImpl. But XMLOutputFactoryImpl.createXMLStreamWriter is not thread-safe. See below.
javax.xml.stream.XMLStreamWriter createXMLStreamWriter(javax.xml.transform.stream.StreamResult sr, String encoding) throws javax.xml.stream.XMLStreamException {
    try {
        if (fReuseInstance && fStreamWriter != null && fStreamWriter.canReuse() && !fPropertyChanged) {
            fStreamWriter.reset();
            fStreamWriter.setOutput(sr, encoding);
            if (DEBUG) System.out.println("reusing instance, object id : " + fStreamWriter);
            return fStreamWriter;
        }
        // this is not thread safe, since the new instance is assigned to the field fStreamWriter first, then it is possible that different threads get the same XMLStreamWriterImpl when they call this method at the same time.
        return fStreamWriter = new XMLStreamWriterImpl(sr, encoding, new PropertyManager(fPropertyManager));
    } catch (java.io.IOException io) {
        throw new XMLStreamException(io);
    }
}
The solution might be, StaxUtils.getXMLOutputFactory() method creates a new instance of XMLOutputFactory every time, don't cache it.
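The suggestion in the bug report (do not cache the factory) can be illustrated with the standard StAX API. This is a sketch of the idea only, not the change that was actually made in CXF; FreshWriterFactory and its methods are hypothetical names.

```java
import java.io.StringWriter;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

/**
 * Hypothetical illustration: every writer comes from a brand-new
 * XMLOutputFactory, so no factory state is shared between threads.
 */
final class FreshWriterFactory {

    /** Creates a writer from a factory that is never cached or shared. */
    static XMLStreamWriter newWriter(StringWriter out) {
        try {
            return XMLOutputFactory.newInstance().createXMLStreamWriter(out);
        } catch (XMLStreamException e) {
            throw new IllegalStateException("could not create XML writer", e);
        }
    }

    /** Writes a single <msg> element containing the given text. */
    static String render(String text) {
        StringWriter out = new StringWriter();
        try {
            XMLStreamWriter w = newWriter(out);
            w.writeStartElement("msg");
            w.writeCharacters(text);
            w.writeEndElement();
            w.close(); // flushes buffered output into the StringWriter
        } catch (XMLStreamException e) {
            throw new IllegalStateException("could not write XML", e);
        }
        return out.toString();
    }
}
```

The trade-off is that XMLOutputFactory.newInstance() performs a provider lookup on each call, which costs more than reusing a cached factory; keeping one factory per thread would be another way to avoid sharing.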