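【Java Basics Series】IO and File Reading/Writing --- Using the Apache Commons IO Package to Improve Read/Write Efficiency

【1】Introduction to Apache Commons IO

First, here is the introduction from the official Apache Commons IO website, which gives a basic picture of this well-known open-source package: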

Commons IO is a library of utilities to assist with developing IO functionality. There are four main areas included:

●Utility classes - with static methods to perform common tasks
●Filters - various implementations of file filters
●Comparators - various implementations of java.util.Comparator for files
●Streams - useful stream, reader and writer implementations

Packages

org.apache.commons.io This package defines utility classes for working with streams, readers, writers and files.

org.apache.commons.io.comparator This package provides various Comparator implementations for Files.

org.apache.commons.io.filefilter This package defines an interface (IOFileFilter) that combines both FileFilter and FilenameFilter.

org.apache.commons.io.input This package provides implementations of input classes, such as InputStream and Reader.

org.apache.commons.io.output This package provides implementations of output classes, such as OutputStream and Writer.

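【2】Introduction to the org.apache.commons.io.input Package

This package extends the IO classes of the Sun JDK with a number of small, focused IO classes, mainly implementations of the byte and character input stream types (java.io.InputStream and Reader). The most useful ones are the following: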

●AutoCloseInputStream

Proxy stream that closes and discards the underlying stream as soon as the end of input has been reached or when the stream is explicitly closed. Not even a reference to the underlying stream is kept after it has been closed, so any allocated in-memory buffers can be freed even if the client application still keeps a reference to the proxy stream

This class is typically used to release any resources related to an open stream as soon as possible even if the client application (by not explicitly closing the stream when no longer needed) or the underlying stream (by not releasing resources once the last byte has been read) do not do that.

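This input stream is a proxy for an underlying input stream. It closes the underlying stream immediately once the data source has been read to the end, or when the user calls close(), and so releases the underlying resources (for example the file handle). The benefit is that a file does not stay open just because our code forgot to close the underlying stream.

Some files may be opened by only one process at a time. If we forget to close such a file after use, it stays "open" and other processes cannot read or write it. Consider the following example: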

new BufferedInputStream(new FileInputStream(FILE))

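The inner FileInputStream(FILE) has no reference of its own, so it can only be closed through the outer BufferedInputStream. If that close() call is forgotten, the file stays open, which can cause problems. If we use AutoCloseInputStream instead, the underlying stream is closed automatically as soon as the data has been read to the end, and its resources are released promptly.

new BufferedInputStream(new AutoCloseInputStream(new FileInputStream(FILE)));

So how does this class manage to close automatically? Let's look at its very simple code: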

package org.apache.commons.io.input;

import java.io.IOException;
import java.io.InputStream;

public class AutoCloseInputStream extends ProxyInputStream {

    public AutoCloseInputStream(InputStream in) {
        super(in);
    }

    // Close the underlying stream, then replace it with a stream that always reports EOF
    public void close() throws IOException {
        in.close();
        in = new ClosedInputStream();
    }

    // Each read variant closes the underlying stream as soon as EOF (-1) is seen
    public int read() throws IOException {
        int n = in.read();
        if (n == -1) {
            close();
        }
        return n;
    }

    public int read(byte[] b) throws IOException {
        int n = in.read(b);
        if (n == -1) {
            close();
        }
        return n;
    }

    public int read(byte[] b, int off, int len) throws IOException {
        int n = in.read(b, off, len);
        if (n == -1) {
            close();
        }
        return n;
    }

    // Last line of defence: close the underlying stream when this object is collected
    protected void finalize() throws Throwable {
        close();
        super.finalize();
    }
}

public class ClosedInputStream extends InputStream {

    /**
     * A singleton.
     */
    public static final ClosedInputStream CLOSED_INPUT_STREAM = new ClosedInputStream();

    /**
     * Returns -1 to indicate that the stream is closed.
     *
     * @return always -1
     */
    public int read() {
        return -1;
    }
}

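We can see that this class guarantees the underlying stream is closed properly in two ways:
① Whenever a read method gets -1 from the underlying stream, it closes that stream immediately and replaces its reference with a ClosedInputStream.
② When an object of this class is garbage collected, finalize() makes sure the underlying input stream is closed.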
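
The close-on-EOF behaviour described above can be tried out without Commons IO on the classpath. The following is a minimal sketch that uses only the JDK: AutoClosing is a simplified stand-in for AutoCloseInputStream (it is not the library class and omits finalize()), and TrackingStream is a hypothetical helper that only records whether close() was called.

```java
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

class AutoCloseSketch {

    /** Hypothetical helper: remembers whether close() was called on it. */
    static class TrackingStream extends ByteArrayInputStream {
        boolean closed = false;

        TrackingStream(byte[] data) {
            super(data);
        }

        @Override
        public void close() throws IOException {
            closed = true;
            super.close();
        }
    }

    /** Simplified stand-in for AutoCloseInputStream (assumption, not the library class). */
    static class AutoClosing extends FilterInputStream {
        AutoClosing(InputStream in) {
            super(in);
        }

        @Override
        public int read() throws IOException {
            int n = in.read();
            if (n == -1) {
                close(); // EOF reached: release the underlying stream right away
            }
            return n;
        }

        @Override
        public int read(byte[] b, int off, int len) throws IOException {
            int n = in.read(b, off, len);
            if (n == -1) {
                close();
            }
            return n;
        }

        @Override
        public void close() throws IOException {
            in.close();
            // Drop the reference; later reads hit an empty stream and get -1
            in = new ByteArrayInputStream(new byte[0]);
        }
    }

    /** Calls read() the given number of times on a 3-byte stream and reports the close state. */
    static boolean closedAfterReading(int reads) {
        TrackingStream underlying = new TrackingStream(new byte[] {1, 2, 3});
        InputStream proxy = new AutoClosing(underlying);
        try {
            for (int i = 0; i < reads; i++) {
                proxy.read();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return underlying.closed;
    }

    public static void main(String[] args) {
        System.out.println("closed after 3 reads: " + closedAfterReading(3));
        System.out.println("closed after 4 reads: " + closedAfterReading(4));
    }
}
```

With three bytes of data, the first three reads return data and leave the underlying stream open. The fourth read sees -1 and closes it, even though the caller never calls close().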

●TeeInputStream

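InputStream proxy that transparently writes a copy of all bytes read from the proxied stream to a given OutputStream. The proxied input stream is closed when the close() method is called on this proxy. It is configurable whether the associated output stream will also be closed.

So this class passes the data read from the input stream, unchanged, on to an output stream. This is somewhat similar in spirit to the PipedInputStream provided by the JDK. In practice it makes tasks such as writing the data read from a remote URL to an output stream and saving it to a file very convenient. When the input stream is closed, the output stream is not necessarily closed; it can remain open.

Here is part of the source code of this class: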

public int read(byte[] bts, int st, int end) throws IOException {
    int n = super.read(bts, st, end);
    if (n != -1) {
        branch.write(bts, st, n);
    }
    return n;
}
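
To make the "copy while reading" idea concrete, here is a small JDK-only sketch. The Tee class is a simplified stand-in written for this example, not the library's TeeInputStream, and it never closes the branch stream. The input is an in-memory string rather than a remote URL.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

class TeeSketch {

    /** Simplified stand-in for TeeInputStream (assumption, not the library class). */
    static class Tee extends FilterInputStream {
        private final OutputStream branch;

        Tee(InputStream in, OutputStream branch) {
            super(in);
            this.branch = branch;
        }

        @Override
        public int read() throws IOException {
            int ch = super.read();
            if (ch != -1) {
                branch.write(ch); // copy the single byte that was just read
            }
            return ch;
        }

        @Override
        public int read(byte[] b, int off, int len) throws IOException {
            int n = super.read(b, off, len);
            if (n != -1) {
                branch.write(b, off, n); // copy exactly the bytes that were read
            }
            return n;
        }
    }

    /** Reads the text through a Tee and returns what ended up in the branch stream. */
    static String copyWhileReading(String text) {
        ByteArrayOutputStream copy = new ByteArrayOutputStream();
        byte[] source = text.getBytes(StandardCharsets.UTF_8);
        try (InputStream in = new Tee(new ByteArrayInputStream(source), copy)) {
            byte[] buf = new byte[4];
            while (in.read(buf) != -1) {
                // the caller only consumes; the copy happens inside Tee
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new String(copy.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(copyWhileReading("hello tee"));
    }
}
```

In a real program the branch would typically be a FileOutputStream, so the data is saved to disk as a side effect of reading it.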

●CharSequenceReader

Reader implementation that can read from String, StringBuffer, StringBuilder or CharBuffer.

This class can be seen as an extension of StringReader; it is used to read characters from memory.

●NullReader

A functional, light weight Reader that emulates a reader of a specified size.

This implementation provides a light weight object for testing with a Reader where the contents don't matter.

One use case would be for testing the handling of large Reader as it can emulate that scenario without the overhead of actually processing large numbers of characters - significantly speeding up test execution times.

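From the description above, this class is clearly a testing aid, aimed at cases where the content being read does not matter. It does not pass along any real data; it only simulates the process. Take a look at the source code below: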

/**
 * Read the specified number characters into an array.
 *
 * @param chars The character array to read into.
 * @param offset The offset to start reading characters into.
 * @param length The number of characters to read.
 * @return The number of characters read or <code>-1</code>
 * if the end of file has been reached and
 * <code>throwEofException</code> is set to <code>false</code>.
 * @throws EOFException if the end of file is reached and
 * <code>throwEofException</code> is set to <code>true</code>.
 * @throws IOException if trying to read past the end of file.
 */
public int read(char[] chars, int offset, int length) throws IOException {
    if (eof) {
        throw new IOException("Read after end of file");
    }
    if (position == size) {
        return doEndOfFile();
    }
    position += length;
    int returnLength = length;
    if (position > size) {
        returnLength = length - (int)(position - size);
        position = size;
    }
    processChars(chars, offset, returnLength);
    return returnLength;
}

/**
 * Return a character value for the <code>read()</code> method.
 * <p>
 * This implementation returns zero.
 *
 * @return This implementation always returns zero.
 */
protected int processChar() {
    // do nothing - overridable by subclass
    return 0;
}

/**
 * Process the characters for the <code>read(char[], offset, length)</code>
 * method.
 * <p>
 * This implementation leaves the character array unchanged.
 *
 * @param chars The character array
 * @param offset The offset to start at.
 * @param length The number of characters.
 */
protected void processChars(char[] chars, int offset, int length) {
    // do nothing - overridable by subclass
}

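See how it does the simulation? It only simulates the counting: it advances a position and never transfers, processes, or stores any data. The character array passed in is never written to.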
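
The counting trick can be sketched in a few lines of plain JDK code. CountingReader below is a simplified stand-in for NullReader written for this example: it keeps only the position bookkeeping and, unlike the real class, does not throw when reading after end of file.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;

class NullReaderSketch {

    /** Simplified stand-in for NullReader: tracks a position but moves no data. */
    static class CountingReader extends Reader {
        private final long size;
        private long position = 0;

        CountingReader(long size) {
            this.size = size;
        }

        @Override
        public int read(char[] chars, int offset, int length) {
            if (position == size) {
                return -1; // simulated end of file
            }
            int returnLength = (int) Math.min(length, size - position);
            position += returnLength;
            return returnLength; // chars is deliberately left untouched
        }

        @Override
        public void close() {
            // nothing to release
        }
    }

    /** Reads a simulated stream of the given size and returns the character count seen. */
    static long drain(long size, int bufferSize) {
        char[] buffer = new char[bufferSize]; // bufferSize is assumed to be > 0
        long total = 0;
        try (Reader reader = new CountingReader(size)) {
            int n;
            while ((n = reader.read(buffer)) != -1) {
                total += n;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println("characters 'read': " + drain(10_000_000L, 8192));
    }
}
```

A test that needs "a 10 million character input" can use such a reader without allocating or copying any of those characters.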

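【3】Introduction to the org.apache.commons.io.output Package

Like the input package, the output package implements or extends some of the classes and interfaces of the JDK IO package. Three classes deserve special attention: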

①ByteArrayOutputStream
②FileWriterWithEncoding
③LockableFileWriter

●ByteArrayOutputStream

This class implements an output stream in which the data is written into a byte array. The buffer automatically grows as data is written to it.

The data can be retrieved using toByteArray() and toString().

Closing a ByteArrayOutputStream has no effect. The methods in this class can be called after the stream has been closed without generating an IOException.

This is an alternative implementation of the java.io.ByteArrayOutputStream class. The original implementation only allocates 32 bytes at the beginning. As this class is designed for heavy duty it starts at 1024 bytes. In contrast to the original it doesn't reallocate the whole memory block but allocates additional buffers. This way no buffers need to be garbage collected and the contents don't have to be copied to the new buffer. This class is designed to behave exactly like the original. The only exception is the deprecated toString(int) method that has been ignored.

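From the documentation above we can see that the Apache Commons IO ByteArrayOutputStream is more efficient than the ByteArrayOutputStream that ships with the Sun JDK, for these reasons:

① The initial buffer is much larger than that of the JDK's ByteArrayOutputStream (1024 bytes versus 32).
② The buffer can keep growing. When it runs out, additional buffers are allocated dynamically instead of reallocating the whole block and copying the old contents into it.
③ The number of write calls is reduced: several first-level buffers are written out in one pass, which cuts call overhead.

So how does this class achieve all this? Let's look at its source code: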

/** The list of buffers, which grows and never reduces. */
private List buffers = new ArrayList();

/** The current buffer. */
private byte[] currentBuffer;

/**
 * Creates a new byte array output stream. The buffer capacity is
 * initially 1024 bytes, though its size increases if necessary.
 */
public ByteArrayOutputStream() {
    this(1024);
}

The JDK's own implementation, by contrast, has just a single byte[] as its buffer:

/**
 * The buffer where data is stored.
 */
protected byte buf[];

/**
 * Creates a new byte array output stream. The buffer capacity is
 * initially 32 bytes, though its size increases if necessary.
 */
public ByteArrayOutputStream() {
    this(32);
}

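So Apache Commons IO uses two levels of buffering. The first level is a byte[] (currentBuffer) that receives the bytes being written; its size can differ from one buffer to the next. The second level is an ArrayList that can keep growing and holds every first-level buffer. Because a full buffer is kept in the list instead of being copied into a larger array, this is naturally more efficient. So how does the class grow its buffering dynamically without discarding the buffers it already has?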

/**
 * Makes a new buffer available either by allocating
 * a new one or re-cycling an existing one.
 *
 * @param newcount the size of the buffer if one is created
 */
private void needNewBuffer(int newcount) {
    if (currentBufferIndex < buffers.size() - 1) {
        //Recycling old buffer
        filledBufferSum += currentBuffer.length;

        currentBufferIndex++;
        currentBuffer = getBuffer(currentBufferIndex);
    } else {
        //Creating new buffer
        int newBufferSize;
        if (currentBuffer == null) {
            newBufferSize = newcount;
            filledBufferSum = 0;
        } else {
            newBufferSize = Math.max(
                currentBuffer.length << 1,
                newcount - filledBufferSum);
            filledBufferSum += currentBuffer.length;
        }

        currentBufferIndex++;
        currentBuffer = new byte[newBufferSize];
        buffers.add(currentBuffer);
    }
}

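On initialization currentBuffer == null, so the size of the first first-level byte[] buffer is the default 1024 or the value specified by the user. filledBufferSum and currentBufferIndex are then initialized, and the first first-level buffer is created and added to the second-level list, buffers.

On later requests, the new buffer size is the larger of two values: twice the length of the current buffer (currentBuffer.length << 1), and the requested total minus the bytes already held in full buffers (newcount - filledBufferSum). A new first-level byte[] of that size is then created and added to the second-level list.

Compared with the JDK class, this class has an additional write(InputStream in) method. Here is its source code: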
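
The sizing rule can be checked with a small simulation. The sketch below reproduces only the arithmetic of needNewBuffer for the case where bytes are written one at a time; it assumes, as a simplification for this example, that a new buffer is requested with newcount = count + 1 once the current buffer is full. The names bufferSizes and currentLength are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

class GrowthSketch {

    /**
     * Simulates writing totalBytes single bytes and returns the sizes of the
     * buffers that would be allocated, starting with initialSize.
     */
    static List<Integer> bufferSizes(int initialSize, int totalBytes) {
        List<Integer> sizes = new ArrayList<>();
        int filledBufferSum = 0;          // bytes held in buffers that are already full
        int currentLength = initialSize;  // length of the current first-level buffer
        sizes.add(currentLength);

        for (int count = 0; count < totalBytes; count++) {
            boolean currentIsFull = (count - filledBufferSum) == currentLength;
            if (currentIsFull) {
                int newcount = count + 1; // assumption: room for one more byte is requested
                int newBufferSize = Math.max(currentLength << 1, newcount - filledBufferSum);
                filledBufferSum += currentLength;
                currentLength = newBufferSize;
                sizes.add(newBufferSize);
            }
        }
        return sizes;
    }

    public static void main(String[] args) {
        System.out.println(bufferSizes(1024, 8000));
    }
}
```

Under these assumptions, writing 8000 bytes into a stream that starts at 1024 allocates buffers of 1024, 2048, 4096 and 8192 bytes. Each buffer is twice the size of the previous one, and none of the earlier buffers is copied or discarded.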

public synchronized int write(InputStream in) throws IOException {
    int readCount = 0;
    int inBufferPos = count - filledBufferSum;
    int n = in.read(currentBuffer, inBufferPos, currentBuffer.length - inBufferPos);
    while (n != -1) {
        readCount += n;
        inBufferPos += n;
        count += n;
        if (inBufferPos == currentBuffer.length) {
            needNewBuffer(currentBuffer.length);
            inBufferPos = 0;
        }
        n = in.read(currentBuffer, inBufferPos, currentBuffer.length - inBufferPos);
    }
    return readCount;
}

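Each read asks the InputStream for as many bytes as there is space left in the current first-level buffer and stores them in that space. When the buffer is full, a new first-level buffer is allocated, and this continues until the data has been read completely. Writing out to another output stream looks like this: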

public synchronized void writeTo(OutputStream out) throws IOException {
    int remaining = count;
    for (int i = 0; i < buffers.size(); i++) {
        byte[] buf = getBuffer(i);
        int c = Math.min(buf.length, remaining);
        out.write(buf, 0, c);
        remaining -= c;
        if (remaining == 0) {
            break;
        }
    }
}

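Each write call here writes out a whole first-level buffer at once, and the entire second-level list is written in a single pass. This reduces the number of calls and therefore improves efficiency.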
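
Putting the pieces together, the following JDK-only sketch shows a list of byte[] chunks being filled and then written out chunk by chunk. ChunkedBuffer is a simplified stand-in written for this example, not the Commons IO class: it always doubles, never recycles buffers, and assumes an initial size greater than zero.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

class ChunkedBufferSketch {

    /** Simplified two-level buffer: a list of byte[] chunks plus a current chunk. */
    static class ChunkedBuffer {
        private final List<byte[]> buffers = new ArrayList<>();
        private byte[] currentBuffer;
        private int filledBufferSum = 0; // bytes in chunks before the current one
        private int count = 0;           // total bytes written

        ChunkedBuffer(int initialSize) { // assumption: initialSize > 0
            currentBuffer = new byte[initialSize];
            buffers.add(currentBuffer);
        }

        void write(int b) {
            int inBufferPos = count - filledBufferSum;
            if (inBufferPos == currentBuffer.length) {
                // Current chunk is full: keep it and allocate a bigger one
                filledBufferSum += currentBuffer.length;
                currentBuffer = new byte[currentBuffer.length << 1];
                buffers.add(currentBuffer);
                inBufferPos = 0;
            }
            currentBuffer[inBufferPos] = (byte) b;
            count++;
        }

        void writeTo(OutputStream out) throws IOException {
            int remaining = count;
            for (byte[] buf : buffers) {
                int c = Math.min(buf.length, remaining);
                out.write(buf, 0, c); // one call per chunk, not per byte
                remaining -= c;
                if (remaining == 0) {
                    break;
                }
            }
        }
    }

    /** Writes data into a ChunkedBuffer and reads it back through writeTo. */
    static byte[] roundTrip(byte[] data, int initialSize) {
        ChunkedBuffer buffer = new ChunkedBuffer(initialSize);
        for (byte b : data) {
            buffer.write(b);
        }
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try {
            buffer.writeTo(out);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] data = "two-level buffering".getBytes();
        System.out.println(new String(roundTrip(data, 4)));
    }
}
```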

●FileWriterWithEncoding

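The name of this class already makes its purpose clear. With the JDK's own FileWriter (before Java 11) there is no way to set the encoding. This class lets us write characters to a file using either the default encoding or one we specify. Why is this class able to change the character encoding?

The principle is simple: it just uses an OutputStreamWriter. Note also that this class does not extend FileWriter; it extends Writer directly.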

OutputStream stream = null;
Writer writer = null;
try {
    stream = new FileOutputStream(file, append);
    // Pick the OutputStreamWriter constructor that matches the type of 'encoding'
    if (encoding instanceof Charset) {
        writer = new OutputStreamWriter(stream, (Charset)encoding);
    } else if (encoding instanceof CharsetEncoder) {
        writer = new OutputStreamWriter(stream, (CharsetEncoder)encoding);
    } else {
        writer = new OutputStreamWriter(stream, (String)encoding);
    }
    // ... (the excerpt ends here)

The remaining write methods simply delegate to the wrapped writer; this is nothing more than the decorator pattern.
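
The same idea can be tried with the JDK alone. The sketch below writes a string to a temporary file through an OutputStreamWriter with an explicit Charset and returns the bytes that reached the disk. The method name writeWithEncoding and the temp-file prefix are made up for this example.

```java
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class EncodingWriterSketch {

    /** Writes text to a temp file with the given charset and returns the raw file bytes. */
    static byte[] writeWithEncoding(String text, Charset charset) {
        try {
            Path file = Files.createTempFile("encoding-sketch", ".txt");
            try {
                // The encoding is chosen by the OutputStreamWriter, not by the file stream
                try (Writer writer = new OutputStreamWriter(
                        new FileOutputStream(file.toFile(), false), charset)) {
                    writer.write(text);
                }
                return Files.readAllBytes(file);
            } finally {
                Files.deleteIfExists(file);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String text = "\u00e9"; // the single character e with acute accent
        System.out.println("UTF-8 bytes:      "
                + writeWithEncoding(text, StandardCharsets.UTF_8).length);
        System.out.println("ISO-8859-1 bytes: "
                + writeWithEncoding(text, StandardCharsets.ISO_8859_1).length);
    }
}
```

The same character occupies two bytes in UTF-8 and one byte in ISO-8859-1, which is why being able to choose the encoding explicitly matters.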

●LockableFileWriter

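This class uses a "file lock" rather than an "object lock" to restrict write operations in a multi-threaded environment. By default the lock file is created in the system temporary directory, given by the java.io.tmpdir property. The class also lets us set the encoding.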

/**
 * Constructs a LockableFileWriter with a file encoding.
 *
 * @param file the file to write to, not null
 * @param encoding the encoding to use, null means platform default
 * @param append true if content should be appended, false to overwrite
 * @param lockDir the directory in which the lock file should be held
 * @throws NullPointerException if the file is null
 * @throws IOException in case of an I/O error
 */
public LockableFileWriter(File file, String encoding, boolean append,
        String lockDir) throws IOException {
    super();
    // init file to create/append
    file = file.getAbsoluteFile();
    if (file.getParentFile() != null) {
        FileUtils.forceMkdir(file.getParentFile());
    }
    if (file.isDirectory()) {
        throw new IOException("File specified is a directory");
    }

    // init lock file
    if (lockDir == null) {
        lockDir = System.getProperty("java.io.tmpdir");
    }
    File lockDirFile = new File(lockDir);
    FileUtils.forceMkdir(lockDirFile);
    testLockDir(lockDirFile);
    lockFile = new File(lockDirFile, file.getName() + LCK);

    // check if locked
    createLock();

    // init wrapped writer
    out = initWriter(file, encoding, append);
}

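First, the directory that will hold the lock file is created; by default this is the system temporary directory.

Next, a File object is set up for the lock file in that directory, named after the target file's name plus the LCK suffix.

Then the lock is created.

Finally, the writer is initialized.

What we care about, of course, is how the lock is created and how it is enforced while writing. Let's first look at how the lock is created: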

private void createLock() throws IOException {
    synchronized (LockableFileWriter.class) {
        // createNewFile() returns false if the lock file already exists
        if (!lockFile.createNewFile()) {
            throw new IOException("Can't write file, lock " +
                    lockFile.getAbsolutePath() + " exists");
        }
        lockFile.deleteOnExit();
    }
}

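Note that the deleteOnExit() call here is important. It tells the JVM to delete the file when the JVM exits; otherwise countless useless temporary lock files would pile up on disk.

The next question is how the lock is actually enforced. Let's go back to the constructor above. When a LockableFileWriter is constructed, createLock() is called, and if that method finds that the lock file already exists (because another writer holds it), it throws an IOException. Construction therefore fails, and no subsequent write operations can take place.

But if the first process created the lock and never released it, later processes would never be able to write. That is why the close method of this class contains the following code: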

public void close() throws IOException {
    try {
        out.close();
    } finally {
        // Release the lock even if closing the writer fails
        lockFile.delete();
    }
}

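Every process must call close() after it has finished writing its data; the lock file is then deleted and the lock is released. Compared with the object lock (synchronized(object)) used by the JDK's own Writer, this approach is indeed simpler and more efficient. This class was designed as a replacement for the original FileWriter.

But remember: always call close() at the end, otherwise the lock is never released. And there is no class like AutoCloseOutputStream to do it for you here!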
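
One way to make sure close() is always called is try-with-resources. The JDK-only sketch below imitates the lock-file idea with a simplified stand-in class, LockedWriter, which is not the Commons IO class. The ".lck" suffix, the file names and the temporary directory are assumptions made for this example.

```java
import java.io.Closeable;
import java.io.File;
import java.io.FileWriter;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.file.Files;
import java.nio.file.Path;

class LockFileSketch {

    /** Simplified stand-in for LockableFileWriter (assumption, not the library class). */
    static class LockedWriter implements Closeable {
        private final File lockFile;
        private final Writer out;

        LockedWriter(File target, File lockDir) throws IOException {
            lockFile = new File(lockDir, target.getName() + ".lck");
            // createNewFile() is atomic: it returns false if the lock already exists
            if (!lockFile.createNewFile()) {
                throw new IOException("Can't write file, lock "
                        + lockFile.getAbsolutePath() + " exists");
            }
            lockFile.deleteOnExit();
            try {
                out = new FileWriter(target);
            } catch (IOException e) {
                lockFile.delete(); // do not leave a stale lock behind
                throw e;
            }
        }

        void write(String text) throws IOException {
            out.write(text);
        }

        @Override
        public void close() throws IOException {
            try {
                out.close();
            } finally {
                lockFile.delete(); // releasing the lock is part of closing
            }
        }
    }

    /** Returns true if a second writer is refused while the first one holds the lock. */
    static boolean secondWriterRejected() {
        try {
            Path dir = Files.createTempDirectory("lock-sketch");
            File target = new File(dir.toFile(), "data.txt");
            boolean rejected = false;
            try (LockedWriter first = new LockedWriter(target, dir.toFile())) {
                first.write("hello");
                try (LockedWriter second = new LockedWriter(target, dir.toFile())) {
                    second.write("never reached");
                } catch (IOException expected) {
                    rejected = true;
                }
            }
            // The first writer is closed now, so the lock can be taken again
            try (LockedWriter third = new LockedWriter(target, dir.toFile())) {
                third.write("written after the lock was released");
            }
            Files.deleteIfExists(target.toPath());
            Files.deleteIfExists(dir);
            return rejected;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("second writer rejected: " + secondWriterRejected());
    }
}
```

Because try-with-resources calls close() even when an exception is thrown inside the block, the lock file is removed on every path, which is exactly what the warning above asks for.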


posted on 2010-03-04 10:28 by Paul Lin. Category: J2SE
