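Working with XML files through DOM means working with a DOM tree. To build an XML file, you first construct a DOM tree and then write that tree out as an XML file. To parse an XML file, you first parse the source file into a DOM tree and then traverse that tree or look up the information you need in it.
For the node types in a DOM tree, and the interfaces, properties and restrictions of each node type, see the article "DOM Tree Node Analysis". This article covers only building and parsing XML files. Both parts use the contents of the books.xml file from w3school as their example: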
<?xml version="1.0" encoding="UTF-8"?>
<bookstore>
<book category="children">
<title lang="en">Harry Potter</title>
<author>J K. Rowling</author>
<year>2005</year>
<price>29.99</price>
</book>
<book category="cooking">
<title lang="en">Everyday Italian</title>
<author>Giada De Laurentiis</author>
<year>2005</year>
<price>30.00</price>
</book>
<book category="web" cover="paperback">
<title lang="en">Learning XML</title>
<author>Erik T. Ray</author>
<year>2003</year>
<price>39.95</price>
</book>
<book category="web">
<title lang="en">XQuery Kick Start</title>
<author>James McGovern</author>
<author>Per Bothner</author>
<author>Kurt Cagle</author>
<author>James Linn</author>
<author>Vaidyanathan Nagarajan</author>
<year>2003</year>
<price>49.99</price>
</book>
</bookstore>
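Java is an object-oriented language, so we should write code in an object-oriented style wherever possible. A central idea of object-oriented programming is to work in terms of objects, so this test code also operates on objects, instead of doing a little processing each time a piece of information arrives, as a procedural program would.
Here, each book element defined in the XML file corresponds to a Book object: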
public class Book {
private String category;
private String cover;
private TitleInfo title;
private List<String> authors;
private int year;
private double price;
...
public static class TitleInfo {
private String title;
private String lang;
...
}
}
The Book instances are built according to the contents of the XML file:
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class W3CBooksBuilder {
public static List<Book> buildBooks() {
List<Book> books = new ArrayList<Book>();
books.add(buildHarryPotter());
books.add(buildEverydayItalian());
books.add(buildLearningXML());
books.add(buildXQueryKickStart());
return books;
}
public static Book buildHarryPotter() {
Book book = new Book();
book.setCategory("children");
book.setTitle(new TitleInfo("Harry Potter", "en"));
book.setAuthors(Arrays.asList("J K. Rowling"));
book.setYear(2005);
book.setPrice(29.99);
return book;
}
public static Book buildEverydayItalian() {
...
}
public static Book buildLearningXML() {
...
}
public static Book buildXQueryKickStart() {
...
}
}
Parsing XML Files with DOM
DOM parses XML files with the DocumentBuilder class. Its parse methods parse an XML document into a DOM tree and return a Document instance:
public Document parse(InputStream is);
public Document parse(InputStream is, String systemId);
public Document parse(String uri);
public Document parse(File f);
public abstract Document parse(InputSource is);
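The overloads differ only in where the XML comes from. A minimal sketch, assuming the XML is already in memory: parse(InputSource) wrapping a StringReader covers the String case, and parse(InputStream, String) adds a systemId that serves only as a base URI for relative references. The class name ParseOverloadsDemo and the sample XML are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class ParseOverloadsDemo {
    private static final DocumentBuilderFactory FACTORY = DocumentBuilderFactory.newInstance();

    // parse(InputSource): convenient when the XML is already a String.
    public static Document parseString(String xml) throws Exception {
        DocumentBuilder builder = FACTORY.newDocumentBuilder();
        return builder.parse(new InputSource(new StringReader(xml)));
    }

    // parse(InputStream, String): the systemId is only a base URI for resolving
    // relative references (for example a DTD path); it is not opened here.
    public static Document parseBytes(byte[] xml, String systemId) throws Exception {
        DocumentBuilder builder = FACTORY.newDocumentBuilder();
        return builder.parse(new ByteArrayInputStream(xml), systemId);
    }

    public static void main(String[] args) throws Exception {
        String xml = "<bookstore><book category=\"web\"/><book category=\"cooking\"/></bookstore>";
        Document fromString = parseString(xml);
        Document fromBytes = parseBytes(xml.getBytes(StandardCharsets.UTF_8), "file:///tmp/books.xml");
        System.out.println(fromString.getElementsByTagName("book").getLength()); // 2
        System.out.println(fromBytes.getDocumentElement().getNodeName());        // bookstore
    }
}
```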
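DocumentBuilder also reports whether the parser is configured for namespace processing and validation, and lets you install an EntityResolver and an ErrorHandler. Using EntityResolver and ErrorHandler here merely reuses the SAX API; it does not mean a DOM parser has to be implemented on top of SAX, although the DOM parser bundled with the JDK does seem to run on the same engine as its SAX parser.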
public abstract boolean isNamespaceAware();
public abstract boolean isValidating();
public abstract void setEntityResolver(EntityResolver er);
public abstract void setErrorHandler(ErrorHandler eh);
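As an illustration of setErrorHandler, here is a minimal sketch of an ErrorHandler that records every problem and rethrows fatal errors so parsing stops. Installing any handler also replaces the default behaviour of printing "[Fatal Error] ..." to stderr. The class names and the isWellFormed helper are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

public class ErrorHandlerDemo {
    // Records warnings and errors; fatal errors are rethrown so the parse aborts.
    static class CollectingErrorHandler implements ErrorHandler {
        final List<String> messages = new ArrayList<>();

        @Override
        public void warning(SAXParseException e) {
            messages.add("warning: " + e.getMessage());
        }

        @Override
        public void error(SAXParseException e) {
            messages.add("error: " + e.getMessage());
        }

        @Override
        public void fatalError(SAXParseException e) throws SAXException {
            messages.add("fatal (line " + e.getLineNumber() + "): " + e.getMessage());
            throw e;
        }
    }

    // Returns true when xml is well-formed; any reported problems go into messages.
    public static boolean isWellFormed(String xml, List<String> messages) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        CollectingErrorHandler handler = new CollectingErrorHandler();
        builder.setErrorHandler(handler);
        try {
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException ex) {
            return false;
        } finally {
            messages.addAll(handler.messages);
        }
    }

    public static void main(String[] args) throws Exception {
        List<String> messages = new ArrayList<>();
        System.out.println(isWellFormed("<book><title></book>", messages)); // false
        messages.forEach(System.out::println);
    }
}
```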
DocumentBuilder also provides a factory method for Document instances. When building a DOM tree programmatically, you first need a Document instance and then use it to create every other kind of node. The Document instance itself is obtained through DocumentBuilder:
public abstract Document newDocument();
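A minimal sketch of that sequence: newDocument() yields an empty Document, and every element and text node is then created by that same Document before being attached. The class name NewDocumentDemo and the sample content are hypothetical.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class NewDocumentDemo {
    // Builds <bookstore><book category="web"><title lang="en">Learning XML</title></book></bookstore>
    // in memory; every node is created by the Document that will own it.
    public static Document buildSmallTree() throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        Element root = doc.createElement("bookstore");
        doc.appendChild(root);

        Element book = doc.createElement("book");
        book.setAttribute("category", "web");
        root.appendChild(book);

        Element title = doc.createElement("title");
        title.setAttribute("lang", "en");
        title.appendChild(doc.createTextNode("Learning XML"));
        book.appendChild(title);
        return doc;
    }

    public static void main(String[] args) throws Exception {
        Document doc = buildSmallTree();
        System.out.println(doc.getDocumentElement().getTextContent()); // Learning XML
    }
}
```

Nodes belong to the Document that created them; appending a node created by a different Document throws a DOMException (WRONG_DOCUMENT_ERR), so such nodes must first be brought over with importNode or adoptNode.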
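Finally, DocumentBuilder offers a few additional methods: resetting the builder's state so the same DocumentBuilder can be reused, obtaining a DOMImplementation instance, obtaining the Schema in use, and checking whether XInclude processing is enabled.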
public void reset();
public abstract DOMImplementation getDOMImplementation();
public Schema getSchema();
public boolean isXIncludeAware();
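Two of these methods in a short sketch, under the assumption that a single builder is used from one thread: getDOMImplementation().createDocument creates a Document together with its root element, and reset() returns the builder to the configuration it had when newDocumentBuilder() created it, so it can be reused for the next parse. BuilderReuseDemo and its helpers are hypothetical names.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.DOMImplementation;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class BuilderReuseDemo {
    // Creates a Document whose root element is created along with it.
    public static Document createWithImplementation(DocumentBuilder builder, String rootName) {
        DOMImplementation impl = builder.getDOMImplementation();
        // namespaceURI = null, qualifiedName = rootName, doctype = null
        return impl.createDocument(null, rootName, null);
    }

    // Parses several documents with one builder, resetting it before each use.
    public static int countParsed(DocumentBuilder builder, String... xmls) throws Exception {
        int count = 0;
        for (String xml : xmls) {
            builder.reset();
            Document doc = builder.parse(new InputSource(new StringReader(xml)));
            if (doc.getDocumentElement() != null) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        System.out.println(createWithImplementation(builder, "bookstore").getDocumentElement().getNodeName());
        System.out.println(countParsed(builder, "<a/>", "<b/>"));
    }
}
```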
DocumentBuilder is an abstract class; to obtain an instance you go through DocumentBuilderFactory. DocumentBuilderFactory is itself abstract, its concrete implementation can be located in several ways, and it provides two static methods for creating a DocumentBuilderFactory instance:
public static DocumentBuilderFactory newInstance();
public static DocumentBuilderFactory newInstance(String factoryClassName, ClassLoader classLoader);
The no-argument newInstance() method looks for the DocumentBuilderFactory implementation class in the following order:
1. Check whether a system property with the key javax.xml.parsers.DocumentBuilderFactory is defined. If it is, its value is used as the implementation class.
2. Check whether the properties file ${java.home}/lib/jaxp.properties (${java.home}/conf/jaxp.properties since JDK 9) defines the key javax.xml.parsers.DocumentBuilderFactory. If it does, that value is used as the implementation class.
3. Check whether a service-provider file META-INF/services/javax.xml.parsers.DocumentBuilderFactory exists on the current classpath (including inside jar files). If it does, the class named on its first line is used as the implementation class.
4. If none of the above yields a class, the default DocumentBuilderFactory implementation is used:
com.sun.org.apache.xerces.internal.jaxp.DocumentBuilderFactoryImpl
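Once the implementation class has been found, it is instantiated and returned as the DocumentBuilderFactory instance. This lookup mechanism closely resembles the way XMLReaderFactory locates an XMLReader implementation and the way commons-logging locates its LogFactory.
The newInstance() overload that takes arguments skips the lookup and creates the DocumentBuilderFactory instance directly from the given implementation class name and ClassLoader (a null ClassLoader means the current thread's context class loader).
Finally, setting the system property jaxp.debug to true turns on debugging output that traces this lookup.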
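To see which class the lookup actually picked on a given JVM, a quick sketch is to print the runtime class of the factory along with the step-1 system property. On a stock JDK with no overrides this should be the Xerces-derived default named above, but the exact output depends on the environment, so none is claimed here. FactoryLookupDemo is a hypothetical name.

```java
import javax.xml.parsers.DocumentBuilderFactory;

public class FactoryLookupDemo {
    // Returns the implementation class chosen by DocumentBuilderFactory.newInstance().
    public static String documentBuilderFactoryClass() {
        return DocumentBuilderFactory.newInstance().getClass().getName();
    }

    public static void main(String[] args) {
        // Step 1 of the lookup: null unless set with -D or System.setProperty.
        System.out.println("override: " + System.getProperty("javax.xml.parsers.DocumentBuilderFactory"));
        System.out.println("selected: " + documentBuilderFactoryClass());
    }
}
```

Running it with -Djaxp.debug=true additionally prints the lookup trace.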
As its name suggests, a DocumentBuilderFactory instance is used to obtain DocumentBuilder instances. It also provides methods for configuring the parsers it creates:
public abstract DocumentBuilder newDocumentBuilder();
public void setNamespaceAware(boolean awareness);
public boolean isNamespaceAware();
public void setValidating(boolean validating);
public boolean isValidating();
public void setIgnoringElementContentWhitespace(boolean whitespace);
public boolean isIgnoringElementContentWhitespace();
public void setExpandEntityReferences(boolean expandEntityRef);
public boolean isExpandEntityReferences();
public void setIgnoringComments(boolean ignoreComments);
public boolean isIgnoringComments();
public void setCoalescing(boolean coalescing);
public boolean isCoalescing();
public void setXIncludeAware(final boolean state);
public boolean isXIncludeAware();
public abstract void setAttribute(String name, Object value);
public abstract Object getAttribute(String name);
public abstract void setFeature(String name, boolean value);
public abstract boolean getFeature(String name);
public Schema getSchema();
public void setSchema(Schema schema);
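A sketch of a few of these settings working together, assuming the input mixes a comment, plain text and a CDATA section: with setIgnoringComments(true) no Comment node is created, and with setCoalescing(true) the CDATA section becomes text merged into the adjacent Text node, so the element ends up with a single child "xy". XMLConstants.FEATURE_SECURE_PROCESSING is a standard JAXP feature; the class and helper names are hypothetical.

```java
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

public class FactoryConfigDemo {
    public static DocumentBuilderFactory newConfiguredFactory() throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);   // report namespace URIs and local names
        factory.setIgnoringComments(true); // do not create Comment nodes
        factory.setCoalescing(true);       // turn CDATA into text and merge it with adjacent text
        // Standard JAXP feature that enables implementation-defined processing limits.
        factory.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true);
        return factory;
    }

    public static Element parseRoot(DocumentBuilderFactory factory, String xml) throws Exception {
        Document doc = factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
        return doc.getDocumentElement();
    }

    public static void main(String[] args) throws Exception {
        Element a = parseRoot(newConfiguredFactory(), "<a><!--note-->x<![CDATA[y]]></a>");
        System.out.println(a.getChildNodes().getLength()); // 1
        System.out.println(a.getFirstChild().getNodeValue()); // xy
    }
}
```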
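After creating a DocumentBuilderFactory and using it to create a DocumentBuilder, you can use that DocumentBuilder to parse an XML file into a Document instance, and then traverse or search the DOM tree through that Document to get the information you need. The following example traverses the DOM tree and creates one Book instance per book element:
import java.io.File;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.w3c.dom.Text;
// Book and Book.TitleInfo are the classes defined at the beginning of this article
public class W3CBooksDOMReader {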
private static DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
private String booksXmlFile;
public W3CBooksDOMReader(String booksXmlFile) {
this.booksXmlFile = booksXmlFile;
}
public List<Book> parse() {
Document doc = parseXmlFile();
Element root = doc.getDocumentElement();
NodeList nodes = root.getElementsByTagName("book");
List<Book> books = new ArrayList<Book>();
for(int i = 0; i < nodes.getLength(); i++) {
books.add(parseBookElement((Element)nodes.item(i)));
}
return books;
}
private Document parseXmlFile() {
File xmlFile = new File(booksXmlFile);
if(!xmlFile.exists()) {
throw new RuntimeException("Cannot find xml file: " + booksXmlFile);
}
try {
DocumentBuilder builder = factory.newDocumentBuilder();
return builder.parse(xmlFile);
} catch(Exception ex) {
throw new RuntimeException("Failed to parse xml file: " + booksXmlFile, ex);
}
}
private Book parseBookElement(Element bookElement) {
String category = bookElement.getAttribute("category");
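// getAttribute returns "" for a missing attribute, so map an absent cover to null
String cover = bookElement.hasAttribute("cover") ? bookElement.getAttribute("cover") : null;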
NodeList nodes = bookElement.getElementsByTagName("title");
String lang = ((Element)nodes.item(0)).getAttribute("lang");
// First way to get content of an Element
String title = ((Text)((Element)nodes.item(0)).getFirstChild()).getData().trim();
List<String> authors = new ArrayList<String>();
nodes = bookElement.getElementsByTagName("author");
for(int i = 0; i < nodes.getLength(); i++) {
// Second way to get content of an Element
String author = nodes.item(i).getTextContent().trim();
authors.add(author);
}
nodes = bookElement.getElementsByTagName("year");
int year = Integer.parseInt(nodes.item(0).getTextContent().trim());
nodes = bookElement.getElementsByTagName("price");
double price = Double.parseDouble(nodes.item(0).getTextContent().trim());
Book book = new Book();
book.setCategory(category);
book.setCover(cover);
book.setTitle(new TitleInfo(title, lang));
book.setAuthors(authors);
book.setYear(year);
book.setPrice(price);
return book;
}
public String getBooksXmlFile() {
return booksXmlFile;
}
public static void main(String[] args) {
W3CBooksDOMReader reader = new W3CBooksDOMReader("resources/xmlfiles/w3c_books.xml");
List<Book> books = reader.parse();
System.out.println("result:");
for(Book book : books) {
System.out.println(book);
}
}
}
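One caveat about the traversal above: getElementsByTagName matches all descendants, not just direct children, which happens to be fine for books.xml but can over-match in deeper documents. A minimal sketch of a direct-children-only alternative, which also skips the whitespace Text nodes between elements; ChildElementsDemo and its helpers are hypothetical names.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class ChildElementsDemo {
    // Returns only the direct child elements of parent that have the given tag name.
    public static List<Element> childElements(Element parent, String tagName) {
        List<Element> result = new ArrayList<>();
        NodeList children = parent.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            // Skip Text (e.g. indentation), Comment and other non-element nodes.
            if (child.getNodeType() == Node.ELEMENT_NODE && tagName.equals(child.getNodeName())) {
                result.add((Element) child);
            }
        }
        return result;
    }

    public static Element parseRoot(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml))).getDocumentElement();
    }

    public static void main(String[] args) throws Exception {
        Element root = parseRoot("<bookstore>\n  <book><title>A</title></book>\n"
                + "  <shelf><book><title>B</title></book></shelf>\n</bookstore>");
        System.out.println(root.getElementsByTagName("book").getLength()); // 2 (all descendants)
        System.out.println(childElements(root, "book").size());            // 1 (direct children only)
    }
}
```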
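Building XML Files with DOM
To serialize object instances into an XML file, you first build a DOM tree, that is, a Document instance, and then write that Document to the XML file. As described in the previous section, DocumentBuilder creates the Document instance; you then create nodes from the objects (the Book instances) and arrange them into the required XML structure, which is not described in detail here.
The other problem to solve when serializing objects to XML is how to write the Document instance to a given XML file. Java provides the Transformer class for this. It belongs to XSLT (Extensible Stylesheet Language Transformations), which this article does not cover in detail; the focus here is only on writing a Document instance out as an XML file.
Transformer's transform method converts a source (here, the Document) into the given result target, such as a stream:
public abstract void transform(Source xmlSource, Result outputTarget);
Source defines the input: it can be a DOMSource, a SAXSource, or some other Source implementation; this article covers DOMSource. The Source interface defines a systemId property that gives the location of the XML source; it is null when the source was not obtained from a URL. The definition is: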
public interface Source {
public void setSystemId(String systemId);
public String getSystemId();
}
DOMSource is a concrete implementation of Source that takes a Node and, optionally, a systemId:
public class DOMSource implements Source {
private Node node;
private String systemID;
public DOMSource() { }
public DOMSource(Node n) {
setNode(n);
}
public DOMSource(Node node, String systemID) {
setNode(node);
setSystemId(systemID);
}
...
}
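Because DOMSource accepts any Node, not only a Document, the same transform call can serialize just one subtree. A minimal sketch, assuming the no-argument newTransformer() (an identity transform) and an omitted XML declaration; SubtreeSerializeDemo and its helpers are hypothetical names.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class SubtreeSerializeDemo {
    public static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    // Serializes any node (an Element here) with the identity transformer.
    public static String serialize(Node node) throws Exception {
        Transformer t = TransformerFactory.newInstance().newTransformer();
        t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        t.transform(new DOMSource(node), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse("<bookstore><book category=\"web\"><title>Learning XML</title></book></bookstore>");
        Node book = doc.getElementsByTagName("book").item(0);
        System.out.println(serialize(book)); // only the <book> subtree, without <bookstore>
    }
}
```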
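Result abstracts the output target, that is, where the transformed source is written. Like Source, the Result interface defines a systemId property giving the location of the target; if the target is not a URL, the value is null: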
public interface Result {
public void setSystemId(String systemId);
public String getSystemId();
}
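The JDK provides several Result implementations, such as DOMResult and StreamResult. Only StreamResult is covered here: its output target is a stream, and you can supply a Writer, an OutputStream and so on to receive the output: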
public class StreamResult implements Result {
public StreamResult() {
}
public StreamResult(OutputStream outputStream) {
setOutputStream(outputStream);
}
public StreamResult(Writer writer) {
setWriter(writer);
}
public StreamResult(String systemId) {
this.systemId = systemId;
}
public StreamResult(File f) {
setSystemId(f.toURI().toASCIIString());
}
...
private String systemId;
private OutputStream outputStream;
private Writer writer;
}
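The choice between the OutputStream and Writer constructors matters for character encoding. With an OutputStream, the transformer encodes characters itself according to the ENCODING output property; with a Writer, the Writer's own charset decides the bytes and ENCODING only affects what the XML declaration claims. A minimal sketch of the OutputStream route, assuming UTF-8 output and a title containing a non-ASCII character; the class name and sample text are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class StreamResultEncodingDemo {
    // Writes to an OutputStream, so the transformer itself encodes the text as UTF-8.
    public static byte[] toUtf8Bytes(Document doc) throws Exception {
        Transformer t = TransformerFactory.newInstance().newTransformer();
        t.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        t.transform(new DOMSource(doc), new StreamResult(out));
        return out.toByteArray();
    }

    public static Document sampleDoc() throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        Element title = doc.createElement("title");
        title.setTextContent("Caf\u00e9 Cooking"); // contains a non-ASCII character
        doc.appendChild(title);
        return doc;
    }

    public static void main(String[] args) throws Exception {
        byte[] bytes = toUtf8Bytes(sampleDoc());
        System.out.println(new String(bytes, StandardCharsets.UTF_8));
    }
}
```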
Besides transform, Transformer provides other methods for configuring the information it uses during a transformation (only the signatures are listed here):
public void reset();
public abstract void setParameter(String name, Object value);
public abstract Object getParameter(String name);
public abstract void clearParameters();
public abstract void setURIResolver(URIResolver resolver);
public abstract URIResolver getURIResolver();
public abstract void setOutputProperties(Properties oformat);
public abstract Properties getOutputProperties();
public abstract void setOutputProperty(String name, String value);
public abstract String getOutputProperty(String name);
public abstract void setErrorListener(ErrorListener listener);
public abstract ErrorListener getErrorListener();
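Of these, setOutputProperty is the one most often needed when writing XML files. A sketch, assuming the standard OutputKeys constants plus one namespace-qualified, implementation-specific key ({http://xml.apache.org/xslt}indent-amount) that the JDK's built-in transformer uses for the indent width and that other implementations may ignore. OutputPropertiesDemo is a hypothetical name.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class OutputPropertiesDemo {
    public static String toXml(Document doc, boolean omitDeclaration, boolean indent) throws Exception {
        Transformer t = TransformerFactory.newInstance().newTransformer();
        t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, omitDeclaration ? "yes" : "no");
        t.setOutputProperty(OutputKeys.INDENT, indent ? "yes" : "no");
        // Implementation-specific key; being namespace-qualified, it is accepted
        // even by implementations that do not use it.
        t.setOutputProperty("{http://xml.apache.org/xslt}indent-amount", "2");
        StringWriter out = new StringWriter();
        t.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    public static Document sampleDoc() throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        Element root = doc.createElement("bookstore");
        root.appendChild(doc.createElement("book"));
        doc.appendChild(root);
        return doc;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(toXml(sampleDoc(), true, true));
        System.out.println(toXml(sampleDoc(), false, false));
    }
}
```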
Like DocumentBuilder, Transformer is created by a factory, TransformerFactory. TransformerFactory instances are created and located the same way as DocumentBuilderFactory instances, except that the property name is javax.xml.transform.TransformerFactory and the default implementation class is com.sun.org.apache.xalan.internal.xsltc.trax.TransformerFactoryImpl. It likewise provides two methods for obtaining a TransformerFactory instance, which are not described further here:
public static TransformerFactory newInstance();
public static TransformerFactory newInstance(String factoryClassName, ClassLoader classLoader);
TransformerFactory provides methods for creating Transformer and Templates instances, along with configuration methods that apply when creating them:
public abstract Transformer newTransformer(Source source);
public abstract Transformer newTransformer();
public abstract Templates newTemplates(Source source);
public abstract Source getAssociatedStylesheet(Source source, String media,
String title, String charset);
public abstract void setURIResolver(URIResolver resolver);
public abstract URIResolver getURIResolver();
public abstract void setFeature(String name, boolean value);
public abstract boolean getFeature(String name);
public abstract void setAttribute(String name, Object value);
public abstract Object getAttribute(String name);
public abstract void setErrorListener(ErrorListener listener);
public abstract ErrorListener getErrorListener();
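Although XSLT itself is outside the scope of this article, the difference between newTransformer(Source) and newTemplates(Source) is worth a small sketch: a Templates object is the compiled form of a stylesheet and is thread-safe, while a Transformer is not, so a common pattern is to compile once and create a Transformer per use. The stylesheet below (listing book titles separated by ";") and the TemplatesDemo name are hypothetical.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Templates;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class TemplatesDemo {
    private static final String XSLT =
            "<xsl:stylesheet version=\"1.0\" xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
            + "<xsl:output method=\"text\"/>"
            + "<xsl:template match=\"/\">"
            + "<xsl:for-each select=\"bookstore/book\"><xsl:value-of select=\"title\"/>;</xsl:for-each>"
            + "</xsl:template>"
            + "</xsl:stylesheet>";

    // Compiled once; Templates may be shared between threads.
    private static final Templates TEMPLATES;

    static {
        try {
            TEMPLATES = TransformerFactory.newInstance()
                    .newTemplates(new StreamSource(new StringReader(XSLT)));
        } catch (Exception ex) {
            throw new ExceptionInInitializerError(ex);
        }
    }

    // A fresh Transformer per call, because Transformer instances are not thread-safe.
    public static String titles(String booksXml) throws Exception {
        Transformer t = TEMPLATES.newTransformer();
        StringWriter out = new StringWriter();
        t.transform(new StreamSource(new StringReader(booksXml)), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(titles("<bookstore><book><title>A</title></book>"
                + "<book><title>B</title></book></bookstore>")); // A;B;
    }
}
```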
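Finally, here is a complete example that serializes the List<Book> instance built at the beginning of this article to XML (written to a StringWriter and printed to the console):
import java.io.StringWriter;
import java.io.Writer;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
// Book, Book.TitleInfo and W3CBooksBuilder are the classes defined at the beginning of this article
public class W3CBooksDOMWriter {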
private static DocumentBuilder docBuilder;
private static Transformer transformer;
static {
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
try {
docBuilder = factory.newDocumentBuilder();
} catch(Exception ex) {
throw new RuntimeException("Create DocumentBuilder instance failed.", ex);
}
TransformerFactory transFactory = TransformerFactory.newInstance();
try {
transformer = transFactory.newTransformer();
} catch(Exception ex) {
throw new RuntimeException("Create Transformer instance failed.", ex);
}
transformer.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
transformer.setOutputProperty(OutputKeys.INDENT, "yes");
}
private List<Book> books;
public W3CBooksDOMWriter(List<Book> books) {
this.books = books;
}
public void toXml(Writer writer) throws Exception {
Document doc = buildDOMTree();
writeToXmlFile(writer, doc);
}
public Document buildDOMTree() {
Document doc = docBuilder.newDocument();
Element root = doc.createElement("bookstore");
doc.appendChild(root);
for(Book book : books) {
Element bookElement = buildBookElement(doc, book);
root.appendChild(bookElement);
}
return doc;
}
public Element buildBookElement(Document doc, Book book) {
Element bookElement = doc.createElement("book");
bookElement.setAttribute("category", book.getCategory());
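// cover is optional (only "Learning XML" has one), so only write it when it is set
if (book.getCover() != null) {
bookElement.setAttribute("cover", book.getCover());
}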
TitleInfo title = book.getTitle();
Element titleElement = doc.createElement("title");
titleElement.setAttribute("lang", title.getLang());
titleElement.setTextContent(title.getTitle());
bookElement.appendChild(titleElement);
for(String author : book.getAuthors()) {
Element authorElement = doc.createElement("author");
authorElement.setTextContent(author);
bookElement.appendChild(authorElement);
}
Element yearElement = doc.createElement("year");
yearElement.setTextContent(String.valueOf(book.getYear()));
bookElement.appendChild(yearElement);
Element priceElement = doc.createElement("price");
priceElement.setTextContent(String.valueOf(book.getPrice()));
bookElement.appendChild(priceElement);
return bookElement;
}
public void writeToXmlFile(Writer writer, Document doc) throws Exception {
DOMSource source = new DOMSource(doc);
StreamResult result = new StreamResult(writer);
transformer.transform(source, result);
}
public static void main(String[] args) throws Exception {
StringWriter writer = new StringWriter();
List<Book> books = W3CBooksBuilder.buildBooks();
W3CBooksDOMWriter domWriter = new W3CBooksDOMWriter(books);
domWriter.toXml(writer);
System.out.println(writer.toString());
}
}