
    gembin


    Why And How To Use PDOM: A Persistent W3C DOM API

    What is PDOM?

    PDOM stands for Persistent Document Object Model.

    PDOM implements the W3C DOM API. Just as MiniDOM solves SAX processing problems, PDOM solves DOM scalability problems by providing a persistent implementation of the DOM API.

    An enhanced implementation of XPath makes the stored documents easy to query.

    PDOM's implementation exploits the capabilities of GPO, the Generic Persistent Object model.

    Is it straightforward to use?

    It is simplest to give an example. Here is some Java code:

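    import cutthecrap.pdom.*;

    // Open the PDOM system backed by the given datastore file.
    PDOMClient client = new PDOMClient(null, "D:/testxml/pdom.rw");
    PDOM pdom = client.getPDOM();

    // Import the XML file and store it persistently under the name "Opera".
    PDocument doc = pdom.createDocument("Opera", "D:/testxml/opera.xtm");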

    The PDocument class implements the org.w3c.dom.Document interface.

    You should be able to see from the above code that a PDOM system can contain many DOM documents.

    At some later stage, the persistent document could be retrieved:

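    import cutthecrap.pdom.*;

    // Reopen the same datastore file.
    PDOMClient client = new PDOMClient(null, "D:/testxml/pdom.rw");
    PDOM pdom = client.getPDOM();

    // Retrieve the stored document by name; the XML file is not parsed again.
    PDocument doc = pdom.getDocument("Opera");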

    So PDOM provides a persistent DOM repository that can both manage individual huge XML documents and store perhaps millions of separate XML documents.

    Once a PDocument has been returned, the Document interface can be used to navigate to the contained nodes.
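    Because PDocument implements org.w3c.dom.Document, navigation code can be written purely against the standard interfaces. The sketch below is a minimal illustration of that idea. It uses the JDK's built-in parser as a stand-in for a PDocument, since PDOM itself is not assumed to be on the classpath, and the element names and the helper names (parse, childIds) are invented for the example.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class DomNavigationSketch {

    // Stand-in for obtaining a document: parses XML text with the JDK parser.
    // With PDOM, the Document would come from pdom.getDocument(name) instead.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse sample XML", e);
        }
    }

    // Collects the "id" attribute of every direct child element of the
    // document element that has the given tag name.
    static List<String> childIds(Document doc, String tagName) {
        List<String> ids = new ArrayList<>();
        NodeList children = doc.getDocumentElement().getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child.getNodeType() == Node.ELEMENT_NODE
                    && tagName.equals(child.getNodeName())) {
                ids.add(((Element) child).getAttribute("id"));
            }
        }
        return ids;
    }

    public static void main(String[] args) {
        Document doc = parse(
                "<topicMap><topic id=\"puccini\"/><topic id=\"tosca\"/></topicMap>");
        System.out.println(childIds(doc, "topic"));
    }
}
```

    Only childIds touches the document, and it depends on nothing beyond org.w3c.dom, so the same method should accept any Document implementation.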

    In addition to the org.w3c.dom interfaces, support is also provided for XPath-based queries.

    What PDOM Isn't

    PDOM has not been developed to provide a rigorous implementation of the full W3C DOM model. It does not currently support DTDs and there are no immediate plans to do so.

    One example of this is that PDOM automatically recognises the "id" attribute as the identity of an element, which is then accessible using document.getElementById, whereas the standard specifies that the DTD must indicate which attribute identifies a specific element type.

    Also, by default, text nodes are not added if they contain only whitespace, although this behaviour can be overridden when an XML document is imported.
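    For comparison, the following sketch shows how a stock DOM parser treats the same two points. It uses only the JDK's built-in parser, not PDOM, and the sample XML and class name are made up for the illustration. Without a DTD, getElementById finds nothing until the attribute is flagged as an ID by hand (setIdAttribute, from DOM Level 3), and whitespace-only text nodes are kept as children.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

class StandardDomBehaviourSketch {

    // One element with an "id" attribute, surrounded by whitespace; no DTD.
    static final String XML = "<root>\n  <item id=\"a\"/>\n</root>";

    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse sample XML", e);
        }
    }

    public static void main(String[] args) {
        Document doc = parse(XML);
        Element root = doc.getDocumentElement();

        // No DTD declares "id" as an ID attribute, so the lookup returns null.
        System.out.println("before: " + doc.getElementById("a"));

        // Whitespace-only text nodes are kept: text, <item>, text.
        System.out.println("children: " + root.getChildNodes().getLength());

        // Flag the attribute as an ID by hand; the lookup now succeeds.
        Element item = (Element) root.getElementsByTagName("item").item(0);
        item.setIdAttribute("id", true);
        System.out.println("after: " + doc.getElementById("a").getTagName());
    }
}
```

    PDOM's defaults, as described above, remove both of these steps: the "id" attribute is recognised without a DTD, and the whitespace-only children are not stored.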

    XPath

    Support is provided for using XPath to return nodes from the DOM.

    PNode someNode = doc.getElementById("someId");
    XPathQuery query = new XPath(".//baseNameString/text()");

    query.setContext(someNode);

    Iterator nodes = query.execute();

    while (nodes.hasNext()) {
        Text txt = (Text) nodes.next();

        System.out.println("baseNameString : " + txt.getNodeValue());
    }

    A number of utility methods are provided to make this even simpler, for example:

    PNode someNode = doc.getElementById("someId");
    Iterator nodes = someNode.queryXPath(".//baseNameString/text()");

    This will produce the same result.

    Creating XPathQuery objects directly may have some advantages, though. For example, a query object can be passed as an argument to a method and applied to other nodes chosen at run time, simply by calling setContext for each node to be queried against.
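    The same pattern exists in the standard javax.xml.xpath API, which may make the idea easier to try out without PDOM. The sketch below is an analogue under that assumption, not PDOM's own XPathQuery class: an XPathExpression is compiled once and then evaluated against several context nodes. The sample XML and the helper names (compile, textValues) are invented for the example.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class ReusableXPathSketch {

    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse sample XML", e);
        }
    }

    // Compiles the expression once so it can be reused for many context nodes.
    static XPathExpression compile(String expression) {
        try {
            return XPathFactory.newInstance().newXPath().compile(expression);
        } catch (Exception e) {
            throw new IllegalArgumentException("bad XPath: " + expression, e);
        }
    }

    // Evaluates an already compiled query relative to one context node
    // and returns the node values of the matches.
    static List<String> textValues(XPathExpression query, Node context) {
        try {
            NodeList hits = (NodeList) query.evaluate(context, XPathConstants.NODESET);
            List<String> values = new ArrayList<>();
            for (int i = 0; i < hits.getLength(); i++) {
                values.add(hits.item(i).getNodeValue());
            }
            return values;
        } catch (Exception e) {
            throw new IllegalStateException("XPath evaluation failed", e);
        }
    }

    public static void main(String[] args) {
        Document doc = parse("<topicMap>"
                + "<topic><baseNameString>Tosca</baseNameString></topic>"
                + "<topic><baseNameString>Aida</baseNameString></topic>"
                + "</topicMap>");
        XPathExpression query = compile(".//baseNameString/text()");

        // The same compiled query is applied to each topic in turn.
        NodeList topics = doc.getElementsByTagName("topic");
        for (int i = 0; i < topics.getLength(); i++) {
            System.out.println(textValues(query, topics.item(i)));
        }
    }
}
```

    Passing the compiled query around as a value keeps the expression in one place while the calling code decides which nodes to apply it to.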

    XPath [Predicates]

    The XPath support now also includes predicates; previously it was limited to object navigation. For example:

    PElement root = (PElement) doc.getDocumentElement();

    nodes = root.queryXPath(".//instanceOf/topicRef[starts-with(@xlink:href,'#wri')]");

    or:

    nodes = root.queryXPath(".//instanceOf/topicRef[string-length(@xlink:href)=9]");

    It should be stressed, though, that XPath access should not be "abused". An ill-considered XPath query may traverse the entire XML tree where a more focussed query could, and should, be used.
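    To make the difference between a broad and a focussed query concrete, here is a small sketch using the standard javax.xml.xpath API rather than PDOM's own XPath classes. It is an illustration under assumptions: the sample XML is invented, and a plain href attribute replaces xlink:href so that no namespace setup is needed. Both forms of the starts-with query select the same nodes, but the ".//" form asks the engine to consider every descendant, while the second names the exact path from the context node.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class XPathPredicateSketch {

    static final String XML = "<topicMap>"
            + "<topic id=\"tosca\"><instanceOf><topicRef href=\"#opera\"/></instanceOf></topic>"
            + "<topic id=\"illica\"><instanceOf><topicRef href=\"#writer\"/></instanceOf></topic>"
            + "<topic id=\"sardou\"><instanceOf><topicRef href=\"#writer\"/></instanceOf></topic>"
            + "</topicMap>";

    // Counts the nodes selected by the expression, evaluated relative to
    // the document element of the sample XML.
    static int count(String expression) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(XML)));
            NodeList hits = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate(expression, doc.getDocumentElement(), XPathConstants.NODESET);
            return hits.getLength();
        } catch (Exception e) {
            throw new IllegalStateException("query failed: " + expression, e);
        }
    }

    public static void main(String[] args) {
        // Broad: ".//" searches all descendants of the context node.
        System.out.println(count(".//instanceOf/topicRef[starts-with(@href,'#wri')]"));

        // Predicate on attribute length: only "#opera" has 6 characters.
        System.out.println(count(".//instanceOf/topicRef[string-length(@href)=6]"));

        // Focussed: names each step from the context node explicitly.
        System.out.println(count("topic/instanceOf/topicRef[starts-with(@href,'#wri')]"));
    }
}
```

    On a three-topic sample the cost difference is invisible, but on a very large document the focussed form avoids visiting subtrees that cannot contain a match.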

    Performance

    PDOM is built using the Generic Persistent Object Model. No special optimization has been carried out to minimise storage requirements for the PDOM data model.

    If an in-memory system is specified, PDOM requires over twice the Java memory that the Xerces DOM needs to store the same data. For example:

    Source XML    Xerces    PDOM (memory based)
    ----------    ------    -------------------
    523K          4.8Mb     10.2Mb

    The figures for the in-memory PDOM representation are a little disappointing. It would have been nice to show a broad equivalence with Xerces for in-memory options. Xerces is also significantly quicker than PDOM at parsing the document.

    It should be stressed that these figures demonstrate what an excellent product Xerces is. PDOM uses a generic representation that requires many Java objects. The "bloat" in the PDOM memory usage is mostly explained by the overhead associated with any object instance.

    However, if the PDOM is stored persistently, the memory requirement drops. Here are the figures for the PDOM memory requirements and the datastore disk space:

    Source XML    PDOM      GPO Datastore
    ----------    ------    -------------
    523K          1.9Mb     1.6Mb

    You may find it odd that the datastore is so small. This is achieved by various optimizations that ensure the object data is packed efficiently.

    Clearly, as objects are read in, the PDOM Java memory requirement will increase, particularly if the application retains references to many objects.

    It should be emphasised that the PDOM memory increases only very slightly as the source XML becomes bigger, while the backing datastore will be approximately three times the size of the source XML.

    Scalability

    The main reason to use PDOM is scalability. For small DOMs Xerces is an excellent choice, and its parsing performance is particularly impressive, but if you cannot predict what size the DOM will be, then PDOM provides a scalable solution.

    If you read in a 300Mb XML file, the Xerces DOM will require a Java VM of around two gigabytes just to hold the data, while PDOM would process the file quite happily with a backing store of around 1Gb, even with a Java VM limited to 10Mb.

    Furthermore, processing a 300Mb XML file will take Xerces a considerable time, assuming the memory is available. Processing with PDOM will also take some time, perhaps several times longer than Xerces would (if Xerces is able to process the file at all), but thereafter the DOM can be accessed directly rather than having to reprocess the file.

    As I do not have a 300Mb XML file to hand, here are some figures for a 5Mb file.

    Source XML    Xerces    PDOM      GPO Datastore
    ----------    ------    ------    -------------
    5.2Mb         21Mb      1.9Mb     13Mb

    When PDOM was used to produce these figures, I ran it with:

    java -mx10M

    This limits the Java heap to a maximum of 10Mb (current java launcher documentation spells this option -Xmx10M). The overhead will effectively remain constant no matter how big the DOM documents are or how many of them are stored in the datastore.
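    As a quick sanity check on the figures quoted in this article, the arithmetic can be written out. The sketch below only divides the reported sizes. It takes 523K as roughly 0.523Mb, which is an approximation, and the class and method names are made up for the example. The datastore comes out at roughly 3.1 and 2.5 times the source size for the two measured files, which is in line with the "approximately three times" rule of thumb and with a backing store of around 1Gb for a 300Mb file.

```java
import java.util.Locale;

class StorageRatioSketch {

    // Ratio of a reported size to the source XML size (same unit for both).
    static double ratio(double size, double sourceSize) {
        return size / sourceSize;
    }

    // Estimated datastore size for a source file, given an assumed ratio.
    static double estimateDatastore(double sourceSize, double assumedRatio) {
        return sourceSize * assumedRatio;
    }

    public static void main(String[] args) {
        // Sizes in Mb as reported in the article; 523K is taken as 0.523Mb.
        double smallSource = 0.523, smallStore = 1.6;
        double largeSource = 5.2, largeXerces = 21.0, largeStore = 13.0;

        System.out.println(String.format(Locale.ROOT,
                "small file, datastore/source: %.2f", ratio(smallStore, smallSource)));
        System.out.println(String.format(Locale.ROOT,
                "large file, datastore/source: %.2f", ratio(largeStore, largeSource)));
        System.out.println(String.format(Locale.ROOT,
                "large file, Xerces/source:    %.2f", ratio(largeXerces, largeSource)));

        // Extrapolation using the article's "approximately three times" rule.
        System.out.println(String.format(Locale.ROOT,
                "estimated datastore for 300Mb: %.0fMb", estimateDatastore(300.0, 3.0)));
    }
}
```

    These are two data points from one machine, so the ratios are best read as an order-of-magnitude guide rather than a guarantee.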

    Summary

    PDOM solves the problem of using the standard DOM API to access huge XML data files.

    The persistent DOM allows for XML files to be parsed once, and thereafter retrieved by name.

    The resource overhead that huge in-memory DOMs place on the Java VM, and on OS virtual memory, is removed.

    How Can I Get PDOM?

    PDOM is provided as part of the full Cut The Crap distribution and can be downloaded from www.cutthecrap.biz/software/downloads.html along with other Cut The Crap software.

    posted on 2008-07-29 17:18 by gembin. Category: XML
