
    Level: Introductory

    Uche Ogbuji (uche.ogbuji@fourthought.com), Principal Consultant, Fourthought, Inc

    31 Jan 2006

    The use of XML has become widespread, but much of it is not well formed. When it is well formed, it's often of poor design, which makes processing and maintenance very difficult. And much of the infrastructure for serving XML can compound these problems. In response, there has been some public discussion of XML best practices, such as Henri Sivonen's document, "HOWTO Avoid Being Called a Bozo When Producing XML." Uche Ogbuji frequently discusses XML best practices on IBM developerWorks, and in this column, he gives you his opinion about the main points discussed in such articles.

    I have been discussing XML best practices in this column and in other series for years. Others, such as fellow columnist Elliotte Rusty Harold, have covered the topic as well. The more XML experts who join the discussion of XML design principles, the better, so that the community can converge on solid advice for developers at all levels of XML adoption. In this article, I use two documents, one recent and one classic, to go into more detail about XML best practices.

    Enter the no bozo zone

    Henri Sivonen wrote a useful article, "HOWTO Avoid Being Called a Bozo When Producing XML" (see Resources). Adopting the perspective of XML-based Web feed formats, such as RSS and Atom, he goes over his Dos and Don'ts for producing well-formed XML with namespaces. As he says in his introduction:

    There seem to be developers who think that well-formedness is awfully hard -- if not impossible -- to get right when producing XML programmatically and developers who can get it right and wonder why the others are so incompetent. I assume no one wants to appear incompetent or to be called names. Therefore, I hope the following list of Dos and Don'ts helps developers to move from the first group to the latter.

    The first bit of advice Henri gives is, "Don't think of XML as a text format." I think this is dangerous advice. Certainly his main point is valid: you cannot be as careless in producing or editing XML as you might be with a simple text document. But this applies to all text formats with any structure. Saying that XML is not text denies one of the most important characteristics of XML, one that is enshrined in the very definition of XML in the specification. ("A textual object is a well-formed XML document [if it conforms to this specification.]") Henri's statement is also confusing because XML has a technical definition of text, which is essentially the sequence of characters interpreted as XML. Text is not merely what goes within leaf elements or within attributes -- technically called character data. Text is the fundamental fabric of all XML entities, so to say that XML is not text is a contradiction. I think it's more useful to highlight the specific ways in which XML differs from text formats with which developers might already be familiar.

    This comment is an example of how Henri's advice is colored by his interest in the problem of generating well-formed Web feeds. He is right to warn people that carelessly slapping strings together and hoping they are well formed is a dangerous course. I too have written articles advising people to use mature XML toolkits rather than simple text tools when generating XML (see Resources). My concern is that the way Henri couches this advice is a bit confusing and could be misconstrued in the broader context of XML processing. He reiterates his advice in the sections "Don't use text-based templates" and "Don't print." I think this should be summarized as: "Do not use mechanisms that you're not sure will result in well-formed XML." That's very important advice indeed. One approach to safe XML generation is sending SAX events, as Henri suggests in "Use a tree or a stack (or an XML parser)." If you do so, however, do not assume you are home free. The SAX tools you use might not do all the necessary well-formedness checking. For example, some Unicode characters, including most ASCII control characters, are not allowed in XML 1.0 documents. You may need an additional level of checking to account for such issues.
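
    A minimal sketch of this advice, using Python's standard library rather than any tool named above. It sends SAX events to xml.sax.saxutils.XMLGenerator, which escapes markup characters, and adds a separate check against the Char production of XML 1.0 in case the generator does not reject disallowed characters on its own. The helper names check_xml_chars and make_title are made up for illustration.

```python
import io
import re
from xml.sax.saxutils import XMLGenerator

# Matches anything outside the Char production of XML 1.0:
# #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
_NOT_XML10_CHAR = re.compile(
    "[^\u0009\u000A\u000D\u0020-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def check_xml_chars(text):
    """Hypothetical helper: return text unchanged, or raise if it cannot appear in XML 1.0."""
    bad = _NOT_XML10_CHAR.search(text)
    if bad:
        raise ValueError("U+%04X is not allowed in XML 1.0" % ord(bad.group()))
    return text


def make_title(title):
    """Hypothetical helper: serialize one <title> element by sending SAX events."""
    out = io.StringIO()
    gen = XMLGenerator(out, encoding="utf-8")
    gen.startDocument()
    gen.startElement("title", {})
    # The generator escapes & and <; the extra check covers illegal characters.
    gen.characters(check_xml_chars(title))
    gen.endElement("title")
    gen.endDocument()
    return out.getvalue()


print(make_title("Tom & Jerry <live>"))
```

    Calling make_title with a string that contains a form feed or a NUL character raises ValueError here instead of emitting a document that a conforming parser would reject.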

    Henri rightly suggests that users not try to manage namespaces by hand. As I've discussed on developerWorks, XML namespaces require a great deal of care. His suggestion that developers think only in terms of the universal name [namespace Uniform Resource Identifier (URI) plus local name] is generally sound, but sometimes a developer cannot avoid dealing with prefixes or namespace declarations. In specifications such as XSLT, a QName (prefix/local name combination) can be used within attribute values, and the prefix is supposed to be interpreted according to the in-scope namespace declarations. This kind of pattern is called a QName in context. In this case, the developer must have control over the declared prefix or the resulting XML processing will fail. When developers do manage their own namespace declarations, the result is often messy because of the complexities of XML namespaces.
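
    The difference between a universal name and a QName in context can be sketched with Python's xml.etree.ElementTree, which reports element names in {namespace URI}local-name form. The namespace URI, the select attribute, and the two helper functions below are invented for the example.

```python
import xml.etree.ElementTree as ET

NS = "http://example.com/ns"  # made-up namespace URI for illustration

# Same element in the same namespace, written with two different prefixes.
DOC_A = '<a:item xmlns:a="%s" select="a:part"/>' % NS
DOC_B = '<b:item xmlns:b="%s" select="a:part"/>' % NS


def universal_name(xml_text):
    """Return the root element's name as {namespace URI}local-name."""
    return ET.fromstring(xml_text).tag


def qname_in_attribute(xml_text, attr="select"):
    """Return the raw attribute value; the parser does not resolve its prefix."""
    return ET.fromstring(xml_text).get(attr)


# The prefix choice disappears once names are compared as universal names.
print(universal_name(DOC_A) == universal_name(DOC_B))  # True

# The attribute value is plain text to the parser, so it still says "a:part"
# in DOC_B, where the prefix "a" is no longer declared.
print(qname_in_attribute(DOC_B))  # a:part
```

    The second result is the trap: a tool that renamed the prefix from a to b kept the element name intact but left the attribute value pointing at a prefix that no longer exists.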

    One way to clean up namespace syntax that might become messy while passing through a pipeline of XML processing is to add a canonicalization step at the end of the pipeline. XML canonicalization eliminates the syntactic variations permitted by XML 1.0 and XML namespaces, including different namespace declaration patterns. Canonicalization will not eliminate all the issues that make namespace declarations treacherous to developers. It does not help with QName-in-context problems because it does not change the prefixes used in a document, but it does reduce the mess of namespace declarations to the point where you can easily spot problems or even write code to fix them automatically. The GenX library, which is one of the XML generation options Henri suggests, automatically generates canonical XML, and many other toolkits provide canonicalization as an option.
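
    As one illustration of such a step, Python 3.8 and later ship xml.etree.ElementTree.canonicalize(), which implements C14N 2.0 (a later version of canonicalization than the one available when this column was written). The sketch below assumes the pipeline's output is available as a string; the two sample documents and the helper name canonical are invented.

```python
import xml.etree.ElementTree as ET  # canonicalize() needs Python 3.8 or later

# Two spellings of the same document. They differ in attribute order, quote
# style, empty-element syntax, and a redundant namespace declaration.
MESSY = """<x:doc xmlns:x="urn:example" b='2' a="1"><x:item xmlns:x="urn:example"/></x:doc>"""
TIDY = """<x:doc a="1" b="2" xmlns:x="urn:example"><x:item></x:item></x:doc>"""


def canonical(xml_text):
    """Hypothetical helper: return the canonical form of an XML string."""
    return ET.canonicalize(xml_text)


# After canonicalization the syntactic differences are gone.
print(canonical(MESSY) == canonical(TIDY))

# The prefix itself is kept, which is why QNames in context are not repaired.
print(canonical(MESSY))
```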

    Henri's advice about Unicode and character handling is almost completely sound. However, in "Avoid adding pretty-printing white space in character data," I think the case is a bit overstated. Pretty-printing is safe in most cases when the white space is added between elements, rather than within elements that contain character data. As Henri says, if you have the XML in Listing 1, it is usually not safe to render it as in Listing 2.


    Listing 1. XML sample

        <foo>bar</foo>


    Listing 2. XML sample with white space added to character data

        <foo>
          bar
        </foo>

    But it is usually safe to pretty-print the XML in Listing 3, so that the output is as in Listing 4.


    Listing 3. Another XML sample

        <doc><foo>bar</foo></doc>


    Listing 4. XML sample in Listing 3 with white space added between elements

        <doc>
          <foo>bar</foo>
        </doc>

    Many XML serializer tools understand this distinction between relatively safe and relatively unsafe pretty-printing. Be aware, though, that even the form of pretty-printing shown in Listings 3 and 4 can cause distortion if white space is added to mixed content. Such problems can be avoided if the serialization is guided by a schema. In practice, most vocabularies that use mixed content are not very sensitive to white space normalization, so don't worry too much about pretty-printing. You should be aware of the issues, and be sure there is an option to turn pretty-printing off (preferably, the default should be not to pretty-print). Henri recommends a pretty-printing practice as in Listing 5, but I disagree because I think it makes for ugly markup that is not friendly to manipulation by people.


    Listing 5. Pretty-printing convention suggested by Henri Sivonen but not recommended by this author

        <foo
            >bar</foo
        >
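
    The distinction drawn in Listings 1 through 4 can be tried out with one serializer that follows it, ElementTree's indent() function in Python 3.9 and later. This is only a sketch of that one tool's behavior, not a statement about serializers in general; the helper names pretty and text_of and the MIXED sample are invented.

```python
import xml.etree.ElementTree as ET  # ET.indent() needs Python 3.9 or later


def pretty(xml_text):
    """Hypothetical helper: pretty-print with the standard library's ET.indent()."""
    root = ET.fromstring(xml_text)
    ET.indent(root, space="  ")
    return ET.tostring(root, encoding="unicode")


def text_of(xml_text):
    """Hypothetical helper: concatenate all character data in the document."""
    return "".join(ET.fromstring(xml_text).itertext())


# Listing 1: no child elements, so the character data is left alone.
print(pretty("<foo>bar</foo>"))

# Listing 3: white space is added only between elements, as in Listing 4.
print(pretty("<doc><foo>bar</foo></doc>"))

# Mixed content: the added white space now sits between "a" and "b".
MIXED = "<p><b>a</b><i>b</i></p>"
print(repr(text_of(MIXED)))
print(repr(text_of(pretty(MIXED))))
```

    The last two lines show the distortion risk for mixed content: the character data of the original reads "ab", while the pretty-printed version has line breaks and indentation between the two letters.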

    From the monastery

    Switching to a very different speed, the second resource I explore in this article is Simon St. Laurent's "Monastic XML" (see Resources). This is a collection of brief essays with advice on how to process and even think about XML for maximum effect. Simon uses the metaphors of monasticism and asceticism to suggest that it is dangerous to load XML too heavily with baggage that does not suit its simple, textual roots.

    In "Marking-up at the foundation," he discusses the fundamental roles of character data and markup (elements and attributes). In "Naming things and reading names," he explains why the generic identifier (also called the element type name) is an important concept and how it should be the sole primary key to the structure of the marked-up information. Realistically, if you're using XML namespaces, the primary key is the universal name (namespace URI plus local name), and this complication is one of the reasons Simon urges caution in "Namespaces as opportunity." "Accepting the discipline of trees" calls out one of XML's dirty secrets: even though it seems that XML's hierarchical structure could easily be extended to a graph structure, in practice, modeling graphs in XML has proven difficult.

    But by far the most important lesson on the "Monastic XML" site is found in "Optimizing markup for processing is always premature." XML is a declarative technology, and therein lie its strengths, as well as its frustrations, for many developers. Developers who try to pull XML design too close to the details of processing generally end up making that processing more difficult in the long term. The key to success with XML is to focus on the nature of the information that needs to be represented, in the abstract, separately from the technical design of the systems that need to process that information.





    Wrap up

    There is always bound to be some difference of opinion when considering XML best practices, especially in these early stages, but it is great to have a variety of voices on the topic. There are a few other sources for discussion of the topic, and I'll continue to cover them in this column. If you have sources for advice on best practices or want to share your own opinion, please join the discussion on the Thinking XML discussion forum.





    Resources

    Learn



    About the author

    Uche Ogbuji is a consultant and co-founder of Fourthought, Inc., a software vendor and consultancy specializing in XML solutions for enterprise knowledge management. Fourthought develops 4Suite, an open source platform for XML, RDF, and knowledge-management applications. Mr. Ogbuji is also a lead developer of the Versa RDF query language. He is a computer engineer and writer born in Nigeria, living and working in Boulder, Colorado, USA. You can find more about Mr. Ogbuji at his Weblog Copia, or contact him at uche.ogbuji@fourthought.com.




    Posted on 2007-01-12 10:35 by 草兒. Filed under: software architecture, java