Moris' Note Book

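All content on this blog is collected from the web. It is not my own original work, a personal diary, or research notes, only a collection of things I have read.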

Original article:

https://danfoody.sys-con.com/node/40740/mobile

As information technology professionals progress in their knowledge and use of XML and Web services, the question of XML performance persists. In hallway chats, one might hear that "XML takes up too much bandwidth" or "XML takes too many CPU cycles to process."

Unfortunately, these beliefs lead to behaviors inconsistent with best practices for building and deploying Web service-based systems that will stand the test of time. These behaviors include continuing to operate with a proprietary non-XML architecture, and designing the architecture around network devices that do hardware XML processing.

This article examines the myths that surround XML performance issues to help IT professionals avoid the pitfalls associated with the behaviors described above.

A Closer Look at XML Bandwidth
Local area networks offer plenty of bandwidth at very low cost - but wide area networks are another matter. Although WANs have improved by leaps and bounds, expanding the capacity of a WAN link can still be prohibitively expensive, so WAN links are often bandwidth constrained.

So, what is it about XML that takes up so much bandwidth? There are really two separate issues. The first is that XML is text, which generally takes up more space than a binary format. A 32-bit integer fits in 4 bytes in binary form, but can take up to 10 bytes as decimal text (11 with a minus sign), which is 2.5X larger or more.
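The size difference is easy to check. The following is a minimal sketch, not taken from the original article; the function name `encoded_sizes` and the sample values are illustrative assumptions. It compares a fixed 4-byte binary encoding with the decimal text form of the same integer.

```python
import struct

def encoded_sizes(value):
    """Return (binary_bytes, text_bytes) for a signed 32-bit integer."""
    binary = struct.pack("<i", value)    # fixed 4-byte little-endian encoding
    text = str(value).encode("ascii")    # decimal text, one byte per character
    return len(binary), len(text)

# Sample values chosen for illustration only.
for v in (7, 40740, 2147483647, -2147483648):
    b, t = encoded_sizes(v)
    print(f"{v:>12}: binary={b} bytes, text={t} bytes")
```

Small values are actually shorter as text than as a fixed-width binary field; the 2.5X figure applies to values near the top of the 32-bit range, and XML tags add to the text size in either case.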

The second is that XML is self-describing, which results in lots of repeating patterns of text. For example, each element name must be explicitly spelled out in both the start tag of the element and the end tag of the element. This adds a lot of extra repetitive text into the document. So, while there are distinct advantages to a self-describing message, it still consumes a lot of bandwidth on the WAN…or does it?

The reality is that many organizations are starting to use hardware compression on their WAN links to reduce bandwidth consumption. Text is among the most compressible kinds of data - in fact, text with lots of repetition can often be compressed 10X - and XML, with its repeated tags, is extremely compressible. So, if your WAN links are bandwidth constrained, you should probably be running compression - and if you are running compression, XML can be about as efficient on the wire as a binary format, and potentially even more efficient, since it compresses much better than a binary stream.
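One way to test this claim against your own traffic is to compress a representative message and compare sizes. The sketch below is an illustration only: it uses Python's `zlib` (software DEFLATE, standing in for whatever a WAN compression device does), and the order records, element names, and the `build_orders` helper are all hypothetical. Actual ratios depend heavily on the data.

```python
import struct
import zlib

def build_orders(n):
    """Return (xml_bytes, binary_bytes) holding the same n made-up order records."""
    rows = [(i, 100 + i % 7, (i * 37) % 500) for i in range(n)]
    xml = "<orders>" + "".join(
        f"<order><id>{i}</id><sku>{s}</sku><qty>{q}</qty></order>"
        for i, s, q in rows
    ) + "</orders>"
    # The same records as three packed 32-bit integers each.
    binary = b"".join(struct.pack("<iii", *r) for r in rows)
    return xml.encode("utf-8"), binary

xml_bytes, bin_bytes = build_orders(1000)
xml_z = zlib.compress(xml_bytes, 9)
bin_z = zlib.compress(bin_bytes, 9)

print(f"XML:    {len(xml_bytes)} bytes raw, {len(xml_z)} bytes compressed")
print(f"binary: {len(bin_bytes)} bytes raw, {len(bin_z)} bytes compressed")
print(f"XML compression ratio: {len(xml_bytes) / len(xml_z):.1f}X")
```

Comparing the two compressed sizes, rather than the two raw sizes, is the relevant measurement for a link that already runs compression.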

CPU Cycles and XML
What about the CPU cycles required to process XML - doesn't that create a performance drag? It's true that XML is expensive to process. A typical single-processor 1GHz machine can process XML at a rate of about 4MB to 8MB per second, depending on whether you are using DOM or SAX. The real data in that 4MB to 8MB of XML is significantly less, because of all the XML tags: it might be equivalent to only 1MB or 2MB of actual data processed per second.
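To make the DOM/SAX distinction and the "real data" fraction concrete, here is a small sketch using Python's standard `xml.sax` and `xml.dom.minidom` modules. The sample document and the names `TextCounter`, `payload_chars_sax`, and `payload_chars_dom` are assumptions for illustration. Both functions count the characters of actual data; SAX does it from a stream of events, while DOM first builds the whole tree in memory.

```python
import xml.sax
from xml.dom import minidom

# Hypothetical sample document: 100 small order records.
SAMPLE = "<orders>" + "".join(
    f"<order><id>{i}</id><qty>{i % 9}</qty></order>" for i in range(100)
) + "</orders>"

class TextCounter(xml.sax.ContentHandler):
    """SAX handler that only tallies character data as events stream past."""
    def __init__(self):
        super().__init__()
        self.chars = 0

    def characters(self, content):
        self.chars += len(content)

def payload_chars_sax(doc):
    handler = TextCounter()
    xml.sax.parseString(doc.encode("utf-8"), handler)
    return handler.chars

def payload_chars_dom(doc):
    dom = minidom.parseString(doc)      # builds the full tree before any work is done
    total = 0
    stack = [dom.documentElement]
    while stack:                        # walk every node, summing text nodes
        node = stack.pop()
        if node.nodeType == node.TEXT_NODE:
            total += len(node.data)
        stack.extend(node.childNodes)
    return total

data = payload_chars_sax(SAMPLE)
print(f"{data} data characters in {len(SAMPLE)} characters of XML "
      f"({100 * data / len(SAMPLE):.0f}% payload)")
```

Wrapping either function in `timeit` gives a rough throughput figure for your own hardware and documents.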

But don't forget Moore's law - processing power is increasing rapidly, so what creates a bottleneck today will likely be inconsequential in the near future. Further, if the processing power for XML is a broadly recognized issue, it's highly likely that microprocessor vendors will add instructions to accelerate XML processing. Don't assume that XML processing performance will be limited by Moore's law - it's likely to surpass it if processing moves into the hardware.

Options for Boosting XML Performance
The promise of microprocessor-based acceleration is great, but where does that leave those who need that extra performance today? A simple but perhaps not so obvious option is to reduce the amount of XML processing being done. For example, when choosing an XML proxy or intermediary (such as a Web service management broker), choose one that processes only the portion of XML required to perform each specific function. Also, avoid chaining together products that perform redundant XML parsing and processing serially. For example, instead of doing XML security processing separate from Web service management and routing, choose a product that integrates both of these into a single processing step.
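As an illustration of processing only the portion of XML a function needs, the sketch below shows a hypothetical intermediary that routes on a single header field. The message layout, the `to` element, and the `read_routing_target` helper are assumptions, not part of any real product. It uses `xml.etree.ElementTree.iterparse` from the Python standard library and stops reading input as soon as the field is found.

```python
import io
import xml.etree.ElementTree as ET

def read_routing_target(stream, tag="to"):
    """Return the text of the first <tag> element, reading no more input than needed."""
    for _event, elem in ET.iterparse(stream, events=("end",)):
        if elem.tag == tag:
            return elem.text    # returning here abandons the rest of the stream
    return None

# Hypothetical message: a tiny header followed by a large body.
message = (
    "<envelope><header><to>billing</to></header>"
    "<body>" + "<item>x</item>" * 10000 + "</body></envelope>"
)

print(read_routing_target(io.BytesIO(message.encode("utf-8"))))  # prints "billing"
```

The parser still processes the chunk of input that contains the header, but the bulk of the body is never read or parsed by this step.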

Another option is to deploy a stand-alone "XML accelerator" appliance. Unfortunately, these don't actually provide the expected benefit. For example, many assume these appliances offload XML processing from the application - in fact they don't. An XML-based application still must parse the XML - there's no way to get around that. What XML accelerators can do very effectively is help convert XML in transit. For example, if a Web service application returns a purchase order in XML, an XML accelerator could convert that order to HTML very effectively. So, it might be better to consider these appliances "XSLT accelerators," since that's their most effective function. While they can perform some other forms of XML processing, generally there's little performance advantage in these cases - all of which weighs against the heavy disadvantage of the appliance as a "black box" that can't be managed or extended.

One other alternative looks promising (though the market is still evolving). A number of companies are now building PCI hardware boards that accelerate XML processing. Unlike the appliance "XML accelerators," these boards plug into your application servers and take over XML processing tasks from the main processor. They do this by plugging in an alternate "provider" under the XML processing libraries that applications use (for example, the Java JAXP XML processor APIs). So, they can accelerate any XML-based application (whether developed in-house or purchased) transparently, without any changes to the application or the network topology. If performance is critical, these hardware boards are a good tactical solution as you wait for Moore's law or the microprocessor vendors to catch up to your performance needs.

Conclusion
If you find yourself worried about XML bandwidth and processing performance, don't take actions that affect your overall architecture or approach to deploying XML and Web service applications. Often the simplest solutions (such as eliminating unnecessary and redundant processing) deliver the biggest bang for the buck. Beyond this, the best approach is to architect your overall strategy assuming bandwidth and performance will not be a problem. Then, if you discover an XML processing issue in a specific case, address it with a tactical solution that doesn't undermine your overall strategy.

© 2008 SYS-CON Media