New Business Intelligence Articles And Links

Mark Rittman's Oracle Weblog

上面看到的一篇随笔,里面提到的文章和论文值得看下:

 

A few articles and papers I came across recently, that you might find interesting.

First up is a number of in-depth articles written by Howard Rogers, on some Oracle 10g features that BI&W developers might find interesting. The first article is on materialized views, which Howard describes as "amongst the cleverest, neatest, most productive features Oracle has ever come up with, and they continue to impress me every time I use them, even though they’ve been around for many, many Oracle versions (including a stint in ancient versions as something known as a ‘snapshot table’)" Materialized views are the cornerstone of Oracle's relational OLAP offering and this article goes through the manual creation of materialized views, the use of the new DBMS_ADVISOR which writes the materialized view creation code for you, and the miscellaneous small improvements in materialized views that come with 10g. The other two articles, on writable external tables and the new workload repository, are also useful from an ETL and tuning standpoint, and whilst you're there I'd recommend taking a look at the rest of the site and in particular Howard's blog, as he's a veteran of the various Oracle usenet groups and a recently "ex-member" of the Oaktable. Always well written and worth taking the time out to read.

Oracle's Antony Reynolds recently posted a blog article about scaling J2EE applications, which caught my interest as I recently worked on a couple of short engagements that centred around the stress-testing of a J2EE application. According to Antony, the factors that usually limit applications are resource contention, non-linear scaling, cost and manageability and mentions two common "gotcha's" when scaling J2EE applications - mismatched connection pools and unnecessary container managed transactions. A good, fairly concise article that gives a good heads up on what can limit the scalability of a J2EE app and how to deal with this.

There was an interesting posting on the AMIS blog the other day about model-driven design on ETL functions. The posting references an article in Database Magazine by Mark Zwijsen which proposes that "by structuring the functional design of the ETL process, it should be possible to automatically generate about 80% of the actual ETL code". Unfortunately there isn't a link to an online version of the article and I can't locate it with a Google search (maybe it's in Dutch?) but there's a bit more detail in the blog posting and a diagram of a possible ETL structure.

Nigel Pendse recently published the results of the OLAP Survey 4 and following on from that wrote an article for DMReview that went through some of the findings. A couple of points that I found interesting were firstly, that Oracle Discoverer users report the worst query performance, and that secondly, Oracle Express users report the best data quality. Looking at the comment about Discoverer in particular, I can't help thinking that this is because, with later versions of Discoverer (9ias and up) it's considerably more difficult to get summaries working than with earlier versions; with Discoverer 3i, Discoverer itself managed the summary tables, and redirected it's queries to summaries if they would speed up queries. With current versions of Discoverer, you use the query rewrite/materialized views mechanism, which means that the DBA has to get the rewrite mechanism working (changing initialisation parameters for the database), grant certain permissions to both the Discoverer administrator, the End User Layer and the Discoverer users so that query rewrite can take place, and even then with early versions of Discoverer 9iAS running against 8i and 9i databases, it was hit and miss as to whether a materialized view would actually get used. With current versions, it's all a lot more reliable, but it still takes DBA intervention and changes to the database parameters, and in most cases the Discoverer administrator never even knows about these issues and what needs to be done to enable summaries, and just assumes that performance will always be this bad. Coupled with that, for most Discoverer reporting projects, unlike projects that use dedicated, proprietary OLAP servers, there usually isn't a section in the project plan to create and manage summaries, as they're not mandatory (as they are with OLAP servers such  as Express or Powerplay) and the system seems to work just fine without them, at least at the start. In short, there's no technical reason why Discoverer users should have to put up with poor query performance, but because of the combination of a difficult to (initially) set up summary management mechanism, and the fact that summaries are "optional", in fact most Discoverer installations do run pretty slowly.

BI Blogs describes itself as "Bringing together Business Intelligence information from across the blogosphere!" and aggregates feeds from a number of Oracle, Microsoft and other BI-focussed blogs including ones by Mosha Pasumansky, the Hitachi Consulting Yukon Team, Jonathan Weitz and yours truly. I thought this was an interesting development as I keep a close eye on what Microsoft, Cognos, Business Objects and the like are doing with business intelligence & OLAP, and it's particularly interesting to see how the Yukon business intelligence features are shaping up. One to bookmark.

Lastly, there's a trio of new articles in IntelligentEnterprise that'll be of interest to readers of this site. Ralph Kimball and Margy Ross look at slowly changing dimensions, and how types 1, 2 and 3 sometimes need to be adapted for more analytically mature warehouse users; Seth Grimes looks at unstructured data and domain-rooted semantics, whilst Michael Gonzales looks at using rules-based audits and proofs of concept to avoid potential BI project pitfalls.

posted @ 2005-06-29 08:49  蜡人张  阅读(1095)  评论(0编辑  收藏  举报