Data Warehousing

Key Component Areas

A complete data warehouse architecture includes data and technical elements. Thornthwaite breaks down the architecture into three broad areas. The first, data architecture, is centered on business processes. The next area, infrastructure, includes hardware, networking, operating systems, and desktop machines. Finally, the technical area encompasses the decision-making technologies that will be needed by the users, as well as their supporting structures. These areas are detailed in the sub-sections below.

Data Architecture

As stated above, the data architecture portion of the overall data warehouse architecture is driven by business processes. For example, in a manufacturing environment the data model might include orders, shipping, and billing. Each area draws on a different set of dimensions. But where dimensions intersect in the data model the definitions have to be the same—the same customer who buys is the same that builds. So data items should have a common structure and content, and involve a single process to create and maintain.

Thornthwaite says that organizations often ask how data should be represented in the warehouse—entity/relationship or dimensional? “If you have a star schema1 then use dimensional. Is your detail normalized2 or dimensional? Will users be querying detail? Then use dimensional.” He adds that most data warehousing experts are in substantial agreement; the [data] sources are typically entity/relationship models and the front end is a dimensional model. The only issue is where you draw the line between the warehouse itself and the data staging area.

As you work through the architecture and present data to your users, tool choices will be made, but many choices will disappear as the requirements are set. For example, he explains that product capabilities are beginning to merge, like MOLAP and ROLAP. “MOLAP is okay if you stay within the cube you’ve built. It’s fast and allows for flexible querying—within the confines of the cube.” Its weaknesses are size (overall and within a dimension), design constraints (limited by the cube structure), and the need for a proprietary data base.

Infrastructure Architecture

With the required hardware platform and boxes, sometimes the data warehouse becomes its own IS shop. Indeed, there are lots of “boxes” in data warehousing, mostly used for data bases and application servers.

The issues with hardware and DBMS choices are size, scalability, and flexibility. In about 80 percent of data warehousing projects this isn’t difficult; most businesses can get enough power to handle their needs.

In terms of the network, check the data sources, the warehouse staging area, and everything in between to ensure there’s enough bandwidth to move data around. On the desktop, run the tools and actually get some data through them to determine if there’s enough power for retrieval. Sometimes the problem is simply with the machine, and the desktops must be powerful enough to run current-generation access tools. Also, don’t forget to implement a software distribution mechanism.

Technical Architecture

The technical architecture is driven by the meta data catalog. “Everything should be meta data-driven,” says Thornthwaite. “The services should draw the needed parameters from tables, rather than hard-coding them.” An important component of technical architecture is the data staging process, which covers five major areas:

  • Extract - data comes from multiple sources and is of multiple types. Data compression and encryption handling must be considered at this area, if it applies.
  • Transform - data transformation includes surrogate key management, integration, de-normalization, cleansing, conversion, aggregation, and auditing.
  • Load - loading is often done to multiple targets, with load optimization and support for the entire load cycle.
  • Security - administrator access and data encryption policies.
  • Job control - this includes job definition, job scheduling (time and event), monitoring, logging, exception handling, error handling, and notification.

Pages: 1 2 3 4 5

Latest Blog Entries

Full Blog »
  • March 19, 2013

    API: The Solution to Centralized & Coordinated Assessment

    Posted by Erica Harrison

    Although homelessness in America has declined by 5.7 percent since 2007, the amount of homeless persons has remained relatively unchanged, decreasing only slightly since 2011.[1]

    This lack of reduction is certainly not due to lack of effort. CoCs around the nation have significantly boosted their efforts to fight homelessness in their communities. Instead, this slow rate of decline is due in part to economic reasons, but largely due to data communication barriers among and within CoCs nationwide.

  • March 19, 2013

    Canned Reporting: Enhancing Business Intelligence

    Posted by Erica Harrison

    Reports are designed to answer questions for the end user. Whether it is just one question, or a complex conglomeration of questions, a report can provide the answers.

    But what happens when an end user want information that exceeds the capability of an existing report?

  • February 22, 2013

    What is System Integration? – The Benefits of Consolidation

    Posted by Erica Harrison

    Advances in technology have been hitting the global market with fierce speed, placing pressure on enterprises of all sizes to adopt systems that meet the rising level of customer demands. As a result, enterprises are scrambling to find ways to maintain the functionality of their current systems while also riding the wave of innovative technology that keeps them ahead of the competition.

  • Read All