Data Warehousing

Key Component Areas

A complete data warehouse architecture includes data and technical elements. Thornthwaite breaks down the architecture into three broad areas. The first, data architecture, is centered on business processes. The next area, infrastructure, includes hardware, networking, operating systems, and desktop machines. Finally, the technical area encompasses the decision-making technologies that will be needed by the users, as well as their supporting structures. These areas are detailed in the sub-sections below.

Data Architecture

As stated above, the data architecture portion of the overall data warehouse architecture is driven by business processes. For example, in a manufacturing environment the data model might include orders, shipping, and billing. Each area draws on a different set of dimensions. But where dimensions intersect in the data model the definitions have to be the same—the same customer who buys is the same that builds. So data items should have a common structure and content, and involve a single process to create and maintain.

Thornthwaite says that organizations often ask how data should be represented in the warehouse—entity/relationship or dimensional? “If you have a star schema1 then use dimensional. Is your detail normalized2 or dimensional? Will users be querying detail? Then use dimensional.” He adds that most data warehousing experts are in substantial agreement; the [data] sources are typically entity/relationship models and the front end is a dimensional model. The only issue is where you draw the line between the warehouse itself and the data staging area.

As you work through the architecture and present data to your users, tool choices will be made, but many choices will disappear as the requirements are set. For example, he explains that product capabilities are beginning to merge, like MOLAP and ROLAP. “MOLAP is okay if you stay within the cube you’ve built. It’s fast and allows for flexible querying—within the confines of the cube.” Its weaknesses are size (overall and within a dimension), design constraints (limited by the cube structure), and the need for a proprietary data base.

Infrastructure Architecture

With the required hardware platform and boxes, sometimes the data warehouse becomes its own IS shop. Indeed, there are lots of “boxes” in data warehousing, mostly used for data bases and application servers.

The issues with hardware and DBMS choices are size, scalability, and flexibility. In about 80 percent of data warehousing projects this isn’t difficult; most businesses can get enough power to handle their needs.

In terms of the network, check the data sources, the warehouse staging area, and everything in between to ensure there’s enough bandwidth to move data around. On the desktop, run the tools and actually get some data through them to determine if there’s enough power for retrieval. Sometimes the problem is simply with the machine, and the desktops must be powerful enough to run current-generation access tools. Also, don’t forget to implement a software distribution mechanism.

Technical Architecture

The technical architecture is driven by the meta data catalog. “Everything should be meta data-driven,” says Thornthwaite. “The services should draw the needed parameters from tables, rather than hard-coding them.” An important component of technical architecture is the data staging process, which covers five major areas:

  • Extract - data comes from multiple sources and is of multiple types. Data compression and encryption handling must be considered at this area, if it applies.
  • Transform - data transformation includes surrogate key management, integration, de-normalization, cleansing, conversion, aggregation, and auditing.
  • Load - loading is often done to multiple targets, with load optimization and support for the entire load cycle.
  • Security - administrator access and data encryption policies.
  • Job control - this includes job definition, job scheduling (time and event), monitoring, logging, exception handling, error handling, and notification.

Pages: 1 2 3 4 5

Latest Blog Entries

Full Blog »
  • June 21, 2013

    A History of the HMIS & Cloud Computing

    Posted by Erica Harrison

    In May 1729 Philadelphia passed a law allowing citizens to deport local homeless persons.1

    In 1873, Civil War veterans suffered derogatory slurs such as “bums” and “tramps” after railroad giant Jay Cooke & Company shut its doors, eliminating the main source of employment for soldiers post Civil War.2
    Views on homelessness, and the services provided to those experiencing homelessness, [...]

  • May 22, 2013

    Business Process Automation: The Role of Customized Software

    Posted by Erica Harrison

    There is no argument that IT has significantly enhanced business productivity. However, by doing so, IT has in turn increased the volume and diversity of IT-dependent services. This dramatic increase in demand for efficient and complex IT solutions has stretched IT infrastructures to the limits, especially in regards to scalability. Despite the seemingly automated characteristics of IT services, many IT processes are still manual and thus susceptible to human error. These errors can bring business operations to a halt, regardless of the size of the organization.

    There is a solution however; one that can drastically reduce or even eliminate human error while also reducing costs and improving productivity. This solution comes in the form of Business Process Automation (BPA)

  • March 19, 2013

    API: The Solution to Centralized & Coordinated Assessment

    Posted by Erica Harrison

    Although homelessness in America has declined by 5.7 percent since 2007, the amount of homeless persons has remained relatively unchanged, decreasing only slightly since 2011.[1]

    This lack of reduction is certainly not due to lack of effort. CoCs around the nation have significantly boosted their efforts to fight homelessness in their communities. Instead, this slow rate of decline is due in part to economic reasons, but largely due to data communication barriers among and within CoCs nationwide.

  • Read All