Data Warehousing

The staging box needs to be able to extract data from multiple sources, like MVS, Oracle, VM, and others, so be specific when you choose your products. It must handle data compression and encryption, transformation, loading (possibly to multiple targets), and security (at the front end this is challenging, Thornthwaite says). In addition, the staging activities need to be automated. Many vendors’ offerings do different things, so he advises that most organizations will need to use multiple products.

A system for monitoring data warehouse use is valuable for capturing queries and tracking usage, and performance tuning is also helpful. Performance optimization includes cost estimation through a “governor” tool, and should include ad hoc query scheduling. Middleware can provide query management services. Tools for all of these and other related tasks are available for the front end, for server-based query management, and for data from multiple sources. Tools are also available for reporting, connectivity, and infrastructure management. Finally, the data access piece should include reporting services (such as publish and subscribe), a report library, a scheduler, and a distribution manager.

A Word About Meta Data

The creation and management of data has the following “steps” in the data warehouse process:

  1. warehouse model
  2. source definitions
  3. table definitions
  4. source-to-target maps
  5. map and transformation information
  6. physical information (table spaces, etc.)
  7. extracted data
  8. transformed data
  9. load statistics
  10. business descriptions
  11. query requests
  12. the data itself
  13. query statistics

To show how important meta data is, of the steps listed above only three involve “real” data—7, 8, and 12. “Everything else is meta data,” says Thornthwaite, “and the whole data warehouse process relies on it.” The major technical elements of a meta data catalog include:

  • Business rules - includes definitions, derivations, related items, validation, and hierarchy information (versions, dates, etc.).
  • Movement/transformation information - source/destination information, as well as DDL (data types, names, etc.).
  • Operations information - data load job schedules, dependencies, notification, and reliability information (such as host redirects and load balancing).
  • Tool-specific information - graphic display information and special function support.
  • Security rules - authentication and authorization.

Developing an Architecture

When you develop the technical architecture model, draft the architecture requirements document first. Next to each business requirement write down its architecture implications. Group these implications according to architecture areas (remote access, staging, data access tools, etc.) Understand how it fits in with the other areas. Capture the definition of the area and its contents. Then refine and document the model.

Thornthwaite recognizes that developing a data warehouse architecture is difficult, and thus warns against using a “just do it” approach, which he also calls “architecture lite.” But the Zachman framework is more than what most organizations need for data warehousing, so he recommends a reasonable compromise consisting of a four-layer process: business requirements, technical architecture, standards, and products.

Business requirements essentially drive the architecture, so talk to business managers, analysts, and power users. From your interviews look for major business issues, as well as indicators of business strategy, direction, frustrations, business processes, timing, availability, and performance expectations. Document everything well.

From an IT perspective, talk to existing data warehouse/DSS support staff, OLTP application groups, and DBAs; as well as networking, OS, and desktop support staff. Also speak with architecture and planning professionals. Here you want to get their opinions on data warehousing considerations from the IT viewpoint. Learn if there are existing architecture documents, IT principles, standards statements, organizational power centers, etc.

Not many standards exist for data warehousing, but there are standards for a lot of the components. The following are some to keep in mind:

Pages: 1 2 3 4 5

Latest Blog Entries

Full Blog »
  • March 19, 2013

    API: The Solution to Centralized & Coordinated Assessment

    Posted by Erica Harrison

    Although homelessness in America has declined by 5.7 percent since 2007, the amount of homeless persons has remained relatively unchanged, decreasing only slightly since 2011.[1]

    This lack of reduction is certainly not due to lack of effort. CoCs around the nation have significantly boosted their efforts to fight homelessness in their communities. Instead, this slow rate of decline is due in part to economic reasons, but largely due to data communication barriers among and within CoCs nationwide.

  • March 19, 2013

    Canned Reporting: Enhancing Business Intelligence

    Posted by Erica Harrison

    Reports are designed to answer questions for the end user. Whether it is just one question, or a complex conglomeration of questions, a report can provide the answers.

    But what happens when an end user want information that exceeds the capability of an existing report?

  • February 22, 2013

    What is System Integration? – The Benefits of Consolidation

    Posted by Erica Harrison

    Advances in technology have been hitting the global market with fierce speed, placing pressure on enterprises of all sizes to adopt systems that meet the rising level of customer demands. As a result, enterprises are scrambling to find ways to maintain the functionality of their current systems while also riding the wave of innovative technology that keeps them ahead of the competition.

  • Read All