Data Warehousing
The staging box needs to be able to extract data from multiple sources, like MVS, Oracle, VM, and others, so be specific when you choose your products. It must handle data compression and encryption, transformation, loading (possibly to multiple targets), and security (at the front end this is challenging, Thornthwaite says). In addition, the staging activities need to be automated. Many vendors’ offerings do different things, so he advises that most organizations will need to use multiple products.
A system for monitoring data warehouse use is valuable for capturing queries and tracking usage, and performance tuning is also helpful. Performance optimization includes cost estimation through a “governor” tool, and should include ad hoc query scheduling. Middleware can provide query management services. Tools for all of these and other related tasks are available for the front end, for server-based query management, and for data from multiple sources. Tools are also available for reporting, connectivity, and infrastructure management. Finally, the data access piece should include reporting services (such as publish and subscribe), a report library, a scheduler, and a distribution manager.
A Word About Meta Data
The creation and management of data has the following “steps” in the data warehouse process:
- warehouse model
- source definitions
- table definitions
- source-to-target maps
- map and transformation information
- physical information (table spaces, etc.)
- extracted data
- transformed data
- load statistics
- business descriptions
- query requests
- the data itself
- query statistics
To show how important meta data is, of the steps listed above only three involve “real” data—7, 8, and 12. “Everything else is meta data,” says Thornthwaite, “and the whole data warehouse process relies on it.” The major technical elements of a meta data catalog include:
- Business rules - includes definitions, derivations, related items, validation, and hierarchy information (versions, dates, etc.).
- Movement/transformation information - source/destination information, as well as DDL (data types, names, etc.).
- Operations information - data load job schedules, dependencies, notification, and reliability information (such as host redirects and load balancing).
- Tool-specific information - graphic display information and special function support.
- Security rules - authentication and authorization.
Developing an Architecture
When you develop the technical architecture model, draft the architecture requirements document first. Next to each business requirement write down its architecture implications. Group these implications according to architecture areas (remote access, staging, data access tools, etc.) Understand how it fits in with the other areas. Capture the definition of the area and its contents. Then refine and document the model.
Thornthwaite recognizes that developing a data warehouse architecture is difficult, and thus warns against using a “just do it” approach, which he also calls “architecture lite.” But the Zachman framework is more than what most organizations need for data warehousing, so he recommends a reasonable compromise consisting of a four-layer process: business requirements, technical architecture, standards, and products.
Business requirements essentially drive the architecture, so talk to business managers, analysts, and power users. From your interviews look for major business issues, as well as indicators of business strategy, direction, frustrations, business processes, timing, availability, and performance expectations. Document everything well.
From an IT perspective, talk to existing data warehouse/DSS support staff, OLTP application groups, and DBAs; as well as networking, OS, and desktop support staff. Also speak with architecture and planning professionals. Here you want to get their opinions on data warehousing considerations from the IT viewpoint. Learn if there are existing architecture documents, IT principles, standards statements, organizational power centers, etc.
Not many standards exist for data warehousing, but there are standards for a lot of the components. The following are some to keep in mind:








