Data Warehousing

The staging box needs to be able to extract data from multiple sources, like MVS, Oracle, VM, and others, so be specific when you choose your products. It must handle data compression and encryption, transformation, loading (possibly to multiple targets), and security (at the front end this is challenging, Thornthwaite says). In addition, the staging activities need to be automated. Many vendors’ offerings do different things, so he advises that most organizations will need to use multiple products.

A system for monitoring data warehouse use is valuable for capturing queries and tracking usage, and performance tuning is also helpful. Performance optimization includes cost estimation through a “governor” tool, and should include ad hoc query scheduling. Middleware can provide query management services. Tools for all of these and other related tasks are available for the front end, for server-based query management, and for data from multiple sources. Tools are also available for reporting, connectivity, and infrastructure management. Finally, the data access piece should include reporting services (such as publish and subscribe), a report library, a scheduler, and a distribution manager.

A Word About Meta Data

The creation and management of data has the following “steps” in the data warehouse process:

  1. warehouse model
  2. source definitions
  3. table definitions
  4. source-to-target maps
  5. map and transformation information
  6. physical information (table spaces, etc.)
  7. extracted data
  8. transformed data
  9. load statistics
  10. business descriptions
  11. query requests
  12. the data itself
  13. query statistics

To show how important meta data is, of the steps listed above only three involve “real” data—7, 8, and 12. “Everything else is meta data,” says Thornthwaite, “and the whole data warehouse process relies on it.” The major technical elements of a meta data catalog include:

  • Business rules - includes definitions, derivations, related items, validation, and hierarchy information (versions, dates, etc.).
  • Movement/transformation information - source/destination information, as well as DDL (data types, names, etc.).
  • Operations information - data load job schedules, dependencies, notification, and reliability information (such as host redirects and load balancing).
  • Tool-specific information - graphic display information and special function support.
  • Security rules - authentication and authorization.

Developing an Architecture

When you develop the technical architecture model, draft the architecture requirements document first. Next to each business requirement write down its architecture implications. Group these implications according to architecture areas (remote access, staging, data access tools, etc.) Understand how it fits in with the other areas. Capture the definition of the area and its contents. Then refine and document the model.

Thornthwaite recognizes that developing a data warehouse architecture is difficult, and thus warns against using a “just do it” approach, which he also calls “architecture lite.” But the Zachman framework is more than what most organizations need for data warehousing, so he recommends a reasonable compromise consisting of a four-layer process: business requirements, technical architecture, standards, and products.

Business requirements essentially drive the architecture, so talk to business managers, analysts, and power users. From your interviews look for major business issues, as well as indicators of business strategy, direction, frustrations, business processes, timing, availability, and performance expectations. Document everything well.

From an IT perspective, talk to existing data warehouse/DSS support staff, OLTP application groups, and DBAs; as well as networking, OS, and desktop support staff. Also speak with architecture and planning professionals. Here you want to get their opinions on data warehousing considerations from the IT viewpoint. Learn if there are existing architecture documents, IT principles, standards statements, organizational power centers, etc.

Not many standards exist for data warehousing, but there are standards for a lot of the components. The following are some to keep in mind:

Pages: 1 2 3 4 5

Latest Blog Entries

Full Blog »
  • June 21, 2013

    A History of the HMIS & Cloud Computing

    Posted by Erica Harrison

    In May 1729 Philadelphia passed a law allowing citizens to deport local homeless persons.1

    In 1873, Civil War veterans suffered derogatory slurs such as “bums” and “tramps” after railroad giant Jay Cooke & Company shut its doors, eliminating the main source of employment for soldiers post Civil War.2
    Views on homelessness, and the services provided to those experiencing homelessness, [...]

  • May 22, 2013

    Business Process Automation: The Role of Customized Software

    Posted by Erica Harrison

    There is no argument that IT has significantly enhanced business productivity. However, by doing so, IT has in turn increased the volume and diversity of IT-dependent services. This dramatic increase in demand for efficient and complex IT solutions has stretched IT infrastructures to the limits, especially in regards to scalability. Despite the seemingly automated characteristics of IT services, many IT processes are still manual and thus susceptible to human error. These errors can bring business operations to a halt, regardless of the size of the organization.

    There is a solution however; one that can drastically reduce or even eliminate human error while also reducing costs and improving productivity. This solution comes in the form of Business Process Automation (BPA)

  • March 19, 2013

    API: The Solution to Centralized & Coordinated Assessment

    Posted by Erica Harrison

    Although homelessness in America has declined by 5.7 percent since 2007, the amount of homeless persons has remained relatively unchanged, decreasing only slightly since 2011.[1]

    This lack of reduction is certainly not due to lack of effort. CoCs around the nation have significantly boosted their efforts to fight homelessness in their communities. Instead, this slow rate of decline is due in part to economic reasons, but largely due to data communication barriers among and within CoCs nationwide.

  • Read All