“Every data warehouse has an architecture,” says Warren Thornthwaite, a partner with Menlo Park, CA-based InfoDynamics LLC. “It’s either ad hoc or planned; implied or documented. Unfortunately, many warehouses are developed without an explicit architectural plan, which severely limits flexibility.” Without an architecture, subject areas don’t fit together, connections lead nowhere, and the whole warehouse is difficult to manage and change. In addition, although it may not seem important, the architecture of a data warehouse becomes the framework for product selection.
Thornthwaite compares the development of a data warehouse to building a real house. “But how do you build a $3 million mansion, let alone a $100,000 house?” You do it with blueprints, he says: the drawings, specifications, and standards showing how the house will be constructed, at multiple levels of detail. Of course, there are different versions of the blueprint for the various subsystems of the house, such as plumbing, electrical, HVAC, communications, and vacuum. There are also standards that all homes follow, such as those for plugs, lights, plumbing fixtures, and door sizes.
For data warehousing, the architecture is a description of the elements and services of the warehouse, with details showing how the components will fit together and how the system will grow over time. As with the house, the warehouse architecture is a set of documents, plans, models, drawings, and specifications, with separate sections for each key component area and enough detail to allow skilled professionals to implement them.
“This is not a requirements document,” Thornthwaite points out. “The requirements document says what the architecture needs to do. The architecture also isn’t a project plan or task list; it’s the what, not the how or why.”
Developing an architecture is also not easy, he says, because we have been building data warehouse systems for only 15 years, versus 5,000 years for homes. As a result, there are fewer standards, the tools and techniques are evolving rapidly, the systems we already have are poorly documented, and data warehouse terminology is extremely loose.
So while developing an architecture is difficult, it is possible, and it is critical. First and foremost, he says, the architecture has to be driven by the business. A requirement for nightly updates, for example, has implications for the architecture, and you must understand the technical requirements needed to deliver it. Thornthwaite gives a few examples of business requirements, along with the general technical considerations for each:
- Nightly updates: adequate staging horsepower.
- Worldwide availability: parallel or distributed servers.
- Customer-level analysis: [large] server size.
- New data sources: flexible tools with support for metadata.
- Reliability: job control features.
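To make the idea of business-driven architecture concrete, the mapping above can be treated as a simple lookup from requirement to technical consideration. The Python sketch below is illustrative only: the table `REQUIREMENT_IMPLICATIONS`, the function `architectural_considerations`, and the sample requirement "Mobile dashboards" are hypothetical names invented for this example, not part of any tool or method Thornthwaite describes. The table entries simply transcribe his five examples.

```python
from typing import Dict, List, Tuple

# Hypothetical lookup table transcribing Thornthwaite's examples:
# business requirement -> general technical consideration.
REQUIREMENT_IMPLICATIONS: Dict[str, str] = {
    "nightly updates": "adequate staging horsepower",
    "worldwide availability": "parallel or distributed servers",
    "customer-level analysis": "large server size",
    "new data sources": "flexible tools with support for metadata",
    "reliability": "job control features",
}


def architectural_considerations(
    requirements: List[str],
) -> Tuple[List[str], List[str]]:
    """Return (considerations, unmapped) for a list of business requirements.

    Requirements with no entry in the table are returned separately so they
    can be analyzed by hand rather than silently dropped.
    """
    considerations: List[str] = []
    unmapped: List[str] = []
    for req in requirements:
        key = req.strip().lower()  # normalize case and surrounding spaces
        if key in REQUIREMENT_IMPLICATIONS:
            item = REQUIREMENT_IMPLICATIONS[key]
            if item not in considerations:  # skip duplicate considerations
                considerations.append(item)
        else:
            unmapped.append(req)
    return considerations, unmapped


if __name__ == "__main__":
    # "Mobile dashboards" is a hypothetical requirement with no table entry.
    found, missing = architectural_considerations(
        ["Nightly updates", "Reliability", "Mobile dashboards"]
    )
    print(found)    # ['adequate staging horsepower', 'job control features']
    print(missing)  # ['Mobile dashboards']
```

Returning unmapped requirements separately, rather than ignoring them, reflects the article's point: every business requirement has architectural implications, and a new one calls for fresh technical analysis instead of a default answer.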