Data Warehousing

“Every data warehouse has an architecture,” says Warren Thornthwaite, a partner with Menlo Park, CA-based InfoDynamics LLC. “It’s either ad hoc or planned; implied or documented. Unfortunately, many warehouses are developed without an explicit architectural plan, which severely limits flexibility.” Without architecture, subject areas don’t fit together, connections lead to nowhere, and the whole warehouse is difficult to manage and change. In addition, although it might not seem important, the architecture of a data warehouse becomes the framework for product selection.

Thornthwaite compares the development of a data warehouse to building a real house. “But how do you build a $3 million mansion, let alone a $100,000 house?” You do it with blueprints, he says: the drawings, specifications, and standards showing how the house will be constructed, at multiple levels of detail. Of course, there are different versions of the blueprint for various subsystems of the house, such as plumbing, electrical, HVAC, communications, and vacuum. There are also standards that all homes follow, including plugs, lights, plumbing fixtures, door sizes, etc.

For data warehousing, the architecture is a description of the elements and services of the warehouse, with details showing how the components will fit together and how the system will grow over time. As with the blueprints in the house analogy, the warehouse architecture is a set of documents, plans, models, drawings, and specifications, with separate sections for each key component area and enough detail for skilled professionals to implement them.

“This is not a requirements document,” Thornthwaite points out. “The requirements document says what the architecture needs to do. The architecture also isn’t a project plan or task list; it’s the what, not the how or why.”

It’s also not easy, he says, because we’ve only been developing data warehouse systems for 15 years, versus 5,000 years for building homes. As a result, we have fewer standards, the tools and techniques are rapidly evolving, there is little documentation of the systems we already have, and data warehouse terminology is extremely loose.

So while developing an architecture is difficult, it is possible, and it is critical. First and foremost, he says, the architecture has to be driven by the business. If the business requires nightly updates, for example, that requirement has implications for the architecture, and you must understand the technical requirements that follow from it. Thornthwaite gives a few business requirement examples, and the general technical considerations for each:

  • Nightly updates - adequate staging horsepower.
  • Worldwide availability - parallel or distributed servers.
  • Customer-level analysis - [large] server size.
  • New data sources - flexible tools with support for meta data.
  • Reliability - job control features.
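
One way to keep this mapping explicit in an architecture document is a simple lookup table. The following is a minimal sketch, not something from Thornthwaite: the table entries mirror the list above, while the names `TECHNICAL_CONSIDERATIONS` and `considerations_for` are hypothetical and chosen only for illustration.

```python
# Hypothetical lookup table: business requirement -> technical consideration.
# The entries mirror the list above; the structure itself is an assumption.
TECHNICAL_CONSIDERATIONS = {
    "nightly updates": "adequate staging horsepower",
    "worldwide availability": "parallel or distributed servers",
    "customer-level analysis": "large server size",
    "new data sources": "flexible tools with support for meta data",
    "reliability": "job control features",
}


def considerations_for(requirements):
    """Split requirements into those with a known technical consideration
    and those that still need to be analyzed by hand."""
    known, unknown = {}, []
    for req in requirements:
        key = req.strip().lower()
        if key in TECHNICAL_CONSIDERATIONS:
            known[key] = TECHNICAL_CONSIDERATIONS[key]
        else:
            # Do not guess: surface unmapped requirements explicitly.
            unknown.append(req)
    return known, unknown


if __name__ == "__main__":
    known, unknown = considerations_for(
        ["Nightly updates", "Reliability", "Real-time dashboards"]
    )
    for req, tech in known.items():
        print(f"{req}: {tech}")
    print("needs analysis:", unknown)
```

Reporting unmapped requirements separately, rather than dropping them, reflects the point that the business drives the architecture: a requirement with no known technical answer is a gap in the plan, not something to ignore.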

