The early days of BI
In the article The origins of today’s OLAP products, Nigel Pendse chronicles the history of OLAP (On-line Analytical Processing), tracing the ancestry of the technology as far back as 1962.
Personally, my introduction to BI (Business Intelligence) technologies doesn’t go back quite as far. In the late 1980s, I worked for a systems integrator, developing and supporting COBOL application software. Around this time, we noticed that our customers were increasingly spending time and money on reporting tools and report development.

At the time, we used report writer (designer/generator) tools. Sophisticated users generated report specifications using the designer; the generator tools translated the report specifications into COBOL that could be compiled and executed to produce reports. These tools seemed to work pretty well, and they certainly were a big improvement over hand coding. The main problem, as we soon learned, was that report writers generated reports - and lots of them. So it wasn’t long before I found myself standing before a customer’s executive committee to explain why they had four “green bar” sales reports for February in hand, none of them matching, and all of them “wrong”!
Upon investigation, I came to suspect that the reports didn’t jibe for two reasons:
- The reports used differing formulas to compute sales, and
- Corrections were made to the data between report runs.
A review of the report specifications quickly proved the first point. But the second point was impossible to prove, because the database was volatile (the content changing over time) and the month-end processing had overwritten the necessary forensic data.
To solve the database volatility problem, I wrote an application to take periodic snapshots of the database, and inserted code between the application and the database to detect and capture all changes. Each audit record stored before and after states, the effective date of the transaction, the dates that the transaction was applied to the database, and the user who effected the change. The application moved the database snapshots and audit trails into a separate data store to facilitate reporting consistency. This solution enabled programs to reconstruct the state of the data as of any point in time, albeit with a little extra work. Of course, application performance took a hit due to the extra processing required. Later on, I began using RDBMS (Relational Database Management Systems) and OLAP tools to replace the COBOL report generators. But crude as it was, I still consider this system to have been the first of many data warehouses that I’ve had a hand in building.
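The original system was written in COBOL and its code is not reproduced here. As a rough illustration of the idea only, the following Python sketch shows how a snapshot plus an audit trail can be replayed to rebuild the data as of a given date. All names (AuditRecord, state_as_of, the invoice keys, the sample dates) are hypothetical, and the sketch assumes a single applied date per audit record and an in-memory dictionary standing in for the database.

```python
from dataclasses import dataclass
from datetime import date
from typing import Any, Dict, List, Optional

Row = Dict[str, Any]


@dataclass(frozen=True)
class AuditRecord:
    """One captured change (hypothetical layout, modeled on the description above)."""
    key: str                 # identifies the record that changed
    before: Optional[Row]    # state before the change (None for an insert)
    after: Optional[Row]     # state after the change (None for a delete)
    effective_date: date     # business date of the transaction
    applied_date: date       # date the change reached the database
    user: str                # who made the change


def state_as_of(snapshot: Dict[str, Row], snapshot_date: date,
                audit_trail: List[AuditRecord], as_of: date) -> Dict[str, Row]:
    """Rebuild the data as it stood at the end of `as_of`.

    Starts from the snapshot and replays, in applied-date order, every
    change applied after the snapshot and on or before `as_of`.
    """
    if as_of < snapshot_date:
        raise ValueError("as_of precedes the snapshot; use an earlier snapshot")
    state = {key: dict(row) for key, row in snapshot.items()}
    # sorted() is stable, so same-day changes keep their captured order.
    for rec in sorted(audit_trail, key=lambda r: r.applied_date):
        if snapshot_date < rec.applied_date <= as_of:
            if rec.after is None:
                state.pop(rec.key, None)      # delete
            else:
                state[rec.key] = dict(rec.after)  # insert or update
    return state


# Invented sample data: one correction and one late insert.
snapshot = {"INV-1": {"amount": 100}}
trail = [
    AuditRecord("INV-1", {"amount": 100}, {"amount": 120},
                date(1989, 2, 10), date(1989, 2, 20), "clerk1"),
    AuditRecord("INV-2", None, {"amount": 50},
                date(1989, 2, 25), date(1989, 3, 2), "clerk2"),
]

print(state_as_of(snapshot, date(1989, 1, 31), trail, date(1989, 2, 15)))
# {'INV-1': {'amount': 100}}
print(state_as_of(snapshot, date(1989, 1, 31), trail, date(1989, 2, 28)))
# {'INV-1': {'amount': 120}}
```

Two reports run on February 15 and February 28 would disagree about INV-1, and with the audit trail in hand that difference can be traced to a specific correction and user rather than left unexplained.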
A retrospective on data warehousing
The development and commercialization of RDBMS and OLAP technologies solved some of the performance problems that I encountered in my first data warehouse, but in my experience, early OLAP application successes generally were limited to departmental or line of business reporting solutions. Practitioners frequently encountered several significant impediments when attempting to build EDW (Enterprise Data Warehouse) solutions:
- Scalability - OLAP solutions performed well as long as the amount of data remained small, but performance fell off quickly as data volumes increased.
- Data quality - Enterprise solutions often involved collecting and integrating data from multiple heterogeneous applications and databases, most of which were never designed to integrate data to produce a single version of the truth.
- Business requirements - Many EDWs were built in the absence of well understood business requirements, resulting in data that was insufficient for the (eventually discovered) intended uses.
- Service Level Agreements - Even if the above problems were solved, since an EDW collected data from other systems, the service levels of an EDW were heavily impacted by the source systems. Also, EDWs were expected to serve many different (possibly conflicting) uses.
Many early EDW implementations failed, either because these problems were technologically insurmountable, or because we grossly underestimated the level of effort required to overcome them.
BI technology advancements
Over time, vendors began to improve the scalability of business intelligence technology. DBMS vendors developed MPP (Massively Parallel Processing) databases and other specialized technologies. These technology developments, coupled with open systems and falling hardware and storage costs, greatly improved scalability. As DBMS and OLAP scalability improved, the bulk of the cost of building an EDW shifted to the job of collecting, integrating, and improving the quality of the data.
In response, vendors developed ETL (Extract, Transform, Load) and data integration technologies to automate and replace costly hand-coding in COBOL and other programming languages. In my opinion, ETL technologies made it practical to build an EDW.
Lessons learned
Despite the advancements in technology that emerged in the 1990s, many BI and EDW programs still failed or achieved only marginal success.
It’s fair to say that BI applications tended to see poorer-quality business requirements than transactional applications that supported well-defined business processes. Through the experience of developing a number of these systems, I found and used some basic data architecture principles to mitigate the risk, biasing the systems to be:
- Inclusive - biased toward collecting more data rather than less, to lessen the probability of expensive rewrites to accommodate latent data requirements.
- Non-destructive - biased toward preserving source data rather than overwriting it with transformed data, so that the system can accommodate business rule changes without going back to reacquire data from source systems, where it may no longer exist.
- Granular - biased toward storing data at a sufficient grain or level of detail to permit later reformulation or reaggregation.
These principles emerged largely to prevent cost overruns and to deliver flexibility to meet unanticipated future needs.
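To make the non-destructive and granular principles concrete, here is a minimal Python sketch. It assumes sales are kept as untransformed line items and that business rules are applied only at report time. The SalesLine fields, the two formulas, and the sample figures are invented for illustration and are not taken from any actual system; amounts are whole cents to avoid rounding noise.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass(frozen=True)
class SalesLine:
    """One source-level line item, stored as received (hypothetical fields)."""
    month: str      # reporting period, e.g. "1989-02"
    gross: int      # invoiced amount in cents
    returns: int    # returned amount in cents
    discount: int   # discount granted in cents


# Business rules live outside the stored data, so they can change later
# without reacquiring anything from the source systems.
def gross_sales(line: SalesLine) -> int:
    return line.gross


def net_sales(line: SalesLine) -> int:
    return line.gross - line.returns - line.discount


def monthly_total(lines: Iterable[SalesLine], month: str,
                  formula: Callable[[SalesLine], int]) -> int:
    """Aggregate the granular rows for one month under the chosen formula."""
    return sum(formula(line) for line in lines if line.month == month)


lines = [
    SalesLine("1989-02", gross=10000, returns=500, discount=200),
    SalesLine("1989-02", gross=5000, returns=0, discount=100),
    SalesLine("1989-03", gross=7000, returns=0, discount=0),
]

print(monthly_total(lines, "1989-02", gross_sales))  # 15000
print(monthly_total(lines, "1989-02", net_sales))    # 14200
```

Had only one pre-aggregated February figure been stored, the other could not be derived from it. Keeping the detail rows means two reports that disagree can at least be reconciled by naming the formula each one used.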
Measuring success
Although it is generally agreed that many early EDW programs were less than successful, surprisingly little has been written about how to measure the success of an EDW program. Some EDW programs were considered a success merely because they completed without being canceled!
In 2006, I co-authored a TDWI article on Data Warehouse Service-Level Agreements that explored data warehouse SLAs: their benefits, the areas of quality they should cover, and some of the implications and solution components for successful implementation. Readers may refer to the article for more details, but it has been my experience that because an EDW is a shared platform, the SLAs must stratify the differing uses of the data, and the architecture must compensate for the resulting resource conflicts. SLAs potentially provide some relatively objective measures of success.
Overall perspective
This article is a reflection and commentary on the evolution of BI in the 1990s, with a particular emphasis on OLAP, data warehousing and ETL technologies.
One might argue that, excepting OLAP, this article overlooks the problem of providing access to the data by the user community. It might also be argued that the definition of BI for this article is too narrow, as it does not consider the impact of the web. But these are topics best left to another article.