In my book Information Modelling: Practical Guidance (Prentice Hall, 1992), I described a number of patterns for modelling business data structures. Since all businesses want to model the past as well as the present, at least to some extent, I mentioned some of the patterns for modelling historical data.
Since then, several people have asked me to expand on this. I have compiled the following notes to provide some additional guidance.
Please regard this page as work-in-progress. Send me more questions or problems or examples, and I'll try to answer you directly, as well as adding material here.
If you're thinking in object terms, it's worth noting that the behaviour of the 'history' object is often completely different to the behaviour of the 'current' object. In fact, all history objects probably share some common processing characteristics. (In object-speak, they 'inherit' the properties of some generic history object type.)
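The inheritance idea can be sketched in a class-based language. This is my own illustration - the class names, the validity-period attributes and the `covers` method are assumptions for the sake of the example, not taken from any particular tool:

```python
from datetime import date

class HistoryRecord:
    """Generic behaviour shared by all history objects:
    they are effectively read-only and carry a validity period."""
    def __init__(self, valid_from: date, valid_to: date):
        self.valid_from = valid_from
        self.valid_to = valid_to

    def covers(self, when: date) -> bool:
        # Was this record the current one on the given date?
        return self.valid_from <= when <= self.valid_to

class PriceHistory(HistoryRecord):
    """A specific history object inherits the generic
    history behaviour and adds its own attributes."""
    def __init__(self, valid_from: date, valid_to: date, price: float):
        super().__init__(valid_from, valid_to)
        self.price = price
```

Any other history object - an address history, a status history - would inherit the same generic behaviour in the same way.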
It follows that when you partition the data into separate logical data stores, or onto different physical platforms, it may well make sense to place the 'history' objects separately from the 'current' objects. But this gets us into the details of the distribution design for your chosen technical architecture (e.g. client/server).
That's fine if you're working with a pure object-oriented platform. But what if you're working with relational databases and CASE tools?
Note: many of my clients are using the Sterling Software CASE tool known as COOL:Gen. This used to be owned by Texas Instruments and was known as IEF or Composer.
If you're trying to use a relational tool in an object-oriented way, you'd have separate entity types for the history data and the current data. One problem here is that you may want to use the same user interface (UI) - the same screens or windows - for both current and history data. If you've put the current data and the history data into different places, you may then need a component that accesses both places and presents current and history data in the same windows, in the same format.
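Such a component might look something like the following sketch. The stores and field names are hypothetical - in practice they might be separate tables, databases or platforms - but the point is that one access routine merges both sources into a single uniform format for display:

```python
# Hypothetical stores: in practice these might be separate
# tables, databases, or even separate physical platforms.
current_orders = [
    {"order_no": 103, "status": "OPEN", "value": 250.0},
]
history_orders = [
    {"order_no": 101, "status": "CLOSED", "value": 120.0},
    {"order_no": 102, "status": "CLOSED", "value": 80.0},
]

def all_orders_for_display():
    """Access both places and present current and history
    rows in the same format, for a single window or screen."""
    rows = ([dict(r, source="history") for r in history_orders] +
            [dict(r, source="current") for r in current_orders])
    return sorted(rows, key=lambda r: r["order_no"])
```

The UI then binds to `all_orders_for_display` and need not know that the data live in two places.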
I don't think there is a single right answer - there never is - but I'm inclined to favour a more object-oriented approach - separating current and history data - in the interests of future maintainability. For one thing, it makes it much easier to implement changes to the data structure if you only have to convert current data and not history data as well. Similar arguments apply to data portability - moving data across platforms.
A correspondent from California writes:
After reading your book "Information Modeling", I've had a question regarding modeling of time and methods for extracting the modeled data. On page 232 of your book is a very good description of snapshot vs. event (or as you term it, "alteration") storage structures, along with the observation that choosing between the structures is largely an issue of balancing simplicity of data storage against retrieval.
As your text is concerned with data modeling, and not access, you do not delve into the details of access methods for the two structures -- unfortunately, I can't seem to find any references to another source which might be of benefit. Would you happen to know of any articles or books which discuss possible approaches?
Our current need addresses event/durational data. Our analysis tool is the SAS system, which provides both SQL (roughly SQL-89 standard) and 3GL type access and manipulation of data objects.
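Before answering, it may help to sketch the access-method difference between the two structures. With snapshot storage, retrieval of a past state is a direct lookup; with alteration (event) storage, the state must be reconstructed by replaying the changes up to the date of interest. This is my own illustration, with invented figures, not from any published source:

```python
from datetime import date

# Alteration (event) storage: each record is a change, not a state.
alterations = [
    (date(1997, 1, 1), +100),   # opening amount
    (date(1997, 2, 10), -30),
    (date(1997, 3, 5), +50),
]

def state_at(when: date) -> int:
    """Reconstruct the state on a given date by replaying
    all alterations up to and including that date."""
    return sum(delta for d, delta in alterations if d <= when)

# Snapshot storage: the state as at each change is stored directly,
# so retrieval is a simple lookup (cheap to read, costly to store).
snapshots = {date(1997, 2, 10): 70, date(1997, 3, 5): 120}
```

The replay in `state_at` is where the retrieval cost of the alteration structure shows up; the `snapshots` dictionary is where the storage cost of the snapshot structure shows up.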
There may well be a general solution or pattern published somewhere, but I don't know of one. The solutions and patterns I am aware of are industry-specific and/or application-specific.
For example, a considerable amount of thought has gone into manufacturing systems (MRP2), which need to process large numbers of small changes in requirements. It would be too inefficient to recalculate the whole thing for every small change, so the usual approach is to recalculate the whole thing periodically, and then record a series of incremental changes against the most recent recalculation.
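The MRP2 approach - a periodic full recalculation plus incremental changes recorded against it - can be sketched as follows. The parts and quantities are invented for illustration:

```python
# Baseline: the result of the last full recalculation
# (hypothetical part numbers and quantities).
baseline = {"widget": 500, "gadget": 200}

# Incremental changes in requirements recorded since then.
increments = [("widget", -20), ("gadget", +35), ("widget", +5)]

def current_requirement(part: str) -> int:
    """Adjust the baseline by the increments recorded since the
    last full recalculation, rather than recalculating everything."""
    return baseline[part] + sum(q for p, q in increments if p == part)
```

At the next periodic recalculation, the baseline is replaced and the list of increments is emptied.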
For another example, banking systems often calculate a daily balance for each customer at, say, 2am. Then if you enquire on your balance at 7pm, the system will quickly scan through all the day's transactions to find any deposits and withdrawals on your account since 2am, and adjust your balance accordingly.
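The banking approach described above can be sketched in the same way - a daily snapshot at the cut-off, adjusted by the alterations since. The timestamps and amounts are invented:

```python
from datetime import datetime

# Daily balance calculated at the 2am cut-off (a snapshot).
cutoff = datetime(1997, 8, 8, 2, 0)
balance_at_cutoff = 1000.0

# The day's transactions (alterations), timestamped.
transactions = [
    (datetime(1997, 8, 8, 9, 30), -200.0),   # withdrawal
    (datetime(1997, 8, 8, 14, 15), +75.0),   # deposit
]

def balance_now(now: datetime) -> float:
    """Start from the 2am snapshot and apply only the deposits
    and withdrawals made since the cut-off."""
    since = sum(amount for t, amount in transactions
                if cutoff < t <= now)
    return balance_at_cutoff + since
```

This is the same snapshot-plus-alterations pattern as the MRP2 example: a cheap periodic recalculation makes the frequent retrievals cheap too.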
I'm currently looking into the relationship between business strategy and data. In certain types of management information system, the decision-maker is required to (re)construct the intentions of his competitors and the trends in the market-place from a series of data snapshots. (For many companies, the market data are so poor and poorly structured that this is like calculating the tactics of a rival football team by looking at newspaper photos.)
Last Tuesday company A dropped its prices by 15%, then on Friday company B dropped its prices by 20% and opened a new branch office, then the following Monday company A dropped its prices by a further 7% and announced a major new distribution deal, and so on.
I'm also looking at issues of data archiving. IT practitioners often think of archiving as the death of data: nobody wants it any more, but we'd better not delete it altogether, so we'll take it off the database and put a tape copy into the vaults. That's fine until someone actually does want to read the data, at which point the reconstruction can be seriously expensive.
Richard Veryard is a technology consultant, based in London.
Content last updated on August 8th, 1997.
Technical update on May 17th, 1999.
Copyright © 1997 Richard Veryard