Data Mapping and Transformation

Why do we have several data models?

	Different stages of the development life-cycle. Conceptual, Logical and Physical. Or perhaps Requirements, Specification and Design.
	Different stages of transformation of data into knowledge. Operational data stores, data warehouses, data marts.
	Division of labour or responsibilities between parallel projects.
	Different platforms.
	Different schemas and notations. Hierarchical, Network, Relational, Object. XML, ebXML and BizTalk schemas.
	Multiple internal sources. For example, a large corporation – perhaps grown through merger and acquisition – with lots of separate legacy systems and data stores, plus various COTS packages.
	Multiple internal destinations. For example, satisfying the information needs of different business processes or user communities within the organization.
	Multiple external sources and destinations.
	Parallel versions and ways of working – in any of the above.

If we can establish data mappings from one model to another, then we can translate data, for example to perform ETL from one data store to another. The same mappings also let us transform whatever depends on the data structure, which helps us reuse it across multiple ontologies.
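As a minimal sketch of what such a mapping might look like, the following holds the mapping as data and applies it in two ways: to translate a record, and to rename a list of fields that something else depends on. All field names, the date format and the helper names (`FIELD_MAP`, `translate`, `rename_fields`) are assumptions made for illustration, not taken from any particular tool.

```python
from datetime import date

# Hypothetical mapping from a legacy customer record to a target model.
# target field: (source field, conversion function)
FIELD_MAP = {
    "customer_id": ("CUSTNO", int),
    "full_name": ("NAME", str.strip),
    # Assumes the legacy date is an eight-character YYYYMMDD string.
    "joined_on": ("JOINDT", lambda s: date(int(s[:4]), int(s[4:6]), int(s[6:8]))),
}

def translate(source_record, field_map=FIELD_MAP):
    """Translate one source record into the target model."""
    return {
        target: convert(source_record[source])
        for target, (source, convert) in field_map.items()
    }

def rename_fields(source_fields, field_map=FIELD_MAP):
    """Rewrite a list of source field names (e.g. a report's column list)
    into the equivalent target field names."""
    source_to_target = {source: target for target, (source, _) in field_map.items()}
    return [source_to_target[f] for f in source_fields]

legacy = {"CUSTNO": "00042", "NAME": "  Ada Lovelace ", "JOINDT": "19920315"}
target = translate(legacy)
report_columns = rename_fields(["CUSTNO", "NAME"])
```

Holding the mapping as data rather than burying it in procedural code is what allows the same mapping to serve both purposes.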

The Raw and the Cooked

Simplistic data modelling assumes that there is a clear distinction between atomic data and derived (molecular) data – but it doesn’t work out as clearly as this in practice, and this issue may have sweeping implications for system architecture and design.

The starting point for us is often the transformation between the ‘raw’ and the ‘cooked’. The ‘raw’ refers to detailed records of business transactions and operations, as well as detailed records of market intelligence. The ‘cooked’ refers to consolidated reports of business results and competitive performance.

One of the differences between the raw and the cooked is the level of granularity. Usually (but not always) the raw is finer-grained than the cooked. Another difference is the level of complexity. The cooked data may be modelled as more complex data objects, while the raw data may be modelled as apparently simpler data objects.
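The difference in granularity can be illustrated with a small sketch, in which fine-grained transaction records are consolidated into coarser totals per region and month. The record layout and the `cook` function are hypothetical; a real consolidation would depend on the reporting requirements.

```python
from collections import defaultdict

# Hypothetical raw data: one record per business transaction.
raw_sales = [
    {"date": "2001-03-02", "region": "North", "amount": 120},
    {"date": "2001-03-17", "region": "North", "amount": 80},
    {"date": "2001-03-05", "region": "South", "amount": 50},
    {"date": "2001-04-01", "region": "North", "amount": 30},
]

def cook(transactions):
    """Consolidate transactions into totals per (region, month)."""
    totals = defaultdict(int)
    for t in transactions:
        month = t["date"][:7]  # "YYYY-MM", assuming ISO-formatted dates
        totals[(t["region"], month)] += t["amount"]
    return dict(totals)

cooked_sales = cook(raw_sales)
```

Note that the transformation is one-way: the individual transactions cannot be recovered from the totals.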

In my 1992 book, I discussed how such transformations could be expressed using an extension to traditional data modelling notations. Object modelling notations can sometimes express these transformations more elegantly.

A clustering algorithm can be used to join up simple data objects (raw data) into complex data objects (cooked data). Data warehousing methodologies address this inadequately.
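The text does not say which clustering algorithm is meant, so the following is only one possible reading: simple records that refer to one another are joined into clusters (connected components, found with a union-find structure), and each cluster is presented as one complex object. The record layout, the `refs` field and the `cluster` function are all assumptions for illustration, and every reference is assumed to resolve to a known record.

```python
from collections import defaultdict

# Hypothetical raw records; "refs" lists the ids of records each one points to.
raw = [
    {"id": "O1", "kind": "order", "refs": []},
    {"id": "L1", "kind": "order_line", "refs": ["O1"]},
    {"id": "L2", "kind": "order_line", "refs": ["O1"]},
    {"id": "P1", "kind": "payment", "refs": ["O1"]},
    {"id": "O2", "kind": "order", "refs": []},
    {"id": "L3", "kind": "order_line", "refs": ["O2"]},
]

def cluster(records):
    """Group records that are linked by references into complex objects."""
    parent = {r["id"]: r["id"] for r in records}

    def find(x):
        # Follow parent links to the cluster root, shortening the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Merge the cluster of each record with the cluster of everything it references.
    for r in records:
        for ref in r["refs"]:
            parent[find(r["id"])] = find(ref)

    # Present each cluster as one object, with member ids grouped by kind.
    groups = defaultdict(lambda: defaultdict(list))
    for r in records:
        groups[find(r["id"])][r["kind"]].append(r["id"])
    return {root: dict(kinds) for root, kinds in groups.items()}

cooked = cluster(raw)
```

Unlike consolidation into totals, this kind of joining up keeps the detail: the complex object still contains every simple object it was built from.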

Privacy and confidentiality requirements sometimes place restrictions on the dissemination of raw data (in which the subjects can be identified), meaning that only cooked data may be available. (Clever transformation can sometimes allow identity to be reconstructed, often inaccurately but well enough for some purposes. Inference control is used to prevent such breaches or leaks.)
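One simple form of inference control is to suppress any cell of a cooked table that is built from too few raw records, since a count of one or two may identify an individual. The sketch below assumes a hypothetical threshold of three and a hypothetical `area` field; neither comes from the text above.

```python
from collections import Counter

MIN_CELL_SIZE = 3  # hypothetical threshold; the right value depends on policy

def cooked_counts(records, key, min_cell_size=MIN_CELL_SIZE):
    """Count records per value of `key`, suppressing (None) any small cell."""
    counts = Counter(r[key] for r in records)
    return {
        value: (n if n >= min_cell_size else None)
        for value, n in counts.items()
    }

# Hypothetical raw data in which each record describes one identifiable subject.
patients = [
    {"name": "A", "area": "N1"},
    {"name": "B", "area": "N1"},
    {"name": "C", "area": "N1"},
    {"name": "D", "area": "SW2"},
    {"name": "E", "area": "E8"},
]

published = cooked_counts(patients, "area")
```

Suppression on its own is not always sufficient. If a grand total is published alongside the table, a single suppressed cell can be recovered by subtraction, so further cells may need to be withheld.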

But if we focus on the transformation between the raw and the cooked, between simple data objects and complex data objects, we may overlook the other two transformations depicted above. The data architecture needs to call two things into question: the reliability of the source data and the strategic opportunities opened up by the information. Both demand a level of data reflexivity (data about data, data about itself, sometimes called metadata).

Finally, we should not overlook the possibility of linkage between the source data and the business strategy. The availability and reliability of source data are themselves dependent on the strategic alliances formed by the organization. Furthermore, any organization is vulnerable to being fed false information by its competitors, which may tempt it into poor business judgements. Data means ‘that which is given’, but by whom, to whom, and for what purpose?

Give and Take of Information
Grain and Granularity
Privacy and Granularity
Information Leakage
