Granularity

Grain, Granularity

veryard projects > information management > granularity

grain matters

Grain is a term that originates in photography. It refers to the degree of detail and precision contained in an image - the pixels or dots per inch - or preserved and communicated by a given medium, such as a film or screen.

An image is grainy if the imprecision is visible - in other words, even if you can't see the individual dots, you can see that the image is composed of dots.

In information management, granularity refers to the degree of detail or precision contained in data.

In modelling, granularity refers to the degree of detail and precision contained in a model.

Privacy and confidentiality requirements may govern the available granularity.

The Granularity of Data and the Grain of Existence

In information management, granularity refers to the degree of detail or precision contained in data. Granularity has several dimensions, including time and space (for example, the size of the geographical clusters into which customers may be classified). In some domains, there may be a maximum granularity - in other words, perfect precision. Data with maximum granularity are known as atomic data.

Multiplication

Where lots of measurements are taken, granularity refers to the intervals (in space or time) between the measurements.
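As a minimal sketch of this idea, the Python below takes measurements over the same hour at two different intervals. The function name `sample` and the chosen dates and intervals are illustrative assumptions, not part of any particular system.

```python
from datetime import datetime, timedelta

def sample(start, end, interval):
    """Return measurement times from start (inclusive) to end (exclusive)."""
    times = []
    t = start
    while t < end:
        times.append(t)
        t += interval
    return times

start = datetime(2024, 1, 1, 9, 0)
end = start + timedelta(hours=1)

fine = sample(start, end, timedelta(minutes=1))     # fine grain: one reading per minute
coarse = sample(start, end, timedelta(minutes=15))  # coarse grain: one reading per quarter hour

print(len(fine), len(coarse))  # 60 4
```

The same hour is described by 60 readings or by 4, depending only on the interval chosen.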

Division

Where entities are being sorted into categories, granularity refers to the choice between a large number of narrow categories, or a smaller number of broad categories.

Where a group (such as a community or market) is being divided into subgroups, granularity refers to the number of subgroups.

Where a space (such as a network of requirements or solutions or activities) is being carved into manageable chunks, granularity refers to the size of the chunks. (If a clustering approach is used, then the desired granularity will typically be one of the input parameters of the clustering algorithm.)
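To illustrate granularity as an input parameter, here is a small sketch of a one-dimensional clustering routine. It is a deliberately simple gap-based grouping, chosen for illustration only; the names `cluster_by_gap` and `max_gap` and the sample values are assumptions.

```python
def cluster_by_gap(values, max_gap):
    """Group values so that neighbours within max_gap share a cluster.

    max_gap plays the role of the granularity parameter:
    a small gap gives many narrow clusters, a large gap gives few broad ones.
    """
    clusters = []
    for v in sorted(values):
        if clusters and v - clusters[-1][-1] <= max_gap:
            clusters[-1].append(v)   # close enough: extend the current cluster
        else:
            clusters.append([v])     # too far: start a new cluster
    return clusters

ages = [18, 19, 21, 30, 31, 33, 45, 47, 62]

fine = cluster_by_gap(ages, max_gap=2)     # narrow categories
coarse = cluster_by_gap(ages, max_gap=12)  # broad categories

print(len(fine), len(coarse))  # 4 2
```

The data are identical in both runs; only the requested granularity changes the number and size of the chunks.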

There is usually a difference in granularity between ‘raw’ data and ‘cooked’ data. The ‘raw’ refers to detailed records of business transactions and operations, as well as detailed records of market intelligence. The ‘cooked’ refers to consolidated or derived reports of business results and competitive performance. Usually (but not always) the raw is finer (smaller grained) than the cooked.
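The step from raw to cooked can be sketched as a simple aggregation. The transaction records, the field layout and the function name `cook` below are invented for illustration; a real reporting pipeline would differ.

```python
from collections import defaultdict
from datetime import date

# 'Raw' data: one record per business transaction (date, region, amount).
raw = [
    (date(2024, 1, 5), "north", 120.0),
    (date(2024, 1, 19), "north", 80.0),
    (date(2024, 1, 22), "south", 50.0),
    (date(2024, 2, 3), "north", 200.0),
]

def cook(transactions):
    """Consolidate transactions into totals per (year, month, region)."""
    totals = defaultdict(float)
    for day, region, amount in transactions:
        totals[(day.year, day.month, region)] += amount
    return dict(totals)

cooked = cook(raw)
for key, total in sorted(cooked.items()):
    print(key, total)
```

Four raw records become three cooked totals. The individual days and amounts cannot be recovered from the cooked figures, which is what makes the cooked data coarser.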

The Granularity of Models

In modelling, granularity refers to the degree of detail and precision contained in a model.

When you are modelling something from a single perspective, granularity is often not very important. A single data item is modelled as a single attribute, with a defined granularity. Granularity is not seen as much of an issue for database design – although I think it should be – but it cannot be avoided as an issue for system integration.

Granularity becomes an important issue for data modelling when you are trying to map or merge information across multiple systems or data stores, because the likelihood is that the granularity doesn’t match. It also affects the flexibility of the data model and of the artefacts designed from it.
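A minimal sketch of such a mismatch: suppose one system counts customers by postcode district and another by region. The systems, the mapping table `DISTRICT_TO_REGION`, the function `roll_up` and all the figures are hypothetical.

```python
# Hypothetical mapping from the finer classification to the coarser one.
DISTRICT_TO_REGION = {"SW1": "London", "SW2": "London", "M1": "Manchester"}

system_a = {"SW1": 12, "SW2": 7, "M1": 5}    # customers per district (fine grain)
system_b = {"London": 20, "Manchester": 4}   # customers per region (coarse grain)

def roll_up(fine_counts, mapping):
    """Aggregate fine-grained counts up to the coarser classification."""
    coarse = {}
    for key, count in fine_counts.items():
        region = mapping[key]
        coarse[region] = coarse.get(region, 0) + count
    return coarse

a_by_region = roll_up(system_a, DISTRICT_TO_REGION)

# The two sources can only be compared at the coarser of the two grains.
merged = {
    region: (a_by_region.get(region, 0), system_b.get(region, 0))
    for region in sorted(set(a_by_region) | set(system_b))
}
print(merged)  # {'London': (19, 20), 'Manchester': (5, 4)}
```

Rolling up from district to region is mechanical. Going the other way, from region to district, would require assumptions about how each regional total is distributed.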

Granularity is also a problem with distributed systems, especially where web services are involved, since it may affect the number of service calls across a network, perhaps by an order of magnitude. It may also affect the burstiness of the service traffic, that is, how unevenly the calls are distributed over time.
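The effect on call counts can be sketched with a stand-in service that counts calls instead of crossing a network. The class `CustomerService`, its two operations and the ten-field record are assumptions made for illustration.

```python
class CustomerService:
    """Stand-in for a remote service; it only counts the calls it receives."""

    def __init__(self, records):
        self._records = records
        self.calls = 0

    def get_field(self, customer_id, field):
        """Fine-grained operation: one call per field."""
        self.calls += 1
        return self._records[customer_id][field]

    def get_customer(self, customer_id):
        """Coarse-grained operation: one call for the whole record."""
        self.calls += 1
        return dict(self._records[customer_id])

FIELDS = [f"field{i}" for i in range(10)]
records = {1: {name: index for index, name in enumerate(FIELDS)}}

fine = CustomerService(records)
via_fields = {name: fine.get_field(1, name) for name in FIELDS}

coarse = CustomerService(records)
via_record = coarse.get_customer(1)

print(fine.calls, coarse.calls)  # 10 1
```

Both clients end up with the same data, but the fine-grained interface needs ten calls where the coarse-grained one needs a single call.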

And when you are trying to merge data from several sources into a single data warehouse, the granularity decision has significant implications for technical performance. Some data warehouse experts recommend storing everything in the data warehouse as atomic data, on the grounds that the atomic level is the most stable level and also represents the highest common factor, but this approach is problematic in some domains. In any case, it places a great burden on the conceptual data modelling phase, to ensure that the atomic level has been correctly identified.

Privacy and Granularity

Three possible ways of capturing personal information, in database records for each PERSON:

1 A single occurrence of PERSON for each human being.
2 A single occurrence of PERSON for each human being in each (socio-economic) role.
3 Personal information aggregated into demographic or behavioural statistics.

One of the aims of the UK Data Protection Act (and of similar legislation in other countries) is to prevent the combination of data from several sources, for purposes other than those for which the data were originally collected. This means that (2) is preferred to (1). For some purposes, we are only allowed access to statistical aggregations of data, but not the raw data themselves. This means that (3) is preferred to (1) and (2).

Then the fun is to predict the behaviour of an individual from the demographic data, for example:

what is the probability that this person will respond positively to this mailshot?
what is the probability that this person will prove a good credit risk?

Thus it is possible to dis-aggregate and restore data, to return to the individual person from the anonymous totals and averages. Of course, this process introduces errors and inaccuracies. Does this benefit the person whose privacy is at stake? Hardly! Imagine: you are denied a loan because you live in a dubious district, or you belong to some demographic category that the statisticians depreciate.
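The error introduced by this kind of dis-aggregation can be sketched in a few lines. The people, districts, outcomes and the 0.5 threshold below are all invented for illustration, and the names `district_rates` and `predict_default` are hypothetical.

```python
# Invented individual records: (name, district, actually_defaulted).
people = [
    ("Ann", "D1", False),
    ("Bob", "D1", True),
    ("Cy", "D1", True),
    ("Dee", "D1", True),
    ("Eve", "D2", False),
    ("Fay", "D2", False),
    ("Gus", "D2", False),
    ("Hal", "D2", True),
]

def district_rates(records):
    """Aggregate individuals into a default rate per district."""
    totals, defaults = {}, {}
    for _, district, defaulted in records:
        totals[district] = totals.get(district, 0) + 1
        defaults[district] = defaults.get(district, 0) + int(defaulted)
    return {d: defaults[d] / totals[d] for d in totals}

rates = district_rates(people)  # the anonymous averages

def predict_default(district, threshold=0.5):
    """Judge an individual purely by the district average."""
    return rates[district] >= threshold

misjudged = [
    name for name, district, defaulted in people
    if predict_default(district) != defaulted
]
print(rates)      # {'D1': 0.75, 'D2': 0.25}
print(misjudged)  # ['Ann', 'Hal']
```

In this sketch Ann would be refused credit because of her district, although she has never defaulted. The aggregate is accurate about the district and wrong about her.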
