Grain, Granularity
Grain is a term that originates
in photography. It refers to the degree of detail and precision contained
in an image - the pixels or dots per inch - or preserved and communicated
by a given medium, such as a film or screen.
An image is grainy if the imprecision is visible - in other words, even if you can't see the individual dots, you can see that the image is composed of dots.
In information management,
granularity refers to the degree of detail or precision contained in data.
In modelling, granularity refers to the degree of detail and precision contained in a model.
Privacy and confidentiality requirements may govern the available granularity.
The Granularity of Data and the Grain of Existence
Multiplication: Where lots of measurements are taken, granularity refers to the intervals (in space or time) between the measurements.
Division: Where entities are being sorted into categories, granularity refers
to the choice between a large number of narrow categories and a smaller
number of broad categories.
Where a group (such as a community or market) is being divided into subgroups, granularity refers to the number of subgroups.
Where a space (such as a network of requirements or solutions or activities) is being carved into manageable chunks, granularity refers to the size of the chunks. (If a clustering approach is used, then the desired granularity will typically be one of the input parameters of the clustering algorithm.)
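Both senses of granularity described above can be shown in a few lines. The following is only a minimal sketch: the function names `downsample` and `categorise`, the temperature readings and the customer ages are all invented for illustration and do not come from any particular system.

```python
from collections import Counter

def downsample(readings, step):
    """Keep every `step`-th reading: a larger step means a coarser grain."""
    return readings[::step]

def categorise(values, width):
    """Count values per category, where each category spans `width` units."""
    return Counter((v // width) * width for v in values)

# Hypothetical measurements taken once an hour for twelve hours.
hourly = [11, 11, 12, 14, 15, 17, 18, 18, 17, 15, 13, 12]
three_hourly = downsample(hourly, 3)   # same period, wider intervals

# Hypothetical customer ages sorted into narrow and broad categories.
ages = [23, 27, 31, 38, 42, 45, 47, 61]
narrow = categorise(ages, 5)    # many narrow categories
broad = categorise(ages, 20)    # few broad categories

print(three_hourly)
print(len(narrow), "narrow categories;", len(broad), "broad categories")
```

In both cases the granularity is a single parameter (`step` or `width`), which is much the same role it plays as an input to a clustering algorithm.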
There is usually a difference in granularity
between ‘raw’ data and ‘cooked’ data.
The ‘raw’ refers to detailed records
of business transactions and
operations, as well as detailed records of market intelligence. The ‘cooked’
refers to consolidated or derived reports
of business results and
competitive performance. Usually (but not always) the raw is finer (smaller
grained) than the cooked.
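As a rough illustration of the raw/cooked distinction, the sketch below consolidates detailed transaction records into monthly totals. The records, the field names and the function `cook_monthly` are assumptions made up for this example.

```python
from collections import defaultdict

# Hypothetical raw data: one detailed record per business transaction.
raw_transactions = [
    {"date": "2003-01-14", "customer": "A", "amount": 120.0},
    {"date": "2003-01-29", "customer": "B", "amount": 80.0},
    {"date": "2003-02-03", "customer": "A", "amount": 50.0},
]

def cook_monthly(transactions):
    """Consolidate raw transactions into one total per calendar month."""
    totals = defaultdict(float)
    for t in transactions:
        month = t["date"][:7]          # "YYYY-MM" prefix of the ISO date
        totals[month] += t["amount"]
    return dict(totals)

cooked = cook_monthly(raw_transactions)
print(cooked)
```

The cooked report can always be derived from the raw records, but the raw records cannot be recovered from the cooked report: the customer and the day of each transaction are lost in the consolidation.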
The Granularity of Models
When you are modelling something from a single perspective, granularity is often not very important. A single data item is modelled as a single attribute, with a defined granularity. Granularity is not seen as much of an issue for database design – although I think it should be – but it cannot be avoided as an issue for system integration.
Granularity becomes an important issue for data modelling when you are trying to map or merge information across multiple systems or data stores – because the likelihood is that the granularity doesn’t match. It is an issue for the flexibility of the data model and artefacts designed from it.
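A small sketch may make the mismatch concrete. Suppose, purely as an assumption for illustration, that one system classifies products with fine-grained codes and another uses broad categories. The mapping table and both function names below are hypothetical.

```python
# Hypothetical mapping from one system's fine-grained product codes
# to another system's broad categories.
FINE_TO_BROAD = {
    "laptop": "hardware",
    "desktop": "hardware",
    "printer": "hardware",
    "spreadsheet": "software",
    "database": "software",
}

def to_broad(fine_code):
    """Fine to coarse: every fine code has exactly one broad category."""
    return FINE_TO_BROAD[fine_code]

def to_fine_candidates(broad_code):
    """Coarse to fine: the best we can do is list the possible fine codes."""
    return sorted(f for f, b in FINE_TO_BROAD.items() if b == broad_code)

print(to_broad("laptop"))
print(to_fine_candidates("hardware"))
```

Mapping from the finer grain to the coarser one is a simple lookup, whereas the reverse direction is ambiguous. This asymmetry is one reason a merged model tends to be constrained by its coarsest source.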
Granularity is also a problem with distributed systems, especially where web services are involved, since it may affect the number of service calls across a network, perhaps by an order of magnitude. It may also affect the burstiness of the distribution of service.
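The effect on the number of service calls can be sketched without a real network. The class below is a stand-in for a remote service that simply counts how often it is called. The class name, its two operations and the customer record are all invented for this example.

```python
class CustomerService:
    """Stand-in for a remote service; counts calls instead of using a network."""

    def __init__(self, records):
        self._records = records
        self.calls = 0

    def get_field(self, customer_id, field):
        """Fine-grained operation: one call returns one field."""
        self.calls += 1
        return self._records[customer_id][field]

    def get_customer(self, customer_id):
        """Coarse-grained operation: one call returns the whole record."""
        self.calls += 1
        return dict(self._records[customer_id])

# Hypothetical customer record with ten fields.
record = {
    "name": "Ann", "street": "1 High St", "city": "Leeds", "postcode": "LS1 1AA",
    "country": "UK", "phone": "555-0100", "email": "ann@example.com",
    "segment": "retail", "status": "active", "since": "2001",
}

fine = CustomerService({1: record})
by_field = {f: fine.get_field(1, f) for f in record}

coarse = CustomerService({1: record})
whole = coarse.get_customer(1)

print(fine.calls, "fine-grained calls versus", coarse.calls, "coarse-grained call")
```

Both clients end up with the same data, but the fine-grained interface needs one call per field. With ten fields that is the order-of-magnitude difference mentioned above.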
And when you are trying to merge data from several sources into a single
data warehouse, there are significant technical performance implications
of the granularity decision. Some data warehouse experts recommend
storing everything in the data warehouse as atomic data – on the grounds
that the atomic level is the most stable level, and also represents the
highest common factor – but this approach is problematic in some domains.
In any case, it places a great burden on the conceptual data modelling
phase, to ensure that the atomic level has been correctly identified.
Privacy and Granularity
(1) A single occurrence of PERSON for each human being.
(2) A single occurrence of PERSON for each human being in each (socio-economic) role.
(3) Personal information aggregated into demographic or behavioural statistics.
One of the aims of the UK Data Protection Act (and of similar legislation in other countries) is to prevent the combination of data from several sources for purposes other than those for which the data were originally collected. This means that (2) is preferred to (1). For some purposes, we are only allowed access to statistical aggregations of data, but not the raw data themselves. This means that (3) is preferred to (1) and (2).
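The three levels of granularity can be sketched over a toy data set. This is an illustration of the modelling choice only, not a statement of what any legislation requires. The records, field names and function names are hypothetical.

```python
from collections import Counter

# Level (2): one PERSON occurrence per human being per role (hypothetical data).
person_roles = [
    {"person": "p1", "role": "customer", "age_band": "30-39"},
    {"person": "p1", "role": "employee", "age_band": "30-39"},
    {"person": "p2", "role": "customer", "age_band": "40-49"},
    {"person": "p3", "role": "customer", "age_band": "30-39"},
]

def combine_per_person(rows):
    """Level (1): merge the role records into one occurrence per human being."""
    people = {}
    for r in rows:
        entry = people.setdefault(
            r["person"], {"age_band": r["age_band"], "roles": []}
        )
        entry["roles"].append(r["role"])
    return people

def aggregate(rows, role):
    """Level (3): statistics for one role; no individual identifiers survive."""
    return Counter(r["age_band"] for r in rows if r["role"] == role)

combined = combine_per_person(person_roles)
customer_stats = aggregate(person_roles, "customer")

print(combined["p1"]["roles"])
print(dict(customer_stats))
```

Level (1) is the step that combines data across roles, which is the combination the legislation aims to prevent. Level (3) discards the identifiers altogether.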
Then the fun is to predict the behaviour of an individual from the demographic data, for example:
Copyright © 2001-3 Veryard Projects Ltd