Some Private Use Area code points for numbers.

This document lists some Private Use Area code points for numbers, which it is hoped will be useful for various specialised purposes. The encoding has been produced primarily for entering transcriptions of historical documents into computer files.

This list is published by the present author, and the choice of code points is entirely the present author's own. The code points are only as consistent amongst end users as end users choose to make them. These code points are not a "standard". They are provided simply on the basis that a list is better than no list: the existence of a list will hopefully help efforts to computerise transcriptions of historical documents and to make computer-based analyses of historical documents as portable as possible.


In the course of my research in producing the golden ligatures collection, an interesting matter arose in email correspondence. This concerned the transcription of old German documents which include numbers.

It appears that there is a need to be able to computerise transcriptions of documents which include numbers, clearly distinguishing between the various ways in which numbers could be written in various documents.

It seems reasonable to provide four sets of numbers specifically for the purpose, so that scholars may, if they so choose, use them to transcribe old German documents complete with information about the way in which numbers were expressed in the documents.

The use of these codes is not restricted to transcribing old German documents. That possible application is the reason for their being designated; if other uses are found, then that is all right.

The four methods of representing numbers were presented to me numbered as 1, 2, 3, 4. Accordingly I have chosen the code points so that the numbers 1, 2, 3, 4 are apparent in the hexadecimal representations of the code points.
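As an illustration of that arrangement, here is a minimal sketch in Python. The function name `first_code_point` is my own choice for the example and is not part of the list. It assumes the ranges given later in this document, in which the method number appears as the second hexadecimal digit from the right.

```python
def first_code_point(method: int) -> int:
    """Return the code point of the digit 0 for method 1, 2, 3 or 4."""
    if method not in (1, 2, 3, 4):
        raise ValueError("method must be 1, 2, 3 or 4")
    # The method number occupies the second hexadecimal digit from the right.
    return 0xE600 + method * 0x10


# Print the start of each of the four ranges, for example "1 U+E610".
for method in (1, 2, 3, 4):
    print(method, f"U+{first_code_point(method):04X}")
```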

The opportunity has been taken to also include some code points relating particularly to English Old Style founts.

Those code points come, in numerical order, before the code points intended primarily for encoding transcriptions of old German documents, so they are listed first.


Code points intended primarily for use with Old Style English founts.


The numerical digits are encoded in regular Unicode as 0 to 9 going from U+0030 to U+0039.

In olden days, such as 18th Century English printing, some founts had non-lining numbers.

In the 20th Century, metal type for various Old Style founts was sometimes offered with a choice of traditional non-lining numbers and lining numbers.

It may be useful to have within plain text a way of indicating the difference between using lining figures and non-lining figures.

As there is no presumption as to whether the figures expressed using U+0030 to U+0039 are lining figures or non-lining figures, it is not possible to define whether an alternative set of digits would have lining figures or non-lining figures.

However, it is possible to state that there is an alternative set of digits, intended primarily for use with Old Style English founts, where the characters are, as between being lining and non-lining, different to the way that the characters U+0030 to U+0039 are expressed in the same fount.

The characters are designated as follows.

Ten digit characters such that the appearance of the characters is, as between being lining and non-lining, different to the way that the characters U+0030 to U+0039 are expressed in the same fount.

0 to 9 going from U+E600 to U+E609.
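As a minimal sketch of how plain text might be converted to and from these alternative digits, the following Python may serve. The names `ALTERNATIVE_ZERO`, `to_alternative_digits` and `to_regular_digits` are assumptions made for this example only. Whether the result displays as lining or non-lining figures depends entirely on the fount used.

```python
ALTERNATIVE_ZERO = 0xE600  # code point of the alternative digit 0


def to_alternative_digits(text: str) -> str:
    """Replace the digits U+0030 to U+0039 with U+E600 to U+E609."""
    return "".join(
        chr(ALTERNATIVE_ZERO + ord(ch) - 0x30) if "0" <= ch <= "9" else ch
        for ch in text
    )


def to_regular_digits(text: str) -> str:
    """Replace U+E600 to U+E609 with the digits U+0030 to U+0039."""
    return "".join(
        chr(0x30 + ord(ch) - ALTERNATIVE_ZERO)
        if ALTERNATIVE_ZERO <= ord(ch) <= ALTERNATIVE_ZERO + 9
        else ch
        for ch in text
    )
```

Characters other than the ten digits pass through unchanged, so the two functions can be applied to a whole line of transcription.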


Code points intended primarily for encoding transcriptions of old German documents.


Ten digit characters such that each type body within the ten is of equal width, digits are of equal height and digit bases are all at the same vertical position.

0 to 9 going from U+E610 to U+E619.


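Ten digit characters such that each type body within the ten is of equal width, yet the digits cannot be coded as U+E610 to U+E619, that is, the digits are not all of equal height or the digit bases are not all at the same vertical position.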

0 to 9 going from U+E620 to U+E629.


Ten digit characters such that type bodies within the ten are not of equal width, yet digits are of equal height and digit bases are all at the same vertical position.

0 to 9 going from U+E630 to U+E639.


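Ten digit characters such that type bodies within the ten are not of equal width, yet the digits cannot be coded as U+E630 to U+E639, that is, the digits are not all of equal height or the digit bases are not all at the same vertical position.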

0 to 9 going from U+E640 to U+E649.
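A transcriber's software might use the four sets along the following lines. This is only a sketch in Python, under the assumption that the method numbers 1 to 4 correspond to the ranges U+E610, U+E620, U+E630 and U+E640 in that order. The function names `encode_number` and `decode_char` are hypothetical and chosen for this example.

```python
from typing import Optional, Tuple


def encode_number(digits: str, method: int) -> str:
    """Encode a string of the digits 0 to 9 using the set for the given method."""
    if method not in (1, 2, 3, 4):
        raise ValueError("method must be 1, 2, 3 or 4")
    if not (digits.isascii() and digits.isdigit()):
        raise ValueError("digits must consist only of the characters 0 to 9")
    base = 0xE600 + method * 0x10  # U+E610, U+E620, U+E630 or U+E640
    return "".join(chr(base + int(d)) for d in digits)


def decode_char(ch: str) -> Optional[Tuple[str, int]]:
    """Return (digit, method) if ch lies in one of the four sets, else None."""
    method, value = divmod(ord(ch) - 0xE600, 0x10)
    if 1 <= method <= 4 and value <= 9:
        return str(value), method
    return None
```

For example, `encode_number("1648", 2)` gives the four characters U+E621, U+E626, U+E624 and U+E628, and `decode_char` applied to U+E634 gives the digit 4 together with the method number 3. Keeping the method number alongside the digit value preserves the information about how the number was written in the original document.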


It is noted that, in practical founts, the characters U+0030 to U+0039 will probably all have type bodies of equal width. That is not, as far as I am aware, a definitive requirement of the Unicode standard, and so no such presumption has been made in the preparation of this document.

William Overington

3 June 2002


 

This file is accessible as follows.

http://www.users.globalnet.co.uk/~ngo/numbers.htm