Some Private Use Area code points for courtyard codes for choosing collections of code points.

Here are some courtyard codes for choosing collections of code points.

A fundamental feature of using courtyard codes to designate the meanings of the Private Use Area code points which are being used within a document, at some particular place in that document, is that the designations are carried out upon collections of code points. Each collection consists of 256 code points from the Unicode Private Use Area, running from U+HHHH00 to U+HHHHFF for some particular value of HHHH for each collection. Thus, at any particular time, any one code point from a particular collection always has the same designation as the other 255 code points in that collection.

It is important to remember that courtyard codes indicate overall designations of a collection of 256 code points within the Private Use Area in a particular document at a particular place within that document, not the designation of individual code points within that collection of 256 code points. So, for example, a designation that the code points are being used for cuneiform characters from some particular civilization as codified by some particular researcher or research group with a font available from a particular web address is the sort of information to be expected, not the meaning associated with some particular code point within that set of characters.

There are 537 such collections, 25 from plane 0 of Unicode, 256 from plane 15 of Unicode and 256 from plane 16 of Unicode. These are regarded for the purposes of courtyard codes as being in a contiguous range with indexes from E0 to F8, then F00 to FFF, then 1000 to 10FF.
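
As an illustration of the arithmetic, the following minimal sketch in Python lists the collection indexes and converts an index into the range of code points which it covers. The function names are hypothetical and are not part of courtyard codes themselves.

```python
# A sketch only: the function names below are illustrative assumptions.

def is_valid_collection(index: int) -> bool:
    """Return True if index is one of the 537 collection indexes."""
    return (
        0xE0 <= index <= 0xF8          # plane 0: 25 collections
        or 0xF00 <= index <= 0xFFF     # plane 15: 256 collections
        or 0x1000 <= index <= 0x10FF   # plane 16: 256 collections
    )


def collection_range(index: int) -> tuple[int, int]:
    """Return the first and last code point of the collection."""
    if not is_valid_collection(index):
        raise ValueError(f"{index:X} is not a valid collection index")
    first = index << 8                 # append the two hexadecimal digits 00
    return first, first | 0xFF         # ... through to the digits FF


def all_collections() -> list[int]:
    """Return every collection index in the contiguous order described above."""
    return [*range(0xE0, 0xF9), *range(0xF00, 0x1000), *range(0x1000, 0x1100)]
```

For example, collection_range(0xE7) gives the pair (0xE700, 0xE7FF), and the list returned by all_collections() has 537 entries.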

Readers are, however, notified that the only reason that feature is fundamental to using courtyard codes is that this is the way that the present author has chosen to implement courtyard codes. Some other method could possibly be devised, yet this is the method chosen for this particular set of code points.

Each collection has associated with it a string of Unicode characters, called a courtyard courtesy string. The contents of those courtyard courtesy strings are used to establish, at any particular time, what meaning, if a meaning is known, the code points in that collection are intended to have: the values in the courtyard courtesy string being settable from within the document itself. A null courtyard courtesy string means that the present meaning of that collection of Private Use Area code points is unknown.

The way to use these collections within a document is as follows.

Firstly, nominate a range of collections, from a lower limit to an upper limit, where each collection within the range is to have the same designation. The lower limit may be the same as the upper limit, for the purpose of nominating just one collection.

There are 26 courtyard collection nomination codes, from U+F300 to U+F319 inclusive. U+F300 to U+F318 each nominate one particular collection, from collection E0 through to collection F8, and U+F319 nominates the collection whose index is in the accumulator register, provided that the number is a valid collection index. So, for example, if the accumulator contains the hexadecimal value FD2 then U+F319 will nominate collection FD2, which covers code points U+0FD200 to U+0FD2FF. For the avoidance of doubt, U+F319 can be used to nominate collections in plane 0, plane 15 and plane 16.

Nomination of a collection clears the courtyard courtesy string for that collection, and nomination of a range clears all of the courtyard courtesy strings in the range, though in both cases there is a special rule regarding collection F3. The special rule exists because changing the courtyard courtesy string for collection F3 affects the use of courtyard codes within the document. This rule is explained later in this document.

After that, each code point for a courtyard courtesy character, which is a character in the range U+F320 through to U+F37E, is appended to the courtyard courtesy string for each of the collections in the nominated range.

The nomination of collections and altering the contents of courtyard courtesy strings may take place whenever needed within a document. Often, a designation at the start of a document will be sufficient. Facilities for more complicated usage are however provided.

The codes in the range U+F320 to U+F37E are courtyard courtesy character versions of the codes U+0020 to U+007E of regular Unicode. Thus, for example, U+F342 is COURTYARD COURTESY CHARACTER LATIN CAPITAL LETTER B because U+0042 is LATIN CAPITAL LETTER B.
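
The correspondence is a fixed offset of hexadecimal F300, so conversion in either direction is simple. The following is a minimal sketch in Python; the names COURTESY_OFFSET, to_courtesy and from_courtesy are hypothetical names chosen for this illustration.

```python
# A sketch only: the names used here are illustrative assumptions.

COURTESY_OFFSET = 0xF300  # U+0020..U+007E map onto U+F320..U+F37E


def to_courtesy(text: str) -> str:
    """Convert printable ASCII text into courtyard courtesy characters."""
    out = []
    for ch in text:
        cp = ord(ch)
        if not 0x20 <= cp <= 0x7E:
            raise ValueError(f"U+{cp:04X} has no courtyard courtesy character")
        out.append(chr(cp + COURTESY_OFFSET))
    return "".join(out)


def from_courtesy(courtesy_string: str) -> str:
    """Convert courtyard courtesy characters back into printable ASCII text."""
    out = []
    for ch in courtesy_string:
        cp = ord(ch)
        if not 0xF320 <= cp <= 0xF37E:
            raise ValueError(f"U+{cp:04X} is not a courtyard courtesy character")
        out.append(chr(cp - COURTESY_OFFSET))
    return "".join(out)
```

For example, to_courtesy("B") gives the single character U+F342.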

The meaning of those courtyard courtesy strings is determined by the end user, or by the software package that he or she is using in conjunction with courtyard courtesy strings, in accordance with rules explained later in this present document.


In order to nominate either one collection or a range of collections conveniently, the following format is used. There are variables named LOWER and UPPER which contain collection indexes, in whatever format the programmer of the software chooses to use internally within that software. There is also a Boolean variable named ONLY_ENTER_INTO_UPPER which is initially set as false. When a courtyard collection nomination code is processed, its value is entered into both LOWER and UPPER if ONLY_ENTER_INTO_UPPER is false, or only into UPPER if ONLY_ENTER_INTO_UPPER is true. The value of ONLY_ENTER_INTO_UPPER is then toggled, that is, changed to the opposite of its present state.

Although it should not be necessary to use explicit codes to set the value of ONLY_ENTER_INTO_UPPER, nevertheless, for completeness, and in case a need is found, codes for setting the value of ONLY_ENTER_INTO_UPPER are included.

Receipt of any courtyard courtesy character sets the value of ONLY_ENTER_INTO_UPPER to false. This is so that if a courtyard collection nomination code is received after a courtyard courtesy character is received, the presumption is that a new nomination sequence is being started.

The purpose of the ONLY_ENTER_INTO_UPPER flag is that, if only one collection is being nominated, it need not be nominated twice, namely as both LOWER and UPPER.
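
The behaviour of LOWER, UPPER and ONLY_ENTER_INTO_UPPER described above can be sketched in Python as follows. This is one possible arrangement, not a definitive implementation; the class name and the method names are hypothetical, and collection indexes are held here as ordinary integers.

```python
# A sketch only: class and method names are illustrative assumptions.

class NominationRegisters:
    def __init__(self):
        self.lower = None                    # LOWER
        self.upper = None                    # UPPER
        self.only_enter_into_upper = False   # ONLY_ENTER_INTO_UPPER, initially false

    def nominate(self, index: int) -> None:
        """Process a courtyard collection nomination code for the given index."""
        if not self.only_enter_into_upper:
            self.lower = index               # flag false: enter into both registers
        self.upper = index                   # UPPER is always entered
        self.only_enter_into_upper = not self.only_enter_into_upper  # toggle

    def courtesy_character_received(self) -> None:
        """Any courtyard courtesy character sets the flag to false."""
        self.only_enter_into_upper = False

    def set_only_enter_into_upper(self, value: bool) -> None:
        """Explicit setting of the flag, provided for completeness."""
        self.only_enter_into_upper = value
```

With this arrangement, nominating E5 and then E7 leaves LOWER as E5 and UPPER as E7, whereas nominating E7, receiving a courtyard courtesy character and then nominating E9 leaves both LOWER and UPPER as E9.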


The code U+F313 is the courtyard collection nomination code which would override the setting of the designations of the courtyard. This is not normally done, yet facilities are provided, just in case they are needed for a special purpose, such as where a particular character set has been produced which uses code points in the U+F300 to U+F3FF range and the document author wishes to include one or more of those code points within a document which is organized around a basis of using courtyard codes. This takes a number of code point usages to achieve, yet it is important that such a facility is available just in case it is ever needed.

The rule is that the F3 block is only overwritten if the COURTYARD_OVERWRITE_PERMITTED flag is set. By default, the COURTYARD_OVERWRITE_PERMITTED flag is false.

Once the courtyard is overwritten, it is possible to get courtyard control back within the same document provided that the overriding is done for a specified number of characters by using the U+F31D code point.

The sequence for overwriting the F3 collection depends upon whether overwriting is permanent or temporary.

Permanent overwriting is achieved by the following method.

U+F31B U+F313 followed by a sequence of codes in the range U+F320 to U+F37E to produce a courtyard courtesy string describing the new designation of the F3 collection.

Temporary overwriting is achieved by the following method.

Use courtyard codes to set up a value in the accumulator register, then use U+F31D U+F31B U+F313 followed by a sequence of codes in the range U+F320 to U+F37E to produce a courtyard courtesy string describing the new designation of the F3 collection, which is to last for the specified number of printable characters, then revert to being for courtyard codes.

The implication of the U+F31D mechanism is that if U+F31D is not used and the F3 collection designation is overwritten, then the number of printable characters for which the overwriting is to last is infinite. Naturally, as computers cannot store the number infinity directly, a software flag needs to be implemented so that a system knows whether to count the number of printable characters processed after such an overwriting so that a return to designating the F3 collection to be for courtyard codes can be carried out or whether to just leave the overwriting as permanent.
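
One way in which such a flag and counter might be arranged is sketched below in Python. This is only an assumption about how an implementer might proceed: the class name and method names are hypothetical, the absence of a count is represented by None, and the decision as to which characters count as printable is left to the calling software.

```python
# A sketch only: names and the use of None for "no limit" are assumptions.
from typing import Optional


class F3OverwriteState:
    def __init__(self):
        self.overwritten = False              # is F3 presently overwritten?
        self.remaining: Optional[int] = None  # None means permanent overwriting

    def begin_overwrite(self, printable_count: Optional[int] = None) -> None:
        """Start overwriting; pass a count only if U+F31D was used."""
        if printable_count is not None and printable_count <= 0:
            raise ValueError("the count of printable characters must be positive")
        self.overwritten = True
        self.remaining = printable_count

    def printable_character_processed(self) -> None:
        """Count one printable character; revert to courtyard codes at zero."""
        if self.overwritten and self.remaining is not None:
            self.remaining -= 1
            if self.remaining == 0:
                self.overwritten = False      # F3 is for courtyard codes again
                self.remaining = None
```

Using None rather than a very large number avoids any need to represent infinity, since the counting step is simply skipped when the overwriting is permanent.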

In practice, such overwriting of the F3 collection designation is likely to be a very rare occurrence, particularly if many people agree to keep the U+F3.. block clear of code point allocations which would clash with courtyard codes.

However, it is good to know that courtyard codes have the capability to manage to function even if such a clash of code point allocations within the F3 collection occurs.

In passing, perhaps it may be mentioned that clashes of Private Use Area code point allocations in other collections may not be that unusual at all. In cases where it is desired to utilize characters from two overlapping allocations in one document the changing of the designation of a collection is just a straightforward application of courtyard codes.


When using LOWER and UPPER, the range may, if desired, run from one plane to another, possibly including the whole of an intervening plane within the range.

U+F300 NOMINATE THE E0 COLLECTION, ENTERING THE VALUE E0 INTO THE UPPER REGISTER AND POSSIBLY THE LOWER REGISTER
U+F301 NOMINATE THE E1 COLLECTION, ENTERING THE VALUE E1 INTO THE UPPER REGISTER AND POSSIBLY THE LOWER REGISTER
U+F302 NOMINATE THE E2 COLLECTION, ENTERING THE VALUE E2 INTO THE UPPER REGISTER AND POSSIBLY THE LOWER REGISTER
U+F303 NOMINATE THE E3 COLLECTION, ENTERING THE VALUE E3 INTO THE UPPER REGISTER AND POSSIBLY THE LOWER REGISTER
U+F304 NOMINATE THE E4 COLLECTION, ENTERING THE VALUE E4 INTO THE UPPER REGISTER AND POSSIBLY THE LOWER REGISTER
U+F305 NOMINATE THE E5 COLLECTION, ENTERING THE VALUE E5 INTO THE UPPER REGISTER AND POSSIBLY THE LOWER REGISTER
U+F306 NOMINATE THE E6 COLLECTION, ENTERING THE VALUE E6 INTO THE UPPER REGISTER AND POSSIBLY THE LOWER REGISTER
U+F307 NOMINATE THE E7 COLLECTION, ENTERING THE VALUE E7 INTO THE UPPER REGISTER AND POSSIBLY THE LOWER REGISTER
U+F308 NOMINATE THE E8 COLLECTION, ENTERING THE VALUE E8 INTO THE UPPER REGISTER AND POSSIBLY THE LOWER REGISTER
U+F309 NOMINATE THE E9 COLLECTION, ENTERING THE VALUE E9 INTO THE UPPER REGISTER AND POSSIBLY THE LOWER REGISTER
U+F30A NOMINATE THE EA COLLECTION, ENTERING THE VALUE EA INTO THE UPPER REGISTER AND POSSIBLY THE LOWER REGISTER
U+F30B NOMINATE THE EB COLLECTION, ENTERING THE VALUE EB INTO THE UPPER REGISTER AND POSSIBLY THE LOWER REGISTER
U+F30C NOMINATE THE EC COLLECTION, ENTERING THE VALUE EC INTO THE UPPER REGISTER AND POSSIBLY THE LOWER REGISTER
U+F30D NOMINATE THE ED COLLECTION, ENTERING THE VALUE ED INTO THE UPPER REGISTER AND POSSIBLY THE LOWER REGISTER
U+F30E NOMINATE THE EE COLLECTION, ENTERING THE VALUE EE INTO THE UPPER REGISTER AND POSSIBLY THE LOWER REGISTER
U+F30F NOMINATE THE EF COLLECTION, ENTERING THE VALUE EF INTO THE UPPER REGISTER AND POSSIBLY THE LOWER REGISTER
U+F310 NOMINATE THE F0 COLLECTION, ENTERING THE VALUE F0 INTO THE UPPER REGISTER AND POSSIBLY THE LOWER REGISTER
U+F311 NOMINATE THE F1 COLLECTION, ENTERING THE VALUE F1 INTO THE UPPER REGISTER AND POSSIBLY THE LOWER REGISTER
U+F312 NOMINATE THE F2 COLLECTION, ENTERING THE VALUE F2 INTO THE UPPER REGISTER AND POSSIBLY THE LOWER REGISTER
U+F313 NOMINATE THE F3 COLLECTION, ENTERING THE VALUE F3 INTO THE UPPER REGISTER AND POSSIBLY THE LOWER REGISTER
U+F314 NOMINATE THE F4 COLLECTION, ENTERING THE VALUE F4 INTO THE UPPER REGISTER AND POSSIBLY THE LOWER REGISTER
U+F315 NOMINATE THE F5 COLLECTION, ENTERING THE VALUE F5 INTO THE UPPER REGISTER AND POSSIBLY THE LOWER REGISTER
U+F316 NOMINATE THE F6 COLLECTION, ENTERING THE VALUE F6 INTO THE UPPER REGISTER AND POSSIBLY THE LOWER REGISTER
U+F317 NOMINATE THE F7 COLLECTION, ENTERING THE VALUE F7 INTO THE UPPER REGISTER AND POSSIBLY THE LOWER REGISTER
U+F318 NOMINATE THE F8 COLLECTION, ENTERING THE VALUE F8 INTO THE UPPER REGISTER AND POSSIBLY THE LOWER REGISTER
U+F319 ACT WITH A NOMINATION CODE TO CHOOSE THE COLLECTION GIVEN BY THE VALUE OF THE ACCUMULATOR, ENTERING THE VALUE INTO THE UPPER REGISTER AND POSSIBLY THE LOWER REGISTER
U+F31A SET THE COURTYARD_OVERWRITE_PERMITTED FLAG TO FALSE
U+F31B SET THE COURTYARD_OVERWRITE_PERMITTED FLAG TO TRUE
U+F31C SET THE F3 COURTYARD COURTESY STRING TO CONTAIN THE SINGLE CHARACTER U+F32A INDICATING THAT COURTYARD CODES ARE IN USE
U+F31D SET THE NUMBER OF PRINTABLE CHARACTERS FOR WHICH OVERRIDING OF THE F3 COLLECTION IS TO TAKE PLACE (BEFORE RETURNING TO THE F3 COLLECTION REPRESENTING COURTYARD CODES) TO THE VALUE IN THE ACCUMULATOR
U+F31E SET THE ONLY_ENTER_INTO_UPPER FLAG TO FALSE
U+F31F SET THE ONLY_ENTER_INTO_UPPER FLAG TO TRUE

The names of U+F320 to U+F37E are in each case produced by concatenating the words COURTYARD COURTESY CHARACTER, a space and the name of the corresponding regular Unicode character in the range U+0020 to U+007E.


A string starting with U+F35B, the courtyard courtesy character version of [, and ending with U+F35D, the courtyard courtesy character version of ], and using elsewhere within the string only characters from the range U+F330 to U+F339, the courtyard courtesy character versions of 0 to 9, and from the range U+F341 to U+F346, the courtyard courtesy character versions of A to F, gives a hexadecimal type tray designation.

A string starting with U+F37B, the courtyard courtesy character version of {, and ending with U+F37D, the courtyard courtesy character version of }, contains the web address, or DVB-MHP address, of a font file.

A string starting with U+F33C, the courtyard courtesy character version of <, and ending with U+F33E, the courtyard courtesy character version of >, contains the web address, or DVB-MHP address, of a text file about a character set.

A string that starts with U+F32A, the courtyard courtesy character version of *, is used to indicate that the F3 collection is being used for courtyard codes. This only has meaning after a U+F313 code has been used to nominate collection F3. In this case, the courtyard courtesy character version of * is enough to indicate that usage; any additional characters in the string are just comment.

A string starting with a courtyard courtesy character version of a character other than [ or { or < or * contains direct text about the font. However, users are asked to avoid using any character other than those in the range U+F341 to U+F35A or in the range U+F361 to U+F37A as the first character of a string of courtyard courtesy characters so as to leave scope for the possible addition of further special format strings of courtyard courtesy characters. U+F341 to U+F35A are the courtyard courtesy character versions of A to Z and U+F361 to U+F37A are the courtyard courtesy character versions of a to z.
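
The string formats described above can be distinguished by the first character of the string. The following Python sketch shows one way of doing so. The function names and the result labels are hypothetical, and the treatment of a string which has an opening bracket but does not fit its format as "malformed" is an assumption made for this illustration, as that case is not specified above.

```python
# A sketch only: function names, labels and the "malformed" case are assumptions.

COURTESY_OFFSET = 0xF300
BRACKETED = {"[": ("]", "type-tray"),
             "{": ("}", "font-address"),
             "<": (">", "text-address")}
HEX_DIGITS = set("0123456789ABCDEF")


def courtesy(text: str) -> str:
    """Helper: convert printable ASCII into courtyard courtesy characters."""
    return "".join(chr(ord(ch) + COURTESY_OFFSET) for ch in text)


def classify_courtesy_string(courtesy_string: str) -> tuple[str, str]:
    """Return (kind, content); the input must hold only courtesy characters."""
    text = "".join(chr(ord(ch) - COURTESY_OFFSET) for ch in courtesy_string)
    if not text:
        return ("unknown", "")                # null string: meaning unknown
    first = text[0]
    if first == "*":
        # Only meaningful when the string belongs to collection F3;
        # the caller is expected to check that. The rest is comment.
        return ("courtyard-codes", text[1:])
    if first in BRACKETED:
        closer, kind = BRACKETED[first]
        body = text[1:-1]
        well_formed = len(text) >= 2 and text.endswith(closer)
        if kind == "type-tray":               # only 0 to 9 and A to F allowed
            well_formed = well_formed and body != "" and set(body) <= HEX_DIGITS
        return (kind, body) if well_formed else ("malformed", text)
    return ("direct-text", text)
```

For example, classify_courtesy_string(courtesy("[E001]")) gives ("type-tray", "E001").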

The use of the courtyard courtesy character versions of [ and ] characters implies the existence of a registry of type trays. This is an issue which will need to be resolved. However, the codes and the method are published so that there is a chance of a registry becoming implemented.

In the event of a registry being established it will be possible to uniquely define a Private Use Area code point allocation in documentation.

Consider, for example, the code point U+E707, which is used within the golden ligatures collection for a ct ligature. Suppose that a type tray with the designation E001 were defined which contains the golden ligatures collection, as well as, perhaps, various other Private Use Area code point allocations.

It would then be possible, in documentation, to refer to the ct ligature character as A+E707.E001 so that the designation of the type tray is added after the hexadecimal point. Please note that the code format A+ is used in this example, because adding the hexadecimal point and the type tray means that the U+ designation is no longer appropriate.
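
Software which produces or reads documentation could build and take apart such references mechanically. The following Python sketch is based solely upon the single example A+E707.E001; the function names are hypothetical, and the assumption that the code point has four to six hexadecimal digits and that the type tray designation is a run of hexadecimal digits is a generalization from that one example.

```python
# A sketch only: names and the exact pattern are assumptions from one example.
import re

_A_PLUS = re.compile(r"A\+([0-9A-F]{4,6})\.([0-9A-F]+)")


def format_a_plus(code_point: int, type_tray: str) -> str:
    """Build a reference such as A+E707.E001."""
    return f"A+{code_point:04X}.{type_tray}"


def parse_a_plus(reference: str) -> tuple[int, str]:
    """Split a reference into (code point, type tray designation)."""
    match = _A_PLUS.fullmatch(reference)
    if match is None:
        raise ValueError(f"not an A+ reference: {reference!r}")
    return int(match.group(1), 16), match.group(2)
```

For example, format_a_plus(0xE707, "E001") gives the text A+E707.E001, and parse_a_plus applied to that text gives back the pair (0xE707, "E001").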

In a document where it is desired to specify that the E001 type tray is in use for collection E7 (that is, for U+E700 to U+E7FF) the sequence of code points would be as follows.

U+F307 U+F35B U+F345 U+F330 U+F330 U+F331 U+F35D

That is, a sequence of seven courtyard codes. The U+F307 indicates that collection E7 is being nominated. The six courtyard courtesy characters U+F35B U+F345 U+F330 U+F330 U+F331 U+F35D form the courtyard courtesy string corresponding to collection E7. These six characters are the courtyard courtesy character versions of the sequence [E001]. This format indicates that type tray E001 is in use.

If the following sequence were used, it would mean that type tray E001 is to be used for all Private Use Area code points within the range U+E500 to U+E7FF, as the three collections E5, E6 and E7 are nominated as part of the same designation: U+F305 enters E5 into both LOWER and UPPER, and U+F307 then enters E7 into UPPER only.

U+F305 U+F307 U+F35B U+F345 U+F330 U+F330 U+F331 U+F35D
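
The two sequences above can be traced with a short Python sketch. It is deliberately incomplete and is offered only as an illustration: it handles the nomination codes U+F300 to U+F318 and the courtyard courtesy characters, it ignores the accumulator and every other code, and it simply leaves collection F3 untouched, which reflects COURTYARD_OVERWRITE_PERMITTED being false by default. The function names are hypothetical. What should happen if LOWER is greater than UPPER is not stated above; in this sketch the nominated range is then empty.

```python
# A sketch only: names are assumptions; many courtyard codes are not handled.

def nominated(lower: int, upper: int) -> list[int]:
    """Collections LOWER..UPPER within plane 0, leaving F3 untouched."""
    return [i for i in range(lower, upper + 1) if i != 0xF3]


def apply_courtyard_codes(code_points: list[int]) -> dict[int, str]:
    """Return collection index -> courtesy string (shown as plain ASCII)."""
    strings: dict[int, str] = {}
    lower = upper = None
    only_enter_into_upper = False
    for cp in code_points:
        if 0xF300 <= cp <= 0xF318:                 # nomination code
            index = 0xE0 + (cp - 0xF300)
            if not only_enter_into_upper:
                lower = index
            upper = index
            only_enter_into_upper = not only_enter_into_upper
            for i in nominated(lower, upper):      # nomination clears strings
                strings[i] = ""
        elif 0xF320 <= cp <= 0xF37E:               # courtesy character
            only_enter_into_upper = False
            if lower is None:
                continue                           # nothing nominated yet
            for i in nominated(lower, upper):
                strings[i] += chr(cp - 0xF300)     # append to every collection
        # every other code point is ignored in this sketch
    return strings
```

Applied to the seven code points of the first sequence, the result has the single entry E7 with the string [E001]; applied to the eight code points of the second sequence, the result has entries for E5, E6 and E7, each with the string [E001].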

Thus hopefully these courtyard collection nomination codes and courtyard courtesy characters and courtyard courtesy strings provide a basis for the future for making more effective use of the Private Use Area by means of allowing a document to carry within it an indication of the meanings of any Private Use Area codes used within that document.

William Overington

4 July 2002


 

This file is accessible as follows.

http://www.users.globalnet.co.uk/~ngo/courtcho.htm