The comet circumflex system.

William Overington

Copyright 2002 William Overington

The comet circumflex system is my research on internationalization and localization, carried out in relation to email conversations, creative writing, web publishing and DVB-MHP broadcasting. Some readers may have arrived here through an interest in DVB-MHP broadcasting, others through other interests or simply by serendipity, so I mention that DVB-MHP means Digital Video Broadcasting - Multimedia Home Platform.

Although the system is usually named comet circumflex, using two words, the designation comet_circumflex, using an underscore, is intended to be used at least once in each document so that documents on the comet circumflex system may be found more easily using web search engines.

The reason for the name comet circumflex first needs to be explained. I was seeking a sequence of regular Unicode characters which would be highly unlikely to occur otherwise, so that software could monitor a text stream for the sequence in order to detect a specialised form of markup, namely encoded phrases of text for internationalization and localization. The presence of an encoded phrase in an otherwise plain text stream could then be signalled by a key consisting of such a sequence.

There was a discussion about combining characters in the Unicode mailing list, which can be joined from a link at the http://www.unicode.org website. I looked at the various combining characters and decided that a good key for entering a specialised markup universe would be a regular Unicode character as a base character, then one or more combining characters chosen so that the sequence would otherwise be unlikely, followed by U+20E3 COMBINING ENCLOSING KEYCAP. An all-Unicode font would then produce a graphic representation of the key without any prior arrangement being necessary, so that such marked-up sequences could be produced using just a regular all-Unicode plain text editor. The technique has the advantage that entry to the markup universe is not caused inadvertently when parsing an ordinary text file which does not require it.

I reasoned that, as my application is about languages, an accented character would be nice, yet I was reluctant to use an accented letter as that might be used in some real language. So I thought of using a symbol with an accent, as a sequence otherwise unlikely to occur, and looked for a suitable one. I liked U+2604 COMET as a base character. For an accent, a circumflex accent seemed to give a definite look of connection with languages.

U+0302 COMBINING CIRCUMFLEX ACCENT

My idea for the system at that time was to encode the phrases as index numbers and also to encode any parameter words within a sentence as index numbers. In the event, my later idea of using Esperanto to link to a full collection of words was compatible with the choice of the circumflex accent, as most of the accents used in Esperanto are circumflex accents.

Thus I decided to use the following three-character sequence as the key.

U+2604 U+0302 U+20E3
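
As an illustration, the key can be written down and inspected in a few lines of Python. This is only a sketch: the names ENTRY_KEY, describe and contains_entry_key are hypothetical and are not part of the comet circumflex system itself.

```python
import unicodedata

# The three-character entry key: COMET, COMBINING CIRCUMFLEX ACCENT,
# COMBINING ENCLOSING KEYCAP.
ENTRY_KEY = "\u2604\u0302\u20E3"


def describe(sequence):
    """Return 'U+XXXX NAME' for each character (illustrative helper)."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch)}" for ch in sequence]


def contains_entry_key(text):
    """Report whether the entry key occurs anywhere in the text."""
    return ENTRY_KEY in text


if __name__ == "__main__":
    for line in describe(ENTRY_KEY):
        print(line)
```

A comet on its own, or a comet followed by some other combining character, does not match, since the test is for the complete three-character sequence. As there is no precomposed comet with a circumflex in Unicode, normalizing the text to NFC should leave the key unchanged.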

The particular markup system for my research on internationalization and localization thus became named the comet circumflex system.

I decided to use the following code, in conjunction with the comet and the combining circumflex accent, to signal exit from the specialised markup system.

U+20E2 COMBINING ENCLOSING SCREEN

This gave the following sequence to exit from the specialised markup universe.

U+2604 U+0302 U+20E2
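
The way that software might monitor a text stream for the entry and exit sequences can be sketched as follows. This is a minimal sketch under assumptions: the function name split_stream and the labels "text" and "markup" are hypothetical, and an entry key with no matching exit sequence is simply treated as running to the end of the text, which the system itself does not specify.

```python
ENTRY_KEY = "\u2604\u0302\u20E3"  # comet, circumflex, enclosing keycap
EXIT_KEY = "\u2604\u0302\u20E2"   # comet, circumflex, enclosing screen


def split_stream(text):
    """Split text into ('text', chunk) and ('markup', chunk) pairs.

    Ordinary text passes through untouched; only the characters between
    an entry key and the next exit key are labelled as markup.
    """
    pieces = []
    pos = 0
    while pos < len(text):
        start = text.find(ENTRY_KEY, pos)
        if start == -1:
            # No further entry key: the rest is ordinary text.
            pieces.append(("text", text[pos:]))
            break
        if start > pos:
            pieces.append(("text", text[pos:start]))
        body_start = start + len(ENTRY_KEY)
        end = text.find(EXIT_KEY, body_start)
        if end == -1:
            # Assumption: unterminated markup runs to the end of the text.
            pieces.append(("markup", text[body_start:]))
            break
        pieces.append(("markup", text[body_start:end]))
        pos = end + len(EXIT_KEY)
    return pieces


if __name__ == "__main__":
    sample = "Hello " + ENTRY_KEY + "123" + EXIT_KEY + " world"
    print(split_stream(sample))
```

A file which contains no entry key comes back as a single "text" piece, which reflects the point that ordinary text does not enter the markup universe inadvertently.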

I decided to separate the codes for parameters from the main sentence code by using circled numbers from the Unicode characters in the range U+2460 to U+2473.

U+2460 CIRCLED DIGIT ONE

through to

U+2468 CIRCLED DIGIT NINE

and

U+2469 CIRCLED NUMBER TEN

through to

U+2473 CIRCLED NUMBER TWENTY

There is also

U+24EA CIRCLED DIGIT ZERO
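
The use of circled numbers as separators can be sketched in Python. The layout assumed here is not specified above and is only one possibility: the sentence code comes first as ordinary digits, and each circled number is followed by the value of the parameter with that number. The names circled_value and parse_sentence are hypothetical.

```python
def circled_value(ch):
    """Return the number shown by a circled number character, or None.

    U+24EA is circled zero; U+2460 to U+2473 are circled 1 to 20.
    """
    cp = ord(ch)
    if cp == 0x24EA:
        return 0
    if 0x2460 <= cp <= 0x2473:
        return cp - 0x2460 + 1
    return None


def parse_sentence(body):
    """Split a markup body into (sentence_code, {parameter_number: value}).

    Assumed layout: sentence code, then for each parameter a circled
    number followed by the characters of that parameter's value.
    """
    sentence_code = ""
    params = {}
    current = None
    for ch in body:
        number = circled_value(ch)
        if number is not None:
            current = number
            params[current] = ""
        elif current is None:
            sentence_code += ch
        else:
            params[current] += ch
    return sentence_code, params


if __name__ == "__main__":
    # Sentence 123 with parameter 1 = 44 and parameter 2 = 7.
    print(parse_sentence("123\u246044\u24617"))
```

Because the circled numbers are distinct code points from the ordinary digits, the separators cannot be confused with the digits of the sentence code or of a parameter value.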

There are also various circled letters which might be very useful for localization. For example, if a parameter arrives as a number, then that number could be a numerical value or could be an index to a list. Suppose that the list is of country codes as used for international telephone calls. In localization, a particular circled letter followed by a circled number could then be used to mean that a particular list of translations (that is, a particular text file) should be used and that the circled number identifies the parameter whose value should be used to index that list. For example, suppose that country code 44 is received as parameter 1 of a particular comet circumflex sentence. The localization of that sentence could then be a sentence in the target language which includes a particular circled letter, to indicate use of the list of countries, followed immediately by a circled 1, to indicate that parameter 1 is the index. In this case, entry 44 of the particular file would be used to find the name of country 44 expressed in the target language.
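
The country code example can be sketched in Python as follows. Everything specific here is an assumption made for illustration: U+24B8 CIRCLED LATIN CAPITAL LETTER C is chosen arbitrarily as the letter for the list of countries, the list is held in a dictionary rather than in a text file, the target language is taken to be French, and the names LISTS, circled_value and localize are hypothetical.

```python
# Hypothetical choice: circled capital C selects the list of countries.
COUNTRY_LIST_MARK = "\u24B8"

# Stand-in for the text file of translations for one target language
# (French here), indexed by international telephone country code.
LISTS = {COUNTRY_LIST_MARK: {33: "France", 44: "Royaume-Uni"}}


def circled_value(ch):
    """Return 1 to 20 for U+2460 to U+2473, otherwise None."""
    if len(ch) != 1:
        return None
    cp = ord(ch)
    if 0x2460 <= cp <= 0x2473:
        return cp - 0x2460 + 1
    return None


def localize(template, params):
    """Expand each 'circled letter + circled number' pair in a template.

    The circled letter picks the list; the circled number names the
    parameter whose value is the index into that list.
    """
    out = []
    i = 0
    while i < len(template):
        ch = template[i]
        following = template[i + 1] if i + 1 < len(template) else ""
        number = circled_value(following)
        if ch in LISTS and number is not None:
            out.append(LISTS[ch][int(params[number])])
            i += 2
        else:
            out.append(ch)
            i += 1
    return "".join(out)


if __name__ == "__main__":
    # Parameter 1 arrives as country code 44.
    print(localize("Pays : \u24B8\u2460", {1: "44"}))
```

With a different target language only the template and the list of translations change; the received parameter, country code 44, stays the same.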

I originally thought of using just numbers as parameters, yet I have subsequently thought that I could use numbers to index a specific list and use Esperanto roots of words so as to enable the system to convey any word.

There is clearly a balance to be struck between having a specified list, which would allow automated localization within a fixed set of words, and having Esperanto roots, which would allow localization, maybe automated, maybe manual, with any words. This is just one of the many aspects of researching the comet circumflex system.

I wonder just how much internationalization and localization will be achievable using the comet circumflex method. So I am starting this set of documents in the hope of generating interest and of adding further documents to the collection from time to time.

William Overington

21 October 2002


The index that follows was last updated on Monday 4 November 2002. Some of the files were added after 21 October 2002. Please see the dates on the individual files.

Some initial experiments with the comet circumflex system.

Here is a file of codes for starting some experiments with the comet circumflex system.

Internationalization and Localization of content.

A transcript of a posting in the forum in the http://forum.mhp.org webspace.

Passing information forward from sentence to sentence.

Some interesting considerations of using words such as "it" and "they" in sentences, where the grammatical gender of the noun to which the words "it" and "they" refer is important.

Poetry with the comet circumflex system.

Is it possible for original poetry to be written using the comet circumflex system?

Some email links in relation to the comet circumflex system.

Here are some email links.

The comet circumflex board game.

A board game as a research platform for the comet circumflex system, in order to test whether the comet circumflex system can be used so that people around the world may play the board game using comet circumflex sentences and discuss the board game using comet circumflex sentences.

A partial simulation of a web based shop.

A partial simulation of a web based shop. The web based shop is simulated as a supplier of special additional pieces for the comet circumflex board game.

Some useful software tools.

Some useful software tools.


 

The comet_circumflex system.

This file is accessible as follows.

http://www.users.globalnet.co.uk/~ngo/c_c00000.htm