How Linguist Works

Linguist comprises a core of classes that handle the basic features of any language, plus a large set of classes you write yourself, each of which deals with a single aspect of your target language. We supply a comprehensive set for you to use or modify as you prefer.

As a simple example, let's take a few lines that would be recognized by the standard package:

variable Count
put 5 into Count
prompt Count
exit

Part of the job of the core classes is to read in the script and break it up into lines and tokens. Linguist then works its way through the tokens. In this example, the first thing it finds is the word variable. Because nothing in Linguist is specific to English, a keyword handler for this word must be provided somewhere. Linguist is organized into packages (see the Reference) and all the handlers in this example can be found in the basic package.
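The line-and-token split can be pictured with a small sketch. This is not Linguist's own tokenizer: the class name ScriptTokenizer, the Token type and the assumption that tokens are separated by whitespace are all made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not Linguist's actual tokenizer.
class ScriptTokenizer {
    // A token remembers the script line it came from (1-based).
    static class Token {
        final int line;
        final String text;

        Token(int line, String text) {
            this.line = line;
            this.text = text;
        }
    }

    // Splits the script into lines, then each line into whitespace-separated tokens.
    static List<Token> tokenize(String script) {
        List<Token> tokens = new ArrayList<>();
        String[] lines = script.split("\\R");
        for (int i = 0; i < lines.length; i++) {
            String trimmed = lines[i].trim();
            if (trimmed.isEmpty()) {
                continue; // blank lines produce no tokens
            }
            for (String word : trimmed.split("\\s+")) {
                tokens.add(new Token(i + 1, word));
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        String script = "variable Count\nput 5 into Count\nprompt Count\nexit\n";
        for (Token t : tokenize(script)) {
            System.out.println(t.line + ": " + t.text);
        }
    }
}
```

Keeping the line number with each token is what later lets a handler record the script line for the debugger.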

Variables of various kinds are the key to Linguist, and are where most of the work is done. They are consequently the largest classes. Whenever you put a value into a variable, move an object to a position on the screen, display a graphic, play a movie or set an attribute of something, the command handler simply calls a method in the appropriate variable and lets it do the job. This is object orientation at its best, where only the object itself knows how to do something; the other classes merely give it an instruction to do it now.
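As a rough picture of this delegation, here is a minimal sketch of what a command such as put 5 into Count might boil down to. The names IntVariable and PutCommand are hypothetical and the real Linguist classes will look different; the point is only that the command holds no logic of its own.

```java
// Hypothetical names throughout; a sketch of the delegation pattern only.
class IntVariable {
    private final String name;
    private int value;

    IntVariable(String name) {
        this.name = name;
    }

    // Only the variable itself knows how to store a value.
    void put(int newValue) {
        value = newValue;
    }

    int get() {
        return value;
    }

    String name() {
        return name;
    }
}

class PutCommand {
    private final int value;
    private final IntVariable target;

    PutCommand(int value, IntVariable target) {
        this.value = value;
        this.target = target;
    }

    // The command does no work itself; it tells the variable to act.
    void run() {
        target.put(value);
    }
}

class PutDemo {
    public static void main(String[] args) {
        IntVariable count = new IntVariable("Count");
        new PutCommand(5, count).run();
        System.out.println(count.name() + " = " + count.get());
    }
}
```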

Keyword handlers

To locate a handler, Linguist searches the packages one by one. For each package it takes the keyword (here the word variable), makes its first character upper-case, then prefixes it with the initial letter of the package (B for the basic package) followed by the identifier K for Keyword. The result in this case is BKVariable. It then looks for a class of that name in the basic.keyword package. If the class is not present, the compiler moves on to the next package. If this were graphics it would look for GKVariable in graphics.keyword. This process continues until either it finds a handler or there are no more packages. In the latter case an error is reported and compilation is aborted.
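The naming rule and the package-by-package search could be sketched as follows. HandlerLocator and its methods are hypothetical names, and the use of Class.forName is an assumption about how such a lookup might be done, not a description of Linguist's internals.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper; shows the naming rule described above.
class HandlerLocator {
    // Builds e.g. "basic.keyword.BKVariable" from package "basic" and keyword "variable".
    static String handlerClassName(String pkg, String keyword) {
        String prefix = Character.toUpperCase(pkg.charAt(0)) + "K";
        String capitalized = Character.toUpperCase(keyword.charAt(0)) + keyword.substring(1);
        return pkg + ".keyword." + prefix + capitalized;
    }

    // Tries each package in turn; returns null when no package supplies a handler,
    // at which point the caller would report an error and abort compilation.
    static Class<?> locate(List<String> packages, String keyword) {
        for (String pkg : packages) {
            try {
                return Class.forName(handlerClassName(pkg, keyword));
            } catch (ClassNotFoundException e) {
                // Not in this package; move on to the next one.
            }
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println(handlerClassName("basic", "variable"));
        System.out.println(handlerClassName("graphics", "variable"));
        System.out.println(locate(Arrays.asList("basic", "graphics"), "variable"));
    }
}
```

Run outside a Linguist installation, the lookup finds nothing and returns null, which corresponds to the "no more packages" case.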

Once a handler is found, Linguist calls its handleKeyword() method. Every keyword handler has one and it's where the real compilation is done. The handler (written by you) takes over at this point to complete the job of identifying what the script is asking for. Once it finishes, it returns a runtime handler containing all necessary data.

In the case of a variable declaration, the keyword handler will ask for the next token (word). If it sees the word array it notes the fact and asks for another token. It's expecting the name of a variable, so the token is checked against the symbol table in case of duplicates. It then instantiates a runtime handler containing the current script line number (for the debugger), the name of the variable, its location in the compiled script and the flag that tells if it's an array. The handler is returned to Linguist, which places it into an array that is to become the runtime program.
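The steps just described might look something like the sketch below. Everything here is a stand-in: CompileContext, VariableDeclaration and the signature of handleKeyword() are assumptions made for the example, as is the reading of "location" as the index the handler will occupy in the program array.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the compiler state a keyword handler can see.
class CompileContext {
    final Deque<String> tokens;                           // tokens following the keyword
    final Map<String, Integer> symbols = new HashMap<>(); // variable name -> location
    final int line;                                       // current script line
    int programIndex = 0;                                 // next free slot in the program

    CompileContext(int line, String... tokens) {
        this.line = line;
        this.tokens = new ArrayDeque<>(Arrays.asList(tokens));
    }
}

// Hypothetical runtime handler carrying the four items listed in the text.
class VariableDeclaration {
    final int line;
    final String name;
    final int location;
    final boolean isArray;

    VariableDeclaration(int line, String name, int location, boolean isArray) {
        this.line = line;
        this.name = name;
        this.location = location;
        this.isArray = isArray;
    }
}

class VariableKeywordSketch {
    static VariableDeclaration handleKeyword(CompileContext ctx) {
        String token = ctx.tokens.remove();
        boolean isArray = token.equals("array");
        if (isArray) {
            token = ctx.tokens.remove(); // the name follows the word array
        }
        if (ctx.symbols.containsKey(token)) {
            throw new IllegalStateException(
                "Duplicate variable '" + token + "' at line " + ctx.line);
        }
        int location = ctx.programIndex++;
        ctx.symbols.put(token, location);
        return new VariableDeclaration(ctx.line, token, location, isArray);
    }
}
```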

Other keyword handlers work in much the same way, some more complex than others. I'll provide more detail in the tutorial.

Runtime handlers

Once the script has been compiled it can be saved to disk or run directly. (Saving is done by serialization, so you need to make sure all of your classes are Serializable.) The runtime process is very simple; Linguist starts with the first handler in the array and calls its execute() method. The value returned is the index of the next handler to be executed, or null if this script thread has ended. The overhead is very low.
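A minimal sketch of such a run loop, assuming execute() returns an Integer so that null can signal the end of the thread. The interface name RuntimeHandler and the class RunLoop are hypothetical; only execute() and the Serializable requirement come from the description above.

```java
import java.io.Serializable;

// Hypothetical interface: handlers expose execute() and are Serializable
// so that a compiled script can be saved to disk.
interface RuntimeHandler extends Serializable {
    // Returns the index of the next handler, or null when the thread has ended.
    Integer execute();
}

class RunLoop {
    // Runs from the first handler until one returns null;
    // returns the number of handlers executed.
    static int run(RuntimeHandler[] program) {
        int executed = 0;
        Integer next = program.length > 0 ? 0 : null;
        while (next != null) {
            next = program[next].execute();
            executed++;
        }
        return executed;
    }

    public static void main(String[] args) {
        RuntimeHandler[] program = {
            () -> 1,    // go on to handler 1
            () -> 2,    // go on to handler 2
            () -> null  // end of this script thread
        };
        System.out.println(run(program) + " handlers executed");
    }
}
```

Because each handler names its own successor, jumps and loops in the script need no extra machinery in the loop itself.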

The execute() method contains the real meat of what you have embedded inside the script language. It can be as simple or as complex as you like, from simple assignments to complex mathematical functions or even whole embedded Java applets. You can also use it as a jumping-off point for native methods.

Linguist forces you to break your project down into blocks that can be represented by script words. This is an essential discipline in large projects and the extra effort involved in setting up the structures is amply repaid. There's no temptation to hack a temporary patch; it's easier to create a new command or syntax variant and work inside a new runtime handler. Mistakes made in one place are contained and don't propagate through the rest of your application.

Debugging

I haven't yet mentioned a further feature. When you create handlers for a variable (of any type), you should also include methods that can be called by the debugger to display information about your variable. When Linguist runs your script in debug mode, instead of directly calling your handler's execute() method the framework calls the debugger, passing it your handler. The debugger detects a variable handler and supplies the framework needed for you to click on the name in a source window to see its value. There's not much you need to do beyond adding a method in the handler to return a string containing a one-line representation of the variable's state. This can be anything you choose.
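The one-line representation and the debug-mode detour could be sketched like this. All names (SteppableHandler, Inspectable, debugSummary(), DebuggerSketch) are hypothetical, and the real debugger shows the value when you click in a source window rather than collecting strings in a list as this simplification does.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical handler interface for this sketch.
interface SteppableHandler {
    Integer execute();
}

// Hypothetical hook a variable handler implements for the debugger.
interface Inspectable {
    // One-line representation of the variable's state; the content is up to you.
    String debugSummary();
}

class CounterVariable implements SteppableHandler, Inspectable {
    private final String name;
    private final int value;

    CounterVariable(String name, int value) {
        this.name = name;
        this.value = value;
    }

    @Override
    public Integer execute() {
        return null; // nothing to do at run time in this sketch
    }

    @Override
    public String debugSummary() {
        return name + " = " + value;
    }
}

class DebuggerSketch {
    final List<String> shown = new ArrayList<>();

    // In debug mode the framework calls this instead of calling execute() directly.
    Integer step(SteppableHandler handler) {
        if (handler instanceof Inspectable) {
            shown.add(((Inspectable) handler).debugSummary());
        }
        return handler.execute();
    }
}
```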

A more complex task is to provide code to output a full description of the variable into a window provided by the debugger. For example, the variable type Window in the graphics package lists its current position and size. You are free to output anything you like that might aid the debugging process. This task is made more complicated by the need to handle arrays, so study one of the existing handlers closely.
