How many times have you laboriously gone through a ZX Basic program, replacing one item with another? Well, despair no more: Multisearch will quickly and automatically find and replace almost any selected item. This routine is easy to use and is only 225 bytes long. It'll run anywhere in memory (so it doesn't interfere with other utilities) and, what's more, turns out to have lots of useful and unexpected applications.
POWERFUL POSSIBILITIES
The possibilities of Multisearch aren't limited to changing one message for another. You can use it to edit long program lines, to replace keywords or to document programs (replacing line number references with names). Multisearch will also work the other way, replacing names with numbers - which is very useful if you intend to compile a Basic program into machine code.
Most interesting of all is the possibility of writing programs which edit themselves; Multisearch can easily be called while a program runs. In this article we will investigate the internal format of ZX Basic and show how you can use Multisearch to make programs faster, more concise, or to protect them against people who want to fiddle with them (Troubleshootin' Pete, please note).
INSPIRATION
The idea of Multisearch came when YS reviewed a job lot of 'programmers' toolkits' a number of months ago [see Talking of Toolkits in issue 6]. These are designed to make life easier for Basic programmers, but they all turn out to have a common flaw - they won't let you replace numbers in a program automatically.
Some of the toolkits had a 'search and replace' facility, but they all had annoying limitations - for example, Super Toolkit would only replace single keywords. The suggested use was to change LPRINT into PRINT or vice versa, but in fact that's pretty pointless because you can get the same effect on any Spectrum with a standard (but undocumented) command:
This sends the output of PRINT statements to the printer until you cancel it with:
If you want to work the other way, you can use:
to send the results of every LPRINT statement to the screen. When you want to use the printer again, the command:
will set things back to normal.
It's a bit more useful to be able to replace text in a program - perhaps you might want to Americanise the word 'colour' by replacing it with 'color', or enforce some similar indignity. But by far the most useful application baffles every single toolkit - the problem of changing numeric values within a program.
INSIDE BASIC
The accompanying figure shows the rather complicated way the Spectrum stores a simple Basic program:
10 PRINT 2+VAL "2"
20 GO TO 10
Most of the data is ASCII code - for instance, 34 is the code of inverted commas and 236 is the code of the keyword GO TO. A full list of the keyword values is in Appendix A of the Spectrum manual. Now take a look at the strange way the Spectrum stores numbers.
Most numbers in a program are also stored in a hidden 'binary form' which takes up six extra bytes. This is meant to make programs run more quickly, by removing the need for the computer to convert numbers from text to binary whenever they are found. In practice, VAL "2323" can be handled almost as fast as the number 2323, and the first version uses three fewer bytes, because the string value doesn't have a hidden 'binary form'.
In the figure, you can see that VAL "2" needs three fewer bytes than '2' on its own. The number '2' is followed by a 'marker' byte (code 14) which tells the LIST routine to skip the next five bytes - the binary form of the number. When the program RUNs, the text is ignored and the binary form is used.
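To make the byte counting concrete, here is a small Python sketch (Python is used purely as a modelling language; the function names are hypothetical) that builds the stored bytes for a number typed into a line and for its VAL equivalent. It assumes a whole number from 0 to 65535, which the Spectrum keeps in its 'small integer' five-byte form: a zero byte, a sign byte, the value low byte first, then another zero.

```python
NUMBER_MARKER = 14   # CHR$ 14: LIST skips the five bytes that follow
VAL_TOKEN = 176      # keyword code for VAL
QUOTE = 34           # inverted commas


def small_int_form(n: int) -> bytes:
    """Hidden five-byte form of a whole number 0..65535 (sketch)."""
    if not 0 <= n <= 65535:
        raise ValueError("this sketch only handles whole numbers 0..65535")
    return bytes([0, 0, n & 0xFF, n >> 8, 0])


def literal(digits: str) -> bytes:
    """A number typed into a line: its text, the marker, then the binary."""
    return digits.encode("ascii") + bytes([NUMBER_MARKER]) + small_int_form(int(digits))


def val_of_string(digits: str) -> bytes:
    """VAL "digits": keyword, quotes and text, with no hidden binary."""
    return bytes([VAL_TOKEN, QUOTE]) + digits.encode("ascii") + bytes([QUOTE])


for digits in ("2", "2323"):
    saved = len(literal(digits)) - len(val_of_string(digits))
    print(f'{digits}: {len(literal(digits))} bytes, VAL "{digits}": '
          f"{len(val_of_string(digits))} bytes, saving {saved}")
```

The saving is three bytes whatever the length of the number: six hidden bytes are traded for three (the VAL keyword and two inverted commas).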
The assembler listing for Multisearch. Grab an assembler (or a Hex loader if you're going to enter the Hex code on the left of each column) and get typing!
The binary is in a rather odd format - one which is explained in Dr Ian Logan's excellent book, Spectrum (published by Melbourne House). Luckily, with the aid of Multisearch, you don't need to understand the format to manipulate it.
The upshot is that numbers in ZX Basic programs need careful treatment, as they can gobble up memory at an alarming rate. Some expressions for numbers are even more concise than the 'VAL' version, because they use the keyword PI instead of a number. PI only occupies one byte in a program. The accompanying table lists a few common values and the expressions to replace them, along with the number of bytes saved ('n' represents any number).
You could use variables with preset values instead of numbers to get a similar saving in space, but beware - ZX Basic is rather slow at finding the value of variables; expressions like SGN PI may be worked out more quickly, especially if your code uses lots of variables anyway.
Interestingly, values expressed using the BIN function are also stored in two forms, so that BIN 1 soaks up eight bytes - one for the keyword, one for the digit, and an extra six for the genuine binary form.
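The same rules give a quick way to compare the forms mentioned above for the value 1 (SGN PI is 1 because PI is positive). This rough Python tally assumes one byte per keyword or character and six extra bytes after every numeric literal, BIN's digits included; the helper name is hypothetical.

```python
HIDDEN = 6  # CHR$ 14 marker plus five bytes of binary after each numeric literal


def cost(keywords: int = 0, chars: str = "", literal_digits: str = "") -> int:
    """Bytes used by keywords, plain characters and one optional literal."""
    total = keywords + len(chars)
    if literal_digits:
        total += len(literal_digits) + HIDDEN
    return total


costs = {
    "1": cost(literal_digits="1"),
    'VAL "1"': cost(keywords=1, chars='"1"'),
    "SGN PI": cost(keywords=2),
    "BIN 1": cost(keywords=1, literal_digits="1"),
}
for expression, size in costs.items():
    print(f"{expression:<8} {size} bytes")
```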
The line numbers at the start of each line are stored in a more sensible 'packed' format - each number occupying just two bytes. They are converted into decimal by the LIST routine in the ROM. The two bytes after each line number hold the length of the line, so that Basic can skip quickly from one line to the next. An 'ENTER' character is at the end of every line. This format is briefly explained in the Spectrum manual, on page 166.
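Putting the pieces together, the figure's two-line program can be rebuilt byte by byte. In this Python sketch, stored_line is a hypothetical helper; the layout follows the Spectrum manual: line number high byte first, then the length of the text plus ENTER low byte first, then the text, then ENTER (13). The keyword codes used are PRINT 245, VAL 176 and GO TO 236.

```python
ENTER = 13


def stored_line(number: int, body: bytes) -> bytes:
    """Line number (high byte first), length (low byte first), text, ENTER."""
    rest = body + bytes([ENTER])
    return number.to_bytes(2, "big") + len(rest).to_bytes(2, "little") + rest


# 10 PRINT 2+VAL "2"   (245 = PRINT, 176 = VAL, 14 = number marker)
line10 = stored_line(10, bytes([245]) + b"2" + bytes([14, 0, 0, 2, 0, 0])
                     + b"+" + bytes([176]) + b'"2"')
# 20 GO TO 10          (236 = GO TO)
line20 = stored_line(20, bytes([236]) + b"10" + bytes([14, 0, 0, 10, 0, 0]))
program = line10 + line20

print(line10.hex(" "))
print(line20.hex(" "))
```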
The first program given is a simple loader which will store the machine code for Multisearch at address 30000. To use it, simply RUN the program and if you've made no typing mistakes, the correct code will be stored. If there's a mistake in the data, an appropriate message should appear. It's wise to SAVE the program as soon as it has apparently run correctly, just in case an error has slipped through. If you save the code you can then load it again - without the Basic - at any address.
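The loader's error check is a running total: c starts at minus the expected sum of all the DATA, each byte is added as it is POKEd, and any non-zero result means a mistyped value. A Python sketch of the same idea, using only the first DATA line and a start value worked out for that line alone (not the -26434 used for the full listing); loader_check is a hypothetical name:

```python
def loader_check(data, start):
    """Mirror the loader's checksum: True if no DATA error would be reported."""
    c = start
    for a in data:
        c += a
    return c == 0


first_line = [42, 75, 92, 126, 254, 83, 40, 14]   # 1000 DATA ...
start = -sum(first_line)                           # -726 for this line alone
print(loader_check(first_line, start))
```

A plain sum catches any single mistyped value, but not two values typed in the wrong order, which is exactly the kind of error that can slip through.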
; "Find Search string S$"
7530 2A4B5C   FINDS  LD HL,(VARS)
7533 7E       NEXT1  LD A,(HL)
7534 FE53            CP "S"
7536 280E            JR Z,GOT_S
7538 FE80            CP T_END
753A 2806            JR Z,ERROR
753C CDB819          CALL F_VAR
753F EB              EX DE,HL
7540 18F1            JR NEXT1
;
; "Variable not found!"
7542 CF       ERROR  RST 8
7543 01              DEFB 1
; "Parameter error!"
; "(Wrong string length)"
7544 CF       L_ERR  RST 8
7545 19              DEFB 25
;
; "HL points at name S$"
7546 23       GOT_S  INC HL
; "Check length is >0"
7547 7E              LD A,(HL)
7548 B7              OR A
7549 28F9            JR Z,L_ERR
754B 23              INC HL
; "Check length is <256"
754C 7E              LD A,(HL)
754D B7              OR A
754E 20F4            JR NZ,L_ERR
7550 23              INC HL
7551 E5              PUSH HL
; "IX points at S$ text"
7552 DDE1            POP IX
;
; "Find replacement, R$"
;
7554 2A4B5C          LD HL,(VARS)
7557 7E       NEXT2  LD A,(HL)
7558 FE52            CP "R"
755A 280A            JR Z,GOT_R
755C FE80            CP T_END
755E 28E2            JR Z,ERROR
7560 CDB819          CALL F_VAR
7563 EB              EX DE,HL
7564 18F1            JR NEXT2
; "HL points at name R$"
7566 23       GOT_R  INC HL
; "R_LEN points at R$"
7567 22AE5C          LD (R_LEN),HL
; "Check length is <256"
756A 23              INC HL
756B 7E              LD A,(HL)
756C B7              OR A
756D 20D5            JR NZ,L_ERR
756F ED5B535C        LD DE,(PROG)
7573 1B              DEC DE
;
; "**** MAIN SEARCH LOOP"
;
; "Find length of line"
7574 13       LINE   INC DE
7575 13              INC DE
7576 13              INC DE
7577 ED53AC5C        LD (L_LEN),DE
757B 13              INC DE
757C 13              INC DE
757D D5       FIND   PUSH DE
; "Get old data length &"
; "point HL at old data"
757E DD46FE          LD B,(IX-2)
7581 DDE5            PUSH IX
7583 E1              POP HL
; "Match B characters"
7584 1A       MATCH  LD A,(DE)
7585 BE              CP (HL)
7586 2067            JR NZ,GO_ON
7588 23              INC HL
7589 13              INC DE
758A 10F8            DJNZ MATCH
; "Match found, work out"
; "difference of lengths"
758C 2AAE5C          LD HL,(R_LEN)
758F 7E              LD A,(HL)
7590 DD96FE          SUB (IX-2)
; "A = extra bytes needed"
7593 2849            JR Z,NO_OK
7595 302C            JR NC,ADD_A
;
; "Discard 256-A bytes"
;
7597 ED44            NEG
7599 4F              LD C,A
; "Line length=length-BC"
759A 2AAC5C          LD HL,(L_LEN)
759D 5E              LD E,(HL)
759E 23              INC HL
759F 56              LD D,(HL)
MULTISEARCH ON THE RUN
The routine is very easy to use, and all you need to do is load the code into any free area of memory. It's 225 bytes long, so if you've already got another machine code routine from address 53246 onwards, you might CLEAR 53020 and load the code at 53021. Multisearch will work happily on a 16K computer. If you're really pushed for space you could load it into the printer buffer at 23296, so long as you don't use the printer until you've finished with Multisearch.
Wherever it ends up, you call the routine by jumping to its start - with RANDOMIZE USR 53021, for example. But before you do this you must tell Multisearch the text you want to alter. You do this by setting the Basic variables S$ and R$.
Logically enough, S$ should contain the text you want to search for, and R$ should contain the replacement. This is the essence of the power of Multisearch - the text can be program-generated, so you're not limited to what you can type in. You can enter keywords in strings by typing THEN (Symbol Shift 'G'), followed by the keyword, and then stepping back to scrub out the THEN before you press Enter.
If you load Multisearch into the printer buffer you could try it out with this simple program:
10 LET S$="OLD TEXT"
20 LET R$="NEW TEXT"
30 RANDOMIZE USR 23296
When you RUN this program and then LIST it, you'll find that lines 10 and 20 now contain the same text: Multisearch has replaced the OLD TEXT in line 10 with NEW TEXT. Of course, S$ and R$ don't have to be the same length. The only restrictions are that both strings must be less than 256 characters long, and S$ mustn't be empty (!). If either rule is broken, Multisearch detects the problem before it tries to alter anything, and reports a 'Parameter error'. If S$ or R$ is not set, you'll receive a 'Variable not found' message and the program will be unchanged.
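The order of those checks matters: S$ is located and checked before R$ is even looked for. A hypothetical Python mirror of the checks (the names are illustrative; the report texts are the standard Spectrum reports 2 and Q) might look like this:

```python
class BasicError(Exception):
    """Stands in for a Spectrum error report."""


def check_strings(variables: dict) -> tuple:
    """Check S$ and R$ in the same order as Multisearch (sketch).

    variables maps names such as "S$" to their string values.
    """
    if "S$" not in variables:
        raise BasicError("2 Variable not found")
    s = variables["S$"]
    if len(s) == 0 or len(s) > 255:      # S$ must hold 1..255 characters
        raise BasicError("Q Parameter error")
    if "R$" not in variables:
        raise BasicError("2 Variable not found")
    r = variables["R$"]
    if len(r) > 255:                     # R$ may be empty, but not too long
        raise BasicError("Q Parameter error")
    return s, r


print(check_strings({"S$": "OLD TEXT", "R$": "NEW TEXT"}))
```

An empty R$ passes the checks, which is how you delete every occurrence of S$.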
Multisearch is very fast, but it can take a few seconds to make major changes to a long program. You can break into it while it's working by pressing the Space key. The routine stops once it's made the current change and spits out a 'Break into program' message. If the routine runs out of room to make changes it'll do as much as it can and then report 'Out of memory'.
It's important to realise that Multisearch doesn't check the syntax of lines as it alters them - this would make it slow and much less versatile. However, it means that you can thoroughly mess up a program by, say, changing all the LET keywords into POKEs.
If you corrupt a program in this way you'll get a 'Nonsense in Basic' error when you try to RUN it. Be careful if you change the keywords back automatically - you could end up changing genuine POKEs into 'nonsense' LETs. The moral of the story is to be careful before you use Multisearch ... if in doubt, SAVE your Basic before you mangle it.
75A0 EB              EX DE,HL
75A1 B7              OR A
75A2 ED42            SBC HL,BC
75A4 EB              EX DE,HL
75A5 72              LD (HL),D
75A6 2B              DEC HL
75A7 73              LD (HL),E
; "Adjust R$, S$ pointers"
75A8 DDE5            PUSH IX
75AA E1              POP HL
75AB ED42            SBC HL,BC
75AD E5              PUSH HL
75AE DDE1            POP IX
75B0 2AAE5C          LD HL,(R_LEN)
75B3 ED42            SBC HL,BC
75B5 22AE5C          LD (R_LEN),HL
75B8 E1              POP HL
; "Shrink from start"
75B9 E5              PUSH HL
75BA CDE819          CALL SHRNK
75BD 181F            JR NO_OK
;
; "Extended jumps"
;
75BF 18BC     FINDX  JR FIND
75C1 18B1     LINEX  JR LINE
;
; "Add A bytes"
;
75C3 4F       ADD_A  LD C,A
; "Add BC to line length"
75C4 D5              PUSH DE
75C5 2AAC5C          LD HL,(L_LEN)
75C8 5E              LD E,(HL)
75C9 23              INC HL
75CA 56              LD D,(HL)
75CB EB              EX DE,HL
75CC 09              ADD HL,BC
75CD EB              EX DE,HL
75CE 72              LD (HL),D
75CF 2B              DEC HL
75D0 73              LD (HL),E
; "Update S$, R$ pointers"
75D1 DD09            ADD IX,BC
75D3 2AAE5C          LD HL,(R_LEN)
75D6 09              ADD HL,BC
75D7 22AE5C          LD (R_LEN),HL
75DA E1              POP HL
75DB CD5516          CALL XPAND
;
; "Copy new data to prog"
;
75DE D1       NO_OK  POP DE
75DF 2AAE5C          LD HL,(R_LEN)
75E2 0600            LD B,0
75E4 4E              LD C,(HL)
; "Check R$ isn't empty"
75E5 79              LD A,C
75E6 B7              OR A
75E7 2808            JR Z,NEXT
; "Bounce HL past length"
75E9 23              INC HL
75EA 23              INC HL
75EB EDB0            LDIR
; "Search on from (DE)"
75ED 1802            JR NEXT
;
; "Try the next position"
;
75EF D1       GO_ON  POP DE
75F0 13              INC DE
; "Check user isn't bored"
75F1 3E7F     NEXT   LD A,127
75F3 DBFE            IN A,(254)
75F5 1F              RRA
75F6 3802            JR C,CONT
; "Generate BREAK error!"
75F8 CF              RST 8
75F9 14              DEFB 20
; "Locate end of program"
75FA 2A4B5C   CONT   LD HL,(VARS)
75FD B7              OR A
75FE ED52            SBC HL,DE
; "Return at end of prog"
7600 D8              RET C
; "Check for new line no."
7601 1A              LD A,(DE)
7602 FE0D            CP ENTER
7604 28BB            JR Z,LINEX
; "Don't scan hidden nums"
7606 FE0E            CP NUMBR
7608 20B5            JR NZ,FINDX
; "Skip over the number"
760A 210600          LD HL,6
760D 19              ADD HL,DE
760E EB              EX DE,HL
760F 18E9            JR CONT
                     END
TRICKY DIGITS
This business of using strings is all very well, but it doesn't help us replace numbers in program lines. We can't store a number in a string without putting it in quotes (or using STR$). LET A$="1" is OK, but LET A$=1 gives an error, and we've already discovered that numbers outside quotes have a special format. To illustrate this, try out the following program:
10 LET S$="40"
20 LET R$="60"
30 RANDOMIZE USR 23296
40 PRINT "Hello";
50 GO TO 40
When you RUN this program it'll replace the text '40' in line 50 with the text '60' (it changes the "40" inside the quotes in line 10 as well). However, it won't replace the hidden binary form; the program still prints out 'Hello' over and over again, because ZX Basic uses the binary form of the line number (still 40), and ignores the text completely. You end up with a line that reads GO TO 60 and performs a GO TO 40!
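The effect is easy to see at byte level. This Python sketch stores line 50's text, GO TO 40, and uses bytes.replace as a stand-in for Multisearch's text-only substitution; that shortcut is safe here only because the hidden bytes happen not to contain the characters '40'. The variable names are illustrative.

```python
GO_TO, MARKER = 236, 14

# GO TO 40 as stored: keyword, the text "40", marker, hidden binary for 40
line50 = bytes([GO_TO]) + b"40" + bytes([MARKER, 0, 0, 40, 0, 0])

edited = line50.replace(b"40", b"60")   # text-only change, as with S$="40", R$="60"

marker_at = edited.index(MARKER)
listed = edited[1:marker_at].decode("ascii")      # what LIST shows
hidden = edited[marker_at + 1:marker_at + 6]      # what RUN uses
jump_to = hidden[2] + 256 * hidden[3]
print(f"LIST shows GO TO {listed}, RUN jumps to {jump_to}")
```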
This is a very useful trick to discourage people from editing your programs - you can jumble up the text of the line numbers but the program will still work correctly because the binary forms are unchanged. The hidden binary is removed when a line is edited (to stop it getting in the way as you move along the line) and the binary is re-calculated from the text when you press Enter. This means that the jumbled values are taken literally after a line is edited, changing the way the program works and hence discouraging fiddlers.
You can save a little memory by replacing the text of each number by a single digit. However you can't dispense with the text altogether - there must be some numeric text between the GO TO and the CHR$ 14, or Basic will spot the subterfuge and give the game away with a 'Nonsense in Basic' error.
BINARY CHOICE
We still can't alter numbers properly. The routine so far will only change text within a program ... it can't replace the binary form of numbers. The solution is to distinguish between numbers and strings, and use a small Basic program to work out the binary form of a number. An appropriate routine is given, which should be MERGEd with your Basic program once the Multisearch code is loaded.
Rather than use a complicated routine to generate binary forms, this program 'cheats' by storing the required number in a variable and then PEEKing the contents of the variable area (which always contains binary values in the same form as that used within programs).
To use the program type GO TO 9990 and press 'T' or 'N' to indicate whether you want to search for text or a number. Then type the data required, exactly as it appears in the program. If you select 'N', the program adds the numeric form to S$. Next you specify the replacement, which may (once again) be text or a number. The program STOPs once the requested changes have been made.
This technique is not ideal, but it does allow numbers to be changed properly without denying you the ability to alter numeric text and leave binary forms unchanged. If you need to process a pattern which contains a number, you'll need to add other characters around the search or replacement string, using the normal Spectrum string handling commands.
You can use the 'binary form' program as a subroutine if you replace the STOP in line 9992 with a RETURN and get rid of the CLEAR statement in line 9990. However, you must make sure that V is the first variable encountered when your program is RUN. The routine finds the binary form of a number by storing it in variable V, and then PEEKing the first entry in the variable table. If V isn't the first entry you'll get incorrect results.
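In Python terms, the 'binary form' routine builds a search string made of the number's text, CHR$ 14 and five bytes of binary. This sketch (numeric_pattern is a hypothetical name) assumes the small integer form for whole numbers 0 to 65535 instead of PEEKing a real variable, and then shows that swapping the whole pattern fixes both the text and the binary of the GO TO line:

```python
MARKER = 14


def numeric_pattern(text: str) -> bytes:
    """Number text, CHR$ 14 and the small-integer binary form (sketch)."""
    n = int(text)
    if not 0 <= n <= 65535:
        raise ValueError("this sketch only handles whole numbers 0..65535")
    return text.encode("ascii") + bytes([MARKER, 0, 0, n & 0xFF, n >> 8, 0])


s = numeric_pattern("40")        # what ends up in S$ when you choose N
r = numeric_pattern("60")        # ... and in R$
line50 = bytes([236]) + s        # GO TO 40 as stored (236 = GO TO)
print(line50.replace(s, r) == bytes([236]) + r)
```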
ASSEMBLER LISTING
Multisearch uses a number of interesting routines and could form the basis of a complete Basic toolkit. The assembly code of the routine, produced by the whizzo new Microdrive version of the Picturesque Editor Assembler, is a little more repetitious than it need be, since it's written in relocatable code. This means it'll run anywhere in memory without modification, but also that it can't use any internal subroutine calls, since the location of each subroutine is not fixed.
Broadly speaking, the program can be divided into two sections. The first part (up to the label LINE) is used to find the variables S$ and R$ and check that they contain correct values. The code to find S$ is duplicated to locate R$ - the only difference is the letter of the name and the extra check to make sure that S$ contains at least one character.
At FINDS, the program points HL into the variable area and then looks for a capital 'S'. This indicates the start of the storage allocated to S$, as explained on page 168 of the Spectrum manual. The ROM routine F_VAR is used to step from one entry to the next until the required letter is found, or the end of the table is reached - in which case a 'Variable not found' error is generated.
Strings stored in the variable area are preceded by their length, recorded in two bytes in normal Z80 fashion - low byte first. Multisearch can't cope with strings of more than 255 bytes (the code is kept simple!) so it generates a 'Parameter error' if the most significant byte of either string length is not zero. If all goes well IX is left pointing to the text of S$.
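The walk through the variable area can be modelled in Python. This sketch (find_string is a hypothetical name) handles only two kinds of entry, following the Spectrum manual's layout: single-letter numbers (binary 011 plus the letter code, then five bytes) and strings (binary 010 plus the letter code, two length bytes low first, then the text). Multisearch itself relies on the ROM's F_VAR to step over every kind of entry.

```python
END_OF_VARIABLES = 0x80


def find_string(area: bytes, letter: str) -> bytes:
    """Return the text of string variable letter$ (sketch)."""
    wanted = 0x40 | (ord(letter.lower()) - 0x60)   # S$ -> 0x53, the code of "S"
    i = 0
    while area[i] != END_OF_VARIABLES:
        code = area[i]
        kind = code >> 5
        if kind == 0b010:                          # string variable
            length = area[i + 1] | (area[i + 2] << 8)
            if code == wanted:
                return area[i + 3:i + 3 + length]
            i += 3 + length
        elif kind == 0b011:                        # single-letter numeric variable
            i += 6
        else:
            raise NotImplementedError("entry type not modelled in this sketch")
    raise LookupError("2 Variable not found")


# v (0x76) first, then S$ and R$, then the end marker
area = (bytes([0x76, 0, 0, 40, 0, 0]) + bytes([0x53, 3, 0]) + b"OLD"
        + bytes([0x52, 3, 0]) + b"NEW" + bytes([END_OF_VARIABLES]))
print(find_string(area, "S"), find_string(area, "R"))
```

The first entry here, v, is also what the 'binary form' routine PEEKs: its five value bytes sit straight after the name byte at VARS.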
From NEXT2 onwards the routine looks for R$. The address of the string (a pointer to the length, in this case) is stored at R_LEN, at the end of a Basic work area called MEMBOT. DE is pointed just before the start of the Basic program (as if the Enter at the end of a previous line had just been reached) and the main loop through the program begins at LINE.
120 CLEAR 29999
130 LET c=-26434
140 FOR i=30000 TO 30224
150 READ a
160 LET c=c+a
170 POKE i,a
180 NEXT i
190 IF c THEN PRINT "DATA ERROR": STOP
200 SAVE "Megasearch" CODE 30000,225
210 SAVE "Megasearch"
1000 DATA 42,75,92,126,254,83,40,14
1010 DATA 254,128,40,6,205,184,25,235
1020 DATA 24,241,207,1,207,25,35,126
1030 DATA 183,40,249,35,126,183,32,244
1040 DATA 35,229,221,225,42,75,92,126
1050 DATA 254,82,40,10,254,128,40,226
1060 DATA 205,184,25,235,24,241,35,34
1070 DATA 174,92,35,126,183,32,213,237
1080 DATA 91,83,92,27,19,19,19,237
1090 DATA 83,172,92,19,19,213,221,70
1100 DATA 254,221,229,225,26,190,32,103
1110 DATA 35,19,16,248,42,174,92,126
1120 DATA 221,150,254,40,73,48,44,237
1130 DATA 68,79,42,172,92,94,35,86
1140 DATA 235,183,237,66,235,114,43,115
1150 DATA 221,229,225,237,66,229,221,225
1160 DATA 42,174,92,237,66,34,174,92
1170 DATA 225,229,205,232,25,24,31,24
1180 DATA 188,24,177,79,213,42,172,92
1190 DATA 94,35,86,235,9,235,114,43
1200 DATA 115,221,9,42,174,92,9,34
1210 DATA 174,92,225,205,85,22,209,42
1220 DATA 174,92,6,0,78,121,183,40
1230 DATA 8,35,35,237,176,24,2,209
1240 DATA 19,62,127,219,254,31,56,2
1250 DATA 207,20,42,75,92,183,237,82
1260 DATA 216,26,254,13,40,187,254,14
1270 DATA 32,181,33,6,0,25,235,24
1280 DATA 233
If you haven't got an assembler or Hex loader to hand, just type in the Basic listing of Multisearch given above and let the DATA statements work their magic.
9990 CLEAR: LET v=0: PRINT "Look for (N)umber or (T)ext?": GO SUB 9993: LET s$=a$
9991 PRINT "Replace with (N)umber or (T)ext?": GO SUB 9993: LET r$=a$
9992 RANDOMIZE USR 30000: STOP: REM 30000 is the CODE address
9993 PAUSE 0: LET b$=INKEY$: IF b$<>"N" AND b$<>"T" AND b$<>"n" AND b$<>"t" THEN GO TO 9993
9994 INPUT "Enter data ";a$: IF b$="T" OR b$="t" THEN RETURN
9995 LET v=VAL a$: LET a$=a$+CHR$ 14: LET i=PEEK 23627+256*PEEK 23628: FOR j=i+1 TO i+5: LET a$=a$+CHR$ PEEK j: NEXT j: RETURN
Once you've got Multisearch up and running, use this short routine to get the show on the road!
At LINE the routine expects the end of a line and the start of a new one. It skips over three bytes - the Enter and line number - and stores a pointer to the line length in L_LEN. We need to know where the line length is recorded since we may need to alter it if we add or delete characters in the line.
FIND is the point at which Megasearch [sic] tries to locate the search string. DE is saved, so that we know where the match did (or didn't) occur, and then the loop at MATCH is used to see if the characters from DE onwards match those from IX onwards. Register B contains the length of S$. If the comparison fails before B reaches zero, the program leaps off to GO_ON, but if all goes well, the length of R$ is fetched and compared with that of S$. If the two are the same, execution continues at NO_OK (pronounced 'number OK'!) - otherwise some characters must be inserted or deleted so that the replacement text fits in the line.
The job of adding or removing characters is not trivial, since any change in the program size also alters the location of variables, and other useful pieces of information. Luckily, ROM routines exist to adjust the program size and make sure that nothing gets lost. SHRNK and XPAND remove or add BC characters at the location pointed to by HL. XPAND produces an 'Out of memory' error if there's no room for the extra characters.
If S$ and R$ are different lengths then Multisearch must adjust the line length (as explained earlier) and alter the pointers to S$ and R$. Any movement of the program also sends the variables skidding around memory, since they're stored at the end of the program. This took a little while to puzzle out when we tested the machine code!
A couple of extra jumps are located between the Delete and Insert instructions - the main loop is too long to be traversed in a single relative jump (it can only cross 126 bytes at one mighty bound) so FINDX and LINEX are used as 'staging posts' on the way to FIND and LINE respectively.
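A relative jump stores one signed displacement byte, counted from the address just after the two-byte instruction, so it can land anywhere from 126 bytes back to 129 bytes forward of the JR itself. The Python sketch below (helper names are hypothetical) decodes jumps from the listing and shows why the staging posts are needed: from the JR NZ at 7608, FIND at 757D would need a displacement of -141, beyond the -128 limit.

```python
def jr_target(address: int, offset: int) -> int:
    """Destination of a two-byte JR/DJNZ at address with displacement byte offset."""
    signed = offset - 256 if offset > 127 else offset
    return address + 2 + signed


def jr_reaches(address: int, target: int) -> bool:
    """Could a JR at address reach target in one bound?"""
    return -128 <= target - (address + 2) <= 127


# 7608 20B5 JR NZ,FINDX   then   75BF 18BC FINDX JR FIND
print(hex(jr_target(0x7608, 0xB5)), hex(jr_target(0x75BF, 0xBC)))
print(jr_reaches(0x7608, 0x757D))   # a direct JR NZ,FIND from 7608
```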
Various paths meet at NO_OK. At this point a correct match has been found and the address on the stack points to the place where R$ must be stored. An LDIR is used to copy the new text into the program. This leaves DE pointing to the character after the new data, from whence the search can re-start. If S$ didn't match the program we have to advance DE and start again one byte further through the program. This step is performed at GO_ON.
Whether or not a match was found, we end up at NEXT, where the Break key is polled in case the user has decided to give up. The routine stops with a BREAK error if bit zero at port address 32766 (the Space key) is reset. At CONT the contents of the system variable VARS are compared with the address in DE.
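IN A,(254) puts the contents of A on the upper half of the address bus, so with A = 127 the port read is 127 × 256 + 254 = 32766, the half-row holding Space, Symbol Shift, M, N and B. A pressed key reads as 0, and RRA copies bit 0 into the carry flag, so JR C,CONT carries on while Space is up. In Python terms (hypothetical helper names):

```python
def port_address(a: int, port: int = 254) -> int:
    """Address put on the bus by IN A,(port) when A holds a."""
    return (a << 8) | port


def space_pressed(value: int) -> bool:
    """Bit 0 of half-row 7FFE is Space; 0 means the key is down."""
    return (value & 1) == 0


print(port_address(127))
```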
If DE is pointing into the variable area we've finished, and the routine RETurns. Otherwise we must look further through the program, although before that we check for a couple of 'special cases'. If DE points to an 'ENTER' character we've reached the end of a line, so we should pick up the new line length by looping back to LINE.
If DE points at a number marker - CHR$ 14 - we must skip over the binary data since it could contain values which appear to be text or keywords, but aren't really. This doesn't stop us finding numbers, since those will always start with an ASCII character (probably a digit). If we've reached the CHR$ 14 we've gone too far.
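To draw the main loop together, here is a simplified Python model of the search (an illustration, not a translation of the Z80 code; multisearch is a hypothetical name). It assumes prog is a bytearray holding only complete program lines, so there are no variables to shift and no memory limit to hit; on the Spectrum, SHRNK and XPAND take care of those. Like the routine, it never starts a match on a CHR$ 14 marker or inside the five bytes after it, resumes searching after each replacement, and keeps each line's length field up to date. It also returns a count of replacements, one of the improvements suggested in the next section.

```python
ENTER, NUMBER = 13, 14


def multisearch(prog: bytearray, s: bytes, r: bytes) -> int:
    """Replace s with r throughout prog in place; return the number of changes."""
    if not s or len(s) > 255 or len(r) > 255:
        raise ValueError("Q Parameter error")
    changes = 0
    line = 0
    while line < len(prog):
        length_field = slice(line + 2, line + 4)   # low byte first
        pos = line + 4                              # first byte after the header
        while True:
            length = int.from_bytes(prog[length_field], "little")
            enter = line + 4 + length - 1           # where this line's ENTER sits now
            if pos >= enter:
                break
            if prog[pos] == NUMBER:
                pos += 6                            # skip marker and hidden binary
            elif prog[pos:pos + len(s)] == s:
                prog[pos:pos + len(s)] = r          # bytearray slice assignment resizes
                new_length = length + len(r) - len(s)
                prog[length_field] = new_length.to_bytes(2, "little")
                pos += len(r)                       # carry on after the new text
                changes += 1
            else:
                pos += 1
        line = enter + 1
    return changes
```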
POSSIBLE IMPROVEMENTS
There are lots of ways in which Multisearch could be improved, but the existing code works and it doesn't take long to type in! It might be useful to make it return a count of the number of replacements found, and perhaps a list of the lines in which changes were made. It would be convenient (but perhaps rather difficult) to re-code the 'binary form' program in machine code.
As it stands, Multisearch is a simple but very effective routine with a multiplicity of uses. There can't be many short routines which can be used to make ZX Basic edit-proof, faster, more concise, more readable, and more versatile. Do let me know what you make of Multisearch.