Halfbakery: 12864

The 12864 Microprocessor

This is a daydream...let me call it the 12864....

Let me start by saying I am prejudiced in favor of the 6809 microprocessor, created by Motorola. That it was the best of its day was confirmed when NASA decided to use it in the Space Shuttle's main computers. I personally feel that the 6809 should have had wider acceptance in the personal-computer market, and that Motorola snubbed its potential by introducing the 68000 too quickly. Only recently, with the widespread use of the 32-bit microprocessors, has the 6809 really become outclassed. So it is time to move on, time to create a new best microprocessor of the day. Since this is currently only my own dream, it has been greatly influenced by what I know of the 6809, and also what I have learned about the 68020. I do not mean to ignore any worthy contributions from other microprocessors; that is in fact the main reason for this essay! I am sharing my dream in hopes that it may be catching....

The 12864 is a 128/64-bit microprocessor. It has 64 address lines, and all registers are 64 bits wide. But it also has 128 data lines, and this is why: First, being able to handle this many bits at once means that the 12864 doesn't need a coprocessor; most coprocessors only handle 80 bits or so. Therefore the 12864 also doesn't need a secondary instruction set telling it how to talk to a coprocessor. A second reason for having a 128-bit data path leads to further simplification of the microprocessor: All its instructions have been carefully designed to fit within 128 bits, so that a single memory-access can provide the 12864 with a whole instruction. To make this still more efficient, the computer that incorporates a 12864 will be required to have 128-bit-wide memory, and not the common 8-bit-wide or 9-bit-wide memory of most of today's microcomputers. This means that the 64-bit Program Counter or PC register is always incremented just once for each instruction pulled from the memory. The 12864 is not much of an evolutionary offshoot from previous microprocessors; it's a radical mutation. Only in the efficiency of its instruction set does it relate to the 6809....

With 128-bit memory, design decisions made in the 6809 and 68020 are greatly simplified in the 12864. Example: Because the 6809 fetched instructions only 8 bits at a time, there were two distinct groups of Branch instructions: an 8-bit branch and a 16-bit branch. Machine code that used 8-bit branches as often as possible was both shorter and faster than code that always used 16-bit branches, because only one byte of memory and 1 clock-cycle of time were needed for 8-bit branching-data, while 2 bytes and 2 cycles were needed for 16-bit data. (Not to mention that 8-bit-branch INSTRUCTIONS were themselves only 8 bits, while most 16-bit branches also had 16-bit opcodes.) And in the 68020 processor, although there are 8-bit, 16-bit, and 32-bit branch instructions, the last of these, the 32-bit type, requires an extra fetch of data from the memory. But the 12864 processor needs only one size of branch instruction, because any 64-bit branch-distance will always fit into a one-clock-cycle 128-bit opcode+data fetch.

Likewise, because any 64-bit address in the memory can be part of a 128-bit fetch, there is no longer any need for a special Direct Page or DP register. In the 6809 the DP register offered an 8-bit way to access part of the memory; thus the longer and slower 16-bit way of specifying memory locations did not always have to be used. This is not a problem in the 12864.

Now what about the choice to use 64-bit-addressing? This represents about 18.4 quintillion addresses (18,446,744,073,709,551,616 addresses, to be exact), far beyond any reasonable projection of any computer's memory needs -- including virtual memory! Not to mention that since each address holds 128 bits of data, we are actually talking about 295 quintillion (8-bit) bytes of memory!
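Those two figures are easy to check. Here is a quick sketch that uses Python purely as a calculator; the variable names are mine, not part of the 12864 design.

```python
# Address space of a 64-bit address bus feeding 128-bit-wide memory.
ADDRESS_BITS = 64
WORD_BITS = 128                          # bits stored at each address

addresses = 2 ** ADDRESS_BITS
bytes_per_address = WORD_BITS // 8       # 16 eight-bit bytes per address
total_bytes = addresses * bytes_per_address

print(f"{addresses:,} addresses")
print(f"{total_bytes:,} bytes")
```

The second number is 2 to the 68th power, a little over 295 quintillion.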

Nevertheless, there are some possibly valid reasons for this choice: First, since the design of this processor is not yet completely fixed, and belongs to nobody, it might be that it could tickle the fancy of a number of different chip manufacturers, and lead to an Industry-Wide Standard Design. Naturally, it makes sense for the 12864 assembly language instruction set to become standardized and non-proprietary, also. Therefore a second reason for choosing 64-bit addressing is simply that it would take longer to put this complex chip into production -- and that hopefully gives the software developers plenty of time to convert their existing software to run on this admittedly incompatible processor. Thus, both the new computers and their software could arrive at the same time! Finally, a third reason for jumping straight to 64-bit addressing is that the architecture of the new computers can be designed with that in mind. Simply because 64 bits represents such a tremendous enhancement, making it the immediate goal means it can remain a standard far into the future....

Now let's get into some of the details of the 12864. The total number of registers of all types will be about 45, give or take a few. This number can be decided after the Condition-Code/Status Register has had its bits defined. As stated earlier, every register is 64 bits wide, including CCS. In the CCS register a number of bits are necessary for various processor functions; just how many depends on the total list of functions that will be designed. For the purposes of this essay, let us examine the CCS register of the 68020: It is 16 bits wide, of which 12 are defined and 4 are undefined. If we start with a 64-bit CCS and only use 12 of them for such things as result-of-instruction flags, interrupt masks, etc., then that leaves 52 bits that can be equated to the entire register set of the 12864 microprocessor. However, it is certain that some of those 52 bits will be dedicated to other processor functions (but I don't know what other dreamers will add to this), and so the number of registers is as yet unknown.

In case you are wondering why match the bits of CCS with the register set, the answer involves the interrupt system. Whenever an interrupt or exception or other special event occurs, the processor can automatically save on a stack all the registers that are specified in the CCS register. The processor saves time because none of those interrupt-type handling routines need include instructions to specifically save and recover the registers they use. In fact, if the 12864 computer system's main power-up/initialization routine includes defining such a list of registers in CCS, then all interrupt-type routines can be written using only those registers. Different boot software, different registers. Note that 2 registers, the Program Counter and CCS, which ALWAYS are saved, do NOT need to be matched to bits in CCS, and so the 12864 can have 2 more registers than the simple count of available bits in CCS implies.
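To make the register-list idea concrete, here is a rough sketch in Python. The bit layout is purely my assumption for illustration: the low 12 bits of CCS are taken as flags and masks, and each of the remaining 52 bits stands for one register. The names `registers_to_save`, `FLAG_BITS` and the `R0`, `R1`... labels are invented for the sketch.

```python
# Hypothetical CCS layout: bits 0-11 are flags and masks, and bits 12-63
# each stand for one register (bit 12 = register 0, bit 13 = register 1, ...).
# PC and CCS have no bit of their own because they are always saved.
FLAG_BITS = 12
MASKABLE_REGISTERS = 64 - FLAG_BITS      # 52 registers can be listed in CCS

def registers_to_save(ccs):
    """Return the names of the registers pushed when an interrupt occurs."""
    saved = ["PC", "CCS"]                # always saved, no CCS bit needed
    for reg in range(MASKABLE_REGISTERS):
        if (ccs >> (FLAG_BITS + reg)) & 1:
            saved.append(f"R{reg}")
    return saved

# Boot code reserves registers 0, 1 and 5 for all interrupt-type routines.
boot_ccs = (1 << (FLAG_BITS + 0)) | (1 << (FLAG_BITS + 1)) | (1 << (FLAG_BITS + 5))
print(registers_to_save(boot_ccs))
```

With this layout the processor could have up to 54 registers (52 listed in CCS plus PC and CCS themselves), which is where the "2 more registers" above comes from.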

The next thing to discuss is the actual list of registers. A major element in the design of the 12864 is that as far as the programming instruction set is concerned, all registers are treated equally. But as far as the microcode and the hardware are concerned, some are more equal than others.... For the sake of this discussion, let us assume that there are 45 registers, numbered from 0 to 44. Suppose that Register 33 is the Program Counter, while Register 17 is just an ordinary general-purpose register. The hardware will always use 33 as a pointer to the current instruction about to be executed, and the hardware will always adjust 33 to point it at the appropriate next instruction. But the instruction set will not distinguish 33 from 17! A Logical-OR instruction that manipulates a group of bits inside 17 can just as easily manipulate bits inside 33, simply by specifying 33 instead of 17 in the Logical-OR instruction. Just because this is something that might be disastrous to the program is no reason to keep it from being possible! Let the assembly-language programming tool be written so that it catches such dubious instructions, and warns the programmer! The big advantage of this scheme is that it leads to an extremely significant reduction in the total complexity of both the instruction set and the microcode. Examples later on in this essay may make this more clear.

Let us now examine the bit-format of some of the instructions. By far the majority of the instructions will have a single format that offers astounding programming potential (well, what do you expect with 128 bits to play with!).... Actually, most of this instruction-group format fits into 64 bits, numbered 0 to 63, and defined as follows:

Bits 63-58: These 6 bits hold the actual generic instruction. Of course this means that there are only 64 such instructions, but if you have any doubts about this being enough, you don't yet realize how generic they are!

Bits 57-46 are divided into three groups of 4 bits each, hereinafter to be referred to as 'admode fields', short for 'addressing mode'. Since these fields have 4 bits, it follows that there are 16 different addressing modes. They will be explained shortly. The first admode field, bits 57-54, tells the processor where to find the first chunk of data needed for some instruction, say a SUB. The second admode field, bits 53-50, tells the 12864 where to find the second chunk of data; obviously a SUB instruction needs data that can be subtracted. And the third admode field, bits 49-46, tells the processor where to put the result of the SUB. Perhaps you now see that with 16 addressing modes for each admode field, a simple generic SUB instruction can encompass both registers and memory in quite a few different combinations!
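The bit positions given so far can be tried out with a few shifts and masks. This is only a sketch in Python; the helper names and the opcode number used for SUB are made up, since no opcode numbers have been assigned.

```python
def encode_head(opcode, get1, get2, put):
    """Pack the generic instruction and the three admode fields."""
    return (opcode << 58) | (get1 << 54) | (get2 << 50) | (put << 46)

def decode_head(word):
    """Split a 64-bit instruction word back into its top four fields."""
    return {
        "opcode": (word >> 58) & 0x3F,   # bits 63-58, 64 generic instructions
        "get1":   (word >> 54) & 0xF,    # bits 57-54, first data source
        "get2":   (word >> 50) & 0xF,    # bits 53-50, second data source
        "put":    (word >> 46) & 0xF,    # bits 49-46, destination
    }

SUB = 5   # hypothetical opcode number, chosen only for this example
word = encode_head(SUB, get1=0, get2=4, put=0)
print(decode_head(word))
```

The remaining 46 bits of the word (bits 45-0) are left at zero here.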

For convenience, let us call the admode fields GET1, GET2, and PUT. A list of proposed addressing modes follows, and if it is adopted, there will be a few restrictions on the use of two of them. Should the list be modified during later design stages of the 12864, these restrictions may still apply. The modes subject to restriction are marked with * symbols; the limitations are detailed at the end of the list. The admodes are numbered 0 to 15 in binary.

{POST NOTE: SORRY FOLKS, but the lack of a fixed font here does bad things to the table layouts. If you copy/paste to Notepad, you can remove excess carriage returns, and the tables should look better.}

Direct Modes 0 to 3
 0000  Register Data + 16\9\10\3bit Offset
 0001  Register Data + 16\9\10\3bit Adjust
 0010  GET1 = PUT mode, or GET2 or PUT = NONE
*0011  Immediate 64bit Data

Semi-Direct Modes 4 to 7
 0100  Reg Address + 16\9bit Offset
 0101  Reg Address + 16\9bit Adjust
 0110  Reg Addr + (Reg + 10\3bit Adj) Offset
*0111  Absolute 64bit Address

Indirect Modes 8 to 15
 1000  [Reg + 16\9bit Offset],LSig64bits
 1001  [Reg + 16\9bit Adjust],LSig64bits
 1010  [Reg + (Reg + 10\3bit Adj) Offset],LS64
 1011  [Reg],LS64 + (Reg + 10\3bit Adj) Offset
 1100  [Reg + 16\9bit Offset],MSig64bits
 1101  [Reg + 16\9bit Adjust],MSig64bits
 1110  [Reg + (Reg + 10\3bit Adj) Offset],MS64
 1111  [Reg],MS64 + (Reg + 10\3bit Adj) Offset

Explanations

Reg Address: The value in a register is considered to be an address.

16\9bit, 10\3bit: 16 or 9 or 10 or 3 bits of twos-complement information, sign-extended to 64 bits (internally) by 12864 processor.

64bit: 64 bits of information fetched with the 64-bit generic instruction.

Adjust: Value in register is modified, using twos-complement information. If info is negative, register adjusted BEFORE instruction executed. If info is positive, register adjusted AFTER instruction executed.

Offset: Similar to Adjust, but register not modified. Computation of the Offset is always performed before instruction is executed.

NONE: No data or address at all.

[]: Value inside brackets is an address. Information at that address is in turn used as an address.

,LSig64bits ,LS64: An address holds 128 bits of data, of which the Least Significant 64 bits are selected for the instruction.

,MSig64bits ,MS64: The Most Significant 64 bits at an address.

(Reg): Distinguishes a second register that this addressing mode uses.

* Recall design decision that limits instructions to 128 bits, including 64 bits of Immediate Data or Absolute Address. It's quite obvious that only one Admode Field can get to use those 64 bits. It also works out that if different admodes exist in all three Admode Fields, then none of the *-marked admodes may be placed in any Admode Field. And in any instruction that uses data acquired through the GET2 field, or in which GET1 is different from PUT...such instructions exclude the Absolute-64bit-Address mode from the PUT field. (Of course, Immediate Data mode is always excluded from the PUT field.) More details of these limits will be provided later; for now it might be noted that the reason that no 64bit-Offset modes exist is to avoid a lot of trouble. It makes the programmer use more registers for indexing, but eliminates much competition between the Admode Fields for the use of the 64 bits that accompany the instruction. Besides, you might be surprised by how well other instructions can replace any 64-bit Offset modes! Anyway the 12864 processor will probably have 30 or more general-purpose registers (registers the hardware doesn't always modify for specific purposes, like CCS or the Stacks---or have their contents used for other purposes, like pointing at cache or program data). It may be easy to find enough available registers for most address-pointing.

Now for some descriptions of the 16 admodes and their consequences:

Direct Modes 0 to 3 all specify data the 12864 processor has on hand, in a register, or just loaded along-with or as-part-of the instruction. Obviously, these modes can be executed more quickly than the Semi-Direct or Indirect Modes.

(0) In this admode the data needed by the current instruction is in one or two of the registers. The 12864 processor has 128 data lines; just because every register is only 64 bits wide is no reason to limit its ability to process 128 bits. ANY TWO registers may be put together, in any order, to make a place that holds 128 bits! (Okay, I exaggerated; the 12864 will have both a 'Boss' mode and a 'Peon' mode. Only the Boss mode can put ANY two registers together; in the Peon mode a lot of combinations will be illegal. And, even in the Boss mode, a lot of combinations will be undesirable, like using a Stack pointer with a Cache-pointing register; the 12864 Assembler would warn the programmer.) Note that admode 0 merely declares that one or two registers will be used; the actual register(s) specified are elsewhere among the many bits of this generic format. After the processor identifies the register(s) holding the data, an offset will be applied to that data. The offset quantity shall be used by the 12864 in its implementation of the instruction; the register(s) holding the data will not be affected by the offset. The maximum size of the offset is affected by how many registers are used, and by the type of generic instruction being performed; the details of this will be provided later. The main purpose of admode 0 is to let us eliminate the LEA (load effective address) instructions from the processor's list of 64 generics--but certainly other uses will be found for it.

(1) This admode is very much like admode 0. The only real difference is that the content of the register(s) IS affected by this admode, which makes the mode useful in counting loops. One thing to keep in mind is that any negative adjustment is performed before the whole instruction is implemented, while any positive adjustment is performed after the overall instruction is implemented. This admode also helps us eliminate LEA instructions (details later).

(2) This is the only admode with a double meaning. If admode 2 is used within the GET1 field, then it means that the first chunk of data, needed by the instruction, is currently in the place specified by the PUT field. Thus data at some location, after manipulation, will return to that location. If we exactly specify the same admode in both the GET1 and PUT fields (instead of using admode 2 in GET1), we end up being unable to use Immediate Data at all--you'll see! If admode 2 is used in the GET2 field, then it means NONE, no data for that part of the instruction. Operations like LSH (logical shift) use admode 2 in GET2; they need only one main data chunk since any other data is part of the instruction's definition. In fact, if any admode besides 2 is in GET2 during a LSH or similar instruction, then the admode should be ignored, or declared illegal. If admode 2 is in GET2 during a SUB or similar instruction, then the net effect of the SUB will be equivalent to a TST instruction. (With lots of TST-equivalents, there need not be a specific TST among the 64 generics. But the 12864 Assembler may include a TST, and translate it into an equivalent.) Finally, admode 2 in the PUT field also means NONE, no address. The computed result of the SUB or other manipulation is not put anywhere, and this is useful, too! The definition of a CMP (compare) is exactly a SUB that doesn't save the result! So the CMP becomes another common instruction that the 12864 processor excludes from its list of 64 generics.... Like TST, the 12864 Assembler can include CMP, and translate it to an equivalent: a destinationless SUB. Similarly, the 6809 BIT operation is an AND instruction with no destination. Designers unite! The 12864 has a full set of destinationless instructions--and no extra complexity! Moving on, suppose admode 2 is in both GET1 and PUT: This is basically a no-operation, NOP. Lots of ways exist to do a NOP; the 12864 Assembler can include NOP, and translate.

(3) This is the Immediate Data admode. Since the instruction is often 64 bits long, while 128 bits are always fetched from memory, admode 3 tells the instruction to use as data the group of 64 bits fetched with the instruction.

Semi-Direct Modes 4 to 7 all specify that the data the processor has on hand are memory-addresses of the data needed by the instruction. Admodes 4 to 7 are slower than the Direct Modes because the 12864 has to go fetch the data from the memory, but this process is still faster than the Indirect Modes.

(4) This admode is like admode 0 in operation. The main difference is that only one register is ever specified, since one register holds 64 bits and the memory addressing range is 64 bits. But the offset is figured the same way as admode 0, and the value in the register is not changed. As mentioned, the result of the offset computation is a memory address; the data at that location is fetched for use by the instruction.

(5) This admode combines features of admode 1 and admode 4. Again only one register is specified as an address-pointer, or index (4). An adjustment of the value in that register will be applied, pre-decrement or post-increment (1). If you review admodes 1 and 4, this one should be pretty obvious.

(6) The basic addressing mode for doing 64-bit (or any size larger than 16-bit) offsets is admode 6. One register is specified as a pointer (index) to the general region of memory; a second register is specified that will hold the offset from the general place to any specific place. Furthermore, this second register can be given a predecrement or postincrement adjustment, which makes it easy to skip through tables of data. Note that although the second register is adjustable, its value is only an offset; the first register remains unchanged.

(7) This admode specifies that the 64 bits fetched along with the 64-bit generic instruction are the absolute memory address of the data the instruction needs.
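The four Semi-Direct modes can be modelled in a few lines. This is a sketch under my own assumptions: registers live in a Python dict, `small` stands for the already sign-extended 16\9\10\3bit value, and a positive Adjust is applied as soon as the old value has been read, which simplifies "after the instruction is executed". All function and parameter names are invented.

```python
MASK64 = (1 << 64) - 1

def adjust(regs, r, adj):
    """Apply an Adjust and return the value the instruction sees.

    Negative adjustments happen before use (pre-decrement);
    positive ones happen after use (post-increment).
    """
    if adj < 0:
        regs[r] = (regs[r] + adj) & MASK64
        return regs[r]
    seen = regs[r]
    regs[r] = (regs[r] + adj) & MASK64
    return seen

def effective_address(mode, regs, r=0, r2=0, small=0, imm64=0):
    """Memory address produced by admodes 4 to 7."""
    if mode == 4:                        # Reg Address + Offset, register untouched
        return (regs[r] + small) & MASK64
    if mode == 5:                        # Reg Address + Adjust
        return adjust(regs, r, small)
    if mode == 6:                        # Reg Addr + (Reg + Adj) Offset
        return (regs[r] + adjust(regs, r2, small)) & MASK64
    if mode == 7:                        # Absolute 64bit Address
        return imm64 & MASK64
    raise ValueError("not a Semi-Direct mode")
```

For example, with register 3 holding 1000, admode 4 with an offset of -8 yields address 992 and leaves the register alone, while admode 5 with an adjustment of -2 yields 998 and leaves 998 in the register.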

Indirect Modes 8 to 11 are quite like Indirect Modes 12 to 15: They are computed the same way, but at some point the data at an address is used as an address. Now since the data is always 128 bits and addresses are only 64 bits, which 64 of the 128 do we use? Thus admodes 8-11 use the Least Significant 64 bits of the 128, while admodes 12-15 use the Most Significant 64 bits.

Note that all the admodes that use registers as indexes let the Program Counter be used as easily as any other register. The 12864 processor needs no special microcode to provide a host of Program-Counter-Relative admodes, due to the basic design decision to make the instruction set handle all registers equally. The trick to consistency is for the processor to apply any adjustment or offset to the chosen index register AFTER incrementing PC past the current operation. This in turn works due to the design choice to make ALL the instructions fit in 128 bits. Nevertheless, the 12864 Assembler may specifically distinguish Program-Counter-Relative admodes from the other admodes, and translate appropriately. Finally, note that it may be undesirable to use the PC register as a data-pointer in any admode that will adjust the value of the index!

(8) This admode first computes an address in exactly the same way as admode 4. The 12864 processor then fetches the lowest 64 bits from the memory at that address, and uses this information as another address. Instruction will use the data in the memory at the second address.

(9) This admode first computes an address in exactly the same way as admode 5. Then an address is fetched, and then data, as just described.

(10) This admode first computes an address in exactly the same way as admode 6. Then an address is fetched, and then data, as just described.

(11) This addressing mode starts by using the value in a register as an address. The 64 lowest bits in the memory at that address are fetched; they will be used as a second address. However, before they are used, an offset will be applied to that second address. A second register is specified, along with an adjustment. The value in this predecremented/postincremented register is the 64-bit offset that is applied to the second address; the first register's value, and the memory that held the second address, are not changed by this process. After computing the new, offset address, the 12864 processor fetches 128 bits of data from that location in the memory, for the current instruction.

(12) This admode first computes an address in exactly the same way as admode 4. The 12864 processor then fetches the highest 64 bits from the memory at that address, and uses this information as another address. Instruction will use the data in the memory at the second address.

(13) This admode first computes an address in exactly the same way as admode 5. Then an address is fetched, and then data, as just described.

(14) This admode first computes an address in exactly the same way as admode 6. Then an address is fetched, and then data, as just described.

(15) This addressing mode starts by using the value in a register as an address. The 64 highest bits in the memory at that address are fetched; they will be used as a second address. Then everything proceeds just like admode 11.
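The extra step the Indirect Modes take is picking one half of a 128-bit memory word to use as an address. Here is a sketch, assuming memory is modelled as a Python dict from address to 128-bit integer and that the first address (or the offset) has already been computed as described above. The function names are mine.

```python
MASK64 = (1 << 64) - 1

def half(word128, most_significant):
    """Pick the 64-bit half of a 128-bit memory word that serves as an address."""
    if most_significant:
        return (word128 >> 64) & MASK64
    return word128 & MASK64

def indirect_simple(memory, first_address, most_significant):
    """Admodes 8 and 12, once the first address is computed as in admode 4."""
    second_address = half(memory[first_address], most_significant)
    return memory[second_address]

def indirect_offset(memory, reg_value, offset, most_significant):
    """Admodes 11 and 15: the offset is applied to the fetched second address."""
    second_address = (half(memory[reg_value], most_significant) + offset) & MASK64
    return memory[second_address]

memory = {
    0x100: (0x2000 << 64) | 0x3000,      # two addresses stored in one word
    0x2000: 111,
    0x3000: 222,
    0x3008: 333,
}
print(indirect_simple(memory, 0x100, most_significant=False))
print(indirect_simple(memory, 0x100, most_significant=True))
```

One 128-bit word can therefore hold two independent pointers, and the choice between admodes 8-11 and 12-15 selects which one is followed.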

Now to show how LEA (load effective address) needn't be included among the 64 generics. Consider admode 15: At the end of its computations the 12864 processor has an address which it normally uses, right now, to fetch data, after which the address is not saved. LEA creates that address and saves it for later use and re-use (doesn't use it now). Suppose admode 15 specifies register 10 (first), register 7 (second), and an adjustment of -58. The Assembler translates LEA (with syntax specifying admode 15 and the register info) into an ADD: The GET1 field is given admode 4, register 10, and a 0 offset; the processor fetches 128 bits from the address (a part of the generic instruction we haven't got to yet lets us select the correct 64 bits). The GET2 field is given admode 1, register 7, and an adjustment of -58; the processor modifies the register and gives its content to the ADD. Then the PUT field specifies where to save the result. Alternatively, GET2 might have admode 0 and get the result without modifying register 7. Any LEA can be translated!
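The register 10 / register 7 / -58 example can be checked with a small model. This sketch assumes the same dict-based registers and memory as a stand-in for real hardware, and it hard-wires the selection of the Most Significant 64 bits; both function names are invented. It compares the address admode 15 would compute against the result of the ADD the Assembler would emit.

```python
MASK64 = (1 << 64) - 1

def lea_admode15(memory, regs, first, second, adjust):
    """Address that admode 15 computes: [Reg],MS64 + (Reg + Adj) Offset."""
    if adjust < 0:                                   # pre-decrement
        regs[second] = (regs[second] + adjust) & MASK64
        offset = regs[second]
    else:                                            # post-increment
        offset = regs[second]
        regs[second] = (regs[second] + adjust) & MASK64
    pointer = (memory[regs[first]] >> 64) & MASK64   # Most Significant 64 bits
    return (pointer + offset) & MASK64

def lea_as_add(memory, regs, first, second, adjust):
    """The same result from an ADD, for a negative adjustment as in the essay."""
    get1 = (memory[regs[first]] >> 64) & MASK64      # GET1: admode 4, offset 0
    regs[second] = (regs[second] + adjust) & MASK64  # GET2: admode 1, adjusted first
    get2 = regs[second]
    return (get1 + get2) & MASK64                    # PUT decides where this goes

memory = {500: (0x4000 << 64) | 0x1234}
print(lea_admode15(memory, {10: 500, 7: 100}, 10, 7, -58))
print(lea_as_add(memory, {10: 500, 7: 100}, 10, 7, -58))
```

Both calls compute 0x4000 + 42, and both leave 42 in register 7.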

At last we can continue the bit-designations of the generic instruction format. Been about 1000 bytes per bit of explanation, so far...!

Bits 45-39 specify a Bitfield Size for the instruction. These 7 bits can hold any number from 0 to 127, and with 0 being interpreted by the processor as 128, it becomes possible for the instruction to operate on any data size from 1 to 128 bits. Even though the registers of the 12864 microprocessor are only 64 bits wide, its Arithmetic/Logic Unit is 128 bits wide, and is able to handle any data size smoothly. So if the Bitfield Size is 79, then 79 bits will be taken from the place specified via GET1, manipulated (if the instruction requires it) with 79 bits from the place that GET2 indicates, and finally a 79-bit result is sent to the place described by PUT. The 12864 Assembler considers the Bitfield Size to be optional information; if it is not provided by the programmer, a size of 128 bits will be assumed. Some Assembler instructions, like LEA, default to 64 bits due to the nature of the instruction (LEA computes a 64-bit address). MUL will always have two 64-bit inputs and one 128-bit output; DIV will always have a 128-bit dividend, a 64-bit divisor, and a 128-bit quotient. And whenever Immediate Data is specified, then either the whole instruction must be limited to 64 bits, or the processor must allow 64 bits to be used in the manipulation of 128 bits. (Perhaps we can have both: The processor can have the ability to do the latter, while the Assembler lets the programmer decide the former.) One way the programmer can set the Assembler's default to 64 bits would be to simply specify only one data-holding register in an instruction's syntax.
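The "0 means 128" rule is simple to model. In this Python sketch the helper names are mine, and the mask is placed at bit 0 only because the Bitfield Start has not been described yet.

```python
def bitfield_size(field7):
    """Decode the 7-bit Bitfield Size (bits 45-39); 0 stands for 128."""
    field7 &= 0x7F
    return 128 if field7 == 0 else field7

def size_mask(field7):
    """One set bit for every bit position the instruction operates on."""
    return (1 << bitfield_size(field7)) - 1

for raw in (0, 1, 79, 127):
    print(raw, "->", bitfield_size(raw), "bits")
```

So every size from 1 to 128 is reachable, and a size of zero, which would be useless anyway, costs nothing.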

Bit 38 of the generic instruction is the Signed Extension Flag. It tells the processor to treat the result of an operation as a twos-complement number, if this bit is set. When the result is PUT into its destination, its negativeness or positiveness, as it exists within the Bitfield Size, is extended out to the Bit-127-mark (the Most Significant Bit is numbered 127; the Least is 0). If only one register is specified, then sign-extending the result out to the Bit-63-mark is the thing to do. If the Signed Extension Flag is not set, the result of the instruction is simply PUT into its destination, and nothing else is done.
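Sign extension of a bitfield works like this. The sketch is in Python, the function name and parameters are my own, and the field is assumed to sit at the least significant end of the destination.

```python
def sign_extend(value, size, width=128):
    """Extend a `size`-bit twos-complement result out to `width` bits."""
    field = (1 << size) - 1
    value &= field
    if (value >> (size - 1)) & 1:            # top bit of the bitfield is set
        value |= ((1 << width) - 1) & ~field # fill every higher bit with ones
    return value

# A 79-bit result whose top bit is set, extended to the Bit-127-mark.
print(hex(sign_extend(1 << 78, 79)))
# An 8-bit -128 extended to the Bit-63-mark (single-register case).
print(hex(sign_extend(0x80, 8, width=64)))
```

A positive result passes through unchanged, which matches what happens when the flag is not set.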

Bits 37-34 contain the Do-If condition. Practically the whole instruction set of the 12864 processor is conditional. This lets the programmer avoid a lot of conditional-Branches that only skip past a few instructions. Where formerly some code might have: BCS (branch if carry set) followed by a ROT (rotate) that would be executed if the carry flag was clear, now we can specify Do-the-ROT-If Carry Clear, and delete the Branch entirely. In fact, with these 4 bits we can delete the entire collection of Branch operations from the generic instruction set of the 12864! The Assembler simply translates any Branch to ADD Immediate Data to the Program Counter, and sets the appropriate Do-If condition-bits. Of course, most of the time, most instructions will set the Do-If to ALWAYS. With only 4 bits, only 16 conditions are allowed. This is enough for Motorola's 6809 and 68020; I hope the final design of the 12864 processor won't require more.
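Here is how a Branch might look once it has become a conditional ADD to the Program Counter. This is a sketch only: the condition names are a small, hypothetical subset of the 16, the flags are modelled as a Python dict, and no bit encodings are implied.

```python
MASK64 = (1 << 64) - 1

# Hypothetical subset of the 16 Do-If conditions, keyed by name.
CONDITIONS = {
    "ALWAYS": lambda f: True,
    "CS": lambda f: f["C"],              # carry set
    "CC": lambda f: not f["C"],          # carry clear
    "EQ": lambda f: f["Z"],              # zero flag set
    "NE": lambda f: not f["Z"],          # zero flag clear
}

def branch(pc, flags, condition, distance):
    """A Branch rewritten as 'ADD Immediate Data to the Program Counter'.

    `pc` already points past the current instruction, so `distance`
    is measured from the next instruction.
    """
    if CONDITIONS[condition](flags):
        return (pc + distance) & MASK64
    return pc                            # condition false: fall through

flags = {"C": True, "Z": False}
print(branch(100, flags, "CS", -10))     # taken
print(branch(100, flags, "CC", -10))     # not taken
```

Because memory is 128 bits wide and every instruction occupies one address, the distance counts instructions rather than bytes.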

Bits 33-29 are the Flag Mask bits, the other side of the coin from the Do-If conditions. If every instruction can be controlled by the flags in the CCS register, it follows that every instruction should be able to specify which CCS flags, if any, will be affected as a result of its implementation. In fact, for the Branch instructions to be properly deleted from the generic instruction set, it is essential that flag-masking be possible. Traditionally, Branch operations never affect any flags; translating them into ADD instructions makes it obvious why we require flag-masking. Now consider again the Do-the-ROT-If Carry Clear that was previously described: What if the instruction after the ROT is also to be executed only if the Carry flag is clear? A ROT normally affects the Carry flag! So we mask the flag; the next instruction can also Do-If Carry Clear. In the 6809 and the 68020 there are only 5 conditions-of-results flags; I hope the final design of the 12864 processor won't require more.
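Flag masking amounts to merging the new flags into CCS one bit at a time. In this sketch the polarity (a set mask bit protects the flag) and the bit order are my assumptions; the five flag names are the 68020's.

```python
FLAGS = ("X", "N", "Z", "V", "C")        # 68020 result flags; bit order assumed

def update_flags(old, new, mask):
    """Merge freshly computed flags into CCS, skipping the masked ones.

    `mask` is the 5-bit Flag Mask; a set bit protects the matching flag.
    """
    merged = dict(old)
    for bit, name in enumerate(reversed(FLAGS)):     # bit 0 = C ... bit 4 = X
        if not (mask >> bit) & 1:
            merged[name] = new[name]
    return merged

before = dict.fromkeys(FLAGS, False)                 # Carry is clear
after_rot = dict(before, C=True, Z=True)             # what the ROT would set
print(update_flags(before, after_rot, 0b00001))      # Carry masked, stays clear
```

With Carry masked, a following Do-If Carry Clear instruction still sees the Carry as it was before the ROT. A mask of all ones gives the traditional Branch behaviour of touching no flags at all.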

Bits 28-0 (yes, all the rest) are devoted to the details of the PUT field. However, the highest seven of them, Bits 28-22, can have another purpose. There is a group of operations that perform what we might call 'minor manipulations', and which may need some minor data. The generic instructions of this class that I have so far identified are, in alphabetical order: ASL and ASR (arithmetic shift left and right), COPY, INIT (initialize), ISUB (subtract from an initial value), LSR (logical shift right; LSL = ASL), ROL and ROR (rotations), and SWAP. ASL, ASR, LSR, ROL, and ROR need data ranging from 1 to 128; INIT, ISUB, and sometimes COPY, need twos-complement numbers ranging from -64 to +63. The specification of 7 bits was decided by the needs of ASL, ASR, LSR, ROL and ROR; other instructions are merely taking advantage of what is already there. Only SWAP does not need any of those seven data bits. We could have assigned eight bits to ASL, etc.; twos-complement numbers from -128 to +127 (with zero = +128) would let us reduce the list of generic instructions even more. Unfortunately, we are running out of bits! So we can either assign 5 of 64 generic operations to various kinds of bit-shift, and use 7 bits to describe the size of the shift --or we can have 3 generic shift operations and use 8 bits to describe the size of the shift. But ONLY those 3 generic instructions ever really need that 8th bit! It seems more reasonable to use an extra 2 of the 64 generic instructions.
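The two readings of those seven bits can be sketched as follows. Reading a zero shift count as 128 is my assumption, by analogy with the Bitfield Size; the range "1 to 128" needs some such rule to fit in 7 bits. The function names are invented.

```python
def minor_field(word):
    """Extract bits 28-22 of the 64-bit instruction word."""
    return (word >> 22) & 0x7F

def shift_count(field7):
    """Count for ASL/ASR/LSR/ROL/ROR: 1 to 127, with 0 assumed to mean 128."""
    field7 &= 0x7F
    return 128 if field7 == 0 else field7

def small_signed(field7):
    """Twos-complement value for INIT/ISUB/COPY: -64 to +63."""
    field7 &= 0x7F
    return field7 - 128 if field7 & 0x40 else field7

word = 0x7F << 22                        # all seven minor-data bits set
print(shift_count(minor_field(word)))    # read as a shift count
print(small_signed(minor_field(word)))   # read as a signed value
```

The same bit pattern is thus a shift of 127 places to one instruction and the value -1 to another.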

Let's examine some of the capabilities of these 'minor' manipulators:

ASL (and the identical LSL) merely shift bits from Least Significant to Most Significant. The Bitfield Size determines how many bit-positions will be involved in the shift. There is also some Bitfield Start data (which we haven't got to yet, but has to be mentioned NOW) that specifies exactly where among the 128 bits the Bitfield Size is located. The 12864 Assembler needs to scrutinize these things carefully; we can't let Bit 100 be the Start while the Size is 34 bits, nor let the Size be 52 bits while the Shift is 73 bits! One final thing about ASL and LSL: Perhaps they shouldn't be so identical. The 6809 processor defines them so that there's no reasonable difference between an ASL and an LSL. But the 68020 places a new flag in the CCS register, an eXtend flag designed to hold a bit of data specifically for arithmetic operations. The Carry flag holds data for both arithmetic and logical operations. Yet LSL and ASL both affect the X flag! So perhaps a distinction can reasonably be made: Only ASL should affect X. (ASR and LSR also have this small irrationality.)

ASR and LSR are similar to ASL, of course, their main difference being that these instructions shift bits from Most Significant to Least. More details of what they do need not be presented here; they all are common instructions. But we might note that the power of the 12864 processor lets us get data from just about any place in the computer (using GET1), shift or otherwise manipulate any part of that data, and then PUT the result almost anywhere else, all in just one instruction. The mundane turns into the extraordinary.

INIT lets us initialize a register, or registers, or data at any memory location, such that it becomes a 64-bit or a 128-bit expansion of any number in the 7-bit range of -64 to +63. INIT replaces CLR (clear), which initializes a data-storage place to zero only; now we can initialize to 1, or -1, or to any of more than a hundred possibilities. Note that INIT never needs any GET1 info.

ISUB replaces both NEG (negate) and NOT, which respectively subtract a number from 0 or -1. ISUB subtracts numbers from anything in the Initial Value range of -64 to +63. Finding other uses for this operation is not so important as consolidating NEG and NOT into one generic instruction. The Assembler will, of course, retain both NEG and NOT, and translate appropriately.
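That NEG and NOT are both "subtract from a constant" is easy to confirm. This is a Python sketch of ISUB on a register-sized value; the function name and the `bits` parameter are mine.

```python
MASK64 = (1 << 64) - 1

def isub(initial, value, bits=64):
    """ISUB: subtract `value` from a small Initial Value (-64 to +63)."""
    if not -64 <= initial <= 63:
        raise ValueError("Initial Value must fit in 7 bits")
    return (initial - value) & ((1 << bits) - 1)

x = 5
print(isub(0, x) == (-x) & MASK64)       # NEG is ISUB with Initial Value 0
print(isub(-1, x) == (~x) & MASK64)      # NOT is ISUB with Initial Value -1
```

Subtracting from -1 flips every bit because -1 is all ones in twos-complement, so no borrow ever occurs.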

ROL and ROR are pretty much like the shift instructions. The 68020 has another sort of rotation called ROXL and ROXR, but the 12864 may not need them. First examine the rotation operation of the 6809: The Carry flag is always part of the rotation; a bit coming off one end of a byte is moved to Carry by one ROT and moved out of Carry back into a byte by another ROT. In the 68020 a simple ROT moves a bit from one end of a location directly to the other end; a copy of that bit is placed in the Carry Flag. ROX, on the other hand, uses the eXtend flag the same way the 6809 uses the Carry. In the 12864 processor we can mask flags that an instruction would normally affect. Suppose the 12864 rotation is designed to normally flag both X and Carry: If we mask Carry, no copy is sent there; if we mask X, the bit that normally moves through it simply bypasses it. (Similarly, the 12864 can have one generic ASL/LSL operation, but the Assembler can mask the X flag for LSL--if the notion proposed earlier is adopted.)

The COPY instruction replaces LoaD, STore, TransFeR, INC, DEC, and JMP from the 6809, and COPY also replaces MOVE from the 68020. Even some LEA operations can be translated to COPY. The GET1 admode field lets us specify any place in a 12864-based computer from which to fetch data (and any number of bits from 1 to 128); the PUT field lets us specify almost any other place to receive a copy of those bits of data. What could be simpler and more powerful? To replace INC and DEC, GET1 can specify admode 2 -- same as PUT. When GET1 holds 2 while COPY is being processed, the 7-bit Initialize-data will be used to modify the place specified by PUT. Instead of only -1 or +1, the INC/DEC can now range from -64 to +63 -- even to +64 if the value of zero is interpreted thus (it's no good for anything else!). Some LEA instructions that the Assembler translates into COPYs will have admode 0 in GET1, a specified register, and a 16-bit offset ranging from -32768 to +32767. PUT would specify the same admode and register, and an offset of zero. Masking the flags is normal for LEA. Larger offsets can become ADD Immediate Data to a register, with the flags masked. JMP instructions are translated into COPY to the PC register, with masked flags--and remember that any JMP can now be conditional! Load and store and transfer and MOVE operations become COPY memory to register, reg. to mem., reg. to reg., and mem. to mem. Another 68020 instruction, PEA (push effective address), may be unneeded in the 12864. It has the effect of computing an address and saving it in a place that is NOT a register, for later use (most likely by the Program Counter, since there isn't a LEA-to-PC instruction in the 68020). In the 12864 processor, we simply specify the Program Counter's register-number in the PUT-field data if we want to LEA-to-PC. Otherwise we can PUT the EA almost anywhere else, for later use.

SWAP is similar to COPY, in that the GET1 data specifies one place while the PUT data specifies another. However, as the names imply, they do different things: The 12864 SWAP replaces both the 68020 SWAP and EXG (exchange); data in the PUT place is sent to the GET1 place, as well as the usual GET1-to-PUT. Two things to note about SWAP are that register-adjustments of zero, in the specified admodes, will probably be common, and the CCS flags will usually be masked. But consider that if the GET1 admode is 2 (same as PUT), then nothing happens. This may be the ideal thing for the Assembler to translate a NOP into. And if the flags are NOT masked while the GET1 admode is 2, during the generic SWAP, then this may be the ideal thing for the Assembler to translate a TST into. (If the flags aren't masked during a normal SWAP, then they will be affected only by the data going from the GET1 place to the PUT place.)

Now back to Bits 28-0 of the generic instruction; as mentioned, they hold the details of the PUT field data; we shall begin with Bits 0-6. These specify the Bitfield Start for the PUT field, from 0 to 127. After the 12864 processor analyzes the identity of the place where the result of an instruction is to be PUT, the Bitfield Start tells it exactly where in that location the result goes. For most instructions, most of the time, the value here will be Zero.

Bits 7-12 specify the number of the first register needed to identify the place where the result is PUT. In other words, if Register 7 is the destination of the data, then a 7 will be here (admode 1 in the PUT field). To modify flag bits in the CCS register, simply set a Bitfield Size of 5 (for 5 flags), the CCS register's number here, and a Bitfield Start of zero (assuming the designers put the CCS flags in the lowest bit-positions of the register). If a memory address indexed by Register 15 is the data's destination (admode 4 or 5 in PUT), then 15 will be the number placed here. Bits 7-12 can hold any number from 0 to 63, and as mentioned early in this essay, the 12864 will probably only have 45 registers or so, total. Anything more than the highest register number would be illegal, of course, even in the Boss mode! If admode 2 or admode 7 is specified in the PUT field, then the processor would ignore any register-number in these bits. Admode 3 would be another, except it is illegal in the PUT field.

Bits 13-28 specify the offset or adjustment to be applied to the register indicated in Bits 7-12. At least this could be true for instructions OTHER than ASL, ASR, COPY, etc., because only OTHER instructions never need the 7 bits from 22-28. An index register being used with a ROL instruction can only have a nine-bit offset or adjustment applied to it (in Bits 13-21, of course). HOWEVER, it can be worse! Bits 13-18 may specify a second register altogether! For admodes 0 and 1, with any Bitfield Size of 65 or more, we must specify 2 registers. For admodes 6, 10, 11, 14, and 15, a second register is a normal part of address-indexing. (At least those admodes get a 64-bit offset from the second register, applied to the first register.) After a second register has been specified, only the bits from 19-28, or from 19-21, can be used as an offset or adjustment to the second register (a 10-bit or a 3-bit modification, respectively). Here is a chart:

___________|2___________2|2___1|1_________1|1__________|_____________|
___________|8|_|_|_|_|_|2|1|_|9|8|_|_|_|_|3|2|_|_|_|_|7|6|_|_|_|_|_|0|
___________|__16-bit_offset/adjust_________|_First_____|__Bitfield___|
___________|__applied_to_first_Register____|_Register*_|__Start______|
___________|-------------------------------|___________|__for_PUT____|
___________|_ASL,_ASR,___|_9-bit_off/adj___|___________|__only_______|
___________|_COPY,_INIT,_|_to_1st_Register_|___________|_____________|
___________|_ISUB,_LSL,__|-----------------|___________|_____________|
___________|_LSR,_ROL,___|3-bit|_Second____|___________|_____________|
___________|_ROR__data___|_ad-_|_Register*_|_*will_be__|_____________|
___________|_____________|_just|_admode_0,_|_ignored___|_____________|
___________|_____________|_to__|_1,_6,_10,_|_in_admode_|_____________|
___________|_____________|_2nd_|_11,_14,___|_2,_7______|_____________|
___________|_____________|_Reg.|__and_15___|___________|_____________|
___________|-------------------|___________|___________|_____________|
___________|_10-bit_offset_or__|___________|___________|_____________|
___________|_adjust_to_1st_or__|___________|___________|_____________|
___________|_2nd_reg,_depending|___________|___________|_____________|
___________|_on_the_admode.____|___________|___________|_____________|

______And_just_to_be_complete:
____|6_________5|5_____5|5_____5|4_____4|4___________3|3|3_____3|3_______2|
____|3|_|_|_|_|8|7|_|_|4|3|_|_|0|9|_|_|6|5|_|_|_|_|_|9|8|7|_|_|4|3|_|_|_|9|
____|__12864____|_GET1__|_GET2__|__PUT__|__Bitfield___|_|_Do-If_|__CCS____|
____|__Instruc._|_admode|_admode|_admode|__Size,_for__|_|_Con-__|__Flag___|
____|__Code_____|_______|_______|_______|__entire_____|_|_dition|__Masks__|
____|___________|_______|_______|_______|__operation__|_|_______|_________|
_______________________________________________________^
________________________________________________Sign-Extension
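For anyone who wants to experiment with this layout, here is a minimal Python sketch that packs and unpacks the first 64-bit word using the bit positions in the two charts above. The field names are hypothetical, and only the basic PUT layout is modeled (16-bit offset, one register); the shift-data and second-register variants are left out. How a Bitfield Size of 128 is squeezed into 7 bits is not stated, so the raw field is returned as-is.

```python
# (name, highest bit, lowest bit) for the first 64-bit instruction word.
# Names are hypothetical; the bit positions are read off the charts.
FIELDS = [
    ("opcode",         63, 58),
    ("get1_admode",    57, 54),
    ("get2_admode",    53, 50),
    ("put_admode",     49, 46),
    ("bitfield_size",  45, 39),
    ("sign_extension", 38, 38),
    ("do_if",          37, 34),
    ("flag_masks",     33, 29),
    ("put_offset",     28, 13),  # basic layout only: 16-bit offset/adjust
    ("put_register",   12, 7),
    ("put_start",       6, 0),
]


def decode_word(word):
    """Split a 64-bit integer into a dict of raw (unsigned) field values."""
    out = {}
    for name, hi, lo in FIELDS:
        width = hi - lo + 1
        out[name] = (word >> lo) & ((1 << width) - 1)
    return out


def encode_word(**values):
    """Build a 64-bit integer from named fields; omitted fields are zero."""
    word = 0
    for name, hi, lo in FIELDS:
        width = hi - lo + 1
        value = values.get(name, 0)
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name} does not fit in {width} bits")
        word |= value << lo
    return word
```

The eleven fields account for all 64 bits, which is a quick sanity check that the two charts fit together without gaps or overlaps.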

Having used up 64 bits of the normal 128-bit fetch by the 12864 processor, it's obvious that to provide details of the admodes specified for GET1 and GET2, we will need to use the other 64 bits. Now it has already been stated that they are supposed to hold Immediate Data or an Absolute Address; the potential for conflict is obvious! This conflict is the main reason admode 2 was created: It makes the GET1 field use the admode in the PUT field, thereby eliminating any need for any specific GET1 information among the second 64 bits of the operation fetch. And if the GET2 field specifies admode 3 (Immediate Data) or 7 (Absolute Address), then THAT is all the GET2 information needed, and the instruction can be properly executed. So the main restrictions of limiting the GET1/GET2/PUT system to a total of 128 bits are these: (1) We can't combine Immediate Data with more Immediate Data; (2) We can't combine Immediate Data with data at an Absolute Address; (3) We can't combine the data at two Absolute Addresses; and (4) We can't use Immediate Data or an Absolute Address in any instruction where the GET1 admode is different from PUT. How much does it matter that we can't do these things? We already can't do them with any current processor, right? What we CAN do is far more important: Not only can we combine Immediate Data or the content at an Absolute Address with the content of any register (normal for any processor), we can also combine our Immediate/Absolute information with the data at any place in the memory that can be index-referenced -- and save it too! The typical 12864 program will probably be position-independent, anyway, and seldom need Absolute Addressing. It likely will start by loading several registers with the addresses of a number of data tables, all relative to the PC register. No Immediate Data there! Then the remaining registers will become variable-holders, and use Immediate Data as needed, just like any other program.

So to be a little more specific about how GET1 and GET2 information is set among the second group of 64 bits, let's first note that it took all of 29 bits for the PUT information. Keeping that the same for GET1 and GET2 means that 58 of the 64 bits get assigned real quick! Suppose we assign the Least Significant 32 bits to the GET1 information, and the Most Sig. 32 to the GET2 information. This leaves 3 bits extra for GET1 and 3 extra bits for GET2. The most obvious thing to do with the extra bits is to expand the offset/adjustment data (from 16 to 19 bits, for example), but perhaps they can be used for something else. Note that the ASL, ROL, etc. data takes space away ONLY from the PUT information. A possible use for one of the extra bits is that of being a flag controlling the Bitfield Start data: If the flag is zero, then the seven bits hold the number of the starting bit; if the flag is one, then six of the seven bits specify a register-number where the information on the starting bit is to be found. It would have been nice to have had enough bits to do this to the PUT field, but it may not be missed too much, since the PUT field's Bitfield Start is likely to be zero most of the time, anyway. So here is one more chart:

__The_GET2_info_duplicates_this_GET1_info,_except_Bit-numbers_range_from_32-63.
_______|3|3_____________________1|1_________1|1__________|_|___________|
_______|1|0|_|_|_|_|_|_|_|_|_|_|9|8|_|_|_|_|3|2|_|_|_|_|7|6|5|_|_|_|_|0|
_______|_|__18-bit_offset/adjust,_applied____|_First_____|_|_Register__|
_______|_|__to_First_Register________________|_Register__|_|_holding___|
_______|_|-----------------------------------|___________|_|-----------|
_______|_|_12-bit_off/adj_to_____|_Second____|___________|_Bitfield____|
_______|_|_1st_or_2nd_Register,__|_Register__|___________|_Start_data__|
_______|_|_depending_on_admode___|___________|___________|_for_GET1____|
________^
______Flag_determining_use_of_register_to_hold_Bitfield_Start_data
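A minimal Python sketch of how the 32 bits of GET1 information in this chart might be unpacked. It covers only the single-register layout with the 18-bit offset; the field names are hypothetical, and the offset is assumed to be two's-complement.

```python
def sign_extend(value, bits):
    """Interpret the low `bits` bits of value as a two's-complement number."""
    sign = 1 << (bits - 1)
    return (value & (sign - 1)) - (value & sign)


def decode_get(info32):
    """Unpack 32 bits of GET information (single-register layout only)."""
    flag = (info32 >> 31) & 1          # Bit 31: Bitfield Start lives in a register
    low7 = info32 & 0x7F               # Bits 6-0
    return {
        "start_in_register": bool(flag),
        "bitfield_start": None if flag else low7,
        "start_register": (low7 & 0x3F) if flag else None,  # Bits 5-0
        "first_register": (info32 >> 7) & 0x3F,             # Bits 12-7
        "offset": sign_extend((info32 >> 13) & 0x3FFFF, 18),  # Bits 30-13
    }
```

Since the GET2 information duplicates this layout in Bits 32-63, the same function could be applied to the upper half of the second 64-bit word after shifting it right by 32.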

Now let's consider a few things about the 12864 Assembler. Obviously it's going to recognize many common assembly instructions, and translate them into the far smaller set of generic instructions recognized by the processor. The set of 12864 instructions may be enlarged, simply to take advantage of the possible list of 64. Ordinary instructions like ADD, SUB, ADDC (add with carry), SUBC, ABCD (add Binary Coded Decimal), SBCD, OR, EOR (exclusive or), and AND may be supplemented with NOR, ENOR, and NAND. I don't propose to offer a complete list here; let the Industry decide all the final details. The main thing that needs some attention right now is the format of the Assembler instructions; each will occupy a fair amount of space! But this is reasonable, considering that a 12864 instruction will usually equal 2, and often 3, regular-processor instructions. All the information from 3 regular lines of Assembly code, plus some new stuff, has to fit on 1 line in this proposed Assembler format:

The Label field gives this place in a program an optional name, so that it can be referred to from other places in the program, if desired.

The Instruction Mnemonic is, of course, the name of the instruction.

Bitfield Size (BfSz) is simply a number 1-128; if this part of the Assembly format is blank, a 128 Size is assumed -- but Admode data may change it to 64.

7-bit data is required in this area whenever the Mnemonic is ASL, ROL, etc. The nature of this data has already been described. Note exceptions like SWAP and COPY, which the Assembler knows never need this data. The Assembler offers INC and DEC instructions that will require 7-bit data; the programmer never need see this get translated to COPY. Exceptions are peculiar, aren't they?!

Below are examples of the syntax for the addressing modes:

Admode__Syntax____________________Explanation

0000____16;+20.33______Register 16 has data. A Bitfield Start (BfSt) of 33 is specified, so data extracted from 16 starts at Bit 33 (BfSz specifies how many bits). Extracted data will have 20 added to it (register not affected), before being given to the current instruction. An assumed BfSz of 128 would change to 64 (one register specified) minus 32 (due to the BfSt). Conflicts cause Assembly errors.

0000____6 9.18_________Two registers have data: 6 is Most significant; 9 is Least significant. 9 is the First Register in Bits 7-12 of the Specific Information area; recall charts. Data extracted from registers starts at Bit 18.

0000____20:10__________Data in register 20. BfSt in register 10. Note that a period denotes exact BfSt data; a colon means a register has the data. BfSt-in-a-register is illegal in PUT.

0000____10 11__________Two registers have data. BfSt is assumed to be zero. Note spaces denoting registers. Other admodes will use commas, and admode fields must be tabulation-separated.

0000____7 3;-123_______128 bits of data available in registers 7 and 3. BfSz determines how many are extracted. BfSt assumed to be 0. 123 subtracted from it before instruction gets it.

0000____URHERE PC:12___The Assembler will accept either 'PC' or the actual number of the PC register (as yet unknown!). Assembler computes offset between content of PC register (what it will be at end of instruction) and place in memory that is designated by label 'URHERE'. Suppose PC register is

0000____PC;-87:12____\_____34, and the programmer knows that the offset is -87:
0000____URHERE 34:12__>__would be identical to URHERE PC:12. PC is the only
0000____34;-87:12____/_____register to which labels can be referenced, because it is the only register that has its value known at all times by the Assembler (relative to Origin of program). Note the :12 means BfSt is in register 12 (no, I don't know why the programmer wants that in this example!).

0001____20+20.33_______Check first example; note lack of semicolon here. Plus or minus sign mandatory for all offsets and adjustments. With plus adjust, data first goes from register to the instruction. At END of instruction, data in register is adjusted. With minus adjust, register is adjusted before data extracted from it for use by instruction.

0010___________________To specify admode 2, simply leave the field BLANK!

0011____#123456________Immediate data preceded by #. A + or - is optional.

0100____,20;+20.33_____Check first example; note extra comma here. Register 20 now being used as index, with an offset of 20 applied to it. The offset value (index+offset) is the address from which data is removed, starting at Bit 33. The data may extend to Bit 127, depending on the BfSz. Note that initial index is always just ONE register.

0100____,14:2__________Register 14 is index of address holding data; register 2 has data on where is the BfSt. Specifying offsets, adjustments, or bitfield starts is always optional.

0100____URHERE,PC______Note use of comma. Assembler computes offset between PC and URHERE, as before. In previous example the ADDRESS of URHERE was the information (ignoring fact that BfSt specified in that example made the address useless!); now the memory content at that address is the data.

0101____,20+20.33______Check similar examples. Here register 20 is an index which is used to fetch data. Afterwards, an adjustment of +20 is applied to the register. If the adjustment is negative, it is applied to the index before the index is used as an address-pointer that tells us where data is.

0110____,10 12-14:5____Register 10 is the index, holding an address. Register 12 has a 64-bit offset to that address. Before offset is applied, register 12 receives adjustment of -14. The address thus found (by applying adjusted offset to register 10) is the address of the data, which will be accessed using the BfSt data in register 5.

0111____>123456________Absolute Address always preceded by > symbol. Out of 18.4 quintillion possibilities, this one is pretty low!

1000____[,20;+20]L.33__See admode 0100; the part inside brackets is figured in exactly the same way, resulting in an address. Exactly 64 bits are always extracted from that address in this admode. They are the Least Significant 64 bits, as the L indicates. The 64 bits are then used as the address of the data, which starts at Bit 33.

1000____[URHERE,PC]L___The lowest 64 bits of the data at address URHERE are extracted and used as an address. The instruction gets its data from the address thus found. The L (or M, in admodes 12-15) is a mandatory part of the syntax. All programmer does is provide correct syntax; the Assembler will deduce from that syntax the admode number, and the specific info, that are built into the instruction.

1001____[,25-2222]L:13_The value in register 25 is adjusted by -2222 (maximum can be -32768 in PUT before an assembly error occurs, or -131072 in GET1 or GET2), and then the adjusted index is used to fetch an address (least significant 64 bits). In turn the fetched address is used to fetch the needed data, using the BfSt in register 13.

1010____[,5 9-873]L____See the example for admode 6 (0110); bracketed syntax is analyzed the same way, this time using register 5 as the basic index, register 9 as holding the 64-bit offset, and -873 as the adjustment applied to the offset, before the offset is applied to the index. The address thusly computed is the place from which the Least 64 bits are taken and, in turn, are used as the address to fetch the data. Note that -873 is too big for PUT information, but would work as GET1 or GET2 information.

1011____[,18]L 6+3_____Value in register 18 is used as an address to fetch an address from the memory. Least significant 64 bits are taken from memory to become an address. Register 6 has a 64-bit offset, which is applied to the extracted address. The thusly-computed new address is the place where data will be found. (I say 'found' or 'fetched', but address is also a possible place to PUT the data.) Afterwards, register 6 is adjusted by +3. This example, if in the PUT admode field, and if the instruction is LSL or one of that group, is using the largest positive allowable adjustment (3 bits, twos-complement). What's the chance of having only 32 generic instructions, so we can move a bit to the PUT information field?

I don't think I need to provide any examples for admodes 12-15; they are identical to admodes 8-11, with the sole exception that the letter L in the syntax is replaced by M. The Assembler uses L and M to determine the correct admode; the 12864 processor uses the admode to determine that either the Least Significant or the Most Significant 64 bits are to be taken from the memory and used as an address. This process has absolutely nothing to do with Bitfield Sizes and Bitfield Starts.
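The offset limits quoted in the admode examples all follow from the field widths in the charts, assuming two's-complement fields: 16, 10, 9, or 3 bits in PUT, and 18 or 12 bits in GET1 and GET2. A small Python sketch of the range check the Assembler would need (function names are hypothetical):

```python
def signed_range(bits):
    """Smallest and largest values a two's-complement field of `bits` bits holds."""
    return -(1 << (bits - 1)), (1 << (bits - 1)) - 1


def offset_fits(value, bits):
    """True if value can be stored in a signed field of the given width."""
    low, high = signed_range(bits)
    return low <= value <= high
```

So the -873 adjustment in the admode 1010 example fails the 10-bit PUT check but passes the 12-bit GET check, and +3 is the top of the 3-bit range mentioned for admode 1011.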

It should be repeated that these examples are only a proposal; thinking about them is bound to lead to speculation about how easily the programmer can make a mistake by forgetting a comma. A whole different syntax might be created just to reduce the chance of such accidents, perhaps one where mnemonic letters replace the commas, periods, colons, and semicolons -- even lower-case letters, to prevent confusion between O/offset and 0/Zero. This syntax simply attempts to make the admode-field information compact.

The next field of the Assembler format, after the PUT admode field, is the Do-If condition. Two letters suffice to abbreviate the possible conditions (at least only 2 letters if Motorola's list is used): HI (higher); LS (lower or same); CC (carry flag clear); CS (carry set); NE (not equal to zero; zero flag clear); EQ (equal; zero flag set); VC (oVerflow flag clear); VS (oVerflow set); PL (plus); MI (minus); GE (greater than or equal to zero); LT (less than zero); GT (greater than zero); and LE (less than or equal to zero). This list totals 14 possible Do-If conditions; with a maximum of 16 allowed, the last two are usually Do Always and Do Never. For the purpose of the Assembler format, the Do Always condition can be the default if the Do-If field is simply left blank, but it wouldn't hurt to allow a DA abbreviation. A DN abbreviation is logically sensible, but practically almost useless -- a NOP for sure! (If the Assembler converts NOP to SWAP, as proposed, obviously the Do-If would be Never!) Maybe some other Do-If condition can be created, just to use that 16th possibility.
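For reference, here is a minimal Python sketch of how these sixteen conditions could be tested against the N, Z, V, and C flags. The flag tests follow the usual Motorola definitions for these mnemonics; whether the 12864 would adopt them unchanged, and what 4-bit code each condition gets, are open questions, so no numeric encoding is shown. DA and DN are the abbreviations suggested above.

```python
# Each entry maps a Do-If abbreviation to a test on the flags (n, z, v, c).
DO_IF = {
    "HI": lambda n, z, v, c: not c and not z,
    "LS": lambda n, z, v, c: c or z,
    "CC": lambda n, z, v, c: not c,
    "CS": lambda n, z, v, c: c,
    "NE": lambda n, z, v, c: not z,
    "EQ": lambda n, z, v, c: z,
    "VC": lambda n, z, v, c: not v,
    "VS": lambda n, z, v, c: v,
    "PL": lambda n, z, v, c: not n,
    "MI": lambda n, z, v, c: n,
    "GE": lambda n, z, v, c: n == v,
    "LT": lambda n, z, v, c: n != v,
    "GT": lambda n, z, v, c: not z and n == v,
    "LE": lambda n, z, v, c: z or n != v,
    "DA": lambda n, z, v, c: True,   # Do Always
    "DN": lambda n, z, v, c: False,  # Do Never
}


def do_if(cond, n, z, v, c):
    """Return True if the instruction should execute; a blank field means DA."""
    n, z, v, c = bool(n), bool(z), bool(v), bool(c)
    return bool(DO_IF[cond.strip().upper() or "DA"](n, z, v, c))
```

A blank Do-If field falls through to DA, matching the default proposed for the Assembler format.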

After the Do-If field in the Assembler instruction format is the Flag Mask field. Motorola's flags are abbreviated X, N, Z, V, and C, so simply putting an appropriate letter (or letters) in this field should tell the Assembler that you don't want a particular flag to be affected by the current instruction. Simply entering ZCN without any punctuation should be adequate to specify the Carry, Zero, and Negative-sign flags, for example. Now consider the opposite notion: Some Assembler instructions, like LEA, will be translated into other operations, and the flags will automatically be masked by the Assembler during translation. In the 6809 processor there are two registers Y and U, which are not treated the same by LEA instructions. LEAY will affect the Zero flag, while LEAU will not. The idea is to let register Y be used in counting loops, and it works fine. The 12864 Assembler could allow the same sort of thing: If the programmer specifies the Z flag in the Mask field during an LEA instruction, then the Assembler WON'T mask the flag! More precisely, what is happening is the programmer telling the Assembler to reverse its normal handling of the 12864 flagmask bits. If the Assembler usually doesn't mask a flag, then it will be masked -- and vice-versa.
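The "reverse the normal handling" rule amounts to an exclusive-or between the Assembler's default mask for an instruction and the flags the programmer lists. A minimal Python sketch, in which the bit assigned to each flag is purely an assumption made for illustration:

```python
# Hypothetical bit positions within the 5-bit flag-mask field.
FLAG_BITS = {"X": 16, "N": 8, "Z": 4, "V": 2, "C": 1}
ALL_FLAGS = 31  # every flag masked, e.g. the default for LEA


def parse_flags(text):
    """Turn a Flag Mask field such as 'ZCN' into a 5-bit value."""
    mask = 0
    for letter in text.upper():
        mask |= FLAG_BITS[letter]  # KeyError on an unknown flag letter
    return mask


def effective_mask(default_mask, programmer_field):
    """Flip the Assembler's default mask for each flag the programmer names."""
    return default_mask ^ parse_flags(programmer_field)
```

With this rule, writing Z on an LEA (default: all flags masked) leaves only Z unmasked, while writing ZCN on an instruction that normally masks nothing masks exactly those three flags.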

The last field of the Assembler format is the Comment field, in which the programmer is supposed to explain the purpose of the instruction. This field is completely ignored by the Assembler, of course, during the task of creating the machine code for the 12864 processor from the assembly source listing.

And now my two-cents-worth on the hardware of the 12864 computer; if what I am about to say is really worth as much as two cents, I'll be surprised! The average computer has a System Clock that controls the timing of everything that goes on in the computer. The average microprocessor accesses the memory every (fill in blank) cycles of the System Clock, on the average. The remaining clock cycles are spent by the processor processing the data it has accessed. Some of the newer processors have 'preprocessors' built into them, so they can access the memory significantly more often. The preprocessors begin working on future instructions before the main processor finishes the current instruction; it is known as 'pipelining', I believe. The 12864 will be both similar to and different from this scheme. It'll likely have one main processor for the main instruction, and 3 subprocessors to handle the data represented by GET1, GET2, and PUT. It figures that if the average 12864 instruction is as complex as 2 or 3 regular-processor instructions, the 12864 may have to do as many memory-accesses as 2 or 3 'regulars'. Yet by processing GET1, GET2, and PUT simultaneously, the 12864 is essentially doing the work of the 'pipeliners'. Whether or not pipelining of the current sort is actually built into the 12864 remains to be seen. In the meantime, though, the 12864 is still going to spend a number of clock cycles in between memory-accesses, during which it is processing the accessed data. Since it is fairly obvious that the more often a processor can access the memory, the greater the performance of the computer, the standard trick is to increase the speed of the System Clock, and to build both processors and memory chips to keep up. Nevertheless, this does not change the fact that the processor spends many clock-cycles NOT accessing the memory! And I get the impression that the memory chips are not keeping up with the processors, in the speed race.
So here is my suggestion: Build the 12864 with a faster clock than the System Clock. It will have to hold its outside lines open for more than one internal clock cycle each time its subprocessors access memory (to stay in sync with the System Clock), but while it is doing that, its main processor can be manipulating previously-accessed data. With proper planning the 12864 should be able to access memory almost every cycle of the System Clock, at the memory's maximum possible speed.

I have been saving the thorniest problem for last (at least I think the end of this essay is approaching!), and it concerns the hardware's management of the data. The first part of the problem is this: While most 12864 instructions are 128 bits long, many will be fully described in only 64 bits. So do we make the processor skip the other 64 bits, and move on to the next memory location, or do we scheme to fit another whole instruction in those 64 bits? My inclination is to ignore the 64 bits, UNLESS it 'just happens' that two adjacent instructions in the assembly source listing can both be reduced to 64 bits. In other words, what the processor would do is load 128 bits, discover that the first 64 of them comprise a complete instruction, execute that instruction, and test the next 64 bits to see if they also comprise a complete instruction. If they don't, they will be ignored, and the processor will load 128 bits from the next address. It would be worth having this scheme just to give the programmers a chance to prove they are clever enough to always make full use of it. Any programmer who NEVER attempts to conserve memory should be fired! (And so what if there are more than 18.4 quintillion memory locations -- waste is waste.)
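On the Assembler side, the pairing rule could look something like the following Python sketch. It assumes the Assembler already knows whether each instruction needs 64 or 128 bits, and it pairs two 64-bit instructions only when they are adjacent in the source, as proposed; the function name and the list-of-lists result are hypothetical.

```python
def pack_instructions(sizes):
    """Group instruction indexes into 128-bit memory words.

    sizes is a list of 64 or 128, one entry per instruction in source order.
    Two adjacent 64-bit instructions share a word; a lone 64-bit instruction
    gets a word to itself, and its other 64 bits are ignored.
    """
    words = []
    i = 0
    while i < len(sizes):
        if sizes[i] not in (64, 128):
            raise ValueError("instruction size must be 64 or 128 bits")
        if sizes[i] == 64 and i + 1 < len(sizes) and sizes[i + 1] == 64:
            words.append([i, i + 1])  # both halves of the word are used
            i += 2
        else:
            words.append([i])
            i += 1
    return words
```

Seven instructions of sizes 64, 64, 128, 64, 128, 64, 64 would occupy five memory words this way instead of seven, with one half-word going unused.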

The other aspect of the memory management problem concerns the Stacks, which are places where random numbers of registers are temporarily stored. If each address a Stack register points at holds 128 bits, and each register being saved is only 64 bits wide, then it seems at first obvious to always put 2 registers at the Stack address. But many times an odd number of registers will be saved; what then? The very simplest answer is to always only store 1 register at each Stack address, and ignore the obvious waste, because this way the processor can never get confused. The next-simplest answer may be to REQUIRE the programmer to always PUSH or PULL an even number of registers when using the stack -- even a JSR (jump to subroutine) instruction would have to save another register with the Program Counter, just to keep the total even. I think I may recommend this particular solution (would you believe I have been worrying about this since the middle of this essay, and just now have come up with the idea?).

The bit-code format of instructions like JSR, BSR, PSH, PUL, and MOVEM can't be the same as the format for most 12864 instructions. The main reason is, as mentioned, that the instruction has to incorporate a list of registers -- but it works out OK, because much of the instruction is predefined. Before we get into any details of that, though, let us examine the Stacking system a little closer. In the 6809 there are two Stack registers, one of which is always used by the hardware to save JSR and interrupt information, and one of which the programmer can use for other things. There are occasions when having two Stacks is really convenient, notably when moving large blocks of data around. In the 68020 there are three Stack registers, one for the Boss mode, one for the Interrupt mode, and one for the Peon. Two bits in the CCS register are devoted to keeping track of which Stack the hardware is using at the moment, so if it had been wanted, a fourth Stack could exist in the 68020. This seems worth putting in the 12864. And another thing: TWO CCS registers! One would be a Boss mode CCS that keeps track of things like the current Stack being used and interrupt-control flags, as well as the list of registers to be saved during an Interrupt, as proposed at the beginning of this essay. The other would have the instruction-result flags in it and some other stuff. MOST of that other stuff is another register list, like that in the Boss CCS. Thus when a GSR instruction is used (generic for JSR and BSR: go to subroutine) a list of registers could be specified that would be saved in the Peon CCS. Here is a proposed bit-map for GSR:

_____________|6_________5|5_____5|5_____5|4_____4|4________________|
_____________|3|_|_|_|_|8|7|_|_|4|3|_|_|0|9|_|_|6|5|_|_|.....|_|_|0|
_____________|_Code_For__|_GET1__|_GET2__|_Do-If_|_Register_List___|
_____________|___GSR_____|cannot_|_______|_(PUT__|_Note_Peon_CCS___|
_____________|Instruction|__be___|_______|_is_PC_|_and_PC_registers|
_____________|___________|admode_|_______|always)|_not_on_list;____|
_____________|___________|___2___|_______|_______|_always_saved.___|
_____________________________________^
________If GET2 is admode 2 then data specified by GET1 is copied to PC -- equivalent to JSR. If GET2 is any other admode then the data it specifies is added to the data GET1 specifies, and the result is copied to PC. If GET1 specifies PC then we have a BSR equivalent. The CCS instruction-result flags are NEVER affected by this one. Normal limitations: No adding Immediate Data to Absolute Address!

It is worth noting that the Register List, from 0 to 45, is in agreement with the early estimate of approximately 45 registers total for the 12864. If there are any registers that we can be sure NEVER need to be saved during a GSR, even during the Boss mode, then we can have a few more than the 48 implied here. When executing a GSR, the processor would copy the specified register list to the Peon CCS register, save them all on the current Stack, THEN save both the PC and Peon CCS registers. When an Interrupt occurs, the last two registers saved would always be PC and the Boss CCS (although the Peon CCS would be saved just before then). One bit in the same place in the two CCS registers would serve to identify which is which; this bit cannot be allowed to be changed by anything. Then when the generic RTN (return) instruction is executed, 128 bits of PC and CCS data would be taken from the memory; the correct CCS would be identified, and the correct way of returning would follow. One thing to note about RTN from a subroutine: The instruction is almost completely pre-defined. The only odd thing is that the values of the instruction-result flags in CCS BEFORE the RTN occurs have to be preserved while CCS data is being loaded from the Stack during the actual RTN operation. Unless various flag-masks are set by the programmer! The bit-coding of RTN only needs 6 bits for the instruction, 4 bits for Do-If, and 5 bits for flag-masks (flags the programmer does not want preserved during the RTN from a subroutine); the rest of the 64 bits can be ignored. Programmers should be wary of specifying any flag-masks for RTN at the end of an Interrupt handling routine, since here the normal thing for the processor to do is to NOT preserve the flags, as they exist at the end of the Interrupt handler. Masking them would mean transferring Interrupt data to the interrupted program. This would be OK if the interrupted program was specifically waiting for such....

PSH, PUL, and MOVEM-type instructions can all be combined into one generic, I think, that we can call STAK. The bit-coding for it might be like this:

_______|6_________5|5____|5|5_____5|4_____4|4____________________|
_______|3|_|_|_|_|8|7|_|_|4|3|_|_|0|9|_|_|6|5|_|_|_|.....|_|_|_|0|
_______|___STAK____|Con-_|_|_Do-If_|__PUT__|__Register_List,_to__|
_______|instruction|trol_|_|_______|_______|____be_stacked_or____|
_______|___code____|Bits_|_|_______|_______|______unstacked______|
_______|___________|_____|_|_______|_______|_____________________|
_______|___________|_____|_|_______|_______|_____________________|
_______________________^__^____________^
_______PUT specifies the address where the stack is to start. If LOCATION OF ADDRESS is in memory elsewhere, one Control Bit denotes L or M for the 128 bits at that location, from which the stack's address will be fetched -- no bitfield specs! After STAK is finished, the PUT place is given a new value, indicating the new start of the stack. (Immediate Data still forbidden in PUT, of course.) One Control Bit specifies top or bottom of stack; another Control Bit specifies data being added to or removed from the stack. As always, only an EVEN total number of registers may be specified. Bit 54 means that the Peon CCS register is part of the stack operation. STAK never affects flags, except when loading CCS from this kind of stack. (I forgot to say, details of PUT can be in the other 64 bits of the instruction fetch.)
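
To make the field boundaries concrete, here is a small Python sketch that pulls a 64-bit STAK word apart using the bit positions from the diagram: 63-58 instruction code, 57-55 Control Bits, 54 Peon CCS, 53-50 Do-If, 49-46 PUT, 45-0 Register List. The function name `decode_stak` and the dictionary keys are invented for the illustration. Counting the Peon CCS toward the EVEN total is also an assumption, and since the text does not say which Control Bit does which job, the three are left as one raw field.

```python
def decode_stak(word):
    """Split a 64-bit STAK instruction word into fields (hypothetical helper)."""
    fields = {
        "opcode": (word >> 58) & 0x3F,           # bits 63..58
        "control": (word >> 55) & 0x7,           # bits 57..55, meanings not assigned
        "peon_ccs": (word >> 54) & 0x1,          # bit 54
        "do_if": (word >> 50) & 0xF,             # bits 53..50
        "put": (word >> 46) & 0xF,               # bits 49..46
        "register_list": word & ((1 << 46) - 1)  # bits 45..0, one bit per register
    }
    # Assumption: the Peon CCS counts toward the EVEN total, since it
    # is one more 64-bit register going onto a 128-bit-wide stack.
    count = bin(fields["register_list"]).count("1") + fields["peon_ccs"]
    if count % 2:
        raise ValueError("STAK needs an even number of registers, got %d" % count)
    fields["register_count"] = count
    return fields
```

The even-count check reflects the 128-bit memory: each stack location holds exactly two 64-bit registers, so an odd count would leave half a location unused.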

That about wraps it up, I guess. Any inconsistencies you may have noticed are due to the fact that this is only a proposal, and therefore does not need to be perfect. Only if the Industry decides to get together to create a standard microprocessor along these lines would it be necessary to get really finicky on all the details. And what do I want out of this? First of all, I want to beat the NIH Syndrome: 'If it is Not Invented Here, we are not interested!' Except for the fact that computers I own and know well happen to have 6809s in them, I am not associated in any significant way with any company in the entire computer industry. I will claim the credit for dreaming up this thing, just to prevent anyone else from doing so -- and just to prevent any person or any company from claiming ownership of it, I am quite deliberately placing this whole concept in the public domain, as of NOW. Thus the whole industry starts off on an equal basis with respect to the proposed 12864 microprocessor, and there should now be no barrier to creating an industry-wide standard. I am knowingly forfeiting all legal claim to any compensation for these ideas, just to prove I seriously want the Industry to get its act together. On the other hand, any 'royalties of conscience' that might come my way will be gladly accepted!

Vernon Nemitz

March 17, 1991

NOTE ADDED JULY 11, 2001: It wouldn't be so tough to build one of these processors today. The Industry is as non-uniform as ever. The first 80486 DX2 chips began appearing in 1992. Very Long Instruction Word processors are also being designed and built, to different specs than those described here. But the Industry is STILL putting only 8 bits of data at one address, even as it ramps up the production of 64-bit processors. What a mismatch! Meanwhile, 64-bit addressing looks to be a stable quantity for 20 or 30 more years. Maybe I'll try to promote a 25664 microprocessor...128 bits wasn't really QUITE enough, for all the variety of instructions that I had in mind, and considering all the new multimedia instructions, well, why not!