h a l f b a k e r ycarpe demi
The slogan "write once, run anywhere" (WORA) was coined by Sun Microsystems to illustrate the cross-platform benefits of the Java language. Unfortunately, when applied to Java this slogan is far from the truth: multiple Java VMs exist, not every platform has one, and you have to debug your program against every VM you want it to be compatible with. The idea here is intended as a realization of the ideal envisioned by Sun.
So, to get right to the point: first there needs to be a reference interpreter, or reference VM if you will. Then the HLL (high level language) to bytecode (the VM equivalent of machine language) compiler may be written to run in that interpreter. How is that any different from Java, you ask? How does this make it any easier to get the same program running on every desired architecture? I'll answer that now.
It is simple: the VM can only handle console I/O and a single instruction. Console I/O is supported in some manner in nearly every OS. There are a number of OISCs (One Instruction Set Computers) to choose from for the architecture definition, but the choice is immaterial so long as a standard is defined.
So, with the theory out of the way, I'll get back to the engineering part of the problem. There must be a reference interpreter defined in the standard. There must also be an interpreter that runs on your desired platforms. Creating such an interpreter is a trivial problem, and would probably be less than 100 lines of code, depending on the language and platform. An interpreter written in ANSI C could be easily adapted for several platforms. Interpreters written in other languages like Java and FORTRAN are not out of the question, and may well be desirable.
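To give a feel for how small such an interpreter can be, here is a minimal sketch in Python. It assumes the subleq instruction is the one chosen for the standard, and the I/O and halting conventions (address -1 for console I/O, a negative jump target to halt) are assumptions of this sketch, not part of the idea as posted. `HI_PROGRAM` and `run_on_text` are hypothetical names used only for illustration.

```python
import io
import sys

def run_subleq(mem, stdin=sys.stdin, stdout=sys.stdout, max_steps=1_000_000):
    """Run a subleq program held in the list `mem` (modified in place).

    Assumed conventions (not from the original post):
      - each instruction is three cells: a, b, c
      - a == -1 reads one character from stdin into mem[b]
      - b == -1 writes mem[a] to stdout as a character
      - a negative jump target halts the machine
    """
    pc = 0
    for _ in range(max_steps):
        if pc < 0 or pc + 2 >= len(mem):
            break                               # halted or ran off the end
        a, b, c = mem[pc], mem[pc + 1], mem[pc + 2]
        if a == -1:                             # console input
            ch = stdin.read(1)
            mem[b] = ord(ch) if ch else -1      # -1 signals end of input
            pc += 3
        elif b == -1:                           # console output
            stdout.write(chr(mem[a]))
            pc += 3
        else:                                   # the one real instruction
            mem[b] -= mem[a]
            pc = c if mem[b] <= 0 else pc + 3
    return mem

# Two output instructions, one halting instruction, then the data cells.
HI_PROGRAM = [9, -1, 3,  10, -1, 6,  11, 11, -1,  72, 105, 0]

def run_on_text(mem, text=""):
    """Convenience wrapper: feed `text` as stdin and return what was printed."""
    out = io.StringIO()
    run_subleq(list(mem), stdin=io.StringIO(text), stdout=out)
    return out.getvalue()
```

Calling `run_on_text(HI_PROGRAM)` returns the string "Hi". Porting this to another host language should mostly be a matter of rewriting the loop body and the two I/O branches.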
The hard part would be the compiler. There must be a compiler that supports at least one common input HLL, such as ANSI C, for the VM to be of any use to software developers. Writing a compiler that compiles only one language, again C for example, to an OISC architecture would be a big undertaking. Writing the same IN that architecture would probably be the single largest software development undertaking ever. Ideally, more than one common language should be supported. Heh.
http://en.wikipedia...uction_set_computer
[Spacecoyote, Mar 02 2009]
OISC with PL/0 (simplified Pascal) support
http://compilers.ie...h/article/88-11-019 from 1987 [Spacecoyote, Mar 02 2009]
Low Level Virtual Machine
http://llvm.org/ Similar idea, albeit higher level. [Spacecoyote, Nov 16 2010]
BytePusher
http://esolangs.org/wiki/BytePusher New VM designed for simple games with single instruction CPU. Modeled after CHIP-8. [Spacecoyote, Dec 04 2010]
UCSD P-System
http://www.threedee.com/jcm/psystem/ prebaked in the 70's [Spacecoyote, Apr 16 2012, last modified Apr 17 2012]
Forth Programming Language
http://en.wikipedia...ogramming_language) [Spacecoyote, May 10 2014]
|
|
Ideally, yes, but I'll allow support to differ between implementations as necessary. |
|
|
Hmm...If the input language was Java and the target environment was Java, things could get quite interesting. |
|
|
Of course, this would all be too slow for any real computing task. But for some simple things it would be nice to be able to have the same program anywhere. And I mean anywhere. It would be relatively easy to knock out an SRE interpreter in JavaScript. |
|
|
If it only uses console I/O, couldn't you just write the program in C? |
|
|
That only works when you have the source code. This makes a single binary that works on all platforms without recompilation. |
|
|
Also, the bytecode produced by the compiler is inherently extremely difficult to decompile, even without encryption. This is good where security or reverse engineering is a concern but cross platform compatibility is required. |
|
|
// the VM can only handle console I/O and a single instruction
What does the single instruction do? |
|
|
I don't understand what the point of this is.
You want an implementation of Java, only yours won't have bugs? That's a halfbakery invention how? |
|
|
Wikipedia gives a few examples of possible instructions that, alone, make for a turing-complete instruction set. One example is as follows: |
|
|
subleq a, b, c ; Mem[b] = Mem[b] - Mem[a]
; if (Mem[b] <= 0) goto c
subleq means "SUBtract and branch if Less than or EQual to zero". |
|
|
Such single instructions are to software what a NAND gate is to hardware: any other instruction may be synthesized from it. For example: |
|
|
ADD a, b == subleq a, Z
subleq Z, b
subleq Z, Z
|
|
|
where a and b are the registers to be added and Z is a register initially containing zero. |
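The ADD sequence above can be checked by hand or with a few lines of Python. This is only a sketch: the cell names are dictionary keys rather than real addresses, and the branch is left out because each two-operand subleq falls through to the next instruction either way.

```python
def subleq(mem, a, b):
    """One subleq with the branch target omitted: Mem[b] -= Mem[a], then fall through."""
    mem[b] -= mem[a]

def add(mem, a, b, z):
    """ADD a, b built from three subleq steps; z names a cell that holds zero."""
    subleq(mem, a, z)   # Z = 0 - a = -a
    subleq(mem, z, b)   # b = b - (-a) = b + a
    subleq(mem, z, z)   # Z = Z - Z = 0, ready for reuse

mem = {"a": 7, "b": 5, "Z": 0}
add(mem, "a", "b", "Z")
# mem is now {"a": 7, "b": 12, "Z": 0}
```

Note that a is left untouched and Z is restored to zero, so the same scratch cell can serve every synthesized instruction.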
|
|
The point is that even though the compiler would be very hard to create, the interpreter is trivial to create. So if you have some programs made for SRE and you want to run them on a platform that doesn't have a SRE interpreter, you simply write your own. For a good programmer, it would be less than an hour's work. On the other hand, porting the JRE (either Sun's or GNU's) would be a huge amount of work. |
|
|
Also there's the point I made earlier about security. From the standpoint of anyone requiring high security against reverse engineering this makes so much sense that I wouldn't be surprised if this were baked in some secret government way. |
|
|
This is not Java. Not by a long shot. |
|
|
This may end up becoming baked for entirely different reasons than I thought of. It is possible to make chips containing simple logic at VERY TINY half-pitch sizes and running VERY FAST (I'm talking like 15nm at hundreds of gigahertz). Such technology is impractical for use with current architecture paradigms. But a massively-parallel single instruction set, many pipeline chip would be ideal for this. |
|
|
What programs do you imagine being run on this VM? I can see the benefits of a tiny, easily written interpreter that could run some generic utilities on a newly designed computer. But what else? Console I/O only is going to be a bit limiting. |
|
|
Also, I doubt that programs will be terribly resistant to reverse engineering. If it is popular, then some hackers will learn to read it. I suspect it would be easier to read than obfuscated C code. With machine code, you know exactly what an instruction does just by looking at it, but with modern languages you can modify the behavior of an instruction with a declaration in a different file. |
|
|
Well, different devices can be piped to/from stdio, depending on what your OS supports, so theoretically the possibilities are limitless. This is, however, unlike Java, not meant for everyday computing, only special purposes. What purposes I don't know, but they're certainly out there. |
|
|
I hold my position on reverse engineering. The difficulty of deciphering this machine code would be somewhere between that of deciphering brainfuck and deciphering Malbolge. There are of course further ways to increase obfuscation if desired. |
|
|
I've always found the local park simple enough. |
|
|
I've started to use LLVM for a project and have realized it's basically a higher level version of this. In fact, SRE could probably be implemented as an LLVM backend easily enough. |
|
|
//It is possible to make chips containing simple logic at VERY TINY half-pitch sizes and running VERY FAST (I'm talking like 15nm at hundreds of gigahertz). Such technology is impractical for use with current architecture paradigms. But a massively-parallel single instruction set, many pipeline chip would be ideal for this.// |
|
|
Memory bandwidth will kill it... |
|
|
You are no doubt referring to the notorious Z80A "HACF" instruction, [IT] ? HCF exists on a number of different processors, but only on some early second-source Z80A's did it actually trash the processor. |
|
|
isn't this something like Unix's cshell ? ie: a command line interpreter with extra goodies. |
|
|
//high level language// is an archaic term for 3GLs; it just meant "not machine code or assembler" (which, predictably, are "low level languages"). |
|
|
//isn't this something like Unix's cshell// |
|
|
Nope, it's a no-frills/minimal highly portable VM, not a shell. |
|
|
I can't tell if you're being a troll or not so I'll humor you. No, this is not an operating system, it must run on top of an operating system, like Java. |
|
|
Java without hardware-proprietary extensions... ? |
|
|
So you have a "front end" compiler, which takes your source code, either from a text file or from an integrated word-processor and instead of turning it into object-code (machine language) turns it into a bytecode file. This file is like machine-language for a mythical one-opcode machine (based on the assumption that if you want to turn left you simply have to turn right three times). Anyways, this is the "transportable" file that you can bring to any machine (that has an interpreter on it). |
|
|
Now on the target machine is an interpreter. This takes the .sre and runs it on-the-fly. |
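For the same .sre file to load identically on every host, the file layout has to be pinned down as well. The post does not define one, so the sketch below simply assumes a flat sequence of little-endian signed 32-bit cells; `dump_sre` and `load_sre` are hypothetical helper names.

```python
import struct

def dump_sre(cells):
    """Pack a list of integer cells into bytes (assumed layout: little-endian int32)."""
    return struct.pack(f"<{len(cells)}i", *cells)

def load_sre(blob):
    """Inverse of dump_sre: unpack bytes back into a list of integer cells."""
    if len(blob) % 4:
        raise ValueError("truncated image: length is not a multiple of 4 bytes")
    return list(struct.unpack(f"<{len(blob) // 4}i", blob))
```

Fixing the byte order in the format, rather than using the host's native order, is what lets a big-endian and a little-endian machine read the same cells from the same file.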
|
|
I think I see the point: the "final interpreter", in its crudest form, need only be able to do an almost 1:1 mapping of the .sre file records onto the host machine's corresponding single-opcode (or single static opcode structure). |
|
|
However just to be sure that's what you really mean... |
|
|
It's not a Virtual Machine in any sense other than *any* source listing or bytecode is. |
|
|
"Console I/O" I take it means the device, not the Operator's command set. |
|
|
What is the "reference interpreter" mentioned in the second paragraph ? It's on the source machine, right ?... do you mean a compiler that produces bytecode ? |
|
|
By that definition, [ft], LLVM isn't (strictly) a VM either. Make of that what you will, semantics don't really matter. |
|
|
I've some experience with .NET; although it performs well, it's rather bloated and supported on relatively few platforms, as it really isn't inherently much more portable than native code. |
|
|
//semantics don't really matter// Well they do if you haven't a clue what all the words and buzzwords mean at a native level (or for some reason decided to skip the post a year'n'half ago). |
|
|
I thought the modern acronym "VM" involved running an entire OS inside a partition (read "sandbox")... and that Java "VM" was just a marketing ploy... but now "Virtual Machine" means... something that is interpreted and that has the words "virtual machine" in its title, yes ? |
|
|
Then you went on about "Console IO" and comparing your system to Java; I figgered you might be talking about some sort of CL shell... and an advanced shell would be comparable to Java, albeit at a text-based level... so I guess they're "virtual machines" as well. |
|
|
The second paragraph talks about a "reference interpreter" which you're running a compiler "in"... say what ? Does this mean you have, say, a run-of-the-mill ANSI C compiler and you're taking its machine-code output and compiling *that* into your byte-code ? Granted, that would mean you don't need to write a <language> to bytecode compiler, but it *would* mean that the eventual bytecode file will be heavily influenced by both the architecture of the source machine's processor and the strengths and weaknesses of the compiler. |
|
|
And of course (not that this is your doing), "One Instruction Set Computer" is pretty misleading, considering it's actually a "one instruction" computer. At first glance I thought it meant a processor with only one microcode set. |
|
|
The idea's nifty, if it's what I think it is. If it isn't, then *my* idea's nifty. |
|
|
//semantics don't really matter// Semantics, n. pl. "The branch of linguistics and logic concerned with meaning". You're saying the meaning doesn't matter? |
|
|
Is this why English students learn maths, whereas American
students only learn the one? |
|
|
err. you mean you don't teach the Grand Unifi... ummm, nevermind. |
|
|
There are two main types of VM: a system VM and a process VM. A system VM (e.g. VMware) is designed for a guest OS to execute within. A process VM (e.g. Java or .NET) is designed for individual programs to run on top of, as an abstraction from the OS and hardware of the physical machine. In the case of SRE, the VM abstracts the processor down to a single instruction and provides stdio (console I/O) support as the sole interface to the outside world. The console I/O thing is sort of a non sequitur, I know; it's just that at the time of writing I realized programs would need some method of I/O and that's what came to mind. Really, I/O is a different matter (there's no real reason to restrict what sorts of I/O are supported) and would probably be best implemented as a self-contained library/module, as the LLVM project has done with its implementation of the C Standard Library. |
|
|
//Does this mean you have say a run-of-the-mill ANSI C compiler and you're taking its machine-code output and compiling *that* into your byte-code ?//
If one wrote a SRE frontend for whatever machine code architecture is used you could do that, but that's not what I had in mind. |
|
|
Really, if you want to know what this is and how it is useful, look at LLVM. This idea is basically a "lite" version of LLVM. Had I known of LLVM at the time, I probably wouldn't have gone to the trouble of writing this idea. |
|
|
I guess it's not entirely redundant, as BytePusher has shown that extreme simplicity can be interesting. The main loop is less than 10 lines of C code, and the rest is a little platform-specific support for the I/O and such. Even though it is a new platform with little interest (probably because it's a toy made as a programming/engineering exercise and not a serious project), there already exist 3 separate implementations, likely because of this simplicity. |
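For a sense of what that main loop does, here is a rough Python sketch of one BytePusher frame. It is written from my reading of the spec on the esolangs page (one "copy a byte, then jump" instruction with three big-endian 24-bit addresses, program counter stored at bytes 2 to 4, 65536 instructions per frame), so treat those details as assumptions. Keyboard, video and audio handling are left out, and `bytepusher_frame` is a hypothetical name.

```python
def bytepusher_frame(mem, steps=65536):
    """Run one frame of copy-and-jump instructions on `mem` (a bytearray)."""
    def addr24(i):
        # Read a big-endian 24-bit address stored at mem[i..i+2].
        return (mem[i] << 16) | (mem[i + 1] << 8) | mem[i + 2]

    pc = addr24(2)                              # program counter lives at bytes 2..4
    for _ in range(steps):
        mem[addr24(pc + 3)] = mem[addr24(pc)]   # copy the byte at A to B
        pc = addr24(pc + 6)                     # then jump to C
    return pc
```

There is no halt and no arithmetic in the instruction itself; as I understand it, programs do their arithmetic through lookup tables held in memory.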
|
|
OTOH, LLVM's main strength is that the program may be optimized for the target processor at compile time or run time. With an OISC based system, optimization is out of the question. |
|
|
ahh ic. so in this case the bytecode-generator (compiler) part isn't really VM at all, it's just a program. |
|
|
On the other hand, the handful of bytes sitting on the object/target computers (interpreters) would be the VM bit ? Looking at the WP article on "virtual machine" it seems to have none of the attributes of a "process VM", and does have one of the defining attributes of a "system VM": it's written in a foreign machine code. |
|
|
I don't see how it's anything other than a process VM (like LLVM's interpreter mode, QEMU's user mode emulation, Java, or .NET). With all VMs, the object code is "foreign machine code"; that's the IDEA of a VM (even if an actual native "processor" doesn't exist), and this includes .NET. The difference is that in a system VM, other hardware besides the "processor" is emulated; for instance, VMware emulates a certain BIOS, video card, storage controller, etc. |
|
|
I've been playing with Forth a bit recently...it's a similar idea. |
|
| |