h a l f b a k e r yactual product may differ from illustration
Sometimes stability and cost are far more important than performance.
Sometimes all you can find is a Black Friday i3 when you really need a
hardened 486. With ECC memory and the motherboards that support it costing
a bit more than standard parts, surely somebody out there is looking for a
cheap fault-tolerant system.
Proposed is a compiler that rewrites all data and program structures to include
fault tolerance at the software level. Even the execution stack would be
structured to include fault tolerance. The operating system itself would have to
be compiled with this compiler for the system to be really useful. The
compiler would work on legacy code and would not expose any of the
underlying complications to the programmer unless desired. This would probably
be ridiculously slow, but I'll bet that for some applications, such as particular
virtual machines or embedded systems, the benefits would outweigh the
performance hit. This is particularly true when a system is housing dozens of
virtual machines but only a couple need fault tolerance.
(Hardware fault tolerance is a different story altogether. Memory is usually the
least of your troubles; software, power supplies, and storage devices cause a lot
more trouble. This would be for niche applications since, for most applications,
it's not hard to find an old server on eBay with yesterday's ECC RAM already
installed. Also, most virtualized environments are already using ECC RAM.)
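To make the proposal a little more concrete, here is a minimal sketch of what such a compiler might emit for a single protected variable: three copies, a majority vote on every read, and a repair of whichever copy lost the vote. The class name `TripleStored` and its methods are hypothetical, not anything from the post, and Python is used only for readability. A real implementation would have to place the copies in physically separate memory, which this sketch does not attempt.

```python
from collections import Counter


class TripleStored:
    """Hypothetical wrapper: keeps three copies of a value and votes on read."""

    def __init__(self, value):
        self._copies = [value, value, value]

    def write(self, value):
        # Every write updates all three copies.
        self._copies = [value, value, value]

    def read(self):
        # Majority vote across the three copies (values must be hashable).
        winner, votes = Counter(self._copies).most_common(1)[0]
        if votes < 2:
            raise RuntimeError("all three copies disagree; cannot recover")
        # Repair ("scrub") any copy that lost the vote.
        self._copies = [winner, winner, winner]
        return winner


counter = TripleStored(41)
counter._copies[1] ^= 0b100   # simulate a single bit flip in one copy
recovered = counter.read()    # the vote outnumbers the damaged copy
```

Every read costs three loads and a comparison, and every write costs three stores, which gives a feel for where the "ridiculously slow" part comes from.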
VIPER microprocessor
http://en.m.wikiped...IPER_microprocessor Not a lot of detail, admittedly. [8th of 7, Nov 19 2013]
What is ECC?
http://en.wikipedia.org/wiki/ECC_memory Defines ECC [popbottle, Nov 19 2013]
// You can't get 'mission critical' on a budget. //

Really? Oh dear, maybe you should let the designers of the 787's RDC network know about that.

Or then again, maybe not. It would only worry them, and it's too late now.

What little I know of computers is hopelessly out of date. So grain of salt.

Is there any small part of "Proposed is a compiler that rewrites all data and program structures to include fault tolerance at the software level" that could be done in the small? The door mat or garbage disposal instead of the whole house. Do faults show up in any part on a regular basis?

An entire compiler seems like something too big just to prove the concept works.

The post reminds me of an old flame-war in alt.folklore.computers concerning what to do about C language "buffer overflow" exploits (which were actually "out of bounds" problems), the consensus being "hire competent programmers".

// An entire compiler seems like something too big just to prove the concept works //

It depends on the complexity of the language.

For an 8-bit CISC microprocessor with limited registers, running native code, and a very simple HLL compiler that doesn't support libraries or linking - like a Tiny BASIC - then it's possible to design in a reasonable degree of self-monitoring.

Go beyond that and the problems of validation increase exponentially.

The moment a real-time OS is introduced, it's impossible to achieve in any sensible timescale.

The best approach would most likely be to network huge numbers of PIC16s, each one executing its own little crumb of code, and individually black-box tested.

// The software alone solution sounds like a non-starter as cached code (like the checking code) could get corrupted. //

So what if you have three copies of the checking code in memory running in 3 different threads?

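As a rough sketch of the three-thread idea, the snippet below runs three copies of a check in separate threads and takes the majority answer. Everything here is an assumption made for illustration: `checksum` stands in for the "checking code", `faulty_checksum` stands in for a copy whose instructions got corrupted, and `vote` is a hypothetical helper. Threads in one process share memory, so this shows the voting step only and gives no isolation between the copies.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def checksum(data):
    """The 'checking code': a simple additive checksum (illustrative only)."""
    return sum(data) & 0xFFFF


def faulty_checksum(data):
    """Stands in for a copy whose instructions were corrupted."""
    return checksum(data) ^ 0x0100


def vote(replicas, data):
    # Run each replica in its own thread; map() returns results in input order.
    with ThreadPoolExecutor(max_workers=len(replicas)) as pool:
        results = list(pool.map(lambda fn: fn(data), replicas))
    # Accept a result only if a strict majority of replicas agree on it.
    winner, votes = Counter(results).most_common(1)[0]
    if votes <= len(replicas) // 2:
        raise RuntimeError("no majority among replicas")
    return winner, results


agreed, raw = vote([checksum, faulty_checksum, checksum], b"halfbakery")
```

Here `agreed` is the value the two healthy copies produced, and `raw` keeps the individual answers so the odd one out can be identified and reloaded.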
One problem is that if the instructions get corrupted for one of the three threads, it could theoretically execute random actions and corrupt the other threads or override the protection system. Of course the probability of that is probably less than that of a 2-bit memory corruption, which will take down an ECC-protected system anyway.

I think having general-purpose code running in 3 threads will be hard to synchronize, especially when hardware access is involved, so unless you can carefully control the code execution so it only runs code from the memory cache that has already been verified, I'd say it's best to run an emulator on the 3-threaded code. You can't have the OS running non-ECC underneath this, so maybe take the VMware hypervisor and rewrite that to run in a three-threaded way. It could then run multiple VMs. Some could be direct hardware VMs with no ECC. Others could be emulated VMs with ECC. This approach would mean that you don't need to recompile most code (or rewrite the compiler), just carefully code your hypervisor.

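The remark about 2-bit corruption can be illustrated with a toy single-error-correcting, double-error-detecting (SECDED) code. This is a sketch only: it protects a 4-bit value with an extended Hamming(8,4) code, whereas ECC memory typically protects 64-bit words with 8 check bits, and the function names `encode` and `decode` are made up for this example. One flipped bit is repaired; two flipped bits are noticed but cannot be repaired.

```python
def encode(nibble):
    """Encode 4 data bits into 8 bits: index 0 is overall parity, 1..7 is Hamming(7,4)."""
    d = [(nibble >> i) & 1 for i in range(4)]
    code = [0] * 8
    code[3], code[5], code[6], code[7] = d
    code[1] = code[3] ^ code[5] ^ code[7]   # covers positions 1, 3, 5, 7
    code[2] = code[3] ^ code[6] ^ code[7]   # covers positions 2, 3, 6, 7
    code[4] = code[5] ^ code[6] ^ code[7]   # covers positions 4, 5, 6, 7
    code[0] = sum(code[1:]) & 1             # makes total parity even
    return code


def decode(code):
    """Return (value, status); value is None if the word is uncorrectable."""
    code = list(code)
    syndrome = 0
    for pos in range(1, 8):
        if code[pos]:
            syndrome ^= pos                 # XOR of the positions holding a 1
    odd_parity = sum(code) & 1              # 1 means an odd number of flips
    if syndrome == 0 and not odd_parity:
        status = "ok"
    elif odd_parity:
        code[syndrome] ^= 1                 # syndrome 0 is the parity bit itself
        status = "corrected"
    else:
        return None, "uncorrectable"        # even parity, nonzero syndrome: 2 flips
    value = code[3] | (code[5] << 1) | (code[6] << 2) | (code[7] << 3)
    return value, status


word = encode(0b1011)

one_flip = list(word)
one_flip[5] ^= 1                            # single-bit error
single = decode(one_flip)                   # repaired back to 0b1011

two_flips = list(word)
two_flips[5] ^= 1
two_flips[6] ^= 1                           # double-bit error
double = decode(two_flips)                  # detected, but the data is lost
```

In the double-flip case the decoder knows the word is bad but not which bits to fix, so the usual response is to stop the machine instead of continuing with suspect data.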
// It could then run multiple VMs. Some could be direct hardware VMs with no ECC. Others could be emulated VMs with ECC. //

That's going to be a big task-switching overhead, though. Better to have 3 separate isolated systems, and voting, as [bigs] suggested.

Anything that adds any complexity to the software is a Bad Thing.

Might it not be cheaper/simpler to put a bunch of lead shielding around the memory? My understanding is that a large fraction of memory errors are caused by cosmic rays.