halfbakery: Compound disinterest.
Supercore
Unparallelize a core and speed up processing | |
Modern processors have a problem. It's much more efficient to put several cores on a chip than to devote that silicon to branch prediction, cache, and other accessories for a single core. But like it or not, a certain amount of programming can only be done one thing at a time, no matter how cleverly you program for multiple cores.
At first I considered proposing two kinds of core, with one larger and faster for tasks that cannot be made parallel. But that would be too costly. My proposal is to take one of the cores on a chip (say, one of four) and devote it entirely to caring for another one. It would do branch prediction, cache handling, memory swapping, and memory lookup. One core would therefore be able to work more efficiently, at the cost of only three cores being available on the chip. I think this would result in an overall faster chip than a design that simply uses four of the same core.
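One rough way to weigh this trade-off is Amdahl's law. The sketch below is illustrative only: `serial_fraction` (the share of work that cannot be made parallel) and `boost` (how much faster the helped core runs serial code) are made-up parameters, not measurements, and the model assumes the parallel part scales perfectly across the usable cores.

```python
def symmetric_time(serial_fraction: float, cores: int = 4) -> float:
    """Amdahl's law: run time relative to a single baseline core."""
    return serial_fraction + (1.0 - serial_fraction) / cores


def supercore_time(serial_fraction: float, boost: float, cores: int = 4) -> float:
    """One core is given up as a helper (assumption: it does no user work),
    and in exchange serial code runs `boost` times faster on the helped core."""
    usable = cores - 1
    return serial_fraction / boost + (1.0 - serial_fraction) / usable


if __name__ == "__main__":
    # Hypothetical workloads: 5%, 20% and 50% serial code, with a 1.5x boost.
    for s in (0.05, 0.2, 0.5):
        print(s, round(symmetric_time(s), 3), round(supercore_time(s, boost=1.5), 3))
```

Under these assumptions a 1.5x boost breaks even at about 20% serial code. With more serial code than that the supercore layout comes out ahead, and with less the plain four-core chip does.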
Branch prediction at wikipedia
https://en.wikipedi...ki/Branch_predictor [beanangel, Jan 11 2017]
Geranium
https://en.wikipedia.org/wiki/Geranium Mentioned in my anno [notexactly, Jan 12 2017]
|
|
So you want to do branch prediction, caching, and all that
stuff that's usually done by hardware or microcode in
software instead? I imagine it'll go like software vs.
hardware video effects/CGI rendering: it'll be a lot more
accurate, but it'll also be a lot slower and consume a lot
more energy. I suggest making it a feature that can turn on
and off as necessary. |
|
|
It's already the case that one logical processing unit will have multiple dedicated subunits for dealing with the stuff which is relatively slow but needed frequently. |
|
|
Asymmetrical cores also already exist, for example ARM's big.LITTLE architecture. |
|
|
I used to wonder if two different voltage square waves could travel through a CPU, with one channel being read at some logic gates and the other, higher-voltage channel being read at different logic gates. I think I remember the primitive solution of a voltage drop across one diode causing the low-voltage bits to drop to effectively zero, leaving the higher channel. |
|
|
I remember thinking this could be proved more efficient because if you send a compressed version of data to, say, Venus, then uncompress it, it would be faster than sending the raw data. So for planet-spanning microprocessors there would be an advantage to a clock-synchronized yet compressed data channel travelling through the same wires, and it is only a question of how big a microprocessor has to be to gain efficiency this way. |
|
|
This kind of relates to your idea, as a big.LITTLE multicore could decompress the data and contribute to calculations. |
|
|
Another potential speed-up would be to have a processor which can be programmed to do several operations at once to optimize tight loops. Several ALUs would be required. They would be programmatically "connected" to specified registers, and an operation would be selected for each to perform on every clock cycle. For example:
one could add a value in RAM to the contents of a register,
another could increment an address stored in EDI,
another would count the loops (up or down),
another would compare the loop count with the exit value,
and another would compare two values for an alternative exit condition.
What's obvious is that if all of these interdependent operations are executing at once, operands would often be modified too early or too late. To remedy this, one would use two sets of registers: one set would hold the values from the beginning of the cycle, while the other would receive the freshly modified values. At the end of each cycle, the sets would swap their designations (original versus modified values).
Another likely necessity is the ability to undo the result of the last cycle, since the CPU might not know beforehand whether that cycle will be one too many.
The CPU would be configured for handling a loop this way before entering the loop. It would revert to normal operation immediately after exiting the loop. |
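The two-register-set scheme above can be sketched in software. This is only a toy model under assumptions of my own: each "ALU" is a hypothetical (destination, function) pair, the exit test reads the start-of-cycle count so it fires one cycle late, and `MEMORY` has a stray word past the array to make the extra cycle visible. It uses four ALUs rather than the five listed.

```python
from typing import Callable, Dict, List, Tuple

Registers = Dict[str, int]
# Hypothetical ALU: (destination register, function of the start-of-cycle registers).
Alu = Tuple[str, Callable[[Registers], int]]


def run_cycle(current: Registers, alus: List[Alu]) -> Registers:
    """All ALUs read the start-of-cycle set and write into a fresh set."""
    modified = dict(current)          # untouched registers keep their value
    for dest, op in alus:
        modified[dest] = op(current)  # reads only `current`, never `modified`
    return modified


def run_loop(regs: Registers, alus: List[Alu], max_cycles: int = 100) -> Registers:
    """Cycle until `done` is set, then undo the last (one-too-many) cycle."""
    for _ in range(max_cycles):
        previous, regs = regs, run_cycle(regs, alus)
        if regs["done"]:
            return previous           # undo: fall back to the older register set
    raise RuntimeError("loop did not terminate")


MEMORY = [3, 1, 4, 1, 5, 99]  # five array elements plus one stray word

SUM_ALUS: List[Alu] = [
    ("acc", lambda r: r["acc"] + MEMORY[r["idx"]]),  # add a value in RAM
    ("idx", lambda r: r["idx"] + 1),                 # increment the address
    ("count", lambda r: r["count"] - 1),             # count the loops down
    ("done", lambda r: int(r["count"] == 0)),        # compare with exit value
]

final = run_loop({"acc": 0, "idx": 0, "count": 5, "done": 0}, SUM_ALUS)
print(final["acc"])  # 14
```

Because every ALU reads the old set, even a register swap (a takes b's value while b takes a's) works in a single cycle. The sixth cycle adds the stray 99, which is exactly the kind of result the undo step is there to throw away.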
|
|
[alvin] Although what you described is different (niftier), [Voice] mentioned branch prediction, which Wikipedia describes as: "The branch that is guessed to be the most likely is then fetched and speculatively executed. If it is later detected that the guess was wrong then the speculatively executed or partially executed instructions are discarded and the pipeline starts over with the correct branch, incurring a delay." (Wikipedia, branch prediction). AMD Ryzen uses branch prediction. |
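The quote does not say how the guess is made. One common textbook scheme is a two-bit saturating counter, sketched below. This is an illustration of that general technique only, not a claim about how Ryzen or any other particular chip predicts branches, and the class and function names are made up for the example.

```python
from typing import Iterable


class TwoBitPredictor:
    """Counter values 0-1 predict 'not taken'; 2-3 predict 'taken'."""

    def __init__(self) -> None:
        self.counter = 2  # assumption: start at 'weakly taken'

    def predict(self) -> bool:
        return self.counter >= 2

    def update(self, taken: bool) -> None:
        # Saturate at 0 and 3 so one surprise does not flip a strong prediction.
        if taken:
            self.counter = min(3, self.counter + 1)
        else:
            self.counter = max(0, self.counter - 1)


def count_mispredictions(outcomes: Iterable[bool]) -> int:
    predictor = TwoBitPredictor()
    misses = 0
    for taken in outcomes:
        if predictor.predict() != taken:
            misses += 1  # real hardware would discard speculative work here
        predictor.update(taken)
    return misses


# A loop branch taken nine times and then falling through, run three times over.
loop_branch = ([True] * 9 + [False]) * 3
print(count_mispredictions(loop_branch))  # 3
```

In this toy run only the three loop exits are guessed wrong, so 27 of the 30 branches proceed without the restart delay the quote describes.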
|
|
// AMD Ryzen uses branch prediction // |
|
|
I would think just about every modern architecture uses
branch prediction. |
|
|
I use branch prediction while performing everyday tasks. |
|
|
I, too, am at least as intelligent as a two-square-centimeter piece of silicon and geranium. |
|
|
Are you sure it's not a pelargonium? Apparently they're often
confused with geraniums: [link] |
|
| |