halfbakery
Remember back in the day when you could actually wrap your head around how a CPU worked? No? Well, I might be a bit older than you. I grew up with 8-bit CPUs, back when they were actually simple enough that a child could understand how they worked. Those were the times. :-)
But over the years, CPUs got more and more complex. Now they're like intricate puzzles that no human can solve, only vast compiler frameworks like LLVM. And we've hit a wall with how fast they can go. So, instead of making them faster, we're making them more complex, trying to cram as many operations as we can into a single clock cycle. Ironically, while computers got thousands of times faster, typical compilers today are actually slower than some compilers from the 80s and 90s.
Now, for the totally half-baked idea:
What if we stripped the CPU core down to a much simpler instruction set? And instead of making it more complex, let's just add more of them - like a lot more, we're talking hundreds or even thousands.
So what about all those fancy vector instructions that modern CPUs use? Well, those instructions are part of what's making CPUs so complex. Most of these complex instructions are vector instructions - they're doing a bunch of simple operations in a single clock cycle. My idea is to spread those operations out across multiple cores. So, if a core has a complex instruction to execute and its neighbors are just sitting idle, we could spread the work out. That way, we could still get it done in one clock cycle.
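To make the "spread it across the neighbors" part a bit more concrete, here is a minimal software sketch, not a hardware design. It assumes a hypothetical simple core that can only do a scalar add, and uses a Python thread pool as a stand-in for a group of idle neighboring cores. The names scalar_add and emulated_vector_add are made up for illustration, and nothing here says anything about whether real silicon could do this in one clock cycle.

```python
from concurrent.futures import ThreadPoolExecutor


def scalar_add(a, b):
    """The only operation our hypothetical 'simple core' knows: one scalar add."""
    return a + b


def emulated_vector_add(xs, ys, pool):
    """Emulate an N-lane vector add by handing one lane to each worker.

    `pool` stands in for a group of idle neighboring cores (an assumption
    of this sketch; real cores would not be Python threads).
    """
    if len(xs) != len(ys):
        raise ValueError("vector operands must have the same number of lanes")
    # Executor.map preserves input order, so result[i] == xs[i] + ys[i].
    return list(pool.map(scalar_add, xs, ys))


if __name__ == "__main__":
    with ThreadPoolExecutor(max_workers=8) as cores:
        print(emulated_vector_add([1, 2, 3, 4, 5, 6, 7, 8], [10] * 8, cores))
```

Each lane is independent of the others, which is exactly why this kind of instruction is the easy case for splitting up.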
Sure, this means we'd have to emulate those complex instructions at the hardware level, but I reckon it's worth it. We'd keep the door open for all the existing software that relies on those instructions. But over time, I bet we'd see compilers and software start to favor the simpler instructions and all that parallel power we've got on tap.
There would be a performance hit at first because of the hardware emulation. But think about it: we'd have so many more cores that we'd still come out ahead. And once everything starts to adapt to this new setup, we could really start to see some speed.
So, that's the idea: a simpler, massively parallel CPU that can still handle the complex stuff when it needs to. Especially for stuff like AI that just eats up parallel processing power. It's a bit of a throwback to the simplicity of the early days, but with a modern twist.
Intel, if you're reading this, no rights reserved, feel free to build it. :-D
Multithreaded A*
https://replit.com/...ical/SlidingPuzzle3 multithreaded A* algorithm in Python, that generates code [chronological, Jul 26 2023]
Superscalar processor
https://en.wikipedi...perscalar_processor "A superscalar processor can execute more than one instruction during a clock cycle by simultaneously dispatching multiple instructions to different execution units on the processor." [Loris, Jul 26 2023]
Would this be analogous to the Reduced Instruction Set Computing idea of the 1990s?
Sort of, although RISC architectures (such as ARM) are usually found in lower-end devices - and generally need multiple clock cycles to perform complex operations. I think the new idea here is to create an architecture with cooperative cores, and the hope is this would be able to compete with CISC architectures. Of course, I have no idea if cooperative cores are even possible, so it's really half-baked :-)
Cooperative cores are possible. Emulating most complex instructions across them in one cycle is not, at least not in practical terms.
I've learned never to say "impossible" 😉
If you have 100+ cores, maybe some new tricks are possible - for example, maybe the cores are grouped into clusters of 4, and each group switches between parallel mode and emulation mode, with some sort of controller looking ahead at an instruction queue and scheduling which groups are doing what.
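A toy version of that controller might look something like the sketch below. Everything in it is an assumption made for illustration: the instruction format (a kind plus a name), the lookahead window size, the rule that a vector instruction takes a whole cluster, and above all the assumption that every instruction in the window is independent of the others, so the controller is free to reorder them. A real scheduler would have to track dependencies.

```python
from collections import deque

CLUSTER_SIZE = 4  # cores per cluster, as suggested above


def schedule_cycle(queue, num_clusters, lookahead=8):
    """Plan one cycle; returns a (mode, instructions) pair for each cluster.

    `queue` is a deque of (kind, name) tuples, kind being "scalar" or "vector".
    A vector instruction occupies a whole cluster in "emulation" mode, while
    up to CLUSTER_SIZE scalar instructions share a cluster in "parallel" mode.
    Assumes all instructions in the window are independent (hypothetical).
    """
    # Look ahead at the next few instructions in the queue.
    window = [queue.popleft() for _ in range(min(lookahead, len(queue)))]
    plan = []
    for _ in range(num_clusters):
        kinds = [kind for kind, _ in window]
        if "vector" in kinds:
            # Give the oldest pending vector instruction a cluster to itself.
            plan.append(("emulation", [window.pop(kinds.index("vector"))]))
        elif window:
            # Only scalars are left: one per core in this cluster.
            batch, window = window[:CLUSTER_SIZE], window[CLUSTER_SIZE:]
            plan.append(("parallel", batch))
        else:
            plan.append(("idle", []))
    # Whatever did not fit goes back to the front of the queue, in order.
    queue.extendleft(reversed(window))
    return plan
```

With two clusters and a queue holding three scalar instructions and one vector instruction, one cluster ends up in emulation mode for the vector instruction and the other runs the three scalars side by side.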
I tried to parallelise an algorithm called A*. Unfortunately it has sequential parts, though there may be a way of getting around that.
But some calculations are serial: each step depends on the previous result :'(
I love the idea of 1000s of cores on a single CPU die though.
I play with multithreaded software as a hobby.
I managed to get A* parallel by running it with different data in each process/thread.
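For anyone curious what "different data in each thread" can look like, here is a small sketch. It is not the linked sliding-puzzle code: it assumes a simple grid world where '#' is a wall, and the function names astar and solve_many are made up for this example. Each search is still sequential inside; the parallelism comes only from running several independent searches at once. Note that in CPython, threads will not give real CPU parallelism for this kind of work, so a process pool would be the thing to try for actual speed.

```python
import heapq
from concurrent.futures import ThreadPoolExecutor


def astar(grid, start, goal):
    """Shortest path length on a 4-connected grid ('#' = wall), or None."""
    def h(p):
        # Manhattan distance never overestimates when every move costs 1.
        return abs(p[0] - goal[0]) + abs(p[1] - goal[1])

    frontier = [(h(start), 0, start)]  # entries are (estimate, cost so far, cell)
    best = {start: 0}
    while frontier:
        _, cost, (r, c) = heapq.heappop(frontier)
        if (r, c) == goal:
            return cost
        if cost > best[(r, c)]:
            continue  # stale entry; a cheaper route to this cell was found
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nr < len(grid) and 0 <= nc < len(grid[0]) and grid[nr][nc] != "#":
                new_cost = cost + 1
                if new_cost < best.get((nr, nc), float("inf")):
                    best[(nr, nc)] = new_cost
                    heapq.heappush(frontier, (new_cost + h((nr, nc)), new_cost, (nr, nc)))
    return None


def solve_many(problems, workers=4):
    """Run one independent A* search per (grid, start, goal) problem."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda p: astar(*p), problems))
```

Because the searches share no state, nothing needs locking, which is what makes this the easy way to keep many cores busy.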
I'll share a link to a multithreaded code generator in Python.
You want parallel? Yeah, well, that works for some operations... but not for things which rely on previous results. For those, you want to do that one thread as fast as possible.
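The usual way to put a number on this is Amdahl's law: if only a fraction p of the work can run in parallel, then n cores give a speedup of 1 / ((1 - p) + p / n). A minimal sketch follows; the function name amdahl_speedup and the example fractions are just illustrative choices, not measurements of any real workload.

```python
def amdahl_speedup(parallel_fraction, cores):
    """Speedup predicted by Amdahl's law for a given parallel fraction."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if cores < 1:
        raise ValueError("cores must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    # The serial part takes the same time no matter how many cores there are.
    return 1.0 / (serial_fraction + parallel_fraction / cores)


if __name__ == "__main__":
    for cores in (1, 8, 100, 1000):
        print(cores, round(amdahl_speedup(0.9, cores), 2))
```

Even with 90% of the work parallel, the speedup can never reach 10x however many cores are added, because the serial 10% still has to run on one thread.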
So x86 has your big chonky cores, which scale up to 8 or 12 or so; even Arm has its big.LITTLE technology, with different types of core that can be used to allow heterogeneous multi-processing.
As an aside - if you want to do lots of completely independent operations, then what you want is a GPU-like architecture. Great for those embarrassingly parallel problems, but not general compute.
If you want to have lots of things talking to each other, that has its own cost in terms of complexity and speed. Doing it in a limited way, though... well, that's kind of already done in full-fat processors today; that's how superscalar processors work.
So really, everything you want is already implemented today - maybe not to the scale you'd envisaged, but that's because it's not practical given the diminishing returns.