The high-level pitch of P cores and E cores seems so elegant, but when it actually comes to scheduling, it gets messy fast. Even in a laptop running off a battery, you can't simply switch to E cores, because some short-lived work might be latency-sensitive. You also can't assume long-running work should go on an E core, because maybe you're anxious to get that video encoded. Even for lots of small work, different cores can have different performance characteristics, and a P core might be more efficient for certain workloads.
Funnily enough, Unix already has user-settable priorities, aka the "nice level". ACPI gives us an idea of how plentiful the power is.
So, when on AC power, schedule everything on P cores when possible: processes that eat a lot of CPU go on P cores, and the same goes for any process with a negative nice value.
When on battery, schedule anything with a non-negative nice value on E cores, and keep one P core up for real-time tasks and for nice-below-zero tasks.
These are two extremes, but I suppose that the idea is understandable.
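Those two extremes can be sketched as a tiny affinity policy on Linux. This is only an illustration: the core ids below are hypothetical (on a real Intel hybrid chip you'd read the P/E split from sysfs, e.g. `/sys/devices/cpu_core/cpus`), and the battery state would come from ACPI via something like `/sys/class/power_supply`.

```python
import os

# Hypothetical core layout for illustration only; real hybrid chips
# expose the split in sysfs (e.g. /sys/devices/cpu_core/cpus on Intel).
P_CORES = frozenset({0, 1, 2, 3})
E_CORES = frozenset({4, 5, 6, 7})

def choose_cores(niceness: int, on_battery: bool) -> frozenset:
    """The two extremes above: on AC, the whole chip is fair game;
    on battery, only nice < 0 work keeps the P cores."""
    if not on_battery:
        return P_CORES | E_CORES
    return P_CORES if niceness < 0 else E_CORES

def place(pid: int, on_battery: bool) -> None:
    """Apply the policy to one process via its affinity mask."""
    niceness = os.getpriority(os.PRIO_PROCESS, pid)
    os.sched_setaffinity(pid, choose_cores(niceness, on_battery))
```

A real scheduler would do this per-thread and re-evaluate on power events, but the nice-value-to-core-set mapping is the whole idea in miniature.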
> So, when on AC power, schedule everything on P cores when possible: processes that eat a lot of CPU go on P cores, and the same goes for any process with a negative nice value.
Even when plugged in, you may have thermal limitations. P cores will chew through your power budget more aggressively than E cores. For latency-sensitive workloads you do want to emphasize the P cores, but when throughput is the goal you'll usually be better off not ignoring the E cores, and not trying to run the P cores at high frequency where they're much less efficient. Intel started adding E cores to consumer chips in large part so they could score better on throughput-oriented multithreaded benchmarks like Cinebench; they're decent at compiling code, too, but you'll still want the P core for the linker.
I always personally disable turbo boost, especially on laptops.
Far better would be to tweak the time constants to your liking, so that you can use the full clock range of the chip, but constrain its sustained power draw for quiet and long battery life.
With modern CPUs, disabling turbo boost leaves tons of performance on the table.
If I run a game, I limit the CPU to about 50% clock speed. It's the only way to stop the laptop getting crazy hot and the fans spinning up, and it meaningfully reduces the heat pressure on the desk under the laptop...
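On Linux that kind of cap can be set programmatically through the standard cpufreq sysfs interface rather than a firmware toggle. A minimal sketch (the sysfs paths are the usual cpufreq ones; actually writing them requires root, and the percentage knob here is my own helper, not a kernel API):

```python
from pathlib import Path

def capped_khz(max_khz: int, pct: int) -> int:
    """Target frequency as a percentage of the advertised maximum."""
    return max_khz * pct // 100

def cap_core(cpu: int, pct: int) -> None:
    """Clamp one core's scaling_max_freq (Linux cpufreq; needs root)."""
    base = Path(f"/sys/devices/system/cpu/cpu{cpu}/cpufreq")
    max_khz = int((base / "cpuinfo_max_freq").read_text())
    (base / "scaling_max_freq").write_text(str(capped_khz(max_khz, pct)))

# e.g. cap_core(0, 50) would hold core 0 to half its rated clock
```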
I think this is where things are lacking. There's not enough information that can be conveyed to the OS with just a number, and the number is fixed rather than tied to user input (which application is active, whether the user just clicked, whether an action is blocking presentation).
It'd be cool if tasks told you about their workload in terms of the latency, throughput, and cadence required (hello, skipping audio when you compile hard).
That's not really how nice levels have worked traditionally, and would disallow specifying "run on Performance cores, but yield to other processes quickly".
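For what it's worth, on Linux that particular combination is already expressible, because placement (the affinity mask) and priority (nice) are independent knobs. A minimal sketch, with core ids 0-3 standing in for hypothetical P cores:

```python
import os

P_CORES = {0, 1, 2, 3}  # hypothetical P-core ids; varies per chip

# "Run on Performance cores, but yield to other processes quickly":
# the affinity mask picks the cores, nice picks the priority on them.
available = os.sched_getaffinity(0)        # cores this process may use
mask = (P_CORES & available) or available  # fall back if ids don't exist
os.sched_setaffinity(0, mask)              # 0 = the calling process
new_nice = os.nice(10)                     # yield readily to siblings
```

The comment's point stands for a scheme driven by nice alone, though: a single number can't carry both the placement hint and the priority.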
>when on AC power, schedule everything on P cores when possible
Sometimes I feel like that is undesirable. It may make the system consume more power, and thus produce more heat and noise.
A laptop and a desktop certainly would balance P and E differently!
I may be completely wrong, but I read that E cores are not power efficient, rather they are die space efficient.
They're both - though Intel has mostly talked up the power efficiency.
For CPUs, those two types of efficiency are closely related. Omitted transistors (in an E core design) neither take up die space nor consume power. And CPU cooling systems are ultimately measured by how many watts of heat they can remove from each unit of die area - so fewer watts from a smaller core. (That's at a given temperature difference, etc. But your die will die if any part of it gets too hot. And revving up the CPU cooling fan is generally not preferred.)
As a user with a laptop, the last thing I want is the OS to decide for me. I want to tell it myself "this is sensitive, put all your energy into it because I'm five minutes away from pushing that important work and I have seven minutes of battery left" or "this won't work at all if run at less than 2 GHz" vs "I must drag what I'm doing along for as long as I can, save every bit of battery possible." The computer can't know about these cases.
FWIW, Apple leaves it up to the app developer to specify a quality-of-service for a particular execution context:
https://developer.apple.com/library/archive/documentation/Pe...
The problem is you don't really think about those cases early enough to matter. Seven minutes of battery isn't even a knowable thing: it's a current average (and often not calculated that well) and could be 10 minutes if nothing happens (no emails arrive, no web pages render in the background, you don't touch anything...). But if you try to run that 5-minute task, in reality you have 2 minutes of CPU using the P cores, or 5 minutes using the E cores - except on the E cores the task needs 7 minutes. The above times are all made up, of course, but they give the idea.
If you make the right decisions when the battery is full, your battery can last longer. However, this isn't something you can do yourself. You don't want a pop-up when your email program spawns a thread to check for new email - programs do this all the time, and the system doesn't know whether the thread will run for a few ms or for hours. In most cases the battery consumed by the pop-up would be more than the thread itself uses. You want the system to make the right decisions - but the right decision depends on your system, and someone else with a different CPU may need different decisions.
I feel like cases 1, 2, and 3 broadly fit into "Battery Saver", "Performance", and "Battery Saver" modes?
Yes exactly - which I set myself depending on my current use case at hand. I definitely don't want the OS to try and guess
That's one of the reasons I switched back to Linux. I've bought Alder Lake and a couple of RockPro64s that have heterogeneous CPU sets.
Maybe it's just me, but this P&E architecture is underwhelming and screams the same issues as AMD's Bulldozer all over again. Claims of massive core counts with mediocre performance, and little control over how things are assigned to the cores. Maybe that will improve over time with improved schedulers, but I doubt it; it looks like an architectural issue. The experience feels so inconsistent, even ending up worse than the prior generations, which had lower core counts but all normal P cores. I'm avoiding Intel P&E CPUs for anything that needs consistent performance, as my limited experience with the new Intel chips leaves a bitter taste in my mouth and a frustrating computing experience.
I see the heterogeneous architectures as mostly a plus. If you want the most throughput for a highly parallel workload given a power and silicon budget 100% E cores would be best. If you have some workloads that don't parallelize well then a few P cores are best. Heterogeneous gives possibilities to optimize for both cases. There is another knob to turn, and mistakes can be made, but this should be an overall positive.
My bigger concern with the newer Intel CPUs is the crashes and reliability issues that were reported.
Intel is quietly having their bulldozer moment
That's not really fair to Bulldozer CPUs.
The AMD Bulldozers put a lot of new, untested ideas into practice. Some of them paid off, while many did not, at least not in the short term. There were, however, many good ideas, and many of those lived on and helped make Ryzen what it is.
Over time, Bulldozer performance has matched and exceeded contemporary Intel performance, both because compiler optimizations have made better use of the CPU and because of the slowdowns from Spectre / Meltdown affecting Intel much more than AMD. I still run an FX-8150 server and have compared it with an Intel 2600K system in many tasks.
Do we think Intel is going to use their current shitshow to make a golden age of Intel CPUs, the way Bulldozer led to Ryzen? I personally don't think so. They've put all their cheap tricks into their CPUs, tricks that require huge slowdowns when flaws are found, and unless/until they start caring about actually doing things correctly rather than playing fast and loose with hundreds of watts, they'll keep trying to game the benchmarks, keep having flaws and problems, and keep losing market share.
> Apart from some models of Alder Lake, it is now impossible to buy an Intel chip that does not have at least P (Performance) and E (Efficiency) cores.
Really? I just bought one:
https://www.intel.com/content/www/us/en/products/sku/236786/...
> Apart from some models of Alder Lake
That bit actually still applies. Intel may have branded the 14100F as Raptor Lake, but it is almost certainly Alder Lake silicon, just a higher speed bin of the 12100F.
See https://www.intel.com/content/www/us/en/products/compare.htm... and note how none of them get the higher DRAM frequency support or larger L2 caches characteristic of Raptor Lake silicon.
How about changing that to "anything with more than 6 cores"? Anything with 4 cores has only one speed of core. At 6 cores it's more of a mixed bag: some have all the same cores, some have a split of performance and efficiency cores. Anything above an i5 will have E cores.
Hmm, I think Granite Rapids is all P-Cores and goes up to 86 cores (172 threads):
https://en.wikipedia.org/wiki/Granite_Rapids
Yep, there are still server CPUs with only P-cores.
They are a bit expensive, but I wouldn't expect them to drop these SKUs in the long term for HPC and compute-bound workloads. My guess is that Diamond Rapids will also have some P SKUs and maybe AP SKUs.
There's still some weirdness here, though, because of the frequency difference you'll get between "low priority" and "high priority" cores.
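That frequency difference is at least observable from userspace. One rough heuristic on Linux (not an official API for identifying core types) is to bucket CPUs by the max frequency advertised in sysfs; on a hybrid part, the higher-frequency bucket corresponds to the P cores or preferred cores:

```python
from collections import defaultdict
from pathlib import Path

def group_by_max_freq(freqs: dict[int, int]) -> dict[int, list[int]]:
    """Bucket CPU ids by max frequency (kHz); on a hybrid chip the
    higher-frequency bucket is the P / preferred cores."""
    buckets: defaultdict[int, list[int]] = defaultdict(list)
    for cpu in sorted(freqs):
        buckets[freqs[cpu]].append(cpu)
    return dict(buckets)

def read_max_freqs() -> dict[int, int]:
    """Read cpuinfo_max_freq per CPU from the Linux cpufreq sysfs tree."""
    out = {}
    cpus = Path("/sys/devices/system/cpu")
    for p in cpus.glob("cpu[0-9]*/cpufreq/cpuinfo_max_freq"):
        out[int(p.parent.parent.name[3:])] = int(p.read_text())
    return out
```

It's only a heuristic: speed bins within one core type can also differ slightly, which is exactly the "weirdness" above.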
The i3-14100F is just one example - Intel still sells numerous non-hybrid models across their lineup including most i3s, Pentiums, Celerons, and many server/workstation Xeons. The documentation's claim about availability is overstated.
There are also Xeons, but that mostly limits an OS to data-center use.
There are workstation Xeons. Though it seems that mobile Xeons are defunct now.