VOGONS


Thoughts on Intel's recent troubles

Reply 20 of 42, by swaaye

Rank l33t++

I didn't know about it being potential silicon degradation. That's interesting.

It makes me think of the defect-related degradation problems with the P67 PCH SATA and Atom LPC/USB/SD. They admitted to those and at least one had a recall.

I also remember some problems with Athlon 64s (probably S939) and overclocking that would gradually kill the memory controller.

Reply 21 of 42, by StriderTR

Rank Member

swaaye wrote on 2024-07-26, 17:56:

I didn't know about it being potential silicon degradation. That's interesting.

It makes me think of the defect-related degradation problems with the P67 PCH SATA and Atom LPC/USB/SD. They admitted to those and at least one had a recall.

I also remember some problems with Athlon 64s (probably S939) and overclocking that would gradually kill the memory controller.

If I understood what I read correctly, the CPU is calling for more voltage than it can handle, no matter the BIOS settings. So they are basically slowly frying themselves and there really isn't much the end user can do about it.

There is speculation by some that Intel took a gamble that the silicon could handle it, pushing the limits of what's achievable on that die in order to compete with Ryzen without having to put in much effort or cost. Of course, I have no idea what the real reason is, but if this is anywhere near true, then it's really backfired. Though, it does seem like something Intel would do. They command the largest market share and topped AMD for many years until Ryzen came along. Many people think they got complacent and, as was said above, got cocky.

It really sucks for all the system integrators that now have to get caught up in the mess. I just hope Intel does the right thing and makes it right for their customers, both end users and integrators, and allows all owners of those chips an RMA. If I had one of those chips, I would want it replaced even if I was not seeing any signs of a fault, simply because there is no way to know if there's damage you just haven't seen yet.

So glad I'm not in this boat. 😜

Retro Blog: https://theclassicgeek.blogspot.com/
Archive: https://archive.org/details/@theclassicgeek/
3D Things: https://www.thingiverse.com/classicgeek/collections

Reply 22 of 42, by Namrok

Rank Oldbie

StriderTR wrote on 2024-07-26, 18:32:

If I understood what I read correctly, the CPU is calling for more voltage than it can handle, no matter the BIOS settings. So they are basically slowly frying themselves and there really isn't much the end user can do about it.

There is speculation by some that Intel took a gamble that the silicon could handle it, pushing the limits of what's achievable on that die in order to compete with Ryzen without having to put in much effort or cost. Of course, I have no idea what the real reason is, but if this is anywhere near true, then it's really backfired. Though, it does seem like something Intel would do. They command the largest market share and topped AMD for many years until Ryzen came along. Many people think they got complacent and, as was said above, got cocky.

It really sucks for all the system integrators that now have to get caught up in the mess. I just hope Intel does the right thing and makes it right for their customers, both end users and integrators, and allows all owners of those chips an RMA. If I had one of those chips, I would want it replaced even if I was not seeing any signs of a fault, simply because there is no way to know if there's damage you just haven't seen yet.

So glad I'm not in this boat. 😜

So, there is that, which is what the new microcode updates and more conservative power profiles are supposed to address. But there are also some rumors, and maybe a partial admission by Intel, that there was a period in 2023 when the actual manufacturing process had contaminants, which may have caused oxidation in the copper vias? I may have completely butchered that. Gamers Nexus did a pretty in-depth video on it, and may decide to send off some failed 13th gen CPUs to a failure analysis lab for further confirmation/educational content.

Win95/DOS 7.1 - P233 MMX (@2.5 x 100 FSB), Diamond Viper V330 AGP, SB16 CT2800
Win98 - K6-2+ 500, GF2 MX, SB AWE 64 CT4500, SBLive CT4780
Win98 - Pentium III 1000, GF2 GTS, SBLive CT4760
WinXP - Athlon 64 3200+, GF 7800 GS, Audigy 2 ZS

Reply 23 of 42, by StriderTR

Rank Member

Namrok wrote on 2024-07-26, 19:34:

So, there is that, which is what the new microcode updates and more conservative power profiles are supposed to address. But there are also some rumors, and maybe a partial admission by Intel, that there was a period in 2023 when the actual manufacturing process had contaminants, which may have caused oxidation in the copper vias? I may have completely butchered that. Gamers Nexus did a pretty in-depth video on it, and may decide to send off some failed 13th gen CPUs to a failure analysis lab for further confirmation/educational content.

Yeah, that one may affect a smaller batch and only 13th gen, but there's no correcting it after the fact, no saving it via microcode. It will be interesting to see how it plays out.

No matter how you slice it, it's a mess.

Reply 24 of 42, by swaaye

Rank l33t++

I haven't had any troubles with my two 12600K systems. It's really something that they could mess a refresh up. Surely pushed it too far in the name of competition.

Well I'm mostly interested in their future architecture overhaul using tiles anyway. Let's see how that goes!

Reply 25 of 42, by Jasin Natael

Rank Oldbie

Namrok wrote on 2024-07-26, 19:34:

So, there is that, which is what the new microcode updates and more conservative power profiles are supposed to address. But there are also some rumors, and maybe a partial admission by Intel, that there was a period in 2023 when the actual manufacturing process had contaminants, which may have caused oxidation in the copper vias? I may have completely butchered that. Gamers Nexus did a pretty in-depth video on it, and may decide to send off some failed 13th gen CPUs to a failure analysis lab for further confirmation/educational content.

That's all pretty accurate from what I was able to tell.
The manufacturing process problem has supposedly been rectified, but the question is how many damaged chips were sold during this time. Apparently there were rumors that Intel declined to RMA many, many chips during this period. The question is whether or not they will revisit these to rectify it, if true. My money is on them continuing to stick their head in the sand.

The whole power delivery issue is also SUPPOSEDLY unrelated to this via oxidation issue.....sure it isn't.
Furthermore, if the microcode solution actually fixes the problem, and IF it doesn't nerf performance to the point of nullifying the purpose of buying the K/KS SKU chips......have the chips themselves already been damaged by being overvolted for up to a year and a half?

Who knows. And there isn't any easy way to tell.
As someone else mentioned if I had one of these chips, I'd sure as hell want a replacement. Even if I didn't have one exhibiting problems. It could already be degraded and just asymptomatic.

Like I said before. It's a mess.

Reply 26 of 42, by the3dfxdude

Rank Oldbie

The Serpent Rider wrote on 2024-07-26, 02:50:

No doubt about it, but the quality of silicon in mass production could vary beyond projection. So buying a high-end CPU becomes a lottery. 13600/14600 chips and lower chips are mostly unaffected it seems.

Indeed it does vary, and that's what makes things scary. When I was first introduced to how degradation is modeled, it was quite laughable. We've improved on it, and there is still some catch-up work needed on the design flows at companies, yet I see designs, as you mention, pushing things way farther than I am comfortable with. The reports of problems as of late are not surprising to me.

I don't think current CPU designs from all the vendors are going to avoid these problems, unless something changes.

Reply 27 of 42, by The Serpent Rider

Rank l33t++

swaaye wrote on 2024-07-26, 22:07:

I haven't had any troubles with my two 12600K systems. It's really something that they could mess a refresh up. Surely pushed it too far in the name of competition.

12600/13600/14600 chips have a more conservative power limit, so there should not be any, unless you try to manually OC them.

I also remember some problems with Athlon 64s (probably S939) and overclocking that would gradually kill the memory controller.

You can't hurt the memory controller by simple overclocking. But Athlon 64 introduced caveats like the memory controller lottery, which scaled with voltage, so any potential degradation was a user error, i.e. unsafe overvolting.

I must be some kind of standard: the anonymous gangbanger of the 21st century.

Reply 28 of 42, by Jasin Natael

Rank Oldbie

The Serpent Rider wrote on 2024-07-27, 02:16:
swaaye wrote on 2024-07-26, 22:07:

I haven't had any troubles with my two 12600K systems. It's really something that they could mess a refresh up. Surely pushed it too far in the name of competition.

12600/13600/14600 chips have a more conservative power limit, so there should not be any, unless you try to manually OC them.

I also remember some problems with Athlon 64s (probably S939) and overclocking that would gradually kill the memory controller.

You can't hurt the memory controller by simple overclocking. But Athlon 64 introduced caveats like the memory controller lottery, which scaled with voltage, so any potential degradation was a user error, i.e. unsafe overvolting.

True enough. But doesn't that in itself indicate an underlying problem with the 12/13/14 gen architecture?
If Intel is selling the i5 SKUs as K or KS variants, then they should at least afford some reliable overclocking, whether manually or auto OC or whatever they call it.
I think this illustrates the point that Intel has simply been pushing their tech too hard. It just isn't really capable of doing what they are asking it to do.

Reply 29 of 42, by lti

Rank Member

My limited experience with 12th-gen (the Dell Precision 7670 I use at work) already felt like a flawed platform. I initially thought that it was just that one computer having some weird power saving bugs that resulted in random 1-2 second freezes (complete with the short repeating sound loop that you get in a blue screen), but I've since learned that other brands have the same problem. The ridiculous power consumption also means that anything other than an overgrown gaming laptop will throttle horribly under load, and those gaming laptops are loud.

Newer generations are even worse. I was surprised to see that the power limit for a non-K i5 was 153W. That's almost three times what my "old" (according to the target market for the i9-14900KS and RTX 4090) i5-8500 claims to draw under load (HWMonitor reports 55W when rendering video, but I'm assuming that's horribly inaccurate). I doubt that it's three times faster (at least for what I do). That makes me want to get a power meter and measure full system power consumption (including both monitors) to see what I'm "missing out" on.
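
As a rough illustration of the comparison above, here is a minimal Python sketch. It takes the two figures from the post (a 153W power limit and a 55W software-reported draw) at face value, even though one is a limit and the other a sensor reading, so they are not strictly comparable. The function names and the 2.0x speedup in the usage note are hypothetical and only there to show the arithmetic.

```python
def power_ratio(new_watts: float, old_watts: float) -> float:
    """How many times more power the newer chip is allowed to draw."""
    if old_watts <= 0:
        raise ValueError("old_watts must be positive")
    return new_watts / old_watts


def perf_per_watt_change(speedup: float, new_watts: float, old_watts: float) -> float:
    """Relative performance per watt of the new chip versus the old one.

    A result below 1.0 means the new chip does less work per watt,
    even if it is faster in absolute terms.
    """
    return speedup / power_ratio(new_watts, old_watts)


if __name__ == "__main__":
    ratio = power_ratio(153.0, 55.0)
    print(f"Power ratio: {ratio:.2f}x")
    # Hypothetical speedup, purely for illustration.
    print(f"Perf/watt at 2.0x speedup: {perf_per_watt_change(2.0, 153.0, 55.0):.2f}")
```

Under these assumptions the newer part would have to be nearly three times faster just to match the older chip's performance per watt.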

I still think that AMD is being artificially forced into the gamer/enthusiast market by motherboard and laptop manufacturers making stupid decisions. I also feel like there's a chance of chipset bugs and compatibility issues being covered up by fanboy-level screaming. Back when the Ryzen 4000 series was new and I was looking for a new laptop, it really felt like people were trying to hide chipset-level bugs under stupid complaints like "it won't play 8K AV1 video smoothly" (through the laptop's built-in 1080p display). The mass market and business market just want it to work out-of-the-box, so they stick with Intel. Unfortunately, Intel stuff doesn't work anymore, and no amount of fiddling with the BIOS seems to be able to fix it.

Intel486dx33 wrote on 2024-07-26, 12:25:

Remove the Side Cover and place a BIG Fan in front.

Removing the side will disrupt the airflow in a well-designed case and cause more problems.

Here's a case with tons of fan bays (and actually the same case that my main desktop uses - I should at least put taller feet on it because the power supply filter is almost touching the desk):
https://www.youtube.com/watch?v=A0nDTz9nBfI
https://www.youtube.com/watch?v=-DnLvemgy5M

Intel486dx33 wrote on 2024-07-26, 17:52:

Over heating Capacitors

Motherboards had shitty caps back then. Recapped motherboards could (and have) run for decades.

Reply 30 of 42, by Hoping

Rank Oldbie

AMD had many problems with very poor quality motherboards during the years when most AMD motherboards used Nvidia chipsets, which performed well below Intel's, although on paper Nvidia indicated otherwise.
A question for those who know more than me on the matter: if Intel processors dissipate more than 200W, even more than 250W according to what I read somewhere (I do not know at what voltage, but I imagine it is around 1150mV), how many amps will be passing through the small pins of the socket and, of course, through the internal components of the processor? We know that the conductor cross section limits the amps it can conduct.
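
The question above can be roughed out with I = P / V. The Python sketch below uses the 250W and 1150mV figures guessed in the post. The count of 400 power-delivery pins is purely a hypothetical placeholder, not a figure from any socket datasheet, and the even split across pins is a simplification.

```python
def package_current_amps(power_watts: float, vcore_volts: float) -> float:
    """Total current into the CPU package, from I = P / V."""
    if vcore_volts <= 0:
        raise ValueError("vcore_volts must be positive")
    return power_watts / vcore_volts


def current_per_pin_amps(total_amps: float, power_pins: int) -> float:
    """Average current per pin, assuming an even split (a simplification)."""
    if power_pins <= 0:
        raise ValueError("power_pins must be positive")
    return total_amps / power_pins


if __name__ == "__main__":
    total = package_current_amps(250.0, 1.15)
    print(f"Total current: {total:.0f} A")

    # HYPOTHETICAL pin count, for illustration only.
    assumed_power_pins = 400
    per_pin = current_per_pin_amps(total, assumed_power_pins)
    print(f"Per pin (assuming {assumed_power_pins} power pins): {per_pin:.2f} A")
```

At 250W and 1.15V this works out to roughly 217 A in total, which is why the current has to be spread across a large number of socket contacts rather than a handful.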

Reply 31 of 42, by swaaye

Rank l33t++

I haven't had any problems at all with the two 12600K systems I use. Built when the 12th gen launched. I've overclocked them to 5.0GHz on all P cores for extended periods with a 0.01v offset voltage increase. The power consumption is almost unmanageable with air cooling though, and the heat flux is insane with modern chips. The temps can spike to throttling before the heatsink even notices a change.
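
To get a feel for why temperatures can spike before the heatsink reacts, here is a minimal lumped-capacitance sketch in Python. It assumes, unrealistically, that no heat leaves the die at all, so it is an upper bound on the heating rate. The 200W power step and the 0.5 g die mass are hypothetical values chosen for illustration; the specific heat of silicon is taken as roughly 0.7 J/(g*K).

```python
SILICON_SPECIFIC_HEAT_J_PER_G_K = 0.7  # approximate value for silicon


def heating_rate_k_per_s(power_watts: float, mass_grams: float,
                         specific_heat: float = SILICON_SPECIFIC_HEAT_J_PER_G_K) -> float:
    """Temperature rise per second if all power stays in the die (dT/dt = P / (m * c))."""
    if mass_grams <= 0 or specific_heat <= 0:
        raise ValueError("mass and specific heat must be positive")
    return power_watts / (mass_grams * specific_heat)


def time_to_rise_s(delta_t_kelvin: float, power_watts: float, mass_grams: float) -> float:
    """Seconds needed to heat the die by delta_t_kelvin under the same assumption."""
    return delta_t_kelvin / heating_rate_k_per_s(power_watts, mass_grams)


if __name__ == "__main__":
    # HYPOTHETICAL values: 200 W load step, 0.5 g of silicon.
    rate = heating_rate_k_per_s(200.0, 0.5)
    print(f"Heating rate with no cooling: {rate:.0f} K/s")
    print(f"Time for a 50 K rise: {time_to_rise_s(50.0, 200.0, 0.5) * 1000:.0f} ms")
```

Real dies shed heat into the heat spreader the whole time, so actual spikes are slower than this, but the tiny thermal mass of the silicon is the reason the die reacts far faster than a large heatsink can.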

I don't recall any talk about the 12-13th gens degrading? I think the overclocker community would have noticed? I might need to do some searching.

Reply 32 of 42, by the3dfxdude

Rank Oldbie

Namrok wrote on 2024-07-26, 19:34:

So, there is that, which is what the new microcode updates and more conservative power profiles are supposed to address. But there are also some rumors, and maybe a partial admission by Intel, that there was a period in 2023 when the actual manufacturing process had contaminants, which may have caused oxidation in the copper vias? I may have completely butchered that. Gamers Nexus did a pretty in-depth video on it, and may decide to send off some failed 13th gen CPUs to a failure analysis lab for further confirmation/educational content.

Yes. Software controls that can fry the circuit. Brilliant design folks... wait until it gets exploited.

And as far as bad silicon goes, I mean sure, it happens. But let's keep trying to shrink the manufacturing process further! Never mind that the effort needed to avoid contamination is already insane, let alone what that means for reliability in general.

Reply 33 of 42, by lti

Rank Member

swaaye wrote on 2024-07-27, 17:23:

I don't recall any talk about the 12-13th gens degrading? I think the overclocker community would have noticed? I might need to do some searching.

It's mostly 13th-gen that's degrading.

the3dfxdude wrote on 2024-07-27, 17:54:

Yes. Software controls that can fry the circuit. Brilliant design folks... wait until it gets exploited.

Of course, they will be exploited so "gaming" motherboards get marginally higher benchmark scores.

Reply 34 of 42, by Jasin Natael

Rank Oldbie

swaaye wrote on 2024-07-27, 17:23:

I haven't had any problems at all with the two 12600K systems I use. Built when the 12th gen launched. I've overclocked them to 5.0GHz on all P cores for extended periods with a 0.01v offset voltage increase. The power consumption is almost unmanageable with air cooling though, and the heat flux is insane with modern chips. The temps can spike to throttling before the heatsink even notices a change.

I don't recall any talk about the 12-13th gens degrading? I think the overclocker community would have noticed? I might need to do some searching.

The original manufacturing defect with the oxidation of the vias was in regards to 13th gen chips afaik.
From what I read and watched from Level1Techs, Gamers Nexus, AnandTech and others, it seems like the consensus was that there was a failure rate of around 50% being reported from OEMs and system integrators.
Now whether this was 14th gen alone or a mix, or if it was only i9 SKUs or a mix of lower core counts as well, I don't know.

I would think personally that it would be damn hard to know for sure. Just because you don't have any issues at present doesn't mean there isn't underlying damage waiting to reveal itself in time.
But that goes for anything, really. I don't have any of the affected chips, but if I did have any and they were old enough to be past warranty and still working fine?
I'd assume they were just fine and use them as I otherwise would.

Reply 35 of 42, by ncmark

Rank Oldbie

I hate to say it, but this sounds like Boeing. Siphon off all the profits into CEO pockets and shareholder returns instead of product development, and when some real competition comes along, put together something half-baked.

Reply 37 of 42, by darry

Rank l33t++

DosFreak wrote on 2024-07-29, 19:05:

Wow.

If that article is accurate (and I see no obvious reason to doubt it), Intel's response gives absolutely no confidence in them. I don't get the impression that they have a cohesive and thought out strategy in terms of damage control here at all.

I am reading between the lines here, but my guess would be that while they probably understand the mechanism for the issue and how to mitigate it proactively:

a) they just don't have an accurate model of how much irreversible damage has already been done nor how much will still be done before the mitigation strategy is put into place nor how many CPUs will actually end up failing

b) based on a), they don't want to overcommit themselves or overreact unnecessarily (hope of minimizing costs)

c) pure conjecture on my part, but the mitigation might be effective in preventing or slowing down degradation in still relatively "healthy" chips, but it may already be too late for some still working ones. It is also possible that the mitigation only slows down the failure process to a point where Intel hopes that most chips will live through their commercially useful lifespan (Intel may not have clarity on even this).

IMHO, if Intel were actually confident in their strategy and its effectiveness, they wouldn't be giving the impression of playing it by ear as much and would be more assertive, committed and transparent. Ironically, all this ambiguity would be putting me off buying any gen 13 or 14 CPU of theirs if I were in the market for a CPU, even if the mitigation came out today.

I will go one step further in saying that Intel are essentially implying that any current use (before the still unreleased mitigation) of those chips that are currently still working runs the risk of irreversibly damaging them. IMHO, that implied positioning is not making Intel look good at all.

Reply 38 of 42, by Jasin Natael

Rank Oldbie

DosFreak wrote on 2024-07-29, 19:05:

That's just a pretty disappointing response, I must say.
I wish I could say that I'm surprised, but I'm not.

Intel will Intel I suppose.

Reply 39 of 42, by The Serpent Rider

Rank l33t++

Yep, things aren't looking good for Intel, with 15k layoffs.

Also, Gamers Nexus analysis of the situation: https://www.youtube.com/watch?v=b6vQlvefGxk