VOGONS


Wonders of 486 DX4, Treasures inside

Reply 60 of 101, by appiah4

Rank l33t++
aitotat wrote on 2022-06-20, 15:43:

Clock for clock this is the slowest CPU I tested. At 100 MHz with 8k WB cache it is slower than Intel with 16k WT cache. At full 120 MHz it is only a little faster than Intel CPUs at 100 MHz. But even at full 120 MHz it consumes less power than Intel CPUs, thanks to the 500nm core (Intel has 600nm), and of course half the cache means fewer transistors to produce heat. AMD also has a 500nm core with 16K cache but it is supposed to be rare. More likely an AMD DX4 CPU with 16k cache is actually a 350nm 5x86 core, and that might be a bad thing because it has a problem when disabling L1 cache.

This makes no sense. It sounds like your board is adding a wait state at 40MHz FSB.

Reply 61 of 101, by Jasin Natael

Rank Oldbie

Yes, a Cyrix 5x86 is a good match for a PCI board. That is one reason I settled on it.

I don't think that the AMD cores are inferior to Intel's cores. Unless I miss my guess, other than WB/WT and microcode (N or no N), the cores are very nearly identical.

Reply 62 of 101, by appiah4

Rank l33t++
Jasin Natael wrote on 2022-06-20, 21:27:

I don't think that the AMD cores are inferior to Intel's cores. Unless I miss my guess, other than WB/WT and microcode (N or no N), the cores are very nearly identical.

This is true, which is why I think some shenanigans are going on with the benchmarks. The differences should be negligible.

Reply 63 of 101, by heckyeah

Rank Newbie

I mean the cores are probably at the same level of performance, but the AMD chip has only half the L1 of Intel. Performance parity is only reached with 16 KB AMD chips. More cache = more better

Reply 64 of 101, by appiah4

Rank l33t++
heckyeah wrote on 2022-06-21, 07:17:

I mean the cores are probably at the same level of performance, but the AMD chip has only half the L1 of Intel. Performance parity is only reached with 16 KB AMD chips. More cache = more better

It REALLY does not make that much of a difference.

https://thandor.net/benchmark/32

The 8KB Am486-100 is neck and neck with the 16KB i486-100.

Even the Cx486-100 is within margin of error here.

You will also note that Am486-120 is tied with the Am486-100; probably as a result of extra wait states at 40MHz FSB as noted above.

Also see here for CPU Galaxy's comparison video: https://www.youtube.com/watch?v=vIob0r6GjSU

You will see that while synthetic benchmarks like Speedsys and 3DBench appear to clearly favor the i486, the performance in real world applications like Doom and Quake is a wash. The 8 vs 16KB difference is actually marginal, whereas WB does make a difference.

Moral of this story: Take Speedsys results with a grain of salt.

Personally, I stick to NSSI scores for comparison as they are VERY good gauges of real world ALU and FPU performance in my experience.

Reply 65 of 101, by heckyeah

Rank Newbie
appiah4 wrote on 2022-06-21, 08:34:
It REALLY does not make that much of a difference. […]

Well, yeah, Speedsys probably does something that fills up the cache. The bigger cache doesn't make a difference ... until it does. Just run a set of instructions that doesn't fit into that 8 KB, or something that benefits from reading and writing in a single cycle, and you'll probably start seeing the difference. All 486 benchmarks should always be taken with a grain of salt, and synthetics will always be different from "real world" benchmarks.
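
To make the "until it does" point concrete, here is a minimal sketch of a looped sweep through a set-associative cache. It assumes 16-byte lines, 4-way associativity and true LRU replacement (a simplification, since real 486 parts use a pseudo-LRU scheme), and the hit_rate function is hypothetical, written for illustration only. It models neither Speedsys nor any real workload.

```python
from collections import OrderedDict

LINE_SIZE = 16  # bytes per cache line (assumed, 486-style)
WAYS = 4        # associativity (assumed, 486-style)


def hit_rate(cache_bytes, working_set_bytes, passes=10):
    """Hit rate of a looped sequential sweep through a set-associative LRU cache."""
    num_sets = cache_bytes // (LINE_SIZE * WAYS)
    sets = [OrderedDict() for _ in range(num_sets)]
    hits = accesses = 0
    for _ in range(passes):
        # Touch one address per cache line, front to back.
        for addr in range(0, working_set_bytes, LINE_SIZE):
            line = addr // LINE_SIZE
            ways = sets[line % num_sets]
            accesses += 1
            if line in ways:
                hits += 1
                ways.move_to_end(line)        # mark as most recently used
            else:
                if len(ways) >= WAYS:
                    ways.popitem(last=False)  # evict the least recently used line
                ways[line] = True
    return hits / accesses


for working_set_kb in (4, 8, 12, 16):
    small = hit_rate(8 * 1024, working_set_kb * 1024)
    large = hit_rate(16 * 1024, working_set_kb * 1024)
    print(f"{working_set_kb:2d} KB loop: 8 KB cache {small:.2f}, 16 KB cache {large:.2f}")
```

In this toy model a loop that fits in both caches behaves identically on either, while a 12 KB loop thrashes the 8 KB cache completely and still fits in 16 KB. Real code rarely sweeps memory this uniformly, which may be part of why the gap is much smaller in games.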

Reply 66 of 101, by appiah4

Rank l33t++
heckyeah wrote on 2022-06-21, 08:52:

Well, yeah, Speedsys probably does something that fills up the cache. The bigger cache doesn't make a difference ... until it does. Just run a set of instructions that doesn't fit into that 8 KB, or something that benefits from reading and writing in a single cycle, and you'll probably start seeing the difference. All 486 benchmarks should always be taken with a grain of salt, and synthetics will always be different from "real world" benchmarks.

Good luck finding a set of genuine real world instructions that max out a 486's 8KB cache and don't run like shit regardless. Let it be noted that even the original Pentium had 16KB of cache, and that was split into 8KB for code and 8KB of write-back data cache, so even THAT CPU did not have 16KB of instruction cache. Some features are just overkill marketing bullet points.

Reply 67 of 101, by heckyeah

Rank Newbie

Interesting points! Thank you for your insight

Reply 68 of 101, by aitotat

Rank Member

I wouldn't look at the Speedsys results too closely. I included it mostly to show the CPUID and perhaps the effect of overclocking. It is a synthetic benchmark, so it likely fits in a 16k L1 cache but does not fit in an 8K cache. That explains why the 8K Am486 performs so poorly and why I didn't do more synthetic benchmarks except Topbench (I stored the Topbench results to a database file but it requires some cleaning). But that CPU is not very good in ANY of the benchmarks and almost needs the extra 20 MHz to keep up with Intel.

It is possible, for example, that I left it in WT mode when doing the 120 MHz benchmarking (I don't remember in what order I tested) or messed with some BIOS settings. I can run a few benchmarks again to see if they match. But I don't think that is the problem here. It can be seen that the Am5x86 is slower than Intel at the same clock, and the Am486 has less cache, so it must be even slower.

What I don't understand is why Intel performs so much better on PC Player benchmarks than AMD or Cyrix.

Reply 69 of 101, by appiah4

Rank l33t++
aitotat wrote on 2022-06-21, 10:16:

It is possible, for example, that I left it in WT mode when doing the 120 MHz benchmarking (I don't remember in what order I tested) or messed with some BIOS settings. I can run a few benchmarks again to see if they match. But I don't think that is the problem here. It can be seen that the Am5x86 is slower than Intel at the same clock, and the Am486 has less cache, so it must be even slower.

The first assertion is outright wrong: the Am5x86 core is essentially the exact same Am486 core on a smaller node. The second one is true, but the magnitude of the difference should have been minuscule.

If you don't believe me, again, look here:

The Ultimate 486 Benchmark Comparison

This is AFAIC the holy grail of 486 benchmarks. And feipoa's findings show that there is about 1% ALU+FPU difference between Intel WB and AMD WB CPUs.

Overall_486.png

Even in pure ALU the AMD part is only 8% slower - and that is due mostly to, yes you guessed it, stupid synthetic benchmarks.

ALU_486.png

And even then the AMD-120 part is SIGNIFICANTLY faster than Intel-100.

So yeah, something is not right with your test setup. Quite possibly your board runs AMD CPUs with some hidden wait states or something.

Last edited by appiah4 on 2022-06-21, 13:35. Edited 1 time in total.

Reply 70 of 101, by heckyeah

Rank Newbie

So I thought I'd fire off some quick tests as well because the topic piqued my interest and I have a 486 here right now that I can easily bench.

I have here a Zida 4DPS 2.11 with jumpers set to 5x86-133, tightest timings possible (all maxed except memory RAS at 3 and memory speed FASTER) and 2 CPUs:

Am5x86-133ADZ (16 KB WB)
IntelDX4-100 SK096 (16 KB WB)

Other Specs
Generic S3 Trio64 2 MB PCI
64 MB FPM
256 KB L2 Cache (set to write through)

Between CPU swaps, nothing was changed except the FSB jumper (one jumper from 33 MHz to 40 MHz) and the multiplier jumper (off for 1/3 or on 2/4). Settings otherwise remained exactly the same.

To get the Intel DX4-120 results for Doom and Quake, I had to overvolt to 5 V. Quake had to load at 100 MHz and then be overclocked to 120 MHz immediately before the run started, as it wasn't stable enough to load the game (but was stable enough to run it).

The 4DPS works fine with on-the-fly overclocking via jumper switching. I verified that this did not change results from boot vs during runtime (i.e. no sneaky checks for 40 MHz to add wait states while booting).

Phil's Benchmark Suite

Processor | 3DBench | Doom Max | Quake 320
__________________________________________
iDX4-100  |    65.0 |     36.9 |      11.4
AmdX5-100 |    62.3 |     38.7 |      10.9
iDX4-120  |    81.5 |     44.1 |      13.6
AmdX5-120 |    74.8 |     45.3 |      13.1
AmdX5-133 |    72.5 |     44.8 |      13.4
AmdX5-160 |    87.1 |     53.3 |      16.1

Speedsys v4.78 (for reference mostly)

Processor | CPUScore | MemoryBW | L1 Cache | L2 Cache | Throughput
__________________________________________________________________
iDX4-100  |    42.48 |    78.14 |    94.64 |    47.20 |      35.48
AmdX5-100 |    37.57 |    78.14 |    94.04 |    45.65 |      34.99
iDX4-120  |    50.97 |    93.89 |   113.66 |    56.68 |      42.63
AmdX5-120 |    45.08 |    93.89 |   112.95 |    54.82 |      42.04
AmdX5-133 |    50.09 |    78.14 |   117.41 |    47.54 |      35.55
AmdX5-160 |    60.11 |    93.89 |   140.98 |    57.10 |      42.72
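
A quick sanity check on the Speedsys table, sketched in Python. It assumes the 100 and 133 labels run on a 33.3 MHz FSB (x3 and x4) and the 120 and 160 labels on a 40 MHz FSB (x3 and x4), which is how the jumper description above reads. The clocking table and the per_mhz helper are illustrative, not something from the post.

```python
# Speedsys v4.78 rows for the Am5x86 from the table above: (CPU score, memory bandwidth).
speedsys = {
    "AmdX5-100": (37.57, 78.14),
    "AmdX5-120": (45.08, 93.89),
    "AmdX5-133": (50.09, 78.14),
    "AmdX5-160": (60.11, 93.89),
}

# Assumed clocking behind each label: (FSB in MHz, multiplier).
clocking = {
    "AmdX5-100": (100 / 3, 3),
    "AmdX5-120": (40.0, 3),
    "AmdX5-133": (100 / 3, 4),
    "AmdX5-160": (40.0, 4),
}


def per_mhz(label):
    """Normalize the CPU score by core clock and memory bandwidth by FSB clock."""
    cpu_score, mem_bw = speedsys[label]
    fsb, multiplier = clocking[label]
    core = fsb * multiplier
    return cpu_score / core, mem_bw / fsb


for label in speedsys:
    cpu_per_core, mem_per_fsb = per_mhz(label)
    print(f"{label}: CPU score per core MHz {cpu_per_core:.4f}, "
          f"memory bandwidth per FSB MHz {mem_per_fsb:.3f}")
```

If those assumptions hold, the CPU score per core MHz is nearly constant across all four settings and memory bandwidth follows the FSB rather than the core clock, which is what one would expect if this board adds no extra wait states at 40 MHz.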

What pains me is that I also have an AmDX4-120SV8B, but that's one of those bunk later rebranded 5x86 chips (25544) which have 16 KB of cache instead of 8 KB. I really wanted to see whether that makes a difference, but alas.

Anyway, the differences are small but they seemed to be quite consistent. Apart from Speedsys, I ran all tests twice and always got the same results within 0.1 fps. I rounded down (i.e. chose the lower number if they differed).

Here AMD wins Doom, loses Quake and 3DBench. Make of these numbers what you will.
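
For anyone who wants the clock-for-clock gap as percentages, here is a small sketch over the Phil's Benchmark Suite numbers above. It assumes all three columns are higher-is-better scores, and the amd_vs_intel helper is a hypothetical name used only for this illustration. Keep in mind that the iDX4-120 rows come from an overvolted overclock.

```python
# Phil's Benchmark Suite results from the table above.
results = {
    "iDX4-100":  {"3DBench": 65.0, "Doom Max": 36.9, "Quake 320": 11.4},
    "AmdX5-100": {"3DBench": 62.3, "Doom Max": 38.7, "Quake 320": 10.9},
    "iDX4-120":  {"3DBench": 81.5, "Doom Max": 44.1, "Quake 320": 13.6},
    "AmdX5-120": {"3DBench": 74.8, "Doom Max": 45.3, "Quake 320": 13.1},
}


def amd_vs_intel(clock):
    """Percent difference of the Am5x86 relative to the Intel DX4 at the same clock."""
    intel = results[f"iDX4-{clock}"]
    amd = results[f"AmdX5-{clock}"]
    return {test: (amd[test] / intel[test] - 1) * 100 for test in intel}


for clock in (100, 120):
    diffs = ", ".join(f"{test} {pct:+.1f}%" for test, pct in amd_vs_intel(clock).items())
    print(f"{clock} MHz: {diffs}")
```

On these numbers the AMD part stays within about 5% of Intel in Doom and Quake at both clocks, and the largest gap is 3DBench at 120 MHz at roughly 8%.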

Reply 71 of 101, by appiah4

Rank l33t++

Those figures look a whole lot more reasonable to me.

Reply 72 of 101, by Jasin Natael

Rank Oldbie
appiah4 wrote on 2022-06-21, 12:41:

Those figures look a whole lot more reasonable to me.

Yeah that is more in line with the other comparisons that I have seen.
CPU Galaxy is a good one. It was some time ago, but Phil also did some testing and saw very little difference in most tests as well.
There are plenty of other examples as well.

Reply 73 of 101, by aitotat

Rank Member

I redid 100 MHz WB and 120 MHz WB tests for the 8k Am486DX4.

The attachment AMDX4both.gif is no longer available

Here are the new results:

CPU                | 3DBench | PCP VGA | PCP SVGA | Topbench | Quake | Doom min | Doom max
AmDX4-120@100 (8k) |    71.4 |    15.6 |      6.9 |      288 |  10.7 |      664 |     1847
AmDX4-120 (8k)     |    83.3 |    18.7 |      8.3 |      334 |  12.9 |      569 |     1553

So no changes, or very little. And the BIOS and jumper settings are correct. I didn't have a sound card installed this time, but it should not matter since the tests do not have sound enabled.

I think I'm going to try an older BIOS version just in case the last beta from 97 does have some issues with non-Intel CPUs.
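
One way to look at the rerun is to ask how much of the 20% clock increase shows up in each score. The sketch below does that for the five columns that are clearly higher-is-better. The two Doom columns are left out because they drop at 120 MHz and appear to be time-based, and the post does not state their unit. The scaling helper is a hypothetical name for this illustration.

```python
# Rerun results for the 8k Am486DX4 from the table above (higher is better).
at_100 = {"3DBench": 71.4, "PCP VGA": 15.6, "PCP SVGA": 6.9, "Topbench": 288, "Quake": 10.7}
at_120 = {"3DBench": 83.3, "PCP VGA": 18.7, "PCP SVGA": 8.3, "Topbench": 334, "Quake": 12.9}

CLOCK_GAIN = 120 / 100 - 1  # 20% more core clock


def scaling(low, high):
    """Return {test: (score gain, score gain as a share of the clock gain)}."""
    out = {}
    for test in low:
        gain = high[test] / low[test] - 1
        out[test] = (gain, gain / CLOCK_GAIN)
    return out


for test, (gain, share) in scaling(at_100, at_120).items():
    print(f"{test:9s} +{gain:5.1%}  ({share:.0%} of the clock increase)")
```

Three of the five tests gain roughly the full 20%, while 3DBench and Topbench gain about 16 to 17%. That suggests the 40 MHz setting itself is not heavily penalized relative to 33 MHz on this board, though it says nothing about why the chip trails Intel at the same clock.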

Reply 74 of 101, by CoffeeOne

Rank Oldbie
appiah4 wrote on 2022-06-21, 10:30:
The first assertion is outright wrong the Am5x86 core is essentially the exact same Am486 core on a smaller node. The second on […]

Sorry to complain, but I don't understand that figure.
There are AMD DX4 WB, AMD DX4 WT and Intel DX4.
So is the Intel a L1 WT or a L1 WB? I am sure that DOES make a difference.
Or did I miss something?

EDIT: Also according to this figure, the Intel 486DX4 is at least 10% faster than the AMD at the same clock. Or, when you compare the AMD DX4-100 WT with the Intel DX4-100 (WB or WT unknown), a lot more.
Haha

Reply 75 of 101, by appiah4

Rank l33t++
CoffeeOne wrote on 2022-06-21, 18:34:
Sorry to complain, but I don't understand that figure. There are AMD DX4 WB, AMD DX4 WT and Intel DX4. So is the Intel a L1 WT […]

If you read the thread, feipoa points out that the i486DX4 is a WB part and the only one he had. He laments not having a WT model.

The noted 8% (not 10%) difference between AMD and Intel's WB models is for the ALU tests ONLY, and those tests are biased towards Intel due to synthetic benchmarks that unrealistically use 8+ KB of L1 cache. You will find that real world applications and FPU tests have no such performance discrepancy, and the overall ALU+FPU performance difference between the CPUs is rightly <1%.

Also, paging @feipoa for an expert opinion 🤣

Reply 76 of 101, by aitotat

Rank Member

I tested BIOS versions 0402 and 0401 (I was using the last 0402.1 beta) with 8k AMD but no difference. I also tested Intel DX4 WB @ 100 MHz with BIOS 0401 but with same results as with latest BIOS. BIOS version does not affect speed.

Reply 77 of 101, by aitotat

Rank Member

I made my first retrobright attempt. This was just a test to see that I'm not doing anything wrong: two 3.5" bezels (ivory and grey) and the bezel from a Mitsumi 4x CD-ROM drive. Here is what the Mitsumi looks like:

The attachment Mitsumi4x.jpg is no longer available

As can be seen, I made one mistake: I forgot the part from the sled. At least it is now obvious that the retrobright did work. I'm going to retrobright the original case from this system and put it back into use.

I took a look at old pictures and verified that the original CD-ROM drive for this system was a 4x Panasonic IDE drive. I should have kept that one. Since I'm now using a CF adapter on the front of the case (and a network card once I put this back together), I no longer need those newer DVD-RW drives to transfer files. So back to good old period-correct QUIET CD-ROM drives. I couldn't find a Panasonic, but I did find that 4x Mitsumi I tested retrobright on.

But it was not a good drive. It does not support 80-min CD-R discs at all (it does not detect them). It detects 74-min discs, but they must be very high quality for it to read them. I also had problems with original discs, but recapping helped with that. So basically that Mitsumi doesn't really support CD-R discs. Even the older 2x Panasonic (I like those) supports them nicely. It doesn't support multisession discs (it only sees the first session), but that is not a problem for me. But the 2x Panasonic uses the Panasonic interface, and that would make the sound card stuff harder (I have two or maybe three Panasonic controller cards but there are no free ISA slots).

But no worry. I did find another Mitsumi. It is an 8x drive, supports CD-R discs and is quiet. It is from 96, so a little too new, but it will do. I hope to find one of those 4x Panasonic drives again. No point in changing this 8x Mitsumi for anything else.

But since the test bench is up again, I did a couple of new tests. This time with a GUS Extreme (ES1688f), an SB16 CT2940 with CQM and another CT2940 with YMF289B (OPL3 but a smaller chip). I tested how speed sensitive they were.

CQM was perfect. Cycles worked with full speed (with Intel DX4-100 WB) and so did Indy 3 and it didn't even need the "a" command line parameter so auto-detection worked perfectly.

ESFM worked perfectly with Cycles. No problems with full speed. But Indy required "a" parameter even with internal cache disabled. Most likely some compatibility issue with detection routines and not because of speed sensitivity.

YMF289B was just as bad as YMF262. It is just as speed sensitive.

Reply 78 of 101, by Joseph_Joestar

Rank l33t++
aitotat wrote on 2022-06-22, 15:13:

YMF289B was just as bad as YMF262. It is just as speed sensitive.

Out of curiosity, which game are you testing with?

My OPTi card (which uses a 1:1 YMF289B copy) has no speed issues on a Pentium MMX 166, but I haven't tested anything older than the first Monkey Island on it, and that sounded fine to me. Somewhat newer games like Doom, Duke3D and Tyrian work flawlessly of course.

EDIT - I just realized that Cycles is a game. For some reason, I kept thinking of CPU cycles. 😁 Is the intro music playing at the correct speed in this video? If so, I can use that for comparison purposes.

PC#1: Pentium MMX 166 / Soyo SY-5BT / S3 Trio64V+ / Voodoo1 / YMF719 / AWE64 Gold / SC-155
PC#2: AthlonXP 2100+ / ECS K7VTA3 / Voodoo3 / Audigy2 / Vortex2
PC#3: Athlon64 3400+ / Asus K8V-MX / 5900XT / Audigy2
PC#4: i5-3570K / MSI Z77A-G43 / GTX 970 / X-Fi

Reply 79 of 101, by aitotat

User metadata
Rank Member
Rank
Member

Yes, music speed is correct on that video.

There are two different kinds of speed issues with YMF and Cycles. If the CPU is a little too fast you get distorted audio (it is instantly obvious), and if the CPU is way too fast the game won't start. You just get an abnormal program termination error message.