Reply 40 of 103, by Deano
The reason is that hardware TnL was hard, and a vastly different development experience from the pixel engines these companies had been making. Floating-point units were much harder to make fast and correct than the low-precision integer pipes that graphics chips had got away with before.
Many HW TnL engines before the consumer revolution were actually embedded CPU(s) dedicated to the task, to offload the main CPU.
It was common to see an i960, a DSP, or a custom design like Sony's (the PS2 Vector Units). (Back in the DX5/6 era I remember at least one of the smaller vendors talked to us about a prototype card they were working on which had an extra x86 CPU on the graphics card dedicated to TnL.)
Building a dedicated hardware block or chip to do FPU math faster than an Intel CPU was difficult. The Pentium Pro/P2/P3 floating-point units were cutting-edge monsters for the time and killed many a workstation CPU design.
S3's attempt was buggy and unusable; PVR did succeed, but it was too costly for the consumer market at the time (the design eventually powered the iPhone). 3DFX never shipped one.
The two vendors who actually shipped one are both still here (ATI and NVIDIA). It's also why fixed-function HW TnL was such a short era: once you have the math units, it was fairly trivial to add more complex control logic and get vertex shaders, which is what happened.
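To make that concrete, here's a toy sketch (mine, not any vendor's actual design) of the kind of per-vertex float math a fixed-function TnL block computed: a 4x4 matrix transform plus a clamped diffuse lighting dot product.

```c
// Toy sketch of fixed-function per-vertex TnL work: transform by the
// modelview-projection matrix, plus one directional diffuse light term.
// Names and layout are illustrative only, not any real hardware's pipeline.
#include <assert.h>
#include <math.h>

typedef struct { float x, y, z, w; } Vec4;

// Row-major 4x4 matrix times column vector.
static Vec4 mat4_mul_vec4(const float m[16], Vec4 v) {
    Vec4 r;
    r.x = m[0]*v.x  + m[1]*v.y  + m[2]*v.z  + m[3]*v.w;
    r.y = m[4]*v.x  + m[5]*v.y  + m[6]*v.z  + m[7]*v.w;
    r.z = m[8]*v.x  + m[9]*v.y  + m[10]*v.z + m[11]*v.w;
    r.w = m[12]*v.x + m[13]*v.y + m[14]*v.z + m[15]*v.w;
    return r;
}

// The hard-wired transform step: position * MVP.
static Vec4 tnl_vertex(const float mvp[16], Vec4 pos) {
    return mat4_mul_vec4(mvp, pos);
}

// Fixed-function diffuse lighting: max(0, N . L).
static float diffuse(float nx, float ny, float nz,
                     float lx, float ly, float lz) {
    float d = nx*lx + ny*ly + nz*lz;
    return d > 0.0f ? d : 0.0f;
}
```

A vertex shader ends up doing exactly this math on the same float units; the difference is that the fixed sequence of operations is replaced by programmable instructions.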
Note: ArtX also succeeded with a custom HW TnL + raster engine design (for Nintendo), so ATI bought them just before it shipped (which is why the GameCube has a "made with ATI" sticker on it). This had a massive influence on the next big step: NVIDIA evolved the GF design up to and including the GFFX, whereas ATI, with two successful design teams, was able to work on a clean design, giving us the legendary R300-based chips.
Game dev since last century