Reply 40 of 103, by Deano
The reason is that hardware TnL was hard, and a vastly different development experience from the pixel engines these companies had been making. Floating-point units were much harder to make fast and correct than the low-precision integer pipes that graphics chips had got away with before.
Many HW TnL engines before the consumer revolution were actually embedded CPU(s) dedicated to the task, to offload the main CPU.
It was common to see an i960, a DSP, or a custom design like Sony's (the PS2 Vector Units). (Back in the DX5/6 era I remember at least one of the smaller vendors talked to us about a prototype card they were working on which had an extra x86 CPU on the graphics card dedicated to TnL.)
Building a dedicated hardware block or chip to do FPU math faster than an Intel CPU was difficult. The Pentium Pro/P2/P3 floating-point units were cutting-edge monsters for the time and killed many a workstation CPU design.
S3's attempt was buggy and unusable; PVR did succeed, but it was too costly for the consumer market at the time (the design eventually powered the iPhone). 3DFX never shipped one.
The two vendors who actually shipped one are both still here (ATI and NVIDIA). It's also why fixed-function HW TnL was such a short era: once you have the math units, it was fairly trivial to add more complex control logic and get vertex shaders, which is what happened.
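To make that concrete, here's a toy sketch (mine, not any vendor's actual design) of the kind of per-vertex float math a fixed-function TnL block computed: a 4x4 matrix transform plus a clamped diffuse lighting dot product.

```c
// Toy sketch of fixed-function per-vertex TnL work: transform by the
// modelview-projection matrix, plus one directional diffuse light term.
// Names and layout are illustrative only, not any real hardware's pipeline.
#include <assert.h>
#include <math.h>

typedef struct { float x, y, z, w; } Vec4;

// Row-major 4x4 matrix times column vector.
static Vec4 mat4_mul_vec4(const float m[16], Vec4 v) {
    Vec4 r;
    r.x = m[0]*v.x  + m[1]*v.y  + m[2]*v.z  + m[3]*v.w;
    r.y = m[4]*v.x  + m[5]*v.y  + m[6]*v.z  + m[7]*v.w;
    r.z = m[8]*v.x  + m[9]*v.y  + m[10]*v.z + m[11]*v.w;
    r.w = m[12]*v.x + m[13]*v.y + m[14]*v.z + m[15]*v.w;
    return r;
}

// The hard-wired transform step: position * MVP.
static Vec4 tnl_vertex(const float mvp[16], Vec4 pos) {
    return mat4_mul_vec4(mvp, pos);
}

// Fixed-function diffuse lighting: max(0, N . L).
static float diffuse(float nx, float ny, float nz,
                     float lx, float ly, float lz) {
    float d = nx*lx + ny*ly + nz*lz;
    return d > 0.0f ? d : 0.0f;
}
```

A vertex shader ends up doing exactly this math on the same float units; the difference is that the fixed sequence of operations is replaced by programmable instructions.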
Note: ArtX also succeeded with a custom HW TnL + raster engine design (for Nintendo), so ATI bought them just before it shipped (which is why the GameCube has a "made with ATI" sticker on it). This had a massive influence on the next big step: NVIDIA evolved the GF design up to and including the GFFX, whereas ATI, with two successful design teams, was able to work on a clean design, giving us the legendary R300-based chips.
Game dev since last century