ViTi95 wrote on 2023-04-25, 22:35:
In this case we can see that Mode 13H is faster compared to Mode X and Mode VBD in high detail, maybe because the 25 MHz bus is better used with 32-bit copies from backbuffer to VRAM.
Just to recap: in High, X and VBE do direct byte writes to VGA RAM; the only difference is in the VGA chipset's video mode handling. 13h uses a backbuffer.
Not a Pentium II, so no MTRR (and no write combining), so BYTE writes should hurt just as much as DWORD ones.
ViTi95 wrote on 2023-04-25, 22:35:
Also we can see that writting less data to the video card is plain faster, as the difference in potato detail is huge compared in Mode X compared to the other modes. I guess it's ok to think that less writes to the VRAM is faster, and that 32-bit copies from RAM are faster than 8-bit direct writes to VRAM.
Very difficult to get clues from this, so many unknowns and moving parts. It's a PCI card, maybe comparable with VLB, but inapplicable to the ISA case 🙁 Can't really get any clues about ISA due to the vastly different bus throughput, and we aren't answering the question about the difference between 2 consecutive WORD ISA writes versus 1 DWORD one.
We don't know what % of the R_DrawColumn/R_DrawSpan total is R_DrawColumn. A blitter can only potentially speed up R_DrawColumn by cutting the number of ISA transactions; it could maybe also enable a low detail 160x100 video mode by quickly doubling lines.
For convenience's sake I pretend sprites don't exist (no overdraw and no readback). Dropping the detail level cuts the number of R_DrawColumn calls and the length of each R_DrawSpan.
High:
X 64K VGA ram writes in 64K PCI transactions
13h 64K BYTE sys ram writes + 16K sys ram reads on 16 KB cache system + 16K VGA ram writes in 16K PCI transactions
VBD 64K VGA ram writes in 64K PCI transactions
Low:
X 2x less R_DrawColumn/R_DrawSpan, 32K VGA ram writes in 32K PCI transactions
13h 2x less R_DrawColumn/R_DrawSpan, 32K WORD sys ram writes + 16K sys ram reads on 16 KB cache system + 16K VGA ram writes in 16K PCI transactions
VBD 2x less R_DrawColumn/R_DrawSpan, 32K VGA ram writes in 32K PCI transactions
Potato:
X 4x less R_DrawColumn/R_DrawSpan, 16K VGA ram writes in 16K PCI transactions
13h 4x less R_DrawColumn/R_DrawSpan, 16K DWORD sys ram writes + 16K sys ram reads on 16 KB cache system + 16K VGA ram writes in 32K PCI transactions
13h 4x less R_DrawColumn/R_DrawSpan, 32K WORD sys ram writes + 16K sys ram reads on 16 KB cache system + 16K VGA ram writes in 16K PCI transactions
VBD 4x less R_DrawColumn/R_DrawSpan, 16K VGA ram writes in 16K PCI transactions
VBD 4x less R_DrawColumn/R_DrawSpan, 32K VGA ram writes in 32K PCI transactions
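The counts above can be reproduced with a few lines of arithmetic. This is only a rough sketch under stated assumptions: a full 320x200 frame redrawn (64000 pixels, the "64K" above), no sprites, Mode X covering 1, 2 or 4 planes with a single byte write, and the linear modes writing at most a WORD, which is what yields the 32K potato figures for 13h and VBD. The function names are made up for illustration and are not taken from the FastDoom source.

```python
# Rough bus-traffic model for one 320x200 frame (no sprites, no overdraw).
WIDTH, HEIGHT = 320, 200
PIXELS = WIDTH * HEIGHT  # 64000, rounded to "64K" in the post

# Pixels covered by one rendered sample at each detail level.
PIXELS_PER_SAMPLE = {"high": 1, "low": 2, "potato": 4}


def renderer_writes(mode, detail):
    """Writes issued by the column/span renderers for one full frame."""
    px = PIXELS_PER_SAMPLE[detail]
    if mode == "X":
        # Assumption: planar mode, one byte write covers 1, 2 or 4 planes.
        return PIXELS // px
    # Assumption: linear modes (13h backbuffer, VBD) write at most a WORD,
    # so potato needs two WORD writes per 4-pixel sample.
    return PIXELS // min(px, 2)


def vga_transactions(mode, detail):
    """Bus transactions that actually reach VGA RAM for one full frame."""
    if mode == "13h":
        # The backbuffer is copied with DWORDs regardless of detail level.
        return PIXELS // 4
    return renderer_writes(mode, detail)


if __name__ == "__main__":
    for detail in PIXELS_PER_SAMPLE:
        for mode in ("X", "13h", "VBD"):
            print(detail, mode, renderer_writes(mode, detail),
                  vga_transactions(mode, detail))
```

In this model only Mode X keeps halving its VGA traffic at every detail step, 13h stays pinned at 16K transactions, and VBD stops shrinking after low.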
What I can read from this:
- 13h high mode: 4x less PCI transactions compensates for 64K sys ram writes + 16K reads enough to be 20% faster, suggesting PCI transactions are still a huge bottleneck 😮
- X high/low ~70% difference due to 2x less R_DrawColumn/R_DrawSpan, 2x less PCI transactions
- X low/potato ~55% difference due to 2x less R_DrawColumn/R_DrawSpan, 2x less PCI transactions
- 13h high/low ~32% difference due to 2x less R_DrawColumn/R_DrawSpan, 2x less ram writes
- 13h low/potato ~20% difference due to 2x less R_DrawColumn/R_DrawSpan
- VBD high/low ~70% difference due to 2x less R_DrawColumn/R_DrawSpan, 2x less PCI transactions
- VBD low/potato ~10% difference due to 2x less R_DrawColumn/R_DrawSpan
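Grouping the differences above by whether the step also halved the VGA transactions makes the pattern easier to see. A small sketch, with the percentages copied from the list and the True/False flags taken from my own transaction counts; the table layout and the `gain_ratio` helper are just illustration:

```python
# (mode, step) -> (observed difference in %, were VGA transactions halved?)
STEPS = {
    ("X", "high->low"): (70, True),
    ("X", "low->potato"): (55, True),
    ("13h", "high->low"): (32, False),
    ("13h", "low->potato"): (20, False),
    ("VBD", "high->low"): (70, True),
    ("VBD", "low->potato"): (10, False),
}


def mean(values):
    return sum(values) / len(values)


def gain_ratio(step):
    """Mean gain with halved VGA traffic divided by mean gain without."""
    halved = [g for (_, s), (g, h) in STEPS.items() if s == step and h]
    same = [g for (_, s), (g, h) in STEPS.items() if s == step and not h]
    return mean(halved) / mean(same)


if __name__ == "__main__":
    for step in ("high->low", "low->potato"):
        print(step, round(gain_ratio(step), 2))
```

For the high to low step this gives about 2.2, i.e. halving the bus traffic as well is worth roughly twice as much as halving the renderer work alone.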
1 Cutting both the R_DrawColumn/R_DrawSpan work and the number of PCI transactions makes a huge difference, while cutting just R_DrawColumn/R_DrawSpan is ~2x less beneficial.
2 Potato is not scaling in 13h/VBD because we only save a little CPU while retaining the same number of VGA transfers.
3 Why such a huge gap between X and 13h in potato? Both do the same amount of computation and PCI transactions, and X has to read back slow VGA ram for invisible sprites. Are 32K WORD sys ram backbuffer writes so costly on a quite peppy DX4-75?