VOGONS


Reply 740 of 1173, by ViTi95

Rank Oldbie

The mock-up you've created looks very good. I already did some dither testing directly in the colormap, and the result was nowhere near good, but I think it's possible to do something similar:

The copy from backbuffer to VRAM is divided into blocks (status bar, messages, border...), so we can use this to render each block in a different way. The Mode 13H executable already uses this to speed up the copy, copying only the required parts of the screen. Maybe we could use dithered rendering for gameplay, and non-dithered rendering for the rest of the screen elements.

The same dithering method used with Hercules can be used for this. For example, it was already used for the EGA 640x200 mode during development (although it was later dropped in favor of a custom 2x1 dither to make it a little faster: https://youtu.be/uO0NSsDEqC8).
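For illustration, a 2x1 dither for the 640x200 target could look roughly like the sketch below. It is a hypothetical C sketch, not FastDoom's code: `pair_lo`/`pair_hi` are assumed precomputed tables giving two EGA colors per DOOM palette index, the output holds one color index per byte (the planar EGA write is left out), and swapping the pair on odd rows turns the 2x1 pattern into a checkerboard.

```c
#include <stdint.h>

/* Hypothetical 2x1 dither: each 320-wide source pixel becomes two
   640-wide EGA pixels. pair_lo/pair_hi are assumed precomputed tables
   mapping a DOOM palette index to two EGA colors (0..15) whose average
   approximates that palette entry. Output is one color per byte; the
   planar write to EGA VRAM is not shown. */
void dither_row_2x1(const uint8_t *src, uint8_t *dst, int width, int y,
                    const uint8_t pair_lo[256], const uint8_t pair_hi[256])
{
    for (int x = 0; x < width; x++) {
        uint8_t c = src[x];
        uint8_t a = pair_lo[c];
        uint8_t b = pair_hi[c];
        if (y & 1) {            /* swap on odd rows -> checkerboard */
            uint8_t t = a;
            a = b;
            b = t;
        }
        dst[2 * x]     = a;     /* left half of the doubled pixel */
        dst[2 * x + 1] = b;     /* right half of the doubled pixel */
    }
}
```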

https://www.youtube.com/@viti95

Reply 741 of 1173, by ViTi95

Rank Oldbie

Quick'n'dirty 4x4 ordered dithering on FastDoom EGA 320x200. It shows promising results; what do you think, VileR?
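For reference, a common form of 4x4 ordered (Bayer) dithering picks between two colors per pixel by comparing a per-color mix level against a position-dependent threshold. This is a generic sketch, not necessarily how i_ega320.c does it; the `lo`/`hi`/`mix` tables are hypothetical and assumed to be precomputed from the DOOM palette.

```c
#include <stdint.h>

/* Standard 4x4 Bayer threshold matrix, values 0..15. */
static const uint8_t bayer4[4][4] = {
    { 0,  8,  2, 10},
    {12,  4, 14,  6},
    { 3, 11,  1,  9},
    {15,  7, 13,  5},
};

/* Hypothetical tables: lo[c] and hi[c] are the two nearest EGA colors
   for palette index c, and mix[c] (0..16) is how many of the 16 cells
   in a 4x4 tile should use hi[c]. */
static inline uint8_t ordered_dither_4x4(uint8_t c, int x, int y,
                                         const uint8_t lo[256],
                                         const uint8_t hi[256],
                                         const uint8_t mix[256])
{
    return (mix[c] > bayer4[y & 3][x & 3]) ? hi[c] : lo[c];
}
```

With `mix` = 0 a color is never dithered, with 16 it always uses `hi`, and values in between produce the familiar cross-hatched patterns.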

The attachments dos4gw_000.png, dos4gw_001.png and dos4gw_002.png are no longer available.

Edit: updated backbuffer-to-VRAM code is here https://github.com/viti95/FastDoom/blob/order … DOOM/i_ega320.c


Reply 742 of 1173, by appiah4

Rank l33t++

Is it possible to disable dithering for the HUD bar? Just an idea..

Reply 743 of 1173, by 7F20

Rank Member
appiah4 wrote on 2023-04-05, 13:01:

Is it possible to disable dithering for the HUD bar? Just an idea..

That's exactly my reaction. The HUD bar is cleaner/clearer without dithering.

Reply 744 of 1173, by ViTi95

Rank Oldbie

Yes, it's possible: the back-buffered screen is divided into blocks, so different parts of the screen can be copied to VRAM with or without dithering. Anyway, that was very quick'n'dirty; the dithering algorithm can be optimized much further, for example by only applying dithering to darker colors.
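A minimal sketch of per-block copying, assuming a hypothetical `screen_block` descriptor and the same kind of two-color tables a Bayer dither would use (none of these names come from FastDoom). Setting `mix` to 0 and `lo` to the plain color for bright palette entries is one way to get the "only dither darker colors" behavior.

```c
#include <stdint.h>

/* Hypothetical descriptor for one screen region (view, status bar, ...). */
typedef struct {
    int x, y, w, h;
    int dither;                 /* nonzero: ordered-dither this block */
} screen_block;

static const uint8_t bayer4[4][4] = {
    { 0,  8,  2, 10},
    {12,  4, 14,  6},
    { 3, 11,  1,  9},
    {15,  7, 13,  5},
};

/* plain[c]: nearest single EGA color for palette index c.
   lo/hi/mix: hypothetical two-color dither tables (mix 0..16).
   back and out share the same pitch; out holds one color per byte. */
void copy_block(const uint8_t *back, uint8_t *out, int pitch,
                const screen_block *b, const uint8_t plain[256],
                const uint8_t lo[256], const uint8_t hi[256],
                const uint8_t mix[256])
{
    for (int y = b->y; y < b->y + b->h; y++) {
        for (int x = b->x; x < b->x + b->w; x++) {
            uint8_t c = back[y * pitch + x];
            uint8_t e;
            if (!b->dither)
                e = plain[c];                       /* e.g. HUD: no dither */
            else
                e = (mix[c] > bayer4[y & 3][x & 3]) ? hi[c] : lo[c];
            out[y * pitch + x] = e;
        }
    }
}
```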


Reply 745 of 1173, by rasteri

User metadata
Rank Oldbie
Rank
Oldbie

I think the dithered HUD looks better; you can make out more detail.

Reply 746 of 1173, by VileR

Rank l33t
ViTi95 wrote on 2023-04-05, 11:54:

Quick'n'dirty 4x4 ordered dithering on FastDoom EGA 320x200. It shows promising results; what do you think, VileR?


The 4x4 ordered dithering itself looks good! But yeah, the question is how to apply it - in these shots the results are very high-contrast, with overly-bright colors abruptly giving way to complete black.

I guess it would be ideal if this could be combined with a manually-tweaked colormap, preserving the approach taken by DEAT, and making the transitions smoother - by judiciously rendering *some* colormap entries as 4:4-dither combinations. Especially the ones that are currently 100% black (as in my previous example), but others could be dithered too... without getting carried away, so the details don't become fuzzed out.
If there was a way to do that, I'd sure want to try playing with the possibilities.

This approach could help out elsewhere too... e.g. with 16-color composite CGA the dithering resolution is coarse, but the color-bleed effect makes some patterns smoother (closer to 'solid') than others. No need to jump the gun though.

[ WEB ] - [ BLOG ] - [ TUBE ] - [ CODE ]

Reply 747 of 1173, by ishadow

Rank Newbie

DOOM has a static 256-color palette (let's forget about the palette effects for pain and the radiation suit). This allows us to create color lookup tables for dithering. With 2x2 dithering we would need 4 precalculated color tables, using one of them for each pixel position. For example:
table 0 for pixel (0,0),
table 1 for pixel (1,0),
table 2 for pixel (0,1),
table 3 for pixel (1,1).

With that approach it is possible to manually pick every color. Since we're using a 16-color mode, it is also possible to store 2 colors in every byte (accessing each color with bitwise operations), which would halve the number of tables, with just one table per line.
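The table layout described above could be sketched like this in C. The table contents are hypothetical (meant to be hand-picked); `lookup4` uses the four separate tables indexed as in the list, and `pack_tables`/`lookup_packed` show the nibble-packed variant with one table per row parity.

```c
#include <stdint.h>

/* Hypothetical hand-tuned tables: dither_tables[i][c] is the EGA color
   (0..15) for palette index c at position i = (y & 1) * 2 + (x & 1). */
static uint8_t dither_tables[4][256];

/* Packed variant: one table per row parity, two colors per byte.
   Low nibble = even x, high nibble = odd x. */
static uint8_t packed_tables[2][256];

static inline uint8_t lookup4(uint8_t c, int x, int y)
{
    return dither_tables[((y & 1) << 1) | (x & 1)][c];
}

void pack_tables(void)
{
    for (int row = 0; row < 2; row++)
        for (int c = 0; c < 256; c++)
            packed_tables[row][c] =
                (uint8_t)((dither_tables[row * 2][c] & 15) |
                          ((dither_tables[row * 2 + 1][c] & 15) << 4));
}

static inline uint8_t lookup_packed(uint8_t c, int x, int y)
{
    /* select the nibble for this column with a shift and a mask */
    return (uint8_t)((packed_tables[y & 1][c] >> ((x & 1) * 4)) & 15);
}
```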

Anyway, don't expect great results with the EGA palette. It has only 4 levels of brightness (a bit more if you prioritize brightness over color accuracy), and DOOM's graphics are really heavy on lighting and brightness levels.
Dithered EGA looks really good in Wolfenstein 3D, but there's no lighting there, and while Wolfenstein 3D is a 256-color game, it started as a 16-color EGA game, so its 256-color palette is based on the 16-color EGA one.

To get DOOM looking good on EGA, we should tweak its 256-color palette to be closer to the EGA colors; that would make dithering look much better.

Reply 748 of 1173, by leileilol

Rank l33t++

You can't do gamma correction on EGA, so what if you do it with two colormaps instead (one deviating slightly in brightness, dithered along the span)? If there's room for a translucency table, surely you can have a double-decker dynamic colormap.
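A rough sketch of the two-colormap idea, assuming `cmap_a` is the normal light level and `cmap_b` a slightly brighter or darker hypothetical variant. Alternating them on a checkerboard gives an in-between shade. Texels are pre-sampled here for brevity; a real span drawer would step through the flat texture with fixed-point coordinates.

```c
#include <stdint.h>

/* Hypothetical span drawer that alternates two colormaps per pixel.
   texels: already-sampled texture bytes for this span.
   x0, y: screen position of the first pixel (drives the checkerboard). */
void draw_span_2cmap(const uint8_t *texels, uint8_t *dest, int count,
                     int x0, int y,
                     const uint8_t *cmap_a, const uint8_t *cmap_b)
{
    for (int i = 0; i < count; i++) {
        int x = x0 + i;
        /* checkerboard: flip the colormap on every pixel and every row */
        const uint8_t *cm = ((x ^ y) & 1) ? cmap_b : cmap_a;
        dest[i] = cm[texels[i]];
    }
}
```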

(also please don't make the lost souls transparent)

long live PCem

Reply 749 of 1173, by ishadow

Rank Newbie

I was experimenting with the DOOM palette to improve the dithered EGA look. At first I tried shifting colors towards the EGA palette, but it didn't help. Then I realized that the dithered image lacks color and is too dark, so I increased saturation by 60% in Photoshop and increased brightness slightly.
I was using regular DOOM with an EGA dithering shader for DOSBox. Unfortunately it works at an increased resolution of 640x400 (4 pixels for each real pixel), but it should give similar results at 320x200. I've attached EGA.WAD, which contains the modified 256-color palette.
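Photoshop's exact saturation algorithm isn't specified here, so the following is only an approximation of the same kind of palette tweak: each RGB channel is pushed away from the entry's luma by a percentage, then shifted by a brightness offset and clamped. The function name and parameters are made up for illustration; it could be applied to each 256-entry palette in a PLAYPAL-style lump.

```c
#include <stdint.h>

static uint8_t clamp255(int v)
{
    return (uint8_t)(v < 0 ? 0 : (v > 255 ? 255 : v));
}

/* Approximate saturation/brightness boost on an RGB palette
   (3 bytes per entry). sat_percent = 60 scales each channel's distance
   from the luma by 1.6x; bright is added to every channel afterwards.
   This is a simple luma-based approximation, not Photoshop's algorithm. */
void boost_palette(uint8_t *pal, int entries, int sat_percent, int bright)
{
    for (int i = 0; i < entries; i++) {
        uint8_t *p = pal + 3 * i;
        /* integer Rec. 601 luma weights */
        int luma = (299 * p[0] + 587 * p[1] + 114 * p[2]) / 1000;
        for (int ch = 0; ch < 3; ch++) {
            int d = p[ch] - luma;
            p[ch] = clamp255(luma + d * (100 + sat_percent) / 100 + bright);
        }
    }
}
```

Grays are left unchanged by the saturation step (their distance from luma is zero), so only the brightness offset affects them.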

Reply 750 of 1173, by appiah4

Rank l33t++

That is actually pretty damn good looking.

Reply 751 of 1173, by maxtherabbit

Rank l33t

Is anyone else noticing that the FM music doesn't play when set to Sound Blaster? It's been happening to me for the last two versions.

EDIT: actually it is working, it's just stupid quiet compared to vanilla, even with the volume slider at max

Reply 752 of 1173, by 7F20

Rank Member
maxtherabbit wrote on 2023-04-10, 16:56:

EDIT: actually it is working, it's just stupid quiet compared to vanilla, even with the volume slider at max

Yes, I noticed that the music volume is *extremely* low for me using AdLib music, especially compared to the sound effects. I have to crank the volume on my stereo to hear it.

IIRC, I had some success with at least one FastDoom release by adjusting the mixer settings in the DOSBox config file, but I think it didn't work for all of them. I have resorted to just cranking the sliders all the way and putting my stereo volume super high.

I haven't actually compared it to vanilla Doom since noticing this, so I can do that.

Reply 754 of 1173, by rasz_pl

Rank l33t

https://github.com/viti95/FastDoom/commit/42d … 877d859f72d60c3 nice
Did you start optimizing drawspans with 16-bit writes after the "Re: Cirrus Logic GD5429 VLB significantly slower than a TSENG ET4000" thread? 😀
I wonder if 32-bit writes (where possible) could be even faster. For example https://github.com/viti95/FastDoom/blob/68130 … inearp.asm#L225:

        mov   dl,[eax]                 ; translate color
        mov   dh,dl
        mov   [edi+PLANE+PCOL*4],dx    ; write pixel
        mov   [edi+PLANE+PCOL*4+2],dx  ; write pixel

to

        mov   dl,[eax]                 ; translate color
        mov   dh,dl
        mov   ax,dx
        shl   edx,0x10
        mov   dx,ax
        mov   [edi+PLANE+PCOL*4],edx   ; write pixel

AFAIK the slow part here is touching the ISA-mapped memory buffer, so could a few more register instructions be faster than doing two ISA writes instead of one? This is just speculation and would need verifying on 8-bit ISA VGA, 16-bit ISA VGA, and proper 32-bit capable VLB/PCI cards.
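In C terms, the two asm variants differ only in how many bus writes store the same four bytes: two 16-bit stores versus one 32-bit store of the color replicated with a multiply. A hypothetical sketch (function names are illustrative) showing that both leave identical memory contents; whether the single write is actually faster is exactly the open question above.

```c
#include <stdint.h>
#include <string.h>

/* Current approach: the doubled color is stored twice as 16-bit words. */
void write_two_words(uint8_t *dst, uint8_t color)
{
    uint16_t w = (uint16_t)(color | (color << 8));
    memcpy(dst, &w, 2);
    memcpy(dst + 2, &w, 2);
}

/* Proposed approach: the color replicated into all four bytes
   (multiplying by 0x01010101) and stored once as a 32-bit word. */
void write_one_dword(uint8_t *dst, uint8_t color)
{
    uint32_t d = (uint32_t)color * 0x01010101u;
    memcpy(dst, &d, 4);
}
```

Because all four bytes are the same color, byte order does not matter for the result.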

https://github.com/raszpl/FIC-486-GAC-2-Cache-Module for AT&T Globalyst
https://github.com/raszpl/386RC-16 memory board
https://github.com/raszpl/440BX Reference Design adapted to Kicad
https://github.com/raszpl/Zenith_ZBIOS MFM-300 Monitor

Reply 755 of 1173, by ViTi95

Rank Oldbie

Yes, I've added potato and low detail modes to the backbuffered and VBE direct executables (which were missing them); those modes require writing 2 or 4 pixels for each pixel drawn. It's slower than the original Mode X executable (much more data is copied to RAM/VRAM), so I decided to optimize them a little with 16-bit writes.

The problem with 32-bit writes here is that there is no fast way to extend an 8-bit color, because the low 16-bit half can't be copied to the high half without using two registers and a shift instruction. Also, we are very limited here since only the EDX register is available, as EAX has the colormap lookup table and is reused on each loop.

I'm thinking of preprocessing the colormap and extending it automatically in low or potato modes, to avoid that real-time conversion.
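A minimal sketch of that preprocessing, assuming one 256-entry light level is converted at a time (the names are hypothetical). The trade-off is memory: a 32-bit table is four times the size of the 8-bit one, so DOOM's 34 colormaps would grow from 8,704 bytes to 34,816 bytes.

```c
#include <stdint.h>

/* Hypothetical: widen one 8-bit colormap (256 entries) into 16-bit and
   32-bit tables, so low detail (2 px) and potato (4 px) modes can fetch
   the already-replicated value with a single lookup. */
void extend_colormap(const uint8_t cmap[256],
                     uint16_t cmap16[256], uint32_t cmap32[256])
{
    for (int i = 0; i < 256; i++) {
        cmap16[i] = (uint16_t)(cmap[i] * 0x0101u);       /* cc   */
        cmap32[i] = (uint32_t)cmap[i] * 0x01010101u;     /* cccc */
    }
}
```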

Edit: I'll do the same optimization for column rendering, as it has proven to be a little bit faster. On a 486DX4-75, FDOOM13H.EXE has gone from 49.765 fps to 51.945 fps (about 4% faster) in potato mode.
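For reference, the relative speedup from those two figures works out as:

```latex
\frac{51.945 - 49.765}{49.765} = \frac{2.180}{49.765} \approx 0.0438 \approx 4.4\%
```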


Reply 756 of 1173, by M-HT

Rank Newbie
ViTi95 wrote on 2023-04-14, 06:31:

The problem with 32-bit writes here is that there is no fast way to extend an 8-bit color, because the low 16-bit half can't be copied to the high half without using two registers and a shift instruction. Also, we are very limited here since only the EDX register is available, as EAX has the colormap lookup table and is reused on each loop.

It's possible (though maybe not fast) like this:

        mov   dl,[eax]                 ; translate color
        mov   dh,dl
        shl   edx,8
        mov   dl,dh
        shl   edx,8
        mov   dl,dh
        mov   [edi+PLANE+PCOL*4],edx   ; write pixel
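To double-check the trick, here is a small C emulation of the register sequence above (the function name is made up). The upper half of EDX can contain anything beforehand; the two shift-and-copy steps overwrite all four bytes with the translated color.

```c
#include <stdint.h>

/* Emulates the asm sequence step by step. edx_in models whatever was
   already in EDX before the color was loaded into DL. */
uint32_t replicate_like_asm(uint32_t edx_in, uint8_t color)
{
    uint32_t edx = (edx_in & 0xFFFFFF00u) | color;          /* mov dl,[eax] */
    edx = (edx & 0xFFFF00FFu) | ((uint32_t)color << 8);     /* mov dh,dl    */
    edx <<= 8;                                              /* shl edx,8    */
    edx = (edx & 0xFFFFFF00u) | ((edx >> 8) & 0xFFu);       /* mov dl,dh    */
    edx <<= 8;                                              /* shl edx,8    */
    edx = (edx & 0xFFFFFF00u) | ((edx >> 8) & 0xFFu);       /* mov dl,dh    */
    return edx;                                             /* cccc         */
}
```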

Reply 757 of 1173, by rasz_pl

Rank l33t
ViTi95 wrote on 2023-04-14, 06:31:

as EAX has the colormap lookup table and is reused on each loop.

now I see it, and it's all documented at the top of the file 🙁

        shl   edx,8
        mov   dl,dh
        shl   edx,8
        mov   dl,dh

SHL is 3 clocks on a 386 and MOV reg,reg is 2, so 10 clocks total. The big question is whether my assumption is correct that a write to the ISA-mapped memory region slows the CPU down. I know for sure that the port I/O OUT instruction does. Did early 386/486 chipsets have any FIFO for ISA writes?
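Spelled out, using the 386 timings quoted above for the four register instructions:

```latex
\underbrace{2 \times 3}_{2\times\,\mathrm{SHL}} + \underbrace{2 \times 2}_{2\times\,\mathrm{MOV}} = 10 \text{ clocks}
```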


Reply 758 of 1173, by rasz_pl

Rank l33t

so I was reading Cirrus Logic datasheet, as one does 😀 Apparently CL-GD5426/’28/’29 BitBLT lets you draw walls without calculating offsets:
-set width of blit to 1 pixel
-set height of blit to whatever the height of a column is
-set blit source to System Memory
That last bit does something magical, subsequent writes to Video memory ignore address and deposit written bytes in a column automagically. Writes in this mode must be grouped into DWORDs for some reason.
"BLT source is system memory rather than display memory. The CPU performs the system bus transfers; the CL-GD5426/’28/’29 ignores the address provided with such transfers. The CPU is required to transfer data in increments of four bytes. If the total number of bytes moved for a BLT is not a multiple of four, the CPU must write ‘extra’ bytes."

There is also BLT Raster Operation Register with a long list of possible computations, maybe one could be adopted for drawing sprites removing the need for slow read from Video memory.
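A hypothetical helper for the DWORD rule quoted from the datasheet: a column of n bytes is padded to a multiple of four before being written to the BLT aperture. The BLT register setup (width, height, source select) is deliberately left out rather than guessing register indices; only the padding arithmetic is shown.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical: pack n pixel bytes into 32-bit words for a
   system-memory-source BLT. The datasheet requires 4-byte transfers,
   so the last word is padded with 'extra' bytes (zeros here).
   Returns the number of dwords the CPU would then write. */
int pack_blt_column(const uint8_t *pixels, int n, uint32_t *out)
{
    int words = (n + 3) / 4;                 /* round up to whole dwords */
    memset(out, 0, (size_t)words * 4);       /* padding bytes = 0        */
    memcpy(out, pixels, (size_t)n);          /* pixel bytes in order     */
    return words;
}
```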


Reply 759 of 1173, by ViTi95

Rank Oldbie
rasz_pl wrote on 2023-04-14, 10:31:
ViTi95 wrote on 2023-04-14, 06:31:

as EAX has the colormap lookup table and is reused on each loop.

now I see it, and it's all documented at the top of the file 🙁

        shl   edx,8
        mov   dl,dh
        shl   edx,8
        mov   dl,dh

SHL is 3 clocks on a 386 and MOV reg,reg is 2, so 10 clocks total. The big question is whether my assumption is correct that a write to the ISA-mapped memory region slows the CPU down. I know for sure that the port I/O OUT instruction does. Did early 386/486 chipsets have any FIFO for ISA writes?

Writing to the ISA bus is always slow, no matter if the chipset can merge 8-bit writes or you use a fast 16-bit ISA card. The less you write to it, the better.

rasz_pl wrote on 2023-04-19, 01:01:

so I was reading Cirrus Logic datasheet, as one does 😀 Apparently CL-GD5426/’28/’29 BitBLT lets you draw walls without calculating offsets:
-set width of blit to 1 pixel
-set height of blit to whatever the height of a column is
-set blit source to System Memory
That last bit does something magical, subsequent writes to Video memory ignore address and deposit written bytes in a column automagically. Writes in this mode must be grouped into DWORDs for some reason.
"BLT source is system memory rather than display memory. The CPU performs the system bus transfers; the CL-GD5426/’28/’29 ignores the address provided with such transfers. The CPU is required to transfer data in increments of four bytes. If the total number of bytes moved for a BLT is not a multiple of four, the CPU must write ‘extra’ bytes."

There is also BLT Raster Operation Register with a long list of possible computations, maybe one could be adopted for drawing sprites removing the need for slow read from Video memory.

I don't think this is usable for rendering, since it only copies data; it can't scale a column. But it could be useful to accelerate copies from the backbuffer to VRAM in the FastDoom 13H and VBR modes. I'll take a more in-depth look.
