Reply 20 of 48, by Oetker
https://fabiensanglard.net/ega/ this includes a quote from Carmack about one of the workarounds.
rasz_pl wrote on 2023-09-06, 20:23:On cards that do not generate Vsync interrupts would an AND gate on VGA V H and going straight to one of ISA interrupt lines accomplish same thing?
There was a brief conversation in "The myth of the vertical retrace interrupt on EGA/VGA" about adding support for a vertical retrace signal to cards that don't have it.
Unfortunately, for timing purposes a vertical retrace signal does not help with synchronizing scrolling on all VGA cards, since different cards latch the two scroll registers, DS and HS, at different times. For game logic timing, a vertical retrace signal would help.
Oetker wrote on 2023-09-06, 20:37:https://fabiensanglard.net/ega/ this includes a quote from Carmack about one of the workarounds.
Fabien's website discusses the Adaptive Tile Refresh hack, the later smooth hardware scrolling, and the Alliance Semi ProMotion SVGA quirk (which I first misattributed to Tseng/S3 Vision, see the EDIT below) that prevented address space wraparound on those cards (effectively demoting them to the EGA card category, which also might not have wrapped, since EGA cards didn't decode the whole address space depending on the amount of VRAM installed). I haven't really read anywhere about Carmack or Romero or other id Software people actually addressing the technical implementation of synchronization for hardware scrolling - not that I'd expect to either, since these minute hardware-level details really get lost in memory.
EDIT: My bad, I completely mixed up the S3 Vision with another card: it was instead an Alliance Semiconductor ProMotion 3210 that had the wraparound bug. See "Which >256KB SVGA adapters forgot to implement the VGA address space wrapping after 256KB mark?" for details.
Btw, going a bit on a tangent, when Carmack said in Lex Fridman's interview that Fabien quotes:
On some of those cards there was a weird compatibility quirk again because nobody thought this was what it was designed to do
it makes me feel quite sad he would simplify it like that after all these years for the audience. IBM engineers most definitely did have hardware scrolling in mind when they developed the EGA and VGA adapters. They went out of their way to make sure both CRTC and Attribute Controller subsystems *separately* had scrolling support in order to specifically provide pixel level panning, and they knowingly added support for sizing a larger virtual display width (framebuffer stride in modern GPU parlance), which is a necessity to be able to draw the narrow strips of offscreen video memory to get smooth horizontal scrolling.
Address decoding wrapping around at the 256KB mark was also natural and by design. When IBM increased the memory size from CGA to EGA, they specifically added new registers to control whether the address space should wrap or extend (Register 3D4h/17h bit 5, Address Wrap Select) - they knew all about this type of behavior potentially becoming an issue in the future. My strong impression is that it was not at all that "nobody thought this was what it was designed to do", but rather "everyone thought this was what it was designed to do". From the selection of SVGA cards that I have, it looks like the vast majority of early >256KB adapters that shipped got the wraparound right on the first go (and added similar extension registers for wrapping addresses beyond the 256KB mark); it was only a few, e.g. Tseng and S3 in their first Vision cards, that dropped the ball here.
It does make for a great "hero" story to tell, "we invented hardware scrolling on the PC", similar to Abrash being credited with having "invented" Mode X, even though the reality was that these were all features that IBM intentionally designed and architected into their graphics adapters. (It was rather the "Chained" mode that was special; "Mode X" was the "Normal" operating mode.) Maybe IBM didn't have a good developer documentation network back then, or these game developers never thought to ask IBM for the documentation, which led them to independently discover what the hardware's functionality is.
Carmack and Abrash's achievements were really "we figured out how the hardware is supposed to work without reading the documentation". Compare these to e.g. Commodore 64 VSP Scrolling: https://kodiak64.com/blog/future-of-VSP-scrolling , or the 1024 colors on CGA: https://int10h.org/blog/2015/04/cga-in-1024-c … de-illustrated/ . Now those are what real heroic "We made the hardware do what we want" tech features are like!
(tangent mode off, I love Carmack and Abrash and Keen)
clb wrote on 2023-09-06, 21:32:Carmack and Abrash's achievements were really "we figured out how the hardware is supposed to work without reading the documentation". Compare these to e.g. Commodore 64 VSP Scrolling: https://kodiak64.com/blog/future-of-VSP-scrolling , or the 1024 colors on CGA: https://int10h.org/blog/2015/04/cga-in-1024-c … de-illustrated/ . Now those are what real heroic "We made the hardware do what we want" tech features are like!
(tangent mode off, I love Carmack and Abrash and Keen)
real heroes respond to "RTFM" with "GFY"
maxtherabbit wrote on 2023-09-07, 01:22:real heroes respond to "RTFM" with "GFY"
Yeah, until "kaboom!" or "white smoke" happens. After that the "real hero(es)" go silent and usually think "why the f**k didn't I "RTFM" first?!!" 😁
P.S. Just out of curiosity: Anyone ever successfully used in-game menu option for "fixing" this bug/problem?
from СМ630 to Ryzen gen. 3
engineer's five pennies: this world goes south since everything's run by financiers and economists
this isn't voice chat, yet some people, overusing online communications, "talk" and "hear voices"
analog_programmer wrote on 2023-09-07, 02:05:P.S. Just out of curiosity: Anyone ever successfully used in-game menu option for "fixing" this bug/problem?
The Fix Jerky Motion option does help the game look for longer blank periods (doubled from 5 to 10 port reads long), helping it not mistake a hblank for a vblank as easily, so it does help drawing be more often synchronized to vblank. It did work to 100% synchronize the scroll register writes on my PCI ATI Mach64 card on a 80MHz 486.
Though I'd presume that very fast 66 MHz bus PCI cards or AGP cards might be able to perform even ten I/O port accesses within a hblank, so the Fix Jerky Motion option would not do enough to help synchronizing to vblank there either.
But even if scrolling registers are written in a correctly synchronized manner, the game does still exhibit frequent stutter, because the game logic timing is decoupled from vsync. That unfortunately does mean that even if the PC is fast enough that Keen always hits its vblanks, it will still never play completely free from microstuttering, since the PIT0 timer advances will periodically straddle the vsync, leading to a skipped frame.
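The difference between the 5-read and 10-read checks can be sketched in portable C. This is only an illustration of the idea described above, not Keen's actual code: `wait_for_long_blank`, the `status_reader` callback, and the `sim_display` mock with its read counts are all hypothetical. On real hardware the callback would wrap a read of Input Status #1 (port 3DAh), whose bit 0 is set during both hblank and vblank.

```c
#include <stdint.h>

/* Input Status #1 (port 3DAh), bit 0: set while the display is blanked,
   which is true in hblank as well as in vblank. */
#define DISPLAY_DISABLED 0x01

/* Hypothetical port-read callback so the logic can run off real hardware. */
typedef uint8_t (*status_reader)(void *ctx);

/* Spin until min_run consecutive reads report "blanked".
   Returns the number of reads consumed, or -1 if max_reads ran out. */
long wait_for_long_blank(status_reader read_status, void *ctx,
                         int min_run, long max_reads)
{
    int run = 0;
    long n;
    for (n = 1; n <= max_reads; n++) {
        if (read_status(ctx) & DISPLAY_DISABLED) {
            if (++run >= min_run)
                return n;   /* blank lasted long enough: treat as vblank */
        } else {
            run = 0;        /* visible pixels seen: that was only a hblank */
        }
    }
    return -1;
}

/* Mock display (an assumption for illustration): every read advances time
   by one "I/O duration". Each scanline is `visible` reads of active video
   followed by `hblank` reads of blanking; after `lines` scanlines there
   are `vblank` reads of vertical blanking. */
typedef struct {
    long pos;
    int visible;
    int hblank;
    int lines;
    int vblank;
} sim_display;

uint8_t sim_read(void *ctx)
{
    sim_display *d = (sim_display *)ctx;
    long line_len = d->visible + d->hblank;
    long active_len = line_len * d->lines;
    long p = d->pos++ % (active_len + d->vblank);
    if (p >= active_len)
        return DISPLAY_DISABLED;                      /* vertical blank */
    return (p % line_len) >= d->visible ? DISPLAY_DISABLED : 0;
}
```

With a simulated hblank that lasts 7 reads (a fast bus), a threshold of 5 fires inside the very first hblank, while a threshold of 10 only fires once the blank run continues into the vertical blank.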
^And that's why DOS-based DOS-PC emulators are technically kind of neat (DOSBox+HX Extender, QB8086). 😁
But really, wouldn't it be technically possible to trap EGA/VGA calls and emulate their video logic in software, then output the image via, say, VESA VBE ?
Maybe by using a memory manager like QEMM (has largest API/ABI) to trap things by using V86 or VME ?
There were programs that could virtualize CGA/VGA on a 386+. PC-MOS/386 comes to mind, which had terminal support.
Or DESQView, which could run multiple graphical programs (afaik). Or Wendin DOS (not sure), another multi-tasking DOS ?
Edit: That's not completely on topic, maybe, but I vaguely remember something about VGA wrap-around/hardware-scrolling from reading the web.
The Japanese versions of DOS/V handled scrolling differently (/V = VGA). MS-DOS /V version 5 used VGA's hardware feature, while 6.20 used a software solution.
- I'm sorry, I forgot the details here. But it's interesting that newer versions no longer relied on the VGA hardware.
Maybe because VGA hardware seemingly became unreliable as time went on? Newer graphics cards no longer being accurate here?
It comes to mind because Japanese DOS works in a different way; some versions even shipped with a memory manager based on a hacked Windows 3.0 kernel.
Edit: My bad. The Windows 3.0 thing was on the PC-98 platform (article/blog), not DOS/V platform.
That being said, these things just came to mind while reading this topic. Please just ignore and go on. 😀
"Time, it seems, doesn't flow. For some it's fast, for some it's slow.
In what to one race is no time at all, another race can rise and fall..." - The Minstrel
//My video channel//
clb wrote on 2023-09-07, 07:32:The Fix Jerky Motion option does help the game look for longer blank periods (doubled from 5 to 10 port reads long), helping it not mistake a hblank for a vblank as easily, so it does help drawing be more often synchronized to vblank. It did work to 100% synchronize the scroll register writes on my PCI ATI Mach64 card on a 80MHz 486.
Though I'd presume that very fast 66 MHz bus PCI cards or AGP cards might be able to perform even ten I/O port accesses within a hblank, so the Fix Jerky Motion option would not do enough to help synchronizing to vblank there either.
But even if scrolling registers are written in a correctly synchronized manner, the game does still exhibit frequent stutter, because the game logic timing is decoupled from vsync. That unfortunately does mean that even if the PC is fast enough that Keen always hits its vblanks, it will still never play completely free from microstuttering, since the PIT0 timer advances will periodically straddle the vsync, leading to a skipped frame.
Thanks for your very detailed technical answer! So, the in-game "fix" actually has a minimal chance of working on 100% with most video cards. I think out there were fan-modded CK4-6 .EXEs with some working fix, but I've never tried them. It will be interesting to see what is their solution to the problem.
maxtherabbit wrote on 2023-09-07, 01:22:clb wrote on 2023-09-06, 21:32:Carmack and Abrash's achievements were really "we figured out how the hardware is supposed to work without reading the documentation". Compare these to e.g. Commodore 64 VSP Scrolling: https://kodiak64.com/blog/future-of-VSP-scrolling , or the 1024 colors on CGA: https://int10h.org/blog/2015/04/cga-in-1024-c … de-illustrated/ . Now those are what real heroic "We made the hardware do what we want" tech features are like!
(tangent mode off, I love Carmack and Abrash and Keen)
real heroes respond to "RTFM" with "GFY"
Speaking of, doesn't Billy (Cmdr. Keen) read a book in-game? 😁
If the player does nothing for a while (in episode 4), he sits down and reads that book.
I always thought he was reading the f*cking manual here.
Because the proportions are so large, it can't possibly be a comic book.
Btw, in my, uh, cultural area, GFY alone isn't enough. Usually, person A tells person B explicitly where to GFY.
The knee is one of the many possibilities (verbally), for example. 🤣 Ok, enough of that. I'm out. Just came to mind. 😅
analog_programmer wrote on 2023-09-07, 08:01:Thanks for your very detailed technical answer! So, the in-game "fix" actually has a minimal chance of working on 100% with most video cards.
Yup. With most modern video cards. The original Keen 1.0 vsync code was a hack (had they benchmarked the hblank length in terms of I/O reads dynamically, e.g. at Keen startup or on the fly in the blanks, I would have considered it a good solution, but they just hardcoded a "5 I/Os should do it" check), and the later "let's make it 10 I/Os" Fix Jerky Motion patch was a hack on top of a hack as well. (Of course, it is easy to have hindsight on this, since we see how the hardware ended up evolving, but even back then, I think it would have been a bit icky to assume that the duration of an I/O port read would be guaranteed constant.)
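The dynamic benchmark suggested above could look roughly like this. It is a sketch under assumptions, not anything Keen shipped: `calibrate_blank_threshold` is a hypothetical helper that takes status bytes sampled back-to-back from port 3DAh over at least one full frame (bit 0 set while blanked), and it assumes the vblank run is at least twice as long as any hblank run.

```c
#include <stddef.h>
#include <stdint.h>

/* Derive a vblank-detection threshold from back-to-back status samples.
   Returns (longest hblank run + 1), or 0 if no blanking was seen at all.
   Assumption: the vblank run is at least twice as long as any hblank run. */
unsigned calibrate_blank_threshold(const uint8_t *samples, size_t count)
{
    size_t i, run = 0, longest = 0, longest_hblank = 0;
    int seen_visible = 0;

    /* Pass 1: the longest blank run in the buffer is taken to be vblank. */
    for (i = 0; i < count; i++) {
        if (samples[i] & 0x01) {
            if (++run > longest)
                longest = run;
        } else {
            run = 0;
        }
    }

    /* Pass 2: the longest complete run shorter than half of that is the
       worst-case hblank. Runs cut off by either end of the buffer are
       skipped, since they may be a partial vblank. */
    run = 0;
    for (i = 0; i < count; i++) {
        if (samples[i] & 0x01) {
            run++;
        } else {
            if (seen_visible && run * 2 < longest && run > longest_hblank)
                longest_hblank = run;
            seen_visible = 1;
            run = 0;
        }
    }
    return longest ? (unsigned)(longest_hblank + 1) : 0;
}
```

The returned value would then replace the hardcoded 5 or 10 in the wait loop, so a faster bus that fits more reads into a hblank automatically gets a larger threshold.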
Out of curiosity, I did a quick test on my 486 system of how long each hblank/vblank period actually takes in EGA Mode 0Dh (the 320x200 16-color mode that Keen uses):
On a Paradise ISA VGA card, with the ISA bus running at 4.902 MHz, I measure hblanks to be 2-3 I/O reads long.
Upping the ISA bus to 6.522 MHz, I/O port operations become faster. Hblank still takes a constant time, so one can do more I/Os in a hblank, this time 3-4.
And then clocking the ISA bus up to the maximum that my motherboard allows, 10.008 MHz, the board is able to do 3-5 ISA bus I/Os within a hblank.
So on this board, the original 5 waits without Fix Jerky Motion in Keen will be just enough to work. (Although the Keen code is in assembly, this test was in C++; I know from reading the Turbo C++ disassembly that it misses a couple of optimization opportunities, leading to a few extra instructions.)
Then finally, looking at the ATI Mach 64 on a 33MHz PCI bus, one can do 6-8 bus I/Os within a hblank, so the original hblank code no longer works and the Fix Jerky Motion "fix" is needed.
I don't have a 66 MHz PCI or an AGP setup at hand right now to see what would happen there, but it would not be surprising to see about 2x I/O performance on those systems, which would break past even the delays that Fix Jerky Motion added.
analog_programmer wrote on 2023-09-07, 08:01:I think out there were fan-modded CK4-6 .EXEs with some working fix, but I've never tried them. It will be interesting to see what is their solution to the problem.
Yeah, K1n9_Duk3 observed in https://pckf.com/viewtopic.php?t=6192 that before Keen 4 1.0 release from November 22, 1991 (that is the first hacky "wait 5 I/Os" synchronization code), the "1.0 Special Demo Version" released by FormGen just a month earlier ( https://keenwiki.shikadi.net/wiki/Keen_4_Versions ) contained a completely different piece of code for scroll register write synchronization. Disassembling and cleaning up the code in that version, it reads
so what he did was he extracted that code out and crafted a load time memory rewrite patch to different Keen games (and others made with the same engine) that replaced the "new" sync code with this "old" form.
Looking through the above code, it exhibits an interesting separated "two-stage" sync mechanism. The DS register is updated during start of a visible scanline (good choice, since cards latch that either at start or end of vsync), and then the HS register is updated during start of vertical retrace. (Not the best choice, since that will be out of sync e.g. on Tseng ET6000, which latches on to both DS and HS registers at start of vsync, so this code will glitch badly on Tseng ET6000).
It is interesting to observe that ID had changed the sync code so close to release, which suggests that they were struggling with this area right until the last minute. (Which, again, is no wonder, since now with the help of CRT Terminator, in the table in comment "Re: Solved(?): Why does Commander Keen 4-6 hardware scrolling glitch on ATI (Mach) PCI video cards", we are able to observe exactly how divergent the behavior is across different cards.)
Why didn't ID ship the final game with the synchronization code from the FormGen 1.0 Special Demo release then, if it works so much better?
Maybe it was that they had Tseng cards and noticed they were glitching with this version, and thought something had to change to support Tseng. Or maybe the reason is that they benchmarked that this sync code had worse performance than the Keen 1.0 release sync code. This code always waits at least a duration from an active visible pixel up until start of vsync, i.e. the whole vertical front porch area, which is typically 6 scanlines long in Mode 0Dh. The Keen 1.0 sync code does not have that issue.
Or maybe they were running into issues with long interrupts, since they do need to serve interrupts in the timing sensitive wait loop for the vsync part. Not quite sure.
In any case, for fast PCs and any VGA card that latches the scroll registers differently than the Tseng ET6000, this demo sync code works much better.
Jo22 wrote on 2023-09-07, 07:46:
Edit: That's not completely on topic, maybe, but I vaguely remember something about VGA wrap-around/hardware-scrolling from reading the web.
The Japanese versions of DOS/V handled scrolling differently (/V = VGA). MS-DOS /V version 5 used VGA's hardware feature, while 6.20 used a software solution.
- I'm sorry, I forgot the details here. But it's interesting that newer versions no longer relied on the VGA hardware.
Maybe because VGA hardware seemingly became unreliable as time went on? Newer graphics cards no longer being accurate here?
It comes to mind because Japanese DOS works in a different way; some versions even shipped with a memory manager based on a hacked Windows 3.0 kernel.
Edit: My bad. The Windows 3.0 thing was on the PC-98 platform (article/blog), not DOS/V platform.
Now that you mentioned Japanese DOS, I do remember a Japanese text scroll speed comparison. I recently saw it again in an Asianometry video: https://www.youtube.com/watch?v=CEtgzO-Im8w&t=1264s - something about NEC modifying DOS to use hardware scrolling on NEC computers.
https://github.com/raszpl/FIC-486-GAC-2-Cache-Module for AT&T Globalyst
https://github.com/raszpl/386RC-16 memory board
https://github.com/raszpl/440BX Reference Design adapted to Kicad
https://github.com/raszpl/Zenith_ZBIOS MFM-300 Monitor
Oh interesting.. looks like I am able to make Keen 4 go smooth at 70fps fully synchronized to vsync even without needing to rely on CRT Terminator's hardware frame/vblank counter register. No microstuttering in sight, and no more of that "cinematic" 35fps. :p
Hmh, darn, I kinda wished that the hardware frame vblank synchronization register could have been CRT Terminator's killer feature...
Or, maybe I'll have to bury this non-CRT Terminator requiring implementation and pretend it didn't exist.. I bet that's what big vga would do. 😁 (just kidding, I will push the code to GitHub at some point)
As you noted on PCKF, the game doesn't really work properly at 70Hz: things like ledge-grabbing become quite buggy, and the physics behaves slightly differently.
This is, in part, because the game actually has two ways individual entities (objects / sprites / actors) can be updated: one operates on the game's 'tick' clock, and the other is called once per rendered frame, regardless of how much in-game time has elapsed. If you increase the framerate to 1 tick/frame (70Hz), the per-frame updates now happen at twice (or more) the rate compared to the tick-based ones. So 35Hz is as high as you can go while maintaining the physics. Also, things like demo playback are locked to 3 ticks/frame, and can't go faster as they require the deterministic simulation.
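The mismatch can be illustrated with a toy loop. This is a minimal sketch and not the game's code: `count_updates` and `update_counts` are hypothetical names, and the loop only counts how often each kind of update would run over a fixed span of ticks.

```c
/* Tally of how much work each update path receives. */
typedef struct {
    long tick_updates;   /* tick-clock logic: scales with elapsed ticks */
    long frame_updates;  /* per-frame logic: runs once per rendered frame */
} update_counts;

/* Simulate rendering frames that each span ticks_per_frame game ticks. */
update_counts count_updates(long total_ticks, int ticks_per_frame)
{
    update_counts c = {0, 0};
    long t;
    for (t = 0; t + ticks_per_frame <= total_ticks; t += ticks_per_frame) {
        c.tick_updates += ticks_per_frame; /* advances by elapsed ticks */
        c.frame_updates += 1;              /* ignores elapsed time */
    }
    return c;
}
```

Over 70 ticks (one second), 2 ticks/frame gives 35 per-frame updates while 1 tick/frame gives 70, even though the tick-based logic advances by the same 70 ticks in both cases. Anything tuned around the per-frame path therefore runs twice as fast at 70Hz.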
So 70Hz Keen seems doomed to remain a curiosity, rather than a good way of actually playing the games. Omnispeak does support this with the undocumented "vl_minTicks = 1" config option, and the old Steam release of Keen Dreams added an F10+V cheat, which both ran the game at 70Hz and made horizontal scrolling run per-pixel instead of every two pixels. (It also disabled the sprite shift emulation, so sprites could move on pixel boundaries, not on 2/4/8 pixel boundaries depending on how many bitshifted copies there were).
It's also worth noting that Keen Dreams has yet another slightly earlier version of the vsync code: it's basically the same as the Keen 4 Special Demo, but doesn't have any check for the total elapsed ticks.
Also of note: the Keen games do use frames to time some delays, mostly for UI/Cutscene purposes. The WaitVBL (Keen 1–3) / VW_WaitVBL (Keen 4–6) function just waits until vblank is disabled, then enabled, in a loop. You can also force Keen 4–6 to wait for extra VBlanks each frame with this function using the F10+V cheat.
Finally, the PIT-based timer code uses 1192030 Hz as the assumed base timer frequency, but most documentation for the PIT gives a frequency of 1193182 Hz. Patching that might make the microstutters less frequent.
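To get a feel for the size of that error, here is the arithmetic as a small C sketch. The helper names are hypothetical, and the 70 Hz request is only an assumed example rate, not necessarily the rate Keen actually programs into the timer.

```c
/* Base frequency most PIT documentation gives, in Hz. */
#define PIT_DOCUMENTED_HZ 1193182UL

/* Divisor a program would load into PIT channel 0 for a requested rate,
   given the base frequency it assumes (integer division, as in DOS code). */
unsigned long pit_divisor(unsigned long assumed_base_hz, unsigned long rate_hz)
{
    return assumed_base_hz / rate_hz;
}

/* Rate the hardware actually produces for a given divisor. */
double pit_actual_rate(unsigned long divisor)
{
    return (double)PIT_DOCUMENTED_HZ / (double)divisor;
}
```

For a requested 70 Hz, assuming 1192030 Hz gives a divisor of 17029 and an actual rate of about 70.07 Hz, while assuming 1193182 Hz gives a divisor of 17045 and about 70.002 Hz. The constant alone shifts the timer by roughly 0.1%.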
I think it was K1n9_Duk3 who was able to get Foray in the Forest (a Keen Galaxy source mod) to run at 70 FPS. I playtested it at that rate and it worked well.
World's foremost 486 enjoyer.
Yeah, I have probably been rediscovering already identified information.
I ended up building a version that removes the Fix Jerky Motion hack and replaces the bad vsync code with a proper "best practices" version (or what I consider to be one) that does not need a toggle option.
Also, the vertical refresh rate is benchmarked, and game logic is locked to 35 Hz or 70 Hz if the vertical refresh matches. On 60 Hz EGA, the game logic is decoupled like before.
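The decision described here could be sketched as follows. This is not the KEEN70HZ code: the function names, the idea of measuring the frame length in PIT counts, and the tolerance parameter are all assumptions made for illustration.

```c
typedef enum {
    TIMING_LOCKED_TO_VSYNC, /* refresh is ~70 Hz: step logic once per vblank */
    TIMING_DECOUPLED        /* e.g. 60 Hz EGA: keep the timer-driven logic */
} timing_mode;

/* Convert a measured frame length, in PIT input clock counts between two
   vblank starts, into a refresh rate in Hz (PIT input clock 1193182 Hz). */
double refresh_hz_from_pit(unsigned long pit_counts_per_frame)
{
    return 1193182.0 / (double)pit_counts_per_frame;
}

/* Lock game logic to vsync only if the refresh is close enough to 70 Hz. */
timing_mode choose_timing(double refresh_hz, double tolerance_hz)
{
    double diff = refresh_hz - 70.0;
    if (diff < 0)
        diff = -diff;
    return diff <= tolerance_hz ? TIMING_LOCKED_TO_VSYNC : TIMING_DECOUPLED;
}
```

A frame measured at 17024 counts works out to roughly 70.09 Hz and would be locked with a 1 Hz tolerance, while a frame of 19886 counts is about 60 Hz and would stay decoupled.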
Shrinkwrapped these into toggleable options in the menu, where Full Frame Rate provides 70 Hz or 60 Hz game updates, and disabling it provides the "Cinematic" 35 Hz VGA / 30 Hz EGA.
1 Pixel Panning gives pixel-perfect horizontal panning, whereas disabling it gives the default 2 pixel granular panning. I don't think I have the stamina to investigate fixing the shift caching of sprites, though (the precomputed 0, 1, 2, 3 pixel shifted variants should be expanded to 0-7 pixel shifted variants). I presume the game could run out of base memory when adding the extra caches, although maybe only the Keen sprites and the scoreboard would need to be shift cached for that to work.
The 1 vs 2 pixel shifting does affect game logic as well, as the shifting factors into Keen's x coordinates, so small glitches could occur.
Anyhow, it was fun hacking on the game code a bit. With these changes, the game does get quite a bit closer to feeling smooth like Jazz Jackrabbit.
To close out that trail of development, I pushed the 70Hz running Keen to GitHub at https://github.com/juj/KEEN70HZ . It works nicely on DOSBox as well.
There is a prebuilt package that can be downloaded at https://github.com/juj/KEEN70HZ/raw/main/DIST/KEEN70HZ.ZIP .
I recently created a game engine for EGA/VGA and slow 286/8088-86 machines. I implemented the hardware scroll after watching a video from root42 on YouTube.
This implementation seems to work on all real VGAs that were tested, and also on emulated VGAs (DOSBox-X, PCem, 86Box).
Emulated EGA on DOSBox-X updates the PEL register every scanline, and this scroll also works well on it.
This is the engine: https://github.com/mills32/Little-Game-Engine-for-VGA-EGA
And the scroll code:
byte p[4] = {0,2,4,6};          //PEL panning values, VGA
byte p1[8] = {0,1,2,3,4,5,6,7}; //PEL panning values, EGA
byte pix;
void LT_WaitVsync_VGA(){
    word x = SCR_X;
    word y = SCR_Y;
    //start address = row offset + byte column
    y*=LT_VRAM_Logical_Width;
    if (LT_VIDEO_MODE == 0) y += x>>3; //EGA
    if (LT_VIDEO_MODE == 1) y += x>>2; //VGA
    //change scroll registers:
    asm mov dx,003d4h //VGA PORT
    asm mov cl,8
    asm mov ax,y
    asm shl ax,cl
    asm or ax,00Dh //LOW_ADDRESS 0x0D
    asm out dx,ax //(y << 8) | 0x0D to VGA port
    asm mov ax,y
    asm and ax,0FF00h
    asm or ax,00Ch //HIGH_ADDRESS 0x0C;
    asm out dx,ax //(y & 0xFF00) | 0x0C to VGA port
    //The smooth panning magic happens here
    asm cli
    //Wait Vsync
    asm mov dx,INPUT_STATUS_0
WaitNotVsync: //wait until any vsync in progress has ended
    asm in al,dx
    asm test al,08h
    asm jnz WaitNotVsync
WaitVsync: //wait for the start of the next vsync
    asm in al,dx
    asm test al,08h
    asm jz WaitVsync
    asm mov dx,INPUT_STATUS_0
    asm in al,dx //Read input status, to Reset the VGA flip/flop
    if (LT_VIDEO_MODE == 0) pix = p1[SCR_X & 7]; //EGA
    else pix = p[SCR_X & 3]; //VGA
    asm mov dx,0x03C0
    asm mov al,0x33 //0x20 | 0x13 (palette normal operation | Pel panning reg)
    asm out dx,al
    asm mov al,byte ptr pix
    asm out dx,al
    asm sti
}
First update the start address (it looks like no EGA or VGA latches this outside vsync).
Then disable interrupts and wait for vsync.
Finally, change the PEL register just at the start of the new frame.
On more recent SVGA cards (including modern GPUs) it still works very smoothly if you boot FreeDOS from a USB drive. Other cards may require first waiting for vsync and then updating the PEL and start address at the same time (at the start or end of a frame).
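For anyone who wants to check the register values without real hardware, the arithmetic from the routine above can be pulled out into a pure function. This is a sketch derived from the posted code, not part of the engine: `compute_scroll_regs` and `scroll_regs` are hypothetical names, and `is_vga` stands in for `LT_VIDEO_MODE == 1`.

```c
#include <stdint.h>

/* Values the routine would write: two 16-bit index/data words for the
   CRTC port (3D4h) and one PEL panning byte for the attribute controller. */
typedef struct {
    uint16_t crtc_low_word;   /* (addr << 8) | 0x0D: start address low */
    uint16_t crtc_high_word;  /* (addr & 0xFF00) | 0x0C: start address high */
    uint8_t pel_pan;          /* value for attribute register 13h */
} scroll_regs;

scroll_regs compute_scroll_regs(uint16_t scr_x, uint16_t scr_y,
                                uint16_t logical_width_bytes, int is_vga)
{
    scroll_regs r;
    /* Row offset in bytes, truncated to 16 bits like the original `word`. */
    uint16_t addr = (uint16_t)(scr_y * logical_width_bytes);

    if (is_vga) {
        addr = (uint16_t)(addr + (scr_x >> 2)); /* 4 pixels per address */
        r.pel_pan = (uint8_t)((scr_x & 3) * 2); /* same as the p[] table */
    } else {
        addr = (uint16_t)(addr + (scr_x >> 3)); /* 8 pixels per address */
        r.pel_pan = (uint8_t)(scr_x & 7);       /* same as the p1[] table */
    }

    r.crtc_low_word = (uint16_t)((addr << 8) | 0x0D);
    r.crtc_high_word = (uint16_t)((addr & 0xFF00) | 0x0C);
    return r;
}
```

For example, in the VGA case with x = 5, y = 2 and an assumed logical width of 84 bytes, the start address is 169 (0xA9), so the two CRTC words are 0xA90D and 0x000C and the PEL panning value is 2.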
Fantastic job mills26. Found a YT clip recorded on XT clone
'Szok! Little Game Engine na 8088/8086/286+ [EGA/VGA]' by Retro-Piwnica https://www.youtube.com/watch?v=t98OKbYonQI
At 11:10 the author turns off Turbo, and the game still runs acceptably at 4.77MHz on VGA 😮
Sadly, I can't find a recording of it running on an XT with EGA.
rasz_pl wrote on 2023-12-30, 01:30:
Fantastic job mills26. Found a YT clip recorded on XT clone
'Szok! Little Game Engine na 8088/8086/286+ [EGA/VGA]' by Retro-Piwnica https://www.youtube.com/watch?v=t98OKbYonQI
At 11:10 the author turns off Turbo, and the game still runs acceptably at 4.77MHz on VGA 😮
Sadly, I can't find a recording of it running on an XT with EGA.
Cool 😀.
For the moment we have to rely on dosbox-x and 86Box for a "real" EGA test 🙁