VOGONS


Reply 20 of 39, by kas1e

User metadata
Rank Newbie

For us, 16-bit is probably the most important: if I remember right, DOS games and apps usually ran in 256 colors or 16-bit. I don't remember any game that defaulted to 32-bit without at least offering 16-bit as an option, at least among games that don't ask for a 3D accelerator.

Reply 21 of 39, by kas1e

User metadata
Rank Newbie
rzookol wrote on 2020-01-26, 19:37:

Fixing it is easy in render_templates.h:

#if SBPP == 15
....
#elif DBPP == 16
#define PMAKE(_VAL) (((__builtin_bswap16(_VAL)) & 31) | ((__builtin_bswap16(_VAL)) & ~31) << 1)

#if SBPP == 16
...
#elif DBPP == 16
#define PMAKE(_VAL) __builtin_bswap16(_VAL)

I tried it: nope, it didn't change anything. Maybe we didn't change it everywhere? Changing just those 2 places in render_templates.h and doing a full recompile didn't fix it.
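
For illustration, the two macros quoted above can be written as plain functions. This is a minimal sketch, not the code from render_templates.h: swap16 is a portable stand-in for __builtin_bswap16, and the function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Portable stand-in for __builtin_bswap16: exchange the two bytes.
inline uint16_t swap16(uint16_t v) {
    return static_cast<uint16_t>((v << 8) | (v >> 8));
}

// Hypothetical function form of the 16-bit source / 16-bit destination case:
// the pixel only needs its bytes put back in host order.
inline uint16_t rgb565_from_swapped(uint16_t raw) {
    return swap16(raw);
}

// Hypothetical function form of the 15-bit source / 16-bit destination case:
// restore host order, then widen 555 to 565.
inline uint16_t rgb565_from_swapped_555(uint16_t raw) {
    const uint16_t v = swap16(raw);
    // Blue (bits 0-4) stays put; green and red move up by one bit.
    // The unused top bit of the 15-bit pixel is dropped.
    return static_cast<uint16_t>((v & 0x001F) | ((v & 0x7FE0) << 1));
}

// Quick sanity check: swapping twice must give back the original value.
inline void sanity_check() {
    assert(swap16(swap16(0x1234)) == 0x1234);
}
```

Note that the widened green channel gets a zero in its lowest bit, so 15-bit white (0x7FFF) becomes 0xFFDF rather than 0xFFFF.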

Reply 22 of 39, by rzookol

User metadata
Rank Newbie

It depends on the screen mode you use for fullscreen. A conversion for 32-bit screens should also be added.

Reply 23 of 39, by kas1e

User metadata
Rank Newbie

Ah, do you mean it's for fullscreen only? And yep, I'm running on 32-bit screens, because I'm using the SDL2 patch for DOSBox, which assumes that you always run on a 32-bit screen.

Reply 24 of 39, by jtchip

User metadata
Rank Member

It's been a while since I coded anything for PPC but I do recall that the Wii (and GameCube) framebuffer is in YUY2 format so something has to do the conversion from RGB to YUY2, probably the SDL backend, and must already handle endian differences.

Reply 25 of 39, by jmarsh

User metadata
Rank Oldbie

The external framebuffer is YUY2, the internal framebuffer is RGB32. Textures can be 8/15/16/32 bit.

Reply 26 of 39, by kas1e

User metadata
Rank Newbie

In other words, these are bugs (or just unsupported features) in DOSBox. As one of the MorphOS developers explained: the reason likely stems from the fact that the Wii uses little-endian 15- and 16-bit modes, and the SDL implementation doesn't perform SDL_Surface shadow byteswapping as needed, in which case the missing byte swapping goes unnoticed. It'll break for Wii if/when DOSBox code is fixed.

Reply 27 of 39, by jmarsh

User metadata
Rank Oldbie

You don't know what you're talking about. The Wii doesn't have any little-endian video modes.
I already told you DOSBox doesn't output 8-bit - that means when the video mode is 8-bit, DOSBox is sending 16 or 32-bit output to SDL (depending on default desktop depth)... If it were the wrong format, even 8-bit modes would look messed up.

Reply 28 of 39, by kas1e

User metadata
Rank Newbie

@Jmarsh
It's not me saying that, it's another developer; I'm only quoting what he says, and I specifically pointed out that it's a quote.

But the truth is that AmigaOS4, MorphOS, and Linux PPC (which is far more popular than AmigaOS4 and MorphOS) do have that problem. So do you mean that the SDL ports on AmigaOS4, MorphOS, and Linux do the wrong thing, and only the Wii SDL port does not? The root of the issue may be something other than little-endian vs. big-endian modes, but we have a problem for sure. Logically, if 3 OSes have that bug and 1 does not, something is different there. Maybe you do something in your SDL that the original SDLs don't, who knows, but if Linux PPC has the same issue, then it's quite a problem. Maybe all SDLs for all PPC OSes except Wii need fixing, maybe DOSBox, maybe both, dunno. But since the issue is not local to some obscure OSes and also happens on Linux PPC, it's a common problem.

Reply 29 of 39, by rzookol

User metadata
Rank Newbie

@Jmarsh

Ok, so maybe I'll explain:
- It seems that DOSBox VGA emulation provides lines in LE format for 16-bit
- The DOSBox gui/scalers part doesn't convert endianness, it just passes the data on as an SDL Surface (all these PMAKE defines)
- An SDL Surface is always (or should be) in BE on PowerPC (https://discourse.libsdl.org/t/managing-byte- … d-surfaces/5470)
- DOSBox passes VGA LE data to a BE SDL surface, so on MorphOS/OS4/Linux PPC we had display issues in Surface mode
- If SDL on Wii has correct colors, it means that the Wii 16-bit SDL Surface accepts LE data and converts it (or not) internally to the destination Wii hardware format. This seems to be incompatible with the libSDL API, which requires that SDL_Surface is BE on PowerPC.

8-bit modes use a CLUT, so the handling is different and colors don't need converting, as the Color Table can be created in the correct endianness.
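
The mismatch described in the list above can be reproduced without any SDL code. A minimal sketch, assuming an RGB565 layout (5 bits red, 6 green, 5 blue); the helper names are made up for the example and nothing here is DOSBox or SDL code.

```cpp
#include <cassert>
#include <cstdint>

struct Rgb565 {
    unsigned r, g, b;
};

// Split a host-order RGB565 pixel into its channels.
inline Rgb565 decode565(uint16_t p) {
    return { (p >> 11) & 0x1Fu, (p >> 5) & 0x3Fu, p & 0x1Fu };
}

// Write a 16-bit value as little-endian bytes (what the emulated VGA memory holds).
inline void store_le16(uint8_t* dst, uint16_t v) {
    assert(dst != nullptr);
    dst[0] = static_cast<uint8_t>(v & 0xFF);
    dst[1] = static_cast<uint8_t>(v >> 8);
}

// Read two bytes as little-endian (the correct interpretation of that memory).
inline uint16_t load_le16(const uint8_t* src) {
    assert(src != nullptr);
    return static_cast<uint16_t>(src[0] | (src[1] << 8));
}

// Read two bytes as big-endian (what a BE surface does with the same bytes).
inline uint16_t load_be16(const uint8_t* src) {
    assert(src != nullptr);
    return static_cast<uint16_t>((src[0] << 8) | src[1]);
}
```

Pure red (0xF800) stored as LE bytes and read back as BE decodes to r=0, g=7, b=24, so it shows up as a dark bluish color instead of red. Read back as LE, the same bytes give r=31, g=0, b=0.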

Reply 30 of 39, by jmarsh

User metadata
Rank Oldbie

The data going out is not the problem, it is the data being fed into the scalers. 8-bit mode still outputs as 16/32/multiple-byte words, the LUT isn't passed to SDL (even though the code is still present to do so).

I can see at least 2 bugs in the PMAKE defs unrelated to endianness and the conversions aren't even optimal (a 5-bit value upsampled to 8-bit should be (x<<3)|(x>>2)) so they definitely need changing, but that still won't fix SDL ports using SDL_surfaces where format->Rmask/Gmask/Bmask don't match the hardcoded redMask/greenMask/blueMask values at the top of render_templates.h.
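
The upsampling jmarsh mentions replicates the top bits of the value into the freed low bits, so that 0 maps to 0 and the maximum maps to 255. A small sketch; the 6-bit variant (for the green channel of RGB565) is an extrapolation of the same idea rather than something stated in the post, and the function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Expand a 5-bit channel to 8 bits: shift up by 3 and fill the
// three low bits with the top three bits of the input.
inline uint8_t expand5to8(uint8_t v5) {
    assert(v5 <= 31);
    return static_cast<uint8_t>((v5 << 3) | (v5 >> 2));
}

// Same idea for a 6-bit channel: shift up by 2 and fill the
// two low bits with the top two bits of the input.
inline uint8_t expand6to8(uint8_t v6) {
    assert(v6 <= 63);
    return static_cast<uint8_t>((v6 << 2) | (v6 >> 4));
}
```

A plain x<<3 would map the maximum 5-bit value 31 to 248, so full white would never reach 255; with the replicated bits, 31 maps to 255.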

Reply 31 of 39, by rzookol

User metadata
Rank Newbie

Masks aren't enough, as the green bits aren't placed together. I'll probably go the hacky way and link a special static SDL library (instead of the dynamic one) so I can break the SDL API.
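
To see why the green bits end up split, byte-swap the usual RGB565 masks. A sketch assuming the masks 0xF800/0x07E0/0x001F; the helper names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Portable 16-bit byte swap.
inline uint16_t swap16(uint16_t v) {
    return static_cast<uint16_t>((v << 8) | (v >> 8));
}

// True if the set bits of the mask form one unbroken run.
inline bool is_contiguous_mask(uint16_t mask) {
    if (mask == 0) return false;
    unsigned m = mask;
    while ((m & 1u) == 0) m >>= 1;  // drop trailing zero bits
    return (m & (m + 1u)) == 0;     // what remains must be a solid run of ones
}

// Sanity check: the unswapped masks are all contiguous.
inline void sanity_check() {
    assert(is_contiguous_mask(0xF800));
    assert(is_contiguous_mask(0x07E0));
    assert(is_contiguous_mask(0x001F));
}
```

After the swap, red becomes 0x00F8 and blue 0x1F00, but green becomes 0xE007: three bits at the top of the word and three at the bottom, which is not one contiguous field.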

Reply 32 of 39, by kas1e

User metadata
Rank Newbie

@Jmarsh
Thanks for the fixes in the main DOSBox repo! I've now checked the PMAKE changes and yes, 15, 16 and 32-bit modes all render correctly now. Though capturing screenshots via ctrl+f5 in 15/16 and 32bpp still saves them as before, with those ugly colors: http://kas1e.mikendezign.com/aos4/dosbox/badc … rl_f5_16bit.png

But that probably means a change is needed in some different place?

@Dreamer,kcroft
And a big thank-you to you as well for spending your time on it. Funny how different devs fix the same thing at about the same time 😀

Reply 33 of 39, by dreamer_

User metadata
Rank Member
kas1e wrote on 2020-01-28, 11:59:

@Jmarsh
Thanks for the fixes in the main DOSBox repo! I've now checked the PMAKE changes and yes, 15, 16 and 32-bit modes all render correctly now. Though capturing screenshots via ctrl+f5 in 15/16 and 32bpp still saves them as before, with those ugly colors: http://kas1e.mikendezign.com/aos4/dosbox/badc … rl_f5_16bit.png

But that probably means a change is needed in some different place?

@Dreamer,kcroft
And a big thank-you to you as well for spending your time on it. Funny how different devs fix the same thing at about the same time 😀

Well, we learned something about DOSBox internals, so that's valuable 😀 The fix I suggested and the fix that landed in SVN are slightly different, but it doesn't matter - further improvement here would require writing some micro-benchmarks for the scaler code.

Yes, when going with this approach, additional fixes are needed in the screenshot code and (probably) in the zmbv code. With SDL2, the opengles backend also needs to be tested (if available on any PPC arch), as it uses a different texture format. EDIT: also, all other scalers need to be tested as well, especially the rgb ones.

| ← Ceci n'est pas une pipe
dosbox-staging

Reply 34 of 39, by jmarsh

User metadata
Rank Oldbie

Nearly everything in hardware.cpp is endian-unsafe. Do not listen to a wav/avi recorded on a big-endian system unless you want to blow your speakers.
The AVI framework stuff was all rewritten before xmas so it will probably all get fixed at the same time.
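
As a general illustration of endian-safe output (not the actual code in hardware.cpp), 16-bit PCM samples can be serialized byte by byte, so the data comes out little-endian, as the WAV format expects, regardless of the host. The function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Append a 16-bit value as two little-endian bytes, independent of host order.
inline void append_le16(std::vector<uint8_t>& out, uint16_t v) {
    out.push_back(static_cast<uint8_t>(v & 0xFF));
    out.push_back(static_cast<uint8_t>(v >> 8));
}

// Serialize signed 16-bit samples into a little-endian byte stream.
inline std::vector<uint8_t> pack_pcm16_le(const int16_t* samples, std::size_t count) {
    assert(samples != nullptr || count == 0);
    std::vector<uint8_t> out;
    out.reserve(count * 2);
    for (std::size_t i = 0; i < count; ++i) {
        append_le16(out, static_cast<uint16_t>(samples[i]));
    }
    return out;
}
```

Writing the in-memory int16_t array directly to disk on a big-endian host would store every sample with its bytes reversed, which plays back as loud noise.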

Reply 35 of 39, by kas1e

User metadata
Rank Newbie

I tested video recording in 16-bit and 32-bit modes via ctrl+alt+f5: the video records and plays fine on big-endian, but the sound, as jmarsh said, is of course broken.

Reply 36 of 39, by rzookol

User metadata
Rank Newbie

Hi, I need to run some tests, but maybe this part of vga_draw.cpp would be a better place to do the endian conversion: it would give clean BE output, wouldn't require the PMAKE changes, and could also easily be AltiVec-accelerated.
I'll run some tests and let you know. I also need to check how memcpy is implemented, because for ( ..) *dest++ = swap(*src++) can be slower than memcpy followed by the swap later in PMAKE.

// unwrapped chunk: to top of memory block
memcpy(TempLine, &vga.draw.linear_base[offset], unwrapped_len);
// wrapped chunk: from base of memory block
memcpy(&TempLine[unwrapped_len], vga.draw.linear_base, wrapped_len);
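
The loop variant mentioned above could look roughly like this. A sketch only: the function names and parameters are hypothetical stand-ins for TempLine, vga.draw.linear_base and the two lengths, and it assumes 16-bit pixels with both chunk lengths even (no pixel straddles the wrap point). Whether it is faster or slower than memcpy plus a later swap in PMAKE would have to be measured.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Copy len_bytes from src to dst, exchanging the two bytes of every 16-bit pixel.
inline void copy_swap16(uint8_t* dst, const uint8_t* src, std::size_t len_bytes) {
    assert(len_bytes % 2 == 0);
    for (std::size_t i = 0; i < len_bytes; i += 2) {
        dst[i]     = src[i + 1];
        dst[i + 1] = src[i];
    }
}

// Swapping replacement for the two memcpy calls that assemble a wrapped line.
inline void copy_wrapped_line_swapped(uint8_t* temp_line,
                                      const uint8_t* linear_base,
                                      std::size_t offset,
                                      std::size_t unwrapped_len,
                                      std::size_t wrapped_len) {
    // unwrapped chunk: from offset to the top of the memory block
    copy_swap16(temp_line, linear_base + offset, unwrapped_len);
    // wrapped chunk: from the base of the memory block
    copy_swap16(temp_line + unwrapped_len, linear_base, wrapped_len);
}
```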

Reply 37 of 39, by dreamer_

User metadata
Rank Member
rzookol wrote on 2020-01-29, 10:17:

Hi, I need to run some tests, but maybe this part of vga_draw.cpp would be a better place to do the endian conversion: it would give clean BE output, wouldn't require the PMAKE changes, and could also easily be AltiVec-accelerated.
I'll run some tests and let you know. I also need to check how memcpy is implemented, because for ( ..) *dest++ = swap(*src++) can be slower than memcpy followed by the swap later in PMAKE.

// unwrapped chunk: to top of memory block
memcpy(TempLine, &vga.draw.linear_base[offset], unwrapped_len);
// wrapped chunk: from base of memory block
memcpy(&TempLine[unwrapped_len], vga.draw.linear_base, wrapped_len);

I'm not going to discourage you from trying - perhaps you'll figure it out and the results will be better, but when I tested a fix (not exactly the same) in vga_draw.cpp, it resulted in significantly lower performance and extensive tearing (I was not expecting additional tearing). The scaler code does the conversion to the SDL texture format anyway, so it makes sense to rotate the bits there.


Reply 38 of 39, by krcroft

User metadata
Rank Oldbie

Here's a variable-sized byte-swapping wrapper for many SIMD implementations: https://github.com/Wunkolo/qreverse/blob/master/README.md

The author benchmarks the speed-up versus GCC's generically optimized bswap implementation, and the difference typically adds up to > 10-fold when running across 60,000 values (typical for us, given a 320x200 surface).

And that's with the benefit of performing all of those swaps in bulk, in a tight loop that all fits in cache, along with the source memory getting flawless prefetches.

In our case, this is part of a bigger structure and we probably don't have as favorable cache and prefetch behavior. So our hit is likely compounded.

Reply 39 of 39, by aqrit

User metadata
Rank Member

> qReverse

This library seems to be aimed at doing a fullscreen "mirror" operation, which is not the operation that is needed here.
qReverse has SIMD for NEON/SSSE3+, but PPC would require AltiVec?
It looks like AltiVec does have a byte permutation (shuffle) instruction, though.