High-color video playback on STE

Cyg · 29-10-2012, 23:51:40

Hi Petari !

Sorry for the silence over the last few weeks, but I had a baby in the meantime: my son came sooner than expected!

I just watched 2 of your trailers, Skyfall and SW, really great! But I think the quality can be improved, as most of the frames I captured use only 200 colors out of the 4096...

Could you send me some of your frames, so we can compare the result ?
I improved the quality of my algo by a few %, but it takes longer to encode; moreover I wrote it in VB.Net, which is not optimal by design :-)

Can you still increase the bandwidth to manage a fullscreen display, something like 416x270 / 48 colors per line = 56160 + 25920 = 82 080 bytes per frame?
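The frame size above can be checked with a small sketch. It assumes 4 bitplanes (4 bits per pixel) for the bitmap and 2 bytes per palette entry; the function name `frame_bytes` is only illustrative.

```python
def frame_bytes(width, height, colors_per_line, bitplanes=4, bytes_per_color=2):
    """Return (bitmap, palette, total) sizes in bytes for one frame.

    Assumptions: 4 bitplanes for the bitmap and one 16-bit word
    per palette entry, reloaded on every scanline.
    """
    bitmap = width * height * bitplanes // 8
    palette = colors_per_line * bytes_per_color * height
    return bitmap, palette, bitmap + palette


print(frame_bytes(416, 270, 48))   # the fullscreen proposal
print(frame_bytes(320, 200, 54))   # the 320x200 / 54 colors case
```

With these assumptions the fullscreen proposal comes out at 56160 bytes of bitmap plus 25920 bytes of color data per frame.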

I will prepare the replay package for 320x200 / 54 colors + 2 global colors

CU
Cyg

Petari · 30-10-2012, 11:05:29

Congratulations for your son, Cyg :-)
I think that 4096 colors is a little misleading: more important is the colors/line value, as it is limited to around 50.
Plus, in darker scenes we have only a few bits for colors, and it results in 'stainification' of the pic. If you have a better term for it, let me know.
So I think that error diffusion is what made the big improvement, even if it is not perfect yet. It is also good to lighten the pic a little when it is dark; it will look better overall.
I can send you a lot of frames. What format do you want?

There is no way to achieve a 25 fps 4096-color anim at higher res, at least not without some serious mods or updates to the STE.
We can raise the horizontal res, but not the vertical, because then there will be even less time to load bitmap data. With 270 lines you'll have only some 15% of the time for loading, while needing more data than with fewer lines.
Even horizontal is a possible problem, because then the disk may not prepare data fast enough. Currently, at 320 px, I have about a 20 microsecond pause between sectors. That is enough with fast CF cards, but under 10 will surely not be enough.

What I see as the way forward is to stay at 320x200 but go for more colors/line. Fast loading with CATA allows about 70 colors/line.
Using movem.w (a0),d0-d7/a1-a7 transfers 32 bytes very fast (yes, because of a bug in the CPU) directly from IDE to the shifter; a0 is here the destination $FF8240. It takes only 72 T states. Just repeat it 4 times in a row, plus one with fewer regs at the end.

Cyg · 30-10-2012, 15:34:07

Thanks a lot :-)

I understand your bandwidth issue; it would have been nice to have a 16/9 display in 400x200, for instance.
Once you manage to display 70 colors per line, it will not be testable on any computer or emulator other than yours, but I could adapt my converter after a few tests with you, based on a captured color test with pixel measurements, or maybe you are more fluent at counting the cycles :-)

With your display at 25 fps, I suppose you cannot go up to 50 fps because you need the top & bottom borders of 2 VBLs to fully load the next picture (about 2 * 80 lines * 512 cycles per line)?

About the rendering, I just tested your sample to show you my results:
- the original 4 million colors "perfect" pic
- a post-dithered picture with a Floyd-Steinberg algorithm using the existing colors
- a pre-dithered picture with a simple algorithm which prepares the original picture by choosing the ceil or the floor value when rounding to 4096 colors, and swaps the rule every pixel.
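The pre-dithering idea in the last item can be sketched as follows. This is only an illustration, not the HigheSTcolor code: the linear 0..255 to 0..15 scaling and the `phase` parameter (to swap which pixels get ceil or floor) are assumptions.

```python
def predither_row(row, phase=0):
    """Reduce one row of 8-bit (r, g, b) pixels to 4-bit levels (0..15).

    Even pixels are rounded down and odd pixels rounded up
    (or the reverse when phase=1), so the rounding error
    alternates in sign from one pixel to the next.
    """
    out = []
    for x, pixel in enumerate(row):
        use_ceil = (x + phase) % 2 == 1
        out.append(tuple(
            # -(-n // d) is integer ceiling division
            -(-c * 15 // 255) if use_ceil else c * 15 // 255
            for c in pixel))
    return out


grey = [(128, 128, 128)] * 4
print(predither_row(grey))            # floor, ceil, floor, ceil
print(predither_row(grey, phase=1))   # ceil, floor, ceil, floor
```

Calling it with `phase=0` and `phase=1` gives two slightly different versions of the same row, one with the opposite rounding pattern of the other.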

I ran these tests at work, but my HigheSTcolor executable here is not perfect: it doesn't use the 2 global colors at all. The quality should improve a little with these 2 additional colors, but I need to release it at home.

Original, all sizes are doubled to display the details

MPP result as previously generated by Petari

HigheSTcolor result, no dithering

HigheSTcolor, with post-dithering

HigheSTcolor, with pre-dithering

For darker pictures, I think both MPP & Photochrome seem to have a global illumination issue: the result is darker than the original, which does not help in a case that is already difficult to optimize.
The dithering helps a lot in case of low color pictures.
Example:

Original, a dark capture (bad quality and blurry...)

HigheSTcolor, no dithering: bad result

HigheSTcolor, with post-dithering (floyd steinberg)

HigheSTcolor, with pre-dithering (round to ceil or floor on odd or even pixels)

If, from one frame to the next, you alternate between the picture pre-dithered with method 1 and the picture pre-dithered with method 2, it could give the impression of a mix of both, especially for static sequences or backgrounds. Maybe it's not so obvious at 25 fps, but it is at 50 fps.

HigheSTcolor, theoretical result of a fast display between 2 slightly different pre-dithered pictures

You could send me a sequence of a few seconds (100 frames?) in PNG format or as zipped BMP in 320x200, and maybe some of the results to compare.

Cyg

Petari · 30-10-2012, 15:56:52

400x200 is maybe possible. But let's solve some simpler things to begin with.

50 fps is far too much. During playback, I load half of the bitmap of the next frame in the first border period (bottom, then top), and in the second one the second half of the bitmap + audio (1 KB). There is some padding, because loads must stay in whole sectors. Color data (the same data) is loaded 2x in a row, directly to the shifter, and during that the data rate is not so critical, only the disk must not pause between sectors.
With 320x200, synced color loading during scanlines is a must, because there is only some 33% of the time available for over 800 KB/sec. With 320x160, you can load everything into RAM and use normal hi-color code, because you have almost 50% of the time for loading.
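A rough sketch of this time budget, under the assumption of a PAL 50 Hz frame of 313 scanlines and counting every non-displayed line as loading time (in practice some of that time goes to setup, so the real share is a bit lower). The names are illustrative only.

```python
TOTAL_LINES = 313   # assumption: scanlines in one PAL 50 Hz frame


def border_fraction(visible_lines):
    """Share of the frame time spent outside the displayed lines."""
    return (TOTAL_LINES - visible_lines) / TOTAL_LINES


def bytes_per_second(bitmap_bytes, audio_bytes=1024, fps=25):
    """Data to load per second for bitmap + audio (color data excluded)."""
    return (bitmap_bytes + audio_bytes) * fps


for lines in (160, 200, 270):
    print(lines, "lines:", round(border_fraction(lines) * 100), "% border time")

print(bytes_per_second(320 * 200 // 2), "bytes/sec for 320x200 at 25 fps")
```

Under these assumptions the border share drops from roughly half of the frame at 160 lines to well under a fifth at 270 lines, and the 320x200 bitmap plus 1 KB of audio at 25 fps already needs more than 800 KB/sec.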

I will post you some sequences.
My eyes say that post-dithering looks a little better, but all this may be a little subjective.
In any case, for animations we cannot use 50 fps, and not the 30K color modes either. One exception: what may be interesting is 12.5 fps with 30K color modes, for some cartoons, which are made at 12 fps.

Petari · 31-10-2012, 10:55:15

Here are 100 pic sequences: http://atari.8bitchip.info/AVA2.ZIP
http://atari.8bitchip.info/LGR.ZIP
http://atari.8bitchip.info/UT2.ZIP

Each one is 8 MB long. I chose some harder cases: the Avatar scene here is a much harder case than the flying one. For LGR error diffusion is crucial. For UT2 too, I guess.
Just a few samples done with other hi-color solutions: http://atari.8bitchip.info/hand.ZIP
MPP lacks error diffusion, but otherwise makes very accurate conversions. I don't know whether Zerkman will deal with error diffusion. Douglas Little is currently working on fixing problems with PCS (mostly horizontal streaks appearing), so let's wait a little. I just received a test version, but will test it later.

Cyg · 31-10-2012, 15:57:24

Thank you Petari. I am going to process your pictures, but my main issue is the overall performance of my optimizer. I cut its run time by a factor of 3 last night, but it is still too slow (VB.net...). I am thinking about converting it to C++ now that it's stable, even if I am not fluent in C++.
One inner loop takes 90% of the overall time, for only 50 lines of code in the whole function. It would be easier if I could write just this function in C or C++ and call it from my VB.net code.

I had an idea during my long walk this morning (due to metro issues). Can you confirm that on every frame you read the full palette information, meaning that you read the same data twice over 2 frames, in real time (during the scanlines)?
If yes, could you then just as easily read 2 completely different sets of palette information, if they share the same bitmap?

If yes, then we can achieve a 29000-color display at VBL framerate, giving a smoother and more colorful impression :-) Something like the fake 100 Hz or 200 Hz TVs.
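One plausible way to arrive at a figure near 29000, assuming the eye simply averages the two 12-bit colors shown on alternate frames: each 4-bit channel then has 31 possible blended levels instead of 16. A small sketch of that count (the function name is illustrative):

```python
def blended_levels(levels=16):
    """Number of distinct averages of two values in 0..levels-1.

    The average (a + b) / 2 is determined by the sum a + b,
    so counting distinct sums is enough.
    """
    return len({a + b for a in range(levels) for b in range(levels)})


per_channel = blended_levels()
print(per_channel, "levels per channel")
print(per_channel ** 3, "theoretical blended colors")
```

This treats the three channels as independent and ignores any flicker or display non-linearity, so it is an upper bound on what the eye might perceive rather than a measured result.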

1 bitmap + 2 palettes is the way I do crossfades between 2 pics: the same bitmap/pixels are shared between 2 different palettes. 1 pixel has 2 possible colors, the first leads to the first picture, the 2nd to the 2nd picture; it is just a matter of optimizing for 6 dimensions instead of 3 (RGB values per picture).
Then I just need to fade between every pair of colors, but you don't even need to fade over 2 frames, you would only need to read the 1st pal then the 2nd pal.
The optimisation is harder/longer but the overall result is better, as you can see when I simulate the result that our eyes should see at 50 fps:

I keep running optimization tests; here are some new results from your samples:

Original

Error diffusion

MPP result

higheSTcolor basic result

higheSTcolor pre-dithered result

higheSTcolor post-dithered result

higheSTcolor theoretical result for 1 pix + 2 pal as detailed above

Original

higheSTcolor

higheSTcolor pre-dithered

higheSTcolor post-dithered

Cyg

Petari · 31-10-2012, 17:49:14

Yes, I load all color data 2x in a row. In the case of PCS that means 2x20 KB (longer because of padding) in 1/25 sec, but there is time for it.
Using different palette info for the same bitmap is then a great idea. Of course, it will increase file sizes, but that should not be a problem.

Regarding the speed problems: I'm not a very good C programmer. All I can say is that the crucial parts, which take most of the time and are executed many times, should use a minimum of function calls, ideally none at all. At the price of some more lines of code, slow parameter passing etc. will be avoided.

Cyg · 31-10-2012, 17:55:33

I just added another sample, generated with 1 bitmap and 2 pals mixed together to simulate the result for our eyes:

I don't know if it's really better, but a blurry background is a nightmare to optimize with a low number of colors.
I think the best is to check it during the animation and not only on a static picture; the same goes for the pre-dithered and post-dithered methods.
I will run the optimisation for each method, but you have to be patient as it will take a long time.
I am away for a long weekend for the next 4 days; it should be finished when I come back. I will also send you the read & replay code.

Cyg

Petari · 02-11-2012, 14:24:07

I did ROM code for 80 colors/line. It is stable.

Test patterns - STE video capture

Res is 640x396; on the STE it is 320x198, so every px is doubled horizontally and vertically.

All 80 colors are updated during 1 line.
With movem, all 16 colors at once, in 72 T states.
72 T = 72 px, nicely visible at the bottom.
I used 5 different color gradients, to see which palette is where.
What needs to be preloaded first is grey, then green,
then red, then blue, and yellow last. At the end of the line, grey is loaded
for the next line. Color 0 in it is the border color too.
Bitmap indexes are in order: 0,1,2..15,0,1... up to 15 at px 319.
It stays so for the first 10 lines, then everything is shifted 1 px left,
so lines 10-19 start with 1, lines 20-29 start with 2, etc.
Lines 160-169 are the same as 0-9.
After that, the whole bitmap is 0.
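The bitmap of this test pattern, as described above, can be reproduced with a short sketch (the function name is illustrative; only the index layout is modeled, not the color gradients):

```python
def pattern_index(x, y):
    """Color index of pixel (x, y) in the 320-wide test pattern.

    Indexes cycle 0..15 along the line; every block of 10 lines is
    shifted 1 px further left. From line 170 on, the bitmap is all 0.
    """
    if y >= 170:
        return 0
    return (x + y // 10) % 16


print([pattern_index(x, 0) for x in range(18)])    # 0..15, then 0, 1
print([pattern_index(x, 10) for x in range(4)])    # starts with 1
print(pattern_index(319, 0))                       # last pixel of line 0
```

Since the shift wraps around after 16 steps, lines 160-169 come out identical to lines 0-9, as stated in the description.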

With a delay of 4T compared to PCS timing - PatD4.png:

The color distribution is clearly visible on the shot.
The grey palette is less present than the others,
because it is gradually overwritten by the green one.
Therefore, I think that the start point should be moved
some pixels to the right (adding some delay in code).
+28T (or px) seems reasonable - then we will have a perfect
split on lines. Currently, there are 28 px of black at the end.
Of course, there will then be less of the 'yellow' palette, but it can
not be better - 320 is not divisible by 72.

For me, the best timing is 32T - PatD32.png.
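A simplified sketch of how the line splits between palettes for a given delay. It assumes 1 T state = 1 pixel and treats each palette switch as instantaneous, ignoring the gradual register-by-register overwrite visible on the captures; names are illustrative.

```python
def palette_segments(width=320, period=72, delay=32):
    """Pixel ranges (start, end) covered by each successive palette.

    The first range belongs to the preloaded palette; each later one
    starts when the next 72 T reload begins.
    """
    segments = []
    if delay > 0:
        segments.append((0, min(delay, width)))
    start = delay
    while start < width:
        segments.append((start, min(start + period, width)))
        start += period
    return segments


print(palette_segments(delay=4))    # D4 timing
print(palette_segments(delay=32))   # D32 timing
```

In this model the D4 timing leaves a sixth, 28 px wide range at the right edge, while D32 gives exactly five ranges because 32 + 4 * 72 = 320.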

File struct for 80 col/line:

198x160 bytes of bitmap data,
padded to $7C00. The last 64 bytes contain initial color data:
the first palette and an optional second one, if 2 different palettes
are used for the same bitmap to get better colors.

Color data for 198 lines:
160 bytes for 1 line. 3 lines together with 32 bytes of padding
make 1 disk sector. This is the reason for 'only' 198 lines:
it must be divisible by 3.
So, 66x512 bytes with padding. But let that be the task of the SW
that joins pictures into the anim file...

So, 198x160 bytes are color data. The last 32 bytes must be
all the same, set to the chosen border color, so that line 199
(where the initial colors for line 1 are) will not be visible.
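The sizes in this layout can be checked with a short sketch. The constant and function names are illustrative, and placing the 32 bytes of padding at the end of each sector is an assumption; the post does not say where in the sector it sits.

```python
LINES = 198
BYTES_PER_LINE = 160          # bitmap: 320 px * 4 bits; colors: 80 * 2 bytes
SECTOR = 512

bitmap_bytes = LINES * BYTES_PER_LINE          # raw bitmap data
BITMAP_BLOCK = 0x7C00                          # padded bitmap block
initial_color_bytes = BITMAP_BLOCK - bitmap_bytes

color_sectors = LINES // 3                     # 3 color lines per sector
sector_padding = SECTOR - 3 * BYTES_PER_LINE
color_block = color_sectors * SECTOR


def color_line_offset(line):
    """Byte offset of a line's color data inside the color block.

    Assumption: each sector holds 3 lines followed by its padding.
    """
    sector, slot = divmod(line, 3)
    return sector * SECTOR + slot * BYTES_PER_LINE


print(bitmap_bytes, initial_color_bytes, sector_padding, color_block)
print(color_line_offset(0), color_line_offset(3), color_line_offset(197))
```

The 64 bytes left after the bitmap match two 32-byte initial palettes, and the bitmap block itself is a whole number of sectors ($7C00 / 512 = 62).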

Cyg · 02-11-2012, 17:16:38

Hi Petari

Very impressive performance results! 80 colors for 320 pixels will lead to a quality equivalent to my 192 pix with 48 colors: on average 1 color every 4 pixels.

I already have a full set of Avatar 2 calculated with the pre-dithered method; it looks very nice. Tonight I am sending you this set, maybe another with post-dithering, and the display source code + explanations of the format. The double palette for 1 shared bitmap is also in progress, but it takes longer to calculate because of its complexity. I will send you the current results so you can compare each method (post-dithered / pre-dithered / double palette 29k colors).

I will read your screen test above very carefully and implement this timing in my optimizer. I suggest we run a test on one picture first, to validate that cycles are in phase with pixels.

Best regards,
Cyg

Cyg · 02-11-2012, 23:27:36

Petari, what about the background color in the borders ?

If you want a black border, I have to force index 0 to 0 on the 1st pal for the left border, and on the 5th pal for the right border and the beginning of the left border on the next line.
Otherwise, the borders will have horizontal colored lines.

Cyg

Cyg · 03-11-2012, 03:37:20

Petari,

Here is a 1st package: http://www.top250filmsdvd.com/atari/Petari video HigheSTcolor.zip

The source code is taken roughly from my last demo, with a quick cleanup, so there is a lot of useless code, sorry, but it should only display the selected file.

VBL : the replay routine, really starts at prestab_pic
BSS.s : to configure the files to display

The higheSTcolor files are, for Avatar 2 :
- one no-dithering sample
- one post-dithering sample
- 100 pre-dithering samples, ready for video :-)
- one double palette sample

I will post the 100 pictures for each mode later tomorrow,

Cyg

Petari · 03-11-2012, 18:09:19

I made a table for pixels 0-319 and indexes 0-15, showing which color table (palette) to use.
5120 bytes. Will post it later.
Will check your stuff ...
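To illustrate what a 320 x 16 = 5120-byte table of this kind could look like, here is a sketch under a purely hypothetical timing model; it is not Petari's measured table. It assumes the D32 timing, 1 T state = 1 pixel, and that each reload writes color register i about 4 T after register i-1.

```python
def build_palette_table(width=320, indexes=16, period=72, delay=32, step=4):
    """One byte per (pixel, color index): which palette (0..4) is live.

    Hypothetical model: reload k (k = 1, 2, ...) starts at pixel
    delay + period * (k - 1) and reaches register i `step * i` pixels later.
    """
    table = bytearray(width * indexes)
    for x in range(width):
        for i in range(indexes):
            first_write = delay + step * i   # when register i is first reloaded
            if x < first_write:
                palette = 0                  # still the preloaded palette
            else:
                palette = 1 + (x - first_write) // period
            table[x * indexes + i] = palette
    return table


table = build_palette_table()
print(len(table), "bytes")
print(list(table[32 * 16:32 * 16 + 4]))   # palettes for indexes 0..3 at pixel 32
```

The real offsets depend on the exact bus timing of the movem transfer, so the `delay` and `step` values here would have to be replaced by measurements from the captures.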

Cyg · 03-11-2012, 20:54:41

Hi Petari,

Yes, a table will help. Can you confirm that I should force index 0 to black on the left & right borders? I am preparing the trick in the converter.

Here are the results of the last 2 methods:
http://www.top250filmsdvd.com/atari/AV2_1pix2pal.zip : Avatar 2, 1 bitmap shared between 2 palettes for a 29 000 colors feeling at 50fps
http://www.top250filmsdvd.com/atari/AV2_Flat and post-dithering.zip : Avatar 2, 1 no-dithered bitmap, 1 post-dithered bitmap, 1 palette (identical for both bitmap)

Cyg

Cyg · 04-11-2012, 01:58:20

Petari,

I started configuring an 80 colors/line optimization. The first result shows a 10% decrease in the global error (Euclidean distance to the original), so don't expect a much better rendering, except in some details and for the double palette method (-30%), since there are more combinations while the colors remain just as important.
The main issue for improving the quality is the number of distinct colors: 4096 is too few.
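A minimal sketch of such a global error measure, assuming it is the sum over all pixels of the Euclidean RGB distance between original and converted picture. Whether the optimizer sums plain or squared distances is not stated, so this is only one possible reading; the names are illustrative.

```python
import math


def global_error(original, converted):
    """Sum of per-pixel Euclidean RGB distances between two pictures.

    Both arguments are equally long sequences of (r, g, b) tuples.
    """
    return sum(math.dist(a, b) for a, b in zip(original, converted))


def relative_change(old_error, new_error):
    """Change of the error in percent (negative means improvement)."""
    return (new_error - old_error) / old_error * 100


original = [(0, 0, 0), (3, 4, 0), (10, 10, 10)]
converted = [(0, 0, 0), (0, 0, 0), (10, 10, 10)]
print(global_error(original, converted))
print(relative_change(100.0, 90.0))
```

With this kind of measure, a result quoted as "10% decrease" corresponds to `relative_change` returning -10.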

higheSTcolor, 54 colors per line

higheSTcolor, 80 colors per line, the nose and forehead are better, no change in the background

higheSTcolor post-dithered, 54 colors per line

higheSTcolor post-dithered, 80 colors per line

higheSTcolor pre-dithered double palette mix result 29k theoretical colors, 54 colors per line

higheSTcolor pre-dithered double palette mix result 29k theoretical colors, 80 colors per line

The files for your test : http://www.top250filmsdvd.com/atari/80 cols.zip
The pix files are 32000 bytes. The pal files should be 80*2*200 = 32000 bytes, but I have kept the 2 global colors at the beginning, even though they are not useful because indexes 0 and 1 are used in each of the per-line palettes. So the pal size is 32004 bytes and the needed colors start at offset 4, not 0.
The 2nd pal starts at pixel 32, like in your PatD32 capture, and a palette lasts for 72 pixels, but I am not sure if the absolute pixel is 31 or 32, due to blur from the resize process.
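A sketch of how such a pal file could be read, based on the sizes given above. The big-endian word order is an assumption (natural for 68000 data), and the function name is illustrative; the color words are returned raw, without decoding the STE bit layout.

```python
import struct

HEADER_BYTES = 4        # the 2 unused global colors, 2 bytes each
COLORS_PER_LINE = 80
LINES = 200


def read_pal(data):
    """Split a 32004-byte pal file into 200 lists of 80 color words."""
    expected = HEADER_BYTES + COLORS_PER_LINE * 2 * LINES
    if len(data) != expected:
        raise ValueError("expected %d bytes, got %d" % (expected, len(data)))
    count = COLORS_PER_LINE * LINES
    # Assumption: 16-bit big-endian words, starting right after the header
    words = struct.unpack_from(">%dH" % count, data, HEADER_BYTES)
    return [list(words[line * COLORS_PER_LINE:(line + 1) * COLORS_PER_LINE])
            for line in range(LINES)]


# Synthetic data just to exercise the function
demo = bytes(HEADER_BYTES) + b"".join(
    struct.pack(">H", n % 4096) for n in range(COLORS_PER_LINE * LINES))
pal = read_pal(demo)
print(len(pal), "lines of", len(pal[0]), "colors")
```

Each returned line holds 5 consecutive groups of 16 colors, matching the 5 palettes loaded per scanline.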

Best regards
Cyg