High-color video playback on STE

Started by Petari, 25-09-2012, 14:28:44


Petari

I was offline over the weekend. You did a lot in the meantime, as I see :)
I will look into it this afternoon.
Here is the color/palette table: it starts at px 0, color idx 0, then idx 1 ... idx 15. Then px 1, idx 0, 1, ... Total 320x16 bytes.

http://atari.8bitchip.info/PXMAPD32.BIN

I made an anim from the 100 frames you posted (Ava2 scene) - it looks pretty good. I need to clean it up a little and add audio, then I will post it too. I used basically the same code as for PCS playback, except that the palette updates itself, of course. Centering is done using Timer B. It is a version for Steem, not real HW, so anyone can try it.
  •  

Petari

Here it is:

http://atari.8bitchip.info/CYG160.ZIP  (3.5MB)

It contains the muxed AV file, playback SW with source for Steem and Hatari (not for real STE), plus PC SW for muxing.
  •  

Cyg

Very nice :-)

I think you chose to package the method with dithering every 2 pixels.
I made it alternate the odd/even rule every other frame, to simulate a 25 fps 29k-color feeling, but we can still see the dithered pixels a little.

Could you make it for the other methods, especially the other dithering and the "2 palettes 29k colors feeling"?

I am working on adapting a simplified version of my .Net code to C for fast frame encoding; the other Hicolor FX will remain in .Net.

Thank you
Cyg
  •  

Petari

#63
I have not had time to try alternating palettes (29K feeling) yet, but will do it in the next few days for sure.

Regarding border color with 80 col/line: it is really an ST design problem - they should have provided a special register for border color instead of using color idx 0.
  If you read what I wrote here earlier, forget it. I have managed to separate color 0, index 0 of the line and the border color. That in fact means 82 colors/line. It does not increase file size, because there is padding already, and the hi-color conversion SW does not need to take care of it. All data will be arranged by the muxing SW. You need to set up your SW so that it has 16 freely selectable colors for the preloaded palette (I call it Palette 0 of the line). And of course, the same for the other 4 palettes. Hopefully it will make things simpler, and more free colors never hurt.

The line code looks like this:

line1   macro

        lea     $FFFF8240.w,a0          * 8T

        asr.l   #8,d0                   * 24T
        asr.l   #8,d0
        asr.l   #8,d0
*       asr.l   #8,d0                   * 96
*       nop

* Following gives 28T states less time for drive, so better to preload it ahead
* at previous sector
*       move.w  (a0),d0                 * 8T   This is first color of palette 0,
*                                       others are OK already

* d1 holds color idx 0 of palette 0 for line :

        move.w  d1,(a0)                 * 8T  - this writes at exact line start
        asr.l   #6,d0                   * 20T

* Note: all readings from RAM, shifter are actually writes there, from IDE port
* due to CATA logic

        movem.w (a0),d0-d7/a1-a7        * 72T, 32 bytes - overshot !
        movem.w (a0),d0-d7/a1-a7        * 72T
        movem.w (a0),d0-d7/a1-a7        * 72T
        movem.w (a0),d0-d7/a1-a7        * 72T  , 288 so far
        movem.w (a0),d0-d7/a1-a7        * 72T - this is for border and next line

        move.l  usp,a0                  * 4T  -  dummy buffer for paddings
        movem.w (a0),d1-d7              * 16 bytes (overshot), padding 1/2, 8+32=40T states

* in d1 loaded next line pal0, color 0 !

        endm

Line2 is the same, just with 2 bytes less padding; line3 has no padding. 3 lines together fit in 1 sector.
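As a quick sanity check of the macro timing, the cycle counts from the comments can be added up. This is only a sketch: the counts are copied from the comments above, and the target of 512 CPU cycles per scanline assumes the usual 50 Hz display timing at 8 MHz.

```python
# Cycle counts copied from the comments in the line1 macro (active lines only).
LINE1_CYCLES = [
    ("lea $FFFF8240.w,a0", 8),
    ("asr.l #8,d0", 24),
    ("asr.l #8,d0", 24),
    ("asr.l #8,d0", 24),
    ("move.w d1,(a0)", 8),
    ("asr.l #6,d0", 20),
    ("movem.w (a0),d0-d7/a1-a7", 72),
    ("movem.w (a0),d0-d7/a1-a7", 72),
    ("movem.w (a0),d0-d7/a1-a7", 72),
    ("movem.w (a0),d0-d7/a1-a7", 72),
    ("movem.w (a0),d0-d7/a1-a7", 72),
    ("move.l usp,a0", 4),
    ("movem.w (a0),d1-d7", 40),
]

# Assumption: one 50 Hz scanline lasts 512 CPU cycles.
SCANLINE_CYCLES = 512

total = sum(cycles for _, cycles in LINE1_CYCLES)
print(total, "cycles per macro,", SCANLINE_CYCLES - total, "to spare")
```

If the sum matches the scanline length, the macro can be repeated line after line without extra synchronisation.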

It is up to the muxing SW to place all data in the proper places. The player just needs to preload reg d1 with the pal0, idx 0 color before starting the hi-color show part.


Doing something similar with the usual copy-from-RAM code would not be simple, I guess - here I have free time before the color updates, while the usual hi-color code uses that time to preload registers with color data.

What format your SW should produce for this:
I think it is best that you do: a 200 or 199 line bitmap, as it is already.
Then 32 bytes of preload, with all 16 free colors - the border color will be set independently of it.
After that, 199 lines of color data, starting at palette 1, then 2, 3, 4, and 0 of the next line.
160 bytes for 1 line, as you did already.
At the end there should be 32 zeros, or better nothing (no need to fill preload data for the non-existent line 201, but it is not relevant for me, do whatever is easiest for you). The muxing SW will place the border color where it is needed.
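To make the requested layout concrete, here is a rough sketch of the byte offsets it implies. The function and constant names are hypothetical, and it assumes 160 bytes per bitmap line (320 px, 4 bitplanes) and the same line count for bitmap and color data.

```python
# Sketch of the proposed per-frame layout; all names here are hypothetical.
BITMAP_LINE = 160   # bytes per low-res bitmap line (320 px, 4 bitplanes)
PALETTE = 32        # 16 colours x 2 bytes
COLOUR_LINE = 160   # palettes 1-4 of a line plus palette 0 of the next line

def frame_layout(lines=199, pad_last=True):
    """Return byte offsets of the three parts and the total frame size."""
    preload = lines * BITMAP_LINE        # palette 0 of line 0 follows the bitmap
    colour = preload + PALETTE           # palette 1 of line 0 starts here
    total = colour + lines * COLOUR_LINE
    if not pad_last:
        total -= PALETTE                 # drop palette 0 of the missing next line
    return {"bitmap": 0, "preload": preload, "colour": colour, "total": total}

def palette_offset(line, pal, lines=199):
    """Offset of palette `pal` (0-4) belonging to scanline `line`."""
    lay = frame_layout(lines)
    if pal == 0:
        if line == 0:
            return lay["preload"]
        # palette 0 is stored at the end of the previous line's colour data
        return lay["colour"] + (line - 1) * COLOUR_LINE + 4 * PALETTE
    return lay["colour"] + line * COLOUR_LINE + (pal - 1) * PALETTE

print(frame_layout())
```

With this arrangement palette 0 of line n+1 sits directly before palette 1 of line n+1, so the color stream can be read strictly in order.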
  •  

Petari

#64
Example with 80 col/line :

http://atari.8bitchip.info/Lorax80cplTest3.avi       (13MB)

Done with a test version of the new PCS, fast mode m4, error diffusion. Overall it looks very good, especially as there are a lot of gradients in the video. Later I will do it with some slower, better quality mode.
  •  

Cyg

Nice video indeed, very colourful !

I hope to finish my C port by tomorrow; it will include my 54-color modes, pre & post dithering and maybe the 2 pals mode (or by Thursday) + your 80/82 cols mode.
I believe that your previous bin file is no longer up to date? You can explain the rule in words or with a csv file: absolute/relative color index, pixel coord where it starts & ends.

Cyg
  •  

Petari

#66
The bin file is OK and up to date. I was probably lucky with it - it fits the new, independent border color scheme perfectly too, because the end of line is at the right moment - better said, the start of writing the next line's first palette was at the right place for the border update.

I don't use csv files. It is a simple binary file with byte values, for 320 hor. pixels and all possible color index values (0-15) for them.
Values can only be in the range 0-4. They give the palette number, out of the 5 palettes for that line (0-4), from which the color will be used - for the current px and color index.

It starts of course at hor 0, vert 0 px (actually, vert is irrelevant, the table is valid for all lines). The first 16 bytes are for px 0: first color index 0, then 1, 2 ... 15. The color index is what is stored in the 4 bitplanes for that pixel.
Then for horizontal px 1, again 16 bytes, for color indexes 0...15. And so on up to px 319.
The size is of course 320x16 bytes.
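For a converter, the table lookup described above could be sketched like this. The function names are hypothetical and the demo table is synthetic, not the real contents of PXMAPD32.BIN.

```python
# Sketch only: names are hypothetical, the layout follows the description above.
WIDTH = 320      # horizontal pixels covered by the table
INDEXES = 16     # colour indexes 0-15 per pixel
PALETTES = 5     # palettes 0-4 available on each line

def palette_for(table, x, idx):
    """Return the palette number (0-4) used at pixel x for colour index idx."""
    if len(table) != WIDTH * INDEXES:
        raise ValueError("table must be 320*16 = 5120 bytes")
    if not (0 <= x < WIDTH and 0 <= idx < INDEXES):
        raise ValueError("x or idx out of range")
    value = table[x * INDEXES + idx]
    if value >= PALETTES:
        raise ValueError("table entry outside 0-4")
    return value

def resolve_colour(table, line_palettes, x, idx):
    """line_palettes: 5 lists of 16 colour words for one scanline."""
    return line_palettes[palette_for(table, x, idx)][idx]

# Synthetic table for demonstration (NOT the real timing data):
# every colour index simply steps to the next palette every 64 pixels.
demo_table = bytes(min(4, x // 64) for x in range(WIDTH) for _ in range(INDEXES))
print(palette_for(demo_table, 100, 3))
```

In real use the table would be loaded once with `open("PXMAPD32.BIN", "rb").read()` and reused for every line, since the vertical position does not matter.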

I don't know what you mean by relative color index - I know only indexes 0-15 for ST(E) low res mode.

The table is actually pretty much periodic. Changes (it is always a change to the next palette up for some index) follow the clock cycles of the code used. They usually happen every 4 cycles - exactly the time it takes to update 1 color (word write). When movem execution starts, there are then 12 cycles with the same pattern - because the movem opcode fetch takes 8 cycles. I hope that was understandable.

Added video with custom border color:

http://atari.8bitchip.info/F1indBc.avi

Ah, and one question: how do you perform pre-dithering/error diffusion? Can I do it directly with Photoshop or some other SW?
  •  

Petari

#67
Tried Cyg 80 col test pic on STE :



Of course, there are errors with some colors - it would be a little miracle if it worked without the timing table. Otherwise the format itself is good enough - but there is no need for global colors (4 bytes at start). I added an option to set the desired border color in the muxing SW.

Btw. my test pic PatD32.png is pretty accurate - there is no resize. I was able to adjust the capture card with BT Tweaker to get exact 640px width. But sometimes it is really hard to judge the exact position - it is an analog thing, there is some ringing for instance. As said, it is pretty periodic, so I finished the table faster than I thought; the last 200 px entries went fast - I checked only a couple of times on the screenshot. And it is 100% OK.
  •  

Cyg

Hi Petari,

I am very close to having my converter running in C and therefore going much faster.
Could you please send me some frames from the Lorax to tune my implementation and compare with your video result made using Photochrome?

Thanks !
Cyg
  •  

Petari

Sure. I will post them this afternoon.
The Photochrome code is not the final version yet, of course.
I am about to do first tests with dual palettes on a real STE, using the ava2 conversions made by you. But I don't have a TV or CRT monitor now, so I can look at it only with a TV card. I will try to capture so that 2 frames are blurred together - then I should get the average colors from the 2 color data sets.

I want to try pre-error diffusion while converting 24-bit pictures to 12-bit colorspace (4-4-4) - but cannot find SW that can do it - do you know of something?
  •  

Petari

Here are 400 pics from Lorax:    http://atari.8bitchip.info/lor400fr.rar

If you can, please make the sequence numbered with 4 digits, like:  (nam)0000.ext ... (nam)0399.ext. That way you can make an anim file too with the supplied muxing SW and try it in Steem.
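A minimal sketch of the 4-digit numbering scheme, in case it helps on the converter side. The base name "lor" and the extension "png" are placeholders only.

```python
def frame_name(base, index, ext, digits=4):
    """Build a zero-padded frame file name such as lor0000.png."""
    return f"{base}{index:0{digits}d}.{ext}"

# Placeholder base name and extension, 400 frames numbered 0000-0399.
names = [frame_name("lor", i, "png") for i in range(400)]
print(names[0], names[-1])
```

Zero padding keeps the files in the right order when they are sorted by name.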

I tinkered a little with overscan playback. A rough calculation says that about 400x220 max may be possible. At such a high res, part of the bitmap needs to be loaded during scanlines - which means that we cannot have many colors/line. 56 seems to be the max. That would be 3x14. But I don't know how to do overscan, so I need more details. Blitter is not an option here - it is slower than the cart. port reversed read. If maintaining overscan takes too much CPU time, it is not possible. Do you have line color update code without blitter? If not, I can write something suitable for the cart. port, but I need to know how to achieve 400px width.

  •  

Petari

So far the best video:   http://atari.8bitchip.info/LorSierra24a.avi
Gradients are almost perfect - and they seem to be the hardest problem. I made this by converting 24bpp images to 12bpp (4096 colors) with Sierra 24a error diffusion at 100%. I was looking for something like this for a long time. The problem with almost all error diffusion methods is that they are not really good for animations. We need static pseudorandom diffusion dots on static scenes - otherwise the picture will be unstable. Ordered dithering produces such dots, but random looks much better. After that I just used PCS to convert them without dithering - in this case in 80 col/line mode.
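For reference, a small sketch of what such a 24-bit to 12-bit (4-4-4) error diffusion step can look like. It assumes that "Sierra 24a" means the filter usually written Sierra-2-4A (Sierra Lite), which pushes 2/4 of the error to the right pixel and 1/4 each to the pixels below-left and below. It is not the code of the tool actually used, and the function name is made up.

```python
def quantise_444(pixels):
    """Reduce rows of (r, g, b) 0-255 tuples to 4 bits per channel (0-15).

    Error diffusion with the Sierra-2-4A (Sierra Lite) kernel, applied to
    each channel separately:      X  2
                               1  1       (all divided by 4)
    """
    height = len(pixels)
    width = len(pixels[0]) if height else 0
    out = [[[0, 0, 0] for _ in range(width)] for _ in range(height)]
    for ch in range(3):
        # Working copy of one channel, holding pixel value plus diffused error.
        buf = [[float(pixels[y][x][ch]) for x in range(width)]
               for y in range(height)]
        for y in range(height):
            for x in range(width):
                old = min(255.0, max(0.0, buf[y][x]))
                level = int(round(old / 17.0))   # 17 = 255 / 15
                out[y][x][ch] = level
                err = old - level * 17.0
                if x + 1 < width:
                    buf[y][x + 1] += err * 2 / 4
                if y + 1 < height:
                    if x > 0:
                        buf[y + 1][x - 1] += err / 4
                    buf[y + 1][x] += err / 4
    return [[tuple(p) for p in row] for row in out]

# A flat mid-grey area ends up as a mix of the two nearest 4-bit levels.
grey = [[(128, 128, 128)] * 8 for _ in range(8)]
result = quantise_444(grey)
print(sorted({p[0] for row in result for p in row}))
```

The filter is deterministic, so identical input frames give identical dot patterns; flicker only appears where the source pixels themselves change between frames.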

Somewhat related to the above:    http://atari.8bitchip.info/CYG1602P.ZIP
This is a video with player SW (Steem version) with alternating color data for 1 frame, made from this:    http://www.top250filmsdvd.com/atari/AV2_1pix2pal.zip

It works, and we really get the feeling of more colors. However, the problem is that gradations are not stable. It would be good to do more tests with other material - for instance from the Lorax pics that I posted, so we can see the difference better. I have not tested on a real STE so far, because I need to put part of the code in ROM, which is not so quick. But I think the instability can be seen even better in Steem - especially if you use the run to next Vbl option in the Steem Debugger.
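One plausible reading of the "29K" figure (an assumption, the thread does not derive it): if two 4-bit levels alternate frame by frame, the eye sees roughly their average, so each channel gets every sum from 0 to 30 instead of just 16 levels. A quick count:

```python
# Count the distinct per-channel averages of two alternating 4-bit levels.
LEVELS = 16  # 4 bits per channel on the STE

sums = {a + b for a in range(LEVELS) for b in range(LEVELS)}
per_channel = len(sums)        # distinct perceived levels per channel
total = per_channel ** 3       # three channels: R, G, B

print(per_channel, "levels per channel,", total, "perceived colors")
```

The instability mentioned above fits this picture: the in-between levels only exist as a time average, so they depend on the two frames being blended by the display or the eye.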

And again related to the above: dual (alternating) color will certainly be necessary for overscan.
That is because I can achieve it only by loading part of the bitmap data during scanlines, interleaved with color data. And that means that 2 different streams must be loaded in any case, even if the color data is the same - so why not make it alternating then. Data rate will be about 2.4 MB/sec for 416x231 px. And I already have almost that much with 80 colors/line mode. With overscan I will go back to 48 colors/line - no time to load more, because overscan needs to be maintained, which costs some CPU time. At the moment I am a little stuck with overscan, because MPP does not work really well on a real STE. I need to fix some things in the converting SW too.


  •  

Cyg

Hi Petari !

Sorry for the delay, I am very busy at work & home, but I just finished a stable and fairly optimized version of my converter in C. It is surely not the quickest but the result is nice; it currently takes 1 min per picture and I hope to halve that again. (I started from 1h in VB.Net!) But in fact the time depends on the quality; I am trying to find a parameter to make it dynamic. The fastest optimization could take 5-6s with an average result.

I also have to add some dynamic parameters before sending it to you: dithering mode, number of cols. I do have an overscan routine compatible with any ST, with around 48 colors, which you could improve with your hardware tricks.

I agree about the stability & gradient.
The pre-dithering method + 2 alternating palettes should help a lot because it's stable. Post dithering is unpredictable as it depends on the colours available to the optimization.

I cannot run CYG1602P.TTP under Hatari here at work; I will try it with STeem at home in a couple of hours.

Cyg
  •  

Petari

I will not use more than 48 colors/line with overscan. It could load more, even 5-6 palettes in theory, but the problem is that there is not enough time to load the bitmap in border periods, so part of the bitmap needs to be loaded during scanlines - which means bitmap data interleaved with color data. What seems good is: 3 palettes and 75 (avg) bytes of bitmap data in 1 line. It means loading 2/3 of the bitmap during scanlines, and only 1/3 in borders. Should be good for 416x231. With 416 px hor. there are 22+ bytes of invisible data in a line, and I must serve them too (no time for any data manipulation). So, 230 bytes per line must be loaded. How is it with 400 px hor.?
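The 230 bytes per line can be checked with a little arithmetic. This is a sketch with a made-up helper name; it assumes 4 bitplanes in 16-pixel groups and takes the 22 invisible bytes from the post. Whether a 400 px mode would really need fewer bytes loaded per line is not settled here.

```python
def visible_bytes(width_px, bitplanes=4):
    """Bytes of bitmap data covering width_px pixels (16-pixel groups)."""
    if width_px % 16:
        raise ValueError("width must be a multiple of 16 pixels")
    return width_px // 16 * 2 * bitplanes

INVISIBLE = 22  # invisible bytes per line, figure quoted in the post

for width in (320, 400, 416):
    print(width, "px ->", visible_bytes(width), "visible bytes")
print("416 px overscan line:", visible_bytes(416) + INVISIBLE, "bytes")
```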

With alternating palettes we should preprocess images to 15-bit color, I guess. But the usual tools with error diffusion offer at most 12-bit fixed palettes (which is almost essential for animations). So, I don't know what the best way is to do preprocessing for alt. palettes.
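One possible way to preprocess for alternating palettes, offered only as a sketch under assumptions: quantise each 8-bit channel to one of the 31 levels that two 4-bit values can average to (slightly fewer than true 15-bit), then split that level into the two values shown on alternate frames. The function name is hypothetical and no dithering is included.

```python
def split_channel(v8):
    """Map an 8-bit channel value to a pair of 4-bit values (frame A, frame B).

    The pair sums to the nearest of 31 levels (0-30) and its two members
    differ by at most 1, which keeps flicker between frames small.
    """
    if not 0 <= v8 <= 255:
        raise ValueError("channel value must be 0-255")
    level = int(round(v8 * 30 / 255))   # 0..30
    a = level // 2
    b = level - a
    return a, b

for v in (0, 128, 255):
    print(v, "->", split_channel(v))
```

Error diffusion could then be run against the 31-level grid instead of the 16-level one before splitting, but that part is left open here.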

Anim. playback should work in Hatari when you mount the folder with the files as a GEMDOS hard disk.
  •  

Petari

Overscan with dual palettes:

http://atari.8bitchip.info/OverscanOvertake.avi

More details:  http://atari.8bitchip.info/movpst.php

Btw. I found a really useful and well made tool for color reduction with error diffusion, and in the best place - Virtual Dub. It is the 'Error diffusion' filter - it can set bit depths for each channel and other things.
  •