High-color video playback on STE

Started by Petari, 25-09-2012, 14:28:44



kovacm

Quote from: Petari on 27-11-2012, 15:59:27
Overscan with dual palettes:

http://atari.8bitchip.info/OverscanOvertake.avi
:) :) :) This looks awesome! Only at the end are there artefacts in a few lines; otherwise: perfect!
  •  

Petari

Yes. The errors come from the PCS conversion, which is still in beta. I'm currently doing another one with a slower but better PCS conversion mode: 1 min per frame.
Hopefully, Douglas Little will release updates soon.
  •  

Cyg

Great! Except for a bug on the left at 1'00?

I had a few hours over the last days and almost finished my C version of higheSTcolor. There are still a few hours of work left; I should deliver something to you by tomorrow night (depending on my newborn son's needs ;-) ).

It can go fast: 6 s per picture for a quite good result, and the quality is a little better than what you saw before, because I think I now have enough computing power for the algorithm to converge to the "ultimate" result.

What are your overscan color/palette timings?

Best regards
Cyg
  •  

Petari

I corrected the errors in the Overtake video, using PCS m5, which is quite slow with default settings: 1 min per conversion. I reconverted some 150 frames at the end. The URL is the same.
I will do some others with overscan in the coming days.

Overscan resolution is 416x228 px. Good thing: no border left and right. 48 colors/line. More is not possible, because parts of the bitmap must be loaded interleaved with the colors, and there is no way to have completely separate palettes per line: 6 colors from the last palette of the upper line carry over to the beginning of the line. I think this is the best that can be achieved. There are only 48 T-states for loading colors in the H-blank, which is enough for 10 colors. Also, I couldn't center the image vertically, because it would take too much CPU time. Not such a big flaw, I think. The palette sharing between lines slows down PCS conversion a lot.
But this is the max we can do with overscan, I think. Another useful resolution would be 320x240, for 4:3 AR material.
I still need to do some experiments with the dot timings: sometimes I get errors on the STE depending on how warm it is, it seems.
Maybe I need to shift by 1 px left or right.

Cyg, what preprocessing do you use?
  •  

Cyg

Preprocessing: nothing (or whatever is already included in the source image), plus an (x+y) alternating floor/ceil rounding rule, nothing new.
As postprocessing, Floyd-Steinberg, but a little lighter than before.
And the modes are 1 pix + 1 pal, 1 pix + 2 pal and 2 pix + 2 pal.
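A minimal Python sketch of what an (x+y) alternating floor/ceil rounding rule could look like when reducing 24-bit input to the STE's 16 levels per channel, assuming a plain linear 0..255 to 0..15 scale. The function names are hypothetical; this is an illustration, not higheSTcolor's code.

```python
def quantize_xy(value8: int, x: int, y: int) -> int:
    """Map an 8-bit channel value to a 4-bit STE level (0..15).

    Pixels where (x + y) is even round down, odd pixels round up, so a value
    lying between two levels becomes a checkerboard of both levels.
    Integer math avoids floating-point edge cases at exact levels.
    """
    num = value8 * 15
    lo = num // 255          # floor(value8 * 15 / 255)
    hi = -(-num // 255)      # ceil(value8 * 15 / 255)
    return lo if (x + y) % 2 == 0 else hi


def quantize_image_xy(rows):
    """rows: list of rows of (r, g, b) 8-bit tuples -> rows of 4-bit tuples."""
    return [
        [tuple(quantize_xy(c, x, y) for c in px) for x, px in enumerate(row)]
        for y, row in enumerate(rows)
    ]
```

A value that maps exactly onto a level (for example 17 -> 1) stays flat; only in-between values produce the alternating pattern, which is why it stays stable between frames of a static scene.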

Cyg
  •  

Petari

#80
About "directly included in the source image": do you mean something you do with some imaging software before giving it to your converter?

I added 3 conversions/captures on this page: http://atari.8bitchip.info/movpst.php

  •  

Cyg

#81
I do nothing: I use the exact 24bpp image without any preprocessing.

Error diffusion is done after the optimisation, to benefit from the colors already present in the optimized image. It uses the final palette to find the best candidates for error diffusion.
The other option is pre error diffusion, but that could put a little more pressure on the solver.
Both need to be compared; it depends on how well the solver handles a clean picture versus a noisy one (noise = pre error diffusion).
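Post error diffusion against the already optimized palettes can be sketched like this: each pixel snaps to the nearest color of its line's final palette, and the residual error is pushed to neighbours with the classic Floyd-Steinberg weights. This is a simplified sketch under one stated assumption: one palette per scanline, whereas the real high-color modes switch palettes at fixed x positions along the line. Names are hypothetical.

```python
def nearest(color, palette):
    """Index of the palette entry closest to color (squared RGB distance)."""
    return min(range(len(palette)),
               key=lambda i: sum((c - p) ** 2 for c, p in zip(color, palette[i])))


def post_fs(image, line_palettes):
    """Floyd-Steinberg error diffusion restricted to each line's final palette.

    image: rows of (r, g, b) values on the same scale as the palettes.
    line_palettes: one list of (r, g, b) colors per row (the optimizer output).
    Returns rows of palette indices.
    """
    h, w = len(image), len(image[0])
    work = [[list(px) for px in row] for row in image]  # mutable working copy
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        pal = line_palettes[y]
        for x in range(w):
            old = work[y][x]
            idx = nearest(old, pal)
            out[y][x] = idx
            err = [o - n for o, n in zip(old, pal[idx])]
            # 7/16 right, 3/16 down-left, 5/16 down, 1/16 down-right
            for dx, dy, wgt in ((1, 0, 7), (-1, 1, 3), (0, 1, 5), (1, 1, 1)):
                nx, ny = x + dx, y + dy
                if 0 <= nx < w and ny < h:
                    for c in range(3):
                        work[ny][nx][c] += err[c] * wgt / 16
    return out
```

Because the candidates come from the final palette, the diffusion never asks for a color the line cannot show, which is the point of doing it after the optimisation.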

Here are samples, 1 picture file + 1 palette file, 54 colors/line:


no dithering


pre-dithering X+Y, my favorite: beautiful and very stable between frames


pre-dithering X+Y + post error diffusion


post error diffusion


pre error diffusion (from a 256-color conversion with XNView, which is not designed for that; I don't have any picture tool here at work)


pre error diffusion + post error diffusion (from a 256 color conversion)


Original picture

1 pix + 2 palettes trial:


1 pix 2 palettes 54cols swapping result


1 pix 2 palettes 54cols swapping result + post error diffusion


1 pix 2 palettes 80cols swapping result


1 pix 2 palettes 80cols swapping result + post error diffusion

I think I should push the error diffusion a little more on the dual-palette result: as there are more colors available, the result should be less noisy.

If you have any good pre-dithered picture, we can give it a try.

Cyg
  •  

Petari

#82
In principle, post error diffusion should be better, but it is hard to achieve, I guess. PCS error diffusion still has some flaws, and I haven't tried every setting yet.
What worked best for me is pre error diffusion or ordered dithering; the latter is OK with 15bpp.
For single-palette conversion we need to do the error diffusion while converting to 12bpp: of course not to an indexed palette (max 256 colors), but to 4 bits per color channel. I got the best results with Sierra 24A error diffusion; it produces pretty much static dot patterns on static parts.
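Once a pixel has been reduced to 4 bits per channel, it still has to be packed into an STE color register word. A small sketch of that step: on the STE each nibble keeps the three ST-compatible bits in bits 2..0 and stores the extra least significant bit in bit 3, so a level is not written as a plain binary nibble. The linear 8-bit to 4-bit rounding helper is an assumption for illustration; all names are hypothetical.

```python
def to_ste_nibble(v: int) -> int:
    """4-bit level (0..15) -> STE register nibble (LSB moved to bit 3)."""
    if not 0 <= v <= 15:
        raise ValueError("STE levels are 0..15")
    return (v >> 1) | ((v & 1) << 3)


def from_ste_nibble(n: int) -> int:
    """STE register nibble -> 4-bit level (inverse of to_ste_nibble)."""
    return ((n & 7) << 1) | (n >> 3)


def ste_color_word(r: int, g: int, b: int) -> int:
    """Pack three 4-bit levels into a $0RGB color register word."""
    return (to_ste_nibble(r) << 8) | (to_ste_nibble(g) << 4) | to_ste_nibble(b)


def quantize8to4(v8: int) -> int:
    """Round an 8-bit channel to the nearest of 16 evenly spaced levels."""
    return (v8 * 15 + 127) // 255
```

For example, level 8 is written as nibble 4 (binary 0100), the same as ST color 4, while level 1 becomes nibble 8.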

With dual palette it is much better: conversion to 15bpp with error diffusion or ordered dithering produces almost invisible dots.
And the best tool for animations is the Error Diffusion filter for VirtualDub, with lots of settings. It gives static patterns with Floyd-Steinberg E.D.
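Why dual palette behaves roughly like 15bpp can be shown with a little arithmetic, assuming that in 1 pix + 2 pal the two palettes alternate every VBL and the eye simply averages the two displayed values. Two 4-bit levels then give 31 distinct per-channel intensities (0, 0.5, ..., 15) instead of 16, close to 5 bits per channel. The split helper is a hypothetical way of choosing the two palette values for a target level.

```python
from itertools import product


def perceived_levels(bits_per_channel: int = 4) -> list:
    """Distinct per-channel intensities when two palettes alternate every VBL.

    Assumes perception is the plain average of the two displayed values.
    """
    top = (1 << bits_per_channel) - 1
    return sorted({(a + b) / 2 for a, b in product(range(top + 1), repeat=2)})


def split_15bpp(target: int) -> tuple:
    """Split a target level 0..30 (in half-steps of 4-bit levels) into values
    for palette 1 and palette 2 that differ by at most 1, to limit flicker."""
    a = target // 2
    return a, target - a
```

Keeping the two values at most one step apart is a design choice aimed at minimizing visible flicker; wider pairs reach the same average but flicker more.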

Here are example pictures:    http://atari.8bitchip.info/underw.zip

Capture of video:  http://atari.8bitchip.info/Underwater.avi

Same on YouTube:  http://youtu.be/urgixT_MPCU

Btw, it is less sharp than in reality, because I needed to filter out horizontal lines/bars caused by the Atari's non-standard video output: the TV card cannot capture it really well, and some shifting appears. Instead of 625 interlaced lines, the Atari produces 2x313 lines.

Btw, your last pic (80 cols, 2 pals) looks really good.
  •  

Cyg

Sierra24 seems very interesting; I made a first try and it shows no big changes after encoding :-)
But I think I still prefer the X+Y method for static pictures.

I should post my exe here tonight, so you will be able to try it yourself!
For the moment I have a regression bug to solve, and an overall lightening with the X+Y method (something like a halftone that is too light).
I am also trying to better balance the 2 palettes using 1 pix: the mixed result is good, but the 2 individual pictures are not so nice, which could be a problem for quick eyes :-)
And I have to check that the result still displays fine on the ST, and not only rely on my BMP preview on PC :-)

Cyg
  •  

Cyg

#84
Here is a beta version of my optimizer "higheSTcolor". The latest enhancements are not optimized for speed yet; I'll try to do that later...

http://www.top250filmsdvd.com/atari/higheSTcolor01.exe

It should work fine for 1pix/1pal and 1pix/2pals, but not yet for 2pix/2pals.

I'll let you look at the options; basically it looks like:

higheSTcolor.exe -4 --mode=0 --dith=1 --pixpal=3 0000.bmp

where "-4" is the quality, between 1 and 5; run time grows exponentially, and 3 already gives a good result
--mode selects the display mode
--dith selects the pre-dithering method; there will always be a 2nd result with post error diffusion, as it costs nothing and could lead to something better
--pixpal specifies the number of pix & pal
Sorry, no wildcards for the file...

Outputs:
*.pal : 1st palette file (no global color for the moment)
*.pa2 : 2nd palette file (useful when pixpal=2 or 3)
*.pix : 1st bitmap file
*F.pix : 1st bitmap file with error diffusion
*_preview.bmp : preview file using the 1st pal
*_preview2.bmp : preview file using the 2nd pal
*_previewF.bmp : preview file using the 1st pal with error diffusion
*_previewF2.bmp : preview file using the 2nd pal with error diffusion

54 cols mode:
1pix, 1 pal:
higheSTcolor.exe -3 --mode=0 --dith=0 --pixpal=1 0000.bmp
higheSTcolor.exe -3 --mode=0 --dith=1 --pixpal=1 0000.bmp
1pix, 2 pal:
higheSTcolor.exe -3 --mode=0 --dith=0 --pixpal=3 0000.bmp
higheSTcolor.exe -3 --mode=0 --dith=1 --pixpal=3 0000.bmp
higheSTcolor.exe -3 --mode=0 --dith=2 --pixpal=3 0000.bmp
higheSTcolor.exe -3 --mode=0 --dith=3 --pixpal=3 0000.bmp

80 cols mode: replace mode=0 with mode=1

Upcoming:
- overscan mode; could you please give me your color timings?
- 2pix + 2pal
- some speed optimisation

Cyg
  •  

Petari

Here is the timing table for overscan: http://atari.8bitchip.info/OS6.csv

At the top is the PCS header. It means: 416 px horizontally, 16 colors per palette, 3 palettes, 2 palettes to skip. The 0 is irrelevant.
The point is that because it is not possible to load 16 colors in the horizontal border, palettes are shared between lines. -1 in the table means using the last palette from the upper line (where it is pal 2). The 2 palettes to skip are there because PCS skips the first bitmap line's data, and we need to preload 1 palette before starting the high-color line code. Basically, for color data you need: 32 bytes of pal -1 for the first line, then 3 palettes per line, and so on...
In the last line, only 8 colors from the last palette (2) will be used. You may pad the rest to 16 colors, or cut right after 8 colors.

I started converting 500 frames with your code. At quality 4 (and 80 c/l) it goes pretty slowly. When it finishes, I'll mux and see how it looks.
I need a switch to disable generation of the preview BMP files (which are of course useful when tuning settings): they take too much disk space when converting hundreds of pics.

At the moment I'm working on animation playback for people equipped with faster hard disk adapters and drives, like UltraSatan, a newer IDE drive or CF card, or a faster SCSI drive + ACSI-SCSI adapter. For instance, the Mega STE internal adapter is fast enough with a newer SCSI drive. Or ICD adapters. The minimum is a 1100 KB/sec transfer rate. Resolution is 320x158 px at 12.5 fps. More is just not possible.
I made code for 48 col/line high-color with a custom border. 54 colors/line is not really good, because it means more data, and we are very limited in loading time and speed.
Here it is (I can post the csv timing table for it later):






lea $FFFF8209.w,a3

lea $FFFF8240.w,a6
lea $FFFF8250.w,a2
move.l a2,usp
lea $FFFF8242.w,a4
lea $FFFF8244.w,a5

clr.l $40.w   *  black border - or custom


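moveq #0,d0
moveq #$40,d7
l2AF96 move.b (a3),d0    * wait until video counter low byte ($FFFF8209) is non-zero
beq.s l2AF96
sub.w d0,d7
lsl.w d7,d0    * shift by (64 - counter): variable duration syncs CPU to the shifter
move.w #12,d0    * delay loop count
l2AFA2 dbf d0,l2AFA2
nop
nop
movea.l palloc(pc),a7    * a7 = pointer to the color data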


rept 199    *  Or 158   if  movie for UltraSatan and co (1100KB/sec min data rate for 12.5 fps)


   movem.l  (a7)+,d0-d7/a0-a3     *108T (12+12*8)  preload 16+8 colors 
* complete P0 and first half  of P1

   movem.l  d1-d7,(a5)   * 64T   * P0 col 2-15
   move.w  d0,(a4)    * 8T     * P0 col1 , 15 colors of P0 written
   swap   d0    * 4T

* 184T so far

*boe -   border end
   move.w  d0,(a6)   * 8T  update color 0 of P0  at exact line start

   movem.l  (a7)+,d0-d3  *  12+4*8 = 44T   - second half of P1

   movem.l  a0-a3,(a6)     *  40 T  first half of P1 to shifter     at px 56

   move.l   usp,a3     * 4T    px

   movem.l  d0-d3,(a3)   *  40T   second half   P1    at px 100

   movem.l  (a7)+,d0-d7   * 76T , complete P2 pref.   at px 140

   movem.l  d0-d7,(a6)   *  72T   all 16 colors  P2   at px 216

* 28T states free here - just delay

    addq.l #1,d0   * 8T
    asr.l #6,d0   * 20T

*bos  -  border start
   move.w  $40.w,(a6)      * 16T    need 312T from  boe ( move.w d0,(a6) ) - here relative +8T wr.


endr
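A quick way to sanity-check a rept line body like the one above is to add up the per-instruction cycle counts from its comments and compare the total with one 50 Hz scanline, taken here as 512 CPU cycles (8 MHz 68000, 64 us line). This Python tally only restates the T-state figures already written in the listing; the names are hypothetical.

```python
# (instruction, cycles) exactly as annotated in the 48 col/line routine
LINE_48COL = [
    ("movem.l (a7)+,d0-d7/a0-a3", 108),
    ("movem.l d1-d7,(a5)", 64),
    ("move.w d0,(a4)", 8),
    ("swap d0", 4),
    ("move.w d0,(a6)", 8),       # boe: color 0 written at line start
    ("movem.l (a7)+,d0-d3", 44),
    ("movem.l a0-a3,(a6)", 40),
    ("move.l usp,a3", 4),
    ("movem.l d0-d3,(a3)", 40),
    ("movem.l (a7)+,d0-d7", 76),
    ("movem.l d0-d7,(a6)", 72),
    ("addq.l #1,d0", 8),         # padding delay
    ("asr.l #6,d0", 20),         # padding delay
    ("move.w $40.w,(a6)", 16),   # bos: border color
]

CYCLES_PER_LINE_50HZ = 512  # assumed: 8 MHz * 64 us


def total_cycles(listing):
    """Sum the cycle column of a (instruction, cycles) listing."""
    return sum(cycles for _, cycles in listing)
```

The first four entries add up to 184, matching the "184T so far" comment, and the whole body comes to exactly one 512-cycle line, so every rept iteration stays locked to the display.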





  •  

Cyg

Before encoding a bunch of 80 cols pictures, did you check on a single picture that it now fits? Just to avoid warming up the CPU for nothing...

I am doing some "underwater" testing with all the different dithering/error diffusion options available.

I've noted the option to enable/disable the preview files; meanwhile, you can also make a big batch file that deletes the files right after encoding.
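Since the converter takes no wildcards, a small driver script can stand in for the big batch file. A sketch, assuming higheSTcolor.exe is on the path and writes its preview BMPs next to the input frame with the suffixes from the output list above; the flags are the ones from the usage lines, while the function names and the run switch are hypothetical.

```python
import subprocess
from pathlib import Path

EXE = "higheSTcolor.exe"


def build_command(bmp, quality=3, mode=0, dith=1, pixpal=1):
    """One converter call per frame, e.g. higheSTcolor.exe -3 --mode=0 ..."""
    return [EXE, f"-{quality}", f"--mode={mode}", f"--dith={dith}",
            f"--pixpal={pixpal}", bmp]


def preview_files(bmp):
    """Preview BMP names assumed to be written next to the frame."""
    p = Path(bmp)
    return [str(p.with_name(p.stem + suffix + ".bmp"))
            for suffix in ("_preview", "_preview2", "_previewF", "_previewF2")]


def convert_all(frames, run=False, **opts):
    """Build (and optionally run) all conversions, deleting previews as we go."""
    cmds = [build_command(f, **opts) for f in frames]
    if run:
        for frame, cmd in zip(frames, cmds):
            subprocess.run(cmd, check=True)
            for name in preview_files(frame):
                Path(name).unlink(missing_ok=True)  # free disk space right away
    return cmds
```

For example, `convert_all([f"{i:04d}.bmp" for i in range(500)], run=True, quality=4, mode=1)` would convert 500 frames in 80 cols mode and keep only the .pix/.pal/.pa2 outputs.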

Cyg
  •  

Cyg

#87
Here is version 5: http://www.top250filmsdvd.com/atari/higheSTcolor05.exe
Notes:
- there was a small bug in version 2 for the .pal & .pa2 files: the first 2 colors are unused even though the overall size seems right => the 2 very last colors are missing
- a major bug in 1pix + 2pal from version 3 to version 4, fixed since version 5

- new 48 cols mode that should be compliant with your code (if I made no mistake in the opcode timings)
- --preview option: no preview by default
- correction of a quality bug, should improve dither=3 a little
- dither=3 is now available when pixpal=1 (1 pix / 1 pal): it does about the same as the forced (X+Y) pre-dither, but it evaluates the best candidates during the optimization, not as an input => fewer constraints => easier to find a correct solution.
- more additional colors are selected in the 1 pix + 1 pal method, improving the result by a few %
- double image optimization "motion blur", 1 pix + 2 pal

Mode 48 cols preview with dither=3 ("X+Y mod 2" V2) on 1 pix + 1 pal:

My new favorite, it beats the previous X+Y mode (pre-dithering), should be very stable


idem, with post error diffusion

If you do it at 12.5 fps, does it mean that you load the palette 4 times (at every VBL)? If yes, then it could be possible to motion-blur the frames in between without loading any extra pixels (shared pixels), like I do in my latest demos. Without any precalc you could double the frame rate to 25 fps: 12.5 fps pix + 25 fps pal.
But quality decreases.
It has been available since version 04: you just have to rename the 2nd (intermediate) frame after the frame just before it, adding "_bis". For example, 0002.bmp is renamed to "0001_bis.bmp" so that frames 1 & 2 (0001.bmp and 0001_bis.bmp) are handled in a single pix + 2 pal, then 0003.bmp & 0003_bis.bmp for frames 3 & 4, etc.
I'm working on it a little to improve the result (I hope). It is not compatible with dither=1, but with the others it should be fine, especially dither=3.
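The "_bis" naming for the motion-blur mode can be generated rather than done by hand. A sketch that only computes the rename plan (every 2nd frame is renamed after the frame before it), assuming plain numbered names like 0001.bmp in display order; the function name is hypothetical, and a trailing unpaired frame is simply left alone.

```python
import re


def bis_rename_plan(names):
    """Pair frames for 1 pix + 2 pal motion blur.

    names: frame file names such as "0001.bmp", in display order.
    Returns (old_name, new_name) renames: the 2nd frame of each pair becomes
    "<first frame number>_bis.bmp". An odd last frame is left unpaired.
    """
    plan = []
    for first, second in zip(names[0::2], names[1::2]):
        m = re.fullmatch(r"(\d+)\.bmp", first)
        if not m:
            raise ValueError(f"unexpected frame name: {first}")
        plan.append((second, f"{m.group(1)}_bis.bmp"))
    return plan
```

Applying the plan is then a loop of `os.rename(old, new)`; computing it first makes it easy to check before touching thousands of files.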

Concerning Underwater, Sierra 24 at 12 bits is indeed very good with 1 pix + 1 pal. I still like the "X+Y mod 2" method on a 24-bit source image, even more now with the dither=3 option described above.
For 1 pix + 2 pals, dither=3 on a 24-bit source image is interesting because it is stable.

Version 5 now implements 2 new kinds of post error diffusion (plus Floyd-Steinberg as before); use the --ed option:
--ed=1 : Floyd-Steinberg, as before
--ed=2 : Sierra 2-4
--ed=3 : Atkinson
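For reference, the three kernels behind these options can be written as (dx, dy, weight) taps with a common divisor, assuming "Sierra 2-4" means the Sierra-2-4A (Sierra Lite) kernel. The weights are the standard published ones; the grayscale quantizer around them is a hypothetical illustration, not higheSTcolor's implementation.

```python
# ed option -> (name, divisor, [(dx, dy, weight), ...])
KERNELS = {
    1: ("Floyd-Steinberg", 16, [(1, 0, 7), (-1, 1, 3), (0, 1, 5), (1, 1, 1)]),
    2: ("Sierra-2-4A (Sierra Lite)", 4, [(1, 0, 2), (-1, 1, 1), (0, 1, 1)]),
    # Atkinson only passes on 6/8 of the error, which keeps contrast higher
    3: ("Atkinson", 8, [(1, 0, 1), (2, 0, 1), (-1, 1, 1), (0, 1, 1),
                        (1, 1, 1), (0, 2, 1)]),
}


def diffuse(gray, levels, ed):
    """Quantize grayscale rows (floats 0..1) to `levels` steps with kernel `ed`.

    Returns rows of level indices 0..levels-1.
    """
    _, div, taps = KERNELS[ed]
    h, w = len(gray), len(gray[0])
    work = [list(row) for row in gray]   # independent copy of every row
    out = [[0] * w for _ in range(h)]
    top = levels - 1
    for y in range(h):
        for x in range(w):
            q = min(top, max(0, round(work[y][x] * top)))
            out[y][x] = q
            err = work[y][x] - q / top
            for dx, dy, wgt in taps:
                nx, ny = x + dx, y + dy
                if 0 <= nx < w and ny < h:
                    work[ny][nx] += err * wgt / div
    return out
```

Sierra Lite touches only three neighbours, which is one reason it tends to give calmer, more static dot patterns than Floyd-Steinberg on slowly changing areas.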

Advantage of a post dither vs a pre dither, because the pre-dithered image is more complex to solve:




dither 3, X+Y mod 2

Results would be about the same if the optimisation were perfect, but in practice it is far better with dither 3.

Cyg

  •  

Petari

I was busy over the weekend with some work around the house, so I didn't have much time for this.
I first tried to make some 80 c/l conversions, but something is wrong: maybe in my code, maybe you didn't get the correct csv timing, so the display was incorrect. I first need to recheck my muxing and playback code.
Then I went on to 54 c/l. I noticed an error with 4 zeros at the start, and bad pixels at the end of the last line. Anyway, I made some conversions with dual palette, dithering modes 1-3 (I only tried the first version you posted). It looks pretty good, but I see some flickering in the sky. I only looked at it in Steem; I haven't finished the playback SW for a real STE yet. I'll likely do it later today and then post a video capture.

One more thing that could make muxing to an AV file easier: if you can, please write all 3 outputs (pix, pal, pa2) into a single file, just in that order. It is simpler and faster to open one file per frame instead of 3, especially when you need to do it thousands of times in a row. It is easy to set the proper offsets in SW, depending on the part sizes.
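The single-file layout asked for here is a plain concatenation in pix, pal, pa2 order, with the muxer seeking by the known part sizes. A sketch, assuming the three outputs share a stem (e.g. 0000.pix, 0000.pal, 0000.pa2); the ".frm" suffix and the function names are hypothetical.

```python
from pathlib import Path

PARTS = (".pix", ".pal", ".pa2")  # order requested for muxing


def bundle_frame(stem, out_suffix=".frm"):
    """Concatenate <stem>.pix, <stem>.pal and <stem>.pa2 into <stem><out_suffix>.

    Returns {extension: (offset, size)} so a muxer can seek straight to a part.
    """
    offsets, blob = {}, bytearray()
    for ext in PARTS:
        data = Path(stem + ext).read_bytes()
        offsets[ext] = (len(blob), len(data))
        blob += data
    Path(stem + out_suffix).write_bytes(bytes(blob))
    return offsets


def read_part(path, offset, size):
    """Read one part back out of a bundled frame file."""
    with open(path, "rb") as f:
        f.seek(offset)
        return f.read(size)
```

Since the part sizes are fixed for a given mode, the offsets are the same for every frame and only need computing once.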

Here is csv for 48col/line:   http://atari.8bitchip.info/48cl.csv

But you don't need to rush implementing it: I need to test the timings more on different machines. It seems there are some differences between STE, Mega STE...

Code for 80col/line:



line1   macro

   lea  $FFFF8240.w,a0   * 8T

asr.l #8,d0   * 24T
asr.l #8,d0
asr.l #8,d0


    move.w  d1,(a0)   * 8T  - this should write at exact line start
    asr.l  #6,d0   * 20T


* The following executes reversed: it writes to (a0) instead of reading! The source is the IDE port, not the registers.
    movem.w  (a0),d0-d7/a1-a7    * 72T, 32 bytes - overshot !
    movem.w  (a0),d0-d7/a1-a7   * 72T
    movem.w  (a0),d0-d7/a1-a7    * 72T
    movem.w  (a0),d0-d7/a1-a7   * 72T  , 288 so far
    movem.w  (a0),d0-d7/a1-a7   * 72T - this is for border and next line
   
    move.l  usp,a0   *4T
    movem.w  (a0),d1-d7   * 16 bytes (overshot) , padding 1/2,  8+32=40T states

* d1 must hold color 0 of pal0 for the next line!

    endm



Overscan line code:



*  Overscan mode  -  416x228 px
* Not centered vertically

OverScan

* Set addresses :

lea $ffff8240.w,a0    * 8T
lea $ffff8250.w,a1
lea $ffff820A.w,a2
lea $ffff8260.w,a3


* above 5=40T

* address for loading bitmap data:  a6 
* a5 still free


  IdeSelpDataFR      *  16T  -  this must be executed from ROM !
* from now reversed read/write to dest

move.w d1,(a0)    * 8T  -  this sets first color in blank period

* sum 64T before the line itself

ol1 macro

  move a3,(a3) ;  8T

moveq #0,d0
move.b d0,(a3)       * 12T

*  Need  76+72+92+72+32+8 T states until next overscan setting. = 352T states exactly


* 28T :


  asr.l  #4,d1    * 16T
  asr.l  #2,d1      * 12T

* In lines 1,4,7... here just dummy delay code !


* Pause for drive possible this way:   8+20+28+8 = 64T


* Bitmap load

* Reversed !
movem.l (a6)+,d1-d7/a5    * loads 32+2 (overshot) bytes in 8+4x17=76T states
addq.l #2,a6   * bug compens  8T   

* Pal1 load in 2 stages :

movem.w   (a0),d1-d7 ;  8+4x8=40T  write colors - 8
movem.w   (a1),d1-d7 ;  8+4x8=40T  write colors - 8

* bmp  load

movem.l (a6)+,d1-d7/a5    * loads 32+2 (overshot) bytes in 8+4x17=76T states
addq.l #2,a6   * bug compens  8T


* Total 78 bytes loadable in 1 line -> 3x78=234, too much: we need 224, so line 1 of each group of 3
* runs a delay instead of the 10-byte load section

* Pal2 load in 2 stages :

movem.w   (a0),d1-d7 ;  8+4x8=40T  write colors - 8
movem.w   (a1),d1-d7 ;  8+4x8=40T  write colors - 8


* 352T after previous overscan setting

move.b d0,(a2) ; 8T
move a2,(a2) ; 8T

* Need 52T until next overscan setting

*Pal0 for next line, part 1 - but goes in this line too, little

movem.w   (a0),d1-d7 ;  8+4x8=40T  write colors - 8

* 12T free  - just delay here

lsl.l  #2,d1   * 12T

* 52T from last OS set.

move a3,(a3) ; 8T
* Here pixels end - pos 416
moveq #0,d0
move.b d0,(a3)    *12T

* 48T to go

* Pal0 for next line, part 2
movem.w   (a1),d1-d7 ;  8+4x8=40T  write colors - 8
nop
nop


endm
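The same check can be done for the overscan line body using the standard 68000 timing formulas instead of hand-written figures: MOVEM memory-to-registers costs 12+4n cycles (word) or 12+8n (long), a long register shift by an immediate count costs 8+2n, and a register-to-(An) move costs 8. The sketch assumes, as the listing's comments do, that the reversed IDE movem.w takes the same time as a normal read, and that a 50 Hz line is 512 cycles. Names are hypothetical.

```python
def movem_to_regs(n, size):
    """movem (An)/(An)+ -> n registers: 12 + 4n (word) or 12 + 8n (long)."""
    return 12 + (4 if size == "w" else 8) * n


def shift_reg_long(count):
    """asr.l/lsl.l #count,Dn: 8 + 2*count cycles."""
    return 8 + 2 * count


MOVE_REG_TO_IND = 8  # move.w/.b Rn,(An)
MOVEQ = 4
ADDQ_L_AN = 8        # addq.l #q,An
NOP = 4

OL1 = [
    MOVE_REG_TO_IND, MOVEQ, MOVE_REG_TO_IND,        # $FFFF8260 resolution switch
    shift_reg_long(4), shift_reg_long(2),           # 28T delay
    movem_to_regs(8, "l"), ADDQ_L_AN,               # bitmap load + bug compensation
    movem_to_regs(7, "w"), movem_to_regs(7, "w"),   # pal1, 2 stages
    movem_to_regs(8, "l"), ADDQ_L_AN,               # bitmap load + bug compensation
    movem_to_regs(7, "w"), movem_to_regs(7, "w"),   # pal2, 2 stages
    MOVE_REG_TO_IND, MOVE_REG_TO_IND,               # $FFFF820A sync switch
    movem_to_regs(7, "w"),                          # pal0 of next line, part 1
    shift_reg_long(2),                              # lsl.l #2,d1 delay
    MOVE_REG_TO_IND, MOVEQ, MOVE_REG_TO_IND,        # resolution switch at px 416
    movem_to_regs(7, "w"),                          # pal0 of next line, part 2
    NOP, NOP,
]
```

The formula values agree with the hand-written comments (76T per movem.l of 8 registers, 40T per movem.w of 7), and the whole macro sums to exactly 512 cycles. All counts are multiples of 4, so no extra ST bus wait states are expected.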




12.5 fps: it is not possible to work with varying palettes, because ACSI (DMA) cannot work during the scanlines. So, 1 pal for 4 frames.
  •  

Cyg

It looks like you were hit by one of my bugs: 4 bytes for nothing (in fact for global colors, not used yet) and the last 4 bytes missing.
Give dither=3 a try on 24-bit pictures with no pre-dithering; it is my favorite. Sierra24 with dither=0 is good too.

Thanks for the routines. They will help for overscan, as I will be able to test it, and for 80 cols too, though of course with no possibility to validate it.
I tested the 48 cols mode using your routine and it worked; the timings are OK.

Best regards,
Cyg
  •