High-color video playback on STE

kovacm · 27-11-2012, 17:05:14

Quote from: Petari on 27-11-2012, 15:59:27
Overscan with dual palettes:

http://atari.8bitchip.info/OverscanOvertake.avi

This looks awesome! Only at the end you have artefacts in a few lines; otherwise, perfect!

Petari · 28-11-2012, 10:59:11

Yes. Those are errors from the PCS conversion, which is still in beta. I am currently doing another one with a slower and better PCS conversion mode: 1 min per frame.
Hopefully, Douglas Little will do updates soon.

Cyg · 28-11-2012, 12:52:56

Great! Except for a bug on the left at 1'00?

I had a few hours these last days to almost finish my C version of higheSTcolor. There are still a few hours of work left; I should deliver you something by tomorrow night (depending on my newborn son's needs ;-) ).

It can go fast, 6s per picture for a quite good result, and the quality is a little better than what you saw before, because now I think I have enough power for the algorithm to converge to the "ultimate" result.

What are your overscan color/palette timings?

Best regards
Cyg

Petari · 28-11-2012, 16:32:51

I corrected the errors in the overtake vid, using PCS m5. It is quite slow with default settings: 1 min per conversion. I reconverted some 150 frames at the end. The URL is the same.
I will do some others with overscan in the coming days.

Overscan resolution is 416x228 px. It is good that there is no border left and right. 48 colors/line. More is not possible, because parts of the bitmap have to be loaded interleaved with the colors. And there is no way to have completely separate palettes for each line: 6 colors from the last palette of the upper line carry over to the beginning of the next line. I think that is the best that can be achieved. There are only 48 T-states for loading colors in H-blank; I can load 10 colors there. Plus, I couldn't center the image vertically, because it would take too much CPU time. Not such a big flaw, I think. That palette sharing between lines slows down PCS conversion a lot.
But this is the maximum we can do with overscan, I think. Another useful resolution would be 320x240, for 4:3 AR material.
I still need to do some experiments with dot timings: sometimes I get errors on the STE that seem to depend on how warm it is.
Maybe I need to shift 1px left or right.

Cyg, what preprocessing do you use?

Cyg · 28-11-2012, 17:10:18

Preprocessing: nothing (or whatever is directly included in the source image), plus an (x+y) alternating floor/ceil rounding rule, nothing new.
As postprocessing, Floyd-Steinberg, but a little lighter than before.
And the modes are 1 pix + 1 pal, 1 pix + 2 pal, 2 pix + 2 pal
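
The (x+y) rounding rule can be sketched as follows. This is only one reading of it, assuming 8-bit source channels reduced to the STE's 4 bits per channel; the function name `quantize_xy` is hypothetical and is not taken from higheSTcolor.

```python
def quantize_xy(value8, x, y, bits=4):
    """Reduce an 8-bit channel value to `bits` bits.

    Pixels with even (x + y) round down, pixels with odd (x + y) round up,
    which gives a checkerboard of floor/ceil decisions (assumed rule).
    """
    levels = (1 << bits) - 1        # 15 for 4 bits
    num = value8 * levels           # exact integer arithmetic, no floats
    lo = num // 255                 # floor
    hi = lo if num % 255 == 0 else lo + 1   # ceil
    return lo if (x + y) % 2 == 0 else hi
```

A mid-grey of 128 then alternates between levels 7 and 8 from pixel to pixel, while values that map exactly onto a 4-bit level are left alone.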

Cyg

Petari · 29-11-2012, 13:35:02

I meant the "directly included in the source image" part: what do you do with imaging software before giving the picture to your converter?

I added 3 conversions/captures on page: http://atari.8bitchip.info/movpst.php

Cyg · 29-11-2012, 16:54:18

I do nothing, I use the exact image (24bpp) without any preprocessing.

Error diffusion is done after the optimisation, to benefit from the colors that already exist in the optimized image. It uses the final palette to find the best candidates for error diffusion.
The other solution is to do a pre error diffusion, but that could put a little more pressure on the solver.
Both need to be compared; it depends on how well the solver copes with a clean picture versus a noisy picture (noise = pre-error diffusion).
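
A minimal sketch of post error diffusion against already-chosen palettes, assuming one palette per scanline and the standard Floyd-Steinberg weights (the thread says the real tool uses a lighter variant, whose weights are not given). The names `nearest` and `post_error_diffusion` are illustrative only.

```python
def nearest(palette, rgb):
    """Index of the palette entry closest to rgb (squared RGB distance)."""
    return min(range(len(palette)),
               key=lambda i: sum((p - c) ** 2 for p, c in zip(palette[i], rgb)))


def post_error_diffusion(image, line_palettes):
    """image: rows of (r, g, b); line_palettes: one list of (r, g, b) per row.

    Returns rows of palette indices. The palettes are fixed beforehand
    (by the optimizer); only the pixel-to-index mapping is dithered.
    """
    h, w = len(image), len(image[0])
    work = [[list(px) for px in row] for row in image]
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        pal = line_palettes[y]
        for x in range(w):
            old = work[y][x]
            idx = nearest(pal, old)
            out[y][x] = idx
            err = [o - n for o, n in zip(old, pal[idx])]
            # Standard Floyd-Steinberg taps: (dx, dy, weight out of 16)
            for dx, dy, wgt in ((1, 0, 7), (-1, 1, 3), (0, 1, 5), (1, 1, 1)):
                nx, ny = x + dx, y + dy
                if 0 <= nx < w and ny < h:
                    for c in range(3):
                        work[ny][nx][c] += err[c] * wgt / 16
    return out
```

Because the palette is already final at this stage, the diffusion cannot disturb the solver; it can only choose among colors that exist on that line.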

Here are samples, 1 picture file, 1 palette file, 54 colors / line :

no dithering

pre-dithering X+Y, my favorite : beautiful and very stable between frames

pre-dithering X+Y + post error diffusion

post error diffusion

pre error diffusion (from a 256 color conversion with XNView, not designed for that, I don't have any picture tool here at work)

pre error diffusion + post error diffusion (from a 256 color conversion)

Original picture

1 PIX + 2 Palettes trial :

1 pix 2 palettes 54cols swapping result

1 pix 2 palettes 54cols swapping result + post error diffusion

1 pix 2 palettes 80cols swapping result

1 pix 2 palettes 80cols swapping result + post error diffusion

I think I should push error diffusion a little further on the double palette result: as there are more colors available, the result should be less noisy.

If you have a good pre-dithered picture, we can give it a try.

Cyg

Petari · 30-11-2012, 15:46:18

In principle, post error diffusion should be better. But it is hard to achieve, I guess. PCS error diffusion still has some flaws, and I have not tried every setting so far.
What worked best for me is pre-error diffusion or ordered dithering; the latter is OK with 15bpp.
For single palette conversion we need to do the error diffusion while converting to 12bpp. This is of course not a conversion to an indexed palette (max 256 colors), but to 4 bits per color component. I got the best results with Sierra 2-4A error diffusion: it produces pretty much static dot patterns on static parts.

With dual palette it is much better: conversion to 15bpp with error diffusion or ordered dither produces almost invisible dots.
And the best tool for animations is the Error Diffusion filter for VirtualDub, which has a lot of settings. It gives static patterns with Floyd-Steinberg E.D.
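
As a rough illustration of error diffusion straight to 4 bits per component, here is a sketch for one color channel using the Sierra 2-4A ("Filter Lite") kernel: 2/4 of the error goes right, 1/4 below-left, 1/4 below. The function name `sierra_lite_4bit` is hypothetical, and the mapping of a 4-bit level back to 8 bits as level * 17 is an assumption.

```python
def sierra_lite_4bit(channel):
    """channel: rows of 0..255 values. Returns rows of 0..15 levels."""
    h, w = len(channel), len(channel[0])
    work = [[float(v) for v in row] for row in channel]
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            old = work[y][x]
            level = max(0, min(15, round(old / 17.0)))  # nearest 4-bit level
            out[y][x] = level
            err = old - level * 17                      # error in 8-bit units
            if x + 1 < w:
                work[y][x + 1] += err * 2 / 4
            if y + 1 < h:
                if x > 0:
                    work[y + 1][x - 1] += err / 4
                work[y + 1][x] += err / 4
    return out
```

Running it once per R, G and B channel gives a 12bpp picture with no indexed palette involved.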

Here are example pictures: http://atari.8bitchip.info/underw.zip

Capture of video: http://atari.8bitchip.info/Underwater.avi

Same on YouTube: http://youtu.be/urgixT_MPCU

Btw. it is less sharp than in reality, because I needed to filter out horizontal lines/bars, which are a result of the Atari's non-standard video output: the TV card cannot capture it really well, and some shifting appears. Instead of 625 lines, the Atari produces 2x313 lines.

Btw. your last pic (80 col, 2 pals) looks really good.

Cyg · 30-11-2012, 16:56:45

Sierra24 seems very interesting; I made a first try and it shows no big changes after encoding :-)
But I think I still prefer the X+Y method for static pictures.

I should post my exe here tonight, so you will be able to try it yourself!
For the moment I have a regression bug to solve, and an overall lightening with the X+Y method (something like a halftone that is too light).
I am also trying to better balance the 2 palettes when using 1 pix: the mixed result is good but the 2 individual pictures are not so nice, which could be a problem for quick eyes :-)
And I have to check that the result still displays fine on the ST, and not rely only on my bmp preview on the PC :-)

Cyg

Cyg · 01-12-2012, 00:58:11

Here is a beta version of my optimizer "higheSTcolor". The latest enhancements are not yet optimized for speed; I'll try to do that later...

http://www.top250filmsdvd.com/atari/higheSTcolor01.exe

It should work fine for 1pix/1pal and 1pix/2pals but not yet for 2pix/2pal.

I'll let you explore the options; basically it looks like this:

higheSTcolor.exe -4 --mode=0 --dith=1 --pixpal=3 0000.bmp

Where "-4" sets the quality, between 1 and 5; run time grows exponentially with it, and 3 already leads to a good result
--mode for the display mode
--dith for the pre-dithering method; there will always be a 2nd result with a post error diffusion, as it costs nothing and could lead to something better
--pixpal to specify the number of pix & pal
Sorry, no wildcards for the file...
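
Since the tool takes a single file name, a batch can be generated with one command per frame. The sketch below only assembles argument lists from the flags quoted above; the helper name `build_commands` and its defaults are assumptions for illustration, and nothing here is part of higheSTcolor itself.

```python
def build_commands(frames, quality=3, mode=0, dith=1, pixpal=3,
                   exe="higheSTcolor.exe"):
    """Return one argument list per frame file, ready for subprocess.run()."""
    return [
        [exe, f"-{quality}", f"--mode={mode}", f"--dith={dith}",
         f"--pixpal={pixpal}", name]
        for name in frames
    ]


# Frame names in the 0000.bmp style used in the post.
frames = [f"{n:04d}.bmp" for n in range(3)]
commands = build_commands(frames)
```

Each entry of `commands` can then be handed to `subprocess.run(cmd, check=True)` in a loop, or written out line by line as a .bat file.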

Outputs :
*.pal : 1st palette file (no global color for the moment)
*.pa2 : 2nd palette file (useful when pixpal=2 or 3)
*.pix : 1st bitmap file
*F.pix : 1st bitmap file with error diffusion
*_preview.bmp : preview file using the 1st pal
*_preview2.bmp : preview file using the 2nd pal
*_previewF.bmp : preview file using the 1st pal with error diffusion
*_previewF2.bmp : preview file using the 2nd pal with error diffusion

54 cols mode :
1pix, 1 pal:
higheSTcolor.exe -3 --mode=0 --dith=0 --pixpal=1 0000.bmp
higheSTcolor.exe -3 --mode=0 --dith=1 --pixpal=1 0000.bmp
1pix, 2 pal:
higheSTcolor.exe -3 --mode=0 --dith=0 --pixpal=3 0000.bmp
higheSTcolor.exe -3 --mode=0 --dith=1 --pixpal=3 0000.bmp
higheSTcolor.exe -3 --mode=0 --dith=2 --pixpal=3 0000.bmp
higheSTcolor.exe -3 --mode=0 --dith=3 --pixpal=3 0000.bmp

80 cols mode: replace mode=0 with mode=1

Upcoming :
- overscan mode; could you please give me your color timings?
- 2pix + 2pal
- some speed optimisation

Cyg

Petari · 01-12-2012, 14:03:00

Here is timing for Overscan: http://atari.8bitchip.info/OS6.csv

At the top is the PCS header. It means 416px horizontally, 16 colors per palette, 3 palettes, 2 palettes to skip. The 0 is irrelevant.
The point is that, because it is not possible to load 16 colors in the horizontal border, palettes are shared between lines. -1 in the table means using the last palette from the upper line (where it is pal 2). The 2 palettes to skip are there because PCS skips the first bitmap line's data, and 1 palette needs to be preloaded before the high-color line code starts. Basically, for color data you need: 32 bytes of pal -1 for the first line, then 3 palettes per line, and so on ...
In the last line, only 8 colors from the last palette (2) will be used. You may pad the rest to 16 colors, or cut right after 8 colors.
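
One possible reading of that layout, as arithmetic: a 32-byte "pal -1" block up front, then 3 palettes of 16 word-sized colors for each line, with the very last palette either padded or cut after 8 colors. The 228-line count is taken from the overscan resolution quoted earlier in the thread; the function name and the interpretation itself are assumptions, not something confirmed by the csv file.

```python
def overscan_color_bytes(lines=228, palettes_per_line=3,
                         colors_per_palette=16, pad_last=True):
    """Size in bytes of the color data for one overscan frame (assumed layout)."""
    pal_bytes = colors_per_palette * 2            # one 16-bit word per color
    total = pal_bytes                             # "pal -1", preloaded for line 1
    total += lines * palettes_per_line * pal_bytes
    if not pad_last:
        # Last palette of the last line is cut after its first 8 colors.
        total -= (colors_per_palette - 8) * 2
    return total
```

Under these assumptions the padded variant comes to 21920 bytes per frame and the cut variant to 21904 bytes.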

I started a conversion of 500 frames with your code. At quality 4 (and 80c/l) it goes pretty slowly. When it finishes, I'll mux it and see how it looks.
I need a switch to disable generation of the preview BMP files (which are of course useful when adjusting settings): they take too much disk space when converting hundreds of pics.

At the moment I am working on animation playback suitable for people equipped with faster hard disk adapters and drives, like UltraSatan, some newer IDE drive or CF card, or a faster SCSI drive + ACSI-SCSI adapter. For instance, the Mega STE internal adapter is fast enough if a newer SCSI drive is used. Or ICD adapters. The minimum is a 1100KB/sec transfer rate. Resolution is 320x158px at 12.5 fps. More is just not possible.
I made code for 48 col/line hi-color, with custom border. 54 colors/line is not really good, because it means more data, and we are very limited in loading time and speed.
Here it is (I can post the csv timing table for it later):

	lea	$FFFF8209.w,a3

	lea	$FFFF8240.w,a6
	lea	$FFFF8250.w,a2
	move.l	a2,usp
	lea	$FFFF8242.w,a4
	lea	$FFFF8244.w,a5

	clr.l	$40.w   *  black border - or custom


	moveq	#0,d0
	moveq	#$40,d7
l2AF96	move.b	(a3),d0
	beq.s	l2AF96
	sub.w	d0,d7
	lsl.w	d7,d0
	move.w	#12,d0    *  
l2AFA2	dbf	d0,l2AFA2
	nop
	nop
	movea.l	palloc(pc),a7


	rept	199    *  Or 158   if  movie for UltraSatan and co (1100KB/sec min data rate for 12.5 fps)


   movem.l  (a7)+,d0-d7/a0-a3     *108T (12+12*8)  preload 16+8 colors  
* complete P0 and first half  of P1

   movem.l  d1-d7,(a5)   * 64T   * P0 col 2-15
   move.w  d0,(a4)    * 8T     * P0 col1 , 15 colors of P0 written
   swap   d0    * 4T

* 184T so far

*boe -   border end
   move.w  d0,(a6)   * 8T  update color 0 of P0  at exact line start

   movem.l  (a7)+,d0-d3  *  12+4*8 = 44T   - second half of P1

   movem.l  a0-a3,(a6)     *  40 T  first half of P1 to shifter     at px 56

   move.l   usp,a3     * 4T    px 

   movem.l  d0-d3,(a3)   *  40T   second half   P1    at px 100

   movem.l  (a7)+,d0-d7   * 76T , complete P2 pref.   at px 140

   movem.l  d0-d7,(a6)   *  72T   all 16 colors  P2   at px 216

* 28T states free here - just delay

    	addq.l	#1,d0   * 8T
    	asr.l	#6,d0   * 20T

*bos  -  border start
   move.w  $40.w,(a6)      * 16T    need 312T from  boe ( move.w d0,(a6) ) - here relative +8T wr.


	endr
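
The cycle counts in the comments of the 48 col/line loop can be cross-checked with a few sums. The sketch below just copies the per-instruction T-state figures from the listing and assumes the usual 512 CPU cycles per scanline of a 50 Hz ST/STE; the grouping names are made up for illustration.

```python
LINE_CYCLES = 512  # assumed: CPU cycles per scanline at 50 Hz on an 8 MHz ST/STE

# (instruction, T-states) copied from the comments in the listing
BEFORE_BORDER_END = [
    ("movem.l (a7)+,d0-d7/a0-a3", 108),
    ("movem.l d1-d7,(a5)", 64),
    ("move.w d0,(a4)", 8),
    ("swap d0", 4),
]
BORDER_END_TO_START = [
    ("move.w d0,(a6)", 8),
    ("movem.l (a7)+,d0-d3", 44),
    ("movem.l a0-a3,(a6)", 40),
    ("move.l usp,a3", 4),
    ("movem.l d0-d3,(a3)", 40),
    ("movem.l (a7)+,d0-d7", 76),
    ("movem.l d0-d7,(a6)", 72),
    ("addq.l #1,d0", 8),      # delay only
    ("asr.l #6,d0", 20),      # delay only
]
BORDER_START = [("move.w $40.w,(a6)", 16)]


def cycles(block):
    """Total T-states of a list of (instruction, T-states) pairs."""
    return sum(t for _, t in block)


total = cycles(BEFORE_BORDER_END) + cycles(BORDER_END_TO_START) + cycles(BORDER_START)
```

The three groups come to 184, 312 and 16 T-states, which matches the "184T so far" and "need 312T from boe" comments and adds up to exactly one 512-cycle scanline per loop iteration.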

Cyg · 01-12-2012, 14:15:21

Just before encoding a bunch of 80-col pictures: did you check that it now works on a single sample? Just to avoid warming up the CPU for nothing...

I am doing some "underwater" testing with all the different dithering/error diffusion methods available.

I have noted the request for an option to enable/disable the preview files; you can also make a big batch file which deletes the files right after the encoding.

Cyg

Cyg · 01-12-2012, 23:47:21

Here is version 5 : http://www.top250filmsdvd.com/atari/higheSTcolor05.exe
Note :
- there was a little bug in version 2 for the .pal & .pa2 files: the first 2 colors are unused, and although the overall size seems to be right, the 2 very last colors are missing
- a major bug for 1pix + 2pal in versions 3 and 4, fixed in version 5

- new 48 cols mode that should be compliant with your code (if I made no mistake in the opcode timings)
- --preview option : no preview by default
- correction of a quality bug, which should improve dither=3 a little
- dither=3 is now available when pixpal=1 (1 pix / 1pal) : it does about the same as the forced (X+Y) pre-dither, but it evaluates the best candidates during the optimization instead of taking them as an input => fewer constraints => easier to find a correct solution.
- selects more additional colors in the 1 pix + 1 pal method, which improved the result by a few %
- double image optimization "motion blur", 1 pix + 2 pal

Mode 48 cols preview with dither=3 ("X+Y mod 2" V2) on 1 pix + 1 pal :

My new favorite, it beats the previous X+Y mode (pre-dithering), should be very stable

idem with a post error diffusion

If you do it at 12.5fps, does it mean that you are loading the palette 4 times (at every vbl)? If yes, then it could be possible to motion blur the frames in between without loading any extra pixels (shared pixels), like I do in my last demos. Without any precalc you could double the frame rate to 25 fps : 12.5 fps pix + 25 fps pal
But the quality decreases.
It is available since version 04: you just have to give the 2nd (intermediate) frame the name of the frame just before it, plus "_bis". So 0002.bmp is renamed to "0001_bis.bmp", and 0001.bmp and 0001_bis.bmp then cover frames 1 & 2 in a single pix + 2 pal; then 0003.bmp & 0003_bis.bmp for frames 3 & 4, etc.
I am working on it a little to improve the result (I hope). It is not compatible with dither=1, but with the others it should be fine, especially dither=3
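
The "_bis" naming convention can be produced mechanically. A small sketch, assuming 4-digit frame numbers as in the examples above; the helper name `bis_rename_plan` is hypothetical and it only returns the plan rather than touching any files.

```python
def bis_rename_plan(frame_numbers):
    """Pair consecutive frames and return (old_name, new_name) renames.

    The first frame of each pair keeps its name; the second one takes the
    first frame's number plus the "_bis" suffix.
    """
    plan = []
    for first, second in zip(frame_numbers[0::2], frame_numbers[1::2]):
        plan.append((f"{second:04d}.bmp", f"{first:04d}_bis.bmp"))
    return plan
```

For frames 1 to 4 the plan is 0002.bmp -> 0001_bis.bmp and 0004.bmp -> 0003_bis.bmp; each pair can then be applied with `os.rename(old, new)`.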

Concerning underwater, Sierra 24 at 12 bits is indeed very good with 1 pix + 1 pal. I still like the "X+Y mod 2" method on a 24b source image, even more so now with the dither=3 option described above.
For 1 pix + 2 pals, dither=3 on a 24b source image is interesting because it is stable.

Version 5 now implements 2 more kinds of post error diffusion; use the --ed option :
--ed=1 : Floyd Steinberg, as before
--ed=2 : Sierra 2-4
--ed=3 : Atkinson
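
For reference, the textbook kernels behind those three names can be written as tap tables. This is a sketch under assumptions: "Sierra 2-4" is taken to mean Sierra 2-4A (Filter Lite), and the weights are the standard published ones, which may differ from what higheSTcolor actually applies.

```python
# ed value -> (name, divisor, [(dx, dy, weight), ...])
KERNELS = {
    1: ("Floyd-Steinberg", 16, [(1, 0, 7), (-1, 1, 3), (0, 1, 5), (1, 1, 1)]),
    2: ("Sierra 2-4A", 4, [(1, 0, 2), (-1, 1, 1), (0, 1, 1)]),
    3: ("Atkinson", 8, [(1, 0, 1), (2, 0, 1), (-1, 1, 1), (0, 1, 1),
                        (1, 1, 1), (0, 2, 1)]),
}


def diffused_fraction(ed):
    """Share of the quantization error that a kernel passes on to neighbors."""
    _, divisor, taps = KERNELS[ed]
    return sum(weight for _, _, weight in taps) / divisor
```

Floyd-Steinberg and Sierra 2-4A pass on the whole error, while Atkinson passes on only 6/8 of it, which is why it tends to look cleaner but loses some detail in very dark and very bright areas.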

Advantage of a post dither vs. a pre dither (a pre-dithered picture is more complex to solve):

dither 3, X+Y mod 2

Results should be about the same if the optimisation were perfect; in practice it is far better with dither 3.

Cyg

Petari · 03-12-2012, 11:22:35

I was busy during the weekend with some work around the house, so I did not have much time to deal with this.
I first tried to make some 80c/l conversions. But there is something wrong: maybe in my code, or maybe you did not get the correct csv timing, so the display was not correct. I need to recheck my muxing and playing code first.
Then I went on to 54c/l. I noticed the error with 4 zeros at the start, and bad pixels at the end of the last line. Anyway, I made some conversions with dual palette and dithering modes 1-3 (I tried only the first version you posted). It looks pretty good, but I see some flickering in the sky. I looked at it only in Steem; I have not finished the playback SW for a real STE. I will likely do it later today, then post some video capture.

One more thing that can make muxing to an AV file easier: if you can, please write all 3 outputs (pix, pal, pa2) into a single file. Just write them in order. It is simpler and faster to open just one file per frame instead of 3, especially when it has to be done thousands of times in a row. It is easy to set proper offsets in SW, depending on the part sizes.

Here is the csv for 48col/line: http://atari.8bitchip.info/48cl.csv

But you don't need to rush with implementing it. I need some more tests of timings on different machines. It seems that there are some differences between STE, Mega STE ...
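
A minimal sketch of that single-file layout: the three parts are simply concatenated in the order pix, pal, pa2, and the offsets follow from the part sizes. The function name `pack_frame` is hypothetical and this is not an existing output format of the converter.

```python
def pack_frame(pix: bytes, pal: bytes, pa2: bytes):
    """Concatenate the three outputs and return (blob, offsets)."""
    blob = pix + pal + pa2
    offsets = {
        "pix": 0,
        "pal": len(pix),
        "pa2": len(pix) + len(pal),
    }
    return blob, offsets
```

With fixed part sizes per mode, the player or muxer only needs those three offsets once, and every frame is read with a single open.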

Code for 80col/line:

line1   macro

   lea  $FFFF8240.w,a0   * 8T

	asr.l	#8,d0   * 24T
	asr.l	#8,d0 
	asr.l	#8,d0 


    move.w  d1,(a0)   * 8T  - this should write at exact line start
    asr.l  #6,d0   * 20T


* Following executes reversed - writes to a0 instead read ! Src is IDE port, not registers.
    movem.w  (a0),d0-d7/a1-a7    * 72T, 32 bytes - overshot !
    movem.w  (a0),d0-d7/a1-a7   * 72T
    movem.w  (a0),d0-d7/a1-a7    * 72T
    movem.w  (a0),d0-d7/a1-a7   * 72T  , 288 so far
    movem.w  (a0),d0-d7/a1-a7   * 72T - this is for border and next line
    
    move.l  usp,a0   *4T
    movem.w  (a0),d1-d7   * 16 bytes (overshot) , padding 1/2,  8+32=40T states

* in d1 must go next line pal0, color 0 !

    endm

Overscan line code:

*  Overscan mode  -  416x228 px
* Not centered vertically

OverScan

* Set addresses :

	lea	$ffff8240.w,a0    * 8T
	lea	$ffff8250.w,a1
	lea	$ffff820A.w,a2
	lea	$ffff8260.w,a3
	

* above 5=40T

* address for loading bitmap data:  a6  
* a5 still free


  	 IdeSelpDataFR      *  16T  -  this must be executed from ROM !
* from now reversed read/write to dest

	move.w	d1,(a0)    * 8T  -  this sets first color in blank period 

* sum 64T before line self

ol1	macro

  	move	a3,(a3)		;  8T

	moveq 	#0,d0
	move.b 	d0,(a3)       * 12T

*  Need  76+72+92+72+32+8 T states until next overscan setting. = 352T states exactly


* 28T :


  	asr.l  #4,d1    * 16T	
  	asr.l  #2,d1      * 12T

* In lines 1,4,7... here just dummy delay code !


* Pause for drive possible this way:   8+20+28+8 = 64T 


* Bitmap load 

* Reversed !
	movem.l	(a6)+,d1-d7/a5    * loads 32+2 (overshot) bytes in 8+4x17=76T states
	addq.l	#2,a6   * bug compens  8T    

* Pal1 load in 2 stages :

	movem.w   (a0),d1-d7	;  8+4x8=40T  write colors - 8
	movem.w   (a1),d1-d7	;  8+4x8=40T  write colors - 8

* bmp  load

	movem.l	(a6)+,d1-d7/a5    * loads 32+2 (overshot) bytes in 8+4x17=76T states
	addq.l	#2,a6   * bug compens  8T


* Total 78 bytes loadable in 1 line > 3x78=234 - too good, need 224, so line 1 in group of 3
* delay instead  10 bytes load section

* Pal2 load in 2 stages :

	movem.w   (a0),d1-d7	;  8+4x8=40T  write colors - 8
	movem.w   (a1),d1-d7	;  8+4x8=40T  write colors - 8


* 352T after previous overscan setting

	move.b	d0,(a2)		; 8T
	move	a2,(a2)		; 8T

* Need 52T until next overscan setting

*Pal0 for next line, part 1 - but goes in this line too, little
 
	movem.w   (a0),d1-d7	;  8+4x8=40T  write colors - 8

* 12T free  - just delay here 

	lsl.l  #2,d1   * 12T

* 52T from last OS set.

	move	a3,(a3)		; 8T
* Here pixels end - pos 416
	moveq 	#0,d0
	move.b 	d0,(a3)    *12T

* 48T yet

* Pal0 for next line, part 2
	movem.w   (a1),d1-d7	;  8+4x8=40T  write colors - 8
	nop
	nop


	endm

At 12.5 fps it is not possible to work with different palettes: ACSI (DMA) cannot work during the scanlines. So, 1 pal for 4 frames.
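
A back-of-the-envelope estimate of why a drive in the 1100KB/sec class is needed for 320x158 at 12.5 fps. Everything here is an assumption for illustration: 4 bitplanes, 48 word-sized colors per line, no audio, and DMA able to run only during the lines of a 313-line field that are not hi-color lines. The thread itself does not say how the 1100KB/sec figure was derived.

```python
def required_drive_rate(width=320, lines=158, colors_per_line=48,
                        fps=12.5, total_lines=313):
    """Return (bytes per movie frame, average bytes/s, burst bytes/s)."""
    bitmap = width // 2 * lines               # 4 bits per pixel
    palettes = colors_per_line * 2 * lines    # one word per color
    frame_bytes = bitmap + palettes
    payload_rate = frame_bytes * fps
    # Fraction of each field in which DMA is assumed to be allowed.
    free_fraction = (total_lines - lines) / total_lines
    return frame_bytes, payload_rate, payload_rate / free_fraction


frame_bytes, payload_rate, burst_rate = required_drive_rate()
```

Under these assumptions a frame is 40448 bytes, the average payload is about 506,000 bytes/s, and the drive has to deliver roughly 1,020,000 bytes/s during the time it is allowed to transfer, which is in the same range as the quoted minimum once some overhead is added.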

Cyg · 03-12-2012, 11:58:08

It looks like you ran into one of my bugs : 4 bytes for nothing (in fact for global colors, unused for now) and the 4 last bytes missing.
Give dither=3 a try on a 24 bit picture with no pre-dithering; it is my preferred mode. Sierra24 with dither=0 is good too.

Thanks for the routines. They will help with overscan, as I will be able to test it; with 80 cols too, though of course I have no way to validate that one.
I tested the 48 cols mode using your routine and it worked; the timings are OK.

Best regards,
Cyg