High-color video playback on STE

Started by Petari, 25-09-2012, 14:28:44


Cyg

#90
Another trick I use in my demos is to share the colors between several lines of pixels; it still gives a nice-looking result. I successfully tested sharing 54 colors between 6 lines in an FX for a future demo.
It works fine because most of the time about the same colors are used from one line to the next.

This divides the palette data to load by the number of lines sharing it.
For instance, sharing colors between 2 lines cuts the data to load by about 10kB per picture, from 52 to 42kB for a 320x200 picture. That could be helpful if you are close to the limit.
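
The arithmetic behind those figures can be sketched in C. This is only an illustration: it assumes 54 colors per line at 2 bytes per color and a 4-bitplane bitmap of 160 bytes per line, and the function names are made up for the example.

```c
/* Palette bytes for one picture when each set of colors is shared by
   `share` consecutive lines. Assumption: 2 bytes per color, and the
   line count is rounded up to whole groups of `share` lines. */
long palette_bytes(int lines, int colors_per_line, int share)
{
    int groups = (lines + share - 1) / share;
    return (long)groups * colors_per_line * 2;
}

/* Bitmap bytes: 320 px wide in 4 bitplanes = 160 bytes per line. */
long bitmap_bytes(int lines)
{
    return (long)lines * 160;
}

/* Bitmap plus palette data for one picture. */
long picture_bytes(int lines, int colors_per_line, int share)
{
    return bitmap_bytes(lines) + palette_bytes(lines, colors_per_line, share);
}
```

Under these assumptions a 320x200 picture comes to 53600 bytes without sharing and 42800 bytes with 2 lines sharing, which is roughly the 52 to 42kB quoted above.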

Cyg

Petari

You cannot use the overscan code given here - it is the version which loads part of the bitmap interleaved with color data.

I will post overscan code suitable for static pictures, for which no cartridge adapter is needed. 48 colors/line too, just slightly different timings.

With hi-color animation playback the main problem is lack of time for loading. Therefore I use synchronous loading during scanlines. The majority of Atari ST(E) owners have some ACSI hard disk solution - like UltraSatan, or some ICD adapter with a SCSI drive; the Mega STE has an internal ACSI-SCSI adapter ... And that is just not good for hi-color playback. Synchronous load is not possible, and not even plain data loading during scanlines, because the DMA transfer destroys the timing - it stops the CPU for some cycles.
If we go with 158 lines, there is max. 9.5 ms to load 21 sectors, which means a transfer rate of at least 1100 KB/sec, maybe even a little more. At 12.5 fps we need to load 25KB for bitmap, 14.5KB for colors and 1KB for audio in 1/12.5 sec - that is, in 4 Vblanks. That gives 21 sectors per Vblank. Increasing the line count means more data to load, while there is less time for it. With 200 lines you have only some 6 ms for loading.

Surely color sharing would decrease the data amount - with proper material. We can look into it later. But we still will not be able to achieve 200 lines with ACSI. It can load only some 12 sectors per Vblank, which means only 24 KB in 1/12.5 sec - not even enough for the bitmap itself.
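
For anyone who wants to redo these numbers with other line counts, here is a rough C sketch of the budget. It only restates the figures from the post (512-byte sectors, 4 Vblanks per video frame at 12.5 fps); the function names are invented for the example and 1 KB is taken as 1024 bytes.

```c
#define SECTOR_BYTES 512

/* Sectors to load in each Vblank when one video frame of `frame_bytes`
   is spread over `vbls_per_frame` Vblanks (both divisions round up). */
int sectors_per_vbl(long frame_bytes, int vbls_per_frame)
{
    long sectors = (frame_bytes + SECTOR_BYTES - 1) / SECTOR_BYTES;
    return (int)((sectors + vbls_per_frame - 1) / vbls_per_frame);
}

/* Minimum transfer rate in KB/s (1 KB = 1024 bytes) needed to load
   `sectors` sectors within `window_us` microseconds. */
long min_rate_kb_s(int sectors, long window_us)
{
    long long bytes = (long long)sectors * SECTOR_BYTES;
    return (long)(bytes * 1000000LL / window_us / 1024);
}

/* Bytes that can be loaded per video frame at a given number of
   sectors per Vblank. */
long frame_capacity(int sectors_each_vbl, int vbls_per_frame)
{
    return (long)sectors_each_vbl * vbls_per_frame * SECTOR_BYTES;
}
```

With 25 + 14.5 + 1 = 40.5 KB per frame, sectors_per_vbl(41472, 4) gives the 21 sectors mentioned, and loading them in 9.5 ms needs about 1105 KB/s. At 12 sectors per Vblank, frame_capacity(12, 4) is 24576 bytes, less than the 32000 bytes of a 200-line bitmap.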


Cyg

Thx for the overscan routines, I will add this mode tonight.

Here is version 06 of higheSTcolor http://www.top250filmsdvd.com/atari/higheSTcolor06.exe

- result files merged into 1 file: pix+pal+pa2 when needed
- faster: 6s for quality 2, which is quite good, 12s for quality 3 and 30s for quality 4
- bug fix in the quality re-evaluation, it improves the result a little

Cyg

Petari

I made some captures with the following conversion settings:
higheSTcolor05.exe -3 --mode=1 --dith=3 --pixpal=3  %1
http://atari.8bitchip.info/CheCyg80cl.avi
There are timing errors - white and black pixels appear. You should shift the timing 1px left and 1px right and make 2 test executables, one for each, to check it out - at the moment only I can do that check. With PCS there is an option to use a custom timing table (the csv files I posted here).
Noise is minimal, which is good. But the sky gradient is not as good as with predithering/error diffusion.
http://atari.8bitchip.info/Cheetah.avi
The one above is preprocessed with 15bpp error diffusion - best for dual palette.

Btw. practically the same gradient errors appear with PCS and post error diffusion too.

I will try some of your new modes/settings today. The playback SW for 54 color mode is still not solved, and I need to work on 48c/l too.

Petari

I did some conversions with v.06. It is significantly better.
I tried it with preprocessed input, without dithering in your encoder, with dual palette. You can see how it looks in an emulator.
http://atari.8bitchip.info/bipi.zip   40MB .
It contains the original MPG (no sound), which can be loaded directly in VirtualDub to generate BMP files - after resizing, denoising, optional error diffusion, etc. A player for emulators and muxing SW for PC are included. If there is no sound, just don't select an audio file; the program will insert silence.
Player and muxer are only for 320x200, 54c/l, dual palette mode.

I don't know if it is because no dithering was set, but there is too much flickering - big differences between fields.

I don't get the purpose of the F files (with F added after the number) - muxing them just gives a bad result, like with shifted timing.

The version with joined parts is fine - muxing goes better and faster.

Now, about the timing problem: I see bad dots on my STE, while in the emulator it is mostly OK. That makes me think you made the timings for Steem? Do you test on a real STE?
There is surely some difference in exact pixel timings. I get the same results on my STE and Mega STE. But sometimes it goes wrong - then I need to do a hard reset or turn the machine off. It happens in about 1 of 10 cases, and maybe has something to do with warming up too. This has been discussed in other places too, so it is not just me.
So we need to see what exactly it is, and whether STE machines differ in this respect.

And a little about loading from disk during playback and what to take care of:

Data can be loaded from disk only in whole sectors (also called blocks).
Standard sector size is 512 bytes. If we need only 10 bytes, we still need to load
a whole sector.

Synchronous color update, direct from disk: in this case we still need to work with whole
sectors. Additionally, some pause must be made between sectors, otherwise
it will fail.

108 bytes/line case: if we just load 108, 108, ... etc. bytes in order, it will fail,
because 512 is not divisible by 108, so the sector change would happen in the
middle of movem execution, with no pause. We need at least some 10 microseconds - with modern disks.
Therefore padding must be done. With 108 bytes/line, we need to pad 20 bytes
per line - then 4 lines make 1 sector: 4x(108+20)=512.
This is not the best case - padding takes up about 16% of the data (20 of every 128 bytes).
With 48c/l, so 96 bytes per line, less padding is needed - group 5 lines into 1 sector.
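
The padding numbers can be checked with a small C sketch. It assumes only what is stated above (512-byte sectors, whole lines per sector); the function names are hypothetical, and how the padding bytes are spread inside a sector is left to the player.

```c
#define SECTOR_SIZE 512

/* Whole lines of color data that fit in one sector. */
int lines_per_sector(int bytes_per_line)
{
    return SECTOR_SIZE / bytes_per_line;
}

/* Padding bytes per sector once it holds as many whole lines as fit. */
int pad_per_sector(int bytes_per_line)
{
    return SECTOR_SIZE - lines_per_sector(bytes_per_line) * bytes_per_line;
}
```

For 108 bytes/line this gives 4 lines and 80 padding bytes per sector (the 20 bytes per line above); for 96 bytes/line it gives 5 lines and only 32 padding bytes per sector.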

With overscan I have minimal padding, because I selected all params with that goal.

Of course, no padding in lines is needed without synchronous load - the case of 12.5 fps ACSI playback. But we still must pad to whole sectors, therefore I went with 198 lines.

Cyg

Hi Petari,

The known warm-up problems do happen sometimes, and only a reboot solves them.
I think some of us had a fix and/or a way to detect it.

As for the flickering, yes, it is "normal", as you chose the worst option for the 2 pal mode: 1 pal is for the lighter values, the 2nd for the darker ones.
You might prefer the dither=2 option, which at no cost produces a discreet "TV effect": every other line takes the lighter or the darker colors, alternating between lines & pals. The result is therefore more balanced.
dither=3 and dither=1 should be more stable, because they alternate pixel by pixel, not line by line like dither=2 or frame by frame like dither=0.
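
To make the difference between these modes concrete, here is a loose C sketch of which of the 2 palettes would carry the lighter value at a given pixel. It is not the encoder's code: the function name, the choice of palette 0 as "lighter" at the origin, and the exact parity rules are assumptions, with dither=3 read as the (X+Y) mod 2 pattern mentioned later in the thread. dither=1 is left out because its pattern is not described here.

```c
/* Returns 1 if palette `pal` (0 or 1) carries the lighter value at
   pixel (x, y), 0 if it carries the darker one, -1 for modes this
   sketch does not cover. All rules here are illustrative assumptions. */
int is_lighter(int dither, int x, int y, int pal)
{
    switch (dither) {
    case 0:  /* frame by frame: one palette is lighter everywhere */
        return pal == 0;
    case 2:  /* line by line: lighter/darker swap on every line */
        return ((y + pal) & 1) == 0;
    case 3:  /* pixel by pixel: checkerboard, (X+Y) mod 2 */
        return ((x + y + pal) & 1) == 0;
    default:
        return -1;
    }
}
```

In this sketch the two palettes always take opposite roles at the same pixel, so the frame average stays the same; only the spatial pattern of the alternation changes, which is what affects the visible flicker.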

*F.hST (I changed the file extension, hST for higheSTcolor) is the same as *.hST except with post error diffusion, as configured in --ed (1=Floyd, 2=Sierra, 3=Atkinson).

Here is a version 07 : http://www.top250filmsdvd.com/atari/higheSTcolor07.exe
- new option to shift some pixels, +-5 pixels, a value of 5 means centered. I am not sure if it is a global issue for all 80 cols or for some palettes only. I believe the colors stay active too long, because some left colors are still active 5/10 pixels to the right.
- 10 to 20% faster; I am taking a look at a faster compiler, a small effort for a promised 10 to 40% speed gain

Maybe you can find an obvious error in my pixel timings for 80 colors mode; here is the code I use:

   for (i = 1; i <= largeur; i++) // pixels start at 1 and end at largeur (320 or 416)
   {
      for (j = 0; j < 16; j++)  // pal 1
      {
         paletteFinale[j+2] = j; // real color index, between 0 and 15
         if (i > 0 + shiftpixels && i < 32 + (j) * 4 + shiftpixels) // pixel zone where color j is active
            paletteactive[j+2] = 1;
      }
      for (j = 16; j < 2*16; j++)  // pal 2
      {
         paletteFinale[j+2] = j - 16;
         if (i >= 32 + shiftpixels + (j - 16) * 4 && i <104 + (j - (16)) * 4 + shiftpixels)
            paletteactive[j+2] = 1;
      }
      for (j = 2*16; j < 3 * 16; j++)  // pal 3
      {
         paletteFinale[j+2] = j - 2 * 16;
         if (i >= 104 + (j - (2 * 16)) * 4 + shiftpixels && i < 176 + (j - (2 * 16)) * 4 + shiftpixels)
            paletteactive[j+2] = 1;
      }
      for (j = 3*16; j < 4 * 16; j++)  // pal 4
      {
         paletteFinale[j+2] = j - 3 * 16;
         if (i >= 176 + (j - (3 * 16)) * 4 + shiftpixels && i < 248 + (j - (3 * 16)) * 4 + shiftpixels)
            paletteactive[j+2] = 1;
      }
      for (j = 4 * 16; j < 5 * 16; j++)  // pal 5
      {
         paletteFinale[j+2] = j - ( 4 * 16);
         if (i >=248+ (j - (4 * 16)) * 4 + shiftpixels)
            paletteactive[j+2] = 1;
      }
   }

shiftpixels is used for the new shifting option.

To calibrate it I usually generate a bitmap ruler like this one. With a single set of 48/54/80 colors used on every line, a simple capture or photo of the screen makes the timings & colors readable:

lea lineHiColorPictureBackground,a1
move.l adr_pix_pic_current,a0
; add.l #160*6,a0
move.w #16-1,d7 ; 16 zones, one per color index (dbf runs d7+1 times)
.screenTestHiColor:
move.l (a1)+,d0
move.l (a1)+,d1
rept 20*6
move.l d0,(a0)+
move.l d1,(a0)+
endr
dbf d7,.screenTestHiColor

move.l adr_pix_pic_current,a0
move.w #16-1,d7 ; 16 zones again (dbf runs d7+1 times)
.screenTestHiColorMesure:
lea lineMesure,a1
rept 320/16
move.l (a1),d0
or.l d0,(a0)+
move.l 4(a1),d0
or.l d0,(a0)+
add.l #160-8,a0
move.l 8(a1),d0
or.l d0,(a0)+
move.l 12(a1),d0
or.l d0,(a0)+
add.l #160-8,a0
move.l 16(a1),d0
or.l d0,(a0)+
move.l 20(a1),d0
or.l d0,(a0)+
add.l #160-8,a0
move.l 24(a1),d0
or.l d0,(a0)+
move.l 28(a1),d0
or.l d0,(a0)+
add.l #160-8,a0

sub.l #4*160-8,a0
endr
add.l #160*5,a0
dbf d7,.screenTestHiColorMesure

with :

lineMesure
dc.w %0101010101010101
dc.w %0101010101010101
dc.w %0101010101010101
dc.w %0101010101010101
dc.w %0001000100010001
dc.w %0001000100010001
dc.w %0001000100010001
dc.w %0001000100010001
dc.w %0000000100000001
dc.w %0000000100000001
dc.w %0000000100000001
dc.w %0000000100000001
dc.w %0000000000000001
dc.w %0000000000000001
dc.w %0000000000000001
dc.w %0000000000000001

lineHiColorPictureBackground
dc.w 0,0,0,0
dc.w $FFFF,$0,$0,$0
dc.w 0,$FFFF,$0,$0
dc.w $FFFF,$FFFF,$0,$0
dc.w 0,0,$FFFF,$0
dc.w $FFFF,$0,$FFFF,$0
dc.w 0,$FFFF,$FFFF,$0
dc.w $FFFF,$FFFF,$FFFF,$0
dc.w 0,0,0,$FFFF
dc.w $FFFF,$0,$0,$FFFF
dc.w 0,$FFFF,$0,$FFFF
dc.w $FFFF,$FFFF,$0,$FFFF
dc.w 0,0,$FFFF,$FFFF
dc.w $FFFF,$0,$FFFF,$FFFF
dc.w 0,$FFFF,$FFFF,$FFFF
dc.w $FFFF,$FFFF,$FFFF,$FFFF
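
The lineMesure words follow a simple rule: a tick on the last pixel of every 2, 4, 8 or 16 pixels, with the most significant bit being the leftmost pixel of the 16-pixel block. A small C sketch of that rule (the function name is made up for the example) can regenerate them, or produce other periods:

```c
#include <stdint.h>

/* One 16-pixel bitplane word with a tick on the last pixel of every
   `period` pixels. Bit 15 is the leftmost pixel of the block. */
uint16_t ruler_word(int period)
{
    uint16_t w = 0;
    for (int px = 0; px < 16; px++) {
        if ((px + 1) % period == 0)
            w |= (uint16_t)(0x8000u >> px);
    }
    return w;
}
```

ruler_word(2), ruler_word(4), ruler_word(8) and ruler_word(16) give $5555, $1111, $0101 and $0001, the four patterns in the table. Each one is stored 4 times there, once per bitplane, so the ticks come out in color index 15 after the or.l.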


Cyg

Cyg

#96
version 08 : http://www.top250filmsdvd.com/atari/higheSTcolor08.exe

- faster (-20%)
- a small quality bug fixed

version 09 : http://www.top250filmsdvd.com/atari/higheSTcolor09.exe
- new dither=4: forced Sierra 2-4 predithering => Ximagic is no longer needed. Doesn't work (for now) with 1pix + 2pal
- updated dither=0 for 1pix + 2pal: no longer a darker and a lighter palette, but 2 noisy, more balanced pictures and a result that feels like 29K colors

Cyg

Petari

You make new versions faster than I can really test them  :)

Timing description. Should be easier to follow and implement.


Meanings:   pixel # - usually 0-319  or  0-415
Pixel's color index:  0-15
Changes always come in 4px steps, because a write takes 4 T states.
Therefore we can present it in short form:
start px, start index, end px, end index, palette # or 2 palette #s


80 colors/line - only on real STE .

0,0,31,15,0   -  everything is palette 0

32,0,95,15,0-1    -  movem writes all 16 colors here: 64 T states, in steps of 4, as always. Index 0 is already palette 1 at the start, then the others follow gradually...

96,0,103,15,1   -  everything is palette 1  -  this is during the opcode read for the next movem - 8 T states

104,0,167,15,1-2   - movem, all 16 colors ....

168,0,175,15,2   -  opcode for movem

176,0,239,15,2-3   -  movem, 16 colors

240,0,247,15,3

248,0,311,15,3-4  -  updating pal4

312,0,319,15,4   -  all is palette 4 for last 8px


With 48 colors/line we have some partial palette updates:


0,0,55,15,0   -  all pal0

56,0,87,7,0-1    -  updating colors 0-7 with movem  32T

88,7,99,7,0-1   -  indexes 0-7 are pal 1, the rest is still pal 0. This takes 12T because of an additional 4T instruction

100,8,131,15,0-1   -  updating colors 8-15 with movem  32T

132,0,215,15,1   -  fetching colors from RAM into registers with movem + opcode read of the next one. 76+8 T states.

216,0,279,15,1-2   -  updating all 16 colors  64T

280,0,319,15,2   -  all is pal2 



Or better format:

start px, end px, start idx, end idx, palettes

80c/l :

0,31,0,15,0

32,95,0,15,0-1    * Note:  px count of transient area is always 4x count of colors changing, so 4x16=64 here.

96,103,0,15,1

104,167,0,15,1-2

168,175,0,15,2

176,239,0,15,2-3

240,247,0,15,3

248,311,0,15,3-4

312,319,0,15,4

***

48c/l :

0,55,0,15,0

56,87,0,7,0-1

88,99,7,7,0-1

100,131,8,15,0-1

132,215,0,15,1

216,279,0,15,1-2

280,319,0,15,2

*****

If there is only 1 palette value (the 5th field), then all color indexes in that pixel area use that palette.
If there are 2 - and it is always  n-1  -  n  - then it is a transient area, where colors are being updated, usually with movem, which has pretty consistent timing. I never had errors when assuming 8 T states for the opcode read, then 4 T states for every write (2 bytes at once) to the shifter.
So, in a transient area the same transient color index is repeated 4 times (for 4px), just as you see the same rows repeated 4 times in the csv table.

In the playback SW we can control timing only in 4px steps - as you likely know. The existing pixel errors on a real STE must come from different T cycle counts between movem instructions - only those which write to the shifter are relevant.
It would be good if you could write the timing for your 54 color/line scheme in the format above. Then we can see whether there is some difference between it and my timings (for STE).
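
For encoder writers, the 80c/l table above can be folded into one small C function. This is only a reading of the table, not the player's code: it assumes 0-based pixels, that palette k (1 to 4) starts being written at pixel 32 + 72*(k-1), and that color index c switches at the first pixel of its own 4px slot, i.e. 4*c pixels later. The function name is invented for the example.

```c
/* Palette number (0..4) that color index c (0..15) shows at pixel x
   (0..319) in 80 colors/line mode, as read from the table above. */
int active_palette_80(int x, int c)
{
    int pal = 0;
    for (int k = 1; k <= 4; k++) {
        /* movem for palette k starts at 32, 104, 176, 248;
           each color write takes 4 px. */
        int switch_px = 32 + 72 * (k - 1) + 4 * c;
        if (x >= switch_px)
            pal = k;
    }
    return pal;
}
```

For example, index 0 is already palette 1 at pixel 32, while index 15 stays on palette 0 until pixel 92; from pixel 96 to 103 every index is on palette 1, as in the table.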


Cyg

#98
Thx Petari,

Your timings help a lot.
For 80 cols I see a difference between your timings and my implementation: one pixel.

I updated it in a version 11 : http://www.top250filmsdvd.com/atari/higheSTcolor11.exe

News :
- a little faster (-20%)
- 80 cols new timings (1 pixel)
- Sierra predithering (dither=4): lighter effect with a minimum threshold => less noise, especially on big monochrome zones

Cyg

Petari

File is ...10.exe .

We need a little help from people owning STE or Mega STE computers. I will post a couple of simple test programs here, so we can determine the % of machines with timing A or B - as I call them for now.
Steem has timing B, while my STE has timing A once it is warmed up. When cold, it boots with A or B, 50-50%.
No hard disk needed - the files will be short.

Cyg

What about using the VBL timer in that case? It should be more stable on any machine, no?

Cyg

Petari

Vbl has nothing to do with this. And it is not a timer :-).
Here is my explanation: a memory cycle takes 4 T states on the 68000. In this case it activates the glue chip, which decodes the address and then activates the shifter chip for writing. In that process a +-1 T state difference may occur - and without problems, because the write itself may take only 1 T, so within the 4 cycles there is some tolerance.
The chips are clocked from the same source, but there are always some delays, so phase shifts may happen.


Cyg

#102
new version, 16 : http://www.top250filmsdvd.com/atari/higheSTcolor16.exe

dither=5, a mix between dither=3 ((X+Y) mod 2) and dither=4 (Sierra predithering); give it a try, it can sometimes be interesting...
It preserves the sharpness like (X+Y) mod 2 and smooths the big monochrome areas.

Blocks of lines are available, but it is still a little buggy and slows things down by 20%; I can still optimize it a little.

I also changed mode=2, 48 cols, to Petari's newest timings; I hope they are correct...

Made some tests on Alladin, it works pretty well!
You can use a picture that already has Sierra dithering, or just let dither=4 or dither=5 apply its own Sierra.

Some samples to run with CYG2001P.TTP  (54 cols) :  http://www.top250filmsdvd.com/atari/Alladin.zip
alad4.cav = dither=4 
alad4F.cav = dither=4 with a post dithering
alad5.cav = dither=5 
alad5F.cav = dither=5 with a post dithering

And for fun, a video of the Indian celebration called "Holi", very colorful and very interesting to optimize :  http://www.top250filmsdvd.com/atari/holi3.zip  using dither=3

Sometimes post-dithering improves things, especially the sharpness; use the files F*.hst, I changed the naming so they are easily recognized by STEAV

Cyg

Petari

Looks great. Keep up the good work!
Guide for conversion on Monday ...

Petari

#104
Added a guide for converting into STE hi-color.   http://atari.8bitchip.info/gutoHav.html

It is just a beginning. There is much more that may be needed. Currently I am experimenting with MVTools, for quality framerate conversion. In some cases it produces really flawless new frames.

Nice trailer:  http://atari.8bitchip.info/CloudAtlasTrl.avi

I am preparing 2 new formats:  320x198px at 50fps - some say that only 50 is the real thing.
Overscan 320x240px, 25 fps - for old style 4:3 material. Both with 48 colors/line.
That would be all, I think, as far as formats go.