High-color video playback on STE

Started by Petari, 25-09-2012, 14:28:44


Petari

I made a couple of short videos with audio: http://atari.8bitchip.info/HCPL.ZIP
They use the Photochrome 4096-color mode, fullscreen, 12.5 fps; audio is 25033 Hz mono.
It is very demanding in terms of CPU time, disk space and loading speed. At most about 6 seconds of playback at once is possible.
Requirements: STE or Mega STE with 4 MB RAM (3.5 MB free RAM), a hard disk, and a TV/monitor capable of working at 50 Hz (PAL). You may also try an emulator such as Steem or Hatari. On a plain ST there will be no sound, and fewer colors. Too much? Well, it is still not enough for continuous playback :-)
This could be the basis for an enhanced Space Ace.

Continuous playback with 4096 colors? Extremely demanding, especially at 25 fps and fullscreen. I have some ideas how to achieve it on the STE. But, to say it up front, it will not work with UltraSatan. Only with a cartridge or IDE interface plus a small modification, raw hard disk access and special software. If it works some day, expect approximately VHS quality: there will be noise too, and audio quality similar to mono analog VHS. Sharpness may be better than average VHS. About 45 minutes fit on 4 GB.
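As a rough check on the "about 45 min on 4 GB" estimate, the implied average data rate can be worked out. This is a minimal sketch, assuming 4 GB means 4 x 2^30 bytes and that the whole card holds only video and audio data; `average_rate` is a hypothetical helper.

```python
# Rough check of "about 45 min on 4 GB": what average data rate does it imply?
GIB = 2**30  # assumption: 4 GB read as 4 * 2^30 bytes


def average_rate(capacity_bytes: int, minutes: float) -> float:
    """Average bytes per second needed to fill capacity_bytes in the given time."""
    return capacity_bytes / (minutes * 60)


rate = average_rate(4 * GIB, 45)   # bytes per second
per_frame = rate / 25              # bytes available per frame at 25 fps
print(f"{rate / 1e6:.2f} MB/s, {per_frame / 1024:.1f} KB per frame")
# prints: 1.59 MB/s, 62.1 KB per frame
```

For comparison, one low-resolution bitmap alone is 32000 bytes.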
  •  

Cyg

#1
Hi Petari,

Good to see new stuff on STE :-)
I made something similar last year using my new hicolor rendering technique on the ST, without sound; you can see some details at the bottom of my thread on Atari Forum: http://www.atari-forum.com/viewtopic.php?t=21148
I have improved the rendering quality a little since then.
I tried UPX and my own compression algorithm without much success in terms of size (-10 to -30% at best for complex pictures; surely more for cartoons, using complex scan paths over 4-bit pixels rather than the Atari 4-bitplane layout). In any case, it takes a lot of time to decompress, and also a lot of memory if everything is preloaded.
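For readers unfamiliar with the two layouts: in ST low resolution, each group of 16 pixels is stored as 4 consecutive 16-bit words (one per bitplane), and a pixel's 4-bit color index is assembled from the same bit position in each word. A minimal Python sketch of the planar/chunky conversion, assuming plane 0 holds the least significant bit and bit 15 of each word is the leftmost pixel; `planar_to_chunky` and `chunky_to_planar` are hypothetical helpers, not part of Cyg's tools.

```python
def planar_to_chunky(words):
    """Convert one ST low-res group (4 bitplane words) into 16 color indices.

    words[0] is bitplane 0 (least significant bit of the index),
    words[3] is bitplane 3 (most significant bit).
    Bit 15 of each word is the leftmost pixel.
    """
    if len(words) != 4:
        raise ValueError("expected 4 bitplane words")
    pixels = []
    for x in range(16):
        bit = 15 - x  # leftmost pixel lives in bit 15
        index = 0
        for plane, word in enumerate(words):
            index |= ((word >> bit) & 1) << plane
        pixels.append(index)
    return pixels


def chunky_to_planar(pixels):
    """Inverse: 16 color indices (0..15) back to 4 bitplane words."""
    words = [0, 0, 0, 0]
    for x, index in enumerate(pixels):
        for plane in range(4):
            if (index >> plane) & 1:
                words[plane] |= 1 << (15 - x)
    return words
```

A compressor fed the chunky form sees whole 4-bit pixels next to each other, whereas in the planar form each pixel is spread over four separate words.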

Using another hicolor image animation technique, I made an "infinite zoom" effect.
From 10 pictures, I generate motion blur between them to smooth the transitions. You can see 2 examples here:
http://www.youtube.com/watch?v=WlUan662xGM
http://www.youtube.com/watch?v=1qRfYI8cMGY
The pictures use 200 to 400 different colors at 400x250 fullscreen. I simplified the encoding process to use only 48 colors every 6 lines, which gives a good enough result with low memory consumption.

You could also take a look at Photochrome 5, which encodes better than Spectrum; Hicolor is not released yet :-)

Best regards and stay Atari !
Cyg / BLaBla
  •  

Petari

Hi Cyg,

I know your similar hi-color video playback; it is a scene from Avatar. I was not able to judge how noisy it really is, because of the constant rotation.
I did the conversions with Photochrome 5 on the PC, using mode 3, which is not present in Photochrome 4. It was necessary because of some errors on the white pants in video 1. I have been exchanging mail with Douglas Little; he made some updates last week.

I saw your other demo too; quite impressive.

About continuous playback: we can forget data packing. There is simply no CPU power or time for it. I don't think that is a big flaw today, when storage is cheap.
The real problem is that the CPU is busy moving color data to the Shifter the whole time during scanlines. And you cannot use DMA/ACSI disk transfer, because it slows the CPU down.

But, as said in the first post here, there is a chance, and with even better quality. With a special hack in the STE (a very simple mod) plus a cartridge or special IDE interface and a Compact Flash card, it is possible to achieve 25 fps fullscreen with DMA audio.
With the hack we can get very fast transfers from a CF card (or hard disk): peak speed is over 2.5 MB/sec. That means we can load 32 KB of bitmap data in the periods without scanlines (about 35% of the time). And best of all, we can load color data into the Shifter directly from the IDE disk.
This works with simple instructions: move.l (a0)+,d0 in sequence. It is faster than movem and gives better distribution. The hardware mod is needed to reverse the operation, so that it writes to (a0) instead of reading from it. I guess we can get about 68 colors per line with it.
See this: http://atari.8bitchip.info/astide.php (the second project, the cartridge port interface). Its transfer rate is 2354 KB/sec.
Well, all of this needs accurate timing in software, plus support in the conversion software, because it is not compatible with anything that exists.
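The "about 35% of the time" figure and the 32 KB load can be checked with simple cycle arithmetic. A minimal sketch, assuming a PAL field of 313 lines x 512 CPU clocks, a nominal 8 MHz clock, 200 busy scanlines per 50 Hz field, two fields per 25 fps video frame, the 2.5 MB/sec peak rate read as 2.5 x 10^6 bytes/sec, and no audio or interrupt overhead.

```python
CPU_HZ = 8_000_000        # assumption: nominal 8 MHz 68000 clock
CYCLES_PER_LINE = 512     # one PAL scanline
LINES_PER_FIELD = 313     # PAL field at 50 Hz


def free_time_s(visible_lines: int, fields: int = 2) -> float:
    """Seconds of CPU time outside the busy lines over `fields` 50 Hz fields."""
    free_lines = (LINES_PER_FIELD - visible_lines) * fields
    return free_lines * CYCLES_PER_LINE / CPU_HZ


def load_time_s(nbytes: int, rate_bytes_per_s: float) -> float:
    """Seconds needed to read nbytes at the given rate."""
    return nbytes / rate_bytes_per_s


fraction = (LINES_PER_FIELD - 200) / LINES_PER_FIELD
budget = free_time_s(200)            # two fields = one 25 fps frame
needed = load_time_s(32000, 2.5e6)   # 32000-byte low-res bitmap at 2.5 MB/s
print(f"free {fraction:.0%} of the time, {budget * 1e3:.1f} ms budget, "
      f"{needed * 1e3:.1f} ms needed")
# prints: free 36% of the time, 14.5 ms budget, 12.8 ms needed
```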

  •  

Cyg

#3
Great! If you want to test it with my hicolor process, feel free to send me your pictures to encode.

I can try to package my encoder (running on a PC) and the rendering routines so they are more user-friendly than they are now, since so far they were only for my own use.
I have several modes:
192 pixels wide with 48 colors: the best average density
256 pixels wide with 52 colors
320 pixels wide with 54 colors + 2 fixed colors
400 pixels wide with 48 colors: STE only, because of the blitter and the hardware address mode on every scanline => for the next demo :-)
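If "density" is read as colors per line divided by pixels per line (an interpretation, not something stated explicitly), the 192-pixel mode does come out best. A quick sketch over the four modes listed above:

```python
# Colors per line divided by line width, for the modes listed above.
modes = {
    192: 48,
    256: 52,
    320: 54 + 2,   # 54 colors + 2 fixed colors
    400: 48,
}

density = {width: colors / width for width, colors in modes.items()}
for width, d in sorted(density.items()):
    print(f"{width:3d} px: {modes[width]} colors, {d:.3f} colors per pixel")

best = max(density, key=density.get)
```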

After a quick comparison on a few pictures, I think my hicolor is really better than Photochrome 5 -m3, but I did not test every parameter.
Here are some samples, using one picture and the result of 2 swapping pictures (size is doubled):

[Image: Original]
[Image: Photochrome 5 -m3]
[Image: Hicolor]
[Image: Photochrome 5 -m3, fade result from 2 swapping pictures]
[Image: Hicolor, fade result from 2 swapping pictures]

Photochrome has another issue with global lighting: results are always a bit brighter than the original.
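The "fade result from 2 swapping pictures" samples rely on two pictures alternating at 50 Hz, so the eye blends each pair of colors. A minimal sketch of that idea for STE palette words, assuming the standard STE nibble layout (bits 0 to 2 hold the upper three intensity bits, bit 3 holds the extra low bit) and an idealized linear average. Real CRT and eye response is not linear, and the helper names are hypothetical.

```python
def decode_nibble(n: int) -> int:
    """STE color nibble -> intensity 0..15 (bit 3 is the extra low bit)."""
    return ((n & 0x7) << 1) | ((n >> 3) & 0x1)


def encode_level(v: int) -> int:
    """Intensity 0..15 -> STE color nibble."""
    return ((v >> 1) & 0x7) | ((v & 0x1) << 3)


def decode_color(word: int) -> tuple:
    """$0RGB palette word -> (r, g, b) intensities 0..15."""
    return tuple(decode_nibble((word >> shift) & 0xF) for shift in (8, 4, 0))


def blended(word_a: int, word_b: int) -> tuple:
    """Idealized perceived color when two palette words alternate every field."""
    a, b = decode_color(word_a), decode_color(word_b)
    return tuple((x + y) / 2 for x, y in zip(a, b))
```

Blending two adjacent intensity levels, for example `blended(0x0700, 0x0F00)`, gives a half step between them; that is where the extra apparent colors come from.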

Cyg / BLaBla
  •  

Petari

I'm not totally satisfied with Photochrome: there is too much 'noise' in animations, even when it is supplied with not so many colors. And the need for m3 means slow conversions.
I will post a URL when the picture series is ready to download for conversion. It will be 320x200 PNG, 16M colors. I want a 320 px result, single field, plus, if possible, source code for displaying 1 frame in PAL (50 fps) mode, so I can combine it with the playback software.
If you prefer some other picture format, let me know.
  •  

Cyg

All right !
I made a test with a cartoon-style picture; it's not as easy to convert as it seems, especially if the source includes artefacts, as they tend to be more visible after optimisation :-)
I will give you the ASM replay routines, no problem at all!

Best regards,
Cyg / BLaBla
  •  

Petari

Yes. I did some normal movie segments; there is usually less visible noise (shimmering) in them.
This weekend I will try some experiments with the cartridge port IDE adapter, to see how many colors we can have per line. I will try to make a pattern and display it.
It would be good if you also sent me your line timing and your concept for writing color data to the Shifter. I think I saw it somewhere, but this saves me searching for it. It will help me do it faster/better.
  •  

Petari

I did some tests of the possible data transfer speeds with the cartridge port interface CATA.
After some thinking and reading docs, I went with a movem solution, but used in a special way:

Usually, a data transfer works by reading data from the source into the CPU and then writing it to the destination. If we could write directly from source to destination, it would work much faster, in theory 2x. IDE ports are ideal for this, because the drive advances through its data automatically on every data port access. So we could transfer by activating the IDE for a read, so that the data appears on the data bus, while the RAM is set to write at a specific location. But there are no such CPU instructions. The opposite direction is possible: move.l (a1)+,d0, executed with special logic, writes the content at address a1 to the IDE port. That means a peak speed of 2.66 MB/sec, which is what my CATA design uses.
To achieve the opposite, reading from the attached IDE drive, we need a trick: a simple mod in the ST(E), cutting the CPU R/W line leading to the MMU. Then the logic on the IDE interface can invert it when needed, using the same instruction as above: move.l (a1)+,d0 then writes what is on the bus to the address in a1.
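The 2.66 MB/sec figure follows from the instruction timing. A minimal sketch, assuming the 12-clock timing of move.l (An)+,Dn from the 68000 tables, a nominal 8 MHz clock and fully unrolled code with no loop overhead:

```python
CPU_HZ = 8_000_000  # assumption: nominal 8 MHz clock


def peak_rate(bytes_per_instr: int, cycles_per_instr: int, hz: int = CPU_HZ) -> float:
    """Bytes per second for an unrolled run of identical instructions."""
    return bytes_per_instr * hz / cycles_per_instr


# move.l (a1)+,d0 : 12 clocks per the 68000 timing tables, 4 bytes per instruction
rate = peak_rate(4, 12)
print(f"{rate / 1e6:.2f} MB/s")
# prints: 2.67 MB/s
```

That is in line with the 2.66 MB/sec quoted above.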

With movem we can go even faster. But I have avoided it so far, because there is a small bug with movem: it performs 1 extra memory read cycle at the end. That also triggers the IDE port counter, so we lose 2 bytes on a read. But with the inverting logic, the read becomes a write, and the overshoot write lands exactly where we need it!

I did a test with the CATA driver using movem, with the inverting R/W signal logic:


superFast                           ; reading with movem!
* Destroys all registers except a7 and a5 (both are saved and restored)

   move.w  #$2700,sr                ; disable interrupts

* Need to save sp and a5 (the movem register list overwrites them):
   move.l  sp,$30.w
   move.l  a5,$34.w

* Activate RLA:
   IdeSelpDataFR

* Needs a change on the /RW line to the MMU,
* so that on an IDE data read the MMU gets a write
* instead of a read!
* This is a kind of semi-DMA,
* because data goes directly from IDE to RAM
* or vice versa.

   movem.l (a1)+,d0-d7/a0/a2-a7     ; 15x4 bytes + 2 bytes overshoot = 62 bytes
   addq.l  #2,a1                    ; step over the word written by the overshoot
   movem.l (a1)+,d0-d7/a0/a2-a7
   addq.l  #2,a1
   movem.l (a1)+,d0-d7/a0/a2-a7
   addq.l  #2,a1
   movem.l (a1)+,d0-d7/a0/a2-a7
   addq.l  #2,a1
   movem.l (a1)+,d0-d7/a0/a2-a7
   addq.l  #2,a1
   movem.l (a1)+,d0-d7/a0/a2-a7
   addq.l  #2,a1
   movem.l (a1)+,d0-d7/a0/a2-a7
   addq.l  #2,a1
   movem.l (a1)+,d0-d7/a0/a2-a7     ; so far 62x8 = 496 bytes
   addq.l  #2,a1

*  movem.l (a1)+,d0-d2              ; 3x4 + 2 bytes = 14 -> 510 so far
*  addq.l  #2,a1
*  move.w  (a1)+,d1                 ; last 2 bytes

* Better: the last 16 bytes with plain move.l
   move.l  (a1)+,d0
   move.l  (a1)+,d0
   move.l  (a1)+,d0
   move.l  (a1)+,d0                 ; 496 + 16 = 512 bytes (one sector)

   IdeSelpR                         ; LAO off
   move.l  $34.w,a5
   move.l  $30.w,a7
   move.w  srkeep-vbas(a5),sr

   rts



Because of the bug it is slower than it could be. Still, I got over 2.5 MB/sec with filesystem reads, with a non-optimised driver. Raw speed should be over 3 MB/sec.
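Summing the superFast loop body against the 68000 timing tables shows where the "over 3 MB/sec" raw estimate comes from. A sketch, assuming a nominal 8 MHz clock and no wait states, and ignoring the setup and teardown instructions and any waiting for the drive:

```python
CPU_HZ = 8_000_000  # assumption: nominal 8 MHz clock


def movem_l_mem_to_reg(nregs: int) -> int:
    """Clocks for movem.l (An)+,<list> per the 68000 tables: 12 + 8n."""
    return 12 + 8 * nregs


ADDQ_L_AN = 8        # addq.l #2,a1
MOVE_L_POSTINC = 12  # move.l (a1)+,d0

block_cycles = movem_l_mem_to_reg(15) + ADDQ_L_AN  # one 62-byte block
block_bytes = 15 * 4 + 2                           # includes the overshoot word
sector_cycles = 8 * block_cycles + 4 * MOVE_L_POSTINC
sector_bytes = 8 * block_bytes + 4 * 4

rate = sector_bytes * CPU_HZ / sector_cycles
print(sector_bytes, sector_cycles, f"{rate / 1e6:.2f} MB/s")
# prints: 512 1168 3.51 MB/s
```

This is an upper bound for the loop body alone; real reads also pay for command setup and drive latency.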

Now I want to use the same approach to load the color palette directly into the Shifter from the IDE port (CATA interface).

But because of the bug, it needs a special approach. After some time, I concluded that a 15-word transfer is best: then the overshoot writes the 16th color too. All of it in 72 CPU clocks.
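Why 15 words rather than 8 longs: once the extra trailing access becomes a write, the long variant's extra word lands on $FF8260, the STE resolution (shift mode) register right after color 15. A small sketch of the address arithmetic, assuming the overshoot is always one word at the next address; `touched_words` and `movem_clocks` are hypothetical helpers.

```python
PALETTE_BASE = 0xFF8240
SHIFT_MODE = 0xFF8260   # resolution register, right after color 15 ($FF825E)


def touched_words(base: int, nregs: int, size: int) -> list:
    """Word addresses hit by an inverted movem of nregs registers
    (size 2 = .w, 4 = .l), including the extra trailing word access."""
    nbytes = nregs * size + 2  # data plus the overshoot word
    return [base + off for off in range(0, nbytes, 2)]


def movem_clocks(nregs: int, size: int) -> int:
    """movem (An),<list> memory-to-register: 12+4n (.w) or 12+8n (.l)."""
    return 12 + (4 if size == 2 else 8) * nregs


word_variant = touched_words(PALETTE_BASE, 15, 2)  # movem.w (a0),d0-d7/a1-a7
long_variant = touched_words(PALETTE_BASE, 8, 4)   # movem.l (a0),d0-d7
print(hex(word_variant[-1]), hex(long_variant[-1]), movem_clocks(15, 2))
# prints: 0xff825e 0xff8260 72
```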



hiColor                             ; 4096-color support

* SR = $2700 already!

   move.l  sp,$30.w
   lea     $FFFF8240.w,a0

* Activate RLA:
   IdeSelpDataFR

* Each line must take exactly 512 CPU clock cycles.
* Needed 199 times in a row!

hiCline                             ; a0 must be set to $FF8240 on entry

* Early ideas:
* could use other address registers to start at a higher color,
* or, much better, start at $FF823E - not possible, gives a bus error!

*  movem.l (a0),d0-d7   ; 16 colors + 1 word via overshoot, dummy at the beginning
** above is 12+8x8 = 76 clocks
*  movem.l (a0),d0-d6   ; 152
*  movem.l (a0),d0-d6   ; 228
*  movem.l (a0),d0-d6   ; 304
*  movem.l (a0),d0-d6   ; 380
*  movem.l (a0),d0-d6   ; 456 - too much already?

*  movem.l (a0),d0-d6   ; transfers 14 colors + 1 more via the overshoot
** above is 12+8x7 = 68 clocks

** so, maybe add:
*  movem.w (a1),d0-d1   ; remaining 6 bytes, with overshoot
** above is 12+4x2 = 20 clocks
** so 16 colors need 88 clocks
** Repeat the above 4 times: 352 clocks, 64 colors
* then fill with nops to get 512 cycles

* Better variant (a0 = $FF8240):

   movem.w (a0),d0-d7/a1-a7         ; 15 colors, +1 via overshoot = all 16,
                                    ; no damage to the resolution register
                                    ; 12+4x15 = 72 clocks
   movem.w (a0),d0-d7/a1-a7         ; 144
   movem.w (a0),d0-d7/a1-a7         ; 216
   movem.w (a0),d0-d7/a1-a7         ; 288
   movem.w (a0),d0-d7/a1-a7         ; 360
   movem.w (a0),d0-d7/a1-a7         ; 432
   movem.w (a0),d0-d7/a1-a7         ; 504
   nop                              ; 508
   nop                              ; 512 - end of line

* Code length is 32 bytes.
* That means 16x7 = 112 colors, but pixels are shown only during 320 clocks,
* so 320/72 x 16 = about 71 colors - or a little more?

* Repeat 198 more times.
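A quick check of the line budget in the better variant: seven 72-clock movem.w blocks plus two nops fill a 512-clock PAL scanline, and the number of colors streamed while pixels are visible follows from the 320-clock display window. This sketch assumes one low-resolution pixel per CPU clock (8 MHz pixel clock) and ignores any Shifter pipeline delay.

```python
LINE = 512
MOVEM_W_15 = 12 + 4 * 15   # 72 clocks, 16 colors including the overshoot
NOP = 4

line_cycles = 7 * MOVEM_W_15 + 2 * NOP
colors_streamed = 7 * 16
visible_cycles = 320       # 320 low-res pixels, one per 8 MHz clock (assumption)
colors_in_window = visible_cycles / MOVEM_W_15 * 16

assert line_cycles == LINE
print(colors_streamed, round(colors_in_window, 1))
# prints: 112 71.1
```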
 


The question is: what is the best moment to start the palette update? Of course, we preload the first 16 colors in the H-blank. Then at which pixel should the update start?
And I think we should always update all 16 colors, except for the last block, where there is no time for all 16, and no point either, since we run out of the pixel area.

I want to make test patterns for this, to show how many colors we can have per line (with on-the-fly palette changes, the color shown depends on the color index and on whether that register has already been updated or the Shifter still holds the previous value), and to see whether it is really fast enough. It should be, because modern drives and cards are much faster than the old Atari: a cheap Kingston 4 GB card can do over 15 MB/sec, and its access time is under 0.5 ms.
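The dependence described above (which color is shown depends on the index and on whether its register has been rewritten yet) can be modelled when planning test patterns. A minimal sketch under loudly hypothetical timing: an update block starts every 72 clocks, palette word k of a block lands 4 + 4k clocks after the block starts, and Shifter pipeline delay is ignored. The real offsets would have to be measured, for example with exactly such a test pattern.

```python
BLOCK = 72      # movem.w of 15 registers, 16 palette words with overshoot
WORD_STEP = 4   # one bus cycle per word


def write_cycle(block: int, index: int, start: int, first_offset: int = 4) -> int:
    """Hypothetical clock at which palette register `index` receives its value
    from update block `block` (0-based). first_offset is an assumption."""
    return start + block * BLOCK + first_offset + WORD_STEP * index


def shown_generation(pixel_cycle: int, index: int, start: int,
                     blocks: int = 7, first_offset: int = 4) -> int:
    """Which update block's value is visible for color `index` at pixel_cycle.
    -1 means the value preloaded during the H-blank is still in the Shifter."""
    shown = -1
    for b in range(blocks):
        if write_cycle(b, index, start, first_offset) <= pixel_cycle:
            shown = b
    return shown
```

For a pixel at clock `x`, `shown_generation(x, index, start)` returns -1 while the H-blank value is still visible, 0 once the first update block has reached that register, and so on.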
  •  

Cyprian

The high-color video player looks pretty amazing on the STE.

Quote from: Petari on 01-10-2012, 16:53:14
With movem we can go even faster. But I have avoided it so far, because there is a small bug with movem: it performs 1 extra memory read cycle at the end.

Cool research; I had only heard about the bug in CLR.l.
Can you please give an exact example of the buggy movem instruction?


  •  

Petari

#9
There are various bugs in the 68000. Most of them are not harmful; usually they only slow things down. It seems the designers were satisfied that it worked without big problems and did not optimise it further. I saw some articles online about it.

Movem: it is visible from the listings above.
For instance, the instruction movem.l (a1),d0-d7 transfers the 32 bytes at address a1 to registers d0-d7. But at the end, it performs an additional, unused word read from address a1+32. Apart from slightly slower execution, it has no other side effects in normal use. However, if you want to use it for very fast IDE hard disk reads, it is useless, because 2 bytes are lost: the additional dummy read at the end triggers the IDE counter, which supplies 2 more bytes that get lost. This is what I experienced some 4 years ago.
You can see from the cycle tables that movem from memory takes 4 clock cycles (one extra bus cycle) longer than movem to memory, where there is no such bug.

In my desperate effort to make 25 fps hi-color playback possible, I came up with the idea of trying movem from memory in the reversed direction, where the logic sets the MMU to write to RAM instead of a normal read. I expected that it would then write those additional 2 bytes right to the proper location, since the address should be incremented regularly. And that is what happens: it works reliably and very fast. Peak speed is about 3.6 MB/sec.
Note that in this case the data appearing on the bus (from the IDE port) goes to RAM directly. It goes into the CPU registers too, of course, but that is not relevant.
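The byte loss and its fix can be illustrated with a toy model of the IDE data register, where every bus access hands out the next word. A sketch, assuming one word per bus cycle and 15 long registers per movem as in superFast; the function names are hypothetical.

```python
from collections import deque


def normal_movem_read(port: deque, nregs: int) -> list:
    """movem.l from the IDE data port into nregs registers.
    Every bus read pops a word; the trailing extra read pops one more
    word that ends up nowhere."""
    received = [port.popleft() for _ in range(2 * nregs)]  # 2 words per long
    port.popleft()  # extra read at the end: consumed by the drive, discarded
    return received


def inverted_movem_read(port: deque, ram: dict, addr: int, nregs: int) -> int:
    """Same bus cycles with R/W inverted: every cycle writes the port word
    to RAM, including the trailing one. Returns the next block address
    (a1 advanced by the movem plus the addq.l #2,a1)."""
    for i in range(2 * nregs + 1):
        ram[addr + 2 * i] = port.popleft()
    return addr + 4 * nregs + 2
```

In the normal case the second block starts one word late (word 30 is lost); in the inverted case the words land in RAM contiguously, provided a1 is advanced by 2 after each movem, as superFast does.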

A 1-minute movie fragment at 25 fps is in progress.
  •  

Cyg

Looks terrific!

But Petari, you need to encode the movie according to your display method based on inverted movem, don't you?
If you send me a screenshot with random colors, color indexes and pixel graduation of your display result, I can make a specific configuration in my Hicolor encoder and send you the executable to encode your movie frame by frame.

I usually calibrate my configuration with this kind of test screen, made of 16 big horizontal color bands with pixel marks every 2/4/8 pixels to see exactly where the color changes; interlacing is not necessary.
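A calibration screen like the one described could be generated as 4-bit color indices before conversion to bitplanes. A minimal sketch; the exact layout (tick rows at the top of each band, mark color offset by 8 for contrast, the last band absorbing the leftover lines) is an assumption, not Cyg's actual pattern.

```python
def test_pattern(width: int = 320, height: int = 200, mark_color_offset: int = 8):
    """Hypothetical calibration screen: 16 horizontal bands (band n uses
    color index n), with tick rows marking every 8, 4 and 2 pixels at the
    top of each band. Returns rows of 4-bit color indices."""
    band_height = height // 16
    rows = []
    for y in range(height):
        band = min(y // band_height, 15)   # last band takes leftover lines
        row = [band] * width
        tick_row = y - band * band_height  # row number inside the band
        spacing = {0: 8, 1: 4, 2: 2}.get(tick_row)
        if spacing is not None:
            mark = (band + mark_color_offset) % 16
            for x in range(0, width, spacing):
                row[x] = mark
        rows.append(row)
    return rows
```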

Best regards
Cyg
  •  

Petari

Exactly: it needs new/updated encoding software. And it should be better, thanks to more colors per line.
I made a primitive test pattern for testing, but it turned out that my SanDisk CF card had died, while the Kingston 4 GB cannot do READ MULTIPLE, which is essential here. I will try with a 2.5-inch hard disk later today. But CF should be better because of its faster access times.
And I think I can make a version compatible with existing hi-color formats.
I just need to keep the same timing as the line display routines, so adding some nops here and there, or similar... Then 320x200 at 25 fps will work fine. With sound.

Is this your display code?

; HigheSTcolors display rout. 1992-2011 Cyg / BLaBla
; colors 0 and 1 are global
lea   $ff8244,a6
lea   colors,a7
rept 199 ; 199 lines only, first line is lost because of the synchronisation, possibility for a lower border removal
movem.l (a7)+,d0-d7/a0-a5   ; load 28 colors
movem.l d0-d6,(a6)         ; display 14 colors at the beginning of the right border
movem.l (a7)+,d0-d6         ; load 14 colors
movem.l d7/a0-a5,(a6)      ; display 14 colors right after the end of the left border
movem.l (a7)+,a0-a5         ; load 12 colors
movem.l d0-d6,(a6)         ; display 14 colors
movem.l a0-a5,(a6)         ; display 12 colors
move.w (a7)+,(a6)         ; display 1 last color (3 nop)
endr

And the one for Photochrome too?
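The quoted routine fits a PAL scanline exactly. Summing it against the 68000 timing tables (movem.l (An)+,<n regs> = 12+8n, movem.l <n regs>,(An) = 8+8n, move.w (An)+,(An) = 12) gives 512 clocks. A sketch, assuming no wait states; every individual time here is a multiple of 4, which matches the ST bus slot granularity.

```python
def movem_l_load(n: int) -> int:
    """movem.l (An)+,<n regs>: 12 + 8n clocks."""
    return 12 + 8 * n


def movem_l_store(n: int) -> int:
    """movem.l <n regs>,(An): 8 + 8n clocks."""
    return 8 + 8 * n


MOVE_W_POSTINC_TO_IND = 12  # move.w (a7)+,(a6)

line = [
    movem_l_load(14),    # d0-d7/a0-a5
    movem_l_store(7),    # d0-d6 -> palette
    movem_l_load(7),     # d0-d6
    movem_l_store(7),    # d7/a0-a5 -> palette
    movem_l_load(6),     # a0-a5
    movem_l_store(7),    # d0-d6 -> palette
    movem_l_store(6),    # a0-a5 -> palette
    MOVE_W_POSTINC_TO_IND,
]
print(line, sum(line))
# prints: [124, 64, 68, 64, 60, 64, 56, 12] 512
```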

But I am currently doing a 1-minute video at 320x160; then there should be enough time to read 51 KB in the remaining non-scanline time of each 1/25 sec, with such a high transfer rate. And the source has an even wider aspect ratio anyway.
Then I can use slightly modified Photochrome code, adjusted to 160 lines and with an improved start, starting at line 40, so the picture is centered vertically.

Thanks for offering to update the code. I will need it for sure. I just need to solve the READ MULTIPLE problem (during color updates there must be no waiting, so I need a card that can do at least some 64 sectors in READ MULTIPLE mode). I will look for a new CF card, but the choice in shops is small and prices are high. I will likely need to order one.
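The 51 KB per frame estimate can be checked the same way as the 200-line case. A sketch, assuming a PAL field of 313 lines x 512 clocks, a nominal 8 MHz clock, 160 busy lines per field, two fields per 25 fps frame, 51 KB = 52224 bytes, and no other overhead:

```python
CPU_HZ = 8_000_000       # assumption: nominal 8 MHz clock
CYCLES_PER_LINE = 512
LINES_PER_FIELD = 313


def free_seconds_per_frame(visible_lines: int, fields_per_frame: int = 2) -> float:
    """CPU time outside the busy lines during one 25 fps video frame."""
    free_lines = (LINES_PER_FIELD - visible_lines) * fields_per_frame
    return free_lines * CYCLES_PER_LINE / CPU_HZ


budget = free_seconds_per_frame(160)
frame_bytes = 51 * 1024
required_rate = frame_bytes / budget
print(f"{budget * 1e3:.1f} ms free, needs {required_rate / 1e6:.2f} MB/s")
# prints: 19.6 ms free, needs 2.67 MB/s
```

Under these assumptions the required rate is exactly one byte per 3 clocks, the same as the plain move.l path, so it is the faster movem path that leaves some margin.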
  •  

Cyg

Yes, it's mine, in its 320-pixel-wide version with 54 colors + 2 colors per line from 1 year ago; I don't think I have improved it since.
You will also need to know exactly when it starts, so that it is centered.

I also have a 48-color, 400-pixel-wide (fullscreen left/right/top & bottom) version using the blitter (STE only), and I think I had a fullscreen version compatible with every ST, but never used it.

Cyg
  •  

Cyprian

Quote from: Petari on 02-10-2012, 10:59:31
In my desperate effort to make 25 fps hi-color playback possible, I came up with the idea of trying movem from memory in the reversed direction, where the logic sets the MMU to write to RAM instead of a normal read. I expected that it would then write those additional 2 bytes right to the proper location, since the address should be incremented regularly. And that is what happens: it works reliably and very fast. Peak speed is about 3.6 MB/sec.
Note that in this case the data appearing on the bus (from the IDE port) goes to RAM directly. It goes into the CPU registers too, of course, but that is not relevant.

Does it mean that your CATA is able to detect MMU cycles, and to use those cycles to write data to RAM? That's a really crazy idea :)

If it really works, maybe it would be possible to feed the Shifter directly (with video data) during MMU cycles?



  •  

Petari

No. The trick is that the logic inverts the R/W line only when a RAM access is performed and that mode is set. Therefore the transfer code itself must execute from the cartridge ROM, so the logic can distinguish the accesses simply.
Of course, the cartridge port adapter cannot detect much; only a few CPU lines are available there.
Direct access to the Shifter/MMU? It would need some serious cutting in the machine, I guess.
I thought about something related: fast RAM access during non-scanline periods. Then 8 MB/sec would be possible for about 34% of the time. But likely again a lot of cutting plus complicated logic.
  •