Instead of using a two's-complement integer representation of data samples
during code-block decoding, use a sign-magnitude representation to avoid
comparison branches in the passes.
However, the best timings show that it is actually slightly slower, or at
least no better: 51649 ms with this attempt vs 51536 ms before, on
MAPA_005.jp2 re-encoded with default options.
The two variants, with the sign in the MSB (the default) or in the LSB, give
similar performance.
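
As an illustration of the two layouts, here is a minimal sketch of converting
between two's complement and sign-magnitude. The helper names (to_sm_msb,
from_sm_msb, to_sm_lsb, from_sm_lsb) are hypothetical and not taken from the
patch, and the sketch assumes the magnitude fits in 31 bits (so INT32_MIN is
excluded).

```c
#include <stdint.h>

/* Hypothetical helpers for illustration only; not the patch's code.
 * Assumption: |v| fits in 31 bits, i.e. v != INT32_MIN. */

/* Magnitude of v, computed in unsigned arithmetic. */
static uint32_t sm_magnitude(int32_t v)
{
    return v < 0 ? (uint32_t)0 - (uint32_t)v : (uint32_t)v;
}

/* Sign in the most significant bit, magnitude in bits 0..30. */
static uint32_t to_sm_msb(int32_t v)
{
    return (v < 0 ? 0x80000000u : 0u) | sm_magnitude(v);
}

static int32_t from_sm_msb(uint32_t s)
{
    int32_t mag = (int32_t)(s & 0x7FFFFFFFu);
    return (s & 0x80000000u) ? -mag : mag;
}

/* Sign in the least significant bit, magnitude in bits 1..31. */
static uint32_t to_sm_lsb(int32_t v)
{
    return (sm_magnitude(v) << 1) | (uint32_t)(v < 0);
}

static int32_t from_sm_lsb(uint32_t s)
{
    int32_t mag = (int32_t)(s >> 1);
    return (s & 1u) ? -mag : mag;
}
```

With either layout, the magnitude bits can be updated without first testing
the sign of the sample, which is the branch this representation is meant to
remove.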
Add an additional flag array such that colflags[1+0] holds the state of
col=0, row=0..3, colflags[1+1] that of col=1, row=0..3,
colflags[1+flags_stride] that of col=0, row=4..7, and so on.
This array avoids excessive cache thrashing when processing 4 vertical
samples at a time, as is done in the various decoding steps.
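
The indexing described above can be sketched as follows. The helper name
colflags_index is hypothetical; only the colflags layout and flags_stride
come from the description, and the content of each entry is not modelled
here.

```c
#include <stddef.h>

/* Hypothetical helper for illustration only: index of the colflags entry
 * that covers sample (col, row). Each entry covers one column of 4
 * vertically adjacent rows, and entries start at offset 1. */
static size_t colflags_index(size_t col, size_t row, size_t flags_stride)
{
    return 1 + col + (row / 4) * flags_stride;
}
```

For example, with flags_stride = 10, rows 4..7 of column 0 all map to entry
1 + 10 = 11, so a pass that walks a column 4 samples at a time reads a single
flag entry per step.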