Thanks to our macros that abstract SSE use, the functions can use
AVX2 when available (at compile time).
This brings an extra 23% speed improvement on bench_dwt in 64-bit builds
with AVX2 compared to SSE2.
* Use single-pass lifting inverse wavelet transform.
* For vertical pass, use SSE2 when available so as to process 8 columns
in parallel. This is the most beneficial improvement, since the
vertical pass involves a lot of cache thrashing.
With the bench_dwt utility with default arguments (16383x16383 image),
time goes from 4.064 s to 1.212 s.
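The column-batched traversal described above can be sketched roughly as follows. This is not the actual OpenJPEG code: it is plain scalar C rather than SSE2 intrinsics, it applies a single simplified "predict" lifting step rather than the full inverse transform, and the names vertical_predict_batch and COLS_PER_BATCH are made up for illustration. The point is only that each row access touches 8 adjacent samples instead of 1.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define COLS_PER_BATCH 8 /* hypothetical name; 8 columns as in the text */

/* Apply one simplified vertical "predict" lifting step to COLS_PER_BATCH
 * adjacent columns starting at col0. Rows are assumed interleaved: even
 * rows hold low-pass samples, odd rows hold high-pass samples. */
static void vertical_predict_batch(int32_t *data, size_t stride,
                                   size_t rows, size_t col0)
{
    size_t r, c;

    assert(col0 + COLS_PER_BATCH <= stride);

    for (r = 1; r < rows; r += 2) {
        const int32_t *above = data + (r - 1) * stride + col0;
        /* Symmetric extension when there is no row below. */
        const int32_t *below =
            (r + 1 < rows) ? data + (r + 1) * stride + col0 : above;
        int32_t *cur = data + r * stride + col0;

        /* Contiguous inner loop over the batch: cache friendly and easy
         * for a compiler (or hand-written SIMD) to vectorize. */
        for (c = 0; c < COLS_PER_BATCH; ++c) {
            cur[c] += (above[c] + below[c]) >> 1;
        }
    }
}
```

A caller would invoke this once per group of 8 columns, handling any leftover columns separately.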
Commit 29313eb5 introduced those flags to avoid issues with
-fsanitize=unsigned-integer-overflow.
However, it is better just to rewrite the loop so that such a condition
cannot occur.
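As an illustration of this kind of rewrite (the commit does not say which loop was changed, so sum_reverse below is a hypothetical example, not the OpenJPEG code): a reverse loop that relies on an unsigned index wrapping around to terminate can be restructured so that the index never goes below zero.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical example: sum a buffer from back to front. */
static uint32_t sum_reverse(const uint8_t *buf, size_t n)
{
    uint32_t total = 0;
    size_t i;

    assert(buf != NULL || n == 0);

    /* A loop such as `for (i = n - 1; i < n; --i)` terminates only because
     * i wraps around past zero, which the sanitizer reports. Counting from
     * n down to 1 and indexing with i - 1 never wraps. */
    for (i = n; i > 0; --i) {
        total += buf[i - 1];
    }
    return total;
}
```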
There are situations where, given a tile size, at a resolution level,
there are sub-bands with x0==x1 or y0==y1, that consequently don't have any
valid codeblocks, but the other sub-bands may be non-empty.
Given that we recycle the memory from one tile to another one, those
ghost codeblocks might be non-0 and thus candidates for packet inclusion.
This saves comparing the current pointer with the end of buffer pointer.
This results in at least a tiny speed improvement for raw decoding, and
smaller code size for MQC as well.
This kills the remains of the raw.h/.c files that were only used for
decoding. Encoding already uses the mqc structure.
Ported from Carl Hetherington's work (actually through Matthieu Darbois's port
on top of OpenJPEG 2.1.0).
Can reduce total encoding time by 10-15%.
WARNING: VSC mode is not implemented, which is a temporary regression
that must be fixed.
Decoding some valid .jp2 files like Sentinel2 datasets leads to warnings like:
No incltree created.
tgt_create tree->numnodes == 0, no tree created.
No imsbtree created.
tgt_create tree->numnodes == 0, no tree created.
Besides that, the image is correctly decoded. So there is no reason to emit
those warnings.
* test_tile_decoder: Fix potential buffer overflow (coverity)
CID 1190155 (#1 of 1): Unbounded source buffer (STRING_SIZE)
Using a pointer instead of buffer of fixed size avoids the limit
for the length of the input file name.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
* test_tile_encoder: Fix potential buffer overflow (coverity)
CID 1190154 (#1 of 1): Unbounded source buffer (STRING_SIZE)
Using a pointer instead of buffer of fixed size avoids the limit
for the length of the output file name. This implies that the length
can exceed 255, so the data type for variable len had to be fixed, too.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
* openjpip: Initialize data before returning it
This fixes an error reported by Coverity:
CID 1190143 (#1 of 1): Uninitialized scalar variable (UNINIT)
Signed-off-by: Stefan Weil <sw@weilnetz.de>
* query_parser: Fix potential out-of-bounds read (coverity)
CID 1190207 (#1 of 1): Out-of-bounds read (OVERRUN)
Variable i must be checked before testing query_param.box_type.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
* jpip_parser: Fix potential out-of-bounds read (coverity)
CID 1190206 (#1 of 1): Out-of-bounds read (OVERRUN)
Variable i must be checked before testing query_param.box_type.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
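The ordering issue behind the two out-of-bounds fixes above can be sketched like this. The function count_leading_nonzero and its parameters are hypothetical and do not reproduce the openjpip code; the sketch only shows that the index test must come first so that the short-circuit `&&` prevents the array read when the index is out of range.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: count entries before the first zero entry,
 * never reading past `capacity` elements. */
static size_t count_leading_nonzero(const unsigned char *items,
                                    size_t capacity)
{
    size_t i = 0;

    assert(items != NULL || capacity == 0);

    /* Bounds check first. Writing `items[i] != 0 && i < capacity` would
     * read items[capacity] when every entry is non-zero. */
    while (i < capacity && items[i] != 0) {
        ++i;
    }
    return i;
}
```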
Visual Studio 2015 does not pass regression tests with `__restrict`, so it is kept disabled for MSVC.
Need to check proper usage of OPJ_RESTRICT (if correct then there’s
probably a bug in vc14).
Closes #661
The definition of bit-fields with type OPJ_UINT32 caused compilation errors
on IBM iSeries, because OPJ_UINT32 is defined as uint32_t, and
uint32_t is defined as unsigned long in <stdint.h>. The definition of
bit-fields with an integer type of a specific size doesn't make sense
anyway.
The type casts which used this data type can be removed by changing
the signature of function swap16. As this function is called with
unsigned variables, this change is reasonable.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
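A minimal sketch of the portable form, under assumptions: the struct node_flags_t, its field names and widths, and the exact signature of swap16_example are illustrative only and are not taken from the OpenJPEG sources. Bit-fields declared as unsigned int are accepted by every C compiler, and a byte-swap helper that takes and returns an unsigned 16-bit value needs no casts at the call site.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical record: bit-fields use plain `unsigned int`, which the C
 * standard guarantees is allowed, instead of a fixed-width typedef. */
typedef struct {
    unsigned int is_leaf : 1;
    unsigned int level   : 5;
    unsigned int index   : 26;
} node_flags_t;

/* Hypothetical initializer; level must fit in its 5-bit field. */
static node_flags_t make_flags(unsigned int is_leaf, unsigned int level,
                               unsigned int index)
{
    node_flags_t f;
    assert(level < 32u);
    f.is_leaf = is_leaf & 1u;
    f.level = level;
    f.index = index & 0x3FFFFFFu;
    return f;
}

/* Assumed signature: unsigned in, unsigned out, so callers holding
 * unsigned variables do not need type casts. */
static uint16_t swap16_example(uint16_t v)
{
    return (uint16_t)((v >> 8) | (v << 8));
}
```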
openjpeg provides libopenjp2.pc, so the require statements must refer to
libopenjp2 instead of openjp2.
Fixes #594
Signed-off-by: Stefan Weil <sw@weilnetz.de>
By default, only the main thread is used. If opj_codec_set_threads() is not used,
but the OPJ_NUM_THREADS environment variable is set, its value will be
used to initialize the number of threads. The value can be either an integer
number, or "ALL_CPUS". If OPJ_NUM_THREADS is set and this function is called,
this function will override the behaviour of the environment variable.
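The precedence rules described above can be sketched as a small pure function. This is not the library's implementation: resolve_num_threads, its parameters, and the convention that a negative `requested` means "opj_codec_set_threads() was not called" and that 0 means "main thread only" are all assumptions made for illustration.

```c
#include <assert.h>
#include <limits.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper.
 * requested : value passed explicitly, or negative if never set
 * env_value : value of OPJ_NUM_THREADS, or NULL if unset
 * cpu_count : number of CPUs to use for "ALL_CPUS"
 * Returns the thread count to use; 0 means main thread only. */
static int resolve_num_threads(int requested, const char *env_value,
                               int cpu_count)
{
    char *end = NULL;
    long n;

    assert(cpu_count >= 1);

    if (requested >= 0) {
        return requested;            /* explicit call overrides the env */
    }
    if (env_value == NULL) {
        return 0;                    /* default: main thread only */
    }
    if (strcmp(env_value, "ALL_CPUS") == 0) {
        return cpu_count;
    }
    n = strtol(env_value, &end, 10);
    if (end == env_value || *end != '\0' || n < 0 || n > INT_MAX) {
        return 0;                    /* unparsable value: fall back */
    }
    return (int)n;
}
```

A caller would typically pass getenv("OPJ_NUM_THREADS") as env_value.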
Additional flag array such that colflags[1+0] is for state of col=0,row=0..3,
colflags[1+1] for col=1, row=0..3, colflags[1+flags_stride] for col=0,row=4..7, ...
This array avoids too much cache thrashing when processing by 4 vertical samples
as done in the various decoding steps.
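The layout described above implies the following index computation. The helper name colflags_index is hypothetical; the formula is derived only from the three examples given in the description (offset of 1, one entry per column, one row of entries per group of 4 sample rows).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: index of the colflags entry that holds the state
 * for sample (col, row), where each entry covers 4 vertical samples. */
static size_t colflags_index(size_t col, size_t row, size_t flags_stride)
{
    assert(flags_stride > 0);
    return 1 + col + (row / 4) * flags_stride;
}
```

With flags_stride = 10, samples (0,0) through (0,3) share entry 1, and samples (0,4) through (0,7) share entry 11.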
Add an opj_t1_dec_clnpass_step_only_if_flag_not_sig_visit() method that
does the job of opj_t1_dec_clnpass_step_only() assuming the conditions
are met. And use it in opj_t1_dec_clnpass(). The compiler generates
more efficient code.