Thanks to our macros that abstract SSE use, the functions can use
AVX2 when available (at compile time)
This brings an extra 23% speed improvement on bench_dwt in 64bit builds
with AVX2 compared to SSE2.
* Use single-pass lifting inverse wavelet transform.
* For vertical pass, use SSE2 when available so as to process 8 columns
in parallel. This is the most beneficial improvement, since the
vertical pass involves a lot of cache trashing.
With the bench_dwt utility with default arguments (16383x16383 image),
time goes from 4.064 s to 1.212 s.
Visual Studio 2015 does not pass regression tests with `__restrict` so kept disabled for MSVC.
Need to check proper usage of OPJ_RESTRICT (if correct then there’s
probably a bug in vc14)
Closes#661
When trying the GDAL OpenJPEG driver against openjpeg current master HEAD,
I get failures when trying to create .jp2 files. The driver uses
opj_write_tile() and in some tests numresolutions = 1.
In openjp2/dwt.c:410, l_data_size = opj_dwt_max_resolution( tilec->resolutions,tilec->numresolutions) * (OPJ_UINT32)sizeof(OPJ_INT32);
is called and returns l_data_size = 0. Now in git opj_malloc() has a special case
for 0 to return a NULL pointer whereas previously it relied on system malloc(),
which in my case didn't return NULL.
So only test the pointer value if l_data_size != 0. This makes the GDAL
autotest suite to pass again.
HP compiler warns:
cc: "dwt.c", line 798: warning 562: Redeclaration of "opj_v4dwt_decode"
with a different storage class specifier: "opj_v4dwt_decode" will have
internal linkage.
cc: "t2.c", line 1341: warning 562: Redeclaration of "opj_t2_init_seg"
with a different storage class specifier: "opj_t2_init_seg" will have
internal linkage.