Commit Graph

63 Commits

Author SHA1 Message Date
Even Rouault 47943daa15
Speed-up 9x7 IDWD by ~20%
"bench_dwt -I" time goes from 2.8s to 2.2s
2020-05-21 15:42:51 +02:00
Even Rouault adccbc8336
Irreversible decoding: partially revert previous commit, to fix failures in test suite 2020-05-20 20:31:28 +02:00
Even Rouault 3cd1305596
Irreversible compression/decompression DWT: use 1/K constant as per standard
The previous constant opj_c13318 was mysteriously equal to 2/K , and in
the DWT, we had to divide K and opj_c13318 by 2... The issue was that the
band->stepsize computation in tcd.c didn't take into account the log2gain of
the band.

The effect of this change is expected to be mostly equivalent to the previous
situation, except some difference in rounding. But it leads to a dramatic
reduction of the mean square error and peak error in the irreversible encoding
of issue141.tif !
2020-05-20 20:31:28 +02:00
Even Rouault e46e300de5
opj_dwt_encode_1_real(): avoid many bound comparisons, similarly to decoding side 2020-05-20 20:31:28 +02:00
Even Rouault 00cff6f5c0
Encoder: use floating-point operations for irreversible transformation 2020-05-20 20:31:28 +02:00
Even Rouault 99107d5e46
dwt.c: change sign of constants to match standard and compensate (no functional change) 2020-05-20 20:31:28 +02:00
Even Rouault 07d1f775a1
Add multithreaded support in the DWT encoder.
Update the bench_dwt utility to have a -decode/-encode switch

Measured performance gains for DWT encoder on a
Intel(R) Core(TM) i7-6700HQ CPU @ 2.60GHz (4 cores, hyper threaded)

Encoding time:
$ ./bin/bench_dwt -encode -num_threads 1
time for dwt_encode: total = 8.348 s, wallclock = 8.352 s

$ ./bin/bench_dwt -encode -num_threads 2
time for dwt_encode: total = 9.776 s, wallclock = 4.904 s

$ ./bin/bench_dwt -encode -num_threads 4
time for dwt_encode: total = 13.188 s, wallclock = 3.310 s

$ ./bin/bench_dwt -encode -num_threads 8
time for dwt_encode: total = 30.024 s, wallclock = 4.064 s

Scaling is probably limited by memory access patterns causing
memory access to be the bottleneck.
The slightly worse results with threads==8 than with thread==4
is due to hyperthreading being not appropriate here.
2020-05-20 20:30:21 +02:00
Nikola Forró 943db0f1c2 Fix several memory and resource leaks
Signed-off-by: Nikola Forró <nforro@redhat.com>
2018-10-31 16:16:22 +01:00
Stefan Weil 3d6ffaf3f3 Fix some typos in code comments and documentation
All typos were found by Codespell.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-09-05 20:01:10 +02:00
Even Rouault 9cba05762d Avoid index-out-of-bounds access when invoking opj_compress with -n 11 or higher. But not a proper fix itself (refs #493) 2017-09-20 00:43:54 +02:00
Even Rouault 003759a482 Fix null pointer dereference on partial tile decoding when they are empty. Fixes https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=3297 (master only) 2017-09-06 15:59:19 +02:00
Even Rouault 579b8937ea Replace uses of size_t by OPJ_SIZE_T 2017-09-04 17:35:52 +02:00
Even Rouault c1e0fba0c4 opj_v4dwt_decode_step1_sse(): rework a bit to improve code generation 2017-09-01 22:23:29 +02:00
Even Rouault 8a17be8945 opj_v4dwt_decode_step2_sse(): loop unroll 2017-09-01 16:31:08 +02:00
Even Rouault 83b5a168ec opj_dwt_decode_partial_97(): simplify/more efficient use of sparse arrays in vertical pass 2017-09-01 16:31:06 +02:00
Even Rouault 470f3ed416 opj_dwt_decode_partial_1_parallel(): add SSE2 optimization 2017-09-01 16:31:02 +02:00
Even Rouault 873004c615 Sub-tile decoding: speed up vertical pass in IDWT5x3 by processing 4 cols at a time 2017-09-01 16:31:00 +02:00
Even Rouault 82a43d8035 Optimize opj_dwt_decode_partial_1() when cas == 0 2017-09-01 16:30:54 +02:00
Even Rouault 98b9310361 Various changes to allow tile buffers of more than 4giga pixels
Untested though, since that means a tile buffer of at least 16 GB. So
there might be places where uint32 overflow on multiplication still occur...
2017-09-01 16:30:44 +02:00
Even Rouault d1299d9670 Fix compiler warning in release mode 2017-09-01 16:30:39 +02:00
Even Rouault eee5104a88 opj_dwt_decode_partial_tile(): avoid undefined behaviour in lifting operation by properly initializing working buffer 2017-09-01 16:30:32 +02:00
Even Rouault f9e9942330 Sub-tile decoding: only allocate tile component buffer of the needed dimension
Instead of being the full tile size.

* Use a sparse array mechanism to store code-blocks and intermediate stages of
  IDWT.
* IDWT, DC level shift and MCT stages are done just on that smaller array.
* Improve copy of tile component array to final image, by saving an intermediate
  buffer.
* For full-tile decoding at reduced resolution, only allocate the tile buffer to
  the reduced size, instead of the full-resolution size.
2017-09-01 16:30:29 +02:00
Even Rouault 6ce49bf5ae Fix undefined shift behaviour in opj_dwt_is_whole_tile_decoding(). Fixes https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=3255. Credit to OSS Fuzz 2017-09-01 10:26:18 +02:00
Even Rouault 04b70908a7 Use IDWT whole tile decoding if the area of interest equals to the image bounds, taking into account the reduced resolution factor 2017-08-29 11:40:53 +02:00
Even Rouault a55c024fc6 Subtile decoding: fix overflows in subband coordinate computation that cause later buffer overflow. Fixes https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=3115. Credit to OSS Fuzz. master only 2017-08-28 17:18:33 +02:00
Even Rouault bc71bd1219 opj_dwt_decode_partial_97(): perf improvement: limit copy of coefficients at end of horizontal pass to actual range of interest 2017-08-23 18:58:32 +02:00
Even Rouault 17a7ac42d5 Add comments for filter_width values 2017-08-21 12:25:38 +02:00
Even Rouault f87c5ef7eb Subtile decoding: only do 9x7 IDWT computations on relevant areas of tile-component buffer. 2017-08-20 22:02:41 +02:00
Even Rouault 5d40325056 Subtile decoding: only do 5x3 IDWT computations on relevant areas of tile-component buffer.
This lowers 'bin/opj_decompress -i ../MAPA.jp2 -o out.tif -d 0,0,256,256'
down to 0.860s
2017-08-18 15:08:51 +02:00
Even Rouault 60f8ddf577 Comment fix 2017-07-06 12:11:37 +02:00
Even Rouault 8fa405ee15 IDWT 5x3: fix bug in AVX2 implementation (#953, #957) 2017-06-30 00:03:05 +02:00
Even Rouault fd0dc535ad IDWT 5x3: generalize SSE2 version for AVX2
Thanks to our macros that abstract SSE use, the functions can use
AVX2 when available (at compile time)

This brings an extra 23% speed improvement on bench_dwt in 64bit builds
with AVX2 compared to SSE2.
2017-06-21 12:12:58 +02:00
Even Rouault f6e3475cc9 dwt.c: small cleanup 2017-06-21 01:07:56 +02:00
Even Rouault fa55b52d19 Improve performance of inverse DWT 5x3 (#953)
* Use single-pass lifting inverse wavelet transform.
* For vertical pass, use SSE2 when available so as to process 8 columns
  in parallel. This is the most beneficial improvement, since the
  vertical pass involves a lot of cache trashing.

With the bench_dwt utility with default arguments (16383x16383 image),
time goes from 4.064 s to 1.212 s.
2017-06-20 18:01:34 +02:00
Even Rouault 32b20b93e0 Fix astyle issue 2017-06-17 16:37:56 +02:00
Even Rouault cc07aec6c7 Fix warnings with recent GCC versions 2017-06-17 14:09:31 +02:00
Even Rouault 563bd8499e Reformat whole codebase with astyle.options (#128) 2017-05-09 20:46:20 +02:00
Matthieu Darbois 6e7616c83c Remove TODO for overflow check (#842)
The check was already done. It’s been simplified.
Reformat to get consistent style throughout the functions.
2016-09-15 23:51:34 +02:00
Matthieu Darbois 9a07ccb3d0 Add overflow checks for opj_aligned_malloc (#841)
See
https://pdfium.googlesource.com/pdfium/+/b20ab6c7acb3be1393461eb650ca8fa4660c937e/third_party/libopenjpeg20/0020-opj_aligned_malloc.patch
2016-09-15 01:57:53 +02:00
Matthieu Darbois 0954bc11e3 Fix some warnings (#838)
Fix warnings introduced by uclouvain/openjpeg#786
2016-09-14 00:12:43 +02:00
Even Rouault 48c16b2c19 Merge branch 'master' of https://github.com/uclouvain/openjpeg into tier1_optimizations_multithreading_2
Conflicts:
	src/lib/openjp2/t1.c
2016-09-08 10:30:09 +02:00
Matthieu Darbois 9f24b078c7 Change 'restrict' define to 'OPJ_RESTRICT' (#816)
Visual Studio 2015 does not pass regression tests with `__restrict` so kept disabled for MSVC.
Need to check proper usage of OPJ_RESTRICT (if correct then there’s
probably a bug  in vc14)

Closes #661
2016-09-06 00:49:53 +02:00
Even Rouault 7d3c7a345f Be robust to failed allocations of job structures 2016-05-26 23:51:32 +02:00
Even Rouault 57b216bb58 Use thread pool for DWT decoding 2016-05-25 21:02:07 +02:00
Even Rouault 6a1974d40d Add comment explaining bj is not use when l_data_size == 0 2016-01-09 14:30:48 +01:00
Even Rouault 87c0d7dc1e [git/2.1 regression] Fix opj_write_tile() failure when numresolutions=1
When trying the GDAL OpenJPEG driver against openjpeg current master HEAD,
I get failures when trying to create .jp2 files. The driver uses
opj_write_tile() and in some tests numresolutions = 1.

In openjp2/dwt.c:410, l_data_size = opj_dwt_max_resolution( tilec->resolutions,tilec->numresolutions) * (OPJ_UINT32)sizeof(OPJ_INT32);
is called and returns l_data_size = 0. Now in git opj_malloc() has a special case
for 0 to return a NULL pointer whereas previously it relied on system malloc(),
which in my case didn't return NULL.

So only test the pointer value if l_data_size != 0. This makes the GDAL
autotest suite to pass again.
2016-01-08 19:38:45 +01:00
Stephan Mühlstrasser a1fc83cc25 Fix HP compiler warning about redeclaration of function (#640)
HP compiler warns:
cc: "dwt.c", line 798: warning 562: Redeclaration of "opj_v4dwt_decode"
with a different storage class specifier: "opj_v4dwt_decode" will have
internal linkage.
cc: "t2.c", line 1341: warning 562: Redeclaration of "opj_t2_init_seg"
with a different storage class specifier: "opj_t2_init_seg" will have
internal linkage.
2015-10-19 12:14:27 +02:00
Matthieu Darbois 05b3afd28f Merge pull request #636 from uclouvain/opj_malloc-625
Update allocation functions
Fix #625 
Fix #624
Fix #635
2015-10-18 03:14:55 +02:00
mayeut 8034ffde8b Fix inconsistent behavior of malloc(0)
Update #635
Update #625
2015-10-17 02:55:09 +02:00
Mathieu Malaterre 372fead0d7 Remove the explicit restrict keyword
It would trigger a compiler error on xlc compiler.  Fixes #620
2015-10-13 21:07:11 +02:00