Commit Graph

75 Commits

Author SHA1 Message Date
Even Rouault 1462e9403f
Avoid integer overflows in DWT. Fixes https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=44544 2022-02-10 14:30:13 +01:00
Stefan Weil 667149ffa1 Fix some typos (found by codespell)
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-12-05 13:14:33 +01:00
Even Rouault badbd93af9
Avoid integer overflows in DWT. Fixes https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=11700 and https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=30646 2021-09-03 15:17:56 +02:00
Even Rouault 6daf5f3e1e
Encoder: avoid global buffer overflow on irreversible conversion when too many decomposition levels are specified (fixes #1286) 2020-11-30 23:29:06 +01:00
Even Rouault 1e931fdb36
Forward DWT 9-7: major speed up by vectorizing vertical pass
`bench_dwt -I -encode` times goes from 8.6s to 2.1s
2020-05-23 01:01:05 +02:00
Even Rouault a38e970fa5
Forward DWT 5-3: major speed up by vectorizing vertical pass
`bench_dwt -encode` times goes from 7.9s to 1.7s
2020-05-23 01:01:05 +02:00
Even Rouault e69fa09f60
Forward DWT: small code refactoring to allow future improvements for the vertical pass 2020-05-22 16:01:45 +02:00
Even Rouault 33d3d0de07
dwt.c: remove unused typedef 2020-05-22 15:06:29 +02:00
Even Rouault 97b384aecd
Forward DWT 5x3: performance improvements in horizontal pass, and modest in vertical pass 2020-05-22 15:03:40 +02:00
Even Rouault bd5f5ee7de
Forward DWT: small code refactoring to allow future improvements for the horizontal pass 2020-05-22 15:02:33 +02:00
Even Rouault 45a35223b7
Speed-up 9x7 IDWD by ~30% with OPJ_NUM_THREADS=2
"bench_dwt -I" time goes from 2.2s to 1.5s
2020-05-21 17:21:55 +02:00
Even Rouault 272b3e0fb2
Remove useless + 5U margin in opj_dwt_decode_tile_97()
Nothing in code analysis nor test suite shows that this margin is
needed.
It dates back to commit dbeebe72b9
where vector 9x7 decoding was introduced.
2020-05-21 15:42:51 +02:00
Even Rouault 47943daa15
Speed-up 9x7 IDWD by ~20%
"bench_dwt -I" time goes from 2.8s to 2.2s
2020-05-21 15:42:51 +02:00
Even Rouault adccbc8336
Irreversible decoding: partially revert previous commit, to fix failures in test suite 2020-05-20 20:31:28 +02:00
Even Rouault 3cd1305596
Irreversible compression/decompression DWT: use 1/K constant as per standard
The previous constant opj_c13318 was mysteriously equal to 2/K , and in
the DWT, we had to divide K and opj_c13318 by 2... The issue was that the
band->stepsize computation in tcd.c didn't take into account the log2gain of
the band.

The effect of this change is expected to be mostly equivalent to the previous
situation, except some difference in rounding. But it leads to a dramatic
reduction of the mean square error and peak error in the irreversible encoding
of issue141.tif !
2020-05-20 20:31:28 +02:00
Even Rouault e46e300de5
opj_dwt_encode_1_real(): avoid many bound comparisons, similarly to decoding side 2020-05-20 20:31:28 +02:00
Even Rouault 00cff6f5c0
Encoder: use floating-point operations for irreversible transformation 2020-05-20 20:31:28 +02:00
Even Rouault 99107d5e46
dwt.c: change sign of constants to match standard and compensate (no functional change) 2020-05-20 20:31:28 +02:00
Even Rouault 07d1f775a1
Add multithreaded support in the DWT encoder.
Update the bench_dwt utility to have a -decode/-encode switch

Measured performance gains for DWT encoder on a
Intel(R) Core(TM) i7-6700HQ CPU @ 2.60GHz (4 cores, hyper threaded)

Encoding time:
$ ./bin/bench_dwt -encode -num_threads 1
time for dwt_encode: total = 8.348 s, wallclock = 8.352 s

$ ./bin/bench_dwt -encode -num_threads 2
time for dwt_encode: total = 9.776 s, wallclock = 4.904 s

$ ./bin/bench_dwt -encode -num_threads 4
time for dwt_encode: total = 13.188 s, wallclock = 3.310 s

$ ./bin/bench_dwt -encode -num_threads 8
time for dwt_encode: total = 30.024 s, wallclock = 4.064 s

Scaling is probably limited by memory access patterns causing
memory access to be the bottleneck.
The slightly worse results with threads==8 than with thread==4
is due to hyperthreading being not appropriate here.
2020-05-20 20:30:21 +02:00
Nikola Forró 943db0f1c2 Fix several memory and resource leaks
Signed-off-by: Nikola Forró <nforro@redhat.com>
2018-10-31 16:16:22 +01:00
Stefan Weil 3d6ffaf3f3 Fix some typos in code comments and documentation
All typos were found by Codespell.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-09-05 20:01:10 +02:00
Even Rouault 9cba05762d Avoid index-out-of-bounds access when invoking opj_compress with -n 11 or higher. But not a proper fix itself (refs #493) 2017-09-20 00:43:54 +02:00
Even Rouault 003759a482 Fix null pointer dereference on partial tile decoding when they are empty. Fixes https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=3297 (master only) 2017-09-06 15:59:19 +02:00
Even Rouault 579b8937ea Replace uses of size_t by OPJ_SIZE_T 2017-09-04 17:35:52 +02:00
Even Rouault c1e0fba0c4 opj_v4dwt_decode_step1_sse(): rework a bit to improve code generation 2017-09-01 22:23:29 +02:00
Even Rouault 8a17be8945 opj_v4dwt_decode_step2_sse(): loop unroll 2017-09-01 16:31:08 +02:00
Even Rouault 83b5a168ec opj_dwt_decode_partial_97(): simplify/more efficient use of sparse arrays in vertical pass 2017-09-01 16:31:06 +02:00
Even Rouault 470f3ed416 opj_dwt_decode_partial_1_parallel(): add SSE2 optimization 2017-09-01 16:31:02 +02:00
Even Rouault 873004c615 Sub-tile decoding: speed up vertical pass in IDWT5x3 by processing 4 cols at a time 2017-09-01 16:31:00 +02:00
Even Rouault 82a43d8035 Optimize opj_dwt_decode_partial_1() when cas == 0 2017-09-01 16:30:54 +02:00
Even Rouault 98b9310361 Various changes to allow tile buffers of more than 4giga pixels
Untested though, since that means a tile buffer of at least 16 GB. So
there might be places where uint32 overflow on multiplication still occur...
2017-09-01 16:30:44 +02:00
Even Rouault d1299d9670 Fix compiler warning in release mode 2017-09-01 16:30:39 +02:00
Even Rouault eee5104a88 opj_dwt_decode_partial_tile(): avoid undefined behaviour in lifting operation by properly initializing working buffer 2017-09-01 16:30:32 +02:00
Even Rouault f9e9942330 Sub-tile decoding: only allocate tile component buffer of the needed dimension
Instead of being the full tile size.

* Use a sparse array mechanism to store code-blocks and intermediate stages of
  IDWT.
* IDWT, DC level shift and MCT stages are done just on that smaller array.
* Improve copy of tile component array to final image, by saving an intermediate
  buffer.
* For full-tile decoding at reduced resolution, only allocate the tile buffer to
  the reduced size, instead of the full-resolution size.
2017-09-01 16:30:29 +02:00
Even Rouault 6ce49bf5ae Fix undefined shift behaviour in opj_dwt_is_whole_tile_decoding(). Fixes https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=3255. Credit to OSS Fuzz 2017-09-01 10:26:18 +02:00
Even Rouault 04b70908a7 Use IDWT whole tile decoding if the area of interest equals to the image bounds, taking into account the reduced resolution factor 2017-08-29 11:40:53 +02:00
Even Rouault a55c024fc6 Subtile decoding: fix overflows in subband coordinate computation that cause later buffer overflow. Fixes https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=3115. Credit to OSS Fuzz. master only 2017-08-28 17:18:33 +02:00
Even Rouault bc71bd1219 opj_dwt_decode_partial_97(): perf improvement: limit copy of coefficients at end of horizontal pass to actual range of interest 2017-08-23 18:58:32 +02:00
Even Rouault 17a7ac42d5 Add comments for filter_width values 2017-08-21 12:25:38 +02:00
Even Rouault f87c5ef7eb Subtile decoding: only do 9x7 IDWT computations on relevant areas of tile-component buffer. 2017-08-20 22:02:41 +02:00
Even Rouault 5d40325056 Subtile decoding: only do 5x3 IDWT computations on relevant areas of tile-component buffer.
This lowers 'bin/opj_decompress -i ../MAPA.jp2 -o out.tif -d 0,0,256,256'
down to 0.860s
2017-08-18 15:08:51 +02:00
Even Rouault 60f8ddf577 Comment fix 2017-07-06 12:11:37 +02:00
Even Rouault 8fa405ee15 IDWT 5x3: fix bug in AVX2 implementation (#953, #957) 2017-06-30 00:03:05 +02:00
Even Rouault fd0dc535ad IDWT 5x3: generalize SSE2 version for AVX2
Thanks to our macros that abstract SSE use, the functions can use
AVX2 when available (at compile time)

This brings an extra 23% speed improvement on bench_dwt in 64bit builds
with AVX2 compared to SSE2.
2017-06-21 12:12:58 +02:00
Even Rouault f6e3475cc9 dwt.c: small cleanup 2017-06-21 01:07:56 +02:00
Even Rouault fa55b52d19 Improve performance of inverse DWT 5x3 (#953)
* Use single-pass lifting inverse wavelet transform.
* For vertical pass, use SSE2 when available so as to process 8 columns
  in parallel. This is the most beneficial improvement, since the
  vertical pass involves a lot of cache trashing.

With the bench_dwt utility with default arguments (16383x16383 image),
time goes from 4.064 s to 1.212 s.
2017-06-20 18:01:34 +02:00
Even Rouault 32b20b93e0 Fix astyle issue 2017-06-17 16:37:56 +02:00
Even Rouault cc07aec6c7 Fix warnings with recent GCC versions 2017-06-17 14:09:31 +02:00
Even Rouault 563bd8499e Reformat whole codebase with astyle.options (#128) 2017-05-09 20:46:20 +02:00
Matthieu Darbois 6e7616c83c Remove TODO for overflow check (#842)
The check was already done. It’s been simplified.
Reformat to get consistent style throughout the functions.
2016-09-15 23:51:34 +02:00