Even Rouault
1462e9403f
Avoid integer overflows in DWT. Fixes https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=44544
2022-02-10 14:30:13 +01:00
Stefan Weil
667149ffa1
Fix some typos (found by codespell)
...
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-12-05 13:14:33 +01:00
Even Rouault
badbd93af9
Avoid integer overflows in DWT. Fixes https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=11700 and https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=30646
2021-09-03 15:17:56 +02:00
Even Rouault
6daf5f3e1e
Encoder: avoid global buffer overflow on irreversible conversion when too many decomposition levels are specified ( fixes #1286 )
2020-11-30 23:29:06 +01:00
Even Rouault
1e931fdb36
Forward DWT 9-7: major speed up by vectorizing vertical pass
...
`bench_dwt -I -encode` time goes from 8.6s to 2.1s
2020-05-23 01:01:05 +02:00
Even Rouault
a38e970fa5
Forward DWT 5-3: major speed up by vectorizing vertical pass
...
`bench_dwt -encode` time goes from 7.9s to 1.7s
2020-05-23 01:01:05 +02:00
Even Rouault
e69fa09f60
Forward DWT: small code refactoring to allow future improvements for the vertical pass
2020-05-22 16:01:45 +02:00
Even Rouault
33d3d0de07
dwt.c: remove unused typedef
2020-05-22 15:06:29 +02:00
Even Rouault
97b384aecd
Forward DWT 5x3: performance improvements in horizontal pass, and modest in vertical pass
2020-05-22 15:03:40 +02:00
Even Rouault
bd5f5ee7de
Forward DWT: small code refactoring to allow future improvements for the horizontal pass
2020-05-22 15:02:33 +02:00
Even Rouault
45a35223b7
Speed-up 9x7 IDWT by ~30% with OPJ_NUM_THREADS=2
...
"bench_dwt -I" time goes from 2.2s to 1.5s
2020-05-21 17:21:55 +02:00
Even Rouault
272b3e0fb2
Remove useless + 5U margin in opj_dwt_decode_tile_97()
...
Neither code analysis nor the test suite shows that this margin is
needed.
It dates back to commit dbeebe72b9
where vector 9x7 decoding was introduced.
2020-05-21 15:42:51 +02:00
Even Rouault
47943daa15
Speed-up 9x7 IDWT by ~20%
...
"bench_dwt -I" time goes from 2.8s to 2.2s
2020-05-21 15:42:51 +02:00
Even Rouault
adccbc8336
Irreversible decoding: partially revert previous commit, to fix failures in test suite
2020-05-20 20:31:28 +02:00
Even Rouault
3cd1305596
Irreversible compression/decompression DWT: use 1/K constant as per standard
...
The previous constant opj_c13318 was mysteriously equal to 2/K , and in
the DWT, we had to divide K and opj_c13318 by 2... The issue was that the
band->stepsize computation in tcd.c didn't take into account the log2gain of
the band.
The effect of this change is expected to be mostly equivalent to the previous
situation, except some difference in rounding. But it leads to a dramatic
reduction of the mean square error and peak error in the irreversible encoding
of issue141.tif !
2020-05-20 20:31:28 +02:00
Even Rouault
e46e300de5
opj_dwt_encode_1_real(): avoid many bound comparisons, similarly to decoding side
2020-05-20 20:31:28 +02:00
Even Rouault
00cff6f5c0
Encoder: use floating-point operations for irreversible transformation
2020-05-20 20:31:28 +02:00
Even Rouault
99107d5e46
dwt.c: change sign of constants to match standard and compensate (no functional change)
2020-05-20 20:31:28 +02:00
Even Rouault
07d1f775a1
Add multithreaded support in the DWT encoder.
...
Update the bench_dwt utility to have a -decode/-encode switch
Measured performance gains for DWT encoder on an
Intel(R) Core(TM) i7-6700HQ CPU @ 2.60GHz (4 cores, hyper threaded)
Encoding time:
$ ./bin/bench_dwt -encode -num_threads 1
time for dwt_encode: total = 8.348 s, wallclock = 8.352 s
$ ./bin/bench_dwt -encode -num_threads 2
time for dwt_encode: total = 9.776 s, wallclock = 4.904 s
$ ./bin/bench_dwt -encode -num_threads 4
time for dwt_encode: total = 13.188 s, wallclock = 3.310 s
$ ./bin/bench_dwt -encode -num_threads 8
time for dwt_encode: total = 30.024 s, wallclock = 4.064 s
Scaling is probably limited by memory access patterns causing
memory access to be the bottleneck.
The slightly worse results with threads==8 than with threads==4
are due to hyperthreading not being appropriate here.
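As a rough reading of the wallclock figures above, speedup and per-thread efficiency relative to the single-thread run can be worked out as follows. This is only an illustrative sketch: the `speedups` helper is hypothetical and not part of bench_dwt, and the timings are copied from the measurements in this message.

```python
# Wallclock times in seconds, keyed by -num_threads, as reported above.
WALLCLOCK = {1: 8.352, 2: 4.904, 4: 3.310, 8: 4.064}


def speedups(wallclock):
    """Return {threads: (speedup, efficiency)} relative to the 1-thread run.

    speedup    = t(1 thread) / t(n threads)
    efficiency = speedup / n   (1.0 would be perfect linear scaling)
    """
    base = wallclock[1]
    return {n: (base / t, base / t / n) for n, t in wallclock.items()}


if __name__ == "__main__":
    for n, (s, e) in sorted(speedups(WALLCLOCK).items()):
        print(f"threads={n}: speedup={s:.2f}x, efficiency={e:.0%}")
```

With these inputs, the speedup peaks at 4 threads and drops again at 8, which matches the observation that hyperthreading does not help here.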
2020-05-20 20:30:21 +02:00
Nikola Forró
943db0f1c2
Fix several memory and resource leaks
...
Signed-off-by: Nikola Forró <nforro@redhat.com>
2018-10-31 16:16:22 +01:00
Stefan Weil
3d6ffaf3f3
Fix some typos in code comments and documentation
...
All typos were found by Codespell.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-09-05 20:01:10 +02:00
Even Rouault
9cba05762d
Avoid index-out-of-bounds access when invoking opj_compress with -n 11 or higher. But not a proper fix itself (refs #493 )
2017-09-20 00:43:54 +02:00
Even Rouault
003759a482
Fix null pointer dereference on partial tile decoding when they are empty. Fixes https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=3297 (master only)
2017-09-06 15:59:19 +02:00
Even Rouault
579b8937ea
Replace uses of size_t by OPJ_SIZE_T
2017-09-04 17:35:52 +02:00
Even Rouault
c1e0fba0c4
opj_v4dwt_decode_step1_sse(): rework a bit to improve code generation
2017-09-01 22:23:29 +02:00
Even Rouault
8a17be8945
opj_v4dwt_decode_step2_sse(): loop unroll
2017-09-01 16:31:08 +02:00
Even Rouault
83b5a168ec
opj_dwt_decode_partial_97(): simplify/more efficient use of sparse arrays in vertical pass
2017-09-01 16:31:06 +02:00
Even Rouault
470f3ed416
opj_dwt_decode_partial_1_parallel(): add SSE2 optimization
2017-09-01 16:31:02 +02:00
Even Rouault
873004c615
Sub-tile decoding: speed up vertical pass in IDWT5x3 by processing 4 cols at a time
2017-09-01 16:31:00 +02:00
Even Rouault
82a43d8035
Optimize opj_dwt_decode_partial_1() when cas == 0
2017-09-01 16:30:54 +02:00
Even Rouault
98b9310361
Various changes to allow tile buffers of more than 4 gigapixels
...
Untested though, since that means a tile buffer of at least 16 GB. So
there might be places where uint32 overflows on multiplication still occur...
2017-09-01 16:30:44 +02:00
Even Rouault
d1299d9670
Fix compiler warning in release mode
2017-09-01 16:30:39 +02:00
Even Rouault
eee5104a88
opj_dwt_decode_partial_tile(): avoid undefined behaviour in lifting operation by properly initializing working buffer
2017-09-01 16:30:32 +02:00
Even Rouault
f9e9942330
Sub-tile decoding: only allocate tile component buffer of the needed dimension
...
Instead of being the full tile size.
* Use a sparse array mechanism to store code-blocks and intermediate stages of
IDWT.
* IDWT, DC level shift and MCT stages are done just on that smaller array.
* Improve copy of tile component array to final image, by saving an intermediate
buffer.
* For full-tile decoding at reduced resolution, only allocate the tile buffer to
the reduced size, instead of the full-resolution size.
2017-09-01 16:30:29 +02:00
Even Rouault
6ce49bf5ae
Fix undefined shift behaviour in opj_dwt_is_whole_tile_decoding(). Fixes https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=3255 . Credit to OSS Fuzz
2017-09-01 10:26:18 +02:00
Even Rouault
04b70908a7
Use IDWT whole tile decoding if the area of interest equals the image bounds, taking into account the reduced resolution factor
2017-08-29 11:40:53 +02:00
Even Rouault
a55c024fc6
Subtile decoding: fix overflows in subband coordinate computation that cause later buffer overflow. Fixes https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=3115 . Credit to OSS Fuzz. master only
2017-08-28 17:18:33 +02:00
Even Rouault
bc71bd1219
opj_dwt_decode_partial_97(): perf improvement: limit copy of coefficients at end of horizontal pass to actual range of interest
2017-08-23 18:58:32 +02:00
Even Rouault
17a7ac42d5
Add comments for filter_width values
2017-08-21 12:25:38 +02:00
Even Rouault
f87c5ef7eb
Subtile decoding: only do 9x7 IDWT computations on relevant areas of tile-component buffer.
2017-08-20 22:02:41 +02:00
Even Rouault
5d40325056
Subtile decoding: only do 5x3 IDWT computations on relevant areas of tile-component buffer.
...
This lowers 'bin/opj_decompress -i ../MAPA.jp2 -o out.tif -d 0,0,256,256'
down to 0.860s
2017-08-18 15:08:51 +02:00
Even Rouault
60f8ddf577
Comment fix
2017-07-06 12:11:37 +02:00
Even Rouault
8fa405ee15
IDWT 5x3: fix bug in AVX2 implementation ( #953 , #957 )
2017-06-30 00:03:05 +02:00
Even Rouault
fd0dc535ad
IDWT 5x3: generalize SSE2 version for AVX2
...
Thanks to our macros that abstract SSE use, the functions can use
AVX2 when available (at compile time)
This brings an extra 23% speed improvement on bench_dwt in 64bit builds
with AVX2 compared to SSE2.
2017-06-21 12:12:58 +02:00
Even Rouault
f6e3475cc9
dwt.c: small cleanup
2017-06-21 01:07:56 +02:00
Even Rouault
fa55b52d19
Improve performance of inverse DWT 5x3 ( #953 )
...
* Use single-pass lifting inverse wavelet transform.
* For vertical pass, use SSE2 when available so as to process 8 columns
in parallel. This is the most beneficial improvement, since the
vertical pass involves a lot of cache thrashing.
With the bench_dwt utility with default arguments (16383x16383 image),
time goes from 4.064 s to 1.212 s.
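For readers unfamiliar with the lifting scheme mentioned above, here is a minimal 1-D sketch of the reversible 5x3 transform in its textbook two-step form (predict, then update), with symmetric extension at the borders and the first sample assumed to sit at an even position. It is not the single-pass or SSE2 code from this commit; `fwd53` and `inv53` are hypothetical names used only for illustration.

```python
def fwd53(x):
    """Forward reversible 5x3 lifting: returns (low-pass s, high-pass d)."""
    s, d = list(x[0::2]), list(x[1::2])
    if not d:
        return s, d
    # Predict step: each odd sample minus the floor-average of its even neighbours.
    for i in range(len(d)):
        right = s[i + 1] if i + 1 < len(s) else s[i]  # mirror at the right border
        d[i] -= (s[i] + right) >> 1
    # Update step: each even sample plus a rounded quarter of neighbouring details.
    for i in range(len(s)):
        dl = d[i - 1] if i > 0 else d[0]              # mirror at the left border
        dr = d[i] if i < len(d) else d[-1]            # mirror at the right border
        s[i] += (dl + dr + 2) >> 2
    return s, d


def inv53(s, d):
    """Inverse of fwd53: undo the update step, then the predict step."""
    s, d = list(s), list(d)
    if d:
        for i in range(len(s)):
            dl = d[i - 1] if i > 0 else d[0]
            dr = d[i] if i < len(d) else d[-1]
            s[i] -= (dl + dr + 2) >> 2
        for i in range(len(d)):
            right = s[i + 1] if i + 1 < len(s) else s[i]
            d[i] += (s[i] + right) >> 1
    # Re-interleave even (s) and odd (d) samples.
    out = [0] * (len(s) + len(d))
    out[0::2], out[1::2] = s, d
    return out
```

Because each lifting step adds or subtracts an integer computed from samples that the step itself leaves unchanged, the inverse recovers the input exactly, which is what makes the 5x3 path lossless.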
2017-06-20 18:01:34 +02:00
Even Rouault
32b20b93e0
Fix astyle issue
2017-06-17 16:37:56 +02:00
Even Rouault
cc07aec6c7
Fix warnings with recent GCC versions
2017-06-17 14:09:31 +02:00
Even Rouault
563bd8499e
Reformat whole codebase with astyle.options ( #128 )
2017-05-09 20:46:20 +02:00
Matthieu Darbois
6e7616c83c
Remove TODO for overflow check ( #842 )
...
The check was already done. It’s been simplified.
Reformat to get consistent style throughout the functions.
2016-09-15 23:51:34 +02:00