Commit Graph

60 Commits

Author SHA1 Message Date
Even Rouault e46e300de5
opj_dwt_encode_1_real(): avoid many bound comparisons, similarly to decoding side 2020-05-20 20:31:28 +02:00
Even Rouault 00cff6f5c0
Encoder: use floating-point operations for irreversible transformation 2020-05-20 20:31:28 +02:00
Even Rouault 99107d5e46
dwt.c: change sign of constants to match standard and compensate (no functional change) 2020-05-20 20:31:28 +02:00
Even Rouault 07d1f775a1
Add multithreaded support in the DWT encoder.
Update the bench_dwt utility to have a -decode/-encode switch

Measured performance gains for DWT encoder on a
Intel(R) Core(TM) i7-6700HQ CPU @ 2.60GHz (4 cores, hyper threaded)

Encoding time:
$ ./bin/bench_dwt -encode -num_threads 1
time for dwt_encode: total = 8.348 s, wallclock = 8.352 s

$ ./bin/bench_dwt -encode -num_threads 2
time for dwt_encode: total = 9.776 s, wallclock = 4.904 s

$ ./bin/bench_dwt -encode -num_threads 4
time for dwt_encode: total = 13.188 s, wallclock = 3.310 s

$ ./bin/bench_dwt -encode -num_threads 8
time for dwt_encode: total = 30.024 s, wallclock = 4.064 s

Scaling is probably limited by memory access patterns causing
memory access to be the bottleneck.
The slightly worse results with threads==8 than with thread==4
is due to hyperthreading being not appropriate here.
2020-05-20 20:30:21 +02:00
Nikola Forró 943db0f1c2 Fix several memory and resource leaks
Signed-off-by: Nikola Forró <nforro@redhat.com>
2018-10-31 16:16:22 +01:00
Stefan Weil 3d6ffaf3f3 Fix some typos in code comments and documentation
All typos were found by Codespell.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-09-05 20:01:10 +02:00
Even Rouault 9cba05762d Avoid index-out-of-bounds access when invoking opj_compress with -n 11 or higher. But not a proper fix itself (refs #493) 2017-09-20 00:43:54 +02:00
Even Rouault 003759a482 Fix null pointer dereference on partial tile decoding when they are empty. Fixes https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=3297 (master only) 2017-09-06 15:59:19 +02:00
Even Rouault 579b8937ea Replace uses of size_t by OPJ_SIZE_T 2017-09-04 17:35:52 +02:00
Even Rouault c1e0fba0c4 opj_v4dwt_decode_step1_sse(): rework a bit to improve code generation 2017-09-01 22:23:29 +02:00
Even Rouault 8a17be8945 opj_v4dwt_decode_step2_sse(): loop unroll 2017-09-01 16:31:08 +02:00
Even Rouault 83b5a168ec opj_dwt_decode_partial_97(): simplify/more efficient use of sparse arrays in vertical pass 2017-09-01 16:31:06 +02:00
Even Rouault 470f3ed416 opj_dwt_decode_partial_1_parallel(): add SSE2 optimization 2017-09-01 16:31:02 +02:00
Even Rouault 873004c615 Sub-tile decoding: speed up vertical pass in IDWT5x3 by processing 4 cols at a time 2017-09-01 16:31:00 +02:00
Even Rouault 82a43d8035 Optimize opj_dwt_decode_partial_1() when cas == 0 2017-09-01 16:30:54 +02:00
Even Rouault 98b9310361 Various changes to allow tile buffers of more than 4giga pixels
Untested though, since that means a tile buffer of at least 16 GB. So
there might be places where uint32 overflow on multiplication still occur...
2017-09-01 16:30:44 +02:00
Even Rouault d1299d9670 Fix compiler warning in release mode 2017-09-01 16:30:39 +02:00
Even Rouault eee5104a88 opj_dwt_decode_partial_tile(): avoid undefined behaviour in lifting operation by properly initializing working buffer 2017-09-01 16:30:32 +02:00
Even Rouault f9e9942330 Sub-tile decoding: only allocate tile component buffer of the needed dimension
Instead of being the full tile size.

* Use a sparse array mechanism to store code-blocks and intermediate stages of
  IDWT.
* IDWT, DC level shift and MCT stages are done just on that smaller array.
* Improve copy of tile component array to final image, by saving an intermediate
  buffer.
* For full-tile decoding at reduced resolution, only allocate the tile buffer to
  the reduced size, instead of the full-resolution size.
2017-09-01 16:30:29 +02:00
Even Rouault 6ce49bf5ae Fix undefined shift behaviour in opj_dwt_is_whole_tile_decoding(). Fixes https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=3255. Credit to OSS Fuzz 2017-09-01 10:26:18 +02:00
Even Rouault 04b70908a7 Use IDWT whole tile decoding if the area of interest equals to the image bounds, taking into account the reduced resolution factor 2017-08-29 11:40:53 +02:00
Even Rouault a55c024fc6 Subtile decoding: fix overflows in subband coordinate computation that cause later buffer overflow. Fixes https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=3115. Credit to OSS Fuzz. master only 2017-08-28 17:18:33 +02:00
Even Rouault bc71bd1219 opj_dwt_decode_partial_97(): perf improvement: limit copy of coefficients at end of horizontal pass to actual range of interest 2017-08-23 18:58:32 +02:00
Even Rouault 17a7ac42d5 Add comments for filter_width values 2017-08-21 12:25:38 +02:00
Even Rouault f87c5ef7eb Subtile decoding: only do 9x7 IDWT computations on relevant areas of tile-component buffer. 2017-08-20 22:02:41 +02:00
Even Rouault 5d40325056 Subtile decoding: only do 5x3 IDWT computations on relevant areas of tile-component buffer.
This lowers 'bin/opj_decompress -i ../MAPA.jp2 -o out.tif -d 0,0,256,256'
down to 0.860s
2017-08-18 15:08:51 +02:00
Even Rouault 60f8ddf577 Comment fix 2017-07-06 12:11:37 +02:00
Even Rouault 8fa405ee15 IDWT 5x3: fix bug in AVX2 implementation (#953, #957) 2017-06-30 00:03:05 +02:00
Even Rouault fd0dc535ad IDWT 5x3: generalize SSE2 version for AVX2
Thanks to our macros that abstract SSE use, the functions can use
AVX2 when available (at compile time)

This brings an extra 23% speed improvement on bench_dwt in 64bit builds
with AVX2 compared to SSE2.
2017-06-21 12:12:58 +02:00
Even Rouault f6e3475cc9 dwt.c: small cleanup 2017-06-21 01:07:56 +02:00
Even Rouault fa55b52d19 Improve performance of inverse DWT 5x3 (#953)
* Use single-pass lifting inverse wavelet transform.
* For vertical pass, use SSE2 when available so as to process 8 columns
  in parallel. This is the most beneficial improvement, since the
  vertical pass involves a lot of cache trashing.

With the bench_dwt utility with default arguments (16383x16383 image),
time goes from 4.064 s to 1.212 s.
2017-06-20 18:01:34 +02:00
Even Rouault 32b20b93e0 Fix astyle issue 2017-06-17 16:37:56 +02:00
Even Rouault cc07aec6c7 Fix warnings with recent GCC versions 2017-06-17 14:09:31 +02:00
Even Rouault 563bd8499e Reformat whole codebase with astyle.options (#128) 2017-05-09 20:46:20 +02:00
Matthieu Darbois 6e7616c83c Remove TODO for overflow check (#842)
The check was already done. It’s been simplified.
Reformat to get consistent style throughout the functions.
2016-09-15 23:51:34 +02:00
Matthieu Darbois 9a07ccb3d0 Add overflow checks for opj_aligned_malloc (#841)
See
https://pdfium.googlesource.com/pdfium/+/b20ab6c7acb3be1393461eb650ca8fa4660c937e/third_party/libopenjpeg20/0020-opj_aligned_malloc.patch
2016-09-15 01:57:53 +02:00
Matthieu Darbois 0954bc11e3 Fix some warnings (#838)
Fix warnings introduced by uclouvain/openjpeg#786
2016-09-14 00:12:43 +02:00
Even Rouault 48c16b2c19 Merge branch 'master' of https://github.com/uclouvain/openjpeg into tier1_optimizations_multithreading_2
Conflicts:
	src/lib/openjp2/t1.c
2016-09-08 10:30:09 +02:00
Matthieu Darbois 9f24b078c7 Change 'restrict' define to 'OPJ_RESTRICT' (#816)
Visual Studio 2015 does not pass regression tests with `__restrict` so kept disabled for MSVC.
Need to check proper usage of OPJ_RESTRICT (if correct then there’s
probably a bug  in vc14)

Closes #661
2016-09-06 00:49:53 +02:00
Even Rouault 7d3c7a345f Be robust to failed allocations of job structures 2016-05-26 23:51:32 +02:00
Even Rouault 57b216bb58 Use thread pool for DWT decoding 2016-05-25 21:02:07 +02:00
Even Rouault 6a1974d40d Add comment explaining bj is not use when l_data_size == 0 2016-01-09 14:30:48 +01:00
Even Rouault 87c0d7dc1e [git/2.1 regression] Fix opj_write_tile() failure when numresolutions=1
When trying the GDAL OpenJPEG driver against openjpeg current master HEAD,
I get failures when trying to create .jp2 files. The driver uses
opj_write_tile() and in some tests numresolutions = 1.

In openjp2/dwt.c:410, l_data_size = opj_dwt_max_resolution( tilec->resolutions,tilec->numresolutions) * (OPJ_UINT32)sizeof(OPJ_INT32);
is called and returns l_data_size = 0. Now in git opj_malloc() has a special case
for 0 to return a NULL pointer whereas previously it relied on system malloc(),
which in my case didn't return NULL.

So only test the pointer value if l_data_size != 0. This makes the GDAL
autotest suite to pass again.
2016-01-08 19:38:45 +01:00
Stephan Mühlstrasser a1fc83cc25 Fix HP compiler warning about redeclaration of function (#640)
HP compiler warns:
cc: "dwt.c", line 798: warning 562: Redeclaration of "opj_v4dwt_decode"
with a different storage class specifier: "opj_v4dwt_decode" will have
internal linkage.
cc: "t2.c", line 1341: warning 562: Redeclaration of "opj_t2_init_seg"
with a different storage class specifier: "opj_t2_init_seg" will have
internal linkage.
2015-10-19 12:14:27 +02:00
Matthieu Darbois 05b3afd28f Merge pull request #636 from uclouvain/opj_malloc-625
Update allocation functions
Fix #625 
Fix #624
Fix #635
2015-10-18 03:14:55 +02:00
mayeut 8034ffde8b Fix inconsistent behavior of malloc(0)
Update #635
Update #625
2015-10-17 02:55:09 +02:00
Mathieu Malaterre 372fead0d7 Remove the explicit restrict keyword
It would trigger a compiler error on xlc compiler.  Fixes #620
2015-10-13 21:07:11 +02:00
mayeut ae4799ad07 Add some missing static
Still needs to check j2k.c & jp2.c
Update uclouvain/openjpeg#243
2015-07-18 02:39:32 +02:00
Antonin Descampe 6868ee373e added memory allocation checks (fixes issue 355) 2014-09-19 10:26:35 +00:00
Antonin Descampe d19a4ab676 [trunk] updated copyright and added copyright notice required by ISO, in each file; updated AUTHORS, NEWS 2014-04-03 15:30:57 +00:00