openjpeg

Commit Graph

Author	SHA1	Message	Date
Even Rouault	1e931fdb36	Forward DWT 9-7: major speed up by vectorizing vertical pass. `bench_dwt -I -encode` time goes from 8.6s to 2.1s	2020-05-23 01:01:05 +02:00
Even Rouault	a38e970fa5	Forward DWT 5-3: major speed up by vectorizing vertical pass. `bench_dwt -encode` time goes from 7.9s to 1.7s	2020-05-23 01:01:05 +02:00
Even Rouault	e69fa09f60	Forward DWT: small code refactoring to allow future improvements for the vertical pass	2020-05-22 16:01:45 +02:00
Even Rouault	33d3d0de07	dwt.c: remove unused typedef	2020-05-22 15:06:29 +02:00
Even Rouault	97b384aecd	Forward DWT 5x3: performance improvements in horizontal pass, and modest in vertical pass	2020-05-22 15:03:40 +02:00
Even Rouault	bd5f5ee7de	Forward DWT: small code refactoring to allow future improvements for the horizontal pass	2020-05-22 15:02:33 +02:00
Even Rouault	45a35223b7	Speed-up 9x7 IDWT by ~30% with OPJ_NUM_THREADS=2. "bench_dwt -I" time goes from 2.2s to 1.5s	2020-05-21 17:21:55 +02:00
Even Rouault	272b3e0fb2	Remove useless + 5U margin in opj_dwt_decode_tile_97(). Neither code analysis nor the test suite shows that this margin is needed. It dates back to commit `dbeebe72b9` where vector 9x7 decoding was introduced.	2020-05-21 15:42:51 +02:00
Even Rouault	47943daa15	Speed-up 9x7 IDWT by ~20%. "bench_dwt -I" time goes from 2.8s to 2.2s	2020-05-21 15:42:51 +02:00
Even Rouault	adccbc8336	Irreversible decoding: partially revert previous commit, to fix failures in test suite	2020-05-20 20:31:28 +02:00
Even Rouault	3cd1305596	Irreversible compression/decompression DWT: use 1/K constant as per standard. The previous constant opj_c13318 was mysteriously equal to 2/K, and in the DWT, we had to divide K and opj_c13318 by 2... The issue was that the band->stepsize computation in tcd.c didn't take into account the log2gain of the band. The effect of this change is expected to be mostly equivalent to the previous situation, except for some difference in rounding. But it leads to a dramatic reduction of the mean square error and peak error in the irreversible encoding of issue141.tif!	2020-05-20 20:31:28 +02:00
Even Rouault	e46e300de5	opj_dwt_encode_1_real(): avoid many bound comparisons, similarly to decoding side	2020-05-20 20:31:28 +02:00
Even Rouault	00cff6f5c0	Encoder: use floating-point operations for irreversible transformation	2020-05-20 20:31:28 +02:00
Even Rouault	99107d5e46	dwt.c: change sign of constants to match standard and compensate (no functional change)	2020-05-20 20:31:28 +02:00
Even Rouault	07d1f775a1	Add multithreaded support in the DWT encoder. Update the bench_dwt utility to have a -decode/-encode switch. Measured performance gains for the DWT encoder on an Intel(R) Core(TM) i7-6700HQ CPU @ 2.60GHz (4 cores, hyper-threaded). Encoding time: `$ ./bin/bench_dwt -encode -num_threads 1` gives `time for dwt_encode: total = 8.348 s, wallclock = 8.352 s`; `$ ./bin/bench_dwt -encode -num_threads 2` gives `time for dwt_encode: total = 9.776 s, wallclock = 4.904 s`; `$ ./bin/bench_dwt -encode -num_threads 4` gives `time for dwt_encode: total = 13.188 s, wallclock = 3.310 s`; `$ ./bin/bench_dwt -encode -num_threads 8` gives `time for dwt_encode: total = 30.024 s, wallclock = 4.064 s`. Scaling is probably limited by memory access patterns causing memory access to be the bottleneck. The slightly worse results with threads==8 than with threads==4 are due to hyperthreading not being appropriate here.	2020-05-20 20:30:21 +02:00
Nikola Forró	943db0f1c2	Fix several memory and resource leaks Signed-off-by: Nikola Forró <nforro@redhat.com>	2018-10-31 16:16:22 +01:00
Stefan Weil	3d6ffaf3f3	Fix some typos in code comments and documentation All typos were found by Codespell. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2018-09-05 20:01:10 +02:00
Even Rouault	9cba05762d	Avoid index-out-of-bounds access when invoking opj_compress with -n 11 or higher. But not a proper fix itself (refs #493 )	2017-09-20 00:43:54 +02:00
Even Rouault	003759a482	Fix null pointer dereference on partial tile decoding when they are empty. Fixes https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=3297 (master only)	2017-09-06 15:59:19 +02:00
Even Rouault	579b8937ea	Replace uses of size_t by OPJ_SIZE_T	2017-09-04 17:35:52 +02:00
Even Rouault	c1e0fba0c4	opj_v4dwt_decode_step1_sse(): rework a bit to improve code generation	2017-09-01 22:23:29 +02:00
Even Rouault	8a17be8945	opj_v4dwt_decode_step2_sse(): loop unroll	2017-09-01 16:31:08 +02:00
Even Rouault	83b5a168ec	opj_dwt_decode_partial_97(): simplify/more efficient use of sparse arrays in vertical pass	2017-09-01 16:31:06 +02:00
Even Rouault	470f3ed416	opj_dwt_decode_partial_1_parallel(): add SSE2 optimization	2017-09-01 16:31:02 +02:00
Even Rouault	873004c615	Sub-tile decoding: speed up vertical pass in IDWT5x3 by processing 4 cols at a time	2017-09-01 16:31:00 +02:00
Even Rouault	82a43d8035	Optimize opj_dwt_decode_partial_1() when cas == 0	2017-09-01 16:30:54 +02:00
Even Rouault	98b9310361	Various changes to allow tile buffers of more than 4 gigapixels. Untested though, since that means a tile buffer of at least 16 GB. So there might be places where uint32 overflow on multiplication still occurs...	2017-09-01 16:30:44 +02:00
Even Rouault	d1299d9670	Fix compiler warning in release mode	2017-09-01 16:30:39 +02:00
Even Rouault	eee5104a88	opj_dwt_decode_partial_tile(): avoid undefined behaviour in lifting operation by properly initializing working buffer	2017-09-01 16:30:32 +02:00
Even Rouault	f9e9942330	Sub-tile decoding: only allocate a tile component buffer of the needed dimension, instead of the full tile size. * Use a sparse array mechanism to store code-blocks and intermediate stages of IDWT. * IDWT, DC level shift and MCT stages are done just on that smaller array. * Improve copy of tile component array to final image, by saving an intermediate buffer. * For full-tile decoding at reduced resolution, only allocate the tile buffer to the reduced size, instead of the full-resolution size.	2017-09-01 16:30:29 +02:00
Even Rouault	6ce49bf5ae	Fix undefined shift behaviour in opj_dwt_is_whole_tile_decoding(). Fixes https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=3255 . Credit to OSS Fuzz	2017-09-01 10:26:18 +02:00
Even Rouault	04b70908a7	Use IDWT whole tile decoding if the area of interest equals the image bounds, taking into account the reduced resolution factor	2017-08-29 11:40:53 +02:00
Even Rouault	a55c024fc6	Subtile decoding: fix overflows in subband coordinate computation that cause later buffer overflow. Fixes https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=3115 . Credit to OSS Fuzz. master only	2017-08-28 17:18:33 +02:00
Even Rouault	bc71bd1219	opj_dwt_decode_partial_97(): perf improvement: limit copy of coefficients at end of horizontal pass to actual range of interest	2017-08-23 18:58:32 +02:00
Even Rouault	17a7ac42d5	Add comments for filter_width values	2017-08-21 12:25:38 +02:00
Even Rouault	f87c5ef7eb	Subtile decoding: only do 9x7 IDWT computations on relevant areas of tile-component buffer.	2017-08-20 22:02:41 +02:00
Even Rouault	5d40325056	Subtile decoding: only do 5x3 IDWT computations on relevant areas of tile-component buffer. This lowers 'bin/opj_decompress -i ../MAPA.jp2 -o out.tif -d 0,0,256,256' down to 0.860s	2017-08-18 15:08:51 +02:00
Even Rouault	60f8ddf577	Comment fix	2017-07-06 12:11:37 +02:00
Even Rouault	8fa405ee15	IDWT 5x3: fix bug in AVX2 implementation (#953 , #957 )	2017-06-30 00:03:05 +02:00
Even Rouault	fd0dc535ad	IDWT 5x3: generalize SSE2 version for AVX2. Thanks to our macros that abstract SSE use, the functions can use AVX2 when available (at compile time). This brings an extra 23% speed improvement on bench_dwt in 64-bit builds with AVX2 compared to SSE2.	2017-06-21 12:12:58 +02:00
Even Rouault	f6e3475cc9	dwt.c: small cleanup	2017-06-21 01:07:56 +02:00
Even Rouault	fa55b52d19	Improve performance of inverse DWT 5x3 (#953). * Use single-pass lifting inverse wavelet transform. * For vertical pass, use SSE2 when available so as to process 8 columns in parallel. This is the most beneficial improvement, since the vertical pass involves a lot of cache thrashing. With the bench_dwt utility with default arguments (16383x16383 image), time goes from 4.064 s to 1.212 s.	2017-06-20 18:01:34 +02:00
Even Rouault	32b20b93e0	Fix astyle issue	2017-06-17 16:37:56 +02:00
Even Rouault	cc07aec6c7	Fix warnings with recent GCC versions	2017-06-17 14:09:31 +02:00
Even Rouault	563bd8499e	Reformat whole codebase with astyle.options (#128 )	2017-05-09 20:46:20 +02:00
Matthieu Darbois	6e7616c83c	Remove TODO for overflow check (#842 ) The check was already done. It’s been simplified. Reformat to get consistent style throughout the functions.	2016-09-15 23:51:34 +02:00
Matthieu Darbois	9a07ccb3d0	Add overflow checks for opj_aligned_malloc (#841 ) See https://pdfium.googlesource.com/pdfium/+/b20ab6c7acb3be1393461eb650ca8fa4660c937e/third_party/libopenjpeg20/0020-opj_aligned_malloc.patch	2016-09-15 01:57:53 +02:00
Matthieu Darbois	0954bc11e3	Fix some warnings (#838 ) Fix warnings introduced by uclouvain/openjpeg#786	2016-09-14 00:12:43 +02:00
Even Rouault	48c16b2c19	Merge branch 'master' of https://github.com/uclouvain/openjpeg into tier1_optimizations_multithreading_2 Conflicts: src/lib/openjp2/t1.c	2016-09-08 10:30:09 +02:00
Matthieu Darbois	9f24b078c7	Change 'restrict' define to 'OPJ_RESTRICT' (#816 ) Visual Studio 2015 does not pass regression tests with `__restrict` so kept disabled for MSVC. Need to check proper usage of OPJ_RESTRICT (if correct then there’s probably a bug in vc14) Closes #661	2016-09-06 00:49:53 +02:00

71 Commits