Commit Graph

91 Commits

Author SHA1 Message Date
Even Rouault 0ae3cba340 Allow several repeated calls to opj_set_decode_area() and opj_decode() for single-tiled images
* Only works for single-tiled images; other cases will error out cleanly, as
  they currently do.
* Saves re-reading the codestream for the tile, and re-uses code-blocks of the
  previous decoding pass.
* Future improvements might involve adapting opj_decompress, and the image
  writing logic, to use this strategy.
2017-09-01 16:30:48 +02:00
Even Rouault 98b9310361 Various changes to allow tile buffers of more than 4 gigapixels
Untested, though, since that means a tile buffer of at least 16 GB, so
there might still be places where uint32 multiplications overflow...
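Beyond 2^32 pixels, a 32-bit pixel count can no longer hold width times height, and at 4 bytes per 32-bit sample that is already 16 GB. A minimal sketch of the kind of overflow-safe size computation involved, assuming the product is formed in `size_t` and checked before each multiplication (the helper name `tile_buffer_bytes` is hypothetical, not an OpenJPEG function):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: computes width * height * elem_size in size_t.
 * Returns 0 (and leaves *out untouched) if any multiplication would
 * overflow, 1 otherwise. */
static int tile_buffer_bytes(uint32_t width, uint32_t height,
                             size_t elem_size, size_t *out)
{
    size_t w = (size_t)width;
    size_t h = (size_t)height;

    /* Check w * h before computing it. */
    if (h != 0 && w > SIZE_MAX / h) {
        return 0;
    }
    size_t pixels = w * h;

    /* Check pixels * elem_size before computing it. */
    if (elem_size != 0 && pixels > SIZE_MAX / elem_size) {
        return 0;
    }
    *out = pixels * elem_size;
    return 1;
}
```

On a 32-bit `size_t` this simply refuses such tiles instead of silently wrapping, which is the failure mode the commit message warns about.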
2017-09-01 16:30:44 +02:00
Even Rouault f9e9942330 Sub-tile decoding: only allocate tile component buffer of the needed dimension
Instead of being the full tile size.

* Use a sparse array mechanism to store code-blocks and intermediate stages of
  IDWT.
* IDWT, DC level shift and MCT stages are done just on that smaller array.
* Improve copy of tile component array to final image, by saving an intermediate
  buffer.
* For full-tile decoding at reduced resolution, only allocate the tile buffer to
  the reduced size, instead of the full-resolution size.
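The commit does not show the sparse array itself. A minimal sketch of the general idea, assuming a layout of fixed-size blocks that are allocated lazily on first write (all names here are hypothetical and do not reproduce OpenJPEG's own sparse array API):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical sparse 2D int32 array split into block_w x block_h blocks.
 * Untouched blocks stay NULL, so memory tracks the decoded window. */
typedef struct {
    uint32_t width, height;
    uint32_t block_w, block_h;
    uint32_t blocks_x, blocks_y;
    int32_t **blocks;            /* blocks_x * blocks_y pointers */
} sparse_array;

static sparse_array *sparse_create(uint32_t width, uint32_t height,
                                   uint32_t block_w, uint32_t block_h)
{
    if (block_w == 0 || block_h == 0) return NULL;
    sparse_array *sa = calloc(1, sizeof(*sa));
    if (!sa) return NULL;
    sa->width = width;
    sa->height = height;
    sa->block_w = block_w;
    sa->block_h = block_h;
    /* Round up without risking overflow in width + block_w - 1. */
    sa->blocks_x = width / block_w + (width % block_w != 0);
    sa->blocks_y = height / block_h + (height % block_h != 0);
    sa->blocks = calloc((size_t)sa->blocks_x * sa->blocks_y, sizeof(int32_t *));
    if (!sa->blocks) { free(sa); return NULL; }
    return sa;
}

static size_t block_index(const sparse_array *sa, uint32_t x, uint32_t y)
{
    return (size_t)(y / sa->block_h) * sa->blocks_x + x / sa->block_w;
}

static size_t offset_in_block(const sparse_array *sa, uint32_t x, uint32_t y)
{
    return (size_t)(y % sa->block_h) * sa->block_w + x % sa->block_w;
}

/* Write one value, allocating the containing block (zero-filled) on demand. */
static int sparse_set(sparse_array *sa, uint32_t x, uint32_t y, int32_t v)
{
    if (x >= sa->width || y >= sa->height) return 0;
    size_t idx = block_index(sa, x, y);
    if (!sa->blocks[idx]) {
        sa->blocks[idx] = calloc((size_t)sa->block_w * sa->block_h, sizeof(int32_t));
        if (!sa->blocks[idx]) return 0;
    }
    sa->blocks[idx][offset_in_block(sa, x, y)] = v;
    return 1;
}

/* Read one value; samples in never-written blocks read as zero. */
static int32_t sparse_get(const sparse_array *sa, uint32_t x, uint32_t y)
{
    if (x >= sa->width || y >= sa->height) return 0;
    const int32_t *blk = sa->blocks[block_index(sa, x, y)];
    return blk ? blk[offset_in_block(sa, x, y)] : 0;
}

static size_t sparse_allocated_blocks(const sparse_array *sa)
{
    size_t n = (size_t)sa->blocks_x * sa->blocks_y, count = 0;
    for (size_t i = 0; i < n; i++) count += sa->blocks[i] != NULL;
    return count;
}

static void sparse_free(sparse_array *sa)
{
    if (!sa) return;
    size_t n = (size_t)sa->blocks_x * sa->blocks_y;
    for (size_t i = 0; i < n; i++) free(sa->blocks[i]);
    free(sa->blocks);
    free(sa);
}
```

With 64x64 blocks, writing a handful of samples inside a 1000x1000 tile allocates only the blocks those samples fall in, rather than the whole tile.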
2017-09-01 16:30:29 +02:00
Even Rouault 84bbb4a874 opj_t1_allocate_buffers(): remove useless overflow checks 2017-09-01 10:26:53 +02:00
Even Rouault 5d40325056 Subtile decoding: only do 5x3 IDWT computations on relevant areas of tile-component buffer.
This lowers the run time of 'bin/opj_decompress -i ../MAPA.jp2 -o out.tif -d 0,0,256,256'
to 0.860s.
2017-08-18 15:08:51 +02:00
Even Rouault 4b0bfbfabc Zero-initialize tile buffer regions of skipped code-blocks, so as to make Valgrind happy 2017-08-17 19:05:54 +02:00
Even Rouault fe338a057c Sub-tile decoding: only decode precincts and codeblocks that intersect the window specified in opj_set_decode_area() 2017-08-17 19:05:54 +02:00
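Deciding which precincts and code-blocks to decode comes down to a rectangle intersection test. A minimal sketch, assuming half-open `[x0, x1) x [y0, y1)` bounds as JPEG 2000 uses for tiles, bands and code-blocks; the type and function names are hypothetical, and the sketch ignores two things a real decoder must handle: mapping the window into each sub-band's coordinate system, and enlarging it by the wavelet filter support.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical half-open rectangle [x0, x1) x [y0, y1). */
typedef struct {
    uint32_t x0, y0, x1, y1;
} rect_u32;

/* Returns 1 if the two rectangles share at least one sample.
 * Rectangles that merely touch along an edge do not intersect. */
static int rect_intersects(rect_u32 a, rect_u32 b)
{
    return a.x0 < b.x1 && b.x0 < a.x1 &&
           a.y0 < b.y1 && b.y0 < a.y1;
}
```

A code-block for which this returns 0 against the (mapped) decode window can be skipped entirely.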
Even Rouault 8e6c371e66 opj_t1_encode_cblk(): avoid uint32 overflow when numbps = 0 (which is well-defined behaviour and is properly handled here, but it is better to avoid it so that real issues can be detected) 2017-08-16 18:29:59 +02:00
Even Rouault 92114694a4 Slight improvement in management of code block chunks
Instead of having the chunk array at the segment level, we can move it down to
the codeblock itself, since segments are filled in sequential order.
This limits the number of memory allocations and slightly decreases memory usage.
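One way to picture a single per-codeblock chunk array is sketched below. The structures are hypothetical, not OpenJPEG's, and the doubling growth policy is an assumption of this sketch; it shows why one growable array per codeblock needs far fewer allocations than one array per segment.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical chunk: a view into the tile codestream, not a copy. */
typedef struct {
    const uint8_t *data;
    size_t len;
} chunk;

/* Hypothetical code-block holding one chunk array for all its segments.
 * Since segments are filled in order, a segment's chunks form a
 * contiguous run of this array. */
typedef struct {
    chunk *chunks;
    size_t num_chunks;
    size_t max_chunks;   /* allocated capacity */
} codeblock;

/* Append a chunk, doubling capacity when full, so the number of
 * realloc() calls grows only logarithmically with the chunk count. */
static int codeblock_add_chunk(codeblock *cb, const uint8_t *data, size_t len)
{
    if (cb->num_chunks == cb->max_chunks) {
        size_t new_max = cb->max_chunks ? cb->max_chunks * 2 : 4;
        chunk *tmp = realloc(cb->chunks, new_max * sizeof(*tmp));
        if (!tmp) return 0;   /* old array is still valid */
        cb->chunks = tmp;
        cb->max_chunks = new_max;
    }
    cb->chunks[cb->num_chunks].data = data;
    cb->chunks[cb->num_chunks].len = len;
    cb->num_chunks++;
    return 1;
}
```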

On MAPA_005.jp2

n4: 1871312549 (heap allocation functions) malloc/new/new[], --alloc-fns, etc.
 n1: 1610689344 0x4E781E7: opj_aligned_malloc (opj_malloc.c:61)
  n1: 1610689344 0x4E71D1B: opj_alloc_tile_component_data (tcd.c:676)
   n1: 1610689344 0x4E726CF: opj_tcd_init_decode_tile (tcd.c:816)
    n1: 1610689344 0x4E4BE39: opj_j2k_read_tile_header (j2k.c:8617)
     n1: 1610689344 0x4E4C902: opj_j2k_decode_tiles (j2k.c:10348)
      n1: 1610689344 0x4E4E3CE: opj_j2k_decode (j2k.c:7846)
       n1: 1610689344 0x4E53002: opj_jp2_decode (jp2.c:1564)
        n0: 1610689344 0x40374E: main (opj_decompress.c:1459)
 n1: 219232541 0x4E4BC50: opj_j2k_read_tile_header (j2k.c:4683)
  n1: 219232541 0x4E4C902: opj_j2k_decode_tiles (j2k.c:10348)
   n1: 219232541 0x4E4E3CE: opj_j2k_decode (j2k.c:7846)
    n1: 219232541 0x4E53002: opj_jp2_decode (jp2.c:1564)
     n0: 219232541 0x40374E: main (opj_decompress.c:1459)
 n1: 23893200 0x4E72735: opj_tcd_init_decode_tile (tcd.c:1225)
  n1: 23893200 0x4E4BE39: opj_j2k_read_tile_header (j2k.c:8617)
   n1: 23893200 0x4E4C902: opj_j2k_decode_tiles (j2k.c:10348)
    n1: 23893200 0x4E4E3CE: opj_j2k_decode (j2k.c:7846)
     n1: 23893200 0x4E53002: opj_jp2_decode (jp2.c:1564)
      n0: 23893200 0x40374E: main (opj_decompress.c:1459)
 n0: 17497464 in 52 places, all below massif's threshold (1.00%)
2017-08-07 18:32:52 +02:00
Even Rouault ca34d13e76 Decoding: do not allocate memory for the codestream of each codeblock
Currently we allocate at least 8192 bytes for each codeblock, and copy
the relevant parts of the codestream in that per-codeblock buffer as we
decode packets.
As the whole codestream for the tile is ingested in memory and alive
during the decoding, we can directly point to it instead of copying. But
to do that, we need an intermediate concept, a 'chunk' of code-stream segment,
given that segments may be made of data at different places in the code-stream
when quality layers are used.

With that change, the memory used to decode MAPA_005.jp2 goes down from
2.7 GB (after the previous improvement) to 1.9 GB.
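The chunk idea can be sketched as a `(pointer, length)` view into the tile codestream, which stays alive during decoding. When quality layers scatter one segment over several places, the pieces only need to be joined when the code-block is actually decoded. This is a hedged illustration with hypothetical names, not OpenJPEG's code:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical chunk: points into the codestream, nothing is copied
 * while packets are being read. */
typedef struct {
    const uint8_t *data;
    size_t len;
} chunk;

/* Return a contiguous view of the segment data. With a single chunk this
 * is the codestream pointer itself (zero copy). With several chunks, they
 * are concatenated into *scratch, which the caller must free().
 * Returns NULL on allocation failure. */
static const uint8_t *segment_data(const chunk *chunks, size_t n,
                                   size_t *total, uint8_t **scratch)
{
    size_t len = 0;
    for (size_t i = 0; i < n; i++) len += chunks[i].len;
    *total = len;
    *scratch = NULL;
    if (n == 1) return chunks[0].data;   /* zero-copy path */

    uint8_t *buf = malloc(len ? len : 1);
    if (!buf) return NULL;
    size_t off = 0;
    for (size_t i = 0; i < n; i++) {
        memcpy(buf + off, chunks[i].data, chunks[i].len);
        off += chunks[i].len;
    }
    *scratch = buf;
    return buf;
}
```

The common single-layer case never allocates, which is where the savings over a fixed 8192-byte buffer per code-block come from.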

New profile:

n4: 1885648469 (heap allocation functions) malloc/new/new[], --alloc-fns, etc.
 n1: 1610689344 0x4E78287: opj_aligned_malloc (opj_malloc.c:61)
  n1: 1610689344 0x4E71D7B: opj_alloc_tile_component_data (tcd.c:676)
   n1: 1610689344 0x4E7272C: opj_tcd_init_decode_tile (tcd.c:816)
    n1: 1610689344 0x4E4BDD9: opj_j2k_read_tile_header (j2k.c:8618)
     n1: 1610689344 0x4E4C8A2: opj_j2k_decode_tiles (j2k.c:10349)
      n1: 1610689344 0x4E4E36E: opj_j2k_decode (j2k.c:7847)
       n1: 1610689344 0x4E52FA2: opj_jp2_decode (jp2.c:1564)
        n0: 1610689344 0x40374E: main (opj_decompress.c:1459)
 n1: 219232541 0x4E4BBF0: opj_j2k_read_tile_header (j2k.c:4685)
  n1: 219232541 0x4E4C8A2: opj_j2k_decode_tiles (j2k.c:10349)
   n1: 219232541 0x4E4E36E: opj_j2k_decode (j2k.c:7847)
    n1: 219232541 0x4E52FA2: opj_jp2_decode (jp2.c:1564)
     n0: 219232541 0x40374E: main (opj_decompress.c:1459)
 n1: 39822000 0x4E727A9: opj_tcd_init_decode_tile (tcd.c:1219)
  n1: 39822000 0x4E4BDD9: opj_j2k_read_tile_header (j2k.c:8618)
   n1: 39822000 0x4E4C8A2: opj_j2k_decode_tiles (j2k.c:10349)
    n1: 39822000 0x4E4E36E: opj_j2k_decode (j2k.c:7847)
     n1: 39822000 0x4E52FA2: opj_jp2_decode (jp2.c:1564)
      n0: 39822000 0x40374E: main (opj_decompress.c:1459)
 n0: 15904584 in 52 places, all below massif's threshold (1.00%)
2017-08-07 18:32:52 +02:00
Even Rouault 83342f2aaf Fix Doxygen warnings (patch derived from Winfried's doxygen-dif.txt.zip, #849) 2017-07-30 18:18:59 +02:00
Even Rouault db9ef99f6d opj_t1_decode_cblk(): avoid undefined shift behaviour. Fixes https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=2487. Credit to OSS Fuzz 2017-07-29 16:34:35 +02:00
Even Rouault f6551f822f opj_t1_clbl_decode_processor(): avoid undefined behaviour if roishift >= 31. Fixes https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=2506. Credit to OSS Fuzz 2017-07-29 16:29:11 +02:00
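The two fixes above are not shown in this log. As a generic illustration: in C, `x << s` is undefined when `s` is negative or at least the width of the promoted type, and for signed `x` also when the result does not fit, so `1 << 31` on a 32-bit `int` is already undefined. A minimal sketch of guarding such shifts (both helpers are hypothetical, and `roi_threshold` only assumes that a value like `1 << roishift` is computed somewhere):

```c
#include <assert.h>
#include <stdint.h>

/* Shift an unsigned 32-bit value; reject shift counts that would be
 * undefined. For valid counts the result wraps modulo 2^32. */
static int safe_shl_u32(uint32_t x, unsigned s, uint32_t *out)
{
    if (s >= 32) {
        return 0;
    }
    *out = x << s;
    return 1;
}

/* Hypothetical use: (int32_t)1 << roishift is only defined for
 * 0 <= roishift <= 30 on a 32-bit signed type. */
static int roi_threshold(int roishift, int32_t *thresh)
{
    if (roishift < 0 || roishift >= 31) {
        return 0;
    }
    *thresh = (int32_t)1 << roishift;
    return 1;
}
```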
Even Rouault 94c4b7300c T1 decoder: check code stream errors when predictable termination is enabled and emit a warning when errors are found 2017-07-26 21:43:32 +02:00
Even Rouault cdd3e83bae Fix clang warning about extraneous parentheses 2017-06-21 12:49:01 +02:00
Even Rouault cc07aec6c7 Fix warnings with recent GCC versions 2017-06-17 14:09:31 +02:00
Even Rouault 9cbc9903c3 Merge branch 't1_flag_optimizations' 2017-06-13 12:09:52 +02:00
Even Rouault 73d1510d47 Encoder: fix packet writing of empty sub-bands (#891, #892)
There are situations where, given a tile size, at a resolution level,
there are sub-bands with x0==x1 or y0==y1, which consequently don't have any
valid codeblocks, while the other sub-bands may be non-empty.
Given that we recycle the memory from one tile to another, those
ghost codeblocks might hold non-zero values and thus be candidates for packet inclusion.
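The check the message describes amounts to treating a zero-area band as having no code-blocks at all. A minimal sketch with hypothetical types (the actual fix in the packet writer is not reproduced here):

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical sub-band bounds, half-open: [x0, x1) x [y0, y1). */
typedef struct {
    int32_t x0, y0, x1, y1;
} band_bounds;

/* A band with x0 == x1 or y0 == y1 has zero area and no valid
 * code-blocks, so its (possibly recycled) code-block structures must
 * not be considered for packet inclusion. */
static int band_is_empty(const band_bounds *b)
{
    return b->x0 == b->x1 || b->y0 == b->y1;
}

/* Count the bands of a resolution level that can contribute to a packet. */
static int count_non_empty_bands(const band_bounds *bands, int n)
{
    int count = 0;
    for (int i = 0; i < n; i++) {
        if (!band_is_empty(&bands[i])) {
            count++;
        }
    }
    return count;
}
```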
2017-06-12 18:37:50 +02:00
Even Rouault 81c5311758 T1: fix BYPASS/LAZY, TERMALL/RESTART and PTERM/ERTERM encoding modes. (#674)
There were a number of defects regarding when and how the termination of
passes had to be done, and the computation of their rate.
2017-06-09 10:49:03 +02:00
Even Rouault 9a9b06911e opj_t1_dec_sigpass_raw/opj_t1_dec_refpass_raw: harmonize style with mqc methods 2017-06-02 19:22:15 +02:00
Even Rouault 532243f1fd MQC/RAW decoder: use an artificial 0xFF 0xFF terminating marker.
This saves comparing the current pointer with the end of buffer pointer.
This results in at least a tiny speed improvement for raw decoding, and in
smaller code size for MQC as well.

This kills the remains of the raw.h/.c files that were only used for
decoding. Encoding was already using the mqc structure.
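The sentinel trick works because, in JPEG 2000, an 0xFF byte followed by a byte greater than 0x8F is a marker, and a decoder stops consuming input at a marker. Appending 0xFF 0xFF therefore guarantees the reader halts at the end of the real data without ever comparing against an end pointer. A simplified sketch (hypothetical names; real MQ/raw decoding also handles bit stuffing after 0xFF, which is omitted here):

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Copy code-block data into a buffer followed by two 0xFF bytes. */
static uint8_t *pad_with_sentinel(const uint8_t *data, size_t len)
{
    uint8_t *buf = malloc(len + 2);
    if (!buf) return NULL;
    if (len) memcpy(buf, data, len);
    buf[len] = 0xFF;
    buf[len + 1] = 0xFF;
    return buf;
}

/* Return the next byte with no end-of-buffer check. At a marker
 * (0xFF followed by a byte > 0x8F) the position stops advancing and
 * 0xFF is returned from then on, similar to how the MQ decoder feeds
 * 1-bits once it reaches a marker. */
static uint8_t next_byte(const uint8_t **bp)
{
    const uint8_t *p = *bp;
    if (p[0] == 0xFF && p[1] > 0x8F) {
        return 0xFF;
    }
    *bp = p + 1;
    return p[0];
}
```

Because the position never moves past the first sentinel byte, `p[1]` is always inside the padded buffer.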
2017-06-02 18:24:07 +02:00
Even Rouault dde6cbabc0 Simplify VSC handling: instead of masking out bits when reading the 4th row,
do not set them when updating flags of the 1st row
2017-06-02 18:23:38 +02:00
Even Rouault 3d9940a35b Force inlining of mqc decoding and pass steps through heavy use of macros, so as to get better register allocation 2017-06-02 18:23:20 +02:00
Even Rouault 2ba861c37c Optimize opj_t1_update_flags() 2017-06-02 18:22:42 +02:00
Even Rouault a0861855c1 T1: remove use of neghalf variable. It is useless since bpno is always > 0 2017-06-02 18:22:21 +02:00
Even Rouault 10410fe72e T1: avoid pointer indirection for mqc and raw members of opj_t1_t 2017-06-02 18:21:54 +02:00
Even Rouault a5003787ff T1: remove flags_stride variable from opj_t1_t 2017-06-02 18:21:39 +02:00
Even Rouault aa7a8a4398 T1: loop unrolling in dec_sigpass_raw and dec_refpass_raw 2017-06-02 18:20:58 +02:00
Even Rouault 68557ff503 T1: Transpose coder optimizations to decoder, and cleanup code 2017-06-02 18:20:35 +02:00
Even Rouault 1957a498b6 Fix compiler warnings 2017-05-23 17:06:46 +02:00
Even Rouault 40c0f42def Factor index computation for lut_enc_ctxno_sc and lut_enc_spb 2017-05-23 17:06:46 +02:00
Even Rouault d6907b9304 Optimize opj_t1_enc_clnpass() a bit 2017-05-23 17:06:46 +02:00
Even Rouault c76a592131 T1: remove unused code in decoder 2017-05-23 17:06:46 +02:00
Even Rouault 4068363ff5 T1: fix VSC mode in encoder 2017-05-23 16:16:32 +02:00
Even Rouault cd12414c6b T1: use more compact flags to optimize cache usage in encoder passes. (#172)
Ported from Carl Hetherington's work (actually through Matthieu Darbois's port
on top of OpenJPEG 2.1.0)

Can reduce total encoding time by 10-15%

WARNING: VSC mode is not implemented; this is a temporary regression
that must be fixed.
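Compact flags help the cache because the state of a whole column of a 4-row stripe fits in one machine word: one load serves four samples, and stripe-wide tests become a single mask. A minimal sketch under an assumed layout of 3 bits per row (significant, refined, visited), which is not OpenJPEG's actual flag layout:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-row flags, repeated for rows 0..3 of a stripe column. */
enum {
    FLAG_SIG     = 1u << 0,   /* sample is significant */
    FLAG_REFINED = 1u << 1,   /* sample has been refined */
    FLAG_VISITED = 1u << 2,   /* sample visited in the current pass */
    BITS_PER_ROW = 3
};

static uint32_t flag_bit(unsigned row, uint32_t flag)
{
    return flag << (row * BITS_PER_ROW);
}

static void set_flag(uint32_t *word, unsigned row, uint32_t flag)
{
    *word |= flag_bit(row, flag);
}

static int has_flag(uint32_t word, unsigned row, uint32_t flag)
{
    return (word & flag_bit(row, flag)) != 0;
}

/* Mask selecting one flag in all 4 rows of the column. */
static uint32_t stripe_mask(uint32_t flag)
{
    uint32_t mask = 0;
    for (unsigned row = 0; row < 4; row++) {
        mask |= flag_bit(row, flag);
    }
    return mask;
}

/* One test tells whether any of the 4 rows is significant. */
static int any_significant(uint32_t word)
{
    return (word & stripe_mask(FLAG_SIG)) != 0;
}

/* Clear the visited bit of all 4 rows at once, e.g. after a pass. */
static uint32_t clear_visited(uint32_t word)
{
    return word & ~stripe_mask(FLAG_VISITED);
}
```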
2017-05-23 16:16:32 +02:00
Even Rouault 8728cfbc79 t1.c: fix compiler warnings 2017-05-23 13:54:28 +02:00
Even Rouault 563bd8499e Reformat whole codebase with astyle.options (#128) 2017-05-09 20:46:20 +02:00
Matthieu Darbois 9a07ccb3d0 Add overflow checks for opj_aligned_malloc (#841)
See
https://pdfium.googlesource.com/pdfium/+/b20ab6c7acb3be1393461eb650ca8fa4660c937e/third_party/libopenjpeg20/0020-opj_aligned_malloc.patch
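The linked patch is not reproduced here. As a generic sketch of the kind of check involved: an aligned allocator built on `malloc()` over-allocates by `alignment - 1` plus room to store the original pointer, and `size + overhead` can wrap around for huge sizes unless it is checked first. The function names are hypothetical and this is not OpenJPEG's `opj_aligned_malloc`:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical aligned allocator. alignment must be a power of two. */
static void *aligned_alloc_checked(size_t size, size_t alignment)
{
    if (alignment == 0 || (alignment & (alignment - 1)) != 0) return NULL;
    /* Keep room for a properly aligned void* just before the block. */
    if (alignment < sizeof(void *)) alignment = sizeof(void *);

    size_t overhead = (alignment - 1) + sizeof(void *);
    if (size > SIZE_MAX - overhead) return NULL;   /* size + overhead would wrap */

    void *raw = malloc(size + overhead);
    if (!raw) return NULL;

    uintptr_t start = (uintptr_t)raw + sizeof(void *);
    uintptr_t aligned = (start + (alignment - 1)) & ~(uintptr_t)(alignment - 1);
    ((void **)aligned)[-1] = raw;   /* remember what to pass to free() */
    return (void *)aligned;
}

static void aligned_free_checked(void *p)
{
    if (p) free(((void **)p)[-1]);
}
```

Without the `SIZE_MAX - overhead` test, a request close to `SIZE_MAX` would allocate a tiny buffer and the caller would then write far past its end.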
2016-09-15 01:57:53 +02:00
Matthieu Darbois f88c9974e2 Flags in T1 shall be unsigned (#840)
This will remove some conversion warnings
2016-09-14 23:46:46 +02:00
Matthieu Darbois 0954bc11e3 Fix some warnings (#838)
Fix warnings introduced by uclouvain/openjpeg#786
2016-09-14 00:12:43 +02:00
Even Rouault 48c16b2c19 Merge branch 'master' of https://github.com/uclouvain/openjpeg into tier1_optimizations_multithreading_2
Conflicts:
	src/lib/openjp2/t1.c
2016-09-08 10:30:09 +02:00
Matthieu Darbois 9f24b078c7 Change 'restrict' define to 'OPJ_RESTRICT' (#816)
Visual Studio 2015 does not pass regression tests with `__restrict`, so it is kept disabled for MSVC.
Need to check proper usage of OPJ_RESTRICT (if correct, then there’s
probably a bug in vc14)

Closes #661
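A portable restrict macro is usually selected per compiler. The sketch below uses a hypothetical name, `EXAMPLE_RESTRICT`, and is not the actual OPJ_RESTRICT definition; it only mirrors the commit's decision to leave the qualifier empty for MSVC:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical macro: empty on MSVC, C99 keyword where available,
 * GCC/Clang extension otherwise. */
#if defined(_MSC_VER)
#  define EXAMPLE_RESTRICT
#elif defined(__STDC_VERSION__) && __STDC_VERSION__ >= 199901L
#  define EXAMPLE_RESTRICT restrict
#elif defined(__GNUC__)
#  define EXAMPLE_RESTRICT __restrict__
#else
#  define EXAMPLE_RESTRICT
#endif

/* restrict promises that dst and src do not overlap, so the compiler
 * may vectorize without reloading src after each store. Calling this
 * with overlapping buffers is undefined behaviour. */
static void add_arrays(int *EXAMPLE_RESTRICT dst,
                       const int *EXAMPLE_RESTRICT src, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        dst[i] += src[i];
    }
}
```

The "proper usage" question in the message matters because a wrong `restrict` promise (overlapping pointers) produces undefined behaviour that may only show up with one compiler's optimizer.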
2016-09-06 00:49:53 +02:00
Even Rouault 7d3c7a345f Be robust to failed allocations of job structures 2016-05-26 23:51:32 +02:00
Even Rouault 5fbb8b2645 Use thread-pool for T1 decoding 2016-05-25 21:02:07 +02:00
Even Rouault 7092f7ea11 Fix MSVC 2010 build issue (use of C99 declaration after statement) introduced in ba1edf6cd4 2016-05-23 16:00:28 +02:00
Even Rouault 107eb31531 Improve perf of opj_t1_dec_sigpass_mqc_vsc() and opj_t1_dec_refpass_mqc_vsc() with loop unrolling 2016-05-23 13:45:15 +02:00
Even Rouault 8371491a99 Better inlining of opj_t1_updateflagscolflags() w.r.t. flags_stride 2016-05-23 11:53:54 +02:00
Even Rouault 956c31d5a6 opj_t1_dec_clnpass(): remove useless test in the runlen decoding path (of the non VSC case) 2016-05-23 11:53:54 +02:00
Even Rouault 93f7f90711 opj_t1_decode_cblks(): tiny perf increase from loop unrolling 2016-05-23 11:53:53 +02:00
Even Rouault 1da397e94a Tier 1 decoding: add a colflags array
Additional flag array such that colflags[1+0] is for state of col=0,row=0..3,
colflags[1+1] for col=1, row=0..3, colflags[1+flags_stride] for col=0,row=4..7, ...
This array avoids too much cache thrashing when processing 4 vertical samples at a time,
as done in the various decoding steps.
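The indexing scheme described above can be written as a small helper. The function names are hypothetical; the leading `1 +` is taken from the message (presumably a border entry), and the bit layout inside each entry is not given, so only the row's position within its group of 4 is computed:

```c
#include <assert.h>
#include <stddef.h>

/* colflags[1 + col] covers rows 0..3 of column col; each further group
 * of 4 rows starts flags_stride entries later. */
static size_t colflags_index(size_t col, size_t row, size_t flags_stride)
{
    return 1 + col + (row / 4) * flags_stride;
}

/* Position of the row inside its group of 4 vertical samples. */
static unsigned colflags_row_in_stripe(size_t row)
{
    return (unsigned)(row % 4);
}
```

Walking a stripe column by column then touches consecutive `colflags` entries, instead of jumping `flags_stride` entries for every row.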
2016-05-23 11:53:53 +02:00