Commit Graph

129 Commits

Author SHA1 Message Date
Even Rouault c06632c6f6
Cleanup code related to quality layer allocation, and add a few safety checks 2022-08-11 18:12:07 +02:00
Even Rouault 3d9bcd3753
Significant speed-up rate allocation by rate/distoratio ratio
- Avoid doing 128 iterations all the time, and stop when the threshold
  doesn't vary much
- Avoid calling costly opj_t2_encode_packets() repeatdly when bisecting the
  layer ratio if the truncation points haven't changed since the last
  iteration.

When used with the GDAL gdal_translate application to convert a 11977 x
8745 raster with data type UInt16 and 8 channels, the conversion time
to JPEG2000 with 20 quality layers using disto/rate allocation (
-co "IC=C8" -co "JPEG2000_DRIVER=JP2OPENJPEG" -co "PROFILE=NPJE_NUMERICALLY_LOSSLESS"
creation options of the GDAL NITF driver) goes from 5m56 wall clock
(8m20s total, 12 vCPUs) down to 1m16 wall clock (3m45 total).
2022-08-11 18:06:50 +02:00
yuan 4ce7d285a5 Encoder: grow again buffer size in opj_tcd_code_block_enc_allocate_data() (fixes #1283) 2020-12-04 19:00:22 +08:00
yuan 4f487798ba Free p_tcd_marker_info to avoid memory leak 2020-11-26 00:22:49 +08:00
yuan 649298dcf8 Encoder: grow again buffer size in opj_tcd_code_block_enc_allocate_data() (fixes #1283) 2020-11-25 20:41:39 +08:00
Even Rouault 15cf3d9581
Encoder: grow again buffer size in opj_tcd_code_block_enc_allocate_data() (fixes #1283) 2020-11-23 18:14:02 +01:00
Even Rouault eaa098b59b
Encoder: grow buffer size in opj_tcd_code_block_enc_allocate_data() to avoid write heap buffer overflow in opj_mqc_flush (fixes #1283) 2020-11-23 13:49:05 +01:00
Even Rouault adccbc8336
Irreversible decoding: partially revert previous commit, to fix failures in test suite 2020-05-20 20:31:28 +02:00
Even Rouault 3cd1305596
Irreversible compression/decompression DWT: use 1/K constant as per standard
The previous constant opj_c13318 was mysteriously equal to 2/K , and in
the DWT, we had to divide K and opj_c13318 by 2... The issue was that the
band->stepsize computation in tcd.c didn't take into account the log2gain of
the band.

The effect of this change is expected to be mostly equivalent to the previous
situation, except some difference in rounding. But it leads to a dramatic
reduction of the mean square error and peak error in the irreversible encoding
of issue141.tif !
2020-05-20 20:31:28 +02:00
Even Rouault f38c069547
Irreversible decoding: align code more closely to the standard by avoid messing up with stepsize (no functional change) 2020-05-20 20:31:28 +02:00
Even Rouault 3d35d0f3af
tcd.c: add comment 2020-05-20 20:31:28 +02:00
Even Rouault 00cff6f5c0
Encoder: use floating-point operations for irreversible transformation 2020-05-20 20:31:28 +02:00
Even Rouault 07d1f775a1
Add multithreaded support in the DWT encoder.
Update the bench_dwt utility to have a -decode/-encode switch

Measured performance gains for DWT encoder on a
Intel(R) Core(TM) i7-6700HQ CPU @ 2.60GHz (4 cores, hyper threaded)

Encoding time:
$ ./bin/bench_dwt -encode -num_threads 1
time for dwt_encode: total = 8.348 s, wallclock = 8.352 s

$ ./bin/bench_dwt -encode -num_threads 2
time for dwt_encode: total = 9.776 s, wallclock = 4.904 s

$ ./bin/bench_dwt -encode -num_threads 4
time for dwt_encode: total = 13.188 s, wallclock = 3.310 s

$ ./bin/bench_dwt -encode -num_threads 8
time for dwt_encode: total = 30.024 s, wallclock = 4.064 s

Scaling is probably limited by memory access patterns causing
memory access to be the bottleneck.
The slightly worse results with threads==8 than with thread==4
is due to hyperthreading being not appropriate here.
2020-05-20 20:30:21 +02:00
Even Rouault 97eb7e0bf1
Add multithreading support in the T1 (entropy phase) encoder
- API wise, opj_codec_set_threads() can be used on the encoding side
- opj_compress has a -threads switch similar to opj_uncompress
2020-05-20 20:30:21 +02:00
Even Rouault 4edb8c8337
Add support for generation of PLT markers in encoder
* -PLT switch added to opj_compress
* Add a opj_encoder_set_extra_options() function that
  accepts a PLT=YES option, and could be expanded later
  for other uses.

-------

Testing with a Sentinel2 10m band, T36JTT_20160914T074612_B02.jp2,
coming from S2A_MSIL1C_20160914T074612_N0204_R135_T36JTT_20160914T081456.SAFE

Decompress it to TIFF:
```
opj_uncompress -i T36JTT_20160914T074612_B02.jp2 -o T36JTT_20160914T074612_B02.tif
```

Recompress it with similar parameters as original:
```
opj_compress -n 5 -c [256,256],[256,256],[256,256],[256,256],[256,256] -t 1024,1024 -PLT -i T36JTT_20160914T074612_B02.tif -o T36JTT_20160914T074612_B02_PLT.jp2
```

Dump codestream detail with GDAL dump_jp2.py utility (https://github.com/OSGeo/gdal/blob/master/gdal/swig/python/samples/dump_jp2.py)
```
python dump_jp2.py T36JTT_20160914T074612_B02.jp2 > /tmp/dump_sentinel2_ori.txt
python dump_jp2.py T36JTT_20160914T074612_B02_PLT.jp2 > /tmp/dump_sentinel2_openjpeg_plt.txt
```

The diff between both show very similar structure, and identical number of packets in PLT markers

Now testing with Kakadu (KDU803_Demo_Apps_for_Linux-x86-64_200210)

Full file decompression:
```
kdu_expand -i T36JTT_20160914T074612_B02_PLT.jp2 -o tmp.tif

Consumed 121 tile-part(s) from a total of 121 tile(s).
Consumed 80,318,806 codestream bytes (excluding any file format) = 5.329697
bits/pel.
Processed using the multi-threaded environment, with
    8 parallel threads of execution
```

Partial decompresson (presumably using PLT markers):
```
kdu_expand -i T36JTT_20160914T074612_B02.jp2 -o tmp.pgm -region "{0.5,0.5},{0.01,0.01}"
kdu_expand -i T36JTT_20160914T074612_B02_PLT.jp2 -o tmp2.pgm  -region "{0.5,0.5},{0.01,0.01}"
diff tmp.pgm tmp2.pgm && echo "same !"
```

-------

Funded by ESA for S2-MPC project
2020-04-21 15:55:44 +02:00
Even Rouault 221a801a97
Rename mis-named function opj_tcd_get_encoded_tile_size() to opj_tcd_get_encoder_input_buffer_size() 2020-04-16 20:33:22 +02:00
Even Rouault 84f3bebbff
Implement writing of IMF profiles
Add -IMF switch to opj_compress as well
2020-02-12 15:55:25 +01:00
Even Rouault 05f9b91e60
opj_tcd_init_tile(): avoid integer overflow
That could lead to later assertion failures.

Fixes #1231 / CVE-2020-8112
2020-01-30 00:59:57 +01:00
Even Rouault 5875a6b446
opj_tcd_mct_decode()/opj_mct_decode()/opj_mct_encode_real()/opj_mct_decode_real(): proper deal with a number of samples larger than 4 billion (refs #1151) 2019-10-03 11:04:30 +02:00
Even Rouault da5e897232 Avoid out-of-bounds write overflow due to uint32 overflow computation on images with huge dimensions. Credit to Google Autofuzz project for providing test case 2018-02-11 13:31:04 +01:00
Even Rouault 7e2b6bebff Add capability to decode only a subset of all components of an image.
This adds a opj_set_decoded_components(opj_codec_t *p_codec,
OPJ_UINT32 numcomps, const OPJ_UINT32* comps_indices) function,
and equivalent "opj_decompress -c compno[,compno]*" option.

When specified, neither the MCT transform nor JP2 channel transformations
will be applied.

Tests added for various combinations of whole image vs tiled-based decoding,
full or reduced resolution, use of decode area or not.
2017-09-19 17:06:19 +02:00
Even Rouault fdef69b43c Fix warnings and errors when compiling with a c++ compiler (#1021) 2017-09-19 12:46:20 +02:00
Even Rouault 28094e1ebf opj_tcd_mct_decode(): avoid heap buffer overflow when components have not the same number of resolutions. Also fixes an issue with subtile decoding. Fixes https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=3331. Credit to OSS Fuzz 2017-09-08 10:56:49 +02:00
Even Rouault 5abd86b14b Properly fix cc893a4ebf to avoid heap-buffer-overflow when numcomps < 3 2017-09-07 18:01:33 +02:00
Even Rouault cc893a4ebf opj_tcd_mct_decode(): fix checks to verify MCT can be done safely. Fixes https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=3305 (master only) 2017-09-07 15:32:54 +02:00
Even Rouault 968e36bbd9 Merge pull request #1010 from rouault/subtile_decoding_stage3
Subtile decoding: memory use reduction and perf improvements
2017-09-05 22:18:58 +02:00
Even Rouault 579b8937ea Replace uses of size_t by OPJ_SIZE_T 2017-09-04 17:35:52 +02:00
Even Rouault 676d4c807f opj_j2k_update_image_data(): avoid allocating image buffer if we can just reuse the tile buffer one 2017-09-01 22:23:29 +02:00
Even Rouault 2c365fe0ec Replace error message 'Not enough memory for tile data' by 'Size of tile data exceeds system limits' (refs https://github.com/uclouvain/openjpeg/pull/730#issuecomment-326654188) 2017-09-01 22:23:29 +02:00
Even Rouault b428b8c7e7 opj_tcd_rateallocate(): make sure to use all passes for a lossless layer (#1009)
And save a useless loop, which should be a tiny faster.
2017-09-01 20:02:09 +02:00
Even Rouault ae19001ba4 opj_tcd_dc_level_shift_decode(): optimize lossy case 2017-09-01 16:31:04 +02:00
Even Rouault ccac773556 Tiny perf improvement in T1 stage for subtile decoding 2017-09-01 16:30:58 +02:00
Even Rouault 98b9310361 Various changes to allow tile buffers of more than 4giga pixels
Untested though, since that means a tile buffer of at least 16 GB. So
there might be places where uint32 overflow on multiplication still occur...
2017-09-01 16:30:44 +02:00
Even Rouault 008a12d4fc TCD: allow tile buffer to be greater than 4GB on 64 bit hosts (but number of pixels must remain under 4 billion) 2017-09-01 16:30:41 +02:00
Even Rouault d5153ba404 Remove limitation that prevents from opening images bigger than 4 billion pixels
However the intermediate buffer for decoding must still be smaller than 4
billion pixels, so this is useful for decoding at a lower resolution level,
or subtile decoding.
2017-09-01 16:30:37 +02:00
Even Rouault c37e360a51 opj_tcd_init_tile(): fix typo on overflow detection condition (introduced in previous commit) 2017-09-01 16:30:35 +02:00
Even Rouault f9e9942330 Sub-tile decoding: only allocate tile component buffer of the needed dimension
Instead of being the full tile size.

* Use a sparse array mechanism to store code-blocks and intermediate stages of
  IDWT.
* IDWT, DC level shift and MCT stages are done just on that smaller array.
* Improve copy of tile component array to final image, by saving an intermediate
  buffer.
* For full-tile decoding at reduced resolution, only allocate the tile buffer to
  the reduced size, instead of the full-resolution size.
2017-09-01 16:30:29 +02:00
Even Rouault a55c024fc6 Subtile decoding: fix overflows in subband coordinate computation that cause later buffer overflow. Fixes https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=3115. Credit to OSS Fuzz. master only 2017-08-28 17:18:33 +02:00
Even Rouault 17a7ac42d5 Add comments for filter_width values 2017-08-21 12:25:38 +02:00
Even Rouault f87c5ef7eb Subtile decoding: only do 9x7 IDWT computations on relevant areas of tile-component buffer. 2017-08-20 22:02:41 +02:00
Even Rouault 5d40325056 Subtile decoding: only do 5x3 IDWT computations on relevant areas of tile-component buffer.
This lowers 'bin/opj_decompress -i ../MAPA.jp2 -o out.tif -d 0,0,256,256'
down to 0.860s
2017-08-18 15:08:51 +02:00
Even Rouault fe338a057c Sub-tile decoding: only decode precincts and codeblocks that intersect the window specified in opj_set_decode_area() 2017-08-17 19:05:54 +02:00
Even Rouault afb308b9cc Encoder: grow buffer size in opj_tcd_code_block_enc_allocate_data() to avoid write heap buffer overflow in opj_mqc_flush (#982) 2017-08-14 17:20:37 +02:00
Even Rouault 0b4fef6d19 Propagate event manager down to opj_t2_encode_packet() and use it to emit an error message when the output buffer is too small 2017-08-10 16:49:47 +02:00
Even Rouault 92114694a4 Slight improvement in management of code block chunks
Instead of having the chunk array at the segment level, we can move it down to
the codeblock itself since segments are filled in sequential order.
Limit the number of memory allocation, and decrease slightly the memory usage.

On MAPA_005.jp2

n4: 1871312549 (heap allocation functions) malloc/new/new[], --alloc-fns, etc.
 n1: 1610689344 0x4E781E7: opj_aligned_malloc (opj_malloc.c:61)
  n1: 1610689344 0x4E71D1B: opj_alloc_tile_component_data (tcd.c:676)
   n1: 1610689344 0x4E726CF: opj_tcd_init_decode_tile (tcd.c:816)
    n1: 1610689344 0x4E4BE39: opj_j2k_read_tile_header (j2k.c:8617)
     n1: 1610689344 0x4E4C902: opj_j2k_decode_tiles (j2k.c:10348)
      n1: 1610689344 0x4E4E3CE: opj_j2k_decode (j2k.c:7846)
       n1: 1610689344 0x4E53002: opj_jp2_decode (jp2.c:1564)
        n0: 1610689344 0x40374E: main (opj_decompress.c:1459)
 n1: 219232541 0x4E4BC50: opj_j2k_read_tile_header (j2k.c:4683)
  n1: 219232541 0x4E4C902: opj_j2k_decode_tiles (j2k.c:10348)
   n1: 219232541 0x4E4E3CE: opj_j2k_decode (j2k.c:7846)
    n1: 219232541 0x4E53002: opj_jp2_decode (jp2.c:1564)
     n0: 219232541 0x40374E: main (opj_decompress.c:1459)
 n1: 23893200 0x4E72735: opj_tcd_init_decode_tile (tcd.c:1225)
  n1: 23893200 0x4E4BE39: opj_j2k_read_tile_header (j2k.c:8617)
   n1: 23893200 0x4E4C902: opj_j2k_decode_tiles (j2k.c:10348)
    n1: 23893200 0x4E4E3CE: opj_j2k_decode (j2k.c:7846)
     n1: 23893200 0x4E53002: opj_jp2_decode (jp2.c:1564)
      n0: 23893200 0x40374E: main (opj_decompress.c:1459)
 n0: 17497464 in 52 places, all below massif's threshold (1.00%)
2017-08-07 18:32:52 +02:00
Even Rouault ca34d13e76 Decoding: do not allocate memory for the codestream of each codeblock
Currently we allocate at least 8192 bytes for each codeblock, and copy
the relevant parts of the codestream in that per-codeblock buffer as we
decode packets.
As the whole codestream for the tile is ingested in memory and alive
during the decoding, we can directly point to it instead of copying. But
to do that, we need an intermediate concept, a 'chunk' of code-stream segment,
given that segments may be made of data at different places in the code-stream
when quality layers are used.

With that change, the decoding of MAPA_005.jp2 goes down from the previous
improvement of 2.7 GB down to 1.9 GB.

New profile:

n4: 1885648469 (heap allocation functions) malloc/new/new[], --alloc-fns, etc.
 n1: 1610689344 0x4E78287: opj_aligned_malloc (opj_malloc.c:61)
  n1: 1610689344 0x4E71D7B: opj_alloc_tile_component_data (tcd.c:676)
   n1: 1610689344 0x4E7272C: opj_tcd_init_decode_tile (tcd.c:816)
    n1: 1610689344 0x4E4BDD9: opj_j2k_read_tile_header (j2k.c:8618)
     n1: 1610689344 0x4E4C8A2: opj_j2k_decode_tiles (j2k.c:10349)
      n1: 1610689344 0x4E4E36E: opj_j2k_decode (j2k.c:7847)
       n1: 1610689344 0x4E52FA2: opj_jp2_decode (jp2.c:1564)
        n0: 1610689344 0x40374E: main (opj_decompress.c:1459)
 n1: 219232541 0x4E4BBF0: opj_j2k_read_tile_header (j2k.c:4685)
  n1: 219232541 0x4E4C8A2: opj_j2k_decode_tiles (j2k.c:10349)
   n1: 219232541 0x4E4E36E: opj_j2k_decode (j2k.c:7847)
    n1: 219232541 0x4E52FA2: opj_jp2_decode (jp2.c:1564)
     n0: 219232541 0x40374E: main (opj_decompress.c:1459)
 n1: 39822000 0x4E727A9: opj_tcd_init_decode_tile (tcd.c:1219)
  n1: 39822000 0x4E4BDD9: opj_j2k_read_tile_header (j2k.c:8618)
   n1: 39822000 0x4E4C8A2: opj_j2k_decode_tiles (j2k.c:10349)
    n1: 39822000 0x4E4E36E: opj_j2k_decode (j2k.c:7847)
     n1: 39822000 0x4E52FA2: opj_jp2_decode (jp2.c:1564)
      n0: 39822000 0x40374E: main (opj_decompress.c:1459)
 n0: 15904584 in 52 places, all below massif's threshold (1.00%)
2017-08-07 18:32:52 +02:00
Even Rouault 373520db30 Add documentation for magic values in the code 2017-08-07 18:32:52 +02:00
Even Rouault f58aab9d6a Add opj_image_data_alloc() / opj_image_data_free()
As bin/common/color.c used to directly call malloc()/free(), we need
to export functions dedicated to allocating/freeing image component data.
2017-08-07 18:32:52 +02:00
Even Rouault 68832af20e opj_tcd_dc_level_shift_decode: avoid int32 overflow when prec == 31. Fixes https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=2799. Credit to OSS Fuzz 2017-07-30 15:22:24 +02:00
Even Rouault 397f62c0a8 Fix write heap buffer overflow in opj_mqc_byteout(). Discovered by Ke Liu of Tencent's Xuanwu LAB (#835) 2017-07-29 19:13:49 +02:00