Commit Graph

2827 Commits

Author SHA1 Message Date
Even Rouault 1c5627ee74
T1 encoder: speed-up by aggressive inlining and more cache friendly data organization
~ 9% speed improvement seen on 10980x10980 uint16 image, T36JTT_20160914T074612_B02.tif
opj_compress time from 17.2s to 15.8s
2020-05-24 15:38:48 +02:00
Even Rouault 1e931fdb36
Forward DWT 9-7: major speed up by vectorizing vertical pass
`bench_dwt -I -encode` times goes from 8.6s to 2.1s
2020-05-23 01:01:05 +02:00
Even Rouault a38e970fa5
Forward DWT 5-3: major speed up by vectorizing vertical pass
`bench_dwt -encode` times goes from 7.9s to 1.7s
2020-05-23 01:01:05 +02:00
Even Rouault e69fa09f60
Forward DWT: small code refactoring to allow future improvements for the vertical pass 2020-05-22 16:01:45 +02:00
Even Rouault 33d3d0de07
dwt.c: remove unused typedef 2020-05-22 15:06:29 +02:00
Even Rouault 97b384aecd
Forward DWT 5x3: performance improvements in horizontal pass, and modest in vertical pass 2020-05-22 15:03:40 +02:00
Even Rouault bd5f5ee7de
Forward DWT: small code refactoring to allow future improvements for the horizontal pass 2020-05-22 15:02:33 +02:00
Even Rouault 45a35223b7
Speed-up 9x7 IDWD by ~30% with OPJ_NUM_THREADS=2
"bench_dwt -I" time goes from 2.2s to 1.5s
2020-05-21 17:21:55 +02:00
Even Rouault 272b3e0fb2
Remove useless + 5U margin in opj_dwt_decode_tile_97()
Nothing in code analysis nor test suite shows that this margin is
needed.
It dates back to commit dbeebe72b9
where vector 9x7 decoding was introduced.
2020-05-21 15:42:51 +02:00
Even Rouault 47943daa15
Speed-up 9x7 IDWD by ~20%
"bench_dwt -I" time goes from 2.8s to 2.2s
2020-05-21 15:42:51 +02:00
Even Rouault 0c09062464
bench_dwt.c: add a -I switch to test irreversible FWDT/IDWT 2020-05-20 23:20:48 +02:00
Even Rouault adccbc8336
Irreversible decoding: partially revert previous commit, to fix failures in test suite 2020-05-20 20:31:28 +02:00
Even Rouault 3cd1305596
Irreversible compression/decompression DWT: use 1/K constant as per standard
The previous constant opj_c13318 was mysteriously equal to 2/K , and in
the DWT, we had to divide K and opj_c13318 by 2... The issue was that the
band->stepsize computation in tcd.c didn't take into account the log2gain of
the band.

The effect of this change is expected to be mostly equivalent to the previous
situation, except some difference in rounding. But it leads to a dramatic
reduction of the mean square error and peak error in the irreversible encoding
of issue141.tif !
2020-05-20 20:31:28 +02:00
Even Rouault f38c069547
Irreversible decoding: align code more closely to the standard by avoid messing up with stepsize (no functional change) 2020-05-20 20:31:28 +02:00
Even Rouault e46e300de5
opj_dwt_encode_1_real(): avoid many bound comparisons, similarly to decoding side 2020-05-20 20:31:28 +02:00
Even Rouault 4ab2ed0907
opj_j2k_setup_encoder(): add validation of tile width and height to avoid potential division by zero 2020-05-20 20:31:28 +02:00
Even Rouault c6a413a423
opj_mct_encode_real(): add SSE optimization 2020-05-20 20:31:28 +02:00
Even Rouault fe4c15f12c
Testing: revise testing of lossy encoding by comparing PEAK and MSE with original image 2020-05-20 20:31:28 +02:00
Even Rouault c2b9d09c65
compare_images.c: code reformatting 2020-05-20 20:31:28 +02:00
Even Rouault 3d35d0f3af
tcd.c: add comment 2020-05-20 20:31:28 +02:00
Even Rouault 00cff6f5c0
Encoder: use floating-point operations for irreversible transformation 2020-05-20 20:31:28 +02:00
Even Rouault 99107d5e46
dwt.c: change sign of constants to match standard and compensate (no functional change) 2020-05-20 20:31:28 +02:00
Even Rouault 07d1f775a1
Add multithreaded support in the DWT encoder.
Update the bench_dwt utility to have a -decode/-encode switch

Measured performance gains for DWT encoder on a
Intel(R) Core(TM) i7-6700HQ CPU @ 2.60GHz (4 cores, hyper threaded)

Encoding time:
$ ./bin/bench_dwt -encode -num_threads 1
time for dwt_encode: total = 8.348 s, wallclock = 8.352 s

$ ./bin/bench_dwt -encode -num_threads 2
time for dwt_encode: total = 9.776 s, wallclock = 4.904 s

$ ./bin/bench_dwt -encode -num_threads 4
time for dwt_encode: total = 13.188 s, wallclock = 3.310 s

$ ./bin/bench_dwt -encode -num_threads 8
time for dwt_encode: total = 30.024 s, wallclock = 4.064 s

Scaling is probably limited by memory access patterns causing
memory access to be the bottleneck.
The slightly worse results with threads==8 than with thread==4
is due to hyperthreading being not appropriate here.
2020-05-20 20:30:21 +02:00
Even Rouault 97eb7e0bf1
Add multithreading support in the T1 (entropy phase) encoder
- API wise, opj_codec_set_threads() can be used on the encoding side
- opj_compress has a -threads switch similar to opj_uncompress
2020-05-20 20:30:21 +02:00
Even Rouault 1d358f25c8
Merge pull request #1246 from rouault/write_plt
Add support for generation of PLT markers in encoder
2020-05-20 20:29:31 +02:00
Even Rouault 4edb8c8337
Add support for generation of PLT markers in encoder
* -PLT switch added to opj_compress
* Add a opj_encoder_set_extra_options() function that
  accepts a PLT=YES option, and could be expanded later
  for other uses.

-------

Testing with a Sentinel2 10m band, T36JTT_20160914T074612_B02.jp2,
coming from S2A_MSIL1C_20160914T074612_N0204_R135_T36JTT_20160914T081456.SAFE

Decompress it to TIFF:
```
opj_uncompress -i T36JTT_20160914T074612_B02.jp2 -o T36JTT_20160914T074612_B02.tif
```

Recompress it with similar parameters as original:
```
opj_compress -n 5 -c [256,256],[256,256],[256,256],[256,256],[256,256] -t 1024,1024 -PLT -i T36JTT_20160914T074612_B02.tif -o T36JTT_20160914T074612_B02_PLT.jp2
```

Dump codestream detail with GDAL dump_jp2.py utility (https://github.com/OSGeo/gdal/blob/master/gdal/swig/python/samples/dump_jp2.py)
```
python dump_jp2.py T36JTT_20160914T074612_B02.jp2 > /tmp/dump_sentinel2_ori.txt
python dump_jp2.py T36JTT_20160914T074612_B02_PLT.jp2 > /tmp/dump_sentinel2_openjpeg_plt.txt
```

The diff between both show very similar structure, and identical number of packets in PLT markers

Now testing with Kakadu (KDU803_Demo_Apps_for_Linux-x86-64_200210)

Full file decompression:
```
kdu_expand -i T36JTT_20160914T074612_B02_PLT.jp2 -o tmp.tif

Consumed 121 tile-part(s) from a total of 121 tile(s).
Consumed 80,318,806 codestream bytes (excluding any file format) = 5.329697
bits/pel.
Processed using the multi-threaded environment, with
    8 parallel threads of execution
```

Partial decompresson (presumably using PLT markers):
```
kdu_expand -i T36JTT_20160914T074612_B02.jp2 -o tmp.pgm -region "{0.5,0.5},{0.01,0.01}"
kdu_expand -i T36JTT_20160914T074612_B02_PLT.jp2 -o tmp2.pgm  -region "{0.5,0.5},{0.01,0.01}"
diff tmp.pgm tmp2.pgm && echo "same !"
```

-------

Funded by ESA for S2-MPC project
2020-04-21 15:55:44 +02:00
Even Rouault 64689d05df
struct opj_j2k: remove unused fields, and add some documentation 2020-04-18 18:25:44 +02:00
Even Rouault 774889a328
Merge pull request #1244 from rouault/fix_pi_warnings
Fix warnings about signed/unsigned casts in pi.c
2020-04-17 00:39:46 +02:00
szukw000 b6b7e96b0c
color_apply_icc_profile: add checks on the number of components (#1236) 2020-04-17 00:37:33 +02:00
Eduardo Barretto 040e142288
jp3d/jpwl/mj2/jpip: Fix resource leaks (#1226)
This issues were found by cppcheck and coverity.
2020-04-17 00:09:40 +02:00
Even Rouault 271a71ef0f
Fix warnings about signed/unsigned casts in pi.c 2020-04-16 23:34:10 +02:00
Even Rouault 221a801a97
Rename mis-named function opj_tcd_get_encoded_tile_size() to opj_tcd_get_encoder_input_buffer_size() 2020-04-16 20:33:22 +02:00
Even Rouault 9c1cfb034a
Merge pull request #1240 from rouault/fix_crash_opj_decompress
opj_decompress: add sanity checks to avoid segfault in case of decoding error
2020-04-01 22:00:19 +02:00
Even Rouault 1c54024165
opj_decompress: add sanity checks to avoid segfault in case of decoding error
Prevent crashes like:
opj_decompress -i 0722_5-1_2019.jp2 -o out.ppm -r 4 -t 0

where 0722_5-1_2019.jp2 is
https://drive.google.com/file/d/1ZxOUZg2-FKjYwa257VFLMpTXRWxEoP0a/view?usp=sharing
2020-04-01 21:11:36 +02:00
Even Rouault 563ecfb55c
opj_compress: improve help message regarding new IMF switch 2020-02-13 09:59:36 +01:00
Even Rouault 4e5501b3c7
Merge pull request #1235 from rouault/imf
Implement writing of IMF profiles
2020-02-13 09:54:20 +01:00
Even Rouault 84f3bebbff
Implement writing of IMF profiles
Add -IMF switch to opj_compress as well
2020-02-12 15:55:25 +01:00
Even Rouault fffe32adcb
openjpeg.h: fix values of OPJ_PROFILE_IMF_ constants 2020-02-12 15:55:02 +01:00
Even Rouault 28881453f6 Merge pull request #1234 from rouault/md5_libtiff_4_1
tests: add alternate checksums for libtiff 4.1
2020-02-10 11:20:20 +01:00
Even Rouault b5cb419faf
tests: add alternate checksums for libtiff 4.1
Fixes #1233

libtiff 4.1 slightly modifies the way it generates files. So
add the new expected md5sum.

Not super elegant solution admitedly.
2020-02-07 22:05:55 +01:00
Even Rouault 647f9b118d
Merge pull request #1232 from rouault/fix_1231
opj_tcd_init_tile(): avoid integer overflow
2020-01-30 13:07:31 +01:00
Even Rouault 05f9b91e60
opj_tcd_init_tile(): avoid integer overflow
That could lead to later assertion failures.

Fixes #1231 / CVE-2020-8112
2020-01-30 00:59:57 +01:00
Max Moroz b63a433ba1 tests/fuzzers: link fuzz binaries using $LIB_FUZZING_ENGINE. (#1230)
This was changed some time ago (https://google.github.io/oss-fuzz/getting-started/new-project-guide/) but the build didn't fail as there is a fallback mechanism. The main advantage of the new approach is that for libFuzzer this produces more performant binaries (as `$LIB_FUZZING_ENGINE` expands into `-fsanitize=fuzzer`, which links libFuzzer from the compiler-rt, allowing better optimization tricks).

I'm also experimenting with dataflow (https://github.com/google/oss-fuzz/issues/1632) on your project, and the dataflow config doesn't have a fallback (as it's a new configuration), therefore I'm proposing a change to migrate from `-lFuzzingEngine` to `$LIB_FUZZING_ENGINE`.
2020-01-13 18:07:54 +01:00
Even Rouault 46c1eff9e9
Merge pull request #1229 from rouault/fix_1228
opj_j2k_update_image_dimensions(): reject images whose coordinates are beyond INT_MAX (fixes #1228)
2020-01-11 11:29:11 +01:00
Even Rouault 024b840739
opj_j2k_update_image_dimensions(): reject images whose coordinates are beyond INT_MAX (fixes #1228) 2020-01-11 01:51:58 +01:00
Even Rouault ac3737372a
Merge pull request #1217 from rouault/fix_ossfuzz_18979
pi.c: avoid integer overflow, resulting in later invalid access to memory in opj_t2_decode_packets()
2019-11-17 13:08:41 +01:00
Robert Ancell 9701b3305d JPWL: convert: Fix buffer overflow reading an image file less than four characters (#1196)
Fixes #1068
2019-11-17 03:09:59 +01:00
Even Rouault cb332992a7
Merge pull request #1218 from rouault/fix_broken_abi_check
abi-check.sh: fix false postive ABI error, and display output error log
2019-11-17 02:47:26 +01:00
Even Rouault 016f80ae21
abi-check.sh: fix false postive ABI error, and display output error log
There is currently a false positive ABI check failure between v2.3.1
and current. It disappears when removing the generated reports of v2.3.1
and recreating them. It is likely that some tooling has evolved since
the initial v2.3.1 report generation.
2019-11-17 02:26:54 +01:00
Even Rouault 4cb1f66304
pi.c: avoid integer overflow, resulting in later invalid access to memory in opj_t2_decode_packets(). Fixes https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=18979 2019-11-17 01:18:26 +01:00