Even Rouault
e673c8bd4d
Merge pull request #963 from rouault/travis_avx2
...
Enable AVX2 at runtime on Travis-CI and AppVeyor
2017-07-01 12:54:39 +02:00
Even Rouault
b9923764da
Add tools/travis-ci/knownfailures-Ubuntu14.04-clang3.8.0-x86_64-Release-3rdP.txt (copied from knownfailures-Ubuntu12.04-clang3.9.0-x86_64-Release-3rdP.txt)
2017-07-01 10:00:57 +02:00
Even Rouault
f194ff32ac
appveyor.yml: add a /arch:AVX2 config on Windows
...
Try running the tests if the CPU supports AVX2.
2017-07-01 10:00:57 +02:00
Even Rouault
69a001819c
.travis.yml: try to run tests in -mavx2 mode if the CPU supports it
...
And modify settings so as to have an AVX2-compatible CPU
2017-07-01 02:14:27 +02:00
Even Rouault
8fa405ee15
IDWT 5x3: fix bug in AVX2 implementation ( #953 , #957 )
2017-06-30 00:03:05 +02:00
Even Rouault
6239ed7be4
INSTALL.md: add section discussing how to enable CPU specific optimizations
2017-06-26 13:13:26 +02:00
Even Rouault
533fa2fdee
Merge pull request #957 from rouault/idwt_53_improvements
...
IDWT 5x3 single-pass lifting and SSE2/AVX2 implementation
2017-06-26 12:45:34 +02:00
Even Rouault
6026786069
Style fix
2017-06-21 13:20:35 +02:00
Even Rouault
93aca84731
Fix mingw related warnings
2017-06-21 12:54:40 +02:00
Even Rouault
cdd3e83bae
Fix clang warning about extraneous parentheses
2017-06-21 12:49:01 +02:00
Even Rouault
4fe7620d4a
.travis.yml: add a configuration to test compilation of AVX2 (but disable tests since Travis doesn't have AVX2 compatible machines)
2017-06-21 12:41:56 +02:00
Even Rouault
fd0dc535ad
IDWT 5x3: generalize SSE2 version for AVX2
...
Thanks to our macros that abstract SSE use, the functions can use
AVX2 when available (at compile time)
This brings an extra 23% speed improvement on bench_dwt in 64bit builds
with AVX2 compared to SSE2.
2017-06-21 12:12:58 +02:00
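The macro abstraction described in commit fd0dc535ad can be sketched roughly as follows. This is a minimal illustration, not the code from dwt.c: the macro names (VREG, VREG_INT_COUNT, LOADU, STOREU, ADD) and the add_arrays() function are assumptions chosen for the example. The vector width is picked at compile time, and a scalar loop handles the tail and builds without SSE2/AVX2.

```c
#if defined(__AVX2__)
#include <immintrin.h>
/* 256-bit registers: 8 x 32-bit integers per operation. */
#define VREG_INT_COUNT 8
#define VREG           __m256i
#define LOADU(p)       _mm256_loadu_si256((const __m256i *)(p))
#define STOREU(p, v)   _mm256_storeu_si256((__m256i *)(p), (v))
#define ADD(a, b)      _mm256_add_epi32((a), (b))
#elif defined(__SSE2__)
#include <emmintrin.h>
/* 128-bit registers: 4 x 32-bit integers per operation. */
#define VREG_INT_COUNT 4
#define VREG           __m128i
#define LOADU(p)       _mm_loadu_si128((const __m128i *)(p))
#define STOREU(p, v)   _mm_storeu_si128((__m128i *)(p), (v))
#define ADD(a, b)      _mm_add_epi32((a), (b))
#endif

/* Hypothetical example function: out[i] = a[i] + b[i] for i in [0, n).
 * The vector loop is written once against the macros above, so the same
 * source uses AVX2 or SSE2 depending on the compiler flags. */
void add_arrays(const int *a, const int *b, int *out, int n)
{
    int i = 0;
#ifdef VREG
    for (; i + VREG_INT_COUNT <= n; i += VREG_INT_COUNT) {
        VREG va = LOADU(a + i);
        VREG vb = LOADU(b + i);
        STOREU(out + i, ADD(va, vb));
    }
#endif
    /* Scalar tail, and full fallback when no SIMD macro set is defined. */
    for (; i < n; i++) {
        out[i] = a[i] + b[i];
    }
}
```

Compiling the same file with -mavx2 (GCC/Clang) selects the 256-bit branch; without it, x86-64 builds fall back to the SSE2 branch.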
Even Rouault
f6e3475cc9
dwt.c: small cleanup
2017-06-21 01:07:56 +02:00
Even Rouault
f06cfadef8
Enable __SSE__ / __SSE2__ with Visual Studio
2017-06-20 18:24:21 +02:00
Even Rouault
fa55b52d19
Improve performance of inverse DWT 5x3 ( #953 )
...
* Use single-pass lifting inverse wavelet transform.
* For the vertical pass, use SSE2 when available so as to process 8 columns
in parallel. This is the most beneficial improvement, since the
vertical pass involves a lot of cache thrashing.
With the bench_dwt utility with default arguments (16383x16383 image),
time goes from 4.064 s to 1.212 s.
2017-06-20 18:01:34 +02:00
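The single-pass lifting idea from commit fa55b52d19 can be illustrated on one row. The sketch below is not the OpenJPEG implementation: the function names, the floor_shift() helper and the restriction to a signal that starts on an even sample are assumptions made for the example. The two-pass version undoes the update step on all even samples and then the predict step on all odd samples; the single-pass version produces the same output while walking the data once.

```c
#include <assert.h>

/* floor(v / 2^s), written so it does not rely on arithmetic right shift. */
static int floor_shift(int v, int s)
{
    return v >= 0 ? (v >> s) : -((-v + (1 << s) - 1) >> s);
}

/* Reference: two passes over the row. low[0..sn-1] are low-pass and
 * high[0..dn-1] high-pass coefficients; out receives sn + dn samples. */
void idwt53_two_pass(const int *low, int sn, const int *high, int dn, int *out)
{
    int i;
    assert(dn >= 1 && (sn == dn || sn == dn + 1));
    /* Pass 1: undo the update step on even samples (symmetric extension). */
    for (i = 0; i < sn; i++) {
        int dl = high[i > 0 ? i - 1 : 0];
        int dr = high[i < dn ? i : dn - 1];
        out[2 * i] = low[i] - floor_shift(dl + dr + 2, 2);
    }
    /* Pass 2: undo the predict step on odd samples. */
    for (i = 0; i < dn; i++) {
        int sl = out[2 * i];
        int sr = out[2 * (i + 1 < sn ? i + 1 : sn - 1)];
        out[2 * i + 1] = high[i] + floor_shift(sl + sr, 1);
    }
}

/* Single pass: each iteration computes the next even sample, then uses it
 * together with the current one to emit the odd sample in between. */
void idwt53_single_pass(const int *low, int sn, const int *high, int dn, int *out)
{
    int i;
    int s_cur;
    assert(dn >= 1 && (sn == dn || sn == dn + 1));
    s_cur = low[0] - floor_shift(high[0] + high[0] + 2, 2);
    for (i = 0; i < dn; i++) {
        int s_next;
        if (i + 1 < sn) {
            int d_right = (i + 1 < dn) ? high[i + 1] : high[dn - 1];
            s_next = low[i + 1] - floor_shift(high[i] + d_right + 2, 2);
        } else {
            s_next = s_cur; /* mirror the last even sample */
        }
        out[2 * i] = s_cur;
        out[2 * i + 1] = high[i] + floor_shift(s_cur + s_next, 1);
        s_cur = s_next;
    }
    if (sn > dn) {
        out[2 * dn] = s_cur; /* trailing even sample for odd lengths */
    }
}
```

Touching each input and output location only once is what makes the approach attractive for the vertical pass, where consecutive samples of a column are far apart in memory.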
Even Rouault
919ed5f8b8
Add bench_dwt program (compiled only if BUILD_BENCH_DWT=ON)
2017-06-20 17:56:19 +02:00
Even Rouault
5c56933daf
Merge pull request #955 from rouault/remove_opj_nosanitize
...
Remove OPJ_NOSANITIZE in opj_bio_read() and opj_bio_write() (#761 )
2017-06-18 00:49:20 +02:00
Even Rouault
8df2521a60
Remove OPJ_NOSANITIZE in opj_bio_read() and opj_bio_write() ( #761 )
...
Commit 29313eb5 introduced those flags to avoid issues with
-fsanitize=unsigned-integer-overflow.
However, it is better to rewrite the loop so that this condition
cannot occur.
2017-06-17 19:15:00 +02:00
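The kind of loop rewrite described in commit 8df2521a60 might look like the sketch below. It is an illustration under assumptions, not the actual opj_bio_read() code: the function names and the bits[] input array are hypothetical. The first version only terminates because the unsigned counter wraps around, which -fsanitize=unsigned-integer-overflow reports; the second uses a signed counter so nothing wraps.

```c
#include <stdint.h>

/* Assemble n bits (most significant first) from bits[0..n-1], n <= 32.
 * The condition "i < n" becomes false only when i-- wraps from 0 to
 * UINT32_MAX, so the loop depends on unsigned overflow to exit. */
uint32_t read_bits_wrapping(const uint8_t *bits, uint32_t n)
{
    uint32_t i;
    uint32_t v = 0;
    for (i = n - 1; i < n; i--) {
        v |= (uint32_t)bits[n - 1 - i] << i;
    }
    return v;
}

/* Same result with a signed counter: the loop ends when i reaches -1,
 * and no unsigned arithmetic wraps along the way. Assumes n <= 32. */
uint32_t read_bits_signed(const uint8_t *bits, uint32_t n)
{
    int32_t i;
    uint32_t v = 0;
    for (i = (int32_t)n - 1; i >= 0; i--) {
        v |= (uint32_t)bits[n - 1 - (uint32_t)i] << i;
    }
    return v;
}
```

Both functions return the same value for every valid input, so the sanitizer annotation is no longer needed once the signed form is used.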
Even Rouault
32b20b93e0
Fix astyle issue
2017-06-17 16:37:56 +02:00
Even Rouault
5f596cb283
Fix warning about unused arguments
2017-06-17 14:10:15 +02:00
Even Rouault
cc07aec6c7
Fix warnings with recent GCC versions
2017-06-17 14:09:31 +02:00
Antonin Descampe
36dd87cea8
Merge pull request #928 from RussellMcOrmond/master
...
Quiet mode for opj_decompress via -quiet long parameter.
2017-06-14 17:23:06 +02:00
Even Rouault
9cbc9903c3
Merge branch 't1_flag_optimizations'
2017-06-13 12:09:52 +02:00
Even Rouault
2609fb8077
Packet header writing: set empty packet header bit to 0 when appropriate (small optimization)
2017-06-12 18:38:11 +02:00
Even Rouault
73d1510d47
Encoder: fix packet writing of empty sub-bands ( #891 , #892 )
...
There are situations where, given a tile size, at a resolution level,
there are sub-bands with x0==x1 or y0==y1, that consequently don't have any
valid codeblocks, but the other sub-bands may be non-empty.
Given that we recycle the memory from one tile to another one, those
ghost codeblocks might be non-0 and thus candidates for packet inclusion.
2017-06-12 18:37:50 +02:00
Even Rouault
81c5311758
T1: fix BYPASS/LAZY, TERMALL/RESTART and PTERM/ERTERM encoding modes. ( #674 )
...
There were a number of defects regarding when and how the termination of
passes had to be done and the computation of their rate.
2017-06-09 10:49:03 +02:00
Even Rouault
9a9b06911e
opj_t1_dec_sigpass_raw/opj_t1_dec_refpass_raw: harmonize style with mqc methods
2017-06-02 19:22:15 +02:00
Even Rouault
532243f1fd
MQC/RAW decoder: use an artificial 0xFF 0xFF terminating marker.
...
This saves comparing the current pointer with the end of buffer pointer.
This results in at least a tiny speed improvement for raw decoding, and
smaller code size for MQC as well.
This kills the remains of the raw.h/.c files that were only used for
decoding. Encoding already uses the mqc structure.
2017-06-02 18:24:07 +02:00
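The sentinel idea in commit 532243f1fd can be sketched as follows. This is a simplified illustration, not the OpenJPEG decoder: reader_t and next_byte() are hypothetical names, and the sketch assumes the caller has placed two 0xFF bytes right after the real data. Because 0xFF followed by a byte greater than 0x8F is treated as a marker, the reader stops advancing at the sentinel by itself and never needs to compare its pointer with the end of the buffer.

```c
/* Hypothetical byte reader. The buffer MUST end with the two sentinel
 * bytes 0xFF 0xFF appended after the real data. */
typedef struct {
    const unsigned char *bp; /* current read position */
} reader_t;

/* Return the next byte. On a marker (0xFF followed by a byte > 0x8F) the
 * position is not advanced and 0xFF is returned on every further call,
 * so the sentinel acts as a permanent end-of-data condition.
 * Simplification: a genuine trailing 0xFF data byte directly before the
 * sentinel is also seen as a marker in this sketch. */
unsigned next_byte(reader_t *r)
{
    if (r->bp[0] == 0xFF && r->bp[1] > 0x8F) {
        return 0xFF; /* marker reached: stay in place */
    }
    return *r->bp++;
}
```

With the buffer { 0x12, 0x34, 0xFF, 0xFF }, successive calls yield 0x12, 0x34 and then 0xFF indefinitely, with the read position resting on the first sentinel byte.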
Even Rouault
9b39fc4bcc
Fix documentation of opj_t1_decode_cblks()
2017-06-02 18:23:49 +02:00
Even Rouault
dde6cbabc0
Simplify VSC handling: instead of masking out bits when reading the 4th row.
...
Do not set them when updating flags of the 1st row
2017-06-02 18:23:38 +02:00
Even Rouault
3d9940a35b
Force inlining of mqc decoding and pass steps through heavy use of macros, so as to get better register allocation
2017-06-02 18:23:20 +02:00
Even Rouault
7e8b502842
t1_generate_luts.c: fix compiler warnings
2017-06-02 18:22:59 +02:00
Even Rouault
2ba861c37c
Optimize opj_t1_update_flags()
2017-06-02 18:22:42 +02:00
Even Rouault
a0861855c1
T1: remove use of neghalf variable. It is useless since bpno is always > 0
2017-06-02 18:22:21 +02:00
Even Rouault
10410fe72e
T1: avoid pointer indirection for mqc and raw members of opj_t1_t
2017-06-02 18:21:54 +02:00
Even Rouault
a5003787ff
T1: remove flags_stride variable from opj_t1_t
2017-06-02 18:21:39 +02:00
Even Rouault
0ec842e1f1
Inline opj_raw_decode()
2017-06-02 18:21:21 +02:00
Even Rouault
aa7a8a4398
T1: loop unrolling in dec_sigpass_raw and dec_refpass_raw
2017-06-02 18:20:58 +02:00
Even Rouault
68557ff503
T1: Transpose coder optimizations to decoder, and cleanup code
2017-06-02 18:20:35 +02:00
Even Rouault
1957a498b6
Fix compiler warnings
2017-05-23 17:06:46 +02:00
Even Rouault
40c0f42def
Factor index computation for lut_enc_ctxno_sc and lut_enc_spb
2017-05-23 17:06:46 +02:00
Even Rouault
d6907b9304
Slightly optimize opj_t1_enc_clnpass()
2017-05-23 17:06:46 +02:00
Even Rouault
c76a592131
T1: remove unused code in decoder
2017-05-23 17:06:46 +02:00
Even Rouault
4068363ff5
T1: fix VSC mode in encoder
2017-05-23 16:16:32 +02:00
Even Rouault
cd12414c6b
T1: use more compact flags to optimize cache usage in encoder passes. ( #172 )
...
Ported from Carl Hetherington's work (actually through Matthieu Darbois's port
on top of OpenJPEG 2.1.0).
Can reduce total encoding time by 10-15%.
WARNING: VSC mode is not implemented yet; this is a temporary regression
that must be fixed.
2017-05-23 16:16:32 +02:00
Even Rouault
53d46fc733
Merge pull request #936 from rouault/master_warnings
...
CMake: add stronger warnings for openjp2 lib/bin by default, and error out on declaration-after-statement
2017-05-23 16:15:55 +02:00
Even Rouault
a8ca7c51f3
CMake: add stronger warnings for openjp2 lib/bin by default, and error out on declaration-after-statement
...
And remove occurrences of unused arguments in src/lib/openjp2
2017-05-23 15:47:57 +02:00
Even Rouault
6e97d877b1
Merge pull request #935 from rouault/add_compress_vsc_test
...
Tests: test opj_compress in VSC mode (related to #172 )
2017-05-23 14:49:38 +02:00
Even Rouault
2d2c368b19
Tests: test opj_compress in VSC mode (related to #172 )
2017-05-23 14:31:39 +02:00
Even Rouault
8728cfbc79
t1.c: fix compiler warnings
2017-05-23 13:54:28 +02:00