20 Commits

Author SHA1 Message Date
Konstantin Ritt 529a933128 Micro optimization to hb_utf16_t and hb_utf32_t ::prev()
Implement reverse lookup instead of re-using next()
2015-11-07 02:00:04 +04:00
Konstantin Ritt 44ae9be7a2 Nano optimization to hb_utf16_t and hb_utf32_t ::next() 2015-11-07 01:58:38 +04:00
Behdad Esfahbod 61820bc4ca [API] Add hb_buffer_add_latin1()
This is in no way meant to promote non-Unicode encodings.  This is an entry
point that takes Unicode codepoints that happen to all be among the first
256 characters and hence fit in 8-bit strings.  This is useful, e.g., in Chrome,
where strings that can fit in 8 bits are implemented that way, and this
avoids copying into UTF-8 or UTF-16.

Perhaps we should rename this to hb_buffer_add_codepoints8().  I'm also
curious if anyone would be really interested in hb_buffer_add_codepoints16().

Please discuss!
2015-01-26 14:25:52 -08:00
Behdad Esfahbod a4d643755a Minor 2014-07-16 20:15:45 -04:00
Behdad Esfahbod 976c8f4552 New API: hb_buffer_[sg]et_replacement_codepoint()
With this change, we now by default replace broken UTF-8/16/32 sequences
with U+FFFD.  This can be changed by calling the new API on the buffer.
Previously, the replacement value was (hb_codepoint_t)-1.

Note that hb_buffer_clear_contents() does NOT reset the replacement
character.

See discussion here:

6f13b6d62d

New API:

  hb_buffer_set_replacement_codepoint()
  hb_buffer_get_replacement_codepoint()
2014-07-16 15:34:20 -04:00
Behdad Esfahbod 625dbf141a [buffer] Templatize UTF-* functions 2014-07-16 14:52:59 -04:00
Behdad Esfahbod e634fed428 [buffer] Validate UTF-32 input
Same as what we do for UTF-8 and UTF-16.
2014-07-16 14:17:26 -04:00
Behdad Esfahbod b7bc0b671d Simplify / speed up UTF-8 code 2014-07-11 16:22:13 -04:00
Behdad Esfahbod af2490c095 Only accept well-formed UTF-8 sequences
Enable tests that were disabled before, and adjust one test,
and add more tests.
2014-07-11 16:22:13 -04:00
Behdad Esfahbod 7323d385cc Simplify hb_utf_prev<16> to call hb_utf_next<16> 2014-07-11 16:22:13 -04:00
Behdad Esfahbod 7627100f42 Mark unsigned integer literals with the u suffix
Simplifies hb_in_range() calls as the type can be inferred.
The rest is obsessiveness, I admit.
2014-07-11 16:22:13 -04:00
Behdad Esfahbod db8934faa1 Simplify hb_utf_prev<8> to call hb_utf_next<8> 2014-07-11 13:58:36 -04:00
Behdad Esfahbod 6f13b6d62d When parsing UTF-16, generate invalid codepoint for lonely low surrogate
Test passes now.
2014-07-10 19:39:39 -04:00
Behdad Esfahbod 0beb66e3a6 Fix warnings 2012-12-05 19:14:28 -05:00
Behdad Esfahbod e13f8d280b Fix UTF-8 backward iteration
Ouch!
2012-11-13 15:12:06 -08:00
Behdad Esfahbod 89ac39dbbe Add hb_utf_prev() 2012-09-25 13:59:24 -04:00
Behdad Esfahbod 70ea4ac688 Slightly optimize UTF-8 parsing 2012-09-25 12:30:16 -04:00
Behdad Esfahbod 4445e5e2ec [buffer] Cleanup / optimize UTF-16 parsing a bit 2012-09-25 12:26:12 -04:00
Behdad Esfahbod 1f66c3c1a0 Add hb_utf_strlen()
Speeds up UTF-8 parsing by calling strlen().
2012-09-25 11:42:16 -04:00
Behdad Esfahbod 7f19ae7b9f [buffer] Templatize UTF handling
Also move UTF routines into a separate file, to be reused from shapers
that need it.
2012-09-25 11:23:55 -04:00
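
Several of the commits above describe decoding behaviour that is easier to follow with a small example: lonely surrogates becoming a replacement codepoint (6f13b6d62d, 976c8f4552), a prev() that looks backwards directly instead of re-using next() (529a933128), and Latin-1 input where each byte already is a codepoint (61820bc4ca). The following is a minimal, self-contained sketch of those ideas. It is not HarfBuzz's actual code: the names utf16_next, utf16_prev, decode_utf16, decode_latin1 and kReplacement are hypothetical and chosen for illustration only.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical default, mirroring the U+FFFD default described in 976c8f4552.
constexpr uint32_t kReplacement = 0xFFFDu;

// Decode the code point starting at text[i] and advance i past it.
// An unpaired surrogate consumes one unit and yields `replacement`.
inline uint32_t utf16_next(const uint16_t *text, size_t len, size_t &i,
                           uint32_t replacement = kReplacement)
{
  assert(i < len);
  uint32_t c = text[i++];
  if (c < 0xD800u || c > 0xDFFFu)
    return c;                                  // not a surrogate
  if (c <= 0xDBFFu && i < len) {               // high surrogate with a follower
    uint32_t l = text[i];
    if (l >= 0xDC00u && l <= 0xDFFFu) {        // follower is a low surrogate
      ++i;
      return ((c - 0xD800u) << 10) + (l - 0xDC00u) + 0x10000u;
    }
  }
  return replacement;                          // lonely high or low surrogate
}

// Decode the code point ending just before text[i] and move i back to its
// start. This looks backwards directly rather than calling utf16_next().
inline uint32_t utf16_prev(const uint16_t *text, size_t &i,
                           uint32_t replacement = kReplacement)
{
  assert(i > 0);
  uint32_t c = text[--i];
  if (c < 0xD800u || c > 0xDFFFu)
    return c;                                  // not a surrogate
  if (c >= 0xDC00u && i > 0) {                 // low surrogate with a predecessor
    uint32_t h = text[i - 1];
    if (h >= 0xD800u && h <= 0xDBFFu) {        // predecessor is a high surrogate
      --i;
      return ((h - 0xD800u) << 10) + (c - 0xDC00u) + 0x10000u;
    }
  }
  return replacement;                          // lonely high or low surrogate
}

// Decode a whole UTF-16 string, substituting `replacement` for broken input.
inline std::vector<uint32_t> decode_utf16(const uint16_t *text, size_t len,
                                          uint32_t replacement = kReplacement)
{
  std::vector<uint32_t> out;
  for (size_t i = 0; i < len;)
    out.push_back(utf16_next(text, len, i, replacement));
  return out;
}

// Latin-1 needs no decoding: every byte is the code point U+0000..U+00FF.
inline std::vector<uint32_t> decode_latin1(const uint8_t *text, size_t len)
{
  return std::vector<uint32_t>(text, text + len);
}
```

Passing (uint32_t)-1 as the replacement argument would mimic the behaviour the log describes from before 976c8f4552; in HarfBuzz itself the equivalent knob is hb_buffer_set_replacement_codepoint() on the buffer.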