lite-xl

Commit Graph

Author	SHA1	Message	Date
Guldoman	2d8d39c7c0	Skip patterns matching nothing in `tokenizer` (#1743) These patterns cause infinite loops, so warn about them and skip them.	2024-04-24 21:01:05 +01:00
Guldoman	234dd40e49	Fix patterns starting with `^` in `tokenizer` (#1645) Previously the "dirty" version of the pattern was used, which could result in trying to match with multiple `^`, which failed valid matches.	2023-12-26 13:16:33 +00:00
Guldoman	0ebf3c0393	Return state when tokenizing plaintext syntaxes	2023-08-07 14:51:14 +01:00
Guldoman	1c8c569fae	Allow `tokenizer` to pause and resume in the middle of a line (#1444)	2023-08-07 14:50:58 +01:00
Guldoman	d9925b7d44	Allow groups to be used in end delimiter patterns in tokenizer (#1317) * Allow empty groups as first match in tokenizer * Avoid pushing tokens with empty strings * Allow groups to be used in end delimiter in tokenizer * Use the first entry of the type table for the middle part of a subsyntax This applies to delimited matches with a table for `type` and without a `syntax` field. * Match only once if using `at_start` in tokenizer `find_text` * Check if match is escaped in the "close" case too Also allow continuing matching if the match was escaped.	2023-08-07 14:50:58 +01:00
xwii	271a804986	Fix popping subsyntaxes that end consecutively (#1246)	2022-12-27 20:24:52 -04:00
Guldoman	9d48441685	Add `regex.find_offsets`, `regex.find`, improve `regex.match` (#1232) `regex.match` now behaves like `string.match`. This required changes in the `tokenizer` and in the `detectindent` plugin.	2022-12-11 22:25:42 -04:00
Guldoman	0a1b8b6bb1	Set initial tokenizer state to a `NULL` byte	2022-11-15 16:01:04 +01:00
Guldoman	e147a6cb9b	Add `tokenizer.extract_subsyntaxes`	2022-11-15 16:00:48 +01:00
Jefferson González	b8a4f729df	tokenizer: remove the limit of 3 subsyntaxes depth (#1186) * tokenizer: remove the limit of 3 subsyntaxes depth Make the state a string of bytes instead of a 32bits integer to be able to have deeper subsyntax support. Fixes issues with syntax files like the one for PHP that was already hitting more than 3 subsyntaxes depth. * remove unnecesary call to set_subsyntax_pattern_idx * fixed wrong word on comments	2022-11-03 18:56:20 -04:00
Jefferson González	880e6e4f0f	Merge pull request #1040 from Guldoman/PR_tokenizer_errors_alert Add more tokenizer errors/warnings	2022-06-22 19:43:51 -04:00
Jefferson González	d2fd5c9df7	Merge pull request #1034 from Guldoman/PR_escape_start_patterns Check if "open" pattern is escaped	2022-06-15 16:51:34 -04:00
Guldoman	d169619f69	Warn if token type is a table when not needed	2022-06-15 21:31:16 +02:00
Guldoman	2e37e85a48	Add helper function to report bad patterns in tokenizer	2022-06-15 21:28:46 +02:00
Guldoman	5027a0f12b	Fix malformed pattern check for group patterns in tokenizer If the token type was a simple string (and not a table), the size of the string was used instead of `1`.	2022-06-15 19:33:58 +02:00
Guldoman	5b6b48320f	Check if "open" pattern is escaped Previously this check was only done for "close" patterns.	2022-06-12 04:19:05 +02:00
Guldoman	c947e8a4d1	Convert more byte offsets to utf-8 pos in regex tokenizer	2022-06-12 02:55:36 +02:00
Guldoman	d8efb1ab53	Show error if language plugin pattern has mismatching number of groups The number of results from a pattern with groups must never be greater than the number of token types for that pattern. Also if a token type was undefined, it's now pushed as a `normal` one.	2022-05-31 02:05:37 +02:00
Guldoman	7ac776bef6	Fix UTF-8 matches in regex group `tokenizer`	2022-05-31 01:59:14 +02:00
Guldoman	2a41002355	Allow using regex groups to split tokens Before, this was only supported by Lua patterns. This expects the regex to use the same syntax used for patterns. That is, the token should be split by empty groups.	2022-05-28 01:38:22 +02:00
jgmdev	94430bcbd2	tokenizer: fix next utf8 char retrieval bug	2022-05-13 11:21:46 -04:00
Jefferson González	e572c58f24	Add utf8 support to tokenizer (#945) * add utf8 support to tokenizer * wrap utf8 functions in string table using a 'u' prefix * document new utf8 functions	2022-04-26 09:42:02 -04:00
Guldoman	caefc9112a	Force syntax patterns starting with `^` to match with the whole line Before, syntax patterns/regexes that started with `^` didn't have the desired effect of matching with the start of the line. Now those patterns are used only when matching the whole line.	2022-03-04 11:27:01 +01:00
Guldoman	51975472a9	Add bit32 polyfill globally	2022-01-12 00:07:53 +01:00
Jan200101	99ddf1fb92	Migrate to Lua 5.4	2021-12-31 13:53:01 +01:00
Guldoman	4faaf089ef	Consume unmatched character correctly We must consume the whole UTF-8 character, not just a single byte.	2021-12-11 03:43:33 +01:00
Adam Harrison	96db380c73	Manual merge of into .	2021-11-23 15:57:22 -05:00
Francesco Abbate	5cdd800910	Fix problem checking utf-8 cont at end of string	2021-10-23 15:03:09 +02:00
Guldoman	8a516d35ce	Correctly identify the start of the next character in `tokenizer` When moving to the next character, we have to consider that the current one might be multi-byte.	2021-10-11 22:37:31 +02:00
takase1121	30ccde896d	replace unpack() with table.unpack() I have no idea unpack() is still used and how it still worked.	2021-08-29 09:14:12 +08:00
Adam	248d70a8ca	Add PCRE to support regular expressions Use regular expressions instead of Lua patterns for find and replace editor commands. Syntax files can now use regex or Lua patterns as before keeping backward compatibility for plugins.	2021-06-02 21:27:00 +02:00
Adam	949692860e	Tokenizer cleanup (#198) * Cleaned up tokenizer to make subsyntax operations more clear. * Explanatory comments. * Made it so push_subsyntax could be safely called elsewhere. * Unified terminology. * Minor bug fix. * State is an incredibly vaguely named variable. Changed convention to represent what it actually is. * Also changed function name. * Fixed bug.	2021-05-20 21:58:27 +02:00
liquidev	86a7037ed9	support for multiple groups in one pattern (#196)	2021-05-19 22:35:28 +02:00
lqdev	ba4fbde33d	fixed mixed indentation	2021-05-18 17:52:18 +02:00
adamharrison	3fe6665b9a	Nested Syntax Highlighting (#160)	2021-05-01 11:45:30 +02:00
rxi	6525269386	Made tokenizer skip parsing process on plain-text files This, along with the earlier rencache changes should resolve #64	2020-05-14 10:10:50 +01:00
rxi	f5025efbb8	Moved highlighter code from `DocView` to `Doc` * Only one highlighter state is kept per-document as opposed to one per-docview * Fixes a bug with retaining older highlighter state as a DocView wasn't able to detect lines changing above it's viewport * Renames `highlighter` module to more descriptive `tokenizer`	2020-05-07 21:14:46 +01:00

37 Commits