initial commit

Signed-off-by: Peter Siegmund <mars3142@noreply.mars3142.dev>
2025-10-31 23:37:30 +01:00
commit 7228269764
9653 changed files with 4034514 additions and 0 deletions

@@ -0,0 +1,3 @@
testdata/grepinputv
testdata/grepinputx

@@ -0,0 +1,643 @@
This is a file of miscellaneous text that is used as test data for checking
that the pcregrep command is working correctly. The file must be more than
24KiB long so that it needs more than a single read() call to process it. New
features should be added at the end, because some of the tests involve the
output of line numbers, and we don't want these to change.
PATTERN at the start of a line.
In the middle of a line, PATTERN appears.
This pattern is in lower case.
Here follows a whole lot of stuff that makes the file over 24KiB long.
-------------------------------------------------------------------------------
The quick brown fox jumps over the lazy dog. The quick brown fox jumps over the
lazy dog. The quick brown fox jumps over the lazy dog. The quick brown fox
jumps over the lazy dog. The quick brown fox jumps over the lazy dog. The quick
brown fox jumps over the lazy dog. The quick brown fox jumps over the lazy dog.
The quick brown fox jumps over the lazy dog. The quick brown fox jumps over the
lazy dog. The quick brown fox jumps over the lazy dog. The quick brown fox
jumps over the lazy dog. The quick brown fox jumps over the lazy dog. The quick
brown fox jumps over the lazy dog. The quick brown fox jumps over the lazy dog.
The quick brown fox jumps over the lazy dog. The quick brown fox jumps over the
lazy dog. The quick brown fox jumps over the lazy dog. The quick brown fox
jumps over the lazy dog. The quick brown fox jumps over the lazy dog. The quick
brown fox jumps over the lazy dog. The quick brown fox jumps over the lazy dog.
The quick brown fox jumps over the lazy dog. The quick brown fox jumps over the
lazy dog. The quick brown fox jumps over the lazy dog. The quick brown fox
jumps over the lazy dog. The quick brown fox jumps over the lazy dog. The quick
brown fox jumps over the lazy dog. The quick brown fox jumps over the lazy dog.
The quick brown fox jumps over the lazy dog. The quick brown fox jumps over the
lazy dog. The quick brown fox jumps over the lazy dog. The quick brown fox
jumps over the lazy dog. The quick brown fox jumps over the lazy dog. The quick
brown fox jumps over the lazy dog. The quick brown fox jumps over the lazy dog.
The quick brown fox jumps over the lazy dog. The quick brown fox jumps over the
lazy dog. The quick brown fox jumps over the lazy dog. The quick brown fox
jumps over the lazy dog. The quick brown fox jumps over the lazy dog. The quick
brown fox jumps over the lazy dog. The quick brown fox jumps over the lazy dog.
The quick brown fox jumps over the lazy dog. The quick brown fox jumps over the
lazy dog. The quick brown fox jumps over the lazy dog. The quick brown fox
jumps over the lazy dog. The quick brown fox jumps over the lazy dog. The quick
brown fox jumps over the lazy dog. The quick brown fox jumps over the lazy dog.
The quick brown fox jumps over the lazy dog. The quick brown fox jumps over the
lazy dog. The quick brown fox jumps over the lazy dog. The quick brown fox
jumps over the lazy dog. The quick brown fox jumps over the lazy dog. The quick
brown fox jumps over the lazy dog. The quick brown fox jumps over the lazy dog.
The quick brown fox jumps over the lazy dog. The quick brown fox jumps over the
lazy dog. The quick brown fox jumps over the lazy dog. The quick brown fox
jumps over the lazy dog. The quick brown fox jumps over the lazy dog. The quick
brown fox jumps over the lazy dog. The quick brown fox jumps over the lazy dog.
The quick brown fox jumps over the lazy dog. The quick brown fox jumps over the
lazy dog. The quick brown fox jumps over the lazy dog. The quick brown fox
jumps over the lazy dog. The quick brown fox jumps over the lazy dog. The quick
brown fox jumps over the lazy dog. The quick brown fox jumps over the lazy dog.
The quick brown fox jumps over the lazy dog. The quick brown fox jumps over the
lazy dog. The quick brown fox jumps over the lazy dog. The quick brown fox
jumps over the lazy dog. The quick brown fox jumps over the lazy dog. The quick
brown fox jumps over the lazy dog. The quick brown fox jumps over the lazy dog.
The quick brown fox jumps over the lazy dog. The quick brown fox jumps over the
lazy dog. The quick brown fox jumps over the lazy dog. The quick brown fox
jumps over the lazy dog. The quick brown fox jumps over the lazy dog. The quick
brown fox jumps over the lazy dog. The quick brown fox jumps over the lazy dog.
The quick brown fox jumps over the lazy dog. The quick brown fox jumps over the
lazy dog. The quick brown fox jumps over the lazy dog. The quick brown fox
jumps over the lazy dog. The quick brown fox jumps over the lazy dog. The quick
brown fox jumps over the lazy dog. The quick brown fox jumps over the lazy dog.
The quick brown fox jumps over the lazy dog. The quick brown fox jumps over the
lazy dog. The quick brown fox jumps over the lazy dog. The quick brown fox
jumps over the lazy dog. The quick brown fox jumps over the lazy dog. The quick
brown fox jumps over the lazy dog. The quick brown fox jumps over the lazy dog.
The quick brown fox jumps over the lazy dog. The quick brown fox jumps over the
lazy dog. The quick brown fox jumps over the lazy dog. The quick brown fox
jumps over the lazy dog. The quick brown fox jumps over the lazy dog. The quick
brown fox jumps over the lazy dog. The quick brown fox jumps over the lazy dog.
The quick brown fox jumps over the lazy dog. The quick brown fox jumps over the
lazy dog. The quick brown fox jumps over the lazy dog. The quick brown fox
jumps over the lazy dog. The quick brown fox jumps over the lazy dog. The quick
brown fox jumps over the lazy dog. The quick brown fox jumps over the lazy dog.
The quick brown fox jumps over the lazy dog. The quick brown fox jumps over the
lazy dog. The quick brown fox jumps over the lazy dog. The quick brown fox
jumps over the lazy dog. The quick brown fox jumps over the lazy dog. The quick
brown fox jumps over the lazy dog. The quick brown fox jumps over the lazy dog.
The quick brown fox jumps over the lazy dog. The quick brown fox jumps over the
lazy dog. The quick brown fox jumps over the lazy dog. The quick brown fox
jumps over the lazy dog. The quick brown fox jumps over the lazy dog. The quick
brown fox jumps over the lazy dog. The quick brown fox jumps over the lazy dog.
The quick brown fox jumps over the lazy dog. The quick brown fox jumps over the
lazy dog. The quick brown fox jumps over the lazy dog. The quick brown fox
jumps over the lazy dog. The quick brown fox jumps over the lazy dog. The quick
brown fox jumps over the lazy dog. The quick brown fox jumps over the lazy dog.
The quick brown fox jumps over the lazy dog. The quick brown fox jumps over the
lazy dog. The quick brown fox jumps over the lazy dog. The quick brown fox
jumps over the lazy dog. The quick brown fox jumps over the lazy dog. The quick
brown fox jumps over the lazy dog. The quick brown fox jumps over the lazy dog.
The quick brown fox jumps over the lazy dog. The quick brown fox jumps over the
lazy dog. The quick brown fox jumps over the lazy dog. The quick brown fox
jumps over the lazy dog. The quick brown fox jumps over the lazy dog. The quick
brown fox jumps over the lazy dog. The quick brown fox jumps over the lazy dog.
The quick brown fox jumps over the lazy dog. The quick brown fox jumps over the
lazy dog. The quick brown fox jumps over the lazy dog. The quick brown fox
jumps over the lazy dog. The quick brown fox jumps over the lazy dog. The quick
brown fox jumps over the lazy dog. The quick brown fox jumps over the lazy dog.
The quick brown fox jumps over the lazy dog. The quick brown fox jumps over the
lazy dog. The quick brown fox jumps over the lazy dog. The quick brown fox
jumps over the lazy dog. The quick brown fox jumps over the lazy dog. The quick
brown fox jumps over the lazy dog. The quick brown fox jumps over the lazy dog.
The quick brown fox jumps over the lazy dog. The quick brown fox jumps over the
lazy dog. The quick brown fox jumps over the lazy dog. The quick brown fox
jumps over the lazy dog. The quick brown fox jumps over the lazy dog. The quick
brown fox jumps over the lazy dog. The quick brown fox jumps over the lazy dog.
The quick brown fox jumps over the lazy dog. The quick brown fox jumps over the
lazy dog. The quick brown fox jumps over the lazy dog. The quick brown fox
jumps over the lazy dog. The quick brown fox jumps over the lazy dog. The quick
brown fox jumps over the lazy dog. The quick brown fox jumps over the lazy dog.
The quick brown fox jumps over the lazy dog. The quick brown fox jumps over the
lazy dog. The quick brown fox jumps over the lazy dog. The quick brown fox
jumps over the lazy dog. The quick brown fox jumps over the lazy dog. The quick
brown fox jumps over the lazy dog. The quick brown fox jumps over the lazy dog.
The quick brown fox jumps over the lazy dog. The quick brown fox jumps over the
lazy dog. The quick brown fox jumps over the lazy dog. The quick brown fox
jumps over the lazy dog. The quick brown fox jumps over the lazy dog. The quick
brown fox jumps over the lazy dog. The quick brown fox jumps over the lazy dog.
The quick brown fox jumps over the lazy dog. The quick brown fox jumps over the
lazy dog. The quick brown fox jumps over the lazy dog. The quick brown fox
jumps over the lazy dog. The quick brown fox jumps over the lazy dog. The quick
brown fox jumps over the lazy dog. The quick brown fox jumps over the lazy dog.
The quick brown fox jumps over the lazy dog. The quick brown fox jumps over the
lazy dog. The quick brown fox jumps over the lazy dog. The quick brown fox
jumps over the lazy dog. The quick brown fox jumps over the lazy dog. The quick
brown fox jumps over the lazy dog. The quick brown fox jumps over the lazy dog.
The quick brown fox jumps over the lazy dog. The quick brown fox jumps over the
lazy dog. The quick brown fox jumps over the lazy dog. The quick brown fox
jumps over the lazy dog. The quick brown fox jumps over the lazy dog. The quick
brown fox jumps over the lazy dog. The quick brown fox jumps over the lazy dog.
The quick brown fox jumps over the lazy dog. The quick brown fox jumps over the
lazy dog. The quick brown fox jumps over the lazy dog. The quick brown fox
jumps over the lazy dog. The quick brown fox jumps over the lazy dog. The quick
brown fox jumps over the lazy dog. The quick brown fox jumps over the lazy dog.
The quick brown fox jumps over the lazy dog. The quick brown fox jumps over the
lazy dog. The quick brown fox jumps over the lazy dog. The quick brown fox
jumps over the lazy dog. The quick brown fox jumps over the lazy dog. The quick
brown fox jumps over the lazy dog. The quick brown fox jumps over the lazy dog.
The quick brown fox jumps over the lazy dog. The quick brown fox jumps over the
lazy dog. The quick brown fox jumps over the lazy dog. The quick brown fox
jumps over the lazy dog. The quick brown fox jumps over the lazy dog. The quick
brown fox jumps over the lazy dog. The quick brown fox jumps over the lazy dog.
The quick brown fox jumps over the lazy dog. The quick brown fox jumps over the
lazy dog. The quick brown fox jumps over the lazy dog. The quick brown fox
jumps over the lazy dog. The quick brown fox jumps over the lazy dog. The quick
brown fox jumps over the lazy dog. The quick brown fox jumps over the lazy dog.
The quick brown fox jumps over the lazy dog. The quick brown fox jumps over the
lazy dog. The quick brown fox jumps over the lazy dog. The quick brown fox
jumps over the lazy dog. The quick brown fox jumps over the lazy dog. The quick
brown fox jumps over the lazy dog. The quick brown fox jumps over the lazy dog.
The quick brown fox jumps over the lazy dog. The quick brown fox jumps over the
lazy dog. The quick brown fox jumps over the lazy dog. The quick brown fox
jumps over the lazy dog. The quick brown fox jumps over the lazy dog. The quick
brown fox jumps over the lazy dog. The quick brown fox jumps over the lazy dog.
The quick brown fox jumps over the lazy dog. The quick brown fox jumps over the
lazy dog. The quick brown fox jumps over the lazy dog. The quick brown fox
jumps over the lazy dog. The quick brown fox jumps over the lazy dog. The quick
brown fox jumps over the lazy dog. The quick brown fox jumps over the lazy dog.
The quick brown fox jumps over the lazy dog. The quick brown fox jumps over the
lazy dog. The quick brown fox jumps over the lazy dog. The quick brown fox
jumps over the lazy dog. The quick brown fox jumps over the lazy dog. The quick
brown fox jumps over the lazy dog. The quick brown fox jumps over the lazy dog.
The quick brown fox jumps over the lazy dog. The quick brown fox jumps over the
lazy dog. The quick brown fox jumps over the lazy dog. The quick brown fox
jumps over the lazy dog. The quick brown fox jumps over the lazy dog. The quick
brown fox jumps over the lazy dog. The quick brown fox jumps over the lazy dog.
The quick brown fox jumps over the lazy dog. The quick brown fox jumps over the
lazy dog. The quick brown fox jumps over the lazy dog. The quick brown fox
jumps over the lazy dog. The quick brown fox jumps over the lazy dog. The quick
brown fox jumps over the lazy dog. The quick brown fox jumps over the lazy dog.
The quick brown fox jumps over the lazy dog. The quick brown fox jumps over the
lazy dog. The quick brown fox jumps over the lazy dog. The quick brown fox
jumps over the lazy dog. The quick brown fox jumps over the lazy dog. The quick
brown fox jumps over the lazy dog. The quick brown fox jumps over the lazy dog.
The quick brown fox jumps over the lazy dog. The quick brown fox jumps over the
lazy dog. The quick brown fox jumps over the lazy dog. The quick brown fox
jumps over the lazy dog. The quick brown fox jumps over the lazy dog. The quick
brown fox jumps over the lazy dog. The quick brown fox jumps over the lazy dog.
The quick brown fox jumps over the lazy dog. The quick brown fox jumps over the
lazy dog. The quick brown fox jumps over the lazy dog. The quick brown fox
jumps over the lazy dog. The quick brown fox jumps over the lazy dog. The quick
brown fox jumps over the lazy dog. The quick brown fox jumps over the lazy dog.
The quick brown fox jumps over the lazy dog. The quick brown fox jumps over the
lazy dog. The quick brown fox jumps over the lazy dog. The quick brown fox
jumps over the lazy dog. The quick brown fox jumps over the lazy dog. The quick
brown fox jumps over the lazy dog. The quick brown fox jumps over the lazy dog.
The quick brown fox jumps over the lazy dog. The quick brown fox jumps over the
lazy dog. The quick brown fox jumps over the lazy dog. The quick brown fox
jumps over the lazy dog. The quick brown fox jumps over the lazy dog. The quick
brown fox jumps over the lazy dog. The quick brown fox jumps over the lazy dog.
The quick brown fox jumps over the lazy dog. The quick brown fox jumps over the
lazy dog. The quick brown fox jumps over the lazy dog. The quick brown fox
jumps over the lazy dog. The quick brown fox jumps over the lazy dog. The quick
brown fox jumps over the lazy dog. The quick brown fox jumps over the lazy dog.
The quick brown fox jumps over the lazy dog. The quick brown fox jumps over the
lazy dog. The quick brown fox jumps over the lazy dog. The quick brown fox
jumps over the lazy dog. The quick brown fox jumps over the lazy dog. The quick
brown fox jumps over the lazy dog. The quick brown fox jumps over the lazy dog.
The quick brown fox jumps over the lazy dog. The quick brown fox jumps over the
lazy dog. The quick brown fox jumps over the lazy dog. The quick brown fox
jumps over the lazy dog. The quick brown fox jumps over the lazy dog. The quick
brown fox jumps over the lazy dog. The quick brown fox jumps over the lazy dog.
The quick brown fox jumps over the lazy dog. The quick brown fox jumps over the
lazy dog. The quick brown fox jumps over the lazy dog. The quick brown fox
jumps over the lazy dog. The quick brown fox jumps over the lazy dog. The quick
brown fox jumps over the lazy dog. The quick brown fox jumps over the lazy dog.
The quick brown fox jumps over the lazy dog. The quick brown fox jumps over the
lazy dog. The quick brown fox jumps over the lazy dog. The quick brown fox
jumps over the lazy dog. The quick brown fox jumps over the lazy dog. The quick
brown fox jumps over the lazy dog. The quick brown fox jumps over the lazy dog.
The quick brown fox jumps over the lazy dog. The quick brown fox jumps over the
lazy dog. The quick brown fox jumps over the lazy dog. The quick brown fox
jumps over the lazy dog. The quick brown fox jumps over the lazy dog. The quick
brown fox jumps over the lazy dog. The quick brown fox jumps over the lazy dog.
The quick brown fox jumps over the lazy dog. The quick brown fox jumps over the
lazy dog. The quick brown fox jumps over the lazy dog. The quick brown fox
jumps over the lazy dog. The quick brown fox jumps over the lazy dog. The quick
brown fox jumps over the lazy dog. The quick brown fox jumps over the lazy dog.
The quick brown fox jumps over the lazy dog. The quick brown fox jumps over the
lazy dog. The quick brown fox jumps over the lazy dog. The quick brown fox
jumps over the lazy dog. The quick brown fox jumps over the lazy dog. The quick
brown fox jumps over the lazy dog. The quick brown fox jumps over the lazy dog.
The quick brown fox jumps over the lazy dog. The quick brown fox jumps over the
lazy dog. The quick brown fox jumps over the lazy dog. The quick brown fox
jumps over the lazy dog. The quick brown fox jumps over the lazy dog. The quick
brown fox jumps over the lazy dog. The quick brown fox jumps over the lazy dog.
The quick brown fox jumps over the lazy dog. The quick brown fox jumps over the
lazy dog. The quick brown fox jumps over the lazy dog. The quick brown fox
jumps over the lazy dog. The quick brown fox jumps over the lazy dog. The quick
brown fox jumps over the lazy dog. The quick brown fox jumps over the lazy dog.
The quick brown fox jumps over the lazy dog. The quick brown fox jumps over the
lazy dog. The quick brown fox jumps over the lazy dog. The quick brown fox
jumps over the lazy dog. The quick brown fox jumps over the lazy dog. The quick
brown fox jumps over the lazy dog. The quick brown fox jumps over the lazy dog.
The quick brown fox jumps over the lazy dog. The quick brown fox jumps over the
lazy dog. The quick brown fox jumps over the lazy dog. The quick brown fox
jumps over the lazy dog. The quick brown fox jumps over the lazy dog. The quick
brown fox jumps over the lazy dog. The quick brown fox jumps over the lazy dog.
The quick brown fox jumps over the lazy dog. The quick brown fox jumps over the
lazy dog. The quick brown fox jumps over the lazy dog. The quick brown fox
jumps over the lazy dog. The quick brown fox jumps over the lazy dog. The quick
brown fox jumps over the lazy dog. The quick brown fox jumps over the lazy dog.
The quick brown fox jumps over the lazy dog. The quick brown fox jumps over the
lazy dog. The quick brown fox jumps over the lazy dog. The quick brown fox
jumps over the lazy dog. The quick brown fox jumps over the lazy dog. The quick
brown fox jumps over the lazy dog. The quick brown fox jumps over the lazy dog.
The quick brown fox jumps over the lazy dog. The quick brown fox jumps over the
lazy dog. The quick brown fox jumps over the lazy dog. The quick brown fox
jumps over the lazy dog. The quick brown fox jumps over the lazy dog. The quick
brown fox jumps over the lazy dog. The quick brown fox jumps over the lazy dog.
The quick brown fox jumps over the lazy dog. The quick brown fox jumps over the
lazy dog. The quick brown fox jumps over the lazy dog. The quick brown fox
jumps over the lazy dog. The quick brown fox jumps over the lazy dog. The quick
brown fox jumps over the lazy dog. The quick brown fox jumps over the lazy dog.
The quick brown fox jumps over the lazy dog. The quick brown fox jumps over the
lazy dog. The quick brown fox jumps over the lazy dog. The quick brown fox
jumps over the lazy dog. The quick brown fox jumps over the lazy dog. The quick
brown fox jumps over the lazy dog. The quick brown fox jumps over the lazy dog.
The quick brown fox jumps over the lazy dog. The quick brown fox jumps over the
lazy dog. The quick brown fox jumps over the lazy dog. The quick brown fox
jumps over the lazy dog. The quick brown fox jumps over the lazy dog. The quick
brown fox jumps over the lazy dog. The quick brown fox jumps over the lazy dog.
The quick brown fox jumps over the lazy dog. The quick brown fox jumps over the
lazy dog. The quick brown fox jumps over the lazy dog. The quick brown fox
jumps over the lazy dog. The quick brown fox jumps over the lazy dog. The quick
brown fox jumps over the lazy dog. The quick brown fox jumps over the lazy dog.
The quick brown fox jumps over the lazy dog. The quick brown fox jumps over the
lazy dog. The quick brown fox jumps over the lazy dog. The quick brown fox
jumps over the lazy dog. The quick brown fox jumps over the lazy dog. The quick
brown fox jumps over the lazy dog. The quick brown fox jumps over the lazy dog.
The quick brown fox jumps over the lazy dog. The quick brown fox jumps over the
lazy dog. The quick brown fox jumps over the lazy dog. The quick brown fox
jumps over the lazy dog. The quick brown fox jumps over the lazy dog. The quick
brown fox jumps over the lazy dog. The quick brown fox jumps over the lazy dog.
The quick brown fox jumps over the lazy dog. The quick brown fox jumps over the
lazy dog. The quick brown fox jumps over the lazy dog. The quick brown fox
jumps over the lazy dog. The quick brown fox jumps over the lazy dog. The quick
brown fox jumps over the lazy dog. The quick brown fox jumps over the lazy dog.
The quick brown fox jumps over the lazy dog. The quick brown fox jumps over the
lazy dog. The quick brown fox jumps over the lazy dog. The quick brown fox
jumps over the lazy dog. The quick brown fox jumps over the lazy dog. The quick
brown fox jumps over the lazy dog. The quick brown fox jumps over the lazy dog.
The quick brown fox jumps over the lazy dog. The quick brown fox jumps over the
lazy dog. The quick brown fox jumps over the lazy dog. The quick brown fox
jumps over the lazy dog. The quick brown fox jumps over the lazy dog. The quick
brown fox jumps over the lazy dog. The quick brown fox jumps over the lazy dog.
The quick brown fox jumps over the lazy dog. The quick brown fox jumps over the
lazy dog. The quick brown fox jumps over the lazy dog. The quick brown fox
jumps over the lazy dog. The quick brown fox jumps over the lazy dog. The quick
brown fox jumps over the lazy dog. The quick brown fox jumps over the lazy dog.
The quick brown fox jumps over the lazy dog. The quick brown fox jumps over the
lazy dog. The quick brown fox jumps over the lazy dog. The quick brown fox
jumps over the lazy dog. The quick brown fox jumps over the lazy dog. The quick
brown fox jumps over the lazy dog. The quick brown fox jumps over the lazy dog.
The quick brown fox jumps over the lazy dog. The quick brown fox jumps over the
lazy dog. The quick brown fox jumps over the lazy dog. The quick brown fox
jumps over the lazy dog. The quick brown fox jumps over the lazy dog. The quick
brown fox jumps over the lazy dog. The quick brown fox jumps over the lazy dog.
The quick brown fox jumps over the lazy dog. The quick brown fox jumps over the
lazy dog. The quick brown fox jumps over the lazy dog. The quick brown fox
jumps over the lazy dog. The quick brown fox jumps over the lazy dog. The quick
brown fox jumps over the lazy dog. The quick brown fox jumps over the lazy dog.
The quick brown fox jumps over the lazy dog. The quick brown fox jumps over the
lazy dog. The quick brown fox jumps over the lazy dog. The quick brown fox
jumps over the lazy dog. The quick brown fox jumps over the lazy dog. The quick
brown fox jumps over the lazy dog. The quick brown fox jumps over the lazy dog.
The quick brown fox jumps over the lazy dog. The quick brown fox jumps over the
lazy dog. The quick brown fox jumps over the lazy dog. The quick brown fox
jumps over the lazy dog. The quick brown fox jumps over the lazy dog. The quick
brown fox jumps over the lazy dog. The quick brown fox jumps over the lazy dog.
The quick brown fox jumps over the lazy dog. The quick brown fox jumps over the
lazy dog. The quick brown fox jumps over the lazy dog. The quick brown fox
jumps over the lazy dog. The quick brown fox jumps over the lazy dog. The quick
brown fox jumps over the lazy dog. The quick brown fox jumps over the lazy dog.
The quick brown fox jumps over the lazy dog. The quick brown fox jumps over the
lazy dog. The quick brown fox jumps over the lazy dog. The quick brown fox
jumps over the lazy dog. The quick brown fox jumps over the lazy dog. The quick
brown fox jumps over the lazy dog. The quick brown fox jumps over the lazy dog.
The quick brown fox jumps over the lazy dog. The quick brown fox jumps over the
lazy dog. The quick brown fox jumps over the lazy dog. The quick brown fox
jumps over the lazy dog. The quick brown fox jumps over the lazy dog. The quick
brown fox jumps over the lazy dog. The quick brown fox jumps over the lazy dog.
The quick brown fox jumps over the lazy dog. The quick brown fox jumps over the
lazy dog. The quick brown fox jumps over the lazy dog. The quick brown fox
jumps over the lazy dog. The quick brown fox jumps over the lazy dog. The quick
brown fox jumps over the lazy dog. The quick brown fox jumps over the lazy dog.
The quick brown fox jumps over the lazy dog. The quick brown fox jumps over the
lazy dog. The quick brown fox jumps over the lazy dog. The quick brown fox
jumps over the lazy dog. The quick brown fox jumps over the lazy dog. The quick
brown fox jumps over the lazy dog. The quick brown fox jumps over the lazy dog.
The quick brown fox jumps over the lazy dog. The quick brown fox jumps over the
lazy dog. The quick brown fox jumps over the lazy dog. The quick brown fox
jumps over the lazy dog. The quick brown fox jumps over the lazy dog. The quick
brown fox jumps over the lazy dog. The quick brown fox jumps over the lazy dog.
The quick brown fox jumps over the lazy dog. The quick brown fox jumps over the
lazy dog. The quick brown fox jumps over the lazy dog. The quick brown fox
jumps over the lazy dog. The quick brown fox jumps over the lazy dog. The quick
brown fox jumps over the lazy dog. The quick brown fox jumps over the lazy dog.
The quick brown fox jumps over the lazy dog. The quick brown fox jumps over the
lazy dog. The quick brown fox jumps over the lazy dog. The quick brown fox
jumps over the lazy dog. The quick brown fox jumps over the lazy dog. The quick
brown fox jumps over the lazy dog. The quick brown fox jumps over the lazy dog.
The quick brown fox jumps over the lazy dog. The quick brown fox jumps over the
lazy dog. The quick brown fox jumps over the lazy dog. The quick brown fox
jumps over the lazy dog. The quick brown fox jumps over the lazy dog. The quick
brown fox jumps over the lazy dog. The quick brown fox jumps over the lazy dog.
The quick brown fox jumps over the lazy dog. The quick brown fox jumps over the
lazy dog. The quick brown fox jumps over the lazy dog. The quick brown fox
jumps over the lazy dog. The quick brown fox jumps over the lazy dog. The quick
brown fox jumps over the lazy dog. The quick brown fox jumps over the lazy dog.
The quick brown fox jumps over the lazy dog. The quick brown fox jumps over the
lazy dog. The quick brown fox jumps over the lazy dog. The quick brown fox
jumps over the lazy dog. The quick brown fox jumps over the lazy dog. The quick
brown fox jumps over the lazy dog. The quick brown fox jumps over the lazy dog.
The quick brown fox jumps over the lazy dog. The quick brown fox jumps over the
lazy dog. The quick brown fox jumps over the lazy dog. The quick brown fox
jumps over the lazy dog. The quick brown fox jumps over the lazy dog. The quick
brown fox jumps over the lazy dog. The quick brown fox jumps over the lazy dog.
The quick brown fox jumps over the lazy dog. The quick brown fox jumps over the
lazy dog. The quick brown fox jumps over the lazy dog. The quick brown fox
jumps over the lazy dog. The quick brown fox jumps over the lazy dog. The quick
brown fox jumps over the lazy dog. The quick brown fox jumps over the lazy dog.
The quick brown fox jumps over the lazy dog. The quick brown fox jumps over the
lazy dog. The quick brown fox jumps over the lazy dog. The quick brown fox
jumps over the lazy dog. The quick brown fox jumps over the lazy dog. The quick
brown fox jumps over the lazy dog. The quick brown fox jumps over the lazy dog.
The quick brown fox jumps over the lazy dog. The quick brown fox jumps over the
lazy dog. The quick brown fox jumps over the lazy dog. The quick brown fox
jumps over the lazy dog. The quick brown fox jumps over the lazy dog. The quick
brown fox jumps over the lazy dog. The quick brown fox jumps over the lazy dog.
The quick brown fox jumps over the lazy dog. The quick brown fox jumps over the
lazy dog. The quick brown fox jumps over the lazy dog. The quick brown fox
jumps over the lazy dog. The quick brown fox jumps over the lazy dog. The quick
brown fox jumps over the lazy dog. The quick brown fox jumps over the lazy dog.
The quick brown fox jumps over the lazy dog. The quick brown fox jumps over the
lazy dog. The quick brown fox jumps over the lazy dog. The quick brown fox
jumps over the lazy dog. The quick brown fox jumps over the lazy dog. The quick
brown fox jumps over the lazy dog. The quick brown fox jumps over the lazy dog.
The quick brown fox jumps over the lazy dog. The quick brown fox jumps over the
lazy dog. The quick brown fox jumps over the lazy dog. The quick brown fox
jumps over the lazy dog. The quick brown fox jumps over the lazy dog. The quick
brown fox jumps over the lazy dog. The quick brown fox jumps over the lazy dog.
The quick brown fox jumps over the lazy dog. The quick brown fox jumps over the
lazy dog. The quick brown fox jumps over the lazy dog. The quick brown fox
jumps over the lazy dog. The quick brown fox jumps over the lazy dog. The quick
brown fox jumps over the lazy dog. The quick brown fox jumps over the lazy dog.
The quick brown fox jumps over the lazy dog. The quick brown fox jumps over the
lazy dog. The quick brown fox jumps over the lazy dog. The quick brown fox
jumps over the lazy dog. The quick brown fox jumps over the lazy dog. The quick
brown fox jumps over the lazy dog. The quick brown fox jumps over the lazy dog.
The quick brown fox jumps over the lazy dog. The quick brown fox jumps over the
lazy dog. The quick brown fox jumps over the lazy dog. The quick brown fox
jumps over the lazy dog. The quick brown fox jumps over the lazy dog. The quick
brown fox jumps over the lazy dog. The quick brown fox jumps over the lazy dog.
The quick brown fox jumps over the lazy dog. The quick brown fox jumps over the
lazy dog. The quick brown fox jumps over the lazy dog. The quick brown fox
jumps over the lazy dog. The quick brown fox jumps over the lazy dog. The quick
brown fox jumps over the lazy dog. The quick brown fox jumps over the lazy dog.
The quick brown fox jumps over the lazy dog. The quick brown fox jumps over the
lazy dog. The quick brown fox jumps over the lazy dog. The quick brown fox
jumps over the lazy dog. The quick brown fox jumps over the lazy dog. The quick
brown fox jumps over the lazy dog. The quick brown fox jumps over the lazy dog.
The quick brown fox jumps over the lazy dog. The quick brown fox jumps over the
lazy dog. The quick brown fox jumps over the lazy dog. The quick brown fox
jumps over the lazy dog. The quick brown fox jumps over the lazy dog. The quick
brown fox jumps over the lazy dog. The quick brown fox jumps over the lazy dog.
The quick brown fox jumps over the lazy dog. The quick brown fox jumps over the
lazy dog. The quick brown fox jumps over the lazy dog. The quick brown fox
jumps over the lazy dog. The quick brown fox jumps over the lazy dog. The quick
brown fox jumps over the lazy dog. The quick brown fox jumps over the lazy dog.
The quick brown fox jumps over the lazy dog. The quick brown fox jumps over the
lazy dog. The quick brown fox jumps over the lazy dog. The quick brown fox
jumps over the lazy dog. The quick brown fox jumps over the lazy dog. The quick
brown fox jumps over the lazy dog. The quick brown fox jumps over the lazy dog.
The quick brown fox jumps over the lazy dog. The quick brown fox jumps over the
lazy dog. The quick brown fox jumps over the lazy dog. The quick brown fox
jumps over the lazy dog. The quick brown fox jumps over the lazy dog. The quick
brown fox jumps over the lazy dog. The quick brown fox jumps over the lazy dog.
The quick brown fox jumps over the lazy dog. The quick brown fox jumps over the
lazy dog. The quick brown fox jumps over the lazy dog. The quick brown fox
jumps over the lazy dog. The quick brown fox jumps over the lazy dog. The quick
brown fox jumps over the lazy dog. The quick brown fox jumps over the lazy dog.
The quick brown fox jumps over the lazy dog. The quick brown fox jumps over the
lazy dog. The quick brown fox jumps over the lazy dog. The quick brown fox
jumps over the lazy dog. The quick brown fox jumps over the lazy dog. The quick
brown fox jumps over the lazy dog. The quick brown fox jumps over the lazy dog.
The quick brown fox jumps over the lazy dog. The quick brown fox jumps over the
lazy dog. The quick brown fox jumps over the lazy dog. The quick brown fox
jumps over the lazy dog. The quick brown fox jumps over the lazy dog. The quick
brown fox jumps over the lazy dog. The quick brown fox jumps over the lazy dog.
The quick brown fox jumps over the lazy dog. The quick brown fox jumps over the
lazy dog. The quick brown fox jumps over the lazy dog. The quick brown fox
jumps over the lazy dog. The quick brown fox jumps over the lazy dog. The quick
brown fox jumps over the lazy dog. The quick brown fox jumps over the lazy dog.
The quick brown fox jumps over the lazy dog. The quick brown fox jumps over the
lazy dog. The quick brown fox jumps over the lazy dog. The quick brown fox
jumps over the lazy dog. The quick brown fox jumps over the lazy dog. The quick
brown fox jumps over the lazy dog. The quick brown fox jumps over the lazy dog.
The quick brown fox jumps over the lazy dog. The quick brown fox jumps over the
lazy dog. The quick brown fox jumps over the lazy dog. The quick brown fox
jumps over the lazy dog. The quick brown fox jumps over the lazy dog. The quick
brown fox jumps over the lazy dog. The quick brown fox jumps over the lazy dog.
The quick brown fox jumps over the lazy dog. The quick brown fox jumps over the
lazy dog. The quick brown fox jumps over the lazy dog. The quick brown fox
jumps over the lazy dog. The quick brown fox jumps over the lazy dog. The quick
brown fox jumps over the lazy dog. The quick brown fox jumps over the lazy dog.
The quick brown fox jumps over the lazy dog. The quick brown fox jumps over the
lazy dog. The quick brown fox jumps over the lazy dog. The quick brown fox
jumps over the lazy dog. The quick brown fox jumps over the lazy dog. The quick
brown fox jumps over the lazy dog. The quick brown fox jumps over the lazy dog.
The quick brown fox jumps over the lazy dog. The quick brown fox jumps over the
lazy dog. The quick brown fox jumps over the lazy dog. The quick brown fox
jumps over the lazy dog. The quick brown fox jumps over the lazy dog. The quick
brown fox jumps over the lazy dog. The quick brown fox jumps over the lazy dog.
The quick brown fox jumps over the lazy dog. The quick brown fox jumps over the
lazy dog. The quick brown fox jumps over the lazy dog. The quick brown fox
jumps over the lazy dog. The quick brown fox jumps over the lazy dog. The quick
brown fox jumps over the lazy dog. The quick brown fox jumps over the lazy dog.
The quick brown fox jumps over the lazy dog. The quick brown fox jumps over the
lazy dog. The quick brown fox jumps over the lazy dog. The quick brown fox
jumps over the lazy dog. The quick brown fox jumps over the lazy dog. The quick
brown fox jumps over the lazy dog. The quick brown fox jumps over the lazy dog.
The quick brown fox jumps over the lazy dog. The quick brown fox jumps over the
lazy dog. The quick brown fox jumps over the lazy dog. The quick brown fox
jumps over the lazy dog. The quick brown fox jumps over the lazy dog. The quick
brown fox jumps over the lazy dog. The quick brown fox jumps over the lazy dog.
-------------------------------------------------------------------------------
aaaaa0
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
bbbbbb
cccccccccccccccccccccccccccccccccccccccccc
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
eeeee
aaaaa2
ffffffffff
This is a line before the binary zero.
This line contains a binary zero here >< for testing.
This is a line after the binary zero.
ABOVE the elephant
ABOVE
ABOVE theatre
AB.VE
AB.VE the turtle
010203040506
match 1:
a
match 2:
b
match 3:
c
match 4:
d
match 5:
e
Rhubarb
Custard Tart
zxc
cvb
bnm
asd
qwe
ert
tyu
uio
ggg
asd
dfg
ghj
jkl
abx
def
ghi
xyz
PUT NEW DATA ABOVE THIS LINE.
=============================
Check up on PATTERN near the end.
This is the last line of this file.

@@ -0,0 +1,15 @@
triple: t1_txt s1_tag s_txt p_tag p_txt o_tag o_txt
triple: t2_txt s1_tag s_txt p_tag p_txt o_tag
Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.
triple: t3_txt s2_tag s_txt p_tag p_txt o_tag o_txt
triple: t4_txt s1_tag s_txt p_tag p_txt o_tag o_txt
triple: t5_txt s1_tag s_txt p_tag p_txt o_tag
o_txt
triple: t6_txt s2_tag s_txt p_tag p_txt o_tag o_txt
triple: t7_txt s1_tag s_txt p_tag p_txt o_tag o_txt

@@ -0,0 +1,17 @@
X one
X two X three X four
X five
X six
X seven…X eightX nineX ten
Before 111
Before 222Before 333…Match
After 111
After 222After 333
And so on and so on
And so on and so on
ſ
ſſſſſ
ÁabcÁ Kk
A

@@ -0,0 +1 @@
Aက€CD Z

@@ -0,0 +1 @@
abc<EFBFBD>

Binary file not shown.

Binary file not shown.

@@ -0,0 +1,17 @@
Data file for multiline tests of multiple matches.
start end in between start
end and following
Other stuff
start end in between start
end and following start
end other stuff
start end in between start
end
** These two lines must be last.
start end in between start
end

@@ -0,0 +1,2 @@
abcሴdef
xyz

@@ -0,0 +1,10 @@
The quick brown
fox jumps
over the lazy dog.
This time it jumps and jumps and jumps.
This line contains \E and (regex) *meta* [characters].
The word is cat in this line
The caterpillar sat on the mat
The snowcat is not an animal
A buried feline in the syndicate
trailing spaces

@@ -0,0 +1,43 @@
This is a second file of input for the pcre2grep tests.
Here is the pattern again.
Pattern
That time it was on a line by itself.
To pat or not to pat, that is the question.
complete pair
of lines
That was a complete pair
of lines all by themselves.
complete pair
of lines
And there they were again, to check line numbers.
one
two
three
four
five
six
seven
eight
nine
ten
eleven
twelve
thirteen
fourteen
fifteen
sixteen
seventeen
eighteen
nineteen
twenty
This line contains pattern not on a line by itself.
This is the last line of this file.

@@ -0,0 +1,7 @@
This is a file of patterns for testing the -f option. Don't include any blank
lines because they will match everything! This is no longer true, so have one.
pattern
line by itself
End of the list of patterns.

@@ -0,0 +1,43 @@
This is a second file of input for the pcre2grep tests.
Here is the pattern again.
Pattern
That time it was on a line by itself.
To pat or not to pat, that is the question.
complete pair
of lines
That was a complete pair
of lines all by themselves.
complete pair
of lines
And there they were again, to check line numbers.
one
two
three
four
five
six
seven
eight
nine
ten
eleven
twelve
thirteen
fourteen
fifteen
sixteen
seventeen
eighteen
nineteen
twenty
This line contains pattern not on a line by itself.
This is the last line of this file.

File diff suppressed because it is too large

@@ -0,0 +1,47 @@
---------------------------- Test U1 ------------------------------
1:X one
2:X two 3:X three 4:X four
5:X five
6:X six
7:X seven…8:X eight
9:X nine
10:X ten
RC=0
---------------------------- Test U2 ------------------------------
12-Before 111
13-Before 222
14-Before 333…15:Match
16-After 111
17-After 222
18-After 333
RC=0
---------------------------- Test U3 ------------------------------
21:0,2
22:0,2
22:2,2
22:4,2
22:6,2
22:8,2
RC=0
---------------------------- Test U4 ------------------------------
pcre2grep: pcre2_match() gave error -22 while matching this text:
Aက€CD Z
UTF-8 error: isolated byte with 0x80 bit set at offset 4
RC=1
---------------------------- Test U5 ------------------------------
CD Z
RC=0
---------------------------- Test U6 -----------------------------
=ǓǤ=
RC=0
---------------------------- Test U7 ------------------------------
Ã<EFBFBD>abcÃ<EFBFBD> KkK
RC=0
---------------------------- Test U8 ------------------------------
Ã<EFBFBD>abcÃ<6D> KkK
RC=0
---------------------------- Test U9 ------------------------------
Aï¼
A1
RC=0
---------------------------- Test U10 ------------------------------
A1

@@ -0,0 +1,74 @@
--- Test 1 ---
Arg1: [T] [he ] [ ] Arg2: |T| () () (0)
The quick brown
Arg1: [T] [his] [s] Arg2: |T| () () (0)
This time it jumps and jumps and jumps.
Arg1: [T] [his] [s] Arg2: |T| () () (0)
This line contains \E and (regex) *meta* [characters].
Arg1: [T] [he ] [ ] Arg2: |T| () () (0)
The word is cat in this line
Arg1: [T] [he ] [ ] Arg2: |T| () () (0)
The caterpillar sat on the mat
Arg1: [T] [he ] [ ] Arg2: |T| () () (0)
The snowcat is not an animal
RC=0
--- Test 2 ---
Arg1: [qu] [qu]
The quick brown
Arg1: [ t] [ t]
This time it jumps and jumps and jumps.
Arg1: [ l] [ l]
This line contains \E and (regex) *meta* [characters].
Arg1: [wo] [wo]
The word is cat in this line
Arg1: [ca] [ca]
The caterpillar sat on the mat
Arg1: [sn] [sn]
The snowcat is not an animal
RC=0
--- Test 3 ---
0:T
The quick brown
0:T
This time it jumps and jumps and jumps.
0:T
This line contains \E and (regex) *meta* [characters].
0:T
The word is cat in this line
0:T
The caterpillar sat on the mat
0:T
The snowcat is not an animal
RC=0
--- Test 4 ---
0:T
The quick brown
0:T
This time it jumps and jumps and jumps.
0:T
This line contains \E and (regex) *meta* [characters].
0:T
The word is cat in this line
0:T
The caterpillar sat on the mat
0:T
The snowcat is not an animal
RC=0
--- Test 5 ---
T
T
T
T
T
T
RC=1
--- Test 6 ---
0:T:AA
The quick brown
RC=0

@@ -0,0 +1,50 @@
--- Test 1 ---
The quick brown
This time it jumps and jumps and jumps.
This line contains \E and (regex) *meta* [characters].
The word is cat in this line
The caterpillar sat on the mat
The snowcat is not an animal
RC=0
--- Test 2 ---
The quick brown
This time it jumps and jumps and jumps.
This line contains \E and (regex) *meta* [characters].
The word is cat in this line
The caterpillar sat on the mat
The snowcat is not an animal
RC=0
--- Test 3 ---
0:T
The quick brown
0:T
This time it jumps and jumps and jumps.
0:T
This line contains \E and (regex) *meta* [characters].
0:T
The word is cat in this line
0:T
The caterpillar sat on the mat
0:T
The snowcat is not an animal
RC=0
--- Test 4 ---
The quick brown
This time it jumps and jumps and jumps.
This line contains \E and (regex) *meta* [characters].
The word is cat in this line
The caterpillar sat on the mat
The snowcat is not an animal
RC=0
--- Test 5 ---
T
T
T
T
T
T
RC=1
--- Test 6 ---
0:T:AA
The quick brown
RC=0

@@ -0,0 +1,22 @@
--- Test 1 ---
0:¦
The quick brown
0:¦
This time it jumps and jumps and jumps.
0:¦
This line contains \E and (regex) *meta* [characters].
0:¦
The word is cat in this line
0:¦
The caterpillar sat on the mat
0:¦
The snowcat is not an animal
RC=0
--- Test 2 ---
The quick brown
This time it jumps and jumps and jumps.
This line contains \E and (regex) *meta* [characters].
The word is cat in this line
The caterpillar sat on the mat
The snowcat is not an animal
RC=0

@@ -0,0 +1,34 @@
--- Test 1 ---
0:¦
The quick brown
0:¦
This time it jumps and jumps and jumps.
0:¦
This line contains \E and (regex) *meta* [characters].
0:¦
The word is cat in this line
0:¦
The caterpillar sat on the mat
0:¦
The snowcat is not an animal
RC=0
--- Test 2 ---
0:¦
The quick brown
0:¦
This time it jumps and jumps and jumps.
0:¦
This line contains \E and (regex) *meta* [characters].
0:¦
The word is cat in this line
0:¦
The caterpillar sat on the mat
0:¦
The snowcat is not an animal
RC=0

@@ -0,0 +1,6 @@
one
two
RC=0
one
two
RC=0

@@ -0,0 +1,3 @@
one
two
RC=0

@@ -0,0 +1,42 @@
---------------------------- Test N1 ------------------------------
1:abc
2:def
RC=0
1-abc
2:def
RC=0
---------------------------- Test N2 ------------------------------
1:abc
def
2:ghi
jkl
RC=0
1-abc
def
2:ghi
jkl
RC=0
---------------------------- Test N3 ------------------------------
2:def
3:
ghi
jkl
RC=0
---------------------------- Test N4 ------------------------------
2:ghi
jkl
RC=0
---------------------------- Test N5 ------------------------------
1:abc
2:def
3:ghi
4:jkl
RC=0
1-abc
2:def
RC=0
---------------------------- Test N6 ------------------------------
1:abc
2:def
3:ghi
4:jkl

@@ -0,0 +1,4 @@
---------------------------- Test UN2 ------------------------------
1:abcð
RC=0

@@ -0,0 +1,2 @@
xxx
jkl

File diff suppressed because it is too large

@@ -0,0 +1,714 @@
# This set of tests is for UTF-8 support and Unicode property support, with
# relevance only for the 8-bit library.
#newline_default lf any anycrlf
# The next 5 patterns have UTF-8 errors
/[Ã]/utf
/Ã/utf
/ÃÃÃxxx/utf
/‚‚‚‚‚‚‚Ã/utf
/‚‚‚‚‚‚‚Ã/match_invalid_utf
# Now test subjects
/badutf/utf
\= Expect UTF-8 errors
X\xdf
XX\xef
XXX\xef\x80
X\xf7
XX\xf7\x80
XXX\xf7\x80\x80
\xfb
\xfb\x80
\xfb\x80\x80
\xfb\x80\x80\x80
\xfd
\xfd\x80
\xfd\x80\x80
\xfd\x80\x80\x80
\xfd\x80\x80\x80\x80
\xdf\x7f
\xef\x7f\x80
\xef\x80\x7f
\xf7\x7f\x80\x80
\xf7\x80\x7f\x80
\xf7\x80\x80\x7f
\xfb\x7f\x80\x80\x80
\xfb\x80\x7f\x80\x80
\xfb\x80\x80\x7f\x80
\xfb\x80\x80\x80\x7f
\xfd\x7f\x80\x80\x80\x80
\xfd\x80\x7f\x80\x80\x80
\xfd\x80\x80\x7f\x80\x80
\xfd\x80\x80\x80\x7f\x80
\xfd\x80\x80\x80\x80\x7f
\xed\xa0\x80
\xc0\x8f
\xe0\x80\x8f
\xf0\x80\x80\x8f
\xf8\x80\x80\x80\x8f
\xfc\x80\x80\x80\x80\x8f
\x80
\xfe
\xff
/badutf/utf
\= Expect UTF-8 errors
XX\xfb\x80\x80\x80\x80
XX\xfd\x80\x80\x80\x80\x80
XX\xf7\xbf\xbf\xbf
/shortutf/utf
\= Expect UTF-8 errors
XX\xdf\=ph
XX\xef\=ph
XX\xef\x80\=ph
\xf7\=ph
\xf7\x80\=ph
\xf7\x80\x80\=ph
\xfb\=ph
\xfb\x80\=ph
\xfb\x80\x80\=ph
\xfb\x80\x80\x80\=ph
\xfd\=ph
\xfd\x80\=ph
\xfd\x80\x80\=ph
\xfd\x80\x80\x80\=ph
\xfd\x80\x80\x80\x80\=ph
/anything/utf
\= Expect UTF-8 errors
X\xc0\x80
XX\xc1\x8f
XXX\xe0\x9f\x80
\xf0\x8f\x80\x80
\xf8\x87\x80\x80\x80
\xfc\x83\x80\x80\x80\x80
\xfe\x80\x80\x80\x80\x80
\xff\x80\x80\x80\x80\x80
\xf8\x88\x80\x80\x80
\xf9\x87\x80\x80\x80
\xfc\x84\x80\x80\x80\x80
\xfd\x83\x80\x80\x80\x80
\= Expect no match
\xc3\x8f
\xe0\xaf\x80
\xe1\x80\x80
\xf0\x9f\x80\x80
\xf1\x8f\x80\x80
\xf8\x88\x80\x80\x80\=no_utf_check
\xf9\x87\x80\x80\x80\=no_utf_check
\xfc\x84\x80\x80\x80\x80\=no_utf_check
\xfd\x83\x80\x80\x80\x80\=no_utf_check
# Similar tests with offsets
/badutf/utf
\= Expect UTF-8 errors
X\xdfabcd
X\xdfabcd\=offset=1
\= Expect no match
X\xdfabcd\=offset=2
/(?<=x)badutf/utf
\= Expect UTF-8 errors
X\xdfabcd
X\xdfabcd\=offset=1
X\xdfabcd\=offset=2
X\xdfabcd\xdf\=offset=3
\= Expect no match
X\xdfabcd\=offset=3
/(?<=xx)badutf/utf
\= Expect UTF-8 errors
X\xdfabcd
X\xdfabcd\=offset=1
X\xdfabcd\=offset=2
X\xdfabcd\=offset=3
/(?<=xxxx)badutf/utf
\= Expect UTF-8 errors
X\xdfabcd
X\xdfabcd\=offset=1
X\xdfabcd\=offset=2
X\xdfabcd\=offset=3
X\xdfabc\xdf\=offset=6
X\xdfabc\xdf\=offset=7
\= Expect no match
X\xdfabcd\=offset=6
/\x{100}/IB,utf
/\x{1000}/IB,utf
/\x{10000}/IB,utf
/\x{100000}/IB,utf
/\x{10ffff}/IB,utf
/[\x{ff}]/IB,utf
/[\x{100}]/IB,utf
/\x80/IB,utf
/\xff/IB,utf
/\x{D55c}\x{ad6d}\x{C5B4}/IB,utf
\x{D55c}\x{ad6d}\x{C5B4}
/\x{65e5}\x{672c}\x{8a9e}/IB,utf
\x{65e5}\x{672c}\x{8a9e}
/\x{80}/IB,utf
/\x{084}/IB,utf
/\x{104}/IB,utf
/\x{861}/IB,utf
/\x{212ab}/IB,utf
/[^ab\xC0-\xF0]/IB,utf
\x{f1}
\x{bf}
\x{100}
\x{1000}
\= Expect no match
\x{c0}
\x{f0}
/(\x{100}+|x)/IB,utf
/(\x{100}*a|x)/IB,utf
/(\x{100}{0,2}a|x)/IB,utf
/(\x{100}{1,2}a|x)/IB,utf
/\x{100}/IB,utf
/a\x{100}\x{101}*/IB,utf
/a\x{100}\x{101}+/IB,utf
/[^\x{c4}]/IB
/[\x{100}]/IB,utf
\x{100}
Z\x{100}
\x{100}Z
/[\xff]/IB,utf
>\x{ff}<
/[^\xff]/IB,utf
/\x{100}abc(xyz(?1))/IB,utf
/\777/I,utf
\x{1ff}
\777
/\x{100}+\x{200}/IB,utf
/\x{100}+X/IB,utf
/^[\QÄ€\E-\QÅ<51>\E/B,utf
# This tests the stricter UTF-8 check according to RFC 3629.
/X/utf
\= Expect UTF-8 errors
\x{d800}
\x{da00}
\x{dfff}
\x{110000}
\x{2000000}
\x{7fffffff}
\= Expect no match
\x{d800}\=no_utf_check
\x{da00}\=no_utf_check
\x{dfff}\=no_utf_check
\x{110000}\=no_utf_check
\x{2000000}\=no_utf_check
\x{7fffffff}\=no_utf_check
/(*UTF8)\x{1234}/
abcd\x{1234}pqr
/(*CRLF)(*UTF)(*BSR_UNICODE)a\Rb/I
/\h/I,utf
ABC\x{09}
ABC\x{20}
ABC\x{a0}
ABC\x{1680}
ABC\x{180e}
ABC\x{2000}
ABC\x{202f}
ABC\x{205f}
ABC\x{3000}
/\v/I,utf
ABC\x{0a}
ABC\x{0b}
ABC\x{0c}
ABC\x{0d}
ABC\x{85}
ABC\x{2028}
/\h*A/I,utf
CDBABC
/\v+A/I,utf
/\s?xxx\s/I,utf
/\sxxx\s/I,utf,tables=2
AB\x{85}xxx\x{a0}XYZ
AB\x{a0}xxx\x{85}XYZ
/\S \S/I,utf,tables=2
\x{a2} \x{84}
A Z
/a+/utf
a\x{123}aa\=offset=1
a\x{123}aa\=offset=3
a\x{123}aa\=offset=4
\= Expect bad offset value
a\x{123}aa\=offset=6
\= Expect bad UTF-8 offset
a\x{123}aa\=offset=2
\= Expect no match
a\x{123}aa\=offset=5
/\x{1234}+/Ii,utf
/\x{1234}+?/Ii,utf
/\x{1234}++/Ii,utf
/\x{1234}{2}/Ii,utf
/[^\x{c4}]/IB,utf
/X+\x{200}/IB,utf
/\R/I,utf
/\777/IB,utf
/\w+\x{C4}/B,utf
a\x{C4}\x{C4}
/\w+\x{C4}/B,utf,tables=2
a\x{C4}\x{C4}
/\W+\x{C4}/B,utf
!\x{C4}
/\W+\x{C4}/B,utf,tables=2
!\x{C4}
/\W+\x{A1}/B,utf
!\x{A1}
/\W+\x{A1}/B,utf,tables=2
!\x{A1}
/X\s+\x{A0}/B,utf
X\x20\x{A0}\x{A0}
/X\s+\x{A0}/B,utf,tables=2
X\x20\x{A0}\x{A0}
/\S+\x{A0}/B,utf
X\x{A0}\x{A0}
/\S+\x{A0}/B,utf,tables=2
X\x{A0}\x{A0}
/\x{a0}+\s!/B,utf
\x{a0}\x20!
/\x{a0}+\s!/B,utf,tables=2
\x{a0}\x20!
/A/utf
\x{ff000041}
\x{7f000041}
/(*UTF8)abc/never_utf
/abc/utf,never_utf
/A\x{391}\x{10427}\x{ff3a}\x{1fb0}/IBi,utf
/A\x{391}\x{10427}\x{ff3a}\x{1fb0}/IB,utf
/AB\x{1fb0}/IB,utf
/AB\x{1fb0}/IBi,utf
/\x{401}\x{420}\x{421}\x{422}\x{423}\x{424}\x{425}\x{426}\x{427}\x{428}\x{429}\x{42a}\x{42b}\x{42c}\x{42d}\x{42e}\x{42f}/Ii,utf
\x{401}\x{420}\x{421}\x{422}\x{423}\x{424}\x{425}\x{426}\x{427}\x{428}\x{429}\x{42a}\x{42b}\x{42c}\x{42d}\x{42e}\x{42f}
\x{451}\x{440}\x{441}\x{442}\x{443}\x{444}\x{445}\x{446}\x{447}\x{448}\x{449}\x{44a}\x{44b}\x{44c}\x{44d}\x{44e}\x{44f}
/[â±¥]/Bi,utf
/[^â±¥]/Bi,utf
/\h/I
/\v/I
/\R/I
/[[:blank:]]/B,ucp
/\x{212a}+/Ii,utf
KKkk\x{212a}
/s+/Ii,utf
SSss\x{17f}
/\x{100}*A/IB,utf
A
/\x{100}*\d(?R)/IB,utf
/[Z\x{100}]/IB,utf
Z\x{100}
\x{100}
\x{100}Z
/[z-\x{100}]/IB,utf
/[z\Qa-d]Ä€\E]/IB,utf
\x{100}
Ä€
/[ab\x{100}]abc(xyz(?1))/IB,utf
/\x{100}*\s/IB,utf
/\x{100}*\d/IB,utf
/\x{100}*\w/IB,utf
/\x{100}*\D/IB,utf
/\x{100}*\S/IB,utf
/\x{100}*\W/IB,utf
/[\x{105}-\x{109}]/IBi,utf
\x{104}
\x{105}
\x{109}
\= Expect no match
\x{100}
\x{10a}
/[z-\x{100}]/IBi,utf
Z
z
\x{39c}
\x{178}
|
\x{80}
\x{ff}
\x{100}
\x{101}
\= Expect no match
\x{102}
Y
y
/[z-\x{100}]/IBi,utf
/\x{3a3}B/IBi,utf
/abc/utf,replace=Ã
abc
/(?<=(a)(?-1))x/I,utf
a\x80zx\=offset=3
/[\W\p{Any}]/B
abc
123
/[\W\pL]/B
abc
\= Expect no match
123
/(*:*++++++++++++''''''''''''''''''''+''+++'+++x+++++++++++++++++++++++++++++++++++(++++++++++++++++++++:++++++%++:''''''''''''''''''''''''+++++++++++++++++++++++++++++++++++++++++++++++++++++-++++++++k+++++++''''+++'+++++++++++++++++++++++''''++++++++++++':Æ¿)/utf
/[\s[:^ascii:]]/B,ucp
# A special extra option allows excaped surrogate code points in 8-bit mode,
# but subjects containing them must not be UTF-checked.
/\x{d800}/I,utf,allow_surrogate_escapes
\x{d800}\=no_utf_check
/\udfff\o{157401}/utf,alt_bsux,allow_surrogate_escapes
\x{dfff}\x{df01}\=no_utf_check
# This has different starting code units in 8-bit mode.
/^[^ab]/IB,utf
c
\x{ff}
\x{100}
\= Expect no match
aaa
# Offsets are different in 8-bit mode.
/(?<=abc)(|def)/g,utf,replace=<$0>,substitute_callout
123abcáyzabcdef789abcሴqr
# Check name length with non-ASCII characters
/(?'ABáC678901234567890123456789012012345678901234567890123456789AB012345678901234567890123456789AB012345678901234567890123456789AB'...)/utf
/(?'ABáC6789012345678901234567890123012345678901234567890123456789AB012345678901234567890123456789AB012345678901234567890123456789AB'...)/utf
/(?'ABZC6789012345678901234567890123012345678901234567890123456789AB012345678901234567890123456789AB012345678901234567890123456789AB'...)/utf
/(?(n/utf
/(?(á/utf
# Invalid UTF-8 tests
/.../g,match_invalid_utf
abcd\x80wxzy\x80pqrs
abcd\x{80}wxzy\x80pqrs
/abc/match_invalid_utf
ab\x80ab\=ph
\= Expect no match
ab\x80cdef\=ph
/.a/match_invalid_utf
ab\=ph
ab\=ps
b\xf0\x91\x88b\=ph
b\xf0\x91\x88b\=ps
b\xf0\x91\x88\xb4a
\= Expect no match
b\x80\=ph
b\x80\=ps
b\xf0\x91\x88\=ph
b\xf0\x91\x88\=ps
/.a$/match_invalid_utf
ab\=ph
ab\=ps
\= Expect no match
b\xf0\x91\x98\=ph
b\xf0\x91\x98\=ps
/ab$/match_invalid_utf
ab\x80cdeab
\= Expect no match
ab\x80cde
/.../g,match_invalid_utf
abcd\x{80}wxzy\x80pqrs
/(?<=x)../g,match_invalid_utf
abcd\x{80}wxzy\x80pqrs
abcd\x{80}wxzy\x80xpqrs
/X$/match_invalid_utf
\= Expect no match
X\xc4
/(?<=..)X/match_invalid_utf,aftertext
AB\x80AQXYZ
AB\x80AQXYZ\=offset=5
AB\x80\x80AXYZXC\=offset=5
\= Expect no match
AB\x80XYZ
AB\x80XYZ\=offset=3
AB\xfeXYZ
AB\xffXYZ\=offset=3
AB\x80AXYZ
AB\x80AXYZ\=offset=4
AB\x80\x80AXYZ\=offset=5
/.../match_invalid_utf
AB\xc4CCC
\= Expect no match
A\x{d800}B
A\x{110000}B
A\xc4B
/\bX/match_invalid_utf
A\x80X
/\BX/match_invalid_utf
\= Expect no match
A\x80X
/(?<=...)X/match_invalid_utf
AAA\x80BBBXYZ
\= Expect no match
AAA\x80BXYZ
AAA\x80BBXYZ
# -------------------------------------
/(*UTF)(?=\x{123})/I
/[\x{c1}\x{e1}]X[\x{145}\x{146}]/I,utf
/[󿾟,]/BI,utf
/[\x{fff4}-\x{ffff8}]/I,utf
/[\x{fff4}-\x{afff8}\x{10ffff}]/I,utf
/[\xff\x{ffff}]/I,utf
/[\xff\x{ff}]/I,utf
abc\x{ff}def
/[\xff\x{ff}]/I
abc\x{ff}def
/[Ss]/I
/[Ss]/I,utf
/(?:\x{ff}|\x{3000})/I,utf
/x/utf
abxyz
\x80\=startchar
abc\x80\=startchar
abc\x80\=startchar,offset=3
/\x{c1}+\x{e1}/iIB,ucp
\x{c1}\x{c1}\x{c1}
\x{e1}\x{e1}\x{e1}
/a|\x{c1}/iI,ucp
\x{e1}xxx
/a|\x{c1}/iI,utf
\x{e1}xxx
/\x{c1}|\x{e1}/iI,ucp
/X(\x{e1})Y/ucp,replace=>\U$1<,substitute_extended
X\x{e1}Y
/X(\x{e1})Y/i,ucp,replace=>\L$1<,substitute_extended
X\x{c1}Y
# Without UTF or UCP characters > 127 have only one case in the default locale.
/X(\x{e1})Y/replace=>\U$1<,substitute_extended
X\x{e1}Y
/A/utf,match_invalid_utf,caseless
\xe5A
/\bch\b/utf,match_invalid_utf
qchq\=ph
qchq\=ps
/line1\nbreak/firstline,utf,match_invalid_utf
line1\nbreak
line0\nline1\nbreak
/A\z/utf,match_invalid_utf
A\x80\x42\n
/ab$/match_invalid_utf
\= Expect no match
ab\x80cde
/ab\z/match_invalid_utf
\= Expect no match
ab\x80cde
/ab\Z/match_invalid_utf
\= Expect no match
ab\x80cde
/(..)(*scs:(1)ab\z)/match_invalid_utf
ab\x80cde
/(..)(*scs:(1)ab\Z)/match_invalid_utf
ab\x80cde
/(..)(*scs:(1)ab$)/match_invalid_utf
ab\x80cde
/(.) \1/i,ucp
i I
/(.) \1/i,ucp,turkish_casing
/[\x60-\x7f]/i,ucp,turkish_casing
i
\= Expect no match
I
/[\x60-\xc0]/i,ucp,turkish_casing
i
\= Expect no match
I
/[\x80-\xc0]/i,ucp,turkish_casing
\= Expect no match
i
I
# python_octal
/\400/
/abc/substitute_extended
abc\=replace=\400
/\400/python_octal
/abc/substitute_extended,python_octal
abc\=replace=\400
/\400/utf
/abc/utf,substitute_extended
abc\=replace=\400
/\400/utf,python_octal
/abc/utf,substitute_extended,python_octal
abc\=replace=\400
/[\x00-\x2f\x11-\xff]+/B
abcd
/[\x00-\x2f\x11-\xff]{4,}/B,utf
abcd
# End of testinput10

@@ -0,0 +1,504 @@
# This set of tests is for the 16-bit and 32-bit libraries' basic (non-UTF)
# features that are not compatible with the 8-bit library, or which give
# different output in 16-bit or 32-bit mode. The output for the two widths is
# different, so they have separate output files.
#forbid_utf
#newline_default LF ANY ANYCRLF
/[^\x{c4}]/IB
/\x{100}/I
/ (?: [\040\t] | \(
(?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] | \( (?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] )* \) )*
\) )* # optional leading comment
(?: (?:
[^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]+ # some number of atom characters...
(?![^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]) # ..not followed by something that could be part of an atom
|
" (?: # opening quote...
[^\\\x80-\xff\n\015"] # Anything except backslash and quote
| # or
\\ [^\x80-\xff] # Escaped something (something != CR)
)* " # closing quote
) # initial word
(?: (?: [\040\t] | \(
(?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] | \( (?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] )* \) )*
\) )* \. (?: [\040\t] | \(
(?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] | \( (?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] )* \) )*
\) )* (?:
[^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]+ # some number of atom characters...
(?![^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]) # ..not followed by something that could be part of an atom
|
" (?: # opening quote...
[^\\\x80-\xff\n\015"] # Anything except backslash and quote
| # or
\\ [^\x80-\xff] # Escaped something (something != CR)
)* " # closing quote
) )* # further okay, if led by a period
(?: [\040\t] | \(
(?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] | \( (?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] )* \) )*
\) )* @ (?: [\040\t] | \(
(?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] | \( (?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] )* \) )*
\) )* (?:
[^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]+ # some number of atom characters...
(?![^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]) # ..not followed by something that could be part of an atom
| \[ # [
(?: [^\\\x80-\xff\n\015\[\]] | \\ [^\x80-\xff] )* # stuff
\] # ]
) # initial subdomain
(?: #
(?: [\040\t] | \(
(?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] | \( (?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] )* \) )*
\) )* \. # if led by a period...
(?: [\040\t] | \(
(?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] | \( (?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] )* \) )*
\) )* (?:
[^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]+ # some number of atom characters...
(?![^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]) # ..not followed by something that could be part of an atom
| \[ # [
(?: [^\\\x80-\xff\n\015\[\]] | \\ [^\x80-\xff] )* # stuff
\] # ]
) # ...further okay
)*
# address
| # or
(?:
[^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]+ # some number of atom characters...
(?![^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]) # ..not followed by something that could be part of an atom
|
" (?: # opening quote...
[^\\\x80-\xff\n\015"] # Anything except backslash and quote
| # or
\\ [^\x80-\xff] # Escaped something (something != CR)
)* " # closing quote
) # one word, optionally followed by....
(?:
[^()<>@,;:".\\\[\]\x80-\xff\000-\010\012-\037] | # atom and space parts, or...
\(
(?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] | \( (?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] )* \) )*
\) | # comments, or...
" (?: # opening quote...
[^\\\x80-\xff\n\015"] # Anything except backslash and quote
| # or
\\ [^\x80-\xff] # Escaped something (something != CR)
)* " # closing quote
# quoted strings
)*
< (?: [\040\t] | \(
(?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] | \( (?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] )* \) )*
\) )* # leading <
(?: @ (?: [\040\t] | \(
(?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] | \( (?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] )* \) )*
\) )* (?:
[^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]+ # some number of atom characters...
(?![^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]) # ..not followed by something that could be part of an atom
| \[ # [
(?: [^\\\x80-\xff\n\015\[\]] | \\ [^\x80-\xff] )* # stuff
\] # ]
) # initial subdomain
(?: #
(?: [\040\t] | \(
(?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] | \( (?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] )* \) )*
\) )* \. # if led by a period...
(?: [\040\t] | \(
(?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] | \( (?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] )* \) )*
\) )* (?:
[^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]+ # some number of atom characters...
(?![^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]) # ..not followed by something that could be part of an atom
| \[ # [
(?: [^\\\x80-\xff\n\015\[\]] | \\ [^\x80-\xff] )* # stuff
\] # ]
) # ...further okay
)*
(?: (?: [\040\t] | \(
(?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] | \( (?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] )* \) )*
\) )* , (?: [\040\t] | \(
(?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] | \( (?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] )* \) )*
\) )* @ (?: [\040\t] | \(
(?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] | \( (?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] )* \) )*
\) )* (?:
[^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]+ # some number of atom characters...
(?![^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]) # ..not followed by something that could be part of an atom
| \[ # [
(?: [^\\\x80-\xff\n\015\[\]] | \\ [^\x80-\xff] )* # stuff
\] # ]
) # initial subdomain
(?: #
(?: [\040\t] | \(
(?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] | \( (?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] )* \) )*
\) )* \. # if led by a period...
(?: [\040\t] | \(
(?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] | \( (?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] )* \) )*
\) )* (?:
[^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]+ # some number of atom characters...
(?![^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]) # ..not followed by something that could be part of an atom
| \[ # [
(?: [^\\\x80-\xff\n\015\[\]] | \\ [^\x80-\xff] )* # stuff
\] # ]
) # ...further okay
)*
)* # further okay, if led by comma
: # closing colon
(?: [\040\t] | \(
(?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] | \( (?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] )* \) )*
\) )* )? # optional route
(?:
[^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]+ # some number of atom characters...
(?![^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]) # ..not followed by something that could be part of an atom
|
" (?: # opening quote...
[^\\\x80-\xff\n\015"] # Anything except backslash and quote
| # or
\\ [^\x80-\xff] # Escaped something (something != CR)
)* " # closing quote
) # initial word
(?: (?: [\040\t] | \(
(?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] | \( (?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] )* \) )*
\) )* \. (?: [\040\t] | \(
(?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] | \( (?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] )* \) )*
\) )* (?:
[^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]+ # some number of atom characters...
(?![^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]) # ..not followed by something that could be part of an atom
|
" (?: # opening quote...
[^\\\x80-\xff\n\015"] # Anything except backslash and quote
| # or
\\ [^\x80-\xff] # Escaped something (something != CR)
)* " # closing quote
) )* # further okay, if led by a period
(?: [\040\t] | \(
(?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] | \( (?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] )* \) )*
\) )* @ (?: [\040\t] | \(
(?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] | \( (?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] )* \) )*
\) )* (?:
[^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]+ # some number of atom characters...
(?![^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]) # ..not followed by something that could be part of an atom
| \[ # [
(?: [^\\\x80-\xff\n\015\[\]] | \\ [^\x80-\xff] )* # stuff
\] # ]
) # initial subdomain
(?: #
(?: [\040\t] | \(
(?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] | \( (?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] )* \) )*
\) )* \. # if led by a period...
(?: [\040\t] | \(
(?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] | \( (?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] )* \) )*
\) )* (?:
[^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]+ # some number of atom characters...
(?![^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]) # ..not followed by something that could be part of an atom
| \[ # [
(?: [^\\\x80-\xff\n\015\[\]] | \\ [^\x80-\xff] )* # stuff
\] # ]
) # ...further okay
)*
# address spec
(?: [\040\t] | \(
(?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] | \( (?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] )* \) )*
\) )* > # trailing >
# name and address
) (?: [\040\t] | \(
(?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] | \( (?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] )* \) )*
\) )* # optional trailing comment
/Ix
/[\h]/B
>\x09<
/[\h]+/B
>\x09\x20\xa0<
/[\v]/B
/[^\h]/B
/\h+/I
\x{1681}\x{200b}\x{1680}\x{2000}\x{202f}\x{3000}
\x{3001}\x{2fff}\x{200a}\xa0\x{2000}
/[\h\x{dc00}]+/IB
\x{1681}\x{200b}\x{1680}\x{2000}\x{202f}\x{3000}
\x{3001}\x{2fff}\x{200a}\xa0\x{2000}
/\H+/I
\x{1680}\x{180e}\x{167f}\x{1681}\x{180d}\x{180f}
\x{2000}\x{200a}\x{1fff}\x{200b}
\x{202f}\x{205f}\x{202e}\x{2030}\x{205e}\x{2060}
\xa0\x{3000}\x9f\xa1\x{2fff}\x{3001}
/[\H\x{d800}]+/
\x{1680}\x{180e}\x{167f}\x{1681}\x{180d}\x{180f}
\x{2000}\x{200a}\x{1fff}\x{200b}
\x{202f}\x{205f}\x{202e}\x{2030}\x{205e}\x{2060}
\xa0\x{3000}\x9f\xa1\x{2fff}\x{3001}
/\v+/I
\x{2027}\x{2030}\x{2028}\x{2029}
\x09\x0e\x84\x86\x85\x0a\x0b\x0c\x0d
/[\v\x{dc00}]+/IB
\x{2027}\x{2030}\x{2028}\x{2029}
\x09\x0e\x84\x86\x85\x0a\x0b\x0c\x0d
/\V+/I
\x{2028}\x{2029}\x{2027}\x{2030}
\x85\x0a\x0b\x0c\x0d\x09\x0e\x84\x86
/[\V\x{d800}]+/
\x{2028}\x{2029}\x{2027}\x{2030}
\x85\x0a\x0b\x0c\x0d\x09\x0e\x84\x86
/\R+/I,bsr=unicode
\x{2027}\x{2030}\x{2028}\x{2029}
\x09\x0e\x84\x86\x85\x0a\x0b\x0c\x0d
/\x{d800}\x{d7ff}\x{dc00}\x{dc00}\x{dcff}\x{dd00}/I
\x{d800}\x{d7ff}\x{dc00}\x{dc00}\x{dcff}\x{dd00}
/[^\x{80}][^\x{ff}][^\x{100}][^\x{1000}][^\x{ffff}]/B
/[^\x{80}][^\x{ff}][^\x{100}][^\x{1000}][^\x{ffff}]/Bi
/[^\x{100}]*[^\x{1000}]+[^\x{ffff}]??[^\x{8000}]{4,}[^\x{7fff}]{2,9}?[^\x{100}]{5,6}+/B
/[^\x{100}]*[^\x{1000}]+[^\x{ffff}]??[^\x{8000}]{4,}[^\x{7fff}]{2,9}?[^\x{100}]{5,6}+/Bi
/(*:0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF)XX/mark
XX
/(*:0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDE)XX/mark
XX
/\u0100/B,alt_bsux,allow_empty_class,match_unset_backref
/[\u0100-\u0200]/B,alt_bsux,allow_empty_class,match_unset_backref
/\ud800/B,alt_bsux,allow_empty_class,match_unset_backref
/^\x{ffff}+/i
\x{ffff}
/^\x{ffff}?/i
\x{ffff}
/^\x{ffff}*/i
\x{ffff}
/^\x{ffff}{3}/i
\x{ffff}\x{ffff}\x{ffff}
/^\x{ffff}{0,3}/i
\x{ffff}
/[^\x00-a]{12,}[^b-\xff]*/B
/[^\s]*\s* [^\W]+\W+ [^\d]*?\d0 [^\d\w]{4,6}?\w*A/B
/a*[b-\x{200}]?a#a*[b-\x{200}]?b#[a-f]*[g-\x{200}]*#[g-\x{200}]*[a-c]*#[g-\x{200}]*[a-h]*/B
/^[\x{1234}\x{4321}]{2,4}?/
\x{1234}\x{1234}\x{1234}
# Check maximum non-UTF character size for the 16-bit library.
/\x{ffff}/
A\x{ffff}B
/\x{10000}/
/\o{20000}/
# Check maximum character size for the 32-bit library. These will all give
# errors in the 16-bit library.
/\x{110000}/
/\x{7fffffff}/
/\x{80000000}/
/\x{ffffffff}/
/\x{100000000}/
/\o{17777777777}/
/\o{20000000000}/
/\o{37777777777}/
/\o{40000000000}/
/\x{7fffffff}\x{7fffffff}/I
/\x{80000000}\x{80000000}/I
/\x{ffffffff}\x{ffffffff}/I
# Non-UTF characters
/.{2,3}/
\x{400000}\x{400001}\x{400002}\x{400003}
/\x{400000}\x{800000}/IBi
# Check character ranges
/[\H]/IB
/[\V]/IB
/(*THEN:\[A]{65501})/expand
# We can use pcre2test's utf8_input modifier to create wide pattern characters,
# even though this test is run when UTF is not supported.
/a\x{d800}b/utf8_input
 €b
a\x{d800}b
a\o{154000}b
\= Expect warning unless 32bit
a\N{U+d800}b
/a\x{ffff}b/utf8_input
aï¿¿b
a\x{ffff}b
a\o{177777}b
a\N{U+ffff}b
/abý¿¿¿¿¿z/utf8_input
abý¿¿¿¿¿z
ab\x{7fffffff}z
ab\o{17777777777}z
ab\N{U+7fffffff}z
/abÿý¿¿¿¿¿z/utf8_input
abÿý¿¿¿¿¿z
ab\x{ffffffff}z
/abÿAz/utf8_input
abÿAz
ab\x{80000041}z
\= Expect no match
abAz
aAz
ab\377Az
ab\xff\N{U+0041}z
ab\N{U+ff}\N{U+41}z
/ab\x{80000041}z/
ab\x{80000041}z
/(?i:A{1,}\6666666666)/
A\x{1b6}6666666
/abc/substitute_extended,replace=>\777<
abc
/abc/substitute_extended,replace=>\o{012345}<
abc
# Character range merging tests
/[\x{100}-\x{200}\H\x{8000}-\x{9000}]/B
/[\x{100}-\x{200}\V\x{8000}-\x{9000}]/B
/[\x00-\x{6000}\x{3000}-\x{ffff}]#[\x00-\x{6000}\x{3000}-\x{ffff}]{5,7}?/B
/[\x00-\x{6000}\x{3000}-\x{ffffffff}]#[\x00-\x{6000}\x{3000}-\x{ffffffff}]{5,7}?/B
/[\x00-\x2f\x11-\xff]*?!/B
abcd!e
/i/turkish_casing
# Character list tests
/([\x{100}-\x{7fff}\x{9000}\x{9002}\x{9004}\x{9006}\x{9008}\x{10000}-\x{7fffffff}]{3,8}?).#/B
\x{9001}\x{9007}\x{8000}\x{ffff}\x{9002}\x{7fff}\x{10000}\x{7fffffff}\x{500000}\x{9006}#
/([\x{3000}\x{3001}\x{3003}\x{3004}\x{3006}\x{3007}\x{8000}-\x{ffff}\x{100001}\x{100002}\x{100004}\x{100005}\x{100007}\x{100008}\x{10000a}\x{10000b}\x{80000000}-\x{ffffffff}]{5,}).#/B
\x{2fff}\x{3002}\x{7fff}\x{100000}\x{7fffffff}\x{3000}\x{3007}\x{8000}\x{ffff}\x{100001}\x{10000b}\x{80000000}\x{ffffffff}\x{3000}#
/([^\x{4000}\x{4002}\x{4004}\x{4005}\x{4007}\x{4009}\x{400a}\x{f000}\x{f002}\x{f004}\x{f005}\x{f007}\x{f009}\x{f00a}\x{100000}\x{100002}\x{100004}\x{100005}\x{100007}\x{100009}\x{10000a}\x{a0000000}\x{a0000002}\x{a0000004}\x{a0000005}\x{a0000007}\x{a0000009}\x{a000000a}]+).#/B
\x{4000}\x{4002}\x{4004}\x{4005}\x{4007}\x{4009}\x{400a}\x{3fff}\x{4001}\x{4003}\x{4006}\x{4008}\x{400b}\x{100}#
\x{f000}\x{f002}\x{f004}\x{f005}\x{f007}\x{f009}\x{f00a}\x{efff}\x{f001}\x{f003}\x{f006}\x{f008}\x{f00b}\x{100}#
\x{100000}\x{100002}\x{100004}\x{100005}\x{100007}\x{100009}\x{10000a}\x{fffff}\x{100001}\x{100003}\x{100006}\x{100008}\x{10000b}\x{100}#
\x{a0000000}\x{a0000002}\x{a0000004}\x{a0000005}\x{a0000007}\x{a0000009}\x{a000000a}\x{9fffffff}\x{a0000001}\x{a0000003}\x{a0000006}\x{a0000008}\x{a000000b}\x{100}#
# --------------
# EXTENDED CHARACTER CLASSES (UTS#18)
# META_BIGVALUE tests
/\x{80000000}/B
\x{80000000}
\= Expect no match
\x{7fffffff}
\x{80000001}
/[\x{80000000}-\x{8000000f}\x{8fffffff}]/B
\x{80000002}
\x{8fffffff}
\= Expect no match
\x{7fffffff}
\x{90000000}
/\x{80000000}/B,alt_extended_class
\x{80000000}
\= Expect no match
\x{7fffffff}
\x{80000001}
/[\x{80000000}-\x{8000000f}\x{8fffffff}]/B,alt_extended_class
\x{80000002}
\x{8fffffff}
\= Expect no match
\x{7fffffff}
\x{90000000}
/[\x{80000000}-\x{8000000f}--\x{80000002}]/B,alt_extended_class
\x{80000001}
\x{80000003}
\= Expect no match
\x{80000002}
/[[\x{80000000}-\x{8000000f}]--[\x{80000002}]]/B,alt_extended_class
\x{80000001}
\x{80000003}
\= Expect no match
\x{80000002}
# --------------
# EXTENDED CHARACTER CLASSES (Perl)
# META_BIGVALUE tests
/(?[[\x{80000000}-\x{8000000f}]+\x{8fffffff}])/B
\x{80000002}
\x{8fffffff}
\= Expect no match
\x{7fffffff}
\x{90000000}
/(?[[\x{80000000}-\x{8000000f}]-\x{80000002}])/B
\x{80000001}
\x{80000003}
\= Expect no match
\x{80000002}
/(?[[\x{80000000}-\x{8000000f}]-\x{80000002}])/B
\x{80000001}
\x{80000003}
\= Expect no match
\x{80000002}
# --------------
# End of testinput11

@@ -0,0 +1,715 @@
# This set of tests is for UTF-16 and UTF-32 support, including Unicode
# properties. It is relevant only to the 16-bit and 32-bit libraries. The
# output is different for each library, so there are separate output files.
/ÃÃÃxxx/IB,utf,no_utf_check
/abc/utf
Ã]
# Check maximum character size
/\x{ffff}/IB,utf
/\x{10000}/IB,utf
/\x{100}/IB,utf
/\x{1000}/IB,utf
/\x{10000}/IB,utf
/\x{100000}/IB,utf
/\x{10ffff}/IB,utf
/[\x{ff}]/IB,utf
/[\x{100}]/IB,utf
/\x80/IB,utf
/\xff/IB,utf
/\x{D55c}\x{ad6d}\x{C5B4}/IB,utf
\x{D55c}\x{ad6d}\x{C5B4}
/\x{65e5}\x{672c}\x{8a9e}/IB,utf
\x{65e5}\x{672c}\x{8a9e}
/\x{80}/IB,utf
/\x{084}/IB,utf
/\x{104}/IB,utf
/\x{861}/IB,utf
/\x{212ab}/IB,utf
/[^ab\xC0-\xF0]/IB,utf
\x{f1}
\x{bf}
\x{100}
\x{1000}
\= Expect no match
\x{c0}
\x{f0}
/(\x{100}+|x)/IB,utf
/(\x{100}*a|x)/IB,utf
/(\x{100}{0,2}a|x)/IB,utf
/(\x{100}{1,2}a|x)/IB,utf
/\x{100}/IB,utf
/a\x{100}\x{101}*/IB,utf
/a\x{100}\x{101}+/IB,utf
/[^\x{c4}]/IB
/[\x{100}]/IB,utf
\x{100}
Z\x{100}
\x{100}Z
/[\xff]/IB,utf
>\x{ff}<
/[^\xff]/IB,utf
/\x{100}abc(xyz(?1))/IB,utf
/\777/I,utf
\x{1ff}
\777
/\x{100}+\x{200}/IB,utf
/\x{100}+X/IB,utf
/^[\QÄ€\E-\QÅ<51>\E/B,utf
/X/utf
XX\x{d800}\=no_utf_check
XX\x{da00}\=no_utf_check
XX\x{dc00}\=no_utf_check
XX\x{de00}\=no_utf_check
XX\x{dfff}\=no_utf_check
\= Expect UTF error
XX\x{d800}
XX\x{da00}
XX\x{dc00}
XX\x{de00}
XX\x{dfff}
XX\x{110000}
XX\x{d800}\x{1234}
\= Expect no match
XX\x{d800}\=offset=3
/(?<=.)X/utf
XX\x{d800}\=offset=3
/(*UTF16)\x{11234}/
abcd\x{11234}pqr
/(*UTF)\x{11234}/I
abcd\x{11234}pqr
/(*UTF-32)\x{11234}/
abcd\x{11234}pqr
/(*UTF-32)\x{112}/
abcd\x{11234}pqr
/(*CRLF)(*UTF16)(*BSR_UNICODE)a\Rb/I
/(*CRLF)(*UTF32)(*BSR_UNICODE)a\Rb/I
/\h/I,utf
ABC\x{09}
ABC\x{20}
ABC\x{a0}
ABC\x{1680}
ABC\x{180e}
ABC\x{2000}
ABC\x{202f}
ABC\x{205f}
ABC\x{3000}
/\v/I,utf
ABC\x{0a}
ABC\x{0b}
ABC\x{0c}
ABC\x{0d}
ABC\x{85}
ABC\x{2028}
/\h*A/I,utf
CDBABC
\x{2000}ABC
/\R*A/I,bsr=unicode,utf
CDBABC
\x{2028}A
/\v+A/I,utf
/\s?xxx\s/I,utf
/\sxxx\s/I,utf,tables=2
AB\x{85}xxx\x{a0}XYZ
AB\x{a0}xxx\x{85}XYZ
/\S \S/I,utf,tables=2
\x{a2} \x{84}
A Z
/a+/utf
a\x{123}aa\=offset=1
a\x{123}aa\=offset=2
a\x{123}aa\=offset=3
\= Expect no match
a\x{123}aa\=offset=4
\= Expect bad offset error
a\x{123}aa\=offset=5
a\x{123}aa\=offset=6
/\x{1234}+/Ii,utf
/\x{1234}+?/Ii,utf
/\x{1234}++/Ii,utf
/\x{1234}{2}/Ii,utf
/[^\x{c4}]/IB,utf
/X+\x{200}/IB,utf
/\R/I,utf
# Check bad offset
/a/utf
\= Expect bad UTF-16 offset, or no match in 32-bit
\x{10000}\=offset=1
\x{10000}ab\=offset=1
\= Expect 16-bit match, 32-bit no match
\x{10000}ab\=offset=2
\= Expect no match
\x{10000}ab\=offset=3
\= Expect no match in 16-bit, bad offset in 32-bit
\x{10000}ab\=offset=4
\= Expect bad offset
\x{10000}ab\=offset=5
/í¼€/utf
/\w+\x{C4}/B,utf
a\x{C4}\x{C4}
/\w+\x{C4}/B,utf,tables=2
a\x{C4}\x{C4}
/\W+\x{C4}/B,utf
!\x{C4}
/\W+\x{C4}/B,utf,tables=2
!\x{C4}
/\W+\x{A1}/B,utf
!\x{A1}
/\W+\x{A1}/B,utf,tables=2
!\x{A1}
/X\s+\x{A0}/B,utf
X\x20\x{A0}\x{A0}
/X\s+\x{A0}/B,utf,tables=2
X\x20\x{A0}\x{A0}
/\S+\x{A0}/B,utf
X\x{A0}\x{A0}
/\S+\x{A0}/B,utf,tables=2
X\x{A0}\x{A0}
/\x{a0}+\s!/B,utf
\x{a0}\x20!
/\x{a0}+\s!/B,utf,tables=2
\x{a0}\x20!
/(*UTF)abc/never_utf
/abc/utf,never_utf
/A\x{391}\x{10427}\x{ff3a}\x{1fb0}/IBi,utf
/A\x{391}\x{10427}\x{ff3a}\x{1fb0}/IB,utf
/AB\x{1fb0}/IB,utf
/AB\x{1fb0}/IBi,utf
/\x{401}\x{420}\x{421}\x{422}\x{423}\x{424}\x{425}\x{426}\x{427}\x{428}\x{429}\x{42a}\x{42b}\x{42c}\x{42d}\x{42e}\x{42f}/Ii,utf
\x{401}\x{420}\x{421}\x{422}\x{423}\x{424}\x{425}\x{426}\x{427}\x{428}\x{429}\x{42a}\x{42b}\x{42c}\x{42d}\x{42e}\x{42f}
\x{451}\x{440}\x{441}\x{442}\x{443}\x{444}\x{445}\x{446}\x{447}\x{448}\x{449}\x{44a}\x{44b}\x{44c}\x{44d}\x{44e}\x{44f}
/[â±¥]/Bi,utf
/[^â±¥]/Bi,utf
/[[:blank:]]/B,ucp
/\x{212a}+/Ii,utf
KKkk\x{212a}
/s+/Ii,utf
SSss\x{17f}
# Non-UTF characters should give errors in both 16-bit and 32-bit modes.
/\x{110000}/utf
/\o{4200000}/utf
/\x{100}*A/IB,utf
A
/\x{100}*\d(?R)/IB,utf
/[Z\x{100}]/IB,utf
Z\x{100}
\x{100}
\x{100}Z
/[z-\x{100}]/IB,utf
/[z\Qa-d]Ä€\E]/IB,utf
\x{100}
Ä€
/[ab\x{100}]abc(xyz(?1))/IB,utf
/\x{100}*\s/IB,utf
/\x{100}*\d/IB,utf
/\x{100}*\w/IB,utf
/\x{100}*\D/IB,utf
/\x{100}*\S/IB,utf
/\x{100}*\W/IB,utf
/[\x{105}-\x{109}]/IBi,utf
\x{104}
\x{105}
\x{109}
\= Expect no match
\x{100}
\x{10a}
/[z-\x{100}]/IBi,utf
Z
z
\x{39c}
\x{178}
|
\x{80}
\x{ff}
\x{100}
\x{101}
\= Expect no match
\x{102}
Y
y
/[z-\x{100}]/IBi,utf
/\x{3a3}B/IBi,utf
/./utf
\x{110000}
/(*UTF)abý¿¿¿¿¿z/B
/abý¿¿¿¿¿z/utf
/[\W\p{Any}]/B
abc
123
/[\W\pL]/B
abc
\x{100}
\x{308}
\= Expect no match
123
/[\s[:^ascii:]]/B,ucp
/\pP/ucp
\x{7fffffff}
# A special extra option allows escaped surrogate code points in 32-bit mode,
# but subjects containing them must not be UTF-checked. These patterns give
# errors in 16-bit mode.
/\x{d800}/I,utf,allow_surrogate_escapes
\x{d800}\=no_utf_check
/\udfff\o{157401}/utf,alt_bsux,allow_surrogate_escapes
\x{dfff}\x{df01}\=no_utf_check
# This has different starting code units in 8-bit mode.
/^[^ab]/IB,utf
c
\x{ff}
\x{100}
\= Expect no match
aaa
# Offsets are different in 8-bit mode.
/(?<=abc)(|def)/g,utf,replace=<$0>,substitute_callout
123abcáyzabcdef789abcሴqr
# A few script run tests in non-UTF mode (but they need Unicode support)
/^(*script_run:.{4})/
\x{3041}\x{30a1}\x{3007}\x{3007} Hiragana Katakana Han Han
\x{30a1}\x{3041}\x{3007}\x{3007} Katakana Hiragana Han Han
\x{1100}\x{2e80}\x{2e80}\x{1101} Hangul Han Han Hangul
/^(*sr:.*)/utf,allow_surrogate_escapes
\x{2e80}\x{3105}\x{2e80}\x{30a1} Han Bopomofo Han Katakana
\x{d800}\x{dfff} Surrogates (Unknown) \=no_utf_check
/(?(n/utf
/(?(á/utf
# Invalid UTF-16/32 tests.
/.../g,match_invalid_utf
abcd\x{df00}wxzy\x{df00}pqrs
abcd\x{80}wxzy\x{df00}pqrs
/abc/match_invalid_utf
ab\x{df00}ab\=ph
\= Expect no match
ab\x{df00}cdef\=ph
/.a/match_invalid_utf
ab\=ph
ab\=ps
\= Expect no match
b\x{df00}\=ph
b\x{df00}\=ps
/.a$/match_invalid_utf
ab\=ph
ab\=ps
\= Expect no match
b\x{df00}\=ph
b\x{df00}\=ps
/ab$/match_invalid_utf
ab\x{df00}cdeab
\= Expect no match
ab\x{df00}cde
/.../g,match_invalid_utf
abcd\x{80}wxzy\x{df00}pqrs
/(?<=x)../g,match_invalid_utf
abcd\x{80}wxzy\x{df00}pqrs
abcd\x{80}wxzy\x{df00}xpqrs
/X$/match_invalid_utf
\= Expect no match
X\x{df00}
/(?<=..)X/match_invalid_utf,aftertext
AB\x{df00}AQXYZ
AB\x{df00}AQXYZ\=offset=5
AB\x{df00}\x{df00}AXYZXC\=offset=5
\= Expect no match
AB\x{df00}XYZ
AB\x{df00}XYZ\=offset=3
AB\x{df00}AXYZ
AB\x{df00}AXYZ\=offset=4
AB\x{df00}\x{df00}AXYZ\=offset=5
/.../match_invalid_utf
\= Expect no match
A\x{d800}B
A\x{110000}B
/aa/utf,ucp,match_invalid_utf,global
aa\x{d800}aa
/aa/utf,ucp,match_invalid_utf,global
\x{d800}aa
/A\z/utf,match_invalid_utf
A\x{df00}\n
/ab$/match_invalid_utf
\= Expect no match
ab\x{df00}cde
/ab\z/match_invalid_utf
\= Expect no match
ab\x{df00}cde
/ab\Z/match_invalid_utf
\= Expect no match
ab\x{df00}cde
/(..)(*scs:(1)ab\z)/match_invalid_utf
ab\x{df00}cde
/(..)(*scs:(1)ab\Z)/match_invalid_utf
ab\x{df00}cde
/(..)(*scs:(1)ab$)/match_invalid_utf
ab\x{df00}cde
# ----------------------------------------------------
/(*UTF)(?=\x{123})/I
/[\x{c1}\x{e1}]X[\x{145}\x{146}]/I,utf
/[\xff\x{ffff}]/I,utf
/[\xff\x{ff}]/I,utf
/[\xff\x{ff}]/I
/[Ss]/I
/[Ss]/I,utf
/(?:\x{ff}|\x{3000})/I,utf
# ----------------------------------------------------
# UCP and casing tests
/\x{120}/iI
/\x{c1}/iI,ucp
/[\x{120}\x{121}]/iB,ucp
/[ab\x{120}]+/iB,ucp
aABb\x{121}\x{120}
/\x{c1}/i,no_start_optimize
\= Expect no match
\x{e1}
/\x{120}\x{c1}/i,ucp,no_start_optimize
\x{121}\x{e1}
/\x{120}\x{c1}/i,ucp
\x{121}\x{e1}
/[^\x{120}]/i,no_start_optimize
\x{121}
/[^\x{120}]/i,ucp,no_start_optimize
\= Expect no match
\x{121}
/[^\x{120}]/i
\x{121}
/[^\x{120}]/i,ucp
\= Expect no match
\x{121}
/\x{120}{2}/i,ucp
\x{121}\x{121}
/[^\x{120}]{2}/i,ucp
\= Expect no match
\x{121}\x{121}
/\x{c1}+\x{e1}/iB,ucp
\x{c1}\x{c1}\x{c1}
/\x{c1}+\x{e1}/iIB,ucp
\x{c1}\x{c1}\x{c1}
\x{e1}\x{e1}\x{e1}
/a|\x{c1}/iI,ucp
\x{e1}xxx
/\x{c1}|\x{e1}/iI,ucp
/X(\x{e1})Y/ucp,replace=>\U$1<,substitute_extended
X\x{e1}Y
/X(\x{121})Y/ucp,replace=>\U$1<,substitute_extended
X\x{121}Y
/s/i,ucp
\x{17f}
/s/i,utf
\x{17f}
/[^s]/i,ucp
\= Expect no match
\x{17f}
/[^s]/i,utf
\= Expect no match
\x{17f}
/(.) \1/i,ucp
i I
/(.) \1/i,ucp,turkish_casing
\= Expect no match
i I
/(.) \1/i,ucp
i I
\x{212a} k
\= Expect no match
i \x{0130}
\x{0131} I
/(.) \1/i,ucp,turkish_casing
\x{212a} k
i \x{0130}
\x{0131} I
\= Expect no match
i I
/(.) (?r:\1)/i,ucp,turkish_casing
i I
\= Expect no match
i \x{0130}
\x{0131} I
\x{212a} k
/[a-z][^i]I/ucp,turkish_casing
bII
b\x{0130}I
b\x{0131}I
\= Expect no match
biI
/[a-z][^i]I/i,ucp,turkish_casing
b\x{0131}I
bII
\= Expect no match
biI
b\x{0130}I
/[a-z](?r:[^i])I/i,ucp,turkish_casing
b\x{0131}I
b\x{0130}I
\= Expect no match
bII
biI
/b(?r:[\x{00FF}-\x{FFEE}])/i,ucp,turkish_casing
b\x{0130}
b\x{0131}
B\x{212a}
\= Expect no match
bi
bI
bk
/[\x60-\x7f]/i,ucp,turkish_casing
i
\= Expect no match
I
/[\x60-\xc0]/i,ucp,turkish_casing
i
\= Expect no match
I
/[\x80-\xc0]/i,ucp,turkish_casing
\= Expect no match
i
I
# ----------------------------------------------------
/b[\x{00FF}-\x{FFEE}]/ir
b\x{0130}
b\x{0131}
B\x{212a}
\= Expect no match
bi
bI
bk
# Quantifier after a literal that has the value of META_ACCEPT (not UTF). This
# fails in 16-bit mode, but is OK for 32-bit.
/\x{802a0000}*/
\x{802a0000}\x{802a0000}
# UTF matching without UTF, check invalid UTF characters
/\X++/
a\x{110000}\x{ffffffff}
# This used to loop in 32-bit mode; it will fail in 16-bit mode.
/[\x{ffffffff}]/caseless,ucp
\x{ffffffff}xyz
# These are 32-bit tests for handling 0xffffffff when in UCP caseless mode. They
# will give errors in 16-bit mode.
/k*\x{ffffffff}/caseless,ucp
\x{ffffffff}
/k+\x{ffffffff}/caseless,ucp,no_start_optimize
K\x{ffffffff}
\= Expect no match
\x{ffffffff}\x{ffffffff}
/k{2}\x{ffffffff}/caseless,ucp,no_start_optimize
\= Expect no match
\x{ffffffff}\x{ffffffff}\x{ffffffff}
/k\x{ffffffff}/caseless,ucp,no_start_optimize
K\x{ffffffff}
\= Expect no match
\x{ffffffff}\x{ffffffff}\x{ffffffff}
/k{2,}?Z/caseless,ucp,no_start_optimize,no_auto_possess
\= Expect no match
Kk\x{ffffffff}\x{ffffffff}\x{ffffffff}Z
/[sk](?r:[sk])[sk]/Bi,ucp
SKS
sks
\x{212a}S\x{17f}
\x{17f}K\x{212a}
\= Expect no match
s\x{212a}s
K\x{17f}K
# ---------------------------------------------------------
# End of testinput12

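The "Check bad offset" tests above (`\x{10000}ab` with `offset=1` to `offset=5`) depend on how many code units `\x{10000}` occupies in each library width. In UTF-16 it is a surrogate pair (two units); in UTF-32 it is one unit.

The Python sketch below is a simplified model, not PCRE2 code. The helper names `utf16_units` and `classify_offset` are hypothetical. PCRE2 itself distinguishes two cases, described in pcre2api as `PCRE2_ERROR_BADUTFOFFSET` and `PCRE2_ERROR_BADOFFSET`:

- a start offset inside a character (bad UTF offset);
- a start offset past the end of the subject (bad offset).

```python
def utf16_units(cp: int) -> list[int]:
    """Encode one code point as UTF-16 code units (a surrogate pair above U+FFFF)."""
    if cp < 0x10000:
        return [cp]
    v = cp - 0x10000
    return [0xD800 + (v >> 10), 0xDC00 + (v & 0x3FF)]


def classify_offset(subject: list[int], offset: int, width: int) -> str:
    """Simplified model of a start-offset check for a UTF subject.

    subject is a list of code points; width is 16 or 32. Offsets count code
    units, as PCRE2 start offsets do.
    """
    units: list[int] = []
    for cp in subject:
        # In 32-bit mode every code point is exactly one code unit.
        units.extend(utf16_units(cp) if width == 16 else [cp])
    if offset > len(units):
        return "past end"
    # In 16-bit mode, pointing at a low surrogate means "inside a character".
    if width == 16 and offset < len(units) and 0xDC00 <= units[offset] <= 0xDFFF:
        return "mid-character"
    return "ok"


subject = [0x10000, ord("a"), ord("b")]
for off in range(1, 6):
    print(off, classify_offset(subject, off, 16), classify_offset(subject, off, 32))
```

The model lines up with the comments in the test file:

- `offset=1` lands inside the surrogate pair in 16-bit mode.
- `offset=4` is the end of the subject in 16-bit mode but past the end in 32-bit mode.
- `offset=5` is past the end in both widths.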
@@ -0,0 +1,22 @@
# These DFA tests are for the handling of characters greater than 255 in
# 16-bit or 32-bit, non-UTF mode.
#forbid_utf
#subject dfa
/^\x{ffff}+/i
\x{ffff}
/^\x{ffff}?/i
\x{ffff}
/^\x{ffff}*/i
\x{ffff}
/^\x{ffff}{3}/i
\x{ffff}\x{ffff}\x{ffff}
/^\x{ffff}{0,3}/i
\x{ffff}
# End of testinput13

@@ -0,0 +1,108 @@
# These test special UTF and UCP features of DFA matching. The output is
# different for the different widths.
#subject dfa
# ----------------------------------------------------
# These are a selection of the more comprehensive tests that are run for
# non-DFA matching.
/X/utf
XX\x{d800}
XX\x{d800}\=offset=3
XX\x{d800}\=no_utf_check
XX\x{da00}
XX\x{da00}\=no_utf_check
XX\x{dc00}
XX\x{dc00}\=no_utf_check
XX\x{de00}
XX\x{de00}\=no_utf_check
XX\x{dfff}
XX\x{dfff}\=no_utf_check
XX\x{110000}
XX\x{d800}\x{1234}
/badutf/utf
X\xdf
XX\xef
XXX\xef\x80
X\xf7
XX\xf7\x80
XXX\xf7\x80\x80
/shortutf/utf
XX\xdf\=ph
XX\xef\=ph
XX\xef\x80\=ph
\xf7\=ph
\xf7\x80\=ph
# ----------------------------------------------------
# UCP and casing tests - except for the first two, these will all fail in 8-bit
# mode because they are testing UCP without UTF and use characters > 255.
/\x{c1}/i,no_start_optimize
\= Expect no match
\x{e1}
/\x{c1}+\x{e1}/iB,ucp
\x{c1}\x{c1}\x{c1}
\x{e1}\x{e1}\x{e1}
/\x{120}\x{c1}/i,ucp,no_start_optimize
\x{121}\x{e1}
/\x{120}\x{c1}/i,ucp
\x{121}\x{e1}
/[^\x{120}]/i,no_start_optimize
\x{121}
/[^\x{120}]/i,ucp,no_start_optimize
\= Expect no match
\x{121}
/[^\x{120}]/i
\x{121}
/[^\x{120}]/i,ucp
\= Expect no match
\x{121}
/\x{120}{2}/i,ucp
\x{121}\x{121}
/[^\x{120}]{2}/i,ucp
\= Expect no match
\x{121}\x{121}
# ----------------------------------------------------
# ----------------------------------------------------
# Tests for handling 0xffffffff in caseless UCP mode. They only apply to 32-bit
# mode; for the other widths they will fail.
/k*\x{ffffffff}/caseless,ucp
\x{ffffffff}
/k+\x{ffffffff}/caseless,ucp,no_start_optimize
K\x{ffffffff}
\= Expect no match
\x{ffffffff}\x{ffffffff}
/k{2}\x{ffffffff}/caseless,ucp,no_start_optimize
\= Expect no match
\x{ffffffff}\x{ffffffff}\x{ffffffff}
/k\x{ffffffff}/caseless,ucp,no_start_optimize
K\x{ffffffff}
\= Expect no match
\x{ffffffff}\x{ffffffff}\x{ffffffff}
/k{2,}?Z/caseless,ucp,no_start_optimize,no_auto_possess
\= Expect no match
Kk\x{ffffffff}\x{ffffffff}\x{ffffffff}Z
# ----------------------------------------------------
# End of testinput14

@@ -0,0 +1,253 @@
# These are:
#
# (1) Tests of the match-limiting features. The results are different for
# interpretive or JIT matching, so this test should not be run with JIT. The
# same tests are run using JIT in test 17.
# (2) Other tests that must not be run with JIT.
# These tests are first so that they don't inherit a large enough heap frame
# vector from a previous test.
/(*LIMIT_HEAP=21)\[(a)]{60}/expand
\[a]{60}
"(*LIMIT_HEAP=21)()((?))()()()()()()()()()()()()()()()()()()()()()()()(())()()()()()()()()()()()()()()()()()()()()()(())()()()()()()()()()()()()()"
xx
# -----------------------------------------------------------------------
/(a+)*zz/I
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaazzbbbbbb\=find_limits_noheap
aaaaaaaaaaaaaz\=find_limits_noheap
!((?:\s|//.*\\n|/[*](?:\\n|.)*?[*]/)*)!I
/* this is a C style comment */\=find_limits_noheap
/^(?>a)++/
aa\=find_limits_noheap
aaaaaaaaa\=find_limits_noheap
/(a)(?1)++/
aa\=find_limits_noheap
aaaaaaaaa\=find_limits_noheap
/a(?:.)*?a/ims
abbbbbbbbbbbbbbbbbbbbba\=find_limits_noheap
/a(?:.(*THEN))*?a/ims
abbbbbbbbbbbbbbbbbbbbba\=find_limits_noheap
/a(?:.(*THEN:ABC))*?a/ims
abbbbbbbbbbbbbbbbbbbbba\=find_limits_noheap
/^(?>a+)(?>b+)(?>c+)(?>d+)(?>e+)/
aabbccddee\=find_limits_noheap
/^(?>(a+))(?>(b+))(?>(c+))(?>(d+))(?>(e+))/
aabbccddee\=find_limits_noheap
/^(?>(a+))(?>b+)(?>(c+))(?>d+)(?>(e+))/
aabbccddee\=find_limits_noheap
/(*LIMIT_MATCH=12bc)abc/
/(*LIMIT_MATCH=4294967290)abc/
/(*LIMIT_DEPTH=4294967280)abc/I
/(a+)*zz/
\= Expect no match
aaaaaaaaaaaaaz
\= Expect limit exceeded
aaaaaaaaaaaaaz\=match_limit=3000
/(a+)*zz/
\= Expect limit exceeded
aaaaaaaaaaaaaz\=depth_limit=10
/(*LIMIT_MATCH=3000)(a+)*zz/I
\= Expect limit exceeded
aaaaaaaaaaaaaz
\= Expect limit exceeded
aaaaaaaaaaaaaz\=match_limit=60000
/(*LIMIT_MATCH=60000)(*LIMIT_MATCH=3000)(a+)*zz/I
\= Expect limit exceeded
aaaaaaaaaaaaaz
/(*LIMIT_MATCH=60000)(a+)*zz/I
\= Expect no match
aaaaaaaaaaaaaz
\= Expect limit exceeded
aaaaaaaaaaaaaz\=match_limit=3000
/(*LIMIT_DEPTH=10)(a+)*zz/I
\= Expect limit exceeded
aaaaaaaaaaaaaz
\= Expect limit exceeded
aaaaaaaaaaaaaz\=depth_limit=1000
/(*LIMIT_DEPTH=10)(*LIMIT_DEPTH=1000)(a+)*zz/I
\= Expect no match
aaaaaaaaaaaaaz
/(*LIMIT_DEPTH=1000)(a+)*zz/I
\= Expect no match
aaaaaaaaaaaaaz
\= Expect limit exceeded
aaaaaaaaaaaaaz\=depth_limit=10
# These three have infinitely nested recursions.
/((?2))((?1))/
abc
/((?(R2)a+|(?1)b))()/
aaaabcde
/(?(R)a*(?1)|((?R))b)/
aaaabcde
# The allusedtext modifier does not work with JIT, which does not maintain
# the leftchar/rightchar data.
/abc(?=xyz)/allusedtext
abcxyzpqr
abcxyzpqr\=aftertext
/(?<=pqr)abc(?=xyz)/allusedtext
xyzpqrabcxyzpqr
xyzpqrabcxyzpqr\=aftertext
/a\b/
a.\=allusedtext
a\=allusedtext
/abc\Kxyz/
abcxyz\=allusedtext
/abc(?=xyz(*ACCEPT))/
abcxyz\=allusedtext
/abc(?=abcde)(?=ab)/allusedtext
abcabcdefg
#subject allusedtext
/(?<=abc)123/
xyzabc123pqr
xyzabc12\=ps
xyzabc12\=ph
/\babc\b/
+++abc+++
+++ab\=ps
+++ab\=ph
/(?<=abc)def/
abc\=ph
/(?<=123)(*MARK:xx)abc/mark
xxxx123a\=ph
xxxx123a\=ps
/(?<=(?<=a)b)c.*/I
abc\=ph
\= Expect no match
xbc\=ph
/(?<=ab)c.*/I
abc\=ph
\= Expect no match
xbc\=ph
/abc(?<=bc)def/
xxxabcd\=ph
/(?<=ab)cdef/
xxabcd\=ph
/(?<=(?<=(?<=a)b)c)./I
123abcXYZ
/(?<=ab(cd(?<=...)))./I
abcdX
/(?<=ab((?<=...)cd))./I
ZabcdX
/(?<=((?<=(?<=ab).))(?1)(?1))./I
abxZ
#subject
# -------------------------------------------------------------------
# These tests provoke recursion loops, which give a different error message
# when JIT is used.
/(?R)/I
abcd
/(a|(?R))/I
abcd
defg
/(ab|(bc|(de|(?R))))/I
abcd
fghi
/(ab|(bc|(de|(?1))))/I
abcd
fghi
/x(ab|(bc|(de|(?1)x)x)x)/I
xab123
xfghi
/(?!\w)(?R)/
abcd
=abc
/(?=\w)(?R)/
=abc
abcd
/(?<!\w)(?R)/
abcd
/(?<=\w)(?R)/
abcd
/(a+|(?R)b)/
aaa
bbb
/[^\xff]((?1))/BI
abcd
# These tests don't behave the same with JIT
/\w+(?C1)/BI,no_auto_possess
abc\=callout_fail=1
/(*NO_AUTO_POSSESS)\w+(?C1)/BI
abc\=callout_fail=1
# This test breaks the JIT stack limit
/(|]+){2,2452}/
(|]+){2,2452}
/b(?<!ax)(?!cx)/allusedtext
abc
abcz
# This test triggers the recursion limit in the interpreter, but completes in
# JIT. It's in testinput2 with disable_recurse_loop_check to get it to work
# in the interpreter.
/(a(?1)z||(?1)++)$/
abcd
# End of testinput15

@@ -0,0 +1,9 @@
# This test is run only when JIT support is not available. It checks that an
# attempt to use it has the expected behaviour. It also tests things that
# are different without JIT.
/abc/I,jit,jitverify
/a*/I
# End of testinput16

File diff suppressed because one or more lines are too long

@@ -0,0 +1,147 @@
# This set of tests is run only with the 8-bit library. It tests the POSIX
# interface, which is supported only with the 8-bit library. This test should
# not be run with JIT (which is not available for the POSIX interface).
#forbid_utf
#pattern posix
# Test some invalid options
/abc/auto_callout
/abc/
abc\=find_limits
/abc/
abc\=partial_hard
/a(())bc/parens_nest_limit=1
/abc/allow_surrogate_escapes,max_pattern_length=2
# Real tests
/abc/
abc
/^abc|def/
abcdef
abcdef\=notbol
/.*((abc)$|(def))/
defabc
defabc\=noteol
/the quick brown fox/
the quick brown fox
\= Expect no match
The Quick Brown Fox
/the quick brown fox/i
the quick brown fox
The Quick Brown Fox
/(*LF)abc.def/
\= Expect no match
abc\ndef
/(*LF)abc$/
abc
abc\n
/(abc)\2/
/(abc\1)/
\= Expect no match
abc
/a*(b+)(z)(z)/
aaaabbbbzzzz
aaaabbbbzzzz\=ovector=0
aaaabbbbzzzz\=ovector=1
aaaabbbbzzzz\=ovector=2
/(*ANY)ab.cd/
ab-cd
ab=cd
\= Expect no match
ab\ncd
/ab.cd/s
ab-cd
ab=cd
ab\ncd
/a(b)c/posix_nosub
abc
/a(?P<name>b)c/posix_nosub
abc
/(a)\1/posix_nosub
zaay
/a?|b?/
abc
\= Expect no match
ddd\=notempty
/\w+A/
CDAAAAB
/\w+A/ungreedy
CDAAAAB
/\Biss\B/I,aftertext
Mississippi
/abc/\
"(?(?C)"
"(?(?C))"
/abcd/substitute_extended
/\[A]{1000000}**/expand,regerror_buffsize=31
/\[A]{1000000}**/expand,regerror_buffsize=32
//posix_nosub
\=offset=70000
/^d(e)$/posix
acdef\=posix_startend=2:4
acde\=posix_startend=2
\= Expect no match
acdef
acdef\=posix_startend=2
/^a\x{00}b$/posix
a\x{00}b\=posix_startend=0:3
/"A" 00 "B"/hex
A\x{00}B\=posix_startend=0:3
/ABC/use_length
ABC
/a\b(c/literal,posix
a\\b(c
/a\b(c/literal,posix,dotall
/((a)(b)?(c))/posix
123ace
123ace\=posix_startend=2:6
//posix
\= Expect errors
\=null_subject
abc\=null_subject
/(*LIMIT_HEAP=0)xx/posix
\= Expect error
xxxx
# End of testdata/testinput18

@@ -0,0 +1,25 @@
# This set of tests is run only with the 8-bit library. It tests the POSIX
# interface with UTF/UCP support, which is supported only with the 8-bit
# library. This test should not be run with JIT (which is not available for the
# POSIX interface).
#pattern posix
/a\x{1234}b/utf
a\x{1234}b
/\w/
\= Expect no match
+++\x{c2}
/\w/ucp
+++\x{c2}
/"^AB" 00 "\x{1234}$"/hex,utf
AB\x{00}\x{1234}\=posix_startend=0:6
/\w/utf
\= Expect UTF error
A\xabB
# End of testdata/testinput19

File diff suppressed because it is too large

@@ -0,0 +1,108 @@
# This set of tests exercises the serialization/deserialization and code copy
# functions in the library. It does not use UTF or JIT.
#forbid_utf
# Compile several patterns, push them onto the stack, and then write them
# all to a file.
#pattern push
/(?<NAME>(?&NAME_PAT))\s+(?<ADDR>(?&ADDRESS_PAT))
(?(DEFINE)
(?<NAME_PAT>[a-z]+)
(?<ADDRESS_PAT>\d+)
)/x
/^(?:((.)(?1)\2|)|((.)(?3)\4|.))$/i
#save testsaved1
# Do it again for some more patterns.
/(*MARK:A)(*SKIP:B)(C|X)/mark
/(?:(?<n>foo)|(?<n>bar))\k<n>/dupnames
#save testsaved2
#pattern -push
# Reload the patterns, then pop them one by one and check them.
#load testsaved1
#load testsaved2
#pop info
foofoo
barbar
#pop mark
C
\= Expect no match
D
#pop
AmanaplanacanalPanama
#pop info
metcalfe 33
# Check for an error when different tables are used.
/abc/push,tables=1
/xyz/push,tables=2
#save testsaved1
#pop
xyz
#pop
abc
#pop should give an error
pqr
/abcd/pushcopy
abcd
#pop
abcd
#pop should give an error
/abcd/push
#popcopy
abcd
#pop
abcd
/abcd/push
#save testsaved1
#pop should give an error
#load testsaved1
#popcopy
abcd
#pop
abcd
#pop should give an error
/abcd/pushtablescopy
abcd
#popcopy
abcd
#pop
abcd
# Must only specify one of these
//push,pushcopy
//push,pushtablescopy
//pushcopy,pushtablescopy
# End of testinput20

@@ -0,0 +1,18 @@
# These are tests of \C that do not involve UTF. They are not run when \C is
# disabled by compiling with --enable-never-backslash-C.
/\C+\D \C+\d \C+\S \C+\s \C+\W \C+\w \C+. \C+\R \C+\H \C+\h \C+\V \C+\v \C+\Z \C+\z \C+$/Bx
/\D+\C \d+\C \S+\C \s+\C \W+\C \w+\C .+\C \R+\C \H+\C \h+\C \V+\C \v+\C a+\C \n+\C \C+\C/Bx
/ab\Cde/never_backslash_c
/ab\Cde/info
abXde
/(?<=ab\Cde)X/
abZdeX
/[\C]/
# End of testinput21

@@ -0,0 +1,107 @@
# Tests of \C when Unicode support is available. Note that \C is not supported
# for DFA matching in UTF mode, so this test is not run with -dfa. The output
# of this test is different in 8-, 16-, and 32-bit modes. Some tests may match
# in some widths and not in others.
/ab\Cde/utf,info
abXde
# This should produce an error diagnostic (\C in UTF lookbehind) in 8-bit and
# 16-bit modes, but not in 32-bit mode.
/(?<=ab\Cde)X/utf
ab!deXYZ
# Autopossessification tests
/\C+\X \X+\C/Bx
/\C+\X \X+\C/Bx,utf
/\C\X*TӅ;
{0,6}\v+
F
/utf
\= Expect no match
Ӆ\x0a
/\C(\W?ſ)'?{{/utf
\= Expect no match
\\C(\\W?ſ)'?{{
/X(\C{3})/utf
X\x{1234}
X\x{11234}Y
X\x{11234}YZ
/X(\C{4})/utf
X\x{1234}YZ
X\x{11234}YZ
X\x{11234}YZW
/X\C*/utf
XYZabcdce
/X\C*?/utf
XYZabcde
/X\C{3,5}/utf
Xabcdefg
X\x{1234}
X\x{1234}YZ
X\x{1234}\x{512}
X\x{1234}\x{512}YZ
X\x{11234}Y
X\x{11234}YZ
X\x{11234}\x{512}
X\x{11234}\x{512}YZ
X\x{11234}\x{512}\x{11234}Z
/X\C{3,5}?/utf
Xabcdefg
X\x{1234}
X\x{1234}YZ
X\x{1234}\x{512}
X\x{11234}Y
X\x{11234}YZ
X\x{11234}\x{512}YZ
X\x{11234}
/a\Cb/utf
aXb
a\nb
a\x{100}b
/a\C\Cb/utf
a\x{100}b
a\x{12257}b
a\x{12257}\x{11234}b
/ab\Cde/utf
abXde
# This one is here not because it's different to Perl, but because of the way
# the captured single code unit is displayed. (In Perl it becomes a character,
# and you can't tell the difference.)
/X(\C)(.*)/utf
X\x{1234}
X\nabc
# This one is here because Perl gives out a grumbly error message (quite
# correctly, but that messes up comparisons).
/a\Cb/utf
\= Expect no match in 8-bit mode
a\x{100}b
/^ab\C/utf,no_start_optimize
\= Expect no match - tests \C at end of subject
ab
/\C[^\v]+\x80/utf
[AΏBŀC]
/\C[^\d]+\x80/utf
[AΏBŀC]

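`\C` matches a single code unit rather than a whole character. That is why several of the `\C` tests above can match in some library widths and not in others. For example, `\x{11234}` is four code units in UTF-8, two in UTF-16, and one in UTF-32.

The minimal Python sketch below counts how many code units a code point occupies in each width. The helper `code_units` is hypothetical and is not part of PCRE2.

```python
def code_units(cp: int, width: int) -> int:
    """Number of code units one code point occupies in 8-, 16-, or 32-bit UTF."""
    if width == 32:
        return 1
    if width == 16:
        # Code points above U+FFFF need a surrogate pair.
        return 1 if cp < 0x10000 else 2
    if width == 8:
        if cp < 0x80:
            return 1
        if cp < 0x800:
            return 2
        if cp < 0x10000:
            return 3
        return 4
    raise ValueError("width must be 8, 16, or 32")


for cp in (ord("X"), 0x100, 0x1234, 0x11234):
    print(hex(cp), [code_units(cp, w) for w in (8, 16, 32)])
```

By this count, `X(\C{3})` can consume all of `\x{1234}` in the 8-bit library (three bytes). In the 16- and 32-bit libraries, where `\x{1234}` is a single code unit, it needs further code units after that character.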
@@ -0,0 +1,9 @@
# This test is run when PCRE2 has been built with --enable-never-backslash-C,
# which disables the use of \C. All we can do is check that it gives the
# correct error message.
/a\Cb/
/a[\C]b/
# End of testinput23

@@ -0,0 +1,396 @@
# This file tests the auxiliary pattern conversion features of the PCRE2
# library, in non-UTF mode.
#forbid_utf
#newline_default lf any anycrlf
# -------- Tests of glob conversion --------
# Set the glob separator explicitly so that different OS defaults are not a
# problem. Then test various errors.
#pattern convert=glob,convert_glob_escape=\,convert_glob_separator=/
/abc/posix
# Separator must be / \ or .
/a*b/convert_glob_separator=%
# Can't have separator in a class
"[ab/cd]"
"[,-/]"
/[ab/
# Length check
/abc/convert_length=11
/abc/convert_length=12
# Now some actual tests
/a?b[]xy]*c/
azb]1234c
# Tests from the gitwildmatch list, with some additions
/foo/
foo
/= Expect no match
bar
//
\
/???/
foo
\= Expect no match
foobar
/*/
foo
\
/f*/
foo
f
/*f/
oof
\= Expect no match
foo
/*foo*/
foo
food
aprilfool
/*ob*a*r*/
foobar
/*ab/
aaaaaaabababab
/foo\*/
foo*
/foo\*bar/
\= Expect no match
foobar
/f\\oo/
f\\oo
/*[al]?/
ball
/[ten]/
\= Expect no match
ten
/t[a-g]n/
ten
/a[]]b/
a]b
/a[]a-]b/
/a[]-]b/
a-b
a]b
\= Expect no match
aab
/a[]a-z]b/
aab
/]/
]
/t[!a-g]n/
ton
\= Expect no match
ten
'[[:alpha:]][[:digit:]][[:upper:]]'
a1B
'[[:digit:][:upper:][:space:]]'
A
1
\ \=
\= Expect no match
a
.
'[a-c[:digit:]x-z]'
5
b
y
\= Expect no match
q
# End of gitwildmatch tests
/*.j?g/
pic01.jpg
.jpg
pic02.jxg
\= Expect no match
pic03.j/g
/A[+-0]B/
A+B
A.B
A0B
\= Expect no match
A/B
/*x?z/
abc.xyz
\= Expect no match
.xyz
/?x?z/
axyz
\= Expect no match
.xyz
"[,-0]x?z"
,xyz
\= Expect no match
/xyz
.xyz
".x*"
.xabc
/a[--0]z/
a-z
a.z
a0z
\= Expect no match
a/z
a1z
/<[a-c-d]>/
<a>
<b>
<c>
<d>
<->
/a[[:digit:].]z/
a1z
a.z
\= Expect no match
a:z
/a[[:digit].]z/
a[.]z
a:.]z
ad.]z
/<[[:a[:digit:]b]>/
<[>
<:>
<a>
<9>
<b>
\= Expect no match
<d>
/a*b/convert_glob_separator=\
/a*b/convert_glob_separator=.
/a*b/convert_glob_separator=/
# Non control character checking
/A\B\\C\D/
/\\{}\?\*+\[\]()|.^$/
/*a*\/*b*/
/?a?\/?b?/
/[a\\b\c][]][-][\]\-]/
/[^a\\b\c][!]][!-][^\]\-]/
/[[:alnum:][:alpha:][:blank:][:cntrl:][:digit:][:graph:][:lower:][:print:][:punct:][:space:][:upper:][:word:][:xdigit:]]/
"[/-/]"
/[-----]/
/[------]/
/[!------]/
/[[:alpha:]-a]/
/[[:alpha:]][[:punct:]][[:ascii:]]/
/[a-[:alpha:]]/
/[[:alpha:/
/[[:alpha:]/
/[[:alphaa:]]/
/[[:xdigi:]]/
/[[:xdigit::]]/
/****/
/**\/abc/
abc
x/abc
xabc
/abc\/**/
/abc\/**\/abc/
/**\/*a*b*g*n*t/
abcd/abcdefg/abcdefghijk/abcdefghijklmnop.txt
/**\/*a*\/**/
xx/xx/xx/xax/xx/xb
/**\/*a*/
xx/xx/xx/xax
xx/xx/xx/xax/xx
/**\/*a*\/**\/*b*/
xx/xx/xx/xax/xx/xb
xx/xx/xx/xax/xx/x
"**a"convert=glob
a
c/b/a
c/b/aaa
"a**/b"convert=glob
a/b
ab
"a/**b"convert=glob
a/b
ab
#pattern convert=glob:glob_no_starstar
/***/
/**a**/
#pattern convert=unset
#pattern convert=glob:glob_no_wild_separator
/*/
/*a*/
/**a**/
/a*b/
/*a*b*/
/??a??/
#pattern convert=unset
#pattern convert=glob,convert_glob_escape=0
/a\b\cd/
/**\/a/
/a`*b/convert_glob_escape=`
/a`*b/convert_glob_escape=0
/a`*b/convert_glob_escape=x
# -------- Tests of extended POSIX conversion --------
#pattern convert=unset:posix_extended
/<[[:a[:digit:]b]>/
<[>
<:>
<a>
<9>
<b>
\= Expect no match
<d>
/a+\1b\\c|d[ab\c]/
/<[]bc]>/
<]>
<b>
<c>
/<[^]bc]>/
<.>
\= Expect no match
<]>
<b>
/(a)\1b/
a1b
\= Expect no match
aab
/(ab)c)d]/
Xabc)d]Y
/a***b/
# -------- Tests of basic POSIX conversion --------
#pattern convert=unset:posix_basic
/a*b+c\+[def](ab)\(cd\)/
/\(a\)\1b/
aab
\= Expect no match
a1b
/how.to how\.to/
how\nto how.to
\= Expect no match
how\x{0}to how.to
/^how to \^how to/
/^*abc/
/*abc/
X*abcY
/**abc/
XabcY
X*abcY
X**abcY
/*ab\(*cd\)/
/^b\(c^d\)\(^e^f\)/
/a***b/
# End of testinput24
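The glob tests above rely on wildcards never matching the separator set by `convert_glob_separator=/`. Judging from the `.xyz` cases, they also rely on a wildcard never matching a `.` at the very start of the subject. The toy Python matcher below follows those two rules for `*` and `?` and is offered only as a rough illustration. It is not how `pcre2_pattern_convert()` works, since PCRE2 translates the glob into a regular expression. It deliberately ignores character classes, escapes and `**`, and the function name `glob_match` is hypothetical.

```python
from functools import lru_cache


def glob_match(pattern: str, subject: str, sep: str = "/") -> bool:
    """Toy glob matcher: '*' and '?' only, no classes, escapes or '**'."""
    n_p, n_s = len(pattern), len(subject)

    def wild_ok(i: int) -> bool:
        # A wildcard may consume subject[i] only if it is not the separator
        # and not a '.' at the very start of the subject (leading-dot rule,
        # an assumption inferred from the .xyz test cases).
        return subject[i] != sep and not (i == 0 and subject[i] == ".")

    @lru_cache(maxsize=None)
    def match(p: int, s: int) -> bool:
        if p == n_p:
            return s == n_s
        c = pattern[p]
        if c == "*":
            # '*' consumes zero or more wildcard-matchable characters.
            k = s
            while True:
                if match(p + 1, k):
                    return True
                if k == n_s or not wild_ok(k):
                    return False
                k += 1
        if s == n_s:
            return False
        if c == "?":
            return wild_ok(s) and match(p + 1, s + 1)
        return subject[s] == c and match(p + 1, s + 1)

    return match(0, 0)


print(glob_match("*.j?g", "pic01.jpg"))
print(glob_match("*.j?g", "pic03.j/g"))
```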

@@ -0,0 +1,22 @@
# This file tests the auxiliary pattern conversion features of the PCRE2
# library, in UTF mode.
#newline_default lf any anycrlf
# -------- Tests of glob conversion --------
# Set the glob separator explicitly so that different OS defaults are not a
# problem. Then test various errors.
#pattern convert=glob,convert_glob_escape=\,convert_glob_separator=/
# The fact that this one works in 13 bytes in the 8-bit library shows that the
# output is in UTF-8, though pcre2test shows the character as an escape.
/'>' c4 a3 '<'/hex,utf,convert_length=13
# This expansion creates a string that is too long for the input buffer.
/\[()]{65535}()/expand
# End of testinput25

File diff suppressed because it is too large

File diff suppressed because it is too large

@@ -0,0 +1,113 @@
# This set of tests checks locale-specific features, using the "fr_FR" locale.
# It is almost Perl-compatible. When run via RunTest, the locale is edited to
# be whichever of "fr_FR", "french", or "fr" is found to exist. There is a
# different version of this file called wintestinput3 for use on Windows,
# where the locale is called "french" and the tests are run using
# RunTest.bat.
#forbid_utf
/^[\w]+/
\= Expect no match
École
/^[\w]+/locale=fr_FR
École
/^[\W]+/
École
/^[\W]+/locale=fr_FR
\= Expect no match
École
/[\b]/
\b
\= Expect no match
a
/[\b]/locale=fr_FR
\b
\= Expect no match
a
/^\w+/
\= Expect no match
École
/^\w+/locale=fr_FR
École
/(.+)\b(.+)/
École
/(.+)\b(.+)/locale=fr_FR
\= Expect no match
École
/École/i
École
\= Expect no match
école
/École/i,locale=fr_FR
École
école
/\w/I
/\w/I,locale=fr_FR
# All remaining tests are in the fr_FR locale, so set the default.
#pattern locale=fr_FR
/^[\xc8-\xc9]/i
École
école
/^[\xc8-\xc9]/
École
\= Expect no match
école
/\xb5/i
µ
\= Expect no match
\x9c
/ÿ/i
\xff
\= Expect no match
y
/(.)\1/i
\xfe\xde
/\W+/
>>>\xaa<<<
>>>\xba<<<
/[\W]+/
>>>\xaa<<<
>>>\xba<<<
/[^[:alpha:]]+/
>>>\xaa<<<
>>>\xba<<<
/\w+/
>>>\xaa<<<
>>>\xba<<<
/[\w]+/
>>>\xaa<<<
>>>\xba<<<
/[[:alpha:]]+/
>>>\xaa<<<
>>>\xba<<<
/[[:alpha:]][[:lower:]][[:upper:]]/IB
# End of testinput3

File diff suppressed because it is too large

File diff suppressed because it is too large

File diff suppressed because it is too large

File diff suppressed because it is too large

@@ -0,0 +1,189 @@
# There are two sorts of patterns in this test. A number of them are
# representative patterns whose lengths and offsets are checked. This is just a
# doublecheck test to ensure the sizes don't go horribly wrong when something
# is changed. The operation of these patterns is checked in other tests.
#
# This file also contains tests whose output varies with code unit size and/or
# link size. Unicode support is required for these tests. There are separate
# output files for each code unit size and link size.
#pattern fullbincode,memory
/((?i)b)/
/(?s)(.*X|^B)/
/(?s:.*X|^B)/
/^[[:alnum:]]/
/#/Ix
/a#/Ix
/x?+/
/x++/
/x{1,3}+/
/(x)*+/
/^((a+)(?U)([ab]+)(?-U)([bc]+)(\w*))/
"8J\$WE\<\.rX\+ix\[d1b\!H\#\?vV0vrK\:ZH1\=2M\>iV\;\?aPhFB\<\*vW\@QW\@sO9\}cfZA\-i\'w\%hKd6gt1UJP\,15_\#QY\$M\^Mss_U\/\]\&LK9\[5vQub\^w\[KDD\<EjmhUZ\?\.akp2dF\>qmj\;2\}YWFdYx\.Ap\]hjCPTP\(n28k\+3\;o\&WXqs\/gOXdr\$\:r\'do0\;b4c\(f_Gr\=\"\\4\)\[01T7ajQJvL\$W\~mL_sS\/4h\:x\*\[ZN\=KLs\&L5zX\/\/\>it\,o\:aU\(\;Z\>pW\&T7oP\'2K\^E\:x9\'c\[\%z\-\,64JQ5AeH_G\#KijUKghQw\^\\vea3a\?kka_G\$8\#\`\*kynsxzBLru\'\]k_\[7FrVx\}\^\=\$blx\>s\-N\%j\;D\*aZDnsw\:YKZ\%Q\.Kne9\#hP\?\+b3\(SOvL\,\^\;\&u5\@\?5C5Bhb\=m\-vEh_L15Jl\]U\)0RP6\{q\%L\^_z5E\'Dw6X\b"
"\$\<\.X\+ix\[d1b\!H\#\?vV0vrK\:ZH1\=2M\>iV\;\?aPhFB\<\*vW\@QW\@sO9\}cfZA\-i\'w\%hKd6gt1UJP\,15_\#QY\$M\^Mss_U\/\]\&LK9\[5vQub\^w\[KDD\<EjmhUZ\?\.akp2dF\>qmj\;2\}YWFdYx\.Ap\]hjCPTP\(n28k\+3\;o\&WXqs\/gOXdr\$\:r\'do0\;b4c\(f_Gr\=\"\\4\)\[01T7ajQJvL\$W\~mL_sS\/4h\:x\*\[ZN\=KLs\&L5zX\/\/\>it\,o\:aU\(\;Z\>pW\&T7oP\'2K\^E\:x9\'c\[\%z\-\,64JQ5AeH_G\#KijUKghQw\^\\vea3a\?kka_G\$8\#\`\*kynsxzBLru\'\]k_\[7FrVx\}\^\=\$blx\>s\-N\%j\;D\*aZDnsw\:YKZ\%Q\.Kne9\#hP\?\+b3\(SOvL\,\^\;\&u5\@\?5C5Bhb\=m\-vEh_L15Jl\]U\)0RP6\{q\%L\^_z5E\'Dw6X\b"
/(a(?1)b)/
/(a(?1)+b)/
/a(?P<name1>b|c)d(?P<longername2>e)/
/(?:a(?P<c>c(?P<d>d)))(?P<a>a)/
/(?P<a>a)...(?P=a)bbb(?P>a)d/
/abc(?C255)de(?C)f/
/abcde/auto_callout
/\x{100}/utf
/\x{1000}/utf
/\x{10000}/utf
/\x{100000}/utf
/\x{10ffff}/utf
/\x{110000}/utf
/[\x{ff}]/utf
/[\x{100}]/utf
/\x80/utf
/\xff/utf
/\x{0041}\x{2262}\x{0391}\x{002e}/I,utf
/\x{D55c}\x{ad6d}\x{C5B4}/I,utf
/\x{65e5}\x{672c}\x{8a9e}/I,utf
/[\x{100}]/utf
/[Z\x{100}]/utf
/^[\x{100}\E-\Q\E\x{150}]/utf
/^[\QĀ\E-\QŐ\E]/utf
/^[\QĀ\E-\QŐ\E/utf
/[\p{L}]/
/[\p{^L}]/
/[\P{L}]/
/[\P{^L}]/
/[abc\p{L}\x{0660}]/utf
/[\p{Nd}]/utf
/[\p{Nd}+-]+/utf
/A\x{391}\x{10427}\x{ff3a}\x{1fb0}/i,utf
/A\x{391}\x{10427}\x{ff3a}\x{1fb0}/utf
/[\x{105}-\x{109}]/i,utf
/( ( (?(1)0|) )* )/x
/( (?(1)0|)* )/x
/[a]/
/[a]/utf
/[\xaa]/
/[\xaa]/utf
/[^a]/
/[^a]/utf
/[^\xaa]/
/[^\xaa]/utf
#pattern -memory
/[^\d]/utf,ucp
/[[:^alpha:][:^cntrl:]]+/utf,ucp
/[[:^cntrl:][:^alpha:]]+/utf,ucp
/[[:alpha:]]+/utf,ucp
/[[:^alpha:]\S]+/utf,ucp
/abc(d|e)(*THEN)x(123(*THEN)4|567(b|q)(*THEN)xx)/
/(((a\2)|(a*)\g<-1>))*a?/
/((?+1)(\1))/
"(?1)(?#?'){2}(a)"
/.((?2)(?R)|\1|$)()/
/.((?3)(?R)()(?2)|\1|$)()/
/(?1)()((((((\1++))\x85)+)|))/
# Check the absolute limit on nesting (?| etc. This varies with code unit
# width because the workspace is a different number of bytes. It will fail
# with link size 2 in 8-bit and 16-bit but not in 32-bit.
/(?|(?|(?J:(?|(?x:(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|
)))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))
/parens_nest_limit=1000,-fullbincode
# Use "expand" to create some very long patterns with nested parentheses, in
# order to test workspace overflow. Again, this varies with code unit width,
# and even when it fails in two modes, the error offset differs. It also varies
# with link size - hence multiple tests with different values.
/(?'ABC'\[[bar](]{792}*THEN:\[A]{255}\[)]{793}/expand,-fullbincode,parens_nest_limit=1000
/(?'ABC'\[[bar](]{793}*THEN:\[A]{255}\[)]{794}/expand,-fullbincode,parens_nest_limit=1000
/(?'ABC'\[[bar](]{1793}*THEN:\[A]{255}\[)]{1794}/expand,-fullbincode,parens_nest_limit=2000
/(?(1)(?1)){8,}+()/debug
abcd
/(?(1)|a(?1)b){2,}+()/debug
abcde
/((?1)(?2)(?3)(?4)(?5)(?6)(?7)(?8)(?9)(?9)(?8)(?7)(?6)(?5)(?4)(?3)(?2)(?1)(?0)){2,}()()()()()()()()()/debug
/([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00](*ACCEPT)/
/([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00](*ACCEPT)))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))/-fullbincode
#pattern -fullbincode
/\[()]{65535}/expand
# End of testinput8
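Several patterns above use pcre2test's `expand` modifier, which replaces each `\[text]{n}` item with n copies of text. This is how `/\[()]{65535}/expand` produces a pattern of 65535 empty groups without a huge input line. The Python sketch below shows that substitution under some simplifying assumptions: a single level of items, no nesting, and none of pcre2test's length or error checks. The helper name `expand` is hypothetical.

```python
import re

# Matches one \[text]{n} item; the lazy (.*?) stops at the first "]{"
# that is followed by a count, so "\[[bar](]{2}" yields text "[bar](".
_ITEM = re.compile(r"\\\[(.*?)\]\{(\d+)\}")


def expand(line: str) -> str:
    """Replace every \\[text]{n} item in line with n copies of text."""
    return _ITEM.sub(lambda m: m.group(1) * int(m.group(2)), line)


print(expand(r"x\[(a)]{3}y"))
print(len(expand(r"\[()]{65535}")))
```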

@@ -0,0 +1,284 @@
# This set of tests is run only with the 8-bit library. They must not require
# UTF-8 or Unicode property support.
#forbid_utf
#newline_default lf any anycrlf
/a\xc4\xa3b/
a\N{U+123}b
\= Expect no match # error message (too big char)
a\x{0123}b
a\o{00443}b
a\443b
/fd bf bf bf bf bf/I,hex
\= Expect warning
\N{U+7fffffff}
\= Expect no match # error message (too big char)
\x{7fffffff}
/\x{100}/I
/\o{400}/I
/ (?: [\040\t] | \(
(?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] | \( (?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] )* \) )*
\) )* # optional leading comment
(?: (?:
[^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]+ # some number of atom characters...
(?![^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]) # ..not followed by something that could be part of an atom
|
" (?: # opening quote...
[^\\\x80-\xff\n\015"] # Anything except backslash and quote
| # or
\\ [^\x80-\xff] # Escaped something (something != CR)
)* " # closing quote
) # initial word
(?: (?: [\040\t] | \(
(?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] | \( (?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] )* \) )*
\) )* \. (?: [\040\t] | \(
(?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] | \( (?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] )* \) )*
\) )* (?:
[^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]+ # some number of atom characters...
(?![^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]) # ..not followed by something that could be part of an atom
|
" (?: # opening quote...
[^\\\x80-\xff\n\015"] # Anything except backslash and quote
| # or
\\ [^\x80-\xff] # Escaped something (something != CR)
)* " # closing quote
) )* # further okay, if led by a period
(?: [\040\t] | \(
(?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] | \( (?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] )* \) )*
\) )* @ (?: [\040\t] | \(
(?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] | \( (?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] )* \) )*
\) )* (?:
[^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]+ # some number of atom characters...
(?![^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]) # ..not followed by something that could be part of an atom
| \[ # [
(?: [^\\\x80-\xff\n\015\[\]] | \\ [^\x80-\xff] )* # stuff
\] # ]
) # initial subdomain
(?: #
(?: [\040\t] | \(
(?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] | \( (?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] )* \) )*
\) )* \. # if led by a period...
(?: [\040\t] | \(
(?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] | \( (?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] )* \) )*
\) )* (?:
[^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]+ # some number of atom characters...
(?![^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]) # ..not followed by something that could be part of an atom
| \[ # [
(?: [^\\\x80-\xff\n\015\[\]] | \\ [^\x80-\xff] )* # stuff
\] # ]
) # ...further okay
)*
# address
| # or
(?:
[^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]+ # some number of atom characters...
(?![^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]) # ..not followed by something that could be part of an atom
|
" (?: # opening quote...
[^\\\x80-\xff\n\015"] # Anything except backslash and quote
| # or
\\ [^\x80-\xff] # Escaped something (something != CR)
)* " # closing quote
) # one word, optionally followed by....
(?:
[^()<>@,;:".\\\[\]\x80-\xff\000-\010\012-\037] | # atom and space parts, or...
\(
(?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] | \( (?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] )* \) )*
\) | # comments, or...
" (?: # opening quote...
[^\\\x80-\xff\n\015"] # Anything except backslash and quote
| # or
\\ [^\x80-\xff] # Escaped something (something != CR)
)* " # closing quote
# quoted strings
)*
< (?: [\040\t] | \(
(?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] | \( (?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] )* \) )*
\) )* # leading <
(?: @ (?: [\040\t] | \(
(?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] | \( (?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] )* \) )*
\) )* (?:
[^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]+ # some number of atom characters...
(?![^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]) # ..not followed by something that could be part of an atom
| \[ # [
(?: [^\\\x80-\xff\n\015\[\]] | \\ [^\x80-\xff] )* # stuff
\] # ]
) # initial subdomain
(?: #
(?: [\040\t] | \(
(?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] | \( (?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] )* \) )*
\) )* \. # if led by a period...
(?: [\040\t] | \(
(?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] | \( (?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] )* \) )*
\) )* (?:
[^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]+ # some number of atom characters...
(?![^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]) # ..not followed by something that could be part of an atom
| \[ # [
(?: [^\\\x80-\xff\n\015\[\]] | \\ [^\x80-\xff] )* # stuff
\] # ]
) # ...further okay
)*
(?: (?: [\040\t] | \(
(?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] | \( (?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] )* \) )*
\) )* , (?: [\040\t] | \(
(?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] | \( (?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] )* \) )*
\) )* @ (?: [\040\t] | \(
(?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] | \( (?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] )* \) )*
\) )* (?:
[^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]+ # some number of atom characters...
(?![^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]) # ..not followed by something that could be part of an atom
| \[ # [
(?: [^\\\x80-\xff\n\015\[\]] | \\ [^\x80-\xff] )* # stuff
\] # ]
) # initial subdomain
(?: #
(?: [\040\t] | \(
(?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] | \( (?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] )* \) )*
\) )* \. # if led by a period...
(?: [\040\t] | \(
(?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] | \( (?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] )* \) )*
\) )* (?:
[^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]+ # some number of atom characters...
(?![^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]) # ..not followed by something that could be part of an atom
| \[ # [
(?: [^\\\x80-\xff\n\015\[\]] | \\ [^\x80-\xff] )* # stuff
\] # ]
) # ...further okay
)*
)* # further okay, if led by comma
: # closing colon
(?: [\040\t] | \(
(?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] | \( (?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] )* \) )*
\) )* )? # optional route
(?:
[^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]+ # some number of atom characters...
(?![^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]) # ..not followed by something that could be part of an atom
|
" (?: # opening quote...
[^\\\x80-\xff\n\015"] # Anything except backslash and quote
| # or
\\ [^\x80-\xff] # Escaped something (something != CR)
)* " # closing quote
) # initial word
(?: (?: [\040\t] | \(
(?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] | \( (?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] )* \) )*
\) )* \. (?: [\040\t] | \(
(?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] | \( (?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] )* \) )*
\) )* (?:
[^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]+ # some number of atom characters...
(?![^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]) # ..not followed by something that could be part of an atom
|
" (?: # opening quote...
[^\\\x80-\xff\n\015"] # Anything except backslash and quote
| # or
\\ [^\x80-\xff] # Escaped something (something != CR)
)* " # closing quote
) )* # further okay, if led by a period
(?: [\040\t] | \(
(?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] | \( (?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] )* \) )*
\) )* @ (?: [\040\t] | \(
(?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] | \( (?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] )* \) )*
\) )* (?:
[^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]+ # some number of atom characters...
(?![^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]) # ..not followed by something that could be part of an atom
| \[ # [
(?: [^\\\x80-\xff\n\015\[\]] | \\ [^\x80-\xff] )* # stuff
\] # ]
) # initial subdomain
(?: #
(?: [\040\t] | \(
(?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] | \( (?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] )* \) )*
\) )* \. # if led by a period...
(?: [\040\t] | \(
(?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] | \( (?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] )* \) )*
\) )* (?:
[^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]+ # some number of atom characters...
(?![^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]) # ..not followed by something that could be part of an atom
| \[ # [
(?: [^\\\x80-\xff\n\015\[\]] | \\ [^\x80-\xff] )* # stuff
\] # ]
) # ...further okay
)*
# address spec
(?: [\040\t] | \(
(?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] | \( (?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] )* \) )*
\) )* > # trailing >
# name and address
) (?: [\040\t] | \(
(?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] | \( (?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] )* \) )*
\) )* # optional trailing comment
/Ix
/\h/I
/\H/I
/\v/I
/\V/I
/\R/I
/[\h]/B
>\x09<
/[\h]+/B
>\x09\x20\xa0<
/[\v]/B
/[\H]/B
/[^\h]/B
/[\V]/B
/[\x0a\V]/B
/\777/I
/(*:0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF)XX/mark
XX
/(*:0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF)XX/mark,alt_verbnames
XX
/(*:0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDE)XX/mark
XX
/(*:0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDE)XX/mark,alt_verbnames
XX
/\u0100/alt_bsux,allow_empty_class,match_unset_backref,dupnames
/[\u0100-\u0200]/alt_bsux,allow_empty_class,match_unset_backref,dupnames
/[^\x00-a]{12,}[^b-\xff]*/B
/[^\s]*\s* [^\W]+\W+ [^\d]*?\d0 [^\d\w]{4,6}?\w*A/B
/(*MARK:a\x{100}b)z/alt_verbnames
/(*:*++++++++++++''''''''''''''''''''+''+++'+++x+++++++++++++++++++++++++++++++++++(++++++++++++++++++++:++++++%++:''''''''''''''''''''''''+++++++++++++++++++++++++++++++++++++++++++++++++++++-++++++++k+++++++''''+++'+++++++++++++++++++++++''''++++++++++++':ƿ)/
/(?i:A{1,}\6666666666)/
A\x{1b6}6666666
# Should cause an error
/abc/substitute_extended,replace=>\777<
abc
# Should cause an error
/abc/substitute_extended,replace=>\o{012345}<
abc
/i/turkish_casing
# End of testinput9

@@ -0,0 +1,137 @@
# This is a specialized test for checking, when PCRE2 is compiled with the
# EBCDIC option but in an ASCII environment, that newline, white space, and \c
# functionality is working. It catches cases where explicit values such as 0x0a
# have been used instead of names like CHAR_LF. Needless to say, it is not a
# genuine EBCDIC test! In patterns, alphabetic characters that follow a
# backslash must be in EBCDIC code. In data, NL, NEL, LF, ESC, and DEL must be
# in EBCDIC, but can of course be specified as escapes.
# Test default newline and variations
/^A/m
ABC
12\x15ABC
/^A/m,newline=any
12\x15ABC
12\x0dABC
12\x0d\x15ABC
12\x25ABC
/^A/m,newline=anycrlf
12\x15ABC
12\x0dABC
12\x0d\x15ABC
** Fail
12\x25ABC
# Test \h
/^A\ˆ/
A B
A\x41B
# Test \H
/^A\È/
AB
A\x42B
** Fail
A B
A\x41B
# Test \R
/^A\Ù/
A\x15B
A\x0dB
A\x25B
A\x0bB
A\x0cB
** Fail
A B
# Test \v
/^A\¥/
A\x15B
A\x0dB
A\x25B
A\x0bB
A\x0cB
** Fail
A B
# Test \V
/^A\å/
A B
** Fail
A\x15B
A\x0dB
A\x25B
A\x0bB
A\x0cB
# For repeated items, use an atomic group so that the output is the same
# for DFA matching (otherwise it may show multiple matches).
# Test \h+
/^A(?>\ˆ+)/
A B
# Test \H+
/^A(?>\È+)/
AB
** Fail
A B
# Test \R+
/^A(?>\Ù+)/
A\x15B
A\x0dB
A\x25B
A\x0bB
A\x0cB
** Fail
A B
# Test \v+
/^A(?>\¥+)/
A\x15B
A\x0dB
A\x25B
A\x0bB
A\x0cB
** Fail
A B
# Test \V+
/^A(?>\å+)/
A B
** Fail
A\x15B
A\x0dB
A\x25B
A\x0bB
A\x0cB
# Test \c functionality
/\ƒ@\ƒA\ƒb\ƒC\ƒd\ƒE\ƒf\ƒG\ƒh\ƒI\ƒJ\ƒK\ƒl\ƒm\ƒN\ƒO\ƒp\ƒq\ƒr\ƒS\ƒT\ƒu\ƒV\ƒW\ƒX\ƒy\ƒZ/
\x00\x01\x02\x03\x04\x05\x06\x07\x08\x09\x0a\x0b\x0c\x0d\x0e\x0f\x10\x11\x12\x13\x14\x15\x16\x17\x18\x19\x1a\x1b\x1c\x1d\x1e\x1f
/\ƒ[\ƒ\\ƒ]\ƒ^\ƒ_/
\x18\x19\x1a\x1b\x1c\x1d\x1e\x1f
/\ƒ?/
A\xffB
/\ƒ&/
# End

@@ -0,0 +1,35 @@
#pattern framesize, memory
/abcd/
abcd\=memory
abcd\=find_limits
/(((((((((((((((((((((((((((((( (^abc|xyz){1,20}$ ))))))))))))))))))))))))))))))/x
abcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcX\=memory
abcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcX\=find_limits
/ab(cd)/
abcd\=memory
abcd\=memory,ovector=0
/\[(a)]{1000}/expand,framesize
\[a]{1000}\=ovector=1
# The heapframes_size option gets pcre2test to show the size of the heapframes
# vector after pcre2_match() has run. Running a match with ovector=0
# causes the match data block to be freed, thus releasing that vector.
/\[(a)]{1000}/expand,framesize
\[a]{1000}\=ovector=1,heapframes_size
/a/heapframes_size,framesize
a\=ovector=0
/a|(b){200}/g,expand,heapframes_size
abacus z\[b]{200}z
a\=ovector=0
/(a)/replace=>$1<
cat\=heapframes_size
# End

File diff suppressed because it is too large

File diff suppressed because it is too large

@@ -0,0 +1,853 @@
# This set of tests is for the 16-bit and 32-bit libraries' basic (non-UTF)
# features that are not compatible with the 8-bit library, or which give
# different output in 16-bit or 32-bit mode. The output for the two widths is
# different, so they have separate output files.
#forbid_utf
#newline_default LF ANY ANYCRLF
/[^\x{c4}]/IB
------------------------------------------------------------------
Bra
[^\x{c4}] (not)
Ket
End
------------------------------------------------------------------
Capture group count = 0
Subject length lower bound = 1
/\x{100}/I
Capture group count = 0
First code unit = \x{100}
Subject length lower bound = 1
/ (?: [\040\t] | \(
(?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] | \( (?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] )* \) )*
\) )* # optional leading comment
(?: (?:
[^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]+ # some number of atom characters...
(?![^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]) # ..not followed by something that could be part of an atom
|
" (?: # opening quote...
[^\\\x80-\xff\n\015"] # Anything except backslash and quote
| # or
\\ [^\x80-\xff] # Escaped something (something != CR)
)* " # closing quote
) # initial word
(?: (?: [\040\t] | \(
(?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] | \( (?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] )* \) )*
\) )* \. (?: [\040\t] | \(
(?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] | \( (?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] )* \) )*
\) )* (?:
[^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]+ # some number of atom characters...
(?![^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]) # ..not followed by something that could be part of an atom
|
" (?: # opening quote...
[^\\\x80-\xff\n\015"] # Anything except backslash and quote
| # or
\\ [^\x80-\xff] # Escaped something (something != CR)
)* " # closing quote
) )* # further okay, if led by a period
(?: [\040\t] | \(
(?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] | \( (?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] )* \) )*
\) )* @ (?: [\040\t] | \(
(?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] | \( (?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] )* \) )*
\) )* (?:
[^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]+ # some number of atom characters...
(?![^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]) # ..not followed by something that could be part of an atom
| \[ # [
(?: [^\\\x80-\xff\n\015\[\]] | \\ [^\x80-\xff] )* # stuff
\] # ]
) # initial subdomain
(?: #
(?: [\040\t] | \(
(?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] | \( (?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] )* \) )*
\) )* \. # if led by a period...
(?: [\040\t] | \(
(?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] | \( (?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] )* \) )*
\) )* (?:
[^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]+ # some number of atom characters...
(?![^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]) # ..not followed by something that could be part of an atom
| \[ # [
(?: [^\\\x80-\xff\n\015\[\]] | \\ [^\x80-\xff] )* # stuff
\] # ]
) # ...further okay
)*
# address
| # or
(?:
[^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]+ # some number of atom characters...
(?![^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]) # ..not followed by something that could be part of an atom
|
" (?: # opening quote...
[^\\\x80-\xff\n\015"] # Anything except backslash and quote
| # or
\\ [^\x80-\xff] # Escaped something (something != CR)
)* " # closing quote
) # one word, optionally followed by....
(?:
[^()<>@,;:".\\\[\]\x80-\xff\000-\010\012-\037] | # atom and space parts, or...
\(
(?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] | \( (?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] )* \) )*
\) | # comments, or...
" (?: # opening quote...
[^\\\x80-\xff\n\015"] # Anything except backslash and quote
| # or
\\ [^\x80-\xff] # Escaped something (something != CR)
)* " # closing quote
# quoted strings
)*
< (?: [\040\t] | \(
(?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] | \( (?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] )* \) )*
\) )* # leading <
(?: @ (?: [\040\t] | \(
(?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] | \( (?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] )* \) )*
\) )* (?:
[^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]+ # some number of atom characters...
(?![^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]) # ..not followed by something that could be part of an atom
| \[ # [
(?: [^\\\x80-\xff\n\015\[\]] | \\ [^\x80-\xff] )* # stuff
\] # ]
) # initial subdomain
(?: #
(?: [\040\t] | \(
(?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] | \( (?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] )* \) )*
\) )* \. # if led by a period...
(?: [\040\t] | \(
(?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] | \( (?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] )* \) )*
\) )* (?:
[^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]+ # some number of atom characters...
(?![^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]) # ..not followed by something that could be part of an atom
| \[ # [
(?: [^\\\x80-\xff\n\015\[\]] | \\ [^\x80-\xff] )* # stuff
\] # ]
) # ...further okay
)*
(?: (?: [\040\t] | \(
(?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] | \( (?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] )* \) )*
\) )* , (?: [\040\t] | \(
(?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] | \( (?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] )* \) )*
\) )* @ (?: [\040\t] | \(
(?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] | \( (?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] )* \) )*
\) )* (?:
[^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]+ # some number of atom characters...
(?![^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]) # ..not followed by something that could be part of an atom
| \[ # [
(?: [^\\\x80-\xff\n\015\[\]] | \\ [^\x80-\xff] )* # stuff
\] # ]
) # initial subdomain
(?: #
(?: [\040\t] | \(
(?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] | \( (?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] )* \) )*
\) )* \. # if led by a period...
(?: [\040\t] | \(
(?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] | \( (?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] )* \) )*
\) )* (?:
[^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]+ # some number of atom characters...
(?![^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]) # ..not followed by something that could be part of an atom
| \[ # [
(?: [^\\\x80-\xff\n\015\[\]] | \\ [^\x80-\xff] )* # stuff
\] # ]
) # ...further okay
)*
)* # further okay, if led by comma
: # closing colon
(?: [\040\t] | \(
(?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] | \( (?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] )* \) )*
\) )* )? # optional route
(?:
[^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]+ # some number of atom characters...
(?![^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]) # ..not followed by something that could be part of an atom
|
" (?: # opening quote...
[^\\\x80-\xff\n\015"] # Anything except backslash and quote
| # or
\\ [^\x80-\xff] # Escaped something (something != CR)
)* " # closing quote
) # initial word
(?: (?: [\040\t] | \(
(?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] | \( (?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] )* \) )*
\) )* \. (?: [\040\t] | \(
(?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] | \( (?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] )* \) )*
\) )* (?:
[^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]+ # some number of atom characters...
(?![^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]) # ..not followed by something that could be part of an atom
|
" (?: # opening quote...
[^\\\x80-\xff\n\015"] # Anything except backslash and quote
| # or
\\ [^\x80-\xff] # Escaped something (something != CR)
)* " # closing quote
) )* # further okay, if led by a period
(?: [\040\t] | \(
(?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] | \( (?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] )* \) )*
\) )* @ (?: [\040\t] | \(
(?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] | \( (?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] )* \) )*
\) )* (?:
[^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]+ # some number of atom characters...
(?![^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]) # ..not followed by something that could be part of an atom
| \[ # [
(?: [^\\\x80-\xff\n\015\[\]] | \\ [^\x80-\xff] )* # stuff
\] # ]
) # initial subdomain
(?: #
(?: [\040\t] | \(
(?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] | \( (?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] )* \) )*
\) )* \. # if led by a period...
(?: [\040\t] | \(
(?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] | \( (?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] )* \) )*
\) )* (?:
[^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]+ # some number of atom characters...
(?![^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]) # ..not followed by something that could be part of an atom
| \[ # [
(?: [^\\\x80-\xff\n\015\[\]] | \\ [^\x80-\xff] )* # stuff
\] # ]
) # ...further okay
)*
# address spec
(?: [\040\t] | \(
(?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] | \( (?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] )* \) )*
\) )* > # trailing >
# name and address
) (?: [\040\t] | \(
(?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] | \( (?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] )* \) )*
\) )* # optional trailing comment
/Ix
Capture group count = 0
Contains explicit CR or LF match
Options: extended
Starting code units: \x09 \x20 ! " # $ % & ' ( * + - / 0 1 2 3 4 5 6 7 8
9 = ? A B C D E F G H I J K L M N O P Q R S T U V W X Y Z ^ _ ` a b c d e
f g h i j k l m n o p q r s t u v w x y z { | } ~ \x7f \xff
Subject length lower bound = 3
/[\h]/B
------------------------------------------------------------------
Bra
[\x09 \xa0\x{1680}\x{180e}\x{2000}-\x{200a}\x{202f}\x{205f}\x{3000}]
Ket
End
------------------------------------------------------------------
>\x09<
0: \x09
/[\h]+/B
------------------------------------------------------------------
Bra
[\x09 \xa0\x{1680}\x{180e}\x{2000}-\x{200a}\x{202f}\x{205f}\x{3000}]++
Ket
End
------------------------------------------------------------------
>\x09\x20\xa0<
0: \x09 \xa0
/[\v]/B
------------------------------------------------------------------
Bra
[\x0a-\x0d\x85\x{2028}-\x{2029}]
Ket
End
------------------------------------------------------------------
/[^\h]/B
------------------------------------------------------------------
Bra
[^\x09 \xa0\x{1680}\x{180e}\x{2000}-\x{200a}\x{202f}\x{205f}\x{3000}]
Ket
End
------------------------------------------------------------------
/\h+/I
Capture group count = 0
Starting code units: \x09 \x20 \xa0 \xff
Subject length lower bound = 1
\x{1681}\x{200b}\x{1680}\x{2000}\x{202f}\x{3000}
0: \x{1680}\x{2000}\x{202f}\x{3000}
\x{3001}\x{2fff}\x{200a}\xa0\x{2000}
0: \x{200a}\xa0\x{2000}
/[\h\x{dc00}]+/IB
------------------------------------------------------------------
Bra
[\x09 \xa0\x{1680}\x{180e}\x{2000}-\x{200a}\x{202f}\x{205f}\x{3000}\x{dc00}]++
Ket
End
------------------------------------------------------------------
Capture group count = 0
Starting code units: \x09 \x20 \xa0 \xff
Subject length lower bound = 1
\x{1681}\x{200b}\x{1680}\x{2000}\x{202f}\x{3000}
0: \x{1680}\x{2000}\x{202f}\x{3000}
\x{3001}\x{2fff}\x{200a}\xa0\x{2000}
0: \x{200a}\xa0\x{2000}
/\H+/I
Capture group count = 0
Subject length lower bound = 1
\x{1680}\x{180e}\x{167f}\x{1681}\x{180d}\x{180f}
0: \x{167f}\x{1681}\x{180d}\x{180f}
\x{2000}\x{200a}\x{1fff}\x{200b}
0: \x{1fff}\x{200b}
\x{202f}\x{205f}\x{202e}\x{2030}\x{205e}\x{2060}
0: \x{202e}\x{2030}\x{205e}\x{2060}
\xa0\x{3000}\x9f\xa1\x{2fff}\x{3001}
0: \x9f\xa1\x{2fff}\x{3001}
/[\H\x{d800}]+/
\x{1680}\x{180e}\x{167f}\x{1681}\x{180d}\x{180f}
0: \x{167f}\x{1681}\x{180d}\x{180f}
\x{2000}\x{200a}\x{1fff}\x{200b}
0: \x{1fff}\x{200b}
\x{202f}\x{205f}\x{202e}\x{2030}\x{205e}\x{2060}
0: \x{202e}\x{2030}\x{205e}\x{2060}
\xa0\x{3000}\x9f\xa1\x{2fff}\x{3001}
0: \x9f\xa1\x{2fff}\x{3001}
/\v+/I
Capture group count = 0
Starting code units: \x0a \x0b \x0c \x0d \x85 \xff
Subject length lower bound = 1
\x{2027}\x{2030}\x{2028}\x{2029}
0: \x{2028}\x{2029}
\x09\x0e\x84\x86\x85\x0a\x0b\x0c\x0d
0: \x85\x0a\x0b\x0c\x0d
/[\v\x{dc00}]+/IB
------------------------------------------------------------------
Bra
[\x0a-\x0d\x85\x{2028}-\x{2029}\x{dc00}]++
Ket
End
------------------------------------------------------------------
Capture group count = 0
Starting code units: \x0a \x0b \x0c \x0d \x85 \xff
Subject length lower bound = 1
\x{2027}\x{2030}\x{2028}\x{2029}
0: \x{2028}\x{2029}
\x09\x0e\x84\x86\x85\x0a\x0b\x0c\x0d
0: \x85\x0a\x0b\x0c\x0d
/\V+/I
Capture group count = 0
Subject length lower bound = 1
\x{2028}\x{2029}\x{2027}\x{2030}
0: \x{2027}\x{2030}
\x85\x0a\x0b\x0c\x0d\x09\x0e\x84\x86
0: \x09\x0e\x84\x86
/[\V\x{d800}]+/
\x{2028}\x{2029}\x{2027}\x{2030}
0: \x{2027}\x{2030}
\x85\x0a\x0b\x0c\x0d\x09\x0e\x84\x86
0: \x09\x0e\x84\x86
/\R+/I,bsr=unicode
Capture group count = 0
\R matches any Unicode newline
Starting code units: \x0a \x0b \x0c \x0d \x85 \xff
Subject length lower bound = 1
\x{2027}\x{2030}\x{2028}\x{2029}
0: \x{2028}\x{2029}
\x09\x0e\x84\x86\x85\x0a\x0b\x0c\x0d
0: \x85\x0a\x0b\x0c\x0d
/\x{d800}\x{d7ff}\x{dc00}\x{dc00}\x{dcff}\x{dd00}/I
Capture group count = 0
First code unit = \x{d800}
Last code unit = \x{dd00}
Subject length lower bound = 6
\x{d800}\x{d7ff}\x{dc00}\x{dc00}\x{dcff}\x{dd00}
0: \x{d800}\x{d7ff}\x{dc00}\x{dc00}\x{dcff}\x{dd00}
/[^\x{80}][^\x{ff}][^\x{100}][^\x{1000}][^\x{ffff}]/B
------------------------------------------------------------------
Bra
[^\x{80}] (not)
[^\x{ff}] (not)
[^\x{100}] (not)
[^\x{1000}] (not)
[^\x{ffff}] (not)
Ket
End
------------------------------------------------------------------
/[^\x{80}][^\x{ff}][^\x{100}][^\x{1000}][^\x{ffff}]/Bi
------------------------------------------------------------------
Bra
/i [^\x{80}] (not)
/i [^\x{ff}] (not)
/i [^\x{100}] (not)
/i [^\x{1000}] (not)
/i [^\x{ffff}] (not)
Ket
End
------------------------------------------------------------------
/[^\x{100}]*[^\x{1000}]+[^\x{ffff}]??[^\x{8000}]{4,}[^\x{7fff}]{2,9}?[^\x{100}]{5,6}+/B
------------------------------------------------------------------
Bra
[^\x{100}]* (not)
[^\x{1000}]+ (not)
[^\x{ffff}]?? (not)
[^\x{8000}]{4} (not)
[^\x{8000}]* (not)
[^\x{7fff}]{2} (not)
[^\x{7fff}]{0,7}? (not)
[^\x{100}]{5} (not)
[^\x{100}]?+ (not)
Ket
End
------------------------------------------------------------------
/[^\x{100}]*[^\x{1000}]+[^\x{ffff}]??[^\x{8000}]{4,}[^\x{7fff}]{2,9}?[^\x{100}]{5,6}+/Bi
------------------------------------------------------------------
Bra
/i [^\x{100}]* (not)
/i [^\x{1000}]+ (not)
/i [^\x{ffff}]?? (not)
/i [^\x{8000}]{4} (not)
/i [^\x{8000}]* (not)
/i [^\x{7fff}]{2} (not)
/i [^\x{7fff}]{0,7}? (not)
/i [^\x{100}]{5} (not)
/i [^\x{100}]?+ (not)
Ket
End
------------------------------------------------------------------
/(*:0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF)XX/mark
XX
0: XX
MK: 0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF
/(*:0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDE)XX/mark
XX
0: XX
MK: 0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDE
/\u0100/B,alt_bsux,allow_empty_class,match_unset_backref
------------------------------------------------------------------
Bra
\x{100}
Ket
End
------------------------------------------------------------------
/[\u0100-\u0200]/B,alt_bsux,allow_empty_class,match_unset_backref
------------------------------------------------------------------
Bra
[\x{100}-\x{200}]
Ket
End
------------------------------------------------------------------
/\ud800/B,alt_bsux,allow_empty_class,match_unset_backref
------------------------------------------------------------------
Bra
\x{d800}
Ket
End
------------------------------------------------------------------
/^\x{ffff}+/i
\x{ffff}
0: \x{ffff}
/^\x{ffff}?/i
\x{ffff}
0: \x{ffff}
/^\x{ffff}*/i
\x{ffff}
0: \x{ffff}
/^\x{ffff}{3}/i
\x{ffff}\x{ffff}\x{ffff}
0: \x{ffff}\x{ffff}\x{ffff}
/^\x{ffff}{0,3}/i
\x{ffff}
0: \x{ffff}
/[^\x00-a]{12,}[^b-\xff]*/B
------------------------------------------------------------------
Bra
[^\x00-a]{12,}
[^b-\xff]*+
Ket
End
------------------------------------------------------------------
/[^\s]*\s* [^\W]+\W+ [^\d]*?\d0 [^\d\w]{4,6}?\w*A/B
------------------------------------------------------------------
Bra
[^\x09-\x0d ]*
\s*
[0-9A-Z_a-z]++
\W+
[^0-9]*?
\d
0
[^0-9A-Z_a-z]{4,6}?
\w*
A
Ket
End
------------------------------------------------------------------
/a*[b-\x{200}]?a#a*[b-\x{200}]?b#[a-f]*[g-\x{200}]*#[g-\x{200}]*[a-c]*#[g-\x{200}]*[a-h]*/B
------------------------------------------------------------------
Bra
a*
[b-\xff\x{100}-\x{200}]?+
a#
a*+
[b-\xff\x{100}-\x{200}]?
b#
[a-f]*+
[g-\xff\x{100}-\x{200}]*+
#
[g-\xff\x{100}-\x{200}]*+
[a-c]*+
#
[g-\xff\x{100}-\x{200}]*
[a-h]*+
Ket
End
------------------------------------------------------------------
/^[\x{1234}\x{4321}]{2,4}?/
\x{1234}\x{1234}\x{1234}
0: \x{1234}\x{1234}
# Check maximum non-UTF character size for the 16-bit library.
/\x{ffff}/
A\x{ffff}B
0: \x{ffff}
/\x{10000}/
Failed: error 134 at offset 8: character code point value in \x{} or \o{} is too large
/\o{20000}/
# Check maximum character size for the 32-bit library. These will all give
# errors in the 16-bit library.
/\x{110000}/
Failed: error 134 at offset 9: character code point value in \x{} or \o{} is too large
/\x{7fffffff}/
Failed: error 134 at offset 11: character code point value in \x{} or \o{} is too large
/\x{80000000}/
Failed: error 134 at offset 11: character code point value in \x{} or \o{} is too large
/\x{ffffffff}/
Failed: error 134 at offset 11: character code point value in \x{} or \o{} is too large
/\x{100000000}/
Failed: error 134 at offset 12: character code point value in \x{} or \o{} is too large
/\o{17777777777}/
Failed: error 134 at offset 14: character code point value in \x{} or \o{} is too large
/\o{20000000000}/
Failed: error 134 at offset 14: character code point value in \x{} or \o{} is too large
/\o{37777777777}/
Failed: error 134 at offset 14: character code point value in \x{} or \o{} is too large
/\o{40000000000}/
Failed: error 134 at offset 14: character code point value in \x{} or \o{} is too large
/\x{7fffffff}\x{7fffffff}/I
Failed: error 134 at offset 11: character code point value in \x{} or \o{} is too large
/\x{80000000}\x{80000000}/I
Failed: error 134 at offset 11: character code point value in \x{} or \o{} is too large
/\x{ffffffff}\x{ffffffff}/I
Failed: error 134 at offset 11: character code point value in \x{} or \o{} is too large
# Non-UTF characters
/.{2,3}/
\x{400000}\x{400001}\x{400002}\x{400003}
** Character \x{400000} is greater than 0xffff and UTF-16 mode is not enabled.
** Truncation will probably give the wrong result.
** Character \x{400001} is greater than 0xffff and UTF-16 mode is not enabled.
** Truncation will probably give the wrong result.
** Character \x{400002} is greater than 0xffff and UTF-16 mode is not enabled.
** Truncation will probably give the wrong result.
** Character \x{400003} is greater than 0xffff and UTF-16 mode is not enabled.
** Truncation will probably give the wrong result.
0: \x00\x01\x02
/\x{400000}\x{800000}/IBi
Failed: error 134 at offset 9: character code point value in \x{} or \o{} is too large
# Check character ranges
/[\H]/IB
------------------------------------------------------------------
Bra
[\x00-\x08\x0a-\x1f!-\x9f\xa1-\xff\x{100}-\x{167f}\x{1681}-\x{180d}\x{180f}-\x{1fff}\x{200b}-\x{202e}\x{2030}-\x{205e}\x{2060}-\x{2fff}\x{3001}-\x{ffff}]
Ket
End
------------------------------------------------------------------
Capture group count = 0
Starting code units: \x00 \x01 \x02 \x03 \x04 \x05 \x06 \x07 \x08 \x0a \x0b
\x0c \x0d \x0e \x0f \x10 \x11 \x12 \x13 \x14 \x15 \x16 \x17 \x18 \x19 \x1a
\x1b \x1c \x1d \x1e \x1f ! " # $ % & ' ( ) * + , - . / 0 1 2 3 4 5 6 7 8 9
: ; < = > ? @ A B C D E F G H I J K L M N O P Q R S T U V W X Y Z [ \ ] ^
_ ` a b c d e f g h i j k l m n o p q r s t u v w x y z { | } ~ \x7f \x80
\x81 \x82 \x83 \x84 \x85 \x86 \x87 \x88 \x89 \x8a \x8b \x8c \x8d \x8e \x8f
\x90 \x91 \x92 \x93 \x94 \x95 \x96 \x97 \x98 \x99 \x9a \x9b \x9c \x9d \x9e
\x9f \xa1 \xa2 \xa3 \xa4 \xa5 \xa6 \xa7 \xa8 \xa9 \xaa \xab \xac \xad \xae
\xaf \xb0 \xb1 \xb2 \xb3 \xb4 \xb5 \xb6 \xb7 \xb8 \xb9 \xba \xbb \xbc \xbd
\xbe \xbf \xc0 \xc1 \xc2 \xc3 \xc4 \xc5 \xc6 \xc7 \xc8 \xc9 \xca \xcb \xcc
\xcd \xce \xcf \xd0 \xd1 \xd2 \xd3 \xd4 \xd5 \xd6 \xd7 \xd8 \xd9 \xda \xdb
\xdc \xdd \xde \xdf \xe0 \xe1 \xe2 \xe3 \xe4 \xe5 \xe6 \xe7 \xe8 \xe9 \xea
\xeb \xec \xed \xee \xef \xf0 \xf1 \xf2 \xf3 \xf4 \xf5 \xf6 \xf7 \xf8 \xf9
\xfa \xfb \xfc \xfd \xfe \xff
Subject length lower bound = 1
/[\V]/IB
------------------------------------------------------------------
Bra
[\x00-\x09\x0e-\x84\x86-\xff\x{100}-\x{2027}\x{202a}-\x{ffff}]
Ket
End
------------------------------------------------------------------
Capture group count = 0
Starting code units: \x00 \x01 \x02 \x03 \x04 \x05 \x06 \x07 \x08 \x09 \x0e
\x0f \x10 \x11 \x12 \x13 \x14 \x15 \x16 \x17 \x18 \x19 \x1a \x1b \x1c \x1d
\x1e \x1f \x20 ! " # $ % & ' ( ) * + , - . / 0 1 2 3 4 5 6 7 8 9 : ; < = >
? @ A B C D E F G H I J K L M N O P Q R S T U V W X Y Z [ \ ] ^ _ ` a b c
d e f g h i j k l m n o p q r s t u v w x y z { | } ~ \x7f \x80 \x81 \x82
\x83 \x84 \x86 \x87 \x88 \x89 \x8a \x8b \x8c \x8d \x8e \x8f \x90 \x91 \x92
\x93 \x94 \x95 \x96 \x97 \x98 \x99 \x9a \x9b \x9c \x9d \x9e \x9f \xa0 \xa1
\xa2 \xa3 \xa4 \xa5 \xa6 \xa7 \xa8 \xa9 \xaa \xab \xac \xad \xae \xaf \xb0
\xb1 \xb2 \xb3 \xb4 \xb5 \xb6 \xb7 \xb8 \xb9 \xba \xbb \xbc \xbd \xbe \xbf
\xc0 \xc1 \xc2 \xc3 \xc4 \xc5 \xc6 \xc7 \xc8 \xc9 \xca \xcb \xcc \xcd \xce
\xcf \xd0 \xd1 \xd2 \xd3 \xd4 \xd5 \xd6 \xd7 \xd8 \xd9 \xda \xdb \xdc \xdd
\xde \xdf \xe0 \xe1 \xe2 \xe3 \xe4 \xe5 \xe6 \xe7 \xe8 \xe9 \xea \xeb \xec
\xed \xee \xef \xf0 \xf1 \xf2 \xf3 \xf4 \xf5 \xf6 \xf7 \xf8 \xf9 \xfa \xfb
\xfc \xfd \xfe \xff
Subject length lower bound = 1
/(*THEN:\[A]{65501})/expand
# We can use pcre2test's utf8_input modifier to create wide pattern characters,
# even though this test is run when UTF is not supported.
/a\x{d800}b/utf8_input
 €b
0: a\x{d800}b
a\x{d800}b
0: a\x{d800}b
a\o{154000}b
0: a\x{d800}b
\= Expect warning unless 32bit
a\N{U+d800}b
** Warning: character \N{U+d800} is a surrogate and should not be encoded as UTF-16
0: a\x{d800}b
/a\x{ffff}b/utf8_input
aï¿¿b
0: a\x{ffff}b
a\x{ffff}b
0: a\x{ffff}b
a\o{177777}b
0: a\x{ffff}b
a\N{U+ffff}b
0: a\x{ffff}b
/abý¿¿¿¿¿z/utf8_input
** Failed: character value greater than 0xffff cannot be converted to 16-bit in non-UTF mode
abý¿¿¿¿¿z
ab\x{7fffffff}z
ab\o{17777777777}z
ab\N{U+7fffffff}z
/abÿý¿¿¿¿¿z/utf8_input
** Failed: invalid UTF-8 string cannot be converted to 16-bit string
abÿý¿¿¿¿¿z
ab\x{ffffffff}z
/abÿAz/utf8_input
** Failed: invalid UTF-8 string cannot be converted to 16-bit string
abÿAz
ab\x{80000041}z
\= Expect no match
abAz
aAz
ab\377Az
ab\xff\N{U+0041}z
ab\N{U+ff}\N{U+41}z
/ab\x{80000041}z/
Failed: error 134 at offset 13: character code point value in \x{} or \o{} is too large
ab\x{80000041}z
/(?i:A{1,}\6666666666)/
A\x{1b6}6666666
0: A\x{1b6}6666666
/abc/substitute_extended,replace=>\777<
abc
1: >\x{1ff}<
/abc/substitute_extended,replace=>\o{012345}<
abc
1: >\x{14e5}<
# Character range merging tests
/[\x{100}-\x{200}\H\x{8000}-\x{9000}]/B
------------------------------------------------------------------
Bra
[\x00-\x08\x0a-\x1f!-\x9f\xa1-\xff\x{100}-\x{167f}\x{1681}-\x{180d}\x{180f}-\x{1fff}\x{200b}-\x{202e}\x{2030}-\x{205e}\x{2060}-\x{2fff}\x{3001}-\x{ffff}]
Ket
End
------------------------------------------------------------------
/[\x{100}-\x{200}\V\x{8000}-\x{9000}]/B
------------------------------------------------------------------
Bra
[\x00-\x09\x0e-\x84\x86-\xff\x{100}-\x{2027}\x{202a}-\x{ffff}]
Ket
End
------------------------------------------------------------------
/[\x00-\x{6000}\x{3000}-\x{ffff}]#[\x00-\x{6000}\x{3000}-\x{ffff}]{5,7}?/B
------------------------------------------------------------------
Bra
AllAny
#
AllAny{5}
AllAny{0,2}?
Ket
End
------------------------------------------------------------------
/[\x00-\x{6000}\x{3000}-\x{ffffffff}]#[\x00-\x{6000}\x{3000}-\x{ffffffff}]{5,7}?/B
Failed: error 134 at offset 34: character code point value in \x{} or \o{} is too large
/[\x00-\x2f\x11-\xff]*?!/B
------------------------------------------------------------------
Bra
[\x00-\xff]*?
!
Ket
End
------------------------------------------------------------------
abcd!e
0: abcd!
/i/turkish_casing
Failed: error 204 at offset 0: PCRE2_EXTRA_TURKISH_CASING require Unicode (UTF or UCP) mode
# Character list tests
/([\x{100}-\x{7fff}\x{9000}\x{9002}\x{9004}\x{9006}\x{9008}\x{10000}-\x{7fffffff}]{3,8}?).#/B
Failed: error 134 at offset 66: character code point value in \x{} or \o{} is too large
\x{9001}\x{9007}\x{8000}\x{ffff}\x{9002}\x{7fff}\x{10000}\x{7fffffff}\x{500000}\x{9006}#
/([\x{3000}\x{3001}\x{3003}\x{3004}\x{3006}\x{3007}\x{8000}-\x{ffff}\x{100001}\x{100002}\x{100004}\x{100005}\x{100007}\x{100008}\x{10000a}\x{10000b}\x{80000000}-\x{ffffffff}]{5,}).#/B
Failed: error 134 at offset 76: character code point value in \x{} or \o{} is too large
\x{2fff}\x{3002}\x{7fff}\x{100000}\x{7fffffff}\x{3000}\x{3007}\x{8000}\x{ffff}\x{100001}\x{10000b}\x{80000000}\x{ffffffff}\x{3000}#
/([^\x{4000}\x{4002}\x{4004}\x{4005}\x{4007}\x{4009}\x{400a}\x{f000}\x{f002}\x{f004}\x{f005}\x{f007}\x{f009}\x{f00a}\x{100000}\x{100002}\x{100004}\x{100005}\x{100007}\x{100009}\x{10000a}\x{a0000000}\x{a0000002}\x{a0000004}\x{a0000005}\x{a0000007}\x{a0000009}\x{a000000a}]+).#/B
Failed: error 134 at offset 124: character code point value in \x{} or \o{} is too large
\x{4000}\x{4002}\x{4004}\x{4005}\x{4007}\x{4009}\x{400a}\x{3fff}\x{4001}\x{4003}\x{4006}\x{4008}\x{400b}\x{100}#
\x{f000}\x{f002}\x{f004}\x{f005}\x{f007}\x{f009}\x{f00a}\x{efff}\x{f001}\x{f003}\x{f006}\x{f008}\x{f00b}\x{100}#
\x{100000}\x{100002}\x{100004}\x{100005}\x{100007}\x{100009}\x{10000a}\x{fffff}\x{100001}\x{100003}\x{100006}\x{100008}\x{10000b}\x{100}#
\x{a0000000}\x{a0000002}\x{a0000004}\x{a0000005}\x{a0000007}\x{a0000009}\x{a000000a}\x{9fffffff}\x{a0000001}\x{a0000003}\x{a0000006}\x{a0000008}\x{a000000b}\x{100}#
# --------------
# EXTENDED CHARACTER CLASSES (UTS#18)
# META_BIGVALUE tests
/\x{80000000}/B
Failed: error 134 at offset 11: character code point value in \x{} or \o{} is too large
\x{80000000}
\= Expect no match
\x{7fffffff}
\x{80000001}
/[\x{80000000}-\x{8000000f}\x{8fffffff}]/B
Failed: error 134 at offset 12: character code point value in \x{} or \o{} is too large
\x{80000002}
\x{8fffffff}
\= Expect no match
\x{7fffffff}
\x{90000000}
/\x{80000000}/B,alt_extended_class
Failed: error 134 at offset 11: character code point value in \x{} or \o{} is too large
\x{80000000}
\= Expect no match
\x{7fffffff}
\x{80000001}
/[\x{80000000}-\x{8000000f}\x{8fffffff}]/B,alt_extended_class
Failed: error 134 at offset 12: character code point value in \x{} or \o{} is too large
\x{80000002}
\x{8fffffff}
\= Expect no match
\x{7fffffff}
\x{90000000}
/[\x{80000000}-\x{8000000f}--\x{80000002}]/B,alt_extended_class
Failed: error 134 at offset 12: character code point value in \x{} or \o{} is too large
\x{80000001}
\x{80000003}
\= Expect no match
\x{80000002}
/[[\x{80000000}-\x{8000000f}]--[\x{80000002}]]/B,alt_extended_class
Failed: error 134 at offset 13: character code point value in \x{} or \o{} is too large
\x{80000001}
\x{80000003}
\= Expect no match
\x{80000002}
# --------------
# EXTENDED CHARACTER CLASSES (Perl)
# META_BIGVALUE tests
/(?[[\x{80000000}-\x{8000000f}]+\x{8fffffff}])/B
Failed: error 134 at offset 15: character code point value in \x{} or \o{} is too large
\x{80000002}
\x{8fffffff}
\= Expect no match
\x{7fffffff}
\x{90000000}
/(?[[\x{80000000}-\x{8000000f}]-\x{80000002}])/B
Failed: error 134 at offset 15: character code point value in \x{} or \o{} is too large
\x{80000001}
\x{80000003}
\= Expect no match
\x{80000002}
/(?[[\x{80000000}-\x{8000000f}]-\x{80000002}])/B
Failed: error 134 at offset 15: character code point value in \x{} or \o{} is too large
\x{80000001}
\x{80000003}
\= Expect no match
\x{80000002}
# --------------
# End of testinput11

File diff suppressed because it is too large

File diff suppressed because it is too large

File diff suppressed because it is too large

@@ -0,0 +1,27 @@
# These DFA tests are for the handling of characters greater than 255 in
# 16-bit or 32-bit, non-UTF mode.
#forbid_utf
#subject dfa
/^\x{ffff}+/i
\x{ffff}
0: \x{ffff}
/^\x{ffff}?/i
\x{ffff}
0: \x{ffff}
/^\x{ffff}*/i
\x{ffff}
0: \x{ffff}
/^\x{ffff}{3}/i
\x{ffff}\x{ffff}\x{ffff}
0: \x{ffff}\x{ffff}\x{ffff}
/^\x{ffff}{0,3}/i
\x{ffff}
0: \x{ffff}
# End of testinput13

@@ -0,0 +1,163 @@
# These test special UTF and UCP features of DFA matching. The output is
# different for the different widths.
#subject dfa
# ----------------------------------------------------
# These are a selection of the more comprehensive tests that are run for
# non-DFA matching.
/X/utf
XX\x{d800}
Failed: error -24: UTF-16 error: missing low surrogate at end at offset 2
XX\x{d800}\=offset=3
No match
XX\x{d800}\=no_utf_check
0: X
XX\x{da00}
Failed: error -24: UTF-16 error: missing low surrogate at end at offset 2
XX\x{da00}\=no_utf_check
0: X
XX\x{dc00}
Failed: error -26: UTF-16 error: isolated low surrogate at offset 2
XX\x{dc00}\=no_utf_check
0: X
XX\x{de00}
Failed: error -26: UTF-16 error: isolated low surrogate at offset 2
XX\x{de00}\=no_utf_check
0: X
XX\x{dfff}
Failed: error -26: UTF-16 error: isolated low surrogate at offset 2
XX\x{dfff}\=no_utf_check
0: X
XX\x{110000}
** Failed: character \N{U+110000} is greater than 0x10ffff and therefore cannot be encoded as UTF-16
XX\x{d800}\x{1234}
Failed: error -25: UTF-16 error: invalid low surrogate at offset 2
/badutf/utf
X\xdf
No match
XX\xef
No match
XXX\xef\x80
No match
X\xf7
No match
XX\xf7\x80
No match
XXX\xf7\x80\x80
No match
/shortutf/utf
XX\xdf\=ph
No match
XX\xef\=ph
No match
XX\xef\x80\=ph
No match
\xf7\=ph
No match
\xf7\x80\=ph
No match
# ----------------------------------------------------
# UCP and casing tests - except for the first two, these will all fail in 8-bit
# mode because they are testing UCP without UTF and use characters > 255.
/\x{c1}/i,no_start_optimize
\= Expect no match
\x{e1}
No match
/\x{c1}+\x{e1}/iB,ucp
------------------------------------------------------------------
Bra
/i \x{c1}+
/i \x{e1}
Ket
End
------------------------------------------------------------------
\x{c1}\x{c1}\x{c1}
0: \xc1\xc1\xc1
1: \xc1\xc1
\x{e1}\x{e1}\x{e1}
0: \xe1\xe1\xe1
1: \xe1\xe1
/\x{120}\x{c1}/i,ucp,no_start_optimize
\x{121}\x{e1}
0: \x{121}\xe1
/\x{120}\x{c1}/i,ucp
\x{121}\x{e1}
0: \x{121}\xe1
/[^\x{120}]/i,no_start_optimize
\x{121}
0: \x{121}
/[^\x{120}]/i,ucp,no_start_optimize
\= Expect no match
\x{121}
No match
/[^\x{120}]/i
\x{121}
0: \x{121}
/[^\x{120}]/i,ucp
\= Expect no match
\x{121}
No match
/\x{120}{2}/i,ucp
\x{121}\x{121}
0: \x{121}\x{121}
/[^\x{120}]{2}/i,ucp
\= Expect no match
\x{121}\x{121}
No match
# ----------------------------------------------------
# ----------------------------------------------------
# Tests for handling 0xffffffff in caseless UCP mode. They only apply to 32-bit
# mode; for the other widths they will fail.
/k*\x{ffffffff}/caseless,ucp
Failed: error 134 at offset 13: character code point value in \x{} or \o{} is too large
\x{ffffffff}
/k+\x{ffffffff}/caseless,ucp,no_start_optimize
Failed: error 134 at offset 13: character code point value in \x{} or \o{} is too large
K\x{ffffffff}
\= Expect no match
\x{ffffffff}\x{ffffffff}
/k{2}\x{ffffffff}/caseless,ucp,no_start_optimize
Failed: error 134 at offset 15: character code point value in \x{} or \o{} is too large
\= Expect no match
\x{ffffffff}\x{ffffffff}\x{ffffffff}
/k\x{ffffffff}/caseless,ucp,no_start_optimize
Failed: error 134 at offset 12: character code point value in \x{} or \o{} is too large
K\x{ffffffff}
\= Expect no match
\x{ffffffff}\x{ffffffff}\x{ffffffff}
/k{2,}?Z/caseless,ucp,no_start_optimize,no_auto_possess
\= Expect no match
Kk\x{ffffffff}\x{ffffffff}\x{ffffffff}Z
** Character \x{ffffffff} is greater than 0xffff and UTF-16 mode is not enabled.
** Truncation will probably give the wrong result.
** Character \x{ffffffff} is greater than 0xffff and UTF-16 mode is not enabled.
** Truncation will probably give the wrong result.
** Character \x{ffffffff} is greater than 0xffff and UTF-16 mode is not enabled.
** Truncation will probably give the wrong result.
No match
# ----------------------------------------------------
# End of testinput14

@@ -0,0 +1,159 @@
# These test special UTF and UCP features of DFA matching. The output is
# different for the different widths.
#subject dfa
# ----------------------------------------------------
# These are a selection of the more comprehensive tests that are run for
# non-DFA matching.
/X/utf
XX\x{d800}
Failed: error -27: UTF-32 error: code points 0xd800-0xdfff are not defined at offset 2
XX\x{d800}\=offset=3
No match
XX\x{d800}\=no_utf_check
0: X
XX\x{da00}
Failed: error -27: UTF-32 error: code points 0xd800-0xdfff are not defined at offset 2
XX\x{da00}\=no_utf_check
0: X
XX\x{dc00}
Failed: error -27: UTF-32 error: code points 0xd800-0xdfff are not defined at offset 2
XX\x{dc00}\=no_utf_check
0: X
XX\x{de00}
Failed: error -27: UTF-32 error: code points 0xd800-0xdfff are not defined at offset 2
XX\x{de00}\=no_utf_check
0: X
XX\x{dfff}
Failed: error -27: UTF-32 error: code points 0xd800-0xdfff are not defined at offset 2
XX\x{dfff}\=no_utf_check
0: X
XX\x{110000}
Failed: error -28: UTF-32 error: code points greater than 0x10ffff are not defined at offset 2
XX\x{d800}\x{1234}
Failed: error -27: UTF-32 error: code points 0xd800-0xdfff are not defined at offset 2
/badutf/utf
X\xdf
No match
XX\xef
No match
XXX\xef\x80
No match
X\xf7
No match
XX\xf7\x80
No match
XXX\xf7\x80\x80
No match
/shortutf/utf
XX\xdf\=ph
No match
XX\xef\=ph
No match
XX\xef\x80\=ph
No match
\xf7\=ph
No match
\xf7\x80\=ph
No match
# ----------------------------------------------------
# UCP and casing tests - except for the first two, these will all fail in 8-bit
# mode because they are testing UCP without UTF and use characters > 255.
/\x{c1}/i,no_start_optimize
\= Expect no match
\x{e1}
No match
/\x{c1}+\x{e1}/iB,ucp
------------------------------------------------------------------
Bra
/i \x{c1}+
/i \x{e1}
Ket
End
------------------------------------------------------------------
\x{c1}\x{c1}\x{c1}
0: \xc1\xc1\xc1
1: \xc1\xc1
\x{e1}\x{e1}\x{e1}
0: \xe1\xe1\xe1
1: \xe1\xe1
/\x{120}\x{c1}/i,ucp,no_start_optimize
\x{121}\x{e1}
0: \x{121}\xe1
/\x{120}\x{c1}/i,ucp
\x{121}\x{e1}
0: \x{121}\xe1
/[^\x{120}]/i,no_start_optimize
\x{121}
0: \x{121}
/[^\x{120}]/i,ucp,no_start_optimize
\= Expect no match
\x{121}
No match
/[^\x{120}]/i
\x{121}
0: \x{121}
/[^\x{120}]/i,ucp
\= Expect no match
\x{121}
No match
/\x{120}{2}/i,ucp
\x{121}\x{121}
0: \x{121}\x{121}
/[^\x{120}]{2}/i,ucp
\= Expect no match
\x{121}\x{121}
No match
# ----------------------------------------------------
# ----------------------------------------------------
# Tests for handling 0xffffffff in caseless UCP mode. They only apply to 32-bit
# mode; for the other widths they will fail.
/k*\x{ffffffff}/caseless,ucp
\x{ffffffff}
0: \x{ffffffff}
/k+\x{ffffffff}/caseless,ucp,no_start_optimize
K\x{ffffffff}
0: K\x{ffffffff}
\= Expect no match
\x{ffffffff}\x{ffffffff}
No match
/k{2}\x{ffffffff}/caseless,ucp,no_start_optimize
\= Expect no match
\x{ffffffff}\x{ffffffff}\x{ffffffff}
No match
/k\x{ffffffff}/caseless,ucp,no_start_optimize
K\x{ffffffff}
0: K\x{ffffffff}
\= Expect no match
\x{ffffffff}\x{ffffffff}\x{ffffffff}
No match
/k{2,}?Z/caseless,ucp,no_start_optimize,no_auto_possess
\= Expect no match
Kk\x{ffffffff}\x{ffffffff}\x{ffffffff}Z
No match
# ----------------------------------------------------
# End of testinput14

@@ -0,0 +1,163 @@
# These test special UTF and UCP features of DFA matching. The output is
# different for the different widths.
#subject dfa
# ----------------------------------------------------
# These are a selection of the more comprehensive tests that are run for
# non-DFA matching.
/X/utf
XX\x{d800}
Failed: error -16: UTF-8 error: code points 0xd800-0xdfff are not defined at offset 2
XX\x{d800}\=offset=3
Error -36 (bad UTF-8 offset)
XX\x{d800}\=no_utf_check
0: X
XX\x{da00}
Failed: error -16: UTF-8 error: code points 0xd800-0xdfff are not defined at offset 2
XX\x{da00}\=no_utf_check
0: X
XX\x{dc00}
Failed: error -16: UTF-8 error: code points 0xd800-0xdfff are not defined at offset 2
XX\x{dc00}\=no_utf_check
0: X
XX\x{de00}
Failed: error -16: UTF-8 error: code points 0xd800-0xdfff are not defined at offset 2
XX\x{de00}\=no_utf_check
0: X
XX\x{dfff}
Failed: error -16: UTF-8 error: code points 0xd800-0xdfff are not defined at offset 2
XX\x{dfff}\=no_utf_check
0: X
XX\x{110000}
Failed: error -15: UTF-8 error: code points greater than 0x10ffff are not defined at offset 2
XX\x{d800}\x{1234}
Failed: error -16: UTF-8 error: code points 0xd800-0xdfff are not defined at offset 2
/badutf/utf
X\xdf
Failed: error -3: UTF-8 error: 1 byte missing at end at offset 1
XX\xef
Failed: error -4: UTF-8 error: 2 bytes missing at end at offset 2
XXX\xef\x80
Failed: error -3: UTF-8 error: 1 byte missing at end at offset 3
X\xf7
Failed: error -5: UTF-8 error: 3 bytes missing at end at offset 1
XX\xf7\x80
Failed: error -4: UTF-8 error: 2 bytes missing at end at offset 2
XXX\xf7\x80\x80
Failed: error -3: UTF-8 error: 1 byte missing at end at offset 3
/shortutf/utf
XX\xdf\=ph
Failed: error -3: UTF-8 error: 1 byte missing at end at offset 2
XX\xef\=ph
Failed: error -4: UTF-8 error: 2 bytes missing at end at offset 2
XX\xef\x80\=ph
Failed: error -3: UTF-8 error: 1 byte missing at end at offset 2
\xf7\=ph
Failed: error -5: UTF-8 error: 3 bytes missing at end at offset 0
\xf7\x80\=ph
Failed: error -4: UTF-8 error: 2 bytes missing at end at offset 0
# ----------------------------------------------------
# UCP and casing tests - except for the first two, these will all fail in 8-bit
# mode because they are testing UCP without UTF and use characters > 255.
/\x{c1}/i,no_start_optimize
\= Expect no match
\x{e1}
No match
/\x{c1}+\x{e1}/iB,ucp
------------------------------------------------------------------
Bra
/i \x{c1}+
/i \x{e1}
Ket
End
------------------------------------------------------------------
\x{c1}\x{c1}\x{c1}
0: \xc1\xc1\xc1
1: \xc1\xc1
\x{e1}\x{e1}\x{e1}
0: \xe1\xe1\xe1
1: \xe1\xe1
/\x{120}\x{c1}/i,ucp,no_start_optimize
Failed: error 134 at offset 6: character code point value in \x{} or \o{} is too large
\x{121}\x{e1}
/\x{120}\x{c1}/i,ucp
Failed: error 134 at offset 6: character code point value in \x{} or \o{} is too large
\x{121}\x{e1}
/[^\x{120}]/i,no_start_optimize
Failed: error 134 at offset 8: character code point value in \x{} or \o{} is too large
\x{121}
/[^\x{120}]/i,ucp,no_start_optimize
Failed: error 134 at offset 8: character code point value in \x{} or \o{} is too large
\= Expect no match
\x{121}
/[^\x{120}]/i
Failed: error 134 at offset 8: character code point value in \x{} or \o{} is too large
\x{121}
/[^\x{120}]/i,ucp
Failed: error 134 at offset 8: character code point value in \x{} or \o{} is too large
\= Expect no match
\x{121}
/\x{120}{2}/i,ucp
Failed: error 134 at offset 6: character code point value in \x{} or \o{} is too large
\x{121}\x{121}
/[^\x{120}]{2}/i,ucp
Failed: error 134 at offset 8: character code point value in \x{} or \o{} is too large
\= Expect no match
\x{121}\x{121}
# ----------------------------------------------------
# ----------------------------------------------------
# Tests for handling 0xffffffff in caseless UCP mode. They only apply to 32-bit
# mode; for the other widths they will fail.
/k*\x{ffffffff}/caseless,ucp
Failed: error 134 at offset 13: character code point value in \x{} or \o{} is too large
\x{ffffffff}
/k+\x{ffffffff}/caseless,ucp,no_start_optimize
Failed: error 134 at offset 13: character code point value in \x{} or \o{} is too large
K\x{ffffffff}
\= Expect no match
\x{ffffffff}\x{ffffffff}
/k{2}\x{ffffffff}/caseless,ucp,no_start_optimize
Failed: error 134 at offset 15: character code point value in \x{} or \o{} is too large
\= Expect no match
\x{ffffffff}\x{ffffffff}\x{ffffffff}
/k\x{ffffffff}/caseless,ucp,no_start_optimize
Failed: error 134 at offset 12: character code point value in \x{} or \o{} is too large
K\x{ffffffff}
\= Expect no match
\x{ffffffff}\x{ffffffff}\x{ffffffff}
/k{2,}?Z/caseless,ucp,no_start_optimize,no_auto_possess
\= Expect no match
Kk\x{ffffffff}\x{ffffffff}\x{ffffffff}Z
** Character \x{ffffffff} is greater than 255 and UTF-8 mode is not enabled.
** Truncation will probably give the wrong result.
** Character \x{ffffffff} is greater than 255 and UTF-8 mode is not enabled.
** Truncation will probably give the wrong result.
** Character \x{ffffffff} is greater than 255 and UTF-8 mode is not enabled.
** Truncation will probably give the wrong result.
No match
# ----------------------------------------------------
# End of testinput14
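The UTF-8 error messages in this file follow from how a lead byte announces the length of its sequence. The C sketch below is a simplification, not PCRE2's own validation code, and its function names are hypothetical. It reproduces the "N bytes missing at end" counts and the surrogate and out-of-range checks. It ignores overlong encodings and the obsolete 5- and 6-byte forms.

```c
#include <assert.h>
#include <stddef.h>

/* Number of continuation bytes announced by a UTF-8 lead byte, or -1 if the
   byte cannot start a sequence. Simplified: overlong leads (0xC0, 0xC1) are
   not rejected, and 0xF5-0xF7 are still treated as 4-byte leads so that
   truncation is reported before any range check, as in the \xf7 tests. */
static int utf8_extra_bytes(unsigned char lead)
{
    if (lead < 0x80) return 0;   /* ASCII */
    if (lead < 0xC0) return -1;  /* stray continuation byte */
    if (lead < 0xE0) return 1;
    if (lead < 0xF0) return 2;
    if (lead < 0xF8) return 3;
    return -1;                   /* 5- and 6-byte forms: not handled here */
}

/* For the sequence starting at s[i] in a buffer of len bytes, return how many
   bytes are missing at the end of the buffer (0 if the sequence fits), or -1
   for an invalid lead byte. */
static int utf8_missing_bytes(const unsigned char *s, size_t len, size_t i)
{
    assert(i < len);
    int extra = utf8_extra_bytes(s[i]);
    if (extra < 0)
        return -1;
    size_t avail = len - i - 1;  /* bytes present after the lead byte */
    return (size_t)extra > avail ? extra - (int)avail : 0;
}

typedef enum { CP_OK, CP_SURROGATE, CP_TOO_LARGE } cp_class;

/* Range checks applied to a decoded code point. */
static cp_class classify_code_point(unsigned long cp)
{
    if (cp >= 0xD800 && cp <= 0xDFFF)
        return CP_SURROGATE;   /* reserved for UTF-16 surrogate pairs */
    if (cp > 0x10FFFF)
        return CP_TOO_LARGE;
    return CP_OK;
}
```

For example, "XXX\xef\x80" has a 3-byte lead at offset 3 followed by only one byte, so one byte is missing. This matches the error -3 report above.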

@@ -0,0 +1,542 @@
# These are:
#
# (1) Tests of the match-limiting features. The results are different for
# interpretive or JIT matching, so this test should not be run with JIT. The
# same tests are run using JIT in test 17.
# (2) Other tests that must not be run with JIT.
# These tests are first so that they don't inherit a large enough heap frame
# vector from a previous test.
/(*LIMIT_HEAP=21)\[(a)]{60}/expand
\[a]{60}
Failed: error -63: heap limit exceeded
"(*LIMIT_HEAP=21)()((?))()()()()()()()()()()()()()()()()()()()()()()()(())()()()()()()()()()()()()()()()()()()()()()(())()()()()()()()()()()()()()"
xx
Failed: error -63: heap limit exceeded
# -----------------------------------------------------------------------
/(a+)*zz/I
Capture group count = 1
Starting code units: a z
Last code unit = 'z'
Subject length lower bound = 2
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaazzbbbbbb\=find_limits_noheap
Minimum match limit = 7
Minimum depth limit = 7
0: aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaazz
1: aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaz\=find_limits_noheap
Minimum match limit = 20481
Minimum depth limit = 30
No match
!((?:\s|//.*\\n|/[*](?:\\n|.)*?[*]/)*)!I
Capture group count = 1
May match empty string
Subject length lower bound = 0
/* this is a C style comment */\=find_limits_noheap
Minimum match limit = 64
Minimum depth limit = 7
0: /* this is a C style comment */
1: /* this is a C style comment */
/^(?>a)++/
aa\=find_limits_noheap
Minimum match limit = 5
Minimum depth limit = 3
0: aa
aaaaaaaaa\=find_limits_noheap
Minimum match limit = 12
Minimum depth limit = 3
0: aaaaaaaaa
/(a)(?1)++/
aa\=find_limits_noheap
Minimum match limit = 7
Minimum depth limit = 5
0: aa
1: a
aaaaaaaaa\=find_limits_noheap
Minimum match limit = 21
Minimum depth limit = 5
0: aaaaaaaaa
1: a
/a(?:.)*?a/ims
abbbbbbbbbbbbbbbbbbbbba\=find_limits_noheap
Minimum match limit = 24
Minimum depth limit = 3
0: abbbbbbbbbbbbbbbbbbbbba
/a(?:.(*THEN))*?a/ims
abbbbbbbbbbbbbbbbbbbbba\=find_limits_noheap
Minimum match limit = 66
Minimum depth limit = 45
0: abbbbbbbbbbbbbbbbbbbbba
/a(?:.(*THEN:ABC))*?a/ims
abbbbbbbbbbbbbbbbbbbbba\=find_limits_noheap
Minimum match limit = 66
Minimum depth limit = 45
0: abbbbbbbbbbbbbbbbbbbbba
/^(?>a+)(?>b+)(?>c+)(?>d+)(?>e+)/
aabbccddee\=find_limits_noheap
Minimum match limit = 7
Minimum depth limit = 7
0: aabbccddee
/^(?>(a+))(?>(b+))(?>(c+))(?>(d+))(?>(e+))/
aabbccddee\=find_limits_noheap
Minimum match limit = 12
Minimum depth limit = 12
0: aabbccddee
1: aa
2: bb
3: cc
4: dd
5: ee
/^(?>(a+))(?>b+)(?>(c+))(?>d+)(?>(e+))/
aabbccddee\=find_limits_noheap
Minimum match limit = 10
Minimum depth limit = 10
0: aabbccddee
1: aa
2: cc
3: ee
/(*LIMIT_MATCH=12bc)abc/
Failed: error 160 at offset 16: (*VERB) not recognized or malformed
/(*LIMIT_MATCH=4294967290)abc/
Failed: error 160 at offset 23: (*VERB) not recognized or malformed
/(*LIMIT_DEPTH=4294967280)abc/I
Capture group count = 0
Depth limit = 4294967280
First code unit = 'a'
Last code unit = 'c'
Subject length lower bound = 3
/(a+)*zz/
\= Expect no match
aaaaaaaaaaaaaz
No match
\= Expect limit exceeded
aaaaaaaaaaaaaz\=match_limit=3000
Failed: error -47: match limit exceeded
/(a+)*zz/
\= Expect limit exceeded
aaaaaaaaaaaaaz\=depth_limit=10
Failed: error -53: matching depth limit exceeded
/(*LIMIT_MATCH=3000)(a+)*zz/I
Capture group count = 1
Match limit = 3000
Starting code units: a z
Last code unit = 'z'
Subject length lower bound = 2
\= Expect limit exceeded
aaaaaaaaaaaaaz
Failed: error -47: match limit exceeded
\= Expect limit exceeded
aaaaaaaaaaaaaz\=match_limit=60000
Failed: error -47: match limit exceeded
/(*LIMIT_MATCH=60000)(*LIMIT_MATCH=3000)(a+)*zz/I
Capture group count = 1
Match limit = 3000
Starting code units: a z
Last code unit = 'z'
Subject length lower bound = 2
\= Expect limit exceeded
aaaaaaaaaaaaaz
Failed: error -47: match limit exceeded
/(*LIMIT_MATCH=60000)(a+)*zz/I
Capture group count = 1
Match limit = 60000
Starting code units: a z
Last code unit = 'z'
Subject length lower bound = 2
\= Expect no match
aaaaaaaaaaaaaz
No match
\= Expect limit exceeded
aaaaaaaaaaaaaz\=match_limit=3000
Failed: error -47: match limit exceeded
/(*LIMIT_DEPTH=10)(a+)*zz/I
Capture group count = 1
Depth limit = 10
Starting code units: a z
Last code unit = 'z'
Subject length lower bound = 2
\= Expect limit exceeded
aaaaaaaaaaaaaz
Failed: error -53: matching depth limit exceeded
\= Expect limit exceeded
aaaaaaaaaaaaaz\=depth_limit=1000
Failed: error -53: matching depth limit exceeded
/(*LIMIT_DEPTH=10)(*LIMIT_DEPTH=1000)(a+)*zz/I
Capture group count = 1
Depth limit = 1000
Starting code units: a z
Last code unit = 'z'
Subject length lower bound = 2
\= Expect no match
aaaaaaaaaaaaaz
No match
/(*LIMIT_DEPTH=1000)(a+)*zz/I
Capture group count = 1
Depth limit = 1000
Starting code units: a z
Last code unit = 'z'
Subject length lower bound = 2
\= Expect no match
aaaaaaaaaaaaaz
No match
\= Expect limit exceeded
aaaaaaaaaaaaaz\=depth_limit=10
Failed: error -53: matching depth limit exceeded
# These three have infinitely nested recursions.
/((?2))((?1))/
abc
Failed: error -52: nested recursion at the same subject position
/((?(R2)a+|(?1)b))()/
aaaabcde
Failed: error -52: nested recursion at the same subject position
/(?(R)a*(?1)|((?R))b)/
aaaabcde
Failed: error -52: nested recursion at the same subject position
# The allusedtext modifier does not work with JIT, which does not maintain
# the leftchar/rightchar data.
/abc(?=xyz)/allusedtext
abcxyzpqr
0: abcxyz
>>>
abcxyzpqr\=aftertext
0: abcxyz
>>>
0+ xyzpqr
/(?<=pqr)abc(?=xyz)/allusedtext
xyzpqrabcxyzpqr
0: pqrabcxyz
<<< >>>
xyzpqrabcxyzpqr\=aftertext
0: pqrabcxyz
<<< >>>
0+ xyzpqr
/a\b/
a.\=allusedtext
0: a.
>
a\=allusedtext
0: a
/abc\Kxyz/
abcxyz\=allusedtext
0: abcxyz
<<<
/abc(?=xyz(*ACCEPT))/
abcxyz\=allusedtext
0: abcxyz
>>>
/abc(?=abcde)(?=ab)/allusedtext
abcabcdefg
0: abcabcde
>>>>>
#subject allusedtext
/(?<=abc)123/
xyzabc123pqr
0: abc123
<<<
xyzabc12\=ps
Partial match: abc12
<<<
xyzabc12\=ph
Partial match: abc12
<<<
/\babc\b/
+++abc+++
0: +abc+
< >
+++ab\=ps
Partial match: +ab
<
+++ab\=ph
Partial match: +ab
<
/(?<=abc)def/
abc\=ph
Partial match: abc
<<<
/(?<=123)(*MARK:xx)abc/mark
xxxx123a\=ph
Partial match, mark=xx: 123a
<<<
xxxx123a\=ps
Partial match, mark=xx: 123a
<<<
/(?<=(?<=a)b)c.*/I
Capture group count = 0
Max lookbehind = 1
First code unit = 'c'
Subject length lower bound = 1
abc\=ph
Partial match: abc
<<
\= Expect no match
xbc\=ph
No match
/(?<=ab)c.*/I
Capture group count = 0
Max lookbehind = 2
First code unit = 'c'
Subject length lower bound = 1
abc\=ph
Partial match: abc
<<
\= Expect no match
xbc\=ph
No match
/abc(?<=bc)def/
xxxabcd\=ph
Partial match: abcd
/(?<=ab)cdef/
xxabcd\=ph
Partial match: abcd
<<
/(?<=(?<=(?<=a)b)c)./I
Capture group count = 0
Max lookbehind = 1
Subject length lower bound = 1
123abcXYZ
0: abcX
<<<
/(?<=ab(cd(?<=...)))./I
Capture group count = 1
Max lookbehind = 4
Subject length lower bound = 1
abcdX
0: abcdX
<<<<
1: cd
/(?<=ab((?<=...)cd))./I
Capture group count = 1
Max lookbehind = 4
Subject length lower bound = 1
ZabcdX
0: ZabcdX
<<<<<
1: cd
/(?<=((?<=(?<=ab).))(?1)(?1))./I
Capture group count = 1
Max lookbehind = 2
Subject length lower bound = 1
abxZ
0: abxZ
<<<
1:
#subject
# -------------------------------------------------------------------
# These tests provoke recursion loops, which give a different error message
# when JIT is used.
/(?R)/I
Capture group count = 0
May match empty string
Subject length lower bound = 0
abcd
Failed: error -52: nested recursion at the same subject position
/(a|(?R))/I
Capture group count = 1
May match empty string
Subject length lower bound = 0
abcd
0: a
1: a
defg
Failed: error -52: nested recursion at the same subject position
/(ab|(bc|(de|(?R))))/I
Capture group count = 3
May match empty string
Subject length lower bound = 0
abcd
0: ab
1: ab
fghi
Failed: error -52: nested recursion at the same subject position
/(ab|(bc|(de|(?1))))/I
Capture group count = 3
May match empty string
Subject length lower bound = 0
abcd
0: ab
1: ab
fghi
Failed: error -52: nested recursion at the same subject position
/x(ab|(bc|(de|(?1)x)x)x)/I
Capture group count = 3
First code unit = 'x'
Subject length lower bound = 3
xab123
0: xab
1: ab
xfghi
Failed: error -52: nested recursion at the same subject position
/(?!\w)(?R)/
abcd
Failed: error -52: nested recursion at the same subject position
=abc
Failed: error -52: nested recursion at the same subject position
/(?=\w)(?R)/
=abc
Failed: error -52: nested recursion at the same subject position
abcd
Failed: error -52: nested recursion at the same subject position
/(?<!\w)(?R)/
abcd
Failed: error -52: nested recursion at the same subject position
/(?<=\w)(?R)/
abcd
Failed: error -52: nested recursion at the same subject position
/(a+|(?R)b)/
aaa
0: aaa
1: aaa
bbb
Failed: error -52: nested recursion at the same subject position
/[^\xff]((?1))/BI
------------------------------------------------------------------
Bra
[^\x{ff}] (not)
CBra 1
Recurse
Ket
Ket
End
------------------------------------------------------------------
Capture group count = 1
Subject length lower bound = 1
abcd
Failed: error -52: nested recursion at the same subject position
# These tests don't behave the same with JIT
/\w+(?C1)/BI,no_auto_possess
------------------------------------------------------------------
Bra
\w+
Callout 1 8 0
Ket
End
------------------------------------------------------------------
Capture group count = 0
Options: no_auto_possess
Optimizations: dotstar_anchor,start_optimize
Starting code units: 0 1 2 3 4 5 6 7 8 9 A B C D E F G H I J K L M N O P
Q R S T U V W X Y Z _ a b c d e f g h i j k l m n o p q r s t u v w x y z
Subject length lower bound = 1
abc\=callout_fail=1
--->abc
1 ^ ^ End of pattern
1 ^ ^ End of pattern
1 ^^ End of pattern
1 ^ ^ End of pattern
1 ^^ End of pattern
1 ^^ End of pattern
No match
/(*NO_AUTO_POSSESS)\w+(?C1)/BI
------------------------------------------------------------------
Bra
\w+
Callout 1 26 0
Ket
End
------------------------------------------------------------------
Capture group count = 0
Compile options: <none>
Overall options: no_auto_possess
Optimizations: dotstar_anchor,start_optimize
Starting code units: 0 1 2 3 4 5 6 7 8 9 A B C D E F G H I J K L M N O P
Q R S T U V W X Y Z _ a b c d e f g h i j k l m n o p q r s t u v w x y z
Subject length lower bound = 1
abc\=callout_fail=1
--->abc
1 ^ ^ End of pattern
1 ^ ^ End of pattern
1 ^^ End of pattern
1 ^ ^ End of pattern
1 ^^ End of pattern
1 ^^ End of pattern
No match
# This test breaks the JIT stack limit
/(|]+){2,2452}/
(|]+){2,2452}
0:
1:
/b(?<!ax)(?!cx)/allusedtext
abc
0: abc
< >
abcz
0: abcz
< >>
# This test triggers the recursion limit in the interpreter, but completes in
# JIT. It's in testinput2 with disable_recurse_loop_check to get it to work
# in the interpreter.
/(a(?1)z||(?1)++)$/
abcd
Failed: error -52: nested recursion at the same subject position
# End of testinput15
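The (a+)*zz tests rely on catastrophic backtracking. After each a+ run, the outer * can split the remaining a's in many different ways before discovering that zz is missing.

The sketch below is a hypothetical toy matcher for that one pattern, for illustration only. It is not PCRE2's interpreter, whose limit accounting differs. It is anchored at the start of the subject, counts its calls, and gives up once a limit is exceeded.

```c
#include <assert.h>
#include <stddef.h>

enum { TOY_NOMATCH = 0, TOY_MATCH = 1, TOY_LIMIT = -1 };

struct toy_ctx {
    const char *s;         /* NUL-terminated subject */
    unsigned long calls;   /* calls made so far */
    unsigned long limit;   /* give up once calls exceed this */
};

/* Try to match (a+)*zz starting at position i, greedily. */
static int toy_star(struct toy_ctx *c, size_t i)
{
    if (++c->calls > c->limit)
        return TOY_LIMIT;
    size_t end = i;
    while (c->s[end] == 'a')
        end++;
    /* Greedy a+: try the longest run first, then progressively shorter ones,
       each followed by another attempt at the outer * loop. */
    for (size_t j = end; j > i; j--) {
        int r = toy_star(c, j);
        if (r != TOY_NOMATCH)
            return r;   /* either a match or the limit was hit */
    }
    /* No further iteration worked: try to finish with "zz". */
    return (c->s[i] == 'z' && c->s[i + 1] == 'z') ? TOY_MATCH : TOY_NOMATCH;
}

static int toy_match(const char *subject, unsigned long limit)
{
    assert(subject != NULL);
    struct toy_ctx c = { subject, 0, limit };
    return toy_star(&c, 0);
}
```

Consider a subject of n a's followed by a single z. The toy makes exactly 2^n calls, one for each way of splitting every prefix into runs. The 13-a subject therefore exceeds a limit of 3000 but fails normally under 60000, the same outcome as the match_limit tests above.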

@@ -0,0 +1,18 @@
# This test is run only when JIT support is not available. It checks that an
# attempt to use it has the expected behaviour. It also tests things that
# are different without JIT.
/abc/I,jit,jitverify
JIT compilation was not successful (bad JIT option)
Capture group count = 0
First code unit = 'a'
Last code unit = 'c'
Subject length lower bound = 3
JIT support is not available in this version of PCRE2
/a*/I
Capture group count = 0
May match empty string
Subject length lower bound = 0
# End of testinput16

File diff suppressed because one or more lines are too long

@@ -0,0 +1,230 @@
# This set of tests is run only with the 8-bit library. It tests the POSIX
# interface, which is supported only with the 8-bit library. This test should
# not be run with JIT (which is not available for the POSIX interface).
#forbid_utf
#pattern posix
# Test some invalid options
/abc/auto_callout
** Ignored with POSIX interface: auto_callout
/abc/
abc\=find_limits
** Ignored with POSIX interface: find_limits
0: abc
/abc/
abc\=partial_hard
** Ignored with POSIX interface: partial_hard
0: abc
/a(())bc/parens_nest_limit=1
** Ignored with POSIX interface: parens_nest_limit
/abc/allow_surrogate_escapes,max_pattern_length=2
** Ignored with POSIX interface: allow_surrogate_escapes max_pattern_length
# Real tests
/abc/
abc
0: abc
/^abc|def/
abcdef
0: abc
abcdef\=notbol
0: def
/.*((abc)$|(def))/
defabc
0: defabc
1: abc
2: abc
defabc\=noteol
0: def
1: def
2: <unset>
3: def
/the quick brown fox/
the quick brown fox
0: the quick brown fox
\= Expect no match
The Quick Brown Fox
No match: POSIX code 17: match failed
/the quick brown fox/i
the quick brown fox
0: the quick brown fox
The Quick Brown Fox
0: The Quick Brown Fox
/(*LF)abc.def/
\= Expect no match
abc\ndef
No match: POSIX code 17: match failed
/(*LF)abc$/
abc
0: abc
abc\n
0: abc
/(abc)\2/
Failed: POSIX code 15: bad back reference at offset 6
/(abc\1)/
\= Expect no match
abc
No match: POSIX code 17: match failed
/a*(b+)(z)(z)/
aaaabbbbzzzz
0: aaaabbbbzz
1: bbbb
2: z
3: z
aaaabbbbzzzz\=ovector=0
Matched without capture
aaaabbbbzzzz\=ovector=1
0: aaaabbbbzz
aaaabbbbzzzz\=ovector=2
0: aaaabbbbzz
1: bbbb
/(*ANY)ab.cd/
ab-cd
0: ab-cd
ab=cd
0: ab=cd
\= Expect no match
ab\ncd
No match: POSIX code 17: match failed
/ab.cd/s
ab-cd
0: ab-cd
ab=cd
0: ab=cd
ab\ncd
0: ab\x0acd
/a(b)c/posix_nosub
abc
Matched with REG_NOSUB
/a(?P<name>b)c/posix_nosub
abc
Matched with REG_NOSUB
/(a)\1/posix_nosub
zaay
Matched with REG_NOSUB
/a?|b?/
abc
0: a
\= Expect no match
ddd\=notempty
No match: POSIX code 17: match failed
/\w+A/
CDAAAAB
0: CDAAAA
/\w+A/ungreedy
CDAAAAB
0: CDA
/\Biss\B/I,aftertext
** Ignored with POSIX interface: info
Mississippi
0: iss
0+ issippi
/abc/\
Failed: POSIX code 9: bad escape sequence at offset 4
"(?(?C)"
Failed: POSIX code 11: unbalanced () at offset 6
"(?(?C))"
Failed: POSIX code 3: pattern error at offset 6
/abcd/substitute_extended
** Ignored with POSIX interface: substitute_extended
/\[A]{1000000}**/expand,regerror_buffsize=31
Failed: POSIX code 4: ? * + invalid at offset 100000
** regerror() message truncated
/\[A]{1000000}**/expand,regerror_buffsize=32
Failed: POSIX code 4: ? * + invalid at offset 1000001
//posix_nosub
\=offset=70000
** Ignored with POSIX interface: offset
Matched with REG_NOSUB
/^d(e)$/posix
acdef\=posix_startend=2:4
0: de
1: e
acde\=posix_startend=2
0: de
1: e
\= Expect no match
acdef
No match: POSIX code 17: match failed
acdef\=posix_startend=2
No match: POSIX code 17: match failed
/^a\x{00}b$/posix
a\x{00}b\=posix_startend=0:3
0: a\x00b
/"A" 00 "B"/hex
A\x{00}B\=posix_startend=0:3
0: A\x00B
/ABC/use_length
ABC
0: ABC
/a\b(c/literal,posix
a\\b(c
0: a\b(c
/a\b(c/literal,posix,dotall
Failed: POSIX code 16: bad argument at offset 0
/((a)(b)?(c))/posix
123ace
0: ac
1: ac
2: a
3: <unset>
4: c
123ace\=posix_startend=2:6
0: ac
1: ac
2: a
3: <unset>
4: c
//posix
\= Expect errors
\=null_subject
No match: POSIX code 16: bad argument
abc\=null_subject
No match: POSIX code 16: bad argument
/(*LIMIT_HEAP=0)xx/posix
\= Expect error
xxxx
No match: POSIX code 14: failed to get memory
# End of testdata/testinput18
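The POSIX tests exercise regcomp()/regexec() and execution flags such as REG_NOTBOL. The sketch below shows that calling pattern using the platform's <regex.h>. This is an assumption: PCRE2's own wrapper, pcre2posix.h, exposes the same interface, but this example does not link PCRE2, and system regex engines differ from PCRE2 in syntax details. The helper name first_match_offset is hypothetical.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/* Returns the start offset of the first match of an extended regex in
   subject, -1 if there is no match, or -2 if the pattern fails to compile.
   eflags may include REG_NOTBOL and/or REG_NOTEOL. */
static int first_match_offset(const char *pattern, const char *subject, int eflags)
{
    assert(pattern != NULL && subject != NULL);
    regex_t re;
    regmatch_t m[1];   /* only the overall match is needed */
    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return -2;
    int rc = regexec(&re, subject, 1, m, eflags);
    regfree(&re);      /* release resources allocated by regcomp() */
    return rc == 0 ? (int)m[0].rm_so : -1;
}
```

With REG_NOTBOL the start of the subject is not treated as the beginning of a line. The ^abc alternative therefore cannot match at offset 0, and ^abc|def finds def at offset 3, as in the notbol test above.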

@@ -0,0 +1,30 @@
# This set of tests is run only with the 8-bit library. It tests the POSIX
# interface with UTF/UCP support, which is supported only with the 8-bit
# library. This test should not be run with JIT (which is not available for the
# POSIX interface).
#pattern posix
/a\x{1234}b/utf
a\x{1234}b
0: a\x{1234}b
/\w/
\= Expect no match
+++\x{c2}
No match: POSIX code 17: match failed
/\w/ucp
+++\x{c2}
0: \xc2
/"^AB" 00 "\x{1234}$"/hex,utf
AB\x{00}\x{1234}\=posix_startend=0:6
0: AB\x{00}\x{1234}
/\w/utf
\= Expect UTF error
A\xabB
No match: POSIX code 16: bad argument
# End of testdata/testinput19

File diff suppressed because it is too large

@@ -0,0 +1,161 @@
# This set of tests exercises the serialization/deserialization and code copy
# functions in the library. It does not use UTF or JIT.
#forbid_utf
# Compile several patterns, push them onto the stack, and then write them
# all to a file.
#pattern push
/(?<NAME>(?&NAME_PAT))\s+(?<ADDR>(?&ADDRESS_PAT))
(?(DEFINE)
(?<NAME_PAT>[a-z]+)
(?<ADDRESS_PAT>\d+)
)/x
/^(?:((.)(?1)\2|)|((.)(?3)\4|.))$/i
#save testsaved1
# Do it again for some more patterns.
/(*MARK:A)(*SKIP:B)(C|X)/mark
** Ignored when compiled pattern is stacked with 'push': mark
/(?:(?<n>foo)|(?<n>bar))\k<n>/dupnames
#save testsaved2
#pattern -push
# Reload the patterns, then pop them one by one and check them.
#load testsaved1
#load testsaved2
#pop info
Capture group count = 2
Max back reference = 2
Named capture groups:
n 1
n 2
Options: dupnames
Starting code units: b f
Subject length lower bound = 6
foofoo
0: foofoo
1: foo
barbar
0: barbar
1: <unset>
2: bar
#pop mark
C
0: C
1: C
MK: A
\= Expect no match
D
No match, mark = A
#pop
AmanaplanacanalPanama
0: AmanaplanacanalPanama
1: <unset>
2: <unset>
3: AmanaplanacanalPanama
4: A
#pop info
Capture group count = 4
Named capture groups:
ADDR 2
ADDRESS_PAT 4
NAME 1
NAME_PAT 3
Options: extended
Subject length lower bound = 3
metcalfe 33
0: metcalfe 33
1: metcalfe
2: 33
# Check for an error when different tables are used.
/abc/push,tables=1
/xyz/push,tables=2
#save testsaved1
Serialization failed: error -30: patterns do not all use the same character tables
#pop
xyz
0: xyz
#pop
abc
0: abc
#pop should give an error
** Can't pop off an empty stack
pqr
/abcd/pushcopy
abcd
0: abcd
#pop
abcd
0: abcd
#pop should give an error
** Can't pop off an empty stack
/abcd/push
#popcopy
abcd
0: abcd
#pop
abcd
0: abcd
/abcd/push
#save testsaved1
#pop should give an error
** Can't pop off an empty stack
#load testsaved1
#popcopy
abcd
0: abcd
#pop
abcd
0: abcd
#pop should give an error
** Can't pop off an empty stack
/abcd/pushtablescopy
abcd
0: abcd
#popcopy
abcd
0: abcd
#pop
abcd
0: abcd
# Must only specify one of these
//push,pushcopy
** Not allowed together: push pushcopy
//push,pushtablescopy
** Not allowed together: push pushtablescopy
//pushcopy,pushtablescopy
** Not allowed together: pushcopy pushtablescopy
# End of testinput20

@@ -0,0 +1,97 @@
# These are tests of \C that do not involve UTF. They are not run when \C is
# disabled by compiling with --enable-never-backslash-C.
/\C+\D \C+\d \C+\S \C+\s \C+\W \C+\w \C+. \C+\R \C+\H \C+\h \C+\V \C+\v \C+\Z \C+\z \C+$/Bx
------------------------------------------------------------------
Bra
AllAny+
\D
AllAny+
\d
AllAny+
\S
AllAny+
\s
AllAny+
\W
AllAny+
\w
AllAny+
Any
AllAny+
\R
AllAny+
\H
AllAny+
\h
AllAny+
\V
AllAny+
\v
AllAny+
\Z
AllAny++
\z
AllAny+
$
Ket
End
------------------------------------------------------------------
/\D+\C \d+\C \S+\C \s+\C \W+\C \w+\C .+\C \R+\C \H+\C \h+\C \V+\C \v+\C a+\C \n+\C \C+\C/Bx
------------------------------------------------------------------
Bra
\D+
AllAny
\d+
AllAny
\S+
AllAny
\s+
AllAny
\W+
AllAny
\w+
AllAny
Any+
AllAny
\R+
AllAny
\H+
AllAny
\h+
AllAny
\V+
AllAny
\v+
AllAny
a+
AllAny
\x0a+
AllAny
AllAny+
AllAny
Ket
End
------------------------------------------------------------------
/ab\Cde/never_backslash_c
Failed: error 183 at offset 4: using \C is disabled by the application
/ab\Cde/info
Capture group count = 0
Contains \C
First code unit = 'a'
Last code unit = 'e'
Subject length lower bound = 5
abXde
0: abXde
/(?<=ab\Cde)X/
abZdeX
0: X
/[\C]/
Failed: error 107 at offset 2: escape sequence is invalid in character class
# End of testinput21

@@ -0,0 +1,182 @@
# Tests of \C when Unicode support is available. Note that \C is not supported
# for DFA matching in UTF mode, so this test is not run with -dfa. The output
# of this test is different in 8-, 16-, and 32-bit modes. Some tests may match
# in some widths and not in others.
/ab\Cde/utf,info
Capture group count = 0
Contains \C
Options: utf
First code unit = 'a'
Last code unit = 'e'
Subject length lower bound = 2
abXde
0: abXde
# This should produce an error diagnostic (\C in UTF lookbehind) in 8-bit and
# 16-bit modes, but not in 32-bit mode.
/(?<=ab\Cde)X/utf
Failed: error 136 at offset 0: \C is not allowed in a lookbehind assertion in UTF-16 mode
ab!deXYZ
# Autopossessification tests
/\C+\X \X+\C/Bx
------------------------------------------------------------------
Bra
AllAny+
extuni
extuni+
AllAny
Ket
End
------------------------------------------------------------------
/\C+\X \X+\C/Bx,utf
------------------------------------------------------------------
Bra
Anybyte+
extuni
extuni+
Anybyte
Ket
End
------------------------------------------------------------------
/\C\X*TӅ;
{0,6}\v+
F
/utf
\= Expect no match
Ӆ\x0a
No match
/\C(\W?ſ)'?{{/utf
\= Expect no match
\\C(\\W?ſ)'?{{
No match
/X(\C{3})/utf
X\x{1234}
No match
X\x{11234}Y
0: X\x{11234}Y
1: \x{11234}Y
X\x{11234}YZ
0: X\x{11234}Y
1: \x{11234}Y
/X(\C{4})/utf
X\x{1234}YZ
No match
X\x{11234}YZ
0: X\x{11234}YZ
1: \x{11234}YZ
X\x{11234}YZW
0: X\x{11234}YZ
1: \x{11234}YZ
/X\C*/utf
XYZabcdce
0: XYZabcdce
/X\C*?/utf
XYZabcde
0: X
/X\C{3,5}/utf
Xabcdefg
0: Xabcde
X\x{1234}
No match
X\x{1234}YZ
0: X\x{1234}YZ
X\x{1234}\x{512}
No match
X\x{1234}\x{512}YZ
0: X\x{1234}\x{512}YZ
X\x{11234}Y
0: X\x{11234}Y
X\x{11234}YZ
0: X\x{11234}YZ
X\x{11234}\x{512}
0: X\x{11234}\x{512}
X\x{11234}\x{512}YZ
0: X\x{11234}\x{512}YZ
X\x{11234}\x{512}\x{11234}Z
0: X\x{11234}\x{512}\x{11234}
/X\C{3,5}?/utf
Xabcdefg
0: Xabc
X\x{1234}
No match
X\x{1234}YZ
0: X\x{1234}YZ
X\x{1234}\x{512}
No match
X\x{11234}Y
0: X\x{11234}Y
X\x{11234}YZ
0: X\x{11234}Y
X\x{11234}\x{512}YZ
0: X\x{11234}\x{512}
X\x{11234}
No match
/a\Cb/utf
aXb
0: aXb
a\nb
0: a\x{0a}b
a\x{100}b
0: a\x{100}b
/a\C\Cb/utf
a\x{100}b
No match
a\x{12257}b
0: a\x{12257}b
a\x{12257}\x{11234}b
No match
/ab\Cde/utf
abXde
0: abXde
# This one is here not because it's different to Perl, but because the way
# the captured single code unit is displayed. (In Perl it becomes a character,
# and you can't tell the difference.)
/X(\C)(.*)/utf
X\x{1234}
0: X\x{1234}
1: \x{1234}
2:
X\nabc
0: X\x{0a}abc
1: \x{0a}
2: abc
# This one is here because Perl gives out a grumbly error message (quite
# correctly, but that messes up comparisons).
/a\Cb/utf
\= Expect no match in 8-bit mode
a\x{100}b
0: a\x{100}b
/^ab\C/utf,no_start_optimize
\= Expect no match - tests \C at end of subject
ab
No match
/\C[^\v]+\x80/utf
[AΏBŀC]
No match
/\C[^\d]+\x80/utf
[AΏBŀC]
No match

@@ -0,0 +1,180 @@
# Tests of \C when Unicode support is available. Note that \C is not supported
# for DFA matching in UTF mode, so this test is not run with -dfa. The output
# of this test is different in 8-, 16-, and 32-bit modes. Some tests may match
# in some widths and not in others.
/ab\Cde/utf,info
Capture group count = 0
Contains \C
Options: utf
First code unit = 'a'
Last code unit = 'e'
Subject length lower bound = 5
abXde
0: abXde
# This should produce an error diagnostic (\C in UTF lookbehind) in 8-bit and
# 16-bit modes, but not in 32-bit mode.
/(?<=ab\Cde)X/utf
ab!deXYZ
0: X
# Autopossessification tests
/\C+\X \X+\C/Bx
------------------------------------------------------------------
Bra
AllAny+
extuni
extuni+
AllAny
Ket
End
------------------------------------------------------------------
/\C+\X \X+\C/Bx,utf
------------------------------------------------------------------
Bra
AllAny+
extuni
extuni+
AllAny
Ket
End
------------------------------------------------------------------
/\C\X*TӅ;
{0,6}\v+
F
/utf
\= Expect no match
Ӆ\x0a
No match
/\C(\W?ſ)'?{{/utf
\= Expect no match
\\C(\\W?ſ)'?{{
No match
/X(\C{3})/utf
X\x{1234}
No match
X\x{11234}Y
No match
X\x{11234}YZ
0: X\x{11234}YZ
1: \x{11234}YZ
/X(\C{4})/utf
X\x{1234}YZ
No match
X\x{11234}YZ
No match
X\x{11234}YZW
0: X\x{11234}YZW
1: \x{11234}YZW
/X\C*/utf
XYZabcdce
0: XYZabcdce
/X\C*?/utf
XYZabcde
0: X
/X\C{3,5}/utf
Xabcdefg
0: Xabcde
X\x{1234}
No match
X\x{1234}YZ
0: X\x{1234}YZ
X\x{1234}\x{512}
No match
X\x{1234}\x{512}YZ
0: X\x{1234}\x{512}YZ
X\x{11234}Y
No match
X\x{11234}YZ
0: X\x{11234}YZ
X\x{11234}\x{512}
No match
X\x{11234}\x{512}YZ
0: X\x{11234}\x{512}YZ
X\x{11234}\x{512}\x{11234}Z
0: X\x{11234}\x{512}\x{11234}Z
/X\C{3,5}?/utf
Xabcdefg
0: Xabc
X\x{1234}
No match
X\x{1234}YZ
0: X\x{1234}YZ
X\x{1234}\x{512}
No match
X\x{11234}Y
No match
X\x{11234}YZ
0: X\x{11234}YZ
X\x{11234}\x{512}YZ
0: X\x{11234}\x{512}Y
X\x{11234}
No match
/a\Cb/utf
aXb
0: aXb
a\nb
0: a\x{0a}b
a\x{100}b
0: a\x{100}b
/a\C\Cb/utf
a\x{100}b
No match
a\x{12257}b
No match
a\x{12257}\x{11234}b
0: a\x{12257}\x{11234}b
/ab\Cde/utf
abXde
0: abXde
# This one is here not because it's different to Perl, but because the way
# the captured single code unit is displayed. (In Perl it becomes a character,
# and you can't tell the difference.)
/X(\C)(.*)/utf
X\x{1234}
0: X\x{1234}
1: \x{1234}
2:
X\nabc
0: X\x{0a}abc
1: \x{0a}
2: abc
# This one is here because Perl gives out a grumbly error message (quite
# correctly, but that messes up comparisons).
/a\Cb/utf
\= Expect no match in 8-bit mode
a\x{100}b
0: a\x{100}b
/^ab\C/utf,no_start_optimize
\= Expect no match - tests \C at end of subject
ab
No match
/\C[^\v]+\x80/utf
[AΏBŀC]
No match
/\C[^\d]+\x80/utf
[AΏBŀC]
No match

@@ -0,0 +1,184 @@
# Tests of \C when Unicode support is available. Note that \C is not supported
# for DFA matching in UTF mode, so this test is not run with -dfa. The output
# of this test is different in 8-, 16-, and 32-bit modes. Some tests may match
# in some widths and not in others.
/ab\Cde/utf,info
Capture group count = 0
Contains \C
Options: utf
First code unit = 'a'
Last code unit = 'e'
Subject length lower bound = 2
abXde
0: abXde
# This should produce an error diagnostic (\C in UTF lookbehind) in 8-bit and
# 16-bit modes, but not in 32-bit mode.
/(?<=ab\Cde)X/utf
Failed: error 136 at offset 0: \C is not allowed in a lookbehind assertion in UTF-8 mode
ab!deXYZ
# Autopossessification tests
/\C+\X \X+\C/Bx
------------------------------------------------------------------
Bra
AllAny+
extuni
extuni+
AllAny
Ket
End
------------------------------------------------------------------
/\C+\X \X+\C/Bx,utf
------------------------------------------------------------------
Bra
Anybyte+
extuni
extuni+
Anybyte
Ket
End
------------------------------------------------------------------
/\C\X*TӅ;
{0,6}\v+
F
/utf
\= Expect no match
Ӆ\x0a
No match
/\C(\W?ſ)'?{{/utf
\= Expect no match
\\C(\\W?ſ)'?{{
No match
/X(\C{3})/utf
X\x{1234}
0: X\x{1234}
1: \x{1234}
X\x{11234}Y
0: X\x{f0}\x{91}\x{88}
1: \x{f0}\x{91}\x{88}
X\x{11234}YZ
0: X\x{f0}\x{91}\x{88}
1: \x{f0}\x{91}\x{88}
/X(\C{4})/utf
X\x{1234}YZ
0: X\x{1234}Y
1: \x{1234}Y
X\x{11234}YZ
0: X\x{11234}
1: \x{11234}
X\x{11234}YZW
0: X\x{11234}
1: \x{11234}
/X\C*/utf
XYZabcdce
0: XYZabcdce
/X\C*?/utf
XYZabcde
0: X
/X\C{3,5}/utf
Xabcdefg
0: Xabcde
X\x{1234}
0: X\x{1234}
X\x{1234}YZ
0: X\x{1234}YZ
X\x{1234}\x{512}
0: X\x{1234}\x{512}
X\x{1234}\x{512}YZ
0: X\x{1234}\x{512}
X\x{11234}Y
0: X\x{11234}Y
X\x{11234}YZ
0: X\x{11234}Y
X\x{11234}\x{512}
0: X\x{11234}\x{d4}
X\x{11234}\x{512}YZ
0: X\x{11234}\x{d4}
X\x{11234}\x{512}\x{11234}Z
0: X\x{11234}\x{d4}
/X\C{3,5}?/utf
Xabcdefg
0: Xabc
X\x{1234}
0: X\x{1234}
X\x{1234}YZ
0: X\x{1234}
X\x{1234}\x{512}
0: X\x{1234}
X\x{11234}Y
0: X\x{f0}\x{91}\x{88}
X\x{11234}YZ
0: X\x{f0}\x{91}\x{88}
X\x{11234}\x{512}YZ
0: X\x{f0}\x{91}\x{88}
X\x{11234}
0: X\x{f0}\x{91}\x{88}
/a\Cb/utf
aXb
0: aXb
a\nb
0: a\x{0a}b
a\x{100}b
No match
/a\C\Cb/utf
a\x{100}b
0: a\x{100}b
a\x{12257}b
No match
a\x{12257}\x{11234}b
No match
/ab\Cde/utf
abXde
0: abXde
# This one is here not because it's different to Perl, but because the way
# the captured single code unit is displayed. (In Perl it becomes a character,
# and you can't tell the difference.)
/X(\C)(.*)/utf
X\x{1234}
0: X\x{1234}
1: \x{e1}
2: \x{88}\x{b4}
X\nabc
0: X\x{0a}abc
1: \x{0a}
2: abc
# This one is here because Perl gives out a grumbly error message (quite
# correctly, but that messes up comparisons).
/a\Cb/utf
\= Expect no match in 8-bit mode
a\x{100}b
No match
/^ab\C/utf,no_start_optimize
\= Expect no match - tests \C at end of subject
ab
No match
/\C[^\v]+\x80/utf
[AΏBŀC]
No match
/\C[^\d]+\x80/utf
[AΏBŀC]
No match
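The three output variants of the \C tests differ because \C matches one code unit, and the number of code units per character depends on the library width. This hypothetical C sketch assumes standard UTF-8 (RFC 3629) and UTF-16 surrogate pairs. It computes the unit counts and the UTF-8 bytes. For example, U+11234 encodes as F0 91 88 B4, so in 8-bit mode X(\C{3}) captures \x{f0}\x{91}\x{88}.

```c
#include <assert.h>

/* Encode a valid Unicode scalar value as UTF-8; returns the byte count. */
static int utf8_encode(unsigned long cp, unsigned char out[4])
{
    assert(cp <= 0x10FFFF && !(cp >= 0xD800 && cp <= 0xDFFF));
    if (cp < 0x80) {
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    out[0] = (unsigned char)(0xF0 | (cp >> 18));
    out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
    out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
    out[3] = (unsigned char)(0x80 | (cp & 0x3F));
    return 4;
}

/* Number of code units (the things \C matches) for cp at a given width. */
static int code_units(unsigned long cp, int width)
{
    unsigned char buf[4];
    switch (width) {
    case 8:
        return utf8_encode(cp, buf);
    case 16:
        return cp < 0x10000 ? 1 : 2;   /* surrogate pair above the BMP */
    default:
        return 1;                       /* UTF-32: one unit per code point */
    }
}
```

In 16-bit mode U+11234 takes two units, so X(\C{3}) also consumes the following Y. In 32-bit mode it takes one unit, so \C{3} spans \x{11234}YZ. Similarly, U+1234 is E1 88 B4 in UTF-8, which is why the 8-bit X(\C)(.*) output shows \x{e1} followed by \x{88}\x{b4}.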

@@ -0,0 +1,11 @@
# This test is run when PCRE2 has been built with --enable-never-backslash-C,
# which disables the use of \C. All we can do is check that it gives the
# correct error message.
/a\Cb/
Failed: error 185 at offset 3: using \C is disabled in this PCRE2 library
/a[\C]b/
Failed: error 107 at offset 3: escape sequence is invalid in character class
# End of testinput23

@@ -0,0 +1,624 @@
# This file tests the auxiliary pattern conversion features of the PCRE2
# library, in non-UTF mode.
#forbid_utf
#newline_default lf any anycrlf
# -------- Tests of glob conversion --------
# Set the glob separator explicitly so that different OS defaults are not a
# problem. Then test various errors.
#pattern convert=glob,convert_glob_escape=\,convert_glob_separator=/
/abc/posix
** The convert and posix modifiers are mutually exclusive
# Separator must be / \ or .
/a*b/convert_glob_separator=%
** Invalid glob separator '%'
# Can't have separator in a class
"[ab/cd]"
(?s)\A[ab/cd](?<!/)\z
"[,-/]"
(?s)\A[,-/](?<!/)\z
/[ab/
** Pattern conversion error at offset 3: missing terminating ] for character class
# Length check
/abc/convert_length=11
** Pattern conversion error at offset 3: no more memory
/abc/convert_length=12
(?s)\Aabc\z
# Now some actual tests
/a?b[]xy]*c/
(?s)\Aa[^/]b[\]xy](*COMMIT)[^/]*?c\z
azb]1234c
0: azb]1234c
# Tests from the gitwildmatch list, with some additions
/foo/
(?s)\Afoo\z
foo
0: foo
/= Expect no match
No match
bar
No match
//
(?s)\A\z
\
0:
/???/
(?s)\A[^/][^/][^/]\z
foo
0: foo
\= Expect no match
foobar
No match
/*/
(?s)\A[^/]*+\z
foo
0: foo
\
0:
/f*/
(?s)\Af(*COMMIT)[^/]*+\z
foo
0: foo
f
0: f
/*f/
(?s)\A[^/]*?f\z
oof
0: oof
\= Expect no match
foo
No match
/*foo*/
(?s)\A[^/]*?foo(*COMMIT)[^/]*+\z
foo
0: foo
food
0: food
aprilfool
0: aprilfool
/*ob*a*r*/
(?s)\A[^/]*?ob(*COMMIT)[^/]*?a(*COMMIT)[^/]*?r(*COMMIT)[^/]*+\z
foobar
0: foobar
/*ab/
(?s)\A[^/]*?ab\z
aaaaaaabababab
0: aaaaaaabababab
/foo\*/
(?s)\Afoo\*\z
foo*
0: foo*
/foo\*bar/
(?s)\Afoo\*bar\z
\= Expect no match
foobar
No match
/f\\oo/
(?s)\Af\\oo\z
f\\oo
0: f\oo
/*[al]?/
(?s)\A[^/]*?[al][^/]\z
ball
0: ball
/[ten]/
(?s)\A[ten]\z
\= Expect no match
ten
No match
/t[a-g]n/
(?s)\At[a-g]n\z
ten
0: ten
/a[]]b/
(?s)\Aa[\]]b\z
a]b
0: a]b
/a[]a-]b/
(?s)\Aa[\]a\-]b\z
/a[]-]b/
(?s)\Aa[\]\-]b\z
a-b
0: a-b
a]b
0: a]b
\= Expect no match
aab
No match
/a[]a-z]b/
(?s)\Aa[\]a-z]b\z
aab
0: aab
/]/
(?s)\A\]\z
]
0: ]
/t[!a-g]n/
(?s)\At[^/a-g]n\z
ton
0: ton
\= Expect no match
ten
No match
'[[:alpha:]][[:digit:]][[:upper:]]'
(?s)\A[[:alpha:]][[:digit:]][[:upper:]]\z
a1B
0: a1B
'[[:digit:][:upper:][:space:]]'
(?s)\A[[:digit:][:upper:][:space:]]\z
A
0: A
1
0: 1
\ \=
0:
\= Expect no match
a
No match
.
No match
'[a-c[:digit:]x-z]'
(?s)\A[a-c[:digit:]x-z]\z
5
0: 5
b
0: b
y
0: y
\= Expect no match
q
No match
# End of gitwildmatch tests
/*.j?g/
(?s)\A[^/]*?\.j[^/]g\z
pic01.jpg
0: pic01.jpg
.jpg
0: .jpg
pic02.jxg
0: pic02.jxg
\= Expect no match
pic03.j/g
No match
/A[+-0]B/
(?s)\AA[+-0](?<!/)B\z
A+B
0: A+B
A.B
0: A.B
A0B
0: A0B
\= Expect no match
A/B
No match
/*x?z/
(?s)\A[^/]*?x[^/]z\z
abc.xyz
0: abc.xyz
\= Expect no match
.xyz
0: .xyz
/?x?z/
(?s)\A[^/]x[^/]z\z
axyz
0: axyz
\= Expect no match
.xyz
0: .xyz
"[,-0]x?z"
(?s)\A[,-0](?<!/)x[^/]z\z
,xyz
0: ,xyz
\= Expect no match
/xyz
No match
.xyz
0: .xyz
".x*"
(?s)\A\.x(*COMMIT)[^/]*+\z
.xabc
0: .xabc
/a[--0]z/
(?s)\Aa[\--0](?<!/)z\z
a-z
0: a-z
a.z
0: a.z
a0z
0: a0z
\= Expect no match
a/z
No match
a1z
No match
/<[a-c-d]>/
(?s)\A<[a-c\-d]>\z
<a>
0: <a>
<b>
0: <b>
<c>
0: <c>
<d>
0: <d>
<->
0: <->
/a[[:digit:].]z/
(?s)\Aa[[:digit:].]z\z
a1z
0: a1z
a.z
0: a.z
\= Expect no match
a:z
No match
/a[[:digit].]z/
(?s)\Aa[\[:digit]\.\]z\z
a[.]z
0: a[.]z
a:.]z
0: a:.]z
ad.]z
0: ad.]z
/<[[:a[:digit:]b]>/
(?s)\A<[\[:a[:digit:]b]>\z
<[>
0: <[>
<:>
0: <:>
<a>
0: <a>
<9>
0: <9>
<b>
0: <b>
\= Expect no match
<d>
No match
/a*b/convert_glob_separator=\
(?s)\Aa(*COMMIT)[^\\]*?b\z
/a*b/convert_glob_separator=.
(?s)\Aa(*COMMIT)[^\.]*?b\z
/a*b/convert_glob_separator=/
(?s)\Aa(*COMMIT)[^/]*?b\z
# Non control character checking
/A\B\\C\D/
(?s)\AAB\\CD\z
/\\{}\?\*+\[\]()|.^$/
(?s)\A\\\{\}\?\*\+\[\]\(\)\|\.\^\$\z
/*a*\/*b*/
(?s)\A[^/]*?a(*COMMIT)[^/]*?/(*COMMIT)[^/]*?b(*COMMIT)[^/]*+\z
/?a?\/?b?/
(?s)\A[^/]a[^/]/[^/]b[^/]\z
/[a\\b\c][]][-][\]\-]/
(?s)\A[a\\bc][\]][\-][\]\-]\z
/[^a\\b\c][!]][!-][^\]\-]/
(?s)\A[^/a\\bc][^/\]][^/\-][^/\]\-]\z
/[[:alnum:][:alpha:][:blank:][:cntrl:][:digit:][:graph:][:lower:][:print:][:punct:][:space:][:upper:][:word:][:xdigit:]]/
(?s)\A[[:alnum:][:alpha:][:blank:][:cntrl:][:digit:][:graph:][:lower:][:print:][:punct:][:space:][:upper:][:word:][:xdigit:]](?<!/)\z
"[/-/]"
(?s)\A[/-/](?<!/)\z
/[-----]/
(?s)\A[\--\-\-\-]\z
/[------]/
(?s)\A[\--\-\--\-]\z
/[!------]/
(?s)\A[^/\--\-\--\-]\z
/[[:alpha:]-a]/
(?s)\A[[:alpha:]\-a]\z
/[[:alpha:]][[:punct:]][[:ascii:]]/
(?s)\A[[:alpha:]][[:punct:]](?<!/)[[:ascii:]](?<!/)\z
/[a-[:alpha:]]/
** Pattern conversion error at offset 4: invalid syntax
/[[:alpha:/
** Pattern conversion error at offset 9: missing terminating ] for character class
/[[:alpha:]/
** Pattern conversion error at offset 10: missing terminating ] for character class
/[[:alphaa:]]/
(?s)\A[\[:alphaa:]\]\z
/[[:xdigi:]]/
(?s)\A[\[:xdigi:]\]\z
/[[:xdigit::]]/
(?s)\A[\[:xdigit::]\]\z
/****/
(?s)
/**\/abc/
(?s)(?:\A|/)abc\z
abc
0: abc
x/abc
0: /abc
xabc
No match
/abc\/**/
(?s)\Aabc/
/abc\/**\/abc/
(?s)\Aabc/(*COMMIT)(?:.*?/)??abc\z
/**\/*a*b*g*n*t/
(?s)(?:\A|/)(?>[^/]*?a)(?>[^/]*?b)(?>[^/]*?g)(?>[^/]*?n)(?>[^/]*?t\z)
abcd/abcdefg/abcdefghijk/abcdefghijklmnop.txt
0: /abcdefghijklmnop.txt
/**\/*a*\/**/
(?s)(?:\A|/)(?>[^/]*?a)(?>[^/]*?/)
xx/xx/xx/xax/xx/xb
0: /xax/
/**\/*a*/
(?s)(?:\A|/)(?>[^/]*?a)(?>[^/]*+\z)
xx/xx/xx/xax
0: /xax
xx/xx/xx/xax/xx
No match
/**\/*a*\/**\/*b*/
(?s)(?:\A|/)(?>[^/]*?a)(?>[^/]*?/)(*COMMIT)(?:.*?/)??(?>[^/]*?b)(?>[^/]*+\z)
xx/xx/xx/xax/xx/xb
0: /xax/xx/xb
xx/xx/xx/xax/xx/x
No match
"**a"convert=glob
(?s)a\z
a
0: a
c/b/a
0: a
c/b/aaa
0: a
"a**/b"convert=glob
(?s)\Aa(*COMMIT).*?/b\z
a/b
0: a/b
ab
No match
"a/**b"convert=glob
(?s)\Aa/(*COMMIT).*?b\z
a/b
0: a/b
ab
No match
#pattern convert=glob:glob_no_starstar
/***/
(?s)\A[^/]*+\z
/**a**/
(?s)\A[^/]*?a(*COMMIT)[^/]*+\z
#pattern convert=unset
#pattern convert=glob:glob_no_wild_separator
/*/
(?s)
/*a*/
(?s)a
/**a**/
(?s)a
/a*b/
(?s)\Aa(*COMMIT).*?b\z
/*a*b*/
(?s)a(*COMMIT).*?b
/??a??/
(?s)\A..a..\z
#pattern convert=unset
#pattern convert=glob,convert_glob_escape=0
/a\b\cd/
(?s)\Aa\\b\\cd\z
/**\/a/
(?s)\\/a\z
/a`*b/convert_glob_escape=`
(?s)\Aa\*b\z
/a`*b/convert_glob_escape=0
(?s)\Aa`(*COMMIT)[^/]*?b\z
/a`*b/convert_glob_escape=x
** Invalid glob escape 'x'
# -------- Tests of extended POSIX conversion --------
#pattern convert=unset:posix_extended
/<[[:a[:digit:]b]>/
(*NUL)<[[:a[:digit:]b]>
<[>
0: <[>
<:>
0: <:>
<a>
0: <a>
<9>
0: <9>
<b>
0: <b>
\= Expect no match
<d>
No match
/a+\1b\\c|d[ab\c]/
(*NUL)a+1b\\c|d[ab\\c]
/<[]bc]>/
(*NUL)<[]bc]>
<]>
0: <]>
<b>
0: <b>
<c>
0: <c>
/<[^]bc]>/
(*NUL)<[^]bc]>
<.>
0: <.>
\= Expect no match
<]>
No match
<b>
No match
/(a)\1b/
(*NUL)(a)1b
a1b
0: a1b
1: a
\= Expect no match
aab
No match
/(ab)c)d]/
(*NUL)(ab)c\)d\]
Xabc)d]Y
0: abc)d]
1: ab
/a***b/
(*NUL)a*b
# -------- Tests of basic POSIX conversion --------
#pattern convert=unset:posix_basic
/a*b+c\+[def](ab)\(cd\)/
(*NUL)a*b\+c\+[def]\(ab\)(cd)
/\(a\)\1b/
(*NUL)(a)\1b
aab
0: aab
1: a
\= Expect no match
a1b
No match
/how.to how\.to/
(*NUL)how.to how\.to
how\nto how.to
0: how\x0ato how.to
\= Expect no match
how\x{0}to how.to
No match
/^how to \^how to/
(*NUL)^how to \^how to
/^*abc/
(*NUL)^\*abc
/*abc/
(*NUL)\*abc
X*abcY
0: *abc
/**abc/
(*NUL)\**abc
XabcY
0: abc
X*abcY
0: *abc
X**abcY
0: **abc
/*ab\(*cd\)/
(*NUL)\*ab(\*cd)
/^b\(c^d\)\(^e^f\)/
(*NUL)^b(c\^d)(^e\^f)
/a***b/
(*NUL)a*b
# End of testinput24

File diff suppressed because one or more lines are too long

File diff suppressed because it is too large

File diff suppressed because it is too large

@@ -0,0 +1,177 @@
# This set of tests checks local-specific features, using the "fr_FR" locale.
# It is almost Perl-compatible. When run via RunTest, the locale is edited to
# be whichever of "fr_FR", "french", or "fr" is found to exist. There is
# different version of this file called wintestinput3 for use on Windows,
# where the locale is called "french" and the tests are run using
# RunTest.bat.
#forbid_utf
/^[\w]+/
\= Expect no match
École
No match
/^[\w]+/locale=fr_FR
École
0: École
/^[\W]+/
École
0: \xc9
/^[\W]+/locale=fr_FR
\= Expect no match
École
No match
/[\b]/
\b
0: \x08
\= Expect no match
a
No match
/[\b]/locale=fr_FR
\b
0: \x08
\= Expect no match
a
No match
/^\w+/
\= Expect no match
École
No match
/^\w+/locale=fr_FR
École
0: École
/(.+)\b(.+)/
École
0: \xc9cole
1: \xc9
2: cole
/(.+)\b(.+)/locale=fr_FR
\= Expect no match
École
No match
/École/i
École
0: \xc9cole
\= Expect no match
école
No match
/École/i,locale=fr_FR
École
0: École
école
0: école
/\w/I
Capture group count = 0
Starting code units: 0 1 2 3 4 5 6 7 8 9 A B C D E F G H I J K L M N O P
Q R S T U V W X Y Z _ a b c d e f g h i j k l m n o p q r s t u v w x y z
Subject length lower bound = 1
/\w/I,locale=fr_FR
Capture group count = 0
Starting code units: 0 1 2 3 4 5 6 7 8 9 A B C D E F G H I J K L M N O P
Q R S T U V W X Y Z _ a b c d e f g h i j k l m n o p q r s t u v w x y z
ª µ º À Á Â Ã Ä Å Æ Ç È É Ê Ë Ì Í Î Ï Ð Ñ Ò Ó Ô Õ Ö Ø Ù Ú Û Ü Ý Þ ß à á â
ã ä å æ ç è é ê ë ì í î ï ð ñ ò ó ô õ ö ø ù ú û ü ý þ ÿ
Subject length lower bound = 1
# All remaining tests are in the fr_FR locale, so set the default.
#pattern locale=fr_FR
/^[\xc8-\xc9]/i
École
0: É
école
0: é
/^[\xc8-\xc9]/
École
0: É
\= Expect no match
école
No match
/\xb5/i
µ
0: µ
\= Expect no match
\x9c
No match
/ÿ/i
\xff
0: ÿ
\= Expect no match
y
No match
/(.)\1/i
\xfe\xde
0: þÞ
1: þ
/\W+/
>>>\xaa<<<
0: >>>
>>>\xba<<<
0: >>>
/[\W]+/
>>>\xaa<<<
0: >>>
>>>\xba<<<
0: >>>
/[^[:alpha:]]+/
>>>\xaa<<<
0: >>>
>>>\xba<<<
0: >>>
/\w+/
>>>\xaa<<<
0: ª
>>>\xba<<<
0: º
/[\w]+/
>>>\xaa<<<
0: ª
>>>\xba<<<
0: º
/[[:alpha:]]+/
>>>\xaa<<<
0: ª
>>>\xba<<<
0: º
/[[:alpha:]][[:lower:]][[:upper:]]/IB
------------------------------------------------------------------
Bra
[A-Za-z\xaa\xb5\xba\xc0-\xd6\xd8-\xf6\xf8-\xff]
[a-z\xb5\xdf-\xf6\xf8-\xff]
[A-Z\xc0-\xd6\xd8-\xde]
Ket
End
------------------------------------------------------------------
Capture group count = 0
Starting code units: A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
a b c d e f g h i j k l m n o p q r s t u v w x y z ª µ º À Á Â Ã Ä Å Æ Ç
È É Ê Ë Ì Í Î Ï Ð Ñ Ò Ó Ô Õ Ö Ø Ù Ú Û Ü Ý Þ ß à á â ã ä å æ ç è é ê ë ì í
î ï ð ñ ò ó ô õ ö ø ù ú û ü ý þ ÿ
Subject length lower bound = 3
# End of testinput3

@@ -0,0 +1,177 @@
# This set of tests checks local-specific features, using the "fr_FR" locale.
# It is almost Perl-compatible. When run via RunTest, the locale is edited to
# be whichever of "fr_FR", "french", or "fr" is found to exist. There is
# different version of this file called wintestinput3 for use on Windows,
# where the locale is called "french" and the tests are run using
# RunTest.bat.
#forbid_utf
/^[\w]+/
\= Expect no match
École
No match
/^[\w]+/locale=fr_FR
École
0: École
/^[\W]+/
École
0: \xc9
/^[\W]+/locale=fr_FR
\= Expect no match
École
No match
/[\b]/
\b
0: \x08
\= Expect no match
a
No match
/[\b]/locale=fr_FR
\b
0: \x08
\= Expect no match
a
No match
/^\w+/
\= Expect no match
École
No match
/^\w+/locale=fr_FR
École
0: École
/(.+)\b(.+)/
École
0: \xc9cole
1: \xc9
2: cole
/(.+)\b(.+)/locale=fr_FR
\= Expect no match
École
No match
/École/i
École
0: \xc9cole
\= Expect no match
école
No match
/École/i,locale=fr_FR
École
0: École
école
0: école
/\w/I
Capture group count = 0
Starting code units: 0 1 2 3 4 5 6 7 8 9 A B C D E F G H I J K L M N O P
Q R S T U V W X Y Z _ a b c d e f g h i j k l m n o p q r s t u v w x y z
Subject length lower bound = 1
/\w/I,locale=fr_FR
Capture group count = 0
Starting code units: 0 1 2 3 4 5 6 7 8 9 A B C D E F G H I J K L M N O P
Q R S T U V W X Y Z _ a b c d e f g h i j k l m n o p q r s t u v w x y z
ª µ º À Á Â Ã Ä Å Æ Ç È É Ê Ë Ì Í Î Ï Ð Ñ Ò Ó Ô Õ Ö Ø Ù Ú Û Ü Ý Þ ß à á â
ã ä å æ ç è é ê ë ì í î ï ð ñ ò ó ô õ ö ø ù ú û ü ý þ ÿ
Subject length lower bound = 1
# All remaining tests are in the fr_FR locale, so set the default.
#pattern locale=fr_FR
/^[\xc8-\xc9]/i
École
0: É
école
0: é
/^[\xc8-\xc9]/
École
0: É
\= Expect no match
école
No match
/\xb5/i
µ
0: µ
\= Expect no match
\x9c
No match
/ÿ/i
\xff
0: ÿ
\= Expect no match
y
No match
/(.)\1/i
\xfe\xde
0: þÞ
1: þ
/\W+/
>>>\xaa<<<
0: >>>
>>>\xba<<<
0: >>>
/[\W]+/
>>>\xaa<<<
0: >>>
>>>\xba<<<
0: >>>
/[^[:alpha:]]+/
>>>\xaa<<<
0: >>>
>>>\xba<<<
0: >>>
/\w+/
>>>\xaa<<<
0: ª
>>>\xba<<<
0: º
/[\w]+/
>>>\xaa<<<
0: ª
>>>\xba<<<
0: º
/[[:alpha:]]+/
>>>\xaa<<<
0: ª
>>>\xba<<<
0: º
/[[:alpha:]][[:lower:]][[:upper:]]/IB
------------------------------------------------------------------
Bra
[A-Za-z\xaa\xb5\xba\xc0-\xd6\xd8-\xf6\xf8-\xff]
[a-z\xaa\xb5\xba\xdf-\xf6\xf8-\xff]
[A-Z\xc0-\xd6\xd8-\xde]
Ket
End
------------------------------------------------------------------
Capture group count = 0
Starting code units: A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
a b c d e f g h i j k l m n o p q r s t u v w x y z ª µ º À Á Â Ã Ä Å Æ Ç
È É Ê Ë Ì Í Î Ï Ð Ñ Ò Ó Ô Õ Ö Ø Ù Ú Û Ü Ý Þ ß à á â ã ä å æ ç è é ê ë ì í
î ï ð ñ ò ó ô õ ö ø ù ú û ü ý þ ÿ
Subject length lower bound = 3
# End of testinput3

@@ -0,0 +1,177 @@
# This set of tests checks local-specific features, using the "fr_FR" locale.
# It is almost Perl-compatible. When run via RunTest, the locale is edited to
# be whichever of "fr_FR", "french", or "fr" is found to exist. There is
# different version of this file called wintestinput3 for use on Windows,
# where the locale is called "french" and the tests are run using
# RunTest.bat.
#forbid_utf
/^[\w]+/
\= Expect no match
École
No match
/^[\w]+/locale=fr_FR
École
0: École
/^[\W]+/
École
0: \xc9
/^[\W]+/locale=fr_FR
\= Expect no match
École
No match
/[\b]/
\b
0: \x08
\= Expect no match
a
No match
/[\b]/locale=fr_FR
\b
0: \x08
\= Expect no match
a
No match
/^\w+/
\= Expect no match
École
No match
/^\w+/locale=fr_FR
École
0: École
/(.+)\b(.+)/
École
0: \xc9cole
1: \xc9
2: cole
/(.+)\b(.+)/locale=fr_FR
\= Expect no match
École
No match
/École/i
École
0: \xc9cole
\= Expect no match
école
No match
/École/i,locale=fr_FR
École
0: École
école
0: école
/\w/I
Capture group count = 0
Starting code units: 0 1 2 3 4 5 6 7 8 9 A B C D E F G H I J K L M N O P
Q R S T U V W X Y Z _ a b c d e f g h i j k l m n o p q r s t u v w x y z
Subject length lower bound = 1
/\w/I,locale=fr_FR
Capture group count = 0
Starting code units: 0 1 2 3 4 5 6 7 8 9 A B C D E F G H I J K L M N O P
Q R S T U V W X Y Z _ a b c d e f g h i j k l m n o p q r s t u v w x y z
ª µ º À Á Â Ã Ä Å Æ Ç È É Ê Ë Ì Í Î Ï Ð Ñ Ò Ó Ô Õ Ö Ø Ù Ú Û Ü Ý Þ ß à á â
ã ä å æ ç è é ê ë ì í î ï ð ñ ò ó ô õ ö ø ù ú û ü ý þ ÿ
Subject length lower bound = 1
# All remaining tests are in the fr_FR locale, so set the default.
#pattern locale=fr_FR
/^[\xc8-\xc9]/i
École
0: É
école
0: é
/^[\xc8-\xc9]/
École
0: É
\= Expect no match
école
No match
/\xb5/i
µ
0: µ
\= Expect no match
\x9c
No match
/ÿ/i
\xff
0: ÿ
\= Expect no match
y
No match
/(.)\1/i
\xfe\xde
0: þÞ
1: þ
/\W+/
>>>\xaa<<<
0: >>>
>>>\xba<<<
0: >>>
/[\W]+/
>>>\xaa<<<
0: >>>
>>>\xba<<<
0: >>>
/[^[:alpha:]]+/
>>>\xaa<<<
0: >>>
>>>\xba<<<
0: >>>
/\w+/
>>>\xaa<<<
0: ª
>>>\xba<<<
0: º
/[\w]+/
>>>\xaa<<<
0: ª
>>>\xba<<<
0: º
/[[:alpha:]]+/
>>>\xaa<<<
0: ª
>>>\xba<<<
0: º
/[[:alpha:]][[:lower:]][[:upper:]]/IB
------------------------------------------------------------------
Bra
[A-Za-z\x83\x8a\x8c\x8e\x9a\x9c\x9e\x9f\xaa\xb5\xba\xc0-\xd6\xd8-\xf6\xf8-\xff]
[a-z\x83\x9a\x9c\x9e\xaa\xb5\xba\xdf-\xf6\xf8-\xff]
[A-Z\x8a\x8c\x8e\x9f\xc0-\xd6\xd8-\xde]
Ket
End
------------------------------------------------------------------
Capture group count = 0
Starting code units: A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
a b c d e f g h i j k l m n o p q r s t u v w x y z ª µ º À Á Â Ã Ä Å Æ Ç
È É Ê Ë Ì Í Î Ï Ð Ñ Ò Ó Ô Õ Ö Ø Ù Ú Û Ü Ý Þ ß à á â ã ä å æ ç è é ê ë ì í
î ï ð ñ ò ó ô õ ö ø ù ú û ü ý þ ÿ
Subject length lower bound = 3
# End of testinput3

File diff suppressed because it is too large

File diff suppressed because it is too large

File diff suppressed because it is too large

File diff suppressed because it is too large

File diff suppressed because it is too large

File diff suppressed because it is too large

File diff suppressed because it is too large

File diff suppressed because it is too large

File diff suppressed because it is too large

File diff suppressed because it is too large

File diff suppressed because it is too large

File diff suppressed because it is too large

File diff suppressed because it is too large

@@ -0,0 +1,408 @@
# This set of tests is run only with the 8-bit library. They must not require
# UTF-8 or Unicode property support. */
#forbid_utf
#newline_default lf any anycrlf
/a\xc4\xa3b/
a\N{U+123}b
0: a\xc4\xa3b
\= Expect no match # error message (too big char)
a\x{0123}b
** Character \x{123} is greater than 255 and UTF-8 mode is not enabled.
** Truncation will probably give the wrong result.
No match
a\o{00443}b
** Character \x{123} is greater than 255 and UTF-8 mode is not enabled.
** Truncation will probably give the wrong result.
No match
a\443b
** Character \x{123} is greater than 255 and UTF-8 mode is not enabled.
** Truncation will probably give the wrong result.
No match
/fd bf bf bf bf bf/I,hex
Capture group count = 0
First code unit = \xfd
Last code unit = \xbf
Subject length lower bound = 6
\= Expect warning
\N{U+7fffffff}
** Warning: character \N{U+7fffffff} is greater than 0x10ffff and should not be encoded as UTF-8
0: \xfd\xbf\xbf\xbf\xbf\xbf
\= Expect no match # error message (too big char)
\x{7fffffff}
** Character \x{7fffffff} is greater than 255 and UTF-8 mode is not enabled.
** Truncation will probably give the wrong result.
No match
/\x{100}/I
Failed: error 134 at offset 6: character code point value in \x{} or \o{} is too large
/\o{400}/I
Failed: error 134 at offset 6: character code point value in \x{} or \o{} is too large
/ (?: [\040\t] | \(
(?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] | \( (?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] )* \) )*
\) )* # optional leading comment
(?: (?:
[^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]+ # some number of atom characters...
(?![^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]) # ..not followed by something that could be part of an atom
|
" (?: # opening quote...
[^\\\x80-\xff\n\015"] # Anything except backslash and quote
| # or
\\ [^\x80-\xff] # Escaped something (something != CR)
)* " # closing quote
) # initial word
(?: (?: [\040\t] | \(
(?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] | \( (?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] )* \) )*
\) )* \. (?: [\040\t] | \(
(?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] | \( (?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] )* \) )*
\) )* (?:
[^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]+ # some number of atom characters...
(?![^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]) # ..not followed by something that could be part of an atom
|
" (?: # opening quote...
[^\\\x80-\xff\n\015"] # Anything except backslash and quote
| # or
\\ [^\x80-\xff] # Escaped something (something != CR)
)* " # closing quote
) )* # further okay, if led by a period
(?: [\040\t] | \(
(?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] | \( (?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] )* \) )*
\) )* @ (?: [\040\t] | \(
(?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] | \( (?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] )* \) )*
\) )* (?:
[^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]+ # some number of atom characters...
(?![^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]) # ..not followed by something that could be part of an atom
| \[ # [
(?: [^\\\x80-\xff\n\015\[\]] | \\ [^\x80-\xff] )* # stuff
\] # ]
) # initial subdomain
(?: #
(?: [\040\t] | \(
(?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] | \( (?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] )* \) )*
\) )* \. # if led by a period...
(?: [\040\t] | \(
(?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] | \( (?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] )* \) )*
\) )* (?:
[^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]+ # some number of atom characters...
(?![^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]) # ..not followed by something that could be part of an atom
| \[ # [
(?: [^\\\x80-\xff\n\015\[\]] | \\ [^\x80-\xff] )* # stuff
\] # ]
) # ...further okay
)*
# address
| # or
(?:
[^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]+ # some number of atom characters...
(?![^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]) # ..not followed by something that could be part of an atom
|
" (?: # opening quote...
[^\\\x80-\xff\n\015"] # Anything except backslash and quote
| # or
\\ [^\x80-\xff] # Escaped something (something != CR)
)* " # closing quote
) # one word, optionally followed by....
(?:
[^()<>@,;:".\\\[\]\x80-\xff\000-\010\012-\037] | # atom and space parts, or...
\(
(?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] | \( (?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] )* \) )*
\) | # comments, or...
" (?: # opening quote...
[^\\\x80-\xff\n\015"] # Anything except backslash and quote
| # or
\\ [^\x80-\xff] # Escaped something (something != CR)
)* " # closing quote
# quoted strings
)*
< (?: [\040\t] | \(
(?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] | \( (?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] )* \) )*
\) )* # leading <
(?: @ (?: [\040\t] | \(
(?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] | \( (?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] )* \) )*
\) )* (?:
[^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]+ # some number of atom characters...
(?![^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]) # ..not followed by something that could be part of an atom
| \[ # [
(?: [^\\\x80-\xff\n\015\[\]] | \\ [^\x80-\xff] )* # stuff
\] # ]
) # initial subdomain
(?: #
(?: [\040\t] | \(
(?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] | \( (?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] )* \) )*
\) )* \. # if led by a period...
(?: [\040\t] | \(
(?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] | \( (?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] )* \) )*
\) )* (?:
[^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]+ # some number of atom characters...
(?![^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]) # ..not followed by something that could be part of an atom
| \[ # [
(?: [^\\\x80-\xff\n\015\[\]] | \\ [^\x80-\xff] )* # stuff
\] # ]
) # ...further okay
)*
(?: (?: [\040\t] | \(
(?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] | \( (?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] )* \) )*
\) )* , (?: [\040\t] | \(
(?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] | \( (?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] )* \) )*
\) )* @ (?: [\040\t] | \(
(?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] | \( (?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] )* \) )*
\) )* (?:
[^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]+ # some number of atom characters...
(?![^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]) # ..not followed by something that could be part of an atom
| \[ # [
(?: [^\\\x80-\xff\n\015\[\]] | \\ [^\x80-\xff] )* # stuff
\] # ]
) # initial subdomain
(?: #
(?: [\040\t] | \(
(?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] | \( (?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] )* \) )*
\) )* \. # if led by a period...
(?: [\040\t] | \(
(?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] | \( (?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] )* \) )*
\) )* (?:
[^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]+ # some number of atom characters...
(?![^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]) # ..not followed by something that could be part of an atom
| \[ # [
(?: [^\\\x80-\xff\n\015\[\]] | \\ [^\x80-\xff] )* # stuff
\] # ]
) # ...further okay
)*
)* # further okay, if led by comma
: # closing colon
(?: [\040\t] | \(
(?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] | \( (?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] )* \) )*
\) )* )? # optional route
(?:
[^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]+ # some number of atom characters...
(?![^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]) # ..not followed by something that could be part of an atom
|
" (?: # opening quote...
[^\\\x80-\xff\n\015"] # Anything except backslash and quote
| # or
\\ [^\x80-\xff] # Escaped something (something != CR)
)* " # closing quote
) # initial word
(?: (?: [\040\t] | \(
(?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] | \( (?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] )* \) )*
\) )* \. (?: [\040\t] | \(
(?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] | \( (?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] )* \) )*
\) )* (?:
[^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]+ # some number of atom characters...
(?![^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]) # ..not followed by something that could be part of an atom
|
" (?: # opening quote...
[^\\\x80-\xff\n\015"] # Anything except backslash and quote
| # or
\\ [^\x80-\xff] # Escaped something (something != CR)
)* " # closing quote
) )* # further okay, if led by a period
(?: [\040\t] | \(
(?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] | \( (?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] )* \) )*
\) )* @ (?: [\040\t] | \(
(?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] | \( (?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] )* \) )*
\) )* (?:
[^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]+ # some number of atom characters...
(?![^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]) # ..not followed by something that could be part of an atom
| \[ # [
(?: [^\\\x80-\xff\n\015\[\]] | \\ [^\x80-\xff] )* # stuff
\] # ]
) # initial subdomain
(?: #
(?: [\040\t] | \(
(?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] | \( (?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] )* \) )*
\) )* \. # if led by a period...
(?: [\040\t] | \(
(?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] | \( (?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] )* \) )*
\) )* (?:
[^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]+ # some number of atom characters...
(?![^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]) # ..not followed by something that could be part of an atom
| \[ # [
(?: [^\\\x80-\xff\n\015\[\]] | \\ [^\x80-\xff] )* # stuff
\] # ]
) # ...further okay
)*
# address spec
(?: [\040\t] | \(
(?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] | \( (?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] )* \) )*
\) )* > # trailing >
# name and address
) (?: [\040\t] | \(
(?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] | \( (?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] )* \) )*
\) )* # optional trailing comment
/Ix
Capture group count = 0
Contains explicit CR or LF match
Options: extended
Starting code units: \x09 \x20 ! " # $ % & ' ( * + - / 0 1 2 3 4 5 6 7 8
9 = ? A B C D E F G H I J K L M N O P Q R S T U V W X Y Z ^ _ ` a b c d e
f g h i j k l m n o p q r s t u v w x y z { | } ~ \x7f
Subject length lower bound = 3
/\h/I
Capture group count = 0
Starting code units: \x09 \x20 \xa0
Subject length lower bound = 1
/\H/I
Capture group count = 0
Subject length lower bound = 1
/\v/I
Capture group count = 0
Starting code units: \x0a \x0b \x0c \x0d \x85
Subject length lower bound = 1
/\V/I
Capture group count = 0
Subject length lower bound = 1
/\R/I
Capture group count = 0
Starting code units: \x0a \x0b \x0c \x0d \x85
Subject length lower bound = 1
/[\h]/B
------------------------------------------------------------------
Bra
[\x09 \xa0]
Ket
End
------------------------------------------------------------------
>\x09<
0: \x09
/[\h]+/B
------------------------------------------------------------------
Bra
[\x09 \xa0]++
Ket
End
------------------------------------------------------------------
>\x09\x20\xa0<
0: \x09 \xa0
/[\v]/B
------------------------------------------------------------------
Bra
[\x0a-\x0d\x85]
Ket
End
------------------------------------------------------------------
/[\H]/B
------------------------------------------------------------------
Bra
[\x00-\x08\x0a-\x1f!-\x9f\xa1-\xff]
Ket
End
------------------------------------------------------------------
/[^\h]/B
------------------------------------------------------------------
Bra
[^\x09 \xa0]
Ket
End
------------------------------------------------------------------
/[\V]/B
------------------------------------------------------------------
Bra
[\x00-\x09\x0e-\x84\x86-\xff]
Ket
End
------------------------------------------------------------------
/[\x0a\V]/B
------------------------------------------------------------------
Bra
[\x00-\x0a\x0e-\x84\x86-\xff]
Ket
End
------------------------------------------------------------------
/\777/I
Failed: error 151 at offset 4: octal value is greater than \377 in 8-bit non-UTF-8 mode
/(*:0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF)XX/mark
Failed: error 176 at offset 259: name is too long in (*MARK), (*PRUNE), (*SKIP), or (*THEN)
XX
/(*:0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF)XX/mark,alt_verbnames
Failed: error 176 at offset 259: name is too long in (*MARK), (*PRUNE), (*SKIP), or (*THEN)
XX
/(*:0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDE)XX/mark
XX
0: XX
MK: 0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDE
/(*:0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDE)XX/mark,alt_verbnames
XX
0: XX
MK: 0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDE
/\u0100/alt_bsux,allow_empty_class,match_unset_backref,dupnames
Failed: error 177 at offset 6: character code point value in \u.... sequence is too large
/[\u0100-\u0200]/alt_bsux,allow_empty_class,match_unset_backref,dupnames
Failed: error 177 at offset 7: character code point value in \u.... sequence is too large
/[^\x00-a]{12,}[^b-\xff]*/B
------------------------------------------------------------------
Bra
[^\x00-a]{12,}+
[^b-\xff]*+
Ket
End
------------------------------------------------------------------
/[^\s]*\s* [^\W]+\W+ [^\d]*?\d0 [^\d\w]{4,6}?\w*A/B
------------------------------------------------------------------
Bra
[^\x09-\x0d ]*+
\s*
[0-9A-Z_a-z]++
\W+
[^0-9]*+
\d
0
[^0-9A-Z_a-z]{4,6}+
\w*
A
Ket
End
------------------------------------------------------------------
/(*MARK:a\x{100}b)z/alt_verbnames
Failed: error 134 at offset 14: character code point value in \x{} or \o{} is too large
/(*:*++++++++++++''''''''''''''''''''+''+++'+++x+++++++++++++++++++++++++++++++++++(++++++++++++++++++++:++++++%++:''''''''''''''''''''''''+++++++++++++++++++++++++++++++++++++++++++++++++++++-++++++++k+++++++''''+++'+++++++++++++++++++++++''''++++++++++++':ƿ)/
Failed: error 176 at offset 259: name is too long in (*MARK), (*PRUNE), (*SKIP), or (*THEN)
/(?i:A{1,}\6666666666)/
Failed: error 151 at offset 13: octal value is greater than \377 in 8-bit non-UTF-8 mode
A\x{1b6}6666666
# Should cause an error
/abc/substitute_extended,replace=>\777<
abc
Failed: error -57 at offset 5 in replacement: bad escape sequence in replacement string
# Should cause an error
/abc/substitute_extended,replace=>\o{012345}<
abc
Failed: error -57 at offset 10 in replacement: bad escape sequence in replacement string
/i/turkish_casing
Failed: error 204 at offset 0: PCRE2_EXTRA_TURKISH_CASING require Unicode (UTF or UCP) mode
# End of testinput9

@@ -0,0 +1,206 @@
PCRE2 version 10.32-RC1 2018-02-19
# This is a specialized test for checking, when PCRE2 is compiled with the
# EBCDIC option but in an ASCII environment, that newline, white space, and \c
# functionality is working. It catches cases where explicit values such as 0x0a
# have been used instead of names like CHAR_LF. Needless to say, it is not a
# genuine EBCDIC test! In patterns, alphabetic characters that follow a
# backslash must be in EBCDIC code. In data, NL, NEL, LF, ESC, and DEL must be
# in EBCDIC, but can of course be specified as escapes.
# Test default newline and variations
/^A/m
ABC
0: A
12\x15ABC
0: A
/^A/m,newline=any
12\x15ABC
0: A
12\x0dABC
0: A
12\x0d\x15ABC
0: A
12\x25ABC
0: A
/^A/m,newline=anycrlf
12\x15ABC
0: A
12\x0dABC
0: A
12\x0d\x15ABC
0: A
** Fail
No match
12\x25ABC
No match
# Test \h
/^A\ˆ/
A B
0: A\x20
A\x41B
0: AA
# Test \H
/^A\È/
AB
0: AB
A\x42B
0: AB
** Fail
No match
A B
No match
A\x41B
No match
# Test \R
/^A\Ù/
A\x15B
0: A\x15
A\x0dB
0: A\x0d
A\x25B
0: A\x25
A\x0bB
0: A\x0b
A\x0cB
0: A\x0c
** Fail
No match
A B
No match
# Test \v
/^A\¥/
A\x15B
0: A\x15
A\x0dB
0: A\x0d
A\x25B
0: A\x25
A\x0bB
0: A\x0b
A\x0cB
0: A\x0c
** Fail
No match
A B
No match
# Test \V
/^A\å/
A B
0: A\x20
** Fail
No match
A\x15B
No match
A\x0dB
No match
A\x25B
No match
A\x0bB
No match
A\x0cB
No match
# For repeated items, use an atomic group so that the output is the same
# for DFA matching (otherwise it may show multiple matches).
# Test \h+
/^A(?>\ˆ+)/
A B
0: A\x20
# Test \H+
/^A(?>\È+)/
AB
0: AB
** Fail
No match
A B
No match
# Test \R+
/^A(?>\Ù+)/
A\x15B
0: A\x15
A\x0dB
0: A\x0d
A\x25B
0: A\x25
A\x0bB
0: A\x0b
A\x0cB
0: A\x0c
** Fail
No match
A B
No match
# Test \v+
/^A(?>\¥+)/
A\x15B
0: A\x15
A\x0dB
0: A\x0d
A\x25B
0: A\x25
A\x0bB
0: A\x0b
A\x0cB
0: A\x0c
** Fail
No match
A B
No match
# Test \V+
/^A(?>\å+)/
A B
0: A\x20B
** Fail
No match
A\x15B
No match
A\x0dB
No match
A\x25B
No match
A\x0bB
No match
A\x0cB
No match
# Test \c functionality
/\ƒ@\ƒA\ƒb\ƒC\ƒd\ƒE\ƒf\ƒG\ƒh\ƒI\ƒJ\ƒK\ƒl\ƒm\ƒN\ƒO\ƒp\ƒq\ƒr\ƒS\ƒT\ƒu\ƒV\ƒW\ƒX\ƒy\ƒZ/
\x00\x01\x02\x03\x04\x05\x06\x07\x08\x09\x0a\x0b\x0c\x0d\x0e\x0f\x10\x11\x12\x13\x14\x15\x16\x17\x18\x19\x1a\x1b\x1c\x1d\x1e\x1f
0: \x00\x01\x02\x03\x04\x05\x06\x07\x08\x09\x0a\x0b\x0c\x0d\x0e\x0f\x10\x11\x12\x13\x14\x15\x16\x17\x18\x19\x1a
/\ƒ[\ƒ\\ƒ]\ƒ^\ƒ_/
\x18\x19\x1a\x1b\x1c\x1d\x1e\x1f
0: \x1b\x1c\x1d\x1e\x1f
/\ƒ?/
A\xffB
0: \xff
/\ƒ&/
Failed: error 168 at offset 3: \c\x20must\x20be\x20followed\x20by\x20a\x20letter\x20or\x20one\x20of\x20[\]^_\x3f
# End

@@ -0,0 +1,88 @@
#pattern framesize, memory
/abcd/
Memory allocation (code space): 26
Frame size for pcre2_match(): 128
abcd\=memory
malloc 20480
0: abcd
abcd\=find_limits
Minimum heap limit = 1
Minimum match limit = 2
Minimum depth limit = 2
0: abcd
/(((((((((((((((((((((((((((((( (^abc|xyz){1,20}$ ))))))))))))))))))))))))))))))/x
Memory allocation (code space): 1294
Frame size for pcre2_match(): 624
abcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcX\=memory
malloc 40960
free unremembered block
No match
abcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcX\=find_limits
Minimum heap limit = 22
Minimum match limit = 37
Minimum depth limit = 35
No match
/ab(cd)/
Memory allocation (code space): 36
Frame size for pcre2_match(): 144
abcd\=memory
0: abcd
1: cd
abcd\=memory,ovector=0
free 40960
free unremembered block
malloc 128
malloc 20480
0: abcd
1: cd
/\[(a)]{1000}/expand,framesize
Memory allocation (code space): 14010
Frame size for pcre2_match(): 16128
\[a]{1000}\=ovector=1
Matched, but too many substrings
0: aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
# The heapframes_size option gets pcre2test to show the size of the heapframes
# vector that after pcre2_match() has run. Running a match with ovector=0
# causes the match data block to be freed, thus releasing that vector.
/\[(a)]{1000}/expand,framesize
Memory allocation (code space): 14010
Frame size for pcre2_match(): 16128
\[a]{1000}\=ovector=1,heapframes_size
Matched, but too many substrings
0: aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
Heapframes size in match_data: 20643840
/a/heapframes_size,framesize
Memory allocation (code space): 14
Frame size for pcre2_match(): 128
a\=ovector=0
0: a
Heapframes size in match_data: 20480
/a|(b){200}/g,expand,heapframes_size
Memory allocation (code space): 2818
Frame size for pcre2_match(): 144
abacus z\[b]{200}z
0: a
0: a
0: bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb
1: b
Heapframes size in match_data: 40960
a\=ovector=0
0: a
Heapframes size in match_data: 20480
/(a)/replace=>$1<
Memory allocation (code space): 24
Frame size for pcre2_match(): 144
cat\=heapframes_size
1: c>a<t
Heapframes size in match_data: 20480
# End

Some files were not shown because too many files have changed in this diff