initial commit

Signed-off-by: Peter Siegmund <mars3142@noreply.mars3142.dev>
2025-10-31 23:37:30 +01:00
commit bf6b52fd94
9654 changed files with 4035664 additions and 0 deletions
--- a/libs/wxWidgets-3.3.1/docs/doxygen/overviews/string.h
+++ b/libs/wxWidgets-3.3.1/docs/doxygen/overviews/string.h
@@ -0,0 +1,156 @@
+/////////////////////////////////////////////////////////////////////////////
+// Name:        string.h
+// Purpose:     topic overview
+// Author:      wxWidgets team
+// Licence:     wxWindows licence
+/////////////////////////////////////////////////////////////////////////////
+
+/**
+
+@page overview_string wxString Overview
+
+@tableofcontents
+
+wxString is used for all strings in wxWidgets. This class is very similar to
+the standard string class, and is implemented using it, but provides additional
+compatibility functions that allow applications originally written for much
+older versions of wxWidgets to keep working with the latest ones.
+
+When writing new code, you're encouraged to use wxString as if it were
+`std::wstring` and use only functions compatible with the standard class.
+
+
+@section overview_string_settings wxString Related Compilation Settings
+
+The main build options affecting wxString are `wxUSE_UNICODE_WCHAR` and
+`wxUSE_UNICODE_UTF8`, exactly one of which must be set to determine whether
+fixed-width `wchar_t` or variable-width `char`-based strings are used
+internally. Please see @ref overview_unicode_support_utf for more information
+about this choice.
+
+The other options all affect the presence, or absence, of various implicit
+conversions provided by this class. By default, wxString can be implicitly
+created from `char*`, `wchar_t*`, `std::string` and `std::wstring` and can be
+implicitly converted to `char*` or `wchar_t*`. This behaviour is convenient
+and compatible with the previous wxWidgets versions, but it is dangerous
+because it may result in unwanted conversions. Please see @ref string_conv for
+how to disable them.
+
+
+@section overview_string_iterating Iterating over wxString
+
+It is possible to iterate over strings using indices, but the recommended way
+to do it is to use iterators, either explicitly:
+
+@code
+wxString s = "hello";
+wxString::const_iterator i;
+for (i = s.begin(); i != s.end(); ++i)
+{
+    wxUniChar uni_ch = *i;
+    // do something with it
+}
+@endcode
+
+or, even more simply, implicitly, using a range-based for loop:
+@code
+wxString s = "hello";
+for ( auto c : s )
+{
+    // do something with "c"
+}
+@endcode
+
+@note wxString iterators have unusual proxy-like semantics and can be used to
+    modify the string even when @e not using references, i.e. with just @c
+    auto, as in the example above.
+
+
+@section overview_string_internal wxString Internal Representation
+
+@note This section can be skipped at first reading and is provided solely for
+informational purposes.
+
+As mentioned above, wxString may use any of @c UTF-16 (under Windows, using
+the native 16-bit @c wchar_t), @c UTF-32 (under Unix, using the native 32-bit
+@c wchar_t) or @c UTF-8 (under both Windows and Unix) to store its
+content. By default, @c wchar_t is used on all platforms, but wxWidgets can
+be compiled with <tt>wxUSE_UNICODE_UTF8=1</tt> to use UTF-8 instead.
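
The following standalone C++ sketch illustrates the rule above by reporting
which encoding a `wchar_t`-based string uses on the current platform. It does
not use wxWidgets, and `wcharEncodingName()` is a hypothetical helper: it
simply assumes that a 2-byte `wchar_t` (as under Windows) holds UTF-16 code
units and a 4-byte one (as under Linux and macOS) holds UTF-32 code units.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper, not part of wxWidgets: returns the Unicode encoding
// that wchar_t-based strings (as used by the default wxUSE_UNICODE_WCHAR
// build) have on the current platform.
std::string wcharEncodingName()
{
    // Windows uses a 16-bit wchar_t, Linux and macOS a 32-bit one.
    static_assert(sizeof(wchar_t) == 2 || sizeof(wchar_t) == 4,
                  "unexpected wchar_t size");
    return sizeof(wchar_t) == 2 ? "UTF-16" : "UTF-32";
}
```

Built under Windows this returns "UTF-16", while under Linux it returns
"UTF-32".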
+
+For simplicity of implementation, wxString uses <em>per code unit indexing</em>
+instead of <em>per code point indexing</em> when using UTF-16, i.e. in the
+default <tt>wxUSE_UNICODE_WCHAR==1</tt> build under Windows, and doesn't know
+anything about surrogate pairs. In other words, it always considers each code
+point to consist of a single code unit, which is only true for characters in
+the @e BMP (Basic Multilingual Plane), as explained in more detail in the @ref
+overview_unicode_encodings section. Thus, when iterating over a UTF-16 string
+stored in a wxString under Windows, user code has to handle <em>surrogate
+pairs</em> manually if it needs to (note, however, that Windows itself has
+built-in support for surrogate pairs in UTF-16, e.g. when drawing strings on
+screen, so nothing special needs to be done when just passing strings
+containing surrogates to wxWidgets functions).
+
+@remarks
+Note that while the behaviour of wxString when <tt>wxUSE_UNICODE_WCHAR==1</tt>
+resembles UCS-2 encoding, it's not completely correct to refer to wxString as
+UCS-2 encoded, since you can store code points outside the @e BMP in a wxString
+as two code units (i.e. as a surrogate pair; as already mentioned, however,
+wxString will "see" them as two different code points).
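
The manual surrogate pair handling mentioned above could look like the
following sketch. To stay self-contained and portable it operates on a
standard `std::u16string` rather than on a wxString, and the function name
`decodeUtf16()` is hypothetical. It combines each high surrogate followed by a
low surrogate into a single code point and passes unpaired surrogates through
unchanged, which is only one possible policy (another would be to replace them
with U+FFFD).

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: decode UTF-16 code units into Unicode code points,
// combining surrogate pairs into a single code point.
std::vector<char32_t> decodeUtf16(const std::u16string& s)
{
    std::vector<char32_t> codePoints;
    for (std::size_t i = 0; i < s.size(); ++i)
    {
        const char32_t unit = s[i];
        const bool isHigh = unit >= 0xD800 && unit <= 0xDBFF;
        if (isHigh && i + 1 < s.size())
        {
            const char32_t next = s[i + 1];
            if (next >= 0xDC00 && next <= 0xDFFF)
            {
                // Each surrogate carries 10 bits; the result is offset by 0x10000.
                codePoints.push_back(static_cast<char32_t>(
                    0x10000 + ((unit - 0xD800) << 10) + (next - 0xDC00)));
                ++i; // skip the low surrogate consumed above
                continue;
            }
        }
        // A BMP character, or an unpaired surrogate kept as-is.
        codePoints.push_back(unit);
    }
    return codePoints;
}
```

For example, for the UTF-16 encoding of U+10300 (the two code units 0xD800
0xDF00) this function returns the single code point 0x10300, whereas a
wchar_t-based wxString under Windows reports two characters.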
+
+In the <tt>wxUSE_UNICODE_UTF8==1</tt> case, wxString correctly handles
+multi-byte UTF-8 sequences, including those for characters outside the BMP
+(it implements <em>per code point indexing</em>), so you can use UTF-8 in a
+completely transparent way.
+
+For example:
+@code
+    // first test, using exotic characters outside of the Unicode BMP:
+
+    wxString test = wxString::FromUTF8("\xF0\x90\x8C\x80");
+        // U+10300 is "OLD ITALIC LETTER A" and is part of Unicode Plane 1;
+        // in UTF-8 it's encoded as 0xF0 0x90 0x8C 0x80
+
+    // it's a single Unicode code point, stored as:
+    // - a UTF-16 surrogate pair (2 code units) in the default build under Windows
+    // - a single UTF-32 code unit in the default build under Linux
+    // - a 4-byte UTF-8 sequence in a wxUSE_UNICODE_UTF8 build
+    // (not counting the terminating NUL)
+
+    // length() returns size_t, so cast it to match the %d format specifier
+    wxPrintf("wxString reports a length of %d character(s)\n",
+             static_cast<int>(test.length()));
+        // prints "wxString reports a length of 1 character(s)" on Linux
+        // prints "wxString reports a length of 2 character(s)" on Windows
+        // since wxString on Windows doesn't handle surrogate pairs!
+
+
+    // second test, this time using characters from the Unicode BMP:
+
+    wxString test2 = wxString::FromUTF8("\x41\xC3\xA0\xE2\x82\xAC");
+        // this is the UTF-8 encoding of the capital letter A followed by
+        // "Latin small letter a with grave" and the "euro sign"
+
+    // these are 3 Unicode code points, stored as:
+    // - 3 UTF-16 code units in the default build under Windows
+    // - 3 UTF-32 code units in the default build under Linux
+    // - 6 UTF-8 code units (bytes) in a wxUSE_UNICODE_UTF8 build
+    // (not counting the terminating NUL)
+
+    wxPrintf("wxString reports a length of %d character(s)\n",
+             static_cast<int>(test2.length()));
+        // prints "wxString reports a length of 3 character(s)" on Linux
+        // prints "wxString reports a length of 3 character(s)" on Windows
+@endcode
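
Determining the number of code points in UTF-8 data requires examining every
byte, because a character may take between 1 and 4 bytes. The sketch below
shows the idea in plain standard C++ on a `std::string`; it is not wxString's
actual implementation, the name `countUtf8CodePoints()` is hypothetical, and
the input is assumed to be valid UTF-8.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: count the code points in valid UTF-8 data by counting
// every byte that is not a continuation byte (continuation bytes have the
// bit pattern 10xxxxxx).
std::size_t countUtf8CodePoints(const std::string& utf8)
{
    std::size_t count = 0;
    for (unsigned char byte : utf8)
    {
        if ((byte & 0xC0) != 0x80)
            ++count;
    }
    return count;
}
```

With the two UTF-8 strings from the example above it returns 1 and 3, even
though they occupy 4 and 6 bytes respectively.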
+
+To better explain the above, consider the second string of the example: it
+consists of 3 characters followed by the terminating @NUL:
+
+@image html overview_wxstring_encoding.png
+
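+As you can see, UTF-16 encoding is straightforward for characters in the @e BMP:
+each one takes a single 2-byte code unit, so in this example the UTF-16-encoded
+wxString takes 8 bytes (4 code units, including the @NUL). UTF-8 encoding is
+more elaborate: 'A' takes 1 byte, the 'a' with grave accent 2 bytes and the euro
+sign 3 bytes, so in this example the string takes 7 bytes, including the @NUL.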
+
+In general, for strings consisting mostly of ASCII characters, UTF-8 has a big
+advantage over UTF-16 in terms of memory footprint, as each such character takes
+a single byte instead of two, but it requires more processing for common
+operations such as computing the length of the string.
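
The byte counts given above can be reproduced with the following sketch, which
uses the standard `std::string` and `std::u16string` types as stand-ins for the
UTF-8 and UTF-16 representations. The helper names are hypothetical, and only
the character data plus the terminating NUL are counted, not any per-object
overhead of the string class.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helpers: bytes needed for the character data of a string,
// including the terminating NUL, ignoring capacity and other overhead.
std::size_t utf8Bytes(const std::string& s)
{
    return (s.size() + 1) * sizeof(char);
}

std::size_t utf16Bytes(const std::u16string& s)
{
    return (s.size() + 1) * sizeof(char16_t);
}
```

For the example string ('A', 'a' with grave accent and the euro sign) these
give 7 bytes in UTF-8 and 8 bytes in UTF-16; for an ASCII-only string such as
"hello, world" they give 13 and 26 bytes, i.e. UTF-8 needs half the memory.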
+
+Finally, note that the type used by wxString to store Unicode code units
+(@c wchar_t or @c char) is always @c typedef-ined to be ::wxStringCharType.
+
+*/