initial commit
Signed-off-by: Peter Siegmund <mars3142@noreply.mars3142.dev>
156
libs/wxWidgets-3.3.1/docs/doxygen/overviews/string.h
Normal file
@@ -0,0 +1,156 @@
/////////////////////////////////////////////////////////////////////////////
// Name: string.h
// Purpose: topic overview
// Author: wxWidgets team
// Licence: wxWindows licence
/////////////////////////////////////////////////////////////////////////////

/**

@page overview_string wxString Overview

@tableofcontents

wxString is used for all strings in wxWidgets. This class is very similar to
the standard string class, and is implemented using it, but provides additional
compatibility functions to allow applications originally written for the much
older versions of wxWidgets to continue to work with the latest ones.

When writing new code, you're encouraged to use wxString as if it were
`std::wstring` and use only functions compatible with the standard class.


@section overview_string_settings wxString Related Compilation Settings

The main build options affecting wxString are `wxUSE_UNICODE_WCHAR` and
`wxUSE_UNICODE_UTF8`, exactly one of which must be set to determine whether
fixed-width `wchar_t` or variable-width `char`-based strings are used
internally. Please see @ref overview_unicode_support_utf for more information
about this choice.

The other options all affect the presence, or absence, of various implicit
conversions provided by this class. By default, wxString can be implicitly
created from `char*`, `wchar_t*`, `std::string` and `std::wstring` and can be
implicitly converted to `char*` or `wchar_t*`. This behaviour is convenient
and compatible with previous wxWidgets versions, but it is dangerous and may
result in unwanted conversions. Please see @ref string_conv for how to disable
them.


@section overview_string_iterating Iterating over wxString

It is possible to iterate over strings using indices, but the recommended way
to do it is to use iterators, either explicitly:

@code
wxString s = "hello";
wxString::const_iterator i;
for (i = s.begin(); i != s.end(); ++i)
{
    wxUniChar uni_ch = *i;
    // do something with it
}
@endcode

or, even simpler, implicitly, using a range-based for loop:
@code
wxString s = "hello";
for ( auto c : s )
{
    // do something with "c"
}
@endcode

@note wxString iterators have unusual proxy-like semantics and can be used to
      modify the string even when @e not using references, i.e. with just @c
      auto, as in the example above.
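
The proxy behaviour mentioned in the note can be surprising, so here is a
minimal sketch of the same effect using only the standard library. It does
@e not use wxString: `std::vector<bool>` serves as a stand-in because its
iterators also yield proxy objects, and the helper names `setAllByValue` and
`exampleProxyLoop` are hypothetical, chosen purely for illustration.

```cpp
#include <cassert>
#include <vector>

// Stand-in container: std::vector<bool> iterators return proxy objects
// rather than plain bool values, similar in spirit to what the note above
// describes for wxString iterators.
inline std::vector<bool> setAllByValue(std::vector<bool> flags)
{
    for ( auto f : flags )  // "f" is a copy of the proxy, not a plain bool
        f = true;           // assignment writes through to the element
    return flags;
}

// Small usage example: every element ends up modified even though the
// loop variable was declared with plain "auto" and not "auto&".
inline void exampleProxyLoop()
{
    const std::vector<bool> result = setAllByValue({false, false, false});
    assert(result[0] && result[1] && result[2]);
}
```

With an ordinary container of values, the same loop would only change local
copies, which is why loops written with plain @c auto deserve extra care when
the element type is a proxy.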


@section overview_string_internal wxString Internal Representation

@note This section can be skipped at first reading and is provided solely for
      informational purposes.

As mentioned above, wxString may use any of @c UTF-16 (under Windows, using
the native 16-bit @c wchar_t), @c UTF-32 (under Unix, using the native 32-bit
@c wchar_t) or @c UTF-8 (under both Windows and Unix) to store its
content. By default, @c wchar_t is used under all platforms, but wxWidgets can
be compiled with <tt>wxUSE_UNICODE_UTF8=1</tt> to use UTF-8 instead.

For simplicity of implementation, wxString uses <em>per code unit indexing</em>
instead of <em>per code point indexing</em> when using UTF-16, i.e. in the
default <tt>wxUSE_UNICODE_WCHAR==1</tt> build under Windows, and doesn't know
anything about surrogate pairs. In other words, it always considers a code
point to consist of a single code unit, while this is really true only for
characters in the @e BMP (Basic Multilingual Plane), as explained in more
detail in the @ref overview_unicode_encodings section. Thus when iterating
over a UTF-16 string stored in a wxString under Windows, the user code has to
take care of <em>surrogate pairs</em> manually if it needs to handle them.
Note, however, that Windows itself has built-in support for surrogate pairs in
UTF-16, such as for drawing strings on screen, so nothing special needs to be
done when just passing strings containing surrogates to wxWidgets functions.
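
To make "taking care of surrogate pairs manually" more concrete, the following
is a minimal sketch of combining UTF-16 code units into code points. It uses
only the standard library, with `std::u16string` standing in for the code
units seen when iterating over a UTF-16 based string. The helper names
`decodeUtf16` and `exampleDecode` are hypothetical and not part of wxWidgets.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper (not a wxWidgets function): converts UTF-16 code
// units to code points, joining each high/low surrogate pair into one value.
inline std::vector<char32_t> decodeUtf16(const std::u16string& units)
{
    std::vector<char32_t> codePoints;
    for ( std::size_t i = 0; i < units.size(); ++i )
    {
        const char32_t hi = units[i];
        const bool isHigh = hi >= 0xD800 && hi <= 0xDBFF;
        if ( isHigh && i + 1 < units.size() )
        {
            const char32_t lo = units[i + 1];
            if ( lo >= 0xDC00 && lo <= 0xDFFF )
            {
                // Standard UTF-16 formula for code points above U+FFFF.
                codePoints.push_back(
                    0x10000 + ((hi - 0xD800) << 10) + (lo - 0xDC00));
                ++i; // the low surrogate has been consumed too
                continue;
            }
        }
        // BMP character, or an unpaired surrogate passed through unchanged.
        codePoints.push_back(hi);
    }
    return codePoints;
}

// Usage example: U+10300 is stored as the pair 0xD800 0xDF00, i.e. two
// code units, but it decodes to a single code point.
inline void exampleDecode()
{
    const std::u16string s{ char16_t(0xD800), char16_t(0xDF00) };
    const std::vector<char32_t> decoded = decodeUtf16(s);
    assert(s.size() == 2);
    assert(decoded.size() == 1 && decoded[0] == 0x10300);
}
```

This sketch passes unpaired surrogates through as they are, which is only one
possible policy. Code that must reject malformed input would need to report
them instead.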

@remarks
Note that while the behaviour of wxString when <tt>wxUSE_UNICODE_WCHAR==1</tt>
resembles UCS-2 encoding, it's not completely correct to refer to wxString as
UCS-2 encoded, since you can encode code points outside the @e BMP in a wxString
as two code units (i.e. as a surrogate pair). As already mentioned, however,
wxString will "see" them as two different code points.

In the <tt>wxUSE_UNICODE_UTF8==1</tt> case, wxString handles UTF-8 multi-byte
sequences correctly, including for characters outside the BMP (it implements
<em>per code point indexing</em>), so that you can use UTF-8 in a completely
transparent way:

Example:
@code
// first test, using exotic characters outside of the Unicode BMP:

wxString test = wxString::FromUTF8("\xF0\x90\x8C\x80");
    // U+10300 is "OLD ITALIC LETTER A" and is part of Unicode Plane 1
    // in UTF-8 it's encoded as 0xF0 0x90 0x8C 0x80

// it's a single Unicode code point encoded as:
// - a UTF-16 surrogate pair under Windows
// - a UTF-8 multi-byte sequence in a UTF-8 build under Linux
// (without considering the final NUL)

wxPrintf("wxString reports a length of %d character(s)", test.length());
    // prints "wxString reports a length of 1 character(s)" on Linux
    // prints "wxString reports a length of 2 character(s)" on Windows
    // since wxString on Windows doesn't have surrogate pairs support!


// second test, this time using characters part of the Unicode BMP:

wxString test2 = wxString::FromUTF8("\x41\xC3\xA0\xE2\x82\xAC");
    // this is the UTF-8 encoding of capital letter A followed by
    // 'small letter a with grave' followed by the 'euro sign'

// they are 3 Unicode code points encoded as:
// - 3 UTF-16 code units under Windows
// - 6 UTF-8 code units in a UTF-8 build under Linux
// (without considering the final NUL)

wxPrintf("wxString reports a length of %d character(s)", test2.length());
    // prints "wxString reports a length of 3 character(s)" on Linux
    // prints "wxString reports a length of 3 character(s)" on Windows
@endcode

To illustrate the above, consider the second string of the example; it
consists of 3 characters and the final @NUL:

@image html overview_wxstring_encoding.png

As you can see, UTF-16 encoding is straightforward (for characters in the @e BMP)
and in this example the UTF-16 encoded wxString takes 8 bytes.
UTF-8 encoding is more elaborate and in this example takes 7 bytes.
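
The byte counts above can be checked with a short standard C++ sketch. It does
not use wxString: `std::u16string` and `std::string` stand in for the UTF-16
and UTF-8 representations, and it assumes a platform where `char16_t` is
exactly 2 bytes. The names `storageBytes` and `exampleStorage` are
hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: bytes needed to store the string's code units
// plus one terminating NUL code unit.
template <typename Char>
std::size_t storageBytes(const std::basic_string<Char>& s)
{
    return (s.size() + 1) * sizeof(Char);
}

// Usage example with the second string from the example above:
// 'A', 'a with grave' (U+00E0) and the euro sign (U+20AC).
inline void exampleStorage()
{
    const std::u16string utf16 = u"A\u00E0\u20AC";
    const std::string utf8 = "\x41\xC3\xA0\xE2\x82\xAC";

    assert(storageBytes(utf16) == 8); // (3 + 1) code units * 2 bytes
    assert(storageBytes(utf8) == 7);  // (1 + 2 + 3 + 1) code units * 1 byte
}
```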

In general, for strings containing many Latin characters, UTF-8 has a
noticeably smaller memory footprint than UTF-16, but requires some more
processing for common operations such as length calculation.
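
The extra processing can be seen in a minimal sketch of counting code points
in a UTF-8 byte sequence: every byte has to be examined, whereas a
fixed-width representation can report its length directly. This is an
illustration using only the standard library, not how wxString is
implemented; the names `countUtf8CodePoints` and `exampleCount` are
hypothetical, and the input is assumed to be valid UTF-8.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: counts code points in a string assumed to hold
// valid UTF-8, by counting every byte that is not a continuation byte
// (continuation bytes have the bit pattern 10xxxxxx).
inline std::size_t countUtf8CodePoints(const std::string& utf8)
{
    std::size_t count = 0;
    for ( unsigned char byte : utf8 )
    {
        if ( (byte & 0xC0) != 0x80 )
            ++count;
    }
    return count;
}

// Usage example with the two strings from the example above.
inline void exampleCount()
{
    const std::string bmp = "\x41\xC3\xA0\xE2\x82\xAC";
    const std::string astral = "\xF0\x90\x8C\x80";

    assert(bmp.size() == 6 && countUtf8CodePoints(bmp) == 3);
    assert(astral.size() == 4 && countUtf8CodePoints(astral) == 1);
}
```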

Finally, note that the type used by wxString to store Unicode code units
(@c wchar_t or @c char) is always available as the ::wxStringCharType typedef.

*/