This change improves build consistency across external projects integrated
through CMake by ensuring that compiler flags defined in configuration files
are passed correctly to the toolchain. It covers the majority of use cases,
as external projects are typically also CMake-based. For projects that use
a custom build system, users will still need to specify the required flags
manually.
Currently, the toolchain CMake files use the remove_duplicated_flags
function from utilities.cmake. The cmakev2 implementation also includes
this function for backward compatibility. Move the
remove_duplicated_flags function to a separate file,
deduplicate_flags.cmake, so it can be shared between cmakev1 and
cmakev2.
Signed-off-by: Frantisek Hrbata <frantisek.hrbata@espressif.com>
The toolchain files use the remove_duplicated_flags function from
utilities.cmake. However, we want to avoid mixing utilities from cmakev1
and cmakev2. Use `IDF_BUILD_VER_TAG` to include the utilities that belong
to the build system version currently in use.
Signed-off-by: Frantisek Hrbata <frantisek.hrbata@espressif.com>
The `-mtune=esp-base` option is identical to the default tuning profile,
except that `slow_unaligned_access` is set to false.
This reduces the instruction count for built-in `memcpy` and improves
performance, since our chips can handle misaligned access with minimal
penalty (without triggering exceptions).
Example:
    #include <stdint.h>
    #include <string.h>

    void load(uint32_t *r, char *x) {
        memcpy(r, x, sizeof(uint32_t));
    }

    void store(char *x, uint32_t v) {
        memcpy(x, &v, sizeof(uint32_t));
    }
Previously generated code:
    load:
        lbu     a5,2(a1)
        lbu     a3,0(a1)
        lbu     a4,1(a1)
        sb      a5,2(a0)
        sb      a3,0(a0)
        sb      a4,1(a0)
        lbu     a5,3(a1)
        sb      a5,3(a0)
        ret
    store:
        srli    a3,a1,8
        srli    a4,a1,16
        srli    a5,a1,24
        addi    sp,sp,-16
        sb      a1,0(a0)
        sb      a3,1(a0)
        sb      a4,2(a0)
        sb      a5,3(a0)
        addi    sp,sp,16
        jr      ra
With `-mtune=esp-base`:
    load:
        lw      a5,0(a1)
        sw      a5,0(a0)
        ret
    store:
        sw      a1,0(a0)
        ret
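A typical place where this matters is parsing packed wire formats, where
a 32-bit field can start at any byte offset. The following is a minimal
sketch only: the `read_u32_at` helper, the `check_read` function, and the
sample buffer are hypothetical and are not taken from ESP-IDF. With
`-mtune=esp-base`, the `memcpy` in the helper is expected to compile to a
single word load, as in the `load` example above.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: read a native-endian 32-bit value from any byte
 * offset. Using memcpy avoids the undefined behavior of casting
 * buf + offset to uint32_t * and dereferencing it. */
static uint32_t read_u32_at(const uint8_t *buf, size_t offset)
{
    uint32_t value;
    memcpy(&value, buf + offset, sizeof(value));
    return value;
}

/* Usage: one header byte followed by a 32-bit field at odd offset 1. */
static void check_read(void)
{
    uint8_t packet[5] = { 0xAA, 0, 0, 0, 0 };
    const uint32_t expected = 0x12345678u;

    memcpy(packet + 1, &expected, sizeof(expected));
    assert(read_u32_at(packet, 1) == expected);
}
```

The source stays portable either way; only the instructions the compiler
selects for the copy are expected to change.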
Inlining behavior
=================
Without `-mtune=esp-base`:
- `memcpy()` is inlined only when the compile-time size is ≤ 12 bytes.
- Maximum cost: ~25 instructions
With `-mtune=esp-base`:
- `memcpy()` is inlined for all compile-time constant sizes.
- Maximum cost: ~14 instructions
As a result, some applications may see reduced code size, while others
may see a slight increase. Performance is expected to improve or stay
the same, because calls to `memcpy` with a constant size are replaced
by inline loads and stores.
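To make the inlining distinction concrete, the sketch below contrasts a
copy whose size is a compile-time constant above the 12-byte threshold
with a copy whose size is known only at run time. The `sample_t` type and
all function names are hypothetical. Going by the lists above,
`copy_sample` would be a library call without `-mtune=esp-base` and
inline loads and stores with it, while `copy_bytes` falls outside what
the lists describe and is presumably unaffected. The final decision is
up to the compiler.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical 16-byte record, larger than the 12-byte threshold. */
typedef struct {
    uint32_t words[4];
} sample_t;

/* Size is a compile-time constant: sizeof(sample_t) == 16. */
static void copy_sample(sample_t *dst, const sample_t *src)
{
    memcpy(dst, src, sizeof(*dst));
}

/* Size is known only at run time. */
static void copy_bytes(void *dst, const void *src, size_t len)
{
    memcpy(dst, src, len);
}

/* Both functions copy the same bytes; only the generated code may differ. */
static void check_copies(void)
{
    const sample_t src = { { 1, 2, 3, 4 } };
    sample_t a = { { 0 } };
    sample_t b = { { 0 } };

    copy_sample(&a, &src);
    copy_bytes(&b, &src, sizeof(src));
    assert(memcmp(&a, &b, sizeof(a)) == 0);
}
```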
Performance results
===================
esp32p4 (Ethernet iperf):
- No noticeable difference
esp32c61 (Wi-Fi iperf):
- ~2 Mb/s increase for TCP and UDP TX (may be within measurement error)
NOTE
====
This option is applied only to RISC-V chips that do not have the
hardware issue indicated by the
SOC_CPU_MISALIGNED_ACCESS_ON_PMP_MISMATCH_ISSUE macro.