[wpiutil] Fix DynamicStruct string handling (#6253)

Dynamic structs had a few major issues.

In C++, if a string field was the last definition in the schema, attempting to set it would trigger an assertion. This has been fixed.

Setting a string value could truncate the string actually stored in the struct if the field's definition was shorter than the string being set.
There was no way to detect that this had occurred. The set string function now returns a bool indicating whether the string was fully written.
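
As a rough illustration of the bool-returning contract described above, here is a minimal standalone sketch. `WriteFixedString` and its `std::vector` buffer are hypothetical names chosen for this example and are not part of the wpiutil API.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <string_view>
#include <vector>

// Hypothetical helper (not the wpiutil API): copies `value` into a
// fixed-size char field and zero-fills whatever space is left over.
// Returns true only if the whole string fit in the field.
inline bool WriteFixedString(std::vector<uint8_t>& field,
                             std::string_view value) {
  size_t len = std::min(field.size(), value.size());
  std::copy_n(value.begin(), len, field.begin());
  std::fill(field.begin() + static_cast<std::ptrdiff_t>(len), field.end(),
            uint8_t{0});
  return len == value.size();
}
```

A caller can check the return value to decide whether to report the truncation or reject the input.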

Reading a string whose value was shorter than the schema definition would return the zero padding as embedded trailing nulls in the string. This made string equality comparisons effectively impossible, because those nulls count toward the length of the string.
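
The effect of the padding on comparisons can be shown with a small sketch. `TrimTrailingNulls` is a hypothetical helper written for this example. It mirrors the idea of scanning back to the last non-zero byte, not the exact wpiutil code.

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical helper: returns a view of a fixed-size, zero-padded char
// field with the trailing padding removed.
inline std::string_view TrimTrailingNulls(const char* data,
                                          size_t fieldSize) {
  size_t len = fieldSize;
  // Scan backward to the last non-zero byte.
  while (len > 0 && data[len - 1] == '\0') {
    --len;
  }
  return {data, len};
}
```

Without trimming, a 4-byte field holding "ab" yields a 4-character view that does not compare equal to "ab".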

The truncation above did not take UTF-8 multi-byte sequences into account, so a string could be cut in the middle of a Unicode character. The resulting behavior depended on the language, but a partial sequence is problematic to detect in any case. The decoding side now detects when the writer has split a UTF-8 sequence and, if so, ignores the partial sequence and treats it as not part of the string. Doing this on the receiving side means only a newer receiver is needed to fix the problem, which is generally a better option than requiring all senders to update.
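
One way to picture the decode-side check is the standalone sketch below. `DropSplitUtf8Tail` is a hypothetical function for illustration. It assumes the input has already had its zero padding removed, and it is not the code from this commit.

```cpp
#include <cstddef>
#include <cstdint>
#include <string_view>

// Hypothetical helper: returns `s` without a trailing UTF-8 sequence that
// was cut short. Strings that are valid, or that are garbage in some other
// way, are returned unchanged.
inline std::string_view DropSplitUtf8Tail(std::string_view s) {
  // Walk back over up to 3 trailing continuation bytes (10xxxxxx).
  size_t i = s.size();
  size_t cont = 0;
  while (i > 0 && cont < 3 &&
         (static_cast<uint8_t>(s[i - 1]) & 0xC0) == 0x80) {
    --i;
    ++cont;
  }
  if (i == 0) {
    return s;  // empty, or only continuation bytes: nothing to classify
  }
  // s[i - 1] is the candidate lead byte; work out how long it claims to be.
  uint8_t lead = static_cast<uint8_t>(s[i - 1]);
  size_t expected = 1;
  if ((lead & 0xE0) == 0xC0) {
    expected = 2;
  } else if ((lead & 0xF0) == 0xE0) {
    expected = 3;
  } else if ((lead & 0xF8) == 0xF0) {
    expected = 4;
  }
  // Fewer bytes present than the lead byte promises: the tail was split.
  if (expected > cont + 1) {
    return s.substr(0, i - 1);
  }
  return s;
}
```

For example, a field too small for the final two-byte character of "café" would hold the lead byte 0xC3 without its continuation byte, and the sketch drops that dangling byte.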

DynamicStruct instances previously had no unit tests. Added a set of unit tests around strings to ensure things work properly.
Thad House
2024-01-19 22:24:54 -08:00
committed by GitHub
parent 4b15c73f64
commit 0e5eb3f35c
6 changed files with 413 additions and 13 deletions

@@ -349,16 +349,75 @@ void MutableDynamicStruct::SetData(std::span<const uint8_t> data) {
std::copy(data.begin(), data.begin() + m_desc->GetSize(), m_data.begin());
}
void MutableDynamicStruct::SetStringField(const StructFieldDescriptor* field,
std::string_view DynamicStruct::GetStringField(
const StructFieldDescriptor* field) const {
assert(field->m_type == StructFieldType::kChar);
assert(field->m_parent == m_desc);
assert(m_desc->IsValid());
// Find last non zero character
size_t stringLength;
for (stringLength = field->m_arraySize; stringLength > 0; stringLength--) {
if (m_data[field->m_offset + stringLength - 1] != 0) {
break;
}
}
// If string is all zeroes, its empty and return an empty string.
if (stringLength == 0) {
return "";
}
// Check if the end of the string is in the middle of a continuation byte or
// not.
if ((m_data[field->m_offset + stringLength - 1] & 0x80) != 0) {
// This is a UTF8 continuation byte. Make sure its valid.
// Walk back until initial byte is found
size_t utf8StartByte = stringLength;
for (; utf8StartByte > 0; utf8StartByte--) {
if ((m_data[field->m_offset + utf8StartByte - 1] & 0x40) != 0) {
// Having 2nd bit set means start byte
break;
}
}
if (utf8StartByte == 0) {
// This case means string only contains continuation bytes
return "";
}
utf8StartByte--;
// Check if its a 2, 3, or 4 byte
uint8_t checkByte = m_data[field->m_offset + utf8StartByte];
if ((checkByte & 0xE0) == 0xC0) {
// 2 byte, need 1 more byte
if (utf8StartByte != stringLength - 2) {
stringLength = utf8StartByte;
}
} else if ((checkByte & 0xF0) == 0xE0) {
// 3 byte, need 2 more bytes
if (utf8StartByte != stringLength - 3) {
stringLength = utf8StartByte;
}
} else if ((checkByte & 0xF8) == 0xF0) {
// 4 byte, need 3 more bytes
if (utf8StartByte != stringLength - 4) {
stringLength = utf8StartByte;
}
}
// If we get here, the string is either completely garbage or fine.
}
return {reinterpret_cast<const char*>(&m_data[field->m_offset]),
stringLength};
}
bool MutableDynamicStruct::SetStringField(const StructFieldDescriptor* field,
std::string_view value) {
assert(field->m_type == StructFieldType::kChar);
assert(field->m_parent == m_desc);
assert(m_desc->IsValid());
size_t len = (std::min)(field->m_arraySize, value.size());
bool copiedFull = len == value.size();
std::copy(value.begin(), value.begin() + len,
reinterpret_cast<char*>(&m_data[field->m_offset]));
std::fill(&m_data[field->m_offset + len],
&m_data[field->m_offset + field->m_arraySize], 0);
auto toFill = m_data.subspan(field->m_offset + len, field->m_arraySize - len);
std::fill(toFill.begin(), toFill.end(), 0);
return copiedFull;
}
void MutableDynamicStruct::SetStructField(const StructFieldDescriptor* field,