String type

Type Definition

String

UTF-8 encoded text
Length-prefixed format
Variable size encoding
Zero-copy decoding

Encoding Format

Structure:

[Length_Data: Int][UTF-8 bytes]
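
The layout above can be sketched in a few lines. This is a minimal illustration, not the library's implementation: `encode_string_sketch` is a hypothetical helper, and it assumes that the length prefix counts UTF-8 bytes and fits in a single byte for lengths below 128 (which agrees with the `05 68 65 6C 6C 6F` example in this document). Longer strings need the general Int encoding, which is not reproduced here.

```python
def encode_string_sketch(text: str) -> bytes:
    """Hypothetical helper: length prefix followed by the UTF-8 bytes."""
    utf8_bytes = text.encode("utf-8")
    length = len(utf8_bytes)  # assumption: the prefix counts bytes, not characters
    if length >= 128:
        # Assumption: only the single-byte prefix case is covered by this sketch.
        raise NotImplementedError("multi-byte length prefix not covered here")
    return bytes([length]) + utf8_bytes


encoded = encode_string_sketch("hello")
print(encoded.hex(" "))  # 05 68 65 6c 6c 6f
```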

Implementation Details

Memory Layout

Length prefix: Variable size (general Int encoding)
UTF-8 bytes: Contiguous
No padding or alignment
Direct buffer access
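
Because the bytes are contiguous and unpadded, a decoder can address the payload as a slice of the original buffer. The following sketch uses only Python's built-in `memoryview` to illustrate the idea; the buffer contents and the single-byte length prefix are assumptions made for the example, not the library's code.

```python
data = bytearray(b"\x05hello\x05world")  # two length-prefixed strings back to back
view = memoryview(data)

offset = 6                 # start of the second string
length = view[offset]      # assumption: single-byte length prefix
payload = view[offset + 1 : offset + 1 + length]  # slice of the view, no bytes copied

# The slice still refers to the original buffer object.
assert payload.obj is data

# Decoding allocates the resulting str, but needs no intermediate bytes object.
text = str(payload, "utf-8")
print(text)  # world
```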

UTF-8 Handling

Strict UTF-8 validation
Proper code point handling
Support for characters outside the Basic Multilingual Plane (surrogate pairs in UTF-16)
Invalid sequence detection
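
What strict validation rejects can be shown with Python's built-in UTF-8 codec, which is strict by default. `is_valid_utf8` is a hypothetical helper written for this illustration; that `String` relies on exactly this codec is an assumption based on the `decode('utf-8')` call in the error-handling snippets.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if data decodes under Python's strict UTF-8 codec."""
    try:
        data.decode("utf-8")  # errors="strict" is the default
        return True
    except UnicodeDecodeError:
        return False


print(is_valid_utf8("héllo 😀".encode("utf-8")))  # True
print(is_valid_utf8(b"\xff"))          # False: 0xFF never appears in UTF-8
print(is_valid_utf8(b"\xc0\xaf"))      # False: overlong encoding of "/"
print(is_valid_utf8(b"\xed\xa0\x80"))  # False: encoded surrogate U+D800
print(is_valid_utf8(b"\xe2\x82"))      # False: truncated three-byte sequence
```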

Error Handling

Common error cases:

Invalid UTF-8:

try:
    # utf8_bytes holds the payload bytes read after the length prefix
    return utf8_bytes.decode('utf-8')
except UnicodeDecodeError as e:
    raise DecodeError(f"Invalid UTF-8: {e}") from e

Buffer overflow:

if len(buffer) - offset < needed_size:
    raise BufferError(f"Buffer too small: need {needed_size} bytes")

Length mismatch:

if len(utf8_bytes) != length:
    raise ValueError(f"Length mismatch: expected {length}, got {len(utf8_bytes)}")
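
The checks above can be combined into one decoding routine. The sketch below is hypothetical: `decode_string_sketch` and the local `DecodeError` class are stand-ins defined for this example, and a single-byte length prefix is assumed. With the size check done first, the slice always has exactly `length` bytes, so a separate length-mismatch check is not needed here.

```python
from typing import Tuple


class DecodeError(Exception):
    """Stand-in for the DecodeError named in the snippets (hypothetical definition)."""


def decode_string_sketch(buffer: bytes, offset: int = 0) -> Tuple[str, int]:
    """Hypothetical decoder returning (text, bytes_read)."""
    if len(buffer) - offset < 1:
        raise BufferError("Buffer too small: need 1 byte for the length prefix")
    length = buffer[offset]
    if length >= 128:
        # Assumption: only the single-byte prefix case is covered by this sketch.
        raise NotImplementedError("multi-byte length prefix not covered here")
    needed_size = 1 + length
    if len(buffer) - offset < needed_size:
        raise BufferError(f"Buffer too small: need {needed_size} bytes")
    utf8_bytes = bytes(buffer[offset + 1 : offset + needed_size])
    try:
        text = utf8_bytes.decode("utf-8")
    except UnicodeDecodeError as e:
        raise DecodeError(f"Invalid UTF-8: {e}") from e
    return text, needed_size


print(decode_string_sketch(b"\x05hello"))     # ('hello', 6)
print(decode_string_sketch(b"\x00\x02hi", 1))  # ('hi', 3)
```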

Examples

Basic Usage

from jam.types.base.string import String

# Create and encode
text = String("hello")
encoded = text.encode()
# -> [05 68 65 6C 6C 6F]
#    len=5, "hello"

# Decode
decoded = String.decode(encoded)
assert decoded == "hello"

API Reference

Classes

class jam.types.base.string.String(value: str)

Bases: Codable, JsonSerde

UTF-8 encoded string type that implements the Codable interface.

Examples

>>> s = String("Hello")
>>> str(s)
'Hello'
>>> len(s)
5
>>> s.encode()
b'\x05Hello'  # Length prefix followed by UTF-8 bytes

Note

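String length is measured in UTF-16 code units, so characters outside the Basic Multilingual Plane (such as most emoji) count as 2 units. Note that this differs from Python's built-in len() on a str, which counts Unicode code points and reports 1 for such a character.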
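
The common ways of measuring a string's length can be compared with a plain Python `str`, without the `String` class. This sketch only shows how the counts differ; which one `String.__len__` actually reports should be checked against the implementation.

```python
text = "hi😀"

code_points = len(text)                           # what Python's len() reports
utf16_units = len(text.encode("utf-16-le")) // 2  # each UTF-16 code unit is 2 bytes
utf8_bytes = len(text.encode("utf-8"))            # what a byte-length prefix would count

print(code_points, utf16_units, utf8_bytes)  # 3 4 6
```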

__init__(value: str)

Initialize a string.

Parameters: value – Python string value
Raises: TypeError – If value is not a str

__str__() → str: Convert to str.

__len__() → int: Get string length in UTF-16 code units.

__getitem__(index: int | slice) → str: Get character(s) at index or slice.

__contains__(item: str) → bool: Check if string contains substring.

__eq__(other: Any) → bool: Compare for equality.

__hash__() → int: Make hashable.

__add__(other: String | str) → String: Concatenate strings.

__repr__() → str: Get string representation.

static decode_from(buffer: bytes | bytearray | memoryview, offset: int = 0) → Tuple[String, int]

Decode a String from a buffer.

Parameters:

buffer – Bytes to decode from
offset – Starting position in buffer

Returns:

Tuple of (String instance, bytes read)

Raises:

ValueError – If buffer is too short
UnicodeDecodeError – If buffer contains invalid UTF-8