char8_t is a fundamental character type introduced in C++20 to represent UTF-8 code units. Because it is a distinct type, the compiler can tell UTF-8 data apart from data held in the ordinary character types (char, signed char, and unsigned char).

Type Properties

  • Size: sizeof(char8_t) is always 1 byte.
  • Underlying Type: unsigned char. char8_t has the same size, signedness, and alignment requirements as unsigned char.
  • Distinct Type: Unlike int8_t or uint8_t (which are typically typedef aliases), char8_t is a distinct built-in type. This allows it to participate uniquely in function overload resolution and template specialization.

Syntax and Literals

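Since C++20, the u8 prefix gives a character literal the type char8_t and a string literal the type array of const char8_t. Before C++20, u8 literals were based on char instead, so code that assigns them to char-based types may stop compiling when moved to C++20.
// Character literal (must be encodable as a single UTF-8 code unit, i.e. U+0000 to U+007F)
char8_t c = u8'A';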

// String literal (array of const char8_t)
const char8_t* str = u8"Hello, UTF-8";

// Array initialization
char8_t arr[] = u8"Data";

Standard Library Aliases

The C++ Standard Library provides specific aliases for strings and string views utilizing char8_t as the underlying character type:
#include <string>
#include <string_view>

// Alias for std::basic_string<char8_t>
std::u8string u8_str = u8"Dynamic UTF-8 string";

// Alias for std::basic_string_view<char8_t>
std::u8string_view u8_view = u8"UTF-8 string view";

Type Safety and Conversions

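Because char8_t is a distinct type, a pointer to char8_t does not convert implicitly to or from a pointer to char, and std::u8string is unrelated to std::string. This prevents accidental mixing of UTF-8 data with raw bytes or locally encoded strings. Individual char8_t values are still integers and convert implicitly to other integral types.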
const char* narrow_str = "Standard string";
const char8_t* utf8_str = u8"UTF-8 string";

// Compilation error: no implicit conversion between const char* and const char8_t*
// narrow_str = utf8_str; 
// utf8_str = narrow_str;

// An explicit cast is required to bypass the type system.
// Reading char8_t data through a const char* is permitted by the aliasing rules.
narrow_str = reinterpret_cast<const char*>(utf8_str);

Integral promotion applies to char8_t as it does to unsigned char: in arithmetic or bitwise operations, a char8_t value is implicitly promoted to int, or to unsigned int if int cannot represent every value of the underlying type on the target architecture.

Standard Stream I/O

To prevent mojibake (corrupted text caused by mismatched encodings), C++20 stops UTF-8 data from being streamed directly to char-based streams such as std::cout. The standard explicitly deletes the stream insertion operator (operator<<) for char8_t characters and for pointers to char8_t (e.g., const char8_t*). Without the deleted overloads, overload resolution would fall back to implicit conversions and select another overload, printing a character as an integer or a pointer as an address through the const void* overload.

std::u8string is handled differently: the standard simply provides no operator<< that accepts it together with std::ostream (std::basic_ostream<char>). Attempting to stream it fails with a template deduction failure rather than a deleted function error.
#include <iostream>
#include <string>

const char8_t* utf8_str = u8"UTF-8 string";
std::u8string u8_string = u8"Dynamic UTF-8 string";

// Compilation Error: use of deleted function 'operator<<(basic_ostream&, const char8_t*)'
// std::cout << utf8_str;

// Compilation Error: no match for 'operator<<' (template deduction failure)
// std::cout << u8_string;

// Workaround: cast explicitly to stream the underlying bytes.
// The bytes are written unchanged, so the output device must interpret them as UTF-8.
std::cout << reinterpret_cast<const char*>(utf8_str);