Go string - SyntBlaze

A string in Go is an immutable sequence of arbitrary bytes. It conventionally holds UTF-8 encoded text, but at its core it is just a sequence of 8-bit bytes (uint8), and nothing guarantees that those bytes form valid UTF-8.
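As a small illustration of the "arbitrary bytes" point, the sketch below stores bytes that are not valid UTF-8 in a string and checks them with utf8.ValidString from the standard library. The helper name isValidText is hypothetical and only wraps that call.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// isValidText reports whether s holds well-formed UTF-8.
// Hypothetical helper for illustration; it only wraps utf8.ValidString.
func isValidText(s string) bool {
	return utf8.ValidString(s)
}

func main() {
	text := "héllo"          // well-formed UTF-8 text
	blob := "\xff\xfe\x00abc" // arbitrary bytes; 0xff never appears in valid UTF-8

	// Both values are ordinary strings; only the first is valid UTF-8.
	fmt.Println(isValidText(text), len(text))
	fmt.Println(isValidText(blob), len(blob))
}
```

The compiler accepts both literals, which is why code that receives strings from outside the program may want to validate them before treating them as text.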

Internal Representation and Slicing

At runtime, a string is represented as a two-word descriptor: a pointer to the underlying backing array and an integer holding the length in bytes. Conceptually, it mirrors this struct, which has the same layout as reflect.StringHeader (deprecated since Go 1.20):

type StringHeader struct {
    Data uintptr // Pointer to the underlying immutable byte array
    Len  int     // Number of bytes (not characters)
}

Because a string value is just this small descriptor, passing a string to a function or assigning it to another variable is cheap: only the pointer and the length are copied, never the underlying bytes. The same holds for slicing (e.g., s[start:end]). The result is a new descriptor that points at an offset within the same backing array, with the length adjusted, and no memory is allocated or copied. One consequence is that a short substring keeps the entire original backing array reachable for the garbage collector.

s := "hello world"
sub := s[0:5] // "hello" - shares the same backing array as 's'
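The sharing described above can be observed, under some assumptions, by comparing data pointers. The sketch below uses unsafe.StringData, which requires Go 1.20 or later, and relies on the behavior of the standard gc toolchain; the language specification itself does not promise a particular memory layout. The helper offsetWithin is a hypothetical name introduced for this example.

```go
package main

import (
	"fmt"
	"unsafe"
)

// offsetWithin reports the byte offset of sub's data inside parent's
// backing array, and whether sub lies entirely within that array.
// Hypothetical helper; it inspects pointers only and never dereferences them.
func offsetWithin(parent, sub string) (int, bool) {
	if len(parent) == 0 || len(sub) == 0 {
		return 0, false // StringData is unspecified for empty strings
	}
	p := uintptr(unsafe.Pointer(unsafe.StringData(parent)))
	q := uintptr(unsafe.Pointer(unsafe.StringData(sub)))
	if q < p || q+uintptr(len(sub)) > p+uintptr(len(parent)) {
		return 0, false
	}
	return int(q - p), true
}

func main() {
	s := "hello world"
	head := s[0:5] // "hello"
	tail := s[6:]  // "world"

	fmt.Println(offsetWithin(s, head))
	fmt.Println(offsetWithin(s, tail))
}
```

With the standard toolchain, both substrings are expected to report offsets inside s (0 and 6), which indicates that slicing produced new descriptors rather than copies.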

Immutability

Once a string is created, its contents cannot be altered. Attempting to reassign a value at a specific index results in a compile-time error.

s := "hello"
// s[0] = 'H' // Compile-time error: cannot assign to s[0]

To change the contents, convert the string to a mutable slice ([]byte or []rune), modify the slice, and convert it back to a string. In general, each conversion allocates new memory and copies the data, so the original string is never modified.

s := "hello"
b := []byte(s)     // Allocates and copies data
b[0] = 'H'         // Mutates the byte slice
s = string(b)      // Allocates and copies back to a new string
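The []byte route works well when every character involved is a single byte. For text with multi-byte characters, a []rune conversion lets you address positions by code point instead. The following is a minimal sketch; replaceRuneAt is a hypothetical helper, not a standard library function.

```go
package main

import "fmt"

// replaceRuneAt returns a copy of s in which the rune at position pos
// (counted in runes, not bytes) is replaced by r. If pos is out of
// range, s is returned unchanged.
func replaceRuneAt(s string, pos int, r rune) string {
	runes := []rune(s) // decodes UTF-8 into a freshly allocated []rune
	if pos < 0 || pos >= len(runes) {
		return s
	}
	runes[pos] = r
	return string(runes) // re-encodes as UTF-8 into a new string
}

func main() {
	original := "résumé"
	changed := replaceRuneAt(original, 1, 'e')

	fmt.Println(changed)  // resumé
	fmt.Println(original) // résumé (the original string is untouched)
}
```

Doing the same replacement through []byte would require overwriting both bytes of the 'é' and shifting the rest of the slice, which is why the rune-based conversion is usually the simpler choice for non-ASCII text.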

String Literals

Go provides two syntaxes for declaring string literals:

Interpreted String Literals: Enclosed in double quotes ("..."). Escape sequences are evaluated (e.g., \n, \t, \xNN for a single hex byte, \uNNNN or \UNNNNNNNN for a Unicode code point).
Raw String Literals: Enclosed in backticks (`...`). No escape sequences are processed, so backslashes are literal characters. A raw literal can span multiple lines but cannot itself contain a backtick.

interpreted := "Line 1\nLine 2\t\x41"
raw := `Line 1\n
Line 2\t\x41`
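To make the difference between the two forms concrete, the sketch below writes the same characters between double quotes and between backticks and compares the resulting byte lengths. The variable names are chosen only for this illustration.

```go
package main

import "fmt"

var (
	// Escapes are evaluated: 'a', a tab, 'b', a newline.
	interpreted = "a\tb\n"
	// Nothing is evaluated: 'a', '\', 't', 'b', '\', 'n'.
	raw = `a\tb\n`
)

func main() {
	fmt.Println(len(interpreted)) // 4
	fmt.Println(len(raw))         // 6

	// %q prints each value as a quoted, escaped Go string literal,
	// which makes invisible characters and literal backslashes visible.
	fmt.Printf("%q\n", interpreted)
	fmt.Printf("%q\n", raw)
}
```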

Indexing and Length

Because strings are byte sequences, the built-in len() function returns the number of bytes, not the number of Unicode characters. Similarly, indexing into a string yields the raw byte (uint8) at that memory offset.

s := "résumé"
fmt.Println(len(s)) // Outputs 8 (each 'é' takes 2 bytes in UTF-8)

// Indexing returns a byte (uint8), not a character
fmt.Printf("%x\n", s[1]) // Outputs c3 (first byte of 'é')

To count Unicode code points (runes) rather than bytes, use the unicode/utf8 package:

package main

import (
    "fmt"
    "unicode/utf8"
)

func main() {
    count := utf8.RuneCountInString("résumé")
    fmt.Println(count) // Outputs 6
}

Iteration and Runes

To handle multi-byte Unicode characters correctly, Go uses the rune type, which is an alias for int32 representing a single Unicode code point. When you iterate over a string using a for...range loop, Go implicitly decodes the UTF-8 sequence on the fly. It yields the starting byte index and the decoded rune. If the loop encounters an invalid UTF-8 byte sequence, it yields the Unicode replacement character (\uFFFD, also defined as utf8.RuneError) and advances the loop by exactly one byte.

s := "résumé"

for index, char := range s {
    // 'index' is the byte offset
    // 'char' is the decoded rune (int32)
    fmt.Printf("Byte index: %d, Rune: %c\n", index, char)
}
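The handling of invalid input described above can be checked with a string that deliberately contains malformed bytes. In this sketch, countInvalidBytes is a hypothetical helper. It uses utf8.DecodeRuneInString to tell a decoding error (reported width 1) apart from a genuine U+FFFD character in the input (encoded width 3).

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// countInvalidBytes counts positions where a range loop over s yields
// utf8.RuneError because of malformed UTF-8. A real U+FFFD in the
// input is 3 bytes wide and is therefore not counted.
func countInvalidBytes(s string) int {
	n := 0
	for i, r := range s {
		if r != utf8.RuneError {
			continue
		}
		if _, size := utf8.DecodeRuneInString(s[i:]); size == 1 {
			n++
		}
	}
	return n
}

func main() {
	// 0xff is never valid in UTF-8; the trailing 0xc3 starts a
	// two-byte sequence that is cut off by the end of the string.
	s := "a\xffb\xc3"

	for i, r := range s {
		fmt.Printf("index %d: %U\n", i, r)
	}
	fmt.Println(countInvalidBytes(s)) // 2
}
```

The loop visits indices 0, 1, 2, and 3, which shows that each invalid byte advances the iteration by exactly one byte.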

If you iterate using a standard for loop with an index, you will iterate strictly byte-by-byte, which will fracture multi-byte Unicode characters.

// Byte-level iteration (fractures UTF-8)
for i := 0; i < len(s); i++ {
    fmt.Printf("%x ", s[i])
}
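When an explicit index loop is still needed, for example to control how far the index advances, one option is to decode a rune at each step with utf8.DecodeRuneInString and move forward by its reported width. The sketch below shows that approach; runeStarts is a hypothetical helper name.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// runeStarts returns the byte offset at which each rune of s begins.
// It walks the string with an explicit index but advances by the
// width of each decoded rune instead of by one byte.
func runeStarts(s string) []int {
	var starts []int
	for i := 0; i < len(s); {
		// size is at least 1 here because s[i:] is non-empty.
		_, size := utf8.DecodeRuneInString(s[i:])
		starts = append(starts, i)
		i += size
	}
	return starts
}

func main() {
	fmt.Println(runeStarts("résumé")) // [0 1 3 4 5 6]
}
```

The offsets match the byte indices that a for...range loop over the same string would yield.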
