Python str - SyntBlaze

The Python str type is an immutable, ordered sequence of Unicode code points. It is the built-in text sequence type in Python 3, representing text data as a collection of characters where each character maps to a specific Unicode symbol.

Instantiation and Syntax

Strings are instantiated using string literals or the str() constructor. Python supports multiple literal delimiters to accommodate embedded quotes and multi-line text.

# Literal instantiation
single_quoted = 'text'
double_quoted = "text"
triple_quoted = """multi-line
text"""


# Constructor instantiation
from_int = str(1024)
from_bytes = str(b'data', encoding='utf-8')
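
The constructor has one edge case that is easy to miss: called on a bytes object without an encoding, str() does not decode anything and returns the printable representation instead. The sketch below illustrates this under default interpreter flags; the variable names are illustrative only.

```python
# str() with no arguments yields the empty string.
empty = str()

# Any object can be converted; str() relies on the object's __str__ method.
from_float = str(3.5)
from_none = str(None)

# With an encoding, bytes are decoded into text.
decoded = str(b"data", encoding="utf-8")

# Without an encoding, str() returns the repr of the bytes object
# (Python emits a BytesWarning here when started with the -b flag).
not_decoded = str(b"data")

print(repr(empty), from_float, from_none, decoded, not_decoded)
```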

String literals can be modified using specific prefixes to alter their parsing behavior:

f or F (f-strings): Enables literal string interpolation, evaluating expressions inside {} at runtime.
r or R (Raw strings): Disables escape character processing (e.g., \n is treated as a literal backslash followed by ‘n’).
u or U (Unicode): Legacy prefix from Python 2; in Python 3, all strings are Unicode by default.

# Prefixed literals
raw_str = r"C:\new_folder\test.txt"
f_str = f"Value: {10 * 2}"
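
To make the effect of the prefixes concrete, the following sketch compares string lengths with and without escape processing and combines the r and f prefixes. The variable names are hypothetical.

```python
# A normal literal processes escapes: "\n" becomes one newline character.
escaped = "a\nb"

# A raw literal keeps the backslash: r"\n" is two separate characters.
raw = r"a\nb"

# Prefixes can be combined: "rf" gives a raw f-string.
name = "test"
raw_f = rf"C:\new_folder\file_{name}.txt"

# Doubling the braces produces literal braces inside an f-string.
braces = f"{{literal}} {1 + 1}"

print(len(escaped), len(raw))
print(raw_f)
print(braces)
```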

Memory and Internals

Because str is immutable, no operation changes a string in place; anything that appears to modify a string produces a new string object. Under CPython 3.3 and later, str uses the flexible internal representation introduced by PEP 393. To save memory, CPython picks the most compact fixed-width storage for each string based on the largest Unicode code point it contains:

1 byte per character (Latin-1): If all code points are <= U+00FF.
2 bytes per character (UCS-2): If all code points are <= U+FFFF.
4 bytes per character (UCS-4): If any code point is > U+FFFF.
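
The three storage widths listed above can be observed roughly with sys.getsizeof. This is a sketch that assumes CPython 3.3 or later; other implementations may report different numbers. The helper bytes_per_char is hypothetical and not part of any library.

```python
import sys


def bytes_per_char(ch):
    """Estimate per-character storage by comparing two string lengths.

    Subtracting the two sizes cancels the fixed object header, leaving
    only the cost of the 1000 extra characters (CPython-specific).
    """
    small = ch * 100
    large = ch * 1100
    return (sys.getsizeof(large) - sys.getsizeof(small)) // 1000


widths = {
    "ascii": bytes_per_char("a"),            # U+0061
    "latin1": bytes_per_char("\u00e9"),      # U+00E9
    "bmp": bytes_per_char("\u20ac"),         # U+20AC
    "astral": bytes_per_char("\U0001F600"),  # U+1F600
}
print(widths)
```

On CPython the reported widths are expected to be 1, 1, 2, and 4 bytes respectively. A single wide character is enough to widen the storage of the whole string.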

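CPython also implements string interning. String constants that look like identifiers (containing only ASCII letters, digits, and underscores) are typically cached and reused, so multiple variables assigned the same literal may refer to the same object. Comparing two interned strings can then short-circuit on an identity check instead of scanning every character. Interning is an implementation detail: compare strings with ==, never with is, and use sys.intern() when sharing must be guaranteed.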
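
A minimal sketch of explicit interning follows. It builds one of the strings at runtime so the compiler cannot simply share a constant, then uses sys.intern() to obtain the canonical copy. The variable names are illustrative.

```python
import sys

literal = "user_name"
# Built at runtime, so it starts out as a separate object with equal contents.
built = "".join(["user", "_", "name"])

# Equality always compares contents.
same_text = literal == built

# sys.intern() returns the canonical copy from the interpreter's intern table,
# so two equal interned strings are guaranteed to be the same object.
a = sys.intern(literal)
b = sys.intern(built)
same_object = a is b

print(same_text, same_object)
```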

Sequence Protocol Mechanics

As a sequence type, str implements the sequence protocol (__getitem__, __len__, __iter__, __contains__) together with concatenation and repetition (__add__, __mul__). It supports 0-based indexing, negative indexing, extended slicing, concatenation, and repetition.

s = "developer"


# Indexing
first_char = s[0]      # 'd'
last_char = s[-1]      # 'r'


# Slicing: sequence[start:stop:step]
substring = s[0:3]     # 'dev'
reversed_s = s[::-1]   # 'repoleved'
every_other = s[::2]   # 'dvlpr'


# Sequence operations
length = len(s)        # 9
contains = 'el' in s   # True


# Concatenation and Repetition
concat = "py" + "thon" # 'python'
repeat = "dev" * 3     # 'devdevdev'
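
Indexing and slicing are read-only. The sketch below shows what happens when code attempts item assignment, how to build a changed copy from slices instead, and how out-of-range indexes differ from out-of-range slices. The variable names are illustrative.

```python
s = "developer"

# Item assignment is not supported on str.
try:
    s[0] = "D"
    assignment_failed = False
except TypeError:
    assignment_failed = True

# Build a new string from slices instead; s itself is unchanged.
capitalized = "D" + s[1:]

# Out-of-range indexing raises IndexError...
try:
    s[100]
    index_failed = False
except IndexError:
    index_failed = True

# ...but slicing clamps to the available range without raising.
clamped = s[5:100]

print(assignment_failed, capitalized, index_failed, clamped)
```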

Core API and Methods

The str class provides a comprehensive suite of methods for text manipulation. Because of immutability, all methods that modify text return a new str instance.

Formatting and Interpolation:

# .format() method
formatted = "Code: {}, Status: {}".format(200, "OK")


# Legacy % formatting
legacy = "Code: %d, Status: %s" % (200, "OK")
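
For comparison, the same output can be produced with an f-string, and both f-strings and .format() accept format specifications after a colon. The values below are arbitrary examples.

```python
code, status = 200, "OK"

# f-string equivalent of the .format() call above.
same_as_format = f"Code: {code}, Status: {status}"

# Format specifications follow a colon inside the braces.
fixed = f"{3.14159:.2f}"     # two digits after the decimal point
padded = f"{status:>5}"      # right-aligned in a field of width 5
hexed = f"{255:#x}"          # hexadecimal with the 0x prefix
thousands = f"{1234567:,}"   # comma as thousands separator

# .format() supports the same specifications, plus named fields.
named = "{name} -> {value:03d}".format(name="id", value=7)

print(same_as_format, fixed, repr(padded), hexed, thousands, named)
```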

Inspection and Validation: Methods returning booleans based on character properties or substrings.

"data.csv".endswith(".csv")  # True
"1048".isdigit()             # True
"text".isalpha()             # True
"hello".isascii()            # True

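The validation methods work on Unicode character properties, which leads to a few results that can surprise. The following sketch collects some of them; the variable names are illustrative.

```python
# isdigit() accepts more than 0-9: superscript two counts as a digit,
# while isdecimal() is stricter and rejects it.
superscript_isdigit = "²".isdigit()
superscript_isdecimal = "²".isdecimal()

# isnumeric() is the broadest check and also accepts fractions.
half_isnumeric = "½".isnumeric()
half_isdigit = "½".isdigit()

# Signs and decimal points are not digits.
negative_isdigit = "-5".isdigit()
float_isdigit = "3.14".isdigit()

# Most checks are False for the empty string.
empty_isalpha = "".isalpha()

# Accented letters are alphabetic but not ASCII.
accented_isalpha = "héllo".isalpha()
accented_isascii = "héllo".isascii()
```
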
Search and Replace:

"abracadabra".find("cad")       # Returns index 4; returns -1 if not found
"abracadabra".index("cad")      # Returns index 4; raises ValueError if not found
"abracadabra".count("a")        # Returns 5
"hello".replace("l", "w", 1)    # Returns "hewlo" (replaces max 1 occurrence)

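find() returns only the first match, but its optional start argument makes it straightforward to collect every match. The helper find_all below is a hypothetical sketch and not part of the str API.

```python
def find_all(text, sub):
    """Return the start index of every match, including overlapping ones."""
    if not sub:
        raise ValueError("sub must be a non-empty string")
    positions = []
    start = text.find(sub)
    while start != -1:
        positions.append(start)
        # Resume one character later so overlapping matches are found too.
        start = text.find(sub, start + 1)
    return positions


hits = find_all("abracadabra", "abra")
overlapping = find_all("aaaa", "aa")
missing = find_all("abracadabra", "xyz")

# rfind() searches from the right and returns the last match.
last_a = "abracadabra".rfind("a")

print(hits, overlapping, missing, last_a)
```

Note that count() only counts non-overlapping matches, so "aaaa".count("aa") is 2 while find_all reports three start positions.
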
Splitting and Joining:

# Splitting into a list
"a,b,c".split(",")              # ['a', 'b', 'c']
"line1\nline2".splitlines()     # ['line1', 'line2']


# Joining an iterable of strings
"-".join(["a", "b", "c"])       # 'a-b-c'

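A few details of split() and join() are worth seeing side by side: the difference between splitting on whitespace and splitting on a single space, the maxsplit argument, and the requirement that join() receives strings. The inputs are arbitrary examples.

```python
# With no argument, split() splits on runs of whitespace and drops empties.
on_whitespace = "a  b".split()

# With an explicit separator, consecutive separators yield empty strings.
on_single_space = "a  b".split(" ")

# maxsplit limits the number of splits, counted from the left.
key_value = "key=a=b".split("=", 1)

# join() accepts only strings, so convert other values first.
joined_numbers = ",".join(str(n) for n in [1, 2, 3])

# splitlines() handles \r\n and ignores a trailing line break;
# split("\n") does neither.
with_splitlines = "line1\r\nline2\n".splitlines()
with_split = "line1\r\nline2\n".split("\n")
```
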
Case Conversion and Stripping:

"  Text  ".strip()              # "Text" (removes leading/trailing whitespace)
"Python".lower()                # "python"
"Python".upper()                # "PYTHON"
"ß".casefold()                  # "ss" (aggressive lowercasing for caseless matching)

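Two points about these methods are easy to get wrong: strip() treats its argument as a set of characters rather than a prefix or suffix, and casefold() rather than lower() is the safer choice for caseless comparison. The sketch below assumes Python 3.9 or later for removeprefix().

```python
# strip() with an argument removes any of those characters from both ends.
trimmed = "xxhixx".strip("x")

# The argument is a character set, not a substring, so this strips
# more than the "www." and ".com" a reader might expect.
surprising = "www.example.com".strip("w.com")

# removeprefix() (Python 3.9+) removes an exact leading substring.
exact = "www.example.com".removeprefix("www.")

# lstrip() and rstrip() work on one side only.
left_only = "  Text  ".lstrip()

# lower() leaves "ß" unchanged, so a caseless comparison fails;
# casefold() maps it to "ss" and the comparison succeeds.
lower_match = "Straße".lower() == "STRASSE".lower()
casefold_match = "Straße".casefold() == "STRASSE".casefold()
```
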
Encoding

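The str type converts to raw binary data (bytes) through its encode() method, which translates the Unicode code points into a specific byte representation (e.g., UTF-8, ASCII). The reverse direction is bytes.decode(), which interprets bytes under a given encoding and returns a str.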

# str to bytes
byte_data = "data".encode("utf-8")  # b'data'


# bytes to str
string_data = byte_data.decode("utf-8") # 'data'
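
With non-ASCII text, the character count and the byte count diverge, and both encode() and decode() can fail. The sketch below shows the default strict behaviour and the errors argument that relaxes it; the sample text is arbitrary.

```python
text = "café"

# UTF-8 uses two bytes for "é", so the byte length exceeds the character count.
utf8_bytes = text.encode("utf-8")
lengths = (len(text), len(utf8_bytes))

# Encoding to a codec that cannot represent a character raises by default.
try:
    text.encode("ascii")
    ascii_failed = False
except UnicodeEncodeError:
    ascii_failed = True

# The errors argument selects a fallback instead of raising.
replaced = text.encode("ascii", errors="replace")
ignored = text.encode("ascii", errors="ignore")

# Decoding with the wrong codec can fail too: 0xE9 is "é" in Latin-1
# but is not a valid UTF-8 sequence on its own.
latin1_bytes = b"caf\xe9"
try:
    latin1_bytes.decode("utf-8")
    decode_failed = False
except UnicodeDecodeError:
    decode_failed = True

lenient = latin1_bytes.decode("utf-8", errors="replace")
correct = latin1_bytes.decode("latin-1")
```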
