Float16

Memory Layout
Technical Specifications
Type Conversion and Arithmetic
Hardware Architecture Dependency

Float16 is Swift's half-precision, 16-bit binary floating-point type, implementing the IEEE 754 binary16 format. It represents real numbers in exactly two bytes, trading precision and dynamic range for reduced memory consumption.

Memory Layout

Under the IEEE 754 standard for binary16, the 16 bits of a Float16 are allocated as follows:

Sign bit: 1 bit (determines positive or negative).
Exponent: 5 bits (stored with a bias of 15; the all-zeros pattern encodes zero and subnormal numbers, and the all-ones pattern encodes infinity and NaN).
Significand (Fraction): 10 bits (stores the significant digits). Because normal numbers have an implicit leading 1, it effectively provides 11 bits of precision.
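The field layout can be checked from Swift itself. This sketch uses only standard-library `BinaryFloatingPoint` members (`bitPattern`, `sign`, `exponentBitPattern`, `significandBitPattern`). It splits -2.5 into its three fields with shifts and masks, then rebuilds the value with the normal-number formula (-1)^s × (1 + f/1024) × 2^(e - 15). The variable names are illustrative only.

```swift
// Split a Float16 into its IEEE 754 binary16 fields and rebuild it.
let x: Float16 = -2.5

// Storage size: two bytes.
let byteCount = MemoryLayout<Float16>.size

let bits = x.bitPattern                 // raw UInt16 storage
let signBit = bits >> 15                // 1 sign bit
let exponentField = (bits >> 10) & 0x1F // 5 exponent bits, biased by 15
let fractionField = bits & 0x3FF        // 10 fraction bits

// The manual masks agree with the standard library's accessors.
precondition(x.sign == .minus)
precondition(UInt(exponentField) == x.exponentBitPattern)
precondition(fractionField == x.significandBitPattern)

// Normal numbers: (-1)^s * (1 + f / 1024) * 2^(e - 15), evaluated in Double.
let magnitude = Double(sign: .plus,
                       exponent: Int(exponentField) - 15,
                       significand: 1.0 + Double(fractionField) / 1024.0)
let rebuilt = signBit == 1 ? -magnitude : magnitude

print("size:", byteCount, "bytes")
print("bits: 0x" + String(bits, radix: 16, uppercase: true))
print("sign:", signBit, "exponent:", exponentField, "fraction:", fractionField)
print("rebuilt:", rebuilt)
```

For -2.5 = -(1.25 × 2^1), the stored exponent is 1 + 15 = 16 and the fraction is 0.25 × 1024 = 256.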

Technical Specifications

Due to its constrained bit-width, Float16 has strict mathematical boundaries:

Maximum finite magnitude: 65504.0
Minimum positive normal magnitude: 2^-14 (approximately 0.000061035)
Decimal precision: Approximately 3.3 decimal digits.
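Each of these limits follows from the field widths. Take a value with sign bit s, stored exponent e, and fraction field f. Stored exponent 31 is reserved for infinity and NaN, so the largest finite value uses e = 30. Stored exponent 0 selects subnormals.

```latex
\begin{aligned}
v_{\text{normal}} &= (-1)^{s}\left(1 + \frac{f}{2^{10}}\right) 2^{\,e-15}, \quad 1 \le e \le 30 \\
v_{\text{subnormal}} &= (-1)^{s}\,\frac{f}{2^{10}}\, 2^{-14}, \quad e = 0 \\
\text{max finite} &= \left(2 - 2^{-10}\right) 2^{15} = 65536 - 32 = 65504 \quad (e = 30,\ f = 2^{10}-1) \\
\text{min normal} &= 2^{1-15} = 2^{-14} \approx 6.1035 \times 10^{-5} \quad (e = 1,\ f = 0) \\
\text{min subnormal} &= 2^{-10} \cdot 2^{-14} = 2^{-24} \approx 5.9605 \times 10^{-8} \quad (e = 0,\ f = 1) \\
\text{decimal digits} &\approx 11 \log_{10} 2 \approx 3.31
\end{aligned}
```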

let halfPrecision: Float16 = 3.14

// Inspecting IEEE 754 boundaries
let maxFinite = Float16.greatestFiniteMagnitude // 65504.0
let minNormal = Float16.leastNormalMagnitude    // 0.000061035156
let minNonzero = Float16.leastNonzeroMagnitude  // 0.000000059604645
let machineEpsilon = Float16.ulpOfOne           // 0.0009765625
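The constants above can be cross-checked by building the extreme values from their bit fields with `Float16(sign:exponentBitPattern:significandBitPattern:)`. The sketch also shows a practical cost of the 11-bit significand. Above 2048 the gap between adjacent Float16 values is 2, so not every integer is representable. The variable names are illustrative.

```swift
// Build the boundary values from raw fields (exponent field 31 is reserved for inf/NaN).
let largest = Float16(sign: .plus, exponentBitPattern: 30, significandBitPattern: 0x3FF)
let smallestNormal = Float16(sign: .plus, exponentBitPattern: 1, significandBitPattern: 0)
let smallestSubnormal = Float16(sign: .plus, exponentBitPattern: 0, significandBitPattern: 1)

precondition(largest == Float16.greatestFiniteMagnitude)         // 65504
precondition(smallestNormal == Float16.leastNormalMagnitude)     // 2^-14
precondition(smallestSubnormal == Float16.leastNonzeroMagnitude) // 2^-24
precondition(Float16(1).nextUp - 1 == Float16.ulpOfOne)          // 2^-10

// With 11 significant bits, integers are exact only up to 2^11 = 2048.
let n = 2049
let rounded = Float16(n)        // halfway between 2048 and 2050; ties-to-even picks 2048
let spacing = Float16(2048).ulp // gap to the next representable value above 2048

print(largest, smallestNormal, smallestSubnormal)
print("Float16(\(n)) =", rounded, "spacing at 2048 =", spacing)
```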

Type Conversion and Arithmetic

Swift enforces strict type safety and does not implicitly promote or demote floating-point types, so arithmetic that combines Float16 with Float (32-bit) or Double (64-bit) requires an explicit conversion. When converting from a higher-precision type to Float16, Swift rounds the value to the nearest representable Float16 value using the default IEEE 754 rounding mode (round to nearest, ties to even). Because of this rounding, overflow does not begin at 65504.0 itself. Source values with a magnitude of 65520.0 or greater (the midpoint between 65504.0 and 65536.0) convert to Float16.infinity, or to -Float16.infinity for negative values. Values between 65504.0 and 65520.0 round down to 65504.0.

let doubleValue: Double = 70000.5
let floatValue: Float = 3.14159265

// Explicit narrowing conversions (each rounds to the nearest Float16)
let halfFromDouble = Float16(doubleValue) // +infinity: 70000.5 is past the overflow threshold
let halfFromFloat = Float16(floatValue)   // 3.140625, printed as 3.14 (precision lost to rounding)

// Arithmetic requires matching types
let a: Float16 = 5.0
let b: Float = 10.0
// let result = a + b // Compiler error: binary operator '+' cannot be applied to operands of type 'Float16' and 'Float'
let result = a + Float16(b) 
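The rounding rules can be exercised with a few `Double` sources. Near the top of the range the spacing between Float16 values is 32, so 65520 is the midpoint between 65504 and 65536. Anything below that midpoint rounds down to 65504. The midpoint itself ties to the neighbor with an even significand, 65536, which is out of range and therefore becomes infinity. An ordinary tie lower in the range is included for comparison. The variable names are illustrative.

```swift
// Narrowing conversions near the top of the Float16 range (round to nearest, ties to even).
let justBelowMidpoint = Float16(65519.0 as Double) // rounds down to the largest finite value
let atMidpoint = Float16(65520.0 as Double)        // tie between 65504 and 65536: rounds to 65536, which overflows
let largeNegative = Float16(-70000.0 as Double)    // overflows toward negative infinity

// Ordinary ties also round to the neighbor with an even significand.
let tie = Float16(2051.0 as Double)                // halfway between 2050 and 2052 -> 2052

print(justBelowMidpoint, atMidpoint, largeNegative, tie)
```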

Hardware Architecture Dependency

The performance of Float16 depends on the underlying instruction set architecture (ISA).

ARM Architecture: On Apple Silicon (M-series) and on A11 Bionic or newer, Float16 arithmetic runs natively through the ARMv8.2-A FP16 extension. This extension provides half-precision add, multiply, and related instructions that operate directly on 16-bit values.
x86_64 Architecture: Most x86_64 processors lack native half-precision arithmetic; the F16C extension only converts between 16-bit and 32-bit formats. On x86_64 platforms where Swift provides Float16, such as Linux, the LLVM backend promotes each operand to a 32-bit Float, computes in 32 bits, and rounds the result back to 16 bits. These extra conversions add overhead compared with native execution. On Intel-based Macs, the Swift standard library marks Float16 as unavailable.
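The promote, compute, round strategy can be written out explicitly. This sketch uses two hypothetical helpers, `emulatedAdd` and `emulatedMultiply`, written only for illustration. Each widens both operands to `Float`, operates in 32 bits, and rounds once back to `Float16`. The sketch then compares these results with the built-in `Float16` operators over a sample of bit patterns. `Float` carries 24 significand bits, which meets the 2 × 11 + 2 = 24 bound from the known double-rounding result. Under that bound, this two-step rounding should match a single correctly rounded half-precision addition or multiplication, so no mismatches are expected on either architecture.

```swift
// Hypothetical helpers: promote to Float, compute, round back to Float16.
func emulatedAdd(_ a: Float16, _ b: Float16) -> Float16 {
    Float16(Float(a) + Float(b))
}

func emulatedMultiply(_ a: Float16, _ b: Float16) -> Float16 {
    Float16(Float(a) * Float(b))
}

// Results match if both are NaN or their bit patterns are identical (this also checks the sign of zero).
func sameResult(_ x: Float16, _ y: Float16) -> Bool {
    (x.isNaN && y.isNaN) || x.bitPattern == y.bitPattern
}

// Sample 256 bit patterns spread across the 16-bit space, skipping NaN inputs.
let samples = stride(from: 0, through: 0xFFFF, by: 257)
    .map { Float16(bitPattern: UInt16($0)) }
    .filter { !$0.isNaN }

var mismatches = 0
for a in samples {
    for b in samples {
        if !sameResult(a + b, emulatedAdd(a, b)) { mismatches += 1 }
        if !sameResult(a * b, emulatedMultiply(a, b)) { mismatches += 1 }
    }
}

#if arch(arm64)
print("arm64: a + b and a * b may use native FP16 instructions")
#elseif arch(x86_64)
print("x86_64: a + b and a * b are typically promoted to Float by the compiler")
#endif
print("samples:", samples.count, "mismatches:", mismatches)
```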
