The float keyword in C designates a single-precision floating-point scalar type used to represent real numbers with fractional components. The C standard does not mandate a specific binary representation; however, most modern compilers implement float using the IEEE 754 (IEC 60559) 32-bit binary format (binary32). An implementation that conforms to IEC 60559 as described in Annex F of the C standard defines the macro __STDC_IEC_559__ (as 1), so conformance can be checked at compile time with the preprocessor. If the macro is not defined, that does not by itself mean float uses a different format.

Technical Specifications (Assuming IEEE 754)

  • Memory Size: Typically 4 bytes (32 bits).
  • Precision: 6 to 7 significant decimal digits.
  • Value Range: Approximately $\pm 1.18 \times 10^{-38}$ to $\pm 3.4 \times 10^{38}$ for normalized values; subnormal values extend the lower end down to about $\pm 1.4 \times 10^{-45}$.
  • Header: Implementation-specific limits and properties are defined in <float.h>.

Syntax and Literals

By default, unsuffixed floating-point literals in C have type double. To write a float literal, append the f or F suffix. Omitting the suffix causes an implicit conversion (demotion) from double to float on assignment. This demotion can lose precision, because the value is rounded to a representable float (the nearest one under the default IEEE 754 round-to-nearest mode).
```c
float uninitialized_var;     // No initializer: zero at file scope, indeterminate as a local
float pi = 3.14159f;         // Standard decimal notation with 'f' suffix
float planck = 6.626e-34f;   // Scientific notation (e or E) with 'f' suffix

// Implicit conversion (demotion from double to float)
float implicit_conversion = 2.718; // 2.718 is a double literal
```

Memory Representation

When implemented as an IEEE 754 32-bit single-precision float, the memory is divided into three distinct bit fields:
  1. Sign Bit (1 bit): Bit 31. Determines if the number is positive (0) or negative (1).
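  2. Exponent (8 bits): Bits 30–23. Stored in offset binary with a bias of 127, so a stored exponent e represents the power $2^{e-127}$. The all-zeros and all-ones patterns are reserved for zero and subnormal values and for infinities and NaNs, respectively.
  3. Mantissa / Fraction (23 bits): Bits 22–0. Stores the fractional bits f of the significand. In normalized numbers, a leading 1 is implied and not stored, giving an effective 24 bits of precision; with sign bit s, the encoded value is $(-1)^{s} \times (1 + f / 2^{23}) \times 2^{e-127}$.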

Equality Comparison and Precision Limits

Due to the precision limits and rounding errors of floating-point arithmetic, comparing computed results with the == operator is unreliable: == tests for exact equality of values, and two mathematically equivalent calculations can round to slightly different results.

A fixed absolute tolerance (such as FLT_EPSILON) is also a well-known anti-pattern for values significantly larger or smaller than 1.0, because the distance between adjacent representable floats scales with their magnitude. FLT_EPSILON is exactly the gap between adjacent floats in [1.0, 2.0); from 2.0 upward the gap is at least 2 × FLT_EPSILON, so an absolute check against FLT_EPSILON degenerates into exact equality and rejects results that differ only by rounding. For values far below 1.0, the opposite happens: FLT_EPSILON is large enough that clearly different values compare as equal. Robust comparisons therefore check the relative difference, scaling the epsilon tolerance by the magnitude of the operands, while falling back to an absolute check for values near zero.
```c
#include <float.h>
#include <math.h>
#include <stdbool.h>
#include <stdio.h>

bool is_equal(float a, float b) {
    float diff = fabsf(a - b);

    // Absolute tolerance check for exact equality or values extremely close to zero
    if (diff <= FLT_EPSILON) {
        return true;
    }

    // Relative tolerance check for larger magnitudes
    float abs_a = fabsf(a);
    float abs_b = fabsf(b);
    float largest = (abs_a > abs_b) ? abs_a : abs_b;

    // Scale the epsilon tolerance by the largest operand magnitude
    return diff <= largest * FLT_EPSILON;
}

int main(void) {
    float sum = 0.0f;

    // 0.1 has no exact binary representation, so each addition rounds
    for (int i = 0; i < 10; i++) {
        sum += 0.1f;
    }

    // With IEEE 754 single precision and round-to-nearest, sum ends up
    // one step above 1.0f (about 1.00000012), so == reports false
    printf("sum == 1.0f:         %s\n", (sum == 1.0f) ? "true" : "false");
    printf("is_equal(sum, 1.0f): %s\n", is_equal(sum, 1.0f) ? "true" : "false");

    return 0;
}
```

Format Specifiers and Type Promotion

When a float is passed as one of the variable arguments of a variadic function such as printf, C applies the default argument promotions and converts it to double. Consequently, the %f, %e, and %g conversions in printf expect and consume a double (since C99, the l length modifier, as in %lf, is accepted and has no effect on them). scanf, by contrast, receives pointers, which are never promoted, so the pointer type must match the conversion exactly: %f requires a float*, while %lf requires a double*.
```c
#include <stdio.h>

int main(void) {
    // Using 12.5f, which has an exact binary representation,
    // to avoid demonstrating unintended precision loss artifacts.
    float val = 12.5f;

    // 'val' is implicitly promoted to 'double' via default argument promotion
    printf("%f\n", val);  // Decimal notation: 12.500000
    printf("%e\n", val);  // Scientific notation: 1.250000e+01
    printf("%g\n", val);  // %e or %f style, 6 significant digits, trailing zeros removed: 12.5

    // scanf requires a pointer to float; pointers are not promoted
    if (scanf("%f", &val) == 1) {
        printf("Read: %g\n", val);
    }

    return 0;
}
```

Standard Macros (<float.h>)

The C standard library provides macros in <float.h> that describe the implementation's limits for the float type:
```c
#include <float.h>

float max_val = FLT_MAX;         // Maximum finite representable value
float min_val = FLT_MIN;         // Minimum normalized positive value
float epsilon = FLT_EPSILON;     // Difference between 1.0f and the next representable value
int precision_digits = FLT_DIG;  // Decimal digits preserved by a decimal -> float -> decimal round trip
```