How Computers Store Floating-Point Numbers
Storage Format
A computer stores floating-point numbers using a standardized format called IEEE 754. This format is designed to represent real numbers in a way that balances range and precision. Here's how it works:
Basic Structure of IEEE 754 Floating-Point Numbers
A floating-point number in a computer is typically represented by three components:
- Sign bit (S): This determines whether the number is positive (0) or negative (1).
- Exponent (E): This stores the exponent value, which determines the range of the number (i.e., how large or small it can be).
- Mantissa (or Significand) (M): This holds the significant digits of the number, representing its precision.
The general formula for a floating-point number is:
\[
\text{Value} = (-1)^S \times M \times 2^{E - \text{bias}}
\]
Where:
- S is the sign bit (0 for positive, 1 for negative).
- M is the mantissa (or significand), typically in normalized form (starting with a leading 1).
- E is the stored exponent, adjusted by a bias.
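As a concrete illustration of this formula, here is a minimal Python sketch that rebuilds a value from hand-chosen single-precision fields; it assumes the 1/8/23 field widths and the bias of 127 described below, and the function name decode_float32_fields is only for this example.

```python
def decode_float32_fields(sign: int, exponent: int, mantissa: int) -> float:
    """Rebuild a normalized single-precision value from its raw bit fields.

    sign     : 1 bit  (0 = positive, 1 = negative)
    exponent : 8 bits, stored with a bias of 127
    mantissa : 23 bits of fraction; the leading 1 is implicit
    """
    significand = 1 + mantissa / 2**23              # restore the implicit leading 1
    return (-1) ** sign * significand * 2 ** (exponent - 127)

# -6.75 = -1.1011_2 x 2^2  ->  sign = 1, exponent = 2 + 127 = 129, mantissa = 1011 then 19 zeros
print(decode_float32_fields(1, 129, 0b1011 << 19))  # -6.75
```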
Common Floating-Point Formats
The two most common floating-point formats are single precision (32-bit) and double precision (64-bit).
1. Single Precision (32-bit Floating-Point):
- 1 bit for sign (S)
- 8 bits for exponent (E)
- 23 bits for mantissa (M)
A 32-bit floating-point number has the following layout:
| S |  E (8 bits)  |       M (23 bits)         |
- Range of exponent: The exponent is stored with a bias of 127, meaning the actual exponent is the stored value minus 127.
- Mantissa: The 23 bits store the fractional part. The number is assumed to begin with a leading 1. (the implicit leading 1), which is not stored explicitly. For example, a mantissa of 001 would be interpreted as 1.001.
2. Double Precision (64-bit Floating-Point):
- 1 bit for sign (S)
- 11 bits for exponent (E)
- 52 bits for mantissa (M)
A 64-bit floating-point number has the following layout:
| S |      E (11 bits)     |               M (52 bits)               |
- Range of exponent: The exponent is stored with a bias of 1023, meaning the actual exponent is the stored value minus 1023.
- Mantissa: The 52 bits store the fractional part, with an implicit leading 1.
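To see these layouts on real values, the Python sketch below (standard library only) reinterprets a float's bytes as an integer and slices out the sign, exponent, and mantissa fields for both formats; the helper name split_fields is only for this illustration.

```python
import struct

def split_fields(value: float, fmt: str):
    """Split an IEEE 754 value into (sign, exponent, mantissa) bit fields.

    fmt is "f" for the 32-bit (1/8/23) layout or "d" for the 64-bit (1/11/52) layout.
    """
    if fmt == "f":
        bits = struct.unpack(">I", struct.pack(">f", value))[0]
        exp_bits, man_bits = 8, 23
    else:
        bits = struct.unpack(">Q", struct.pack(">d", value))[0]
        exp_bits, man_bits = 11, 52
    sign = bits >> (exp_bits + man_bits)
    exponent = (bits >> man_bits) & ((1 << exp_bits) - 1)
    mantissa = bits & ((1 << man_bits) - 1)
    return sign, exponent, mantissa

print(split_fields(-6.75, "f"))  # (1, 129, 5767168)           stored exponent 129 -> 129 - 127 = 2
print(split_fields(-6.75, "d"))  # (1, 1025, 3096224743817216) stored exponent 1025 -> 1025 - 1023 = 2
```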
Example of Single-Precision Float Representation
Suppose we want to store the number -6.75 as a 32-bit float:
1. Convert to binary: 6.75 in decimal is 110.11 in binary (6 = 110 and .75 = .11 in binary), so -6.75 is -110.11.
2. Normalize the number: In scientific notation, this is \( -1.1011 \times 2^2 \). This shows that the sign bit is 1, the exponent is 2, and the mantissa is 1.1011.
3. Set the components:
  - Sign bit: 1 (since the number is negative)
  - Exponent: 2 + 127 = 129 in decimal, which is 10000001 in binary.
  - Mantissa: The leading 1. is implicit, so we only store 1011, padded to 23 bits: 10110000000000000000000.
Thus, the 32-bit representation of -6.75 is:
1 10000001 10110000000000000000000
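A quick way to check this bit pattern is to pack -6.75 with Python's struct module and print the raw bits, as in this short sketch:

```python
import struct

# Reinterpret the 4 bytes of the float32 encoding of -6.75 as an unsigned integer.
bits = struct.unpack(">I", struct.pack(">f", -6.75))[0]
s = f"{bits:032b}"
print(s[0], s[1:9], s[9:])  # 1 10000001 10110000000000000000000
```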
Precision and Limitations
- Precision: The more bits in the mantissa, the more precise the number. Single-precision floats are accurate to about 7 decimal digits, while double-precision floats are accurate to about 15-16 decimal digits.
- Range: The exponent allows floating-point numbers to represent a vast range, from very small numbers (close to zero) to very large ones.
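The practical effect of the 23-bit versus 52-bit mantissa shows up when a value is round-tripped through single precision; a small standard-library sketch (printed digits are approximate):

```python
import struct

x = 0.1
# Round 0.1 to the nearest single-precision value, then read it back as a double.
as_float32 = struct.unpack(">f", struct.pack(">f", x))[0]

print(f"double:  {x:.20f}")           # ~0.10000000000000000555  (agrees with 0.1 to ~16 digits)
print(f"float32: {as_float32:.20f}")  # ~0.10000000149011611938  (agrees with 0.1 to ~7 digits)
```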
Special Values
IEEE 754 also defines special cases:
- Zero: Represented by all bits in the exponent and mantissa being zero.
- Infinity: Represented by all bits in the exponent being 1, and the mantissa being all 0.
- NaN (Not a Number): Represented by all bits in the exponent being 1, and the mantissa containing non-zero bits.
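These encodings can be inspected directly; a small sketch using only the standard library (the helper name fields32 is just for this example):

```python
import math
import struct

def fields32(value: float):
    """Return the (sign, exponent, mantissa) fields of a value encoded as float32."""
    bits = struct.unpack(">I", struct.pack(">f", value))[0]
    return bits >> 31, (bits >> 23) & 0xFF, bits & 0x7FFFFF

print(fields32(0.0))       # (0, 0, 0)            exponent and mantissa all zero
print(fields32(math.inf))  # (0, 255, 0)          exponent all ones, mantissa zero
print(fields32(math.nan))  # (0, 255, <non-zero>) exponent all ones, mantissa non-zero
```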
Summary
- Floating-point numbers are stored in three parts: sign, exponent, and mantissa.
- Single precision uses 32 bits, while double precision uses 64 bits.
- The IEEE 754 standard defines how these components are laid out and how the numbers are calculated, enabling computers to store a wide range of real numbers with a trade-off between precision and range.
How the Bias Is Chosen
The concept of bias in the exponent of floating-point numbers (as per the IEEE 754 standard) is used to store both positive and negative exponents without needing an explicit sign bit for the exponent. The bias enables the use of unsigned integers to represent the exponent, which simplifies comparisons and arithmetic operations on floating-point numbers.
Why Do We Use Bias?
In the IEEE 754 format, the exponent is stored as an unsigned integer, but the actual exponent needs to represent both positive and negative values. For example, in a base-2 floating-point number like \( 1.25 \times 2^3 \) or \( 1.25 \times 2^{-2} \), the exponent can be positive or negative. The bias allows the system to shift the range of exponents into the realm of positive integers, enabling the use of simple unsigned integers to represent both negative and positive exponents.
For example, if we want to store the exponent \( E = -2 \), we can't store a negative number directly in an unsigned field. Instead, we bias the exponent by a constant so that both negative and positive exponents are represented as positive integers.
What Is the Bias in IEEE 754?
- For single precision (32-bit): The exponent field is 8 bits wide, so it can store values from 0 to 255. The bias for single precision is 127. This means:
  \[
  \text{Exponent (E)} = \text{Stored Value} - 127
  \]
  - A stored value of 127 represents an actual exponent of 0 (i.e., \( E = 127 - 127 = 0 \)).
  - A stored value of 128 represents an actual exponent of 1 (i.e., \( E = 128 - 127 = 1 \)).
  - A stored value of 126 represents an actual exponent of -1 (i.e., \( E = 126 - 127 = -1 \)).
- For double precision (64-bit): The exponent field is 11 bits wide, and the bias is 1023. This means:
  \[
  \text{Exponent (E)} = \text{Stored Value} - 1023
  \]
  - A stored value of 1023 represents an actual exponent of 0.
  - A stored value of 1024 represents an actual exponent of 1.
  - A stored value of 1022 represents an actual exponent of -1.
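Both directions of this conversion are just an addition or a subtraction of the bias. Below is a minimal Python sketch; the helper names to_stored and to_actual are only for this illustration.

```python
def to_stored(actual_exponent: int, bias: int = 127) -> int:
    """Bias an actual exponent so it fits in the unsigned exponent field."""
    return actual_exponent + bias

def to_actual(stored_value: int, bias: int = 127) -> int:
    """Recover the actual exponent from the stored (biased) value."""
    return stored_value - bias

print(to_stored(-2))          # 125: a negative exponent stored as a positive integer
print(to_actual(129))         # 2
print(to_actual(1024, 1023))  # 1  (double precision uses a bias of 1023)
```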
 
Why Is the Bias 127 in Single Precision?
The bias is typically chosen to be halfway through the range of representable exponents, which centers the range of exponents at zero. The range of the unsigned 8-bit exponent is 0 to 255, so the bias is chosen as:
\[
\text{Bias} = \frac{2^8}{2} - 1 = 127
\]
This gives us an exponent range from:
- Minimum exponent: \( 0 - 127 = -127 \)
- Maximum exponent: \( 255 - 127 = 128 \)
Thus, using a bias of 127 lets the stored exponent cover the range \(-127\) to \(+128\), allowing both very small and very large numbers to be represented. (In practice, the stored values 0 and 255 are reserved for zero/denormals and for infinity/NaN, so normalized numbers use actual exponents from \(-126\) to \(+127\).)
Example of Exponent with Bias
Consider the floating-point number 6.75 represented in IEEE 754 single precision:
- Convert 6.75 to binary: \( 6.75 = 110.11_2 = 1.1011_2 \times 2^2 \).
- The actual exponent \( E \) is 2.
- Store the biased exponent: \( E + 127 = 2 + 127 = 129 \), which is \( 10000001_2 \).
Thus, the biased exponent is stored as 129 in the exponent field (in binary: 10000001).
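This can be confirmed by pulling the 8 exponent bits out of the float32 encoding of 6.75, as in this short standard-library sketch:

```python
import struct

bits = struct.unpack(">I", struct.pack(">f", 6.75))[0]
stored_exponent = (bits >> 23) & 0xFF
print(stored_exponent, f"{stored_exponent:08b}")  # 129 10000001
print(stored_exponent - 127)                      # 2, the actual exponent
```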
Summary of Why Bias Is Used:
- Simplifies exponent storage: Using bias allows the exponent to be stored as an unsigned integer rather than a signed integer, making hardware implementations of arithmetic operations simpler.
- Centers the exponent range: It allows exponents to have both positive and negative values by "shifting" the range so that zero is in the middle of the possible exponent values.
- Ensures consistency: The use of bias ensures that comparisons between floating-point numbers are straightforward, as all exponents are non-negative after the bias is added.
The choice of 127 as the bias in single precision is a result of the 8-bit width of the exponent, centering the exponent range around zero.
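One practical payoff of the non-negative biased exponent (the "ensures consistency" point above) is that, for non-negative floats, comparing the raw bit patterns as unsigned integers gives the same ordering as comparing the values themselves; a small illustrative sketch:

```python
import struct

def raw_bits(x: float) -> int:
    """The float32 encoding of x, viewed as an unsigned 32-bit integer."""
    return struct.unpack(">I", struct.pack(">f", x))[0]

values = [0.5, 1.0, 1.5, 2.0, 100.0]
# For non-negative floats, sorting by value and sorting by bit pattern agree.
print(sorted(values) == sorted(values, key=raw_bits))  # True
```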
In essence, this is similar to the design idea behind two's complement.