How Computers Store Floating-Point Numbers
Storage Format
A computer stores floating-point numbers using a standardized format called IEEE 754. This format is designed to represent real numbers in a way that balances range and precision. Here's how it works:
Basic Structure of IEEE 754 Floating-Point Numbers
A floating-point number in a computer is typically represented by three components:
- Sign bit (S): This determines whether the number is positive (0) or negative (1).
- Exponent (E): This stores the exponent value, which determines the range of the number (i.e., how large or small it can be).
- Mantissa (or Significand) (M): This holds the significant digits of the number, representing its precision.
The general formula for a floating-point number is:
\[
\text{Value} = (-1)^S \times M \times 2^{E - \text{bias}}
\]
Where:
- S is the sign bit (0 for positive, 1 for negative).
- M is the mantissa (or significand), typically in normalized form (starting with a leading 1).
- E is the stored exponent, adjusted by a bias.
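As a concrete illustration of this formula, here is a minimal Python sketch that rebuilds a value from hand-chosen single-precision fields; it assumes the 1/8/23 field widths and the bias of 127 described below, and the function name decode_float32_fields is only for this example.

```python
def decode_float32_fields(sign: int, exponent: int, mantissa: int) -> float:
    """Rebuild a normalized single-precision value from its raw bit fields.

    sign     : 1 bit  (0 = positive, 1 = negative)
    exponent : 8 bits, stored with a bias of 127
    mantissa : 23 bits of fraction; the leading 1 is implicit
    """
    significand = 1 + mantissa / 2**23              # restore the implicit leading 1
    return (-1) ** sign * significand * 2 ** (exponent - 127)

# -6.75 = -1.1011_2 x 2^2  ->  sign = 1, exponent = 2 + 127 = 129, mantissa = 1011 then 19 zeros
print(decode_float32_fields(1, 129, 0b1011 << 19))  # -6.75
```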
Common Floating-Point Formats
The two most common floating-point formats are single precision (32-bit) and double precision (64-bit).
1. Single Precision (32-bit Floating-Point):
- 1 bit for sign (S)
- 8 bits for exponent (E)
- 23 bits for mantissa (M)
A 32-bit floating-point number has the following layout:
| S |  E (8 bits)  |       M (23 bits)         |
- Range of exponent: The exponent is stored with a bias of 127, meaning the actual exponent is the stored value minus 127.
- Mantissa: The 23 bits store the fractional part. The number is assumed to begin with a leading 1. (the implicit leading 1), which is not stored explicitly. For example, a mantissa of 001 would be interpreted as 1.001.
2. Double Precision (64-bit Floating-Point):
- 1 bit for sign (S)
- 11 bits for exponent (E)
- 52 bits for mantissa (M)
A 64-bit floating-point number has the following layout:
| S |      E (11 bits)     |               M (52 bits)               |
- Range of exponent: The exponent is stored with a bias of 1023, meaning the actual exponent is the stored value minus 1023.
- Mantissa: The 52 bits store the fractional part, with an implicit leading 1.
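To see these layouts on real values, the Python sketch below (standard library only) reinterprets a float's bytes as an integer and slices out the sign, exponent, and mantissa fields for both formats; the helper name split_fields is only for this illustration.

```python
import struct

def split_fields(value: float, fmt: str):
    """Split an IEEE 754 value into (sign, exponent, mantissa) bit fields.

    fmt is "f" for the 32-bit (1/8/23) layout or "d" for the 64-bit (1/11/52) layout.
    """
    if fmt == "f":
        bits = struct.unpack(">I", struct.pack(">f", value))[0]
        exp_bits, man_bits = 8, 23
    else:
        bits = struct.unpack(">Q", struct.pack(">d", value))[0]
        exp_bits, man_bits = 11, 52
    sign = bits >> (exp_bits + man_bits)
    exponent = (bits >> man_bits) & ((1 << exp_bits) - 1)
    mantissa = bits & ((1 << man_bits) - 1)
    return sign, exponent, mantissa

print(split_fields(-6.75, "f"))  # (1, 129, 5767168)           stored exponent 129 -> 129 - 127 = 2
print(split_fields(-6.75, "d"))  # (1, 1025, 3096224743817216) stored exponent 1025 -> 1025 - 1023 = 2
```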
Example of Single-Precision Float Representation
Suppose we want to store the number -6.75 as a 32-bit float:
1. Convert to binary: 6.75 in decimal is 110.11 in binary (6 = 110 and .75 = .11 in binary), so -6.75 is -110.11.
2. Normalize the number: In scientific notation, this is \( -1.1011 \times 2^2 \). This shows that the sign bit is 1, the exponent is 2, and the mantissa is 1.1011.
3. Set the components:
  - Sign bit: 1 (since the number is negative)
  - Exponent: 2 + 127 = 129 in decimal, which is 10000001 in binary.
  - Mantissa: The leading 1. is implicit, so we only store 1011, padded to 23 bits: 10110000000000000000000.
Thus, the 32-bit representation of -6.75 is:
1 10000001 10110000000000000000000
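A quick way to check this bit pattern is to pack -6.75 with Python's struct module and print the raw bits, as in this short sketch:

```python
import struct

# Reinterpret the 4 bytes of the float32 encoding of -6.75 as an unsigned integer.
bits = struct.unpack(">I", struct.pack(">f", -6.75))[0]
s = f"{bits:032b}"
print(s[0], s[1:9], s[9:])  # 1 10000001 10110000000000000000000
```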
Precision and Limitations
- Precision: The more bits in the mantissa, the more precise the number. Single-precision floats are accurate to about 7 decimal digits, while double-precision floats are accurate to about 15-16 decimal digits.
- Range: The exponent allows floating-point numbers to represent a vast range, from very small numbers (close to zero) to very large ones.
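The practical effect of the 23-bit versus 52-bit mantissa shows up when a value is round-tripped through single precision; a small standard-library sketch (printed digits are approximate):

```python
import struct

x = 0.1
# Round 0.1 to the nearest single-precision value, then read it back as a double.
as_float32 = struct.unpack(">f", struct.pack(">f", x))[0]

print(f"double:  {x:.20f}")           # ~0.10000000000000000555  (agrees with 0.1 to ~16 digits)
print(f"float32: {as_float32:.20f}")  # ~0.10000000149011611938  (agrees with 0.1 to ~7 digits)
```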
Special Values
IEEE 754 also defines special cases:
- Zero: Represented by all bits in the exponent and mantissa being zero.
- Infinity: Represented by all bits in the exponent being 1, and the mantissa being all 0.
- NaN (Not a Number): Represented by all bits in the exponent being 1, and the mantissa containing non-zero bits.
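These encodings can be inspected directly; a small sketch using only the standard library (the helper name fields32 is just for this example):

```python
import math
import struct

def fields32(value: float):
    """Return the (sign, exponent, mantissa) fields of a value encoded as float32."""
    bits = struct.unpack(">I", struct.pack(">f", value))[0]
    return bits >> 31, (bits >> 23) & 0xFF, bits & 0x7FFFFF

print(fields32(0.0))       # (0, 0, 0)            exponent and mantissa all zero
print(fields32(math.inf))  # (0, 255, 0)          exponent all ones, mantissa zero
print(fields32(math.nan))  # (0, 255, <non-zero>) exponent all ones, mantissa non-zero
```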
Summary
- Floating-point numbers are stored in three parts: sign, exponent, and mantissa.
- Single precision uses 32 bits, while double precision uses 64 bits.
- The IEEE 754 standard defines how these components are laid out and how the numbers are calculated, enabling computers to store a wide range of real numbers with a trade-off between precision and range.
How the Bias Is Chosen
The concept of bias in the exponent of floating-point numbers (as per the IEEE 754 standard) is used to store both positive and negative exponents without needing an explicit sign bit for the exponent. The bias enables the use of unsigned integers to represent the exponent, which simplifies comparisons and arithmetic operations on floating-point numbers.
Why Do We Use Bias?
In the IEEE 754 format, the exponent is stored as an unsigned integer, but the actual exponent needs to represent both positive and negative values. For example, in a base-2 floating-point number like \( 1.25 \times 2^3 \) or \( 1.25 \times 2^{-2} \), the exponent can be positive or negative. The bias allows the system to shift the range of exponents into the realm of positive integers, enabling the use of simple unsigned integers to represent both negative and positive exponents.
For example, if we want to store the exponent \( E = -2 \), we can't store a negative number directly in an unsigned field. Instead, we bias the exponent by a constant so that both negative and positive exponents are represented as positive integers.
What Is the Bias in IEEE 754?
- For single precision (32-bit): The exponent field is 8 bits wide, so it can store values from 0 to 255. The bias for single precision is 127. This means:
  \[
  \text{Exponent (E)} = \text{Stored Value} - 127
  \]
  - A stored value of 127 represents an actual exponent of 0 (i.e., \( E = 127 - 127 = 0 \)).
  - A stored value of 128 represents an actual exponent of 1 (i.e., \( E = 128 - 127 = 1 \)).
  - A stored value of 126 represents an actual exponent of -1 (i.e., \( E = 126 - 127 = -1 \)).
- For double precision (64-bit): The exponent field is 11 bits wide, and the bias is 1023. This means:
  \[
  \text{Exponent (E)} = \text{Stored Value} - 1023
  \]
  - A stored value of 1023 represents an actual exponent of 0.
  - A stored value of 1024 represents an actual exponent of 1.
  - A stored value of 1022 represents an actual exponent of -1.
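Both directions of this conversion are just an addition or a subtraction of the bias. Below is a minimal Python sketch; the helper names to_stored and to_actual are only for this illustration.

```python
def to_stored(actual_exponent: int, bias: int = 127) -> int:
    """Bias an actual exponent so it fits in the unsigned exponent field."""
    return actual_exponent + bias

def to_actual(stored_value: int, bias: int = 127) -> int:
    """Recover the actual exponent from the stored (biased) value."""
    return stored_value - bias

print(to_stored(-2))          # 125: a negative exponent stored as a positive integer
print(to_actual(129))         # 2
print(to_actual(1024, 1023))  # 1  (double precision uses a bias of 1023)
```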
 
Why Is the Bias 127 in Single Precision?
The bias is typically chosen to be halfway through the range of representable exponents, which centers the range of exponents at zero. The range of the unsigned 8-bit exponent is 0 to 255, so the bias is chosen as:
\[
\text{Bias} = \frac{2^8}{2} - 1 = 127
\]
This gives us an exponent range from:
- Minimum exponent: \( 0 - 127 = -127 \)
- Maximum exponent: \( 255 - 127 = 128 \)
Thus, using a bias of 127 lets the stored exponent cover the range \(-127\) to \(+128\), allowing both very small and very large numbers to be represented. (In practice, the stored values 0 and 255 are reserved for zero/denormals and for infinity/NaN, so normalized numbers use actual exponents from \(-126\) to \(+127\).)
Example of Exponent with Bias
Consider the floating-point number 6.75 represented in IEEE 754 single precision:
- Convert 6.75 to binary: \( 6.75 = 110.11_2 = 1.1011_2 \times 2^2 \).
- The actual exponent \( E \) is 2.
- Store the biased exponent: \( E + 127 = 2 + 127 = 129 \), which is \( 10000001_2 \).
Thus, the biased exponent is stored as 129 in the exponent field (in binary: 10000001).
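This can be confirmed by pulling the 8 exponent bits out of the float32 encoding of 6.75, as in this short standard-library sketch:

```python
import struct

bits = struct.unpack(">I", struct.pack(">f", 6.75))[0]
stored_exponent = (bits >> 23) & 0xFF
print(stored_exponent, f"{stored_exponent:08b}")  # 129 10000001
print(stored_exponent - 127)                      # 2, the actual exponent
```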
Summary of Why Bias Is Used:
- Simplifies exponent storage: Using bias allows the exponent to be stored as an unsigned integer rather than a signed integer, making hardware implementations of arithmetic operations simpler.
- Centers the exponent range: It allows exponents to have both positive and negative values by "shifting" the range so that zero is in the middle of the possible exponent values.
- Ensures consistency: The use of bias ensures that comparisons between floating-point numbers are straightforward, as all exponents are non-negative after the bias is added.
The choice of 127 as the bias in single precision is a result of the 8-bit width of the exponent, centering the exponent range around zero.
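One practical payoff of the non-negative biased exponent (the "ensures consistency" point above) is that, for non-negative floats, comparing the raw bit patterns as unsigned integers gives the same ordering as comparing the values themselves; a small illustrative sketch:

```python
import struct

def raw_bits(x: float) -> int:
    """The float32 encoding of x, viewed as an unsigned 32-bit integer."""
    return struct.unpack(">I", struct.pack(">f", x))[0]

values = [0.5, 1.0, 1.5, 2.0, 100.0]
# For non-negative floats, sorting by value and sorting by bit pattern agree.
print(sorted(values) == sorted(values, key=raw_bits))  # True
```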
In essence, this is similar to the design idea behind two's complement.