This document makes use of structure definitions that look very much like C structures. It is important to note that the resemblance is only superficial: the data saved in a SWF file are very specific and do not follow the default, inflexible (as in static) layout of C definitions.

The following pages define the basic types used in this document. The comments explain in more detail how each type is used.
Note that except for bit fields, all types start on a byte boundary. Nothing will be aligned on more than one byte.
The original document by Steve Hollasch can be found at http://steve.hollasch.net/cgindex/coding/ieeefloat.html
IEEE Standard 754 floating point is the most common representation today for real numbers on computers, including Intel-based PC's, Macintoshes, and most Unix platforms. This article gives a brief overview of IEEE floating point and its representation. Discussion of arithmetic implementation may be found in the book mentioned at the bottom of this article.
There are several ways to represent real numbers on computers. Fixed point places a radix point somewhere in the middle of the digits, and is equivalent to using integers that represent portions of some unit. For example, one might represent 1/100ths of a unit; if you have four decimal digits, you could represent 10.82, or 00.01. Another approach is to use rationals, and represent every number as the ratio of two integers.
Floating-point representation - the most common solution - basically represents reals in scientific notation. Scientific notation represents numbers as a base number and an exponent. For example, 123.456 could be represented as 1.23456 × 10^2. In hexadecimal, the number 123.abc might be represented as 1.23abc × 16^2.
Floating-point solves a number of representation problems. Fixed-point has a fixed window of representation, which limits it from representing very large or very small numbers. Also, fixed-point is prone to a loss of precision when two large numbers are divided.
Floating-point, on the other hand, employs a sort of "sliding window" of precision appropriate to the scale of the number. This allows it to represent numbers from 1,000,000,000,000 to 0.0000000000000001 with ease.
IEEE floating point numbers have three basic components: the sign, the exponent, and the mantissa. The mantissa is composed of the fraction and an implicit leading digit (explained below). The exponent base (2) is implicit and need not be stored.
The following figure shows the layout for single (32-bit) and double (64-bit) precision floating-point values. The number of bits for each field are shown (bit ranges are in square brackets):
| | Sign | Exponent | Fraction | Bias |
|---|---|---|---|---|
| Single Precision | 1 [31] | 8 [30-23] | 23 [22-00] | 127 |
| Double Precision | 1 [63] | 11 [62-52] | 52 [51-00] | 1023 |
The sign bit is as simple as it gets. 0 denotes a positive number; 1 denotes a negative number. Flipping the value of this bit flips the sign of the number.
The exponent field needs to represent both positive and negative exponents. To do this, a bias is added to the actual exponent in order to get the stored exponent. For IEEE single-precision floats, this value is 127. Thus, an exponent of zero means that 127 is stored in the exponent field. A stored value of 200 indicates an exponent of (200-127), or 73. For reasons discussed later, exponents of -127 (all 0s) and +128 (all 1s) are reserved for special numbers.
For double precision, the exponent field is 11 bits, and has a bias of 1023.
The mantissa, also known as the significand, represents the precision bits of the number. It is composed of an implicit leading bit and the fraction bits.
To find out the value of the implicit leading bit, consider that any number can be expressed in scientific notation in many different ways. For example, the number five can be represented as any of these:
5.00 × 10^0, 0.05 × 10^2, or 5000 × 10^-3

In order to maximize the quantity of representable numbers, floating-point numbers are typically stored in normalized form. This basically puts the radix point after the first non-zero digit. In normalized form, five is represented as 5.0 × 10^0.
A nice little optimization is available to us in base two, since the only possible non-zero digit is 1. Thus, we can just assume a leading digit of 1, and don't need to represent it explicitly. As a result, the mantissa has effectively 24 bits of resolution, by way of 23 fraction bits.
So, to sum up:

- The sign bit is 0 for a positive number, 1 for a negative number.
- The exponent's base is two.
- The exponent field contains the bias plus the true exponent (127 for single precision, 1023 for double precision).
- The first bit of the mantissa is the implicit leading 1, so the stored fraction f represents the mantissa 1.f.
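As a concrete illustration, the following sketch (my own, not part of the original article) extracts the three fields of a single-precision value from its raw bit pattern; memcpy() is used to reinterpret the 32 bits without breaking aliasing rules.

```c
#include <stdio.h>
#include <stdint.h>
#include <string.h>

/* Print the three fields of a single-precision float:
 * sign [31], biased exponent [30-23] and fraction [22-00]. */
static void decode_single(float value)
{
    uint32_t bits;
    memcpy(&bits, &value, sizeof(bits));      /* reinterpret the 32 bits */

    uint32_t sign     = bits >> 31;
    uint32_t exponent = (bits >> 23) & 0xFF;  /* stored (biased) exponent */
    uint32_t fraction = bits & 0x7FFFFF;      /* 23 fraction bits */

    printf("%g -> sign=%u, stored exponent=%u (true exponent %d), fraction=0x%06X\n",
           value, (unsigned)sign, (unsigned)exponent,
           (int)exponent - 127, (unsigned)fraction);
}

int main(void)
{
    decode_single(5.0f);    /* 5.0 = 1.01b x 2^2: stored exponent 129, fraction 0x200000 */
    decode_single(-0.75f);  /* -0.75 = -1.1b x 2^-1: stored exponent 126, fraction 0x400000 */
    return 0;
}
```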
Let's consider single-precision floats for a second. Note that we're taking essentially a 32-bit number and re-jiggering the fields to cover a much broader range. Something has to give, and it's precision. For example, regular 32-bit integers, with all precision centered around zero, can precisely store integers with 32 bits of resolution. Single-precision floating-point, on the other hand, is unable to match this resolution with its 24 bits. It does, however, approximate this value by effectively truncating from the lower end. For example:
```
  11110000 11001100 10101010 00001111  // 32-bit integer
= +1.1110000 11001100 10101010 × 2^31  // Single-Precision Float
= 11110000 11001100 10101010 00000000  // Corresponding Value
```
This approximates the 32-bit value, but doesn't yield an exact representation. On the other hand, besides the ability to represent fractional components (which integers lack completely), the floating-point value can represent numbers around 2^127, compared to the 32-bit integer's maximum value of around 2^32.
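This loss of precision is easy to observe in practice; the small test below (my own, not part of the original article) pushes a 32-bit integer through a float and back:

```c
#include <stdio.h>

int main(void)
{
    /* 2^24 + 1 = 16777217 cannot be represented with a 24-bit mantissa */
    unsigned int big = 16777217u;
    float approx = (float)big;      /* rounds to 16777216.0f */

    printf("%u -> %.1f\n", big, approx);                            /* 16777217 -> 16777216.0 */
    printf("round trip equal: %d\n", (unsigned int)approx == big);  /* 0 (false) */
    return 0;
}
```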
The range of positive floating point numbers can be split into normalized numbers (which preserve the full precision of the mantissa), and denormalized numbers (discussed later) which use only a portion of the fraction's precision.
| | Denormalized | Normalized | Approximate Decimal |
|---|---|---|---|
| Single Precision | ± 2^-149 to (1-2^-23) × 2^-126 | ± 2^-126 to (2-2^-23) × 2^127 | ± ~10^-44.85 to ~10^38.53 |
| Double Precision | ± 2^-1074 to (1-2^-52) × 2^-1022 | ± 2^-1022 to (2-2^-52) × 2^1023 | ± ~10^-323.3 to ~10^308.3 |
Since the sign of floating point numbers is given by a special leading bit, the range for negative numbers is given by the negation of the above values.
There are five distinct numerical ranges that single-precision floating-point numbers are not able to represent:

1. Negative numbers less than -(2-2^-23) × 2^127 (negative overflow)
2. Negative numbers greater than -2^-149 (negative underflow)
3. Zero
4. Positive numbers less than 2^-149 (positive underflow)
5. Positive numbers greater than (2-2^-23) × 2^127 (positive overflow)
Overflow means that values have grown too large for the representation, much in the same way that you can overflow integers. Underflow is a less serious problem because it just denotes a loss of precision, which is guaranteed to be closely approximated by zero.
Here's a table of the effective range (excluding infinite values) of IEEE floating-point numbers:
| | Binary | Decimal |
|---|---|---|
| Single | ± (2-2^-23) × 2^127 | ~ ± 10^38.53 |
| Double | ± (2-2^-52) × 2^1023 | ~ ± 10^308.25 |
Note that the extreme values occur (regardless of sign) when the exponent is at the maximum value for finite numbers (2^127 for single-precision, 2^1023 for double), and the mantissa is filled with 1s (including the normalizing 1 bit).
IEEE reserves exponent field values of all 0s and all 1s to denote special values in the floating-point scheme.
As mentioned above, zero is not directly representable in the straight format, due to the assumption of a leading 1 (we'd need to specify a true zero mantissa to yield a value of zero). Zero is a special value denoted with an exponent field of zero and a fraction field of zero. Note that -0 and +0 are distinct values, though they both compare as equal.
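The distinction between the two zeroes can be observed with a couple of lines of C (my own illustration; signbit() is the standard C99 macro that inspects the sign bit):

```c
#include <stdio.h>
#include <math.h>

int main(void)
{
    double pz = +0.0;
    double nz = -0.0;

    printf("equal: %d\n", pz == nz);              /* 1: +0 and -0 compare equal */
    printf("sign of +0: %d\n", signbit(pz) != 0); /* 0: sign bit clear */
    printf("sign of -0: %d\n", signbit(nz) != 0); /* 1: sign bit set */
    return 0;
}
```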
If the exponent is all 0s, but the fraction is non-zero (else it would be interpreted as zero), then the value is a denormalized number, which does not have an assumed leading 1 before the binary point. Thus, this represents a number of the form 0.f × 2^(-b+1), where f is the fraction and b is the exponent bias (i.e. 0.f × 2^-126 for single precision and 0.f × 2^-1022 for double precision).
The values +infinity and -infinity are denoted with an exponent of all 1s and a fraction of all 0s. The sign bit distinguishes between negative infinity and positive infinity. Being able to denote infinity as a specific value is useful because it allows operations to continue past overflow situations. Operations with infinite values are well defined in IEEE floating point.
The value NaN (Not a Number) is used to represent a value that does not represent a real number. NaN's are represented by a bit pattern with an exponent of all 1s and a non-zero fraction. There are two categories of NaN: QNaN (Quiet NaN) and SNaN (Signalling NaN).
A QNaN is a NaN with the most significant fraction bit set. QNaN's propagate freely through most arithmetic operations. These values pop out of an operation when the result is not mathematically defined.
An SNaN is a NaN with the most significant fraction bit clear. It is used to signal an exception when used in operations. SNaN's can be handy to assign to uninitialized variables to trap premature usage.
Semantically, QNaN's denote indeterminate operations, while SNaN's denote invalid operations.
Operations on special numbers are well-defined by IEEE. In the simplest case, any operation with a NaN yields a NaN result. Other operations are as follows:
Operation | Result |
---|---|
n ÷ ±Infinity | 0 |
±Infinity × ±Infinity | ±Infinity |
±nonzero ÷ 0 | ±Infinity |
Infinity + Infinity | Infinity |
±0 ÷ ±0 | NaN |
Infinity - Infinity | NaN |
±Infinity ÷ ±Infinity | NaN |
±Infinity × 0 | NaN |
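The following small program (my own test, not part of the original article) exercises a few rows of this table; it assumes IEEE 754 arithmetic, which virtually all current C compilers provide:

```c
#include <stdio.h>
#include <math.h>

int main(void)
{
    double zero = 0.0;
    double inf  = 1.0 / zero;       /* nonzero / 0 = Infinity */
    double qnan = zero / zero;      /* 0 / 0 = NaN */

    printf("1/0       = %f\n", inf);          /* inf */
    printf("5/inf     = %f\n", 5.0 / inf);    /* 0.000000 */
    printf("inf-inf   = %f\n", inf - inf);    /* nan (or -nan) */
    printf("inf*0     = %f\n", inf * zero);   /* nan (or -nan) */
    printf("isnan     = %d\n", isnan(qnan));  /* 1 */
    printf("nan==nan  = %d\n", qnan == qnan); /* 0: a NaN never compares equal */
    return 0;
}
```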
To sum up, the following are the corresponding values for a given representation:
| Sign | Exponent (e) | Fraction (f) | Value |
|---|---|---|---|
| 0 | 00..00 | 00..00 | +0 |
| 0 | 00..00 | 00..01 to 11..11 | Positive Denormalized Real (0.f × 2^(-b+1)) |
| 0 | 00..01 to 11..10 | XX..XX | Positive Normalized Real (1.f × 2^(e-b)) |
| 0 | 11..11 | 00..00 | +Infinity |
| 0 | 11..11 | 00..01 to 01..11 | SNaN |
| 0 | 11..11 | 10..00 to 11..11 | QNaN |
| 1 | 00..00 | 00..00 | -0 |
| 1 | 00..00 | 00..01 to 11..11 | Negative Denormalized Real (-0.f × 2^(-b+1)) |
| 1 | 00..01 to 11..10 | XX..XX | Negative Normalized Real (-1.f × 2^(e-b)) |
| 1 | 11..11 | 00..00 | -Infinity |
| 1 | 11..11 | 00..01 to 01..11 | SNaN |
| 1 | 11..11 | 10..00 to 11..11 | QNaN |
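As a practical summary of this table, here is a small classifier of my own (the function name and output strings are arbitrary) that inspects the raw bits of a single-precision value and reports which row it falls into:

```c
#include <stdio.h>
#include <stdint.h>
#include <string.h>

/* Classify a single-precision float according to the table above. */
static const char *classify_single(float value)
{
    uint32_t bits;
    memcpy(&bits, &value, sizeof(bits));

    uint32_t exponent = (bits >> 23) & 0xFF;   /* 8 exponent bits */
    uint32_t fraction = bits & 0x7FFFFF;       /* 23 fraction bits */

    if(exponent == 0x00) {
        return fraction == 0 ? "zero" : "denormalized";
    }
    if(exponent == 0xFF) {
        if(fraction == 0) {
            return "infinity";
        }
        /* the most significant fraction bit distinguishes QNaN from SNaN */
        return (fraction & 0x400000) != 0 ? "QNaN" : "SNaN";
    }
    return "normalized";
}

int main(void)
{
    float fzero = 0.0f;

    printf("%s\n", classify_single(1.5f));          /* normalized */
    printf("%s\n", classify_single(fzero));         /* zero */
    printf("%s\n", classify_single(1e-45f));        /* denormalized */
    printf("%s\n", classify_single(1.0f / fzero));  /* infinity */
    printf("%s\n", classify_single(fzero / fzero)); /* QNaN */
    return 0;
}
```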
A lot of this stuff was observed from small programs I wrote to go back and forth between hex and floating point (printf-style), and to examine the results of various operations. The bulk of this material, however, was lifted from Stallings' book.
A signed or unsigned bit field whose width does not directly correspond to an existing C type.

In structures, the width of the field is specified after the field name, as in C bit fields. In the case of Flash, the width can also be dynamic, in which case a variable name is specified instead.

Signed bit fields have an implied sign extension of the most significant bit of the bit field. So a signed bit field of 2 bits supports the following values:
| Decimal | Binary |
|---|---|
| -2 | 10 |
| -1 | 11 |
| 0 | 00 |
| 1 | 01 |
All bit fields are always declared from the MSB to the LSB of the bytes read one after another from the input file. In other words, the first bit read from the file corresponds to the MSB of the value being read. As a side effect, the bytes in the file appear as if they were defined in big endian order (which is the opposite of the char, short, long, long long, fixed, float, and double types declared outside a bit field.)
The following is a slow but working algorithm to read bit fields in C:
```c
/* global variables (could be in a structure or an object) */
long mask, count, last_byte;

/* call once before reading bit fields to reset the parameters */
void start_read_bits(void)
{
    mask = 0x80;
    count = 0;
}

/* call for each bit field */
long read_bits(long bit_size)
{
    /* NOTE: any bit field value is at most 32 bits */
    /* the result of this function could also be an unsigned long */
    long result;
    unsigned long bit;

    result = 0;
    bit = 1UL << (bit_size - 1);
    while(bit != 0) {
        if(mask == 0x80) {
            last_byte = read_input();
        }
        if(last_byte & mask) {
            result |= bit;
        }
        mask /= 2;
        if(mask == 0) {
            mask = 0x80;
        }
        bit /= 2;
    }

    return result;
}
```
Note that this function is safe, but it should certainly check the validity of its bit_size parameter (it is expected to be between 1 and 32). read_input() is expected to return one byte from the input file.
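The algorithm above always returns the bits zero extended. For signed bit fields, the sign extension described earlier still has to be applied; a possible wrapper (my own addition, reusing read_bits() as defined above) could look like this:

```c
/* read a signed bit field and sign extend it as described above;
 * a sketch of my own built on top of read_bits() */
long read_signed_bits(long bit_size)
{
    unsigned long value;

    value = (unsigned long)read_bits(bit_size);
    /* if the most significant bit of the field is set, fill the
     * upper bits with 1s (two's complement sign extension) */
    if(bit_size < (long)(sizeof(unsigned long) * 8)
    && (value & (1UL << (bit_size - 1))) != 0) {
        value |= ~0UL << bit_size;
    }
    return (long)value;
}
```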
A signed or unsigned 8 bit value.
A char value is always aligned on a byte boundary.
A double float is a standard IEEE 754 floating point value of 64 bits.
The value is defined as follows: 1 bit for the sign, 11 bits for the exponent (biased by 1023) and 52 bits for the mantissa, as described in the IEEE 754 section above.

This type is similar to the double float type of most processors and can thus be used directly.

Note that in some cases, double floats are saved with the lower 32 bits of their mantissa after the upper 32 bits. In other words, the two 32 bit halves are swapped.
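A possible way to reassemble such a swapped double is shown below (a sketch of my own; read_input() is assumed to return one byte as in the bit field algorithm, and each 32 bit half is assumed to be stored in little endian order):

```c
#include <stdint.h>
#include <string.h>

extern long read_input(void);   /* returns one byte from the input file */

/* read one little endian 32 bit value from the file */
static uint32_t read_long_le(void)
{
    uint32_t b0 = read_input() & 0xFF;
    uint32_t b1 = read_input() & 0xFF;
    uint32_t b2 = read_input() & 0xFF;
    uint32_t b3 = read_input() & 0xFF;

    return b0 | (b1 << 8) | (b2 << 16) | (b3 << 24);
}

/* read a double whose two 32 bit halves were swapped:
 * the upper 32 bits appear first in the file, then the lower 32 bits */
double read_swapped_double(void)
{
    uint64_t high = read_long_le();
    uint64_t low  = read_long_le();
    uint64_t bits = (high << 32) | low;
    double   result;

    memcpy(&result, &bits, sizeof(result));
    return result;
}
```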
A signed or unsigned 32 bit value.
A long value is always aligned on a byte boundary.
A fixed value is a 32 bit (or less) number representing a value with 16 bits on the left of the decimal point and 16 bits on the right.
When the value is smaller than 32 bits, we assume that only the least significant bits were defined (quite often only those after the decimal point.)
For more information about bit fields, check out the [un]signed type.
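To use such a value in computations, it is enough to divide the raw integer by 2^16 (or by 2^8 for the short fixed type below); a minimal sketch of my own:

```c
/* convert a 16.16 fixed point value to a double;
 * 'raw' is the signed 32 bit value as read from the file */
double fixed_to_double(long raw)
{
    return (double)raw / 65536.0;    /* 65536 = 2^16 */
}

/* same idea for the 8.8 short fixed type */
double short_fixed_to_double(long raw)
{
    return (double)raw / 256.0;      /* 256 = 2^8 */
}
```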
A long float is a standard IEEE 754 floating point value of 32 bits.
The value is defined as follows: 1 bit for the sign, 8 bits for the exponent (biased by 127) and 23 bits for the mantissa, as described in the IEEE 754 section above.
This is the standard 32 bit floating point type on most processors and thus in most languages.
A signed or unsigned 64 bit value.
A long long value is always aligned on a byte boundary.
A signed or unsigned 16 bit value.
A short value is always aligned on a byte boundary.
A short fixed value is a 16 bit (or less) number representing a value with 8 bits on the left of the decimal point and 8 bits on the right.
When the value is smaller than 16 bits, we assume that only the least significant bits were defined (quite often only those after the decimal point.)
For more information about bit fields, check out the [un]signed type.
A standard IEEE 754 floating point value of 16 bits.
The value is defined like a 32 bit floating point, with: 1 bit for the sign, 5 bits for the exponent, and 10 bits for the mantissa.
The easiest way to deal with these floats once loaded is to convert them to 32 bits floats.
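The conversion can be done with a little bit twiddling. The sketch below is my own and assumes the standard binary16 layout (1 sign bit, 5 exponent bits with a bias of 15, 10 mantissa bits); it only handles zero and normalized values, and the re-biasing constant (127 - 15 = 112) would have to be adjusted if the file uses a different exponent bias.

```c
#include <stdint.h>
#include <string.h>

/* convert a 16 bit float to a 32 bit float (normalized values and zero only) */
float half_to_float(uint16_t h)
{
    uint32_t sign     = (uint32_t)(h >> 15) & 0x01;
    uint32_t exponent = (uint32_t)(h >> 10) & 0x1F;
    uint32_t mantissa = (uint32_t)h & 0x3FF;
    uint32_t bits;
    float    result;

    if(exponent == 0 && mantissa == 0) {
        bits = sign << 31;                     /* +0 or -0 */
    }
    else {
        bits = (sign << 31)
             | ((exponent + 112) << 23)        /* re-bias the exponent */
             | (mantissa << 13);               /* widen the mantissa   */
    }
    memcpy(&result, &bits, sizeof(result));
    return result;
}
```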
A bit field variable defined as TWIPS represents a floating point value defined in twips (20 twips = 1 pixel). Load the value as a signed or unsigned integer and then divide it by 20. The floating point result is a precise dimension in pixels.
Please, see the [un]signed type for more information about fields.
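For instance, reusing the read_signed_bits() sketch introduced earlier (my own helper, not part of the format definition):

```c
/* read a TWIPS bit field of the given size and convert it to pixels */
double read_twips(long bit_size)
{
    return (double)read_signed_bits(bit_size) / 20.0;
}
```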
A null terminated string of 8 bit characters (i.e. a C string.) You have to scan the string in order to skip it and reach the next element.
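Skipping such a string in a buffer already loaded in memory can be as simple as this (a sketch of my own, assuming the buffer is known to contain the terminating null byte):

```c
/* return a pointer to the element following the null terminated
 * string starting at 'p' */
const unsigned char *skip_string(const unsigned char *p)
{
    while(*p != '\0') {
        ++p;
    }
    return p + 1;    /* skip the null terminator itself */
}
```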
Flash also makes use of Pascal strings. Those strings start with a size. In all instances, the size of the string is defined on one byte (char). In this case, we declare the string with a construct as follows:
```c
char    f_string_size;
char    f_pascal_string[f_string_size];
```
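Reading such a string could look like this (my own sketch; read_input() returns one byte as before, and the terminating null byte is added for convenience even though it is not part of the file format):

```c
#include <stdlib.h>

extern long read_input(void);   /* returns one byte from the input file */

/* read a Pascal string: one size byte followed by 'size' characters;
 * the returned buffer must be released with free() */
char *read_pascal_string(void)
{
    long  size = read_input() & 0xFF;
    char *str  = malloc(size + 1);
    long  i;

    if(str != NULL) {
        for(i = 0; i < size; ++i) {
            str[i] = (char)read_input();
        }
        str[size] = '\0';
    }
    return str;
}
```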