Lossless Data Compression for Compressing Space Craft Vibration Signal
seminar project explorer|
Active In SP
Joined: Feb 2011
14-02-2011, 05:07 PM
Lossless Data Compression for Compressing Space Craft Vibration Signals
Anoop S. S.
Department of Computer Science & Engineering
College of Engineering Trivandrum
Kerala - 695016
The project and implimentation is to devise an enhanced compression algorithm suitable for compressing vibration
signals generated from ISRO space carriers. During space missions, it is necessary to measure
generated vibrations in different parts of the space vehicle. This is then sent back to the
base station for further analysis. The signals measured in different parts of the vehicle are
transmitted through 10-12 channels and each channel carries a vibration signal of up to 2KHz.
Each signal is sampled at a rate of 6000 samples per second where each sample is of 8 bit
resolution resulting in a total rate of 480Kbps. A typical space carrier is equipped with 1Mbps
transmitting capacity. However, since this bandwidth has to be utilized for all the data and
control communication to the base station, it is necessary to minimize the amount of bandwidth
used by the vibration data.
Currently, the vibration signal load is reduced by applying frequency domain transformations
which reduces the effective bandwidth usage to 8Kbps. However, this process causes the time
domain information to be lost which is of importance.
Our objective is to devise a suitable lossless compression algorithm that will help to reduce
the bandwidth used without compromising the time domain information. Effectiveness of
traditional compression techniques is to be analyzed since the input is noise type data and
these algorithms might not trade effectively under such circumstances. If required, proper
enhancements are to be applied to yield good compression ratios.
Lossless Data Compression for Compressing Space Craft Vibration Signal.pdf (Size: 162.73 KB / Downloads: 68)
A space craft is a complex piece of engineering work. It will be having lots of components
for controlling and monitoring its internal activities as well as external parameters. And it is a
challenge to make sure that everything works perfectly together to make the mission successful.
It is also important to keep collecting data from the space vehicle and send it back to the base
station, so that it may be analyzed to study the performance, identify weak points, rectify
errors and improve the technology.
A space craft, during its takeoff and voyage, goes through extreme conditions of pressure
and temperature. During the initial phases of the flight, the atmospheric resistance exerts huge
pressure and at the same time heats up the vehicle’s exterior body. Though the vehicle is
designed to take on such worst situations and save its internals from getting damaged, there
are a few side effects that is not possible to be eliminated completely. Vibrations are such a side
effect. Vibrations can be caused by malfunctioning of equipments fixed to the vehicle body or
due to the external forces. What ever be the source of vibration, if it goes beyond the allowed
limits, it may affect the proper functioning of some equipment or may lead to damage. So it
is important to measure the vibrations for later analysis of the performance of the vehicle in
practice. These analysis will help the engineers to perfect the technologies. Also it can help us
to derive conclusions about any possible hardware failures that might happen during a mission.
Vibrations are measured from different parts of the body using probes. The information
collected by the prob will be send back to the ground stations along with other data periodically.
A space craft might be having limited hardware resources. So it is very important that we
reduce the resources used for this purpose as much as possible and keep room for other critical
activities. The purpose of this project and implimentation is to device a suitable compression algorithm that can
compress vibration signals to reduce the bandwidth it uses to be transmitted back to the ground
2 Problem Statement
The space vehicle will be connected with probes that will read vibrations from different parts
of the vehicle. The signals are transmitted through 10-12 channels. Each of these channel
carries a vibration signal of up to 2KHz. The frequency typically can go from 200Hz to a
maximum of 2KHz. Each signal is sampled at a rate of 6000 samples per second. Each sample
is of 8 bit resolution. That will result in the data received for one channel to be 48000 bps.
And the data received through all the 10 channels will be 480Kbps. A typical space carrier is
equipped with 1Mbps transmitting capacity. The vibrations signals alone will consume almost
half of this bandwidth if we send them back as it is. Since this bandwidth has to be utilized
for all the data and control communication to the base station, it is necessary to minimize
the amount of bandwidth used by the vibration data. Vibration data should be compressed
without losing any important information so that it can be sent back with out disrupting other
3 Current Solution
The vibration data is not considered as a live data usually. Most of the time they are not used
for any critical decision making or controlling the vehicle. The current system for reducing the
size of the vibration data is to apply a frequency based encoding. The 8bit sample has domain
of the size 256. The probability distribution of the samples over the domain is studied by
using sample data. Then the entire sample domain is divided into 100 frequency classes. Now
when the transmitter receives data, it buffers them for some time. Then these buffered data is
analyzed to count the samples for each frequency class. once the counting is over, we will have
100 Bytes. From one channel 48 Kbps is received for transmission. If the average frequency
distribution is considered to be 128 over each classes, in one message we can accommodate
12800 Bytes (102.4 Kb) of original information, and the size of each message will be just 100
Bytes (800 bits). If we consider the 10 channels there will be 1 message per channel. Thus
the total amount of 480 Kb can be transmitted using 8 Kb. The compression ratio is 60. This
encoding allows to reduce the bandwidth usage to 10 Kbps.
3.1 Shortcomings of the Current Solution
The current solution of the frequency based encoding gives us surprising compression levels.
But this higher level of compression is lossy. It achieves this level of compression by losing time
domain information. The frequency encoding gives us the idea about the variations in the peak
sample values across seconds . But Its almost impossible to analyze this data on scale smaller
than a second. Also its not possible for us to see the relative variations of the samples with in a
second. Due to these problems its hard for us to regenerate the vibration signal visualizations
back on the ground stations. The time domain information is very important for extended
analysis. So its important to device a better algorithm for encoding and compressing these
data with out losing the timedoemain information.
4 Proposed Solution
Our approach towards solving this problem is guided by two constraints:
1. The data generated has to be compressed to reduce size and bandwidth requirements.
2. We must be able to retrieve time domain information from the compressed data to identify
the distribution of the noise signals over time.
The vibration data that we have at the input, by nature, is of very high randomness. This
can make traditional approaches less effective in achieving significant compression levels. The
proposed solution takes a two-step method towards solving this problem: a Reduced Entropy
Encoding mechanism and the actual compression.
4.1 Low Entropy Adaptive Vibration Encoding (LEAVEN)
The input that we receive is a sequence of 8-bit values representing the noise signals at each
timing sample. We encode this sequence of numbers such that the randomness of the data
is reduced. For this purpose, we introduce the Low Entropy Vibration Encoding (LEAVEN)
algorithm that will encode the sequence based on the changes in the signal rather than the
values themselves. The algorithm tries to reduce the entropy of the input sequence with minimal
increase to the total size.
Basic Technique: The encoding algorithm works primarily by replacing individual values in
the sequencey with the difference from the value appearing just before it. The encoding starts
by adding the first value in the sequence to the output. Now the next value in the sequence is
selected and its difference from the previous value is written out. This process continues until
all the values in the sequence are encoded. For example, consider the sequence:
103 134 128 156 143 174 168 181
From this, the encoded sequence would look like:
103 31 -6 28 -13 31 -6 13
The central idea behind the encoding method is that in a vibration signal, there are more
chances of the delta between two consecutive values to repeat rather than the values themselves.
This results in an input string with reduced entropy with minimal to no increase in the total
Enhancements: Given the general nature of vibration signals, it can be seen that the value
usually goes up and down in consecutive steps. This very nature can be used to make certain
assumptions with which we can decrease the randomness further. As the consecutive deltas
in the encoded sequence would be of opposite signs in a typical case, we can invert signs of
alternate values at the time of encoding. This would result in two values present at even and
odd positions varying only by their sign to be encoded as the same number in the output, thus
reducing entropy further. It is safe to perform this during encoding since the original values
may be restored by applying the same logic during decoding as well. With this enhancement,
the above sequence would be encoded as:
103 31 6 28 13 31 6 13
If there are two consecutive values in the encoded sequence that indicate a move in the same
direction, the purpose of the enhancement will be defeated. To counter this, we would need to
adapt the coding to work with the new sequence. This can be achieved by inserting a special
purpose code that indicates the shift. We are still researching on efficient ways to achieve this
without compromosing data correctness.
It is quite easy to decode the original sequence from the LEAVEN encoded string. We start by
copying over the first value as such to the output. Subsequent values are decoded by adding up
the delta with last value decoded. If the sign based enhancement was applied during encoding,
signs of deltas at even positions are to be toggled before performing the addition.
For example, given the input string:
103 31 6 28 13 31 6 13
We start by inverting the signs of alternate deltas, producing:
103 31 -6 28 -13 31 -6 13
The first value in the encoded text is emitted as such to the output. We find successive
values by adding the delta with the previously obtained decoded value.
103 134 128 156 143 174 168 181
This produces the original sequence that we encoded.
The LEAVEN encoded string from the previous step is compressed losslessly in this step. Four
lossless compression methods are presented. It is to be analyzed how these techniques perform
under a real-world situation and the best is to be selected.
4.2.1 Lempel-Ziv-Welch (LZW)
Lempel-Ziv-Welch (LZW) is a universal lossless data compression algorithm created by Abra-
ham Lempel, Jacob Ziv, and Terry Welch. It was published by Welch in 1984 as an improved
implementation of the LZ78 algorithm published by Lempel and Ziv in 1978. The algorithm is
simple to implement and has very high throughput.
The scenario described in Welch’s 1984 paper encodes sequences of 8-bit data as fixed-length
12-bit codes. The codes from 0 to 255 represent 1-character sequences consisting of the cor-
responding 8-bit character, and the codes 256 through 4095 are created in a dictionary for
sequences encountered in the data as it is encoded. At each stage in compression, input bytes
are gathered into a sequence until the next character would make a sequence for which there is
no code yet in the dictionary. The code for the sequence (without that character) is emitted,
and a new code (for the sequence with that character) is added to the dictionary.
Many refinements were introduced to the algorithm, like the use of variable width codes,
reserving a code to indicate that the code table should be cleared (“clear code”), and a code
to indicate the end of data (“stop code”). The clear code allows the table to be reinitialized
after it fills up, which lets the encoding adapt to changing patterns in the input data.
Since the codes are added in a manner determined by the data, the decoder mimics building
the table as it sees the resulting codes. So the entire dictionary need not be transferred with
the data for decompression. It is critical that the encoder and decoder agree on which variety
of LZW is being used: the size of the alphabet, the maximum code width etc.
Encoding: A high level view of the encoding algorithm is shown here:
1. Initialize the dictionary to contain all strings of length one.
2. Find the longest string W in the dictionary that matches the current input.
3. Emit the dictionary index for W to output and remove W from the input.
4. Add W followed by the next symbol in the input to the dictionary.
5. Go to Step 2.
A dictionary is initialized to contain the single-character strings corresponding to all the
possible input characters (and nothing else except the clear and stop codes if they’re being used).
The algorithm works by scanning through the input string for successively longer substrings
until it finds one that is not in the dictionary. When such a string is found, the index for the
string less the last character (i.e., the longest substring that is in the dictionary) is retrieved
from the dictionary and sent to output, and the new string (including the last character) is
added to the dictionary with the next available code. The last input character is then used as
the next starting point to scan for substrings.
In this way, successively longer strings are registered in the dictionary and made available for
subsequent encoding as single output values. The algorithm works best on data with repeated
patterns, so the initial parts of a message will see little compression. As the message grows,
however, the compression ratio tends asymptotically to the maximum.
Decoding: The decoding algorithm works by reading a value from the encoded input and
outputting the corresponding string from the initialized dictionary. At the same time it obtains
the next value from the input, and adds to the dictionary the concatenation of the string just
output and the first character of the string obtained by decoding the next input value. The
decoder then proceeds to the next input value (which was already read in as the “next value”
in the previous pass) and repeats the process until there is no more input, at which point the
final input value is decoded without any more additions to the dictionary.
In this way the decoder builds up a dictionary which is identical to that used by the encoder,
and uses it to decode subsequent input values. Thus the full dictionary does not need be
sent with the encoded data; just the initial dictionary containing the single-character strings
is sufficient (and is typically defined beforehand within the encoder and decoder rather than
being explicitly sent with the encoded data.).
4.2.2 Lempel-Ziv-Markov chain algorithm (LZMA)
LZMA uses a dictionary based compression algorithm which is a variant of LZ77, whose
output is then encoded with a range encoder. The dictionary compressor produces a stream of
literal symbols and phrase references, which is encoded one bit at a time by the range encoder,
using a model to make a probability prediction of each bit. The main innovation of LZMA is
that instead of a generic byte-based model, LZMA’s model uses contexts specific to the bitfields
in each representation of a literal or phrase. This is nearly as simple as a generic byte-based
model, but gives much better compression because it avoids mixing unrelated bits together in
the same context.
In LZMA compression, the compressed stream is a stream of bits, encoded using adaptive
binary range coder. The stream is divided into packets, each packet describing either a single
byte, or an LZ77 sequence with its length and distance implicitly or explicitly encoded. Each
part of each packet is modeled with independent contexts, so the probability predictions for
each bit are correlated with the values of that bit (and related bits from the same field) in
previous packets of the same type.
First a distance class is encoded using 6 bits. The 5 other bits of the distance code encode
the information about how many direct distance bits need to be extracted from the stream.
Packed code (bit sequence) Packet description
0 + byteCode A single byte encoded using an adaptive binary range coder.
The range coder uses context based on some number of the
most significant bits of the previous byte. Depending on the
state machine, this can also be a single byte encoded as a
difference from the byte at the last used LZ77 distance.
1+0 + len + dist A typical LZ77 sequence describing sequence length and dis-
1+1+0+0 A one-byte LZ77 sequence. Distance is equal to the last used
1+1+0+1 + len An LZ77 sequence. Distance is equal to the last used LZ77
1+1+1+0 + len An LZ77 sequence. Distance is equal to the second last used
1+1+1+1+0 + len An LZ77 sequence. Distance is equal to the third last used
1+1+1+1+1 + len An LZ77 sequence. Distance is equal to the fourth last used
Table 1: LZMA packet types
LZMA is the is the algorithm that gives one of the best compression ratios for general purpose
data. It gives better compression than BZip2 and is used in a major archiving system called
Length code (bit sequence) Description
0+ 3 bits The length encoded using 3 bits, gives the lengths range from
2 to 9.
1+0+ 3 bits The length encoded using 3 bits, gives the lengths range from
10 to 17.
1+1+ 8 bits The length encoded using 8 bits, gives the lengths range from
18 to 273.
Table 2: LZMA length encoding
7zip for its 7z format. Though it iss one of the best algorithms available, documentation is
very scarse. A library named LZMA-SDK, developed and released by Igor Pavlov - the creator
of LZMA and 7zip, is available in the public domain with reference implementations of the
algorithm in various programming languages. Currently, we are trying to analyze the available
source code to understand how it works and to possibly improve on.
4.2.3 Huffman Coding
In computer science and information theory, Huffman coding is an entropy encoding algo-
rithm used for lossless data compression. The term refers to the use of a variable-length code
table for encoding a source symbol (such as a character in a file) where the variable-length code
table has been derived in a particular way based on the estimated probability of occurrence for
each possible value of the source symbol.
Huffman coding uses a specific method for choosing the representation for each symbol, re-
sulting in a prefix code (sometimes called “prefix-free codes”, that is, the bit string representing
some particular symbol is never a prefix of the bit string representing any other symbol) that
expresses the most common source symbols using shorter strings of bits than are used for less
common source symbols.
The technique works by creating a binary tree of nodes. These can be stored in a regular
array, the size of which depends on the number of symbols, n. A node can be either a leaf
node or an internal node. Initially, all nodes are leaf nodes, which contain the symbol itself,
the weight (frequency of appearance) of the symbol and optionally, a link to a parent node
which makes it easy to read the code (in reverse) starting from a leaf node. Internal nodes
contain symbol weight, links to two child nodes and the optional link to a parent node. As a
common convention, bit 0 represents following the left child and bit 1 represents following the
right child. A finished tree has up to n leaf nodes and n1 internal nodes. A Huffman tree that
omits unused symbols produces the most optimal code lengths.
If the symbols are sorted by probability, there is a linear-time (O(n)) method to create a
Huffman tree using two queues, the first one containing the initial weights (along with pointers
to the associated leaves), and combined weights (along with pointers to the trees) being put in
the back of the second queue. This assures that the lowest weight is always kept at the front
of one of the two queues:
1. Start with as many leaves as there are symbols.
2. Enqueue all leaf nodes into the first queue (by probability in increasing order so that the
least likely item is in the head of the queue).
3. While there is more than one node in the queues:
(a) Dequeue the two nodes with the lowest weight by examining the fronts of both
(b) Create a new internal node, with the two just-removed nodes as children (either
node can be either child) and the sum of their weights as the new weight.
© Enqueue the new node into the rear of the second queue.
4. The remaining node is the root node; the tree has now been generated.
Huffman coding is expected to give good compression ratios and is the main algorithm for
us to look up to. Applied on top of the RENE encoding, we are looking forward to promising
4.2.4 Arithmetic Coding
Arithmetic coding is a form of variable-length entropy encoding used in lossless data com-
pression. Normally, a string of characters such as the words “hello there” is represented using a
fixed number of bits per character, as in the ASCII code. When a string is converted to arith-
metic encoding, frequently used characters will be stored with fewer bits and not-so-frequently
occurring characters will be stored with more bits, resulting in fewer bits used in total. Arith-
metic coding differs from other forms of entropy encoding such as Huffman coding in that rather
than separating the input into component symbols and replacing each with a code, arithmetic
coding encodes the entire message into a single number, a fraction n where (0.0 ≤ n < 1.0).
In the simplest case, the probability of each symbol occurring is equal. For example, consider
a sequence taken from a set of three symbols, A, B, and C, each equally likely to occur. Simple
block encoding would use 2 bits per symbol, which is wasteful: one of the bit variations is never
A more efficient solution is to represent the sequence as a rational number between 0 and 1
in base 3, where each digit represents a symbol. For example, the sequence “ABBCAB” could
become 0.0112013. The next step is to encode this ternary number using a fixed-point binary
number of sufficient precision to recover it, such as 0.0010110012 - this is only 9 bits, 25%
smaller than the naive block encoding. This is feasible for long sequences because there are
efficient, in-place algorithms for converting the base of arbitrarily precise numbers. To decode
the value, knowing the original string had length 6, one can simply convert back to base 3,
round to 6 digits, and recover the string.
In general, arithmetic coders can produce near-optimal output for any given set of sym-
bols and probabilities, the optimal value being log2 P bits for each symbol of probability P .
Compression algorithms that use arithmetic coding start by determining a model of the data
basically a prediction of what patterns will be found in the symbols of the message. The more
accurate this prediction is, the closer to optimality the output will be.
In general, each step of the encoding process, except for the very last, is the same; the encoder
has basically just three pieces of data to consider:
1. The next symbol that needs to be encoded
2. The current interval (initially the interval is set to [0,1))
3. The probabilities the model assigns to each of the various symbols that are possible at
The encoder divides the current interval into sub-intervals, each representing a fraction of
the current interval proportional to the probability of that symbol in the current context.
Whichever interval corresponds to the actual symbol that is next to be encoded becomes the
interval used in the next step.
When all symbols have been encoded, the resulting interval unambiguously identifies the
sequence of symbols that produced it. Anyone who has the same final interval and model
that is being used can reconstruct the symbol sequence that must have entered the encoder
to result in that final interval. It is not necessary to transmit the final interval, however; it is
only necessary to transmit one fraction that lies within that interval. In particular, it is only
necessary to transmit enough digits (in whatever base) of the fraction so that all fractions that
begin with those digits fall into the final interval.
Vibration measurements is an important piece of information that is transmitted back from
space crafts which can be used for failure analysis, error corrections and future refinements.
However, the bandwidth available to transmit this information is very limited and it is im-
portant to apply compression to make a more efficient use of bandwidth. The problem with
vibration signals is that the degree of randomness is very high and it is hard to compress them
down to a feasible size.
Our approach here is to attempt to reduce the randomness by an encoding mechanism that we
have devised - LEAVEN - and then perform compression using one of the available methods.
We are also looking at introducing slight improvements over the compression techniques to
make them better suite our specific needs. We will be measuing the compression ratios under
the four different compression algorithms - with and without LEAVEN - and do a comparison
to select the best among these.
 David Huffman “A Method for the Construction of Minimum-Redundancy Codes” Pro-
ceedings of the IRE, Vol. 40, No. 9. (08 September 1952), p. 1098-1101.
 Jacob Ziv and Abraham Lempel “A Universal Algorithm for Sequential Data Compression”
IEEE Transactions on Information Theory, Vol. IT-23, No. 3, (May 1977), p. 337-343.
 T. A. Welch “A Technique for High-Performance Data Compression” Computer, Vol. 17,
No. 6. (June 1984), p. 8-19.
 William Stallings “High-Speed Networks and Internets, Second Edition” Pearson Education,
 7-zip, 7-zip
 Wikipedia, wikipedia