Lossless Data Compression for Compressing Space Craft Vibration Signal
14-02-2011, 05:07 PM
Lossless Data Compression for Compressing Space Craft Vibration Signals
by Anoop S. S., M105102
Department of Computer Science & Engineering
College of Engineering Trivandrum, Kerala - 695016
2010-11

Abstract

The project is to devise an enhanced compression algorithm suitable for compressing vibration signals generated from ISRO space carriers. During space missions, it is necessary to measure the vibrations generated in different parts of the space vehicle. This data is then sent back to the base station for further analysis. The signals measured in different parts of the vehicle are transmitted through 10-12 channels, and each channel carries a vibration signal of up to 2 kHz. Each signal is sampled at a rate of 6000 samples per second, where each sample has 8-bit resolution, resulting in a total rate of 480 kbps. A typical space carrier is equipped with 1 Mbps transmitting capacity. However, since this bandwidth has to be utilized for all the data and control communication to the base station, it is necessary to minimize the amount of bandwidth used by the vibration data. Currently, the vibration signal load is reduced by applying frequency domain transformations, which reduces the effective bandwidth usage to 8 kbps. However, this process causes the time domain information, which is of importance, to be lost. Our objective is to devise a suitable lossless compression algorithm that will help to reduce the bandwidth used without compromising the time domain information. The effectiveness of traditional compression techniques is to be analyzed, since the input is noise-like data and these algorithms might not perform effectively under such circumstances. If required, proper enhancements are to be applied to yield good compression ratios.

1 Introduction

A space craft is a complex piece of engineering. It has many components for controlling and monitoring its internal activities as well as external parameters.
And it is a challenge to make sure that everything works perfectly together to make the mission successful. It is also important to keep collecting data from the space vehicle and send it back to the base station, so that it may be analyzed to study the performance, identify weak points, rectify errors and improve the technology.

A space craft, during its takeoff and voyage, goes through extreme conditions of pressure and temperature. During the initial phases of the flight, the atmospheric resistance exerts huge pressure and at the same time heats up the vehicle's exterior body. Though the vehicle is designed to withstand such worst-case situations and protect its internals from damage, there are a few side effects that cannot be eliminated completely. Vibration is one such side effect. Vibrations can be caused by the malfunctioning of equipment fixed to the vehicle body or by external forces. Whatever the source of vibration, if it goes beyond the allowed limits, it may affect the proper functioning of some equipment or may lead to damage. So it is important to measure the vibrations for later analysis of the vehicle's performance in practice. These analyses will help the engineers to perfect the technologies. They can also help us draw conclusions about any hardware failures that might happen during a mission.

Vibrations are measured from different parts of the body using probes. The information collected by the probes is sent back to the ground stations along with other data periodically. A space craft has limited hardware resources, so it is very important that we reduce the resources used for this purpose as much as possible and keep room for other critical activities. The purpose of this project is to devise a suitable compression algorithm that can compress vibration signals, reducing the bandwidth they use when transmitted back to the ground stations.
2 Problem Statement

The space vehicle is fitted with probes that read vibrations from different parts of the vehicle. The signals are transmitted through 10-12 channels. Each of these channels carries a vibration signal of up to 2 kHz. The frequency typically ranges from 200 Hz to a maximum of 2 kHz. Each signal is sampled at a rate of 6000 samples per second, and each sample has 8-bit resolution. That results in a data rate of 48,000 bps for one channel, and 480 kbps across all 10 channels.

A typical space carrier is equipped with 1 Mbps transmitting capacity. The vibration signals alone would consume almost half of this bandwidth if sent back as they are. Since this bandwidth has to be utilized for all the data and control communication to the base station, it is necessary to minimize the amount of bandwidth used by the vibration data. Vibration data should be compressed without losing any important information so that it can be sent back without disrupting other critical activities.

3 Current Solution

Vibration data is usually not treated as live data. Most of the time it is not used for any critical decision making or for controlling the vehicle. The current system for reducing the size of the vibration data applies a frequency-based encoding. An 8-bit sample has a domain of size 256. The probability distribution of the samples over the domain is studied using sample data. Then the entire sample domain is divided into 100 frequency classes. When the transmitter receives data, it buffers it for some time. The buffered data is then analyzed to count the samples in each frequency class. Once the counting is over, we have 100 bytes (one count per class). From one channel, 48 kbps is received for transmission.
If the average count per class is taken to be 128, one message can accommodate 12,800 bytes (102.4 Kb) of original information, while the size of each message is just 100 bytes (800 bits). Considering the 10 channels, there is one message per channel. Thus a total of 480 Kb can be transmitted using 8 Kb, a compression ratio of 60. This encoding allows the bandwidth usage to be reduced to about 10 kbps.

3.1 Shortcomings of the Current Solution

The current frequency-based encoding gives surprisingly high compression levels, but this compression is lossy: it achieves its level of compression by losing time domain information. The frequency encoding gives us an idea of the variations in the peak sample values across seconds, but it is almost impossible to analyze this data on a scale smaller than a second. It is also not possible to see the relative variations of the samples within a second. Due to these problems, it is hard to regenerate the vibration signal visualizations back at the ground stations. The time domain information is very important for extended analysis, so it is important to devise a better algorithm for encoding and compressing this data without losing the time-domain information.

4 Proposed Solution

Our approach towards solving this problem is guided by two constraints:

1. The data generated has to be compressed to reduce size and bandwidth requirements.
2. We must be able to retrieve time domain information from the compressed data to identify the distribution of the noise signals over time.

The vibration data that we have at the input is, by nature, highly random. This can make traditional approaches less effective in achieving significant compression levels. The proposed solution takes a two-step method towards solving this problem: a reduced-entropy encoding mechanism and the actual compression.
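As a concrete illustration of the frequency-class encoding described in Section 3, here is a minimal Python sketch. The uniform class boundaries, the clamping of counts to one byte, and the function names are our assumptions; the report does not specify how the 100 classes are derived from the studied probability distribution.

```python
# Sketch of the current (lossy) frequency-class encoding from Section 3.
# Assumptions (not specified in the report): classes are uniform bins over
# the 0-255 sample domain, and each count is clamped to fit in one byte.

NUM_CLASSES = 100

def sample_to_class(sample: int) -> int:
    """Map an 8-bit sample (0-255) to one of 100 frequency classes."""
    return sample * NUM_CLASSES // 256

def encode_message(buffered_samples) -> bytes:
    """Count buffered samples per class; the message is just 100 counts."""
    counts = [0] * NUM_CLASSES
    for s in buffered_samples:
        counts[sample_to_class(s)] += 1
    return bytes(min(c, 255) for c in counts)  # one byte per class
```

A buffer of 12,800 samples thus collapses into a single 100-byte message, which is where the high compression comes from; the per-sample timing is exactly what this throws away.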
4.1 Low Entropy Adaptive Vibration Encoding (LEAVEN)

The input that we receive is a sequence of 8-bit values representing the noise signals at each timing sample. We encode this sequence of numbers such that the randomness of the data is reduced. For this purpose, we introduce the Low Entropy Adaptive Vibration Encoding (LEAVEN) algorithm, which encodes the sequence based on the changes in the signal rather than the values themselves. The algorithm tries to reduce the entropy of the input sequence with minimal increase in the total size.

4.1.1 Encoding

Basic Technique: The encoding algorithm works primarily by replacing individual values in the sequence with the difference from the value appearing just before it. The encoding starts by adding the first value in the sequence to the output. Then the next value in the sequence is selected and its difference from the previous value is written out. This process continues until all the values in the sequence are encoded. For example, consider the sequence:

103 134 128 156 143 174 168 181

From this, the encoded sequence would look like:

103 31 -6 28 -13 31 -6 13

The central idea behind the encoding method is that in a vibration signal, the delta between two consecutive values is more likely to repeat than the values themselves. This results in an input string with reduced entropy and minimal to no increase in the total length.

Enhancements: Given the general nature of vibration signals, it can be seen that the value usually goes up and down in consecutive steps. This very nature can be used to make certain assumptions with which we can decrease the randomness further. As the consecutive deltas in the encoded sequence would typically be of opposite signs, we can invert the signs of alternate values at the time of encoding. This causes two values at even and odd positions that vary only by their sign to be encoded as the same number in the output, reducing entropy further.
It is safe to perform this during encoding since the original values may be restored by applying the same logic during decoding. With this enhancement, the above sequence would be encoded as:

103 31 6 28 13 31 6 13

If there are two consecutive values in the encoded sequence that indicate a move in the same direction, the purpose of the enhancement will be defeated. To counter this, we would need to adapt the coding to work with the new sequence. This can be achieved by inserting a special purpose code that indicates the shift. We are still researching efficient ways to achieve this without compromising data correctness.

4.1.2 Decoding

It is quite easy to decode the original sequence from the LEAVEN encoded string. We start by copying the first value as-is to the output. Subsequent values are decoded by adding the delta to the last decoded value. If the sign-based enhancement was applied during encoding, the signs of deltas at even positions are to be toggled before performing the addition. For example, given the input string:

103 31 6 28 13 31 6 13

We start by inverting the signs of alternate deltas, producing:

103 31 -6 28 -13 31 -6 13

The first value in the encoded text is emitted as-is to the output. We find successive values by adding each delta to the previously decoded value:

103 134 128 156 143 174 168 181

This reproduces the original sequence that we encoded.

4.2 Compression

The LEAVEN encoded string from the previous step is compressed losslessly in this step. Four lossless compression methods are presented. How these techniques perform in a real-world situation is to be analyzed, and the best is to be selected.

4.2.1 Lempel-Ziv-Welch (LZW)

Lempel-Ziv-Welch (LZW) is a universal lossless data compression algorithm created by Abraham Lempel, Jacob Ziv, and Terry Welch. It was published by Welch in 1984 as an improved implementation of the LZ78 algorithm published by Lempel and Ziv in 1978.
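The LEAVEN encoding of Section 4.1, including the alternating-sign enhancement, can be sketched in Python as follows. Function names are ours; the adaptive shift code for same-direction moves is still an open question in the proposal and is not modeled here.

```python
# Sketch of LEAVEN delta encoding with the alternating-sign enhancement
# (Section 4.1). Deltas are kept as plain Python ints here rather than
# packed back into 8-bit fields.

def leaven_encode(samples):
    out = [samples[0]]                 # first value is copied as-is
    for i in range(1, len(samples)):
        delta = samples[i] - samples[i - 1]
        if i % 2 == 0:                 # invert signs at alternate positions
            delta = -delta
        out.append(delta)
    return out

def leaven_decode(encoded):
    out = [encoded[0]]
    for i in range(1, len(encoded)):
        delta = encoded[i]
        if i % 2 == 0:                 # undo the sign inversion
            delta = -delta
        out.append(out[-1] + delta)
    return out
```

Running this on the worked example from Section 4.1.1 reproduces the all-positive sequence 103 31 6 28 13 31 6 13, and decoding restores the original samples.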
The algorithm is simple to implement and has very high throughput. The scenario described in Welch's 1984 paper encodes sequences of 8-bit data as fixed-length 12-bit codes. The codes from 0 to 255 represent one-character sequences consisting of the corresponding 8-bit character, and the codes 256 through 4095 are created in a dictionary for sequences encountered in the data as it is encoded. At each stage in compression, input bytes are gathered into a sequence until the next character would make a sequence for which there is no code yet in the dictionary. The code for the sequence (without that character) is emitted, and a new code (for the sequence with that character) is added to the dictionary.

Many refinements were introduced to the algorithm, like the use of variable-width codes, reserving a code to indicate that the code table should be cleared (a "clear code"), and a code to indicate the end of data (a "stop code"). The clear code allows the table to be reinitialized after it fills up, which lets the encoding adapt to changing patterns in the input data. Since the codes are added in a manner determined by the data, the decoder mimics building the table as it sees the resulting codes, so the entire dictionary need not be transferred with the data for decompression. It is critical that the encoder and decoder agree on which variety of LZW is being used: the size of the alphabet, the maximum code width, etc.

Encoding: A high-level view of the encoding algorithm is shown here:

1. Initialize the dictionary to contain all strings of length one.
2. Find the longest string W in the dictionary that matches the current input.
3. Emit the dictionary index for W to output and remove W from the input.
4. Add W followed by the next symbol in the input to the dictionary.
5. Go to Step 2.

A dictionary is initialized to contain the single-character strings corresponding to all the possible input characters (and nothing else except the clear and stop codes, if they are being used).
The algorithm works by scanning through the input string for successively longer substrings until it finds one that is not in the dictionary. When such a string is found, the index for the string minus the last character (i.e., the longest substring that is in the dictionary) is retrieved from the dictionary and sent to output, and the new string (including the last character) is added to the dictionary with the next available code. The last input character is then used as the next starting point to scan for substrings. In this way, successively longer strings are registered in the dictionary and made available for subsequent encoding as single output values. The algorithm works best on data with repeated patterns, so the initial parts of a message will see little compression. As the message grows, however, the compression ratio tends asymptotically to the maximum.

Decoding: The decoding algorithm works by reading a value from the encoded input and outputting the corresponding string from the initialized dictionary. At the same time, it obtains the next value from the input and adds to the dictionary the concatenation of the string just output and the first character of the string obtained by decoding the next input value. The decoder then proceeds to the next input value (which was already read in as the "next value" in the previous pass) and repeats the process until there is no more input, at which point the final input value is decoded without any more additions to the dictionary. In this way the decoder builds up a dictionary identical to that used by the encoder and uses it to decode subsequent input values. Thus the full dictionary does not need to be sent with the encoded data; just the initial dictionary containing the single-character strings is sufficient (and is typically defined beforehand within the encoder and decoder rather than being explicitly sent with the encoded data).
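The encoding and decoding steps above can be sketched as follows. This is a plain unbounded-dictionary variant for a byte alphabet, without the clear/stop codes or the 12-bit packing; codes are emitted as Python ints and all names are ours.

```python
# Sketch of the LZW steps described in Section 4.2.1, for a byte alphabet.
# A real implementation would pack the codes into fixed- or variable-width
# bit fields and cap the dictionary size.

def lzw_encode(data: bytes):
    table = {bytes([i]): i for i in range(256)}   # strings of length one
    out, w = [], b""
    for byte in data:
        wc = w + bytes([byte])
        if wc in table:
            w = wc                                # keep extending the match
        else:
            out.append(table[w])                  # emit code for W
            table[wc] = len(table)                # add W + next symbol
            w = bytes([byte])
    if w:
        out.append(table[w])                      # flush the final match
    return out

def lzw_decode(codes):
    table = {i: bytes([i]) for i in range(256)}
    w = table[codes[0]]
    out = [w]
    for code in codes[1:]:
        if code in table:
            entry = table[code]
        else:
            entry = w + w[:1]                     # code not yet in the table
        out.append(entry)
        table[len(table)] = w + entry[:1]         # mirror encoder's additions
        w = entry
    return b"".join(out)
```

Note how the decoder never receives the dictionary: it rebuilds it from the code stream, exactly as described above, with one special case for a code that refers to the entry currently being built.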
4.2.2 Lempel-Ziv-Markov chain algorithm (LZMA)

LZMA uses a dictionary-based compression algorithm, a variant of LZ77, whose output is then encoded with a range encoder. The dictionary compressor produces a stream of literal symbols and phrase references, which is encoded one bit at a time by the range encoder, using a model to make a probability prediction of each bit. The main innovation of LZMA is that instead of a generic byte-based model, LZMA's model uses contexts specific to the bit fields in each representation of a literal or phrase. This is nearly as simple as a generic byte-based model, but gives much better compression because it avoids mixing unrelated bits together in the same context.

In LZMA compression, the compressed stream is a stream of bits, encoded using an adaptive binary range coder. The stream is divided into packets, each packet describing either a single byte or an LZ77 sequence with its length and distance implicitly or explicitly encoded. Each part of each packet is modeled with independent contexts, so the probability predictions for each bit are correlated with the values of that bit (and related bits from the same field) in previous packets of the same type. First a distance class is encoded using 6 bits. The other bits of the distance code encode how many direct distance bits need to be extracted from the stream.

Packed code (bit sequence)   Packet description
0 + byteCode                 A single byte encoded using an adaptive binary range coder. The range coder uses context based on some number of the most significant bits of the previous byte. Depending on the state machine, this can also be a single byte encoded as a difference from the byte at the last used LZ77 distance.
1+0 + len + dist             A typical LZ77 sequence describing sequence length and distance.
1+1+0+0                      A one-byte LZ77 sequence. Distance is equal to the last used LZ77 distance.
1+1+0+1 + len                An LZ77 sequence. Distance is equal to the last used LZ77 distance.
1+1+1+0 + len                An LZ77 sequence. Distance is equal to the second last used LZ77 distance.
1+1+1+1+0 + len              An LZ77 sequence. Distance is equal to the third last used LZ77 distance.
1+1+1+1+1 + len              An LZ77 sequence. Distance is equal to the fourth last used LZ77 distance.

Table 1: LZMA packet types

Length code (bit sequence)   Description
0 + 3 bits                   The length encoded using 3 bits, giving lengths in the range 2 to 9.
1+0 + 3 bits                 The length encoded using 3 bits, giving lengths in the range 10 to 17.
1+1 + 8 bits                 The length encoded using 8 bits, giving lengths in the range 18 to 273.

Table 2: LZMA length encoding

LZMA is among the algorithms that give the best compression ratios for general-purpose data. It gives better compression than bzip2 and is used by the major archiving system 7-Zip for its 7z format. Though it is one of the best algorithms available, documentation is very scarce. A library named LZMA SDK, developed and released by Igor Pavlov, the creator of LZMA and 7-Zip, is available in the public domain with reference implementations of the algorithm in various programming languages. Currently, we are trying to analyze the available source code to understand how it works and possibly improve on it.

4.2.3 Huffman Coding

In computer science and information theory, Huffman coding is an entropy encoding algorithm used for lossless data compression. The term refers to the use of a variable-length code table for encoding a source symbol (such as a character in a file), where the variable-length code table has been derived in a particular way based on the estimated probability of occurrence for each possible value of the source symbol.
Huffman coding uses a specific method for choosing the representation for each symbol, resulting in a prefix code (sometimes called a "prefix-free code": the bit string representing some particular symbol is never a prefix of the bit string representing any other symbol) that expresses the most common source symbols using shorter strings of bits than are used for less common source symbols.

The technique works by creating a binary tree of nodes. These can be stored in a regular array, the size of which depends on the number of symbols, n. A node can be either a leaf node or an internal node. Initially, all nodes are leaf nodes, which contain the symbol itself, the weight (frequency of appearance) of the symbol and, optionally, a link to a parent node, which makes it easy to read the code (in reverse) starting from a leaf node. Internal nodes contain a weight, links to two child nodes and the optional link to a parent node. As a common convention, bit 0 represents following the left child and bit 1 represents following the right child. A finished tree has up to n leaf nodes and n-1 internal nodes. A Huffman tree that omits unused symbols produces the optimal code lengths.

If the symbols are sorted by probability, there is a linear-time (O(n)) method to create a Huffman tree using two queues, the first one containing the initial weights (along with pointers to the associated leaves), with combined weights (along with pointers to the trees) being put at the back of the second queue. This assures that the lowest weight is always kept at the front of one of the two queues:

1. Start with as many leaves as there are symbols.
2. Enqueue all leaf nodes into the first queue (by probability in increasing order, so that the least likely item is at the head of the queue).
3. While there is more than one node in the queues:
   (a) Dequeue the two nodes with the lowest weight by examining the fronts of both queues.
   (b) Create a new internal node, with the two just-removed nodes as children (either node can be either child) and the sum of their weights as the new weight.
   (c) Enqueue the new node into the rear of the second queue.
4. The remaining node is the root node; the tree has now been generated.

Huffman coding is expected to give good compression ratios and is the main algorithm we are looking at. Applied on top of the LEAVEN encoding, we are hoping for promising results.

4.2.4 Arithmetic Coding

Arithmetic coding is a form of variable-length entropy encoding used in lossless data compression. Normally, a string of characters such as the words "hello there" is represented using a fixed number of bits per character, as in the ASCII code. When a string is converted to arithmetic encoding, frequently used characters will be stored with fewer bits and not-so-frequently occurring characters will be stored with more bits, resulting in fewer bits used in total. Arithmetic coding differs from other forms of entropy encoding such as Huffman coding in that rather than separating the input into component symbols and replacing each with a code, arithmetic coding encodes the entire message into a single number, a fraction n where 0.0 ≤ n < 1.0.

In the simplest case, the probability of each symbol occurring is equal. For example, consider a sequence taken from a set of three symbols, A, B, and C, each equally likely to occur. Simple block encoding would use 2 bits per symbol, which is wasteful: one of the bit variations is never used. A more efficient solution is to represent the sequence as a rational number between 0 and 1 in base 3, where each digit represents a symbol. For example, the sequence "ABBCAB" could become 0.011201 (base 3). The next step is to encode this ternary number using a fixed-point binary number of sufficient precision to recover it, such as 0.001011001 (base 2); this is only 9 bits, 25% smaller than the naive block encoding.
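The base-3 example above can be checked numerically. The sketch below is only a verification of this worked example with equiprobable symbols, not a general arithmetic coder, and the helper names are ours.

```python
# Verifying the "ABBCAB" example from Section 4.2.4.

SYMBOLS = "ABC"

def to_fraction(s: str) -> float:
    """Interpret a symbol string as the base-3 fraction 0.d1d2..."""
    value = 0.0
    for i, ch in enumerate(s, start=1):
        value += SYMBOLS.index(ch) / 3 ** i
    return value

def from_fraction(x: float, length: int) -> str:
    """Round x to `length` ternary digits and map them back to symbols."""
    digits = round(x * 3 ** length)
    out = []
    for _ in range(length):
        digits, d = divmod(digits, 3)
        out.append(SYMBOLS[d])
    return "".join(reversed(out))

exact = to_fraction("ABBCAB")            # 0.011201 in base 3, i.e. 127/729
approx = int("001011001", 2) / 2 ** 9    # the 9-bit binary fraction
```

The 9-bit binary fraction differs from the exact ternary value by less than half a ternary unit in the sixth place, which is exactly why rounding back to six base-3 digits recovers "ABBCAB".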
This is feasible for long sequences because there are efficient, in-place algorithms for converting the base of arbitrarily precise numbers. To decode the value, knowing the original string had length 6, one can simply convert back to base 3, round to 6 digits, and recover the string.

In general, arithmetic coders can produce near-optimal output for any given set of symbols and probabilities, the optimal value being -log2 P bits for each symbol of probability P. Compression algorithms that use arithmetic coding start by determining a model of the data: basically, a prediction of what patterns will be found in the symbols of the message. The more accurate this prediction is, the closer to optimal the output will be. In general, each step of the encoding process, except for the very last, is the same; the encoder has basically just three pieces of data to consider:

1. The next symbol that needs to be encoded
2. The current interval (initially set to [0, 1))
3. The probabilities the model assigns to each of the various symbols that are possible at this stage

The encoder divides the current interval into sub-intervals, each representing a fraction of the current interval proportional to the probability of that symbol in the current context. Whichever interval corresponds to the actual symbol that is next to be encoded becomes the interval used in the next step. When all symbols have been encoded, the resulting interval unambiguously identifies the sequence of symbols that produced it. Anyone who has the same final interval and model can reconstruct the symbol sequence that must have entered the encoder to result in that final interval. It is not necessary to transmit the final interval, however; it is only necessary to transmit one fraction that lies within that interval.
In particular, it is only necessary to transmit enough digits (in whatever base) of the fraction so that all fractions that begin with those digits fall into the final interval.

5 Conclusion

Vibration measurement is an important piece of information transmitted back from space crafts, which can be used for failure analysis, error correction and future refinements. However, the bandwidth available to transmit this information is very limited, and it is important to apply compression to make more efficient use of it. The problem with vibration signals is that their degree of randomness is very high, which makes them hard to compress down to a feasible size. Our approach is to reduce the randomness with an encoding mechanism that we have devised, LEAVEN, and then perform compression using one of the available methods. We are also looking at introducing slight improvements to the compression techniques to make them better suit our specific needs. We will measure the compression ratios under the four different compression algorithms, with and without LEAVEN, and compare them to select the best.

References

[1] David Huffman, "A Method for the Construction of Minimum-Redundancy Codes", Proceedings of the IRE, Vol. 40, No. 9 (September 1952), pp. 1098-1101.
[2] Jacob Ziv and Abraham Lempel, "A Universal Algorithm for Sequential Data Compression", IEEE Transactions on Information Theory, Vol. IT-23, No. 3 (May 1977), pp. 337-343.
[3] T. A. Welch, "A Technique for High-Performance Data Compression", Computer, Vol. 17, No. 6 (June 1984), pp. 8-19.
[4] William Stallings, "High-Speed Networks and Internets, Second Edition", Pearson Education, 2004.
[5] 7-Zip and the LZMA SDK.
[6] Wikipedia.


