CAPTCHA Security: A Low-Cost Attack on a Microsoft CAPTCHA
Active In SP
Joined: Oct 2010
31-10-2010, 09:50 PM
CAPTCHA Security: A Low-Cost Attack on a Microsoft CAPTCHA
DEPARTMENT OF COMPUTER SCIENCE & ENGINEERING
COLLEGE OF ENGINEERING TRIVANDRUM
CAPTCHA Security A Low-Cost Attack on a Microsoft CAPTCHA.docx (Size: 509.68 KB / Downloads: 110)
CAPTCHA is now almost a standard security technology. The most widely used CAPTCHAs rely on the sophisticated distortion of text images rendering them unrecognizable to the state of the art of pattern recognition techniques, and these text-based schemes have found widespread applications in commercial websites. The state of the art of CAPTCHA design suggests that such text-based schemes should rely on segmentation resistance to provide security guarantee, as individual character recognition after segmentation can be solved with a high success rate by standard methods such as neural networks. The security of a text-based CAPTCHA designed by Microsoft and deployed for years at many of their online services including Hotmail, MSN and Windows Live is analyzed here. This scheme was designed to be segmentation-resistant, and it has been well studied and tuned by its designers over the years. However, a simple attack has achieved a segmentation success rate of higher than 90% against this scheme. It took on average ~80 ms for the attack to completely segment a challenge on a desktop computer with a 1.86 GHz Intel Core 2 CPU and 2 GB RAM. As a result, researchers estimate that this Microsoft scheme can be broken with an overall (segmentation and then recognition) success rate of more than 60%. On the contrary, its design goal was that "automatic scripts should not be more successful than 1 in 10,000" attempts (i.e. a success rate of 0.01%). For the first time, it is shown that a CAPTCHA that is carefully designed to be segmentation-resistant is vulnerable to novel but simple attacks. However, these results show that it is not a trivial task to design a CAPTCHA scheme that is both usable and robust.
A CAPTCHA (Completely Automated Public Turing Test to Tell Computers and Humans Apart) is a program that generates and grades tests that are human solvable, but intends to be beyond the capabilities of current computer programs. This technology is now almost a standard security mechanism for defending against undesirable or malicious Internet bot programs, such as those spreading junk emails and those grabbing thousands of free email accounts instantly. It has found widespread application on numerous commercial web sites including Google, Yahoo, and Microsoft’s MSN.
The most widely used CAPTCHAs are the so-called text-based schemes, which rely on sophisticated distortion of text images aimed at rendering them unrecognizable to the state of the art of pattern recognition programs. The popularity of such schemes is due to the fact that they have many advantages, for example, being intuitive to users world-wide (the user task performed being just character recognition), having little localization issues (people in different countries all recognize Roman characters), and of good potential to provide strong security (e.g. the space a brute force attack has to search can be huge, if the scheme is properly designed).
A good CAPTCHA must be not only human friendly, but also robust enough to resist to computer programs that attackers write to automatically pass CAPTCHA tests. Early research suggested that computers are very good at recognizing single characters, even if these characters are highly distorted. Table 1 shows characters under typical distortions, along with success rates that a neural network can achieve to recognize them. It is established that if the positions of characters are known in challenge images generated by a CAPTCHA, then breaking this scheme is just a pure recognition problem, which is a trivial task with standard machine learning techniques such as neural networks.
However, when the location of characters in a CAPTCHA challenge is not known a-priori, state of the art (including machine learning) methods do not work well in locating the characters, let alone recognizing them. The problem of identifying character locations in the right order, or segmentation, is still a challenging problem in the fields such as handwriting recognition and computer vision. In general, segmentation is computationally expensive, and often a combinatorially hard problem.
The state of the art of CAPTCHA design suggests that the robustness of text-based schemes should rely on the difficulty of finding where the character is (segmentation), rather than which character it is (recognition). That is, such CAPTCHAs should be segmentation-resistant. In other words, if breaking a (text-based) CAPTCHA can be successfully reduced to a problem of individual character recognition, then this scheme is effectively broken.
2. THE MSN SCHEME
Fig 1 shows some sample challenges generated by the MSN CAPTCHA scheme.
It was observed that the MSN scheme had the following characteristics.
• Eight characters are used in each challenge;
• Only upper case letters and digits are used.
• Foreground (i.e. challenge text) is dark blue and background light gray.
• Warping (both local and global) is used for character distortion.
Local warp produces “small ripples, waves and elastic deformations along the pixels of the character”, and it foils “feature-based algorithms which may use character thickness or serif features to detect and recognize characters”. Characters in the first and second rows of Table 1 are largely distorted by local warping.
Global warp generates character-level, elastic deformations to foil template matching algorithms for character detection and recognition. Characters in the third and fourth rows of Table 1 are largely distorted by global warping.
• The following random arcs of different thicknesses are used as the main anti segmentation measure.
o Thick foreground arcs: These arcs are of foreground color. Their thickness can be the same as the thick portions of characters. They do not directly intersect with any characters, so they are also called “non-intersecting arcs”.
o Thin foreground arcs: These arcs are of foreground color. Although they are typically not as thick as the above type of arcs, their thickness can be the same as the thin portions of characters. They intersect with thick arcs, characters or both, and thus also called “intersecting thin arcs”.
o Thin background arcs: These arcs are thin and of background color. They cut through characters and remove some character content (pixels).
What is special in the design of the MSN scheme is the following: in contrast to many other schemes that use background textures and meshes in foreground and background colors as clutter to increase robustness, random arcs of different thicknesses are used as clutter in this scheme. The idea is that these arcs are themselves a good candidate for false characters, and therefore such a design was expected to provide strong segmentation resistance.
3. A LOW-COST SEGMENTATION ATTACK
This paper has presented a low cost attack that can effectively and efficiently segment challenges generated by the MSN scheme. Specifically, the attack achieves the following:
o Identify and remove random arcs
o Identify all character locations in the right order. Specifically, divide each challenge into 8 ordered segments, each containing a single character.
The attack is built on observing and analyzing the 100 random samples that was collected – this is a “sample set”. The effectiveness of this attack was tested not only on the sample set, but also on a large test set of 500 random samples – the design of the attack used no prior knowledge about any sample in this set. This methodology follows the common practice in the fields such as computer vision. The attack involves 7 consecutive steps, each of which is detailed in the following sections.
We first convert a color challenge to a two-color image using a threshold method: pixels with intensity higher than a certain threshold value are converted to white, and those with a lower intensity to black.
3.2 Fixing broken characters
Thin background arcs remove some character content, and sometimes they also create a crack in characters. For example, the second character (‘T’) in Fig 2 is broken due to this reason. The current step attempts to fix broken characters for two purposes: i) to keep a character as a single entity and consequently enhance our follow-up segmentation methods, and ii) to prevent small portions of characters from being removed as an arc later on. We observed that thin background arcs are typically 1-2 pixels wide after binarization, and the following simple method works well to identify and fix broken characters caused by such arcs.
①. Find pixels that are of background color and have left and right neighbors with foreground color (see Fig 3(a)).
②. Find pixels that are of background color and have top and bottom neighbors with foreground color (see Fig 3(b)).
③. Convert pixels identified above to foreground color.
A side effect of this method is that it might introduce additional foreground pixels that connect components that are initially disconnected. For example, in Fig 4, a thin arc intersecting with ‘R’ is now connected with another arc intersecting with ‘E’. But this drawback has proven a negligible issue in the study – that would not be this case if we chose to connect all two-pixel gaps.
3.3 Vertical segmentation
A vertical segmentation method is applied to segment a challenge vertically into several chunks, each of which might contain one or more characters. The process of vertical segmentation starts by mapping the image to a histogram that represents the number of foreground pixels per column in the image. Then, vertical segmentation lines separate the image into chunks by cutting through columns that have no foreground pixels at all. Fig 5 shows that such vertical histogram segmentation cuts challenge (a) into two chunks, and the other (b) into five.
Typically, this vertical method not only achieves partial segmentation, but also contributes to our divide-and-conquer strategy, which is key to the success of our attack.
3.4 Color filling segmentation
In this step, a “color filling segmentation (CFS)” algorithm is applied to each chunk segmented in the previous step. The basic idea of this algorithm is to detect every connected component, which we call an object, in a chunk. An object can be an arc, character, connected arcs, or connected characters. The algorithm works as follows. First, detect a foreground pixel, and then trace all its foreground neighbors until all pixels in this connected component are traversed – that is, an object is detected. Next, the algorithm locates a foreground pixel outside of the area of the detected object(s), and starts another traversal process to identify a next object. This process continues until all objects in the chunk are located. This method is effectively like using a distinct color to flood each connected component, so we call it the “color filling” segmentation. In the end, the number of colors used to fill a chunk is the number of objects in the chunk.
The CFS method uses the notion of 8-connectivity for determining whether an adjacent pixel is a neighbor or not. In this notion, a pixel has eight neighbors (i.e., left, right, up, down, upper left, lower left, upper right and lower right). In contrast, in the notion of 4-connectivity, a pixel has only four neighbors (i.e., left, right, up, down). Fig 6 illustrates and compares both 8-connectivity and 4-connectivity. Of course, if a pixel and its neighbor are of the same color, then they are connected.
With the CFS method, as shown in Fig 7, we determine that there are six objects in the first chunk and five in the second.
Often, a challenge is divided into four or five chunks by vertical segmentation. It is worthwhile to mention that this color filling step is applied to each chunk (see Fig 8(a)), rather than only those wider chunks that probably contain more than one object. The reason is simply that thinner chunks might also contain more than one object (see Fig 8(b)), and we need to locate all objects in each chunk and track the number of objects for the follow-up arc removal and other steps.
CFS contributes to further segmentation by detecting objects that cannot be segmented by the vertical method, and also gives the number of objects in each chunk. As will be discussed later on, CFS also contributes to further steps such as arc removal.
3.5 Thick arc removal
Thick arcs, if any, will be detected and removed after the above color filling process.
Arc removal algorithm. The algorithm is largely based on the above observations, and includes the following steps.
1) Circle detection, which detects if an object contains a circle. If an object contains a circle, we know it is definitely not an arc, and all other arc removal methods can be skipped. The circle detection method works as follows.
• Draw a bounding box around an object, so that this bounding box does not touch any part of the object.
• Apply the color filling algorithm to the top-left pixel, i.e., flood all background pixels that are connected to the top-left pixel, with a color that is different from foreground and background
• Scan the bounding box for pixels of the background color. If such a pixel is found, then a circle is detected. Otherwise, no circle is detected.
Fig 9 shows two example cases. In Fig 9 (a), there is no pixel of the original background color once the filling algorithm is applied. That is, we are sure this object does not contain any circles. In contrast, the filling algorithm cannot get rid of all pixels of the original background color in Fig 9 (b). Therefore, by detecting these pixels, the algorithm is sure that a circle exists in this object. (To improve the efficiency of the filling algorithm, the minimal gap between the object and the bounding box is just one pixel in both cases.)
2) Scan all objects that contain no circles for discriminative features (other objects are safely ignored). Such discrimination is largely about pixel count checking. If an object has a pixel count smaller than or equal to 50, it is removed as an arc. Fig 10 shows that an arc is removed in the 2nd chunk due to its small pixel count.
3) Relative position checking. This step examines the relative position of objects in a chunk, and is applied to all chucks that contain more than one object (note that connected characters are considered as a single object). The basic idea behind this step is that the relative positions of objects can tell arcs and real characters apart. For example, typically characters are closer to the baseline (i.e. the horizontal central of a chunk) whereas arcs are closer to the top or bottom image borders. In addition, characters are horizontally juxtaposed, but never vertically. Once this step is completed, the histogram is updated. Fig 11 shows that further arcs are removed by this method in the image processed by the previous step, and its histogram is updated.
4) Detection of remaining arcs. The above steps do not necessarily identify all the arcs in an image. What is done in this step is as follows. First, count the number of remaining objects in the image (identified arcs are already removed and thus not counted). If this number is larger than 8, then there is at least one undetected arc in the image. A surprising observation about these undetected arc(s) is that they often are the first or last object in the current image. An ad-hoc method works for most of the cases by simply checking the first and last objects with the following rules:
• If only one of them contains a circle, the object without a circle is removed as an arc.
• If neither of them contains a circle, then the object with a smaller pixel count is removed.
This process repeats until the image has exactly 8 objects remaining.
Fig 12 shows the whole arc removal process with another example. Fig 12 (a) is an image segmented by vertical and CFS segmentations. The discriminative feature checking failed to detect any arc, but relative position checking detected an arc in both the 4th and 6th chunks. Fig 12 (b) is the result after those two arcs are removed and the histogram was updated. Then, escaped arcs detection catch the last object as an arc. The final image after all the process of arc removal is Fig 12 ©.
3.6 Locating connected characters
After removing arcs, an immediate step is to locate, if any, connected characters, which either vertical or color filling segmentation has failed to segment. Among n objects output by the previous step, if n < 8, then at least one of the objects contains two or more characters and these characters are connected (typically by thin intersecting arcs). This step estimates how many characters are connected and locates them. The following design and implementation features of the MSN scheme all contribute to being able to estimate which objects contain how many connected characters.
• Fixed length: every challenge uses 8 characters.
• Connected characters in an object are horizontally but never vertically juxtaposed. Therefore, an object containing two or more connected characters is typically wider than other objects.
• On average a segmented chunk – by definition, a chunk cannot be further segmented by the vertical method but can by the CFS method -- contains more than one character if the chunk is wider than 35 pixels. (This width was measured after the following normalization process was applied to the chunk: the left segmentation line is adjusted to cross the left-most foreground pixel in the chunk vertically and similarly for the right segmentation line.)
According to the number of chunks, the width of each chunk, and the number of objects in each chunk, we can guess with a high success rate which chunk/object contains connected characters and the number of these characters (or in other words, guess how many characters exist in each chunk).
We use two examples (see Fig 13) to show how our algorithm works. The histogram for the image in Fig 13 (a) indicates that it contains four chunks. Since there are exactly 8 characters in these chunks, we know there are the following five exclusive possibilities for the distribution of all the characters among the chunks1:
(a) There are four chunks, each having two characters.
(b) One chunk has three characters and there are two additional chunks each having two characters.
© One chunk has four characters and another two characters.
(d) There are two chunks each having three characters.
(e) One chunk has five characters.
Since the 2nd, 3rd and 4th chunks in the image were all wider than 35 pixels, the algorithm determines that there are at least three chunks each having more than one character. Consequently options ©, (d) and (e) are excluded - none of the options would allow more than two chunks that have more than one character. The algorithm also knows from the CFS algorithm that the 2nd chunk contains three objects, and therefore option (a) is also dropped. This leaves only option (b); thus the algorithm identifies that the 2nd chunk contains exactly three characters and the 3rd and 4th chunks contains two characters each.
The second example (see Fig 13 (b)) is more subtle. The histogram for this image indicates it contains 5 chunks. Since there are exactly 8 characters in these chunks, we know there are the following three exclusive possibilities for the distribution of all the characters among the chunks:
(a) One of the chunks contains 4 characters
(b) One chunk has three characters and another two characters.
© There are three chunks each having two characters.
Since the 3rd and 4th chunks in the image were wider than 35 pixels, the algorithm determines that at least 2 doubles exists and consequently option (a) is excluded. Since there were only two such wider chunks, option © is also dropped. This leaves only option (b).
3.7 Segment connected characters
The previous step has identified any object(s) containing connected characters, as well as the number of these characters, denoted by c, contained in each object. We observed that often, a simple method works to segment the connected characters in an object as follows.
1) Work out the width of the object by identifying its left-most and right-most pixels;
2) Vertically divide the object into c parts of the same width, each part being a proper segment.
For example, it was determined that the last object in Fig 13 (a) and the 3rd object in Fig 13 (b) contain two connected characters. For these objects, what our algorithm does is to evenly divide them into two segments, each being a character. Fig 14 shows the finalized 8 segments for both challenges.
Success rate. The segmentation attack has achieved a success rate of 91% on the sample set. That is, 91 out of 100 challenges were segmented correctly. To check whether the attack was generic enough, we followed the practice in computer vision research. We collected another set of 500 random challenges, and then ran the attack on this test set – our program had no prior knowledge about any sample in this set. The attack achieved a success rate of 92% (the distribution of samples in the test set slightly favors our algorithm).
Attack speed. We implemented our attack in Java (little effort was spent in optimizing the run-time of code), and tested it on a desktop computer with a 1.86 GHz Intel Core 2 CPU and 2 GB RAM. The attack was run ten times on both the sample and test sets, and the average speed was taken (see Table 3). The figures in the table show that our attack is very efficient: on average, it takes only slightly more than 80 ms to completely segment a challenge in both sets.
Implications. State of the art of machine learning can achieve a success rate of at least 95% for recognizing individual characters in the MSN scheme, after they are properly segmented. However, this rate is a conservative estimate for recognizing characters in samples we have collected for this study, for the following reasons.
• First, we checked all samples in our test set after we measured the success rate of our attack, and found that although the same types of distortion techniques were applied to characters in our samples and those listed in Table 1, the former were much less distorted than the latter.
• Second, we have simple methods to get rid of some portions of “intersecting thin arcs” in each segmented character so that these characters are even less distorted and consequently easier to be recognized by machine learning techniques. For example, one of our methods is to guess the area of the real character inside an object by checking the density of foreground pixels for the object. As illustrated in Fig 15 (where the example is taken from the last segment in Fig 14 (a)), the majority of columns and rows inside the red box have a pixel count larger than 3, while for portions outside of this box, the majority of columns and rows will range between 1 and 2 pixels – the thickness of thin intersecting arcs. Thus, portions of such arcs are rightly recognized and removed as distortion.
As such, our segmentation attack suggests that the MSN scheme can be broken with at least an overall (segmentation and recognition) success rate of 61% (≈ .92*.95^8).
5 ANALYSIS OF THE SEGMENTATION ATTACK
All cases of failure of the segmentation attack in both the sample and test sets were analyzed, and found that three types of failure occurred as follows.
Failure of arc removal: in this case, not all thick arcs are detected and removed. In the following example, the 1st object in Fig 16 (a) is an undetected arc - it was treated as a valid character, and therefore our algorithm failed to recognize that the 5th object contained connected characters, leading to the failure to completely segment this challenge.
Failure of approximation. This failure is about incorrectly identifying connected characters. Fig 16 (b) gives such a failure case, in which a single character (‘W’) was wider than two connected characters and therefore the latter were not identified, leading to the final segmentation failure.
Failure of “even cut” (Segmentation of connected characters). It is not surprising that the simple “even cut” method we used for segmenting connected characters does not always work. Fig 16 © gives such a failure case, in which the first two connected characters (‘RA’) were properly segmented, but the second pair (‘CJ’) was not (due to the length of the intersecting arc).
Fig 17 compares the failure rate of all these scenarios in both the sample and test sets. In the sample set, 5% of the segmentation errors were due to failures of arc removal (versus 17 out of 500 samples, i.e. 3.4%, in the test set), 2% of the segmentation errors were due to failure to locate connected characters (versus 3.2% in the test set), and 2% due to inaccuracy of the crude “even cut” method for segmenting connected characters (versus 1.4% in the test set). Thus, the failure patterns in both sets are similar.
The combination of arc removal and “approximation” accounts for 77.8% (7/9) and 82.5% (33/40) of the failures in the sample and test sets respectively. To improve our attack, arc removal will be an obvious first-priority target. This is not only because arc removal is the largest failure category: 56.6% (5/9) in the sample set and 42.5% (17/40) in the test set, but also because if the failure rate of arc removal is decreased, the success rate of “approximation” will be proportionally increased.
5.1 Strength of the MSN scheme
The designers recognized that both security and usability matter in the design of CAPTCHAs, and efforts were spent in addressing both issues in their design including. In particular, they attempted to address trade-off between security and usability – which is a tricky issue for CAPTCHA design.
Usability of this MSN scheme is reasonably good. For example, characters are not so distorted as to damage their recognisability by most human users. One particular good usability feature is that this scheme does not significantly disadvantage people whose mother tongue does not use the Latin alphabet. In some schemes, characters are distorted to be similar to handwriting - native speakers might find it easy to recognize them, but just imagine how difficult it would be for a user to recognize handwriting in a language that she knows nothing about.
5.2 Weakness of the MSN scheme
Security. A major problem of this scheme is that it is vulnerable to our simple segmentation attack. The segmentation resistance built into this scheme seems to be largely about preventing bounding-box based segmentation, and apparently its designers never realized that a simple color filling process can be used to do segmentation effectively and that a combination of vertical and color filling segmentation can be powerful. Moreover, it is easy to tell arcs from characters by examining characteristics such as pixel counts, shapes, locations, relative positions, and distances to baseline. In addition, the use of a fixed number of characters per challenge also aids our segmentation attack.
Usability. Some distortions still cause usability concerns. For example, it is difficult to tell an arc from a character such as ‘J’, ‘7’ and ‘L’. In particular, the confusion between an arc and ‘J’ was observed regularly in our study.
5.3 On the design of CAPTCHAs
• It is not a trivial task to design a CAPTCHA scheme that is both usable and robust.
• Size matters. The size of the character set to be used in a CAPTCHA matters, and so does the length of a string used in each challenge. If the character set size and the string length are too small, bots would have a high chance of using random guesses to pass the CAPTCHA test. On the other hand, the longer a string that is used in a challenge, the more security it can achieve. For example, assume that individual character recognition rate is r (<1), the chance of recognizing the whole challenge of n characters is typically rn, which decreases as n grows.
• The string length issue has interesting usability implications. If strings composed of random combinations of characters are used in a scheme, then the longer the string is, the more difficult the scheme is to use. The reason is it is more demanding for users to correctly decode and enter their answers. For example, users might tend to make recognition mistake, e.g. due to distorted character looking like others. However, this is not necessarily the case in schemes where English words are used. For example, it was observed for the reCAPTCHA scheme that the longer the string is, the higher the pass rate users have. A likely explanation is that the longer the word is, the more information people can gather, and thus Gestalt psychology effectively helps people to decode the word correctly. However, from short words that are too distorted to recognize immediately, users would not be able to gather enough information to decode them correctly.
• Whether the length of strings used in a scheme is predictable or not can also be a design issue that has implications on both security and usability. We observed that although the design choice of using a fixed string length in the MSN scheme aided our segmentation attack, it can improve the scheme’s usability.
• On the contrary, if the MSN scheme used a variable and unpredictable string length for each challenge, it would be much harder or even impossible for users to recognize that the above-mentioned objects are indeed arcs. At the cost of this decrease in usability, however, this design choice would make it much harder or even impossible to perform an automatic segmentation attack such as ours.
For the first time, it has been shown that although the Microsoft’s MSN CAPTCHA intentionally bases its robustness on segmentation resistance, it is vulnerable to a simple, low-cost segmentation attack. Our attack has achieved a segmentation success rate of 92%, and this implies that the MSN scheme can be broken with an overall (segmentation and then recognition) success rate of more than 60%. Therefore, our work shows that the MSN scheme provides only a false sense of security. An important lesson is that even if segmentation resistance is a sound principle for designing secure text-based CAPTCHAs, the devil is in the details. It is critical to make sure that a design is not vulnerable to any known (and ideally unknown) segmentation method. On the other hand, designing CAPTCHAs that exhibit both good robustness and usability is much harder that it might appear to be. First, current collective understanding of this topic is still in its infancy. Second, the requirements, tools and methodologies for assessing
CAPTCHA designs are almost non-existent. To evolve the design of CAPTCHA, a young but important topic, from an art into a science still requires considerable study. Our experience suggests that CAPTCHA will go through the same process of evolutionary development as cryptography, digital watermarking and the like, with an iterative process in which successful attacks lead to the development of more robust systems.
 J. Yan and A.S. El Ahmad, "A Low-Cost Attack on a Microsoft CAPTCHA," Proc. 15th ACM Conf. Computer and Communications Security (CCS 08), ACM Press, 2008, pp. 543–554.
 Luis von Ahn, Manuel Blum, Nicholas J. Hopper and John Langford. The CAPTCHA Web Page: captcha.net. 2000.
Use Search at http://topicideas.net/search.php wisely To Get Information About Project Topic and Seminar ideas with report/source code along pdf and ppt presenaion