Distributional Features for Text Categorization
project report tiger
Active In SP
**

Posts: 1,062
Joined: Feb 2010
#1
10-02-2010, 10:55 PM


Text categorization is the task of assigning predefined categories to natural language text. With the widely used 'bag of words' representation, previous research usually assigns each word a value indicating whether the word appears in the document concerned or how frequently it appears. Although these values are useful for text categorization, they do not fully express the abundant information contained in the document. This paper explores the effect of other types of values, which express the distribution of a word in the document. These novel values are called distributional features, and they include the compactness of the appearances of the word and the position of its first appearance. The proposed distributional features are exploited by a tfidf-style equation, and different features are combined using ensemble learning techniques. Experiments show that the distributional features are useful for text categorization. Compared with using traditional term frequency values alone, including the distributional features requires only a little additional cost, while the categorization performance can be significantly improved. Further analysis shows that the distributional features are especially useful when documents are long and the writing style is casual.
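
To make the "tfidf-style equation" a bit more concrete, here is a minimal sketch. The abstract does not give the formula, so the weighting used here (a per-document value multiplied by log(N/df)) and every name and number in it are assumptions for illustration only, not the paper's actual equation.

```python
import math

def idf(num_docs, doc_freq):
    """Standard inverse document frequency: log(N / df)."""
    return math.log(num_docs / doc_freq)

def tfidf_style_weight(feature_value, num_docs, doc_freq):
    """Weight a word by an arbitrary per-document value times idf.

    feature_value may be a term frequency or a distributional value
    (for example, a first-appearance or compactness score).
    """
    return feature_value * idf(num_docs, doc_freq)

# Hypothetical numbers: a corpus of 1000 documents, word found in 10 of them.
tf_weight = tfidf_style_weight(3, 1000, 10)       # term frequency = 3
first_weight = tfidf_style_weight(0.9, 1000, 10)  # assumed first-appearance score
```

The point of the sketch is only that the same idf factor can scale any per-document value, so a distributional value can be used wherever term frequency normally goes.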
Reply
projectsofme
Active In SP
**

Posts: 1,124
Joined: Jun 2010
#2
07-10-2010, 10:52 AM


.ppt   Text Categorization.ppt (Size: 327 KB / Downloads: 82)
Text Categorization


Foundations of Statistical Natural Language Processing


Task Description


Goal: given a classification scheme, the system decides which class(es) a document belongs to.
This is a mapping from the document space to the classification scheme.
The mapping can be 1 to 1 or 1 to many.
To build the mapping:
Observe the known samples classified in the scheme.
Summarize the features and create rules/formulas.
Decide the classes for new documents according to the rules.
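
The three steps above can be sketched roughly as follows. This is only a toy illustration of the 1-to-1 case: the "rule" used here (pick the class whose word counts overlap most with the new document) and the sample data are assumptions, not something taken from the slides.

```python
from collections import Counter, defaultdict

def tokenize(text):
    """Lowercase and split on whitespace."""
    return text.lower().split()

def train(samples):
    """Summarize known samples: count word occurrences per class."""
    profiles = defaultdict(Counter)
    for text, label in samples:
        profiles[label].update(tokenize(text))
    return profiles

def classify(profiles, text):
    """Pick the class whose profile shares the most word mass with the text."""
    words = tokenize(text)

    def score(label):
        counts = profiles[label]
        total = sum(counts.values())
        return sum(counts[w] for w in words) / total

    return max(profiles, key=score)

# Step 1: known samples already classified in the scheme (made-up data).
samples = [
    ("the team won the match", "sports"),
    ("the player scored a goal", "sports"),
    ("the bank raised interest rates", "finance"),
    ("stocks and interest rates fell", "finance"),
]
# Step 2: summarize the features.
profiles = train(samples)
# Step 3: decide the class of a new document.
label = classify(profiles, "interest rates and stocks")
```

For the 1-to-many case, one option would be to return every class whose score passes a threshold instead of only the best one.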


Reply
raficse
Active In SP
**

Posts: 1
Joined: Jan 2011
#3
18-01-2011, 09:36 PM

Hi friends,
I'm Rafi, doing my final year in Computer Science and Engineering. I need a ppt or pdf file on Distributional Features for Text Categorization.
Reply
project topics
Active In SP
**

Posts: 2,492
Joined: Mar 2010
#4
19-01-2011, 09:59 AM

Check the links below for the ppt and pdf of Distributional Features for Text Categorization:

cs.umass.edu/~ronb/papers/sigir.ppt
lamda.nju.edu.cn/xuexb/files/ecml06DistFeature.pdf
ieeexplore.ieeeiel5/69/4358933/04589210.pdf?arnumber=4589210
Reply
seminar class
Active In SP
**

Posts: 5,361
Joined: Feb 2011
#5
16-03-2011, 12:33 PM


.doc   ITDDM03.doc (Size: 26.5 KB / Downloads: 51)
Distributional Features for Text Categorization
Abstract
Text categorization is the task of assigning predefined categories to natural language text. With the widely used "bag-of-words" representation, previous research usually assigns each word a value that expresses whether the word appears in the document concerned or how frequently it appears. Although these values are useful for text categorization, they are not enough to fully capture the abundant information contained in a document. This project explores the effect of other types of values, which express the distribution of a word in the document. These novel values are called distributional features, and they include the compactness of the appearances of the word and the position of its first appearance. The proposed distributional features are exploited by a tfidf-style equation, and different features are combined using ensemble learning techniques. We conclude that the distributional features are useful for text categorization, especially when they are combined with term frequency or with each other.
Existing system:
The existing system assigns each word a value that expresses whether the word appears in the document concerned or how frequently it appears. Another system uses statistical phrases, each composed of a sequence of words that occur contiguously in text in a statistically interesting way; such a phrase is usually called an n-gram.
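
As a rough illustration of the n-gram idea, the sketch below extracts contiguous word pairs and keeps the ones that repeat. Treating "statistically interesting" as "occurs at least twice" is an assumption made here for simplicity; the example sentence is also made up.

```python
from collections import Counter

def ngrams(tokens, n):
    """Return all contiguous runs of n tokens as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

tokens = "new york is larger than new haven but new york is louder".split()
bigram_counts = Counter(ngrams(tokens, 2))

# Assumed selection rule: keep bigrams seen at least twice.
frequent = [gram for gram, count in bigram_counts.items() if count >= 2]
```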
Existing system disadvantages:
• The existing features are not enough for fully capturing the information contained in a document.
• The performance of the system is comparatively slow.
Proposed system:
The proposed distributional features are exploited by a tfidf-style equation, and different features are combined using ensemble learning techniques. The extraction of the distributional features is implemented efficiently using the inverted index constructed for the corpus. With this type of index, for a given word-document pair, we can obtain not only the frequency of the word but also the positions where the word appears. From the position information and the length of the document, the distribution of the word is constructed and the distributional features are computed.
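
A minimal sketch of the extraction step described above, assuming a word-level positional inverted index. The report does not give the exact feature definitions, so the two values computed here (relative position of the first appearance, and the first-to-last span as a stand-in for compactness) and all function names are assumptions for illustration.

```python
from collections import defaultdict

def build_positional_index(docs):
    """Map word -> {doc_id: [positions]} and record each document's length."""
    index = defaultdict(dict)
    lengths = {}
    for doc_id, text in docs.items():
        tokens = text.lower().split()
        lengths[doc_id] = len(tokens)
        for pos, word in enumerate(tokens):
            index[word].setdefault(doc_id, []).append(pos)
    return index, lengths

def distributional_features(index, lengths, word, doc_id):
    """Compute frequency plus two assumed distributional values."""
    positions = index[word][doc_id]
    length = lengths[doc_id]
    return {
        "frequency": len(positions),
        # Relative position of the first appearance (0.0 = very first token).
        "first_appearance": positions[0] / length,
        # Assumed compactness proxy: span between first and last appearance,
        # relative to document length (smaller = more compact).
        "spread": (positions[-1] - positions[0]) / length,
    }

docs = {"d1": "markets fell today as markets reacted to news about rates"}
index, lengths = build_positional_index(docs)
features = distributional_features(index, lengths, "markets", "d1")
```

Because the positions are stored once in the index, the frequency and both distributional values come from the same lookup, which fits the claim that the extra cost is small.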
Proposed system advantages
• Using distributional features for text categorization requires only a little additional cost.
• Combining traditional term frequency with the distributional features improves the performance of the system.
• The effect of the distributional features is obvious when the documents are long and when the writing style is informal.
Software Requirements:
Operating System: Windows XP
Platform: Visual Studio .NET 2008
Database: SQL Server 2005
Languages: ASP.NET, C#.NET
Hardware Requirements:
Hard Disk: 40 GB
Monitor: 15-inch color with VGA card support
RAM: Minimum 512 MB
Processor: Pentium IV and above (or equivalent)
Processor speed: Minimum 1.4 GHz
Reply
