Integration of Data Mining and Data Warehousing Systems
28-06-2010, 10:23 PM
3/3 M.C.A I Semester
Traditionally, organizations use data tactically, to manage operations. For a competitive edge, strong organizations use data strategically: to expand the business, improve profitability, reduce costs, and market more effectively. Data mining creates information assets that an organization can leverage to achieve these strategic objectives. Data mining is the process of extracting hidden knowledge from large volumes of raw data. We define data mining as "the data-driven discovery and modeling of hidden patterns in large volumes of data."
Data mining, the extraction of hidden predictive information from large databases, is a powerful new technology with great potential to help companies focus on the most important information in their data warehouses. Data might be one of the most valuable assets of your corporation, but only if you know how to reveal the valuable knowledge hidden in the raw data. Data mining allows you to extract diamonds of knowledge from your historical data and to predict the outcomes of future situations.
Data warehousing: integrating data from multiple sources into large warehouses to support on-line analytical processing and business decision making. Data warehousing is needed because of the data explosion problem: automated data collection tools and mature database technology lead to tremendous amounts of data being stored in databases.
A data warehouse is needed:
• To store vast and heterogeneous data for managerial decision making.
• To store data along various dimensions within one warehouse, so that it is easy to analyze the data and make decisions.
A data warehouse is a subject-oriented, integrated, time-variant, and nonvolatile collection of data in support of management's decision-making process.
--- W. H. Inmon
A data warehouse is an architecture, a semantically consistent data store that fulfills different data access and reporting requirements, or an ongoing process that blends data from multiple heterogeneous sources to support the continuing need for structured and/or ad hoc queries, analytical reporting, and decision support.
There are different methods for modeling data warehouses:
Star schema: A single fact table in the middle, connected radially to a number of dimension tables.
Snowflake schema: A refinement of the star schema in which the dimensional hierarchy is represented explicitly by normalizing the dimension tables.
Fact constellations: Multiple fact tables share dimension tables.
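As a rough illustration of the star schema, the sketch below builds one fact table and two dimension tables in an in-memory SQLite database and then aggregates the facts by one dimension. All table names, column names, and rows are hypothetical and chosen only for the example.

```python
import sqlite3

# In-memory database; every table, column, and row here is an assumption
# made up for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE dim_store   (store_id   INTEGER PRIMARY KEY, city TEXT);
-- The fact table sits in the middle and points at each dimension table.
CREATE TABLE fact_sales (
    product_id INTEGER REFERENCES dim_product(product_id),
    store_id   INTEGER REFERENCES dim_store(store_id),
    amount     REAL
);
INSERT INTO dim_product VALUES (1, 'pen'), (2, 'book');
INSERT INTO dim_store   VALUES (1, 'Delhi'), (2, 'Mumbai');
INSERT INTO fact_sales  VALUES (1, 1, 10.0), (1, 2, 20.0), (2, 1, 30.0), (2, 1, 5.0);
""")


def sales_by_city(connection):
    """Join the fact table to the store dimension and total sales per city."""
    return connection.execute("""
        SELECT s.city, SUM(f.amount)
        FROM fact_sales AS f
        JOIN dim_store AS s ON s.store_id = f.store_id
        GROUP BY s.city
        ORDER BY s.city
    """).fetchall()


star_result = sales_by_city(conn)
print(star_result)
```

A snowflake variant would split a dimension such as dim_store further (for example into store and city tables), at the cost of extra joins.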
OLAP: On-Line Analytical Processing:
A multidimensional, logical view of the data. We use OLAP operations for analytical processing of data stored as data cubes in data warehouses. OLAP techniques include interactive analysis of the data, such as drill, pivot, slice and dice, and filter. Analytical modeling includes deriving ratios, variance, etc., and involves measurements or numerical data across many dimensions. Summaries and aggregations are available at every dimension intersection. OLAP methods are useful because they provide the following facilities:
• Forecasting, trend analysis, and statistical analysis.
• Retrieval and display of data in 2D or 3D cross tabs, charts, and graphs, with easy pivoting of the axes.
• Quick responses to queries.
Integration of Data Mining and Data Warehousing:
• A data warehouse provides clean, integrated data for fruitful mining.
• Data mining provides powerful tools for analyzing data stored in data warehouses.
• OLAP can be viewed as a data summarization and simple data mining facility.
• Data mining provides more analysis tools, e.g., association, classification, clustering, pattern-directed analysis, and trend analysis.
• Multi-level knowledge can be mined by integrating with OLAP facilities: mining in multiple data cubes.
In data warehouses, data is stored and operated on using data cube technology.
Data Warehouse Operations:
• Roll-up: Aggregates (summarizes) along a dimension.
• Drill-down: Increases the level of detail along a dimension.
• Slice: Selects a single value on one dimension, giving a subcube with one fewer dimension.
• Dice: Selects a subcube by restricting the values on two or more dimensions.
• Pivot: Reorients a cube by swapping dimensions.
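The operations above can be sketched on a tiny in-memory cube. This is only a toy model, not how a real OLAP engine stores data: the cube cells, the dimension names, and every function name are assumptions made for the example. Slice is taken here as fixing one dimension to a single value and dice as restricting several dimensions at once, which is the common textbook reading.

```python
from collections import defaultdict

# Hypothetical cube: (year, quarter, city, product) -> sales amount
DIMS = ("year", "quarter", "city", "product")
CELLS = {
    ("2009", "Q1", "Delhi", "pen"): 10,
    ("2009", "Q2", "Delhi", "pen"): 15,
    ("2009", "Q1", "Mumbai", "book"): 20,
    ("2010", "Q1", "Delhi", "book"): 30,
    ("2010", "Q1", "Mumbai", "pen"): 5,
}


def roll_up(cells, keep):
    """Sum over every dimension that is not listed in `keep`.

    Calling it again with more dimensions in `keep` acts as a drill-down.
    """
    idx = [DIMS.index(d) for d in keep]
    out = defaultdict(int)
    for key, value in cells.items():
        out[tuple(key[i] for i in idx)] += value
    return dict(out)


def slice_(cells, dim, value):
    """Keep only the cells where one dimension equals a single value."""
    i = DIMS.index(dim)
    return {k: v for k, v in cells.items() if k[i] == value}


def dice(cells, allowed):
    """Keep cells whose values fall in the allowed set for each listed dimension."""
    return {
        k: v for k, v in cells.items()
        if all(k[DIMS.index(d)] in vals for d, vals in allowed.items())
    }


def pivot(table):
    """Swap the two axes of a two-dimensional roll-up result."""
    return {(b, a): v for (a, b), v in table.items()}


by_year = roll_up(CELLS, ("year",))                    # roll-up
by_year_quarter = roll_up(CELLS, ("year", "quarter"))  # drill-down
delhi = slice_(CELLS, "city", "Delhi")                 # slice
delhi_2009 = dice(CELLS, {"year": {"2009"}, "city": {"Delhi"}})  # dice
city_by_year = pivot(roll_up(CELLS, ("year", "city")))           # pivot
```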
Data Mining Functionality:
Data mining offers the following kinds of functionality.
Concept description: Characterization and Comparison:
Generalize, summarize, and possibly contrast data characteristics, e.g., dry vs. wet regions.
Association:
From association, correlation, to causality.
Finding rules like inside(x, city) --> near(x, highway).
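One common way to judge a rule such as inside(x, city) --> near(x, highway) is by its support and confidence. The text above does not name these measures, so treat them as an assumption brought in for the example, along with the record layout and the sample objects.

```python
def rule_stats(records, antecedent, consequent):
    """Return (support, confidence) for the rule antecedent --> consequent.

    support    = fraction of all records where both sides hold
    confidence = fraction of antecedent records where the consequent also holds
    """
    total = len(records)
    ante = sum(1 for r in records if antecedent(r))
    both = sum(1 for r in records if antecedent(r) and consequent(r))
    support = both / total if total else 0.0
    confidence = both / ante if ante else 0.0
    return support, confidence


# Hypothetical spatial objects with two boolean properties.
OBJECTS = [
    {"name": "a", "inside_city": True, "near_highway": True},
    {"name": "b", "inside_city": True, "near_highway": True},
    {"name": "c", "inside_city": True, "near_highway": False},
    {"name": "d", "inside_city": False, "near_highway": True},
    {"name": "e", "inside_city": False, "near_highway": False},
]

support, confidence = rule_stats(
    OBJECTS,
    antecedent=lambda r: r["inside_city"],
    consequent=lambda r: r["near_highway"],
)
print(support, confidence)
```

A rule miner would typically keep only the rules whose support and confidence both exceed user-chosen thresholds.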
Classification and Prediction:
Classify data based on the values in a classifying attribute, e.g., classify countries based on climate, or classify cars based on gas mileage.
Predict unknown or missing attribute values based on other information.
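A minimal sketch of classification, following the car example: cars are labelled by a gas-mileage class, and a new car's class is predicted from its other attributes. The nearest-neighbour rule, the two features, and the training rows are all assumptions picked for brevity; the text does not prescribe a particular classifier.

```python
import math

# Hypothetical training data: (weight_kg, horsepower) -> gas-mileage class
TRAIN = [
    ((900, 60), "high"),
    ((1000, 70), "high"),
    ((1600, 150), "low"),
    ((1800, 200), "low"),
]


def classify(features, train=TRAIN):
    """Predict the class of the closest training example (1-nearest neighbour).

    Features are used unscaled here; real data would usually be normalized first.
    """
    nearest = min(train, key=lambda item: math.dist(item[0], features))
    return nearest[1]


light_car = classify((950, 65))
heavy_car = classify((1700, 180))
print(light_car, heavy_car)
```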
Clustering:
Group data to form new classes, e.g., cluster houses to find distribution patterns.
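The house example can be sketched with a small k-means loop. K-means is just one of many clustering methods and is an assumption here, as are the house coordinates and the choice of starting centers.

```python
import math


def k_means(points, centers, iterations=10):
    """Plain k-means: assign each point to its nearest center, then recompute
    every center as the mean of its group. Runs a fixed number of iterations."""
    groups = [[] for _ in centers]
    for _ in range(iterations):
        groups = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: math.dist(p, centers[i]))
            groups[nearest].append(p)
        # An empty group keeps its previous center.
        centers = [
            tuple(sum(coord) / len(g) for coord in zip(*g)) if g else centers[i]
            for i, g in enumerate(groups)
        ]
    return centers, groups


# Hypothetical house locations (x, y) forming two visible neighbourhoods.
HOUSES = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0), (8.0, 8.0), (9.0, 8.0), (8.0, 9.0)]
centers, groups = k_means(HOUSES, centers=[HOUSES[0], HOUSES[3]])
print(centers)
```

Unlike classification, no class labels are supplied: the two groups emerge from the distances alone.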
Trend and deviation analysis:
Find and characterize evolution trends, sequential patterns, similar sequences, and deviating data, e.g., stock analysis.
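As a rough sketch of deviation analysis on a stock-like series, the code below compares each value with the moving average of the values before it and flags large relative differences. The window size, the 20% threshold, and the prices are assumptions for illustration only.

```python
def deviations(series, window=3, threshold=0.2):
    """Flag points that differ from the mean of the previous `window` values
    by more than `threshold` (as a fraction of that mean)."""
    flagged = []
    for i in range(window, len(series)):
        avg = sum(series[i - window:i]) / window
        if abs(series[i] - avg) / avg > threshold:
            flagged.append((i, series[i], avg))
    return flagged


# Hypothetical daily closing prices with one spike.
PRICES = [100, 102, 101, 103, 140, 104, 105]
flagged = deviations(PRICES)
print(flagged)
```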
Similarity-based pattern-directed analysis:
Find and characterize user-specified patterns in large databases.
Find segment-wise or total cycles or periodic behaviors in time-related data.
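For the simplest case of a total cycle, where the whole series repeats exactly, the period can be found by brute force. This sketch assumes exact repetition; noisy or segment-wise periodicity needs a more tolerant method. The function name and the sample readings are hypothetical.

```python
def smallest_period(series):
    """Return the smallest shift p at which the series matches itself exactly,
    or None if no such shift exists."""
    n = len(series)
    for p in range(1, n):
        if all(series[i] == series[i + p] for i in range(n - p)):
            return p
    return None


# Hypothetical readings that repeat every three steps.
READINGS = [3, 5, 8, 3, 5, 8, 3]
period = smallest_period(READINGS)
print(period)
```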
Data Mining Applications:
-> Numerous data mining applications:
- Querying database knowledge
- Multi-level data browsing
- Performance prediction
- Market analysis
- Database design and query optimization
- Intelligent query answering
-> Intelligent Query Answering:
- Extended data model: schemas, hierarchies, multi-layered databases, generalized relations/cubes, data mining tools.
- Intelligent answering: multi-level summaries and statistics, neighborhood information, 'roll-up' and 'drill-down' facilities.
• Data mining: A rich, promising, young field with broad applications and many challenging research issues.
• Recent progress: Database-oriented, efficient data mining methods in relational and transaction databases.
• Tasks: Characterization, association, classification, clustering, sequence and pattern analysis, prediction, and many other tasks.
• Domains: Data mining in extended-relational, transaction, object-oriented, spatial, temporal, document, multimedia, heterogeneous, and legacy databases, and the WWW.
• Technology integration:
- Database, data mining, and data warehousing technologies.
- Other fields: machine learning, statistics, neural networks, information theory.