distributed database full report
project report tiger
12-02-2010, 10:23 AM

Attachment: Distributed Databases.doc (478.5 KB)

Today's business environment has an increasing need for distributed database and client/server applications, as the demand for reliable, scalable and accessible information is steadily rising. Distributed database systems improve communication and data processing because the data are distributed across different network sites. Data access is faster, a single point of failure is less likely, and users keep local control of their data. However, managing and controlling a distributed database system is more complex.
The DDBMS synchronizes all the data periodically, and in cases where multiple users must access the same data, ensures that updates and deletes performed on the data at one location will be automatically reflected in the data stored elsewhere. A distributed database can also be defined as a collection of multiple, logically interrelated databases distributed over a computer network.
A distributed database management system is then defined as the software system that permits the management of the distributed databases and makes this distribution transparent to the users. A distributed database system (DDBS) is the combination of the distributed databases and the distributed DBMS. Current trends in multi-tier client/server networks make a DDBS an appropriate solution for providing access to, and control over, localized databases. Oracle, a leading Database Management System (DBMS) vendor, employs the two-phase commit technique to keep the database in a consistent state.
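The two-phase commit idea mentioned above can be illustrated with a minimal in-memory sketch. It is not Oracle's implementation. The `Participant` class, its `can_commit` flag and the `two_phase_commit` function are hypothetical names chosen for illustration. Real systems also force-write log records at each step and must handle coordinator and site failures, which this sketch omits.

```python
from dataclasses import dataclass, field


@dataclass
class Participant:
    """A hypothetical site taking part in a distributed transaction."""
    name: str
    can_commit: bool = True  # illustrative flag: does this site vote YES?
    state: str = "INITIAL"
    log: list = field(default_factory=list)

    def prepare(self) -> bool:
        # Phase 1 (voting): a YES vote promises the site can commit later.
        if self.can_commit:
            self.state = "PREPARED"
            self.log.append("prepare")
            return True
        # A NO vote lets the site abort unilaterally.
        self.state = "ABORTED"
        self.log.append("abort")
        return False

    def finish(self, decision: str) -> None:
        # Phase 2 (decision): apply the coordinator's global outcome.
        self.state = decision
        self.log.append(decision.lower())


def two_phase_commit(participants):
    """Coordinator: commit only if every participant votes YES."""
    votes = [p.prepare() for p in participants]
    decision = "COMMITTED" if all(votes) else "ABORTED"
    for p in participants:
        if p.state == "PREPARED":  # sites that voted NO have already aborted
            p.finish(decision)
    return decision


sites = [Participant("london"), Participant("paris", can_commit=False)]
print(two_phase_commit(sites))   # ABORTED
print([p.state for p in sites])  # ['ABORTED', 'ABORTED']
```

Because the decision is taken only after all votes are in, every site ends in the same state. A single NO vote is enough to abort the transaction everywhere.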
Distributed database systems were first used in mainframe environments in the 1950s and 1960s. However, they have flourished best since the 1980s and 1990s. The development in those decades of minicomputers and powerful desktop and workstation computers, along with fast, high-capacity telecommunications, made it (relatively) easy and cheap to distribute computing facilities widely.
Users have access to the portion of the database at their location, so they can access the data relevant to their tasks without interfering with the work of others. A centralized distributed database management system (DDBMS) manages the database as if it were all stored on the same computer. The DDBMS synchronizes all the data periodically and, in cases where multiple users must access the same data, ensures that updates and deletes performed on the data at one location will be automatically reflected in the data stored elsewhere. A distributed database is a database that is under the control of a central database management system in which storage devices are not all attached to a common CPU.
Distributed database technology is one of the most important developments of the past decades. The maturation of database management system (DBMS) technology has coincided with significant developments in distributed computing and parallel processing technologies. The result is the emergence of distributed DBMSs and parallel DBMSs. These systems have started to become the dominant data management tools for highly data-intensive applications. The basic motivations for distributing databases are improved performance, increased availability, shareability, expandability, and access flexibility. Although there have been many research studies in these areas, only some commercial systems provide the full functionality for distributed transaction processing. Important issues addressed in these studies are database placement in the distributed environment, distributed query processing, distributed concurrency control algorithms, reliability and availability protocols, and replication strategies.
For general purposes, a database is a collection of data that is stored and maintained at one central location. A database is controlled by a database management system. The user interacts with the database management system in order to use the database and transform data into information. Furthermore, a database offers many advantages over a simple file system with regard to speed, accuracy, and accessibility, such as shared access, minimal redundancy, data consistency, data integrity, and controlled access. All of these aspects are enforced by the database management system. With this background, let us review some of the many different types of databases.
A distributed database is a database that consists of two or more data files located at different sites on a computer network. Because the database is distributed, different users can access it without interfering with one another. However, the DBMS must periodically synchronize the scattered databases to make sure that they all have consistent data. A distributed database is under the control of a central database management system, but its storage devices are not all attached to a common CPU. It may be stored in multiple computers located in the same physical location, or it may be dispersed over a network of interconnected computers. In this way, collections of data can be distributed across multiple physical locations, and the distributed database is divided into separate partitions (fragments).
Besides distributed database replication and fragmentation, there are many other distributed database design technologies, for example local autonomy and synchronous and asynchronous distributed database technologies. Which of these technologies is implemented depends on the needs of the business and on the sensitivity/confidentiality of the data to be stored in the database. It therefore also depends on how much the business is willing to spend on ensuring data security, consistency and integrity.
Databases have firmly moved from the realm of research and experimentation into the commercial world. In this area we address issues related to distributed databases, including transaction management, concurrency, recovery, fault tolerance, security, and mobility. The theory and practice of databases play a prominent role in these pages.
A distributed database system consists of a collection of sites, connected together via some communication network, in which
a. Each site is a full database system site in its own right.
b. The sites have agreed to work together so that a user at any site can access data anywhere in the network in a transparent manner.
Distributed database system (DDBS) = Databases + Computers + Computer Network + Distributed database management system (DDBMS)
A distributed database system can be simply defined as a collection of multiple logically interrelated databases distributed over a computer network and managed by a distributed database management system.
The first generation of data processing is decentralized and unintegrated. Data are stored in individual files, and data specifications are embedded in the programs that manipulate the data. Files are therefore not shared, and any change in a file structure affects the data specifications in the programs. The second generation of data processing is centralized and integrated. Data are stored in a centralized database, and data specifications are stored in a centralized location, normally the same location as the database. The advantage of this model is that changes to the database may affect the data specifications but not the programs. The third generation of data processing is distributed and integrated. Data and their local specifications are distributed across a network, and there is also a global view of all the data stored in the network.
What is a Distributed Database Management System?
A DDBMS (distributed database management system) is a centralized application that manages a distributed database as if it were all stored on the same computer. The DDBMS synchronizes all the data periodically and, in cases where multiple users must access the same data, ensures that updates and deletes performed on the data at one location will be automatically reflected in the data stored elsewhere. A distributed database can also be defined as a collection of multiple, logically interrelated databases distributed over a computer network. A distributed database management system (distributed DBMS) is then defined as the software system that permits the management of the distributed databases and makes this distribution transparent to the users. A distributed database system is the combination of the distributed databases and the distributed DBMS.
The implicit assumptions in a distributed database system are:
• Data is physically stored across several sites.
• Each site is typically managed by a DBMS that is capable of running independently of the other sites.
Homogeneous: Every site runs the same type of DBMS.
Heterogeneous: Different sites run different DBMSs.
In a Homogeneous distributed database
1. All sites use identical software.
2. All sites are aware of each other and agree to cooperate in processing user requests.
3. Each site in the network surrenders part of its autonomy in terms of right to change schemas or software.
4. It appears to the user as a single system.
In a Heterogeneous distributed database
1. Different sites may use different schemas and software.
• Difference in schema is a major problem for query processing.
• Difference in software is a major problem for transaction processing.
2. Sites may not be aware of each other and may provide only limited facilities for cooperation in transaction processing.
Advantages of Distributed Databases
1. Capacity and incremental growth
New nodes can be added to the computer network easily, without much added complexity.
2. Reliability and availability
Because data are replicated at several nodes, the failure of one node still allows access to a replicated copy of the data at another node. Replication also avoids transporting data from one site to another.
3. Reduced communication overhead
4. Data is stored close to the anticipated point of use.
5. Efficiency and Flexibility
6. Data can be dynamically moved or replicated to where it is most needed.
a. This is useful in the event of a system crash. The efficiency of the system increases because of the availability of data and the speed at which data become available to the requesting client.
7. In the early days, data were stored in file systems, but file systems have a large number of disadvantages. Databases are used to overcome the limitations of file systems and to make storing and retrieving data easy.
Several types of failures may occur in distributed database systems:
Transaction Failures: When a transaction fails, it aborts, and the database must be restored to the state it was in before the transaction started. Transactions may fail for several reasons. Some failures may be due to deadlock situations or to concurrency control algorithms.
Site Failures: Site failures are usually due to software or hardware failures. These failures result in the loss of the main memory contents. In distributed database, site failures are of two types:
1. Total Failure where all the sites of a distributed system fail.
2. Partial Failure where only some of the sites of a distributed system fail.
Media Failures: These are failures of secondary storage devices. They make part or all of the database stored on such secondary storage inaccessible.
Communication Failures: As the name implies, these are failures in the communication system between two or more sites. They can lead to network partitioning, in which each site, or group of sites, operates independently. Messages from one site then do not reach the other sites and are lost. Reliability protocols use a timeout mechanism to detect undelivered messages. A message is considered undelivered if the sender does not receive an acknowledgment. The failure of a communication network to deliver messages is known as a performance failure.
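The timeout mechanism described above can be sketched as a sender that records when each message was sent and forgets the message once its acknowledgment arrives. The `AckTracker` class and its methods are hypothetical. The clock value is passed in explicitly so that the example is deterministic.

```python
class AckTracker:
    """Hypothetical sender-side record of messages awaiting acknowledgment."""

    def __init__(self, timeout: float):
        self.timeout = timeout
        self.pending = {}  # message id -> time the message was sent

    def sent(self, msg_id: int, now: float) -> None:
        self.pending[msg_id] = now

    def acknowledged(self, msg_id: int) -> None:
        # An acknowledgment arrived, so the message was delivered.
        self.pending.pop(msg_id, None)

    def undelivered(self, now: float) -> list:
        # A message counts as undelivered once its acknowledgment is overdue.
        return sorted(m for m, t in self.pending.items() if now - t > self.timeout)


tracker = AckTracker(timeout=2.0)
tracker.sent(1, now=0.0)
tracker.sent(2, now=0.5)
tracker.acknowledged(1)
print(tracker.undelivered(now=3.0))  # [2]
```

A timeout alone cannot tell whether the receiving site crashed or the network was partitioned. It only reports that no acknowledgment arrived in time.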
In 1987 one of the founders of relational database theory, C. J. Date, stated 12 goals which, he held, designers should strive to achieve in their DDBs and with the associated DBMSs:
1. Local site independence
2. Central site independence
3. Failure independence
4. Location transparency
5. Fragmentation transparency
6. Replication transparency
7. Distributed query processing
8. Distributed transaction processing
9. Hardware independence
10. Operating system independence
11. Network independence
12. Database independence
Local site independence: Each site in the DDB should act independently with respect to vital DBMS functions:
• Security
• Concurrency control
• Backup
• Recovery
Central site independence: Each site in the DDB should act independently with respect to:
• The central site
• All other remote sites
Note: All sites should have the same capabilities, even though some sites may not necessarily exercise all these capabilities at a given point in time.
Failure independence: The DDBMS should be unaffected by the failure of a node or nodes; the rest of the nodes, and the DDBMS as a whole, should continue to work. Note: In similar fashion, the DDBMS should continue to work if new nodes are added.
Location transparency: Users should not have to know the location of a datum in order to retrieve it.
Fragmentation transparency: The user should be unaffected by, and not even notice, any fragmentation of the DDB. The user can retrieve data without regard to the fragmentation of the DDB.
Replication transparency: The user should be able to use the DDB without being concerned in any way with the replication of the data in the DDB.
Distributed query processing: A query should be capable of being executed at any node in the DDBMS that contains data relevant to the query. Many nodes may participate in the response to the user's query without the user's being aware of such participation.
Distributed transaction processing: A transaction may access and modify data at several different sites in the DDB without the user's being aware that multiple sites are participating in the transaction.
Hardware independence: The DDB and its associated DDBMS should be capable of being implemented on any suitable platform, i.e., on any computer with appropriate hardware resources regardless of what company manufactured the computer. Note: Current DBMSs often fail to achieve this goal.
Operating system independence: The DDB and its associated DDBMS should be capable of being implemented on any suitable operating system, i.e., on any operating system capable of handling multiple users.
Note: At present this means Windows NT and 2000, and the various varieties of UNIX including Linux.
Network independence: The DDB and its associated DDBMS should be capable of being implemented on any suitable network platform.
Note: At present, this goal means that the DDBMS should be able to run on Windows NT, on Windows 2000, on any variant of UNIX, and on Novell Networks.
Database independence: The design of the DDB should allow it to be supported by any suitable DDBMS (i.e., one of sufficient power and sophistication) from any vendor.
Distributing a database across a network increases the complexity of the system implementation. To achieve the benefits of a distributed database described earlier, the distributed database management software should be able to perform the following functions in addition to the basic functions performed by a non-distributed DBMS:
• Distributed query processing - the ability to access remote sites and to transmit queries and data among the various sites over the communication network.
• Data tracking - the ability to keep track of data distribution, fragmentation and replication by expanding the distributed DBMS catalog.
• Distributed transaction management - the ability to devise execution strategies for queries and transactions that access data at more than one site, to synchronize access to distributed data, and to maintain the integrity of the overall database.
• Replicated data management - the ability to decide which copy of a replicated data item to access and to maintain the consistency of the copies of a replicated data item.
• Distributed data recovery - the ability to recover from individual site crashes and from failures of communication links.
• Security - distributed transactions must be executed with proper management of data security and of users' authorization and access privileges.
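Data tracking and replicated data management can be sketched with a small catalog that records which sites hold a copy of each item. The source does not name a replica-control policy, so this sketch assumes the simple read-one/write-all scheme. A read may use any available copy, and an update must reach every copy. `ReplicaCatalog` and its methods are hypothetical.

```python
class ReplicaCatalog:
    """Hypothetical global catalog: which sites hold a copy of each item."""

    def __init__(self):
        self.locations = {}  # item -> list of sites holding a replica
        self.stores = {}     # site -> that site's local data
        self.up = set()      # sites currently reachable

    def add_site(self, site):
        self.stores[site] = {}
        self.up.add(site)

    def place(self, item, sites):
        self.locations[item] = list(sites)

    def read(self, item):
        # Read-one: use the first replica whose site is available.
        for site in self.locations[item]:
            if site in self.up:
                return self.stores[site].get(item)
        raise RuntimeError(f"no available replica of {item!r}")

    def write(self, item, value):
        # Write-all: refuse the update unless every replica can be written,
        # so that the copies never diverge.
        sites = self.locations[item]
        if not all(s in self.up for s in sites):
            raise RuntimeError(f"cannot reach every replica of {item!r}")
        for s in sites:
            self.stores[s][item] = value


catalog = ReplicaCatalog()
for s in ("A", "B"):
    catalog.add_site(s)
catalog.place("x", ["A", "B"])
catalog.write("x", 10)
catalog.up.discard("A")     # site A fails
print(catalog.read("x"))    # 10, served by site B
```

The trade-off is visible here. Reads survive a site failure, but writes block until every replica is reachable. Other replica-control schemes relax this.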
In this section we examine the components of a distributed database system. One of the main components in a DDBMS is the Database Manager, the software responsible for processing a segment of the distributed database. Another main component is the User Request Interface, which is usually a client program that acts as an interface to the Distributed Transaction Manager. The Distributed Transaction Manager is a program that translates requests from the user into actionable requests for the database managers, which are typically distributed. A distributed database system is made up of both the distributed transaction manager and the database managers.
1) Client-Server Architecture is an arrangement of components (clients and servers) among computers connected by a network.
2) Client-server architecture supports efficient processing of messages (requests for service) between clients and servers.
1) Two-Tier Architecture
Courtesy: blueportal.org
Author: Michael Mannino, Publication: Tata McGraw Hill, 2004
2) Three-Tier Architecture
To improve performance, the three-tier architecture adds another server layer, either a middleware server or an application server.
- The additional server software can reside on a separate computer.
- Alternatively, the additional server software can be distributed between the database server and PC clients.
3) Multiple-Tier Architecture
A client-server architecture with more than three layers: a PC client, a back-end database server, an intervening middleware server, and application servers. It provides more flexibility in the division of processing.
The application servers perform business logic and manage specialized kinds of data such as images.
Advantages of Client-Server Architecture
1. More efficient division of labour
2. Horizontal and vertical scaling of resources
3. Better price/performance on client machines
4. Ability to use familiar tools on client machines
5. Client access to remote data (via standards)
6. Full DBMS functionality provided to client workstations
7. Overall better system price/performance
Transaction Management deals with the problems of keeping the database in a consistent state even when concurrent accesses and failures occur.
What is a Transaction?
A transaction consists of a series of operations performed on a database. The important issue in transaction management is that if a database was in a consistent state prior to the initiation of a transaction, then the database should return to a consistent state after the transaction is completed. This should hold regardless of whether transactions were executed concurrently or failures occurred during their execution. Thus, a transaction is a unit of consistency and reliability. The properties of transactions will be discussed later in the properties section.
Each transaction has to terminate. The outcome of the termination depends on the success or failure of the transaction. When a transaction starts executing, it may terminate with one of two possibilities:
1. The transaction aborts if a failure occurred during its execution
2. The transaction commits if it was completed successfully.
Figure 1a shows an example of a transaction that aborts during process 2 (P2). On the other hand, Figure 1b shows an example of a transaction that commits, since all of its processes are successfully completed.
[Figure 1: Aborted and committed transactions]
Properties of Transactions
A Transaction has four properties that lead to the consistency and reliability of a distributed database.
These are Atomicity, Consistency, Isolation, and Durability.
Atomicity: This refers to the fact that a transaction is treated as a unit of operation. Consequently, either all the actions related to a transaction are completed or none of them is carried out. For example, in the case of a crash, the system should either complete the remainder of the transaction or undo all the actions pertaining to it. The recovery of the transaction is split into two types corresponding to the two types of failures:
the transaction recovery, which is due to the system terminating one of the transactions because of deadlock handling; and the crash recovery, which is done after a system crash or a hardware failure.
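Transaction recovery by undoing changes can be sketched with an in-memory store that logs the old value of each item before changing it. If any step fails, the store restores those old values in reverse order. This is a single-site illustration of atomicity only, not crash recovery. `UndoLogStore`, `credit` and `debit` are hypothetical names.

```python
_MISSING = object()  # marks items that did not exist before the transaction


class UndoLogStore:
    """Hypothetical store that keeps an undo log to make each transaction atomic."""

    def __init__(self, data):
        self.data = dict(data)

    def execute(self, steps):
        """steps: list of (key, update_fn); update_fn(old) returns the new value or raises."""
        undo_log = []
        try:
            for key, update in steps:
                old = self.data.get(key, _MISSING)
                new = update(None if old is _MISSING else old)
                undo_log.append((key, old))  # remember the before-image
                self.data[key] = new
        except Exception:
            # Abort: undo every change in reverse order, as if none happened.
            for key, old in reversed(undo_log):
                if old is _MISSING:
                    del self.data[key]
                else:
                    self.data[key] = old
            return "ABORT"
        return "COMMIT"


def credit(amount):
    return lambda balance: balance + amount


def debit(amount):
    def apply(balance):
        if balance < amount:
            raise ValueError("insufficient funds")
        return balance - amount
    return apply


store = UndoLogStore({"A": 100, "B": 0})
print(store.execute([("B", credit(50)), ("A", debit(50))]))  # COMMIT
print(store.execute([("B", credit(80)), ("A", debit(80))]))  # ABORT
print(store.data)  # {'A': 50, 'B': 50}, the credit to B was undone
```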
Consistency: This property refers to the correctness of a transaction and deals with maintaining consistent data in a database system. Consistency falls under the subject of concurrency control. For example, "dirty data" is data that has been modified by a transaction that has not yet committed. The job of concurrency control is therefore to prevent transactions from reading or updating "dirty data."
Isolation: According to this property, each transaction should see a consistent database at all times. Consequently, no other transaction can read or modify data that is being modified by another transaction.
If this property is not maintained, one of two things could happen to the database, as shown in Figure 2:
a. Lost Updates: this occurs when another transaction (T2) updates the same data being modified by the first transaction (T1) in such a manner that T2 reads the value before T1 writes it. T1's update is therefore lost.
b. Cascading Aborts: this problem occurs when the first transaction (T1) aborts. The transactions that had read or modified data used by T1 must then also abort.

(a) Lost Updates
Time      T1         T2
Time 1    Read x
Time 2               Read x
Time 3    Write x
Time 4               Write x

(b) Cascading Aborts
Time      T1         T2
Time 1    (...)
Time 2               (...)
Time 3    ABORT      ABORT

Figure 2: Isolation
Durability: This property ensures that once a transaction commits, its results are permanent and cannot be erased from the database. Whatever happens after the COMMIT of a transaction, whether a system crash or the abort of other transactions, the results already committed are not modified or undone.
Concurrent Use of a Database: Problems
A common potential problem is that two processes may try to update the database in incompatible ways, e.g.
• Process A reads a copy of record R into memory,
• Process B reads a copy of record R into memory,
• Process A commits an updated version of record R,
• Process B commits an updated version of record R, obliterating process A's amendment.
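The interleaving above can be replayed deterministically to show the lost update. Each process works on its own in-memory copy of record R. `run_schedule` and the schedule format (process, action, change) are hypothetical, chosen only for this illustration.

```python
def run_schedule(schedule, initial):
    """Replay an interleaved schedule of read/write steps against one shared record R."""
    record = initial
    local = {}  # each process's private in-memory copy of record R
    for process, action, change in schedule:
        if action == "read":
            local[process] = record
        elif action == "write":
            # The process commits its copy plus its own change.
            record = local[process] + change
    return record


interleaved = [
    ("A", "read", 0),
    ("B", "read", 0),
    ("A", "write", 10),  # A commits R + 10
    ("B", "write", 5),   # B commits R + 5 from a stale copy, overwriting A
]
serial = [
    ("A", "read", 0),
    ("A", "write", 10),
    ("B", "read", 0),
    ("B", "write", 5),
]
print(run_schedule(interleaved, 100))  # 105: A's +10 is lost
print(run_schedule(serial, 100))       # 115
```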
Other related problems arise when one process attempts to alter the structure of a table which another is updating, or when two processes generate duplicate index values for a pair of records which should have unique keys. The need for "read consistency" must also be considered - when process A is generating a report based on a series of values in the database, its results may be falsified if process B changes some of them between the start and end of the transaction. Oracle actually handles this last problem by way of its rollback segments, but it can be approached in the same way as the others, using a system of locking.
A lock can be thought of as assigning a user or process temporary ownership of a database resource. While the lock exists, no other user or process may access the record. So, to safeguard against the lost update described above:
• Process A reads record R into the memory and acquires a lock on it,
• Process B tries to read record R into memory but is prevented from doing so,
• Process A commits an updated version of record R and releases the lock on it,
• Process B tries to read record R into memory again - this time successfully.
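The locking sequence above can be sketched with Python's `threading.Lock` standing in for a DBMS exclusive lock on a record. This is an analogy, not how any particular DBMS implements locks. `LockedRecord` is a hypothetical name.

```python
import threading


class LockedRecord:
    """Hypothetical record guarded by an exclusive lock."""

    def __init__(self, value):
        self.value = value
        self._lock = threading.Lock()

    def update(self, change):
        # Acquire the lock before reading and release it only after writing back,
        # so no other process can read a copy that is about to become stale.
        with self._lock:
            copy = self.value
            self.value = copy + change


r = LockedRecord(100)
workers = [threading.Thread(target=r.update, args=(c,)) for c in (10, 5)]
for w in workers:
    w.start()
for w in workers:
    w.join()
print(r.value)  # 115, whichever process gets the lock first
```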
There are no commands for locking in standard SQL, and the syntax provided by any particular RDBMS will vary according to how it handles the locking process. In discussing this topic, two general pieces of terminology are used:
Shared / Exclusive locks.
Exclusive locks are set when it is intended to write or modify part of the database. While a resource is exclusively locked, other processes may query but not change it. Oracle automatically sets an exclusive lock on the relevant records before executing INSERT, DELETE or UPDATE, but it sometimes proves necessary for programs to lock explicitly in complex transactions.
As with any other situation where computer processes contend for resources, database locking gives rise to the potential problem of "deadlock". Suppose processes A and B both need to update records R1 and R2:
• Process A reads record R1 into memory and acquires a lock on it.
• Process B reads record R2 into memory and acquires a lock on it.
• Process B tries to read record R1 into memory but is prevented from doing so, going into a "wait" state.
• Process A tries to read record R2 into memory but is prevented from doing so, going into a "wait" state.
• Both processes wait forever for each other.
The DBMS should be able to detect potential deadlocks by maintaining "wait-for" graphs showing which processes are waiting and for what resources. If a cycle is detected in the graph, the DBMS must arbitrarily select one of the offending transactions and roll it back so that the other one can proceed, then re-execute the one which was rolled back. Oracle can detect deadlocks although some less sophisticated systems cannot - in that case it becomes the responsibility of the programmer to handle it.
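Cycle detection in a wait-for graph can be sketched with a depth-first search. Each transaction maps to the transactions it is waiting for. A back edge to a transaction still on the search stack closes a cycle, which means a deadlock. `find_deadlock` is a hypothetical helper. The choice of victim among the returned transactions is left to the caller, matching the "arbitrarily select" step in the text.

```python
def find_deadlock(wait_for):
    """Return transactions forming a cycle in the wait-for graph, or None."""
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the current path, finished
    colour = {}
    stack = []

    def visit(t):
        colour[t] = GREY
        stack.append(t)
        for u in wait_for.get(t, ()):
            c = colour.get(u, WHITE)
            if c == GREY:  # back edge: u is waiting, directly or not, on t
                return stack[stack.index(u):]
            if c == WHITE:
                cycle = visit(u)
                if cycle:
                    return cycle
        stack.pop()
        colour[t] = BLACK
        return None

    for t in list(wait_for):
        if colour.get(t, WHITE) == WHITE:
            cycle = visit(t)
            if cycle:
                return cycle
    return None


# Process A waits for B (lock on R2); B waits for A (lock on R1).
print(find_deadlock({"A": ["B"], "B": ["A"]}))  # ['A', 'B']
print(find_deadlock({"A": ["B"], "B": []}))     # None
```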
One such discipline is that each transaction begins by explicitly locking ALL the resources it will need before issuing any update commands. The tables must always be named in the same predefined order. After each locking statement, a test is made to see whether it succeeded or whether the resource was already locked. Any failure causes the whole transaction to be rolled back (with consequent release of locks), after which the attempt to execute the transaction is repeated.
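The lock-everything-first discipline can be sketched with Python locks. The resources are requested in one fixed (sorted) order, and each request is a non-blocking attempt. Any failure releases everything held so far and retries the transaction. `run_with_all_locks` and its `retries`/`backoff` parameters are hypothetical. `acquire(blocking=False)` is used here only as an analogy for a NOWAIT-style request.

```python
import threading
import time


def run_with_all_locks(locks, names, body, retries=100, backoff=0.001):
    """Hypothetical sketch: lock ALL needed resources before doing any update.

    locks: dict of resource name -> threading.Lock
    names: the resources this transaction needs
    body:  function performing the updates once every lock is held
    """
    ordered = sorted(names)  # always request resources in one predefined order
    for _ in range(retries):
        held = []
        for name in ordered:
            if locks[name].acquire(blocking=False):  # test whether the lock succeeded
                held.append(name)
            else:
                break
        if len(held) == len(ordered):
            try:
                return body()  # every resource secured: safe to update
            finally:
                for name in reversed(held):
                    locks[name].release()
        # A resource was already locked: roll back (release everything) and retry.
        for name in reversed(held):
            locks[name].release()
        time.sleep(backoff)
    raise RuntimeError("could not obtain all locks")


locks = {"R1": threading.Lock(), "R2": threading.Lock()}
print(run_with_all_locks(locks, ["R2", "R1"], lambda: "updated"))  # updated
```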
Even with Oracle, the default action on locking and deadlock detection may be risky or inefficient in some circumstances and may then need to be handled explicitly by the program. Oracle SQL provides a LOCK TABLE command, which specifies the lock MODE (share, exclusive, row share, row exclusive, etc.) and may include a NOWAIT directive requesting that control be returned immediately to the user process, whatever the outcome. A database condition code returns information about success or failure.
Recovery after Failure
As already indicated, a DBMS must provide mechanisms for recovery after failures of various kinds, which might have corrupted the database or left it in an inconsistent state. In the Oracle context, several levels of failure are identified:
• Statement failure: simply causes the relevant transaction to be rolled back and the database returned to its previous state.
• Process failure: e.g. abnormal disconnection from a SQL*Plus session. Once again, this is handled automatically by rolling back transactions and releasing resources.
• Instance failure: a crash in the DBMS software, operating system or hardware. This requires action by the database administrator, who must issue a SHUTDOWN ABORT command to trigger the automatic instance recovery procedures.
• Media failure: one or more database files are corrupted, for instance after a disc head crash. This is potentially the most serious failure, as it may have destroyed some of the files needed for recovery, such as the "redo" log. A previous version of the database must be restored from another storage device.
Instance Recovery
Oracle's handling of physical database updating with the LRU algorithm means that the state of the database at the time of failure is quite complex. It may contain:
• The results of committed transactions stored on disc.
• The results of committed transactions stored in memory buffers.
• The results of uncommitted transactions stored on disc.
• The results of uncommitted transactions stored in memory buffers.
There will also be a number of rollback segments for uncommitted transactions, and a "redo log" of committed transactions. Both of these hold only the most recent data. How many transactions they cover, and how far back they go, depend on the disc space allocated to them by the DBA.
The shutdown procedure discards the contents of memory buffers, so recovery uses only information stored on disc. It consists of:
1. ROLLING FORWARD: re-applying the committed transactions recorded in the log. This holds "before" and "after" images of the updated records, which can be compared with, and if necessary used to change, the current database contents.
2. ROLLING BACK any uncommitted transactions already written to the database, using the stored rollback segments.
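The two recovery steps can be sketched over a simplified log. Each hypothetical log record carries the transaction id with the before- and after-images of one record, following the description of the log above. Oracle itself keeps undo information in rollback segments rather than in the redo log, so this is an illustration of the idea, not Oracle's procedure. `recover` is a hypothetical function.

```python
def recover(disk, log, committed):
    """Hypothetical instance recovery using on-disc state only.

    disk:      dict of record -> value as found on disc after the crash
    log:       list of (txn, record, before_image, after_image), oldest first
    committed: set of transaction ids whose COMMIT reached the log
    """
    db = dict(disk)
    # 1. Roll forward: re-apply after-images of committed transactions, in log order.
    for txn, rec, before, after in log:
        if txn in committed:
            db[rec] = after
    # 2. Roll back: restore before-images of uncommitted transactions, newest first.
    for txn, rec, before, after in reversed(log):
        if txn not in committed:
            db[rec] = before
    return db


# T1 committed x: 1 -> 5, but the new value was still in a memory buffer (lost).
# T2 never committed, yet its write y: 2 -> 99 had already reached the disc.
disk = {"x": 1, "y": 99}
log = [("T1", "x", 1, 5), ("T2", "y", 2, 99)]
print(recover(disk, log, committed={"T1"}))  # {'x': 5, 'y': 2}
```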
Media recovery
Recovering a database after a disc failure involves restoring a previously backed-up version from tape. Oracle provides the DBA with a full database EXPORT / IMPORT mechanism for this purpose. It is also necessary to back up the log files and control files, which do not form part of the database proper.
Any work done on the database since the backup will be lost unless the transaction log is archived to tape, rather than having its data overwritten when the allocated disc area is full. The DBA may choose whether or not to keep complete log archives - this is a trade-off between extra time / space / administrative overheads and the cost of having to re-enter transactions manually. Log archiving provides the power to do on-line backups without shutting down the system, and full automatic recovery. It is obviously a necessary option for operationally critical database applications.
The most commonly cited promises (advantages) of distributed DBMSs are:
- Transparent management of distributed, fragmented, and replicated data.
- Improved reliability/availability through distributed transactions.
- Improved performance.
- Easier and more economical system expansion.
Transparency is the separation of the higher-level semantics of a system from the lower-level implementation issues. In a transparent system, the system hides the implementation details from users. The advantage of a fully transparent DBMS is the high level of support that it provides for the development of complex applications. The fundamental issue to address in a DBMS is data independence, whereby application programs are immune to changes in the logical or physical organization of data.
Distributed database technology extends the concept of data independence to environments where data are distributed and replicated over a number of machines connected by a network. This is provided by several forms of transparency, such as network (or distribution) transparency, replication transparency, and fragmentation transparency. Thus database users see a logically integrated, single-image database even if it is physically distributed, which enables them to access the distributed database as if it were a centralized one.
In its ideal form, full transparency would imply a query language interface to the distributed DBMS that is no different from that of a centralized DBMS. In practice, most commercial distributed DBMSs do not provide a sufficient level of transparency.
For instance, some systems require remote login to the DBMS for replication. Some distributed DBMSs attempt to establish a transparent naming scheme, while others require users to specify the full path to data or to build aliases to avoid long names. Network transparency also requires operating system support.
Fragmentation transparency is generally not supported, although some distributed DBMSs offer horizontal fragmentation techniques. Most distributed DBMSs support a multiple-client/single-server architecture.
Some commercial systems provide none of these transparencies. Sybase, for example, provides the necessary primitives, but applications have to implement distributed transactions themselves. In Oracle 6.x only one database can be open at a time, whereas Oracle 7, Ingres and NonStop SQL support distributed transactions.
Assume the relational data model.
Replication
The system maintains multiple copies of data, stored at different sites, for faster retrieval and fault tolerance.
Fragmentation
A relation is partitioned into several fragments stored at distinct sites.
Replication and fragmentation can be combined
A relation is partitioned into several fragments, and the system maintains several identical replicas of each such fragment.
Data Replication
A relation or fragment of a relation is replicated if it is stored redundantly in two or more sites.
Full replication of a relation is the case where the relation is stored at all sites.
Fully redundant databases are those in which every site contains a copy of the entire database.
Advantages of Replication
Availability: failure of a site containing relation r does not make r unavailable if replicas exist.
Parallelism: queries on r may be processed by several nodes in parallel.
Reduced data transfer: relation r is available locally at each site containing a replica of r.
Disadvantages of Replication
Increased cost of updates: each replica of relation r must be updated.
Increased complexity of concurrency control: concurrent updates to distinct replicas may lead to inconsistent data unless special concurrency control mechanisms are implemented.
One solution: choose one copy as the primary copy and apply concurrency control operations to the primary copy.
Data Fragmentation
Division of relation r into fragments r1, r2, ..., rn which contain sufficient information to reconstruct relation r.
1. Horizontal fragmentation: each tuple of r is assigned to one or more fragments.
2. Vertical fragmentation: the schema for relation r is split into several smaller schemas.
All schemas must contain a common candidate key (or super key) to ensure lossless join property.
Advantages of Fragmentation
Horizontal:
It allows processing of fragments of a relation in parallel. It allows a relation to be split so that tuples are located where they are most frequently accessed.
Vertical:
It allows tuples to be split so that each part of the tuple is stored where it is most frequently accessed.
The tuple-id attribute allows efficient joining of vertical fragments.
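The two fragmentation styles and their reconstruction rules can be sketched in Python as follows. This is a minimal in-memory illustration, not a DBMS implementation: a relation is assumed to be a list of dicts, and the `account` relation, its attributes and the branch predicates are hypothetical. Horizontal fragments are recombined by union; vertical fragments are recombined by joining on the shared tuple-id attribute `tid`.

```python
from typing import Callable

Row = dict  # one tuple of a relation: attribute name -> value

# Hypothetical sample relation; "tid" is the tuple-id attribute kept in every vertical fragment.
account: list[Row] = [
    {"tid": 1, "branch": "Hillside", "number": "A-305", "balance": 500},
    {"tid": 2, "branch": "Valleyview", "number": "A-226", "balance": 336},
    {"tid": 3, "branch": "Hillside", "number": "A-177", "balance": 205},
]


def horizontal_fragments(r: list[Row], predicates: list[Callable[[Row], bool]]) -> list[list[Row]]:
    """Fragment i holds the tuples of r that satisfy predicates[i] (a selection)."""
    return [[t for t in r if p(t)] for p in predicates]


def reconstruct_horizontal(fragments: list[list[Row]]) -> list[Row]:
    """r = r1 U r2 U ... U rn; a tuple stored in several fragments is kept once."""
    seen: set = set()
    result: list[Row] = []
    for frag in fragments:
        for t in frag:
            key = tuple(sorted(t.items()))
            if key not in seen:
                seen.add(key)
                result.append(t)
    return result


def vertical_fragments(r: list[Row], schemas: list[tuple[str, ...]]) -> list[list[Row]]:
    """Fragment i is the projection of r onto schemas[i]; each schema must contain 'tid'."""
    return [[{a: t[a] for a in s} for t in r] for s in schemas]


def reconstruct_vertical(fragments: list[list[Row]]) -> list[Row]:
    """Join the vertical fragments on 'tid'. Merging by tid equals the natural join
    here because every fragment is a projection of the same set of tuples."""
    joined: dict = {}
    for frag in fragments:
        for t in frag:
            joined.setdefault(t["tid"], {}).update(t)
    return [joined[tid] for tid in sorted(joined)]
```

Because `tid` appears in every vertical schema, the join is lossless, which is the role the common candidate key plays in the text above.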
Commit protocols are used to ensure atomicity across sites.
A transaction which executes at multiple sites must either be committed at all the sites, or aborted at all the sites.
It is not acceptable to have a transaction committed at one site and aborted at another.
The two-phase commit (2PC) protocol is widely used.
The three-phase commit (3PC) protocol is more complicated and more expensive, but avoids some drawbacks of the two-phase commit protocol. It is less widely used.
Two-Phase Commit Protocol (2PC)
- Assumes the fail-stop model: failed sites simply stop working and do not cause any other harm, such as sending incorrect messages to other sites.
- Execution of the protocol is initiated by the coordinator after the last step of the transaction has been reached.
- The protocol involves all the local sites at which the transaction executed.
- Let T be a transaction initiated at site Si, and let the transaction coordinator at Si be Ci.
Phase 1: Obtaining a Decision
- The coordinator asks all participants to prepare to commit transaction T.
  - Ci adds the record <prepare T> to the log and forces the log to stable storage.
  - Ci sends prepare T messages to all sites at which T executed.
- Upon receiving the message, the transaction manager at each site determines whether it can commit the transaction.
  - If not, it adds a record <no T> to the log and sends an abort T message to Ci.
  - If the transaction can be committed, then it:
    - adds the record <ready T> to the log;
    - forces all records for T to stable storage;
    - sends a ready T message to Ci.
Phase 2: Recording the Decision
- T can be committed if Ci received a ready T message from all the participating sites; otherwise T must be aborted.
- The coordinator adds a decision record, <commit T> or <abort T>, to the log and forces the record onto stable storage. Once the record is in stable storage it is irrevocable (even if failures occur).
- The coordinator sends a message to each participant informing it of the decision (commit or abort), possibly for all active transactions.
- Participants take appropriate action locally.
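The two phases can be simulated in a few lines of Python. This is a single-process sketch under stated assumptions, not a networked implementation: appending to a Python list stands in for forcing a record to stable storage, and a participant whose `vote` is `None` models a site that does not reply before the coordinator's timeout, which the coordinator treats as a refusal. The class and function names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Site:
    """A participating site. `vote` models its phase-1 answer: True (can commit),
    False (cannot commit) or None (no reply within the timeout, e.g. the site is down)."""
    name: str
    vote: Optional[bool]
    log: list = field(default_factory=list)   # list append stands in for a forced log write
    outcome: Optional[str] = None

    def prepare(self, t: str) -> Optional[bool]:
        if self.vote is None:
            return None                        # silent site: nothing logged, no message sent
        self.log.append(f"<ready {t}>" if self.vote else f"<no {t}>")
        return self.vote

    def decide(self, t: str, decision: str) -> None:
        if self.vote is None:
            return                             # it learns the outcome when it recovers
        self.log.append(f"<{decision} {t}>")
        self.outcome = decision


def two_phase_commit(t: str, coordinator_log: list, sites: list) -> str:
    # Phase 1: log <prepare T>, then collect a vote from every site.
    coordinator_log.append(f"<prepare {t}>")
    votes = [s.prepare(t) for s in sites]
    # Phase 2: commit only if every site answered "ready"; a "no" or a timeout aborts.
    decision = "commit" if all(v is True for v in votes) else "abort"
    coordinator_log.append(f"<{decision} {t}>")   # irrevocable once on stable storage
    for s in sites:
        s.decide(t, decision)
    return decision
```

With two sites that both vote "ready", the coordinator's log ends as `<prepare T>`, `<commit T>`; if either site votes "no" or stays silent, every reachable site logs `<abort T>` instead.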
The two-phase commit protocol (2PC) involves two types of node: the coordinator and the subordinates (Mohan et al., 1986).
The coordinator's process is attached to the user application, and communication links are established between the subordinates and the coordinator.
A database must guarantee that all statements in a transaction, distributed or non-distributed, either commit or roll back as a unit. The effects of an ongoing transaction should be invisible to all other transactions at all nodes; this transparency should be true for transactions that include any type of operation, including queries, updates, or remote procedure calls.
The general mechanisms of transaction control in a non-distributed database are discussed. In a distributed database, Oracle must coordinate transaction control with the same characteristics over a network and maintain data consistency, even if a network or system failure occurs.
Prepare Phase
An initiating node, called the global coordinator, notifies all sites involved in the transaction to be ready either to commit or to roll back the transaction. The coordinator sends a "prepare for commit" message to each node to get it ready for committing the transaction. Each participating database receiving the message force-writes all the transaction details to disk and then sends a "ready to commit" or "OK" signal to the coordinator. If the force-write to disk fails, or if the local transaction cannot commit for some other reason, the participating database sends a "cannot commit" or "not OK" signal to the coordinator. If the coordinator does not receive a reply from a database within a certain timeout interval, it assumes a "not OK" response. This is illustrated in the figure below.
[Figure: prepare phase between the global coordinator and Database 1, Database 2, Database 3]
Execute Phase
If all participating databases reply "OK", the coordinator signals "OK". This means that the transaction has succeeded. The coordinator sends a "commit" signal for the transaction to all the participating databases.
Each participating database completes the transaction by permanently updating the database. On the other hand, if one or more of the databases has given a "not OK" signal, the coordinator also signals "not OK" and sends a "rollback" message to each participating database to undo the local effects of the transaction. Thus, if there is no problem in the prepare phase, all sites commit their transactions; if a network or node failure occurs, all sites roll back their transactions.
Thus two-phase commit (2PC) is a feature of transaction processing systems that enables distributed or multi-database systems to be returned to the pre-transaction state if some error condition occurs. A single transaction can update many different databases. The two-phase commit strategy is designed to ensure that either all the databases are updated or none of them are, so that the databases remain synchronized.
The transaction monitor then issues a "pre-commit" command to each database, which requires an acknowledgment. If the monitor receives the appropriate response from each database, it issues the "commit" command, which causes all databases to make the transaction changes permanent.
The two-phase commit protocol goes through, as its name suggests, two phases. The first is a PREPARE phase, in which the coordinator of the transaction sends a PREPARE message to the subordinates.
The second is the decision-making phase, in which the coordinator issues a COMMIT message if all the nodes can carry out the transaction, or an ABORT message if at least one subordinate node cannot carry out the required transaction. (Capitalization is used to distinguish between the technical and literal meanings of some terms.)
Handling of Failures - Site Failure
When site Sk recovers, it examines its log to determine the fate of transactions active at the time of the failure.
- Log contains a <commit T> record: the site executes redo(T).
- Log contains an <abort T> record: the site executes undo(T).
- Log contains a <ready T> record: the site must consult Ci to determine the fate of T.
  - If Ci says T committed, redo(T).
  - If Ci says T aborted, undo(T).
- The log contains no control records concerning T: this implies that Sk failed before responding to the prepare T message from Ci.
  - Since the failure of Sk precludes the sending of a ready T response, Ci must abort T.
  - Sk must execute undo(T).
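The recovery rules above amount to a small decision function over the recovering site's log. In this sketch the log is a list of strings such as `"<ready T>"`, and `ask_coordinator` is a hypothetical callback that returns `"commit"`, `"abort"`, or `None` when Ci cannot be reached; the `"in-doubt"` result for an unreachable coordinator is an illustrative addition matching the in-doubt case discussed below.

```python
from typing import Callable, Optional


def recover_site(log: list, t: str, ask_coordinator: Callable[[str], Optional[str]]) -> str:
    """Return the action site Sk takes for transaction t after it recovers."""
    records = set(log)
    if f"<commit {t}>" in records:
        return "redo"
    if f"<abort {t}>" in records:
        return "undo"
    if f"<ready {t}>" in records:
        decision = ask_coordinator(t)      # hypothetical message exchange with Ci
        if decision is None:
            return "in-doubt"              # Ci unreachable: the fate of t is still unknown
        return "redo" if decision == "commit" else "undo"
    # No control record: Sk never sent "ready", so Ci must have aborted t.
    return "undo"
```

The `<commit T>` and `<abort T>` checks come first because a log that holds a decision record normally also holds `<ready T>`.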
Handling of Failures - Coordinator Failure
If the coordinator fails while the commit protocol for T is executing, the other participating sites must decide on T's fate:
1. If an active site contains a <commit T> record in its log, then T must be committed.
2. If an active site contains an <abort T> record in its log, then T must be aborted (rolled back).
3. If some active participating site does not contain a <ready T> record in its log, then the failed coordinator Ci cannot have decided to commit T, so the active sites can abort T.
4. If none of the above cases holds, then all active sites must have a <ready T> record in their logs, but no additional control records (such as <abort T> or <commit T>). In this case the active sites must wait for Ci to recover to learn the decision.
Blocking problem: active sites may have to wait for a failed coordinator to recover.
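Rules 1 to 4 can be expressed as a function over the logs of the sites that are still active. This is a sketch that assumes each log is a list of strings such as `"<ready T>"`; the function name is hypothetical, and the result `"wait"` is the blocking case.

```python
def decide_without_coordinator(active_logs: list, t: str) -> str:
    """Apply rules 1-4 to the logs of the active sites; 'wait' means the sites are blocked."""
    logs = [set(log) for log in active_logs]
    if any(f"<commit {t}>" in log for log in logs):
        return "commit"                    # rule 1
    if any(f"<abort {t}>" in log for log in logs):
        return "abort"                     # rule 2
    if any(f"<ready {t}>" not in log for log in logs):
        return "abort"                     # rule 3: Ci cannot have decided to commit t
    return "wait"                          # rule 4: everyone is ready, nobody knows the decision
```

The last line is exactly the blocking problem: no information held by the active sites distinguishes a commit decision from an abort decision.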
Handling of Failures - Network Partition
- If the coordinator and all its participants remain in one partition, the failure has no effect on the commit protocol.
- If the coordinator and its participants belong to different partitions:
  - Sites that are not in the partition containing the coordinator think the coordinator has failed, and execute the protocol to deal with failure of the coordinator. No harm results, but sites may still have to wait for a decision from the coordinator.
  - The coordinator and the sites in the same partition as the coordinator think that the sites in the other partition have failed, and follow the usual two-phase commit protocol. Again, no harm results.
- In-doubt transactions have a <ready T> log record, but neither a <commit T> nor an <abort T> log record.
- The recovering site must determine the commit-abort status of such transactions by contacting other sites; this can slow and potentially block recovery.
- Recovery algorithms can note lock information in the log:
  - Instead of <ready T>, write out <ready T, L>, where L is the list of locks held by T when the log is written (read locks can be omitted).
  - For every in-doubt transaction T, all the locks noted in the <ready T, L> log record are reacquired.
- After lock reacquisition, transaction processing can resume; the commit or rollback of in-doubt transactions is performed concurrently with the execution of new transactions.
Three-Phase Commit (3PC)
- Assumptions:
  - No network partitioning.
  - At any point, at least one site must be up.
  - At most K sites (participants as well as coordinator) can fail.
- Phase 1: Obtaining Preliminary Decision: identical to 2PC Phase 1.
  - Every site is ready to commit if instructed to do so.
- Phase 2 of 2PC is split into two phases, Phase 2 and Phase 3 of 3PC.
  - In Phase 2 the coordinator makes a decision as in 2PC (called the pre-commit decision) and records it at multiple (at least K) sites.
  - In Phase 3 the coordinator sends a commit/abort message to all participating sites.
- Under 3PC, knowledge of the pre-commit decision can be used to commit despite coordinator failure.
  - This avoids the blocking problem as long as fewer than K sites fail.
- Drawbacks:
  - Higher overheads.
  - Assumptions may not be satisfied in practice.
Concurrency Control
1. Concurrency control schemes are modified for use in a distributed environment.
2. We assume that each site participates in the execution of a commit protocol to ensure global transaction atomicity.
3. We assume all replicas of any item are updated.
1) Single Lock Manager Approach
1. The system maintains a single lock manager that resides in a single chosen site, say Si.
2. When a transaction needs to lock a data item, it sends a lock request to Si, and the lock manager determines whether the lock can be granted immediately:
   a. If yes, the lock manager sends a message to the site which initiated the request.
   b. If no, the request is delayed until it can be granted, at which time a message is sent to the initiating site.
3. The transaction can read the data item from any one of the sites at which a replica of the data item resides.
4. Writes must be performed on all replicas of a data item.
5. Advantages of the scheme:
   a. Simple implementation.
   b. Simple deadlock handling.
6. Disadvantages of the scheme:
   a. Bottleneck: the lock manager site becomes a bottleneck.
   b. Vulnerability: the system is vulnerable to lock manager site failure.
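A minimal sketch of the lock table kept at the single chosen site Si is shown below. It assumes two lock modes, "S" (shared) and "X" (exclusive), and a FIFO queue of delayed requests per data item; the return values stand in for the grant messages sent back to the initiating sites. The class name is hypothetical, and lock upgrades and repeated requests by the same transaction are not handled.

```python
from collections import deque


class CentralLockManager:
    """Lock manager running at the single chosen site Si (sketch).
    Requests that cannot be granted immediately wait in a FIFO queue per item."""

    def __init__(self) -> None:
        self.holders: dict = {}   # item -> {transaction: mode}
        self.waiting: dict = {}   # item -> deque of (transaction, mode)

    def _compatible(self, item: str, mode: str) -> bool:
        held = self.holders.get(item, {})
        if not held:
            return True
        return mode == "S" and all(m == "S" for m in held.values())

    def request(self, txn: str, item: str, mode: str) -> bool:
        """True: a grant message goes to txn's site now. False: the request is delayed."""
        queue = self.waiting.setdefault(item, deque())
        if not queue and self._compatible(item, mode):
            self.holders.setdefault(item, {})[txn] = mode
            return True
        queue.append((txn, mode))
        return False

    def release(self, txn: str, item: str) -> list:
        """Release txn's lock and return the delayed transactions granted as a result
        (the lock manager would now send each of their sites a message)."""
        self.holders.get(item, {}).pop(txn, None)
        granted = []
        queue = self.waiting.get(item, deque())
        while queue and self._compatible(item, queue[0][1]):
            waiter, mode = queue.popleft()
            self.holders.setdefault(item, {})[waiter] = mode
            granted.append(waiter)
        return granted
```

Because all lock state lives in one object, deadlock detection only needs this one table, which is the "simple deadlock handling" advantage; the same fact makes Si the bottleneck and single point of failure.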
2) Distributed Lock Manager Approach
1. In this approach, the locking functionality is implemented by lock managers at each site.
   - Lock managers control access to local data items.
   - But special protocols may be used for replicas.
2. Advantage: work is distributed and can be made robust to failures.
3. Disadvantage: deadlock detection is more complicated.
   - Lock managers cooperate for deadlock detection (more on this later).
4. Several variants of this approach exist: primary copy, majority protocol, biased protocol, and quorum consensus.
Several Protocols in Lock Manager Approach
3) Primary Copy
- Choose one replica of a data item to be the primary copy.
  - The site containing that replica is called the primary site for that data item.
  - Different data items can have different primary sites.
- When a transaction needs to lock a data item Q, it requests a lock at the primary site of Q.
  - It implicitly gets a lock on all replicas of the data item.
- Benefit
  - Concurrency control for replicated data is handled similarly to unreplicated data: simple implementation.
- Drawback
  - If the primary site of Q fails, Q is inaccessible even though other sites containing a replica may be accessible.
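The routing rule of the primary copy protocol, and its drawback, fit in a short Python sketch. The item-to-site assignments below are hypothetical sample data, and raising an exception is simply this sketch's way of reporting that the lock cannot be obtained.

```python
# Hypothetical assignment of primary sites and replica locations.
PRIMARY_SITE = {"Q": "S1", "R": "S2"}
REPLICA_SITES = {"Q": {"S1", "S2", "S3"}, "R": {"S2", "S3"}}


def primary_copy_lock_site(item: str, up_sites: set) -> str:
    """Return the one site whose lock manager must grant a lock on `item`.
    Locking the primary copy implicitly locks every replica."""
    site = PRIMARY_SITE[item]
    if site not in up_sites:
        # Other replicas may still be reachable, but none of them can grant the lock.
        reachable = sorted(REPLICA_SITES[item] & up_sites)
        raise RuntimeError(f"{item} inaccessible: primary {site} is down, replicas {reachable} are up")
    return site
```

For example, with S1 down, a lock on Q is refused even though S2 and S3 both hold replicas of Q.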
4) Majority Protocol
- The local lock manager at each site administers lock and unlock requests for data items stored at that site.
- When a transaction wishes to lock an unreplicated data item Q residing at site Si, a message is sent to Si's lock manager.
  - If Q is locked in an incompatible mode, the request is delayed until it can be granted.
  - When the lock request can be granted, the lock manager sends a message back to the initiator indicating that the lock request has been granted.
- In case of replicated data:
  - If Q is replicated at n sites, a lock request message must be sent to more than half of the n sites at which Q is stored.
  - The transaction does not operate on Q until it has obtained a lock on a majority of the replicas of Q.
  - When writing the data item, the transaction performs writes on all replicas.
- Benefit
  - Can be used even when some sites are unavailable.
    - Details on how to handle writes in the presence of site failure are described later.
- Drawback
  - Requires 2(n/2 + 1) messages for handling lock requests, and (n/2 + 1) messages for handling unlock requests.
  - Potential for deadlock even with a single item: e.g., each of 3 transactions may have locks on 1/3 of the replicas of a data item.
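The majority rule and the message counts quoted above can be checked with a small Python sketch. Here `granting_sites` is a hypothetical stand-in for the replica sites that are up and whose lock managers can grant the request right now; a real transaction that falls short of a majority would keep waiting rather than give up.

```python
def majority(n: int) -> int:
    """Smallest number of replicas that is more than half of n."""
    return n // 2 + 1


def majority_lock(replica_sites: list, granting_sites: set):
    """Ask replica sites in order until a majority has granted the lock.
    Returns (acquired, sites_locked_so_far)."""
    need = majority(len(replica_sites))
    locked = []
    for site in replica_sites:
        if site in granting_sites:
            locked.append(site)
            if len(locked) == need:
                return True, locked
    return False, locked


def message_counts(n: int) -> dict:
    """Best case: a request and a grant per majority site to lock,
    one unlock message per majority site."""
    m = majority(n)
    return {"lock": 2 * m, "unlock": m}
```

The single-item deadlock in the text follows from `majority(6) == 4`: three transactions that each hold 2 of 6 replicas all wait, and none can ever reach 4.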
5) Biased Protocol
- A local lock manager runs at each site as in the majority protocol; however, requests for shared locks are handled differently than requests for exclusive locks.
- Shared locks: when a transaction needs to lock data item Q, it simply requests a lock on Q from the lock manager at one site containing a replica of Q.
- Exclusive locks: when a transaction needs to lock data item Q, it requests a lock on Q from the lock manager at all sites containing a replica of Q.
- Advantage: imposes less overhead on read operations.
- Disadvantage: additional overhead on writes.
6) Quorum Consensus Protocol
- A generalization of both the majority and biased protocols.
- Each site is assigned a weight.
  - Let S be the total of all site weights.
- Choose two values, the read quorum Qr and the write quorum Qw,
  - such that Qr + Qw > S and 2 * Qw > S.
  - Quorums can be chosen (and S computed) separately for each item.
- Each read must lock enough replicas that the sum of the site weights is >= Qr.
- Each write must lock enough replicas that the sum of the site weights is >= Qw.
- For now we assume all replicas are written.
  - Extensions to allow some sites to be unavailable are described later.
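The two quorum conditions and the choice of replicas to lock can be sketched as follows. The greedy heaviest-first selection is an assumption of this sketch, not something the protocol prescribes; any set of available replicas whose weights reach the quorum would do. With unit weights, Qr = Qw = n/2 + 1 (integer division) reproduces the majority protocol and Qr = 1, Qw = n reproduces the biased protocol.

```python
from typing import Optional


def valid_quorums(weights: dict, qr: int, qw: int) -> bool:
    """Qr + Qw > S: every read quorum overlaps every write quorum.
    2 * Qw > S: any two write quorums overlap."""
    s = sum(weights.values())
    return qr + qw > s and 2 * qw > s


def pick_quorum(weights: dict, available: set, q: int) -> Optional[list]:
    """Choose available replicas, heaviest first, until their total weight reaches q.
    Returns None if the available sites cannot form a quorum."""
    chosen, total = [], 0
    for site in sorted(available, key=lambda s: (-weights[s], s)):
        chosen.append(site)
        total += weights[site]
        if total >= q:
            return chosen
    return None
```

Giving one site a large weight lets it satisfy a quorum on its own, which is how site weights bias the protocol towards reliable or well-connected sites.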
Consider the following two transactions and history, with item X and transaction T1 at site 1, and item Y and transaction T2 at site 2:

T1: write(X)                T2: write(Y)
    write(Y)                    write(X)

T1                          T2
X-lock on X
write(X)
                            X-lock on Y
                            write(Y)
                            wait for X-lock on X
wait for X-lock on Y

Result: deadlock which cannot be detected locally at either site.
Centralized Approach
- A global wait-for graph is constructed and maintained at a single site: the deadlock-detection coordinator.
  - Real graph: the real, but unknown, state of the system.
  - Constructed graph: an approximation generated by the controller during the execution of its algorithm.
- The global wait-for graph can be constructed when:
  - a new edge is inserted in or removed from one of the local wait-for graphs;
  - a number of changes have occurred in a local wait-for graph;
  - the coordinator needs to invoke cycle detection.
- If the coordinator finds a cycle, it selects a victim and notifies all sites. The sites roll back the victim transaction.
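The coordinator's job can be sketched as "union the local wait-for graphs, then search for a cycle". In this sketch a graph is a dict mapping each waiting transaction to the set of transactions it waits for, and the sample graphs encode the two-site example above: at site 1, T2 waits for the lock on X held by T1; at site 2, T1 waits for the lock on Y held by T2. Function names are hypothetical, and victim selection is left out because the text does not fix a policy.

```python
from typing import Optional


def build_global_graph(local_graphs: list) -> dict:
    """Union of local wait-for graphs; an edge Ti -> Tj means Ti waits for a lock held by Tj."""
    g: dict = {}
    for local in local_graphs:
        for waiter, holders in local.items():
            g.setdefault(waiter, set()).update(holders)
    return g


def find_cycle(graph: dict) -> Optional[list]:
    """Depth-first search; returns the transactions on one cycle, or None."""
    state: dict = {}          # absent = unvisited, 1 = on the current path, 2 = finished
    path: list = []

    def visit(u):
        state[u] = 1
        path.append(u)
        for v in sorted(graph.get(u, ())):
            if state.get(v) == 1:
                return path[path.index(v):]     # back edge closes a cycle
            if v not in state:
                cycle = visit(v)
                if cycle:
                    return cycle
        path.pop()
        state[u] = 2
        return None

    for u in sorted(graph):
        if u not in state:
            cycle = visit(u)
            if cycle:
                return cycle
    return None


site1 = {"T2": {"T1"}}   # at site 1, T2 waits for the X-lock on X held by T1
site2 = {"T1": {"T2"}}   # at site 2, T1 waits for the X-lock on Y held by T2
```

Each local graph is acyclic, so neither site sees a deadlock on its own; only the union built at the coordinator contains the cycle T1 -> T2 -> T1.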
- Timestamp-based concurrency-control protocols can be used in distributed systems.
- Each transaction must be given a unique timestamp.
- Main problem: how to generate a timestamp in a distributed fashion.
  - Each site generates a unique local timestamp using either a logical counter or the local clock.
  - A globally unique timestamp is obtained by concatenating the unique local timestamp with the unique site identifier.
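Concatenating a local timestamp with a site identifier can be modelled as a Python tuple, since tuples compare element by element. This sketch assumes a logical counter as the local timestamp and integer site identifiers; placing the site identifier second makes it the least significant part, so it only breaks ties between equal local timestamps. The class and method names are hypothetical.

```python
class TimestampGenerator:
    """Per-site generator of globally unique timestamps (sketch)."""

    def __init__(self, site_id: int):
        self.site_id = site_id      # unique identifier of this site
        self.counter = 0            # logical counter; a local clock reading could be used instead

    def new_timestamp(self) -> tuple:
        """(local timestamp, site id): unique because no two sites share an id."""
        self.counter += 1
        return (self.counter, self.site_id)
```

Two sites issuing their first timestamps produce `(1, 1)` and `(1, 2)`: the local parts collide, but the site identifiers keep the global timestamps distinct and totally ordered.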
Several issues arise in query processing in a heterogeneous database:
- Schema translation.
  - Write a wrapper for each data source to translate data to a global schema.
  - Wrappers must also translate updates on the global schema to updates on the local schema.
- Limited query capabilities.
  - Some data sources allow only restricted forms of selections, e.g. web forms and flat-file data sources.
  - Queries have to be broken up and processed partly at the source and partly at a different site.
The peer-to-peer explosion has reminded people of the power of decentralized systems. The promise of robustness, open-endedness, and infinite scalability has made many people excited about decentralization. But in reality, most systems we build on the Internet are largely centralized.
This two-part article develops a framework for comparing distributed system designs. In this first part, I boil down the design of many systems to their essential topologies and describe how hybrid topologies can be made by combining parts. In the second part, I will introduce seven criteria for evaluating a system design and discuss the relative merits of distributed system designs.
Looking at topology
The peer-to-peer trend has renewed interest in decentralized systems. The Internet itself is the largest decentralized computer system in the world. But ironically in the 1990s many systems built on the Internet were completely centralized. The growth of the Web meant most systems were single web servers running in fabulously expensive collocation facilities. Now with peer-to-peer, the pendulum has swung the other way to radically decentralized architectures such as Gnutella. In practice, extreme architectural choices in either direction are seldom the way to build a usable system.
With the Internet, we have 30 years of experience with distributed systems architectures. With all this experience, a high-level framework for understanding distributed systems is helpful for organizing what we have learned.
Four basic topologies are in use on the Internet: centralized and decentralized, but also hierarchical and ring systems. These topologies can be used by themselves, or combined into one system creating hybrid systems.
The debate between centralized and decentralized systems is fundamentally about topology: in other words, how the nodes in the system are connected. Topology can be considered at many different levels: physical, logical, connection, or organizational. For this analysis, topology is considered in terms of the information flow. Nodes in the graph are individual computers or programs; links between nodes indicate that those nodes are sharing information regularly in the system. Typically, an edge implies that the two nodes are directly sharing bits across a network link. For simplicity, I do not consider the direction of information flow; edges are considered undirected. Four common topologies will be explained here: centralized, ring, hierarchical, and decentralized. (A fifth distributed system pattern, group communication, is not considered in this article.)
Centralized systems are the most familiar form of topology, typically seen as the client/server pattern used by databases, web servers, and other simple distributed systems. All function and information is centralized into one server with many clients connecting directly to the server to send and receive information. Many applications called "peer-to-peer" also have a centralized component. SETI@Home is a fully centralized architecture with the job dispatcher as the server. And the original Napster's search architecture was centralized, although the file sharing was not.
A single centralized server cannot handle high client load, so a common solution is to use a cluster of machines arranged in a ring to act as a distributed server. Communication between the nodes coordinates state-sharing, producing a group of nodes that provide identical function but have failover and load-balancing capabilities. Unlike the other topologies here, ring systems are generally built assuming the machines are all

[Figure: tree structured network, star network, ring network]
Hierarchical systems have a long history on the Internet, but in practice are often overlooked as a distinct distributed systems topology. The best-known hierarchical system on the Internet is the Domain Name Service, where authority flows from the root name-servers to the server for the registered name and often down to third-level servers. NTP, the Network Time Protocol, creates another hierarchical system.
The final topology we consider here is decentralized systems, where all peers communicate symmetrically and have equal roles. Gnutella is probably the most "pure" decentralized system used in practice today, with only a small centralized function to bootstrap a new host. Many other file-sharing systems are also designed to be decentralized, such as Freenet or OceanStore.
Hybrid topologies
Distributed systems often have a more complex organization than any one simple topology. Real-world systems often combine several topologies into one system, making a hybrid topology. Nodes typically play multiple roles in such a system. For example, a node might have a centralized interaction with one part of the system, while being part of a hierarchy in another part.
Centralized + Ring
As mentioned above, serious web server applications often have a ring of servers for load balancing and failover. The server system itself is a ring, but the system as a whole (including the clients) is a hybrid: a centralized system where the server is itself a ring. The result is the simplicity of a centralized system (from the client's point of view) with the robustness of a ring.
Centralized + Centralized
The server in a centralized system is itself often a client of one or more other servers. Stacking multiple centralized systems is the core of n-tier application frameworks. For example, when a web browser contacts a server, the software on that server may just be formatting results into HTML for presentation and itself calling to servers hosting business logic or data. Web services intermediaries such as Grand Central Networks also create several layers of centralized system. Centralized systems are often stacked as a way to compose function.
Centralized + Decentralized
A new wave of peer-to-peer systems is advancing an architecture of centralized systems embedded in decentralized systems. This hybrid topology is realized with hundreds of thousands of peers in the FastTrack file-sharing system used in KaZaA and Morpheus. Most peers have a centralized relationship to a "supernode," forwarding all file queries to this server (much like a Napster client sends queries to the Napster server). But instead of supernodes being standalone servers, they band themselves together in a Gnutella-like decentralized network, propagating queries. Internet email also shows this kind of hybrid topology. Mail clients have a centralized relationship with a specific mail server, but mail servers themselves share email in a decentralized fashion.
Other topologies
There are limitless possibilities in combining various kinds of architectures. A centralized system could have a hierarchy of machines in the role of server. Decentralized systems could be built that span different rings or hierarchies. Systems could conceivably be built with three or more topologies combined, although the resulting complexity may be too difficult to manage. Topology is a useful simplifying tool in understanding the architecture of distributed systems.
The distribution of data and applications has promising advantages. Although they may not yet be fully realized, these advantages should be considered objectives to be achieved.
1. Local Autonomy: Since data is distributed, a group of users that commonly share such data can have it placed at the site where they work, and thus have local control. In this way, users have some degree of freedom, as accesses can be made independently of the global users.
2. Improved Performance: Data retrieved by a transaction may be stored at a number of sites, making it possible to execute the transaction in parallel. Besides, using several resources in parallel can significantly improve performance.
3. Improved Reliability/Availability: If data is replicated so that it exists at more than one site, a crash of one of the sites, or the failure of a communication line making some of these sites inaccessible, does not necessarily make the data impossible to reach. Furthermore, system crashes or communication failures do not make the whole system inoperable, and the distributed DBMS can still provide limited service.
4. Economics: If the data is geographically distributed and the applications are related to these data, it may be much more economical, in terms of communication costs, to partition the application and do the processing at each site. Moreover, the cost of providing smaller computing power at each site is much less than the cost of an equivalent amount of power in a single mainframe.
5. Expandability: Expansion can be easily achieved by adding processing and storage power to the existing network. It may not be possible to have a linear improvement in power but significant changes are still possible.
6. Shareability: If the information is not distributed, it is usually impossible to share data and resources.
1. Complexity of management and control- management of distributed data is a more complex task than management of centralized data. Applications must recognize data location, and they must be able to stitch together data from different sites. Database administrators must have the ability to coordinate database activities to prevent database degradation due to data anomalies. Transaction management, concurrency control, security, backup, recovery, query optimization, access path selection, and so on, must all be addressed and resolved. In short, keeping the various components of a distributed database synchronized is a daunting task.
2. Security- The probability of security lapses increases when data are located at multiple sites. Different people at several sites will share the responsibility of data management, and LANs do not yet have the sophisticated security of centralized mainframe installations.
3. Lack of standards- Although distributed databases depend on effective communication, there are no standard communication protocols. In fact, few official standards exist in any of the distributed database protocols, whether they deal with communication or data access control. Consequently, distributed database users must wait for the definitive emergence of standard protocols before distributed databases can deliver their full potential.
4. Increased storage requirements- Data replication requires additional disk storage space. This disadvantage is a minor one, because disk storage space is relatively cheap and it is becoming cheaper. However, disk access and storage management in a widely dispersed data storage environment are more complex than they would be in a centralized database.
5. Lack of Experience: Some special solutions or prototype systems have not been tested in actual operating environments. More theoretical work is done compared to actual implementations.
6. Complexity: Distributed DBMS problems are more complex than centralized DBMS problems.
7. Cost: Additional hardware for communication mechanisms is needed, and additional, more complex software may be necessary to solve the technical problems. The trade-off between increased profitability, due to more efficient and timely use of information, and the increased personnel costs of new data processing sites has to be analyzed carefully.
8. Distribution of Control: The distribution creates problems of synchronization and coordination, and raises the question of the degree to which individual DBMSs can operate independently.
9. Security: Security can be easily controlled in a central location with the DBMS enforcing the rules. In a distributed database system, however, a network is involved, which has its own security requirements, and security control becomes very complicated.
10. Difficulty of Change: All users ha
project report helper
15-10-2010, 04:43 PM

Attachment: QUERY PROCESSING.pdf
Query Processing in a System for Distributed Databases

Harvard University
University of California at Berkeley
Massachusetts Institute of Technology
Computer Corporation of America

This paper describes the techniques used to optimize relational queries in the SDD-1 distributed database system. Queries are submitted to SDD-1 in a high-level procedural language called Datalanguage. Optimization begins by translating each Datalanguage query into a relational calculus form called an envelope, which is essentially an aggregate-free QUEL query. This paper is primarily concerned with the optimization of envelopes. Envelopes are processed in two phases. The first phase executes relational operations at various sites of the distributed database in order to delimit a subset of the database that contains all data relevant to the envelope. This subset is called a reduction of the database. The second phase transmits the reduction to one designated site, and the query is executed locally at that site. The critical optimization problem is to perform the reduction phase efficiently. Success depends on designing a good repertoire of operators to use during this phase, and an effective algorithm for deciding which of these operators to use in processing a given envelope against a given database. The principal reduction operator that we employ is called a semijoin. In this paper we define the semijoin operator, explain why semijoin is an effective reduction operator, and present an algorithm that constructs a cost-effective program of semijoins, given an envelope and a database.
Key Words and Phrases: distributed databases, relational databases, query processing, query optimization, semijoins
CR Categories: 3.70, 4.33
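The semijoin operator the abstract refers to can be illustrated in Python. This sketch uses the standard textbook definition (the tuples of r that join with at least one tuple of s) and shows why it reduces transfer: only the distinct join-attribute values of s need to be shipped to r's site. It does not reproduce SDD-1's algorithm for choosing a program of semijoins, and the two relations and site names are hypothetical.

```python
def project(s: list, attrs: tuple) -> set:
    """Distinct join-attribute values of s: the only data shipped from s's site."""
    return {tuple(t[a] for a in attrs) for t in s}


def semijoin(r: list, s_keys: set, attrs: tuple) -> list:
    """r semijoin s: the tuples of r whose attrs values occur in s (given as project(s, attrs))."""
    return [t for t in r if tuple(t[a] for a in attrs) in s_keys]


# Hypothetical relations stored at two different sites.
emp_at_site_a = [
    {"name": "Ada", "dept": 10},
    {"name": "Bob", "dept": 20},
    {"name": "Cy", "dept": 30},
]
dept_at_site_b = [
    {"dept": 10, "city": "Boston"},
    {"dept": 30, "city": "Austin"},
]

# Site B ships only the values {(10,), (30,)}; site A then drops tuples that cannot join.
reduced_emp = semijoin(emp_at_site_a, project(dept_at_site_b, ("dept",)), ("dept",))
```

Here the Bob tuple is eliminated at site A before anything is sent for the final join, which is the kind of reduction the first phase of envelope processing aims for.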
project report helper
16-10-2010, 05:35 PM

Attachment: DDB[1].doc (241.5 KB)
distributed database full report

Distributed databases are becoming more widespread, driven by advances in technology and by the demand for system availability. The purpose of this paper is to present an introduction to distributed databases, covering the motivations for distributed database systems (DDBS), their architecture, design, performance, concurrency control, database links, and the advantages and disadvantages of distributed databases.
A distributed database is a database that is under the control of a central database management system (DBMS) in which storage devices are not all attached to a common CPU. It may be stored in multiple computers located in the same physical location, or may be dispersed over a network of interconnected computers.
A key objective for a distributed system is that it looks like a centralized system to the user. The user should not need to know where a piece of data is stored physically.
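One way to picture this location transparency is a global catalog that records which site stores each table, so the user names only the table. The sketch below assumes sites are in-memory dicts; the Site and DistributedDB classes, the query() method and the site names are hypothetical illustrations, not any product's API.

```python
"""Sketch of location transparency through a global catalog.

Assumptions: sites are in-memory objects; Site, DistributedDB and
query() are hypothetical names used only for illustration.
"""
from dataclasses import dataclass, field


@dataclass
class Site:
    name: str
    tables: dict = field(default_factory=dict)  # table name -> list of rows


class DistributedDB:
    def __init__(self, sites):
        self.sites = {s.name: s for s in sites}
        # Global catalog: which site physically stores each table.
        self.catalog = {t: s.name for s in sites for t in s.tables}

    def query(self, table, predicate=lambda row: True):
        """The caller names only the table; the catalog supplies the site."""
        site = self.sites[self.catalog[table]]
        return [row for row in site.tables[table] if predicate(row)]


db = DistributedDB([
    Site("london", {"customers": [{"id": 1, "name": "Ann"}]}),
    Site("paris", {"orders": [{"id": 10, "cust": 1}]}),
])
orders_for_ann = db.query("orders", lambda r: r["cust"] == 1)
```

If the orders table were moved to another site, only the catalog entry would change; code that calls db.query("orders", ...) would keep working unchanged, which is exactly the property the paragraph describes.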

seminar ideas
26-05-2012, 10:46 AM


Attachment: DISTRIBUTED DATABASE.doc (152 KB)


A distributed database can be defined as consisting of a collection of data with different parts under the control of separate DBMSs running on independent computer systems. All the computers are interconnected and each system has autonomous processing capability serving local applications. Each system participates, as well, in the execution of one or more global applications. Such applications require data from more than one site. The distributed nature of the database is hidden from users and this transparency manifests itself in a number of ways.
Although there are a number of advantages to using a distributed DBMS, there are also a number of problems and implementation issues. Data in a distributed database can be (a) partitioned, (b) replicated, or both.

(a) Data partitioning

In a distributed DBMS a relational table may be broken up into two or more non-overlapping partitions, or slices. A table may be broken up horizontally (by rows), vertically (by columns), or by a combination of both, and the partitions may in turn be replicated. Partitioning complicates concurrency control and catalogue management, but the partitioning itself is transparent to users.
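The difference between the two kinds of partitioning shows up in how the table is rebuilt: horizontal fragments are recombined by union, vertical fragments by joining on the key. In the sketch below the ACCOUNT table, its columns and the split criteria are hypothetical; rows are plain dicts.

```python
"""Sketch of horizontal and vertical partitioning and reconstruction.

Assumptions: the ACCOUNT table, its columns and the split criteria
are hypothetical; rows are dicts keyed by column name.
"""

account = [
    {"acc_no": 1, "branch": "A", "owner": "Ann", "balance": 500},
    {"acc_no": 2, "branch": "B", "owner": "Bob", "balance": 120},
    {"acc_no": 3, "branch": "A", "owner": "Cal", "balance": 75},
]


def horizontal(table, predicate):
    """Split rows into two non-overlapping fragments by a predicate."""
    return ([r for r in table if predicate(r)],
            [r for r in table if not predicate(r)])


def vertical(table, key, cols):
    """Keep only `cols`, plus the key so the fragments can be rejoined."""
    return [{c: r[c] for c in (key, *cols)} for r in table]


def rebuild_horizontal(*frags):
    """Reconstruct by union, ordered by the key."""
    return sorted((r for f in frags for r in f), key=lambda r: r["acc_no"])


def rebuild_vertical(left, right, key):
    """Reconstruct by joining the column fragments on the shared key."""
    by_key = {r[key]: r for r in right}
    return [{**l, **by_key[l[key]]} for l in left]


# Horizontal: each branch's site stores only its own accounts.
frag_a, frag_b = horizontal(account, lambda r: r["branch"] == "A")

# Vertical: owner details at one site, balances at another.
names = vertical(account, "acc_no", ["branch", "owner"])
balances = vertical(account, "acc_no", ["balance"])
```

Repeating the key in every vertical fragment is what makes lossless reconstruction possible. It is also one reason the catalogue must track every fragment and its location.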

(b) Data replication

In a distributed DBMS a relational table or a partition may be replicated (copied), and the copies may be distributed throughout the database. Replication complicates update propagation and concurrency control, but the existence of the copies is transparent to users.
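The update-propagation problem can be shown with a small sketch. It assumes a read-one/write-all (ROWA) policy, which is one common strategy and not necessarily the one the original report has in mind. Replicas are in-memory dicts, and the class and method names are hypothetical. Site failures and concurrent transactions are ignored.

```python
"""Sketch of synchronous update propagation for a replicated table.

Assumption: a read-one/write-all (ROWA) policy; ReplicatedTable and
its methods are hypothetical. Failures and concurrency are ignored.
"""


class ReplicatedTable:
    def __init__(self, site_names):
        # One copy of the table per site: key -> value.
        self.replicas = {site: {} for site in site_names}

    def write(self, key, value):
        """Write-all: apply the update to every copy so they stay identical."""
        for copy in self.replicas.values():
            copy[key] = value

    def read(self, key, site):
        """Read-one: any single (preferably local) copy can answer."""
        return self.replicas[site].get(key)

    def consistent(self):
        """True if all copies hold the same contents."""
        copies = list(self.replicas.values())
        return all(c == copies[0] for c in copies)


t = ReplicatedTable(["site1", "site2", "site3"])
t.write("acc-42", 100)
t.write("acc-42", 80)          # the later update reaches every replica
print(t.read("acc-42", "site3"))  # prints 80
```

Any update that bypasses write() and changes only one copy leaves the replicas inconsistent. Under write-all, an update also cannot complete while any replica site is unreachable, which is part of why replication complicates both update propagation and concurrency control.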

Concept of DDB: a distributed database is a collection of data that belongs logically to the same system but is spread over the sites of a computer network.
Figure: A distributed database on a geographically dispersed network.

Consider a bank that has three branches at different locations. At each branch, a computer controls the teller terminals of the branch and the account database of that branch (figure 2). Each computer, together with its local account database, constitutes one site of the distributed database, and the computers are connected by a communication network.

Basic architecture

A database user accesses the distributed database through two kinds of applications:

Local applications: applications that do not require data from other sites.

Global applications: applications that do require data from other sites.
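Applied to the three-branch bank example, a withdrawal at a teller's own branch is a local application, while a transfer between branches is a global one. In the sketch below, each branch site is an in-memory dict of account to balance. The branch names, account numbers and function names are hypothetical. Atomicity across sites is deliberately left out.

```python
"""Sketch of local vs. global applications in the bank example.

Assumptions: each branch site is an in-memory dict of account ->
balance; names are hypothetical; cross-site atomicity is omitted.
"""

sites = {
    "branch1": {"A1": 500},
    "branch2": {"B7": 200},
    "branch3": {"C3": 50},
}


def withdraw(branch, account, amount):
    """Local application: touches only the data at one branch's site."""
    accounts = sites[branch]
    if accounts[account] < amount:
        raise ValueError("insufficient funds")
    accounts[account] -= amount
    return {branch}  # set of sites accessed


def transfer(src_branch, src_acc, dst_branch, dst_acc, amount):
    """Global application: needs data stored at two different sites."""
    withdraw(src_branch, src_acc, amount)
    sites[dst_branch][dst_acc] += amount
    return {src_branch, dst_branch}


def is_global(sites_touched):
    """An application is global if it accesses more than one site."""
    return len(sites_touched) > 1
```

In a real system the debit and credit of a transfer run at different sites and must both commit or both fail. That guarantee is what a distributed commit protocol such as two-phase commit provides, and this sketch does not attempt to provide it.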

The sites of a distributed database do not share main memory or disks; they cooperate by exchanging messages over the network.