DISTRIBUTED FILE SYSTEMS
A computing system is a collection of processes operating on data objects.
Persistent data objects must be named and saved on a nonvolatile storage device.
Named data objects are files.
A file system is a major component of an OS.
A Distributed File System (DFS) is an implementation of a file system in which files are stored and accessed across multiple machines.
Important concepts in distributed system design:
DFSs employ many aspects of the notion of transparency.
The directory service is a key component of a DFS, as it is of all distributed systems.
Performance and availability requirements motivate the use of caching and replication.
Access control and protection for DFSs open many problems in distributed system security.
Characteristics of a DFS:
Dispersion and multiplicity of users and files.
A transparent DFS should exhibit the following properties:
Multiplicity of users
Multiplicity of files
DFS DESIGN AND IMPLEMENTATION
The organization of data files can be either flat or hierarchical.
Files are named and accessed using hierarchical pathnames.
File access must be regulated to ensure security.
Directory, authorization, and file services are the user interfaces to a file system.
System services are the file system's interface to the hardware and are transparent to users.
The major functions of system services include:
Mapping of logical to physical block addresses
Interfacing to services at the device level for file space allocation/deallocation
Actual read/write file operations
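As a sketch of the first function above, the following toy example maps a logical byte offset in a file to a physical disk block through a per-file block map; all names and block numbers are hypothetical, not taken from any real OS:

```python
# Hypothetical sketch: translate a file's logical byte offset into a
# (physical block, offset-within-block) pair via an inode-like block map.
BLOCK_SIZE = 4096

class Inode:
    def __init__(self, block_map):
        # block_map[i] = physical disk block holding logical block i
        self.block_map = block_map

def logical_to_physical(inode, byte_offset):
    """Return (physical block number, byte offset within that block)."""
    logical_block = byte_offset // BLOCK_SIZE
    within = byte_offset % BLOCK_SIZE
    return inode.block_map[logical_block], within

inode = Inode(block_map=[70, 31, 5])      # a three-block file on disk
print(logical_to_physical(inode, 5000))   # byte 5000 falls in logical block 1
```

Byte 5000 lands in logical block 1 (5000 // 4096), which the map places at physical block 31, at offset 904 within that block.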
Services and Servers:
Servers are processes that implement services.
A service may be implemented by a single server or by a number of servers.
A server may also provide multiple services.
Interaction among services in DFS:
File mounting and Server Registration
File mounting constructs a large file system from various file servers and storage devices.
The mounting point is usually a leaf of the directory tree that contains only an empty subdirectory.
Once files are mounted, they are accessed using concatenated logical path names.
File system mounting can be performed at three different times: explicitly on demand, automatically at boot time, or transparently on first access (automounting).
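The path-name concatenation described above can be sketched with a toy mount table; the server names and mount points here are invented for illustration:

```python
# Hypothetical sketch: resolve a concatenated logical path through a
# mount table binding local mount points to (server, remote path) pairs.
MOUNT_TABLE = {
    "/usr/local": ("fileserver1", "/export/local"),
    "/home":      ("fileserver2", "/export/home"),
}

def resolve(path):
    """Return (server, remote path) for the longest matching mount point."""
    for mount_point in sorted(MOUNT_TABLE, key=len, reverse=True):
        if path == mount_point or path.startswith(mount_point + "/"):
            server, remote_root = MOUNT_TABLE[mount_point]
            # concatenate: remote root + the part below the mount point
            return server, remote_root + path[len(mount_point):]
    return None, path  # not under any mount point: a purely local file

print(resolve("/home/alice/notes.txt"))
```

Checking the longest mount point first ensures that a nested mount such as `/usr/local` wins over any shorter prefix that might also match.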
Stateful and Stateless File Servers
A connection requires the establishment and termination of a communication session, and there is state information associated with each session, such as:
Opened files and their clients
File descriptors and File handles
Current file position pointers
A file server is stateful if it maintains some of this state information internally, and stateless if it maintains none at all.
The implementation of a stateless server must address the following issues:
File locking mechanism
Session key management
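The stateful/stateless distinction can be illustrated with a minimal sketch; both servers below are hypothetical toy models, not real file-server code:

```python
# Hypothetical sketch contrasting stateful and stateless read requests.
# A stateful server keeps the current file position per session; a
# stateless server requires each request to carry the full context.

class StatefulServer:
    def __init__(self, data):
        self.data = data
        self.position = {}            # session id -> current offset

    def open(self, session):
        self.position[session] = 0

    def read(self, session, nbytes):
        pos = self.position[session]  # state kept on the server
        chunk = self.data[pos:pos + nbytes]
        self.position[session] = pos + nbytes
        return chunk

class StatelessServer:
    def __init__(self, data):
        self.data = data              # no per-client state at all

    def read(self, offset, nbytes):
        # The client supplies the offset on every request, so the server
        # can crash and restart without losing any session context.
        return self.data[offset:offset + nbytes]

stateful = StatefulServer(b"hello world")
stateful.open("s1")
print(stateful.read("s1", 5))   # server remembers the position
print(stateful.read("s1", 6))

stateless = StatelessServer(b"hello world")
print(stateless.read(6, 5))     # client supplied the offset itself
```

The trade-off is visible directly: if the stateful server loses its `position` table in a crash, open sessions break, while the stateless server has nothing to lose.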
File Access and Semantics of sharing
File sharing: multiple clients access the same file at the same time.
The accesses may result from either overlapping or interleaving operations.
Coherency control: managing access to the replicas to provide a coherent view of the shared file.
Concurrency control: concurrency is achieved by time-multiplexing access to the files. The issues are how to prevent one execution sequence from interfering with others when they are interleaved, and how to avoid inconsistent results.
In the space domain, read and write accesses to a remote file can be implemented in several different ways.
Coherency of replicated data may be interpreted in many different ways:
1. All replicas are identical at all times.
2. Replicas are perceived as identical only at some points in time.
3. Users always read the "most recent" data in the replicas.
4. Write operations are always performed immediately and their results are propagated in a best-effort fashion.
In the time domain, interleaved reads and writes result in concurrent file accesses.
1. Simple reads/writes
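One common way to prevent interleaved accesses from interfering, as discussed under concurrency control above, is a lock table in which writers are exclusive and readers share. The sketch below is a hypothetical single-node model that ignores blocking and fault tolerance:

```python
# Hypothetical sketch of a lock table a file service might use for
# concurrency control: writers get exclusive access, readers share.
class LockTable:
    def __init__(self):
        self.locks = {}  # filename -> ("read", reader count) or ("write",)

    def acquire_read(self, name):
        mode = self.locks.get(name)
        if mode and mode[0] == "write":
            return False                       # a writer holds the file
        count = mode[1] if mode else 0
        self.locks[name] = ("read", count + 1)
        return True

    def acquire_write(self, name):
        if name in self.locks:
            return False                       # any holder blocks a writer
        self.locks[name] = ("write",)
        return True

    def release_read(self, name):
        mode, count = self.locks[name]
        if count == 1:
            del self.locks[name]               # last reader leaves
        else:
            self.locks[name] = ("read", count - 1)

locks = LockTable()
assert locks.acquire_read("a.txt")             # first reader succeeds
assert locks.acquire_read("a.txt")             # readers share the file
assert not locks.acquire_write("a.txt")        # writer must wait
```

A real server would also need to decide what happens when a lock holder crashes, which is exactly the file-locking issue that makes stateless servers awkward.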
Semantics of sharing
Solutions to the coherency and concurrency control problems depend on the semantics of sharing.
Three popular semantic models:
All problems associated with file sharing and replication disappear if the file is read-only. To achieve write sharing, clients must know the names of newly created files.
A simple solution is to use the same file name but with a version number for each revision of the file.
The burden of enforcing file-sharing semantics is separated from the file service into a higher-level service called version control.
The file with the highest version number is considered to be the current version.
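The version-number scheme can be sketched as follows; this is a toy model of a version-control service, with invented names:

```python
# Hypothetical sketch of the version-number scheme: each revision of a
# shared file gets a fresh number; the highest number is the current one.
class VersionedFile:
    def __init__(self, name):
        self.name = name
        self.versions = {}            # version number -> contents

    def write(self, contents):
        version = max(self.versions, default=0) + 1
        self.versions[version] = contents
        return version

    def read(self, version=None):
        if version is None:           # default to the current version
            version = max(self.versions)
        return self.versions[version]

f = VersionedFile("report.txt")
f.write(b"draft")
f.write(b"final")
print(f.read())        # highest version number wins
print(f.read(1))       # older revisions remain readable
```

Because each write creates a new immutable version rather than updating in place, readers never observe a half-written file, which is what makes the coherency problems disappear.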
THE SUN NETWORK FILE SYSTEM (NFS)
Since its introduction in 1985, the Sun Microsystems Network File System (NFS) has been widely used in industry and academia. In addition to its technical innovations, it has played a significant educational role in exposing a large number of users to the benefits of a distributed file system. Other vendors now support NFS, and a significant fraction of the user community perceives it to be a de facto standard.

Portability and heterogeneity are two considerations that have played a dominant role in the design of NFS. Although the original file system model was based on Unix, NFS has been ported to non-Unix operating systems such as PC-DOS. To facilitate portability, Sun makes a careful distinction between the NFS protocol and a specific implementation of an NFS server or client. The NFS protocol defines an RPC interface that allows a server to export local files for remote access. The protocol does not specify how the server should implement this interface, nor does it mandate how the interface should be used by a client. Design details such as caching, replication, naming, and consistency guarantees may vary considerably in different NFS implementations.

In order to focus our discussion, we restrict our attention to the implementation of NFS provided by Sun for its workstations that run the SunOS flavor of Unix. Unless otherwise specified, the term "NFS" will refer to this implementation in the rest of this paper. The term "NFS protocol" will continue to refer to the generic interface specification. SunOS defines a level of indirection in the kernel that allows file system operations to be intercepted and transparently routed to a variety of local and remote file systems. This interface, often referred to as the vnode interface after the primary data structure it exports, has been incorporated into many other versions of Unix. With a view to simplifying crash recovery on servers, the NFS protocol is designed to be stateless.
Consequently, servers are not required to maintain contextual information about their clients. Each RPC request from a client contains all the information needed to satisfy the request. To some degree, functionality and Unix compatibility have been sacrificed to meet this goal. Locking, for instance, is not supported by the NFS protocol, since locks would constitute state information on a server. SunOS does, however, provide a separate lock server to perform this function.

Sun workstations are often configured without a local disk. The ability to operate such workstations without significant performance degradation is another goal of NFS. Early versions of Sun workstations used a separate remote-disk network protocol to support diskless operation. This protocol is no longer necessary, since the kernel now transforms all its device operations into file operations.

A high-level overview of NFS is presented by Walsh. Details of its design and implementation are given by Sandberg. Kleiman describes the vnode interface, while Rosen comments on the portability of NFS.
Naming and Location
The NFS paradigm treats workstations as peers, with no fundamental distinction between clients and servers. A workstation may be a server, exporting some of its files. It may also be a client, accessing files on other workstations. But it is common practice for installations to be configured so that a small number of nodes run as dedicated servers, while the others run as clients.

NFS clients are usually configured so that each sees a Unix file name space with a private root. Using an extension of the Unix mount mechanism, subtrees exported by NFS servers are individually bound to nodes of the root file system. This binding usually occurs when Unix is initialized, and remains in effect until explicitly modified. Since each workstation is free to configure its own name space, there is no guarantee that all workstations at an installation have a common view of shared files. But collaborating groups of users usually configure their workstations to have the same name space. Location transparency is thus obtained by convention, rather than being a basic architectural feature of NFS.

Since name-to-site bindings are static, NFS does not require a dynamic file location mechanism. Each client maintains a table mapping remote subtrees to servers. The addition of new servers or the movement of files between servers renders the table obsolete. There is no mechanism built into NFS to propagate information about such changes.
Caching and Replication
NFS clients cache individual pages of remote files and directories in their main memory. They also cache the results of pathname-to-vnode translations. Local disks, even if present, are not used for caching.

When a client caches any block of a file, it also caches a timestamp indicating when the file was last modified on the server. To validate cached blocks of a file, the client compares its cached timestamp with the timestamp on the server. If the server timestamp is more recent, the client invalidates all cached blocks of the file and refetches them on demand. A validation check is always performed when a file is opened, and when the server is contacted to satisfy a cache miss. After a check, cached blocks are assumed valid for a finite interval of time, specified by the client when a remote file system is mounted. The first reference to any block of the file after this interval forces a validation check.

If a cached page is modified, it is marked as dirty and scheduled to be flushed to the server. The actual flushing is performed by an asynchronous kernel activity and will occur after some unspecified delay. However, the kernel does provide a guarantee that all dirty pages of a file will be flushed to the server before a close operation on the file completes.
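The timestamp-based validation just described can be sketched as follows. This is a simplified toy model, not actual NFS code; the interval value and all class names are illustrative:

```python
# Hypothetical sketch of NFS-style cache validation: cached blocks are
# trusted for a fixed interval; after that, the client compares its
# cached modification timestamp with the server's and, on a mismatch,
# invalidates everything and refetches on demand.
REVALIDATE_INTERVAL = 3.0          # seconds; a typical value for files

class FakeServer:
    """Stand-in for the remote server (illustrative only)."""
    def __init__(self, mtime, blocks):
        self.mtime = mtime
        self.blocks = blocks

    def get_mtime(self):
        return self.mtime

    def get_block(self, index):
        return self.blocks[index]

class CachedFile:
    def __init__(self, cached_mtime, blocks, now):
        self.cached_mtime = cached_mtime
        self.blocks = dict(blocks)     # locally cached pages
        self.last_check = now

    def read_block(self, index, server, now):
        if now - self.last_check > REVALIDATE_INTERVAL:
            server_mtime = server.get_mtime()
            if server_mtime > self.cached_mtime:
                self.blocks.clear()    # invalidate all cached blocks
                self.cached_mtime = server_mtime
            self.last_check = now
        if index not in self.blocks:   # refetch on demand
            self.blocks[index] = server.get_block(index)
        return self.blocks[index]

server = FakeServer(mtime=200, blocks={0: b"new contents"})
cache = CachedFile(cached_mtime=100, blocks={0: b"old contents"}, now=0.0)
print(cache.read_block(0, server, now=1.0))   # within interval: stale data
print(cache.read_block(0, server, now=5.0))   # revalidated: fresh data
```

Note the window the text implies: inside the revalidation interval the client can return stale data even though the server already has a newer copy.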
Directories are cached for reading in a manner similar to files. Modifications to directories, however, are performed directly on the server. When a file is opened, a cache validation check is also performed on its parent directory. Files and directories can have different revalidation intervals, typical values being 3 seconds for files and 30 seconds for directories.

NFS performs network data transfers in large block sizes, typically 8 Kbytes, to improve performance. Read-ahead is employed to improve sequential access performance. Files corresponding to executable binaries are fetched in their entirety if they are smaller than a certain threshold.

As originally specified, NFS did not support data replication. More recent versions of NFS support replication via a mechanism called the Automounter. The Automounter allows remote mount points to be specified using a set of servers rather than a single server. The first time a client traverses such a mount point, a request is issued to each server, and the earliest to respond is chosen as the remote mount site. All further requests at the client that cross the mount point are directed to this server. Propagation of modifications to replicas has to be done manually. This replication mechanism is intended primarily for frequently-read and rarely-written files such as system binaries.
Security

NFS uses the underlying Unix file protection mechanism on servers for access checks. Each RPC request from a client conveys the identity of the user on whose behalf the request is being made. The server temporarily assumes this identity, and file accesses that occur while servicing the request are checked exactly as if the user had logged in directly to the server. The standard Unix protection mechanism using user, group, and world mode bits is used to specify protection policies on individual files and directories.

In the early versions of NFS, mutual trust was assumed between all participating machines. The identity of a user was determined by a client machine and accepted without further validation by a server. The level of security of an NFS site was effectively that of the least secure system in the environment. To reduce vulnerability, requests made on behalf of root (the Unix superuser) on a workstation were treated by the server as if they had come from a non-existent user, nobody. Root thus received the lowest level of privileges for remote files.

More recent versions of NFS can be configured to provide a higher level of security. DES-based mutual authentication is used to validate the client and the server on each RPC request. Since file data in RPC packets is not encrypted, NFS is still vulnerable to unauthorized release and modification of information if the network is not physically secure. The common DES key needed for mutual authentication is obtained from information stored in a publicly readable database. Stored in this database for each user and server is a pair of keys suitable for public key encryption. One key of the pair is stored in the clear, while the other is stored encrypted with the login password of the user. Any two entities registered in the database can deduce a unique DES key for mutual authentication. Taylor describes the details of this mechanism.
System Management

Sun provides two mechanisms to assist system managers. One of these, the Yellow Pages (YP), is a mechanism for maintaining key-value pairs. The keys and values are application-specific and are not interpreted by YP. A number of Unix databases, such as those mapping usernames to passwords, hostnames to network addresses, and network services to Internet port numbers, are stored in YP. YP provides read-only replication, with one master and many slaves. Lookups may be performed at any replica. Updates are performed at the master, which is responsible for propagating the changes to the slaves. YP provides a shared repository for system information that changes relatively infrequently and that does not require simultaneous updates at all replication sites. YP is usually in use at NFS installations, although this is not mandatory.

The Automounter, mentioned above in the context of read-only replication, is another mechanism for simplifying system management. It allows a client to lazy-evaluate NFS mount points, thus avoiding the need to mount all remote files of interest when the client is initialized. The Automounter can be used in conjunction with YP to substantially simplify the administrative overheads of server reconfiguration.
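The YP master/slave update flow can be sketched as a toy key-value store; the class names and data below are invented for illustration:

```python
# Hypothetical sketch of YP-style read-only replication: updates go to
# the master, which pushes copies to the slaves; lookups may hit any
# replica.
class Slave:
    def __init__(self):
        self.table = {}

    def lookup(self, key):               # reads are served locally
        return self.table.get(key)

class Master:
    def __init__(self, slaves):
        self.table = {}
        self.slaves = slaves

    def update(self, key, value):
        self.table[key] = value
        for slave in self.slaves:        # master propagates every change
            slave.table[key] = value

    def lookup(self, key):
        return self.table.get(key)

slave = Slave()
master = Master([slave])
master.update("alice", "passwd-entry-1")
print(slave.lookup("alice"))             # any replica answers the lookup
```

Because all writes funnel through the master, the slaves can serve lookups without coordination, which suits data that changes infrequently, exactly as the text describes.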