CHAPTER 20 Distributed File System Administration
The RPCs run on top of the UDP protocol. UDP is faster than TCP, but it does not guarantee delivery or retransmit lost packets. NFS relies on the built-in retry logic of the RPCs to make sure that requests and replies arrive at their destinations. The client can specify the block sizes, the number of retry attempts, and the time-to-wait value when it mounts the server's files, with defaults of 8k blocks (read: rsize, write: wsize), 5 retries (retrans), and a 1 second timeout (timeo). If the client doesn't receive an acknowledgment within the timeout period it sends the request again. To prevent overloading the server it then doubles the time-to-wait period. The client continues the cycle until the server responds or the retry limit is reached. If the latter occurs you get the familiar "nfs server not responding" error message.
Since the NFS protocol is stateless, the client receiving this error has no way to tell whether the problem lies with the network or with the server. Processes trying to access server files, e.g. df, will wait until the server responds, because they are blocked until they receive a reply. If the server crashed, the client program will pick up where it left off after the server comes back on line. You can use "soft" mounts to be able to break out of stalled RPC send/receive requests. If you really want to ensure that write requests are completed, though, you should use "hard" mounts, and also specify "intr" if you want to be able to abort the command.
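The retry cycle described above can be sketched in Python. This is only an illustration under stated assumptions: `retry_schedule`, `send_with_retries`, and the `send` callable are hypothetical names, not part of any NFS client. The sketch counts the first transmission among the `retrans` attempts and gives up the way a soft mount would.

```python
def retry_schedule(timeo=1.0, retrans=5):
    """Return the wait (in seconds) used for each attempt.

    Assumes the behaviour described in the text: the first wait is
    `timeo`, and every later wait is double the previous one.
    """
    waits = []
    wait = timeo
    for _ in range(retrans):
        waits.append(wait)
        wait *= 2  # back off so a struggling server is not flooded
    return waits


def send_with_retries(send, timeo=1.0, retrans=5):
    """Call send(wait) until it returns a reply or the attempts run out.

    `send` is a hypothetical stand-in for one RPC attempt: it returns
    the reply, or None if nothing arrived within `wait` seconds.
    """
    for wait in retry_schedule(timeo, retrans):
        reply = send(wait)
        if reply is not None:
            return reply
    raise TimeoutError("nfs server not responding")


print(retry_schedule())  # [1.0, 2.0, 4.0, 8.0, 16.0]
```

With the defaults quoted above, the waits add up to 31 seconds before the error is reported.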
Before a client issues an RPC request to the server it checks to see if the desired data is already cached from an earlier request. If the cached data is younger than the cache attribute timeout value (actimeo, with a default of 30 seconds) then the cached data is used; otherwise the client sends a request to the server to compare the modification time of its cached file with that of the server's file. If the server's file is newer a request to resend the data is issued.
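A minimal Python sketch of this cache check follows. The `CacheEntry` class, the `read_cached` function, and the two callables are hypothetical names chosen for illustration; a real client keeps this state in the kernel. The sketch also assumes the timer restarts after each check against the server.

```python
from dataclasses import dataclass

ACTIMEO = 30.0  # default attribute cache timeout, in seconds


@dataclass
class CacheEntry:
    data: bytes
    mtime: float      # server modification time when the data was fetched
    validated: float  # client clock time of the last check with the server


def read_cached(entry, now, get_server_mtime, fetch, actimeo=ACTIMEO):
    """Return file data, contacting the server only when the cache is stale.

    `get_server_mtime` and `fetch` are hypothetical callables standing in
    for the RPCs that ask for the modification time and for the data.
    """
    if now - entry.validated < actimeo:
        return entry.data              # still fresh: no RPC at all
    server_mtime = get_server_mtime()  # compare modification times
    if server_mtime > entry.mtime:
        entry.data = fetch()           # server copy is newer: resend data
        entry.mtime = server_mtime
    entry.validated = now              # assumption: restart the timer
    return entry.data
```

The trade-off is visible in the first branch: within the timeout window the client may serve data that has already changed on the server.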
NFS version 3, used by IRIX 5.3+ and SunOS 5.5+, has some significant enhancements over earlier versions. NFS can now run on top of the TCP protocol. Additionally it now supports safe asynchronous writes, finer access control, and larger file transfer sizes, with less overhead. Since NFS is stateless you want to make sure that the server has really written the data to stable storage before it acknowledges the request to the client. Version 3 allows otherwise unsafe asynchronous writes to be committed to stable storage reliably.
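The difference between an acknowledged write and a committed write can be illustrated with a toy model in Python. `SketchServer` and `write_file` are hypothetical names, and the two dictionaries merely stand in for the server's memory cache and its disk; this is a sketch of the idea, not of any real implementation.

```python
class SketchServer:
    """Toy server: `cache` is volatile memory, `stable` stands in for disk."""

    def __init__(self):
        self.cache = {}   # offset -> data acknowledged but not yet on disk
        self.stable = {}  # offset -> data that survives a crash

    def write(self, offset, data, stable=False):
        """Acknowledge a write; only a stable write reaches disk at once."""
        if stable:
            self.stable[offset] = data
        else:
            self.cache[offset] = data

    def commit(self):
        """Flush everything written so far to stable storage."""
        self.stable.update(self.cache)
        self.cache.clear()

    def crash(self):
        """Lose whatever was held only in memory."""
        self.cache.clear()


def write_file(server, blocks):
    """Send all blocks asynchronously, then force one flush at the end."""
    for offset, data in blocks:
        server.write(offset, data)  # fast: acknowledged from memory
    server.commit()                 # data is safe only after this returns
```

In this model a crash between `write` and `commit` loses the cached blocks, which is why the client should treat data as safe only once the commit has been acknowledged.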
The maximum transfer size allowed by the protocol has been increased from 8 KB to 4 GB. In practice the machines negotiate the transfer size, up to 64 KB, the maximum allowed for both UDP and TCP. The protocol, either TCP or UDP, is also negotiated between the machines, defaulting to TCP if both ends support it. The new protocol now allows 64-bit file offsets, up from the former 32-bit limit, supporting much larger files. The new version is also more efficient, e.g. it returns the file attributes after each call, eliminating the need to issue a separate request for this information.
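The negotiation and the offset arithmetic can be made concrete with a short Python sketch. The function names are hypothetical, and the rule "take the smallest of the client's preference, the server's maximum, and the 64 KB ceiling" is an assumption drawn from the description above rather than from any particular implementation.

```python
MAX_NEGOTIATED = 64 * 1024  # 64 KB ceiling quoted in the text


def negotiate_transfer_size(client_pref, server_max):
    """Pick the largest size both ends accept, capped at 64 KB."""
    return min(client_pref, server_max, MAX_NEGOTIATED)


def choose_protocol(client_protocols, server_protocols):
    """Prefer TCP when both ends support it, otherwise fall back to UDP."""
    common = set(client_protocols) & set(server_protocols)
    if "tcp" in common:
        return "tcp"
    if "udp" in common:
        return "udp"
    raise ValueError("no transport protocol in common")


def max_offset(bits):
    """Largest byte offset expressible in an unsigned field of `bits` bits."""
    return 2**bits - 1


print(negotiate_transfer_size(32 * 1024, 1024 * 1024))  # 32768
print(choose_protocol(["udp", "tcp"], ["udp"]))         # udp
print(max_offset(32))                                   # 4294967295
print(max_offset(64))                                   # 18446744073709551615
```

The last two lines show the jump from roughly 4 GB of addressable file data with a 32-bit offset to about 16 exabytes with a 64-bit one.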
Solaris 2.5 and IRIX 5.3+ NFS implementations support both version 3 and version 2 of the protocol, so they can communicate reliably with clients and servers supporting either, with full backwards compatibility. Both NFS versions use port 2049 and should have such an entry for both udp and tcp in /etc/services. The 22 RPC requests used by NFS version 3 are listed below.
NFS version 3 | NFS version 2 | Description |
---|---|---|
NULL | null | do nothing, except make sure the connection is up |
GETATTR | getattr | get file, or directory, attributes, e.g. file type, access times & permissions |
SETATTR | setattr | set file, or directory, attributes |
LOOKUP | lookup | lookup file name in a directory |
ACCESS | | check access permissions for a user |
READLINK | readlink | read the data from a symbolic link |
READ | read | read from a file |
WRITE | write | write to a file |
CREATE | create | create a file or symbolic link |
MKDIR | mkdir | create a directory |
SYMLINK | symlink | create a symbolic link |
MKNOD | | create a special device node |
REMOVE | remove | remove a file (delete the directory entry) |
RMDIR | rmdir | remove a directory (delete the subdirectory entry from a directory) |
RENAME | rename | rename a file or directory |
LINK | link | create a link to an object |
READDIR | readdir | read from a directory |
READDIRPLUS | | extended read from a directory |
FSSTAT | statfs | get dynamic file system state information |
FSINFO | | get static file system state information |
PATHCONF | | retrieve POSIX information for the filesystem |
COMMIT | | commit the cached data on the server to stable storage (force a flush of data previously written to the server) |
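
The table can also be captured as data, which makes it easy to see which requests are new in version 3. The Python sketch below is only a restatement of the table; the names `NFS3_PROCEDURES`, `V3_ONLY`, and `v2_name` are hypothetical, and `NULL` denotes the do-nothing request in the first row.

```python
# The 22 version 3 requests, in the order of the table above.
NFS3_PROCEDURES = [
    "NULL", "GETATTR", "SETATTR", "LOOKUP", "ACCESS", "READLINK",
    "READ", "WRITE", "CREATE", "MKDIR", "SYMLINK", "MKNOD",
    "REMOVE", "RMDIR", "RENAME", "LINK", "READDIR", "READDIRPLUS",
    "FSSTAT", "FSINFO", "PATHCONF", "COMMIT",
]

# Requests with no version 2 counterpart in the table.
V3_ONLY = {"ACCESS", "MKNOD", "READDIRPLUS", "FSINFO", "PATHCONF", "COMMIT"}


def v2_name(procedure):
    """Return the version 2 request name, or None if it is new in version 3."""
    if procedure in V3_ONLY:
        return None
    if procedure == "FSSTAT":
        return "statfs"  # the only request that was renamed
    return procedure.lower()


shared = [p for p in NFS3_PROCEDURES if v2_name(p) is not None]
print(len(NFS3_PROCEDURES), len(shared))  # 22 16
```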