
CHAPTER 20 Distributed File System Administration

20.2 NFS Protocol


The NFS protocol uses remote procedure calls (RPCs) to communicate between client and server. The client issues an RPC request for information from the server, which replies with the result. Arguments and results are encoded with XDR (External Data Representation), so machines with different byte ordering can exchange requests without misreading each other's data. NFS version 2 uses 16 different RPCs to request and regulate file access.
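As a rough illustration of the byte-order point, the sketch below encodes an unsigned 32-bit integer the way XDR lays it out on the wire: four bytes, most significant byte first, regardless of the host's native ordering. The function names are made up for this example and do not come from any NFS or RPC library.

```python
import struct

def xdr_encode_uint32(value: int) -> bytes:
    """Pack an unsigned 32-bit integer as 4 big-endian bytes (XDR wire order)."""
    return struct.pack(">I", value)

def xdr_decode_uint32(data: bytes) -> int:
    """Unpack the first 4 bytes of data as a big-endian unsigned 32-bit integer."""
    return struct.unpack(">I", data[:4])[0]

# Both a big-endian and a little-endian host produce and read the same bytes.
wire = xdr_encode_uint32(2049)
print(wire.hex())               # 00000801
print(xdr_decode_uint32(wire))  # 2049
```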

The RPCs run on top of the UDP protocol. UDP has less overhead than TCP, but it doesn't guarantee delivery or retransmit lost packets. NFS relies on the built-in retry logic of the RPCs to make sure that requests and replies arrive at their destinations. The client can specify block sizes, the number of retry attempts, and the time to wait when it mounts the server's files, with defaults of 8 KB blocks (read: rsize, write: wsize), 5 retries (retrans), and a 1 second timeout (timeo). If the client doesn't receive a reply within the timeout period it sends the request again. To avoid overloading the server it then doubles the time-to-wait period. The client continues the cycle until the server responds or the retry limit is reached. If the latter occurs you get the familiar "nfs server not responding" error message. Since the NFS protocol is stateless, the client receiving this error has no way to tell whether the problem is with the network or with the server.

A process trying to access server files, e.g. df, will wait until the server responds, since it is blocked until it receives a reply. If the server crashed, the client program will pick up where it left off after the server comes back on line. You can use "soft" mounts to let a stalled RPC request give up and return an error instead. If you really want to ensure that write requests are completed, though, you should use "hard" mounts, and also specify "intr" if you want to be able to abort the command.
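The retry cycle described above can be sketched as a small loop. This is only a model of the behaviour, assuming one initial transmission followed by up to retrans retries; send_request is a hypothetical callable standing in for the RPC layer, and real clients differ in the details.

```python
def call_with_retries(send_request, timeo=1.0, retrans=5):
    """Send a request, doubling the wait after every timeout.

    send_request(timeout) is a hypothetical callable that returns the
    reply, or None if nothing arrived within `timeout` seconds.
    """
    timeout = timeo
    for _ in range(retrans + 1):  # first transmission plus `retrans` retries
        reply = send_request(timeout)
        if reply is not None:
            return reply
        timeout *= 2  # back off so a struggling server is not flooded
    raise TimeoutError("nfs server not responding")

# A fake server that only answers the third transmission.
seen = []

def flaky_server(timeout):
    seen.append(timeout)
    return "ok" if len(seen) == 3 else None

print(call_with_retries(flaky_server))  # ok
print(seen)                             # [1.0, 2.0, 4.0]
```

In this model a soft mount corresponds to letting the TimeoutError reach the calling program, while a hard mount corresponds to catching it and starting the loop again.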

Before a client issues an RPC request to the server it checks to see if the desired data is already cached from an earlier request. If the cached data is younger than the attribute cache timeout (actimeo, with a default of 30 seconds) then the cached copy is used. Otherwise the client sends a request to the server to compare the modification time of its cached file with that of the server's file. If the server's file is newer, the client issues a request to resend the data.
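The cache check can be sketched as follows. This is a simplified model, not client source code: CachedFile, read_cached, and the two callables server_mtime and fetch (which stand in for the attribute and read RPCs) are all names invented for the example.

```python
from dataclasses import dataclass

@dataclass
class CachedFile:
    data: bytes
    mtime: float         # server modification time when the data was cached
    validated_at: float  # when the client last checked with the server

def read_cached(entry, now, server_mtime, fetch, actimeo=30.0):
    """Return file data, contacting the server only when the cache has aged out."""
    if now - entry.validated_at < actimeo:
        return entry.data            # young enough: no RPC at all
    current = server_mtime()         # ask the server for its modification time
    if current > entry.mtime:        # server copy is newer: refetch the data
        entry.data = fetch()
        entry.mtime = current
    entry.validated_at = now
    return entry.data

entry = CachedFile(data=b"old", mtime=100.0, validated_at=1000.0)
print(read_cached(entry, 1010.0, lambda: 250.0, lambda: b"new"))  # b'old'
print(read_cached(entry, 1040.0, lambda: 250.0, lambda: b"new"))  # b'new'
```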

NFS version 3, used by IRIX 5.3+ and SunOS 5.5+, has some significant enhancements over earlier versions. NFS can now run on top of the TCP protocol. Additionally, it now supports safe asynchronous writes, finer access control, and larger file transfer sizes, with less overhead. Since NFS is stateless, the server must have really written the data to stable storage before acknowledging a write request to the client. Version 3 lets the server acknowledge an asynchronous ("unsafe") write before the data is on disk, and the client later issues a COMMIT request to make sure the data reaches stable storage.
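The difference between an asynchronous write and a committed one can be pictured with a toy model. The class below is purely illustrative, with invented names; it ignores everything a real server has to handle (crash detection, file handles, errors) and only shows where the data sits before and after a commit.

```python
class ToyNfsServer:
    """Toy model of write caching; not a real NFS implementation."""

    def __init__(self):
        self.unstable = {}  # offset -> data held only in server memory
        self.stable = {}    # offset -> data; stands in for stable storage

    def write(self, offset, data, stable=False):
        """Acknowledge a write; only stable=True reaches storage right away."""
        target = self.stable if stable else self.unstable
        target[offset] = data
        return len(data)

    def commit(self):
        """Flush everything written so far to stable storage."""
        self.stable.update(self.unstable)
        self.unstable.clear()

server = ToyNfsServer()
server.write(0, b"hello")      # acknowledged quickly, still only in memory
server.write(5, b" world")
print(sorted(server.stable))   # []
server.commit()                # now the data is safe
print(sorted(server.stable))   # [0, 5]
```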

The protocol's maximum transfer size has been increased from 8 KB to 4 GB. In practice the machines negotiate the transfer size, up to 64 KB, the maximum allowed for both UDP and TCP. The transport, either TCP or UDP, is also negotiated between the machines, defaulting to TCP if both ends support it. The new protocol allows 64-bit file offsets, up from the former 32-bit limit, so it supports very large files. The new version is also more efficient, e.g. it returns the file attributes after each call, eliminating the need to issue a separate request for this information.
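The numbers in this paragraph follow from simple arithmetic, shown below. The negotiation function is an assumption made for illustration: it simply picks the smallest of the two sides' limits and the 64 KB transport cap, which is one plausible reading of "negotiate", not a description of any particular implementation.

```python
KB = 1024

def negotiate_transfer_size(client_max, server_max, transport_cap=64 * KB):
    """Pick the largest size every party can handle (illustrative rule only)."""
    return min(client_max, server_max, transport_cap)

print(negotiate_transfer_size(32 * KB, 4 * KB**3))   # 32768
print(negotiate_transfer_size(4 * KB**3, 4 * KB**3)) # 65536

# Range of byte offsets addressable with 32-bit and 64-bit offsets.
print(2**32)  # 4294967296 (4 GB)
print(2**64)  # 18446744073709551616
```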

Solaris 2.5 and IRIX 5.3+ NFS implementations support both version 3 and version 2 of the protocol, so they can communicate with clients and servers supporting either version, with full backwards compatibility. Both NFS versions use port 2049, and /etc/services should have an entry for that port for both udp and tcp. The 22 RPC requests used by NFS version 3 are listed below.

NFS RPC calls

NFS version 3   NFS version 2   Description
-------------   -------------   -----------
NULL            null            does nothing, except make sure the connection is up
GETATTR         getattr         get file or directory attributes, e.g. file type, access times & permissions
SETATTR         setattr         set file or directory attributes
LOOKUP          lookup          look up a file name in a directory
ACCESS          -               check access permissions for a user
READLINK        readlink        read the data from a symbolic link
READ            read            read from a file
WRITE           write           write to a file
CREATE          create          create a file
MKDIR           mkdir           create a directory
SYMLINK         symlink         create a symbolic link
MKNOD           -               create a special device node
REMOVE          remove          remove a file (delete the directory entry)
RMDIR           rmdir           remove a directory (delete the subdirectory entry from a directory)
RENAME          rename          rename a file or directory
LINK            link            create a link to an object
READDIR         readdir         read from a directory
READDIRPLUS     -               extended read from a directory
FSSTAT          statfs          get dynamic file system state information
FSINFO          -               get static file system state information
PATHCONF        -               retrieve POSIX information for the filesystem
COMMIT          -               commit the cached data on the server to stable storage (force a flush of data previously written to the server)
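
The table can also be written down as a small lookup structure, which makes it easy to check the counts quoted in the text (22 calls in version 3, 16 in version 2). The dictionary below is just a transcription of the table for illustration; None marks calls with no version 2 counterpart, and the first entry uses NULL, the name the version 3 specification gives to the do-nothing call.

```python
# NFS version 3 call -> NFS version 2 equivalent (None if new in version 3)
V3_TO_V2 = {
    "NULL": "null", "GETATTR": "getattr", "SETATTR": "setattr",
    "LOOKUP": "lookup", "ACCESS": None, "READLINK": "readlink",
    "READ": "read", "WRITE": "write", "CREATE": "create",
    "MKDIR": "mkdir", "SYMLINK": "symlink", "MKNOD": None,
    "REMOVE": "remove", "RMDIR": "rmdir", "RENAME": "rename",
    "LINK": "link", "READDIR": "readdir", "READDIRPLUS": None,
    "FSSTAT": "statfs", "FSINFO": None, "PATHCONF": None,
    "COMMIT": None,
}

new_in_v3 = [name for name, old in V3_TO_V2.items() if old is None]

print(len(V3_TO_V2))                   # 22
print(len(V3_TO_V2) - len(new_in_v3))  # 16
print(new_in_v3)
# ['ACCESS', 'MKNOD', 'READDIRPLUS', 'FSINFO', 'PATHCONF', 'COMMIT']
```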


Unix System Administration - 8 AUG 1996