Commands for Troubleshooting NFS Problems

These commands can be useful when troubleshooting NFS problems.

`nfsstat` Command

You can use this command to gather statistical information about NFS and RPC connections. The syntax of the command is as follows:

nfsstat [ -cmnrsz ]

-c: Displays client-side information
-m: Displays statistics for each NFS-mounted file system
-n: Specifies that NFS information is to be displayed on both the client side and the server side
-r: Displays RPC statistics
-s: Displays the server-side information
-z: Specifies that the statistics should be set to zero

If no options are supplied on the command line, the -cnrs options are used.

Gathering server-side statistics can be important for debugging problems when new software or new hardware is added to the computing environment. Running this command a minimum of once a week, and storing the numbers, provides a good history of previous performance.

Refer to the following example:

# nfsstat -s

Server rpc:
Connection oriented:
calls      badcalls   nullrecv   badlen     xdrcall    dupchecks  dupreqs    
719949194  0          0          0          0          58478624   33         
Connectionless:
calls      badcalls   nullrecv   badlen     xdrcall    dupchecks  dupreqs    
73753609   0          0          0          0          987278     7254       

Server nfs:
calls                badcalls             
787783794            3516                 
Version 2: (746607 calls)
null       getattr    setattr    root       lookup     readlink   read       
883 0%     60 0%      45 0%      0 0%       177446 23% 1489 0%    537366 71% 
wrcache    write      create     remove     rename     link       symlink    
0 0%       1105 0%    47 0%      59 0%      28 0%      10 0%      9 0%       
mkdir      rmdir      readdir    statfs     
26 0%      0 0%       27926 3%   108 0%     
Version 3: (728863853 calls)
null          getattr       setattr       lookup        access        
1365467 0%    496667075 68% 8864191 1%    66510206 9%   19131659 2%   
readlink      read          write         create        mkdir         
414705 0%     80123469 10%  18740690 2%   4135195 0%    327059 0%     
symlink       mknod         remove        rmdir         rename        
101415 0%     9605 0%       6533288 0%    111810 0%     366267 0%     
link          readdir       readdirplus   fsstat        fsinfo        
2572965 0%    519346 0%     2726631 0%    13320640 1%   60161 0%      
pathconf      commit        
13181 0%      6248828 0%    
Version 4: (54871870 calls)
null                compound            
266963 0%           54604907 99%        
Version 4: (167573814 operations)
reserved            access              close               commit              
0 0%                2663957 1%          2692328 1%          1166001 0%          
create              delegpurge          delegreturn         getattr             
167423 0%           0 0%                1802019 1%          26405254 15%        
getfh               link                lock                lockt               
11534581 6%         113212 0%           207723 0%           265 0%              
locku               lookup              lookupp             nverify             
230430 0%           11059722 6%         423514 0%           21386866 12%        
open                openattr            open_confirm        open_downgrade      
2835459 1%          4138 0%             18959 0%            3106 0%             
putfh               putpubfh            putrootfh           read                
52606920 31%        0 0%                35776 0%            4325432 2%          
readdir             readlink            remove              rename              
606651 0%           38043 0%            560797 0%           248990 0%           
renew               restorefh           savefh              secinfo             
2330092 1%          8711358 5%          11639329 6%         19384 0%            
setattr             setclientid         setclientid_confirm verify              
453126 0%           16349 0%            16356 0%            2484 0%             
write               release_lockowner   illegal             
3247770 1%          0 0%                0 0%                

Server nfs_acl:
Version 2: (694979 calls)
null        getacl      setacl      getattr     access      getxattrdir 
0 0%        42358 6%    0 0%        584553 84%  68068 9%    0 0%        
Version 3: (2465011 calls)
null        getacl      setacl      getxattrdir 
0 0%        1293312 52% 1131 0%     1170568 47%

The previous listing is an example of NFS server statistics. The first five lines relate to RPC and the remaining lines report NFS activities. In both sets of statistics, knowing the average number of badcalls or calls and the number of calls per week can help identify a problem. The badcalls value reports the number of bad messages from a client. This value can indicate network hardware problems.

Some of the connections generate write activity on the disks. A sudden increase in these statistics could indicate trouble and should be investigated. For NFS version 2 statistics, the connections to note are setattr, write, create, remove, rename, link, symlink, mkdir, and rmdir. For NFS version 3 and version 4 statistics, the value to watch is commit. If the commit level is high in one NFS server, compared to another almost identical server, check that the NFS clients have enough memory. The number of commit operations on the server grows when clients do not have available resources.

`pstack` Command

This command displays a stack trace for each process. The pstack command must be run by the owner of the process or by root. You can use pstack to determine where a process is hung. The only option that is allowed with this command is the PID of the process that you want to check. See the proc(1) man page.

The following example is checking the nfsd process that is running.

# /usr/bin/pgrep nfsd
243
# /usr/bin/pstack 243
243:    /usr/lib/nfs/nfsd -a 16
 ef675c04 poll     (24d50, 2, ffffffff)
 000115dc ???????? (24000, 132c4, 276d8, 1329c, 276d8, 0)
 00011390 main     (3, efffff14, 0, 0, ffffffff, 400) + 3c8
 00010fb0 _start   (0, 0, 0, 0, 0, 0) + 5c

The example shows that the process is waiting for a new connection request, which is a normal response. If the stack shows that the process is still in poll after a request is made, the process might be hung. Follow the instructions in How to Restart NFS Services to fix this problem. Review the instructions in NFS Troubleshooting Procedures to fully verify that your problem is a hung program.

`rpcinfo` Command

This command generates information about the RPC service that is running on a system. You can also use this command to change the RPC service. Many options are available with this command. See the rpcinfo(1M) man page. The following is a shortened synopsis for some of the options that you can use with the command.

rpcinfo [ -m | -s ] [ hostname ]

rpcinfo -T transport hostname [ progname ]

rpcinfo [ -t | -u ] [ hostname ] [ progname ]

-m: Displays a table of statistics of the rpcbind operations
-s: Displays a concise list of all registered RPC programs
-T: Displays information about services that use specific transports or protocols
-t: Probes the RPC programs that use TCP
-u: Probes the RPC programs that use UDP
transport: Selects the transport or protocol for the services
hostname: Selects the host name of the server that you need information from
progname: Selects the RPC program to gather information about

If no value is given for hostname, the local host name is used. You can substitute the RPC program number for progname, but many users can remember the name and not the number. You can use the -p option in place of the -s option on those systems that do not run the NFS version 3 software.

The data that is generated by this command can include the following:

The RPC program number
The version number for a specific program
The transport protocol that is being used
The name of the RPC service
The owner of the RPC service

The following example gathers information about the RPC services that are running on a server. The text that is generated by the command is filtered by the sort command to make the output more readable. Several lines that list RPC services have been deleted from the example.

% rpcinfo -s bee |sort -n
   program version(s) netid(s)                         service     owner
    100000  2,3,4     udp6,tcp6,udp,tcp,ticlts,ticotsord,ticots rpcbind     superuser
    100001  4,3,2     ticlts,udp,udp6                  rstatd      superuser
    100002  3,2       ticots,ticotsord,tcp,tcp6,ticlts,udp,udp6 rusersd     superuser
    100003  3,2       tcp,udp,tcp6,udp6                nfs         superuser
    100005  3,2,1     ticots,ticotsord,tcp,tcp6,ticlts,udp,udp6 mountd      superuser
    100007  1,2,3     ticots,ticotsord,ticlts,tcp,udp,tcp6,udp6 ypbind      superuser
    100008  1         ticlts,udp,udp6                  walld       superuser
    100011  1         ticlts,udp,udp6                  rquotad     superuser
    100012  1         ticlts,udp,udp6                  sprayd      superuser
    100021  4,3,2,1   tcp,udp,tcp6,udp6                nlockmgr    superuser
    100024  1         ticots,ticotsord,ticlts,tcp,udp,tcp6,udp6 status      superuser
    100029  3,2,1     ticots,ticotsord,ticlts          keyserv     superuser
    100068  5         tcp,udp                          cmsd        superuser
    100083  1         tcp,tcp6                         ttdbserverd superuser
    100099  3         ticotsord                        autofs      superuser
    100133  1         ticots,ticotsord,ticlts,tcp,udp,tcp6,udp6 -           superuser
    100134  1         ticotsord                        tokenring   superuser
    100155  1         ticots,ticotsord,tcp,tcp6        smserverd   superuser
    100221  1         tcp,tcp6                         -           superuser
    100227  3,2       tcp,udp,tcp6,udp6                nfs_acl     superuser
    100229  1         tcp,tcp6                         metad       superuser
    100230  1         tcp,tcp6                         metamhd     superuser
    100231  1         ticots,ticotsord,ticlts          -           superuser
    100234  1         ticotsord                        gssd        superuser
    100235  1         tcp,tcp6                         -           superuser
    100242  1         tcp,tcp6                         metamedd    superuser
    100249  1         ticots,ticotsord,ticlts,tcp,udp,tcp6,udp6 -           superuser
    300326  4         tcp,tcp6                         -           superuser
    300598  1         ticots,ticotsord,ticlts,tcp,udp,tcp6,udp6 -           superuser
    390113  1         tcp                              -           unknown
 805306368  1         ticots,ticotsord,ticlts,tcp,udp,tcp6,udp6 -           superuser
1289637086  1,5       tcp                              -           26069

The following two examples show how to gather information about a particular RPC service by selecting a particular transport on a server. The first example checks the mountd service that is running over TCP. The second example checks the NFS service that is running over UDP.

% rpcinfo -t bee mountd
program 100005 version 1 ready and waiting
program 100005 version 2 ready and waiting
program 100005 version 3 ready and waiting
% rpcinfo -u bee nfs
program 100003 version 2 ready and waiting
program 100003 version 3 ready and waiting

`snoop` Command

This command is often used to watch for packets on the network. The snoop command must be run as root. The use of this command is a good way to ensure that the network hardware is functioning on both the client and the server. Many options are available. See the snoop(1M) man page. A shortened synopsis of the command follows:

snoop [ -d device ] [ -o filename ] [ host hostname ]

-d device: Specifies the local network interface
-o filename: Stores all the captured packets into the named file
hostname: Displays packets going to and from a specific host only

The -d device option is useful on those servers that have multiple network interfaces. You can use many expressions other than setting the host. A combination of command expressions with grep can often generate data that is specific enough to be useful.

When troubleshooting, make sure that packets are going to and from the proper host. Also, look for error messages. Saving the packets to a file can simplify the review of the data.

`truss` Command

You can use this command to check if a process is hung. The truss command must be run by the owner of the process or by root. You can use many options with this command. See the truss(1) man page. A shortened syntax of the command follows.

truss [ -t syscall ] -p pid

-t syscall: Selects system calls to trace
-p pid: Indicates the PID of the process to be traced

The syscall can be a comma-separated list of system calls to be traced. Also, starting syscall with an ! selects to exclude the listed system calls from the trace.

This example shows that the process is waiting for another connection request from a new client.

# /usr/bin/truss -p 243
poll(0x00024D50, 2, -1)         (sleeping...)

The previous example shows a normal response. If the response does not change after a new connection request has been made, the process could be hung. Follow the instructions in How to Restart NFS Services to fix the hung program. Review the instructions in NFS Troubleshooting Procedures to fully verify that your problem is a hung program.