Strange NFS lockups

Some of you may have heard me talk at one of the meetings about Health & Hospital Corp. of Marion County using Linux and LTSP to provide a Citrix Metaframe connection via Linux-based thin clients. If not, read on anyway, as you might be able to help me put a finger on what's wrong.

Background network layout.

LTSP servers = Red Hat 8.0 on Pentium III 450 MHz machines with roughly 256 MB RAM. Kernel version 2.4.18-14

DNS and DHCP server is Windows 2000 server

Thin clients are launched via Etherboot using a custom kernel, vmlinuz-2.4.19-ltsp-1mod

The thin client boot process is as follows.

The thin client boots and reads an Etherboot image off a floppy disk. DHCP hands the Ethernet card an IP address and the location of the TFTP server from which it can download its kernel. The kernel boots, then mounts additional file systems from the LTSP server, again as defined by DHCP. Once the machine is fully booted, an X session starts, which in turn launches the Linux Citrix ICA client. At that point the user is completely ready to work in a Windows 2000 environment.

The problem I am having is that, from time to time, one of the LTSP servers becomes completely unresponsive. You cannot ping, telnet, or SSH to the server. At the same time, all the thin clients booted from the locked server also become unresponsive. Trying to log in at the server's local terminal is met with little or, more likely, no success. However, going around the site and powering off all the thin clients resolves the problem. I have noticed that, while I am walking around powering down thin clients, one of them will be sitting at the point of mounting additional file systems via NFS.

Once the lockup is cleared, I usually restart the LTSP server and then bring the thin clients back up one at a time. All of them then work, including the one that had previously hung at the NFS mount.

Over several lockups, the machine that hangs at the NFS mount has never been the same one. Any suggestions, comments, or questions will be appreciated.

Thank you.

Comments

Re:Strange NFS lockups

Try turning on debug on the rpc.mountd daemon. This may give you some more information on what's happening. You can also run "rpcinfo -u <ip address> 100005" from another system to test the mountd daemon when the problem's happening.
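Since the lockups are intermittent, it may help to run that rpcinfo test on a timer from a second machine, so there is a timestamped record of when mountd stopped answering. Below is a rough sketch of that idea in Python. The function names, the 30-second interval, and the host name in the usage note are my own placeholders, and it assumes rpcinfo is installed on the machine doing the probing and exits non-zero when the program does not answer.

```python
import subprocess
import time

MOUNTD_PROGRAM = "100005"  # RPC program number for mountd, as in the command above


def mountd_responds(host, timeout=5, run=subprocess.run):
    """Return True if `rpcinfo -u <host> 100005` exits 0 within `timeout` seconds."""
    try:
        result = run(
            ["rpcinfo", "-u", host, MOUNTD_PROGRAM],
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
            timeout=timeout,
        )
    except (subprocess.TimeoutExpired, OSError):
        # The probe hung, or rpcinfo is not installed where this script runs.
        return False
    return result.returncode == 0


def watch(host, checks, interval=30, probe=mountd_responds, sleep=time.sleep):
    """Probe `host` `checks` times and return a list of (timestamp, ok) pairs."""
    history = []
    for i in range(checks):
        ok = probe(host)
        stamp = time.strftime("%Y-%m-%d %H:%M:%S")
        print(stamp, host, "mountd ok" if ok else "mountd NOT responding")
        history.append((stamp, ok))
        if i < checks - 1:
            sleep(interval)
    return history
```

Calling something like watch("ltsp1.example.com", checks=120) (a made-up host name) would probe for about an hour. Comparing the first "NOT responding" timestamp against the server's syslog might narrow down what happened just before the hang.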

You may want to consider increasing the number of nfsd daemons running.
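One way to judge whether more nfsd threads would help is the "th" line in /proc/net/rpc/nfsd on the server. As far as I know, on kernels of this era the first number is the thread count and the second is how many times all threads were busy at once, but treat that field layout as an assumption and check it against your kernel's documentation. A small Python sketch that pulls out those two numbers (the sample text is invented for illustration, not taken from a real server):

```python
def parse_nfsd_threads(stats_text):
    """Return (thread_count, times_all_busy) from the 'th' line, or None if absent."""
    for line in stats_text.splitlines():
        fields = line.split()
        if len(fields) >= 3 and fields[0] == "th":
            return int(fields[1]), int(fields[2])
    return None


# Invented sample in the assumed /proc/net/rpc/nfsd layout.
SAMPLE = "rc 0 0 0\nfh 0 0 0 0 0\nio 0 0\nth 8 37 12.5 3.1 0.9 0.2 0.0 0.0 0.0 0.0 0.0 0.0\n"

threads, all_busy = parse_nfsd_threads(SAMPLE)
print("nfsd threads:", threads, "times all busy:", all_busy)
```

If the second number keeps climbing between checks, that would suggest the thread pool is being exhausted, and starting rpc.nfsd with a larger thread count might be worth trying.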

LTSP

[quote="schultmc"]I've seen NFS lockups when there's no reverse DNS for a client or if there are I/O errors on the NFS server. It's easier to troubleshoot if you have lots of logs to review - I'd start by checking the syslog and other logs on the NFS server that's giving you problems.

In another message you mentioned possibly being able to give an LTSP presentation - any idea when you'd like to present (if you're still available to present)?[/quote]

I'll start comparing logs between the servers and see if I come up with anything.

As for presenting LTSP at one of the meetings: the January or February meeting?
I am working on providing an XPDE desktop to the thin clients for use in an otherwise Microsoft-friendly environment. I don't think the Citrix Metaframe model has much use outside of a corporation that's already invested a fortune in Metaframe.

Re:Strange NFS lockups

I've seen NFS lockups when there's no reverse DNS for a client or if there are I/O errors on the NFS server. It's easier to troubleshoot if you have lots of logs to review - I'd start by checking the syslog and other logs on the NFS server that's giving you problems.
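The reverse DNS point is quick to check from the NFS server itself. Here is a minimal Python sketch that reports which client addresses have no reverse (PTR) lookup. The function name and the addresses in the usage note are placeholders, not anything from the poster's network.

```python
import socket


def missing_reverse_dns(addresses, lookup=socket.gethostbyaddr):
    """Return the addresses for which a reverse lookup fails."""
    missing = []
    for addr in addresses:
        try:
            lookup(addr)
        except (socket.herror, socket.gaierror):
            # No PTR record, or the resolver could not answer for this address.
            missing.append(addr)
    return missing
```

For example, missing_reverse_dns(["192.168.0.10", "192.168.0.11"]) returns whichever of those placeholder addresses the server cannot resolve back to a name. Run it on the LTSP server so it uses the same resolver configuration that the NFS daemons see.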

In another message you mentioned possibly being able to give an LTSP presentation - any idea when you'd like to present (if you're still available to present)?