In multi-threaded programs, threads compete with each other for resources while making progress concurrently. A poorly chosen thread scheduling strategy can cost performance instead of gaining concurrency, for example by delaying critical work or adding context-switching overhead. Popular tools such as gdb and gprof do not help much in finding and fixing this kind of performance bug. This talk describes a method and a tool for monitoring, analyzing, and tuning multi-threaded programs. It proposes a set of performance measurements that help identify the bottleneck and suggest changes to task distribution or thread scheduling strategy. The measurements include thread waiting graphs and the intensity and efficiency of each synchronization variable. The tool is built on a user-level thread package written by David Hanson. The thread package is instrumented to report a minimal set of events to a remote monitoring server over Internet sockets. The server uses the event traces to reconstruct, analyze, and visualize the thread activity remotely. The advantage of this client-server structure is that the monitoring tool consumes few CPU cycles on the application machine and writes no huge log file there. The tool gives useful information on coarse-grained as well as fine-grained threads. It was written in C++, Tcl/Tk, OpenGL, and Perl. Its functionality and usefulness are illustrated by applying it to an example application, a multi-threaded file server, and improving that application's performance dramatically.
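As an illustration of how a monitoring server might rebuild a waiting graph from an event trace, here is a minimal Python sketch. It is not the tool's implementation: the event format `(time, thread, kind, lock)`, the event kinds `request`, `acquire`, and `release`, and the function name `build_waiting_graph` are all assumptions made for this example. It also simplifies by charging a thread's whole wait to whichever thread held the lock when the request was made.

```python
from collections import defaultdict


def build_waiting_graph(events):
    """Return {(waiter, holder): total time the waiter was blocked by the holder}.

    Hypothetical event format (an assumption, not the tool's wire format):
    (time, thread, kind, lock), with kind in {"request", "acquire", "release"}.
    """
    holder = {}    # lock -> thread currently holding it
    pending = {}   # (thread, lock) -> (request time, holder at request time)
    graph = defaultdict(float)

    # Stable sort on the timestamp only, so events with equal timestamps
    # keep their arrival order.
    for time, thread, kind, lock in sorted(events, key=lambda e: e[0]):
        if kind == "request":
            pending[(thread, lock)] = (time, holder.get(lock))
        elif kind == "acquire":
            requested_at, blocker = pending.pop((thread, lock), (time, None))
            # Simplification: the whole wait is attributed to the thread
            # that held the lock when the request was issued.
            if blocker is not None and blocker != thread:
                graph[(thread, blocker)] += time - requested_at
            holder[lock] = thread
        elif kind == "release":
            if holder.get(lock) == thread:
                del holder[lock]
    return dict(graph)


# Made-up trace: thread B waits 3 time units for lock "m" held by thread A.
example_events = [
    (0.0, "A", "request", "m"),
    (0.0, "A", "acquire", "m"),
    (1.0, "B", "request", "m"),
    (4.0, "A", "release", "m"),
    (4.0, "B", "acquire", "m"),
    (6.0, "B", "release", "m"),
]
example_graph = build_waiting_graph(example_events)  # {('B', 'A'): 3.0}
```

Heavy edges in such a graph point at threads that hold a synchronization variable while others wait, which is the kind of bottleneck the measurements are meant to expose.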