Keeping Track of Users: diskhogs
Let's put all the information in this hour together and create an administrative script called diskhogs. When run, this script will report the five users with the largest /home directories, and then list the four largest files in each of those directories.
Task 3.5: This Little Piggy Stayed Home?
This is the first shell script presented in the book, so a quick rule of thumb: Write your shell scripts in sh rather than csh. It's easier, more universally recognized, and most shell scripts you'll encounter are also written in sh. Also, keep in mind that just about every shell script discussed in this book will expect you to be running as root, since they'll need access to the entire file system for any meaningful or useful system administration functions.
In this book, all shell scripts will be written in sh, which is easily verified by the fact that they all have
#!/bin/sh
as their first line.
Let's put all this together. To find the five largest home directories, you can use
du -s /home/* | sort -rn | cut -f2 | head -5
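To see what each stage of that pipeline contributes without needing root or a populated /home, here is a minimal sketch that runs it against a throwaway directory tree. The directory names (alice, bob, carol) and file sizes are invented for the demonstration, and it keeps the top two entries rather than five.

```shell
#!/bin/sh
# Build a scratch tree so the pipeline can be tried safely.
# alice, bob, and carol are made-up names for this sketch.
scratch=`mktemp -d`
mkdir "$scratch/alice" "$scratch/bob" "$scratch/carol"
dd if=/dev/urandom of="$scratch/alice/small.dat" bs=1024 count=64 2>/dev/null
dd if=/dev/urandom of="$scratch/bob/big.dat" bs=1024 count=2048 2>/dev/null
dd if=/dev/urandom of="$scratch/carol/medium.dat" bs=1024 count=512 2>/dev/null

# du -s    : one summary line per directory (size, tab, name)
# sort -rn : numeric sort, largest first
# cut -f2- : drop the size column, keep the name
# head -2  : keep only the two biggest
top=`du -s "$scratch"/* | sort -rn | cut -f2- | head -2`
echo "$top"

# Clean up the scratch tree.
rm -rf "$scratch"
```

The bob directory should be listed first and carol second; the exact block counts that du reports depend on the file system, which is why the sketch looks only at the ordering.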
For each directory, you can find the largest files within by using
find /home/loginID -type f -printf "%k %p\n" | sort -rn | head
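In the -printf format string, %k is the file's disk usage in 1KB blocks and %p is its path name. The following sketch, which assumes GNU find and uses invented file names in a scratch directory, shows the same pipeline picking out the two biggest files.

```shell
#!/bin/sh
# Assumes GNU find; -printf is not available in every version of find.
# The file names below are invented for the demonstration.
scratch=`mktemp -d`
mkdir "$scratch/logs"
dd if=/dev/urandom of="$scratch/photo.jpg" bs=1024 count=400 2>/dev/null
dd if=/dev/urandom of="$scratch/logs/access_log" bs=1024 count=100 2>/dev/null
dd if=/dev/urandom of="$scratch/notes.txt" bs=1024 count=8 2>/dev/null

# %k = size in 1KB blocks, %p = path; sort numerically, biggest first.
biggest=`find "$scratch" -type f -printf "%k %p\n" | sort -rn | head -2`
echo "$biggest"

# Clean up the scratch tree.
rm -rf "$scratch"
```

Because -type f restricts the search to regular files, directories never appear in the list, however large their contents.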
Therefore, we should be able to identify the top home directories, then step one-by-one into those directories to identify the largest files in each. Here's how that code should look:
for dirname in `du -s /home/* | sort -rn | cut -f2- | head -5`
do
  echo ""
  echo Big directory: $dirname
  echo Four largest files in that directory are:
  find $dirname -type f -printf "%k %p\n" | sort -rn | head -4
done
exit 0
This is a good first stab at this shell script. Let's save it as diskhogs.sh, run it and see what we find:
# sh diskhogs.sh

Big directory: /home/staging
Four largest files in that directory are:
423 /home/staging/waldorf/big/DSCF0165.jpg
410 /home/staging/waldorf/big/DSCF0176.jpg
402 /home/staging/waldorf/big/DSCF0166.jpg
395 /home/staging/waldorf/big/DSCF0161.jpg

Big directory: /home/chatter
Four largest files in that directory are:
1076 /home/chatter/comics/lynx
388 /home/chatter/logs/access_log
90 /home/chatter/logs/error_log
64 /home/chatter/responding.cgi

Big directory: /home/cbo
Four largest files in that directory are:
568 /home/cbo/financing.pdf
464 /home/cbo/investors/CBO-plan.pdf
179 /home/cbo/Archive/cbofinancial-modified-files/CBO Website.zip
77 /home/cbo/Archive/cbofinancial-modified-files/CBO Financial Incorporated.doc

Big directory: /home/sherlockworld
Four largest files in that directory are:
565 /home/sherlockworld/originals-from gutenberg.txt
56 /home/sherlockworld/speckled-band.html
56 /home/sherlockworld/copper-beeches.html
54 /home/sherlockworld/boscombe-valley.html

Big directory: /home/launchline
Four largest files in that directory are:
151 /home/launchline/logs/access_log
71 /home/launchline/x/submit.cgi
71 /home/launchline/x/admin/managesubs.cgi
64 /home/launchline/x/status.cgi
As you can see, the results are good, but the order of the output isn't quite what we'd like. Ideally, I'd like to have all the disk hogs listed first, followed by the largest files for each. To do this, we'll have to either store the directory names in a variable that we parse later or write them to a temporary file.
Because it shouldn't be too much information (five directory names), we'll save the directory names as a variable. To do this, we'll use the nifty backquote notation.
Here's how things will change. First off, let's load the directory names into the new variable:
bigdirs="`du -s /home/* | sort -rn | cut -f2- | head -5`"
Then we'll need to change the for loop to reflect this change, which is easy:
for dirname in $bigdirs ; do
Notice I've also pulled the do line up to shorten the script. Recall that a semicolon indicates the end of a command in a shell script, so we can then pull the next line up without any further ado.
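The mechanics of capturing command output in a variable and then looping over it can be tried in isolation. In this sketch, printf stands in for the du pipeline so that it runs on any system, and the three directory names are placeholders.

```shell
#!/bin/sh
# printf stands in for the du pipeline; the names are placeholders.
bigdirs="`printf '/home/staging\n/home/chatter\n/home/cbo\n'`"

count=0
# Left unquoted, $bigdirs is split on whitespace, so the loop
# sees one word per directory name.
for dirname in $bigdirs ; do
  count=`expr $count + 1`
  echo "$count: $dirname"
done
```

The loop body runs three times, once per name. Note that the splitting happens on any whitespace, not just newlines, which is worth remembering if a directory name might contain a space.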
TIP
Unix old-timers often refer to backquotes as backticks, so a wizened Unix admin might well say "stick the dee-ewe in backticks" at this juncture.
Now let's not forget to output the list of big directories before we list the big files per directory. In total, our script now looks like this:
echo "Disk Hogs Report for System `hostname`"
bigdirs="`du -s /home/* | sort -rn | cut -f2- | head -5`"
echo "The Five biggest home directories are:"
echo $bigdirs
for dirname in $bigdirs ; do
  echo ""
  echo Big directory: $dirname
  echo Four largest files in that directory are:
  find $dirname -type f -printf "%k %p\n" | sort -rn | head -4
done
exit 0
This is quite a bit closer to the finished product, as you can see from its output:
Disk Hogs Report for System staging.intuitive.com
The Five biggest home directories are:
/home/staging /home/chatter /home/cbo /home/sherlockworld /home/launchline

Big directory: /home/staging
Four largest files in that directory are:
423 /home/staging/waldorf/big/DSCF0165.jpg
410 /home/staging/waldorf/big/DSCF0176.jpg
402 /home/staging/waldorf/big/DSCF0166.jpg
395 /home/staging/waldorf/big/DSCF0161.jpg

Big directory: /home/chatter
Four largest files in that directory are:
1076 /home/chatter/comics/lynx
388 /home/chatter/logs/access_log
90 /home/chatter/logs/error_log
64 /home/chatter/responding.cgi

Big directory: /home/cbo
Four largest files in that directory are:
568 /home/cbo/financing.pdf
464 /home/cbo/investors/CBO-plan.pdf
179 /home/cbo/Archive/cbofinancial-modified-files/CBO Website.zip
77 /home/cbo/Archive/cbofinancial-modified-files/CBO Financial Incorporated.doc

Big directory: /home/sherlockworld
Four largest files in that directory are:
565 /home/sherlockworld/originals-from gutenberg.txt
56 /home/sherlockworld/speckled-band.html
56 /home/sherlockworld/copper-beeches.html
54 /home/sherlockworld/boscombe-valley.html

Big directory: /home/launchline
Four largest files in that directory are:
151 /home/launchline/logs/access_log
71 /home/launchline/x/submit.cgi
71 /home/launchline/x/admin/managesubs.cgi
64 /home/launchline/x/status.cgi
This is a script you could easily run every morning in the wee hours with a line in cron (which we'll explore in great detail in Hour 15, "Running Jobs in the Future"), or you can even put it in your .profile to run automatically each time you log in.
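If you'd like to experiment with the report logic before pointing it at a real /home, one option is to wrap it in a shell function that takes the directory to scan as an argument. The sketch below does that; report_hogs and its two parameters are hypothetical names introduced here, not part of diskhogs.sh, and it assumes GNU find and directory names without spaces.

```shell
#!/bin/sh
# report_hogs is a hypothetical helper, not part of diskhogs.sh.
# $1 = directory whose subdirectories are ranked
# $2 = how many of the biggest subdirectories to report
# Assumes GNU find (for -printf) and names without embedded spaces.
report_hogs() {
  base=$1
  howmany=$2
  hogs=`du -s "$base"/* | sort -rn | cut -f2- | head -n "$howmany"`
  echo "The $howmany biggest directories under $base are:"
  echo $hogs
  for dirname in $hogs ; do
    echo ""
    echo "Big directory: $dirname"
    echo "Four largest files in that directory are:"
    find "$dirname" -type f -printf "%k %p\n" | sort -rn | head -4
  done
}

# Try it on a scratch tree; ann and raj are invented names.
scratch=`mktemp -d`
mkdir "$scratch/ann" "$scratch/raj"
dd if=/dev/urandom of="$scratch/ann/a.bin" bs=1024 count=50 2>/dev/null
dd if=/dev/urandom of="$scratch/raj/r.bin" bs=1024 count=900 2>/dev/null
report=`report_hogs "$scratch" 2`
echo "$report"
rm -rf "$scratch"
```

With the sizes chosen here, raj should be reported ahead of ann. Calling the same function as report_hogs /home 5 would reproduce the report shown earlier.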
One final nuance: To have the output e-mailed to you, simply append the following:
| mail -s "Disk Hogs Report" your-mailaddr
If you've named this script diskhogs.sh like I have, you could have the output e-mailed to you (as root) with
sh diskhogs.sh | mail -s "Disk Hogs Report" root
Try that, then check root's mailbox to see if the report made it.
For those of you using Solaris, Darwin, or another Unix, the nifty -printf option probably isn't available with your version of find. As a result, the more generic version of this script is rather more complex, because we not only have to sidestep the lack of -printf, but we also have to address the challenge of having embedded spaces in most directory names (on Darwin). To sidestep the former, we use find's -ls option and have awk pull out the size and name fields. To accomplish the latter, we use sed to change all spaces to double underscores and then back again when we feed the argument to the find command:
#!/bin/sh
echo "Disk Hogs Report for System `hostname`"
bigdir2="`du -s /Library/* | sed 's/ /__/g' | sort -rn | cut -f2- | head -5`"
echo "The Five biggest library directories are:"
echo $bigdir2
for dirname in $bigdir2 ; do
  echo ""
  echo Big directory: $dirname
  echo Four largest files in that directory are:
  find "`echo $dirname | sed 's/__/ /g'`" -type f -ls | \
    awk '{ print $7" "$11 }' | sort -rn | head -4
done
exit 0
The good news is that the output ends up being almost identical, which you can verify if you have an OS X or other BSD system available.
Of course, it would be smart to replace the native version of find with the more sophisticated GNU version, but changing essential system tools is more than most Unix users want!
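The trick of encoding spaces as double underscores, then decoding them again just before the name is used, can be tried on its own. This is a minimal sketch; "Application Support" is simply a sample name containing a space.

```shell
#!/bin/sh
# "Application Support" is just a sample name containing a space.
original="/Library/Application Support"

# Encode: every space becomes a double underscore, so the name
# survives word splitting in a for loop as a single word.
encoded=`echo "$original" | sed 's/ /__/g'`

words=0
for w in $encoded ; do
  words=`expr $words + 1`
done

# Decode: turn the double underscores back into spaces before use.
decoded=`echo "$encoded" | sed 's/__/ /g'`

echo "encoded: $encoded (words seen by the loop: $words)"
echo "decoded: $decoded"
```

One caveat of this approach: a name that already contains a double underscore would be decoded incorrectly, so it is a pragmatic workaround rather than a general solution.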
TIP
If you want to explore upgrading some of the Unix tools in Darwin to take advantage of the sophisticated GNU enhancements, then you'd do well to start by looking at http://www.osxgnu.org/ for ported code. The site also includes download instructions.
If you're on Solaris or another flavor of Unix that isn't Mac OS X, check out the main GNU site for tool upgrades at http://www.gnu.org/.
This shell script evolved in a manner that's quite common for Unix tools: it started out life as a simple command line; then, as the sophistication of the tool increased, the complexity of the command sequence increased to where it was too tedious to type in directly, so it was dropped into a shell script. Shell variables then offered the capability to save interim output, fine-tune the presentation, and more, so we exploited them to build a more powerful tool. Finally, the tool itself was added to the system as an automated monitoring task by adding it to the root cron job.