Disk Usage

One of the most common problems that system administrators face in the Unix world is disk space. Whether it's running out of space, or just making sure that no one user is hogging all your resources, exploring how the hard disks are allocated and utilized on your system is a critical skill.

In this hour, you learn:

  • How to look at disk usage with df and du

  • How to simplify analysis with sort

  • How to identify the biggest files and use diskhogs

Physical Disks and Partitions

In the last few years, hard disks have become considerably bigger than most operating systems can comfortably manage. Indeed, most file systems have a minimum size for files and a maximum number of files and/or directories that can be on a single physical device, and it's those constraints that slam up against the larger devices.

As a result, most modern operating systems support taking a single physical disk and splitting it into multiple virtual disks, or partitions. Windows and Macintosh systems have supported this for a few years, but usually on a personal desktop system you don't have to worry about disks that are too big, or worse, running out of disk space and having the system crash.

Unix is another beast entirely. In the world of Unix, you can have hundreds of different virtual disks and not even know it—even your home directory might be spread across two or three partitions.

One reason for this strategy in Unix is that running programs tend to leave log files, temp files, and other detritus behind, and they can add up and eat a disk alive.

For example, on my main Web server, I have a log file that's currently growing by about 140KB a day and is already 19MB. That doesn't sound too large when you think about 50GB disks for $100 at the local electronics store, but having big disks at the store doesn't mean that they're installed in your server!

In fact, Unix is very poorly behaved when it runs out of disk space, and can become corrupted badly enough that it essentially stops and requires an expert sysadmin to resurrect it. To avoid this horrible fate, it's crucial to keep an eye on how big your partitions are growing, and to know how to prune large files before they become a serious problem.

Task 3.1: Exploring Partitions

Enough chatter, let's get down to business, shall we?

  1. The command we'll be exploring in this section is df, a command that reports disk space usage. Without any arguments at all, it offers lots of useful information:

    # df
    Filesystem     1k-blocks   Used      Available  Use%    Mounted on
    /dev/sda5         380791    108116      253015   30%    /
    /dev/sda1          49558      7797       39202   17%    /boot
    /dev/sda3       16033712     62616    15156608    1%    /home
    none              256436         0      256436    0%    /dev/shm
    /dev/sdb1       17245524   1290460    15079036    8%    /usr
    /dev/sdb2         253871     88384      152380   37%    /var

    Upon first glance, it appears that I have five different disks connected to this system. In fact, I have two.

  2. I'm sure you already know this, but it's worth pointing out that all devices hooked up to a computer, whether for input or output, require a specialized piece of code called a device driver to work properly. In the Windows world, they're typically hidden away, and you have no idea what they're even called.

    Device drivers in Unix, however, are files. They're special files, but they show up as part of the file system along with your e-mail archive and login scripts.

    That's what the /dev/sda5 is on the first line, for example. We can have a look at this file with ls to see what it is:

    # ls -l /dev/sda5
    brw-rw----  1 root   disk    8,  5 Aug 30 13:30 /dev/sda5

    The leading b is something you probably haven't seen before. It denotes that this device is a block-special device.

    Here's a nice thing to know: The device names in Unix have meaning. In fact, sd typically denotes a SCSI device, the next letter identifies the disk (a for the first disk, b for the second), and the trailing number identifies the partition on that disk (5). The 8 and 5 in the ls listing are the device's major and minor numbers, which the kernel uses to pick the driver and the specific partition.

    From this information, we can glean that there are three devices that share the disk letter a but have different partition numbers (sda1, sda3, and sda5), and two devices that share the disk letter b (sdb1 and sdb2).

    In fact, the first three are partitions on the same hard disk, and the second two are partitions on a different disk.
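The naming convention makes it possible to count physical disks from df output mechanically. The following is a minimal sketch, assuming every disk device follows the sdXN pattern described above; the function name list_disks is made up for illustration, and the sample input is the df listing from step 1.

```shell
#!/bin/sh
# list_disks (hypothetical helper): read df output on standard input and
# print each distinct physical disk, assuming /dev/sdXN device names.
list_disks() {
    awk '$1 ~ /^\/dev\/sd[a-z]+[0-9]+$/ {
             disk = $1
             sub(/[0-9]+$/, "", disk)    # strip the partition number: /dev/sda5 -> /dev/sda
             print disk
         }' | sort -u
}

list_disks <<'EOF'
Filesystem     1k-blocks   Used      Available  Use%    Mounted on
/dev/sda5         380791    108116      253015   30%    /
/dev/sda1          49558      7797       39202   17%    /boot
/dev/sda3       16033712     62616    15156608    1%    /home
none              256436         0      256436    0%    /dev/shm
/dev/sdb1       17245524   1290460    15079036    8%    /usr
/dev/sdb2         253871     88384      152380   37%    /var
EOF
```

On the sample listing this prints /dev/sda and /dev/sdb: two disks, not five. Systems that name their disks differently would need a different pattern.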

TIP

If you ever have problems with a device, use ls -l to make sure it's configured properly. If the listing doesn't begin with a c (for a character special device) or a b (for a block-special device), something's gone wrong and you need to delete it and rebuild it with mknod.

  3. How big is the disk? Well, in some sense it doesn't really matter in the world of Unix, because Unix only cares about the partitions that are assigned to it. If the second disk is 75GB, but we only have a 50MB partition that's available to Unix, the vast majority of the disk is untouchable and therefore doesn't matter.

    If you really want to figure it out, you could add up the size of each partition (the 1k-blocks column), but let's dissect a single line of output first, so you can see what's what:

    /dev/sda5         380791    108116      253015   30%    /

    Here you're shown the device ID (sda5), then the size of the partition (in 1K blocks within Linux). This partition is 380,791KB, or 380MB. The second number shows how much of the partition is used—108,116KB—and the next how much is available—253,015KB. This translates to 30% of the partition in use and 70% available.

    The last value is perhaps the most important because it indicates where the partition has been connected to the Unix file system. Partition sda5 is the root partition, as can be seen by the /.

NOTE

Purists will spot the error in this calculation: converting kilobytes to megabytes means dividing by 1,024, not by 1,000. So that everyone is happy, 380,791 ÷ 1,024 reveals that this partition is about 371.9MB.
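The conversion is easy to script. This is a small sketch rather than anything df provides itself: kb_to_units is a hypothetical helper name, and it simply divides a count of 1K blocks by 1,024 once for megabytes and twice for gigabytes.

```shell
#!/bin/sh
# kb_to_units (hypothetical helper): convert a count of 1K blocks, as printed
# by Linux df, to megabytes and gigabytes using divisions by 1024.
kb_to_units() {
    awk -v kb="$1" 'BEGIN { printf "%.1fMB %.2fGB\n", kb / 1024, kb / 1024 / 1024 }'
}

kb_to_units 380791      # the / partition   -> 371.9MB 0.36GB
kb_to_units 16033712    # the /home partition -> 15657.9MB 15.29GB
```

The figures are rounded, not truncated, so the last digit can differ slightly from a hand calculation that chops off the extra decimals.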

  4. Let's look at another line from the df output:

    /dev/sda3       16033712     62616    15156608    1%    /home

    Notice here that the partition is considerably bigger! In fact, it's 16,033,712KB, or roughly 16GB (15.3GB for purists). Unsurprisingly, very little of this is used—less than 1%—and it's mounted to the system as the /home directory.

    In fact, look at the mount points for all the partitions for just a moment:

    # df
    Filesystem     1k-blocks   Used      Available  Use%    Mounted on
    /dev/sda5         380791    108116      253015   30%    /
    /dev/sda1          49558      7797       39202   17%    /boot
    /dev/sda3       16033712     62616    15156608    1%    /home
    none              256436         0      256436    0%    /dev/shm
    /dev/sdb1       17245524   1290460    15079036    8%    /usr
    /dev/sdb2         253871     88389      152375   37%    /var

    We have the topmost root partition (sda5); then we have additional small partitions for /boot and /var. The two really big spaces are /home, where all the individual user files will live, and /usr, where I have all the Web sites on this server stored.

    This is a very common configuration, where each area of Unix has its own sandbox to play in, as it were. This lets you, the sysadmin, manage file usage quite easily, ensuring that running out of space in one directory (say, /home) doesn't affect the overall system.
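Because each partition fills independently, it helps to check them all against a threshold. The sketch below is one way to do that, under the assumption that the input has the six-column Linux layout shown above and that mount points contain no spaces; flag_full is a hypothetical name. On a live system, piping in the output of df -P (the POSIX portable format) keeps each file system on a single line.

```shell
#!/bin/sh
# flag_full (hypothetical helper): read df output on standard input and print
# the mount point and Use% of every file system at or above the given limit.
flag_full() {
    awk -v limit="$1" '
        NR > 1 {                        # skip the header line
            pct = $5
            sub(/%$/, "", pct)          # "37%" -> "37"
            if (pct + 0 >= limit + 0)
                print $6, pct "%"
        }'
}

flag_full 30 <<'EOF'
Filesystem     1k-blocks   Used      Available  Use%    Mounted on
/dev/sda5         380791    108116      253015   30%    /
/dev/sda1          49558      7797       39202   17%    /boot
/dev/sda3       16033712     62616    15156608    1%    /home
none              256436         0      256436    0%    /dev/shm
/dev/sdb1       17245524   1290460    15079036    8%    /usr
/dev/sdb2         253871     88389      152375   37%    /var
EOF
```

With a limit of 30, the sample listing flags / at 30% and /var at 37%, the two small partitions most likely to fill up first.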

  5. Solaris 8 has a df command that offers very different information, focused more on files and the file system than on disks and disk space used:

    # df
    /               (/dev/dsk/c0d0s0  ):  827600 blocks  276355 files
    /boot           (/dev/dsk/c0d0p0:boot):  17584 blocks    -1 files
    /proc           (/proc       ):    0 blocks   1888 files
    /dev/fd         (fd        ):    0 blocks    0 files
    /etc/mnttab     (mnttab      ):    0 blocks    0 files
    /var/run        (swap       ): 1179992 blocks  21263 files
    /tmp            (swap       ): 1179992 blocks  21263 files
    /export/home    (/dev/dsk/c0d0s7  ): 4590890 blocks  387772 files

    It's harder to see what's going on, but notice that the order of information presented on each line is the mount point, the device identifier, the number of free blocks on the device (in 512-byte units), and the number of free file slots (inodes) on that device.

    There's no way to see how big each file system is or what fraction of it is in use, so the default df output isn't very helpful for a system administrator.

    Fortunately, there's the -t totals option that offers considerably more helpful information:

    # df -t
    /               (/dev/dsk/c0d0s0  ):  827600 blocks  276355 files
                     total: 2539116 blocks  320128 files
    /boot           (/dev/dsk/c0d0p0:boot):  17584 blocks    -1 files
                     total:  20969 blocks    -1 files
    /proc           (/proc       ):    0 blocks   1888 files
                     total:    0 blocks   1932 files
    /dev/fd         (fd        ):    0 blocks    0 files
                     total:    0 blocks   258 files
    /etc/mnttab     (mnttab      ):    0 blocks    0 files
                     total:    0 blocks    1 files
    /var/run        (swap       ): 1180000 blocks  21263 files
                     total: 1180008 blocks  21279 files
    /tmp            (swap       ): 1180000 blocks  21263 files
                     total: 1180024 blocks  21279 files
    /export/home    (/dev/dsk/c0d0s7  ): 4590890 blocks  387772 files
                     total: 4590908 blocks  387776 files

    Indeed, when I've administered Solaris systems, I've usually set up an alias df="df -t" to always have this more informative output.

NOTE

If you're trying to analyze the df output programmatically so you can flag when disks start to get tight, you'll immediately notice that there's no percentage-used summary in the df output in Solaris. Extracting just the relevant fields of information is quite tricky too, because you want to glean the number of free blocks from one line, then the total number of blocks from the next. It's a job for Perl or awk (or even a small C program).
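As a rough illustration of the awk approach, the sketch below pairs each mount line with the total: line that follows it. It rests on an assumption worth checking against your own df man page: that the first line's block count is the free space, as in traditional System V df, so that used space is total minus free. The name solaris_pct_used is hypothetical, and the sample input is a few lines of the df -t listing above.

```shell
#!/bin/sh
# solaris_pct_used (hypothetical helper): read Solaris "df -t" output on
# standard input and print "mountpoint percent-used" for each file system.
# Assumption: the first line of each pair reports FREE blocks.
solaris_pct_used() {
    awk '
    # Return the number that sits immediately before the word "blocks".
    function blocks(   i) {
        for (i = 2; i <= NF; i++)
            if ($i == "blocks") return $(i - 1) + 0
        return -1
    }
    $1 == "total:" {
        total = blocks()
        if (total > 0)                  # skip pseudo file systems with no blocks
            printf "%s %d%%\n", mnt, (total - avail) * 100 / total
        next
    }
    NF > 0 { mnt = $1; avail = blocks() }
    '
}

solaris_pct_used <<'EOF'
/               (/dev/dsk/c0d0s0  ):  827600 blocks  276355 files
                 total: 2539116 blocks  320128 files
/boot           (/dev/dsk/c0d0p0:boot):  17584 blocks    -1 files
                 total:  20969 blocks    -1 files
/proc           (/proc       ):    0 blocks   1888 files
                 total:    0 blocks   1932 files
EOF
```

Searching for the word blocks, rather than counting fields, sidesteps the stray spaces inside the parenthesized device names. Under the stated assumption the sample reports / at 67% and /boot at 16%, and skips /proc because its total is zero.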

  6. By way of contrast, Darwin has a very different output for the df command:

    # df
    Filesystem         512-blocks  Used      Avail      Capacity  Mounted on
    /dev/disk1s9         78157200  29955056  48202144    38%      /
    devfs                      73        73         0   100%      /dev
    fdesc                       2         2         0   100%      /dev
    <volfs>              1024      1024         0   100%      /.vol
    /dev/disk0s8         53458608  25971048  27487560    48%      /Volumes/Macintosh HD
    automount -fstab [244]      0         0         0   100%      /Network/Servers
    automount -static [244]     0         0         0   100%      /automount

    About as different as it could be, and notice that it suggests that just about everything is at 100% capacity. Uh oh!

    A closer look, however, reveals that the devices at 100% capacity are devfs, fdesc, <volfs>, and two automounted services. In fact, they're related to the Mac OS running within Darwin, and really the only lines of interest in this output are the two proper /dev/ devices:

    /dev/disk1s9         78157200  29955056  48202144    38%      /
    /dev/disk0s8         53458608  25971048  27487560    48%      /Volumes/Macintosh HD

    The first of these, identified as /dev/disk1s9, is the hard disk where Mac OS X is installed, and it has 78,157,200 blocks. However, they're not 1K blocks as in Linux, they're 512-byte blocks, so you need to factor that in when you calculate the size in GB:

    78,157,200 ÷ 2 = 39,078,600 1K blocks

    39,078,600 ÷ 1024 = 38,162.69MB

    38,162.69MB ÷ 1024 = 37.26GB

    In fact, this is a 40GB disk, so we're right on with our calculations, and we can see that 38% of the disk is in use, leaving us with 48,202,144 ÷ (2 × 1024 × 1024) = about 23GB free.

    Using the same math, you can calculate that the second disk is 25GB, of which about half (48%) is in use.
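The same arithmetic can be wrapped in a helper that takes the block size as a parameter, so it works for Darwin's 512-byte blocks and Linux's 1K blocks alike. This is only a sketch; blocks_to_gb is a made-up name, and the block counts are the ones from the Darwin listing above.

```shell
#!/bin/sh
# blocks_to_gb (hypothetical helper): convert a block count to gigabytes.
#   $1 = number of blocks, $2 = block size in bytes (512 on Darwin, 1024 on Linux)
blocks_to_gb() {
    awk -v n="$1" -v size="$2" 'BEGIN { printf "%.2f\n", n * size / (1024 * 1024 * 1024) }'
}

blocks_to_gb 78157200 512    # total size of /                      -> 37.27
blocks_to_gb 48202144 512    # space available on /                 -> 22.98
blocks_to_gb 53458608 512    # total size of /Volumes/Macintosh HD  -> 25.49
```

The results are rounded to two decimal places, so the last digit can differ from a hand calculation that truncates.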

TIP

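Wondering about the gap of roughly 2.7GB between the manufacturer's claim of a 40GB disk and the 37GB or so that df reports? Most of it is a matter of units: manufacturers count a gigabyte as 1,000,000,000 bytes, and 78,157,200 blocks × 512 bytes is just over 40 billion bytes, whereas the calculation above divides by 1024 at each step. On top of that, there's always a small percentage of disk space consumed by formatting and disk overhead. That's why manufacturers talk about "unformatted capacity."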

  7. Linux has a very nice flag with the df command worth mentioning: Use -h and you get:

    # df -h
    Filesystem      Size  Used  Avail  Use%  Mounted on
    /dev/sda5       372M  106M  247M   30%   /
    /dev/sda1        48M  7.7M   38M   17%   /boot
    /dev/sda3        15G   62M   14G    1%   /home
    none            250M     0  250M    0%   /dev/shm
    /dev/sdb1        16G  1.3G   14G    8%   /usr
    /dev/sdb2       248M   87M  148M   37%   /var

    A much more human-readable format. Here you can see that /home and /usr both have 14GB unused. Lots of space!

This section has given you a taste of the df command, but we haven't spent too much time analyzing the output and digging around trying to ascertain where the biggest files live. That's what we'll consider next.
