"Your organization needs this book!"
--Peter Salus, Chief Knowledge Officer, Matrix.Net, "The Bookworm"
This book describes the best practices of system and network administration, independent of specific platforms or technologies. It features six key principles of site design and support practices: simplicity, clarity, generality, automation, communication, and basics first. It examines the major areas of responsibility for system administrators within the context of these principles. The book also discusses change management and revision control, server upgrades, maintenance windows, and service conversions. You will find experience-based advice on topics such as:
And there's more! When was the last time you read a book that dealt with:
Chapters are divided into The Basics and The Icing. The Basics are those key elements that, when done right, make every other aspect of the job easier, such as starting all new hosts with the same configuration and picking the right things to automate first. The Icing sections contain all those powerful things that can be done on top of the basics to wow customers and managers. Do the basics first. The icing is a vision for the future that usually comes only with decades of experience.
Diary of a Network Administrator: Mean People Suck
Preface.
Acknowledgments.
About the Authors.
Introduction.
Do These Now!
Use a Trouble-Ticket System.
Manage Quick Requests Right.
Start Every New Host in a Known State.
I. THE PRINCIPLES.
1. Desktops.
The Basics.
Loading the System Software and Applications Initially.
Updating the System Software and Applications.
Network Configuration.
Dynamic DNS with DHCP.
The Icing.
High Confidence in Completion.
Involve Customers in the Standardization Process.
A Variety of Standard Configurations.
Conclusion.
2. Servers.
The Basics.
Buy Server Hardware for Servers.
Vendors Known for Reliable Products.
Does Server Hardware Really Cost More?
Maintenance Contracts and Spare Parts.
Data Backups.
Servers Live in the Data Center.
Same, Different, or a Stripped-Down OS on Clients.
Remote Administration Access.
Mirrored Root Disks.
The Icing.
Server Appliances.
Redundant Power Supplies.
Full and n + 1 Redundancy.
Hot-swap Components.
Separate Networks for Administrative Functions.
Opposing View: Many Inexpensive Workstations.
Conclusion.
3. Services.
The Basics.
Customer Requirements.
Operational Requirements.
Open Architecture.
Simplicity.
Vendor Relations.
Machine Independence.
Environment.
Restricted Access.
Reliability.
Single or Multiple Servers.
Centralization and Standards.
Performance.
Monitoring.
Service Rollout.
The Icing.
Dedicated Machines.
Full Redundancy.
Conclusion.
4. Debugging.
The Basics.
Learn the Customer's Problem.
Find the Problem's Cause and Fix It.
Have the Right Tools.
The Icing.
Better Tools.
Formal Training on the Tools.
End-to-End Understanding of the System.
Conclusion.
5. Fixing Things Once.
The Basics.
Fix Things Once, Rather Than Over and Over.
Avoid the Temporary Fix Trap.
Learning from Carpenters.
The Icing.
Conclusion.
6. Namespaces.
The Basics.
Namespaces Need Policies.
Namespaces Need Change Procedures.
Namespace Management Should Be Centralized.
The Icing.
One Huge Database That Drives Everything.
Further Automation.
Customers Do Many of the Updates.
Next-Level Namespace Ubiquity.
Conclusion.
7. Security Policy.
The Basics.
Build Security Using a Solid Infrastructure.
Ask the Right Questions.
Document the Company's Security Policies.
Basics for the Technical Staff.
Management and Organizational Issues.
The Icing.
Make Security Pervasive.
Stay Up-to-Date: Contacts and Technologies.
Produce Metrics.
Organization Profiles.
Small Company.
Medium-Size Company.
Large Company.
E-commerce Site.
University.
Conclusion.
8. Disaster Recovery and Data Integrity.
The Basics.
What Is a Disaster?
Risk Analysis.
Legal Obligations.
Damage Limitation.
Preparation.
Data Integrity.
The Icing.
Redundant Site.
Security Disasters.
Media Relations.
Conclusion.
9. Ethics.
The Basics.
Informed Consent.
Professional Code of Conduct.
Network/Computer User Code of Conduct.
Privileged Access Code of Conduct.
Copyright Adherence.
Working with Law Enforcement.
The Icing.
Setting Expectations on Privacy and Monitoring.
Being Told to Do Something Illegal/Unethical.
Conclusion.
II. THE PROCESSES.
10. Change Management and Revision Control.
The Basics.
Technical Issues.
Communications Structure.
Scheduling.
Process and Documentation.
Quiet Times.
The Icing.
Automated Front-Ends.
Change Management Meetings.
Streamline the Process.
Conclusion.
11. Server Upgrades.
The Basics.
The Steps in Detail.
The Icing.
Add and Remove Services at the Same Time.
Fresh Installs.
Reusing the Tests.
System Changelog.
A Dress Rehearsal.
Install Old and New Versions on the Same Machine.
Minimal Changes From the Base.
Conclusion.
12. Maintenance Windows.
The Basics.
Scheduling.
Planning.
Flight Director.
Change Proposals.
The Master Plan.
Disabling Access.
Mechanics and Coordination.
Deadlines for Change Completion.
Comprehensive System Testing.
Postmaintenance Communication.
Re-enable Remote Access.
Visible Presence the Next Morning.
Postmortem.
The Icing.
Mentoring a New Flight Director.
Trending of Historical Data.
Providing Limited Availability.
High-Availability Sites.
The Similarities.
The Differences.
Conclusion.
13. Service Conversions.
The Basics.
Small Groups First, Then Expand Communication.
Minimize Intrusiveness.
Layers Versus Pillars.
Avoid Flash-Cuts.
Successful Flash-Cuts.
Back-Out Plan.
The Icing.
Instant Roll-Back.
Avoid Explicit Conversions.
Vendor Support.
Conclusion.
14. Centralization and Decentralization.
The Basics.
Guiding Principles.
Candidates for Centralization.
Candidates for Decentralization.
The Icing.
Consolidate Purchasing.
Outsourcing.
Conclusion.
III. THE PRACTICES.
15. Helpdesks.
The Basics.
Have a Helpdesk.
A Friendly Face.
Staff Sizing.
Defined Scope of Coverage.
Defined Processes for Staff.
An Escalation Process.
Helpdesk Software.
The Icing.
Statistical Improvements.
Out of Hours and 24 x 7 Coverage.
Better Advertising for the Helpdesk.
Different "Desks" for Service Provision Versus Problem Resolution.
Conclusion.
16. Customer Care.
The Basics.
Ticket Tracking Software.
Phase A: The Greeting.
Phase B: Problem Identification ("What's Wrong?").
Phase C: Planning and Execution ("Fix It").
Phase D: Verification ("Verify It").
Perils of Skipping a Step.
Team of One.
The Icing.
Training Based on the Model.
The Single Point of Contact.
Increasing Customer Familiarity.
Special Announcements for Major Outages.
Trend Analysis.
Customers That Know the Process.
Architectural Decisions That Match the Process.
Conclusion.
17. Data Centers.
The Basics.
Picking a Location.
Access.
Security.
Power and Air.
Fire Suppression.
Racks.
Wiring.
Labeling.
Communication.
Console Servers.
Workbench.
Tools and Supplies.
Parking Spaces.
The Icing.
Greater Redundancy.
More Space.
Ideal Data Centers.
Tom's Dream Data Center.
Christine's Dream Data Center.
Conclusion.
18. Networks.
The Basics.
The OSI Model.
Clean Architecture.
Network Topologies.
Intermediate Distribution Frame.
Main Distribution Frame.
Demarcation Points.
Documentation.
Simple Host Routing.
Use Network Devices.
Overlay Networks.
Number of Vendors.
Standards-Based Protocols.
Monitoring.
Single Administrative Domain.
The Icing.
Leading-Edge Versus Reliability.
Multiple Administrative Domains.
Conclusion.
19. Email Service.
The Basics.
Privacy Policy.
Namespaces.
Reliability.
Simplicity.
Generality.
Automation.
Basic Monitoring.
Redundancy.
Scaling.
Security Issues.
Communication.
The Icing.
Encryption.
Backup Policy.
Advanced Monitoring.
High-Volume List Processing.
Conclusion.
20. Print Service.
The Basics.
Select the Level of Centralization.
Print Architecture Policy.
Designing the System.
Documentation.
Monitoring.
Environmental Issues.
The Icing.
Automatic Fail-Over and Load Balancing.
Dedicated Clerical Support.
Shredding.
Dealing with Printer Abuse.
Conclusion.
21. Backup and Restore.
The Basics.
Three Reasons for Restores.
The Backup Schedule.
Time and Capacity Planning.
Consumables Planning.
The Restore Process.
Backup Automation.
Centralization.
Tape Inventory.
The Icing.
Firedrills.
Backup Media and Off-Site Storage.
High DB Availability.
Technology Changes.
Conclusion.
22. Remote Access Service.
The Basics.
Remote Access Requirements.
Define a Remote Access Policy.
Define Service Levels.
Centralization.
Outsourcing.
Authentication.
Perimeter Security.
The Icing.
Home Office.
Cost Analysis and Reduction.
New Technologies.
Conclusion.
23. Software Depot Service.
The Basics.
Understand the Justification.
Understand the Technical Expectations.
Set the Policy.
Selecting Depot Software.
Create the Process Manual.
A Unix Example.
A Windows Example.
The Icing.
Different Configurations for Different Hosts.
Local Replication.
Including Commercial Software in the Depot.
Handling Second-Class Citizens.
Conclusion.
24. Service Monitoring.
The Basics.
Historical Data.
Real-Time Monitoring.
The Icing.
Accessibility.
Pervasive Monitoring.
Device Discovery.
End-to-End Tests.
Application Response Time Monitoring.
Scaling.
Conclusion.
IV. MANAGEMENT.
25. Organizational Structures.
The Basics.
Sizing.
Cost Centers.
Management Chain.
Appropriate Skills.
Infrastructure Teams.
Customer Support.
Helpdesk.
Outsourcing.
The Icing.
Consultants and Contractors.
Sample Organizational Structures.
Small Company.
Medium Company.
Large Company.
E-commerce Site.
Universities and Non-Profit Organizations.
Conclusion.
26. Perception and Visibility.
The Basics.
A Good First Impression.
Attitude, Perception, and Customers.
Align Your Priorities with Customer Expectations.
Be the System Advocate.
The Icing.
The System Status Web Page.
Management Meetings.
Be Visible.
Town Meetings.
Newsletters.
Mail to All Customers.
Lunch.
Conclusion.
27. Being Happy.
The Basics.
Organizing for Excellent Follow-Through.
Time Management.
Communication Skills.
Constant Professional Development.
Staying Technical.
The Icing.
Learn to Negotiate.
Loving Your Job.
Managing Your Manager.
Further Reading.
Conclusion.
28. A Guide for Technical Managers.
The Basics.
Responsibilities.
Working with Nontechnical Managers.
Working with Your Employees.
Decisions.
The Icing.
Make Your Team Even Stronger.
Sell Your Department to Senior Management.
Work on Your Own Career Growth.
Do Something You Enjoy.
Conclusion.
29. A Guide for Nontechnical Managers.
The Basics.
Morale.
Communication.
Staff Meetings.
Look for One-Year Plans.
Technical Staff and the Budget Process.
Professional Development.
The Icing.
Have a Five-Year Vision.
Meetings with Single Point of Contact.
Understand the Technical Staff's Work.
Conclusion.
30. Hiring System Administrators.
The Basics.
Job Description.
Skill Level.
Recruiting.
Timing Is Everything.
Team Considerations.
Select the Interview Team.
Interview Process.
Technical Interviewing.
Nontechnical Interviewing.
Sell the Position.
Employee Retention.
The Icing.
Get Noticed.
Conclusion.
31. Firing System Administrators.
The Basics.
Follow Your Corporate HR Policy.
Remove Physical Access.
Remove Remote Access.
Remove Service Access.
Fewer Access Databases.
The Icing.
A Single Authentication Database.
Monitoring System File Changes.
Conclusion.
Epilogue.
The goal of this book is to write down all the things that we've learned from our mentors and our real-world experiences. These are the things that are beyond what the manuals and the usual system administration books teach. System administrators (SAs) often find themselves swamped with work, struggling to keep the site running, and faced with requests for new technologies from their customers. Servers are overloaded or unreliable, but fixing the problem requires weeks of planning and painstakingly untangling a mess of services so that they can be moved to new machines. Hidden dependencies are lurking around every corner, and getting bitten by one can be catastrophic. In the meantime, repetitive day-to-day tasks still need to be done. The challenges seem insurmountable.
Most sites grow organically, with little thought given to the big picture as each little change is implemented. Haphazardly, SAs learn about the fundamentals of good site design and support practices. They are taught by mentors, if at all, about the importance of simplicity, clarity, generality, automation, communication, and doing the basics first. These six principles are recurring themes in this book.
These principles are universal. They apply at all levels of the system. They apply to physical networks and to computer hardware. They apply to all operating systems running at the site, all protocols used, all software, and all services provided. They apply at universities, non-profit institutions, government sites, businesses, and Internet service sites.
Explaining What System Administration Entails
It's difficult to define system administration, but trying to explain it to a nontechnical person is even more difficult, especially if that person is your mom. Moms have the right to know how their offspring are paying their rent. A friend of Christine's always had trouble explaining to his mother what he did for a living and ended up giving a different answer every time she asked. Therefore, she kept repeating the question every couple of months, waiting for an answer that would be meaningful to her. Then he started working for WebTV. When the product became available, he bought one for his mom. From then on, he told her that he made sure that her WebTV service was working and was as fast as possible. She was very happy that she could now show her friends something and say, "That's what my son does!"
System administrators do many things. They look after computers, networks, and the people who use them. An SA may look after hardware, operating systems, software, configurations, applications, or security. A system administrator is someone who influences how effectively other people can use their computers and networks.
System administration matters because computers and networks matter. Computers are a lot more important than they were years ago. What happened?
First of all, the technology has changed. Corporate computers used to be independent; now they are connected. Business processes used to have a component that involved using a computer; now entire processes are done online and come to a halt if any part of the system is broken.
The widespread use of the Internet and intranets and the move to a dot-com world have redefined the way companies depend on computers. The Internet is a 24 x 7 operation, and sloppy operations can no longer be tolerated. A paper purchase order can be processed any time, anywhere; therefore there is an expectation that the computer system that automates the process will be available all the time, from anywhere. Nightly maintenance windows have become an unheard-of luxury. That unreliable power system in the machine room that caused occasional but bearable problems now prevents sales from being recorded.
The biggest change, however, is due to CEOs putting a new importance on computing. In business, nothing is important unless the CEO feels it is important. The CEO controls funding and sets priorities. Now CEOs have become dependent on email. They notice when an outage or an overloaded system slows down their email. The massive preparations for Y2K also brought home to CEOs how dependent their organizations have become on computers.
We use the term chief executive officer (CEO) loosely to mean the top person in an organization. Educational institutions have CEOs; they're just referred to as president, provost, proctor, or head. Governments have CEOs; they're just referred to as mayor, governor, Prime Minister, leader, or President.
Management now has a more realistic view of computers. Previously, people had unrealistic ideas of what computers could do, seeing them as portrayed in film: big, all-knowing, self-sufficient, miracle machines. This has changed. Even the need for SAs is now portrayed in films. In 1993, Jurassic Park (Crichton 1993) was the first mainstream movie to portray computers as needing system administration, leading to a better public understanding of what it is.
Computers matter more than ever. If computers are to work and work well, then system administration matters. We matter.
This book was born from our experiences as SAs in a variety of companies. We have helped sites to grow. We have worked at small start-ups and universities, where lack of funding was an issue. We have worked at mid-size and large multinationals, where mergers and spin-offs give rise to more challenges. We've worked at fast-paced companies that do business on the Internet and have high-availability, high-performance, and rapid scaling issues. On the surface, these are very different environments with diverse challenges. But underneath, they all need the same building blocks, and the same fundamental principles apply.
This book gives you a framework, a way of thinking about system administration problems, rather than a narrow how-to solution to a particular problem. Given a solid framework, you can solve problems every time they appear, no matter what operating system (OS), brand of computer, or type of environment. This book is unique because it looks at system administration from this point of view, whereas most books for SAs focus on how to maintain one particular type of OS. With experience, however, all SAs learn that the big-picture problems and solutions are largely independent of the platform. This book will change the way you approach your work as an SA and the way you view the site you maintain.
The principles in this book apply to all environments. The approaches described may need to be scaled up or down, depending on your environment, but the basic principles still apply. In chapters where we felt that how to apply the information to other environments might not be obvious, we have included a section that illustrates how to apply the principles at different companies.
This book is not about how to configure or debug a particular OS. It will not tell you how to recover the shared libraries or DLLs when someone accidentally moves them. There are some excellent books that do cover those topics, and we will refer you to many of them throughout the book. What we will discuss here are the principles of good system administration, both basic and advanced, that we have learned through our own and others' experiences. These principles apply to all OSs. Following them well can make your life a lot easier. If you improve the way you approach problems, the benefit will be multiplied. Get the fundamentals right, and everything else falls into place. If they aren't done well, you will waste time repeatedly fixing the same things, and your customers [2] will be unhappy because they can't work effectively with broken machines.
[2] Throughout the book we refer to the end users of our systems as customers rather than users. A detailed explanation of why we do this is in Section 26.1.2.
We believe that SAs of all levels will benefit from reading this book. It gives junior SAs insight into the bigger picture of how sites work, their roles in the organizations, and how their careers can progress. Intermediate SAs will learn how to approach more complex problems and how to improve the sites, making their jobs easier and more interesting and their customers happier. It will help you to understand what is behind your day-to-day work, to learn the things that you can do now to save time in the future, to decide policy, to be architects and designers, to plan far into the future, to negotiate with vendors, and to interface with management. These are the things that concern senior SAs. None of them are listed in an OS's manual. Even senior SAs and systems architects can learn from our experiences and the experiences of our colleagues that are captured in these pages, as we have learned from each other in writing this book. We also cover several management topics, both for SA managers and for SAs who aspire to move into management.
The easiest way to learn usually is by example, particularly in the case of practical areas like system administration. Throughout the book, we use examples to illustrate the points we are making. The examples are mostly from medium or large sites, where scale adds its own problems. Typically, the examples are generic rather than specific to a particular OS, although some are OS-specific, usually Unix or Windows. One of the strongest motivations we had for writing this book is the understanding that the problems SAs face are the same across all OSs. A new OS that is significantly different from what we are used to can seem like a black box, a nuisance, or even a threat. However, despite the unfamiliar interface, as we get used to the new technology, eventually we realize that we face the same set of problems in deploying, scaling, and maintaining the new OS. Recognizing that fact, knowing what problems need solving, and understanding how to approach the solutions by building on experience with other OSs let us master the new challenges more easily.
We want this book to be something that changes your career. We want you to become so successful that if you see us on the street you'll give us a great big hug.
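This book has four major parts: The Principles (Chapters 1 through 9), The Processes (Chapters 10 through 14), The Practices (Chapters 15 through 24), and Management (Chapters 25 through 31).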
The book ends with several appendices.
Each chapter discusses a different topic, and the topics vary from the technical to the nontechnical. If one chapter doesn't apply to you, feel free to skip it. The chapters are linked to each other, so you may find yourself returning to a chapter that you previously thought was boring. We won't be offended.
There are two halves to each chapter: The Basics and The Icing. The Basics discusses the essentials that you just plain have to get right. Skipping any of these items will simply create more work for you in the future. Consider them investments that pay off in efficiency later on. The Icing deals with the cool things that you can do to be spectacular. Don't spend your time with these things until you are done with The Basics. We have made an attempt to drive the points home through anecdotes and case studies from personal experience. We hope that this makes the advice here more real for you. Never trust salespeople who don't use their own products.
Each chapter stands on its own. Feel free to jump around. However, we have carefully ordered the chapters so that they make the most sense if you read the book from start to finish. Either way, we hope you enjoy the book. We have learned a lot and had a lot of fun writing it. Let's begin.
Thomas A. Limoncelli
Lumeta Corporation
tom@limoncelli.org
Christine Hogan
Independent Consultant
chogan@chogan.com
P.S. Books, like software, always have bugs. We intend to maintain a list of updates to this book on its web site: http://www.awl.com/cseng/titles/0-201-70271-1 or our web site, http://www.EverythingSysAdmin.com. Please visit!