
Top

One of the quickest tools for getting a picture of the current state of a Linux system is top, which displays the top running processes, system uptime, load averages, CPU utilization, memory usage (real and swap), and other items.

Uptime

For a quick view of the system uptime and load averages, run uptime.
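The load averages that uptime reports can also be read from a script. The following is a minimal sketch, assuming a Unix-like host with Python available. The helper name load_per_cpu and the idea of dividing load by CPU count are illustrative assumptions, not part of the original text.

```python
import os


def load_per_cpu(load_averages, cpu_count):
    """Scale the 1-, 5-, and 15-minute load averages by the number of CPUs."""
    if cpu_count < 1:
        raise ValueError("cpu_count must be at least 1")
    return tuple(round(load / cpu_count, 2) for load in load_averages)


if __name__ == "__main__":
    # os.getloadavg() exists only on Unix-like systems, so check before calling.
    if hasattr(os, "getloadavg"):
        averages = os.getloadavg()
        cpus = os.cpu_count() or 1
        print("load averages (1, 5, 15 min):", averages)
        print("load per CPU:", load_per_cpu(averages, cpus))
    else:
        print("load averages are not available on this platform")
```

A per-CPU value that stays well above 1.0 is often read as a sign that work is queuing, though what counts as "too high" depends on the workload.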

Iostat

iostat displays information about the current state of disk I/O on the system.

Netstat

To see what network ports are currently open and listening, use netstat. For example, netstat -an | grep 80 will display what is using port 80. Because grep matches any line containing the text '80', it will also show port 8080 and anything else that has '80' in its port number or address.
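The over-matching behaviour of grep 80 can be illustrated with a short sketch that compares a plain substring match against an exact match on the local port. The sample lines are hypothetical, loosely modeled on Linux netstat -an output (the column layout varies between platforms), and the function name lines_for_port is an assumption made for this example.

```python
def lines_for_port(netstat_lines, port):
    """Return only the lines whose local address ends in exactly this port."""
    matches = []
    for line in netstat_lines:
        fields = line.split()
        # Assumed layout: proto, recv-q, send-q, local address, foreign address, state
        if len(fields) < 4:
            continue
        local_address = fields[3]
        # On Linux the port follows the last ':' in the address column.
        _, _, local_port = local_address.rpartition(":")
        if local_port == str(port):
            matches.append(line)
    return matches


# Hypothetical sample output, for illustration only.
SAMPLE = [
    "tcp        0      0 0.0.0.0:80              0.0.0.0:*               LISTEN",
    "tcp        0      0 0.0.0.0:8080            0.0.0.0:*               LISTEN",
    "tcp        0      0 127.0.0.1:5432          127.0.0.1:48022         ESTABLISHED",
]

if __name__ == "__main__":
    substring_hits = [line for line in SAMPLE if "80" in line]
    print("substring match:", len(substring_hits), "lines")
    print("exact port match:", len(lines_for_port(SAMPLE, 80)), "lines")
```

In this sample the substring match catches all three lines (the last one only because the remote port 48022 contains '80'), while the exact match keeps just the listener on port 80.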

Lsof

lsof ("list open files") shows which process is holding open a network port or file. To see which process is holding port 80, run lsof -i:80.

Ps

To list the process table, run ps. My favorite argument sequence is aux, which gives lots of information back: ps aux

The similar call on a Solaris machine is: ps -ef

Strace

For a fuller diagnosis of what a given process is doing, strace can be a lifesaver. It essentially wraps around the process in question (either by running strace <program-name>, or by attaching to a running process with strace -p <pid>) and reports the system calls that process makes.

On Solaris, the similar tool is truss.

Gdb

The GNU debugger, gdb, is a massively useful tool in the right hands: it can track individual calls inside a program, set breakpoints, and more. It should be learned by every developer, and known to advanced users.

User error

"User error" is among the most commonly cited errors with software and systems: the operator did something the creators did not expect. To use a ubiquitous car analogy, it's "user error" if the driver hits the gas instead of the brake. One interesting article makes the claim that there is [almost] no such thing as "user error", and that the fault instead lies with developers whose tools are not resilient enough to handle any user. No, a car manufacturer can't make the gas act like the brake when you "meant to stop", but maybe software developers can make their products less error-prone, or at least have them give better errors when they do have a problem.

A spectrum of user-initiated errors:

  • Typos (misspellings, fat-fingering, generally mistyping something)
  • External environmental problems (e.g., unplugging a network cable)
  • Clickos (i.e., misclicks, akin to mistyping)
  • Forgetfulness
  • Etc.

From personal observation, I would guess user error accounts for 70-80% of all errors seen.

Post-mortem data collection

When something has gone so awry that it has violently crashed, or even taken out its host system, it's time for some post-mortem data collection - maybe even forensic analysis.

Core dumps, log files, and even images of whole drives can be investigated during a post-mortem analysis of the problems seen. As your technical acumen grows, you'll be able to investigate more of these yourself before escalating to the tool's support or development teams.

Pro-active, preventative measures

Ideally, we would all live and work in a world where nothing ever failed, and everyone acted the way they are "supposed" to. Sadly, that world does not exist. So what can we do to help prevent issues in the first place, or respond more adeptly when they [inevitably] occur?

Some solutions are simple: add more memory to the system; increase swap space; verify storage quotas; make sure all the resources you need are available; etc. Many can be more complex.
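One simple preventative check is to watch how full a filesystem is and warn before it runs out. The sketch below uses Python's standard shutil.disk_usage call. The 85% threshold and the function names are assumptions chosen for illustration, not recommendations from the original text.

```python
import shutil


def percent_used(total_bytes, used_bytes):
    """Return the share of a resource in use, as a percentage."""
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    return 100.0 * used_bytes / total_bytes


def is_near_limit(total_bytes, used_bytes, warn_at_percent=85.0):
    """True when usage has reached the (assumed) warning threshold."""
    return percent_used(total_bytes, used_bytes) >= warn_at_percent


def check_disk(path="/", warn_at_percent=85.0):
    """Check the filesystem holding `path`; returns (near_limit, percent_used)."""
    usage = shutil.disk_usage(path)
    used = percent_used(usage.total, usage.used)
    return used >= warn_at_percent, used


if __name__ == "__main__":
    near_limit, used = check_disk(".")
    status = "WARNING" if near_limit else "ok"
    print(f"{status}: {used:.1f}% of disk in use")
```

Run on a schedule, a check like this turns "the disk filled up" from a post-mortem finding into an early warning.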

If there is a set of "Known Issues" or release notes that comes with a particular product, make sure you read and are aware of them: there is almost nothing more frustrating than finding out there is a known issue, but you didn't check the manuals first!

Asking "why"

If you're on the administrative side of the technical world, and not just the end-user side, the other big thing to remember is to always ask "why". Why did it fail? Why did we miss the known issue? Why were we not notified a necessary resource was going to be down? Why was there no alert sent about resources nearing their limits? If you can ask (and answer) those, then you should be able to reduce the number of "why" questions you need to ask in the future - because hopefully you're solving problems before they arise.

"Future-proofing" - is it possible?

The idea of "future-proofing" is to create an environment that can survive future developments without needing to be changed itself. A common example of this would be to look at the current and expected growth needs of the email infrastructure of an organization, and then size the mail servers to handle 15-25% more than the expected growth. For example, with 100 users today and 20% growth per year, you would size the environment today for about 200 users in three years (173 expected, plus ~15%). Or it could mean ensuring that data you are working with today in version 4.3 of some tool will be accessible when upgrading to 7.2 in four years.
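The sizing arithmetic in the mail server example can be written out as a small calculation. This is a minimal sketch, assuming steady compound growth. The function names are hypothetical, and rounding up to whole users is a choice made for this example.

```python
import math


def projected_users(current_users, annual_growth, years):
    """Expected user count after compound growth (annual_growth of 0.20 means 20%)."""
    return current_users * (1 + annual_growth) ** years


def sized_capacity(current_users, annual_growth, years, headroom):
    """Projected users plus a headroom margin, rounded up to whole users."""
    expected = projected_users(current_users, annual_growth, years)
    return math.ceil(expected * (1 + headroom))


if __name__ == "__main__":
    expected = projected_users(100, 0.20, 3)
    print(f"expected users in 3 years: {expected:.1f}")
    print("capacity with 15% headroom:", sized_capacity(100, 0.20, 3, 0.15))
    print("capacity with 25% headroom:", sized_capacity(100, 0.20, 3, 0.25))
```

With the numbers from the text, the projection is about 173 users, and 15% headroom brings it to roughly 199, which the example rounds to 200.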

When relying on external vendors, guaranteeing your environment is future-proof may not be possible - they could decide to change database schemas, file formats, etc. Likewise, when relying on expected growth patterns, you may exceed those expectations (requiring additional licenses, hardware, etc), or you may not meet those plans, and have an unnecessarily oversized environment. Several mitigating strategies exist for these eventualities, but are beyond the scope of this lesson.

Closing thoughts

You've completed this module, and so now you're ready to troubleshoot the most ornery problems in the most obscure corners of your system, right? Don't let me discourage you from that lofty goal, but the reality is that becoming a good troubleshooter takes time, practice, lots of exposure, practice, skimming skills, practice, and patience. Oh, and did I mention: practice!

Lots of professions require troubleshooting skills, and each has its own tricks and tips to follow: auto mechanics will check the OBD-II codes and listen to a rattle; electricians look for wiring faults; doctors look at symptoms to come up with a diagnosis. Skills learned in one field may not always translate into another, but if you can learn the basics (which DO all transfer), then gleaning insights from others can only improve your own personal Bag O' Hatchets.

Source:  OpenStax, Debugging and supporting software systems. OpenStax CNX. Aug 29, 2011 Download for free at http://cnx.org/content/col11350/1.2