One of the quickest-to-use tools for a picture of the current state of a Linux system is
top
which displays the top processes currently running, system uptime, load averages, CPU utilization, memory usage (physical and swap), and more.
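If you want top's snapshot in a script or a log file rather than on the interactive screen, batch mode works well. A minimal sketch, assuming the procps version of top common on Linux (guarded so it degrades gracefully where top is unavailable):

```shell
# -b runs top in batch (plain-text) mode; -n 1 exits after one iteration
if command -v top >/dev/null; then
    top -b -n 1 | head -15
else
    echo "top not found"
fi
```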
For a quick view of the system uptime and load averages, run
uptime
.
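The three load-average figures can also be pulled out for scripting with a little awk. A sketch, assuming the Linux uptime output format, which ends with "load average: X, Y, Z":

```shell
# uptime ends with the 1-, 5-, and 15-minute load averages;
# the awk field separator isolates everything after "load average: "
uptime
uptime | awk -F'load average: ' '{print $2}'
```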
iostat
displays information about the current state of disk I/O on the system.
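Note that iostat ships with the sysstat package and is not always installed by default; a guarded sketch:

```shell
# -x adds extended per-device statistics (utilization, await times, etc.)
if command -v iostat >/dev/null; then
    iostat -x
else
    echo "iostat not found; install the sysstat package"
fi
```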
To see what network ports are currently open and listening, use
netstat
. For example,
netstat -an | grep 80
will display what is using port 80 — but note that the pattern also matches 8080, 8000, and any other port number containing '80'.
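To match port 80 alone, anchor the pattern so a digit can't follow it. A sketch using a hypothetical sample of netstat output (against a live system, the equivalent would be `netstat -an | grep -E ':80([^0-9]|$)'`):

```shell
# Hypothetical sample of `netstat -an` output
sample='tcp 0 0 0.0.0.0:80 0.0.0.0:* LISTEN
tcp 0 0 0.0.0.0:8080 0.0.0.0:* LISTEN
tcp 0 0 0.0.0.0:8000 0.0.0.0:* LISTEN'

# Naive match: hits 80, 8080, AND 8000
printf '%s\n' "$sample" | grep -c 80

# Anchored match: ':80' must not be followed by another digit,
# so only the socket actually bound to port 80 matches
printf '%s\n' "$sample" | grep -cE ':80([^0-9]|$)'
```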
lsof
will show what process is holding open a network port or file. To use "list open files" to see what process is holding port 80, run
lsof -i:80
To see a list of the process table, run
ps
. My favorite argument sequence is
aux
which gives lots of information back:
ps aux
The similar call on a Solaris machine is:
ps -ef
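When hunting for one particular process, piping ps through grep is common; the "bracket trick" keeps grep's own entry out of the results. A sketch (sshd is just a stand-in for whatever process you're after):

```shell
# First few rows of the full process table
ps aux | head -5

# '[s]shd' matches the literal string "sshd" but not the grep command
# line itself (which contains "[s]shd"), so grep doesn't list itself
ps aux | grep '[s]shd' || echo "no sshd process found"
```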
For a fuller diagnosis of what a given process is doing,
strace
can be a lifesaver. It essentially wraps around the process in question (either by launching the program via
strace <program-name>
, or by attaching to a running process with
strace -p <pid>
).
On Solaris, the similar tool is
truss
.
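A quick sketch of the launch style of invocation (guarded, since strace is often not installed by default; the openat filter assumes a reasonably modern strace):

```shell
if command -v strace >/dev/null; then
    # Launch a program under strace, logging only file-open syscalls
    strace -e trace=openat -o /tmp/ls.trace ls /etc >/dev/null 2>&1
    head -3 /tmp/ls.trace
    # Attaching to a running process instead would look like:
    #   strace -p <pid>      (requires appropriate permissions)
else
    echo "strace not found"
fi
```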
The GNU debugger,
gdb
, is a massively useful tool in the right hands: tracing individual calls inside a program, setting breakpoints, and so on. It should be learned by every developer, and known to advanced users.
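A minimal session sketch (the program name ./myprog and the variable are hypothetical; the program would need to be built with -g for debug symbols):

```
$ gdb ./myprog
(gdb) break main        # stop when main() is entered
(gdb) run               # start the program under the debugger
(gdb) backtrace         # show the current call stack
(gdb) next              # step over the next source line
(gdb) print some_var    # inspect a variable (hypothetical name)
(gdb) continue          # resume normal execution
```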
"User error" is among the most commonly cited causes of software and system problems: the operator did something the creators did not expect. To use the ubiquitous car analogy, it's "user error" if the driver hits the gas instead of the brake. One interesting article claims there is [almost] no such thing as "user error": the fault instead lies with developers who build tools that are not resilient enough to handle any user. (No, a car manufacturer can't make the gas pedal act like the brake when you "meant to stop", but software developers can make their products less error-prone, or at least have them give better error messages when something does go wrong.)
When something has gone so awry that it has violently crashed, or even taken out its host system, it's time for some post-mortem data collection - maybe even forensic analysis.
Core dumps, log files, and even images of whole drives can be investigated during a post-mortem analysis of problems seen: as your technical acumen grows, you'll be able to investigate more parts of these prior to escalating to the tool's support or development teams.
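Core dumps are often disabled by default via the shell's core-size limit; a sketch of enabling them and finding where the kernel writes them (paths and defaults vary by distribution):

```shell
# Allow core files of unlimited size in this shell session
ulimit -c unlimited
ulimit -c

# The kernel's core-file naming pattern (Linux-specific path)
cat /proc/sys/kernel/core_pattern 2>/dev/null || echo "no /proc/sys on this system"

# A resulting core file could then be opened with, e.g.:
#   gdb ./myprog core      (program and core names are hypothetical)
```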
Ideally, we would all live and work in a world where nothing ever failed, and everyone acted the way they are "supposed" to. Sadly, that world does not exist. So what can we do to help prevent issues in the first place, or respond more adeptly when they [inevitably] occur?
Some solutions are simple: add more memory to the system; increase swap space; verify storage quotas; confirm that all required resources are available; etc. Others are more complex.
If there is a set of "Known Issues" or release notes that come with a particular product, make sure you read and are aware of them: there is almost nothing more frustrating than finding out there is a known issue, but you didn't check the manuals first!
The idea of "future-proofing" is to create an environment that can survive future developments without itself needing to change. A common example would be to look at the current and expected growth of an organization's email infrastructure, and then size the mail servers to handle 15-25% more than the expected growth (e.g., with 100 users today growing 20% per year, size the environment today for 200 users in three years: 173 expected, plus ~15%). Or it could mean ensuring that data you are working with today in version 4.3 of some tool will still be accessible when you upgrade to 7.2 in four years.
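The sizing arithmetic above can be checked with a one-liner (numbers from the example: 100 users, 20% annual growth, three years, ~15% headroom):

```shell
# 100 * 1.2^3 = 172.8 expected users; adding 15% headroom gives ~199,
# which is why the example rounds the target up to 200
awk 'BEGIN {
    expected = 100 * 1.2 ^ 3
    sized    = expected * 1.15
    printf "expected: %.0f, sized: %.0f\n", expected, sized
}'
```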
When relying on external vendors, guaranteeing your environment is future-proof may not be possible - they could decide to change database schemas, file formats, etc. Likewise, when relying on expected growth patterns, you may exceed those expectations (requiring additional licenses, hardware, etc), or you may not meet those plans, and have an unnecessarily oversized environment. Several mitigating strategies exist for these eventualities, but are beyond the scope of this lesson.
You've completed this module, and so now you're ready to troubleshoot the most ornery problems in the most obscure corners of your system, right? Don't let me discourage you from that lofty goal: but the reality is that becoming a good troubleshooter takes time, practice, lots of exposure, practice, skimming skills, practice, and patience. Oh, and did I mention: practice!
Lots of professions require troubleshooting skills, and each has their own tricks and tips to follow: auto mechanics will check the OBDII and listen to a rattle; electricians look for wiring faults; doctors look at symptoms to come up with a diagnosis. Skills learned in one field may not always translate into another, but if you can learn the basics (which DO all transfer), then gleaning insights from others can only improve your own personal Bag O' Hatchets.