As you probably know by now, you are talking to the Unix shell whenever you log in to a Unix system via telnet or ssh, or open an xterm. (Throughout this document, I'll use "Unix" somewhat loosely to mean "Unix or Linux".) You know that you can give commands to the shell, and it will run programs for you. But the shell has many features that can make your work in Unix easier and more enjoyable. The goal of this document is to point out some of those features and to give some pointers for future exploration. I will not tell you everything there is to know about the Unix shell; indeed, I don't think that is possible.
You run a program, and the output scrolls off your screen faster than you can read it. You try scrolling back using your terminal program, but that doesn't quite work or the part you want just happened to not fit in the scrollback buffer. Wouldn't it be nice if you could capture the output of the command to a file and examine the file at your leisure?
Redirection to the rescue! Redirecting a program's output means you ask the shell to run the program and put whatever the program produces on its standard output (stdout for C fans, or cout for C++ types) in a file rather than sending it to your terminal screen; redirecting input means you ask the shell to run a program, but to arrange things so that the program reads a file as its standard input (a.k.a. stdin in C, cin in C++) instead of reading from the terminal.
Nothing like a little example to clarify the concepts! Type ls -- the directory listing will be displayed on your terminal screen. But you knew that already, right? Now type ls >mylisting. The prompt will reappear, but the ls output will not appear on your screen. The ">" symbol tells the shell to put the output of ls in the file "mylisting". In other words, it's the shell's output redirection symbol. View the mylisting file using less: less mylisting. Note that the mylisting file itself is part of the file list produced by ls. That is because the shell sets up the output file first, truncating an existing file if necessary, and only then feeds the output of the command into that file. It also means you can lose a file that way if you are not careful.
Now let's say you want to delete the letter e from the mylisting file. Try this: tr -d e <mylisting. The < character tells the shell to feed the file mylisting to tr as its standard input. The tr utility copies its standard input to its standard output, modifying it along the way. In this case, the "-d" option tells it to delete the letter "e" from its input. Admittedly, deleting the letter "e" from your directory listing is a bit artificial, but you now have a simple way to rid the DOS/Windows files you copied to your account of those annoying ^M characters that appear at the ends of lines (and cause problems with, for example, spim): tr -d '\r' <myfile.dos >myfile.unix. This tells the shell to run the tr command, feeding it myfile.dos as its standard input and putting the output in myfile.unix. Again, remember that doing tr <myfile >myfile is a sure way to permanently lose the data in myfile, because the shell truncates the output file before tr gets to read it. So don't do that. As I said before, the -d option tells tr to delete a certain character. In this case, the character happens to be ^M (ASCII code 13), represented by '\r'. Now you may be wondering why you need to put quotes around '\r': the short answer is that the "\" character is special to the shell, and needs to be quoted. I'll come back to this when I talk about shell wildcards.
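Here is a small sketch of the ^M cleanup that you can run without risking any real files. The scratch directory (made with mktemp -d), the file contents, and the name myfile.tmp are my own choices for illustration; the $(...) notation used to capture the directory name is command substitution, which is described further on.

```shell
# Scratch directory so that no real files are touched.
workdir=$(mktemp -d)

# Create a small file with DOS line endings (each line ends in \r\n).
printf 'first line\r\nsecond line\r\n' > "$workdir/myfile.dos"

# Input comes from myfile.dos, output goes to a *different* file.
tr -d '\r' < "$workdir/myfile.dos" > "$workdir/myfile.unix"

# To convert a file "in place", write to a temporary name and rename it
# afterwards. Naming the same file on both sides of tr would truncate it
# before tr ever reads it.
tr -d '\r' < "$workdir/myfile.dos" > "$workdir/myfile.tmp"
mv "$workdir/myfile.tmp" "$workdir/myfile.dos"

cat "$workdir/myfile.dos"
```

After this runs, myfile.dos and myfile.unix have identical contents, and neither contains a carriage return.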
Now let's say you have a directory with a lot of files in it, and when you get the directory listing, the files scroll off the screen faster than you can read their names. You can remedy this situation by redirecting the ls output into a file and using the less (or more) utility to view the resulting file: ls >mylisting and then less mylisting. That works, except that there is a better way. The better way is to use piping. What is piping? Say you have two programs, for instance ls and less. You can ask the shell to set things up so that the output of ls is fed to less as its input: ls -l | less. The "|" symbol tells the shell to feed the output of the program(s) on the left of "|" as input to the program(s) on the right of "|". In reality, the shell does not handle the piping all by itself, but this is not important for what I have to say here.
With piping, we get at the heart of one of the principles of Unix design: "The whole is bigger than the sum of its parts". Unix has a bazillion utilities, each of which does a small, well-defined task, and does it well. By combining those utilities with redirection, piping, etc., new tools can be built on the fly as they are needed. Of course, then tools like Perl and Emacs were invented, which sort of, ahem, ehhhh, don't really follow this philosophy...
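To make the "small tools, combined" idea concrete, here is a minimal sketch of a pipeline that finds the most frequent word in a list. The word list and the variable name most_common are made up for illustration; sort, uniq -c, and head are standard utilities.

```shell
# sort groups identical lines together, uniq -c counts each group,
# sort -rn puts the biggest count first, and head -n 1 keeps the top line.
most_common=$(printf 'pear\napple\npear\nfig\napple\npear\n' \
    | sort | uniq -c | sort -rn | head -n 1)

echo "$most_common"
```

The output is the count followed by the word (3 and pear here); the amount of leading whitespace that uniq -c prints varies between systems.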
Here are some more examples of redirection to get used to the idea
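The following is a sketch of a few redirection forms that Bourne-style shells support, including two that were not shown above: >> (append) and 2> (redirect standard error). The file names and the little sh -c child command are assumptions made up for the demonstration.

```shell
workdir=$(mktemp -d)

echo "first"  >  "$workdir/log"    # > creates the file, or truncates an existing one
echo "second" >> "$workdir/log"    # >> appends instead of truncating

# A child shell that writes one line to stdout and one line to stderr.
# 2> redirects standard error (file descriptor 2) separately from stdout.
sh -c 'echo to-stdout; echo to-stderr >&2' > "$workdir/out" 2> "$workdir/err"

# 2>&1 sends standard error to wherever standard output currently points,
# so both lines end up in the same file.
sh -c 'echo to-stdout; echo to-stderr >&2' > "$workdir/both" 2>&1

# Input and output redirection combined: sort the log into a new file.
sort < "$workdir/log" > "$workdir/sorted"
```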
These examples just barely scratch the surface of what is possible by combining commands with piping and redirection. Here is a teaser:
cat somefile | tr -cs '[:alnum:]' '[\n*]' | sort -u \
| comm -2 -3 - /usr/dict/words
The cat program simply copies its input to its output; it is
useful for getting things into the pipeline. The tr invocation
replaces every run of characters other than letters and digits
with a single newline, which puts each word on a line of its own.
We haven't seen this form of tr yet. With two arguments, tr
translates all the characters in the first set into the
corresponding characters in the second set. The -c option tells
tr to use the complement of the alphanumeric characters, i.e.
"everything that's not a letter or a number"; the -s option
squeezes each run of repeated output characters into one; and the
"*" following the \n tells it to repeat \n (the newline
character) as many times as necessary so that everything in the
first set is matched. The sort -u removes duplicates from the
input in addition to sorting (the mnemonic for -u is "unique").
The comm utility takes two sorted files and produces three
columns of output: lines in the first file but not in the second,
lines in the second file but not in the first, and lines common
to both files. The -2 and -3 options suppress the second and
third columns, so only the words missing from the dictionary are
printed. In this case, we give it the standard input as one input
file (represented by the "magic" character -) and /usr/dict/words
-- a list of words in an English dictionary -- as the other. The
point is not that this is a good way to check spelling (ispell
does a much better job of it), but rather that this is possible.
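Since /usr/dict/words may not exist on your machine, here is a self-contained sketch of the same pipeline run against a tiny made-up dictionary. The file names and the sample sentence are assumptions, and I have added one extra tr step (lowercasing the input) that is not part of the teaser above, so that "The" and "the" count as the same word.

```shell
workdir=$(mktemp -d)

# A four-word "dictionary"; comm requires it to be sorted.
printf 'the\nquick\nbrown\nfox\n' | sort -u > "$workdir/words"

# Some text with one misspelling (brwn).
printf 'The quick brwn fox, the QUICK fox!\n' > "$workdir/somefile"

# Lowercase, split into one word per line, sort and de-duplicate, then
# keep only the lines that are absent from the dictionary.
unknown=$(tr '[:upper:]' '[:lower:]' < "$workdir/somefile" \
    | tr -cs '[:alnum:]' '[\n*]' \
    | sort -u \
    | comm -2 -3 - "$workdir/words")

echo "$unknown"
```

The only word reported is brwn.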
Any special character can occur anywhere in the pattern any number of times, so, for instance, *.[ch]* is a valid pattern, as is /*/*bin*.
There is a potential point of confusion here. When you type ./myprog *.c the shell substitutes the *.c with a list of all the files in the current directory that match the *.c pattern, e.g. main.c list.c tree.c. Your program never even sees the *.c pattern. If, for some reason, you need to pass the *.c literally to the program, use quotes: ./myprog '*.c'. This also means you cannot do things like cp *.c *.cc and expect it to make a copy of each file individually. You can do this and more with shell loops.
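A quick way to see who expands the pattern is to hand it to echo, once unquoted and once quoted. This is only a sketch: the scratch directory and file names are made up, and the cd happens inside $(...) so that your own working directory is left alone.

```shell
workdir=$(mktemp -d)
touch "$workdir/main.c" "$workdir/list.c" "$workdir/tree.c" "$workdir/notes.txt"

# Unquoted: the shell replaces *.c with the matching names before echo runs.
unquoted=$(cd "$workdir" && echo *.c)

# Quoted: echo receives the literal three characters *.c
quoted=$(cd "$workdir" && echo '*.c')

echo "unquoted: $unquoted"
echo "quoted:   $quoted"
```

The first line lists list.c main.c tree.c (the shell sorts the matches), while the second shows *.c untouched.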
In what follows, I will describe some basics of scripting for Bourne-style shells (including bash). Occasionally, I will mention C-shell equivalents of commands or constructs, but in general I recommend against C-shell. I think using C-shell makes little sense on Linux (GNU/Linux) machines, since bash is the default shell under every Linux distribution I have seen, and bash has most (all?) of the interactive features of C-shell: tab completion, history, command-line editing, aliases, etc.
As of recently, it is possible to use the chsh command to change your login shell to bash on the instructional Linux machines. I recommend that you do so. Note that if you have run the /uns/examples/setup-tutorial script, you already get bash when you log in, but that was done through some trickery. You may still wish to use chsh to set your login shell to bash.
The Unix shell is actually a reasonably sophisticated scripting language (it lacks the data-structure facilities to be considered a programming language). The syntax of the scripting language differs between Bourne shells and C-shells. I will talk mostly about Bourne shell syntax here, and only mention C-shell syntax briefly.
As you would expect in a scripting language, the shell supports variables. You set a shell variable like so: my_var="some value here". There must be no space on either side of the = sign. To get at the value of the variable, you prefix the variable name with a $ sign, like so:
my_other_var=$my_var # my_other_var now has the same value as my_var
As in many other scripting languages, shell variables do not need to be declared and are typeless. Variables just pop into existence when you assign a value to them, and the same variable can hold either a string or a number (but not both at the same time). The C-shell syntax for setting a variable is: set my_variable="some value here". The C-shell syntax to get at the value of a variable is the same as for Bourne shell:
set my_other_variable=$my_variable # my_variable and my_other_variable now have the same value
# C-shell style.
While talking about shell variables, I should probably take a couple of paragraphs to talk about environment variables. Before I can talk about environment variables, I need to say a word or two about processes.
For our purposes, we can say that a process is a running program plus the memory associated with it. The process's environment is simply an array of (null-terminated) strings of the form NAME=VALUE. The NAME names an environment variable. Each process (the shell is no exception) has its own copy of the environment array, which it can examine and modify. When the shell starts a program for you, the program (or, more precisely, the process in which the program runs) gets a current copy of the shell's environment array. So, if you modify the environment of your shell, the programs you run from that point on will get a modified copy of the environment array.
In Unix, there are some well-established conventions regarding the "meaning" of environment variables. For example, programs that need to edit text will often look at the value of the EDITOR environment variable; programs that need to display large amounts of text on your screen will look at the value of the PAGER environment variable, etc.
The syntax for setting an environment variable in modern Bourne shells (bash, ksh, zsh) is:
export NAME=VALUE
The 'classic' Bourne shell syntax is:
NAME=VALUE; export NAME
The syntax to set an environment variable in C-shells (csh, tcsh)
is:
setenv NAME VALUE # note the absence of the equal sign (=)
You get at the value of an environment variable just as you would
get at the value of a regular shell variable -- by prefixing the
variable name with a $ sign, for example:
# add $HOME/bin to your PATH
# $HOME is usually your home directory
PATH=$PATH:$HOME/bin
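To see the difference between a plain shell variable and an environment variable, you can ask a child process what it sees. This is a minimal sketch; the variable names are made up, and sh -c simply starts a child shell to run the quoted command. The last line uses the NAME=VALUE command form, which sets an environment variable for that one command only.

```shell
plain_var="stays in this shell"          # ordinary shell variable
export EXPORTED_VAR="goes to children"   # environment variable

# The single quotes keep this shell from expanding the variables,
# so it is the child shell that looks them up.
child_view=$(sh -c 'echo "plain=[$plain_var] exported=[$EXPORTED_VAR]"')
echo "$child_view"

# Set PAGER in the environment of a single command, without exporting it here.
one_off=$(PAGER=cat sh -c 'echo "$PAGER"')
echo "$one_off"
```

The child reports an empty plain_var but does see EXPORTED_VAR, because only exported variables are copied into the environment array that child processes receive.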
me=`id -u` # me now has my numerical user ID.
echo $me
Note that backquotes (`) rather than regular quotes are used. Bash (like ksh and other POSIX shells) also supports a nicer form of the command substitution syntax, where you can say
me=$(id -u) # me now has my numerical user ID.
echo $me
This works very nicely if you need to nest command substitution.
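Here is a small sketch of what nesting looks like in both notations. The path is made up (it does not need to exist, since dirname and basename only manipulate strings). With $(...) the inner substitution is written as is; with backquotes, the inner pair has to be escaped with backslashes.

```shell
path=/usr/local/bin/tool

# $(...) nests naturally: dirname runs first, then basename.
nested_new=$(basename "$(dirname "$path")")

# The same thing with backquotes: the inner backquotes must be escaped.
nested_old=`basename \`dirname $path\``

echo "$nested_new $nested_old"
```

Both variables end up holding bin, the name of the directory that contains tool.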
Before we plunge into the description of other shell constructs, I should say that ; (the semicolon) and newline are mostly equivalent as command separators in Bourne-derived shells (including bash): any time you see one, you can pretty much substitute it with the other. This is not so at all in C shell, which makes writing essentially single-line commands unnecessarily inconvenient (that is another reason I recommend against using C shell).
The shell for-loops are quite different from the for-loops in C/C++ (but quite similar to the for-loops in Python, or the foreach-loops in Perl). While the for-loop in C/C++ is a generalization of a while-loop, the shell's for-loop simply allows you to go through a list of "things", doing something for each "thing" on the list.
The syntax of a shell for-loop is:
for loop_variable in list ; do
    command1
    command2
    ...
    commandN
done
command1 through commandN comprise the body of the for-loop. The loop_variable is set to each item in the list in turn, then the body of the loop is executed. When the list is exhausted, the loop ends.
You could specify the list explicitly:
for name in John Marry ; do echo hello $name ; done
or you could let the shell produce the list for you -- for
example, through command substitution, variable expansion, or, as
in the following example, the use of shell patterns.
Here is how you could rename a bunch of .c files to .cpp :
for file in *.c ; do mv $file `basename $file .c`.cpp ; done
To understand this loop you need to know that the
basename command takes a path to a file and, optionally,
a suffix, and prints the file name sans the path and the suffix. For
example:
tahiti~$ basename ~/main.c
main.c
tahiti~$ basename ~/main.c .c
main
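If you would like to try the renaming loop without touching your own files, here is a sketch that runs it in a scratch directory. The directory and file names are made up. I have also added double quotes around the variables and used $(...) instead of backquotes; the quotes keep the loop working for file names that contain spaces.

```shell
workdir=$(mktemp -d)
touch "$workdir/main.c" "$workdir/list.c"

# The pattern expands to full paths, so basename strips both the
# directory part and the .c suffix before .cpp is appended.
for file in "$workdir"/*.c ; do
    mv "$file" "$workdir/$(basename "$file" .c).cpp"
done

ls "$workdir"
```

Afterwards the directory contains list.cpp and main.cpp, and no .c files.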
Now we have seen quite a few characters that are special to the shell: comments start with # and go to the end of the line, variables are accessed by sticking a $ in front of their name, the *, ?, and [] are all special to the shell, as are () and {}, and the space character. What if we want to pass these as arguments to a command we run, without the shell messing with them? For example, we may have some files whose names start with a #, or we may want to do something to a file that has spaces in its name. This is where quoting comes in.
Quoting a character tells the shell to ignore whatever special meaning the character may have, and just use the character verbatim. Quoting a string tells the shell to ignore the special meaning of (some of) the characters contained within the string. You quote a single character by sticking a backslash in front of it, so if you have a file named $make_money_fast$, you remove it by saying:
rm \$make_money_fast\$
Now, this makes the backslash special, so you quote a backslash by
doubling it:
echo c:\\program\ files\\adobe
==> c:\program files\adobe
I am using the ==> to show the result of running a shell
command here.
There are two ways to quote a string: you can use single or double quotes. Using single quotes tells the shell to treat no characters in the string as special, while using double quotes preserves the special meaning of $, `, and \:
knights='Ni! Ni! Ni!'
echo '$knights'
==> $knights
echo "$knights"
==> Ni! Ni! Ni!
me='`id -un`'
echo $me
==> `id -un`
me="`id -un`"
echo $me
==> YOUR_LOGIN_HERE
You can still use the \ inside double quotes:
echo "\$100?"
==> $100?
echo "I said \"OK\""
==> I said "OK"
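The space character deserves its own small demonstration, since file names with spaces are where quoting bites most often. In this sketch (the variable names are made up), printf prints its format once for every argument it receives, so the brackets show exactly how the shell split things up. It assumes the shell's default word-splitting settings.

```shell
name="my notes.txt"

# Unquoted: the shell splits the value at the space into two arguments.
unquoted=$(printf '[%s]' $name)

# Double quotes: the variable is expanded, but stays one argument.
quoted=$(printf '[%s]' "$name")

# Single quotes: nothing is expanded at all.
single=$(printf '[%s]' '$name')

echo "$unquoted"
echo "$quoted"
echo "$single"
```

The three lines of output are [my][notes.txt], then [my notes.txt], then [$name].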
Before we talk about shell conditions, it is important to remember that whenever the shell runs a program, an exit status of 0 (zero) is considered successful, or TRUE, and a non-zero exit status is considered FALSE. This is, of course, quite the opposite of the way C/C++ treat zero vs. non-zero values.
The simplest if condition looks like this:
if test_command ; then
    command1
    ...
    commandN
fi
Quite frequently, the command named test will be used as the test command. Because it is so common to want to write
if test test_arguments ; then
    command1
    command2
    ...
    commandN
fi
there is a shorthand notation. Writing:
if [ test_arguments ] ; then
    command1
    command2
    ...
    commandN
fi
is shorthand for writing:
if test test_arguments ; then
    ...
Here is a little example:
# rename mylog to mylog.old if the mylog file exists
if [ -f mylog ] ; then
    mv mylog mylog.old
fi
One gotcha to watch out for: you do need the spaces after [ and
before ], because [ is really a command name and ] is its last
argument. Look up the test(1) manual entry for a list of
possible tests (i.e., man test).
While if-conditions using the command named test (or its shorthand form) are very common indeed, you should be aware that they are by no means the only game in town. It is possible to condition on the exit code of any command, not just the command named test:
if finger evgenyr | grep -iq "on since" ; then
    echo "Evgeny is logged in."
else
    echo "Evgeny is not here."
fi
The example above also shows that if-conditionals can have an else-clause, by the way. The designers of Bourne shell apparently deemed the if ... else if ... else if ... else idiom common enough to provide a special syntax for it:
if [ -f $myfile ] ; then
    echo $myfile is a regular file
elif [ -d $myfile ] ; then
    echo $myfile is a directory
elif [ -L $myfile ] ; then
    echo $myfile is a symbolic link
elif [ -p $myfile ] ; then
    echo $myfile is a named pipe
elif [ -S $myfile ] ; then
    echo $myfile is a socket
else
    echo I give up.
fi
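One detail worth knowing about the chain above: the -f and -d tests follow symbolic links, so a link that points to a regular file is reported as a regular file and never reaches the -L branch. If you want links reported as links, test -L first. Here is a sketch that does so over a few made-up files in a scratch directory; the names plain, sub, link, and missing are mine, and the variables are double-quoted so the tests survive odd file names.

```shell
workdir=$(mktemp -d)
touch "$workdir/plain"
mkdir "$workdir/sub"
ln -s "$workdir/plain" "$workdir/link"

report=""
for myfile in "$workdir/plain" "$workdir/sub" "$workdir/link" "$workdir/missing" ; do
    if [ -L "$myfile" ] ; then        # check for a link before -f and -d
        kind="symlink"
    elif [ -f "$myfile" ] ; then
        kind="file"
    elif [ -d "$myfile" ] ; then
        kind="directory"
    else
        kind="unknown"
    fi
    report="$report $kind"
done

echo $report
```

This prints file directory symlink unknown.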
Now that we've seen the if-conditions, while-loops are really easy to understand. The syntax is:
while test_command ; do
    command1
    ...
    commandN
done
The test_command works just as it does for if-conditions, and the while-loop itself works just the way it does in any other language. That's all I have to say about that.
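For the record, here is a minimal sketch of a counting while-loop that adds up the numbers 1 through 5. The variable names are made up; -le is the "less than or equal" test from test(1), and $((...)) is the shell's arithmetic notation.

```shell
i=1
total=0
while [ "$i" -le 5 ] ; do
    total=$((total + i))   # add the current number to the running sum
    i=$((i + 1))           # move on to the next number
done

echo "$total"
```

This prints 15.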
The shell has && and || operators, which work similarly to the way they do in C: if you have command1 && command2, the shell will run command1, look at its exit status, and only run command2 if the exit status was zero (success); if you have command1 || command2, the shell will run command1, look at its exit status, and will only run command2 if the exit status was non-zero (failure).
Because of these properties of the && operator, you will sometimes see conditionals written like this:
[ -f $dir/Makefile -a -x $dir ] && {
    cd $dir
    make -f Makefile
}
This works because if you take a bunch of commands, put semicolon
(or newline) after them, and put them inside braces, you get a
compound command, which is treated as a unit
syntactically.
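Here is a small sketch that shows which side of && and || actually runs, using the true and false commands (which do nothing except succeed and fail, respectively). The variable log and the build directory are made up for illustration.

```shell
log=""
true  && log="$log and-ran"       # left side succeeded: right side runs
false && log="$log and-skipped"   # left side failed: right side is skipped
true  || log="$log or-skipped"    # left side succeeded: right side is skipped
false || log="$log or-ran"        # left side failed: right side runs

# A common idiom: create a directory only if it is not there yet,
# then run a compound command only if it exists.
workdir=$(mktemp -d)
[ -d "$workdir/build" ] || mkdir "$workdir/build"
[ -d "$workdir/build" ] && {
    touch "$workdir/build/stamp"
    log="$log built"
}

echo "$log"
```

The final value of log is " and-ran or-ran built".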
You define shell functions like so:
sayhello()
{
echo Hello
}
You call a shell function like so:
sayhello # note no parentheses
Simple, huh?
"Aha," I hear you say, "but how do I give arguments to a function?" Not to worry, gentle reader: when you call a shell function, you just write the arguments right after the function name:
my_function arg1 arg2 ... argN
Inside the body of the function, the first argument is available
as $1, the second as $2, etc., up to $9. You can use the
shift operator to get at more arguments. You can also
use a special form of the for-loop to iterate through all the
function's arguments:
print_args()
{
    i=1
    for arg ; do # note the absence of the "in somelist" part
        echo \$$i = $arg
        i=`expr $i + 1`
    done
}
# read the expr man page for details of what it can do
# in bash, it's also possible to do arithmetic like this:
# echo $((3 + 4))
# i=$(($i + 1))
The argument count is available in the special
variable $#:
count_args() { echo Called with $# arguments; }
In addition, you can get at all the arguments at once in the special variable $* or $@. The difference between the two only matters when you quote them. Suppose you say my_func ham eggs spam. Then, in the body of my_func, "$@" is "ham" "eggs" "spam" (each argument is quoted individually), while "$*" is "ham eggs spam" (all arguments are quoted together). Most of the time, you won't care, though.
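The case where you do care is when an argument contains a space. This sketch passes three arguments, one of them with a space in it, through three made-up wrapper functions and counts what arrives on the other side. The function and variable names are my own.

```shell
count_args() { echo $# ; }

with_at()   { count_args "$@" ; }   # each argument passed on intact
with_star() { count_args "$*" ; }   # everything glued into one argument
with_bare() { count_args $@ ; }     # unquoted: split again at every space

at_count=$(with_at ham "green eggs" spam)
star_count=$(with_star ham "green eggs" spam)
bare_count=$(with_bare ham "green eggs" spam)

echo "$at_count $star_count $bare_count"
```

This prints 3 1 4: only "$@" hands the original three arguments through unchanged, which is why it is the form to reach for when a function passes its arguments on to another command.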
Unlike variables in C/C++ functions, variables inside a shell function are global by default. The "classic" Bourne shell (sh) does not provide a way to make shell variables local, at least none that I know of. However, the Bourne-Again Shell (bash) does provide a local declaration to make variables local to the function in which the declaration occurs. The syntax is simple:
local variable1 variable2 ... variableN
One thing you should know about the local declaration is that it still does not make shell variables work the way they do in C/C++. The shell uses dynamic scoping (like older Lisps), rather than static scoping like just about every language now in existence. "Dynamic scoping" is a common name for "indefinite scope" + "dynamic extent". So now we've got two big words to wrestle with instead of just one -- what a deal! Here's a description of what those big words stand for (borrowed mostly from the Emacs Lisp manual): Scope refers to where textually in the source code the variable can be accessed. Indefinite scope means that any part of the program can potentially access the variable. Extent refers to when, as the program is executing, the variable's value is accessible. Dynamic extent means that the variable's value is accessible as long as the construct (function) that established it is active (has not returned). This is very different from C/C++:
#! /bin/bash
x=2
f()
{
    local y
    y=$x
    echo \$y ':' $y
    x=10
}
g()
{
    local x
    x=3
    f
    echo \$x ':' $x
}
h()
{
    local x
    x=5
    f
    echo \$x ':' $x
}
f
g
h
bash foo.sh
$y : 2
$y : 3 # C/C++ would have 10 here (the global x, as set by the first call to f)
$x : 10 # C/C++ would have 3 here
$y : 5 # C/C++ would have 10 here again
$x : 10 # C/C++ would have 5 here
One conceptual model of how "dynamic scoping" works is that the shell keeps a stack of "value cells" for each variable. The current value (what you get when you stick a $ in front of the variable name) is found on top of the stack, and assignments change the value found in the topmost "value cell". The local keyword pushes a new value on top of the stack; and when the function in which the local declaration occurs exits, the "value cell" gets popped from the stack.
Shell functions can be called recursively, which makes it quite easy to write, say, a directory tree walker in the shell:
action()
{
    echo "I am in `pwd`"
}
make_dir_list()
{
    unset mydirs
    # The pattern below doesn't match anything that begins with a `.'(dot).
    # The Unix FAQ q2.11 talks about ways to really match everything.
    # their two suggestions are:
    # `ls -a | sed -e '/^\.$/d' -e '/^\.\.$/d'`
    # In ksh, you can use: .!(.|) *
    #
    for f in * ; do
        # test the loop variable $f (not an unset $file)
        [ -d "$f" ] && mydirs="$mydirs $f"
    done
    echo $mydirs
}
traverse()
{
    if [ $# -eq 0 ] ; then # no directories below this one
        action
        return
    else
        action # visit this directory
        for subdir ; do # ... then visit the children
            cd "$subdir"
            traverse `make_dir_list`
            cd ..
        done
    fi
}
traverse `make_dir_list`
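The walker above passes directory names around as a space-separated list, so it gets confused by directories with spaces in their names. Here is a sketch of one possible variant, not the only way to do it, that loops over the pattern directly and uses a subshell (a parenthesized command list) for each descent. The name walk and the sample directory tree are made up. Because the cd happens inside the subshell, there is no need for cd .. afterwards, and I also skip symbolic links so that a link pointing back up the tree cannot cause an endless loop.

```shell
walk()
{
    echo "I am in $(pwd)"
    for entry in * ; do
        # descend only into real directories, not symbolic links
        if [ -d "$entry" ] && [ ! -L "$entry" ] ; then
            ( cd "$entry" && walk )   # the cd is confined to the subshell
        fi
    done
}

# Try it on a small scratch tree that includes a name with a space.
workdir=$(mktemp -d)
mkdir -p "$workdir/a/b c" "$workdir/d"
listing=$(cd "$workdir" && walk)

echo "$listing"
```

The listing has four lines, one each for the scratch directory itself, a, a/b c, and d. Like the original, this variant does not visit directories whose names begin with a dot.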