So you want to use your computer for science…

It’s been a while since I was a new graduate student, and I’ve forgotten how little I knew about computers back then.  I was reminded recently while teaching a couple of lab members how to use ffmpeg, an excellent command line tool for building animations from images (as described in this post).  We got there, but I realized that we needed a basic computing tutorial before moving on to anything more advanced.  If you’re trying to use your laptop for science, but you’re not too sure about this whole “command line thing”, this post’s for you.  Be warned that this tutorial is intended as only the most cursory crash course to get you moving up the initial learning curve.  A comprehensive list of commands for Bash on OSX can be found here.  At the end of this tutorial you will have a basic grasp of how to:

  • Navigate the file system
  • Create and edit text files
  • Search text files
  • Create a shell script
  • Modify the PATH variable using startup files (.bash_profile)

Okay, so I want to get a computer to do some science.  What should I get?

Whatever you want.  It really doesn’t matter.  Most grad students seem to get Macs these days, which I don’t love (they’re costly and epitomize form over function), but have the slight advantage of a Unix-like environment hiding behind all that OSX gibberish.  I use a Windows machine (which I also don’t love, as it epitomizes dysfunction over function) with Cygwin, which gives me access to all the Linux tools that I need to carry out day-to-day operations.  Windows 10 users can also make use of the Bash shell add-on for Windows, but I haven’t found any advantage to this over Cygwin.  The point is that you need either a) A Mac, b) A Windows machine running Cygwin or the add-on, or c) A Linux machine.  The command prompt and output given below are what you will see in OSX (faked since I’m not actually using a Mac), but are similar to what you will get with Cygwin or one of the Linux distros.  The commands should work the same across all of these options.

Getting familiar with “Bash”

Bash stands for Bourne-again shell, which you can read all about in many other places on the web.  Bash is a very powerful tool for manipulating your computer’s file system, executing programs, and even creating programs.  It is a cornerstone of scientific computing and you should have at least some passing familiarity with it.  To open the Bash terminal in OSX, go to Applications/Utilities/Terminal in Finder.  A mysterious black (or white) window will open, with a white (black) cursor waiting for YOU.  Type “pwd” for print working directory and hit “Enter”.

jeffscomputer:~ jeff$ pwd
/Users/jeff

Bash will respond with your location, which should be something along the lines of /user/home.  Next type “ls” to list the contents of the directory.

jeffscomputer:~ jeff$ ls
bin
Desktop
Documents
Downloads

We can use the command “cd” to change directories.  For example, if you want to move to your Desktop directory type “cd Desktop”.

jeffscomputer:~ jeff$ cd Desktop
jeffscomputer:Desktop jeff$ pwd
/Users/jeff/Desktop

Because the directory “Desktop” was just one level down, in this case the relative path “Desktop” is equivalent to typing the full path “/Users/jeff/Desktop”.  Here’s a useful tip.  From any location on your system “cd ~” will get you back to your home directory.

jeffscomputer:Desktop jeff$ cd ~
jeffscomputer:~ jeff$ pwd
/Users/jeff

Now let’s create a directory to do some work.  The command for this is “mkdir temp”, for “make directory with name temp.”

jeffscomputer:~ jeff$ mkdir temp
jeffscomputer:~ jeff$ cd temp

Now move into that directory.  You already know how 🙂

Creating and editing text files

You will frequently need to create and edit basic text files without all the fancy formatting of a word processing document.  The most user friendly way to do this is with the program nano, which is likely already present if you are using OSX or Cygwin.  Type “nano temp.txt” and nano will open a blank text file with name temp.txt.

jeffscomputer:temp jeff$ nano temp.txt

Type a couple lines of text and, when you’re ready to exit and save the file, hit ctrl-x.  Nano will prompt you about saving the output, hit yes.  List the contents of the directory and notice that the file temp.txt now exists.  Type “nano temp.txt” again and rather than create a new file, nano will open temp.txt for editing.

Having gone through the trouble of creating that file, let’s go ahead and delete it using the remove or “rm” command.

jeffscomputer:temp jeff$ rm temp.txt

To do some fancier things with files lets download one that has a little more information in it.  There are two programs to use to fetch files online, wget and curl.  I find wget much easier to use than curl.  If you’re using Cygwin or Linux you likely already have it, but for OSX you first need to install a package manager, which is a whole can of worms that I’m not going to tackle in this post.  So let’s use curl to download a text file, in this case “The Rime of the Ancient Mariner” by Samuel Coleridge.  The basic download command for curl is:

jeffscomputer:temp jeff$ curl -O https://www.polarmicrobes.org/extras/ancient_mariner.txt

This should create the file “ancient_mariner.txt” in your working directory (e.g., “/Users/jeff/temp”).

Viewing file content

The reason we downloaded this more complex text file (it’s a pretty long poem) is to simulate a longer data file than our “temp.txt” file.  Very often in scientific computing you have text files with hundreds, thousands, or even millions of lines.  Just opening such a file can be onerous, let alone finding some specific piece of information.  Fortunately there are tools that can help.  Type “head ancient_mariner.txt”, this returns the top 10 lines of the file.

jeffscomputer:temp jeff$ head ancient_mariner.txt
PART THE FIRST
It is an ancient mariner,
And he stoppeth one of three.
"By thy long grey beard and glittering eye,
Now wherefore stopp'st thou me?
"The Bridegroom's doors are opened wide,
And I am next of kin;
The guests are met, the feast is set:
May'st hear the merry din."
He holds him with his skinny hand,

Want to guess what “tail” does?  For either command you can use a “flag” to modify behavior, such as returning more lines.  Flags are always preceded by “-” or “–“, and generally come before the positional arguments of the command, in this case the file.  This is general syntax for Unix commands, and does not apply only to “head” and “tail”.

jeffscomputer:temp jeff$ tail -15 ancient_mariner.txt
Both man and bird and beast.
He prayeth best, who loveth best
All things both great and small;
For the dear God who loveth us,
He made and loveth all.
The Mariner, whose eye is bright,
Whose beard with age is hoar,
Is gone: and now the Wedding-Guest
Turned from the bridegroom's door.
He went like one that hath been stunned,
And is of sense forlorn:
A sadder and a wiser man,
He rose the morrow morn.

In the examples so far the the command output has printed to the screen, but what if we want to capture it in a file?  For example, what if we have a huge data file (millions of rows) and we want just the top few lines to test some code or share with a collaborator?  This is easily done by redirecting the output using “>”.

jeffscomputer:temp jeff$ tail -15 ancient_mariner.txt > end_of_ancient_mariner.txt

Searching file content

To find specific information in a file use the command “grep”.  Without launching into a full-on explanation of regular expressions, grep (very quickly) finds lines that match some given pattern.  You can either count or view the lines that match.  To find the lines with the word “albatross” in ancient_mariner.txt:

jeffscomputer:temp jeff$ grep 'Albatross' ancient_mariner.txt
At length did cross and Albatross,
I shot the Albatross."
Instead of the cross, the Albatross
The Albatross fell off, and sank
The harmless Albatross."
The Albatross's blood.

Seems there’s a typo in this version of the poem (and|an)!  At any rate, use the -c flag to count the lines.  Protip: use the “up” key to bring back the previous line, which you can then modify.

jeffscomputer:temp jeff$ grep -c 'Albatross' ancient_mariner.txt
6

Use the -v flag to select against lines with “Albatross”, which you can combine with the -c or other flags:

jeffscomputer:temp jeff$ grep -cv 'Albatross' ancient_mariner.txt
634

Build a basic program

So far we’ve been executing commands manually, from the command line.  Suppose we have a set of commands that we want to execute a lot, or that need a method to document our workflow.  To do this we create a shell script.  Fire up nano for a file named “temp.sh” and type:

#!/bin/bash

echo "hello world!"
# hey, this line doesn't do anything!
f=ancient_mariner.txt
echo $f
grep -cv 'Albatross' $f

Line by line here’s what’s going on:

  • The first line is called the shebang and it tells your system what interpreter to use to run the script (in this case Bash).  /bin/bash is an actual location on your computer where the Bash program resides
  • The next line is just a bit of formality; by strictly adhered-to convention your first program should always be a little script that says “hello world!”.  It does however, illustrate the “echo” command, which prints out information.
  • The next line starts with “#” which denotes a comment.  Line that start with “#” are not read by Bash.  You can use that character to make notes, or to toggle commands on and off.
  • In the next line we assign a variable (f) a value (ancient_mariner.txt).  We can now call that variable using “$”.  The next two lines are examples of this.

To execute the script we simply type the file name into bash.  Before we do that however, we need to set the file permissions, as files are not by fault executable.  To do that we use the “chmod” command with the “a+x” options (note that this is not a flag).

jeffscomputer:temp jeff$ chmod a+x temp.sh

You can run it now, but there is one final trick.  Bash doesn’t know to look in your working directory for the script, you have to specify that that’s where it is.  The location of the current working directory is always “./”, so the command looks like this:

jeffscomputer:temp jeff$ ./temp.sh
hello world!
ancient_mariner.txt
634

Setting up your environment

Okay, we’re going to ramp things up a bit for the grand finale and modify the Bash startup files to better use your new-found skills.  There are several possible startup files, and the whole startup file situation gets a bit confusing.   We will modify the .bash_profile file, which will handle the majority of user cases, but you should take the time to familiarize yourself with the different files.  The .bash_profile file is a hidden file (denoted by the .), you can see hidden files by using “ls” with the “-a” flag.

jeffscomputer:temp jeff$ cd ~
jeffscomputer:~ jeff$ ls -a
bin
Desktop
Documents
Downloads
.bash_history
.bashrc
.profile
.ssh

Don’t worry if you don’t see .bash_profile, we will create it shortly.  First, to understand why we need to modify the startup file in the first place, from your home directory try executing the shell script that we just created.

jeffscomputer:~ jeff$ temp.sh
-bash: temp.sh: command not found

The command would execute if you typed temp/temp.sh, but remembering the location of every script and program so that you can specify the complete path to it would be silly.  Instead, Bash stores this information in a variable named PATH.  To see what’s in PATH use “echo”:

jeffscomputer:~ jeff$ echo $PATH
/usr/local/bin:/usr/bin:/usr/sbin

If you created a script in /usr/local/bin, Bash would know to look there and would find the script.  It’s a good idea however, to keep user-generated scripts in the home directory to avoid cluttering up locations used by the operating system.  What we need to do is automatically update PATH with customized locations whenever we start a new Bash session.  We accomplish this by modifying PATH on startup using the startup file.  To do this use nano, but you will need to use nano as a super user or “sudo”.

jeffscomputer:~ jeff$ sudo nano .bash_profile

As you already know, if you don’t have .bash_profile this will create it for you.  Now, at the end of the .bash_profile file (or on the first line, if its empty), type (replacing “jeff” with your home directory):

export PATH=/Users/jeff/temp:$PATH

The command structure might look opaque at first but its really not.  This line is saying “export the variable PATH as this text, followed by the original PATH variable”.

Close nano.  Recall that .bash_profile is read by Bash at the start of the Bash session.  Your newly defined PATH variable will be read if you start a new session, but you can also force Bash to read it with the “source” command.

jeffscomputer:~ jeff$ source .bash_profile

Now try to execute your bash script.

jeffscomputer:~ jeff$ temp.sh
hello world!
ancient_mariner.txt
634

Last, let’s clean things up a little.  Previously we used “rm” to remove a file.  The same command with the -r flag will remove a directory.

jeffscomputer:~ jeff$ rm -r temp

Voila!  To keep your .bash_profile file looking pretty, don’t forget to remove the line adding temp to PATH (though it does no harm).  Or you can comment that line out and leave it as an example of the correct syntax for when you next add a new location to PATH.

7385 Total Views 2 Views Today
This entry was posted in Computer tutorials. Bookmark the permalink.

Leave a Reply

Your email address will not be published. Required fields are marked *

WordPress Anti Spam by WP-SpamShield