25 Jul 2014 / Earle Wilson / How to...

Review of the basics

Code Formatting

New statements, comments and blank lines

Python statements are normally written on individual lines. Multiple statements can be written on the same line if they are separated by semicolons (;). Having multiple statements on a single line is allowable as long as none of the statements start a new code block, such as a for loop or an if statement. Here is a simple set of Python statements:

In [1]:
x = 10
y = 20
z = x+y
print z
30

The above lines can also be written as:

In [2]:
x = 10; y = 20; z = x+y; print z
30

Lines that contain only whitespace are considered blank lines and are ignored by the Python interpreter. Comments are identified by the hash symbol (#). Unless the # sits inside a string, the interpreter ignores anything on a line that comes after it:

In [3]:
x = [1, 3, 4, 6] #this line creates a list of numbers named x
for number in x:
    #this loop prints each number in x
    print number
1
3
4
6

Indentation

Python uses indentation to indicate blocks of code, such as if statements, for loops and functions. For example, a simple but syntactically correct if statement would be:

In [4]:
x = 20
if x > 10:
    print x
    print "This number is larger than 10."
20
This number is larger than 10.

The first print statement prints the value of x. The line below prints the statement in quotations. In MATLAB, indented formatting is encouraged but not mandatory. In Python, indentation is strictly enforced. For example, the following would produce an error:

In [5]:
x = 20
if x > 10:
    print x
      print "This line is not properly indented."
  File "<ipython-input-5-d8a5ef6c8cb2>", line 4
    print "This line is not properly indented."
    ^
IndentationError: unexpected indent

By convention, a new block of code is indented by four spaces. Any number of spaces is allowed as long as all statements within the block are indented by the same amount. This is Python's way of forcing users to write more reader-friendly code.

Also note there is no end statement in the first code snippet. To end an if statement, or any nested block of code, you simply return to the previous indentation:

In []:
x = 20
if x > 10:
    print x
    print "x is larger than 10."
print "The previous if statement has ended." 
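The same rule applies when blocks are nested. The following is a minimal sketch, using a made-up list of numbers, in which the indentation level alone decides whether a statement belongs to the if block, the for loop, or the top level. print is called with parentheses and a single argument so that the snippet also runs under Python 3.

```python
numbers = [4, 15, 22]   # hypothetical sample data
large = []
checked = 0

for n in numbers:
    if n > 10:
        large.append(n)      # inside the if block: runs only when n > 10
    checked = checked + 1    # back at the for level: runs for every n

# back at the top level: the loop has ended
print(large)     # [15, 22]
print(checked)   # 3
```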

Standard Data Types

Python supports many data types. Here, we introduce the five primary built-in data types: numbers, strings, lists, tuples and dictionaries. Bear in mind that there are many other data types outside the standard library. These are only accessible after importing a specific module. For example, the numerical Python module (NumPy) supports numerical array and masked array data types. These are important for data analysis and will be discussed in a later section.

Numbers

There are four standard numerical data types: integers (signed integers), floating point real numbers or floats, complex numbers and long integers. In practice, you will probably only use the first three. You can define floats and integers as follows:

In [6]:
x_int = 20 #this is an integer
x_float = 20.0 #this is a floating point number

To define a complex number, you would use the complex function:

In [7]:
x_complex = complex(3,4) #3 + 4i

To get the real and imaginary parts of a complex number, you would use the attributes x_complex.real and x_complex.imag.
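As a short illustration, a complex number can also be written as a literal with a j suffix. The sketch below uses the variable name z, which is not from the original text, and calls print with parentheses so that it also runs under Python 3.

```python
z = 3 + 4j               # same value as complex(3, 4)

print(z == complex(3, 4))  # True
print(z.real)              # 3.0
print(z.imag)              # 4.0
print(abs(z))              # 5.0, the magnitude sqrt(3**2 + 4**2)
```

Note that real and imag are attributes, not methods, so they are written without parentheses.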

There are also similar functions for creating integers and floats. These same functions can be used to convert one numeric type into another:

In [8]:
x_float = float(20) 
print x_float 
x_int= int(-20.2334) 
print x_int
20.0
-20
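One detail worth noting in the output above is that int() discards the fractional part rather than rounding. The sketch below contrasts it with math.floor and round from the standard library. The results are wrapped in int() so that the output is the same under Python 2 and Python 3.

```python
import math

x = -20.2334

truncated = int(x)             # drops the fractional part: moves toward zero
floored = int(math.floor(x))   # rounds down: moves toward negative infinity
rounded = int(round(-20.6))    # rounds to the nearest whole number

print(truncated)   # -20
print(floored)     # -21
print(rounded)     # -21

# The same functions also convert strings that contain numbers.
print(int("42"))      # 42
print(float("3.5"))   # 3.5
```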

Important: In Python 2, which this tutorial uses, dividing one integer by another produces an integer. For example:

In [9]:
1/2
Out[9]:
0

Likewise,

In [10]:
21/2
Out[10]:
10

In both examples, the result is rounded down (floored) to the nearest integer. This behavior may be unexpected if you are coming from MATLAB. To get the exact quotient, you need to make at least one of the numbers a float:

In [11]:
1./2
Out[11]:
0.5
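The division behavior is one of the places where Python versions differ: in Python 3 the / operator always performs true division. The sketch below sticks to forms that give the same result in both versions: the floor division operator //, an explicit float conversion, and the remainder operator %.

```python
print(21 // 2)         # 10: floor division, always rounds down
print(-21 // 2)        # -11: rounding down moves away from zero for negatives
print(21 / 2.0)        # 10.5: one float operand gives the exact quotient
print(float(21) / 2)   # 10.5: explicit conversion does the same
print(21 % 2)          # 1: the remainder left over by floor division
```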

Before we move on, we need to mention that numbers are immutable objects. This means that you cannot alter a numerical object after it is defined. For example, if you type a = 1230, you cannot change a to 1234 by doing a[3] = 4. Your only option is to reassign a to a new number: a = 1234. If you are a MATLAB user, this might seem trivial since numbers in MATLAB have the same characteristics. However, immutability is a more central concept in Python and applies to other data types.

Strings

Strings are defined by quotation marks. Either single or double quotes may be used to define a string literal. The only rule is that the opening and closing quotes must match: a string that starts with a single quote cannot end with a double quote. For example:

In [12]:
str1 = 'banana'
str2 = "banana"

The advantage of using double quotes is that you can easily include apostrophes:

In [13]:
song_title = "Don't stop believin' "

Triple quotes perform the same function as single and double quotes, but they have the added advantage of being able to span multiple lines:

In [14]:
first_two_lines = """Just a small town girl
Livin' in a lonely world """

Triple quotes are especially useful for providing function descriptions. For example:

In [15]:
def my_function():
    """ This is a simple function to demonstrate
    the use of triple quotes. You would normally
    put a brief description of your function in this space.
    To access this information from the command line, 
    you first import the module (if in a separate script), 
    then type: my_function.__doc__ or 
    help(my_function)"""
    pass #Tells interpreter: nothing to see here, carry on.

Since this particular function is already in our workspace, we can query it directly with the help function:

In [16]:
help(my_function)
Help on function my_function in module __main__:

my_function()
    This is a simple function to demonstrate
    the use of triple quotes. You would normally
    put a brief description of your function in this space.
    To access this information from the command line, 
    you first import the module (if in a separate script), 
    then type: my_function.__doc__ or 
    help(my_function)


String slicing

The first thing to note is that Python uses a zero based indexing system:

In [13]:
song_title = "Don't stop believin' "
In [17]:
print song_title[0] 
D

Portions of a string may be accessed using the following slicing syntax:

In [18]:
song_title[2:]
Out[18]:
"n't stop believin' "

The above produces the third string character and everything that comes after it. To get the first four characters you would do:

In [19]:
song_title[:4]
Out[19]:
"Don'"

Likewise, to get the last five characters:

In [20]:
song_title[-5:]
Out[20]:
"vin' "

There is a space after the apostrophe, so there are in fact 5 characters.

To summarize, the first element of any Python sequence is indexed by 0. For a string s, s[n:] selects s[n] and all the characters that follow, while s[:n] selects all the characters that come before s[n].
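The summary can be checked with a short sketch. The split point n = 5 is an arbitrary choice for illustration, and print is called with parentheses so that the snippet also runs under Python 3.

```python
song_title = "Don't stop believin' "
n = 5

head = song_title[:n]   # everything before index n
tail = song_title[n:]   # index n and everything after it

print(head)                        # Don't
print(head + tail == song_title)   # True: the two slices never overlap
print(song_title[6:10])            # stop: start index included, end index excluded
print(len(song_title))             # 21, counting the trailing space
```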

Like numbers, strings are immutable. Once a string is defined, you cannot alter any character in the string. Immutability makes strings relatively simple and more efficient to use. To get a modified version of a string, you have to create a new string.
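Here is a minimal sketch of what immutability means in practice. Assigning to a single character raises a TypeError, while slicing and string methods such as replace return a new string and leave the original untouched. The variable names are chosen only for illustration.

```python
word = "banana"

try:
    word[0] = "B"                  # strings do not support item assignment
except TypeError as err:
    print("TypeError: " + str(err))

capitalized = "B" + word[1:]       # build a new string from pieces of the old one
replaced = word.replace("a", "o")  # methods return a new string as well

print(capitalized)   # Banana
print(replaced)      # bonono
print(word)          # banana: the original is unchanged
```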

For more information on Python strings, check out the official string documentation.

Lists

A Python list is exactly what you think it is: an ordered sequence of objects. Here is a simple list of numbers:

In [21]:
numberlist = [1, 2, 3, 4]

Note that the items of the list are comma separated and bounded by square brackets. A list may contain any type of variable and can be any length:

In [22]:
mylist = ['year', 2001, 2002, 2003, 2004]

The indexing and slicing rules discussed for strings also apply to lists. To access the character e in the first list item, you would use mylist[0][1]. Unlike strings, lists are mutable:

In [23]:
mylist[3] = 2010
mylist
Out[23]:
['year', 2001, 2002, 2010, 2004]

To add an item to a list, you can use the append method:

In [24]:
mylist.append(2005)
mylist
Out[24]:
['year', 2001, 2002, 2010, 2004, 2005]
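Because lists are mutable, it matters whether two names refer to the same list or to separate copies. The sketch below, with illustrative variable names, shows that a slice creates a new list, whereas plain assignment only creates a second name for the same list. It also uses the insert method, which adds an item at a given position.

```python
mylist = ['year', 2001, 2002, 2003, 2004]

years = mylist[1:]       # slicing creates a new, independent list
years.append(2005)
print(years)             # [2001, 2002, 2003, 2004, 2005]
print(len(mylist))       # 5: the original list is unchanged

alias = mylist           # no copy: both names refer to the same list
alias[0] = 'Year'
print(mylist[0])         # Year: the change is visible through both names

mylist.insert(1, 2000)   # insert 2000 before the item at index 1
print(mylist[:3])        # ['Year', 2000, 2001]
```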

A more detailed description of lists and their available operations can be found in the official Python documentation.

Dictionaries

Dictionaries are mutable containers for Python objects. Dictionaries consist of keys and their associated values. Here is a simple dictionary:

In [25]:
contacts = {'John': 6462100, 'Adam': 6461870}

In the above example, the string objects John and Adam are the keys. Values are assigned to keys with a colon (:). Keys must be unique and must be an immutable object such as a string, tuple or number. Values can be any Python object. Dictionaries are particularly useful for storing data. Here is how you could store data from a hypothetical profiling float:

In [26]:
float55 = {'Date': '2010-09-13', 'Lat': 10.1,'Lon': 78.5, 'salinity': [28.8, 31.3, 34.5, 35.1]}

Dictionary data are retrieved using keys instead of numeric indices. For example:

In [27]:
float55['Lon']
Out[27]:
78.5

Similarly, to update data stored under a key, you could do:

In [28]:
float55["Date"] = [2010, 9, 13]
float55
Out[28]:
{'Date': [2010, 9, 13],
 'Lat': 10.1,
 'Lon': 78.5,
 'salinity': [28.8, 31.3, 34.5, 35.1]}

You can use the same syntax to add a new key-value item pair:

In [29]:
float55["Temperature"] = [31.1, 30.5, 28.5, 30.2] #inversion!
float55
Out[29]:
{'Date': [2010, 9, 13],
 'Lat': 10.1,
 'Lon': 78.5,
 'Temperature': [31.1, 30.5, 28.5, 30.2],
 'salinity': [28.8, 31.3, 34.5, 35.1]}

To delete an entry from a dictionary, you can use the del statement:

In [30]:
del float55["Lon"]
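A few other dictionary operations are commonly used alongside the ones above. The sketch below uses a shortened version of the float55 dictionary. The key 'Depth' and the fallback string 'no data' are made up for illustration.

```python
float55 = {'Date': '2010-09-13', 'Lat': 10.1, 'Lon': 78.5}

print('Lat' in float55)                  # True: membership tests look at keys
print(float55.get('Depth', 'no data'))   # no data: get() returns a default for a missing key

# Loop over the keys in sorted order and look up each value.
for key in sorted(float55):
    print(key + ": " + str(float55[key]))

del float55['Lon']                       # removes the key and its value
print(sorted(float55.keys()))            # ['Date', 'Lat']
```

Looking up a missing key with square brackets, such as float55['Depth'], raises a KeyError, which is why get() is handy when a key may be absent.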

You might notice that a Python dictionary is somewhat similar to a MATLAB structure. In fact, as you will see later, MATLAB structures are imported into Python as dictionaries.

Tuples

Tuples are like lists with the exception that they are immutable. A simple tuple would be:

In [31]:
tup = ("buffalo", 2, "eggs", 4, 20) #parentheses are optional

You can slice and index a tuple the same way you do a string or list:

In [32]:
print tup[2] #prints "eggs"
eggs
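To illustrate the immutability of tuples, the sketch below tries to change an element, which raises a TypeError. It also shows unpacking, where each element of a tuple is assigned to its own name. The names on the left-hand side are chosen for illustration.

```python
tup = ("buffalo", 2, "eggs", 4, 20)

try:
    tup[0] = "bison"               # tuples do not support item assignment
except TypeError as err:
    print("TypeError: " + str(err))

print(tup[1:3])                    # (2, 'eggs'): slicing a tuple gives a tuple

# Unpacking: one name per element, in order.
animal, n_animals, food, n_food, total = tup
print(animal)                      # buffalo
print(total)                       # 20
```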

To define a single element tuple you have to use the following syntax:

In [33]:
tup1 = (10 ,)

Without the trailing comma, Python interprets tup1 as an integer. One common application of tuples is string formatting. For example:

In [34]:
vble_name = 'SSH' 
month = 'Jan' 
year = 2010
print "Plotting %s data for %s %d" %(vble_name, month, year)
Plotting SSH data for Jan 2010

The %s and the %d are placeholders for the elements in the tuple (vble_name, month, year). These are called argument specifiers. You would use %s, %d and %f when substituting for strings, integers and floats, respectively. This is what happens when you replace %d with %f:

In [35]:
print "Plotting %s data for %s %f" %(vble_name, month, year)
Plotting SSH data for Jan 2010.000000

We can specify the number of digits in the float by doing:

In [36]:
print "Plotting %s data for %s %.1f" %(vble_name, month, year)
Plotting SSH data for Jan 2010.0

String formatting is handy when constructing strings for filenames, plot titles, axis labels etc. To learn more, visit the string formatting documentation.
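As a sketch of the filename and title use case, the following builds a few strings with the same % syntax. The file extension, the value 0.12345 and the profile number are made up for illustration. The last line shows the str.format method, an alternative syntax available in Python 2.6 and later.

```python
vble_name = 'SSH'
month = 'Jan'
year = 2010

filename = "%s_%s_%d.png" % (vble_name, month, year)
print(filename)   # SSH_Jan_2010.png

# %.2f keeps two decimal places; the value here is hypothetical.
title = "Mean %s: %.2f m" % (vble_name, 0.12345)
print(title)      # Mean SSH: 0.12 m

# %03d pads an integer with leading zeros to a width of three.
label = "profile_%03d" % 7
print(label)      # profile_007

same_name = "{0}_{1}_{2}.png".format(vble_name, month, year)
print(same_name == filename)   # True
```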

Other standard/important data types

Other standard data types not discussed here are sets, frozen sets and booleans.

For matrix and array manipulation, there are also the NumPy array and masked array data types. These are the data types that you would use to store and manipulate numerical data. A description of NumPy arrays will be provided later.

Functions, modules and packages

Functions

Functions are defined using the following syntax:

def my_function():
    <statement(s)>

The necessary syntax involves a def statement followed by the function name, open and close parentheses, and a colon (:). The code block that is to be executed when the function is called is indented. The code block can include any valid Python statement. To call this function you would type my_function() - the parentheses are always necessary. A function may have any number of inputs or outputs. To get output from a function, you must use the return statement:

def my_function(arg1, arg2):
    <statement(s)>
    return output
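As a minimal sketch of the return syntax, the hypothetical function below takes one input and returns two outputs. Multiple outputs are separated by commas in the return statement and come back as a tuple, which can be unpacked into separate names.

```python
def min_max(values):
    """Return the smallest and largest items of a sequence."""
    return min(values), max(values)

salinity = [28.8, 31.3, 34.5, 35.1]   # sample values for illustration

result = min_max(salinity)            # the two outputs arrive as one tuple
print(result)                         # (28.8, 35.1)

lowest, highest = min_max(salinity)   # or unpack them directly
print(lowest)                         # 28.8
print(highest)                        # 35.1
```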

Here is a more tangible example:

In [37]:
def selfAwareFunction ():
    print "I am a Python Function."
    print "I require nothing and provide nothing." 
    print "I am not very useful."

Here is a more useful function:

In [38]:
def findQuadRoots(a,b,c):
    """Finds the roots of a quadratic equation of the form: 
    ax**2 + bx + c = 0. 
    
    The function solves the quadratic formula:
        roots = (-b +/- (b**2 - 4*a*c)**(0.5))/(2*a) 
    and prints the root(s) whether real or complex.
    """ 
    
    a = float(a) # to ensure that math works out properly.
    discrim = b**2 - 4*a*c #compute discriminant
    
    if discrim < 0:
        realpart = -b/(2*a)
        complexpart_pos = (-discrim)**(0.5)/(2*a)
        complexpart_neg = -complexpart_pos
        pos_root = complex(realpart,complexpart_pos)
        neg_root = complex(realpart,complexpart_neg)
        
        print "Equation has two complex roots:"
        print "%.2f +/- %.2f i" %(pos_root.real, pos_root.imag)
        #The %.2f specifies two decimal places.
        
    if discrim == 0:
        root = -b/(2*a)
        print "Equation has one unique root: %.2f" %root
    if discrim > 0: 
        pos_root = (-b + discrim**(0.5))/(2*a)
        neg_root = (-b - discrim**(0.5))/(2*a)
        
        print "Equation has two real roots: "
        print "%.2f and %.2f" %(pos_root,neg_root)       

As mentioned earlier, triple quotes can be used to provide a brief description of your function. Doing this is generally good form. If I type help(findQuadRoots) into my command line, I get a print out of the function’s description.

In [39]:
help(findQuadRoots)
Help on function findQuadRoots in module __main__:

findQuadRoots(a, b, c)
    Finds the roots of a quadratic equation of the form: 
    ax**2 + bx + c = 0. 
    
    The function solves the quadratic formula:
        roots = (-b +/- (b**2 - 4*a*c)**(0.5))/(2*a) 
    and prints the root(s) whether real or complex.


Now use the function:

In [40]:
findQuadRoots(2,4,8)
Equation has two complex roots:
-1.00 +/- 1.73 i

In [41]:
findQuadRoots(2,6,4)
Equation has two real roots: 
-1.00 and -2.00
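findQuadRoots prints its results, so the roots cannot be reused in later calculations. A possible variation, sketched below under the name quad_roots (not part of the original code), returns the roots instead. Rather than branching on the sign of the discriminant, it relies on cmath.sqrt from the standard library, which accepts negative numbers, so both roots always come back as complex numbers.

```python
import cmath

def quad_roots(a, b, c):
    """Return both roots of a*x**2 + b*x + c = 0 as a tuple of complex numbers."""
    sqrt_discrim = cmath.sqrt(b**2 - 4.0*a*c)   # valid for a negative discriminant too
    pos_root = (-b + sqrt_discrim) / (2.0*a)
    neg_root = (-b - sqrt_discrim) / (2.0*a)
    return pos_root, neg_root

r1, r2 = quad_roots(2, 6, 4)     # two real roots
print(r1.real)                   # -1.0
print(r2.real)                   # -2.0

c1, c2 = quad_roots(2, 4, 8)     # two complex roots
print("%.2f +/- %.2f i" % (c1.real, c1.imag))   # -1.00 +/- 1.73 i
```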

Modules

Modules are Python files that implement a set of functions. When you write a script with a function and save the file, you create a module. To use a function from a module, you have to import it first. Python has many built-in modules. A common module is the os module. To import and use the os module, you would do:

In [42]:
import os
cwd = os.getcwd() #get current working directory

The import statement provides access to all the functions available in the os module. The second line shows how to use a function from a module. To find out which functions are included in a module, you would use the help() and dir() functions. dir(module) returns a list of all the names defined in the module, and help(module) provides information about the entire module. You can also pass help a specific function from the module. For example, to learn more about os.getcwd, you would do:

In [43]:
help(os.getcwd)
Help on built-in function getcwd in module posix:

getcwd(...)
    getcwd() -> path
    
    Return a string representing the current working directory.
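Since dir returns an ordinary list of strings, it can be searched like any other list. The sketch below checks whether a name exists in the os module and filters the list for names that start with 'get'. The prefix is an arbitrary choice, and the exact contents of the filtered list depend on the platform and Python version.

```python
import os

names = dir(os)                    # a list of the names defined in the module

print(isinstance(names, list))     # True
print('getcwd' in names)           # True

# Keep only the names that begin with 'get'.
getters = [name for name in names if name.startswith('get')]
print(getters)
```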


Different ways to import modules

There are a few ways to import a module. If the module name is relatively long, it is annoying to have to rewrite the module name each time you want to use one of its functions. A solution would be to import the module under an alias:

In [44]:
import numpy as np
np_array = np.array([1, 2, 3, 4]) #creates a numpy array

Sometimes you will only need one function from the module. For example, the following imports the chdir function from the os module into the current namespace:

In [45]:
from os import chdir 

This allows use of the chdir function without referencing the os module. It's also possible to import all the names defined in a module into your current namespace. This is done using the following syntax: from module import *. However, doing this is strongly discouraged since it makes code less readable and can silently overwrite names you have already defined.
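The import styles can be compared side by side. The sketch below uses the math module from the standard library so that it runs without installing anything. The alias m is an arbitrary choice.

```python
import math                # plain import: names are reached through the module
import math as m           # alias: same module, shorter name
from math import sqrt      # single function copied into the current namespace

print(math.sqrt(16))       # 4.0
print(m.sqrt(16))          # 4.0
print(sqrt(16))            # 4.0

print(m is math)           # True: the alias refers to the same module object
```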

Packages

A package is simply a collection of modules. Packages may include other packages. Packages are sometimes called libraries or (less commonly) toolboxes. You can import packages the same way you do modules.
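As an illustration of nesting, the standard library's xml package contains the xml.etree package, which in turn contains the ElementTree module. The sketch below imports each level using dotted names. The alias ET is a common convention rather than a requirement. It relies on the fact that packages carry a __path__ attribute while plain modules do not.

```python
import xml                            # a package in the standard library
import xml.etree                      # a package nested inside xml
import xml.etree.ElementTree as ET    # a module inside xml.etree

print(hasattr(xml, '__path__'))         # True: xml is a package
print(hasattr(xml.etree, '__path__'))   # True: so is xml.etree
print(hasattr(ET, '__path__'))          # False: ElementTree is a plain module
```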
