Introduction to Statistical Learning Using Python

Table of Contents

  • LAB1: Introduction to Python
    • Basic Elements of Python
    • Working with Lists
    • IF Statement
    • FOR and While Loops
    • Functions
  • LAB2: Dealing With Data
    • Importing and Manipulating Data
    • Dealing with Missing Values
    • Summary Statistics
    • Understanding Data
    • Data Interpolation
    • Visualizing Data
  • LAB3: Real World Data
    • Logistic Regression
    • Linear and Quadratic Discriminant Analysis
    • Classification Trees, Random Forest

Basic Elements of Python

  • Objects are the core things that Python programs manipulate.
  • Every object has a type that defines the kinds of things programs can do with it.

More About Jupyter Notebook, Python and Object-Oriented Programming

In [1]:
from IPython.display import IFrame, Math, Latex
IFrame('http://jupyter-notebook-beginner-guide.readthedocs.io/en/latest/what_is_jupyter.html', width='100%', height=350)
Out[1]:
In [2]:
IFrame("https://wiki.python.org/moin/BeginnersGuide/Overview", width='100%', height=350)
Out[2]:
In [3]:
IFrame('https://en.wikipedia.org/wiki/Object-oriented_programming', width='100%', height=350)
Out[3]:

You can even include YouTube videos

In [4]:
from IPython.display import YouTubeVideo
YouTubeVideo('-5kkdYO7QeQ')
Out[4]:

Variables provide a way to associate names with objects

In [5]:
Student_first_name = "Aramayis"
Student_surname= "Dallakyan"
Student_year_at_school = 3
Student_GPA = 3.85

  • Remember: A variable is just a name.
  • Renaming a variable consistently does not change the result.
  • An "assignment" statement associates the name to the left of the = symbol with the object denoted by the expression to the right of the =.

Example

  • Find the Area of a Circle

$$Area = \pi r^2$$ where r is the radius, half the diameter.

In [6]:
pi = 3.14159
diameter = 11.2
radius = diameter/2         # the formula uses the radius, which is half the diameter
area = pi*(radius**2)
In [7]:
print(round(area, 4))
98.5203
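Typing 3.14159 by hand limits precision. A minimal alternative sketch, assuming Python 3, takes the constant pi from the standard math module instead:

```python
import math

diameter = 11.2
radius = diameter / 2            # the area formula uses the radius
area = math.pi * radius ** 2     # math.pi holds pi to full float precision
print(round(area, 4))            # 98.5203
```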

Dealing with Strings

  • A string is a series of characters. In Python, anything inside single or double quotes is considered a string.
In [8]:
" This is a string"
Out[8]:
' This is a string'

Combining or Concatenating Strings

In [9]:
Student_first_name = "aramayis"
Student_surname= "dallakyan"
In [10]:
Student_full_name = Student_first_name + " " + Student_surname
In [11]:
print(Student_full_name)
aramayis dallakyan
In [12]:
print(Student_full_name.title())
Aramayis Dallakyan

  • Here .title() is a method: it acts on the string stored in Student_full_name and returns a copy with the first letter of each word capitalized.

In [13]:
titled_name = Student_full_name.title()  # assign the title-cased string to a new variable
print(titled_name)
Aramayis Dallakyan

Other Methods for Strings

In [14]:
print (Student_full_name.upper())
print(Student_full_name.lower())
ARAMAYIS DALLAKYAN
aramayis dallakyan
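Strings have many more methods. A short sketch with a hypothetical name that carries stray spaces shows three common ones: strip() removes leading and trailing whitespace, replace() swaps one substring for another, and split() breaks a string into a list of words.

```python
raw_name = "  aramayis dallakyan  "            # hypothetical input with extra spaces
clean_name = raw_name.strip()                  # lstrip()/rstrip() trim only one side
print(clean_name)                              # aramayis dallakyan
print(clean_name.replace("aramayis", "aram"))  # aram dallakyan
print(clean_name.split())                      # ['aramayis', 'dallakyan']
```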

Numbers: Integers and Floats

  • You can add(+), subtract(-), multiply(*), and divide (/) integers in Python.
In [15]:
print(2+3)
5
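Division deserves a closer look. The sketch below assumes Python 3, where / always returns a float; in Python 2, / between two integers truncates like //. It also shows floor division, the remainder operator, and exponentiation.

```python
quotient = 7 / 2          # 3.5, true division returns a float
floor_quotient = 7 // 2   # 3, floor division drops the fractional part
remainder = 7 % 2         # 1, the remainder left after floor division
power = 2 ** 3            # 8, exponentiation
print(quotient, floor_quotient, remainder, power)   # 3.5 3 1 8
```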

Find the Perimeter of a Square

$$P = 4 \times Edge$$

In [16]:
a = 5
Perimeter = 4*a
print ("Perimeter= " + str(Perimeter))
Perimeter= 20

Find the Area of a Square

$$Area = (Edge)^2$$

In [17]:
Area_1 = a*a
##or
Area_2 = a**2
print ("Area1=" + str(Area_1))
print ("Area2=" + str(Area_2))
print("Are they equal?: "+ str(Area_1 == Area_2))
Area1=25
Area2=25
Are they equal?: True

Using a Variable's Value within a Message

In [18]:
age = 29
message = "Happy" + age +"rd Birthday"
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-18-ea8c19e1114f> in <module>()
      1 age = 29
----> 2 message = "Happy" + age +"rd Birthday"

TypeError: cannot concatenate 'str' and 'int' objects

  • This is an example of a type error: Python will not add a string and an integer with +, because it does not convert the integer to text on its own.

  • To use an integer inside a string, convert it explicitly with str().

In [19]:
age = 29
message = "Happy " + str(age) + "th Birthday"
In [20]:
print(message)
Happy 29th Birthday
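In Python 3.6 and later, an f-string is a more readable alternative: expressions inside braces are converted to text automatically, so no explicit str() call is needed.

```python
age = 29
message = f"Happy {age}th Birthday"   # {age} is replaced by the value of age
print(message)                        # Happy 29th Birthday
```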

Another Example

In [21]:
Information = str(Student_year_at_school)+"rd year student" + " " +Student_full_name
In [22]:
print(Information)
3rd year student aramayis dallakyan

List

  • A list is a collection of items in a particular order.
  • A list can hold letters of the alphabet, digits, names of people, or any mix of objects. In other words, you can put anything in a list.
In [23]:
Courses = ["AGEC 619", 'AGEC 630', 'AGEC 621',"AGEC 622"]
In [24]:
print (Courses)
['AGEC 619', 'AGEC 630', 'AGEC 621', 'AGEC 622']

Accessing Elements in a List

In [25]:
print(Courses[0])
AGEC 619

Using String methods

In [26]:
print(Courses[0].lower())
agec 619

Remember: index positions start at 0, not 1. Asking for a position past the end of the list raises an IndexError.

In [27]:
print (Courses[1])
print (Courses[5])
AGEC 630
---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
<ipython-input-27-a2a42da94260> in <module>()
      1 print (Courses[1])
----> 2 print (Courses[5])

IndexError: list index out of range

Access Last Item

In [28]:
print (Courses[-1])
AGEC 622

Using Individual Values from a List

In [29]:
Information = "The required Courses are " + Courses[0].title() + ' and ' + Courses[3].title() 
print (Information)
The required Courses are Agec 619 and Agec 622

Changing, Adding and Removing Elements

In [30]:
Courses[0]= 'ECON 629'
In [31]:
print(Courses)
['ECON 629', 'AGEC 630', 'AGEC 621', 'AGEC 622']
In [32]:
Courses.append('AGEC 619')
In [33]:
print(Courses)
['ECON 629', 'AGEC 630', 'AGEC 621', 'AGEC 622', 'AGEC 619']

  • Note: the append() method lets you build lists dynamically. Later we will learn how to do this more efficiently using loops.

In [34]:
Other_Courses= []
In [35]:
Other_Courses.append("Stat 630")
In [36]:
Other_Courses.append("Stat 610")
In [37]:
Other_Courses.append("Econ 630")
In [38]:
print (Other_Courses)
['Stat 630', 'Stat 610', 'Econ 630']

Inserting Elements into a List by position

In [39]:
Courses.insert(0,"MRKT 680")
In [40]:
print(Courses)
['MRKT 680', 'ECON 629', 'AGEC 630', 'AGEC 621', 'AGEC 622', 'AGEC 619']

Removing an Item from a List

In [41]:
del Courses[1]
In [42]:
print(Courses)
['MRKT 680', 'AGEC 630', 'AGEC 621', 'AGEC 622', 'AGEC 619']

  • Note: when you want to remove an item and still use its value afterwards, use the pop() method, which returns the removed item. By default pop() removes the last element.

In [43]:
Not_offered_course = Courses.pop()
In [44]:
print (Not_offered_course)
AGEC 619
  • You can pop() an item from any position by passing its index.
In [45]:
Other_Department_Course = Courses.pop(0)
In [46]:
print(Other_Department_Course.title() + " is not offered in fall")
Mrkt 680 is not offered in fall

  • Note: sometimes you know only the value of the element you want to remove. In that case use the remove() method, which removes an element by value.

In [47]:
print(Courses)
['AGEC 630', 'AGEC 621', 'AGEC 622']
In [48]:
Courses.remove("AGEC 622")
In [49]:
print (Courses)
['AGEC 630', 'AGEC 621']
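Two details of remove() are easy to miss: it deletes only the first matching element, and it raises a ValueError when the value is not in the list. A small sketch with a hypothetical list (STAT 999 is a made-up course):

```python
courses = ["AGEC 630", "AGEC 621", "AGEC 630"]
courses.remove("AGEC 630")          # removes only the first match
print(courses)                      # ['AGEC 621', 'AGEC 630']

try:
    courses.remove("STAT 999")      # hypothetical course that is not in the list
except ValueError:
    print("STAT 999 is not in the list")

if "AGEC 621" in courses:           # checking membership first avoids the error
    courses.remove("AGEC 621")
print(courses)                      # ['AGEC 630']
```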

Organizing a List

In [50]:
Courses= ["AGEC 619", 'AGEC 630', 'AGEC 621',"AGEC 622","ECON 629", "STAT 610", "MRKT 680", "MATH 641"]
In [51]:
print (Courses)
['AGEC 619', 'AGEC 630', 'AGEC 621', 'AGEC 622', 'ECON 629', 'STAT 610', 'MRKT 680', 'MATH 641']
In [52]:
Courses.sort()
In [53]:
print (Courses)
['AGEC 619', 'AGEC 621', 'AGEC 622', 'AGEC 630', 'ECON 629', 'MATH 641', 'MRKT 680', 'STAT 610']
In [54]:
Courses.reverse()
In [55]:
print(Courses)
['STAT 610', 'MRKT 680', 'MATH 641', 'ECON 629', 'AGEC 630', 'AGEC 622', 'AGEC 621', 'AGEC 619']
In [56]:
len(Courses)
Out[56]:
8
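sort() changes the list in place. When you need to keep the original order, the built-in sorted() returns a new sorted list instead; both accept reverse=True for descending order. A sketch on a shorter course list:

```python
courses = ["STAT 610", "AGEC 622", "ECON 629", "AGEC 619"]

alphabetical = sorted(courses)   # new list; courses itself is unchanged
print(alphabetical)              # ['AGEC 619', 'AGEC 622', 'ECON 629', 'STAT 610']
print(courses)                   # ['STAT 610', 'AGEC 622', 'ECON 629', 'AGEC 619']

courses.sort(reverse=True)       # sorts in place, in descending order
print(courses)                   # ['STAT 610', 'ECON 629', 'AGEC 622', 'AGEC 619']
```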

WHY USE LOOPS?

  • Often you want to run through all entries in a list and perform the same task with each item. For example, given a list of numbers, you may want to perform the same statistical operation on every element.
In [57]:
Students = ['Muhammad Abdullah',"Kayli Abernathy","Jacqueline Alvarez","Lainey Bourgeois","Garrett Carr","Chance Chapman"]
In [58]:
for student in Students:
    print (student)
Muhammad Abdullah
Kayli Abernathy
Jacqueline Alvarez
Lainey Bourgeois
Garrett Carr
Chance Chapman

How does a loop work?

Python initially reads the first line of the loop.

  • for student in Students

This line tells Python to go into the list Students, retrieve the first element, and store it in the variable "student". The first student is "Muhammad Abdullah". Then Python reads the next line:

  • print (student)

Python prints the current value of "student". Because the list contains more values, Python returns to the first line of the loop and repeats the same procedure until no items remain in the list.

Note: the loop variable can have any name; student is just a readable choice.
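When you also need each item's position, the built-in enumerate() yields (index, item) pairs. A sketch using the first three names from the list; start=1 makes the count begin at 1 instead of 0:

```python
students = ["Muhammad Abdullah", "Kayli Abernathy", "Jacqueline Alvarez"]
for position, student in enumerate(students, start=1):
    print(position, student)     # position counts 1, 2, 3 alongside each name
```

This prints 1 Muhammad Abdullah, 2 Kayli Abernathy and 3 Jacqueline Alvarez on separate lines.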

Another Example

In [59]:
for student in Students:
    print (student.title() +" is a good student")
Muhammad Abdullah is a good student
Kayli Abernathy is a good student
Jacqueline Alvarez is a good student
Lainey Bourgeois is a good student
Garrett Carr is a good student
Chance Chapman is a good student
In [60]:
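names = ""
for student in Students:
    names += student + ", "     # add each name followed by a separator
names = names.rstrip(", ")      # drop the trailing separator after the last name
print("Our MAB students are " + names)
Our MAB students are Muhammad Abdullah, Kayli Abernathy, Jacqueline Alvarez, Lainey Bourgeois, Garrett Carr, Chance Chapman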

  • Note: you will get an error if you forget the colon at the end of the for line or the indentation of the loop body.
  • In economics and statistics we usually deal with sets of numbers. The range() function makes it easy to generate them; range(1, 6) starts at 1 and stops before 6.

In [61]:
for value in range(1,6):
    print(value)
1
2
3
4
5
  • You can store the generated numbers as a list
In [62]:
numbers = list(range(1,10))
In [63]:
print(numbers)
[1, 2, 3, 4, 5, 6, 7, 8, 9]
In [64]:
even_numbers = list(range(2,11,2))
In [65]:
print(even_numbers)
[2, 4, 6, 8, 10]

Generate squares for each integer from 1 through 10

In [66]:
squares = list()
for number in range(1,11):
    square = number**2
    squares.append(square)
print(squares)
[1, 4, 9, 16, 25, 36, 49, 64, 81, 100]

Simple Statistics

In [67]:
min(squares)
Out[67]:
1
In [68]:
max(squares)
Out[68]:
100
In [69]:
sum(squares)
Out[69]:
385
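The mean follows directly from sum() and len(). The standard statistics module (Python 3.4+) also provides mean(), median() and related functions; a sketch on the same squares:

```python
import statistics

squares = [1, 4, 9, 16, 25, 36, 49, 64, 81, 100]
mean = sum(squares) / len(squares)     # 385 / 10
print(mean)                            # 38.5
print(statistics.mean(squares))        # 38.5
print(statistics.median(squares))      # 30.5, the average of the middle values 25 and 36
```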

List Comprehension

  • Can we make our for-loop code more concise when building a list? Yes: use a list comprehension.
In [70]:
squares = [number**2 for number in range(1,11)]
In [71]:
print (squares)
[1, 4, 9, 16, 25, 36, 49, 64, 81, 100]
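A comprehension can also end with an if clause that filters the inputs. For example, the squares of only the even numbers from 1 through 10, next to the equivalent explicit loop:

```python
even_squares = [number**2 for number in range(1, 11) if number % 2 == 0]
print(even_squares)   # [4, 16, 36, 64, 100]

# The same result with an explicit loop, for comparison
even_squares_loop = []
for number in range(1, 11):
    if number % 2 == 0:
        even_squares_loop.append(number**2)
```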

  • The procedure is as follows:
    1. Open a set of square brackets and write the expression for the values you want to store in the new list.
    2. After the expression, write a for clause that generates the inputs.
  • Sometimes you want to work with only part of a list, which Python calls a slice. A slice [start:stop] includes the item at start and stops just before stop.

In [72]:
Students = ['Muhammad Abdullah',"Kayli Abernathy","Jacqueline Alvarez","Lainey Bourgeois","Garrett Carr","Chance Chapman"]
In [73]:
print(Students[1:3])
['Kayli Abernathy', 'Jacqueline Alvarez']
  • When you omit the first index the slice starts at the beginning of the list:
In [74]:
print(Students[:3])
['Muhammad Abdullah', 'Kayli Abernathy', 'Jacqueline Alvarez']
  • Similarly, when you omit the last index the slice runs to the end of the list:
In [75]:
print(Students[2:])
['Jacqueline Alvarez', 'Lainey Bourgeois', 'Garrett Carr', 'Chance Chapman']
  • You can also loop through a slice:
In [76]:
print("Here are the first four MAB students:")
for student in Students[:4]:
    print (student)
Here are the first four MAB students:
Muhammad Abdullah
Kayli Abernathy
Jacqueline Alvarez
Lainey Bourgeois
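Slices accept negative indices and an optional third number, the step. A sketch of a few common patterns on the same list; "New Student" is a hypothetical name used only to show that a full slice produces an independent copy:

```python
students = ["Muhammad Abdullah", "Kayli Abernathy", "Jacqueline Alvarez",
            "Lainey Bourgeois", "Garrett Carr", "Chance Chapman"]

last_two = students[-2:]            # a negative start counts from the end
every_second = students[::2]        # the third number is the step
backwards = students[::-1]          # a step of -1 reverses the order
students_copy = students[:]         # a full slice makes an independent copy
students_copy.append("New Student") # hypothetical extra name; students is unchanged

print(last_two)       # ['Garrett Carr', 'Chance Chapman']
print(every_second)   # ['Muhammad Abdullah', 'Jacqueline Alvarez', 'Garrett Carr']
print(backwards[0])   # Chance Chapman
print(len(students), len(students_copy))   # 6 7
```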

IF STATEMENTS

  • Often we want to examine a set of conditions and decide which action to take based on them. Python's if statement lets you examine the current state of a program and respond appropriately to that state.
In [77]:
Students = ['Muhammad Abdullah',"Kayli Abernathy","Jacqueline Alvarez","Lainey Bourgeois","Garrett Carr","Chance Chapman"]
In [78]:
name= "Muhammad Abdullah"
if name in Students:
    print (name + " is MAB student")
Muhammad Abdullah is MAB student

  • The if statement checks whether the value of name appears in the list Students.
  • If it does, the indented print line runs; otherwise Python skips it.

Important: to test whether two values are equal, use the "==" equality operator, not the single "=" used for assignment. It returns True if the values on the left and right side of the operator match, and False if they don't.
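A sketch of == inside a loop: one student's name is printed in uppercase and every other name in title case. A shortened list is used, and the results are collected in a list so they can be checked:

```python
students = ["Muhammad Abdullah", "Kayli Abernathy", "Jacqueline Alvarez"]

formatted = []
for student in students:
    if student == "Muhammad Abdullah":     # == compares values; = would assign
        formatted.append(student.upper())
    else:
        formatted.append(student.title())

for name in formatted:
    print(name)    # MUHAMMAD ABDULLAH, then Kayli Abernathy, then Jacqueline Alvarez
```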

Checking For Inequality

In [79]:
requested_course = "Agec 619"
In [80]:
if requested_course != "Agec 622":
    print ("Request Another Course")
Request Another Course

  • Here the != operator checks whether requested_course differs from the value "Agec 622". If the two values do not match, the comparison evaluates to True and the print line runs.

Numerical comparisons

In [81]:
########TRY TO ANSWER############
Question = 2*2
In [84]:
Answer = int(input("Enter an integer:"))
Enter an integer:15
In [85]:
if Answer == 4:
    print ("Good Job")
elif 3 <= Answer <= 5:          # close to the correct answer of 4
    print("You are close")
else: 
    print ("Try one more time **GENIUS**")
Try one more time **GENIUS**
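Because input() waits for a person to type, a version that can be run and tested unattended is handy. This sketch wraps the check in a hypothetical helper, grade_answer, and treats 4 as correct and anything from 3 to 5 as close, using a chained comparison:

```python
def grade_answer(answer):
    """Hypothetical helper: grade an answer to 2*2 without calling input()."""
    if answer == 4:
        return "Good Job"
    elif 3 <= answer <= 5:        # chained: same as 3 <= answer and answer <= 5
        return "You are close"
    else:
        return "Try one more time"

for guess in [4, 5, 15]:
    print(guess, grade_answer(guess))
```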

FINDING THE CUBE ROOT OF A PERFECT CUBE

In [86]:
x = int(input("Enter an integer: "))
for ans in range (0, abs(x)+1):
    if ans**3 >= abs(x):
        break
if ans**3 !=abs(x):
    print (x, "is not a perfect cube")
else:
    if x<0:
        ans = -ans
    print("Cube root of",x,'is ',ans)
Enter an integer: 27
('Cube root of', 27, 'is ', 3)

  • The code implements the exhaustive enumeration algorithm for finding roots: it tries 0, 1, 2, ... in turn until ans**3 reaches abs(x). The "break" statement terminates the loop before it has run on every element of the sequence over which it is iterating.
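The same exhaustive enumeration can be packaged as a reusable function. This sketch (cube_root is a hypothetical name) uses a while loop in place of for plus break, and returns None when x is not a perfect cube:

```python
def cube_root(x):
    """Return the integer cube root of x if x is a perfect cube, else None."""
    ans = 0
    while ans**3 < abs(x):       # exhaustive enumeration: try 0, 1, 2, ...
        ans += 1
    if ans**3 != abs(x):         # overshot abs(x), so no integer root exists
        return None
    return -ans if x < 0 else ans

for x in [27, -8, 10]:
    print(x, cube_root(x))       # 27 3, then -8 -2, then 10 None
```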

Dictionaries

  • A dictionary is a collection of key-value pairs. Each key is connected to a value, and you can use a key to access the value associated with that key.
In [87]:
student_1= {'UIN': "925024924",
            'first_name': "Muhammad",
           'second_name': "Abdullah"}
In [88]:
for key,value in student_1.items():
    print("\nKey: " + key)
    print("Value: " + value)
Key: first_name
Value: Muhammad

Key: second_name
Value: Abdullah

Key: UIN
Value: 925024924
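Besides looping over items(), you can read, add and delete entries by key. A sketch assuming Python 3.7 or later, where dictionaries keep insertion order (the different order in the output above reflects an older Python version); the "year" and "email" keys are hypothetical:

```python
student = {"UIN": "925024924", "first_name": "Muhammad", "second_name": "Abdullah"}

print(student["first_name"])                      # Muhammad
student["year"] = 2                               # hypothetical new key-value pair
print(student.get("email", "no email on file"))   # get() returns a default for a missing key
del student["UIN"]                                # remove a key and its value
print(list(student.keys()))                       # ['first_name', 'second_name', 'year']
```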
In [89]:
Students = ['Muhammad Abdullah',"Kayli Abernathy","Jacqueline Alvarez","Lainey Bourgeois","Garrett Carr","Chance Chapman"]
In [90]:
student_1= {'UIN': "925024924",
            'first_name': "Muhammad",
           'second_name': "Abdullah",
           'courses':["Agec 619",'Agec 621','Agec 622']}
student_2= {'UIN': "925024924",
            'first_name': "Kayli",
           'second_name': "Abernathy",
           'courses':["Agec 619",'Stat 610','Stat 630']}
student_3= {'UIN': "925024924",
            'first_name': "Jacqueline",
           'second_name': "Alvarez",
           'courses':["Agec 621",'Math 641','Stat 611']}
Students = [student_1,student_2,student_3]
In [91]:
count =0
student_names= []
for student in Students:
    if "Agec 619" in student['courses']:
        student_names.append(student["first_name"])
        count= count +1
        
print("Total number of Agec 619 students are: ",count)
print ("Students who take the class are", student_names)
# Or, more simply, count the collected names
print("Total number of Agec 619 students are: ",len(student_names))
('Total number of Agec 619 students are: ', 2)
('Students who take the class are', ['Muhammad', 'Kayli'])
('Total number of Agec 619 students are: ', 2)
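The counting loop can also be written as a list comprehension with a filter. A sketch that keeps only the fields needed from the student dictionaries above:

```python
students = [
    {"first_name": "Muhammad", "courses": ["Agec 619", "Agec 621", "Agec 622"]},
    {"first_name": "Kayli", "courses": ["Agec 619", "Stat 610", "Stat 630"]},
    {"first_name": "Jacqueline", "courses": ["Agec 621", "Math 641", "Stat 611"]},
]

# Collect the first name of every student whose course list contains Agec 619
agec_619 = [s["first_name"] for s in students if "Agec 619" in s["courses"]]
print(len(agec_619), agec_619)   # 2 ['Muhammad', 'Kayli']
```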

While Loops

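  • The "for loop" takes a collection of items and executes a block of code once for each item in the collection. In contrast, a while loop runs as long as a certain condition is true. In the cell below, continue skips the rest of the loop body whenever current_number is even, so only odd numbers are printed.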
In [92]:
current_number = 0
while current_number <10:
    current_number +=1
    if current_number % 2 ==0:
        continue
    print (current_number)
1
3
5
7
9

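  • Important: be careful with indentation. Different indentation can give an error or, worse, a wrong result. In the first cell below, print() sits outside the loop, so it runs only once after the loop ends; in the second, continue is not indented under the if statement, which raises an IndentationError.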

In [93]:
current_number = 0
while current_number <10:
    current_number +=1
    if current_number % 2 ==0:
        continue
print (current_number)
10
In [95]:
current_number = 0
while current_number <10:
    current_number +=1
    if current_number % 2 ==0:
    continue
    print (current_number)
  File "<ipython-input-95-bf152073e989>", line 5
    continue
           ^
IndentationError: expected an indented block
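Where continue skips to the next iteration, break leaves the loop entirely. A common pattern is while True with a break inside; this sketch doubles a number until it first exceeds 100:

```python
value = 1
doublings = 0
while True:                # no condition: only break can end this loop
    value *= 2
    doublings += 1
    if value > 100:
        break              # exit as soon as value passes 100
print(value, doublings)    # 128 7
```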

  • The Newton-Raphson algorithm is one of the most commonly used approximation algorithms. It finds real roots of a function; here the square root of k is found as the positive root of x**2 - k.

In [96]:
# Find guess such that guess**2 - 24 is within epsilon of 0
epsilon = 0.01
k =24.0
guess = k/2.0
while abs(guess*guess -k) >= epsilon:
    guess = guess - (((guess**2) - k)/(2*guess))
print ("Square root of",k,"is about",guess)
('Square root of', 24.0, 'is about', 4.8989887432139305)
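In general, Newton-Raphson repeats the update

$$x_{n+1} = x_n - \frac{f(x_n)}{f'(x_n)}$$

For f(x) = x^2 - k the derivative is 2x, which gives the update line in the cell above. A sketch of a general version; newton is a hypothetical helper that takes the function and its derivative, and max_iter is an added safeguard against non-convergence:

```python
def newton(f, df, guess, epsilon=0.01, max_iter=100):
    """Newton-Raphson sketch: return guess with abs(f(guess)) < epsilon.

    f is the function, df its derivative, guess the starting point."""
    for _ in range(max_iter):
        if abs(f(guess)) < epsilon:
            return guess
        guess = guess - f(guess) / df(guess)   # x_{n+1} = x_n - f(x_n)/f'(x_n)
    raise RuntimeError("Newton-Raphson did not converge")

k = 24.0
root = newton(lambda x: x**2 - k, lambda x: 2 * x, k / 2)
print("Square root of", k, "is about", root)

cube = newton(lambda x: x**3 - 27, lambda x: 3 * x**2, 5.0)
print("Cube root of 27 is about", cube)
```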

Functions

  • Functions are named blocks of code that are designed to do one specific job.
  • In Python each function definition has the form:
    • def name_of_function(list of formal parameters):
      • body of function (indented)
In [97]:
# Named max_of_two so it does not shadow Python's built-in max()
def max_of_two(x,y):
    """Return the larger of two numbers.
    The return statement sends a value from inside a function back to the line that called the function."""
    if x>y:
        return x
    else:
        return y
In [98]:
max_of_two(3,4)
Out[98]:
4
In [99]:
def new_person(first_name,last_name, age=""):
    """Return a dictionary of information about a person.
        Note: the age value is optional."""
    person = {'first':first_name,'last':last_name}
    if age:
        person['age']=age
    return person
In [100]:
new_person("Aram","Dallakyan")
Out[100]:
{'first': 'Aram', 'last': 'Dallakyan'}
In [101]:
new_person("Aram","Dallakyan","29")
Out[101]:
{'age': '29', 'first': 'Aram', 'last': 'Dallakyan'}
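Arguments can also be passed by name (keyword arguments), in which case their order does not matter. A sketch that repeats new_person so it runs on its own; the dictionary order in the comment assumes Python 3.7 or later:

```python
def new_person(first_name, last_name, age=""):
    """Return a dictionary of information about a person; age is optional."""
    person = {'first': first_name, 'last': last_name}
    if age:
        person['age'] = age
    return person

# Keyword arguments are matched by name, not by position
person = new_person(last_name="Dallakyan", first_name="Aram", age="29")
print(person)   # {'first': 'Aram', 'last': 'Dallakyan', 'age': '29'}
```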
In [102]:
def findRoot(x, power, epsilon):
    '''Assumes x and epsilon are floats, power is an int, power > 0 and epsilon > 0.
    Returns a float y such that y**power is within epsilon of x.
    If such a float does not exist, it returns None.'''
    assert type(power) == int
    assert type(x) == float
    assert type(epsilon) == float
    assert power > 0 and epsilon > 0

    # An even root of a negative number has no real solution
    if power % 2 == 0 and x < 0:
        return None
    # Bisection search: the answer always lies between low and high
    low = min(-1.0, x)
    high = max(1.0, x)
    ans = (high + low)/2.0
    while abs(ans**power - x) >= epsilon:
        # print('ans =', ans, 'low =', low, 'high =', high)
        if ans**power < x:
            low = ans
        else:
            high = ans
        ans = (high + low)/2.0
    return ans
In [103]:
findRoot(4.0,2,0.01)
Out[103]:
1.998046875

Handling Exceptions

In [104]:
def readInt():
    # Keep asking until the typed text converts to an int
    while True:
        val = input("Enter an Integer: ")
        try:
            val = int(val)
            return val
        except ValueError:
            print (val, "is not an integer")
In [106]:
readInt()
Enter an Integer: 14.5
Out[106]:
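14

  • Note: this output comes from Python 2, where input() evaluates what you type, so 14.5 arrives as a float and int() silently truncates it to 14. In Python 3, input() always returns a string, int("14.5") raises ValueError, and readInt() prints an error message and asks again.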
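Because readInt() waits for keyboard input, the try/except pattern is easier to see on a fixed list of strings. A sketch with a hypothetical helper, parse_ints, that converts what it can and reports the rest:

```python
def parse_ints(texts):
    """Hypothetical helper: convert each string to int, skipping invalid ones."""
    values = []
    for text in texts:
        try:
            values.append(int(text))       # int("14.5") raises ValueError in Python 3
        except ValueError:
            print(text, "is not an integer")
    return values

print(parse_ints(["14", "14.5", "seven", "-3"]))   # [14, -3] after two error messages
```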