# Introductory Python for Humanists + Text Analysis

[Last Updated: May 4, 2020]

This tutorial will introduce you to the Python programming language and how it can be used for basic text analysis.
No programming experience is necessary.

### 1) Start a new Trinket project

1.1) You can either use the embedded trinket on the right side of this page or work in a new tab at trinket.io/python3.

1.2) You will write your code under "main.py" in the editing tab, and your result will show up under the play tab.

### 2) Test out the trinket

2.1) Write the following in the trinket editing tab under "main.py":
```
print("hello world")
```
2.2) Run your code by pressing the play button at the top. In the result box, you should get hello world.

2.3) Go back to the editing tab and delete your code.

2.4) Try doing some math:
```
print(2+2)
```

### 3) Create variables

3.1) Create a string variable:
```
greeting = "hello"
print(greeting)
```
3.2) Create a float variable:
```
myNumber = 3.14159
print(myNumber)
print(myNumber + 10)
```
3.3) Create an array (in Python, this type is called a list):
```
myArray = [1,2,3,4,5]
print(myArray)
```

### 4) Write an If block

4.1) Test if two numbers are equal:
```
myFirstNum = 5
mySecondNum = 10
if (myFirstNum == mySecondNum):
    print("My numbers are equal.")
else:
    print("My numbers are not equal.")
```
4.2) Try changing the values of myFirstNum and mySecondNum to make sure the test works.

### 5) Write a For loop

```
myArray = [2,4,5,9,14]
for i in myArray:
    print(i+1)
```

### 6) Now let's start doing things with strings

6.1) Define two string variables:
```
greeting = "Hello, "
animal = "puppies"
```
6.2) Concatenate the strings:
```
greeting = "Hello, "
animal = "puppies"
mySentence = greeting + animal + "!"
print(mySentence)
```
The result should be:
Hello, puppies!

6.3) Count the characters in a string:
```
greeting = "Hello, "
animal = "puppies"
mySentence = greeting + animal + "!"
print(mySentence)
print(len(mySentence))
```
The result should be:
Hello, puppies!
15

6.4) Concatenate a string and an integer:
```
greeting = "Hello, "
animal = "puppies"
mySentence = greeting + animal + "!"
charCount = len(mySentence)

print(mySentence)
print(str(charCount) + " characters")
```
str() turns the integer, charCount, into a string so it can be concatenated with the string " characters".

The result should be:
Hello, puppies!
15 characters
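
As an aside, Python 3.6 and later also offer f-strings, which place a value inside a string without calling str() yourself. The sketch below is only an illustration, assuming your Python version supports f-strings; the variable names withStr and withFString are made up for this example.

```python
mySentence = "Hello, " + "puppies" + "!"
charCount = len(mySentence)

# Approach from step 6.4: convert the integer with str(), then concatenate.
withStr = str(charCount) + " characters"

# f-string alternative: the value inside {} is converted to text automatically.
withFString = f"{charCount} characters"

print(withStr)
print(withFString)
```

Both print statements should give the same text, so you can use whichever style you find easier to read.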

6.5) Count how many times the letter P occurs in the sentence:
```
greeting = "Hello, "
animal = "puppies"
mySentence = greeting + animal + "!"
charCount = len(mySentence)
p_count = mySentence.count('p')

print(mySentence)
print(str(charCount) + " characters")
print(str(p_count) + " Ps")
```
Note: count() is case-sensitive, so count('p') will only count lower-case Ps.

The result should be:
Hello, puppies!
15 characters
3 Ps
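
If you want to count both upper-case and lower-case Ps, one common approach is to convert the text to lowercase with lower() before counting. Here is a minimal sketch; the sentence and the variable names are made up for this example.

```python
# A made-up sentence that contains both "P" and "p".
sampleSentence = "Peter picked puppies"

# count() is case-sensitive, so this misses the capital P in "Peter".
caseSensitiveCount = sampleSentence.count('p')

# lower() returns a lowercase copy of the string, so every P is counted.
caseInsensitiveCount = sampleSentence.lower().count('p')

print(str(caseSensitiveCount) + " lower-case Ps")
print(str(caseInsensitiveCount) + " Ps in total")
```

Note that lower() does not change sampleSentence itself. It only returns a new lowercase string.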

### 7) Do things with an array of strings

7.1) Delete all of your code.

7.2) Make an array of strings:
```
names = ['ben','alice','evan','doug','cat']
print(names)
```
7.3) Add a name to the array:
```
names = ['ben','alice','evan','doug','cat']
names.append('frank')
print(names)
```
7.4) Remove a name from the array:
```
names = ['ben','alice','evan','doug','cat']
names.append('frank')
names.remove('evan')
print(names)
```
7.5) Sort the array alphabetically:
```
names = ['ben','alice','evan','doug','cat']
names.append('frank')
names.remove('evan')
names.sort()
print(names)
```
The result should be:
['alice', 'ben', 'cat', 'doug', 'frank']

7.6) Loop over the elements in the array:
```
names = ['ben','alice','evan','doug','cat']
names.append('frank')
names.remove('evan')
names.sort()

for i in names:
    print(i+" is awesome!")
```
The result should be:
alice is awesome!
ben is awesome!
cat is awesome!
doug is awesome!
frank is awesome!

7.7) Add an If block to the loop:
```
names = ['ben','alice','evan','doug','cat']
names.append('frank')
names.remove('evan')
names.sort()

for i in names:
    if (i == "doug"):
        print(i+" is awesome!")
    else:
        print(i+" is okay.")
```
The result should be:
alice is okay.
ben is okay.
cat is okay.
doug is awesome!
frank is okay.

### 8) Measure word frequencies

8.1) Delete all of your code.

8.2) Import Counter, a tool from Python's built-in collections module for counting things:
```
from collections import Counter
```
8.3) Make a variable containing the entire text of the following article:

```
from collections import Counter

```
8.4) Count the number of times "Google" occurs in the article:
```
from collections import Counter

print(article1.count("Google"))
```
The result should be: 9

8.5) Split the article into an array of individual words and get a word count:
```
from collections import Counter

article1_array = article1.split()
print(len(article1_array))
```
The result should be: 661

8.6) Find the 10 most common words in the article:
```
from collections import Counter

article1_array = article1.split()

count = Counter(article1_array)
article1_frequentWords = count.most_common(10)
print(article1_frequentWords)
```
The result should be:
[('the', 36), ('a', 24), ('to', 15), ('in', 14), ('and', 14), ('of', 12), ('on', 9), ('that', 8), ('for', 8), ('is', 7)]
This is an array of the ten most common words and the number of times they occur in the article.
Unfortunately, they're small words that aren't very useful.

8.7) Remove words smaller than 6 characters:
```
from collections import Counter

article1_array = article1.split()

# Loop over the array in reverse so that removing items does not skip any words
for i in reversed(article1_array):
    if (len(i) < 6):
        article1_array.remove(i)

count = Counter(article1_array)
article1_frequentWords = count.most_common(10)
print(article1_frequentWords)
```
The result should be:
[('Google', 6), ('camera', 5), ('looking', 4), ('search', 3), ('information', 3), ('through', 3), ('worked', 2), ("Young's", 2), ('iPhone', 2), ('within', 2)]
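
Removing items from an array while looping over it can be hard to follow. A common alternative is to build a new array that contains only the words you want to keep. The sketch below shows this idea with a list comprehension; the short sample sentence stands in for the article, and the names sampleText, words, and longWords are made up for this example.

```python
from collections import Counter

# A short made-up sentence stands in for the article text.
sampleText = "Google showed a camera that Google says can search through a camera feed"
words = sampleText.split()

# Build a new array with only the words that have 6 or more characters.
longWords = [w for w in words if len(w) >= 6]

count = Counter(longWords)
print(longWords)
print(count.most_common(3))
```

The original array, words, is left unchanged, so you can still compare the word counts before and after filtering.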

8.8) Keep small words that are important, like "AR":
```
from collections import Counter

article1_array = article1.split()

for i in reversed(article1_array):
    # Remove a word only if it is short and it is not "AR"
    if ((len(i) < 6) and (i != "AR")):
        article1_array.remove(i)

count = Counter(article1_array)
article1_frequentWords = count.most_common(10)
print(article1_frequentWords)
```
The result should be:
[('Google', 6), ('AR', 6), ('camera', 5), ('looking', 4), ('search', 3), ('information', 3), ('through', 3), ('worked', 2), ("Young's", 2), ('iPhone', 2)]

Note that when we originally counted "Google" using article1.count("Google"), we found 9, but most_common() now reports only 6. That's because split() breaks the text apart at whitespace and leaves punctuation attached to each word, so Counter treats "Google", "Google." and "Google," as three different words and only counts the exact word "Google". As part of cleaning your data, you will probably want to remove unnecessary characters, like punctuation. You may also want to convert all text to lowercase using lower().
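
The cleaning steps mentioned above could look something like the following. This is only a sketch of one possible approach: it uses a short made-up sentence instead of the article, and the names sampleText and cleanedWords are made up for this example. It relies on string.punctuation, a string of common punctuation characters from Python's built-in string module.

```python
import string
from collections import Counter

# A made-up sentence with punctuation and mixed capitalization.
sampleText = "Google says Google, like Apple, is betting on AR. Is google right?"

cleanedWords = []
for word in sampleText.split():
    # strip() removes punctuation from the start and end of the word only,
    # and lower() converts the word to lowercase.
    word = word.strip(string.punctuation).lower()
    if (word != ""):
        cleanedWords.append(word)

rawCount = Counter(sampleText.split())
cleanCount = Counter(cleanedWords)

print(rawCount["Google"])
print(cleanCount["google"])
```

Keep in mind that converting everything to lowercase also turns "AR" into "ar", so any later test for a specific word needs to use the lowercase form.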

### 9) Add a second article for comparison

9.1) Add the following article as a variable:
```
from collections import Counter

article1_array = article1.split()

for i in reversed(article1_array):
    if ((len(i) < 6) and (i != "AR")):
        article1_array.remove(i)

count = Counter(article1_array)
article1_frequentWords = count.most_common(10)
print(article1_frequentWords)
```
9.2) For each of the 10 most common words in article1, see how many times it occurs in article2:
```
from collections import Counter

article1_array = article1.split()

for i in reversed(article1_array):
    if ((len(i) < 6) and (i != "AR")):
        article1_array.remove(i)

count = Counter(article1_array)
article1_frequentWords = count.most_common(10)
print(article1_frequentWords)

# Each item in article1_frequentWords is a (word, frequency) pair
for word, freq in article1_frequentWords:
    print(word + ": " + str(article2.count(word)))
```
The result should be:
[('Google', 6), ('AR', 6), ('camera', 5), ('looking', 4), ('search', 3), ('information', 3), ('through', 3), ('worked', 2), ("Young's", 2), ('iPhone', 2)]
AR: 7
camera: 2
looking: 1
search: 5
information: 0
through: 0
worked: 0
Young's: 0
iPhone: 0
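
One thing to keep in mind when reading these numbers: count() on a string counts every place the characters appear, including inside longer words, while count() on an array made with split() only counts exact matches. The sketch below illustrates the difference with a made-up sentence; the variable names are made up for this example.

```python
# A made-up sentence in which "search" also appears inside "researchers".
sampleText = "The researchers search the archive, and the search is slow."

# Counting in the string finds "search" inside "researchers" as well.
substringCount = sampleText.count("search")

# Counting in the array only finds items that are exactly "search".
wholeWordCount = sampleText.split().count("search")

print(substringCount)
print(wholeWordCount)
```

Which count is more appropriate depends on your research question, so it is worth checking a few results by hand.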

### 10) Make a chart

10.1) Import NumPy and pyplot (part of Matplotlib), Python libraries for math and plotting:
```
from collections import Counter
import numpy as np
from matplotlib import pyplot as plt

article1_array = article1.split()

for i in reversed(article1_array):
    if ((len(i) < 6) and (i != "AR")):
        article1_array.remove(i)

count = Counter(article1_array)
article1_frequentWords = count.most_common(10)
print(article1_frequentWords)

for word, freq in article1_frequentWords:
    print(word + ": " + str(article2.count(word)))
```
10.2) Delete your two print statements, make blank arrays for x and y data, and fill the arrays with words and frequencies from Article 1:
```
from collections import Counter
import numpy as np
from matplotlib import pyplot as plt

article1_array = article1.split()
data_x = []
data_y = []

for i in reversed(article1_array):
    if ((len(i) < 6) and (i != "AR")):
        article1_array.remove(i)

count = Counter(article1_array)
article1_frequentWords = count.most_common(10)

# Each item is a (word, frequency) pair: words go on x, frequencies on y
for word, freq in article1_frequentWords:
    data_x.append(word)
    data_y.append(freq)
```
10.3) Chart your words and frequencies for Article 1:
```
from collections import Counter
import numpy as np
from matplotlib import pyplot as plt

article1_array = article1.split()
data_x = []
data_y = []

for i in reversed(article1_array):
    if ((len(i) < 6) and (i != "AR")):
        article1_array.remove(i)

count = Counter(article1_array)
article1_frequentWords = count.most_common(10)

for word, freq in article1_frequentWords:
    data_x.append(word)
    data_y.append(freq)

# One bar position per word, labeled with the word itself
x_pos = np.arange(len(data_x))
plt.bar(x_pos,data_y)
plt.xticks(x_pos, data_x, rotation='vertical')
plt.tight_layout()
plt.show()
```
The result should be a bar chart of the ten words and their frequencies in Article 1.

10.4) Add frequency data for the second article and a legend:
```
from collections import Counter
import numpy as np
from matplotlib import pyplot as plt

article1_array = article1.split()
data_x = []
data_y = []
data_y2 = []

for i in reversed(article1_array):
    if ((len(i) < 6) and (i != "AR")):
        article1_array.remove(i)

count = Counter(article1_array)
article1_frequentWords = count.most_common(10)

for word, freq in article1_frequentWords:
    data_x.append(word)
    data_y.append(freq)
    data_y2.append(article2.count(word))

# Shift the two sets of bars left and right so they sit side by side
x_pos = np.arange(len(data_x))
plt.bar(x_pos-.2, data_y, width=.4)
plt.bar(x_pos+.2, data_y2, width=.4)
plt.xticks(x_pos, data_x, rotation='vertical')
plt.tight_layout()
plt.legend(labels=['article1','article2'])
plt.show()
```
The result should be a grouped bar chart that compares the frequency of each word in the two articles.