Zenware Academy


Introduction to Programming

5. Bonus: Python Code Example

In this bonus lesson, we take a closer look at a Python program that counts how many times each word appears in a text file. When you run it, you can, for example, find out how many times the words "human", "brain" and "zenware" appear in the book '1984' by George Orwell.

Before we dive into the code itself, here is a brief description of the overall approach to the solution:

1. Select which text file to analyze.
2. Go through the text one line at a time and:
   - Break the line into words.
   - Add one to the count of each word found.
3. Once the text has been analyzed, let the user look up words as they wish.

Converted to Python code, the program could look something like this:

import re

frequencyOfWords = {}  # maps each lowercase word to the number of times it appears

bookName = input("Enter the book you'd like to search through: ")
file = bookName + ".txt"  # the user types the name without ".txt"
book = open(file, "r", encoding="utf8")

# Count every word, one line at a time
for line in book:
  # Split wherever there is a run of spaces, line breaks or punctuation
  wordList = re.split(r'[;,.!?"\s]+', line)
  for word in wordList:
    if word:  # skip empty strings left over by the split
      lowercaseWord = word.lower()
      timesCountedSoFar = frequencyOfWords.get(lowercaseWord, 0)
      frequencyOfWords[lowercaseWord] = timesCountedSoFar + 1
book.close()  # done reading, so close the file

# Let the user look up words until the program is closed
while True:
  searchedWord = input("Give me a word: ").lower()
  wordOccurrences = frequencyOfWords.get(searchedWord)
  if wordOccurrences:
    print('"{}" occurred {} times.'.format(searchedWord, wordOccurrences))
  else:
    print('The word "{}" was never used in the book: {}.'.format(searchedWord, bookName))
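Because the program reads from a file and then waits for keyboard input, it is hard to try out in small pieces. As a minimal sketch, the counting part can be wrapped in a function that accepts any sequence of lines. The name count_words is our own (hypothetical, not part of the lesson), and the sketch assumes a splitting pattern, r'[;,.!?"\s]+', that breaks a line on any run of whitespace or common punctuation.

```python
import re


def count_words(lines):
    """Count how often each lowercase word appears in an iterable of lines.

    count_words is a hypothetical helper for this sketch, not part of the lesson.
    """
    frequencyOfWords = {}
    for line in lines:
        # Assumed pattern: split on any run of ; , . ! ? " or whitespace
        for word in re.split(r'[;,.!?"\s]+', line):
            if word:  # skip empty strings left over by the split
                lowercaseWord = word.lower()
                frequencyOfWords[lowercaseWord] = frequencyOfWords.get(lowercaseWord, 0) + 1
    return frequencyOfWords


# A list of strings stands in for the lines of a book
print(count_words(["The human brain.\n", "A human; a brain!\n"]))
# {'the': 1, 'human': 2, 'brain': 2, 'a': 2}
```

Since a file opened with open() also yields its lines when you loop over it, you could pass the book object from the program straight to count_words(book).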

Below, the program is described in smaller parts, so you can get a better understanding of what the individual lines of code actually do:

import re
frequencyOfWords = {}

1. First, the 're' module is imported. It is part of Python's standard library and provides regular expressions, which we will use later to split text into words. In addition, an empty dictionary is created; it will map each word to the number of times it appears in the book.
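If dictionaries are new to you, here is a small sketch of the operations the program relies on: storing a value under a key, reading it back, and using .get() to ask for a key that may be missing. The words and counts are made-up examples.

```python
# An empty dictionary, like frequencyOfWords at the start of the program
frequencyOfWords = {}

# Keys are words, values are how many times each word has been seen
frequencyOfWords["human"] = 1
frequencyOfWords["brain"] = 2

print(frequencyOfWords)                    # {'human': 1, 'brain': 2}
print(frequencyOfWords["brain"])           # 2
print(frequencyOfWords.get("zenware"))     # None (missing key, but no error)
print(frequencyOfWords.get("zenware", 0))  # 0 (the default we asked for)
```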

bookName = input("Enter the book you'd like to search through: ")
file = bookName + ".txt"
book = open(file, "r", encoding="utf8")

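2. The user enters the name of the text file they want to analyze, without the ".txt" extension, which the program adds itself. The file is then opened for reading ("r") with UTF-8 encoding, so that it can be read one line at a time.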
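To see what looping over a file actually gives you, here is a sketch that uses io.StringIO as a stand-in for a real text file, so it runs without a file on disk. The book name and the text are example inputs only. Note that each line keeps its line break at the end, which is why the word splitting has to treat line breaks as separators too.

```python
import io

# Example input: if the user typed 1984, the program would open "1984.txt"
bookName = "1984"
file = bookName + ".txt"
print(file)  # 1984.txt

# io.StringIO behaves like an open text file: looping over it
# yields one line at a time, just like the object returned by open()
book = io.StringIO("The first line.\nThe second line.\n")
lines = [line for line in book]
print(lines)  # ['The first line.\n', 'The second line.\n']
```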

for line in book:
  wordList = re.split(r'[;,.!?"\s]+', line)

3. A 'for-loop' goes through the text file one line at a time. For each line, a regular expression from the 're' module splits the line into words, i.e. wherever it finds spaces, line breaks, or punctuation such as semicolons, commas, periods, exclamation marks and double quotes.
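The exact pattern matters more than it might seem. This sketch compares two patterns on one example line: one that only treats punctuation as a separator when a space follows it, and one that splits on any run of punctuation or whitespace. The line and the variable names are just examples.

```python
import re

line = "The human brain is amazing.\n"

# Splits only on "; ", ", ", ". ", '"', "! ", a line break, or a single space
punctuationPlusSpace = r'; |, |\. |"|! |\n| '
# Splits on any run of ; , . ! ? " or whitespace (space, tab, line break, ...)
anyPunctuationRun = r'[;,.!?"\s]+'

print(re.split(punctuationPlusSpace, line))
# ['The', 'human', 'brain', 'is', 'amazing.', '']
print(re.split(anyPunctuationRun, line))
# ['The', 'human', 'brain', 'is', 'amazing', '']
```

With the first pattern, a word at the end of a line keeps its period, so "amazing." would be counted separately from "amazing". Both patterns leave an empty string at the end, which the program skips with "if word:".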

for word in wordList:
  if word:
    lowercaseWord = word.lower()

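4. Another 'for-loop' examines each individual word. The check 'if word:' skips the empty strings that the split can leave behind (for example at the end of a line), because an empty string counts as False. Each remaining word is then converted to lowercase, so that, for example, "human" and "Human" are counted as the same word.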
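The filtering and lowercasing can be tried on their own. In this sketch, the list is a made-up example of what a split might produce.

```python
wordList = ["The", "", "Human", "BRAIN", ""]

lowercaseWords = []
for word in wordList:
    if word:  # an empty string counts as False, so it is skipped
        lowercaseWords.append(word.lower())

print(lowercaseWords)  # ['the', 'human', 'brain']
```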

timesCountedSoFar = frequencyOfWords.get(lowercaseWord, 0)
frequencyOfWords[lowercaseWord] = timesCountedSoFar + 1

5. The dictionary is then used to look up how many times the word has been counted so far. The call .get(lowercaseWord, 0) returns 0 if the word has not been seen before, instead of raising an error. The count is increased by one and stored back in the dictionary.
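The look-up-and-increment pattern is easy to check on a short, made-up list of words. The sketch also shows collections.Counter from Python's standard library, which performs the same counting in one call; it is an alternative, not what the lesson uses.

```python
from collections import Counter

words = ["the", "human", "brain", "the"]

# Manual counting, the same pattern as in the lesson
frequencyOfWords = {}
for word in words:
    timesCountedSoFar = frequencyOfWords.get(word, 0)  # 0 the first time
    frequencyOfWords[word] = timesCountedSoFar + 1

print(frequencyOfWords)  # {'the': 2, 'human': 1, 'brain': 1}

# collections.Counter does the same counting for us
counter = Counter(words)
print(counter["the"])      # 2
print(counter["zenware"])  # 0 (missing words count as zero)
```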

while True:
  searchedWord = input("Give me a word: ").lower()

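6. Once the text file has been analyzed, the user can enter a word to search for. The word is converted to lowercase so that it matches the lowercase keys stored in the dictionary. Because the loop condition is simply True, the program keeps asking for new words until it is closed.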
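If you would rather end the search without closing the program, one option (our own variation, not part of the lesson) is to stop when the user enters an empty line. In this sketch, ask_until_empty is a hypothetical helper; its ask parameter defaults to the built-in input(), so the function can also be tested with prepared answers. It also strips surrounding spaces, which the lesson's version does not do.

```python
def ask_until_empty(ask=input):
    """Yield lowercase words typed by the user until they enter an empty line.

    ask_until_empty is a hypothetical helper; `ask` defaults to input().
    """
    while True:
        searchedWord = ask("Give me a word: ").strip().lower()
        if not searchedWord:  # an empty line ends the search
            return
        yield searchedWord
```

In the program, the "while True:" loop could then become "for searchedWord in ask_until_empty():" followed by the same lookup code.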

wordOccurrences = frequencyOfWords.get(searchedWord)
if wordOccurrences:
  print('"{}" occurred {} times.'.format(searchedWord, wordOccurrences))
else:
  print('The word "{}" was never used in the book: {}.'.format(searchedWord, bookName))

7. If the dictionary contains the word, the number of times it was found is printed on screen. Otherwise, .get() returns None, and the user is told that the word never appeared in the book.
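The message logic can be pulled into a small function so you can try it without typing anything. In this sketch, describe is a hypothetical helper, the counts and book name are made up, and it uses f-strings, which produce the same text as the .format() calls in the lesson.

```python
def describe(searchedWord, frequencyOfWords, bookName):
    """Build the message printed for one searched word.

    describe is a hypothetical helper for this sketch.
    """
    wordOccurrences = frequencyOfWords.get(searchedWord)  # None if missing
    if wordOccurrences:
        return f'"{searchedWord}" occurred {wordOccurrences} times.'
    return f'The word "{searchedWord}" was never used in the book: {bookName}.'


exampleCounts = {"human": 3, "brain": 1}  # made-up counts for illustration
print(describe("human", exampleCounts, "mybook"))
# "human" occurred 3 times.
print(describe("zenware", exampleCounts, "mybook"))
# The word "zenware" was never used in the book: mybook.
```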