Primer: Navigating the File System and Basic Read/Write
Understanding how to navigate your computer's file system is essential, and it's a common source of errors for beginner (ahem... as well as experienced) Python programmers.
Let's walk through some basics.
Nomenclature
Working Directory refers to your current directory or folder.
We'll start by making a variable of our current working directory.
dir = 'C:\Users\phwh9568\Workshops\Python_Data_Camp' # this is mine, but yours will look different
Cell In[44], line 1
    dir = 'C:\Users\phwh9568\Workshops\Python_Data_Camp'
          ^
SyntaxError: (unicode error) 'unicodeescape' codec can't decode bytes in position 2-3: truncated \UXXXXXXXX escape
Oops! What is going on here? Did anyone not get this error?
print('test\test')
test est
Hm.... Turns out \ is an escape character.
What is an escape character?
print('test\test')
print('test\ttest')
print('test\ntest')
print('test\test')
test est
print('Hi, what's your name?')
print("Hi, what's your name?")
print('Hi, what\'s your name?')
print("Hi, what's your name")
Hi, what's your name
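To make the idea concrete, here is a small sketch of how Python counts and displays escape sequences. The variable names are illustrative only and are not part of the workshop files.

```python
# Each escape sequence is a backslash plus a character, and counts as ONE character.
tab_example = 'test\test'          # \t is a tab
newline_example = 'test\ntest'     # \n is a newline
escaped_backslash = 'test\\test'   # \\ is a single literal backslash

print(len(tab_example))        # 8: 'test' + tab + 'est'
print(len(escaped_backslash))  # 9: 'test' + backslash + 'test'
print(repr(tab_example))       # repr() shows the escape instead of acting on it
```

Using repr() is a handy way to check what a string really contains when a path or message prints strangely.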
Microsoft just doesn't want to make things easy on us...
For Windows users, there are a couple of ways to get through this:
dir = 'C:\\Users\\phwh9568\\Workshops\\Python_Data_Camp' # annoying
Now, where were we?
Okay, there are some differences worth noting about how paths work on Windows vs Mac.
\ is a backslash. Windows uses backslashes.
/ is a forward slash. Mac and Linux use forward slashes.
Why, Microsoft? WHY?!?!?!
However, rather than \\, there is an easier way that can also make your file paths more interoperable:
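dir = r'C:/Users/phwh9568/Workshops/Python_Data_Camp'
The r'' syntax is called a raw string: Python keeps every backslash inside it as a literal character instead of treating it as the start of an escape sequence. Forward slashes also work in Windows paths from Python, which is what makes this form more portable.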
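As a quick check of the options above, the following sketch compares the three ways of writing the same Windows-style path. The path itself is a made-up placeholder, not a real folder.

```python
# Three spellings of one hypothetical path (the folder names are placeholders).
escaped = 'C:\\Users\\example\\project'   # doubled backslashes
raw = r'C:\Users\example\project'         # raw string, single backslashes
forward = 'C:/Users/example/project'      # forward slashes, no prefix needed

print(escaped == raw)   # True: both hold the same characters
print(raw)
print(forward)
```

One caveat worth knowing: a raw string cannot end with a single backslash, so r'C:\Users\' is a syntax error.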
Let's now read a file.
For now, use string concatenation to construct the full path to a file:
file = r'C:/Users/phwh9568/Workshops/Python_Data_Camp' + r'/data/demofile.txt'
Watch out for typos...
Now, we'll open that file using the open() function. The first parameter is our file path; the second parameter, 'r', indicates we're opening in read mode:
open(file, 'r')
<_io.TextIOWrapper name='C:/Users/phwh9568/Workshops/Python_Data_Camp/data/demofile.txt' mode='r' encoding='utf-8'>
Okay, so not much happening there... let's assign it to a variable:
data = open(file,'r')
Okay, let's use the .read() method:
data.read()
'10\n20\n30\n40\n50\n60\n70\n80\n90\n100\n'
What's going on here?
Now, let's try one line at a time...
data.readline()
''
Hm.
A file object keeps track of its position in the file. Simple files are often read line by line, but .read() consumed everything in one go and left us at the end, so .readline() had nothing left to return. We need to go back to the top:
data.seek(0) #Why the 0?
0
data.readline()
'10\n'
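The position behaviour can be traced step by step with .tell(), which reports where the file object currently is. This is a minimal sketch that builds its own throwaway file with the tempfile module (an assumption made here so that demofile.txt is left untouched); the file name position_demo.txt is hypothetical.

```python
import os
import tempfile

# Build a small throwaway file so the sketch does not touch demofile.txt.
tmp_dir = tempfile.mkdtemp()
tmp_path = os.path.join(tmp_dir, 'position_demo.txt')
demo = open(tmp_path, 'w')
demo.write('10\n20\n30\n')
demo.close()

demo = open(tmp_path, 'r')
start = demo.tell()             # 0: we begin at the top of the file
everything = demo.read()        # reads all the way to the end
after_read = demo.readline()    # '' because nothing is left to read
demo.seek(0)                    # move back to position 0, the top
first_line = demo.readline()    # '10\n'
second_line = demo.readline()   # '20\n'
demo.close()

print(start, repr(after_read), repr(first_line), repr(second_line))
```

Each call to .readline() moves the position forward by one line, which is why calling it repeatedly returns successive lines.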
Let's close the file. It is good practice to always close your files when you're done manipulating them:
data.close()
Okay, we'll come back to reading/writing files in a few minutes...
Making paths easier: the OS module.
Now that you have a fundamental understanding of paths, let's streamline.
import os
os is an extremely useful module for navigating your file system.
It is useful for creating paths, checking the existence of files or directories, checking your current working directory, generating lists of files, and more.
Rather than typing out paths manually (and risking typos and slash issues), we can have os construct them for us.
Let's start with some variables:
proj_dir = r'C:/Users/phwh9568/Workshops/Python_Data_Camp'
fileName = 'demofile.txt'
We'll construct our file path using os.path.join():
First make a data directory variable:
data_dir = os.path.join(proj_dir,'data')
Then make the path to the file:
f = os.path.join(data_dir,fileName)
Check to see if it exists:
os.path.exists(f)
True
Tip: There are a lot of other useful os functions, including os.mkdir() and os.getcwd().
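Here is a short sketch of a few of these functions working together. It runs inside a temporary folder created with tempfile (an assumption made for this example, so nothing in your project is modified); os.listdir() and os.path.basename() are two further os functions not covered above.

```python
import os
import tempfile

base_dir = tempfile.mkdtemp()               # stand-in for a project folder
new_dir = os.path.join(base_dir, 'data')    # path to a folder that does not exist yet

print(os.getcwd())                          # your current working directory
existed_before = os.path.exists(new_dir)    # False: nothing has been created yet
os.mkdir(new_dir)                           # create the folder
existed_after = os.path.exists(new_dir)     # True
contents = os.listdir(base_dir)             # ['data']

print(existed_before, existed_after, contents)
print(os.path.basename(new_dir))            # data
```

Note that os.mkdir() raises FileExistsError if the folder is already there, so checking with os.path.exists() first is a common pattern.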
Let's check to make sure that all works:
- Print the path to make sure it looks right
- Open the file
- Read the contents of the file
- Close the file
print(f)
C:/Users/phwh9568/Workshops/Python_Data_Camp\data\demofile.txt
data = open(f)
data.read()
'10\n20\n30\n40\n50\n60\n70\n80\n90\n100\n'
data.close()
Let's open it this time in write mode:
data = open(f,'w')
data.read()
---------------------------------------------------------------------------
UnsupportedOperation                      Traceback (most recent call last)
Cell In[30], line 1
----> 1 data.read()

UnsupportedOperation: not readable
Hm....
Okay, not readable....
Let's write a new line:
data.write('test')
4
data.close()
Open demofile.txt separately... what happened here?
Whoops. Opening a file in write mode erases its existing contents, so all that's left is 'test'. Be careful when using write mode! There can be unintended consequences.
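To see the difference between modes without risking real data, here is a sketch that contrasts write mode ('w') with append mode ('a') on a throwaway file. The file name modes_demo.txt and the use of tempfile are assumptions for this example only.

```python
import os
import tempfile

scratch = os.path.join(tempfile.mkdtemp(), 'modes_demo.txt')  # hypothetical scratch file

# Start with two lines in the file.
handle = open(scratch, 'w')
handle.write('10\n20\n')
handle.close()

# Re-opening in 'w' mode empties the file before anything is written.
handle = open(scratch, 'w')
handle.write('test')
handle.close()

handle = open(scratch, 'r')
after_write = handle.read()     # 'test': the original lines are gone
handle.close()

# 'a' (append) mode keeps what is there and adds to the end.
handle = open(scratch, 'a')
handle.write('\nmore')
handle.close()

handle = open(scratch, 'r')
after_append = handle.read()    # 'test\nmore'
handle.close()

print(repr(after_write), repr(after_append))
```

If the goal is to add to an existing file rather than replace it, append mode is usually the safer choice.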
Let's fix it. Start with a list of the values we want:
numList = [10,20,30,40,50,60,70,80,90,100]
Or, alternatively, you could construct this using the range() function. The stop value is excluded, so we go one step past 100:
list(range(10,110,10))
[10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
Now open our file again in write mode:
data = open(f,'w')
Start by constructing a loop of our number list:
for n in numList:
    print(n)
10 20 30 40 50 60 70 80 90 100
Let's try adding the .write() method into our loop:
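for n in numList:
    f.write(n)

f.close()
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
Cell In[37], line 2
      1 for n in numList:
----> 2     f.write(n)
      4 f.close()

AttributeError: 'str' object has no attribute 'write'
Oops... two problems. First, f is our path string, not the file object, which is what this AttributeError is telling us; we need data.write(). Second, .write() only accepts strings, so each number needs to be converted. Okay...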
We can use the str() function to convert to string.
Test it:
type(str(1))
str
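To show why the conversion matters, this sketch tries writing a number to a text stream with and without str(). It uses io.StringIO, an in-memory stand-in for a text file (an assumption made here to avoid touching any files on disk); a file opened in text mode rejects non-string values in the same way, though the wording of the error message differs.

```python
import io

buffer = io.StringIO()      # in-memory text stream, behaves like an open text file

try:
    buffer.write(10)        # an int, not a string
    raised = False
except TypeError as err:
    raised = True
    print('TypeError:', err)

chars_written = buffer.write(str(10))   # .write() returns the number of characters written
print(chars_written)                    # 2
print(buffer.getvalue())                # 10
```

The return value of .write() is the same kind of number that appeared after data.write('test') earlier: a count of characters written.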
Okay, back to our loop:
data = open(f,'w')
for n in numList:
    data.write(str(n))
data.close()
Did that work?
Ehhhhh... close but not quite. What is missing here?
print('test\ntest')
test
test
Try again:
data = open(f,'w')
for n in numList:
    data.write(str(n)+'\n')
data.close()
Or, alternatively, you will often see files opened using a with statement. This closes the file automatically when the statement ends.
with open(os.path.join(data_dir,fileName), 'w') as f:
    for n in numList:
        f.write(str(n)+'\n')
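As a final check, here is a sketch of the full round trip using with statements: write the numbers out, then read them back and convert each line to an integer. It writes to a temporary file named demofile_copy.txt (a hypothetical name, chosen so the real demofile.txt is not overwritten), and the names out_file, in_file, and read_back are illustrative.

```python
import os
import tempfile

numList = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
out_path = os.path.join(tempfile.mkdtemp(), 'demofile_copy.txt')  # hypothetical copy

# Write one number per line; the file is closed when the with block ends.
with open(out_path, 'w') as out_file:
    for n in numList:
        out_file.write(str(n) + '\n')

# Looping over a file object yields one line at a time.
with open(out_path, 'r') as in_file:
    read_back = [int(line) for line in in_file]   # int() ignores the trailing '\n'

print(out_file.closed)        # True: no explicit .close() was needed
print(read_back == numList)   # True
```

Because the with statement closes the file even if an error occurs inside the block, it is generally preferred over calling .close() by hand.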