Getting up and running with Python

2024 Research Data Camp




Phil White

Earth, Environment & Geospatial Librarian

cu-boulder-crdds.github.io/Research-Data-Foundations-Camp-2024/

Contents

  • Python: What is it and why?
  • Distributions, Environments, Package Management
  • Scripts and Notebooks

What is Python?

  • A programming language... or scripting language.
  • Free and open source.
  • Human-readable syntax.
  • Object-oriented...
  • Extremely popular for "data science" applications.

Why Learn Python?

  • Pretty easy to learn! (Never mind me throwing my equipment across the room.)
  • Sought-after skill.
  • Excellent data science libraries, including pandas, NumPy, and many more.
  • Being general purpose makes it extremely versatile.

Why use Python for your research?

  • Well, if you're not coding yet... the benefits are obvious.
  • Efficient! Replicable! Reproducible!
  • Open code = open methods = better science
  • Scalable... run your code on a supercomputer!
  • We're moving toward code/notebooks as scientific communication. Data, methods, product all in one.

Revolutionize the way you work.

Now, some practical bits.

  • Python is a scripting language...
  • ...What is a script?
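
A minimal sketch of what a script might look like, assuming a hypothetical file named hello_script.py (the file name, the function, and the sample readings are all made up for illustration). A script is just a plain text file of Python statements that runs from top to bottom:

```python
# hello_script.py (hypothetical file name)
# A script is a plain text file of Python statements, run from top to bottom.


def mean(values):
    """Return the arithmetic mean of a list of numbers."""
    return sum(values) / len(values)


# Made-up sample data, for illustration only.
readings = [2.0, 4.0, 6.0]
average = mean(readings)
print(f"Average of {len(readings)} readings: {average}")
```

Saved to a file, something like this would typically be run from the command line with `python hello_script.py`.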

  • Great! Now I know what a script is!
  • What about this "Notebook" stuff?
  • A "Notebook" is like an interactive, step-by-step code-execution tool...
  • And also a document and a medium for scientific communication...
  • Like a methods section, a results section (a whole paper?) that sort of documents itself...
  • I should just show you.
  • The most popular Python Notebooking tool is Jupyter.
  • Second option: Google Colab (web based)

Scripts vs Notebooks

Notebooks are great for:

  1. Cell-by-cell execution.
  2. Including context and documentation in Markdown cells.
  3. Viewing outputs and visualizations.
  4. Immediate feedback, which is useful for learning, isolating errors, debugging, etc.
  5. Analysis that doubles as a medium of communication.
  6. Futzing around, hacking stuff.
  7. Teaching!

Scripts vs Notebooks

Use scripts when:

  1. You need to "set it and forget it"... or automate at a large scale.
  2. You're creating your own module (aka library, package)... You start writing more functions, classes, methods.
  3. You need parallel processing.
  4. You eventually realize the utility of modularizing your code...
  5. There are many times when scripts are better, though you may not run into them until you're more advanced.
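
As a rough illustration of points 2 and 4, here is a sketch of a small script organized into reusable functions. The module name temperature_tools.py, the function names, and the sample values are hypothetical, chosen only for this example:

```python
# temperature_tools.py (hypothetical module name)
# Functions defined here can be run as a script or imported from other code.


def celsius_to_fahrenheit(celsius):
    """Convert a temperature from degrees Celsius to degrees Fahrenheit."""
    return celsius * 9 / 5 + 32


def convert_all(celsius_values):
    """Convert every value in a list, returning a new list."""
    return [celsius_to_fahrenheit(c) for c in celsius_values]


def main():
    # Made-up sample values, for illustration only.
    samples = [0, 100]
    for c, f in zip(samples, convert_all(samples)):
        print(f"{c} C = {f} F")


# This block runs only when the file is executed directly,
# not when its functions are imported by another script or notebook.
if __name__ == "__main__":
    main()
```

With this layout, another script or notebook could reuse the functions with something like `from temperature_tools import celsius_to_fahrenheit`, assuming the file is saved where Python can find it.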

More practical bits:

Package managers, dependencies, environments.

Q: How do I install a Python package?


  • A: Use a package manager!
  • Package managers are just commands that help you install packages.
  • pip install scipy
  • pip is (probably) the most commonly used... but DON'T DO IT!

Why not use pip?


Because dependency issues.

Use Conda! (Anaconda)

  • Conda installs packages but also resolves dependency issues.
  • conda install emoji
  • You run these commands in your command line. More on that later.

Sometimes, different Python packages conflict!

Conda environments

  • Environments allow you to have multiple Python "stacks".
  • One environment for general purpose, one for geospatial, one for astronomy, etc.
  • As your "stack" gets more complex, it becomes less general purpose, so it's useful to isolate it.
  • Environments can also be shared with others, so they can replicate your exact Python setup.
  • conda create --name new_env python=3.12
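
Once you have more than one environment, it helps to check which one your code is actually running in. The sketch below uses only the Python standard library; the helper names describe_environment and installed_version are made up for this example, and scipy is just a sample package to look up:

```python
import sys
from importlib import metadata


def describe_environment():
    """Return basic facts about the Python interpreter currently running."""
    return {
        "executable": sys.executable,  # path to the running python binary
        "prefix": sys.prefix,          # root folder of the active environment
        "python_version": ".".join(str(n) for n in sys.version_info[:3]),
    }


def installed_version(package_name):
    """Return the installed version of a package, or None if it is missing."""
    try:
        return metadata.version(package_name)
    except metadata.PackageNotFoundError:
        return None


info = describe_environment()
for key, value in info.items():
    print(f"{key}: {value}")

# Prints None if scipy is not installed in this environment.
print("scipy:", installed_version("scipy"))
```

Running this inside two different environments should show different prefix paths, and possibly different package versions, which is the isolation that environments are meant to provide.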

A few words on errors and debugging...
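
As a small, hedged illustration of what handling an error can look like, the sketch below catches one specific exception and reports it. The function name safe_divide is made up for this example:

```python
def safe_divide(numerator, denominator):
    """Divide two numbers, returning None instead of crashing on division by zero."""
    try:
        return numerator / denominator
    except ZeroDivisionError as error:
        # The exception type and its message are usually the most useful clues.
        print(f"Caught {type(error).__name__}: {error}")
        return None


print(safe_divide(10, 2))
print(safe_divide(10, 0))
```

Without the try/except, the second call would stop the program and print a traceback. When reading a traceback, the last line, which names the error type and message, is usually the best place to start.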


Enough talk. Let's press some buttons.