⚠️ Notice: I’m referencing Python3 throughout this blog.

At the beginning of your coding journey, you start with simple projects to solidify your understanding. You’re probably downloading packages and exploring different functionalities at this stage but aren’t concerned with dependencies.

Fortunately, we live in an era where reusable code is everywhere and easy to access. We don’t need to worry about reinventing the wheel. We can focus on our projects or add to and improve existing libraries. But at some point, dependency management will need to be considered.

Managing dependencies is something you can’t get away from (unless you write all the code yourself and never, ever call an external library 🙄); it’s almost as certain as death and taxes. Since this blog is geared towards data practitioners and those who want to get into data, I’ll focus on an essential tool for developing code in Python: the virtual environment.

What is a Virtual Environment?

A virtual environment is a self-contained directory on your computer with its own Python interpreter and its own set of installed packages, so each project can use the package versions it needs without affecting any other project.

Pandas Profiling

Here’s a concrete example. A couple of months ago, I was doing some analysis and wanted to use a Python package I hadn’t used before called Pandas Profiling. I tried to use this package to automate some initial data quality checks.

The version I installed at the time was 3.1.0, and a particular dependency (jinja2 version 3.1.0) gave me grief.

(Image from sundaefundae. If you get the references, here’s a 💜)

It turned out that there’s a module in Pandas Profiling named templates.py that uses jinja2 to create HTML profile reports.

 from pandas_profiling import ProfileReport

So when I tried running the line of code above, I ran into this error: ImportError: cannot import name 'escape' from 'jinja2.utils'. Unfortunately, the escape function was removed from jinja2 when they released version 3.1.0.

There were a couple of ways to resolve the issue, and I decided to downgrade jinja2 to version 3.0.3; this was the latest version that still included escape.
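Deciding whether to downgrade boils down to a version comparison: anything older than 3.1.0 still ships escape. Below is a minimal sketch of that check. The helper names parse_version and jinja2_still_has_escape are hypothetical, and parse_version only understands plain numeric release strings; for real-world version strings, the packaging library’s version parsing is the sturdier choice. The downgrade itself would be a pip install "jinja2==3.0.3" run inside the activated environment.

```python
import re
from importlib.metadata import PackageNotFoundError, version
from typing import Tuple


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn a release string like '3.0.3' into (3, 0, 3), padded to three parts."""
    match = re.match(r"\d+(?:\.\d+)*", text)
    if match is None:
        raise ValueError(f"not a numeric release string: {text!r}")
    parts = tuple(int(p) for p in match.group(0).split("."))
    return parts + (0,) * (3 - len(parts))


def jinja2_still_has_escape(jinja2_version: str) -> bool:
    # escape was removed from jinja2.utils in 3.1.0, so older releases still have it.
    return parse_version(jinja2_version) < (3, 1, 0)


# Check whichever jinja2 (if any) the current interpreter can see.
try:
    installed = version("jinja2")
except PackageNotFoundError:
    installed = None

if installed is None:
    print("jinja2 is not installed here")
else:
    print(f"jinja2 {installed}: escape available -> {jinja2_still_has_escape(installed)}")
```

Because the check only looks at the interpreter that runs it, running it inside two different virtual environments can give two different answers, which is exactly the point.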

Fortunately, I had created a virtual environment before starting my project. Now imagine I hadn’t, and that another project I’d worked on in early 2022 relied on jinja2 version 3.1.0 from the same global installation without giving me any errors. I’d have carried on without a care until I was faced with the pandas_profiling fiasco.

What would I have done in that situation? Downgrading jinja2 could have adverse effects on the earlier project. Still, I needed to downgrade to solve the dependency issue.

This is why virtual environments are essential. They let you keep each project’s Python packages separate and install exactly the versions it needs. Without them, we’d be one step closer to falling into dependency hell, and our computer would be a nightmare of conflicting packages! We’d rather keep the nightmare contained in a smaller folder.

Visualize your libraries and their dependencies as wires. Would you want to deal with a tangled mess of conflicting packages? Where would you even begin to fix the problem?

Virtual environments let us install and import libraries without modifying our computer’s main site-packages directory. So if we mess up the dependencies, at least it’s on a smaller scale 😅.

Reproducibility

Using virtual environments also allows us to have a reproducible workspace. Data practitioners like Data Scientists and ML Engineers will need to be able to share their work/findings with their peers and stakeholders.

Having the mindset of “but it worked on my machine” is 💩. Don’t be that person! You weren’t hired to build models and algorithms that only work on your computer.
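One common way to make a workspace reproducible is to record the exact version of every installed package, which is what pip freeze > requirements.txt does; a teammate can then recreate the environment with pip install -r requirements.txt. As a rough sketch of the idea, the snippet below builds similar name==version pins with the standard library’s importlib.metadata. The helper name frozen_requirements is hypothetical, and unlike pip freeze it doesn’t special-case editable installs or hide tools like pip itself.

```python
from importlib.metadata import distributions
from typing import List


def frozen_requirements() -> List[str]:
    """Return sorted 'name==version' pins for every distribution this interpreter can see."""
    pins = set()
    for dist in distributions():
        meta = dist.metadata
        # Skip distributions with incomplete metadata rather than crashing.
        if "Name" in meta and "Version" in meta:
            pins.add(f"{meta['Name']}=={meta['Version']}")
    return sorted(pins, key=str.lower)


if __name__ == "__main__":
    # Roughly the content `pip freeze > requirements.txt` would write.
    print("\n".join(frozen_requirements()))
```

Run inside a fresh virtual environment, the list stays short and only contains what that project actually installed, which is what makes the pins worth sharing.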

How do Virtual Environments Work?

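When you create a virtual environment, Python makes a new directory (named whatever you choose) in the current working directory. It doesn’t copy your whole Python installation. Instead, the new directory gets a small pyvenv.cfg file that records where your main Python installation lives, a bin sub-directory (Scripts on Windows), and a lib sub-directory (a lot of nested folders). Inside bin, Python creates a symbolic link to the system’s main Python interpreter; on Windows, a python.exe is placed there instead of a link.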
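To see this structure for yourself, here’s a minimal sketch that builds a throwaway environment with the standard library’s venv.create() and inspects what it made. make_env and describe_env are hypothetical helper names; with_pip=False skips installing pip just to keep it fast, and symlinks=(os.name != "nt") mirrors what python3 -m venv does by default on each platform. The home key in pyvenv.cfg is what points back at your main Python installation.

```python
import os
import tempfile
import venv
from pathlib import Path


def make_env(parent: Path, name: str = "env") -> Path:
    """Create a bare virtual environment (no pip, to keep it fast) and return its path."""
    env_dir = parent / name
    # Mirror the command-line default: symlink the interpreter on POSIX, not on Windows.
    venv.create(env_dir, with_pip=False, symlinks=(os.name != "nt"))
    return env_dir


def describe_env(env_dir: Path) -> dict:
    """Collect the interpreter, site-packages folder, and config of an environment."""
    if os.name == "nt":
        python = env_dir / "Scripts" / "python.exe"
        site_packages = env_dir / "Lib" / "site-packages"
    else:
        python = env_dir / "bin" / "python"
        # lib/ contains one folder per Python version, e.g. lib/python3.8/site-packages.
        site_packages = next((env_dir / "lib").glob("*/site-packages"), None)
    return {
        "python": python,
        "python_is_symlink": python.is_symlink(),
        "python_target": os.path.realpath(python),
        "site_packages": site_packages,
        # The "home" line records where the base interpreter lives.
        "pyvenv_cfg": (env_dir / "pyvenv.cfg").read_text(),
    }


with tempfile.TemporaryDirectory() as tmp:
    info = describe_env(make_env(Path(tmp)))
    for key, value in info.items():
        print(f"{key}: {value}")
```

On macOS or Linux, python_target should resolve to your system interpreter, which is the symbolic link in action.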

In the example below, I named my virtual environment env.

Inside bin is the activate script; running it activates your virtual environment. There may be several activate scripts, and you’ll need to run the one compatible with your shell. For example, I use a bash shell, so activate is what I’ll use. If I were using PowerShell, I’d use Activate.ps1 instead.

The lib folder holds a sub-folder named after your Python version (python3.8 here), and inside that is the site-packages folder; this is where our installed packages and modules go.

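Whenever we use an import statement in our Python scripts, Python searches the folders listed in sys.path. When the environment’s interpreter starts up, it finds pyvenv.cfg one level above its bin folder, sets sys.prefix to the environment’s directory, and adds that environment’s site-packages folder to sys.path.

In this case, env/lib/python3.8/site-packages will be used.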

|____env
| |____bin
| | |____activate
| | |____pyproj
| | |____pyftsubset
| | |____pip3.8
| | |____jupyter-run
| |____lib
| | |____python3.8
| | | |____site-packages

📝 Note: a symbolic link (also called a symlink) is a shortcut that points to another file or directory. It lets the environment reuse your main Python interpreter without copying it, and it saves you from typing absurdly long paths!

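📝 Note: PATH is an environment variable listing the directories your shell searches for executable commands. Activating a virtual environment puts its bin folder first on PATH, so python and pip resolve to the environment’s copies. Modules and packages, on the other hand, are looked up through Python’s own search list, sys.path.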
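Here’s a small sketch for checking which environment your code is running in and where Python looks for imports. The helper names in_virtual_env and install_dir are hypothetical; comparing sys.prefix with sys.base_prefix is the check the venv documentation describes for detecting a virtual environment. Conda environments are separate installations rather than venvs, so this check typically reports False inside them.

```python
import sys
import sysconfig


def in_virtual_env() -> bool:
    # Inside a venv, sys.prefix points at the environment directory,
    # while sys.base_prefix keeps pointing at the base Python installation.
    return sys.prefix != sys.base_prefix


def install_dir() -> str:
    # The folder this interpreter's install scheme uses for pure-Python packages
    # (inside a venv on macOS/Linux: <env>/lib/pythonX.Y/site-packages).
    return sysconfig.get_paths()["purelib"]


print("running inside a virtual environment:", in_virtual_env())
print("sys.prefix:", sys.prefix)
print("packages install to:", install_dir())
# import statements walk sys.path in order; PATH plays no part here.
for entry in sys.path:
    print("  searched:", repr(entry))
```

Running this once with the environment’s python and once with your system python is a quick way to confirm which site-packages each one uses.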

How do We Create Virtual Environments?

There are three methods I can think of off the top of my head.

  • Anaconda
  • venv
  • virtualenv


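venv is the one I use most often, and it’s part of the Python standard library (since Python 3.3), so you don’t need to install it separately; you got it when you first downloaded Python to your computer. (One caveat: some Linux distributions, such as Debian and Ubuntu, ship it as a separate python3-venv package.)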

Assuming you’re in the directory that’ll store your Python project, you’d call: python3 -m venv env to create a virtual environment named env.

To activate env in bash, call: source env/bin/activate. On PowerShell, it’s: .\env\Scripts\Activate.ps1. To deactivate the current session, we call: deactivate in our terminal.
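Which activation command you need depends on your shell. As a quick reference, here’s a sketch of a hypothetical activation_command helper that maps a shell name to the command for an environment called env. The script names (activate, activate.fish, activate.csh, Activate.ps1, activate.bat) are the ones venv generates; the shell labels themselves are just strings I picked.

```python
def activation_command(env_name: str, shell: str) -> str:
    """Return the command that activates a venv named env_name in the given shell.

    The shell labels ("bash", "zsh", ...) are arbitrary strings chosen for this sketch.
    """
    posix_scripts = {
        "bash": "activate",
        "zsh": "activate",
        "fish": "activate.fish",
        "csh": "activate.csh",
    }
    if shell in posix_scripts:
        # macOS/Linux: scripts live in <env>/bin and must be sourced, not executed.
        return f"source {env_name}/bin/{posix_scripts[shell]}"
    if shell == "powershell":
        # Windows PowerShell: run the script from <env>\Scripts.
        return f".\\{env_name}\\Scripts\\Activate.ps1"
    if shell == "cmd":
        # Windows Command Prompt: the batch file can be run directly.
        return f"{env_name}\\Scripts\\activate.bat"
    raise ValueError(f"unsupported shell: {shell!r}")


for shell in ("bash", "fish", "powershell", "cmd"):
    print(f"{shell:>10}: {activation_command('env', shell)}")
# In every case, typing `deactivate` ends the session.
```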


virtualenv is an external (third-party) library. It predates venv, which was modeled on it, and it still offers features venv lacks: for example, faster environment creation and the ability to create environments for any Python version installed on your machine.

You’d call: virtualenv -p python3.9 env to create a virtual environment named env that uses Python 3.9, and source env/bin/activate to activate it (via bash). As with venv, we call deactivate to deactivate the current session.

You can read more about virtualenv here. However, if you’re relatively new to programming, I’d suggest sticking with venv or Anaconda.

For venv and virtualenv, you’d typically use the pip package manager to install packages into the active environment.
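One habit that helps here: run pip through the environment’s own interpreter (python -m pip) rather than a bare pip, so packages are guaranteed to land in that environment’s site-packages. Below is a sketch with hypothetical helpers env_python and pip_install_command that build such a command; the actual install line is left commented out because it needs pip inside the environment and network access.

```python
import os
from pathlib import Path
from typing import List


def env_python(env_dir: Path) -> Path:
    # venv puts the interpreter in bin/ on macOS/Linux and Scripts\ on Windows.
    if os.name == "nt":
        return env_dir / "Scripts" / "python.exe"
    return env_dir / "bin" / "python"


def pip_install_command(env_dir: Path, *requirements: str) -> List[str]:
    # "<env python> -m pip" installs into that interpreter's environment,
    # regardless of which environment (if any) your shell has activated.
    return [str(env_python(env_dir)), "-m", "pip", "install", *requirements]


cmd = pip_install_command(Path("env"), "jinja2==3.0.3")
print(cmd)
# To actually install: import subprocess; subprocess.run(cmd, check=True)
```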


Anaconda is a commercial distribution of Python (there is a free tier for students and hobbyists). You’re more likely to encounter it in a university setting or industry since it is adored by data practitioners and companies worldwide. It uses conda for handling environments and dependencies. You can think of conda as a fusion between pip and venv.

We can call: conda create --name myenv python=3.x to create a virtual environment called myenv using a Python version of our choice. We then call conda activate myenv to activate myenv and conda deactivate to deactivate the current session. You can learn more about Anaconda here.
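When you juggle both kinds of environments, it helps to know which one is active. Activating a venv or virtualenv environment exports a VIRTUAL_ENV variable, and conda activate exports CONDA_PREFIX (plus CONDA_DEFAULT_ENV with the environment’s name). The sketch below, using a hypothetical active_environment helper, reads those variables. Keep in mind they’re only set by activation: running env/bin/python directly won’t set VIRTUAL_ENV, and a conda installation that auto-activates its base environment will always report conda.

```python
import os
from typing import Mapping, Optional, Tuple


def active_environment(environ: Mapping[str, str] = os.environ) -> Optional[Tuple[str, str]]:
    """Return (kind, name_or_path) for the activated environment, or None."""
    # The activate scripts of venv and virtualenv export VIRTUAL_ENV.
    if environ.get("VIRTUAL_ENV"):
        return ("venv/virtualenv", environ["VIRTUAL_ENV"])
    # conda activate exports CONDA_PREFIX, and CONDA_DEFAULT_ENV holds the env name.
    if environ.get("CONDA_PREFIX"):
        return ("conda", environ.get("CONDA_DEFAULT_ENV", environ["CONDA_PREFIX"]))
    return None


print(active_environment())
```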

Pip or Conda?

Generally, I’d recommend sticking to one package manager per environment when installing third-party libraries. But sometimes rules are broken, and that’s okay 😌.

Virtual environments are crucial for having clean, reproducible Python workspaces. Fortunately, the learning curve isn’t steep, and the more you practice using them, the more habitual they’ll become.

If this is your first time working with virtual environments, I recommend sticking with venv. If you want to get into data science/machine learning, start with venv and then work your way to Anaconda.

Happy coding 👍🏾.