You’ve probably come across several different programming languages as a data practitioner or someone new to the data/tech world. One of the first questions people usually ask is, which programming language should I learn first?
I’ll admit that different technologies excel in different areas. For example, Scala is well suited to building big data Spark applications and pipelines, R is fantastic for statistical analysis, and Tableau is excellent for data visualization. I should also mention SQL. Even though it’s a query language rather than a general-purpose programming language, anyone who wants to work with data must know basic SQL!
But besides those technologies, I recommend focusing on a generalist programming language like Python. Here are four reasons why.
1. Python is Everywhere
Python is consistently ranked among the top 5 programming languages year after year by rankings and surveys such as IEEE Spectrum, TIOBE, Statista, and Stack Overflow. It truly is ubiquitous; you can’t get away from it!
There is a good reason why Python is used in so many fields, from the hard sciences to finance and everything in between.
Firstly, Python is an open-source programming language with a large community of actively engaged developers. You will find a plethora of well-written documentation, video tutorials, conferences, talks, and regular updates with bug fixes and feature additions. There are also sub-communities within the Python ecosystem. For example, SciPy, an incredible package for scientific computing and statistical analysis, has a notable following and holds annual conferences. There are also PyData, PyTexas, PyCon, and many more meetings.
In addition to conferences focused solely on Python, there are data-centric summits on Data Science, Machine Learning, Data Engineering, Data Analysis, etc., and the prevalent programming language at those events is Python!
Python is like a common language among different data practitioners. At my current job, I’ve had the pleasure of working with data scientists who’ve never coded in Scala, but we both understood Python. I’ve also worked with engineers who’ve programmed in Swift, a language I know nothing about–except it’s used in iOS development. Still, we both shared an understanding of Python.
2. Python is Easier to Pick Up
Python is a dynamically typed programming language, where everything is an object, and the object’s type is determined at runtime. It’s different from a statically typed language like Java, where you must declare the type when declaring a variable, and the type is verified at compile time.
Python also allows a variable to be reassigned to a value of a different type as the code runs. For example:
value = 6
print(type(value))
This will print out: <class 'int'>
value = 6 * 1.0
print(type(value))
This will print out: <class 'float'>
Python also reads a lot like written English. It’s an intuitive language that new developers can pick up quickly, and there isn’t much overhead when creating a Python script. Compared to a more verbose language like Java, Python is concise: the same program can usually be written in fewer lines of code.
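To give a feel for how little ceremony a script needs, here is a minimal sketch that counts words in a sentence. The sample text and the variable names are made up for illustration; only `collections.Counter` comes from Python itself.

```python
from collections import Counter

# Hypothetical sample text, used only for illustration.
text = "the quick brown fox jumps over the lazy dog the end"

# Split on whitespace and count how often each word appears.
word_counts = Counter(text.split())

# Show the most common word together with its count.
for word, count in word_counts.most_common(1):
    print(word, count)  # prints: the 3
```

There is no class to declare and no main method to write; the script is just the steps themselves.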
Now there are downsides to Python’s flexibility. As data practitioners, we must be mindful of events like overwriting a variable. This can bite us in the butt if we aren’t careful. Imagine storing a Pandas data frame in a variable called df and writing a function that accidentally corrupts the original data frame or sets df to the value 10! This could be disastrous, especially in an automated pipeline where other people downstream use your data frames or analyses. I’ll write a blog post in the future to help mitigate and prevent bugs like the one I mentioned above. But with flexibility come small logic bugs that can lead to horrible results.
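Here is a rough sketch of that kind of bug. To keep it runnable without installing anything, a plain list of dictionaries stands in for the data frame, and both helper functions are hypothetical names invented for this example.

```python
import copy

# A list of row dictionaries stands in for a data frame here (an assumption
# made so the sketch runs without pandas installed).
df = [{"city": "Austin", "sales": 120}, {"city": "Dallas", "sales": 95}]


def double_sales_in_place(rows):
    """Hypothetical helper that silently changes the caller's data."""
    for row in rows:
        row["sales"] = row["sales"] * 2
    return rows


def double_sales_safely(rows):
    """Same calculation, but on a deep copy so the original stays intact."""
    rows = copy.deepcopy(rows)
    for row in rows:
        row["sales"] = row["sales"] * 2
    return rows


# Keep a snapshot so we can check whether df was changed.
original = copy.deepcopy(df)

doubled = double_sales_safely(df)
print(df == original)  # True: the safe version left df untouched

double_sales_in_place(df)
print(df == original)  # False: the in-place version changed df for everyone

# Nothing stops the name from being rebound to a completely different type.
df = 10
print(type(df))  # <class 'int'>
```

Working on a copy is only one possible guard. With an actual Pandas data frame, `DataFrame.copy()` plays a similar role.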
Don’t be discouraged, though. There isn’t a perfect programming language, and at least learning Python is fun.
3. Python is Fun to Learn
I always see videos and blog posts on how learning to code is daunting and time-consuming. Still, if you’re the creative type, I think programming can be sort of therapeutic.
There will always be a learning curve, and getting over that hurdle can be challenging. But I encourage you to take things slowly and enjoy the process of learning. After understanding the basics like variable manipulation, conditional statements, looping, fundamental data structures, defining functions, file input and output, and the big picture idea of object-oriented programming, you can focus on learning by doing.
The list of fundamental topics I mentioned above can’t be learned in a day. You should give yourself 3-6 months to dig into the boring introductory concepts. I promise you won’t regret it, because you can start diving into areas of interest afterward. Whether it be data science, data analysis, data engineering, mathematical simulation, financial engineering, or any other field, you’ll need a good understanding of the core concepts that make up Python and computer science in general.
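To show how those basics fit together, here is a small sketch that touches each of them once. The `Scoreboard` class, the helper functions, the names, and the scores are all invented for illustration.

```python
import os
import tempfile


class Scoreboard:
    """A tiny class: data (scores) bundled with behavior (methods)."""

    def __init__(self):
        self.scores = {}  # a dictionary, one of the fundamental data structures

    def add(self, name, score):
        self.scores[name] = score

    def passed(self, cutoff=70):
        # Loop plus conditional: keep the names at or above the cutoff.
        result = []
        for name, score in self.scores.items():
            if score >= cutoff:
                result.append(name)
        return result


def save_lines(path, lines):
    """File output: write one item per line."""
    with open(path, "w") as f:
        for line in lines:
            f.write(line + "\n")


def load_lines(path):
    """File input: read the lines back without trailing newlines."""
    with open(path) as f:
        return [line.strip() for line in f]


board = Scoreboard()
board.add("Ana", 91)
board.add("Ben", 64)
board.add("Cy", 78)

# Write the passing names to a temporary file, then read them back.
path = os.path.join(tempfile.mkdtemp(), "passed.txt")
save_lines(path, board.passed())
print(load_lines(path))  # ['Ana', 'Cy']
```

Nothing here is advanced, but being able to write something like it from scratch is a good sign that the fundamentals have sunk in.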
Along the way, you’ll be learning a new skill set that will improve your creativity and logical thinking. There’s also the added benefit of potentially increasing your pay and being a part of a community of like-minded and brilliant developers!
The sorts of projects you can create with Python are amazing, and this is due to the community and the vast number of libraries and packages.
4. Python is Well Equipped
I remember in the early 2010s Apple would have these commercials where a narrator would talk; as he spoke, there was a series of “there’s an app for that.” He’d then conclude with, “yup, there’s an app for just about anything.” I think the phrase “there’s an app for that” later became a meme.
But Python is similar. If you have an idea for a project, say a machine learning model that uses Natural Language Processing or a beautiful dashboard that captures KPIs intuitively, there’s an app for that! And by app, I mean library.
I started off the list with how Python is widespread and its community is actively engaged. This feeds into why there’s an abundance of libraries that can be used for wildly different things.
At work, I’ve used the Pandas library for exploratory data analysis, SciPy for statistical modeling, PySpark for creating big data Spark applications, SQLAlchemy to read/write to Postgres tables, Airflow’s Python API for orchestrating tasks, and Boto3 for creating transient AWS EMR clusters that run Scala-Spark jobs!
Those are some of the uses I thought of off the top of my head. Still, my work only utilizes a small subset of what Python can do. Python is powerful!
You can also improve existing libraries or create your own if need be. Python is an open-source programming language that gives you that flexibility.
Python is a popular programming language used by many companies and trusted by many professionals. If you were wondering which language you should pick up first as someone new to data/tech, I hope I’ve influenced your decision to choose Python, or at least presented multiple points to consider as you embark on your coding journey!
Just remember to take things slow and enjoy the learning. Create milestones rather than deadlines, and best of luck!