Learning a programming language like Python to the point of becoming an “expert” is a lifelong process. You’ve probably been working with Python for a while now, and you’ve noticed that the code you come up with feels… entry-level. Maybe it’s the way you’re debugging, or how you structure your functions.

After years of reviewing PRs and mentoring adults looking to transition into data science or software engineering, I’ve noticed a few anti-patterns that immediately signal someone’s experience level. I’ve done everything I’m about to mention myself, back when I was still getting the hang of Python 😅. The good news is that these tips aren’t about memorizing complex algorithms or design patterns. They’re simple, practical, everyday habits that make your code more professional and easier to maintain.

1. Print Statements Everywhere → Use Logger Instead

I can spot an entry-level developer from a mile away when I see print statements scattered throughout their code. We’ve all been there, quickly throwing in a print("HERE") to debug something. But print output goes to the console and is gone once the session ends unless you captured it somewhere, and it carries no timestamp or severity, which means when something breaks in production (and it will), you’ll have a hard time following that code’s trail.

What a Junior Dev Would Do

print("Starting data processing...")
print(f"Processed {count} records")
print("ERROR: Something went wrong!")

Logging is a game-changer. You get severity levels (INFO, WARNING, ERROR), you can write to files, and you can control what gets logged without touching your code. When your pipeline crashes at 3 AM, you’ll actually have the information you need to debug it. And honestly, using logging looks more professional.

How to Level-Up

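import logging

# Configure once at startup; without this, INFO messages are dropped by default
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

logger.info("Starting data processing...")
logger.info(f"Processed {count} records")
# Call this from inside an except block so exc_info=True has a traceback to attach
logger.error("Error processing data", exc_info=True)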
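Here’s a rough sketch of the “write to files” and “control what gets logged without touching your code” parts. It assumes an environment variable called LOG_LEVEL (a name I made up for this example) and a helper named build_logger, which is also hypothetical; treat it as one possible setup, not the only way to wire this up.

```python
import logging
import os


def build_logger(name: str, log_path: str) -> logging.Logger:
    """Return a logger that writes to log_path at the level named in LOG_LEVEL."""
    # LOG_LEVEL is an assumed variable name; fall back to INFO if unset or unknown
    level = logging.getLevelName(os.getenv("LOG_LEVEL", "INFO").upper())
    if not isinstance(level, int):
        level = logging.INFO

    logger = logging.getLogger(name)
    logger.setLevel(level)

    # Attach the file handler only once so repeated calls don't duplicate lines
    if not logger.handlers:
        handler = logging.FileHandler(log_path)
        handler.setFormatter(
            logging.Formatter("%(asctime)s %(levelname)s %(name)s: %(message)s")
        )
        logger.addHandler(handler)
    return logger
```

With this in place, running the same script with LOG_LEVEL=WARNING quiets the INFO lines, and LOG_LEVEL=DEBUG turns everything on, all without editing the source.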

ML Pipeline Pro Tip: Logging becomes critical when you’re running ML pipelines in production. Proper logging feeds directly into your observability stack. We use Loki for log aggregation and Grafana for visualizing metrics, and it’s been amazing for monitoring our model training and inference pipelines.

Here’s how it works: Loki ingests your structured logs, and then you can query and visualize them in Grafana alongside your metrics from Prometheus. So when your model’s accuracy suddenly drops or your data drift detection fires, you can:

  • Search logs by severity, timestamp, or custom fields
  • Correlate log events with performance metrics
  • Set up alerts based on specific log patterns (like error rate spikes)
  • Trace issues across your entire distributed pipeline

Structured Logging Example

import logging
import json_log_formatter

formatter = json_log_formatter.JSONFormatter()
handler = logging.StreamHandler()
handler.setFormatter(formatter)

logger = logging.getLogger(__name__)
logger.addHandler(handler)
logger.setLevel(logging.INFO)

logger.info("Model training started", extra={
    'model_version': 'v2.1',
    'dataset_size': 10000,
    'batch_size': 32
})

Now Loki can parse these fields and you can query something like “show me all training runs where dataset_size > 5000” or “alert me when error logs mention ‘OutOfMemory’.”
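If you’re curious what a JSON formatter is doing under the hood, here’s a minimal standard-library sketch. To be clear, this is not how json_log_formatter is implemented; JsonFormatter and the output field names (time, level, logger, message) are my own choices for illustration.

```python
import json
import logging

# Attributes every LogRecord carries; anything else arrived through `extra`
_STANDARD_ATTRS = set(vars(logging.LogRecord("", 0, "", 0, "", (), None))) | {
    "message",
    "asctime",
}


class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON line."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "time": self.formatTime(record),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        # Copy over the custom fields passed via extra={...}
        for key, value in vars(record).items():
            if key not in _STANDARD_ATTRS and key not in payload:
                payload[key] = value
        # default=str keeps non-JSON types (dates, paths) from raising
        return json.dumps(payload, default=str)
```

Each call like logger.info("Model training started", extra={...}) then comes out as one JSON object per line, with the extra keys sitting next to the message, which is the shape a log aggregator can filter on.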

For metrics like training duration, accuracy scores, or resource usage, you’d instrument those separately with something like prometheus_client to send to Prometheus. But having detailed, structured logs in Loki means when something goes wrong, you have the full context. Not just a metric going down, but WHY it went down.

I’ve literally saved myself hours of debugging time this way. Print statements could never do any of that 🫰🏾.

2. Mystery Function Parameters → Use Type Hints Instead

I remember code reviews where I’d spend half the time just trying to figure out what types functions expected. Was that parameter a list of strings or a single string? Did the function return a dataframe or a dictionary? If there wasn’t proper documentation, the guessing game was mentally taxing.

What a Junior Dev Would Do

def calculate_metrics(data, threshold):
    results = []
    for item in data:
        if item > threshold:
            results.append(item * 2)
    return results

Without type hints, anyone reading your function (including future you) has to play detective. Is data a list? A numpy array? A dataframe? Does it return the same type it receives?

You’re forcing people to read through the entire function body just to understand the interface.

Type hints make your code self-documenting. Your IDE gets smarter. Autocomplete works properly, refactoring becomes safer, and you catch type-related bugs before they hit runtime. Tools like mypy can validate your types automatically in CI/CD, catching mismatches before code review.

How to Level-Up

from typing import List

def calculate_metrics(data: List[float], threshold: float) -> List[float]:
    results: List[float] = []
    for item in data:
        if item > threshold:
            results.append(item * 2)
    return results
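One thing worth seeing for yourself: Python doesn’t enforce type hints at runtime, which is exactly why a checker like mypy earns its spot in CI. The sketch below redefines calculate_metrics so it runs on its own and adds a hypothetical run_bad_call helper that passes strings by mistake. The call only blows up once the comparison executes; a static checker should flag the argument before the script ever starts.

```python
from typing import List


def calculate_metrics(data: List[float], threshold: float) -> List[float]:
    """Double every value above threshold."""
    return [item * 2 for item in data if item > threshold]


def run_bad_call() -> str:
    """Pass strings on purpose to show that hints are not runtime checks."""
    try:
        # A type checker should reject this argument; Python itself will not
        calculate_metrics(["0.9", "0.4"], 0.5)
    except TypeError as exc:
        # Comparing str with float fails only when the line actually runs
        return f"failed at runtime: {exc}"
    return "no error"
```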

ML Pipeline Pro Tip: Type hints are a lifesaver in ML pipelines where you’re constantly juggling DataFrames, numpy arrays, tensors, and different model types. I can’t tell you how many times I’ve seen bugs caused by passing a pandas Series when the function expected a numpy array, or tensor mix-ups that could’ve been caught early. (I briefly covered type-related bugs in my blog post “Beware of the Mutants”, under Pitfall #1: Reassignment; check it out for more context.)

ML Example

import pandas as pd
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from typing import List, Tuple

def preprocess_features(
    df: pd.DataFrame,
    feature_cols: List[str]
) -> Tuple[np.ndarray, np.ndarray]:
    """
    Preprocess features and split into train/test.

    Returns:
        Tuple of (X_train, X_test) as numpy arrays
    """
    X = df[feature_cols].values  # Now it's clear we return numpy arrays
    # ... preprocessing logic
    # Illustrative split; the 80/20 ratio and seed are placeholders
    X_train, X_test = train_test_split(X, test_size=0.2, random_state=42)
    return X_train, X_test

def train_model(
    X: np.ndarray, 
    y: np.ndarray
) -> RandomForestClassifier:
    """Type hints make it crystal clear what goes in and out."""
    model = RandomForestClassifier(n_estimators=100)
    model.fit(X, y)
    return model

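Notice how it’s immediately obvious that preprocess_features returns numpy arrays, not DataFrames, and that train_model expects numpy arrays as inputs? This heads off a lot of type mix-ups and “ValueError: could not convert string to float” nightmares. Keep in mind that a plain np.ndarray hint says nothing about shape or dtype, so shape mismatches still need a runtime check or a test.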

Bonus: When working with PyTorch or TensorFlow, be specific about tensor types:

import torch
from torch import Tensor

def forward_pass(x: Tensor, model: torch.nn.Module) -> Tensor:
    return model(x)

Your teammates (and future you) will thank you when they don’t have to trace through 10 files to figure out what type something is.

3. Code Catches Fire → Use Proper Exception Handling

Many moons ago, I remember deploying a script that ended up crashing around 11 PM because a file wasn’t where it was supposed to be. My script was looking for different files within an S3 bucket, but I didn’t write any error handling because I ASSUMED the files would always be there and have the correct names. Never assume anything when it comes to software development, because you make an 🫏 out of you and me.

What a Junior Dev Would Do

def load_model(path):
    model = pickle.load(open(path, 'rb'))
    return model

Unhandled exceptions crash your entire script or pipeline with cryptic error messages. Proper exception handling lets you anticipate failure modes, log meaningful context, clean up resources properly (notice the with statement for file handling in the version below), and decide whether to fail fast or attempt recovery.

How to Level-Up

import logging
import pickle
from typing import Any

logger = logging.getLogger(__name__)

def load_model(path: str) -> Any:
    # Only unpickle files you trust: pickle can execute arbitrary code on load
    try:
        with open(path, 'rb') as f:
            model = pickle.load(f)
        logger.info(f"Successfully loaded model from {path}")
        return model
    except FileNotFoundError:
        logger.error(f"Model file not found: {path}")
        raise
    except Exception as e:
        logger.error(f"Failed to load model: {e}", exc_info=True)
        raise

You can catch specific exceptions differently. Maybe a missing file should alert someone immediately while a temporary network hiccup should retry a few times. Your team and manager will thank you when debugging production issues becomes straightforward instead of archaeological work.
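To make the “retry the hiccup, fail fast on the missing file” idea concrete, here’s a small sketch. The retry helper, its parameter names, and the choice of ConnectionError and TimeoutError as the “temporary” errors are all assumptions for illustration; your own list of retryable exceptions will depend on the client library you use.

```python
import logging
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")
logger = logging.getLogger(__name__)


def retry(
    operation: Callable[[], T],
    attempts: int = 3,
    delay_seconds: float = 1.0,
    retry_on: Tuple[Type[BaseException], ...] = (ConnectionError, TimeoutError),
) -> T:
    """Run operation, retrying only on the exception types in retry_on."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")

    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except retry_on as exc:
            if attempt == attempts:
                # Out of retries: log the traceback and let the caller decide
                logger.error("Giving up after %d attempts", attempts, exc_info=True)
                raise
            logger.warning(
                "Attempt %d/%d failed (%s); retrying in %.1fs",
                attempt, attempts, exc, delay_seconds,
            )
            time.sleep(delay_seconds)
    raise AssertionError("unreachable")  # keeps type checkers satisfied
```

Anything not listed in retry_on, such as FileNotFoundError, passes straight through on the first attempt, so a missing file still fails loudly instead of being retried.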

4. Hardcoded Values → Use Configuration Files

Nothing screams “junior developer” louder than hardcoded database credentials or API keys directly in the source code. I’ve seen this cause security incidents, deployment headaches, and countless hours of wasted time.

What a Junior Dev Would Do

def connect_to_database():
    conn = psycopg2.connect(
        host="localhost",
        database="ml_models",
        user="admin",
        password="super_secret_123"
    )
    return conn

Hardcoding values, especially values that should be stored as secrets, is a security nightmare waiting to happen. One accidental push to GitHub and your database credentials are public forever (even if you delete the commit, it’s in the history).

How to Level-Up

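import os
from dataclasses import dataclass, field

import psycopg2

@dataclass
class DBConfig:
    # default_factory reads the environment when the config is created,
    # not once at import time
    host: str = field(default_factory=lambda: os.getenv("DB_HOST", "localhost"))
    # os.environ[...] raises KeyError right away if a required variable is missing
    database: str = field(default_factory=lambda: os.environ["DB_NAME"])
    user: str = field(default_factory=lambda: os.environ["DB_USER"])
    password: str = field(default_factory=lambda: os.environ["DB_PASSWORD"])

def connect_to_database(config: DBConfig):
    conn = psycopg2.connect(
        host=config.host,
        database=config.database,
        user=config.user,
        password=config.password
    )
    return conn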

Beyond security, hardcoding makes your code inflexible. You need different credentials for dev, staging, and production environments. Environment variables let you use the exact same code everywhere while just swapping out the config. You can rotate passwords without redeploying code, adjust batch sizes without diving into source files, and sleep better knowing your secrets aren’t committed to version control.
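The same idea works for non-secret settings like batch size. Below is a minimal sketch; PipelineSettings, APP_ENV, and BATCH_SIZE are names I picked for the example, and the defaults are placeholders rather than recommendations.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class PipelineSettings:
    """Settings that change per environment, read from environment variables."""

    environment: str
    batch_size: int

    @classmethod
    def from_env(cls) -> "PipelineSettings":
        # Environment variables are always strings, so convert numbers explicitly;
        # int() raises ValueError if BATCH_SIZE is set to something non-numeric
        return cls(
            environment=os.getenv("APP_ENV", "dev"),
            batch_size=int(os.getenv("BATCH_SIZE", "32")),
        )
```

Dev, staging, and production then run identical code, and the only thing that differs is what each environment exports before the process starts.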

💡 Pro Tip: Add .env to your .gitignore immediately.
Trust me on this one.

5. Comments Only → Use Docstrings

Comments are helpful, but docstrings are transformative. I learned this the hard way when I was onboarding onto a new project and tried to understand a codebase with zero documentation beyond sparse inline comments. It was one of the worst experiences I’ve had in my career as a software engineer.

What a Junior/Lazy Dev Would Do

# This function trains a model
def train(X, y, params):
    # Create model
    model = XGBClassifier(**params)
    # Fit it
    model.fit(X, y)
    return model

Docstrings create automatic documentation that tools like Sphinx can turn into beautiful documentation sites. They show up in IDE tooltips when you hover over function names, work with Python’s help() function, and serve as a contract for what your function does.

How to Level-Up

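import logging

import numpy as np
from xgboost import XGBClassifier

logger = logging.getLogger(__name__)

def train(X: np.ndarray, y: np.ndarray, params: dict) -> XGBClassifier: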
    """
    Train an XGBoost classifier on provided data.
    
    Args:
        X: Feature matrix of shape (n_samples, n_features)
        y: Target vector of shape (n_samples,)
        params: XGBoost hyperparameters
        
    Returns:
        Trained XGBoost classifier
        
    Raises:
        ValueError: If X and y have mismatched dimensions
    """
    if len(X) != len(y):
        raise ValueError(f"Dimension mismatch: X has {len(X)} samples, y has {len(y)}")
    
    model = XGBClassifier(**params)
    model.fit(X, y)
    logger.info(f"Model trained on {len(X)} samples")
    return model

📝 Here’s the distinction: comments explain why you made certain choices, while docstrings explain what the function does and how to use it. When you’re onboarding a new team member or coming back to your own code six months later, docstrings have your back.
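A quick way to feel that difference: the docstring travels with the function object, while the comment stays behind in the source file. Here’s a tiny sketch using a made-up scale function (not part of any pipeline above) and the standard inspect module.

```python
import inspect
from typing import List


def scale(values: List[float], factor: float) -> List[float]:
    """Multiply every value by a constant factor.

    Args:
        values: Numbers to scale.
        factor: Multiplier applied to each number.

    Returns:
        A new list; the input list is not modified.
    """
    # Build a new list so callers can keep using the original
    return [value * factor for value in values]


# The docstring is stored on the function object; the inline comment is not
summary = inspect.getdoc(scale).splitlines()[0]
```

Calling help(scale) in a REPL shows the same text that inspect.getdoc returns, which is also what IDE tooltips and documentation generators read.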

Your Level-up Game Plan 💪🏾

If you’ve noticed that you’re guilty of committing a few of these anti-patterns, that’s awesome. You’re a self-aware engineer who can take feedback…we need more of those 😌.

Don’t try to fix everything at once; that’s a recipe for getting overwhelmed. Pick ONE habit this week. I’d start with logging or docstrings because you’ll immediately see the benefits when debugging or documenting your code.

Once logging feels natural (give it a week or two), add type hints or adopt some other habit. Small, consistent improvements compound over time.

Remember: Every senior developer started exactly where you are right now. The difference? They committed to leveling up one habit at a time.

The code you write tomorrow will already look more professional than the code you wrote yesterday. That’s the goal. Keep building and happy coding! 🚀