A Beginner's Guide to NumPy: My 3 Years of Experience in Data Science

When I first started my journey into the world of AI and data science, I heard the term “NumPy” thrown around everywhere. It seemed like every tutorial, every expert, and every project required it. At first, I was a little intimidated. What was this library, and why was it so essential?

Over the years, I’ve come to understand that NumPy is the foundational building block for almost all numerical computing in Python. Libraries like Pandas and Scikit-learn are built on top of it, and frameworks like TensorFlow are designed to work with its arrays. Without NumPy, data science in Python would be incredibly slow and difficult.

In this guide, I want to take you on a journey from knowing absolutely nothing about NumPy to being able to use its core features with confidence. We’ll start with the “why,” move to the “how,” and then work on a small, practical project together.

What Exactly is NumPy?

At its core, NumPy (short for Numerical Python) provides a powerful object called the N-dimensional array, or ndarray. You can think of it as a super-fast, flexible container for numbers.

“But wait,” you might say, “I can already store numbers in a Python list!” And you’re absolutely right. However, standard Python lists have a big problem when it comes to data science: they are slow and inefficient for mathematical operations.

NumPy arrays, on the other hand, are designed from the ground up to be lightning fast for a few key reasons:

Efficiency: They store elements of a single data type, typically in one contiguous block of memory, which allows the CPU to access them very quickly.
Vectorization: Instead of writing for loops to perform an operation on every number in a list, NumPy lets you do it all at once with a single, simple command. The loop still happens, but it runs inside NumPy’s compiled code rather than in the Python interpreter. This is called “vectorization,” and it’s where NumPy gets much of its speed.
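
To make those two points a little more concrete, here is a minimal sketch that inspects how an array is stored and checks that a vectorized expression gives the same result as an explicit loop. The variable names (`values`, `doubled_vectorized`, `doubled_loop`) are just illustrative choices of mine.

```python
import numpy as np

# 100,000 floating-point numbers, all of the same type.
values = np.arange(100_000, dtype=np.float64)

# Every element has the same type and size, stored back to back.
print(values.dtype)                  # float64
print(values.itemsize)               # 8 (bytes per element)
print(values.nbytes)                 # 800000 (bytes in total)
print(values.flags["C_CONTIGUOUS"])  # True

# Vectorized: one expression, the loop runs inside NumPy's compiled code.
doubled_vectorized = values * 2

# The same work as an explicit Python-level loop over a plain list.
doubled_loop = [v * 2 for v in values.tolist()]

# Both approaches produce identical numbers.
print(np.array_equal(doubled_vectorized, np.array(doubled_loop)))  # True
```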

Getting Started with NumPy

The first step is to get the library installed. You’ll typically do this using a package manager like pip in your terminal or command prompt.

				
pip install numpy

Once it’s installed, you just need to import it into your Python script. It’s a convention in the data science community to import NumPy with the alias np. I’ve been doing this for years, and it makes the code cleaner and easier to read.

				
import numpy as np

Creating Your First NumPy Arrays

Let’s dive into the core of NumPy by creating some arrays. You can create an ndarray from a regular Python list, or use one of NumPy’s many built-in functions.

1-Dimensional Array (Vector): This is the simplest type of array. It’s like a single row of numbers.

				
# Create a 1D array from a Python list
my_list = [10, 20, 30, 40, 50]
my_array = np.array(my_list)
print("My 1D array:", my_array)

2-Dimensional Array (Matrix): This is where NumPy really starts to shine. A 2D array is a grid of numbers, like a spreadsheet.

				
# Create a 2D array from a list of lists
my_2d_list = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
my_2d_array = np.array(my_2d_list)
print("My 2D array:\n", my_2d_array)
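
Once you have an array, it helps to ask it a few questions about itself. The sketch below uses the standard `ndim`, `shape`, `size`, and `dtype` attributes; the names `vector`, `matrix`, and `mixed` are my own for illustration.

```python
import numpy as np

vector = np.array([10, 20, 30, 40, 50])
matrix = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

# ndim = number of dimensions, shape = size along each dimension,
# size = total number of elements.
print(vector.ndim, vector.shape, vector.size)  # 1 (5,) 5
print(matrix.ndim, matrix.shape, matrix.size)  # 2 (3, 3) 9

# An array holds a single data type. Mixing integers and floats
# makes NumPy store everything as floats.
mixed = np.array([1, 2.5, 3])
print(mixed.dtype)  # float64
```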

Using Built-in Functions: NumPy provides functions to quickly create arrays with pre-filled values. This is incredibly useful for initializing data.

				
# Create an array of zeros
zeros_array = np.zeros(5)
print("Array of zeros:", zeros_array)

# Create an array of ones with a specific shape
ones_array = np.ones((2, 3))
print("Array of ones:\n", ones_array)

# Create an array with a range of numbers
range_array = np.arange(10, 21) # Creates a sequence from 10 to 20
print("Array with a range:", range_array)

# Create an array with evenly spaced numbers
space_array = np.linspace(0, 10, 5) # Creates 5 numbers evenly spaced between 0 and 10
print("Array with even spacing:", space_array)
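
One detail that tripped me up early on is how the end points are treated. As a small sketch: `np.arange` leaves out its stop value (just like Python's `range`), while `np.linspace` includes it by default.

```python
import numpy as np

# arange stops *before* the stop value, so 21 is needed to reach 20.
print(np.arange(10, 21).tolist())
# [10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20]

# An optional third argument sets the step size.
print(np.arange(0, 10, 2).tolist())    # [0, 2, 4, 6, 8]

# linspace includes both end points and takes a *count*, not a step.
print(np.linspace(0, 10, 5).tolist())  # [0.0, 2.5, 5.0, 7.5, 10.0]
```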

The Power of NumPy: Fast Operations

Now, let’s see why NumPy is so powerful. Imagine you want to double every number in a list. With a regular Python list, you have to use a for loop or a list comprehension. With NumPy, you just use a single operator.

				
# The slow way with a Python list
python_list = [1, 2, 3, 4, 5]
doubled_list = []
for number in python_list:
    doubled_list.append(number * 2)
print("Doubled Python list:", doubled_list)

# The fast way with a NumPy array
numpy_array = np.array([1, 2, 3, 4, 5])
doubled_array = numpy_array * 2
print("Doubled NumPy array:", doubled_array)

This simple example shows the elegance of NumPy’s vectorized operations. It’s not just cleaner; it’s also much, much faster on large datasets. This is the core reason it’s so fundamental to data science.
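
If you want to see the speed difference for yourself, here is a rough timing sketch using Python's built-in `timeit` module. The array size and repeat count are arbitrary choices of mine, and the exact numbers will vary from machine to machine, so treat it as an experiment rather than a benchmark.

```python
import timeit

import numpy as np

n = 100_000
python_list = list(range(n))
numpy_array = np.arange(n)

# Time 20 runs of each approach.
loop_time = timeit.timeit(lambda: [x * 2 for x in python_list], number=20)
vector_time = timeit.timeit(lambda: numpy_array * 2, number=20)

print(f"List comprehension: {loop_time:.4f} s")
print(f"NumPy vectorized:   {vector_time:.4f} s")

# Both approaches compute the same values.
same = [x * 2 for x in python_list] == (numpy_array * 2).tolist()
print("Same result:", same)
```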

Indexing and Slicing Arrays

Accessing specific elements or groups of elements is a key skill. NumPy handles this with a familiar syntax, but with some extra power for multi-dimensional arrays.

				
# Let's use our 2D array from before
my_2d_array = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

# Access a single element (row, column)
# Remember, indexing starts at 0!
print("Element at row 1, column 2:", my_2d_array[1, 2]) # This will be the number 6

# Get an entire row
print("Second row:", my_2d_array[1, :])

# Get an entire column
print("Third column:", my_2d_array[:, 2])

# Slice a part of the array
print("A 2x2 section:\n", my_2d_array[0:2, 0:2])
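
One thing worth knowing about slices: a basic slice of a NumPy array is a view onto the same data, not a copy. The sketch below shows what that means in practice; `grid`, `block`, and `independent` are names I made up for the example.

```python
import numpy as np

grid = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

# A basic slice shares memory with the original array.
block = grid[0:2, 0:2]
block[0, 0] = 100
print(grid[0, 0])  # 100, the original changed too

# Call .copy() when you want an independent array.
independent = grid[0:2, 0:2].copy()
independent[0, 0] = -1
print(grid[0, 0])  # still 100

print(np.shares_memory(grid, block))        # True
print(np.shares_memory(grid, independent))  # False
```

This behaviour is different from Python lists, where slicing always creates a new list.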

The Importance of Shape and Reshaping

The “shape” of an array is its dimensions (rows, columns, etc.). Understanding and changing the shape is crucial, especially when feeding data into a machine learning model that requires a specific input format.

You can check an array’s shape with the .shape attribute and change it with the .reshape() method.

				
# Create a 1D array with 12 elements
my_1d_array = np.arange(1, 13)
print("Original 1D array:", my_1d_array)
print("Original shape:", my_1d_array.shape)

# Reshape it into a 3x4 matrix
my_2d_matrix = my_1d_array.reshape(3, 4)
print("Reshaped 3x4 matrix:\n", my_2d_matrix)
print("New shape:", my_2d_matrix.shape)

# You can also reshape it back or into other compatible shapes
my_other_shape = my_1d_array.reshape(2, 6)
print("Reshaped 2x6 matrix:\n", my_other_shape)
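
Two more reshaping details that come up a lot, shown as a small sketch: you can pass -1 for one dimension and let NumPy work it out, and a shape that doesn't match the number of elements raises a `ValueError`.

```python
import numpy as np

numbers = np.arange(1, 13)  # 12 elements

# -1 means "figure this dimension out for me".
auto_columns = numbers.reshape(3, -1)
print(auto_columns.shape)  # (3, 4)

auto_rows = numbers.reshape(-1, 6)
print(auto_rows.shape)     # (2, 6)

# 5 x 3 would need 15 elements, so this fails.
reshape_failed = False
try:
    numbers.reshape(5, 3)
except ValueError as error:
    reshape_failed = True
    print("Cannot reshape:", error)
```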

Putting It All Together: A Simple Sales Analysis

Let’s combine what we’ve learned into a practical, real-world example. Imagine we have sales data for three products over four quarters.

The Task:

Store the sales data in a NumPy array.
Calculate the total sales for each product.
Calculate the average sales for each quarter.

Here’s how I would tackle this with NumPy.

				
# Step 1: Create a NumPy array for sales data (in thousands)
# Rows are products (Product A, B, C)
# Columns are quarters (Q1, Q2, Q3, Q4)
sales_data = np.array([
    [50, 60, 75, 80],   # Sales for Product A
    [45, 55, 60, 70],   # Sales for Product B
    [90, 85, 95, 100]   # Sales for Product C
])

print("Quarterly Sales Data (in thousands):\n", sales_data)
print("-" * 30)

# Step 2: Calculate total sales for each product
# We use axis=1 to sum across the columns (for each row)
total_sales_per_product = sales_data.sum(axis=1)
print("Total sales for each product (in thousands):", total_sales_per_product)
print("-" * 30)

# Step 3: Calculate average sales for each quarter
# We use axis=0 to average across the rows (for each column)
average_sales_per_quarter = sales_data.mean(axis=0)
print("Average sales per quarter (in thousands):", average_sales_per_quarter)
print("-" * 30)

This simple script demonstrates how just a few lines of NumPy code can perform powerful calculations that would be much more complex with standard Python. You can see how axis=1 sums across each row (giving you one total per product) and axis=0 averages down each column (giving you one average per quarter).
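
As a possible next step with the same data, here is a sketch that uses `np.argmax` to pick out the best-selling product and the strongest quarter. The `product_names` and `quarter_names` lists are my own additions for labelling the results.

```python
import numpy as np

# Labels for the rows and columns (illustrative, not part of the array).
product_names = ["Product A", "Product B", "Product C"]
quarter_names = ["Q1", "Q2", "Q3", "Q4"]

sales_data = np.array([
    [50, 60, 75, 80],   # Product A
    [45, 55, 60, 70],   # Product B
    [90, 85, 95, 100],  # Product C
])

# One total per product (collapse the quarter axis).
totals = sales_data.sum(axis=1)
print(totals.tolist())  # [265, 230, 370]

# argmax returns the position of the largest value.
best_product = product_names[int(np.argmax(totals))]
print(best_product)     # Product C

# One average per quarter (collapse the product axis).
quarter_averages = sales_data.mean(axis=0)
best_quarter = quarter_names[int(np.argmax(quarter_averages))]
print(best_quarter)     # Q4
```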

Conclusion

NumPy is an absolutely fundamental skill for anyone serious about a career in data science or AI. It provides the speed and tools you need to handle large datasets efficiently.

By understanding the ndarray, learning to perform vectorized operations, and mastering indexing and reshaping, you’ve taken the first big step on a very exciting path.

The Practical Project Using NumPy (Simple Stock Portfolio Analysis)

Imagine NumPy is like a powerful, super-fast calculator that works on entire spreadsheets of numbers all at once, instead of just one number at a time. The project we created is a small spreadsheet for tracking stock prices, and we’ll use our calculator to analyze it.

Step 1: Getting the Data Ready

The first thing we do is set up our “spreadsheet.” We don’t have real stock data, so we create some.

				
# Import the NumPy library.
import numpy as np

# Create a table (array) of prices with 10 rows and 3 columns.
stock_prices = np.round(np.random.rand(10, 3) * 50 + 100, 2)

import numpy as np: Think of this as opening your calculator’s box. The np is just a short nickname we give it so we don’t have to type numpy every time.
stock_prices = ...: This creates our table of numbers. The (10, 3) inside the np.random.rand() part tells NumPy to make a grid with 10 rows (for 10 days) and 3 columns (for 3 different stocks).
np.round(..., 2): This is a small but important detail. We use this function to round every price to two decimal places, just like real money.
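
Because the prices are random, you will get different numbers every time you run the script. If you would like repeatable results, one option is NumPy's newer random Generator API with a fixed seed. This is a sketch of that alternative, not what the project code uses, and the seed value 42 is an arbitrary choice.

```python
import numpy as np

# A seeded generator produces the same "random" numbers on every run.
rng = np.random.default_rng(seed=42)
stock_prices = np.round(rng.random((10, 3)) * 50 + 100, 2)

# A second generator with the same seed reproduces the same table.
rng_again = np.random.default_rng(seed=42)
same_prices = np.round(rng_again.random((10, 3)) * 50 + 100, 2)

print(stock_prices.shape)                         # (10, 3)
print(np.array_equal(stock_prices, same_prices))  # True
```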

Step 2: Calculating Daily Returns

A “daily return” is just the percentage change in price from one day to the next. For example, if a stock went from $10 to $11, its return is 10%.

The key here is that we use a trick called “slicing” to do this calculation for every single day, for every single stock, all at once.

				
# `stock_prices[1:]` is the whole table, but we cut off the first row.
# `stock_prices[:-1]` is the whole table, but we cut off the last row.
daily_returns = (stock_prices[1:] / stock_prices[:-1]) - 1

stock_prices[1:]: This gives us every row starting from the second row (day 2).
stock_prices[:-1]: This gives us every row up to, but not including, the last row (all the rows except day 10).
( ... / ... ) - 1: By dividing these two tables, we’re essentially taking the price of “today” and dividing it by the price of “yesterday” for every single day and stock simultaneously. Subtracting 1 converts this ratio into a fractional change (0.10 means 10%), which we multiply by 100 when we want to display a percentage.
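
Here is the same slicing trick on a tiny table where you can check the arithmetic by hand. The prices are made up for the example: the first stock goes 10, 11, 12.1 and the second goes 100, 90, 99.

```python
import numpy as np

# 3 days (rows) for 2 stocks (columns), hand-picked values.
prices = np.array([
    [10.0, 100.0],
    [11.0,  90.0],
    [12.1,  99.0],
])

# "today" divided by "yesterday", minus 1, for every day and stock at once.
returns = prices[1:] / prices[:-1] - 1

# One row fewer than the price table: day 1 has no "yesterday".
print(returns.shape)  # (2, 2)

# As percentages: stock 1 gains 10% twice, stock 2 drops 10% then gains 10%.
print(np.round(returns * 100, 2))
```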

Step 3: Finding Key Measurements

Now that we have the daily returns, we can use NumPy’s built-in functions to find important measurements, or “metrics,” about our stocks.

				
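# Find the average price for each stock.
average_prices = np.mean(stock_prices, axis=0)

# Find the total return for each stock by compounding the daily returns.
# (Simple returns multiply over time; adding them only gives a rough estimate.)
total_returns = np.prod(1 + daily_returns, axis=0) - 1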

# Find the volatility (risk) of each stock.
volatility = np.std(daily_returns, axis=0)

The most important part of this step is understanding what axis=0 means.

Imagine our table of numbers. When you use axis=0, you’re telling NumPy to perform the calculation down the columns. For our stock data, this means:

np.mean(..., axis=0): Find the average of the first column, then the average of the second column, and so on.
np.std(..., axis=0): Measure the spread of the first column, then the second, and so on.
Other reductions such as np.sum and np.prod follow the same rule.

The axis parameter is a fundamental part of working with NumPy.
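
If the axis idea still feels abstract, a tiny table makes it easy to verify by hand. This is just a sketch with a made-up 2x3 array called `table`.

```python
import numpy as np

table = np.array([[1, 2, 3],
                  [4, 5, 6]])

# axis=0 collapses the rows: one result per column.
print(table.sum(axis=0).tolist())  # [5, 7, 9]

# axis=1 collapses the columns: one result per row.
print(table.sum(axis=1).tolist())  # [6, 15]

# No axis at all: everything is reduced to a single number.
print(int(table.sum()))            # 21
```

A handy way to remember it: the axis you name is the one that disappears from the shape. Here a (2, 3) array becomes (3,) with axis=0 and (2,) with axis=1.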

Step 4: Final Portfolio Performance

This last step shows how powerful NumPy’s “whole table” thinking is. To find the final value of a $1000 investment in each stock, we can simply multiply our initial investment by one plus the total return.

				
# Our starting amount for each stock.
initial_investment = np.array([1000, 1000, 1000])

# NumPy multiplies each number in the 'initial_investment' array
# by the corresponding number in the '1 + total_returns' array.
final_value = initial_investment * (1 + total_returns)

Instead of using a for loop to go through each stock one by one, NumPy does it automatically and much faster. It sees that you want to multiply two arrays of the same shape and does the math for each pair of items, returning a new array with the results. (Plain Python lists can’t be multiplied together like this.) This element-by-element arithmetic is vectorization in action.
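
A quick way to sanity-check a final value is to compare it against the first and last prices directly. The sketch below uses hand-picked prices (my own, not the random project data) to show that compounding the daily returns matches last price divided by first price, while simply adding the daily returns does not.

```python
import numpy as np

# 3 days for 2 stocks, chosen so the arithmetic is easy to follow.
prices = np.array([
    [100.0, 100.0],
    [110.0,  50.0],
    [121.0, 100.0],
])

daily_returns = prices[1:] / prices[:-1] - 1       # [[0.1, -0.5], [0.1, 1.0]]

# Compounding: multiply the daily growth factors together.
compounded = np.prod(1 + daily_returns, axis=0) - 1

# Direct check: where did each stock end up relative to where it started?
direct = prices[-1] / prices[0] - 1

# Adding daily returns overstates the second stock, which ended flat.
summed = daily_returns.sum(axis=0)

print(np.round(compounded, 4))  # 0.21 and 0.0
print(np.round(direct, 4))      # 0.21 and 0.0
print(np.round(summed, 4))      # 0.2 and 0.5

# Final value of $1000 in each stock, computed element by element.
initial_investment = np.array([1000.0, 1000.0])
final_value = initial_investment * (1 + compounded)
print(np.round(final_value, 2))  # 1210.0 and 1000.0
```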

Complete Source code of the project (Simple Stock Portfolio Analysis.)

				
# A Practical NumPy Project: Simple Stock Portfolio Analysis

# Step 1: Import the NumPy library.
# We always import it with the alias 'np' by convention.
import numpy as np

print("--- Step 1: Generating Sample Stock Data ---")

# Let's create a hypothetical dataset for daily stock prices.
# Imagine we are tracking 3 different stocks over 10 trading days.
# The `np.random.rand` function creates an array of random numbers
# between 0 and 1. We'll multiply and add to get more realistic prices.
# The shape is (10, 3) for 10 days and 3 stocks.
stock_prices = np.round(np.random.rand(10, 3) * 50 + 100, 2)

# `np.round()` is a handy function to round to a specific number of decimal places.
# We'll print the data so you can see what we're working with.
print("Daily Stock Prices (10 days for 3 stocks):\n", stock_prices)
print("-" * 50)

# Step 2: Calculating Daily Returns

# The daily return is the percentage change from one day to the next.
# We can calculate this using NumPy's slicing capabilities and vectorized operations.
# To do this, we'll take the prices from day 2 to the end and divide them
# by the prices from day 1 to the second to last day.
# We subtract 1 to get the percentage change.
# The result will be a (9, 3) array, as we lose the first day's data.

# `stock_prices[1:]` gives us all rows from the second row to the end.
# `stock_prices[:-1]` gives us all rows from the beginning up to the second to last day.
daily_returns = (stock_prices[1:] / stock_prices[:-1]) - 1

print("Daily Returns (as a percentage):\n", np.round(daily_returns * 100, 2))
print("-" * 50)

# Step 3: Calculating Key Metrics for the Portfolio

# Now, let's use some core NumPy functions to get meaningful insights from our data.
# We will use the 'axis' parameter to specify whether we want to perform
# the calculation down each column (axis=0) or across each row (axis=1).

# Let's calculate the average price for each stock across all days.
# We use axis=0 because we want to average down the rows (each column represents a stock).
average_prices = np.mean(stock_prices, axis=0)
print("Average Price for each stock:", np.round(average_prices, 2))
print("-" * 50)

# Now, let's find the total cumulative return for each stock.
# Simple returns compound, so we multiply (1 + daily return) over all days.
total_returns = np.prod(1 + daily_returns, axis=0) - 1
print("Total Return for each stock (in percent):", np.round(total_returns * 100, 2))
print("-" * 50)

# Finally, let's calculate the standard deviation of the daily returns.
# Standard deviation is a measure of volatility. A higher number means more risk.
# We use axis=0 again to get the standard deviation for each stock.
volatility = np.std(daily_returns, axis=0)
print("Volatility (Standard Deviation) for each stock:", np.round(volatility, 4))
print("-" * 50)

# Step 4: Simple Portfolio Performance

# Let's imagine we invested a fixed amount in each stock.
initial_investment = np.array([1000, 1000, 1000]) # $1000 in each stock

# We can find the final value of our investment using vectorized multiplication.
# (1 + total_returns) is the growth factor of each stock over the period.
final_value = initial_investment * (1 + total_returns)

print("Final value of initial $1000 investment in each stock:")
print(np.round(final_value, 2))

I know that was a lot of new information. Which part would you like to explore more? We could dive deeper into how axis works, or maybe look at another simple project!

I hope this guide helps you feel more confident about tackling your first numerical projects. Now that you’ve got the basics, what’s the first NumPy project you’re going to try? Maybe you’ll analyze some sports data or track your personal finances! Let me know in the comments below!

Explore further into the fascinating world of Python by reading my main pillar post:

Python For AI – My 5 Years Experience To Coding Intelligence
