Python & NumPy

Foong Min Wong
7 min readApr 28, 2024

The first time I learned Python was for a research project related to the numerical decomposition of algebraic surfaces. Since then, Python has been one of my go-to coding languages for fun and work. Most people know that Python is powerful, but some are concerned about its slow performance for high-performance engineering tasks. In this post, let’s explore the potential of Python and NumPy.

Why Python?

It is true that Python, as an interpreted language, is much slower than compiled languages such as C, C++, Java, etc. The programs written in interpreted languages are not executed directly by the processor. The interpreter parses the Python source, converts it into a simple format (bytecode), then steps through, and executes the bytecode line by line. Python is dynamically typed, where the variables can be objects of any type. Although being both interpreted and dynamically typed makes Python slow for CPU-intensive work, those features make Python user-friendly, and flexible while programming and working with complex data structures. At the beginning of switching from C++/ Java to Python, I was not used to it since Java is statically and compiled typed. After slowly adjusting myself, I find Python simpler with the easy-to-read syntax and more flexible by not declaring the data type of variables.

CPython

You might have heard of CPython, which is the standard interpreter of Python, written in C that translates and runs Python programs. CPython leverages the speed of C and can be used conveniently in Python. There are Python modules, such as math, pandas, NumPy, or SciPy, that heavily rely on compiled C. In Python, we called these extension modules where we extend the functionality of Python by implementing code in languages like C or C++. You might have seen a .so file on Linux or .pyd file on Windows before, those files are compiled C or C++ libraries that can be imported like regular Python modules. For example, NumPy has a load_library function to load C libraries such as multiarray and umath c-extension modules.

These modules bridge the gap between the high-level, interpreted Python and lower-level fast-compiled code to improve performance in handling computationally expensive tasks. While working on GUI development, I learned that PyQt/ PySide are Python extension modules as well since most of their codes are implemented in C/C++ with the underlying Qt framework, which is also written in C++. I find it convenient to use PyQt/ PySide to develop GUI as I can access the C++ functionalities through Python bindings which simplifies the GUI development process.

NumPy

Numerical Python (NumPy) is one of the most fundamental packages for numerical computing in Python. It provides memory and computation-efficient multidimensional arrays (ndarray) and a rich set of features.

NumPy Arrays versus Python Lists

Python lists and NumPy arrays are two popular Python data structures to store elements, but they have some differences. Initially, I thought Python lists were more straightforward than NumPy arrays to store and manipulate elements. When I started to deal with very large datasets, I realized Python lists were not the ideal data structure, especially for numerical data.

Python lists are based on arrays of pointers to objects stored across memory. The pointers in the array do not store the object’s data directly; they point to the memory location where the object’s data resides. NumPy arrays store elements contiguously in memory along with essential metadata (array’s shape, dimension, data type, etc.) to keep track of structure.

If you need to do a lot of numerical data processing and computation, NumPy should be your data structure option because it can perform arithmetic operations without writing any explicit loops (vectorization).

Vectorization

Instead of iterating through each element of an array, vectorization in NumPy performs element-wise operations simultaneously without using a for loop.

Vectorization in NumPy

The vectorized approach is significantly faster than loop-based approaches. Another NumPy capability that is relevant to vectorized computations is called broadcasting.

Broadcasting

Broadcasting is an operation that allows performing arithmetic operations on arrays of different shapes.

Broadcasting over axis 1 in NumPy

In the addition operation, the 1D array (array1) is broadcast to match the shape of 2D array (array2). The array1 (3,) is replicated along the column to become (3,1) array.

I always have to wrap my head around when it comes to understanding broadcasting, and here’s a visual way (inspired by Wes Mckinney’s book) to comprehend broadcasting over axis 0 or 1 of a 2D array.

Broadcasting over axis 1 of a 2D array

Below is an example of broadcasting over axis 0:

Broadcasting over axis 0 in NumPy
Broadcasting over axis 0 of a 3D array

Reshape

There have been times when I reshape ndarrays to create desired array shapes for data processing and visualization. Sometimes, we have to reshape the arrays when one of the Python libraries used expects a specific data format. numpy.reshape function in NumPy changes the layout of the data array without modifying or copying any data.

Reshaping a NumPy array with new shapes that compatible with the original shape

According to the documentation, when specifying the new shape, one shape dimension can be -1, which the value used for that will be inferred from the length of the array.

Reshaping a NumPy with one of the dimension shapes equal to -1

Ravel vs Flatten

To collapse all elements into one single row, we can use NumPy’s ravel and flatten. We often flatten arrays when an algorithm or a visualization library expects one-dimensional data.

Numpy’s ravel and flatten

Ravel is often faster than flatten especially for large arrays because ravel does not copy data, while flattening an array returns a copy of the data array. Modifying a ravel array might change the original array since the ravel function creates a view of the same array.

Array Indexing and Slicing

Array indexing and slicing in NumPy are two techniques to extract elements efficiently without copying data (except fancy indexing).

The slicing notation definition is [start_index : end_index : steps]. The star_index is included while the end_index is excluded.
arr[::-1] or arr[:,::-1] counts indexes backwards and returns a list in reverse order.
arr[-1:,:] sets the start_index as -1 for the row which selects the last row. arr[-1:,:-1] selects all the elements in the last row except the last element in that row.
Boolean indexing on NumPy array

Array indexing and slicing in NumPy are used frequently for data array element selection and manipulation. Aside from those functions, there are more NumPy routines for array manipulation which can be found in their documentation. NumPy also offers a comprehensive set of mathematical routines such as linear algebra operations, numerical optimization, and fast Fourier transform.

Fast Fourier Transform

In signal processing, fast fourier transform (FFT) is crucial to analyzing and handling signals by transforming them between the time domain (amplitude strength vs time) and the frequency domain (magnitude strength vs frequency).

Fourier Analysis

import numpy as np
import statsmodels.api as sm
import matplotlib.pyplot as plt

data_loader = sm.datasets.sunspots.load_pandas()
sunspots = data_loader.data["SUNACTIVITY"].values

t = np.linspace(-2 * np.pi, 2 * np.pi, len(sunspots))
mid = np.ptp(sunspots)/2
sine = mid + mid * np.sin(np.sin(t))

sine_fft = np.abs(np.fft.fftshift(np.fft.rfft(sine)))
print("Index of max sine FFT", np.argsort(sine_fft)[-5:])

transformed = np.abs(np.fft.fftshift(np.fft.rfft(sunspots)))
print("Indices of max sunspots FFT", np.argsort(transformed)[-5:])

plt.subplot(311)
plt.plot(sunspots, label="Sunspots")
plt.plot(sine, lw=2, label="Sine")
plt.grid(True)
plt.legend()
plt.subplot(312)
plt.plot(transformed, label="Transformed Sunspots")
plt.grid(True)
plt.legend()
plt.subplot(313)
plt.plot(sine_fft, lw=2, label="Transformed Sine")
plt.grid(True)
plt.legend()
plt.show()
Amplitude Spectrum Example

As shown in the code above, when performing FFT on a signal, the FFT result is complex as it represents both the amplitude (real part -magnitude) and phase (imaginary part) information of the frequency components in the signal. Therefore, we use np.abs() to extract the magnitude of the spectrum from the complex FFT result.

Spectral Analysis

import numpy as np
import statsmodels.api as sm
import matplotlib.pyplot as plt

data_loader = sm.datasets.sunspots.load_pandas()
sunspots = data_loader.data["SUNACTIVITY"].values

transformed = np.fft.fftshift(np.fft.rfft(sunspots))

plt.subplot(311)
plt.plot(sunspots, label="Sunspots")
plt.legend()
plt.subplot(312)
plt.plot(transformed ** 2, label="Power Spectrum")
plt.legend()
plt.subplot(313)
plt.plot(np.angle(transformed), label="Phase Spectrum")
plt.grid(True)
plt.legend()
plt.show()
Power Spectrum Example

In general, we often carry out FFT when working with spectral data for frequency analysis (to identify the main frequencies present in the signal), filtering (to select, modify, or remove certain frequencies), signal modulation, and demodulation in communication systems.

Summary

This blog post is a compilation of some fundamental concepts and practical applications of Python and NumPy that I have learned through work and past projects. However, this post merely scratched the surface. As you delve deeper, you will discover a vast collection of Python and NumPy functions, libraries, and techniques that fit your specific use case. The more you explore, the more you find out.

--

--