Concurrency & Parallelism in PyQt
Whenever we search Stack Overflow for ways to process and visualize big data faster or to improve software performance, two technical terms keep coming up: concurrency and parallelism.
If I recall correctly, concurrency and parallelism are usually covered during the first or second year of undergraduate computer science courses, but they often get ignored since they mostly appear in the last few chapters under the advanced concepts section. The two concepts sound similar, but they are not the same. Let’s revisit them, dive into the difference between the two, and see how they can help speed up tasks when building PyQt GUI applications.
Concurrency
Recently, I learned how concurrency improves the responsiveness of GUI applications. In some traditional GUI applications, if the program is executing a long-running task such as a database query, a network request, or an intensive computation, the user might not be able to click buttons or resize the window. This is because the UI thread is blocked while waiting for the task to finish, which leads to an unresponsive user interface and a poor user experience.
While working with PyQt, I realized that it offers concurrency facilities such as QThread that prevent the GUI from freezing: the UI thread remains responsive even while a background task or script is running, so the user can continue to interact with the application during a resource-intensive task.
Using QThread in PyQt means creating separate threads for long-running tasks and moving those tasks out of the main GUI thread into background threads. This keeps the main thread free to handle user input while the background threads work through the long tasks. To communicate between threads, the background worker emits signals (pyqtSignal in PyQt, Signal in PySide) to notify the main thread of its status, and the methods connected to those signals, known as slots (pyqtSlot in PyQt, Slot in PySide), are called when the signals are emitted.
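As a rough sketch of the worker pattern described above, the snippet below moves a worker object into a QThread and reports back through signals. The names Worker, Receiver, and run_in_background are my own illustrations, the summing loop merely stands in for a real long-running task, and a QCoreApplication is used instead of a full GUI so that the example runs without opening a window. It assumes that PyQt6 or PyQt5 is installed.

```python
import sys

try:  # use PyQt6 if it is installed, otherwise fall back to PyQt5
    from PyQt6.QtCore import QCoreApplication, QObject, QThread, pyqtSignal, pyqtSlot
except ImportError:
    from PyQt5.QtCore import QCoreApplication, QObject, QThread, pyqtSignal, pyqtSlot


class Worker(QObject):
    """Hypothetical worker whose run() executes in a background thread."""

    progress = pyqtSignal(int)  # emitted after every step
    finished = pyqtSignal(int)  # emitted once with the final result

    def __init__(self, steps):
        super().__init__()
        self.steps = steps

    @pyqtSlot()
    def run(self):
        total = 0
        for step in range(1, self.steps + 1):
            total += step  # stand-in for a slow query or computation
            self.progress.emit(step)
        self.finished.emit(total)


class Receiver(QObject):
    """Stands in for the GUI object; it stays in the main thread."""

    def __init__(self):
        super().__init__()
        self.steps_seen = []
        self.result = None

    @pyqtSlot(int)
    def on_progress(self, step):
        self.steps_seen.append(step)

    @pyqtSlot(int)
    def on_finished(self, total):
        self.result = total


def run_in_background(steps):
    """Run Worker in a QThread and return (progress values, final result)."""
    app = QCoreApplication.instance() or QCoreApplication(sys.argv)
    receiver = Receiver()
    thread = QThread()
    worker = Worker(steps)
    worker.moveToThread(thread)  # the worker's slots now run in `thread`

    thread.started.connect(worker.run)
    worker.progress.connect(receiver.on_progress)
    worker.finished.connect(receiver.on_finished)
    worker.finished.connect(thread.quit)   # stop the thread's event loop
    thread.finished.connect(app.quit)      # then leave the main event loop

    thread.start()
    app.exec()     # the main event loop stays free while the worker runs
    thread.wait()
    return receiver.steps_seen, receiver.result
```

Calling run_in_background(5) returns ([1, 2, 3, 4, 5], 15). In a real application, the Receiver role would be played by a widget whose slots update a progress bar or a plot.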
In general, concurrency refers to the ability to handle multiple tasks over the same period of time, without necessarily running them at the same instant. Using the PyQt main and background threads scenario as an example, concurrency arises because execution switches back and forth between the two threads while each waits for responses. The two threads' executions are interleaved, giving an illusion of parallel execution. Another example of concurrency (multitasking) with an input/output (I/O)-bound task is transferring data from a PC to an external hard drive: we can still browse in Chrome and use other applications while waiting for the transfer to complete.
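To make the interleaving idea concrete, here is a minimal sketch using only the standard library's threading module. The function names and the "copy", "download", and "query" tasks are made up for illustration, and time.sleep stands in for waiting on a disk or a network.

```python
import threading
import time


def fake_io_task(name, delay, results):
    """Pretend to wait on a disk or network for `delay` seconds."""
    time.sleep(delay)  # while sleeping, this thread lets the others run
    results[name] = f"{name} done"


def run_concurrently(tasks):
    """Start one thread per (name, delay) pair and wait for all of them."""
    results = {}
    threads = [
        threading.Thread(target=fake_io_task, args=(name, delay, results))
        for name, delay in tasks
    ]
    start = time.perf_counter()
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    elapsed = time.perf_counter() - start
    return results, elapsed
```

With three tasks that each wait 0.2 seconds, the elapsed time should be close to 0.2 seconds rather than the 0.6 seconds a one-after-another run would take, because the waits overlap even though only one thread runs Python code at any given moment.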
Parallelism
Parallelism refers to the ability to perform multiple tasks simultaneously. A workload is split into independent tasks, each task is assigned to a different processing unit such as a CPU core, and the cores execute their tasks at the same time.
I remember once working with a large pandas DataFrame where I needed to apply a text processing function to an existing column. I found a way to break the DataFrame into chunks with NumPy and use Pool from the multiprocessing library to process each chunk in parallel. Instead of performing the computation on one core, you scale the solution by splitting the computation into tasks and executing them on multiple cores simultaneously. Processing the data in parallel saves time before the visualization stage.
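The chunk-and-pool idea can be sketched roughly as follows. To keep the example dependent on the standard library only, a plain list of strings stands in for the DataFrame column, and split_into_chunks plays the role that a NumPy helper such as np.array_split would play on real data; that mapping is my assumption, not a record of the original code. The names clean_text, clean_chunk, and parallel_clean are hypothetical as well.

```python
from multiprocessing import Pool


def clean_text(text):
    """Hypothetical per-row text processing step."""
    return text.strip().lower()


def clean_chunk(chunk):
    """Process one chunk; each call can run in a separate worker process."""
    return [clean_text(text) for text in chunk]


def split_into_chunks(values, n_chunks):
    """Split a list into n_chunks nearly equal, contiguous pieces."""
    size, extra = divmod(len(values), n_chunks)
    chunks, start = [], 0
    for i in range(n_chunks):
        end = start + size + (1 if i < extra else 0)
        chunks.append(values[start:end])
        start = end
    return chunks


def parallel_clean(values, n_chunks=4, pool=None):
    """Clean `values` chunk by chunk and return the rows in original order."""
    chunks = split_into_chunks(values, n_chunks)
    if pool is not None:  # the caller supplied a pool-like object
        processed = pool.map(clean_chunk, chunks)
    else:
        with Pool(processes=n_chunks) as own_pool:
            processed = own_pool.map(clean_chunk, chunks)
    # map() preserves chunk order, so flattening restores the row order
    return [row for chunk in processed for row in chunk]
```

When parallel_clean creates its own Pool, the call should sit under an if __name__ == "__main__": guard in the script, since worker processes may re-import the main module on platforms that spawn rather than fork. The optional pool argument is a design choice of this sketch that lets a pool be reused across calls.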
When building an interactive visualization app with PyQtGraph, besides multiprocessing, we could also use concurrent.futures to run computationally expensive calculations in the background before rendering. Its ProcessPoolExecutor spreads the work across CPU cores, while its ThreadPoolExecutor runs the work in threads. Once the computations are complete (the futures are resolved), the results are consumed to update the plots.
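Here is one hedged way the futures workflow could look. The moving_average function is only a stand-in for an expensive calculation, and compute_curves and its on_result callback are hypothetical names; in an actual PyQtGraph app, on_result would be whatever hands the finished data over to the plot.

```python
from concurrent.futures import ProcessPoolExecutor, as_completed


def moving_average(series, window):
    """CPU-bound stand-in: simple moving average of a list of numbers."""
    return [
        sum(series[i:i + window]) / window
        for i in range(len(series) - window + 1)
    ]


def compute_curves(datasets, window, on_result, executor=None):
    """Run moving_average per dataset; call on_result as each one finishes."""
    owns_executor = executor is None
    if owns_executor:
        # worker processes; the default count is based on the CPU count
        executor = ProcessPoolExecutor()
    try:
        # one future per named dataset, remembered by name
        futures = {
            executor.submit(moving_average, series, window): name
            for name, series in datasets.items()
        }
        for future in as_completed(futures):  # yields futures as they resolve
            on_result(futures[future], future.result())
    finally:
        if owns_executor:
            executor.shutdown()
```

Note that the as_completed loop blocks whichever thread calls it, so in a GUI this function would itself need to run off the main thread, with the results forwarded to the plot through a signal. As with multiprocessing, code that lets it create a ProcessPoolExecutor should be called from under an if __name__ == "__main__": guard.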
You might also come across Dask, another open-source Python package for processing large datasets. Its Dask DataFrame is made up of many smaller pandas DataFrames. Using Dask can also improve the responsiveness of PyQt applications by managing and loading datasets in parallel. For example, Dask can help load subsets of the data dynamically as users interact with the visualization (e.g., filtering, panning, zooming).
Summary
Overall, we can leverage a variety of Python libraries, both built-in and external, to achieve concurrency and parallelism and, in turn, enhance the performance, user experience, and scalability of GUI applications in PyQt.