Efficient Array Computing with Python

Scientists, engineers, and professionals across many sectors increasingly face large and complex datasets. Effective analysis requires both understanding the data and writing computationally efficient code. This course introduces high-performance Python programming for numerical and tabular data analysis, focusing on strategies that overcome Python’s inherent performance limitations.

Students will learn to leverage libraries such as NumPy, Pandas, and SciPy to efficiently store, process, manipulate, and analyze data. Key topics include vectorized computations, advanced array operations, memory-efficient data structures, data wrangling, aggregation, transformation, and high-performance linear algebra and curve fitting. Techniques such as just-in-time compilation, parallelization, and optimized I/O are explored to speed up computations on large datasets.
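As a small taste of the vectorized computations mentioned above, the sketch below computes a root mean square twice: once with an explicit Python loop and once with whole-array NumPy operations. The function names `rms_loop` and `rms_vectorized` and the random sample data are illustrative assumptions, not part of the course materials.

```python
import numpy as np


def rms_loop(values):
    """Root mean square computed with an explicit Python loop."""
    total = 0.0
    for v in values:
        total += v * v
    return (total / len(values)) ** 0.5


def rms_vectorized(values):
    """Same quantity computed with whole-array NumPy operations."""
    arr = np.asarray(values, dtype=np.float64)
    return float(np.sqrt(np.mean(arr * arr)))


# Hypothetical sample data: 100,000 normally distributed values.
rng = np.random.default_rng(seed=0)
samples = rng.normal(size=100_000)

loop_result = rms_loop(samples)
vec_result = rms_vectorized(samples)

# Both approaches agree up to floating-point rounding.
print(np.isclose(loop_result, vec_result))
```

The vectorized version pushes the loop into compiled NumPy code, which is typically much faster on large arrays. Timing both functions (for example with `timeit`) on your own machine is a good way to see the difference.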

By the end of the course, participants will be able to write performant Python code for scientific and engineering applications, handle missing or complex data, apply transformations and aggregations on large datasets, and utilize robust numerical routines for modeling and analysis.
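To make the goals around missing data and aggregation concrete, here is a minimal sketch using Pandas. The `measurements` table, its `sensor` and `temperature` columns, and the choice to fill gaps with the per-group mean are all assumptions made for illustration. Other strategies, such as dropping rows or interpolating, may suit real data better.

```python
import numpy as np
import pandas as pd

# Hypothetical readings; NaN marks a missing measurement.
measurements = pd.DataFrame(
    {
        "sensor": ["a", "a", "b", "b", "b"],
        "temperature": [20.0, np.nan, 18.0, 19.0, np.nan],
    }
)

# Mean temperature of each sensor, aligned row by row with the original table.
group_mean = measurements.groupby("sensor")["temperature"].transform("mean")

# Fill each missing reading with the mean of its own sensor group.
filled = measurements.assign(
    temperature=measurements["temperature"].fillna(group_mean)
)

# Aggregate per sensor: mean temperature and number of readings.
summary = filled.groupby("sensor")["temperature"].agg(["mean", "count"])
print(summary)
```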

Prerequisites

  • Basic experience with Python

  • Basic experience working in a Linux-like terminal

  • Some prior experience working with datasets, large or small

Learning outcomes

This material is for researchers and engineers who work with large or small datasets and who want to learn powerful tools and best practices for writing more performant, parallelized, robust, and reproducible data analysis pipelines.

By the end of this module, learners should:

  • Have a good overview of available tools and libraries for improving performance in Python (link to leaves in skill tree)

  • Know libraries for efficiently storing, reading, and writing large data (link to leaves in skill tree)

  • Be comfortable working with NumPy arrays and Pandas dataframes for data analysis using Python (link to leaves in skill tree)

Credit

Don’t forget to check out additional course materials from XXX. Please contact us if you want to reuse these course materials in your teaching. You can also join the XXX channel to share your experience and get more help from the community.

License

Note

To module authors: For code, you may use any OSI-approved license listed at https://spdx.org/licenses/, such as Apache License 2.0, GNU GPLv3, or MIT. Please make sure to update the deed above and the LICENSE.code file accordingly.