Talk on Tyler Collins

Conquering the Scheduler

Tue, 22 Mar 2022 00:50:54 -0400

Coming from a slightly different angle this time, I found that researchers were often isolating themselves to less (strictly fewer!) resources on HPC systems by not investigating what the node feature mixture looked like.

As such, this talk was created to help direct potentially abstract development efforts towards optimizing for the feature sets that are most available on an HPC cluster.

Below is my abstract for the talk as well as the recording:

“Determining the optimal job configuration for a given workload on HPC systems can be a difficult problem. Researchers often have different job needs, different responsiveness requirements, and different scales. This webinar will discuss these differences and how to investigate making the scheduler as responsive as possible. Topics will include whole node scheduling, by core MPI jobs, GLOST, META, and more. This presentation will assume basic knowledge of job submission, and the Linux environment. Practical examples will be discussed and used as introductions to new tools to maximize performance on the general purpose systems. Open questions will be allowed at the end of the seminar.”

Pandas Recipes for New Python Users

Mon, 21 Mar 2022 19:16:48 -0400

Eventually I got to the point in data analytics where keeping things in lists, or lists of lists, was no longer quite cutting it. My processing was slowly starting to grind to a halt, and things were getting way too abstract.

I decided to call up a friend who had worked in the business longer than me, and they suggested “pandas”. I was vaguely familiar with it, as users/clients had used it in the past. After reading the documentation casually, a “DataFrame” did sound like it would take care of a lot of my problems…

Fast forward a year and pandas is now core to everything I do in Python. I couldn’t live without it anymore, and as such my second talk at SHARCNET was about pandas.

Below is my abstract for the talk as well as the recording:

“Often programmers find themselves in need of an effective way of working with “labeled” data. In the case of Python, Pandas is the most mature and reliable package that interacts effectively with other well known packages such as NumPy, and TensorFlow. As a package, Pandas is said to provide “fast, flexible, and expressive data structures” for what is known as “labeled data”. Features include: easy handling of missing data points, grouping functionality, simple indexing, time series support, numerous conversion functions, and more. This webinar will provide a basic introduction on how to install Pandas, a discussion of its strengths and various use cases, and lastly a demonstration of various common operations (recipes) that occur with labeled data. Experience with beginner Python concepts will be expected, while familiarity with Jupyter notebooks will be helpful. Webinar material and code will be made available on GitHub for future reference.”
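To give a flavour of the kind of recipes the abstract mentions, here is a minimal sketch touching missing data, grouping, and simple indexing. It is not taken from the webinar material; the job records and column names (`user`, `cores`, `runtime_h`) are made up purely for illustration.

```python
import pandas as pd

# Hypothetical labeled data: a few job records, one with a missing runtime.
df = pd.DataFrame(
    {
        "user": ["alice", "bob", "alice", "bob"],
        "cores": [4, 8, 4, 16],
        "runtime_h": [1.0, None, 2.0, 3.0],
    }
)

# Missing data: fill the gap with the mean of the known runtimes.
df["runtime_h"] = df["runtime_h"].fillna(df["runtime_h"].mean())

# Grouping: total core-hours consumed per user.
df["core_hours"] = df["cores"] * df["runtime_h"]
per_user = df.groupby("user")["core_hours"].sum()

# Simple indexing: select rows with a boolean mask and a subset of columns.
big_jobs = df.loc[df["cores"] >= 8, ["user", "cores"]]

print(per_user)
print(big_jobs)
```

Filling with the column mean is only one of several reasonable choices here; dropping the row with `dropna()` would be just as valid depending on the analysis.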

Cython: A First Look

Sun, 20 Mar 2022 14:40:38 -0400

Back when I first got hired at SHARCNET, I used a lot of Python. I mean a lot. What this meant is that I quickly became the lightning rod for all Python related questions (and commentary).

During a fun Friday chat, a colleague remarked that Python was on average 40x slower than C++. I defended my current language of choice saying it was better than that, surely. To make a long story short, I was wrong. It really is about 40x slower depending on the problem. Determined to prove myself capable, and my language of choice a bit more defensible, I decided to look into ways to make Python faster.

I eventually landed on Cython. Turns out the best way to make Python faster was to use as much C++ as possible.

Below is my abstract for the talk as well as the recording:

“Often we write programs in Python for convenience, not for speed. When work becomes elevated to High Performance Computing (HPC) environments, speed once again becomes a concern. Cython is an extension of Python which allows functions to be compiled as C (or C++) and recover the significant performance trade-offs of Python. Cython achieves this by supporting calling C functions, declaring of type information, as well as providing access to C++ STL functionality. Popular packages and libraries that take advantage of Cython include: TensorFlow, OpenCV, NumPy, Pandas, and more. This webinar will cover a basic introduction to Cython, a demo translating vanilla Python into Cython, followed by a short demo of how to run Cython in our own Compute Canada HPC environments. Experience with Python will be expected, while familiarity with C/C++ and Jupyter notebooks will be helpful. Webinar material and code will be made available on GitHub for reference.”