Better dataframes

Sunday 10:35 AM–11:05 AM in Eureka 2

Dataframes are an abstraction that has proven extremely useful for data analysis in dynamic languages like S, R, Python, and Julia. The Pandas package has been dominant in Python for around 15 years, but its design is now showing its age. There is now a vibrant and messy ecosystem of potential disruptors to the status quo for data analysis tasks in Python.

This talk will help you make sense of the mess. It will give you a comprehensive review of the strengths and weaknesses of the challengers, including Polars, Ibis, Modin, Dask, and the PySpark Pandas API (formerly known as Koalas). It will also review efforts to unify the PyData landscape, such as Apache Arrow, the dataframe interchange protocol, Narwhals, and the Ibis project started by Wes McKinney, the original author of Pandas.

The talk will provide context and guidance for deciding which dataframe library to choose for your next project. It will also explain the best ways to offer cross-dataframe support in your library code.

Edward Schofield

Ed is the founder of Python Charmers (https://pythoncharmers.com), which has trained around 6000 people in data science using Python. Its trainees have come from organizations like Atlassian, Barclays, Cisco, CSIRO, Dolby, Harvard University, IMC, Interpol, Singtel Optus, Oracle, Shell, Telstra, Toyota, Verizon, and Westpac. Ed is a former release manager of SciPy and the author of the widely used future package. He organized the Python user group in Melbourne for 8 years.

Ed holds a PhD in machine learning (language models) from Imperial College London. He also holds BA and MA (Hons) degrees in mathematics and computer science from Trinity College, University of Cambridge. He has 25 years of experience in programming, teaching, and public speaking.