Verifying and evaluating scientific results with the open source package "scores"

Friday 10:00 AM–10:30 AM in Door 12 / Goldfields Theatre

Part of the Scientific Python specialist track

Verifying, evaluating or interpreting complex data requires specialist tools and methods. Many data scientists, programmers and scientists will be familiar with some evaluation metrics such as accuracy, mean squared error or true positive rate. In many situations, however, these scores are insufficient for assessing the correctness, accuracy or suitability of a model or prediction. The challenge of verifying models and predictions affects most fields of science and engineering, as well as many machine learning applications.
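
As a small illustration of how a familiar score can mislead, the sketch below uses made-up labels for a rare event and a hypothetical forecast that never predicts it. Neither the data nor the helper functions come from the talk or from "scores"; they are assumptions chosen only to show that accuracy can look excellent while the true positive rate reveals that no event was ever caught.

```python
# Hypothetical data: 1 = event occurred, 0 = no event (5 events in 100 cases).
observed = [1] * 5 + [0] * 95
# A trivial, hypothetical forecast that never predicts the event.
predicted = [0] * 100


def accuracy(obs, pred):
    """Fraction of cases where the prediction matches the observation."""
    return sum(o == p for o, p in zip(obs, pred)) / len(obs)


def true_positive_rate(obs, pred):
    """Fraction of observed events that were also predicted (hits / events)."""
    events = sum(obs)
    hits = sum(1 for o, p in zip(obs, pred) if o == 1 and p == 1)
    return hits / events


acc = accuracy(observed, predicted)
tpr = true_positive_rate(observed, predicted)
print(f"accuracy = {acc:.2f}, true positive rate = {tpr:.2f}")
```

Judged on accuracy alone, this forecast appears to be right 95% of the time, even though it is of no use for anticipating the event.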

This talk will introduce "scores", an open source Python package for verifying and evaluating labelled, n-dimensional (multidimensional) data at any scale. "scores" includes over 50 metrics, statistical techniques and data processing tools. The software repository is at https://github.com/nci/scores and the documentation is at https://scores.readthedocs.io/ .

This talk is suitable for beginner, intermediate and expert audiences. Developers and data scientists who are familiar mainly with tabular data, such as that supported by the pandas library, may be interested in the additional functionality offered by "scores" (and the xarray library it uses). For those learning about more advanced methods, every metric and statistical test has a companion Jupyter Notebook tutorial. Expert users already familiar with these ideas may be interested in some of the novel scoring methods not commonly found in other packages.

Come to this talk to hear about:

The difference between tabular data, n-dimensional data, and labelled n-dimensional data
Examples of using a common metric from "scores" on labelled, n-dimensional data
Examples of using "scores" for interrogating data in multiple dimensions
Examples of where basic methods overlook important considerations
Examples of using some of the more complex metrics in "scores"
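
To give a flavour of what "labelled, n-dimensional data" means in practice, here is a minimal sketch using plain xarray and NumPy. It deliberately does not call the "scores" API, whose function signatures are not given in this abstract. The dimension names (lead_time, station) and all values are hypothetical. It computes mean squared error once over everything and once per lead time, selecting the dimension by name rather than by axis position.

```python
import numpy as np
import xarray as xr

# Hypothetical forecast and observation grids: 2 lead times x 3 stations.
coords = {"lead_time": [24, 48], "station": ["A", "B", "C"]}
forecast = xr.DataArray(
    np.array([[21.0, 18.0, 25.0], [20.0, 17.0, 27.0]]),
    dims=("lead_time", "station"),
    coords=coords,
)
observed = xr.DataArray(
    np.array([[20.0, 18.0, 24.0], [22.0, 15.0, 24.0]]),
    dims=("lead_time", "station"),
    coords=coords,
)

# xarray aligns the two arrays by their labels before subtracting.
squared_error = (forecast - observed) ** 2

# A single number: the mean over every dimension.
overall_mse = float(squared_error.mean())

# One value per lead time: average over the "station" dimension only.
mse_by_lead_time = squared_error.mean(dim="station")

print(overall_mse)
print(mse_by_lead_time)
```

The single overall value hides the fact that the error at the longer lead time is several times larger than at the shorter one, which is the kind of detail that becomes visible when a score is interrogated along a named dimension.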

Tennessee Leeuwenburg he/him

Tennessee Leeuwenburg is a data scientist and software developer with over 20 years of experience. He has an interest in open source software, machine learning, and forecast verification. His current research work includes the development of scientific machine learning models for weather and environmental prediction. For an overview of his recent publications, please visit https://orcid.org/0009-0008-2024-1967 .