PyCon AU 2024

Andrew Lonsdale

Andrew had a background in software engineering before deciding to return to study bioinformatics in 2010. After completing the MSc, he was a research assistant and PhD student studying plant cell walls before crossing over to work on human biology in cancer and kidney projects. After submitting his thesis, he began work as a postdoctoral researcher at the Peter MacCallum Cancer Centre, where he has continued his research into the transcriptome of cancers. Andrew is a strong advocate for the discipline of bioinformatics, and enjoys teaching computing and bioinformatics skills.


Session

11-22
16:50
30min
Avocado, Cheese, Grape, Tomato or: How I Used Python to Stop Worrying and Love Emoji in Bioinformatics
Andrew Lonsdale

Bioinformatics is the science of understanding and analysing biological information, such as the genetic information contained in DNA. It combines the disciplines of biology, computer science, and mathematics. If this seems daunting, don’t panic, because this talk will focus on two open-source Python packages I have developed, FASTQE and Biomojify, that make common bioinformatics file formats intuitive and accessible… by using emoji.

FASTQE simplifies DNA sequencing data analysis by taking numerical quality scores for the data, and summarising them using emoji to quickly convey the good, the bad, and the ugly of sequence data quality. Whether for training, outreach, or debugging, this tool can easily turn unremarkable data quality analysis into an appealing visualisation.
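
The idea of turning quality scores into emoji can be sketched in a few lines of Python. This is an illustration only, not FASTQE's own code: the function names, the score thresholds, and the three emoji are assumptions chosen for this example. The one fixed point is the common FASTQ convention of storing each Phred quality score as an ASCII character with an offset of 33.

```python
# Illustrative sketch only: names, thresholds and emoji are assumptions,
# not FASTQE's actual API or score-to-emoji table.

def phred_scores(quality_line, offset=33):
    """Decode a FASTQ quality string into numeric Phred scores."""
    return [ord(ch) - offset for ch in quality_line]


def score_to_emoji(score):
    """Bin a score into good, middling or poor (hypothetical cut-offs)."""
    if score >= 30:
        return "😃"
    if score >= 20:
        return "😐"
    return "😡"


def summarise(quality_lines):
    """Mean score at each read position, rendered as one emoji per position.

    zip() stops at the shortest read, so longer reads are truncated.
    """
    columns = zip(*(phred_scores(q) for q in quality_lines))
    return "".join(score_to_emoji(sum(col) / len(col)) for col in columns)


if __name__ == "__main__":
    print(summarise(["II5#", "I55#"]))
```

Averaging per position, rather than per read, mirrors the usual way sequencing quality is summarised: quality tends to drop towards the end of a read, and a row of emoji makes that drop visible at a glance.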

Biomojify takes the concept further by converting plain text data to use emoji. In DNA, for example, the conventional format represents individual A, C, G, and T nucleotides as plain text. Biomojify substitutes them with emoji such as avocado, cheese, grape, and tomato. It supports various bioinformatics file formats and user-defined emoji mappings. It can be used to teach the underlying biological concepts behind bioinformatics data, by simplifying specialised data structures for a general audience.
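
The substitution itself is a simple lookup, as the following sketch suggests. It is not Biomojify's own code: `emojify`, `DEFAULT_MAP`, and the `unknown` placeholder are hypothetical names for this example. The avocado, cheese, grape, and tomato mapping comes from the talk title, and the optional `mapping` argument stands in for a user-defined mapping.

```python
# Illustrative sketch only: emojify, DEFAULT_MAP and the "unknown" placeholder
# are hypothetical names, not Biomojify's actual API.

DEFAULT_MAP = {"A": "🥑", "C": "🧀", "G": "🍇", "T": "🍅"}


def emojify(sequence, mapping=None, unknown="❓"):
    """Replace each nucleotide letter with an emoji.

    Entries in `mapping` override the defaults; characters with no
    entry (for example N) are shown as `unknown`.
    """
    table = {**DEFAULT_MAP, **(mapping or {})}
    return "".join(table.get(base, unknown) for base in sequence.upper())


if __name__ == "__main__":
    print(emojify("GATTACA"))
    print(emojify("GATTACA", mapping={"A": "🍎"}))
```

Merging the user's mapping over the defaults means a custom mapping only needs to list the letters it changes.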

Science communication is hard. These tools transform complex bioinformatics data into engaging, emoji-based visualisations, making bioinformatics concepts more accessible and adding an element of fun to scientific education and communication.

Scientific Python
Eureka 2