Cover stories (112:10): Using machine learning to link climate, phylogeny & leaf area

The cover image for our October issue shows the first part of an automated process to extract leaf area from herbarium images. Here, the model’s leaf predictions are shown on the pressed plant specimen of Corymbia gilbertensis. The image relates to the Editor’s Choice article ‘Using machine learning to link climate, phylogeny and leaf area in eucalypts through a 50-fold expansion of leaf trait datasets’, by Karina Guo et al. Here, Karina tells us the story behind the image:

Machine learning has recently captured the attention of the world, and science is no exception. Across different fields, it is driving advances in leaps and bounds. These include new computational capabilities that were not previously possible, such as accurate predictions of protein folding. They also include the generation of datasets at unprecedented scale, through the automation of previously mundane and repetitive tasks. Ecology, too, is moving fast to develop artificial intelligence and machine learning for making measurements, and as a consequence, discoveries are being made daily from the new datasets that are being unlocked.

As an ecologist who was taught to collect data manually, I found the experience discouraging in a counterintuitively encouraging way. The long hours and days of measuring, recording and transcribing data couldn’t help but spark a thought: “How can I make this easier? Surely there is a way to automate this?” While holding onto that idea in the back of my mind, I was also helping to digitise the collection of plant specimens in the National Herbarium of New South Wales. These two separate tasks just happened to overlap at the right time. I was going through hundreds of herbarium sheets and realised that centuries of information across the globe were just waiting for someone to transcribe them into a digital dataset we can analyse. But the sheer volume of data from just one herbarium was enough to fill someone’s life’s work, never mind all the herbaria across the globe. I knew I had to create a way to automate it. Leaf area is one of the many things that can be extracted from a herbarium sheet, and it has been extensively studied in multiple families and genera. However, pre-existing datasets were relatively small, and as such leaf area was a good starting place. I developed and honed my machine learning model on eucalypts, a critical group of Australian trees now grown across the world as important forestry trees. With this model, I created the first dataset of such an expansive size. This allowed me to combine the new technology with herbarium data to unlock an understanding of evolutionary changes through time.

It wasn’t all smooth going. As an early career researcher, the thought of diving into the deep end of the very deep pool that is machine learning was undeniably intimidating. Everything about it was new: the terminology, the concepts, the software and the code. Tutorials and troubleshooting forums online helped at the time. But the linchpin of the project was definitely the experts around me, who supported the work and brought to life the successful model pictured here. I’ll always be appreciative of them, and of the bountiful support, knowledge and time they’ve offered. So, if you’re reading this, thank you.

All in all, this project was a journey, one that has shaped how I think about the world. I’m more aware of the new technologies around us, and with how fast things are evolving in machine learning, it is a space to watch. However, I look at new technology with both eyes open – one for the criticism and the other for the opportunities it can bring. We need to encourage new software, hardware and techniques in our research, because the model I’ve created is just one small example of what machine learning can help achieve in ecology.

Read the full article online: Using machine learning to link climate, phylogeny and leaf area in eucalypts through a 50-fold expansion of leaf trait datasets
