January 31, 2024

What's new in ML with iNaturalist

We've been doing a bunch of exciting work over the past few months, and I'm excited to share some of it today.

New Hardware

Towards the end of 2023 we bought a new server for training. Part of what we've been doing in January is setting up and burning in the hardware, but we finished with that yesterday. Starting in February, we'll be putting it to use.

Our first experiment with the new hardware will be a simple one. When we made the export for our 2.11 model, we made another export at the same time, with the same taxonomy, but a different random selection of photos. We've always wanted to quantify how much of the run-to-run accuracy variance is due to sampling. We also want a straightforward experiment to estimate the performance of the new hardware.
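
To make the setup concrete, here's a minimal sketch of drawing two exports that share a taxonomy but differ only in which photos get picked. The function name, the photos_by_taxon mapping, and the per-taxon cap are illustrative assumptions, not our actual export code.

```python
import random


def sample_export(photos_by_taxon, max_per_taxon, seed):
    """Pick up to max_per_taxon photos per taxon using a seeded RNG.

    photos_by_taxon is a hypothetical mapping of taxon id -> list of photo ids.
    """
    rng = random.Random(seed)
    export = {}
    for taxon_id in sorted(photos_by_taxon):
        photos = photos_by_taxon[taxon_id]
        k = min(max_per_taxon, len(photos))
        export[taxon_id] = sorted(rng.sample(photos, k))
    return export


# Toy data: 3 taxa with 50 candidate photos each.
photos_by_taxon = {t: list(range(t * 1000, t * 1000 + 50)) for t in (1, 2, 3)}

export_a = sample_export(photos_by_taxon, max_per_taxon=10, seed=1)
export_b = sample_export(photos_by_taxon, max_per_taxon=10, seed=2)

# Same taxonomy in both exports; only the photo selection differs.
same_taxonomy = export_a.keys() == export_b.keys()
```

Training one model per export and comparing their accuracies would then give a rough read on how much of the variance comes from sampling alone.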

New Software and Algorithms

Keras 3

We train using TensorFlow and Keras, and the Keras team released version 3 late last year. I spent a few weeks recently experimenting with Keras 3 to see if we could take advantage of it. On the positive side, it'll allow us to further simplify our training code. Unfortunately it doesn't look quite ready for us (multi-GPU training doesn't seem to work yet), but hopefully we'll be able to adopt the new version later this year.

EfficientNetV2

I also spent some time experimenting with EfficientNetV2, an exciting new model architecture that was designed in part by neural architecture search. It's an evolution of our current architecture (Xception) and it should result in better accuracy, faster training, and more battery-efficient prediction on mobile / edge devices.

Progressive Learning

Finally, while reading the EfficientNetV2 paper I learned about the concept of progressive training. The idea is that instead of training the entire run with the same size images, we could start by training on very small images (say 100x100 pixels) and then, over the duration of the training run, increase the size of the training images to our full size (299x299). There is some cool logic behind this:

  • training on smaller images is faster,
  • training on smaller images takes less GPU memory, and
  • training on progressively more detailed images is a kind of curriculum learning, where the model starts with a relatively easier task and gets a steadily more complicated task as the training run continues.
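
As a rough sketch of the idea, here's one way to pick an image size for each epoch. The linear ramp and the function name are assumptions for illustration; a handful of discrete size stages would work just as well, and this isn't our actual training code.

```python
def image_size_for_epoch(epoch, total_epochs, start_size=100, final_size=299):
    """Linearly grow the training image side length from start_size to final_size."""
    if total_epochs <= 1:
        return final_size
    fraction = epoch / (total_epochs - 1)
    return round(start_size + fraction * (final_size - start_size))


# One image size per epoch for a 60 epoch run.
sizes = [image_size_for_epoch(e, total_epochs=60) for e in range(60)]
```

The first epoch trains on 100x100 images and the last on 299x299, with the sizes in between never shrinking.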

Here are some results comparing progressive learning vs non-progressive learning with a few different iNat datasets.

Ladybird Beetles

First, I experimented with a small dataset of photos from 120 species of ladybird beetles, culled from the iNat Open Dataset.
Here's what the validation accuracy looks like, across 60 training epochs:

What the above chart doesn't show is that early epochs of the progressive learning training run executed much more quickly. Here's validation accuracy charted vs time:

This is exciting because we've achieved the same accuracy on previously unseen validation data in 1/4 the time.

Molluscs

My next step was to experiment with a larger dataset of photos from 1,000 species of mollusc, also from the iNat Open Dataset.
Here's validation accuracy against time:

This kinda looked OK, but I was curious about the accuracy dip at 26 hours. Turns out, it coincided with the switch from 200x200 images to 299x299 images.

The EfficientNetV2 paper says this can happen with progressive learning, and offers a theory of why: applying strong regularization to smaller images can hurt the model's ability to progressively learn. They suggest scaling regularization alongside image size over the training run.
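
Here's a minimal sketch of what scaling regularization with image size could look like, using dropout as the example. The 0.1 and 0.3 rates and the linear interpolation are placeholder assumptions, not values taken from the paper or from our runs.

```python
def dropout_for_size(size, min_size=100, max_size=299, min_rate=0.1, max_rate=0.3):
    """Use weaker dropout on small images and stronger dropout on large ones."""
    fraction = (size - min_size) / (max_size - min_size)
    fraction = min(1.0, max(0.0, fraction))  # clamp sizes outside the range
    return min_rate + fraction * (max_rate - min_rate)


# Dropout rate for a few image sizes along the way.
rates = {size: dropout_for_size(size) for size in (100, 200, 299)}
```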

Regularization is basically constraining the model at train time to improve its performance on previously unseen data (like we see with val accuracy). We typically use dropout and augmentation as regularization techniques. I decided to try tweaking the batch sizes - early epochs with small images would get very large batch sizes (easier to train on, since the model sees more examples from more classes at once and doesn't overlearn just a few class features per batch), and later epochs with larger images would get smaller batch sizes (harder to train on). This is possible because with smaller images, we can fit more images into GPU memory at once.
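
One simple way to pick those batch sizes is to assume GPU memory use grows roughly with the number of pixels per image, and scale the batch size by the inverse. That assumption, the base batch size of 32, and the rounding to a multiple of 8 are all illustrative, not the settings used in these runs.

```python
def batch_size_for_image_size(size, full_size=299, full_size_batch=32, multiple_of=8):
    """Scale batch size inversely with pixel count, rounded down to a multiple."""
    scale = (full_size / size) ** 2
    batch = int(full_size_batch * scale)
    return max(multiple_of, (batch // multiple_of) * multiple_of)


# Batch size for each image size in a three stage schedule.
schedule = {size: batch_size_for_image_size(size) for size in (100, 200, 299)}
```

Under these assumptions the 100x100 stage gets a batch of 280 images, the 200x200 stage gets 64, and the full size stage stays at 32.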

Here we see that the validation accuracy of the progressive training run with progressive batch sizes finishes incredibly quickly. It appears that it could be sped up further by reducing the amount of time training with medium and large size photos. But most exciting to me is the overall accuracy - by progressive learning and also progressively changing the batch size, we reach a ceiling of 3.5% higher val accuracy than we did with just progressive learning or without progressive learning at all.

Next steps with progressive learning

There's more work we have to do here: how well does this technique generalize to very large datasets like iNat's full dataset, with 80,000 taxa and millions of images? How well does this work with our transfer learning strategy to train multiple models per year on just a few GPUs?
However, it's very promising and I'm excited.

Going forward

In 2024 I hope to blog more often, describing what I'm working on. Beyond what I've listed here, we have a lot of other ML-based projects coming up that we're very excited about: improvements to our geo models, better evaluation metrics, digging into individual problem taxa that our models struggle with, and exploring areas that are new for us, like bounding boxes or visual / mixed-modal transformers.

Happy 2024, iNat!

Posted on January 31, 2024 10:24 PM by alexshepard | 1 comment

November 28, 2022

d3 Chart Race with iNat Observations

Continuing with my d3 & data visualization experiments, here's a chart race of iNaturalist Observations from 2008 to 2022, for US States.

Still working on how to best embed an external data viz like this into an iNaturalist journal post.

Posted on November 28, 2022 06:52 AM by alexshepard | 3 comments

October 31, 2022

iNat Activity Visualizations

I've been personally interested in digital and generative art for a long time. In art school I did a bunch of work in Processing, and I've also done generative art in p5.js and Swift. Lately I've been getting into data visualization, just for fun. For the most part I've just been goofing around with d3.

I've always liked the contributions visualization on the GitHub profile page, and I thought it'd be fun to try to re-create this in d3, but visualizing iNat observations instead of GitHub checkins. So this weekend I whipped something up in d3, just for fun.
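
The charts themselves are drawn in d3, but the data prep behind a GitHub-style grid is simple enough to sketch. Here's a rough Python version that buckets observation dates into week columns of seven days each. The function name, the Sunday-first layout, and the input format are assumptions for illustration, not the code behind these charts.

```python
from collections import Counter
from datetime import date, timedelta


def contribution_grid(observed_on, year):
    """Return week columns for a year, each a 7 item list of daily counts.

    Weeks start on Sunday. Cells that fall outside the year are None.
    """
    counts = Counter(d for d in observed_on if d.year == year)
    first = date(year, 1, 1)
    last = date(year, 12, 31)
    # Back up to the Sunday on or before January 1.
    start = first - timedelta(days=(first.weekday() + 1) % 7)
    weeks = []
    day = start
    while day <= last:
        week = []
        for _ in range(7):
            week.append(counts[day] if first <= day <= last else None)
            day += timedelta(days=1)
        weeks.append(week)
    return weeks


# Toy input: two observations on January 1 and one on July 4.
grid = contribution_grid([date(2022, 1, 1), date(2022, 1, 1), date(2022, 7, 4)], 2022)
```

Each cell's count would then map to a color, with darker squares for busier days.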

Here's what my activity chart for 2022 looks like. Pretty grim, I've been a bad naturalist this year.

Let's see what some of my co-workers have been up to.

Here's @tiwane:

Here's our fearless leader, @kueda:

And here's @sambiology, making us look like slackers:

Well, at least @carrieseltzer from the iNat team has been super consistent:

Posted on October 31, 2022 12:56 AM by alexshepard | 3 comments

October 2, 2022

Walk to End Alzheimer's

Hi folks, I'll be participating in the Walk to End Alzheimer's this year. As some of you know, my father suffers from Alzheimer's. Because it runs in families, I may face it myself as I grow older.

Please don't feel any pressure to donate, but if you're able to consider a tax-deductible donation towards a good cause this year, here's my fundraising page: http://act.alz.org/site/TR?fr_id=15425&pg=personal&px=21536111

Posted on October 2, 2022 07:55 PM by alexshepard | 0 comments

December 28, 2018

iOS Release 2.7.15, camera fixes and more

Hi folks,

We released iOS app version 2.7.15 yesterday, which has a fix for layout issues in the camera. It’s also got some updated translations, a new Donate button in Settings, and some other minor updates.

As always, please report bugs here or in the Google Group or to help@inaturalist.org.

Happy holidays! I’m in Kitsap County, Washington visiting my Mom for the week, seeing lots of fungi.

Best,
alex

Posted on December 28, 2018 06:13 PM by alexshepard | 23 observations | 0 comments

June 1, 2018

New Release for iOS: translations and an update for NatureWatchNZ

On May 31, we released version 2.7.8 of the iNaturalist iOS app. It contains a bunch of updated translations for many languages, and one new language, Romanian. Thanks again to the iNat volunteer translators!

The other change in version 2.7.8 is an updated URL for the NatureWatchNZ partner site.

As usual, if you find anything wrong with the iOS app, please let me know here or in our Google Group, or by emailing help@inaturalist.org.

Posted on June 1, 2018 10:13 PM by alexshepard | 0 comments

May 1, 2018

New Translations for iOS!

On April 26th, 2018 we released version 2.7.7 of the iNaturalist iOS app. The main change in this version is four new translations: Czech, Danish, German and Turkish. Thank you to the iNat volunteer translators!

Our mobile translations are done on a platform called CrowdIn. If you find a mistranslation, or if you are bilingual and want to help make iNat work in a language that we don’t support yet, please join our translation effort here: iNaturalist Mobile on CrowdIn.

In other iOS news, iNaturalist was App of the Day on the iOS App Store last week. Apple wrote a nice story on how iNat works and also made a delightful animated illustration to accompany the story. We got a nice bump in new users from it, just in time for City Nature Challenge.

As usual, if you find anything wrong with the iOS app, please let me know here or in our Google Group.

Posted on May 1, 2018 10:04 PM by alexshepard | 0 comments

April 9, 2018

New species for me, week of April 9

Ever since Seek was released, I've been thinking more about finding species that are new to me, rather than just making new observations. Seek will give you the most common species around you, filtered by what you've seen in Seek, but it's not integrated with iNaturalist (yet). On Android you can get the most common species that you haven't already seen on iNaturalist using a feature called Missions, but we don't have that on iOS yet either. So I'm using the web to come up with my list.

Here's the URL I use to come up with my species challenges for the week:
https://www.inaturalist.org/observations?lat=37.78&lng=-122.46&radius=70&month=3,4,5&unobserved_by_user_id=alexshepard&view=species

You can generate yours by editing lat=37.78 and lng=-122.46 to include the coordinates of your location and editing unobserved_by_user_id=alexshepard to include your iNat username. This also only shows species seen in March, April or May. If you want to use this at another time of year, you might want to edit the month=3,4,5 part of the URL.
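
If you'd rather not edit the URL by hand, here's a small Python sketch that builds it from the same parameters. The function name and its defaults are just for illustration; the query parameters are the ones in the URL above.

```python
from urllib.parse import urlencode


def species_challenge_url(lat, lng, username, radius=70, months=(3, 4, 5)):
    """Build an observations URL of species near a point that a user hasn't observed."""
    params = {
        "lat": lat,
        "lng": lng,
        "radius": radius,
        "month": ",".join(str(m) for m in months),
        "unobserved_by_user_id": username,
        "view": "species",
    }
    # safe="," keeps the month list readable instead of encoding commas as %2C.
    return "https://www.inaturalist.org/observations?" + urlencode(params, safe=",")


url = species_challenge_url(37.78, -122.46, "alexshepard")
```

Calling it with your own coordinates, username, and months gives you your own version of the list.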

Here are my challenges for the next week:

And attached to this post are observations for the challenges I made for myself last week. I went out looking for these five species this past week, and got them all. I'm particularly proud of the Scarlet Pimpernel, because I'm red-green color blind, so I'd normally never notice those flowers in the wild.

Posted on April 9, 2018 06:52 PM by alexshepard | 5 observations | 1 comment

April 3, 2018

iNaturalist for iOS version 2.7.6 released

Hi folks,

I released 2.7.6 of iNaturalist for iOS yesterday. What's new that you'll see in the app:

  • We should now display the taxon names for observations on the Me tab in your device's language, if the names are available
  • Lots of translation updates
  • Updates to the login screen which hopefully help people find the "already have an account" button (#424)
  • New tab bar icons (#418)
  • You can now copy taxon names from the taxon details screen, either by tapping on the taxon name or by the new share button (#389)
  • If you've been suspended from iNat, first of all shame on you 😉 but second of all the app will at least tell you (#420)
  • Fixed location of the dots on the intro slideshow on iPad (#421)

Some under the hood things that are new in this release:

  • Upgraded to the latest Facebook SDK for login, if you're into that kind of thing.
  • Switched the way we determine network reachability (in and out of service range, online/offline, etc) to use a newer framework.
  • Switched to using the iNaturalist Node API for marking updates as seen, agreeing to IDs, and making IDs

As always, please let us know if you run into any problems, preferably by emailing help@inaturalist.org or posting on our Google Group.

Thanks,
alex

Posted on April 3, 2018 07:18 PM by alexshepard | 0 comments

March 12, 2018

New App: Seek by iNaturalist

Hi folks,

The iNaturalist team recently got a great opportunity to work with the folks at the Howard Hughes Medical Institute and Tangled Bank Studios on a new app to be released in tandem with their film Backyard Wilderness. They wanted a kid-friendly app that was all about discovering the nature around you. Sounds a lot like iNat, huh? 😀

So we built an iOS app called Seek on top of the iNaturalist APIs, with a few important differences compared to the existing iNaturalist apps. Observations contain some sensitive information that we don't want to reveal about children, including where they are and when they are active (in the US it's actually illegal under the Children’s Online Privacy Protection Act (COPPA) for online platforms to record this information without explicit parental consent). Because the app is designed to be kid-friendly, we don't record any observation information. No activity in this new app becomes a record on inaturalist.org, but it uses the same computer vision model as iNaturalist to suggest identifications based on photographs taken by the users. It also suggests species that have been found in the area (based on an obscured location) and recorded on iNaturalist. Seek works because of the observations submitted and identified by the iNaturalist community, so it might work best in areas with active iNaturalist communities.

We're hopeful this app will be fun, and not just for kids. Please try it yourself, and please encourage any of your friends and family who are into nature to give it a try.

It's currently iOS only. You can download it from the iOS App Store here: Seek by iNaturalist. We'll be exploring how to make an Android version soon.

Cheers,
alex

Posted on March 12, 2018 06:55 PM by alexshepard | 10 comments