
Hearing a bird trill in the garden and being able to identify it in seconds is no longer science fiction. Thanks to artificial intelligence applied to sound, today we can identify bird species from their songs almost as if we had an ornithologist in our pocket. The novelty is that these tools are taking a leap forward: more and more projects aim to make this identification reliable even in extreme conditions, with no internet connection. This is crucial if you're traveling through forests, high mountains, or remote rural areas.
In parallel, enormous audio databases, carefully annotated by experts, are being published to train and improve these systems. This combination of recorders, algorithms, and citizen science is changing the way we monitor biodiversity and opening the door to song-based bird identification apps that work more stably, more quickly and, increasingly, in offline mode.
BirdNET: artificial intelligence to recognize birds by their song
One of the most cutting-edge projects in this field is BirdNET. Developed jointly by the K. Lisa Yang Center for Conservation Bioacoustics at the Cornell Lab of Ornithology and Chemnitz University of Technology, this tool is based on deep neural networks trained on thousands of hours of audio, and can suggest the most likely species from a recording of a song or call.
The BirdNET app allows anyone to record ambient sound with their Android phone's microphone and receive an estimate of which birds are singing within seconds. You can also upload previously recorded audio files, making it a highly flexible field tool: you can leave a recorder in a remote location, take the files with you, and analyze them later with the app or associated tools.
The system not only gives a name, but also indicates a probability level for each detected species. This is crucial when interpreting the results: the user can see which sounds have been detected in the sonogram, check the suggestions, and assess whether they make sense in the context (habitat, time of year, etc.). This combination of automatic suggestions and human verification is at the heart of BirdNET's approach.
How a computer learns to recognize birdsong
For a computer to distinguish a nightingale from a sparrow based on its song, it needs to be trained on a large number of labeled recordings. BirdNET uses deep learning models that analyze audio by transforming it into visual representations of sound, such as spectrograms, which show how the frequency content changes over time.
During training, the algorithm receives thousands of examples of songs from each species. Each fragment is labeled with the bird that is singing and, in many cases, the type of vocalization. Over time, the neural network learns to recognize characteristic patterns in the timbre, structure, and rhythm of each species. When it then hears a new sound, it compares it against those learned patterns to return a list of likely species.
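To make the spectrogram idea concrete, here is a minimal sketch in Python using only NumPy. The frame length, hop size, and the synthetic 2 kHz tone are illustrative assumptions, not BirdNET's actual parameters; the point is simply to show how raw audio becomes the time-frequency image such models are trained on.

```python
import numpy as np

def spectrogram(signal, frame_len=512, hop=256):
    """Magnitude spectrogram: one row per time frame, one column per frequency bin."""
    window = np.hanning(frame_len)
    frames = [np.abs(np.fft.rfft(signal[i:i + frame_len] * window))
              for i in range(0, len(signal) - frame_len + 1, hop)]
    return np.array(frames)

# Synthetic one-second "song": a pure 2 kHz tone sampled at 22.05 kHz.
sr = 22050
t = np.arange(sr) / sr
tone = np.sin(2 * np.pi * 2000 * t)

spec = spectrogram(tone)
peak_bin = spec.mean(axis=0).argmax()  # strongest frequency bin over time
peak_hz = peak_bin * sr / 512          # convert bin index back to Hz
```

In a real pipeline, thousands of these spectrogram slices, each labeled with a species, would be fed to a convolutional network; here the peak frequency simply confirms that the tone shows up where expected.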
This approach has a major advantage: BirdNET is not limited to a few local species, but has been trained to recognize more than 3,000 bird species from around the world, and more recent versions already cover more than 6,000 potential species. The more it is used and the more data is incorporated, the better its models become, resulting in a progressive increase in accuracy, especially for difficult species.
Advantages and limitations of automatic identification by song
Long-time BirdNET users emphasize that it is, above all, a tool designed for fieldwork. Once running, simply start recording on your mobile device and let the algorithm mark the segments where it detects song. The sonogram displayed by the application is also a very powerful educational resource: it lets you visualize the song and mentally relate the shape of the spectrum to what you hear, which is very useful for learning by ear.
Although accuracy is improving rapidly, automatic identification by sound is more difficult than identification by image. The reasons are varied: many phones have limited microphone quality, there is enormous variability in calls between individuals and populations, and background noise (traffic, wind, other species calling at the same time) can greatly complicate the analysis. Even so, regular users have noticed a notable jump in the success rate for certain species.
A major limitation of many current BirdNET setups is that, in the mobile app, identification is usually performed on remote servers. In other words, the recording is sent to the cloud, processed there, and the result is returned to the user. This means that, for now, data coverage is often needed to take full advantage of the app, something that is not always available in isolated natural areas.
BirdNET as an educational and citizen science tool
Beyond individual identifications, BirdNET was conceived from the beginning as a citizen science project. Users can submit their recordings, labeled as observations, contributing to a vast global database on bird distribution and phenology. This information is invaluable to researchers in ecology and conservation.
At the same time, using the app helps the general public get to know the species around them. Recording a bird song, viewing the sonogram, and comparing the results encourages people to learn about bird behavior, migration, and habitats. Resources like The Sound Approach project, with its excellent educational materials on sonograms and bird songs, fit perfectly with this approach of learning through both hearing and sight.
Avefy: Learn birdsongs through play
While BirdNET focuses on automatically identifying what is playing, other apps focus on active learning by the user. A good example is Avefy, an app designed as a kind of quiz game to train the ear and improve your ability to recognize songs and calls on your own.
Avefy works by presenting the user with recordings organized by ecosystem, what the app calls "soundscapes": a Mediterranean forest, a riverbank, high mountains, and so on. Within each soundscape, different species can be heard, and the user has to guess which birds they are hearing. Feedback is provided with each attempt, so that, little by little, the ear becomes more attuned and learning is consolidated.
This approach is reminiscent of older training materials for monitoring programs like SACRE, but in an updated format and with a wider variety of scenarios. In addition to the game, Avefy includes a song guide within the app itself, with recordings of (as far as we can tell) all Iberian species, allowing you to consult and review sounds both at home and, if you wish, in the field.
Learning at home versus identifying in the field
If we compare BirdNET with Avefy, we can see that they address two complementary needs. BirdNET is primarily an automatic identification tool for the field: you hear a song, record it, and the app suggests what it might be. Avefy, on the other hand, is designed more as a training and gaming platform, ideal for learning at home or in calm moments, without the pressure of having the bird in front of you.
In practice, using both applications together can be very powerful. BirdNET helps you clear up doubts when you're out in the field and can't identify a bird song, while Avefy trains your ear so that, over time, you become less reliant on technology to recognize common sounds. And, as a bonus, the Avefy guide can serve as a quick reference, just like the guides included in other platforms such as Merlin and eBird.
Merlin Bird ID: identification by song, photo, and questions
Another key player in this scenario is Merlin Bird ID, also developed by the Cornell Lab of Ornithology. Although it has become very popular for its song identification function, Merlin actually offers three main identification modes: by sound, by photo, and through a guided questionnaire about the observed bird.
The audio mode is very similar in experience to other systems: the user presses the record button, keeps quiet, and waits while the app listens. Merlin then displays a list of species it considers likely, based on the song and the location. It also lets you listen to other recordings of the same species to compare nuances, and the developers themselves insist that their suggestions are only a starting point: they always recommend comparing with the descriptions and example sounds on each bird's page.
Image recognition is another of Merlin's strengths. Simply take a photo or select one from your camera roll, and the app will attempt to identify which species appears in the photograph. In tests conducted by various media outlets, it correctly identified everything from a great cormorant in Madrid to a pink-backed pelican in Senegal. However, as with any automated system, it sometimes makes mistakes or fails to find matches when the image is unsuitable.
Merlin's third mode is the guided questionnaire, which is very useful for people with little identification experience. The app asks simple questions about color, size, behavior (whether the bird was on the ground, in the water, perched in a tree, in flight, etc.), geographic location, and date. With this information, it cross-references its knowledge base and returns a set of likely species from which the user can choose the best fit.
Offline usage and regional packages in Merlin
One of the attractions of Merlin Bird ID for those who travel in areas with poor coverage is its ability to work partially offline. The app lets you download regional bird packs, organized by geographic area, which include fact sheets, sonograms, distribution maps, and sounds of the common species in each area.
Thanks to these packages, many of the application's resources can be used in rural or mountain environments without needing a constant internet connection. This is not only practical for hikers and amateur birdwatchers, but also for researchers and volunteers in monitoring programs who conduct counts in remote areas where mobile coverage is unreliable or nonexistent.
Merlin is fully integrated with eBird, the large global citizen science platform for birds. Within eBird there is also an identification quiz mode which, like Avefy, lets users practice with both sounds and images. In this case, users can customize challenges by specific dates and locations rather than by ecosystem type, which helps train identification skills in contexts very similar to those they will encounter on field trips.
iNaturalist, Google Lens, and iPhone Visual Search
Although the focus of this article is birdsong, it is worth mentioning other tools that, while not exclusively focused on birds, rely on artificial intelligence to recognize species based on images. iNaturalist, Google Lens, and iPhone Visual Search are good examples of how AI has become a kind of "pocket biologist" for any curious person.
iNaturalist began as an academic project at the University of California, Berkeley, and today is a joint initiative of the California Academy of Sciences and the National Geographic Society. It's very simple to use: you take a photo of the plant, animal, or fungus you want to identify and upload it to the app. Within seconds, the system automatically suggests possible species based on computer vision models trained on millions of observations.
iNaturalist's greatest strength lies in its global community of users and experts, who help correct and refine the identifications. Each observation is georeferenced and dated, creating a gigantic biodiversity map in near real time. All this information is shared with scientific repositories such as GBIF (the Global Biodiversity Information Facility), making the project an invaluable data source for conservation and global change studies.
In the case of the iPhone's Visual Search, Apple integrates AI directly into the operating system. When you open a photo, if the system detects a recognizable element (plant, animal, monument, work of art), a special icon appears next to the information button. Tapping it gives the user access to basic information about the species, similar images, and external links. This type of recognition is largely performed on the device itself, taking advantage of the computing power of modern chips.
Google Lens performs a very similar function in the Android ecosystem. It can be used as a standalone application or through the Camera app on many phones. Lens analyzes images, compares recognized objects with visual databases, and assigns a probability to each possible result. If, for example, the AI considers it 95% likely that a dog is a German Shepherd and 5% likely that it is a Corgi, it will only show the first option, as it is the most probable. For plants and animals, in addition to suggesting the name, it offers a quick Google search for more information.
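As a toy illustration of that last step (the numbers below are purely hypothetical, not Lens's real output), collapsing a classifier's ranked probabilities into a single displayed label is just a matter of taking the highest-probability class:

```python
# Hypothetical class probabilities from an image classifier.
probs = {"German Shepherd": 0.95, "Corgi": 0.05}

best_label = max(probs, key=probs.get)   # class with the highest probability
best_prob = probs[best_label]

# Only the top suggestion is shown to the user.
result = f"{best_label} ({best_prob:.0%})"
```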
A global database with over 90,000 annotated songs
The qualitative leap in automatic song identification would not be possible without high-quality training data. In this respect, a recent milestone has been the publication of the first global database of detailed annotated bird songs, led by the Centre of Forest Science and Technology of Catalonia (CTFC) and described in a data paper in the journal Ecology.
This database brings together recordings made in 72 locations worldwide, encompassing more than 1,100 different species. The key is not just the volume of data, but the fact that, in each file, local ornithological experts have manually marked the precise moment each species sings, adding up to more than 90,000 tagged vocalizations. This level of detail provides invaluable material for training and evaluating acoustic recognition algorithms.
The dataset is open access and available on the Zenodo platform, making it easy for research teams around the world to use it both to improve existing tools like BirdNET and to develop new models, especially for species or regions that have been underrepresented until now. In fact, this database has already been used to evaluate, on a global scale, BirdNET's performance and optimal execution parameters, helping to refine its behavior in different contexts.
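To see why time-stamped expert annotations matter for evaluation, here is a minimal sketch. The annotation and detection tuples are invented for illustration and do not reflect the dataset's actual file format; the idea is that a detection counts as correct only if it overlaps an expert annotation of the same species.

```python
# Hypothetical expert annotations: (start_s, end_s, species).
annotations = [(0.0, 2.5, "Erithacus rubecula"),
               (3.0, 5.0, "Parus major")]

# Hypothetical detector output: (start_s, end_s, species).
detections = [(0.1, 2.4, "Erithacus rubecula"),
              (3.2, 4.8, "Parus major"),
              (6.0, 7.0, "Turdus merula")]   # spurious detection

def matches(ann, det):
    """Same species and overlapping time intervals."""
    return ann[2] == det[2] and min(ann[1], det[1]) > max(ann[0], det[0])

# Precision: fraction of detections backed by an annotation.
true_pos = sum(any(matches(a, d) for a in annotations) for d in detections)
precision = true_pos / len(detections)

# Recall: fraction of annotated songs that the detector found.
recall = sum(any(matches(a, d) for d in detections)
             for a in annotations) / len(annotations)
```

Sweeping a detector's confidence threshold and recomputing these two numbers against the expert labels is, in essence, how optimal execution parameters are chosen.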
Artificial intelligence and biodiversity monitoring
The combination of field observation, large open databases, and artificial intelligence is changing the way we monitor biodiversity. In a context of accelerated climate change and ecosystem transformation, having systems capable of automatically recording and analyzing the presence of species is a huge advantage for science and environmental management.
Automated systems do not replace ornithologists, but they can multiply their observational capacity. A set of autonomous recorders distributed across a territory, analyzed with algorithms such as BirdNET or other models derived from global databases, makes it possible to generate continuous time series showing which species are present, at what times of year, and with what relative frequency.
This type of information is essential for the early detection of changes in populations, shifts in distribution areas, or the arrival of invasive species. Furthermore, the open and reproducible nature of the data and models fosters more transparent science, where other teams can verify results, propose improvements, and adapt the tools to new realities.
As these technologies mature and become better integrated into mobile applications and accessible platforms, we can expect bird song identification tools to work increasingly well in offline mode, by downloading models and data packages directly to the device. The key will be balancing accuracy against the size of the models and local databases, leveraging both the power of remote servers and the computing capacity of current mobile devices.
Everything suggests that, in the coming years, hearing a song in the middle of the woods and determining within seconds what species it is, with high reliability and no need for coverage, will be commonplace, thanks to the convergence of projects such as BirdNET, Merlin, Avefy, and iNaturalist with the new global song databases that feed artificial intelligence. Share this article so that more people can discover these tools.
