Project CETI

Kai Miller (CDNP)

In 1970, the beautifully intricate sounds of whale vocalisations captivated a global audience with the release of the album ‘Songs of the Humpback Whale’, a collection of whale calls put together by biologist Roger Payne. At the time, however, the prospect of actually interacting with whales using their own methods of communication seemed unrealistic.


Decades after Payne’s album release, Project CETI (Cetacean Translation Initiative), led by biologist David Gruber, is getting ever closer to bridging the gap between sperm whale ‘clicks’ and human words. Breakthroughs in artificial intelligence, especially in machine learning, have produced algorithms that identify patterns in language and use them to predict upcoming words in specific contexts. Researchers at Project CETI are working to apply this kind of model to sperm whale vocalisations, with the goal of creating an AI whale ‘chatbot’. The project integrates robotics, computer science, and linguistics in a search for inter-species communication.


Codas

Sperm whales boast the largest brains of any animal on Earth, and their methods of communication are correspondingly sophisticated. They produce series of clicks, comparable to Morse code, with each series holding a distinct meaning or purpose. These series of clicks are called ‘codas’, and they form the basis of a complex cetacean communication system which CETI researchers are trying to decode. Codas hold structural similarities with human language in that each contains a ‘click’ pattern distinct from the others, much like how human sentences are made up of specific words which convey meaning when articulated in the correct order. Furthermore, sperm whales operate in complex social networks like humans do, forming tightly knit ‘pods’ made up of related female whales and their offspring, which collaborate in feeding and in nursing young. Codas are learned, not innate, and are passed down to young whales through matriarchal lineages. On a higher level, the whales form clans of up to ten thousand individuals, with each clan sharing similar movement patterns, hunting strategies and, most relevantly, coda repertoires. These repertoires are specific to the region a clan operates in: sperm whales in Dominica have a vastly different coda repertoire to those in Sri Lanka. The structural similarities between codas and human language, combined with the whales’ sophisticated social behaviour, make sperm whales the perfect research model for CETI in its quest for inter-species communication.


However, can we classify sperm whale codas as language, and if so, can we directly compare them with human language when building an AI chatbot? Every language must include phonetic components which can be consistently produced and recognised by the species; for humans, these components are the sounds generated by each letter or syllable in a word. Language also requires rules of syntax governing the arrangement of such phonetic building blocks into sentence-like chains of sounds. Lastly, every language must adhere to interpretation rules which assign meaning to those chains. Sperm whale codas consist of up to 40 clicks, with each coda lasting only a couple of seconds. Different codas are distinguished by the number of clicks they contain and the intervals between those clicks. Sperm whales exchange these vocalisations in conversations, similar to human conversations, which can last up to an hour. Returning to the classification question: we know that codas consist of phonetic building blocks, which whales exchange in a conversation-like manner. If each coda does in fact have a meaning, then whales would possess ‘duality of patterning’, a characteristic of human language which refers to the way meaningless sounds (“pl”, “ay”) can be combined to make meaningful words (“play”). However, the syntactic rules and semantics of whale codas remain obscure, leaving a plethora of questions unanswered. This is where CETI researchers will apply deep learning models which can decipher different codas by linking them to the context in which they are vocalised.
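To make this concrete, here is a minimal sketch of how a coda might be represented and compared in code, assuming, as described above, that codas are distinguished by their click count and the rhythm of their inter-click intervals. The pattern name and timings are hypothetical.

```python
# A minimal sketch of a coda as a sequence of click times (in seconds).
from dataclasses import dataclass

@dataclass
class Coda:
    """A single coda: the times at which each click occurs."""
    click_times: list[float]

    @property
    def n_clicks(self) -> int:
        return len(self.click_times)

    @property
    def intervals(self) -> list[float]:
        """Inter-click intervals: the 'rhythm' that distinguishes codas."""
        return [b - a for a, b in zip(self.click_times, self.click_times[1:])]

    def rhythm(self) -> list[float]:
        """Intervals normalised by total duration, so the same pattern
        matches whether it is produced quickly or slowly."""
        total = self.click_times[-1] - self.click_times[0]
        return [i / total for i in self.intervals]

# Two productions of the same hypothetical pattern at different tempos:
a = Coda([0.0, 0.3, 0.6, 0.7, 0.8])
b = Coda([0.0, 0.45, 0.9, 1.05, 1.2])
print(a.rhythm())  # ≈ [0.375, 0.375, 0.125, 0.125]
print(b.rhythm())  # ≈ [0.375, 0.375, 0.125, 0.125], the same rhythm
```

Normalising the intervals by total duration means the same click pattern is recognised regardless of the tempo at which a whale produces it.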


The application of deep learning

The bulk of Project CETI’s work revolves around the use of AI to process sperm whale codas and identify their meanings. This processing happens in stages. First, the AI must filter coda clicks out from other aquatic sounds, such as echolocation clicks or the noise of ship engines. Once only coda clicks remain, a machine learning algorithm identifies which whale is producing them, and contextual data supports this by showing what that whale is doing at the time of vocalisation. The deep learning model then processes the remaining data, identifying nuances within codas while calculating the statistical properties of each one (how often it is used, and which other codas it correlates with), resulting in a neural network of whale codas.
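As a toy illustration of that statistics stage, the sketch below counts how often each coda type occurs and which types tend to follow one another within an exchange. The labels and conversations are invented; the real pipeline operates on annotated recordings.

```python
# A toy version of the statistics stage: count how often each coda type is
# used and which types tend to follow one another. Labels are hypothetical.
from collections import Counter
from itertools import pairwise  # Python 3.10+

# Invented exchanges, standing in for recordings already filtered to coda
# clicks and attributed to individual whales by the earlier stages.
conversations = [
    ["1+1+3", "4R2", "1+1+3", "5R1"],
    ["4R2", "1+1+3", "4R2"],
    ["5R1", "1+1+3", "4R2", "1+1+3"],
]

usage = Counter(coda for conv in conversations for coda in conv)
following = Counter(pair for conv in conversations for pair in pairwise(conv))

print(usage.most_common())       # how often each coda is used
print(following.most_common(3))  # which coda tends to answer which
```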


Artificial neural networks give deep learning models their ability to take in data and produce suitable outputs, loosely mirroring the human brain. They are composed of layers of digital neurons: data enters at the input layer, passes through multiple ‘hidden’ layers, and emerges at the output layer. Each connection between one neuron and the next has a numerical ‘weight’, which can increase or decrease the strength of the connection. Initially, the weight of each inter-neuron connection is random. However, as more and more data is fed through the network, the weights are adjusted, strengthening connections between codas that are often produced together and weakening connections between unrelated ones. This constant tweaking of weights is driven by a process called ‘backpropagation’, which feeds the network’s prediction errors backwards through its layers, and it is the backbone of how deep learning models grow in accuracy. Correlations between codas are identified through temporal patterns, frequency distributions, and structural similarities. Furthermore, giving the algorithm sentences with missing elements, such as “to _ or not to be”, stimulates it to learn common associations between words; the same technique can be applied by feeding unfinished whale codas into the neural network alongside complete coda patterns. The result is an algorithm that can accurately sort codas based on their relevance to one another, in the way that “hunger” is to “food” as “thirst” is to “water”. Eventually, combined with contextual data, the neural network should be able to reliably predict when a certain coda is vocalised, and how that coda is conventionally answered.
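The sketch below shows the masked-prediction idea in miniature: a tiny numpy network learns coda embeddings by predicting a held-out coda from its neighbours, adjusting its weights by backpropagation. The vocabulary, sequences, and network size are toy stand-ins, not CETI’s actual model.

```python
# A toy masked-prediction model: predict a held-out coda from the mean of
# its neighbours' embeddings, training both weight matrices by
# backpropagation. Coda labels are hypothetical pattern names.
import numpy as np

rng = np.random.default_rng(0)

vocab = ["1+1+3", "4R2", "5R1", "1+3", "EC1"]
idx = {c: i for i, c in enumerate(vocab)}
sequences = [
    ["1+1+3", "4R2", "1+1+3", "4R2"],
    ["5R1", "1+3", "5R1", "1+3"],
    ["1+1+3", "4R2", "EC1"],
] * 50  # repeated so the toy model sees enough examples

V, H = len(vocab), 8                 # vocabulary size, embedding width
W_in = rng.normal(0, 0.1, (V, H))    # input-side embeddings
W_out = rng.normal(0, 0.1, (H, V))   # output projection
lr = 0.1

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

for _ in range(30):  # training epochs
    for seq in sequences:
        for t, target in enumerate(seq):
            context = [idx[c] for j, c in enumerate(seq) if j != t]
            h = W_in[context].mean(axis=0)      # forward pass
            p = softmax(h @ W_out)              # predicted distribution
            dlogits = p.copy()                  # cross-entropy gradient
            dlogits[idx[target]] -= 1.0
            dh = W_out @ dlogits                # gradient flowing back to h
            W_out -= lr * np.outer(h, dlogits)  # backprop: weight updates
            np.subtract.at(W_in, context, lr * dh / len(context))

def similarity(a, b):
    """Cosine similarity between two learned coda embeddings."""
    u, v = W_in[idx[a]], W_in[idx[b]]
    return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))

# Codas used in the same exchanges drift together in embedding space.
print(similarity("1+1+3", "4R2"))  # produced together: typically higher
print(similarity("1+1+3", "5R1"))  # never co-occur: typically lower
```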


Artificial neural networks produce a layout of every coda put through the algorithm, mapping them out based on their statistical properties in a ‘point cloud’, with a point assigned to each coda. For human languages, these cloud-like mappings of words are similar regardless of the language: the location of “dog” within the English word cloud corresponds to that of “chien” in the French word cloud, and likewise for any other language. This allows neural networks to translate between two languages without a Rosetta Stone-like human input. However, there is no guarantee that a sperm whale coda ‘cloud’ will match a human one; we do not know which concepts, such as time, are even present in whale communication. But hopefully some structures or groupings in the clouds will align, which would allow CETI to derive the meanings of at least a few whale codas.
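One standard way of lining up two such point clouds without a dictionary is orthogonal Procrustes alignment, which finds the rotation that best maps one cloud onto the other. The sketch below aligns two synthetic clouds that share the same geometry by construction; whether whale and human clouds share any structure at all is precisely the open question.

```python
# A minimal sketch of aligning two embedding "point clouds" without a
# Rosetta Stone, using orthogonal Procrustes (the idea underlying
# unsupervised translation methods). All data here is synthetic.
import numpy as np

rng = np.random.default_rng(1)

# Toy "human" cloud: 50 concepts embedded in 8 dimensions.
X = rng.normal(size=(50, 8))

# Toy "whale" cloud: the same geometry, rotated and slightly noisy,
# standing in for a coda embedding space with matching structure.
true_R = np.linalg.qr(rng.normal(size=(8, 8)))[0]  # random rotation
Y = X @ true_R + rng.normal(scale=0.01, size=X.shape)

# Orthogonal Procrustes: find the rotation W minimising ||X @ W - Y||.
U, _, Vt = np.linalg.svd(X.T @ Y)
W = U @ Vt

# If the clouds really share structure, X @ W lands near Y, so a point's
# nearest neighbour in the other cloud is its candidate "translation".
aligned = X @ W
err = np.linalg.norm(aligned - Y) / np.linalg.norm(Y)
print(f"relative alignment error: {err:.4f}")  # small => clouds match
```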


Data collection

Despite the many breakthroughs in deep learning methods CETI have achieved, applying AI to sperm whale codas is useless without sufficient data from which the artificial neural networks can learn. Currently, the CETI team have successfully recorded and collected 100,000 whale vocalisations; for the deep learning model to function, they will need tens of millions of recordings. The sheer amount of data needed is a formidable challenge for CETI, and one which requires constant monitoring of Dominica’s waters. This is carried out using many different recording devices as well as cameras, which capture the whales’ actions as they produce vocalisations.

Field researchers at Project CETI are keen to gather as much information as they can: identifying the participants in a conversation, discerning their social dynamics and relationships, and evaluating their physical health are all pertinent to the study. They do this using three main data acquisition devices: tethered buoy arrays, skin tags, and autonomous underwater vehicles (AUVs). Tethered buoy arrays are composed of a long chain of sensors and audio recording devices mounted at depth intervals of several hundred metres. Data from these arrays is continuously transmitted to shore, where CETI researchers can listen to and track sperm whales in real time. Furthermore, the arrays are positioned within each other’s sensor range, allowing them to localise whales and track their movements across a broad area of ocean (see the sketch after this paragraph). However, tethered buoy arrays cannot video record the whales; this is where autonomous underwater vehicles come in. These vehicles span a broad spectrum, from active, self-propelled craft to passive floating drones. The passive vehicles, mounted with cameras, are designed to be unobtrusive in the aquatic environment while providing supporting contextual data for the audio-only devices. Alongside the passive drones, CETI also uses self-propelled robots capable of autonomous navigation; these robots, with their bio-inspired designs, can operate in close proximity to whales without disruption.
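As a rough illustration of how overlapping sensor ranges enable localisation, the sketch below recovers the position of a click source in two dimensions from differences in arrival times at four hypothetical hydrophones. Real arrays work in three dimensions with noisy measurements; the brute-force grid search here just shows the principle.

```python
# A toy illustration (not CETI's pipeline) of localising a click source
# from arrival-time differences at an array of hydrophones.
import numpy as np

SPEED_OF_SOUND = 1500.0  # m/s in seawater, approximate

# Hypothetical hydrophone positions (x, y) in metres.
sensors = np.array([[0, 0], [1000, 0], [0, 1000], [1000, 1000]], float)

def arrival_times(source):
    """Propagation times from a source position to every sensor."""
    return np.linalg.norm(sensors - source, axis=1) / SPEED_OF_SOUND

def localise(measured, grid_step=10.0):
    """Grid search for the point whose time differences best match."""
    best, best_err = None, np.inf
    for x in np.arange(0, 1000, grid_step):
        for y in np.arange(0, 1000, grid_step):
            t = arrival_times(np.array([x, y]))
            # Compare *differences* so the unknown emission time cancels.
            err = np.sum(((t - t[0]) - (measured - measured[0])) ** 2)
            if err < best_err:
                best, best_err = (x, y), err
    return best

true_source = np.array([420.0, 730.0])
measured = arrival_times(true_source)  # noise-free for the sketch
print(localise(measured))              # ≈ (420.0, 730.0)
```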


At MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL), scientists are developing the first untethered, self-navigating fish robot with video recording capabilities. The robot, named SoFi, is propelled by a motor which pumps water between two chambers within its body. As one chamber expands, the body bends to one side; when the water is pushed through to the other chamber, it flexes in the opposite direction. This creates an undulating, side-to-side motion which closely mimics that of a real fish. To accommodate changes in depth, the robot possesses a ‘buoyancy control unit’ which changes its density by compressing and decompressing air. Despite being the first of its kind, SoFi holds promise in its ability to fit seamlessly into the aquatic environment and record whales unobtrusively.


Project CETI’s most reliable method of detailed data collection is tags which attach to sperm whales by suction. These tags contain multiple hydrophones to record audio, as well as pressure sensors which allow CETI to match recorded codas to a whale’s diving habits. However, the suction tags need to stay adhered to the whale at speeds of 30 miles per hour, at depths of 2,000 metres, and in near-freezing temperatures, all without damaging the whale’s skin. At the Harvard School of Engineering and Applied Sciences (SEAS), a robotics team is developing suction cups capable of withstanding such conditions. The cups are inspired by the sperm whale’s natural prey: squid and octopus. Once an octopus’s arm is attached to a surface, the radial muscles within it contract, decreasing the water pressure in the suction chamber and creating suction, a feature mimicked in the prototype tags. Furthermore, the team at SEAS must strike a balance between making the suction strong enough to withstand the harsh conditions and not damaging sperm whale skin: sperm whales shed or scratch off their skin at high rates, increasing the likelihood that the suction cups may inadvertently tear small patches away. The use of the robot fish SoFi and of suction tags in data collection underlines the importance of drawing inspiration from nature to create the non-invasive data collection devices that are key to CETI’s mission of inter-species translation.
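A back-of-the-envelope calculation shows why that balance is delicate. All the numbers below, the tag’s frontal area, drag coefficient, and the cup’s pressure deficit, are assumptions chosen for illustration rather than SEAS’s specifications.

```python
# Rough, illustrative physics: compare the drag on a tag at sperm-whale
# sprint speed with the holding force of one modest suction cup.
import math

RHO_SEAWATER = 1025.0  # kg/m^3
V = 13.4               # 30 mph expressed in m/s
CD = 1.0               # assumed blunt-body drag coefficient
TAG_AREA = 0.01        # assumed tag frontal area in m^2 (10 cm x 10 cm)

drag = 0.5 * RHO_SEAWATER * V**2 * CD * TAG_AREA
print(f"drag on tag: {drag:.0f} N")            # roughly 920 N

CUP_RADIUS = 0.05      # m, assumed
DELTA_P = 20_000.0     # Pa below ambient inside the cup, assumed

hold = DELTA_P * math.pi * CUP_RADIUS**2
print(f"holding force per cup: {hold:.0f} N")  # roughly 157 N
# Several cups, or a stronger pressure deficit, are needed to hold on;
# but a stronger deficit is exactly what risks tearing delicate skin.
```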


Conclusions

Despite the five years of data collection and machine learning still ahead for members of CETI, one question looms at the forefront of everyone’s minds: “if this does work, what will we say to them?” Project leader David Gruber says that the most common response to this question from fellow scientists is “sorry”. But what if we cannot decipher the beautifully enigmatic clicks of sperm whales? I believe the journey CETI is embarking on will be profound nevertheless; the advances in machine learning, bio-inspired robotics, and bioacoustics will prove monumental in their own right.