Understand the reasons behind the huge energy and power needs of artificial intelligence
Modern artificial intelligence (AI) systems are still far from replicating real human intelligence, but they are already better than we are at recognizing patterns in data and at analytics. Today, AI models can recognize images, converse with people through chatbots, drive autonomous vehicles, and even beat us at chess. However, did you know that the energy and power costs of creating and training these models are mind-boggling? In other words, AI training is an energy-intensive process with a large carbon footprint.
Reducing that energy consumption will therefore have a positive impact on the environment. It will also benefit companies by shrinking their carbon footprint and moving them closer to their carbon targets. But before we start building energy-efficient AI, or green AI, we need to understand why artificial intelligence consumes so much power.
Neural network training
Consider a neural network model. A neural network is a powerful type of machine learning model whose design is loosely modeled on the human brain. Made up of layers of nodes, the neural network attempts to recognize the underlying relationships in a dataset by mimicking the way the brain works. Each node is connected to others and has an associated weight and threshold. If a node's output is greater than the specified threshold, the node is activated and transmits data to the next layer of the neural network.
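The thresholding behavior described above can be sketched in a few lines of Python. This is a minimal illustration, not how any particular framework implements a node: the function name `node_output` and all input values, weights, and thresholds are hypothetical.

```python
def node_output(inputs, weights, threshold):
    """Return the weighted sum of the inputs if it exceeds the threshold.

    A return value of 0.0 means the node stays off and passes nothing on.
    """
    weighted_sum = sum(x * w for x, w in zip(inputs, weights))
    return weighted_sum if weighted_sum > threshold else 0.0


# Hypothetical values chosen only for illustration.
active = node_output([1.0, 0.5], [0.8, 0.4], threshold=0.5)  # weighted sum above threshold
silent = node_output([1.0, 0.5], [0.2, 0.2], threshold=0.5)  # weighted sum below threshold

print("active node sends:", active)
print("silent node sends:", silent)
```

In a real network, millions of such nodes are evaluated for every input, which is part of why the computation adds up.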
Training a neural network consists of two steps that are repeated many times. In the forward pass, the input data passes through the network, which processes it and generates an output. In the backward pass, the errors from the forward pass are used to update the network's weights with gradient descent algorithms, which require a lot of matrix manipulation.
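As a rough sketch of the forward and backward passes, the following Python trains the simplest possible "network", a single linear node, with gradient descent. The function name `train`, the learning rate, the number of epochs, and the toy data are all assumptions made for illustration; real networks perform the same two steps on large matrices rather than on single numbers.

```python
def train(samples, learning_rate=0.05, epochs=1000):
    """Fit y = w * x + b by gradient descent on the mean squared error."""
    w, b = 0.0, 0.0
    n = len(samples)
    for _ in range(epochs):
        grad_w, grad_b = 0.0, 0.0
        for x, y in samples:
            prediction = w * x + b        # forward pass: compute the output
            error = prediction - y        # how far off the output is
            grad_w += 2 * error * x / n   # backward pass: gradient w.r.t. w
            grad_b += 2 * error / n       # backward pass: gradient w.r.t. b
        # Update the weights by stepping against the gradient.
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


# Toy data generated from y = 2x + 1 (hypothetical, for illustration only).
data = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]
w, b = train(data)
print(f"learned w={w:.3f}, b={b:.3f}")
```

Even this tiny example loops over the data a thousand times; large models repeat the same cycle over billions of examples and parameters.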
In June 2019, a group of researchers from the University of Massachusetts Amherst published a paper in which they estimated the energy consumption required to train four large neural networks: Transformer, ELMo, BERT, and GPT-2. They trained each network on a single GPU for one day and measured its power consumption throughout.
One of these neural networks, BERT (Bidirectional Encoder Representations from Transformers), was trained on 3.3 billion words from English books and Wikipedia articles. According to Kate Saenko's article in The Conversation, BERT had to read this huge dataset about 40 times during the training phase. For comparison, the average child learning to speak may have heard 45 million words by the age of five, about 3,000 times fewer than BERT.
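The comparison can be checked with a quick calculation. The figures below are the ones quoted in the text; only the variable names are introduced here.

```python
corpus_words = 3_300_000_000   # words in BERT's training data (from the text)
passes = 40                    # approximate passes over the data (from the text)
child_words = 45_000_000       # words heard by a five-year-old (from the text)

words_processed = corpus_words * passes
ratio = words_processed / child_words

print(f"BERT processed {words_processed:,} words in total")
print(f"That is about {ratio:,.0f} times a child's exposure")
```

The ratio comes out just under 3,000, which matches the rounded figure above.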
In the University of Massachusetts Amherst study, researchers found that training BERT once has roughly the carbon footprint of a passenger flying round trip between New York and San Francisco. The team calculated the total energy consumption for training each model by multiplying the measured power draw by the total training time reported by each model's original developers. The carbon footprint was then calculated from the average carbon emissions of energy production in the United States.
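The general shape of that calculation (power multiplied by time, then energy multiplied by an emissions factor) can be sketched as follows. This is a simplified illustration rather than the researchers' exact formula, and every number in it, including the power draw, the training time, and the emissions factor, is a hypothetical placeholder.

```python
def training_energy_kwh(avg_power_watts, training_hours):
    """Energy used in kilowatt-hours: average power times training time."""
    return avg_power_watts * training_hours / 1000.0


def carbon_footprint_kg(energy_kwh, kg_co2_per_kwh):
    """Emissions in kg of CO2: energy times the grid's emissions factor."""
    return energy_kwh * kg_co2_per_kwh


# Hypothetical inputs, for illustration only.
energy = training_energy_kwh(avg_power_watts=1500.0, training_hours=80.0)
emissions = carbon_footprint_kg(energy, kg_co2_per_kwh=0.4)

print(f"energy: {energy:.1f} kWh, emissions: {emissions:.1f} kg CO2")
```

Because the emissions factor depends on how the electricity is generated, the same training run can have a very different footprint in different regions.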
The study also covered the training and development of a tuning process called Neural Architecture Search. This method automates the design of a neural network through an exhaustive process of trial and error. This additional tuning step, used to improve BERT's final accuracy, produced approximately 626,155 pounds of CO2, roughly equal to the total life-cycle carbon footprint of five cars. By comparison, the average American produces 18.078 tons of CO2 emissions per year.
Advances in artificial intelligence have been made possible by the powerful graphics processing units (GPUs) we have today. These GPUs tend to consume a lot of power. According to NVIDIA, the maximum power dissipated by a GPU is 250 W, which is 2.5 times that of an Intel processor. Meanwhile, researchers believe that larger AI models can improve accuracy and performance. The situation is similar to that of gaming laptops, which are more powerful than regular laptops but also heat up faster because of their high performance. Today, you can rent servers online with dozens of powerful CPUs and GPUs for a few minutes and quickly develop powerful AI models.
According to OpenAI, an artificial intelligence research lab in San Francisco, from the early years of machine learning until 2012 the amount of computing resources the technology required doubled every two years, in parallel with Moore's law for processor performance. Since 2012, however, the computing power used to build state-of-the-art models has doubled on average every 3.4 months. This means that the new computing requirements of artificial intelligence are having a negative impact on the environment.
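To get a feel for the difference between the two doubling rates, the growth over a single year can be computed directly. The doubling times are the ones quoted above; the function name `growth_factor` is introduced here for illustration.

```python
def growth_factor(months, doubling_time_months):
    """How many times larger a quantity becomes after the given number of months."""
    return 2 ** (months / doubling_time_months)


before_2012 = growth_factor(12, 24)    # doubling every two years
since_2012 = growth_factor(12, 3.4)    # doubling every 3.4 months

print(f"one year at the old rate: x{before_2012:.2f}")
print(f"one year at the new rate: x{since_2012:.1f}")
```

At the older rate, compute demand grows by roughly 40 percent in a year; at the newer rate it grows more than elevenfold.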
Experts also now argue that building massive AI models does not necessarily translate into a better return on investment in terms of performance and accuracy. Consequently, companies may have to trade off accuracy against computational efficiency.
Spiking neural networks
A research team from Oak Ridge National Laboratory previously demonstrated a promising way to improve the energy efficiency of AI by converting deep learning neural networks into spiking neural networks (SNNs). An SNN reproduces the activation mechanisms of neurons in the brain and therefore shares many of the brain's capabilities, such as energy efficiency and spatiotemporal data processing. The Oak Ridge National Laboratory team augmented the deep spiking neural network (DSNN) by introducing a stochastic process that adds random values, as in Bayesian deep learning. Bayesian deep learning is an attempt to mimic the brain's information processing by injecting random values into a neural network. Through these “magnification” actions, researchers can tell where the necessary calculations need to be made, which reduces energy consumption.
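To illustrate what "spiking" means, here is a minimal sketch of a leaky integrate-and-fire neuron, a common textbook model of a spiking neuron. It is an assumption chosen for illustration and is not the Oak Ridge team's method; the function name `simulate_lif`, the threshold, the leak factor, and the input values are all hypothetical.

```python
def simulate_lif(input_currents, threshold=1.0, leak=0.9):
    """Simulate a leaky integrate-and-fire neuron over discrete time steps.

    Returns a list with 1 where the neuron spikes and 0 where it stays silent.
    """
    potential = 0.0
    spikes = []
    for current in input_currents:
        # The stored potential leaks away a little, then the new input is added.
        potential = leak * potential + current
        if potential >= threshold:
            spikes.append(1)    # the neuron fires a spike...
            potential = 0.0     # ...and its potential resets
        else:
            spikes.append(0)    # below threshold: no output, no downstream work
    return spikes


# Hypothetical input signal, for illustration only.
spikes = simulate_lif([0.5, 0.5, 0.5, 0.0, 0.0, 0.5, 0.5, 0.5])
print(spikes)
```

The neuron produces output only at the moments it spikes and is silent the rest of the time, which hints at why spiking networks can be more energy-efficient than networks in which every node computes on every input.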
SNNs are currently being touted as the next iteration of neural networks and the foundation for neuromorphic computing. Last year, researchers from Centrum Wiskunde & Informatica (CWI), the Dutch national research center for mathematics and computer science, and the IMEC / Holst Research Center in Eindhoven in the Netherlands successfully developed a learning algorithm for spiking neural networks.