2022 has been a great year for advances in AI, and 2023 looks to be even bigger.
We’ve seen amazing progress in foundation models, large language models, multi-modal models, and more.
So, to quickly summarize the top advances of the year, here is my AI year in review for 2022.
Large language models made big headlines this year with the launch of ChatGPT, built on GPT-3.5. These models can generate human-like text and perform a wide range of natural language processing (NLP) tasks, such as language translation, text summarization, question answering, code generation, and more.
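To give a sense of how easy these models are to use, here’s a minimal sketch of calling a GPT-3.5-era completion model through OpenAI’s Python library. The model name, prompt, and parameters are my own choices, so check OpenAI’s documentation for current options.

```python
# A minimal sketch of text summarization with the OpenAI Python library
# (pip install openai). The API key is a placeholder.
import openai

openai.api_key = "YOUR_API_KEY"

article = "..."  # the text you want summarized

response = openai.Completion.create(
    model="text-davinci-003",  # a GPT-3.5-era completion model
    prompt=f"Summarize the following article in two sentences:\n\n{article}",
    max_tokens=128,
    temperature=0.3,  # lower temperature for more focused output
)
print(response.choices[0].text.strip())
```

The same pattern works for translation, question answering, and even code generation; you just change the prompt.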
These models will have a significant impact on a wide array of tasks that writers and developers previously performed by hand. For example, I regularly use tools like GitHub Copilot to help me write better code faster, and I use ChatGPT to generate ideas for content like articles, abstracts, and bios.
Speech recognition and speech synthesis continue to improve year after year. This year was no different. Microsoft announced support for roughly 140 languages in their speech-to-text and text-to-speech models.
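As a rough illustration, here’s a minimal sketch of transcribing an audio file with Microsoft’s Azure Speech SDK. The subscription key, region, and filename are placeholders, and the locale is just one of the many supported languages.

```python
# A minimal sketch of speech-to-text with the Azure Speech SDK
# (pip install azure-cognitiveservices-speech). Key, region, and
# filename are placeholders.
import azure.cognitiveservices.speech as speechsdk

speech_config = speechsdk.SpeechConfig(subscription="YOUR_KEY", region="eastus")
speech_config.speech_recognition_language = "en-US"  # one of the supported locales
audio_config = speechsdk.audio.AudioConfig(filename="narration.wav")

recognizer = speechsdk.SpeechRecognizer(speech_config=speech_config,
                                        audio_config=audio_config)
result = recognizer.recognize_once()  # transcribe a single utterance
print(result.text)
```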
In addition, we saw voice cloning models come out of the research labs and into production. Now you can build a custom neural voice model from recordings of your own voice. For example, products like Descript let users edit their audio narration as simply and effortlessly as editing a text document.
We’ve also seen big improvements in image generation models this year with Stable Diffusion, DALL-E 2, and Midjourney. These models can generate photo-realistic images from scratch and perform a wide variety of image-editing tasks. For example, image in-painting, super-resolution, style transfer, and more.
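As an illustration, here’s a minimal sketch of text-to-image generation with Stable Diffusion via the Hugging Face diffusers library. The checkpoint name and prompt are my own choices, and you’ll want a GPU for reasonable speed.

```python
# A minimal sketch of text-to-image generation with Stable Diffusion
# (pip install diffusers transformers). The checkpoint is an assumed
# public model ID.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    torch_dtype=torch.float16,
)
pipe = pipe.to("cuda")  # generation is impractically slow on CPU

image = pipe("a photo-realistic red fox in a snowy forest").images[0]
image.save("fox.png")
```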
This will have a big impact on a wide array of tasks that creative professionals previously performed by hand. For example, I use image synthesis to generate new images for my website, presentations, and online courses, and I regularly use AI-powered tools to edit existing images.
Beyond text, audio, and images, we’re beginning to see new AI models trained on thousands of hours of video. These models can perform more general-purpose tasks across multiple media types. For example, models like Google’s Imagen Video take a text prompt as input and produce synthetic video as output.
This area is still in the research phase, so there aren’t any commercially available tools yet. However, I’m already experimenting with models that can synthesize new videos from scratch, and I look forward to using these tools for video creation and editing in the near future.
Self-driving cars have been the poster child for modern AI for many years now. This year, however, we’re seeing our first Level 4 autonomous vehicle (i.e., “robotaxi”) services in limited areas. You request a car via a mobile app, it picks you up, and you ride in the back seat to your destination. There’s no one in the driver’s seat.
I’ve driven a Tesla Model S with Full Self-Driving (FSD) Beta for roughly a year now, so I’m watching its progress first-hand. On interstates and highways, it’s nearly flawless. However, it still struggles with some city streets, roundabouts, construction zones, low visibility, and unfinished roads.
We’re still at least a few years away from general-purpose domestic robots doing laundry and taking out the trash. However, we’ve seen some pretty impressive advances in AI for robotics this year. These advances are helping move robots out of research labs and factories into our homes and daily lives.
For example, Google’s PaLM-SayCan is a robotics algorithm that uses the PaLM language model to help a robot plan and carry out tasks from natural-language instructions. Google is also helping robots “reason” via an internal monologue in its Inner Monologue research. Plus, we saw Tesla unveil a working prototype of its humanoid robot, Optimus, this year.
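Conceptually, SayCan scores each candidate skill by combining what the language model says would be useful with what the robot’s learned value function says it can actually do. Here’s a rough sketch of that idea; it’s not Google’s code, and the scoring functions are hypothetical placeholders.

```python
# A conceptual sketch of the SayCan idea, not Google's implementation.
# llm_score and affordance are hypothetical callables: llm_score estimates
# how useful a skill is for the instruction (per the language model), and
# affordance estimates how likely the skill is to succeed in the current state.
def choose_skill(instruction, state, skills, llm_score, affordance):
    # Pick the skill that is both useful (per the language model)
    # and feasible (per the robot's learned value function).
    return max(skills, key=lambda s: llm_score(instruction, s) * affordance(s, state))
```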
The biggest breakthrough for AI in science came from DeepMind’s AlphaFold 2. AlphaFold is an AI system that predicts the 3D structure of a protein from its amino acid sequence. Last year, it was announced that AlphaFold had effectively solved the protein structure prediction problem.
This year, DeepMind published the AlphaFold Protein Structure Database, an open-access database of predicted structures for nearly every known protein. This will have huge impacts on drug discovery, cancer research, molecular biology, and more. It’s hard to overstate how important this resource will be to science and medicine.
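Because the database is open access, using it can be as simple as downloading a file. Here’s a minimal sketch of fetching a predicted structure by UniProt ID; the URL pattern and model version reflect the database at the time of writing, so check alphafold.ebi.ac.uk for the current scheme.

```python
# A minimal sketch of downloading a predicted protein structure from the
# AlphaFold Protein Structure Database. The URL pattern and "model_v4"
# version are assumptions based on the database at the time of writing.
import urllib.request

uniprot_id = "P69905"  # human hemoglobin subunit alpha, as an example
url = f"https://alphafold.ebi.ac.uk/files/AF-{uniprot_id}-F1-model_v4.pdb"
urllib.request.urlretrieve(url, f"{uniprot_id}.pdb")
print(f"Saved predicted structure to {uniprot_id}.pdb")
```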
We’re beginning to see the emergence of more general-purpose AI through the use of multi-modal models. These models combine text, audio, images, video, robotic control, code generation, etc. into a single model. As a result, they can perform a wide variety of tasks with little to no additional training.
For example, DeepMind’s Gato is a multi-modal generalist AI that can chat, caption images, play video games, and control a robot arm, all with a single model. Large multi-modal foundation models like Gato are likely the future of AI. It’s an area of research that I will be watching very closely in 2023 and beyond.
To learn how to use state-of-the-art AI to solve real-world problems, please watch my latest course, The AI Developer’s Toolkit.