How do Generative Artificial Intelligences (GAI) actually work?

[Written on January 21, 2023 by Jeremy Lamri, with the support of the OpenAI GPT-3 text-davinci-003 model for approximately 15% of the text.]

Here we are. By now, many people have seen or tried generative artificial intelligences (GAI), whether for text with GPT-3 or for images with DALL-E, Stable Diffusion or Midjourney. But how do these algorithms work, so powerful and advanced that they sometimes seem indistinguishable from magic? This is a very simplified dive into the fantastic world of AI, with a strong focus on image-based GAIs.

WARNING: This article does not in any way minimize the risks and challenges associated with this new wave of powerful technologies. On the contrary, the risk is real and significant, which is why I will address it specifically in a comprehensive article soon. To make this clear, I already devote a full section to it here. But for now, the goal is to understand how it all works, at least as far as we know! So please keep this in mind as you read :)

Created by Jérémy Lamri with DALL-E (All rights reserved, 2023)

From AI to GAI

When we talk about AI nowadays, we are almost exclusively talking about deep learning, which is a subset of machine learning. And increasingly, we are looking at AI based on unsupervised learning: a form of machine learning that allows a computer to identify patterns and trends in data without being guided by labeled examples and instructions. Unsupervised algorithms are generally used to discover hidden information and complex relationships in very large datasets.

Looking superficially at generative AI, or GAI, one could try to summarize the subject like this: algorithms use techniques such as deep learning and convolutional neural networks to learn to recognize relevant features in images and reproduce them. The algorithm can then use these features to generate original images that resemble the input data. But this would miss the real revolution brought by the new tools that have been arriving on the market since last year, because there is much more to them than generation and feature extraction.

Judging from the many forum discussions I have read, many people still think that generative AIs like DALL-E use generative adversarial networks, or GANs, which appeared about ten years ago. In principle, a GAN consists of two algorithms that face each other: one tries to create the most realistic image possible, the other tries to detect whether an image is artificial or not. They train each other until they produce an image for which it is difficult to tell whether it was created by an AI. This type of adversarial network is very effective when the concept to be illustrated is well defined, tightly framed and unique.

For my part, I am a big fan of GANs because their objective comes straight from game theory: the two networks play a zero-sum game, in which whatever one gains the other loses. But their training remains quite difficult and often fails to converge. In other words, the approach works for a pair of algorithms whose only purpose is to create and detect cats, for example, or another concept that is very clear and well documented. But when the concepts are more numerous and complex (faces, landscapes, objects, symbols, etc.), it is far from enough, despite the exceptional performance of GANs at their job.
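To make the zero-sum idea a bit more concrete, here is a minimal sketch of the value function commonly used to describe GAN training: the discriminator tries to make it as large as possible, the generator tries to make it as small as possible. The function name `gan_value` and the sample probabilities are made up for the illustration; no real network is trained here.

```python
import math

def gan_value(d_real, d_fake, eps=1e-12):
    """V(D, G) = mean(log D(x)) + mean(log(1 - D(G(z)))).

    d_real: discriminator outputs (probability "real") on real images.
    d_fake: discriminator outputs on generated images.
    """
    def clamp(p):
        # Keep probabilities away from 0 and 1 so the logs stay finite.
        return min(max(p, eps), 1.0 - eps)

    real_term = sum(math.log(clamp(p)) for p in d_real) / len(d_real)
    fake_term = sum(math.log(1.0 - clamp(p)) for p in d_fake) / len(d_fake)
    return real_term + fake_term

# A discriminator that separates real from fake well gets a high value...
sharp = gan_value(d_real=[0.9, 0.95], d_fake=[0.1, 0.05])
# ...while one that is completely fooled (0.5 everywhere) gets 2 * log(0.5).
fooled = gan_value(d_real=[0.5, 0.5], d_fake=[0.5, 0.5])

# Zero-sum: what the discriminator gains, the generator loses.
discriminator_payoff = fooled
generator_payoff = -fooled
```

The difficulty mentioned above comes from the fact that both players update at the same time, so neither is guaranteed to settle into a stable balance.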

What type of AI are we talking about with GAI?

We could train GANs for everything that exists on Earth and then assemble them, but it would take too much time and wouldn’t allow concepts to be mixed to create original ones. So another approach was needed to enable generative AIs such as DALL-E, Midjourney or Stable Diffusion. These generative AIs represent a new way of thinking about turning information into images. They rely on what are called diffusion algorithms, which excel at unsupervised and conditional denoising sampling.

Put like this, it may not help much. But let’s look at it this way: imagine that from a blurry or degraded image, an algorithm can reconstruct a sharper version, like in movies where the CIA magically makes a crappy photo super sharp. If we can do it in this direction, we can also do it in the other direction, by teaching algorithms to degrade an image. A completely degraded image would be a set of pixels of all colors, kind of like a big cloud of dots in which it is impossible to distinguish anything. This is what we could call a completely noisy image. The process of improving the image is called denoising.
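The degrading direction can be sketched in a few lines. This is only an illustration under stated assumptions: the image is a flat list of pixel values between -1 and 1, the noise is Gaussian, and the noise rate `beta` is a fixed value chosen for the example. The names `add_noise_step`, `degrade` and `correlation` are hypothetical helpers, not part of any library.

```python
import math
import random

def add_noise_step(pixels, beta, rng):
    """One degradation step: shrink the signal a little and mix in Gaussian noise."""
    keep = math.sqrt(1.0 - beta)
    mix = math.sqrt(beta)
    return [keep * p + mix * rng.gauss(0.0, 1.0) for p in pixels]

def degrade(pixels, steps, beta=0.05, seed=0):
    """Return the list of images from the original to the fully noisy one."""
    rng = random.Random(seed)
    history = [list(pixels)]
    for _ in range(steps):
        history.append(add_noise_step(history[-1], beta, rng))
    return history

def correlation(a, b):
    """Pearson correlation: how much of image a is still visible in image b."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)

# A toy "image": a smooth gradient from dark (-1) to bright (+1).
image = [-1.0 + 2.0 * i / 255 for i in range(256)]
history = degrade(image, steps=200)

early = correlation(image, history[1])   # still close to the original
late = correlation(image, history[-1])   # essentially a cloud of noise
```

After one step the gradient is still clearly recognizable; after a couple of hundred steps almost nothing of it remains. Denoising is the problem of learning to walk this path backwards.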

Once the algorithm can do this work in both directions, it is able to reconstruct any sharp and precise image from a big cloud of noise. But since a huge number of final results are possible from this big cloud, the algorithm needs to be given some indication of what it is supposed to reconstruct. This is where the text we insert, the famous ‘prompt’, comes in. This text is analyzed by an embedding algorithm, which associates our request with one or more semantic fields. This type of algorithm can be seen as an advanced version of natural language processing (NLP) algorithms.
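To give a rough feel for what an embedding does, here is a deliberately naive sketch that turns a prompt into a vector of numbers and compares two prompts. It is a toy stand-in based on word hashing: real systems use learned text encoders that capture meaning, which this example does not. The functions `embed` and `cosine` and the vector size are assumptions made for the illustration.

```python
import hashlib
import math

def embed(prompt, dim=256):
    """Toy embedding: each word adds 1 to a slot chosen by hashing the word."""
    vec = [0.0] * dim
    for word in prompt.lower().split():
        digest = hashlib.sha256(word.encode("utf-8")).digest()
        index = int.from_bytes(digest[:4], "big") % dim
        vec[index] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    # Normalize to length 1 so that prompts of different sizes are comparable.
    return [v / norm for v in vec] if norm else vec

def cosine(a, b):
    """Cosine similarity of two unit-length vectors (1.0 means same direction)."""
    return sum(x * y for x, y in zip(a, b))

sofa = embed("a cat on a sofa")
chair = embed("a cat on a chair")
dusk = embed("mountain landscape at dusk")

close = cosine(sofa, chair)  # many shared words, so the vectors nearly align
far = cosine(sofa, dusk)     # no shared words, so the vectors barely overlap
```

The point to keep in mind is that the prompt becomes a position in a numerical space, and that nearby positions correspond to related requests. This vector is what conditions the denoising.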

How do the new GAIs work?

So, starting with an image full of random pixels, and thus completely noisy, the diffusion algorithm progressively denoises the image, following the conditioning imposed by the embedding algorithm. Little by little, step by step, it creates the image that is closest to the imposed concept. In a way, it’s a bit like when you look at the clouds and see human faces. We are so trained and conditioned to see human faces that we end up inventing them, even if the cloud had no intention of representing a face.

Since the noise cloud used as a canvas to create the image is completely random, this explains why each final image is different. The AI starts from this cloud and denoises it until it reaches a clear image representing the desired concept, but this final interpretation necessarily depends on the starting pixels! Simply amazing.
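The role of the starting noise can be illustrated with a toy generator. Everything here is a stand-in: a real diffusion model uses a trained neural network to estimate the noise to remove at each step, whereas this sketch uses a hypothetical rule (`toy_denoise_step`) that nudges each pixel toward the closest intensity allowed by the concept. The concepts and their palettes in `CONCEPT_PALETTES` are invented for the example.

```python
import random

# Hypothetical "concepts": each one only allows certain pixel intensities.
CONCEPT_PALETTES = {
    "night sky": [-1.0, -0.2],   # dark tones only
    "snow field": [0.2, 1.0],    # bright tones only
}

def toy_denoise_step(pixels, palette, strength):
    """Nudge every pixel toward the closest intensity allowed by the concept."""
    out = []
    for p in pixels:
        target = min(palette, key=lambda v: abs(v - p))
        out.append(p + strength * (target - p))
    return out

def generate(concept, seed, size=64, steps=50, strength=0.2):
    """Start from a seeded noise cloud and denoise it under the given concept."""
    rng = random.Random(seed)
    pixels = [rng.gauss(0.0, 1.0) for _ in range(size)]  # the starting noise cloud
    palette = CONCEPT_PALETTES[concept]
    for _ in range(steps):
        pixels = toy_denoise_step(pixels, palette, strength)
    # Round away the tiny leftover distance to the palette values.
    return [round(p, 3) for p in pixels]

same_a = generate("night sky", seed=1)
same_b = generate("night sky", seed=1)   # same noise cloud, same image
other = generate("night sky", seed=2)    # different noise cloud, different image
snow = generate("snow field", seed=1)    # same noise cloud, different concept
```

Both "night sky" images respect the concept, since they only contain dark tones, but which pixel ends up with which tone is decided by the starting noise. This is also why image tools that expose a seed can reproduce a result when given the same seed and prompt.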

Above all, the most interesting thing about this pairing of a diffusion AI and an embedding AI is that, in principle, they can learn from their collaboration and remember which noise clouds make the best base for given concepts or associations. Also, by offering several results to the user, the AI can learn, on the assumption that the image chosen by the user is the one that best represents the request. Thus, the results can become increasingly fine and precise. Truly impressive. And this is only the beginning of this new era!

What ethical and legal issues arise with these GAI?

GAI promises to improve decision-making and offers new and exciting opportunities for businesses and organizations. I think these systems have real potential to radically transform the world we live in. However, with the development of these technologies, many ethical and legal issues arise.

Defining the role of human responsibility in the process of creating and using generative AI

It is important to define the role of humans in the process of creating and using generative AI and to determine who is responsible for the decisions and actions taken by these systems. Generative AI systems are complex, can make decisions based on complex data, and their use carries risks and responsibilities.

Right to privacy and data protection when using generative AI

Generative AI systems can collect and process personal data, which can pose privacy issues. It is therefore necessary to adopt measures to ensure that personal information is not used for unauthorized purposes, and that personal data is protected.

Risks associated with the use of generative AI, such as discrimination and bias

Generative AI systems can facilitate discrimination and bias when used improperly. For example, a generative AI system can be used to make automated decisions based on data that is discriminatory or biased. It is therefore important to ensure that the data used to feed a generative AI system is collected fairly and without bias and that the results obtained are also fair and just.

Intellectual property law and the use of works created by generative AI

Generative AI systems can be used to create works that can be protected under intellectual property law. It is therefore important to understand how this law applies to works created by AI and to ensure that the rights of creators and users are respected.

Risks associated with the creation and use of generative AI, including in terms of security and confidentiality

Generative AI systems can be targeted by cyberattacks, and their use can lead to security and confidentiality risks. It is therefore important to understand these risks and to take measures to minimize them, including by putting in place appropriate security and confidentiality measures.

Beyond these five challenges, and in view of the new world that is opening up, one could extend this list much further. For example: right to freedom of choice and decision-making autonomy of users in the face of generative AI, impact of generative AI on employment and the economy, effects of generative AI on the environment, use and exploitation of generative AI technologies by unauthorized actors, risk of misuse of generative AI for the propagation of illegal or harmful content, control of generative AI by audit mechanisms, definition and promotion of good practices for the use of generative AI, etc.

Conclusion

Generative Artificial Intelligence is undoubtedly a rapidly growing field of Artificial Intelligence research. Its origins, working methods and implications are subjects that deserve further study. The potential applications to industry, social sciences and health are numerous and promising. Technological developments and research in Generative AI will allow us to better understand the world around us and create innovative solutions to solve some of the greatest challenges we are facing today.

With so many possibilities, at least equally powerful challenges are bound to emerge. In a future article, when the market has a bit more hindsight, I will take the time to detail as many of them as possible. Until then, experiment, learn, form your own opinion. Learn today so you won’t suffer tomorrow!
