Emotion Retrieval through Art in Latent Space.

When I was at the NGMA, the thought came to me that art is simply a way of representing latent space in 2 dimensions, or 3. It’s a way of distilling what we are feeling (that latent-space information, the embedding). Plus, art moves people because it is a direct way of matching embeddings. Language is simply an encoder of that latent information. Art works by creating representations (embeddings) that match our internal representations, causing emotional responses that are not transformed into a specific modality.

Match_1 = similarity(thought_embedding_1, thought_embedding_2)

Match_2 = similarity(language(thought_embedding_1), language(thought_embedding_2))
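To make the two matches concrete, here is a minimal sketch. Everything in it is an assumption for illustration: `similarity` is taken to be cosine similarity, the 4-number "thought embeddings" are made up, and `language` is a hypothetical lossy encoder that simply throws away half of the dimensions.

```python
import math

def similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def language(thought, kept_dims=2):
    """Hypothetical lossy encoder: keeps only the first `kept_dims` numbers."""
    return thought[:kept_dims]

# Made-up thought embeddings that agree on the first half and differ on the rest.
thought_embedding_1 = [1.0, 0.0, 1.0, 0.0]
thought_embedding_2 = [1.0, 0.0, 0.0, 1.0]

match_1 = similarity(thought_embedding_1, thought_embedding_2)
match_2 = similarity(language(thought_embedding_1), language(thought_embedding_2))

print(f"Match_1 (raw thoughts):      {match_1:.2f}")
print(f"Match_2 (after language):    {match_2:.2f}")
```

In this toy setup the raw thoughts only half match, while their encoded versions look identical, because the encoder dropped exactly the dimensions where they differ. A different encoder could just as easily exaggerate the difference instead.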

There are a lot of ways to explore this. For example, why do we say “this would be funnier in a different language”? Because the encoding of that thought embedding in that specific language matches (or doesn’t match) what we expected, which causes funny chemicals to release in the brain. This is in line with the incongruity theory, which says that humour is mainly caused by this mismatch between expectation and reality. [1] The incongruity may not exist in the target language and hence the joke ceases to be funny.

I think there’s a lot to unpack here, so here I go on a stream of consciousness. First off, what is latent space? Representations of anything. Ok, that’s a bit far. Let’s roll it back. Perception is a building block of cognition. Probably the most important one. Perception is also a perpetual process. From the moment you are born you perceive things. Now, the interesting thing is we only “perceive” reality. The real world, the objective truth, is ever elusive and the pursuit of it is what we mortals call science. Now, in this pursuit we forget the fundamental “truth” that we only ever perceive the phaneron through our qualia. Quick side note: qualia are subjective experiences (what the colour red means to me vs you), and the phaneron is just the collection of them, aka what we can make sense of in the “real” world. So, the representation of that truth in our blancmanges is just an approximation of the real thing. If it matches the approximation of others, we can share that reality and revel in the ecstasy of seemingly matched qualia.

The key phrase to take away from all this is “representation of reality”. How this representation exists in our head is a mishmash of neuron activations at a moment in time. Now, we have around 16 billion of these in the cortex alone. The synchronous dance of these is what we can call that representation. Now, what do most machine learning models do? They learn representations of images, words, audio, and other things. Vector embeddings, we call them, but they are just lists of numbers. The space that they exist in is what we call latent space. These representations are particularly useful when we want to do matching. We match what each set of numbers is capturing across these unknowable, uninterpretable dimensions, in a way that makes sense to the model based on what it has seen. And it is useful AF. All modern RAG (Retrieval Augmented Generation) is based on this assumption that matching these representations yields useful results. Even going back to the pre-LLM world, matching graph embeddings is how you got your YouTube recommendations, which are pretty damn good if you ask me.
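Here is roughly what that matching looks like, as a sketch and not how any real recommender or RAG system is built. The item names and their 3-number embeddings are invented, and `retrieve` is a hypothetical helper that ranks stored items by cosine similarity to a query vector.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def retrieve(query, store, top_k=1):
    """Return the names of the `top_k` stored items closest to `query`."""
    ranked = sorted(store.items(),
                    key=lambda item: cosine(query, item[1]),
                    reverse=True)
    return [name for name, _ in ranked[:top_k]]

# Made-up embeddings; a real model would learn these from data.
store = {
    "cat video": [0.9, 0.1, 0.0],
    "cooking tutorial": [0.1, 0.9, 0.1],
    "synth music": [0.0, 0.2, 0.9],
}

query = [0.8, 0.2, 0.1]  # hypothetical embedding of what the viewer likes
results = retrieve(query, store, top_k=2)
print(results)
```

Nothing here knows what a cat or a synth is. The ranking falls out purely from which lists of numbers point in a similar direction.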

What the hell does this have to do with art? There are 2 things I had in mind when I started writing this shit. One is that art evokes emotions. Pretty strong ones. And 2, that it is almost always “left to interpretation”. Apart from when artists explain their art. But that’s boring and takes the “fun” away. Why do these 2 things happen? I have been thinking about it for close to 7 years now. I’ll start with the latter. Because art is just a raw output of that thing, that piece of reality, the expression of qualia. It is the closest to the real thing. It cannot be directly explained because that literally defeats the purpose of the expression. It is meant to be consumed in that manner. Raw. Unprocessed. Untainted by language or any other means that was not supposed to be its primary mode of expression. It is just the representation, the vector embedding of the thought or feeling. What about emotions then? What evokes emotion? When you feel connected. When you feel like you “get it”. It could be a joke, it could be a movie. What you “get” is art. And it is easier to get it because? Bingo. You are matching that embedding with your own embedding. In your head. That is why it evokes emotion: you can do that matching in that higher dimension. Drawing a parallel to the knowledge retrieval example, it is basically you doing embedding-similarity-based retrieval instead of good old keyword-based search. So, yeah, experiencing art is just doing emotion retrieval.
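A toy version of that last comparison, with loud caveats: the artwork titles, the viewer's words, and the 3-number "feeling" vectors (melancholy, joy, awe) are all invented, and nobody is claiming the brain literally computes a cosine. It only shows why keyword search can come up empty where embedding matching still finds something.

```python
import math

def keyword_match(query, text):
    """Count lowercase words shared by `query` and `text`."""
    return len(set(query.lower().split()) & set(text.lower().split()))

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical artworks with made-up feeling vectors: (melancholy, joy, awe).
artworks = {
    "blue period portrait": [0.9, 0.0, 0.2],
    "sunflowers in a vase": [0.1, 0.9, 0.3],
}

viewer_words = "i feel lonely tonight"   # what the viewer could say out loud
viewer_feeling = [0.8, 0.1, 0.1]         # what the viewer actually feels

# Keyword search: compare the viewer's words against each title.
keyword_hits = {title: keyword_match(viewer_words, title) for title in artworks}

# Embedding retrieval: compare the feeling directly against each artwork.
best_by_feeling = max(artworks,
                      key=lambda title: cosine(viewer_feeling, artworks[title]))

print(keyword_hits)
print(best_by_feeling)
```

The words share nothing with either title, so keyword search has no reason to prefer one piece. The feeling vector lands right next to the melancholy one, no language required.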

References

  1. Jamie Musies. Explaining the Incongruity Theory of Comedy