Monthly Archives: November 2024

King – Man + Woman ≈ King

An article by Plotly1 shows how the main analogy for what embeddings represent is a bit shaky. It sounds good and works like the analogies logic from the SAT. But it only kind of works.

I was reading all this stuff about embeddings and it’s really a surprising feature that you could do simple vector math to solve analogies. At first I assumed that the major trained models must have been trained with analogies. You really could create layers that subtract and add embeddings and do supervised learning on a known set of analogies, training the embeddings to represent gender with a single, similar vector. But nobody does this. It seems like just an amazing emergent feature of training the hidden layer we get the embeddings from!

It kind of works. But not quite, and it takes a bit of cheating.2 Taking an embedding vector for King, subtracting the vector for Man and adding the vector for Woman makes a new vector. But the cosine distance isn’t far enough away from King. You have to exclude King, Man, and Woman in order to make this vector close to the embedding vector for Queen.

I tried this myself with OpenAI’s modern embedding model, “text-embedding-3-small”.3 It gives the same results. The closest vector is still King. There’s a lot of error even if you exclude King from the possible results. The embeddings themselves don’t store good analogy information in just linear vector math. The hidden layers and attention heads will still have to do to make the embeddings into something really useful. It’s neat that we get somewhat close values and it shows something is encoded in these embeddings. The results just don’t work exactly as I’ve been reading and what I’ve been taught about embeddings.

This slightly disappointing result intuitively make sense to me. The model wasn’t trained to make vector addition a feature of its values. It’s a bunch of information ready for transfer learning. And what other information is in there? Embedding vectors encode a lot more information, maybe important nuances and maybe extraneous values, in the vector values than you would want in a space made for analogies.

  1. Plotly Graph, “Understanding Word Embedding Arithmetic: Why there’s no single answer to “King − Man + Woman = ?”. https://medium.com/plotly/understanding-word-embedding-arithmetic-why-theres-no-single-answer-to-king-man-woman-cd2760e2cb7f, 2020 ↩︎
  2. Florian Huber, “King – Man + Woman = King ?”. https://blog.esciencecenter.nl/king-man-woman-king-9a7fd2935a85, 2019 ↩︎
  3. Code to try this with OpenAI’s embeddings: https://github.com/seanmcnealy/seanmcnealy_samples/blob/master/embeddings.py ↩︎

Software Bookshelf

I know there’s a lot of reviews out there. And hopefully you already have these books if you work or intend to work in the software industry. In addition to any language specific books, I believe you should have and read these.

Designing Data-Intensive Applications by Martin Kleppmann

A comprehensive book on actually using distributed systems. You’ll want to know what the effects are from your design choices. And this book goes into databases based on the experiences of someone that’s done this work before you. I read this after prototyping a system, and I think the content here is worth about six months of full time work and research. Plus he has citations if you want to go more in depth.

Refactoring by Martin Fowler

Software can change over time and you can change your mind about design choices as the solution changes. Functional or Object-Oriented or IOC or visitor pattern, this walks through practical mechanical steps to show how easy it is to make changes that simplify programs for readability and maintenance. And shows how to go back the other way, too, because maybe that way is better.

You’ll see your programs with many more opportunities to express your designs, and it’ll help you choose for better code.

And I’ll conclude with a short list of books you’ll have to one day see if you study software, but are much less practical. The Art of Computer Programming volumes 1 – 4b, the “CLRS” algorithms book, the compilers “Dragon book.” You’ll have to have them, I guess.