
2024-10-18

Last modified: 2024-10-19 13:16:34


LLM steganography

Let's make a first pass at this.

First step is to get set up with an LLM in a way that lets me see all of the token probabilities, and choose them.

Then after that I can come up with different strategies for choosing the tokens.

I think I'll start by making use of the GPU, and only later move on to something that can run in the browser.

Getting this stuff working in Python is actually incredibly easy.
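
Something along these lines (a minimal sketch, assuming gpt2 via the transformers library, since that's the model I started with; the point is that the API hands you the full next-token distribution and lets you append whatever token you like):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# gpt2 assumed here; any causal LM from the hub works the same way
tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

ids = tok("The quick brown fox ", return_tensors="pt").input_ids
with torch.no_grad():
    logits = model(ids).logits[0, -1]      # scores for the next token only
probs = torch.softmax(logits, dim=-1)

# inspect the distribution, most likely tokens first
top = torch.topk(probs, 10)
for p, i in zip(top.values, top.indices):
    print(f"{p.item():.4f} {tok.decode([int(i)])!r}")

# instead of sampling, append whichever token id we choose
chosen = top.indices[1]                    # e.g. the 2nd most likely token
ids = torch.cat([ids, chosen.view(1, 1)], dim=1)
```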

Starting from v4.46, the logits model output will have the same type as the model (except at train time, where it will always be FP32)

Something is printing this to my terminal, but I don't know what or why or how to make it stop.

With gpt2 my program was working (generating believable-looking text, not doing steganography yet), but with tinyllama it strings words together with no spaces between them.

Apparently if you ask the tokenizer to decode an individual token id it strips the leading space?? You need to get it to decode the sequence as a whole; you can't just concatenate individually-decoded tokens. That seems like a strange design choice.
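
A minimal reproduction of what I mean (the exact TinyLlama checkpoint name here is an assumption; any Llama-style sentencepiece tokenizer behaves like this):

```python
from transformers import AutoTokenizer

# checkpoint name assumed for illustration
tok = AutoTokenizer.from_pretrained("TinyLlama/TinyLlama-1.1B-Chat-v1.0")

ids = tok("the quick brown fox", add_special_tokens=False).input_ids

# decoding token ids one at a time drops the leading-space markers...
print("".join(tok.decode(i) for i in ids))   # -> "thequickbrownfox"

# ...decoding the whole sequence at once keeps the spaces
print(tok.decode(ids))                       # -> "the quick brown fox"
```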

Well this will do. Next up is choosing either the 1st or 2nd most likely token at each step, to encode a bitstream.
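
A sketch of what I have in mind (again assuming gpt2 and transformers; bit 0 means take the most likely token, bit 1 the second most likely):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")          # assumed model, as above
model = AutoModelForCausalLM.from_pretrained("gpt2")

def encode_bits(prompt: str, bits: list[int]) -> str:
    """Hide a bitstream by taking the 1st (bit 0) or 2nd (bit 1) most likely token."""
    ids = tok(prompt, return_tensors="pt").input_ids
    for bit in bits:
        with torch.no_grad():
            logits = model(ids).logits[0, -1]
        ranked = torch.argsort(logits, descending=True)   # rank 0 = most likely
        ids = torch.cat([ids, ranked[bit].view(1, 1)], dim=1)
    return tok.decode(ids[0])

print(encode_bits("The quick brown fox ", [0, 1, 1, 0, 1, 0, 0, 1]))
```

Decoding is the same loop in reverse: replay the model over the cover text and read off whether each token was the 1st- or 2nd-ranked choice.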

Starting from "The quick brown fox ", we get:

Lol! I would have expected a lot better than that. I guess that means we can't just swap the least significant bit of the token rank to hide the message.

OK, but we could still keep the token ranks the same while using a different preceding context, although that might be likely to just regenerate the secret message?
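
Concretely, something like this (a sketch of the idea, not what I actually ran; gpt2 and the example strings are placeholders): take the rank of each secret-message token under its own context, then generate cover text whose tokens have those same ranks under the cover context.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")          # assumed model, as above
model = AutoModelForCausalLM.from_pretrained("gpt2")

def next_ranking(ids):
    """Token ids sorted by likelihood for the next position, given context ids."""
    with torch.no_grad():
        logits = model(ids).logits[0, -1]
    return torch.argsort(logits, descending=True)

def ranks_of(secret: str) -> list[int]:
    """Rank of each secret-message token given the secret's own preceding context
    (skipping the very first token, which has no preceding context)."""
    ids = tok(secret, return_tensors="pt").input_ids
    ranks = []
    for pos in range(1, ids.shape[1]):
        ranking = next_ranking(ids[:, :pos])
        ranks.append(int((ranking == ids[0, pos]).nonzero()[0]))
    return ranks

def cover_from_ranks(prompt: str, ranks: list[int]) -> str:
    """Generate cover text whose token ranks (under the cover context) match."""
    ids = tok(prompt, return_tensors="pt").input_ids
    for r in ranks:
        ranking = next_ranking(ids)
        ids = torch.cat([ids, ranking[r].view(1, 1)], dim=1)
    return tok.decode(ids[0])

print(cover_from_ranks("Once upon a time ", ranks_of("meet me at noon")))
```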

In a very short example it doesn't seem to regenerate your secret message, but the very occasional low-probability tokens just totally trash the output.

We could make the output twice as long by saying that the original token rank is reconstructed by adding the ranks of 2 consecutive tokens? Still has really bad output.
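
(I haven't pinned down how the rank should be split across the two tokens; an even split seems like the obvious choice, e.g.:)

```python
def split_rank(rank: int) -> tuple[int, int]:
    """Spread one secret rank over two consecutive cover tokens (even split assumed)."""
    return rank // 2, rank - rank // 2

def join_ranks(r1: int, r2: int) -> int:
    """Decoder side: the original rank is just the sum."""
    return r1 + r2
```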

So what if we incorporate the idea of a threshold probability? If the token at the required rank has >1% probability, we use it. If not, we generate the most likely token with <=1% probability, and then encode the token in multiple digits, each selected from the tokens that have >1% probability, until done, which we signal by again generating the most likely token that is <=1% probability.
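
Roughly, the per-token encode step I have in mind looks like this (pure sketch: next_dist stands in for asking the model for the next-token distribution given some context ids, the value spelled out in digits is the token's rank, and steps where too few tokens clear the threshold aren't handled):

```python
THRESHOLD = 0.01  # the 1% cut-off

def emit_secret_rank(secret_rank, context_ids, next_dist):
    """Encode one secret-token rank into the cover text.

    next_dist(context_ids) is assumed to return [(token_id, prob), ...] for
    the next position. This is a sketch of the scheme described above, not
    the actual implementation.
    """
    ranked = sorted(next_dist(context_ids), key=lambda tp: -tp[1])
    n_likely = sum(1 for _, p in ranked if p > THRESHOLD)

    if secret_rank < n_likely:
        # the wanted rank is still a plausible token, so emit it directly
        return context_ids + [ranked[secret_rank][0]]

    # escape marker: the most likely token at <= 1% probability
    context_ids = context_ids + [ranked[n_likely][0]]

    # then spell the rank out as digits drawn from the >1% tokens, least
    # significant digit first, in whatever base each step offers
    remaining = secret_rank
    while True:
        ranked = sorted(next_dist(context_ids), key=lambda tp: -tp[1])
        base = sum(1 for _, p in ranked if p > THRESHOLD)
        if remaining == 0:
            # terminator: again the most likely <= 1% token
            return context_ids + [ranked[base][0]]
        context_ids = context_ids + [ranked[remaining % base][0]]
        remaining //= base
```

The decoder can tell the cases apart because it recomputes the same distributions: a token with >1% probability is a direct rank (or a digit), and the most likely token at <=1% probability marks the escape/terminator.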

I think this is now almost working well, using llama3.2-1b.
