Steganographic Bitcoin seeds: Hiding cash in plain sight
Wed 22 February 2017
Tagged: steganography, software
I made a tool, stegoseed, to generate sentences which steganographically encode Bitcoin wallet seeds, and to decode such sentences to retrieve wallet seeds. It comes with an example BIP39 seed to play with.
(Scroll down to "Instructions" if you don't care how it works and just want to play with it.)
Background
HD wallets
Hierarchical Deterministic Wallets were introduced in Bitcoin Improvement Proposal 32 (BIP32). Before HD wallets, Bitcoin wallet software had to store a separate private key for each payment address. HD wallets present a way to derive an unlimited stream of private keys from a fixed-size numeric seed. This makes it a lot easier to back up a wallet without having to update the backup every time a new address is used.
HD wallets also present a way to create a "watch-only" wallet: this uses a different fixed-size value (the extended public key) that makes it possible to derive the stream of payment addresses corresponding to the stream of private keys, without giving any way to determine what the private keys are. This way you can keep a watch-only wallet of your cold storage that can track the funds without being able to spend them.
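To make the idea of a key stream concrete, here is a rough Python sketch of how BIP32 turns one seed into a master private key and then into hardened child private keys. It is an illustration only: the function names `master_key` and `hardened_child` are my own, the validity checks that BIP32 requires (rejecting zero or out-of-range keys) are left out, and the watch-only side is omitted because it needs elliptic-curve point arithmetic.

```python
import hashlib
import hmac

# Order of the secp256k1 curve group used by Bitcoin.
N = 0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFEBAAEDCE6AF48A03BBFD25E8CD0364141


def master_key(seed):
    """Derive (master private key, chain code) from a seed, per BIP32."""
    digest = hmac.new(b"Bitcoin seed", seed, hashlib.sha512).digest()
    return int.from_bytes(digest[:32], "big"), digest[32:]


def hardened_child(parent_key, chain_code, index):
    """Derive hardened child number `index` (private key and chain code)."""
    data = (
        b"\x00"
        + parent_key.to_bytes(32, "big")
        + (index + 0x80000000).to_bytes(4, "big")  # hardened indices start at 2^31
    )
    digest = hmac.new(chain_code, data, hashlib.sha512).digest()
    child_key = (int.from_bytes(digest[:32], "big") + parent_key) % N
    return child_key, digest[32:]


# One fixed-size seed yields as many private keys as we care to ask for.
seed = bytes.fromhex("000102030405060708090a0b0c0d0e0f")
key, chain = master_key(seed)
for i in range(3):
    child, _ = hardened_child(key, chain, i)
    print(i, format(child, "064x"))
```

Backing up `seed` alone is enough to regenerate every key the loop prints, which is why the backup never needs updating.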
BIP39 word lists
BIP39 introduces a way to represent numbers (e.g. BIP32 seeds) using mnemonics. This makes it easier for humans to write down seeds, and harder to write them down incorrectly.
BIP39 includes word lists for many different languages. Each list has 2048 words. For example, here is the English list.
Not all combinations of words are valid because they must also encode a checksum. Typically 12 words are used to generate an HD wallet seed.
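The arithmetic behind this is short: 2048 = 2^11, so each word carries 11 bits, and 12 words carry 132 bits, made up of 128 bits of entropy plus a 4-bit checksum taken from the start of the SHA-256 hash of the entropy. The sketch below works with word-list indices rather than the words themselves, and the function names are my own.

```python
import hashlib


def mnemonic_indices(entropy):
    """Return the BIP39 word-list indices (0..2047) for some entropy bytes."""
    ent_bits = len(entropy) * 8
    cs_bits = ent_bits // 32  # 4 checksum bits for 128 bits of entropy
    digest = hashlib.sha256(entropy).digest()
    checksum = int.from_bytes(digest, "big") >> (256 - cs_bits)
    value = (int.from_bytes(entropy, "big") << cs_bits) | checksum
    count = (ent_bits + cs_bits) // 11  # 132 bits -> 12 words
    return [(value >> (11 * (count - 1 - i))) & 0x7FF for i in range(count)]


def checksum_ok(indices):
    """Check whether a sequence of word indices carries a valid checksum."""
    count = len(indices)
    cs_bits = count * 11 // 33
    value = 0
    for index in indices:
        value = (value << 11) | index
    entropy = (value >> cs_bits).to_bytes((count * 11 - cs_bits) // 8, "big")
    return mnemonic_indices(entropy) == list(indices)


# All-zero entropy gives index 0 ("abandon") eleven times, then index 3 ("about").
print(mnemonic_indices(bytes(16)))
print(checksum_ok([0] * 12))  # False: twelve times "abandon" fails the checksum
```

With a 4-bit checksum, only about one in sixteen random 12-word combinations is valid.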
Ian Coleman has a fantastic tool for playing with BIP39 seeds.
Markov models
A Markov model describes a random system where (Wikipedia) "future states depend only on the current state [and] not on the events that occurred before it". In modelling sequences of words (i.e. sentences), an order-n Markov model can tell us, given n words, which words can follow, and with what probability.
We can also build Markov models in 2 directions (i.e. one to model which words can go after a given word, and one to model which words can go before a given word). This way we can choose an arbitrary word, and then use the Markov model to generate random words in the forwards and backwards direction, until we reach a valid sentence end point and start point, respectively.
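As a small illustration of the two directions, here is a sketch of an order-1 model built from a three-sentence toy corpus. The `<s>` and `</s>` markers for sentence start and end, and the function names, are my own choices for the example.

```python
from collections import Counter, defaultdict


def build_counts(sentences):
    """Count which words follow (forward) and precede (backward) each word."""
    forward = defaultdict(Counter)
    backward = defaultdict(Counter)
    for sentence in sentences:
        tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
        for left, right in zip(tokens, tokens[1:]):
            forward[left][right] += 1
            backward[right][left] += 1
    return forward, backward


def probabilities(counter):
    """Turn raw counts into transition probabilities."""
    total = sum(counter.values())
    return {word: count / total for word, count in counter.items()}


corpus = ["the cat sat", "the cat ran", "the dog sat"]
forward, backward = build_counts(corpus)
print(probabilities(forward["the"]))   # "cat" with probability 2/3, "dog" with 1/3
print(probabilities(backward["sat"]))  # "cat" and "dog" with probability 1/2 each
```

The forward table answers "what can come after this word?" and the backward table answers "what can come before it?", which is what lets a sentence be grown outwards from a word in the middle.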
Hiding seeds in text
Given these primitives, we can now generate sentences which encode HD wallet seeds. Take as input a text corpus, a BIP39 word list, and a BIP39 mnemonic seed. Build a 2-directional Markov model of the text corpus, but specifically do not allow it to generate any BIP39 words. Then, for each word of the mnemonic seed in turn, start from that word and use the Markov models to generate words forwards and backwards until we have a complete sentence. Concatenate the sentences, in seed order, to produce the output.
Our output is now a series of sentences which contain the BIP39 words given in the input, in the correct order, and do not contain any other BIP39 words. We can simply remove all non-BIP39 words to retrieve the original seed.
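The hiding and recovery steps described above can be sketched end to end in Python. This is a minimal illustration rather than stegoseed's actual code: the five-word `WORDS` set stands in for the full 2048-word list, `CORPUS` is a made-up toy corpus, and the function names `build_models`, `walk`, `hide` and `unhide` are hypothetical.

```python
import random
from collections import defaultdict

START, END = "<s>", "</s>"
# Stand-in for the real 2048-word BIP39 list (assumption: only these count).
WORDS = {"zoo", "letter", "doctor", "advice", "winner"}
CORPUS = [
    "the old zoo was closed on monday",
    "she sent a letter to the council",
    "the doctor gave some advice yesterday",
    "a winner was announced last night",
    "the council closed the road last monday",
]


def build_models(sentences, banned):
    """Order-1 forward/backward successor lists that never lead to a banned word."""
    forward, backward = defaultdict(list), defaultdict(list)
    for sentence in sentences:
        tokens = [START] + sentence.lower().split() + [END]
        for left, right in zip(tokens, tokens[1:]):
            if right not in banned:
                forward[left].append(right)
            if left not in banned:
                backward[right].append(left)
    return forward, backward


def walk(model, start, stop, rng, max_words=20):
    """Random walk from `start` until the stop marker, a dead end, or the cap."""
    words, current = [], start
    while len(words) < max_words:
        options = model.get(current) or [stop]  # dead end: finish the sentence
        current = rng.choice(options)
        if current == stop:
            break
        words.append(current)
    return words


def hide(mnemonic, forward, backward, rng):
    """Grow one sentence around each seed word and join them in seed order."""
    sentences = []
    for word in mnemonic:
        before = walk(backward, word, START, rng)[::-1]
        after = walk(forward, word, END, rng)
        sentences.append(" ".join(before + [word] + after) + ".")
    return " ".join(sentences)


def unhide(text, wordlist):
    """Keep only the words that appear in the word list, in order."""
    tokens = (t.strip(".,;:!?\"'()") for t in text.lower().split())
    return [t for t in tokens if t in wordlist]


forward, backward = build_models(CORPUS, WORDS)
seed = ["zoo", "letter", "winner"]
text = hide(seed, forward, backward, random.Random())
print(text)
print(unhide(text, WORDS))  # recovers the seed words in order
```

Because the models can never step onto a word in `WORDS`, the only such words in the output are the ones placed there deliberately, and that is what makes the simple filter in `unhide` sufficient.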
Stegoseed comes with a pre-generated 2-directional order-1 Markov model of quite a large text corpus of news articles (it's a 7M JavaScript object), and the English BIP39 word list.
Instructions
1.) You have a BIP39 seed and want to hide it in some randomly-generated text
Paste your BIP39 seed in the "Hide" box and click "Stego". You'll get a randomly-generated piece of text. If you don't like it, click "Stego" again to get another. This is quick and easy.
The generated text is quite nonsensical, but without the BIP39 words highlighted, it would be hard for a human to realise what it is encoding.
2.) You have a piece of text and want to retrieve the BIP39 seed
Paste your text into the "Unhide" box and click "Unstego". You'll get the BIP39 seed. This is quick and easy.
3.) You have a BIP39 seed and want to hide it in some existing piece of text
Paste your existing text into the "Unhide" box and click "Unstego". You'll be given a list of BIP39 words that appear in your text. You need to go through and modify the text to remove every single one of them. Do this, and click "Unstego" again, until there are no longer any BIP39 words.
Now take your BIP39 seed and find places to insert those words into the text without inadvertently inserting other BIP39 words. Repeat until done. This is neither quick nor easy and honestly I can't recommend trying it.
Example
This pastebin contains a rewording of this BBC News article about life expectancy. I manually removed all of the existing BIP39 words from the article, and then inserted the words corresponding to my own BIP39 wallet seed. It contains some funds as a prize for whoever sweeps it first (update: this has been taken).
Conclusion
The text generated by stegoseed encodes exactly the same information as the original seed, and should be treated with equal care. You wouldn't go pasting your wallet seed around the Internet and expect not to have your money stolen, so don't do it just because the seed is hidden steganographically.
But it might help you if you have to hide your seed on paper in your house and you don't want a casual burglar to guess what it is. Or if you have to carry it on your person through an area where you might be searched by somebody who might want to steal your money.