*Tue 9 October 2018*

*Tagged: chess, steganography*

I designed a steganography system that encodes data as a chess game. A convenient way to communicate chess games is PGN, but any means of communicating the moves of the game would work, as the information is encoded conceptually in the moves themselves, rather than taking advantage of any redundancy in the PGN format.

You can play with it here: Chess Steganography.

The code is on github. There is also a perl implementation of the same concept, but the encoding is incompatible because it was easier that way.

As an example, the text "Hello, world!" becomes:

1. e3 b6 2. b3 Bb7 3. Nf3 Bxf3 4. Bd3 e6 5. h4 g6 6. b4 h5 7. gxf3 c5 8. Bb2 a5 9. Rh3 e5 10. c4 Bh6 11. Be2 Ke7 { White resigns. } 0-1

You can look at this game on lichess, and copying and pasting the full PGN from lichess into the "Unsteg" box (despite lichess's annotations!) will allow you to retrieve the message.

First we encode the input data bytes as a bignum so that it's easier to work with. I prepended an imaginary "0x01" byte to the input so that leading "0x00" bytes are not lost ("0x00 0x03" and "0x00 0x00 0x03" both become just "3" as a bignum; with a "0x01" at the start they can be distinguished from a plain "0x03").

At a given turn, we generate the list of all legal moves. Discard any moves that would end the game (as that might leave us unable to encode all of the data) and sort the moves so that we know we have a canonical ordering. For the first move, there are exactly 20 legal moves.

If every turn had exactly 20 legal moves, we could easily encode the data in the chess game by encoding it as a base-20 number and then choosing the move, for each digit, that corresponds to that digit.

The fun trick here is that the same scheme works even if the number of legal moves per turn is not constant! To remove the least-significant-digit in base 20, we just take our bignum modulo 20, and then divide the bignum by 20 (truncating a fractional part, instead of rounding) before processing the next digit. This works just as well even if the base is not constant. So for each turn of the game, we get to encode some amount of data based on how many legal moves were available. Each move is a "digit" of the number in a variable-base numbering system, where the "base" at any given digit is dictated by how many legal moves there were.

So now we just loop until our bignum is zero, generating the list of legal moves at each turn, and consuming some of the bignum by encoding it into a chess move. Having consumed the bignum, we can just pretend that the player who is supposed to move next resigned instead of playing a move, and we have a complete chess game.

About 1 byte per move (or 4 bits per ply) is typical for short messages (which is about what you'd expect if you assume around 16 possible moves per turn). For longer messages this gets worse, as the endgame tends towards having just the two kings remaining on the board, where fewer possible moves are available.

Let's try to encode the letters "JS".

As bytes, we have "0x4a 0x53". Since my implementation was little-endian by accident, we get "0x01 0x53 0x4a" after prepending the 0x01 byte. 0x53 = 83 and 0x4a = 74. This corresponds to the number 86858 (1*256*256 + 83*256 + 74).

White's first turn of the game has 20 possible moves (sorted lexically): Na3, Nc3, Nf3, Nh3, a3, a4, b3, b4, c3, c4, d3, d4, e3, e4, f3, f4, g3, g4, h3, h4.

So we need to get the least-significant-digit of the base-20 representation of our bignum, which is 86858 % 20 = 18. The move with index 18 (starting from 0) is **h3**.
So white plays the move h3, and we divide our bignum by 20: 86858 becomes 4342.

Black's first turn of the game also has 20 possible moves. Black needs to play the move at index 4342 % 20 = 2. This move is **Nf6**. So black plays Nf6, and then
we divide our bignum by 20 again: 4342 becomes 217.

White's second turn has only 19 legal moves now, so now we need the least-significant-digit of the base 19 representation of what remains of our bignum.
Not a problem. 217 % 19 = 8, move 8 is **c3**. White plays c3, and we divide our bignum by 19 this time: 217 becomes
11.

Black's second turn has 22 legal moves. 11 % 22 = 11, move 11 is **b6**. Black plays b6 and we divide our bignum by 22: 11 becomes 0. Game over. It's white to
move, so white graciously resigns, and we have finished encoding the number 86858, which encodes our letters "JS":

1. h3 Nf6 2. c3 b6 { White resigns. } 0-1

It should be clear that the same scheme works for longer messages, and in practice I have found it works for messages that appear *way* too large, although a 600-move
chess game is obviously suspicious.

Decoding the message is the same process, but in reverse: instead of generating the bignum and then consuming it with chess moves, we build up the bignum from the chess moves and then decode it into the raw bytes.

My current implementation avoids checkmating, and ignores draw by repetition and the 50-move rule, but doesn't notice a stalemate. This is because detecting stalemate in chess.js is a bit more annoying without simulating every possible move at every turn, which makes it quite slow. The downside of this is that there is likely some set of data that cannot be encoded at all because it leaves the game in a stalemated state, where there are no legal moves. Even if this were fixed, it is possible that the game could get into a state where there are legal moves, but all of them are either checkmate or stalemate. After checkmate and stalemate moves are rejected, there would be no available moves to select from. In practice I've not run into any of these conditions so far, avoiding checkmate seems to be good enough.

Also, the moves played in the games are quite likely to be complete blunders, as the algorithm has no idea what makes a reasonable or unreasonable move. It simply plays moves based on whatever move encodes the data. You could encode data using quite-convincing moves if, instead of allowing selection from all ~20 moves per turn, you limited it to only the moves that a chess engine doesn't consider to be blunders. This would make the games more convincing, at the cost of making them longer. And you'd also need to come up with a reason that the engine plays good moves, but seems to miss every available checkmate.

The system as implemented involves a contrived game where both players' moves are chosen freely, but it could be extended to encode a message even when playing against a player who is not complicit. It would work almost the same, but moves would need to be chosen on-the-fly because our list of legal moves depends on what our opponent played. Once we've consumed our entire bignum, we simply resign. This "one-sided" mode does have the problem that the other player might checkmate us before we've got our message across, however. A practical way to use it might be to have the sender set up a lichess account which the recipient knows about. Whenever the sender wants to send a message, he plays a game against anybody else on lichess, encoding his message in the moves he chooses. (In the event that he gets checkmated before finishing the message, he just plays another game and tries again). The recipient periodically checks this lichess account for new games (specifically, ones in which the sender resigned rather than getting checkmated), and decodes the moves to retrieve the message.

James Stanley - james@incoherency.co.uk | ricochet:it2j3z6t6ksumpzd | jesblogfnk2boep4.onion | /ipns/jes.xxx/ | [rss]