Probability and whatnot...
17 November, 2022
3182 words
16 mins

Introduction

This is a crazy challenge that drained my team's brains 😢 and we couldn't solve it in time of the CTF 🙁 After 2 days of trying to brainstorm, we have finally be able to crack the code, with a little bit of luck 😗

Since the intended solution will be provided by the author catto after the finals (18/12), which is over ONE MONTH AFTER from when I'm writing these lines (why it gotta take so long catto 😭 😭 😭), I decided to create my own writeup to explain our method of solving this problem 😗


Challenge

Description

The server generates a securely random 64-bit hex string password that is kept secret. We as the clients have to guess the secret password to get the flag.

The server gives us hints about the password through the output of the encrypt(password), which is a function we can call as many times as we can, each time it seemingly produces a different output despite feeding the same input to the function.

This writeup provides a way to obtain the flag with 8 outputs of encrypt(password) in a probability of about 1/401/40.


Code

#!/usr/bin/env python3 from secret import SECRET_BASIS from secrets import token_hex import random import os import copy assert len(SECRET_BASIS) == len(SECRET_BASIS[0]) == 128 BASIS = copy.deepcopy(SECRET_BASIS) def f(M: list[list[int]], C: list[int]) -> list[int]: v = [0] * len(M[0]) for c, m in zip(C, M): if c: v = [x ^ y for x, y in zip(v, m)] return v def random_bits(n: int) -> list[int]: return list(map(int, bin(random.getrandbits(n))[2:].rjust(n, "0"))) def encrypt(message: bytes) -> str: global BASIS random.shuffle(BASIS) M = [b for c in message for b in map(int, "{:08b}".format(c))] ct = [] for bit in M: C = random_bits(64) v = f(BASIS[:64], C) if bit else f(BASIS[64:], C) ct.extend(v) ct = "".join(map(str, ct)) return bytes([int(ct[i:i+8], 2) for i in range(0, len(ct), 8)]).hex() def decrypt(ciphertext: str) -> bytes: # TODO: implement this pls pass def action_prompt() -> int: print('''============= MENU ============= 1. Have a guess 2. Get ciphertext 3. Change password 4. Quit ================================\n''') try: option = int(input("Your option> ")) except: return None return option def chall(): try: password = token_hex(8) while True: option = action_prompt() if option == 1: user_password = input("Input your password (hex): ") if user_password == password: print(f"Is taking the red pill worth it? Here is the truth that you want: {os.environ['FLAG']}.") else: print("Guess you can't escape the matrix after all.") break elif option == 2: print(f"Leaky ciphertext: {encrypt(bytes.fromhex(password))}") elif option == 3: print("Generating super secure password ...") password = token_hex(8) elif option == 4: print("Maybe the truth is not that important, huh?") break else: print("Invalid option.") print("\n") except: print("An error occured!") chall()

Analyzing stuffs...

Let's first dissect everything in the source code to see what it does and how to make sense of the data given to us 😗 If you've already known what to expect, you can hop onto the next section to see the discussion of the solution ❤️

encrypt(message) converts message in hex form to its binary form M as a list of 0 and 1s,

def encrypt(message: bytes) -> str: ... M = [b for c in message for b in map(int, "{:08b}".format(c))] ...

For each bit of M, it generates a random 64-bit value C (also a list of 0 and 1s),

def random_bits(n: int) -> list[int]: return list(map(int, bin(random.getrandbits(n))[2:].rjust(n, "0"))) def encrypt(message: bytes) -> str: ... for bit in M: C = random_bits(64) ...

calls f(C, [:64]) or f(C, BASIS[64:]) depending on the current bit of M, appends the result of the call to the ct variable.

def encrypt(message: bytes) -> str: global BASIS ... ct = [] for bit in M: C = random_bits(64) ct = [] for bit in M: C = random_bits(64) v = f(BASIS[:64], C) if bit else f(BASIS[64:], C) ct.extend(v)

BASIS is a global variable, a 2-dimensional list with size 128x128 with unknown content whose rows are shuffled in every call to encrypt().

assert len(SECRET_BASIS) == len(SECRET_BASIS[0]) == 128 BASIS = copy.deepcopy(SECRET_BASIS) ... def encrypt(message: bytes) -> str: global BASIS random.shuffle(BASIS) ...

After loop, ct is then converted to its hex form and returned to the user. From this code, it is safe to assume that the value of ct is a list consisting of all 0s and 1s.

def encrypt(message: bytes) -> str: ... ct = [] for bit in M: ... ct.extend(v) ct = "".join(map(str, ct)) return bytes([int(ct[i:i+8], 2) for i in range(0, len(ct), 8)]).hex()

Also, it can be deduced from ct and the type hints and operations of f(), which is just bunch of repeated XORs of values in v andM[i] (M is BASIS[:64] or BASIS[64:]), that BASIS also consists of 128x128 values of 0s and 1s.

def f(M: list[list[int]], C: list[int]) -> list[int]: v = [0] * len(M[0]) for c, m in zip(C, M): if c: v = [x ^ y for x, y in zip(v, m)] return v

All of these XORs and 2-dimensional arrays and lists of 0s and 1s seem to point everything into viewing those things as addition of matrices and vectors in GF(2)GF(2). And it turns out we can view:

  • BASIS as a 128x128 matrix in GF(2)GF(2).

  • As for f(M, C) , there are two ways to view this function:

    You can either view it as an operation that produces a 1xn vector v by multipling a 1xm vector C with an mxn matrix M,

    v=CM\overrightarrow{v} = \overrightarrow{C} * M

    or as an operation that generates a linear combination of the rows in M with coefficients on C.

    v=i=0mCiMiCi{0,1}\begin{matrix} \overrightarrow{v} = \sum_{i=0}^{m} {C_i}\overrightarrow{{M_i}} & & C_i \in \{0, 1\} \end{matrix}

    I'll choose to see it in the 2nd way because it helps when discussing the solution later 😁

  • encrypt(message) randomizes C and uses f(BASIS[...], C) to produce random linear combinations of the first 64 rows of BASIS or the last 64 rows of BASIS depending on the current looping bit of M or message (0 - first 64, 1 - last 64). All of the resulting vectors are stored in ct before converted into hex form and returned to the user.

  • ct is a list of 64 different 128-dimensional vectors vi\overrightarrow{v_i} in GF(2)GF(2) that is either spanned by the last 64 rows of BASIS or the first 64 rows of BASIS. Each vector in ct corresponds to a bit in password.

Now we know how to view the encrypted password, let's discuss the solution!


How to know the bits of password?

In order to get all the bits of password, we must be able to classify which vector vi\overrightarrow{v_i} in ct belongs to which vector space. Is vi\overrightarrow{v_i} generated from BASIS[:64] (so passwordipassword_{i} is 0), or BASIS[64:] (so passwordipassword_{i} is 1)?

We don't even have BASIS, so if we know a particular vector is formed by a set of 64 rows in BASIS, we can't even tell if that set positions at the top of the matrix or the bottom! The only bet for us is to know which bit is different from which bit by comparing the vector space that spans the corresponding vectors. If we've got the mapping of which bit is different from which bit, we can set one random bit to be 0 and let the mapping does the rest. This gives us the probability of finding the correct answer at 1/21/2 (given the mapping, of course 😃)

Notations

Before moving on, I think I should introduce some notations here to help discussing the next part.

  • Bit i of password is passwordipassword_i.

  • vi,a\overrightarrow{v_{i, a}} is the i-th vector of ct, generated at a-th time we request to get encrypt(password).

  • Si,aS_{i, a} be the set of rows in BASIS that generates vi,a\overrightarrow{v_{i, a}}.

    Based on the rule of this challenge, we can see that in an aa-th encrypt operation:

    • Si,a=S_{i, a} = BASIS[:64] when passwordi=0password_i = 0,
    • Si,a=S_{i, a} = BASIS[64:] when passwordi=1password_i = 1,

    Which leads to these properties:

    • Si,a=Sj,aS_{i, a} = S_{j, a} when passwordi=passwordjpassword_i = password_j.
    • Si,aSj,aS_{i, a} \neq S_{j, a} when passwordipasswordjpassword_i \neq password_j.
  • P(e)P(e) is the probability of some event ee happening.

  • #S\#S is the length of set SS.


Matrix1.0

In Glitch in the matrix 1.0, comparing passwordipassword_i and passwordjpassword_j is very much doable. To solve this challenge, we have to provide a way to differentiate between the vectors that are either generated from BASIS[:64] or BASIS[64:].

Because password doesn't change every-time we call encrypt and most importantly, BASIS is not changed in the entire runtime of the program, vi,a\overrightarrow{v_{i, a}} is generated from the same set of 64 vectors as vi,b\overrightarrow{v_{i, b}} for any a,ba, b, or Si,a=Si,b.S_{i, a} = S_{i, b}.

This means that we can request 64 encrypts to obtain vi,0,vi,1,vi,2,...,vi,63\overrightarrow{v_{i, 0}}, \overrightarrow{v_{i, 1}}, \overrightarrow{v_{i, 2}}, ..., \overrightarrow{v_{i, 63}} that is in the same vector space as BASIS[:64] or BASIS[64:]. In order to verify if passwordi=passwordjpassword_{i} = password_{j}, or Si,a=Sj,aS_{i, a} = S_{j, a}, I created a 128x128 matrix that looks like this:

[vi,0vi,1...vi,62vi,63vj,0vj,1...vj,62vj,63]\begin{bmatrix} \overrightarrow{v_{i, 0}} \\ \overrightarrow{v_{i, 1}} \\ ... \\ \overrightarrow{v_{i, 62}} \\ \overrightarrow{v_{i, 63}} \\ \overrightarrow{v_{j, 0}} \\ \overrightarrow{v_{j, 1}} \\ ... \\ \overrightarrow{v_{j, 62}} \\ \overrightarrow{v_{j, 63}} \\ \end{bmatrix}

If the rank of this matrix is 6464, vi,0,vi,1,vi,2,...,vi,63,vj,0,vj,1,vj,2,...,vj,63\overrightarrow{v_{i, 0}}, \overrightarrow{v_{i, 1}}, \overrightarrow{v_{i, 2}}, ..., \overrightarrow{v_{i, 63}}, \overrightarrow{v_{j, 0}}, \overrightarrow{v_{j, 1}}, \overrightarrow{v_{j, 2}}, ..., \overrightarrow{v_{j, 63}} comes from the same span of 64 vectors, so Si,a=Sj,aS_{i, a} = S_{j, a}, and we can conclude that passwordi=passwordjpassword_i = password_j. If the rank of this matrix is 128128, the opposite is true and we get passwordipasswordjpassword_i \neq password_j.

In script, we fix i=0i = 0 and let jj runs from 1631 - 63 to generate the mapping and recover the password with probability of 1/21/2.


Matrix2.0

Glitch in the matrix 2.0 is the updated version where BASIS is shuffled every-time we request a new encrypt(password). This means that the property that:

vi,a\overrightarrow{v_{i, a}} is generated from the same set of 64 vectors as vi,b\overrightarrow{v_{i, b}} where aba \neq b, or Si,a=Si,bS_{i, a} = S_{i, b},

is not correct this time around, which is very sad 😭. We also cannot refer to BASIS[:64] and BASIS[64:] as constants anymore.

Probability hint

After a while, a hint came out from the author:

And we still had no idea what the hell she is talking about until the end of the CTF 😦

...


And then, a thought comes to my mind:

While it's true that vi,a\overrightarrow{v_{i, a}} is not gonna be generated from the same set of 64 vectors as vi,b\overrightarrow{v_{i, b}} if aba \neq b, some of the vectors in those sets might overlap. Let P(Si,aSi,b={})P(S_{i, a} \cap S_{i, b} = \{\}) be the probability of Si,bS_{i, b} purposefully choosing a set of different 64 vectors to Si,aS_{i, a} in a list of 128 vectors, then:

P(Si,aSi,b={})=1/C12864=4.17e-38P(S_{i, a} \cap S_{i, b} = \{\}) = 1/{C_{128}^{64}} = 4.17\text{e-38}

proving that this is a very, very, very, very, very unlikely event.

This gives me an insight that the sets Si,aS_{i, a} and Si,bS_{i, b} almost always overlap, and from above we've got:

P(#(Si,aSi,b)127)=11/C12864P(\#(S_{i, a} \cup S_{i, b}) \le 127) = 1 - 1/{C_{128}^{64}}

This might be the "probability" thing catto was talking about! Also from this another idea is formed!


Detecting 128n+1\lfloor{\frac{128}{n}}\rfloor + 1 similar bits in password

Let's say we generate encrypt(password) nn times for n3n \ge 3.


And somehow, for a certain value of passwordipassword_i, we obtain the vectors vi,0,vi,1,...,vi,n1\overrightarrow{v_{i, 0}}, \overrightarrow{v_{i, 1}}, ..., \overrightarrow{v_{i, n-1}} such that:

#(Si,0Si,1Si,2  ... Si,n1)127\#(S_{i, 0} \cup S_{i, 1} \cup S_{i, 2} \text{ } \cup \text{ } ... \text{ } \cup S_{i, n-1}) \le 127

This means that Si,0Si,1Si,2  ... Si,n1S_{i, 0} \cup S_{i, 1} \cup S_{i, 2} \text{ } \cup \text{ } ... \text{ } \cup S_{i, n-1} contains about 127 rows in BASIS except one/some row(s). The remaining row(s) of BASIS would be present in the set of Sj,0Sj,1Sj,2  ... Sj,n1S_{j, 0} \cup S_{j, 1} \cup S_{j, 2} \text{ } \cup \text{ } ... \text{ } \cup S_{j, n-1} where passwordjpasswordipassword_j \ne password_i as Sj,aS_{j, a} always contains the vectors that Si,aS_{i, a} doesn't have.


And somehow, we can find a set of 128n+1\lfloor{\frac{128}{n}}\rfloor + 1 elements {passwordi0,passwordi1,...,passwordi128n}\{password_{i_0}, password_{i_1}, ..., password_{i_{\lfloor{\frac{128}{n}}\rfloor}}\}, where passwordi0=passwordi1= ...=passwordi128npassword_{i_0} = password_{i_1} = \text{ } ... = password_{i_{\lfloor{\frac{128}{n}}\rfloor}}.


So, this matrix MM, which has n(128n+1)>128n(\lfloor{\frac{128}{n}}\rfloor + 1) > 128 rows,

M=[vi0,0vi0,1...vi0,n1......vi128n,0vi128n,1...vi128n,n1]M = \begin{bmatrix} \overrightarrow{v_{i_0, 0}} \\ \overrightarrow{v_{i_0, 1}} \\ ... \\ \overrightarrow{v_{i_0, n-1}} \\ ... \\ ... \\ \overrightarrow{v_{i_{\lfloor{\frac{128}{n}}\rfloor}, 0}} \\ \overrightarrow{v_{i_{\lfloor{\frac{128}{n}}\rfloor}, 1}} \\ ... \\ \overrightarrow{v_{i_{\lfloor{\frac{128}{n}}\rfloor}, n-1}} \\ \end{bmatrix}

must have rank 127\le 127 as all of these vectors are generated from the set of Si,0Si,1Si,2  ... Si,n1S_{i, 0} \cup S_{i, 1} \cup S_{i, 2} \text{ } \cup \text{ } ... \text{ } \cup S_{i, n-1}, which contains only 127\le 127 rows of BASIS.


If we change just one of passwordikpassword_{i_k} to a different bit from the rest, the above matrix MM will most likely have higher rank (often 128128) than before, since there will be a row in MM that is a linear combination of the vectors including the one(s) that is/are not in Si,0Si,1Si,2  ... Si,n1S_{i, 0} \cup S_{i, 1} \cup S_{i, 2} \text{ } \cup \text{ } ... \text{ } \cup S_{i, n-1}.


Here comes our mechanism to detect 128n+1\lfloor{\frac{128}{n}}\rfloor + 1 similar bits in password. In particular, if we randomize {i0,i1,...,i128n}\{{i_0}, {i_1}, ..., {i_{\lfloor{\frac{128}{n}}\rfloor}}\} enough times and construct MM, we might stumble across a situation where MM has rank 127\le 127 and we can use this to say that passwordi0=passwordi1= ...=passwordi128npassword_{i_0} = password_{i_1} = \text{ } ... = password_{i_{\lfloor{\frac{128}{n}}\rfloor}} with high probability.


However, in order for this to work, we need to make sure to choose nn so that both of these conditions hold:

  1. #(Si,0Si,1Si,2  ... Si,n1)127\#(S_{i, 0} \cup S_{i, 1} \cup S_{i, 2} \text{ } \cup \text{ } ... \text{ } \cup S_{i, n-1}) \le 127 occurs in high probability.

  2. Choose {i0,i1,...,i128n}\{{i_0}, {i_1}, ..., {i_{\lfloor{\frac{128}{n}}\rfloor}}\} that passwordi0=passwordi1= ...=passwordi128n=0password_{i_0} = password_{i_1} = \text{ } ... = password_{i_{\lfloor{\frac{128}{n}}\rfloor}} = 0 in high probability.


It turns out, if scenario 1. occurs more likely, scenario 2. will occur less likely and vice versa, so we need to balance nn. Through trials and errors, I've decided to choose n=8n = 8 and the correct set {i0,i1,...,i128n}\{{i_0}, {i_1}, ..., {i_{\lfloor{\frac{128}{n}}\rfloor}}\} is found in a probability of about 1/201/20, after brute forcing at most 400000400000 sets.


Recover the remaining bits in password

Remember the matrix MM from the last section? We can use it to detect the rest of the bits in password. For every jj-th bit in password that j{i0,i1,...,i128n}j \notin \{{i_0}, {i_1}, ..., {i_{\lfloor{\frac{128}{n}}\rfloor}}\}, I decided to construct a new matrix NN from MM that is:

N=[Mvj,0vj,1...vj,n1]N = \begin{bmatrix} M \\ \overrightarrow{v_{j, 0}} \\ \overrightarrow{v_{j, 1}} \\ ... \\ \overrightarrow{v_{j, n-1}} \\ \end{bmatrix}

Then compares rank(N)rank(N):

  • =rank(M)= rank(M), we have passwordj=passwordi0=passwordi1= ...=passwordi128npassword_j = password_{i_0} = password_{i_1} = \text{ } ... = password_{i_{\lfloor{\frac{128}{n}}\rfloor}}.

  • >rank(M)> rank(M), we have passwordjpasswordi0=passwordi1= ...=passwordi128npassword_j \neq password_{i_0} = password_{i_1} = \text{ } ... = password_{i_{\lfloor{\frac{128}{n}}\rfloor}}.

With this information, we can recover password with a probability of 1/21/2.


Solve code

This piece of code implements what I've just described above, with some small changes:

  • Constructs MM by brute-forcing through a bigger random list, {i0,i1,...,i128n+1}\{{i_0}, {i_1}, ..., {i_{\lfloor{\frac{128}{n}}\rfloor + 1}}\} instead of just {i0,i1,...,i128n}\{{i_0}, {i_1}, ..., {i_{\lfloor{\frac{128}{n}}\rfloor}}\}. This allows MM to have even more rows, ensuring us that we get MM where rank(M)127rank(M) \le 127 actually because of #(Si,0Si,1Si,2  ... Si,n1)127\#(S_{i, 0} \cup S_{i, 1} \cup S_{i, 2} \text{ } \cup \text{ } ... \text{ } \cup S_{i, n-1}) \le 127, not just some false positives.

  • Uses multiprocessing to find the list {i0,i1,...,i128n+1}\{{i_0}, {i_1}, ..., {i_{\lfloor{\frac{128}{n}}\rfloor + 1}}\} that yields rank(M)127rank(M) \le 127 faster.

from Crypto.Util.number import * from tqdm import tqdm, trange from pwn import remote, process from sage.all import * from multiprocessing import Process, Queue import random, time host = '34.132.73.130' port = int(8003) B = 64 N = 128 K = 8 targetRank = N-1 def convertToMatrix(input: bytes): bits = '' for byte in input: bits += f'{byte:08b}' bits = list(map(int, bits)) bits = [bits[i:i+128] for i in range(0, len(bits), 128)] return Matrix(GF(2), bits) class Solver: # -------------------------------------- CONNECT -------------------------------------- def __init__(self) -> None: self.io = remote(host, port) def __del__(self): self.io.close() def request(self, data): if isinstance(data, str): data = data.encode() elif not isinstance(data, bytes): data = str(data).encode() self.io.send_raw(data + b'\n') def recvline(self): return self.io.recvline().strip().decode() # -------------------------------------- REQUEST -------------------------------------- def get(self, option): self.io.recvuntil(b'option> ') self.request(option) def encrypt(self): self.get(2) return convertToMatrix(bytes.fromhex(self.recvline().split(': ')[-1])) def submit(self, key): self.get(1) self.request(key) def changePassword(self): self.get(3) # -------------------------------------- MAIN -------------------------------------- def execOneCore(self, U, Ntries, delta, retQueue, progBar): choices = list(range(1, B)) for _ in range(Ntries): random.shuffle(choices) combination = choices[:N//K + int(N%K != 0) + delta] P = block_matrix( [[U[i]] for i in combination] ) if P.rank() <= targetRank: retQueue.put([ P.rank(), combination ]) break if not retQueue.empty(): break progBar.update() def findBestComb(self, U, ncores=4, Ntries=100000, delta=2): # Progress bars progBars = [] for i in range(ncores): progBar = tqdm(desc=f'#{i}', total=Ntries) progBars.append(progBar) # Hold return values of all cores retQueue = Queue() # Create new processes futures = [] for _ in range(ncores): futures.append(Process( target=self.execOneCore, args=( U, Ntries, delta, retQueue, progBars[_], ))) # Start all processes for future in futures: future.start() # Wait for all process to finish :) for future in futures: future.join() # Close all progress bars for progBar in progBars: progBar.close() # Get answer! if not retQueue.empty(): rank, comb = retQueue.get() return rank, comb return None, None def main(self): M = [] for i in range(K): M.append(self.encrypt()) U = [] for i in range(B): u = [] for j in range(K): u.append([M[j][i, :]]) U.append(block_matrix(u)) print(f"[i] Find combination of vectors that forms a matrix with rank <= {targetRank}...") rank, comb = self.findBestComb(U, Ntries=100000, delta=2) if comb == None: return False print(f"[i] Best rank: {rank}") print(f"[i] Best combination: {comb}") # print(f"[i] Debug matrix:\n{block_matrix([[U[i]] for i in comb], subdivide=False)}") # Deduce the password bits... passwordbits = '' for j in range(B): if j in comb: passwordbits += '0' continue P = block_matrix( [[U[i]] for i in comb + [j]] ) if P.rank() > rank: passwordbits += '1' else: passwordbits += '0' # Convert it to hex and send to server :) password = bytes([int(passwordbits[i:i+8], 2) for i in range(0, len(passwordbits), 8)]).hex() self.submit(password) print('[i] Password is (maybe):', password) print('[i] Password bits are (maybe):', passwordbits) # Get answer from server, which is estimated to be successful at around 1/20... answer = self.recvline() print('server>', answer) if 'Here is the truth that you want' in answer: return True return False cnt = 0 while True: cnt += 1 solver = Solver() if solver.main(): break print(f"[i] Solver took {cnt} times.")
table-of-content
::: Made with (❤️ && 😭) by Mistsu :::
Your browser sucks, you should consider getting a new one (one that supports display: grid)
>.<