TP Node Level Parallelism⚓
Source codes can be found in this repository: Sources
Warm up⚓
We will start by setting up the environment for multi-node MPI execution. First you will need to connect to a couple of servers other than the one you are running on, that we will call
fl-tp-br-XXX
fl-tp-br-YYY
fl-tp-br-ZZZ
To make sure that no password is asked when you connect, you can generate a ssh key (which will be seen on all nodes thanks to the shared HOME filesystem)
1 2 3 |
|
1 |
|
main_1.cpp
file, compile it using the OpenMPI C++ compiler and run it on the three nodes
1 2 |
|
:1
suffix after the name of the nodes indicates how many MPI
processes can be spawned on the corresponding node. The sum of all of these
values must be greater or equal to the number of processes you want to run
(-np
).
Question 1
Now analyze the execution command and the code to answer these questions:
- Verify that different
processor
names are printed. - What does the program do? Do all nodes execute the same code
- Look at the man page to understand the options passed to
mpirun
. How can you run the program with a different number of nodes? - Now look at the code. How does the structure of the code differ w.r.t a non-parallel C++ code? How is the MPI environment initialized and destroyed?
- The master and compute nodes execute different code. What makes this possible?
Brute force it!⚓
The program in main_2.cpp
attempt to find the string that was used to generate the sha512
hash you will provide as input. Hash functions are functions that map an arbitrary sequence of bits into a finite sequence of bits. By construction these functions are surjective and do not admit an inverse, which makes them perfect for security applications. One of their most common use cases is to verify passwords: any decent website or service provider will not store your password on their system as it would be at risk of being divulged if a data breach were to occur; instead they will store a hash of your password. When you send your password for authentication they will compute its hash and compare it to the stored value, which will (almost) unequivocally verify that your password is correct. Hashes can also be used to verify data authenticity, integrity, ... in addition to applications outside of cryptography, such as constructing data maps (see std::map
).
When attackers obtain password hashes out of data breach they usually want to recover the original password. But given the non-invertibility of hash functions they have to use brute force and try several combination of words until they can guess one that generates the correct hash (more advanced techniques exist to speed up the process).
The objective of this exercise is to recover the original word out of the hash using a brute force approach on the "cluster" at your disposal. You will have to compute the hashes of all the words in the dictionary file using the computation nodes, until you get the find the correct one. Several parts of the code have been removed on purpose and you will have to set up the communication part of the program. To simplify things, the master node will not be doing any of the computations itself, it will only be in charge of dispatching sequences of words to the computation nodes.
To set up the environment, you will download a dictionary of words and passwords specially designed for the kind of task we want to do! The download and decompressing might take a bit of time...
1 2 |
|
Question 2
Compare the dictionary you have downloaded to the system dictionary /usr/share/dict/words
in terms of size, content,...
To simplify things for your first attempt at password cracking, we will create auxiliary dictionaries that contain only passwords made of 8 characters
1 2 |
|
Question 3
The code in main_2.cpp
provides a skeleton of the cracking program you will have implement. Before compiling and executing it you will have to fill in the and implement the different functions indicated by a "TODO".
- How would you parallelize the brute force search of the hash?
- Have a look at the code for the master and the computation nodes and understand the order in which the different functions are called.
- What are the master and computation nodes going to do?
- Check the man MPI documentation to figure out what functions you will have to use.
- With this in mind you can start implementing the different functions. At first you should keep using the system dictionary that is easier to work with.
To verify if your code runs, start by generating the hash of a common word:
1 |
|
1 2 3 |
|
crackstation-human-only-8-chars.txt
file you have downloaded as dictionary.
Question 4
Now that you have a running code answer these questions:
- How does the computation time scale with the number of nodes/number of words?
- Try loading all the words in the
crackstation-human-only-8-chars.txt
, are you encountering any issues? - Does the program stop its work immediately after finding the hash? If not why? Is it intended? How would you solve this?
Question 5
Bonus: Modify your code to make it work with words of arbitrary size!
Phantom cluster⚓
In this exercise, we will use the previous experience to compute the discrete Fourier transform (DFT) of a signal of the form
and \(t_n = t_\text{min} + n d_t\). The DFT can be computed by (i) computing the DFT matrix
where \(\omega = \exp\left(-2 \pi \mathrm{i} / N\right)\) and implement a (parallel) matrix-vector product (MVP)
This way of computing the DFT is extremely inefficient and is only used here for didactic purposes. The fast Fourier transform (FFT) algorithm should be used in any other context!
In HPC, system matrices are often too big to store or multiply in a single node. We will replicate this setting here by distributing the rows of the DFT matrix to different nodes. Each row will be computed and stored on a single compute node.
Question 6
To verify that your code is working, the result of the DFT will be displayed in graph form. What do you expect to see, precisely ?
Question 7
Analyze the provided skeleton code asking yourself the same questions as in the previous exercise and complete the missing functions.
To compile and run the code use the following commands
1 2 |
|
main_3
are, in order, \(N\), \(f_m\), \(f_p\), \(t_\text{min}\), and \(t_\text{max}\).
Question 8
We have made the choice of splitting the DFT matrix into rows, is it the most efficient way of handling its distribution? Consider the case of a very large matrix for which you can not store a full row on one of your nodes.
Question 9
Implement the solution to the previous problem. What additional challenges will you encounter?