Context
Computer Science Bachelor Thesis
Duration
Three months
My Role
Coding & Research
Team
Me & Prof. Emiliano Casalicchio
Abstract
On this page I'll give a quick overview of my Computer Science thesis.
The full thesis is available here.
For this project I repurposed PassFlow, a generative AI model built to produce passwords that look human-made, and turned it into a "guesser model", i.e. a model that tries to tell real, human-chosen passwords apart from fake ones created with password generation techniques.
These fake passwords are called honeywords, and the task of picking out the real password is called honeyword guessing.
I compared the repurposed model empirically against models previously used in the literature and found that, under certain conditions, PassFlow outperforms them.
Sweetwords & Honeywords
Honeywords are fake passwords that are stored in the database alongside the user's real password.
The goal is to confuse potential attackers: even if they manage to gain access to the database, they should not be able to distinguish the real password from the honeywords.
Then, if an attacker attempts to access an account using a honeyword, the system can detect the intrusion and initiate appropriate security measures.

A sweetword is a string whose identity we don't know: it could be either a real password or a honeyword.
Each user in the database will have a set of sweetwords, made up of the real password and its honeywords.

There are various methods to generate honeywords automatically. These methods are called Honeyword Generation Techniques (HGTs).

Schematic of an example Honeyword Generation Technique "Modeling Syntax"
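To make the idea of an HGT more concrete, here is a minimal sketch in the spirit of "Modeling Syntax": the password is read as runs of letters, digits and symbols, and every character is replaced by a random one of the same class, so each honeyword keeps the syntax of the original. This is a simplification and an assumption on my part, not the exact technique used in the thesis. The function names are hypothetical, and a realistic generator would draw replacements from word lists rather than uniformly random characters.

```python
import random
import string
from itertools import groupby


def char_class(c: str) -> str:
    """Classify a character as letter (L), digit (D) or symbol (S)."""
    if c in string.ascii_letters:
        return "L"
    if c in string.digits:
        return "D"
    return "S"


def syntax_of(password: str) -> str:
    """Describe a password as runs of classes, e.g. letters-digits-letters."""
    return "".join(
        f"{cls}{len(list(run))}" for cls, run in groupby(password, key=char_class)
    )


def modeling_syntax_honeyword(password: str, rng: random.Random) -> str:
    """Replace every character with a random one of the same class and case."""
    out = []
    for c in password:
        if c in string.ascii_lowercase:
            out.append(rng.choice(string.ascii_lowercase))
        elif c in string.ascii_uppercase:
            out.append(rng.choice(string.ascii_uppercase))
        elif c in string.digits:
            out.append(rng.choice(string.digits))
        else:
            out.append(rng.choice(string.punctuation))
    return "".join(out)


def generate_sweetwords(password: str, k: int, rng: random.Random) -> list[str]:
    """Return k distinct sweetwords: the real password plus k - 1 honeywords.

    Assumes the password is long enough for k distinct variants to exist.
    """
    sweetwords = {password}
    while len(sweetwords) < k:
        sweetwords.add(modeling_syntax_honeyword(password, rng))
    shuffled = sorted(sweetwords)
    rng.shuffle(shuffled)  # hide the position of the real password
    return shuffled
```

Honeywords produced this way share the length and syntax of the real password, but they look far more random than something a person would choose, which is exactly the kind of difference a guesser model can exploit.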
Honeyword guessing means trying to work out, given a set of sweetwords, which one is the real password and which are honeywords.
To do this, a model assigns a probability to each of the sweetwords, and the sweetword that is assigned the highest probability of being a real password is selected for an access attempt, a "guess".
This guess is sent to the honeychecker, the part of the system that will check if the guess is correct.
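The three steps (assign probabilities, select a guess, check it) can be sketched as follows. This is only an illustration: `toy_score` is a made-up stand-in for a real model such as PassFlow, and the `Honeychecker` class is a hypothetical, stripped-down version of the real component that only remembers which position holds the real password.

```python
from typing import Callable, Sequence


def pick_guess(sweetwords: Sequence[str], score: Callable[[str], float]) -> int:
    """Return the index of the sweetword the model rates most likely to be real."""
    return max(range(len(sweetwords)), key=lambda i: score(sweetwords[i]))


class Honeychecker:
    """Toy honeychecker: knows only the index of the real password."""

    def __init__(self, real_index: int) -> None:
        self._real_index = real_index

    def check(self, guessed_index: int) -> bool:
        """True for the real password; False means a honeyword was submitted."""
        return guessed_index == self._real_index


def toy_score(password: str) -> float:
    """Placeholder model (an assumption): share of lowercase letters."""
    return sum(c.islower() for c in password) / max(len(password), 1)


sweetwords = ["x7#Qp!2z", "sunshine12", "K9$$vT0a"]
checker = Honeychecker(real_index=1)

guess = pick_guess(sweetwords, toy_score)
intrusion_detected = not checker.check(guess)
```

In a real deployment a wrong guess is what reveals the attacker, so the guesser usually gets very few attempts per account.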

Schematic of the honeyword guessing process: assign probabilities, select guess and check.
PassFlow
PassFlow, a model devised by Pagnotta et al. [1], is a type of generative machine learning model. Specifically, it is categorized as a Generative Flow.
It was originally designed to compete with models like PassGAN [2] in traditional password guessing.
Generative Flow models like PassFlow attempt to model the distribution they observe in a training set, in order to generate completely new samples.
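To do this, the model learns an invertible function that maps between a simple starting distribution (a Gaussian) and the distribution observed in the training set.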
In the case of PassFlow, the training set is a list of real passwords, and once trained, PassFlow can generate new passwords similar to those created by real users.

The
To generate a password with PassFlow, we draw a value from the starting (Gaussian) distribution and pass it through the learned function.
The property of generative flows that lets us use them for honeyword guessing is their invertibility. By taking care to re-normalize the distribution at each step, we can invert the learned function and obtain the probability that the model assigns to a given sample.

Reversing the flow to obtain the probability of a sample
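The two directions of a flow can be illustrated with a deliberately tiny example. The sketch below is a one-dimensional affine flow with hand-picked parameters, not PassFlow's actual architecture (PassFlow is a neural network working on encoded passwords). The class name and parameters are assumptions made for illustration. What it shows is the general mechanism: sampling goes forward from the Gaussian, while the probability of a sample is obtained by going backward and correcting for how much the function stretches space.

```python
import math
import random


class AffineFlow:
    """Toy 1-D flow: x = scale * z + shift, with z drawn from N(0, 1)."""

    def __init__(self, scale: float, shift: float) -> None:
        if scale == 0:
            raise ValueError("scale must be non-zero so the flow is invertible")
        self.scale = scale
        self.shift = shift

    def forward(self, z: float) -> float:
        """Latent value -> sample (the generation direction)."""
        return self.scale * z + self.shift

    def inverse(self, x: float) -> float:
        """Sample -> latent value (the direction used for guessing)."""
        return (x - self.shift) / self.scale

    def sample(self, rng: random.Random) -> float:
        """Draw from the Gaussian and push the value through the flow."""
        return self.forward(rng.gauss(0.0, 1.0))

    def log_prob(self, x: float) -> float:
        """Log-density of x: Gaussian log-density of the latent value,
        re-normalized by the log of how much the flow stretches space."""
        z = self.inverse(x)
        log_base = -0.5 * (z * z + math.log(2.0 * math.pi))
        return log_base - math.log(abs(self.scale))


flow = AffineFlow(scale=2.0, shift=1.0)
generated = flow.sample(random.Random(0))
likely, unlikely = flow.log_prob(1.0), flow.log_prob(5.0)
```

For honeyword guessing only `log_prob` matters: each sweetword is scored this way, and the one with the highest value becomes the guess.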
Experiments
To test PassFlow's performance, we compared it with a simpler model that has already been used successfully in the literature. We selected a Markov model because, in Wang et al.'s paper on the security of the honeywords system [3], it proved to be a top performer at honeyword guessing according to several metrics.
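For readers unfamiliar with Markov models in this setting, here is a minimal sketch of a character-level model used as a guesser. It is a first-order (bigram) model with add-one smoothing, which is an assumption chosen for brevity. The baseline in the experiments and the models in [3] may use a different order and smoothing. The class name and the tiny training list are made up for illustration.

```python
import math
from collections import Counter, defaultdict
from typing import Iterable

START, END = "\x02", "\x03"  # markers for the beginning and end of a password


class BigramMarkov:
    """First-order character Markov model with add-one smoothing."""

    def __init__(self, vocab_size: int = 96) -> None:
        # vocab_size is an assumed count of possible next symbols
        # (95 printable ASCII characters plus the END marker).
        self.vocab_size = vocab_size
        self.counts: dict[str, Counter] = defaultdict(Counter)
        self.totals: Counter = Counter()

    def fit(self, passwords: Iterable[str]) -> "BigramMarkov":
        """Count how often each character follows each other character."""
        for pw in passwords:
            chars = [START, *pw, END]
            for prev, cur in zip(chars, chars[1:]):
                self.counts[prev][cur] += 1
                self.totals[prev] += 1
        return self

    def log_prob(self, password: str) -> float:
        """Sum of smoothed log transition probabilities along the password."""
        chars = [START, *password, END]
        total = 0.0
        for prev, cur in zip(chars, chars[1:]):
            seen = self.counts.get(prev, {}).get(cur, 0)
            total += math.log((seen + 1) / (self.totals[prev] + self.vocab_size))
        return total


model = BigramMarkov().fit(["password1", "password123", "passw0rd", "letmein1"])
sweetwords = ["qzxvkjwpty", "password12"]
guess = max(sweetwords, key=model.log_prob)
```

The model rewards character sequences it has seen in real passwords, so a human-looking sweetword scores higher than a random-looking one of the same length.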

Experiment workflow
Results
We tested four HGTs using two evaluation metrics: flatness and success. Both are explained in more detail in the full thesis.
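As a rough intuition for the first metric, one common reading of flatness (assumed here, the thesis gives the exact definitions) is: for each number of allowed guesses x, the fraction of users whose real password is among the guesser's top x sweetwords. The sketch below computes that curve. The function names are hypothetical, and the success metric is not covered.

```python
from typing import Callable, Sequence


def real_password_rank(
    sweetwords: Sequence[str], real_index: int, score: Callable[[str], float]
) -> int:
    """Position (1 = first guess) of the real password when sweetwords
    are tried from the highest score to the lowest."""
    order = sorted(
        range(len(sweetwords)), key=lambda i: score(sweetwords[i]), reverse=True
    )
    return order.index(real_index) + 1


def flatness_curve(ranks: Sequence[int], k: int) -> list[float]:
    """Entry x - 1 is the fraction of users whose real password is found
    within x guesses, for x from 1 to k sweetwords per user."""
    n = len(ranks)
    return [sum(r <= x for r in ranks) / n for x in range(1, k + 1)]


# Ranks of the real password for four imaginary users with k = 3 sweetwords.
curve = flatness_curve([1, 3, 2, 1], k=3)
```

With a perfect HGT the first entry of the curve would sit near 1/k, since the guesser could do no better than chance. A stronger guesser pushes it higher.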

Results of the experiments
These experimental results show that PassFlow can be effectively used as a model for honeyword guessing, and that under certain conditions, its performance surpasses that of other models previously used successfully in the literature.
Full Thesis
References:
1. PassFlow: Guessing Passwords with Generative Flows. G. Pagnotta, D. Hitaj, F. De Gaspari, L. Mancini.
2. PassGAN: A Deep Learning Approach for Password Guessing. B. Hitaj, P. Gasti, G. Ateniese, F. Perez-Cruz.
3. A Security Analysis of Honeywords. D. Wang, H. Cheng, P. Wang, J. Yan, X. Huang.
(Complete reference list in the full thesis)