Keystroke biometrics in the encrypted domain: a first study on search suggestion functions of web search engines

Whiskerd, Nicholas; Körtge, Nicklas; Jürgens, Kris; Lamshöft, Kevin; Ezennaya-Gomez, Salatiel; Vielhauer, Claus; Dittmann, Jana; Hildebrandt, Mario

doi:10.1186/s13635-020-0100-8

Research
Open access
Published: 21 February 2020

Nicholas Whiskerd¹,
Nicklas Körtge¹,
Kris Jürgens¹,
Kevin Lamshöft¹,
Salatiel Ezennaya-Gomez¹,
Claus Vielhauer¹,²,
Jana Dittmann¹ &
Mario Hildebrandt¹

EURASIP Journal on Information Security volume 2020, Article number: 2 (2020)

Abstract

A feature of search engines is prediction and suggestion to complete or extend input query phrases, i.e. search suggestion functions (SSF). Given the immediate temporal nature of this functionality, alongside the character submitted to trigger each suggestion, adequate data is provided to derive keystroke features. The potential of such biometric features to be used in identification and tracking poses risks to user privacy. For our initial experiment, we evaluate SSF traffic with different browsers and search engines on a Linux PC and an Android mobile phone. The keystroke network traffic is captured and decrypted using mitmproxy to verify if expected keystroke information is contained, which we call quality assurance (QA). In our second experiment, we present first results for identification of five subjects searching for up to three different phrases on both PC and phone using naive Bayesian and nearest neighbour classifiers. The third experiment investigates potential for identification and verification by an external observer based purely on the encrypted traffic, thus without QA, using the Euclidean distance. Here, ten subjects search for two phrases across several sessions on a Linux virtual machine, and statistical features are derived for classification. All three test cases show positive tendencies towards the feasibility of distinguishing users within a small group. The results yield lowest equal error rates of 5.11% for the single PC and 11.37% for the mobile device with QA and 23.61% for various PCs without QA. These first tendencies motivate further research in feature analysis of encrypted network traffic and prevention approaches to ensure protection and privacy.

1 Motivation

So-called search suggestion functions (SSF) are popular means to support users in typing search terms in text fields of search engines. These SSF are technically implemented by incremental, character-by-character server-side queries, where key-by-key information of the typing process is transferred to the servers, which then return proposed search term lists to the web browsers for display and potential selection by the users. Search engines use the Transport Layer Security (TLS) protocol for protection of the communication (authentication and encryption) between client and server. As keystrokes are input, the search engine creates encrypted data packets and forwards them to the search provider via the TCP/IP network. Typically, search suggestions are provided as the queries are typed, i.e. immediately after each keystroke. Exactly what data is shared with the search provider and how frequently packets are exchanged is not transparent. Therefore, the level of data disclosure involved in interacting with search engines is likely not something a typical user is aware of and consent is unclear.

Keystroke patterns, or more precisely, temporal features of key presses during the user’s typing of text sequences, are commonly known to be potentially adequate for biometric user recognition. In this domain, biometric keystroke analysis, the dynamics of the typing are utilised to construct typical user patterns and to use those for biometric identification or verification. This can be achieved by means that are either text-dependent or text-independent. If an instance of biometric recognition is possible, this would allow passive registration of a user during normal search behaviour by the search engine provider. Moreover, potentially a third party with access to the network data stream could classify similarly, even if only considering the subset of biometric information in the encrypted domain.

However, there are limiting factors on derivable keystroke information in the specific context of SSF. In the context of established keystroke dynamics, only a subset of conventionally usable keystroke events is available in the data captured from the network stream. When typing on a traditional hardware keyboard, key down events trigger input, and therefore, these are the only events which can be inferred from the respective captured network data. In this case, the distance measure from which biometric features can be derived is a “down-down” (DD) distance. On the other hand, conventional software keyboards such as those commonly used on modern mobile devices will only input characters on key up events. Thus, the distance measure for software keyboards is an “up-up” (UU) distance. Further to these limitations, dynamic network conditions of realistic environments will influence the time of capture for each keystroke and risk damaging biometric usefulness or reliability. These factors are presented in Fig. 1 which breaks down the keystrokes and respective packets for the incrementally typed word “weather” on a hardware keyboard. The figure contrasts the conventional biometric perspective (i.e. involving dwell and flight times) with the limited derivable features from encrypted captured packets, which nonetheless may divulge significant biometric information.

Once a biometric keystroke pattern is built, the user could be later re-identified in further typing in the same applications (search engines) or within other applications and platforms. Even if the visit is anonymous, the keystroke pattern allows potential correlations to be made between visits and also across other applications, with further potential to divulge true identity. While discrimination in large populations may be difficult, the application within smaller domains is enough of a risk for identification, e.g. shared devices within families or school computer labs. Whether this data is processed (and, if it is, how) may not be determinable from the application perspective, but from the network perspective, a data collection is very possible. There are considerable legal requirements, e.g. GDPR [1], to abide by given the tracking and misuse potential.

The observation of the key-by-key transmission of user data to servers, in combination with the known potentials for keystroke dynamics, poses risks of unwanted user identification/tracking. As a prospective violation of user privacy, this motivates the work presented in this article. Thus, the following main contributions are made, which appear to be closely within the scope of this special issue “Biometric Authentication on Mobile Devices”:

Thorough theoretical and experimental analysis of potential flows of biometrically relevant keystroke data by SSF for different browsers, computing platforms, keyboard concepts and operating systems.
Constitution of the relevance of leakage of biometric keystroke data in SSF both on desktop computers, as well as ubiquitous mobile computing platforms such as smartphones and tablet devices.
Identification and formulation of a novel research challenge involving biometrics on mobile devices, in regard to data security and user privacy.
Proof of concept of the feasibility of biometric user identification based on timing features of metadata in encrypted network data packets, resulting from keystroke input to SSF.

With all these observations and privacy implications as incentive, the investigation in this article is focused towards two research questions as goals:

Q₁: General analysis of the relation between encrypted network packets generated by SSF and the temporal keystroke sequences: how frequently are TLS protocol packets sent during typing of a search query and can these packets be assigned to the temporal relations between single, individual keystrokes?
Q₂: Analysis of the potential of biometric user identification by metadata in encrypted network data packets generated by SSF: are there biometric features which can be derived from the TLS-encrypted packets based on the typing (keystroke dynamics) and, if so, how effective is this limited feature space for biometric recognition within the encrypted domain?

In approaching these research questions, three test goals are established which are formally defined in Section 3.2. The first phase of investigation (T₁) involves an overall assessment of behavioural tendency (outlined in Q₁) across four search engines S (Google S₁ and three others with a focus on privacy: Qwant S₂, DuckDuckGo S₃ and Ecosia S₄), two browsers B (Chrome B₁ and Firefox B₂), and two operating systems (Linux L and Android A). This assessment is done with two different input methods K: a hardware keyboard on Linux K_HW and a software keyboard on Android K_SW. A second phase of experimentation (T₂) involves a small data collection from five subjects N₁ – N₅ on both K_HW and K_SW with a focus on narrower selected test cases for one selected search engine and browser, which is chosen based upon the first phase findings. The decrypted network traffic captured with these criteria is used to create a proof of concept and to assess the impact of differing lengths of the search query phrase. The third and final experiment (T₃) involves ten anonymous subjects in more realistic and varied network conditions. For T₃, an open call for contributions was created in our laboratory with the option to upload the captured samples anonymously. The PCs used by participants differed in hardware and respective keyboards K_HW, but the experiment ensured the use of a common Linux virtual machine. From this setup, the encrypted data capture is focused upon in analysis for identification and verification potential.

The remainder of the article is structured as follows: Section 2 covers related work and presents conventional features we base our analysis upon; Section 3 details the technical specifications of our experimental setup, goals, and assumptions and limitations; Section 4 explains the methods we use in pre-processing and evaluating our captured data; Section 5 presents the full data collection specifics along with our results presentation and discussion; and Section 6 presents our conclusions.

2 Related work

The history of keystroke dynamics dates back to the early days of wireless radio, in which people transmitting Morse code messages could learn to identify each other by the way a person manually input the code, i.e. their unique “fist” [2], notably useful in verifying the integrity of military intelligence communications.

In recognition from typing, preliminary efforts established potential of keystroke dynamics to authenticate a user’s access to a computer. Works [3] considered the moment each key is struck (key down) and the resultant distances between. Through the application of statistical models, the authors were able to present clearly distinct differences of different typists on a small population. These early principles were built upon in the work [4] where the authors consider lengths of time keys remain depressed as further features, along with research into “Free-Style” text (free-text) rather than only structured input (fixed-text).

Presently, the standard data that can be captured from hardware keyboards on modern computers are the times when each specific key is pressed and released, i.e. the key down and key up events. These can easily be captured by specialised software designed to record keystrokes. Exactly how these key events are used to build features in keystroke analysis varies, but the features are in principle built from temporal distances between these events. A subset of the following features is typically used [5], denoted as combinations of down (D) and up (U) events to represent the time differences between when:

DU1: a key is pressed and the same one is released (dwell time)
DU2: a key is pressed and the next key is released
UD: a key is released and the next is pressed (flight time)
DD: a key is pressed and the next key is pressed
UU: a key is released and the next key is released
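The five distances listed above can be illustrated with a minimal Python sketch. It assumes each keystroke is available as a (key down, key up) timestamp pair in milliseconds; the function name `keystroke_features` and the example timestamps are hypothetical and not taken from the study.

```python
from typing import Dict, List, Tuple

# One keystroke as (key_down_time, key_up_time) in milliseconds, in typing order.
KeyEvent = Tuple[float, float]


def keystroke_features(events: List[KeyEvent]) -> Dict[str, List[float]]:
    """Compute the DU1, DU2, UD, DD and UU distances for a keystroke sequence."""
    features: Dict[str, List[float]] = {
        "DU1": [], "DU2": [], "UD": [], "DD": [], "UU": []
    }
    # DU1 (dwell time) is defined per key.
    for down, up in events:
        features["DU1"].append(up - down)
    # The remaining features are defined per pair of consecutive keys.
    for (down_a, up_a), (down_b, up_b) in zip(events, events[1:]):
        features["DU2"].append(up_b - down_a)   # press of a key -> release of the next
        features["UD"].append(down_b - up_a)    # flight time
        features["DD"].append(down_b - down_a)  # press -> next press
        features["UU"].append(up_b - up_a)      # release -> next release
    return features


# Hypothetical timestamps (ms) for three keystrokes, e.g. "w", "e", "a".
example = [(0.0, 80.0), (150.0, 240.0), (310.0, 380.0)]
result = keystroke_features(example)
```

Of these, only DD (hardware keyboard) or UU (software keyboard) can be inferred from SSF network packets, since each packet marks a single input event per character.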

Based upon the choice of features, various machine learning techniques are often applied to classify the data [5] such as Bayesian classifiers.

Much of the study historically has been on hardware keyboards. Today, software keyboards are vastly popular on mobile devices, and this has opened up possibilities for further features to be derived from input and interaction, that is, beyond only when the keys are pressed and released [6]. More broadly, further information can be included for classification on both software and hardware keyboards, as there exists potential for combination with other biometric features or behaviours, e.g. patterns and habits in performing searches.

With regard to biometric recognition in the encrypted domain, a great variety of methods have been proposed already to protect biometric data during processing and communication to ensure privacy and confidentiality for the users [7]. These concepts typically involve some kind of cryptography and related key assignment to the entities involved in protocols, limiting access to biometric data to only those legitimate entities.

Works on biometric recognition based on entirely encrypted data, without any access to keys or plain unencrypted data, seem to be limited to date. However, some examples [8] demonstrate classification capability even with only limited features derived from network packets identified to contain keystrokes.

3 Experimental setup

Despite the authors' best efforts, no earlier comparable publications on keystroke analysis based on network packets could be identified, and consequently, no previous test goals, protocols, or data can be referred to. Therefore, in this work, we propose a novel scenario of experimental analysis in applying keystroke biometrics to SSF and also present our own test data.

Our tests must therefore achieve an assessment of general tendencies in different test environments: how we observe differences in the keystroke-capturing behaviour exhibited by these different environments, based on the network packets we can capture and their contents. After successfully capturing data within the established conditions, we evaluate the biometric usefulness of the data.

We first present in Section 3.1 “Test setup” an initial overview of all components in our test environments and further specify in detail. The following Section 3.2 “Test goals” describes our intentions for use of the overall setup. Under Section 3.3 “Assumptions and limitations”, we then present the constraints we place on our setup, in consideration of how this will affect result interpretation.

3.1 Test setup

The experiments T₁ and T₂ are performed on decrypted network traffic captured with a man-in-the-middle proxy (mitmproxy [9]). The experiment T₃ is performed on the foundation of the encrypted network traffic. The two different test setups are described in the following subsections.

3.1.1 Decrypted network capture involving man-in-the-middle (T₁ and T₂)

The overall setup for our investigation includes the known web-based client-server architecture enhanced with mitmproxy to capture network data.

This mitmproxy is the intermediary to the network connection of each of our test devices as shown in Fig. 2. By emulating the role of the search engine provider, mitmproxy receives the same data from the clients as the genuine provider would. Moreover, using mitmproxy enables the possibility to decrypt the network flow between the client and the proxy, which is useful for assuring that extracted timestamps from packets correlate to actual key events. We refer to this process of verifying expected packet contents by decryption as quality assurance (QA).

To collect the data, we use a setup consisting of a MacBook Pro 2017 running MacOS Mojave in our tests. This laptop uses a USB wireless dongle (Anadol AWL150 Micro with chipset Reling RT5370) routed to the virtual machine (Virtual Box 6.0.6 r130049) running on the laptop to host a WiFi network for our clients to connect to. The virtual machine itself makes use of Privacy International’s data interception environment (February 2019) which is a Debian-based distribution. To inspect the traffic, mitmproxy runs (in “Regular” mode) on the virtual machine. Table 1 details the information of the client devices we use to connect to our proxy.

Table 1 Client devices used for data collection: physical devices for both T₁ and T₂, whereas virtual setup for T₃

For cross-referencing, the behaviour of sending packets for each keystroke is tested in both browsers Chrome B₁ and Firefox B₂ on four different search engines: Google S₁, Qwant S₂, DuckDuckGo S₃ and Ecosia S₄. The test subjects input the data into search fields on each search engine website.

The network traffic from the client is proxied by mitmproxy to the search engine website. The communication between the client and mitmproxy is TLS-encrypted with a self-signed certificate and, therefore, can be decrypted for QA during the assessment of our research questions. As the timestamps of the client requests are non-encrypted metadata, decryption for the biometric feature extraction is not necessarily required. However, by doing so in this whitebox scenario, we are able to ensure that the extracted timestamps correlate correctly to the corresponding key-press events, which also opens up the possibility of detecting packet loss or incomplete network captures. This possibility is especially useful for answering research question Q₁. In order to perform decryption, the certificate has to be installed on both client devices. The network packets are then TLS-encrypted (with the search engine certificates) and forwarded to the search engine by mitmproxy. Filtering the network flow for outgoing traffic and limiting to specific IP addresses of the search engines would narrow it down to only relevant packets, e.g. when working only on the encrypted stream without the decryption done by mitmproxy (such filtering is applied in the T₃ experiment).

In addition to our data collection from the network packets, logging keys locally with comparable timestamps allows examination of the delay. On the Linux hardware keyboard K_HW, we face technical limitations with no available functional keylogger which could record timestamps in milliseconds or better. However, for the software keyboard K_SW, we are able to make additions to the free software keyboard used (Simple Keyboard 3.71) to allow local keylogging of inputs to the nearest millisecond, comparable with the times extracted from the respective network packets. These alterations had no effect on the keyboard interface and user functionality.

3.1.2 Encrypted network capture (T₃)

Without access to the session keys an external observer will be limited to the analysis of encrypted TLS traffic directed to and coming from the search engine. Such an observer could gain access to the network communication at any node within the routing path between the user and the search engine provider.

The overall setup for our investigation in T₃ consists of a virtual appliance for Virtual Box running Ubuntu 18.04 with Firefox 70.0.1 (B₂) as shown in Table 1. The network traffic to and from this virtual machine V is automatically captured using Wireshark. Our overall intention in using a virtual appliance is keeping the software component constant during the experiment. Moreover, the captured network traffic will be limited to the virtual machine. However, in contrast to the experiments T₁ and T₂, this virtual machine is deployed on different computers, equipped with various hardware keyboards K_HW. The experiment is limited to Google S₁ as the search engine.

In order to gather the keystroke data for T₃, we developed a test protocol consisting of four total sessions. In each session, two search strings are typed five times alternating between the two to minimise learning effects during the session. In addition to that, there is a required break of at least 2 h between sessions of the same subject.

3.2 Test goals

Based on our motivation and research questions, we establish the following specific test goals T (specific fixed search queries are described in Section 4):

T₁ – To find tendencies in data disclosed to search engines under various conditions. The effects of the following factors are evaluated:

T_1.1 – Four different search engines, Google S₁ and three with a focus on privacy: Qwant S₂, DuckDuckGo S₃ and Ecosia S₄.
T_1.2 – Two different browsers: Chrome B₁ and Firefox B₂.
T_1.3 – Hardware keyboard K_HW on Linux OS and software keyboard K_SW on Android.
T_1.4 – Phrase input behaviour while SSF are or are not correct, no predictions can be made or suggestions are presented, and when the same phrase is deleted and re-entered.

T₂ – To determine if a subject can be identified in a small population based upon data received by a search engine provider. Due to the findings from the T₁ investigation—presented in Section 5— T₂ uses the conditions of search engine S₁ and browser B₁ for both K_HW and K_SW.

T_2.1 – Identification performance differences between keyboards on K_HW and K_SW.
T_2.2 – Identification performance differences of three search query phrases differing in length and commonality.

T₃ – To determine if a subject can be identified in a small population based upon encrypted data directed to a search engine provider but received by an external observer. T₃ uses the conditions of the same search engine S₁ but a different browser B₂ in comparison with T₂ for K_HW in V.

T_3.1 – Evaluation of the performance in a verification mode for V with K_HW using two different search strings.
T_3.2 – Identification performance differences between different search string lengths.

3.3 Assumptions and limitations

Our assumptions mostly relate to our test setup, especially to the network infrastructure. A relevant overview for the T₁ and T₂ setup is shown in Fig. 3. Our setup limits us to inspection at the (TLS-encrypted) network layer. Furthermore, we consider an optimal setting and perform no error handling, with regard to the following aspects:

1. We consider only a simple network without influence from high or low levels of network traffic, switched networks, or variable travel distances. Therefore, we do not experience fluctuating travel times for packets between client devices and the proxy.
2. We make the assumption that every packet is delivered on the first attempt. When collecting data for T₂, we only include phrases with both the first and last characters submitted successfully, such that we have our defined start and end points, and the full phrase for analysis. In some cases, no new GET requests are captured by the proxy. This could either be due to packet loss, the behaviour of the search engine provider, or other reasons unknown. We refer to this occurrence as packet omission. If this occurs for the first or last character in a test phrase, all corresponding packets to this particular phrase instance are removed from the data collection. For the T₃ experiment, where we do not have the possibility to decrypt the captured keystrokes and, therefore, cannot verify completeness of the input phrases (QA), this pre-processing obviously is not possible. Thus, in T₃, we consider each (encrypted) keystroke to be as expected.
3. In testing with subjects, an assumption is made that they input their search terms correctly without any typing errors for each phrase. Due to increased complexity in considering mistyped phrases, we limit ourselves to analysis of perfectly input samples; all inputs with errors are discarded. The captured data is not changed in any other way.

In addition, for T₃, a limitation of the experiment could be the uncontrolled usage of various computer hardware and Internet connections. This is a result of the increased number of test subjects for T₃.

4 Evaluation procedure

The sets C_HW and C_SW contain all possible keyboard inputs on K_HW and K_SW, respectively. We define a set of all keyboard inputs as C_T⊆C_HW=C_SW. Moreover, a set of possible phrases P, generated from set of keyboard inputs, is defined as $P = \{p \mid p=c_{1}c_{2}\dots c_{n} \mid c_{m} \in C_{T} \wedge \ n,m \in \mathbb {N}\}$. Our set of phrases for T₂ tests is $P_{T_{2A}}=\{p_{1}, p_{2}, p_{3}\} \subset P$ for B_1A (Android) and $P_{T_{2L}}=\{p_{1}, p_{3}\} \subset P$ for B_1L (Linux). For T₃, the set used is $P_{T_{3}}=\{p_{1}, p_{2}\} \subset P$ for B_2V. The corresponding phrases are:

p₁= “weather”
p₂= “security and biometrics”
p₃= “department of computer science”

The targeted sample collection totals for these phrases are presented in Table 2.

Table 2 The targeted sample totals for collection in T₂ and T₃ experiments

Additionally for the initial T₁ exploratory tests, the phrase p₀ “weather oldtown” is used to cover search inputs in line with T_1.4. The phrase p₀ is always correctly predicted and suggested throughout the first word, but when entering the second word, never correctly predicted as the full phrase through SSF (on any search engine tested). This allows comparison of any keystroke sending behaviour both while predictions are and are not correct.

For each keyboard input, we measure the corresponding timestamp using our test setup (see Section 3.3). In our experiments across both K_HW and K_SW, it is important to make a distinction in the features captured given the different default behaviours of input to the search field:

K_HW: Keys are input immediately on each key-down instance.
K_SW: Due to the typical behaviour of K_SW, different characters can be selected by holding a key, and selection is only confirmed and input on release of the key, i.e. each key-up instance.

Assuming consistent delay, the time difference between K_HW keystrokes captured is DD and the K_SW distance is UU (as introduced in Section 1). Both of these have been shown to be viable features of comparable performance [10], therefore, we use these temporal distances in evaluation for both T₂ and T₃ (though the distinction between the types of events limits direct comparison across the two environments).

Due to the conceptual difference in the consideration of decrypted network packets for QA in experiments involving T₁ and T₂, and encrypted packets in T₃, we suggest two complementary procedures using temporal distances as features. In all circumstances, the timestamps are based on the capture times of the network packets (observer time). In T₁ and T₂, the timestamps originate from mitmproxy, whereas they are captured within the virtual appliance in T₃. Section 4.1 details evaluation procedures for T₁ and T₂, whereas Section 4.2 presents details on the approach for T₃.

4.1 Decrypted keystrokes for QA (T₁ and T₂)

The experiments T₁ and T₂ both consider temporal distances extracted from decrypted network packets. Pre-processing steps are similar for both T₁ and T₂, but the processes involving feature extraction and subsequent classification are exclusively in line with test goal T₂ in this section. An example for extracted features is shown in Fig. 4.

4.1.1 Pre-processing

According to T₂ test setup, data is captured and recorded from both B_1L and B_1A for all phrases P_T. A Python script extracts the timestamps of the inputs from the file (.cap) generated by mitmproxy for each keystroke. As shown in Fig. 3 the timestamp of the first packet sent from the client to the search engine provider, containing the request string (in mitmproxy called first request byte), is used.

Even though no decryption is needed as the used timestamps are metadata of when packets arrived at the proxy, we use the decrypted stream between client and proxy only as QA in our investigation to determine changes in characters or if packets were omitted.

The file structure contains packets in sequence, from which we examine the GET requests generated by a change made in the input search query. The irrelevant GET requests, e.g. for image data, are ignored. The sequential order we capture is not reliably the actual order in which the inputs were typed, but we sort our extracted keystrokes by the extracted timestamp of the first request byte of each packet. Within each such packet, the current query phrase string is sent as part of the request, e.g. p₃ can be seen as “q=department%20of%20computer%20science” after packet decryption with mitmproxy. In comparison with the previous string, it is viable to determine whether it is a newly input character, character deletion, or, in some cases, whether packet omission has occurred.
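The comparison of consecutive query strings can be sketched as follows. This is a minimal illustration and not the authors' script: it assumes the relevant requests are already available as (timestamp, URL) pairs, uses a hypothetical host `search.example`, and the labels `input`, `deletion` and `omission` are names chosen here for the three cases described above.

```python
from typing import List, Tuple
from urllib.parse import parse_qs, urlparse


def query_from_url(url: str) -> str:
    """Return the decoded value of the 'q' parameter, or '' if it is absent."""
    return parse_qs(urlparse(url).query).get("q", [""])[0]


def classify_change(previous: str, current: str) -> str:
    """Label how the query string changed relative to the previous request."""
    if current == previous:
        return "unchanged"
    if current.startswith(previous):
        # Exactly one new character is a normal keystroke; more than one
        # suggests that intermediate requests were not captured.
        return "input" if len(current) - len(previous) == 1 else "omission"
    if previous.startswith(current):
        return "deletion"
    return "other"


def label_requests(records: List[Tuple[float, str]]) -> List[Tuple[float, str, str]]:
    """Sort requests by timestamp and label each change in the query string."""
    labelled = []
    previous = ""
    for timestamp, url in sorted(records, key=lambda record: record[0]):
        current = query_from_url(url)
        labelled.append((timestamp, current, classify_change(previous, current)))
        previous = current
    return labelled


# Hypothetical capture, deliberately out of order as it may appear in the file.
base = "https://search.example/complete?q="
records = [(0.30, base + "wea"), (0.10, base + "w"), (0.20, base + "we"),
           (0.55, base + "weath"), (0.70, base + "weat")]
labels = [label for _, _, label in label_requests(records)]
```

In this example, the jump from “wea” to “weath” is flagged as an omission because the request for “weat” is missing at that point in the sorted sequence.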

In cases of intermediary packet omission, there are steps taken either to remove or salvage samples. As mentioned in Section 3.3, if the first or last character of a phrase is not captured we do not consider it to be a usable sample and therefore immediately discard it. However, if any other timestamps within the sample phrase are omitted as a result of measurement or network errors, we estimate the missing data by one-dimensional linear interpolation. This divides the time difference between the captured characters spanning the window containing omission(s) equally amongst the missed characters. For example, if p₁ (“weather”) is missing characters “t” and “h”, the time difference from “a” to (the second) “e” will be divided by three and produce equal estimated “at”, “th” and “he” distances. This example is visually presented in Fig. 5. This method of interpolation ensures we have a full set of features in the sample for analysis. Nonetheless, in the following Section 5.2, for T₂, we consider our results when using purely the complete samples as well as with those interpolated.
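The interpolation step can be sketched in a few lines of Python. The sketch assumes one timestamp per character of the phrase, with `None` marking an omitted packet; the function name `interpolate_missing` and the example timestamps are hypothetical.

```python
from typing import List, Optional


def interpolate_missing(timestamps: List[Optional[float]]) -> List[Optional[float]]:
    """Fill omitted timestamps by linear interpolation between captured neighbours."""
    if timestamps[0] is None or timestamps[-1] is None:
        # Such samples are discarded rather than salvaged.
        raise ValueError("first and last characters must be captured")
    filled = list(timestamps)
    last_known = 0
    for i in range(1, len(filled)):
        if filled[i] is None:
            continue
        gap = i - last_known
        if gap > 1:
            # Divide the spanned time equally amongst the missing characters.
            step = (filled[i] - filled[last_known]) / gap
            for k in range(1, gap):
                filled[last_known + k] = filled[last_known] + k * step
        last_known = i
    return filled


# "weather" with "t" and "h" omitted (hypothetical times in ms).
captured = [0.0, 100.0, 200.0, None, None, 500.0, 600.0]
filled = interpolate_missing(captured)
distances = [b - a for a, b in zip(filled, filled[1:])]
```

Here the 300 ms between “a” and the second “e” is split into three equal estimated distances of 100 ms for “at”, “th” and “he”.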

Due to variability in how many samples were gathered from each subject (explained in Section 5.1.2), and in the interests of avoiding distortion of results, the number of samples across subjects for each class is balanced. With this approach, we can present and compare results when classification uses either balanced or unbalanced classes. When considering only our complete samples in the experiment, excess samples are removed from the set in overrepresented classes, specifically the latest captured ones (in consideration of reducing the effects of learning). In the case of our interpolated samples, we first prioritise removing the samples with the most interpolations for the sake of sample quality, then similarly further remove any by latest captures if necessary until all classes are balanced. Table 3 shows the number of instances of each class, which is the same total for each subject, after balancing. From the table, it is evident that tests with at least one subject captured only two and one samples for p₁ and p₂, respectively. This motivates our interpolation method to make more samples, and therefore these classes, usable in subsequent classification.
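One possible reading of this balancing rule is sketched below. It assumes each sample records its subject, its capture order and its number of interpolated timestamps; the `Sample` type and `balance` function are hypothetical names, and ranking by (interpolations, capture order) is an interpretation of the description rather than the authors' exact procedure.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple


class Sample(NamedTuple):
    subject: str        # class label, e.g. "N1"
    capture_order: int  # lower means captured earlier
    interpolated: int   # number of interpolated timestamps (0 if complete)


def balance(samples: List[Sample]) -> List[Sample]:
    """Reduce every subject to the size of the smallest class."""
    by_subject: Dict[str, List[Sample]] = defaultdict(list)
    for sample in samples:
        by_subject[sample.subject].append(sample)
    target = min(len(group) for group in by_subject.values())
    kept: List[Sample] = []
    for group in by_subject.values():
        # Keep the samples with the fewest interpolations; among equals,
        # keep the earliest captures, so the latest ones are removed first.
        ranked = sorted(group, key=lambda s: (s.interpolated, s.capture_order))
        kept.extend(ranked[:target])
    return kept


# Hypothetical data: subject N1 has four samples, N2 only two.
samples = [Sample("N1", 0, 0), Sample("N1", 1, 2), Sample("N1", 2, 0),
           Sample("N1", 3, 1), Sample("N2", 0, 0), Sample("N2", 1, 1)]
balanced = balance(samples)
```

With complete samples only, every `interpolated` value is zero, so the rule reduces to removing the latest captures.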

Table 3 From T₂ tests, number of samples remaining after balancing, per subject (e.g. 41×5=205 total for B_1L - S₁ - p₁)

The result of these steps is a set of files (.csv) with extracted timestamps and corresponding query phrases, sorted by timestamp. The data is pre-processed in all the possible permutations of including interpolations and/or balancing classes.

4.1.2 Feature extraction

In order to evaluate our data, we define empirical values that we can use to classify the data. Overall, for each phrase used in our tests, we have an initial feature set comprising the temporal distances between each pair of neighbouring characters in the phrase. Additionally, the average is calculated and appended to each sample as an additional feature. Therefore, we have feature totals of 7, 23 and 30 for p₁, p₂ and p₃, respectively.

Our evaluation data M is defined as pairs of sequences of characters of the test phrases in P_T and the corresponding timestamps of each sequence, that is $M=\left \{\langle c_{1},t_{1}\rangle,\langle c_{1}c_{2},t_{2}\rangle,...,\langle c_{1}c_{2}...c_{n},t_{m}\rangle \mid n,m\in \mathbb {N}\right \}$.

We calculate the time difference between two consecutive keyboard inputs $c_{i}$ and $c_{i+1}$ simply as:

$$ d_{i,i+1}=t_{i+1} - t_{i} \mid i\in\mathbb{N} $$

(1)

When applied to all characters of a phrase $p \in P_{T}$, we obtain a set $D_{p}=\{d_{1,2},...,d_{n-1,n} \mid n=|p|\}$ of time differences for all consecutive key pairs.

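We calculate the average (mean) for each phrase as an additional feature. Note that from Eq. (2) onwards, D_p denotes this scalar mean rather than the set of differences defined above. The average over all time differences between two consecutive keyboard inputs for one phrase can be calculated by: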

$$ D_{p} = \frac{1}{n-1}\sum\limits_{i=1}^{n-1}d_{i,i+1} $$

(2)
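Equations (1) and (2) can be illustrated with a short sketch. The function name `keystroke_features` and the example timestamps are hypothetical and serve only to show how a phrase of n characters yields n−1 differences plus one mean.

```python
def keystroke_features(timestamps):
    """Return [d_{1,2}, ..., d_{n-1,n}, mean] for one typed phrase.

    `timestamps` holds one capture time per typed character, in order.
    """
    if len(timestamps) < 2:
        raise ValueError("need at least two keystrokes")
    # Eq. (1): difference between each pair of consecutive timestamps.
    diffs = [t_next - t for t, t_next in zip(timestamps, timestamps[1:])]
    # Eq. (2): mean over the n-1 differences.
    mean = sum(diffs) / len(diffs)
    return diffs + [mean]


# Made-up timestamps (ms) for the 7 characters of "weather".
example = [0, 120, 250, 330, 480, 600, 710]
features = keystroke_features(example)
print(len(features))  # 6 differences + 1 mean = 7 features
```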

4.1.3 Classification

For every phrase p entered by a test subject, we have a set $F_{p}=\{d_{1,2},...,d_{n-1,n},D_{p}\}$ of all features of that particular phrase. Given several samples of the same phrase, we can attempt to identify each subject with a classification algorithm. We choose Gaussian naive Bayesian (NB) and nearest neighbour by Euclidean distance (NN) as classifiers specifically, due to their effective performance with small data sets.
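To make the two classifier types concrete, the following dependency-free sketch implements a 1-nearest-neighbour rule and a Gaussian naive Bayes rule on small feature vectors. It is a stand-in for illustration only and not the library implementation used in the study; all function names, the `var_floor` smoothing value and the toy data are assumptions.

```python
import math
from collections import defaultdict


def nearest_neighbour(train, query):
    """Label of the training vector closest to `query` (Euclidean distance)."""
    return min(train, key=lambda item: math.dist(item[0], query))[1]


def gaussian_nb_fit(train, var_floor=1e-9):
    """Per-class prior plus per-feature mean and variance."""
    groups = defaultdict(list)
    for vector, label in train:
        groups[label].append(vector)
    model = {}
    for label, vectors in groups.items():
        columns = list(zip(*vectors))
        means = [sum(c) / len(c) for c in columns]
        # var_floor (an assumed value) avoids division by zero for constant features.
        variances = [
            sum((x - m) ** 2 for x in c) / len(c) + var_floor
            for c, m in zip(columns, means)
        ]
        model[label] = (len(vectors) / len(train), means, variances)
    return model


def gaussian_nb_predict(model, query):
    """Class with the highest log-posterior under independent Gaussians."""
    def log_posterior(label):
        prior, means, variances = model[label]
        score = math.log(prior)
        for x, m, v in zip(query, means, variances):
            score += -0.5 * math.log(2 * math.pi * v) - (x - m) ** 2 / (2 * v)
        return score
    return max(model, key=log_posterior)


# Toy training data: (feature vector, subject label).
train = [([100, 110], "A"), ([105, 115], "A"), ([200, 210], "B"), ([195, 220], "B")]
model = gaussian_nb_fit(train)
```

With these toy vectors, a query such as `[102, 112]` lies close to subject A under both rules.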

4.2 Encrypted keystrokes without QA (T₃)

The experiment T₃ utilises temporal distance features extracted from the encrypted data stream. As the data is encrypted, definitive QA of packet content for correct key information is not possible. The classification is based on the Euclidean distance between the feature vector and a subject’s template.

4.2.1 Pre-processing

The pre-processing of the encrypted data captures consists of the following steps:

Determination of the IP addresses of the search engine by analysing DNS responses from the name server.
Extraction of candidate type sessions based on the determined target IP addresses.
Selection of the pre-sessions based on the session length.

Firstly, the capture files are read using Scapy and searched for DNS responses for the specified search engine domain. In our case, the search engine is limited to S₁ (under the .com top-level domain). The DNS response contains one or multiple IP addresses which are contacted in order to perform the search. Each IP from the DNS response is stored in a set to compare with the destination IP of each network packet.

During the second step, sessions are extracted based on their metadata: a packet length between 208 and 300 bytes, and a monotonic increase of the packet size in comparison with the previous packet to the same destination IP. Two such consecutive packets correspond to two consecutive key-presses and thus yield key down-down times. Due to the compression and encryption of the requests, we can assume that the next network packet including one additional character will be either as large as its predecessor or one byte longer. If the packet size is lower or significantly larger, the session is terminated. If the inter-packet time exceeds 5000 ms, the session is terminated as well. Finally, sessions with fewer than five packets are discarded.
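The session extraction rules can be sketched on a list of `(timestamp_ms, size_bytes)` pairs. This is only an illustration under assumptions that the text leaves open: "significantly larger" is read as growth of more than one byte, a packet outside the size window ends the current session, and a packet that breaks the chain starts a new candidate session. The constant and function names are hypothetical.

```python
MIN_SIZE, MAX_SIZE = 208, 300   # packet length window in bytes
MAX_GAP_MS = 5000               # inter-packet time that ends a session
MIN_PACKETS = 5                 # shorter sessions are discarded
MAX_GROWTH = 1                  # assumption: more than 1 byte counts as "significantly larger"


def extract_sessions(packets):
    """Split (timestamp_ms, size_bytes) pairs into candidate typing sessions.

    `packets` must be sorted by time and already filtered to one destination IP.
    """
    sessions, current = [], []

    def close():
        # Keep the running session only if it is long enough, then reset it.
        if len(current) >= MIN_PACKETS:
            sessions.append(list(current))
        current.clear()

    for ts, size in packets:
        if not MIN_SIZE <= size <= MAX_SIZE:
            close()
            continue
        if current:
            prev_ts, prev_size = current[-1]
            growth = size - prev_size
            if ts - prev_ts > MAX_GAP_MS or not 0 <= growth <= MAX_GROWTH:
                close()
        current.append((ts, size))
    close()
    return sessions
```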

For the last step, the candidate sessions are manually selected based on the session length:

Between 5 and 9 packets: p₁ (“weather”)
Between 21 and 24 packets: p₂ (“security and biometrics”)

The intervals are chosen because packets might be omitted in the capture file. Furthermore, additional packets sent during the typing session might sporadically interfere with the total number of detected packets. This selection process is a realistic consideration because an external observer would also have to deal with the same issues.
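The study selects candidate sessions manually. Purely as an illustration, the two packet-count intervals could be applied automatically as follows; the dictionary and function names are hypothetical.

```python
PHRASE_BY_PACKET_COUNT = {
    "p1": range(5, 10),    # 5 to 9 packets: "weather"
    "p2": range(21, 25),   # 21 to 24 packets: "security and biometrics"
}


def guess_phrase(session):
    """Return the phrase label whose packet-count interval contains the session length, else None."""
    for label, counts in PHRASE_BY_PACKET_COUNT.items():
        if len(session) in counts:
            return label
    return None
```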

4.2.2 Feature extraction and selection

From the extracted sessions, statistical features are extracted based on the timestamps of two consecutive packets within the session. Overall, we differentiate between a set of length-independent features F₁ and a length-dependent feature in F₂ from an encrypted session E with $n \in \mathbb {N}$ packet timestamps e_n∈E:

$F_{1}\text {min}=\min \limits _{2\le i\le n}(e_{i}-e_{i-1})$
$F_{1}\text {max}=\max \limits _{2\le i\le n}(e_{i}-e_{i-1})$
$F_{1}\text {mean}=\bar {e}=\frac {1}{n-1}\sum \limits ^{n}_{i=2} (e_{i}-e_{i-1})$
$F_{1}\text {variance}=\sigma ^{2}=\frac {1}{n-1}\sum \limits ^{n}_{i=2} ((e_{i}-e_{i-1}) - \bar {e})^{2}$
$F_{1}\text {skewness}=\frac {1}{n-1}\sum \limits ^{n}_{i=2} \left (\frac {(e_{i}-e_{i-1}) - \bar {e}}{\sigma }\right)^{3}$
$F_{1}\text {kurtosis}=\frac {1}{n-1}\sum \limits ^{n}_{i=2} \left (\frac {(e_{i}-e_{i-1}) - \bar {e}}{\sigma }\right)^{4}$
$F_{1}\text {median}=\tilde {e}$, the median of the inter-packet times $e_{i}-e_{i-1}$
F₁regressslope: slope of the linear regression of the inter-packet times
$F_{2}\text {total}=e_{n}-e_{1}$
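The feature definitions above can be computed from a list of packet timestamps as in the following sketch. The function name, the dictionary keys and the handling of constant inter-packet times are assumptions; the regression is taken over the gap index 0, 1, 2, ..., which is one plausible reading of "linear regression of inter-packet times".

```python
import statistics


def session_features(timestamps):
    """Statistical features of the inter-packet times of one session.

    `timestamps` are packet capture times in ms, in order; at least
    three are needed so that two inter-packet times exist.
    """
    gaps = [b - a for a, b in zip(timestamps, timestamps[1:])]
    if len(gaps) < 2:
        raise ValueError("need at least three packet timestamps")

    mean = statistics.fmean(gaps)
    variance = statistics.pvariance(gaps, mu=mean)  # divides by the number of gaps (n - 1)
    sigma = variance ** 0.5
    if sigma > 0:
        z = [(g - mean) / sigma for g in gaps]
        skewness = statistics.fmean([v ** 3 for v in z])
        kurtosis = statistics.fmean([v ** 4 for v in z])
    else:
        skewness = kurtosis = 0.0  # assumption: constant gaps are reported as 0

    # Least-squares slope of gap length against gap index 0, 1, 2, ...
    x_mean = (len(gaps) - 1) / 2
    numerator = sum((i - x_mean) * (g - mean) for i, g in enumerate(gaps))
    denominator = sum((i - x_mean) ** 2 for i in range(len(gaps)))

    return {
        "min": min(gaps),
        "max": max(gaps),
        "mean": mean,
        "variance": variance,
        "skewness": skewness,
        "kurtosis": kurtosis,
        "median": statistics.median(gaps),
        "regress_slope": numerator / denominator,
        "total": timestamps[-1] - timestamps[0],
    }
```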

From these features, we calculate the templates for each test subject based on the first three of the four sessions. In particular, we create three templates—for p₁, for p₂, and for all samples from p₁ and p₂: p₁p₂. The templates are calculated by determining the mean of all training/enrolment sessions (15 out of 20 samples). In addition to that, we determine the intra-person-variance as a foundation for the feature selection.

The feature selection is performed empirically by comparing the equal error rates (EER) obtained with different subsets of features. In particular, a variance threshold of 0.5 is applied to each feature, which results in the selection of the features F₁min, F₁mean, F₁median and F₁regressslope; these yield the best performance on our data set.
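A minimal sketch of template creation and variance-based selection follows. The text does not state how the threshold of 0.5 is applied, so the rule used here (keep a feature only if its intra-person variance is at or below the threshold for every subject, on values assumed to be suitably scaled) is an assumption, as are the function names.

```python
import statistics


def build_template(enrolment_vectors):
    """Per-feature mean (the template) and population variance over one subject's enrolment vectors."""
    columns = list(zip(*enrolment_vectors))
    template = [statistics.fmean(c) for c in columns]
    variance = [statistics.pvariance(c) for c in columns]
    return template, variance


def select_features(variances_by_subject, threshold=0.5):
    """Indices of features whose intra-person variance is <= threshold for every subject (assumed rule)."""
    n_features = len(next(iter(variances_by_subject.values())))
    return [
        i for i in range(n_features)
        if all(v[i] <= threshold for v in variances_by_subject.values())
    ]
```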

4.2.3 Classification

The matching is performed by a standard template matching approach using the Euclidean distance between the trained template and a feature vector. In addition, we have evaluated the Manhattan distance as well as the Canberra distance; however, both yielded lower detection performances. We evaluate the biometric system in the verification mode and in the identification mode up to a rank level of three.
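The three distance measures and a threshold-based verification decision can be sketched as follows. The function names and the `threshold` parameter are hypothetical; in practice the threshold would be tuned on the data, for example at the equal error rate.

```python
import math


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))


def canberra(a, b):
    # Terms where both values are zero are skipped to avoid division by zero.
    return sum(abs(x - y) / (abs(x) + abs(y)) for x, y in zip(a, b) if x or y)


def verify(template, probe, threshold, distance=euclidean):
    """Accept the claimed identity when the probe is close enough to its template."""
    return distance(template, probe) <= threshold
```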

5 Experiments

Firstly, in Section 5.1 “Data collection” we detail what data we actually captured through use of our test setup. This is followed by Section 5.2 “Results” in which our analysis is discussed and presented for each of our test goals in turn.

5.1 Data collection

5.1.1 T₁ collection

To address test goals T_1.1 – T_1.4, the data was collected across the specified browsers B_1L,B_2L,B_1A and B_2A, on search engines S₁ – S₄ with the phrase p₀ as introduced in Section 4. Two subjects were used in completing the inputs across two days. The website of each respective search engine was visited, the input was made into the search bar provided, and then it was submitted with the enter key. Subsequent searches and amendments to the term were made on the results page where a search field remained accessible.

To test any keystroke sending behaviour when the same data is re-entered, first “weather” is typed, deleted by backspace, and then re-entered exactly. The second “oldtown” input is entered and then a similar test is performed after completion of the full phrase: the phrase was partially deleted by the backspace key to leave only “weather”, and then “oldtown” was re-entered. To test the behaviour while suggestions were no longer visibly generated, further non-specific words were input—chosen in direct contradiction to any further search suggestions—until they were no longer provided, then one word more was typed beyond that boundary.

5.1.2 T₂ collection

All data was collected on a single day for the K_SW and across two neighbouring days for the K_HW. The test subjects typed all instances of each phrase in one single session.

We used the three phrases defined in Section 4. The phrase p₁ was chosen specifically because it is one of the most commonly searched terms across all search engines and, therefore, serves as an indication of whether the typing behaviour can be used to track people. The other two terms p₂,p₃ were chosen as examples of longer, more uncommon terms as per T_2.2.

Each phrase was first typed five times to train the person on the unfamiliar keyboard and phrase. Then each person entered the phrase at least ten times correctly. Any phrases with mistakes were discarded at this stage as described in Section 3.3. Any method for clearing the text was permitted, such that the next sample could be input into a newly blank search field.

The subjects came from different language backgrounds, and while all use both kinds of keyboards daily, the keyboard layout usually used differed. The keyboard layout did not get swapped for any subject. The K_HW used was a German language layout keyboard; however, only phrase p₂ involved a key not identically placed to that of an English keyboard (Y/Z). The K_SW was used with a default English keyboard layout.

Every test for T₂ was conducted while using search engine S₁ on the Chrome B₁ browser on both devices. For the initial tests with the K_SW, at least ten samples were collected from each subject for each phrase: p₁, p₂ and p₃. In each case, the subject was asked to confirm that they had entered the full number of phrases correctly without typing errors; therefore, in some cases, more than ten samples were collected.

For the second round of tests involving K_HW, ten p₃ samples were collected in a similar manner. However, it was decided while testing to focus on at least 40 inputs for p₁ per person. This change was motivated by issues raised from the small sample sizes of initial results gathered from the K_SW, discussed in Section 5.2.

5.1.3 T₃ collection

The sessions for T₃ are recorded within a fixed software setup provided by the virtual machine V (using B_2V with S₁). The test subjects were instructed to type p₁ “weather” and p₂ “security and biometrics” five times within each session. The keyboard layout was not prescribed; all subjects were to type with the layout they are accustomed to. Since the actual content of the search string cannot be determined due to the TLS encryption, the impact of an incorrect search string is negligible. However, the test subjects are instructed to abort the current search if they recognise a mistyped character. In total, four sessions are recorded by each of the test participants, yielding 20 samples each for p₁ and p₂ per subject.

5.2 Results

5.2.1 T₁ presentation

In Table 4, the tendencies of sending keystrokes are presented for the scenarios outlined in T_1.1 – T_1.3, denoted by symbols corresponding to disclosure levels of full, irregular or none, which we define as follows:

Table 4 Results for T_1.1 – T_1.4

Full: Up to (but no more than) two consecutive keystrokes may not be sent in these conditions. We consider this a “full” tendency given the general behaviour to send every keystroke.
Irregular: At least one occurrence is recorded where more than two keys in a row are omitted from phrases. Therefore, incomplete samples are regularly captured.
None: No evidence is found of keystrokes being sent in packets.

The latter three headings of Table 4 address the T_1.4 process as detailed in Section 5.1.1; in summary, these are the situations during which:

Suggestions: Entry while SSF are predicted and presented to the user.
No suggestions: Entry as further text is entered such that SSF have ceased.
Re-entry: The phrase entered (which SSF had been generated for) is deleted by backspace and then retyped exactly, intended to trigger the same suggestion.

5.2.2 T₁ discussion

Our exploratory data collection across these scenarios demonstrates reliable keystroke collection to some extent across three of the four search engines. With such data disclosure occurring, this suggests potential for the data to be used in recognition applications.

We note that regardless of the conditions in our tests, whenever a packet was successfully sent on keystroke input, the size difference was always the same for any individual keystroke. While from our results we observe that packets were not always sent due to packet omission, this irregularity was a behaviour observed to be independent of how fast or slow a user types.

One search engine behaviour shown in Table 4—which is not transparent to the user—is how keys are sent even when SSF are no longer presenting suggestions. This is true for S₁,S₂ and S₃, while S₄ does not send keystrokes after suggestions can no longer be predicted.

Another observation from Table 4 is the re-sending of keystrokes when text is deleted and re-entered, thereby triggering the same suggestions as before. This user action noticeably reduced the sent keystrokes for S₂ and S₃, with none sent for S₄, which simply reused the previously requested suggestions. S₁’s behaviour was consistent in reliably sending regardless of circumstance.

To partially address the reason for differing behaviour between search engines on Linux and Android, it is to be noted that the “mobile” versions of websites differ from “desktop” counterparts. This difference is at the very least visible in presentation, but also may be responsible for different underlying behaviours.

Because B₁ and S₁ showed the highest reliability and consistency across both the Linux and Android operating systems, this combination was selected as the preferred environment for addressing T₂.

5.2.3 T₂ presentation

For classification of each set of samples, both the NB and NN classifiers are applied. Classification is performed using Python through use of the scikit-learn packages [11]. To split testing and training data, fivefold cross validation is used, and for each fold, the macro average of the EER is calculated. The EER presented in Table 5 is the mean of these averages across all five folds for each phrase on each device. Two concatenation combinations, p₁p₂p₃ for K_SW and p₁p₃ for K_HW, are also included.
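How the EER is derived from the classifier outputs is not detailed here, so the following is only one common approximation, assuming distance-like scores where lower means more similar. The function names and the example scores are hypothetical.

```python
def equal_error_rate(genuine, impostor):
    """Approximate EER from distance scores (lower = more similar).

    Sweeps every observed score as a threshold and returns the mean of
    FAR and FRR at the threshold where the two rates are closest.
    """
    best_gap, eer = None, None
    for t in sorted(set(genuine) | set(impostor)):
        frr = sum(d > t for d in genuine) / len(genuine)      # genuine attempts rejected
        far = sum(d <= t for d in impostor) / len(impostor)   # impostor attempts accepted
        gap = abs(far - frr)
        if best_gap is None or gap < best_gap:
            best_gap, eer = gap, (far + frr) / 2
    return eer


def macro_average(per_class_values):
    """Unweighted mean, e.g. over per-class EERs in a fold or over folds."""
    return sum(per_class_values) / len(per_class_values)
```

Under this sketch, a per-fold value would be `macro_average` over the per-class EERs, and the reported figure the mean of those values over the five folds.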

Table 5 Results for T_2.1 and T_2.2: fivefold cross validation equal error rates (EER) for each phrase under Gaussian naive Bayes (NB) and Euclidean nearest neighbour (NN)

Results for both unbalanced and balanced classes are shown. The results also show when samples with missing values are excluded or included with interpolations. For K_SW phrases p₁ and p₂, when missing values are excluded, the EER cannot be calculated as there are too few instances of some classes for fivefold cross validation. Therefore, interpolation is a prerequisite for evaluation by our classification process in these two cases.

5.2.4 T₂ discussion

Looking at the results of Table 5 overall, we see lower EER results for K_HW than for K_SW. For example, considering our longest single phrase p₃ with balanced classes, K_SW yields EER of 18.04% and 22.28% for NB and NN respectively, whereas K_HW yields 5.11% and 9.44%. Similarly, for our shortest phrase p₁, the balanced class results of 24.12% and 34.36% for K_SW drop markedly to 12.97% and 16.83% for K_HW.

One reason for lower EER with K_HW is likely the more reliable data capture of the environment. Despite ensuring correct input and using the browser and search engine combination deemed most reliable in T₁ for K_SW, some expected packets were omitted; therefore, we had incomplete samples. While still usable to an extent through interpolation, omission is information loss and naturally would be expected to affect performance. Furthermore, the familiarity with the K_HW environment would make inputs more consistent as all subjects have previously used a physical keyboard with a similar layout. The K_SW on the other hand is on a smaller device, which subjects may not be used to, hindering natural movements. Nonetheless, what likely had the most impact would be that subjects are instructed to type without K_SW predictions/suggestions and expected to make no mistakes. Given the prevalence of keyboards with auto-corrective systems, users would be accustomed to making frequent small mistakes, which contrasts with our assumption of a totally flawless input.

Concatenation of the three phrases showed clear improvement in EER for K_SW: with balanced classes, the lowest K_SW EER value of 11.37% is achieved by NB with concatenation. The improvement is greater for NN, with an EER value of 13.35% compared to the previous (balanced and interpolated) lowest of 25.36%.

The concatenation of p₁ and p₃ for K_HW also showed lower EER results when compared with p₁. However, EER values for p₃ alone are the lowest. Due to disparate sample totals for p₁ and p₃ (41 and 9 respectively in the case of balanced classes), this meant concatenation of the two is limited to producing nine samples. As only one concatenation configuration was trialled, this coincidentally may have included the least effectively distinct intra-class samples for p₁ to concatenate with p₃, increasing EER values by NB and NN, rather than decreasing EER values as desired.

To address the issue of missing values captured from the K_SW, we assess results both when only complete samples are considered, and when samples with interpolated values are also included. However, as only p₃ contained enough samples for each class for fivefold cross validation, we can only examine the differences for p₃. Interpolation here appears to increase the EER value, suggesting samples with interpolation are not significantly valuable and in some cases may produce samples without utility for classification. Introducing a maximum threshold for interpolations within a phrase may be a compromise to make more samples usable without risk of filling them with too much insufficiently meaningful data.

5.2.5 T₂ latency

To further assess the conditions of our T₂ results, we review our assumption on latency. If the packet goes through the Internet and to the search engine provider, in reality, due to variation in traffic or route distances, the travel time could be variable. Therefore, interpretation of our results assumes a stable travel time despite these potential conditions. However, beyond this assumption, due to our captured local keystrokes, some basic statistical analysis can be performed on the average delay in comparison with network results. Capturing p₂ on the K_SW is the scenario with the most omitted packets for one of the subjects—only one complete sample from N₁ without interpolation can be derived—so we use the p₂ collection as our most diverse example for latency comparison.

Table 6 shows the delay on our sessions for each subject for inputs of p₂. As client and server are running on unsynchronised clocks, we consider “delay” to be additional time beyond the minimum difference between timestamps as an offset for each subject, e.g. for N₁, the minimum difference was 23,683 ms, so we make the assumption that the clocks were 23,683 ms out of sync in p₂ collection for N₁, and look at average and maximum delays beyond that baseline offset. Table 7 summarises the absolute value differences in feature calculations; the features calculated based on local timestamps are compared with those calculated based solely on the network timestamps.
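The delay calculation can be written out as a small sketch. The function name and the example timestamps are made up; only the offset of 23,683 ms is taken from the text.

```python
def delays_beyond_offset(local_ms, network_ms):
    """Per-keystroke delay after removing the assumed constant clock offset.

    The smallest raw difference between network and local timestamps is
    treated as the clock offset; every delay is measured beyond it.
    """
    raw = [n - l for l, n in zip(local_ms, network_ms)]
    offset = min(raw)
    return offset, [r - offset for r in raw]


# Made-up timestamps (ms) for three keystrokes.
offset, delays = delays_beyond_offset([0, 100, 200], [23683, 23800, 23950])
average_delay = sum(delays) / len(delays)
maximum_delay = max(delays)
```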

Table 6 Latency effects on T₂: p₂ keystrokes delay beyond minimum offset (ms)

Table 7 Latency effects on T₂: difference in p₂ features calculated from local timestamps compared with features derived from respective network timestamps (ms)

For phrase p₂, packet omission occurred for N₁, N₄ and N₅. It occurred most frequently for N₁, which has the largest delay in Table 6, and least frequently for N₅, which has the smallest delay of those with omissions. For N₂ and N₃, we saw no instances of packet omission, and these display the shortest delays of under 100 ms even at their maximum.

While this is limited data for comparison, the average difference between local and network features is under 100 ms for all subjects, suggesting that while adverse network conditions affect the delay and likely increase packet omission, even in poorer network conditions, the delay was consistent to such a degree that the features which are captured remain very close to the true locally captured values.

5.2.6 T₃ presentation

The results for the evaluation of verification performance T_3.1 with the ten test subjects are depicted in Fig. 6.

Table 8 summarises the results of the identification within the scope of T_3.2 on the encrypted data stream up to a rank level of three. An attempt to identify a subject is considered successful when the correct identity is contained within a ranked list. The rank of the list describes the number of its entries; for rank 1, the template with the minimum distance is the only one contained in the list.
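The rank-based identification criterion can be sketched as follows, assuming one template per subject and Euclidean distance. The function names, subject labels and vectors are hypothetical.

```python
import math


def rank_k_hit(templates, probe, true_id, k):
    """True when `true_id` is among the k templates closest to `probe`."""
    ranked = sorted(templates, key=lambda sid: math.dist(templates[sid], probe))
    return true_id in ranked[:k]


def identification_rate(templates, probes, k):
    """Fraction of (true_id, vector) probes identified within rank k."""
    hits = sum(rank_k_hit(templates, vec, sid, k) for sid, vec in probes)
    return hits / len(probes)


# Toy templates and probes.
templates = {"N1": [100, 120], "N2": [110, 130], "N3": [200, 220]}
probes = [("N1", [108, 127]), ("N3", [190, 215])]
```

In this toy data the first probe is closest to the template of N2, so it only counts as identified from rank 2 upwards.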

Table 8 Identification performance in the encrypted data stream T_3.2 (value range [0,1]; higher values indicate better results)

5.2.7 T₃ discussion

In comparison with T₂, we can see a lower classification performance for the encrypted data stream. Based on the results shown in Fig. 6, we can infer that an increased length of the search string leads to increased verification performance and a lower EER. Hence, the best performance is achieved for p₂ with an EER value of 23.61%. We can also see that, at least in our experiment, for different search string lengths, the overall performance will be similar to the performance of evaluation of the shortest search string. However, due to our limitation to two different search strings, we cannot quantify this finding.

The analysis of the identification partially confirms the assumption that the experimental setup is influenced by the differences in the utilised hardware. At least two test subjects shared a common computer for recording the sessions—during the identification of the samples from those two subjects, usually both have yielded highest similarities with the stored templates. Nevertheless, mostly the correct decision was made by the identification algorithm.

6 Conclusion

Due to the nature of biometrics, data is generated at every step we take in an online environment. Combined with easily accessible processing power and algorithms, it is easier than ever to identify or track people by methods they may be totally unaware of. The behaviour of each search engine may differ, but we have demonstrated the potential in principle of recognising users of SSF, even in scenarios where data is incomplete. We only consider the data that is transmitted independently yet have demonstrated the capability and risks. We have demonstrated the overall approach both under consideration of decryption of single packets (using mitmproxy) and by considering only encrypted packets, the latter using the simple network tool Wireshark. In both cases, we filter outgoing client network streams to search engine providers and extract metadata to build the biometric features for identification and/or verification purposes.

Without clear consent, biometric data is given and could be collected for each user and processed for tracking purposes. As further data can be gathered, this would only increase the ability to identify people, either through particularly distinct phrases with high inter-class variation or through methods such as concatenation as shown.

Our first proof-of-concept experiments indicate that biometric recognition is feasible with an EER as low as 5.11% in the non-encrypted stream, e.g. by the operator of a search engine, and 23.61% for the encrypted stream, which is visible to any network node transferring the search requests. This is especially worrisome since information like monitor resolution, operating system, and other hardware details can already be captured by web service providers. This information can be exploited along with an IP to track the machine used. In light of keystrokes used as an identifier on top of these established methods, even in a scenario with several people in one household sharing use of the same device, the search provider may learn to distinguish users and, therefore, learn which search terms are searched by each such user. To this end, the new interesting field of biometrics in SSF-enabled web systems has been established.

To protect against this potential for tracking, as a basic protection measure, we would recommend disabling JavaScript. This demonstrably disables any suggestion behaviour within the four tested search engine websites, and would ensure the most vulnerable biometric data would not be transmitted. However, this is not a total solution against browser inputs which do not have JavaScript dependencies such as typing in the address bar or through search engine browser extensions. Alternatively, development of some specific browser add-on could offer protection, for example by withholding user input until pressing enter. Note that either of these approaches would also naturally disable the SSF outright.

We have only considered the basic scenario of a fixed-text keystroke system to examine the possibility of identification through network packets generated from typing behaviour. As we have demonstrated such capability, we recommend further work to inspect the potential beyond fixed-text to free-text analysis. This would explore the risks of whether previously unrecorded search terms could be used to identify people and therefore would cover even broader scenarios of identification from searching behaviour.

There is great scope for future work in this field of biometrics. More extensive investigation using more subjects in similar experiments could uncover potential for identification of larger user groups. Further experimentation with pre-processing, such as utilisation of discarded incomplete samples as substrings of phrases, or adjustment of the approach to missing values and/or interpolation, could improve results. Different classification approaches and advancements in machine learning could be applied beyond the NB and NN classifiers presented, and additional examination and experimentation with control over network conditions is warranted. For the template matching, additional features could be evaluated. Beyond that, further header data could be evaluated in order to minimise artefacts in the extracted sessions. Moreover, the application of the methods presented in this article should be investigated further for other services that use user input, e.g. for cloud-based office solutions. Another field of further investigations might be the de-anonymisation of users in the TOR network, specifically when observing the entry node traffic.

Availability of data and materials

The keystroke data sets collected and analysed during the current study are not publicly available due to the sensitivity of biometric data. Volunteers had privacy concerns and did not consent to wider data sharing (as is their right in GDPR).

Abbreviations

K_HW: Hardware keyboard (Linux)
K_SW: Software keyboard (Android)
EER: Equal error rate
QA: Quality assurance of packet contents
SSF: Search suggestion functions
B: Browser
K: Key input method
S: Search engine
V: Virtual machine

References

1. European Parliament and Council, Regulation (EU) 2016/679 of the European Parliament and of the Council of 27 April 2016 (General Data Protection Regulation), Official Journal of the European Union, L 119, 1-88 (2016).
2. K. Haring, Ham Radio’s Technical Culture, 1st edn (MIT Press, Massachusetts, 2006).
3. R. S. Gaines, W. Lisowski, S. J. Press, N. Shapiro, Authentication by keystroke timing: some preliminary results, Rand Rep. R-2526-NSF (RAND Corporation, California, 1980).
4. F. Monrose, A. Rubin, in Proceedings of the 4th ACM Conference on Computer and Communications Security, 4CCS97, Zurich, Switzerland, April 1-4, 1997. Authentication via keystroke dynamics, (1997), pp. 48–56. https://doi.org/10.1145/266420.266434.
5. P. H. Pisani, A. C. Lorena, A systematic review on keystroke dynamics. J. Braz. Comput. Soc. 19, 573 (2013). https://doi.org/10.1007/s13173-013-0117-7.
6. D. Buschek, A. D. Luca, F. Alt, in Proceedings of the 33rd Annual ACM Conference on Human Factors in Computing Systems, CHI’15, Seoul, South Korea, April 18-23, 2015. Improving accuracy, applicability and usability of keystroke biometrics on mobile touchscreen devices, (2015). https://doi.org/10.1145/2702123.2702252.
7. C. Rathgeb, A. Uhl, A survey on biometric cryptosystems and cancelable biometrics. EURASIP J. Informa. Secur. 2011(1), 3 (2011). https://doi.org/10.1186/1687-417X-2011-3.
8. R. Koch, G. D. Rodosek, in Proceedings of the 6th International Conference on Network and Service Management, CNSM 2010, Niagara Falls, Canada, October 25-29, 2010. User identification in encrypted network communications, (2010), pp. 246–249. https://doi.org/10.1109/CNSM.2010.5691292.
9. A. Cortesi, M. Hils, T. Kriechbaumer, contributors, mitmproxy: a free and open source interactive HTTPS proxy [Version 4.0] (2019). https://mitmproxy.org/. Accessed 6 December 2019.
10. D. Hosseinzadeh, S. Krishnan, Gaussian mixture modeling of keystroke patterns for biometric applications. IEEE Trans. Syst. Man Cybern. C (Appl. Rev.) 38(6), 816–826 (2008). https://doi.org/10.1109/TSMCC.2008.2001696.
11. F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, E. Duchesnay, Scikit-learn: machine learning in Python. J. Mach. Learn. Res. 12, 2825–2830 (2011).

Acknowledgements

The work presented has been supported in part by the European Commission through the MSCA-ITN-ETN - European Training Networks under Project ID: 675087 (“AMBER - enhAnced Mobile BiomEtRics”). The information in this document is provided as is, and no guarantee or warranty that the information is fit for any particular purpose is given or implied. The user thereof uses the information at one’s own sole risk and liability. We thank all the anonymous volunteers for contributing test data to experiments T₂ and T₃.

Funding

The work presented has been supported in part by the European Commission through the MSCA-ITN-ETN - European Training Networks under Project ID: 675087 (“AMBER - enhAnced Mobile BiomEtRics”). This project has received funding from the European Union’s Horizon 2020 research and innovation programme.

Author information

Authors and Affiliations

Multimedia and Security Lab (AMSL), Otto-von-Guericke-University, Magdeburg, Germany
Nicholas Whiskerd, Nicklas Körtge, Kris Jürgens, Kevin Lamshöft, Salatiel Ezennaya-Gomez, Claus Vielhauer, Jana Dittmann & Mario Hildebrandt
Department of Informatics and Media, Brandenburg University of Applied Sciences, Brandenburg an der Havel, Germany
Claus Vielhauer

Contributions

JD originally proposed the idea and MH performed the initial speculative tests. All authors contributed in the discussion of the topic and the overall approach. JD, KL and SEG each gave input during the project and provided their experience to aid NW, NK and KJ in their implementation, testing, and reporting of T₁ and T₂. MH implemented the algorithms and performed the tests for T₃ with all authors performing analysis of the test results. JD and CV both gave significant guidance on the motivation section and the article structure, while all authors contributed in writing and approved of the final manuscript.

Corresponding author

Correspondence to Nicholas Whiskerd.

Ethics declarations

Competing interests

Two of the guest editors for this submission, Richard Guest and Christian Kraetzer, are members of the European Training Network AMBER, as are authors NW, SEG, JD and CV. Furthermore, Christian Kraetzer is also a member of the AMSL workgroup at OVGU, which is represented by all authors.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.

About this article

Cite this article

Whiskerd, N., Körtge, N., Jürgens, K. et al. Keystroke biometrics in the encrypted domain: a first study on search suggestion functions of web search engines. EURASIP J. on Info. Security 2020, 2 (2020). https://doi.org/10.1186/s13635-020-0100-8

Received: 23 August 2019
Accepted: 14 January 2020
Published: 21 February 2020
DOI: https://doi.org/10.1186/s13635-020-0100-8
