Our proposed system adopts a client-server architecture. It is composed of four components as shown in Fig. 2.
The first component represents the client side, which consists of ad banners that are shown in mobile applications. Since they are accessed via mobile phones, we represented them as mobile devices in Fig. 2. We will refer to this component as “ad banner component.”
The server side contains three components: the first server side component consists of ad network APIs that handle the ad selection and billing for both publishers and advertisers. We will refer to this component as the “ad network component.” The second server side component consists of advertisers’ websites. We will refer to this component as “advertiser component.” The third server side component consists of our online server that works as a click fraud detection engine. We will refer to this component as “CFC component—server side.”
Ad banner component
This client side component is added to publishers’ applications by publishers who wish to generate revenue from ad clicks. These can be any applications that end users download and use (usually via app stores such as the Play Store).
In order to add the ad banner component to a mobile application, the publisher signs up on the website of an ad network of his choice and follows the ad network’s instructions to integrate the component into his application. The integration may vary from one ad network to another, but in general it consists of downloading a jar file and adding it to the application together with the publisher’s credentials for the ad network service. This process is common to all ad networks; the only difference in our proposed model is that the downloaded jar file contains not only the ad network’s ad managing logic but also our own CFC click fraud detection logic (explained in detail in Section 3.4). Furthermore, the addition of the CFC logic is transparent to the publisher, since it requires no additional steps compared to ad banner integration with traditional ad networks that do not offer the CFC component as part of their ad banner logic. Nor does it impose any extra effort on the end user when viewing or clicking the ad banner in the mobile application.
Once added to the application, the ad banner uses APIs provided by the ad network to fetch ads and display them in the banner. The ads are then visible to end users who are using the publisher’s application. Once the end user clicks on a banner ad, he is redirected to the advertiser’s website. At the same time, the ad banner component sends click information to the CFC server side component. This click information is explained in detail in Section 3.4.
Ad network component
Similarly to traditional ad networks currently in the market, this component acts as a relay between different publishers and advertisers, by selecting which ads to send to the publisher for display, charging advertisers for each ad click and paying the publisher a percentage of the charged money.
To give advertisers more assurance that the clicks to their websites are legitimate, it is in the interest of ad networks to inform advertisers that they use the CFC detection mechanism. To obtain this protection, an ad network requests the service from the CFC party, either by email or through a website built by the CFC party for this purpose. The CFC party then sends an encrypted jar file to the ad network, which adds it to the existing ad banner jar file that it offers to publishers. Both files are sent to the publisher. In other words, the ad network merges the logic that it normally uses to display ads in ad banners with the CFC logic that detects malicious activities. Although the CFC logic is not built or managed by the ad network, the ad network hosts it in its own ad banner. The implementation of the jar file that contains the ad banner logic is explained in further detail in Section 4.
Advertiser component
After a user clicks on an ad featured by the ad banner component, she will be redirected to the advertiser’s website, which represents the advertiser component. No modification will be made to this component in our proposed model in comparison to traditional advertiser’s websites.
CFC component (server side)
Similar to companies that offer anti-virus and malware detection services, this component is managed by a separate party that does not benefit from clicks but instead serves the advertiser as a safety control mechanism. After collecting a large enough number of clicks from different ad banner components using the CFC service, a crowdsource-based calculation is performed. We will refer to this click fraud crowdsourcing algorithm as “CFCA.” This calculation follows a crowdsource approach: it builds its analysis on a large number of clicks in order to reduce false positives for a given user. For example, if the user in question exhibits suspicious behavior on several applications or on several occasions, it is highly likely that he is in fact malicious. The more records we have of his suspicious activities, the more certain our judgement can be, and this is where we benefit from our crowdsource approach.
In addition to following a crowdsource approach, the motivation behind our approach is based on two main needs: on the one hand, an ad network is able to monitor and assess many clicks from different apps; however, it is not able to monitor the user’s activity for click fraud detection once she is redirected to the advertising site; on the other hand, while an advertiser is able to monitor the user activity on her site (for example the duration spent on the site), the judgment is done per click and is therefore prone to a high false positive rate [3]. Accordingly, if a user clicks on an ad and she shows no interest in the ad website, she will be flagged as malicious for quickly exiting the website. Our proposed solution addresses these two shortcomings by combining both the click information provided by the ad networks and the user activity information provided by the advertiser. In addition to the CFCA, the CFC party manages several APIs that communicate with the CFC ad banner component and a corresponding online database.
CPA enhanced model
Our proposed mobile ad charging model benefits from our CFC system to charge advertisers based on the duration spent on the advertiser’s website. As opposed to current CPA systems that rely on the advertisers to report the actions to be charged for, our CFC model presents a secure framework in which the duration is measured by a trusted party. In addition, our system reduces the workload on the advertisers, since they are not required to perform any integration on their websites or report any actions to the ad network.
For billing purposes, the CFC must send the captured actions (ad request with duration information) to the corresponding ad network. The ad network can define the pricing scheme for different durations, for example, charge the advertisers if the duration spent on the advertiser’s website by the user is more than 1 min or even charge as a function of the time spent.
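As an illustration only, the two pricing options mentioned above could be sketched as follows. The function names, the 60-second minimum, and the prices are hypothetical placeholders chosen for the example; the CFC model leaves the actual pricing scheme to each ad network.

```python
def flat_charge(duration_s: float, min_duration_s: float = 60.0, price: float = 0.10) -> float:
    """Charge a fixed (hypothetical) price only if the visit exceeded the minimum duration."""
    return price if duration_s > min_duration_s else 0.0


def time_based_charge(duration_s: float, rate_per_s: float = 0.001, cap: float = 0.50) -> float:
    """Charge in proportion to the time spent on the website, up to a (hypothetical) cap."""
    return min(max(duration_s, 0.0) * rate_per_s, cap)
```

Under these assumed values, a 30-second visit is not charged by the flat scheme, whereas the time-based scheme charges every visit in proportion to its length until the cap is reached.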
Although this new mobile ad charging model is presented as an enhancement to the CPA model, it can also support the CPC and CPM models: The CFC system registers whenever an ad is clicked, which can be used as a CPC model. In addition, it can keep track of whenever a new ad is shown in the application, which can be used as a CPM model. In fact, the CFC system could be used as a combination of the three models (CPA, CPC, and CPM).
Ad banner component CFC steps
Although the application can be used by the publisher, the CFC steps are applied to any user that uses the application and interacts with the ad banner component (as explained in this section); we will refer to this user with the mobile phone symbol in Fig. 3. When the application starts, our library requests an ad to show from the ad network by sending the publisher’s ID (Fig. 3—step 1). Unlike traditional ad fetching systems where each ad network manages its own publishers’ identification system, this ID is generated by the CFC party and given to the publisher by the ad network. This is to ensure that each publisher has a unique identifier among all the ad networks registered with the CFC services.
As a response to the ad request, the ad network returns information about a selected ad. The ad banner component then displays the ad in a simple banner.
When the user clicks on an ad, the CFC ad banner component performs several sequential requests (Fig. 3):
In steps 3–4, for billing purposes, the library informs the ad network that an ad is clicked, by sending the publisher’s ID and the ad ID (Fig. 3—step 3). After confirming this publisher-ad ID combination, the ad network returns a confirmation billing response to the publisher (Fig. 3—step 4).
Steps 1, 2, 3, and 4 are performed similarly to existing ad fetching systems. However, the following steps are added in our system:
In steps 5, 6, and 7 (ad clicked challenge), we intuitively consider an ad click as potentially fraudulent if, after clicking on the ad and being redirected to the advertiser’s landing webpage, the user spends less than a certain time before exiting the advertised website. Therefore, the CFC ad banner component sends a request to the CFC server to indicate that an ad view session has begun on the client side. In step 5, this request takes as input the publisher’s ID, the user’s IP address, a timestamp (of when the ad was clicked), and a state integer of value 1. The non-local IP address is fetched on the client side using an online service called “ipify” [32]. The CFC server saves this information in its online database with a state field of value 1, keeping the timestamp for future reference.
In step 6, and to verify that the extracted IP is not spoofed, the CFC server challenges the client side by sending a random token and a created session ID. To prove its IP address legitimacy (step 7), the client sends back the challenge token with a state equal to 2, and the session ID. The state field identified by the returned session ID is updated to 2 in the online CFC database (after verifying that the previous state value of this session is 1). The state field is used to keep track of the actions performed by the client side; for example, a state of value 2 means that the user has clicked on the ad but has not exited the advertised website yet.
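The server side of the ad clicked challenge (steps 5 to 7) can be sketched roughly as below. This is a minimal sketch under stated assumptions: the in-memory `sessions` dictionary stands in for the online CFC database, and the function names, field names, and token lengths are hypothetical. How the token is delivered to the claimed IP address is not modelled here.

```python
import secrets

# Hypothetical stand-in for the online CFC database: session ID -> record.
sessions = {}


def start_session(publisher_id, ip, timestamp):
    """Steps 5-6: store the click with state 1, then issue a session ID and challenge token."""
    session_id = secrets.token_hex(8)
    token = secrets.token_hex(16)
    sessions[session_id] = {
        "publisher_id": publisher_id,
        "ip": ip,
        "clicked_at": timestamp,
        "state": 1,
        "token": token,
    }
    return session_id, token


def confirm_click(session_id, token):
    """Step 7: move the session from state 1 to state 2 if the returned token matches."""
    record = sessions.get(session_id)
    if record is None or record["state"] != 1:
        return False  # unknown session, or the previous state is not 1
    if not secrets.compare_digest(record["token"], token):
        return False  # wrong challenge token
    record["state"] = 2
    return True
```

Checking the previous state before each update means that a replayed or out-of-order request leaves the stored record unchanged.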
In step 8 (ad view), after receiving a confirmation of the ad click challenge from the server, the user is redirected to the advertised website.
In steps 9, 10, and 11 (ad closed challenge), once the webview is closed or the application is no longer visible (detected when it leaves the foreground after the exit button is clicked), the client library informs the CFC server that the ad view session has ended by sending the session’s ID, the publisher’s ID, the user’s IP address, the new timestamp (of when the ad was closed), and a state of value 3 (step 9). The CFC server then updates the state value of the record identified by the session’s ID to 3 (after verifying that the previous state value of this session is 2).
Similarly to the ad clicked challenge explained in steps 6 and 7, to check if the client’s IP is spoofed, the server generates a new random challenge token and sends it back to the client in step 10. To prove that its IP is not spoofed, the client sends back the new challenge token with the session’s ID, publisher’s ID, IP address, and a state of value 4.
After verifying that the credentials sent by the client are correct and that the previous session state value is 3, the CFC server decides whether to consider this ad session as potentially malicious based on one or more criteria set by the CFC admin, such as the difference between the previously saved timestamp and the newly received timestamp, which represents the duration spent on the advertiser’s website. Ad requests that are flagged as potentially malicious are updated in the online database with a state of value 5, and ad requests that are flagged as non-malicious are updated with a state of value 4. Ad requests saved in the online database with either state 4 or state 5 are passed to the CFCA for analysis.
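A minimal sketch of this final decision, assuming the duration is the only criterion in use, is given below. The function name and the 10-second minimum are hypothetical; in the proposed system the criteria and their values are set by the CFC admin.

```python
MIN_DURATION_S = 10.0  # hypothetical minimum duration chosen by the CFC admin


def final_state(previous_state, clicked_at, closed_at, min_duration_s=MIN_DURATION_S):
    """Return 4 (non-malicious) or 5 (potentially malicious) for a session in state 3.

    Returns None when the previous state is not 3, i.e. the update is rejected.
    """
    if previous_state != 3:
        return None
    duration = closed_at - clicked_at  # time spent on the advertiser's website
    return 4 if duration >= min_duration_s else 5
```

For example, with the assumed minimum of 10 s, a session clicked at t = 100 s and closed at t = 103 s would be stored with state 5.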
CFCA component (server side)
Besides managing the APIs and the online database, the server performs the CFCA in order to identify malicious publishers. The intuition behind our algorithm is based on the following idea: It is likely that a legitimate app (associated with a publisher) will have many non-malicious users who clicked on an ad and exited the ad webpage simply because they were no longer interested in the landing page. However, it is unlikely that a non-malicious app has a high number of these suspicious requests. To reduce the false positive rate, we compare the percentage of malicious clicks per publisher to a threshold determined by the CFCA admin.
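The per-publisher comparison can be illustrated with the following sketch, which compares each publisher’s share of flagged clicks against an admin-defined value. It is not the full CFCA: the function name, the input layout, and the default values of `threshold` and `min_clicks` are assumptions made for the example.

```python
from collections import Counter


def flag_publishers(records, threshold=0.5, min_clicks=1):
    """Return the publishers whose share of state-5 clicks exceeds the threshold.

    records: iterable of (publisher_id, state) pairs; only states 4 and 5 are counted.
    min_clicks: hypothetical minimum number of clicks required before judging a publisher.
    """
    total = Counter()
    suspicious = Counter()
    for publisher_id, state in records:
        if state not in (4, 5):
            continue  # unfinished sessions are not part of this calculation
        total[publisher_id] += 1
        if state == 5:
            suspicious[publisher_id] += 1
    return {
        pub for pub, n in total.items()
        if n >= min_clicks and suspicious[pub] / n > threshold
    }
```

A publisher with one suspicious click out of three stays below a 50% threshold, whereas a publisher with two out of three is flagged.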
This system benefits from both a global view, where it gathers multiple ad requests data corresponding to different ad network-publisher-advertiser combinations, and a local view, where it is able to track the user’s engagement in each advertising website.
Attacker’s model
To evaluate the robustness of our click fraud detection system, we tested different attacker models, in each of which the attacker’s goal is to generate high revenues from ad networks:
- Type A: A malicious publisher (without IP spoofing) creates an automated repetitive-click tool that does not spend a long time on the advertiser’s website, since it must exit each ad in order to open another one immediately. This attack is easily detected by the proposed system, since the duration spent on the advertiser’s website is calculated and used to determine whether the publisher is malicious.
- Type B: A malicious publisher (without IP spoofing) places ads next to buttons in order to trick users into clicking them. This fraudulent act is known as placement ads in the literature. Since such a user is not necessarily interested in the ad, she will most likely exit the ad webpage directly, and this short session duration will be flagged by the proposed system as potentially malicious. However, there is a slight chance that, although tricked into clicking the ad, the user spends more than the minimum required time on the advertised website. We consider these redirected ad requests as non-malicious, since such users could become customers from the advertiser’s point of view.
- Type C: A malicious publisher (without IP spoofing), after completing the ad clicked challenge, closes the webview in less than the minimum number of seconds in order to open a new ad session. Being on the client side, however, she is able to drop the request generated by the CFC ad banner component when the ad is closed (when the duration spent is less than the minimum number of seconds) and fabricate this request later, after the minimum number of seconds has passed. The goal of this attack is to click on ads repeatedly and generate higher revenues without spending the required duration on the advertiser’s website and without being detected. Although the attacker can drop the legitimate ad closing request and fabricate it, thereby falsifying the duration spent on the advertiser’s website, the IP is not spoofed, so the CFC server can identify the user and limit this attack to just one undetected ad request. If the malicious publisher fabricates many falsified ad requests, the CFC server can detect abnormal entries from the same IP. For example, it is not feasible for the same user, identified by IP, to spend more than the minimum number of seconds on three different websites within a short time interval.
- Type D: A malicious publisher uses a spoofed IP address to be considered as a new user with each ad request. Whether this publisher spoofs the IP before or after completing the first ad clicked challenge (steps 5, 6, and 7), she cannot complete both challenges, so her state in the online database will not be updated to the final state of the ad requests considered as non-malicious. As explained in Section 3.7, the CFCA can simply time out the ad requests that have a state different from the final state and a reasonably old timestamp (to take into consideration honest sessions that have not yet completed both challenges).
- Type E: A malicious publisher hires multiple human clickers to imitate normal user behavior by clicking on the ad and spending enough time to avoid being detected by our system. Each human clicker is forced to spend a significant number of seconds on each advertising website; in addition, the proposed system is able to identify the clicker by her IP address, which is sent with each ad click. When a high number of click requests with the same IP address is detected, the CFC server flags the publishers using this IP address as potential threats. This limits the number of fraudulent clicks each human clicker can make before being detected as malicious by our system.
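As a rough sketch only, the server-side checks suggested by attacker types C, D, and E might look like the following. All function names, record layouts, and default limits are hypothetical, and treating both state 4 and state 5 as final states is an assumption. Each check is a signal rather than proof: users sharing one public IP address could, for instance, also produce overlapping views.

```python
from collections import Counter, defaultdict


def overlapping_ips(views):
    """Type C: return IPs whose reported ad views overlap in time.

    views: iterable of (ip, clicked_at, closed_at) tuples.
    """
    by_ip = defaultdict(list)
    for ip, clicked_at, closed_at in views:
        by_ip[ip].append((clicked_at, closed_at))
    flagged = set()
    for ip, intervals in by_ip.items():
        intervals.sort()  # order each IP's views by click time
        latest_close = intervals[0][1]
        for clicked_at, closed_at in intervals[1:]:
            if clicked_at < latest_close:  # a new view began before the last one ended
                flagged.add(ip)
                break
            latest_close = max(latest_close, closed_at)
    return flagged


def stale_sessions(sessions, now, max_age_s=3600.0, final_states=(4, 5)):
    """Type D: return IDs of old sessions that never reached a final state.

    sessions: dict mapping session ID to {"state": int, "timestamp": float}.
    """
    return [
        sid for sid, rec in sessions.items()
        if rec["state"] not in final_states and now - rec["timestamp"] > max_age_s
    ]


def publishers_behind_busy_ips(clicks, max_clicks_per_ip=20):
    """Type E: return publishers that received clicks from IPs exceeding the click limit.

    clicks: iterable of (publisher_id, ip) tuples.
    """
    clicks = list(clicks)
    per_ip = Counter(ip for _, ip in clicks)
    return {pub for pub, ip in clicks if per_ip[ip] > max_clicks_per_ip}
```

In this sketch, a fabricated ad closing request of type C shows up as two views from the same IP whose time intervals overlap, because the next ad was opened before the falsified closing time of the previous one.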