Stopping Distributed Phishing Attacks
Alex Tsow
Markus Jakobsson
Filippo Menczer
School of Informatics
Indiana University
Bloomington, IN 47406
http://www.indiana.edu/˜phishing
Abstract
Server takedown, the primary remedy for “email to Web host” phishing at-
tacks, may not be possible in the server’s jurisdiction, requires expeditious action
to be most effective, and cannot scale to the threats posed by distributed phishing
attacks (DPAs). In addition to these technical barriers, the economic incentives
do not align with server takedown. Under this policy, server ISPs must inconve-
nience their customers (who may only be guilty of running insufficiently secured
machines) and reap no benefit since ISPs are rarely victims of identity fraud. This
paper briefly outlines an in-progress system for identifying DPAs, but focuses on
the necessary policy implications of such a system. The paper’s thesis is that client
side ISPs should block suspected fraudulent Web hosts. Not only does this avoid
jurisdictional problems and time lost to negotiations with server side ISPs, but it
provides a visible and competitive advantage to client side ISPs.
1 Introduction
Phishing is a serious threat which Internet service providers (ISPs) are uniquely situ-
ated to stop. Yet ISPs have little incentive to stop phishing since they do not suffer high
dollar costs resulting from identity fraud. This paper outlines a distributed variant of phishing attacks [19] that thwarts existing countermeasures, and an ISP-based solution to this problem. We examine the effect of phishing on its stakeholders, including
consumers, spoofed entities, target institutions, ISPs, and law enforcement. Finally, we
align our technological solution with the goals of each stakeholder to make a credible
case for adoption.
The most common phishing tactic today uses an “email to Web host” structure: the
phisher sends “bait” emails that direct their recipients to fraudulent Web hosts. In a
successful attack the victim voluntarily follows the email’s fraudulent link and sub-
mits personally identifying information (PII) to the Web host. Most often, the phish-
ing messages and Web hosts masquerade as financial institutions, online marketplaces,
government agencies or some other trustworthy entity that could plausibly ask for PIIs.
This attack structure has some important vulnerabilities, mostly stemming from the Web host collection mechanism. Since Web hosts must be available to the phishing victims through their standard browsers, the phisher cannot hide the Web host address. Once phishing messages are identified, the attack can be halted by eliminating the
server referenced in the email message. Subsequent “bites” to the bait message simply
direct the victim to an invalid link. No further credential collection can proceed.
While very effective in halting a phishing attack, the policy of server takedown
also bears several weaknesses. Server ISPs may refuse to cooperate with takedown requests, and they may be in jurisdictions where takedown is unenforceable. Phishing
attacks have short lifetimes. Some experimental studies have shown that the bulk of
victim credentials are collected within 24 hours of mailing the bait messages [17, 18].
Assuming early detection, many people are still at risk in the time spent negotiating
server takedown. Phishers are already mitigating the consequences of takedown by
deploying thousands of fraudulent Web hosts per attack (Figure 1).
Internet crime has evolved from being principally executed by one broadly knowl-
edgeable person to being split among specialists who commoditize their services. No
longer do spammers search for open mail relays and harvest their own email lists,
nor do denial of service (DoS) attackers compromise their own zombie machines with
specially written software. Computer criminals have commoditized their specialties,
contracting their requirements to resellers of stolen resources. Botnets, or zombie nets,
are collections of compromised computers, often numbering in the tens of thousands,
whose behavior is controlled by a single operator. Because these networks are cheap,
large, and geographically heterogeneous, they are the primary actors in a variety of
criminal activities including spam, phishing, spoofed hosting, and denial of service at-
tacks. The presence of cheap botnet nodes has fundamentally altered the adversarial
model for many computer crimes by making vastly distributed attacks economically
feasible [22]. At rental rates of $0.02-$0.10 per host per day depending on usage, a phisher can deploy thousands of PII collection servers for about $100.
Distributed phishing attacks [19] (DPA) use large numbers of fraudulent Web hosts
for each set of bait messages. Each server is responsible for collecting only a tiny
percentage of victim PIIs, so server takedown only significantly hinders a DPA when
applied to thousands of servers within hours of the initial mailing. In the extreme case
where each victim is referred to a unique Web page, the benefits of detection vanish.
If the user recognizes the bait message as a component of a phishing attack, the link
to the fraudulent Web server is not generalizable information since it only collects
information for the one victim; disabling the server will not prevent any other potential
victims from betraying their PII. While the extreme case may not be economical if
large numbers of victims (>100,000) are targeted, DPAs that use thousands of servers
carry most of the benefits. In this more affordable case, reporting a server eliminates
less than 0.1% of the attack’s collection capacity.
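The arithmetic behind these figures can be made explicit. The sketch below uses the rental rates and host counts cited above as illustrative values, not measurements:

```python
# Back-of-the-envelope DPA economics using the figures cited in the text.
def rental_cost(num_hosts, rate_per_day, days=1):
    """Total rental cost for a botnet of num_hosts at rate_per_day each."""
    return num_hosts * rate_per_day * days

def takedown_impact(num_hosts, hosts_removed=1):
    """Fraction of collection capacity eliminated by removing hosts."""
    return hosts_removed / num_hosts

cost = rental_cost(2000, 0.05)    # 2,000 hosts at $0.05/day -> $100
impact = takedown_impact(2000)    # one report removes 0.05% of capacity
print(f"cost: ${cost:.2f}, impact per takedown: {impact:.2%}")
```

At two thousand rented hosts, reporting a single server removes 0.05% of capacity, consistent with the "less than 0.1%" figure above.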
In addition to frustrating server takedown, DPAs limit the utility of current database
oriented phishing countermeasures. Systems that depend on user reports such as Cloud-
mark [9], Netcraft [25] and PhishGuard [26] suffer the same problem as takedown: the
1
One variant of this attack was identified in the wild on March 9, 2006, http://www.rsasecurity.
com/press release.asp?doc id=6615.
Figure 1: Topology of a standard phishing attack vs. a distributed phishing attack (DPA). In a standard attack, every bait message points to the same fraudulent Web host; in a DPA, each bait message points to a different zombie host.
reported collection server is only responsible for a portion of the attack. Moreover,
reputation based mechanisms may take too much time for a fraudulent host to build
a sufficiently poor rating that would deter a user from submitting the PII. The phisher
only needs to establish a 24-hour window of opportunity for an effective attack.
Many current anti-phishing technologies focus their efforts on the collection servers: SpoofGuard [8] does an in-depth content analysis of Web pages; PwdHash [28] generates per-domain passwords so that collection hosts using one-off domain names will receive unusable data; and toolbars from Fraud Eliminator, Earthlink, McAfee, and eBay take similar approaches. Other technologies treat phishing as a subset of spam: Spamhaus, Brightmail, PureMessage, and Cloudmark (spam side) filter phishing bait messages using spam classifiers. To our knowledge, no proposal for evaluating both bait mail and the referenced collection servers has been published.
2 Coordinated Anti Phishing Infrastructure
Client side internet service providers (ISPs) have a unique structural advantage for
detecting DPAs. They have access to both components of a phishing attack: the bait
email and the collection server. We are developing an ISP centric approach to fighting
DPAs (Figure 2).
Coordinated Anti Phishing Infrastructure (CAPI) is an initiative combining local
phishing detection and widespread traffic analysis to quickly identify and halt dis-
tributed phishing attacks. Its goal is to characterize distributed phishing attack in-
stances with lightweight classifiers, distribute and refine these classifiers among peers,
and produce accurate signature-specific takedown lists for DPAs. Since takedown lists
are grouped by attack instance (e.g., collections of messages that vary only by host
references), they expose part of the botnet to which they belong. Law enforcement can
use these lists to monitor, infiltrate or disrupt identified botnets.
Figure 2: DPAs are vulnerable to incoming email filtering, victim Web host request filtering, and timely takedown of zombie hosts. Email from honeypots and gross candidate set classifiers is forwarded to the phishing detection units (PDUs). The PDUs produce attack signatures based on an in-depth feature analysis. These attack filters are forwarded to the message filtering units (MFUs). The MFUs apply the attack signatures to incoming email streams. Upon a positive match, the MFU forwards the corresponding takedown lists to the PDUs for confirmation. Law enforcement can use these lists to track zombie hosts and their corresponding botnets.

First, CAPI collects candidate messages from incoming email streams and honeypots using a low false negative filter (i.e., DPA bait messages are unlikely to escape).
False positives are unimportant for preliminary classification since the candidate sets
will be analyzed in depth by later heuristics (described below). Collection of candidate
emails from honeypot accounts will employ existing technology for spam filtering; a
common set of keywords and images will be used to screen emails and identify candi-
date messages.
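This coarse screening stage might be sketched as follows; the keyword list is a hypothetical placeholder for the common term and image databases the text describes:

```python
# Coarse, low-false-negative prefilter: flag any message containing a
# single suspect term. False positives are acceptable at this stage
# because candidate sets are re-analyzed in depth by the PDU heuristics.
SCREEN_TERMS = ("verify", "account", "suspended", "confirm", "password")

def is_candidate(message):
    """Return True if the message matches the broad keyword screen."""
    text = message.lower()
    return any(term in text for term in SCREEN_TERMS)
```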
These candidate messages are forwarded to a phishing detection unit (PDU), which performs in-depth feature analysis of candidate messages and corresponding links to determine their DPA likelihood. Initially, the PDU will examine email text and referenced
websites. As the system becomes more sophisticated, PDUs will examine images and
embedded scripts.
Text analysis. Phishers cannot use all the tricks of spammers (such as character sub-
stitution, extra spaces, etc.) to defeat keyword filters because their messages must
appear identical to authentic ones. They can, however, use morphological variations at the level of HTML markup, for example by introducing invisible tags. Additionally,
a sophisticated attack may obtain a degree of polymorphism by combining pieces of
messages from a pool of candidate components, and interspersing randomly selected
text displayed in an undetectable manner. Therefore we need robust techniques to de-
tect any degree of similarity among emails used in a DPA. Techniques for detecting
partial matches (e.g., [16, 5, 10]) have been successfully applied to detecting large-
scale polymorphic spam attacks [21]. Partial signatures for such documents collected
in honeypots or reported by users have been successfully coordinated both in commer-
cial products such as Brightmail and in open-source projects such as Vipul’s Razor [27]
and the Distributed Checksum Clearinghouse [11].
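A minimal sketch of partial matching, via word shingling and Jaccard similarity, illustrates the principle (the cited systems use more robust fingerprints): two bait messages that differ only in the referenced host still score highly similar.

```python
def shingles(text, k=3):
    """Set of overlapping k-word shingles from a message body."""
    words = text.lower().split()
    return {" ".join(words[i:i + k]) for i in range(len(words) - k + 1)}

def jaccard(a, b):
    """Jaccard similarity between two shingle sets."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)

# Two hypothetical DPA bait variants differing only in the collection host.
msg1 = "Dear customer please verify your account at host-a.example today"
msg2 = "Dear customer please verify your account at host-b.example today"
score = jaccard(shingles(msg1), shingles(msg2))  # high despite distinct hosts
```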
Link and functional analysis. It may be possible to detect a DPA by examining
hosts that are referenced in DPA candidate messages. Pages pointed to by candidate
members of the DPA are likely to exhibit a large degree of similarity as well; therefore, techniques to detect mirrors on the Web can be applied to this task [4, 7]. Traversing
links to detect this similarity is an inherently unsafe activity for the end user, due to the
possibility that a link could have undesired semantics associated with it. However, such
an analysis could be performed with information sharing between spam and phishing
detection components, as proposed in [12], and automatically by user agents operating
within a sandboxed environment. Once the PDU traverses a link in an email that is
suspected of participating in a DPA, the content of subsequent Web pages may also
be evaluated to determine whether they are similar to other pages known to be part of
a DPA. This is also similar to analysis performed by some phishing toolbars, with an
added degree of scrutiny applied to suspicious emails.
The PDU analysis is resource intensive, and therefore is not suitable for real-time
filtering of incoming mail. Its primary goal is to build an individualized email filter per
phishing attack. We thus frame the problem as a large collection of classification prob-
lems, each with a different (and moving) target concept. Each PDU will construct a
filter, in the form of one or more classifiers, reflecting its attack instances. The labeled
examples needed to train each classifier will be provided by the PDU’s in-depth anal-
ysis. Once trained, the classifier will quickly recognize future instances of the DPA.
Individual classifiers in an ensemble will be obtained in two ways: by direct training us-
ing positive and negative examples from the PDU, and by collecting similarly-trained
classifiers from other nodes. Use of ensemble classification will make the resulting
filters accurate and adaptive [3].
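The ensemble idea can be sketched with a toy majority vote over per-attack classifiers; keyword rules stand in here for the trained statistical classifiers the system would actually use:

```python
# Toy majority-vote ensemble: classifiers trained locally or received
# from peer nodes are combined by voting on each incoming message.
def make_keyword_classifier(keywords):
    """Return a classifier flagging messages containing all keywords."""
    def classify(message):
        text = message.lower()
        return all(k in text for k in keywords)
    return classify

def ensemble_vote(classifiers, message, threshold=0.5):
    """Flag the message if the fraction of positive votes exceeds threshold."""
    votes = sum(c(message) for c in classifiers)
    return votes / len(classifiers) > threshold

clfs = [make_keyword_classifier(["verify", "account"]),
        make_keyword_classifier(["suspended", "login"]),
        make_keyword_classifier(["verify", "password"])]
flagged = ensemble_vote(clfs, "Please verify your account password now")
```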
The resulting ensemble classifiers are used for real-time filtering of incoming emails.
The Web collection hosts are accumulated into takedown lists. A DPA could mali-
ciously include legitimate Web sites in the collection host segment of a DPA template
(Figure 3). This is an attempt to cause denial of service to legitimate Web sites by
correlating them with an actual attack. To avoid this, a final round of takedown list
analysis is necessary. In addition to content similarity, the rank of the associated do-
main could be determined by search engines. Since collection hosts only persist for
short periods of time, they will not score highly in these queries.
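This final screening step might look like the sketch below, where `rank_lookup` is a hypothetical placeholder for a real search-engine query:

```python
# Final takedown-list screen: candidates with a strong search-engine
# presence are presumed legitimate and dropped from the list.
def screen_takedown_list(candidates, rank_lookup, min_rank=1000):
    """Keep only hosts whose rank is absent or worse than min_rank."""
    confirmed = []
    for host in candidates:
        rank = rank_lookup(host)      # None if the host is unknown
        if rank is None or rank > min_rank:
            confirmed.append(host)    # short-lived host: likely zombie
    return confirmed

ranks = {"bank.example.com": 12}      # well-ranked honest domain
result = screen_takedown_list(
    ["bank.example.com", "zombie1.example.net"], ranks.get)
```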
This system is in development, but regardless of its implementation details, CAPI has several foreseeable properties that complicate adoption, much as they do for the policy of server side takedown. ISPs are responsible for deploying and maintaining the system, yet are rarely targets of identity fraud. The primary target institutions will benefit most from an effective system, but do not bear any direct costs for its implementation. Every automatic analysis has some false positives. In the presence of the wrong takedown policies, imperfect classification could result in denial of service to honest Web hosts.
Figure 3: A phisher may try to cause denial of service for some honest domains that are too small for whitelist protection by including them as links in a real or apparent DPA. Thus, each candidate zombie site needs to be individually examined for participation in a DPA; we cannot assume that the IP that occupies the zombie host position of a DPA message template is necessarily a zombie host. In light of this consideration, a phisher may attempt to cause a false negative with the same attack. By linking legitimate sites to a moderate percentage of the DPA bait messages, the phisher hopes to lower the signature's confidence rating below the deployment threshold.
3 Stakeholders in a phishing attack
Consumer costs and motivation
Consumers are the recipients of bait messages. Phishing attacks place their PII at risk.
In addition to loss of privacy, the 2006 addendum to the Federal Trade Commission’s
Identity Fraud Survey places the average monetary consumer loss due to identity fraud
at $422. Worse is the 40 hours on average spent resolving the issue; at a conservative
time valuation of $15 per hour, this aggravating task costs $600 of consumer time.
While the Gartner phishing survey [23] finds that 3% of its subjects have divulged
PII to fraudulent Web hosts, this estimate of risk is probably low. Recent experiments
suggest much higher compromise rates. A phishing experiment that leveraged social
networks inside Indiana University shows a high rate of PII disclosure (72%) when
messages appear to come from a recipient’s friend, and 16% when messages appear to
come from an unknown address inside the University’s domain [17]. These messages
were not subject to spam filtering, so an adjustment is necessary to model a real at-
tack. Another experiment which simulated PII disclosure on clients of a popular online
marketplace showed hit rates varying between 5% and 20% depending on the level
of message customization [18]. Messages in this experiment exhibited inconsistent
header information as most phishing messages do, and were subject to spam filtering
by the target’s ISP. While their 5%±4% result appears to confirm the Gartner survey’s
estimate, this measures the success rate per attack. Consumers are typically subject to
multiple attacks per day.
Besides direct identity theft, phishing is also a vehicle for pushing malware [2].
Consumers have much to gain from effective anti-phishing technology. Yet re-
search suggests that consumers will not pay for stand alone privacy or security solu-
tions. Security products are often bundled with products unrelated to computer secu-
rity in recognition of many consumers’ low willingness to pay for such protection [14].
Shostack and Syverson [29] illustrate many contexts in which consumers pay for pri-
vacy and that in all instances the protection is bundled with another good of indepen-
dent utility. They argue that consumers often assume privacy, and need to understand
in simple terms why additional protection may be required.
In addition, consumers frequently underestimate their risk of victimization for all
kinds of dangers because of incorrect anchoring [24, 6]. Risk perception depends
on anchoring, or an initial judgement of likelihood. Phishing is new enough that its
statistical risks remain a moving target. Developing a meaningful anchor at this stage
is difficult for researchers, let alone end users.
Target Institutions
The target institution is the agent from whom fraudsters attempt to steal money. While
identity fraud directly hurts consumers, far more money is lost by the corresponding
commercial services and their insurers: in 2005 average consumer losses per fraud
event were $422, while the average amount stolen was $6,383 [20]. Actual costs may
be higher due to time spent resolving the issue. At well over $50 billion per year
stolen by fraudsters, target institutions have the most to gain by eliminating identity fraud. However, increased security could hurt the bottom line more than it helps. For
instance, a heavyweight vetting process for opening new accounts could cost lenders
more money than it saves if it deters too many honest borrowers.
Spoofed Entities
When the phisher opts to impersonate a legitimate institution such as a bank, we refer
to this agent as the spoofed entity. While spoofed entities are frequently the subsequent
targets of identity fraud (e.g., financial institutions, online marketplaces, credit cards),
the phisher may be seeking PIIs for the purposes of new account fraud, a class of fraud
that is harder to detect and results in much higher average theft amounts [30]. Also noteworthy is the commoditization of stolen identity dossiers. Depending on wealth, completeness of information, and credit records, such dossiers sell for $25-$250 per item ??.
The Anti Phishing Working Group monthly reports (e.g., [2]) show that financial
services are the dominant spoofed entities, with a spoof share hovering between 80% and 90%. Usually these are banks or credit card companies; however, there has been a noted
increase in Internal Revenue Service spoofing. ISPs are a distant second, holding be-
tween 4% and 10% of the spoof share. Retail constitutes 2% to 5%, and the remaining
total of miscellaneous attacks varies in the same range.
It’s likely that many of the financial services are simultaneously target institutions
for fraud. The IRS spoofing is more likely a convincing front for a wealth of PII
disclosure (governments are not famous for giving money back). This information will
be used or resold for new account fraud. Enterprise management software will become
an attractive target for phishers seeking large amounts of PII for resale. These systems
replace traditional paper administration in many areas, most notably payroll and taxes.
Indiana University’s Peoplesoft based system offers full monthly salary statements,
full contact information, and even W-2 forms (federal documents stating earnings and
social security number). Moreover, the system is a portal for managing retirement,
investment services, banking, and student loans.
Internet Service Providers
Phishing does not expose ISPs to serious threats. It may constitute a significant chunk of spam; however, spam filters will stop enough of the phishing attacks to prevent traffic congestion. Stopping phishing is only a secondary concern. As attacks become more sophisticated, phishers will depend less on indiscriminate spamming and more on selected, contextualized targeting. Although ISPs are the second most common spoofed entity, they
are not high cost victims of identity fraud. This is a serious problem for the adoption of
CAPI. Our proposal places the infrastructure deployment and maintenance squarely on
the shoulders of an agent who reaps the least direct benefit. Unless a credible business
case can be made for adopting CAPI, it is doomed to fail [1].
Law enforcement
In rich countries, law enforcement has an obvious interest in eliminating phishing at-
tacks. Citizens and institutions of rich countries are the primary victims of phishing.
Phishing damages trust on the Internet and could slow economic expansion as a result.
DPAs are of particular interest to these law enforcement agencies since they depend on botnets. Their identification can help track, disrupt, or otherwise extract information
about these zombie networks and their associated criminal activities.
Poor countries do not have attractive victims. Law enforcement will be correspond-
ingly apathetic. Even if the country hosts fraudulent servers, law enforcement has far
more pressing concerns than protecting the money of foreigners.
4 Stakeholder analysis of CAPI
Internet Service Providers
While server side takedown of fraudulent Web hosts is the dominant method for com-
batting phishing today, the benefits apply to all internet users, not just the clients of ISPs
that participate in CAPI. This kind of free riding effect severely damages the incentives
for ISPs to participate.
Client ISP host blocking is an attractive alternative or supplement to server side host
takedown. In this scenario, when the recipient clicks on the malicious bait message’s
link, the client’s ISP intercepts the request and refuses to serve the link. Client side
blocking does not depend on the server’s jurisdiction. Client ISPs can protect their
users as soon as CAPI identifies a credible threat; no time is spent convincing a server
side ISP to shut down a paying customer.
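The interception itself reduces to a lookup against the CAPI-derived takedown list. A minimal sketch, with illustrative placeholder host names:

```python
# ISP-side check applied to outbound Web requests: hosts on the
# CAPI-derived blocklist receive a warning page instead of the content.
BLOCKLIST = {"203.0.113.7", "zombie.example.net"}

def filter_request(host, blocklist=BLOCKLIST):
    """Return 'block' for hosts on the takedown list, else 'allow'."""
    return "block" if host in blocklist else "allow"
```

In practice the blocklist would be updated continuously as CAPI confirms new takedown lists.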
Critically, this gives ISPs a direct benefit for their participation in CAPI. Their quick reaction and independence from the server's cooperation create a safer Internet for users of their service. In the process of blocking, the ISP may opt to replace the requested page with a short promotional message underscoring the ISP's commitment to
security. This action makes the ISP’s safeguards visible to end users, possibly increas-
ing perception of value. Marketed correctly, these policies endow participating ISPs
with tangible competitive advantages and can lead to further tiering of their service
structure.
CAPI requires competing ISPs to share information. Previous research shows that
cooperation among competitors produces both direct and strategic benefits, even in the
presence of incomplete disclosure [13, 15]. The direct benefit here is wider examina-
tion and dissemination of phishing signatures and takedown lists. Strategically, their
cooperation reduces downward pricing pressure in a competitive market. ISPs can
maintain higher pricing because of the added value from phishing protection.
Consumers and PII victims
We want to provide a transparent experience for end users who receive their email
through a CAPI filtered system. CAPI operates outside of client machines and is there-
fore not subject to client misconfiguration or malware. It is also independent of the
user platform, so long as standard Internet protocols are followed. A perfect system
eliminates the class of phishing messages that we target and results in immediate take-
down/blocking of the spoofed hosts. In this case, there is no need for user policies.
No matter how good the filtering is, it will invariably produce small numbers of false
positives and false negatives. Whitelisting can protect influential companies from in-
advertent takedown and should be implemented, but finer policies are necessary in the
presence of the occasional misclassification.
Sufficiently distinct attacks will produce false negatives because of the time needed
to detect their existence (assuming it is unmatched by prior signatures) and generate a
suitable classifier. While the ISPs will not “undeliver” or remove offending messages
from the user's inbox, they can certainly scan delivered messages to extract the suspi-
cious IP addresses. If the bait recipient follows one of these links, the requested page
will be blocked by the client ISP.
Other false negatives represent messages that successfully evade CAPI's filtering technologies entirely. If this number is low,
the user will still benefit from the system, since the overwhelming number of attacks
will be avoided.
False positives are a larger concern. High false positive rates in either message
filtering or server takedown will cause clients to stop using the system. As noted earlier,
a phisher could also try to affect denial of service by applying the DPA template to
honest servers (Figure 3). If some of these honest servers escape detection, client-side
blocking results in limited DoS.
We suspect that users will have a higher tolerance for host false positives than
for bait mail false positives if they are allowed to proceed at their own risk in the
presence of stern warnings. One possible strategy would be for the client side ISP to link to the suspicious host upon the client's completion of a survey. The survey may ask how
the page was linked (e.g., direct address entry, email reference, application reference,
etc.), a multiple choice expected use classification, and a CAPTCHA (a quick test that
confirms a human agent) to thwart automatic click through. In a deployed CAPI, this
data could optionally inform the re-evaluation of false positives.
Since reaction time is so critical to halting phishing attacks, client ISPs could ag-
gressively intercept requests to suspicious servers even for medium confidence matches.
In these cases, the ISP offers service to the Web site, but advises the clients to wait. Ev-
idence of phishing will become clearer as time passes, either through additional CAPI
nodes reporting similar attacks or through third party verification. The degree of hur-
dles to access the page could increase (or decrease) with certainty of its participation as
a fraudulent Web host. This countermeasure is designed to balance protection of small
Web sites that are not whitelisted from malicious DPA inclusion against the very real
threat of a small Web site compromise.
Continuing to deliver a small portion of bait phishing messages has the benefit of
educating clients. If an end user clicks on the bait link, the corresponding Web page
is blocked in the manner suggested above. The blocking page could further enumerate
the suspicious features to the user. Repeated instances of this scenario would help the
user to anchor his perception, or establish an initial estimate, of phishing risk.
Target institutions
While target institutions have the most to gain, they would bear a disproportionately
small burden of CAPI’s direct costs. In the absence of regulation or cooperation among
the main beneficiaries of phishing elimination (i.e. the target institutions, not the end
users), ISPs may elect to pay for their infrastructure investments by charging an en-
rollment fee to the large targets of phishing related spoofing. In the current phishing
landscape, most spoofed entities are also the targets of identity fraud. These spoofed
entities would reap direct benefits from their enrollment. If phishing shifts to spoofing
PII portals such as the digital bureaucracy administration software described above,
this pricing scheme becomes less effective. Owners of these portals are not the targets
of identity fraud. The stolen credentials will be used to open new accounts at third
party institutions.
Spoofed entities
Spoofed entities can further participate in CAPI (beyond the subscriber fee proposed above) by developing consistent design principles that aid the automated recognition that CAPI performs and also serve as an evident marker for their customers. As CAPI becomes widespread, DPAs will become more adept at skirting the boundaries of reliable filtering.
Presently, institutions often enable effective spoofing by violating their own rules
of thumb. For instance, even though users commonly overlook the SSL browser frame
padlock, it is an important cue for Web site authenticity. An easy rule of thumb is
“don’t enter login information on http Web sites, and don’t accept self-signed cer-
tificates.” If followed (people may be fooled into abandoning the rule), the attacker
would need to get a certificate authority to sign his zombie host’s SSL certificate;
this makes the attack much harder. At the time of this writing large companies such
as Chase (www.chase.com, www.cardmemberservices.com) use http for
their front pages. Of course, the buttons are part of a JavaScript routine that launches an SSL session to send the login information. From an efficiency point of view,
http is much cheaper to serve. From a usability standpoint, a login collection on the
front page makes the Web site accessible to new users. Yet the combination of these creates a Web site that is easy to spoof, particularly in the presence of insecure DNS
protocols. Worse, it makes spoof detection even harder for savvy users (since the rule
of thumb breaks).
To the user, an effective security signal is both evident when present and conspicu-
ous when absent or incorrect. Technologically, the signal should be hard to duplicate.
Furthermore, attempts to place an incorrect placeholder signal should be automatically
detectable. For instance, many anti-phishing tools perform extra analysis when a login
page is detected. Detection is easy when the page internally labels a form box with
“login” and another with “password.” This becomes considerably harder if a phisher
tries to conceal the page’s semantic content by internally labeling the form boxes as
“search term” and “opt1,” and externally replacing text with graphics. Yet to a human
the purpose of the page is clear both by visual likeness (the graphics duplicate the text)
and by context from the bait message. For important pages such as login, a distinctive
and easily machine-readable font that marks the form box usage could be an effective
countermeasure to obfuscation. To be effective, the font needs to be widely adopted for
this purpose so that people expect to see it for entry of critical information. In this case,
automatic analysis extracts the Web page’s login intent by applying character recogni-
tion to the screenshot of the Web browser. As we gain experience with the limits of
automatic signature generation, we will be able to suggest practices for critical page
design that make filtering easier.
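The label-based detection described above can be sketched as a small HTML scanner. The class name and example markup below are hypothetical; the point is that a naive internal-label check catches honest pages but is evaded by the "search term"/"opt1" obfuscation, motivating the screenshot-level character recognition discussed above:

```python
from html.parser import HTMLParser

class LoginFormDetector(HTMLParser):
    """Flag pages whose form fields internally signal login semantics.
    Obfuscated labels evade this check, which is exactly the gap that
    character recognition on a browser screenshot would close."""
    def __init__(self):
        super().__init__()
        self.looks_like_login = False

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attrs = dict(attrs)
        name = (attrs.get("name") or "").lower()
        if attrs.get("type") == "password" or name in ("login", "password"):
            self.looks_like_login = True

honest = '<form><input name="login"><input type="password" name="password"></form>'
obfuscated = '<form><input name="search term"><input name="opt1"></form>'

d1 = LoginFormDetector(); d1.feed(honest)
d2 = LoginFormDetector(); d2.feed(obfuscated)
print(d1.looks_like_login, d2.looks_like_login)  # True False
```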
Law enforcement
Phishing is a jurisdictionally diverse crime. Recipients of bait messages, owners of
compromised hosts, and spoofed target institutions are in different geographic loca-
tions, limiting the intervention options of local law enforcement agencies. CAPI is a
distributed data collection infrastructure and needs the force of federal authority to
shut down phishing attacks as quickly as possible. While many attack structures may reside
outside of the United States, this country’s vast bandwidth gives federal authorities a
great deal of leverage in fighting the foreign elements of phishing.
Of course many bait recipients are in the United States, and must connect to spoofed
hosts through the national network. Even if ISPs in foreign jurisdictions refuse to shut
down positively identified phishing servers, federal authorities can still ban national
routing of the offending hosts. If the phishing host has a dynamic IP address, it becomes
necessary to ban the entire subnet – a strong incentive for uncooperative ISPs to act.
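The escalation from single-host to subnet bans can be sketched with the standard `ipaddress` module. The function name is hypothetical, and the /24 subnet size is a simplifying assumption (a real ban would use the ISP's actual allocation boundaries):

```python
import ipaddress

def ban_target(host_ip: str, dynamic: bool) -> ipaddress.IPv4Network:
    """Return the network to block: the single host for a static address,
    or the enclosing subnet (assumed /24 here) when the host's address
    is dynamically assigned and the single-host ban would be evaded."""
    if dynamic:
        return ipaddress.ip_network(f"{host_ip}/24", strict=False)
    return ipaddress.ip_network(f"{host_ip}/32")

print(ban_target("203.0.113.57", dynamic=False))  # 203.0.113.57/32
print(ban_target("203.0.113.57", dynamic=True))   # 203.0.113.0/24
```

The collateral blocking of the 253 neighboring addresses in the dynamic case is precisely the pressure on an uncooperative ISP: its innocent customers are cut off until the phishing host is dealt with.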
CAPI produces two important sets of data that are relevant to fighting phishing
and cybercrime. The first is the list of IP addresses of spoofed servers resulting from
host-side analysis. This portion of a phishing attack requires a completed TCP
handshake and is therefore not amenable to IP spoofing. As noted earlier, the prevalence
of botnets makes it cheap for phishers to employ thousands of compromised hosts
for a single DPA; each server handles a limited number of recipients, so that individual
takedown only disables a small portion of the attack. CAPI classifies individual attacks,
thus grouping the servers of DPAs together. It is likely that these hosts are purchased
from the same botnet. Lists of correlated zombies provide concrete starting points
for disrupting botnets. These networks for hire form the ground troops for a bazaar
of criminal activity including spam, denial of service, and illegal content distribution.
Dismantling them has multiplicative benefits for Internet safety.
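The grouping step can be sketched as follows. The attack identifiers are assumed to come from CAPI's attack classifier; the data layout and function name are illustrative:

```python
from collections import defaultdict

def group_servers_by_attack(observations):
    """Group spoofed-host IPs by the attack CAPI classified them into.
    Each group is a candidate set of zombies rented from a single botnet,
    and hence a concrete starting point for disrupting that botnet."""
    groups = defaultdict(set)
    for ip, attack_id in observations:
        groups[attack_id].add(ip)
    return dict(groups)

# Two hosts serving the same DPA, one serving a different attack:
obs = [("198.51.100.4", "dpa-17"), ("203.0.113.9", "dpa-17"),
       ("192.0.2.88", "dpa-23")]
print(group_servers_by_attack(obs))
```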
Second, CAPI collects a great deal of timing statistics for phishing attacks. Traffic
analysis will not only improve the detectability of attacks, but also provide investigators with
time bounds on botnet resource usage. This added data can help correlate multiple
attacks. Patterns in bait arrival times can signify phishing attacks launched by particular
botnet operators.
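One simple timing signature of the kind suggested above is the distribution of gaps between consecutive bait arrivals. The sketch below (function name and statistics chosen for illustration) summarizes a campaign by the mean and standard deviation of its inter-arrival times; campaigns launched by the same botnet operator may exhibit similar gap signatures:

```python
from statistics import mean, pstdev

def interarrival_stats(timestamps):
    """Summarize bait arrival times by the mean and population standard
    deviation of the gaps between consecutive messages."""
    ts = sorted(timestamps)
    gaps = [b - a for a, b in zip(ts, ts[1:])]
    return mean(gaps), pstdev(gaps)

# Messages arriving at a near-constant rate (seconds since some epoch):
m, s = interarrival_stats([0, 10, 21, 30, 41])
print(round(m, 2), round(s, 2))  # 10.25 0.83
```

A small standard deviation relative to the mean, as here, suggests an automated sender; correlating such signatures across attacks is one way the collected timing data could tie multiple campaigns to one operator.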
5 Conclusion
We propose CAPI, an ISP based solution to distributed phishing attacks. It leverages
the ISP’s unique access to both bait messages and fraudulent Web hosts to perform an
analysis that correlates the two sides of the attack. While CAPI is a technologically
sound idea, ISPs have little to gain directly from phishing elimination, yet bear the
costs of system deployment and maintenance. We propose ways to align CAPI with
stakeholder interests. Fraudulent Web hosts should be blocked by the client ISP; this
limits the immediate benefits of detection to clients of participating ISPs. The pressure
for immediate action under medium confidence can be mitigated by granting contingent
access to suspicious websites. Continuing to deliver phishing messages in a manner that
makes their phishing score evident, while blocking the fraudulent Web site, can help to
plausibly anchor end users' risk perception. Finally, deferring server takedown also has
the benefit of allowing law enforcement to track, disrupt, and infiltrate botnets. Because
botnets are a convergence point in computer crime, their identification and subsequent
disruption not only curtails phishing attacks, but also stops the network from launching
other criminal activities.