Stopping Distributed Phishing Attacks
Alex Tsow
Markus Jakobsson
Filippo Menczer
School of Informatics
Indiana University
Bloomington, IN 47406
http://www.indiana.edu/˜phishing
Abstract
Server takedown, the primary remedy for “email to Web host” phishing at-
tacks, may not be possible in the server’s jurisdiction, requires expeditious action
to be most effective, and cannot scale to the threats posed by distributed phishing
attacks (DPAs). In addition to these technical barriers, the economic incentives
do not align with server takedown. Under this policy, server ISPs must inconve-
nience their customers (who may only be guilty of running insufficiently secured
machines) and reap no benefit since ISPs are rarely victims of identity fraud. This
paper briefly outlines an in-progress system for identifying DPAs, but focuses on
the necessary policy implications of such a system. The paper’s thesis is that client
side ISPs should block suspected fraudulent Web hosts. Not only does this avoid
jurisdictional problems and time lost to negotiations with server side ISPs, but it
provides a visible and competitive advantage to client side ISPs.
1 Introduction
Phishing is a serious threat which Internet service providers (ISPs) are uniquely situ-
ated to stop. Yet ISPs have little incentive to stop phishing since they do not suffer high
dollar costs resulting from identity fraud. This paper outlines a distributed variant of phishing attacks [19] that thwarts existing countermeasures, and an ISP-based solution to this problem. We examine the effect of phishing on its stakeholders, including
consumers, spoofed entities, target institutions, ISPs, and law enforcement. Finally, we
align our technological solution with the goals of each stakeholder to make a credible
case for adoption.
The most common phishing tactic today uses an “email to Web host” structure: the
phisher sends “bait” emails that direct their recipients to fraudulent Web hosts. In a
successful attack the victim voluntarily follows the email’s fraudulent link and sub-
mits personally identifying information (PII) to the Web host. Most often, the phish-
ing messages and Web hosts masquerade as financial institutions, online marketplaces,
government agencies or some other trustworthy entity that could plausibly ask for PIIs.
This attack structure has some important vulnerabilities, mostly stemming from the Web host collection mechanism. Since Web hosts must be available to the phishing victims through their standard browsers, the phisher cannot hide the Web host address. Once phishing messages are identified, the attack can be halted by eliminating the
server referenced in the email message. Subsequent “bites” to the bait message simply
direct the victim to an invalid link. No further credential collection can proceed.
While very effective in halting a phishing attack, the policy of server takedown
also bears several weaknesses. Server ISPs may refuse to cooperate with takedown requests, and they may be in jurisdictions where takedown is unenforceable. Phishing
attacks have short lifetimes. Some experimental studies have shown that the bulk of
victim credentials are collected within 24 hours of mailing the bait messages [17, 18].
Assuming early detection, many people are still at risk in the time spent negotiating
server takedown. Phishers are already mitigating the consequences of takedown by
deploying thousands of fraudulent Web hosts per attack (Figure 1).
Internet crime has evolved from being principally executed by one broadly knowl-
edgeable person to being split among specialists who commoditize their services. No
longer do spammers search for open mail relays and harvest their own email lists,
nor do denial of service (DoS) attackers compromise their own zombie machines with
specially written software. Computer criminals have commoditized their specialties,
contracting their requirements to resellers of stolen resources. Botnets, or zombie nets,
are collections of compromised computers, often numbering in the tens of thousands,
whose behavior is controlled by a single operator. Because these networks are cheap,
large, and geographically heterogeneous, they are the primary actors in a variety of
criminal activities including spam, phishing, spoofed hosting, and denial of service at-
tacks. The presence of cheap botnet nodes has fundamentally altered the adversarial
model for many computer crimes by making vastly distributed attacks economically
feasible [22]. At rental rates of $0.02-$0.10 per host per day depending on usage, a phisher can deploy thousands of PII collection servers for about $100.
Distributed phishing attacks [19] (DPA) use large numbers of fraudulent Web hosts
for each set of bait messages. Each server is responsible for collecting only a tiny
percentage of victim PIIs, so server takedown only significantly hinders a DPA when
applied to thousands of servers within hours of the initial mailing. In the extreme case
where each victim is referred to a unique Web page, the benefits of detection vanish.
If the user recognizes the bait message as a component of a phishing attack, the link
to the fraudulent Web server is not generalizable information since it only collects
information for the one victim; disabling the server will not prevent any other potential
victims from betraying their PII. While the extreme case may not be economical if
large numbers of victims (>100,000) are targeted, DPAs that use thousands of servers
carry most of the benefits. In this more affordable case, reporting a server eliminates
less than 0.1% of the attack’s collection capacity.
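The arithmetic behind these figures can be made explicit. The sketch below uses the rental rates and host counts cited above as illustrative values, not measurements:

```python
# Back-of-the-envelope DPA economics using the figures cited in the text.
def rental_cost(num_hosts, rate_per_day, days=1):
    """Total rental cost for a botnet of num_hosts at rate_per_day each."""
    return num_hosts * rate_per_day * days

def takedown_impact(num_hosts, hosts_removed=1):
    """Fraction of collection capacity eliminated by removing hosts."""
    return hosts_removed / num_hosts

cost = rental_cost(2000, 0.05)    # 2,000 hosts at $0.05/day -> $100
impact = takedown_impact(2000)    # one report removes 0.05% of capacity
print(f"cost: ${cost:.2f}, impact per takedown: {impact:.2%}")
```

At two thousand rented hosts, reporting a single server removes 0.05% of capacity, consistent with the "less than 0.1%" figure above.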
In addition to frustrating server takedown, DPAs limit the utility of current database
oriented phishing countermeasures. Systems that depend on user reports such as Cloud-
mark [9], Netcraft [25] and PhishGuard [26] suffer the same problem as takedown: the
1
One variant of this attack was identified in the wild on March 9, 2006, http://www.rsasecurity.
com/press release.asp?doc id=6615.
Figure 1: Topology of a standard phishing attack vs. a distributed phishing attack (DPA). In a standard attack, every bait message points to the same fraudulent Web host; in a DPA, each bait message points to a different zombie host.
reported collection server is only responsible for a portion of the attack. Moreover,
reputation based mechanisms may take too much time for a fraudulent host to build
a sufficiently poor rating that would deter a user from submitting the PII. The phisher
only needs to establish a 24-hour window of opportunity for an effective attack.
Many current anti-phishing technologies focus their efforts on the collection servers: SpoofGuard [8] does an in-depth content analysis of Web pages; PwdHash [28] generates per-domain passwords so that collection hosts using one-off domain names will receive unusable data; and toolbars from Fraud Eliminator, Earthlink, McAfee, and eBay take similar approaches. Other technologies treat phishing as a subset of spam: Spamhaus, Brightmail, PureMessage, and Cloudmark (spam side) filter phishing bait messages using spam classifiers. To our knowledge, no proposal for evaluating both bait mail and the referenced collection servers has been published.
2 Coordinated Anti Phishing Infrastructure
Client side internet service providers (ISPs) have a unique structural advantage for
detecting DPAs. They have access to both components of a phishing attack: the bait
email and the collection server. We are developing an ISP centric approach to fighting
DPAs (Figure 2).
Coordinated Anti Phishing Infrastructure (CAPI) is an initiative combining local
phishing detection and widespread traffic analysis to quickly identify and halt dis-
tributed phishing attacks. Its goal is to characterize distributed phishing attack in-
stances with lightweight classifiers, distribute and refine these classifiers among peers,
and produce accurate signature-specific takedown lists for DPAs. Since takedown lists
are grouped by attack instance (e.g., collections of messages that vary only by host
references), they expose part of the botnet to which they belong. Law enforcement can
use these lists to monitor, infiltrate or disrupt identified botnets.
Figure 2: DPAs are vulnerable to incoming email filtering, victim Web host request filtering, and timely takedown of zombie hosts. Email from honeypots and gross candidate set classifiers is forwarded to the phishing detection units (PDUs). The PDUs produce attack signatures based on an in-depth feature analysis. These attack filters are forwarded to the message filtering units (MFUs). The MFUs apply the attack signatures to incoming email streams. Upon a positive match, the MFU forwards the corresponding takedown lists to the PDUs for confirmation. Law enforcement can use these lists to track zombie hosts and their corresponding botnets.

First, CAPI collects candidate messages from incoming email streams and honeypots using a low false negative filter (i.e., DPA bait messages are unlikely to escape).
False positives are unimportant for preliminary classification since the candidate sets
will be analyzed in depth by later heuristics (described below). Collection of candidate
emails from honeypot accounts will employ existing technology for spam filtering; a
common set of keywords and images will be used to screen emails and identify candi-
date messages.
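This coarse screening stage might be sketched as follows; the keyword list is a hypothetical placeholder for the common term and image databases the text describes:

```python
# Coarse, low-false-negative prefilter: flag any message containing a
# single suspect term. False positives are acceptable at this stage
# because candidate sets are re-analyzed in depth by the PDU heuristics.
SCREEN_TERMS = ("verify", "account", "suspended", "confirm", "password")

def is_candidate(message):
    """Return True if the message matches the broad keyword screen."""
    text = message.lower()
    return any(term in text for term in SCREEN_TERMS)
```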
These candidate messages are forwarded to a phishing detection unit (PDU), which performs in-depth feature analysis of candidate messages and corresponding links to determine their DPA likelihood. Initially, the PDU will examine email text and referenced
websites. As the system becomes more sophisticated, PDUs will examine images and
embedded scripts.
Text analysis. Phishers cannot use all the tricks of spammers (such as character sub-
stitution, extra spaces, etc.) to defeat keyword filters because their messages must
appear identical to authentic ones. They can, however, use morphological variations at the level of HTML markup, for example by introducing invisible tags. Additionally,
a sophisticated attack may obtain a degree of polymorphism by combining pieces of
messages from a pool of candidate components, and interspersing randomly selected
text displayed in an undetectable manner. Therefore we need robust techniques to de-
tect any degree of similarity among emails used in a DPA. Techniques for detecting
partial matches (e.g., [16, 5, 10]) have been successfully applied to detecting large-
scale polymorphic spam attacks [21]. Partial signatures for such documents collected
in honeypots or reported by users have been successfully coordinated both in commer-
cial products such as Brightmail and in open-source projects such as Vipul’s Razor [27]
and the Distributed Checksum Clearinghouse [11].
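A minimal sketch of partial matching, via word shingling and Jaccard similarity, illustrates the principle (the cited systems use more robust fingerprints): two bait messages that differ only in the referenced host still score highly similar.

```python
def shingles(text, k=3):
    """Set of overlapping k-word shingles from a message body."""
    words = text.lower().split()
    return {" ".join(words[i:i + k]) for i in range(len(words) - k + 1)}

def jaccard(a, b):
    """Jaccard similarity between two shingle sets."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)

# Two hypothetical DPA bait variants differing only in the collection host.
msg1 = "Dear customer please verify your account at host-a.example today"
msg2 = "Dear customer please verify your account at host-b.example today"
score = jaccard(shingles(msg1), shingles(msg2))  # high despite distinct hosts
```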
Link and functional analysis. It may be possible to detect a DPA by examining
hosts that are referenced in DPA candidate messages. Pages pointed to by candidate
members of the DPA are likely to exhibit a large degree of similarity as well; therefore, techniques to detect mirrors on the Web can be applied to this task [4, 7]. Traversing
links to detect this similarity is an inherently unsafe activity for the end user, due to the
possibility that a link could have undesired semantics associated with it. However, such
an analysis could be performed with information sharing between spam and phishing
detection components, as proposed in [12], and automatically by user agents operating
within a sandboxed environment. Once the PDU traverses a link in an email that is
suspected of participating in a DPA, the content of subsequent Web pages may also
be evaluated to determine whether they are similar to other pages known to be part of
a DPA. This is also similar to analysis performed by some phishing toolbars, with an
added degree of scrutiny applied to suspicious emails.
The PDU analysis is resource intensive, and therefore is not suitable for real-time
filtering of incoming mail. Its primary goal is to build an individualized email filter per
phishing attack. We thus frame the problem as a large collection of classification prob-
lems, each with a different (and moving) target concept. Each PDU will construct a
filter, in the form of one or more classifiers, reflecting its attack instances. The labeled
examples needed to train each classifier will be provided by the PDU’s in-depth anal-
ysis. Once trained, the classifier will quickly recognize future instances of the DPA.
Individual classifiers in an ensemble will be obtained in two ways: by direct training us-
ing positive and negative examples from the PDU, and by collecting similarly-trained
classifiers from other nodes. Use of ensemble classification will make the resulting
filters accurate and adaptive [3].
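The ensemble idea can be sketched with a toy majority vote over per-attack classifiers; keyword rules stand in here for the trained statistical classifiers the system would actually use:

```python
# Toy majority-vote ensemble: classifiers trained locally or received
# from peer nodes are combined by voting on each incoming message.
def make_keyword_classifier(keywords):
    """Return a classifier flagging messages containing all keywords."""
    def classify(message):
        text = message.lower()
        return all(k in text for k in keywords)
    return classify

def ensemble_vote(classifiers, message, threshold=0.5):
    """Flag the message if the fraction of positive votes exceeds threshold."""
    votes = sum(c(message) for c in classifiers)
    return votes / len(classifiers) > threshold

clfs = [make_keyword_classifier(["verify", "account"]),
        make_keyword_classifier(["suspended", "login"]),
        make_keyword_classifier(["verify", "password"])]
flagged = ensemble_vote(clfs, "Please verify your account password now")
```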
The resulting ensemble classifiers are used for real-time filtering of incoming emails.
The Web collection hosts are accumulated into takedown lists. A DPA could mali-
ciously include legitimate Web sites in the collection host segment of a DPA template
(Figure 3). This is an attempt to cause denial of service to legitimate Web sites by
correlating them with an actual attack. To avoid this, a final round of takedown list
analysis is necessary. In addition to content similarity, the rank of the associated do-
main could be determined by search engines. Since collection hosts only persist for
short periods of time, they will not score highly in these queries.
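This final screening step might look like the sketch below, where `rank_lookup` is a hypothetical placeholder for a real search-engine query:

```python
# Final takedown-list screen: candidates with a strong search-engine
# presence are presumed legitimate and dropped from the list.
def screen_takedown_list(candidates, rank_lookup, min_rank=1000):
    """Keep only hosts whose rank is absent or worse than min_rank."""
    confirmed = []
    for host in candidates:
        rank = rank_lookup(host)      # None if the host is unknown
        if rank is None or rank > min_rank:
            confirmed.append(host)    # short-lived host: likely zombie
    return confirmed

ranks = {"bank.example.com": 12}      # well-ranked honest domain
result = screen_takedown_list(
    ["bank.example.com", "zombie1.example.net"], ranks.get)
```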
This system is in development, but regardless of its implementation details, CAPI has several foreseeable properties that complicate adoption, much as they do for the policy of server side takedown. ISPs are responsible for deploying and maintaining the system, yet are rarely targets of identity fraud. The primary target institutions will benefit most from an effective system, but do not bear any direct costs for its implementation. Every automatic analysis has some false positives. In the presence of the wrong takedown policies, imperfect classification could result in denial of service to honest Web hosts.
Figure 3: A phisher may try to cause denial of service for some honest domains that are too small for whitelist protection by including them as links in a real or apparent DPA. Thus, each candidate zombie site needs to be individually examined for participation in a DPA; we cannot assume that the IP that occupies the zombie host position of a DPA message template is necessarily a zombie host. In light of this consideration, a phisher may attempt to cause a false negative with the same attack. By linking legitimate sites to a moderate percentage of the DPA bait messages, the phisher hopes to lower the signature's confidence rating below the deployment threshold.
3 Stakeholders in a phishing attack
Consumer costs and motivation
Consumers are the recipients of bait messages. Phishing attacks place their PII at risk.
In addition to loss of privacy, the 2006 addendum to the Federal Trade Commission’s
Identity Fraud Survey places the average monetary consumer loss due to identity fraud
at $422. Worse is the 40 hours on average spent resolving the issue; at a conservative
time valuation of $15 per hour, this aggravating task costs $600 of consumer time.
While the Gartner phishing survey [23] finds that 3% of its subjects have divulged
PII to fraudulent Web hosts, this estimate of risk is probably low. Recent experiments
suggest much higher compromise rates. A phishing experiment that leveraged social
networks inside Indiana University shows a high rate of PII disclosure (72%) when
messages appear to come from a recipient’s friend, and 16% when messages appear to
come from an unknown address inside the University’s domain [17]. These messages
were not subject to spam filtering, so an adjustment is necessary to model a real at-
tack. Another experiment which simulated PII disclosure on clients of a popular online
marketplace showed hit rates varying between 5% and 20% depending on the level
of message customization [18]. Messages in this experiment exhibited inconsistent
header information as most phishing messages do, and were subject to spam filtering
by the target’s ISP. While their 5%±4% result appears to confirm the Gartner survey’s
estimate, this measures the success rate per attack. Consumers are typically subject to
multiple attacks per day.
Besides direct identity theft, phishing is also a vehicle for pushing malware [2].
Consumers have much to gain from effective anti-phishing technology. Yet re-
search suggests that consumers will not pay for stand alone privacy or security solu-
tions. Security products are often bundled with products unrelated to computer secu-
rity in recognition of many consumers’ low willingness to pay for such protection [14].
Shostack and Syverson [29] illustrate many contexts in which consumers pay for pri-
vacy and that in all instances the protection is bundled with another good of indepen-
dent utility. They argue that consumers often assume privacy, and need to understand
in simple terms why additional protection may be required.
In addition, consumers frequently underestimate their risk of victimization for all
kinds of dangers because of incorrect anchoring [24, 6]. Risk perception depends
on anchoring, or an initial judgement of likelihood. Phishing is new enough that its
statistical risks remain a moving target. Developing a meaningful anchor at this stage
is difficult for researchers, let alone end users.
Target Institutions
The target institution is the agent from whom fraudsters attempt to steal money. While
identity fraud directly hurts consumers, far more money is lost by the corresponding
commercial services and their insurers: in 2005 average consumer losses per fraud
event were $422, while the average amount stolen was $6,383 [20]. Actual costs may
be higher due to time spent resolving the issue. At well over $50 billion per year
stolen by fraudsters, target institutions have the most to gain by eliminating identity fraud. However, increased security could hurt the bottom line more than it helps. For
instance, a heavyweight vetting process for opening new accounts could cost lenders
more money than it saves if it deters too many honest borrowers.
Spoofed Entities
When the phisher opts to impersonate a legitimate institution such as a bank, we refer
to this agent as the spoofed entity. While spoofed entities are frequently the subsequent
targets of identity fraud (e.g., financial institutions, online marketplaces, credit cards),
the phisher may be seeking PIIs for the purposes of new account fraud, a class of fraud
that is harder to detect and results in much higher average theft amounts [30]. Also noteworthy is the commoditization of stolen identity dossiers. Depending on wealth, completeness of information, and credit records, such dossiers sell for $25-$250 per item ??.
The Anti Phishing Working Group monthly reports (e.g., [2]) show that financial
services are the dominant spoofed entities, with a spoof share hovering between 80% and 90%. Usually these are banks or credit card companies; however, there has been a noted
increase in Internal Revenue Service spoofing. ISPs are a distant second, holding be-
tween 4% and 10% of the spoof share. Retail constitutes 2% to 5%, and the remaining
total of miscellaneous attacks varies in the same range.
It’s likely that many of the financial services are simultaneously target institutions
for fraud. The IRS spoofing is more likely a convincing front for a wealth of PII
disclosure (governments are not famous for giving money back). This information will
be used or resold for new account fraud. Enterprise management software will become
an attractive target for phishers seeking large amounts of PII for resale. These systems
replace traditional paper administration in many areas, most notably payroll and taxes.
Indiana University’s Peoplesoft based system offers full monthly salary statements,
full contact information, and even W-2 forms (federal documents stating earnings and
social security number). Moreover, the system is a portal for managing retirement,
investment services, banking, and student loans.
Internet Service Providers
Phishing does not expose ISPs to serious threats. It may constitute a significant chunk of spam; however, spam filters will stop enough of the phishing attacks to prevent traffic congestion. Stopping phishing is only a secondary concern. As attacks become more sophisticated, phishers will depend less on indiscriminate spamming and more on selected, contextualized targeting. Although ISPs are the second most common spoofed entity, they
are not high cost victims of identity fraud. This is a serious problem for the adoption of
CAPI. Our proposal places the infrastructure deployment and maintenance squarely on
the shoulders of an agent who reaps the least direct benefit. Unless a credible business
case can be made for adopting CAPI, it is doomed to fail [1].
Law enforcement
In rich countries, law enforcement has an obvious interest in eliminating phishing at-
tacks. Citizens and institutions of rich countries are the primary victims of phishing.
Phishing damages trust on the Internet and could slow economic expansion as a result.
DPAs are of particular interest to these law enforcement agencies since they depend on botnets. Their identification can help track, disrupt, or otherwise extract information
about these zombie networks and their associated criminal activities.
Poor countries do not have attractive victims. Law enforcement will be correspond-
ingly apathetic. Even if the country hosts fraudulent servers, law enforcement has far
more pressing concerns than protecting the money of foreigners.
4 Stakeholder analysis of CAPI
Internet Service Providers
While server side takedown of fraudulent Web hosts is the dominant method for com-
batting phishing today, the benefits apply to all internet users, not just the clients of ISPs
that participate in CAPI. This kind of free riding effect severely damages the incentives
for ISPs to participate.
Client ISP host blocking is an attractive alternative or supplement to server side host
takedown. In this scenario, when the recipient clicks on the malicious bait message’s
link, the client’s ISP intercepts the request and refuses to serve the link. Client side
blocking does not depend on the server’s jurisdiction. Client ISPs can protect their
users as soon as CAPI identifies a credible threat; no time is spent convincing a server
side ISP to shut down a paying customer.
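The interception itself reduces to a lookup against the CAPI-derived takedown list. A minimal sketch, with illustrative placeholder host names:

```python
# ISP-side check applied to outbound Web requests: hosts on the
# CAPI-derived blocklist receive a warning page instead of the content.
BLOCKLIST = {"203.0.113.7", "zombie.example.net"}

def filter_request(host, blocklist=BLOCKLIST):
    """Return 'block' for hosts on the takedown list, else 'allow'."""
    return "block" if host in blocklist else "allow"
```

In practice the blocklist would be updated continuously as CAPI confirms new takedown lists.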
Critically, this gives ISPs a direct benefit for their participation in CAPI. Their quick reaction and independence from the server's cooperation create a safer Internet for users of their service. In the process of blocking, the ISP may opt to replace the requested page with a short promotional message underscoring the ISP's commitment to
security. This action makes the ISP’s safeguards visible to end users, possibly increas-
ing perception of value. Marketed correctly, these policies endow participating ISPs
with tangible competitive advantages and can lead to further tiering of their service
structure.
CAPI requires competing ISPs to share information. Previous research shows that
cooperation among competitors produces both direct and strategic benefits, even in the
presence of incomplete disclosure [13, 15]. The direct benefit here is wider examina-
tion and dissemination of phishing signatures and takedown lists. Strategically, their
cooperation reduces downward pricing pressure in a competitive market. ISPs can
maintain higher pricing because of the added value from phishing protection.
Consumers and PII victims
We want to provide a transparent experience for end users who receive their email
through a CAPI filtered system. CAPI operates outside of client machines and is there-
fore not subject to client misconfiguration or malware. It is also independent of the
user platform, so long as standard Internet protocols are followed. A perfect system
eliminates the class of phishing messages that we target and results in immediate take-
down/blocking of the spoofed hosts. In this case, there is no need for user policies.
No matter how good the filtering is, it will invariably produce small numbers of false
positives and false negatives. Whitelisting can protect influential companies from in-
advertent takedown and should be implemented, but finer policies are necessary in the
presence of the occasional misclassification.
Sufficiently distinct attacks will produce false negatives because of the time needed
to detect their existence (assuming it is unmatched by prior signatures) and generate a
suitable classifier. While the ISPs will not “undeliver” or remove offending messages
from the user's inbox, they can certainly scan delivered messages to extract the suspi-
cious IP addresses. If the bait recipient follows one of these links, the requested page
will be blocked by the client ISP.
Other false negatives represent messages that successfully evade CAPI's filtering technologies entirely. If this number is low,
the user will still benefit from the system, since the overwhelming number of attacks
will be avoided.
False positives are a larger concern. High false positive rates in either message
filtering or server takedown will cause clients to stop using the system. As noted earlier,
a phisher could also try to affect denial of service by applying the DPA template to
honest servers (Figure 3). If some of these honest servers escape detection, client-side
blocking results in limited DoS.
We suspect that users will have a higher tolerance for host false positives than
for bait mail false positives if they are allowed to proceed at their own risk in the
presence of stern warnings. One possible strategy would be for the client side ISP to link to the suspicious host upon the client's completion of a survey. The survey may ask how
the page was linked (e.g., direct address entry, email reference, application reference,
etc.), a multiple choice expected use classification, and a CAPTCHA (a quick test that
confirms a human agent) to thwart automatic click through. In a deployed CAPI, this
data could optionally inform the re-evaluation of false positives.
Since reaction time is so critical to halting phishing attacks, client ISPs could ag-
gressively intercept requests to suspicious servers even for medium confidence matches.
In these cases, the ISP offers service to the Web site, but advises the clients to wait. Ev-
idence of phishing will become clearer as time passes, either through additional CAPI
nodes reporting similar attacks or through third party verification. The degree of hur-
dles to access the page could increase (or decrease) with certainty of its participation as
a fraudulent Web host. This countermeasure is designed to balance protection of small
Web sites that are not whitelisted from malicious DPA inclusion against the very real
threat of a small Web site compromise.
Continuing to deliver a small portion of bait phishing messages has the benefit of
educating clients. If an end user clicks on the bait link, the corresponding Web page
is blocked in the manner suggested above. The blocking page could further enumerate
the suspicious features to the user. Repeated instances of this scenario would help the
user to anchor his perception, or establish an initial estimate, of phishing risk.
Target institutions
While target institutions have the most to gain, they would bear a disproportionately
small burden of CAPI’s direct costs. In the absence of regulation or cooperation among
the main beneficiaries of phishing elimination (i.e. the target institutions, not the end
users), ISPs may elect to pay for their infrastructure investments by charging an en-
rollment fee to the large targets of phishing related spoofing. In the current phishing
landscape, most spoofed entities are also the targets of identity fraud. These spoofed
entities would reap direct benefits from their enrollment. If phishing shifts to spoofing
PII portals such as the digital bureaucracy administration software described above,
this pricing scheme becomes less effective. Owners of these portals are not the targets
of identity fraud. The stolen credentials will be used to open new accounts at third
party institutions.
Spoofed entities
Spoofed entities can further participate in CAPI (beyond the subscriber fee proposed above) by developing consistent design principles that aid the automated recognition that CAPI performs and also serve as an evident marker for their customers. As CAPI becomes widespread, DPAs will become more adept at skirting the boundaries of reliable filtering.
Presently, institutions often enable effective spoofing by violating their own rules
of thumb. For instance, even though users commonly overlook the SSL browser frame
padlock, it is an important cue for Web site authenticity. An easy rule of thumb is
“don’t enter login information on http Web sites, and don’t accept self-signed cer-
tificates.” If followed (people may be fooled into abandoning the rule), the attacker
would need to get a certificate authority to sign his zombie host’s SSL certificate;
this makes the attack much harder. At the time of this writing large companies such
as Chase (www.chase.com, www.cardmemberservices.com) use http for
their front pages. Of course, the buttons are part of a JavaScript routine that launches an SSL session to send the login information. From an efficiency point of view,
http is much cheaper to serve. From a usability standpoint, a login collection on the
front page makes the Web site accessible to new users. Yet the combination of these creates a Web site that is easy to spoof, particularly in the presence of insecure DNS
protocols. Worse, it makes spoof detection even harder for savvy users (since the rule
of thumb breaks).
To the user, an effective security signal is both evident when present and conspicu-
ous when absent or incorrect. Technologically, the signal should be hard to duplicate.
Furthermore, attempts to place an incorrect placeholder signal should be automatically
detectable. For instance, many anti-phishing tools perform extra analysis when a login
page is detected. Detection is easy when the page internally labels a form box with
“login” and another with “password.” This becomes considerably harder if a phisher
tries to conceal the page’s semantic content by internally labeling the form boxes as
“search term” and “opt1,” and externally replacing text with graphics. Yet to a human
the purpose of the page is clear both by visual likeness (the graphics duplicate the text)
and by context from the bait message. For important pages such as login, a distinctive
and easily machine-readable font that marks the form box usage could be an effective
countermeasure to obfuscation. To be effective, the font needs to be widely adopted for
this purpose so that people expect to see it for entry of critical information. In this case,
automatic analysis extracts the Web page’s login intent by applying character recogni-
tion to the screenshot of the Web browser. As we gain experience with the limits of
automatic signature generation, we will be able to suggest practices for critical page
design that make filtering easier.
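The label-based detection described above can be sketched as a small HTML scanner. The class name and example markup below are hypothetical; the point is that a naive internal-label check catches honest pages but is evaded by the "search term"/"opt1" obfuscation, motivating the screenshot-level character recognition discussed above:

```python
from html.parser import HTMLParser

class LoginFormDetector(HTMLParser):
    """Flag pages whose form fields internally signal login semantics.
    Obfuscated labels evade this check, which is exactly the gap that
    character recognition on a browser screenshot would close."""
    def __init__(self):
        super().__init__()
        self.looks_like_login = False

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attrs = dict(attrs)
        name = (attrs.get("name") or "").lower()
        if attrs.get("type") == "password" or name in ("login", "password"):
            self.looks_like_login = True

honest = '<form><input name="login"><input type="password" name="password"></form>'
obfuscated = '<form><input name="search term"><input name="opt1"></form>'

d1 = LoginFormDetector(); d1.feed(honest)
d2 = LoginFormDetector(); d2.feed(obfuscated)
print(d1.looks_like_login, d2.looks_like_login)  # True False
```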
Law enforcement
Phishing is a jurisdictionally diverse crime. Recipients of bait messages, owners of
compromised hosts, and spoofed target institutions are in different geographic loca-
tions, limiting the intervention options of local law enforcement agencies. CAPI is a
distributed data collection infrastructure and needs the force of federal authority to
shut down phishing attacks as quickly as possible. While many attack structures may reside
outside of the United States, this country’s vast bandwidth gives federal authorities a
great deal of leverage in fighting the foreign elements of phishing.
Of course many bait recipients are in the United States, and must connect to spoofed
hosts through the national network. Even if ISPs in foreign jurisdictions refuse to shut
down positively identified phishing servers, federal authorities can still ban national
routing of the offending hosts. If the phishing host has a dynamic IP address, it becomes
necessary to ban the entire subnet – a strong incentive for uncooperative ISPs to act.
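The escalation from single-host to subnet bans can be sketched with the standard `ipaddress` module. The function name is hypothetical, and the /24 subnet size is a simplifying assumption (a real ban would use the ISP's actual allocation boundaries):

```python
import ipaddress

def ban_target(host_ip: str, dynamic: bool) -> ipaddress.IPv4Network:
    """Return the network to block: the single host for a static address,
    or the enclosing subnet (assumed /24 here) when the host's address
    is dynamically assigned and the single-host ban would be evaded."""
    if dynamic:
        return ipaddress.ip_network(f"{host_ip}/24", strict=False)
    return ipaddress.ip_network(f"{host_ip}/32")

print(ban_target("203.0.113.57", dynamic=False))  # 203.0.113.57/32
print(ban_target("203.0.113.57", dynamic=True))   # 203.0.113.0/24
```

The collateral blocking of the 253 neighboring addresses in the dynamic case is precisely the pressure on an uncooperative ISP: its innocent customers are cut off until the phishing host is dealt with.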
CAPI produces two important sets of data that are relevant to fighting phishing
and cybercrime. The first is the list of IP addresses of spoofed servers resulting from
host-side analysis. This portion of a phishing attack requires a completed TCP
handshake and is therefore not amenable to IP spoofing. As noted earlier, the prevalence
of botnets makes it cheap for phishers to employ thousands of compromised hosts
for a single DPA; each server handles a limited number of recipients, so that individual
takedown only disables a small portion of the attack. CAPI classifies individual attacks,
thus grouping the servers of DPAs together. It is likely that these hosts are purchased
from the same botnet. Lists of correlated zombies provide concrete starting points
for disrupting botnets. These networks for hire form the ground troops for a bazaar
of criminal activity including spam, denial of service, and illegal content distribution.
Dismantling them has multiplicative benefits for Internet safety.
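The grouping step can be sketched as follows. The attack identifiers are assumed to come from CAPI's attack classifier; the data layout and function name are illustrative:

```python
from collections import defaultdict

def group_servers_by_attack(observations):
    """Group spoofed-host IPs by the attack CAPI classified them into.
    Each group is a candidate set of zombies rented from a single botnet,
    and hence a concrete starting point for disrupting that botnet."""
    groups = defaultdict(set)
    for ip, attack_id in observations:
        groups[attack_id].add(ip)
    return dict(groups)

# Two hosts serving the same DPA, one serving a different attack:
obs = [("198.51.100.4", "dpa-17"), ("203.0.113.9", "dpa-17"),
       ("192.0.2.88", "dpa-23")]
print(group_servers_by_attack(obs))
```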
Second, CAPI collects a great deal of timing statistics for phishing attacks. Traffic
analysis will not only improve the detectability of attacks, but also provide investigators with
time bounds on botnet resource usage. This added data can help correlate multiple
attacks. Patterns in bait arrival times can signify phishing attacks launched by particular
botnet operators.
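One simple timing signature of the kind suggested above is the distribution of gaps between consecutive bait arrivals. The sketch below (function name and statistics chosen for illustration) summarizes a campaign by the mean and standard deviation of its inter-arrival times; campaigns launched by the same botnet operator may exhibit similar gap signatures:

```python
from statistics import mean, pstdev

def interarrival_stats(timestamps):
    """Summarize bait arrival times by the mean and population standard
    deviation of the gaps between consecutive messages."""
    ts = sorted(timestamps)
    gaps = [b - a for a, b in zip(ts, ts[1:])]
    return mean(gaps), pstdev(gaps)

# Messages arriving at a near-constant rate (seconds since some epoch):
m, s = interarrival_stats([0, 10, 21, 30, 41])
print(round(m, 2), round(s, 2))  # 10.25 0.83
```

A small standard deviation relative to the mean, as here, suggests an automated sender; correlating such signatures across attacks is one way the collected timing data could tie multiple campaigns to one operator.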
5 Conclusion
We propose CAPI, an ISP based solution to distributed phishing attacks. It leverages
the ISP’s unique access to both bait messages and fraudulent Web hosts to perform an
analysis that correlates the two sides of the attack. While CAPI is a technologically
sound idea, ISPs have little to gain directly from phishing elimination, yet bear the
costs of system deployment and maintenance. We propose ways to align CAPI with
stakeholder interests. Fraudulent Web hosts should be blocked by the client ISP; this
limits the immediate benefits of detection to clients of participating ISPs. The pressure
for immediate action under medium confidence can be mitigated by granting contingent
access to suspicious websites. Continuing to deliver phishing messages in a manner that
makes their phishing score evident, while blocking the fraudulent Web site, can help to
plausibly anchor end users' risk perception. Finally, deferring server takedown also has
the benefit of allowing law enforcement to track, disrupt, and infiltrate botnets. Because
botnets are a convergence point in computer crime, their identification and subsequent
disruption not only curtails phishing attacks, but also stops the network from launching
other criminal activities.