Sift: Protecting the GDP of the internet

For any new technology platform, infrastructure is critical to making that technology usable. Highways (infrastructure) made cars (technology) much more useful. One piece of internet infrastructure that every tech fanboy (like myself) knows about is Stripe. Before Stripe, accepting payments as an online business was a long, complex process. Stripe made it quick & easy for online businesses to accept payments. And the market has rewarded them with the crown of most valuable US-based startup at $95 billion. Put simply, their mission is to increase the GDP of the internet. But we aren't here to talk about who is increasing the GDP of the internet. Rather, I want to talk about a question that doesn't get enough attention:

Who protects the GDP of the internet?


Another critical ingredient for any new technology platform is trust. Today, we trust the internet services we use with our money, our communication, really our entire lives, so it can be hard to remember how we came to trust these services in the first place. That's why it's useful to look at crypto. A new technology platform is being built before our eyes, and if you know anything about it, plenty of people don't trust it.



Notice a common theme in the pictures above? Crypto exchanges, where you can buy and sell cryptocurrencies, kept getting hacked. That created a gap in the market: be the crypto exchange where fraud doesn't run rampant, a place users could trust to trade crypto, and spur widespread adoption. And that's exactly what Coinbase, the first major crypto company to go public, at a $65 billion valuation, did.

Coinbase co-founder Fred Ehrsam spoke on trust when Coinbase went public:

It was very important to lean into having high integrity as a brand for both customers and regulators. We made a critical decision very early when the FinCEN guidance, which is a subset of Treasury, came out. The first crypto regulatory guidance ever in March 2013. I remember calling our lawyer that day and asking him, "hey, this crypto regulatory guidance came out. What should we do?" And his response was, "you guys are a really small startup and this is going to be super burdensome, it's going to cost you a ton of money. You should ignore it for as long as possible." So I sort of said, "ok". Hung up the phone. Later that night at dinner, Brian (Coinbase co-founder) and I talked about it. We decided to call the guy back the next day and fire him. And basically go the exact opposite direction. And I think that is the bedrock on which a lot of the white knight brand of Coinbase is built today.

So who helped Coinbase gain the trust of millions of users? Who is protecting the GDP of Coinbase? Some Google searching reveals, in Coinbase's Privacy Policy, that a company called Sift (formerly Sift Science) plays a key role.


This need to prevent fraud isn't unique to crypto. As mentioned earlier, every technology platform needs it. If online businesses don't prevent fraud, they risk users complaining about them on social media, losing users altogether, and worst of all, losing the ability to accept payments. Or in Sift's terms, failing to protect the GDP of the internet.

Before we explore how Sift is helping online businesses prevent fraud, let's understand how much of a problem online fraud has become.

Who's winning (err, losing)? Offline or online fraud?

Recently, there's been a change in the payment fraud world. I'll let you try to decipher the not-so-nicely designed chart from the Federal Reserve below.


To clarify the picture above, in-person card payments are when you're in a store and purchase something, often referred to as card-present (CP). Remote card payments are transactions you make on the internet, known as card-not-present (CNP).

2016 marked the first year in the US that online fraud surpassed offline fraud. You may be concluding, "So? Everyone shops online now! Obviously fraud would be higher online." Hold your horses, buddy, I got another stat for ya...

In 2016, eCommerce only made up 8.2% of all retail sales in the US. Yet, it made up 61% of credit card fraud. 🤯

And it wasn't just that online fraud grew faster than offline fraud; offline fraud actually dropped. The cause of this drop was the introduction of EMV cards. You know, that credit card you got a few years ago with a chip in it. The one that made you insert your card in the reader, awkwardly wait for it to process, accidentally pull it out too early, further increasing the awkwardness. I won't go into the technology behind EMV cards, but they weren't built for speed; they were built for security. And that's why offline fraud dropped.

Which leads us to the fraudsters' next prey. Fraudsters gotta eat too. And here are the reasons they are going to the internet to feast.

Internet fraud - more scale & specialization, less friction


Fraudsters act how I wish I organized my task list: high impact, low effort. They look for where they can make the most money for the least effort - the lowest-hanging fruit. The scalability of internet fraud makes it high impact.

🌏 Mr. Worldwide (web)!

Pitbull is an artist you hate to love. Cheesy when sober, so good when drunk. When Pitbull came onto the scene, he was called Mr. 305 because that's the area code of his hometown, Miami-Dade County. But when Pitbull elevated from being a star in his local town to being a star all around the world, he upgraded his nickname - Mr. Worldwide! The same potential came true for businesses when the internet was born.

Take, for example, an instructor selling a course in their local town vs. on the internet. Let's say you want to teach English in an American town with a high concentration of non-English speakers. Great! You have plenty of potential customers. But what if your skill is more niche, like piloting a drone or coding? Either way, you limit growth because your total addressable market is confined to your local town. With the internet, that changes. Teachable is a place where you can sell your online course, helping you reach every potential customer, no matter where they are in the world. 17 Teachable instructors earned over $1M in 2019 teaching things like astrology/metaphysics, filmmaking, and...making money online (how meta).


Big companies (eventually) understood that being online could drive revenue growth. But when they rushed to get online in the mid-2000s, they didn't understand the vulnerabilities in the systems they were building. One hacker, Albert Gonzalez, found he could quite easily access customer credit card information. After he was caught, it was found that he stole info on 170 million credit & debit cards. And those are only the ones the government discovered. One hacker said of TJX (the T.J. Maxx parent company and Gonzalez's most famous target), "there were major chains and big hacks that would dwarf TJX. I'm just waiting for them to indict us for the rest of them."

Since then, breaches have only gotten worse. I mentioned in my article about Tonic.ai that 2020 was another record year for data breaches. Hackers like Gonzalez can then take these stolen cards to the dark web, where forums like Shadowcrew exist. Shadowcrew gave fraudsters a place to anonymously sell the identities and card numbers they stole to other fraudsters who wanted to use them. Think of it like eBay for fraudsters. But how could you trust that what someone is selling is legit? After all, they are fraudsters. So Shadowcrew moderators became the middlemen that every transaction went through. If you got ripped off, a moderator would reimburse you or find you replacement cards/identities. Huh. It turns out even fraudsters need fraud prevention.

Contrast the sophistication of the online fraud scene to the constraints of most offline fraudsters. As an offline fraudster, you were confined to your local town when stealing credit cards. In what local town are fraudsters going to be able to steal 170 million credit cards? And then how the hell do you find people to sell those stolen credit cards? It's not quite the same as a teenager pulling a "hey mister" and asking someone to buy beer for them.

A real economy developed in the online fraud world, with players specializing in the functions they could perform best (I know my economics professor would love that line). Another advantage of online fraud: being able to combine the scale of sensitive data available online with scalable labor.

🤖 An army of robots is already here

Hollywood loves scaring us with a future where robots run the world - I, Robot, Eagle Eye, Smart House. That's the blue pill. The red pill is that the robots are already here.



I know. I'm basic for quoting Naval. What Naval is referring to here is what software engineers do (and what my Indian dad wishes I did). Through code, you can program computers to do repetitive tasks for you. It's like the difference between (Iron Man 3 spoiler alert!) being Iron Man vs having a bunch of Iron Man drones. At first, killing all the bad guys, doing photo-ops, looking like a badass is all fun and games. But like any repetitive task, it gets boring. So you create a drone version of your suit to handle the repetitive task of greeting your girlfriend when she comes home from work (I really hope my gf doesn't kill me for that line).

Robots that run repetitive, automated processes are rampant in the digital world. More than half of internet traffic is bots scanning content, interacting with webpages, chatting with users, or looking for attack targets. Bots aren't just good at automating repetitive tasks; they do it in a scalable way. Imagine if bots didn't make up half of internet traffic, and we needed humans to do that work instead. Recruiting, training, and keeping that many people happy would be far more difficult and uneconomical than writing the code for a bot once and scaling to as many bots as you'd like.

Fraudsters use bots for these same reasons to carry out their attacks. Remember those massive data breaches we talked about earlier? That data is sold on the dark web by fraudsters who gained access not only to stolen credit cards, but also to login credentials.

👶👶👶DUMB DOWN TIME👶👶👶

Logins aren't just logins. Logins are the keys to our digital homes, which contain our valuables. Today, we value our digital property just as much as our physical property (my cousin Shaan has a great Twitter thread on this). You have stored funds and payment methods on sites like Venmo, Robinhood, Airbnb, etc. Fraudsters noticed too. Getting access to someone's login credentials and using them for malicious purposes is a newer, faster-growing form of fraud compared to the more traditional payment fraud we've been talking about so far. It's called account takeover. Fraudsters are exploiting it because businesses don't realize just how big of a problem it's going to become, and thus dedicate fewer resources to stopping it.

One technique fraudsters employ to commit account takeover, called credential stuffing, relies on bots. Let's say you get login credentials from a breach of a major department store. You're not going to try those credentials only on that department store's eCommerce site. You're going to try them on tons of sites. The reason you can is that an estimated 85% of users reuse the same login credentials across multiple services. (BTW, don't be that person. Please grow the f up and use some free service like LastPass.) The problem fraudsters encounter is the success rate.

Only about 0.1% of the time a fraudster tries this method do they actually get access to an account. Fraudsters aren't going to sit there and type in logins one by one for only 1 in every 1,000 to work, especially when they have access to millions of credentials. They want high impact for low effort. They're going to write a program that makes bots do it for them. So if they have access to 1 million login credentials and successfully log in to 0.1% of them, they've gained access to 1,000 accounts, on one site! They can then scale these efforts across plenty of other sites on the web.
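Here's that back-of-the-envelope math as a quick sketch (the 0.1% success rate, the 1 million credentials, and the site count are the rough estimates from above, not measured attack data):

```python
# Why credential stuffing scales: a back-of-the-envelope calculation.
# All numbers are the rough estimates from the text, not measured data.

credentials_leaked = 1_000_000   # logins from a single breach
success_rate = 0.001             # ~0.1% of reused logins work on another site
sites_targeted = 50              # bots can replay the same list across many sites

accounts_per_site = credentials_leaked * success_rate
total_accounts = accounts_per_site * sites_targeted

print(f"Accounts compromised per site: {accounts_per_site:,.0f}")  # 1,000
print(f"Across {sites_targeted} sites: {total_accounts:,.0f}")     # 50,000
```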

This combination of having access to millions of login credentials and being able to automate the process of hacking into them with bots makes online fraud a scalable pursuit.

🛍 Friction free leads to a fraud spree

Online businesses have reduced friction in our lives. No longer do I need to commute to the grocery store, walk around searching for what I need, get hit by some 5-year-old driving a cart, wait in a long line, and carry heavy bags home. I can type grocery items into a search bar, click a button, and have them delivered to my house (please catch up, Trader Joe's). This phenomenon isn't unique to the grocery industry. Making commerce more convenient through the internet has been replicated across every industry, leading to the consumer expectation that internet experiences are fast. But the internet's speed has reduced friction for all users, not just legitimate ones - the opposite effect of EMV cards. The friction and security EMV cards introduced made offline fraud difficult. The speed and convenience the internet created made online fraud easy. To understand this better, let's compare the risks & dynamics fraudsters face in real life (IRL) vs. online.

IRL fraud - concrete identity & high-stakes consequences


One great IRL fraud scheme back in the card-swiping days was getting access to stolen debit cards, going to ATMs, and extracting funds from bank accounts. The one problem: hiding your identity. In this scenario, you can't pull the ski-mask-to-hide-your-face trick (at least in pre-COVID times) because that's fucking sketchy. Your face is visible and clear on that ATM camera. So you decide to wear a disguise to make the face caught on camera look nothing like you. But then you have to consider the second type of risk, which Albert Gonzalez, the hacker mentioned before, encountered. He didn't go to that ATM to extract funds from one stolen debit card, he did it for a fuck ton of them. So he was at the ATM raking in stacks of cash in the middle of the night, but it took him 10 - 20 minutes to do it. Who's at an ATM for 20 minutes? Sketch alert! Some cops noticed Gonzalez doing this, foiled his master plan, and arrested him. The risk of getting caught for IRL fraud is high. You go to fucking jail!

Online fraud - fluid identity & lower-stakes consequences


Let's say an online fraudster gets caught by an online business's anti-fraud system. What happens? Does the company report the fraudster to the authorities, who then easily identify and arrest that person? No. The most common outcome is that one of the fraudster's internet identity signals gets banned from using the service again. Does this mean the fraudster can't commit fraud on that site anymore? Nope! Manipulating your internet identity signals can be quite simple and cheap. Below, we'll talk about how a fraudster responds when each of their identifiers gets banned.

Our digital form of a home address is the IP address. But fraudsters use Virtual Private Networks (VPNs), which cost only a few bucks per month. VPNs hide your real IP address and give you a randomized, often-changing one. So when a fraudster gets caught and the IP address they were using gets banned, it ain't no thang - they can change their IP address easily with a VPN!

A device ID is another form of identity on the internet. The easiest way to understand it: say you log in to Twitter on your computer and check the "Remember Me" box. The next time you go to Twitter on that same computer, you don't have to log in again - Twitter recognizes that device ID. But if you try to log in to Twitter from a new, different device, like your iPhone, Twitter will recognize the iPhone as a new device ID, which is why you need to log in again. The specific make, model, operating system, etc. make a device ID unique. So if a fraudster gets caught committing fraud on a device and the online business bans that device, it doesn't mean shit to the fraudster. Fraudsters can change device IDs by updating their operating system or changing a browser setting, for example. The half-life of a device fingerprint can be shorter than one month. Additionally, fraudsters aren't just selling credit cards and login credentials on the dark web; they also sell device IDs of legitimate users. If a company has seen successful transactions from a device ID, it may mark that device as legitimate, which gives fraudsters another way to sneak past anti-fraud systems.
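To make "device ID" concrete, here's a minimal sketch of the idea: hash a handful of device/browser attributes into one identifier. The attribute names are illustrative, not any vendor's actual signal set - the point is how changing any single attribute (an OS update, a browser setting) produces an entirely new ID:

```python
import hashlib

def device_fingerprint(attributes: dict) -> str:
    """Hash a set of device/browser attributes into one ID.
    Attribute names are illustrative, not a real vendor's signal set."""
    canonical = "|".join(f"{k}={attributes[k]}" for k in sorted(attributes))
    return hashlib.sha256(canonical.encode()).hexdigest()[:16]

device = {
    "os": "macOS 14.3",
    "browser": "Chrome 121",
    "screen": "2560x1600",
    "timezone": "America/Chicago",
    "language": "en-US",
}

print(device_fingerprint(device))   # one ID...
device["os"] = "macOS 14.4"         # ...one OS update later...
print(device_fingerprint(device))   # ...a completely different ID
```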

The last common form of identifier is your account. But come on, how legit is an account as an identifier? How much does an account really know about who I am? In most cases, a site just wants an email address, which I can quickly create...for free. I mean, have you ever heard of a finsta, you boomer? "Fake" accounts are exactly how fraudsters transact with each other on the dark web. No longer do you have to reveal your name and face to other fraudsters to commit a crime with them.

So not only are the consequences of getting caught low, but you can also respawn like it's Call of Duty. That risk/reward calculation has led fraudsters to flock to the possibilities of the internet.

So how have businesses attempted to protect their services from online fraudsters? Let's explore.

The business version of whack-a-mole


Traditionally, businesses have approached fraud with what I call the whack-a-mole technique, known more professionally as rules. Fraud rules are simple: if a user fits x criteria we've deemed suspicious, take y business action. Some rules are obvious to create, such as: block all orders from North Korea, or review orders over $1,000 when the average customer purchase is only $50. But many rules get written after fraud has occurred, and that's the whack-a-mole approach. Usually businesses find out about fraud 2 - 4 weeks after it occurs. How? Through a chargeback.
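(We'll get to chargebacks in a second.) First, to make "rules" concrete, here's a minimal sketch of a rules engine, using the example thresholds above - not anyone's production rules:

```python
# A minimal rules engine: if an order fits x criteria, take y action.
# Criteria and thresholds mirror the examples in the text, not real rules.

RULES = [
    (lambda order: order["country"] == "KP", "block"),    # North Korea
    (lambda order: order["amount"] > 1000, "review"),     # 20x the avg. $50 order
    (lambda order: order["hour"] == 3, "review"),         # the 3 AM rule
]

def evaluate(order: dict) -> str:
    for criteria, action in RULES:
        if criteria(order):
            return action
    return "accept"

print(evaluate({"country": "US", "amount": 1200, "hour": 14}))  # review
print(evaluate({"country": "US", "amount": 40, "hour": 3}))     # review
print(evaluate({"country": "US", "amount": 40, "hour": 14}))    # accept
```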

👶👶👶DUMB DOWN TIME👶👶👶

A chargeback is the legitimate cardholder saying something like, "What the hell are these charges at Chicago parking meters and hotels?! The last time I went to Chicago was years ago for an overpriced, overcrowded NYE party. I'm going to call the bank and file a chargeback." Your bank rep will say, "Ah yes sir, of course. We will have these charges removed right away, cancel your current card, and send you a brand spanking new one." You get your money back because you are the bank's customer; the bank is incentivized to keep you happy, not the business that let a fraudulent transaction slip through its anti-fraud system. A chargeback forces the business to fully refund the customer and pay a fee ranging from $20 - $40. Additional costs for the business include lost inventory, any operations associated with fulfilling the order, and time spent dealing with the chargeback.

Businesses want to avoid chargebacks for all the reasons above, but the biggest threat is losing the ability to accept payments. If a business exceeds chargeback thresholds (~1% of transactions being chargebacks), the card networks that sit in the middle of payments (Visa, Mastercard, Discover) can remove the business from their network, which means your card won't work on that site. If you were removed from Visa's network, for instance, you would lose access to about half of the purchase volume on credit cards. In layman's terms, you would be fucked.
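Here's that math for a hypothetical merchant - the ~1% threshold is the network rule of thumb mentioned above, and the per-chargeback fee uses the $20 - $40 range:

```python
# Why a ~1% chargeback rate is existential for a merchant.
# All numbers are illustrative, using the figures cited in the text.

transactions = 50_000
chargebacks = 600
avg_order = 50                 # the average customer purchase from earlier
fee_per_chargeback = 30        # midpoint of the $20 - $40 fee range

rate = chargebacks / transactions
direct_cost = chargebacks * (avg_order + fee_per_chargeback)  # refund + fee

print(f"Chargeback rate: {rate:.2%}")     # 1.20% - over the ~1% threshold
print(f"Direct cost: ${direct_cost:,}")   # $48,000, before lost inventory & ops
if rate > 0.01:
    print("At risk of removal from the card network")
```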

So how does this business respond to that chargeback to prevent future ones?


Have you heard of Tom & Jerry? If not, you're probably way younger than me, and I have a shit ton of respect that you're reading a 3,000+ word essay on online fucking fraud. Tom & Jerry is about a cat's never-ending quest to catch a mouse. The cat makes a fool of himself chasing the mouse, causing collateral damage along the way. The mouse is always coming up with ways to outfox the cat, constantly adapting its behavior after each brief capture. That, my friend, is the game of online fraud.

After businesses become aware of fraud, they put their fraud fighting teams on the case. Let's say a fraudster has their order blocked by the $1,000 rule we just talked about. The fraudster realizes that maybe a $1,000 order isn't the best way to blend in and seem like a legitimate user. So instead, they make 12 orders from different accounts, which slip past the business's fraud detection system. 2 - 4 weeks later, the business receives notice of the fraud and has to figure out which signals or characteristics of that user were indicative of fraud. Maybe they'll block the fraudster's IP address and device from using their service again. Maybe the order happened at 3 AM, so instead of instantly fulfilling orders at such odd hours, they'll block them, or if they have the capacity, review them before processing. Will that stop the fraudster? You know better. This is Tom and Jerry, a game of cat and mouse.

Online businesses try hard to catch fraud, but in the process create many additional problems for themselves. When they do catch fraudsters, it's a temporary Band-Aid. As we just learned, when a fraudster gets caught, they don't call it quits; they change and adapt. So here are the issues created when a business uses a rules-based fraud detection system.

💪 Fraudsters are playing survival of the fittest

As mentioned before, if a fraudster sees their $1,000 order is blocked, maybe they'll do multiple orders broken down into smaller chunks. Their job is to poke holes in systems and find weaknesses. They need to find ways to adapt, or else they risk not being able to make a living. They'll have to become a normal person and get a 9-5 job with a boss. Yuck. Now that's a fraudster's worst nightmare.

As an example, companies have improved their ability to detect credit card fraud, so fraudsters have turned to alternative payment methods (cryptocurrencies, gift cards, in-app credits) as new attack vectors. Yet only 26% of businesses believe they are very effective at preventing fraud from non-credit card payment sources. Merchants aren't keeping pace with the rate at which fraudsters adapt.

🤬 Rules are reactive

When you implement a rule, it's after fraud has already been committed. And potentially lots of it. When fraudsters find a hole in the system, they go hard at it. Since it takes a business 2 - 4 weeks to receive a chargeback and investigate it, that hole stays open for a large window of time. So fraudsters keep attacking it and share it with their online fraud buds through forums on the dark web.

But like we already learned, fraudsters adapt. You'll continue to be hit with fraud, just in new and unique ways that you'll have to discover and create rules for. So you're just constantly in reaction mode, not being able to proactively prevent fraud before it happens. That's just poor business operations.

💰 Throw bodies at the problem

When businesses face problems, they throw people at them. That's how online fraud has historically been fought: a combination of rules and people. In addition to rules that stop fraudulent orders, companies manually review potentially fraudulent orders they aren't sure about. These orders sit in a queue until a manual reviewer looks them over and accepts or rejects them. Here are three difficulties we mere mortals face when manually reviewing online orders to combat fraud:

  1. We expect everything to be fast online. Thanks to Amazon, when we place an order on an eCommerce site, we now expect it to be shipped the next day, if not faster. When we place a food delivery order on DoorDash, our hangry meter will accept no delivery longer than one hour. So when you consider the scale of orders you can receive as an online business, the number of potentially fraudulent orders can come into your manual review queue at a dizzying pace. How can you handle the scale of thoroughly manually reviewing these orders, while maintaining that low friction, high-quality experience (fast delivery) that consumers have come to expect on the internet?
  2. As online companies scale their revenues, they'll also have to scale their manual review team to handle the increase in potentially fraudulent cases. And if you know anything about scaling companies, you can never hire fast enough. Finding high-quality talent, building training programs that quickly onboard new teammates, and retaining, managing, and keeping employees happy is a lot. For some parts of your business, this is a necessary undertaking, specifically ones where you have a core competency and are trying to differentiate. But other parts of your business are non-core, repeatable problems every business faces. Those are opportunities to outsource or automate.
  3. The fundamental problem isn't manual review (MR). It's what causes excessive MR: inaccurate rules. Fraud teams have to review lots of orders that are obviously fraud, or obviously legitimate. That's inefficient. You want your team to only review orders that are more nuanced and require a human eye. The inaccuracy of rules is an opportunity to use technology that is more accurate, reducing the amount of MR required.

😱 Limit revenue growth (VCs/startups gasping)

Rules treat the world as black and white. If an order meets specific criteria, block it, because it must be fraud. But as you grow older, you realize the world is grey (or maybe that's just my gf pointing out my 7 grey hairs?). Here are examples of rules that seem to catch fraudulent orders, but would also block legitimate ones:

  1. Get as stereotypical as you can for a second. What do you think of when you think of a hacker? Probably someone in a dark room in the middle of the night, hacking away. The fact that the most fraud occurs at 3 AM totally makes sense - fraudsters don't have bosses! They work weird hours. But wait a second, ever heard of drunk shopping? In one survey, 52% of people who shopped online while drunk did it late into the night. So keep in mind that in addition to all that late-night fraud you catch, you'll also be blocking some drunk purchases that may not have happened otherwise.
  2. Fraudsters don't browse around like normal shoppers. They know exactly what they want to defraud and they do it immediately. That's why it's been found that orders placed within 4 minutes of the user account being created are 35x riskier. But what if I, Sameer Jauhar, fellow legitimate user, was browsing a site on my phone, then decided to go home, quickly create an account, and make a purchase on my laptop? Or what if I heard about a product on some podcast I trust and was ready to buy immediately? If my first experience as a customer is getting blocked, I'm unlikely to return.

Businesses that primarily sell online tend to be younger, meaning they're more likely to be startups. And if there's one thing I know about startups, it's that they fucking love growth. They hate anything that gets in the way of it. I know one eCommerce founder who didn't care if the CVV a user entered while placing an order didn't match the card's actual CVV. She was more worried about blocking orders from legitimate users who mistyped their CVV than about stopping fraudulent orders from fraudsters who didn't have it. I love that founder's uninhibited focus on one metric.

Startups have good reason to be worried about this risk. In a recent survey, 36% of respondents had tried to make a purchase only to have the transaction falsely declined. 25% of those consumers who had their orders incorrectly flagged as fraudulent ended up buying from a competitor. They went elsewhere because that key currency of trust was missing between the customer and the original brand.

So rules not only struggle to stop fraud; they're also too rudimentary to recognize legitimate orders. From both sides, rules just aren't accurate enough.

"Ok, ok, ok we get it. You think rules suck. Do you have a better solution?"

Glad you asked.

An automated, scalable, and customer friendly approach to fraud


All of the issues described above made it possible for a new anti-fraud company to emerge. One that proactively adapts to changing fraud patterns, holistically assesses users, utilizes automation, and doesn't limit revenue growth. One that protects businesses from fraudsters and helps them build trust with their legitimate users.

This company had to be built different. It couldn't utilize the technology of the past. That company is Sift. And they use machine learning (ML) to address the shortcomings of rules.

👶👶👶DUMB DOWN TIME👶👶👶

Think of ML like our brain. More specifically, a baby's brain. When babies are young, they don't know a damn thing. So we have to teach them! Take animals, for example. We show babies animals and tell them the names of those animals. We give them the answer today, and they can tell us the answer tomorrow. Their brain is learning all the little details associated with each animal. Going forward, you don't need to give them the answer; they can predict the right answer by observing the little details about the animals. That's how ML works.

At first, when an ML model can't predict anything, you have to train it. You start by feeding it two parts of data. One is the input data - in this example, the picture of a tiger or lion, where the ML model can analyze every little detail (inputs). Then you tell it which animal it is (the output). During that process, the ML model will learn that some inputs, like sharp fangs, aren't a good indicator of which animal it is, because both tigers and lions have them. But the model will also learn inputs that are clear differentiators. It will notice that 100% of the time an animal has bold, black stripes, it's a tiger. And finally, the model will learn inputs that are potential indicators. Generally, tigers are longer, more muscular, and heavier. So if the model only notices one of these traits, it might not be so sure whether it's a tiger or a lion. But if the model observes an animal with all of these traits, it will be fairly confident it's a tiger. Once you've trained the ML model, it can predict from a picture whether it's a lion or a tiger, without you giving it the answer. Each input is assigned a weight that determines how important it is in predicting the output. The model combines all of these inputs & their weights to come up with a prediction (e.g., the model is 94% sure this is a tiger).
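Here's roughly what that looks like in code: a toy sketch using scikit-learn, with made-up tiger/lion features. This isn't Sift's model, just the supervised learning pattern described above:

```python
# Toy supervised learning: predict tiger vs. lion from a few traits.
# The features and data are invented for illustration; the pattern is the point.
from sklearn.linear_model import LogisticRegression

# Inputs: [has_stripes, body_length_m, weight_kg]
X = [
    [1, 3.0, 220], [1, 2.9, 200], [1, 3.1, 240],   # tigers
    [0, 2.5, 180], [0, 2.4, 170], [0, 2.6, 190],   # lions
]
y = [1, 1, 1, 0, 0, 0]   # outputs (labels): 1 = tiger, 0 = lion

model = LogisticRegression().fit(X, y)   # learns a weight per input

# Predict on an unseen animal: striped, long, heavy -> should be a tiger
prob_tiger = model.predict_proba([[1, 3.2, 230]])[0][1]
print(f"The model is {prob_tiger:.0%} sure this is a tiger")
```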

"Why the fuck would an ML system predict lions vs tigers?"


Listen, it's a silly, simple example! I was trying to make it easy for you. Sue me! But since you're such a smart cookie, let's go deeper into how ML can actually be useful.

To understand how ML changes the equation, let's look at how an ideal fraud fighting team would operate.

Gain a more ~holistic~ understanding


Sorry, my yoga instructors and their mini speeches on living a more holistic life seem to be slipping into this post. When fraud fighters see a signal as fraudulent (e.g., an order placed at 3 AM), they might think placing a rule there will stop fraud. It may, but it'll also stop some legitimate orders, like the drunk online shopper. And that's a fundamental issue with rules, especially for startups that need to grow fast and build trust with consumers. Rules treat a single signal as the determiner of fraud. "Since we're seeing a lot of fraud from orders at 3 AM, all orders at 3 AM must be fraud!" But no single signal is a silver bullet for stopping fraud. Rather, you want an ML model that observes the entire user journey - from start to finish on your site - collecting every data point along the way.

The reason you want to collect every data point is that it gives you a more holistic understanding of the risk a user poses. As a fraudster, you're no longer just evading the specific fraud signals you've learned to get around; you're being evaluated on every action you take. You have to act like a legitimate user the whole way through, which adds friction to committing fraud. That's frustrating for a fraudster. Fraudsters want the lowest-hanging fruit, which means they'll go elsewhere.

But ML is also important for the other set of users you have. You know, the ones who make up all your revenue and help justify your batshit startup valuation to venture capitalists? You shouldn't ruin the customer experience for all users because of a few bad apples. That just erodes trust. Instead, by getting a more holistic understanding of each user, you can also make sure legitimate users aren't flagged as fraud and don't end up buying from your competitor. Growth is an under-discussed topic for fraud fighting teams because, well, they fight fraud. They don't focus on growth. But growth is obviously a goal for the broader company, and ML can help fraud fighting teams be more aligned in helping the company achieve it.

Adopt technology when it has an edge vs humans


When we talk about Sift's ML model collecting data points, it's good to understand the scale of it. Sift's ML model collects 16,000 data points on each user that comes through a site. No human could accurately process and understand all the fraud patterns within those 16,000 data points. It would be too overwhelming. A human would just resort to looking at a few signals they've relied on in the past - kind of like rules. But here's where computers beat us. They have the capacity (compute power) to ingest all 16,000 data points and understand which ones are indicative of a fraudulent user vs. a legitimate one. They can see trends in fraudulent activity that we can't, just like an Excel formula can sum numbers and do VLOOKUPs better than I can in my head.

Sift uses two models. The one you have to train is a custom model. You train it because each business faces its own unique flavor of fraud; by training the model, it can understand the fraud patterns unique to your business. But there's also a second model that delivers value on day 1, without any training: Sift's global model. You often hear about network effects in the consumer social world. Facebook gets more valuable to users with each user it adds, which makes it more compelling for new users to join. That virtuous cycle also applies to Sift. Here, you benefit from the fraud learnings gleaned across all of Sift's other customers. When Sift adds a customer, it gets more data on what fraud looks like, which means more fraud learnings for every customer, making it more compelling for existing customers to keep using it & new customers to consider it.

Some of the companies whose fraud learnings you can access by using Sift's global model.

This data-heavy approach makes your fraud fighting more proactive. Let's say you ban (create a rule against) a certain IP after you see a fraudulent order come from it. Not a problem for the fraudster. They'll simply use a VPN that generates a new, random IP address. You'll be creating rules over and over again - and that's just for one fraudster. Instead, you want a fraud solution that can take in vast amounts of data and understand the patterns in it, so that when fraudsters eventually change their tactics, your fraud solution keeps up. A big piece of this is the combination of the global network and online learning.

Let's say Sift's customer DoorDash marks something as fraud. Sift doesn't wait a month to act on this information and improve their fraud models. They engage in online learning, where they automatically update their global model with this fraud learning. That means after a fraudster is successfully flagged in Sift's network, within seconds, that learning is shared with every Sift customer. So if that fraudster tries to defraud any other Sift customer even seconds later, they'll be stopped in their tracks by the global model.
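In code, online learning just means folding each new label into the model as it arrives, instead of retraining from scratch weeks later. A minimal sketch of the pattern using scikit-learn's partial_fit - an analogy for what's described above, not Sift's actual pipeline:

```python
# Online learning: update the model incrementally with each new fraud label,
# rather than retraining on the full dataset weeks later.
# Features/labels are randomly generated stand-ins, not real fraud data.
import numpy as np
from sklearn.linear_model import SGDClassifier

model = SGDClassifier()

# Initial training on historical labeled events
X_history = np.random.rand(1000, 5)         # 5 made-up features per event
y_history = np.random.randint(0, 2, 1000)   # 1 = fraud, 0 = legitimate
model.partial_fit(X_history, y_history, classes=[0, 1])

# A customer marks a new event as fraud -> fold it in within seconds,
# and everyone scoring against this shared model benefits immediately
new_event = np.random.rand(1, 5)
model.partial_fit(new_event, [1])
```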

Automate away


As a business grows, a typical fraud team tries to keep up by hiring more people. But that can mean long wait times for consumers to have their orders fulfilled (bad customer experience), having to hire and train more team members (complex), and having the team review legitimate orders that were flagged by a rule, like a drunk shopper shopping at 3 AM (inefficient use of the fraud team). Instead, you can automate. Here's how.

Similar to the lion and tiger example, you'll need to train your custom ML model to understand your fraud patterns. As the custom model collects data on each user, you'll tell the model, "this user is fraudulent" or "this user isn't fraudulent". After you've labeled ~50 fraudulent transactions, the model will be able to properly calculate how likely each signal is to indicate fraud. The model combines all of this information into a score. You can feel confident this score is accurate because it's a much more comprehensive way to understand users and the risk they pose. Given this confidence, you can now automate. For example, auto-accept any order with a fraud score of 0 - 60, manually review anything from 60 - 90, and auto-block anything above 90.
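That last step is just thresholding the score. A sketch using the example ranges above (which are illustrations, not recommended settings):

```python
# Map a fraud score (0-100) to an automated business action.
# Cutoffs use the example ranges from the text - tune them to your own
# risk tolerance and growth goals, as discussed below.

ACCEPT_BELOW = 60
BLOCK_ABOVE = 90

def decide(score: float) -> str:
    if score < ACCEPT_BELOW:
        return "auto-accept"
    if score > BLOCK_ABOVE:
        return "auto-block"
    return "manual-review"

for score in (12, 70, 95):
    print(score, "->", decide(score))   # auto-accept, manual-review, auto-block
```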

The number ranges you choose are based on your business objectives. Focused more on revenue growth? Raise the score at which you block orders, so you block fewer. Want to bring your fraud rate down? Lower the score at which you auto-accept orders, so you accept fewer. Don't want to grow your fraud team as your revenue grows? Narrow the score range in which you review orders, so your team reviews fewer.

Now that your fraud solution is more accurate from both sides, your fraud team won't have to review orders that are obviously fraud or obviously legitimate. You're auto-accepting or auto-blocking them. Your business does a better job protecting itself against fraud and building trust with legitimate users, and your fraud team has fewer orders to review. While that's great for efficiency, it's also a better customer experience. The fewer orders stuck in a holding period, the faster customers get their products - the experience internet consumers demand.

When a user lands in the manual review queue now, maybe they have some fraudulent signals and some legitimate ones, giving them a fraud score of 70. Your trained human fraud fighter can do extra digging to clear up the ambiguity. The fraud fighter can navigate this data through Sift's console, understanding some of the reasons the ML model thought this user might be fraud. When the fraud fighter finishes their investigation, they label the user as fraud or not. The ML model learns from that label, improving its accuracy - which means more automation.

Sift's console makes it easy to quickly manually review orders.

Want to compete with the FANGs of the world? (or whatever the hell the acronym is now)

Facebook doesn't need Sift's technology. In addition to payment fraud, Facebook faces fraudsters creating fake accounts that post scam/spam content. In the last quarter of 2018, they took down 116M accounts. They have hundreds of people working on anti-fraud machine learning models. So do the other tech monopolies like Google. In fact, Google has done a magnificent job of combatting email spam on Gmail. There are 347 billion emails sent per day, and some estimates have spam making up half of that. Opening your email inbox every morning is already awful enough. Could you imagine if half of your inbox was spam? Well, Gmail uses machine learning to stop 99.9% of spam emails from ever entering your inbox.

But who's protecting the GDP of the rest of the internet and enabling trust with legitimate users? Most businesses don't have the same resources at their disposal as the largest tech companies. As an example, in 2016, Google said they had 27K employees in research and development (product & engineering departments of the org). 27,000!

You need to partner with a company whose sole focus is using machine learning to help you protect and grow. A company that stops fraud on 34,000 sites, analyzes 70 billion events per month, and assessed risk on $250 billion in transactions in 2020. That company is Sift.

You did it!

Thank you for reading my ridiculously long post.

If you enjoyed it, I'd appreciate you joining my email list. I share a new company deep dive ~once a month.

And if you're looking to do a good deed today, like or retweet the thread on Twitter (linked below). It would mean a lot!



Thank you to my wonderful girlfriend Rishika for editing, and my good friend Nigel for his feedback.

Ready to get your learning on?

Get started

Sift: Protecting the GDP of the internet

For any new technology platform, infrastructure is a critical piece to make that new technology usable. Highways (infrastructure) made cars (technology) much more useful. One piece of internet infrastructure that every tech fanboy (like myself) knows about is Stripe. Before Stripe, accepting payments as an online business was a complex & long process. Stripe made it quick & easy for online businesses to accept payments. And the market has rewarded them with the crown of most valuable US-based startup at $95 billion. Put simply, their mission is to increase the GDP of the internet. But we aren't here to talk about who is increasing the GDP of the internet. Rather, I want to talk about a question that does not get enough attention:

Who protects the GDP of the internet?


Another critical barrier for any new technology platform is trust. Today, we trust the internet services we use with our money, communication, really our entire lives, so it may be difficult for us to understand how did we get to trusting these internet services. That's why it's useful to look at crypto. A new technology platform is being built before our eyes, and if you know anything about it, plenty of people don't trust it.



Notice a common theme in the pictures above? Crypto exchanges, where you can buy and sell cryptocurrencies, kept getting hacked. That created a gap for a crypto exchange to spur widespread adoption. Be a crypto exchange where fraud doesn't run rampant. A place where users could trust to trade crypto. And that's exactly what Coinbase, the first major crypto company to go public for $65 billion, did.

Coinbase co-founder Fred Ehrsam spoke on trust when Coinbase went public:

It was very important to lean into having high integrity as a brand for both customers and regulators. We made a critical decision very early when the FinCEN guidance, which is a subset of Treasury, came out. The first crypto regulatory guidance ever in March 2013. I remember calling our lawyer that day and asking him, "hey, this crypto regulatory guidance came out. What should we do?" And his response was, "you guys are a really small startup and this is going to be super burdensome, it's going to cost you a ton of money. You should ignore it for as long as possible." So I sort of said, "ok". Hung up the phone. Later that night at dinner, Brian (Coinbase co-founder) and I talked about it. We decided to call the guy back the next day and fire him. And basically go the exact opposite direction. And I think that is the bedrock on which a lot of the white knight brand of Coinbase is built today.

So who helped Coinbase gain the trust of millions of users to use its product? Who is protecting the GDP of Coinbase? Some google searching reveals in Coinbase's Privacy Policy that a company called Sift (formerly Sift Science) plays a key role.


This need to prevent fraud isn't unique to crypto. As mentioned earlier, every technology platform needs it. If online businesses don't, they risk users complaining about their company on social media, losing users all together, and worst of all, not being able to accept payments. Or in Sift's terms, not protecting the GDP of the internet.

Before we explore how Sift is helping online businesses prevent fraud, let's understand how much of a problem online fraud has become.

Who's winning losing? Offline or online fraud?

Recently, there's been a change in the payment fraud world. I'll let you try to decipher the not-so-nicely designed chart from the Federal Reserve below.


To clarify the picture above, in-person card payments are when you're in a store and purchase something, often referred to as card-present (CP). Remote card payments are transactions you make on the internet, known as card-not-present (CNP).

2016 marked the first year in the US that online fraud surpassed offline fraud. You may be concluding, "So? Everyone shops online now! Obviously fraud would be higher online." Hold your horses, buddy, I got another stat for ya..

In 2016, eCommerce only made up 8.2% of all retail sales in the US. Yet, it made up 61% of credit card fraud. 🤯

And it wasn't just that online fraud grew faster than offline fraud, but also the fact that offline fraud dropped. The cause of this drop is the introduction of EMV cards. You know, that credit card you got a few years ago with a chip in it. The one that made you insert your card in the credit card reader, awkwardly wait for it to process, accidentally pull it out too early, further increasing the awkwardness. I won't go into the technology behind EMV cards, but EMV cards weren't built for speed, they were built to improve security. And that's why offline fraud dropped.

Which leads to the fraudsters next prey. Fraudsters gotta eat too. And here are the reasons they are going to the internet to feast.

Internet fraud - more scale & specialization, less friction


Fraudsters act like how I wish I organized my task list - high impact, low effort.  They see where they can make the most money, for the lowest effort. They're looking for where the lowest hanging fruit fraud is. The scalability of internet fraud makes it high impact.

🌏 Mr. Worldwide (web)!

Pitbull is an artist you hate to love. Cheesy when sober, so good when drunk. When Pitbull came onto the scene, he was called Mr. 305 because that's the area code of his hometown, Miami-Dade County. But when Pitbull elevated from being a star in his local town to being a star all around the world, he upgraded his nickname - Mr. Worldwide! The same potential came true for businesses when the internet was born.

Take for example, an instructor selling a course in their local town vs. the internet. Let's say you want to teach people English in an American town that has high concentrations of non-English speakers, great! You have plenty of potential customers. But what if your skill is more niche? Like piloting a drone, or learning to code. In either situation, you limit growth because your total addressable market is confined to your local town. With the internet, that changes. Teachable is a place where you can sell your online course, helping you reach every potential customer, no matter where they are in the world.  17 Teachable instructors earned over $1M in 2019 teaching things like astrology/metaphysics, film making, and..making money online (how meta).


Big companies (eventually) understood that being online could drive revenue growth. So when big companies rushed to get online in the mid 2000's, they didn't understand the vulnerabilities in the systems they were building. One hacker, Albert Gonzalez, found he could quite easily access customer credit card information. After he was caught, it was found that he stole info on 170 million credit & debit cards. And that's only the ones the government discovered. One hacker said, "there were major chains and big hacks that would dwarf TJX. I'm just waiting fo them to indict us for the rest of them."

Since then, breaches have gotten only worse. I mentioned in my article about Tonic.ai, that 2020 was another record year for data breaches. Hackers like Gonzalez can then take these stolen cards to the dark web, where forums like Shadowcrew exist. Shadowcrew gave a place for fraudsters to anonymously sell the identities and card numbers they stole to fraudsters who wanted to use them. Think of it like eBay for fraudsters. But, how could you trust if what someone is selling is legit or not? After all, they are fraudsters. So Shadowcrew moderators became the middleman that every transaction went through. If you got ripped off, the moderator would reimburse you or find you a replacement cards/identities. Huh, It turns out even fraudsters need fraud prevention.

Contrast the sophistication of the online fraud scene to the constraints of most offline fraudsters. As an offline fraudster, you were confined to your local town when stealing credit cards. In what local town are fraudsters going to be able to steal 170 million credit cards? And then how the hell do you find people to sell those stolen credit cards? It's not quite the same as a teenager pulling a "hey mister" and asking someone to buy beer for them.

A real economy developed in the online fraud world, with specific players specializing in what function they could perform best (I know my Economist professor would love that line). Another advantage of online fraud was being able to combine the scalability of sensitive data available online with scalable labor.

🤖 An army of robots is already here

Hollywood loves scaring us with a time in the future when robots run the world - I Robot, Eagle Eye, Smart House. That's the blue pill. The red pill is that the robots are already here.



I know. I'm basic for quoting Naval. What Naval is referring to here is what software engineers do (and what my Indian dad wishes I did). Through code, you can program computers to do repetitive tasks for you. It's like the difference between (Iron Man 3 spoiler alert!) being Iron Man vs having a bunch of Iron Man drones. At first, killing all the bad guys, doing photo-ops, looking like a badass is all fun and games. But like any repetitive task, it gets boring. So you create a drone version of your suit to handle the repetitive task of greeting your girlfriend when she comes home from work (I really hope my gf doesn't kill me for that line).

Robots that run repetitive, automated processes are rampant in the digital world. More than half of Internet traffic is bots scanning content, interacting with webpages, chatting with users, or looking for attack targets. Bots aren't only good for automating repetitive tasks, they also do it in a scalable way. Imagine if bots didn't make up half of internet traffic, and instead we needed humans to do that. Recruiting, training, and keeping that many individuals happy would be far more difficult and uneconomical compared to writing computer code once for the initial bot and scaling to as many bots as you'd like.

Fraudsters use bots for these same reasons to carry out their attacks. You know how we talked about massive data breaches earlier? That data is sold on the dark web by fraudsters who not only gained access to stolen credit cards, but also to login credentials.

👶👶👶DUMB DOWN TIME
👶👶👶

Logins aren't just logins. Logins are the keys we use to access our digital homes that contain our valuables. Today, we value our digital property just as much as our physical (my cousin Shaan has a great twitter thread on this). You have stored funds and payment methods on sites like Venmo, Robinhood, Airbnb, etc. Fraudsters noticed too. Getting access to someone's login credentials and using it for malicious reasons is a newer, faster-growing form of fraud, compared to the more traditional payment fraud we've been talking about so far. It's called account takeover. Fraudsters are exploiting it because businesses don't realize just how big of a problem it's going to become, and thus dedicate less resources to stopping it.

To commit account takeover, one technique fraudsters employ is using bots, called credential stuffing. Let's say you get login credentials from a breach of a major department store. You're not going to use those login credentials to try to just login to that major department store's eCommerce site. You're going to try it on tons of sites. The reason you can try someone's login on a plethora of sites is because it's estimated that as high as 85% of users reuse the same login credentials for multiple services. (BTW, don't be that person. please grow the f up and use some free service like LastPass). The problem fraudsters encounter is with success rate.

Only about 0.1% of the time a fraudster tries this method do they actually get access to an account. Fraudsters aren't going to sit there and type in logins one-by-one for only 1 in every 1,000 to work, especially when they have access to millions of credentials. They want high impact, for low effort. They're going to write a coding program that will make bots do it for them. So if they have access to 1 million login credentials and successfully login into 0.1% of them, they've gained access to 1,000 accounts, on one site! They can then scale these efforts over plenty of other sites on the web.

This combination of having access to millions of login credentials and being able to automate the process of hacking into them with bots makes online fraud a scalable pursuit.

🛍 Friction free leads to a fraud spree

When you look at online businesses, they have reduced friction in our lives. No longer do I need to commute to the grocery store, walk around searching for what I need, get hit by some 5-year-old driving a cart, wait in a long line, and carry heavy bags home. I can type grocery items in a search bar, click a button, and have them delivered to my house (please catch up Trader Joes). This phenomenon isn't unique to the grocery industry. Making commerce more convenient through the internet has been replicated across every industry, leading to the consumer expectation that internet experiences are fast. The internet's speed has reduced friction for all users, not just legitimate ones, which has had the opposite effect of EMV cards. The friction and security EMV cards introduced made offline fraud difficult. The speed and convenience the internet created made online fraud easy. To understand better, let's compare some of the risks & dynamics fraudsters face in real life (IRL) vs online.

IRL fraud - concrete identity & high stake consequences


One great IRL fraud scheme back in the swiping card days was getting access to stolen debit cards, going to ATM's, and extracting funds from bank accounts. The one problem is hiding your identity. In this scenario, you can't pull the ski-mask to hide your face trick (or at least in pre-COVID times) because that's fucking sketchy. Your face is visible and clear for that ATM camera. So you decide to wear a disguise to make the face caught on camera look nothing like you. But then you have to consider the second type of risk, which the hacker Anthony Gonzalez mentioned before, encountered. He didn't go to that ATM to extract funds from one stolen debit card, he did it for a fuck ton. So he was at the ATM raking in stacks of cash in the middle of the night, but it took him like 10 - 20 minutes to do it. Who's at an ATM for 20 minutes? Sketch alert! Some cops noticed Gonzalez doing this, foiled his master plan, and arrested him. The risk of what happens if you get caught for IRL fraud is high. You go to fucking jail!

Online fraud - fluid identity & lower stake consequences


Let's say an online fraudster gets caught by an online business' anti-fraud system. What happens? Does the company report that online fraudster to the authorities who then easily identify and arrest that person? No, the most common outcome is that fraudster gets one of their internet identity signals banned from using the service again. Does this mean the fraudster can't commit fraud again on that site? Nope! Manipulating your internet identity signals can be quite simple and cheap. Below, we'll talk about how a fraudster responds when one of their identifiers gets banned.

Our digital form of an address is called an IP address. But fraudsters use what are called Virtual Private Networks (VPNs) that only cost a few bucks per month. VPNs hide your real IP address and give you a randomized, often-changing one. So when a fraudster gets caught and the IP address they were using gets banned, it ain't no thang, you can change your IP address easily with a VPN!

A device ID is another form of identity on the internet. The easiest way to understand it: say you log in to Twitter on your computer and check the "Remember Me" box. The next time you go to Twitter on that same computer, you don't have to log in again, because Twitter recognizes that device ID. But if you try to log in to Twitter from a new, different device, like your iPhone, Twitter will see the iPhone as a new device ID, which is why you need to log in again. The specific make, model, operating system, etc. make a device ID unique. So if a fraudster gets caught committing fraud on a device and the online business bans the device, it doesn't mean shit to the fraudster. Fraudsters can change device IDs by updating their operating system or changing a setting in their browser, for example. The half-life of a device fingerprint can be shorter than one month. Additionally, fraudsters aren't just selling credit cards and login credentials on the dark web; they can also sell the device IDs of legitimate users. If a company has seen successful transactions from a device ID, it may mark that device as legitimate, which gives fraudsters another way to sneak past anti-fraud systems.
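To make the fragility concrete, here's a minimal sketch of how a naive device fingerprint could work - hash a handful of device attributes together. This isn't any vendor's actual algorithm, and the attributes are made up. Change any one input and a "new device" is born:

```python
import hashlib

def device_fingerprint(attributes: dict) -> str:
    # Concatenate the attributes in a stable order, then hash them.
    canonical = "|".join(f"{k}={attributes[k]}" for k in sorted(attributes))
    return hashlib.sha256(canonical.encode()).hexdigest()[:16]

before = device_fingerprint({
    "os": "macOS 11.2", "browser": "Chrome 89",
    "screen": "2560x1600", "timezone": "America/Chicago",
})
# The fraudster updates their operating system...
after = device_fingerprint({
    "os": "macOS 11.3", "browser": "Chrome 89",
    "screen": "2560x1600", "timezone": "America/Chicago",
})
print(before == after)  # False: the banned device ID no longer matches
```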

The last common form of identifier is your account. But come on, how legit is an account as an identifier? How much does an account really know about who I am? In most cases, all they want is an email address, which I can quickly create...for free. I mean, have you ever heard of a finsta, you boomer? "Fake" accounts are exactly how fraudsters transact with each other on the dark web. No longer do you have to reveal your name and face to other fraudsters to commit a crime with them.

So not only are the consequences of getting caught low risk, but you can also respawn like it's Call of Duty. That risk/reward calculation has led fraudsters to flock to the possibilities on the internet.

So how have businesses attempted to protect their services from online fraudsters? Let's explore.

The business version of whack-a-mole


Traditionally, businesses have approached the issue of fraud with what I call the whack-a-mole technique, but more professionally, it's known as rules. Fraud rules are simple: if a user fits x criteria we've deemed suspicious, take y business action. Some rules are obvious to create, such as blocking all orders from North Korea, or reviewing orders over $1,000 when the average customer purchase is only $50. But many rules are created after fraud has occurred, and that's the whack-a-mole approach. Usually businesses find out about fraud 2 - 4 weeks after it's occurred. How? Through a chargeback.
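Before we get to chargebacks, here's what a rules-based system boils down to in code - a minimal sketch in Python, with hypothetical criteria and thresholds:

```python
# Each rule is a (criteria, action) pair, checked in order.
RULES = [
    (lambda order: order["country"] == "KP", "block"),   # North Korea
    (lambda order: order["amount"] > 1000,   "review"),  # big-ticket order
]

def apply_rules(order: dict) -> str:
    for criteria, action in RULES:
        if criteria(order):
            return action
    return "accept"  # no rule matched

print(apply_rules({"country": "US", "amount": 1200}))  # review
print(apply_rules({"country": "US", "amount": 45}))    # accept
```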

👶👶👶DUMB DOWN TIME👶👶👶

A chargeback is the legitimate cardholder saying something like, "What the hell are these charges at Chicago parking meters and hotels?! The last time I went to Chicago was years ago for an overpriced, overcrowded NYE party. I'm going to call the bank and file a chargeback." Your bank rep will say, "Ah yes sir, of course. We will have these charges removed right away, cancel your current card, and send you a brand spanking new one." You get your money back because you are the bank's customer; the bank is incentivized to keep you happy, not the business that let a fraudulent transaction slip past its anti-fraud system. A chargeback forces the business to fully refund the customer and pay a fee ranging from $20 - $40. Additional costs for the business include lost inventory, any operations associated with fulfilling the order, and time spent dealing with the chargeback.

Businesses want to avoid chargebacks for all the reasons above, but the biggest potential threat is losing the ability to accept payments. If a business exceeds chargeback thresholds (~1% of transactions being chargebacks), the card networks that help businesses accept payments (Visa, Mastercard, Discover) can remove the business from their network, which means your card won't work on that site. So for instance, if you were removed from Visa's network, you would lose access to roughly half of all credit card purchase volume. In layman's terms, you would be fucked.
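Adding up the figures above, here's a quick sketch of what one fraudulent order really costs (the order value and operational costs are hypothetical):

```python
# The full cost of a single fraudulent $100 order that triggers a chargeback.
order_refund   = 100.00  # full refund to the legitimate cardholder
chargeback_fee = 30.00   # typically $20 - $40
lost_inventory = 100.00  # goods already shipped to the fraudster (hypothetical)
ops_time       = 25.00   # fulfillment + dispute handling (hypothetical)
print(f"Total: ${order_refund + chargeback_fee + lost_inventory + ops_time:.2f}")

# And the existential risk: are you over the ~1% chargeback threshold?
transactions, chargebacks = 50_000, 600
print(f"Chargeback rate: {chargebacks / transactions:.2%}")  # 1.20% -- trouble
```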

So how does this business respond to that chargeback to prevent future ones?


Have you heard of Tom & Jerry? If not, you're probably way younger than me, and I have a shit ton of respect that you're reading a 3,000+ word essay on online fucking fraud. Tom & Jerry is about a cat's never-ending quest to catch a mouse. The cat makes a fool of himself chasing the mouse, causing collateral damage along the way. The mouse is always coming up with ways to outfox the cat, constantly adapting its behavior after being briefly captured. That, my friend, is the game of online fraud.

After businesses become aware of fraud, they put their fraud fighting teams on the case. Let's say a fraudster has their order blocked by the $1,000 rule we just talked about. The fraudster realizes that maybe making a $1,000 order isn't the best way to blend in and seem like a legitimate user. So instead, they make 12 smaller orders from different accounts, and those slip past the business's fraud detection system. 2 - 4 weeks later, the business will receive notice of the fraud and have to figure out which signals or characteristics of that user were indicative of fraud. Maybe they'll block the fraudster's IP address and device from using their service again. Maybe the order happened at 3 AM, so instead of instantly fulfilling orders at such odd hours, they'll block them, or if they have the capacity, review them before processing. Will that stop the fraudster? You know better. This is Tom and Jerry, a game of cat and mouse.

Online businesses try hard to catch fraud, but in the process create many additional problems they have to deal with. When they do catch the fraudsters, it's a temporary band-aid. As we just learned, when a fraudster gets caught, they don't call it quits; they change and adapt. So here are the issues created when a business uses a rules-based fraud detection system.

💪 Fraudsters are playing survival of the fittest

As mentioned before, if a fraudster sees their $1,000 order blocked, maybe they'll break the purchase into multiple smaller orders. Their job is to poke holes in systems and find weaknesses. They need to find ways to adapt, or else they risk not being able to make a living. They'd have to become a normal person and get a 9-5 job with a boss. Yuck. Now that's a fraudster's worst nightmare.

As an example, companies have improved their ability to detect credit card fraud, so fraudsters have turned to alternative payment methods (cryptocurrencies, gift cards, in-app credits) as new attack vectors. Yet only 26% of businesses believe they are very effective at preventing fraud from non-credit-card payment sources. This highlights that merchants aren't keeping pace with the rate at which fraudsters adapt.

🤬 Rules are reactive

When you implement a rule, it's after fraud has already been committed. And potentially lots of it. When a fraudster finds a hole in a system, they go hard at it. Since it takes a business 2 - 4 weeks to receive a chargeback and investigate it, that leaves a large window for the hole to stay exposed. So fraudsters continue to attack it and share it with their online fraud buds through forums on the dark web.

But like we already learned, fraudsters adapt. You'll continue to be hit with fraud, just in new and unique ways that you'll have to discover and create rules for. So you're constantly in reaction mode, never able to proactively prevent fraud before it happens. That's just poor business operations.

💰 Throw bodies at the problem

When businesses face problems, they throw people at them. That's how online fraud has been fought historically: a combination of rules and people. In addition to rules that stop fraudulent orders, companies will manually review potentially fraudulent ones they aren't sure about. These orders sit in a queue, where they'll be accepted or rejected after a manual reviewer looks them over. Here are three difficulties we mere mortals face when trying to manually review online orders to combat fraud:

  1. We expect everything to be fast online. Thanks to Amazon, when we place an order on an eCommerce site, we now expect it to be shipped the next day, if not faster. When we place a food delivery order on DoorDash, our hangry meter will accept no delivery longer than one hour. So when you consider the scale of orders an online business can receive, potentially fraudulent orders can pour into your manual review queue at a dizzying pace. How can you handle the scale of thoroughly reviewing these orders while maintaining that low-friction, high-quality experience (fast delivery) that consumers have come to expect on the internet?
  2. As online companies scale their revenues, they also have to scale their manual review team to handle the increase in potentially fraudulent cases. And if you know anything about scaling companies, you can never hire fast enough. Finding high-quality talent, building training programs that quickly onboard new teammates, and retaining, managing, and keeping employees happy is a lot. For some parts of your business, this is a necessary undertaking, specifically the ones where you have a core competency and are trying to differentiate. But other parts of your business are non-core, repeatable problems every business faces. Those are opportunities to outsource or automate.
  3. The fundamental problem isn't manual review. It's what causes excessive manual review - inaccurate rules. Fraud teams have to review lots of orders that are obviously fraud, or obviously legitimate. That's inefficient. You want your team to only review orders that are more nuanced and require a human eye. The inaccuracy of rules is an opportunity to use technology that is more accurate, reducing the amount of manual review required.

😱 Limit revenue growth (VCs/startups gasping)

Rules treat the world as black and white. If an order meets this specific criteria, block it, because it must be fraud. But as you grow older, you realize it's grey (or maybe that's just my gf pointing out my 7 grey hairs?). Here are examples of rules that seem to catch fraudulent orders, but would also block legitimate ones:

  1. Get as stereotypical as you can for a second. What do you think of when you think of a hacker? Probably someone in a dark room in the middle of the night hacking away. The fact that most fraud occurs at 3 AM totally makes sense; fraudsters don't have bosses! They work weird hours. But, wait a second, ever heard of drunk shopping? In one survey, 52% of people who shopped online while drunk did it late into the night. So keep in mind: in addition to all that late-night fraud you catch, you'll also block some drunk purchases that may not have happened otherwise.
  2. Fraudsters don't browse around like normal shoppers. They know exactly what they want to defraud and they do it immediately. That's why it's been found that orders placed within 4 minutes of the user account being created are 35x riskier. But what if I, Sameer Jauhar, fellow legitimate user, was browsing a site on my phone, then decided to go home, quickly create an account, and make the purchase on my laptop? Or what if I heard about a product on some podcast I trust and was ready to buy immediately? If my first experience as a consumer is getting blocked, I'm unlikely to return (see the sketch after this list).
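Here's that second rule in code - a minimal sketch showing how a hard account-age cutoff catches the fraudster and the eager podcast listener alike (names and timestamps are made up):

```python
from datetime import datetime, timedelta

def account_age_rule(created: datetime, ordered: datetime) -> str:
    # Block any order placed within 4 minutes of account creation.
    return "block" if ordered - created < timedelta(minutes=4) else "accept"

order_time = datetime(2021, 6, 1, 21, 0)

fraudster_signup   = order_time - timedelta(minutes=2)  # account made to defraud
podcast_fan_signup = order_time - timedelta(minutes=3)  # heard an ad, ready to buy

print(account_age_rule(fraudster_signup, order_time))    # block -- good catch
print(account_age_rule(podcast_fan_signup, order_time))  # block -- lost a real customer
```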

Businesses that primarily sell online tend to be younger, meaning they're more likely to be startups. And if there's one thing I know about startups, it's that they fucking love growth. They hate anything that gets in the way of it. I know one eCommerce founder who didn't care if the CVV a user inputted while placing an order didn't match the one on the credit card. She was more worried about blocking orders from legitimate users who mistyped the CVV than about stopping fraudulent orders from fraudsters who didn't have it. I love that founder's uninhibited focus on one metric.

Startups have good reason to be worried about this risk. In a recent survey, 36% of respondents had tried to make a purchase only to have the transaction falsely declined. 25% of the consumers who had an order incorrectly flagged as fraudulent ended up buying from a competitor. The consumer went elsewhere because that key currency of trust was missing between the customer and the original brand.

So not only do rules struggle to stop fraud, they're also too rudimentary to identify legitimate orders. From both sides, rules just aren't accurate enough.

"Ok, ok, ok we get it. You think rules suck. Do you have a better solution?"

Glad you asked.

An automated, scalable, and customer friendly approach to fraud


All of the issues described above made it possible for a new anti-fraud company to emerge. One that proactively adapts to changing fraud patterns, holistically assesses users, utilizes automation, and doesn't limit revenue growth. One that protects businesses from fraudsters and helps them build trust with their legitimate users.

This company had to be built different. It couldn't utilize the technology of the past. That company is Sift. And they use machine learning (ML) to address the shortcomings of rules.

👶👶👶DUMB DOWN TIME👶👶👶

Think of ML like our brain. More specifically, let's use a baby's brain as an example. When babies are young, they don't know a damn thing. So we have to teach them! Take animals as an example. We show them animals and tell them the names of those animals. We give them the answer today, and they can tell us the answer tomorrow. In their brain, they're learning all the little details associated with each animal. So going forward, you don't need to give them the answer; they can predict the right answer by observing the little details about the animals. That's how ML works.

At first, when an ML model can't predict anything, you have to train it. You start by feeding it two parts of data. One is the input data; in this example, the picture of a tiger or lion, from which the ML model can analyze every little detail (the inputs). And then you tell it which animal it is (the output). During that process, the ML model will learn that inputs like sharp fangs aren't a good indicator of which animal it is, because both tigers and lions have them. But the model will also learn inputs that are clear differentiators. It will notice that 100% of the time an animal has bold, black stripes, it's a tiger. And finally, the model will learn inputs that are potential indicators. Generally, tigers are longer, more muscular, and heavier. If the model only notices one of these traits, it might not be so sure whether it's a tiger or a lion. But if the model observes an animal with all of these traits, it will be fairly confident it's a tiger. Once you've trained the ML model, it can predict from a picture whether it's a lion or tiger, without you giving it the answer. Each input is assigned a weight that determines how important it is in predicting the output. The model combines all of these inputs and their weights to come up with a prediction (e.g. the model is 94% sure this is a tiger).
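If you want to see this in code, here's the tiger-vs-lion model as a tiny (toy) logistic regression in Python. The measurements are made up, but the mechanics - inputs, labels, learned weights, a probability out the other end - are exactly what the paragraph above describes:

```python
from sklearn.linear_model import LogisticRegression

# Inputs (features): [has_bold_stripes, body_length_m, weight_kg]
# Output (label):    1 = tiger, 0 = lion. Data is illustrative only.
X = [
    [1, 3.1, 220], [1, 2.9, 200], [1, 3.3, 260], [1, 3.0, 230],  # tigers
    [0, 2.5, 180], [0, 2.7, 190], [0, 2.4, 170], [0, 2.6, 185],  # lions
]
y = [1, 1, 1, 1, 0, 0, 0, 0]

# Training: the model learns a weight for each input.
model = LogisticRegression(max_iter=1000).fit(X, y)

# Prediction: an animal it has never seen, with no answer given.
mystery_animal = [[1, 3.2, 240]]  # bold stripes, long, heavy
confidence = model.predict_proba(mystery_animal)[0][1]
print(f"The model is {confidence:.0%} sure this is a tiger")
```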

"Why the fuck would an ML system predict lions vs tigers?"


Listen, it's a silly, simple example! I was trying to make it easy for you. Sue me! But since you're such a smart cookie, let's go deeper into how ML can actually be useful.

To understand how ML changes the equation, let's look at how an ideal fraud fighting team would operate.

Gain a more ~holistic~ understanding


Sorry, my yoga instructors and their mini speeches on living a more holistic life seem to be slipping into this post. When fraud fighters see a signal as fraudulent (e.g. an order placed at 3 AM), they might think placing a rule there will stop fraud. It may, but it'll also stop some legitimate orders, like a drunk online shopper. And that's a fundamental issue with rules, especially for startups that need to grow fast and build trust with consumers. Rules see a single signal as the determiner of fraud. "Since we're seeing a lot of fraud from orders at 3 AM, all orders then must be fraud!" But no single signal is a silver bullet for stopping fraud. Rather, you want an ML model that observes the entire user journey - from start to finish on your site, collecting every data point along the way.

The reason you want to collect every data point is that it gives you a more holistic understanding of the risk a user poses. As a fraudster, you're no longer just evading the specific fraud signals you've learned to get around; you're being evaluated on every action you take. You have to act like a legitimate user, which adds friction to committing fraud. That's frustrating for a fraudster. Fraudsters go where the lowest-hanging fruit is, which means they'll go elsewhere.
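What does "the entire user journey" look like as data? Here's a toy sketch: fold a session's raw event log into a feature vector a model can score. The event names and features are hypothetical, not Sift's actual schema:

```python
# A session's raw event log, from landing to checkout.
session = [
    {"type": "page_view",   "page": "/login",       "t": 0},
    {"type": "login",       "attempts": 4,          "t": 5},
    {"type": "page_view",   "page": "/gift-cards",  "t": 9},
    {"type": "add_to_cart", "amount": 500,          "t": 12},
    {"type": "checkout",    "amount": 500,          "t": 15},
]

def journey_features(events):
    # Summarize the whole journey, not one signal.
    return {
        "session_seconds": events[-1]["t"] - events[0]["t"],
        "login_attempts":  sum(e.get("attempts", 0) for e in events),
        "pages_viewed":    sum(1 for e in events if e["type"] == "page_view"),
        "cart_value":      sum(e.get("amount", 0) for e in events
                               if e["type"] == "add_to_cart"),
    }

print(journey_features(session))
# {'session_seconds': 15, 'login_attempts': 4, 'pages_viewed': 2, 'cart_value': 500}
```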

But ML is also important for the other set of users you have. You know, the ones who make up all your revenue and help justify your batshit startup valuation to venture capitalists? You shouldn't ruin the customer experience for all users because there are a few bad apples. That just erodes trust. Instead, by getting a more holistic understanding of a user, you can also make sure legitimate users aren't flagged as fraud and don't end up buying from your competitor. Growth is an under-discussed topic for fraud fighting teams because, well, they fight fraud. They don't focus on growth. But obviously growth is a goal for the company more broadly. ML can help fraud fighting teams be more aligned in helping their company achieve its goals.

Adopt technology when it has an edge vs humans


When we talk about Sift's ML model collecting data points, it's good to understand the scale of it. Sift's ML model collects 16,000 data points on each user that comes through a site. No human could ever accurately process and understand all the fraud patterns within those 16,000 data points. It would be too overwhelming. A human would resort to looking at the few signals they've relied on in the past, kind of like rules. But here's where computers beat us. They have the capacity (compute power) to ingest all 16,000 data points and understand which ones are indicative of a fraudulent user vs a legitimate one. They can see trends in fraudulent activity that we can't. Just like Excel can sum numbers and run VLOOKUPs better than I can in my head.

There are two models Sift uses. The one you have to train is a custom model. You train that model because each business faces its own unique flavor of fraud. By training the model, it can understand the fraud patterns unique to your business. But there's also a second model that delivers value on day 1, without any training: Sift's global model. You often hear about network effects in the consumer social world. Facebook gets more valuable to users with each user it adds, which makes it more compelling for new users to join. That virtuous cycle also applies to Sift. Here you benefit from the fraud learnings that all of Sift's other customers have gleaned. When Sift adds a customer, it gets more data on what fraud looks like, which means more fraud learnings for customers, making it more compelling for existing customers to keep using it and for new customers to consider it.

Some of the companies whose fraud learnings you can access by using Sift's global model.
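Sift doesn't publish how the two models are combined under the hood, so treat this as a toy illustration of the idea: lean on the global model on day 1, then shift weight toward your custom model as your own labels accumulate. All function names and scores here are hypothetical:

```python
def global_model_score(user) -> float:
    """Stub: trained on fraud patterns from the entire customer network."""
    return 0.82  # the network has seen this fraud pattern before

def custom_model_score(user) -> float:
    """Stub: trained on your own labels, which are thin on day 1."""
    return 0.40

def blended_score(user, custom_weight: float = 0.3) -> float:
    # Weighted blend; custom_weight would grow as you label more data.
    return ((1 - custom_weight) * global_model_score(user)
            + custom_weight * custom_model_score(user))

print(f"Risk score: {blended_score(None) * 100:.0f} / 100")  # 69 / 100
```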

This data-heavy approach helps you become more proactive in your fraud fighting techniques. Let's say you ban (create a rule against) a certain IP after you see a fraudulent order come from it. Not a problem for the fraudster. They'll simply use a VPN that generates a new, random IP address for them. You'll be creating rules over and over again. And that's just for one fraudster. Instead, you want a fraud solution that can take in vast amounts of data and understand patterns in the data, so when fraudsters eventually change their tactics, your fraud solution keeps up. A big piece of this is the combination of the global network and online learning.

Let's say Sift's customer DoorDash marks something as fraud. Sift doesn't wait a month to act on this information and improve its fraud models. It engages in online learning, automatically updating the global model with this fraud learning. That means after a fraudster is successfully flagged in Sift's network, within seconds, that learning is shared with every Sift customer. So if that fraudster tries to defraud any other Sift customer even seconds later, they'll be stopped in their tracks by the global model.
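Here's online learning in miniature - not Sift's production system, just a sketch using scikit-learn's partial_fit, which updates a model incrementally the instant a new label arrives instead of waiting for a batch retrain. The features are hypothetical: [account_age_minutes, order_amount, uses_vpn]:

```python
import numpy as np
from sklearn.linear_model import SGDClassifier

model = SGDClassifier(loss="log_loss", random_state=42)

# Initial training on a small batch of labeled historical orders.
X = np.array([[3, 900, 1], [2, 750, 1], [4000, 40, 0], [9000, 65, 0]])
y = np.array([1, 1, 0, 0])  # 1 = fraud, 0 = legit
model.partial_fit(X, y, classes=[0, 1])

# A customer flags a brand-new fraud pattern...
model.partial_fit(np.array([[10, 60, 1]]), np.array([1]))

# ...and the very next scoring call already reflects that learning.
risk = model.predict_proba(np.array([[12, 55, 1]]))[0][1]
print(f"Fraud probability: {risk:.0%}")
```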

Automate away


As a business grows, a typical fraud team would respond by trying to hire more people. But that could mean long wait times for a consumer to have their order fulfilled (bad customer experience), having to hire and train more team members (complex), and having the team review legitimate orders that were flagged by a rule, like a drunk shopper shopping at 3 AM (an inefficient use of the fraud team). Instead, you can automate. Here's how you would do it.

Similar to the lions and tigers example, you'll need to train your custom ML model to understand your fraud patterns. As the custom model collects data on each user, you'll tell the model "this user is fraudulent" or "this user isn't fraudulent". After you've labeled ~50 fraudulent transactions, the model will be able to properly calculate how likely each signal is to indicate fraud. The model combines all of this information into a score. You can feel confident this score is accurate because it's a much more comprehensive way to understand users and the risk they pose. Given this confidence, you can now automate. For example, auto-accept any order with a fraud score of 0 - 60, review anything from 60 - 90, and auto-block anything above 90.

The number ranges you choose are all based on your business objectives. Focused more on revenue growth? Increase the score at which you block orders so you block less. Want to bring your fraud rate down? Decrease the score at which you accept orders so you accept less. Don't want to grow your fraud team as your revenue grows? Narrow the score range in which you review orders so your team reviews less.
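In code, the whole automation policy is just two tunable thresholds (the 60/90 cutoffs mirror the example above):

```python
ACCEPT_BELOW = 60  # raise to review less and grow faster
BLOCK_ABOVE  = 90  # lower to clamp down harder on fraud

def route_order(risk_score: float) -> str:
    if risk_score < ACCEPT_BELOW:
        return "auto-accept"
    if risk_score > BLOCK_ABOVE:
        return "auto-block"
    return "manual review"

for score in (12, 71, 96):
    print(f"score {score}: {route_order(score)}")
# score 12: auto-accept
# score 71: manual review
# score 96: auto-block
```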

Now that your fraud solution is more accurate from both sides, your fraud team won't have to review orders that are obviously fraud or obviously legitimate. You're auto-accepting or auto-blocking them. Your business is doing a better job protecting itself against fraud and enabling trust with legitimate users. And your fraud team has fewer orders to review. While this is great from an efficiency perspective, it's also a better customer experience. The less time customers' orders spend in a holding period, the faster they get their product, which is the experience internet consumers demand.

When a user goes into the manual review queue now, maybe that user has some fraudulent signals and some legitimate ones, giving them a fraud score of 70. Your trained human fraud fighter can do extra digging to clear up the ambiguity. The fraud fighter can navigate this data through Sift's console, seeing some of the reasons the ML model thought this user might be fraudulent. When the fraud fighter finishes their investigation, they label the user as fraudulent or not. The ML model learns from that label, improving its accuracy at understanding fraud, which means more automation.

Sift's console makes it easy to quickly manually review orders.

Want to compete with the FANGs of the world? (or whatever the hell the acronym is now)

Facebook doesn't need Sift's technology. In addition to payment fraud, Facebook also faces fraudsters creating fake accounts that post scam/spam content. In the last quarter of 2018, they took down 116M accounts. They have hundreds of people working on anti-fraud machine learning models. So do the other tech monopolies, like Google. In fact, Google has done a magnificent job of combating email spam on Gmail. There are 347 billion emails sent per day, and some estimates have spam making up half of that. Opening your email inbox every morning is already awful enough. Could you imagine if half of your inbox was spam? Well, Gmail uses machine learning to stop 99.9% of spam emails from ever entering your inbox.

But who's protecting the GDP of the rest of the internet and enabling trust with legitimate users? Most businesses don't have the same resources at their disposal as the largest tech companies. As an example, in 2016, Google said they had 27K employees in research and development (product & engineering departments of the org). 27,000!

You need to partner with a company whose sole focus is using machine learning to help you protect and grow. A company that stops fraud on 34,000 sites, analyzes 70 billion events per month, and assessed risk on $250 billion in transactions in 2020. That company is Sift.

You did it!

Thank you for reading my ridiculously long post.

If you enjoyed it, I'd appreciate it if you joined my email list. I share a new company deep dive ~once a month.

And if you're looking to do a good deed today, like or retweet the thread on Twitter (linked below). It would mean a lot!



Thank you to my wonderful girlfriend Rishika for editing, and my good friend Nigel for his feedback.