The Open Source A.I. Boom — Standing on the Shoulders of Giants

One of the things that excites me about artificial intelligence is how open it is. The brilliant minds working on it aren’t hoarding their ideas and keeping them close to their chests in the hope of making billions off them. The amount of free and open source technology available for building your own artificial intelligence is mind-blowing, to the point where you can use the same underlying tech that Google or Microsoft uses without paying a dime. Ever. No strings attached.

A lot of what has been developed in AI started with research, typically academic research. Most academic researchers publish their findings in journals or academic texts. If you’re a nerd like me, you probably realized after leaving university that there’s no chance in hell you can afford all of those amazing journals you had access to as a student. Journals can charge tens of dollars for access to a single article, let alone access to their entire databases. Furthermore, academic texts usually cost hundreds of dollars each, which is no surprise to the students, recent grads, or parents of students out there.

Open Source

In technology, there has always been a quiet but persistent movement supporting free and open source software (FOSS). It’s the idea that the best way to make amazing software is by making the source code available to everybody, so that they can find bugs, fix bugs, add functionality, or build off someone else’s idea. This leads to innovation, and is the reason why more than 70% of the internet is powered by Linux, an operating system that is open source, unlike Windows or Apple’s macOS.

The research behind artificial intelligence is done by those who grew up in a world where open source software made sense. We have websites like GitHub that host over 1.1 billion contributions of open source software, ranging from kids creating games and college students doing their homework all the way to multinational billion-dollar companies hosting their source code. Many AI researchers have even come out against traditional academic journals, stating that they won’t publish in journals that are closed access.

A good way to look at it is that source code = time. It takes time to write code, but it also takes time to collect and curate the information necessary to build and train an AI model. So the impact of working off someone else’s code is that it lets you start with a head start. Like a 10-year head start.

Labeling 14 Million Pictures

ImageNet is an image database that is organized in a structured format and contains 14 million images that have been labelled and categorized by humans. You need something like ImageNet if you want to train an AI to detect objects in pictures. It is run by professors from Stanford and Princeton, and has been in development since 2009.

Video: “How we teach computers to understand pictures” by Fei-Fei Li

You Only Look Once

If you think ImageNet is cool (nerd), then you might find the Darknet YOLO project interesting. This tool allows you to download some software, hook it up to a camera, and get instant object recognition capabilities. It can highlight the objects it sees and label them. If there are 3 dogs, 2 people and a frying pan in the picture, it will find them and put a big rectangle around each, with a label indicating what it has found. Just. Like. That. It was developed by a student at the University of Washington, and it’s free and open source. The best part is that it was trained using ImageNet data. One amazing project that ties into another. If you want to create a smart camera to classify the objects it records, all of the hard work is done for you. For free.


Google — Solving the World’s Problems


Lastly, I want to talk about Google. Google is using machine learning to solve some pretty huge problems in health care. In developing countries, too many people who require medical attention simply do not have access to a doctor, and there aren’t enough doctors being trained to treat all of the patients. An example of this is the rise of diabetes in India. If you have diabetes, you need to be in and out of doctors’ offices, and not only to check your blood sugar: diabetes can lead to high blood pressure, increased risk of heart attack and stroke, kidney disease and nerve damage, and you can even go blind if you don’t regularly see the eye doctor. In India, there aren’t enough eye doctors for people to see. That means that people with diabetes are going blind, which is completely preventable if you have access to an eye doctor. Google has built an AI that can help with this. It can perform a quick scan using just a camera and its software to determine if the patient is at high risk of going blind. The best part? The software Google used to create this tool is absolutely free. Even better, the software was created by Google. TensorFlow is a free and open source tool that Google has developed and made available for everybody to use. And if it’s good enough for Google, I think it’s good enough for me.

So I’m off to go program some AI. I hope you enjoyed this post. If you found it interesting or useful, feel free to comment below.


Insurance in the Age of AI, Blockchain and an Overabundance of Data

The traditional insurance model has had a pretty good run. It has been slowly evolving over the past few hundred years to include new coverages, multiple distribution channels (broker, agent, online), and more complex actuarial models. The financial industry has been a relatively slow adopter of new technology, mostly because companies simply cannot take undue risk, and those working in insurance are experts in minimizing risk.


This article is going to outline the current state of insurance and the progress technology has made so far, and will argue that the industry may be on the verge of significant disruption, the likes of which have the potential to render most policies, companies, employees and value-added services obsolete.


Two things to keep in mind as we move through this piece.

  1. Nobody likes insurance. They need insurance, therefore they don’t like it. It’s a necessary evil, and the perceived value isn’t there for those who do not have claims and have limited risk.
  2. Technology has increased the rate of change to the point where the time it takes science fiction to become reality can be reduced to months. The telegraph was the only mode of rapid long distance communication for decades before the telephone was invented, and even then it took an additional 60 years before telephones were mainstream. Twitter started off with 400,000 tweets per quarter and grew to 100,000,000 per quarter after one year. Now there are 6,000 tweets per second and 200 billion (with a b) per year.


Artificial Intelligence


There are a number of terms used when describing AI. Neural networks, machine learning, deep learning, the list goes on. This article will not provide a rich overview of how artificial intelligence works, as there are a number of great resources that do that already. But let me give you the Coles Notes version. Traditional computer programs and algorithms have been exceptionally good at performing computational tasks. Give a computer a problem to solve, or some code to run, and it can do so significantly faster than a human ever could. The bottleneck with that approach is that it needs a human to give it clear and specific instructions on exactly what to do. Furthermore, the computer can only process data that it can understand, which has typically been something coded into a language a computer can translate (programming language/machine code) or broken down into pure numbers (computers are great at math). What traditional computers haven’t been able to do, in clear contrast to AI, is come to their own conclusions and uncover for themselves the best solution given a specific rule set. Furthermore, computers are getting really good at understanding non-traditional sources of information: image recognition, speech recognition, unsorted datasets. When given access to a large number of YouTube videos, Google Brain’s AI taught itself what a cat looked like, and became very good at identifying cats against other images of animals. Nobody told it what a cat looked like; it learned that on its own.
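
To make "it learned that on its own" a bit more concrete, here is a minimal sketch of learning without labels. It is nowhere near the scale or method of the YouTube experiment; it is plain k-means clustering on a handful of made-up animal measurements (weight in kg and an invented "ear pointiness" score, both hypothetical). The point is only that the program is never told which animal is which, yet it still separates them into two groups.

```python
def squared_distance(a, b):
    """Squared straight-line distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(centroids, point):
    """Index of the centroid closest to the point."""
    distances = [squared_distance(c, point) for c in centroids]
    return distances.index(min(distances))


def cluster(points, starting_centroids, rounds=10):
    """Plain k-means: assign points to the nearest centroid, then re-average."""
    centroids = [list(c) for c in starting_centroids]
    assignment = []
    for _ in range(rounds):
        assignment = [nearest(centroids, p) for p in points]
        for group in range(len(centroids)):
            members = [p for p, a in zip(points, assignment) if a == group]
            if members:
                centroids[group] = [sum(col) / len(members) for col in zip(*members)]
    return assignment, centroids


# Made-up data: (weight_kg, ear_pointiness). No labels are provided.
animals = [(4.0, 0.9), (5.0, 0.8), (3.5, 0.95), (20.0, 0.3), (30.0, 0.2), (25.0, 0.4)]

# Starting guesses are simply the first and last animals (an arbitrary choice).
groups, centers = cluster(animals, [animals[0], animals[-1]])
print(groups)  # prints [0, 0, 0, 1, 1, 1]
```

The first three animals end up in one group and the last three in another, even though the code contains no rule about what a cat is.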




Blockchain


Everyone’s favorite subject. It’s the technology that is going to change the world, and none of us really know what it is. Again, I’m just going to provide a nutshell description of the technology, and it should become clearer as we go through the thought experiment below. Blockchain technology provides a single ledger of transactions that is secured by cryptography and by the fact that everyone works off the same books, at the same time. You cannot change the record maliciously because your update will be in conflict with what everyone else’s records hold as the truth, and therefore the network will spot the lie. Blockchain effectively reduces the need for traditional institutions because it increases self-service capability without the risk of fraud. It means that the gatekeeper mentality is no longer necessary because once the rules are established, everyone gets forced into playing nicely together, tracking whatever transactions are implemented.
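
The "you can’t quietly change the record" idea can be sketched in a few lines. This is a toy, assuming a single in-memory list and made-up transactions; real blockchains add networking, consensus and much more. Each block stores the hash of the block before it, so editing an old entry breaks every check that comes after it. The function names (`block_hash`, `append_block`, `is_valid`) are mine, not from any blockchain library.

```python
import copy
import hashlib
import json

GENESIS_HASH = "0" * 64  # placeholder "previous hash" for the very first block


def block_hash(index, prev_hash, transactions):
    """SHA-256 fingerprint of a block's contents."""
    payload = json.dumps(
        {"index": index, "prev_hash": prev_hash, "transactions": transactions},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append_block(chain, transactions):
    """Add a block that points back at the hash of the previous block."""
    index = len(chain)
    prev_hash = chain[-1]["hash"] if chain else GENESIS_HASH
    block = {
        "index": index,
        "prev_hash": prev_hash,
        "transactions": transactions,
        "hash": block_hash(index, prev_hash, transactions),
    }
    chain.append(block)
    return block


def is_valid(chain):
    """Recompute every hash and check that each block links to the one before."""
    prev_hash = GENESIS_HASH
    for index, block in enumerate(chain):
        if block["index"] != index or block["prev_hash"] != prev_hash:
            return False
        if block["hash"] != block_hash(index, prev_hash, block["transactions"]):
            return False
        prev_hash = block["hash"]
    return True


ledger = []
append_block(ledger, [{"from": "alice", "to": "bob", "amount": 50}])
append_block(ledger, [{"from": "bob", "to": "carol", "amount": 20}])

# Someone edits an old transaction in their own copy of the books.
tampered = copy.deepcopy(ledger)
tampered[0]["transactions"][0]["amount"] = 5000

print(is_valid(ledger))    # the honest copy passes the check
print(is_valid(tampered))  # the edited copy fails it
```

In a real network, every participant runs a check like this against their own copy, which is why a lone edited ledger gets rejected.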


The real cause of disruption for insurance – data


We’ve heard the stories about AI for a while, but it’s nothing really new for the insurance world. Analytics has been a growing part of the industry for years, and it’s obvious that with increased computing power we would inevitably evolve the rating algorithms and risk assessment tools to use the latest available tech. The problem is that the AI boom is only made possible by the explosion of available data. AI needs huge datasets to learn from, and these datasets are collected and aggregated by some of the largest companies in the world. Google and Amazon are fighting toe to toe in a race to develop the smartest, most capable AI, and others are joining that fight constantly.


Let’s step through the traditional method of getting insurance. First, it starts with someone deciding that they want or need insurance, and then shopping for it. Then they go through the drudgery of providing all of their information, which is tedious because they don’t readily know 90% of it, and because they also know that everything they say from here on in will impact the price of their insurance. It’s kind of like going to a new doctor every time without any patient records, and relying on the patient to provide their entire medical history from memory. They are guaranteed to get some things wrong or leave some things out. And the more complicated their medical history is, the higher the likelihood of them forgetting something, and the greater the chance that whatever they forgot was very important.


Now let’s contrast this with Google. A quick look through my Google Account and Privacy Settings shows a detailed history of everything that Google knows about me. They have my banking institution, my spending habits, obviously my address, where I work, how I get to work, what music I’m into, what food I eat, my hobbies and interests. I also have a Nest thermostat and smoke alarm, so they have real-time monitoring of my house’s temperature. I also upload all of my photos to Google Photos, so they have pictures of the inside and outside of my house. They also have instant access to my photos if, let’s say, I get into a car accident and my first response is to snap pictures for proof. Your first response might be “you should do more to protect your privacy” and my answer is that I already do. I am relatively conservative in my approach to letting “Big Brother” track me. I’m an informed user with a tech background, and I do monthly audits of my privacy settings and sift through my personal data to weed out anything sensitive. My question to you is “when was the last time you reviewed your privacy settings on Google, Facebook, Twitter, Instagram…?”.


Customer acquisition over profits


In the data-driven age, profits are not your first priority. What you’re looking for is customers and data. If you acquire enough users, you will be able to collect enough data, and many companies rely on that data to bring in the money. Companies will either sell that data, use the data to teach their algorithms, or analyze the data to determine how to become profitable. More data equals higher levels of certainty, and you move from guessing to knowing when making decisions. It took Amazon 14 years to become profitable. How many companies sat idly by saying “Amazon’s business model isn’t sustainable, they aren’t even profitable yet”? Tesla still isn’t profitable, despite having been founded in 2003 and being valued at $51 billion. Just because a company isn’t profitable today does not mean it won’t become a world leader tomorrow.


The Future


This is the story of a fictitious insurance company named Acme Insurance.  The board of directors at Acme decided that the most important thing for their company to do is acquire as much data as possible and start using AI to rate policies, check claims for fraud, and to determine the best risks by segmenting the market into micro markets and focusing all their efforts on acquiring the right risks.


Their approach was simple: direct sales to consumers, 100% online, with a beautiful user interface that was intuitive and easy to use. They undercut the market by minimizing their expenses. Investment was small because they ran on Amazon Web Services and therefore paid only for the computing power they needed.

To identify their initial target markets, they hired contract programmers who had experience scraping the web.  They scoured the internet for every quick quote tool, price comparison website, rate manual, info sheets etc. that were publicly available. They then ran scenarios against these data sources to determine which risks received the best rates.  This let them use the knowledge and experience of the established industry to determine how to rate their clients.  They also used the loss ratios and experience from each insurance company to weight their algorithms, so Acme’s rating would more closely reflect higher performing companies.
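
As a rough illustration of what "weighting by loss ratio" could look like, here is a small sketch. Everything in it is an assumption made for illustration: the competitor names, the quotes, the loss ratios, and above all the choice of weighting each competitor’s quote by the inverse of its loss ratio, so that better performing companies pull the blended rate toward their price. The story doesn’t say how Acme actually did it.

```python
def blended_rate(quotes, loss_ratios):
    """Average competitor quotes, trusting low-loss-ratio companies more.

    quotes:      {company: annual premium quoted for one risk profile}
    loss_ratios: {company: claims paid / premium earned}, must be positive
    """
    weights = {}
    for company in quotes:
        ratio = loss_ratios[company]
        if ratio <= 0:
            raise ValueError(f"loss ratio for {company} must be positive")
        weights[company] = 1.0 / ratio  # assumed weighting scheme
    total_weight = sum(weights.values())
    return sum(quotes[c] * weights[c] for c in quotes) / total_weight


# Hypothetical scraped quotes for the same house, and hypothetical loss ratios.
quotes = {"Insurer A": 1000.0, "Insurer B": 1200.0}
loss_ratios = {"Insurer A": 0.5, "Insurer B": 1.0}

# Insurer A gets twice the weight of Insurer B, so the result lands nearer 1000.
print(round(blended_rate(quotes, loss_ratios), 2))  # prints 1066.67
```

A simple average of the two quotes would be 1100; the weighting pulls the starting rate toward the company with the healthier book.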


Now that they knew how the industry priced risk, they hired the top digital marketing firms to go after the most attractive clients.  Marketing online is significantly cheaper and easier than traditional marketing.  They mapped out their demographics based on what social media platforms cater to each group.  They used information from Google search to figure out where these people lived, and what they searched for online.  They used sentiment analysis AI to read and understand all the reviews found online for their competitors, from Google to Glassdoor to Facebook.  Using all of this information, they determined the pain points for their key target markets, exactly what language they use to describe their frustrations, and exactly where these people spend their time online.


Acme grew at an astronomical rate, increasing its book by about 40% month over month. About 6 months in, they started to see the claims that go with that growth. Luckily enough, the company was growing so fast that the growth artificially kept its loss ratio at a controllable 200%. The company was losing money, but with every claim it was gaining equity. It did that by scrutinizing every claim to teach an AI to better predict fraudulent claims and better understand risk factors. Every minute detail of a claim was tracked and analyzed. The system collected all like policies and determined the exposure of risks with similar geography, coverage, customer demographic, construction type, build year and so on. The AI then incrementally adjusted the rates, limits, deductibles and wordings of those coverages to fine-tune its algorithms to cover future losses. But it didn’t just have one set of rates and rules. The system maintained 10+ different sets of rates and rules in order to test them against its policies and determine which actions made sense. Some policies would see an increase in premium, some a drop in limits, some increased deductibles, some all 3, and some no change. By doing this, Acme could perform an A/B test on its book to determine which actions properly mitigated risk, but also which actions were acceptable to its user base (by tracking the number of cancellations).
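
Here is a minimal sketch of how that kind of A/B test on a book of business might be wired up. The details are my assumptions, not part of the story: exactly 10 rule sets, policies assigned to a rule set by hashing the policy ID (so the same policy always lands in the same group), and two metrics per group, the cancellation rate and the loss ratio (claims paid divided by premium collected). The policy records are made up.

```python
import hashlib
from collections import defaultdict

# Assumed: exactly 10 competing sets of rates and rules.
VARIANTS = [f"rules_v{i}" for i in range(10)]


def assign_variant(policy_id):
    """Stable assignment: the same policy ID always maps to the same rule set."""
    digest = hashlib.sha256(policy_id.encode("utf-8")).hexdigest()
    return VARIANTS[int(digest, 16) % len(VARIANTS)]


def summarize(policies):
    """Cancellation rate and loss ratio for each rule set that has policies."""
    totals = defaultdict(
        lambda: {"policies": 0, "cancelled": 0, "premium": 0.0, "claims": 0.0}
    )
    for policy in policies:
        bucket = totals[assign_variant(policy["id"])]
        bucket["policies"] += 1
        bucket["cancelled"] += 1 if policy["cancelled"] else 0
        bucket["premium"] += policy["premium"]
        bucket["claims"] += policy["claims"]

    report = {}
    for variant, bucket in totals.items():
        report[variant] = {
            "cancellation_rate": bucket["cancelled"] / bucket["policies"],
            # Loss ratio is undefined when no premium was collected.
            "loss_ratio": bucket["claims"] / bucket["premium"] if bucket["premium"] else None,
        }
    return report


# Hypothetical policies; a real book would have millions of these.
book = [
    {"id": "P-1", "premium": 1000.0, "claims": 2000.0, "cancelled": False},
    {"id": "P-2", "premium": 800.0, "claims": 0.0, "cancelled": True},
    {"id": "P-3", "premium": 1200.0, "claims": 300.0, "cancelled": False},
]

for variant, metrics in sorted(summarize(book).items()):
    print(variant, metrics)
```

A rule set that lowers the loss ratio but drives up cancellations isn’t a win, which is why both numbers are reported side by side.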


After 5 years of rapid growth and losing money, Acme Insurance seemed to be a failure. They grew too big too fast. They tried breaking into too many different markets, servicing different countries, and stretching themselves too thin. They weren’t advertising using traditional methods, so most people hadn’t even heard of them. As a private company that employed only about 100 staff, they didn’t seem to pose a threat. Furthermore, their CEO had been active on social media, and was quite transparent about the unsustainable losses they experienced. With over $400 million in venture funding, everyone from big banks to housing developers, even governments, had jumped on the bandwagon. The company appeared to be a bust. But that was from the outside looking in.


Year 5 marked a turning point for Acme. Their client base reached 50 million users worldwide, and they were writing policies in just about every developed country. Their learning algorithm used its ability to constantly retrain and learn with every new policy, claim and customer complaint to successfully rate any building or automobile, anywhere in the world. They were also able to successfully eliminate the individual from the rating equation, making decisions purely based on the physical item that was insured. By partnering with a number of companies developing smart home technologies, they had the vast majority of their houses outfitted with appliances and sensors that spoke with their AI and the insured in real time. This provided constant reminders to minimize the risk of a claim: a hot spot reader would alert the insured that a candle was still lit when they headed for bed, the scheduler would notify the insured that it had been 2 years since they cleaned their wood fireplace, and the smart stove would automatically shut off if it detected too much smoke in the range hood. Acme also had partnerships with every major automobile manufacturer, and created a data sharing pool which allowed them to gain insights into driving patterns. In return, Acme was able to significantly discount insurance rates on new cars, effectively reducing the cost of ownership. All policies were rated month to month, based on the previous month’s usage and learnings.


Acme’s final move was to leverage governments to allow its product to be sold in conjunction with mortgages, car loans and property taxes, and in some cases they just rolled the predicted cost into the purchase price. Acme’s algorithm was so successful that it could confidently assess the 5-10 year premium for certain makes and models, and since most people own new cars for roughly 10 years, they simply added it onto the initial sale of the car. Acme also began leveraging its capital to hire more and more climate scientists, and began modelling weather patterns in-house. The company became synonymous with environmentalism and safety. Acme rebuilt damaged homes using state of the art building materials and systems to prevent future losses, such as breathable concrete, fire suppression systems, and hot spot detection devices. Even houses located in flood plains received giant inflatable balloons that would circle the entire house to protect it from floods.


History would look fondly on the impact that Acme had, not just as a business and AI developer, but because it saved lives and changed the world. Acme’s largest financial risk was climate change, and so the company leveraged its relationships with governments and its positive relationship with its insureds to make drastic changes towards sustainability. Acme led the charge in fighting climate change, developed self-driving cars that reduced traffic accidents to zero, and eliminated the need to shop for the best price for insurance.

