How did Instagram reach 14 million users with only 3 engineers?

Compiled by Yun Zhao

Planning | Yan Zheng

Produced by | 51CTO Technology Stack (WeChat ID: blog)

October 6, 2010, San Francisco. While people were still enjoying the excitement of the iPhone 4 and its more powerful camera, an iOS photo-sharing app called “Instagram” appeared in the App Store.

On its first day, it gained 25,000 users. A week later, downloads had climbed to 100,000. From October 2010 to December 2011, Instagram's user base grew from 0 to 14 million in just over a year.

And its founder, Kevin Systrom, did this with only three engineers. Let's go back to that moment and look, from an engineer's perspective, at how they did it.

In simple terms, they did this by following 3 key guiding principles and relying on a solid technology stack: keep things very simple, don’t reinvent the wheel, and use proven and reliable technologies whenever possible.

1. Early Basic Configuration

Instagram's early infrastructure ran on AWS, using EC2 and Ubuntu Linux. For reference, EC2 is Amazon's service that allows developers to rent virtual computers.

To keep things simple, and because I like to think about users from an engineer’s perspective, let’s walk through the lifecycle of a single user session.

2. Front-end

Scenario review: The user opens the interface.

Instagram was originally launched as an iOS app in 2010. Since Swift was not released until 2014, we can assume that Instagram was written in Objective-C, together with frameworks such as UIKit.

3. Load Balancing

To recap the scenario: When the app is opened, a request to fetch photos from the main feed is sent to the backend, where it reaches Instagram’s load balancer.

Instagram uses Amazon's Elastic Load Balancer, with 3 NGINX instances behind it that are swapped in and out of rotation depending on their health.

Each request first reaches the load balancer and is then routed to the actual application server.
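As a rough illustration of routing with health checks, here is a minimal Python sketch. It does not reflect how Elastic Load Balancer is actually implemented; the Backend class, the host names, and the round-robin policy are all assumptions made for the example.

```python
from dataclasses import dataclass
from itertools import count


@dataclass
class Backend:
    """A hypothetical upstream server tracked by the balancer."""
    name: str
    healthy: bool = True


class RoundRobinBalancer:
    """Cycle through backends, skipping any that failed a health check."""

    def __init__(self, backends):
        self.backends = list(backends)
        self._counter = count()

    def pick(self):
        # Try each backend at most once per call so we never loop forever.
        for _ in range(len(self.backends)):
            backend = self.backends[next(self._counter) % len(self.backends)]
            if backend.healthy:
                return backend
        raise RuntimeError("no healthy backends available")


balancer = RoundRobinBalancer(
    [Backend("nginx-1"), Backend("nginx-2"), Backend("nginx-3")]
)
balancer.backends[1].healthy = False  # simulate a failed health check
picks = [balancer.pick().name for _ in range(4)]
```

With the second instance marked unhealthy, requests alternate between the remaining two until it is marked healthy again.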

4. Backend

To recap, the load balancer sends the request to the application server, which holds the logic to handle the request correctly.

Instagram's application server uses Django, written in Python, and Gunicorn as their WSGI server.

For reference, WSGI (Web Server Gateway Interface) is the standard interface through which a Python web server forwards requests to a web application.
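To make the interface concrete, here is a minimal sketch of a WSGI application using only the Python standard library. It is not Instagram's code; the application function, the call_app helper, and the /feed path are illustrative assumptions. The helper plays the role a server such as Gunicorn would play: it builds the request environment, calls the application, and collects the response.

```python
from wsgiref.util import setup_testing_defaults


def application(environ, start_response):
    """A minimal WSGI app: receives the request environ, returns body chunks."""
    body = f"feed requested at {environ['PATH_INFO']}".encode("utf-8")
    start_response("200 OK", [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ])
    return [body]


def call_app(app, path):
    """Stand in for the server: build an environ, call the app, gather output."""
    environ = {}
    setup_testing_defaults(environ)  # fills in the keys WSGI requires
    environ["PATH_INFO"] = path
    captured = {}

    def start_response(status, headers, exc_info=None):
        captured["status"] = status
        captured["headers"] = headers

    body = b"".join(app(environ, start_response))
    return captured["status"], body


status, body = call_app(application, "/feed")
```

A framework like Django exposes a callable with this same shape, which is what lets the WSGI server and the framework be chosen independently.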

Instagram uses Fabric to run commands in parallel on multiple instances at the same time. This allows code to be deployed in seconds.

They run on more than 25 Amazon High-CPU Extra-Large machines. Since the servers themselves are stateless, they can add more machines when they need to handle more requests.

5. General Data Storage

Scenario recap: The application server discovers that the request requires data from the main feed. To do this, we assume that it requires:

  • The latest relevant photo IDs
  • The actual photos that match those photo IDs
  • The user data for these photos

1. Database: Postgres

Scenario review: The application server obtains the latest relevant photo IDs from Postgres.

The application server pulls data from PostgreSQL, which stores most of Instagram's data, such as user and photo metadata.

Connections between Postgres and Django are pooled using PgBouncer.

Instagram sharded their data due to the volume of data they received (over 25 photos and 90 likes per second). They used code to map thousands of “logical” shards to a few physical shards.
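The mapping from logical to physical shards can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the count of 2000 logical shards, the four host names, the contiguous-range assignment, and sharding by user ID modulo the shard count are all choices made for the example, not confirmed details of Instagram's setup.

```python
NUM_LOGICAL_SHARDS = 2000  # assumed count, for illustration only
PHYSICAL_SERVERS = ["db-a", "db-b", "db-c", "db-d"]  # hypothetical host names


def build_shard_map(num_logical, servers):
    """Assign contiguous ranges of logical shards to physical servers."""
    per_server = -(-num_logical // len(servers))  # ceiling division
    return {shard: servers[shard // per_server] for shard in range(num_logical)}


SHARD_MAP = build_shard_map(NUM_LOGICAL_SHARDS, PHYSICAL_SERVERS)


def logical_shard_for(user_id):
    """Derive the logical shard from the user ID; it never changes for a user."""
    return user_id % NUM_LOGICAL_SHARDS


def server_for(user_id):
    """Look up which physical server currently holds the user's logical shard."""
    return SHARD_MAP[logical_shard_for(user_id)]


shard = logical_shard_for(31341)
host = server_for(31341)
```

The point of the indirection is that growing the cluster only means editing the map: a logical shard can be moved to a new machine without changing which logical shard any user belongs to.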

An interesting challenge that Instagram faced and solved was generating chronologically sortable IDs. The chronologically sortable IDs they generated look like this:

  • 41 bits for the time in milliseconds since a custom epoch (the original post describes this as 41 years of IDs; 2^41 milliseconds is in fact about 69 years)
  • 13 bits represent the logical shard ID
  • 10 bits represent an auto-incrementing sequence, modulo 1024. This means 1024 IDs can be generated per shard, per millisecond
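The bit layout above can be sketched in Python as follows. This is a minimal illustration, not Instagram's implementation (which the referenced sharding post describes as running inside Postgres); the function names make_id and split_id are hypothetical, and the timestamp is passed in as milliseconds since whatever custom epoch is chosen.

```python
TIME_BITS = 41
SHARD_BITS = 13
SEQUENCE_BITS = 10


def make_id(ms_since_epoch, shard_id, sequence):
    """Pack time, shard, and sequence into one 64-bit, time-sortable integer."""
    if not 0 <= ms_since_epoch < (1 << TIME_BITS):
        raise ValueError("timestamp does not fit in 41 bits")
    if not 0 <= shard_id < (1 << SHARD_BITS):
        raise ValueError("shard ID does not fit in 13 bits")
    seq = sequence % (1 << SEQUENCE_BITS)  # wrap the counter into 10 bits
    return (
        (ms_since_epoch << (SHARD_BITS + SEQUENCE_BITS))
        | (shard_id << SEQUENCE_BITS)
        | seq
    )


def split_id(value):
    """Recover (ms_since_epoch, shard_id, sequence) from a packed ID."""
    seq = value & ((1 << SEQUENCE_BITS) - 1)
    shard_id = (value >> SEQUENCE_BITS) & ((1 << SHARD_BITS) - 1)
    ms_since_epoch = value >> (SHARD_BITS + SEQUENCE_BITS)
    return ms_since_epoch, shard_id, seq


photo_id = make_id(1387263000, 1341, 5001)
parts = split_id(photo_id)
```

Because the timestamp occupies the most significant bits, sorting by ID also sorts by creation time, and the shard ID can be read straight out of any ID without a lookup.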

Scenario recap: Thanks to the time-sortable IDs in Postgres, the application server has successfully received the latest relevant photo IDs.

2. Photo storage: S3 and Cloudfront

Scenario recap: The application server then fetches the actual photos matching those photo IDs via a fast CDN link so they load quickly for the user.

Several terabytes of photos are stored in Amazon S3. These photos are quickly served to users using Amazon CloudFront.

3. Cache: Redis and Memcached

Scenario recap: To get user data from Postgres, the application server (Django) uses Redis to match each photo ID with a user ID.

Instagram uses Redis to store a mapping of about 300 million photos to the ID of the user who created them, so it knows which shard to query when getting photos for the home feed, activity feed, etc. All Redis data is held in memory to reduce latency, and it is sharded across multiple machines.

Through some clever hashing, Instagram was able to store 300 million key mappings in less than 5 GB. This photoID to userID key-value mapping is needed in order to know which Postgres shard to query.
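One way to picture this kind of hashing is to group many small keys into a much smaller number of hashes, so that per-key overhead is paid once per bucket rather than once per photo. The Python sketch below imitates that idea with plain dictionaries instead of a real Redis client; the bucket size of 1000, the "mediabucket" prefix, and the function names are assumptions for illustration.

```python
from collections import defaultdict

BUCKET_SIZE = 1000  # assumed number of photo IDs grouped into one hash

# Stand-in for Redis: outer key = hash name, inner dict = that hash's fields.
fake_redis = defaultdict(dict)


def bucket_for(photo_id):
    """Name of the hash that holds this photo ID."""
    return f"mediabucket:{photo_id // BUCKET_SIZE}"


def set_owner(photo_id, user_id):
    # Comparable in spirit to: HSET mediabucket:<n> <photo_id> <user_id>
    fake_redis[bucket_for(photo_id)][str(photo_id)] = str(user_id)


def get_owner(photo_id):
    # Comparable in spirit to: HGET mediabucket:<n> <photo_id>
    value = fake_redis.get(bucket_for(photo_id), {}).get(str(photo_id))
    return None if value is None else int(value)


set_owner(1155315, 939)
set_owner(1155999, 42)  # same bucket as the photo above
set_owner(1156000, 7)   # first photo of the next bucket
```

Here three mappings end up in only two top-level keys; at scale, the same grouping turns hundreds of millions of keys into a few hundred thousand hashes.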

Scenario review: Thanks to efficient caching using Memcached, fetching user data from Postgres is fast because recent responses are cached.

For general caching, Instagram uses Memcached. They had 6 Memcached instances at the time. Memcached is relatively simple to layer on top of Django.
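The usual pattern for this kind of caching is to check the cache first and fall back to the database only on a miss. The Python sketch below shows that pattern with a tiny in-process cache standing in for Memcached; the TTLCache class, the load_user_from_db function, and the 60-second expiry are all hypothetical and not taken from Instagram's code.

```python
import time


class TTLCache:
    """A tiny in-process stand-in for a Memcached client (get/set with expiry)."""

    def __init__(self, clock=time.monotonic):
        self._store = {}
        self._clock = clock

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._store[key]  # expired: drop it and report a miss
            return None
        return value

    def set(self, key, value, ttl_seconds):
        self._store[key] = (value, self._clock() + ttl_seconds)


db_calls = []


def load_user_from_db(user_id):
    """Hypothetical slow query; records each call so misses can be counted."""
    db_calls.append(user_id)
    return {"id": user_id, "name": f"user-{user_id}"}


def get_user(cache, user_id, ttl_seconds=60):
    """Return the user from cache if present, else load it and cache it."""
    key = f"user:{user_id}"
    cached = cache.get(key)
    if cached is not None:
        return cached
    user = load_user_from_db(user_id)
    cache.set(key, user, ttl_seconds)
    return user


cache = TTLCache()
first = get_user(cache, 939)
second = get_user(cache, 939)  # served from cache, no second database call
```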

Interesting fact: Two years later, in 2013, Facebook published a landmark paper describing how they scaled Memcached to help them handle billions of requests per second.

The user can now see the home page with the latest photos from the people they follow.

4. Master-replica setup

Both Postgres and Redis run in a master-replica setup, using Amazon EBS (Elastic Block Store) snapshots to frequently back up the system.

6. Push Notifications and Asynchronous Tasks

Scenario Review: Now, suppose the user closes the app but then receives a push notification that a friend posted a photo.

This push notification, along with the billion-plus other push notifications Instagram has sent, was sent using pyapns, an open-source, universal Apple Push Notification Service (APNS) provider.

Scenario recap: The user loves the photo so much that they decide to share it on Twitter.

On the backend, tasks are pushed into Gearman, a task queue that farms out work to machines better suited to it. Instagram had about 200 Python workers consuming the Gearman task queue.

Gearman is used to perform multiple asynchronous tasks, such as pushing an activity (such as a newly posted photo) to all of a user's followers (this is called fanning out).
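Fanning out through a task queue can be sketched with Python's standard library. This is not Gearman's API; the in-process queue, the two worker threads, and the follower graph below are stand-ins chosen for illustration.

```python
import queue
import threading
from collections import defaultdict

followers = {"kevin": ["ann", "bo", "cy"]}  # hypothetical follower graph
feeds = defaultdict(list)                   # follower name -> photo IDs
feeds_lock = threading.Lock()
tasks = queue.Queue()


def worker():
    """Pull fan-out tasks off the queue until a None sentinel arrives."""
    while True:
        task = tasks.get()
        if task is None:
            tasks.task_done()
            break
        follower, photo_id = task
        with feeds_lock:
            feeds[follower].append(photo_id)
        tasks.task_done()


def post_photo(author, photo_id):
    """Enqueue one task per follower instead of writing feeds inline."""
    for follower in followers.get(author, []):
        tasks.put((follower, photo_id))


workers = [threading.Thread(target=worker) for _ in range(2)]
for thread in workers:
    thread.start()

post_photo("kevin", 101)
tasks.join()  # wait until every queued task has been processed

for _ in workers:
    tasks.put(None)  # one sentinel per worker to shut it down
for thread in workers:
    thread.join()
```

The request that posts the photo only pays the cost of enqueueing; the per-follower writes happen later on the workers, which is what keeps the user-facing request fast.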

7. Monitoring

Scenario recap: The Instagram app crashes because a server error produced an error response. The three Instagram engineers are alerted immediately.

Instagram uses Sentry, an open source Django application, to monitor Python errors in real time.

Munin is used to graph system-wide metrics and alert on anomalies. Instagram has a bunch of custom Munin plugins to track application-level metrics, such as photos posted per second.

Pingdom is used for external service monitoring, and PagerDuty is used to handle events and notifications.

8. Final Architecture Overview

(Figure: final architecture overview diagram)

--postscript--

Nineteen months after Instagram was released, its active users exceeded 50 million; the count later passed 100 million, reaching 130 million in June 2012. On October 25 of the same year, Facebook acquired Instagram for a total of US$715 million, and founder Kevin received a return of US$400 million.

It is worth mentioning that Kevin is a self-taught programmer. His background was in management, and he was a blank slate as a programmer when he graduated. While working in the marketing department of the social travel website Nextstop, Kevin began setting aside time every night to teach himself programming.

Instagram’s success has not only become one of the greatest success stories in modern Silicon Valley; Kevin’s self-taught journey has also become a catalyst for developers’ passion for programming.

Reference Links:

https://instagram-engineering.com/what-powers-instagram-hundreds-of-instances-dozens-of-technologies-adf2e22da2ad

https://instagram-engineering.com/storing-hundreds-of-millions-of-simple-key-value-pairs-in-redis-1091ae80f74c

https://instagram-engineering.tumblr.com/post/10853187575/sharding-ids-at-instagram

