An overview of Big O Notation and an example at Amazon scale

In computer science, Big O Notation describes how the runtime of an algorithm scales as its input size gets bigger. This matters a lot, especially when you're dealing with the scale of a company like Amazon.

What is Big O Notation?

Big O Notation is a tool used in fields like mathematics and computer science to describe how a function scales according to its input. In software development, it is normally used to describe how the runtime of an algorithm will scale as its input size grows. Take the following function for example:

def printNumber(n: int):
    print(f"The number you entered is {n}")

This function takes a number n as its input and prints it to the screen. This algorithm runs in constant time, because as n grows towards infinity, the runtime stays the same: the function always does the same amount of work. We can say this algorithm has a time complexity of $O(1)$.

Now let's consider the following algorithm:

def printNumbers(n: int):
    for i in range(n):
        print(i)

This algorithm runs in linear time, because its runtime increases linearly as n grows towards infinity. To put it simply, if it takes 1 ms to run this function when $n = 1$, it will take roughly 1000 ms to run when $n = 1000$. We can say this algorithm has a time complexity of $O(N)$.
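
If you want to see this scaling for yourself, here is a minimal sketch (not code from the article) that times a linear loop for a few input sizes. The helper name do_work and the chosen sizes are just illustrative, and the loop body is a no-op instead of print so the output stays readable:

import time

def do_work(n: int):
    # Same O(N) shape as printNumbers, without flooding the terminal
    for _ in range(n):
        pass

for n in (100_000, 1_000_000, 10_000_000):
    start = time.perf_counter()
    do_work(n)
    elapsed = time.perf_counter() - start
    print(f"n = {n:>10,}: {elapsed * 1000:.1f} ms")

Each tenfold increase in n should take roughly ten times longer, which is exactly what $O(N)$ predicts.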

If we plot both a linear function $f(n)$ and a constant function $g(n)$, there will always be a value $n_0$ from which $g(n) < f(n)$ for every $n > n_0$.

[Figure: plot of the constant function $g(n)$ and the linear function $f(n)$, showing the crossover point $n_0$ after which the linear function is higher.]

As you can see, no matter how far apart the functions start, eventually the value of $g(n)$, the constant function, will be lower than the value of $f(n)$, the linear function. Remember that these functions represent an algorithm's runtime, so lower is better.
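
As a quick illustration of that crossover, here is a small sketch where the coefficients of $f$ and $g$ are made up for the example; it finds the first n where the constant function beats the linear one:

def f(n: int) -> float:
    # Linear runtime: made-up coefficients for illustration
    return 0.5 * n + 10

def g(n: int) -> float:
    # Constant runtime: made-up value for illustration
    return 500

# Find the first n where the constant function is cheaper
n = 0
while g(n) >= f(n):
    n += 1
print(f"g(n) < f(n) from n = {n} onwards")  # prints n = 981 with these coefficients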

What does Amazon have to do with this?

Amazon is a cloud company first, with AWS accounting for most of its operating income. AWS sometimes needs to transfer huge amounts of data from a customer to its servers, a process which, as you can imagine, takes time. The more data there is, the more time it takes to send it to Amazon's data centers. When you're working with petabytes and petabytes of data, a transfer that scales linearly with the data, $O(N)$, may still be too slow.
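
To get a feel for the numbers, here is a rough back-of-envelope sketch; the 1 PB dataset and the link speeds are assumptions I picked for illustration, not figures from AWS:

# Rough, illustrative estimate of how long a network transfer takes.
# The dataset size and link speeds are assumptions, not AWS figures.
data_bytes = 1 * 10**15          # 1 petabyte
for gbps in (1, 10, 100):
    bits_per_second = gbps * 10**9
    seconds = data_bytes * 8 / bits_per_second
    print(f"{gbps:>3} Gbps link: {seconds / 86_400:.1f} days")

Even on a fast dedicated link, the transfer time grows linearly with the amount of data, and at petabyte scale that linear growth really hurts.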

Amazon found a way around this problem, a way to transfer data in $O(1)$ time! How did they do this? What was the clever algorithm they discovered to transfer data in constant time? The answer may be a little surprising: trucks.

AWS Snowball

Yes, you read that right, trucks. Amazon sends a truck and physically loads the customer's data all at once. It takes the drives from the customer to its data centers with a service they call AWS Snowball, and it takes (roughly) the same amount of time regardless of how much data you move, aka $O(1)$.

I found this very clever and funny. Moving the drives physically would have to be faster at some point, and Amazon has clearly found themselves past the crossover point $n_0$ in the previous graph, where physically moving the drives is now faster than transferring the data over the internet.

Disclaimer

This isn't really $O(1)$, since you still have to copy the data from one drive to another even if they are sitting right next to each other physically, but in practice this is a lot faster than sending the data over the internet. While it's technically still $O(N)$, it is a big improvement for higher values of $N$.
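
To close with a sketch of that trade-off: the numbers below are made up for illustration (a fixed shipping time plus a fast local copy versus a slower network link) and are not real AWS Snowball figures, but they show how the truck wins once the dataset is large enough:

# Illustrative comparison only: the link speed, copy speed and shipping
# time are assumptions I picked, not real AWS Snowball figures.
NETWORK_GBPS = 10        # internet transfer speed
LOCAL_COPY_GBPS = 400    # copying onto/off the shipped drives
SHIPPING_DAYS = 7        # fixed door-to-door time for the truck

def network_days(data_bytes: float) -> float:
    return data_bytes * 8 / (NETWORK_GBPS * 10**9) / 86_400

def truck_days(data_bytes: float) -> float:
    # Still O(N) because of the local copy, but with a much smaller slope
    return data_bytes * 8 / (LOCAL_COPY_GBPS * 10**9) / 86_400 + SHIPPING_DAYS

for petabytes in (0.01, 0.1, 1, 10):
    data = petabytes * 10**15
    print(f"{petabytes:>5} PB: network {network_days(data):7.1f} days, "
          f"truck {truck_days(data):6.1f} days")

Under these made-up numbers the truck starts winning somewhere between 0.1 PB and 1 PB, which is exactly the crossover from the earlier graph, just with much bigger constants.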