In my last post, I wrote about my top ten announcements (very briefly). But there is one more thing I found very important. I didn’t put it in the top 10, though. Why?
Well, there are several reasons.
- SnapStart itself is not new. We’ve had this functionality in Java for about two years.
- This is another improvement to Lambda functions that minimizes cold starts.
I think these two are the most important.
Before we discuss SnapStart, let’s talk about why AWS provides us with this feature.
Everyone who knows anything about AWS Lambda (and serverless computing in general) understands the concept of a cold start. But let’s explain it a bit more thoroughly.
There is no magic. Serverless means nothing more than that you are not the one managing the server; there still are servers. To run your code, AWS uses Firecracker, a virtualization technology built by AWS that uses KVM to run what they call microVMs: very lightweight virtual machines.
Because these virtual machines are so lightweight, they can provide the performance and speed required to fulfill your requests. But no matter what, you can’t beat physics, at least not today: the virtual machine still takes some time to start.
When Firecracker runs your VM, it prepares the environment for you: it installs all the dependencies of the runtime, downloads your code, and finally starts the runtime. This is the part of the cold start managed by AWS.
The other part of the cold start is on us. It depends on how our code is written, how many dependencies we load, and how we initialize things in our code.
So we have two parts of a cold start: one is ours, which we can try to reduce, and the other is managed by AWS, which is beyond our control.
As we can see in the image above, a cold start can take a long time, and when a customer tries to do something in our serverless application, they feel every millisecond of it 🙂 The second part of the picture shows what we call a warm start. This happens when a microVM that served a previous invocation has finished its work and is free to take another task. The microVM stays around in this state for a few minutes, and if a request arrives within that window, it is processed without any preparation.
It’s not part of this article, but if you rely on warm functions, remember what you cache and store between invocations, etc. 🙂
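To make this concrete, here is a minimal sketch of a .NET handler written that way. The table and attribute names are placeholders I made up for illustration; the point is that whatever you create in the constructor (or a static field) is built once, during initialization, and then reused by every warm invocation of that execution environment.

```csharp
using System.Threading.Tasks;
using Amazon.DynamoDBv2;
using Amazon.DynamoDBv2.Model;
using Amazon.Lambda.Core;

// Register the standard JSON serializer for the handler's input/output.
[assembly: LambdaSerializer(typeof(Amazon.Lambda.Serialization.SystemTextJson.DefaultLambdaJsonSerializer))]

namespace SnapStartDemo;

public class Function
{
    // Created once, during the init phase (our part of the cold start),
    // and reused by every warm invocation of this execution environment.
    private static readonly AmazonDynamoDBClient DynamoDb = new();

    public async Task<string> Handler(string input, ILambdaContext context)
    {
        // Only the per-request work happens here.
        var response = await DynamoDb.GetItemAsync(new GetItemRequest
        {
            TableName = "counters", // placeholder table name
            Key = new() { ["pk"] = new AttributeValue(input) }
        });

        return response.Item.TryGetValue("counter", out var value) ? value.N : "0";
    }
}
```

Anything kept in that static field survives between invocations of the same environment – which is exactly what becomes interesting once SnapStart enters the picture.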
So, SnapStart.
…or, well, not yet.
There is another factor that is important in the case we are exploring today.
If you look closely at the above image, you will see that the execution time of the Lambda handler is shorter on a warm boot than on a cold boot. It makes perfect sense, at least in some languages like .Net.
Disclaimer: I’m not a developer. I don’t know about .Net. This article is not an analysis of .Net Lambda performance, just my test of SnapStart. I want to clarify something. Clear? 😉
.Net uses dynamic (just-in-time) compilation, which means that parts of the code are compiled only when they are first executed. As far as I know, this behavior can be changed, but I haven’t tried it, so my tests use only the standard approach.
In other words: you run your function, the function calculates something, performs many operations, and when it gets to the point where it has to save the results in a DynamoDB table, that part (module, or whatever we call it) is compiled only at that moment.
What does that mean for us?
Yes, you guessed it. Let’s call it the third type of cold start 🙂 That compilation takes time. More or less, but it takes time. We’ll see it on the screenshots soon.
This information is important for our further exploration.
At last! Let’s talk about SnapStart.
During re:Invent 2024, AWS announced SnapStart for .Net and Python. SnapStart for Java has been available for about two years.
How does it work?
When you deploy a new function (or an updated one), AWS takes a snapshot of the microVM. The snapshot is taken after all the steps included in the cold start have completed. Then, every time the function is called and no warm instance is available, it is restored from the snapshot, with everything already initialized.
Who already sees the trap here? 🙂
An important prerequisite is to enable versioning for the Lambda function. Without it, SnapStart cannot be enabled.
What’s the price? SnapStart itself is free. However, snapshot storage and restores are not: pricing is on the AWS page, and you can easily calculate how much you’ll need to pay.
With SnapStart, you must pay attention to what you store and cache at the moment the snapshot is taken. In particular, you must ensure that no sensitive information ends up in the snapshot.
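If something must not survive the snapshot (temporary credentials, random seeds, open connections), the .NET SnapStart support comes with runtime hooks that run before the snapshot is taken and after each restore. The sketch below assumes the SnapshotRestore.RegisterBeforeSnapshot / RegisterAfterRestore registrations from Amazon.Lambda.Core; treat it as an illustration of the idea, not a drop-in implementation.

```csharp
using System;
using System.Threading.Tasks;
using Amazon.Lambda.Core;

public class Function
{
    // Example of per-environment state that must NOT be shared via the snapshot.
    private static string? _uniqueSeed;

    public Function()
    {
        // Runs during initialization, i.e. before the snapshot is taken.
        _uniqueSeed = Guid.NewGuid().ToString();

        // Assumption: these are the hooks exposed by Amazon.Lambda.Core for
        // .NET SnapStart. Clear anything that must not live in the snapshot...
        SnapshotRestore.RegisterBeforeSnapshot(() =>
        {
            _uniqueSeed = null;
            return ValueTask.CompletedTask;
        });

        // ...and recreate it after every restore, so each restored environment
        // gets its own value instead of inheriting a shared one.
        SnapshotRestore.RegisterAfterRestore(() =>
        {
            _uniqueSeed = Guid.NewGuid().ToString();
            return ValueTask.CompletedTask;
        });
    }

    public string Handler(string input, ILambdaContext context) => $"seed: {_uniqueSeed}";
}
```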
It’s worth remembering that the time it takes to restore a snapshot depends on many things. One of them is the memory size you configure for Lambda.
Enough of the theory, let’s see what it looks like in practice.
I started experimenting with dotnet6. I must admit I didn’t check beforehand which runtimes support SnapStart. It turns out that dotnet6 doesn’t; you have to use dotnet8. Anyway, I tested this version first to see a “clean” cold start and measure it.
The image above shows two executions of my Lambda. It’s easy to see a cold start. Yes, this is unacceptable. Let’s look at the traces.
What can we see here? I think the second diagram will help to understand the runtime better.
Initialization – this is our cold start. It took almost half a second. Is it a lot or not? Well, for me it is. But there is something else more interesting. Do you remember that I wrote about dynamic compilation?
It’s there. Compare the Invocation execution times: the cold function takes 14 seconds (!!!) and the warm one takes less than 300 milliseconds.
This is unacceptable. Fortunately, we have a solution.
Before we begin, you must know that you cannot enable SnapStart for this runtime. I talked about it a few paragraphs ago.
Well, what I did wasn’t a perfect test, but I didn’t care. Our goal is to check out SnapStart and that’s it. So I changed the runtime to dotnet8, and Copilot had a lot of trouble getting the code usable again. I already mentioned that I don’t know dotnet, so I had to clearly show Copilot what the error was before it could fix it.
Anyway, the code can be found in this repository. I’ve pointed you to the v1.0 tag, where you can find the dotnet8 code prepared using the Ready To Run feature. This is very important.
Ready To Run precompiles the code ahead of time, which significantly reduces the time required to launch it. It doesn’t solve all the problems of dynamic compilation, but it helps a lot.
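For reference, enabling Ready To Run is a single publish-time switch in the project file. This is just a trimmed fragment of a .csproj; the rest of the file is omitted for brevity.

```xml
<!-- .csproj fragment: ask `dotnet publish` to precompile the assemblies
     ahead of time instead of relying purely on JIT at runtime -->
<PropertyGroup>
  <TargetFramework>net8.0</TargetFramework>
  <PublishReadyToRun>true</PublishReadyToRun>
</PropertyGroup>
```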
Let’s look at the picture:
We can see improvements across all executions. Hot functions are faster than before, but most importantly, cold functions are much faster:
Although we do see longer initialization times!
What about the execution?
This time it was really fast! The entire execution took 55 milliseconds!
Yes, I know Powertools is missing in this version, I simply forgot to add it, but it doesn’t change the conclusions!
Now it’s time to enable SnapStart. To do this, we need to make some changes to the SAM template: we need to enable versioning for the Lambda function. The code is under the v2.0 tag.
This template enables versioning and also enables SnapStart.
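For reference, here is a minimal sketch of the relevant part of such a SAM template. The resource and handler names are placeholders; AutoPublishAlias is what gives us the published version that SnapStart requires.

```yaml
MyFunction:
  Type: AWS::Serverless::Function
  Properties:
    Runtime: dotnet8
    Handler: MyFunction::MyFunction.Function::Handler  # placeholder handler
    MemorySize: 512
    # Publishing a version is required for SnapStart...
    AutoPublishAlias: live
    # ...and this enables SnapStart for published versions.
    SnapStart:
      ApplyOn: PublishedVersions
```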
If you want to do this manually, go to the function’s configuration options, click Edit under General configuration, and change SnapStart from None to PublishedVersions.
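The same can be done from the CLI. Here is a sketch assuming a function called my-function (the name is made up); remember that SnapStart applies only to published versions, so you need to publish one afterwards.

```bash
# Assumption: the function is called my-function; use your stack's real name.
aws lambda update-function-configuration \
  --function-name my-function \
  --snap-start ApplyOn=PublishedVersions

# SnapStart applies to published versions, so publish a new one.
aws lambda publish-version --function-name my-function
```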
Let’s run our function!
I made sure to execute the function from a cold state and I saw this:
We can clearly see that there is no Init stage; it has been replaced by Restore. This means the function was restored from the snapshot and no initialization was required. The whole invocation is a bit longer (don’t worry, we’ll come back to it), and the duration of Restore is very similar to Init. This needs some explanation, because it’s very important, and I’ll come back to it in the conclusions.
For the record, here’s the execution of the hot function:
No surprises here; the results are exactly what we expected.
Why don’t I care about longer execution times?
It’s very simple: one execution doesn’t really measure anything. In the next section, you’ll see the results of running it many times.
Next experiment
I ran some “load tests”: 200 executions with a concurrency of 25. What are the averages?
- Average cold start duration: 2.75s
- Average hot start duration: 0.07s
Times are good. Especially when the function is hot.
Let me provide more numbers:
Cold start:
- Maximum execution time: 8.5s (!!!) I’m sure it’s just an outlier, but… it raises the average a bit.
- Maximum repeatable execution time: 2.74s (2.5s without the 8s-long run)
- Minimum execution time: 2.15s
As we can see, these times are very close to each other (except that one :))
Warm start:
- Maximum execution time: 0.18s
- Minimum execution time: 0.027s
Of course, there is also API Gateway to consider. But like I said, I didn’t do perfect performance testing 🙂
SnapStart works. That’s for sure. Now, you may ask, what are the benefits? I didn’t show anything, did I? Well, consider this:
I asked other Community Builders, and what I heard confirmed what I thought. If you check the code, you’ll see that it does almost nothing. It’s very simple: it reads a record from DynamoDB, increments a counter, and stores the value back in DynamoDB. That’s all.
This is why SnapStart couldn’t show its full potential here. The time required for all the initialization is simply not long enough to justify using SnapStart. Moreover, I am using 512MB of memory; if I tested with 2048MB, the snapshot restore time would be twice as long.
This means that for simple functions, SnapStart is unnecessary and will only add to your bill.
It’s time to test something a little more complex. I’m already planning to make the function bigger and do more work in it, which should show an increase in SnapStart’s effectiveness. I will post the second part later, so stay tuned!