Resolving Serverless Webpack Issues
A story about how I debugged an issue with our webpack bundles on serverless infrastructure, and the key takeaways for developing JavaScript systems.
The Problem
My team’s CircleCI deployments for our serverless stack began to fail due to a JavaScript out-of-memory error during the webpack compilation stage of the process. The logs showed Node’s standard heap exhaustion error, along these lines:
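```
FATAL ERROR: Ineffective mark-compacts near heap limit Allocation failed - JavaScript heap out of memory
```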
Specifically, we were consistently seeing issues with our user service, which currently contains roughly 22 functions. After some investigation, I discovered that the bundles being uploaded to AWS Lambda (λ) were ~5MB per function. λ can easily handle bundles of this size, but they were significantly larger than we expected. After some research, I found that webpack might simply need more memory allocated to its process to complete successfully. However, I didn’t believe our code bundles should be large enough to require this change, and I wanted to make sure we weren’t sweeping a different issue under the rug.
Around the same time, I was also beginning to see longer cold starts, which raised concern since bundle size is directly correlated with cold start time. This led me down a path of investigating why our bundles were larger than expected and how to reduce their size.
Background
My team’s backend is structured as several microservices deployed as Lambda stacks to AWS via the Serverless Framework for Node. These services share a common lib that primarily includes database models and other reusable functionality, following DRY and SOLID principles to keep the backend extensible.
As part of this, we developed a custom ORM for our database using ES6 classes that emulated the behaviors of libraries like Sequelize, Mongoose, or Rails’ ActiveRecord. This gave us classes with static members for record lookup and instance methods for entity queries and mutations.
We also created other classes around the codebase, categorized into strategic types such as business logic, utilities, API wrappers, etc. All of these classes are imported and exported throughout the system using ES6 default exports. For instance, our UserModel file vaguely resembled the following (a simplified reconstruction; the shared database helper is hypothetical):
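```js
// UserModel.js: an illustrative reconstruction, not our actual source
import twilio from 'twilio';
import { Database } from '../lib/database'; // hypothetical shared lib helper

const client = twilio(process.env.TWILIO_ACCOUNT_SID, process.env.TWILIO_AUTH_TOKEN);

export default class UserModel {
  constructor(attributes) {
    this.attributes = attributes;
  }

  // static members for record lookup
  static async findById(id) {
    const record = await Database.table('users').find(id);
    return new UserModel(record);
  }

  static async findByEmail(email) {
    const record = await Database.table('users').findBy({ email });
    return new UserModel(record);
  }

  // instance methods for entity queries and mutations
  async sendVerificationText(body) {
    return client.messages.create({
      to: this.attributes.phone,
      from: process.env.TWILIO_NUMBER,
      body,
    });
  }
}
```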
As I’ll show throughout this article, our decision to use ES6 classes and default exports created the severe issues that led us to this investigation and fix.
Finding the root cause
1. Debugging the bundling step
I could see that our deployments were failing during the webpack bundling stage, where the process was running out of memory, but I didn’t understand why. I had recently configured our system to use the serverless-webpack plugin to integrate webpack with the Serverless Framework’s packaging when shipping to AWS.
The Serverless Framework provides the `sls package` command, which packages all the functions and prepares them for deployment to your configured provider. This is the first step run by the `sls deploy` command, which also handles deploying the code to your infrastructure. As such, I decided to start my investigation here and ran the `package` command locally to see what results it would produce. This produced two major findings:
- The aforementioned ~5MB bundles being uploaded to Lambda were actually the gzipped versions of the functions. I was reading the values from the AWS console and didn’t realize they were compressed sizes. This meant our minified code bundles were larger, and I soon discovered they were closer to ~7MB.
- The raw bundles were roughly ~16MB of unminified code! I don’t know exactly how much additional memory webpack requires to perform all its operations, but ~16MB of JavaScript definitely seemed like a problem for a memory-limited machine to handle.
Given all these findings, I decided to track down why the bundle sizes were so large. Our codebase isn’t that large and some of the functions in question definitely should have been significantly smaller than ~16MB.
2. Understanding Webpack
Per a recommendation from a colleague, I read through the article “Decrease Front-end Size” on Google’s Dev site. Based on my initial discovery and a re-read of our webpack config, a few things immediately stood out:
- We weren’t running minification at all.
- We were compiling everything to CommonJS.
- We were using large third-party packages, e.g. lodash and momentjs.
- We didn’t have tree shaking configured correctly.
These were all excellent candidates for further exploration, but none explained how some of these functions were growing to ~16MB. Hence, I needed to delve deeper into how webpack’s bundling worked to understand the core issue.
3. Exploring the bundled code
To dig deeper into the bundles, I needed to understand what was going into them. I’m a visual person, so I looked for visualization options and found two popular packages for this type of analysis. Each has its individual pros and cons, but I decided to go with the Webpack Visualizer because I found its output easier to read.
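Wiring it up only needed a small plugin addition. A minimal sketch, assuming the webpack-visualizer-plugin package and an output path of our choosing:

```js
// webpack.config.js: emit an interactive sunburst visualization of the bundle
const Visualizer = require('webpack-visualizer-plugin');

module.exports = {
  // ...rest of the existing serverless-webpack configuration...
  plugins: [new Visualizer({ filename: './stats.html' })],
};
```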
I couldn’t run the `sls package` command on our user service without disabling most of its functions, which was the main problem, so I instead ran the tool on our graphql service, which was also starting to experience some issues. This required some configuration, and I ended up making some of the webpack optimizations described below before this point, which reduced our bundles by a bit, but the output still gave me some immediate data points to dig into further:
- Our raw bundle was still too big at ~13+MB! 🤮
- 97% of our bundle came from our node_modules 😑
- The Twilio node library accounted for over 50% of our bundle at 7MB 🤯
Twilio being the main culprit was shocking. I had expected either lodash or moment to be the offender since this service uses both of those libraries, but, more importantly, the service in question doesn’t even use the Twilio package. In fact, Twilio is only used in 2 specific lambdas in the user service, so it had no place being in this bundle whatsoever. So why was Twilio being included?
I stumbled upon the WHYBUNDLED library, which provides tooling to investigate webpack stats output and explain, via dependency trees, why a particular package was included. I hit a small snag at this point: the serverless-webpack plugin doesn’t expose the same CLI options as webpack, so producing the stats.json that WHYBUNDLED consumes wasn’t directly possible. I ended up finding the Webpack Stats Plugin and, with some minor configuration changes, was able to get the needed stats.json. Finally, I could run WHYBUNDLED against our service.
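For reference, a sketch of that configuration (the exact paths and invocation below are assumptions):

```js
// webpack.config.js: emit a full stats.json next to the bundles
const { StatsWriterPlugin } = require('webpack-stats-plugin');

module.exports = {
  // ...rest of the existing configuration...
  plugins: [
    // fields: null tells the plugin to write the complete stats object
    new StatsWriterPlugin({ filename: 'stats.json', fields: null }),
  ],
};
```

From there, running something like `npx whybundled stats.json twilio` prints the dependency chains that pull the package in.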
According to WHYBUNDLED, Twilio was being imported by the graphql main handler function. Looking at that file’s imports, I didn’t see Twilio imported directly, but I did see that our UserModel was being imported, and it does include the Twilio package. This suggested a flaw in our usage of ES6 classes and default exports, but we needed to try a few simpler tests first.
Solution Attempt #1: Patch it for later?
Like most startups, we didn’t really have time to rethink big architectural decisions we had committed to months ago, and we wanted to avoid spending too much time trying to solve this problem. Additionally, we were fond of our architecture and were happy to try to keep it. As such, I started by looking for quick solutions to the problem. I went down a few different tracks, and these were the results.
AdonisJS’s Configuration
A colleague told us about AdonisJS, an MVC node framework that aims to build smaller, modular systems deployable across different environments. Our colleague thought AdonisJS might suffer from some of the same setbacks we were seeing with our class structure and might already have a solution. After digging under the hood for a bit, I discovered AdonisJS is built using module exports, but its ORM layer, Lucid, appeared extremely similar to how our system worked. A Google search or two investigating AdonisJS + Lambda later, I found a forum discussion about deploying Lucid to Lambda that sounded a lot like our issue. In short, AdonisJS suffers from similar problems, and its configurations would not solve ours. 😢
Google’s Closure Compiler
I read somewhere that Google’s Closure Compiler supports dead-class code elimination. There’s also a webpack plugin for the Closure Compiler, so I decided to at least try it. I timeboxed this experiment to one hour because of the complexities I had read about in getting the Closure Compiler to work. After about 30 minutes, though, I discovered the setup was going to take more than a few hours plus a rewrite of a bunch of code, so I abandoned this solution. 😬
TypeScript
After some research, I found that TypeScript’s loader for webpack might support the static class member tree shaking I was looking for. Like the Closure Compiler, I timeboxed this effort to an hour. After quickly installing TypeScript and ts-loader, and getting our config set up with some minor adjustments, I ran the bundle and found we had the same issue afterwards. 😭
Solution Attempt #2: Quick Wins?
Unfortunately, finding a quick patch ended up being a long shot and didn’t work out. This left me with some of the more expensive solutions which were less than ideal. However, there were the findings from the original webpack investigation which seemed like some relatively quick wins that might solve the problem. The specific issues we wanted to solve were:
- Compiling everything to CommonJS
- Large package misuse
- No minification
Compiling everything to CommonJS
Due to the cold start issue, we had installed the Serverless Warmup Plugin to keep certain functions warm. During setup, however, we had issues getting the code to bundle and run correctly. After some research, we found that changing our bundle target to CommonJS got things running again, so that’s what we did. However, the Google Dev article warns directly against this: compiling to CommonJS makes it impossible for webpack to apply minification and tree-shaking, which was a big blocker to solving this problem.
As it turns out, we had installed the warmup plugin incorrectly. Fixing its installation and switching back to the auto module targeting quickly resolved this problem. Unfortunately, that alone didn’t do much; it only shaved a few hundred kilobytes from the bundle. The bigger win was that I could now turn minification and tree-shaking back on, so it was a move in the right direction.
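For reference, here is roughly where that setting lives, a sketch assuming the module format is controlled by @babel/preset-env (the runtime target shown is hypothetical):

```js
// babel.config.js: a sketch assuming @babel/preset-env controls the module format
module.exports = {
  presets: [
    [
      '@babel/preset-env',
      {
        targets: { node: '12' }, // hypothetical runtime target
        // 'auto' preserves ES module syntax so webpack can tree-shake;
        // hardcoding 'commonjs' here is what disabled that
        modules: 'auto',
      },
    ],
  ],
};
```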
Large Package Misuse
This seemed like a pretty straightforward problem to solve. From the Google article and other sources, there were two quick things to look into: lodash and moment.
Starting with lodash, I read that tree shaking doesn’t really work unless you use lodash in specific ways. Then I discovered those articles didn’t apply to Lodash v4 when using ES6 module import statements correctly, which is what we were doing 😄. However, if you import `chain` anywhere, it pulls in the entirety of lodash because of how that function is written 😰. It turns out we had a single usage of chain, which I quickly replaced to alleviate this issue.
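To illustrate the kind of change involved, here is a hypothetical example (not our actual code):

```js
// before.js: importing `chain` defeats tree shaking, because a chain
// sequence can dispatch to every lodash method
import { chain } from 'lodash';

export const activeNames = (users) =>
  chain(users)
    .filter((user) => user.active)
    .map((user) => user.name)
    .value();
```

```js
// after.js: the same logic with tree-shakable per-method imports
import filter from 'lodash/filter';
import map from 'lodash/map';

export const activeNames = (users) =>
  map(filter(users, (user) => user.active), (user) => user.name);
```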
With moment, I read that by default it includes all of its locales, which add ~200KB of minified code to bundles. Reading further, I discovered we were already ignoring the moment locales correctly in our webpack configuration, so this was a non-issue. 😅
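For anyone who hasn’t set this up, the widely documented webpack 4 pattern looks like this:

```js
// webpack.config.js: keep moment's ./locale directory out of the bundle
const webpack = require('webpack');

module.exports = {
  // ...rest of the existing configuration...
  plugins: [new webpack.IgnorePlugin(/^\.\/locale$/, /moment$/)],
};
```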
With the two well-documented, easier wins out of the way, this left Twilio’s node package. As it turns out, it is not tree-shakable and adds ~2MB of minified code to your bundles. At the time of this article, we’re still looking for a better solution for Twilio’s node package, but we found a way to limit its impact on our bundles.
No Minification
When we first started the project, we had installed serverless-bundle to manage our webpack needs. It came out of the box with Webpack 4, which ships with Terser enabled. As our application matured, we needed features that serverless-bundle couldn’t provide, so we converted the project to use serverless-webpack instead. During that setup, we cut some corners and mostly copied serverless-bundle’s webpack config verbatim, then added the features we wanted. One of the bits we copied was its optimization config, which carries the note: “Don’t minimize in production. Large builds can run out of memory.” Clearly, we ignored this without understanding why the feature was disabled.
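The relevant piece of that copied config looked something like this (paraphrased):

```js
// webpack.config.js: the optimization block copied from serverless-bundle
module.exports = {
  // ...rest of the configuration...
  optimization: {
    // Don't minimize in production. Large builds can run out of memory.
    minimize: false,
  },
};
```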
As this was in direct conflict with the problem I was trying to solve, I needed to turn minification back on. I started by enabling webpack’s default minifier and running the bundle. Two minutes later, my system kernel panicked and crashed. I tried increasing node’s memory allocation with `--max-old-space-size`, but that didn’t help either.
Next, I tried tweaking the minifier and swapping Terser out for alternatives. Playing with a few different configuration options, I kept hitting the same heap crashes. At this point, I had exhausted the faster options and needed to delve into more extreme solutions.
Solution Attempt #3: ES6 Modules (FTW!)
The only thing I hadn’t tried yet was converting all our classes to ES6 modules. I had been avoiding this because of the sweeping changes required to make it work and the impact on our dev team’s day-to-day work. But like I said, I had exhausted the other options and needed to try something.
Rather than rewriting the entire codebase, I decided to identify high-impact models that could definitively prove my hypothesis with the lowest effort. Based on the investigation so far, the UserModel was the best candidate: it appears in most of our services and was the only file using the Twilio package that seemed to be causing our issues. I ran this test in two phases:
- Convert statics to exported named functions
- Move functions to their own files for export
Phase 1
Given the earlier UserModel example, we made the following changes (again sketched for illustration):
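```js
// UserModel.js after phase 1: statics become named exports
import twilio from 'twilio';
import { Database } from '../lib/database'; // hypothetical shared lib helper

const client = twilio(process.env.TWILIO_ACCOUNT_SID, process.env.TWILIO_AUTH_TOKEN);

// the class keeps its instance behavior...
export class UserModel {
  constructor(attributes) {
    this.attributes = attributes;
  }

  async sendVerificationText(body) {
    return client.messages.create({
      to: this.attributes.phone,
      from: process.env.TWILIO_NUMBER,
      body,
    });
  }
}

// ...while the former statics become named exports that webpack can tree-shake
export const findUserById = async (id) => {
  const record = await Database.table('users').find(id);
  return new UserModel(record);
};

export const findUserByEmail = async (email) => {
  const record = await Database.table('users').findBy({ email });
  return new UserModel(record);
};
```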
Running our analyzer, this yielded fantastic results: our ~16MB became ~5.9MB unminified, and webpack was bundling correctly again. The new minified source was ~2.4MB. While that still isn’t great, it was infinitely better than our ~16MB builds!
Phase 2
Given the results of my Phase 1 test, I could have stopped and called it the solution, but I decided to take it a step further and see if there was more benefit in creating proper module exports. So, I transformed things even further:
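```js
// models/user/findUserById.js: one function per file (phase 2)
import { Database } from '../../lib/database'; // hypothetical shared lib helper

export const findUserById = (id) => Database.table('users').find(id);
```

```js
// models/user/sendVerificationText.js: Twilio is now only bundled
// into the lambdas that actually import this file
import twilio from 'twilio';

const client = twilio(process.env.TWILIO_ACCOUNT_SID, process.env.TWILIO_AUTH_TOKEN);

export const sendVerificationText = (user, body) =>
  client.messages.create({
    to: user.phone,
    from: process.env.TWILIO_NUMBER,
    body,
  });
```

With this layout, a lambda that only needs `findUserById` never pulls Twilio into its bundle at all.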
To our surprise, the bundle size for user functions dropped even further, to ~2.9MB raw and ~1.4MB minified. Clearly, this was the optimal approach. We didn’t see the same gains in our graphql service, but we now have other optimization options thanks to this information.
Solution
Given the results, ES6 modules perform better in a webpack environment and lead to smaller code bundles in large systems. I eventually finished converting the UserModel and its related dependencies to lock in these optimizations, and my team committed to refactoring the rest of the system as we rebuilt features.
Conclusion
While our initial design of models with static methods is a good idea implemented throughout different frameworks and tools, it does not work at scale in serverless systems and should be avoided. Following ES6 module best practices regarding small exportable elements is the golden solution and should be followed when possible. We haven’t explored what TypeScript would do for our system or found solutions to our bigger package issues (currently Twilio and the mongodb node driver), but we now have a path forward that will allow us to build at scale.
All code examples can be found at https://github.com/dustinsgoodman/serverless-webpack-optimization-article. It includes the HTML documents that contain the webpack visualizer tool results as well as stats.json files that can be used with WHYBUNDLED.
Credits
I’d like to thank the following people for their contributions in helping me find the solution I arrived at:
- Matt Van, Founder of Optic Power
- Scott Chen, Director of Engineering at Optic Power
- Nathan Welch, Director of Engineering at smash.gg