Principal Engineering at AWS Scale, with Nikki Pinski

Ep 9

May 17, 2023 • 47 min

ABOUT THIS EPISODE

Nikki Pinski, Principal Engineer at AWS, shares how AWS is organized for autonomy and customer-centric product development, dealing with millions of events per second, and how Staff and Principal Engineer career tracks work across different software organizations.

TRANSCRIPT

Allen: Welcome to It Shipped That Way, where we talk to product leaders about the lessons they’ve learned helping build great products and teams. I’m Allen Pike, and we’re joined today by Nik Pinski. Nik is a Principal Engineer at AWS, where he works in the Serverless group at the world’s biggest cloud provider. Welcome, Nik. Glad to have you on the show.

Nik: Thanks, Allen. I’m glad to be here.

Allen: Yeah, I’m excited to dig in. We’ve had a lot of folks on the managerial track talking about how to manage people and motivate their teams, but I am excited to talk about the staff-plus engineering track: folks who are in leadership, but not necessarily on the management side of things.

Nik: Totally. Soft leadership is its own type of both curse but also heaven in some ways, depending on the day.

Allen: Yeah, I think all kinds of leadership are a mix of curse and heaven depending on the day. We’ll dig into that. I’d also love to dig into some of what makes AWS tick as an organization, one that almost all of us are probably familiar with in the way that it powers a lot of our lives today. But first, I want to give you a chance to give us a little bit of context, kind of a sound-bitey overview: how did you get to be a Principal Engineer at AWS? You probably didn’t just graduate high school and then, “All right, here’s the role.” What did that path look like for you?

Nik: No, in fact, when I think about my earlier career, I tell this openly to junior engineers who join, I don’t think I would’ve been able to succeed at a company like Amazon, or really anywhere in what we call big tech these days, when I was a junior engineer first coming out of either high school or university. I was a little bit adrift at that time, I knew I liked technology, I knew I liked geeking out on code and building things, but I wasn’t really sure if this was actually going to be the right thing for me and it took me some time before I figured it out. Like you, I’m a Vancouver based software engineer. I went to UBC. I wasn’t born in Vancouver, but my family immigrated to Canada when I was 10, so basically I’m from here. When I first graduated UBC with a computer science degree, I really wasn’t sure what I wanted out of a career or life. I guess, probably pretty common, and I actually ended up just hanging around university for another year, taking classes and thinking about grad school, and ran out of money, and got a job because my parent’s friend was hiring, and would end up being at the small Victoria based foreign currency exchange. Eventually, that actually ended up unlocking something in me and what it was, was I really enjoyed the process of how they converted business requirements to technology, and I found that that was actually my passion more than specifically the latest frameworks or the latest whatever. I just really liked solving problems ultimately, and what that was, whether that meant building something or not building something, I just really liked the process of that. That was sort of my first passion that I discovered, just the collaborative experience of domain-driven design and coming up with a ubiquitous language for a problem. 
I found modeling a system really interesting. Then I rose through the ranks at that company, and I was probably there for about seven or eight years before I realized I was just getting bored and was seeking a new challenge. That was around the same time that Amazon had just come to Vancouver. That was 2013, and at that point, I was still in the first hundred employees, I think, in Vancouver. Now we have over 3000 in Vancouver alone. What attracted me there was, at that point I was like, “Well, yeah, this cloud thing, what’s that all about?” And I heard they had some pretty intense problems of scale, and it was very different than what I was doing in my career at that point, but it seemed like a good challenge. I joined as an Intermediate Engineer, and I got promoted twice, so first to a Senior Engineer, and then eventually to a Principal Engineer, which I guess some companies call staff, and then they have another level beyond staff called principal. So at Google, staff, I think, is comparable to an Amazon principal. There’s always overlap, there’s some senior staff, it’s fuzzy. There’s a lot of overlap between these.

Allen: Yeah, it’s interesting. There’s a website called levels.fyi, people will debate about how accurate the information on that website is, but what it’s trying to do is trying to create a little bit of a conversion cheat-sheety thing where you’re trying to understand, okay, well I’m looking at this either a potential hire or a potential role, or trying to make sense of something that somebody said from this organization. And let’s say they’re at Amazon and they’re a Principal Engineer, and then this other person is at Apple and they’re ICT6. And it’s like, okay, and they don’t even have a name for it. It’s like, are these people relatively similar in seniority? Or this is a big difference here? And then there’s orgs that, like you say, will do staff and then principal, or sometimes they’ll go staff and senior staff and then principal, and then there’s in extreme cases, there’s distinguished engineers, which there might only be one in the entire organization and stuff like that.

Nik: I think we have 12 or something at Amazon, and I think more than half of them have Wikipedia pages, so.

Allen: Yeah.

Nik: While that is a sign of the path, towards senior principal and beyond to distinguished engineer, that’s not really one I’m expecting to reach myself. The Wikipedia prerequisite is not written down anywhere, but I do think it helps.

Allen: Yeah, it’s correlated, right?

Nik: Maybe a little.

Allen: You’ve invented something. Well, I’ve seen before actually career ladders that explicitly say, at this level, which is a distinguished engineer or whatever, you’re likely to have invented something that is meaningful industry-wide.

Nik: Right.

Allen: Which is a cool thing to aspire to, but it’s not one of those requirements. I think an interesting thing, and we can actually jump to this since it was one of the topics I wanted to get to, is that folks who have not worked in big tech, or larger tech that has a formalized career ladder and rigorous thinking around this stuff, might not be familiar with the term “terminal level.” Do you use that terminology at Amazon?

Nik: Yeah, we do. Although I don’t know if there’s a gentler term for it.

Allen: It sounds bad, right?

Nik: Sounds bad, right, yeah. When I joined 10 years ago, the terminal level was senior: there were junior engineers, intermediate engineers, and then senior engineers, so you had to get promoted twice. Now the terminal level got bumped down one, and I think it’s happened across big tech, so that basically you can get promoted once, but after that… In big tech, the role model for an intermediate engineer, and I’m not saying that just because that’s how I got hired, used to be someone who was a senior engineer at a smaller company.

Allen: And now you’re an intermediate engineer at a big-

Nik: Intermediate at a big company. And where things got interesting is that that probably worked for most of the 2000s, and maybe the 2010s. And then just in the last couple of years, in the quote-unquote great resignation, of course this year is a little different, but in the last two years there were just so many unicorns and startups that could actually hire those folks and basically give them senior offers for more money and more responsibility. So it became really hard for big tech to down-level the way that they historically have, where you basically take a person who’s already probably not writing code. I wasn’t writing code when I started at Amazon. I was already a quote-unquote tech lead, just doing code reviews and architectural design diagrams, not really writing code full-time except very rarely. And I came in, and suddenly my only job was to write code. And I was like, “Oh, this is actually kind of great.” I still love working with people like that, who are a little bit overskilled for the job, because it means they have good judgment and they can work autonomously and deliver a lot of great stuff. And if you can reach those people and say, hey, that’s the compelling value proposition for you to come over, it can be an amazing match in terms of getting a lot of stuff done at high impact and a person who’s really satisfied with their job. But it’s impossible to hire those people right now.

Allen: Yes, and especially it was a year ago. One thing, before we lose the thread: I want to put a cap on this idea of the terminal level, or there’s probably nicer terms that some organizations call it. Generally, when you’re building out these advanced career ladders, so it’s not just, okay, we have junior, intermediate, senior, the best practice now, and it’s pretty standard in most larger companies or even medium-sized companies, is giving a career path for folks that isn’t into management. You can be a senior software engineer, and then it’s like, well, where do you go from there? What are your options? And you try to avoid this old pathology that companies a generation ago led to, where if you want to go past senior software engineer and you want to move in your career, you need to become a people manager, which you might not like or be good at, but that’s what you kind of have to do. And so there is now this model, generally copied across most of the industry, where you have three options. One is that once you get to the point where you’re a senior software engineer, or, as you’re describing, now in some orgs intermediate is this level, you could just stop. You can keep doing a good job and have good performance, but you don’t have to keep getting promoted in order to stay on, which maybe sounds obvious: of course you don’t have to keep getting promoted. But that’s not true if you get brought in as an intern or junior engineer, and maybe this isn’t totally obvious to people who are early in their career, though I think most of our audience is on a leadership path or track at least. If you come in as a junior engineer and then you just stay as a junior engineer, you’re not really growing, you’re not getting any better, and you’re eventually going to be asked to shape up or ship out at some point. 
And that’s true up until you get to, in any given organization, what they will call, or in any ladder, the terminal level, meaning you can optionally stay here and that’s okay. You’ve gotten to a level of seniority where you’re not necessarily, in order to continue in your career to stay employed here, you don’t have to move up to principal engineer in order to stay. You could stay as a senior engineer, and that’s great if that’s the level of work that you want to do.

Nik: Well, what I see a lot of people in the industry struggle with is this idea of, oh, I was told I had a great performance review, or maybe you had exceeded expectations, that means I’m ready for a promotion, right? And it turns out the answer is no because, at least again, it depends of course on the company and the amount of job levels they have and how rigorously they’re defined, but at the companies where there’s a fixed ladder, the expectations of each role are so different, it’s almost like a different job at each level, and exceeding expectations at one level does not mean that you’re ready for the next one. It’s entirely a different job. Now, if you want that different job, great, and you should talk to your manager and they probably will offer you the opportunity to start trying it, and then they get you promoted to that. But it doesn’t necessarily mean that, just because for example, you are the world’s best coder, that means now you’re going to be the next level up of engineer at any random company. And that’s always difficult, I think, especially for folks in our industry who generally are kind of more INTJ kind of focused on, okay, what are the milestones? Give me a checklist, I will meet the checklist and then I’ve done a good job, yes? And you will give me a new title and more money. It’s like, well, yeah, that’s how maybe it should work, but it doesn’t always, it doesn’t usually, I think.

Allen: That’s probably more true for junior engineer roles. Most of the time, most managers get pretty good at helping steer junior engineers into how do you become an intermediate engineer because that is the success path. If that’s not happening, then there’s something wrong. The reason you hire junior engineers is not to have junior engineers, it is to turn them into strong intermediate engineers, right? And so most of the time if you’re exceeding expectations of junior engineer, in most orgs you’re probably tracking towards getting promoted to intermediate engineer. Where it starts to become fuzzy is when you’re a senior engineer, and like you’re describing, I’d be interested actually for your recap, at least from memory of how AWS sort of describes it as different, but most orgs once you get past senior engineer, have a fairly different way of describing what does success look like for a staff plus level. And your senior engineer might be like, you write lots of really high quality code then, and then so you really exceeded the expectations of how much high quality code you would write. And like you were sort of alluding to, at staff level or principal level, it might be like you leveled up the code that entirely other parts of the organization maybe, you’re writing or things like that. So I don’t know, how does that work at AWS? What are the sort of things that the organization looks for? I mean, you’ve gone through this path long enough ago, that I assume you’re also involved in some of the decision making in helping other people move up this path. What are the key differences in what that track looks like when you get into that staff plus principal level?

Nik: It’s definitely a part of my job to help others level up. One of my mentors in the past said that the best output of senior engineers is more senior engineers: you bring people up to that level, and what’s the best way to force-magnify it? And I feel the same way with the people that I work with. To some degree I try to mostly work with people at my level or above, or if I’m talking with people below my level, I’m trying to share what the people above my level are sharing with me and magnify it. And also then to get them autonomous and bring them into those meetings, and bring them into those discussions, and start them thinking about the way that you get to that level. The AWS approach to this is hard to quantify, because one of the things that has kept me at Amazon for 10 years is that very rarely has anyone concretely defined to me what my job description was. Obviously, I knew what products I was working on, and I knew that broadly speaking we had customers and they had needs and we were working on those, and so that ultimately becomes requirements and that becomes software, lines of code, and things like that. But the exact spot where my participation or my role started and where it ended was always very fuzzy. When I came to Amazon, I came from a company that had QA and had Ops engineers who would deploy code to production. And I suddenly joined Amazon and I was on a team that didn’t have any of those, and every engineer was responsible for designing, developing, testing, and releasing to production all their software. And that’s become more and more common, I think, in our industry now. 
And you hear about DevOps or platform engineering, there’s a bunch of buzzwords around it, but ultimately it means that there’s a perception, and I think it’s become more ubiquitous, although maybe somebody is challenging it these days, it’s always hard to tell, that the people writing business logic code should also be the same people who understand how it’s deployed to production, even if their decision on how it gets deployed to production is, I’m going to delegate this to a managed service or hand some of the operations to the cloud. You understand the constraints and limitations of your dependencies, and you’re not just writing code that receives one object as input and outputs another one. Similarly, on the role level itself: when I came out of university in the early 2000s, the feeling in much of the industry, or at least some of it, was that developers receive requirements and output code for QA to test, and that was the pipeline. And what I found is that as my own career progressed and my seniority grew, my own involvement in software became earlier and earlier. The career ladder talks about, for example, how intermediate engineers take well-defined requirements where maybe the architecture isn’t defined, and they define the architecture, and then they implement. And then senior engineers take maybe kind of undefined business problems that exist and come up with technical solutions for them. And then beyond that, you’re identifying business problems that maybe nobody’s even asking about, maybe the customers are not even asking for yet, and you’re coming to customers saying, “Hey, what would you say about,” blah. “Would that make your life easier?” And if they say yes, then you’re like, “Okay, let’s talk more. Let’s figure out what that looks like and then work with product managers and identify what that solution looks like.”

Allen: What does that relationship then look like? At least in your organization, because Amazon has literally one million employees, so you can’t speak for all of Amazon, or even for all of AWS; I understand you’re in serverless. Actually, I’ll pop up one level: is AWS organized into units? I assume it’s not just that there’s the 150 different icons on the dashboard console, and so there’s 150 unrelated teams. There’s some sort of structure to it?

Nik: Definitely. Okay, I can walk you up the chain. I’m one of the two principal engineers on the AWS service called EventBridge, which used to be called CloudWatch Events, and it’s our event bus service. Most people listening are familiar with event buses; AWS has one. There are also queuing services that are one-to-one, and topic services that are one-to-many, and event buses are many-to-many, and that’s ours. We have an org called Events and Workflows, and our nearest neighbor in that org is a service called Step Functions, which is our AWS orchestration service. With event buses you can create pretty advanced choreography-based architectures in the cloud, and with Step Functions you can create workflow services, and choreography and orchestration kind of go hand in hand, so we have an org around that. And then there’s a few services peripherally related to them as well. There’s another service called Simple Workflow, which was the original workflow service, and ironically the one I actually applied for, but during the interview they were like, “Actually, you’re going to work on a different team.” And I was like, “Okay.” That happened a lot 10 years ago, I think. And so yeah, we have this workflow and events org, and all these services. We also have scheduling in this organization, so we have a serverless scheduler, and that’s about, I would say, 200 people across these. This is probably five or six services, including our managed Apache Airflow service, and an internal service called the Distributed Job Scheduler, which also does kind of fancy scheduling and orchestration stuff. And then that exists inside a larger org; we used to be a part of Serverless, with Lambda and API Gateway, kind of all those services that people typically gravitate to when they first hear the word serverless. 
And now we’re also kind of nearest neighbors is also the containers orgs as well, like the Elastic Container Service, and the Kubernetes Service. And at this point now you can see the affinity starts to diverge a little bit, but there’s still a lot of common elements, especially in my service. What I really love about what I work on right now is that EventBridge, kind of it’s in the name, is a bridge service, it’s there to integrate other AWS services. So I talk to customers that use a lot of AWS, and they’re comfortable with using the right tool for the job, and they want the integration to be smooth and seamless and as easy as possible, and we help with that. But sometimes we integrate with AWS services whose customers only use that service, and they use nothing else. And there’s actually tension between the fact of what those customers that use a lot of AWS and a lot of services want versus customers that really just want, let’s say, AWS to orchestrate some of their container deployments and basically nothing else. And that comes up with some interesting challenges between how do you build something that works for both? Because a lot of times you just can’t, they just have conflicting needs. AWS is too big, and you can optimize for the largest customers, the enterprise customers that maybe spend many, many, many millions of dollars on the Cloud, versus the long tail of individual developers. There might be a hundred times more of them, but you only make a dollar a month from each of those people, and so the things that they need is some -.
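
For readers less familiar with the distinction Nik draws above between queues (one-to-one), topics (one-to-many), and event buses (many-to-many), here is a minimal Python sketch of the three delivery shapes. The class and method names are hypothetical and purely illustrative; they are not AWS APIs and make no claim about how any AWS service is implemented.

```python
from collections import deque


class Queue:
    """One-to-one: each message is handed to exactly one receiver."""

    def __init__(self):
        self._messages = deque()

    def send(self, message):
        self._messages.append(message)

    def receive(self):
        # Whoever calls receive() first gets the message; nobody else sees it.
        return self._messages.popleft() if self._messages else None


class Topic:
    """One-to-many: every subscriber gets a copy of each published message."""

    def __init__(self):
        self._subscribers = []

    def subscribe(self, handler):
        self._subscribers.append(handler)

    def publish(self, message):
        for handler in self._subscribers:
            handler(message)


class EventBus:
    """Many-to-many: any producer may put events; rules pick the targets."""

    def __init__(self):
        self._rules = []  # list of (predicate, target) pairs

    def add_rule(self, predicate, target):
        self._rules.append((predicate, target))

    def put_event(self, event):
        # Deliver to every target whose rule accepts the event.
        delivered = 0
        for predicate, target in self._rules:
            if predicate(event):
                target(event)
                delivered += 1
        return delivered
```

The design point is that producers on a bus do not know who consumes their events: consumers attach rules after the fact, which is what makes the choreography-style architectures Nik mentions possible.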

Allen: I’ve seen that evolution firsthand, or secondhand I guess, supporting a small company. We have needs, and we’ll go in and be like, “Oh, let’s configure this thing on the AWS IAM portal, which is their thing for authentication and permissions.” And it’s like, this is 100 times more complicated than it needs to be for what we’re trying to do. Why is that? It’s an entire DSL for describing this stuff, and we’re just, “We have three users, what is going on?” And it’s because of exactly what you described: there’s a whole bunch of people that just want three things, but then General Electric has however many tens of thousands of people that all have different needs and things like that. So when-

Nik: You’ve probably seen that curve where, as you gain knowledge, you feel like you’re an expert, and then you encounter a certain threshold and you start dipping.

Allen: Yes.

Nik: Where your confidence suddenly plummets.

Allen: You feel like you know less, yeah.

Nik: Yeah, I feel like that. Actually, I had that sudden collapse in my own confidence only recently, because I set out to become an identity and access management, that is IAM, expert last year. I mean, expert insofar as what I need to do my job, in context. And it made me realize just how much inconsistency and how many backwards compatibility issues exist. It’s a sort of microcosm of the Microsoft problem, where Windows has all these solutions to make sure the customers’ workloads keep running. We have the same in AWS. We are very, very vigilant about not breaking backwards compatibility. And that means that there’s often times when you want to introduce something that makes perfect sense, and feels like not a breaking change, but some very smart people that I now talk to regularly will explain to you exactly why certain customers will have just a heart attack if you ever try to do it like this, because it breaks all their mental models and integrations. It’s an interesting challenge, how to still ship anything knowing that this exists out there.

Allen: I remember I’ve had, a few times, arguments with peers: why does AWS, they’re nuts, always make all these new services? There’s like 10 new services announced every year, or probably way more than 10 new ones now, and some of them are like, “Okay, there’s AI now, and so we want to explore what we can do for AI,” or whatever. But some of it is like, okay, well, we have this legacy thing that is super flexible and can serve 10,000 different needs, and it’s a huge octopus with a whole bunch of backwards compatibility, but some people have just a subset of those needs. And so, tell me if I’m off base on this, but this is the sense I get from the outside, you’re trying to be competitive enough to serve some of those simpler needs in ways that don’t have to have all of the Cthulhu-level complexity underneath the surface, to give some answer to the ever-growing industry of companies that just take AWS and create a nice simple UI for people who have a simpler problem and don’t necessarily want to learn all the depth of IAM and EventBridge and all those sorts of things. Heroku is an age-old version of this.

Nik: That’s definitely part of it. I mean, there’s definitely a subset of services that are like that. Lightsail is one, and there are definitely a few services that you can think of as going up the abstraction stack, removing complexity, and having sensible defaults. I would say the majority are not. We’re typically always very careful to find ways to introduce the simplification in the underlying service. It’s not always possible. I don’t know if you’re familiar with this, but there’s a tool called the Cloud Development Kit, which is a client-side solution for generating infrastructure as code, and it does a lot of really nice things on the client side before it outputs the CloudFormation, which is ultimately the language of the AWS cloud, the declarative infrastructure definition. And it’s somewhat controversial, because you can be extremely efficient in it, but people wish that those efficiencies existed on the server side rather than being inside your client-generated infrastructure. And I totally sympathize with those folks that want that. It’s not always possible. I will say, though, that the majority of services and products that AWS builds, I mean, not a majority, all of them, are always based on a specific customer need. Some customer says, “I want to be able to do this, I don’t have a way to do that.” Sometimes those things are what you said: I just want it to be simpler. But a lot of times it actually does something unique, and people realize, oh, it doesn’t fit into any existing services, let’s build a new one. But what happens is, as soon as you release things and people start using your software in production, it’s a little bit like the death of the author thing with novels and interpretation: you kind of lose your original intent for what it was built and designed for.

Allen: And then it’s what people use it for, how they use it.

Nik: It’s how they use it, and so when they come to you the next time and say, “Can you build blah?” you say, “Well, that feature does make sense, but it doesn’t really belong in this product. You should use this other product instead, which already does that.” And they say, “No, I don’t want to use that.”

Allen: I’m already using this thing.

Nik: “It’s your product, I’m already using it. Why can’t you just do it for me?” And we hem and haw, and we talk about migrations and we talk about portability, but ultimately, at some point if enough customers ask for it, it’s going to get built, and it’s going to get used, and that then dilutes the very crisp product delineation between, this service is for this, and this service is for this. But, the customer is kind of right, it makes life harder of course for new customers coming on, if they’re trying to learn, and you can tell them, “Oh, we have these purpose-built features, databases for different use cases, or application integration services for queues and streams and topics and buses.” But when you start looking closer to the details, you’re like, “Well, wait a second, why can this service do blah, that really makes sense for all of them, but the other ones don’t do it.” And the answer is, “Well, their customers were more needy and more pushy and they asked for it.”

Allen: Or maybe there was more of them. I mean, that’s one of the things they-

Nik: Maybe more, absolutely, yeah.

Allen: When you’re looking at, and I’m sure you’re even more familiar with this than I am, being on the inside of it, but when you’re a customer and you’re looking at, okay, there’s 999 different services that AWS offers, and it looks like at least three of them might address this problem that we have. One of the things you’re trying to do, and you’re Googling, and talking to other people who build on AWS to try to build your mental model, of course the docs say, “This service is great and it can solve this problem.” But you also want to know, is this a popular service? Are other people trying to do what I’m trying to do? Because I’m trying to do X, and it looks like other people are doing things, but if they’re all trying to do Y, I’ll be the only person that’s putting in a support message, and of course I’m the small fry, and this startup, being like, “Hey, can you add support for,” whatever, and it’ll be in the bottom of the pile. Whereas if it’s something that everybody’s using, especially if a lot of people are using for the same thing as me, then I’m way more likely to be swimming with the tide.

Nik: That’s a legitimate problem. I do think that we are trying to get better at it, and it’s something that my service, EventBridge, tries to really internalize and help people with. We’ve recently launched this thing called Pipes, which is a way to connect existing AWS event sources to AWS targets, and it was very much in this mental model of, you should be able to have a common tool that you can use whenever you’re trying to just consume events from, let’s say, SQS or Kinesis, or any source, and send them somewhere. You should be able to reach for this generic thing. It won’t be able to solve all your problems, but it’s a good starting point for most problems. But to tie back to something you asked earlier, a lot of people will criticize AWS for this, and I think it’s legitimate, that there’s so much inconsistency and so many services. The reason this happens is that it’s very much a feature of the intentional bottom-up organizational design, where each of those teams has a lot of autonomy. Each of these services is its own standalone business. They have a general manager who owns the profit and loss of what they try to do. And of course, obviously, we’re a huge behemoth as well, and we have central marketing teams and all that other stuff, but ultimately there’s very, very little top-down, forced guidance on what you have to build. There’s actually a small list of, every service must launch with blah, and that small list has definitely grown over the years, but if somebody saw the internal list of mandatory requirements for new service launches, it’s still probably much smaller than people realize. And there’s still so much autonomy for each individual team to make decisions on what they believe is the best for this domain, this context, their customers. 
And what’s attractive about that is that typically that process of being deeply connected to the customers who are asking for it means that every time we build something it tends to get pretty fast adoption, which, for me, is extremely addicting. My last three, maybe four projects before I came to Amazon, over my last three or four years in the industry, were projects of almost a whole year each that either got canceled close to launch, or launched and then went nowhere. I found that very dissatisfying, writing code that no one would then run and use, and I haven’t had that problem since I came to Amazon. That’s been very, very rewarding for me, and I hope for others too who work on this stuff, because, I think, what’s the point of doing anything if it doesn’t have an impact? You’re doing it for a purpose. The output isn’t the code, the output is the problem solving you’re giving to somebody.

Allen: Yeah, that customer orientation goes back to one of the famous things they’ll tell in the lore about Amazon as a company, which is customer obsession. And you hear often about that in the retail side of the business, which I understand is very separate from the AWS stuff. But, it’s interesting to me to what degree the org structure of AWS has been set up for success at the product-by-product level. Obviously, as we were talking about, it has some weird side effects in terms of the overall strategy of all of AWS. You end up with the old way the console used to be, where you open it up and there’s 999 icons, right?

Nik: Right.

Allen: And so it has some trade-offs, but you end up with this tight loop where you are talking to customers, and I say you, not just your org, but Nik is talking to customers on a regular basis. I don’t know how many times a month or whatever, a week or how you quantify how often you’re talking to customers, but it sounds like you’re regularly doing it, and then that’s driving this feature. Obviously, there’s a bit of a dynamic where you’re trying to balance between this one really big customer that’s a bit of a whale and you’re like, “I definitely want to serve them.” But then also, presumably, and maybe you can describe this in a little detail, you have folks either on the product management side, or maybe you’re doing this yourself, synthesizing: we also have all these smaller users, and in aggregate, what are their needs?

Nik: Absolutely. The answer to your question, how often, is feast and famine, right? It’s kind of like the rest of the role, there are different parts of the year depending on what’s going on. If there’s a new project and we’re trying to get clarity on it, I might be talking to customers multiple times a week, multiple times a day, trying to get a clear answer on some specific differentiator, and then there’ll be gaps where I’m just working heads down with a team on something, and then maybe getting feedback again near launch. You mentioned the whales versus the small customers, so when my service, EventBridge, launched, it was launched as CloudWatch Events. And CloudWatch is AWS’s kind of observability service. It’s got metrics and logs and events, and so these are things happening in your account that you are observing. And so when it was launched as that it was a really good way to demonstrate the value of event buses, like you have an event bus, and whether you do anything or not there, AWS is publishing events into your event bus about everything going on everywhere else. And you can choose to match those events and do something about them and then call other AWS services. So, when that’s happening, these things are happening in your account, whether you do anything about them or not, it sort of made sense that these should be free. And so, even to this day, I don’t know what public numbers I’m allowed to speak about, but the majority of the stuff that happens inside EventBridge is free, because it’s AWS events, and somebody’s matching them the same without AWS. So, I can probably say, we do millions of events per second that we receive across AWS, and most of that we don’t charge for. Customers love that and they’re happy with that. 
And as customers actually build full, let’s say, organizational event buses, and have dozens of teams all publishing events to that bus, and then a dozen other teams receiving those events, obviously we do charge for that, because these are your own custom events, but the majority of traffic is AWS itself.
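
The matching model Nik describes, where events arrive on a bus and rules select the ones worth acting on, can be sketched in a few lines of Python. This is a minimal illustration and not EventBridge's implementation: the `matches` function, the sample rule, and the sample events are hypothetical, and the pattern format is a simplified take on EventBridge's documented JSON event patterns, in which each leaf is a list of acceptable values.

```python
from typing import Any


def matches(pattern: dict[str, Any], event: dict[str, Any]) -> bool:
    """Return True when every field named in the pattern is satisfied by the event."""
    for key, expected in pattern.items():
        if key not in event:
            return False
        actual = event[key]
        if isinstance(expected, dict):
            # Nested pattern: the event must hold a nested object that also matches.
            if not isinstance(actual, dict) or not matches(expected, actual):
                return False
        elif actual not in expected:
            # Leaf: `expected` is a list of acceptable values.
            return False
    return True


# Hypothetical rule: react only when an EC2 instance stops or terminates.
rule = {"source": ["aws.ec2"], "detail": {"state": ["stopped", "terminated"]}}

stopped = {"source": "aws.ec2", "detail": {"state": "stopped", "instance-id": "i-123"}}
running = {"source": "aws.ec2", "detail": {"state": "running", "instance-id": "i-123"}}

print(matches(rule, stopped))  # True
print(matches(rule, running))  # False
```

Events that match no rule are simply ignored in this sketch, which mirrors the point that they occur in the account whether or not anyone acts on them.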

Allen: Yeah, that’s an interesting bootstrapping approach, I guess. It’s one of the things like once you get to AWS’s scale that you can find opportunities like that, like, “Okay, we have this new service and it can process events, we’re just going to wire it up with a bunch of events that you might find useful so that you don’t just have an empty dashboard.”

Nik: Right. It also is an interesting challenge in terms of scaling, and like I said, we have millions of events that happen across all AWS accounts-

Allen: Per second.

Nik: Per second. That’s pretty inefficient to accept all that, and then if nobody’s matching them, just drop them. So we spend a lot of time trying to figure out how we can go upstream to the hundreds of AWS services that publish events, and reduce the amount of work that is being done throughout, which of course, bucks against certain, let’s say, best practice conventions for these things. Typically, for example, if you’re going to do, let’s say, caching, you try to do that on the service side and not push it to the client, because that has its own risks of drift, and just all sorts of problems. But we have to, because otherwise we’re going to be handling a gargantuan amount of traffic and then not doing anything with it.
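
The trade-off Nik describes can be made concrete with a small sketch. Here a publisher keeps a local copy of which event types currently have a matching rule and drops everything else before it is sent. Everything in it (the `UpstreamFilter` class, the event type names) is hypothetical and only illustrates the idea; it is not a description of how AWS services integrate with EventBridge.

```python
class UpstreamFilter:
    """Publisher-side filter holding a cached copy of the event types that have rules."""

    def __init__(self, interested: set[str]) -> None:
        self._interested = set(interested)  # local snapshot; can go stale
        self.sent: list[str] = []
        self.dropped = 0

    def refresh(self, interested: set[str]) -> None:
        """Replace the snapshot with the bus's current view."""
        self._interested = set(interested)

    def publish(self, event_type: str) -> bool:
        """Send the event only if the snapshot says a rule could match it."""
        if event_type in self._interested:
            self.sent.append(event_type)
            return True
        self.dropped += 1
        return False


publisher = UpstreamFilter({"instance.stopped"})
publisher.publish("instance.stopped")  # sent
publisher.publish("bucket.created")    # dropped upstream, never crosses the network

# Drift: a rule for "bucket.created" is added on the bus, but until the
# publisher refreshes its snapshot it keeps dropping events that now matter.
publisher.publish("bucket.created")    # still dropped
publisher.refresh({"instance.stopped", "bucket.created"})
publisher.publish("bucket.created")    # sent after the refresh
```

The second dropped event is the drift risk in miniature: the saving in traffic comes at the cost of keeping every publisher's snapshot in step with the rules on the bus.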

Allen: Yeah, that makes a lot of sense. But it is interesting to me, these organizational boundary things that people don’t think of. People think of AWS as if it’s this monolithic thing, when it’s very, very much not, which makes sense: you can’t build something that size monolithically.

Nik: Yeah, with a lot of AWS services that people use, they might not realize it. Well again, certainly not the biggest ones that people of course recognize, but if for example, AWS launches a new service and you see an announcement, that’s probably one team, a team of about 10 people, that owns the whole thing. The whole thing, console, the service side, all the integrations with the client side tooling. Which is why it’s also something that is funny for me and I get a little bit defensive, so there’s a couple of folks on the internet who, maybe less so these days, but there was definitely a peak when there was a lot of snark online tearing down new service launches. And I was like, “I get it. AWS is a huge trillion dollar, Amazon is a trillion dollar company. People think they’re punching up.” But what ends up happening is that four junior engineers who worked on that feature are super upset because some dude with a huge Twitter following is dunking on what he believes is a trillion dollar company, but really, four junior engineers in a room are tending for the whole thing.

Allen: That’s one of those things, that you work with people who do work in big tech and you build way more empathy, I think, that it is a whole bunch of individual people all trying to do well, and not just Apple is just a sphere that just makes decisions and commutes them into the world.

Nik: Right. I mean, that’s the thing. I mean, I’m not getting any sympathy, but I’m certainly not asking for it. But it is interesting to always remember that in a general sense. I’m guilty of the same thing whenever I think of others.

Allen: I mean, it’s also good to still keep a critical eye, just try to be a little bit kind about it. Before we run out of time, I want to loop back to this, where we started the conversation, because I think it’s super valuable to people thinking about career paths. Especially in 2023, there’s a lot of people who are at a crossroads because they’re seeing some of the changes happening in the industry. There’s a lot of companies that are growing, there’s a lot of companies that are shrinking, and people are thinking like, “Okay, I’m on a leadership path in terms of my career, but how do I think about, either on my teams or on my own career path, how we’re orienting to managerial roles versus these staff and principal engineer roles?” And so I’d be interested, I mean, I’ve got lots of questions I could ask on that, but your take on this thing, we were talking a little bit before the recording started, on this idea of staff archetypes or principal engineer archetypes. And this is something, there’s a website, staffeng.com, which I can link in the show notes that people may have seen before, where somebody’s gone and done a bunch of research trying to build an understanding of, okay, what does a staff-plus engineering role look like across different organizations and what are some of the common themes? And they sort of built this idea of archetypes. Most junior engineers, maybe, at most orgs have a relatively similar set of responsibilities and expectations, but once you get into the staff and principal levels, your job could be a totally unrelated job to someone else’s, where there’s almost no overlap of what you do.

Nik: Even within the same company, for sure. Yeah.

Allen: Yeah. You’re talking to customers constantly, there’s probably other principal engineers at Amazon, or maybe not at Amazon, but certainly at some companies, where, certainly at Apple, I’m sure there’s lots of principal engineers that don’t talk to customers. I’d be curious on your take on, to what degree you empathize, within your role and within the org, with these archetypes, which I’ll just kind of list out the way that they describe them. They describe, there’s the idea of a tech lead, which is this staff-plus role where they’re basically empowering a team, so there’s a team and then your job is to just make sure that that team is doing well, executing well. There’s the architect, which has a larger area, that there’s a whole bunch of teams, they have responsibility for direction, quality and approach within some technical area. There’s the solver, that goes around and comes into wherever there’s a fire and they help put out that fire, or solve a really tricky problem. They’re the sort of person that might get pulled in where it’s like, “Oh, we have 1% of launches are crashing, and we can’t reproduce it and we have no logs. We’ve got to have a solver to solve this problem.” And then the last one they had was the right hand, which is the idea of there’s an executive who is trying to execute across a wide area, and then you pair them up with a principal engineer that can go in and solve problems specifically in the purview of that executive, which may be in any of those three categories. I’m curious, A, does that resonate with what you have seen? And B, do you see yourself in any of those categories?

Nik: It does. It definitely resonates. Of course, they say any senior or staff or principal engineer’s favorite phrase is, “It depends.”

Allen: Yeah.

Nik: And this is very much one of those. It depends on the time of year and the project. We actually do have a principal roles framework inside Amazon, inside AWS for sure, that was first popularized by S3, which is the oldest AWS service, and probably the biggest and probably the most well known, with its 11 nines durability. And so there’s probably no one person that knows how the whole thing works. And so-

Allen: I’m just going to say that again because it’s a known thing that people talk about, but 11 nines durability, and then sometimes you’ll hear that and it just washes over you. It’s like, what does that actually mean? It’s like 99.9, and then more nines.

Nik: It’s an insane amount of durability, but ultimately it also is at the scale of the service, it also means that basically, in theory, data is lost every single day-

Allen: Well, yeah, sure, at that scale.
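
To put numbers on that exchange: S3's published design target is 99.999999999% (11 nines) durability of objects over a given year, which implies an expected loss of about one object in 10^11 per year. The short calculation below applies that rate to a fleet size. The object count is a made-up round number for illustration, not an AWS figure, and the function name is hypothetical.

```python
from fractions import Fraction

NINES = 11
# Probability that a single object is lost in a year under an 11-nines target.
ANNUAL_LOSS_RATE = Fraction(1, 10**NINES)


def expected_annual_losses(object_count: int) -> Fraction:
    """Expected number of objects lost per year at the given durability target."""
    return object_count * ANNUAL_LOSS_RATE


# Assumed fleet size for illustration only: 100 trillion objects.
objects = 100 * 10**12
per_year = expected_annual_losses(objects)
per_day = per_year / 365

print(f"expected losses per year: {float(per_year):.0f}")
print(f"expected losses per day:  {float(per_day):.2f}")
```

With these assumed inputs the expectation works out to 1,000 objects a year, a little under three a day, which is the sense in which a service can meet 11 nines and still, in theory, lose data every day.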

Nik: The way that they’ve discussed that, and I think the leader of the team might have even done a re:Invent talk about it, is that you have roles on different projects, and there’s roles like guide, or sponsor, or catalyst, and they sort of vaguely map to some of what you’re talking about here. And so I think the reality of the role is that you are almost never going to be just one of these; again, there’s a threshold, depending on where those levels line up for different teams. I’ve been a tech lead, I’ve been an architect, I’ve been a solver, and at this point now I would say I gravitate between these depending on the project that I’m on. I’ve also, I guess, been a right hand, I would say probably, although not for as long of a period. Yeah, I mean it’s a cop out, but the reality is that this kind of depends on the needs of the organization, and the needs of your customers, and of course, your personal interests. Because you don’t get to this level of seniority without having some agency over your responsibility. If the organization’s asking something of you that does not align with your interests and values, you’re probably not going to be super successful at it. For me, how I measure my motivation is less in terms of specific output, but more in terms of impact. And the best way I can have a large impact kind of depends on the project and the situation. Sometimes it might mean just sitting down with a team and saying, “Hey, we have this problem. Let’s work together, let me try to help you debug it.” Sometimes it’s sitting down with an architectural design document saying, “I think you have a lot of ambiguity here, but I think we should go forward with this even though there’s a certain amount of risk, let’s start it. Let’s prototype it. 
I’m taking on the responsibility of the decision to go forward despite the ambiguity, and I’ll own the repercussions if we’re wrong, but ultimately I think this is the right path, let’s get going with it.” Sometimes it might mean talking to a senior manager or senior leader, even a director above, and convincing them of a certain path. Or, sitting down with two of them and saying, “Hey, look, we have this problem. We have this inconsistency or this lack of alignment of mental models that exists between these two teams. Let me write a document that explains it, and let’s both review it so we’re all on the same page, and then that can disseminate through the two different organizations.” And that might not be even that it’ll start from technical perspectives, but ultimately you’re trying to influence high level leaders because then they will affect their roadmaps and their prioritization. It kind of depends on what’s the most impactful thing you can do and where the complexity lies, right? Sometimes it’s a question of ambiguity at the tech level, sometimes an ambiguity at product level. There’s almost always something to do with ambiguity.

Allen: Yeah, that’s one of the things that as you move through your career, I think a lot of people, most people realize this explicitly, although some people just kind of know it without really having thought about it explicitly, is that one of the biggest inputs to how far are you on your career, or how far are you on this ladder in terms of seniority, principal, whether it’s principal engineer or you’re a senior engineer or director or whatever, is how ambiguous your job is in terms of what are you even supposed to be doing? How are you even spending your time? What even is impact, and what output should you be pushing for, is less and less clear as you move up. Not quite by definition, but certainly by strong correlation.

Nik: Everyone probably knows that when they start a new job, there’s a lot of confusion, but then the goal at that point is you disambiguate that and then figure out how to do it. And what I’ve found is attractive for myself is always to be at the edge of that, is to go to the next thing. As soon as I get good at anything, I start finding it boring. It’s probably not a healthy trait.

Allen: It has pros and cons.

Nik: Yeah.

Allen: I have some empathy with that mindset, for better or for worse. But I think you hit on something there that I think is a really valuable leadership skill, you didn’t phrase it this way, but this being comfortable with being uncomfortable.

Nik: Yes.

Allen: And there’s a certain habit that most people earlier in their career have, and some people, being uncomfortable and working with ambiguity is not for everyone. It’s not necessarily pleasant, even when you are comfortable with it or it’s the way that you like to work. And so, when we talk about back to this thing of the terminal level, and you get to, maybe you’re a senior engineer and you’re working on problems that are, there’s some ambiguity necessarily, but it’s not like you’re going off and talking to customers every day about what would they even want and totally brainstorming or dealing with 10 different teams that all disagree with each other. Maybe you’re actually taking tickets from a bug tracking system and fixing them. There’s so much value in having folks in the team that will move towards where there’s ambiguity, that will see this thing is not well-defined and there’s struggles happening, and it seems to be because these two people don’t have a shared mental model of how this works, or there doesn’t seem to be agreement on what success would even look like, or it’s not clear whether or not we even can move the needle on this thing, or whatever, those kind of things. And people whose attention drifts towards that are way more likely to have an impact than the people who see, oh, well, it looks like there’s a mess over there, so I’m just going to avoid it and try to pretend it’s not there, which is what your junior folks will do.

Nik: Yeah, I think that at some point you realize that there’s just a class of problems that just don’t have standard, well-defined solutions, let’s say. Right? Everything’s a trade-off, and you’re trying to find a compromise between different conflicting needs, and some compromises work, even if nobody’s happy, that infamous compromise. And other times, there isn’t a compromise, you have to pick a decision. You have to pick one and not the other, and accept the trade-offs. And I know I’m being vague, but that’s sort of, I think a lot of life is like that, isn’t it? It’s not just our industry. It’s not just our jobs. It’s humanity actually, right?

Allen: Being a human being, ambiguous things, where it’s not clear what the right thing to do is, but we try to pick one, as good as we can.

Nik: I mean, I do think about these things as I get older, about how humanity has gotten this far in terms of how we developed anything, really. Technological progress, societal progress, and when you’re in the microcosm of seeing how the sausage is made of any progress, now it’s about AI, or political advancements, or progressive advancements. It’s kind of a miracle that we got this far to begin with, pretty humbling to think about that it’s just done by people like us.

Allen: Well, that’s the humbling thing, and almost disturbing thing. I think the first time someone works in a big company like Amazon or Apple, it’s like, “Oh, it’s just a bunch of people, just trying to do stuff. I thought there was some magic here that you would get in and you’d be given like, oh no, here’s the way to make perfect decisions and make sure that… We have just a way better system for everything than anywhere else has.” It’s like, “No, it’s just a bunch of people, trying-”

Nik: I’ll tell you a little anecdote, when I was first promoted, I got access to the principal engineer Slack room. It just so happened, obviously newly promoted, you have a ton of imposter syndrome about it.

Allen: Yes.

Nik: And it just so happened that week, there was already a discussion about imposter syndrome. One of the people that commented about their imposter syndrome was James Gosling, who’s a distinguished engineer at Amazon, and creator of Java, freely talking about their imposter syndrome and the things that they struggled with, and that was pretty eye-opening for me.

Allen: Did that make you feel more or less like an imposter? If you were like, well, I guess maybe we’re all on the same footing, or did that make you feel like, oh man, if he doesn’t feel like he belongs here, then I really don’t belong here.

Nik: I have just enough confidence that it was the first one, and maybe realized that it doesn’t necessarily have to be associated with any reason, and I was able to kind of ignore the second voice that was saying the second thing.

Allen: Oh, that’s probably for the best. It’s been awesome chatting. I think that folks will find it useful thinking about some of these paths and how these roles can come together. Is there anywhere people can go to find you online or read or follow your work?

Nik: Not really. I’m trying to quit Twitter right now, so I’m not going to direct people there. I am on Bluesky, but it’s still a closed community. At some point I might set up a little, I do have a vanity domain name, but there’s nothing on there right now, so I won’t share it. But yeah, definitely, if you’re curious about the work that I do, just check out EventBridge, or the whole area of AWS, EventBridge, pipes, buses, schedules, it’s all good stuff. I’m pretty proud of it, and I’ve been working on it for the last few years.

Allen: Awesome. Thanks Nik. Thanks for being on the show.

Nik: Thanks, Allen.

Allen: It Shipped That Way is brought to you by Steamclock Software. If you’re a growing business and your customers need a really nice mobile app, get in touch with Steamclock. And that’s it for today, you can give us feedback, you can rate the show by going to itshipped.fm, or you can give us feedback on Twitter, or on Mastodon. And until next time, keep shipping.

Hosted by Allen Pike. New episodes every 3 weeks.