Deep Dive into Open Policy Authorization Layer (OPAL)
Damian Schenkelman: Welcome to another episode of Authorization in Software, where we dive deep into tooling, standards, and best practices for software authorization. My name is Damian Schenkelman. Today, we're diving deep into the open policy authorization layer, AKA OPAL, with Gabriel Manor, Head of DevRel and Growth at Permit.io. Hey, Gabriel. It's great to have you here.
Gabriel Manor: Hey, Damian. It's actually great to be here. Super excited to speak about authorization, as I am every day.
Damian Schenkelman: Yeah, I always look forward to chatting with folks that do this, let's say, for a living. Could you give our listeners a bit of overview of your background so they learn more about who is our guest today?
Gabriel Manor: Definitely. So, my name is Gabriel. I'm not counting the years anymore that I've been doing software engineering. What I am counting is my years in authorization, particularly authorization in the field of access control. So, in 2015, I started to work at Cisco on a product called Cisco ISE, which is like a server to authenticate and authorize users to the network, which is rare today, because today, we are using zero trust. So, we authenticate and authorize users in the application instead of in the network, but that was a thing there. I worked there for five years. Then I moved to Palo Alto Networks, also working around security operations control rooms, and most of the stuff was about broken access control. For the last couple of months, I've been working at Permit.io, a startup that brings authorization to application developers as a service, which is a growing trend, but now I'm doing it from the go-to-market perspective. I'm the DevRel there, so I have the feeling of the user and I'm really happy to help developers implement better access control, and particularly authorization, in their applications.
Damian Schenkelman: How do you go from those positions, again, that were hands-on to a startup and doing head of growth and DevRel? How did that transition happen?
Gabriel Manor: So let's say from 2018, when I was back at Cisco, I participated in a hackathon, and this is when we started to see everything regarding zero trust. So, we had huge customers doing network auth and they had thousands of switches deployed in many sites. All the authentication was with network credentials, but then they started to use the cloud. I mean, it sounds late, but this is how things happen in Cisco and huge enterprises. They started to use the cloud and then they came and said, "Hey, I need to authorize my users based on cloud data, maybe even serverless stuff." We didn't have a way to do it because we were like a server that sits together with network equipment. So, my idea was let's create cloud extensions for authorization. So, when we make an authorization decision, instead of just doing it in the local setup of the authorization server, we can go to the cloud. So, I developed something like this in the hackathon and it was a huge success, and then I was promoted to do technical leadership in CTO offices. This is where I started to meet users. So, since then, I'm not working on traditional R&D teams. I'm working more in CTO offices and innovation teams. For me, the next step was to do more of the business, do more of the PN. DevRel is the natural place for me currently because I can start a new software project every week. I can see tons of types of architectures and deployments and even experience code, and also experience this manner of go-to-market and doing the business stuff.
Damian Schenkelman: That's very interesting, very interesting journey. It all started from starting to take authorization logic away from the networking layer and back into the application layer, not relying on the network and network membership to make authorization decisions, which is a typical thing that as you say, as you move to the cloud, you start seeing, but again, some enterprises don't do it or not all components do it and it takes a while for these things to happen. The topic of today is OPAL, the open policy authorization layer. I thought that again, that's a good start because it leads us in that direction. What's a technique that you can use to extract authorization logic from different components into a set of policies and then deploy those in your code, make decisions based on software? Before we dive deep into OPAL, I thought we can maybe do a high level overview of policies and attribute based access control. This is a topic that we discuss a lot on the podcast, but it's always good to do a quick refresher. So, in your own words, what's an authorization policy? How are they related to roles to attribute based access control for people that maybe are new to this and aren't experts in the topic?
Gabriel Manor: Yeah, definitely. I always love to start with the basics, to understand what we're trying to accomplish. We'll get to it later, but I think this is one of the most challenging things for developers who try to do authorization. So, there is a whole topic of access control in applications. So, we have applications, we have their data, we have operations that we need to do on the data. We have a lot of stuff that can be done through our application, and then we control the access to the application, and we always like to stay in the comfort zone of authentication: only verify who our user is. But in the real world, we also want to maybe categorize the users, so some users can do something like this and some users can do something else. This is where policies come into place. The most abstract manner of policy is a simple text sentence like, "Is a monkey allowed to eat a banana?" So we try to understand who is a monkey, and we try to categorize the actions that they can do, like eating. Then we have resources like a banana, and then we have a lot of policies that we try to implement. By the way, this is the way I recommend developers start. Think about authorization in your own words, define the policies in your own words. Every application of course has a different set of policies, but we need to streamline them, right? Software is a structured thing. When we go to streamline, we go to policy models. You mentioned RBAC and ABAC. So, RBAC in my opinion is the most common permission model because it's the simplest one. We're taking the users, or the principals that do something in the application, and we categorize them by roles. The most common one is, let's say, we have admins and we have standard users, and then we get a list of all the actions or operations that our users can do, and then the resources. Let's say documents, pictures, whatever it is. Then we streamline the policies. So, now it's not, "Is a monkey allowed to eat a banana?" It's, "Is an animal allowed to eat food?" This is a way that we streamline our policy so that we can have better authorization, better permissions in the application.
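To make the role-based model described above concrete, here is a minimal Python sketch. The user names, roles, and permission table are hypothetical and chosen only for illustration; they do not come from the conversation or from any particular framework.

```python
# Hypothetical role assignments and role permissions, for illustration only.
USER_ROLES = {"alice": {"admin"}, "bob": {"user"}}

# Each role maps to the (action, resource type) pairs it may perform.
ROLE_PERMISSIONS = {
    "admin": {("read", "document"), ("delete", "document")},
    "user": {("read", "document")},
}


def is_allowed(user: str, action: str, resource_type: str) -> bool:
    """Return True if any of the user's roles grants the action on the resource type."""
    roles = USER_ROLES.get(user, set())
    return any(
        (action, resource_type) in ROLE_PERMISSIONS.get(role, set())
        for role in roles
    )


if __name__ == "__main__":
    print(is_allowed("alice", "delete", "document"))
    print(is_allowed("bob", "delete", "document"))
```

The check never looks at a specific user or a specific document, only at categories: roles on one side and resource types on the other. That is the "animal" and "food" level of the monkey and banana example.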
Damian Schenkelman: Sorry to interrupt but I think this is an important word. When you say streamline, what do you mean? Is it in comparison to something else?
Gabriel Manor: So let's say when you write policies in, let's say, free language, you always write sentences or statements that sound the same. There is a principal or user. There is an action or operation, and there is a resource, okay? When you implement it in software, when you want to have policy rules that software can actually use, you need to streamline each of these elements into something that software can understand. So, the easiest way to do it is using roles and types of resources because this is something we can configure. We can configure roles for our users. We can say these users are admins, these users are managers, these users are just users. We can categorize resources because this is our data structure. We have a database, no matter if it's SQL or NoSQL, and all the data is structured in resource types. So, we have documents. This is why, when we're modeling permissions, the easiest thing to track and maintain is the simplest categorization: roles and resource types. This is actually where all developers start. They do RBAC. So, the most common phrases in authorization are "let's implement RBAC," "let's do RBAC," "when will we support RBAC?" This is the most common model. In the real world, that's not enough. So, let's say for example, we do a lot of work to assign roles to the users, but then we have more aspects. So, let's say we have paid tiers. So, our users are admins, but we have premium admins and we have standard admins. Sometimes we need to make decisions based on maybe geolocation, maybe the first letter of their name. Also, in the resources, it's not enough to have only a resource type, because we have sensitive documents and then we have regular documents. If we try to keep strict RBAC all the way, then we need to create a whole, let's say, database table just to keep a special type of data for authorization. So, then we go to ABAC, attribute based access control.
Instead of streamlining our policy with roles as categories and resource types as categories, we are looking deeper at the principals. We are looking at the user attributes. We are looking at the resource attributes, and then we write our policy in terms of attributes. So, if we're again speaking about the monkey and the banana, we can say an animal from a given species and food from a given species, by their attributes. So, we can say it could be yellow food, because we know the attribute is there.
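A short Python sketch of the attribute-based idea, using the premium admin and sensitive document cases mentioned above. The field names (`role`, `plan`, `sensitive`) and the rule itself are assumptions made up for this illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class User:
    role: str   # e.g. "admin" or "user"
    plan: str   # hypothetical attribute: "premium" or "standard"


@dataclass(frozen=True)
class Document:
    sensitive: bool  # hypothetical attribute on the resource


def can_read(user: User, doc: Document) -> bool:
    """Attribute-based rule: sensitive documents require a premium admin."""
    if doc.sensitive:
        return user.role == "admin" and user.plan == "premium"
    # Non-sensitive documents are readable by any known role.
    return user.role in {"admin", "user"}


if __name__ == "__main__":
    print(can_read(User("admin", "premium"), Document(sensitive=True)))
    print(can_read(User("admin", "standard"), Document(sensitive=True)))
```

Expressing the same rule in strict RBAC would need extra roles such as "premium admin" and extra resource types such as "sensitive document", which is the table explosion described above.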
Damian Schenkelman: Yeah, that's an interesting thing you mentioned in terms of evolution. As the application and its users get more sophisticated, you start needing to do more granular things, and that's where something like role based access control, where you have a few roles and very coarse-grained actions that they can take on resources or even categories of resources, might not be enough. As you need to go more granular, policies become a viable option for that granular authorization decision making. The thing I wanted to ask is what's the typical evolution in this path that an application follows, right? I guess not everyone starts using policies. So, when do they decide to start using them? How do they take what they have and adapt it to use policies? What challenges do you see teams face when they start doing that?
Gabriel Manor: From what I see, the first mistake teams make when they start to design authorization is falling into the implementation details of the particular language or framework that they are using, and specifically the term RBAC. So, this is one of the things that I dislike in authorization. I don't want to use that word, because RBAC is great and I'm probably using it for a lot of stuff, but it's the way that development teams, instead of implementing authorization or defining permissions, look at the framework they use. Let's say they use FastAPI or Express.js, and then they look at the RBAC framework that gives them the basic functionality they need and they implement it. A week or a month or a year later, they understand that it's just not holding up. There could be a lot of reasons, but the main reason is bugs. The bugs usually happen when policies start to be mixed with the code. So, you started with the framework, and probably our audience is developers, so let's talk about the most common form of RBAC framework. It has decorators or annotations on the API endpoint that declare the role that is actually allowed to get into this endpoint, right? This is actually really simple, but then someone needs something that's considered ABAC, or they need multiple roles, or they need to get a decision in the middle of the endpoint itself, and then they start to create if statements that are mixed with the code, and then someone gets access to somewhere they should not get to. When they look at the code, they see this code. Usually, the problem doesn't start with the framework, because the framework aims to be simple. The complexity starts when they understand that instead of writing an application, they are writing permissions, and then they try to decouple policies from code. There are some approaches today to decouple policies from code.
One of them is the one that we are going to speak a lot about today, policy as code: declaring policy rules instead of imperatively coding them inside the application, declaring them outside of the application, and only getting decisions based on them or only enforcing the authorization based on them in the application. But there are also other approaches, and we can expand on them later, like policy as graph or maybe policy that's modeled in databases, which is a narrow thing, but we saw some implementations of it. So, after they make the mistake of implementation details instead of designing permissions, they move to policy as code. With policy as code, the main benefit again is the decoupling of policy from code. So, now the company can have one place where all the policy is declared. If I want to see the configuration, I know that instead of just guessing what happens in the application code, I can go to the policy code, the declaration of the policy, and understand what the permissions are. Also, when I need to manage it, I have one place to do it. When I need to version it, I can couple it to and decouple it from the application release cycle. So, on one hand, I can version my policy configuration because now it's code and I can use Git or another version control tool. But on the other hand, I'm not dependent on the application release cycle. So, let's say I release a new feature: I don't have to couple the policy configuration to it, and I can change the policy just in time without delivering a new application release. That also helps teams scale the testing of the policies themselves and then focus the application only on the enforcement. So, the way that we are enforcing a feature in the application, the way that we are not allowing users with a particular attribute to access documents with a particular attribute, the way that we enforce it is not related to the way that we actually declare it.
The declaration and the decision are one thing, and the feature that enforces it is something that should be decoupled from it. This is where we see the exponential growth of policy as code in applications. So, after they make the mistake, after they suffer from authorization coupled together with application code, they move to more mature models like policy as code, declaring policies and managing them separately from the application itself.
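The separation between declaring a policy and enforcing it can be sketched in Python as follows. This is only an illustration under stated assumptions: the JSON policy format, the `check` function, and the `enforce` decorator are all hypothetical and are not the API of any real framework or policy engine. The policy text is inlined to keep the example self-contained; in practice it would live in its own versioned file or repository.

```python
import json
from functools import wraps

# Policy declared as data, separate from application logic (hypothetical format).
POLICY_JSON = """
{
  "rules": [
    {"role": "admin",  "action": "delete", "resource": "document"},
    {"role": "admin",  "action": "read",   "resource": "document"},
    {"role": "viewer", "action": "read",   "resource": "document"}
  ]
}
"""


def load_policy(text):
    """Parse the declared rules into a set of (role, action, resource) tuples."""
    return {(r["role"], r["action"], r["resource"]) for r in json.loads(text)["rules"]}


POLICY = load_policy(POLICY_JSON)


def check(role, action, resource):
    """Decision point: answers only from the declared policy."""
    return (role, action, resource) in POLICY


def enforce(action, resource):
    """Enforcement point: the endpoint asks for a decision and never encodes rules itself."""
    def decorator(func):
        @wraps(func)
        def wrapper(role, *args, **kwargs):
            if not check(role, action, resource):
                raise PermissionError(f"{role} may not {action} {resource}")
            return func(role, *args, **kwargs)
        return wrapper
    return decorator


@enforce("delete", "document")
def delete_document(role, doc_id):
    return f"deleted {doc_id}"


if __name__ == "__main__":
    print(delete_document("admin", "42"))
```

Changing who may delete documents means editing the declared rules, not the endpoint, which is what lets the policy be versioned and released on its own cycle.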
Damian Schenkelman: That makes sense. I like how you went from you start with doing the authorization in your framework, which typically is you get a role or something in the token, maybe it's a JWT that comes into the endpoint and you take the scopes from it and then you use some logic to match that with what you can do or not with the endpoint. But over time, that doesn't scale. As you say, if you need to apply authorization decisions at some other points of your code base, then you don't have that middleware to make that decision. So, you start extracting this authorization logic into policies and it makes sense because you deduplicate it and it's in one place. But there are a couple of things to unpack here. On the one hand, these policies need to be written in a language and that's like a new thing for the developers to learn, and then this language needs a runtime or some way in which you can run policies written in that language to get to authorization decisions. How does that typically work?
Gabriel Manor: So the way that policy as code works has two essential components. One is the language itself and the second is, let's say, the runtime, right? So we need to run this language somewhere. The runtime is usually referred to as a policy engine, right? An engine that can make decisions. The policy itself is like the configuration for this runtime. So, if we take an example of an ABAC policy, think of a paid user, one that has an attribute saying they are a paid user, who is allowed to create a document with an attribute of rich text. Okay? Let's say that we allow only paid users to create rich text documents. The way that we model it traditionally, we already explained why it's hard, but what if we can declare it in a very strict way? So let's speak, for example, about Cedar, the policy language that was crafted by AWS and released as open source. So, instead of modeling it in a database, we are declaring a statement. This statement says permit. Okay, let's think about it like a function, permit, that gets three arguments: user, action, and resource. Then we can declare which of the attributes of those principals are actually permitted. So, we're declaring something like a function called permit with the relevant arguments for the policy we want to declare, et cetera. Then we have the runtime. The runtime knows how to take this declaration, and then you can query the runtime with the data. So, I am passing the runtime the arguments, and by the declaration, the runtime reaches a decision on whether a permission is allowed or not allowed. This way, we don't need to model our data. We can, for example, decouple the state of the data and the way that we are modeling the data in the application from what we need in the policy config.
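The "permit as a function of user, action, and resource" idea, plus a runtime that is queried with data, can be imitated in Python as below. This is not Cedar syntax and not the Cedar API; the `Permit` and `PolicyEngine` classes and the attribute names `paid` and `format` are hypothetical, invented to mirror the paid user and rich text example.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Entity = Dict[str, object]  # attributes of a principal or a resource


@dataclass
class Permit:
    """A declared statement: which action is permitted, and under which attribute condition."""
    action: str
    when: Callable[[Entity, Entity], bool]


class PolicyEngine:
    """A toy runtime: holds declarations and answers authorization queries."""

    def __init__(self, policies: List[Permit]):
        self.policies = policies

    def is_authorized(self, principal: Entity, action: str, resource: Entity) -> bool:
        # Default deny: allow only if at least one permit statement matches.
        return any(
            p.action == action and p.when(principal, resource)
            for p in self.policies
        )


# Declaration: only paid users may create rich text documents.
POLICIES = [
    Permit(
        action="create",
        when=lambda principal, resource: principal.get("paid") is True
        and resource.get("format") == "rich_text",
    ),
]

engine = PolicyEngine(POLICIES)

if __name__ == "__main__":
    print(engine.is_authorized({"paid": True}, "create", {"format": "rich_text"}))
    print(engine.is_authorized({"paid": False}, "create", {"format": "rich_text"}))
```

The application only passes the three arguments and receives a decision. How users and documents are stored is invisible to the declaration, which is the decoupling from the data model described above.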
Damian Schenkelman: I see what you're saying where again you have the language and you have these policy files and you need to version them. You need to learn how to write in that language. For example, you mentioned Cedar, which means that as a developer, instead of writing JavaScript in my JS application, in my Node.js application, I need to figure out how to write Cedar. I also need to figure out a way to communicate with this runtime that's making these authorization decisions, via interprocess communication if it's available. If it's on another machine, I need to do it via some networking protocol. It seems that we're going from a world where the challenges the developer has are at the authorization logic level, figuring out who has access to what. We're moving them somewhere else, more to the operational level and also the knowledge level. I need to learn a bunch of components and then I need to operate them in order to get a solution to work.
Gabriel Manor: Yeah, that's correct. But looking at it from the operations perspective, this is actually the way software goes today. So, what we are trying to do is have developers develop and deliver many more features that are important to our business logic. The way we are seeing it happen everywhere is having them worry, let's call it worry, and then we'll explain why they don't really need to worry about it, about the operations, because operations happen once. Coupling logic is something that actually blocks you from delivering new business functionality. So, if I need to write more code, new code means more bugs. If I need to couple it to my release cycles, that means I don't deliver new functionality. But if at one point in time I need to spin up a new service, to set up a new service, that means that for the long term, it'll be much easier for me to maintain the configuration. Correct, it's not the language that I'm writing code in today, but who's writing in one language today, right? Developers are polyglots. I myself am coding in Golang and JavaScript all the time. All the friends that I know as developers, they are not just Java developers or JavaScript developers anymore. We are polyglots, and a new language is the least of my worries, particularly because it's a very simple language and something that everyone can pick up. But the benefits it brings in separating the concern of getting authorization decisions, the way that you manage it in a different lifecycle that helps you safely deliver business value to the product while you maintain great access control on the other end, that's a benefit that in my opinion is worth it.
Damian Schenkelman: I think there's a trade-off here and it's important to know that there's a trade-off. I do still think that it's a good trade-off to make. I think in general, eventually, if I was figuring out how to build one of these apps, I would need to figure out how to decouple authorization from my code logic, again, especially as things scale, as I have more teams working on things, and as things need to be audited and there's compliance needs, security needs, and so on. But this is important, right? You are adding pieces to a potentially distributed system and you are having people learn that they need expertise in a language that ultimately is going to be used to make authorization decisions, which means that they need to be competent enough to know that the policies that they're putting in place actually do what they want them to do for all possible inputs. Again, that's definitely a challenge that's worth considering. Now, with all that being said, I know that there's a set of tools that helps with some of these challenges, of managing these policies, of making sure that operations and the policy deployment work, and versioning. I know there's Amazon Verified Permissions for Cedar, which you mentioned earlier, and there are Styra DAS and OPAL for OPA. What's this category of solutions and what does it help teams building with policies do?
Gabriel Manor: Yeah. So, as we mentioned, there is the operational trade-off, and this is where those tools try to come into the picture and solve these operational problems. So, I do have the components, right? I do have the language and the file where I store the policy. I do have the policy engines, but now I'm left with the operational part of deploying those engines and managing the versioning of the policy configuration. One thing that we haven't spoken about before is the data itself, right? When I'm getting a policy decision, when I get an authorization decision, there is data that needs to be taken into consideration. Let's say the attributes of the users, the attributes of the resources, et cetera, et cetera. You mentioned three tools and that's actually great, because each one of them represents a different trend within policy as code. Amazon Verified Permissions is a tool that's offered by a cloud provider for developers who base their applications on AWS and want to get this deployment of policy architecture together with their application. So, AWS is offering a service that helps you deploy and manage policy engines together with your Cedar policies that you configure in AWS itself. That's actually a great solution for AWS developers who are trying to add policy as code authorization to their applications. Styra DAS, on the other hand, is very, very focused on the infrastructure side, which we haven't mentioned much here, but authorization also has an aspect of service to service communication: which microservice can get to which microservice, and why, and under what conditions, et cetera. So, this is where Styra DAS comes into place as a solution particularly for cloud native architectures where developers want to manage the admission and infrastructure stage of authorization. It helps them manage the data. It helps them manage the policy configuration, and it helps them manage the deployment of the policy engines.
OPAL is an open source tool, and the main benefit you can get from it is independence from the policy engine, from the policy technology. You mentioned that we have a challenge, the challenge of learning new things, the challenge of learning a new language maybe, learning new models. This challenge is actually growing, because at some point, some developers understand that the language that they use, the policy engine that they use, actually doesn't fit their product. We can get to it later, but policy languages are different from each other, not only in how the language looks but in what policies they can handle and regarding, let's say, latency and so on and so forth. So, OPAL is an open source tool that first helps you manage all the policy engines, but you're not limited to one policy engine like Open Policy Agent or Cedar core or whatever policy engine it is. You're also not limited in the policy language you're using. So, you can use Cedar for something and you can use Rego, which is the language of Open Policy Agent, for something else, or whatever language you want. So, OPAL is very pluggable for everything related to policy config, to policy engines, but no less important, it's also pluggable for the data that you need to get the policy decision. So, looking at the other tools, Verified Permissions is very close to the data that you hold in AWS, to the whole architecture where you manage your data in AWS. Styra DAS is more for the infrastructure level. Applications have many, many data sources. This is one of the things that we see growing. If we think even five or six years back, an application had one database. This is where my data was all the time. One of the things before the NoSQL hype was the term "one source of truth." Today, we don't have one source of truth, we have data everywhere. We have data in multiple types of databases in our application.
We have data that you consume directly from other cloud services, and OPAL is a plugin system that helps you manage all the data you need for policy decisions in one place. So, think about it like a tool, and we can dive later into the architecture itself and how it works, but think about it like a tool where, with a few lines of configuration code like Helm charts or Docker Compose configuration, you spin up a whole authorization system that can be plugged into any policy config, any policy engine, and any data that you need to get this decision. So, all the operational trade-off, of spinning up engines and connecting your data, actually happens once, in a very cloud native way.
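The pluggability described above, where neither the policy engine nor the data sources are fixed, can be pictured with a small Python sketch. It illustrates the design idea only and is not OPAL's actual interface: `Engine`, `AuthorizationLayer`, `RoleEngine`, `PaidPlanEngine`, and the `hr` and `billing` sources are all hypothetical names.

```python
from typing import Callable, Dict, Protocol


class Engine(Protocol):
    """Anything that can turn a query plus data into a decision."""

    def decide(self, query: dict, data: dict) -> bool: ...


class RoleEngine:
    """Hypothetical engine that decides from role data."""

    def decide(self, query: dict, data: dict) -> bool:
        roles = data.get("hr", {}).get(query["user"], [])
        return "admin" in roles


class PaidPlanEngine:
    """Hypothetical engine that decides from billing data."""

    def decide(self, query: dict, data: dict) -> bool:
        return query["user"] in data.get("billing", set())


class AuthorizationLayer:
    """Keeps engines and data sources pluggable behind one entry point."""

    def __init__(self) -> None:
        self._engines: Dict[str, Engine] = {}
        self._sources: Dict[str, Callable[[], object]] = {}

    def register_engine(self, name: str, engine: Engine) -> None:
        self._engines[name] = engine

    def register_source(self, name: str, fetch: Callable[[], object]) -> None:
        self._sources[name] = fetch

    def decide(self, engine_name: str, query: dict) -> bool:
        # Gather data from every registered source, then delegate to the chosen engine.
        data = {name: fetch() for name, fetch in self._sources.items()}
        return self._engines[engine_name].decide(query, data)


layer = AuthorizationLayer()
layer.register_engine("roles", RoleEngine())
layer.register_engine("plans", PaidPlanEngine())
layer.register_source("hr", lambda: {"alice": ["admin"], "bob": ["user"]})
layer.register_source("billing", lambda: {"bob"})

if __name__ == "__main__":
    print(layer.decide("roles", {"user": "alice"}))
    print(layer.decide("plans", {"user": "alice"}))
```

Adding a third engine or a new data source is a registration call rather than a change to the application, which is the sense in which the operational setup "happens once".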
Damian Schenkelman: I get what you're saying. So, you're saying OPAL abstracts a couple of things. On the one hand, it helps abstract operations, takes care of it, so that if you're in a cloud native environment, in Kubernetes, the agents are available, and data is made available to them as well. But on the other hand, it also abstracts the runtime. So, if you're using Cedar policies or if you're using Rego policies, OPAL does not care. Maybe we can go a bit backwards in time and start from, "What's the history of OPAL? How did it get started and how did it go down this road? How did it make these decisions?"
Gabriel Manor: Yeah, so OPAL actually started when Or Weis and Asaf Cohen, Permit's founders, created the company. So, they decided to do authorization as a service for applications, and the right thing to do was of course to base it on policy as code. So, if you want to do something as a service, the best practice is to do what you'd recommend to your users. So, back then, it was almost two years ago, they chose to use Open Policy Agent. But as I mentioned, Open Policy Agent is very oriented toward the infrastructure level of authorization. When you need to keep up with the velocity of applications, in the speed of syncing policy configuration, in fetching data from multiple endpoints, in scaling your decision points everywhere, that's where Open Policy Agent falls short. So, they decided to create an administration layer for Open Policy Agent that would help them sync all the policies. Then the data was growing and they understood that they needed more types of policies. So, specifically, they needed an implementation of Google Zanzibar. We haven't mentioned Google Zanzibar yet, we can go back to it later, but that's actually another type of policy implementation. Then they needed to add to the software they were working on support for another, let's say, policy system, and then they added it, and then they saw that they had built a tool that can potentially help a lot of developers out there who want to move to policy as code or policy as graph, not policy that's imperative in application code, to manage their authorization workload. Then they decided to release OPAL as open source instead of developing only what's right for Permit, because everyone who develops commercial software knows that limiting yourself to your own needs is something that can cost you later. So, they decided to open source it and let the part of the community that already deals with the complexity of managing policy as code add support for more engines. This is actually what we see. We see the community adopting it. We see more and more contributions coming in.
Damian Schenkelman: It seems it started with the notion of eating your own dog food, right? I have a problem, I need to solve it. Also, in your case, you were not just working on your internal way of managing these policies but also solving authorization externally. What's the story here? How is it deployed, maybe at Permit, how did it start, and then how can people deploy OPAL in the wild? Is it just for cloud native environments? Can I deploy it if I'm running a few machines on GCP or EC2? How does it work?
Gabriel Manor: Yeah, so OPAL has a nice server and client architecture, and it's not aimed only at cloud native. It could sit everywhere and it could actually fit every modern application. The server's responsibility, and we already mentioned the components of a modern policy system, is the policy configuration. So, the server is like a GitOps-enabled tool that can be connected to your policy configuration as code in Git and make sure it's always synced. All the versioning is in place. It also has an architecture to connect it to an API bundle. So, think about the server as the control plane part that's responsible for the policy configuration and for, let's say, data configuration. The client is, on the other hand, the data plane side. So, it's like containers that you can scale wherever you want. So, you mentioned you have a few machines on GCP. You can run each client as a sidecar on each machine, or you can run it as a sidecar for your pods in Kubernetes, or whatever it is. The client is, think about it as, not really but nearly, stateless containers that wrap the policy engines. Again, no matter what the policy engine is, it wraps it in an abstraction layer for getting decisions, along with components we call data fetchers that know how to keep the data always in sync for the decision making, and it's also of course connected to the server. So, the server knows how to scale them and the server also knows how to sync all the policies with those clients. So, for now, you can actually just take OPAL, deploy it in your environment, and plug the policy engines into it. So, for now, OPAL already supports Open Policy Agent and Cedar Agent, because this is actually where we see the real need in the market currently. But we're really looking for the community to add support on day one. I can say from Permit's perspective that we develop support for engines and release it as open source as we need it. This is actually the reason we lately released support for Cedar Agent.
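The control plane and data plane split described above can be sketched in Python. This is a simplified, in-process illustration under stated assumptions: the real system runs as separate services, and `PolicyServer`, `PolicyClient`, and the `fetch_data` callable standing in for a data fetcher are hypothetical names, not OPAL's API.

```python
from typing import Callable, Dict, List, Optional, Set, Tuple

Rule = Tuple[str, str]  # (role, action)


class PolicyClient:
    """Data plane: sits next to the application and answers decisions locally."""

    def __init__(self, fetch_data: Callable[[], Dict[str, str]]):
        self._fetch_data = fetch_data      # stand-in for a data fetcher
        self.version: Optional[int] = None
        self._policy: Set[Rule] = set()
        self._data: Dict[str, str] = {}

    def on_policy_update(self, version: int, policy: Set[Rule]) -> None:
        # Called by the server whenever the policy changes.
        self.version = version
        self._policy = set(policy)
        self.refresh_data()

    def refresh_data(self) -> None:
        # Keep the decision data in sync with its source.
        self._data = self._fetch_data()

    def is_allowed(self, user: str, action: str) -> bool:
        role = self._data.get(user)
        return (role, action) in self._policy


class PolicyServer:
    """Control plane: owns the policy version and pushes it to every client."""

    def __init__(self) -> None:
        self._clients: List[PolicyClient] = []
        self.version = 0
        self.policy: Set[Rule] = set()

    def connect(self, client: PolicyClient) -> None:
        self._clients.append(client)
        client.on_policy_update(self.version, self.policy)

    def publish(self, policy: Set[Rule]) -> None:
        # In a Git-backed setup this would be triggered by a new commit.
        self.version += 1
        self.policy = set(policy)
        for client in self._clients:
            client.on_policy_update(self.version, self.policy)


server = PolicyServer()
client_a = PolicyClient(lambda: {"alice": "admin", "bob": "viewer"})
client_b = PolicyClient(lambda: {"alice": "admin", "bob": "viewer"})
server.connect(client_a)
server.connect(client_b)
server.publish({("admin", "delete"), ("viewer", "read")})

if __name__ == "__main__":
    print(client_a.version, client_a.is_allowed("alice", "delete"))
    print(client_b.version, client_b.is_allowed("bob", "delete"))
```

Each client makes decisions from its local copy, so adding more clients scales decision making, while a single publish on the server keeps them all on the same policy version.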
Damian Schenkelman: I was going to ask about that. We did an episode in this season, season two of the podcast, with Emina from the Cedar team. We talked about how they recently open sourced it. This was early May 2023. Why did the OPAL team decide to add Cedar support? What were you seeing?
Gabriel Manor: So that's actually a great question because there are policy engines out there. As Permit, we are our engine, kidding, right? Our internal engine is add support from what we believe the market will adopt. We do have our internal, let's say, commercial implementation of the custom thing we do with OPAL. When we think of something to be released as open source, it's because the trend we see coming, the hype we see coming. When a tech giant like AWS come and say to developers, " Look, people, you need to stop using comparative policy code in your application. You need to move into the declarative policy as code way," this is something we feel OPAL need to support. So, we want to demonstrate how OPAL as an open source tool can make developers polyglot. In our perspective, this is actually a flow of users that decided to do policy as code, started with let's say Rego, because it was great for infrastructure. Now, they are looking for more application level policy language. The nature move will be to Cedar. There are application level policy languages out there, but none of them came as a result of a company, of cloud provider that actually knows how application developer work. This is why we feel like the nature addition to OPAL as an administration layer for application level policy particularly to supporting this language.
Damian Schenkelman: That makes sense. It seems it's both an addition that allows flexibility and choice, but also more of a, I would say, "strategic bet" on, hey, if Cedar is pushing for this, a few folks might become experts in this. We should have support for it in the open source project, and it also helps exercise that abstraction layer for the runtime on the language, I guess, right?
Gabriel Manor: Correct. So, we partnered with AWS even before they released it. We got early access to the code and to the product, and we actually also released an open source project called Cedar Agent, which is like the equivalent of Open Policy Agent but for Cedar. So, Cedar itself is actually a Rust package. When you want to make abstract API calls to it, you need to run a piece of Rust code. We created an open source project called Cedar Agent that actually lets developers run the abstracted engine of Cedar, and hence OPAL can use it too.
Damian Schenkelman: That makes sense and it's always nice to partner when doing some of these launches to gauge how the open source community will use some of these components and figure out how everything works. Let's go at a higher level. Who is using OPAL today and what are you seeing it being used for?
Gabriel Manor: So that's a good question because I have names that I'm really impressed by that use OPAL. I can mention Tesla, Microsoft, inaudible, Accenture, Zapier, really good software organizations that are actually using OPAL. One of the things that I like to see them doing with OPAL is the use cases. So, you see a large organization that decided to streamline their infrastructure policy or their IT policies, which is, let's say, the whole domain of policy configuration. Then when they tried to take it to a modern environment, they had to tackle the scaling, so they started to use OPAL. Or, for example, you can see organizations that decided to go with, let's say, Open Policy Agent itself, and then they understood that the data costs a lot in Open Policy Agent and they need a better way to manage the data that Open Policy Agent uses, and then they started to use OPAL. So, the reason I'm enjoying OPAL so much is that we see that we are really solving challenges when users go to scale. So, the saying that OPAL is the administration layer that helps you scale policy as code is something that we see with almost every organization or developer that starts using it.
Damian Schenkelman: What challenges or maybe gotchas or mistakes are you seeing from some of these teams when they start to implement OPAL? Maybe one or two things where listeners can say, "Oh, I should avoid this."
Gabriel Manor: That's a question that connects me to the point that we spoke about before, a new learning curve that developers need to go through for policy as code. One of the common gotchas is the mistakes that users make regarding control plane and data plane. So, for example, they are mixing configuration that should come as pure data into the policy config, and then it gets harder for them to deliver policy decisions on time. So, instead of configuring abstract policies and letting the data be processed in real time by the policy engine, they are declaring the data in the policy itself. This is a learning curve in policy as code in general: developers need to train themselves to think about policy as configuration and about the decision as something that happens in the data plane.
Damian Schenkelman: I get what you're saying. Could you maybe provide an example so it's clear in people's head but particularly think if someone hasn't ever written a policy and fed data to it?
Gabriel Manor: Exactly. So, one of the things that specifically Open Policy Agent supports is actually doing stuff while the policy is running, let's say making a network call, and this actually takes time. Time means the decision could take longer. When you do the right thing, I'm not saying that you should never make a network call as part of your decision-making, but sorry, inaudible while you do decision making. But the point is when you manage it right, you want to take this data at the point of time that is relevant. So, users make their policy decisions take a long time and then they come and say, "Hey, we see that the decision is taking two or three seconds and we want to get sub-10 milliseconds." Then they get there by simply rethinking the policy itself as a more abstract level of policy declaration. So, instead of modeling the data inside the policy or trying to get this data as part of the policy, they are now separating the concerns between the control plane, the policy configuration itself, and the data itself that lives in the data plane. Then, when they ask for a decision, they get it in the right time.
Damian Schenkelman: I see what you mean. In that case, that example would mean try to think about the policy just from a data perspective regardless of location and don't think about how to get that data into the policy layer, delegate that to another component. Is that what you're going for?
Gabriel Manor: So specifically in OPAL, you can configure data fetchers. With the data fetchers, OPAL has a smart mechanism for managing the data the decision makers save, right? So let's say, for example, you have a billing system and you want to know now that the user paid for something. So, instead of doing it as part of the policy configuration, so configuring the policy to make some call, you're doing it as part of your data fetcher. So, when the policy engine needs to get a decision, the policy itself is very abstract. Is that a paid user or a non-paid user? Then the data layer is the layer that actually knows how to manage these slices of data, how to manage this connection of data. So, we see this for example with, again, we can get to it later, but let's say graph-based policies. So, we want to get decisions based on relationships between entities, and this requires a lot of data to be in the policy layer itself, because we want to know the connections, the relationships between entities, and then people try to declare that in the policy itself. So, they say, "Go to some endpoint, check for a relationship," and stuff like that, and then they are losing the world of getting policy decisions fast and keeping the right data on the other end. So, this way of thinking, that we are now declaring and configuring code while the decision itself happens in real time and needs to get the right data independently of what happens in the policy configuration, that's a way of thinking where we see a lot of glitches.
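Editor's note: The billing example above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not OPAL or Open Policy Agent code: `local_data`, `slow_billing_lookup`, `on_billing_event`, and the 50 ms delay are all hypothetical names and numbers invented for the sketch. It contrasts a policy that fetches billing state during evaluation with one that only reads data a separate layer has already delivered.

```python
import time

# Hypothetical in-memory store that a data-fetching layer keeps up to date.
# In a real deployment this role is played by the policy engine's data store.
local_data = {"paid_users": {"alice"}}


def slow_billing_lookup(user: str) -> bool:
    """Stand-in for a network call made during policy evaluation."""
    time.sleep(0.05)  # simulated round trip to a billing service
    return user in {"alice"}


def allow_with_inline_call(user: str) -> bool:
    # Pattern warned about in the conversation: the policy fetches its own
    # data, so every decision pays the network latency.
    return slow_billing_lookup(user)


def allow_with_local_data(user: str) -> bool:
    # The policy stays abstract: "is this a paid user?"
    return user in local_data["paid_users"]


def on_billing_event(user: str, paid: bool) -> None:
    # Called by the data layer when billing changes, outside the decision path.
    if paid:
        local_data["paid_users"].add(user)
    else:
        local_data["paid_users"].discard(user)
```

In this sketch the decision function never leaves the process; the cost of talking to billing is paid once per change rather than once per decision.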
Damian Schenkelman: It seems the things that you're mentioning allude to the fact that authorization decisions, or at least teams and users, want authorization decisions to be fast, because an authorization decision happens whenever you're interacting with their application. It's for every, let's say, request, or there might be multiple per request, and it usually happens in distributed systems. It seems state and managing the state required for these authorization decisions is a big part of it. So, if I'm running a policy for example and the policy takes two milliseconds, but before I run the policy, I need to go fetch data from a database and that call takes half a second, then even if the policy is very fast itself, the authorization decision as a whole might be slow. This is where, again, the notion of a fetcher comes in. So, let's start there maybe. What problems were you seeing in the wild, and how does OPAL help to manage this state?
Gabriel Manor: So one of the things that we see, you're speaking about problems and I want to start by expanding this landscape of problems. So, when we configure policy as code, we have configuration that is, let's say, a very complex ABAC configuration. We want to get a decision based on conditions on three or four attributes of the users. So, let's say the account age is larger than 100 days and they live in Europe and they are, for example, a paid user in a particular tier and so on and so forth. Then we also want to have a lot of decision making on the resource itself. So, it should be a document that's stored there and belongs there, et cetera, et cetera. So, we have a big policy. To make this big, complex policy run fast, we need to have, let's call it, a small chunk of data. The more data, the more data fetching, the more data processing, the more data handling that we'll need to have when we get a decision, the longer this policy decision takes. You can see it actually in policy engines based on policy as code, and let's call it the stateless policy as code. When they are getting a lot of data, they start to get decisions slowly. This is how that happens. One of the things in OPAL that actually even differentiates it from other abstraction layers, other administration layers of policy engines, is the data fetcher architecture. The data fetcher architecture is first of all pluggable. You can plug in any kind of data fetcher. So, think about it like fetching HTTP data, fetching gRPC data, fetching Postgres database data, fetching whatever data you need to fetch. You can write a plugin that does the right thing for you. So, you mentioned before, not in this context, but you mentioned before the connectivity between stuff. So, you can have a data fetcher for one client, one engine, that can go to some data source, and another client that can go to another data source. 
So, you can plug and play this configuration of data fetching for each client, which first helps you make sure that your clients get the real data that they need. On top of that, OPAL has a nice slicing and eventually consistent mechanism to make sure that only the data that the client needs, only the data that you configured, is actually getting into the policy engines. Not only that, it has, as I mentioned, an eventually consistent mechanism so that the policy engine can know that now something changed, and then I have a policy to decide. I can say, "I deny everything until something changed." I can say, "I can't trust the previous configuration until something changed, but for sure I know that something changed now." Then OPAL is actually helping you scale the speed of these policy decisions. I'm saying traditionally. It's not traditionally, right? It's all a new topic. But let's say in a traditional policy agent, you need to fetch the whole data because this is the way you should work. OPAL, with the data fetcher architecture, actually gives you a very sophisticated way to manage the state that your data needs to be in now. So, you not only configure the policy itself, you not only configure the way you declare your policy, you also configure the way that the engines are consuming data. This is one of the things with which OPAL itself, even though it is not a policy engine, helps you get policy decisions much faster.
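Editor's note: The ideas in this answer (pluggable fetchers, per-client slices of data, and denying while data is known to be stale) can be sketched roughly as follows. This is an illustration only and does not use OPAL's actual plugin interface; `register_fetcher`, `PolicyClient`, `fake_http_fetcher`, and the `example.invalid` URL are hypothetical names chosen for the sketch.

```python
from typing import Callable, Dict, List

# A fetcher is any callable that takes a source URL and returns a dict.
Fetcher = Callable[[str], dict]

# Registry keyed by a hypothetical scheme name such as "http" or "postgres".
FETCHERS: Dict[str, Fetcher] = {}


def register_fetcher(scheme: str, fetcher: Fetcher) -> None:
    """Plug a new kind of data source into the registry."""
    FETCHERS[scheme] = fetcher


def fake_http_fetcher(url: str) -> dict:
    # Stand-in for a real HTTP request; returns canned data.
    return {
        "users": {"alice": {"tier": "pro"}},
        "docs": {"d1": {"owner": "alice"}},
    }


class PolicyClient:
    """Holds only the slice of data its policies need."""

    def __init__(self, topics: List[str]):
        self.topics = topics
        self.data: dict = {}
        self.fresh = False  # becomes True after a successful sync

    def sync(self, scheme: str, url: str) -> None:
        full = FETCHERS[scheme](url)
        # Keep only the configured slice, not the whole data set.
        self.data = {k: v for k, v in full.items() if k in self.topics}
        self.fresh = True

    def mark_stale(self) -> None:
        # Signal that something changed upstream and a re-sync is pending.
        self.fresh = False

    def is_pro(self, user: str) -> bool:
        if not self.fresh:
            return False  # deny until data has been re-synced
        return self.data.get("users", {}).get(user, {}).get("tier") == "pro"
```

A client created with `PolicyClient(["users"])` would keep the `users` slice and drop `docs`, and it answers `False` for every user between `mark_stale()` and the next `sync()`.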
Damian Schenkelman: It seems then that there are a couple of things here. The data fetcher allows you to connect to different components and get data before a policy actually runs. There's probably something there. How does it know what data to fetch before the authorization decision needs to be made? Maybe we can dig into that. Then there's another part, which is that it also has a mechanism, or at least maybe you have to implement mechanisms, by which you can check if data has become stale or data has changed, so that policies can become aware of that. Maybe there's a "hey, is this data recent or not" check that gives you an escape hatch. Hey, if this data has changed since the last update on, I guess, any memory cache or something like that, then go down this route and maybe deny decisions until we have the latest? How does the fetcher know what data to fetch and what data not to fetch? Because ultimately, I guess here, there are cost reasons around minimizing networking costs and not doing unnecessary things, particularly if you're going to external data sources. If you're running on a cloud provider and you need to go fetch data from somewhere else, that might cost you a bit. There are also memory resource reasons. Otherwise, you're going to have your agent take up lots of resources depending on the data structures that you use. Your local searches might be slower. How does that work?
Gabriel Manor: So that's actually connected in my mind to the previous question about gotchas. When speaking about data, there is, as you mentioned, baseline data, data that your application is actually based on, let's say resource types, something that almost never changes. There is also, let's say, real-time data, and OPAL actually lets you separate the two. So, you have baseline data, which is the data that you load your clients with, but there is even a level of granularity in it, because you can actually load different clients with different baselines of data. So, for example, if you have a microservice architecture, you can design it so that different engines on different microservices have different sets of baseline data. This way, you make sure that each one gets the right slice of data that it can base the right decisions on. On top of baseline data, you have real-time data. Here, again, you have this granularity, you have data fetchers. Let's say you can use polling, you can use pushing, you can use bidirectional RPCs. Everything is actually configured at the data fetcher level, in the configuration of the data fetcher, and in the configuration of the deployment. In this way, you can make sure that the right policy engines, the right clients that run a policy agent for a purpose, for a microservice, for the whole system, run only with the right data that they need at one point of time.
Damian Schenkelman: As I hear this, it makes sense. I also think that, as a team or as a company using OPAL, this is what I would spend a lot of my time on, like fine-tuning these fetchers, making sure they work the right way. Does OPAL come with some set of default fetchers or already available fetchers for generic providers or maybe very common providers? Then I might have to do some things for anything that's specific to my team, my company.
Gabriel Manor: So definitely, OPAL comes with fetchers that we develop ourselves, like the HTTP fetchers I mentioned, but there are also community members that, as you mentioned, spend the time to write their own fetchers and contribute them back to the community. So, NBA, for example, released a fetcher for Cosmos DB, a DB that stands for making policy decision. For example, we have a community member that contributed a Postgres fetcher, since this is a pluggable system. It's correct that all of our, let's say, large OPAL users are actually writing very specific data fetchers, for the reason you mentioned, to make the policy decisions faster. But some of them just contributed theirs back. So, I can't remember now all the data fetchers that we already have, but the OPAL repository on GitHub has a list of all the fetchers and integrations that were already contributed back from the community. It's nice to mention here that integrations and third-party support, like pluggable stuff, are not only on the data fetcher side. So, we have Git bundles that you can implement yourself. We have a way to write plugins, as we mentioned, for the policy engines themselves. So, OPAL is actually a very pluggable system and we really see the community contributing back the things that are important to them in this plugin system of OPAL.
Damian Schenkelman: That makes sense. So, that's one of the benefits of the open source approach and the plugin system, right? It's like you get to benefit from others. One thing you alluded to earlier and mentioned a couple of times was this notion of relationship-based or graph-based authorization decisions. You talked about the notion of Zanzibar, and I worked on a system that's based on it. We had a couple of episodes about the topic last season, which we're going to link in the show notes. Can you maybe expand on where that fits in? Because it seems, again, we're talking about OPAL, we're talking about state. This would be a good moment to explain some of that.
Gabriel Manor: So I'm happy that you come with that question, because personally for the last couple of months, I have really wanted to add support for what you mentioned, relationship-based engines, to OPAL, but I want first... We had an explanation of RBAC and ABAC. ...to explain what this relationship-based authorization is and why it's different from RBAC and ABAC. So, relationship-based authorization is something that is very common in consumer applications. So, let's say when you are on a social network, you can see all your posts, you can see posts of your friends, you can see posts of groups that you are a member of, and it gets very complex when you have layers. So, I am for example a member of an organization that is a member of an organization that is a member of an organization. By belonging to this very low-level organization, I want to be able to perform operations on, let's say, my cousin in another organization, because we belong to the same top organization. This is what's called relationship-based access control. Again, it's very common. One example that I like to bring is Google Drive. So, you can share documents in a very granular way. Relationship-based access control has a very nice protocol or standard called Google Zanzibar and also nice implementations. In my opinion, and you also mentioned it here, one of the best implementations for the application level itself and for other applications is OpenFGA. It's actually an implementation of a very good standard for relationship-based access control that has the same, maybe not exactly the same, but the same ideas of a policy engine, something that's getting the decision, policy that's configured as code, but it also has something that you don't have in RBAC and ABAC. It has a graph database, and by looking at its nodes, you can get the decision. So, a relationship is something that could take time to calculate. So, we want to keep state. We want to keep data, and this is the main differentiation between RBAC and ABAC on one side and ReBAC on the other. 
With RBAC and ABAC, on one hand, you don't need a lot of data. The decision itself happens based on a very specific set of data, but the policy can be very complex. As we mentioned, we could combine 10 attributes of the user and 12 attributes of the resource and get a decision on them. But again, we don't need a lot of data. When we look at relationship-based access control, the policy rules are simple by nature. If a user belongs to a relationship, they are allowed. If not, they are not allowed. On the other hand, the state that we manage, the data that we manage, is much larger and needs to be at much greater scale than in a policy agent that's based on RBAC and ABAC. The nice thing about OPAL comes from the nature of open source and the understanding that at some point, and I'm personally already seeing it now with some users, policy as code will be polyglot, developers will be polyglot. They will want, for example, to have one setup of OpenFGA for very granular, highly granular relationship-based decisions. But for other needs of the application, they will want to declare ABAC policy to handle very complex decisions very fast. One of the things that I like with OPAL, and I really like to see the hype around the community, is bringing OPAL to a phase where we are actually supporting, on one hand, policy engines and policy agents that are mostly stateless, use very specific slices of data, and allow complex policy, and, in the same setup, in the same policy server that actually keeps everything in sync and scales the policy clients, also supporting relationship-based policy that requires a lot of data and requires managing state but lets you get much more sophisticated authorization decisions based on relationships.
Damian Schenkelman: I think that this is ultimately how I think about this. On the one hand, there is a space and a need for attribute-based access control and attribute-based decisions. On the other hand, there are cases where you need very fast authorization decisions. You don't want to deal with the operational overhead of running all of these components. So, having that state in the database makes sense, and that's where systems like OpenFGA or other Zanzibar implementations come in. A couple of things for people that aren't familiar with it. So, Google released a paper called Google Zanzibar, which is how Google does authorization internally. That Google Drive example that you mentioned is a good one. As Gabriel was saying, it's not a standard. It's maybe becoming a community de facto standard, because a few companies have a few open source projects that are starting to follow in those footsteps. Maybe a comparison would be much like how MapReduce and Hadoop happened from an open source perspective. The other thing, as you were saying, is that it uses the notion of a graph of these relationships between users and objects to make authorization decisions. But the key thing, and this is one thing that they call out in the paper and also a thing that's in most of the implementations that I've seen, is that it doesn't actually end up using a graph database behind the scenes. Because of how the authorization model DSL, the domain-specific language, works, you can actually implement those graph relationships and these very fast lookups on top of ordinary SQL databases. That makes the operational overhead a lot less. Graph databases, I wouldn't say, are new. They were a thing in the '80s. Now they're back, but their operational properties are very different from things that you may be used to running, like Postgres or MySQL or a cloud database like Dynamo or Cosmos, et cetera. That's a big thing. 
You don't have to run a database that you're not familiar with in order to run some of these open source projects.
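Editor's note: To make the point about ordinary SQL databases concrete, here is a simplified sketch that stores relationships as rows and resolves nested organization membership with a recursive query, using Python's built-in `sqlite3` module. It is not how Zanzibar or OpenFGA store or evaluate relationships; the `tuples` table, the `check` function, and the sample data (which mirrors the nested-organization example from earlier in the conversation) are all assumptions made for illustration.

```python
import sqlite3

# In-memory database with one row per relationship (subject, relation, object).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tuples (subject TEXT, relation TEXT, object TEXT)")
conn.executemany(
    "INSERT INTO tuples VALUES (?, ?, ?)",
    [
        ("user:gabriel", "member", "org:team"),
        ("org:team", "member", "org:division"),
        ("org:division", "member", "org:company"),
        ("org:company", "viewer", "doc:roadmap"),
    ],
)


def check(user: str, relation: str, obj: str) -> bool:
    """True if the user, or any group it transitively belongs to,
    holds the given relation on the object."""
    row = conn.execute(
        """
        WITH RECURSIVE reachable(name) AS (
            SELECT ?
            UNION
            SELECT t.object
            FROM tuples t
            JOIN reachable r ON t.subject = r.name
            WHERE t.relation = 'member'
        )
        SELECT 1 FROM tuples
        WHERE relation = ? AND object = ?
          AND subject IN (SELECT name FROM reachable)
        LIMIT 1
        """,
        (user, relation, obj),
    ).fetchone()
    return row is not None
```

The policy rule here is trivial (does a path exist?), while the amount of stored state grows with every relationship, which is the trade-off between ReBAC and ABAC described above.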
Gabriel Manor: Yeah, exactly. This is the way we're seeing it and that's truly exciting for us. We are truly eager to see how the community will adopt this idea of managing polyglot policy setups.
Damian Schenkelman: A couple of closing thoughts. I want to go back to how this all started with Permit. Can you explain what Permit.io does and why you folks ended up creating OPAL for those that aren't familiar with Permit?
Gabriel Manor: Yeah, so Permit is actually wrapping all the good things together, but also lets you everything. So, you mentioned all the operational costs, right? So at the end, they have to deploy the services, they have to maybe develop audit and monitoring and scaling and maybe a UI to edit all the policies and whatever it is. Permit just brings it all in a nice cloud offering that users can subscribe to. One of the things that differentiates Permit from, let's say, other cloud products that offer authorization as a service is the ability to scale. So, you can take Permit, connect it to your Git repository where you already store your policy as code, and let Permit scale everything for you, plus get all the external services like auditing, monitoring, user management, data fetching maintenance, et cetera, et cetera. So, besides being, let's call it, a standard cloud service for application authorization, it also takes on, as a cloud service, all the operational headache that you could have from maintaining an authorization system yourself.
Damian Schenkelman: What I take away from that is if you want to use OPAL but not have to run it, Permit is like OPAL on steroids as a service. I guess you can deploy it to your own cloud, not just like the Permit cloud.
Gabriel Manor: Yeah, so OPAL, as we said, has a lot of scalability options. So, we have OPAL Plus offer, which is the more traditional commercial open source. Permit itself is more for, as you say, save you all the effort of scaling, deploying everything, and get everything around it.
Damian Schenkelman: Yeah. Yeah, that makes sense. Gabriel, it's been a great conversation. We went deep into a number of topics. Hopefully, this gives listeners the opportunity to learn about some of the things that are happening in this space and trigger thoughts and help and figure out some things with like, " Hey, I should be using this. I should be checking that out," and so on. It's been great to have you on the show, man. I hope you enjoyed it.
Gabriel Manor: Yeah, sure. That was really great and I was happy to be here. Again, I'm pretty sure you can share links; you can find me on Twitter and LinkedIn. I always really love to talk about policy as code and creating better access control experiences for applications.
Damian Schenkelman: Definitely. Yeah, we will add those links, your social links, and some of the Permit links to the show notes. We'll also share links to everything that we've been discussing today in the show notes, so that people can easily find those there. It's been amazing to have you. Again, to everyone listening in, if you folks have questions about Permit or OPAL itself, just feel free to ask Gabriel or go to some of those links. That's it for today's episode of Authorization in Software. Thanks for tuning in and listening to us. If you enjoy the show, be sure to subscribe to the podcast on your preferred platform, so you'll never miss an episode. If you have any feedback or suggestions for future episodes, feel free to reach out to us on social media. We love hearing from our listeners. Keep building secure software, and we'll catch you on the next episode of Authorization in Software.
DESCRIPTION
Dive into the world of advanced authorization with Gabriel Manor, Head of DevRel and Growth at Permit.io. In this episode of Authorization in Software, Damian Schenkelman engages Gabriel in a discussion on the Open Policy Authorization Layer, better known as OPAL.
Damian and Gabriel delve deep into how OPAL enables a structured and effective approach to authorization. They cover the shift from traditional Role-Based Access Control (RBAC) to the more dynamic Attribute-Based Access Control (ABAC), highlighting the need for granular control in modern application environments.
This episode is insightful for those interested in understanding the complexities of policy-based authorization systems. It discusses the challenges and benefits of decoupling authorization policies from application code, emphasizing the importance of streamlined policy management for secure and efficient software development.