<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:content="http://purl.org/rss/1.0/modules/content/"><channel><title>E13 Capital &amp; Technology Blog</title><link>https://e13.dev/blog/</link><description>Blog on everything technology, Kubernetes, software engineering and software operations.</description><image><url>https://e13.dev/images/e13.png</url><link>https://e13.dev/images/e13.png</link><title>E13 Capital &amp; Technology Blog</title></image><generator>Hugo -- gohugo.io</generator><language>en</language><lastBuildDate>Thu, 14 Nov 2024 00:00:00 +0000</lastBuildDate><atom:link href="https://e13.dev/blog/index.xml" rel="self" type="application/rss+xml"/><item><title>Software Is Still Eating the World</title><link>https://e13.dev/blog/software-still-eats-the-world/</link><pubDate>Thu, 14 Nov 2024 00:00:00 +0000</pubDate><guid>https://e13.dev/blog/software-still-eats-the-world/</guid><description>Why do software projects fail? Computer scientists and software practitioners alike have wrestled with this question ever since software has entered the corporate space. There are supposedly many factors leading to the &amp;ldquo;failure&amp;rdquo; of a software project (however failure is defined) but the one constant I&amp;rsquo;ve witnessed in many projects I was on is that Conway&amp;rsquo;s law is always true. No single corporate entity has ever been able to detach its software engineering or software operations teams from the rest of the organization in a productive way.</description><content:encoded><![CDATA[<p>Why do software projects fail? Computer scientists and software practitioners alike have wrestled with this question ever since software has entered the corporate space. 
There are supposedly many factors leading to the &ldquo;failure&rdquo; of a software project (however failure is defined) but the one constant I&rsquo;ve witnessed in many projects I was on is that Conway&rsquo;s law is always true. No single corporate entity has ever been able to detach its software engineering or software operations teams from the rest of the organization in a productive way.</p>
<p>Companies tend to apply the same processes and principles they already have in place to their software creation and maintenance endeavors. A company that is very good at producing cars will take their successful car-making processes and force them on their software engineering teams. The same goes for any other manufacturing, construction, utility, healthcare, education, trading or finance company. And it Does. Not. Work.</p>
<p>You cannot successfully build software the same way you build cars, maintain power grids or produce industrial-grade machines. All the metaphors out there like &ldquo;building software is like building a house&rdquo; are wrong! There is no single metaphor for the software development process. You can&rsquo;t plan it 3 years ahead. You can&rsquo;t draw up a blueprint and start building. You&rsquo;re not building a single thing. You&rsquo;re not even building something that will stay the same for the next year or two. Software development needs to be rapid and fast-paced, with short feedback cycles, to be successful. Bill Gates and Paul Allen <a href="https://en.wikipedia.org/wiki/Altair_BASIC#Origin_and_development" target="_blank" rel="noopener">built the first BASIC interpreter for the Altair within weeks</a>! Let that be your benchmark and not how you build your great cars or machines or space rockets.</p>
<p>That&rsquo;s why heavyweight processes were largely replaced by lightweight software engineering methodologies in the 1990s, when XP, Scrum, FDD and later the Agile Manifesto were introduced. That was more than 20 years ago! So why do many software projects still fail? The methodologies are not to blame. It&rsquo;s the organizational structures that they are pressured into. For an agile software development process to be applied successfully it needs to be implemented in a way that allows it to breathe, to blossom. Software engineers need to be given creative space. Building software is never like building a physical machine from a blueprint. There are no blueprints for software. Each piece of software is different from anything that has been built before.</p>
<p>That&rsquo;s what agile is all about. You can&rsquo;t lock developers into a corporate chamber and expect anything meaningful to come out of there. And even in cases where a company tries to detach their software development from their other business as much as possible, processes and culture will not stop leaking into it. Volkswagen&rsquo;s Cariad is the most recent example of how this doesn&rsquo;t work.</p>
<p>So if Conway&rsquo;s law does prove true in almost every case, what can companies do about that? They obviously can&rsquo;t (and shouldn&rsquo;t) change the processes they&rsquo;ve successfully applied to their other businesses. Should they stop producing, maintaining and operating software at all and leave it to specialized software practitioners? Should they stop trying to produce any software and always opt for buying existing software products?</p>
<p>People who know me also know I strongly believe that every big company out there needs to admit that software is and will continue to be a significant part of its business. Software has long passed its status as a mere utility for the &ldquo;actual&rdquo; business. It IS the business. Look at cars again: Those companies that have made software a central and integral part of their manufacturing process are successful (Tesla, Xiaomi, BYD and others). Those that have not, aren&rsquo;t (any traditional car maker). In space technology you can even see how the recognition of Conway&rsquo;s law can lead to the reverse effect, i.e. agile methodologies being applied to the manufacturing process by SpaceX, in effect making them the leading space technology company in the world.</p>
<p>Marc Andreessen coined the phrase <a href="https://a16z.com/why-software-is-eating-the-world/" target="_blank" rel="noopener">software is eating the world</a> in 2011. Now, 13 years later, many big corporations out there have still not grasped the extent to which this observation affects them. Stop treating software as a utility. Start making it the defining part of your business, adding value to your baseline products, be it cars, rockets, supermarkets or utilities. This is the foundation upon which companies will build a successful future for their businesses.</p>
]]></content:encoded></item><item><title>"My Pod doesn't see the clients' IP addresses": Kubernetes External Traffic Policy Caveats</title><link>https://e13.dev/blog/k8s-external-traffic-policy/</link><pubDate>Wed, 21 Aug 2024 00:00:00 +0000</pubDate><guid>https://e13.dev/blog/k8s-external-traffic-policy/</guid><description>Yesterday I had an interesting conversation with a friend who asked me how he could configure his Kubernetes deployment so that the application running in his Pods is able to see the IP addresses of the clients that issue TCP requests. Since his application was running behind a LoadBalancer service, I pointed him to the official Kubernetes docs on the topic that basically advise setting the Service&amp;rsquo;s .spec.externalTrafficPolicy to Local.</description><content:encoded><![CDATA[<p>Yesterday I had an interesting conversation with a friend who asked me how he could configure his Kubernetes deployment so that the application running in his Pods is able to see the IP addresses of the clients that issue TCP requests. Since his application was running behind a LoadBalancer service, I pointed him to the <a href="https://kubernetes.io/docs/tutorials/services/source-ip/#source-ip-for-services-with-type-loadbalancer" target="_blank" rel="noopener">official Kubernetes docs</a> on the topic that basically advise setting the Service&rsquo;s <code>.spec.externalTrafficPolicy</code> to <code>Local</code>. This will lead to requests being served from the node they arrived at and consequently preserves the clients&rsquo; IP addresses. I didn&rsquo;t forget to mention that this may lead to an imbalance in how traffic is routed to his Pods (as the documentation also mentions <a href="https://kubernetes.io/docs/tasks/access-application-cluster/create-external-load-balancer/#caveats-and-limitations-when-preserving-source-ips" target="_blank" rel="noopener">on another page</a>). 
When he asked why that was the case I had to think for a second and ended up illustrating it to him with an example:</p>
<p>Imagine your cluster has 3 nodes and your application is deployed with 6 replicas, i.e. 6 Pods. The Pods are spread across the nodes in the following way:</p>
<ul>
<li>1 Pod on Node 1</li>
<li>2 Pods on Node 2</li>
<li>3 Pods on Node 3</li>
</ul>
<p>In the usual case, you want incoming traffic to be distributed equally among the Pods, 1/6th of all requests to each Pod. Now with an external load balancer fronting your Kubernetes Service of type LoadBalancer, that external load balancer usually knows nothing about Pods but only about nodes (that&rsquo;s true e.g. for an AKS Load Balancer and presumably for GKE, EKS and on-premise LBs, too). Your external load balancer will consequently balance traffic equally across your Kubernetes nodes, 1/3rd of all requests to each node. From there, a component called kube-proxy takes over and distributes the traffic to the matching Pods of the Service.</p>
<p>With the default external traffic policy of <code>Cluster</code>, kube-proxy will take into account all Pods of the whole cluster and distribute traffic equally among them, no matter where they run. This is illustrated in the following diagram:</p>
<figure >
    <img loading="lazy" src="/images/external-traffic-policy-cluster.svg"
         alt="A schematic diagram illustrating how traffic flows with cluster external traffic policy"/> 
</figure>

<p>1/6th of all requests goes to each Pod. Good, that&rsquo;s what we want. But now that we want to reveal the clients&rsquo; IP addresses to the application running inside of the Pods, we change the traffic policy to <code>Local</code> as explained in the documentation. This will lead to a significant change in how traffic flows within your cluster. With all traffic from the external load balancer still being balanced equally among all nodes (because what does the LB know about Kubernetes traffic policies, anyway?), kube-proxy will no longer forward it to Pods outside of the node that it&rsquo;s running on, leading to the following traffic flow:</p>
<figure >
    <img loading="lazy" src="/images/external-traffic-policy-local.svg"
         alt="A schematic diagram illustrating how traffic flows with local external traffic policy"/> 
</figure>

<p>As you can see, now Pod 1 has to handle 1/3rd of all traffic, Pods 2 and 3 still handle 1/6th each and Pods 4, 5 and 6 only handle 1/9th. So Pod 1 has to handle 3 times as much traffic as Pods 4, 5 and 6. This is a huge imbalance and may lead to your application behaving very differently depending on which node a request lands on.</p>
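<p>For reference, the policy switch itself is a one-line change on the Service. A minimal sketch (name, labels and ports are illustrative):</p>
<pre><code class="language-yaml">apiVersion: v1
kind: Service
metadata:
  name: my-app                    # illustrative name
spec:
  type: LoadBalancer
  selector:
    app: my-app
  ports:
    - port: 80
      targetPort: 8080
  # Local preserves the client source IP but restricts kube-proxy to
  # Pods on the receiving node, causing the imbalance described above.
  externalTrafficPolicy: Local
</code></pre>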
<h2 id="what-you-can-do-about-it">What You Can Do About It</h2>
<p>There are multiple things you can do to preserve client IP addresses while still balancing traffic equally:</p>
<ul>
<li>Use a <a href="https://kubernetes.io/docs/concepts/scheduling-eviction/topology-spread-constraints/" target="_blank" rel="noopener">Pod Topology Spread Constraint</a> to run more or less the same amount of Pods on each node. This, of course, only makes sense if all the nodes have more or less the same resources in terms of CPU cores, RAM and network connectivity (depending on which one&rsquo;s important to your application).</li>
<li>Use an ingress controller: Usually, ingress controllers allow for a much more fine-grained load-balancing behaviour, e.g. ingress-nginx can be configured to use a different load-balancing algorithm <a href="https://kubernetes.github.io/ingress-nginx/user-guide/nginx-configuration/annotations/#custom-nginx-load-balancing" target="_blank" rel="noopener">per Ingress resource</a>.</li>
<li>Depending on your Kubernetes cluster provider you may be able to use the Gateway API which <a href="https://gateway-api.sigs.k8s.io/reference/spec/#gateway.networking.k8s.io%2fv1.BackendRef" target="_blank" rel="noopener">provides limited ways to influence the weighting of backends</a>.</li>
</ul>
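<p>The first option could look roughly like this in the Deployment&rsquo;s Pod template; a sketch with illustrative names and labels:</p>
<pre><code class="language-yaml">apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app                    # illustrative
spec:
  replicas: 6
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      topologySpreadConstraints:
        - maxSkew: 1                          # allow at most 1 Pod difference between nodes
          topologyKey: kubernetes.io/hostname # spread across individual nodes
          whenUnsatisfiable: DoNotSchedule
          labelSelector:
            matchLabels:
              app: my-app
      containers:
        - name: my-app
          image: my-app:latest    # illustrative image
</code></pre>
<p>With 6 replicas on 3 nodes this yields 2 Pods per node, so the <code>Local</code> policy distributes traffic evenly again.</p>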
]]></content:encoded></item><item><title>Kubernetes is your Operations Development Kit</title><link>https://e13.dev/blog/kubernetes-sdk/</link><pubDate>Fri, 10 May 2024 00:00:00 +0000</pubDate><guid>https://e13.dev/blog/kubernetes-sdk/</guid><description>Ever since I first dipped my toes into the Kubernetes waters, there were people arguing against Kubernetes along the lines of &amp;ldquo;you can run your stuff on a single EC2 instance much cheaper and simpler&amp;rdquo;. Here I want to lay out why I believe this is a short-sighted and incomprehensive perspective. Hear me out! 😉
Most people think of Kubernetes as a mere container orchestrator, a component so deep down in your application&amp;rsquo;s operational stack that developers don&amp;rsquo;t have to think about it as the target platform for the applications they build.</description><content:encoded><![CDATA[<p>Ever since I first dipped my toes into the Kubernetes waters, there were people arguing against Kubernetes along the lines of &ldquo;you can run your stuff on a single EC2 instance much cheaper and simpler&rdquo;. Here I want to lay out why I believe this is a short-sighted and incomprehensive perspective. Hear me out! 😉</p>
<p>Most people think of Kubernetes as a mere container orchestrator, a component so deep down in your application&rsquo;s operational stack that developers don&rsquo;t have to think about it as the target platform for the applications they build. Just build a container from your code and run it. And yes, that&rsquo;s what Kubernetes does: it runs your containers. If Deployment/Pod is where you stop using Kubernetes&rsquo; machinery, then you are indeed likely better off running your containers on an EC2 instance. But I highly doubt this is where anyone stops even in a development or integration environment. I will give you a real-world example of a customer engagement where we were asked to create an offer for deploying and running a certain business application. The ask by the customer was explicitly to have the application running on virtual machines.</p>
<p>The application the customer needed to operate was a critical part of a multi-tenant messaging bus so it had to be highly available and be properly monitored for outages. The application itself needs access to a database and an AMQP broker. This is what we came up with:</p>
<ul>
<li>2 VMs for the application itself (one instance each) for high availability/failover scenarios</li>
<li>2 VMs for the database (primary + replica)</li>
<li>2 VMs for the AMQP broker (deployed as cluster for high availability)</li>
<li>1 VM for monitoring</li>
<li>1 VM for logging</li>
</ul>
<h2 id="step-1---the-path-towards-containers">Step 1 - The Path Towards Containers</h2>
<p>That&rsquo;s 8 VMs for running and monitoring a single business application. What&rsquo;s the potential to drive down the cost for this setup? Sure, start with consolidating applications onto one VM. For our architecture proposal, we decided to put the monitoring and logging components onto the same VM. We then put the DB and the AMQP broker on the same VM, too. What&rsquo;s the consequence of this consolidation? A 37.5% cost reduction (minus 3 VMs). Good. But honestly, we separated the VMs by application domain on purpose to begin with. One of the reasons is better isolation for security purposes, e.g. for the case where one of the instances gets compromised. You simply reduce the potential to move laterally across the infrastructure.</p>
<p>How do I properly isolate the applications to keep fulfilling this security goal when they&rsquo;re sharing the same VM? I insert an isolation layer. How do I do this in Linux? <strong>Using containers</strong>! That&rsquo;s step 1.</p>
<h2 id="step-2---kubernetes-to-the-rescue">Step 2 - Kubernetes to the Rescue</h2>
<p>Now I have several VMs running that in turn run several containers each. Great! But I still need to properly manage network traffic flowing between each instance of my landscape. With containers I would use the runtime&rsquo;s networking features to do this, probably create several container networks and allow traffic to flow from certain parts of the landscape towards other parts (e.g. let each business application instance open a TCP connection to the database). I&rsquo;ll probably have to do some host-side iptables/nftables tweaking, too.</p>
<p>Next challenge: Deploying all these containers. Since I need to run multiple containers across several VMs, a solution such as Docker Compose isn&rsquo;t feasible, anymore. I will have to start scripting my own deployment machinery. But even now that I have all the containers running, I still need to deploy additional services such as a load balancer/reverse proxy to balance traffic between the two business application instances. I need to build a way to automatically fail over as soon as one of the VMs goes down. I need to manage access to the VMs for different roles, i.e. create user accounts, deploy SSH keys etc. etc.</p>
<p>But there&rsquo;s more: Maybe the application needs access to some sort of secret store, e.g. Hashicorp Vault or cloud-native solutions such as AWS/GCP KMS or Azure Key Vault. That&rsquo;s another thing I need to manually set up or better build some kind of custom automation for.</p>
<p>At this point I assume you see where this is leading: Running an application in production rarely means spinning up a single VM and putting a JAR file onto it. There&rsquo;s a lot of auxiliary components at play. And this is where we will now take a step back and see what the operational challenges are that we need to solve in this specific scenario:</p>
<ul>
<li>Orchestrate multiple containers across multiple machines (deploy, auto-restart, update, undeploy)</li>
<li>Manage network traffic flowing between containers and between machines</li>
<li>Balance traffic between instances and reverse-proxy services and manage failover scenarios</li>
<li>Manage machine access for different roles and users</li>
<li>Manage secret access from container instances</li>
</ul>
<p>Experienced, senior ops people will have built or bought the proper tooling to do all of this for them over the many years they&rsquo;ve been working in the space. They will have every tool and every process at hand to solve all of the problems stated above. These are not new problems, after all.</p>
<p>But what if there were software out there that solved all of these challenges in a declarative, standardized way, so that every ops person in the world could easily understand any environment operated by that software to a certain degree? Software that provides a common API with standardized syntax and semantics? An operational development kit, if you will, flexible enough to adapt to the myriad of different operational environments out there.</p>
<p>Well, <strong>this operational development kit is Kubernetes</strong>! See for yourself:</p>
<table>
<thead>
<tr>
<th>Challenge</th>
<th>Kubernetes API</th>
</tr>
</thead>
<tbody>
<tr>
<td>Orchestrate containers</td>
<td><a href="https://kubernetes.io/docs/concepts/workloads/controllers/" target="_blank" rel="noopener">Deployments/StatefulSets/DaemonSets</a></td>
</tr>
<tr>
<td>Manage network traffic</td>
<td><a href="https://kubernetes.io/docs/concepts/services-networking/network-policies/" target="_blank" rel="noopener">NetworkPolicies</a></td>
</tr>
<tr>
<td>Balance traffic/reverse-proxy/failover</td>
<td><a href="https://kubernetes.io/docs/concepts/services-networking/" target="_blank" rel="noopener">Service/LoadBalancer/Ingress</a></td>
</tr>
<tr>
<td>Manage machine access</td>
<td><a href="https://kubernetes.io/docs/concepts/security/controlling-access/" target="_blank" rel="noopener">RBAC (Role, ClusterRole, RoleBinding and ClusterRoleBinding)</a></td>
</tr>
<tr>
<td>Manage secret access</td>
<td>RBAC + <a href="https://kubernetes.io/docs/concepts/configuration/secret/" target="_blank" rel="noopener">Secrets</a> + <a href="https://secrets-store-csi-driver.sigs.k8s.io/" target="_blank" rel="noopener">Secrets Store CSI Driver</a></td>
</tr>
</tbody>
</table>
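<p>To give a taste of how declarative these building blocks are: the network traffic rule from the VM scenario above, letting only the business application connect to the database, boils down to a single NetworkPolicy. A sketch with illustrative labels and port:</p>
<pre><code class="language-yaml">apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: db-allow-app              # illustrative
spec:
  podSelector:
    matchLabels:
      app: database               # applies to the database Pods
  policyTypes:
    - Ingress
  ingress:
    - from:
        - podSelector:
            matchLabels:
              app: business-app   # only these Pods may connect
      ports:
        - protocol: TCP
          port: 5432              # e.g. PostgreSQL
</code></pre>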
<p>Kubernetes provides, out of the box, all the building blocks for the operational challenges you&rsquo;re facing anyway. The overhead it brings in terms of operational complexity (it&rsquo;s not an easy task to keep a Kubernetes cluster up and running in production) is easily offset by the simplicity of managing workloads running on it. So easy, in fact, that many development teams within companies can be handed a <code>kubeconfig</code> file and manage their applications themselves. I know because I&rsquo;ve been on such a team in the past.</p>
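<p>Handing a team that kind of scoped access is itself just two RBAC resources. A rough sketch, with the namespace and group names being illustrative:</p>
<pre><code class="language-yaml">apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: app-team-edit             # illustrative
  namespace: team-a               # the team's namespace
rules:
  - apiGroups: ["", "apps"]
    resources: ["pods", "services", "configmaps", "deployments"]
    verbs: ["get", "list", "watch", "create", "update", "patch", "delete"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: app-team-edit
  namespace: team-a
subjects:
  - kind: Group
    name: team-a-developers       # illustrative group from your identity provider
    apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: Role
  name: app-team-edit
  apiGroup: rbac.authorization.k8s.io
</code></pre>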
<p>Kubernetes shifts work I&rsquo;d be doing myself in a classical VM-based environment to software operators running in the cluster so that I can focus on more important work. It makes my application landscape transparent and reproducible if I add GitOps to the mix and store all the infrastructure in a repository. That&rsquo;s the real power of Kubernetes and why I like it so much.</p>
]]></content:encoded></item><item><title>The commoditization of Kubernetes</title><link>https://e13.dev/blog/commoditization-of-k8s/</link><pubDate>Fri, 23 Jun 2023 00:00:00 +0000</pubDate><guid>https://e13.dev/blog/commoditization-of-k8s/</guid><description>There&amp;rsquo;s so many rants out there about Kubernetes and container environments in general and the most recent statements by Kelsey Hightower just fueled these so I want to share why I believe Kubernetes and the cloud-native way to run apps these days is a good thing.
Back in the days when I wanted to run a server application and expose it to the Internet I rented a root server (or used one I already had), copied all the artifacts onto it using scp and started the app/web server in the background, using screen or maybe a systemd service.</description><content:encoded><![CDATA[<p>There are so many rants out there about Kubernetes and container environments in general, and the <a href="https://github.com/readme/podcast/kelsey-hightower" target="_blank" rel="noopener">most recent statements</a> by Kelsey Hightower just fueled them, so I want to share why I believe Kubernetes and the cloud-native way to run apps these days is a good thing.</p>
<p>Back in the days when I wanted to run a server application and expose it to the Internet I rented a root server (or used one I already had), copied all the artifacts onto it using scp and started the app/web server in the background, using screen or maybe a systemd service. To upgrade the app, I updated the artifacts and restarted the app server. Simple workflow.</p>
<p>Nowadays my usual workflow is to make a container image out of the app and run it in Kubernetes. For a completely new environment I spin up a Kubernetes cluster, install an ingress controller, use GitOps (i.e. install Flux), encrypt Secrets with SOPS or connect to a Vault (installing external-secrets operator), install Prometheus and Grafana, set up Slack notifications for Grafana alerts and a couple of other things.</p>
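<p>In that GitOps setup, installing a component like the ingress controller is itself just another manifest in Git. A rough sketch using Flux&rsquo;s Helm resources (the exact API versions depend on your Flux release, and the intervals are illustrative):</p>
<pre><code class="language-yaml">apiVersion: source.toolkit.fluxcd.io/v1beta2
kind: HelmRepository
metadata:
  name: ingress-nginx
  namespace: flux-system
spec:
  interval: 1h
  url: https://kubernetes.github.io/ingress-nginx
---
apiVersion: helm.toolkit.fluxcd.io/v2beta1
kind: HelmRelease
metadata:
  name: ingress-nginx
  namespace: flux-system
spec:
  interval: 10m
  chart:
    spec:
      chart: ingress-nginx
      sourceRef:
        kind: HelmRepository
        name: ingress-nginx
</code></pre>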
<p>What does this change in workflows and technology tell us, I wonder? Are we all adding unnecessary overhead to our production environments? Is Kubernetes complete overkill? I don&rsquo;t think so and here&rsquo;s why:</p>
<p>In my opinion the new workflow represents two things: First, the mindset of what it takes to run an application reliably and securely has changed. People are much more aware of what it means to run an application in production. Users don&rsquo;t accept considerable downtimes; adversaries have become much more efficient and effective. Occasionally spinning up your regular Tomcat and exposing it to the Internet doesn&rsquo;t work, anymore. Second, the technology needed to spin up a production environment that deserves the name has become a commodity. Kubernetes plays a huge part in this commoditization. It doesn&rsquo;t take days or weeks to get a decent environment up and running, it takes minutes to a couple of hours now.</p>
<p>Part of that commoditization are the very well-defined APIs that Kubernetes ships with and that allow it to be extended through e.g. CRDs. Containers of course have also shifted deployment processes left and generally simplified things a lot.</p>
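<p>As a small taste of that extensibility: defining a whole new API type is one declarative resource. A minimal CustomResourceDefinition sketch (group, kind and fields are made up for illustration):</p>
<pre><code class="language-yaml">apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: backups.example.e13.dev   # must be &lt;plural&gt;.&lt;group&gt;
spec:
  group: example.e13.dev          # illustrative API group
  scope: Namespaced
  names:
    plural: backups
    singular: backup
    kind: Backup
  versions:
    - name: v1
      served: true
      storage: true
      schema:
        openAPIV3Schema:
          type: object
          properties:
            spec:
              type: object
              properties:
                schedule:
                  type: string    # e.g. a cron expression
</code></pre>
<p>Once applied, <code>kubectl get backups</code> works like for any built-in resource; a custom controller then reconciles these objects.</p>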
<p>So does Kubernetes add overhead? Of course it does. Is it unnecessary? No way! The commoditization of production deployments is a good thing. Software engineers with little background in system operations now have all the tools at hand to run their apps reliably and securely. There is a learning curve but it&rsquo;s nowhere near as steep as it was back in the days.</p>
<p>Kelsey is right. Kubernetes will likely go away in the future but not in the sense that some people seem to understand it: It will become even more commoditized, to the degree that most people don&rsquo;t have to think about it. The new mindset will stick, though, and that&rsquo;s good, for users and operators alike.</p>
]]></content:encoded></item><item><title>The Story of a GitHub Actions Workflow</title><link>https://e13.dev/blog/a-gh-actions-workflow-story/</link><pubDate>Sat, 19 Nov 2022 00:00:00 +0000</pubDate><guid>https://e13.dev/blog/a-gh-actions-workflow-story/</guid><description>Discuss this post
This is the story of a seemingly simple task of creating a GitHub Actions workflow that &amp;hellip; escalated quickly. I hope you people can learn from my mistakes and do better (or quicker).
You&amp;rsquo;ll find the tl;dr version here.
Over at Weaveworks we try to automate as many engineering processes as possible. That&amp;rsquo;s especially true for the tedious work of releasing a new version of one of the components we build.</description><content:encoded><![CDATA[<p><a href="https://hachyderm.io/@makkes/109377473189626346" target="_blank" rel="noopener">Discuss this post</a></p>
<p>This is the story of a seemingly simple task of creating a GitHub Actions workflow that &hellip; escalated quickly. I hope you people can learn from my mistakes and do better (or quicker).</p>
<p>You&rsquo;ll find the tl;dr version <a href="#the-lessons">here</a>.</p>
<p>Over at Weaveworks we try to automate as many engineering processes as possible. That&rsquo;s especially true for the tedious work of releasing a new version of one of the components we build. One of these components is a Kubernetes controller running as part of Weave GitOps Enterprise, the enterprise version of our <a href="https://github.com/weaveworks/weave-gitops/" target="_blank" rel="noopener">OSS Weave GitOps</a>. The controller is basically shipped in a container image and a Helm chart wrapping all the necessary manifests, Deployments, Services etc.</p>
<p>What we had already set up was a GitHub Actions workflow that would build and push a new container image whenever a Git tag was pushed to the repository, a nice, easy and pretty standard workflow. However, after that image was pushed we still had to manually update the chart version and the image version used within it. Building and publishing the chart itself was already properly automated.</p>
<p>So in between two tasks I was working on I wanted to spend an hour or two building a workflow that would bump the chart version and the app version within the chart whenever a new container image was pushed. It should then create a PR with those changes so we could still verify them. Sounds like very low-hanging fruit, right? That&rsquo;s what I thought, too.</p>
<h2 id="version-1">Version 1</h2>
<p>This is the initial version I came up with. First, the trigger:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">name</span>: <span style="color:#ae81ff">Update app in chart</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">on</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">registry_package</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">types</span>:
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">published</span>
</span></span></code></pre></div><p>Simple, right? GitHub Actions provides a nice <a href="https://docs.github.com/en/actions/using-workflows/events-that-trigger-workflows#registry_package" target="_blank" rel="noopener">registry_package event</a> that triggers a workflow whenever something is pushed to the package registry. Spoiler alert: This didn&rsquo;t work without changes to other workflows. More on that later. Let&rsquo;s look at the single job within that workflow:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">jobs</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">update-chart</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">if</span>: <span style="color:#ae81ff">${{ github.event.registry_package.name == &#39;pipeline-controller&#39; }}</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">runs-on</span>: <span style="color:#ae81ff">ubuntu-latest</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">steps</span>:
</span></span><span style="display:flex;"><span>      - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">Checkout</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">uses</span>: <span style="color:#ae81ff">actions/checkout@v3</span>
</span></span><span style="display:flex;"><span>      - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">bump app version</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">uses</span>: <span style="color:#ae81ff">mikefarah/yq@v4.30.4</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">with</span>:
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">cmd</span>: <span style="color:#ae81ff">yq -i &#39;.appVersion = &#34;${{ github.event.registry_package.package_version.container_metadata.tag.name }}&#34;&#39; charts/pipeline-controller/Chart.yaml</span>
</span></span></code></pre></div><p>Easy; set the new app version from the image that triggered the workflow. As written, though, this may set the <code>appVersion</code> to an empty string. More on that later.</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span>      - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">get chart version</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">id</span>: <span style="color:#ae81ff">get_chart_version</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">uses</span>: <span style="color:#ae81ff">mikefarah/yq@v4.30.4</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">with</span>:
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">cmd</span>: <span style="color:#ae81ff">yq &#39;.version&#39; charts/pipeline-controller/Chart.yaml</span>
</span></span><span style="display:flex;"><span>      - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">increment chart version</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">run</span>: <span style="color:#ae81ff">echo ${{ steps.get_chart_version.outputs.result }} awk -F. -v OFS=. &#39;{print $1,++$2,0}&#39;</span>
</span></span><span style="display:flex;"><span>      - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">update chart version</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">uses</span>: <span style="color:#ae81ff">mikefarah/yq@v4.30.4</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">with</span>:
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">cmd</span>: <span style="color:#ae81ff">yq -i &#39;.version = &#34;${{ steps.get_chart_version.outputs.result }}&#34;&#39; charts/pipeline-controller/Chart.yaml</span>
</span></span></code></pre></div><p>The three steps above were supposed to extract the existing chart version, increment the minor version, reset the patch version to &lsquo;0&rsquo; and store the new version in <code>Chart.yaml</code>. However, there are two bugs in there. Can you spot them?</p>
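<p>For illustration, here is what the increment logic is meant to compute, runnable locally (the version number is a made-up example):</p>
<pre tabindex="0"><code># awk splits the version on dots (-F.) and joins the printed
# fields with dots again (OFS=.). ++$2 bumps the minor version,
# the literal 0 resets the patch version.
current=1.2.3
new=$(echo "$current" | awk -F. -v OFS=. '{print $1,++$2,0}')
echo "$new" # prints 1.3.0
</code></pre>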
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span>      - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">Create Pull Request</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">id</span>: <span style="color:#ae81ff">cpr</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">uses</span>: <span style="color:#ae81ff">peter-evans/create-pull-request@v4</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">with</span>:
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">commit-message</span>: |<span style="color:#e6db74">
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">            Update app version in chart</span>            
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">committer</span>: <span style="color:#ae81ff">GitHub &lt;noreply@github.com&gt;</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">author</span>: <span style="color:#75715e">####### REDACTED ######</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">branch</span>: <span style="color:#ae81ff">update-chart</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">title</span>: <span style="color:#ae81ff">Update app version in chart</span>
</span></span><span style="display:flex;"><span>      - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">Check output</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">run</span>: |<span style="color:#e6db74">
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">          echo &#34;Pull Request Number - ${{ steps.cpr.outputs.pull-request-number }}&#34;
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">          echo &#34;Pull Request URL - ${{ steps.cpr.outputs.pull-request-url }}&#34;</span>          
</span></span></code></pre></div><p>Straightforward: create a PR from the changes so we can review and merge them. As it turned out, a PR created like that couldn&rsquo;t be merged with the repo settings we had in place.</p>
<p>Almost every single step in that workflow had bugs. But were we able to spot them before actually merging the new workflow into <code>main</code>? No, because I have yet to find a way to test a workflow without actually merging and running it. Please let me know if you know of any! So we went ahead and merged that workflow file, created a new Git tag and waited until a new image version was pushed for the workflow to be triggered.</p>
<h2 id="not-running-at-all">Not Running At All</h2>
<p>The first thing we observed was that the workflow wasn&rsquo;t triggered at all. We already knew that you couldn&rsquo;t just <a href="https://docs.github.com/en/actions/using-workflows/triggering-a-workflow#triggering-a-workflow-from-a-workflow" target="_blank" rel="noopener">trigger a workflow from another workflow</a> but what we didn&rsquo;t know was that this behaviour is carried forward even to transitive actions such as an image push. We changed the other workflow pushing the new image to the registry to use a personal access token and that fixed it. The workflow was now running.</p>
<p><strong>Lesson #1:</strong> When you want a workflow to be triggered by a new image version being pushed to GitHub&rsquo;s registry, make sure not to use the default workflow token for pushing that image. Otherwise workflows listening to the push event won&rsquo;t run.</p>
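<p>In our case that meant changing the login step of the image-pushing workflow to authenticate with a PAT instead of the default token. As a sketch (the action version and secret name here are illustrative, not necessarily the exact ones we used):</p>
<pre tabindex="0"><code>- name: Login to GHCR
  uses: docker/login-action@v2
  with:
    registry: ghcr.io
    username: ${{ github.actor }}
    # A personal access token; pushing with secrets.GITHUB_TOKEN
    # would suppress workflows triggered by the registry_package event.
    password: ${{ secrets.GHCR_TOKEN }}
</code></pre>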
<h2 id="version-2">Version 2</h2>
<p>The next thing we noticed was that the workflow was triggered three times. We had no clue why but decided to fix the other issues first. One of these was that the step incrementing the chart version wasn&rsquo;t working. This was down to a simple syntax error: we had forgotten a pipe character:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span>-        <span style="color:#f92672">run</span>: <span style="color:#ae81ff">echo ${{ steps.get_chart_version.outputs.result }} awk -F. -v OFS=. &#39;{print $1,++$2,0}&#39;</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">+        run</span>: <span style="color:#ae81ff">echo ${{ steps.get_chart_version.outputs.result }} | awk -F. -v OFS=. &#39;{print $1,++$2,0}&#39;</span>
</span></span></code></pre></div><p>Easy! Next!</p>
<h2 id="version-3">Version 3</h2>
<p>Next we discovered that the new chart version set by the workflow was wrong. It didn&rsquo;t bump the version at all. It turned out that the step setting the new version referenced the wrong step:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span>         <span style="color:#f92672">with</span>:
</span></span><span style="display:flex;"><span>           <span style="color:#f92672">cmd</span>: <span style="color:#ae81ff">yq &#39;.version&#39; charts/pipeline-controller/Chart.yaml</span>
</span></span><span style="display:flex;"><span>       - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">increment chart version</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">+        id</span>: <span style="color:#ae81ff">inc_chart_version</span>
</span></span><span style="display:flex;"><span>         <span style="color:#f92672">run</span>: <span style="color:#ae81ff">echo ${{ steps.get_chart_version.outputs.result }} | awk -F. -v OFS=. &#39;{print $1,++$2,0}&#39;</span>
</span></span><span style="display:flex;"><span>       - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">update chart version</span>
</span></span><span style="display:flex;"><span>         <span style="color:#f92672">uses</span>: <span style="color:#ae81ff">mikefarah/yq@v4.30.4</span>
</span></span><span style="display:flex;"><span>         <span style="color:#f92672">with</span>:
</span></span><span style="display:flex;"><span>-          <span style="color:#f92672">cmd</span>: <span style="color:#ae81ff">yq -i &#39;.version = &#34;${{ steps.get_chart_version.outputs.result }}&#34;&#39; charts/pipeline-controller/Chart.yaml</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">+          cmd</span>: <span style="color:#ae81ff">yq -i &#39;.version = &#34;${{ steps.inc_chart_version.outputs.result }}&#34;&#39; charts/pipeline-controller/Chart.yaml</span>
</span></span><span style="display:flex;"><span>       - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">Create Pull Request</span>
</span></span><span style="display:flex;"><span>         <span style="color:#f92672">id</span>: <span style="color:#ae81ff">cpr</span>
</span></span><span style="display:flex;"><span>         <span style="color:#f92672">uses</span>: <span style="color:#ae81ff">peter-evans/create-pull-request@v4</span>
</span></span></code></pre></div><h2 id="version-4">Version 4</h2>
<p>Finally we wanted to find out why the workflow was triggered three times, so I added a debug step that would simply dump the complete event:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span>     <span style="color:#f92672">if</span>: <span style="color:#ae81ff">${{ github.event.registry_package.name == &#39;pipeline-controller&#39; }}</span>
</span></span><span style="display:flex;"><span>     <span style="color:#f92672">runs-on</span>: <span style="color:#ae81ff">ubuntu-latest</span>
</span></span><span style="display:flex;"><span>     <span style="color:#f92672">steps</span>:
</span></span><span style="display:flex;"><span><span style="color:#f92672">+      - name</span>: <span style="color:#ae81ff">dump event</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">+        run</span>: <span style="color:#ae81ff">echo ${{ toJson(github.event) }}</span>
</span></span><span style="display:flex;"><span>       - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">Checkout</span>
</span></span><span style="display:flex;"><span>         <span style="color:#f92672">uses</span>: <span style="color:#ae81ff">actions/checkout@v3</span>
</span></span><span style="display:flex;"><span>       - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">bump app version</span>
</span></span></code></pre></div><p>This didn&rsquo;t work because the <code>run</code> syntax wasn&rsquo;t correct, but it did dump the event nevertheless. The reason for the multiple triggering was actually quite simple: we pushed a multi-arch container image comprised of an AMD64 and an ARM64 manifest. A third manifest, the manifest list, then ties these two together. For each of the manifests pushed, a <code>registry_package</code> event is emitted.</p>
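<p>For context, a multi-arch image like this is typically built and pushed with something along these lines (the tag and registry path here are illustrative), and it is exactly this that produces the two platform manifests plus the manifest list:</p>
<pre tabindex="0"><code># Builds one manifest per platform plus a manifest list tying them together
docker buildx build \
  --platform linux/amd64,linux/arm64 \
  --tag ghcr.io/example/pipeline-controller:v0.1.0 \
  --push .
</code></pre>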
<p>So we went ahead and added another condition to the job run:</p>
<h2 id="version-5">Version 5</h2>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span> <span style="color:#f92672">jobs</span>:
</span></span><span style="display:flex;"><span>   <span style="color:#f92672">update-chart</span>:
</span></span><span style="display:flex;"><span>-    <span style="color:#f92672">if</span>: <span style="color:#ae81ff">${{ github.event.registry_package.name == &#39;pipeline-controller&#39; }}</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">+    if</span>: <span style="color:#ae81ff">${{ github.event.registry_package.name == &#39;pipeline-controller&#39; &amp;&amp; github.event.registry_package.package_version.container_metadata.tag.name != &#39;&#39; }}</span>
</span></span><span style="display:flex;"><span>     <span style="color:#f92672">runs-on</span>: <span style="color:#ae81ff">ubuntu-latest</span>
</span></span><span style="display:flex;"><span>     <span style="color:#f92672">steps</span>:
</span></span><span style="display:flex;"><span>       - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">dump event</span>
</span></span></code></pre></div><p>Now the <code>update-chart</code> job is only run for the event carrying the new image tag.</p>
<p><strong>Lesson #2:</strong> When using the <code>registry_package</code> event as a trigger, make sure to use proper conditions when reacting to multi-arch image pushes.</p>
<h2 id="version-6">Version 6</h2>
<p>Now the workflow was running only once (it still shows up three times but the other two runs are skipped), but the new chart version still wasn&rsquo;t set. It turned out I hadn&rsquo;t understood how to carry command output from one step to another. After reading up on this <a href="https://docs.github.com/en/actions/using-workflows/workflow-commands-for-github-actions#setting-an-output-parameter" target="_blank" rel="noopener">in the docs</a> we fixed that:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span>           <span style="color:#f92672">cmd</span>: <span style="color:#ae81ff">yq &#39;.version&#39; charts/pipeline-controller/Chart.yaml</span>
</span></span><span style="display:flex;"><span>       - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">increment chart version</span>
</span></span><span style="display:flex;"><span>         <span style="color:#f92672">id</span>: <span style="color:#ae81ff">inc_chart_version</span>
</span></span><span style="display:flex;"><span>-        <span style="color:#f92672">run</span>: <span style="color:#ae81ff">echo ${{ steps.get_chart_version.outputs.result }} | awk -F. -v OFS=. &#39;{print $1,++$2,0}&#39;</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">+        run</span>: <span style="color:#ae81ff">echo NEW_CHART_VERSION=$(echo ${{ steps.get_chart_version.outputs.result }} | awk -F. -v OFS=. &#39;{print $1,++$2,0}&#39;) &gt;&gt; $GITHUB_OUTPUT</span>
</span></span><span style="display:flex;"><span>       - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">update chart version</span>
</span></span><span style="display:flex;"><span>         <span style="color:#f92672">uses</span>: <span style="color:#ae81ff">mikefarah/yq@v4.30.4</span>
</span></span><span style="display:flex;"><span>         <span style="color:#f92672">with</span>:
</span></span><span style="display:flex;"><span>-          <span style="color:#f92672">cmd</span>: <span style="color:#ae81ff">yq -i &#39;.version = &#34;${{ steps.inc_chart_version.outputs.result }}&#34;&#39; charts/pipeline-controller/Chart.yaml</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">+          cmd</span>: <span style="color:#ae81ff">yq -i &#39;.version = &#34;${{ steps.inc_chart_version.outputs.NEW_CHART_VERSION }}&#34;&#39; charts/pipeline-controller/Chart.yaml</span>
</span></span><span style="display:flex;"><span>       - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">Create Pull Request</span>
</span></span><span style="display:flex;"><span>         <span style="color:#f92672">id</span>: <span style="color:#ae81ff">cpr</span>
</span></span><span style="display:flex;"><span>         <span style="color:#f92672">uses</span>: <span style="color:#ae81ff">peter-evans/create-pull-request@v4</span>
</span></span></code></pre></div><p><strong>Lesson #3:</strong> Use <code>GITHUB_OUTPUT</code> for carrying command output from one step to another.</p>
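<p>Stripped of the chart-specific details, the general pattern looks like this (the step ids and the output name are made up for illustration):</p>
<pre tabindex="0"><code>steps:
  - name: produce a value
    id: producer
    # Appending key=value lines to the file behind $GITHUB_OUTPUT
    # registers them as outputs of this step.
    run: echo "greeting=hello" &gt;&gt; "$GITHUB_OUTPUT"
  - name: consume the value
    # Reference the output via the producing step's id
    run: echo "${{ steps.producer.outputs.greeting }}"
</code></pre>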
<h2 id="version-7">Version 7</h2>
<p>Now the commit from the PR looked good but no CI checks were run. One more time the constraint of &ldquo;a workflow can&rsquo;t trigger another workflow with the default GitHub token&rdquo; kicked in. Fixing this was easy:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span>         <span style="color:#f92672">id</span>: <span style="color:#ae81ff">cpr</span>
</span></span><span style="display:flex;"><span>         <span style="color:#f92672">uses</span>: <span style="color:#ae81ff">peter-evans/create-pull-request@v4</span>
</span></span><span style="display:flex;"><span>         <span style="color:#f92672">with</span>:
</span></span><span style="display:flex;"><span><span style="color:#f92672">+          token</span>: <span style="color:#ae81ff">${{ secrets.GHCR_TOKEN }}</span>
</span></span><span style="display:flex;"><span>           <span style="color:#f92672">commit-message</span>: |<span style="color:#e6db74">
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">             Update app version in chart</span>             
</span></span><span style="display:flex;"><span>           <span style="color:#f92672">committer</span>: <span style="color:#ae81ff">GitHub &lt;noreply@github.com&gt;</span>
</span></span></code></pre></div><p><strong>Lesson #4:</strong> When creating a PR using the default workflow token, no CI checks are run. You need to use a personal access token instead.</p>
<h2 id="version-8">Version 8</h2>
<p>Woohoo, we got it! After creating what felt like a million Git tags to trigger the workflow over and over again and cluttering Git history with another million commits fixing the workflow, it was kicked off as expected, the PR looked fine and all CI checks were running.</p>
<p>But, oh no, GitHub didn&rsquo;t allow us to merge the PR because the commit wasn&rsquo;t signed. Duh! One more time:</p>
<h2 id="version-9">Version 9</h2>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span>     <span style="color:#f92672">steps</span>:
</span></span><span style="display:flex;"><span>       - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">Checkout</span>
</span></span><span style="display:flex;"><span>         <span style="color:#f92672">uses</span>: <span style="color:#ae81ff">actions/checkout@v3</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">+      - name</span>: <span style="color:#ae81ff">Import GPG key for signing commits</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">+        uses</span>: <span style="color:#ae81ff">crazy-max/ghaction-import-gpg@v3</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">+        with</span>:
</span></span><span style="display:flex;"><span><span style="color:#f92672">+          gpg-private-key</span>: <span style="color:#ae81ff">${{ secrets.GPG_PRIVATE_KEY }}</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">+          passphrase</span>: <span style="color:#ae81ff">${{ secrets.GPG_PASSPHRASE }}</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">+          git-user-signingkey</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">+          git-commit-gpgsign</span>: <span style="color:#66d9ef">true</span>
</span></span></code></pre></div><p>This additional step caused the commits created by the <code>create-pull-request</code> action to be signed and the PR to finally be in a mergeable state. Hooray!</p>
<h2 id="the-final-version">The Final Version</h2>
<p>The icing on the cake was a little change to make the PR more comprehensible and basically document what it does in the description:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span>           <span style="color:#f92672">committer</span>: <span style="color:#ae81ff">GitHub &lt;noreply@github.com&gt;</span>
</span></span><span style="display:flex;"><span>           <span style="color:#f92672">author</span>:  <span style="color:#75715e">###### REDACTED ######</span>
</span></span><span style="display:flex;"><span>           <span style="color:#f92672">branch</span>: <span style="color:#ae81ff">update-chart</span>
</span></span><span style="display:flex;"><span>-          <span style="color:#f92672">title</span>: <span style="color:#ae81ff">Update app version in chart</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">+          title</span>: <span style="color:#ae81ff">Update app version to ${{ github.event.registry_package.package_version.container_metadata.tag.name }} in chart</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">+          body</span>: <span style="color:#ae81ff">|</span>
</span></span><span style="display:flex;"><span><span style="color:#ae81ff">+            This PR bumps the minor chart version by default. If it is more appropriate to bump the major or the patch versions, please amend the commit accordingly.</span>
</span></span><span style="display:flex;"><span><span style="color:#ae81ff">+</span>
</span></span><span style="display:flex;"><span><span style="color:#ae81ff">+            The workflow that this PR was created from is &#34;${{ github.workflow }}&#34;.</span>
</span></span><span style="display:flex;"><span>       - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">Check output</span>
</span></span><span style="display:flex;"><span>         <span style="color:#f92672">run</span>: |<span style="color:#e6db74">
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">           echo &#34;Pull Request Number - ${{ steps.cpr.outputs.pull-request-number }}&#34;</span>           
</span></span></code></pre></div><p>This was the story of a seemingly very simple workflow we thought wouldn&rsquo;t take more than one or two hours but that turned out to take around a full day.</p>
<h2 id="the-lessons">The Lessons</h2>
<p><strong>Lesson #1:</strong> When you want a workflow to be triggered by a new image version being pushed to GitHub&rsquo;s registry, make sure to not use the default workflow token for pushing the image. Otherwise workflows listening to the push event won&rsquo;t run. <a href="https://docs.github.com/en/actions/using-workflows/triggering-a-workflow#triggering-a-workflow-from-a-workflow" target="_blank" rel="noopener">Related documentation</a></p>
<p><strong>Lesson #2:</strong> When using the <code>registry_package</code> event as a trigger, make sure to use proper conditions when reacting to multi-arch image pushes. I created <a href="https://github.com/github/docs/pull/22092" target="_blank" rel="noopener">a PR for adding this info to the documentation</a> that will hopefully get merged soon.</p>
<p><strong>Lesson #3:</strong> Use <code>GITHUB_OUTPUT</code> for carrying command output from one step to another. <a href="https://docs.github.com/en/actions/using-workflows/workflow-commands-for-github-actions#setting-an-output-parameter" target="_blank" rel="noopener">Related documentation</a></p>
]]></content:encoded></item><item><title>Hosting Mastodon identities at your own domain</title><link>https://e13.dev/blog/hosting-your-mastodon-identity/</link><pubDate>Wed, 16 Nov 2022 00:00:00 +0000</pubDate><guid>https://e13.dev/blog/hosting-your-mastodon-identity/</guid><description>Discuss this post
EDIT January 11, 2023: In previous versions of this article I advertised the try_files directive which made the solution vulnerable to path traversal attacks. Using the return directive and sending a 301 redirect fixed this. Thanks to Penple for making me aware of this vulnerability.
With Mastodon being all the rage right now and people massively moving over, new opportunities arise. One of these is that Mastodon allows you to take ownership of your identity using the WebFinger protocol.</description><content:encoded><![CDATA[<p><a href="https://hachyderm.io/@makkes/109353563088733849" target="_blank" rel="noopener">Discuss this post</a></p>
<p><strong>EDIT January 11, 2023:</strong> In previous versions of this article I advertised the <code>try_files</code> directive which made the solution vulnerable to path traversal attacks. Using the <code>return</code> directive and sending a 301 redirect fixed this. Thanks to <a href="https://penple.dev/" target="_blank" rel="noopener">Penple</a> for making me aware of this vulnerability.</p>
<p>With Mastodon being all the rage right now and people massively moving over, new opportunities arise. One of these is that Mastodon allows you to take ownership of your identity using the <a href="https://docs.joinmastodon.org/spec/webfinger/" target="_blank" rel="noopener">WebFinger protocol</a>. This way you can have an identity like <code>me@example.org</code> without actually having to host your own Mastodon server (or instance in Mastodon lingo).</p>
<p>Maarten Balliauw has already posted on <a href="https://blog.maartenballiauw.be/post/2022/11/05/mastodon-own-donain-without-hosting-server.html" target="_blank" rel="noopener">how to achieve this</a> but with a little caveat:</p>
<p><em>&ldquo;this approach works much like a catch-all e-mail address. @anything@yourdomain.com will match, unless you add a bit more scripting to only show a result for resources you want to be discoverable.&rdquo;</em></p>
<p>I went ahead and solved this by tweaking the nginx configuration of one of my servers slightly (the caveat here being that you need access to the web server&rsquo;s configuration):</p>
<pre tabindex="0"><code>server {
    listen 80;

    location = /.well-known/webfinger {
        absolute_redirect off;
        return 301 $uri/$arg_resource;
    }
</code></pre><p>A WebFinger request URL looks similar to this: <code>https://home.e13.dev/.well-known/webfinger?resource=acct:makkes@home.e13.dev</code>. Now whenever a request comes in at that URL, nginx sends an HTTP 301 redirect pointing to <code>/.well-known/webfinger/acct:makkes@home.e13.dev</code> which in turn returns the contents of the requested file (if it exists). So the only thing left to do is to create that file with the WebFinger details in it and store it at that location in nginx&rsquo;s web root.</p>
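<p>The file stored at that location then contains the WebFinger JSON for the identity. A minimal document might look like the following (the instance URLs are placeholders; the easiest way to get a correct document is to copy the response your Mastodon instance serves for your own account):</p>
<pre tabindex="0"><code>{
  "subject": "acct:makkes@home.e13.dev",
  "aliases": [
    "https://hachyderm.io/@makkes"
  ],
  "links": [
    {
      "rel": "self",
      "type": "application/activity+json",
      "href": "https://hachyderm.io/users/makkes"
    }
  ]
}
</code></pre>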
<p>This mitigates the &ldquo;catch-all&rdquo; limitation and only serves the identity or identities you want it to.</p>
]]></content:encoded></item><item><title>Taking it home — Kubernetes on bare-metal</title><link>https://e13.dev/blog/k8s-at-home/</link><pubDate>Wed, 09 Nov 2022 00:00:00 +0000</pubDate><guid>https://e13.dev/blog/k8s-at-home/</guid><description>To learn how Kubernetes works you should run your own Kubernetes cluster on bare-metal hardware.
Discuss this post
In the world that I live in Kubernetes is all the rage. This is the world of professional software development and deployment where medium- and large-sized companies are trying to reduce cost and complexity of their IT platforms while at the same time becoming faster at making changes to the software that they run as services to either their internal or external customers.</description><content:encoded><![CDATA[<blockquote>
<p><em>To learn how Kubernetes works you should run your own Kubernetes cluster on bare-metal hardware.</em></p>
</blockquote>
<p><a href="https://hachyderm.io/@makkes/109315054984564587" target="_blank" rel="noopener">Discuss this post</a></p>
<p>In the world that I live in Kubernetes is all the rage. This is the world of professional software development and deployment where medium- and large-sized companies are trying to reduce cost and complexity of their IT platforms while at the same time becoming faster at making changes to the software that they run as services to either their internal or external customers. I&rsquo;ve been on the side of development teams consuming Kubernetes myself and I was impressed and delighted by its concept of &ldquo;desired state&rdquo; represented by simple manifest files that me and my team were maintaining for the applications that we built. Later I switched roles and became a Kubernetes engineer myself, now helping platform teams delivering Kubernetes to development teams. If you&rsquo;re eager to learn how Kubernetes works internally and what a complex system it is that makes it so simple to deliver applications then this blog post is for you. Because I deeply believe that <strong>in order to learn how Kubernetes works you should run your own Kubernetes cluster on bare-metal hardware</strong>.</p>
<p>Taking first steps with Kubernetes is easier today than it has ever been: My favorite project for quickly spinning up a cluster is <a href="https://kind.sigs.k8s.io/" target="_blank" rel="noopener">kind</a>, Kubernetes in Docker. Run <code>kind create cluster</code> and after a couple of seconds your cluster is ready to go. There&rsquo;s various alternatives out there, too, with <a href="https://microk8s.io/" target="_blank" rel="noopener">microk8s</a>, <a href="https://k3s.io/" target="_blank" rel="noopener">k3s</a> and <a href="https://minikube.sigs.k8s.io/docs/" target="_blank" rel="noopener">minikube</a> being the most prominent ones. This got me started easily and quickly with Kubernetes development back when I switched roles. However, later on, when I was involved in more complex product development around Kubernetes, building controllers and maintaining an enterprise-grade Kubernetes distribution at <a href="https://d2iq.com" target="_blank" rel="noopener">D2iQ</a>, I needed to get more intimate with the internals. I wanted to understand all the intricacies of it, what happened under the hood when I ran <code>kubectl apply -f my-awesome-app.yaml</code>, how traffic is ingested into a cluster and further routed to the right container, how DNS works in the cluster, what all the possible ways were to provide persistent storage to containers, how a cluster is properly secured from unauthorized access etc. etc.</p>
<p>At that point I figured I needed to run my own cluster at home on bare-metal hardware and dig really deep into the details of keeping a Kubernetes cluster up 24/7, serving applications to the Internet and the internal home network in a secure fashion. That was nearly 3 years ago when Raspberry Pis were still affordable enough that I could just grab a handful and get going. I ordered 4 Rpi 4s with 4 GByte of RAM in addition to the various older RPis I already owned, the awesome <a href="https://www.c4labs.com/product/cloudlet-cluster-case-raspberry-pi/" target="_blank" rel="noopener">8-slot transparent cluster case</a> from C4 Labs, a cheap 8-port Ethernet switch, a couple of Cat 6 Ethernet cables and a 6-port USB power adapter.</p>
<h1 id="setting-goals">Setting Goals</h1>
<p>I quickly figured I needed to set clear expectations of how the cluster would be used so I set myself some goals:</p>
<ul>
<li>It should run on a separate network, isolated from the rest of my home network for security purposes.</li>
<li>It should be possible to expose services from inside the cluster to my home network but not the Internet.</li>
<li>It should be possible to expose services from inside the cluster to the Internet.</li>
<li>The API server should be reachable from inside the cluster&rsquo;s LAN as well as from inside my home LAN but not from the Internet.</li>
<li>It doesn&rsquo;t need to be highly available so running a single control-plane node is good enough as a start.</li>
</ul>
<p>From these goals I derived a couple of designations for each of the nodes on the cluster network:</p>
<ul>
<li>1 router for bridging the cluster network and my home LAN.</li>
<li>1 control-plane node for both etcd and the Kubernetes control-plane components.</li>
<li>3 worker nodes.</li>
<li>1 machine for providing storage to the cluster using NFS.</li>
</ul>
<h1 id="the-final-architecture">The Final Architecture</h1>
<figure >
    <img loading="lazy" src="/images/k8s-home-arch.png"
         alt="An architecture diagram showing the network layout of my home Kubernetes cluster and surrounding components"/> <figcaption style="text-align: center;">
            <p>The final architecture of my Kubernetes bare-metal cluster</p>
        </figcaption>
</figure>

<p>In the image above you see all the components that currently make up my home Kubernetes cluster. Everything in the 10.0.0.0/24 LAN is pretty standard with one node serving as control plane and 3 others serving as workers. All of the Kubernetes nodes are running an LTS Ubuntu version and are manually provisioned. I built some scripting around setting up default firewall rules, SSH access and a couple of other configuration items. Automating the node provisioning is still on my list. An additional node (running Debian, I don&rsquo;t recall why) has an SSD attached and serves it over NFS. More on that later.</p>
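<p>To give an idea of what that scripting covers, here&rsquo;s a minimal sketch; the host name, firewall rules and commands are illustrative, not my exact setup:</p>
<pre><code class="language-sh">#!/bin/sh
# Hypothetical excerpt of a node bootstrap script (illustrative only).
set -eu

HOST="$1" # e.g. rpi1.cluster.home.e13.dev

# Disable SSH password logins, allowing key-based auth only.
ssh "root@${HOST}" "sed -i 's/^#\?PasswordAuthentication.*/PasswordAuthentication no/' /etc/ssh/sshd_config &amp;&amp; systemctl reload ssh"

# Default firewall rules: SSH plus the API server and kubelet ports.
ssh "root@${HOST}" "ufw allow 22/tcp &amp;&amp; ufw allow 6443/tcp &amp;&amp; ufw allow 10250/tcp &amp;&amp; ufw --force enable"
</code></pre>
<p>Running this once per node gets a machine to a consistent baseline before joining it to the cluster.</p>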
<h2 id="kubernetes">Kubernetes</h2>
<p>As one of my goals was to learn Kubernetes the hard way (not Kelsey Hightower style, though), I used <a href="https://kubernetes.io/docs/setup/production-environment/tools/kubeadm/create-cluster-kubeadm/" target="_blank" rel="noopener">kubeadm</a> to get the cluster going and that&rsquo;s still the tool I use to maintain it, e.g. when upgrading the K8s version. The configuration doesn&rsquo;t deviate too much from kubeadm&rsquo;s defaults which is good enough for my needs.</p>
<p>Even though I&rsquo;m the only user of that cluster at the moment, I did want to make it &ldquo;tenant-aware&rdquo; in the sense that there&rsquo;s a rather simple way to manage users. In the beginning I just created certificates for each user manually but I moved on and now user management is offloaded to a Keycloak instance I&rsquo;m running on a hosted server. Configuring Kubernetes&rsquo; API server for OpenID Connect isn&rsquo;t extremely complicated but you need to figure out the right knobs. Here&rsquo;s an excerpt from my kubeadm configuration:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">apiVersion</span>: <span style="color:#ae81ff">v1</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">kind</span>: <span style="color:#ae81ff">ConfigMap</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">name</span>: <span style="color:#ae81ff">kubeadm-config</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">namespace</span>: <span style="color:#ae81ff">kube-system</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">data</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">ClusterConfiguration</span>: |<span style="color:#e6db74">
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">    apiServer:
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      certSANs:
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      - apiserver.cluster.home.e13.dev
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      extraArgs:</span>    
</span></span><span style="display:flex;"><span>[<span style="color:#ae81ff">...]</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">authorization-mode</span>: <span style="color:#ae81ff">Node,RBAC</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">oidc-client-id</span>: <span style="color:#ae81ff">k8s-apiserver</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">oidc-groups-claim</span>: <span style="color:#ae81ff">groups</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">oidc-issuer-url</span>: <span style="color:#ae81ff">https://##REDACTED##/realms/e13</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">oidc-username-claim</span>: <span style="color:#ae81ff">email</span>
</span></span><span style="display:flex;"><span>[<span style="color:#ae81ff">...]</span>
</span></span></code></pre></div><p>For client-side OIDC support I have installed the <a href="https://github.com/int128/kubelogin" target="_blank" rel="noopener">kubelogin kubectl plugin</a>. After setting this up I created some RoleBindings to grant the respective users/groups access to API resources (the RoleBinding manifests are all maintained in Git, more on that later).</p>
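<p>On the client side, the kubeconfig then delegates token retrieval to kubelogin through an exec plugin. A sketch matching the API server flags above (the user name is arbitrary and the issuer URL is redacted as before):</p>
<pre><code class="language-yaml">users:
- name: oidc-user
  user:
    exec:
      apiVersion: client.authentication.k8s.io/v1beta1
      command: kubectl
      args:
      - oidc-login
      - get-token
      - --oidc-issuer-url=https://##REDACTED##/realms/e13
      - --oidc-client-id=k8s-apiserver
</code></pre>
<p>On first use kubelogin opens a browser window for the Keycloak login and caches the resulting token locally.</p>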
<p>Upgrading to the latest Kubernetes version is probably the most tedious task at the moment as I haven&rsquo;t automated any of that so it&rsquo;s mostly following the <a href="https://kubernetes.io/docs/tasks/administer-cluster/kubeadm/kubeadm-upgrade/" target="_blank" rel="noopener">upgrade guide</a>.</p>
<h2 id="network">Network</h2>
<p>The 10.0.0.0/24 network is a simple switched network using a cheap tp-link 8-port gigabit switch. The other network, 10.11.12.0/24, is my home LAN for all devices that need Internet connectivity, the Playstation 4, Echo devices, smartphones and laptops. We have Ethernet outlets in each room of our house and a 24-port gigabit switch in the basement. For wireless connectivity I have several wifi APs running in the house that operate on the same network. A MikroTik hEX router together with a VDSL modem provides Internet access. It serves IP addresses for Ethernet and wifi devices, acts as router and DNS server. It provides <a href="https://wiki.mikrotik.com/wiki/Manual:IP/Cloud#DDNS" target="_blank" rel="noopener">DDNS capabilities</a> out of the box and I&rsquo;m using a DNS CNAME entry to get traffic from outside into the network. You&rsquo;ll see it in action when accessing <a href="https://home.e13.dev" target="_blank" rel="noopener">home.e13.dev</a> (nothing fancy there, though).</p>
<h3 id="traffic-out">Traffic Out</h3>
<p>As you can see in the architecture diagram above, another Raspi (&ldquo;rpi0&rdquo;; I&rsquo;m too lazy to come up with a fancy naming scheme, so all Raspis are just numbered) serves as router between the home LAN and the cluster LAN. It has two physical Ethernet interfaces (one provided through a USB-to-Ethernet adapter) and a MACVLAN interface. A pretty good explanation of the different virtual networking options you have on Linux is provided over at the <a href="https://developers.redhat.com/blog/2018/10/22/introduction-to-linux-interfaces-for-virtual-networking#macvlan" target="_blank" rel="noopener">Red Hat Developer portal</a>. Creating a MACVLAN interface with NetworkManager is pretty simple:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-sh" data-lang="sh"><span style="display:flex;"><span>$ nmcli c add ifname veth0 autoconnect yes save yes type macvlan dev eth1 mode bridge
</span></span><span style="display:flex;"><span>$ nmcli con modify macvlan-veth0 ipv4.dhcp-hostname <span style="color:#e6db74">&#34;rpi0-1&#34;</span>
</span></span></code></pre></div><p>I&rsquo;m not sure if there&rsquo;s a way to incorporate the second command into the first one but this is good enough for my needs. Now two of the interfaces are part of the home LAN (that provides Internet access) and the third one is part of the cluster LAN. The home LAN interfaces just use DHCP to get their IP configuration from the MikroTik router.</p>
<p>To the cluster LAN, rpi0 serves as DHCP and DNS server using the awesome <a href="https://thekelleys.org.uk/dnsmasq/doc.html" target="_blank" rel="noopener">dnsmasq</a>. Dnsmasq automatically serves the host it&rsquo;s running on as default route. The domain of all cluster nodes is set by dnsmasq using the <code>domain=cluster.home.e13.dev</code> parameter. Now to make rpi0 actually work as a NAT gateway for the cluster LAN hosts, the Linux firewall (aka iptables) needs to be properly configured. This was the hardest part for me as I&rsquo;m not at all proficient in iptables. I would rather defer to your favorite search engine for finding out how to do that instead of giving potentially wrong advice. Suffice it to say that my setup works (though it might not be the most efficient or secure).</p>
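<p>For illustration, the dnsmasq side of this looks roughly like the following; the interface name and DHCP range are examples, not my exact values:</p>
<pre><code class="language-sh"># /etc/dnsmasq.conf (sketch)
interface=eth0                # the cluster-LAN interface on rpi0
domain=cluster.home.e13.dev   # domain handed out to all cluster nodes
dhcp-range=10.0.0.50,10.0.0.99,12h
# dnsmasq advertises the host it runs on as default route and DNS
# server unless configured otherwise.
</code></pre>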
<h3 id="traffic-in">Traffic In</h3>
<p>Now the cluster nodes have Internet access through rpi0 but we also want to connect to services running in the cluster, e.g. a Grafana instance or any other web application deployed in Kubernetes. The usual way to expose a service in Kubernetes is to create a <code>LoadBalancer</code> type Service resource. If you&rsquo;re running Kubernetes on one of the major cloud providers this is all you need to do to get a public IP address or hostname assigned to the service. On bare metal, though, this is not the case. This is where <a href="https://metallb.universe.tf/" target="_blank" rel="noopener">MetalLB</a> enters the stage. Running in a cluster it takes care of assigning IP addresses and setting up the network layer of the nodes to direct traffic to those IP addresses to the right pods. On my cluster I&rsquo;m using the (simpler) <a href="https://metallb.universe.tf/concepts/layer2/" target="_blank" rel="noopener">Layer 2 mode</a> for advertising services and I set aside part of the 10.0.0.0/24 address space for MetalLB (excluded from dnsmasq&rsquo;s DHCP range).</p>
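<p>With MetalLB&rsquo;s current CRD-based configuration, reserving such a range looks roughly like this (the address range is an example; older MetalLB releases used a ConfigMap instead):</p>
<pre><code class="language-yaml">apiVersion: metallb.io/v1beta1
kind: IPAddressPool
metadata:
  name: default
  namespace: metallb-system
spec:
  addresses:
  - 10.0.0.200-10.0.0.220 # example range, kept out of dnsmasq's DHCP scope
---
apiVersion: metallb.io/v1beta1
kind: L2Advertisement
metadata:
  name: default
  namespace: metallb-system
spec:
  ipAddressPools:
  - default
</code></pre>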
<p>Next, traffic coming from outside of the cluster network needs to be proxied to each LoadBalancer IP address. For this to work I created my own little <a href="https://github.com/makkes/l4proxy" target="_blank" rel="noopener">transport layer proxy</a> configured simply through YAML files. It also ships with a <a href="https://github.com/makkes/l4proxy/tree/1a2ce6834f04bb2aa1f7f5c20e3609568fc2053c/service-announcer" target="_blank" rel="noopener">service-announcer tool</a> that generates l4proxy configuration files based on Kubernetes LoadBalancer-type Service resources it finds on the cluster. L4proxy then just binds to a configured interface and proxies the connections to one of the LoadBalancer services&rsquo; IP addresses.</p>
<p>L4proxy runs on both home LAN interfaces so that I can selectively forward traffic from either of the two home LAN interfaces on rpi0. Each of these interfaces has a specific purpose: one is only reachable from the home LAN (the one that has 10.11.12.32 assigned to it in the diagram above) so that I can constrain e.g. my smart home Grafana instance to LAN machines. The other one receives traffic from the MikroTik Internet router, which forwards all traffic directed at the DDNS domain to rpi0&rsquo;s interface (10.11.12.51 in the diagram).</p>
<p>Now that we have all the network shenanigans behind us we need to let Kubernetes know about the incoming traffic and where to direct it. As I said above MetalLB picks up LoadBalancer Services but there&rsquo;s no need to create those yourself when you&rsquo;re using an ingress controller. I opted for <a href="https://github.com/kubernetes/ingress-nginx" target="_blank" rel="noopener">ingress-nginx</a>, mainly for its simplicity. It creates a LoadBalancer service and directs traffic based on Ingress resources. You can read all about Ingresses in the wonderful <a href="https://kubernetes.io/docs/concepts/services-networking/ingress/" target="_blank" rel="noopener">Kubernetes documentation</a>.</p>
<h4 id="ingressclass-configuration-with-ingress-nginx">IngressClass configuration with ingress-nginx</h4>
<p>I have two instances of ingress-nginx running on the cluster, one for external traffic and one for internal traffic. Two different <a href="https://kubernetes.io/docs/concepts/services-networking/ingress/#ingress-class" target="_blank" rel="noopener">IngressClass resources</a>, &ldquo;ingress-nginx&rdquo; and &ldquo;ingress-nginx-internal&rdquo; let each Ingress choose whether it should be exposed internally or externally. This is what the Helm values look like for the internal ingress-nginx controller:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span>    <span style="color:#f92672">controller</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">electionID</span>: <span style="color:#ae81ff">ingress-controller-internal-leader</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">ingressClass</span>: <span style="color:#ae81ff">nginx-internal</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">ingressClassResource</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">name</span>: <span style="color:#ae81ff">internal-nginx</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">enabled</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">default</span>: <span style="color:#66d9ef">false</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">controllerValue</span>: <span style="color:#e6db74">&#34;k8s.io/internal-nginx&#34;</span>
</span></span></code></pre></div><p>One important thing I only figured out later on is that I needed to set the <code>electionID</code> parameters of each Helm release to a different value so that both instances don&rsquo;t conflict with each other for leader election.</p>
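<p>An application then picks the internal controller via the class name. A hypothetical Ingress for an internal Grafana instance could look like this (resource name and namespace are assumptions):</p>
<pre><code class="language-yaml">apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: grafana
  namespace: monitoring # assumed namespace
spec:
  ingressClassName: internal-nginx # matches ingressClassResource.name
  rules:
  - host: grafana.cluster.home.e13.dev
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: grafana # assumed Service name
            port:
              number: 80
</code></pre>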
<h3 id="dns">DNS</h3>
<p>There is actually one last thing left to do: resolve host names defined in the Ingress resources to either the IP address of the internally facing rpi0 interface or the publicly facing ISP-assigned IP address of the MikroTik router. For internal services I merely maintain a list of static DNS entries on the MikroTik router. Each internal service, e.g. <code>grafana.cluster.home.e13.dev</code> is backed by a CNAME entry in turn resolving to the internal rpi0 interface. By using a CNAME I don&rsquo;t have to change all DNS entries whenever that interface&rsquo;s IP address changes. For externally facing services I maintain DNS entries at my DNS provider. Those also are just CNAME entries resolving to the DDNS name of my MikroTik.</p>
<h2 id="storage">Storage</h2>
<p>I&rsquo;m running a couple of stateful applications on my cluster, e.g. Grafana and some internal applications backed by SQL databases. This state needs to be persisted somewhere. In my search for a simple yet production-ready solution I chose to bet on NFS because it is very simple to set up and PersistentVolume provisioning in Kubernetes is easy to get using the <a href="https://github.com/kubernetes-sigs/nfs-subdir-external-provisioner" target="_blank" rel="noopener">Kubernetes NFS Subdir External Provisioner</a>. The latter provides all resources to get going quickly. All my stateful data is backed by a PV provisioned from NFS at the moment. Before you do this on your own cluster, though, be aware of the following caveats:</p>
<ul>
<li>NFS is inherently insecure: It doesn&rsquo;t provide transit encryption of traffic or access control mechanisms. <a href="https://tldp.org/HOWTO/NFS-HOWTO/security.html" target="_blank" rel="noopener">This guide</a> by the Linux Documentation Project provides details on the security aspects of NFS.</li>
<li>I found that NFS-backed PVs respond pretty badly to unscheduled node restarts. When a node goes down unexpectedly the pods can&rsquo;t be automatically moved to another one because they are stuck in Terminating state until I restore the node. I haven&rsquo;t found a solution to this, yet.</li>
<li>When the NFS server goes down, NFS mounts on nodes might get stuck without any ability to restore them other than rebooting the node. I managed to mitigate this a little by instructing the provisioner to use soft mounts. Those have a couple of drawbacks, though, so you might want to understand the implications before doing that yourself.</li>
</ul>
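<p>With the provisioner deployed, stateful workloads just request storage through a regular PersistentVolumeClaim against its StorageClass; a sketch, assuming the class name <code>nfs-client</code> that the provisioner ships by default (claim name and size are hypothetical):</p>
<pre><code class="language-yaml">apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: grafana-data # hypothetical claim name
spec:
  storageClassName: nfs-client # default class name shipped by the provisioner
  accessModes:
  - ReadWriteMany # NFS allows shared read-write access
  resources:
    requests:
      storage: 5Gi
</code></pre>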
<p>I would never serve any serious production data from NFS shares but it&rsquo;s good enough for my home setup, especially since all the other solutions out there seem to require a lot more work to get set up and they consume more resources on the cluster nodes.</p>
<p>At the moment the NFS storage has no backup. I&rsquo;m manually creating DB backups of all the PostgreSQL databases from time to time but all other data might get lost once the NFS disk dies. This is something I still need to improve.</p>
<h2 id="day-2-operations-gitopsflux">Day 2 Operations: GitOps/Flux</h2>
<p>Given that the cluster setup is a little flaky, especially with only one control plane node, I wanted to operate it with the assumption that it might go down any day. (The disk <strong>will</strong> die some day!) This led me to store all the Kubernetes resources in Git and have <a href="https://fluxcd.io" target="_blank" rel="noopener">Flux</a> manage them for me. This way, I can easily restore all the applications from that Git repo in case I need to set up a new cluster.</p>
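<p>Restoring a fresh cluster then boils down to bootstrapping Flux against that repository, roughly like this (owner, repository and path are placeholders, not my actual setup):</p>
<pre><code class="language-sh"># Install the Flux controllers into the cluster and point them at the
# Git repository holding all Kubernetes manifests; placeholders only.
flux bootstrap github \
  --owner=&lt;github-user&gt; \
  --repository=&lt;fleet-repo&gt; \
  --path=clusters/home \
  --personal
</code></pre>
<p>From there Flux reconciles everything in the repository back onto the cluster.</p>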
<h1 id="takeaways">Takeaways</h1>
<p>I did learn an awful lot in the last couple of years operating this cluster. I had downtimes for the strangest reasons, I replaced the CNI provider once while the cluster was running, I lost data by <a href="https://hachyderm.io/@makkes/109301463748074424" target="_blank" rel="noopener">accidentally deleting a PV with a <code>Delete</code> ReclaimPolicy</a> and I probably forgot a couple of other issues I ran into (and very likely caused myself). As you can see from the list above, running your own Kubernetes cluster at home and using it for anything serious is a lot of upfront work. It also is a lot of regular maintenance work. You need to keep the OS on each node up-to-date, you need to update Kubernetes from time to time, replace dying nodes, restore data after disk failures. You&rsquo;ll occasionally be opening your browser only to see that your app is down for some strange reason.</p>
<p>For me that was the whole purpose of the exercise and it helps me improve in my day-to-day job as a Kubernetes engineer and Flux maintainer.</p>
]]></content:encoded></item><item><title>Running a Docker registry on Kubernetes (in kind)</title><link>https://e13.dev/blog/docker-registry-on-k8s/</link><pubDate>Fri, 06 Nov 2020 00:00:00 +0000</pubDate><guid>https://e13.dev/blog/docker-registry-on-k8s/</guid><description>In the last weeks I have been working a lot on supporting Kubernetes in air-gapped environments, i.e. environments that don&amp;rsquo;t have any access to the internet. Many companies prefer to run their IT infrastructure in such a way to minimize the attack vector against it and be able to tightly control what&amp;rsquo;s running on their clusters. Part of these setups naturally is a Docker registry that runs on that air-gapped infrastructure and in order to properly reproduce such a scenario, I had to run a Docker registry on my kind cluster as well and I thought sharing the manifests may help anyone out there get set up faster next time.</description><content:encoded><![CDATA[<p>In the last weeks I have been working a lot on supporting Kubernetes in air-gapped environments, i.e. environments that don&rsquo;t have any access to the internet. Many companies prefer to run their IT infrastructure in such a way to minimize the attack vector against it and be able to tightly control what&rsquo;s running on their clusters. Part of these setups naturally is a Docker registry that runs on that air-gapped infrastructure and in order to properly reproduce such a scenario, I had to run a Docker registry on my <a href="https://kind.sigs.k8s.io/" target="_blank" rel="noopener">kind</a> cluster as well and I thought sharing the manifests may help anyone out there get set up faster next time. Running a Docker registry may be even more important given the <a href="https://www.docker.com/blog/what-you-need-to-know-about-upcoming-docker-hub-rate-limiting/" target="_blank" rel="noopener">new position</a> that Docker Inc. has put us into.</p>
<h2 id="tldr-">TL;DR ⏳</h2>
<p>When trying to run a custom Docker registry on kind, you will face some obstacles: The registry has to be reachable from outside of the cluster (to push images) and from each cluster node (by kubelet). Plus, the CA certificate of the registry has to be advertised to each cluster node as well. <a href="#the-complete-rundown-">Jump down for the TL;DR steps</a>.</p>
<h2 id="getting-there-">Getting there 🚶</h2>
<p>My first idea was to just create a <code>Secret</code>, a <code>Deployment</code> and a <code>ClusterIP</code> <code>Service</code> exposing the deployment. To be able to push images to the running registry I just had to add <code>registry.registry.svc</code> to my <code>/etc/hosts</code> file with the address 127.0.0.1 and do a <code>kubectl -n registry port-forward svc/registry 1443</code>. From then on I was able to tag an image with the <code>registry.registry.svc:1443/</code> prefix and push it to the newly created registry. 🥳</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-sh" data-lang="sh"><span style="display:flex;"><span>$ docker tag nginx:1.19.4 registry.registry.svc:1443/nginx:1.19.4
</span></span><span style="display:flex;"><span>$ docker push registry.registry.svc:1443/nginx:1.19.4
</span></span><span style="display:flex;"><span>The push refers to repository <span style="color:#f92672">[</span>registry.registry.svc:1443/nginx<span style="color:#f92672">]</span>
</span></span><span style="display:flex;"><span>7b5417cae114: Layer already exists
</span></span><span style="display:flex;"><span>aee208b6ccfb: Layer already exists
</span></span><span style="display:flex;"><span>2f57e21e4365: Layer already exists
</span></span><span style="display:flex;"><span>2baf69a23d7a: Pushed
</span></span><span style="display:flex;"><span>d0fe97fa8b8c: Pushed
</span></span><span style="display:flex;"><span>1.19.4: digest: sha256:34f3f875e745861ff8a37552ed7eb4b673544d2c56c7cc58f9a9bec5b4b3530e size: <span style="color:#ae81ff">1362</span>
</span></span><span style="display:flex;"><span>$ k run nginx --image<span style="color:#f92672">=</span>registry.registry.svc:1443/nginx:1.19.4
</span></span><span style="display:flex;"><span>pod/nginx created
</span></span><span style="display:flex;"><span>$ k get pod nginx
</span></span><span style="display:flex;"><span>NAME    READY   STATUS         RESTARTS   AGE
</span></span><span style="display:flex;"><span>nginx   0/1     ErrImagePull   <span style="color:#ae81ff">0</span>          13s
</span></span></code></pre></div><p>Whoops, that didn&rsquo;t work so well: a pod referencing the image I just pushed to the internal registry has issues pulling it. Let&rsquo;s look at the details:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-sh" data-lang="sh"><span style="display:flex;"><span>$ k describe pod nginx
</span></span><span style="display:flex;"><span><span style="color:#f92672">[</span>...<span style="color:#f92672">]</span>
</span></span><span style="display:flex;"><span>Events:
</span></span><span style="display:flex;"><span>  Type     Reason     Age               From               Message
</span></span><span style="display:flex;"><span>  ----     ------     ----              ----               -------
</span></span><span style="display:flex;"><span>  Normal   Scheduled  16s               default-scheduler  Successfully assigned default/nginx to kind-control-plane
</span></span><span style="display:flex;"><span>  Normal   BackOff    15s               kubelet            Back-off pulling image <span style="color:#e6db74">&#34;registry.registry.svc:1443/nginx:1.19.4&#34;</span>
</span></span><span style="display:flex;"><span>  Warning  Failed     15s               kubelet            Error: ImagePullBackOff
</span></span><span style="display:flex;"><span>  Normal   Pulling    3s <span style="color:#f92672">(</span>x2 over 16s<span style="color:#f92672">)</span>  kubelet            Pulling image <span style="color:#e6db74">&#34;registry.registry.svc:1443/nginx:1.19.4&#34;</span>
</span></span><span style="display:flex;"><span>  Warning  Failed     3s <span style="color:#f92672">(</span>x2 over 16s<span style="color:#f92672">)</span>  kubelet            Failed to pull image <span style="color:#e6db74">&#34;registry.registry.svc:1443/nginx:1.19.4&#34;</span>: rpc error: code <span style="color:#f92672">=</span> Unknown desc <span style="color:#f92672">=</span> failed to pull and unpack image <span style="color:#e6db74">&#34;registry.registry.svc:1443/nginx:1.19.4&#34;</span>: failed to resolve reference <span style="color:#e6db74">&#34;registry.registry.svc:1443/nginx:1.19.4&#34;</span>: failed to <span style="color:#66d9ef">do</span> request: Head https://registry.registry.svc:1443/v2/nginx/manifests/1.19.4: dial tcp 127.0.0.1:1443: connect: connection refused
</span></span><span style="display:flex;"><span>  Warning  Failed     3s <span style="color:#f92672">(</span>x2 over 16s<span style="color:#f92672">)</span>  kubelet            Error: ErrImagePull
</span></span></code></pre></div><p>Look closely at the <code>From</code> column of the events. It&rsquo;s the kubelet service that&rsquo;s unable to pull the image, and when you think about it, that makes total sense: kubelet doesn&rsquo;t run inside the cluster but rather directly on each node. So somehow I needed to make the registry available to each node.</p>
<h2 id="trying-harder-">Trying Harder 💪</h2>
<p>Enter the <code>NodePort</code> service type, which makes a service available externally via the IP addresses of cluster nodes. This service type also helps us kill two birds with one stone: we can push images into the registry from outside the cluster as well as pull images from inside the cluster (i.e. by the kubelet). So I created a kind cluster exposing the service&rsquo;s port to the host using the <code>extraPortMappings</code> configuration option, changed <code>/etc/hosts</code> to let <code>kind-control-plane</code> point to 127.0.0.1 and changed the <code>ClusterIP</code> service to be a <code>NodePort</code> service:</p>
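<p>The Service change itself is small; a sketch of the <code>NodePort</code> variant (the selector label is an assumption about the registry Deployment in the manifest, and 5000 is the registry image&rsquo;s default listen port):</p>
<pre><code class="language-yaml">apiVersion: v1
kind: Service
metadata:
  name: registry
  namespace: registry
spec:
  type: NodePort
  selector:
    app: registry # assumed label on the registry Deployment
  ports:
  - port: 1443
    targetPort: 5000 # default listen port of the registry container
    nodePort: 30443  # matches the kind extraPortMappings
</code></pre>
<p>Creating the cluster with the port mapping and redeploying then looks like this:</p>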
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-sh" data-lang="sh"><span style="display:flex;"><span>$ kind create cluster --config<span style="color:#f92672">=</span>- <span style="color:#e6db74">&lt;&lt;EOF
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">kind: Cluster
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">apiVersion: kind.x-k8s.io/v1alpha4
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">nodes:
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">- role: control-plane
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">  extraPortMappings:
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">  - containerPort: 30443
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">    hostPort: 30443
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">    listenAddress: &#34;127.0.0.1&#34;
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">    protocol: tcp
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">EOF</span>
</span></span><span style="display:flex;"><span>Creating cluster <span style="color:#e6db74">&#34;kind&#34;</span> ...
</span></span><span style="display:flex;"><span><span style="color:#f92672">[</span>...<span style="color:#f92672">]</span>
</span></span><span style="display:flex;"><span>$ k create -f docker-registry.yaml
</span></span><span style="display:flex;"><span>namespace/registry created
</span></span><span style="display:flex;"><span>secret/registry created
</span></span><span style="display:flex;"><span>deployment.apps/registry created
</span></span><span style="display:flex;"><span>service/registry created
</span></span><span style="display:flex;"><span>$ docker push kind-control-plane:30443/nginx:1.19.4
</span></span><span style="display:flex;"><span><span style="color:#f92672">[</span>...<span style="color:#f92672">]</span>
</span></span><span style="display:flex;"><span>$ k run nginx --image<span style="color:#f92672">=</span>kind-control-plane:30443/nginx:1.19.4
</span></span><span style="display:flex;"><span>pod/nginx created
</span></span><span style="display:flex;"><span>$ k describe pod nginx
</span></span><span style="display:flex;"><span><span style="color:#f92672">[</span>...<span style="color:#f92672">]</span>
</span></span><span style="display:flex;"><span>  Normal   Pulling    7s <span style="color:#f92672">(</span>x3 over 66s<span style="color:#f92672">)</span>   kubelet            Pulling image <span style="color:#e6db74">&#34;kind-control-plane:30443/nginx:1.19.4&#34;</span>
</span></span><span style="display:flex;"><span>  Warning  Failed     7s <span style="color:#f92672">(</span>x3 over 50s<span style="color:#f92672">)</span>   kubelet            Error: ErrImagePull
</span></span><span style="display:flex;"><span>  Warning  Failed     7s <span style="color:#f92672">(</span>x2 over 38s<span style="color:#f92672">)</span>   kubelet            Failed to pull image <span style="color:#e6db74">&#34;kind-control-plane:30443/nginx:1.19.4&#34;</span>: rpc error: code <span style="color:#f92672">=</span> Unknown desc <span style="color:#f92672">=</span> failed to pull and unpack image <span style="color:#e6db74">&#34;kind-control-plane:30443/nginx:1.19.4&#34;</span>: failed to resolve reference <span style="color:#e6db74">&#34;kind-control-plane:30443/nginx:1.19.4&#34;</span>: failed to <span style="color:#66d9ef">do</span> request: Head https://kind-control-plane:30443/v2/nginx/manifests/1.19.4: x509: certificate signed by unknown authority
</span></span></code></pre></div><p>Oh well, that was to be expected. I created a self-signed certificate to back the registry&rsquo;s HTTPS transport, so now I had to make kubelet aware of the CA certificate.</p>
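<p>For reference, such a self-signed certificate can be created in one go with openssl; the file names are examples, <code>-addext</code> requires OpenSSL 1.1.1 or newer, and the subjectAltName must cover every host name used to reach the registry:</p>
<pre><code class="language-sh"># Self-signed certificate with SANs matching the registry host names.
openssl req -x509 -newkey rsa:4096 -nodes -days 365 \
  -keyout /tmp/tls.key -out /tmp/tls.crt \
  -subj "/CN=kind-control-plane" \
  -addext "subjectAltName=DNS:kind-control-plane,DNS:registry.registry.svc"
</code></pre>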
<h2 id="the-last-step-">The last step 🏁</h2>
<p>To make kubelet (or rather containerd) aware of the new CA certificate, I had to copy it into the Docker container that&rsquo;s running the cluster node (this is a single-node cluster, after all):</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-sh" data-lang="sh"><span style="display:flex;"><span>$ docker cp /tmp/tls.crt kind-control-plane:/usr/local/share/ca-certificates/
</span></span><span style="display:flex;"><span>$ docker exec -t kind-control-plane update-ca-certificates
</span></span><span style="display:flex;"><span>Updating certificates in /etc/ssl/certs...
</span></span><span style="display:flex;"><span><span style="color:#ae81ff">1</span> added, <span style="color:#ae81ff">0</span> removed; <span style="color:#66d9ef">done</span>.
</span></span><span style="display:flex;"><span>Running hooks in /etc/ca-certificates/update.d...
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">done</span>.
</span></span><span style="display:flex;"><span>$ k run nginx --image<span style="color:#f92672">=</span>kind-control-plane:30443/nginx:1.19.4
</span></span><span style="display:flex;"><span>pod/nginx created
</span></span><span style="display:flex;"><span>$ k get pod nginx -w
</span></span><span style="display:flex;"><span>NAME    READY   STATUS              RESTARTS   AGE
</span></span><span style="display:flex;"><span>nginx   0/1     ContainerCreating   <span style="color:#ae81ff">0</span>          0s
</span></span><span style="display:flex;"><span>nginx   1/1     Running             <span style="color:#ae81ff">0</span>          2s
</span></span></code></pre></div><p>Et voilà! The table is set. As an improvement over keeping the CA certificate file lying around in my filesystem, I simply extracted it from the <code>Secret</code> in the cluster.</p>
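<p>For the record, getting the certificate out of that <code>Secret</code> is a one-liner. This is just the extraction step on its own, assuming the <code>Secret</code> is named <code>registry</code> and lives in the <code>registry</code> namespace (as in the manifest used here):</p>
<pre tabindex="0"><code>$ k -n registry get secret registry -o jsonpath=&#39;{.data.tls\.crt}&#39; | base64 -d &gt; /tmp/tls.crt
</code></pre>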
<h2 id="the-complete-rundown-">The Complete Rundown 🏎</h2>
<ol>
<li>
<p>Download the <a href="/downloads/docker-registry.yaml">Docker registry manifest</a></p>
</li>
<li>
<p>Install the registry and configure the cluster node:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-sh" data-lang="sh"><span style="display:flex;"><span>$ kind create cluster --config<span style="color:#f92672">=</span>- <span style="color:#e6db74">&lt;&lt;EOF
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">kind: Cluster
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">apiVersion: kind.x-k8s.io/v1alpha4
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">nodes:
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">- role: control-plane
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">  extraPortMappings:
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">  - containerPort: 30443
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">    hostPort: 30443
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">    listenAddress: &#34;127.0.0.1&#34;
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">    protocol: tcp
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">EOF</span>
</span></span><span style="display:flex;"><span>$ k create -f docker-registry.yaml
</span></span><span style="display:flex;"><span>$ k -n registry get secret registry -o jsonpath<span style="color:#f92672">=</span><span style="color:#e6db74">&#39;{.data.tls\.crt}&#39;</span>|base64 -d|docker exec -i kind-control-plane sh -c <span style="color:#e6db74">&#34;cat - &gt; /usr/local/share/ca-certificates/registry-ca.crt &amp;&amp; update-ca-certificates &amp;&amp; systemctl restart containerd.service&#34;</span>
</span></span></code></pre></div></li>
<li>
<p>Make the service reachable under the node&rsquo;s name (the <code>grep</code> makes sure we&rsquo;re not adding a second entry):</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-sh" data-lang="sh"><span style="display:flex;"><span>grep -E <span style="color:#e6db74">&#39; kind-control-plane( |$)&#39;</span> /etc/hosts <span style="color:#f92672">||</span> echo <span style="color:#e6db74">&#39;127.0.0.1 kind-control-plane&#39;</span> | sudo tee -a /etc/hosts
</span></span></code></pre></div></li>
<li>
<p>Push an image and create a test pod:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-sh" data-lang="sh"><span style="display:flex;"><span>$ docker pull nginx:1.19.4
</span></span><span style="display:flex;"><span>$ docker tag nginx:1.19.4 kind-control-plane:30443/nginx:1.19.4
</span></span><span style="display:flex;"><span>$ docker push kind-control-plane:30443/nginx:1.19.4
</span></span><span style="display:flex;"><span>$ k run nginx --image<span style="color:#f92672">=</span>kind-control-plane:30443/nginx:1.19.4
</span></span><span style="display:flex;"><span>$ k get pod nginx -w
</span></span></code></pre></div></li>
</ol>
]]></content:encoded></item><item><title>Ansible delegation madness: delegate_to and variable substitution</title><link>https://e13.dev/blog/ansible-delegation-madness/</link><pubDate>Fri, 19 Jul 2019 00:00:00 +0000</pubDate><guid>https://e13.dev/blog/ansible-delegation-madness/</guid><description>Today I spent several hours tracking down a bug in one of our playbooks where a variable would be substituted with the wrong value if the task was delegated.</description><content:encoded><![CDATA[<p>This is going to be a short piece, but I really want to share it because 1)
I simply have to talk about it! It cost me several hours today to get a grip on this, and 2) I
couldn&rsquo;t find any explanation of this Ansible behaviour on Stack Overflow or
anywhere else (I have since <a href="https://stackoverflow.com/questions/57116025/ansible-variable-substitution-in-combination-with-task-delegation/" target="_blank" rel="noopener">posted it on
SO</a>
to make sure it&rsquo;s there now). By the way, I was reminded today that it can save
you several hours of bug hunting, experimenting and general hair-tearing if you
just know to <a href="https://stackoverflow.com/questions/31912748/how-to-run-a-particular-task-on-specific-host-in-ansible/31912973" target="_blank" rel="noopener">ask. the right. question.</a></p>
<p>Here at <a href="https://mesosphere.io" target="_blank" rel="noopener">Mesosphere</a> (and especially in the Cluster Ops
team I&rsquo;m in) we use Ansible a lot for spinning up, tearing down
and maintaining clusters. We build tools that make all operations on
DC/OS (and other) clusters insanely easy. Since I joined the company only recently,
coming from an application developer background and having mostly developed tools
in Go here, I&rsquo;m not the most proficient Ansible user on this planet. What I
had to achieve today was to run some tasks on all of the cluster&rsquo;s nodes and
some tasks on one special node only. What I came up with looked a bit like this:</p>
<pre tabindex="0"><code>01 - hosts: all
02  name: Test Play
03  gather_facts: false
04
05  tasks:
06      - name: Create output directory
07        tempfile:
08            state: directory
09            suffix: diag
10        register: output_dir
11
12      - name: Create API resources directory
13        file:
14            path: &#34;{{ output_dir.path }}/api-resources&#34;
15            state: directory
16        delegate_to: &#34;{{groups[&#39;control-plane&#39;][0]}}&#34;
17        run_once: yes
18        register: api_resources_dir
</code></pre><p>The intent of this playbook was to create a temporary directory on every node
(for storing some command output), and on one and only one host this temporary
directory was supposed to contain a directory named <code>api-resources</code>. When I ran the
playbook, though, that host ended up with two temporary directories, one of
which had the same name as the temporary directory on another host (and that
other host was, surprisingly (or not), the one that delegated the task).</p>
<h1 id="what-happened-here">What happened here?</h1>
<p>Turns out, the expression <code>{{ output_dir.path }}</code> in the second task is
evaluated <em>before</em> the task is delegated to the other node, i.e. with the
value that was registered on the delegating host. The delegated node therefore
creates the <code>api-resources</code> directory inside a different directory than
the one it created itself in the first task.</p>
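<p>For completeness: if a task really has to be delegated, the way to reference the value registered on the <em>delegate</em> is to go through <code>hostvars</code> explicitly. This is just a sketch of that variant (untested):</p>
<pre tabindex="0"><code>- name: Create API resources directory
  file:
      path: &#34;{{ hostvars[groups[&#39;control-plane&#39;][0]].output_dir.path }}/api-resources&#34;
      state: directory
  delegate_to: &#34;{{ groups[&#39;control-plane&#39;][0] }}&#34;
  run_once: yes
  register: api_resources_dir
</code></pre>
<p>This way the path comes from the facts of the control-plane host itself, regardless of which host the task happens to be templated for.</p>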
<h1 id="whats-the-correct-way-to-do-this">What&rsquo;s the correct way to do this?</h1>
<p>The correct way is to first figure out what you&rsquo;re doing wrong and why. That
took me 90% of the time today. It&rsquo;s probably just a matter of Ansible experience,
and of not blindly applying a pattern (<code>delegate_to</code>) you&rsquo;ve seen
elsewhere. Interestingly enough, I figured out the correct question only after I
had found the answer to my problem: &ldquo;How do I run a task on one specific node?&rdquo; But
as long as you think <code>delegate_to</code> is the right solution, you never even arrive at
asking that question (again).</p>
<p>There&rsquo;s this nice thing called <code>when</code> in Ansible that comes in handy here.
Here&rsquo;s the corrected playbook:</p>
<pre tabindex="0"><code> 1	- hosts: all
 2	  name: Test Play
 3	  gather_facts: false
 4
 5	  tasks:
 6	      - name: Create output directory
 7	        tempfile:
 8	            state: directory
 9	            suffix: diag
10	        register: output_dir
11
12	      - name: Create API resources directory
13	        file:
14	            path: &#34;{{ output_dir.path }}/api-resources&#34;
15	            state: directory
16	        when: inventory_hostname == groups[&#39;control-plane&#39;][0]
17	        register: api_resources_dir
</code></pre><p>Nice and slick. I hope this post will save someone a bad day.</p>
<p>Have a great one!</p>
]]></content:encoded></item><item><title>O&#39;Reilly Software Architecture Conference: My ping from London</title><link>https://e13.dev/blog/oreilly-sac18/</link><pubDate>Sun, 04 Nov 2018 00:00:00 +0000</pubDate><guid>https://e13.dev/blog/oreilly-sac18/</guid><description>I attended the conference this October in London and I&#39;m sharing my two cents here.</description><content:encoded><![CDATA[<p>I attended O&rsquo;Reilly&rsquo;s <a href="https://conferences.oreilly.com/software-architecture/sa-eu" target="_blank" rel="noopener">Software Architecture Conference in
London</a> this
October and I thought I&rsquo;d share my personal wrap-up of the most striking talks
I&rsquo;ve heard there. So buckle up for a tiny race through three days of talks and
workshops:</p>
<p><a href="https://twitter.com/sarahjwells" target="_blank" rel="noopener">sarahjwells</a> from the Financial Times gave
great advice on how to fight code rot in your microservice architecture:
<strong>Consider building overnight to fight code rot and keep services live and
healthy</strong>. This is great advice since there may be services in your environment
that you probably won&rsquo;t touch for a few months, and if you don&rsquo;t constantly
keep them building, a developer who has to fix a bug in one of those services will
first have a hard time untangling outdated dependencies.</p>
<p>I especially enjoyed <a href="https://twitter.com/lizrice" target="_blank" rel="noopener">lizrice&rsquo;s</a> keynote on
container security: <strong>Scan your container images for security vulnerabilities</strong>
and consider using <code>seccomp</code> in your containers.</p>
<p><a href="https://twitter.com/crichardson" target="_blank" rel="noopener">crichardson</a> simply stated: <strong>Microservices
shall not be the goal, that&rsquo;s an anti-pattern</strong>. Yeah, probably for those of you who
didn&rsquo;t grasp that already.</p>
<p>I also attended <a href="https://twitter.com/allenholub" target="_blank" rel="noopener">allenholub&rsquo;s</a> talk on
choreographing microservices (in contrast to orchestrating them). Especially
enjoyable was his opinion on delivery: <strong>I deploy the most simple implementation
and if nobody complains I&rsquo;m done</strong>. So true on so many levels, especially in an
enterprise environment like the one I work in.</p>
<p><a href="https://twitter.com/nikhilbarthwal" target="_blank" rel="noopener">nikhilbarthwal</a> shed some light on
real-world FaaS. My insight from his talk: <strong>FaaS instances are auto-scaled but
your DB probably isn&rsquo;t</strong>. As I followed the Twitter stream, though, his
opinions were very passionately discussed and disputed. I liked his balanced plea
for a hybrid world of FaaS and &ldquo;old-school&rdquo; microservices.</p>
<p><a href="https://twitter.com/stilkov" target="_blank" rel="noopener">stilkov</a> presented the most common types of
software architects; the one that stuck with me most is the <strong>Disillusioned
Architect</strong> who abstracts everything away. Stefan pointed to the term
&lsquo;Architecture Astronauts&rsquo; coined by Joel Spolsky.</p>
<p><a href="https://twitter.com/mikebroberts" target="_blank" rel="noopener">mikebroberts&rsquo;</a> keynote was especially
enlightening when he talked about the <strong>four levels of adopting serverless</strong>:</p>
<ol>
<li>
<p>Serverless operations (env. reporting, Lambda as shell scripts, Slack bots,
deployment automation)</p>
</li>
<li>
<p>Cron jobs, Serverless offline tasks</p>
</li>
<li>
<p>Serverless activities (message processing, isolated microservices)</p>
</li>
<li>
<p>Serverless ecosystems (websites, web applications, serverless data pipelines)</p>
</li>
</ol>
<p>Really great!</p>
<p>Less technical career advice for architects was given by JetBrains&rsquo;
<a href="https://twitter.com/trisha_gee" target="_blank" rel="noopener">trisha_gee</a>: <strong>Everyone is an architect these
days</strong>, <strong>ask questions and then LISTEN to the answers</strong>, <strong>be open to changing
your mind</strong>, <strong>do pair programming not only with developers but also with, say, a
business analyst</strong>, <strong>answer Stack Overflow questions</strong>. The last one is&hellip;
so&hellip; great. Set aside some time for your team to be constantly active on Stack
Overflow; it will change your attitude towards people and technologies and
you will learn A LOT!</p>
<p>Pivotal&rsquo;s <a href="https://twitter.com/cdavisafc" target="_blank" rel="noopener">cdavisafc</a> talked about getting rid
of the request-response paradigm in your software architecture. The punch line:
<strong>There&rsquo;s a major difference between old-style messaging (aka ESB) and event
logs like Kafka (e.g. no queues, event log as single source of truth, loosely
coupled data): The former is anti-agile while the latter is agile.</strong></p>
<p>Thanks, O&rsquo;Reilly, for getting all those people (and me) to London. Perhaps we&rsquo;ll
see each other again next year.</p>
]]></content:encoded></item></channel></rss>