Last week we were fortunate enough to attend NDC London, which featured some of the industry’s biggest names, including Troy Hunt, Jon Skeet and Scott Hanselman. Rather than give a brief summary of each talk we attended, we’ll discuss the areas we found most interesting and those that could benefit the company in the future.
You build it, you run it. Why developers should also be on-call (Chris O’Dell)
A great talk on why developers should be on call and take ownership of their own code.
As developers, we tend to treat deployment as the end of the continuous delivery pipeline. But why should the feedback loop stop there? We get feedback at every step in the pipeline, but once code has shipped we get very little beyond metrics, and only if we build those ourselves and listen for what we think we need to listen for. Sometimes the only way to know what is happening on the other side is through metrics, using something like the ELK Stack. These real-time metrics let us drive rapid, precise and granular iterations, an approach sometimes referred to as Metrics Driven Development. The idea of developers being on-call is an evolution of this.
If we spot something unusual in a given metric, it could be an early indication that something is wrong. What if the service goes down or no longer functions as it should, affecting the consumer? Who knows best what changes were made and how the underlying functionality is written? The developer who worked on it, right?
“When things are broken, we want people with the best context trying to fix things.” – Blake Scrivener, Netflix SRE Manager
Involving the right people as early as possible leads to faster action: the developer who made the changes is likely to locate the source of a failure much sooner than a developer who has never seen the code.
So why should we own our own code? Ownership gives us much greater freedom of tool choice and allows us to innovate. We can become experts in supporting our existing tooling, and we should own our product for its lifetime. If we own the products we build and things do go wrong, wouldn’t we want to know as soon as possible? If we are called out of hours, we can take compensating actions such as turning off a feature toggle or redeploying a known good version. What is important here, though, is not to investigate the root cause during an incident. Get the system or application stable and functioning again, and hold a blameless post-mortem shortly afterwards. Mitigating fixes can then be prioritized to ensure you never fail the same way twice.
As on-call developers we should not be expected to sit and watch logs; instead, some sort of alerting tool should be in place. Alerts should be informative and actionable, and should only fire for issues worth waking somebody up for. These calls and alerts could arrive at 3am!
Reasonable SLAs for responding to alerts should be agreed, along with sensible escalation procedures to support the developer. On-call rotas should also be agreed, being mindful of the additional time and pressure that being on-call brings, and everything possible should be done to prevent burnout. Ensure the team is large enough to allow for rest, sickness and holidays. Limit the number of consecutive days a developer can be on call. This is all very easy on paper, but what if the development team isn’t large enough, or only one or two developers worked on the product?
Rewarding on-call duties may lead to developers wanting to be on-call for months on end because they are saving up for a holiday, adding to the risk of burnout. Incentives are likely to be the most difficult thing to agree within the team, but they should not be overlooked.
Some major companies, including Amazon, Netflix and Google, already have developers on-call. The idea is fast becoming an industry standard, but is being empowered to own and innovate a bad thing? We should take pride in our services, without destroying our team.
Web Apps can’t really do *that*, can they? (Steve Sanderson)
Another informative talk, this time on browser features that we may not know exist or are not currently taking advantage of. One concept covered in this talk was the use of Service Workers.
A Service Worker’s fetch event listener is fired for every HTTP request made by the web page, which lets us change what happens in terms of network access. This is how we allow our web pages to be viewed offline. On initial page load we can cache all resources. When subsequent HTTP requests are made and the fetch event listener is fired, we can instruct the browser to return resources from the cache rather than the server when there is no network connection.
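As a rough sketch of the offline behaviour described above (not code from the talk), the network-then-cache decision could look something like this in TypeScript. `SimpleCache`, `MemoryCache`, `Fetcher` and `networkWithOfflineFallback` are hypothetical names that stand in for the browser's Cache API and `fetch`, with plain strings as response bodies, so the logic can run outside a Service Worker.

```typescript
// Stand-ins for the browser's Cache API and fetch, using plain strings as response bodies.
interface SimpleCache {
  match(url: string): Promise<string | undefined>;
  put(url: string, body: string): Promise<void>;
}
type Fetcher = (url: string) => Promise<string>;

// In-memory cache, for illustration only.
class MemoryCache implements SimpleCache {
  private readonly entries = new Map<string, string>();
  async match(url: string): Promise<string | undefined> {
    return this.entries.get(url);
  }
  async put(url: string, body: string): Promise<void> {
    this.entries.set(url, body);
  }
}

// Try the network first; if the request fails (for example, when offline),
// fall back to whatever was cached on an earlier visit.
async function networkWithOfflineFallback(
  url: string,
  fetcher: Fetcher,
  cache: SimpleCache,
): Promise<string> {
  try {
    const fresh = await fetcher(url);
    await cache.put(url, fresh); // keep the cached copy up to date
    return fresh;
  } catch (networkError) {
    const cached = await cache.match(url);
    if (cached !== undefined) {
      return cached;
    }
    throw networkError; // nothing cached, so surface the original failure
  }
}
```

In a real Service Worker, a function like this would be called from the fetch event listener and its result passed to `event.respondWith(...)`, using the browser's own `caches` and `fetch` instead of the stand-ins.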
Service Workers can also help with slow connections. We can try to fetch from the network, but if the response takes longer than a specified maximum wait time we can serve from the cache instead. We can still wait for the original network response to complete and, once it does, use it to programmatically update the cached content. This gives the benefit of immediate page loads whilst still getting updated content, eventually. The purpose of this talk was not necessarily to demonstrate the only way to handle viewing web pages offline, but it did show how powerful Service Workers are, as they give you the ability to control how your page interacts with the network.
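The slow-connection idea can be sketched as a race between the network and a timer. This is only an illustration under assumptions: `respondWithinBudget` and its callback parameters are hypothetical names, and plain strings stand in for real responses so the example runs outside a browser.

```typescript
// Outcome of racing the network against the maximum wait time.
type Outcome = { kind: "network"; value: string } | { kind: "timeout" };

async function respondWithinBudget(
  fetchFromNetwork: () => Promise<string>,
  readFromCache: () => Promise<string | undefined>,
  writeToCache: (body: string) => Promise<void>,
  maxWaitMs: number,
): Promise<string> {
  // Start the network request; whenever it completes, refresh the cached copy.
  const network: Promise<string> = fetchFromNetwork().then((fresh) =>
    writeToCache(fresh).then(() => fresh),
  );
  const networkOutcome = network.then(
    (value): Outcome => ({ kind: "network", value }),
  );

  // Resolves after maxWaitMs; the timer is cleared once the network settles.
  const timeout = new Promise<Outcome>((resolve) => {
    const timer = setTimeout(() => resolve({ kind: "timeout" }), maxWaitMs);
    const stop = () => clearTimeout(timer);
    network.then(stop, stop);
  });

  let outcome: Outcome;
  try {
    outcome = await Promise.race([networkOutcome, timeout]);
  } catch (_networkError) {
    outcome = { kind: "timeout" }; // an early network failure is handled like a timeout
  }
  if (outcome.kind === "network") {
    return outcome.value;
  }

  const cached = await readFromCache();
  // With nothing cached, the only option left is to keep waiting for the network.
  return cached !== undefined ? cached : network;
}
```

The cached copy is served as soon as the wait time is exceeded, while the original request carries on in the background and updates the cache when it finishes.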
Another use case for Service Workers is push notifications. Push notifications can be sent from the backend of a web application to a user’s browser. This would fire an event inside the registered Service Worker even if the user is not on the web page.
Synchronisation can be used too: when a user edits a page while offline, the Service Worker can send the updates back to the server once the user’s connection is restored. Service Workers will also be capable of running scheduled tasks in the future, but none of the browsers have implemented this yet. Service Workers will soon become a universal feature of all popular browsers.
Composite UIs: The Microservices Last Mile (Jimmy Bogard)
An interesting talk about compositional UIs when working with microservices. Although this is not how we are currently structured, it did highlight the issues people face when deciding where to compose the UI elements of a microservice architecture. A typical application has a client side, a server side and a database, and we can compose at any one of these levels. Composing a user-friendly interface can be difficult when there are multiple backend services behind the scenes.
To compose on the client side you must first decide on the front end framework to be used (Angular, React, Vue, etc.). These frameworks are usually built around components, which can be bundled using webpack. The individual packages then communicate with the backend services through their own APIs. One thing highlighted was to avoid cross-component chatter when designing, as this is a sign that your boundaries are wrong and your components are not autonomous. A service should not rely on other services to operate. Client side composition works well for independent widgets, but less so when multiple backend services are required to produce a single widget. That case may be better solved using server side composition.
With server side composition you don’t have to worry about shifting your entire development team over to a new front end framework, but it tends to be a little messier. It almost always uses the MVC pattern, so we still need to decide at which of these levels to perform the composition: the model, the view or the controller. As the controller should typically not handle logic, we can exclude it immediately.
Model side composition involves getting all the information for one widget. Data may come from multiple services in the backend but is gathered in a single view model. We should build resilience in case one service goes down to ensure the rest of the requests can be made for the other data and not prevent the whole request from resolving. For example, if a single service goes down we should be able to simply not render this section on the page but still show the rest of the page content.
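A minimal sketch of that resilience, assuming each backend service is wrapped in a hypothetical `SectionSource` with a `load` function: every service is called in parallel, and a failure only removes that service's section from the resulting view model. The names and the string payloads are illustrative, not taken from the talk.

```typescript
// One backend service's contribution to the page (names are hypothetical).
interface SectionSource {
  name: string;
  load: () => Promise<string>;
}

interface PageViewModel {
  sections: Record<string, string>; // data from the services that responded
  unavailable: string[];            // sections to leave out when rendering
}

// Call every service in parallel; a failing service only loses its own section.
async function composeViewModel(sources: SectionSource[]): Promise<PageViewModel> {
  const results = await Promise.all(
    sources.map(async (source): Promise<[string, string | undefined]> => {
      try {
        return [source.name, await source.load()];
      } catch (_error) {
        return [source.name, undefined];
      }
    }),
  );

  const viewModel: PageViewModel = { sections: {}, unavailable: [] };
  for (const [name, data] of results) {
    if (data === undefined) {
      viewModel.unavailable.push(name);
    } else {
      viewModel.sections[name] = data;
    }
  }
  return viewModel;
}
```

The view can then skip anything listed in `unavailable` and still render the rest of the page.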
We could implement a view model appender to append information to a dynamic view model if the request context matches. This view model can then be passed to a view, which need not know where the information came from or which services were called to get the data. No framework exists to do this for you, so it must be built yourself.
View side composition takes advantage of ASP.NET Core’s view components. They encapsulate a mini-request with an invocation, a model and a view. They are typically invoked from within a view, usually a layout, and can include parameters and business logic.
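To make the appender idea a little more concrete, here is a hedged sketch in TypeScript rather than the C# the talk was based on. The `ViewModelAppender` interface, `buildViewModel` and the two example appenders are assumptions made for illustration, with hard-coded data in place of real service calls.

```typescript
interface RequestContext {
  path: string;
  params: Record<string, string>;
}
type DynamicViewModel = Record<string, unknown>;

// Each backend service would supply its own appender (hypothetical interface).
interface ViewModelAppender {
  matches(context: RequestContext): boolean;
  append(context: RequestContext, viewModel: DynamicViewModel): Promise<void>;
}

// Run every appender that matches the request against one shared view model.
async function buildViewModel(
  context: RequestContext,
  appenders: ViewModelAppender[],
): Promise<DynamicViewModel> {
  const viewModel: DynamicViewModel = {};
  const matching = appenders.filter((appender) => appender.matches(context));
  await Promise.all(matching.map((appender) => appender.append(context, viewModel)));
  return viewModel;
}

// Two illustrative appenders with hard-coded data instead of real service calls.
const productDetails: ViewModelAppender = {
  matches: (context) => context.path === "/product",
  append: async (context, viewModel) => {
    viewModel["productName"] = `Product ${context.params["id"]}`;
  },
};
const accountSummary: ViewModelAppender = {
  matches: (context) => context.path === "/account",
  append: async (_context, viewModel) => {
    viewModel["accountName"] = "Example account";
  },
};
```

A request for `/product` would only pick up what `productDetails` appends, and the view receiving the result has no knowledge of which appenders ran.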
Finally, we can compose at the database level. The problem with composing at this level is that you may have data coming from multiple backend services, leading to duplicate data. If a service doesn’t own the data and treats it as immutable, it only needs to know when the data changes so it can update its own copy. Storing this data in something such as Elasticsearch, effectively acting as a data cache, raises questions about how often the data needs to be updated. Some data could be updated daily, but other data may need updating immediately. A messaging queue could be useful here, with database triggers sending messages to the queue when data is updated.
Choosing the right compositional model should be based on what is best for the user experience and will largely be application specific. Only compose if it is absolutely necessary as the front end can become its own service boundary.
Overall I thought the conference was very good, and I was fortunate to have been given the opportunity to attend. Some of the talks were extremely informative, whilst others gave a small insight into the possibilities of the subject. I came away with a few areas I would like to investigate further, and I hope that in the future we can learn from these and maybe implement some of them here at tombola. Troy Hunt’s talks were delivered like a true professional, and it is no surprise he was invited to give a speech at Congress.
Web Apps can’t really do *that*, can they? (Steve Sanderson)
Running Blazor on Mono in the browser – http://blog.stevensanderson.com/2017/11/05/blazor-on-mono/
Nginx for .NET Developers (Ian Cooper)
Thursday started off with Ian Cooper’s talk titled Nginx for .NET Developers. It was a talk I was excited to hear, as .NET Core does not necessarily have to use IIS as its default web server. Ian started by describing how Kestrel, the default web server for ASP.NET Core apps, was not suitable as an edge server: although it will do HTTP processing, it lacks several features, including compression and URL rewriting. This is where Nginx would sit in front of Kestrel, providing the features it is lacking. At the core of Nginx there is a configuration file made up of simple and block directives. An example is the return directive, which is used to define rewrite rules. Although Nginx can be used instead of IIS, there are considerable differences between the two. Nginx is always doing I/O tasks, so it is not an application container, and it does not host your code like IIS does. Nginx will receive a request and either serve a static file from disk or forward the request to another endpoint.
Testing in Production (Gel Goldsby)
The final talk I’ll review was by Gel Goldsby, titled Testing in Production. Gel works for an online advertising company called Unruly, where they decided that getting features into production as quickly as possible was their number one priority, even if it meant increasing the likelihood of shipping bugs. Unruly’s developers got together and decided that context switching was the main issue they faced while working through a sprint. They found that a new piece of code took too long to make its way through development, staging and finally production. More often than not, if that code caused an issue in production, the developer who had carried out the work had already moved on to another task.
In order to speed up the deployment process Gel’s team implemented a number of optimisations, a couple of which made me feel a bit uncomfortable. The first was abandoning branches and committing straight to master. Gel claimed the use of branches caused confusion when committing and deploying, which to be honest I didn’t fully understand, as at Tombola we’ve never really had an issue with branches. Secondly, Unruly removed the need to deploy to development and staging in order to get code to production. Finally, they moved all of their tests to run in production.
To facilitate these significant changes to their deployment pipeline, the emphasis on code quality had to be as high as possible. Every commit was written by a pair, so at least two developers had written and reviewed the code before merging. If a feature was seen to be high risk, Gel’s team would use mob programming. Finally, each commit had to be a single unit of work.
Once the code made it to production it sat behind a feature toggle and could be released to a subset of users. Gel also mentioned the need to have thorough metrics on display throughout the office so any issues could be identified straight away. With all of these optimisations Unruly were able to push to production in five minutes and only had to roll back on average three times a month. I believe that this model of testing in production may well work for certain business sectors, as Unruly have shown.
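Releasing a toggled feature to a subset of users can be done in many ways, and the talk did not go into Unruly's implementation. As one possible sketch, a percentage rollout could hash each user id into a stable bucket. `FeatureToggle`, `bucketFor` and `isFeatureOn` are hypothetical names, and the hash is a simple illustrative one rather than anything production-grade.

```typescript
interface FeatureToggle {
  enabled: boolean;       // master switch: turning this off hides the feature for everyone
  rolloutPercent: number; // 0 to 100: share of users who should see the feature
}

// Map a user id to a stable bucket from 0 to 99 with a simple string hash.
function bucketFor(userId: string): number {
  let hash = 0;
  for (let i = 0; i < userId.length; i++) {
    hash = (hash * 31 + userId.charCodeAt(i)) % 1000003;
  }
  return hash % 100;
}

// A user sees the feature when the toggle is on and their bucket falls inside the rollout.
function isFeatureOn(toggle: FeatureToggle, userId: string): boolean {
  return toggle.enabled && bucketFor(userId) < toggle.rolloutPercent;
}
```

Because the bucket depends only on the user id, a given user gets a consistent experience as the percentage is increased, and setting `enabled` to false acts as the quick way back if the metrics show a problem.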
Overall I enjoyed the conference, with Steve Sanderson’s talk being the standout. Interestingly, it was the lesser known speakers who I found delivered the more informative talks. Nginx and other HTTP/reverse proxy servers are definitely something I will investigate further, and I look forward to seeing how Blazor evolves over the next year.