I have wanted to go to NDC for years now, and it’s amazing that tombola sent me (along with a colleague) down to London to be a part of it this year.

There was lots to take in! We saw 18 talks in 3 days, along with a boat ride and a party. With 8 tracks you can easily fill an entire day with a specific discipline should you wish!

Rather than just an overview of NDC or a summary of every talk, I’ve decided to pick a few specific talks and explore them in detail…

Life with actors

Erlend Wiig & Vagif Abilov

This talk was given by two gents from Norway, Erlend and Vagif (Vagif being a contractor originally from Russia), and it was the first talk I have ever attended that included a song. The topic was their experience building and running an application built with Akka.net. I wouldn’t have been interested in watching another ‘how to get started with actors’ or ‘why you should use Akka.net’ talk; instead they shared some really valuable insights: the issues they encountered and how they solved them.

To set the scene: NRK is a Norwegian broadcasting company whose competitors are the Netflixes of the world. The application they talked about in particular was a streaming media application. Previously they had struggled to achieve parallelism using threads, often leading to complicated flows which were difficult to maintain and even harder to debug. During programme download they would buffer the stream in parallel in an attempt to speed up the process, but to achieve that with threads they had to keep state somewhere (which thread was processing which chunk, and so on), so there was a lot of shared state. When redesigning the application they decided to address these problems in two ways:

  1. Adopt a functional language (F#)
    The type system helped them reason about their domain much more easily. As well as F# being more expressive, they could make certain assumptions about their problem space which a language like C# (currently…) doesn’t inherently support.
  2. The actor model for concurrency (Akka.net)
    They chose to adopt the actor model primarily for its approach to concurrency, which fits well in their problem space.

Tell() vs Ask():

Actors communicate via message passing. In order to give an actor work, you must send it a message. Each actor has its own mailbox, which no other actor can manipulate. There are several ways to tell an actor what to do.

In Akka, you can call Tell() on an actor reference, which sends a one-way message to that actor; Tell() doesn’t wait for a response.
Then there is Ask(). Ask() sends a message to the actor and waits for a reply. This can be useful in many scenarios, but it isn’t very performant.
Actors process messages one at a time; this is important. While an actor is waiting for the response to an Ask() call, it can’t process any more messages, so you are essentially giving up your parallelism by using Ask(). Even worse, if a child actor fails to respond at all, you have an Ask() task which will never complete.
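To make the difference concrete, here is a minimal mailbox-style actor sketched in Python rather than Akka.net (my own illustration; only the names tell/ask mirror the Akka methods): tell just enqueues a message and returns, while ask enqueues it and blocks the caller until a reply arrives.

```python
import queue
import threading

class Actor:
    """A toy actor: one mailbox, one worker thread, one message at a time."""
    def __init__(self, handler):
        self.mailbox = queue.Queue()
        self.handler = handler
        threading.Thread(target=self._run, daemon=True).start()

    def _run(self):
        while True:
            msg, reply = self.mailbox.get()
            result = self.handler(msg)
            if reply is not None:
                reply.put(result)          # only ask() supplies a reply channel

    def tell(self, msg):
        """One-way: enqueue the message and return immediately."""
        self.mailbox.put((msg, None))

    def ask(self, msg, timeout=1.0):
        """Two-way: enqueue, then block the caller until a reply arrives."""
        reply = queue.Queue(maxsize=1)
        self.mailbox.put((msg, reply))
        # A silent actor would leave us waiting here until the timeout.
        return reply.get(timeout=timeout)

doubler = Actor(lambda n: n * 2)
doubler.tell(1)            # fire-and-forget, returns at once
print(doubler.ask(21))     # 42; the caller sat blocked until it arrived
```

If the code inside a handler itself did a blocking ask, the whole mailbox would back up behind it, which is exactly the throughput problem described above.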

So what’s the solution?

Part of the problem is the way we tend to look at problems in terms of the request/response pattern. It has become natural for many of us developers to reason about a problem in this manner: a request is made, and whatever made that request waits for a response. But there are many ways to think about the problem; RPC is only one of them. Take a step back from computing for a second and look at the world around you: you will notice that most information exchange in real life isn’t based on the request/response pattern. Generally speaking, then, request/response isn’t well suited to message-based solutions.

We can instead choose to think about it in a tell-only manner. The speakers found a solution using Become() (C#) or mutually recursive functions (F#) to create an actor state machine.

I’ve implemented it myself in F# to give you an idea. Rather than dump the code in the post, I’ve put it on pastebin; you can check it out here http://pastebin.com/eF39QUXv if you’re interested.
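To show the shape of the idea outside F#, here is a rough Python sketch of a Become()-style state machine (my own illustration, not code from the talk): the actor never blocks waiting for a reply, it just swaps its message handler and keeps draining its mailbox, so every interaction stays a one-way tell.

```python
import queue
import threading
import time

class Downloader:
    """Toy actor that flips between 'idle' and 'downloading' behaviours."""
    def __init__(self):
        self.mailbox = queue.Queue()
        self.handler = self.idle              # current behaviour
        self.log = []
        threading.Thread(target=self._run, daemon=True).start()

    def _run(self):
        while True:
            self.handler(self.mailbox.get())  # one message at a time

    def tell(self, msg):
        self.mailbox.put(msg)                 # every interaction is one-way

    def become(self, handler):
        self.handler = handler                # swap behaviour; no blocking

    def idle(self, msg):
        if msg[0] == "start":
            self.log.append(f"downloading {msg[1]}")
            self.become(self.downloading)
        else:
            self.log.append(f"ignored {msg[0]} while idle")

    def downloading(self, msg):
        if msg[0] == "chunk":
            self.log.append(f"buffered chunk {msg[1]}")
        elif msg[0] == "done":
            self.log.append("finished")
            self.become(self.idle)

d = Downloader()
d.tell(("start", "programme-1"))
d.tell(("chunk", 0))
d.tell(("chunk", 1))
d.tell(("done",))
time.sleep(0.2)                               # let the mailbox drain
print(d.log)
```

The state transitions replace the waiting: instead of asking a child and blocking, the actor becomes a behaviour that knows how to handle whatever comes back.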

The Akka Song

A talk is, first and foremost, entertainment; how much we take in is directly affected by how engaging it is. Vagif understands this, so he decided to summarise the entire talk as a song, which he sang while playing the keyboard to the audience. It was a bit of fun, and probably one of the reasons why I’m talking about it right now.

Avoiding micro service mega-disasters

Jimmy Bogard | (@jbogard)

Jimmy is the author of the AutoMapper and MediatR .NET libraries, and he also has a really good blog on Lostechies which I’ve always been a fan of. He’s one of those people who is very good at explaining things. With that, along with a timely topic, I had to go and see Jimmy’s talk!

Jimmy told us a story from a company he previously worked for who had a microservices nightmare. He courteously hid their identity and instead referred to them as Bell.
Bell.com had been undergoing a rewrite in which they chose to adopt a microservices architecture, loosely following Netflix’s approach.

To me; to you

After 18 months of development everything was ready to be switched on; they were ready to go live. Once they hit the button, they found the entire site completely unresponsive. Nothing loaded, just a white page.

This was because they had failed to adhere to any strict rules about service dependencies; fundamentally, that a service should only be able to call one other service, and that that service should not call any further. So you can imagine an architecture similar to a death star diagram, but where the circle is completely full of network calls: every service has multiple dependencies, each calling the others. A request comes in, which calls a service, which calls another service, and another, and on it goes. In the end this led to incredibly slow requests.

Own your SLA or your SLA will own you.

It’s not unreasonable to expect a service to have a 99.99% SLA, which seems pretty good. But let’s think about this a little. Say we have a microservices architecture with 20 services. Each service has an SLA of 99.99% uptime, which works out at about 9 seconds of downtime per day. But we have 20… let’s do the maths…

99.99%^20 ≈ 99.8%
99.8% ≈ 2m 53s downtime per day.
1 day = 1 million requests (let’s say)
99.8% ≈ 2,000 failed requests per day
Remember, this is with good uptime! I’d imagine it’s going to be worse in reality.

Back to the original point: what if we have many service calls depending on each other to fulfil a single request, up to 5 in the chain, let’s say? It gets worse, much worse.

99.99%^20 ≈ 99.8%
99.8%^5 ≈ 99.0%
99.0% ≈ 14m 24s downtime per day.
99.0% ≈ 10,000 failed requests per day (of 1 million)

And this is most certainly being generous. Depending on the load, if one service becomes latent, the whole request lingers in the pipeline, which can choke the entire system.
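The back-of-envelope figures fall straight out of compounding the availabilities; here is a quick script to sanity-check them (20 services, a 5-deep chain and 1 million requests per day are the talk’s illustrative assumptions, and the rounding may differ slightly from the figures quoted):

```python
# 20 services at 99.99% each, one million requests per day,
# and a request chain 5 services deep.
per_service = 0.9999
seconds_per_day = 24 * 60 * 60
requests_per_day = 1_000_000

system = per_service ** 20                 # the whole system of 20 services
m, s = divmod((1 - system) * seconds_per_day, 60)
print(f"system availability: {system:.2%}")            # ~99.80%
print(f"downtime: {m:.0f}m {s:.0f}s per day")
print(f"failed requests: {(1 - system) * requests_per_day:,.0f} per day")

chain = system ** 5                        # a request hopping through 5 services
print(f"5-deep chain: {chain:.2%}")                    # ~99.00%
print(f"failed requests: {(1 - chain) * requests_per_day:,.0f} per day")
```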

Microservices do not mean high availability; you need to build fault tolerance in from the beginning and be strict with it throughout. Just because a missed semicolon no longer crashes the whole system doesn’t mean you’re off the hook: errors still propagate, only now they do it at the network level.

Segregated development

Another problem Bell had was that these services were built and tested independently of the rest of the system, so full integration never occurred until the end stages. You might expect this to be fine when you have strict contracts between services, but integration unearths problems which don’t exist during development, most notably performance.

Bell defined a rule that no service call could take more than ~150ms to complete. This seemed fine, until they ran a request through all of the integrated services and all of that latency added up.
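As a hypothetical worked example (the chain depths here are mine, not Bell’s real topology), a per-call budget compounds once per hop when calls are made in series:

```python
# Each service call may take up to ~150 ms; a serial chain of calls
# pays that budget at every hop before any response comes back.
BUDGET_MS = 150
for depth in (1, 3, 5):
    print(f"{depth}-deep chain: up to {depth * BUDGET_MS} ms per request")
```

So a rule that sounds fine for one service quietly permits multi-second requests once services start calling each other.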

The solution

It was clear they needed a way to fix this, and without another 18 months of development. Jimmy (and a team) decided, at least temporarily, to invert the core services: to push the data into a database/store up front rather than calling the services on every request. This works particularly well for services whose data doesn’t change often.


ASP.NET Core – Real world patterns and pitfalls

Damian Edwards and David Fowler | @DamianEdwards , @davidfowl

This was the first session I attended, and it was a barrage of .net core insights from a couple of its core creators. Damian is a program manager on the ASP.NET team; David is a core architect on the same team, and together they created SignalR. Naturally there were many talks around .net core this year, but I was really looking forward to getting it straight from the horse’s mouth. I knew this was going to be the most valuable .net core talk of the conference.

This talk wasn’t a ‘what is new in .net core’ or ‘this is how to build apps in .net core’ session. Most of the scenarios the guys ran us through were of the ‘don’t ever do this… ever. Do this instead!’ variety, which instantly makes it much more valuable to me. Tutorial-style talks at a conference are quite throwaway, I find; I much prefer a talk based around a problem and how it was solved.

I’ll briefly talk about a few of the points that I found valuable, maybe it will interest you enough to delve deeper.


TestServer() – Microsoft.AspNetCore.TestHost package

TestServer lets you host your entire application in memory and fire HTTP requests at it straight from a test, with no real server or network involved; a few lines of setup gets you full integration testing.

Makes testing an API much easier than .net 4.5, doesn’t it? This is possible because of the separation of a host and a server. A host is responsible for application startup and lifetime management, whereas the server is responsible for handling HTTP requests. .net core doesn’t tightly couple these concepts like .net 4.5 does, which makes mocking HTTP and context a doddle.
Interestingly, Entity Framework (if you’re using that) provides its own testing solution by allowing you to simulate a DB in memory, so together with the TestHost package you can get end-to-end testing.


Configuration reload

As of .net core 1.1, there is an optional flag on the IConfigurationBuilder.AddJsonFile extension method called reloadOnChange which, if true, reloads the configuration when the file changes at runtime. Handy!


Dependency Injection

This was quite a big portion of the talk; the guys demonstrated various ways in which the built-in IoC container can be abused, with lots of dos and don’ts:

  1. Don’t do async work in your DI factories. They should be kept fast and synchronous, else you risk deadlocks.
  2. Disposable transient services are captured by the container for disposal, which can lead to a memory leak. It’s a pretty bad one I think; it’s not obvious why the container would do this, so a good one to remember! (Note: I’d imagine they will fix it at some point, but as of v1.1 this issue exists.)
  3. Service location – avoid manually calling GetService where possible; the container does a much better job when dependencies are resolved through constructor injection.
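The service-location point translates to any language; here is a hypothetical Python sketch (nothing to do with ASP.NET Core’s actual container, and all the names are mine) contrasting service location with constructor injection:

```python
class EmailSender:
    """A dependency our service needs."""
    def send(self, to, body):
        return f"sent {body!r} to {to}"

# A toy container mapping types to instances.
container = {EmailSender: EmailSender()}

class SignupViaLocator:
    """Anti-pattern: digs its dependency out of the container itself,
    hiding what it needs and coupling the class to the container."""
    def signup(self, email):
        sender = container[EmailSender]        # service location
        return sender.send(email, "welcome")

class SignupViaInjection:
    """Preferred: the dependency is declared up front, so the container
    (or a test double) can supply it."""
    def __init__(self, sender: EmailSender):
        self.sender = sender

    def signup(self, email):
        return self.sender.send(email, "welcome")

print(SignupViaInjection(EmailSender()).signup("a@example.com"))
```

The injected version makes its dependencies visible in its signature and trivially swappable in tests, which is the behaviour a container’s constructor injection gives you for free.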

Conclusion

To summarise the conference: it was a great experience! Chin-wagging with many industry experts and innovators (certainly within the Microsoft and JavaScript platforms anyway), and the learning will indeed prove useful to the team. I can’t help but be glad I don’t live in London, however; the North East is much nicer 🙂