This talk will provide motivation for the extensive instrumentation of complex computer systems and make the argument that it is an essential component of such systems. It will offer practical starting points for Erlang projects while keeping the human organization around the computer system in view. Brian will focus on getting started with instrumentation in a systematic way, then take up the challenge of interpreting and acting on metrics emitted from a production system without overwhelming operators’ ability to control the system or prioritize its faults. He’ll use historical examples and case studies from his own work to keep the talk anchored in the practical.
Talk objectives:
Brian hopes to convince the audience of two things:
* that monitoring and instrumentation are essential components of any long-lived system, and
* that it's not so hard to get started, after all.
He’ll keep a clear-eyed view of what works and what is difficult in practice so that the audience can make a reasoned decision after the talk.
Target audience:
This talk would appeal to engineers with long-running production deployments, operations folks, and Erlangers in general.
Brian is a seven-year Erlang hacker who got into the language at university; his interests run to the large-scale distributed side of things. He has worked with Erlang professionally for four years, currently at AdRoll, where he's a developer on the RTB (real-time bidding) team, and previously at Rackspace, where he was a developer on the FireEngine project (previously discussed at Erlang Factory 2012). Twitter: @bltroutwine GitHub: blt