This week’s articles:
- Cloudflare Blog: A Byzantine failure in the real world
- Dropbox Tech Blog: Rewriting the heart of our sync engine
- Jaana Dogan: Why is metric collection still a hard problem in 2020?
- Ronak Nathani: What I wish I knew about incident management

Upcoming Events:
- 10 Dec 2020: Papers We Love SF: Jon Moroney on “An Empirical Analysis of Email Delivery Security”
It’s been a while since I’ve written a weekly wrap, but I want to get back into the habit of highlighting articles I’ve really enjoyed reading.
- rachelbythebay.com: Type in the exact number of machines to proceed. (H/T to SREWeekly)
- idontplaydarts.com: Detecting the use of “curl | bash” server side
- USENIX OSDI 2020: A large scale analysis of hundreds of in-memory cache clusters at Twitter
- Fastly: Can QUIC match TCP’s computational efficiency?

Upcoming Events:
- 19 Nov 2020: Papers We Love SF: David Murray on “Time, Clocks, and the Ordering of Events”
- 25 Nov 2020: Deploy RIPE Atlas probes
News:
- Google talks about their Canary Analysis Service (CAS) here
- Baron Schwartz on what observability is here
- Microservices 101 here
- GitHub publishes their incident report from their recent 1.35 Tbps DDoS attack here

Events:
- March 15th - BayLISA - Learning from the Fire Department: Experiences with Incident Command for IT
- March 21st - Performance Engineering Meetup @ LinkedIn
- March 22nd - Docker Birthday #5
- March 29th - Papers We Love (San Francisco)
- April 4th - San Francisco Meetups Meetup
- April 30th - May 4th - Interop ITX

Paper of the Week:
- On Designing and Deploying Internet-Scale Services here

RFC of the Week:
- RFC 7540: Hypertext Transfer Protocol Version 2 (HTTP/2)
Hi everyone, welcome to this week’s wrap. I am now including a list of meetups and events that may be of interest to readers. These are all in the Bay Area.

News:
- The Evolution of Distributed Systems Management here
- A brilliant guide from Red Hat on container lingo here
- GitHub survives the largest DDoS attack ever here
- How to manage feature flag tech debt here

Events:
- March 6th - Big Data Meetup @ LinkedIn - Tuning Spark & Hadoop Jobs with Dr. Elephant
- March 7th - San Francisco Metrics Meetup
- March 15th - BayLISA - Learning from the Fire Department: Experiences with Incident Command for IT
- March 21st - Performance Engineering Meetup @ LinkedIn
- March 22nd - Docker Birthday #5
- March 29th - Papers We Love (San Francisco)
- April 4th - San Francisco Meetups Meetup

Paper of the Week:
- B4: Experience with a Globally-Deployed Software Defined WAN here

RFC of the Week:
- RFC 8203: BGP Administrative Shutdown Communication
Welcome to this week’s wrap.
- ThousandEyes wrote a nice post on their new ‘Network Intelligence’ product and showed it in action during the Dow Jones drop here
- A great look at how software engineering is more than just writing code here
- A good introduction to Prometheus here
- A great piece on why engineering managers should be on call here
- Mike Julian from Monitoring Weekly has started a new project called ‘Your Next Hire’, a site about hiring engineers, here

Have a great week!
Welcome to this week’s wrap.
- Facebook wrote an excellent post about planning, scaling and load-testing their live video services for New Year’s Eve (here)
- A good article that cuts through the hype around observability and gives a solid analysis of the space (here)
- Real-time streaming ETL with Oracle transactional data (here)
- Building your own CDN for fun and profit (here)
- Nginx now supports HTTP/2 server push (here)
- A great write-up of useful DevOps resources (here)
Welcome to this week’s wrap, unfortunately a bit late due to a busy week.
- Fonseca: “An empirical study on the correctness of formally verified distributed systems”
- Erin Atwater: Netsim is a simulator game intended to teach you the basics of how computer networks function, with an emphasis on security. You will learn how to perform attacks that real hackers use, and see how they work in our simulator!
- Brandon Rhodes: Tutorial on Sphinx from PyCon
- Adrian Colyer: The Morning Paper on operability
- Ethan Banks: Slides from Interop ITX: The Future of Networking
- Sachin Malhotra: How we fine-tuned HAProxy to achieve 2M concurrent SSL connections
- Argo from Cloudflare
Some of my colleagues have mentioned that I share some really good articles on LinkedIn, so I thought I would try a weekly post wrapping up the best things I read. This first one goes out on a Tuesday because of the Memorial Day holiday.
- HighScalability: “The Always On Architecture - Moving Beyond Legacy Disaster Recovery”
- Bilgin Ibryam: “It Takes More Than a Circuit Breaker to Create a Resilient Application”
- Ben Treynor, Mike Dahlin, Vivek Rau, Betsy Beyer: “The Calculus of Service Availability”
- Manas Gupta: “Monitorama 2017: My Impressions”
- Yuval Bachar: “Taking Open19 from Concept to Industry Standard”
- Nick Babich: “4 Ways to Use Functional Animation in UI Design”
- Geoff Huston: “BBR TCP”
- Lisa N Roach: “Exploring Network Programmability with Python & Yang”
- Bruno Connelly & Bhaskaran Devaraj: “Building the SRE Culture at LinkedIn”

See you next week!