I recently published a post about an upcoming migration for Crash Catch: the backend rewrite is complete and will be released along with some website improvements. More information about what to expect can be found in my previous blog post titled "Crash Catch Backend Rewrite Release".
That post mentioned that the client libraries (the libraries you add to your projects to send crashes and errors to Crash Catch) will need to be updated.
The reason for this is that the new backend accepts crash submissions as JSON POST content, whereas originally they were form-url-encoded.
This means your projects will need to be updated to maintain compatibility with Crash Catch. But don't worry, you don't need to update straight away: I've maintained backwards compatibility. To explain how, let's have a bit of a deep dive into the previous backend and the new one.
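To make the format change concrete, here is a minimal sketch of the difference between the two submission bodies. The field names (`ProjectID`, `ExceptionType`, `Severity`) are hypothetical examples, not the actual Crash Catch payload schema:

```python
import json
from urllib.parse import parse_qs

# Old clients: application/x-www-form-urlencoded body.
# (Field names here are illustrative, not the real Crash Catch schema.)
legacy_body = "ProjectID=abc123&ExceptionType=NullReferenceException&Severity=High"

# Parse the form-encoded string into a flat dictionary.
fields = {key: values[0] for key, values in parse_qs(legacy_body).items()}

# New clients: the same data sent as an application/json body.
json_body = json.dumps(fields)
```

The data itself is unchanged; only the encoding of the POST body and the Content-Type header differ.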
Previous Data Flow and Architecture
The previous version of the Crash Catch data processing engine (what I call "the engine") is responsible for the crash submission API, processing crash queues, sending email alerts, clearing old data and so on. This engine was originally written in C++ and received HTTP requests in form-url-encoded format, processed the crash and added it to an internal FIFO (First In, First Out) queue (I fixed that as well, which I'll talk about briefly later on) to then be processed and stored in the database. Below is a diagram showing the previous data flow:
New Data Flow and Architecture
The new backend is written in C# .NET Core, which makes building and releasing a lot simpler compared to C++, and it uses the built-in API framework instead of the custom-built HTTP handler I had within the C++ engine. The .NET Core API prefers JSON, as do most APIs nowadays, hence the requirement for the client libraries to be upgraded.
To maintain backward compatibility, I created a small Python script that receives the data from the Crash Catch client and checks the HTTP request. This is what I refer to as the API proxy. If the Content-Type is application/json, it simply forwards the request to the engine and returns the response back to the client. If the Content-Type is form-url-encoded, it adapts the HTTP POST data into a JSON object, updates the HTTP headers accordingly and then forwards the request to the engine. The new data flow is shown below:
This API proxy is only a temporary measure to remain backwards compatible, giving users a chance to update their projects to the latest version.
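The proxy's adaptation step can be sketched as a single function. This is an illustrative simplification, not the actual proxy code; the function name and the flat key-to-value mapping are my assumptions:

```python
import json
from urllib.parse import parse_qs

def adapt_request(content_type, body):
    """Sketch of the API proxy's decision logic (illustrative, not the real code).

    JSON requests pass through untouched; form-url-encoded bodies are
    converted to a JSON object and the Content-Type is rewritten before
    the request is forwarded to the engine.
    """
    if content_type == "application/json":
        # Already the format the new engine expects: forward as-is.
        return content_type, body
    # Legacy client: convert the form fields into a flat JSON object.
    fields = {key: values[0] for key, values in parse_qs(body).items()}
    return "application/json", json.dumps(fields)
```

The returned content type and body would then be forwarded to the engine, and the engine's response relayed back to the client unchanged.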
I'll make the new versions of the libraries available once the new release has been installed and will send out a notification email when they are available.
I don't have any timeframes as yet, but the plan is to remove this API proxy at some point in the not-too-distant future. I want to see how it goes first before I set a deadline and cause potential problems for existing projects.
FIFO Queue Changes
As mentioned above, and shown in the diagrams, part of the engine rewrite was changing the crash processing queue. The reason for the queue is to ensure the API responds as quickly as possible: we don't want any delay in processing and storing the crash to hold up the response, and thereby put that delay into your own projects, apps and services. The idea is that the API receives the crash, performs some basic validation, submits it to the queue and then returns an HTTP response - hopefully HTTP 200 OK to say the crash was received; anything else means the crash submission failed. A separate thread within the engine then polls the FIFO queue to do the nitty-gritty grunt work of processing the crash and storing it in the database.
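The validate-enqueue-respond pattern described above can be sketched as follows. The engine is C#, but a short Python sketch shows the shape of it; the handler name, validation rule and status codes are illustrative assumptions:

```python
import queue
import threading

crash_queue = queue.Queue()  # in-memory FIFO, as in the original engine design
processed = []               # stand-in for the database

def submit_crash(crash):
    """Illustrative API handler: validate, enqueue, respond immediately."""
    if "ProjectID" not in crash:   # basic validation only, kept deliberately cheap
        return 400                 # submission failed
    crash_queue.put(crash)         # FIFO enqueue; the slow work happens later
    return 200                     # client gets its response straight away

def worker():
    """Separate thread does the grunt work of processing and storing crashes."""
    while True:
        crash = crash_queue.get()
        if crash is None:          # sentinel used to stop the worker
            break
        processed.append(crash)    # stand-in for "store in the database"
        crash_queue.task_done()

threading.Thread(target=worker, daemon=True).start()
```

The key property is that `submit_crash` never waits on the database, so any processing delay stays inside the engine rather than in the caller's application.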
Although this never happened, I realised this design had a fundamental flaw: if the engine itself crashed while several crashes were queued up in the internal queue, all of those crashes would be lost, which I didn't like.
Instead, the crash comes into the engine and goes through the same validation check, but rather than being added to an in-memory queue within the engine itself, all of the details are passed over to Redis. Again, a separate thread within the engine watches for new crashes in the Redis queue, pulls them off and processes them. In this scenario, if the engine crashes, when it starts back up it will still have access to the submitted crashes in the queue and can continue processing them.
Of course this is not completely foolproof. Redis runs on the same local server as the engine, so if Redis, or the server itself, crashes or unexpectedly restarts, the engine will still lose the crashes that were in the queue. But I think this is less likely, and I don't believe there's a way of avoiding it completely, apart from maybe a replicated Redis setup, but I think that's a tad unnecessary - at least at this stage.
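The durable-queue pattern described above typically uses Redis list commands (`LPUSH` on the API side, `RPOP`/`BRPOP` on the worker side, via a client such as redis-py). The sketch below is my illustration of that pattern, not the engine's actual code; it uses a tiny in-memory stand-in for Redis so the example stays self-contained, and the queue key name is made up:

```python
import json

class FakeRedis:
    """Minimal in-memory stand-in for the two Redis list commands used here.
    A real implementation would use redis-py: r = redis.Redis(); r.lpush(...); r.rpop(...)."""
    def __init__(self):
        self.lists = {}
    def lpush(self, key, value):
        self.lists.setdefault(key, []).insert(0, value)  # push to the head
    def rpop(self, key):
        items = self.lists.get(key)
        return items.pop() if items else None            # pop from the tail

r = FakeRedis()
QUEUE_KEY = "crash_queue"  # hypothetical key name

def enqueue_crash(crash):
    # API side: serialise the crash and push it onto the Redis list,
    # then return to the client immediately.
    r.lpush(QUEUE_KEY, json.dumps(crash))

def process_next():
    # Worker side: LPUSH + RPOP gives FIFO ordering. If the engine
    # restarts, unprocessed items survive in Redis and are picked up here.
    raw = r.rpop(QUEUE_KEY)
    return json.loads(raw) if raw else None
```

Because the queue lives outside the engine process, an engine crash no longer takes the pending crashes down with it, which is exactly the failure mode the rewrite was addressing.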
So, there you have it: a little bit of background on the engine rewrite and what to expect when Crash Catch is released later this month. If you have any questions or comments, please let me know, and as always, if you have any feedback about Crash Catch, good or bad, again please let me know.