SmartScan outage due to failed OCR service update.
Incident Report for Visma Cloud Services
Postmortem

The outage happened when updating the Golang dependencies of our OCR service.

The failure was in a part of the code that casts the Google-generated TextAnnotation protobuf to a version of TextAnnotation that we maintain ourselves, so that we can add normalized bounding boxes after calculating them. Casting between the two types broke because of the many changes the Go protobuf APIs are going through: the newer versions of the OCR service's Go dependencies now follow the new Go protobuf API published by Google (https://blog.golang.org/protobuf-apiv2).
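The cast itself is not shown in this report. Messages generated with the newer protobuf API typically carry unexported internal fields, so a hand-maintained struct that mirrors an older generated layout can stop matching it. The sketch below shows one less fragile alternative: copy fields explicitly and compute normalized vertices by dividing pixel coordinates by the image width and height. It assumes simplified stand-in types. `Vertex`, `BoundingPoly`, `NormalizedVertex`, `LocalBoundingPoly` and `toLocal` are hypothetical names, not the real generated code or our service's code.

```go
package main

import "fmt"

// Hypothetical stand-ins for the Google-generated types. The real generated
// structs also carry unexported protobuf bookkeeping fields, which is why
// casting them to a hand-maintained copy is fragile.
type Vertex struct {
	X, Y int32
}

type BoundingPoly struct {
	Vertices []Vertex
}

// Hypothetical locally maintained types that add normalized coordinates.
type NormalizedVertex struct {
	X, Y float64
}

type LocalBoundingPoly struct {
	Vertices           []Vertex
	NormalizedVertices []NormalizedVertex
}

// toLocal copies fields explicitly instead of casting, and derives normalized
// coordinates (0 to 1, relative to the image size) from the pixel vertices.
func toLocal(src BoundingPoly, width, height int) (LocalBoundingPoly, error) {
	if width <= 0 || height <= 0 {
		return LocalBoundingPoly{}, fmt.Errorf("invalid image size %dx%d", width, height)
	}
	dst := LocalBoundingPoly{
		Vertices:           make([]Vertex, len(src.Vertices)),
		NormalizedVertices: make([]NormalizedVertex, len(src.Vertices)),
	}
	copy(dst.Vertices, src.Vertices)
	for i, v := range src.Vertices {
		dst.NormalizedVertices[i] = NormalizedVertex{
			X: float64(v.X) / float64(width),
			Y: float64(v.Y) / float64(height),
		}
	}
	return dst, nil
}

func main() {
	// A 200x50 px box in the top-left corner of a 400x100 px image.
	poly := BoundingPoly{Vertices: []Vertex{{0, 0}, {200, 0}, {200, 50}, {0, 50}}}
	local, err := toLocal(poly, 400, 100)
	if err != nil {
		panic(err)
	}
	fmt.Println(local.NormalizedVertices) // [{0 0} {0.5 0} {0.5 0.5} {0 0.5}]
}
```

Because every field is copied by name, a change in the upstream struct layout fails at compile time in `toLocal` instead of silently producing a broken conversion, and a unit test on `toLocal` covers the path that failed here.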

Unfortunately, this edge case was not covered by our unit tests. Our SmartScan healthchecks did not alert us either. The main healthcheck uses a cached image, to avoid the high cost of always calling Google Vision. The healthcheck that does not use the cached image did not observe the error, for reasons we have not yet been able to explain.

The one graph we have that would have warned us had not previously been set up with an alarm. It will be now.

We apologize for the inconvenience.

Kind regards

Visma ML.

Posted Aug 10, 2020 - 16:25 CEST

Resolved
The incident started on Friday, Aug. 7, with a failed update of our OCR services that unfortunately got through staging without being caught by tests, due to caching.
The issue was resolved on Aug. 7 at 10:55, after 1 hour and 20 minutes, once customers had made us aware of it.
Posted Aug 07, 2020 - 09:30 CEST