Thursday, 12 January 2017

Base 64 Encoding and URLs

I recently had some issues with base64 encoding images and documents before sending them over HTTP to the frontend of my app, where they are decoded again.

The issues were manifold.

Let me try and indicate the problems I encountered.

In the backend I use the BaseEncoding class from Google Guava (com.google.common.io.BaseEncoding). The method "base64()" speaks for itself.

At the frontend, in JavaScript, I used the "atob()" method¹ to get the whole base64 encoded string back into its original shape.

Problem 1 - String contains an invalid character

The "atob()" method threw an error when attempting to decode the string. After some research, which involved comparing the string sent by the backend with the string received by the frontend, I noticed two differences.

Apparently, the backend is sending a URL-safe encoded string², despite me not having specified that this is what I want. In the URL-safe alphabet, "-" takes the place of "+" and "_" takes the place of "/".

Well, the differences aren't major and a simple solution does the trick:

atob(contents.replace(/-/g, "+").replace(/_/g, "/"));

And voila, atob() no longer complains about invalid characters.
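The one-liner above can be wrapped in a small helper. This is only a sketch: the name toStandardBase64 is made up for illustration, and the padding step is an assumption on my part, since URL-safe strings are sometimes sent without the trailing "=" characters and stricter decoders insist on them.

```javascript
// Hypothetical helper: convert URL-safe base64 (RFC 4648, section 5)
// into the standard alphabet that atob() expects.
function toStandardBase64(urlSafe) {
  // Swap the two URL-safe characters back to the standard ones.
  const standard = urlSafe.replace(/-/g, "+").replace(/_/g, "/");
  // Assumption: padding may have been stripped; restore it so the
  // length is a multiple of 4 again.
  const remainder = standard.length % 4;
  return remainder === 0 ? standard : standard + "=".repeat(4 - remainder);
}

console.log(toStandardBase64("_-8")); // prints "/+8="
```

In the frontend this would be used as atob(toStandardBase64(contents)).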

Problem 2 - Decoded document does not match original document

My app stored the received document, after decoding, in a local file so that it could be opened. The app is built with Cordova (and Ionic and some other stuff), and I use the Cordova File Plugin³ to write the file.

The PDF document that I was using as a test seemed to be transferred just fine, but upon opening it on the tablet, an empty PDF document was shown.

There were vast differences between the original document and the decoded document. The differences seemed to be concentrated in the decidedly "weird" characters. The letters of the alphabet seemed just fine.

Being at a loss for the moment, I decided to try a library named js-base64⁴. That didn't help in the slightest. Using RunKit⁵ to test it, I found that it decodes base64 into UTF-8 text, which goes badly for a binary document.

There were also a number of bugs reported against it on GitHub.

Problem 3 - Cordova File Plugin

After switching back to the method "atob()", and comparing the output of "atob()" with the original document, I found them to be identical.

However, the file stored on the tablet was still suffering from the exact same symptoms. Clearly something was going wrong with the Cordova File Plugin.

After looking at the documentation³, I found that the File Plugin will also output UTF-8, similar to the js-base64 library.
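A small sketch of why UTF-8 output damages a binary document. The sample bytes are made up for illustration, and TextEncoder merely stands in for whatever UTF-8 conversion happens during the write. Characters below 0x80 come through as a single byte, which fits the alphabet looking fine, while every byte above 0x7f turns into two bytes.

```javascript
// A "binary string" in the style atob() returns: one character per byte.
// Sample data (assumption): "%PDF" followed by two bytes above 0x7f.
const binary = "\x25\x50\x44\x46\xe2\xff";

// What a UTF-8 writer produces from that string.
const utf8Bytes = Array.from(new TextEncoder().encode(binary));

console.log(binary.length);    // 6
console.log(utf8Bytes.length); // 8
console.log(utf8Bytes);        // [ 37, 80, 68, 70, 195, 162, 195, 191 ]
```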

In the end, I found out that it only outputs UTF-8 if I write a Blob containing a string to the file. If I write a Blob containing a JavaScript ArrayBuffer instead, things work as they should.
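A minimal sketch of that conversion, assuming the input is the binary string returned by "atob()". The function names and the "application/pdf" type are my own choices for illustration and not part of any library.

```javascript
// Hypothetical helper: copy a binary string (one character per byte,
// as returned by atob()) into an ArrayBuffer, byte for byte.
function binaryStringToArrayBuffer(binary) {
  const bytes = new Uint8Array(binary.length);
  for (let i = 0; i < binary.length; i++) {
    bytes[i] = binary.charCodeAt(i);
  }
  return bytes.buffer;
}

// Hypothetical helper: wrap the raw bytes in a Blob, so that no
// string-to-UTF-8 conversion is involved when the Blob is written.
function toPdfBlob(binary) {
  return new Blob([binaryStringToArrayBuffer(binary)], { type: "application/pdf" });
}
```

The resulting Blob is then what gets passed to the File Plugin's writer, along the lines of the fileWriter.write(blob) call shown in the plugin documentation³.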

And I finally got a nice PDF in the standard PDF viewer of the tablet.

References

[1] MDN - Base64 encoding and decoding: https://developer.mozilla.org/en/docs/Web/API/WindowBase64/Base64_encoding_and_decoding
[2] RFC4648 The Base16, Base32, and Base64 Data Encodings - Section 5: https://tools.ietf.org/html/rfc4648#section-5
[3] File - Apache Cordova - cordova-plugin-file: https://cordova.apache.org/docs/en/latest/reference/cordova-plugin-file/
[4] js-base64 - Yet another Base64 transcoder in pure JS: https://www.npmjs.com/package/js-base64
[5] Runkit: https://runkit.com/npm/js-base64
StackOverflow - Using Javascript's atob to decode base64 doesn't properly decode utf-8 strings: http://stackoverflow.com/questions/30106476/using-javascripts-atob-to-decode-base64-doesnt-properly-decode-utf-8-strings
C#411 - Convert Binary to Base64 String: http://www.csharp411.com/convert-binary-to-base64-string/

Random Thoughts on Java Programming