20 Jun 2017 • on OpenCV C++ cross-platform

OpenCV Web Apps

OpenCV comes to the web!


This blog is mostly about C++ and computer vision. I’ve written about my cross-platform development philosophy with The Salami Method and about the OpenCV library. I’ve also experimented with C++ on the web.

The culmination of these explorations is obviously to write an OpenCV web app!

TL;DR
Emscriptening OpenCV
1. The Build System
2. Configuring the OpenCV Build
Building The App Core
1. The Color Cycling Algorithm
Platform Targeting
The Boundary Interface Layer (BIL)
The Native Import Layer (NIMP)
The Host App
The Party Parrot App
Summary
Outstanding Issues
1. Wish List

TL;DR

For the impatient:

On Windows 10: I built OpenCV static libs with CMake, the ninja generator and the Emscripten toolchain.
I created a new C++ OpenCV project with desktop and Emscripten targets.
For the web target, created C wrapper functions to handle RGBA image buffers
On the JS glue side:
1. get frames from HTML5 video element;
2. copy them to the Emscripten heap memory and pass to asm code;
3. convert result frames to TypedArrays;
4. display on HTML canvas
SHOW ME!!

Emscriptening OpenCV

Emscripten is a powerful llvm-based compiler that can compile C++ code to JavaScript.
OpenCV is a cross-platform C++ library.
How hard can it be?
– me

This whole journey started when I came upon the OpenCV.js project. This project provides “JavaScript Bindings for OpenCV”. Essentially exposing OpenCV types and functionality via a JS interface.
But I’m a C++ guy and, more importantly, wanting to write cross-platform, multi-target code, I want to continue working in C++. OpenCV.js does the opposite of what I wanted.
It did however manage to build OpenCV with Emscripten, so bottom line: it can be done!

The Build System

OpenCV.js used custom Python scripts to build OpenCV instead of using CMake. I wanted to build OpenCV directly using its own CMake files, as these are always kept in sync with the latest versions and I did not want the additional Python in the build toolchain.
To make things more interesting I was working on a Windows 10 machine.

Fail #1

The Emscripten docs claim that you can use Emscripten on Windows if you have Visual Studio 2010. Fortuitously, I actually had VS2010 installed on my machine. I used CMake with the Emscripten toolchain to generate a VS2010 project. Unfortunately, VS2010 balked with lots of strange errors which I’d never encountered before and refused to build 😞.

Fail #2

Undeterred, I thought: 💭 “Who needs an IDE, I’ll just build it on the command line like a manly-man!” 💭 I whipped up CMake, and as per the Emscripten instructions, chose the “UNIX makefiles” generator with the Emscripten toolchain… Alas, make is not available on Windows. Maybe I’ll install MINGW, MSYS, … blah blah blah… (2 hours later)… didn’t work 😖.

Fail #3

Aha! Windows 10 sports the newfangled WSL, the Windows Subsystem for Linux! … installs WSL
Now let’s get Emscripten for Linux… hmmm… no binaries for Linux. OK, let’s build from source. Needs CMake so we apt-get CMake. Oops.. needs a newer CMake version… build CMake from source… installs many more dependencies… (several hours pass)… dejection 😠. I am a Linux wuss.

Remember, at this stage, all I am trying to do is set up the F*CKING build system just to generate the build files! This doesn’t have anything to do with OpenCV yet!!

Success at Last!

Admit it: everybody hates make (a million monkeys might recreate a Shakespeare sonnet, but all makefiles are descended by copy-paste-mutate from some long forgotten ancestral Makefile-Adam).
But I digress. As a final, desperate attempt, I decided to use the Ninja CMake generator. Ninja is a small build system with a focus on speed. It differs from other build systems in two major respects: it is designed to have its input files generated by a higher-level build system, and it is designed to run builds as fast as possible. It is also supported on Windows without any additional dependencies.
So, cinst ninja on the command line (cinst is the install command of Chocolatey, a command-line package manager for Windows), and Ninja was installed.

Open CMake, select the OpenCV source folder, select the build folder and Configure:

Choose the “Ninja” generator;
Choose “Specify toolchain file for cross-compiling”;
Next;
Specify the Emscripten toolchain file path (look for it in the (Windows) Emscripten installation folder, it’s called Emscripten.cmake);
Finish.

CMake will churn and run a bunch of compiler-config tests and settle down for the build customization.
Of course, you could use the command-line cmake for this, but I prefer cmake-gui which is what I used and what I show here.

At this point we have a CMake OpenCV build system!

Configuring the OpenCV Build

The OpenCV CMake files provide lots of options and optional dependencies. You can choose what subset of OpenCV to build. The CMake config will also disable anything it can’t find.

In my build, I enabled:

all the BUILD_opencv_* modules that were on by default;
BUILD_JPEG
BUILD_PNG

I unchecked (disabled):

BUILD_DOCS
BUILD_EXAMPLES
BUILD_FAT_JAVA_LIB (only appears after the first Configure)
BUILD_IPP_IW
BUILD_PACKAGE
BUILD_PERF_TESTS
BUILD_SHARED_LIBS ⇐ This is important since we want to statically link with the OpenCV libraries.
BUILD_TESTS
BUILD_WITH_DEBUG_INFO
CV_ENABLE_INTRINSICS ⇐ More on this later.
WITH_PTHREADS_PF

You can also uncheck a whole bunch of default-checked WITH_* options, though configuring will detect that these are not supported and not build them anyway. I only left WITH_PNG, WITH_WEBP and WITH_JPEG which are on by default.

Select the empty options for CPU_BASELINE and CPU_DISPATCH. These are related to the CV_ENABLE_INTRINSICS option. You might have to do this after the first Configure again.

Now, let’s make a few changes:

Set CMAKE_BUILD_TYPE to “Release”
Set Emscripten compiler options CMAKE_CXX_FLAGS and CMAKE_C_FLAGS to:
-O3 --llvm-lto 1 --bind -s ASSERTIONS=2 --memory-init-file 0

Hit Configure and inspect the report to see what was found and what was generated.
Some options will only appear after configuring so you may have to do several iterations.

When satisfied, hit Generate.

Go to the build folder and run ninja. If all is well, OpenCV will be built with Emscripten as desired. Amazingly, the full build is very fast, taking less than 2 minutes.

Building The App Core

The Color Cycling Algorithm

At this point we need to write our OpenCV app. As per The Salami Method, we’ll create a cross-platform static library to provide the desired functionality, and wrap this library with target specific layers.

For my app I decided to make Sirocco, the zoologist shagging Kākāpō parrot behind the meme, PARTY!

The algorithm I implemented is very simple:

Convert the RGB color image to the HSV color space;
Shift H, the Hue channel, by some value;
Convert back to RGB [¹]

cv::Mat3b hsv;
int accHOffset = 0;

namespace color_cycle
{
  void rotate_hue(cv::Mat3b const& img, cv::Mat3b& result_img, int hsteps)
  {
     assert(result_img.size() == img.size() && result_img.type() == img.type());
  
     // re-allocate global temp storage if needed
     hsv.create(img.size());
     
     // convert to HSV
     cv::cvtColor(img, hsv, CV_BGR2HSV_FULL);
  
     for (int r = 0; r < hsv.rows; ++r)
        for (int c = 0; c < hsv.cols; ++c)
           hsv.at<cv::Vec3b>(r, c)[0] = (hsv.at<cv::Vec3b>(r, c)[0] + accHOffset) % 256; // cycle H (8-bit *_FULL hue spans 0..255)
  
     // convert back to BGR
     cv::cvtColor(hsv, result_img, CV_HSV2BGR_FULL);
     
     // update cycle offset
     accHOffset += hsteps;
  }
  
  void clear_all()
  {
     // release global's memory
     hsv.release();
     accHOffset = 0;
  }
}

The rotate_hue() function expects two images: an input color image img and an out parameter result_img. It also takes an integer hsteps for the cycling speed.
Note that the function expects the output image result_img to already be allocated and of the correct size and format.

As with any online, real-time, processing algorithm it is important to avoid repeated reallocations.
In this case I am using a global (😱) image hsv as the temporary HSV image. More complex systems can use objects to store state across calls. Once allocated with a particular size, it will remain in memory and be reused.
The function clear_all() can be used to release the memory.
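
As a rough illustration of the "objects to store state across calls" idea, here is a minimal sketch that does not use OpenCV. The ScratchBuffer class and its member names are hypothetical and not part of the post's code; they only show the pattern of resizing storage when the frame size changes and reusing it otherwise.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not from the post's repository): owns a scratch
// buffer that is only resized when the requested size changes.
class ScratchBuffer
{
public:
   // Returns a pointer to storage of exactly `bytes` bytes.
   // The storage is reused as long as the requested size stays the same.
   std::uint8_t* acquire(std::size_t bytes)
   {
      if (bytes != buf_.size())
      {
         buf_.resize(bytes);
         ++resize_count_;
      }
      return buf_.data();
   }

   // Counterpart of clear_all(): drop the contents and ask the vector
   // to give its memory back.
   void clear()
   {
      buf_.clear();
      buf_.shrink_to_fit();
   }

   std::size_t resize_count() const { return resize_count_; }
   std::size_t size() const { return buf_.size(); }

private:
   std::vector<std::uint8_t> buf_;
   std::size_t resize_count_ = 0;
};
```

Calling acquire() once per frame with the same size touches the allocator only on the first frame, which is the same effect the global hsv image achieves via create().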

The Hue channel units are angles, so cycling the colors requires modular addition which unfortunately is not part of the arithmetic operations supported by cv::Mat. Thus, I use the simplest method of iterating over all the pixels.
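
To make the wrap-around arithmetic concrete, here is a small OpenCV-free sketch. It assumes the 8-bit full-range hue encoding, where the whole circle is spread over the values 0..255, so the modulus is 256. The function names shift_hue and shift_hue_channel are made up for this illustration.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Shift one 8-bit hue value by `offset` steps, wrapping around the
// 0..255 range. The double modulo keeps the result non-negative, so
// negative offsets work too.
inline std::uint8_t shift_hue(std::uint8_t h, int offset)
{
   int const wrapped = ((static_cast<int>(h) + offset) % 256 + 256) % 256;
   return static_cast<std::uint8_t>(wrapped);
}

// Apply the shift to the H component of an interleaved H,S,V byte buffer,
// leaving S and V untouched.
inline void shift_hue_channel(std::vector<std::uint8_t>& hsv, int offset)
{
   for (std::size_t i = 0; i + 2 < hsv.size(); i += 3)
      hsv[i] = shift_hue(hsv[i], offset);
}
```

For example, shift_hue(250, 10) wraps past the top of the range to 4, and shift_hue(5, -10) wraps below zero to 251.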

Since these are simple cross-platform C functions, this core functionality code corresponds to the XAPI or XCAPI layers of the Salami Method separation layers.

All the code can be found here.

Platform Targeting

Our project’s CMakeLists.txt file is surprisingly similar to any OpenCV dependent project’s. It basically does find_package(OpenCV REQUIRED) and include_directories(${OpenCV_INCLUDE_DIRS}) into the projects. It then target_link_libraries() with ${OpenCV_LIBS}.

CMake will generate an XAPI static library color_cycle_lib for all platforms, and an additional second target “driver” app which is platform-specific (e.g. Emscripten or desktop).

For the Emscripten target, all we need is:

add_executable (color_cycle_asm web/color_cycle_js.cpp)                 
target_link_libraries(color_cycle_asm ${OpenCV_LIBS} color_cycle_lib)

This creates the color_cycle_asm JavaScript module and links it with both OpenCV and our static library color_cycle_lib. We set the Emscripten compiler flags:
SET(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -std=c++1z --llvm-lto 1 --bind -s ASSERTIONS=2 --memory-init-file 0 -O3") and that’s really all there is to it!

The full file is here.

The Boundary Interface Layer (BIL)

For the Emscripten target, the platform-specific Boundary Interface Layer (BIL) is the C-wrapper in color_cycle_js.cpp:

“The role of this layer is to have one clear place in the code where the core interfaces the target platform. This is where platform-specific conventions, constraints and conversions are enforced. It is here that we must perform bidirectional data type and value conversions between the “native” target platform and our platform agnostic code from the previous layers. The required conversions are dictated by each particular target.”

Here are the interesting bits of the file:

#include <emscripten.h>
// ...
cv::Mat3b bgr_g, bgr_out_g; // global data

extern "C" 
{
   //////////////////////////////////////////////////////////////////////////

   bool EMSCRIPTEN_KEEPALIVE rotate_colors(int width, int height,
                                           cv::Vec4b* frame4b_ptr,
                                           cv::Vec4b* frame4b_ptr_out, 
                                           int hsteps) try
   {
      // wrap memory pointers with proper cv::Mat images (no copies)
      cv::Mat4b rgba_in(height, width, frame4b_ptr);
      cv::Mat4b rgba_out(height, width, frame4b_ptr_out);

      // allocate 3-channel images if needed
      bgr_g.create(rgba_in.size());
      bgr_out_g.create(rgba_in.size());

      // rearrange channels
      cv::cvtColor(rgba_in, bgr_g, CV_RGBA2BGR);
      
      // do the actual work!!
      color_cycle::rotate_hue(bgr_g, bgr_out_g, hsteps);

      // mix BGR + A (from input) => RGBA output
      const Mat in_mats[] = { bgr_out_g, rgba_in };
      constexpr int from_to[] = { 0,2, 1,1, 2,0, 6,3 };
      mixChannels(in_mats, std::size(in_mats), &rgba_out, 1, from_to, std::size(from_to)/2);
      return true;
   }
   catch (std::exception const& e)
   {
      std::cout << "Exception thrown: " << e.what() << std::endl;
      return false;
   }
   catch (...)
   {
      std::cout << "Unknown exception thrown!" << std::endl;
      return false;
   }
   
   // ...

We expose an extern "C" function rotate_colors with the EMSCRIPTEN_KEEPALIVE flag. This is the way to indicate to the Emscripten emcc compiler not to remove the C function from the resulting module.
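
For readers who want to try the export mechanism in isolation, here is a minimal sketch. The function add_ints is a made-up example, and the fallback macro definition is my assumption for letting the same file compile on a desktop compiler; it is not taken from the post's code.

```cpp
// __EMSCRIPTEN__ is predefined by emcc. On other compilers we define
// EMSCRIPTEN_KEEPALIVE as an empty macro so the file still builds.
#ifdef __EMSCRIPTEN__
#include <emscripten.h>
#else
#define EMSCRIPTEN_KEEPALIVE
#endif

extern "C"
{
   // Exported with C linkage so the name is not mangled, and marked
   // KEEPALIVE so it survives dead-code elimination.
   int EMSCRIPTEN_KEEPALIVE add_ints(int a, int b)
   {
      return a + b;
   }
}
```

Following the naming convention used later in this post, the JS side would reach such a function as Module._add_ints.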

The function expects raw pointers to 4-channel, 8-bit, memory buffers representing an RGBA input and output images, as provided and consumed by JavaScript canvas images. It then proceeds as follows:

Wraps these pointers inside cv::Mats;
Allocates stateful temporary storage for 3-channel images;
Drops the A(lpha) channel from the input and rearranges the channel components;
Calls the core static library function color_cycle::rotate_hue().
This is where the actual processing is done!
Reassembles an RGBA image from the result directly into the output buffer.
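
The channel shuffling in the steps above can be sketched without OpenCV on plain byte vectors. This is only an illustration of the index mapping (the post's code does it with cv::cvtColor and mixChannels); the function names here are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Drop alpha and swap R/B: interleaved RGBA -> interleaved BGR
std::vector<std::uint8_t> rgba_to_bgr(std::vector<std::uint8_t> const& rgba)
{
   std::vector<std::uint8_t> bgr;
   bgr.reserve(rgba.size() / 4 * 3);
   for (std::size_t i = 0; i + 3 < rgba.size(); i += 4)
   {
      bgr.push_back(rgba[i + 2]); // B
      bgr.push_back(rgba[i + 1]); // G
      bgr.push_back(rgba[i + 0]); // R
   }
   return bgr;
}

// Merge processed BGR pixels with the alpha taken from the original RGBA
// input, writing RGBA into rgba_out.
void merge_bgr_with_alpha(std::vector<std::uint8_t> const& bgr,
                          std::vector<std::uint8_t> const& rgba_in,
                          std::vector<std::uint8_t>& rgba_out)
{
   std::size_t const pixels = rgba_in.size() / 4;
   assert(bgr.size() == pixels * 3);
   rgba_out.resize(pixels * 4);
   for (std::size_t p = 0; p < pixels; ++p)
   {
      rgba_out[4 * p + 0] = bgr[3 * p + 2];     // R
      rgba_out[4 * p + 1] = bgr[3 * p + 1];     // G
      rgba_out[4 * p + 2] = bgr[3 * p + 0];     // B
      rgba_out[4 * p + 3] = rgba_in[4 * p + 3]; // A from the input
   }
}
```

Running an image through rgba_to_bgr() and then merge_bgr_with_alpha() without any processing in between gives back the original bytes, which is a handy sanity check for the mapping.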

The BIL is the only place where Emscripten, the host platform, is #included and acknowledged.

One important thing to note, is that we are exporting the C function directly, and we’ll be calling it directly (that’s why we need to specify the EMSCRIPTEN_KEEPALIVE flag). The Emscripten binding tools like Embind and the WebIDL Binder introduce multiple conversions and redundant memory copying overhead that makes them terribly inefficient for transferring large memory buffers. Similarly, ccall() and cwrap() introduce unnecessary delays on the JS side. I’ve seen innocuous JS ArrayBuffer-passing code that silently incurred at least 3 consecutive copies of the data before reaching the C++ code!

Also note that since this is a module boundary we need to beware of any exceptions. In this case I just catch anything and log the unfortunate event.
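
If there are several exported functions, the catch-everything logic can be factored out. Below is a minimal sketch of one way to do it; guarded_call and demo_failing_call are hypothetical names, not something from the post's code. It logs with printf rather than std::cout.

```cpp
#include <cstdio>
#include <exception>
#include <stdexcept>

// Hypothetical helper: run `f` and turn any escaping exception into a
// `false` return value, so nothing propagates across the module boundary.
template <typename F>
bool guarded_call(F&& f) noexcept
{
   try
   {
      f();
      return true;
   }
   catch (std::exception const& e)
   {
      std::printf("Exception thrown: %s\n", e.what());
      return false;
   }
   catch (...)
   {
      std::printf("Unknown exception thrown!\n");
      return false;
   }
}

// Example of a call that fails: the exception is logged and swallowed.
inline bool demo_failing_call()
{
   return guarded_call([] { throw std::runtime_error("bad frame size"); });
}
```

An exported function body would then shrink to a single return guarded_call(...) statement wrapping the actual work.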

On the desktop CMake target, I have a separate driver app that reads a video, applies rotate_hue() and displays the result.

The Native Import Layer (NIMP)

At this point we have crossed the Rubicon or some would say The River Styx. We have left our beloved C++ behind and entered the realm of JavaScript. I focused on browser side JS, but the same concepts will work on the server as well [²].

All the JS logic for the web-page is in color_cycle.js. A lot of it has to do with loading and showing an HTML5 video element and manipulating canvases. I adopted the code from a multitude of web samples, and delving into their details is not only not my specialty, but is not very pertinent to what we are discussing.

Perhaps the only major performance-related drawback of using Emscripten over plain JavaScript, is that Emscripten cannot interact directly with JavaScript memory buffers (e.g. ArrayBuffers or TypedArrays). Instead, memory buffers must be allocated and released (no GC) on the Emscripten heap via the Module._malloc() and Module._free() functions provided by Emscripten. When passing around ArrayBuffers or TypedArrays using Emscripten’s binding tools, the wrappers will silently allocate heap data and copy everything there, and only then pass the data onwards (often incurring additional copies further down).

My puzzlement, befuddlement, research and inquiries eventually had Alon Zakai, Emscripten’s creator, explain to me what is apparently the most efficient way to date to pass buffer data to Emscriptened code.

The relevant parts in the code are these:

// Given a JS TypedArray, Module._malloc() a buffer of the same size and copy the data there. 
function _arrayToHeap(typedArray) {
  var numBytes = typedArray.length * typedArray.BYTES_PER_ELEMENT;
  var ptr = Module._malloc(numBytes);
  var heapBytes = Module.HEAPU8.subarray(ptr, ptr + numBytes); // 'var' avoids leaking an implicit global
  heapBytes.set(typedArray);
  return heapBytes;
}

// Free the malloced data. No GC works on this heap. 
// Alas, no dtors in JS either :-(
function _freeArray(heapBytes) {
  Module._free(heapBytes.byteOffset);
}

// Compute and display the next frame
fp.renderFrame = function () {
  // Acquire a video frame from the video element
  fp.ctx.drawImage(fp.video, 0, 0, fp.video.videoWidth,
               fp.video.videoHeight, 0, 0, fp.width, fp.height);
  var img_data = fp.ctx.getImageData(0, 0, fp.width, fp.height);

  // allocate Emscripten Heap memory buffer only when needed:      
  if (!fp.frame_bytes) {
     fp.frame_bytes = _arrayToHeap(img_data.data);
  }
  else if (fp.frame_bytes.length !== img_data.data.length) {
     _freeArray(fp.frame_bytes); // free heap memory
     fp.frame_bytes = _arrayToHeap(img_data.data);
  }
  else {
     fp.frame_bytes.set(img_data.data);
  }

  // Perform operation on copy, no additional conversions needed, direct pointer manipulation
  // results will be put directly into the output param.  
  Module._rotate_colors(img_data.width, img_data.height, 
                        fp.frame_bytes.byteOffset, fp.frame_bytes.byteOffset, 
                        fp.color_change_speed);
                        
  // copy output to ImageData
  img_data.data.set(fp.frame_bytes);
  // Render to viewport
  fp.viewport.putImageData(img_data, 0, 0);
};

We have _arrayToHeap() to mallocate and copy TypedArrays onto the heap, and _freeArray() to release said memory as the GC does not operate on the Emscripten heap.
Note that _arrayToHeap() (or more specifically Module.HEAPU8.subarray()) returns a JS TypedArray, but the internal buffer is in fact on the Emscripten heap. The copying is then done via the TypedArray.set() method, which is generally not implemented in JS but natively by the browser and is essentially equivalent to memcpy(). This, at least, is a small consolation for the copying penalty.

Another interesting thing, is that ptr is just an integer offset on the flat heap memory buffer. We are used to pointers as distinct types, but here they are laid bare as the integer offsets they truly are.
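
To see the "pointers are just offsets" idea from the C++ side, here is a toy model. It is purely an illustration under my own assumptions: FlatHeap is a hypothetical class with a trivial bump allocator, and the real Emscripten allocator is far more elaborate.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy flat heap: all memory lives in one byte array and a "pointer" is
// simply an index into that array.
class FlatHeap
{
public:
   // `capacity` usable bytes, plus one reserved byte at offset 0 that
   // plays the role of the null pointer.
   explicit FlatHeap(std::size_t capacity) : bytes_(capacity + 1) {}

   // Bump-allocate n bytes and return their offset, or 0 on failure.
   std::size_t allocate(std::size_t n)
   {
      if (n == 0 || n > bytes_.size() - top_)
         return 0;
      std::size_t const offset = top_;
      top_ += n;
      return offset;
   }

   // Turn an offset back into something we can read and write.
   std::uint8_t& at(std::size_t offset)
   {
      assert(offset != 0 && offset < top_);
      return bytes_[offset];
   }

private:
   std::vector<std::uint8_t> bytes_;
   std::size_t top_ = 1; // offset 0 is never handed out
};
```

In this model, the value returned by allocate() corresponds to what Module._malloc() hands back to JS, and at() corresponds to indexing Module.HEAPU8.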

In renderFrame() we:

Grab the video frame bytes from the canvas in img_data;
Allocate a same sized heap buffer (avoiding repeated reallocations on subsequent frames) and copy img_data.data to this buffer;
Finally, we call our exported function Module._rotate_colors() (note the added _) and pass it the required size and pointer arguments; This is where the actual functionality happens!
We copy (via set() again) the result image back into the TypedArray;
Display the result on the canvas.

Note that we are passing the color-cycle-speed value fp.color_change_speed directly. Only pointers and buffers must explicitly use the heap (single values are copied there automagically).

You can see the whole file here: color_cycle.js.

In this app we did not create a full fledged JavaScript API, so we essentially skipped the Native Interface Wrappers (NIW) layer.

The Host App

Well, on the web, the host application, is basically our HTML. Using the JS code above is as simple as:

<div id="video_place"></div>
<script src='color_cycle_asm.js'></script>
<script src='color_cycle.js'></script>
<script>
    var fp = makeFrameProcessor("sirocco.mp4");
    function updateColorChangeSpeed(newValue) { fp.color_change_speed = newValue; }
</script>
<input type="range" min="0" max="20" value="1" oninput="updateColorChangeSpeed(this.value)" onchange="updateColorChangeSpeed(this.value)"/>

Add a <div> to position the elements on the page;
Include our importer JS color_cycle.js and the Emscriptened JS library color_cycle_asm.js;
Call the relevant functions to start processing the video;
Set up a GUI element

The reason I’m instantiating the <video> and <canvas> tags/elements programmatically via JS, is that I had problems with GitHub Markdown not always allowing these HTML tags. When using plain HTML one could skip that and simply place them directly on the page.

And now, without further ado…

The Party Parrot App

And here’s the app in all its glory (I literally pasted the code above into the post’s Markdown):

On the left, the original, a regular HTML5 video.
On the right, the same video being processed live, frame-by-frame cycling of the frame’s hue channel.
This is a live, real time view, running in the browser!

Use the slider to change the color cycling speed.

Summary

Despite the OS/build/config related problems at the beginning, I have shown that you can build cross-platform OpenCV apps that also target the web. The Salami Method served me well in separating concerns and keeping the code clean, simple and debuggable. I would estimate that over 90% of the time I spent on this project was related to config issues and JS/HTML5/browser/Markdown related problems - most likely because JavaScript [²].

Outstanding Issues

Despite having built an outstanding working app, there are still a few outstanding issues which I have not figured out and would be glad to improve:

Intrinsics: When building OpenCV, I had turned off all intrinsics related build options. This means OpenCV will be composed only of plain cross-platform C/C++ implementations. This seems to make sense as browsers tend to run on many different architectures and it is hard to make assumptions about them (and also JavaScript and SIMD are not an obvious couple).
However, Emscripten does in fact support SIMD instructions! You need to specify a few build, compiler and linker flags and SIMD code ought to be able to build. This may significantly improve OpenCV’s performance (in general) as many/most of its functions have optimized SIMD versions.
However, I didn’t manage to get this to work as there were some missing header files that emcc could not find. If anyone knows how to create a SIMD enabled OpenCV build, please let me know or make a pull request.
When building OpenCV, I used the default Emscripten build configuration into static libraries. There is an optional CMake flag EMSCRIPTEN_GENERATE_BITCODE_STATIC_LIBRARIES which generates “bitcode static libraries” instead of the usual .a files. I don’t know if using this flag will make any difference or why/when one should enable it.
I have not tried a WebAssembly (WASM) build - it would be nice to try that with the asm.js as a fallback for browsers with no WASM support.
I am not certain I have enabled all the possible optimizations for the Emscripten builds. There may be alternative options/flags that will yield a more optimized build.
The resulting asm.js file size is about 2.2MB. I suspect that it might be possible to reduce this further. There is an option to remove file-system support from the build (FS) but disabling that did not have any effect on my build.
(I suspect it’s still there because you can see the Module.FS_* members in the JS console.)
Based on this post, I changed the code to use printf() instead of std::cout for logging. This helped make the asm file somewhat smaller.
Together with the above, and on a few more Emscripten recommendations, I added these compilation flags to reduce the asm size by just over 5%:
-s NO_FILESYSTEM=1 -s ELIMINATE_DUPLICATE_FUNCTIONS=1 -s NO_EXIT_RUNTIME=1
To make matters worse, OpenCV’s CMake files have a 5-year old rule that changes -O3 to -O2 on gcc builds, and for some reason decides that emcc is a gcc compiler. So, in fact, OpenCV is currently being built with -O2. I have submitted an issue about this. You can override this by manually updating OpenCVDetectCXXCompiler.cmake to not do this when EMSCRIPTEN is defined.
BTW, identifying this silent build change shows why it’s important to inspect the CMake build report.
I am curious if there is a way to do a Visual Studio build. A post from just a few days ago claims “improved CMake and Ninja support in Visual Studio 15.3 Preview 2”. Perhaps, this will allow editing and compiling in the IDE (even if the compiler is emcc).

If you have any answers to these questions, please leave a message in the comments, tweet them to me, open an issue or make a PR on GitHub. I will update the post with the answers and due credit of course.

Wish List

It would be really nice to have a zero-cost C++ interface for passing data to Emscripten.

I can envision the exported functions accepting something like std::string_view or a gsl::span<> for non-string buffers like TypedArray. This will also make the binding tools a lot more efficient.
Of course, doing this zero-cost wrapping is exactly what we do manually at the Boundary Interface Layer. In my case I used a cv::Mat wrapper, but conceptually it is similar to a gsl::span<> pairing the raw memory with its size (in my case a 2D image).
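
To make the "pairing the raw memory with its size" idea concrete, here is a minimal sketch of such a non-owning view for an RGBA image. RgbaView is a hypothetical type made up for this illustration; it is neither gsl::span nor anything Emscripten provides.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical non-owning view over an interleaved 8-bit RGBA image.
// It never allocates or frees; it only remembers where the pixels are
// and how many of them there are.
struct RgbaView
{
   std::uint8_t* data;
   int width;
   int height;

   RgbaView(std::uint8_t* p, int w, int h) : data(p), width(w), height(h)
   {
      assert(p != nullptr && w > 0 && h > 0);
   }

   std::size_t size_bytes() const
   {
      return static_cast<std::size_t>(width) * height * 4;
   }

   // Pointer to the 4 bytes (R, G, B, A) of the pixel at (row, col)
   std::uint8_t* pixel(int row, int col) const
   {
      assert(row >= 0 && row < height && col >= 0 && col < width);
      return data + (static_cast<std::size_t>(row) * width + col) * 4;
   }
};
```

Constructing such a view from the (width, height, pointer) arguments of an exported function costs nothing more than copying three values, which is the zero-cost property I am after.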

Similarly, it would be great if/when Emscripten (or WASM) could use the JS buffers directly instead of requiring copying the data to a separate heap. Perhaps the experimental SharedArrayBuffer will help with this.

If you found this post interesting, or you have more thoughts on this subject, please leave a message in the comments, Twitter or Reddit. If you know of better ways to do this, DO let me know!
Follow me on Twitter.

Credits: banner :: Cult of the Party Parrot :: BBC Last Chance To See: Kākāpō :: Sirocco

I’m actually using the BGR channel order which is the default on many platforms, making it easier to debug and display during development. ↩
Caveat: IANAJSD (I Am Not A JavaScript Developer). Most of the JS code presented here is based on the copy-pasted code from StackOverflow and the Interwebs. PRs and suggestions are most welcome. ↩ ↩²