Thursday, November 11, 2010

Improving frame buffer exchange

OK, I thought it was time for a v4l4j update, and decided to improve the way image buffers are retrieved from libvideo, converted and passed to Java. I'll briefly explain how v4l4j does this now, look at alternative ways of doing things, and explain my choices.

Current design

During the call to init_capture(), libvideo mmap()s a certain number of memory chunks from the driver. v4l4j asks for 4 chunks, but the driver may decide to create more or fewer. These chunks will hold images captured from the video device. The mappings' addresses are chosen by the system call, i.e. the memory chunks are not allocated by v4l4j but set up during the mmap() syscall.

Then, v4l4j creates as many ByteBuffer objects as there are mmap'ed buffers. Each ByteBuffer is backed by its own memory block.

When retrieving the next frame, libvideo's dequeue_buffer() returns a pointer to the mmap'ed buffer containing the latest frame. v4l4j passes it to its conversion routines along with a pointer to a destination buffer where the converted image will be stored. If no conversion is required, a simple copy is made. For the first frame, the destination is the backing buffer behind the first ByteBuffer; the ByteBuffers are then used in a round-robin fashion.

Lastly, the contents of the ByteBuffer object are copied into a Java byte array.
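To make the Java end of this pipeline concrete, here is a minimal sketch of the round-robin ByteBuffer pool and the final copy into a byte array. It is not the actual v4l4j code: the class name RoundRobinFrames, the method names and fillFromNative() (a pure-Java stand-in for the native copy or conversion step) are all made up for illustration.

```java
import java.nio.ByteBuffer;

public class RoundRobinFrames {
    private final ByteBuffer[] buffers;
    private int next = 0;          // index of the ByteBuffer to use for the next frame
    private int frameCounter = 0;  // only used by the stand-in below

    public RoundRobinFrames(int count, int frameSize) {
        buffers = new ByteBuffer[count];
        for (int i = 0; i < count; i++) {
            // One direct ByteBuffer per mmap'ed buffer, each with its own backing memory
            buffers[i] = ByteBuffer.allocateDirect(frameSize);
        }
    }

    // Hypothetical stand-in for the native side (copy or conversion into the
    // ByteBuffer's backing memory): fills the buffer with the frame number.
    private void fillFromNative(ByteBuffer dst) {
        dst.clear();
        byte value = (byte) frameCounter++;
        while (dst.hasRemaining()) {
            dst.put(value);
        }
        dst.flip();
    }

    // Returns the next frame as a byte array, at the cost of one more copy.
    public byte[] nextFrame() {
        ByteBuffer current = buffers[next];
        next = (next + 1) % buffers.length;  // round-robin over the pool
        fillFromNative(current);
        byte[] frame = new byte[current.remaining()];
        current.get(frame);                  // the final ByteBuffer-to-byte[] copy
        return frame;
    }

    public int nextIndex() {
        return next;
    }
}
```

Every call to nextFrame() in this sketch allocates a new array and copies the whole frame into it, on top of whatever copy or conversion already happened on the native side.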

Looking back on it, this is hugely inefficient. There can be up to 3 copy operations plus a possible format conversion. This leaves a lot of room for improvement.

What can be changed?

Let's start with the end result: from Java, what is the best object type to encapsulate the image data? When I started v4l4j, I was immediately attracted to the java.nio package and its ByteBuffer class (which is the one used in the current implementation), mostly because it is readily available from native code. However, this may not be the easiest type to deal with when trying to do anything useful with the image inside. Most of the time, the ByteBuffer data has to be transferred to a byte array in order to be used, especially when trying to display the image in a GUI element. I believe a copy could be avoided if the data were readily available in a Java byte[] instead of a ByteBuffer.
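The following sketch shows why the extra copy tends to be unavoidable with a ByteBuffer filled from native code. On common JVMs a direct ByteBuffer does not expose a backing array (hasArray() returns false), so code that needs a byte[] has to copy the data out. The class BufferVsArray and its toArray() helper are hypothetical names, not part of v4l4j.

```java
import java.nio.ByteBuffer;

public class BufferVsArray {
    // Returns the buffer's remaining bytes as a byte[].
    // If the buffer wraps exactly one accessible array, that array is returned
    // as-is (no copy). Otherwise (typically for direct buffers) the bytes are
    // copied into a new array.
    public static byte[] toArray(ByteBuffer buf) {
        if (buf.hasArray()
                && buf.arrayOffset() == 0
                && buf.position() == 0
                && buf.remaining() == buf.array().length) {
            return buf.array();
        }
        byte[] copy = new byte[buf.remaining()];
        buf.duplicate().get(copy);  // duplicate() leaves buf's position untouched
        return copy;
    }
}
```

With the frame already sitting in a byte[], the first branch applies and no copy is made; with a direct ByteBuffer, the second branch copies the whole frame.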

Next, let's see what the best way is to have the latest image stored in a byte array. The ideal option would be to have the byte arrays allocated in the Java code (so as to avoid a new allocation every time a frame arrives), then obtain a C pointer to the byte array and mmap() it in the driver memory, so that new frames are directly placed into the Java byte array. According to the JNI documentation on arrays, there are two ways to obtain a C pointer to a Java byte array: GetByteArrayElements() and GetPrimitiveArrayCritical().
  • The first one does not guarantee that the JVM will return a pointer to the actual Java array, i.e. it may copy the array into a C heap-allocated array and return a pointer to that copy, which is synchronised with the Java array later on, when ReleaseByteArrayElements() is called. Looking at the implementation of these methods in the OpenJDK HotSpot JVM v6 & 7 source code confirmed one thing: these JVM implementations do indeed copy the Java array into a C array. Always. Bad news!
  • This leaves us with GetPrimitiveArrayCritical(). Even with this method, the JVM may return a copy of the array, but, in the JNI documentation's own words, it is "[...] more likely that the native code will obtain an uncopied version of the array". Looking at the JVM source code again confirmed that the OpenJDK JVM v6 & 7 will always return a pointer to the Java byte array, not a copy. However, the issue with this method is that the code between GetPrimitiveArrayCritical() and its counterpart ReleasePrimitiveArrayCritical() is subject to restrictions very similar to those of critical sections: it must execute quickly, and it must not sleep. This is not good either, because if we want the driver to place bytes in our mmap()'ed byte array, we must call libvideo's dequeue_buffer(), which calls ioctl(DEQUEUE_BUF), which blocks...
With this in mind, I came up with the following design:
  • Use the mmap()-provided memory buffers
  • Call GetPrimitiveArrayCritical()
  • Either copy the image into the Java byte array or convert the image directly into the Java byte array
  • Call ReleasePrimitiveArrayCritical()
This way, for each image, the code between GetPrimitiveArrayCritical() and ReleasePrimitiveArrayCritical() does not sleep, and at most one copy or one conversion will occur.