Thursday, November 11, 2010

Improving frame buffer exchange

Ok, I thought it was time for a v4l4j update, so I decided to improve the way image buffers are retrieved from libvideo, converted and passed to Java. I'll briefly explain how v4l4j does this now, look at alternative approaches, and explain my choices.

Current design

During the call to init_capture(), libvideo mmap()s a number of driver buffers into the process's address space. v4l4j asks for 4 buffers, but the driver may decide to create more or fewer. These buffers will hold images captured from the video device. The mapping addresses are chosen by the mmap() syscall, i.e. the memory is allocated by the driver, not by v4l4j.

Then, v4l4j creates as many ByteBuffer objects as there are mmap'ed buffers. Each ByteBuffer has its own memory block used as a storage area backing the ByteBuffer.

When retrieving the next frame, libvideo's dequeue_buffer() returns a pointer to the mmap'ed buffer containing the latest frame. v4l4j passes it to its conversion routines along with a pointer to a destination buffer where the converted image will be stored; if no conversion is required, a simple copy is made. The destination buffer is the backing storage of the first ByteBuffer (the ByteBuffers are used in a round-robin fashion).

Finally, the contents of the ByteBuffer are copied into a Java byte array.

Looking back on it, this is hugely inefficient. There can be up to 3 copy operations plus a possible format conversion. This leaves a lot of room for improvement.

What can be changed?

Let's start with the end result: from Java, what is the best object type to encapsulate the image data? When I started v4l4j, I was immediately attracted to the java.nio package and its ByteBuffer class (the one used in the current implementation), mostly because it is readily accessible from native code. However, it may not be the easiest type to deal with when trying to do anything useful with the image it holds. Most of the time, the ByteBuffer data has to be transferred to a byte array before it can be used, especially when displaying the image in a GUI element. That copy could be avoided if the data were readily available in a Java byte[] instead of a ByteBuffer.

Next, let's see what the best way is to get the latest image into a byte array. The ideal option would be to allocate the byte arrays in Java code (so as to avoid a new allocation every time a frame arrives), obtain a C pointer to each array, and mmap() it into the driver, so that new frames land directly in the Java byte array. A thorough read of the JNI documentation on arrays shows there are two ways to obtain a C pointer to a Java byte array: GetByteArrayElements() and GetPrimitiveArrayCritical().
  • The first one does not guarantee that the JVM will return a pointer to the actual Java array: it may copy the array into a C heap-allocated array and return a pointer to that copy, which is synchronised back when ReleaseByteArrayElements() is called. Looking at the implementation of these methods in the OpenJDK HotSpot JVM v6 & 7 source code confirmed one thing: these JVM implementations do indeed copy the Java array into a C array. Always. Bad news!
  • This leaves us with GetPrimitiveArrayCritical(). Even with this method, the JVM may return a copy of the array, but, in the Javadoc's own words, it is "[...] more likely that the native code will obtain an uncopied version of the array". Looking at the JVM source code again confirmed that the OpenJDK JVM v6 & 7 will always return a pointer to the Java byte array, not a copy. The only issue with this method is that the code between GetPrimitiveArrayCritical() and its counterpart ReleasePrimitiveArrayCritical() runs under restrictions very similar to those of a critical section: it must execute fast, and it must not block. That rules out dequeuing the frame while holding the array: libvideo's dequeue_buffer() calls ioctl(VIDIOC_DQBUF), which blocks until a frame is available...
With this in mind, I came up with the following design:
  • Use mmap() provided memory buffers
  • Call GetPrimitiveArrayCritical()
  • Either copy the image into the Java byte array or convert the image directly into the Java byte array
  • Call ReleasePrimitiveArrayCritical()
This way, for each image, the code between Get/ReleasePrimitiveArrayCritical() does not block, and at most one copy or one conversion occurs.
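The four steps above boil down to a JNI native method shaped roughly like this. This is a non-compilable sketch: the Java class name, the `device` variable, and the dequeue_buffer()/enqueue_buffer()/convert_or_copy() signatures are hypothetical stand-ins for the actual v4l4j/libvideo names; only the JNI calls themselves are real.

```c
/* Sketch only: everything except the JNI calls is a hypothetical stand-in. */
JNIEXPORT jint JNICALL Java_v4l4j_FrameGrabber_fillFrame(JNIEnv *e,
                                                         jobject self,
                                                         jbyteArray dst) {
	/* Steps 1-2: the blocking dequeue (ioctl(VIDIOC_DQBUF) under the hood)
	 * happens BEFORE entering the critical section */
	void *frame = dequeue_buffer(device);

	/* Between Get and Release: no blocking calls, no other JNI calls */
	jbyte *array = (*e)->GetPrimitiveArrayCritical(e, dst, NULL);
	int len = convert_or_copy(frame, array);  /* one copy OR one conversion */
	(*e)->ReleasePrimitiveArrayCritical(e, dst, array, 0);

	enqueue_buffer(device);  /* hand the mmap'ed buffer back to the driver */
	return len;
}
```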

Friday, February 12, 2010

VIDIOC_REQBUFS behaviour

A few days ago, Daniel posted a message on the v4l4j mailing list with an issue I had not seen before: the test-gui application captures video fine the first time, but subsequent attempts fail. The only way to get the capture going again is by restarting the test application. The entire discussion can be found here.

Finding out what went wrong
Looking at the debug log generated by the test application, it is clear that everything works well for the first capture, but when the second is initiated, setting the capture parameters (namely the resolution and pixel format) fails, as indicated by this line in the log:
[v4l2-input.c:247 apply_image_format] CAP: palette 0x56595559 rejected
The requested capture parameters are handed to the driver using the VIDIOC_S_FMT ioctl, and this is precisely what causes the second capture to fail: the ioctl returns EBUSY (device or resource busy). At this point, I could not explain why this ioctl would work the first time but not the second.


Looking at the driver side 
I now needed to check what the driver does upon reception of a VIDIOC_S_FMT ioctl. I updated my mercurial copy of the v4l-dvb tree and took a look at uvc_v4l2.c. The function handling the VIDIOC_S_FMT ioctl is at line 244 (uvc_v4l2_set_format()). This function returns EBUSY if uvc_queue_allocated() != 0. A quick look at uvc_queue.c reveals that uvc_queue_allocated() returns 0 if no buffers have been allocated, or if all allocated buffers have been released...

There it is, the cause of the problem: no buffers must be allocated when VIDIOC_S_FMT is called. However, I could not remember reading anything about this in the V4L2 API specs when writing v4l4j... So I headed back to the V4L2 spec pages, specifically the one about VIDIOC_S_FMT, and realised I had overlooked this (self-explanatory) line:
When I/O is already in progress or the resource is not available for other reasons drivers return the EBUSY error code.
How to release buffers

Here again, I could not remember reading anything about this in the V4L2 specs at the time I wrote v4l4j. Buffers are requested using the VIDIOC_REQBUFS ioctl. Back in uvc_v4l2.c, it is clear from the code handling VIDIOC_REQBUFS (at line 867) that this ioctl can be called with a value of 0 for the number of buffers requested, and a quick look at the documentation for this ioctl (here) indicates that a value of 0 is perfectly acceptable... Weird, I really don't remember reading this; maybe it was added afterwards. To confirm whether it is something recent, I took a quick look at other drivers to see if they also accept a value of 0 when handling VIDIOC_REQBUFS. Interestingly, the pwc driver will silently ignore it and do nothing. The gspca driver will gladly accept a value of 0 and release existing buffers. On the other hand, the bttv and zr364xx drivers both use the V4L-generic function videobuf_reqbufs() (in videobuf-core.c) to do the buffer allocation, which fails (EINVAL) with a buffer count of 0. Different drivers clearly behave differently when handling this ioctl... I should (and will) discuss this on the V4L mailing list.


Does this solve Daniel's problem?
I now needed to confirm whether these findings would solve Daniel's issue. I quickly hacked up a simple capture application (here) which runs a video capture twice. Without releasing the buffers, the capture fails the second time (just like it did with the v4l4j test-gui app). And releasing the buffers does allow the second capture to proceed!

Next, implementing the fix in v4l4j
I am still working on this one. However, because some drivers appear to fail when asked to release buffers, I will probably end up adding code that releases allocated buffers when stopping an ongoing capture, and simply ignores the result.