Sunday, March 25, 2012

Changelog for v0.9.0

About a year after the previous release (v0.8.10), here comes a shiny new version of v4l4j: v0.9.0. This version packs a few bug fixes and a lot of improvements, which I have detailed below:

  • On the mailing list, Roberto reported that there was no way to switch the input used for capture while a capture was ongoing, other than stopping the current frame grabber and creating a new one. I have implemented a fix: it is now possible to change the current input while capturing. However, the time required to complete the switch is driver-dependent and, as Roberto reported, it can take a couple of frames for the switch to happen, and the switch might not happen between frames.
  • Leonardo reported a bug here due to v4l4j not properly handling string, integer64 and bitmask controls. This is now fixed.
  • Jay reported an error here that occurred when capturing with PAC_3711 devices. This is now fixed.
  • Many users have reported an issue where v4l4j fails to run because of undefined symbols. I tracked down the problem to some versions of GCC not parsing the command line in the same way as others, and ignoring the list of linked libraries.
  • Jeff reported a NullPointerException when stopping and starting the capture. This has been fixed.
  • v4l-utils has been updated to v0.8.6.
  • v4l4j uses PixFC v0.3 to speed up the following conversions in v4l-utils: YUV422 <-> RGB24, YUV422P <-> RGB24, YUV420P <-> RGB24.
  • JPEG conversions have also been optimised: v4l4j now only supplies downsampled YUV frames to libjpeg, and relies on PixFC for YUV downsampling and RGB-to-YUV conversions.
  • I have updated the Ubuntu packages to no longer depend specifically on OpenJDK, as requested on the mailing list.
  • Lastly, I have started testing v4l4j in qemu emulating an ARMv7l architecture. While v4l4j builds fine, I couldn't actually test interactions with a video device, so ARM support is only experimental at this stage, but I am actively working on it.
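The YUV422 <-> RGB24 conversions that PixFC accelerates boil down to per-pixel arithmetic along these lines. This is a minimal, unoptimised C sketch of packed YUYV to RGB24 using fixed-point BT.601 full-range coefficients; it only illustrates the work being vectorised, and is not PixFC's actual SSE code (the function names are mine):

```c
#include <stdint.h>

static uint8_t clamp_u8(int v) {
    return v < 0 ? 0 : (v > 255 ? 255 : (uint8_t)v);
}

/* Convert one YUYV macropixel (Y0 U Y1 V) into two RGB24 pixels,
 * using fixed-point (16.16) BT.601 full-range coefficients. */
void yuyv_pair_to_rgb24(const uint8_t yuyv[4], uint8_t rgb[6]) {
    int u = yuyv[1] - 128;
    int v = yuyv[3] - 128;
    for (int i = 0; i < 2; i++) {
        int y = yuyv[i * 2];
        rgb[i * 3 + 0] = clamp_u8(y + ((91881 * v) >> 16));                       /* R = Y + 1.402 V */
        rgb[i * 3 + 1] = clamp_u8(y - ((22554 * u) >> 16) - ((46802 * v) >> 16)); /* G = Y - 0.344 U - 0.714 V */
        rgb[i * 3 + 2] = clamp_u8(y + ((116130 * u) >> 16));                      /* B = Y + 1.772 U */
    }
}

/* Convert a whole YUYV frame (width must be even) to RGB24. */
void yuyv_to_rgb24(const uint8_t *src, uint8_t *dst, int width, int height) {
    for (int i = 0; i < width * height / 2; i++)
        yuyv_pair_to_rgb24(src + i * 4, dst + i * 6);
}
```

PixFC gets its speed-up by doing this same arithmetic on several pixels at once with SSE, but the scalar version above is what it is replacing.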
Thanks to Daniel for sending some sample code showing how to integrate v4l4j with the Monte Media library to capture video and save it to a file. More on this here.

That's it. v4l4j v0.9.0 can be downloaded from here. Ubuntu packages will be available shortly.

Friday, March 23, 2012

Performance results for libjpeg-turbo

Awesome performance gain with libjpeg-turbo! It performs significantly faster than the older libjpeg62 / libjpeg8 implementations. I used the JPEG test app in v4l4j; the conversion timings for both libjpeg62 and libjpeg-turbo8 are below.

The table below shows the average conversion times over 10 runs for a 1280x1024 image in various source formats. Tests were run in a VM running Ubuntu 12.04 Precise beta1, so the absolute conversion times do not mean much; what is interesting is the comparison between the two implementations. In my tests, the turbo implementation was 3x to 4x faster.
                   libjpeg-turbo8   libjpeg62
YUYV to JPEG          5.5 ms         21.5 ms
UYVY to JPEG          5.7 ms         21.1 ms
YVYU to JPEG          6.8 ms         22.0 ms
YUV420p to JPEG       3.8 ms         15.6 ms
ARGB to JPEG          4.6 ms         16.4 ms
RGB24 to JPEG         4.6 ms         16.4 ms
BGRA to JPEG          4.6 ms         16.4 ms
BGR24 to JPEG         4.9 ms         16.4 ms


Thursday, November 11, 2010

Improving frame buffer exchange

Ok, I thought it was time for a v4l4j update, and decided to improve the way image buffers are retrieved from libvideo, converted and passed to Java. I'll briefly explain how v4l4j currently does this, look at alternative approaches, and explain my choices.

Current design

During the call to init_capture(), libvideo maps a certain number of the driver's memory chunks into the process. v4l4j asks for 4 chunks, but the driver may decide to create more or fewer. These chunks hold the images captured from the video device. The mappings' addresses are decided by the system call, i.e. the memory chunks are not allocated by v4l4j but during the mmap() syscall.

Then, v4l4j creates as many ByteBuffer objects as there are mmap'ed buffers. Each ByteBuffer is backed by its own memory block used as its storage area.

When retrieving the next frame, libvideo's dequeue_buffer() returns a pointer to the mmap'ed buffer containing the latest frame. v4l4j passes it to its conversion routines along with a pointer to a destination buffer where the converted image will be stored; if no conversion is required, a simple copy is made. The destination buffer is the address of the memory backing the first ByteBuffer (ByteBuffers are then used in a round-robin fashion).

Lastly, the contents of the ByteBuffer object are copied into a Java byte array.

Looking back on it, this is hugely inefficient: there can be up to three copy operations plus a possible format conversion. This leaves a lot of room for improvement.

What can be changed ?

Let's start with the end result: from Java, what is the best object type to encapsulate the image data? When I started v4l4j, I was immediately attracted to the java.nio package and its ByteBuffer class (the one used in the current implementation), mostly because it is readily accessible from native code. However, it may not be the easiest type to deal with when trying to do anything useful with the image it holds. Most of the time, the ByteBuffer data has to be transferred to a byte array in order to be used, especially when trying to display the image in a GUI element. I believe a copy can be avoided if the data is made readily available in a Java byte[] instead of a ByteBuffer.

Next, let's see what the best way is to get the latest image into a byte array. The ideal option would be to have the byte arrays allocated in the Java code (so as to avoid a new allocation every time a frame arrives), then obtain a C pointer to the byte array and mmap() it into the driver's memory, so that new frames are placed directly into the Java byte array. A thorough read of the JNI documentation on arrays shows there are two ways to obtain a C pointer to a Java byte array: GetByteArrayElements() and GetPrimitiveArrayCritical().
  • The first one does not guarantee that the JVM will return a pointer to the actual Java array, i.e. it may copy the array into a C heap-allocated array and return a pointer to that copy, which is synchronised back later when ReleaseByteArrayElements() is called. Looking at the implementation of these methods in the OpenJDK HotSpot JVM v6 & 7 source code confirmed one thing: these JVM implementations do indeed copy the Java array into a C array. Always. Bad news!
  • This leaves us with GetPrimitiveArrayCritical(). Even with this method, the JVM may return a copy of the array, but, in the Javadoc's own words, it is "[...] more likely that the native code will obtain an uncopied version of the array". Looking at the JVM source code again confirmed that the OpenJDK JVM v6 & 7 will always return a pointer to the Java byte array, not a copy. However, the code between GetPrimitiveArrayCritical() and its counterpart ReleasePrimitiveArrayCritical() has restrictions very similar to those of a critical section: it must execute fast and it must not sleep. This is a problem, because if we want the driver to place bytes in our mmap()'ed byte array, we must call libvideo's dequeue_buffer(), which calls ioctl(VIDIOC_DQBUF), which blocks...
With this in mind, I came up with the following design:
  • Use mmap() provided memory buffers
  • Call GetPrimitiveArrayCritical()
  • Either copy the image into the Java byte array or convert the image directly into the Java byte array
  • Call ReleasePrimitiveArrayCritical()
This way, for each image, the code between Get/ReleasePrimitiveArrayCritical() does not sleep, and at most one copy or one conversion will occur.
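A plain-C simulation of that frame path might look like the sketch below. Stack buffers stand in for the mmap'ed driver buffer and the pinned Java array, and the JNI calls are only referenced in comments, since a self-contained sketch cannot carry a live JNIEnv; the function names are mine, not v4l4j's:

```c
#include <stddef.h>
#include <string.h>

/* Optional format conversion, e.g. YUYV -> RGB24; NULL means "raw copy". */
typedef void (*convert_fn)(const unsigned char *src, unsigned char *dst, size_t len);

/* Sketch of the new frame path. In v4l4j, `pinned` would come from
 * GetPrimitiveArrayCritical() on a Java byte[], and `mmap_buf` from
 * libvideo's dequeue_buffer(); plain pointers stand in for both here. */
void deliver_frame(const unsigned char *mmap_buf, unsigned char *pinned,
                   size_t len, convert_fn convert) {
    /* --- everything between GetPrimitiveArrayCritical() and
     * ReleasePrimitiveArrayCritical() must be fast and non-blocking:
     * at most one copy OR one conversion, no ioctl()s. --- */
    if (convert)
        convert(mmap_buf, pinned, len);   /* convert straight into the Java array */
    else
        memcpy(pinned, mmap_buf, len);    /* raw copy into the Java array */
}
```

The key point the sketch captures is that the blocking dequeue happens before the array is pinned, so the critical region only ever contains the single copy or conversion.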

Friday, February 12, 2010

VIDIOC_REQBUFS behaviour

A few days ago, Daniel posted a message on the v4l4j mailing list with an issue I had not seen before: the test-gui application captures video fine the first time, but subsequent attempts fail. The only way to get the capture going again is by restarting the test application. The entire discussion can be found here.

Finding out what went wrong
Looking at the debug log generated by the test application, it is clear that everything works well for the first capture, but when the second is initiated, setting the capture parameters (namely the resolution and pixel format) fails, as indicated by this line in the log:
[v4l2-input.c:247 apply_image_format] CAP: palette 0x56595559 rejected
The requested capture parameters are given to the driver using the VIDIOC_S_FMT ioctl, and this is precisely what causes the second capture to fail: the ioctl call returns EBUSY (Device or resource busy). At this point, I could not explain why this ioctl would work the first time but not the second.


Looking at the driver side 
I now need to check what the driver does upon reception of a VIDIOC_S_FMT ioctl. I updated my mercurial copy of the v4l-dvb tree and took a look at uvc-v4l2.c. The function handling the VIDIOC_S_FMT ioctl is at line 244 (uvc_v4l2_set_format()). This function returns EBUSY if uvc_queue_allocated() != 0. A quick look at uvc_queue.c reveals that uvc_queue_allocated() returns 0 if no buffers have been allocated, or if all allocated buffers have been released...

There it is, that's the solution to the problem: making sure no buffers are allocated before calling VIDIOC_S_FMT. However, I could not remember reading anything about this in the V4L2 API specs when writing v4l4j... So I headed back to the V4L2 spec pages, specifically the one about VIDIOC_S_FMT, and realised that I had overlooked this (self-explanatory) line:
When I/O is already in progress or the resource is not available for other reasons drivers return the EBUSY error code.
How to release buffers

Here again, I could not remember reading anything about this in the V4L2 specs at the time I wrote v4l4j. Buffers are requested using the VIDIOC_REQBUFS ioctl. Back in uvc_v4l2.c, it is clear from the code handling VIDIOC_REQBUFS (at line 867) that this ioctl can be called with a value of 0 for the number of buffers requested. A quick look at the documentation for this ioctl (here) indicates that a value of 0 is perfectly acceptable... Weird, I really don't remember reading this; maybe it was added afterwards.

To confirm whether it is something recent, I decided to take a quick look at other drivers to see if they also accept a value of 0 when calling VIDIOC_REQBUFS. Interestingly, the pwc driver will silently ignore it and do nothing. The gspca driver will gladly accept a value of 0 and release existing buffers. On the other hand, the bttv and zr364xx drivers both use the V4L-generic function videobuf_reqbufs() (in videobuf-core.c) to do the buffer allocation, which will fail (EINVAL) with a buffer count of 0. Different drivers clearly behave differently when handling this ioctl... I should (and will) discuss that on the V4L mailing list.


Does this solve Daniel's problem?
I now needed to confirm whether these findings would solve Daniel's issue. I quickly hacked up a simple capture application (here) which runs a video capture twice. Without releasing the buffers, the capture fails the second time (just like it did with the v4l4j test-gui app). Releasing the buffers does allow the second capture to succeed!

Next, implementing the fix in v4l4j
I am still working on this one. However, because some drivers fail when asked to release buffers, I will probably end up adding code that releases allocated buffers when stopping an ongoing capture and ignores the result.

Saturday, October 3, 2009

Ideas for future work on v4l4j

These are some of the ideas I have come up with to add functionality to v4l4j and achieve better integration with other projects:
  • Review and perhaps improve the way frames are handed to the user. For now, ByteBuffers are used, but they involve a memory copy operation, at least to transfer the contents of the ByteBuffer to a byte array which can then be passed on to Swing objects for display. This copy might not be necessary when streaming video over the network, for instance (I believe it is possible to send the contents of a ByteBuffer to a socket without an extra copy, but I have not tested this). Also worth investigating is whether implementing a push model would be useful.
  • Add support for the MPEG image format: to achieve this, libvideo must first be modified to return not just a pointer to the frame buffer, but most of the fields present in struct v4l2_buffer, so as to report timecode, frame type, ... The reason I haven't implemented things this way originally is that MPEG is only available in v4l2, and I wanted to investigate further how to emulate (or fake) the fields that are present in v4l2 but missing in v4l1. However, with a bit more experience now, and after making libvideo fake a lot of missing things so it behaves like v4l2 in many ways, I believe it shouldn't be too hard to implement.
  • Integration with FMJ: FMJ is an open-source implementation of the (now dead) JMF (Java Media Framework). FMJ is API-compatible with JMF, so applications written for JMF can (in theory; not tested) work with FMJ without modification. Alexandra has submitted some sample code to quickly create a JMF DataSource out of a v4l4j VideoDevice, but it needs some refactoring.
  • Since I am currently deeply involved with DirectShow at work, I can't help thinking about making v4l4j use DirectShow to capture video streams. Of course, the v4l part in v4l4j won't be too relevant there, but I think a project rename is acceptable if it means supporting additional platforms.

This is all I can think of for now. As always, comments (on the mailing list, please) are welcome.

Monday, August 31, 2009

Changelog for v0.8.6

v4l4j 0.8.6 packs in support for a brand new feature: dealing with capture frame rate.

With this version, an application can dynamically find out whether a video device supports adjustable capture frame rates, enumerate them and set an appropriate one for capture.
Of course, this feature is only available if the driver and video device support it. I have run a few tests with the hardware I have at hand and managed to achieve capture rates of up to 30 frames per second! I have listed my results on this wiki page.

First post

I have just set up this blog about a project I have been running for a while now: v4l4j. In short, the goal of v4l4j is to bring video capture capability and video device control to Java. It is an API through which a Java application can:
  • enumerate capabilities of a video device, such as the supported image format, resolution and frame rate,
  • setup a video device and capture images from it,
  • enumerate video controls and act on them.
A lot more information is available on the website.

With this blog, I am hoping to post design ideas, future directions and improvements, as well as keep a changelog of updates to v4l4j.