Simple webcam access from C

06 Feb 2021 - tsp
Last update 06 Feb 2021
Reading time 26 mins

Video for Linux 2 (V4L2) API

The API used on most Unixoid operating systems (e.g. Linux, FreeBSD, etc.) is Video4Linux. It basically consists of a specification for device naming (i.e. the /dev/videoN devices) as well as the data structures and operations used to configure and capture from those devices.

These are realized using the standard Unix read / write and ioctl APIs as usual. V4L does not only support webcams but also tuners, video capture cards, satellite receivers, etc. - this page only focuses on cameras, though most of the operations are the same for other video capture devices.

The specification for V4L2 can be found online.

For webcams there are three different methods that can be used to read or stream frames from the camera:

- The classic read/write interface that fetches one frame per call
- Streaming I/O using memory mapped buffers (mmap)
- Streaming I/O using user supplied buffers (userptr)

To my knowledge USB webcams currently only support the mmap streaming mode, so this is what this blog post will look into first. Note that the V4L2 specification does not declare any of these interfaces mandatory, so for a truly portable application it would be a good idea to support both streaming methods as well as the method based on read/write.

Header files used

All Video4Linux2 methods and data types are defined in a single header file that's usually found at linux/videodev2.h.
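
A minimal set of includes that should cover the samples below looks like the following sketch (on FreeBSD the same header is provided under the same path by the multimedia/v4l_compat port):

	#include <stdio.h>
	#include <stdlib.h>
	#include <string.h>
	#include <stdbool.h>
	#include <errno.h>

	#include <fcntl.h>     /* open, O_RDWR, O_NONBLOCK */
	#include <unistd.h>    /* close */
	#include <sys/stat.h>  /* stat, S_ISCHR */
	#include <sys/ioctl.h> /* ioctl */
	#include <sys/mman.h>  /* mmap, munmap */

	#include <linux/videodev2.h> /* V4L2 data types and ioctl requests */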

Getting the frames

Opening the device

The first thing is obviously opening the device file. The naming is specified by the Video4Linux specification, but it's a good idea to allow the user to override the path anyways - since one usually has to support systems with multiple capture devices, the code has to handle arbitrary device names in any case.

The devices are usually named /dev/videoN, i.e. /dev/video0 for the first capture device, /dev/video1 for the second and so on.

Before one opens the device it’s a good idea to check if the file exists and is really a device file:

	enum cameraError deviceOpen(
	    int* lpDeviceOut,
	    char* deviceName
	) {
	    struct stat st;
	    int hHandle;

	    if(lpDeviceOut == NULL) { return cameraE_InvalidParam; }
	    (*lpDeviceOut) = -1;
	    if(deviceName == NULL) { return cameraE_InvalidParam; }

	    /* Check if the device exists */
	    if (stat(deviceName, &st) == -1) {
	        return cameraE_UnknownDevice;
	    }

	    /* Check if it's a device file */
	    if (!S_ISCHR (st.st_mode)) {
	        return cameraE_UnknownDevice;
	    }

	    hHandle = open(deviceName, O_RDWR | O_NONBLOCK, 0);
	    if(hHandle < 0) {
	        switch(errno) {
	            case EACCES:    return cameraE_PermissionDenied;
	            case EPERM:     return cameraE_PermissionDenied;
	            default:        return cameraE_Failed;
	        }
	    }

	    (*lpDeviceOut) = hHandle;
	    return cameraE_Ok;
	}

Since we opened the device using open, we also have to close it in the end using close:

	enum cameraError deviceClose(
	    int hHandle
	) {
	    close(hHandle);
	    return cameraE_Ok;
	}
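
Putting both functions together, a typical call sequence might look like this sketch (the path /dev/video0 is just the usual first device, see above):

	int hCamera;
	enum cameraError e;

	/* Open the first capture device; allow the user to override this path */
	e = deviceOpen(&hCamera, "/dev/video0");
	if(e != cameraE_Ok) {
		printf("Failed to open camera (error %d)\n", (int)e);
		return 1;
	}

	/* ... query capabilities, configure, stream ... */

	deviceClose(hCamera);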

Querying capabilities

The next step is to query the capabilities of the opened device. This is done via the VIDIOC_QUERYCAP ioctl. This call fills a struct v4l2_capability structure. This structure contains:

- driver: the name of the driver in use
- card: the name of the device
- bus_info: the location of the device (for example the USB topology)
- version: the driver version
- capabilities: the capability flags of the device

The most important field is the capabilities field. This can be checked against some interesting flags:

- V4L2_CAP_VIDEO_CAPTURE indicates that the device supports video capture
- V4L2_CAP_READWRITE indicates support for the read/write interface
- V4L2_CAP_STREAMING indicates support for streaming I/O (mmap or userptr)

The first thing to check for when capturing from a webcam or video camera is that the device really supports V4L2_CAP_VIDEO_CAPTURE and either V4L2_CAP_READWRITE for single frame capture or V4L2_CAP_STREAMING for the mmap or userptr modes.

Since ioctl calls can be interrupted, which is indicated by an EINTR error code, libraries usually supply an xioctl method that retries the ioctl until it either succeeds or fails for a different reason:

	static int xioctl(int fh, int request, void *arg) {
		int r;
		do {
			r = ioctl(fh, request, arg);
		} while ((r == -1) && (errno == EINTR));
		return r;
	}

To fetch the capability flags one simply uses this xioctl method and checks for the required flags:

	struct v4l2_capability cap;
	bool bReadWriteSupported = false;
	bool bStreamingSupported = false;

	if(xioctl(hHandle, VIDIOC_QUERYCAP, &cap) == -1) {
		return cameraE_Failed; /* Failed to fetch capabilities */
	}

	if((cap.capabilities & V4L2_CAP_VIDEO_CAPTURE) == 0) {
		return cameraE_InvalidParam; /* We are not a capture device */
	}

	if((cap.capabilities & V4L2_CAP_READWRITE) != 0) { bReadWriteSupported = true; }
	if((cap.capabilities & V4L2_CAP_STREAMING) != 0) { bStreamingSupported = true; }

The next step is to query the cropping capabilities and the pixel aspect. This is done using the VIDIOC_CROPCAP call. This call requires a pointer to a struct v4l2_cropcap to be filled, initialized to the requested stream type. Since the task of this blog post is to describe video capture, the buffer type will be V4L2_BUF_TYPE_VIDEO_CAPTURE.

Now one can simply call the driver:

	struct v4l2_cropcap cropcap;

	memset(&cropcap, 0, sizeof(cropcap));
	cropcap.type = V4L2_BUF_TYPE_VIDEO_CAPTURE;

	if(xioctl(hHandle, VIDIOC_CROPCAP, &cropcap) == -1) {
		/*
			Failed to fetch the crop capabilities. Note that some
			applications simply ignore this error and just don't set
			any cropping rectangle later on since there are drivers
			that don't support cropping.
		*/
		return cameraE_Failed;
	}

The v4l2_cropcap structure contains three interesting members:

- bounds: the outer limits within which capture can happen
- defrect: the default cropping rectangle, which should usually cover the whole image
- pixelaspect: the pixel aspect ratio as a fraction

Each rect contains left, top, width and height.

Initializing device

Setting cropping region

After querying one can initialize cropping - for example to the default cropping rectangle, which should usually cover the whole image. This is done using the VIDIOC_S_CROP call supplying a struct v4l2_crop. Usually this should not be required, but since there are drivers that do not initialize themselves with the default cropping rectangle, it's a good idea anyways. The structure basically only contains the stream type and a cropping rectangle c.

	struct v4l2_crop crop;
	/*
		Note that this should only be done if VIDIOC_CROPCAP was successful
	*/
	crop.type = V4L2_BUF_TYPE_VIDEO_CAPTURE;
	crop.c = cropcap.defrect;

	if(xioctl(hHandle, VIDIOC_S_CROP, &crop) == -1) {
		/* Failed. Maybe only not supported (EINVAL) */
	}

Format negotiation

To be able to negotiate a format one should usually query the formats supported by the device to locate one supported by the application. The code sample accompanying this blog post does not perform this negotiation but simply assumes that the webcam supports the YUYV color format at a resolution of at least 640x480 to keep the code easier to read. But I'll cover the format negotiation here - it's rather simple.

The first thing one has to know is that there are two major basic representations for colors in use:

- Luma and chroma based models (e.g. YUV and its variants) that separate brightness from color information
- RGB based models that store the red, green and blue intensity for each pixel

The main advantage of luma and chroma based models is that one immediately has a grayscale image available when just looking at the luma channel. This is also how these encoding schemes emerged historically - YUV models just added two subcarrier encoded chroma channels to transmit color information in addition to the backwards compatible grayscale image for TV usage.

RGB models on the other side are usually easier to use on modern input and output devices.

All color models basically carry the same information but, depending on their encoding, support different resolutions and scales. Nearly all models allow one to add an optional alpha channel that describes transparency. Since we're interested in video capture, alpha channels usually don't play a role.

The major difference between the color formats is the way they encode the data. Again there are two major encoding methods:

- Packed formats that store all channels for a pixel (or a group of pixels) next to each other in memory
- Planar formats that store each channel in a separate contiguous plane

Depending on the chosen format the information for each channel may be of the same amount or there may be different amounts of information per pixel. For the commonly used YUYV format (that's also selected by the example and is often called YUV422), there are two luminance values for each pair of pixels but only one U and one V coordinate shared by both. The idea is that the human eye is more sensitive to luminance changes than to chroma changes, so one can get away with encoding far less chromatic information. These four values then occupy - for YUYV - four bytes in a specific pattern that has to be decoded.
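
To make the packing concrete, here is a small sketch of how one would address a single macropixel (two pixels) in a YUYV frame; lpFrame, width, x and y are hypothetical names and the buffer is assumed to be unpadded (real drivers report the stride in bytesperline):

	/*
		YUYV stores each pair of horizontally adjacent pixels in
		4 bytes ordered Y0 U Y1 V, i.e. 2 bytes per pixel.
	*/
	const unsigned char* lpPair = &lpFrame[(y * width + x) * 2]; /* x has to be even */

	unsigned char y0 = lpPair[0]; /* Luminance of the left pixel */
	unsigned char u  = lpPair[1]; /* Blue difference chroma, shared by both pixels */
	unsigned char y1 = lpPair[2]; /* Luminance of the right pixel */
	unsigned char v  = lpPair[3]; /* Red difference chroma, shared by both pixels */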

There is a huge number of supported formats - the usual way to handle this inside media processing libraries is to decide on one or two internally supported formats and to decode as well as re-encode at the application boundaries. For example I personally usually decide to support:

For more specialized algorithms I personally also use:

To determine which formats a capture device supports one can use the VIDIOC_ENUM_FMT call. This is built around the struct v4l2_fmtdesc structure:

	struct v4l2_fmtdesc {
		__u32              index;
		enum v4l2_buf_type type;
		__u32              flags;
		__u8               description[32];
		__u32              pixelformat;
		__u32              reserved[4];
	};

The basic idea is that the application fills just the index and type fields and calls VIDIOC_ENUM_FMT; the driver then fills the remaining fields with the available information. To query our capture device one iterates the index value from 0 upwards until the driver fails with an error code of EINVAL. The type has to be set to V4L2_BUF_TYPE_VIDEO_CAPTURE:

	for(int idx = 0;; idx = idx + 1) {
		struct v4l2_fmtdesc fmt;

		memset(&fmt, 0, sizeof(fmt));
		fmt.type = V4L2_BUF_TYPE_VIDEO_CAPTURE;
		fmt.index = idx;

		if(xioctl(hHandle, VIDIOC_ENUM_FMT, &fmt) < 0) {
			/* Failed; EINVAL marks the end of the list, anything else is a real error */
			break;
		}

		/* We got some format information. For demo purposes just display it */
		printf("Detected format %08x (is compressed: %s): %s\n", fmt.pixelformat, ((fmt.flags & V4L2_FMT_FLAG_COMPRESSED) != 0) ? "yes" : "no", fmt.description);
	}

Setting the format

The next step is setting the desired format. There are three calls involved with setting, trying or getting the format:

- VIDIOC_S_FMT to set the format
- VIDIOC_TRY_FMT to test a format without changing the driver's state
- VIDIOC_G_FMT to query the currently configured format
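
Before committing to a configuration one can probe it using VIDIOC_TRY_FMT - a minimal sketch using the same parameters as the example further below:

	struct v4l2_format tryFmt;

	memset(&tryFmt, 0, sizeof(tryFmt));
	tryFmt.type = V4L2_BUF_TYPE_VIDEO_CAPTURE;
	tryFmt.fmt.pix.width = 640;
	tryFmt.fmt.pix.height = 480;
	tryFmt.fmt.pix.pixelformat = V4L2_PIX_FMT_YUYV;
	tryFmt.fmt.pix.field = V4L2_FIELD_INTERLACED;

	if(xioctl(hHandle, VIDIOC_TRY_FMT, &tryFmt) == -1) {
		/* The driver cannot satisfy this configuration at all */
	} else {
		/* The driver has adjusted the fields to the closest supported values */
	}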

Setting the format usually requires negotiation, but most webcams support the YUYV color format and an interlaced pixel layout. This can be set using a struct v4l2_format:

	struct v4l2_format fmt;
	unsigned int width, height;

	/*
		Select 640 x 480 resolution (one should use dimensions
		as previously determined while setting the cropping
		parameters), YUYV color format and interlaced field order
	*/
	memset(&fmt, 0, sizeof(fmt));
	fmt.type = V4L2_BUF_TYPE_VIDEO_CAPTURE;
	fmt.fmt.pix.width = 640;
	fmt.fmt.pix.height = 480;
	fmt.fmt.pix.pixelformat = V4L2_PIX_FMT_YUYV;
	fmt.fmt.pix.field = V4L2_FIELD_INTERLACED;

	if(xioctl(hHandle, VIDIOC_S_FMT, &fmt) == -1) {
		/* Failed to set format ... */
	}

	/* Now one should query the real size ... */
	width = fmt.fmt.pix.width;
	height = fmt.fmt.pix.height;

In some code like v4l2grab there is additional handling for buggy drivers. Since webcams are usually cheap products, some drivers return bogus values, so on Linux such code checks that fmt.fmt.pix.bytesperline is at least two times fmt.fmt.pix.width and that fmt.fmt.pix.sizeimage is at least 2 * fmt.fmt.pix.width * fmt.fmt.pix.height (two bytes per pixel in YUYV).
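
A sketch of such a sanity check for the 2 bytes per pixel YUYV format - this mirrors the workaround used by code like v4l2grab and is not something mandated by the specification:

	/*
		Workaround for buggy drivers: YUYV occupies 2 bytes per
		pixel, so enforce sane minimum values for the line stride
		and the total image size.
	*/
	unsigned int minStride = fmt.fmt.pix.width * 2;
	if(fmt.fmt.pix.bytesperline < minStride) {
		fmt.fmt.pix.bytesperline = minStride;
	}
	if(fmt.fmt.pix.sizeimage < fmt.fmt.pix.bytesperline * fmt.fmt.pix.height) {
		fmt.fmt.pix.sizeimage = fmt.fmt.pix.bytesperline * fmt.fmt.pix.height;
	}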

Capturing frames

Streaming I/O using mmap

The interface supported by most webcams is streaming I/O using memory mapped buffers. This has been the most efficient streaming method for a long time - it allows an application to map device memory areas (for example memory contained on a PCI capture card) directly into the application's address space. Later on a second method using userptr was added that also allows one to exploit DMA transfers directly into main memory on devices supporting busmastering. For cheap USB webcams this usually doesn't make a difference though, and the userptr streaming I/O mode is usually not supported by this hardware anyways.

Note that there is no way for a driver to indicate which streaming methods it supports other than failing when one requests the allocation of buffers.

The basic idea is:

- Request a number of buffers from the driver (VIDIOC_REQBUFS)
- Query each buffer's offset and length (VIDIOC_QUERYBUF) and map it into the application's address space using mmap
- Enqueue all buffers on the incoming queue (VIDIOC_QBUF)
- Start streaming (VIDIOC_STREAMON)
- In a loop: dequeue a filled buffer (VIDIOC_DQBUF), process or copy the frame and enqueue the buffer again (VIDIOC_QBUF)
- Stop streaming (VIDIOC_STREAMOFF) and unmap the buffers

There is a common structure used by the queue and dequeue operations that's called struct v4l2_buffer. This structure contains:

- index: the index of the buffer
- type: the stream type (V4L2_BUF_TYPE_VIDEO_CAPTURE in our case)
- memory: the streaming method (V4L2_MEMORY_MMAP in our case)
- m.offset: the offset to pass to mmap for this buffer
- length: the size of the buffer in bytes
- bytesused: the number of bytes actually filled with image data
- flags and timestamp information

As shown in the outline above the first step is to request buffers from the device driver. One can request multiple buffers - the driver itself determines the lower (!) and upper bound on the number of buffers. It's a good idea to support a variable number of buffers in case the driver decides to use more or fewer buffers than requested.

To request buffers one uses the VIDIOC_REQBUFS ioctl, which on the driver side resembles the callback int (*vidioc_reqbufs)(struct file *file, void *private_data, struct v4l2_requestbuffers *req);

The struct v4l2_requestbuffers structure contains:

- count: the number of buffers requested; the driver adjusts this to the number actually granted
- type: the stream type (V4L2_BUF_TYPE_VIDEO_CAPTURE)
- memory: the transfer method, either V4L2_MEMORY_MMAP or V4L2_MEMORY_USERPTR

If the driver does not support mmap (or userptr, if that mode has been requested) it fails with EINVAL. This is the only way to determine the supported streaming transfer modes.
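
Since this EINVAL is the only signal available, a portable application can simply probe the modes one after another - a minimal sketch:

	struct v4l2_requestbuffers probe;

	memset(&probe, 0, sizeof(probe));
	probe.count = 3;
	probe.type = V4L2_BUF_TYPE_VIDEO_CAPTURE;
	probe.memory = V4L2_MEMORY_MMAP;

	if(xioctl(hHandle, VIDIOC_REQBUFS, &probe) == -1) {
		if(errno == EINVAL) {
			/*
				mmap streaming is not supported; retry with
				V4L2_MEMORY_USERPTR or fall back to read/write
			*/
		}
	}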

	struct v4l2_requestbuffers rqBuffers;

	/*
		Request bufferCount buffers (a single buffer is simple
		but not seamless; usually one uses 3 or more) ...
	*/
	memset(&rqBuffers, 0, sizeof(rqBuffers));
	rqBuffers.count = bufferCount;
	rqBuffers.type = V4L2_BUF_TYPE_VIDEO_CAPTURE;
	rqBuffers.memory = V4L2_MEMORY_MMAP;

	if(xioctl(hHandle, VIDIOC_REQBUFS, &rqBuffers) == -1) {
		printf("%s:%u Requesting buffers failed!\n", __FILE__, __LINE__);
		deviceClose(hHandle);
		return 2;
	}

	/* The driver may have adjusted the number of buffers */
	bufferCount = rqBuffers.count;

After the buffers have been requested they have to be mapped into our address space. To do so one queries each buffer with VIDIOC_QUERYBUF to determine the parameters that are passed to mmap, in the same way as when mapping a memory mapped file. On entry into VIDIOC_QUERYBUF one just has to set the type, memory and index fields.
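
Note that struct imageBuffer used below is not a V4L2 type but a small bookkeeping structure defined by the application - something along the lines of:

	struct imageBuffer {
		void*  lpBase; /* Start of the mapped buffer in our address space */
		size_t sLen;   /* Length of the mapping in bytes */
	};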

	struct imageBuffer* lpBuffers;
	{
		lpBuffers = calloc(bufferCount, sizeof(struct imageBuffer));
		if(lpBuffers == NULL) {
			printf("%s:%u Out of memory\n", __FILE__, __LINE__);
			deviceClose(hHandle);
			return 2;
		}

		int iBuf;
		for(iBuf = 0; iBuf < bufferCount; iBuf = iBuf + 1) {
			struct v4l2_buffer vBuffer;

			memset(&vBuffer, 0, sizeof(struct v4l2_buffer));

			/*
				Query a buffer identifying magic cookie from the driver
			*/
			vBuffer.type = V4L2_BUF_TYPE_VIDEO_CAPTURE;
			vBuffer.memory = V4L2_MEMORY_MMAP;
			vBuffer.index = iBuf;

			if(xioctl(hHandle, VIDIOC_QUERYBUF, &vBuffer) == -1) {
				printf("%s:%u Failed to query buffer %d\n", __FILE__, __LINE__, iBuf);
				deviceClose(hHandle);
				return 2;
			}

			/*
				Use the mmap syscall to map the drivers buffer into our
				address space at an arbitrary location.
			*/
			lpBuffers[iBuf].lpBase = mmap(NULL, vBuffer.length, PROT_READ|PROT_WRITE, MAP_SHARED, hHandle, vBuffer.m.offset);
			lpBuffers[iBuf].sLen = vBuffer.length;

			if(lpBuffers[iBuf].lpBase == MAP_FAILED) {
				printf("%s:%u Failed to map buffer %d\n", __FILE__, __LINE__, iBuf);
				deviceClose(hHandle);
				return 2;
			}
		}
	}

Then one has to enqueue all buffers that one wants to provide to the driver (typically all of them before starting the processing loop) using the VIDIOC_QBUF function. When using memory mapped buffers one just has to supply the type, memory and index fields.

	{
		int iBuf;
		for(iBuf = 0; iBuf < bufferCount; iBuf = iBuf + 1) {
			struct v4l2_buffer buf;
			memset(&buf, 0, sizeof(struct v4l2_buffer));

			buf.type = V4L2_BUF_TYPE_VIDEO_CAPTURE;
			buf.memory = V4L2_MEMORY_MMAP;
			buf.index = iBuf;

			if(xioctl(hHandle, VIDIOC_QBUF, &buf) == -1) {
				printf("%s:%u Queueing buffer %d failed ...\n", __FILE__, __LINE__, iBuf);
				deviceClose(hHandle);
				return 2;
			}
		}
	}

Whenever the device is ready the processing loop uses VIDIOC_DQBUF to pop the oldest filled buffer from the outgoing queue. By default this is a blocking call, but since the device has been opened with O_NONBLOCK it can also be driven by the standard select, poll/epoll or kqueue asynchronous processing functions. Usually one wants to re-enqueue the buffer after having finished processing or after having copied the data for further processing.
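
The loop below waits for frames using kqueue since the accompanying sample targets FreeBSD; on Linux one would use select, poll or epoll instead. The descriptor kq is assumed to have been created beforehand, roughly like in this sketch:

	#include <sys/types.h>
	#include <sys/event.h>
	#include <sys/time.h>

	int kq;
	struct kevent kev;

	kq = kqueue();
	if(kq < 0) {
		printf("%s:%u Failed to create kqueue\n", __FILE__, __LINE__);
		deviceClose(hHandle);
		return 2;
	}

	/* Watch the camera handle for readability (i.e. a filled buffer) */
	EV_SET(&kev, hHandle, EVFILT_READ, EV_ADD | EV_ENABLE, 0, 0, NULL);
	if(kevent(kq, &kev, 1, NULL, 0, NULL) < 0) {
		printf("%s:%u Failed to register kevent\n", __FILE__, __LINE__);
		deviceClose(hHandle);
		return 2;
	}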

	int iFrames = 0;
	while(iFrames < numFrames) {
		struct kevent kev;
		struct v4l2_buffer buf;

		int r = kevent(kq, NULL, 0, &kev, 1, NULL);
		if(r < 0) {
			printf("%s:%u kevent failed\n", __FILE__, __LINE__);
			deviceClose(hHandle);
			return 2;
		}

		if(r > 0) {
			/* We got our frame or EOF ... try to dequeue */
			memset(&buf, 0, sizeof(struct v4l2_buffer));

			buf.type = V4L2_BUF_TYPE_VIDEO_CAPTURE;
			buf.memory = V4L2_MEMORY_MMAP;

			if(xioctl(hHandle, VIDIOC_DQBUF, &buf) == -1) {
				if(errno == EAGAIN) { continue; }

				printf("%s:%u DQBUF failed\n", __FILE__, __LINE__);
				deviceClose(hHandle);
				return 2;
			}

			printf("%s:%u Dequeued buffer %d\n", __FILE__, __LINE__, buf.index);

			/* ToDo: Process image ... */

			/* Re-enqueue */
			if(xioctl(hHandle, VIDIOC_QBUF, &buf) == -1) {
				printf("%s:%u Queueing buffer %d failed ...\n", __FILE__, __LINE__, buf.index);
				deviceClose(hHandle);
				return 2;
			}

			iFrames = iFrames + 1;
		}
	}

The last two important functions start and stop the stream processing. These are VIDIOC_STREAMON and VIDIOC_STREAMOFF. Of course one should start streaming before running the event processing loop.

	{
		/* Enable streaming */
		enum v4l2_buf_type type;

		type = V4L2_BUF_TYPE_VIDEO_CAPTURE;

		if(xioctl(hHandle, VIDIOC_STREAMON, &type) == -1) {
			printf("%s:%u Stream on failed\n", __FILE__, __LINE__);
			deviceClose(hHandle);
			return 2;
		}
	}
	{
		/* Disable streaming */
		enum v4l2_buf_type type;

		type = V4L2_BUF_TYPE_VIDEO_CAPTURE;

		if(xioctl(hHandle, VIDIOC_STREAMOFF, &type) == -1) {
			printf("%s:%u Stream off failed\n", __FILE__, __LINE__);
			deviceClose(hHandle);
			return 2;
		}
	}
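
After streaming has been disabled the mapped buffers should be released again before closing the device - a minimal cleanup sketch matching the allocations above:

	{
		/* Unmap all buffers, release the bookkeeping array, close the device */
		int iBuf;
		for(iBuf = 0; iBuf < bufferCount; iBuf = iBuf + 1) {
			if(munmap(lpBuffers[iBuf].lpBase, lpBuffers[iBuf].sLen) == -1) {
				printf("%s:%u Failed to unmap buffer %d\n", __FILE__, __LINE__, iBuf);
			}
		}
		free(lpBuffers);
		deviceClose(hHandle);
	}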

Using read/write interface

The usage of the read/write interface will (hopefully) be added in the near future. Note that it's usually not supported by webcams on FreeBSD anyways.

Writing frames into a JPEG file using libjpeg

The process of writing a raw image into a JPEG file has been discussed in a previous blog post. The major remaining task is to convert the captured image into the format accepted by libjpeg. In my application I had to convert the YUYV (YUV422) format into RGB888. In YUYV there are always two luminance values as well as a single set of chroma values per sample - two pixels share the chroma values but have different luminance values.
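
The following sketch shows such a conversion using a common integer approximation of the ITU-R BT.601 transform. It assumes an unpadded YUYV input buffer and a caller supplied RGB888 output buffer of width * height * 3 bytes; the function and variable names are mine, not part of any API:

	static unsigned char clampByte(int v) {
		return (v < 0) ? 0 : ((v > 255) ? 255 : (unsigned char)v);
	}

	/* Converts a single YUYV frame into RGB888 */
	static void convertYUYVtoRGB888(
		const unsigned char* lpYUYV,
		unsigned char* lpRGB,
		unsigned int width,
		unsigned int height
	) {
		unsigned int i;
		for(i = 0; i < width * height / 2; i = i + 1) {
			/* One macropixel: two luminance values, one shared chroma pair */
			int y0 = lpYUYV[i*4 + 0];
			int u  = lpYUYV[i*4 + 1] - 128;
			int y1 = lpYUYV[i*4 + 2];
			int v  = lpYUYV[i*4 + 3] - 128;

			lpRGB[i*6 + 0] = clampByte(y0 + ((351 * v) >> 8));          /* R */
			lpRGB[i*6 + 1] = clampByte(y0 - ((179 * v + 86 * u) >> 8)); /* G */
			lpRGB[i*6 + 2] = clampByte(y0 + ((443 * u) >> 8));          /* B */

			lpRGB[i*6 + 3] = clampByte(y1 + ((351 * v) >> 8));
			lpRGB[i*6 + 4] = clampByte(y1 - ((179 * v + 86 * u) >> 8));
			lpRGB[i*6 + 5] = clampByte(y1 + ((443 * u) >> 8));
		}
	}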

Simple sample (FreeBSD, streaming mmap)


This article is tagged: Programming, ANSI C, Tutorial

