
On-line Web Application using Image Segmentation

Xavier Marichal

Laboratoire de Télécommunications et Télédétection, Université catholique de Louvain

B-1348 Louvain-la-Neuve, Belgium

Marichal@tele.ucl.ac.be

Abstract

The present paper introduces a simple and efficient scheme to perform real-time software segmentation. Based on a combination of change detection masks, it separates all moving objects from the background. It is currently used to provide web servers with segmented content.

1. Introduction

“The identification, access and representation of digital image and video information is becoming an integral part of many interactive multimedia applications today. The emerging MPEG-4 and MPEG-7 standards will allow for content-based video coding and representation and content-based visual query in image and video. Various other applications, such as editing and manipulation of video sequences, video surveillance, or image and video indexing and retrieval applications, are equally dependent on the availability of sophisticated algorithms for content identification, content segmentation and content description.” [1]

Although the design of suitable fully automatic algorithms, in particular for image sequence segmentation, is still considered an unsolved problem as a whole, the present paper introduces a simple yet robust segmentation algorithm that runs in real time on a Linux platform and provides a Web server with segmented content.

2. Motivation

Casterman S.A. (http://www.casterman.com) is a Belgian publisher involved in many fields of publication. Among other activities, Casterman is one of the major publishers of comics, including, for instance, the famous Tintin (http://www.tintin.be). The company is naturally seeking to extend its activities towards new technologies and is therefore bringing some of its comics to the Web. The virtual city of Urbicande was thus born (http://www.urbicande.be): it invites fans to navigate from link to link through an enigmatic site where they have to find secret passages.

Although fun, this virtual city only offered static stories with hidden, but static, links. The objective is therefore to make the city live and animate it. The chosen solution is to bring this virtual city to life through real people who would inhabit its streets, bars… The final aim is to let people interact both from their computer and by stepping in front of the cameras.

Three cameras have been placed in the streets of Louvain-la-Neuve (the city hosting the Université catholique de Louvain). The background of these three views has been redrawn by an artist who redrafted the buildings… according to the architecture of the virtual city: Urbicande-la-Neuve. Then, all moving objects appearing in front of the cameras are segmented in order to be composited into the virtual scenery. Finally, the images are compressed (currently as animated GIFs, with a planned evolution towards MPEG-4) and pushed to the Web. The streams are visible at http://urbicande.tele.ucl.ac.be. Figure 1 presents one resulting snapshot.


Figure 1: A snapshot from ‘Urbicande-la-Neuve’ mixing virtual background and foreground with real people walking in front of a street camera.

3. Segmentation scheme

Basically, the segmentation scheme is very similar to that of the MODEST project (http://www.tele.ucl.ac.be/MODEST/), since it also aims at segmenting moving objects captured by a still camera. However, the present framework differs on two points:

• First, the algorithm must run in real time, which precludes overly sophisticated algorithms.

• Second, since the algorithm runs at any time of day, under any weather conditions, special attention has to be paid to adaptation to illumination variations… Although the camera does not move, the background changes!

When segmenting the image at time t, two change masks are generated by comparison with the previous and the next pictures. Since each of these masks contains not only the moving object at its location at time t but also its location in the reference frame, both masks are combined with a logical AND operator. The resulting mask (Masktemp) generally captures the object contours very well. However, the inside of the objects is not always correctly detected as part of this change mask. This is why a reference background image is used. The change mask between this background image and the image to segment (Maskbg) often allows detecting the inside of objects. A logical OR operation applied to Masktemp and Maskbg provides the system with a reasonably good segmentation of the moving objects. Moreover, objects that stop moving are also detected since they appear in Maskbg. Figure 2 presents this combination scheme.
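As an illustration, the following is a minimal sketch of this mask combination in Python/NumPy. The names (change_mask, mask_temp, and so on) and the simple thresholded differencing are placeholders chosen for readability, not the actual implementation; the change detection really used is refined below with illumination compensation and an adaptive threshold.

```python
import numpy as np

def change_mask(img_a, img_b, threshold):
    """Placeholder change detection: flag pixels whose absolute difference
    exceeds a fixed threshold (the real scheme compensates for global
    illumination changes and uses an adaptive threshold, see below)."""
    diff = np.abs(img_a.astype(np.int16) - img_b.astype(np.int16))
    return diff > threshold

def segment(img_prev, img_t, img_next, background):
    # Change masks against the previous and next frames: each contains the
    # object at time t as well as its location in the reference frame.
    mask_prev = change_mask(img_t, img_prev, threshold=2)
    mask_next = change_mask(img_t, img_next, threshold=2)

    # Logical AND keeps only what moved at time t: Masktemp, with good
    # object contours but possibly hollow interiors.
    mask_temp = mask_prev & mask_next

    # Comparison with the reference background image recovers the inside of
    # objects, as well as objects that have stopped moving: Maskbg.
    mask_bg = change_mask(img_t, background, threshold=5)

    # Logical OR of both masks gives the final segmentation: Maskfinal.
    mask_final = mask_temp | mask_bg
    return mask_final, mask_temp  # Masktemp is reused for the background update
```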

While the use of a reference background image is crucial to enable a complete and fast segmentation of objects, it nevertheless requires generating this image automatically. The problem is not trivial, since the background illumination changes over time and with weather conditions: the two images of figure 3, for instance, depict the appearance of the “Grand-rue” background before and after sunset.

Figure 2: The scheme used for segmentation.

It is obvious that the background image cannot be extracted once and for all but needs to be constantly updated. The solution is to use a mobile median filter of size b. In order to further improve the quality of this background image, only pixels which do not belong to moving objects, i.e. which are not part of Masktemp, are injected into the filter. Since it is a mobile filter of fixed size, a new pixel to be taken into account replaces the ‘oldest’ one among the b already in memory. Typically, b is set between 10 and 50. The good behavior of this mobile median filter is demonstrated by the pictures of figure 3, since the right one results from the progressive adaptation of the left one.
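A per-pixel sliding median of this kind could be sketched as follows (Python/NumPy, hypothetical class and method names; grayscale frames are assumed). Each pixel keeps a ring buffer of its b most recent non-moving samples, and the background estimate is the per-pixel median of that buffer.

```python
import numpy as np

class MedianBackground:
    """Per-pixel mobile median filter of fixed size b (typically 10 to 50)."""

    def __init__(self, first_frame, b=21):
        h, w = first_frame.shape
        self.b = b
        # Ring buffer holding the last b accepted samples for every pixel.
        self.history = np.repeat(first_frame[np.newaxis, :, :], b, axis=0)
        # Per-pixel index of the 'oldest' sample, overwritten next.
        self.oldest = np.zeros((h, w), dtype=np.int32)

    def update(self, frame, mask_temp):
        """Inject only pixels that are NOT part of Masktemp (i.e. not moving);
        each new sample replaces the oldest one among the b in memory."""
        ys, xs = np.nonzero(~mask_temp)
        self.history[self.oldest[ys, xs], ys, xs] = frame[ys, xs]
        self.oldest[ys, xs] = (self.oldest[ys, xs] + 1) % self.b

    def background(self):
        # The background estimate is the per-pixel median of the buffer.
        return np.median(self.history, axis=0).astype(np.uint8)
```

Since section 4 notes that about 90% of the computation time goes into this background update, a production implementation would favour an incremental median rather than recomputing the full per-pixel median for every frame, as this sketch does.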

One has to note that the change detection steps of figure 2 are not performed by mere comparison. Due to possible camera instabilities, the global illumination change between images is taken into account when comparing them: a histogram is built from the differences between the means of paired 16x16 blocks, and the illumination change is estimated as the histogram peak. Moreover, change is not detected with a simple threshold T but with an adaptive threshold T + α·p, where p represents the value of the reference pixel being compared. α is typically set to 10, while T equals 1 or 2 to compute Maskprev and Masknext, and 5 or 6 to compute Maskbg.
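The sketch below illustrates this illumination-compensated, adaptively thresholded change detection (Python/NumPy, hypothetical names; grayscale frames are assumed, and how α is scaled against the reference pixel value p is treated as a tunable parameter rather than taken literally from the figures quoted above).

```python
import numpy as np

def global_illumination_shift(img, ref, block=16):
    """Estimate the global illumination change as the peak of the histogram of
    differences between the means of paired 16x16 blocks."""
    h = (img.shape[0] // block) * block
    w = (img.shape[1] // block) * block
    a = img[:h, :w].astype(np.float64).reshape(h // block, block, w // block, block)
    b = ref[:h, :w].astype(np.float64).reshape(h // block, block, w // block, block)
    diffs = (a.mean(axis=(1, 3)) - b.mean(axis=(1, 3))).ravel()
    hist, edges = np.histogram(diffs, bins=64)
    peak = int(np.argmax(hist))
    return (edges[peak] + edges[peak + 1]) / 2.0

def adaptive_change_mask(img, ref, T, alpha):
    """Flag a pixel as changed when its illumination-compensated difference
    exceeds the adaptive threshold T + alpha * p, p being the reference pixel
    value (the paper quotes T = 1-2 for Maskprev/Masknext and 5-6 for Maskbg;
    the scaling of alpha against p is an assumption of this sketch)."""
    shift = global_illumination_shift(img, ref)
    diff = np.abs(img.astype(np.float64) - ref.astype(np.float64) - shift)
    return diff > (T + alpha * ref.astype(np.float64))
```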

Figure 3: background before and after sunset.

Nevertheless, this need for adapting the background has one major drawback: any object that stops moving for at least (b/2)+1 images becomes part of the background, since it no longer appears in Masktemp and is progressively injected into the median filter. This causes still objects to suddenly disappear from Maskfinal. A possible solution to this problem would be not to adapt the background at locations where the image has differed from the background for a certain period of time: it consists in memorizing the background mask and combining it with the current mask and with the temporal mask, as illustrated with dotted lines on the block diagram of figure 2. If one considers a moving object that suddenly becomes still, the object will appear in both Maskbg and MaskbgRef, and therefore also in Masktmp and Masktmp2: the background will never be updated at this location.
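For completeness, here is a minimal sketch of that dotted extension (Python with boolean NumPy masks, hypothetical names; the exact combination is inferred from the description above and from figure 2, and, as explained next, this part is not used in the deployed system).

```python
def background_update_mask(mask_temp, mask_bg, mask_bg_ref):
    """Dotted extension of figure 2: freeze the background update wherever the
    image differs from the background now (Maskbg) and also differed before
    (memorized MaskbgRef), or is currently moving (Masktemp)."""
    mask_tmp = mask_bg & mask_bg_ref   # persistent difference: Masktmp
    mask_tmp2 = mask_temp | mask_tmp   # Masktmp2
    return ~mask_tmp2                  # only these pixels feed the median filter
```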

Unfortunately, adding this dotted part of the scheme means that any error in the estimation of the background image will belong to both Maskbg and MaskbgRef and will never be corrected, or only very slowly… This could be overcome by memorizing the history of Maskbg over a duration of b/2 images and combining all these masks into Masktmp with a logical AND operator, which would drastically reduce the risk of error integration into the background. But this would exaggeratedly increase the memory usage as well as the computational burden of the algorithm. This dotted part of the scheme is therefore not used.

4. Performance evaluation

Currently, the system runs in real time on a Pentium II 350 MHz PC, with a subset of the Linux kernel installed as the OS. The algorithm is able to segment and compose 1 or 2 images (576 x 720 pels) per second with a background memory of size b = 21. It has to be noted that about 90% of the computational time is spent updating the background memory.

5. References

[1] Guest Editorial Note, Special Issue on Segmentation, Description, and Retrieval of Video Content, IEEE Transactions on Circuits and Systems for Video Technology, 8(5), Sept. 1998.

