
On-line Web Application using Image Segmentation

Xavier Marichal

Laboratoire de Télécommunications et Télédétection, Université catholique de Louvain

B-1348 Louvain-la-Neuve, Belgium

Marichal@tele.ucl.ac.be

Abstract

The present paper introduces a simple and efficient scheme to perform real-time software segmentation. Based on a combination of change detection masks, it separates all moving objects from the background. It is currently used to provide web servers with segmented content.

1. Introduction

“The identification, access and representation of digital image and video information is becoming an integral part of many interactive multimedia applications today. The emerging MPEG-4 and MPEG-7 standards will allow for content-based video coding and representation and content-based visual query in image and video. Various other applications, such as editing and manipulation of video sequences, video surveillance, or image and video indexing and retrieval applications, are equally dependent on the availability of sophisticated algorithms for content identification, content segmentation and content description.” [1]

Although the design of suitable fully automatic algorithms, in particular for image sequence segmentation, is still considered an unsolved problem as a whole, the present paper introduces a simple yet robust segmentation algorithm that runs in real time on a Linux platform and provides a Web server with segmented content.

2. Motivation

Casterman S.A. (http://www.casterman.com) is a Belgian publisher involved in many fields of publication. Among other activities, Casterman is one of the major publishers of comics, including, for instance, the famous Tintin (http://www.tintin.be). The company is naturally seeking to extend its activities towards new technologies and is therefore bringing some of its comics to the Web. The virtual city of Urbicande was thus born (http://www.urbicande.be): it invites fans to navigate from link to link through an enigmatic site where they have to find secret passages.

Although fun, this virtual city only offered static stories with hidden, but static, links. The objective is therefore to make the city live and animate it. The chosen solution is to bring this virtual city to life through real people who would inhabit its streets, bars… The final aim is to let people interact both from their computer and by stepping in front of the cameras.

Three cameras have been placed in the streets of Louvain-la-Neuve (the city hosting the Université catholique de Louvain). The background of these three views has been redrawn by an artist who redrafted the buildings… according to the architecture of the virtual city: Urbicande-la-Neuve. Then, all moving objects appearing in front of the cameras are segmented in order to be composited into the virtual scenery. Finally, the images are compressed (currently as animated GIFs, with a planned evolution towards MPEG-4) and pushed to the Web. The streams are visible at http://urbicande.tele.ucl.ac.be. Figure 1 presents one resulting snapshot.


Figure 1: A snapshot from ‘Urbicande-la-Neuve’ mixing virtual background and foreground with real people walking in front of a street camera.

3. Segmentation scheme

Basically, the segmentation scheme is very similar to that of the MODEST project (http://www.tele.ucl.ac.be/MODEST/), since it also aims at segmenting moving objects captured by a still camera. However, the present framework differs on two points:

• First, the algorithm must run in real time, which precludes overly sophisticated algorithms.

• Second, since the algorithm runs at any time of day, under any weather conditions, special attention has to be paid to adaptation to illumination variations… Although the camera does not move, the background changes!

When segmenting the image at time t, two change masks are generated by comparison with the previous and the next pictures. Since each of these masks contains not only the moving object at its location at time t but also its location in the reference frame, both masks are combined with a logical AND operator. The resulting mask (Masktemp) generally captures the object contours very well. However, the inside of the objects is not always correctly detected as part of this change mask. This is why a reference background image is used. The change mask between this background image and the image to segment (Maskbg) often allows detecting the inside of objects. A logical OR operation applied to Masktemp and Maskbg provides the system with a reasonably good segmentation of the moving objects. Moreover, objects that stop moving are also detected since they appear in Maskbg. Figure 2 presents this combination scheme.
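As an illustration, the following is a minimal sketch of this mask combination in Python/NumPy. The names (change_mask, mask_temp, and so on) and the simple thresholded differencing are placeholders chosen for readability, not the actual implementation; the change detection really used is refined below with illumination compensation and an adaptive threshold.

```python
import numpy as np

def change_mask(img_a, img_b, threshold):
    """Placeholder change detection: flag pixels whose absolute difference
    exceeds a fixed threshold (the real scheme compensates for global
    illumination changes and uses an adaptive threshold, see below)."""
    diff = np.abs(img_a.astype(np.int16) - img_b.astype(np.int16))
    return diff > threshold

def segment(img_prev, img_t, img_next, background):
    # Change masks against the previous and next frames: each contains the
    # object at time t as well as its location in the reference frame.
    mask_prev = change_mask(img_t, img_prev, threshold=2)
    mask_next = change_mask(img_t, img_next, threshold=2)

    # Logical AND keeps only what moved at time t: Masktemp, with good
    # object contours but possibly hollow interiors.
    mask_temp = mask_prev & mask_next

    # Comparison with the reference background image recovers the inside of
    # objects, as well as objects that have stopped moving: Maskbg.
    mask_bg = change_mask(img_t, background, threshold=5)

    # Logical OR of both masks gives the final segmentation: Maskfinal.
    mask_final = mask_temp | mask_bg
    return mask_final, mask_temp  # Masktemp is reused for the background update
```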

While the use of a reference background image is crucial to enable a complete and fast segmentation of objects, it nevertheless requires generating this image automatically. The problem is not trivial, since the background illumination changes over time and with weather conditions: the two images of figure 3, for instance, depict the appearance of the “Grand-rue” background before and after sunset.

Figure 2: The scheme used for segmentation.

It is obvious that the background image cannot be extracted once and for all but needs to be constantly updated. The solution is to use a mobile median filter of size b. In order to further improve the quality of this background image, only pixels which do not belong to moving objects, i.e. which are not part of Masktemp, are injected into the filter. Since it is a mobile filter of fixed size, a new pixel to be taken into account replaces the ‘oldest’ one among the b already in memory. Typically, b is set between 10 and 50. The good behavior of this mobile median filter is demonstrated by the pictures of figure 3, since the right one results from the progressive adaptation of the left one.
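A per-pixel sliding median of this kind could be sketched as follows (Python/NumPy, hypothetical class and method names; grayscale frames are assumed). Each pixel keeps a ring buffer of its b most recent non-moving samples, and the background estimate is the per-pixel median of that buffer.

```python
import numpy as np

class MedianBackground:
    """Per-pixel mobile median filter of fixed size b (typically 10 to 50)."""

    def __init__(self, first_frame, b=21):
        h, w = first_frame.shape
        self.b = b
        # Ring buffer holding the last b accepted samples for every pixel.
        self.history = np.repeat(first_frame[np.newaxis, :, :], b, axis=0)
        # Per-pixel index of the 'oldest' sample, overwritten next.
        self.oldest = np.zeros((h, w), dtype=np.int32)

    def update(self, frame, mask_temp):
        """Inject only pixels that are NOT part of Masktemp (i.e. not moving);
        each new sample replaces the oldest one among the b in memory."""
        ys, xs = np.nonzero(~mask_temp)
        self.history[self.oldest[ys, xs], ys, xs] = frame[ys, xs]
        self.oldest[ys, xs] = (self.oldest[ys, xs] + 1) % self.b

    def background(self):
        # The background estimate is the per-pixel median of the buffer.
        return np.median(self.history, axis=0).astype(np.uint8)
```

Since section 4 notes that about 90% of the computation time goes into this background update, a production implementation would favour an incremental median rather than recomputing the full per-pixel median for every frame, as this sketch does.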

One has to note that the change detection steps of figure 2 are not performed by mere comparison. Due to possible camera instabilities, the global illumination change between images is taken into account when comparing them: a histogram is built from the differences between the means of paired 16x16 blocks, and the illumination change is estimated as the histogram peak. Moreover, change is not detected with a simple threshold T but with an adaptive threshold T + α·p, where p represents the value of the reference pixel being compared. α is typically set to 10, while T equals 1 or 2 to compute Maskprev and Masknext, and 5 or 6 to compute Maskbg.
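The sketch below illustrates this illumination-compensated, adaptively thresholded change detection (Python/NumPy, hypothetical names; grayscale frames are assumed, and how α is scaled against the reference pixel value p is treated as a tunable parameter rather than taken literally from the figures quoted above).

```python
import numpy as np

def global_illumination_shift(img, ref, block=16):
    """Estimate the global illumination change as the peak of the histogram of
    differences between the means of paired 16x16 blocks."""
    h = (img.shape[0] // block) * block
    w = (img.shape[1] // block) * block
    a = img[:h, :w].astype(np.float64).reshape(h // block, block, w // block, block)
    b = ref[:h, :w].astype(np.float64).reshape(h // block, block, w // block, block)
    diffs = (a.mean(axis=(1, 3)) - b.mean(axis=(1, 3))).ravel()
    hist, edges = np.histogram(diffs, bins=64)
    peak = int(np.argmax(hist))
    return (edges[peak] + edges[peak + 1]) / 2.0

def adaptive_change_mask(img, ref, T, alpha):
    """Flag a pixel as changed when its illumination-compensated difference
    exceeds the adaptive threshold T + alpha * p, p being the reference pixel
    value (the paper quotes T = 1-2 for Maskprev/Masknext and 5-6 for Maskbg;
    the scaling of alpha against p is an assumption of this sketch)."""
    shift = global_illumination_shift(img, ref)
    diff = np.abs(img.astype(np.float64) - ref.astype(np.float64) - shift)
    return diff > (T + alpha * ref.astype(np.float64))
```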

Figure 3: background before and after sunset.

Nevertheless, this need for adapting the background has one major drawback: any object that stops moving for at least (b/2)+1 images becomes part of the background, since it no longer appears in Masktemp and is progressively injected into the median filter. This causes still objects to suddenly disappear from Maskfinal. A possible solution to this problem would be not to adapt the background at locations where the image has differed from the background for a certain period of time: it consists in memorizing the background mask and combining it with the current mask and with the temporal mask, as illustrated with dotted lines on the block diagram of figure 2. If one considers a moving object that suddenly becomes still, the object will appear in both Maskbg and MaskbgRef, and therefore also in Masktmp and Masktmp2: the background will never be updated at this location.
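For completeness, here is a minimal sketch of that dotted extension (Python with boolean NumPy masks, hypothetical names; the exact combination is inferred from the description above and from figure 2, and, as explained next, this part is not used in the deployed system).

```python
def background_update_mask(mask_temp, mask_bg, mask_bg_ref):
    """Dotted extension of figure 2: freeze the background update wherever the
    image differs from the background now (Maskbg) and also differed before
    (memorized MaskbgRef), or is currently moving (Masktemp)."""
    mask_tmp = mask_bg & mask_bg_ref   # persistent difference: Masktmp
    mask_tmp2 = mask_temp | mask_tmp   # Masktmp2
    return ~mask_tmp2                  # only these pixels feed the median filter
```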

Unfortunately, adding this dotted part of the scheme means that any error in the estimation of the background image will belong to both Maskbg and MaskbgRef and will never be corrected, or only very slowly… This could be overcome by memorizing the history of Maskbg over a duration of b/2 images and combining all these masks into Masktmp with a logical AND operator, which would drastically reduce the risk of error integration into the background. But this would exaggeratedly increase the memory usage as well as the computational burden of the algorithm. This dotted part of the scheme is therefore not used.

4. Performance evaluation

Currently, the system runs in real time on a Pentium II 350 MHz PC, with a subset of the Linux kernel installed as the OS. The algorithm is able to segment and compose 1 or 2 images (576 x 720 pels) per second with a background memory of size b = 21. It has to be noted that about 90% of the computational time is spent updating the background memory.

5. References

[1] Guest Editorial Note, Special Issue on Segmentation, Description, and Retrieval of Video Content, IEEE Transactions on Circuits and Systems for Video Technology, 8(5), Sept. 1998.

