Fachbereich Informatik
Department of Computer Science

Master Thesis

Visual Inspection of Fast Moving Heat Shrink Tubes in Real-Time

Alexander Barth

A thesis submitted to the Bonn-Rhein-Sieg University of Applied Sciences in partial fulfillment of the requirements for the degree of Master of Science in Computer Science

Date of submission: December 16, 2005

Examination Committee:
Prof. Dr.-Ing. Rainer Herpers (Supervisor)
Prof. Dr. Dietmar Reinert


Declaration

I hereby declare that the work presented in this thesis is solely my work and that to the best of my knowledge this work is original, except where indicated by references to other authors. This thesis has neither been submitted to another committee, nor has it been published before.

St. Augustin, December 16, 2005
Alexander Barth


Abstract

Heat shrink tubing is widely used in electrical and mechanical applications for insulating and protecting cable splices. Especially in the automotive supply industry, accuracy demands are very high and quality assurance is an important factor in establishing and maintaining customer relationships. In production, the heat shrink tubes are cut into lengths (between 20 and 100mm) from a continuous tube. During this process, however, deviations from the target length can occur.

In this thesis, a prototype of a vision-based length measuring sensor for a range of heat shrink tubes is presented. The measuring is performed on a conveyor belt in real-time at velocities of up to 40m/min. The tubes can differ in color, diameter and length.

In a multi-measurement strategy, the total length of each tube is computed based on up to 11 single measurements while the tube is in the visual field of the camera. Tubes that do not meet the allowed tolerances between ±0.5mm and ±1mm, depending on the target length, are sorted out by air pressure. Both the engineering and the software development are part of this thesis work.

About 70% of all manufactured tubes are transparent, i.e. they show a poor contrast to the background. Thus, sophisticated but fast algorithms are developed which reliably detect even low contrast tube edges under the presence of background clutter (e.g. belt texture or dirt) with subpixel accuracy. For this purpose, special tube edge templates are defined and combined with model knowledge about the inspected objects. In addition, perspective and lens specific distortions have to be compensated.

An easy to operate calibration and teach-in step has been investigated, which is important to be able to produce different tube types at the same production line in short intervals.

The prototype system has been tested in extensive experiments at varying velocities and for different tube diameters and lengths. The measuring precision for non-deformed tubes can reach 0.03mm at a conveyor velocity of 30m/min. Even with elliptical deformations of the cross-section or deflections, it is still possible to achieve an average precision of < 0.1mm. The results have been compared to manually acquired ground truth measurements, which also show a standard deviation of about 0.1mm under ideal laboratory conditions. Finally, a 100% control during production is possible with this system - reaching the same accuracy and precision as humans without getting tired.


Contents

Acknowledgments
Abstract
List of Tables
List of Figures

1. Introduction
   1.1. Machine Vision - State of the Art
   1.2. Problem Statement
   1.3. Requirements
   1.4. Related Work
   1.5. Thesis Outline

2. Technical Background
   2.1. Visual Measurements
        2.1.1. Accuracy and Precision
        2.1.2. Inverse Projection Problem
        2.1.3. Camera Models
        2.1.4. Camera Calibration
   2.2. Illumination
        2.2.1. Light Sources
        2.2.2. Incident Lighting
        2.2.3. Backlighting
   2.3. Edge Detection
        2.3.1. Edge Models
        2.3.2. Derivative Based Edge Detection
        2.3.3. Common Edge Detectors
        2.3.4. Subpixel Edge Detection
   2.4. Template Matching

3. Hardware Configuration
   3.1. Conveyor
   3.2. Camera Setup
        3.2.1. Camera Selection
        3.2.2. Camera Positioning
        3.2.3. Lens Selection
   3.3. Illumination
   3.4. Blow Out Mechanism

4. Length Measurement Approach
   4.1. System Overview
   4.2. Model Knowledge and Assumptions
        4.2.1. Camera Orientation
        4.2.2. Image Content
        4.2.3. Tubes Under Perspective
        4.2.4. Edge Model
        4.2.5. Translucency
        4.2.6. Tube Orientation
        4.2.7. Background Pattern
   4.3. Camera Calibration
        4.3.1. Compensating Radial Distortion
        4.3.2. Fronto-Orthogonal View Generation
   4.4. Tube Localization
        4.4.1. Gray Level Profile
        4.4.2. Profile Analysis
        4.4.3. Peak Evaluation
   4.5. Measuring Point Detection
        4.5.1. Edge Enhancement
        4.5.2. Template Based Edge Localization
        4.5.3. Template Design
        4.5.4. Subpixel Accuracy
   4.6. Measuring
        4.6.1. Distance Measure
        4.6.2. Perspective Correction
        4.6.3. Tube Tracking
        4.6.4. Total Length Calculation
   4.7. Teach-In
        4.7.1. Required Input
        4.7.2. Detection Sensitivity
        4.7.3. Perspective Correction Parameters
        4.7.4. Calibration Factor

5. Results and Evaluation
   5.1. Experimental Design
        5.1.1. Parameters
        5.1.2. Evaluation Criteria
        5.1.3. Ground Truth Measurements
        5.1.4. Strategies
   5.2. Test Scenarios
   5.3. Experimental Results
        5.3.1. Noise
        5.3.2. Minimum Tube Spacing
        5.3.3. Conveyor Velocity
        5.3.4. Tube Diameter
        5.3.5. Repeatability
        5.3.6. Outlier
        5.3.7. Tube Length
        5.3.8. Performance
   5.4. Discussion and Future Work

6. Conclusion

Appendix
A. Profile Analysis Implementation Details
   A.1. Global ROI
   A.2. Profile Subsampling
   A.3. Scan Lines
   A.4. Notes on Convolution
B. Hardware Components
   B.1. Camera
   B.2. Illumination Hardware

Bibliography




List of Tables

1.1. Range of tube types considered in this thesis
1.2. Tolerance specifications
3.1. Lens selection - Overview
3.2. Lens selection - Field of view at minimum object distance
3.3. Lens selection - Working distances
3.4. Blow out control protocol
4.1. Threshold comparison of profile analysis
4.2. Comparison of different edge detectors
4.3. Template curvature test set parameters
5.1. Overview of different test parameters
5.2. Constant software parameter settings throughout the experiments
5.3. Test set used to determine the human variance in measuring
5.4. Results of 50mm tubes at different velocities (black)
5.5. Results of 50mm tubes at different velocities (transparent)
5.6. Results of 50mm tubes with different diameter at 30m/min
5.7. Results of blow out experiment
5.8. Results of 30mm and 70mm tubes at 30m/min
B.1. Camera specifications for the AVT Marlin F-033C and F-046B
B.2. Light source (A20800.2) with DDL lamp
B.3. Backlight specifications
B.4. Lamp specifications




List of Figures

2.1. Accuracy and precision
2.2. Parallel lines at perspective
2.3. Pinhole geometry
2.4. Thin lens model
2.5. Incident lighting setups
2.6. Edge models
2.7. Comparison of different edge detectors
2.8. Orientation selective filters
2.9. Subpixel accuracy using interpolation techniques
3.1. Hardware setup of the prototype
3.2. Bayer mosaic
3.3. Comparison of color and gray level camera
3.4. Color information of transparent tubes
3.5. Telecentric lens
3.6. Field of view geometry
3.7. Tubes at different front lighting setups
3.8. Backlighting through a conveyor belt
3.9. Polarized backlighting
3.10. Backlight panel
3.11. Blow out setup
4.1. System overview
4.2. Potential image states
4.3. Tube models
4.4. Measuring plane definition
4.5. Characteristic intensity distribution of transparent tubes
4.6. Tube orientation error
4.7. Camera calibration - Calibration images
4.8. Camera calibration - Subpixel corner extraction
4.9. Camera calibration - Extrinsic parameters
4.10. Camera calibration - Radial distortion model
4.11. Camera positioning - Online grid calibration
4.12. Camera positioning - Control points
4.13. Scan lines for profile analysis
4.14. Profile analysis
4.15. Motivation for a region-based profile threshold
4.16. Ghost effect
4.17. Characteristic tube edge responses
4.18. Template design
4.19. Template occurrence
4.20. Template with extreme height weighting coefficient
4.21. Template weighting
4.22. Template rotation - Motivation
4.23. Template curvature occurrence
4.24. Subpixel accurate template matching
4.25. Perspective correction function
5.1. Measuring slide used for acquiring ground truth measurements by hand
5.2. Intra and inter human measuring variance
5.3. Supply tube
5.4. Accuracy evaluation of length measurements at synthetic sequences
5.5. Results of minimum spacing experiment
5.6. Minimum tube spacing for black tubes
5.7. Measuring results at 20m/min
5.8. Results of 8mm black tubes at 30m/min
5.9. Results of 8mm transparent tubes at 30m/min
5.10. Brightness variance of an empty conveyor belt at backlight
5.11. Bent 6mm tube
5.12. Experimental results of black tubes with 6 and 12mm diameter
5.13. Ground truth distance of black tubes with 6 and 12mm diameter
5.14. Influence of cross-section deformations at 12mm diameter tubes
5.15. Experimental results of transparent tubes with 6 and 12mm diameter
5.16. Ground truth distance of transparent tubes with 6 and 12mm diameter
5.17. Failure of tube edge detection due to a poor contrast
5.18. Repeatability of the measurement of one tube
5.19. Repeatability of the measurement of a metallic cylinder
5.20. Results of outlier experiment
5.21. Results of 30mm and 70mm tubes at 30m/min
5.22. Performance evaluation results
5.23. Background suppression in the frequency domain
A.1. Comparison of different scan lines


1. Introduction

Heat shrinkable tubing is widely used for electrical and mechanical insulation, sealing, identification and connection solutions. Customers are mainly from the automotive, electronics, military or aerospace sector. In terms of competition in world markets, high quality assurance standards are essential in establishing and maintaining customer relationships. Especially in the automotive supply industry, accuracy demands are very high, and tolerated outliers are specified in only a few parts per million.

In this master thesis, a prototype of a vision-based sensor for real-time length measurement of heat shrink tubes in line production is presented. The main objectives are accuracy, reliability and meeting time constraints.

The thesis work has been accomplished in cooperation with the company DSG-Canusa, Meckenheim, Germany.

1.1. Machine Vision - State of the Art

This section gives an overview of the term Machine Vision (MV), the use of vision systems in industrial applications, and a brief historical review. In addition, the advantages and drawbacks of MV are discussed and related applications are presented. The term Machine Vision is defined by Davies [16] as follows:

"Machine Vision is the study of methods and techniques whereby artificial vision systems can be constructed and usefully employed in practical applications. As such, it embraces both the science and engineering of vision."

Researchers and engineers argue whether the terms Machine Vision and Computer Vision can be used synonymously [7]. Both terms are part of a larger field called Artificial Vision and have many things in common. The main objective is to make artificial systems 'see'. However, the priorities of the two subjects differ.

Computer Vision has arisen in the academic field and concentrates mainly on theoretical problems with a strong mathematical background. Usually, as the term Computer Vision indicates, a computer processes an input image or a sequence of images. Nevertheless, many methods and algorithms developed in Computer Vision can be adapted to practical applications.

Machine Vision, on the other hand, implies practical solutions for many applications, and covers not only the image processing itself, but also the engineering that makes a system work [16]. This includes the right choice of the sensor, optics, illumination, etc. MV systems are often used in industrial environments, making robustness, reliability and cost-effectiveness very important. If an application is highly time-constrained or computationally expensive, dedicated hardware (e.g. DSPs, ASICs, or FPGAs) is used instead of an off-the-shelf computer [42].


A current trend is to develop imaging sensors that have on-chip capabilities for image processing algorithms. Thus, the image processing moves from the computer into the camera, superseding the bottleneck of data transfer.

During the 1970s and 1980s, western companies faced a new challenge with the Asian market [7]. Especially countries like Japan established new production methods, leading to an increased significance of quality in manufacturing at the international markets. Many western companies proved unable to meet the challenge and failed to survive, while others realized the importance of quality assurance and started to investigate the use of new technologies like Machine Vision. MV has many advantages and is able to improve product quality, to enhance processing efficiency, and to increase operational safety.

In the early 1980s, the development in the field of Artificial Vision was slow and mainly academic, and industrial interest remained low until the late 1980s and early 1990s [7]. Significant progress in computer hardware allows for real-time implementations of image processing algorithms, developed over the past 25 years, on standard platforms. The decreasing cost of computational power made MV systems more and more attractive, leading to a growth of MV applications and companies developing such systems. Today, the field of MV has become a multi-million dollar industry [7].

The objectives of MV systems include position recognition, identification, shape and dimension check, completeness check, image and object comparison, and surface inspection [18]. Usually, the goal is to detect and sort out production errors or to guide a robot arm (or other devices) in a particular task [42].

MV systems can be found in all industrial sectors and cover a huge range of inspected objects. Dimensional measuring tasks can be found for example in the inspection of bottles on assembly lines [72], wood [15, 50], screw threads [34], or thin-film disk heads [61]. Measuring objects is often related to 3D CAD models [23, 43]. An example of guiding a robot arm in grasping 3D sheet metal parts is given in [52]. Giving a detailed overview of all potential applications is beyond the scope of this thesis.

Guaranteed product quality can help to establish and maintain customer relationships, enhancing the competitive position of a company. The main advantage of visual inspection in quality control is, beside its versatile range of applications, that it is non-contact, clean, and fast [7].

Although the interpretative capability of today's vision systems cannot match the human visual system in the general case, it is possible to develop systems that perform better than people at some quantitative tasks. However, this assumes controlled and circumscribable conditions, reducing the problem to a defined and repetitive task. Usually, such conditions can be established at manufacturing lines.

A human operator can be expected to be only 70-80% efficient, even under ideal conditions [7]. In practice, there are many factors that can reduce this productive efficiency of humans, like tiredness, sickness, boredom, alcohol or drugs. For example, if a human is instructed to observe objects on a conveyor, this task is tiring and it is not unlikely that the operator is distracted after a while.
On the other hand, a MV system could, theoretically, perform the same task 24 hours a day and 365 days a year without getting tired.

If the inspection is performed in surroundings where working can be unpleasant, intolerable, dangerous or harmful to health for a human being, MV is a welcome option.


This includes working under high (or low) temperatures, chemical exhalation, smoke, biological hazards, risk of explosion, x-rays, radioactive material, loud noise levels, etc. [7]. On the other hand, in applications that require aseptic conditions, as in the food or pharmaceutical industry, a human operator can be a 'polluter' as a source of dirt particles (hair, dander, bacteria, etc.). In this case, a MV system is a clean alternative.

Machines usually exceed humans in all kinds of accurate vision-based measurements. Human vision performs well in comparing objects and in detecting differences, for example in shape, color or texture [27]. Large deviations can be detected quickly. As the difference gets smaller, however, the time of inspection increases or the deviation cannot be detected at all without technical tools. With respect to the task considered in this thesis, a human is not able to determine the length of an object at sub-millimeter precision just by looking at it. Manual measurements are slow and not practicable in line production if 100% control is desired, and can thus be used only for random inspection of few objects. MV systems, on the other hand, can measure the length (or other features) of an object without contact, down to nm precision - depending on the optical system and the size of the object [16]. Furthermore, humans soon reach limits if the number of objects to inspect per minute increases significantly. Many manufacturing processes are so fast that the human eye has problems even perceiving the objects, not to mention accomplishing any inspection task. MV systems, however, can handle several hundred objects per minute with high accuracy.

Although MV systems have many advantages for manufacturers, there are also drawbacks. Usually, a MV system is designed and optimized for a specific task in a constrained environment. If the requirements of the application change, the system has to be adapted, which can be difficult and expensive [7]. Furthermore, the system can be sensitive to many influences of the (industrial) environment like heat, humidity, dirt, dust, or ambient lighting. Respective precautions have to be taken to protect the system. Finally, as in automation in general, vision systems that exceed the power of a human at some specific task replace human operators and will therefore supersede mostly low-skilled jobs in the future. Addressing this problem in more detail is outside the scope of this thesis.

1.2. Problem Statement

A large variety of heat shrink tubes of different sizes, material and shrinking properties is available on the market. The focus in this thesis will be on the DSG-Canusa DERAY-SPLICEMELT series. These tubes are commonly used for insulation of cable splices in the automotive industry (see Figure 1.1 for an example). A film of hotmelt adhesive inside the heat shrink tubes provides a waterproof sealing around the splice after shrinking. In addition, the DERAY-SPLICEMELT series shows a strong resistance against thermal, chemical and mechanical strains. The easy and fast handling allows for an application in series production. Accordingly, if the heat shrinking is performed in an automated fashion, the accuracy demands increase.

In production, the heat shrink tubes are cut into lengths from a continuous tube. During this process, however, deviations from a specific target length can occur. In terms of quality assurance, any deviations above a tolerable level must be detected so that failings can be sorted out.


Figure 1.1: Application of a transparent heat shrink tube of type DERAY-SPLICEMELT. After shrinking, the heat shrink tube provides a robust, waterproof insulation of the cable splice. (Source: DSG-Canusa)

Property    Attributes
Color       transparent, black
Length      20-100mm
Diameter    6, 8, 12mm

Table 1.1: Range of tube types considered in this thesis.

Delivering defectives must be avoided at highest priority to satisfy the customer and to retain a good reputation. In this context, tolerable failure rates are specified in parts per million. Rejected goods can be very expensive.

Up to now, length measurements have been performed manually by a human operator. This has several drawbacks. First, only random samples can be controlled by hand, since 10 parts per second and more considerably exceed the human capabilities. Furthermore, one operator is busy doing the monotonous measuring task at one machine and cannot be deployed to other tasks. This leads to a low effective productivity. In practice, more than one production line that cuts the heat shrink tubes into lengths is running in parallel, requiring even more human resources, which is very expensive. In addition, there is always a non-negligible possibility of subjective errors when human operators carry out the inspections - they also show symptoms of fatigue over time in this highly repetitive task. The measuring quality varies detectably between morning and late shift.

In this thesis work, a machine vision inspection system is developed that is able to replace the human operator at this particular measuring task, allowing for a reliable 100% control.

1.3. Requirements

The system must cover a range of tube types, differing in diameter, length or material properties. An overview of the variety of tube types can be found in Table 1.1.

The two main classes of DERAY-SPLICEMELT heat shrink tubes considered in this thesis are black or transparent in color - transparent tubes cover about 70% of the production. Unlike black tubes, the transparent ones are translucent and appear slightly yellowish or reddish due to a film of hotmelt adhesive inside the tube.

Most tubes have a printing on the surface that can consist of both letters and numbers (e.g. DSG2).


Since this printing is plotted onto the continuous tube before being cut into lengths, the position of the printing is not consistent among the tubes and must not affect the measuring results.

The tube length ranges from 20mm to 100mm. In this thesis, however, the focus will be on 50mm tubes since this is the dominant length in production. The outer diameter varies between 6mm and 12mm.

The tolerances differ between 0.5 and 1.0mm depending on the tube length, as can be seen in Table 1.2. This table includes the tolerable deviations from a given target length in mm.

Length [mm]    Tolerance [mm]
20 - 30        ±0.5
31 - 50        ±0.7
51 - 100       ±1.0

Table 1.2: Tolerance specifications of different tube lengths.

The measurements have to be accomplished at line production on a conveyor in real-time. The system is intended to reach a 100% control without reducing production velocity. Currently the conveyor runs at approximately 20m/min, i.e. 3-17 tubes per second are cut depending on the segment size. Theoretically, the cutting machine is able to run at up to 40m/min. A faster velocity results in less processing time per tube segment. The system design must be robust with respect to industrial use. Theoretically, it must be able to run stably 24 hours/day, 7 days/week and 365 days/year.

Although there are many different tube types, only one kind of tube is processed at one production line over a certain period of time. This means the tube segments to be inspected on the conveyor are all of the same kind. However, to be flexible to customer demands, a production line must be able to be rearranged to a different kind of tube several times a day. This emphasizes the importance of an easy to operate calibration and teach-in step of the inspection system for practical application.

The goal of the visual inspection is a reliable good/bad decision for each tube segment: whether it has to be sorted out or not. In the following, tube segments wrongly classified as proper, but nevertheless deviating from the given target length above the allowed tolerances (see Table 1.2), are denoted as false positives. On the other hand, false negatives are tube segments that are classified for sorting out, although the actual length meets the tolerances. To reach optimal product quality, the number of false positives must be reduced to zero. Large numbers of false negatives indicate that the system is not adjusted properly and has to be reconfigured.
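The following minimal Python sketch illustrates this good/bad decision; the function names and structure are hypothetical (not from the thesis), while the tolerance values follow Table 1.2:

```python
# Illustrative sketch of the good/bad decision described in Section 1.3.
# Function names are hypothetical; tolerance values follow Table 1.2.

def tolerance_for(target_mm: float) -> float:
    """Return the allowed deviation in mm for a given target length (Table 1.2)."""
    if 20 <= target_mm <= 30:
        return 0.5
    if 31 <= target_mm <= 50:
        return 0.7
    if 51 <= target_mm <= 100:
        return 1.0
    raise ValueError("target length outside the supported 20-100mm range")

def is_proper(measured_mm: float, target_mm: float) -> bool:
    """Good/bad decision: True if the measured length meets the tolerance."""
    return abs(measured_mm - target_mm) <= tolerance_for(target_mm)

# Example: a 50mm tube measured at 50.8mm exceeds the ±0.7mm tolerance
# and would be sorted out by the blow-out mechanism.
print(is_proper(50.8, 50.0))  # False
```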


1.4. Related Work

In Section 1.1, several examples of vision-based measuring systems in industrial applications have been presented. Much more work in this area has been done over the past 20 years [4]. However, MV related publications of academic interest often consider only specific subproblems, but do not present a detailed insight into the whole system. On the other hand, commercial manufacturers of MV systems hide the technical details in order to keep the competitive advantage [18].

There are several useful books addressing the fundamental methods, techniques and algorithms used to develop machine vision applications in a comprehensive fashion [7, 16, 18, 62].

Dimensional measuring of objects requires knowledge of an object's boundaries. A common indicator for object boundaries, both in human and artificial vision, are edges. Edge detection is a widely investigated area of vision research, dating back from 1959 in the field of TV signal processing [37] to the present. The edge detection methods considered in this thesis are related to the work of Sobel [36, 51], Marr and Hildreth [45] and Canny [13].

In addition, anisotropic approaches have been proposed [69], i.e. orientation selective edge detectors. These filters have many applications, for example in texture analysis or in the design of steerable filters that efficiently control the orientation and scale of filters to extract certain features in an adaptive way [25, 49]. Many of these approaches are motivated by early human vision. In their investigation of the visual cortex, Hubel and Wiesel discovered orientation selective cells in the striate cortex V1 [33]. In several theories it is assumed that humans perceive low-level features such as edges or lines by combinations of the responses of these cells [27]. Many computer vision researchers have adapted the idea of orientation selective cells or filters which can be combined to produce a certain response. Such sets of filters are often called filter banks. Malik and Perona [44] used a filter bank based on even symmetric differences of offset Gaussians (DOOG) for texture discrimination.

The discrete pixel grid resolution of CCD camera images limits the measuring accuracy. Thus, several techniques have been proposed that compute subpixel edge positions [6, 41, 66, 56, 71].

A common task in vision applications is to search whether a particular pattern is part of an image, and if so, where it is located [28]. Template matching is one method to tackle this problem. Cross-correlation techniques are widely used as a measure of similarity [64, 18, 62]. In stereo vision, correlation is used to solve the problem of correspondences between the left and right view [21, 65]. Other practical applications can be found in feature trackers, pattern recognition, or registration of e.g. medical image data.

Accurate visual measurements often require a camera calibration step to relate 3D points in the real world to image coordinates and to compensate for lens distortions. One early approach was presented by Tsai [39, 67]. An extensive introduction to calibration is given by Faugeras [21] or Hartley and Zisserman [30]. The calibration approach in this thesis work is closely related to the work of Zhang [74] and Heikkilä and Silvén [31].

1.5. Thesis Outline

The remainder of this thesis is organized as follows: Chapter 2 provides the theoretical background on models and techniques used in later sections with regard to measuring with video cameras. This chapter also gives an overview of different illumination techniques used for machine vision applications.

In Chapter 3, the physical design of the system is introduced. Especially the camera and lens selection as well as the illumination setup are discussed in detail in this chapter.


The vision part of the system is presented in Chapter 4. After describing assumptions and model knowledge used throughout the inspection, the different steps of the length measuring are proposed. This chapter also contains the calibration and teach-in of the system as well as the algorithms and techniques used to perform the measuring task with respect to real-time demands.

The system is systematically evaluated in Chapter 5. For this purpose, several quantitative and qualitative evaluation criteria as well as different test scenarios are introduced. The automated measurements are compared to human measurements in terms of accuracy and precision. Finally, the results are discussed, and ideas for future work are given. The thesis concludes with a summary of the presented work in Chapter 6.




2. Technical Background

2.1. Visual Measurements

This section introduces the basic concepts and techniques making visual measurements possible. It is elementary to understand the fundamental process of image acquisition as well as the underlying camera models and geometries in order to understand what parameters influence the measurement of real world objects in video images. Based on these concepts, one can determine the factors that influence accuracy and precision.

Extracting information about real world objects from images in machine vision applications is closely related to the area of photogrammetry. In [5], photogrammetry is defined as the art, science, and technology of obtaining reliable information about physical objects and the environment through the processes of recording, measuring, and interpreting photographic images and patterns of electromagnetic radiant energy and other phenomena. There are many traditional applications of photogrammetry in geography, remote sensing, medicine, archaeology, or crime detection. In machine vision applications, there is a wide range of measuring tasks including dimensional measuring (size, distance, diameter, etc.) or angles. Although sophisticated algorithms can increase accuracy, the quality and repeatability of measurements is always related to the hardware used (e.g. camera sensor, optical system, digitizer) as well as the environmental conditions (e.g. illumination).

2.1.1. Accuracy and Precision

Throughout this thesis, the terms accuracy and precision are used quite often and are mostly related to measuring quality. Although these terms may be used synonymously in a different context, with respect to measurements they have very distinct meanings.

Accuracy relates a measured length to a known reference truth or ground truth. The closer a measurement approximates the ground truth, the more accurate is the measuring system. Precision represents the repeatability of measurements, i.e. how much different measurements of the same object vary. The more precise a measuring system is, the closer the measured values lie together.

Figure 2.1 visualizes the definition of accuracy and precision in a mathematical sense. The distribution of a set of measurements can be expressed in terms of a Gaussian probability density function. The peak of this distribution corresponds to the mean value of the measurements. The distance between the mean value and the reference ground truth value determines the accuracy of this measurement. The standard deviation of the distribution can be used as a measure of precision.

It is important to state that accuracy does not have to imply precision and vice versa. For example, the measuring result of a tube of 50mm length could be 50 ± 20mm. This statement is very accurate but not very precise. On the other hand, a measuring system can be very precise, but not accurate if it is not calibrated correctly. Thus, good measurements for industrial inspection tasks have to be both accurate and precise.
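As a minimal numerical illustration of these two definitions (not part of the original thesis; the sample values are invented), accuracy can be estimated as the deviation of the sample mean from the ground truth, and precision as the sample standard deviation:

```python
import statistics

# Hypothetical repeated measurements of one 50mm tube (values invented
# for illustration; the thesis reports its own data in Chapter 5).
measurements = [50.02, 49.97, 50.05, 49.99, 50.01]
ground_truth = 50.0

mean = statistics.mean(measurements)
accuracy_error = abs(mean - ground_truth)   # distance of mean to ground truth
precision = statistics.stdev(measurements)  # spread of repeated measurements

print(f"mean = {mean:.3f}mm, accuracy error = {accuracy_error:.3f}mm, "
      f"precision (std dev) = {precision:.3f}mm")
```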


Figure 2.1: Visualization of the difference between accuracy and precision in terms of measurements. A good measuring system must be both accurate and precise.

2.1.2. Inverse Projection Problem

A general problem of human vision, denoted as the inverse projection problem [27], also applies to artificial systems. It states that the (perspective) projection of three-dimensional world objects onto a two-dimensional image plane cannot be inverted in a well-defined manner. The loss of dimension indicates a loss of information, which cannot be compensated in general, since it is possible to produce the same stimulus on the human retina or the camera sensor by different origins. So several objects of different size or shape can look identical in an image. One important property to consider in this context is the influence of perspective. The term perspective is further discussed in Section 2.1.3.

Humans can compensate for the inverse projection problem by certain heuristics and model knowledge of the scene in many situations. Similar techniques can be adapted to artificial systems. Especially in machine vision applications where conditions are well-defined and known, model knowledge of the inspection task can be derived and integrated.

2.1.3. Camera Models

There are several approaches to model the geometry of a camera. Addressing all these models is outside the scope of this thesis. In the following, only the most common camera models are introduced that provide a theoretical basis for visual measurements with CCD cameras.

Pin Hole Camera The simplest form of a camera, known as camera obscura, was invented in the 16th century. The underlying principle of this camera was already known long before by Aristotle (384-322 BC): Light enters an image plane through an (ideally) infinitely small hole, so only one ray of light from the world passes through the hole for each point in the 2D image plane, leading to a one-to-one correspondence. Objects at a wide range of distances from the camera can be imaged sharp and undistorted [65, 73]. The camera obscura is formally named pin hole camera. In the non-ideal case, the pinhole has a finite size; thus, each image point collects light from a cone of rays.


Figure 2.2: Parallel lines intersect at the horizon under perspective. Image taken by F. Wagenfeld at Alaska Highway between Watson Lake and Whitehorse, Canada.

In the 15th century, Filippo Brunelleschi used the pin hole camera model to demonstrate the laws of perspective discovered earlier [24, 38]. Two main effects characterize the pin hole perspective or central perspective:

- Close objects appear larger than far ones.
- Parallel lines intersect at the horizon.

Figure 2.2 visualizes these effects of perspective with an example.

A drawback of the pinhole camera with respect to practical use in combination with a photosensitive device is its long exposure time, since only a little amount of light enters the image plane at one time [65]. However, the pinhole model can be used to derive fundamental properties in a mathematical sense that describe the imaging process. These properties can be extended by more realistic models to cover real imaging devices.

Figure 2.3(a) gives an overview of the pinhole geometry. The camera center O, also denoted as optical center or center of projection, is the origin of a 3D coordinate system with the axes X, Y and Z. This 3D coordinate system is denoted as camera reference frame or simply camera frame. The image plane $\Pi_I$ is defined to be parallel to the XY plane, i.e. perpendicular to the Z axis. The point o where the Z axis intersects the image plane is referred to as the image center. The Z axis, i.e. the line through O and o, is denoted as the optical axis.

The fundamental equations of a perspective camera describe the relationship between a point $P = (X, Y, Z)^T$ in the camera frame and a point $p = (x, y)^T$ in the image plane:

$$x = f \frac{X}{Z} \qquad (2.1)$$
$$y = f \frac{Y}{Z} \qquad (2.2)$$

where $f$ is the focal length of the camera. $p$ can be seen as the point of intersection of a line through $P$ and the center of projection with the image plane $\Pi_I$ [30]. This relationship can be easily derived from Figure 2.3(b).
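A short numeric sketch of equations (2.1) and (2.2) (illustrative only; the point and focal length are invented values, not from the thesis):

```python
# Perspective projection of a 3D camera-frame point onto the image plane,
# following equations (2.1) and (2.2). Values are invented for illustration.

f = 0.016                  # focal length in meters (16mm lens, assumed)
X, Y, Z = 0.05, 0.02, 0.5  # point in the camera frame, in meters

x = f * X / Z              # eq. (2.1)
y = f * Y / Z              # eq. (2.2)

print(f"image point: ({x*1000:.2f}mm, {y*1000:.2f}mm)")
# Doubling Z halves the image coordinates: close objects appear larger.
```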


Figure 2.3: (a) Pinhole geometry. (b) Projection of a point P in the camera frame onto the image plane $\Pi_I$ (here with regard to Y).

In the following, lower-case letters will always indicate image coordinates, while upper-case letters refer to 3D coordinates outside the image plane.

Weak-Perspective Camera If the relative distance between points in the camera frame with respect to the Z axis (scene depth) is small compared to the average distance from the camera, these points are approximately projected onto the image plane as if they were all lying on one Z-plane $Z_0$. Thus, the Z coordinate of each point can be approximated by $Z_0$ as:

$$x \approx f \frac{X}{Z_0} \qquad y \approx f \frac{Y}{Z_0} \qquad (2.3)$$

This has the effect of all points being projected with a constant magnification [24]. If the distance between camera and plane $Z_0$ increases to infinity, there is a direct mapping between 3D points in the camera frame and in the image plane:

$$x = X \qquad y = Y \qquad (2.4)$$

This projection is denoted as orthographic projection [65]. To overcome the described problems of pinhole cameras, real imaging systems are usually provided with a lens which collects rays of light and brings them into focus on the image plane.
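To make the weak-perspective approximation of equation (2.3) concrete, the following sketch (all values invented) compares the full perspective projection with the approximation for a shallow scene:

```python
# Weak-perspective approximation (eq. 2.3) vs. full perspective (eq. 2.1)
# for points whose depth varies only slightly around a reference plane Z0.
# All values are invented for illustration.

f, Z0 = 0.016, 0.50
points = [(0.05, 0.49), (0.05, 0.50), (0.05, 0.51)]  # (X, Z) pairs, shallow depth

for X, Z in points:
    exact = f * X / Z    # full perspective
    approx = f * X / Z0  # weak perspective: constant magnification f/Z0
    print(f"Z={Z}: exact={exact*1000:.3f}mm, approx={approx*1000:.3f}mm")
# The error stays below a few percent because |Z - Z0| << Z0.
```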


Thin Lens Camera The simplest optical system can be modeled by a thin lens. The main characteristics of a thin lens are [65]:

- Any ray entering the lens parallel to the axis on one side goes through the focus on the other side.
- Any ray entering the lens from the focus on one side emerges parallel to the axis on the other side.

Figure 2.4: Thin lens camera model.

The geometry of a thin lens imaging system is shown in Figure 2.4. $F$ and $\hat{F}$ are the focus points in front of and behind the lens. From this model one can derive the fundamental equation of thin lenses [65]:

$$\frac{1}{Z} + \frac{1}{z} = \frac{1}{f} \qquad (2.5)$$

where $Z$ is the distance or depth of a point to the lens and $z$ the distance between the lens and the image plane. The focal length $f$, i.e. the distance between the focus point and the lens, is equal on both sides of the thin lens in the ideal model.

Thick Lens Camera Real lenses are represented much better by a thick lens model. The thin lens model does not consider several aberrations that come with real lenses. This includes defocusing of rays that are neither parallel nor go through the focus (spherical aberration), different refraction based on the wavelength or color of light rays entering the lens (chromatic aberration), or focusing of objects at different depths. Another factor that is important with real lenses with respect to accurate measuring applications is lens distortion. Ideally, a world point, its image point and the optical center are collinear, and world lines are imaged as lines [30]. For real cameras this model does not hold. Especially at the image boundaries, straight lines appear curved (radially distorted). The effect of distortion will be re-addressed in the following sections.
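As a quick worked example of equation (2.5) (the values are invented for illustration): for a lens with $f = 16$mm focused on an object at $Z = 500$mm, the image plane distance $z$ follows from solving the thin lens equation:

$$\frac{1}{z} = \frac{1}{f} - \frac{1}{Z} = \frac{1}{16\,\text{mm}} - \frac{1}{500\,\text{mm}} \approx 0.0605\,\text{mm}^{-1} \quad\Rightarrow\quad z \approx 16.53\,\text{mm}$$

The image plane thus sits slightly behind the focal point; as $Z \to \infty$, $z \to f$.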


2.1.4. Camera Calibration

Until now, all relationships between 3D points and image coordinates have been defined with respect to a common (camera) reference frame. Usually, the location of a point in the world is not known in camera coordinates. Thus, if one wants to relate world coordinates to image coordinates, or vice versa, one has to consider geometric models and physical parameters of the camera. At this stage, one can distinguish between intrinsic and extrinsic parameters [24].

Intrinsic Parameters The intrinsic parameters describe the projection of a point in the camera frame onto the image plane, i.e. the transformation of camera coordinates into image coordinates. This transformation extends the ideal perspective camera model introduced in the previous section with respect to properties of real CCD cameras. One can derive the following projection matrix $M_i$:

$$M_i = \begin{pmatrix} -f/s_x & k & o_x \\ 0 & -f/s_y & o_y \\ 0 & 0 & 1 \end{pmatrix} \qquad (2.6)$$

where $f$ represents the focal length, $s_x$ and $s_y$ the effective pixel size in x and y direction respectively, $k$ the skew coefficient, and $(o_x, o_y)$ the coordinates of the image center. $\alpha = s_y/s_x$ is the aspect ratio of the camera. If $\alpha = 1$, the sensors of the CCD array are ideally square. The skew coefficient $k$ determines the angle between the pixel axes and is usually zero, i.e. the x and y axes are perpendicular. $(o_x, o_y)$ can be seen as an offset that translates the projection of the camera origin onto the image origin in pixel dimensions. If $s_x = s_y = 1$ and $o_x = o_y = k = 0$, $M_i$ represents an ideal pinhole perspective camera.

Extrinsic Parameters The extrinsic parameters take the transformation between a fixed world coordinate system (or object coordinate system) and the camera coordinate system into account. This includes the translation and rotation of the coordinate axes [65], i.e. a translation vector $T = (T_x\ T_y\ T_z)^T$ and a $3 \times 3$ rotation matrix $R$, such that:

$$M_e = \begin{pmatrix} r_{11} & r_{12} & r_{13} & -R_1^T T \\ r_{21} & r_{22} & r_{23} & -R_2^T T \\ r_{31} & r_{32} & r_{33} & -R_3^T T \end{pmatrix} \qquad (2.7)$$

where $r_{ij}$ ($i, j \in \{1, 2, 3\}$) are the matrix elements of $R$ at $(i, j)$ and $R_i$ indicates the $i$th row of $R$.

Thus, the relationship between world and image coordinates can be written in terms of two matrix multiplications [65]:

$$\begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix} = M_i M_e \begin{pmatrix} X \\ Y \\ Z \\ 1 \end{pmatrix} \qquad (2.8)$$

with $(X, Y, Z, 1)^T$ representing a 3D world point in homogeneous coordinates; image coordinates can be computed as $x = x_1/x_3$ and $y = x_2/x_3$ respectively. $M = M_i M_e$ is denoted as the projection matrix in the following.

Image Distortion The resulting image coordinates may be distorted by the lens, i.e. linear projection is not guaranteed. If high accuracy and precision are required, the simple mathematical relationships introduced before are not sufficient. To overcome this effect, a model of the distortion has to be defined.
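The full projection chain of equation (2.8) can be sketched in a few lines of numpy (a sketch under assumed values; the matrices below are illustrative and not the calibration results of the thesis):

```python
import numpy as np

# Illustrative projection of a homogeneous world point via M = Mi * Me
# (eq. 2.8). All parameter values are invented for demonstration.

f, sx, sy = 0.016, 1e-5, 1e-5  # focal length and pixel sizes [m], assumed
ox, oy, k = 320.0, 240.0, 0.0  # image center [px] and skew, assumed

Mi = np.array([[-f/sx, k,     ox],
               [0.0,  -f/sy,  oy],
               [0.0,   0.0,   1.0]])   # intrinsic matrix, eq. (2.6)

R = np.eye(3)                   # camera aligned with the world frame
T = np.array([0.0, 0.0, -0.5])  # world origin 0.5m in front of the camera
Me = np.hstack([R, (-R @ T).reshape(3, 1)])  # extrinsic matrix, eq. (2.7)

P_world = np.array([0.05, 0.02, 0.0, 1.0])   # homogeneous world point
x1, x2, x3 = Mi @ Me @ P_world
print(f"pixel coordinates: ({x1/x3:.1f}, {x2/x3:.1f})")
```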


A common radial distortion model [30] can be written as:

$$\begin{pmatrix} x_d \\ y_d \end{pmatrix} = L(\tilde{r}) \begin{pmatrix} \tilde{x} \\ \tilde{y} \end{pmatrix} \qquad (2.9)$$

where $(\tilde{x}, \tilde{y})^T$ is the undistorted and $(x_d, y_d)^T$ the corresponding distorted image position. The function $L(\tilde{r})$ determines the amount of distortion depending on the radial distance $\tilde{r} = \sqrt{\tilde{x}^2 + \tilde{y}^2}$ from the center of radial distortion.

The correction of the distortion at a measured position $p = (x, y)^T$ can be computed as:

$$\hat{x} = x_c + L(r)(x - x_c) \qquad (2.10)$$
$$\hat{y} = y_c + L(r)(y - y_c) \qquad (2.11)$$

where $(\hat{x}, \hat{y})^T$ is the undistorted (corrected) position, $(x_c, y_c)^T$ the center of the radial distortion, and $r = \sqrt{(x - x_c)^2 + (y - y_c)^2}$ the radial distance between $p$ and the center of distortion.

An arbitrary distortion factor $L(r)$ can be approximated by the following equation [30]:

$$L(r) = 1 + \sum_{i=1}^{m} \kappa_i r^i \qquad (2.12)$$

which is defined for $r > 0$ with $L(0) = 1$. The distortion coefficients $\kappa_i$ as well as the center of radial distortion $(x_c, y_c)^T$ can be seen as additional intrinsic parameters of the camera model. The number of coefficients $m$ depends on the required accuracy and the available computation time. Usually no more than the first three or four coefficients are considered. In common calibration procedures such as the calibration method proposed by Tsai [67], only the even coefficients (i.e. $\kappa_2$, $\kappa_4$, ...) are taken into account while odd coefficients are set to zero. In this case, one or two coefficients are sufficient to compensate for the distortion in most cases [31].

Beside the radial distortion model, there are several other models including tangential, linear, and thin prism distortion [31]. Usually a radial distortion model is combined with a tangential model as proposed in [11, 12].
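A compact sketch of the correction in equations (2.10)-(2.12) (coefficient values invented; two even coefficients as in the Tsai-style parameterization mentioned above):

```python
import math

# Radial distortion correction following eqs. (2.10)-(2.12).
# Coefficients and the distortion center are invented example values.

xc, yc = 320.0, 240.0             # assumed center of radial distortion [px]
kappa = {2: 1.2e-7, 4: -3.0e-14}  # even coefficients only, as in Tsai [67]

def L(r: float) -> float:
    """Distortion factor L(r) = 1 + sum_i kappa_i * r^i (eq. 2.12)."""
    return 1.0 + sum(k * r**i for i, k in kappa.items())

def correct(x: float, y: float) -> tuple[float, float]:
    """Apply eqs. (2.10) and (2.11) to a measured pixel position."""
    r = math.hypot(x - xc, y - yc)
    return xc + L(r) * (x - xc), yc + L(r) * (y - yc)

print(correct(600.0, 420.0))  # a point near the image border moves outward
```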


There are several approaches to compute the unknown intrinsic and extrinsic parameters of a camera. The most common methods are based on known correspondences between real world points and image coordinates. A chessboard-like calibration grid has become quite common as a calibration pattern. The corners of the grid provide a set of coplanar points.

The world coordinates can be easily determined if one defines a coordinate system with the X and Y axes lying orthogonal in the chessboard plane and Z = 0 for all points. A corner not too close to the center represents the world origin. Based on these definitions, each corner of the calibration pattern can be described in the form $(X, Y, Z)^T$. Three-dimensional calibration rigs composed of orthogonal chessboard planes are also used quite often.

In a captured image of the calibration pattern, the corners can be extracted at pixel (or subpixel) level, and mapped to world coordinates. If there is a sufficient number of correspondences, one can try to solve a homogeneous linear system of equations based on the projection matrix M. The solution is also denoted as implicit camera calibration, since the resulting parameters do not have any physical meaning [31]. In the next stage, the intrinsic and extrinsic camera parameters can be extracted from the computed solution of M [21].

There are linear and nonlinear methods to solve for the projection matrix. Linear methods assume an ideal pinhole camera and ignore distortion effects. Thus, these methods can be solved in closed form. Abdel-Aziz and Karara [1] introduced a direct linear transform (DLT) to compute the parameters in a noniterative algorithm. If higher accuracy is needed, nonlinear optimization techniques have been investigated to account for distortion models. Usually, the parameters are estimated by minimizing the pixel error between a measured point correspondence and the reprojected position of the world point using the projection matrix, in a least-squares sense. This is an iterative process that may end up with a bad solution unless a good initial guess is available [70]. Therefore, linear and nonlinear methods are combined, as the DLT can be used for initialization of the nonlinear optimization. One well-known two-step calibration method was proposed by Tsai [67, 39].
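For a modern point of reference (not the tooling used in this 2005 thesis), the chessboard-based workflow described above is available in OpenCV; a minimal sketch, assuming a 9x6 inner-corner board and image files matching `calib_*.png` (both assumptions):

```python
import glob
import cv2
import numpy as np

# Chessboard calibration sketch using OpenCV (illustrative only; the
# thesis does not prescribe this library). Board size and file pattern
# are assumptions.
pattern = (9, 6)
objp = np.zeros((pattern[0] * pattern[1], 3), np.float32)
objp[:, :2] = np.mgrid[0:pattern[0], 0:pattern[1]].T.reshape(-1, 2)  # Z = 0 plane

obj_points, img_points = [], []
for path in glob.glob("calib_*.png"):
    gray = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    found, corners = cv2.findChessboardCorners(gray, pattern)
    if found:
        obj_points.append(objp)
        img_points.append(corners)

# Returns the RMS reprojection error, the intrinsic matrix, the distortion
# coefficients (radial and tangential), and per-image extrinsics.
rms, K, dist, rvecs, tvecs = cv2.calibrateCamera(
    obj_points, img_points, gray.shape[::-1], None, None)
print("RMS reprojection error:", rms)
```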


2.2. Illumination

In machine vision applications, the right choice of illumination can simplify the further image processing considerably [16]. It can preprocess the input signal (e.g. enhance contrast or object boundaries, eliminate background, diminish unwanted features, etc.) without consuming any computational power. On the other hand, even the best imaging sensor cannot compensate for the loss in image quality induced by poor illumination.

There are several different approaches to illumination in MV. Depending on the application, one has to consider what has to be inspected (e.g. boundaries, surface patterns or color), what the material properties of the objects to be inspected are (e.g. lighting reflection characteristics or translucency) and what the environmental conditions are (e.g. background characteristics, object dimension, camera position or the available space to install light sources). In the following, different types of light sources and lighting setups used in MV are introduced.

2.2.1. Light Sources

The light sources commonly used in machine vision include high-frequency fluorescent tubes, halogen lamps, xenon light bulbs, laser diodes and light emitting diodes (LED) [18].

High-frequency fluorescent lights High-frequency fluorescent light sources are widely used in machine vision applications, since they produce a homogeneous, uniform, and very bright illumination. They feature white or ultraviolet light at low development of heat; thus, there is no need for fan cooling.

Standard fluorescent light tubes are not suitable for vision applications since they flicker cyclically with the power supply frequency. This yields unwanted changes in intensity or color in the video image, whereas the effect increases if the capturing rate of the camera is close to the power supply frequency (e.g. 50Hz in Germany). High-frequency fluorescent tubes alternate at about 25kHz, which is far beyond what can be captured by a video camera.

Fluorescent lights exist in different sizes, shapes, and setups. Beside the common light tube, there are also fluorescent ring lights or rectangular area lights. Low costs and a long lifetime make fluorescent lights even more attractive.

Light Emitting Diodes A LED is a semiconductor device that emits incoherent, monochromatic light with the wavelength depending on the chemical composition of the semiconductor. Today, different wavelengths of the spectrum visible to humans, ranging from about 400 to 780nm, as well as ultraviolet or infrared wavelengths, can be covered by LEDs. The emitted visible light appears for example red, green, blue or yellow. Furthermore, it is possible to produce LEDs that appear "white" by combining a blue LED with a yellowish phosphor coating.

LEDs have many advantages compared to other light sources. Due to their small size, they can be used for a variety of lighting geometries [18]. This includes ring lights, dome lights, area or line lights, spot lights, dark-field lights and backlights. Theoretically, each single LED in a cluster can be controlled independently. Thus, it is possible to generate different illumination conditions (for example different lighting angles or intensities) with a single setup by enabling and disabling certain LEDs, e.g. automated or software controlled. It is also possible to use LEDs in strobe light mode.

Another advantage of LEDs is their energy efficiency and long lifetime with only a little loss in intensity over time. Thus, LEDs have low maintenance costs. Operated at DC power, LEDs do not produce any flickering visible as intensity changes in the video image.

Halogen lights Halogen lamps are an extension of light bulbs and are filled with a halogen gas (e.g. bromine or iodine). With respect to machine vision applications, halogen lamps are often used in combination with fiber optic light guides [18]. The emitted light of a light source is transferred through these fiber optic light guides, allowing for very flexible illumination setups and geometries. This includes ring lights, dome lights, area or line lights, spot lights, dark-field lights and backlights, as for LEDs. Furthermore, there is a range of fiber optic bundles of different sizes to route and position the light for user-defined lighting.

One disadvantage of halogen lamps is a large heat development. Thus, usually active cooling is required. Nevertheless, due to the bright "white" light emitted by halogen lights (color temperature of about 6000K), they are also called cold light sources in the literature. If heat development of the light source can be harmful to heat sensitive objects, fiber optics can be useful to keep the light source away from the point of inspection. Like LEDs, halogen lamps do not produce flickering effects if the light source is DC-regulated. Thus, halogen lamps qualify for high accuracy inspection tasks.

Xenon lights, often used in strobe light mode, are quite similar to halogen lamps. These lights allow for very short and bright light pulses, which are used to reduce the effect of motion blur.

Beside the different ways of light generation, there are multiple possible setups of how light sources are arranged. Especially LED lights and fiber optics are very flexible, as introduced before. They can be adapted to a wide range of machine vision tasks at almost any size and geometry.


Figure 2.5: Incident lighting setups. (a) Indirect diffuse illumination over a hemisphere. (b) Diffuse ring or area light setup. (c) Darkfield illumination. (d) Coaxial illumination.

2.2.2. Incident Lighting

Incident lighting or front lighting is characterized by one or more light sources illuminating the object of interest from the camera's viewing direction. This includes diffuse front lighting, directional front lighting, polarized light, axial/in-line illumination and structured lighting [18]. Figure 2.5 gives an overview of different incident lighting setups.

Diffuse Lighting Diffuse lighting reduces specular surface reflections and can be seen as a uniform, undirected illumination. It is usually generated by one or more light sources that are placed behind a diffuser at a certain distance. This yields the effect of one uniform area of light. A diffuser can be a pane of white translucent (acrylic) glass, mylar or other synthetic material. Instead of using a diffuser in front of a light source, indirect lighting can also result in diffuse illumination. A simple but effective method reported in the literature [7] converts a Chinese wok into a hemisphere for diffuse lighting. The inner side of the wok is painted white. The camera can be placed at a hole on the top of the hemisphere (the bottom of the wok). The light sources are arranged in a way that they cannot directly illuminate the object, but the emitted light is reflected at the white screen inside the hemisphere (see Figure 2.5(a)). A diffuse illumination can also be achieved using a ring or area light source as in Figure 2.5(b).

Directional Lighting Directional lighting is achieved by one or more directed light sources at a very low angle of incidence.


The main characteristic of this type of illumination is that completely smooth objects appear dark in the image, since the light rays are not reflected toward the camera, while unevenness leads to brighter image intensities. Due to this effect, directional lighting is also denoted as dark field illumination in the literature [18] (see Figure 2.5(c)). Directional lighting mainly qualifies for surface inspection tasks that examine the surface structure, revealing irregularities or bumpiness.

Polarized light In combination with a polarizing filter in front of the camera lens, incident lighting with polarized light can be used to avoid specular reflections. Such reflections preserve the polarization of a light ray; thus, with the right choice of filter, only scattered light rays can pass the filter and reach the camera. A maximal filter effect can be reached if the polarization of the light source and the filter are perpendicular to each other. Polarized light is often combined with a ring light setup to avoid both shadows and reflections.

Structured lighting Structured lighting is used to obtain three-dimensional information about objects. A certain pattern of light (e.g. crisp lines, grids or circles [18]) is projected onto the object. Based on deformations of this known pattern in the image, one can infer the object's three-dimensional characteristics. For example, in [58], a 3D scanner using structured lighting is presented that integrates a real-time range scanning pipeline. In machine vision applications, structured lighting can be used for dimensional measuring tasks where the contrast between object and background is poor.

Axial illumination In this type of illumination setup (see Figure 2.5(d)), also denoted as coaxial illumination in the literature, the light rays are directed to run along the optical axis of the camera [18]. This is achieved using an angled beam splitter or half-silvered mirror in combination with a diffuse light source. The beam of light usually has the same size as the camera's field of view. The main application of axial illumination systems is to illuminate highly reflective, shiny materials such as plastic, metal or other specular materials, or for example to inspect the inside of bore holes. Axial illumination is typically used for the inspection of small objects such as electrical connectors or coins.

One potential problem with most incident lighting methods is shadows. Although the shadow contrast can be lowered using several light sources at different positions around the object (e.g. ring lights) or axial illumination setups, objects with sharp corners or concavities might have regions that cannot be illuminated; therefore, especially regions close to the object's boundaries appear darker in the image. Thus, dark objects on a bright background may appear enlarged [16]. The effect of shadows is less significant for bright objects on a dark background. In applications that require totally shadow-free conditions for highly accurate measurements of object contours, another lighting setup called back lighting can be used, as introduced in the following.

2.2.3. Back lighting

The setup where the object is placed between the light source and the camera, in contrast to incident lighting, is denoted as back light illumination. In this arrangement, the light enters the camera directly, leading to bright intensity values at non-occluded regions. The object, on the other hand, casts a shadow on the image plane, thus leading to darker intensity values.


Non-translucent materials result in a very strong, shadow-free contrast, which makes back lighting interesting for dimensional measuring tasks. Furthermore, surface structures or textures can be suppressed. If the only light source is placed below the object, there will be no shadows at all. Back lighting can also be used for the localization of holes and cracks, or for measuring translucency.

In combination with polarized light, back lighting can also be adapted to enhance the contrast of transparent materials, which are difficult to detect in an image with other lighting setups. In a typical scenario, polarized light entering the camera directly is filtered out by an adequate polarization filter in front of the camera lens, while the polarization of the light is changed when passing through the object. Thus, in opposition to back lighting without polarization, background regions appear dark in the image while (translucent) objects result in brighter intensities. Figure 3.9 in Section 3.3 visualizes the effect of back lighting in combination with a polarization filter.

2.3. Edge Detection

An edge can be defined as a particularly sharp change in (image) brightness [24], or, mathematically speaking, a strong discontinuity of the spatial image gray level function [36].

Beside edges due to object boundaries, there are many more causes for edges in images such as shadows, reflectance, texture or depth. Thus, simply extracting edges in images is not a general indicator for object boundaries. To yield a semantic meaning, edge information can be combined with other features including shape, color, texture, or motion. Model knowledge about expected properties can be useful to group these low-level features into objects.

In real images there are many changes in brightness (or color), but with respect to a certain application it may be of interest to extract only the strongest edges or edges of a certain orientation. Thus, information such as edge strength and orientation has to be taken into account to link the results of the filter response. Furthermore, in real images there is also a certain amount of noise in the data which has to be handled carefully.

2.3.1. Edge Models

Edges can be modeled according to their intensity profiles [65]. The two edge models considered in this thesis are shown in Figure 2.6.

The ideal step edge is the basis for most theoretic approaches. It can be defined as:

    f(x) = i1 for x < x0,   f(x) = i2 for x >= x0

where i1 and i2 denote the intensities on either side of the edge position x0.


Figure 2.6: (a) Ideal step edge model. (b) Ramp edge model.

In real images, however, sampling and optical blur spread the step transition over several pixels, which is captured by the ramp edge model. Exploiting the slope of such a ramp, subpixel techniques can reach a higher precision than the discrete pixel grid (see Section 2.3.4). Ramp edges also appear if an object is not in focus, or if imaged in motion (motion blur).

There are three common criteria for optimal edge detectors proposed by Canny:

- Good detection
- Good localization
- Uniqueness of response

The first criterion states that an optimal edge detector must not be affected by noise, i.e. it must be robust against false positives (edges due to noise). On the other hand, edges of interest have to be preserved.

The good localization criterion takes into account the precision of the detected edge position. The distance between the real edge location and the detected position must vanish.

The last criterion requires distinct and unique responses, where only the local maximum of an edge is relevant. Responses of more than one pixel describe an edge location only poorly and should be suppressed.

The Canny edge detector [13] is designed to optimize all three criteria (see Section 2.3.3 for more details). However, there is a definite tradeoff between the detection and localization criteria, since it is not possible to improve both simultaneously [65].

2.3.2. Derivative Based Edge Detection

A common way to localize strong discontinuities in a mathematical function is to search for local extrema in the function's first-order derivative or for zero crossings in the second-order derivative. This principle can be easily adapted to images, thus replacing the problem of edge detection by a search for extrema or zero crossings.

In the discrete case, differentiation of the image gray level function f(x, y) can be approximated by finite differences. Since an image can be seen as a two-dimensional function, it can be differentiated in both the horizontal and vertical direction, i.e. with respect to the x- and y-axis respectively.


Following the notation of [21], the partial derivatives ∂f/∂x and ∂f/∂y can be calculated as:

    ∂f/∂x (x, y) ≅ Δx f(x, y) = f(x+1, y) − f(x, y)    (2.14)
    ∂f/∂y (x, y) ≅ Δy f(x, y) = f(x, y+1) − f(x, y)    (2.15)

The partial derivative operators Δx and Δy can be expressed by a discrete convolution of the image with the filter kernels [1 −1] and [1 −1]ᵀ for the x- and y-direction respectively (for such asymmetric kernels, one element has to be designated as the kernel center). Other approximations are possible, including the mirrored versions of the kernels above or the symmetric kernel ½ [1 0 −1] [36].

Accordingly, the second-order derivatives can be approximated by the discrete operators Δ²x = [1 −2 1] and Δ²y = [1 −2 1]ᵀ.

In the presence of noise (as usual in real images), edge detectors using the approximations introduced before perform only poorly. This is due to the fact that noise is mostly uncorrelated and is characterized by local changes in intensity. In a uniform region, a good edge detector should produce a response of zero. With noise, however, the local intensity variations lead to noticeable responses (and local extrema) when using estimates of partial derivatives. Therefore, all common edge detectors include a certain smoothing step to reduce the influence of noise. The selection of the smoothing function can differ between approaches; the most common smoothing function is a Gaussian.

The Gaussian function is a widespread choice, since it comes with several advantages. One is the property that convolving a Gaussian with a Gaussian results in another Gaussian. Assume a Gaussian function G1 with standard deviation σ1 and G2 with standard deviation σ2. The result of convolving G1 and G2 is a Gaussian with standard deviation σ_{G1∗G2}:

    σ_{G1∗G2} = √(σ1² + σ2²)    (2.16)

Thus, instead of re-smoothing a smoothed image to obtain a stronger smoothing, it is possible to use a single convolution with a Gaussian of larger standard deviation. This obviously saves computational costs, which is important since convolution is an expensive operation.

Another advantage of a Gaussian kernel is its separability. This means a two-dimensional, circularly symmetric Gaussian function Gσ(x, y) can be factored into two one-dimensional Gaussians (see [24]) as:

    Gσ(x, y) = (1 / (2πσ²)) exp(−(x² + y²) / (2σ²))
             = [ (1 / (√(2π) σ)) exp(−x² / (2σ²)) ] · [ (1 / (√(2π) σ)) exp(−y² / (2σ²)) ]    (2.17)

Since convolution is an associative operation, the same result can be achieved by convolving an image with the two-dimensional kernel, or by applying a convolution with the separated version in x-direction and convolving the result with the y-version.
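Both properties can be illustrated with a short Python sketch (using NumPy and SciPy; the variable names and image sizes are hypothetical and not taken from the thesis implementation):

    import numpy as np
    from scipy.ndimage import convolve1d

    def gaussian_kernel(sigma):
        # Discrete, normalized 1-D Gaussian sampled out to about 3 sigma.
        radius = int(3 * sigma + 0.5)
        x = np.arange(-radius, radius + 1)
        k = np.exp(-x**2 / (2.0 * sigma**2))
        return k / k.sum()

    img = np.random.rand(160, 780)  # stand-in for a camera frame

    # Separability: one 1-D pass per axis instead of a single 2-D pass.
    k = gaussian_kernel(2.0)
    smooth = convolve1d(convolve1d(img, k, axis=0), k, axis=1)

    # Composition (Eq. 2.16): smoothing with sigma1, then sigma2, equals one
    # pass with sqrt(sigma1^2 + sigma2^2), up to kernel truncation error.
    s1, s2 = 1.5, 2.0
    twice = convolve1d(convolve1d(img, gaussian_kernel(s1)), gaussian_kernel(s2))
    once = convolve1d(img, gaussian_kernel(np.hypot(s1, s2)))
    print(np.abs(twice - once).max())  # small residual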


In practice, a convolution with a discrete N × N kernel can therefore be replaced by two convolutions with an N × 1 kernel. This increases the performance significantly for large images and large N. More information about convolution and filter separation can be found for example in [64].

The general procedure of edge enhancement in common derivative-based edge detectors can be summarized in two steps:

1. Smoothing of the image by convolving with a smoothing function
2. Differentiation of the smoothed image

Mathematically, this can be expressed as follows (here with respect to x):

    I_edge(x, y) = K_{∂/∂x} ∗ (S ∗ I(x, y))    (2.18)
                 = (K_{∂/∂x} ∗ S) ∗ I(x, y)
                 = (∂S/∂x) ∗ I(x, y)

where K_{∂/∂x} indicates the filter kernel approximating the partial derivative with respect to x, and S represents the kernel of the smoothing function. Again, the associativity of the convolution can be used to optimize processing. Thus, instead of first smoothing the image with kernel S and then calculating the partial derivative, it is possible to reduce the problem to a single convolution with the partial derivative of the smoothing kernel, ∂S/∂x. Hence, the first-order derivative of a Gaussian is suited as an edge detector which is less sensitive to noise compared to finite difference filters [24]. The response of the edge detector can be parametrized by the standard deviation of the Gaussian to control the scale of the detected edges, i.e. the level of detail. A larger σ suppresses high-frequency edges, for example.

2.3.3. Common Edge Detectors

Due to the large number of approaches, only a selection of common edge detectors can be presented in this section. Figure 2.7 visualizes the edge responses of different edge detectors that will be introduced in the following in more detail.

Sobel Edge Detector A very early edge detector that is still used quite often today is the Sobel operator. It was first described in [51] and attributed to Sobel. It is the smallest difference filter with an odd number of coefficients that averages the image in the direction perpendicular to the differentiation [36]. The corresponding filter kernels for x and y are:

    SOBEL_X = [ 1  0 −1 ]
              [ 2  0 −2 ]    (2.19)
              [ 1  0 −1 ]

    SOBEL_Y = [  1  2  1 ]
              [  0  0  0 ]    (2.20)
              [ −1 −2 −1 ]
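The Sobel kernel is itself an instance of the two-step scheme of Equation 2.18: a smoothing kernel [1 2 1] perpendicular to the differentiation, and a difference kernel [1 0 −1] along it. A minimal sketch (NumPy/SciPy; hypothetical names, not the thesis implementation) verifies that the separable and the 2-D formulations agree:

    import numpy as np
    from scipy.ndimage import convolve, convolve1d

    img = np.zeros((64, 64))
    img[:, 32:] = 1.0  # ideal vertical step edge

    # Separable form: smooth along y, differentiate along x.
    gx = convolve1d(convolve1d(img, [1, 2, 1], axis=0), [1, 0, -1], axis=1)

    # Equivalent 2-D form, built as the outer product of the two factors.
    SOBEL_X = np.outer([1, 2, 1], [1, 0, -1])
    gx_2d = convolve(img, SOBEL_X)
    print(np.abs(gx - gx_2d).max())  # ~0: both factorizations agree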


Figure 2.7: Comparison of different edge detectors. (a) Common LENA test image. (b) Gradient magnitude based on the Sobel operator. (c) Edges enhanced via the discrete Laplace operator. (d) Result of the Canny edge detector (hysteresis thresholds: 150, 100).


The operators in Equations 2.19 and 2.20 compute the horizontal and vertical components of a smoothed gradient [21], denoted as gx and gy in the following. The total gradient magnitude g at a pixel position p in an image can be computed by the following equation:

    g(p) = √( gx²(p) + gy²(p) )    (2.21)

An example of the gradient magnitude based on the Sobel operator can be found in Figure 2.7(b). The following approximations can be used in order to save computational costs:

    g(p) ≈ |gx(p)| + |gy(p)|    (2.22)
    g(p) ≈ max( |gx(p)|, |gy(p)| )    (2.23)

These approximations yield equally accurate results on average [22]. Beside the gradient magnitude, it is possible to compute the angle of the gradient as:

    φ(p) = arctan( gy(p) / gx(p) )    (2.24)

Although there is a certain angular error with the Sobel gradient [36], it is used very often in practice, since it provides a good balance between computational load and orientation accuracy [16].

Equations 2.21-2.24 are defined not only for the Sobel operator, but for every other operator that computes the horizontal and vertical gradient components.

Canny Edge Detector Today, the Canny edge detector [13] is probably the most used edge detector, and it is proven to be optimal in a precise, mathematical sense [65]. It is designed to detect noisy step edges of all orientations and consists of three steps:

1. Edge enhancement
2. Nonmaximum suppression
3. Hysteresis thresholding

The first step is based on a first-order Gaussian derivative as introduced before. For fast implementations, the separability of the filter kernel can be used to improve the performance. Gradient magnitude and orientation can be computed as in Equations 2.21 and 2.24, or using the approximations. The standard deviation parameter σ of the Gaussian function influences the scale of the detected edges. A lower σ preserves more details (high frequencies), but also noisy edges, while a larger σ leaves only the strongest edges. The appropriate σ depends on the image content and on what kind of edges should be detected.

The goal of the nonmaximum suppression step is to thin out ridges around local maxima and return a set of one pixel wide edges [65]. The dominant direction of the gradient calculated in step one determines the considered neighbors of a pixel. The gradient magnitude at this position must be larger than at both neighbors; otherwise it is not a maximum and its position is set to zero (suppressed) in the edge image.
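A minimal sketch of the gradient computations of Equations 2.21-2.24, which provide the magnitude and orientation inputs for the nonmaximum suppression step (NumPy/SciPy; hypothetical names, not the thesis implementation):

    import numpy as np
    from scipy.ndimage import convolve

    SOBEL_X = np.array([[1, 0, -1],
                        [2, 0, -2],
                        [1, 0, -1]], dtype=float)  # Eq. 2.19
    SOBEL_Y = SOBEL_X.T                            # Eq. 2.20

    img = np.random.rand(160, 780)  # stand-in for a camera frame
    gx = convolve(img, SOBEL_X)
    gy = convolve(img, SOBEL_Y)

    g_exact = np.sqrt(gx**2 + gy**2)             # Eq. 2.21
    g_sum = np.abs(gx) + np.abs(gy)              # Eq. 2.22
    g_max = np.maximum(np.abs(gx), np.abs(gy))   # Eq. 2.23
    # Eq. 2.24; arctan2 additionally resolves the quadrant of the angle.
    phi = np.arctan2(gy, gx)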


In the last stage of the Canny edge detector, edge tracking combined with hysteresis thresholding is applied. Starting at a local maximum that meets the upper threshold of the hysteresis function, the algorithm follows the contour of neighboring pixels that have not been visited before and meet the lower threshold. Due to step two, a set of one-pixel wide contours is the output of the edge detection (see Figure 2.7(d) for an example with an upper threshold of 150 and a lower threshold of 100).

As in most cases, thresholding is a tradeoff between false positives (in this case edges due to noise) and false negatives (suppressed or fragmented edges of interest). As with the standard deviation of the Gaussian in step one, the hysteresis thresholds have to be adapted depending on the particular image content. Methods for estimating the threshold parameters dynamically from image statistics are reported for example in [68] or [29]. There are many variations and extensions of the Canny edge detector. One popular approach motivated by Canny's work is the edge detector of Deriche [19].

Laplace The Laplace edge detector is a common representative of second-order derivative edge detectors. Recalling that edges are localized at zero crossings in the second-order derivative of an image's two-dimensional intensity function, the goal is to find zero crossings that are surrounded by strong peaks.

The Laplacian of a function can be seen as a sensible analogue to the second derivative and is rotationally invariant [24]. It is defined as

    ∇²f(x, y) = ∂²f/∂x² + ∂²f/∂y²    (2.25)

As with first-order derivative edge detectors, a smoothing operation to reduce noise is performed before applying the edge detector, usually with a Gaussian. Analogous to Equation 2.18, the two steps can be combined by applying the Laplacian to the Gaussian smoothing kernel before convolution. This leads to an edge detector denoted as Laplacian of Gaussian (LoG), proposed by Marr and Hildreth [45]. It is quite common to replace the LoG with a Difference of Gaussians (DoG) [24] to reduce the computational load.

A discrete Laplace operator can be derived directly from the second-order difference operators Δ²x and Δ²y as

    L_∇² = Δ²x ⊕ Δ²y = [1 −2 1] ⊕ [1 −2 1]ᵀ
         = [ 0  1  0 ]
           [ 1 −4  1 ]    (2.26)
           [ 0  1  0 ]

where the ⊕ operator denotes that the two one-dimensional kernels are centered in a common 3 × 3 grid and summed ([10] refers to this as a tensor product in this context). The result of the discrete Laplace operator applied to the LENA test image can be found in Figure 2.7(c). Edge detectors based on the Laplacian are isotropic, meaning the response is equal over all orientations [36]. One drawback of this approach is that second-order derivative based methods are much more sensitive to noise than gradient-based methods.
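A short sketch of both second-order variants (NumPy/SciPy; hypothetical names): the discrete Laplace kernel of Equation 2.26, and a Difference-of-Gaussians stand-in for the LoG using the classic ratio of about 1.6 between the two standard deviations:

    import numpy as np
    from scipy.ndimage import convolve, gaussian_filter

    LAPLACE = np.array([[0,  1, 0],
                        [1, -4, 1],
                        [0,  1, 0]], dtype=float)  # Eq. 2.26

    def dog(img, sigma, k=1.6):
        # Difference of Gaussians as a cheaper approximation of the LoG.
        return gaussian_filter(img, sigma) - gaussian_filter(img, k * sigma)

    img = np.random.rand(160, 780)
    lap = convolve(img, LAPLACE)  # zero crossings mark edge candidates
    band = dog(img, sigma=2.0)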


Figure 2.8: Orientation selective filters based on rotated versions of a first derivative Gaussian: (a) 0°, (b) 90°, (c) 30° (images taken from [25]).

Orientation Selective Edge Detection Until now, all presented approaches for edge detection have been more or less isotropic, but there are also many approaches that consciously exploit anisotropy, leading to orientation selective edge detectors. A good overview of anisotropic filters can be found for example in [69]. These filters have many applications, for example in texture analysis or in the design of steerable filters that efficiently control the orientation and scale of filters to extract certain features in an adaptive way.

An orientation selective filter can be generated from a rotated version of an elongated Gaussian derivative. Figure 2.8 shows an example of different filters that are mostly sensitive to 0°, 90°, and 30° oriented edges respectively. If many different orientations should be detected independently in one image, common optimizations exploit the linearity of the convolution operation. Instead of convolving the image with a large number of different orientation specific filters, the image is convolved with a few basis filters only. Then, an anisotropic response of arbitrary orientation can be estimated as a weighted sum of the basis filter responses. For more information on the technical background of this approach, the reader is referred to the original papers [25, 49].
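In the spirit of the steerable filters of [25], the response of a first-derivative-of-Gaussian filter at an arbitrary angle θ can be synthesized from two basis responses. A minimal sketch (NumPy/SciPy; hypothetical names, not the thesis implementation):

    import numpy as np
    from scipy.ndimage import gaussian_filter

    def oriented_response(img, sigma, theta):
        # Basis responses: first Gaussian derivative along x and along y.
        gx = gaussian_filter(img, sigma, order=(0, 1))
        gy = gaussian_filter(img, sigma, order=(1, 0))
        # Steering: the response at angle theta is a weighted basis sum.
        return np.cos(theta) * gx + np.sin(theta) * gy

    img = np.random.rand(160, 780)
    r30 = oriented_response(img, sigma=2.0, theta=np.deg2rad(30))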


2.3.4. Subpixel Edge Detection

At image acquisition (e.g. with CCD cameras), light intensity is integrated over a finite, discrete array of sensor elements. Following the sampling theorem [36], this sampling can be seen as a low-pass filter on the incoming signal, cutting off high frequencies. Hence, strong edges, which contain high frequencies, may not be imaged precisely by the discrete grid. Moreover, edge detectors that work on the pixel level can detect the real edge position only roughly; the average localization error is 0.5 pixel, since the center of the real edge could be anywhere within the pixel [65].

In many applications, such as high precision measuring tasks, edges detected at pixel grid accuracy are often not accurate enough. Thus, subpixel techniques have been developed to overcome the limits of discrete images and to compute continuous values that lie in between the sampled grid.

Figure 2.9: (a) Subpixel accuracy using bilinear interpolation. Pixel position P is a local maximum if the gradient magnitude of gradient g at P is larger than at positions A and B respectively. These positions can be computed using bilinear interpolation between the neighboring pixels 0, 7 and 3, 4 respectively. The gradient direction determines which neighbors contribute to the interpolation. The edge direction is perpendicular to the gradient vector. (b) The discrete first derivative of a noisy step edge is approximated using cubic spline interpolation. The subpixel tube edge location is assumed to be at the maximum of the continuous spline function, which can lie in between two discrete positions (here at x = 9.5).

Interpolation is the most common technique to compute values between pixels by consideration of the local neighborhood of a pixel. This includes for example bilinear, polynomial, or B-spline interpolation. In [21], a linear interpolation of the gradient values within a 3 × 3 neighborhood around a pixel is proposed. Here, the gradient direction determines which of the 8 neighbors are considered (see Figure 2.9(a)). Since the gradient does not have to fall exactly on pixel positions on the grid, the gradient value is interpolated using a weighted sum of the two pixel positions that are next to the position where the gradient intersects the pixel grid (denoted as A and B in the figure). In a nonmaximum suppression step, the center pixel is classified as an edge pixel only if the gradient magnitude at this position is larger than at the interpolated neighbors. If so, the corresponding edge is perpendicular to the gradient direction.

Since the center pixel P still lies on the discrete pixel grid, one has to perform a second interpolation step if higher precision is needed. The image gradient within a certain neighborhood along the gradient direction (e.g. A-P-B) can be approximated, for example, by a one-dimensional spline function [17, 66]. Figure 2.9(b) shows an example of a noisy step edge between the discrete pixel positions 9 and 10 in x-direction. The discrete first derivative of the intensity profile is approximated with cubic splines. The extremum of this continuous function can theoretically be detected with arbitrary precision, representing the subpixel edge position. However, there are obvious limits to what is still meaningful with respect to the underlying input data. In this example, a resolution of 1/10 pixel was used. The maximum is found at 9.5, i.e. exactly in between the discrete positions.
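The peak localization step can be sketched compactly with a quadratic fit instead of the cubic splines discussed above (pure NumPy; a hypothetical illustration, not the thesis implementation): fitting a parabola through the discrete maximum of the derivative profile and its two neighbors gives a closed-form subpixel offset.

    import numpy as np

    def subpixel_peak(profile):
        # Fit a parabola through the discrete peak and its two neighbors;
        # its vertex gives the subpixel maximum of the sampled response.
        i = int(np.argmax(profile))
        if i == 0 or i == len(profile) - 1:
            return float(i)  # no neighbors to interpolate with
        y0, y1, y2 = profile[i - 1], profile[i], profile[i + 1]
        denom = y0 - 2.0 * y1 + y2
        offset = 0.5 * (y0 - y2) / denom if denom != 0 else 0.0
        return i + offset  # offset lies within (-0.5, 0.5)

    # First derivative of a sampled ramp edge, symmetric about x = 3.5:
    deriv = np.array([0.0, 0.1, 0.4, 1.0, 1.0, 0.4, 0.1, 0.0])
    print(subpixel_peak(deriv))  # 3.5, exactly between two grid positions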


Rockett [56] analyzes the subpixel accuracy of a Canny implementation that uses interpolation by least-squares fitting of a quadratic polynomial to the gradient normal to the detected edge. He found that for high-contrast edges the edge localization reaches an accuracy of 0.01 pixels, while the error increases to about 0.1 pixels for low-contrast edges. Lyvers et al. [41] proposed a subpixel edge detector based on spatial moments of a gray level edge with an accuracy of better than 0.05 pixels for real image data. Aström [6] analyzes subpixel edge detection by stochastic models. A survey on subpixel measurement techniques can be found in [71].

2.4. Template Matching

A common task in vision applications is to search whether a particular pattern is part of an image, and if so, where it is located [28]. Template matching is one method to tackle this problem. The search pattern or template can be represented as an image and is usually considerably smaller than the inspected input image. The template is shifted over the input image and compared with the underlying values; a measure of similarity is computed at each position. Positions reaching a high score are likely to match the pattern, or, the other way around, if the template matches at a certain location, the score has a maximum at this location.

A technique denoted as cross-correlation is widely used as a measure of similarity between image patches [64]. It can be derived from the sum of squared differences (SSD):

    c_SSD(x, y) = Σ_{i=0}^{W−1} Σ_{j=0}^{H−1} ( T(i, j) − I(x+i, y+j) )²    (2.27)

where I is the discrete image function and T the discrete template function. W and H indicate the template width and height respectively. Expanding the squared quantity yields:

    c_SSD(x, y) = Σ_{i=0}^{W−1} Σ_{j=0}^{H−1} [ T²(i, j) − 2 T(i, j) I(x+i, y+j) + I²(x+i, y+j) ]    (2.28)

Since the template is constant, the sum over the template patch T²(i, j) is constant as well and does not contain any information on similarity. The same holds approximately for the sum over the image patch I²(x+i, y+j) if there are no strong variances in image intensity. Hence, the term T(i, j) I(x+i, y+j) remains the only real indicator of similarity that depends on both the image and the template. This leads to the cross-correlation equation:

    c(x, y) = Σ_{i=0}^{W−1} Σ_{j=0}^{H−1} T(i, j) I(x+i, y+j)    (2.29)

It turns out that the correlation looks very similar to the discrete convolution. Indeed, the only difference between correlation and convolution is the sign of the summation in the second term [28]. Thus, theoretically, a correlation can be replaced by a convolution with a flipped version of the template [64]. Like convolution, correlation is an expensive operation if applied to large images and templates.
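A brute-force sketch of Equation 2.27 (pure NumPy; hypothetical names) makes the cost argument concrete: every placement requires a full pass over the template.

    import numpy as np

    def ssd_map(image, template):
        # Eq. 2.27 evaluated at every valid template placement (x, y).
        h, w = template.shape
        rows = image.shape[0] - h + 1
        cols = image.shape[1] - w + 1
        out = np.empty((rows, cols))
        for y in range(rows):
            for x in range(cols):
                diff = template - image[y:y + h, x:x + w]
                out[y, x] = np.sum(diff * diff)
        return out

    rng = np.random.default_rng(0)
    img = rng.random((100, 100))
    scores = ssd_map(img, img[40:56, 40:56])  # template cut from the image
    print(np.unravel_index(scores.argmin(), scores.shape))  # (40, 40)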


In some cases it is faster to convert the spatial images into the frequency domain using the (discrete) Fast Fourier Transform (FFT), multiply the resulting transform of one image with the complex conjugate of the other, and finally reconvert the result to the spatial domain using the inverse FFT [62, 64, 53].

Unfortunately, the assumption of image brightness constancy is weak. If there is, for example, a bright spot in the image, the cross-correlation results in much larger values at this position than at darker regions. This may lead to incorrect matches. To overcome this problem, several normalized correlation methods have been introduced. One common measure is denoted as the correlation coefficient. It can be computed as:

    c_coeff(x, y) = [ Σ_{i=0}^{W−1} Σ_{j=0}^{H−1} ( T(i, j) − T̄ ) ( I(x+i, y+j) − Ī(x, y) ) ] / ( W H σ_T σ_I(x,y) )    (2.30)

where T̄ represents the mean template brightness and Ī(x, y) the mean image brightness within the particular window at position (x, y). σ_T and σ_I(x,y) indicate the standard deviations of the template and the image patch respectively. The resulting values lie in the range between −1 and 1. Obviously, the correlation coefficient is computationally more expensive. If the standard cross-correlation yields sufficiently accurate results in a certain application, it may be of interest to use a less expensive normalization that simply maps the results of the cross-correlation into the range of −1 to 1. This can be achieved with the following equation:

    c′(x, y) = [ Σ_{i=0}^{W−1} Σ_{j=0}^{H−1} T(i, j) I(x+i, y+j) ] / √( Σ_{i=0}^{W−1} Σ_{j=0}^{H−1} T(i, j)² · Σ_{i=0}^{W−1} Σ_{j=0}^{H−1} I(x+i, y+j)² )    (2.31)

The term cross-correlation is usually used if two different images are correlated. If one image is correlated with itself, i.e. I = T, the term autocorrelation is commonly used [28].

In practical applications it is often necessary to adapt the template by changing its orientation or scale to reach maximum matching results [28]. This increases the number of correlation operations, and thus the computational load. Therefore, optimization strategies are used that try to exclude as many positions as possible that are very unlikely to match a template.
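The frequency-domain route mentioned above can be sketched as follows (pure NumPy; hypothetical names): the image transform is multiplied by the complex conjugate of the zero-padded template transform, and the inverse FFT returns the correlation surface.

    import numpy as np

    def fft_cross_correlation(image, template):
        # Correlation theorem: corr = IFFT( FFT(image) * conj(FFT(template)) ).
        shape = image.shape
        f_img = np.fft.rfft2(image, shape)
        f_tpl = np.fft.rfft2(template, shape)  # zero-padded to image size
        return np.fft.irfft2(f_img * np.conj(f_tpl), shape)

    rng = np.random.default_rng(1)
    img = rng.random((128, 128))
    tpl = img[30:46, 50:66]
    # Mean removal gives a crude brightness normalization (cf. Eq. 2.30).
    c = fft_cross_correlation(img - img.mean(), tpl - tpl.mean())
    print(np.unravel_index(c.argmax(), c.shape))  # peak at the template position (30, 50)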


3. Hardware Configuration

This chapter introduces the physical design of the visual inspection prototype. This includes the conveyor, the camera setup, the choice of illumination, as well as the blow-out mechanism. Figure 3.1 gives an overview of the hardware setup of the prototype.

3.1. Conveyor

For the prototype, a 200cm long and 10cm wide conveyor is used to simulate a production line. It can be manually fitted with several tube segments, where the exact number depends on the target length and the distance between two consecutive segments. The measuring is performed at a certain area of the conveyor, denoted as the measuring area in the following. The field of view of the camera is adjusted to this area, as well as the illumination, as will be introduced in Sections 3.2 and 3.3 respectively.

The dimension of the measuring area depends on the size of the tubes to be measured. Therefore, with respect to the range of tube sizes, the measuring area is designed to cover the maximum tube size of 100mm in length and about 12mm in diameter. It must be even larger to be able to capture several images of each tube while it passes the visual field of the camera.

Since in production the tubes are cut to length from a continuous tube using a rotating knife (flying knife), there would not be a notable spacing between two consecutive tube segments if they were transferred to the measuring area at the same speed as they enter the knife. Thus, for humans and artificial vision sensors alike, it can be difficult to determine by looking where one tube starts and ends in the continuous line. To overcome this problem, after cutting, the tube segments have to fall onto another conveyor with a faster velocity to separate them. The faster the second conveyor is compared to the first one, the larger the gap.

Since processing time is expensive, the goal is to simplify the measuring conditions as much as possible using an elaborate hardware setup. One easy but effective simplification is to mount two guide bars on the conveyor that guarantee almost horizontally oriented tube segments. The guide bars are arranged like a narrow 'V' (see Figure 3.1(b)). The tubes enter the guide bars at the wider end and are adjusted into horizontal position while moving. At the measuring area the guide bars are almost parallel and just slightly wider than the diameter of the tubes. The distance between the guide bars can be easily changed using adjusting screws if the tube type changes.

The color and structure of the conveyor belt are crucial to maximize the contrast between objects and background for the inspection task. Therefore, a white-colored belt is used. The advantage of this choice with respect to the range of tube types to be inspected, in combination with the illumination setup, will be discussed in more detail in Section 3.3.


Figure 3.1: Hardware setup of the prototype in the laboratory environment. (a) Total view. (b) View of the measuring area.


3.2. Camera setup

Machine vision applications have high demands on the imaging system, especially if high accuracy and precision are required. The camera and the optical system, i.e. the lens, have to be selected with respect to the particular inspection task. This section gives an overview of the imaging system used in this application and how it was selected.

3.2.1. Camera Selection

The main criteria for camera selection with respect to the application in this thesis are:

- Image quality
- Speed
- Resolution

The image quality is essential to allow for precise measurements. This includes a high signal-to-noise ratio, little or no cross-talk between neighboring pixels, and square pixel elements. As introduced in Section 1.3, the system is intended to work in continuous mode. Therefore, the speed, i.e. the possible frame rate, of the camera determines how many images of a tube can be captured within a given time period. Of course, this number also depends on the velocity of the conveyor. Especially at higher velocities, a fast camera is important, since the idea of multi-image measurements fails if the camera is not able to capture more than one evaluable image of each tube. The final frame rate should depend purely on the per-frame processing time. This means the camera must be able to capture at least as many frames as can be processed; otherwise the camera would be a bottleneck. The frame rate of a camera is closely related to the image resolution. Higher resolutions mean a larger amount of data to be transferred and processed. Thus, there is a tradeoff between resolution and speed. A higher resolution means smaller elements on the CCD sensor array; hence, an object can be imaged in more detail. With respect to length measurements, the effective pixel size decreases at a higher resolution, and a pixel represents a smaller unit in the real world.

Three cameras have been tested and compared:

- Sony DFW VL-500
- AVT Marlin F-033C
- AVT Marlin F-046B

These cameras are all IEEE 1394 (FireWire) progressive scan CCD cameras.

The Sony camera has a 1/3" image device (Sony Wfine CCD) and provides VGA (640 × 480) resolution color images at a frame rate of 30 frames per second (fps). It is equipped with an integrated 12× zoom lens which can be adjusted by a motor.

The Marlin F-033C is a color camera with a maximum resolution of 656 × 492 pixels in raw mode, while the F-046B is a gray scale camera with a resolution of 780 × 582 pixels in raw mode. Both cameras have a 1/2" image device (Sony IT CCD). The Marlin cameras reach much higher frame rates compared to the Sony.


Figure 3.2: The sensor elements of single chip color cameras like the Marlin F-033C are provided with color filters, so that each sensor element gathers light of a certain range of wavelengths only, corresponding to red, green, and blue respectively:

    R1 G1 R2 G2
    G3 B1 G4 B2

The arrangement of the filters is denoted as BAYER mosaic; interpolation at the virtual points P1, P2, P3 (located where four pixels meet) is needed to compute the missing two channels at each pixel. (Image taken from [3].)

At full resolution, the F-033C features 74fps and the F-046B 53fps respectively. Since these cameras do not come with an integrated optical system, a suitable lens (C-mount) must be provided additionally. A more detailed specification of the Marlin cameras can be found in Appendix B.1.

It turned out that the Sony camera is not suited for this particular application. The main reason is the limited frame rate of 30fps; thus, a new image is captured approximately every 30ms. As mentioned before, the camera speed should not be the bottleneck of the application. However, as will be shown in Section 5.3.8, the processing time of one image is significantly less than 30ms, which excludes the Sony camera in this particular application.

The Marlin cameras reach much higher frame rates and come with another advantage. Since the tube orientation can be considered horizontal due to the guide bars introduced in the previous section, one does not need the whole image height that a camera can provide. It is possible to reduce the image size to user-defined proportions, also denoted as an area of interest (AOI). This function is used to decrease the number of image rows to be transferred over the FireWire connection while keeping the full resolution in horizontal direction. For example, in a typical setup an image height of 160 pixels is large enough to include the whole region between the guide bars. The reduced image size is about 1/3 of the original size. Combined with a short shutter time, the reduced number of image rows increases the effective frame rate significantly, so it is possible to reach frame rates of > 100fps.

The decision whether to use the Marlin F-033C or the F-046B depends mainly on the question whether color is a useful feature in this particular application. In general, single chip color cameras like the F-033C map a scene less accurately than gray scale cameras where image brightness is concerned.

This is due to how these cameras are designed. Each sensor cell of a single chip color camera is provided with a color filter for either red (R), green (G), or blue (B). Without these filters the sensor cells are equal to those in gray scale cameras. Usually, the filters are arranged in a pattern denoted as BAYER mosaic (see Figure 3.2). Within each 2 × 2 region there are two green filters, one red, and one blue. This distribution is motivated by human vision and leads to more natural looking images, since the human optical system is most sensitive to green light.


The drawback of this approach is that the resolution of each color channel is reduced. To overcome this problem, one has to interpolate the two missing color channels at each pixel position. There are several interpolation approaches, also denoted as BAYER demosaicing. With respect to speed it is important to use a computation that is not too expensive. The F-033C computes R-G-B values at virtual points Pi at the center of each local 2 × 2 neighborhood as follows [3]:

    P1: red = R1,  green = (G1 + G3)/2,  blue = B1
    P2: red = R2,  green = (G1 + G4)/2,  blue = B1    (3.1)
    P3: red = R2,  green = (G2 + G4)/2,  blue = B2

where the locations of the different points can be found in Figure 3.2. Obviously, this interpolation technique reduces the resolution of the sensor in all channels, since values can be computed only at positions where four pixels meet and not at the boundaries of the image.¹

¹ There are also color cameras that are provided with three-chip sensors. The incoming light is split into different wavelength ranges via a prism. Thus, each sensor yields a full resolution image of one color channel and interpolation is not necessary. These cameras, however, are quite expensive and could not be tested.

If the inspection task can be performed on gray scale images, gray scale cameras should be used instead of color cameras. Intuitively, the accuracy of a color camera cannot be the same as that of a gray scale camera, because it requires two interpolation steps: first, the R-G-B color channels are interpolated as introduced before, and then the image brightness has to be estimated from these interpolated values. A gray scale camera offers a more direct transformation between light intensity and image values, thus leading not only to more accurate images, but also to higher frame rates. This can be supported by the following experiment.

A test image of graph paper has been captured once with the F-033C and once with the F-046B. A 16mm fix-focal length lens has been used in each case, and the distance between camera and graph paper as well as the viewing direction were the same. The focus of the optical system was adjusted to obtain a sharp image in both cases. The results can be found in Figure 3.3. The color image in (a) has been converted into gray level values using the following equation:

    I(x, y) = 0.299 R(x, y) + 0.587 G(x, y) + 0.114 B(x, y)    (3.2)

where R, G, B represent the three color channels for red, green, and blue respectively, and I is the resulting gray level image.

The grid appears sharper in the image of the gray scale camera, although the color image was also in focus during acquisition. The profiles of two scan lines of equal length through an edge of the grid (visualized in (b) and (d)) can be found in Figure 3.3(e).
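Both conversions can be sketched in a few lines (pure NumPy; hypothetical names). The demosaicing below works per 2 × 2 cell in the spirit of Equation 3.1 (without the overlapping virtual points), and the gray conversion implements Equation 3.2 directly.

    import numpy as np

    def demosaic_rggb(raw):
        # Half-resolution demosaicing of an RGGB mosaic: per 2x2 cell, take
        # R and B directly and average the two green samples (cf. Eq. 3.1).
        r = raw[0::2, 0::2]
        g = 0.5 * (raw[0::2, 1::2] + raw[1::2, 0::2])
        b = raw[1::2, 1::2]
        return np.stack([r, g, b], axis=-1)

    def rgb_to_gray(rgb):
        # Eq. 3.2 with the standard luma weights.
        return 0.299 * rgb[..., 0] + 0.587 * rgb[..., 1] + 0.114 * rgb[..., 2]

    raw = np.random.rand(492, 656)  # stand-in for an F-033C raw frame
    gray = rgb_to_gray(demosaic_rggb(raw))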


Figure 3.3: Comparison of the F-033C color and F-046B gray level camera. The test images show graph paper captured from a distance of approximately 250mm using a 16mm fix-focal length lens. (a) Color image of the F-033C. (b) Zoom view showing the location of the scan line through a grid edge in the converted gray scale image of (a). (c) Gray scale image acquired with the F-046B. (d) Zoom view showing the location of the scan line through a grid edge in (c). (e) Profiles of the two scan lines. The F-046B acquires a significantly sharper edge compared to the color camera, which can be seen from the slope of the edge ramp.


Figure 3.4: Color information of transparent tubes in HSV color space. Rows include, from top to bottom: color input image, hue channel, saturation channel, value (brightness) channel, and in the bottom row the computed gray scale image using Eq. 3.2. Although all images are taken from the same sequence, tubes and background can have very different color.

The positions of the scan lines correspond to the same real world location. It can be seen that both edges are ramp edges (see Section 2.3.1). The slope of the edge profile, however, is larger for the gray level camera, i.e. the edge can be located more precisely. This is an important advantage with respect to accurate measuring. Therefore, if color has no other significant advantage over gray scale images, a gray scale camera should be preferred in this application.

One can think of using color information to segment the transparent tubes from the background, since they appear yellowish or reddish while the conveyor belt should be white. For black tubes, color has obviously no significant benefit; hence, it is adequate to concentrate on the transparent tubes in this context.

The idea is to use color as a measure to distinguish between transparent tubes and the background, since here the gray scale contrast is lower compared to black tubes. However, as can be seen in Figure 3.4, in real images of transparent heat shrink tubes on a conveyor, the color of the conveyor belt can appear quite different. The test images have been taken from a sequence of tubes on a moving conveyor. The images have been illuminated via a back light setup, which will be introduced in Section 3.3. It can be observed that some regions of the same conveyor belt look yellowish, while others appear blueish in the image (see left column of Figure 3.4).

There are several color models beside the R-G-B model. Humans intuitively perceive and describe color experiences in terms of hue (chromatic color), saturation (absence of white) and brightness [27]. A corresponding color model is the H-S-V model, where H stands for hue, S for saturation, and V for (brightness) value. More detailed information on color models can be found for example in [35]. A short sketch of the per-pixel conversion follows below.
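For reference, channel images like those of Figure 3.4 can be obtained with a per-pixel RGB-to-HSV conversion, for example via Python's standard colorsys module (a hypothetical sketch, not the thesis implementation):

    import colorsys
    import numpy as np

    def to_hsv(rgb):
        # Per-pixel RGB -> HSV for channel-wise inspection; expects values
        # in [0, 1] and returns hue, saturation, and value planes.
        convert = np.vectorize(colorsys.rgb_to_hsv)
        return convert(rgb[..., 0], rgb[..., 1], rgb[..., 2])

    frame = np.random.rand(160, 780, 3)  # stand-in for a color frame
    hue, sat, val = to_hsv(frame)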


In the hue domain, a yellowish transparent tube differs significantly from a blueish background (left column of Figure 3.4). If the background is also yellowish, the difference between tube and background decreases (center column). Strong discontinuities in background color (as in the right column) could be wrongly classified as a tube. The saturation domain is also a quite unstable feature. If the background contains a lot of white, it is more desaturated than the object (as in the center column) and yields a quite strong contrast. The example in the left column, however, shows that the difference in saturation does not always have to be that clear. The brightness channel (fourth row) is very close to the gray level image computed using Equation 3.2 (bottom row). Thus, it approximately equals what a gray level camera would see.

This experiment has shown that color can be a very unstable feature. With respect to precise length measurements, it turns out that there are a lot of artifacts at the tube edges in the H and S color channels. In the brightness channel, edges appear much sharper. The small artifacts in this channel are due to camera noise, motion blur effects, or not perfectly adjusted camera focus. Since the brightness channel is closely related to the gray value image converted from R-G-B values using Equation 3.2, one could replace the brightness channel by this image. As can be seen in Figure 3.4, the bottom row yields an even better contrast between object and background.

With the observations made before, one can conclude that a gray level camera is best suited for this particular application. It yields the best edge quality, which is important for precise measurements, and both black and transparent tubes are imaged with a sufficient contrast between object and background, making it possible to locate a tube in the image without using color information. Hence, the Marlin F-046B camera has been selected for this prototype. It yields the best compromise between image quality, resolution, and speed.

3.2.2. Camera Positioning

The camera is placed at a fixed position and viewing angle above the measuring area of the conveyor (see Figure 3.1(b)). In a calibration step, it is adjusted to position the image plane parallel to the surface of the conveyor with the optical center above the center of the measuring area, thus minimizing the perspective effects in this area. The exact calibration procedure will be explained in Section 4.3.2. The moving direction of the conveyor (and therefore of the tube segments) is horizontal in the image.

The distance between camera and conveyor depends on the optical system, i.e. the lens, that is used and on the tube size to be inspected. In Section 4.2.2, the basic assumptions and constraints regarding the image content with respect to the image processing are presented. This includes the assumption that only one tube can be seen in its entirety in an image at one time. Correspondingly, the camera's field of view has to be adapted to satisfy this criterion for different tube lengths.

Placing the camera above the conveyor has the additional advantage of not extending the dimensions of the production line, since space in a production hall is limited and therefore expensive.

3.2.3. Lens Selection

Parameters such as object size, sensor size of the camera, camera distance, and accuracy requirements determine the right optical system (objective) for a particular application. In the following, the term lens will be used synonymously with the terms optical system and objective.


An objective is actually more than just a single lens (iris, case, mount, adjusting screws, etc.); the lens, however, is the most important factor that determines the properties of the objective.

The most important parameters specifying a lens include the focal length, F-number, magnification, angle of view, depth of focus, minimum object distance, and finally the price. In addition, lenses can have a number of aberrations, as introduced before in Section 2.1.3. Lens manufacturers try to minimize, for example, chromatic or spherical aberrations, but it is not possible to produce a completely aberration-free lens in the general case (e.g. for all wavelengths of light or angles). In practice, lenses are composed of different layers of special glass. High precision is needed to produce high quality lenses; thus, such lenses can be very expensive. There are different lens types available, including fix-focal and zoom lenses. While fix-focal length lenses, as the term indicates, have a fixed focal length, zoom lenses cover a range of different focal lengths. The actual focal length can be adjusted manually or motorized. For machine vision applications, fix-focal length lenses are usually preferable [40]. If the conditions are highly constrained, the best suited lens can be selected a priori.

This section gives a brief overview of the most important lens parameters and motivates the selection of the lens used in this application.

Focal Length In the ideal thin lens camera model, the focal length is defined as the distance between the lens and the focal point, i.e. the point where parallel rays entering the lens intersect on the other side (see Figure 2.4). In practice, the focal length value specified by the manufacturer depends on the lens model used (which is usually unknown) and does not have to be accurate. In applications that require high accuracy, a camera calibration step is important to determine the intrinsic parameters of the camera, including the effective focal length with respect to the underlying camera model.

F-number The F-number describes the relation of the focal length to the relative aperture size [18]:

    F = f / d    (3.3)

where d is the diameter of the aperture. Thus, the F-number is an indicator of the light-gathering power of the lens. Typical values are 1.0, 1.4, 2, 2.8, 4, 5.6, 8, 11, 16, 22, and 32, with a constant ratio of √2 between consecutive values. A smaller F-number indicates that more light can pass the lens and vice versa. Camera lenses are often specified by the minimum and maximum F-number, also denoted as the iris range.

Magnification In the weak perspective camera model (see Section 2.1.3), the ratio between focal length and the average scene depth Z0 can be seen as magnification, i.e. following Equation 2.3 the magnification m is expressed as [24]:

    m = f / Z0    (3.4)

where Z0 can be seen as the lens-object distance, also denoted as the working distance in the following. This gives a good estimate of how large an object will appear on the image plane at a given distance Z0 to the camera with a lens of focal length f. For example, a 16mm lens at a working distance of 250mm yields m = 16/250 ≈ 0.064, so a 100mm long tube is imaged to about 6.4mm, which roughly corresponds to the width of a 1/2" sensor.


Figure 3.5: (a) Standard perspective lens: closer objects appear larger in the image than objects of equal size further away. (b) Telecentric lenses map objects of equal size to the same image size independent of the depth within a certain range of distances. (Images taken from Carl Zeiss AG, www.zeiss.de)

Depth of Focus Following the thin lens camera model, only points at a defined distance to the camera will be focused on the image plane. Points at shorter or further distances appear blurred in the ideal model. In practice, however, points within some range of distances are in acceptable focus [24]. This range is denoted as depth of focus or depth of field. It is due to the finite size of each sensor element: there is no difference visible in the image whether a point is focused exactly on the image plane or not, as long as it does not spread over several pixels [18]. The depth of focus increases with a larger F-number [18].

Minimum Object Distance (MOD) All real lenses have a certain distance below which closer points cannot be focused anymore. This has both mechanical and physical reasons. The MOD value is important, since it determines the minimum distance of the camera to the objects in an application.

Angle of View The angle of view is the maximum angle from which rays of light are imaged onto the camera sensor by the lens. Short focal length lenses usually have a wider angle of view and are therefore also denoted as wide-angle lenses, while lenses with a larger focal length have a narrower angle of view. The angle of view determines the field of view of the camera at a given distance and a certain sensor size, i.e. what part of the world is imaged onto the sensor array of the camera.

Commonly, short focal length lenses are used to capture images of a larger field of view, for example in video surveillance applications that have to cover a larger area. With respect to machine vision applications, such lenses can also be used for close-up images at a short camera-object distance. The amount of radial distortion increases with a shorter focal length; the fish-eye lens is an extreme example of a very short focal length lens. Increasing the focal length increases the magnification.


Thus, even smaller objects at a further distance can be imaged over the whole image size with such lenses. However, the minimum object distance is larger for long focal length lenses.

For two-dimensional measuring tasks, the most accurate and precise results can be achieved with telecentric lenses (see Figure 3.5). These special lenses are designed to map objects of the same size in the world to the same image size, even if the object-to-lens distance differs. It is important to note that the maximum object size cannot be larger than the diameter of the lens. This makes telecentric lenses useful only in connection with relatively small objects. In addition, such lenses reach a size of over 0.5m for objects of about 100mm and a mass of approximately 4kg [18]. Finally, telecentric lenses are very expensive.

Although a telecentric lens would be advantageous in its imaging properties, a less expensive solution had to be found for the prototype development in this application. The optical system must be able to map objects between 20 and 100mm onto a 1/2" CCD sensor at a relatively short camera-object distance, and it is expected not to be affected too much by aberrations and radial distortion.

However, this is an optimization problem that has no universal solution for all tube lengths. Different tube lengths need different magnification factors and fields of view if the maximum possible resolution should be exploited to reach the highest accuracy. Changing the magnification factor means changing either the focal length of the optical system or the distance between object and camera, or both. If moving the camera toward the object, the minimum object distance of the lens has to be considered to be able to yield sharp images. Zoom lenses could be used to change the focal length without changing the whole optical system. However, zoom lenses should be avoided in machine vision applications [40], since they have to make larger compromises than fix-focal lenses and usually have a minimum working distance of one meter and more. Hence, using a fix-focal lens implies changing the camera-object distance to adapt to different tube lengths, or physically exchanging the lens when the machine cuts a new length which cannot be covered by the current lens.

Several commercial lenses designed for machine vision applications have been compared to find the lens that is best suited to inspect different tube sizes (see Table 3.1). Figure 3.6 gives an overview of the parameters that influence a camera's field of view. The angle of view θ is specified by the lens manufacturer and depends on the focal length and the camera sensor size. All values in the following refer to a 1/2" CCD sensor, since this is the sensor size of the Marlin F-033C and F-046B. The working distance d is here defined as the distance between lens and conveyor. O represents the object size, and L indicates the size of the measuring area with respect to a certain tube size; L can be approximated as twice the object size O. The goal is to find a combination of a lens and a working distance that yields a visual field such that the size V of the imaged region of the conveyor equals the measuring area L. Note that in this context, size can be replaced by length in horizontal direction, i.e. in the moving direction of the conveyor, since this is the measuring direction in this constrained application. Thus, in the following only this direction is considered.

The geometry in Figure 3.6 leads to the following relationship between θ, d and V:

    V = 2 d tan( θ_rad / 2 )    (3.5)

where θ_rad represents the angle of view θ in radians.


Figure 3.6: Parameters that influence the field of view (FoV) of a camera. θ indicates the angle of view of the optical system, d the distance between lens and conveyor, O the object size, V the size of the region on the conveyor that is imaged, and L the size of the measuring area depending on the current tube size. The goal is to find a lens that yields a field of view such that V ≈ L at a short distance.

    Model            f      θ        d_min
    Pentax H1214-M   12mm   28.91°   250mm
    Pentax C1614-M   16mm   22.72°   250mm
    Pentax C2514-M   25mm   14.60°   250mm
    Pentax C3516-M   35mm   10.76°   400mm
    Pentax C5028-M   50mm    7.32°   900mm

Table 3.1: Different commercial machine vision lenses and their specifications, including focal length f, horizontal angle of view θ (in degrees) with respect to a 1/2" sensor, and minimum object distance d_min.

Using Equation 3.5, one can compute the length of the conveyor that is imaged in horizontal direction at the minimum object distance of a lens. The results can be found in Table 3.2.

This shows that none of the compared lenses is able to image small objects (< 30mm) in focus onto the camera sensor in a way that the object covers about half the full image width. Thus, the minimum tube size that can be inspected at full resolution under this assumption is 30mm. However, if one shrinks the image width manually (for example using the AOI function of the camera), the constraints can be met even for tubes below 30mm.

The real world representation s of one pixel in the image plane can be approximated as follows:

    s = V / W_img    (3.6)

where W_img represents the image width in pixels. For example, for a 16mm focal length lens and a working distance of 250mm, one pixel represents about 0.12mm at this distance in the real world if the image resolution is 780 pixels in horizontal direction. At the same distance, a 25mm focal length lens yields a pixel representation of about 0.08mm at the same resolution.
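Equations 3.5 and 3.6 are easy to check numerically; the following sketch (pure Python; hypothetical names) reproduces the fields of view of Table 3.2 from the lens data of Table 3.1:

    import math

    # f: (angle of view in degrees, minimum object distance in mm), Table 3.1
    lenses = {12: (28.91, 250), 16: (22.72, 250), 25: (14.60, 250),
              35: (10.76, 400), 50: (7.32, 900)}

    for f, (theta, d_min) in lenses.items():
        v = 2 * d_min * math.tan(math.radians(theta) / 2)  # Eq. 3.5
        s = v / 780  # Eq. 3.6, 780 pixels in horizontal direction
        print(f"f={f}mm: V={v:.0f}mm, s={s:.3f}mm/pixel")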


    f   12mm    16mm   25mm   35mm   50mm
    V   129mm   100mm  64mm   75mm   115mm

Table 3.2: Field of view of different fix-focal length lenses at the specified minimum object distance.

    V       f=12mm   16mm   25mm   35mm   50mm
    40mm    •77      •99    •156   •212   •312
    60mm    •116     •149   •234   •318   •469
    100mm   •193     248    390    530    •782
    200mm   381      497    780    1006   1563

Table 3.3: Working distances (in mm) to yield a certain field of view for different focal length lenses. Distances that fall significantly below the minimum working distance are marked with a •.

Thus, smaller tubes can theoretically be measured at higher precision. The minimum object distance of the compared lenses, however, represents a certain limit in precision: tubes below 30mm cannot be measured with higher, but only with the same precision as 30mm tubes. Recalling the tolerances introduced in Section 1.3, smaller tubes have a smaller tolerance than larger tubes, and 20-30mm tubes have the same tolerance.

At the upper bound, larger tubes need a wider field of view of the camera. Hence, a larger region is mapped onto the same image sensor, so one pixel represents more. For a 200mm measuring area, the pixel representation is about 0.25mm. The field of view can be achieved by placing the camera further away from the object. The required distance increases with the focal length of the lens. Table 3.3 shows the approximate working distances for the compared lenses that are needed to obtain a certain field of view. Distances that fall below the minimum object distance are marked with a '•'.

It turns out that a 16mm focal length lens is the best choice for tube lengths between 50 and 100mm, since this lens maps the required measuring areas onto the image plane at the smallest working distance. However, tubes below 50mm cannot be inspected with higher precision with this lens. In this case, a 25mm focal length lens has to be selected. This lens is the best compromise for small and large tube sizes; it has the drawback of a large working distance of up to 780mm for 100mm tubes. Both a 16mm (Pentax C1614-M) and a 25mm (Pentax C2514-M) focal length lens have been used in the experiments.

3.3. Illumination

As introduced in Section 2.2, the right choice of illumination is substantial in machine vision applications. Accurate length measuring of heat shrink tubes requires a sharp contrast at the tube's outline, especially at the boundaries that are considered as measuring points. Any shadows that would increase the tube's dimensions in the 2D image projection must be avoided.


Figure 3.7: Heat shrink tubes at different front lighting setups. (a) Illumination by two desktop halogen lamps. Specular reflections at the tube boundaries complicate an accurate detection. (b) Varying the angle and distance of the light sources as in (a) can reduce reflections. (c) Professional front lighting setup with two line lights at both tube ends. (d) Resulting image of the setup in (c). Both in (b) and (d), shadows cannot be eliminated completely. (Images (c) and (d) by Polytec GmbH, Waldbronn, Germany)


Figure 3.8: Back lighting through different types of conveyor belts. The structure of the belt determines the amount of light entering the camera, thus influencing the image quality significantly.

In addition, the illumination setup should cover both black and transparent tubes, whereas the transparent tubes are translucent while the black ones are not. The surfaces of both materials appear matte under diffuse illumination, but show specular reflections if illuminated directly with point light sources.

In a first experiment, a front lighting setup with standard desktop halogen lamps was tested. Two light sources were placed at a low angle to illuminate the tube boundaries from two sides at the measuring area inside the guide bars. The results are shown in Figures 3.7(a) and 3.7(b). This setup yielded good results with black heat shrink tubes, but it turned out to produce unacceptable reflections right at the measuring points with the transparent ones. Such reflections could be reduced by changing the angle of light incidence, but the results remained strongly non-uniform. Although the halogen lamps are operated at DC power, the AC/DC conversion of off-the-shelf desktop lamps is often not stabilized, leading to temporal and spatial variations in image intensity and color. This effect has been observed throughout the experiments with the desktop lamps at video frame rates of 50fps.

Using a professional, flicker-free front lighting system with two fiber optical line lights illuminating the tube ends (see Figure 3.7(c)), the image quality could be increased, as can be seen in Figure 3.7(d). However, a few shadows are still left.

Experiments with a back light setup have been performed as well. A calibrated fiber optical area light is placed at a certain distance (about 1-2cm) below the conveyor belt. The light has to shine through the belt, so it is important to use a material that is translucent. A typical belt consists of a canvas core (e.g. cotton) and a rubber coating, whereas thickness, structure and density of the canvas as well as the color of the rubber determine how much light can enter the camera. In the optimal case, no light at all would be absorbed by the belt, which is technically hardly possible.

Five different belt types have been tested. Some of the results can be seen in Figure 3.8. Each sample in this experiment consists of a transparent rubber coating and a white canvas as base. The structure of the belt canvas is visible in each image as background pattern. Obviously, the background should not influence the detection of the tube's boundary. Thus, the goal is to find a belt type that allows for back lighting without adding too much unwanted information to the image that could complicate the measurements.

In Figure 3.8(a), the coarse texture of the background significantly affects the tube ends of the transparent tube at the bottom. A sharp boundary is missing, making accurate and reliable measurements impossible. The belt type in Figure 3.8(b) has a finer texture, but transmits only a small amount of light. Figure 3.8(c) shows the belt type that yielded the best results both in background texture and transmittance. As can be seen, there are no shadows at the tube boundaries.


Figure 3.9: Polarized back lighting. (a) Image of diffuse back light through polarized glasses used for viewing 3D stereo projections, with no filter in front of the camera. (b) Setup as in (a) with an opposing polarization filter in front of the camera. Almost no light enters the camera at the polarized area. (c) Transparent heat shrink tube at polarized back light. There is a strong contrast at the polarized area, while it is impossible to locate the tube's boundaries at the unpolarized area (bottom right). (d) Polarized back light through a conveyor belt. The polarization is changed both by the belt and the tube, leading to a poor contrast. (e) For comparison: back light setup without polarization.

Since the black tubes do not let any light rays pass, the contrast between background and tube is excellent with all belt types tested. One advantage of black tubes follows from this property: the printing on the tube's surface is not visible in the image. The transparent tubes, on the other hand, do transmit the light coming from below. Positions covered by the printing show a lower transmittance; hence, the printing is visible in terms of darker intensity values in the image.

As introduced in Section 2.2.3, polarized back lighting can be used to emphasize transparent, translucent objects. In an experiment, shown in Figure 3.9, the integration of polarization filters has been tested. Two polarized glasses originally used for viewing 3D stereo projections have been employed to polarize the light coming from the area back light. First, the principle is tested without a conveyor belt. Two opposing polarization filters are placed between light source and camera. As can be seen in Figure 3.9(b), the area covered by the two polarization filters at right angle appears black in the image, while the areas without polarization filters are ideally white. A transparent tube between the two filters changes the polarization, making it possible for light to enter the camera at locations that have been black before. There is an almost binary contrast between object and background (see Figure 3.9(c)). At regions that are not affected by the filters, there is no contrast at all, making the tube invisible. Unfortunately, these good results have no practical relevance, since in the real application the light has to pass through the conveyor belt, too. If the belt is placed between the first polarization filter and the object, it also changes the polarization at regions that belong to the background (see Figure 3.9(d)). The binary segmentation is lost and the structure of the conveyor belt is visible again. Since it is not possible to install the first polarization filter between conveyor belt and tube, the polarized back light approach has no advantages compared to the unpolarized one in this application. On the contrary, it has the effect of less light entering the camera, which yields darker images and increases the amount of sensor noise.


Figure 3.10: (a) Installation of the back light panel. The measuring area is illuminated from below through a translucent conveyor belt. A diffuser is used to yield a more uniform light and to protect the fiber optical light panel. (b) SCHOTT PANELight Backlight A23000 used for illumination (Source: SCHOTT).

As a result of the experiments with the different lighting techniques, the back lighting setup has been chosen for the prototype. It offers excellent properties for black tubes and also yielded very good results for the transparent tubes in connection with a fine-structured, translucent conveyor belt. The incident lighting did not perform better in the experiments. A light source (SCHOTT DCR III) with a DDL halogen lamp (150W, 20V) has been selected in combination with the fiber optic area light (SCHOTT PANELight Backlight A23000) (see Figure 3.10(b)). The panel size is 102 × 152mm. It is installed 20mm below a cut-out in the conveyor frame below the belt, as can be seen in Figure 3.10(a). A diffuser between light panel and conveyor belt provides a uniform illumination and protects the light area against dirt. More details regarding the illumination hardware can be found in Appendix B.2.

The usage of a fiber optic area light below the conveyor belt has the advantage of very low heat development, since the light source itself can be placed outside at a certain distance. With respect to the characteristics of heat shrink tubes, the avoidance of heat is essential at this step to prevent deformations. The light is transmitted through a flexible tube of fibers. If the lamp fails, it can be exchanged easily without changing anything at the conveyor. The lifetime of one halogen lamp is about 500 hours at maximum brightness.

To eliminate the influence of illumination from light sources other than the back light, the whole measuring area including the camera is darkened. This guarantees constant illumination conditions. For the prototype, a wooden rack has been constructed that is placed around the measuring area on the conveyor. A thick black, non-translucent fabric can be spanned around the rack, leaving only two openings where the tubes enter and leave the darkened area.


Figure 3.11: Air pressure is used to sort out tubes that do not meet the tolerances. The blow out unit, consisting of an air blow nozzle, light barrier and a controller (not visible in the image), is placed at a certain distance behind the measuring area.

For industrial use, this interim solution has to be replaced by a more robust and compact (metal) case that excludes environmental illumination and additionally protects the whole measuring system against other outside influences. A slight overpressure inside the closed case or an air filtering system could be integrated to prevent dust particles from entering the case through the required openings. Any accumulation of dust or other dirt on the lens is critical and must be prevented.

3.4. Blow Out Mechanism

After a tube has passed the measuring area, the measured length is evaluated with respect to the given target length and tolerance. The result is a binary good/bad decision for each particular tube. Good tubes are allowed to pass the blow out unit, which is placed at a certain distance behind the measuring area. Tubes that do not meet the tolerances, on the other hand, have to be sorted out. This is done by air pressure: an air blow nozzle is arranged to blow such tubes off the conveyor. For this purpose, the guide bars have to end behind the measuring area. The whole blow out setup can be seen in Figure 3.11.

The visual inspection system sends the good/bad decision over an RS-232 connection (serial interface) to a controller unit in terms of a certain character followed by a carriage return ('\r'). The protocol used can be seen in Table 3.4. Once the controller receives an A or B, this message is stored in a first-in-first-out (FIFO) buffer.

Message     Code
TUBE GOOD   'A\r'
TUBE BAD    'B\r'
RESET       'Z\r'

Table 3.4: Protocol used for communication between the inspection system and the blow out controller.
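On the inspection-system side, transmitting a decision amounts to writing one of these codes to the serial port. A minimal sketch using the third-party pyserial package; the port name and baud rate are placeholders, since the serial parameters of the controller are not specified here:

    import serial  # third-party package "pyserial"

    # Port name and baud rate are assumptions, not taken from this thesis.
    ctrl = serial.Serial("COM1", baudrate=9600, timeout=1)

    def send_decision(tube_is_good):
        # Protocol from Table 3.4: 'A\r' = TUBE GOOD, 'B\r' = TUBE BAD.
        code = b"A\r" if tube_is_good else b"B\r"
        ctrl.write(code)

    def send_reset():
        # 'Z\r' = RESET; the exact reset semantics are handled by the controller.
        ctrl.write(b"Z\r")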


A light barrier is used to send a signal to the controller when a tube arrives in front of the air blow nozzle. If the first entry in the FIFO buffer contains a B, the tube has to be blown out and the air blow nozzle is activated. If, on the other hand, the first entry contains an A, the tube can pass. In both cases the first entry is removed from the buffer. The advantage of this approach is that the current conveyor velocity does not have to be known to compute the time a tube needs to move from a point x in the measuring area to the position of the air blow nozzle. The light barrier guarantees that the blow out is activated exactly when the tube is at the intended position.
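The controller behavior just described can be summarized in a few lines. The following sketch is illustrative only (the real controller is an external hardware unit); it assumes callback-style events for received serial messages and for the light barrier, and it assumes that RESET clears the pending decisions:

    from collections import deque

    decisions = deque()  # FIFO buffer of pending good/bad decisions

    def on_serial_message(code):
        # Called for each received message ('A', 'B' or 'Z', see Table 3.4).
        if code in ("A", "B"):
            decisions.append(code)
        elif code == "Z":
            decisions.clear()  # RESET: discard all pending decisions (assumption)

    def on_light_barrier_triggered(activate_nozzle):
        # Called when a tube interrupts the light barrier at the nozzle.
        if decisions:
            code = decisions.popleft()  # decision belonging to this tube
            if code == "B":
                activate_nozzle()  # out-of-tolerance tube: blow it out
            # code == "A": the tube passes, nothing to do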




4. Length Measurement Approach

While the previous chapter focused on the hardware setup, this chapter presents the methodical part of the system. After a brief overview, the different steps are introduced, including the camera calibration and teach-in step as well as the tube localization, measuring point detection, tube tracking and the good/bad classification. All assumptions and the model knowledge used throughout these steps are presented first.

4.1. System Overview

The fundamental concept of the developed system is a so-called multi-image measuring strategy. This means the goal is to measure each tube not only once, but in as many images as possible while it is in the visual field of the camera. The advantage of this approach is that the decision whether a particular tube meets the length tolerances can be made based on a set of measurements. The total length is computed by averaging over these single measurements, leading to more robust results. Furthermore, the system is less sensitive to detection errors. Depending on the conveyor velocity and the tube length, between 2 and 10 measurements per tube can be reached.

The system is designed to work without any external trigger that prompts the camera to grab a frame on a certain event, e.g. a tube passing a light barrier. Instead, the camera is operated in continuous mode, i.e. images are captured at a constant frame rate using an internal trigger. The absence of an external trigger, however, requires fast algorithms to evaluate whether a frame is useful, i.e. whether a measurement is possible. In addition, the system must be able to track a tube while it is in the visual field of the camera in order to assign measurements to this particular tube. Accurate length measurement requires the very accurate detection of the tube edges. A template based tube edge localization method has been developed, allowing for reliable, subpixel accurate detection results even under the presence of tube-edge-like background clutter. Once there is evidence that a tube has left the visual field of the camera, all corresponding measurements have to be evaluated with respect to the given target length and tolerances. The resulting good/bad decision must be delegated to the external controller handling the air pressure based blow out mechanism. Model knowledge regarding the inspected tubes under the constrained conditions is exploited where possible to optimize the processing.

Before any measurements can be performed, the system has to be calibrated and trained to the particular target length. This includes camera positioning, radial distortion compensation and an online teach-in step.

Figure 4.1 gives an overview of the different stages of the system. It can also be seen as an outline of this chapter. The underlying methods and concepts are introduced in more detail in the following.

Throughout this chapter, all parameters are handled abstractly. The corresponding value assignments used in the experiments are given in Section 5.1.1.


Figure 4.1: System overview (camera calibration → teach-in → next image → tube localization → measurement possible? → measuring point detection → length measuring → tube passed? → total length computation → good/bad classification → blow out control; a 'no' at either decision returns to the next image). After camera calibration and a teach-in step, the system evaluates the acquired images continuously. If a tube is located and assessed as measurable, the exact measuring points on the tube edges are detected and the tube length is calculated. Once a tube has passed the visual field of the camera, the computed total length is compared to the allowed tolerances for a good/bad classification. Finally, the blow out controller is notified whether the current tube is allowed to pass.
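The control flow of Figure 4.1 can be written down as a compact loop. This is a structural sketch only, not the actual implementation: all callables passed in (grab, localize_tube, and so on) are placeholders for the steps developed in the remainder of this chapter.

    def run_inspection(grab, localize_tube, measure, within_tolerance,
                       send_decision):
        # Per-frame control flow of Figure 4.1 (structural sketch).
        # All arguments are callables supplied by the rest of the system;
        # calibration and teach-in are assumed to have been performed.
        measurements = []  # single measurements of the current tube
        while True:
            image = grab()               # continuous mode, internal trigger
            tube = localize_tube(image)  # None if no measurable tube in frame
            if tube is not None:
                measurements.append(measure(image, tube))
            elif measurements:
                # Tube has left the visual field: combine its measurements.
                total = sum(measurements) / len(measurements)
                send_decision(within_tolerance(total))
                measurements = []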


Figure 4.2: Potential image states: (a) 'empty', (b) 'entering', (c) 'leaving', (d) 'centered', (e) 'entering + centered', (f) 'entering + centered + leaving', (g) 'centered + leaving', (h) 'entering + leaving', (i) 'full'. Each image can be categorized into one of these nine states. States that contain one tube completely with a clear spacing to neighboring tubes can be used for length measuring, i.e. states (d), (e), (f) and (g). The remaining states do not allow for a measurement and can thus be skipped. State (i) might be due to a too small field of view of the camera (i.e. the tubes are too large) or to a failure in separation (i.e. the spacing between two or more tubes is missing). If this state is detected, a warning must be raised.

4.2. Model Knowledge and Assumptions

The visual length measurement of heat shrink tubes proposed throughout this chapter is based on several assumptions and on model knowledge regarding the inspected objects, which are introduced in the following.

4.2.1. Camera Orientation

As introduced in Section 3.2.2, the camera is placed above the conveyor. It must be adjusted to fulfill the following criteria:

- The optical ray is perpendicular to the conveyor
- The image plane is parallel to the conveyor

This camera view is commonly denoted as fronto-parallel view [30]. If the image plane is parallel to the conveyor, the variation in scene depth is quite small. Therefore it is possible to approximate the perspective projection with a weak-perspective camera model. In this model (see Section 2.1.3), objects are projected onto the image plane up to a constant magnification factor. This means distances between two points lying in the same plane are preserved in the image plane up to a constant scale factor. This property is important to allow for affine distance measurements in a fronto-parallel image view.

4.2.2. Image Content

The following assumptions concern the image content and the capture properties of the camera:


- Only one tube is visible completely (with left and right end) in each image at one time
- There is a clear spacing between two consecutive tubes
- The guide bars cover the upper and lower border of each image
- The guide bars are parallel and run in horizontal direction
- The moving direction is from left to right
- The mean intensity of the background (conveyor belt) is brighter than that of the foreground (heat shrink tubes)
- There is a sufficient contrast between background and objects
- The video capture rate is fast enough to take at least one usable image of each tube so that a length measurement can be performed (potentially the production speed has to be reduced to satisfy this constraint)
- The image is not distorted, i.e. straight lines in the world are imaged as straight lines and parallel lines are also parallel in the image

In this application, the variety of image situations to be observed is highly limited and constrained by the physical setup (see Chapter 3). Thus, it is possible to reduce the number of potential situations to nine defined states. Each image can be categorized into exactly one of these states, as shown in Figure 4.2 by means of synthetic representatives. Only four of the nine states are measurable, i.e. states (d), (e), (f) and (g). In these states a tube is completely in the image.

4.2.3. Tubes Under Perspective

Figure 4.3: (a) In the ideal model, the (parallel) projection of a 3D tube corresponds to a rectangle in the image. The distance d between the left and right edge is equal at each height. (b) Under a perspective camera, objects closer to the camera appear larger in the image. Hence, the distance d1, belonging to the points on the tube edges that are closest to the camera, is larger than d2, and d2 is larger than d3 (the distance of the edge points that are farthest away). Note that the dashed lines are not visible in the image under back light, and the tube edges appear convex.

Under ideal conditions, i.e. with a parallel projection, a tube on the conveyor is represented by a rectangle in the image plane with the camera setup used (see Figure 4.3(a)). Due to the guide bars, this rectangle is oriented parallel to the x-axis in horizontal direction and parallel to the y-axis in vertical direction.


Figure 4.4: The plane parallel to the conveyor plane Π_C that goes through the measuring points P_L and P_R is denoted as measuring plane Π_M. The red line in Π_M between P_L and P_R corresponds to the measured distance d1 in Figure 4.3(b), i.e. the distance between the outermost points of the projected tube edges in an image.

The length can be measured between the left and right edge of the tube in horizontal direction. The horizontal distance d between the left and right tube boundary is equal independent of the height, which is an ideal property for length measurements.

However, if the camera is not equipped with a telecentric lens or is not placed at infinity, the tube's projection is influenced by perspective. In general, objects that are closer to the camera are imaged larger than objects farther away. Thus, the left and right tube edges do not appear straight in the image, but curved in a convex fashion due to the different distances between a point on the tube's surface and the camera. Figure 4.3(b) visualizes a synthetic tube under perspective. The distance d1 between the two edge points closest to the camera is larger than the distances between points farther away. Accordingly, d2 is larger than d3, although in the real world d1 = d2 = d3 (assuming the tube is not cut skew). The perspective curvature increases with the distance to the image center. Thus, the maximum curvature is reached at the image boundaries, while an edge that lies directly below the optical center of the camera (approximately the image center) appears straight.

With the constraints regarding the image content, it is not possible to look inside a tube from the camera view if the tube is completely in the image. Therefore, one can assume that the outermost point of a tube edge always corresponds to the point that is closest to the camera, i.e. measuring between these two points always corresponds to the same distance in the world.

In the following, P_L and P_R denote the points on the left and right side respectively that are closest to the camera. The tube length in the real world is defined as the length of the line connecting these two points (corresponding to d1 in Figure 4.3(b)). Assuming a tube has the same height at the left and right side, P_L and P_R lie in the same plane, denoted as measuring plane Π_M. This plane is assumed to lie parallel to the image plane, as can be seen in Figure 4.4. The measuring points have two correspondences in the image, denoted as p_L = (x_pL, y_pL)^T and p_R = (x_pR, y_pR)^T respectively. The distance between p_L and p_R in the image can be related to the real-world length up to a certain scale factor.


However, this scale factor may differ depending on the image position. It is expected that the distance between p_L and p_R will be slightly shorter at the image boundaries and maximal at the image center due to perspective.

4.2.4. Edge Model

The tube edges are modeled as ramp edges as introduced in Section 2.3.1, since this model describes the real data most adequately both for transparent and black tubes. The slope of the ramp determines the sharpness of an edge: the steeper the rise (or fall), the sharper the edge. Obviously, the edge position can be located much more precisely if the ramp has only a minimal spatial extension.

As mentioned in the technical background section, there are several factors that can cause ramp edges, including the discrete pixel grid, the camera focus, and motion blur. The first factor can be reduced by using a high-resolution camera (keeping in mind the trade-off between resolution and speed as discussed in Section 3.2.1). The camera focus depends mainly on the depth of an object. In this application, the depth of an object does not change over time, since all tubes in a row have the same diameter and are lying on the planar conveyor belt, which is parallel to the image plane. In the following it is assumed that the camera and the optical system are adjusted in a way that a tube is imaged as sharply as possible. Motion is another common factor influencing the appearance of an edge. Since the tubes are inspected in motion (up to 40m/min), a short shutter time (exposure time) of the camera is required. If the shutter time is too long, light rays from one point on the tube contribute to the integrated intensity values of several sensor elements along the moving direction. Especially the left and right tube boundaries considered for measuring are affected by motion blur, as they lie in the moving direction.

Therefore, it is assumed that the shutter of the camera is adjusted to a very short exposure time to suppress motion blur as much as possible. A short shutter time requires a large amount of light to enter the camera at one time. The iris of the optical system has to be wide open (corresponding to a small F-number) and the illumination must be sufficiently bright.

4.2.5. Translucency

Translucency is the main property distinguishing transparent from black tubes. Black tubes do not transmit light, leading to one uniform black region with strong edges in the image under back light. In this case, the local edge contrast at a certain position depends on the background only. Transparent tubes, on the other hand, transmit light. However, part of the light is also absorbed or reflected in directions that do not reach the camera. Therefore, a tube will appear darker in the image compared to the background. It will be even darker at positions where the light has to pass through more material. This leads to two characteristic dark horizontal stripes at the top and bottom of a transparent tube, as can be seen in Figure 4.5. This model knowledge has been exploited to define a robust feature for edge localization which can still be detected in situations where the contrast at the center of the edge is poor.

The printing on the tubes also reduces the translucency and is therefore visible on transparent tubes in the image. On average it covers about 8% of a tube's surface along the perimeter for 6, 8, and 12mm diameter tubes.


Figure 4.5: The image intensity of transparent tubes is not uniform as it is for black tubes. Depending on how much light can pass through a tube, regions appear darker or brighter. One characteristic of transparent tubes under back light are two dark horizontal stripes at the top and the bottom of a tube, indicated by the arrows. The printing also reduces the translucency and thus appears darker in the image.

4.2.6. Tube Orientation

The tube orientation is highly constrained by the guide bars as introduced in Section 3.1. Thus, an approximately horizontal orientation can be assumed throughout the design of the inspection algorithms.

In practice, the distance between the guide bars is slightly larger than the outer diameter of a tube to prevent a blockage, since tubes may not be ideally round. This means the cross-section of a tube can be elliptical instead of circular. Let d_GB denote the vertical distance between the guide bars and h_max the maximum expected tube extension in vertical direction with respect to the image projection. The remaining spacing can then be expressed as d_space = d_GB − h_max, as can be seen in Figure 4.6(a).

The maximum possible rotation is reached if the tube touches both guide bars (see Figure 4.6(b)). The maximum angle of rotation θ_max is defined as the angle between the longitudinal axis of the tube and the x-axis. One can define an unrotated version of the tube with the longitudinal axis parallel to the x-axis and shifted so that the two axes intersect at the center of gravity of the rotated tube. In Figure 4.6(b) this virtual tube is visualized as a dashed rectangle. The distances between the measuring points of the rotated and the ideal horizontal tube can also be seen in the figure, denoted as d_L and d_R for the left and right tube side respectively. Both d_L and d_R are ≤ d_space/2. If the tube is not bent, d_L = d_R. The maximum error between the ideal distance l and the rotated distance l′ can be estimated as follows:

    err_θ = l′ − l = sqrt(l² + d²_space) − l    (4.1)

For example, in a typical setup for 50mm tubes of 8mm diameter, one tube has a length of approximately 415 pixels and d_space = 15 pixels. This leads to an error of err_θ = 0.27 pixel. Thus, with one pixel representing 0.12mm in the measuring plane, the acceptable maximum error due to orientation would be about 0.03mm. On average this error will be even smaller. Based on this estimation, the orientation error is neglected in the following, i.e. all tubes are assumed to be oriented ideally horizontal.
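The numbers in this example follow directly from Equation 4.1 and can be verified in two lines:

    import math

    l, d_space = 415.0, 15.0             # tube length and spacing in pixels
    err = math.sqrt(l**2 + d_space**2) - l
    print(round(err, 2))                 # 0.27 pixel
    print(round(err * 0.12, 3))          # ~0.033mm at 0.12mm per pixel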


Figure 4.6: (a) The guide bar distance d_GB and the maximum extension of a tube in vertical direction h_max define the maximum space d_space between a tube and the guide bars at ideal horizontal orientation. (b) The maximum possible tube rotation is limited by the guide bars. The angle θ between the longitudinal axis of the tube and the ideal measuring distance parallel to the x-axis determines the maximum distance the measuring points can be displaced by rotation (d_L = d_R if the tube is not bent). This distance is ≤ d_space/2 and can be used to estimate the error due to rotation between the ideal tube length l and the rotated distance l′.


4.2.7. Background Pattern

As introduced in Section 3.3, the measuring area is illuminated by a back light setup below the conveyor belt. This setup emphasizes the structure of the belt, which can be seen as a characteristic pattern in the image. This pattern may differ between belt types. Depending on the light intensity it is possible to eliminate the background completely: if the light source is bright enough, the background appears uniformly white even with a short shutter time. For black tubes such an overexposed image would lead to an almost binary image. Transparent tubes, however, also disappear under too bright illumination. Hence, in practice there will always be a certain amount of background structure visible in the image. The strength of the background pattern increases with lower light intensity.

In the following, it is generally assumed that the illumination is adjusted to allow for distinguishing between a tube edge and edges in the background. Larger amounts of dirt or particles other than heat shrink tubes on the conveyor must be prevented.

4.3. Camera Calibration

In the previous section, several assumptions regarding the camera position and the image content have been presented. With respect to accurate measurements it is important that an object is imaged as reliably as possible; this means straight lines should appear straight and not curved in the image, parallelism should be preserved, and objects of the same size should be mapped to the same size in the image. Unfortunately, the latter properties do not hold in the perspective camera model as introduced before. However, under certain constraints it is possible to minimize the perspective effects.

If the internal camera parameters are known, including the radial and tangential distortion coefficients, it is possible to compute an undistorted version of an image. After undistorting, straight lines in the world will appear as straight lines in the image. Furthermore, if one can arrange the camera in a way that objects of equal size at a constant depth are projected onto the same size in the image within the camera's field of view, one can assume that the image plane is approximately parallel to the conveyor.

In the following, the calibration method used to obtain the intrinsic camera parameters is presented, as well as a method to arrange the camera in a way that minimizes perspective effects.

4.3.1. Compensating Radial Distortion

To compensate for the radial distortion of an optical system, one needs to compute the intrinsic camera parameters. Since the intrinsic parameters can be assumed to be constant if the focal length is not changed, the calibration procedure does not have to be repeated every time the system is started and can therefore be precomputed offline.

The well-known Camera Calibration Toolbox for Matlab by Jean-Yves Bouguet [9] is used for this purpose. It is closely related to the calibration methods proposed in [74] and [31]. The calibration pattern required by this method is a planar chessboard of known grid size. The calibration procedure has to be performed for each lens separately. The camera is placed at a working distance of approximately 250mm over the measuring area with a 16mm fix-focal lens. It is adjusted to bring tubes with a diameter of 8mm at this distance into focus (in the measuring plane Π_M).
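The same kind of intrinsic calibration can be reproduced with other standard tools. The following sketch uses OpenCV rather than the toolbox employed here, with chessboard images like those in Figure 4.7; unlike the Matlab toolbox, OpenCV locates all corners automatically. The corner count assumes the 21 × 10 board has 20 × 9 inner corners, and file paths are placeholders:

    import glob
    import cv2
    import numpy as np

    pattern = (20, 9)   # inner corners, assuming 21 x 10 squares
    square = 2.5        # grid size in mm

    # 3D coordinates of the chessboard corners in the board's own frame.
    obj = np.zeros((pattern[0] * pattern[1], 3), np.float32)
    obj[:, :2] = np.mgrid[0:pattern[0], 0:pattern[1]].T.reshape(-1, 2) * square

    obj_points, img_points = [], []
    for path in glob.glob("calib/*.png"):
        gray = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
        found, corners = cv2.findChessboardCorners(gray, pattern)
        if found:
            corners = cv2.cornerSubPix(  # refine corners to subpixel accuracy
                gray, corners, (5, 5), (-1, -1),
                (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 30, 0.01))
            obj_points.append(obj)
            img_points.append(corners)

    # Intrinsic matrix K and distortion coefficients (radial + tangential).
    rms, K, dist, rvecs, tvecs = cv2.calibrateCamera(
        obj_points, img_points, gray.shape[::-1], None, None)
    undistorted = cv2.undistort(cv2.imread("frame.png"), K, dist)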


Figure 4.7: 16 sample images used for calibrating the intrinsic camera parameters.

16 images of a 21 × 10 chessboard with 2.5mm grid size at different spatial orientations around the measuring plane Π_M have been acquired. A selection of these images can be found in Figure 4.7.

In each image, the outer grid corners have to be selected by hand. The remaining corners are then extracted automatically with subpixel accuracy, as can be seen in Figure 4.8. The coordinate axes of the world reference frame are also visualized; the Z axis is perpendicular to the chessboard plane, pointing toward the camera.

The result of this calibration procedure are the intrinsic camera parameters, including the radial distortion coefficients. The Camera Calibration Toolbox for Matlab also allows for a visualization of the extrinsic location of each of the 16 calibration patterns with respect to the camera, as shown in Figure 4.9. The actual working distance of approximately 250mm is reconstructed very well. The resulting radial distortion model can be found in Figure 4.10. In Section 3.2 the area of interest (AOI) function of the camera has been introduced, since the whole image height is not needed. Obviously, the goal is to select the location of this area with respect to minimum distortion. The position of the AOI within a full size image is visualized by the red lines in Figure 4.10, i.e. only pixels between these lines are considered.

4.3.2. Fronto-Orthogonal View Generation

Once distortion effects have been compensated, the goal is to yield a view of the measuring area in which the world plane, e.g. the conveyor belt, is parallel to the image plane. There are two main strategies that can be applied.

In the first strategy, the camera is positioned only roughly. Afterward, the perspective image is warped to yield an optimal synthetic fronto-orthogonal view of the scene. In the second strategy, the camera is adjusted as precisely as possible so that the resulting image is approximately fronto-orthogonal and does not need any correction.


Figure 4.8: Extracted grid corners at subpixel accuracy. The upper right corner is defined as origin O of the world reference frame. The directions of the X and Y axes are also visualized, while the Z axis is perpendicular to the chessboard plane, pointing toward the camera.

Figure 4.9: Reconstructed extrinsic location of each calibration pattern relative to the camera ("Extrinsic parameters (camera centered)"). The working distance of approximately 250mm is recovered very well.


Figure 4.10: Visualization of the resulting radial distortion model. The computed center of distortion, indicated by the '◦', is slightly displaced from the optical center ('×'). The image area of interest considered in this application lies between the red lines.

Perspective Warping  One possibility to compute a synthetic fronto-orthogonal view of an image is based on the extrinsic relationship between the camera plane and a particular world plane (e.g. the conveyor plane), which can be extracted in a calibration step. With the extrinsic parameters it is possible to describe the position and orientation of the world plane in the camera reference frame. Finally, one can compute a transformation that maps the world plane into a plane parallel to the image plane (or vice versa) and warp the image to a synthetic fronto-orthogonal view. This approach has significant drawbacks. First of all, the accuracy of the results is closely related to the calibration accuracy. Furthermore, the extrinsic parameters of a camera change if the camera is moved even slightly, in contrast to the intrinsic parameters, which can be assumed constant as long as the focus is not changed. Thus, one has to recalibrate the extrinsic parameters as well as the transformation parameters every time the camera is moved, which is not practicable in this particular application.

There are other methods to compute a fronto-orthogonal view of a perspective image, which are based on characteristic image features such as parallel or orthogonal lines, angles, or point correspondences, and which do not need any knowledge of the interior or exterior camera parameters [30]. One common approach is based on point correspondences of at least 4 points x_i and x′_i with x′_i = H x_i (1 ≤ i ≤ 4) and

    H = [ h1  h2  h3
          h4  h5  h6
          h7  h8  h9 ]    (4.2)

the projective transformation matrix representing the 2D homography.

The unknown parameters of H can be computed in terms of the vector cross product x′_i × H x_i = 0 using a Direct Linear Transformation (DLT) [30]. To correct the perspective of an image, one has to find four points in the image that lie on the corners of a rectangle in the real world but are perspectively distorted in the image. These points x_i have to be mapped to points x′_i that represent the corners of a rectangle in the image. Then, after H is computed, each point in the image is transformed by H. Obviously, this is an expensive operation for larger images. Furthermore, in practice the question is where to place the calibration points. One possibility is to place them on top of the guide bars. The system could automatically detect the calibration points and check whether these points lie on a rectangle in the affine image space. This requires a very accurate positioning of the guide bars, and all marker points should be coplanar, i.e. lie in one plane. Assuming this mechanical problem can be solved, there is still another issue: depending on how the destination rectangle is defined, the warped image may be scaled. In any case, warping discrete image points requires interpolation, since transformed points may fall between the discrete grid positions. Obviously, this can reduce the image quality.
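For illustration, such a four-point perspective correction is a standard operation; OpenCV, for instance, solves for H from exactly four correspondences and warps the image in two calls. This shows the generic technique, not the approach chosen for the prototype, and the point coordinates below are made up:

    import cv2
    import numpy as np

    # Four image points that are corners of a rectangle in the real world
    # (illustrative values), and their desired rectified positions.
    src = np.float32([[102, 88], [688, 95], [695, 402], [96, 410]])
    dst = np.float32([[100, 90], [690, 90], [690, 405], [100, 405]])

    image = cv2.imread("frame.png")
    H = cv2.getPerspectiveTransform(src, dst)   # 3x3 homography (Eq. 4.2)
    warped = cv2.warpPerspective(image, H, (image.shape[1], image.shape[0]))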


Online Grid Calibration  Although the previously described approach does not require an accurate positioning of the camera, it has several drawbacks, especially with respect to performance and image reliability. If there is a way to adjust the camera perfectly, one needs neither warping nor perspective correction. However, a human operator must be able to perform this positioning task in an appropriate time.

Therefore, an interactive camera positioning method has been developed, denoted as Online Grid Calibration.

First, the distance of the parallel guide bars is adjusted to the current tube size. Then, a planar chessboard pattern of known size is placed between the guide bars on the conveyor within the visual field of the camera. The horizontal lines on the chessboard must be parallel to the guide bars (see Figure 4.11). To simplify the adjustments, a mechanical device may be developed that can be placed between the guide bars, combining the function of a spacer that brings the guide bars to the right distance with that of a calibration grid that perfectly fits into the space between the guide bars with the designated orientation.

The underlying idea is as follows: if the chessboard is imaged in a way that vertical lines in the world are vertical in the image and horizontal lines appear horizontal, while each grid cell of the chessboard is imaged at the same size, the camera is adjusted accurately enough to yield a fronto-orthogonal view.

The process of camera adjustment can be simplified if the operator gets real-time feedback on how close the current viewing position is to the optimal position. Therefore, the live images of the camera are overlaid with an ideal virtual grid of squares. This grid is parametrized by two points, the upper left and the lower right corner, as well as the vertical and horizontal size of each grid cell. The operator can move the grid in horizontal and vertical direction and adjust its size. This is a useful feature to initialize the grid or to perform fine adjustments.

For each image, the correspondence between the overlaid virtual grid and the underlying image data is computed. A two-step method has been developed. First, the image gradient in vertical and horizontal direction is extracted using the SOBEL_X and SOBEL_Y operators. This information can be used to approximate the gradient magnitude and orientation (see Equations 2.21 and 2.24). Since there is a strong contrast between the black and white chessboard cells, the gradient magnitude at the edges is strong as well.
If the virtual grid matches the current image data, the gradient orientation φ(p) on horizontal grid lines must ideally be π/2 or 3π/2 respectively, depending on whether an edge is a black-white or a white-black transition. Remember that the gradient direction is always perpendicular to the edge. Correspondingly, vertical grid lines have orientations of 0 or π.


Figure 4.11: Online Grid Calibration using a 5 × 5mm chessboard pattern. (a) Calibration image distorted by perspective. The goal in this calibration step is to adjust the camera in a way that the chessboard pattern perfectly fits the overlaid grid, as in (b).

In practice, the gradient orientation is allowed to lie in a narrow range around the ideal orientation, since the computation of φ(p) only approximates the real orientation up to an epsilon (see Figure 4.12(b)). Theoretically, each position on the virtual grid must then meet the orientation constraints. In addition, the gradient magnitude must exceed a certain threshold to prevent edges induced by noise from influencing the calibration procedure.

To reduce the computational load, only a selection of points on the grid, denoted as control points, is considered. The positions of these points can be seen in Figure 4.12(a). The ratio of grid matches to the total number of control points can be seen as a score of correspondence. If the score reaches a threshold, e.g. more than 95% of all checked positions on the virtual grid match the real image data, the second step of the calibration is started.

The second step concentrates on the size of each grid cell. Assuming negligible perspective effects if the camera is perfectly positioned, all grid cells should have the same size in the image. To compute the size of each grid cell as accurately as possible, the real edge location of the grid is detected with subpixel precision within a local neighborhood of each control point on the virtual grid. To this end, the gradient magnitude of a 7 × 1 neighborhood perpendicular to the grid orientation at a given control point is interpolated using cubic splines. Then, the width and height of a grid cell can be determined via the affine distance between two opposing subpixel grid positions. Finally, the mean grid size and standard deviation can be computed both for width and height. The standard deviation is used as a measure of how closely the current camera viewing position approximates a fronto-orthogonal view. Ideally, if all squares have equal size, the standard deviation is zero. In practice the standard deviation is always larger than zero, for example due to noise, edge localization errors, or a remaining small perspective error. Experiments have shown that it is possible to adjust the camera within an acceptable time to yield a 100% coverage in step one and a grid standard deviation of less than 0.3 pixels. In this case the camera is assumed to be adjusted well enough for accurate measurements.
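Step one of this grid-match test can be sketched compactly. The following illustration computes gradient orientation and magnitude with Sobel operators and tests a single control point; the tolerance and magnitude threshold are placeholders, not the values used in the prototype:

    import cv2
    import numpy as np

    def gradient_maps(gray):
        # Image gradient via Sobel operators (cf. SOBEL_X / SOBEL_Y).
        gx = cv2.Sobel(gray, cv2.CV_32F, 1, 0, ksize=3)
        gy = cv2.Sobel(gray, cv2.CV_32F, 0, 1, ksize=3)
        mag = np.hypot(gx, gy)                    # gradient magnitude
        phi = np.arctan2(gy, gx) % (2 * np.pi)    # orientation in [0, 2*pi)
        return mag, phi

    def control_point_matches(mag, phi, x, y, ideal_phi, tol=0.2, min_mag=50.0):
        # A control point matches if its orientation lies in a narrow range
        # around the ideal one and the magnitude exceeds a noise threshold.
        d = abs(phi[y, x] - ideal_phi)
        d = min(d, 2 * np.pi - d)                 # wrap-around angular distance
        return mag[y, x] >= min_mag and d <= tol

    # Horizontal grid lines: ideal_phi is pi/2 or 3*pi/2;
    # vertical grid lines: ideal_phi is 0 or pi.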


Figure 4.12: (a) Control points (marked as crosses) are used to adjust the virtual calibration grid of width w and height h to the underlying image data. (b) Gradient orientation φ at each control point. Since the computed values are only an approximation, a narrow range of orientations, indicated by the gray cones around the ideal orientation, is also counted as a match.

4.4. Tube Localization

Since the system is intended to work without any external trigger (e.g. a light barrier) that gives a signal whenever a tube is completely in the visual field of the camera, the first step before further processing of a frame is to check whether there is a tube in the image that can be measured. If there is no tube in the image, or only part of one, the frame can be discarded. This decision has to be very fast and reliable.

4.4.1. Gray Level Profile

To classify an image into one of the states proposed in Section 4.2.2, an analysis of the intensity profile along the x-axis is performed. Strong changes in intensity indicate potential boundaries between tubes and background.

In ideal images as in Figure 4.2, the localization of object boundaries is almost trivial with standard edge detectors (see Section 2.3). In real image sequences, however, there are many changes in intensity of different origin that do not belong to the boundaries of a tube, e.g. caused by the background pattern (see Figure 3.8) or by dirt on the conveyor belt. Furthermore, the printing on transparent tubes, visible in the image under back light illumination, influences the intensity profile, as will be seen later on.

The intensity profile P̂_y of an image row y can be formally defined as

    P̂_y(x) = I(x, y)    (4.3)

where I(x, y) indicates the gray level value of an image I at pixel position (x, y).


Figure 4.13: Sample images of (a) a transparent and (b) a black tube (both 50mm length, ∅8mm) with 11 equally distributed vertical scan lines used for profile analysis within a certain region of interest. (c) and (d) show the resulting gray level profiles of images (a) and (b) respectively.

Since a single scan line (e.g. P̂_{h/2} with h the image height) is very sensitive to noise and local intensity variations, the localization of the tube boundaries based on the profile of a single row can be error-prone. Hence, a set of n parallel scan lines is considered. The mean profile P_n of all n lines is calculated by averaging the intensity values at each position:

    P_n = (1/n) · Σ_{i=1}^{n} P̂_{y_i}    (4.4)

One property of the resulting profile P_n is the reduction of a two-dimensional to a one-dimensional problem, which can be solved much faster (processing speed is a very important criterion at this step of the computation). Since the further processing steps with respect to P_n are independent of the number of scan lines n (n ≥ 1), P_n is denoted simply as P in the following. A more detailed view on the number of scan lines and the scan line distribution with respect to robustness and performance is given in Appendix A. In the following, N_scan denotes the number of scan lines used.

4.4.2. Profile Analysis

Step 1: The first step is smoothing the profile P by convolving it with a large 1D mean filter kernel of dimension K_smooth:


    P_smooth = P ∗ [ 1/K_smooth  1/K_smooth  ...  1/K_smooth ]    (K_smooth entries)    (4.5)

The idea of this low pass filtering operation is to reduce the high-frequency components in the profile, in particular the structure of the background pattern.

Obviously, this step also blurs the tube edges and therefore reduces the detection precision significantly. Keeping in mind the goal of the profile analysis, the intention is only to verify whether a measurement is possible in the current frame. In a next step, the proper measurements have to be performed on the original image data and not on the profile. However, the knowledge gained in this first step does not have to be discarded and can instead be used to optimize the following steps. In other words, if it is possible to predict a tube's boundaries reliably, though not precisely, this information is used to define a region of interest (ROI) as close as possible around the exact location.

Step 2: The next step is to detect strong changes in the profile. Large peaks in the first derivative of the profile indicate such changes and can be considered as candidates for tube boundaries. Therefore, a convolution with an odd-symmetric 1D kernel approximating the first derivative of a Gaussian is performed:

    P_drv = P_smooth ∗ D_x    (4.6)

The odd-symmetric 9 × 1 filter kernel D_x is given by the following filter taps, as proposed in [25] for the design of steerable filters:

tap     0     1        2       3       4
value   0.0   0.5806   0.302   0.048   0.0028

With this kernel, a dark-bright edge results in a negative response while a bright-dark edge leads to a positive response. The strength of the response is proportional to the contrast at the edge.

Assuming the potential tube boundaries have a sufficient contrast, only the strongest peaks of P_drv are of interest for further processing. To simplify the task of peak detection, only the absolute values of the differentiated profile are taken into account, denoted as P+_drv:

    P+_drv = |P_drv|    (4.7)

Note that the sign information of a peak in P_drv is still useful for later classification and does not have to be discarded.

Step 3: A thresholding is performed on P+_drv to eliminate smaller peaks that correspond, for example, to changes in intensity due to the background pattern or dirt:

    P_thresh(x) = P+_drv(x)   if P+_drv(x) > τ_peak
                  0           otherwise                  (4.8)


The threshold τ_peak is calculated dynamically based on the mean of P+_drv, denoted as P̄+_drv, with

    τ_peak = α_peak · P̄+_drv    (4.9)

The factor α_peak indirectly relates to the number of peaks left for further processing. τ_peak is also denoted as profile peak threshold. The goal is to remove as many peaks as possible that do not belong to a tube's boundary, without eliminating any relevant peak. If the images are almost uniform over larger regions, as for black tubes, there are only a few strong changes in intensity. Thus, P̄+_drv is expected to be quite low compared to max(P+_drv), and the peaks belonging to the tube boundaries are preserved even for a larger α_peak. For transparent tubes, on the other hand, the contrast between foreground and background is lower. Hence, the difference between intensity changes due to background clutter and those at the tube boundaries is much smaller. The choice of the right threshold is more critical in this situation, and α_peak has to be selected carefully. If it is too low, too many peaks will survive the thresholding; if it is too large, important peaks will be eliminated as well. The profile peak threshold is closely related to the detection sensitivity of the system, as will be discussed in more detail in later sections. More sophisticated calculations of τ_peak, considering for example the difference between maximum value and mean, or the median, did not perform better.

Step 4: The x-coordinates of the remaining peaks, defined as local maxima in P_thresh, are stored in ascending order in a list denoted as candidate positions Ω. N_Ω indicates the number of elements in Ω, i.e. the number of potential tube boundaries in an image.
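Steps 1-4 can be condensed into a short NumPy sketch. The kernel size and α_peak are application parameters whose concrete values are given in Section 5.1.1, so the defaults below are placeholders; the derivative kernel is assembled from the taps listed above:

    import numpy as np

    # Odd-symmetric 9-tap derivative kernel built from the taps above.
    taps = [0.0028, 0.048, 0.302, 0.5806]
    D_X = np.array(taps + [0.0] + [-t for t in reversed(taps)])

    def candidate_positions(image, rows, k_smooth=15, alpha_peak=2.0):
        # Mean profile over the selected scan lines (Equation 4.4).
        p = image[rows, :].mean(axis=0)
        # Step 1: smooth with a 1D mean filter of size k_smooth (Eq. 4.5).
        p_smooth = np.convolve(p, np.ones(k_smooth) / k_smooth, mode="same")
        # Step 2: differentiate (Eq. 4.6) and take absolute values (Eq. 4.7).
        p_abs = np.abs(np.convolve(p_smooth, D_X, mode="same"))
        # Step 3: dynamic threshold tau_peak = alpha_peak * mean (Eqs. 4.8/4.9).
        p_thresh = np.where(p_abs > alpha_peak * p_abs.mean(), p_abs, 0.0)
        # Step 4: local maxima of the thresholded profile -> candidates Omega.
        is_max = (p_thresh[1:-1] > p_thresh[:-2]) & (p_thresh[1:-1] >= p_thresh[2:])
        omega = np.nonzero(is_max & (p_thresh[1:-1] > 0))[0] + 1
        return omega, p_smooth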


4.4.3. Peak Evaluation

The process described in the previous section results in a number of candidate positions that have to be evaluated, since there may be more candidate positions than actual tube boundaries. This is due to the fact that the thresholding is parametrized to avoid the elimination of relevant positions. The actual number of tube boundaries, indicating the current state as introduced in Section 4.2, is not known at this point and has to be extracted by applying model knowledge to the candidate positions.

Since only four of the nine possible states can be used for measuring, it is of interest whether the current image matches one of these four states. If this is the case, it is sufficient to localize the boundaries of the centered tube. Under the assumptions made in Section 4.2, only one tube can be completely in the visual field of the camera at one time. In the following, an approach is presented that reduces this problem to an iterative search for boundaries belonging to a single foreground object.

First, Ω is extended to Ω′ by two more x-positions: x = 0 at the front and x = x_max at the back of the list, where x_max is the largest possible x-coordinate in the profile. Then, any segment s(i), defined as the region between two consecutive positions Ω′(i) and Ω′(i+1), can be assigned to one of the two classes in {BG, TUBE}, representing background and foreground respectively. In this way, the whole profile is partitioned into N_Ω + 1 segments if there are N_Ω peaks.

Global Threshold  The classification into BG and TUBE is based on the general assumption that the mean intensity of objects is darker than that of the background. In more detail, taking the mean value P̄_smooth of the smoothed profile P_smooth as a global reference and calculating the local mean value for each segment s(i), the classification C_1 can be expressed as:

    C_1(s) = TUBE   if mean(s(i)) ≤ P̄_smooth
             BG     otherwise                     (4.10)

In image segmentation, the mean value is widely used as an initial guess of a threshold separating two classes of data distinguishable via the gray level [48, 2]. There are many more sophisticated approaches for threshold selection, including histogram shape analysis [57, 63, 26], entropy [54], fuzzy sets [20, 14] and cluster-based approaches [55, 46]. The different techniques are summarized and compared in several surveys [59, 47, 60]. However, in this application the threshold is used for classification; it is not intended for the calculation of a binary image that segments the tubes from the background. Since processing time is strictly limited and critical in this application, it is essential to save computation time where possible. As introduced before, the actual segmentation is based on strong vertical edges in the profile, but does not include any semantic meaning of the segments. In the classification step, the mean turned out to be a reliable and fast choice to distinguish between foreground and background segments, both for black and transparent tubes, as long as there is a uniform and sufficient contrast between tubes and background over the whole image. In this case there is no need for a threshold other than the mean, saving additional operations.

Instead of comparing the global mean with the local mean, the local median was observed to yield a more distinct measure for discrimination:

    C_2(s) = TUBE   if median(s(i)) ≤ P̄_smooth
             BG     otherwise                     (4.11)

The better performance of measure C_2 originates in the characteristic of the median to be less sensitive to outliers compared to the mean [32]. This is important since the input data can be very unsteady due to the background texture or the printing visible on transparent tubes (independent of the additional camera noise level). As mentioned before, the smoothing of the profile in the first step also blurs the tube edges, causing the segment boundaries not to be totally precise. In this case, the local mean tends to move closer to the global mean, which does not necessarily imply a misclassification. The median, however, turned out to be more distinct in most cases. Figure 4.14 shows the smoothed profile of (a) a transparent and (b) a black tube respectively. The examples represent the states entering + centered and entering + centered + leaving. The segment boundaries, which correspond to the locations of the strongest peaks in the first derivative of the profile, are visualized, as well as the global mean and the local medians. Segments that have a median above the global mean are classified as background.

Regional Threshold  One drawback of the global threshold approach is that different background segments are assumed to be almost equal in image brightness, i.e. the tube-background contrast is approximately uniform within one image. This assumption, however, does not hold if there are larger variations in background brightness (for example due to material properties or dirt on the belt). Such variations can occur between images, but also over the whole image width or locally within a single image.


Figure 4.14: Different steps and results of the profile analysis for (a) a transparent tube (state: entering + centered) and (b) a black tube (state: entering + centered + leaving). Shown are the smoothed profile, the segment boundaries, the predicted tube boundaries, the local medians and the global mean. After smoothing the profile, strong peaks in the first derivative indicate potential tube boundaries. The segments between the strongest peaks are classified into foreground and background based on the difference between the local median of each segment and the global mean. The background is assumed to be brighter on average. Neighboring segments of the same class are merged. The crosses mark the correctly predicted boundaries of the centered tube. Note the stronger contrast of black tubes.


Regional Threshold  One drawback of the global threshold approach is that different background segments are assumed to be almost equal in image brightness, i.e. the tube-background contrast is approximately uniform within one image. This assumption, however, does not hold if there are larger variations in background brightness (for example due to material properties or dirt on the belt). Such variations can occur between images, but also over the whole image width or locally within a single image. The first case is uncritical as long as there is a sufficient contrast between a tube and the background. The latter case, i.e. local variations in background brightness, can lead to failures of the global threshold. Figure 4.15(a) shows one characteristic situation which occurs quite often with transparent tubes. The background intensity on the left is much darker compared to the right. The global threshold fails, since the much brighter background regions on the right increase the global mean. Thus, the local median of the leftmost segment falls below the threshold and the segment is therefore classified as foreground. Due to this misclassification no measuring will be performed on this frame, although it would be possible.

A region-based threshold can overcome this problem. The idea is to compute the classification threshold not globally, but based on regional image brightness. While the local median is computed for each segment, a good classification threshold must consider at least one transition between background and foreground. Following the assumptions made in Section 4.2, two tubes cannot be completely in the image at one time. Furthermore, the number of connected background regions in the image cannot exceed two. If there are two connected background regions, one has to lie in the left half of the image while the other falls into the right half. Thus, one can define two regions, left and right of the image center respectively, and compute the mean for each region analogously to the global mean. In the following, the means of the left and right side of the (smoothed) profile are denoted as P_left and P_right respectively.

If there is only one background region (states empty, entering, leaving, entering + leaving), splitting the image at the center has no negative effect. The left and right mean is computed either over a tube and background region, or over background only. In the very special case that the image width is exactly twice a tube's length and the tube enters (or leaves) the scene with its right (or left) boundary exactly on the image center, the regional threshold is computed only over the tube and the classification may be either foreground or background. However, in both cases this situation can be detected as a state where a measurement is not possible, which is a sufficient solution.

The region-based classification of the segments can now be expressed as:

    C_3(s) = { TUBE, if median(s(i)) ≤ τ_region; BG, otherwise }    (4.12)

where τ_region is defined as follows:

    τ_region = { P_left,                 if s(i) falls into the left region only;
                 P_right,                if s(i) falls into the right region only;
                 max(P_left, P_right),   if s(i) falls into both regions }    (4.13)

In Figure 4.15(b) one can see the difference between the global and the regional classification threshold. The regional threshold of the left half is much lower compared to the global threshold. On the other hand, since the second segment, which belongs to the tube, intersects the center, the maximum of both regional thresholds is taken into account, which lies significantly above the global threshold. Finally, all segments are classified correctly. With this threshold, the classification is less sensitive to darker background regions.
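The regional threshold of Equations 4.12 and 4.13 only changes the reference value against which the local median is compared. A sketch under the same assumptions as the previous one (illustrative names; the split at the image center follows the two-background-regions argument above):

    import numpy as np

    def classify_segments_regional(profile_smooth, boundaries):
        center = len(profile_smooth) // 2
        p_left = profile_smooth[:center].mean()      # regional mean, left half
        p_right = profile_smooth[center:].mean()     # regional mean, right half
        labels = []
        for lo, hi in zip(boundaries[:-1], boundaries[1:]):
            if hi <= center:                         # segment lies in the left region only
                tau = p_left
            elif lo >= center:                       # segment lies in the right region only
                tau = p_right
            else:                                    # segment intersects the center (Eq. 4.13)
                tau = max(p_left, p_right)
            labels.append("TUBE" if np.median(profile_smooth[lo:hi]) <= tau else "BG")
        return labels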


[Figure 4.15: (a) The background intensity at the left is much darker than at the right. (b) Smoothed gray-level profile with regional mean, global mean, peak candidates, filtered peaks and local median. The global mean as threshold cannot compensate for such local variations: the left background region is wrongly classified as foreground, since the global threshold is larger than the local median of the corresponding segment. A region-based threshold that considers the left and right image side independently can overcome this problem (see text).]


The two methods have been compared in the following experiment: A sequence of transparent tubes (50mm length, 8mm diameter) has been captured, including 467 frames that have been manually classified as measurable, i.e. a tube is completely in the image. The sequence has been analyzed once with the global threshold and once with the regional threshold. All other parameters have been kept constant. The results can be found in Table 4.1.

                    Global   Regional
    Total number:     467        467
    Measurable:       353        414
    Average PTM:     5.98       7.01

Table 4.1: Comparison of the global and regional threshold used for classification in the profile analysis. The table shows the number of images that have been correctly detected as measurable compared to the total number, as well as the average number of per tube measurements (PTM). Using the regional threshold increases the number of measurements significantly.

In this context it is important to understand that the term measurable is related to a single image. If the system fails to detect a tube in one image, this does not mean that the tube can pass the visual field of the camera undetected. This occurs only if the tube is not measured in any of the images that include it, which is very unlikely. The experiment shows that the average number of measurements per tube can be increased by approximately one if the regional instead of the global mean is used as threshold for the tube classification. In particular, situations as in Figure 4.15(a) can be prevented.

The reason why neither of the two methods has detected all measurable frames lies in other parameters, for example too little contrast between a tube and the background. The dynamic threshold τ_peak as introduced before defines the strongest peaks of the profile derivative. If it is too large, low-contrast tube edges may not be detected. On the other hand, if it is too low, darker regions in the background may be wrongly classified as foreground. This leads to ghost effects, i.e. the system detects a tube where actually no tube is, as can be seen in Figure 4.16. Therefore, the weighting factor α_peak of τ_peak (see Equation 4.9) must be adjusted in the teach-in step to the smallest value that does not produce ghost effects when inspecting an empty, moving conveyor belt. Obviously, this compromise gets harder with an increasing amount of dirt.

[Figure 4.16: Ghost effect: If the parameters of the profile analysis are too sensitive, darker parts on the conveyor (e.g. due to dirt or background structure) can be wrongly classified as a tube.]

Merging Segments  In Figure 4.14(a), one can find two more segments than needed to represent the actual state entering + leaving. Two strong peaks on the right, which are due to a dark dirt spot on the conveyor belt, have not been eliminated by the thresholding. However, the corresponding segments are correctly classified as background, leading to three consecutive background segments which can be merged into one large segment. In general, once all segments s(i) are classified, the goal is to iteratively merge neighboring segments of the same class and to eliminate foreground segments that do not qualify for measuring.


Input:   coordinate list Ω′ with N = |Ω′|
         global median of profile
         minimum tube segment size MIN_SIZE

Step 1:
    define segments:  S = { s[i] = [ Ω′[i], Ω′[i+1] ] }
    classify each segment based on Eq. 4.12:  s[i].label = C_3(s[i])
    if s[i].label == TUBE for all i:  return ERROR
    // remove foreground segments at the borders
    let i1 be the index of the first and i2 the index of the last BG segment
    set s[j].label = BG for all j with 0 ≤ j < i1 and i2 < j < |S|

Step 2:
    merge neighboring segments of the same class into one segment
    // size filter: remove foreground segments that are too small
    for each segment s[i] with s[i].label == TUBE:
        if size(s[i]) < MIN_SIZE:  s[i].label = BG
    if any label changed in this step:  repeat Step 2

Output:  either one BG segment, or three segments of the form BG-TUBE-BG

Listing 4.1: Merging and filtering of the classified segments.
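A compact executable version of Listing 4.1, assuming each segment is represented as a (start, end, label) tuple, with labels as produced by the classification sketches above (an illustrative representation, not the thesis data structure):

    def merge_and_filter(segments, min_size):
        """segments: list of (start, end, label); returns merged/filtered list or None on error."""
        if all(lab == "TUBE" for _, _, lab in segments):
            return None                                  # state 'full': no measurement possible
        # Step 1: relabel foreground segments at the image borders as background
        labels = [lab for _, _, lab in segments]
        first_bg = labels.index("BG")
        last_bg = len(labels) - 1 - labels[::-1].index("BG")
        segments = [(a, b, "BG" if i < first_bg or i > last_bg else lab)
                    for i, (a, b, lab) in enumerate(segments)]
        # Step 2: merge equal neighbors, then size-filter; repeat if a label changed
        changed = True
        while changed:
            merged = []
            for a, b, lab in segments:
                if merged and merged[-1][2] == lab:
                    merged[-1] = (merged[-1][0], b, lab)   # extend the previous segment
                else:
                    merged.append((a, b, lab))
            changed = False
            for i, (a, b, lab) in enumerate(merged):
                if lab == "TUBE" and b - a < min_size:
                    merged[i] = (a, b, "BG")               # size filter (dirt spots etc.)
                    changed = True
            segments = merged
        return segments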


An overview of the algorithm is shown in Listing 4.1. A size filter operation, which can be parametrized with respect to the given target length, is used to remove too small foreground segments (e.g. caused by dirt on the conveyor belt).

The output of the algorithm is either one large background segment (i.e. all foreground segments, if any existed, have been removed since they did not fulfill the criteria) or three segments of the form BG-TUBE-BG. In the latter case, the peaks belonging to the left and right boundary of the remaining foreground segment are finally verified with respect to the sign of the derivative. With the derivative operator used, the position of the left boundary must result in a negative first-order derivative value (bright-dark edge) and the right boundary in a positive value (dark-bright edge). If the predicted tube boundaries are consistent with this last criterion, they are used to define two local ROIs of width W_ROI as starting points for a more precise detection of the measuring points. The local ROI height is defined by the distance between the two guide bars.

The merging of the segments is a linear operation with complexity O(N_Ω). Since this procedure is only allowed to reclassify a former foreground segment into background, and never vice versa, Step 2 of the algorithm is repeated at most once. Hence, the algorithm is guaranteed to terminate.

If all segments are classified as TUBE in the first step, an error is returned. This error indicates the presence of state full (see Figure 4.2(i)). The reason can be a too small field of view of the camera or a missing spacing between consecutive tubes. In any case it is not possible to perform a measurement. Since this state is critical compared to other states that cannot be used for measuring, it is important to detect this situation. In practice, an alert must be produced if it occurs.

4.5. Measuring Point Detection

The previous sections described a fast method to distinguish whether a frame is useful or not. If a measurement is possible, two regions around the potential left and right boundary of a tube to be measured are the output of this first step. In the following, the exact tube boundaries have to be detected with subpixel accuracy.

4.5.1. Edge Enhancement

As introduced in Section 2.3 there is a large number of approaches for edge detection. Four common methods, including the Sobel operator, Laplace operator, Canny edge detector [13] and a steerable filter edge detector based on the derivative of a parametrized Gaussian, have been applied to test images. The results can be found in Table 4.2. It includes experiments with two transparent tubes (left boundary) of the same sequence and one black tube boundary. All tubes have an inner diameter of 8mm. The difference in size between the transparent and black tubes is due to a different camera-object distance. As can be seen, the edges of the transparent tubes can differ in brightness, contrast and background pattern between frames.

The goal was to find an edge detection operation that adequately extracts the tube boundaries under the presence of background structure and noise, and which is computationally inexpensive in addition.


[Table 4.2: Comparison of different edge detectors applied to the test images (two transparent tube boundaries and one black tube boundary). Methods shown: SOBEL X, SOBEL Y, Laplace, Canny (50/100), Canny (90/230), Canny (185/210), and Gaussian derivative filters of size 5x5, 7x7 and 11x11. The parameters of the Canny edge detector indicate the lower and upper threshold. The Gaussian derivative based edge detection results are all of first order. An (a) indicates that edges of all orientations (in discrete steps of 5°) are enhanced with a steerable filter approach, while (v) represents only vertical edges.]


The results on the transparent tubes are crucial for the selection of an appropriate edge detection approach for this application, since due to the strong contrast the detection of the black tube boundaries is uncritical with all tested methods. For both tube types the edge detection results differ in detected orientation, edge elongation (i.e. how precisely an edge can be localized), and edge representation (signed/unsigned values, floating point/binary, etc.).

Canny Edge Detector  The Canny edge detector results in a skeletonized, one pixel wide response that precisely describes edges of arbitrary orientation. In this application the main drawback of Canny's approach is its sensitivity to the threshold choice. As can be seen in Table 4.2, different parameter sets yield very different results. If the upper hysteresis threshold used as starting point for edge linking is low (e.g. 100), combined with a lower second threshold (e.g. 50), too many background edges are detected as well. A larger upper threshold (e.g. > 200) reduces the number of detected edge pixels, but also eliminates parts of the tube edge, which may break up into pieces. If the distance between upper and lower threshold is large, it is likely that background and tube edges are merged. In any case, a threshold set working fine on one image can lead to very poor results on another. The result of the Canny edge detector is a binary image where non-edge pixels have a value of zero and edge pixels a value of one (or 255 in 8-bit gray level images). Binary contour algorithms can be applied to analyze chains of connected edge pixels. As can be seen in the test images, depending on how many edge pixels survive the thresholding, such an analysis can be very complex and time-consuming. Gaps within edges belonging to the tube boundary make this search even more complicated.

Sobel  The Sobel operator approximates a Gaussian smoothing combined with differentiation. It can be applied with respect to the x- and y-direction. Depending on the filter direction, vertical or horizontal edges are enhanced. Since the tube boundaries have a vertical orientation, the SOBEL X operator is an adequate choice in this application. Edges are located at local extrema, i.e. local minima at bright-dark edges and local maxima at dark-bright edges with respect to the gradient direction. A drawback is that the background pattern is also dominantly vertically oriented; thus, background edges are detected as well. The intensity of an edge is related to the image contrast. Assuming a certain contrast between tubes and background, a large amount of background clutter could be removed by thresholding, leaving only tube edges and edges due to high-contrast dirt particles. However, this would lead to an approach similar to the Canny edge detector, with the drawbacks stated before.

Laplace  The implementation used to test the Laplacian calculates the second-order derivative in x- and y-direction using the Sobel operator and sums the results. The output is an image of signed floating point values. Edges are located at the zero crossings between strong peaks. The Laplacian is an isotropic operator; thus, edges of all orientations are detected equally. One drawback of this method is its sensitivity to noise. The resulting response contains many zero crossings. Compared to first-order derivatives, the edge criterion is more complex: a pixel is an edge pixel if the closest neighbor in the direction of the gradient is a local maximum while the opposite neighbor is a local minimum, and both neighbors must meet a certain threshold. However, the zero crossing can be computed with subpixel accuracy.
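A minimal numpy/scipy sketch of the SOBEL X enhancement step discussed above, using the standard 3x3 Sobel kernel (function name and border handling are illustrative assumptions):

    import numpy as np
    from scipy.ndimage import correlate

    SOBEL_X = np.array([[-1, 0, 1],
                        [-2, 0, 2],
                        [-1, 0, 1]], dtype=float)

    def enhance_vertical_edges(image):
        """Signed response: negative at bright-dark edges, positive at dark-bright edges."""
        return correlate(image.astype(float), SOBEL_X, mode="nearest")

The sign convention matches the boundary verification described earlier: a left tube boundary (bright-dark) yields a negative response, a right boundary (dark-bright) a positive one.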


Steerable Filters  The idea of filters that are steerable, for example in scale and orientation, is to design a filter that performs best for a particular edge detection task (see Section 2.3.3). In this application the goal is to find a filter that extracts the tube edges with maximum precision, while background edges and dirt are suppressed. The steerable filter approach allows testing a large range of different edge detection kernels. Experiments with systematically varied parameter sets of first-derivative Gaussian filters following the approach of Freeman and Adelson [25] have been applied to the test images. Some of the results are visualized in Table 4.2.

As can be seen, the background clutter cannot be eliminated even with larger kernel sizes, while the tube edges get blurred. No parameter setting for a Gaussian derivative kernel has been found that performs significantly better as a tube edge detector than the computationally less expensive Sobel operator.

All tested methods besides the Canny edge detector can be seen more as edge enhancers than as real edge detectors. This means the results do not fulfill the second and third criterion for good edge detection (see Section 2.3.1). Further processing of the edge responses, such as non-maximum suppression, is necessary. An alternative is a template based edge localization step, which is introduced in the next section.

4.5.2. Template Based Edge Localization

It is important to state that even precisely detected edges (including Canny's approach) still have no semantic meaning. With all tested methods there have been false positives, i.e. edges belonging to the background, dirt, or noise. Hence, model knowledge has to be applied to the detected edges to decide whether an edge really corresponds to a tube's boundary or not.

In this application, the highly constrained conditions reduce the number of expected situations to a small, well defined minimum. The edges belonging to the tube boundaries of interest are always approximately vertical. Due to perspective, a tube boundary appears straight or slightly curved in a convex fashion under back light, depending on the position of the tube with respect to the optical ray of the camera. The more the tube boundary is displaced from the camera center, the larger is the curvature.

At this stage it is of interest to locate a tube's boundaries within the two local ROIs (left and right respectively). Strong changes of the image intensity in x-direction (vertical edges) have been enhanced using the SOBEL X operator. The goal is not only to find the strongest peaks in the edge image, but the strongest connected ridge along such peaks, which most likely corresponds to the tube boundary. This task can be performed by template matching (see Section 2.4).

If the feature to be detected can be modeled by a template, the response of the cross-correlation with this template computes a match probability within a given search region. The idea is to design a template that models the response of the edge enhancer and to correlate this template with the local ROI. The position where the correlation has its maximum provides close information on the tube boundary location.
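A sketch of the normalized cross-correlation search described above, assuming the edge response of a local ROI and a template are given as 2-D arrays (illustrative names; the thesis uses the NCC of Equation 2.31, of which this zero-mean variant is one common form):

    import numpy as np

    def ncc(patch, template):
        """Normalized cross-correlation of two equally sized arrays (zero-mean variant)."""
        p = patch - patch.mean()
        t = template - template.mean()
        denom = np.sqrt((p * p).sum() * (t * t).sum())
        return (p * t).sum() / denom if denom > 0 else 0.0

    def best_match(roi_response, template):
        """Shift the template pixelwise over the ROI; return (score, (x, y)) of the maximum."""
        th, tw = template.shape
        best = (-np.inf, (0, 0))
        for y in range(roi_response.shape[0] - th + 1):
            for x in range(roi_response.shape[1] - tw + 1):
                score = ncc(roi_response[y:y + th, x:x + tw], template)
                if score > best[0]:
                    best = (score, (x, y))
        return best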


[Figure 4.17: Edge detection results of the SOBEL X operator applied to different tubes (right boundary), plotted as surfaces over x and y. The tube boundary corresponds to the strongest ridge in vertical direction in each plot. It can be seen that the edge response differs in curvature, intensity and background clutter. (a) Almost straight edge (close to the optical center of the camera) of a transparent tube with a quite uniform region left of the ridge belonging to the tube, and a more varying area on the right due to the background structure. (b) The tube boundary looks convex if further away from the camera center due to perspective. The edge response is much stronger at the ends of the ridge than at the center; this is due to the amount of light which is transmitted by the tube (see text). (c) Edge of a transparent tube with a printing close to the boundary, visible as a smaller ancillary ridge on the left. (d) Boundary of a black tube. The edge response is about three times stronger compared to transparent tubes due to the strong image contrast.]


Therefore, it is important to take a closer look at the response of the edge detection with respect to the input data. Consistent characteristics can be used for the design of the right template.

Figure 4.17 shows examples of the SOBEL X operator applied to test images. In this case, the responses correspond to the right ROI of three transparent tubes (Figure 4.17(a)-(c)) and one black tube (Figure 4.17(d)) at different positions in the image with respect to the x-axis. The tube boundary can be detected intuitively by humans even under the presence of background clutter. However, one can see that the edge response differs between the plots due to image contrast and perspective.

Figure 4.17(a) shows an almost straight edge (close to the optical center of the camera) with a quite uniform region left of the ridge belonging to the tube, and a more varying area on the right due to the background structure. It can be observed that the edge response is stronger at the ends of the ridge than in the center, which is due to the transmittance characteristic of transparent tubes (see Section 4.2). More light is transmitted at the center, leading to brighter intensity values and a poorer contrast, while the corners ('L'-corners between horizontal and vertical boundary of a tube) are darker and yield a better contrast. This effect can also be seen very clearly in Figure 4.17(b). In addition, the tube boundary looks convex due to perspective, since it is further away from the camera center. Vertical edges of printings on a tube's surface are also extracted by the edge detection step, as can be seen in Figure 4.17(c). In this case, the straight line of an upside-down capital 'D' falls into the right local ROI, causing the smaller ancillary ridge on the left of the tube boundary. Figure 4.17(d) shows the boundary of a black tube. Due to the strong image contrast the edge response is about three times stronger compared to transparent tubes. The influence of the background clutter reduces to a minimum, and since printings are not visible on black tubes under back light, this problem vanishes completely. The edge response does not differ in intensity at the ends as with transparent tubes.

4.5.3. Template Design

The goal is to design a universal, minimum set of templates that covers all potential edge responses of both transparent and black tube boundaries. The templates must model different curvatures to be able to handle perspective effects. Assuming a constant horizontal orientation and a constant size, the curvature is the only varying parameter between templates. The following two-dimensional function has been developed, which can be parametrized to approximate the expected edge responses:

    T_ψ(x, y) = a · exp( b (y / H_T)² − (x − ψy²)² / (2σ²) )    (4.14)

It is based on a Gaussian with standard deviation σ in x-direction, extended with respect to y. The curvature is denoted by ψ. A value of ψ = 0 represents no curvature, while the curvature increases with increasing values of ψ (ψ ≤ 1). The first summand in the exponent of the exponential function can be used to emphasize the ends of the template in y-direction, which is motivated by the characteristic response of transparent tubes: the edge detector yields higher values at the ends than at the center. b controls the amount of height weighting; if b = 0, the template is equally weighted. H_T corresponds to the template height. a determines the sign of the template values. For bright-dark edges, as at the left boundary, the edge response is negative, thus a < 0 is used, while on the right side a > 0 is used to model the positive response of dark-bright edges.
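A sketch that generates such a template following Equation 4.14 (parameter names as in the text; the pixel grid and the centering of the ridge are illustrative assumptions):

    import numpy as np

    def make_template(width, height, psi, a=1.0, b=0.0, sigma=0.8):
        """Edge template of Eq. 4.14: Gaussian ridge in x, bent by psi, ends weighted by b."""
        xs = np.arange(width) - width // 2            # ridge centered horizontally
        ys = np.arange(height) - height // 2          # y = 0 at the template center
        y, x = np.meshgrid(ys, xs, indexing="ij")
        return a * np.exp(b * (y / height) ** 2 - (x - psi * y ** 2) ** 2 / (2 * sigma ** 2))

    # e.g. a right-boundary template (dark-bright edge): a = 1, slightly curved, ends emphasized
    template = make_template(width=11, height=80, psi=-0.002, a=1.0, b=3.0)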


[Figure 4.18: Different templates generated using Equation 4.14. (a) Straight edge: ψ = 0, b = 0. (b) Curved edge: ψ = 0.005, b = 0. (c) Curved edge: ψ = 0.02, b = 0. (d) Curved edge with emphasized ends: ψ = 0.002, b = 3. (σ = 0.8 and a = 1 have been used for all templates in this figure.) Note the differently scaled axes x and y.]


Figure 4.18 shows some examples that visualize Equation 4.14 and the effect of the different parameters.

Template Dimension  A constant template width of 11 pixels is used, which is large enough to represent both straight and maximally curved tube boundaries. The template height is defined by the global ROI height. Assuming the guide bars are always arranged such that the guide bar distance is only slightly larger than the tube's perimeter, the global ROI height is a good reference for the tube size. A good guess of the template height can be computed by the following equation:

    H_T = γ H_ROIG    (4.15)

where H_ROIG is the global ROI height and γ a factor between 0 and 1.

Curvature  The question is what range of curvatures occurs in practice and how many templates are needed to cover that range. Therefore, several test sequences with both black and transparent tubes of different diameters have been captured. 30 templates of different curvature have been generated for each tube side. The parameters can be found in Table 4.3.

              # templates   ψ_min     ψ_max    χ         a    b
    Left:     30            0.0       0.02     0.00066   −1   3
    Right:    30            −0.02     0.0      0.00066   1    3

Table 4.3: A set of 30 templates with curvatures equally distributed between ψ_min and ψ_max at a curvature resolution (step size) χ has been used to determine the occurrence of certain curvatures empirically.

For each measurable frame, the curvature of the template that reaches the maximum correlation value is taken into account to build a histogram of curvature occurrence, both for the left and the right tube side. The normalized cross-correlation (see Equation 2.31) is used as measure for evaluating the match quality of each template at a certain location. The results can be found in Figure 4.19.

They show that the occurring curvatures are limited to a small range, denoted as R_ψ,left and R_ψ,right, with R_ψ,left = [0, 0.005] for the left and R_ψ,right = [−0.005, 0] for the right side respectively. In order to reduce the number of templates, all curvatures outside this range can be ignored.

Another important criterion is the step size or curvature resolution χ, i.e. how many steps between ψ_min and ψ_max are taken into account. Theoretically, one could quantize the curvature ranges into very small steps. However, since correlation is an expensive operation, one has to make a compromise between accuracy and performance. It was observed that if more than 15 templates have to be tested at each tube side per frame, the system starts to drop frames, i.e. this is a quantitative indicator that the overall processing time exceeds the current frame rate. Therefore, the total number of templates is restricted to 10 in this application. The corresponding step size between two curvatures is 0.0005.
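Using make_template from the sketch above, the restricted template bank can be precomputed per tube side; the curvature ranges, step size and sign of a follow the text, everything else is illustrative:

    def template_bank(width, height, side, step=0.0005):
        """10 curvatures per side over the empirically observed ranges R_psi."""
        if side == "left":
            psis = [i * step for i in range(10)]       # R_psi,left  = [0, 0.005]
            a = -1.0                                   # bright-dark edge: negative response
        else:
            psis = [-i * step for i in range(10)]      # R_psi,right = [-0.005, 0]
            a = 1.0                                    # dark-bright edge: positive response
        return [make_template(width, height, psi, a=a, b=3.0) for psi in psis]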


[Figure 4.19: Histogram of template curvature occurrence for (a) the left and (b) the right tube side. It can be seen that only a small range of curvatures is observed. This reduces the number of templates that have to be tested each time.]

[Figure 4.20: If the height weighting coefficient gets too large (here: b = 20), the center of the tube edge does not contribute to the matching score anymore.]

Template Weighting  The weighting coefficient b in Equation 4.14 is important for transparent tubes. Due to poor contrast, the overall edge response of a transparent tube might be low. If only the center region of an edge is considered, its contrast might, in the worst case, be even lower than that of a background edge. The cross-correlation only computes the similarity of a template at a certain location in the image. The maximum response is taken as the match, since it is assumed that there must be a tube edge in the search region. If a background edge yields a higher score than the real tube edge, even though the same or another template matches the real tube edge well, this leads to a wrong measurement.

With model knowledge about the tube characteristics one can assume that the contrast at the edge ends is significantly stronger. If the template is weighted uniformly at the center and the ends, the correlation score depends on the whole edge equally. On the other hand, if the ends of the template are weighted stronger than the center, a template that perfectly fits the tube edge will yield a larger score, since background edges are usually uniform. Thus, the template is designed to prefer tube edges.

The weighting coefficient b has to be larger than one to yield the desired effect. On the other hand, b must not be too large either, since then the ends get too much influence. In the extreme case, the template equals two spots at a certain distance that do not represent a tube edge anymore (see Figure 4.20).


[Figure 4.21: Effect of the weighting coefficient b in Equation 4.14. (a) Tube edge detection result with a uniformly weighted template (b = 0). (b) Result of a template with enhanced ends (b = 3). (c) Corresponding cross-correlation results of (a), and (d) the cross-correlation results of (b) respectively. The maximum in (c) and (d) corresponds to the pixel position where the template matches best. In this example, the ridge closer to the observer is due to a background edge while the ridge further away corresponds to the real tube edge.]

Figure 4.21 shows an example of how the weighting of the template ends improves the tube edge detection. In this example the right boundary contrast of a transparent heat shrink tube is quite low. Using a uniformly weighted template, i.e. b = 0, the maximum correlation score is reached at a background edge (see Figure 4.21(a) and (c)). In this case, the tube would be measured larger than it really is. On the other hand, with an enhancement of the template ends, the tube edge results in a larger score than the background edge, leading to a correct detection, as can be seen in Figure 4.21(b) and (d).

The enhancement of the template ends is motivated by the transparent tube characteristics. For black tubes, b = 0 describes the response of the SOBEL X operator best. However, there is no disadvantage in using the same weighting coefficient as for transparent tubes. Due to the strong contrast of black tubes, the curvature and size of the template are the dominant factors influencing the matching results.

Template Rotation  The templates generated by Equation 4.14 are symmetric along the y-axis with respect to the template center. Thus, the ends of the template always lie on one line perpendicular to the x-axis. In the ideal case, the edge response of a heat shrink tube has the same characteristic. In practice, however, a tube can be slightly angular within the guide bars, or the tube edge might be cut skew. In both cases the strong edge responses at the ends do not have to lie on one line perpendicular to the x-axis as in the template.


[Figure 4.22: (a) Edge response of an angularly oriented (transparent) tube edge. The characteristic peaks at the ends of transparent tube edges do not have to lie on one line perpendicular to the x-axis. The red line visualizes the slight angular orientation of the tube edge. (b) Example detection result with k = 1 orientations. (c) Corresponding result with k = 3 orientations.]

Figure 4.22(a) visualizes the edge response of a slightly angular tube edge of a transparent tube (left side). In such a situation no template will fit the edge perfectly. This can be critical if the edge contrast is poor. In this case, as mentioned before, the stronger weighting of the template ends helps to support a match at the real tube boundary instead of at a background edge. With an angular tube edge, however, a symmetric template cannot be shifted over the image in a way that it matches both edge ends. Thus, the cross-correlation score is significantly smaller and the probability increases that a background edge yields a larger score.

A small rotation of the template can overcome this problem. Therefore, the bank of templates is extended by k − 1 rotated versions of each template. It turned out to be sufficient to rotate each template by ±2 degrees to cover the range of expected deviations from the ideal symmetric model. Thus, k = 3 has been used throughout the experiments. It is assumed that larger angular deviations cannot occur due to the guide bars.

Model Knowledge Optimization  The number of templates to be checked each time on the left and right side increases with the number of rotations.
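The rotated bank can be derived from the unrotated templates; a sketch using scipy (the ±2° and k = 3 follow the text, the interpolation settings are assumptions):

    from scipy.ndimage import rotate

    def rotated_bank(bank, angles=(-2.0, 0.0, 2.0)):
        """Extend each template by its rotated versions (k = len(angles) orientations)."""
        return [rotate(t, angle, reshape=False, mode="nearest")
                for t in bank for angle in angles]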


Instead of 2 × 10 templates one has to consider 2 × 10 × 3 templates if k = 3. Since correlation is an expensive operation, the processing time increases significantly even if the local ROIs are relatively small. It turned out that no more than 15 templates can be checked at each side without skipping frames at a frame rate of 50fps on an AMD Athlon 64 FX-55 processor with 2GB RAM.

One thinkable optimization is to reduce the curvature resolution, i.e. to quantize the same range of curvatures to ≤ 5 templates at each side. Obviously this reduces the accuracy of the edge localization and is no satisfying solution in this application.

Instead, one can apply model knowledge to exclude several curvatures depending on the horizontal image position. It can be assumed that the curvature is maximal at the image boundaries and decreases toward the image center. Real sequences support this assumption. Figure 4.23 shows the occurrence of different curvatures with respect to x. The data was acquired over several sequences including transparent and black tubes. It turns out that the curvature decreases linearly within a certain band. The upper and lower boundary of this band determine which curvatures can be excluded at a given position.

[Figure 4.23: Curvature of the best matching template depending on the x-position of the match, for (a) the left and (b) the right tube side.]

The curvature range distance d_ψ at a position x is defined as:

    d_ψ(x) = ψ_max(x) − ψ_min(x)    (4.16)

where ψ_max(x) and ψ_min(x) are the maximum and minimum curvature occurring at this position. The average range distance over all x is denoted as d̄_ψ. This range must be checked each time and is covered by n templates. In practice n = 5 is used since, as mentioned before, the maximum number of templates that can be processed with the given hardware in real-time is 15 (in addition to all further processing that is needed), and 5 curvatures × 3 rotations = 15 templates have to be checked per frame at one tube side. To yield the desired resolution over the whole range of curvatures, the total number of curvatures N_ψ,total is computed as follows:

    N_ψ,total = n (ψ_max − ψ_min) / d̄_ψ    (4.17)

where ψ_max and ψ_min indicate the overall maximum and minimum curvature a template can have.
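A sketch of the position-dependent template selection: the linear curvature band is tabulated so that an x-position maps to the indices of the n curvatures to test (the band model and all constants below are illustrative placeholders, to be fitted from data as in Figure 4.23):

    import numpy as np

    def build_curvature_lut(image_width, psi_of_x, n=5, step=0.0005, psi_max=0.005):
        """For each x, precompute the n curvature indices around the expected curvature.

        psi_of_x : callable modeling the band center, e.g. fitted from real sequences
        """
        n_total = int(round(psi_max / step)) + 1          # all precomputed curvatures
        lut = []
        for x in range(image_width):
            center = int(round(psi_of_x(x) / step))       # index of the expected curvature
            lo = min(max(center - n // 2, 0), n_total - n)
            lut.append(range(lo, lo + n))                 # n template indices to test at x
        return lut

    # hypothetical band center: maximal curvature at the left border, zero at the image center
    lut = build_curvature_lut(780, lambda x: max(0.0, 0.005 * (1 - x / 390)))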


Hence, one has to compute N_ψ,total × k templates for each side. This can be done in a preprocessing step to reduce the computational load. During inspection, one has to determine which templates have to be checked at a given position, defined by the center of the local ROI around a predicted tube edge. For an efficient implementation, a lookup table (LUT) is used for this task.

4.5.4. Subpixel Accuracy

The maximum accuracy of the template based edge localization so far is limited by the discrete pixel grid. The templates are shifted pixelwise within the local ROIs to find the position that reaches the maximum correlation score. Following the assumptions on tubes under perspective (see Section 4.2.3), the measuring is performed between the outermost points of the convex tube edges.

The way the templates are defined, the template center always corresponds to the outermost point of the generated ridge. This is consistent with template rotation, since the rotation is performed around the template center. In the special case that the template is not curved, the template center is still the valid measuring point. With the knowledge of this point within the template and the position where this template matches best in the underlying image, the position of the measuring point in the image can easily be computed.

However, pixel grid resolution is not accurate enough in this application. For example, one pixel represents about 0.12mm in the measuring plane Π_M in a typical setup for 50mm tubes. The allowed tolerance for 50mm tubes is ±0.7mm. As a rule of thumb for reliable results, the measuring system should be as accurate as 1/10th of the tolerance, i.e. 0.07mm in this example. To reach that accuracy one has to apply subpixel techniques to overcome the pixel limits.

Figure 4.24(a) visualizes the result of the cross-correlation of an image ROI around the right boundary of a transparent tube with the template that yields the maximum score. The maximum is located at position M_max = (19, 5). These coordinates refer directly to the edge position in the image, since the template function and therefore the exact location of the template ridge is known.

The real maximum that describes the tube edge location most accurately may lie in between two grid positions. With respect to the measuring task, the edge has to be detected as accurately as possible. Interpolation methods have been introduced in Section 2.3.4 to overcome the pixel grid limits in edge detection. The same can be applied at this stage to the template matching results.

Cubic spline interpolation is used to compute the subpixel maximum within a certain neighborhood around the discrete maximum. Cubic splines approximate a function based on a set of sample points using piecewise third-order polynomials. They have the advantage of being smooth in the first derivative and continuous in the second derivative, both within an interval and at its boundaries [53].

The interpolation is performed only with respect to the x-direction, since this is the measuring direction. A subpixel location with respect to y has only a marginal effect on the measurements. Ideally, the measuring points on the left and right side have the same y value.
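A sketch of the subpixel refinement: five samples of the match profile around the discrete maximum are interpolated with a cubic spline and the maximum is located on a 1/10-pixel grid, as described in the text (scipy is assumed to be available; names are illustrative):

    import numpy as np
    from scipy.interpolate import CubicSpline

    def subpixel_maximum(match_row, x_max):
        """match_row: 1-D correlation scores at the maximum's y; x_max: discrete maximum index."""
        xs = np.arange(x_max - 2, x_max + 3)            # maximum plus two neighbors per side
        spline = CubicSpline(xs, match_row[x_max - 2:x_max + 3])
        fine = np.arange(xs[0], xs[-1] + 0.1, 0.1)      # 1/10 pixel precision suffices here
        return fine[np.argmax(spline(fine))]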


[Figure 4.24: (a) Cross-correlation results of an image patch around the right boundary of a transparent tube and the best scoring template. The maximum is located at position (19, 5). (b) Cubic spline interpolation in a local neighborhood around the maximum; in this case, the interpolated maximum is equal to the discrete position. (c) Matching results of a different image; here, the interpolated subpixel maximum differs from the discrete maximum and can be found at x = 12.2.]


Assuming the real maximum location is displaced by at most 0.5 pixels at each side of the tube, the worst-case displacement is 0.5 at one side and −0.5 at the other side, leading to a total displacement of 1. A straight line connecting the two measuring points in a Euclidean plane is slightly longer than the distance in x. Following Pythagoras' theorem, the maximum expectable error due to a vertical inaccuracy is:

    error_y = √(l² + 1) − l    (4.18)

where l is the pixel length between the left and right measuring point. With respect to the definition of the camera's field of view and the image resolution, the length of a tube is about 415 pixels in an image. In this case, the worst-case error is about 0.0012 pixels. Assuming one pixel represents 0.12mm (a typical value for 50mm tubes), this corresponds to an acceptable error of 0.14µm, which is far beyond the imaging capabilities of the camera used (each sensor element has a size of about 8.3 × 8.3µm).

Other than in the vertical direction, a subpixel shift of the best matching template position in horizontal direction has a significant influence on the length measurement results. Again, assuming a maximum error of 0.5 pixels if discrete pixel grid resolution is used, the total error at both sides sums up to 1 in the worst case. If one pixel corresponds to 0.12mm as in the example above, this means the measuring system has an inaccuracy of the same magnitude purely due to the edge localization. Obviously, this error depends on the resolution of the camera and can become even worse if one pixel represents a larger distance.

The interpolation considers five discrete points: the maximum matching position M_max and the two nearest neighbors left and right of M_max in x-direction respectively. In Figure 4.24(b), the interpolation results of the local neighborhood around the discrete maximum of Figure 4.24(a) are drawn into the plot of the match profile at y = 5. It shows that the interpolated values describe the sampled values quite well. In this example, the interpolated subpixel maximum equals the discrete maximum. This does not always have to be the case, as can be seen in Figure 4.24(c). Here, the discrete maximum is located at x = 12, whereas the subpixel maximum lies at x = 12.2. In the first case, the neighbor pixels of the maximum yield almost equal results at both sides. In the second example, on the other hand, the right neighbor of the maximum is significantly larger than the left one. This explains the shift of the subpixel maximum toward the right. The precision of the subpixel match localization is 1/10 pixel. Mathematically, much higher precision is possible, but the significance of such results is questionable with respect to the imaging system and noise, and it would increase the computational costs unnecessarily.

4.6. Measuring

The result of the template matching is two subpixel positions indicating the left and right measuring point of a tube. This section introduces how a pixel distance is transformed into a real world length and how the measurements of one tube are combined. For this, a tracking mechanism is required that assures the correct assignment of a measurement to a particular tube. This means one has to detect when a tube enters or leaves the visual field of the camera.


[Figure 4.25: Perspective correction. (a) The measured length varies depending on the image position in terms of the left measuring point. Due to perspective the length of one tube appears larger at the image center than at the image boundaries. The effect of perspective can be approximated by a 2nd order polynomial. (b) The correction function computed from the polynomial coefficients. (c) The result of the perspective correction.]

4.6.1. Distance Measure

The distance between the two measuring points p_L and p_R (see Section 4.2) is computed as the Euclidean distance. Thus, the pixel length l of a tube is defined as follows:

    l = √((p_R − p_L)²)    (4.19)

where l is expressed in terms of pixels. In the following, l(x) denotes the pixel length of a tube at position x where x = x_pL, i.e. the position of a measurement is defined by the x-coordinate of the left measuring point.

4.6.2. Perspective Correction

Figure 4.25(a) shows the measured pixel length l(x) of a metal reference tube (gage) at different image positions. The sequence was acquired at the slowest conveyor velocity. In the ideal case, l should be equal independent of the measuring position. However, the measured length is smaller at the boundaries and maximal at the image center due to perspective. This property is consistent between tubes. To approximate the ideal case, a perspective correction can be applied to the real measurements. Mathematically this can be expressed as:

    l_cor(x) = l(x) + f_cor(x)    (4.20)

where l_cor is the perspective corrected pixel length, and f_cor a correction function. The perspective variation in the measurements can be approximated by a 2nd order polynomial of the form:

    f(x) = c₁x² + c₂x + c₃    (4.21)

where the coefficients c_i of the polynomial have to be determined in the teach-in step by fitting the function f(x) to measured length values l(x) in a least-squares sense. Then, the correction function f_cor can be computed as:

    f_cor(x) = −(c₁x² + c₂x) + c₁s² + c₂s    (4.22)

where s is the x-coordinate of the peak of f(x), with s = −c₂/(2c₁), i.e. the point where the first derivative of f(x) is zero. Thus, f_cor is the 180° rotated version of f(x), shifted so that f_cor(s) = 0, as can be seen in Figure 4.25(b).

This function applied to the measurements has the effect that all values are adjusted to approximately one length l(s). The corrected length values l_cor(x) are shown in Figure 4.25(c). As one can see, the mean value over all measurements describes the data much better after perspective correction.

To reduce the computational load, the correction function is computed only once for each position at discrete steps and stored in a lookup table for fast access.
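A sketch of the teach-in side of Equations 4.20-4.22: the polynomial is fitted to (x, l(x)) samples of a reference tube and turned into a correction lookup table (numpy.polyfit performs the least-squares fit here; names and the tabulation scheme are illustrative):

    import numpy as np

    def build_correction_lut(xs, lengths, image_width):
        """Fit f(x) = c1 x^2 + c2 x + c3 to measured lengths; tabulate f_cor per Eq. 4.22."""
        c1, c2, _ = np.polyfit(xs, lengths, 2)          # c3 cancels out in Eq. 4.22
        s = -c2 / (2 * c1)                              # peak position of f(x)
        x = np.arange(image_width, dtype=float)
        return -(c1 * x**2 + c2 * x) + c1 * s**2 + c2 * s

    # during inspection: l_cor = l + lut[int(round(x_left))]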


4.6.3. Tube Tracking

Assuming a sufficient frame rate, one tube is measured several times at different positions while moving through the visual field of the camera. One constraint in Section 4.2.2 regarding the image content states that only one tube is allowed to be measurable at one time. The question is whether the current measurement belongs to an already inspected tube or whether there is a new tube in the visual field of the camera. Since there is no external trigger, this task has to be solved by the software.

Consecutive tubes appear quite equal in shape, size, and texture (especially black tubes). It is difficult up to impossible to find reliable features in the form of a unique fingerprint that can be used to distinguish between tubes. In addition, the extraction and comparison of such fingerprints would be computationally expensive. Standard tracking approaches such as Kalman filtering [24] or condensation [8] are also not suited to this particular application, since such approaches are quite complex and are worthwhile only if an object is expected to be in the scene over a certain time period. At faster velocities, however, a tube is in the image for only about 4-7 frames.

Since processing time is highly limited, it is a better choice to develop fast heuristics based on model knowledge that replace the problem of tube tracking by detecting when a tube has left the visual field. Therefore, the following very fast heuristics have been defined:

1. Backward motion
2. Timeout

Backward motion  Since the conveyor always moves in one direction (e.g. from left to right in the image), it is impossible for a tube to move backward. Thus, if the horizontal image position of the tube at time t is smaller than at time t − 1 (i.e. the tube would have moved further to the left), this can be used as an indicator that the current measurement belongs to the next tube. The position of a tube can be defined as the x-coordinate of the left measuring point. Hence, with the image content assumption, the tube measured at time t − 1 has left the visual field if x_pL(t) < x_pL(t − 1).


Timeout  The backward motion heuristic assumes a tube has passed the visual field of the camera when the next tube is measured for the first time. This requires a successor for each tube within a certain time period. With respect to the blow-out mechanism it is important that the good/bad decision is made quickly, since the controller (see Section 3.4) must receive the result before the tube has passed the light barrier. Thus, a timeout mechanism is integrated. If no new tube arrives for more than ∆t frames, it is assumed that the previously measured tube has passed the measuring area and the total length can be computed. In practice, ∆t should be oriented on the average number of per tube measurements and the distance between measuring area and light barrier.

4.6.4. Total Length Calculation

Once there is evidence that a tube has passed the visual field of the camera, the single measurements have to be combined into a total length. Let m_i denote the number of measurements assigned to tube i, and l_j(i) the pixel length of the jth measurement (0 ≤ j < m_i). The total pixel length l_total(i) is obtained by averaging the single measurements after a k-outlier filter has removed the fraction α_outlier of measurements lying farthest from the mean (Equation 4.25). The calibration factor f_pix2mm denotes the length
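A sketch of the per-tube combination step; the exact form of Equations 4.23-4.26 is not reproduced in this excerpt, so the outlier filter below follows the description given here and in Section 4.7.4 (sorting by squared distance to the mean, discarding the α_outlier fraction):

    import numpy as np

    def total_length_mm(measurements, f_pix2mm, alpha_outlier=0.25):
        """Average the per-tube pixel lengths after discarding the alpha fraction of outliers."""
        l = np.asarray(measurements, dtype=float)
        k = int(round(alpha_outlier * len(l)))          # number of outliers to discard
        # sort by squared distance to the mean, keep the N - k closest measurements
        keep = l[np.argsort((l - l.mean()) ** 2)][:len(l) - k] if k > 0 else l
        return keep.mean() * f_pix2mm                   # Eq. 4.27: pixels -> mm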


in the measuring plane Π_M that can be represented by one pixel in the image plane. The total length in mm, L_total, of tube i can be computed as follows:

    L_total(i) = l_total(i) · f_pix2mm    (4.27)

The length L_total is used for the good/bad classification, i.e. whether a tube meets the allowed tolerances. This can be formalized as:

    C(i) = { GOOD, if |L_total(i) − L_target| < tol; BAD, otherwise }    (4.28)

where L_target is the target length and tol the allowed tolerance.

4.7. Teach-In


The profile peak threshold τ_peak (see Section 4.4.2) determines the sensitivity with which peaks in the profile derivative are accepted as boundaries between tube and background. It is computed dynamically based on the regional mean of the profile and a constant factor α_peak (see Equation 4.9). Although this parameter is assumed to be constant, it has to be trained once with respect to the conveyor belt used. The teach-in of this parameter is very simple and intuitive. The vision system is set to inspection mode, i.e. it is started as for standard measuring. The conveyor is empty, but moving. The operator can adjust α_peak online, starting at a quite low value. This value is slightly increased as long as the system detects tubes (ghosts) where actually no tubes are. Until now this procedure has to be performed manually, but one could think of an automated version to reduce the influence of a human operator, which is always a source of errors.

To ensure the threshold has not become too large, several tubes are placed on the conveyor. If the system is able to successfully detect all tubes (detection does not mean the length has to be computed correctly in this context), the profile threshold factor is assumed to be trained sufficiently. If the conveyor belt is not uniformly translucent, i.e. the overall image brightness changes significantly over time, one has to assure that the system is able to detect a tube both at the brightest and at the darkest region of the belt.

4.7.3. Perspective Correction Parameters

As introduced in Section 4.6.2, perspective effects in the measuring data can be reduced using a perspective correction function f_cor(x). This function has two parameters c₁ and c₂ that have to be learned in the teach-in step from real data.

One intuitive method to do this is to measure a tube at a very slow conveyor velocity. The result is a set of pixel length measurements (see Figure 4.25(a)) at almost every position in the image. Then, the parameters of a second order polynomial f(x) = c₁x² + c₂x + c₃ can be computed using nonlinear least-squares (NLLS) methods. In this case, a standard Levenberg-Marquardt algorithm [53] is used. The resulting parameters c₁ and c₂ can be directly inserted into Equation 4.22 to compute f_cor(x).

For robust results this procedure can be repeated several times and the final parameter set averaged. Alternatively, one could first acquire measurements of several tubes and fit the correction function to the complete data.

4.7.4. Calibration Factor

The most important parameter to be trained in the teach-in step is the calibration factor that relates a length in the image to a real world length in the measuring plane Π_M. This factor has been introduced as f_pix2mm. The idea is to learn the calibration factor based on correspondences between measurements and ground truth data.

In an interactive process, the operator places a tube of known length onto the moving conveyor. The velocity of the conveyor is set to production velocity, i.e. the velocity at which the tubes will be measured later. When the tube reaches the visual field of the camera, it is measured with the described approach, but at pixel level only. Once the tube has left the measuring area, the total pixel length is computed and the user is asked to enter the real world length of this tube into a dialog box. Again, the input device is a standard keyboard in the prototype version of the system.


The pair of a pixel length l(i) and a real world reference L(i) can be used to compute the ideal factor f_pix2mm(i) that converts pixels into mm for a measurement i as follows:

    f_pix2mm(i) = L(i) / l(i)    (4.29)

This procedure has to be repeated several times for different reference tubes. Finally, the estimated calibration factor is computed analogously to Equation 4.25, using a k-outlier filter before averaging:

    f_pix2mm = (1 / (N − k)) Σ_{j=0..N−k−1} f′_pix2mm(j)    (4.30)

where k is the number of outliers, N the number of iterations, and f′_pix2mm indicates the single calibration factors sorted by the squared distance to the mean in ascending order. The median could also be used instead of averaging.

The root-mean-square error at iteration i between the known real world lengths and the lengths computed based on the estimated calibration factor can be used as a measure of quality:

    Err(i) = √( (1/i) Σ_{j=1..i} (L(j) − l(j) f_pix2mm)² )    (4.31)

If the error is low, this can be used as an indicator that the learned calibration factor is a good approximation of the ideal magnification factor that relates a pixel length in the image to a real world length in the measuring plane Π_M, without any knowledge of the distance between Π_M and the camera.

In practice, the learning of the calibration factor is an interactive process. One can define a minimum and maximum number of iterations, N_min and N_max respectively. Once N_min correspondences have been acquired, f_pix2mm and Err(i) are computed for the first time. The operator continues the procedure as long as the calibration at iteration i + 1 changes by more than a small epsilon compared to iteration i. This means the learning can be stopped if |Err(i + 1) − Err(i)| < ε.
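A sketch of the interactive calibration loop of Equations 4.29-4.31, reusing the outlier-filtered averaging from above (the stopping constant mirrors the value reported in Chapter 5; the data source is an illustrative stand-in for the operator dialog):

    import numpy as np

    def calibrate(correspondences, alpha_outlier=0.25, eps=0.0001):
        """correspondences: iterable of (l_pixels, L_mm) pairs entered by the operator."""
        factors, ls, Ls, prev_err = [], [], [], None
        for l, L in correspondences:
            factors.append(L / l)                                 # Eq. 4.29
            ls.append(l); Ls.append(L)
            f = np.asarray(factors)
            k = int(round(alpha_outlier * len(f)))                # k-outlier filter (Eq. 4.30)
            kept = f[np.argsort((f - f.mean()) ** 2)][:len(f) - k] if k else f
            f_pix2mm = kept.mean()
            err = np.sqrt(np.mean((np.asarray(Ls) - np.asarray(ls) * f_pix2mm) ** 2))  # Eq. 4.31
            if prev_err is not None and abs(err - prev_err) < eps:
                break                                             # calibration converged
            prev_err = err
        return f_pix2mm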




5. Results and Evaluation

5.1. Experimental Design

There are several parameters influencing the measuring results, both in the hardware setup and in the vision algorithms. To obtain meaningful results, it is important to vary not more than one parameter within the same experiment. In the following, the parameters that are tested as well as the evaluation criteria and the strategies used are proposed.

5.1.1. Parameters

The different parameters of the system can be grouped into four main categories: tube, conveyor, camera and software. Table 5.1 summarizes the most important representatives of each category.

Obviously, there are many more parameters, described in the previous chapter, that theoretically fall into the last category. However, most of these parameters do not have to be changed (e.g. the number of profile scanlines or the local ROI width). The corresponding value assignments have been determined empirically on representative sequences and are summarized in Table 5.2.

α_peak = 4.0 has been determined in a teach-in step as proposed in Section 4.7.2 and yields best results for transparent tubes with the conveyor belt and the illumination used. This assignment also covers black tubes, although the threshold could be much larger in that case. As long as the conveyor belt is not changed and the amount of dirt on the conveyor does not change significantly, the detection sensitivity does not have to be re-initialized each time.

A timeout period of ∆t = 5 frames for the tube tracking (see Section 4.6.3) has been used throughout the experiments, which is a good compromise between the number of expected per tube measurements and the distance to the light barrier.

Approximately 1/4 of all measurements (rounded to the next integer value) are not considered for the total length computation with α_outlier = 0.25, to eliminate outliers in the single measurements as introduced in Section 4.6.4. The same value is used for the outlier filter in the teach-in step (see Section 4.7.4).

The teach-in of the calibration factor f_pix2mm (see Section 4.7.4) terminates if the root mean square error does not change by more than ε = 0.0001 between two iterations.

Since it is still very complex to test all permutations and assignments of the remaining parameters, one has to make compromises in the experimental design. Therefore, some of the parameters listed above have been adjusted before the experiments to meet the assumptions made in Section 4.2. This includes the guide bar distance as well as the illumination (fiber optical back light setup through the conveyor belt) and all camera parameters, i.e. lens, working distance, exposure time and F-number. For all experiments with 50mm tubes a 16mm focal length lens at a working distance of approximately 250mm is used. The shutter time has been adjusted to 1.024ms, which is a good compromise between light efficiency and motion blur effects.


    Category    Parameter
    Tube        Color
                Length
                Diameter
    Conveyor    Velocity
                Tube spacing
                Guide bar distance
    Camera      Lens
                Working distance
                Exposure time
                F-number
    Software    Profile peak threshold τ_peak (sensitivity)
                Number of templates (scale, orientation, curvature)
                Perspective correction
                Calibration factor

Table 5.1: Overview of the different test parameters.

    Parameter   Category             Description                 Value         Section
    N_scan      Profile Analysis     Number of scanlines         11            4.4.1
    K_smooth    Profile Analysis     Smoothing kernel size       19            4.4.2
    α_peak      Profile Analysis     Peak threshold factor       4.0           4.4.2
    W_ROI       Edge Detection       Local ROI width             15            4.4.3
    γ           Template Generation  Template height ratio       0.95          4.5.3
    R_ψ,right   Template Generation  Curvature range right       [−0.005, 0]   4.5.3
    R_ψ,left    Template Generation  Curvature range left        [0, 0.005]    4.5.3
    χ           Template Generation  Curvature resolution        0.0005        4.5.3
    b           Template Generation  Height weighting coeff.     3             4.5.3
    k           Template Generation  Number of rotations         3             4.5.3
    ∆t          Tube Tracking        Timeout period              5             4.6.3
    α_outlier   Total Length         Outlier factor              0.25          4.6.4
    ε           Teach-In             Allowed calibration error   0.0001        4.7.4

Table 5.2: Constant software parameter settings throughout the experiments.


In all experiments it is assumed that the system is calibrated correctly, the radial distortion coefficients are known and a teach-in step has been performed to learn f_pix2mm. In addition, the perspective correction function has been determined before each experiment to compensate for perspective distortions.

5.1.2. Evaluation Criteria

There are several criteria that can be used to compare and evaluate the results of different experiments. These can be classified into quantitative and qualitative criteria.

Quantitative Criteria

Total Detection Ratio
The system must exactly detect the number of tubes that pass the visual field of the camera. Formally, this can be expressed in the following score Ω_total:

    Ω_total = N_detected / N_total                                        (5.1)

where N_detected indicates the number of detected tubes and N_total the total number of tubes. Ω_total = 1 is a necessary but not sufficient criterion for a correctly working inspection system.

Per Tube Measurements
The average number of single measurements for each tube depends mainly on the velocity of the conveyor and the camera frame rate. If N tubes have been measured, the mean number of per tube measurements can be computed as:

    Ω_PTM = (1/N) · Σ_{i=1}^{N} m_i                                       (5.2)

where m_i is the number of single measurements of the ith tube.

False Positives / False Negatives
Each tube T can be classified into one of the three groups G_0 (good), G_− (too short), and G_+ (too long) if measured manually. G_0 is defined by the target length and the allowed tolerance for this length. It contains all tubes that meet the tolerance in the real world. G_− and G_+ include all tubes of a real world length that lie below the lower or above the upper tolerance threshold respectively.

In the same way, each tube can be categorized into one of the three groups G′_0, G′_−, or G′_+ based on the length measured by the visual inspection system. In the ideal case, these three groups are equal to the corresponding ground truth classifications, i.e. G′_0 = G_0, G′_− = G_−, and G′_+ = G_+.¹

¹ Theoretically, a fourth group U for unsure can be defined, including all tubes that could not be detected at all. These tubes have to be handled by different mechanisms, as will be discussed in later sections.

In practice, however, the measurements are biased by many factors like perspective errors, curved tubes, skew tube edges, noise, motion blur, or failures in measuring point detection. In addition, as will be introduced in Section 5.1.3, the manually acquired ground truth data also has a certain variance. Thus, the distributions measured by humans and by a machine vision system may differ. This becomes critical if the two distributions intersect.
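As a small illustration of this grouping, the following sketch assigns a length to one of the three groups. Function and label names are illustrative; the thesis does not prescribe an implementation.

```python
# Hedged sketch: classify a tube length into G0 (good), G- (too short)
# or G+ (too long), given the target length and its allowed tolerance.

def classify(length_mm: float, target_mm: float, tol_mm: float) -> str:
    if length_mm < target_mm - tol_mm:
        return "G-"
    if length_mm > target_mm + tol_mm:
        return "G+"
    return "G0"

# Example: 50mm target; 0.7mm is the deviation of the manipulated tubes
# used later in Section 5.3.6.
print(classify(49.9, 50.0, 0.7))   # G0
print(classify(51.2, 50.0, 0.7))   # G+
```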


Tubes that are actually too short or too long but are measured to be within the tolerance are denoted as false positives (FP). On the other hand, tubes of an allowed length can be wrongly classified as outliers and are denoted as false negatives (FN). More formally, false positives and false negatives can be defined as follows:

    FP = { T | T ∈ G′_0 ∧ T ∉ G_0 }                                       (5.3)
    FN = { T | T ∉ G′_0 ∧ T ∈ G_0 }                                       (5.4)

In terms of system evaluation, the following measures can be used:

    Ω_FP = N_FP / N_total                                                 (5.5)
    Ω_FN = N_FN / N_total                                                 (5.6)

where N_FP and N_FN indicate the number of false positives and false negatives respectively. Both the false positive ratio Ω_FP and the false negative ratio Ω_FN should be zero in the optimal case. As already discussed in the introduction, Ω_FP is more critical than Ω_FN, since sorting out a good tube is less harmful than delivering a defective one to the customer.
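A minimal sketch of these two ratios, assuming per-tube group labels as produced by a classifier like the one sketched above (names are illustrative, not thesis code):

```python
# Hedged sketch of Eqs. (5.5) and (5.6): count tubes that are accepted
# although bad (FP) or rejected although good (FN).

def fp_fn_ratios(gt_groups, measured_groups):
    n_total = len(gt_groups)
    n_fp = sum(1 for g, m in zip(gt_groups, measured_groups)
               if m == "G0" and g != "G0")    # Eq. (5.3)
    n_fn = sum(1 for g, m in zip(gt_groups, measured_groups)
               if m != "G0" and g == "G0")    # Eq. (5.4)
    return n_fp / n_total, n_fn / n_total     # Eqs. (5.5), (5.6)

gt       = ["G0", "G0", "G+", "G-", "G0"]
measured = ["G0", "G-", "G0", "G-", "G0"]
print(fp_fn_ratios(gt, measured))             # (0.2, 0.2)
```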


Performance
The performance of the system can be evaluated with respect to the average processing time that is needed to analyze a frame:

    Ω_TIME = (1/M) · Σ_{i=1}^{M} t_i                                      (5.7)

where M is the number of frames considered and t_i represents the processing time of frame i. Ω_TIME is expressed in ms/frame. This measure can be used to determine the maximum possible capture rate. Skipped frames indicate that the camera captures more frames than the system is able to process.

Qualitative Criteria

Standard Deviation Per Tube
The multi-image measuring approach is based on the idea that more robust measuring results can be reached if each tube is measured several times. In the ideal case, all measurements should yield the same length value. In practice, however, the single measurements can differ. The standard deviation σ_tube(i) can be used as an indicator of how much these measurements vary. It is computed as:

    σ_tube(i) = sqrt( (1/(m_i − 1)) · Σ_{j=1}^{m_i} ( l_j(i) − l̄(i) )² )   (5.8)

where l_j(i) indicates the length of the jth single measurement of tube i, l̄(i) the mean over all single measurements of this tube, and m_i is the total number of single measurements of tube i. σ_tube is expressed in pixels.

A large per tube standard deviation represents an uncertainty in the results. In this case, the mean describes the data only roughly. If the uncertainty is too large, it may be better to blow out the particular tube, since the probability of a false positive decision increases with the standard deviation.

Sequence Standard Deviation
The standard deviation of a sequence σ_seq is computed analogously to σ_tube, but not with respect to the single measurements of one tube; instead it is based on the computed total lengths l_total of N tubes:

    σ_seq = sqrt( (1/(N − 1)) · Σ_{i=1}^{N} ( l_total(i) − l̄_total )² )    (5.9)

where l̄_total is the mean over all total measurements. Finally, all measurements can be represented by a Gaussian distribution function G(x):

    G(x) = (1 / (σ_seq √(2π))) · exp( −(x − µ_seq)² / (2σ_seq²) )         (5.10)

where µ_seq = l̄_total. The production is most accurate if the distance between the given target length and the mean of this distribution is small.

Ground Truth Distance
The difference between the vision-based length measurement results and the manually acquired ground truth data can be seen as a relative error, assuming the ground truth data is correct. Of interest are the minimum and maximum ground truth distance (GTD) of a sequence of tubes, defined as:

    GTD_min = min { l_total(i) − l_gt(i) | 1 ≤ i ≤ N }                    (5.11)
    GTD_max = max { l_total(i) − l_gt(i) | 1 ≤ i ≤ N }                    (5.12)

where l_total(i) is the computed total length of tube i, l_gt(i) the corresponding ground truth length, and N the number of tubes considered. If the mean ground truth distance GTD̄ is approximately zero, the deviation is distributed equally. Otherwise, if GTD̄ > 0, the measured length is predominantly larger than the ground truth measurement; accordingly, if GTD̄ < 0, the opposite holds. In both cases, the systematic error indicates that the system is probably not calibrated correctly.

Root Mean Square Error (RMSE)
The root mean square error measure is used to compare the measurements of the visual inspection system to manually acquired ground truth data over a sequence as follows:

    RMSE = sqrt( (1/N) · Σ_{i=1}^{N} ( l_total(i) − l_gt(i) )² )          (5.13)

with l_total(i), l_gt(i) and N as defined before. A small root mean square error indicates the measurements are close to the ground truth data.
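The qualitative criteria above translate directly into a few lines of code. The following pure-Python sketch (illustrative names, not the thesis implementation) computes σ_tube, σ_seq, the ground truth distances and the RMSE:

```python
# Sketch of the qualitative criteria (Eqs. 5.8-5.13): per tube standard
# deviation, sequence statistics, ground truth distance and RMSE.
import math

def std(values):
    """Sample standard deviation with an (n - 1) denominator."""
    mean = sum(values) / len(values)
    return math.sqrt(sum((v - mean) ** 2 for v in values) / (len(values) - 1))

def sequence_stats(l_total, l_gt):
    """l_total: computed total lengths; l_gt: ground truth lengths."""
    diffs = [lt - lg for lt, lg in zip(l_total, l_gt)]
    return {
        "sigma_seq": std(l_total),                                  # Eq. (5.9)
        "gtd_min": min(diffs), "gtd_max": max(diffs),               # Eqs. (5.11), (5.12)
        "gtd_mean": sum(diffs) / len(diffs),
        "rmse": math.sqrt(sum(d * d for d in diffs) / len(diffs)),  # Eq. (5.13)
    }

# Per tube: single measurements in pixels -> sigma_tube (Eq. 5.8)
print(std([415.2, 415.5, 415.1, 415.4]))
# Per sequence: total lengths vs. ground truth in mm
print(sequence_stats([49.95, 50.02, 49.88], [49.97, 50.00, 49.95]))
```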


Figure 5.1: Measuring slide used for acquiring ground truth measurements by hand.

5.1.3. Ground Truth Measurements

The acquisition of ground truth data is important for evaluating the vision-based inspection system with respect to human measurements. For this purpose a special digital measuring slide, as can be seen in Figure 5.1, has been used. The precision of this device is up to 1/100mm.

However, there is a significant deviation in human measurements, since heat shrink tubes are flexible. Depending on the force the human operator applies to the measuring slide, the measured length gets smaller or larger. This variation has been investigated empirically.

12 sample tubes of different diameter (6, 8 and 12mm) are selected as test set (see Table 5.3). One half of the samples are black, the other half transparent tubes. For each combination of color and diameter, one tube has a length of approximately 50mm and one was manipulated, i.e. slightly longer or shorter than the tolerance allows.

No.   Color         Diameter [mm]   Mean length [mm]
---   -----------   -------------   ----------------
1     Transparent   8               49.95
2     Transparent   6               49.77
3     Transparent   12              49.82
4     Transparent   8               48.19
5     Transparent   6               51.33
6     Transparent   12              51.88
7     Black         8               50.98
8     Black         6               50.19
9     Black         12              50.00
10    Black         6               50.84
11    Black         8               49.66
12    Black         12              51.56

Table 5.3: Test set used to determine the human variance in measuring.

The results are shown in Figure 5.2. In a first experiment, the variance of a single person is investigated, denoted as intra human variance. Each tube in the test set has been measured 10 times by the same person with the goal to be as precise as possible.


Figure 5.2: Intra and inter human variance for the test set in Table 5.3 under ideal laboratory conditions. The error bars indicate the maximum and minimum length for each of the 12 tubes as well as the mean value of the measurements, once for one person (a) and once for 10 persons (b). The average inter human variance is slightly larger compared to the intra human variance.

The error bars indicate the maximum and minimum length as well as the mean value of all measurements. The computed mean standard deviation is 0.078mm.

In a second experiment, the inter human variance is determined. For this, 10 persons have been asked to measure the same test set, again as precisely as possible. The inter human variance is slightly larger than the intra human variance (see Figure 5.2(b)). In this case, the mean standard deviation was observed to be 0.083mm.

Furthermore, it is important to state that the manual measurements for the ground truth data have been acquired very carefully, with elevated concentration, under laboratory conditions and with the aim to be as precise as possible using the digital measuring slide (see Figure 5.1). Fewer than 5 tubes can be measured within one minute at this precision. At production, the sample measurements are performed with a standard sliding caliper and at a much higher rate. There is definitely a tradeoff between accuracy and speed. The expected individual measuring error at production is much larger. Furthermore, factors like tiredness or distraction can significantly increase the inter and intra human measuring variance.

The accuracy and precision of the visual inspection system, however, should be evaluated with respect to the maximum possible accuracy humans can reach with the given measuring slide under ideal conditions. Throughout this thesis, manual ground truth measurements always refer to these ideal laboratory condition measurements. One has to keep in mind that there is still a certain uncertainty in these measurements. The real absolute length of a tube cannot be determined exactly.

For the following experiments, all tubes have been measured three times to reduce the influence of the human variance. The mean of the three measurements is taken as ground truth reference. All measurements are stored in a database and each measured tube is labeled by hand with a four digit ID using a white touch-up pen.


Figure 5.3: At velocities > 30m/min, larger sequences of tubes with a small spacing have to be placed on the conveyor using a special supply tube.

5.1.4. Strategies

Online vs Offline Inspection
There are two main strategies for evaluating the inspection system. The first strategy analyzes the tubes online, i.e. in real-time on the conveyor. This includes the tube localization, tracking, measuring as well as the good/bad classification. The results are stored in a file and can be further processed or visualized afterward. This is closely related to the application at production. The drawback of this approach is that if some interesting or strange behavior is observed in the resulting data, it is difficult to localize its origin.

Therefore, the second evaluation strategy is based on an offline inspection. This means a sequence of tubes is first captured into a file at the maximum frame rate that can be processed online. Then, the sequence can be analyzed repeatedly with different sets of parameters or methods. This is a significant advantage if one wants to compare different techniques or parameter settings.

In the following experiments, both strategies will be applied.

Tube Placement
The prototype setup in the laboratory has one significant drawback. The tubes to be inspected have to be added to the conveyor manually, since there is no second conveyor from which the tubes fall onto it continuously like in production. The size of the conveyor allows for about 21 tubes of 50mm length with a spacing of 10mm in between. If all tubes are placed on the inactive conveyor, it takes some time until the desired velocity is reached. Therefore, at faster velocities, the first tubes pass the measuring area with a slower velocity, leading to unequal conditions between measurements.

Hence, either fewer tubes have to be placed on the conveyor (starting further away from the measuring area) or the tubes have to be placed onto the conveyor while it is running at the desired velocity. The latter is hardly possible for a human without producing large spacings between two consecutive tubes. Instead, a supply tube of about 1.30m length, with a diameter slightly larger than the current tube diameter, can be used as a magazine for about 25 tubes of 50mm length (see Figure 5.3). The supply tube is placed at a steep angle at the front of the conveyor (in moving direction). If the conveyor is not moving, the tubes are blocked and cannot leave the supply tube. On the other hand, if the conveyor is moving, the bottom tube is gripped by the belt and can leave the supply tube through a bevel opening in moving direction. If the velocity of the conveyor is fast enough, the time until the next tube in the supply tube is gripped by the belt is sufficient to produce a spacing.


Experiments have shown that the supply tube works only for velocities > 30m/min. Otherwise it is possible that two consecutive tubes are not separated.

Thus, there are two mutually exclusive methods to load the conveyor with tubes: one works well for lower velocities, the other for faster ones. In both cases the maximum number of tubes is limited. Therefore, larger experiments have to be partitioned over several sequences.

Test Data
Since it is not worthwhile to manually measure thousands of tubes as ground truth reference, the number of tubes that can be compared to such reference lengths is limited. However, it is possible to increase the number of ground truth comparisons by repeating the automated visual measurement of a manually measured tube. For example, one can manually measure 20 tubes of each particular type (the number that can be placed onto the conveyor or into the supply tube at one time) and repeat the automated inspection several times. From the algorithmic perspective, the system is confronted with a new situation every time, regardless of whether there are 100 different tubes to be inspected or 5 × 20.

In the following, a distinction is made between tubes of a length that meets the given target length within the allowed tolerance and tubes of manipulated length falling outside this tolerance. The system must be able to separate the manipulated tubes from the proper ones.

5.2. Test Scenarios

Eight test scenarios have been developed to evaluate the system. In each scenario only one parameter is varied, while the others are kept constant. The different scenarios are introduced in the following.

Noise
Before the system is tested on real data, the accuracy and precision of the measuring approach is evaluated on synthetic images. A rectangle of known pixel size simulates the projection of an ideal tube that is not deformed by perspective. The 'tube edges' as well as the measuring points are detected with subpixel precision, as in real images. The resulting length in pixels must equal the rectangle width. To evaluate the accuracy under the presence of noise, Gaussian noise of different standard deviation is added systematically to the sequences.

Minimum Tube Spacing
In this scenario the minimum spacing between tubes is investigated, both for black and transparent tubes, on real images. The test objects have a length of about 50mm within the allowed tolerance and a diameter of 8mm. The velocity of the belt is 30m/min. Starting with sequences that allow for only one tube in the visual field, i.e. the spacing is larger than the tube length, the spacing is decreased until the detection rate Ω_total falls below 1, i.e. at least one tube could not be detected.

Conveyor Velocity
The goal in this scenario is to investigate how accuracy and precision of the measurements depend on the velocity of the conveyor. The focus is on four different velocities: slow (10m/min), medium (20m/min), fast (30m/min), and very fast (40m/min). The latter is the maximum velocity that can be reached at production. Currently, the production line runs at approximately 20m/min. To test the limits of the system, even higher velocities of up to 55m/min are tested. For all velocities > 30m/min, the tubes have to be placed onto the conveyor using the supply tube.


Again, the inspected tubes are about 50mm in length within the allowed tolerance and have a diameter of 8mm, both for black and transparent tubes. The spacing between the tubes must be large enough following the results of the minimum tube spacing experiments. In this scenario, all evaluation criteria introduced in Section 5.1.2 are considered, including a comparison to ground truth measurements. The evaluation is performed offline.

Tube Diameter
If the distance between camera and conveyor belt does not change, the diameter of a tube influences the distance between the measuring plane Π_M (see Section 4.2) and the image plane. Tubes with a smaller diameter are further away and appear smaller in the image, while tubes with a larger diameter are magnified in the image. Thus, the calibration factor that relates a pixel length to a real world length in mm has to be adapted.

The test data includes transparent and black tubes with a diameter of 6, 8 and 12mm and a length of 50mm that meet the allowed tolerances. The conveyor velocity is constant at 30m/min. Again, all evaluation criteria are considered and the evaluation is performed offline.

Repeatability
In this scenario, a tube of known size is measured many times in a row at a constant velocity of 30m/min. Theoretically, the system should measure the same length each time, since one can assume the length of the tube does not change throughout the experiments. As mentioned before, there are several parameters that can influence the repeatability in practice, like a varying background.

In the same experiment one can determine not only the repeatability, i.e. the precision of the system, but also the accuracy, if one uses not a heat shrink tube but an ideal tube gage. Such a gage can be made from metal with much higher precision, overcoming the human variance in measuring deformable heat shrink tubes. For comparable results, the gage should have the same shape and dimensions as a heat shrink tube. Since it does not transmit light, a metallic gage can simulate black tubes only.

The real world length of the gage is known very accurately and precisely. Thus, the RMSE of the measuring results becomes almost independent of errors in the ground truth data. The measurements are best performed online, i.e. in real-time, due to the amount of accumulating data. The resulting lengths are stored in a file for later evaluation.

Outlier Detection
Until now, all experiments are based on test data that is known to meet the given tolerances. In this scenario, tubes of approximately 50mm length are mixed with tubes that are too long or too short, i.e. differ from the target length by more than 0.7mm. The position and the number of the outliers in a sequence are known. The system must be able to detect the outliers correctly. Thus, the false positive and false negative rates are the main criteria of interest in this scenario. The evaluation can be performed both offline and online.

Tube Length
As mentioned before, the focus in this thesis is set on tubes of 50mm length. In addition, it is shown that the system is also able to measure tubes of different lengths, exemplified by tubes of 30 and 70mm length.


The tolerances for these lengths differ, i.e. the 30mm tubes are allowed to deviate only up to 0.5mm around the target length, while 70mm tubes have a larger tolerance of 1mm. The measuring precision can be directly linked to these tolerances. Accordingly, the system must measure smaller tubes with a higher precision than larger ones.

In this scenario, the accuracy and precision are evaluated based on the mean and standard deviation of a sequence of tubes, measured online, that approximately meet the given target length. Corresponding ground truth data is available.

Performance
Finally, it is of interest to determine the performance of the system in terms of the average per frame processing time Ω_TIME. It is investigated how the total processing time is distributed over the different stages of the inspection, including radial distortion compensation, profile analysis, edge detection and template matching, as well as the total length computation and tracking.

5.3. Experimental Results

In this section the experimental results of the different scenarios are presented and discussed. Further discussion as well as an outlook on future work is given in Section 5.4.

5.3.1. Noise

The influence of noise on the measuring accuracy is tested on synthetic sequences. Rectangles of 200 pixels width are placed on a uniform background with a contrast of 70 gray levels between the object and the brighter background. The image size is 780 × 160, and the sequence is analyzed like a real sequence, with two differences. First, the perspective correction function is disabled, since the synthetic 'tube' is not influenced by perspective, i.e. the width of the rectangle is constant independent of the image position. Furthermore, the dynamic selection of template curvatures based on the image position does not work in this scenario either, since the model knowledge assumptions do not hold. Thus, in this experiment all templates are tested at each position (computation time is not critical here).

Gaussian noise of standard deviation σ_N has been added to the ideal images, with σ_N ∈ {5, 10, 25}. Sample images of each noise level are shown in Figure 5.4(a)-(d).
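The synthetic sequences can be reproduced along the following lines. This is an assumed NumPy re-implementation of the image generation only, not the thesis code; the background and object gray levels (160 and 90, giving the stated contrast of 70) are illustrative choices.

```python
# Sketch of the synthetic noise experiment: a dark 200-pixel-wide rectangle
# on a brighter 780x160 background simulates an ideal tube, then Gaussian
# noise of standard deviation sigma_n is added.
import numpy as np

def make_noisy_test_image(sigma_n: float, seed: int = 0) -> np.ndarray:
    rng = np.random.default_rng(seed)
    img = np.full((160, 780), 160.0)      # background gray level (assumed)
    img[40:120, 290:490] = 90.0           # 200 px wide 'tube', contrast 70
    img += rng.normal(0.0, sigma_n, img.shape)
    return np.clip(img, 0, 255).astype(np.uint8)

for sigma in (0, 5, 10, 25):              # the noise levels of Figure 5.4
    noisy = make_noisy_test_image(sigma)
    print(sigma, float(noisy.std()))
```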


The measuring results are evaluated using the root mean square error between the ground truth length of 200 pixels and the result of the single measurements. The results show that in the ideal (noise free) case, the pixel length is always measured correctly. Under the presence of noise, the measured length varies at subpixel level. Figure 5.4(e) shows how the measurements differ in accuracy and precision under the presence of noise. The maximum deviation from the target length occurs at the largest standard deviation (σ_N = 25). The RMSE results can be found in Figure 5.4(f). For sequences with only a small amount of noise (σ_N = 5), the RMSE is acceptably low at 0.122 pixels. If one pixel represents 0.12mm in the measuring plane, the real world error is about 1/100mm. Even under strong noise (σ_N = 25), which is far beyond the noise level of real images, the measuring error is 0.252 pixels, or 0.03mm in the example. This is still significantly below the human measuring variance.

σ_N   RMSE [pixels]
---   -------------
0     0
5     0.122
10    0.158
25    0.252

Figure 5.4: Accuracy evaluation of length measurements on synthetic sequences under the influence of noise. (a)-(d) Rectangles of known size (length = 200 pixels) simulate a tube on a uniform background without perspective effects. Gaussian noise of different standard deviation σ_N ∈ {5, 10, 25} has been added to the ideal images. (e) Gaussian distribution of the measurements. (f) Root mean square error (RMSE) for each noise level, as tabulated above.


Figure 5.5: Detection rate of black and transparent tubes depending on the spacing between consecutive tubes.

Thus, one can conclude that the system is able to detect the synthetic tube edges very accurately, even under the presence of noise, if there is sufficient contrast between background and foreground.

5.3.2. Minimum Tube Spacing

10 black and 10 transparent tubes are used to investigate the influence of the spacing on the detection rate. The tubes have been placed on the conveyor at an approximately constant spacing. Five gap sizes are tested: 60, 30, 20, 10, and 5mm. Each load of tubes passes the measuring area five times for each gap size at a conveyor velocity of 30m/min. In this experiment only the total detection rate Ω_total is considered, i.e. how many tubes are detected by the system at least once. The results are averaged over the 5 iterations.

As can be seen in Figure 5.5, the detection of black tubes is uncritical, indicated by Ω_total = 1, until the tube spacing is less than 10mm. This means no black tube can pass the measuring area without being measured if the spacing is ≥ 10mm. The decrease at 5mm gaps to Ω_total = 0.98 (i.e. 1 tube out of 50 is not detected) may be due to the fact that the manual tube placing cannot guarantee an exact spacing of 5mm. It is likely that the distance between two tubes has become even smaller, leading to the failure. Since the tests have been performed online, it is not possible to locate the origin of the outlier. Therefore, it has been investigated how small the gap between two black tubes may become before the profile analysis fails to locate the tube. The results are shown in Figure 5.6. Even a spacing of about 2mm, as in (a), is large enough to reliably detect the background regions between the tubes, as can also be seen in the corresponding profile analysis results in (c). A gap of about 1mm, however, is too small even for black tubes. Due to perspective, the points closer to the camera merge (see Figure 5.6(b) and (d)).

The transparent tubes show a detection rate of < 1 even for the largest tested gap size of 60mm. This can be explained by the much lower contrast to the background. If the system must be able to overcome a strongly non-uniform background brightness, one has to make a larger compromise in terms of detection sensitivity. As it turns out, there is no parameter setting that can guarantee that all tubes are detected independently of the gap size.


Figure 5.6: Minimum tube spacing for black tubes. (a) A spacing of about 2mm is still sufficient to locate the measurable tube correctly. (b) The detection fails if the two tubes appear to touch under perspective, as on the left side. (c) Profile analysis of (a). (d) Profile analysis of (b).

However, the results have shown that the detection rate decreases drastically below 10mm (see Figure 5.5). As a result of these experiments, the minimum spacing used in the following experiments is 10mm for black tubes and 20mm for transparent tubes.

5.3.3. Conveyor Velocity

The test data in this scenario includes 17 transparent and 21 black tubes of 50mm length and 8mm diameter. Manual ground truth measurements of these tubes are available. The number of tubes of each color is geared to the number of tubes that can be placed on the conveyor with a sufficient spacing. To increase the probability of a 100% detection rate, the spacing between two transparent tubes has to be larger than for black tubes. Each charge of tubes is measured 5-6 times at each velocity of 10, 20, 30, and 40m/min to yield a total number of > 100 measurements (based on even more single measurements) in each experiment. Thus, all tubes have to pass the measuring area many times.

Before presenting the results in detail, Figure 5.7 shows an example of how the system has measured (a) the charge of black tubes and (b) the charge of transparent tubes at 20m/min. Both the single measurements per tube (indicated by the crosses) as well as the computed total length and the corresponding ground truth length are visualized. The lengths measured by the system are quite close to the ground truth data. These results are just an example to show what kind of data is evaluated in the following. Since it is not possible to visualize longer sequences in as much detail as in Figure 5.7 due to the amount of data, more comprehensive representations based on the proposed evaluation criteria will be used.


Figure 5.7: Measuring results at 20m/min for (a) 21 black tubes and (b) 17 transparent tubes. The red crosses indicate single measurements, while the dashed vertical lines represent the boundaries between measurements belonging to the same tube. The averaged total length as well as the corresponding ground truth length are also shown in the plots. All measured tubes of this sequence meet the tolerances. However, while the transparent tubes have approximately the target length of 50mm on average, the mean of the black tubes is slightly shifted, i.e. all tubes tend to be shorter than the target length.


v [m/min]   Ω_total   Ω_PTM   σ_tube   GTD_min   GTD_max   GTD̄     RMSE
---------   -------   -----   ------   -------   -------   -----   ----
10          1         11.4    0.05     -0.12     0.14      0.01    0.07
20          1         6.9     0.04     -0.16     0.11      -0.02   0.07
30          1         4.6     0.05     -0.19     0.19      0.0     0.07
40          1         3.2     0.07     -0.21     0.17      -0.01   0.09
55          1         2.3     0.07     -0.16     0.16      0.01    0.08

Table 5.4: Evaluation results at different conveyor velocities v for black tubes (50mm length, ∅8mm). The accuracy of the measurements does not decrease significantly with faster velocities, nor with a decreasing number of per tube measurements Ω_PTM, as indicated by the RMSE. σ_tube is the per tube standard deviation and GTD stands for ground truth distance (see Section 5.1.2).

Black Tubes
The results of the velocity experiments with black tubes are summarized in Table 5.4.

The black tubes show a detection rate Ω_total of 1 for all velocities, i.e. no tube has passed the measuring area without being measured, independent of how fast the tubes are moved. The average number of per tube measurements Ω_PTM decreases from 11.4 at the slowest velocity (10m/min) to 3.2 at the maximum possible production velocity. Even at 55m/min each tube is measured at least twice. The average standard deviation σ_tube of the measurements per tube ranges from 0.04 to 0.07mm; again, there is only a very slight rise from the slower to the faster velocities. The absolute ground truth distance does not exceed 0.21mm, and measurements that are shorter or longer than the ground truth are equally distributed, as indicated by the mean ground truth distance GTD̄ being approximately zero.

As an example, the ground truth distance at 30m/min is shown in Figure 5.8(a). If the distance is larger than 0, the manually measured length is shorter than the vision-based measurement, and vice versa. Due to the variance in the ground truth data it is not very likely that the distance is zero for all values. However, the distance should be as small as possible. If the ground truth distance is one-sided, i.e. all measurements of the system are longer or shorter than the corresponding ground truth measurements, this indicates an imprecise calibration factor. The conversion of the pixel length into a real world length then results in a systematic error which has to be compensated by adapting the calibration factor.

The RMSE differs only marginally between the tested velocities. The largest RMSE is obtained at 40m/min with 0.09. This value is only slightly larger than the deviation of human measurements. For lower velocities it is even better with 0.07. Another indicator of how well the vision-based measurements converge to the ground truth data is the Gaussian distribution over the sequence of all measurements. This distribution is based on the mean µ_seq and standard deviation σ_seq (see Section 5.1.2). Figure 5.8(b) compares the vision-based distribution (solid line) at 30m/min and the corresponding ground truth distribution (dashed line). The mean is 49.66 in both cases. σ_seq is slightly larger with 0.1193, compared to the ground truth with 0.1027.

In terms of accuracy and precision this means the vision-based measurements of black tubes are equally accurate compared to human measurements (under laboratory conditions) and only marginally less precise.


Figure 5.8: (a) Ground truth distance GTD in mm for black tubes (50mm length, ∅8mm) at 30m/min. (b) Gaussian distribution of all measurements compared to the ground truth distribution.

v [m/min]   Ω_total   Ω_PTM   σ_tube   GTD_min   GTD_max   GTD̄    RMSE
---------   -------   -----   ------   -------   -------   ----   ----
10          0.99      9.6     0.06     -0.14     0.32      0.09   0.13
20          0.98      5.2     0.09     -0.16     0.29      0.08   0.11
30          1         3.9     0.15     -0.16     0.66      0.15   0.20
40          0.97      2.4     0.18     -0.27     0.75      0.23   0.28

Table 5.5: Evaluation results at different conveyor velocities v for transparent tubes (50mm length, ∅8mm). The accuracy seems to decrease with faster velocities, as can be seen from the RMSE and the mean per tube standard deviation σ_tube. The number of per tube measurements Ω_PTM is smaller for transparent tubes. Due to the lower contrast it is more likely that a tube is not detected as measurable.

Furthermore, as an additional benefit, it is possible to show that a sequence of tubes is systematically shorter than the target length (although still within the tolerances). This information could be used to adjust the cutting machine until µ_seq approximates the given target length.

Transparent Tubes
The same experiments have been repeated with transparent tubes. The results are summarized in Table 5.5.

The detection rate Ω_total tends to decrease with increasing velocity, although all tubes have been detected at 30m/min in this experiment. 3% of the tubes have passed the visual field of the camera without being measured at 40m/min.

Due to the poorer contrast of transparent tubes, the probability increases that a tube cannot be located in the profile analysis step. This can be seen in the average number of per tube measurements Ω_PTM. While black tubes are measured about 11.4 times at v = 10m/min, the transparent tubes reach only 9.6 measurements per tube at the same velocity. At 40m/min this number decreases to 2.4. At faster velocities, e.g. 55m/min, the number of per tube measurements falls short of 1. Reliable measurements of transparent tubes are not possible at this velocity so far; they are therefore not considered in Table 5.5.


Figure 5.9: (a) Ground truth distance GTD in mm for transparent tubes (50mm length, ∅8mm) at 30m/min. The measurements marked by a '+' all belong to the same tube, which reached the maximum GTD at measurement 68. As can be seen, it is not systematically measured wrongly. A poor contrast region on the conveyor belt is rather the origin of the strong deviations from the ground truth. (b) Gaussian distribution of all measurements compared to the ground truth distribution.

The standard deviation σ_tube of transparent tubes moved at 40m/min is three times larger than at 10m/min. This can be explained by the smaller number of per tube measurements. The ground truth distance also increases with the velocity. Especially GTD̄ gets conspicuously larger, i.e. the measured lengths are larger than the ground truth length on average. This trend can also be observed in the absolute values of GTD_max and GTD_min. At a velocity of 40m/min the maximum ground truth distance is 0.75mm, which is more than the allowed tolerance. In this context one has to keep in mind that these values are only the extrema and do not describe the average distribution. This makes the ground truth distance measure very sensitive to outliers. However, a large GTD value does not automatically imply poor accuracy. On the other hand, if the ground truth distance is low in the extrema, as with the black tubes in this experiment, this is an additional indicator of high accuracy. The ground truth distance of the transparent tubes at 30m/min is shown in Figure 5.9(a). The deviations are significantly larger compared to Figure 5.8(a).

Instead of being approximately equally distributed as for the black tubes, the error of transparent tubes seems to increase and decrease randomly, but always over a range of consecutive measurements. This observation can be explained by the varying background intensity under back light through the conveyor belt. The periodic intensity changes influence the transparent tubes much more strongly than the black tubes, since the detection quality depends mostly on the image contrast. Figure 5.10 shows how the mean image intensity of a moving empty conveyor belt changes over time. If a tube is measured at a part of the conveyor belt that yields a poor contrast under back light, the GTD is likely to increase. Keeping in mind that each tube passes the measuring area 6 times in this experiment, the probability is small that it is always measured at the same position on the conveyor. The tube measured with the maximum GTD has been marked in the plot, as well as all other measurements belonging to this particular tube.


Figure 5.10: Mean image intensity of a moving empty conveyor belt over time. The deviation between the brightest and the darkest region on the conveyor exceeds 40 gray levels and originates in the non-uniform translucency characteristics of the belt. Example images showing this non-uniformity can be found in Figure 3.4.

It turns out that the average ground truth distance of this tube is 0.3mm, which is still larger than the RMSE of the whole sequence due to the outliers. However, this shows that the tube is not measured wrongly in general. Furthermore, one can see that all neighboring tubes lying in the same region on the conveyor are also measured inaccurately. It is assumed that such deviations could be avoided with a more uniform conveyor belt.
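Such a belt-uniformity check is simple to reproduce: track the mean gray level of the empty, running belt over time, as in Figure 5.10. The sketch below substitutes small synthetic frames for the camera feed; the drift amplitude merely mimics the >40 gray level variation observed on the real belt, and all names are illustrative.

```python
# Sketch of the belt-uniformity check behind Figure 5.10. Frame acquisition
# is abstracted away; `frames` would normally come from the camera driver.
import numpy as np

def belt_brightness_profile(frames):
    """Mean gray level per frame of the empty conveyor belt."""
    return [float(f.mean()) for f in frames]

# Synthetic stand-in: 500 small 'frames' whose brightness drifts
# periodically, mimicking the non-uniform translucency of the belt.
rng = np.random.default_rng(0)
frames = [np.clip(130 + 20 * np.sin(i / 30.0)
                  + rng.normal(0, 2, (60, 80)), 0, 255)
          for i in range(500)]
means = belt_brightness_profile(frames)
print(max(means) - min(means))   # large spread -> non-uniform belt
```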


The mean over all measurements is 50.04 at 30m/min, compared to 49.96 in the ground truth. This is still very accurate. The precision of the vision-based measurements is 0.15, compared to 0.09 for human measurements under ideal laboratory conditions. The corresponding Gaussian distributions are plotted in Figure 5.9(b).

Finally, the RMSE increases with faster velocities, and the total error is larger compared to black tubes. The lowest error was measured at 20m/min (approximately the current production velocity) with 0.11. This error is still only slightly larger than the human variance.

One can conclude that the results for the black tubes are very accurate both for slow and fast conveyor velocities. The RMSE falls even below the standard deviation of human measurements. The accuracy of transparent tubes decreases with faster velocities, but is still in a range that allows for measurements within the given tolerance specifications. Best results have been achieved at a velocity of 20m/min. As it turns out, all tubes meeting the tolerances in the real world (based on manual ground truth data) have also been reliably measured to be within the tolerances by the system, i.e. Ω_FN = 0. Thus, no tube would have been blown out wrongly at any velocity.

5.3.4. Tube Diameter

Beside tubes of 8mm diameter as investigated in the velocity experiments, there are also 6 and 12mm diameter tubes to be considered in the DERAY-SPLICEMELT series. Therefore, the test data in this scenario includes transparent and black tubes of 50mm length with these diameters. The velocity is constant at 30m/min. Again, more than 100 tubes are measured for each combination of color and diameter. The summarized evaluation results can be found in Table 5.6.

Diameter   Ω_total   Ω_PTM   σ_tube   GTD_min   GTD_max   GTD̄     RMSE
--------   -------   -----   ------   -------   -------   -----   ----
6mm (B)    1         4.8     0.05     -0.40     0.29      -0.13   0.18
8mm (B)    1         4.6     0.05     -0.19     0.19      0.0     0.07
12mm (B)   1         4.6     0.07     -0.44     0.31      -0.11   0.19
6mm (T)    0.92      2.8     0.18     -1.15     0.87      0.01    0.20
8mm (T)    1         3.9     0.15     -0.16     0.66      0.15    0.20
12mm (T)   0.98      3.12    0.24     -0.69     0.67      0.07    0.20

Table 5.6: Measuring results of 50mm length tubes with different diameters at a velocity of 30m/min. The first three rows show black tubes (B), the last three transparent ones (T).

Figure 5.11: The thin 6mm tubes are likely to be bent. The distance between the defined measuring points in the image does not represent the length of the straight tube correctly.

Black Tubes
As for 8mm tubes, 100% of the black tubes both of 6 and 12mm diameter are measured by the system, indicated by a score of Ω_total = 1. The number of per tube measurements is also approximately equal, with 4.8 for 6mm diameter tubes and 4.6 for 12mm tubes. The per tube standard deviation σ_tube is slightly larger for 12mm with 0.07, compared to 0.05 for 6 and 8mm tubes. One significant difference to 8mm tubes is the larger extrema in the ground truth distances GTD_min and GTD_max and the definite shift in the mean ground truth distance GTD̄. Values of -0.13 for 6mm and -0.11 for 12mm indicate the vision-based lengths are mostly shorter than the manual measurements.

This has basically two different origins: Tubes with a diameter of 6mm are bent much more strongly than tubes of larger diameters, as can be seen for example in Figure 5.11. In this case, both manual as well as vision-based measurements are difficult. The length of a tube in the image is defined as the distance between the left and right end of the tube at the outermost points of the corresponding edges. If the tube is bent, however, the distance between the measuring points is obviously smaller than the real length. This can be seen in the ground truth distance as well as in the resulting RMSE, which is significantly larger with 0.18 compared to 8mm tubes. Figure 5.12(a) visualizes the results of a sequence of 21 black tubes at 30m/min. The bent tube in Figure 5.11 corresponds to the 10th tube in this plot (located between measurement numbers 45 and 50) and is measured significantly shorter than the ground truth.


Figure 5.12: Length measurement results of black tubes with different diameters at 30m/min. The plots show only a section of the total number of measured tubes. Although the RMSE is larger both for 6 and 12mm tubes compared to the 8mm results, the measurements are still accurate enough to correctly detect all tubes within the allowed tolerances.

The total results of the experiment with 6mm diameter black tubes are shown in terms of the ground truth distance in Figure 5.13(a). Only a few tubes are measured too long, while most measurements are shorter than the ground truth, depending on how much a tube is bent, i.e. how much it differs from the assumed straight tube model. However, all 100 tubes are measured correctly to lie within the allowed tolerances, leading to a false negative rate of Ω_FN = 0 (Ω_FP = 0 is implicit, since there are no outliers in the test data).

While bending is no problem for black tubes with a diameter of 12mm, these tubes have another drawback. The larger diameter makes the tubes more susceptible to deformations of the circular cross-section shape. This means only a little pressure is needed to deform the cross-section of a tube into an ellipse. These deformations occur if the tubes are stored, for example, in a bag or box where many tubes lie on top of each other. The tubes used as test set have been delivered in such a way. In addition, the effect is increased since most tubes are grabbed by hand several times, e.g. to measure the ground truth length or when experiments are repeated with the same tubes. Each manual handling is a potential source of deformation.

With respect to the vision-based measuring results, the elliptical cross-section of a tube leads to a significant problem. In the model assumptions, the measuring plane Π_M is defined at a certain distance above the conveyor belt. This distance is assumed to be exactly the outer diameter of an ideal circular tube (see Figure 5.14(a)). The magnification factor that relates a pixel length to a real world length is valid only in the measuring plane. With a weak-perspective camera model it is assumed that this factor is also valid within a certain range of depth around this plane.

For a deformed tube, the measuring points in the image p_L and p_R do not originate from points that lie in the measuring plane. If the cross-section is elliptical, it is most likely that the tube will automatically roll onto its largest contact area. In this case the points closest to the camera will be further away than the measuring plane. Under perspective, the resulting length in the image will be shorter. This is exactly what is observed in the experiments.
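The size of this effect can be estimated with a simple pinhole argument: a point lying ∆z behind the measuring plane appears scaled by z/(z + ∆z). The sketch below uses the approximate 250mm working distance of the experimental setup; the deformation of 1mm is an illustrative assumption, not a measured value.

```python
# Back-of-the-envelope sketch of the weak-perspective length error for a
# deformed tube whose measuring points lie delta_z below the measuring
# plane. Illustrative values, not calibration data from the thesis.

def apparent_length(true_length_mm, z_plane_mm, delta_z_mm):
    """Pinhole model: image length scales with 1/depth, so points that are
    delta_z farther from the camera appear shorter by z / (z + delta_z)."""
    return true_length_mm * z_plane_mm / (z_plane_mm + delta_z_mm)

# 50mm tube, measuring plane ~250mm from the camera, cross-section
# squashed by 1mm:
print(apparent_length(50.0, 250.0, 1.0))   # ~49.80 -> ~0.2mm too short
```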


Figure 5.13: Ground truth distance in mm of all measured black tubes with a diameter of 6 and 12mm at 30m/min.

Although it is less likely, it is also possible that a tube lies on the side with the smaller contact area. This happens if the tube is leaned against a guide bar, for example. The result is measuring points above the measuring plane, leading to a larger length in the image.

Figure 5.12(b) shows a section of 21 black tubes with a diameter of 12mm measured at 30m/min. The larger distance to the ground truth data is clearly visible. However, the system is again able to reliably detect all tubes correctly within the tolerances, without any false negatives (Ω_FN = 0). As an example of how the deformation of a tube influences the measuring results, images of the 7th and the 11th tube² of this sequence are shown in Figure 5.14(b) and (c) respectively. The vertical extension of tube No. 7 is noticeably smaller than for the neighboring tubes. This is an indicator that the tube is deformed and lies on the smaller side; thus, it is measured longer than it actually is. On the other hand, tube No. 11 is larger in the vertical extension, indicating it is lying on the larger contact area. The result is a much shorter length measured by the vision-based system, which can also be seen in Figure 5.13(b). As for the 6mm tubes, the measurements are mostly shorter compared to the ground truth, although the origin is different, as introduced above.

² Note: The tube number does not correspond to the (single) measurement number. The dashed lines indicate which measurements belong to the same tube.

These results show the accuracy limits of the weak-perspective model. If higher accuracy is needed, a telecentric lens could be used to overcome the perspective effects of different depths, or the height of a tube in the image could be exploited to adapt the calibration factor f_pix2mm dynamically.

Transparent Tubes
The experiments with different diameters have been repeated with transparent tubes. Only 92% of all transparent tubes with a diameter of 6mm are detected and measured by the system in this experiment. This is mainly due to the non-uniform translucency of the conveyor belt. Especially the thin 6mm tubes are very sensitive to changes in brightness, since they are more translucent than 8mm and 12mm tubes.


Figure 5.14: (a) Idealized cross-section of deformed tubes (frontal view). The measuring plane Π_M is defined based on an ideal circular tube (center). Deviations denoted as ∆_1 (left) and ∆_2 (right) influence the length measurement in the image projection. (b) Example of a deformed tube (No. 7 in Figure 5.12(b)) lying on the smaller side. The measuring points are closer to the camera and, due to perspective, the tube appears measurably larger in the image. (c) The opposite effect occurs if a deformed tube (No. 11 in Figure 5.12(b)) lies on the larger contact area.

At regions on the conveyor belt that transmit more light, the thin tubes almost disappear. Thus, one has to reduce the intensity of the light source. This is a tradeoff, because other regions that transmit less light get even darker, while the structure of the belt is emphasized. If the contrast is too low, the tube cannot be located in the profile. This problem could be prevented with a more homogeneously translucent conveyor belt.

The 12mm diameter tubes generally yield a better contrast, which can be seen in the detection rate of 98%. The number of per tube measurements Ω_PTM is 3.12, compared to 2.8 for 6mm tubes. However, the average standard deviation is larger for the 12mm tubes with 0.24. An RMSE of 0.2 for both 6 and 12mm transparent tubes indicates the measuring results are almost as accurate as for black tubes of the same diameter, although the extrema are significantly larger. As already mentioned, these values can be influenced by a few outliers. The values of GTD̄ show a much more uniform distribution of the deviations compared to black tubes. This is due to the fact that transparent tubes are more sensitive to strong background edges, which can wrongly be detected as tube edges. Figure 5.17 gives an example of how the system can fail, leading to a larger measured length. The poor contrast at the tube boundary cannot be compensated by the stronger responses at the tube edge ends. The maximum correlation score is reached at the background edge. This problem does not occur with black tubes due to the stronger contrast.

Thus, in addition to the problems described for black tubes of 6 and 12mm diameter, transparent tubes may be measured longer than they really are. Figure 5.15 visualizes the experimental results with different diameters of transparent tubes. Again, this is only a section of the total number of measurements, which are summarized more comprehensively in Figure 5.16 based on the ground truth distance. Compared to the experiments with black tubes, there have been false negatives among the transparent tubes, i.e. tubes have been wrongly classified as too long or too short.


Figure 5.15: Experimental results of transparent tubes (50mm length) with a diameter of 6 and 12mm at 30m/min. The plots show a section of the total number of tubes only.

Figure 5.16: Ground truth distance in mm of all measured transparent tubes with a diameter of 6 and 12mm at 30m/min.


Figure 5.17: The tube edge detection can fail if the contrast between tube and background is poor. (a) Zoomed region of an input image. (b) Edge response of this image within the local ROI around the assumed edge location. Only the ends of the tube edge yield a significant response, which is of little account compared to the edge response of the background. (c) The maximum correlation score between a template and the image within the local ROI (blue bounding box) is reached at the background edge, indicated by the red dots. The resulting measured length is obviously wrong.

For 6mm tubes the false negative rate is Ω_FN = 0.02 and for 12mm tubes Ω_FN = 0.01. This means 1-2 tubes out of a hundred would have been sorted out wrongly by the system.

5.3.5. Repeatability

A transparent tube of 50.0mm and a black tube of 49.7mm (manual ground truth length) have been measured 100 times each (based on several single measurements in each case) by the system at a constant velocity of 30m/min. The tubes have a diameter of 8mm. The measuring results of the black tube are shown in Figure 5.18(a) and the results of the transparent tube in Figure 5.18(c). The corresponding Gaussian distribution functions, based on the mean and standard deviation over all measurements, can be found in Figures 5.18(b) and (d) respectively. The narrower the distribution, the better the repeatability of the measurements.

The mean of the 100 measurements of the black tube is 49.66, which is very close to the ground truth length. The standard deviation of the black tube is 0.0614mm. Thus, the deviation between measurements of the same tube is less than 1/10 of the tolerance and significantly smaller than the deviation between human measurements.

The measuring results of the transparent tube show a mean of 49.99 and a standard deviation of 0.051. Given the results of the previous experiments, one could have expected the deviation of a transparent tube to be larger than for a black tube. In this experiment the transparent tube has been detected correctly 100 times in a row (as has the black tube). The only difference between the two tubes is the shape of the cross-section. Both tubes are not ideally circular, but the material of the black tubes is slightly softer, i.e. more susceptible to deformations than that of transparent tubes. In this experiment each tube is manually put onto the conveyor belt 100 times. Thus, even if the operator tries to grab the tubes as carefully as possible, deformations cannot be prevented for either tube type, leading to the observed deviations in the measurements. Obviously, the total deviation is also influenced by other parameters such as the tube orientation within the guide bars and the limits of the discrete input image (although subpixel techniques are applied).


Figure 5.18: Repeatability of the measurement of one tube. (a) 100 measurements of one black tube with a ground truth length of 49.7mm. (b) Corresponding Gaussian distribution of all measurements in (a) with µ = 49.66 and σ = 0.0614. (c) 100 measurements of one transparent tube with a ground truth length of 50.0mm. (d) Corresponding Gaussian distribution of all measurements in (c) with µ = 49.99 and σ = 0.051. The belt velocity is 30m/min in both experiments.


Figure 5.19: Repeatability results of a metallic cylinder simulating a tube of 49.99mm ground truth length. (a) 100 measurements of the gage at 30m/min. (b) Gaussian distribution of the results with µ = 49.94 and σ = 0.033.

This experiment shows how accurately the vision-based system is able to measure even transparent tubes if the tube edge detection is successful.

The experiment has been repeated with a metallic cylinder of 49.99mm length simulating an ideal tube (gage). The cross-section of this gage is circular and cannot be deformed manually. The results of this experiment are shown in Figures 5.19(a) and (b). The mean over all 100 measurements is 49.94 with a standard deviation of 0.0331. This deviation is close to the error estimated in Section 4.2.6 with respect to the maximum possible tube orientation within the guide bars.

One can conclude that, as long as the orientation within the guide bars is neglected, the maximum precision of the system is about 0.03mm for tubes that are ideally round and not bent. This is more than twice as precise as human measurements. It is assumed that this precision could be increased even further if the tubes were not only approximately but ideally horizontally oriented.

5.3.6. Outlier

The system is evaluated with respect to outliers in two steps. First, more than 150 tubes (about 50mm, ∅8mm) are measured by the system at 30m/min. Approximately 1/3 of the tubes meet the tolerances, while the other 2/3 have a manipulated length. The ground truth length of the tubes is known, as well as the measuring order, i.e. each measurement can be assigned to a corresponding ground truth length. Given the results of the previous experiments, one can assume that the results for black tubes will be better than or equal to the transparent tube results.

The results of this experiment are visualized in Figure 5.20. All of the 150 tubes are classified correctly. There is not a single false positive or false negative in the data.

In the second stage of this experiment, 30 manipulated and 22 good tubes are randomly mixed. All tubes are measured online at 30m/min while the blow out mechanism is activated.


Figure 5.20: 150 transparent tubes, both good and manipulated, have been measured by the system at 30m/min and compared to ground truth data. The system is able to reliably separate the tubes that meet the tolerances around the target length of 50mm from the manipulated tubes without any false positive or false negative.

This means tubes that do not meet the tolerances should be sorted out. Once all tubes have passed the measuring area, it is checked how many of the manipulated tubes have also passed the blow out mechanism (false positives) and how many good tubes have been sorted out (false negatives). To simplify this task the manipulated tubes have been marked beforehand. This experiment is repeated 22 times, leading to a total number of 1144 inspected tubes. The results can be found in Table 5.7. The total detection rate is Ω_total = 0.99, i.e. 6 tubes out of 1144 could pass the measuring area without being measured. Three tubes have been sorted out wrongly, representing a false negative rate of Ω_FN = 0.0026, i.e. 2.6‰.

The false positives are more critical. 5 outliers have not been blown out correctly, thus Ω_FP = 0.0043. However, it turns out that 4 of the 5 false positives occur in sequences with at least one non detected tube. Hence, with the ratio of good and manipulated tubes of about 2:3, the probability is larger that the non inspected tube is a manipulated one. In this case the false positives are most likely not due to failures in measuring, but originate in the fact that these tubes have not been measured at all. At production, all non inspected tubes should be sorted out and revised to be sure that no outlier can pass.

5.3.7. Tube Length

Measuring tubes of a different length requires the adaptation of the visual field of the camera. For tubes < 50mm this means placing the camera closer to the conveyor. However, due to the minimum object distance (250mm) of the 16mm lens used in the experiments before, and with the considerations made in Section 3.2.1, a lens with a longer focal length is needed to yield the desired field of view. In this case a 25mm focal length lens is used.
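The lens selection follows from the pinhole approximation, in which the field of view scales with the object distance divided by the focal length. A short Python sketch of this calculation; the 6.4mm sensor width is an assumption for a 1/2" CCD, and the 64mm field of view is an illustrative value:

    def required_focal_length(sensor_width_mm, object_distance_mm, fov_mm):
        # pinhole approximation: fov / object_distance = sensor_width / f
        return sensor_width_mm * object_distance_mm / fov_mm

    # A 1/2" CCD is roughly 6.4mm wide. To cover a measuring area of about
    # 64mm at the 250mm minimum object distance of the 16mm lens:
    print(required_focal_length(6.4, 250.0, 64.0))  # -> 25.0, i.e. a 25mm lens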


Total   Detected   Missed   FN   FP
  52        52        0      1    0
  52        52        0      0    0
  52        52        0      0    0
  52        52        0      0    0
  52        52        0      0    0
  52        52        0      1    0
  52        51        1      0    1
  52        52        0      0    0
  52        50        2      0    1
  52        52        0      0    0
  52        52        0      0    1
  52        52        0      0    0
  52        52        0      0    0
  52        52        0      0    0
  52        52        0      1    0
  52        52        0      0    0
  52        51        1      0    1
  52        52        0      0    0
  52        51        1      0    1
  52        52        0      0    0
  52        51        1      0    0
  52        52        0      0    0
1144      1138        6      3    5

Table 5.7: Results of repeated blow out experiments. 22 × 52 transparent tubes have been measured at 30m/min. Each test charge included 22 tubes within the allowed tolerances and 30 outliers. Detected outliers should have been sorted out by the blow out mechanism. 3 tubes have been sorted out wrongly (false negatives) and 5 outliers have passed (false positives). Conspicuously, 4 of the 5 false positives occur if at least one tube has not been detected at all by the system.
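The rates quoted in the text follow directly from the totals of Table 5.7; a minimal Python check:

    total, detected, missed = 1144, 1138, 6
    false_negatives, false_positives = 3, 5

    print(detected / total)                # detection rate, ~ 0.99
    print(false_negatives / total * 1000)  # ~ 2.6 per mille
    print(false_positives / total * 1000)  # ~ 4.4 per mille (0.0043)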


Larger tubes can also be covered by the 16mm focal length lens, as for the 50mm tubes, but the camera has to be placed further away from the conveyor to yield a larger field of view. The resulting pixel representation, i.e. the length a pixel represents in the measuring plane, increases as mentioned before. Hence, the precision decreases.

In each experiment a charge of 50 tubes (transparent and black) of 30mm and 70mm length and 8mm diameter is used as test data. Each charge has been measured by hand and is evaluated with respect to mean and standard deviation. Each tube passes the measuring area once in this experiment and is measured as often as possible (single measurements) while it is in the visual field of the camera. The mean over the computed total lengths as well as the standard deviation are determined and compared to the ground truth data. The results are summarized in Table 5.8 and visualized in Figure 5.21 in terms of Gaussian distributions.

The number of per tube measurements Ω_PTM of 30mm tubes is slightly smaller compared to experiments with 50mm tubes at the same velocity. This is due to the smaller field of view of the camera: obviously the tubes leave the measuring area faster. However, there are still more than 3 single measurements of each tube on average, both for black and transparent tubes. The larger 70mm tubes have been measured even more often than 50mm tubes, with 6.12 single measurements for black and 4.85 for transparent tubes respectively. This can be explained by the larger field of view.

The mean value over a sequence of tubes µ_seq equals the expectation µ_GT in almost all experiments. Only the 30mm transparent tubes differ from the ground truth, by about 0.01mm, which is acceptably small. This indicates the calibration factor between pixels and mm has been trained almost perfectly in all experiments.

The standard deviation is much smaller for 30mm tubes, both in the manual and automated measurements, compared to 70mm tubes. In general, black tubes are measured with higher precision than transparent tubes by the system, in line with the observations in previous experiments. The higher precision for 30mm tubes is important with respect to the specified tolerances (see Table 1.2). In all experiments besides the 70mm black tubes, the manual precision is only slightly better than the precision of the visual inspection system. However, the results of the system have always been precise enough to allow for reliable measurements in terms of the allowed tolerances. For 70mm black tubes the system performed even better than humans, with a standard deviation of 0.14mm compared to 0.16mm measured by hand.

It is important to state that the precision in these experiments depends both on the measuring variance of the system and on the real variance of the tubes. Accordingly, one cannot compare the results directly with those in Section 5.3.5, where only one tube was measured several times per experiment.

One can conclude that the visual inspection system is able to measure tubes of different lengths as accurately as humans on average.

5.3.8. Performance

Finally, the performance of the system is evaluated on an Athlon64 FX-55 (2.6GHz, 2GB RAM) platform.

The total processing time can be divided into five main groups: profile analysis, compensation for radial distortion, edge detection and template matching, and length computation and tracking. The last group contains all remaining operations that are not covered by any of the groups before.


Color            L_target   Ω_PTM   µ_seq   µ_GT    σ_seq   σ_GT
(a) Black            30      3.43   30.06   30.06    0.09    0.08
(b) Transparent      30      3.18   30.07   30.06    0.12    0.08
(c) Black            70      6.12   69.76   69.76    0.14    0.16
(d) Transparent      70      4.85   70.21   70.21    0.27    0.20

Table 5.8: Results of 30mm and 70mm tubes at 30m/min. L_target represents the target length and Ω_PTM the average number of per tube measurements. The mean and standard deviation of the length measuring distributions are denoted as µ_seq and σ_seq for the automated, and µ_GT and σ_GT for the human measurements respectively. The results are also visualized in Figure 5.21.

Figure 5.21: Length distribution of 30mm and 70mm tubes at 30m/min for automated (solid line) and manual measurements (dashed line). Panels: (a) 30mm black, (b) 30mm transparent, (c) 70mm black, (d) 70mm transparent. All experiments show a very good accuracy, i.e. the vision system measures the same length on average. Black tubes are generally measured slightly more precisely than transparent tubes. For 70mm black tubes the vision system is even more precise than human measurements.
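The Gaussian curves in Figure 5.21 are fully determined by the sample mean and standard deviation of each measurement sequence. A Python sketch of how such a curve is obtained; the measurement values below are placeholders:

    import numpy as np

    def gaussian_pdf(x, mu, sigma):
        return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

    # hypothetical sequence of computed total lengths in mm
    lengths = np.array([30.05, 30.11, 29.98, 30.06, 30.09])
    mu, sigma = lengths.mean(), lengths.std(ddof=1)  # sample statistics

    x = np.linspace(mu - 4 * sigma, mu + 4 * sigma, 200)
    curve = gaussian_pdf(x, mu, sigma)  # density as plotted in Figure 5.21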


Many thousands of frames have been timed, with and without tubes in the visual field of the camera. The results of the performance evaluation can be found in Figure 5.22. It turns out that the processing of a measurable frame requires 17.8ms on average. Thus, all images at a capture rate of 50fps (i.e. a new image is acquired every 20ms) can be processed.

The dominant part of the processing time is consumed by edge detection and template matching, where the latter is the most expensive. 82% of the total processing time is needed for this step on average, although the number of pixels considered is highly restricted by the local ROIs. The undistortion operation is the second most expensive operation with 10%, followed by the length computation and tracking with 4%. The profile analysis, though a fast heuristic to locate a tube only roughly, is proven to be very fast with only 0.29ms/frame. The remaining 3% represent operations such as image conversions, copying or drawing functions to visualize the detection results. The latter could be omitted at production if visualization is not required.

If the profile analysis detects a non measurable frame, the template matching is not performed. Thus, the remaining time could be used for different side operations in the future, e.g. to save logging information or to run certain self control mechanisms. Such mechanisms could check, for example, whether the illumination is still bright enough or if the camera position has changed.
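Per-stage timings like those in Figure 5.22 can be collected with a simple accumulator around each processing step. A Python sketch; the stage functions are placeholders for the actual processing steps:

    import time
    from collections import defaultdict

    timings = defaultdict(float)  # accumulated wall-clock time per stage
    frames = 0

    def timed(stage, func, *args):
        start = time.perf_counter()
        result = func(*args)
        timings[stage] += time.perf_counter() - start
        return result

    # dummy stages standing in for the actual processing steps
    def analyze_profile(img):   return img
    def undistort(img):         return img
    def match_templates(img):   return img

    def process_frame(img):
        global frames
        frames += 1
        img = timed("profile analysis", analyze_profile, img)
        img = timed("undistortion", undistort, img)
        img = timed("edge detection/template matching", match_templates, img)

    for _ in range(1000):
        process_frame(bytearray(640 * 480))

    for stage, total in timings.items():
        print(f"{stage}: {1000.0 * total / frames:.2f} ms/frame")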


Task                                 Ω_time [ms/frame]
Profile Analysis                           0.29
Undistortion                               1.79
Edge Detection/Template Matching          14.57
Length computation/Tracking                0.69
Other                                      0.48
Total                                     17.82

Figure 5.22: (a) Average processing time per frame divided into the different steps of the visual inspection. (b) Corresponding pie chart (edge detection/template matching: 82%, undistortion: 10%, length computation/tracking: 4%, other: 3%, profile analysis: 2%). As one can see, edge detection and template matching is the dominant operation throughout inspection.


5.4. Discussion and Future Work

The main difficulties with transparent tubes come along with the nonuniform brightness and the texture of the background. A conveyor belt which is equally translucent over its whole length could prevent many problems. The parameters controlling the detection sensitivity must cover both the brightest and the darkest region of the conveyor belt. This is always a compromise, leading to poorer results on average. However, if the contrast between tubes and the background does not depend on where the tube is located on the conveyor belt, the parameters can be adjusted much more specifically.

The background texture of the conveyor belt used for the prototype has the drawback of regular vertical structures. If the tube edge contrast is poor, the edge response of the background may be stronger than that of the tube edge. Model knowledge can be used to improve the tube edge localization even under the presence of strong vertical background edges. However, there is still a certain error probability, which can be drastically reduced if vertical background edges are suppressed. The best solution would be to use a conveyor belt with a canvas of horizontal structure. This would obviously simplify the detection task without requiring any computation time.

If no conveyor belt can be found that provides the desired horizontal structure in combination with good translucency characteristics, one can think of suppressing the background pattern within the local ROI around a tube edge algorithmically, by exploiting the regularity of the background pattern. One idea is to transform the spatial image into the frequency domain using the Fourier transform. For more information on the Fourier transform and the frequency domain the reader is referred to [64]. If it is possible to find characteristic frequencies belonging to the background pattern, one can remove these frequencies in the frequency domain and apply the inverse Fourier transform to the filtered spectrum. The result is a filtered spatial image with reduced background structure. The filter must be designed carefully to preserve the tube edges.

In a first experiment, test images of a conveyor both with and without a tube have been acquired and transformed into the frequency domain. Figure 5.23(a) and (b) show an example of the spectrum of an image with background only and with transparent tubes in the image respectively. The spatial domain of (b) can be seen in (d). One eye-catching consistency in the spectra are the bright spots. If one removes these spots in the spectrum of an image, indicated by the black regions in (c), and applies the inverse Fourier transform to this filtered spectrum, the result is an image with a significantly reduced background pattern. The actual tube edges, however, are quite well preserved. In this case the spectrum has been filtered by hand and only coarsely. Much more work has to be spent on designing more sophisticated and reliable filters that perform well for a large number of images without removing or blurring any relevant edges. Removing a frequency from the spectrum always influences the whole image. The filter in the example produces new structure at the tube regions, especially around the printings. In addition, the darker stripe in the background on the right of the input image is still present in the filtered version, since it does not belong to the regular pattern of the background. Although in this example the dark stripe is not critical, it might be in other situations. This shows the limits of this approach.
Any deviations from the regular background pattern are difficult to suppress in the frequency domain. If the conveyor belt is changed, the texture of the belt might be completely different. In this case the filter has to be adapted. An automated filter adaptation and background learning is non-trivial.
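A minimal Python sketch of the proposed notch filtering with NumPy; the notch positions and radius are assumptions and would have to be derived from the actual spectrum of the belt texture:

    import numpy as np

    def suppress_background(image, notches, radius=3):
        # Zero small discs (notches) in the centered spectrum and transform
        # back. `notches` must contain symmetric pairs, since the spectrum
        # of a real-valued image is conjugate-symmetric.
        spectrum = np.fft.fftshift(np.fft.fft2(image.astype(float)))
        yy, xx = np.mgrid[0:image.shape[0], 0:image.shape[1]]
        for cy, cx in notches:
            spectrum[(yy - cy) ** 2 + (xx - cx) ** 2 <= radius ** 2] = 0.0
        return np.real(np.fft.ifft2(np.fft.ifftshift(spectrum)))

    # hypothetical bright-spot positions caused by a regular vertical texture
    image = np.random.rand(120, 160)
    filtered = suppress_background(image, notches=[(60, 70), (60, 90)])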


Figure 5.23: Background suppression in the frequency domain. (a) Fourier transform of an image of an empty conveyor. (b) Fourier transform of (d). (c) Certain frequencies have been removed by hand, indicated by the black regions. (d) Source image. (e) Inverse Fourier transform of the filtered spectrum. The characteristic vertical background pattern could be reduced quite well while the tube edges are preserved.

The experiments have shown that tubes of 8mm diameter are most robust against deformations. While thinner tubes of 6mm diameter tend to be bent, tubes of 12mm may be elliptical in the cross-section. In both cases the accuracy and precision decrease. The question is whether such deformations are only caused by the way the tubes have been stored, transported and handled throughout the experiments in the laboratory, or if they also occur at production. The latter can be assumed, at least to a certain extent. A telecentric lens could overcome the problem of perspective occurring with deformed 12mm tubes.

A less expensive improvement would be to measure not only the length, but also the height of a tube in the image. A larger height indicates the tube is closer to the camera, and vice versa. The calibration factor relating pixels to mm could be defined as a function of the tube height (see the sketch below). Obviously, this requires a more complex teach-in step.

Another potential source of deviations in the measurements is the tube orientation. The guide bars restrict the maximum tube rotation to a minimum. The remaining error has been approximated. Although it is very small, it could be further reduced by tilting the whole conveyor slightly around its longitudinal axis. The angular orientation guarantees that all tubes will roll to the lower guide bar. If the guide bar is horizontal in the image, so will be the tubes. Accordingly, the camera position has to be adapted to reestablish the fronto-orthogonal view. The proposed camera positioning method is independent of the orientation of the conveyor and the camera in 3D space.
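The height-dependent calibration factor could, for instance, be realized as a linear interpolation between two taught reference heights. A Python sketch; all reference values are hypothetical and would be determined during the teach-in:

    def mm_per_pixel(tube_height_px,
                     h_low=90.0,   k_low=0.105,   # far tube: more mm per pixel
                     h_high=110.0, k_high=0.095): # near tube: fewer mm per pixel
        # A taller tube image means the tube is closer to the camera, so
        # one pixel covers less distance in the measuring plane.
        t = (tube_height_px - h_low) / (h_high - h_low)
        return k_low + t * (k_high - k_low)

    length_mm = 512 * mm_per_pixel(100.0)  # 512px edge distance -> 51.2mm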


The blow out mechanism was tested successfully in the prototype setup. The advantage of this mechanism is that it works almost independently of the conveyor velocity and the position of the light barrier relative to the measuring area. One only has to ensure that no tube passes the light barrier before the good/bad decision of the measuring system reaches the blow out controller.

One drawback of the current strategy is the sensitivity to ghosts. If the system detects a tube where actually no tube is, the resulting classification of the ghost is sent to the controller anyhow and stored in the FIFO memory. Since a ghost is never detected by the light barrier, the good/bad decision of the ghost is still in the memory when the next tube passes the light barrier. Instead of considering the decision belonging to this tube (appended to the FIFO memory), the decision of the ghost is evaluated. This leads to a loss of synchronization, i.e. a tube T is related to the decision of tube T − 1. Over time this effect can accumulate and the reliability of the system is obviously violated.

A potential solution to this problem can be achieved by replacing the FIFO memory with a single register that stores only the latest decision. Without loss of generality, a 0 in this register might correspond to blowing out the next tube while a 1 indicates the next tube can pass. The register is set to 0 by default. Each time the inspection system measures a tube to be within the allowed tolerances, a signal is sent to the controller that sets the bit in the register to 1. As soon as the tube has passed the light barrier, the register is reset to 0. This has to be done before the next tube is measured. Therefore the light barrier has to be placed quite close to the measuring area. The advantage of this approach is that the memory always contains the current decision belonging to the tube that passes the light barrier next. A timer can be used to reset the register if no tube intersects the light barrier within the expected time. Thus, ghosts become uncritical.

Furthermore, since the register is reset each time, this also helps to prevent the problem of non detected tubes, i.e. tubes that have passed the visual field of the camera without being measured. In the outlier experiment (see Section 5.3.6) the false positive rate increased drastically if tubes could not be detected. In this case the system does not send a good/bad decision for the missed tube to the controller. The light barrier, however, detects every tube independent of whether it has been measured or not. With the single register strategy these tubes are blown out by default. Thus, only tubes that have been measured by the system and meet the allowed tolerances are able to pass the blow out nozzle.

If tubes are not detected at all, or measurements do not result in a meaningful length value (e.g. the standard deviation of the single measurements is too large), the corresponding tubes define another group U including all unsure measurements that cannot definitely be assigned to G′_0, G′_−, or G′_+. All tubes of this class should be blown out by default to ensure no outlier can pass the quality control. These tubes do not have to be considered as rejections, but could be measured by hand afterwards or recirculated to be inspected again by the vision-based measuring system, depending on the frequency of occurrence.
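The single-register strategy can be expressed as a small state machine. The following Python sketch illustrates the logic only; it is not the micro controller firmware:

    class BlowOutRegister:
        """False (0) = blow out the next tube (default), True (1) = let it pass."""

        def __init__(self):
            self.passes = False  # default: blow out

        def on_measurement(self, length_mm, target_mm, tol_mm):
            # the inspection system sets the bit only for tubes in tolerance
            self.passes = abs(length_mm - target_mm) <= tol_mm

        def on_light_barrier(self):
            decision = self.passes
            self.passes = False  # reset before the next tube is measured
            return decision      # True = pass, False = blow out

    reg = BlowOutRegister()
    reg.on_measurement(50.03, target_mm=50.0, tol_mm=0.5)
    print(reg.on_light_barrier())  # True: this tube passes
    print(reg.on_light_barrier())  # False: a ghost or missed tube is blown out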
The experiments have shown that more than 80% of the total processing time is needed for the template based edge localization. In the current implementation the left and right ROI are processed sequentially. One possible optimization could be to parallelize this problem: the computation within the left and right ROI could be performed in separate threads to exploit the power of current dual core architectures. This is possible since the processing in the two ROIs is independent of each other.
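A Python sketch of this idea using a thread pool; the matching function is a placeholder, and note that in CPython the workers only run truly in parallel if the heavy computation releases the GIL, e.g. inside a native extension:

    from concurrent.futures import ThreadPoolExecutor

    def locate_edge(roi):
        # placeholder for the template-based edge localization in one ROI
        return sum(roi) / len(roi)

    left_roi, right_roi = [10, 12, 200], [220, 14, 11]

    # the two ROIs are independent, so both can be processed concurrently
    with ThreadPoolExecutor(max_workers=2) as pool:
        left = pool.submit(locate_edge, left_roi)
        right = pool.submit(locate_edge, right_roi)
        left_edge, right_edge = left.result(), right.result()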


6. Conclusion

In this thesis a functioning prototype of a vision-based heat shrink tube measuring system has been presented, allowing for a 100% online inspection in real-time. Extensive experiments have shown the accuracy and precision of the developed system, which reaches the quality of accurate human measurements under ideal laboratory conditions. The advantage of the developed system is that this accuracy can be achieved even at conveyor velocities of up to 40m/min.

A multi-measurement approach has been investigated in which each decision whether a tube has to be sorted out is based on 2-11 single measurements, depending on the tube type and conveyor velocity. This requires video frame rates of ≥ 50fps to be processed in real-time. Fast algorithms, heuristics and model knowledge are used to improve the performance in this constrained application. Tube edge specific templates have been defined that are able to locate a tube edge with subpixel accuracy even in low contrast images under the presence of background clutter. In the prototype setup, the tube edge detection has been complicated by the strong vertical structure of the conveyor belt and an inhomogeneous translucency leading to non uniformly bright background regions. The consequences for transparent tubes have been discussed, including the possibility of tubes passing the visual field of the camera without being detected.

Since black tubes are not translucent, they yield an optimal contrast to the background with a back lighting setup. On the other hand, transparent tubes are much more sensitive to the structure of the background and the local tube edge contrast. All parameters adjusted for transparent tubes turned out to have no disadvantage for black ones. Thus, the parameters for transparent tubes are used in general, leading to a more uniform solution in the system design.

Beside the algorithmic part of the work, the engineering of the whole system, including the proper selection of a camera, optical system, and illumination, has been solved. The integration of the micro controller and the air blow nozzle completes the prototype, allowing for concrete demonstrations of how tubes that do not meet the tolerances are blown out.

A simple and intuitive initialization of the system has been developed. Most parameters can be trained interactively and automatically without complicated user interactions. Even an unskilled worker should be able to perform the teach-in step after a few instructions. The only critical part of the teach-in is the camera positioning. To exclude as many sources of error as possible, the camera should be mounted as stably as possible at a fixed orientation (which has to be calibrated only once). The required height adjustments to cover the range of tube lengths should be automated if possible.

The maximum measuring precision of 0.03mm was reached for a metallic tube model simulating an ideal tube (at a conveyor velocity of 30m/min). During the experiments it has been observed that deformations of real heat shrink tubes (elliptical cross-section or bending) have a certain influence on the measuring precision.


However, the average precision is still < 0.1mm for real tubes. In general, tubes of 8mm diameter have been measured more precisely than 6mm or 12mm tubes.

The average accuracy (root mean square error) of the automated measurements, i.e. the distance to some ground truth reference, is about 0.1mm for black tubes and about 0.2mm for transparent tubes at velocities of 30m/min. The ground truth has been acquired manually under ideal laboratory conditions and itself shows a certain inter- and intra-human deviation of about 0.1mm. While the velocity has only a minor influence on the accuracy for black tubes, the accuracy for transparent tubes decreases significantly at higher velocities. The main reason for this observation is the decreasing number of per tube measurements, since averaging over the single measurements gets more sensitive to outliers. In addition, the probability increases that a transparent tube is not detected at all if the background contrast is poor. However, in general, the accuracy and precision have been good enough in all experiments to reliably detect both black and transparent tubes of different lengths and diameters with respect to the specified tolerances. Experiments with transparent tubes of manipulated lengths have shown the system is able to successfully separate the good tubes from those that do not meet the tolerances. The false negative rate, i.e. the number of tubes that have been sorted out wrongly, is 2.6‰. Less than 4.3‰ of failures could pass the measuring area. However, 80% of the false positives have not been detected at all by the system. With the adaptation of the blow out strategy as suggested in Section 5.4, these tubes would have been blown out, too. Hence, the theoretically remaining false positive rate is 0.87‰ for transparent tubes. Following the experimental results one can assume that the false positive rate for black tubes will be less than or equal to this.

The measuring results have a positive side effect, since it is possible to compute the moving average over the last N measurements (see the sketch below). An operator can compare the current mean length to the given target length. This can be useful especially during the teach-in of the machine. At production, deviations can be corrected before the tolerances are exceeded. In a more sophisticated solution the adjustment could be automated. If one can ensure the current mean length measured by the vision system equals the target length, the blow out mechanism may never need to be activated and the probability for false positives can be decreased further.

In addition, the system is able to store the inspection results in a file or database. Such statistics can also be useful for management or controlling, since they include not only the length distribution of the production, but also information about the total number of tubes produced, the time of production, as well as the number of defectives.

The good results of the prototype support the use of an optical inspection system for length measurements of heat shrink tubes. Manual sample inspections as currently used at production are influenced by many factors like concentration, speed, motivation, or tiredness of the individual operator. In general, less precision can be assumed for measurements at production compared to ideal laboratory measurements as used for evaluating the system. The advantage of the automated vision-based system is the ability to inspect each tube at laboratory precision without getting tired.
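A minimal Python sketch of such a moving average over the last N accepted measurements:

    from collections import deque

    class MovingAverage:
        def __init__(self, n):
            self.window = deque(maxlen=n)  # keeps only the last n values

        def add(self, length_mm):
            self.window.append(length_mm)
            return sum(self.window) / len(self.window)

    avg = MovingAverage(n=50)
    for length in (50.02, 49.97, 50.05):
        current_mean = avg.add(length)
    # compare current_mean to the target length to detect drift early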


Appendix


A. Profile Analysis Implementation Details

Details regarding the implementation of the profile analysis, with a focus on performance aspects, are introduced in the following.

A.1. Global ROI

A simple but very effective way to decrease the computational load is to restrict the image processing to a certain region of interest (ROI). Following the assumption that parts of the guide bars are visible in the images at the top and the bottom without containing any information, the guide bars can be excluded from further processing; thus, the ROI lies in between these guide bars. The height of the ROI is given by the guide bar distance, which should be almost constant over the whole image since the bars are adjusted to be parallel to the x-axis in the image. The ROI extends in horizontal direction over the whole image width minus a certain offset at both sides. This offset is due to the fact that the image distortion is maximal at the boundaries. The actual value of the offset depends on the ability to compensate the distortion at measuring. If the measurements are accurate even at the image boundaries, the offset tends toward zero. In the following, the ROI between the guide bars is also referred to as the global ROI.

Section 3.2 states it is possible to adapt the camera resolution to a user-defined size. The reason why the image size is not adjusted to cover the global ROI exactly (which would make the ROI redundant) is a very practical one. First of all, the guide bars provide a valuable clue for adjusting the field of view of the camera. In addition, smaller images mean less data has to be transferred and consequently a larger number of images can be transferred in the same time. If the image size is too small, the actual frame rate exceeds the number of frames that can be processed without skipping frames, which should be avoided.

The extraction of the global ROI can be automated using a similar profile analysis approach as used for tube localization, but in vertical direction. Again, several vertical scan lines are used to build the profile. If there is no tube in the image (empty scene), the guide bars can be detected clearly, since the contrast between the bright conveyor belt and the black guide bars is very strong. A smoothing step, as used in horizontal direction to overcome the background clutter, is not necessary. This has the benefit that the two strongest peaks in the profile describe the guide bar location quite accurately. The detection of the global ROI has to be performed only once at an initialization step, assuming a static setup of camera and conveyor that does not change over time. In the future, it is conceivable that every time the state 'empty' is detected, the ROI is reinitialized and compared with the previous location. A difference indicates something changed with the setup and may trigger an alert or some specific reaction.
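A Python sketch of this vertical profile analysis for an empty scene; the synthetic image stands in for a real frame with dark guide bars above and below the bright belt:

    import numpy as np

    def find_global_roi(image, n_scanlines=11):
        h, w = image.shape
        cols = np.linspace(0, w - 1, n_scanlines).astype(int)
        profile = image[:, cols].mean(axis=1)   # mean vertical profile
        response = np.abs(np.diff(profile))     # edge strength per row
        top = int(np.argmax(response[: h // 2]))             # upper bar edge
        bottom = h // 2 + int(np.argmax(response[h // 2:]))  # lower bar edge
        return top, bottom  # rows bounding the global ROI

    image = np.vstack([np.zeros((10, 160)),         # dark upper guide bar
                       np.full((100, 160), 200.0),  # bright conveyor belt
                       np.zeros((10, 160))])        # dark lower guide bar
    print(find_global_roi(image))  # -> (9, 109)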


A.2. Profile Subsampling

In many computer vision tasks it is common to perform a specific operation on lower resolution images than the input to increase computation speed. For example, one could simply discard every second row or column to obtain an image of half the size of the original image. However, to avoid a violation of the sampling theorem it is important to apply a low-pass filter operation on the data before. This mechanism can be used to generate pyramids of images at different resolutions or scales. Each layer in the pyramid has half the size of the layer above, with the top layer corresponding to the original size. Before subsampling the data, a Gaussian smoothing operation is performed to suppress higher frequencies. Thus, such pyramids are called Gaussian pyramids in the literature [24].

The same can be applied to one-dimensional signals such as gray level profiles. In this application, experiments have shown the information about the tube boundaries is conserved at a coarser scale. Thus, a subsampled version two levels down the pyramid is used in practice instead of the original profile. The data to be processed after this step is only a fourth of the input. Obviously, the profile analysis is accelerated by this step. Experiments investigating whether the profile subsampling could replace step one in the profile analysis, i.e. the smoothing with a large mean kernel, came to the conclusion that in connection with transparent tubes and dark printing, the strong contrast of the letters could be misclassified as a tube boundary. The system then tries to detect the real tube location in a certain region around the wrong position and is likely to fail. The mean filter, in contrast, is able to reduce the influence of the lettering and must not be replaced.

A.3. Scan Lines

As mentioned in Section 4.4.2, the profile to be evaluated is based on the normalized sum of N_scan scan lines equally distributed over the global ROI. The reason why a single scan line is not sufficient is shown in Figure A.1(b). Three sample profiles at different heights (61, 80 and 100) are selected to visualize the influence of the printing. One can see the strong contrast at the letters as well as a poor contrast at the right tube boundary. Since it is non-deterministic whether the printing of a particular tube is visible in an image, one has to consider the worst case. This is a scan line passing through the printing at as many positions as possible. The global mean of the resulting profile is much lower in this case, and it is possible that the intensity of the tube at regions outside the printing is wrongly classified as background. The result of this effect is shown in Figure A.1(d). On the other hand, the usage of several scan lines decreases the influence of the printing significantly. The probability that more than a few scan lines will pass through the printing is low. For example, among the sample tubes used for testing the prototype, the coverage of the printing is about 16% with respect to the diameter. Thus, it is very likely to have more than one scan line passing through tube regions without printing. In total, the influence of the printing decreases with the number of scan lines. However, Figure A.1(c) shows 11 scan lines equally distributed over the global ROI in y-direction are sufficient to yield almost equal results as considering all rows of the ROI. Here, the profile consisting of 11 scan lines is shifted, i.e.
the intensity values are lower compared to the profile calculated from all ROI rows (90 in this example). This is due to the location of the global ROI.


Figure A.1: Comparison of a single and multi scan line approach. (a) Input gray scale image. (b) Profiles of three selected scan lines at heights 61, 80 and 100 respectively. The first two scan lines pass through the printing, leading to strong variations in the profile. Compared to these variations, the poor contrast of the right tube border makes a correct detection difficult. (c) The normalized sum of several scan lines reduces the effect of the printing, bringing out the location of the tube much more clearly. It can be seen that 11 scan lines equally distributed over the global ROI are sufficient to yield almost equivalent results as considering every row. (Note: The profile of the 11 scan lines is shifted since the global ROI included parts of the guide bars at the upper and bottom row. Since these pixels have a value near zero, they do not contribute much to the profile sum but are considered in the normalization. The scale, however, does not affect the actual tube location.) (d) Wrong detection of the tube boundaries when using a single scan line. (e) Result of the multi scan line approach.


As can be seen in Figure A.1(e), the global ROI is a bit too large; thus, the upper and bottom rows hit the border of the guide bars. Scan lines through these rows do not contribute much to the overall profile, but have an effect in the normalization. This shift, however, does not affect the actual tube location. With respect to performance, rows that have no influence should be ignored.

Obviously the problem with the printing on a tube's surface arises only with transparent tubes, since the printing is not visible on the black tubes under back light. If black tubes are inspected, a single scan line in the image center is sufficient to localize the tube correctly, but more scan lines do not impair the results. To have a more universal solution, the multi scan line approach is used for all tube types, and no distinction is made at this part of the system, to keep it simple.

A.4. Notes on Convolution

At several steps in the profile analysis a convolution operation is performed. With respect to the derivation of the profile by convolving with a first derivative Gaussian kernel in step two, it is important to note which boundary condition is used, since in discrete convolution there are positions at the image boundaries that are undefined. There are many different strategies to address this problem, including padding the image with constant values (e.g. zero), reflecting the image boundaries periodically, or simply ignoring the boundaries [24]. Here, a symmetric reflection strategy is used:

    P(−i) = P(i − 1),                  (A.1)
    P(N_P + i) = P(N_P + 1 − i),       (A.2)

where the first equation is used for the left and the second equation for the right boundary respectively. N_P denotes the length of P, and P(x) the intensity value in the profile at position x. The advantage of this strategy compared to padding with zeros, for example, is that no artificial edges are introduced.
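A Python sketch of a convolution with this symmetric reflection; the kernel size and σ are chosen for illustration only:

    import numpy as np

    def convolve_reflect(profile, kernel):
        # symmetric padding realizes P(-i) = P(i-1) and P(N+i) = P(N+1-i)
        pad = len(kernel) // 2
        padded = np.pad(profile, pad, mode="symmetric")
        return np.convolve(padded, kernel, mode="valid")  # same length as input

    # first derivative of a Gaussian (sigma = 1) as an edge detection kernel
    x = np.arange(-3, 4, dtype=float)
    kernel = -x * np.exp(-x ** 2 / 2.0)
    kernel /= np.abs(kernel).sum()

    profile = np.array([10.0, 10.0, 10.0, 200.0, 200.0, 200.0])
    edges = convolve_reflect(profile, kernel)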


B. Hardware Components

B.1. Camera

Image Device: 1/2" (diag. 8 mm) type progressive scan SONY IT CCD (both models)
Effective Picture Elements: 656 (H) × 492 (V) (F-033C); 780 (H) × 580 (V) (F-046B)
Lens Mount: C-mount, 17.526 mm (in air); ∅ 25.4 mm (32 T.P.I.); mechanical flange back to filter distance: 8.2 mm (both models)
Picture Sizes: F-033C: 640 × 480 pixels (Format 0), 656 × 492 pixels (Format 7, Mode 0); F-046B: 780 × 580 pixels (Format 7, Mode 0), 388 × 580 pixels (Format 7, Mode 1), 780 × 288 pixels (Format 7, Mode 2), 388 × 288 pixels (Format 7, Mode 3), 640 × 480 pixels (Format 0, Mode 5)
Cell Size: 9.9 µm × 9.9 µm (F-033C); 8.3 µm × 8.3 µm (F-046B)
ADC: 10 Bit (both models)
Color Modes: Raw 8, YUV 4:2:2, YUV 4:1:1 (F-033C); none (F-046B, monochrome)
Data Path: 8 Bit (both models)
Frame Rates: F-033C: 3.75 Hz; 7.5 Hz; 15 Hz; 30 Hz; up to 74 Hz in Format 7 (RAW); 68 Hz (YUV 4:1:1); up to 51 Hz in YUV 4:2:2; F-046B: 3.75 Hz; 7.5 Hz; 15 Hz; 30 Hz; up to 53 Hz in Format 7
Gain Control: Manual 0-16 dB (0.035 dB/step), auto gain (select. AOI) (F-033C); Manual 0-24 dB (0.035 dB/step), auto gain (select. AOI) (F-046B)
White Balance: Manual (U/V); One Push; Auto (select. AOI) (F-033C); none (F-046B)
Shutter Speed: 20 ... 67,108,864 µs (≈ 67 s); auto shutter (select. AOI) (both models)
External Trigger Shutter: Trigger Mode 0, Trigger Mode 1, advanced feature: Trigger Mode 15 (bulk); image transfer by command; trigger delay (both models)
Internal FIFO Memory: up to 17 frames (F-033C); up to 13 frames (F-046B)
LookUp Tables: one, user programmable (10 Bit → 8 Bit); Gamma (0.45) (both models)
Smart Functions: real-time shading correction, image sequencing, two configurable inputs, two configurable outputs, image mirror (L-R ↔ R-L), serial port (IIDC v. 1.31) (both models); additionally binning (F-046B)
Transfer Rate: 100 Mb/s, 200 Mb/s, 400 Mb/s (both models)
Digital Interface: IEEE 1394 IIDC v. 1.3 (both models)
Power Requirements: DC 8 V - 36 V via IEEE 1394 cable or 12-pin HIROSE (both models)
Power Consumption: less than 3 Watts (@ 12 V DC) (both models)
Dimensions: 58 mm × 44 mm × 29 mm (L × W × H), without tripod and lens (both models)
Mass: < 120 g (without lens) (both models)
Operating Temperature: +5 to +45 °C (both models)
Storage Temperature: −10 to +60 °C (both models)
Regulations: EN 55022, EN 61000, EN 55024, FCC Class A, DIN ISO 9022 (both models)
Options: host adapter card, locking IEEE 1394 cable, API (FirePackage), TWAIN (WIA)- and WDM stream driver (both models); additionally removable IR cut filter (F-046B)

Table B.1: Camera specifications for the AVT Marlin F-033C and F-046B.


B.2. Illumination Hardware

Description                                    Value
Rated Power Output                             200 Watts
Output Voltage                                 0.0, 0.5 to 20.5 VDC
Input Voltage Rating, 50/60 Hz                 90 to 265 VAC
Power Factor Correction @ 230 VAC, 50 Hz       > 0.99, < 4°
Hold-up Time, Nominal AC Input, Full Load      8.3 ms
Line Regulation, Over Entire Input Range       ±0.5%
Current Limit Set Point                        8.5 Amps
Temperature Range, Operating                   0° to 45° C
Temperature Range, Storage                     −25° to 85° C
Relative Humidity, Non-condensing              5% to 95%

Table B.2: Light source (A20800.2) with DDL lamp.

Description          Value
Calibrated Area      3" × 5" (76 × 127 mm)
Panel Size           4" × 6" (102 × 152 mm)
Overall Thickness    0.05" (1.3 mm)

Table B.3: SCHOTT PANELite backlight (A23000), a flexible fiber optical area light.


Description               Value
Bulb Type                 DDL
Voltage                   20 V
Wattage                   150 W
Lamp Base                 GX5.3
Bulb Finish               Clear
Burn Position             Base/Down Horz.
Shape                     MR-16
Color Temp.               3150 K
Filament                  CC-6
Lamp Fill                 Halogen
Lamp Life                 500 Hrs.
Overall Length [mm]       44.5
Reflector Design          Dichroic
Reflector Size [mm]       50.7
Working Distance [mm]     194.5

Table B.4: Lamp specifications.




Bibliography

[1] Y. I. Abdel-Aziz and H. M. Karara. Direct linear transformation from comparator coordinates into object space coordinates in close-range photogrammetry. Proc. of the Symposium on Close-Range Photogrammetry, pages 1-18, 1971.
[2] M. B. Ahmad and T. S. Choi. Local threshold and boolean function based edge detection. IEEE Trans. on Consumer Electronics, 45(3):674-679, August 1999.
[3] Allied Vision Technologies GmbH, Taschenweg 2a, D-07646 Stadtroda, Germany. AVT Marlin - Technical Manual, July 2004.
[4] A. Alper. An inside look at machine vision. Managing Automation, 2005.
[5] American Society for Photogrammetry and Remote Sensing (ASPRS). Manual of Photogrammetry. ASPRS Publications, 4th edition, 1980.
[6] K. Astrom and A. Heyden. Stochastic modelling and analysis of sub-pixel edge detection. In International Conference on Pattern Recognition (ICPR), pages 86-90, 1996.
[7] B. Batchelor and F. Waltz. Intelligent Machine Vision. Springer, 2001.
[8] A. Blake. Active Contours. Springer, 1999.
[9] J. Y. Bouguet. Camera calibration toolbox for Matlab.
[10] I. N. Bronstein, G. Musiol, H. Mühlig, and K. A. Semendjajew. Taschenbuch der Mathematik. Harri Deutsch, 2001.
[11] D. C. Brown. Decentering distortion of lenses. Photometric Engineering, 32(3):444-462, 1966.
[12] D. C. Brown. Lens distortion for close-range photogrammetry. Photometric Engineering, 37(8):855-866, 1971.
[13] J. Canny. A computational approach to edge detection. IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI), 8:679-698, 1986.
[14] T. Chaira and A. K. Ray. Threshold selection using fuzzy set theory. Pattern Recognition Letters (PRL), 25(8):865-874, June 2004.
[15] R. W. Conners, D. E. Kline, P. A. Araman, and T. H. Drayer. Machine vision technology for the forest products industry. Computer, 30(7):43-48, 1997.
[16] E. R. Davies. Machine Vision - Theory, Algorithms, Practicalities. Elsevier, 2005.
[17] C. de Boor. A Practical Guide to Splines. Springer, 1978.


[18] C. Demant, B. Streicher-Abel, and P. Waszkewitz. Industrial Image Processing - Visual Quality Control in Manufacturing. Springer, 1999.
[19] R. Deriche. Using Canny's criteria to derive a recursively implemented optimal edge detector. International Journal of Computer Vision (IJCV), 1(2):167-187, 1987.
[20] S. di Zenzo, L. Cinque, and S. Levialdi. Image thresholding using fuzzy entropies. IEEE Transactions on Systems, Man, and Cybernetics (SMC-B), 28(1):15-23, February 1998.
[21] O. Faugeras. Three-Dimensional Computer Vision: A Geometric Viewpoint. MIT Press, Cambridge, 1993.
[22] J. Föglein. On edge gradient approximations. Pattern Recognition Letters (PRL), 1:429-434, 1983.
[23] P. J. Flynn and A. K. Jain. CAD-based computer vision: From CAD models to relational graphs. IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI), 13(2):114-132, 1991.
[24] D. A. Forsyth and J. Ponce. Computer Vision - A Modern Approach. Pearson Education International, 2003.
[25] W. T. Freeman and E. H. Adelson. The design and use of steerable filters. IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI), 13(9):891-906, 1991.
[26] C. A. Glasbey. An analysis of histogram-based thresholding algorithms. Graphical Models and Image Processing, 55(6):532-537, November 1993.
[27] E. B. Goldstein. Sensation and Perception. Brooks/Cole Publishing Co., California, 1996.
[28] R. C. Gonzalez and R. E. Woods. Digital Image Processing. Prentice Hall, 2nd edition, 2002.
[29] E. R. Hancock and J. V. Kittler. Adaptive estimation of hysteresis thresholds. In Proc. of the IEEE Computer Vision and Pattern Recognition (CVPR), pages 196-201, 1991.
[30] R. Hartley and A. Zisserman. Multiple View Geometry in Computer Vision. Cambridge University Press, 2nd edition, 2003.
[31] J. Heikkila and O. Silven. A four-step camera calibration procedure with implicit image correction. In Proc. of the IEEE Computer Vision and Pattern Recognition (CVPR), pages 1106-1112, 1997.
[32] R. V. Hogg and A. T. Craig. Introduction to Mathematical Statistics. Prentice Hall, 5th edition, 1994.
[33] D. H. Hubel. Exploration of the primary visual cortex, 1955-1978. Nature, 299:515-524, 1982.


[34] R. J. Hunsicker, J. Patten, A. Ledford, C. Ferman, et al. Automatic vision inspection and measurement system for external screw threads. Journal of Manufacturing Systems, 1994.
[35] R. W. Hunt. Measuring Colour. Ellis Horwood Ltd. Publishers, 2nd edition, 1991.
[36] B. Jähne. Digital Image Processing. Springer, 6th edition, 2005.
[37] B. Julesz. A method of coding TV signals based on edge detection. Bell System Technical Journal, 38(4):1001-1020, July 1959.
[38] R. King. Brunelleschi's Dome: How a Renaissance Genius Reinvented Architecture. Penguin Books, 2001.
[39] R. K. Lenz and R. Y. Tsai. Calibrating a cartesian robot with eye-on-hand configuration independent of eye-to-hand relationship. IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI), 11(9):916-928, September 1989.
[40] J. Linkemann. Optics recommendation guide. http://www.baslerweb.com/.
[41] E. P. Lyvers, O. R. Mitchell, M. L. Akey, and A. P. Reeves. Subpixel measurements using a moment-based edge operator. IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI), 11(12):1293-1309, December 1989.
[42] E. N. Malamas, E. G. M. Petrakis, M. E. Zervakis, L. Petit, and J. D. Legat. A survey on industrial vision systems, applications and tools. Image and Vision Computing (IVC), 21(2):171-188, February 2003.
[43] M. Malassiotis and G. Strintzis. Stereo vision system for precision dimensional inspection of 3D holes. Machine Vision and Applications, 15(2):101-113, December 2003.
[44] J. Malik and P. Perona. Preattentive texture discrimination with early vision mechanisms. Journal of the Optical Society of America, 7(5):923-932, May 1990.
[45] D. Marr and E. C. Hildreth. Theory of edge detection. Proc. Royal Soc. London, B207:187-217, 1980.
[46] N. Otsu. A threshold selection method from grey-level histograms. IEEE Transactions on Systems, Man, and Cybernetics (SMC), 9(1):62-66, January 1979.
[47] N. R. Pal and S. K. Pal. A review on image segmentation techniques. Pattern Recognition, 26(9):1277-1294, September 1993.
[48] J. R. Parker. Algorithms for Image Processing and Computer Vision. John Wiley & Sons, Inc., 1997.
[49] P. Perona. Deformable kernels for early vision. IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI), 17(5):488-499, May 1995.
[50] D. T. Pham and R. J. Alcock. Automated visual inspection of wood boards: Selection of features for defect classification by a neural network. In Proc. of the IMechE Part E: Journal of Process Mechanical Engineering, volume 213, pages 231-245. Professional Engineering Publishing, 1999.


[51] K. K. Pingle. Visual perception by a computer. In Proc. of Analogical and Inductive Inference (AII), pages 277-284, 1969.
[52] W. J. Plut and G. M. Bone. Grasping of 3-D sheet metal parts for robotic fixtureless assembly. In Proc. of the CSME Forum - Engineering Applications of Mechanics, pages 221-228, Hamilton, Ont., 1996.
[53] W. H. Press, S. A. Teukolsky, W. T. Vetterling, and B. P. Flannery. Numerical Recipes in C: The Art of Scientific Computing. Cambridge University Press, Cambridge, UK, 2nd edition, 1993.
[54] T. Pun. Entropic thresholding: A new approach. Computer Graphics and Image Processing (CGIP), 16(3):210-239, July 1981.
[55] T. W. Ridler and S. Calvard. Picture thresholding using an iterative selection method. IEEE Transactions on Systems, Man, and Cybernetics (SMC), 8(8):629-632, August 1978.
[56] P. Rockett. The accuracy of sub-pixel localisation in the Canny edge detector. In Proc. of the British Machine Vision Conference (BMVC), 1999.
[57] A. Rosenfeld and P. de la Torre. Histogram concavity analysis as an aid in threshold selection. IEEE Transactions on Systems, Man, and Cybernetics (SMC), 13(3):231-235, March 1983.
[58] S. Rusinkiewicz, O. Hall-Holt, and M. Levoy. Real-time 3D model acquisition. ACM Transactions on Graphics, 21(3):438-446, July 2002.
[59] P. K. Sahoo, S. Soltani, A. K. C. Wong, and Y. C. Chen. A survey of thresholding techniques. Computer Vision, Graphics, and Image Processing (CVGIP), 41(2):233-260, February 1988.
[60] B. Sankur and M. Sezgin. A survey over image thresholding techniques and quantitative performance evaluation. Journal of Electronic Imaging, 13(1):146-165, 2004.
[61] J. L. Sanz and D. Petkovic. Machine vision algorithms for automated inspection of thin-film disk heads. IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI), 10(6), 1988.
[62] M. Seul, L. O'Gorman, and M. J. Sammon. Practical Algorithms For Image Analysis. Cambridge University Press, 2000.
[63] M. I. Sezan. A peak detection algorithm and its application to histogram-based image data reduction. Computer Vision, Graphics, and Image Processing (CVGIP), 49(1):36-51, January 1990.
[64] S. W. Smith. The Scientist and Engineer's Guide to Digital Signal Processing. California Technical Publishing, 1997.
[65] E. Trucco and A. Verri. Introductory Techniques for 3-D Computer Vision. Prentice Hall PTR, 1998.


[66] F. Truchetet, F. Nicolier, and O. Laligant. Subpixel edge detection for dimensional control by artificial vision. Journal of Electronic Imaging, 10(1):234-239, January 2001.
[67] R. Y. Tsai. A versatile camera calibration technique for high-accuracy 3D machine vision metrology using off-the-shelf TV cameras and lenses. IEEE Journal of Robotics and Automation, 3(4):323-344, 1987.
[68] H. Voorhees and T. Poggio. Detecting textons and texture boundaries in natural images. In Proc. of the International Conference on Computer Vision (ICCV), pages 250-258, 1987.
[69] J. Weickert. Anisotropic Diffusion in Image Processing. ECMI, Teubner, Stuttgart, 1998.
[70] J. Weng, P. Cohen, and M. Herniou. Camera calibration with distortion models and accuracy evaluation. IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI), 14(10):965-980, October 1992.
[71] G. A. W. West and T. A. Clarke. A survey and examination of subpixel measurement techniques. ISPRS Int. Conf. on Close Range Photogrammetry and Machine Vision, 1395:456-463, 1990.
[72] P. C. West. High speed, real-time machine vision. Technical report, Imagenation and Automated Vision Systems, 2001.
[73] M. Young. The pinhole camera: Imaging without lenses or mirrors. The Physics Teacher, pages 648-655, December 1989.
[74] Z. Y. Zhang. A flexible new technique for camera calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI), 22(11):1330-1334, November 2000.
