06.08.2013 Views

Laboratory Exercises, C++ Programming

Laboratory Exercises, C++ Programming

Laboratory Exercises, C++ Programming

SHOW MORE
SHOW LESS

You also want an ePaper? Increase the reach of your titles

YUMPU automatically turns print PDFs into web optimized ePapers that Google loves.

LUND INSTITUTE OF TECHNOLOGY <strong>C++</strong> <strong>Programming</strong><br />

Department of Computer Science 2011/12<br />

<strong>Laboratory</strong> <strong>Exercises</strong>, <strong>C++</strong> <strong>Programming</strong><br />

General information:<br />

• The course has four compulsory laboratory exercises.<br />

• You are to work in groups of two people. You sign up for the labs at sam.cs.lth.se/Labs<br />

(see the course plan for instructions).<br />

• The labs are mostly homework. Before each lab session, you must have done all the<br />

assignments (A1, A2, . . . ) in the lab, written and tested the programs, and so on. Reasonable<br />

attempts at solutions count; the lab assistant is the judge of what’s reasonable. Contact a<br />

teacher if you have problems solving the assignments.<br />

• You will not be allowed to participate in the lab if you haven’t done all the assignments.<br />

• Smaller problems with the assignments, e.g., details that do not function correctly, can be<br />

solved with the help of the lab assistant during the lab session.<br />

• Extra labs are organized only for students who cannot attend a lab because of illness. Notify<br />

Per Holm (Per.Holm@cs.lth.se) if you fall ill, before the lab.<br />

The labs are about:<br />

1. Basic <strong>C++</strong>, compiling, linking.<br />

2. The GNU make and gdb tools, valgrind.<br />

3. <strong>C++</strong> strings and streams.<br />

4. <strong>Programming</strong> with the STL, operator overloading.<br />

Practical information:<br />

• You will use many half-written “program skeletons” during the lab. You must download the<br />

necessary files from the course homepage before you start working on the lab assignments.<br />

• The lab files are in separate directories lab[1-4] and are available in gzipped tar format.<br />

Download the tar file and unpack it like this:<br />

tar xzf lab1.tar.gz<br />

This will create a directory lab1 in the current directory.<br />

General sources of information about <strong>C++</strong> and the GNU tools:<br />

• http://www.cplusplus.com<br />

• http://www.cppreference.com<br />

• Info in Emacs (Ctrl-h i)


The GNU Compiler Collection and <strong>C++</strong> 3<br />

1 The GNU Compiler Collection and <strong>C++</strong><br />

Objective: to demonstrate basic GCC usage for <strong>C++</strong> compiling and linking.<br />

Read:<br />

• Book: basic <strong>C++</strong>, variables and types including pointers, expressions, statements, functions,<br />

simple classes, namespaces, ifstream, ofstream.<br />

• Manpages for gcc, g++, and ld.<br />

• An Introduction to GCC, by Brian Gough, http://www.network-theory.co.uk/docs/<br />

gccintro/. Treats the same material as this lab, and more. Also available in book form.<br />

1 Introduction<br />

The GNU Compiler Collection, GCC, is a suite of native compilers for C, <strong>C++</strong>, Objective C,<br />

Fortran 77, Java and Ada. Front-ends for other languages, such as Fortran 95, Pascal, Modula-2<br />

and Cobol are in experimental development stages. The list of supported target platforms is<br />

impressive — GCC has been ported to all kinds of Unix systems, as well as Microsoft Windows<br />

and special purpose embedded systems. GCC is free software, released under the GNU GPL.<br />

Each compiler in GCC contains six programs. Only two or three of the programs are language<br />

specific. For <strong>C++</strong>, the programs are:<br />

Driver: the “engine” that drives the whole set of compiler tools. For <strong>C++</strong> programs you invoke<br />

the driver with g++ (use gcc for C, g77 for Fortran, gcj for Java, etc.). The driver invokes<br />

the other programs one by one, passing the output of each program as input to the next.<br />

Preprocessor: normally called cpp. It takes a <strong>C++</strong> source file and handles preprocessor directives<br />

(#include files, #define macros, conditional compilation with #ifdef, etc.).<br />

Compiler: the <strong>C++</strong> compiler, cc1plus. This is the actual compiler that translates the input file<br />

into assembly language.<br />

Optimizer: sometimes a separate module, sometimes integrated in the compiler module. It<br />

handles optimization on a language-independent code representation.<br />

Assembler: normally called as. It translates the assembly code into machine code, which is<br />

stored in object files.<br />

Linker-Loader: called ld, collects object files into an executable file.<br />

The assembler and the linker are not actually parts of GCC. They may be proprietary operating<br />

system tools or free equivalents from the GNU binutils package.<br />

A <strong>C++</strong> source code file is recognized by its extension. We will use .cc, which is the recommended<br />

extension. Other extensions, such as .cxx, .cpp, .C, are sometimes used.<br />

In <strong>C++</strong> (and in C) declarations are collected in header files with the extension .h. To distinguish<br />

<strong>C++</strong> headers from C headers other extensions are sometimes used, such as .hh, .hxx, .hpp, .H.<br />

We will use the usual .h extension.<br />

The <strong>C++</strong> standard library is declared in headers without an extension. These are, for example,<br />

iostream, vector, string. Older <strong>C++</strong> systems use pre-standard library headers, named iostream.h,<br />

vector.h, string.h. Do not use the old headers in new programs.<br />

A <strong>C++</strong> program may not use any identifier that has not been previously declared in the same<br />

translation unit. For example, suppose that main uses a class MyClass. The program code can be<br />

organized in several ways, but the following is always used (apart from toy examples):


4 The GNU Compiler Collection and <strong>C++</strong><br />

• Define the class in a file myclass.h:<br />

#ifndef MYCLASS_H // include guard<br />

#define MYCLASS_H<br />

// #include necessary headers here<br />

class MyClass {<br />

public:<br />

MyClass(int x);<br />

// ...<br />

private:<br />

// ...<br />

};<br />

#endif<br />

• Define the member functions in a file myclass.cc:<br />

#include "myclass.h"<br />

// #include other necessary headers<br />

MyClass::MyClass(int x) { ... }<br />

// ...<br />

• Define the main function in a file test.cc (the file name is arbitrary):<br />

#include "myclass.h"<br />

int main() {<br />

MyClass m(5);<br />

// ...<br />

}<br />

The include guard is necessary to prevent multiple definitions of names. Do not write function<br />

definitions in a header (except for inline functions and template functions).<br />

The g++ command line looks like this:<br />

g++ [options] [-o outfile] infile1 [infile2 ...]<br />

All the files in a program can be compiled and linked with one command (the -o option specifies<br />

the name of the executable file; if this is omitted the executable is named a.out):<br />

g++ -o test test.cc myclass.cc<br />

To execute the program, just enter its name:<br />

./test<br />

However, it is more common that files are compiled separately and then linked:<br />

g++ -c myclass.cc<br />

g++ -c test.cc<br />

g++ -o test test.o myclass.o<br />

The -c option directs the driver to stop before the linking phase and produce an object file, named<br />

as the source file but with the extension .o instead of .cc.<br />

The driver can be interrupted also at other stages, using the -S or -E options. The -S option<br />

stops the driver after assembly code has been generated and produces an assembly code file


The GNU Compiler Collection and <strong>C++</strong> 5<br />

named file.s. The -E option stops the driver after preprocessing and prints the result from this<br />

first stage on standard output.<br />

A1. Write a “Hello, world!” program in a file hello.cc, compile and test it.<br />

A2. Generate preprocessed output in hello.ii, assembly code output in hello.s, and object code<br />

in hello.o. Study the files:<br />

• hello.ii: this is just to give you an impression of the size of the header.<br />

You will find your program at the end of the file.<br />

• hello.s: your code starts at the label main.<br />

• hello.o: since this is a binary file there is not much to look at. The command nm<br />

hello.o (study the manpage) prints the file’s symbol table (entry points in the<br />

program and the names of the functions that are called by the program and must be<br />

found by the linker).<br />

2 Options and messages<br />

There are many options to the g++ command that we didn’t mention in the previous section. In<br />

the future, we will require that your source files compile correctly using the following command<br />

line:<br />

g++ -c -pipe -O2 -Wall -W -ansi -pedantic-errors -Wmissing-braces \<br />

-Wparentheses -Wold-style-cast file.cc<br />

Short explanations (you can read more about these and other options in the gcc and g++<br />

manpages):<br />

-c just produce object code, do not link<br />

-pipe use pipes instead of temporary files for communication between<br />

tools (faster)<br />

-O2 optimize the object code on “level 2”<br />

-Wall print most warnings<br />

-W print extra warnings<br />

-ansi follow standard <strong>C++</strong> syntax rules<br />

-pedantic-errors treat “serious” warnings as errors<br />

-Wmissing-braces warn for missing braces {} in some cases<br />

-Wparentheses warn for missing parentheses () in some cases<br />

-Wold-style-cast warn for old-style casts, e.g., (int) instead of static cast<br />

Do not disregard warning messages. Even though the compiler chooses to “only” issue warnings<br />

instead of errors, your program is erroneous or at least questionable. Another option, -Werror,<br />

turns all warnings into errors, thus forcing you to correct your program and remove all warnings<br />

before object code is produced. However, this makes it impossible to test a “half-written” program<br />

so we will normally not use it.<br />

Some of the warning messages are produced by the optimizer, and will therefore not be<br />

output if the -O2 flag is not used. But you must be aware that optimization takes time, and<br />

on a slow machine you may wish to remove this flag during development to save compilation<br />

time. Some platforms define higher optimization levels, -O3, -O4, etc. You should not use these<br />

levels, unless you know very well what their implications are. Some of the optimizations that are<br />

performed at these levels are very aggressive and may result in a faster but much larger program.


6 The GNU Compiler Collection and <strong>C++</strong><br />

It is important that you become used to reading and understanding the GCC error messages.<br />

The messages are sometimes long and may be difficult to understand, especially when the errors<br />

involve the standard STL classes (or any other complex template classes).<br />

3 Introduction to make, and a list example<br />

You have to type a lot in order to compile and link <strong>C++</strong> programs — the command lines are long,<br />

and it is easy to forget an option or two. You also have to remember to recompile all files that<br />

depend on a file that you have modified.<br />

There are tools that make it easier to compile and link, “build”, programs. These may be<br />

integrated development environments (Eclipse, Visual Studio, . . . ) or separate command line<br />

tools. In Unix, make is the most important tool. We will explain make in detail in lab 2. For now,<br />

you only have to know this:<br />

• make reads a Makefile when it is invoked,<br />

• the makefile contains a description of dependencies between files (which files that must be<br />

recompiled/relinked if a file is updated),<br />

• the makefile also contains a description of how to perform the compilation/linking.<br />

As an example, we take the program from assignment A3 (see below). There, two files (list.cc and<br />

ltest.cc) must be compiled and the program linked. Instead of typing the command lines, you<br />

just enter the command make. Make reads the makefile and executes the necessary commands.<br />

The makefile looks like this:<br />

# Define the compiler options<br />

CXXFLAGS = -pipe -O2 -Wall -W -ansi -pedantic-errors<br />

CXXFLAGS += -Wmissing-braces -Wparentheses -Wold-style-cast<br />

# Define what to do when make is executed without arguments.<br />

all: ltest<br />

# The following rule means "if ltest is older than ltest.o or list.o,<br />

# then link ltest".<br />

ltest: ltest.o list.o<br />

g++ -o ltest ltest.o list.o<br />

# Define the rules to create the object files.<br />

ltest.o: ltest.cc list.h<br />

g++ -c $(CXXFLAGS) ltest.cc<br />

list.o: list.cc list.h<br />

g++ -c $(CXXFLAGS) list.cc<br />

You may add rules for other programs to the makefile. All “action lines” must start with a tab,<br />

not eight spaces.<br />

A3. The class List describes a linked list of integer numbers. 1 The numbers are stored in<br />

nodes. A node has a pointer to the next node (0 in the last node). Before the data nodes<br />

there is an empty node, a “head”. An empty list contains only the head node. The only<br />

purpose of the head node is to make it easier to delete an element at the front of the list.<br />

1 In practice, you would never write your own list class. There are several alternatives in the standard library.


The GNU Compiler Collection and <strong>C++</strong> 7<br />

namespace cpp_lab1 {<br />

/* List is a list of long integers */<br />

class List {<br />

public:<br />

/* create an empty list */<br />

List();<br />

};<br />

/* destroy this list */<br />

~List();<br />

/* insert d into this list as the first element */<br />

void insert(long d);<br />

/* remove the first element less than/equal to/greater than d,<br />

depending on the value of df. Do nothing if there is no<br />

value to remove. The public constants may be accessed with<br />

List::LESS, List::EQUAL, List::GREATER outside the class */<br />

enum DeleteFlag { LESS, EQUAL, GREATER };<br />

void remove(long d, DeleteFlag df = EQUAL);<br />

/* returns the size of the list */<br />

int size() const;<br />

/* returns true if the list is empty */<br />

bool empty() const;<br />

/* returns the value of the largest number in the list */<br />

long largest() const;<br />

/* print the contents of the list (for debugging) */<br />

void debugPrint() const;<br />

private:<br />

/* a list node */<br />

struct Node {<br />

long value; // the node value<br />

Node* next; // pointer to the next node, 0 in the last node<br />

Node(long value = 0, Node* next = 0);<br />

};<br />

};<br />

Node* head; // the pointer to the list head, which contains<br />

// a pointer to the first list node<br />

/* forbid copying of lists */<br />

List(const List&);<br />

List& operator=(const List&);<br />

Notes: Node is a struct, i.e., a class where the members are public by default. This is<br />

not dangerous, since Node is private to the class. The copy constructor and assignment<br />

operator are private, so you cannot copy lists.<br />

The file list.h contains the class definition and list.cc contains a skeleton of the class<br />

implementation. Complete the implementation in accordance with the specification.<br />

Also implement, in a file ltest.cc, a test program that checks that your List implementation<br />

is correct. Be careful to check exceptional cases, such as removing the first or the last<br />

element in the list.<br />

Type make to build the program. Note that you will get warnings about unused<br />

parameters when the implementation skeleton is compiled. These warnings will disappear


8 The GNU Compiler Collection and <strong>C++</strong><br />

when you implement the functions.<br />

Try to write code with testing and debugging in mind. The skeleton file list.cc includes<br />

. This makes it possible to introduce assertions on state invariants. The file also<br />

contains a definition of a debug macro, TRACE, which writes messages to the clog output<br />

stream. Assertions as well as TRACE statements are removed from the code when NDEBUG is<br />

defined. 2<br />

A4. Implement a class Coding with two static methods:<br />

/* For any character c, encode(c) is a character different from c */<br />

static unsigned char encode(unsigned char c);<br />

/* For any character c, decode(encode(c)) == c */<br />

static unsigned char decode(unsigned char c);<br />

Use a simple method for the coding and decoding. Then write a program, encode, that<br />

reads a text file 3 , encodes it, and writes the encoded text to another file. The command<br />

line:<br />

./encode file<br />

should run the program, encode file, and write the output to file.enc.<br />

Write another program, decode, that reads an encoded file, decodes it, and writes the<br />

decoded text to another file. The command line should be similar to that of the encode<br />

program. Add rules to the makefile for building the programs.<br />

Test your programs and check that a file that is first encoded and then decoded is<br />

identical to the original.<br />

Note: the programs will work also for files that are UTF8-encoded. In this encoding,<br />

national Swedish characters are encoded in two bytes, and the encode and decode functions<br />

will be called twice for each such character.<br />

4 Object Code Libraries<br />

A lot of software is shipped in the form of libraries, e.g., class packages. In order to use a library, a<br />

developer does not need the source code, only the object files and the headers. Object file libraries<br />

may contain thousands of files and cannot reasonably be shipped as separate files. Instead, the<br />

files are collected into library files that are directly usable by the linker.<br />

4.1 Static Libraries<br />

The simplest kind of library is a static library. The linker treats the object files in a static library<br />

in the same way as other object files, i.e., all code is linked into the executable files, which as a<br />

result may grow very large. In Unix, a static library is an archive file, lib 〈name〉.a. In addition to<br />

the object files, an archive contains an index of the symbols that are defined in the object files.<br />

A collection of object files f1.o, f2.o, f3.o, . . . , are collected into a library using the ar command:<br />

ar crv libfoo.a f1.o f2.o f3.o ...<br />

2 There are two ways to define a global macro like NDEBUG. It can either be specified in the source file:<br />

#define NDEBUG<br />

or be given on the compiler command line, using the -D option:<br />

g++ -c -DNDEBUG $(OTHER CXXFLAGS) list.cc<br />

3 Note that you cannot use while (infile >> ch) to read all characters in infile, since >> skips whitespace. Use<br />

infile.get(ch) instead. Output with outfile


The GNU Compiler Collection and <strong>C++</strong> 9<br />

(Some Unix versions require that you also create the symbol table with ranlib libfoo.a.) In<br />

order to link a program main.o with the object files obj1.o, obj2.o and with object files from the<br />

library libfoo.a, you use the following command line:<br />

g++ -o main main.o obj1.o obj2.o -L. -lfoo<br />

The linker always searches for libraries in certain system directories. The -L. option makes the<br />

linker search also in the current directory. 4 The library name (without lib and .a) is given after<br />

-l.<br />

A5. Collect the object files generated in the other assignments, except those containing main<br />

functions, in a library liblab1.a. Then link the programs (ltest, encode, decode) again,<br />

using the library.<br />

4.2 Shared Libraries<br />

Since most programs use a large amount of code from libraries, executable files can grow very<br />

large. Instead of linking library code into each executable that needs it, the code can be loaded at<br />

runtime. The object files should then be in shared libraries. When linking programs with shared<br />

libraries, the files from the library are not actually linked into the executable. Instead a “pointer”<br />

is established from the program to the library. The obvious advantage is that common code does<br />

not need to be reproduced in all programs.<br />

In Unix shared library files are named lib 〈name〉.so[.x.y.z] (.so for shared objects, .x.y.z is<br />

an optional version number). The linker uses the environment variable LD LIBRARY PATH as the<br />

search path for shared libraries. In Microsoft Windows shared libraries are known as DLL files<br />

(for dynamically loadable libraries).<br />

A6. (Advanced, optional) Create a library as in the previous exercise, but make it shared<br />

instead of static. Then link the executables, with different names, using the shared library.<br />

Make sure they run correctly. Compare the sizes of the dynamically linked executables<br />

to the statically linked (there will not be a big size difference, since the size of the library<br />

code is small).<br />

Use the command ldd (list dynamic dependencies) to inspect the linkage of your<br />

programs. Hints: shared libraries are created by the linker, not the ar archiver. Use the gcc<br />

manpage and the ld manpage (and, if needed, other manpages) to explain the following<br />

sequence of operations:<br />

g++ -fPIC -c *.cc<br />

g++ -shared -Wl,-soname,liblab1.so.1 -o liblab1.so.1.0 *.o<br />

ln -s liblab1.so.1.0 liblab1.so.1<br />

ln -s liblab1.so.1 liblab1.so<br />

You then link with -L. -llab1 as usual. The linker merely checks that all referenced<br />

symbols are in the shared library. Before you execute the program, you must define<br />

LD LIBRARY PATH so it includes the current directory (export is for zsh, setenv for tcsh):<br />

export LD_LIBRARY_PATH=.:$LD_LIBRARY_PATH<br />

setenv LD_LIBRARY_PATH .:$LD_LIBRARY_PATH<br />

4 You may have several -L and -l options on a command line. Example, where the current directory and the directory<br />

/usr/local/mylib are searched for the libraries libfoo1.a and libfoo2.a:<br />

g++ -o main main.o obj1.o obj2.o -L. -L/usr/local/mylib -lfoo1 -lfoo2


10 The GNU Compiler Collection and <strong>C++</strong><br />

5 Final Remarks<br />

If a source file includes a header that is not in the current directory, the path to the directory<br />

containing the header can be given to the compiler using the -I option (similar to -L):<br />

g++ -c -Ipath file.cc<br />

To make GCC print all it is doing during compilation (what commands it is calling and a lot of<br />

other information), the option -v (for verbose) can be added to the command line.


Tools for Practical <strong>C++</strong> Development 11<br />

2 Tools for Practical <strong>C++</strong> Development<br />

Objective: to introduce a set of tools which are often used to facilitate <strong>C++</strong> development in a Unix<br />

environment. The make tool (GNU Make) is used for compilation and linking, gdb (the GNU<br />

debugger) for controlled execution and testing, valgrind (not GNU) for finding memory-related<br />

errors.<br />

Read:<br />

• GNU Make, http://www.gnu.org/software/make/manual/<br />

• GDB User Manual, http://www.gnu.org/software/gdb/documentation/<br />

• valgrind Quick Start, http://www.valgrind.org/docs/manual/QuickStart.html<br />

• Manpages for the tools used in the lab.<br />

The manuals have introductory tutorials that are recommended as starting points. The manuals<br />

are very well written and you should consult them if you want to learn more than what is treated<br />

during the lab.<br />

1 GNU Make<br />

1.1 Introduction<br />

As you saw in lab 1, make is a good tool — it sees to it that only files that have been modified<br />

are compiled (and files that depend on modified files). To do this, it compares the modification<br />

times 5 of source files and object files.<br />

We will use the GNU version of make. There are other make programs, which are almost but<br />

not wholly compatible to GNU make. In a Linux or Darwin system, the command make usually<br />

refers to GNU make. You can check what variant you have with the command make --version.<br />

It should give output like the following:<br />

make --version<br />

GNU Make 3.81<br />

Copyright (C) 2006 Free Software Foundation, Inc.<br />

This is free software; see the source for copying conditions.<br />

...<br />

If make turns out to be something else than GNU make, try the command gmake instead of make;<br />

this should refer to GNU make.<br />

Make uses a description of the source project in order to figure out how to generate executables.<br />

The description is written in a file, usually called Makefile. When make is executed, it reads<br />

the makefile and executes the compilation and linking (and maybe other) commands that are<br />

necessary to build the project.<br />

The most important construct in a makefile is a rule. A rule specifies how a file (the target),<br />

which is to be generated, depends on other files (the prerequisites). For instance, the following<br />

rule specifies that the file ltest.o depends on the files ltest.cc and ltest.h:<br />

5 In a network environment with a distributed file system, there may occur problems with unsynchronized clocks<br />

between a client and the file server. Make will report errors like:<br />

make: ***Warning: File "foo" has modification time in the future.<br />

make: warning: Clock skew detected. Your build may be incomplete.<br />

Unfortunately, there is not much you can do about this problem, if you don’t have root privileges on the computer.<br />

Inform the system administrators about the problem and switch to another computer if possible. If no other computer is<br />

available, temporarily work on a local file system where you have write access, such as /tmp.


12 Tools for Practical <strong>C++</strong> Development<br />

ltest.o: ltest.cc list.h<br />

On the line following the dependency specification you write a shell command that generates the<br />

target. This is called an explicit rule. Example (with a short command line without options):<br />

ltest.o: ltest.cc list.h<br />

g++ -c ltest.cc<br />

Make has a very unusual syntax requirement: the shell command must be preceded by a tab<br />

character; spaces are not accepted.<br />

An implicit rule does not give an explicit command to generate the target. Instead, it relies<br />

on make’s large collection of default rules. For instance, make knows how to generate .o-files<br />

from most kinds of source files, for instance that g++ is used to generate .o-files from .cc-files.<br />

The actual implicit rule is:<br />

$(CXX) $(CPPFLAGS) $(CXXFLAGS) -c -o $@ $<<br />

CXX, CPPFLAGS, and CXXFLAGS are variables that the user can define. The syntax $(VARIABLE) is<br />

used to evaluate a variable, returning its value. CXX is the name of the <strong>C++</strong> compiler, CPPFLAGS<br />

are the options to the preprocessor, CXXFLAGS are the options to the compiler. $@ expands to the<br />

name of the target, $< expands to the first of the prerequisites.<br />

We are now ready to write our first makefile, which will build the ltest program from lab 1.<br />

The value g++ for CXX is default, thus that definition is optional. CPPFLAGS and CXXFLAGS are<br />

empty by default. CXXFLAGS is almost always redefined.<br />

# Compiler and compiler options:<br />

CXX = g++<br />

CXXFLAGS = -pipe -O2 -Wall -W -ansi -pedantic-errors<br />

CXXFLAGS += -Wmissing-braces -Wparentheses -Wold-style-cast<br />

# Linking:<br />

ltest: ltest.o list.o<br />

$(CXX) -o $@ $^<br />

# Dependencies, the implicit rule .cc => .o is used:<br />

ltest.o: ltest.cc list.h<br />

list.o: list.cc list.h<br />

$^ expands to the complete list of prerequisites. We will make improvements to this makefile<br />

later.<br />

Suppose that none of the files ltest, ltest.o, list.o exists. Then, the following commands are<br />

executed when you run make (the long command lines have been wrapped):<br />

make ltest<br />

g++ -pipe -O2 -Wall -W -ansi -pedantic-errors -Wmissing-braces -Wparentheses<br />

-Wold-style-cast -c -o ltest.o ltest.cc<br />

g++ -pipe -O2 -Wall -W -ansi -pedantic-errors -Wmissing-braces -Wparentheses<br />

-Wold-style-cast -c -o list.o list.cc<br />

g++ -o ltest ltest.o list.o<br />

If an error occurs in one of the commands, make will be aborted. If there, for instance, is an error<br />

in list.cc, the compilation of that file will be aborted and the program will not be linked. When<br />

you have corrected the error and run make again, it will discover that ltest.o is up to date and<br />

only remake list.o and ltest.<br />

If you run make without a target name as parameter, make builds the first target it finds in the


Tools for Practical <strong>C++</strong> Development 13<br />

makefile. Since ltest is the first target, make ltest and make are equivalent. By convention, the<br />

first target should be named all. Therefore, the first rules in the makefile should look like this:<br />

# Default target ‘all’, make everything<br />

all: ltest<br />

# Linking:<br />

ltest: ltest.o list.o<br />

$(CXX) -o $@ $^<br />

Makefiles can also contain directives that control the behavior of make. For example, a makefile<br />

can include other files with the include directive.<br />

A1. The file Makefile contains the makefile that has been used as an example. Experiment with<br />

make: copy the necessary source files (list.h, list.cc and ltest.cc) to the lab2 directory and<br />

run make. Run make again. Delete the executable program and run make again. Change<br />

one or more of the source files (it is sufficient to touch them) and see what happens. Run<br />

make ltest.o. Run make notarget. Read the manpage and try other options. Etc., etc.<br />

1.2 Phony targets<br />

Makefiles may contain targets that do not actually correspond to files. The all target in the<br />

previous section is an example. Now, suppose that a file all is created in the directory that<br />

contains the makefile. If that file is newer than the ltest file, a make invocation will do nothing<br />

but say make: Nothing to be done for ‘all’., which is not the desired behavior. The solution<br />

is to specify the target all as a phony target, like this:<br />

.PHONY: all<br />

Another common phony target is clean. Its purpose is to remove intermediate files, such as object<br />

files, and it has no prerequisites. It typically looks like this:<br />

.PHONY: clean<br />

clean:<br />

$(RM) $(OBJS)<br />

The variable RM defaults to rm -f, where the option -f tells rm not to warn about non-existent<br />

files. The return value from the command is always 0, so the clean target will always succeed.<br />

A cleaning rule that removes also executable files and libraries could be called, e.g., cleaner or<br />

realclean.<br />

All Unix source distributions contain one or more makefiles. A makefile should contain<br />

a phony target install, with all as its prerequisite. The make install command copies the<br />

programs and libraries built by the all rule to a system-wide location where they can be reached<br />

by other users. This location, which is called the installation prefix, is usually the directory<br />

/usr/local. Installation uses the GNU command install (or plain cp) to copy programs to<br />

$(PREFIX)/bin and libraries to $(PREFIX)/lib. Since only root has permission to write in<br />

/usr/local, you will have to use one of your own directories as prefix.<br />

A2. Create the bin and lib directories. Add all, clean and install targets to your makefile and<br />

specify suitable phony targets. Also provide an uninstall target that removes the files that<br />

were installed by the install target.


14 Tools for Practical <strong>C++</strong> Development<br />

1.3 Rules for Linking<br />

In our example makefile, we used an explicit rule for linking. Actually, there is an implicit rule<br />

for linking, which looks (after some variable expansions) like this:<br />

$(CC) $(LDFLAGS) $^ $(LOADLIBES) $(LDLIBS) -o $@<br />

LDFLAGS are options to the linker, such as -Ldirectory. LOADLIBES and LDLIBS 6 are variables<br />

intended to contain libraries, such as -llab1. So this is a good rule, except for one thing: it<br />

uses $(CC) to link, and CC is by default gcc, not g++. But if you change the definition of CC, the<br />

implicit rule works also for <strong>C++</strong>:<br />

# Define the linker<br />

CC = g++<br />

1.4 Generating Prerequisites Automatically<br />

While you’re working with a project the prerequisites are often changed. New #include directives<br />

are added and others are removed. In order for make to have correct information of the<br />

dependencies, the makefile must be modified accordingly. The necessary modifications can be<br />

performed automatically by make itself.<br />

A3. Read the make manual, section 4.14 “Generating Prerequisites Automatically”, and implement<br />

such functionality in your makefile. Hint: Copy the %.d rule (it is not necessary that<br />

you understand everything in it) to your makefile and make suitable modifications. Do not<br />

forget to include the *.d files.<br />

The first time you run make you will get a warning about .d files that don’t exist. This<br />

is normal and not an error. Look at the .d files that are created and see that they contain<br />

the correct dependencies.<br />

One useful make facility that we haven’t mentioned earlier is wildcards: as an example,<br />

you can define the variable SRC as a list of all .cc files with SRC = $(wildcard *.cc). And<br />

from this you can get all .o files with OBJ = $(SRC:.cc=.o).<br />

From now on, you must update the makefile when you add new programs, so that you<br />

can always issue the make command in the build directory to build everything. When you<br />

add a new program, you should only have to add a rule that defines the .o-files that are<br />

necessary to link the program, and maybe add the program name to a list of executables.<br />

This is important — it will save you many hours of command typing.<br />

(Optional) Collect all object files not containing main functions into a library, and link<br />

executables against that library. If you know how to create shared libraries, use a shared<br />

library libcxx.so, otherwise create a static archive libcxx.a.<br />

1.5 Writing a Good Makefile<br />

A makefile should always be written to run in sh (Bourne Shell), not in csh. Do not use special<br />

features from, e.g., bash, zsh, or ksh. The tools used in explicit rules should be accessed through<br />

make variables, so that the user can substitute alternatives to the defaults.<br />

A makefile should also be written so that it doesn’t have to be located in the same directory<br />

as the source code. You can achieve this by defining a SRCDIR variable and make all rules refer<br />

to the source code from that prefix. Even better is to use the variable VPATH, which specifies a<br />

search path for source files. Vpath is described in section 4.5 of the make manual.<br />

6 There doesn’t seem to be any difference between LOADLIBES and LDLIBS — they always appear together and are<br />

concatenated. Use LDLIBS.


Tools for Practical <strong>C++</strong> Development 15<br />

A4. (Optional, VPATH is difficult) Create a new directory, src, for <strong>C++</strong> lab source code and copy<br />

the source files from lab 1 into it. In a parallel directory build, install the Makefile, and<br />

modify it so that you can build the programs (and possibly libraries), standing in the build<br />

directory. The all target, which should be default, should build your ltest, encode, and<br />

decode programs. No command should write to the src directory.<br />

2 Debugging<br />

Consider the following program (reciprocal.cc):<br />

#include <br />

#include // for atoi, see below<br />

double reciprocal(int i) {<br />

return 1.0 / i;<br />

}<br />

int main(int argc, char** argv) {<br />

int i = std::atoi(argv[1]); // atoi: "ascii-to-integer"<br />

double r = reciprocal(i);<br />

std::cout


16 Tools for Practical <strong>C++</strong> Development<br />

(gdb) is a command prompt.<br />

The first step is to run the program inside the debugger. Enter the command run and the program<br />

arguments. First run the program without arguments, like this:<br />

(gdb) run<br />

Starting program: /h/dd/c/.../reciprocal<br />

Program received signal SIGSEGV, Segmentation fault.<br />

*__GI_____strtol_l_internal (nptr=0x0, endptr=0x0, base=10, group=0, loc=0xb7f1c380)<br />

at strtol_l.c:298<br />

...<br />

The error occurred in the strtol l internal function, which is an internal function that atoi<br />

calls. Inspect the call stack with the where command:<br />

(gdb) where<br />

#0 *__GI_____strtol_l_internal (nptr=0x0, endptr=0x0, base=10, group=0, loc=0xb7f1c380)<br />

at strtol_l.c:298<br />

#1 0xb7e07fc6 in *__GI_strtol (nptr=0x0, endptr=0x0, base=10) at strtol.c:110<br />

#2 0xb7e051b1 in atoi (nptr=0x0) at atoi.c:28<br />

#3 0x080487b0 in main (argc=1, argv=0xbfa1be44) at reciprocal.cc:9<br />

You see that the atoi function was called from main. Use the up command (three times) to go to<br />

the main function in the call stack:<br />

...<br />

(gdb) up<br />

#3 0x080487b0 in main (argc=1, argv=0xbfa1be44) at reciprocal.cc:9<br />

9 int i = std::atoi(argv[1]); // atoi: "ascii-to-integer"<br />

Note that GDB finds the source code for main, and it shows the line where the function call<br />

occurred.<br />

You can examine the value of a variable using the print command:<br />

(gdb) print argv[1]<br />

$1 = 0x0<br />

This confirms a suspicion that the problem is indeed a 0 pointer passed to atoi. Set a breakpoint<br />

on the first line of main with the break command:<br />

(gdb) break main<br />

Breakpoint 1 at 0x80487a0: file reciprocal.cc, line 9.<br />

Now run the program with an argument:<br />

(gdb) run 5<br />

The program being debugged has been started already.<br />

Start it from the beginning? (y or n) y<br />

Starting program: /h/dd/c/.../reciprocal 5<br />

Breakpoint 1, main (argc=2, argv=0xbff7aae4) at reciprocal.cc:9<br />

9 int i = std::atoi(argv[1]); // atoi: "ascii-to-integer"<br />

The debugger has stopped at the breakpoint. Step over the call to atoi using the next command:


Tools for Practical <strong>C++</strong> Development 17<br />

(gdb) next<br />

10 double r = reciprocal(i);<br />

If you want to see what is going on inside reciprocal, step into the function with step:<br />

(gdb) step<br />

reciprocal (i=5) at reciprocal.cc:5<br />

5 return 1.0 / i;<br />

The next statement to be executed is the return statement. Check the value of i and then issue<br />

the continue command to execute to the next breakpoint (which doesn’t exist, so the program<br />

terminates):<br />

(gdb) print i<br />

$2 = 5<br />

(gdb) continue<br />

Continuing.<br />

argv[1] = 5, 1/argv[1] = 0.2<br />

Program exited normally.<br />

A5. Try the debugger by working through the example. Experiment with the commands (the<br />

table below gives short explanations of some useful commands). Commands may be<br />

abbreviated and you may use tab for command completion. Ctrl-a takes you to the<br />

beginning of the command line, Ctrl-e to the end, and so on. Use the help facility and the<br />

manual if you want to learn more.<br />

help [command] Get help about gdb commands<br />

run [args...] Run the program with arguments specified.<br />

continue Continue execution.<br />

next Step to the next line over called functions.<br />

step Step to the next line into called functions.<br />

where Print the call stack.<br />

up Go up to caller.<br />

down Go down to callee.<br />

list [nbr] List 10 lines around the current line or around line nbr (the<br />

following lines if repeated).<br />

break func Set a breakpoint on the first line of a function func.<br />

break nbr Set a breakpoint at line nbr in the current file.<br />

catch [arg] Break on events, for example exceptions that are thrown or caught<br />

(catch throw, catch catch).<br />

info [arg] Generic command for showing things about the program, for<br />

example information about breakpoints (info break).<br />

delete [nbr] Delete all breakpoints or breakpoint nbr.<br />

print expr Print the value of the expression expr.<br />

display var Print the value of variable var every time the program stops.<br />

undisplay [nbr] Delete all displays or display nbr.<br />

set var = expr Assign the value of expr to var.<br />

kill Terminate the debugger process.<br />

watch var Set a watchpoint, i.e., watch all modifications of a variable. This<br />

can be very slow but can be the best solution to find bugs when<br />

”random” pointers are used to write in the wrong memory location.


18 Tools for Practical <strong>C++</strong> Development<br />

whatis var|func Print the type of a variable or function.<br />

source file Read and execute the commands in file.<br />

make Run the make command.<br />

The debugging information that is written to the executable file can be removed using the<br />

strip program from binutils. See the manpage for more information.<br />

You may wish to disable optimization (remove the -O2 option) while you are debugging<br />

a program. GCC allows you to use a debugger together with optimization but the<br />

optimization may produce surprising results: some variables that you have defined may<br />

not exist in the object code, flow of control may move where you did not expect it, some<br />

statements may not be executed because they compute constant results, etc.<br />

To make it easier to switch between generating debugging versions and production<br />

versions of your programs, you can add something like this to your makefile:<br />

# insert this line somewhere at the top of the file<br />

DEBUG = true // or false<br />

# insert the following lines after the definitions of CXXFLAGS and LDFLAGS<br />

ifeq ($(DEBUG), true)<br />

CXXFLAGS += -ggdb<br />

LDFLAGS += -ggdb<br />

else<br />

CXXFLAGS += -O2<br />

endif<br />

You can specify the value of DEBUG on the command line (the -e options means that the<br />

command line value will override any definition in the makefile):<br />

make -e DEBUG=true<br />

3 Finding Memory-Related Errors<br />

<strong>C++</strong> is less programmer-friendly than Java. In Java, many common errors are caught by the<br />

compiler (use of uninitialized variables) or by the runtime system (addressing outside array<br />

bounds, dereferencing null pointers, etc.). In <strong>C++</strong>, errors of this kind are not caught; instead<br />

they result in erroneous results or traps during program execution. Furthermore, you get no<br />

information whatsoever about where in the program the error occurred. Since allocation of<br />

dynamic memory in <strong>C++</strong> is manual, you also have a whole new class of errors (double delete,<br />

memory leaks).<br />

Valgrind (http://www.valgrind.org) is a tool (available only under Linux and Mac OS X)<br />

that helps you to find memory-related errors at the precise locations at which they occur. It does<br />

this by emulating an x86 processor and supplying each data bit with information about the usage<br />

of the bit. This results in slower program execution, but this is more than compensated for by the<br />

reduced time spent in hunting bugs.<br />

It is very easy to use valgrind. To find the error in the reciprocal program from the previous<br />

section, you just do this (compile and link with -ggdb and without optimization):<br />

valgrind --leak-check=yes ./reciprocal<br />

==2938== Memcheck, a memory error detector<br />

==2938== Copyright (C) 2002-2010, and GNU GPL’d, by Julian Seward et al.<br />

==2938== ...<br />

==2938== Invalid read of size 1<br />

==2938== at 0x41A602F: ____strtol_l_internal (strtol_l.c:298)


Tools for Practical <strong>C++</strong> Development 19<br />

==2938== by 0x41A5DE6: strtol (strtol.c:110)<br />

==2938== by 0x41A3160: atoi (atoi.c:28)<br />

==2938== by 0x80486B9: main (reciprocal.cc:9)<br />

==2938== Address 0x0 is not stack’d, malloc’d or (recently) free’d<br />

==2938== ...<br />

==2938==<br />

==2938== HEAP SUMMARY:<br />

==2938== in use at exit: 0 bytes in 0 blocks<br />

==2938== total heap usage: 0 allocs, 0 frees, 0 bytes allocated<br />

==2938==<br />

==2938== All heap blocks were freed -- no leaks are possible<br />

==2938==<br />

==2938== For counts of detected and suppressed errors, rerun with: -v<br />

==2938== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 17 from 6)<br />

[1] 2938 segmentation fault valgrind --leak-check=yes ./reciprocal<br />

When the error occurs, you get an error message and a stack trace (and a lot of other information).<br />

In this case, it is almost the same information as when you used gdb, but this is only because the<br />

error resulted in a segmentation violation. But most errors don’t result in immediate traps:<br />

void zero(int* v, size_t n) {<br />

for (size_t i = 0; i < n; ++i)<br />

v[i] = 0;<br />

}<br />

int main() {<br />

int* v = new int[5];<br />

zero(v, 10); // oops, should have been 5<br />

// ... this error may result in a trap now or much later, or wrong results<br />

}<br />

A6. The file vgtest.cc contains a test program with two functions with common programming<br />

errors. Write more functions to check other common errors. Note that valgrind cannot find<br />

addressing errors in stack-allocated arrays (like int v[10]; v[10] = 0). Run the program<br />

under valgrind, try to interpret the output.<br />

Advice: always use valgrind to check your programs! Note that valgrind may give<br />

“false” error reports on <strong>C++</strong> programs that use STL or <strong>C++</strong> strings. See the valgrind FAQ,<br />

http://valgrind.org/docs/manual/faq.html#faq.reports<br />

4 Makefiles and Topological Sorting<br />

4.1 Introduction<br />

Targets and prerequisites of a makefile form a graph where the vertices are files and an arc<br />

between two vertices corresponds to a dependency. When make is invoked, it computes an order<br />

in which the files are to be compiled so the dependencies are satisfied. This corresponds to a<br />

topological sort of the graph.<br />

Consider a simplified makefile containing only implicit rules, i.e., lines reading target:<br />

prerequisites. You are to implement a <strong>C++</strong> program topsort that reads such a file, whose name<br />

is given on the command line, and outputs the targets to standard output in an order where<br />

the dependencies are satisfied. If the makefile contains syntax errors or cyclic dependencies, the<br />

program should exit with an error message.<br />

This is not a trivial task. The solution will be developed stepwise in the following sections.


20 Tools for Practical <strong>C++</strong> Development<br />

4.2 Graph Representation<br />

The first task is to choose a representation of the dependency graph. We will use a representation<br />

where each vertex has a list of its neighbors. This list is called an adjacency list.<br />

Consider the following makefile:<br />

D: A<br />

E: B D<br />

B: A C<br />

The dependencies in the makefile form the following graph (the graph to the left, the vertices<br />

with their adjacency lists to the right):<br />

A D<br />

B<br />

C E<br />

A graph has a list of Vertex objects (A, B, C, D, E, in the figure). A vertex is represented by an object<br />

of class Vertex, which has the attributes name (the label on the vertex) and adj (the adjacency<br />

list). In the figure, the adjacency list of vertex A contains B and D.<br />

The class definitions follow. First, class Vertex (the attribute color is used when traversing<br />

the graph, see section 4.3):<br />

class Vertex {<br />

friend class VertexList; // give VertexList access to private members<br />

private:<br />

/* create a vertex with name nm */<br />

Vertex(const std::string& nm);<br />

};<br />

std::string name; // name of the vertex<br />

VertexList adj; // list of adjacent vertices<br />

enum Color { WHITE, GRAY, BLACK };<br />

Color color; // used in the traversal algorithm<br />

In class VertexList, the adjacency list is implemented as a vector of pointers to other vertices<br />

(vptrs). The functions top sort and dfs visit are described in section 4.3.<br />

struct cyclic {}; // exception type, the graph is cyclic<br />

class VertexList {<br />

public:<br />

/* create an empty vertex list, destroy the list */<br />

VertexList();<br />

~VertexList();<br />

/* insert a vertex with the label name in the list */<br />

void add_vertex(const std::string& name);<br />

A<br />

B<br />

C<br />

D<br />

E


Tools for Practical <strong>C++</strong> Development 21<br />

/* insert an arc from the vertex ‘from’ to the vertex ‘to’ */<br />

void add_arc(const std::string& from, const std::string& to);<br />

/* return the vertex names in reverse topological order */<br />

std::stack top_sort() throw(cyclic);<br />

/* print the vertex list (for debugging) */<br />

void debugPrint() const;<br />

private:<br />

void insert(Vertex* v); // insert the vertex pointer v in vptrs, if<br />

// it’s not already there<br />

Vertex* find(const std::string& name) const; // return a pointer to the<br />

// vertex named name, 0 if not found<br />

void dfs_visit(Vertex* v, std::stack& result) throw(cyclic);<br />

// visit v in the traversal algorithm<br />

};<br />

std::vector vptrs; // vector of pointers to vertices<br />

VertexList(const VertexList&); // forbid copying<br />

VertexList& operator=(const VertexList&);<br />

Finally, the class Graph is merely a typedef for a VertexList:<br />

Notes:<br />

typedef VertexList Graph;<br />

• add vertex in VertexList should create a new Vertex object and add it to the vertex list.<br />

It should do nothing if a vertex with that name already is present in the list.<br />

• add arc should insert to in from’s adjacency list. It should start with inserting from and<br />

to in the vertex list — in that way arcs and vertices may be added in arbitrary order.<br />

• find is used in add vertex and in insert.<br />

• The VertexList destructor is quite difficult to write correctly, since you have to be careful<br />

so no Vertex object is deleted more than once. It is easiest to start by removing pointers<br />

from the adjacency lists so the graph isn’t circular, before deleting the objects. — You may<br />

treat the destructor as optional, if you wish.<br />

• Advice: draw your own picture of a graph structure, with all objects, vptrs and pointers.<br />

A7. The classes are in the files vertex.h, vertex.cc, vertexlist.h, vertexlist.cc, graph.h. Implement<br />

the classes, except top sort and dfs visit, and test using the program graph test.cc.<br />

Comment out the lines in the function testGraph that check the top sort algorithm.<br />

4.3 Topological Sort<br />

A topological ordering of the vertices of a graph is essentially obtained by a depth-first search<br />

from all vertices. (There are other algorithms for topological sorting. If you prefer another<br />

algorithm you are free to use that one instead.) The topological ordering is not necessarily unique.<br />

For instance, the vertices of the graph in the preceding section may be ordered like this:<br />

A C B D E or<br />

C A D B E or . . .<br />

The algorithm is described below, in pseudo-code. The algorithm produces the vertices in reverse<br />

topological order, which is why we use a stack for the result.


22 Tools for Practical <strong>C++</strong> Development<br />

The algorithm should detect cycles in the graph. If that happens, your function should throw<br />

a cyclic exception. The cycle detection is not described in the algorithm; you must add that<br />

yourself.<br />

Topological-Sort(Graph G) {<br />

/* 1. initialize the graph and the result stack */<br />

for each vertex v in G<br />

v.color = WHITE;<br />

Stack result;<br />

/* 2. search from all unvisited vertices */<br />

for each vertex v in G<br />

if (v.color == WHITE)<br />

DFS-visit(v, result);<br />

}<br />

DFS-visit(Vertex v, Stack result) {<br />

/* 1. white vertex v is just discovered, color it gray */<br />

v.color = GRAY;<br />

/* 2. recursively visit all unvisited adjacent nodes */<br />

for each vertex w in v.adj<br />

if (w.color == WHITE)<br />

DFS-visit(w);<br />

/* 3. finish vertex v (color it black and output it) */<br />

v.color = BLACK;<br />

result.push(v.name);<br />

}<br />

A8. Implement top sort and dfs visit and test the program.<br />

4.4 Parsing the Makefile<br />

A9. Finally, you are ready to solve the last part of the assignment: to read a makefile, build a<br />

graph, sort it topologically and print the results. The input to the program is a makefile<br />

containing lines of the type:<br />

target: prereq1 prereq2 ...<br />

The names of the target and the prerequisites are strings. The list of prerequisites may<br />

be empty. The program should perform reasonable syntax checks (the most important is<br />

probably that a line must contain a target).<br />

The file topsort.cc contains a test program. Implement the function buildGraph and<br />

test. The files test[1-5].txt contain test cases (the same that were used in the graph test<br />

program in assignment A8), but you are encouraged to also test other cases.


Strings and Streams 23<br />

3 Strings and Streams<br />

Objective: to practice using the string class and the stream classes.<br />

Read:<br />

• Book: strings, streams, function templates.<br />

1 Class string<br />

1.1 Introduction<br />

In C, a string is a null-terminated array of characters. This representation is the cause of many<br />

errors: overwriting array bounds, trying to access arrays through uninitialized or incorrect<br />

pointers, and leaving dangling pointers after an array has been deallocated. The <br />

library contains some useful operations on C-strings, such as copying and comparing strings.<br />

<strong>C++</strong> strings hide the physical representation of the sequence of characters. A <strong>C++</strong> string<br />

object knows its starting location in memory, its contents (the characters), its length, and the<br />

length to which it can grow before it must resize its internal data buffer (i.e., its capacity).<br />

The exact implementation of the string class is not defined by the <strong>C++</strong> standard. The<br />

specification in the standard is intended to be flexible enough to allow different implementations,<br />

yet guarantee predictable behavior.<br />

The string identifier is not actually a class, but a typedef for a specialized template:<br />

typedef std::basic_string string;<br />

So string is a string containing characters of the type char. There is another standard string<br />

specialization, wstring, which describes strings of characters of the type wchar t. wchar t is the<br />

type representing “wide characters”. The standard does not specify the size of these characters;<br />

usually it’s 16 or 32 bits (32 in gcc). In this lab we will ignore all “internationalization” problems<br />

and assume that all characters fit in one byte. Strings in <strong>C++</strong> are not limited to the classes string<br />

and wstring; you may specify strings of “characters” of any type, such as int. However, this is<br />

discouraged: there are other sequence containers that are better suited for non-char types.<br />

Since we’re only interested in the string specialization of basic string, you may wonder why<br />

we mention basic string at all? There are two reasons: First, std::basic string is all the<br />

compiler knows about in all but the first few translation passes. Hence, diagnostic messages only<br />

mention std::basic string. Secondly, it is important for the general understanding of the<br />

ideas behind the string method implementations to have a grasp of the basics of a <strong>C++</strong> concept<br />

called character traits, i.e., character “characteristics”. For now, we will just show parts of the<br />

definition of basic string:<br />

/* Standard header */<br />

namespace std {<br />

template<br />

class basic_string {<br />

// types:<br />

typedef traits_t traits_type;<br />

typedef typename traits_t::char_type char_type;<br />

typedef unsigned long size_type;<br />

static const size_type npos = -1;<br />

// much more<br />

};<br />

}


24 Strings and Streams<br />

char type is the type of the characters in the string. size type is a type used for indexing in the<br />

string. npos (“no position”) indicates a position beyond the end of the string; it may be returned<br />

by functions that search for characters in a string. Since size type is unsigned, npos represents<br />

positive infinity.<br />

1.2 Operations on Strings<br />

We now show some of the most important operations on strings (actually basic string’s). The<br />

operations are both member functions and global helper functions. We start with the member<br />

functions (for formatting reasons, some parameters of type size type have been specified as<br />

int):<br />

class string {<br />

public:<br />

/*** construction ***/<br />

string(); // create an empty string<br />

string(const string& s); // create a copy of s<br />

string(const char* cs); // create a string with the characters from cs<br />

string(const string& s, int start, int n); // create a string with n<br />

// characters from s, starting at position start<br />

string(int n, char ch); // create a string with n copies of ch<br />

}<br />

/*** information ***/<br />

int size(); // number of characters in the string<br />

int capacity(); // the string can hold this many characters<br />

// before resizing is necessary<br />

void reserve(int n); // set the capacity to n, resize if necessary<br />

/*** character access ***/<br />

const char& operator[](size_type pos) const;<br />

char& operator[](size_type pos);<br />

/*** substrings */<br />

string substr(size_type start, int n); // the substring starting at<br />

// position start, containing n characters<br />

/*** finding things ***/<br />

// see below<br />

/*** inserting, replacing, and removing ***/<br />

void insert(size_type pos, const string& s); // insert s at position pos<br />

void append(const string& s); // append s at the end<br />

void replace(size_type start, int n, const string& s); // replace n<br />

// characters starting at pos with s<br />

void erase(size_type start = 0, int n = npos); // remove n characters<br />

// starting at pos<br />

/*** assignment and concatenation ***/<br />

string& operator=(const string& s);<br />

string& operator=(const char* cs);<br />

string& operator=(char ch);<br />

string& operator+=(const string& s); // also for const char* and char<br />

/*** access to C-string representation ***/<br />

const char* c_str();<br />

• Note that there is no constructor string(char). Use string(1, char) instead.


Strings and Streams 25<br />

• The subscript functions operator[] do not check for a valid index. There are similar at()<br />

functions that do check, and that throw out of range if the index is not valid.<br />

• The substr() member function takes a starting position as its first argument and the number<br />

of characters as the second argument. This is different from the substring() method in<br />

java.lang.String, where the second argument is the end position of the substring.<br />

• There are overloads of most of the functions. You can use C-strings or characters as<br />

parameters instead of strings.<br />

• There is a bewildering variety of member functions for finding strings, C-strings or characters.<br />

They all return npos if the search fails. The functions have the following signature (the<br />

string parameter may also be a C-string or a character):<br />

size_type FIND_VARIANT(const string& s, size_type pos = 0) const;<br />

s is the string to search for, pos is the starting position. (The default value for pos is npos,<br />

not 0, in the functions that search backwards).<br />

The “find variants” are find (find a string, forwards), rfind (find a string, backwards),<br />

find first of and find last of (find one of the characters in a string, forwards or backwards),<br />

find first not of and find last not of (find a character that is not one of the<br />

characters in a string, forwards or backwards).<br />

Example:<br />

void f() {<br />

string s = "accdcde";<br />

int i1 = s.find("cd"); // i1 = 2 (s[2]==’c’ && s[3]==’d’)<br />

int i2 = s.rfind("cd"); // i2 = 4 (s[4]==’c’ && s[5]==’d’)<br />

int i3 = s.find_first_of("cd"); // i3 = 1 (s[1]==’c’)<br />

int i4 = s.find_last_of("cd"); // i4 = 5 (s[5]==’d’)<br />

int i5 = s.find_first_not_of("cd"); // i5 = 0 (s[0]!=’c’ && s[0]!=’d’)<br />

int i6 = s.find_last_not_of("cd"); // i6 = 6 (s[6]!=’c’ && s[6]!=’d’)<br />

}<br />

The global overloaded operator functions are for concatenation (operator+) and for comparison<br />

(operator==, operator


26 Strings and Streams<br />

A2. The Sieve of Eratosthenes is an ancient method for finding all prime numbers less than<br />

some fixed number M. It starts by enumerating all numbers in the interval [0, M] and<br />

assuming they are all primes. The first two numbers, 0 and 1 are marked, as they are not<br />

primes. The algorithm then starts with the number 2, marks all subsequent multiples of<br />

2 as composites, and repeats the process for the next prime candidate. When the initial<br />

sequence is exhausted, the numbers not marked as composites are the primes in [0, M].<br />

In this assignment you shall use a string for the enumeration. Initialize a string with<br />

appropriate length to PPPPP...PPP. The characters at positions that are not prime numbers<br />

should be changed to C. Write a test program that prints the prime numbers between 1<br />

and 200 and also the largest prime that is less than 100,000.<br />

Example with the numbers 0–27:<br />

1 2<br />

0123456789012345678901234567<br />

Initial: CCPPPPPPPPPPPPPPPPPPPPPPPPPP<br />

Find 2, mark 4,6,8,...: CCPPCPCPCPCPCPCPCPCPCPCPCPCP<br />

Find 3, mark 6,9,12,...: CCPPCPCPCCCPCPCCCPCPCCCPCPCC<br />

Find 5, mark 10,15,20,25: CCPPCPCPCCCPCPCCCPCPCCCPCCCC<br />

Find 7, mark 14,21: CCPPCPCPCCCPCPCCCPCPCCCPCCCC<br />

...<br />

2 The iostream Library<br />

2.1 Input/Output of User-Defined Objects<br />

We have already used most of the stream classes: istream, ifstream, and istringstream for<br />

reading, and ostream, ofstream, and ostringstream for writing. There are also iostream’s that<br />

allow both reading and writing. The stream classes are organized in the following (simplified)<br />

generalization hierarchy:<br />

ios_base<br />

ios<br />

istream ostream<br />

iostream<br />

ifstream istringstream fstream stringstream ofstream ostringstream<br />

The classes ios base and ios contain, among other things, information about the stream state.<br />

There are, for example, functions bool good() (the state is ok) and bool eof() (end-of-file has<br />

been reached). There is also a conversion operator void*() that returns nonzero if the state is<br />

good, and a bool operator!() that returns nonzero if the state is not good. We have used these<br />

operators with input files, writing for example while (infile >> ch) or if (! infile).<br />

Now, we are interested in reading objects of a class A from an istream using operator>> and<br />

in writing objects to an ostream using operator(std::istream& is, A& aobj);<br />

std::ostream& operator


Strings and Streams 27<br />

Note:<br />

• Since the parameters are of type istream or ostream, the operator functions work for both<br />

file streams and string streams.<br />

• The functions are global “helper” functions. 7<br />

• Both operators return the stream object that is the first argument to the operator, enabling<br />

us to write, e.g., cout c;<br />

if (c == ’(’) {<br />

is >> re >> c;<br />

if (c == ’,’) {<br />

is >> im >> c;<br />

if (c == ’)’) {<br />

cplx = complex(re, im);<br />

}<br />

else {<br />

// format error, set the stream state to ’fail’<br />

is.setstate(ios_base::failbit);<br />

}<br />

}<br />

else if (c == ’)’) {<br />

cplx = complex(re, num_type(0));<br />

}<br />

else {<br />

// format error, set the stream state to ’fail’<br />

is.setstate(ios_base::failbit);<br />

}<br />

}<br />

else {<br />

is.putback(c);<br />

is >> re;<br />

cplx = complex(re, num_type(0));<br />

}<br />

return is;<br />

}<br />

}<br />

7 The alternative implementation, to write operator>> and operator


28 Strings and Streams<br />

A3. The files date.h, date.cc, and date test.cc describe a simple date class. Implement the<br />

class and add operators for input and output of dates. Dates should be output in the<br />

form 2012-01-10. The input operator should accept dates in the same format. (You may<br />

consider dates written like 2012-1-10 and 2012 -001 - 10 as legal, if you wish.)<br />

2.2 String Streams<br />

The string stream classes (istringstream and ostringstream) function as their “file” counterparts<br />

(ifstream and ofstream). The only difference is that characters are read from/written to a<br />

string instead of a file.<br />

A4. In Java, the class Object defines a method toString() that is supposed to produce a<br />

“readable representation” of an object. This method can be overridden in subclasses.<br />

Write a <strong>C++</strong> function toString for the same purpose. Examples of usage:<br />

double d = 1.234;<br />

Date today;<br />

std::string sd = toString(d);<br />

std::string st = toString(today);<br />

You may assume that the argument object can be output with


<strong>Programming</strong> with the STL 29<br />

4 <strong>Programming</strong> with the STL<br />

Objective: to practice using the STL container classes and algoritms, with emphasis on efficiency.<br />

You will also learn more about operator overloading and STL iterators.<br />

Read:<br />

• Book: STL containers and STL algorithms. Operator overloading, iterators.<br />

1 Domain Name Servers and the STL Container Classes<br />

On the web, computers are identified by IP addresses (32-bit numbers). Humans identify<br />

computers by symbolic names. A Domain Name Server (DNS) is a component that translates<br />

a symbolic name to the corresponding IP address. The DNS system is a very large distributed<br />

database that contains billions (or at least many millions) of IP addresses and that receives billions<br />

of lookup requests every day. Furthermore, the database is continuously updated. If you wish<br />

to know more about name servers, read the introduction at http://computer.howstuffworks.<br />

com/dns.htm.<br />

In this lab, you will implement a local DNS in <strong>C++</strong>. With “local”, we mean that the DNS does<br />

not communicate with other name servers; it can only perform translations using its own database.<br />

The goal is to develop a time-efficient name server, and you will study different implementation<br />

alternatives. You shall implement three versions (classes) of the name server, using different STL<br />

container classes. All three classes should implement the interface NameServerInterface:<br />

typedef std::string HostName;<br />

typedef unsigned int IPAddress;<br />

const IPAddress NON_EXISTING_ADDRESS = 0;<br />

class NameServerInterface {<br />

public:<br />

virtual ~NameServerInterface() {}<br />

virtual void insert(const HostName&, const IPAddress&) = 0;<br />

virtual bool remove(const HostName&) = 0;<br />

virtual IPAddress lookup(const HostName&) const = 0;<br />

};<br />

insert() inserts a name/address pair into the database, without checking if the name already<br />

exists. remove() removes a name/address pair and returns true if the name exists; it does<br />

nothing and returns false if the name doesn’t exist. lookup() returns the IP address for a<br />

specified name, or NON EXISTING ADDRESS if the name doesn’t exist.<br />

Since this is an STL lab, you shall use STL containers and algorithms as much as possible.<br />

This means, for example, that you are not allowed to use any for or while statements in your<br />

solutions. (There is one exception: you may use a for or while statement in the hash function,<br />

see assignment A1c.)<br />

A1. The definition of the class NameServerInterface is in the file nameserverinterface.h.<br />

a) Implement a class VectorNameServer that uses an unsorted vector to store the name/<br />

address pairs. Use linear search to search for an element. The intent is to demonstrate<br />

that such an implementation is inefficient.<br />

Use the STL find if algorithm to search for a host name. Consider carefully the third<br />

parameter to the algorithm: it should be a function or a functor that compares an<br />

element in the vector (a name/address pair) with a name. If you use a functor, the<br />

call may look like this:


30 <strong>Programming</strong> with the STL<br />

find_if(v.begin(), v.end(), bind2nd(first_equal(), name))<br />

first equal is a functor (a class) that takes two parameters, a pair and a host name,<br />

and compares the first component of the pair with the name.<br />

b) Implement a class MapNameServer that uses a map to store the name/address pairs.<br />

The average search time in this implementation will be considerably better than that<br />

for the vector implementation.<br />

c) It is possible that the map implementation of the name server is sufficiently efficient,<br />

but you shall also try a third implementation. Implement a class HashNameServer<br />

that uses a hash table to store the name/address pairs. Use a vector of vector’s to<br />

store the name/address pairs. 8<br />

The hash table implementation is open for experimentation: you must select an<br />

appropriate size for the hash table and a suitable hash function. 9 It can also be<br />

worthwhile to exploit the fact that most (in fact all, in the test program) computer<br />

names start with “www.”. Without any greater difficulty, you should be able to<br />

improve the search times that you obtained in the map implementation by about 50%.<br />

Use the program nstest.cc to check that the insert/remove/lookup functions work correctly.<br />

Then, use the program nstime.cc to measure and print the search times for the three<br />

different implementations, using the file nameserverdata.txt as input (the file contains<br />

25,143 name/address pairs 10 ).<br />

A2. Examples of average search times in milliseconds for a name server with 25,143 names are<br />

in the following table.<br />

vector 0.185<br />

map 0.00066<br />

hash 0.00029<br />

25,143 100,000<br />

Search the Internet for information about search algorithms for different data structures, or<br />

use your knowledge from the algorithms and data structures course, and fill in the blanks<br />

in the table. Write a similar table for your own implementation.<br />

2 Bitsets, Subscripting, and Iterators<br />

2.1 Bitsets<br />

To manipulate individual bits in a word, <strong>C++</strong> provides the bitwise operators & (and), | (or),<br />

^ (exclusive or), and the shift operators > (shift right). The STL class bitset<br />

generalizes this notion and provides operations on a set of N bits indexed from 0 through N-1.<br />

N may be arbitrary large, indicating that the bitset may occupy many words.<br />

For historical reasons, bitset doesn’t provide any iterators. We will develop a simplified<br />

version of the bitset class where all the bits fit in one word, and extend the class with iterators<br />

so it becomes possible to use the standard STL algorithms with the class. Our goal is to provide<br />

enough functionality to make the following program work correctly:<br />

8 GNU STL contains a class hash map (in newer versions unordered map), a map implementation that uses a hash table.<br />

You should not use any of these classes in this assignment.<br />

9 Note that a good hash function should take all (or at least many) of the characters of a string into account and that<br />

"abc" and "cba" should have different hash codes. For instance, a hash function that merely adds the first and last<br />

characters of a string is not acceptable.<br />

10 It was difficult to find a file with real name/address pairs, so the computer names are common words from a dictionary<br />

file, with www. first and .se or similar last. The IP addresses are running numbers.


<strong>Programming</strong> with the STL 31<br />

int main() {<br />

using namespace std;<br />

using namespace cpp_lab4;<br />

}<br />

// Define an empty bitset, set some bits, print the bitset<br />

Bitset bs;<br />

bs[3] = true;<br />

bs[0] = bs[3];<br />

copy(bs.begin(), bs.end(), ostream_iterator(cout));<br />

cout


32 <strong>Programming</strong> with the STL<br />

An outline of the class (BitsetStorage is the type of the word that contains the bits):<br />

class BitReference {<br />

public:<br />

BitReference(Bitset::BitStorage* pb, size_t p)<br />

: p_bits(pb), pos(p) {}<br />

// Operations will be added later<br />

protected:<br />

Bitset::BitStorage* p_bits; // pointer to the word containing bits<br />

size_t pos; // position of the bit in the word<br />

};<br />

Now, operator[] in class Bitset may be defined as follows:<br />

BitReference operator[](size_t pos) {<br />

return BitReference(&bits, pos);<br />

}<br />

The actual bit fiddling is performed in the BitReference class. In order to see what we need to<br />

implement in this class, we must study the results of some expressions involving operator[]:<br />

bs[3] = true; // bs.operator[](3) = true; =><br />

// BitReference(&bs.bits,3) = true; =><br />

// BitReference(&bs.bits,3).operator=(true);<br />

From this follows that we must implement the following operator function in BitReference:<br />

BitReference& operator=(bool x); // for bs[i] = x<br />

This function should set the bit referenced by the BitReference object to the value of x (just like<br />

the set function in the original Bitset class). When we investigate (do this on your own) other<br />

possibilities to use operator[] we find that we also need the following functions:<br />

BitReference& operator=(const BitReference& bsr); // for bs[i] = bs[j]<br />

operator bool() const; // for x = bs[i]<br />

A4. Use the files bitset.h, bitset.cc, bitreference.h, bitreference.cc, and bitsettest1.cc. Implement<br />

the functions in bitreference.cc and test. It is a good idea to execute the program line<br />

by line under control of gdb, even if your implementation is correct, so you can follow<br />

the execution closely and see what happens (turn off the -O2 switch on the compilation<br />

command line if you do this).<br />

2.3 Iterators<br />

From one of the OH slides: “An iterator “points” to a value. All iterators are DefaultConstructible<br />

and Assignable and support ++it and it++.” A ForwardIterator should additionally be<br />

EqualityComparable and support *it, both for reading and writing via the iterator.<br />

The most important requirement is that an iterator should point to a value. A BitsetIterator<br />

should point to a Boolean value, and we already have something that does this: the class<br />

BitReference! The additional requirements (++, equality test, and *) can be implemented in a<br />

derived class. It will look like this: 11<br />

11 This class only contains the constructs that are necessary for the bitset test program. For example, we have not<br />

implemented postfix ++ and comparison with ==.


<strong>Programming</strong> with the STL 33<br />

class BitsetIterator : public BitReference {<br />

public:<br />

// ... typedefs, see below<br />

};<br />

BitsetIterator(Bitset::BitsetStorage* pb, size_t p)<br />

: BitReference(pb, p) {}<br />

BitsetIterator& operator=(const BitsetIterator& bsi);<br />

bool operator!=(const BitsetIterator& bsi) const;<br />

BitsetIterator& operator++();<br />

BitReference operator*();<br />

The assignment operator must be redefined so it copies the member variables (as opposed to the<br />

assignment operator in the base class BitReference, which sets a bit in the bitset). The typedefs<br />

are necessary to make the iterator STL-compliant, i.e., to make it possible to use the iterator with<br />

the standard STL algorithms. The only typedef that isn’t obvious is:<br />

typedef std::forward_iterator_tag iterator_category;<br />

This is an example of an iterator tag, and it informs the compiler that the iterator is a forward<br />

iterator.<br />

A5. Implement the member functions in bitsetiterator.cc. Uncomment the lines in bitset.h and<br />

bitset.cc that have to do with iterators, implement the begin() and end() functions. Use<br />

the program bitsettest2.cc to test your classes.

Hooray! Your file is uploaded and ready to be published.

Saved successfully!

Ooh no, something went wrong!