Laboratory Exercises, C++ Programming
Laboratory Exercises, C++ Programming
Laboratory Exercises, C++ Programming
You also want an ePaper? Increase the reach of your titles
YUMPU automatically turns print PDFs into web optimized ePapers that Google loves.
LUND INSTITUTE OF TECHNOLOGY <strong>C++</strong> <strong>Programming</strong><br />
Department of Computer Science 2011/12<br />
<strong>Laboratory</strong> <strong>Exercises</strong>, <strong>C++</strong> <strong>Programming</strong><br />
General information:<br />
• The course has four compulsory laboratory exercises.<br />
• You are to work in groups of two people. You sign up for the labs at sam.cs.lth.se/Labs<br />
(see the course plan for instructions).<br />
• The labs are mostly homework. Before each lab session, you must have done all the<br />
assignments (A1, A2, . . . ) in the lab, written and tested the programs, and so on. Reasonable<br />
attempts at solutions count; the lab assistant is the judge of what’s reasonable. Contact a<br />
teacher if you have problems solving the assignments.<br />
• You will not be allowed to participate in the lab if you haven’t done all the assignments.<br />
• Smaller problems with the assignments, e.g., details that do not function correctly, can be<br />
solved with the help of the lab assistant during the lab session.<br />
• Extra labs are organized only for students who cannot attend a lab because of illness. Notify<br />
Per Holm (Per.Holm@cs.lth.se) if you fall ill, before the lab.<br />
The labs are about:<br />
1. Basic <strong>C++</strong>, compiling, linking.<br />
2. The GNU make and gdb tools, valgrind.<br />
3. <strong>C++</strong> strings and streams.<br />
4. <strong>Programming</strong> with the STL, operator overloading.<br />
Practical information:<br />
• You will use many half-written “program skeletons” during the lab. You must download the<br />
necessary files from the course homepage before you start working on the lab assignments.<br />
• The lab files are in separate directories lab[1-4] and are available in gzipped tar format.<br />
Download the tar file and unpack it like this:<br />
tar xzf lab1.tar.gz<br />
This will create a directory lab1 in the current directory.<br />
General sources of information about <strong>C++</strong> and the GNU tools:<br />
• http://www.cplusplus.com<br />
• http://www.cppreference.com<br />
• Info in Emacs (Ctrl-h i)
The GNU Compiler Collection and <strong>C++</strong> 3<br />
1 The GNU Compiler Collection and <strong>C++</strong><br />
Objective: to demonstrate basic GCC usage for <strong>C++</strong> compiling and linking.<br />
Read:<br />
• Book: basic <strong>C++</strong>, variables and types including pointers, expressions, statements, functions,<br />
simple classes, namespaces, ifstream, ofstream.<br />
• Manpages for gcc, g++, and ld.<br />
• An Introduction to GCC, by Brian Gough, http://www.network-theory.co.uk/docs/<br />
gccintro/. Treats the same material as this lab, and more. Also available in book form.<br />
1 Introduction<br />
The GNU Compiler Collection, GCC, is a suite of native compilers for C, <strong>C++</strong>, Objective C,<br />
Fortran 77, Java and Ada. Front-ends for other languages, such as Fortran 95, Pascal, Modula-2<br />
and Cobol are in experimental development stages. The list of supported target platforms is<br />
impressive — GCC has been ported to all kinds of Unix systems, as well as Microsoft Windows<br />
and special purpose embedded systems. GCC is free software, released under the GNU GPL.<br />
Each compiler in GCC contains six programs. Only two or three of the programs are language<br />
specific. For <strong>C++</strong>, the programs are:<br />
Driver: the “engine” that drives the whole set of compiler tools. For <strong>C++</strong> programs you invoke<br />
the driver with g++ (use gcc for C, g77 for Fortran, gcj for Java, etc.). The driver invokes<br />
the other programs one by one, passing the output of each program as input to the next.<br />
Preprocessor: normally called cpp. It takes a <strong>C++</strong> source file and handles preprocessor directives<br />
(#include files, #define macros, conditional compilation with #ifdef, etc.).<br />
Compiler: the <strong>C++</strong> compiler, cc1plus. This is the actual compiler that translates the input file<br />
into assembly language.<br />
Optimizer: sometimes a separate module, sometimes integrated in the compiler module. It<br />
handles optimization on a language-independent code representation.<br />
Assembler: normally called as. It translates the assembly code into machine code, which is<br />
stored in object files.<br />
Linker-Loader: called ld, collects object files into an executable file.<br />
The assembler and the linker are not actually parts of GCC. They may be proprietary operating<br />
system tools or free equivalents from the GNU binutils package.<br />
A <strong>C++</strong> source code file is recognized by its extension. We will use .cc, which is the recommended<br />
extension. Other extensions, such as .cxx, .cpp, .C, are sometimes used.<br />
In <strong>C++</strong> (and in C) declarations are collected in header files with the extension .h. To distinguish<br />
<strong>C++</strong> headers from C headers other extensions are sometimes used, such as .hh, .hxx, .hpp, .H.<br />
We will use the usual .h extension.<br />
The <strong>C++</strong> standard library is declared in headers without an extension. These are, for example,<br />
iostream, vector, string. Older <strong>C++</strong> systems use pre-standard library headers, named iostream.h,<br />
vector.h, string.h. Do not use the old headers in new programs.<br />
A <strong>C++</strong> program may not use any identifier that has not been previously declared in the same<br />
translation unit. For example, suppose that main uses a class MyClass. The program code can be<br />
organized in several ways, but the following is always used (apart from toy examples):
4 The GNU Compiler Collection and <strong>C++</strong><br />
• Define the class in a file myclass.h:<br />
#ifndef MYCLASS_H // include guard<br />
#define MYCLASS_H<br />
// #include necessary headers here<br />
class MyClass {<br />
public:<br />
MyClass(int x);<br />
// ...<br />
private:<br />
// ...<br />
};<br />
#endif<br />
• Define the member functions in a file myclass.cc:<br />
#include "myclass.h"<br />
// #include other necessary headers<br />
MyClass::MyClass(int x) { ... }<br />
// ...<br />
• Define the main function in a file test.cc (the file name is arbitrary):<br />
#include "myclass.h"<br />
int main() {<br />
MyClass m(5);<br />
// ...<br />
}<br />
The include guard is necessary to prevent multiple definitions of names. Do not write function<br />
definitions in a header (except for inline functions and template functions).<br />
The g++ command line looks like this:<br />
g++ [options] [-o outfile] infile1 [infile2 ...]<br />
All the files in a program can be compiled and linked with one command (the -o option specifies<br />
the name of the executable file; if this is omitted the executable is named a.out):<br />
g++ -o test test.cc myclass.cc<br />
To execute the program, just enter its name:<br />
./test<br />
However, it is more common that files are compiled separately and then linked:<br />
g++ -c myclass.cc<br />
g++ -c test.cc<br />
g++ -o test test.o myclass.o<br />
The -c option directs the driver to stop before the linking phase and produce an object file, named<br />
as the source file but with the extension .o instead of .cc.<br />
The driver can be interrupted also at other stages, using the -S or -E options. The -S option<br />
stops the driver after assembly code has been generated and produces an assembly code file
The GNU Compiler Collection and <strong>C++</strong> 5<br />
named file.s. The -E option stops the driver after preprocessing and prints the result from this<br />
first stage on standard output.<br />
A1. Write a “Hello, world!” program in a file hello.cc, compile and test it.<br />
A2. Generate preprocessed output in hello.ii, assembly code output in hello.s, and object code<br />
in hello.o. Study the files:<br />
• hello.ii: this is just to give you an impression of the size of the header.<br />
You will find your program at the end of the file.<br />
• hello.s: your code starts at the label main.<br />
• hello.o: since this is a binary file there is not much to look at. The command nm<br />
hello.o (study the manpage) prints the file’s symbol table (entry points in the<br />
program and the names of the functions that are called by the program and must be<br />
found by the linker).<br />
2 Options and messages<br />
There are many options to the g++ command that we didn’t mention in the previous section. In<br />
the future, we will require that your source files compile correctly using the following command<br />
line:<br />
g++ -c -pipe -O2 -Wall -W -ansi -pedantic-errors -Wmissing-braces \<br />
-Wparentheses -Wold-style-cast file.cc<br />
Short explanations (you can read more about these and other options in the gcc and g++<br />
manpages):<br />
-c just produce object code, do not link<br />
-pipe use pipes instead of temporary files for communication between<br />
tools (faster)<br />
-O2 optimize the object code on “level 2”<br />
-Wall print most warnings<br />
-W print extra warnings<br />
-ansi follow standard <strong>C++</strong> syntax rules<br />
-pedantic-errors treat “serious” warnings as errors<br />
-Wmissing-braces warn for missing braces {} in some cases<br />
-Wparentheses warn for missing parentheses () in some cases<br />
-Wold-style-cast warn for old-style casts, e.g., (int) instead of static cast<br />
Do not disregard warning messages. Even though the compiler chooses to “only” issue warnings<br />
instead of errors, your program is erroneous or at least questionable. Another option, -Werror,<br />
turns all warnings into errors, thus forcing you to correct your program and remove all warnings<br />
before object code is produced. However, this makes it impossible to test a “half-written” program<br />
so we will normally not use it.<br />
Some of the warning messages are produced by the optimizer, and will therefore not be<br />
output if the -O2 flag is not used. But you must be aware that optimization takes time, and<br />
on a slow machine you may wish to remove this flag during development to save compilation<br />
time. Some platforms define higher optimization levels, -O3, -O4, etc. You should not use these<br />
levels, unless you know very well what their implications are. Some of the optimizations that are<br />
performed at these levels are very aggressive and may result in a faster but much larger program.
6 The GNU Compiler Collection and <strong>C++</strong><br />
It is important that you become used to reading and understanding the GCC error messages.<br />
The messages are sometimes long and may be difficult to understand, especially when the errors<br />
involve the standard STL classes (or any other complex template classes).<br />
3 Introduction to make, and a list example<br />
You have to type a lot in order to compile and link <strong>C++</strong> programs — the command lines are long,<br />
and it is easy to forget an option or two. You also have to remember to recompile all files that<br />
depend on a file that you have modified.<br />
There are tools that make it easier to compile and link, “build”, programs. These may be<br />
integrated development environments (Eclipse, Visual Studio, . . . ) or separate command line<br />
tools. In Unix, make is the most important tool. We will explain make in detail in lab 2. For now,<br />
you only have to know this:<br />
• make reads a Makefile when it is invoked,<br />
• the makefile contains a description of dependencies between files (which files that must be<br />
recompiled/relinked if a file is updated),<br />
• the makefile also contains a description of how to perform the compilation/linking.<br />
As an example, we take the program from assignment A3 (see below). There, two files (list.cc and<br />
ltest.cc) must be compiled and the program linked. Instead of typing the command lines, you<br />
just enter the command make. Make reads the makefile and executes the necessary commands.<br />
The makefile looks like this:<br />
# Define the compiler options<br />
CXXFLAGS = -pipe -O2 -Wall -W -ansi -pedantic-errors<br />
CXXFLAGS += -Wmissing-braces -Wparentheses -Wold-style-cast<br />
# Define what to do when make is executed without arguments.<br />
all: ltest<br />
# The following rule means "if ltest is older than ltest.o or list.o,<br />
# then link ltest".<br />
ltest: ltest.o list.o<br />
g++ -o ltest ltest.o list.o<br />
# Define the rules to create the object files.<br />
ltest.o: ltest.cc list.h<br />
g++ -c $(CXXFLAGS) ltest.cc<br />
list.o: list.cc list.h<br />
g++ -c $(CXXFLAGS) list.cc<br />
You may add rules for other programs to the makefile. All “action lines” must start with a tab,<br />
not eight spaces.<br />
A3. The class List describes a linked list of integer numbers. 1 The numbers are stored in<br />
nodes. A node has a pointer to the next node (0 in the last node). Before the data nodes<br />
there is an empty node, a “head”. An empty list contains only the head node. The only<br />
purpose of the head node is to make it easier to delete an element at the front of the list.<br />
1 In practice, you would never write your own list class. There are several alternatives in the standard library.
The GNU Compiler Collection and <strong>C++</strong> 7<br />
namespace cpp_lab1 {<br />
/* List is a list of long integers */<br />
class List {<br />
public:<br />
/* create an empty list */<br />
List();<br />
};<br />
/* destroy this list */<br />
~List();<br />
/* insert d into this list as the first element */<br />
void insert(long d);<br />
/* remove the first element less than/equal to/greater than d,<br />
depending on the value of df. Do nothing if there is no<br />
value to remove. The public constants may be accessed with<br />
List::LESS, List::EQUAL, List::GREATER outside the class */<br />
enum DeleteFlag { LESS, EQUAL, GREATER };<br />
void remove(long d, DeleteFlag df = EQUAL);<br />
/* returns the size of the list */<br />
int size() const;<br />
/* returns true if the list is empty */<br />
bool empty() const;<br />
/* returns the value of the largest number in the list */<br />
long largest() const;<br />
/* print the contents of the list (for debugging) */<br />
void debugPrint() const;<br />
private:<br />
/* a list node */<br />
struct Node {<br />
long value; // the node value<br />
Node* next; // pointer to the next node, 0 in the last node<br />
Node(long value = 0, Node* next = 0);<br />
};<br />
};<br />
Node* head; // the pointer to the list head, which contains<br />
// a pointer to the first list node<br />
/* forbid copying of lists */<br />
List(const List&);<br />
List& operator=(const List&);<br />
Notes: Node is a struct, i.e., a class where the members are public by default. This is<br />
not dangerous, since Node is private to the class. The copy constructor and assignment<br />
operator are private, so you cannot copy lists.<br />
The file list.h contains the class definition and list.cc contains a skeleton of the class<br />
implementation. Complete the implementation in accordance with the specification.<br />
Also implement, in a file ltest.cc, a test program that checks that your List implementation<br />
is correct. Be careful to check exceptional cases, such as removing the first or the last<br />
element in the list.<br />
Type make to build the program. Note that you will get warnings about unused<br />
parameters when the implementation skeleton is compiled. These warnings will disappear
8 The GNU Compiler Collection and <strong>C++</strong><br />
when you implement the functions.<br />
Try to write code with testing and debugging in mind. The skeleton file list.cc includes<br />
. This makes it possible to introduce assertions on state invariants. The file also<br />
contains a definition of a debug macro, TRACE, which writes messages to the clog output<br />
stream. Assertions as well as TRACE statements are removed from the code when NDEBUG is<br />
defined. 2<br />
A4. Implement a class Coding with two static methods:<br />
/* For any character c, encode(c) is a character different from c */<br />
static unsigned char encode(unsigned char c);<br />
/* For any character c, decode(encode(c)) == c */<br />
static unsigned char decode(unsigned char c);<br />
Use a simple method for the coding and decoding. Then write a program, encode, that<br />
reads a text file 3 , encodes it, and writes the encoded text to another file. The command<br />
line:<br />
./encode file<br />
should run the program, encode file, and write the output to file.enc.<br />
Write another program, decode, that reads an encoded file, decodes it, and writes the<br />
decoded text to another file. The command line should be similar to that of the encode<br />
program. Add rules to the makefile for building the programs.<br />
Test your programs and check that a file that is first encoded and then decoded is<br />
identical to the original.<br />
Note: the programs will work also for files that are UTF8-encoded. In this encoding,<br />
national Swedish characters are encoded in two bytes, and the encode and decode functions<br />
will be called twice for each such character.<br />
4 Object Code Libraries<br />
A lot of software is shipped in the form of libraries, e.g., class packages. In order to use a library, a<br />
developer does not need the source code, only the object files and the headers. Object file libraries<br />
may contain thousands of files and cannot reasonably be shipped as separate files. Instead, the<br />
files are collected into library files that are directly usable by the linker.<br />
4.1 Static Libraries<br />
The simplest kind of library is a static library. The linker treats the object files in a static library<br />
in the same way as other object files, i.e., all code is linked into the executable files, which as a<br />
result may grow very large. In Unix, a static library is an archive file, lib 〈name〉.a. In addition to<br />
the object files, an archive contains an index of the symbols that are defined in the object files.<br />
A collection of object files f1.o, f2.o, f3.o, . . . , are collected into a library using the ar command:<br />
ar crv libfoo.a f1.o f2.o f3.o ...<br />
2 There are two ways to define a global macro like NDEBUG. It can either be specified in the source file:<br />
#define NDEBUG<br />
or be given on the compiler command line, using the -D option:<br />
g++ -c -DNDEBUG $(OTHER CXXFLAGS) list.cc<br />
3 Note that you cannot use while (infile >> ch) to read all characters in infile, since >> skips whitespace. Use<br />
infile.get(ch) instead. Output with outfile
The GNU Compiler Collection and <strong>C++</strong> 9<br />
(Some Unix versions require that you also create the symbol table with ranlib libfoo.a.) In<br />
order to link a program main.o with the object files obj1.o, obj2.o and with object files from the<br />
library libfoo.a, you use the following command line:<br />
g++ -o main main.o obj1.o obj2.o -L. -lfoo<br />
The linker always searches for libraries in certain system directories. The -L. option makes the<br />
linker search also in the current directory. 4 The library name (without lib and .a) is given after<br />
-l.<br />
A5. Collect the object files generated in the other assignments, except those containing main<br />
functions, in a library liblab1.a. Then link the programs (ltest, encode, decode) again,<br />
using the library.<br />
4.2 Shared Libraries<br />
Since most programs use a large amount of code from libraries, executable files can grow very<br />
large. Instead of linking library code into each executable that needs it, the code can be loaded at<br />
runtime. The object files should then be in shared libraries. When linking programs with shared<br />
libraries, the files from the library are not actually linked into the executable. Instead a “pointer”<br />
is established from the program to the library. The obvious advantage is that common code does<br />
not need to be reproduced in all programs.<br />
In Unix shared library files are named lib 〈name〉.so[.x.y.z] (.so for shared objects, .x.y.z is<br />
an optional version number). The linker uses the environment variable LD LIBRARY PATH as the<br />
search path for shared libraries. In Microsoft Windows shared libraries are known as DLL files<br />
(for dynamically loadable libraries).<br />
A6. (Advanced, optional) Create a library as in the previous exercise, but make it shared<br />
instead of static. Then link the executables, with different names, using the shared library.<br />
Make sure they run correctly. Compare the sizes of the dynamically linked executables<br />
to the statically linked (there will not be a big size difference, since the size of the library<br />
code is small).<br />
Use the command ldd (list dynamic dependencies) to inspect the linkage of your<br />
programs. Hints: shared libraries are created by the linker, not the ar archiver. Use the gcc<br />
manpage and the ld manpage (and, if needed, other manpages) to explain the following<br />
sequence of operations:<br />
g++ -fPIC -c *.cc<br />
g++ -shared -Wl,-soname,liblab1.so.1 -o liblab1.so.1.0 *.o<br />
ln -s liblab1.so.1.0 liblab1.so.1<br />
ln -s liblab1.so.1 liblab1.so<br />
You then link with -L. -llab1 as usual. The linker merely checks that all referenced<br />
symbols are in the shared library. Before you execute the program, you must define<br />
LD LIBRARY PATH so it includes the current directory (export is for zsh, setenv for tcsh):<br />
export LD_LIBRARY_PATH=.:$LD_LIBRARY_PATH<br />
setenv LD_LIBRARY_PATH .:$LD_LIBRARY_PATH<br />
4 You may have several -L and -l options on a command line. Example, where the current directory and the directory<br />
/usr/local/mylib are searched for the libraries libfoo1.a and libfoo2.a:<br />
g++ -o main main.o obj1.o obj2.o -L. -L/usr/local/mylib -lfoo1 -lfoo2
10 The GNU Compiler Collection and <strong>C++</strong><br />
5 Final Remarks<br />
If a source file includes a header that is not in the current directory, the path to the directory<br />
containing the header can be given to the compiler using the -I option (similar to -L):<br />
g++ -c -Ipath file.cc<br />
To make GCC print all it is doing during compilation (what commands it is calling and a lot of<br />
other information), the option -v (for verbose) can be added to the command line.
Tools for Practical <strong>C++</strong> Development 11<br />
2 Tools for Practical <strong>C++</strong> Development<br />
Objective: to introduce a set of tools which are often used to facilitate <strong>C++</strong> development in a Unix<br />
environment. The make tool (GNU Make) is used for compilation and linking, gdb (the GNU<br />
debugger) for controlled execution and testing, valgrind (not GNU) for finding memory-related<br />
errors.<br />
Read:<br />
• GNU Make, http://www.gnu.org/software/make/manual/<br />
• GDB User Manual, http://www.gnu.org/software/gdb/documentation/<br />
• valgrind Quick Start, http://www.valgrind.org/docs/manual/QuickStart.html<br />
• Manpages for the tools used in the lab.<br />
The manuals have introductory tutorials that are recommended as starting points. The manuals<br />
are very well written and you should consult them if you want to learn more than what is treated<br />
during the lab.<br />
1 GNU Make<br />
1.1 Introduction<br />
As you saw in lab 1, make is a good tool — it sees to it that only files that have been modified<br />
are compiled (and files that depend on modified files). To do this, it compares the modification<br />
times 5 of source files and object files.<br />
We will use the GNU version of make. There are other make programs, which are almost but<br />
not wholly compatible to GNU make. In a Linux or Darwin system, the command make usually<br />
refers to GNU make. You can check what variant you have with the command make --version.<br />
It should give output like the following:<br />
make --version<br />
GNU Make 3.81<br />
Copyright (C) 2006 Free Software Foundation, Inc.<br />
This is free software; see the source for copying conditions.<br />
...<br />
If make turns out to be something else than GNU make, try the command gmake instead of make;<br />
this should refer to GNU make.<br />
Make uses a description of the source project in order to figure out how to generate executables.<br />
The description is written in a file, usually called Makefile. When make is executed, it reads<br />
the makefile and executes the compilation and linking (and maybe other) commands that are<br />
necessary to build the project.<br />
The most important construct in a makefile is a rule. A rule specifies how a file (the target),<br />
which is to be generated, depends on other files (the prerequisites). For instance, the following<br />
rule specifies that the file ltest.o depends on the files ltest.cc and ltest.h:<br />
5 In a network environment with a distributed file system, there may occur problems with unsynchronized clocks<br />
between a client and the file server. Make will report errors like:<br />
make: ***Warning: File "foo" has modification time in the future.<br />
make: warning: Clock skew detected. Your build may be incomplete.<br />
Unfortunately, there is not much you can do about this problem, if you don’t have root privileges on the computer.<br />
Inform the system administrators about the problem and switch to another computer if possible. If no other computer is<br />
available, temporarily work on a local file system where you have write access, such as /tmp.
12 Tools for Practical <strong>C++</strong> Development<br />
ltest.o: ltest.cc list.h<br />
On the line following the dependency specification you write a shell command that generates the<br />
target. This is called an explicit rule. Example (with a short command line without options):<br />
ltest.o: ltest.cc list.h<br />
g++ -c ltest.cc<br />
Make has a very unusual syntax requirement: the shell command must be preceded by a tab<br />
character; spaces are not accepted.<br />
An implicit rule does not give an explicit command to generate the target. Instead, it relies<br />
on make’s large collection of default rules. For instance, make knows how to generate .o-files<br />
from most kinds of source files, for instance that g++ is used to generate .o-files from .cc-files.<br />
The actual implicit rule is:<br />
$(CXX) $(CPPFLAGS) $(CXXFLAGS) -c -o $@ $<<br />
CXX, CPPFLAGS, and CXXFLAGS are variables that the user can define. The syntax $(VARIABLE) is<br />
used to evaluate a variable, returning its value. CXX is the name of the <strong>C++</strong> compiler, CPPFLAGS<br />
are the options to the preprocessor, CXXFLAGS are the options to the compiler. $@ expands to the<br />
name of the target, $< expands to the first of the prerequisites.<br />
We are now ready to write our first makefile, which will build the ltest program from lab 1.<br />
The value g++ for CXX is default, thus that definition is optional. CPPFLAGS and CXXFLAGS are<br />
empty by default. CXXFLAGS is almost always redefined.<br />
# Compiler and compiler options:<br />
CXX = g++<br />
CXXFLAGS = -pipe -O2 -Wall -W -ansi -pedantic-errors<br />
CXXFLAGS += -Wmissing-braces -Wparentheses -Wold-style-cast<br />
# Linking:<br />
ltest: ltest.o list.o<br />
$(CXX) -o $@ $^<br />
# Dependencies, the implicit rule .cc => .o is used:<br />
ltest.o: ltest.cc list.h<br />
list.o: list.cc list.h<br />
$^ expands to the complete list of prerequisites. We will make improvements to this makefile<br />
later.<br />
Suppose that none of the files ltest, ltest.o, list.o exists. Then, the following commands are<br />
executed when you run make (the long command lines have been wrapped):<br />
make ltest<br />
g++ -pipe -O2 -Wall -W -ansi -pedantic-errors -Wmissing-braces -Wparentheses<br />
-Wold-style-cast -c -o ltest.o ltest.cc<br />
g++ -pipe -O2 -Wall -W -ansi -pedantic-errors -Wmissing-braces -Wparentheses<br />
-Wold-style-cast -c -o list.o list.cc<br />
g++ -o ltest ltest.o list.o<br />
If an error occurs in one of the commands, make will be aborted. If there, for instance, is an error<br />
in list.cc, the compilation of that file will be aborted and the program will not be linked. When<br />
you have corrected the error and run make again, it will discover that ltest.o is up to date and<br />
only remake list.o and ltest.<br />
If you run make without a target name as parameter, make builds the first target it finds in the
Tools for Practical <strong>C++</strong> Development 13<br />
makefile. Since ltest is the first target, make ltest and make are equivalent. By convention, the<br />
first target should be named all. Therefore, the first rules in the makefile should look like this:<br />
# Default target ‘all’, make everything<br />
all: ltest<br />
# Linking:<br />
ltest: ltest.o list.o<br />
$(CXX) -o $@ $^<br />
Makefiles can also contain directives that control the behavior of make. For example, a makefile<br />
can include other files with the include directive.<br />
A1. The file Makefile contains the makefile that has been used as an example. Experiment with<br />
make: copy the necessary source files (list.h, list.cc and ltest.cc) to the lab2 directory and<br />
run make. Run make again. Delete the executable program and run make again. Change<br />
one or more of the source files (it is sufficient to touch them) and see what happens. Run<br />
make ltest.o. Run make notarget. Read the manpage and try other options. Etc., etc.<br />
1.2 Phony targets<br />
Makefiles may contain targets that do not actually correspond to files. The all target in the<br />
previous section is an example. Now, suppose that a file all is created in the directory that<br />
contains the makefile. If that file is newer than the ltest file, a make invocation will do nothing<br />
but say make: Nothing to be done for ‘all’., which is not the desired behavior. The solution<br />
is to specify the target all as a phony target, like this:<br />
.PHONY: all<br />
Another common phony target is clean. Its purpose is to remove intermediate files, such as object<br />
files, and it has no prerequisites. It typically looks like this:<br />
.PHONY: clean<br />
clean:<br />
$(RM) $(OBJS)<br />
The variable RM defaults to rm -f, where the option -f tells rm not to warn about non-existent<br />
files. The return value from the command is always 0, so the clean target will always succeed.<br />
A cleaning rule that removes also executable files and libraries could be called, e.g., cleaner or<br />
realclean.<br />
All Unix source distributions contain one or more makefiles. A makefile should contain<br />
a phony target install, with all as its prerequisite. The make install command copies the<br />
programs and libraries built by the all rule to a system-wide location where they can be reached<br />
by other users. This location, which is called the installation prefix, is usually the directory<br />
/usr/local. Installation uses the GNU command install (or plain cp) to copy programs to<br />
$(PREFIX)/bin and libraries to $(PREFIX)/lib. Since only root has permission to write in<br />
/usr/local, you will have to use one of your own directories as prefix.<br />
A2. Create the bin and lib directories. Add all, clean and install targets to your makefile and<br />
specify suitable phony targets. Also provide an uninstall target that removes the files that<br />
were installed by the install target.
14 Tools for Practical <strong>C++</strong> Development<br />
1.3 Rules for Linking<br />
In our example makefile, we used an explicit rule for linking. Actually, there is an implicit rule<br />
for linking, which looks (after some variable expansions) like this:<br />
$(CC) $(LDFLAGS) $^ $(LOADLIBES) $(LDLIBS) -o $@<br />
LDFLAGS are options to the linker, such as -Ldirectory. LOADLIBES and LDLIBS 6 are variables<br />
intended to contain libraries, such as -llab1. So this is a good rule, except for one thing: it<br />
uses $(CC) to link, and CC is by default gcc, not g++. But if you change the definition of CC, the<br />
implicit rule works also for <strong>C++</strong>:<br />
# Define the linker<br />
CC = g++<br />
1.4 Generating Prerequisites Automatically<br />
While you’re working with a project the prerequisites are often changed. New #include directives<br />
are added and others are removed. In order for make to have correct information of the<br />
dependencies, the makefile must be modified accordingly. The necessary modifications can be<br />
performed automatically by make itself.<br />
A3. Read the make manual, section 4.14 “Generating Prerequisites Automatically”, and implement<br />
such functionality in your makefile. Hint: Copy the %.d rule (it is not necessary that<br />
you understand everything in it) to your makefile and make suitable modifications. Do not<br />
forget to include the *.d files.<br />
The first time you run make you will get a warning about .d files that don’t exist. This<br />
is normal and not an error. Look at the .d files that are created and see that they contain<br />
the correct dependencies.<br />
One useful make facility that we haven’t mentioned earlier is wildcards: as an example,<br />
you can define the variable SRC as a list of all .cc files with SRC = $(wildcard *.cc). And<br />
from this you can get all .o files with OBJ = $(SRC:.cc=.o).<br />
From now on, you must update the makefile when you add new programs, so that you<br />
can always issue the make command in the build directory to build everything. When you<br />
add a new program, you should only have to add a rule that defines the .o-files that are<br />
necessary to link the program, and maybe add the program name to a list of executables.<br />
This is important — it will save you many hours of command typing.<br />
(Optional) Collect all object files not containing main functions into a library, and link<br />
executables against that library. If you know how to create shared libraries, use a shared<br />
library libcxx.so, otherwise create a static archive libcxx.a.<br />
1.5 Writing a Good Makefile<br />
A makefile should always be written to run in sh (Bourne Shell), not in csh. Do not use special<br />
features from, e.g., bash, zsh, or ksh. The tools used in explicit rules should be accessed through<br />
make variables, so that the user can substitute alternatives to the defaults.<br />
A makefile should also be written so that it doesn’t have to be located in the same directory<br />
as the source code. You can achieve this by defining a SRCDIR variable and make all rules refer<br />
to the source code from that prefix. Even better is to use the variable VPATH, which specifies a<br />
search path for source files. Vpath is described in section 4.5 of the make manual.<br />
6 There doesn’t seem to be any difference between LOADLIBES and LDLIBS — they always appear together and are<br />
concatenated. Use LDLIBS.
Tools for Practical <strong>C++</strong> Development 15<br />
A4. (Optional, VPATH is difficult) Create a new directory, src, for <strong>C++</strong> lab source code and copy<br />
the source files from lab 1 into it. In a parallel directory build, install the Makefile, and<br />
modify it so that you can build the programs (and possibly libraries), standing in the build<br />
directory. The all target, which should be default, should build your ltest, encode, and<br />
decode programs. No command should write to the src directory.<br />
2 Debugging<br />
Consider the following program (reciprocal.cc):<br />
#include <br />
#include // for atoi, see below<br />
double reciprocal(int i) {<br />
return 1.0 / i;<br />
}<br />
int main(int argc, char** argv) {<br />
int i = std::atoi(argv[1]); // atoi: "ascii-to-integer"<br />
double r = reciprocal(i);<br />
std::cout
16 Tools for Practical <strong>C++</strong> Development<br />
(gdb) is a command prompt.<br />
The first step is to run the program inside the debugger. Enter the command run and the program<br />
arguments. First run the program without arguments, like this:<br />
(gdb) run<br />
Starting program: /h/dd/c/.../reciprocal<br />
Program received signal SIGSEGV, Segmentation fault.<br />
*__GI_____strtol_l_internal (nptr=0x0, endptr=0x0, base=10, group=0, loc=0xb7f1c380)<br />
at strtol_l.c:298<br />
...<br />
The error occurred in the strtol l internal function, which is an internal function that atoi<br />
calls. Inspect the call stack with the where command:<br />
(gdb) where<br />
#0 *__GI_____strtol_l_internal (nptr=0x0, endptr=0x0, base=10, group=0, loc=0xb7f1c380)<br />
at strtol_l.c:298<br />
#1 0xb7e07fc6 in *__GI_strtol (nptr=0x0, endptr=0x0, base=10) at strtol.c:110<br />
#2 0xb7e051b1 in atoi (nptr=0x0) at atoi.c:28<br />
#3 0x080487b0 in main (argc=1, argv=0xbfa1be44) at reciprocal.cc:9<br />
You see that the atoi function was called from main. Use the up command (three times) to go to<br />
the main function in the call stack:<br />
...<br />
(gdb) up<br />
#3 0x080487b0 in main (argc=1, argv=0xbfa1be44) at reciprocal.cc:9<br />
9 int i = std::atoi(argv[1]); // atoi: "ascii-to-integer"<br />
Note that GDB finds the source code for main, and it shows the line where the function call<br />
occurred.<br />
You can examine the value of a variable using the print command:<br />
(gdb) print argv[1]<br />
$1 = 0x0<br />
This confirms a suspicion that the problem is indeed a 0 pointer passed to atoi. Set a breakpoint<br />
on the first line of main with the break command:<br />
(gdb) break main<br />
Breakpoint 1 at 0x80487a0: file reciprocal.cc, line 9.<br />
Now run the program with an argument:<br />
(gdb) run 5<br />
The program being debugged has been started already.<br />
Start it from the beginning? (y or n) y<br />
Starting program: /h/dd/c/.../reciprocal 5<br />
Breakpoint 1, main (argc=2, argv=0xbff7aae4) at reciprocal.cc:9<br />
9 int i = std::atoi(argv[1]); // atoi: "ascii-to-integer"<br />
The debugger has stopped at the breakpoint. Step over the call to atoi using the next command:
Tools for Practical <strong>C++</strong> Development 17<br />
(gdb) next<br />
10 double r = reciprocal(i);<br />
If you want to see what is going on inside reciprocal, step into the function with step:<br />
(gdb) step<br />
reciprocal (i=5) at reciprocal.cc:5<br />
5 return 1.0 / i;<br />
The next statement to be executed is the return statement. Check the value of i and then issue<br />
the continue command to execute to the next breakpoint (which doesn’t exist, so the program<br />
terminates):<br />
(gdb) print i<br />
$2 = 5<br />
(gdb) continue<br />
Continuing.<br />
argv[1] = 5, 1/argv[1] = 0.2<br />
Program exited normally.<br />
A5. Try the debugger by working through the example. Experiment with the commands (the<br />
table below gives short explanations of some useful commands). Commands may be<br />
abbreviated and you may use tab for command completion. Ctrl-a takes you to the<br />
beginning of the command line, Ctrl-e to the end, and so on. Use the help facility and the<br />
manual if you want to learn more.<br />
help [command] Get help about gdb commands<br />
run [args...] Run the program with arguments specified.<br />
continue Continue execution.<br />
next Step to the next line over called functions.<br />
step Step to the next line into called functions.<br />
where Print the call stack.<br />
up Go up to caller.<br />
down Go down to callee.<br />
list [nbr] List 10 lines around the current line or around line nbr (the<br />
following lines if repeated).<br />
break func Set a breakpoint on the first line of a function func.<br />
break nbr Set a breakpoint at line nbr in the current file.<br />
catch [arg] Break on events, for example exceptions that are thrown or caught<br />
(catch throw, catch catch).<br />
info [arg] Generic command for showing things about the program, for<br />
example information about breakpoints (info break).<br />
delete [nbr] Delete all breakpoints or breakpoint nbr.<br />
print expr Print the value of the expression expr.<br />
display var Print the value of variable var every time the program stops.<br />
undisplay [nbr] Delete all displays or display nbr.<br />
set var = expr Assign the value of expr to var.<br />
kill Terminate the debugger process.<br />
watch var Set a watchpoint, i.e., watch all modifications of a variable. This<br />
can be very slow but can be the best solution to find bugs when<br />
”random” pointers are used to write in the wrong memory location.
18 Tools for Practical <strong>C++</strong> Development<br />
whatis var|func Print the type of a variable or function.<br />
source file Read and execute the commands in file.<br />
make Run the make command.<br />
The debugging information that is written to the executable file can be removed using the<br />
strip program from binutils. See the manpage for more information.<br />
You may wish to disable optimization (remove the -O2 option) while you are debugging<br />
a program. GCC allows you to use a debugger together with optimization but the<br />
optimization may produce surprising results: some variables that you have defined may<br />
not exist in the object code, flow of control may move where you did not expect it, some<br />
statements may not be executed because they compute constant results, etc.<br />
To make it easier to switch between generating debugging versions and production<br />
versions of your programs, you can add something like this to your makefile:<br />
# insert this line somewhere at the top of the file<br />
DEBUG = true // or false<br />
# insert the following lines after the definitions of CXXFLAGS and LDFLAGS<br />
ifeq ($(DEBUG), true)<br />
CXXFLAGS += -ggdb<br />
LDFLAGS += -ggdb<br />
else<br />
CXXFLAGS += -O2<br />
endif<br />
You can specify the value of DEBUG on the command line (the -e options means that the<br />
command line value will override any definition in the makefile):<br />
make -e DEBUG=true<br />
3 Finding Memory-Related Errors<br />
<strong>C++</strong> is less programmer-friendly than Java. In Java, many common errors are caught by the<br />
compiler (use of uninitialized variables) or by the runtime system (addressing outside array<br />
bounds, dereferencing null pointers, etc.). In <strong>C++</strong>, errors of this kind are not caught; instead<br />
they result in erroneous results or traps during program execution. Furthermore, you get no<br />
information whatsoever about where in the program the error occurred. Since allocation of<br />
dynamic memory in <strong>C++</strong> is manual, you also have a whole new class of errors (double delete,<br />
memory leaks).<br />
Valgrind (http://www.valgrind.org) is a tool (available only under Linux and Mac OS X)<br />
that helps you to find memory-related errors at the precise locations at which they occur. It does<br />
this by emulating an x86 processor and supplying each data bit with information about the usage<br />
of the bit. This results in slower program execution, but this is more than compensated for by the<br />
reduced time spent in hunting bugs.<br />
It is very easy to use valgrind. To find the error in the reciprocal program from the previous<br />
section, you just do this (compile and link with -ggdb and without optimization):<br />
valgrind --leak-check=yes ./reciprocal<br />
==2938== Memcheck, a memory error detector<br />
==2938== Copyright (C) 2002-2010, and GNU GPL’d, by Julian Seward et al.<br />
==2938== ...<br />
==2938== Invalid read of size 1<br />
==2938== at 0x41A602F: ____strtol_l_internal (strtol_l.c:298)
Tools for Practical <strong>C++</strong> Development 19<br />
==2938== by 0x41A5DE6: strtol (strtol.c:110)<br />
==2938== by 0x41A3160: atoi (atoi.c:28)<br />
==2938== by 0x80486B9: main (reciprocal.cc:9)<br />
==2938== Address 0x0 is not stack’d, malloc’d or (recently) free’d<br />
==2938== ...<br />
==2938==<br />
==2938== HEAP SUMMARY:<br />
==2938== in use at exit: 0 bytes in 0 blocks<br />
==2938== total heap usage: 0 allocs, 0 frees, 0 bytes allocated<br />
==2938==<br />
==2938== All heap blocks were freed -- no leaks are possible<br />
==2938==<br />
==2938== For counts of detected and suppressed errors, rerun with: -v<br />
==2938== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 17 from 6)<br />
[1] 2938 segmentation fault valgrind --leak-check=yes ./reciprocal<br />
When the error occurs, you get an error message and a stack trace (and a lot of other information).<br />
In this case, it is almost the same information as when you used gdb, but this is only because the<br />
error resulted in a segmentation violation. But most errors don’t result in immediate traps:<br />
void zero(int* v, size_t n) {<br />
for (size_t i = 0; i < n; ++i)<br />
v[i] = 0;<br />
}<br />
int main() {<br />
int* v = new int[5];<br />
zero(v, 10); // oops, should have been 5<br />
// ... this error may result in a trap now or much later, or wrong results<br />
}<br />
A6. The file vgtest.cc contains a test program with two functions with common programming<br />
errors. Write more functions to check other common errors. Note that valgrind cannot find<br />
addressing errors in stack-allocated arrays (like int v[10]; v[10] = 0). Run the program<br />
under valgrind, try to interpret the output.<br />
Advice: always use valgrind to check your programs! Note that valgrind may give<br />
“false” error reports on <strong>C++</strong> programs that use STL or <strong>C++</strong> strings. See the valgrind FAQ,<br />
http://valgrind.org/docs/manual/faq.html#faq.reports<br />
4 Makefiles and Topological Sorting<br />
4.1 Introduction<br />
Targets and prerequisites of a makefile form a graph where the vertices are files and an arc<br />
between two vertices corresponds to a dependency. When make is invoked, it computes an order<br />
in which the files are to be compiled so the dependencies are satisfied. This corresponds to a<br />
topological sort of the graph.<br />
Consider a simplified makefile containing only implicit rules, i.e., lines reading target:<br />
prerequisites. You are to implement a <strong>C++</strong> program topsort that reads such a file, whose name<br />
is given on the command line, and outputs the targets to standard output in an order where<br />
the dependencies are satisfied. If the makefile contains syntax errors or cyclic dependencies, the<br />
program should exit with an error message.<br />
This is not a trivial task. The solution will be developed stepwise in the following sections.
20 Tools for Practical <strong>C++</strong> Development<br />
4.2 Graph Representation<br />
The first task is to choose a representation of the dependency graph. We will use a representation<br />
where each vertex has a list of its neighbors. This list is called an adjacency list.<br />
Consider the following makefile:<br />
D: A<br />
E: B D<br />
B: A C<br />
The dependencies in the makefile form the following graph (the graph to the left, the vertices<br />
with their adjacency lists to the right):<br />
A D<br />
B<br />
C E<br />
A graph has a list of Vertex objects (A, B, C, D, E, in the figure). A vertex is represented by an object<br />
of class Vertex, which has the attributes name (the label on the vertex) and adj (the adjacency<br />
list). In the figure, the adjacency list of vertex A contains B and D.<br />
The class definitions follow. First, class Vertex (the attribute color is used when traversing<br />
the graph, see section 4.3):<br />
class Vertex {<br />
friend class VertexList; // give VertexList access to private members<br />
private:<br />
/* create a vertex with name nm */<br />
Vertex(const std::string& nm);<br />
};<br />
std::string name; // name of the vertex<br />
VertexList adj; // list of adjacent vertices<br />
enum Color { WHITE, GRAY, BLACK };<br />
Color color; // used in the traversal algorithm<br />
In class VertexList, the adjacency list is implemented as a vector of pointers to other vertices<br />
(vptrs). The functions top sort and dfs visit are described in section 4.3.<br />
struct cyclic {}; // exception type, the graph is cyclic<br />
class VertexList {<br />
public:<br />
/* create an empty vertex list, destroy the list */<br />
VertexList();<br />
~VertexList();<br />
/* insert a vertex with the label name in the list */<br />
void add_vertex(const std::string& name);<br />
A<br />
B<br />
C<br />
D<br />
E
Tools for Practical <strong>C++</strong> Development 21<br />
/* insert an arc from the vertex ‘from’ to the vertex ‘to’ */<br />
void add_arc(const std::string& from, const std::string& to);<br />
/* return the vertex names in reverse topological order */<br />
std::stack top_sort() throw(cyclic);<br />
/* print the vertex list (for debugging) */<br />
void debugPrint() const;<br />
private:<br />
void insert(Vertex* v); // insert the vertex pointer v in vptrs, if<br />
// it’s not already there<br />
Vertex* find(const std::string& name) const; // return a pointer to the<br />
// vertex named name, 0 if not found<br />
void dfs_visit(Vertex* v, std::stack& result) throw(cyclic);<br />
// visit v in the traversal algorithm<br />
};<br />
std::vector vptrs; // vector of pointers to vertices<br />
VertexList(const VertexList&); // forbid copying<br />
VertexList& operator=(const VertexList&);<br />
Finally, the class Graph is merely a typedef for a VertexList:<br />
Notes:<br />
typedef VertexList Graph;<br />
• add vertex in VertexList should create a new Vertex object and add it to the vertex list.<br />
It should do nothing if a vertex with that name already is present in the list.<br />
• add arc should insert to in from’s adjacency list. It should start with inserting from and<br />
to in the vertex list — in that way arcs and vertices may be added in arbitrary order.<br />
• find is used in add vertex and in insert.<br />
• The VertexList destructor is quite difficult to write correctly, since you have to be careful<br />
so no Vertex object is deleted more than once. It is easiest to start by removing pointers<br />
from the adjacency lists so the graph isn’t circular, before deleting the objects. — You may<br />
treat the destructor as optional, if you wish.<br />
• Advice: draw your own picture of a graph structure, with all objects, vptrs and pointers.<br />
A7. The classes are in the files vertex.h, vertex.cc, vertexlist.h, vertexlist.cc, graph.h. Implement<br />
the classes, except top sort and dfs visit, and test using the program graph test.cc.<br />
Comment out the lines in the function testGraph that check the top sort algorithm.<br />
4.3 Topological Sort<br />
A topological ordering of the vertices of a graph is essentially obtained by a depth-first search<br />
from all vertices. (There are other algorithms for topological sorting. If you prefer another<br />
algorithm you are free to use that one instead.) The topological ordering is not necessarily unique.<br />
For instance, the vertices of the graph in the preceding section may be ordered like this:<br />
A C B D E or<br />
C A D B E or . . .<br />
The algorithm is described below, in pseudo-code. The algorithm produces the vertices in reverse<br />
topological order, which is why we use a stack for the result.
22 Tools for Practical <strong>C++</strong> Development<br />
The algorithm should detect cycles in the graph. If that happens, your function should throw<br />
a cyclic exception. The cycle detection is not described in the algorithm; you must add that<br />
yourself.<br />
Topological-Sort(Graph G) {<br />
/* 1. initialize the graph and the result stack */<br />
for each vertex v in G<br />
v.color = WHITE;<br />
Stack result;<br />
/* 2. search from all unvisited vertices */<br />
for each vertex v in G<br />
if (v.color == WHITE)<br />
DFS-visit(v, result);<br />
}<br />
DFS-visit(Vertex v, Stack result) {<br />
/* 1. white vertex v is just discovered, color it gray */<br />
v.color = GRAY;<br />
/* 2. recursively visit all unvisited adjacent nodes */<br />
for each vertex w in v.adj<br />
if (w.color == WHITE)<br />
DFS-visit(w);<br />
/* 3. finish vertex v (color it black and output it) */<br />
v.color = BLACK;<br />
result.push(v.name);<br />
}<br />
A8. Implement top sort and dfs visit and test the program.<br />
4.4 Parsing the Makefile<br />
A9. Finally, you are ready to solve the last part of the assignment: to read a makefile, build a<br />
graph, sort it topologically and print the results. The input to the program is a makefile<br />
containing lines of the type:<br />
target: prereq1 prereq2 ...<br />
The names of the target and the prerequisites are strings. The list of prerequisites may<br />
be empty. The program should perform reasonable syntax checks (the most important is<br />
probably that a line must contain a target).<br />
The file topsort.cc contains a test program. Implement the function buildGraph and<br />
test. The files test[1-5].txt contain test cases (the same that were used in the graph test<br />
program in assignment A8), but you are encouraged to also test other cases.
Strings and Streams 23<br />
3 Strings and Streams<br />
Objective: to practice using the string class and the stream classes.<br />
Read:<br />
• Book: strings, streams, function templates.<br />
1 Class string<br />
1.1 Introduction<br />
In C, a string is a null-terminated array of characters. This representation is the cause of many<br />
errors: overwriting array bounds, trying to access arrays through uninitialized or incorrect<br />
pointers, and leaving dangling pointers after an array has been deallocated. The <br />
library contains some useful operations on C-strings, such as copying and comparing strings.<br />
<strong>C++</strong> strings hide the physical representation of the sequence of characters. A <strong>C++</strong> string<br />
object knows its starting location in memory, its contents (the characters), its length, and the<br />
length to which it can grow before it must resize its internal data buffer (i.e., its capacity).<br />
The exact implementation of the string class is not defined by the <strong>C++</strong> standard. The<br />
specification in the standard is intended to be flexible enough to allow different implementations,<br />
yet guarantee predictable behavior.<br />
The string identifier is not actually a class, but a typedef for a specialized template:<br />
typedef std::basic_string string;<br />
So string is a string containing characters of the type char. There is another standard string<br />
specialization, wstring, which describes strings of characters of the type wchar t. wchar t is the<br />
type representing “wide characters”. The standard does not specify the size of these characters;<br />
usually it’s 16 or 32 bits (32 in gcc). In this lab we will ignore all “internationalization” problems<br />
and assume that all characters fit in one byte. Strings in <strong>C++</strong> are not limited to the classes string<br />
and wstring; you may specify strings of “characters” of any type, such as int. However, this is<br />
discouraged: there are other sequence containers that are better suited for non-char types.<br />
Since we’re only interested in the string specialization of basic string, you may wonder why<br />
we mention basic string at all? There are two reasons: First, std::basic string is all the<br />
compiler knows about in all but the first few translation passes. Hence, diagnostic messages only<br />
mention std::basic string. Secondly, it is important for the general understanding of the<br />
ideas behind the string method implementations to have a grasp of the basics of a <strong>C++</strong> concept<br />
called character traits, i.e., character “characteristics”. For now, we will just show parts of the<br />
definition of basic string:<br />
/* Standard header */<br />
namespace std {<br />
template<br />
class basic_string {<br />
// types:<br />
typedef traits_t traits_type;<br />
typedef typename traits_t::char_type char_type;<br />
typedef unsigned long size_type;<br />
static const size_type npos = -1;<br />
// much more<br />
};<br />
}
24 Strings and Streams<br />
char type is the type of the characters in the string. size type is a type used for indexing in the<br />
string. npos (“no position”) indicates a position beyond the end of the string; it may be returned<br />
by functions that search for characters in a string. Since size type is unsigned, npos represents<br />
positive infinity.<br />
1.2 Operations on Strings<br />
We now show some of the most important operations on strings (actually basic string’s). The<br />
operations are both member functions and global helper functions. We start with the member<br />
functions (for formatting reasons, some parameters of type size type have been specified as<br />
int):<br />
class string {<br />
public:<br />
/*** construction ***/<br />
string(); // create an empty string<br />
string(const string& s); // create a copy of s<br />
string(const char* cs); // create a string with the characters from cs<br />
string(const string& s, int start, int n); // create a string with n<br />
// characters from s, starting at position start<br />
string(int n, char ch); // create a string with n copies of ch<br />
}<br />
/*** information ***/<br />
int size(); // number of characters in the string<br />
int capacity(); // the string can hold this many characters<br />
// before resizing is necessary<br />
void reserve(int n); // set the capacity to n, resize if necessary<br />
/*** character access ***/<br />
const char& operator[](size_type pos) const;<br />
char& operator[](size_type pos);<br />
/*** substrings */<br />
string substr(size_type start, int n); // the substring starting at<br />
// position start, containing n characters<br />
/*** finding things ***/<br />
// see below<br />
/*** inserting, replacing, and removing ***/<br />
void insert(size_type pos, const string& s); // insert s at position pos<br />
void append(const string& s); // append s at the end<br />
void replace(size_type start, int n, const string& s); // replace n<br />
// characters starting at pos with s<br />
void erase(size_type start = 0, int n = npos); // remove n characters<br />
// starting at pos<br />
/*** assignment and concatenation ***/<br />
string& operator=(const string& s);<br />
string& operator=(const char* cs);<br />
string& operator=(char ch);<br />
string& operator+=(const string& s); // also for const char* and char<br />
/*** access to C-string representation ***/<br />
const char* c_str();<br />
• Note that there is no constructor string(char). Use string(1, char) instead.
Strings and Streams 25<br />
• The subscript functions operator[] do not check for a valid index. There are similar at()<br />
functions that do check, and that throw out of range if the index is not valid.<br />
• The substr() member function takes a starting position as its first argument and the number<br />
of characters as the second argument. This is different from the substring() method in<br />
java.lang.String, where the second argument is the end position of the substring.<br />
• There are overloads of most of the functions. You can use C-strings or characters as<br />
parameters instead of strings.<br />
• There is a bewildering variety of member functions for finding strings, C-strings or characters.<br />
They all return npos if the search fails. The functions have the following signature (the<br />
string parameter may also be a C-string or a character):<br />
size_type FIND_VARIANT(const string& s, size_type pos = 0) const;<br />
s is the string to search for, pos is the starting position. (The default value for pos is npos,<br />
not 0, in the functions that search backwards).<br />
The “find variants” are find (find a string, forwards), rfind (find a string, backwards),<br />
find first of and find last of (find one of the characters in a string, forwards or backwards),<br />
find first not of and find last not of (find a character that is not one of the<br />
characters in a string, forwards or backwards).<br />
Example:<br />
void f() {<br />
string s = "accdcde";<br />
int i1 = s.find("cd"); // i1 = 2 (s[2]==’c’ && s[3]==’d’)<br />
int i2 = s.rfind("cd"); // i2 = 4 (s[4]==’c’ && s[5]==’d’)<br />
int i3 = s.find_first_of("cd"); // i3 = 1 (s[1]==’c’)<br />
int i4 = s.find_last_of("cd"); // i4 = 5 (s[5]==’d’)<br />
int i5 = s.find_first_not_of("cd"); // i5 = 0 (s[0]!=’c’ && s[0]!=’d’)<br />
int i6 = s.find_last_not_of("cd"); // i6 = 6 (s[6]!=’c’ && s[6]!=’d’)<br />
}<br />
The global overloaded operator functions are for concatenation (operator+) and for comparison<br />
(operator==, operator
26 Strings and Streams<br />
A2. The Sieve of Eratosthenes is an ancient method for finding all prime numbers less than<br />
some fixed number M. It starts by enumerating all numbers in the interval [0, M] and<br />
assuming they are all primes. The first two numbers, 0 and 1 are marked, as they are not<br />
primes. The algorithm then starts with the number 2, marks all subsequent multiples of<br />
2 as composites, and repeats the process for the next prime candidate. When the initial<br />
sequence is exhausted, the numbers not marked as composites are the primes in [0, M].<br />
In this assignment you shall use a string for the enumeration. Initialize a string with<br />
appropriate length to PPPPP...PPP. The characters at positions that are not prime numbers<br />
should be changed to C. Write a test program that prints the prime numbers between 1<br />
and 200 and also the largest prime that is less than 100,000.<br />
Example with the numbers 0–27:<br />
1 2<br />
0123456789012345678901234567<br />
Initial: CCPPPPPPPPPPPPPPPPPPPPPPPPPP<br />
Find 2, mark 4,6,8,...: CCPPCPCPCPCPCPCPCPCPCPCPCPCP<br />
Find 3, mark 6,9,12,...: CCPPCPCPCCCPCPCCCPCPCCCPCPCC<br />
Find 5, mark 10,15,20,25: CCPPCPCPCCCPCPCCCPCPCCCPCCCC<br />
Find 7, mark 14,21: CCPPCPCPCCCPCPCCCPCPCCCPCCCC<br />
...<br />
2 The iostream Library<br />
2.1 Input/Output of User-Defined Objects<br />
We have already used most of the stream classes: istream, ifstream, and istringstream for<br />
reading, and ostream, ofstream, and ostringstream for writing. There are also iostream’s that<br />
allow both reading and writing. The stream classes are organized in the following (simplified)<br />
generalization hierarchy:<br />
ios_base<br />
ios<br />
istream ostream<br />
iostream<br />
ifstream istringstream fstream stringstream ofstream ostringstream<br />
The classes ios base and ios contain, among other things, information about the stream state.<br />
There are, for example, functions bool good() (the state is ok) and bool eof() (end-of-file has<br />
been reached). There is also a conversion operator void*() that returns nonzero if the state is<br />
good, and a bool operator!() that returns nonzero if the state is not good. We have used these<br />
operators with input files, writing for example while (infile >> ch) or if (! infile).<br />
Now, we are interested in reading objects of a class A from an istream using operator>> and<br />
in writing objects to an ostream using operator(std::istream& is, A& aobj);<br />
std::ostream& operator
Strings and Streams 27<br />
Note:<br />
• Since the parameters are of type istream or ostream, the operator functions work for both<br />
file streams and string streams.<br />
• The functions are global “helper” functions. 7<br />
• Both operators return the stream object that is the first argument to the operator, enabling<br />
us to write, e.g., cout c;<br />
if (c == ’(’) {<br />
is >> re >> c;<br />
if (c == ’,’) {<br />
is >> im >> c;<br />
if (c == ’)’) {<br />
cplx = complex(re, im);<br />
}<br />
else {<br />
// format error, set the stream state to ’fail’<br />
is.setstate(ios_base::failbit);<br />
}<br />
}<br />
else if (c == ’)’) {<br />
cplx = complex(re, num_type(0));<br />
}<br />
else {<br />
// format error, set the stream state to ’fail’<br />
is.setstate(ios_base::failbit);<br />
}<br />
}<br />
else {<br />
is.putback(c);<br />
is >> re;<br />
cplx = complex(re, num_type(0));<br />
}<br />
return is;<br />
}<br />
}<br />
7 The alternative implementation, to write operator>> and operator
28 Strings and Streams<br />
A3. The files date.h, date.cc, and date test.cc describe a simple date class. Implement the<br />
class and add operators for input and output of dates. Dates should be output in the<br />
form 2012-01-10. The input operator should accept dates in the same format. (You may<br />
consider dates written like 2012-1-10 and 2012 -001 - 10 as legal, if you wish.)<br />
2.2 String Streams<br />
The string stream classes (istringstream and ostringstream) function as their “file” counterparts<br />
(ifstream and ofstream). The only difference is that characters are read from/written to a<br />
string instead of a file.<br />
A4. In Java, the class Object defines a method toString() that is supposed to produce a<br />
“readable representation” of an object. This method can be overridden in subclasses.<br />
Write a <strong>C++</strong> function toString for the same purpose. Examples of usage:<br />
double d = 1.234;<br />
Date today;<br />
std::string sd = toString(d);<br />
std::string st = toString(today);<br />
You may assume that the argument object can be output with
<strong>Programming</strong> with the STL 29<br />
4 <strong>Programming</strong> with the STL<br />
Objective: to practice using the STL container classes and algoritms, with emphasis on efficiency.<br />
You will also learn more about operator overloading and STL iterators.<br />
Read:<br />
• Book: STL containers and STL algorithms. Operator overloading, iterators.<br />
1 Domain Name Servers and the STL Container Classes<br />
On the web, computers are identified by IP addresses (32-bit numbers). Humans identify<br />
computers by symbolic names. A Domain Name Server (DNS) is a component that translates<br />
a symbolic name to the corresponding IP address. The DNS system is a very large distributed<br />
database that contains billions (or at least many millions) of IP addresses and that receives billions<br />
of lookup requests every day. Furthermore, the database is continuously updated. If you wish<br />
to know more about name servers, read the introduction at http://computer.howstuffworks.<br />
com/dns.htm.<br />
In this lab, you will implement a local DNS in <strong>C++</strong>. With “local”, we mean that the DNS does<br />
not communicate with other name servers; it can only perform translations using its own database.<br />
The goal is to develop a time-efficient name server, and you will study different implementation<br />
alternatives. You shall implement three versions (classes) of the name server, using different STL<br />
container classes. All three classes should implement the interface NameServerInterface:<br />
typedef std::string HostName;<br />
typedef unsigned int IPAddress;<br />
const IPAddress NON_EXISTING_ADDRESS = 0;<br />
class NameServerInterface {<br />
public:<br />
virtual ~NameServerInterface() {}<br />
virtual void insert(const HostName&, const IPAddress&) = 0;<br />
virtual bool remove(const HostName&) = 0;<br />
virtual IPAddress lookup(const HostName&) const = 0;<br />
};<br />
insert() inserts a name/address pair into the database, without checking if the name already<br />
exists. remove() removes a name/address pair and returns true if the name exists; it does<br />
nothing and returns false if the name doesn’t exist. lookup() returns the IP address for a<br />
specified name, or NON EXISTING ADDRESS if the name doesn’t exist.<br />
Since this is an STL lab, you shall use STL containers and algorithms as much as possible.<br />
This means, for example, that you are not allowed to use any for or while statements in your<br />
solutions. (There is one exception: you may use a for or while statement in the hash function,<br />
see assignment A1c.)<br />
A1. The definition of the class NameServerInterface is in the file nameserverinterface.h.<br />
a) Implement a class VectorNameServer that uses an unsorted vector to store the name/<br />
address pairs. Use linear search to search for an element. The intent is to demonstrate<br />
that such an implementation is inefficient.<br />
Use the STL find if algorithm to search for a host name. Consider carefully the third<br />
parameter to the algorithm: it should be a function or a functor that compares an<br />
element in the vector (a name/address pair) with a name. If you use a functor, the<br />
call may look like this:
30 <strong>Programming</strong> with the STL<br />
find_if(v.begin(), v.end(), bind2nd(first_equal(), name))<br />
first equal is a functor (a class) that takes two parameters, a pair and a host name,<br />
and compares the first component of the pair with the name.<br />
b) Implement a class MapNameServer that uses a map to store the name/address pairs.<br />
The average search time in this implementation will be considerably better than that<br />
for the vector implementation.<br />
c) It is possible that the map implementation of the name server is sufficiently efficient,<br />
but you shall also try a third implementation. Implement a class HashNameServer<br />
that uses a hash table to store the name/address pairs. Use a vector of vector’s to<br />
store the name/address pairs. 8<br />
The hash table implementation is open for experimentation: you must select an<br />
appropriate size for the hash table and a suitable hash function. 9 It can also be<br />
worthwhile to exploit the fact that most (in fact all, in the test program) computer<br />
names start with “www.”. Without any greater difficulty, you should be able to<br />
improve the search times that you obtained in the map implementation by about 50%.<br />
Use the program nstest.cc to check that the insert/remove/lookup functions work correctly.<br />
Then, use the program nstime.cc to measure and print the search times for the three<br />
different implementations, using the file nameserverdata.txt as input (the file contains<br />
25,143 name/address pairs 10 ).<br />
A2. Examples of average search times in milliseconds for a name server with 25,143 names are<br />
in the following table.<br />
vector 0.185<br />
map 0.00066<br />
hash 0.00029<br />
25,143 100,000<br />
Search the Internet for information about search algorithms for different data structures, or<br />
use your knowledge from the algorithms and data structures course, and fill in the blanks<br />
in the table. Write a similar table for your own implementation.<br />
2 Bitsets, Subscripting, and Iterators<br />
2.1 Bitsets<br />
To manipulate individual bits in a word, <strong>C++</strong> provides the bitwise operators & (and), | (or),<br />
^ (exclusive or), and the shift operators > (shift right). The STL class bitset<br />
generalizes this notion and provides operations on a set of N bits indexed from 0 through N-1.<br />
N may be arbitrary large, indicating that the bitset may occupy many words.<br />
For historical reasons, bitset doesn’t provide any iterators. We will develop a simplified<br />
version of the bitset class where all the bits fit in one word, and extend the class with iterators<br />
so it becomes possible to use the standard STL algorithms with the class. Our goal is to provide<br />
enough functionality to make the following program work correctly:<br />
8 GNU STL contains a class hash map (in newer versions unordered map), a map implementation that uses a hash table.<br />
You should not use any of these classes in this assignment.<br />
9 Note that a good hash function should take all (or at least many) of the characters of a string into account and that<br />
"abc" and "cba" should have different hash codes. For instance, a hash function that merely adds the first and last<br />
characters of a string is not acceptable.<br />
10 It was difficult to find a file with real name/address pairs, so the computer names are common words from a dictionary<br />
file, with www. first and .se or similar last. The IP addresses are running numbers.
<strong>Programming</strong> with the STL 31<br />
int main() {<br />
using namespace std;<br />
using namespace cpp_lab4;<br />
}<br />
// Define an empty bitset, set some bits, print the bitset<br />
Bitset bs;<br />
bs[3] = true;<br />
bs[0] = bs[3];<br />
copy(bs.begin(), bs.end(), ostream_iterator(cout));<br />
cout
32 <strong>Programming</strong> with the STL<br />
An outline of the class (BitsetStorage is the type of the word that contains the bits):<br />
class BitReference {<br />
public:<br />
BitReference(Bitset::BitStorage* pb, size_t p)<br />
: p_bits(pb), pos(p) {}<br />
// Operations will be added later<br />
protected:<br />
Bitset::BitStorage* p_bits; // pointer to the word containing bits<br />
size_t pos; // position of the bit in the word<br />
};<br />
Now, operator[] in class Bitset may be defined as follows:<br />
BitReference operator[](size_t pos) {<br />
return BitReference(&bits, pos);<br />
}<br />
The actual bit fiddling is performed in the BitReference class. In order to see what we need to<br />
implement in this class, we must study the results of some expressions involving operator[]:<br />
bs[3] = true; // bs.operator[](3) = true; =><br />
// BitReference(&bs.bits,3) = true; =><br />
// BitReference(&bs.bits,3).operator=(true);<br />
From this follows that we must implement the following operator function in BitReference:<br />
BitReference& operator=(bool x); // for bs[i] = x<br />
This function should set the bit referenced by the BitReference object to the value of x (just like<br />
the set function in the original Bitset class). When we investigate (do this on your own) other<br />
possibilities to use operator[] we find that we also need the following functions:<br />
BitReference& operator=(const BitReference& bsr); // for bs[i] = bs[j]<br />
operator bool() const; // for x = bs[i]<br />
A4. Use the files bitset.h, bitset.cc, bitreference.h, bitreference.cc, and bitsettest1.cc. Implement<br />
the functions in bitreference.cc and test. It is a good idea to execute the program line<br />
by line under control of gdb, even if your implementation is correct, so you can follow<br />
the execution closely and see what happens (turn off the -O2 switch on the compilation<br />
command line if you do this).<br />
2.3 Iterators<br />
From one of the OH slides: “An iterator “points” to a value. All iterators are DefaultConstructible<br />
and Assignable and support ++it and it++.” A ForwardIterator should additionally be<br />
EqualityComparable and support *it, both for reading and writing via the iterator.<br />
The most important requirement is that an iterator should point to a value. A BitsetIterator<br />
should point to a Boolean value, and we already have something that does this: the class<br />
BitReference! The additional requirements (++, equality test, and *) can be implemented in a<br />
derived class. It will look like this: 11<br />
11 This class only contains the constructs that are necessary for the bitset test program. For example, we have not<br />
implemented postfix ++ and comparison with ==.
<strong>Programming</strong> with the STL 33<br />
class BitsetIterator : public BitReference {<br />
public:<br />
// ... typedefs, see below<br />
};<br />
BitsetIterator(Bitset::BitsetStorage* pb, size_t p)<br />
: BitReference(pb, p) {}<br />
BitsetIterator& operator=(const BitsetIterator& bsi);<br />
bool operator!=(const BitsetIterator& bsi) const;<br />
BitsetIterator& operator++();<br />
BitReference operator*();<br />
The assignment operator must be redefined so it copies the member variables (as opposed to the<br />
assignment operator in the base class BitReference, which sets a bit in the bitset). The typedefs<br />
are necessary to make the iterator STL-compliant, i.e., to make it possible to use the iterator with<br />
the standard STL algorithms. The only typedef that isn’t obvious is:<br />
typedef std::forward_iterator_tag iterator_category;<br />
This is an example of an iterator tag, and it informs the compiler that the iterator is a forward<br />
iterator.<br />
A5. Implement the member functions in bitsetiterator.cc. Uncomment the lines in bitset.h and<br />
bitset.cc that have to do with iterators, implement the begin() and end() functions. Use<br />
the program bitsettest2.cc to test your classes.