+ -
Notes for current slide
Notes for next slide

5CCYB041

OBJECT-ORIENTED PROGRAMMING

Week 2, session 2

More string handling
Basic debugging

1

Picking up where we left off

We continue working on our DNA shotgun sequencing project

You can find the most up to date version in the project's solution/ folder

Make sure your code is up to date now!

2

Computing the overlap

Let's imagine that this is the current estimate of the sequence:

CTGAATGCTTGGGCTGAAAGGGCGCGAGACGTATTCCCCGGTTGCAGACG

and this is a candidate fragment:

CCCTCATCACAACTGAATGCTTGGGCTGAA

We could compute the overlap by comparing the overlapping region, starting from the smallest overlap:

3

Computing the overlap

Let's imagine that this is the current estimate of the sequence:

CTGAATGCTTGGGCTGAAAGGGCGCGAGACGTATTCCCCGGTTGCAGACG

and this is a candidate fragment:

CCCTCATCACAACTGAATGCTTGGGCTGAA

We could compute the overlap by comparing the overlapping region, starting from the smallest overlap:

CTGAATGCTTGGGCTGAAAGGGCGCGAGACGTATTCCCCGGTTGC
CCCTCATCACAACTGAATGCTTGGGCTGAA
4

Computing the overlap

Let's imagine that this is the current estimate of the sequence:

CTGAATGCTTGGGCTGAAAGGGCGCGAGACGTATTCCCCGGTTGCAGACG

and this is a candidate fragment:

CCCTCATCACAACTGAATGCTTGGGCTGAA

We could compute the overlap by comparing the overlapping region, starting from the smallest overlap:

CTGAATGCTTGGGCTGAAAGGGCGCGAGACGTATTCCCCGGTTGC
CCCTCATCACAACTGAATGCTTGGGCTGAA
5

Computing the overlap

Let's imagine that this is the current estimate of the sequence:

CTGAATGCTTGGGCTGAAAGGGCGCGAGACGTATTCCCCGGTTGCAGACG

and this is a candidate fragment:

CCCTCATCACAACTGAATGCTTGGGCTGAA

We could compute the overlap by comparing the overlapping region, starting from the smallest overlap:

CTGAATGCTTGGGCTGAAAGGGCGCGAGACGTATTCCCCGGTTGC
CCCTCATCACAACTGAATGCTTGGGCTGAA
6

Computing the overlap

Let's imagine that this is the current estimate of the sequence:

CTGAATGCTTGGGCTGAAAGGGCGCGAGACGTATTCCCCGGTTGCAGACG

and this is a candidate fragment:

CCCTCATCACAACTGAATGCTTGGGCTGAA

We could compute the overlap by comparing the overlapping region, starting from the smallest overlap:

CTGAATGCTTGGGCTGAAAGGGCGCGAGACGTATTCCCCGGTTGC
CCCTCATCACAACTGAATGCTTGGGCTGAA
7

Computing the overlap

Let's imagine that this is the current estimate of the sequence:

CTGAATGCTTGGGCTGAAAGGGCGCGAGACGTATTCCCCGGTTGCAGACG

and this is a candidate fragment:

CCCTCATCACAACTGAATGCTTGGGCTGAA

We could compute the overlap by comparing the overlapping region, starting from the smallest overlap:

CTGAATGCTTGGGCTGAAAGGGCGCGAGACGTATTCCCCGGTTGC
CCCTCATCACAACTGAATGCTTGGGCTGAA

We have a match with an overlap of 5 bases!

  • should we stop now?
8

Computing the overlap

Let's imagine that this is the current estimate of the sequence:

CTGAATGCTTGGGCTGAAAGGGCGCGAGACGTATTCCCCGGTTGCAGACG

and this is a candidate fragment:

CCCTCATCACAACTGAATGCTTGGGCTGAA

We could compute the overlap by comparing the overlapping region, starting from the smallest overlap:

CTGAATGCTTGGGCTGAAAGGGCGCGAGACGTATTCCCCGGTTGC
CCCTCATCACAACTGAATGCTTGGGCTGAA

No! If we carry on, we'll find a larger overlap

9

Computing the overlap

A better strategy is to start with the biggest overlap first, and gradually reduce it

  • that way, if we do find a match, we can stop immediately
10

Computing the overlap

A better strategy is to start with the biggest overlap first, and gradually reduce it

  • that way, if we do find a match, we can stop immediately
CTGAATGCTTGGGCTGAAAGGGCGCGAGACGTATTCCCCGGTTGC
CCCTCATCACAACTGAATGCTTGGGCTGAA
11

Computing the overlap

A better strategy is to start with the biggest overlap first, and gradually reduce it

  • that way, if we do find a match, we can stop immediately
CTGAATGCTTGGGCTGAAAGGGCGCGAGACGTATTCCCCGGTTGC
CCCTCATCACAACTGAATGCTTGGGCTGAA
12

Computing the overlap

A better strategy is to start with the biggest overlap first, and gradually reduce it

  • that way, if we do find a match, we can stop immediately
CTGAATGCTTGGGCTGAAAGGGCGCGAGACGTATTCCCCGGTTGC
CCCTCATCACAACTGAATGCTTGGGCTGAA
13

Computing the overlap

A better strategy is to start with the biggest overlap first, and gradually reduce it

  • that way, if we do find a match, we can stop immediately
CTGAATGCTTGGGCTGAAAGGGCGCGAGACGTATTCCCCGGTTGC
CCCTCATCACAACTGAATGCTTGGGCTGAA
14

Computing the overlap

A better strategy is to start with the biggest overlap first, and gradually reduce it

  • that way, if we do find a match, we can stop immediately
CTGAATGCTTGGGCTGAAAGGGCGCGAGACGTATTCCCCGGTTGC
CCCTCATCACAACTGAATGCTTGGGCTGAA
15

Computing the overlap

A better strategy is to start with the biggest overlap first, and gradually reduce it

  • that way, if we do find a match, we can stop immediately
CTGAATGCTTGGGCTGAAAGGGCGCGAGACGTATTCCCCGGTTGC
CCCTCATCACAACTGAATGCTTGGGCTGAA
16

Computing the overlap

A better strategy is to start with the biggest overlap first, and gradually reduce it

  • that way, if we do find a match, we can stop immediately
CTGAATGCTTGGGCTGAAAGGGCGCGAGACGTATTCCCCGGTTGC
CCCTCATCACAACTGAATGCTTGGGCTGAA
17

Computing the overlap

A better strategy is to start with the biggest overlap first, and gradually reduce it

  • that way, if we do find a match, we can stop immediately
CTGAATGCTTGGGCTGAAAGGGCGCGAGACGTATTCCCCGGTTGC
CCCTCATCACAACTGAATGCTTGGGCTGAA
18

Computing the overlap

A better strategy is to start with the biggest overlap first, and gradually reduce it

  • that way, if we do find a match, we can stop immediately
CTGAATGCTTGGGCTGAAAGGGCGCGAGACGTATTCCCCGGTTGC
CCCTCATCACAACTGAATGCTTGGGCTGAA
19

Computing the overlap

A better strategy is to start with the biggest overlap first, and gradually reduce it

  • that way, if we do find a match, we can stop immediately
CTGAATGCTTGGGCTGAAAGGGCGCGAGACGTATTCCCCGGTTGC
CCCTCATCACAACTGAATGCTTGGGCTGAA
20

Computing the overlap

A better strategy is to start with the biggest overlap first, and gradually reduce it

  • that way, if we do find a match, we can stop immediately
CTGAATGCTTGGGCTGAAAGGGCGCGAGACGTATTCCCCGGTTGC
CCCTCATCACAACTGAATGCTTGGGCTGAA
21

Computing the overlap

A better strategy is to start with the biggest overlap first, and gradually reduce it

  • that way, if we do find a match, we can stop immediately
CTGAATGCTTGGGCTGAAAGGGCGCGAGACGTATTCCCCGGTTGC
CCCTCATCACAACTGAATGCTTGGGCTGAA
22

Computing the overlap

A better strategy is to start with the biggest overlap first, and gradually reduce it

  • that way, if we do find a match, we can stop immediately
CTGAATGCTTGGGCTGAAAGGGCGCGAGACGTATTCCCCGGTTGC
CCCTCATCACAACTGAATGCTTGGGCTGAA

Now as soon as we find an overlap, it is guaranteed to be the largest

23

Exercises

  • Add a function to compute the overlap between the current sequence and a candidate fragment

    • make sure it works for both ends of the string
  • Use this function to identify the candidate fragment with the largest overlap with current sequence

  • Add a function to merge this candidate fragment with the current sequence, given the computed overlap

  • Use these functions to iteratively merge candidates fragments until no overlapping fragments remain

  • Check that all unmerged fragments are already contained within the sequence

24

Exercises

  • Add a function to compute the overlap between the current sequence and a candidate fragment

    • make sure it works for both ends of the string
  • Use this function to identify the candidate fragment with the largest overlap with current sequence

  • Add a function to merge this candidate fragment with the current sequence, given the computed overlap

  • Use these functions to iteratively merge candidates fragments until no overlapping fragments remain

  • Check that all unmerged fragments are already contained within the sequence

Exercise: implement code to do this

24

Computing the overlap

We create a header file overlap.h to hold our function declaration

#pragma once
#include <string>
int compute_overlap (const std::string& sequence, const std::string& fragment);
25

Computing the overlap

We create a header file overlap.h to hold our function declaration

#pragma once
#include <string>
int compute_overlap (const std::string& sequence, const std::string& fragment);
  • we return an integer representing the number of bases in the overlap
26

Computing the overlap

We create a header file overlap.h to hold our function declaration

#pragma once
#include <string>
int compute_overlap (const std::string& sequence, const std::string& fragment);
  • we return an integer representing the number of bases in the overlap
  • we chose a suitable name for our function that clearly states its purpose
27

Computing the overlap

We create a header file overlap.h to hold our function declaration

#pragma once
#include <string>
int compute_overlap (const std::string& sequence, const std::string& fragment);
  • we return an integer representing the number of bases in the overlap
  • we chose a suitable name for our function that clearly states its purpose
  • we pass both the sequence and candidate fragment as const references to strings
28

Computing the overlap

We create a header file overlap.h to hold our function declaration

#pragma once
#include <string>
int compute_overlap (const std::string& sequence, const std::string& fragment);
  • we return an integer representing the number of bases in the overlap
  • we chose a suitable name for our function that clearly states its purpose
  • we pass both the sequence and candidate fragment as const references to strings
  • we need to #include the <string> header since we are using std::string
29

Computing the overlap

We create a header file overlap.h to hold our function declaration

#pragma once
#include <string>
int compute_overlap (const std::string& sequence, const std::string& fragment);
  • we return an integer representing the number of bases in the overlap
  • we chose a suitable name for our function that clearly states its purpose
  • we pass both the sequence and candidate fragment as const references to strings
  • we need to #include the <string> header since we are using std::string
  • we use the #pragma once directive to avoid double-inclusion
30

Computing the overlap

Next, we create the overlap.cpp file to hold our function definition

#include <iostream>
#include <stdexcept>
#include <string>
#include "overlap.h"
int compute_overlap (const std::string& sequence, const std::string& fragment)
{
// this is where our code will go
}
31

Computing the overlap

Next, we create the overlap.cpp file to hold our function definition

#include <iostream>
#include <stdexcept>
#include <string>
#include "overlap.h"
int compute_overlap (const std::string& sequence, const std::string& fragment)
{
// this is where our code will go
}
  • we need to replicate the function declaration, making sure it matches exactly
32

Computing the overlap

Next, we create the overlap.cpp file to hold our function definition

#include <iostream>
#include <stdexcept>
#include <string>
#include "overlap.h"
int compute_overlap (const std::string& sequence, const std::string& fragment)
{
// this is where our code will go
}
  • we need to replicate the function declaration, making sure it matches exactly
  • we need to #include the matching header file
33

Computing the overlap

Next, we create the overlap.cpp file to hold our function definition

#include <iostream>
#include <stdexcept>
#include <string>
#include "overlap.h"
int compute_overlap (const std::string& sequence, const std::string& fragment)
{
// this is where our code will go
}
  • we need to replicate the function declaration, making sure it matches exactly
  • we need to #include the matching header file
  • we need to #include the system headers for the functionality we will be using
34

Computing the overlap

Next, we create the overlap.cpp file to hold our function definition

#include <iostream>
#include <stdexcept>
#include <string>
#include "overlap.h"
int compute_overlap (const std::string& sequence, const std::string& fragment)
{
// this is where our code will go
}
  • we need to replicate the function declaration, making sure it matches exactly
  • we need to #include the matching header file
  • we need to #include the system headers for the functionality we will be using
  • now we can worry about what will go in the body of our function
35

Computing the overlap

Now we fill out the body of our compute_overlap() function

int compute_overlap (const std::string& sequence, const std::string& fragment)
{
if (fragment.size() > sequence.size())
throw std::runtime_error ("fragment size larger than current sequence!");
std::cerr << "sequence is \"" << sequence << "\"\n";
std::cerr << "trying fragment \"" << fragment << "\"\n";
...
return 0;
}
36

Computing the overlap

Now we fill out the body of our compute_overlap() function

int compute_overlap (const std::string& sequence, const std::string& fragment)
{
if (fragment.size() > sequence.size())
throw std::runtime_error ("fragment size larger than current sequence!");
std::cerr << "sequence is \"" << sequence << "\"\n";
std::cerr << "trying fragment \"" << fragment << "\"\n";
...
return 0;
}
  • always check any assumptions and expectations upfront – this can save you a lot of frustration later...
37

Computing the overlap

Now we fill out the body of our compute_overlap() function

int compute_overlap (const std::string& sequence, const std::string& fragment)
{
if (fragment.size() > sequence.size())
throw std::runtime_error ("fragment size larger than current sequence!");
std::cerr << "sequence is \"" << sequence << "\"\n";
std::cerr << "trying fragment \"" << fragment << "\"\n";
...
return 0;
}
  • always check any assumptions and expectations upfront – this can save you a lot of frustration later...
  • during the development phase, don't hesitate to display lots of information, it will often alert you to where things might be going wrong.
38

Computing the overlap

Now we fill out the body of our compute_overlap() function

int compute_overlap (const std::string& sequence, const std::string& fragment)
{
if (fragment.size() > sequence.size())
throw std::runtime_error ("fragment size larger than current sequence!");
std::cerr << "sequence is \"" << sequence << "\"\n";
std::cerr << "trying fragment \"" << fragment << "\"\n";
...
return 0;
}
  • always check any assumptions and expectations upfront – this can save you a lot of frustration later...
  • during the development phase, don't hesitate to display lots of information, it will often alert you to where things might be going wrong.
  • we need to return an integer, so let's return a value of zero (no overlap) by default
39

Computing the overlap

Now we fill out the body of our compute_overlap() function

int compute_overlap (const std::string& sequence, const std::string& fragment)
{
if (fragment.size() > sequence.size())
throw std::runtime_error ("fragment size larger than current sequence!");
std::cerr << "sequence is \"" << sequence << "\"\n";
std::cerr << "trying fragment \"" << fragment << "\"\n";
...
return 0;
}
  • always check any assumptions and expectations upfront – this can save you a lot of frustration later...
  • during the development phase, don't hesitate to display lots of information, it will often alert you to where things might be going wrong.
  • we need to return an integer, so let's return a value of zero (no overlap) by default

Now we can focus on what happens here...

40

Computing the overlap

for (int overlap = fragment.size(); overlap > 0; --overlap) {
const auto seq_start = sequence.substr(0, overlap);
const auto frag_end = fragment.substr(fragment.size()-overlap);
if (seq_start == frag_end)
return overlap;
}
41

Computing the overlap

for (int overlap = fragment.size(); overlap > 0; --overlap) {
const auto seq_start = sequence.substr(0, overlap);
const auto frag_end = fragment.substr(fragment.size()-overlap);
if (seq_start == frag_end)
return overlap;
}

We iterate over the range of valid overlaps:

  • starting from the largest possible overlap (the size of the candidate fragment)
  • reducing the size of the overlap at each iteration
  • until the overlap is zero
42

Computing the overlap

for (int overlap = fragment.size(); overlap > 0; --overlap) {
const auto seq_start = sequence.substr(0, overlap);
const auto frag_end = fragment.substr(fragment.size()-overlap);
if (seq_start == frag_end)
return overlap;
}

We iterate over the range of valid overlaps:

  • starting from the largest possible overlap (the size of the candidate fragment)
  • reducing the size of the overlap at each iteration
  • until the overlap is zero

We extract the part of the current sequence that corresponds to that overlap

43

Computing the overlap

for (int overlap = fragment.size(); overlap > 0; --overlap) {
const auto seq_start = sequence.substr(0, overlap);
const auto frag_end = fragment.substr(fragment.size()-overlap);
if (seq_start == frag_end)
return overlap;
}

We iterate over the range of valid overlaps:

  • starting from the largest possible overlap (the size of the candidate fragment)
  • reducing the size of the overlap at each iteration
  • until the overlap is zero

We extract the part of the current sequence that corresponds to that overlap

  • to do this, we use the string substr() method
  • and likewise for the candidate sequence – though it is now the end of the string
44

Computing the overlap

for (int overlap = fragment.size(); overlap > 0; --overlap) {
const auto seq_start = sequence.substr(0, overlap);
const auto frag_end = fragment.substr(fragment.size()-overlap);
if (seq_start == frag_end)
return overlap;
}

We iterate over the range of valid overlaps:

  • starting from the largest possible overlap (the size of the candidate fragment)
  • reducing the size of the overlap at each iteration
  • until the overlap is zero

We extract the part of the current sequence that corresponds to that overlap

  • to do this, we use the string substr() method
  • and likewise for the candidate sequence – though it is now the end of the string

Finally, we check whether both substrings match

  • If so, we can immediately return the size of the current overlap
45

Computing the overlap

for (int overlap = fragment.size(); overlap > 0; --overlap) {
const auto seq_start = sequence.substr(0, overlap);
const auto frag_end = fragment.substr(fragment.size()-overlap);
if (seq_start == frag_end)
return overlap;
}

We iterate over the range of valid overlaps:

  • starting from the largest possible overlap (the size of the candidate fragment)
  • reducing the size of the overlap at each iteration
  • until the overlap is zero

We extract the part of the current sequence that corresponds to that overlap

  • to do this, we use the string substr() method
  • and likewise for the candidate sequence – though it is now the end of the string

Finally, we check whether both substrings match

  • If so, we can immediately return the size of the current overlap

Note that in this call to substr(), we only provided one argument, whereas we had provided two arguments in the previous call (position and length of the substring).


This is because the arguments for this call have been given default values:

  • if unspecified, the position is zero, and the length is the maximum value possible.
  • we will cover default arguments in C++ later in the course.
46

Computing the overlap

Here is our function definition in full:

int compute_overlap (const std::string& sequence, const std::string& fragment)
{
if (fragment.size() > sequence.size())
throw std::runtime_error ("fragment size larger than current sequence!");
std::cerr << "sequence is \"" << sequence << "\"\n";
std::cerr << "trying fragment \"" << fragment << "\"\n";
for (int overlap = fragment.size(); overlap > 0; --overlap) {
const auto seq_start = sequence.substr(0, overlap);
const auto frag_end = fragment.substr(fragment.size()-overlap);
if (seq_start == frag_end)
return overlap;
}
return 0;
}
47

Exercises

  • Add a function to compute the overlap between the current sequence and a candidate fragment

    • make sure it works for both ends of the string
  • Use this function to identify the candidate fragment with the largest overlap with current sequence

  • Add a function to merge this candidate fragment with the current sequence, given the computed overlap

  • Use these functions to iteratively merge candidates fragments until no overlapping fragments remain

  • Check that all unmerged fragments are already contained within the sequence

48

Computing the overlap

int compute_overlap (const std::string& sequence, const std::string& fragment)
{
if (fragment.size() > sequence.size())
throw std::runtime_error ("fragment size larger than current sequence!");
std::cerr << "sequence is \"" << sequence << "\"\n";
std::cerr << "trying fragment \"" << fragment << "\"\n";
for (int overlap = fragment.size(); overlap > 0; --overlap) {
const auto seq_start = sequence.substr(0, overlap);
const auto frag_end = fragment.substr(fragment.size()-overlap);
if (seq_start == frag_end)
return overlap;
}
return 0;
}

We need to modify our code to also check for overlap at the other end of the sequence

49

Computing the overlap

int compute_overlap (const std::string& sequence, const std::string& fragment)
{
if (fragment.size() > sequence.size())
throw std::runtime_error ("fragment size larger than current sequence!");
std::cerr << "sequence is \"" << sequence << "\"\n";
std::cerr << "trying fragment \"" << fragment << "\"\n";
for (int overlap = fragment.size(); overlap > 0; --overlap) {
const auto seq_start = sequence.substr(0, overlap);
const auto frag_end = fragment.substr(fragment.size()-overlap);
if (seq_start == frag_end)
return overlap;
}
return 0;
}

We can no longer return the overlap here, since there may be a larger overlap at the other end.


We need to check both ends, and return the larger of the two.

50

Computing the overlap

int compute_overlap (const std::string& sequence, const std::string& fragment)
{
if (fragment.size() > sequence.size())
throw std::runtime_error ("fragment size larger than current sequence!");
std::cerr << "sequence is \"" << sequence << "\"\n";
std::cerr << "trying fragment \"" << fragment << "\"\n";
int largest_overlap = 0;
for (int overlap = fragment.size(); overlap > 0; --overlap) {
const auto seq_start = sequence.substr(0, overlap);
const auto frag_end = fragment.substr(fragment.size()-overlap);
if (seq_start == frag_end) {
largest_overlap = overlap;
break;
}
}
...

To do this, we declare an integer variable to hold the value of the largest overlap detected so far

51

Computing the overlap

int compute_overlap (const std::string& sequence, const std::string& fragment)
{
if (fragment.size() > sequence.size())
throw std::runtime_error ("fragment size larger than current sequence!");
std::cerr << "sequence is \"" << sequence << "\"\n";
std::cerr << "trying fragment \"" << fragment << "\"\n";
int largest_overlap = 0;
for (int overlap = fragment.size(); overlap > 0; --overlap) {
const auto seq_start = sequence.substr(0, overlap);
const auto frag_end = fragment.substr(fragment.size()-overlap);
if (seq_start == frag_end) {
largest_overlap = overlap;
break;
}
}
...

Now, instead of returning the value of the current overlap, we record this value, and use the break statement to stop the enclosing loop and proceed with the following code.

52

Computing the overlap

...
if (seq_start == frag_end) {
largest_overlap = overlap;
break;
}
}
for (int overlap = fragment.size(); overlap > largest_overlap; --overlap) {
const auto seq_end = sequence.substr(sequence.size() - overlap);
const auto frag_start = fragment.substr(0, overlap);
if (seq_end == frag_start) {
largest_overlap = -overlap;
break;
}
}
return largest_overlap;
}
53

Computing the overlap

...
if (seq_start == frag_end) {
largest_overlap = overlap;
break;
}
}
for (int overlap = fragment.size(); overlap > largest_overlap; --overlap) {
const auto seq_end = sequence.substr(sequence.size() - overlap);
const auto frag_start = fragment.substr(0, overlap);
if (seq_end == frag_start) {
largest_overlap = -overlap;
break;
}
}
return largest_overlap;
}

We now check the other end of the string, using the same idea as before – with a few differences

53

Computing the overlap

...
if (seq_start == frag_end) {
largest_overlap = overlap;
break;
}
}
for (int overlap = fragment.size(); overlap > largest_overlap; --overlap) {
const auto seq_end = sequence.substr(sequence.size() - overlap);
const auto frag_start = fragment.substr(0, overlap);
if (seq_end == frag_start) {
largest_overlap = -overlap;
break;
}
}
return largest_overlap;
}

This time, we extract the substrings from the opposite ends of both the current sequence and the candidate fragment

54

Computing the overlap

...
if (seq_start == frag_end) {
largest_overlap = overlap;
break;
}
}
for (int overlap = fragment.size(); overlap > largest_overlap; --overlap) {
const auto seq_end = sequence.substr(sequence.size() - overlap);
const auto frag_start = fragment.substr(0, overlap);
if (seq_end == frag_start) {
largest_overlap = -overlap;
break;
}
}
return largest_overlap;
}

There is no point iterating all the way to zero, since we already know the size of the largest overlap from the previous loop

  • there is no point checking smaller overlaps than that!
55

Computing the overlap

...
if (seq_start == frag_end) {
largest_overlap = overlap;
break;
}
}
for (int overlap = fragment.size(); overlap > largest_overlap; --overlap) {
const auto seq_end = sequence.substr(sequence.size() - overlap);
const auto frag_start = fragment.substr(0, overlap);
if (seq_end == frag_start) {
largest_overlap = -overlap;
break;
}
}
return largest_overlap;
}

... and we record the overlap as a negative value.

  • this will be our way of representing an overlap at the tail end of the sequence
  • this is guaranteed to be the largest overlap so far, since in this for loop, we are only considering overlaps larger than encountered at the other end of the sequence
56

Computing the overlap

...
if (seq_start == frag_end) {
largest_overlap = overlap;
break;
}
}
for (int overlap = fragment.size(); overlap > largest_overlap; --overlap) {
const auto seq_end = sequence.substr(sequence.size() - overlap);
const auto frag_start = fragment.substr(0, overlap);
if (seq_end == frag_start) {
largest_overlap = -overlap;
break;
}
}
return largest_overlap;
}

Finally, we end by returning the largest overlap recorded in both loops

57

Exercises

  • Add a function to compute the overlap between the current sequence and a candidate fragment

    • make sure it works for both ends of the string
  • Use this function to identify the candidate fragment with the largest overlap with current sequence

  • Add a function to merge this candidate fragment with the current sequence, given the computed overlap

  • Use these functions to iteratively merge candidates fragments until no overlapping fragments remain

  • Check that all unmerged fragments are already contained within the sequence

58

Identifying fragment with the largest overlap

...
int biggest_overlap = 0;
int fragment_with_biggest_overlap = -1;
for (int n = 0; n < fragments.size(); ++n) {
const auto overlap = compute_overlap (sequence, fragments[n]);
if (std::abs (biggest_overlap) < std::abs (overlap)) {
biggest_overlap = overlap;
fragment_with_biggest_overlap = n;
}
}
if (fragment_with_biggest_overlap >= 0)
std::cerr << "fragment with biggest overlap is at index "
<< fragment_with_biggest_overlap
<< ", overlap = " << biggest_overlap << "\n";
write_sequence (args[2], sequence);
}
59

Identifying fragment with the largest overlap

...
int biggest_overlap = 0;
int fragment_with_biggest_overlap = -1;
for (int n = 0; n < fragments.size(); ++n) {
const auto overlap = compute_overlap (sequence, fragments[n]);
if (std::abs (biggest_overlap) < std::abs (overlap)) {
biggest_overlap = overlap;
fragment_with_biggest_overlap = n;
}
}
if (fragment_with_biggest_overlap >= 0)
std::cerr << "fragment with biggest overlap is at index "
<< fragment_with_biggest_overlap
<< ", overlap = " << biggest_overlap << "\n";
write_sequence (args[2], sequence);
}

We iterate over all candidate fragments

  • we use a regular for loop (rather than a range-based for loop) since we want to keep track of the index of the fragment
60

Identifying fragment with the largest overlap

...
int biggest_overlap = 0;
int fragment_with_biggest_overlap = -1;
for (int n = 0; n < fragments.size(); ++n) {
const auto overlap = compute_overlap (sequence, fragments[n]);
if (std::abs (biggest_overlap) < std::abs (overlap)) {
biggest_overlap = overlap;
fragment_with_biggest_overlap = n;
}
}
if (fragment_with_biggest_overlap >= 0)
std::cerr << "fragment with biggest overlap is at index "
<< fragment_with_biggest_overlap
<< ", overlap = " << biggest_overlap << "\n";
write_sequence (args[2], sequence);
}

We use our new function to compute the overlap for the current candidate fragment

  • we store the result in a block scope variable
  • we declare this variable const since we have no intention of modifying it any further
    • this is good practice as we can rely on the compiler to check that we do not trip ourselves up!
  • we allow the compiler to deduce the type using the auto keyword
    • the compiler will assume we want to use the type returned by compute_overlap()
    • we could just easily have declared it as int...
61

Identifying fragment with the largest overlap

...
int biggest_overlap = 0;
int fragment_with_biggest_overlap = -1;
for (int n = 0; n < fragments.size(); ++n) {
const auto overlap = compute_overlap (sequence, fragments[n]);
if (std::abs (biggest_overlap) < std::abs (overlap)) {
biggest_overlap = overlap;
fragment_with_biggest_overlap = n;
}
}
if (fragment_with_biggest_overlap >= 0)
std::cerr << "fragment with biggest overlap is at index "
<< fragment_with_biggest_overlap
<< ", overlap = " << biggest_overlap << "\n";
write_sequence (args[2], sequence);
}

We can then compare the extracted portions using the == comparison operator


If they match, we can immediately return the value of the overlap

We check whether the overlap for the current fragment exceeds the biggest overlap we have detected so far

  • since we have allowed overlaps to be negative, we need to take the absolute value of the overlaps prior to comparison
  • we also need to make sure we have declared a variable to hold the value of the biggest overlap detected so far, and initialise it with the smallest value possible
62

Identifying fragment with the largest overlap

...
int biggest_overlap = 0;
int fragment_with_biggest_overlap = -1;
for (int n = 0; n < fragments.size(); ++n) {
const auto overlap = compute_overlap (sequence, fragments[n]);
if (std::abs (biggest_overlap) < std::abs (overlap)) {
biggest_overlap = overlap;
fragment_with_biggest_overlap = n;
}
}
if (fragment_with_biggest_overlap >= 0)
std::cerr << "fragment with biggest overlap is at index "
<< fragment_with_biggest_overlap
<< ", overlap = " << biggest_overlap << "\n";
write_sequence (args[2], sequence);
}

If we make through to the last iteration, there is no match, so we simply return 0;

If the overlap is the biggest so far, record its value, and the index of the corresponding fragment

  • to do this, we also need to declare a variable to hold the index of the corresponding fragment
  • we also need to initialise the value of this index. In this case, it's best to assign it an invalid value (for example, a negative index)
63

Exercises

  • Add a function to compute the overlap between the current sequence and a candidate fragment

    • make sure it works for both ends of the string
  • Use this function to identify the candidate fragment with the largest overlap with current sequence

  • Add a function to merge this candidate fragment with the current sequence, given the computed overlap

  • Use these functions to iteratively merge candidates fragments until no overlapping fragments remain

  • Check that all unmerged fragments are already contained within the sequence

64

Merge fragment with current sequence

Once we've identified the candidate fragment with the largest overlap, we need to merge it with the current sequence to produce a longer, updated sequence:

65

Merge fragment with current sequence

Once we've identified the candidate fragment with the largest overlap, we need to merge it with the current sequence to produce a longer, updated sequence:

CTGAATGCTTGGGCTGAAAGGGCGCGAGACGTATTCCCCGGTTGC
CCCTCATCACAACTGAATGCTTGGGCTGAA
65

Merge fragment with current sequence

Once we've identified the candidate fragment with the largest overlap, we need to merge it with the current sequence to produce a longer, updated sequence:

CTGAATGCTTGGGCTGAAAGGGCGCGAGACGTATTCCCCGGTTGC
CCCTCATCACAACTGAATGCTTGGGCTGAA

CCCTCATCACAACTGAATGCTTGGGCTGAAAGGGCGCGAGACGTATTCCCCGGTTGC
65

Merge fragment with current sequence

Once we've identified the candidate fragment with the largest overlap, we need to merge it with the current sequence to produce a longer, updated sequence:

CTGAATGCTTGGGCTGAAAGGGCGCGAGACGTATTCCCCGGTTGC
CCCTCATCACAACTGAATGCTTGGGCTGAA

CCCTCATCACAACTGAATGCTTGGGCTGAAAGGGCGCGAGACGTATTCCCCGGTTGC


Let's implement a function to do this. This is what our function declaration should look like. We can add it to overlap.h:

void merge (std::string& sequence, const std::string& fragment, const int overlap);
65

Merge fragment with current sequence

Next, add our function definition to overlap.cpp:

void merge (std::string& sequence, const std::string& fragment, const int overlap)
{
if (overlap < 0) {
sequence += fragment.substr (-overlap);
}
else if (overlap > 0) {
sequence = fragment.substr (0, fragment.size()-overlap) + sequence;
}
}
66

Merge fragment with current sequence

Next, add our function definition to overlap.cpp:

void merge (std::string& sequence, const std::string& fragment, const int overlap)
{
if (overlap < 0) {
sequence += fragment.substr (-overlap);
}
else if (overlap > 0) {
sequence = fragment.substr (0, fragment.size()-overlap) + sequence;
}
}
  • we concatenate the non-overlapping portion of the fragment with the sequence
  • the sign of overlap indicates which end the overlap corresponds to
    • hence whether to append or prepend the fragment to the sequence
66

Merge fragment with current sequence

Use merge() function in main():

...
if (fragment_with_biggest_overlap >= 0) {
std::cerr << "fragment with biggest overlap is at index "
<< fragment_with_biggest_overlap
<< ", overlap = " << biggest_overlap << "\n";
merge (sequence, fragments[fragment_with_biggest_overlap], biggest_overlap);
}
std::cerr << "final sequence has length " << sequence.size() << "\n";
write_sequence (args[2], sequence);
}
67

Exercises

  • Add a function to compute the overlap between the current sequence and a candidate fragment

    • make sure it works for both ends of the string
  • Use this function to identify the candidate fragment with the largest overlap with current sequence

  • Add a function to merge this candidate fragment with the current sequence, given the computed overlap

  • Use these functions to iteratively merge candidates fragments until no overlapping fragments remain

  • Check that all unmerged fragments are already contained within the sequence

68

Iteratively merge fragments into sequence

while (fragments.size()) {
std::cerr << "---------------------------------------------------\n";
std::cerr << fragments.size() << " fragments left\n";
int biggest_overlap = 0;
int fragment_with_biggest_overlap = -1;
for (int n = 0; n < static_cast<int> (fragments.size()); ++n) {
...
}
if (fragment_with_biggest_overlap < 0)
break;
if (std::abs (biggest_overlap) < 10)
break;
std::cerr << "fragment with biggest overlap is at index "
<< fragment_with_biggest_overlap
<< ", overlap = " << biggest_overlap << "\n";
merge (sequence, fragments[fragment_with_biggest_overlap], biggest_overlap);
fragments.erase (fragments.begin() + fragment_with_biggest_overlap);
}
69

Exercises

  • Add a function to compute the overlap between the current sequence and a candidate fragment

    • make sure it works for both ends of the string
  • Use this function to identify the candidate fragment with the largest overlap with current sequence

  • Add a function to merge this candidate fragment with the current sequence, given the computed overlap

  • Use these functions to iteratively merge candidates fragments until no overlapping fragments remain

  • Check that all unmerged fragments are already contained within the sequence

70

Check all unmerged fragments are already in sequence

...
fragments.erase (fragments.begin() + fragment_with_biggest_overlap);
}
int num_unmatched = 0;
for (const auto& frag : fragments) {
if (sequence.find (frag) == std::string::npos)
++num_unmatched;
}
if (num_unmatched)
std::cerr << "WARNING: " << num_unmatched << " fragments remain unmatched!\n";
std::cerr << "final sequence has length " << sequence.size() << "\n";
write_sequence (args[2], sequence);
}
71

Command-line options

Requesting additional debugging information

72

Command-line options

We've already seen examples of command-line options in other commands:

$ ls -l
$ g++ -std=c++20 shotgun.cpp -o shotgun
73

Command-line options

We've already seen examples of command-line options in other commands:

$ ls -l
$ g++ -std=c++20 shotgun.cpp -o shotgun

The purpose of a command-line option is typically to alter the behaviour of the command in some way

73

Command-line options

We've already seen examples of command-line options in other commands:

$ ls -l
$ g++ -std=c++20 shotgun.cpp -o shotgun

The purpose of a command-line option is typically to alter the behaviour of the command in some way

Can we use that to request additional debugging information?

  • it is a common practice for commands to provide an option to enable verbose mode
  • this is really useful to figure out exactly what is going wrong when running your program!
73

Command-line options

We've already seen examples of command-line options in other commands:

$ ls -l
$ g++ -std=c++20 shotgun.cpp -o shotgun

The purpose of a command-line option is typically to alter the behaviour of the command in some way

Can we use that to request additional debugging information?

  • it is a common practice for commands to provide an option to enable verbose mode
  • this is really useful to figure out exactly what is going wrong when running your program!

A typical option to enable verbose mode is -v

  • let's try to implement this in our project
73

Command-line options

There are many ways to handle command-line options

74

Command-line options

There are many ways to handle command-line options

Have a go at implementing this simple approach to detect the -v option:

  • start by checking whether the option has been provided in the arguments list
  • if it has, then:
    • take whatever action is appropriate (e.g. set a verbose variable to true)
    • remove that option from the argument list
  • process your arguments as previously
74

Command-line options

There are many ways to handle command-line options

Have a go at implementing this simple approach to detect the -v option:

  • start by checking whether the option has been provided in the arguments list
  • if it has, then:
    • take whatever action is appropriate (e.g. set a verbose variable to true)
    • remove that option from the argument list
  • process your arguments as previously

Hint: there are many ways of doing this, but you may find the new C++20 std::erase() function very useful here (if you can work out how to use it from the documentation...)

74

Detecting the -v option

Possible solution:

...
void run (std::vector<std::string>& args)
{
bool verbose = std::erase (args, "-v");
if (args.size() < 3)
throw std::runtime_error ("expected 2 arguments: input_fragments output_sequence");
...
}
75

Detecting the -v option

Possible solution:

...
void run (std::vector<std::string>& args)
{
bool verbose = std::erase (args, "-v");
if (args.size() < 3)
throw std::runtime_error ("expected 2 arguments: input_fragments output_sequence");
...
}

std::erase() will erase any occurence of the value ("-v") from the container (args), and return the number of occurences removed.

  • if no -v in the argument list, the return value is zero, which equates to false when assigned to a bool
  • if there were any occurences of -v, a non-zero value will be returned, which equates to true
75

Global variables

One problem with the previous solution is that verbose is a local variable

  • it is only accessible within the run() function
76

Global variables

One problem with the previous solution is that verbose is a local variable

  • it is only accessible within the run() function

If we call other functions, the code in these functions will have no knowledge or access to our verbose variable

76

Global variables

One problem with the previous solution is that verbose is a local variable

  • it is only accessible within the run() function

If we call other functions, the code in these functions will have no knowledge or access to our verbose variable

We could pass our verbose variable as an argument to all relevant functions

  • but this very rapidly becomes cumbersome and impractical
76

Global variables

One problem with the previous solution is that verbose is a local variable

  • it is only accessible within the run() function

If we call other functions, the code in these functions will have no knowledge or access to our verbose variable

We could pass our verbose variable as an argument to all relevant functions

  • but this very rapidly becomes cumbersome and impractical

A better option in this particular case is to use a global variable

76

Global variables

A global variable is declared outside of any function and is accessible from anywhere in the code

  • provided its declaration is encountered before the point of use within each translation unit
77

Global variables

A global variable is declared outside of any function and is accessible from anywhere in the code

  • provided its declaration is encountered before the point of use within each translation unit

In general, global variables are strongly discouraged

  • they introduce scope for state changes that may affect how different part of the program operate, and make it difficult to ensure predictable behaviour (see e.g. this article for details)
77

Global variables

A global variable is declared outside of any function and is accessible from anywhere in the code

  • provided its declaration is encountered before the point of use within each translation unit

In general, global variables are strongly discouraged

  • they introduce scope for state changes that may affect how different part of the program operate, and make it difficult to ensure predictable behaviour (see e.g. this article for details)

There are however cases where they make sense:

  • for constants (in which case they would also be declared const)
77

Global variables

A global variable is declared outside of any function and is accessible from anywhere in the code

  • provided its declaration is encountered before the point of use within each translation unit

In general, global variables are strongly discouraged

  • they introduce scope for state changes that may affect how different part of the program operate, and make it difficult to ensure predictable behaviour (see e.g. this article for details)

There are however cases where they make sense:

  • for constants (in which case they would also be declared const)
  • when they represent a truly unique entity
    • examples include std::cout, std::cerr: there can only be one of each in any program
    • in our case, there can only ever be one setting representing verbose mode!
77

Global variables

We could declare our global variable like this:

...
bool verbose = false;
...
void run (std::vector<std::string>& args)
{
verbose = std::erase (args, "-v");
...

But there are still issues with this:

  • this is declared within our shotgun.cpp file – it won't be accessible in our other cpp files
    • they won't be in the same translation unit
78

Global variables

We could declare our global variable like this:

...
bool verbose = false;
...
void run (std::vector<std::string>& args)
{
verbose = std::erase (args, "-v");
...

But there are still issues with this:

  • this is declared within our shotgun.cpp file – it won't be accessible in our other cpp files
    • they won't be in the same translation unit
  • what happens if another function declares their own local variable called verbose?
78

Variable scope

The scope of a variable determines its lifetime and where it can be accessed from.

Global scope

  • Variables are instantiated (created) before main() starts, and destroyed after main() returns
  • they are accessible from any part of the code
    • provided their declaration occurs before their point of use in the current translation unit
79

Variable scope

The scope of a variable determines its lifetime and where it can be accessed from.

Global scope

  • Variables are instantiated (created) before main() starts, and destroyed after main() returns
  • they are accessible from any part of the code
    • provided their declaration occurs before their point of use in the current translation unit

Function scope

  • Variables are instantiated when declared, and destroyed when the function returns
  • They are only accessible within their enclosing function, after their declaration
  • Each invocation of a given function will have its own version of its local variables
79

Variable scope

The scope of a variable determines its lifetime and where it can be accessed from.

Global scope

  • Variables are instantiated (created) before main() starts, and destroyed after main() returns
  • they are accessible from any part of the code
    • provided their declaration occurs before their point of use in the current translation unit

Function scope

  • Variables are instantiated when declared, and destroyed when the function returns
  • They are only accessible within their enclosing function, after their declaration
  • Each invocation of a given function will have its own version of its local variables

Block scope

  • Block scope means within a compound statement (e.g. within a for loop or if statement)
  • Variables are instantiated when declared, and destroyed at the end of the block
  • They are only accessible within their block, after their declaration
79

Variable shadowing

Variable shadowing (also called name hiding) occurs when two variables of the same name co-exist in different scopes

int count = 0; // <= global scope
...
void compute ()
{
int count = 10; // <= function scope
...
for (int count = 0; count < 20; ++count) { // <= block scope
std::cout << count << "\n";
...
}
std::cout << count << "\n";
}
80

Variable shadowing

Variable shadowing (also called name hiding) occurs when two variables of the same name co-exist in different scopes

int count = 0; // <= global scope
...
void compute ()
{
int count = 10; // <= function scope
...
for (int count = 0; count < 20; ++count) { // <= block scope
std::cout << count << "\n";
...
}
std::cout << count << "\n";
}

This is perfectly legal in C++!

80

Variable shadowing

Variable shadowing (also called name hiding) occurs when two variables of the same name co-exist in different scopes

int count = 0; // <= global scope
...
void compute ()
{
int count = 10; // <= function scope
...
for (int count = 0; count < 20; ++count) { // <= block scope
std::cout << count << "\n";
...
}
std::cout << count << "\n";
}

This is perfectly legal in C++!

The variable at function scope hides the variable of the same name at global scope

  • at this point, any reference to count is assumed to refer to the function scope variable
  • the global scope version of count still exists, but is no longer accessible by that name
81

Variable shadowing

Variable shadowing (also called name hiding) occurs when two variables of the same name co-exist in different scopes

int count = 0; // <= global scope
...
void compute ()
{
int count = 10; // <= function scope
...
for (int count = 0; count < 20; ++count) { // <= block scope
std::cout << count << "\n";
...
}
std::cout << count << "\n";
}

This is perfectly legal in C++!

Similarly, the variable at block scope hides the variable of the same name at function scope

  • at this point, any reference to count is assumed to refer to the block scope variable
  • the function and global scope versions of count still exist, but are no longer accessible by that name
82

Variable shadowing

Variable shadowing (also called name hiding) occurs when two variables of the same name co-exist in different scopes

int count = 0; // <= global scope
...
void compute ()
{
int count = 10; // <= function scope
...
for (int count = 0; count < 20; ++count) { // <= block scope
std::cout << count << "\n";
...
}
std::cout << count << "\n";
}

This is perfectly legal in C++!

This line will print the block scope version of count

83

Variable shadowing

Variable shadowing (also called name hiding) occurs when two variables of the same name co-exist in different scopes

int count = 0; // <= global scope
...
void compute ()
{
int count = 10; // <= function scope
...
for (int count = 0; count < 20; ++count) { // <= block scope
std::cout << count << "\n";
...
}
std::cout << count << "\n";
}

This is perfectly legal in C++!

... but this line will print the function scope version of count, since it is outside the block

  • the block scope version has been destroyed by that point
84

Variable shadowing

Variable shadowing allows a function to use local variables without worrying about whether some other global variable exists elsewhere with the same name

85

Variable shadowing

Variable shadowing allows a function to use local variables without worrying about whether some other global variable exists elsewhere with the same name

... but it can also lead to subtle errors!

85

Variable shadowing

Variable shadowing allows a function to use local variables without worrying about whether some other global variable exists elsewhere with the same name

... but it can also lead to subtle errors!

Name collisions are a general problem in computing

  • many projects will rely on functionality provided by other projects
  • difficult to avoid using the same names across different projects!
85

Variable shadowing

Variable shadowing allows a function to use local variables without worrying about whether some other global variable exists elsewhere with the same name

... but it can also lead to subtle errors!

Name collisions are a general problem in computing

  • many projects will rely on functionality provided by other projects
  • difficult to avoid using the same names across different projects!

Most of these problems can be avoided by:

  • not using global variables in the first place!
85

Variable shadowing

Variable shadowing allows a function to use local variables without worrying about whether some other global variable exists elsewhere with the same name

... but it can also lead to subtle errors!

Name collisions are a general problem in computing

  • many projects will rely on functionality provided by other projects
  • difficult to avoid using the same names across different projects!

Most of these problems can be avoided by:

  • not using global variables in the first place!
  • using a dedicated C++ namespace
85

C++ namespaces

86

What is a namespace?

A namespace is essentially a named scope for your function and variable declarations

87

What is a namespace?

A namespace is essentially a named scope for your function and variable declarations

We have already been using a namespace from the start: the std namespace

  • this namespace is where all the functionality provided by the C++ standard is declared
87

What is a namespace?

A namespace is essentially a named scope for your function and variable declarations

We have already been using a namespace from the start: the std namespace

  • this namespace is where all the functionality provided by the C++ standard is declared
std::vector
  • the vector class (a type) declared within the std namespace
std::cout
  • a variable (actually of type std::ostream) called cout declared within the std namespace
87

What is a namespace?

A namespace is essentially a named scope for your function and variable declarations

We have been using a namespace from the start: the std namespace

  • this namespace is where all the functionality provided by the C++ standard is declared
std::vector
  • the vector class (a type) declared within the std namespace
std::cout
  • a variable (actually of type std::ostream) called cout declared within the std namespace
  • these are both declared within the std namespace
88

What is a namespace?

A namespace is essentially a named scope for your function and variable declarations

We have been using a namespace from the start: the std namespace

  • this namespace is where all the functionality provided by the C++ standard is declared
std::vector
  • the vector class (a type) declared within the std namespace
std::cout
  • a variable (actually of type std::ostream) called cout declared within the std namespace
  • these are both declared within the std namespace
  • accessing members of a namespace is done with the scope resolution operator (::)
89

Why would we want to use namespaces?

The main purpose of C++ namespace is:

  • to organise your code into logical units
  • to avoid name collisions
90

Why would we want to use namespaces?

The main purpose of C++ namespace is:

  • to organise your code into logical units
  • to avoid name collisions

For example, we may very well want to declare a variable or function called count

Thankfully, it is declared within the std namespace

  • we can declare our variable or function count without the risk of name collisions with std::count()!
90

Declaring namespaces

Declaring variables or functions within a namespace is simply a matter of declaring them within the scope of our namespace.

91

Declaring namespaces

Declaring variables or functions within a namespace is simply a matter of declaring them within the scope of our namespace.

For example:

namespace debug {
bool verbose = false;
}

This declares our variable verbose to be within the debug namespace

  • this also implicitly declares the debug namespace if it hadn't already been encountered
91

Working with namespaces

Any code also declared within our debug namespace can already access our verbose variable directly.

92

Working with namespaces

Any code also declared within our debug namespace can already access our verbose variable directly.

When accessing our variable outside of the debug namespace, we can refer to it using its fully qualified name:

if (debug::verbose)
...
92

Working with namespaces

Any code also declared within our debug namespace can already access our verbose variable directly.

When accessing our variable outside of the debug namespace, we can refer to it using its fully qualified name:

if (debug::verbose)
...

Alternatively, we can selectively bring one identifier (variable or function) into the current scope with the using declaration:

using debug::verbose;
...
if (verbose)
...
92

Working with namespaces

We can also bring all identifiers declared in a namespace into the current scope with the using namespace directive:

using namespace debug;
...
if (verbose)
...
93

Working with namespaces

We can also bring all identifiers declared in a namespace into the current scope with the using namespace directive:

using namespace debug;
...
if (verbose)
...

The using namespace directive can be problematic

  • the general recommendation is to avoid its use
  • especially at global scope
93

Working with namespaces

We can also bring all identifiers declared in a namespace into the current scope with the using namespace directive:

using namespace debug;
...
if (verbose)
...

The using namespace directive can be problematic

  • the general recommendation is to avoid its use
  • especially at global scope

It is much better to use the fully qualified name, or if it really helps code readability, use the selective using declaration for the specific identifiers you need

  • and only use it at function or block scope
93

The problem with using namespace std;

In spite of the general recommendation to avoid blanket using namespace directives, you will find that many (if not most) C++ tutorials make use of this directive from the very beginning, starting at 'hello world':

#include <iostream>
using namespace std;
int main() {
cout << "Hello World!\n";
return 0;
}
94

The problem with using namespace std;

In spite of the general recommendation to avoid blanket using namespace directives, you will find that many (if not most) C++ tutorials make use of this directive from the very beginning, starting at 'hello world':

#include <iostream>
using namespace std;
int main() {
cout << "Hello World!\n";
return 0;
}

This brings everything declared within the std namespace into the global scope

  • this helps to keep the code cleaner by avoiding the need for std:: as a prefix for std::cerr, std::vector, std::string, ...
  • it makes it a bit easier when getting started with C++
94

The problem with using namespace std;

So why do we not follow this convention on this course?

95

The problem with using namespace std;

So why do we not follow this convention on this course?

95

The problem with using namespace std;

So why do we not follow this convention on this course?

In practice, the using namespace directive is strongly discouraged

  • it bring everything from that namespace into the current scope
  • including variables or functions you might not have known about!
  • it negates the name collision benefits that C++ namespaces were meant to address!
95

The problem with using namespace std;

So why do we not follow this convention on this course?

In practice, the using namespace directive is strongly discouraged

  • it bring everything from that namespace into the current scope
  • including variables or functions you might not have known about!
  • it negates the name collision benefits that C++ namespaces were meant to address!

It is particularly bad practice to make use of using namespace directives at global scope within header files

  • it's acceptable to use using namespace within your own cpp files
    • it would still not be considered good practice!
    • but you can manage name collisions within your own project
95

The problem with using namespace std;

So why do we not follow this convention on this course?

In practice, the using namespace directive is strongly discouraged

  • it bring everything from that namespace into the current scope
  • including variables or functions you might not have known about!
  • it negates the name collision benefits that C++ namespaces were meant to address!

It is particularly bad practice to make use of using namespace directives at global scope within header files

  • it's acceptable to use using namespace within your own cpp files
    • it would still not be considered good practice!
    • but you can manage name collisions within your own project
  • but header files are supposed to be used by other code!
    • any such global declaration will pollute the namespace of any code that #includes your header
    • it will affect the entire translation unit from the point your header is #included
95

Working with namespaces

For the reasons outlined in the previous slides, we recommend to always use the fully qualified name to access members of a namespace

96

Working with namespaces

For the reasons outlined in the previous slides, we recommend to always use the fully qualified name to access members of a namespace

Coming back to our project, we can declare our verbose variable within a new debug namespace to avoid name collisions

namespace debug {
bool verbose = false;
}

and refer to this variable by its fully qualified name: debug::verbose

96

Working with namespaces

For the reasons outlined in the previous slides, we recommend to always use the fully qualified name to access members of a namespace

Coming back to our project, we can declare our verbose variable within a new debug namespace to avoid name collisions

namespace debug {
bool verbose = false;
}

and refer to this variable by its fully qualified name: debug::verbose

But we still need to work out where to declare this variable

96

Working with global variables

As previously mentioned, our debug::verbose variable must be declared prior to its point of use in any translation unit where it is needed

  • this means it needs to be declared in a header file
97

Working with global variables

As previously mentioned, our debug::verbose variable must be declared prior to its point of use in any translation unit where it is needed

  • this means it needs to be declared in a header file

Let's create a header file debug.h to hold this variable:

#pragma once
namespace debug {
bool verbose = false;
}
97

Working with global variables

As previously mentioned, our debug::verbose variable must be declared prior to its point of use in any translation unit where it is needed

  • this means it needs to be declared in a header file

Let's create a header file debug.h to hold this variable:

#pragma once
namespace debug {
bool verbose = false;
}

We can now #include it in shotgun.cpp, fragments.h and overlap.cpp

97

Working with global variables

As previously mentioned, our debug::verbose variable must be declared prior to its point of use in any translation unit where it is needed

  • this means it needs to be declared in a header file

Let's create a header file debug.h to hold this variable:

#pragma once
namespace debug {
bool verbose = false;
}

We can now #include it in shotgun.cpp, fragments.h and overlap.cpp

Exercise: implement these changes in your code

97

Working with global variables

When trying to do this, you will most likely have encountered errors of this nature:

/usr/bin/ld: fragments.o:(.bss+0x0): multiple definition of `debug::verbose';
shotgun.o:(.bss+0x0): first defined here
/usr/bin/ld: overlap.o:(.bss+0x0): multiple definition of `debug::verbose';
shotgun.o:(.bss+0x0): first defined here
collect2: error: ld returned 1 exit status
98

Working with global variables

When trying to do this, you will most likely have encountered errors of this nature:

/usr/bin/ld: fragments.o:(.bss+0x0): multiple definition of `debug::verbose';
shotgun.o:(.bss+0x0): first defined here
/usr/bin/ld: overlap.o:(.bss+0x0): multiple definition of `debug::verbose';
shotgun.o:(.bss+0x0): first defined here
collect2: error: ld returned 1 exit status

The issue is that we have declared and implicitly defined our debug::verbose variable once per translation unit

  • we now have 3 versions of the same global variable, each independently listed in the object files shotgun.o, fragments.o and overlap.o
98

Working with global variables

When trying to do this, you will most likely have encountered errors of this nature:

/usr/bin/ld: fragments.o:(.bss+0x0): multiple definition of `debug::verbose';
shotgun.o:(.bss+0x0): first defined here
/usr/bin/ld: overlap.o:(.bss+0x0): multiple definition of `debug::verbose';
shotgun.o:(.bss+0x0): first defined here
collect2: error: ld returned 1 exit status

The issue is that we have declared and implicitly defined our debug::verbose variable once per translation unit

  • we now have 3 versions of the same global variable, each independently listed in the object files shotgun.o, fragments.o and overlap.o
  • this causes an error at the linker stage, when the contents of all the compiled object files are brought together to form the final executable: which version is the right one to use?
98

Working with global variables

When trying to do this, you will most likely have encountered errors of this nature:

/usr/bin/ld: fragments.o:(.bss+0x0): multiple definition of `debug::verbose';
shotgun.o:(.bss+0x0): first defined here
/usr/bin/ld: overlap.o:(.bss+0x0): multiple definition of `debug::verbose';
shotgun.o:(.bss+0x0): first defined here
collect2: error: ld returned 1 exit status

The issue is that we have declared and implicitly defined our debug::verbose variable once per translation unit

  • we now have 3 versions of the same global variable, each independently listed in the object files shotgun.o, fragments.o and overlap.o
  • this causes an error at the linker stage, when the contents of all the compiled object files are brought together to form the final executable: which version is the right one to use?

There are 2 ways to get around this:

  • ensure the variable is only declared in the header, and provide a single definition in a separate cpp file
  • tell the compiler to allow multiple definitions of this variable
98

Working with global variables

First approach:

Use the extern keyword in debug.h to make it clear that we are only declaring our variable at this point:

#pragma once
namespace debug {
extern bool verbose;
}
99

Working with global variables

First approach:

Use the extern keyword in debug.h to make it clear that we are only declaring our variable at this point:

#pragma once
namespace debug {
extern bool verbose;
}

and place the full definition and initialisation in a new file debug.cpp:

#include "debug.h"
namespace debug {
bool verbose = false;
}
99

Working with global variables

Second approach (much simpler):

Use the inline keyword in our debug.h header file to allow multiple definitions of the same variable to be provided across multiple translation units:

#pragma once
namespace debug {
inline bool verbose = false;
}

We do not need an additional cpp file in this case, and the initial value can now be specified at the same time

100

Working with global variables

Second approach (much simpler):

Use the inline keyword in our debug.h header file to allow multiple definitions of the same variable to be provided across multiple translation units:

#pragma once
namespace debug {
inline bool verbose = false;
}

We do not need an additional cpp file in this case, and the initial value can now be specified at the same time

Note that this use of the inline keyword was introduced in the C++17 version of the standard

100

Working with global variables

Second approach (much simpler):

Use the inline keyword in our debug.h header file to allow multiple definitions of the same variable to be provided across multiple translation units:

#pragma once
namespace debug {
inline bool verbose = false;
}

We do not need an additional cpp file in this case, and the initial value can now be specified at the same time

Note that this use of the inline keyword was introduced in the C++17 version of the standard

Let's go ahead and use the inline keyword in our project

100

Enabling the verbose option

We now have all the pieces required to implement our verbose option

  • we add the debug.h header file as per the previous slide
101

Enabling the verbose option

We now have all the pieces required to implement our verbose option

  • we add the debug.h header file as per the previous slide
  • in main():
    • we use the std::erase() call to detect and handle the -v option
    • and set debug::verbose to true if detected
    • we do this before any attempt at using the command-line arguments
101

Enabling the verbose option

We now have all the pieces required to implement our verbose option

  • we add the debug.h header file as per the previous slide
  • in main():
    • we use the std::erase() call to detect and handle the -v option
    • and set debug::verbose to true if detected
    • we do this before any attempt at using the command-line arguments
  • everywhere else:
    • #include "debug.h" so the debug::verbose variable is declared at the point of use
    • check if debug::verbose is true, and if so print out appropriate information
101

Enabling the verbose option

We now have all the pieces required to implement our verbose option

  • we add the debug.h header file as per the previous slide
  • in main():
    • we use the std::erase() call to detect and handle the -v option
    • and set debug::verbose to true if detected
    • we do this before any attempt at using the command-line arguments
  • everywhere else:
    • #include "debug.h" so the debug::verbose variable is declared at the point of use
    • check if debug::verbose is true, and if so print out appropriate information

For example, in fragments.cpp:

...
std::vector<std::string> load_fragments (const std::string& filename)
{
if (debug::verbose)
std::cerr << "reading fragments from file \"" << filename << "\"...\n";
...
101

Enabling the verbose option

We now have all the pieces required to implement our verbose option

  • we add the debug.h header file as per the previous slide
  • in main():
    • we use the std::erase() call to detect and handle the -v option
    • and set debug::verbose to true if detected
    • we do this before any attempt at using the command-line arguments
  • everywhere else:
    • #include "debug.h" so the debug::verbose variable is declared at the point of use
    • check if debug::verbose is true, and if so print out appropriate information

For example, in fragments.cpp:

...
std::vector<std::string> load_fragments (const std::string& filename)
{
if (debug::verbose)
std::cerr << "reading fragments from file \"" << filename << "\"...\n";
...

Exercise: implement these changes in your own code

101

Using the verbose option

We can now run our code as normal:

$ ./shotgun ../data/fragments-1.txt out
initial sequence has size 1000
final sequence has length 20108

which provides a clean output, showing just as much information as we might expect when everything works correctly

102

Using the verbose option

... or we can run it with the verbose option to see exactly what is going on:

$ ./shotgun ../data/fragments-1.txt out -v
reading fragments from file "../data/fragments-1.txt"...
read 190 fragments
190 fragments, mean fragment length: 529.158, range [ 51 1000 ]
initial sequence has size 1000
---------------------------------------------------
189 fragments left
fragment with biggest overlap is at index 41, overlap = -824
---------------------------------------------------
...
fragment with biggest overlap is at index 37, overlap = 51
---------------------------------------------------
104 fragments left
104 fragments remaining unmatched - checking whether already contained in sequence...
final sequence has length 20108
writing sequence to file "out"...
103

Using the verbose option

The availability of a verbose option allows you and your users to quickly narrow down problems

  • the default output can be kept clean and minimal

  • when issues occur, running in verbose mode provide useful information:

    • in the case of crashes, it provides a more precise indication of when the problem occurred (as indicated by the last message), and hence where to look in the code to figure out the problem
    • in the case of unexpected output, the information reported may point out issues with the input data, or provide clues as to where to look for errors in the code
  • when there are performance issues, verbose mode can provide an indication of which stage is taking longer to complete than expected

  • ...

104

Using the verbose option

The availability of a verbose option allows you and your users to quickly narrow down problems

  • the default output can be kept clean and minimal

  • when issues occur, running in verbose mode provide useful information:

    • in the case of crashes, it provides a more precise indication of when the problem occurred (as indicated by the last message), and hence where to look in the code to figure out the problem
    • in the case of unexpected output, the information reported may point out issues with the input data, or provide clues as to where to look for errors in the code
  • when there are performance issues, verbose mode can provide an indication of which stage is taking longer to complete than expected

  • ...

In general, it is recommended to provide some mechanism to allow users to get detailed information about the processing, whether using a verbose option, or by way of a separate log file

104

Adding convenience functions for debugging

Now that we have the option to provide debugging output, we can wrap up this functionality more cleanly to avoid having lots of if (debug::verbose) statements throughout our code

Let's implement a debug::log() function, which will take a std::string as its argument, and print it to the standard error stream if verbose mode is enabled

  • it can additionally format its output to more clearly label it as debugging information (for example, by prefixing each line with [DEBUG])
105

Adding convenience functions for debugging

Now that we have the option to provide debugging output, we can wrap up this functionality more cleanly to avoid having lots of if (debug::verbose) statements throughout our code

Let's implement a debug::log() function, which will take a std::string as its argument, and print it to the standard error stream if verbose mode is enabled

  • it can additionally format its output to more clearly label it as debugging information (for example, by prefixing each line with [DEBUG])

Exercise: implement this function and make use of it in your own code

105

Adding convenience functions for debugging

In debug.h:

#pragma once
include <iostream>
include <string>
namespace debug {
inline bool verbose = false;
inline void log (const std::string& message) {
if (verbose)
std::cerr << "[DEBUG] " << message << "\n";
}
}
106

Adding convenience functions for debugging

In debug.h:

#pragma once
include <iostream>
include <string>
namespace debug {
inline bool verbose = false;
inline void log (const std::string& message) {
if (verbose)
std::cerr << "[DEBUG] " << message << "\n";
}
}

Note the use of the inline keyword here, this time to allow multiple definitions of the same function

107

A word about inline

Historically, the inline keyword was considered more as a performance optimisation feature

  • it was used for functions only, to indicate that a given function was suitable for inlining
  • this would suggest to the compiler that the required code could be substituted directly at the point of use, rather than calling that function explicitly
  • There is always a small overhead to calling a function, and for functions that perform small operations and are called very often, this can impact performance
  • declaring such functions as inline can help avoid the overhead of dedicated function calls, and so improve performance
    • this is only possible if the function definition is available, which is why the inline mechanism was introduced to allow definitions to be included in header files without causing the multiple definition problem
  • For further details, refer to other online sources (e.g. GeeksForGeeks)
108

A word about inline

Historically, the inline keyword was considered more as a performance optimisation feature

  • it was used for functions only, to indicate that a given function was suitable for inlining
  • this would suggest to the compiler that the required code could be substituted directly at the point of use, rather than calling that function explicitly
  • There is always a small overhead to calling a function, and for functions that perform small operations and are called very often, this can impact performance
  • declaring such functions as inline can help avoid the overhead of dedicated function calls, and so improve performance
    • this is only possible if the function definition is available, which is why the inline mechanism was introduced to allow definitions to be included in header files without causing the multiple definition problem
  • For further details, refer to other online sources (e.g. GeeksForGeeks)

In modern C++ (particularly since C++17), the inline keyword is used to allow multiple definitions of functions or variables across multiple translation units

  • although it does allow the compiler to perform the kinds of optimisation mentioned above, the compiler is free to treat this merely as a hint
108

Adding convenience functions for debugging

We can now use our convenience function in our code, for example in fragments.cpp:

...
std::vector<std::string> load_fragments (const std::string& filename)
{
if (debug::verbose)
std::cerr << "reading fragments from file \"" << filename << "\"...\n";
...

becomes simply:

...
std::vector<std::string> load_fragments (const std::string& filename)
{
debug::log ("reading fragments from file \"" + filename + "\"...");
...
109

Adding convenience functions for debugging

We can now use our convenience function in our code, for example in fragments.cpp:

...
std::vector<std::string> load_fragments (const std::string& filename)
{
if (debug::verbose)
std::cerr << "reading fragments from file \"" << filename << "\"...\n";
...

becomes simply:

...
std::vector<std::string> load_fragments (const std::string& filename)
{
debug::log ("reading fragments from file \"" + filename + "\"...");
...

Note the use of the + string concatenation operator instead of the stream insertion operator (<<)


This is because our debug::log() function expects a single std::string argument

109

Adding convenience functions for debugging

How do we handle this case?

if (debug::verbose)
std::cerr << "read " << fragments.size() << " fragments\n";
110

Adding convenience functions for debugging

How do we handle this case?

if (debug::verbose)
std::cerr << "read " << fragments.size() << " fragments\n";

We can't write:

debug::log ("read " + fragments.size() + " fragments");

since fragment.size() returns an int, and we can't concatenate a std::string with an int!

110

Adding convenience functions for debugging

How do we handle this case?

if (debug::verbose)
std::cerr << "read " << fragments.size() << " fragments\n";

We can't write:

debug::log ("read " + fragments.size() + " fragments");

since fragment.size() returns an int, and we can't concatenate a std::string with an int!

Use std::to_string(), which returns a std::string representation of a variable:

debug::log ("read " + std::to_string (fragments.size()) + " fragments");
110

Adding convenience functions for debugging

How do we handle this case?

if (debug::verbose)
std::cerr << "read " << fragments.size() << " fragments\n";

We can't write:

debug::log ("read " + fragments.size() + " fragments");

since fragment.size() returns an int, and we can't concatenate a std::string with an int!

Use std::to_string(), which returns a std::string representation of a variable:

debug::log ("read " + std::to_string (fragments.size()) + " fragments");

Exercise: implement these changes in your own code

110

Exercises

111

Exercise 1

112

Picking up where we left off

We continue working on our DNA shotgun sequencing project

You can find the most up to date version in the project's solution/ folder

Make sure your code is up to date now!

2
Paused

Help

Keyboard shortcuts

, , Pg Up, k Go to previous slide
, , Pg Dn, Space, j Go to next slide
Home Go to first slide
End Go to last slide
Number + Return Go to specific slide
b / m / f Toggle blackout / mirrored / fullscreen mode
c Clone slideshow
p Toggle presenter mode
s Start & Stop the presentation timer
t Reset the presentation timer
?, h Toggle this help
Esc Back to slideshow