class: title 5CCYB041 # OBJECT-ORIENTED PROGRAMMING ### Week 3, session 1 ## advanced string formatting
structs & classes --- # Picking up where we left off We continue working on our [DNA shotgun sequencing project](https://github.com/KCL-BMEIS/OOP/blob/main/projects/DNA_shotgun_sequencing/assignment.md) You can find the most up-to-date version in [the project's `solution/` folder](https://github.com/KCL-BMEIS/OOP/tree/shotgun_sequencing_solution/projects/DNA_shotgun_sequencing/solution) .explain-bottom[ Make sure your code is up to date now! ] --- name: format # Advanced string formatting We've already seen how to concatenate strings to form more complex strings, and how to convert numeric values to strings using `std::to_string()` - but this isn't always very convenient or easy to follow -- The C++20 standard introduces a new function to help with string formatting: [`std::format()`](https://www.geeksforgeeks.org/cpp-20-std-format/) -- Its use is best illustrated with an example. Instead of writing: ``` debug::log ("read " + std::to_string (fragments.size()) + " fragments"); ``` -- we can write: ``` #include <format>
... debug::log (std::format ("read {} fragments", fragments.size())); ``` --- # The `std::format()` function The `std::format()` *template* function has the following (highly simplified) declaration: ``` namespace std { string format (format_string fmt, ArgType1 arg1, ArgType2 arg2, ...); } ``` --- # The `std::format()` function The `std::format()` *template* function has the following (highly simplified) declaration: ``` *namespace std { string format (format_string fmt, ArgType1 arg1, ArgType2 arg2, ...); *} ``` - it is declared within the `std` namespace --- # The `std::format()` function The `std::format()` *template* function has the following (highly simplified) declaration: ``` namespace std { `string` format (format_string fmt, ArgType1 arg1, ArgType2 arg2, ...); } ``` - it is declared within the `std` namespace - it returns a `std::string` --- # The `std::format()` function The `std::format()` *template* function has the following (highly simplified) declaration: ``` namespace std { string format (`format_string fmt`, ArgType1 arg1, ArgType2 arg2, ...); } ``` - it is declared within the `std` namespace - it returns a `std::string` - the first argument is the *format string*, of type `std::format_string` - it contains the text for the output string, with braces `{}` where substitutions are to be inserted --- # The `std::format()` function The `std::format()` *template* function has the following (highly simplified) declaration: ``` namespace std { string format (format_string fmt, `ArgType1 arg1`, ArgType2 arg2, ...); } ``` - it is declared within the `std` namespace - it returns a `std::string` - the first argument is the *format string*, of type `std::format_string` - it contains the text for the output string, with braces `{}` where substitutions are to be inserted - each subsequent argument is a variable to be converted to text and inserted into the format string instead of the matching `{}` --- # The `std::format()` function For example: ``` 
std::string name = "Joe"; std::string colour = "orange"; std::cout << std::format ("My name is {}, my favorite colour is {}\n", name, colour); ``` would produce: ``` My name is Joe, my favorite colour is orange ``` --- # The `std::format()` function But the arguments to be substituted don't need to be strings: ``` int num_iter = 101; double func_value = 0.023859; std::cout << std::format ("after {} iterations, function value = {}\n", num_iter, func_value); ``` would produce: ``` after 101 iterations, function value = 0.023859 ``` --- # The `std::format()` function For numeric arguments, the conversion to text can be carefully controlled: ``` int num_iter = 101; double func_value = 0.023859; std::cout << std::format ("after {} iterations, function value = `{:.3f}`\n", num_iter, func_value); ``` would produce the second argument to 3 decimal places: ``` after 101 iterations, function value = 0.024 ``` --- # The `std::format()` function There are many more formatting options – too many to cover in this course! For details, please refer to the [relevant documentation](https://en.cppreference.com/w/cpp/utility/format/spec) .explain-bottom[ Exercise: use the `std::format()` function where relevant in your own code ] --- name: struct class: section # Composite data types ## grouping data into *structures* --- # Returning multiple values from a function Looking at our project, we would like to add a `find_biggest_overlap()` function to: - identify the fragment that has the biggest overlap with the current sequence - remove it from the list of candidate fragments - *and* return the size of the overlap We need to return *two* pieces of information from that function! -- One approach to this problem relies on *references*: - one of the arguments to our function is a reference to an existing variable, and the function will assign the correct value to that variable before returning: ``` int find_biggest_overlap (const std::string& sequence, std::vector
<std::string>& fragments, `int& index`) ``` - The `index` variable is passed by non-const reference, allowing the function to assign a value to it that will also update the original variable. - the function would then be free to use the return value to provide the size of the corresponding overlap --- # Returning multiple values from a function We would then be able to use this function as follows: ``` int index_of_fragment; int overlap_size = find_biggest_overlap (sequence, fragments, index_of_fragment); ``` Since `index_of_fragment` is passed by reference, the function can update its value - we can now rely on both `overlap_size` and `index_of_fragment` being set correctly. -- However, this is a cumbersome approach - we need to declare a variable before invoking the function - the intent is not immediately clear -- A better solution would be to return a single variable of a type capable of holding multiple values - for example, we could return a `std::vector<int>` here - but what if the two values to be returned were of a different type? --- # Structures in C++ A cleaner solution would be to declare our own *compound data type*, composed of the two variables we need. This can be done using *structures* - structures are an old concept: they predate C++ and were already present in C -- Structures allow us to define a new compound data type, composed of other data types, grouped together into a single entity. - each *member variable* is named, and can therefore be assigned a clear interpretation - the struct can then be treated as any other variable, passed to & from function calls, etc. -- This is best illustrated with an example --- # Structures in C++ Structures are declared using the `struct` keyword, followed by the list of members (along with their types) enclosed in braces: ``` struct Overlap { int size; int fragment; }; ``` -- This can then be used as a regular data type in our function declaration: ``` `Overlap` find_biggest_overlap (const std::string& sequence, std::vector<std::string>
& fragments); ``` -- We can use our function as follows, and access the member variables using [dot-notation](https://www.geeksforgeeks.org/dot-operator-in-cpp/): ``` `auto overlap` = find_biggest_overlap (sequence, fragments); std::cerr << std::format ("overlap of size {} at index {}\n", `overlap.size`, `overlap.fragment`); ``` --- layout:true # Aggregate initialisation In the implementation of our function (the function definition), we need to return a variable of type `Overlap`. We can do that like this: --- ``` ... Overlap retval; retval.size = biggest_overlap; retval.fragment = fragment_with_biggest_overlap; return retval; } ``` --- ``` ... Overlap retval = { biggest_overlap, fragment_with_biggest_overlap }; return retval; } ``` A much cleaner solution is to use [aggregate initialisation](https://www.geeksforgeeks.org/aggregate-initialization-in-cpp-20/) in the `return` statement - each member of the struct is then initialised with the matching variable in the brace-delimited list --- ``` ... return { biggest_overlap, fragment_with_biggest_overlap }; } ``` A much cleaner solution is to use [aggregate initialisation](https://www.geeksforgeeks.org/aggregate-initialization-in-cpp-20/) in the `return` statement - each member of the struct is then initialised with the matching variable in the brace-delimited list And this can be further simplified by returning the initialiser list directly - the compiler already knows that this function returns an object of type `Overlap` - the compiler will instantiate a *temporary* (unnamed) instance of `Overlap` for us - ... 
and use aggregate initialisation as before --- layout:false # Returning multiple values from a function Declaring our own custom `struct` allows us to return multiple pieces of information as a single variable - this is a cleaner way to solve our problem -- .explain-bottom[ Exercise: add the `find_biggest_overlap()` function to your own code ] --- name: classes class: section # Classes ## A cornerstone of Object-Oriented Programming --- # C++ classes Classes can be thought of as an extension of structures - indeed, in C++, `struct`s are also classes! -- Classes are user-defined data types that can be used to group data, but they can also: - provide *member functions* to interact with the data - use *access specifiers* to limit access to some or all member variables -- Classes are central to Object-Oriented Programming - a class is essentially a *blueprint* for objects of that type - a class is used to represent a broadly independent aspect of our program - an instance of a class is also referred to as an *object* -- We have already used a number of standard classes: - `std::string`, `std::vector`, `std::ifstream`, ... --- # Class member functions or methods *Methods* or *member functions* are functions that are accessed via an existing *instance* of a class using the [dot operator](https://www.geeksforgeeks.org/dot-operator-in-cpp/) -- You have already been using methods throughout the course so far: - `s.size();` - `v.size();` - `v.push_back();` - `v.insert();` - ... 
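A minimal sketch of the method calls listed above, using only the standard library. The helper names `demo_vector_methods()` and `demo_string_methods()` are made up for illustration and are not part of the project:

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: calls a few std::vector methods via dot notation
std::size_t demo_vector_methods ()
{
  std::vector<int> v = { 1, 2, 3 };
  v.push_back (4);              // append at the end:      1 2 3 4
  v.insert (v.begin(), 0);      // insert at the front:  0 1 2 3 4
  return v.size();              // size() acts on the instance `v`
}

// Hypothetical helper: the same notation works on a std::string instance
std::size_t demo_string_methods ()
{
  std::string s = "ACGT";
  return s.size();
}
```

In each case the method operates on the specific instance named to the left of the dot (`v` or `s`).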
-- These are a feature of C++ classes - you can define your own methods for your own classes --- # Access specifiers Member variables or functions can be declared as *public* or *private* -- When **public**, the corresponding variable or function can be used from outside the class When **private**, the variable or function can only be used within another member function of the same class -- The ability to protect members in this way supports [*encapsulation*](https://www.geeksforgeeks.org/encapsulation-in-cpp/) and [*abstraction*](https://www.geeksforgeeks.org/abstraction-in-cpp/) - *private* data can only be modified using *public* methods: - the author of the class can then ensure the *consistency* of the internal state of their class (encapsulation) - users of the class only need to understand the *abstract* interface provided via the *public* methods (abstraction) -- Encapsulation and abstraction are [fundamental features of Object-Oriented Programming](https://www.geeksforgeeks.org/object-oriented-programming-in-cpp/) - we will cover them in more detail later in the course, when they will make more sense --- # Difference between `struct` and `class` In C++, there is actually very little *practical* difference between `struct` and `class` - there is however a big *conceptual* difference! 
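To make the *public* / *private* distinction concrete, here is a minimal sketch. `Counter` and `CounterAsStruct` are hypothetical types invented for this illustration. They are written once with each keyword to show that the same access specifiers can be used in both:

```cpp
// Hypothetical example type, not part of the project
class Counter {
  public:
    void increment () { ++m_count; }          // public: usable from outside the class
    int count () const { return m_count; }    // const: does not modify the object
  private:
    int m_count = 0;                          // private: only member functions can access it
};

// The same access specifiers can be written inside a struct
struct CounterAsStruct {
  public:
    void increment () { ++m_count; }
    int count () const { return m_count; }
  private:
    int m_count = 0;
};
```

With either type, a statement such as `c.m_count = 5;` written outside the member functions would be rejected by the compiler, so the count can only change through `increment()`.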
-- As far as the compiler is concerned, `struct` & `class` are essentially the same thing - the only *actual* difference between the two is that *unless otherwise specified*: - members of a `struct` are *public* by default - members of a `class` are *private* by default -- Nonetheless, you are encouraged to reserve the use of `struct` for small, lightweight containers with public data members only - for example, as a way of grouping variables into a single entity that can be returned from a function - do not use a `struct` for anything that should provide an abstract interface, and/or where maintaining consistency between member variables is important -- ⇒ In general, prefer to define a `class` --- # Using classes in our project We already use plenty of classes in our project: - `std::string` - `std::vector` - `std::vector<std::string>` These already provide all the functionality we need for our program to function -- But what if we need to use our DNA shotgun sequencing algorithm as part of a broader project? - it would be better to *encapsulate* the algorithm into a distinct, discrete module of some form -- ⇒ let's use a class to represent our algorithm! --- layout: true # The shotgun sequencing algorithm as a class Let's set up a class called `ShotgunSequencer` to encapsulate our algorithm: --- ``` class ShotgunSequencer { }; ``` --- ``` `class` ShotgunSequencer { }; ``` - a class is declared using the keyword `class` - this is similar to declaring a `struct` --- ``` class `ShotgunSequencer` { }; ``` - a class is declared using the keyword `class` - this is similar to declaring a `struct` - we provide a suitable name for the class - this is the name of the *type* – not an instance - there are many conventions for naming – on this course, we recommend using [PascalCase](https://www.freecodecamp.org/news/snake-case-vs-camel-case-vs-pascal-case-vs-kebab-case-whats-the-difference/) - it is important to choose a name that clearly expresses what kind of object this class represents --- ``` class ShotgunSequencer `{` `};` ``` - a class is declared using the keyword `class` - this is similar to declaring a `struct` - we provide a suitable name for the class - this is the name of the *type* – not an instance - there are many conventions for naming – on this course, we recommend using [PascalCase](https://www.freecodecamp.org/news/snake-case-vs-camel-case-vs-pascal-case-vs-kebab-case-whats-the-difference/) - it is important to choose a name that clearly expresses what kind of object this class represents - the contents of the class are then declared between braces - don't forget the final semicolon! 
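As a small sketch of the difference between the *type* and its *instances*, the snippet below declares an empty class and creates two objects from it. `SequenceAligner` and `count_instances()` are hypothetical names used only for illustration:

```cpp
// Hypothetical, deliberately empty class: this is the name of a *type*
class SequenceAligner {
};   // <- the final semicolon is required

// Hypothetical helper: the class name is used like any other type
int count_instances ()
{
  SequenceAligner first;     // one instance (object) of the type
  SequenceAligner second;    // another, independent instance
  // distinct objects live at distinct addresses
  return (&first != &second) ? 2 : 1;
}
```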
--- layout: true # The shotgun sequencing algorithm as a class Now let's add some data members to our class: --- ``` class ShotgunSequencer { private: const int m_minimum_overlap = 10; std::string m_sequence; std::vector<std::string> m_fragments; }; ``` --- ``` class ShotgunSequencer { `private:` const int m_minimum_overlap = 10; std::string m_sequence; std::vector<std::string> m_fragments; }; ``` - we are going to declare our member variables as **private** - this is done using the `private` keyword, followed by a colon (`:`) - all subsequent declarations will be private --- ``` class ShotgunSequencer { private: * const int m_minimum_overlap = 10; * std::string m_sequence; * std::vector<std::string> m_fragments; }; ``` - we are going to declare our member variables as **private** - this is done using the `private` keyword, followed by a colon (`:`) - all subsequent declarations will be private - we can now declare our member variables, in exactly the same way as we did with `struct` - there are many naming conventions – for member variables, we recommend `snake_case` with the `m_` prefix --- ``` class ShotgunSequencer { private: * const int m_minimum_overlap = 10; std::string m_sequence; std::vector<std::string> m_fragments; }; ``` - we are going to declare our member variables as **private** - this is done using the `private` keyword, followed by a colon (`:`) - all subsequent declarations will be private - we can now declare our member variables, in exactly the same way as we did with `struct` - there are many naming conventions – for member variables, we recommend `snake_case` with the `m_` prefix - note that member variables can be given a *default initial value* as shown - we need to initialise `m_minimum_overlap` since we have declared it `const` – we won't be able to modify it later! - note: this type of [in-class member initialisation](https://isocpp.org/wiki/faq/cpp11-language-classes#member-init) was introduced in C++11 --- layout: true # The shotgun sequencing algorithm as a class We now need to add *methods* to allow users of our class to interact with it: --- ``` class ShotgunSequencer { public: void init (const std::vector<std::string>
& fragments); bool iterate (); void check_remaining_fragments (); private: const int m_minimum_overlap = 10; std::string m_sequence; std::vector<std::string> m_fragments; }; ``` --- ``` class ShotgunSequencer { `public`: void init (const std::vector<std::string>& fragments); bool iterate (); void check_remaining_fragments (); private: const int m_minimum_overlap = 10; std::string m_sequence; std::vector<std::string> m_fragments; }; ``` - this time, our methods will need to be **public** - this is done using the `public` keyword, in much the same way as with `private` --- ``` class ShotgunSequencer { public: * void init (const std::vector<std::string>& fragments); * bool iterate (); * void check_remaining_fragments (); private: const int m_minimum_overlap = 10; std::string m_sequence; std::vector<std::string> m_fragments; }; ``` - this time, our methods will need to be **public** - this is done using the `public` keyword, in much the same way as with `private` - we can now add our *method declarations* - the names of these methods should mirror the actions performed in the algorithm - these look like regular function declarations – but are declared *within the scope* of our `ShotgunSequencer` class --- ``` class ShotgunSequencer { public: * void init (const std::vector<std::string>
& fragments); bool iterate (); void check_remaining_fragments (); private: const int m_minimum_overlap = 10; std::string m_sequence; std::vector<std::string> m_fragments; }; ``` - `.init()` is used to provide the list of fragments to initialise the algorithm - it does not need to return anything (return type is therefore `void`) --- ``` class ShotgunSequencer { public: void init (const std::vector<std::string>& fragments); * bool iterate (); void check_remaining_fragments (); private: const int m_minimum_overlap = 10; std::string m_sequence; std::vector<std::string> m_fragments; }; ``` - `.iterate()` performs a single iteration of the algorithm - it will identify the fragment with the largest overlap, and if found, merge it - we return a `bool` to indicate the status of the iteration:
⇒ if `false`, no fragment was found, and the algorithm should stop --- ``` class ShotgunSequencer { public: void init (const std::vector<std::string>& fragments); bool iterate (); * void check_remaining_fragments (); private: const int m_minimum_overlap = 10; std::string m_sequence; std::vector<std::string> m_fragments; }; ``` - `.check_remaining_fragments()` performs the final check - the remaining fragments should all already be contained within the estimated sequence - we *could* have decided to return `bool` to indicate the status of the check – this is a design decision! - ... but if any fragments remain unmatched, we consider this to be unexpected, but not fatal ⇒ we issue a warning - there is therefore no need for a return value – the return type is also `void` --- ``` class ShotgunSequencer { public: void init (const std::vector<std::string>& fragments); * bool iterate (); * void check_remaining_fragments (); private: const int m_minimum_overlap = 10; std::string m_sequence; std::vector<std::string>
m_fragments; }; ``` - note that we don't need to provide any arguments to these methods - this is because the class members will all be available within the scope of these methods - they will already have full access to the private `m_minimum_overlap`, `m_sequence` and `m_fragments` variables! --- layout: true # The shotgun sequencing algorithm as a class How do we use our class elsewhere in our code? In `shotgun.cpp`: --- --- ``` *#include "shotgun_sequencer.h" ... auto fragments = load_fragments (args[1]); * ShotgunSequencer solver; * solver.init (fragments); * while (solver.iterate()); * solver.check_remaining_fragments(); std::cerr << "final sequence has length " << solver.sequence().size() << "\n"; write_sequence (args[2], solver.sequence()); } ``` --- ``` *#include "shotgun_sequencer.h" ... auto fragments = load_fragments (args[1]); ShotgunSequencer solver; solver.init (fragments); while (solver.iterate()); solver.check_remaining_fragments(); std::cerr << "final sequence has length " << solver.sequence().size() << "\n"; write_sequence (args[2], solver.sequence()); } ``` - we need to `#include` our new header to ensure the declarations are accessible in this file --- ``` #include "shotgun_sequencer.h" ... auto fragments = load_fragments (args[1]); * ShotgunSequencer solver; solver.init (fragments); while (solver.iterate()); solver.check_remaining_fragments(); std::cerr << "final sequence has length " << solver.sequence().size() << "\n"; write_sequence (args[2], solver.sequence()); } ``` - we need to `#include` our new header to ensure the declarations are accessible in this file - at the appropriate point, we can create an *instance* of our new `ShotgunSequencer` class --- ``` #include "shotgun_sequencer.h" ... 
auto fragments = load_fragments (args[1]); ShotgunSequencer solver; * solver.init (fragments); while (solver.iterate()); solver.check_remaining_fragments(); std::cerr << "final sequence has length " << solver.sequence().size() << "\n"; write_sequence (args[2], solver.sequence()); } ``` - we need to `#include` our new header to ensure the declarations are accessible in this file - at the appropriate point, we can create an *instance* of our new `ShotgunSequencer` class - we use the `.init()` method to supply the list of fragments and initialise the algorithm --- ``` #include "shotgun_sequencer.h" ... auto fragments = load_fragments (args[1]); ShotgunSequencer solver; solver.init (fragments); * while (solver.iterate()); solver.check_remaining_fragments(); std::cerr << "final sequence has length " << solver.sequence().size() << "\n"; write_sequence (args[2], solver.sequence()); } ``` - we can now iterate through the algorithm - the simplest approach is to use a `while` loop here: we keep going while `iterate()` returns `true` - as everything is done within the `.iterate()` method, we can leave the loop body empty --- ``` #include "shotgun_sequencer.h" ... auto fragments = load_fragments (args[1]); ShotgunSequencer solver; solver.init (fragments); while (solver.iterate()); * solver.check_remaining_fragments(); std::cerr << "final sequence has length " << solver.sequence().size() << "\n"; write_sequence (args[2], solver.sequence()); } ``` Finally, we can perform the final check to ensure all the remaining fragments are indeed already contained in the final sequence --- layout: false # The shotgun sequencing algorithm as a class Note that *we cannot access private members* outside the class: ``` void run (std::vector<std::string>
& args) { ... * std::cout << "final sequence is: " << solver.m_sequence << "\n"; write_sequence (args[2], solver.sequence()); } ``` - this line will result in a compiler error, similar to: ``` shotgun.cpp: In function ‘void run(std::vector<std::__cxx11::basic_string<char> >&)’: shotgun.cpp:35:48: error: ‘std::string ShotgunSequencer::m_sequence’ is private within this context 35 | std::cout << "final sequence is: " << solver.m_sequence << "\n"; | ^~~~~~~~~~ In file included from shotgun.cpp:8: shotgun_sequencer.h:16:17: note: declared private here 16 | std::string m_sequence; | ^~~~~~~~~~ ``` --- layout: true # The shotgun sequencing algorithm as a class We have *declared* our methods, but we have not *defined* them!
Let's create a `shotgun_sequencer.cpp` file to match the corresponding header: --- --- ```c++ #include
#include
#include
#include "fragments.h" #include "overlap.h" #include "shotgun_sequencer.h" #include "debug.h" void ShotgunSequencer::init (const std::vector<std::string>& fragments) { ... } bool ShotgunSequencer::iterate () { ... } void ShotgunSequencer::check_remaining_fragments () { ... } ``` --- ```c++ #include
#include
#include
#include "fragments.h" #include "overlap.h" *#include "shotgun_sequencer.h" #include "debug.h" void ShotgunSequencer::init (const std::vector<std::string>& fragments) { ... } bool ShotgunSequencer::iterate () { ... } void ShotgunSequencer::check_remaining_fragments () { ... } ``` .explain-bottom[ As before, we need to `#include` all the necessary headers that declare the functionality we are going to use – including our new header! ] --- ```c++ #include
#include
#include
#include "fragments.h" #include "overlap.h" #include "shotgun_sequencer.h" #include "debug.h" *void ShotgunSequencer::init (const std::vector<std::string>& fragments) { ... } *bool ShotgunSequencer::iterate () { ... } *void ShotgunSequencer::check_remaining_fragments () { ... } ``` .explain-top[ We can now provide the definitions for our methods. As before, we need to start each *definition* by replicating the *declaration*, so that the compiler can match it with the original declaration in the header.
But there are some clear differences! ] --- ```c++ #include
#include
#include
#include "fragments.h" #include "overlap.h" #include "shotgun_sequencer.h" #include "debug.h" void `ShotgunSequencer::`init (const std::vector<std::string>& fragments) { ... } bool `ShotgunSequencer::`iterate () { ... } void `ShotgunSequencer::`check_remaining_fragments () { ... } ``` .explain-top[ The name of each method is now prefixed with the *class name* and the [scope resolution operator](https://www.geeksforgeeks.org/scope-resolution-operator-in-c/) (`::`).
This is because these definitions are now *outside* the scope of the class declaration (outside the braces within which we declared our member variables and functions).
This is how we can refer to member functions of a class. This essentially means: the `init()` method that was declared within the scope of the `ShotgunSequencer` class ] --- ```c++ #include
#include
#include
#include "fragments.h" #include "overlap.h" #include "shotgun_sequencer.h" #include "debug.h" *void init (const std::vector<std::string>& fragments) { ... } *bool iterate () { ... } *void check_remaining_fragments () { ... } ``` .explain-top[ If we tried to define our methods *without* this scope resolution, the compiler would (rightly) assume that we are defining completely different, *global* functions, that are entirely independent of our `ShotgunSequencer` class!
For example, we would end up with: - an unexpected `iterate()` function - potentially with *compiler* errors as we try to access member variables - no *definition* for our `ShotgunSequencer::iterate()` method - leading to *linker* errors at a later stage in the build process (unresolved symbol) ] --- ```c++ #include
#include
#include
#include "fragments.h" #include "overlap.h" #include "shotgun_sequencer.h" #include "debug.h" void ShotgunSequencer::init (const std::vector<std::string>& fragments) { `...` } bool ShotgunSequencer::iterate () { `...` } void ShotgunSequencer::check_remaining_fragments () { `...` } ``` .explain-top[ Let's now focus on what will go in the *body* of our functions ] --- layout: true # Function definitions --- ``` void ShotgunSequencer::init (const std::vector<std::string>& fragments) { m_fragments = fragments; if (debug::verbose) fragment_statistics (m_fragments); m_sequence = extract_longest_fragment (m_fragments); } ``` --- ``` void ShotgunSequencer::init (const std::vector<std::string>& fragments) { `m_fragments` = fragments; if (debug::verbose) fragment_statistics (`m_fragments`); `m_sequence` = extract_longest_fragment (`m_fragments`); } ``` Note that we can access the *members* of our class directly within the body of our method - technically, these are the members of the *current instance* of our class - each instance will have its own independent copy of these variables --- ``` void ShotgunSequencer::init (const std::vector<std::string>& fragments) { * m_fragments = fragments; if (debug::verbose) fragment_statistics (m_fragments); m_sequence = extract_longest_fragment (m_fragments); } ``` Note that we can access the *members* of our class directly within the body of our method - technically, these are the members of the *current instance* of our class - each instance will have its own independent copy of these variables We start by copying the list of fragments over from the argument provided (`fragments`) into the corresponding member variable (`m_fragments`) - note how using a clear naming strategy for class members helps to avoid confusion! --- ``` void ShotgunSequencer::init (const std::vector<std::string>
& fragments) { m_fragments = fragments; * if (debug::verbose) * fragment_statistics (m_fragments); * m_sequence = extract_longest_fragment (m_fragments); } ``` Note that we can access the *members* of our class directly within the body of our method - technically, these are the members of the *current instance* of our class - each instance will have its own independent version of these variables We start by copying the list of fragments over from the argument provided (`fragments`) into the corresponding member variable (`m_fragments`) - note how using a clear naming strategy for class members helps to avoid confusion! The rest of the function mirrors what was done in `shotgun.cpp` previously --- ``` bool ShotgunSequencer::iterate () { debug::log ("---------------------------------------------------"); debug::log (std::format ("{} fragments left", m_fragments.size())); auto [ overlap, index ] = find_biggest_overlap (m_sequence, m_fragments); if (index < 0) return false; if (std::abs (overlap) < m_minimum_overlap) return false; debug::log ( std::format ("fragment with biggest overlap is at index {}, overlap = {}", index, overlap)); merge (m_sequence, m_fragments[index], overlap); m_fragments.erase (m_fragments.begin() + index); return true; } ``` .explain-topright[ This mirrors almost exactly what was previously performed in `shotgun.cpp` – this time using the member variables (`m_minimum_overlap`, `m_fragments`, `m_sequence`) ] --- ``` bool ShotgunSequencer::iterate () { debug::log ("---------------------------------------------------"); debug::log (std::format ("{} fragments left", m_fragments.size())); auto [ overlap, index ] = find_biggest_overlap (m_sequence, m_fragments); if (index < 0) `return false`; if (std::abs (overlap) < m_minimum_overlap) `return false`; debug::log ( std::format ("fragment with biggest overlap is at index {}, overlap = {}", index, overlap)); merge (m_sequence, m_fragments[index], overlap); m_fragments.erase (m_fragments.begin() + 
index); `return true`; } ``` .explain-topright[ The main difference is that we now `return` a `bool` to indicate success or failure. ] --- ``` void ShotgunSequencer::check_remaining_fragments () { debug::log (std::format ( "{} fragments remaining unmatched", m_fragments.size())); int num_unmatched = 0; for (auto& frag : m_fragments) { if (m_sequence.find (frag) == std::string::npos) ++num_unmatched; } if (num_unmatched) std::cerr << "WARNING: " << num_unmatched << " fragments remain unmatched!\n"; else debug::log ("all remaining fragments matched OK"); } ``` Likewise, the code in `ShotgunSequencer::check_remaining_fragments()` works exactly as it did previously in `shotgun.cpp` --- layout: false name: getset # Getters & setters There is one final piece required for us to be able to use our `ShotgunSequencer` class: - a way to retrieve the resulting sequence -- For this, we can use a [*getter* method](https://www.geeksforgeeks.org/cpp-getters-and-setters/) ``` class ShotgunSequencer { public: ... * const std::string& sequence () const { return m_sequence; } private: ... }; ``` -- Let's unpack what is going on here... --- # Getters & setters ``` class ShotgunSequencer { public: ... `const std::string& sequence () const` { return m_sequence; } private: ... }; ``` This is the *declaration* of our method --- # Getters & setters ``` class ShotgunSequencer { public: ... const std::string& `sequence` () const { return m_sequence; } private: ... }; ``` We have given our getter method a simple name: `sequence()` - note that many style guides would recommend a name such as `get_sequence()` or `getSequence()` - use whichever [coding standards](https://isocpp.org/wiki/faq/coding-standards) are in use on whichever project you may be contributing to! --- # Getters & setters ``` class ShotgunSequencer { public: ... `const std::string&` sequence () const { return m_sequence; } private: ... 
}; ``` Our getter returns a *const reference* to our member variable - this is a common construct: returning a full-blown copy could rapidly become prohibitively expensive - returning a `const` reference guarantees our *private* variable remains read-only - it cannot be modified from outside the class --- # Getters & setters ``` class ShotgunSequencer { public: ... const std::string& sequence `()` const { return m_sequence; } private: ... }; ``` Note that our getter method does not take any arguments - we can simply invoke it as `solver.sequence()` - this is often the case with getters: they only need to return the corresponding value --- name: const_method # Getters & setters ``` class ShotgunSequencer { public: ... const std::string& sequence () `const` { return m_sequence; } private: ... }; ``` The `const` keyword has a special meaning when placed at the end of our method declaration, after the argument list: - it states that this method cannot modify any of the member variables - calling this method is therefore guaranteed to leave the object itself completely unmodified - the compiler is responsible for enforcing this --- # Getters & setters ``` class ShotgunSequencer { public: ... const std::string& sequence () const { `return m_sequence;` } private: ... 
}; ``` In this case, we have decided to insert the method *definition* right in the class declaration - this differs from our previous methods, which were defined separately in the corresponding `.cpp` file -- This declares the member function implicitly as `inline` - remember: `inline` means the definition is allowed to appear in multiple *translation units* - this makes sense for small functions such as getters & setters - it provides opportunities for the compiler to optimise away the function call - it can simply substitute the *body* of the function where it might otherwise have called the function - there is now no need to supply the corresponding function definition in a separate `.cpp` file --- # Getters & setters ``` class ShotgunSequencer { public: * void init (const std::vector<std::string>
& fragments); ... const std::string& sequence () const { return m_sequence; } private: ... }; ``` *Setters* perform the opposite action from getters - they allow users to *set* class parameters - they typically do not need to return anything, so usually have a `void` return type - since they modify class members, they cannot be declared `const` -- Our `init()` method is in many ways a setter method: - it sets the (initial) list of fragments - we could have called this method `set_fragments()` or similar - here, we have chosen to call it `init()` since setting the fragment list implicitly (re-)initialises the algorithm --- # Getters & setters Getters and setters are an important tool to implement *encapsulation* - the getter can ensure the member variable cannot be modified directly from outside the class - the setter can perform any additional actions that may be required when modifying member variables - for example, our `init()` method (if viewed as a setter method) needs to re-initialise the whole algorithm, including resetting the current estimate of the sequence to the longest fragment in the list - simply setting the list of fragments without reinitialising the algorithm would leave the class in an inconsistent state – breaking encapsulation --- class: section name: exercises # Exercises --- # Exercises Have a go at implementing the changes necessary to create the `ShotgunSequencer` class. Move the functionality previously in `shotgun.cpp` (currently within the `run()` function) into dedicated methods.
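As a warm-up before tackling the exercise, the getter / setter pattern described earlier can be sketched on a much smaller class. `WordList`, `set_words()`, `words()` and `longest()` are hypothetical names chosen for this illustration and are not taken from the project solution. The sketch assumes the setter is responsible for keeping a dependent member consistent:

```cpp
#include <string>
#include <vector>

// Hypothetical class, unrelated to the project solution
class WordList {
  public:
    // setter: stores the words *and* updates the dependent state,
    // so the object is never left in an inconsistent state
    void set_words (const std::vector<std::string>& words)
    {
      m_words = words;
      m_longest.clear();
      for (const auto& w : m_words)
        if (w.size() > m_longest.size())
          m_longest = w;
    }

    // getters: read-only access via const references, declared const
    const std::vector<std::string>& words () const { return m_words; }
    const std::string& longest () const { return m_longest; }

  private:
    std::vector<std::string> m_words;
    std::string m_longest;
};
```

Because `m_longest` can only change through `set_words()`, users of the class cannot make it disagree with `m_words`. This is the same consistency argument made above for `init()`.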