Wednesday 10 July 2019

Rust: How do we teach "Implementing traits in no_std for generics using lifetimes" without students going mad?

Update 2019-Jul-27: In the code below my StackVec type was more complicated than it had to be, I had been using StackVec<'a, &'a mut T> instead of StackVec<'a, T> where T: 'a. I am unsure how I ended up making the type so complicated, but I suspect the lifetimes mismatch errors and the attempt to implement IntoIterator were the reason why I made the original mistake.

Corrected code accordingly.



I'm trying to go through Sergio Benitez's CS140E class and I am currently at Implementing StackVec. StackVec is something that currently, looks like this:

/// A contiguous array type backed by a slice.
///
/// `StackVec`'s functionality is similar to that of `std::Vec`. You can `push`
/// and `pop` and iterate over the vector. Unlike `Vec`, however, `StackVec`
/// requires no memory allocation as it is backed by a user-supplied slice. As a
/// result, `StackVec`'s capacity is _bounded_ by the user-supplied slice. This
/// results in `push` being fallible: if `push` is called when the vector is
/// full, an `Err` is returned.
#[derive(Debug)]
pub struct StackVec<'a, T: 'a> {
    storage: &'a mut [T],
    len: usize,
    capacity: usize,
}
The initial skeleton did not contain the derive Debug and the capacity field, I added them myself.

Now I am trying to understand what needs to happens behind:
  1. IntoIterator
  2. when in no_std
  3. with a custom type which has generics
  4. and has to use lifetimes
I don't now what I'm doing, I might have managed to do it:

pub struct StackVecIntoIterator<'a, T: 'a> {
    stackvec: StackVec<'a, T>,
    index: usize,
}

impl<'a, T: Clone + 'a> IntoIterator for StackVec<'a, &'a mut T> {
    type Item = &'a mut T;
    type IntoIter = StackVecIntoIterator<'a, T>;

    fn into_iter(self) -> Self::IntoIter {
        StackVecIntoIterator {
            stackvec: self,
            index: 0,
        }
    }
}

impl<'a, T: Clone + 'a> Iterator for StackVecIntoIterator<'a, T> {
    type Item = &'a mut T;

    fn next(&mut self) -> Option {
        let result = self.stackvec.pop();
        self.index += 1;

        result
    }
}

Corrected code as of 2019-Jul-27:
pub struct StackVecIntoIterator<'a, T: 'a> {
    stackvec: StackVec<'a, T>,
    index: usize,
}

impl<'a, T: Clone + 'a> IntoIterator for StackVec<'a, T> {
    type Item = T;
    type IntoIter = StackVecIntoIterator<'a, T>;

    fn into_iter(self) -> Self::IntoIter {
        StackVecIntoIterator {
            stackvec: self,
            index: 0,
        }
    }
}

impl<'a, T: Clone + 'a> Iterator for StackVecIntoIterator<'a, T> {
    type Item = T;

    fn next(&mut self) -> Option {
        let result = self.stackvec.pop().clone();
        self.index += 1;

        result
    }
}



I was really struggling to understand what should the returned iterator type be in my case, since, obviously, std::vec is out because a) I am trying to do a no_std implementation of something that should look a little like b) a std::vec.

That was until I found this wonderful example on a custom type without using any already implemented Iterator, but defining the helper PixelIntoIterator struct and its associated impl block:

struct Pixel {
    r: i8,
    g: i8,
    b: i8,
}

impl IntoIterator for Pixel {
    type Item = i8;
    type IntoIter = PixelIntoIterator;

    fn into_iter(self) -> Self::IntoIter {
        PixelIntoIterator {
            pixel: self,
            index: 0,
        }

    }
}

struct PixelIntoIterator {
    pixel: Pixel,
    index: usize,
}

impl Iterator for PixelIntoIterator {
    type Item = i8;
    fn next(&mut self) -> Option {
        let result = match self.index {
            0 => self.pixel.r,
            1 => self.pixel.g,
            2 => self.pixel.b,
            _ => return None,
        };
        self.index += 1;
        Some(result)
    }
}


fn main() {
    let p = Pixel {
        r: 54,
        g: 23,
        b: 74,
    };
    for component in p {
        println!("{}", component);
    }
}
The part in bold was what I was actually missing. Once I had that missing link, I was able to struggle through the generics part.

Note that, once I had only one new thing, the generics - luckly the lifetime part seemed it to be simply considered part of the generic thing - everything was easier to navigate.


Still, the fact there are so many new things at once, one of them being lifetimes - which can not be taught, only experienced @oli_obk - makes things very confusing.

Even if I think I managed it for IntoIterator, I am similarly confused about implementing "Deref for StackVec" for the same reasons.

I think I am seeing on my own skin what Oliver Scherer was saying about big infodumps at once at the beginning is not the way to go. I feel that if Sergio's class was now in its second year, things would have improved. OTOH, I am now very curious how does your curriculum look like, Oli?

All that aside, what should be the signature of the impl? Is this OK?

impl<'a, T: Clone + 'a> Deref for StackVec<'a, &'a mut T> {
    type Target = T;

    fn deref(&self) -> &Self::Target;
}
Trivial examples like wrapper structs over basic Copy types u8 make it more obvious what Target should be, but in this case it's so unclear, at least to me, at this point. And because of that I am unsure what should the implementation even look like.

I don't know what I'm doing, but I hope things will become clear with more exercise.

Thursday 4 July 2019

HOWTO: Rustup: Overriding the rustc compiler version just for some directory

If you need to use a specific version of the rustc compiler instead of the default, the rustup documentation tells you how to do that.


First install the desired version, e.g. nightly-2018-01-09

$ rustup install nightly-2018-01-09
info: syncing channel updates for 'nightly-2018-01-09-x86_64-pc-windows-msvc'
info: latest update on 2018-01-09, rust version 1.25.0-nightly (b5392f545 2018-01-08)
info: downloading component 'rustc'
info: downloading component 'rust-std'
info: downloading component 'cargo'
info: downloading component 'rust-docs'
info: installing component 'rustc'
info: installing component 'rust-std'
info: installing component 'cargo'
info: installing component 'rust-docs'

  nightly-2018-01-09-x86_64-pc-windows-msvc installed - rustc 1.25.0-nightly (b5392f545 2018-01-08)

info: checking for self-updates

Then override the default compiler with the desired one in the top directory of your choice:

$ rustup override set nightly-2018-01-09
info: using existing install for 'nightly-2018-01-09-x86_64-pc-windows-msvc'
info: override toolchain for 'C:\usr\src\rust\sbenitez-cs140e' set to 'nightly-2018-01-09-x86_64-pc-windows-msvc'

  nightly-2018-01-09-x86_64-pc-windows-msvc unchanged - rustc 1.25.0-nightly (b5392f545 2018-01-08)
That's it.

Saturday 15 June 2019

How to generate a usable map file for Rust code - and related (f)rustrations

Intro


Cargo does not produce a .map file, and if it does, mangling makes it very unusable. If you're searching for the TLDR, read from "How to generate a map file" on the bottom of the article.

Motivation

As a person with experience in embedded programming I find it very useful to be able to look into the map file.

Scenarios where looking at the map file is important:
  • evaluate if the code changes you made had the desired size impact or no undesired impact - recently I saw a compiler optimize for speed an initialization with 0 of an array by putting long blocks of u8 arrays in .rodata section
  • check if a particular symbol has landed in the appropriate memory section or region
  • make an initial evaluation of which functions/code could be changed to optimize either for code size or for more readability (if the size cost is acceptable)
  • check particular symbols have expected sizes and/or alignments

Rustrations 

Because these kind of scenarios  are quite frequent in my work and I am used to looking at the .map file, some "rustrations" I currently face are:
  1. No map file is generated by default via cargo and information on how to do it is sparse
  2. If generated, the symbols are mangled and it seems each symbol is in a section of its own, making per section (e.g. .rodata, .text, .bss, .data) or per file analysys more difficult than it should be
  3. I haven't found a way disable mangling globally, without editing the rust sources. - I remember there is some tool to un-mangle the output map file, but I forgot its name and I find the need to post-process suboptimal
  4. no default map file filename or location - ideally it should be named as the crate or app, as specified in the .toml file.

How to generate a map file

Generating map file for linux (and possibly other OSes)

Unfortunately, not all architectures/targets use the same linker, or on some the preferred linker could change for various reasons.

Here is how I managed to generate a map file for an AMD64/X86_64 linux target where it seems the linker is GLD:

Create a .cargo/config file with the following content:

.cargo/config:
[build]
    rustflags = ["-Clink-args=-Wl,-Map=app.map"]

This should apply to all targets which use GLD as a linker, so I suspect this is not portable to Windows integrated with MSVC compiler.

Generating a map file for thumb7m with rust-lld


On baremetal targets such as Cortex M7 (thumbv7m where you might want to use the llvm based rust-lld, more linker options might be necessary to prevent linking with compiler provided startup code or libraries, so the config would look something like this:
.cargo/config: 
[build]
target = "thumbv7m-none-eabi"
rustflags = ["-Clink-args=-Map=app.map"]
The thins I dislike about this is the fact the target is forced to thumbv7m-none-eabi, so some unit tests or generic code which might run on the build computer would be harder to test.

Note: if using rustc directly, just pass the extra options

Map file generation with some readable symbols

After the changes above ae done, you'll get an app.map file (even if the crate is of a lib) with a predefined name, If anyone knows ho to keep the crate name or at least use lib.map for libs, and app.map for apps, if the original project name can't be used.

The problems with the generated linker script are that:
  1. all symbol names are mangled, so you can't easily connect back to the code; the alternative is to force the compiler to not mangle, by adding the #[(no_mangle)] before the interesting symbols.
  2. each symbol seems to be put in its own subsection (e.g. an initalized array in .data.

Dealing with mangling

For problem 1, the fix is to add in the source #[no_mangle] to symbols or functions, like this:

#[no_mangle]
pub fn sing(start: i32, end: i32) -> String {
    // code body follows
}

Dealing with mangling globally

I wasn't able to find a way to convince cargo to apply no_mangle to the entire project, so if you know how to, please comment. I was thinking using #![no_mangle] to apply the attribute globally in a file would work, but is doesn't seem to work as expected: the subsection still contains the mangled name, while the symbol seems to be "namespaced":

Here is a some section from the #![no_mangle] (global) version:
.text._ZN9beer_song5verse17h0d94ba819eb8952aE
                0x000000000004fa00      0x61e /home/eddy/usr/src/rust/learn-rust/exercism/rust/beer-song/target/release/deps/libbeer_song-d80e2fdea1de9ada.rlib(beer_song-d80e2fdea1de9ada.beer_song.5vo42nek-cgu.3.rcgu.o)
                0x000000000004fa00                beer_song::verse
 
When the #[no_mangle] attribute is attached directly to the function, the subsection is not mangled and the symbol seems to be global:

.text.verse    0x000000000004f9c0      0x61e /home/eddy/usr/src/rust/learn-rust/exercism/rust/beer-song/target/release/deps/libbeer_song-d80e2fdea1de9ada.rlib(beer_song-d80e2fdea1de9ada.beer_song.5vo42nek-cgu.3.rcgu.o)
                0x000000000004f9c0                verse
I would prefer to have a cargo global option to switch for the entire project, and code changes would not be needed, comment welcome.

Each symbol in its section

The second issue is quite annoying, even if the fact that each symbol is in its own section can be useful to control every symbol's placement via the linker script, but I guess to fix this I need to custom linker file to redirect, say all constants "subsections" into ".rodata" section.

I haven't tried this, but it should work.