Crate lexerus

Lexerus is a lexer dinosaur that consumes a Buffer constructed from str and spits out a structure through the lexer::Lexer::lex call.

This library uses the lexerus_derive::Token and lexerus_derive::Lexer macros to decorate a structure for automatic parsing. See those macros for additional options.

This library was developed in conjunction with SPEW and examples on actual implementation can be found there (although currently private). See also tdlib_driver which uses this library (albeit badly).

An annotated struct acts as an AND: every token must match before Lexer::lex returns Result::Ok. An annotated enum acts as an OR: at least one variant must match for Lexer::lex to return Result::Ok.

§Example


// Create and decorate an enum
#[derive(Lexer, Token, Debug)]
enum Trex<'code> {
    Trex(#[pattern = "rawr"] Buffer<'code>),
    Other(#[pattern = "meow"] Buffer<'code>),
}

// Create a raw buffer
let mut buffer = Buffer::from("rawr");

// Attempt to parse the trex
let trex_calling = Trex::lex(&mut buffer).unwrap();

if let Trex::Trex(trex_calling) = trex_calling {
    assert_eq!(trex_calling.to_string(), "rawr");
}
else {
    panic!("expected trex");
}

// Create and decorate a struct
#[derive(Lexer, Token, Debug)]
struct Trex<'code>(#[pattern = "trex::"] Buffer<'code>);

#[derive(Lexer, Token, Debug)]
struct TrexCall<'code>(
    #[pattern = "RAWR"] Buffer<'code>,
);

#[derive(Lexer, Token, Debug)]
struct Call<'code> {
    rex: Trex<'code>,
    call: TrexCall<'code>,
}

// Create a raw buffer
let mut buffer = Buffer::from("trex::RAWR");

// Attempt to parse the trex
let trex_calling = Call::lex(&mut buffer).unwrap();

// Extract the buffer from trex
let trex = trex_calling.rex.buffer().unwrap();
let trex_calling = trex_calling.buffer().unwrap();

// Buffer should contain the exact matched string
assert_eq!(trex_calling.to_string(), "trex::RAWR");
assert_eq!(trex.to_string(), "trex::");

§Goals

  • No heap allocations when parsing. However, be aware that certain helpers may use heap allocations if required.
  • Heap allocations only occur when calling Token::buffer on non-contiguous sections of text or repeated sections of text. This is inevitable because different sections of str have to be stitched together, and the only way to do so is with a heap allocation.
  • Proper debuggable information, i.e. the Buffer retains information about its source and the exact range within that source. The Error which Lexer::lex generates contains a clone of the unparsed Buffer so that the program can debug where Lexer::lex failed.
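The last goal can be illustrated with a minimal, std-only sketch. It is not the lexerus implementation: `Span`, `eat`, `as_str` and `range` are hypothetical names chosen for illustration. The sketch assumes a buffer that only borrows its source and tracks a byte range into it, so matching a pattern allocates nothing, and a failed match hands back a copy of the unparsed remainder for debugging.

```rust
/// Hypothetical stand-in for a source-tracking buffer.
/// This is NOT the lexerus `Buffer` type, only an illustration of the idea.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Span<'code> {
    source: &'code str, // the full original text, borrowed
    start: usize,       // byte offset where this span begins
    end: usize,         // byte offset where this span ends (exclusive)
}

impl<'code> Span<'code> {
    /// Wrap a whole source string without copying it.
    pub fn new(source: &'code str) -> Self {
        Span { source, start: 0, end: source.len() }
    }

    /// The text covered by this span, borrowed from the source.
    pub fn as_str(&self) -> &'code str {
        &self.source[self.start..self.end]
    }

    /// The exact byte range this span occupies in the source.
    pub fn range(&self) -> (usize, usize) {
        (self.start, self.end)
    }

    /// Try to consume `pattern` from the front of the span.
    /// On success, return the matched span and advance `self` past it.
    /// On failure, leave `self` untouched and return a copy of the
    /// unparsed remainder so the caller can see where matching stopped.
    pub fn eat(&mut self, pattern: &str) -> Result<Span<'code>, Span<'code>> {
        if self.as_str().starts_with(pattern) {
            let matched = Span {
                source: self.source,
                start: self.start,
                end: self.start + pattern.len(),
            };
            self.start = matched.end;
            Ok(matched)
        } else {
            Err(*self)
        }
    }
}
```

Because every `Span` is just a reference plus two offsets, copying one into an error value is cheap, and the offsets are enough to report a position in the original text.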

Modules§

implementations

Structs§

EoF
Error
Information about the error which occurred when Lexer::lex failed.
Group
Captures a group of contiguous Capture. Unlike GroupUntil, this structure does not check for an end token.
GroupBookEnd
Captures a group of contiguous Capture bookended by the Start and End types.
GroupUntil
Captures a group of contiguous Capture until End is found.
Isolate
Use Token to wrap any Type::Lexer to try to consume all WhiteSpaces before and after the given Lexer
NewLine
Literal newline \n
Not
Matches only if the next token is not Type. Uses PhantomData and captures neither the Type nor the underlying Buffer.
Peek
Lookahead to see if a Token exists. Does not consume the buffer, which makes it a useful validation tool: it checks that the specified Token is present while leaving it for the next Lexer to process.
Space
Literal whitespace
Tab
Literal tab \t
TestBuild
Test harness for tokens.
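The lookahead helpers listed above (Peek and Not) can be pictured with a small std-only sketch. It is an assumption-laden illustration, not the lexerus API: `Cursor`, `eat`, `peek` and `not` are hypothetical names, and patterns are plain string prefixes rather than derived tokens. The point is only that a lookahead inspects the remaining input and reports a result without advancing, while a normal match consumes what it matched.

```rust
/// Hypothetical cursor over the remaining input; not a lexerus type.
pub struct Cursor<'code> {
    rest: &'code str,
}

impl<'code> Cursor<'code> {
    pub fn new(source: &'code str) -> Self {
        Cursor { rest: source }
    }

    /// The input that has not been consumed yet.
    pub fn rest(&self) -> &'code str {
        self.rest
    }

    /// Consuming match: on success, advance past `pattern` and return it.
    pub fn eat(&mut self, pattern: &str) -> Option<&'code str> {
        let rest: &'code str = self.rest;
        if rest.starts_with(pattern) {
            let (head, tail) = rest.split_at(pattern.len());
            self.rest = tail;
            Some(head)
        } else {
            None
        }
    }

    /// Positive lookahead: true if `pattern` comes next. Takes `&self`,
    /// so it cannot advance the cursor.
    pub fn peek(&self, pattern: &str) -> bool {
        self.rest.starts_with(pattern)
    }

    /// Negative lookahead: true if `pattern` does NOT come next.
    /// Like `peek`, it captures nothing and consumes nothing.
    pub fn not(&self, pattern: &str) -> bool {
        !self.peek(pattern)
    }
}
```

In this sketch the non-consuming behaviour is enforced by the signatures: `peek` and `not` borrow the cursor immutably, so only `eat` is able to move it forward.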

Enums§

Buffer
Buffer is a container for source code. It is represented as an enum because there are two forms of Buffer which can be created:
Infix
Infix operator used to create an infix operation, e.g. 1+1. This helper is useful because it does not discard the LHS result but rather returns it as a None value if the Operator cannot be found.
Kind
Describes the type of error found.
WhiteSpace
Represents any permutation of possible whitespace. Could be a space, \t, \n or whatever whitespace characters are subsequently added.

Traits§

Lexer
Trait that indicates that the structure is able to be constructed from a Buffer by calling Lexer::lex on the struct.
Token
Token indicator

Type Aliases§

WhiteSpaces
Unallocated crate::Group of WhiteSpace. This can probably be used in the context of breaking up SYMBOL [WhiteSpaces] SYMBOL in complex chains.

Derive Macros§

Lexer
Lexer
Token
Decorates a struct or enum with impl Token<'code>.