Lexerus is a lexer dinosaur that consumes a Buffer constructed from str and spits out a structure through the lexer::Lexer::lex call.
This library uses the lexerus_derive::Token and lexerus_derive::Lexer macros to decorate a structure for automatic parsing. See those macros for additional options.
This library was developed in conjunction with SPEW, which contains examples of real-world usage (although SPEW is currently private). See also tdlib_driver, which uses this library (albeit badly).
An annotated struct acts as an AND: all of its tokens must match before Lexer::lex returns a valid Result::Ok.
An annotated enum acts as an OR: at least one of its variants must match for Lexer::lex to return a valid Result::Ok.
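The AND/OR behaviour described above can be pictured with a small hand-written matcher. This is only a sketch using the standard library: `Cursor`, `eat`, `lex_call`, `lex_sound` and `Sound` are hypothetical names invented for illustration and are not part of lexerus. It also assumes that alternatives are tried in declaration order and that a failed match leaves the input untouched, which the text above does not confirm for the derive macros.

```rust
// Hypothetical stand-ins for illustration; none of these names come from lexerus.

/// A cursor over the not-yet-parsed remainder of the input.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Cursor<'code> {
    pub rest: &'code str,
}

/// Consume `pattern` from the front of the cursor and return the matched slice.
/// On a mismatch the cursor is left unchanged.
pub fn eat<'code>(cur: &mut Cursor<'code>, pattern: &str) -> Option<&'code str> {
    let rest = cur.rest;
    if rest.starts_with(pattern) {
        let (matched, tail) = rest.split_at(pattern.len());
        cur.rest = tail;
        Some(matched)
    } else {
        None
    }
}

/// AND (struct-like): every field must match, in order.
/// Work happens on a copy so that a partial match does not consume input.
pub fn lex_call<'code>(cur: &mut Cursor<'code>) -> Option<(&'code str, &'code str)> {
    let mut attempt = *cur;
    let rex = eat(&mut attempt, "trex::")?;
    let call = eat(&mut attempt, "RAWR")?;
    *cur = attempt; // commit only when both parts matched
    Some((rex, call))
}

/// OR (enum-like): the first alternative that matches wins.
#[derive(Debug, PartialEq)]
pub enum Sound<'code> {
    Rawr(&'code str),
    Meow(&'code str),
}

pub fn lex_sound<'code>(cur: &mut Cursor<'code>) -> Option<Sound<'code>> {
    if let Some(s) = eat(cur, "rawr") {
        return Some(Sound::Rawr(s));
    }
    if let Some(s) = eat(cur, "meow") {
        return Some(Sound::Meow(s));
    }
    None
}
```

In this sketch, `lex_call` on `"trex::meow"` fails as a whole even though the first part matches, while `lex_sound` succeeds on either `"rawr"` or `"meow"`.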
§Example
// Create and decorate an enum
#[derive(Lexer, Token, Debug)]
enum Trex<'code> {
    Trex(#[pattern = "rawr"] Buffer<'code>),
    Other(#[pattern = "meow"] Buffer<'code>),
}
// Create a raw buffer
let mut buffer = Buffer::from("rawr");
// Attempt to parse the trex
let trex_calling = Trex::lex(&mut buffer).unwrap();
// The input is "rawr", so the first variant should have matched
if let Trex::Trex(trex_calling) = trex_calling {
    assert_eq!(trex_calling.to_string(), "rawr");
} else {
    panic!("expected trex");
}
// Create and decorate a struct
#[derive(Lexer, Token, Debug)]
struct Trex<'code>(#[pattern = "trex::"] Buffer<'code>);
#[derive(Lexer, Token, Debug)]
struct TrexCall<'code>(#[pattern = "RAWR"] Buffer<'code>);
#[derive(Lexer, Token, Debug)]
struct Call<'code> {
    rex: Trex<'code>,
    call: TrexCall<'code>,
}
// Create a raw buffer
let mut buffer = Buffer::from("trex::RAWR");
// Attempt to parse the call
let trex_calling = Call::lex(&mut buffer).unwrap();
// Extract the buffer of the `rex` field, then the buffer of the whole call
let trex = trex_calling.rex.buffer().unwrap();
let trex_calling = trex_calling.buffer().unwrap();
// Each buffer should contain the exact matched string
assert_eq!(trex_calling.to_string(), "trex::RAWR");
assert_eq!(trex.to_string(), "trex::");
§Goals
- No heap allocations when parsing. Be aware, however, that certain helpers may use heap allocations if required.
- Heap allocations only occur when calling Token::buffer on non-contiguous or repeated sections of text. This is inevitable because different sections of str have to be stitched together, and the only way to do so is with a heap allocation.
- Proper debuggable information, i.e. the Buffer retains information about its source and the exact range within that source. The Error which Lexer::lex generates contains a clone of the unparsed Buffer so that the program can debug where Lexer::lex failed.
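To make these goals concrete, here is a minimal standard-library sketch of a borrowed span that remembers its source and range, an error that carries the unparsed input, and a join that only allocates when two spans are not adjacent. `Span`, `LexError`, `expect` and `join` are hypothetical names for illustration. They are not the lexerus types, and the real Buffer and Error may be laid out quite differently.

```rust
use std::borrow::Cow;
use std::ops::Range;

/// Hypothetical span: a borrowed source plus the exact byte range it covers.
#[derive(Debug, Clone, PartialEq)]
pub struct Span<'code> {
    pub source: &'code str,
    pub range: Range<usize>,
}

impl<'code> Span<'code> {
    /// The covered text, borrowed from the source (no allocation).
    pub fn as_str(&self) -> &'code str {
        &self.source[self.range.clone()]
    }
}

/// Hypothetical error: keeps a clone of the input that could not be parsed.
#[derive(Debug, Clone, PartialEq)]
pub struct LexError<'code> {
    pub unparsed: Span<'code>,
}

/// Split `input` into (matched, rest) if it starts with `pattern`.
pub fn expect<'code>(
    input: &Span<'code>,
    pattern: &str,
) -> Result<(Span<'code>, Span<'code>), LexError<'code>> {
    if input.as_str().starts_with(pattern) {
        let mid = input.range.start + pattern.len();
        let matched = Span { source: input.source, range: input.range.start..mid };
        let rest = Span { source: input.source, range: mid..input.range.end };
        Ok((matched, rest))
    } else {
        Err(LexError { unparsed: input.clone() })
    }
}

/// Join two spans, assuming both point into the same source.
/// Adjacent spans are re-sliced from the source; anything else must be
/// stitched together, which needs a heap-allocated String.
pub fn join<'code>(a: &Span<'code>, b: &Span<'code>) -> Cow<'code, str> {
    if a.range.end == b.range.start {
        Cow::Borrowed(&a.source[a.range.start..b.range.end])
    } else {
        Cow::Owned(format!("{}{}", a.as_str(), b.as_str()))
    }
}
```

In this sketch, matching never copies text: every `Span` is just a source reference and two offsets. Only `join` on non-adjacent spans allocates, which mirrors the reason given above for Token::buffer.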
Modules§
Structs§
- EoF
- Error: Information about the error which occurred when the Lexer::lex failed.
- Group: Captures a group of contiguous Capture. Unlike GroupUntil, this structure does not check for an end token.
- GroupBookEnd: Captures a group of contiguous Capture bookended by the Start and End types.
- GroupUntil: Captures a group of contiguous Capture until End is found.
- Isolate: Use Token to wrap any Type::Lexer to try to consume all WhiteSpaces before and after the given Lexer.
- NewLine: Literal newline \n
- Not: Matches only if the next token is Type. Uses PhantomData and does not capture the Type nor the underlying Buffer.
- Peek: Lookahead to see if a Token exists. Does not consume the buffer, which makes it a useful tool for validating that the specified Token exists while leaving it for the next Lexer to process.
- Space: Literal whitespace
- Tab: Literal tab \t
- TestBuild: Test harness for tokens.
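The lookahead idea behind Peek can be sketched without the crate. The functions below are hypothetical (`eat`, `peek` and `lex_callee` are not lexerus APIs) and operate on a plain `&str` remainder. They only illustrate checking for a token without consuming it, so that the next lexer still sees it.

```rust
// Hypothetical helpers for illustration; not part of lexerus.

/// Consume `pattern` from the front of `rest`, returning the matched slice.
/// On a mismatch `rest` is left unchanged.
pub fn eat<'code>(rest: &mut &'code str, pattern: &str) -> Option<&'code str> {
    let input: &'code str = *rest;
    let tail = input.strip_prefix(pattern)?;
    *rest = tail;
    Some(&input[..pattern.len()])
}

/// Lookahead: report whether `pattern` comes next, consuming nothing.
pub fn peek(rest: &str, pattern: &str) -> bool {
    rest.starts_with(pattern)
}

/// Accept `name` only when "(" follows it.
/// The "(" is validated but left in place for whichever lexer runs next.
pub fn lex_callee<'code>(rest: &mut &'code str, name: &str) -> Option<&'code str> {
    let mut attempt = *rest;
    let callee = eat(&mut attempt, name)?;
    if !peek(attempt, "(") {
        return None;
    }
    *rest = attempt; // commit the name, not the parenthesis
    Some(callee)
}
```

With the input `"rawr(loud)"`, `lex_callee` consumes only `rawr`; a following `eat(&mut rest, "(")` still finds the parenthesis.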
Enums§
- Buffer: Buffer is a container for source code. It is represented as an enum because there are two forms of Buffer which can be created:
- Infix: Infix operator used to create an infix operation, e.g. 1+1. This helper is useful because it does not discard the LHS result but rather returns it as a None value if the Operator cannot be found.
- Kind: Describes the type of error found.
- WhiteSpace: Represents any permutation of possible white spaces. Could be \t, \n or whatever whitespaces are subsequently added.
Traits§
- Lexer: Trait that indicates that the structure is able to be constructed from a Buffer by calling Lexer::lex on the struct.
- Token: Token indicator
Type Aliases§
- WhiteSpaces: Unallocated crate::Group of WhiteSpace. This can probably be used in the context of breaking up SYMBOL [WhiteSpaces] SYMBOL in complex chains.