# scanatra
scanatra is a toolset for creating parsers, compilers, and other text-processing tools.
Content:
- rautomaton
  - finite automata (regular expressions, DFA, NFA)
  - CFG (LL(1), LR(0)/SLR, LR(1) parsers)
- rautomaton-macro
  - cfg_grammar_const - compile-time CFG parse table generation
- scanatra-core
  - double_enum - create token enums for scanning
  - scanner - lex text into tokens, remove irrelevant data
  - ast_gen - macro creating AST generation code
    - creates a grammar
    - converts the grammar to an AST
  - regex - a regex syntax parser, to build regular expressions from regex strings
- scanatra-macro
  - regex - parses a regex string at compile time
## Usage

### Tokens
For lexing, two kinds of tokens are needed: tokens with data, and bare tokens without data that represent the token kind.
Create them like this:
```rust
use scanatra::prelude::*;

double_enum!(
    BareTokens, Tokens {
        WhiteSpace,
        Semicolon,
        Ident(String),
        // ...
    }
);
```
This will create two enums (with some `impl`s):
```rust
#[derive(Debug, Clone, PartialEq)]
enum Tokens {
    WhiteSpace,
    Semicolon,
    Ident(String),
    // ...
}

#[derive(Terminal, Debug, Clone, PartialEq, Eq, Hash, Ord, PartialOrd)]
enum BareTokens {
    WhiteSpace,
    Semicolon,
    Ident,
    // ...
}

impl PartialEq<Tokens> for BareTokens { /* ... */ }
impl PartialEq<BareTokens> for Tokens { /* ... */ }
impl Form<Tokens> for BareTokens { /* ... */ }
```
You can also write these enums yourself, but remember to implement those traits.
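The cross-type comparisons let a bare kind match any data-carrying token of the same variant. Below is a minimal hand-rolled sketch of equivalent `impl`s in plain Rust (illustrative only; the actual expansion of `double_enum!` may differ):

```rust
#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq)]
enum Tokens {
    WhiteSpace,
    Semicolon,
    Ident(String),
}

#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
enum BareTokens {
    WhiteSpace,
    Semicolon,
    Ident,
}

impl PartialEq<Tokens> for BareTokens {
    // A bare kind is equal to any data-carrying token of the same variant.
    fn eq(&self, other: &Tokens) -> bool {
        matches!(
            (self, other),
            (BareTokens::WhiteSpace, Tokens::WhiteSpace)
                | (BareTokens::Semicolon, Tokens::Semicolon)
                | (BareTokens::Ident, Tokens::Ident(_))
        )
    }
}

fn main() {
    assert!(BareTokens::Ident == Tokens::Ident(String::from("x")));
    assert!(BareTokens::Semicolon != Tokens::WhiteSpace);
    println!("ok");
}
```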
### Scanner
For lexing, create a scanner.
A scanner just needs an enum like `Tokens` above.
This enum must implement the trait `CreateMatchTable<Self>`.
Write it yourself or use the following macro:
```rust
use scanatra::prelude::*;

token_scanner!(
    Tokens,
    or!(' ', '\t', '\n', '\r') ;-> |_| {
        Some(WhiteSpace)
    }
    ';' ;-> |_| {
        Some(Semicolon)
    }
    or!('a'..='z', 'A'..='Z') * star!(word!()) ;-> |m: &str| {
        Some(Ident(String::from(m)))
    }
    regex!(r#""( [^\\"] | \\[\\"] )*""#) ;-> |m| {
        //...
    }
    // ...
);
```
The macro uses the regular expressions from `rautomaton`.
### regex
If you do not want to construct the regular expressions yourself, you can use the `regex!()` macro.
This macro currently supports a subset of regex syntax.
Features:
- `a|b`: matches `a` or `b`.
- `ab`: matches `a` followed by `b`.
- `()`: match group.
- `[abc0-9]`: matches any character in the brackets. Ranges (e.g. `0` to `9`) are supported.
- `[^abc0-9]`: same as above, but matches anything besides the given characters / ranges.
- `?`: matches the preceding element 0 or 1 times.
- `+`: matches the preceding element 1 or more times.
- `*`: matches the preceding element 0 or more times.
- `\x00`: matches an escaped one-byte character.
- `\u0000` or `\u{00}`: matches an escaped unicode character.
- `\/`: matches an escaped character (but not `x`, `c` or `u`).
- special characters: `\0`, `\r`, `\f`, `\v`, `\n`, `\t`
- control characters: `\cA` to `\cZ`
- `\w` and `\W`: word characters / every other character
- `\d` and `\D`: digits / every other character
- `\s` and `\S`: whitespace characters / every other character
The `regex!` macro parses the regex at compile time, if possible.
When the regex should be parsed at runtime, use `regex_live!`.
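Internally, a regex is turned into a finite automaton. As a plain-Rust illustration of that idea (independent of `rautomaton`), here is a hand-written DFA recognizing the regex `(a|b)*c`:

```rust
// Hand-written DFA for the regex "(a|b)*c" (illustrative sketch of the kind
// of automaton a regex compiles down to; not rautomaton's actual machinery).
fn matches(input: &str) -> bool {
    // state 0: seen zero or more 'a'/'b'; state 1: accepting, seen final 'c'
    let mut state = 0u8;
    for ch in input.chars() {
        state = match (state, ch) {
            (0, 'a') | (0, 'b') => 0,
            (0, 'c') => 1,
            _ => return false, // dead state: no valid transition
        };
    }
    state == 1
}

fn main() {
    assert!(matches("ababc"));
    assert!(matches("c"));
    assert!(!matches("ab"));
    assert!(!matches("abcc"));
    println!("ok");
}
```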
### Scanning
To actually scan text, create a scanner and supply it with some text.
Note: the finite automata from `rautomaton` could lex other data, but the `Scanner` is limited to text.
```rust
use scanatra::prelude::*;

fn main() {
    let code = String::from("...");
    let scanner = Scanner::<Tokens>::new()
        .with_skipping(Tokens::WhiteSpace);
    let token_iter = scanner.iter(code);
    // ...
}
```
Use `.with_skipping` to ignore tokens.
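Conceptually, skipping just filters out every token whose kind is in the skip list. A self-contained sketch of that idea in plain Rust (not the actual `Scanner` implementation):

```rust
use std::mem;

#[derive(Debug, Clone, PartialEq)]
enum Tokens {
    WhiteSpace,
    Semicolon,
    Ident(String),
}

// Drop every token whose variant matches one of the skipped kinds,
// comparing variants only (ignoring any carried data).
fn skip(tokens: Vec<Tokens>, skipped: &[Tokens]) -> Vec<Tokens> {
    tokens
        .into_iter()
        .filter(|t| {
            !skipped
                .iter()
                .any(|s| mem::discriminant(s) == mem::discriminant(t))
        })
        .collect()
}

fn main() {
    let toks = vec![
        Tokens::Ident(String::from("a")),
        Tokens::WhiteSpace,
        Tokens::Semicolon,
    ];
    let kept = skip(toks, &[Tokens::WhiteSpace]);
    assert_eq!(kept, vec![Tokens::Ident(String::from("a")), Tokens::Semicolon]);
    println!("ok");
}
```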
## Parser

### Preparation
A grammar needs non-terminals to create the rules.
Create them like this:
```rust
#[derive(NoneTerminal, Debug, PartialEq, Eq, Hash, Clone, PartialOrd, Ord)]
enum NoneTerminals {
    SomeNoneTerminal,
    S, A, B,
    // ...
}
```
To use terminals and non-terminals, they have to implement `Terminal` and `NoneTerminal`.
`double_enum!` will derive `Terminal` automatically.
Alternatively, a conversion method can be implemented:
- for `NoneTerminals`: `fn to_sentential<T>(self) -> Sentential<Self, T>`
- for `Terminals`: `fn to_sentential<N>(self) -> Sentential<N, Self>`
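The signatures above suggest a sentential-form type with one side for non-terminals and one for terminals. The following is a hypothetical sketch of that shape (an assumption for illustration; check the crate's actual definitions):

```rust
// Hypothetical shape of Sentential (assumption; not taken from the crate).
#[derive(Debug, PartialEq)]
enum Sentential<N, T> {
    NoneTerminal(N),
    Terminal(T),
}

#[derive(Debug, PartialEq)]
enum NoneTerminals {
    S,
}

#[derive(Debug, PartialEq)]
enum BareTokens {
    Semicolon,
}

impl NoneTerminals {
    // Non-terminals wrap themselves on the non-terminal side.
    fn to_sentential<T>(self) -> Sentential<Self, T> {
        Sentential::NoneTerminal(self)
    }
}

impl BareTokens {
    // Terminals wrap themselves on the terminal side.
    fn to_sentential<N>(self) -> Sentential<N, Self> {
        Sentential::Terminal(self)
    }
}

fn main() {
    let s: Sentential<NoneTerminals, BareTokens> = NoneTerminals::S.to_sentential();
    assert_eq!(s, Sentential::NoneTerminal(NoneTerminals::S));
    let t: Sentential<NoneTerminals, BareTokens> = BareTokens::Semicolon.to_sentential();
    assert_eq!(t, Sentential::Terminal(BareTokens::Semicolon));
    println!("ok");
}
```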
### Grammar
A grammar can be created with `Grammar::<NoneTerminals, BareTokens>::new()` or with the following macro:
```rust
fn grammar() -> Grammar<NoneTerminals, BareTokens> {
    use BareTokens::*;
    use NoneTerminals::*;
    cfg_grammar![
        start: P;
        P => E;
        E => E, Add, T;
        E => T;
        T => Ident, LBrace, E, RBrace;
        T => Ident;
        // ...
    ]
}
```
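To make the accepted language concrete, here is an equivalent hand-written recursive-descent recognizer for the grammar above, with the left recursion `E => E, Add, T` rewritten as `T (Add T)*` (illustrative, independent of scanatra):

```rust
// Recognizer for: P => E; E => E Add T | T; T => Ident LBrace E RBrace | Ident
#[derive(Debug, Clone, PartialEq)]
enum Tok {
    Ident,
    Add,
    LBrace,
    RBrace,
}

// E => T (Add T)* ; returns the position after the parsed E, or None.
fn parse_e(toks: &[Tok], mut pos: usize) -> Option<usize> {
    pos = parse_t(toks, pos)?;
    while toks.get(pos) == Some(&Tok::Add) {
        pos = parse_t(toks, pos + 1)?;
    }
    Some(pos)
}

// T => Ident LBrace E RBrace | Ident
// Greedy choice on LBrace is safe: LBrace is not in FOLLOW(T).
fn parse_t(toks: &[Tok], pos: usize) -> Option<usize> {
    if toks.get(pos) != Some(&Tok::Ident) {
        return None;
    }
    if toks.get(pos + 1) == Some(&Tok::LBrace) {
        let after = parse_e(toks, pos + 2)?;
        if toks.get(after) == Some(&Tok::RBrace) {
            return Some(after + 1);
        }
        return None;
    }
    Some(pos + 1)
}

fn accepts(toks: &[Tok]) -> bool {
    parse_e(toks, 0) == Some(toks.len())
}

fn main() {
    use Tok::*;
    // f(a + b)
    assert!(accepts(&[Ident, LBrace, Ident, Add, Ident, RBrace]));
    assert!(accepts(&[Ident, Add, Ident]));
    assert!(!accepts(&[Add, Ident]));
    println!("ok");
}
```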
### AST
For this, an AST datatype is needed, for example:
```rust
#[derive(Clone)]
enum AST {
    Var1(data, ...),
    // ...
}
```
When generating an AST, the following macro will help.
This macro replaces `cfg_grammar!` and the previously mentioned grammar function.
```rust
gen_ast!(
    (trait: a_custom_trait)?
    types: AST;
    pos_type: ParsePos;
    custom_err: String;
    tokens: Tokens, BareTokens;
    none_terminals: NoneTerminals;
    start: P;
    P => Pi=Var1(v) => {
        Ok(AST::Var1(v))
    },
    Pi in Some(pos) => Pi=Var1(v), Comma, Ident(name) => {
        println!("{} in {}", name, pos);
        Ok(...)
    }
)
```
Content:
- `trait`: can optionally be used to create the implementations in a custom trait.
- `types` and `custom_err`: every rule has to return a `Result<type, custom_err>`. When an `Err` is returned, the parsing fails.
- `pos_type`: the position type of the used scanner. The built-in scanner uses `ParsePos`.
- `tokens`: the terminals defined by `double_enum!`.
- `none_terminals`: the defined non-terminals.
- `start`: the starting rule.
- grammar:
  - rule (the element before `=>`): the rule to derive content from.
  - derived elements:
    - a non-terminal `rule=pattern`: the pattern can be used to deconstruct the `AST` data.
    - a terminal `token(args, ...)`: the terminal token, with optional `()` to destructure a token with data.
  - behind every element, `in pos` can be added to get an `Option<pos_type>` with the position of that element or of the complete rule. `pos` is a pattern that can be used for destructuring.

In the `{}` brackets, code can be added that creates an AST element out of the parsed rules and terminals.
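The `Result`-based rule actions can be pictured as plain functions that either build an AST node or fail with the custom error type. A standalone sketch (the helper names below are hypothetical, not part of scanatra):

```rust
// Hypothetical rule actions returning Result<AST, String>, as gen_ast! rules do.
#[derive(Debug, Clone, PartialEq)]
enum AST {
    Num(i64),
    Add(Box<AST>, Box<AST>),
}

// Action for a rule like E => E, Add, T: combine two sub-ASTs.
fn rule_add(l: AST, r: AST) -> Result<AST, String> {
    Ok(AST::Add(Box::new(l), Box::new(r)))
}

// Action for a number literal: may fail, which aborts the parse.
fn rule_num(text: &str) -> Result<AST, String> {
    text.parse::<i64>()
        .map(AST::Num)
        .map_err(|e| format!("invalid number {:?}: {}", text, e))
}

fn main() {
    let ast = rule_add(rule_num("1").unwrap(), rule_num("2").unwrap()).unwrap();
    assert_eq!(
        ast,
        AST::Add(Box::new(AST::Num(1)), Box::new(AST::Num(2)))
    );
    assert!(rule_num("x").is_err());
    println!("ok");
}
```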
Note: This macro will try to create the least complex parse table, in this order: LL(1), SLR, LR(1). When the grammar is not deterministic context-free, a compiler error will be thrown.
For a complete example, take a look at the built-in regex parser or the JSON example.