Guide to adding new AST readers
Pyndoc’s reader works in a way that makes it independent of the language being processed. The structure of the syntax tree makes it simple to add new blocks or modify existing ones.
AST Blocks
AST blocks are divided into two subcategories:
Atom blocks - cannot hold other blocks inside of them
Composite blocks - can hold other blocks inside of them
The reader treats these two types completely independently and reads each of them in a different way.
Default AST Blocks
The default representation of AST Blocks has been defined in
pyndoc.ast.blocks and contains definitions for most blocks found in
any markup language. Any language parser or reader added to pyndoc
should define these blocks.
The blocks defined in the aforementioned file all derive from either the
ASTAtomBlock or ASTCompositeBlock class, found in
pyndoc.ast.basic_blocks. These classes define the default behaviour
and contents of Atom and Composite blocks.
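The difference between the two kinds of block can be pictured with a minimal sketch. The classes below are hypothetical stand-ins written for illustration; they are not the ASTAtomBlock and ASTCompositeBlock classes from pyndoc.ast.basic_blocks, whose actual attributes may differ.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class SketchAtomBlock:
    """Hypothetical atom block: a leaf that may carry content, never children."""
    name: str
    content: Optional[str] = None


@dataclass
class SketchCompositeBlock:
    """Hypothetical composite block: a node that holds other blocks."""
    name: str
    children: List[object] = field(default_factory=list)

    def add(self, block: object) -> None:
        self.children.append(block)


# "Hello world" as a composite paragraph holding three atoms.
para = SketchCompositeBlock("Para")
para.add(SketchAtomBlock("Str", "Hello"))
para.add(SketchAtomBlock("Space"))  # an atom without any content
para.add(SketchAtomBlock("Str", "world"))

names = [child.name for child in para.children]
```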
Read Handler
A read handler is a class containing default methods for parsing tokens related to certain AST Blocks. Every AST Block derives from a Read Handler, which allows for custom reading functions and definitions.
CompositeReadHandler
A class defining the Read Handler for all composite blocks. It contains attributes related to the start and end patterns of a Composite Block, as well as information on whether the block is an inline block (whether it can exist on its own, not wrapped in any other composite block).
CompositeReadHandler contains the following methods important for
creating new readers:
process_read - invoked after a block is created; it can process additional arguments after a block’s definition, and by default does nothing
start - matches a token against a start pattern
end - matches a token against an end pattern
handle_premature_closure - special handling of any situation in which the file has ended and the block needs extra processing
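The four methods listed above might fit together roughly as in the sketch below. SketchCompositeHandler, its bracket patterns and the exact return values are assumptions made for illustration. In particular, the "new token" returned by start and end is assumed here to be the text preceding the matched delimiter, which may not be what pyndoc does.

```python
import re
from typing import Optional, Tuple


class SketchCompositeHandler:
    """Hypothetical stand-in, not pyndoc's CompositeReadHandler."""

    start_pattern = r"\[$"  # assumed: the block opens on a trailing "["
    end_pattern = r"\]$"    # assumed: the block closes on a trailing "]"
    is_inline = True

    @classmethod
    def _match(cls, pattern: str, token: str) -> Tuple[Optional[re.Match], str]:
        match = re.search(pattern, token)
        if match is None:
            return None, token
        # Assumption: the new token is whatever preceded the delimiter.
        return match, token[: match.start()]

    @classmethod
    def start(cls, token: str) -> Tuple[Optional[re.Match], str]:
        return cls._match(cls.start_pattern, token)

    @classmethod
    def end(cls, token: str) -> Tuple[Optional[re.Match], str]:
        return cls._match(cls.end_pattern, token)

    def process_read(self, **kwargs: object) -> None:
        """Default behaviour: do nothing."""

    @classmethod
    def handle_premature_closure(cls, token: str) -> str:
        """Default behaviour in this sketch: leave the token untouched."""
        return token


start_match, before_start = SketchCompositeHandler.start("abc[")
end_match, before_end = SketchCompositeHandler.end("xyz]")
no_match, unchanged = SketchCompositeHandler.end("xyz")
```

A language-specific block would override only the methods whose default behaviour does not fit.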
AtomReadHandler
A class defining the Read Handler for atom blocks. It contains attributes related to the pattern that matches an atom block, and a boolean indicating whether the block has any content (for example, a Str block has a string as its content, while a Space block has none).
AtomReadHandler contains the following methods important for
creating new readers:
match_pattern - matches the token against the block’s pattern
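As a rough illustration of the pattern and content flag described above, the following sketch uses hypothetical class and attribute names (SketchAtomHandler, pattern, has_content). The real AtomReadHandler may name and implement these differently.

```python
import re
from typing import Optional


class SketchAtomHandler:
    """Hypothetical stand-in, not pyndoc's AtomReadHandler."""

    pattern = r"[A-Za-z]+"  # assumed pattern for a Str-like atom
    has_content = True      # a Str keeps the matched text as its content

    @classmethod
    def match_pattern(cls, token: str) -> Optional[re.Match]:
        # The whole token has to match for the atom to count as matched.
        return re.fullmatch(cls.pattern, token)


class SketchSpace(SketchAtomHandler):
    pattern = r" +"      # assumed pattern for a Space-like atom
    has_content = False  # a Space carries no content


str_matches = SketchAtomHandler.match_pattern("abc") is not None
str_stops = SketchAtomHandler.match_pattern("abc ") is None
space_matches = SketchSpace.match_pattern(" ") is not None
```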
Atom Wrapper
An atom wrapper is a block that catches and wraps any atom or
inline blocks that are defined without any existing context. Most of the
time, ast.Para will be used for this, but any other block can be
used as well.
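The wrapping rule can be sketched in a few lines. SketchPara and insert_atom are hypothetical names, and the context is modelled as a plain list used as a stack, which is an assumption about how the reader tracks open blocks.

```python
class SketchPara:
    """Hypothetical stand-in for the block used as the atom wrapper."""

    def __init__(self) -> None:
        self.children = []


def insert_atom(atom, context, atom_wrapper=SketchPara):
    # With no open block, open the wrapper first so the atom has a parent.
    if not context:
        context.append(atom_wrapper())
    context[-1].children.append(atom)


context = []
insert_atom("Hello", context)  # no context yet: a SketchPara is created
insert_atom("world", context)  # context exists: the same wrapper is reused
```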
Defining a reader
New readers can be defined in the src/pyndoc/readers directory.
First, create a directory with the language’s name. Inside that
directory, create an empty __init__.py file.
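The same two steps can be scripted. The helper below is a hypothetical convenience, not part of pyndoc, and "mylang" is a placeholder language name. The sketch runs in a temporary directory so that it does not touch a real checkout.

```python
import tempfile
from pathlib import Path


def scaffold_reader(readers_dir: Path, language: str) -> Path:
    """Create <readers_dir>/<language>/ containing an empty __init__.py."""
    package = readers_dir / language
    package.mkdir(parents=True, exist_ok=True)
    (package / "__init__.py").touch()
    return package


with tempfile.TemporaryDirectory() as tmp:
    readers = Path(tmp) / "src" / "pyndoc" / "readers"
    package = scaffold_reader(readers, "mylang")
    created = sorted(path.name for path in package.iterdir())
```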
tokens.py
tokens.py is a required file for each language reader. It contains
details on all tokens and their start and end patterns, and is used
to define attribute values for AST Blocks.
A tokens.py file should contain the following definitions:
A declared_tokens dict, containing a specific AST Block class as a key and, as a value, a tuple containing a regex pattern defining the block’s start and a boolean. The boolean will be used as the is_inline attribute
A declared_ends dict, containing information on declared end patterns; keys are the same as above, values are just the regex pattern
An atom_wrapper variable, containing the class name for the atom wrapper
A declared_atomic_patterns dict, with keys as above; values are a tuple containing a regex string for each atomic pattern and a boolean indicating if the block has any contents
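Put together, a tokens.py might look roughly like the sketch below. The block classes are empty stand-ins so that the example runs on its own, and every regex is illustrative rather than taken from an existing pyndoc reader. atom_wrapper is assumed to hold the class object itself.

```python
import re


# Stand-in block classes. A real tokens.py would use the language's
# actual AST Block classes instead.
class Emph: ...
class Header: ...
class Para: ...
class Str: ...
class Space: ...


# Block class -> (start pattern, is_inline). Patterns are illustrative only.
declared_tokens = {
    Emph: (r"\*$", True),
    Header: (r"^(?P<h>#+) $", False),
}

# Block class -> end pattern.
declared_ends = {
    Emph: r"\*$",
    Header: r"\n$",
}

# Block that wraps atoms and inline blocks appearing without any context.
atom_wrapper = Para

# Block class -> (pattern, has_content).
declared_atomic_patterns = {
    Str: (r"[^\s*#]+", True),
    Space: (r" ", False),
}

# Sanity check: every pattern compiles and every end pattern has a start.
for pattern, _flag in [*declared_tokens.values(), *declared_atomic_patterns.values()]:
    re.compile(pattern)
for block, pattern in declared_ends.items():
    re.compile(pattern)
    assert block in declared_tokens
```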
Default block processing
The reader goes over a file character by character and forms tokens
that are then matched, by the parser, against the patterns defined in
tokens.py. With the default behaviour of all read handlers, the
reader will do the following for each character it reads:
Check if the currently processed block has ended:
    Run the end method of a read handler; it will return a match and a new token
    If there is a match, pop the current block from the context tree, and place it into the parsed tree if the context is empty, or into the block below it otherwise
Check if a new block has started:
    Run the start method of a read handler; it will return a match and a new token
    If the block is an inline block, it will be wrapped in an atom wrapper first
    Add the block to the context tree
Check if an atom block has ended:
    Check if an atom block was matched in a previous iteration and does not match now (the match_pattern method is used for this); this indicates that the atom block has ended
    Insert the atom block into the current context, or wrap it in the atom wrapper if there is no context
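The steps above can be sketched as a toy reader for a single line of text. This is not pyndoc's implementation: the token tables, the Node class and read_line are hypothetical, only one composite block (Emph) is declared, and any block still open at the end of the input is simply closed.

```python
import re

# Illustrative token tables (assumed patterns, not pyndoc's real ones).
COMPOSITES = {"Emph": {"start": r"\*$", "end": r"\*$", "inline": True}}
ATOMS = {"Str": (r"[^\s*]+", True), "Space": (r" +", False)}
ATOM_WRAPPER = "Para"


class Node:
    def __init__(self, name, content=None, is_atom=False):
        self.name, self.content, self.is_atom = name, content, is_atom
        self.children = []

    def dump(self):
        if self.is_atom:
            return (self.name, self.content)
        return (self.name, [child.dump() for child in self.children])


def match_atom(token):
    """Return the name of the first atom matching the whole token, if any."""
    for name, (pattern, _has_content) in ATOMS.items():
        if re.fullmatch(pattern, token):
            return name
    return None


def read_line(text):
    parsed, context, token = [], [], ""

    def add_atom(tok):
        name = match_atom(tok)
        if name is None:
            return  # nothing pending
        if not context:  # no context: open the atom wrapper first
            context.append(Node(ATOM_WRAPPER))
        content = tok if ATOMS[name][1] else None
        context[-1].children.append(Node(name, content, is_atom=True))

    def close_top():
        block = context.pop()
        (context[-1].children if context else parsed).append(block)

    for ch in text:
        token += ch
        # 1. Has the currently processed block ended?
        top = COMPOSITES.get(context[-1].name) if context else None
        if top:
            match = re.search(top["end"], token)
            if match:
                add_atom(token[: match.start()])
                close_top()
                token = ""
                continue
        # 2. Has a new block started?
        started = False
        for name, spec in COMPOSITES.items():
            match = re.search(spec["start"], token)
            if match:
                add_atom(token[: match.start()])
                if spec["inline"] and not context:
                    context.append(Node(ATOM_WRAPPER))
                context.append(Node(name))
                token, started = "", True
                break
        if started:
            continue
        # 3. Has an atom ended (matched before this character, not now)?
        previous = token[:-1]
        if match_atom(previous) is not None and match_atom(token) != match_atom(previous):
            add_atom(previous)
            token = ch

    add_atom(token)  # flush whatever is still pending
    while context:   # close blocks left open at the end of the input
        close_top()
    return [block.dump() for block in parsed]


tree = read_line("Hello *big* world")
```

Here tree holds a single Para containing Str, Space, an Emph wrapping one Str, Space and Str, in that order.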
Defining custom blocks
Custom blocks can be defined within a language module (directory under
pyndoc/readers) in its own blocks.py file. Custom behaviour, such
as overridden start(), end() and process_read() methods, can
be defined here.
Examples
All of these can be found under pyndoc.ast.gfm.blocks
Getting a header level from a matched string
class Header(ast.Header):
    def __init__(self) -> None:
        super().__init__()

    def process_read(self, **kwargs: Unpack[ast_helpers.ProcessParams]) -> None:
        # The header level is the length of the named group "h" in the start match
        match = kwargs["match"]
        level = len(match.group("h"))
        self.contents.metadata = [level]
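The named group "h" comes from the block's start pattern. The pattern below is an assumed one written for illustration (the GFM reader's real pattern may differ); it shows how the length of the group yields the level.

```python
import re

# Assumed start pattern: one to six "#" characters captured as group "h".
HEADER_START = r"^(?P<h>#{1,6}) "

match = re.match(HEADER_START, "### Title")
level = len(match.group("h"))  # number of "#" characters in the match
```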
Handling premature closure of an Emph
@classmethod
def handle_premature_closure(cls, token: str) -> str:
    # Drop a trailing "*" from the token; endswith() also copes with an empty token
    return token[:-1] if token.endswith("*") else token
Adding a Plain inside of a newly created bullet list
def process_read(self, **kwargs: Unpack[ast_helpers.ProcessParams]) -> None:
    match = kwargs["match"]
    # Store the indentation, taken from the named group "s" of the start match
    indent = len(match.group("s"))
    self.contents.metadata = [indent]
    self.add_plain(kwargs["context"])

@staticmethod
def add_plain(context: list) -> None:
    # Open a Plain block in the context of the newly created bullet list
    plain = ast.Plain()
    context.append(plain)