Guide to adding new AST readers =============================== Pyndoc's reader functions in a way that makes it independent of the language being processed, the structure of syntax tree makes it simple to add new blocks or modify existing ones. AST Blocks ---------- AST blocks are divided into two subcategories: - *Atom blocks* - cannot hold other blocks inside of them - *Composite blocks* - can hold other blocks inside of them The reader will treat these two types completely independently and has a different way for reading them Default AST Blocks ~~~~~~~~~~~~~~~~~~ The default representation of AST Blocks has been defined in ``pyndoc.ast.blocks`` and contains definitions for most blocks found in any markup language. Any language parser or reader added to ``pyndoc`` should define these blocks The blocks defined in the aforementioned file all derive from either the ``ASTAtomBlock`` or ``ASTCompositeBlock`` class, found in ``pyndoc.ast.basic_blocks``. These classes define the default behaviour and contents of Atom and Composite blocks Read Handler ~~~~~~~~~~~~ A read handler is a class containing default methods for parsing tokens related to certain AST Blocks, every AST Block derives from a Read Handler, allowing for custom reading functions and definitions CompositeReadHandler ^^^^^^^^^^^^^^^^^^^^ A class defining the Read Handler for all composite blocks, it contains attributes related to the **start** and **end** patterns of a Composite Block, as well as information whether the block is an **inline** block (if it can exist on its own, not wrapped in any other composite block) ``CompositeReadHandler`` contains the following methods important for creating new readers: - ``process_read`` - invoked after a block is created, can process additional arguments after a block's definition, by default - does nothing - ``start`` - matches a token against a start pattern - ``end`` - matches a token against an end pattern - ``handle_premature_closure`` - special handling of any situation in which the file has ended and the block needs extra processing AtomReadHandler ^^^^^^^^^^^^^^^ A class defining the Read Handler for atom blocks, contains attributes related to the pattern that matches an atom block, and a boolean indicating if the block has any content (for example: a Str block will have a string as the content, and a Space block won't have anything) ``AtomReadHandler`` contains the following methods important for creating new readers - ``match_pattern`` - matches the token against the block's pattern Atom Wrapper ~~~~~~~~~~~~ An atom wrapper is a block that will catch, and wrap around any atom or inline blocks that are defined without any context existing, most of the time, ``ast.Para`` will be used for this, but any other function can be used as well Defining a reader ----------------- New readers can be defined in the ``src/pyndoc/readers`` directory. First create a directory with the language's name, inside of the directory, create an empty ``__init__.py`` file. .. _tokenspy: tokens.py ~~~~~~~~~ ``tokens.py`` is a required file for each language reader, it contains details on all tokens and their start and end patterns, it will be used to define attribute values for AST Blocks A ``tokens.py`` file should contain the following definitions: - A ``declared_tokens`` dict, containing a specific **AST Block class** as a key, and a tuple containing a regex pattern defining the block's start, and a boolean as a value. The boolean will be used as the ``is_inline`` attribute - A ``declared_ends`` dict, containing information on declared end patterns, key values are same as above, values are **just** the **regex pattern** - an ``atom_wrapper`` variable - containing the class name for the atom wrapper - A ``declared_atomic_patterns`` dict - keys as above, values are a tuple containing a regex string for each atomic pattern, and a boolean indicating if the block has any contents Default block processing ------------------------ The reader goes over a file character by character and forms *tokens* that are then matched, by the parser, against the patterns defined in ``tokens.py``. With the default bahaviour of all read handles, the reader will do the following for each read character: 1. Check if the currently processed block has ended: - Run the ``end`` method of a read handler, it will return a match and a new token - If there is a match, pop the current block from the context tree, and place it into the parsed tree if the context is empty, or into the block below it otherwise 2. Check if a new block has started: - Run the ``start`` method of a read handler, it will return a match and a new token - if the block is an inline block, it will be wrapped in an atom handler first - Add the block to the context tree 3. Check if an atom block has **ended** - Check if an atom block has been matched in a previous iteration, and does not match now, the ``match_pattern`` method is used for this - this indicates that the atom block has ended - insert the atom block into the current context, or wrap it around the atom wrapper if there is no context. Defining custom blocks ---------------------- Custom blocks can be defined within a language module (directory under ``pyndoc/readers``) in its own ``blocks.py`` file, custom behaviour such as overriden ``start()``, ``end()`` and ``process_read()`` methods can be defined here Examples ~~~~~~~~ *All of these can be found under ``pyndoc.ast.gfm.blocks``* Getting a header level from a matched string .. code:: python class Header(ast.Header): def __init__(self) -> None: super().__init__() def process_read(self, **kwargs: Unpack[ast_helpers.ProcessParams]) -> None: match = kwargs["match"] level = len(match.group("h")) self.contents.metadata = [level] Handling premature closure of an Emph .. code:: python @classmethod def handle_premature_closure(cls, token: str) -> str: return token[:-1] if token[-1] == "*" else token Adding a `Plain` inside of a newly created bullet list .. code:: python def process_read(self, **kwargs: Unpack[ast_helpers.ProcessParams]) -> None: match = kwargs["match"] indent = len(match.group("s")) self.contents.metadata = [indent] self.add_plain(kwargs["context"]) @staticmethod def add_plain(context: list) -> None: plain = ast.Plain() context.append(plain)