# extract helper function - Text Generator Plugin

The `extract` command retrieves content from a variety of sources, such as web pages, images, YouTube videos, PDFs, RSS feeds, and audio files, so that it can be processed or analyzed further.

## Usage

### Basic Extraction

#### Syntax

```handlebars
{{extract "extractorType" source}}
```

- `"extractorType"`: The type of extractor to use (e.g., "web", "youtube", "pdf", "image", "audio", "rss").
- `source`: The variable or direct URL/path that points to the content source.

#### Examples

**Extracting content from a web page using a variable or direct URL:**

```handlebars
{{extract "web" weburl}}
{{extract "web" "https://example.com"}}
```

**Using extractors for other media types:**

```handlebars
{{extract "youtube" ytUrl}}
{{extract "pdf" pdfPath}}
{{extract "image" imgPath}}
{{extract "audio" audioPath}}
```

### Processing Input with Blocks

To preprocess input before extraction, use the `extract` command in block mode.

#### Syntax

```handlebars
{{#extract "extractorType"}}
  {{inputVariable}}
{{/extract}}
```

#### Example

**Preprocessing a web URL before extraction:**

```handlebars
{{#extract "web"}}
  {{weburl}}
{{/extract}}
```

## Web Extractor Options

The `web` extractor accepts additional options that tailor the extraction process.

### Output Formatting

By default, the `web` extractor outputs results in markdown. If you need HTML instead, use the `web_html` extractor type:

#### Syntax

```handlebars
{{extract "web_html" source}}
```

#### Example

**Extracting as HTML:**

```handlebars
{{extract "web_html" "https://example.com"}}
```

### Using CSS Selectors

To focus the extraction on a specific part of a web page, pass a CSS selector as an additional argument.

#### Syntax

```handlebars
{{extract "web_html" source "cssSelector"}}
```

#### Example

**Focusing on an article section of a web page:**

```handlebars
{{extract "web_html" "https://example.com" "article"}}
```

## Key Points

- **Use** the `extract` command to retrieve content from specified sources.
- **Specify** the type of extractor and the source of the content you wish to extract.
- **Process** input before extraction using block syntax when preprocessing is needed.
- **Extract** web content in markdown or HTML format according to your requirements.
- **Employ** CSS selectors with the `web` extractor to target specific content within a web page.

With the `extract` command, you can automate content retrieval from a range of media and collect data efficiently for your workflows.
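
#### Additional Example

The extractor type list under Basic Extraction includes `"rss"`, but none of the examples use it. As a minimal sketch, assuming `"rss"` takes its source in the same two-argument form as the other extractors, a call might look like the following. The variable name `feedUrl` and the feed address are hypothetical placeholders, not names defined by the plugin.

```handlebars
{{! Assumption: "rss" follows the same {{extract "extractorType" source}} form. }}
{{! feedUrl is a hypothetical variable holding the feed address. }}
{{extract "rss" feedUrl}}

{{! The same call with a direct (placeholder) URL instead of a variable. }}
{{extract "rss" "https://example.com/feed.xml"}}
```

If the result is not what you expect, check the plugin's own documentation for the exact source format the `rss` extractor accepts.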