# S.T.A.R.K. Docs

> Speech and Text Algorithmic Recognition Kit (S.T.A.R.K.) is a set of tools for building custom voice assistants. It is designed to be modular and extensible, allowing you to build your own custom voice assistant with ease.

S.T.A.R.K. (Speech and Text Algorithmic Recognition Kit) is a modern, async Python framework for building voice assistants and natural language interfaces. Think FastAPI but for speech. It runs on-device, supports multiple languages, integrates with LLMs, and features advanced pattern-based NL parsing, context-aware commands, and community extensions via STARK-PLACE.

# Getting Started

# Installation

This guide will walk you through the installation of the STARK framework and its associated extras. You can use either pip or poetry for the installation. Let's dive right in!

## Prerequisites

Ensure you have Python 3.12 or newer installed. You can verify this with:

```bash
python --version
```

On some systems, you may need to use the `python3` command instead of `python`:

```bash
python3 --version
```

### Available Extras

The STARK framework offers several extras, which are default implementations for its protocols, to facilitate integration with various tools. These extras include:

- **all**: Installs all default implementations. Recommended if you're not well-versed in dependency management.
- **vosk**: [Vosk](https://alphacephei.com/vosk/) (offline speech recognition) implementation of the SpeechRecognizer protocol.
- **gcloud**: [Google Cloud Text-to-Speech](https://cloud.google.com/text-to-speech) implementation of the SpeechSynthesizer protocol.
- **silero**: [Silero](https://github.com/snakers4/silero-models) Models (offline) implementation of the SpeechSynthesizer protocol.
- **sound**: Required utilities for processing sound: `sounddevice` and `soundfile`.

## Installation with pip

To install the base version of STARK:

```bash
pip install stark-engine
```

To install any of the extras:

```bash
pip install stark-engine[all]
pip install stark-engine[gcloud]
pip install stark-engine[vosk]
pip install stark-engine[silero]
pip install stark-engine[sound]
```

If you encounter the error `zsh: no matches found`, simply enclose the package name in quotes:

```zsh
pip install "stark-engine[all]"
pip install "stark-engine[gcloud]"
pip install "stark-engine[vosk]"
pip install "stark-engine[silero]"
pip install "stark-engine[sound]"
```

## Installation with poetry

If you, like me, prefer using [poetry](https://python-poetry.org) to manage dependencies along with a virtual environment, simply replace `pip install` with `poetry add`.

```bash
poetry add stark-engine
poetry add stark-engine[all]
poetry add stark-engine[gcloud]
poetry add stark-engine[vosk]
poetry add stark-engine[silero]
poetry add stark-engine[sound]
```

If you encounter the error `zsh: no matches found`, simply enclose the package name in quotes:

```zsh
poetry add "stark-engine[all]"
poetry add "stark-engine[gcloud]"
poetry add "stark-engine[vosk]"
poetry add "stark-engine[silero]"
poetry add "stark-engine[sound]"
```

______________________________________________________________________

With the STARK framework installed and the desired extras in place, you're all set to develop powerful voice-driven applications. Dive into the documentation, experiment, and build great things!

# First Steps

Congratulations on installing the STARK framework! This guide is designed to help you familiarize yourself with its primary components and to set up your first voice-driven application using STARK. We'll demonstrate how to create a basic voice assistant that responds to the "hello" command.

## Hello World

STARK provides flexibility by allowing you to integrate different implementations for speech recognition and synthesis.
For this tutorial, we will employ the Vosk implementation for speech recognition and the Silero implementation for speech synthesis.

Before diving in, you'll need to specify URLs for the models. Both Vosk and Silero are designed to automatically download and cache the models upon their first use.

- [Vosk Model URL: Visit Vosk models to select an appropriate model.](https://alphacephei.com/vosk/models)
- [Silero Model URL: Visit Silero models to identify a suitable model.](https://github.com/snakers4/silero-models?tab=readme-ov-file#models-and-speakers)

At the heart of STARK is the `CommandsManager`, a component dedicated to managing the commands your voice assistant can comprehend.

Here's a comprehensive example showcasing how to define a new command, initialize the speech recognizer and synthesizer, and run the voice assistant:

```py
import anyio
from stark import run, CommandsManager, Response
from stark.interfaces.vosk import VoskSpeechRecognizer
from stark.interfaces.silero import SileroSpeechSynthesizer


VOSK_MODEL_URL = "YOUR_CHOSEN_VOSK_MODEL_URL"
SILERO_MODEL_URL = "YOUR_CHOSEN_SILERO_MODEL_URL"

recognizer = VoskSpeechRecognizer(model_url=VOSK_MODEL_URL)
synthesizer = SileroSpeechSynthesizer(model_url=SILERO_MODEL_URL)

manager = CommandsManager()

@manager.new('hello')
async def hello_command() -> Response:
    text = voice = 'Hello, world!'
    return Response(text=text, voice=voice)

async def main():
    await run(manager, recognizer, synthesizer)

if __name__ == '__main__':
    anyio.run(main)
```

In this code snippet, we defined a new command for the voice assistant. When the word "hello" is spoken, the `hello_command` function is triggered, which then issues a greeting in response.

It's important to note that STARK accommodates both synchronous (`def`) and asynchronous (`async def`) command definitions.
For a deeper dive into the use-cases and distinctions between these two command types, consult the [Sync vs Async Commands](https://stark.markparker.me/sync-vs-async-commands/index.md) article.

# Contributing and Shared Usage: S.T.A.R.K P.L.A.C.E

## STARK Platform Library and Community Extensions

Stark-Place serves as a repository filled with commands, implementations of various protocols (like speech interfaces), and other extensions that enhance the capabilities of the Stark framework. These features are systematically structured into modules, categorized based on their functionality.

## 📦 Using Stark-Place

To integrate features from Stark-Place into your projects:

1. Install it as you would with any pip module.

    ```bash
    pip install stark-place
    ```

2. Import the `general_manager` for access to all commands or the specific manager from a module, or any other feature you require.

    ```python
    from stark_place.commands import general_manager
    ```

## 🤝 Contributing to Stark-Place

We welcome and appreciate contributions from the community! Here's how you can contribute:

1. **Fork the Repository**: Start by creating a fork of the [MarkParker5/STARK-PLACE](https://github.com/MarkParker5/STARK-PLACE) repository.
1. **Optional Branch Creation**: If you prefer, you can create a branch within your fork to manage your changes.
1. **Add Commands or Features**: Either add commands to an existing module or create a new module.
1. **Push Your Changes**: Once you're satisfied with your additions or modifications, push them to your fork.
1. **Open a Pull Request**: Finally, head over to the main STARK-PLACE repository and open a pull request. We'll review your contributions and merge them!

## License

The Stark-Place project is licensed under the [CC BY-NC-SA 4.0 International license](https://github.com/MarkParker5/STARK-PLACE/tree/master/LICENSE.md). You're welcome to modify, contribute to the repository, create, and share forks.
Just remember to attribute the original repository and its creator, abstain from commercial use, and retain the existing license.

**Note**: Failing to provide the attribution or using the project for commercial purposes breaches the licensing terms and could have legal consequences.

______________________________________________________________________

We're thrilled to have you as part of our community, and we're excited to see the innovative extensions you'll bring to Stark-Place! Remember, every contribution, big or small, helps in shaping Stark-Place into a powerful platform for all Stark users.

Join the community, share your expertise, and let's build together!

# Commands

# Creating Commands

Commands serve as foundational building blocks designed to execute specific actions. They can be implemented either synchronously or asynchronously. In the following sections, we'll explore the specific features of each type and their differences.

______________________________________________________________________

## Sync Commands

### Simple Command with `return`

A synchronous command can straightforwardly return a response, as demonstrated below:

```python
from stark import Response, CommandsManager

manager = CommandsManager()

@manager.new('hello')
def hello_command() -> Response:
    text = voice = 'Hello, world!'
    return Response(text=text, voice=voice)
```

### Multiple responses using `yield`

Although it's possible to yield multiple responses in synchronous functions, doing so may block the main thread. This can result in warnings or even halt the application. For multiple responses in sync functions, consider using the `ResponseHandler.respond` method or contemplate migrating to the [async](https://stark.markparker.me/sync-vs-async-commands/index.md) option.

```python
@manager.new('foo')
def foo() -> Response:
    yield Response(text='Hello')
    yield Response(text='World')
    # more yields...
```

### Multiple responses using `ResponseHandler.respond`

To manage multiple responses, the `ResponseHandler` can be leveraged. Simply declare a parameter of type `ResponseHandler`, and the [dependency injection](https://stark.markparker.me/dependency-injection/index.md) mechanism will handle it automatically.

```python
@manager.new('foo')
def foo(handler: ResponseHandler):
    handler.respond(Response(text='Starting task'))
    # some processing
    handler.respond(Response(text='Task progress is 50%'))
    ...
    handler.respond(Response(text='Task is done'))
```

### Remove response using `ResponseHandler.unrespond`

To remove a response, use the `unrespond` method. If the voice assistant is in waiting mode, the response won't be repeated in the subsequent interaction. Learn more about modes in [Voice Assistant](https://stark.markparker.me/voice-assistant/index.md).

```python
@manager.new('foo')
def foo(handler: ResponseHandler):
    handler.respond(Response(text='Starting task'))
    ...
    error = Response(text='No internet connection, retrying task...')
    handler.respond(error)
    ...
    # when the internet connection is restored
    handler.unrespond(error)
    handler.respond(Response(text='Task is done'))
```

### Call command from another command

Commands are inherently async, so a sync command has to syncify the async `foo` before calling it (or be declared as async and await `foo`, see [Sync vs Async](https://stark.markparker.me/sync-vs-async-commands/index.md)).

#### Simple

```python
from asyncer import syncify
...

@manager.new('foo')
def foo() -> Response:
    return Response(text='Hello!')

@manager.new('bar')
def bar() -> Response:
    sync_foo = syncify(foo)
    return sync_foo()
```

#### With dependency injection

Include the `inject_dependencies` parameter in the function declaration. This function wraps the command for smooth dependency injection. Learn more about dependencies at [DI Container](https://stark.markparker.me/dependency-injection/index.md).
```python
@manager.new('bar')
def bar(inject_dependencies):
    return syncify(inject_dependencies(foo))()
```

______________________________________________________________________

## Async Commands

Asynchronous commands resemble their synchronous counterparts but offer enhanced features like `await` and `yield`.

### Simple Command with `return`

An asynchronous command can effortlessly return a response:

```python
@manager.new('hello')
async def hello_command() -> Response:
    text = voice = 'Hello, world!'
    return Response(text=text, voice=voice)
```

### Multiple responses using `yield`

Yielding multiple responses in asynchronous functions is seamless and doesn't block the main thread.

```python
@manager.new('foo')
async def foo() -> Response:
    yield Response(text='Starting task')
    # some processing
    yield Response(text='Task progress is 50%')
    ...
    yield Response(text='Task is done')
```

### Multiple responses using `ResponseHandler.respond`

As an alternative to `yield`, the asynchronous version of `ResponseHandler`, named `AsyncResponseHandler`, can be used.

```python
@manager.new('foo')
async def foo(handler: AsyncResponseHandler):
    await handler.respond(Response(text='Starting task'))
    # some processing
    await handler.respond(Response(text='Task progress is 50%'))
    ...
    await handler.respond(Response(text='Task is done'))
```

### Remove response using `ResponseHandler.unrespond`

To remove a response, use the `unrespond` method. If the voice assistant is in waiting mode, the response won't be repeated in the subsequent interaction. Learn more about modes in [Voice Assistant](https://stark.markparker.me/voice-assistant/index.md).

```python
@manager.new('foo')
async def foo(handler: AsyncResponseHandler):
    await handler.respond(Response(text='Starting task'))
    ...
    error = Response(text='No internet connection, retrying task...')
    await handler.respond(error)
    ...
    # once the internet connection is restored
    await handler.unrespond(error)
    await handler.respond(Response(text='Task is done'))
```

Note that you can remove responses sent using `yield` in the same manner. There's no distinction between the two.

### Call command from another command

Commands can be invoked as if they were standard async functions (coroutines).

#### Simple

```python
@manager.new('foo')
async def foo() -> Response:
    return Response(text='Hello!')

@manager.new('bar')
async def bar():
    return await foo()
```

#### With dependency injection

For commands with dependencies, the `inject_dependencies` wrapper ensures seamless injection.

```python
@manager.new('foo')
async def foo(handler: AsyncResponseHandler) -> Response:
    # AsyncResponseHandler.respond is a coroutine, so it must be awaited
    await handler.respond(Response(text='Hello!'))

@manager.new('bar')
async def bar(inject_dependencies):
    return await inject_dependencies(foo)()
```

______________________________________________________________________

## Extending/merging commands managers

Command managers can be expanded by merging child managers into them.

```python
root_manager = CommandsManager()
child_manager = CommandsManager('Child')

@root_manager.new('test')
def test(): pass

@child_manager.new('test2')
def test2(): pass

root_manager.extend(child_manager)
# now root_manager has all commands of child_manager
```

______________________________________________________________________

In conclusion, the foundational concepts remain consistent whether you employ synchronous or asynchronous commands. The primary distinction is in task handling: asynchronous commands facilitate non-blocking execution. As always, opt for the approach that best aligns with your application's specific requirements.

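To make the blocking versus non-blocking distinction concrete, here is a minimal sketch that uses only the standard library `asyncio` module and does not touch STARK at all. The names `heartbeat`, `blocking_command`, `threaded_command`, and `measure` are hypothetical stand-ins: the heartbeat plays the role of a background process that needs regular turns on the event loop, and the two commands run the same `time.sleep` call, once directly inside `async def` and once in a worker thread.

```python
import asyncio
import contextlib
import time


async def heartbeat(ticks: list[int]) -> None:
    """Stand-in for a background process that needs regular turns on the event loop."""
    while True:
        ticks.append(1)
        await asyncio.sleep(0.01)


async def blocking_command() -> None:
    # A blocking call inside `async def` holds the event loop for its whole duration.
    time.sleep(0.2)


async def threaded_command() -> None:
    # The same blocking call moved to a worker thread; the event loop stays free.
    await asyncio.to_thread(time.sleep, 0.2)


async def measure(command) -> int:
    """Return how many heartbeat ticks happen while `command` runs."""
    ticks: list[int] = []
    task = asyncio.create_task(heartbeat(ticks))
    await asyncio.sleep(0)  # let the heartbeat record its first tick
    await command()
    task.cancel()
    with contextlib.suppress(asyncio.CancelledError):
        await task
    return len(ticks)


async def main() -> tuple[int, int]:
    blocked = await measure(blocking_command)
    threaded = await measure(threaded_command)
    return blocked, threaded


if __name__ == "__main__":
    blocked, threaded = asyncio.run(main())
    print(f"ticks while blocked: {blocked}, ticks while threaded: {threaded}")
```

The heartbeat gets far fewer turns while the blocking command runs than while the threaded one does. This is the effect to keep in mind when choosing between `def` and `async def` for a command that contains blocking code.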
# Sync vs Async Commands

## TLDR

### Needs await

If you're using third-party libraries that require `await`, such as

```py
results = await some_library()
```

declare your command using `async def`:

```py
@manager.new('hello')
async def hello_command() -> Response:
    text = voice = await some_library()  # asynchronous call
    return Response(text=text, voice=voice)
```

### Blocking Code

If your command contains blocking synchronous code (e.g., using the `requests` library or `time.sleep`), declare it using `def`:

```py
import requests

@manager.new('hello')
def hello_command() -> Response:
    requests.get('https://stark.markparker.me/')  # synchronous blocking code
    text = voice = 'Hello, world!'
    return Response(text=text, voice=voice)
```

### Only Fast Code

For commands that don't need to wait for external responses or perform long computations, you can use either `async def` or `def`.

### Unsure?

If you just don't know, use normal `def`.

### Mix of Blocking and Async

If your command contains both blocking code and `await`-requiring asynchronous code, you'll need to use [asyncer](https://asyncer.tiangolo.com). There are two methods:

1. **Recommended**: Declare the command with `async def`, use `await` for asynchronous functions, and wrap blocking code in `asyncer.asyncify`:

    ```py
    import asyncer
    import requests

    @manager.new('hello')
    async def hello_command() -> Response:
        await some_library()  # asynchronous function
        await asyncer.asyncify(requests.get)('https://stark.markparker.me/')  # converted to asynchronous
        text = voice = 'Hello, world!'
        return Response(text=text, voice=voice)
    ```

2. Use a regular `def` for the command, execute blocking functions as-is, and wrap asynchronous functions in `asyncer.syncify`:

    ```py
    import asyncer
    import requests

    @manager.new('hello')
    def hello_command() -> Response:
        asyncer.syncify(some_library)()  # converted to synchronous
        requests.get('https://stark.markparker.me/')  # blocking code
        text = voice = 'Hello, world!'
        return Response(text=text, voice=voice)
    ```

## Technical Details

All commands in Stark are inherently asynchronous. If you declare a command as synchronous, Stark converts it to asynchronous using [asyncer.asyncify](https://asyncer.tiangolo.com/).

By default, Stark concurrently manages two vital processes: speech transcription and response handling. It also has to execute commands, adding temporary processes that last as long as the command. All these processes share a single main thread. If one process blocks the thread for an extended period (e.g., with `requests.get` or `time.sleep`), it can halt the entire application.

Stark includes the `BlockageDetector` to monitor the main thread and alert you if it's blocked for longer than a specified duration (default is 1 second).

For commands that might cause blockages, declaring them using `def` is advised. Stark will then wrap these commands with `asyncer.asyncify`, spawning separate background threads for each process.

When using `async def`, care should be taken to prevent the main thread from being blocked. This can be achieved by avoiding long-blocking code and opting for asynchronous libraries like `aiohttp` over synchronous ones such as `requests`. Additionally, `asyncer.asyncify` can be used to wrap blocking sections of code.

For a deeper dive into synchronous vs. asynchronous programming, check the [FastAPI documentation page](https://fastapi.tiangolo.com/async/). To learn more about transitioning between functions and threads, refer to the [asyncer documentation](https://asyncer.tiangolo.com/).

# Command Response

The `Response` class represents the outcome of processing a command in S.T.A.R.K. This documentation section will help you understand the various properties of the `Response` class, allowing you to craft detailed and specific responses to user queries.

## Response Properties

### `voice: str | LocalizableString`

**Default:** `''`

This string will be converted to speech and played back to the user.
If left empty, no vocal response will be given. Accepts `LocalizableString` for localized responses; see [Localizing Responses](https://stark.markparker.me/localization-and-multilingual/localizing-responses/index.md).

### `text: str | LocalizableString`

**Default:** `''`

This property provides a textual representation of the response. It can be displayed in an application interface or used for logging. Accepts `LocalizableString` for localized responses.

### `status: ResponseStatus`

**Default:** `ResponseStatus.success`

This property indicates the state or result of the command's processing. It can be any of the following values:

- **none:** No status set.
- **not_found:** Command not recognized or found.
- **failed:** Command processing failed.
- **success:** Command processed successfully.
- **info:** An informational response.
- **error:** An error occurred during command processing.

### `needs_user_input: bool`

**Default:** `False`

This property, when set to `True`, signals that the assistant is actively awaiting additional input from the user. Additionally, if the response is queued for repetition and `needs_user_input` is set to `True`, the repetition will pause following the current response. This pause gives users the opportunity to address or answer any queries posed by the assistant without being interrupted by subsequent repeated messages.

### `commands: list[Command]`

**Default:** `[]`

This property contains a list of commands associated with the response. These commands can serve various purposes, such as providing context, suggesting subsequent actions to the user, or even structuring nested menus. It's often beneficial to utilize this in conjunction with the `needs_user_input` property to create more interactive and guided user experiences.

### `parameters: dict[str, Any]`

**Default:** `{}`

This property holds a dictionary of supplementary data or context useful to the voice assistant or the underlying command processing framework.
Examples include specifying a city when inquiring about the weather or denoting a particular room in the context of smart home operations. This feature enables dynamic and contextual interactions, enhancing the overall user experience.

### `id: UUID`

A unique identifier for the response. It is set automatically when a response is created. For internal usage only.

### `time: datetime`

The timestamp when the response was created. It is set automatically upon the creation of a new response. For internal usage only.

### `repeat_last: Response`

A static instance of the `Response` class that provides a mechanism to reprocess the last given response. If a new response matches the `repeat_last` instance, the voice assistant will process the previous response again.

## Response Handling in the Framework

Responses play a vital role in the user interaction flow. The `VoiceAssistant` class, along with the `CommandsContext`, processes these responses to ensure the user receives accurate and timely feedback.

- **Upon receiving a new response:** The `VoiceAssistant` first checks whether the response status belongs to its ignore list. If it doesn't, the assistant then evaluates the mode's timeout parameters and, if applicable, appends the response to its collection. For further details on this behavior, refer to the Modes section on the [VoiceAssistant](https://stark.markparker.me/voice-assistant/index.md) page.
- **Playing the response:** Depending on the assistant's mode, the response may be converted to speech and played back to the user.
- **Repeating responses:** If there has been recent interaction, the assistant may opt to repeat specific responses, ensuring the user is reminded of any ongoing processes or required actions.

This dynamic and flexible system of handling responses ensures that the user experience is interactive and engaging.

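As a rough illustration of the properties and the ignore-list step described above, the sketch below uses plain dataclasses so that it runs without STARK installed. `SketchResponse`, `Status`, and `collect` are hypothetical stand-ins that mirror the documented fields and defaults; they are not STARK's `Response`, `ResponseStatus`, or `VoiceAssistant`, and the contents of the ignore set are an assumption. The mode timeout checks are left out entirely.

```python
from dataclasses import dataclass, field
from datetime import datetime
from enum import Enum, auto
from typing import Any
from uuid import UUID, uuid4


class Status(Enum):
    """Stand-in for the documented ResponseStatus values."""
    none = auto()
    not_found = auto()
    failed = auto()
    success = auto()
    info = auto()
    error = auto()


@dataclass
class SketchResponse:
    """Stand-in mirroring the documented fields and defaults; not STARK's Response."""
    voice: str = ''
    text: str = ''
    status: Status = Status.success
    needs_user_input: bool = False
    commands: list[Any] = field(default_factory=list)
    parameters: dict[str, Any] = field(default_factory=dict)
    id: UUID = field(default_factory=uuid4)               # set automatically
    time: datetime = field(default_factory=datetime.now)  # set automatically


def collect(incoming: list[SketchResponse], ignore: set[Status]) -> list[SketchResponse]:
    """Keep only responses whose status is not in the ignore set."""
    return [response for response in incoming if response.status not in ignore]


responses = [
    SketchResponse(text='Task is done'),
    SketchResponse(text='Unknown command', status=Status.not_found),
    SketchResponse(text='Which room?', needs_user_input=True, parameters={'device': 'lamp'}),
]

# Assumed ignore list for this sketch: unset and not-found statuses.
kept = collect(responses, ignore={Status.none, Status.not_found})
print([response.text for response in kept])  # ['Task is done', 'Which room?']
```

Using `default_factory` for `commands`, `parameters`, `id`, and `time` gives every instance its own list, dictionary, identifier, and timestamp, which matches the documented behavior of `id` and `time` being set on creation.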
______________________________________________________________________

## Formatting Locale-Sensitive Values with PyICU

When building responses that include numbers, dates, units, or currencies, [PyICU](https://pypi.org/project/PyICU/) provides locale-aware formatting out of the box. PyICU wraps the ICU C++ library, the same internationalisation engine used by platforms and projects including Apple's Foundation Kit that powers iOS/macOS apps, Android, Chromium, and many Linux applications.

```python
import icu

# Spelled-out numbers (useful for TTS)
formatter = icu.RuleBasedNumberFormat(icu.URBNFRuleSetTag.SPELLOUT, icu.Locale("en"))
formatter.format(42)  # "forty-two"

# Locale-aware date
df = icu.DateFormat.createDateInstance(icu.DateFormat.LONG, icu.Locale("de"))
df.format(icu.Calendar.getNow())  # "21. Juni 2026"

# Units
mf = icu.MeasureFormat(icu.Locale("en"), icu.UMeasureFormatWidth.WIDE)
mf.format(icu.Measure(5, icu.UMeasureUnit.KILOMETER))  # "5 kilometers"
mf.format(icu.Measure(7, icu.UMeasureUnit.POUND))  # "7 pounds"

# Pluralization in message templates
msg = icu.MessageFormat("{num, plural, one {# item} other {# items}}", icu.Locale("en"))
msg.format([1])  # "1 item"
msg.format([5])  # "5 items"
```

PyICU is not a dependency of S.T.A.R.K. Install it separately (`pip install PyICU`) and use it alongside `LocalizableString` for formatting dynamic values before injecting them into your response templates.

A tighter integration (e.g., a built-in formatting layer or a convenience wrapper) is on the radar but the exact shape is TBD. If you have ideas or want to draft an implementation, contributions are welcome via [STARK PLACE](https://stark.markparker.me/contributing-and-shared-usage-stark-place/index.md).

For more on response localization, see [Localizing Responses](https://stark.markparker.me/localization-and-multilingual/localizing-responses/index.md).

______________________________________________________________________

This documentation is meant to provide a concise overview of the `Response` class and its role within the S.T.A.R.K framework. It's crucial to understand these properties and mechanisms to design a voice assistant that effectively communicates with the user.

# Commands Context

The `Commands Context` feature provides a means to manage multi-level command structures. By facilitating a hierarchical command interface, it ensures users enjoy an intuitive and seamless interaction.

## Managing Multiple Commands

When a single input matches multiple commands, the system resolves the overlap by prioritizing commands based on their position in the string or their declaration sequence, so that the most pertinent command takes precedence.

## The Contextual Hierarchy

Visualize the entire system as a tree. Each context functions as a node, with its linked sub-contexts acting as its children. As users navigate this tree, they move between nodes, either going deeper or backtracking, to find the right command match.

## Command Context Processing

When processing a string:

- The system adds the root context if it's missing.
- It checks the current context to find a command that matches the input string. If no command fits the current context, the system goes up, removing contexts until it finds a match or runs out of contexts.
- Upon a successful match, the system updates parameters, organizes dependencies, and initiates the command.
- Unneeded contexts are removed.

## Managing Responses

Responses are queued. The system constantly checks this queue, running responses and commands in the order they come in, ensuring fast and orderly processing.

## Response-embedded Context

Responses can include:

- **`needs_user_input: bool`**: If set to true, the system halts processing after the current response.
- **`commands: list[Command]`**: Commands that can reshape context, propose subsequent actions, or establish layered interfaces.
- **`parameters: dict[str, Any]`**: A dictionary of supporting data needed for later processing or context definition.

For additional details on responses, visit the [Command Response](https://stark.markparker.me/command-response/index.md) page.

## Code Implementation

```python
@manager.new('hello', hidden=True)
def hello_context(**params):
    voice = text = f'Hi, {params["name"]}!'
    return Response(text=text, voice=voice)

@manager.new('bye', hidden=True)
def bye_context(name: Word, handler: ResponseHandler):
    handler.pop_context()
    return Response(text=f'Bye, {name}!')

@manager.new('hello $name:Word')
def hello(name: Word):
    text = voice = f'Hello, {name}!'
    return Response(
        text=text,
        voice=voice,
        commands=[hello_context, bye_context],
        parameters={'name': name}
    )
```

The code example provided demonstrates how to define and manage commands using a fictional `manager` object.

### `hello_context` Function

- This function is marked with a `hidden=True` parameter in its decorator. This means that the command will not be available in the root context, making it inaccessible as a top-level command.
- The function accepts all context parameters through `**params`, which is a dictionary.
- Within the function, both the `voice` and `text` variables are set to greet the user, using the context `name` parameter.
- It then returns a response with the generated greeting text and voice.

### `bye_context` Function

- Similarly, this function is also hidden from the root context.
- The function accepts specific parameters: `name` and `handler`. It's important to note that there's no `name` in the command pattern, which implies that it must be derived from the context.
- The `handler.pop_context()` method is called, which presumably removes the current context, signaling a transition or end of interaction.
- A farewell response using the `name` parameter is returned.

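Since `pop_context()` implies a stack of contexts, the lookup described under Command Context Processing can be pictured with a small self-contained sketch. Everything here is a hypothetical simplification rather than STARK's `CommandsContext`: the `Context` dataclass and `resolve` function are invented for illustration, and commands are matched by exact name instead of by pattern.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Context:
    """Hypothetical stand-in: command names available at this level plus shared parameters."""
    commands: set[str]
    parameters: dict[str, str] = field(default_factory=dict)


def resolve(stack: list[Context], root: Context, word: str) -> Optional[Context]:
    """Return the context whose commands contain `word`, popping contexts that do not match."""
    if not stack:
        stack.append(root)  # the root context is added if it is missing
    while stack:
        current = stack[-1]
        if word in current.commands:
            return current
        stack.pop()  # no match at this level: drop the context and go up
    return None


root = Context(commands={'hello'})
nested = Context(commands={'thanks', 'bye'}, parameters={'name': 'Stark'})

stack = [root, nested]
print(resolve(stack, root, 'bye') is nested)   # True: matched in the nested context
print(resolve(stack, root, 'hello') is root)   # True: nested context popped, root matched
print(len(stack))                              # 1
```

A list used as a stack keeps the most recently opened context on top, so a nested command is always tried before falling back to the levels above it.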
### `hello` Function

- This function defines a command pattern where a name is expected as input, formatted as `hello $name:Word`.
- Inside, it constructs a greeting using the provided name.
- The response not only contains the greeting but also a list of commands (`hello_context` and `bye_context`) that can be triggered next. This showcases the hierarchical and contextual nature of the system. Additionally, the name is passed as a parameter for potential use in subsequent commands.

In summary, the code example gives us a glimpse into the contextual and hierarchical command management system. With the use of the `hidden` attribute, commands can be kept away from the root context, making them accessible only when they are contextually relevant.

# Patterns

Patterns in the S.T.A.R.K toolkit are designed to be dynamic and extensible. They are at the core of how custom voice assistants interpret input and match it to commands. This documentation is a comprehensive guide to understanding and working with patterns in S.T.A.R.K.

## Pattern Syntax

At its essence, a pattern is a string that defines the structure of input it should match. The pattern syntax is enriched with special characters and sequences to help it match a variety of inputs dynamically.

### Basics

- `**`: Matches any sequence of words.
- `*`: Matches any single word.
- `$name:Type`: Defines a named parameter of a specific type.

Example: the pattern `'Some ** here'` will match both `'Some text here'` and `'Some lorem ipsum dolor here'`.

### Advanced Syntax

**Selections**

Selections provide flexibility in your voice command patterns by allowing multiple possibilities for a single command spot. This can be particularly useful in accommodating various ways users might phrase the same request.

- `(foo|bar|baz)`: This pattern matches any single option among the three. So, it will match either `'foo'`, `'bar'`, or `'baz'`. Think of it as an "OR" choice for the user.
- `(foo|bar)?`: This pattern introduces an optional choice. It can match `'foo'`, `'bar'`, or neither. The `?` denotes that the preceding pattern (in this case, the choice between `'foo'` or `'bar'`) is optional.
- `{foo|bar}`: This pattern is designed to capture repetitions. It matches one or more occurrences of `'foo'` or `'bar'`. For example, if a user said "foo foo bar", this pattern would successfully match. Note: Be cautious with this pattern as it can match long, unexpected repetitions.

There are also two plain-text helper functions for ordered groups:

```python
from stark.core.patterns.rules import one_from, one_or_more_from
```

- `one_from(*args)` → `(a|b|c)`
- `one_or_more_from(*args)` → `{a|b|c}`

General Tip: While creating patterns, always keep the user's natural way of speaking in mind. Testing your patterns with real users can help ensure that your voice assistant responds effectively to a variety of commands.

## Parameters Parsing

Voice commands can be dynamic, meaning they can accommodate varying inputs. This is achieved using named parameters in the command pattern, with the `$name:Type` syntax. When a user input matches a pattern with named parameters, the assistant extracts these parameters and passes them to the corresponding function.

For example, consider the pattern `'Hello $name:Word'`. If a user says, `'Hello Stark'`, the system will extract a parameter named `'name'` with the value `'Stark'`.

However, ensure that the function declaration tied to a command pattern includes all the parameters defined in that pattern, using the same names and types. If this isn't done, you'll encounter an exception during command creation.

Here's an example:

```python
from stark.core.types import Word

@manager.new('Hello $name:Word')
async def example_function(name: Word) -> Response:
    text = voice = f'You said {name}!'
    return Response(text=text, voice=voice)
```

## Native Types List

Out of the box, S.T.A.R.K.
comes with native types that can be used as parameter types in patterns. The currently supported native types include:

- `String`: Matches any sequence of words (\*\*).
- `Word`: Matches a single word (\*).

You can extend the list of types by defining custom object types, as discussed in the next section.

## Defining Custom Object Types

The S.T.A.R.K toolkit isn't limited to native types; you can define your own custom object types. These types are constructed by subclassing the `Object` base class and specifying a distinct matching pattern.

Patterns also support nested objects: a custom object type can contain parameters that are themselves other custom object types. This nesting makes it possible to build complex patterns that interpret diverse input configurations.

Below is an example of how one might structure a custom object type:

```python
# `Object`, `Word`, `Pattern`, `classproperty` and `CommandsContext`
# are assumed to be imported from the framework
class FullName(Object):
    first_name: Word
    second_name: Word

    @classproperty
    def pattern(cls) -> Pattern:
        return Pattern('$first_name:Word $second_name:Word')

context = CommandsContext(...)
context.pattern_parser.register_parameter_type(FullName)
```

Upon successfully matching the pattern, S.T.A.R.K will automatically parse and assign values to `first_name` and `second_name`. Just as with command patterns, class properties must agree with the pattern in both name and type.

______________________________________________________________________

## Advanced Object Types with Parsing Customization

In instances where the default parsing doesn't cater to your requirements, or when you need specialized processing, the `did_parse` method comes to the rescue.
By overriding this method in custom object types, you can introduce intricate transformations or run custom validation checks post-parsing.

Here's an illustrative example:

```python
class Lorem(Object):

    @classproperty
    def pattern(cls):
        return Pattern('* ipsum')

    async def did_parse(self, from_string: str) -> str:
        '''
        Invoked after parsing from the string and assigning the parameters detected in the pattern.
        Directly calling this method is typically unnecessary and uncommon.

        Override this method to achieve more sophisticated string parsing.

        The from_string argument is a LocaleString: the same as a regular string,
        but it provides `from_string.language_code: LanguageCode` for language-aware parsing.
        See Localization docs for details.
        '''
        if 'lorem' not in from_string:
            # Raise a ParseError if the string doesn't meet certain criteria
            raise ParseError('lorem not found')
        # Assign additional properties (properties inferred from the pattern are auto-assigned)
        self.value = 'lorem'
        # Return the smallest substring essential for this object
        return 'lorem'

context = CommandsContext(...)
context.pattern_parser.register_parameter_type(Lorem)
print(context.pattern_parser.parse_object(Lorem, "lorem ipsum"))
```

## Custom Parser Class Example

In some cases, you may want to separate the parsing logic from your data model. This is especially useful when you want to reuse parsing logic, inject dependencies, give the parser a longer life cycle, or just keep your models clean. You can define a dedicated parser class for your object type.
Here's an example:

```python
from stark.core.types import Object, Word
from stark.core.parsing import Pattern, PatternParser, ObjectParser

# `classproperty`, `ParseError` and `CommandsContext` are assumed to be imported from the framework

class Lorem(Object):

    @classproperty
    def pattern(cls):
        return Pattern("* ipsum")

class LoremParser(ObjectParser):

    def __init__(self, pattern_parser: PatternParser):
        self.pattern_parser = pattern_parser

    async def did_parse(self, obj: Lorem, from_string: str) -> str:
        # Custom parsing logic for Lorem
        if "lorem" not in from_string:
            raise ParseError("lorem not found")
        obj.value = "lorem"
        return "lorem"

context = CommandsContext(...)
# The parser's __init__ requires a PatternParser, so pass the context's one
context.pattern_parser.register_parameter_type(Lorem, parser=LoremParser(context.pattern_parser))
print(context.pattern_parser.parse_object(Lorem, "lorem ipsum"))
```

This approach allows you to keep parsing logic separate from your data model and makes it easy to inject dependencies or share logic between different models.

Note that the `did_parse` method must return a substring of the input string that was successfully parsed. This substring should be the smallest possible string that still represents the object's value. If you use a third-party parser that can't extract a substring and only provides the value, you have several options:

1. If your parser returns a string-like value, such as a name, you can use `levenshtein_search_substring` from the [STARK-Levenshtein](https://stark.markparker.me/tools/stark-levenshtein/index.md) module. This lets you efficiently find the closest fuzzy match of your named entity in the input string.
1. Consider using `NLDictionaryName` from [Phonetic Dictionary](https://stark.markparker.me/tools/phonetic-dictionary/index.md) if it suits your needs.
1. If the options above are not suitable, take a look at the [sliding_window_parser](https://stark.markparker.me/tools/sliding-window-parser/index.md) wrapper.
   Note that it calls the parser method multiple times to find the best match. Caching intermediate results inside your parser function reduces this cost, but it still requires careful usage, especially with large input strings and long IO-bound parsing times.

## Recommended Use of Caching for `did_parse` Method

When the `did_parse` method is involved in the matching process, especially if it performs complex computations or external lookups, it can slow down the overall matching process. To alleviate this potential bottleneck, it's highly recommended to use caching. By storing previously parsed objects in a cache, you can avoid redundant work and improve the overall performance of your custom voice assistant.

______________________________________________________________________

## (beta) Unordered Patterns

By default, parameters in a pattern must appear in a fixed order. Unordered patterns relax this constraint. The user can say the parts in any order and S.T.A.R.K will still match them.

There are two flavours, available as helper functions from `stark.core.patterns.rules`:

### `all_unordered(*args)`: all required

Every listed element must be present in the input. Order doesn't matter.

```python
from stark.core.patterns.rules import all_unordered

pattern = Pattern(f"{all_unordered('$h:Hours', '$m:Minutes', '$s:Seconds')}")
# matches "12 h 30 m 45 s", "45 s 12 h 30 m", etc.
# does NOT match "12 h 30 m" (missing seconds)
```

### `one_or_more_unordered(*args)`: at least one required

At least one element must match. The rest are optional. Order doesn't matter.

```python
from stark.core.patterns.rules import one_or_more_unordered

pattern = Pattern(f"{one_or_more_unordered('$h:Hours', '$m:Minutes', '$s:Seconds')}")
# matches "12 h 30 m 45 s", "12 h", "30 m 45 s", etc.
# does NOT match "" (at least one must be present)
```

> **Note:** Unordered patterns use lookahead-based regex under the hood and don't work well with multi-word wildcards (`**`).
For unordered multi-word parameters, use Slots instead.

## Slots

Slots provide unordered parameter extraction for Object types with multiple fields. Unlike unordered patterns (which work at the regex level), Slots parse each field independently from the input string, so they handle multi-word and greedy parameters correctly.

### Defining a Slots class

A Slots class is a regular `Object` subclass. Each annotated field (except `value`) becomes a slot that will be parsed independently. Fields can be required or optional (`Optional[T]` / `T | None`).

```python
from typing import Optional
from stark.core.types import Object, Word

# Hours, Minutes and Seconds are custom Object types assumed to be defined elsewhere
class TimerSlots(Object):
    hours: Hours                 # required
    minutes: Minutes             # required
    seconds: Optional[Seconds]   # optional

    # NOTE: no pattern needed for TimerSlots
```

### Registering with SlotsParser

Unlike regular Object types, Slots classes use `SlotsParser` instead of the default parser:

```python
from stark.core.types.slots import SlotsParser

context = CommandsContext(...)
context.pattern_parser.register_parameter_type(
    TimerSlots,
    parser=SlotsParser(context.pattern_parser)  # <-
)
```

### Using Slots in patterns

Reference the Slots class like any other parameter type:

```python
@manager.new('set timer $timer:TimerSlots')
async def set_timer(timer: TimerSlots) -> Response:
    h = timer.hours    # Hours object (required slot)
    m = timer.minutes  # Minutes object (required slot)
    s = timer.seconds  # Seconds object or None (optional slot)
    ...
```

### How it works

`SlotsParser` iterates over each slot and tries to parse its type from the remaining input string. Successfully parsed substrings are removed before parsing the next slot. After all slots are processed:

- At least one slot must have matched, otherwise parsing fails.
- Required (non-optional) slots must all match, otherwise parsing fails.
- The `value` property is set to the minimal substring spanning all matched slots.
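
The slot-by-slot procedure described above can be sketched in plain Python. This is only an illustration of the idea, not the actual `SlotsParser` implementation: `Slot`, `parse_slots`, `TIMER_SLOTS`, and the regular expressions standing in for parameter types are all hypothetical names invented for this sketch.

```python
import re
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Slot:
    name: str
    regex: str            # hypothetical stand-in for a parameter type's pattern
    required: bool = True


def parse_slots(text: str, slots: List[Slot]) -> Optional[Tuple[Dict[str, str], str]]:
    """Return (matched values, minimal spanning substring), or None on failure."""
    remaining = text
    found: Dict[str, str] = {}
    spans: List[Tuple[int, int]] = []
    for slot in slots:
        match = re.search(slot.regex, remaining)
        if match is None:
            continue
        start, end = match.span()
        found[slot.name] = match.group(0)
        spans.append((start, end))
        # Blank out the consumed text so later slots cannot reuse it,
        # while keeping offsets aligned with the original string.
        remaining = remaining[:start] + " " * (end - start) + remaining[end:]
    if not found:
        return None  # at least one slot must match
    if any(slot.required and slot.name not in found for slot in slots):
        return None  # every required slot must match
    first = min(start for start, _ in spans)
    last = max(end for _, end in spans)
    return found, text[first:last]


TIMER_SLOTS = [
    Slot("hours", r"\d+ h\b"),
    Slot("minutes", r"\d+ m\b"),
    Slot("seconds", r"\d+ s\b", required=False),
]

result = parse_slots("set timer 30 m 12 h please", TIMER_SLOTS)
print(result)  # ({'hours': '12 h', 'minutes': '30 m'}, '30 m 12 h')
```

Because each slot searches the whole remaining string, the spoken order of the parts does not affect the result, and the returned substring covers only the span between the first and last matched slot.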
Parsing each slot independently makes Slots ideal for commands where parameters can appear in any order and may include multi-word values, something that regex-based unordered patterns can't handle reliably.

______________________________________________________________________

By understanding and mastering patterns in the S.T.A.R.K toolkit, you'll be well-equipped to create powerful and dynamic custom voice assistants. Happy coding!

# Dependency Injection

Dependency Injection (DI) is a powerful design pattern used to achieve Inversion of Control (IoC) between classes and their dependencies. Within the context of our voice assistant, Dependency Injection facilitates the provision of specific objects or values to command functions. This ensures that these functions can readily access external resources or other system components.

This guide provides an overview of the Dependency Injection implementation, how to utilize it in your voice assistant, and some native dependencies.

## Response Handler

There are two response handlers: `AsyncResponseHandler` and `ResponseHandler`. They oversee the processing of responses, asynchronously and synchronously, respectively. To employ them, simply include the required type (class) annotation as an argument within the function declaration. The argument's name isn't significant for this dependency.

```python
@manager.new('hello')
async def hello(handler: AsyncResponseHandler) -> Response:
    await handler.respond(Response(text = 'Hi'))
```

In the showcased example, the `AsyncResponseHandler` is automatically injected into the `hello` command function upon its invocation.

## Language Code

The `LanguageCode` dependency provides the language of the substring that matched the command's pattern. It's injected per command: if two commands match in different languages from the same input, each receives its own language. Matched by type annotation; the parameter name doesn't matter.
```python
from stark.general.localisation.language_code import LanguageCode
from stark.general.localisation import LocalizableString

@manager.new({"en": "set timer", "ru": "поставь таймер"})
async def set_timer(lang: LanguageCode) -> Response:
    return Response(text=LocalizableString("timer_set", lang))
```

When the user says "поставь таймер", `lang` is `"ru"`. When they say "set timer", `lang` is `"en"`. For mixed-language input with `TranscriptionString`, the language is the majority language of the matched substring's words.

## `inject_dependencies`

The `inject_dependencies` dependency wraps a function so that its dependencies are resolved and injected when it is called. Contrary to the response handler, this dependency is identified by the argument's name.

Example:

```python
@manager.new('foo')
async def foo(handler: AsyncResponseHandler) -> Response:
    return Response(text = 'foo!')

@manager.new('bar')
async def bar(inject_dependencies):
    return await inject_dependencies(foo)()
```

Here, the dependencies of `foo` are resolved and `foo` is executed within the `bar` command function.

## Accessing DIContainer in a Command

The `CommandsContext` class initializes with a `dependency_manager` of the `DependencyManager` type. This manager undertakes the role of identifying and injecting the requisite dependencies for command functions.

To tap into the DIContainer inside a command, simply declare the needed dependency as a command function parameter. The `DependencyManager` will resolve this parameter and supply the appropriate object or value.

For more advanced access, you can extract the container as a dependency of type `DIContainer`, as demonstrated:

```python
@manager.new('baz')
async def baz(di_container: DIContainer):
    di_container.add_dependency(...)
    di_container.find(...)
``` This is feasible because the default DI container internally registers itself as a dependency: ```python default_dependency_manager.add_dependency(None, DependencyManager, default_dependency_manager) ``` ## Adding Custom Dependency You can incorporate custom dependencies using the `add_dependency` method of the default shared instance of `DependencyManager`. Example: ```python from stark.general.dependencies import default_dependency_manager ... default_dependency_manager.add_dependency("custom_name", CustomType, custom_value) ``` In this instance, a new dependency named `custom_name`, of `CustomType`, with the value `custom_value` is appended. If the name is set to `None`, you can later choose any name for the function argument; the dependency will be discerned solely by type (like `ResponseHandler` and `AsyncResponseHandler`). Conversely, setting the type to `None` allows the dependency to be detected purely by the argument name (like `inject_dependencies`). ## Creating a Custom Container To employ a custom container for Dependency Injection in lieu of the default one, instantiate a new `DependencyManager` and input your custom dependencies. This tailored container can subsequently be utilized during the `CommandsContext` initialization. Example: ```python custom_dependency_manager = DependencyManager() custom_dependency_manager.add_dependency(...) context = CommandsContext(..., dependency_manager=custom_dependency_manager) ``` It's worth noting that the CommandsContext always registers several native dependencies upon initialization: ```python self.dependency_manager.add_dependency(None, AsyncResponseHandler, self) self.dependency_manager.add_dependency(None, ResponseHandler, SyncResponseHandler(self)) self.dependency_manager.add_dependency('inject_dependencies', None, self.inject_dependencies) ``` However, other native dependencies will be absent in the custom container unless you manually incorporate them. 
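
To make the name-or-type matching rule concrete, here is a minimal, framework-independent sketch. It is not the real `DependencyManager`; `MiniContainer`, `Clock`, and `command` are hypothetical names used only to illustrate how a `None` name means "match by type" and a `None` type means "match by argument name".

```python
import inspect
from typing import Any, Callable, List, Optional, Tuple


class MiniContainer:
    """Toy container: each dependency is stored as (name, type, value)."""

    def __init__(self) -> None:
        self._deps: List[Tuple[Optional[str], Optional[type], Any]] = []

    def add_dependency(self, name: Optional[str], annotation: Optional[type], value: Any) -> None:
        self._deps.append((name, annotation, value))

    def find(self, name: str, annotation: Any) -> Any:
        for dep_name, dep_type, value in self._deps:
            name_ok = dep_name is None or dep_name == name          # None: any argument name
            type_ok = dep_type is None or dep_type is annotation    # None: any annotation
            if name_ok and type_ok:
                return value
        raise LookupError(f"no dependency for parameter {name!r}")

    def call(self, func: Callable[..., Any], **explicit: Any) -> Any:
        """Fill every parameter not passed explicitly from the registered dependencies."""
        kwargs = dict(explicit)
        for param in inspect.signature(func).parameters.values():
            if param.name not in kwargs:
                kwargs[param.name] = self.find(param.name, param.annotation)
        return func(**kwargs)


class Clock:
    def now(self) -> str:
        return "12:00"


container = MiniContainer()
container.add_dependency(None, Clock, Clock())     # resolved by type, any argument name
container.add_dependency("greeting", None, "Hi")   # resolved by argument name only


def command(c: Clock, greeting):
    return f"{greeting}, it is {c.now()}"


print(container.call(command))  # Hi, it is 12:00
```

Explicitly passed arguments take priority in this sketch, which mirrors how parameters parsed from a pattern would coexist with injected ones.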
______________________________________________________________________ The adaptability provided by the Dependency Injection framework ensures your command functions remain modular, simplifying testing. As you further develop your voice assistant, utilize this system to adeptly handle your dependencies. # Voice Assistant # Voice Assistant (VA) Documentation ## Env Parameters `STARK_VOICE_CLI`: Prints voice input and output in terminal if set to 1 (default 0). Useful for testing and debugging if no other interface is available. ## Overview The VA processes user speech inputs, interacts with a set of commands, and provides responses. The behavior and response of the VA can be modified by setting different "modes". These modes define how the VA should operate in various situations, such as active listening, waiting, or when it's inactive. ## How the VA Works ### Responses and Contexts in Different Modes The VA processes user inputs and responds based on the current context and mode. A context can be thought of as a state or situation in which the VA finds itself. Depending on the mode, the VA might immediately play responses, collect them for later, require explicit triggers to respond, or have different timeouts after which it changes its behavior or mode. ### Effects of Modes on VA The mode can change the VA's behavior in various ways, such as: - Whether to immediately play responses. - Whether to collect responses for future playbacks. - Setting a pattern for explicit interactions. - Setting timeouts for interactions or before repeating a response. - Switching to another mode either after a timeout or an interaction. - Deciding to stop after an interaction. ## Mode Details The `Mode` class defines the behavior and settings of the VA in various situations. Each property of the `Mode` class influences the VA's interaction with the user and the context. 
### Mode Properties

- **`play_responses: bool` (default: `True`)** Determines whether the VA should immediately play the responses to user inputs. If set to `False`, the VA might hold onto responses for later or not vocalize them at all, based on other mode settings.
- **`collect_responses: bool` (default: `False`)** Indicates if the VA should collect responses for later playback. When set to `True`, responses might be saved and played back later, especially if `play_responses` is set to `False`.
- **`explicit_interaction_pattern: Optional[str]` (default: `None`)** This can be set to a specific string pattern. When defined, the VA requires an explicit interaction matching this pattern before processing user input. This is useful for "wake word" or command activation scenarios.
- **`timeout_after_interaction: int` (default: `20`)** Defines the number of seconds the VA waits after the last interaction before considering the session as timed out. Depending on other mode settings, the VA might change its behavior or switch modes after a timeout.
- **`timeout_before_repeat: int` (default: `5`)** Specifies the number of seconds before the VA can repeat a previously played response.
- **`mode_on_timeout: Callable[[], Mode] | None` (default: `None`)** Defines a function that returns another mode that the VA should switch to after a timeout.
- **`mode_on_interaction: Callable[[], Mode] | None` (default: `None`)** Defines a function that returns another mode that the VA should switch to upon receiving an interaction from the user.
- **`stop_after_interaction: bool` (default: `False`)** If set to `True`, the VA will stop its current operation after the command response. This is useful when you want to start the VA from external triggers, such as a keyboard shortcut.

### Native Modes

- **Active**: The VA is in an active listening state, transitioning to the "waiting" mode upon timeout.
- **Waiting**: The VA collects responses and goes back to the "active" mode upon user interaction. - **Inactive**: The VA doesn't immediately play responses but collects them, reverting to "active" mode upon interaction. - **Sleeping**: Similar to inactive, but requires an explicit interaction pattern to activate. - **Explicit**: Requires a specific interaction pattern to proceed every command. - **External**: Similar to Explicit, but requires an external trigger to activate. ### Mode Class Code ```python class Mode(BaseModel): play_responses: bool = True collect_responses: bool = False explicit_interaction_pattern: Optional[str] = None timeout_after_interaction: int = 20 # seconds timeout_before_repeat: int = 5 # seconds mode_on_timeout: Callable[[], Mode] | None = None mode_on_interaction: Callable[[], Mode] | None = None stop_after_interaction: bool = False @classproperty def active(cls) -> Mode: return Mode( mode_on_timeout = lambda: Mode.waiting, ) @classproperty def waiting(cls) -> Mode: return Mode( collect_responses = True, mode_on_interaction = lambda: Mode.active, ) @classproperty def inactive(cls) -> Mode: return Mode( play_responses = False, collect_responses = True, timeout_after_interaction = 0, # start collecting responses immediately timeout_before_repeat = 0, # repeat all mode_on_interaction = lambda: Mode.active, ) @classmethod def sleeping(cls, pattern: str) -> Mode: return Mode( play_responses = False, collect_responses = True, timeout_after_interaction = 0, # start collecting responses immediately timeout_before_repeat = 0, # repeat all explicit_interaction_pattern = pattern, mode_on_interaction = lambda: Mode.active, ) @classmethod def explicit(cls, pattern: str) -> Mode: return Mode( explicit_interaction_pattern = pattern, ) @classmethod def external(cls) -> Mode: return Mode( stop_after_interaction = True, ) ``` ## Changing Modes Manually You can manually set the mode by assigning a Mode object to the VA's `mode` attribute. 
For instance, to set the VA to "waiting" mode:

```python
voice_assistant.mode = Mode.waiting
```

## Setting Up a Custom Mode

To define a custom mode, create an instance of the `Mode` class and specify the desired properties. For example:

```python
custom_mode = Mode(play_responses=False, timeout_after_interaction=10)
voice_assistant.mode = custom_mode
```

## Setting VA Modes from Command

To let commands in the VA change its mode:

1. Register VA in DIContainer
1. Add VA as a command dependency
1. Access VA in command

*check [Dependency Injection](https://stark.markparker.me/dependency-injection/index.md) for details*

## Customizing VA and Observing Events

If you want to add custom logic to VA events, for example to update a GUI, you can subclass the native VoiceAssistant class and override its methods to add the desired behavior. Don't forget to call the superclass method to ensure the default behavior is preserved.

The voice assistant conforms to the SpeechRecognizerDelegate and CommandsContextDelegate protocols, whose methods are the main events.

```python
class MyVoiceAssistant(VoiceAssistant):

    # The overridden methods are coroutines, so the superclass calls are awaited
    async def speech_recognizer_did_receive_final_result(self, result: str | LocaleString):
        await super().speech_recognizer_did_receive_final_result(result)
        print('You said: ', result)
        # Your custom logic here

    async def speech_recognizer_did_receive_partial_result(self, result: str):
        await super().speech_recognizer_did_receive_partial_result(result)
        print(f"\rListening...: \x1b[3m{result}\x1b[0m", end="")
        # Your custom logic here

    async def speech_recognizer_did_receive_empty_result(self):
        await super().speech_recognizer_did_receive_empty_result()
        # Your custom logic here

    async def commands_context_did_receive_response(self, response: Response):
        await super().commands_context_did_receive_response(response)
        print('STARK: ', response.text)
        # Your custom logic here
```

For more advanced usage, see the source code or use your IDE's autocomplete.
Most modern editors support a "go to definition" feature, which can be very helpful for this.

## Multi-Language Voice Setup

To use multiple STT engines for different languages simultaneously, pass a list of recognizers to `run()`:

```python
from stark import run, CommandsManager
from stark.interfaces.vosk import VoskSpeechRecognizer
from stark.general.localisation import Localizer

manager = CommandsManager()

recognizers = [
    VoskSpeechRecognizer(model_url="https://...", language_code="en"),
    VoskSpeechRecognizer(model_url="https://...", language_code="ru"),
]

localizer = Localizer(languages={"en", "ru"})
localizer.load()

# `synthesizer` is assumed to be a speech synthesizer instance created elsewhere
await run(
    manager=manager,
    speech_recognizer=recognizers,  # list triggers multi-STT relay
    speech_synthesizer=synthesizer,
    localizer=localizer,
)
```

When a list is provided, `run()` automatically creates a **SpeechRecognizerRelay**, which waits for all recognizers to report, builds the best transcription by per-word confidence comparison, and emits a `VoiceTranscriptionString` with per-word language codes.

The relay produces a `VoiceTranscriptionString` that carries:

- The best-confidence assembled text
- Per-word language codes (from whichever recognizer had the highest confidence for each word)
- Time-aligned `VoiceTranscriptionTrack` with word timestamps, confidence scores, and speaker embeddings
- Alternative texts from each language's recognizer (for matrix cross-language matching)

### Speaker Model (Experimental)

Vosk supports speaker identification via speaker embedding vectors. Pass a `speaker_model_url` to enable:

```python
VoskSpeechRecognizer(
    model_url="https://...",
    language_code="en",
    speaker_model_url="https://...",  # optional speaker ID model
)
```

Speaker embeddings are stored per-word in `VoiceTranscriptionTrack.spk` and preserved through the entire flow. They are not used yet, but the infrastructure is ready for a future speaker diarization module.
See [Localizing Parsing](https://stark.markparker.me/localization-and-multilingual/localizing-parsing/index.md) for details on these features. See [Feature Flags](https://stark.markparker.me/advanced/feature-flags/index.md) for additional configuration options like enabling printing the conversation or tweaking multilingual features. # Default Speech Interfaces The Stark framework offers a default mechanism to incorporate speech interfaces from various platforms. This page elucidates the structure and usage of these interfaces. ## Overview Stark's speech interfaces comprise two primary components: 1. **Speech Recognizers**: Convert spoken words into text. 1. **Speech Synthesizers**: Translate text into audible speech. Both components employ protocols, ensuring flexibility and extensibility when opting for different implementations. ### VoskSpeechRecognizer An implementation utilizing the Vosk library. This recognizer captures audio input and processes it via the Vosk offline speech recognition engine. ```python def __init__(self, model_url: str): ``` ### SileroSpeechSynthesizer Implemented using Silero models. The resultant speech can be audibly played using the `Speech` class's `play()` method. ```python def __init__(self, model_url: str, speaker: str = 'baya', threads: int = 4, device ='cpu', torch_backends_quantized_engine: str = 'qnnpack'): ``` ### GCloudSpeechSynthesizer This synthesizer leverages Google Cloud's Text-to-Speech service. Ensure your credentials are properly configured before usage. The synthesized speech can be stored as a file for subsequent playback. ```python def __init__(self, voice_name: str, language_code: str, json_key_path: str): ``` ## Usage To integrate the speech interfaces: 1. Instantiate `CommandsManager`. 1. Select and instantiate your preferred speech recognizer. 1. Select and instantiate your preferred speech synthesizer. 1. 
Deploy the `run()` function, supplying it with the `CommandsManager`, recognizer, and synthesizer instances.

### Example

```python
manager = CommandsManager(...)
recognizer = VoskSpeechRecognizer(model_url="...")
synthesizer = SileroSpeechSynthesizer(model_url="...")

await run(manager, recognizer, synthesizer)
```

With the above configuration, your application will commence voice command listening and generate synthesized speech based on the logic within the commands manager.

## Notes

1. Confirm the required dependencies, such as Vosk, Silero, and Google Cloud, are in place (refer to [Installation](https://stark.markparker.me/installation/index.md)).
1. Adequate error management and model verifications are essential for a production environment.
1. For more nuanced interactions based on speech recognition outcomes, adjust the delegates.

Harness Stark's default speech interfaces to effortlessly and flexibly craft voice-centric applications. Choose the most suitable recognizer and synthesizer for your requirements, and integrate them smoothly.

## Implementing Custom Speech Interface

For more information, consult [Custom Speech Interfaces](https://stark.markparker.me/advanced/custom-interfaces/index.md) under the "Advanced" section.

# Where to Host

The flexibility of the Python programming language allows Stark to be hosted on virtually any system capable of running a Python interpreter. Here's a guide on where you can run Stark:

## Unix-based Systems (macOS, Linux)

Both macOS and Linux are Unix-based systems that typically come with Python pre-installed. However:

- Ensure that your Python version is updated to at least 3.12. If it isn't, consider updating it.
- If you wish to run Stark on boot and keep it running in the background, you can utilize `systemd` services to automate this process.

## Windows

Windows doesn't come with Python pre-installed, but setting it up is straightforward:

- Download and install Python from [python.org](https://www.python.org/).
- Running Stark on Windows presents its set of challenges. If you're looking to give Stark a graphical interface, consider frameworks like [PyQt](https://riverbankcomputing.com/software/pyqt/intro), [Tkinter](https://docs.python.org/3/library/tkinter.html), [Edifice](https://github.com/zzzeek/edifice), and others. - Alternatively, for a more minimalist approach, Stark can be integrated into a system tray program using libraries like [pystray](https://github.com/moses-palmer/pystray) or [infi.systray](https://github.com/Infinidat/infi.systray), thus enabling a voice-only interface. ## Mobile Platforms ### Android As of now, a direct port of Stark for Android has not been achieved. However, you can potentially make use of the [Kivy framework](https://kivy.org/) which is designed for building cross-platform apps using Python. ### iOS An iOS port for Stark is currently under development, with no fixed release date. Similarly to Android, you might find success using the cross-platform [Kivy framework](https://kivy.org/). ## Raspberry Pi-Based Hosting The Raspberry Pi, given its versatility and cost-effectiveness, can be a perfect host for Stark. Its compact size, affordability, and wide community support make it an attractive option. To set up Stark on a Raspberry Pi: 1. Ensure you have a Raspberry Pi with an appropriate operating system installed (e.g., Raspberry Pi OS). 1. Connect a microphone to the Raspberry Pi. If you aim for high voice recognition accuracy, consider using a high-sensitive omnidirectional microphone. 1. Connect a speaker or, as in the shared example, a TV soundbar to the Raspberry Pi for output. 1. Install Python (ensure version 3.12 or later) and other necessary packages for Stark. 1. If you wish to run Stark on boot and keep it running in the background, you can utilize `systemd` services to automate this process. 
## Server-Based Hosting

For those looking for a more robust and scalable solution, server-based hosting offers many benefits, like access to Stark from anywhere via the internet.

1. **VPS Hosting**: Virtual Private Servers (VPS) allow you to run Stark on remote servers. This is useful if you need higher computational power, redundancy, or want to ensure that Stark remains operational even if local power or network fails.
1. **Home Server**: You can host Stark on a dedicated home server or even on personal PCs. This can be a dedicated machine or a single-board computer like the Raspberry Pi. The advantage is local access and full control over your data and operations.
1. **Custom Interfaces**: With Stark running on a server, you can develop custom interfaces for access. For example, by implementing an HTTP server, as was done in the shared example, you can connect other devices to Stark. Detailed instructions can be found at [Custom Interfaces](https://stark.markparker.me/advanced/custom-interfaces/index.md).

______________________________________________________________________

## Personal Experience

To offer some inspiration, here's a mixed setup that's been effectively used:

Stark was set to run 24/7 on a dedicated Raspberry Pi at home, connected to a high-quality sensitive omnidirectional microphone and a TV soundbar for audio output. An Arduino microphone module was also attached, enabling a double-clap mechanism to wake up Stark.

Additionally, a small HTTP server was implemented on the Raspberry Pi, allowing a mobile phone to connect to Stark at home. The native Android libraries handled Speech-to-Text (STT) and Text-to-Speech (TTS) functionalities, and the app communicated with the Raspberry Pi using transcribed text via HTTP.

To ensure Stark was accessible from anywhere in the world, [ngrok](https://ngrok.com/) was set up on the Raspberry Pi, creating a secure tunnel to the localhost, making the locally hosted Stark globally accessible.
Also, a Telegram bot was implemented as an interface for both voice and text messages, serving as an additional cross-platform channel for remote communication.

______________________________________________________________________

Such setups illustrate the flexibility and scalability of Stark. Whether you're working with a Raspberry Pi or a dedicated server, there's room for innovation and customization in how you host and interact with Stark.

______________________________________________________________________

## Important Note

Want to see the various platforms Stark has been adapted for? Visit the **STARK-PLACE** repository to find implemented ports and extensions.

If you've developed a unique runner for Stark (be it tray, GUI, Kivy-based, or any other kind), consider contributing to the community. Open a PR to **STARK-PLACE**; let's work together to develop the best VA platform ever, enhancing the user experience for everyone!

# Tools

# Corrections (Pattern Extension for Phonetic Matching) [EXPERIMENTAL]

Corrections is a matching feature that widens command patterns to accept translation/phonetic variants of known keywords. When STT or user input contains a misspelling or phonetic approximation, this feature injects the variant into the compiled pattern so the command still matches.

Example: user says "tern on the lite" → dictionary contains "turn" and "light" → regex expands `"turn"` to `"(turn|tern)"` and `"light"` to `"(light|lite)"` → command "turn on the light" matches.

## How It Works

The feature has three parts:

### 1. Generation: `CorrectionsProcessor`

A pipeline pre-processor that runs **before** `SearchProcessor`. It accepts one or more `Dictionary` instances and uses their phonetic matching infrastructure (IPA → simplephone → Levenshtein with proximity graph) to find corrections.
```python
from stark.core.processors import CorrectionsProcessor, SearchProcessor
from stark.tools.dictionary import build_recognizable_dictionary
from stark.tools.phonetic.transcription import LatinPassthroughProvider

# `localizer` and `CommandsContext` are assumed to be set up elsewhere in your app

# Build a dictionary from recognizable.strings bundles
dictionary = build_recognizable_dictionary(localizer, ipa_provider=LatinPassthroughProvider())

context = CommandsContext(
    ...,
    processors=[
        CorrectionsProcessor(dictionaries=[dictionary]),  # generates corrections
        SearchProcessor(),  # uses them for matching
    ],
)
```

When a `localizer` is provided and no custom `processors` are specified, `CorrectionsProcessor` is included automatically in the default pipeline.

The processor accepts any `Dictionary` instance, not just ones built from recognizable.strings. You can pass custom dictionaries populated with domain-specific vocabulary.

**Lookup modes:** The processor supports the same modes as `Dictionary`: `EXACT`, `CONTAINS`, `FUZZY`, and `AUTO` (default). Pass via `CorrectionsProcessor(dictionaries=[...], mode=LookupMode.FUZZY)`. See [Phonetic Dictionary](https://stark.markparker.me/tools/phonetic-dictionary/index.md) for more details.

**Multilingual:** For `TranscriptionString` with alternative tracks, the processor runs a dictionary search for each track and stores per-track corrections. See [Localization and Multilingual](https://stark.markparker.me/localization-and-multilingual/index.md).

### 2. Expansion (automatic)

When corrections are present on the input string, `PatternParser.match()` automatically injects them into the compiled pattern before matching. For each `Correction(variant, keyword)`, if `keyword` appears as a literal in the compiled regex, it's replaced with `(keyword|variant1|variant2|...)`.

No flag is needed despite this being an experimental feature: expansion is triggered by the presence of corrections.

### 3. Back-tracking

After a successful match, `MatchResult` records which corrections were applied:

- `corrections: dict[str, str]`: maps each variant to its keyword (e.g. `{"tern": "turn"}`)
- `corrected_string: str`: the matched substring with corrections applied (e.g. `"turn on the light"`)

This enables UIs to show the corrected text to the user, and simplifies debugging.

## Data Sources

### Recognizable Strings (built-in)

`build_recognizable_dictionary()` creates a Dictionary from all loaded `recognizable.strings` bundles. See [Localizing Parsing](https://stark.markparker.me/localization-and-multilingual/localizing-parsing/index.md).

### Custom Dictionaries

Any `Dictionary` instance works. Populate it with domain-specific vocabulary:

```python
from stark.core.processors import CorrectionsProcessor
from stark.tools.dictionary import Dictionary, build_recognizable_dictionary
from stark.tools.dictionary.storage import DictionaryStorageMemory

# Built-in vocabulary from recognizable.strings (`localizer` comes from your app setup)
recognizable_dict = build_recognizable_dictionary(localizer)

# Extra domain-specific keywords
custom_dict = Dictionary(storage=DictionaryStorageMemory())
custom_dict.write_one("en", "spotify")
custom_dict.write_one("en", "bluetooth")

processor = CorrectionsProcessor(dictionaries=[recognizable_dict, custom_dict])
```

## IPA Provider Options

Dictionary-based matching uses IPA transcription for cross-language phonetic comparison. Providers can be combined via `fallback`, for example:

```python
from stark.tools.dictionary import build_recognizable_dictionary
from stark.tools.phonetic.transcription import LatinPassthroughProvider, EspeakIpaProvider

# Latin text passes through unchanged; anything else is delegated to espeak-ng
dictionary = build_recognizable_dictionary(
    localizer,
    ipa_provider=LatinPassthroughProvider(fallback=EspeakIpaProvider()),
)
```

See [Phonetic Dictionary](https://stark.markparker.me/tools/phonetic-dictionary/index.md) and [Phonetic Tools](https://stark.markparker.me/tools/raw-phonetic/index.md) for more details and native implementations.
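To make the expansion and back-tracking steps from "How It Works" more concrete, here is a toy, pure-Python sketch of the idea. It does not use STARK's `PatternParser` or `MatchResult`; the helper names `expand_pattern` and `apply_corrections` are made up for illustration, and the real implementation may differ in detail.

```python
import re


def expand_pattern(pattern: str, corrections: dict[str, str]) -> str:
    """Replace each literal keyword with an alternation of the keyword and its variants.

    `corrections` maps variant -> keyword, e.g. {"tern": "turn"}.
    """
    variants_by_keyword: dict[str, list[str]] = {}
    for variant, keyword in corrections.items():
        variants_by_keyword.setdefault(keyword, []).append(variant)

    for keyword, variants in variants_by_keyword.items():
        alternation = "(" + "|".join(re.escape(w) for w in [keyword, *variants]) + ")"
        # Word boundaries keep "turn" from being replaced inside "return".
        pattern = re.sub(
            rf"\b{re.escape(keyword)}\b", lambda _m, alt=alternation: alt, pattern
        )
    return pattern


def apply_corrections(text: str, corrections: dict[str, str]) -> tuple[str, dict[str, str]]:
    """Map variants back to keywords and report which corrections were used."""
    applied: dict[str, str] = {}

    def swap(match: re.Match[str]) -> str:
        word = match.group(0)
        if word in corrections:
            applied[word] = corrections[word]
            return corrections[word]
        return word

    return re.sub(r"\w+", swap, text), applied


corrections = {"tern": "turn", "lite": "light"}
pattern = expand_pattern("turn on the light", corrections)
match = re.fullmatch(pattern, "tern on the lite")
corrected, applied = apply_corrections(match.group(0), corrections)

print(pattern)    # (turn|tern) on the (light|lite)
print(corrected)  # turn on the light
print(applied)    # {'tern': 'turn', 'lite': 'light'}
```

The widened pattern accepts the misheard input, while the back-tracking step recovers the canonical text that a UI or log could display.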
## Comparison with NLDictionaryName Both features use `Dictionary` for phonetic matching, but at different levels: | | Corrections | NLDictionaryName | | ---------------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------- | --------------------------------------------------------------- | | **Best for** | Fuzzy command keyword matching | Fuzzy named entity parsing | | **Level** | Pre-processor + Pattern matching (before parsing) | Parameter parsing (inside `did_parse`) | | **What it does** | Expands patterns with homophones of known words found in the request string | Searches through dictionary programmatically if pattern matched | | **Scope** | Scans the entire input string (pre-processing), affects all commands, only expands with homophones present in both the request string and the dictionary | Specific parameter types for commands matched by pattern | | **Data source** | All provided `Dictionary` objects | Specific `Dictionary` | | **Cross-language** | Yes | Yes | | **Extra requirements** | keyword must be present in the compiled pattern as a literal | none, can have "\*\*" pattern | | **Overhead** | Longer pre-processing, fast matching | No pre-processing, longer matching | Corrections are helpful for keywords that are present as literals and can be misheard. NLDictionaryName are designed for extraction of named-entity parameters (names, places, songs). They share the same `Dictionary` infrastructure as a backend, but apply it differently. Both are in experimental stages, please try both and provide feedback. See [Phonetic Dictionary](https://stark.markparker.me/tools/phonetic-dictionary/index.md) for Dictionary and NLDictionaryName details, [Custom Processors](https://stark.markparker.me/advanced/custom-processors/index.md) for pipeline setup. # Dictionary - Phonetic Lookup > NOTE: requires an IPA provider. 
Default is `EspeakIpaProvider` ([libespeak-ng binary](https://github.com/espeak-ng/espeak-ng/blob/master/docs/guide.md#installation) installed in the system). For latin-only use cases, `LatinPassthroughProvider` works without external dependencies and is faster, but less accurate. See [raw-phonetic.md](https://stark.markparker.me/tools/raw-phonetic/index.md) for more details.

## Overview

### Basic Lookup

Create a dictionary in memory and add an entry:

```python
from stark.tools.dictionary import Dictionary
from stark.tools.dictionary.storage import DictionaryStorageMemory

dictionary = Dictionary(storage=DictionaryStorageMemory())
dictionary.write_one('en', "Linkin Park", {"id": 2017})
```

Then you can look up names by different spellings, homophones, or even cross-language phonetic similarity:

```python
matches = dictionary.lookup("linkoln perk", 'en')  # misspelled case
matches[0].metadata  # {"id": 2017}

matches = dictionary.lookup("лінкін парк", 'uk')  # Ukrainian spelling of Linkin Park
matches[0].metadata  # {"id": 2017}
```

### Search in Sentence

You can also scan an entire sentence for names from a dictionary:

```python
dictionary.search_in_sentence("good morning play linkin park on spotify", 'en')
```

Both `lookup` and `search_in_sentence` receive two optional parameters: `mode: LookupMode = LookupMode.AUTO` and `field: LookupField = LookupField.PHONETIC`.

```python
from enum import Enum, auto


class LookupMode(Enum):
    EXACT = auto()     # the fastest
    CONTAINS = auto()  # fast
    FUZZY = auto()     # slow, not recommended at 10K+ entries
    AUTO = auto()      # recommended: tries modes sequentially until match with some dict-size limits


class LookupField(Enum):
    NAME = auto()      # search by original name, only same lang is reasonable
    PHONETIC = auto()  # search by phonetic similarity, cross-lang support
```

### Sorting

Also, there are `lookup_sorted` and `search_in_sentence_sorted` methods that automatically sort results by Levenshtein distance. These might add a noticeable overhead when many entries are matched (starting from a magnitude of a hundred).
In most cases, it's better to use the not sorted version, check results amount, and then sort them manually if needed. Example of levenshtein sort: ```python sorted( matches, key=lambda item: levenshtein_similarity( # sort by the original name for same languages s1=name_candidate, s2=item.name, ) if item.language_code == language_code else levenshtein_similarity( # sort by phonetic similarity for cross-language s1=transcription(name_candidate, language_code), s2=item.phonetic, ), reverse=True, ) ``` > More details about levenshtein for fuzzy string matching [here](https://stark.markparker.me/tools/stark-levenshtein/index.md) page. But in many cases, domain-specific sorting and filtering is the best approach. For example, in navigator app you can prioritize names that are closer to the user's location. For example, Georgia the state for american users, but Georgia the country for european. ## Using with NLDictionaryName You can use NLDictionaryName to parse and match names from a Dictionary. It has already implemented `did_parse`, so no need to implement it yourself. 
```python
from stark.tools.dictionary.dictionary import Dictionary
from stark.tools.dictionary.nl_dictionary_name import NLDictionaryName
from stark.tools.dictionary.storage import DictionaryStorageMemory


class NLCityName(NLDictionaryName):
    # any NLDictionaryName must implement dictionary: Dictionary
    dictionary = Dictionary(storage=DictionaryStorageMemory())


# Fill the dictionary as usual
NLCityName.dictionary.clear()
NLCityName.dictionary.write_one("de", "Nürnberg", {"coords": (49.45, 11.08)})
NLCityName.dictionary.write_one("en", "London", {"coords": (51.51, -0.13)})
NLCityName.dictionary.write_one("en", "Paris", {"coords": (48.85, 2.35)})


# The function argument is named after the `$city` pattern parameter
@manager.new('weather in $city:NLCityName')
def hello(city: NLCityName):
    print(city.value[0].item.metadata["coords"])  # (48.85, 2.35) for "weather in parish"
```

Data model overview:

```python
class NLDictionaryName:
    value: list[LookupResult]
    dictionary: Dictionary


class LookupResult:
    span: Span
    item: DictionaryItem


@dataclass
class DictionaryItem:
    name: str
    phonetic: str
    simple_phonetic: str
    language_code: str
    metadata: Metadata  # dict[str, object]
```

Inspect your IDE suggestions and the source code (most modern editors support a "go to definition" feature) for more details.

## Automatic [Corrections](https://stark.markparker.me/tools/corrections/index.md) Generation

Corrections is a matching feature that widens command patterns to accept translation/phonetic variants of known keywords. When STT or user input contains a misspelling or phonetic approximation, this feature injects the variant into the compiled regex so the command still matches.

See [Corrections](https://stark.markparker.me/tools/corrections/index.md) for how Dictionary integrates with the corrections pipeline.
## Encapsulate Storage and Filling Logic You can encapsulate storage and filling logic in a single class: ```python class MyDictionary(Dictionary): def __init__(self): super().__init__(storage=DictionaryStorageSQL("sqlite:///my-phonetic-dictionary.db")) async def build(self): self.write_all(...) # Fill from files, db, or API class NLExampleDictionaryName(NLObject): dictionary = MyDictionary() ``` ## Building Example While you can modify a Dictionary even in runtime, the best approach is to fill the dictionary at build stage if possible, since writing might be slow for large dictionaries (especially starting from magnitudes of thousands). There is the example main.py that uses [typer](https://typer.tiangolo.com) to add `build` and `run` cli commands to your app. ```python import typer cli = typer.Typer() @cli.command() def build(): """Build the project. See typer docs for better CLI with features like progress bars and logging.""" print("Building...") NLExampleDictionaryName.dictionary.build_if_needed() # fill the sqlite file once during the build stage, not at runtime SomeOtherDictionary.build() # or force re-build on each call # etc print("Done") @cli.command() def run(): """Run your main app here.""" pass if __name__ == "__main__": cli() ``` # Raw Phonetic Tools: IPA & Simplephone ## Overview These tools convert text to phonetic representations for fuzzy matching, name lookup, and cross-language search. They power the phonetic matching in the [Dictionary Tool](https://stark.markparker.me/tools/phonetic-dictionary/index.md) and are often used together (simplephone code of the phonetic transcription) for best results. - `transcription`: Converts text in any language to a simplified Latin transcription using IPA (International Phonetic Alphabet). The default implementation currently uses espeak-ng (requires [libespeak-ng binary](https://github.com/espeak-ng/espeak-ng/blob/master/docs/guide.md#installation) installed in the system). 
STARK also provides an epitran wrapper as an alternative for espeak, and allows passing any custom implementation as a parameter.
- `simplephone`: Further reduces a transcription (or plain English text) to a simple, language-agnostic phonetic code for fast, robust matching.

## Basic Usage

```python
from stark.tools.phonetic.transcription import transcription, ipa2lat
from stark.tools.phonetic.simplephone import simplephone

# Convert ukrainian to simplified Latin phonetic transcription (IPA-based)
ipa = transcription("Лінкін Парк", "uk")  # e.g. "Лінкін Парк" → "linkin park"

# Convert to simplephone code (robust, language-agnostic)
sp = simplephone("Linkin Park")  # e.g. "linkin park" → "LNKNPARK"

# Combine for best fuzzy matching (recommended for cross-language)
sp_combined = simplephone(transcription("Лінкін Парк", "uk"))  # → "LNKNPARK"

# Direct IPA to Latin conversion
latin = ipa2lat("tɛst")  # → "test"
```

## Function Reference

### def transcription

```python
def transcription(text: str, language_code: str, ipa_provider: IpaProvider = EspeakIpaProvider()) -> str
```

- Converts a string to a simplified Latin phonetic transcription using IPA via espeak-ng.
- Handles many languages (see espeak-ng docs for supported codes).
- Used for cross-language and accent-insensitive matching.

**Parameters:**

- `text`: Input string.
- `language_code`: BCP-47 or ISO language code (e.g. `"en"`, `"uk"`, `"de"`).
- `ipa_provider`: Optional, allows custom IPA provider (default: EspeakIpaProvider).

**Returns:** Simplified Latin transcription as a string.

### def simplephone

```python
def simplephone(text: str, glue: str = " ", sep: str = string.whitespace) -> str | None
```

- Converts a string to a simple, language-agnostic phonetic code.
- Inspired by Caverphone, Soundex, and Kölner Phonetik.
- Ignores spaces, strips non-alphabetic characters, and normalizes similar sounds.

**Parameters:**

- `text`: Input string consisting of latin characters.
- `glue`: Separator for joining words (default: space).
- `sep`: Characters to treat as word separators (default: whitespace).

**Returns:** Simplephone code as a string, or `None` if input is empty.

## Typical Usage Pattern

For best fuzzy matching (especially cross-language), use both together:

```python
from stark.tools.phonetic.simplephone import simplephone
from stark.tools.phonetic.transcription import transcription

# For English input
a = simplephone(transcription("Linkin Park", "en"))  # → "LNKNPARK"

# For Ukrainian input
b = simplephone(transcription("Лінкін Парк", "uk"))  # → "LNKNPARK"

a == b  # True
```

This enables matching names and words across different languages and spellings.

## More fuzziness

For even more fuzziness, consider using the Levenshtein distance with the default proximity graph for simplephone (`SIMPLEPHONE_PROXIMITY_GRAPH`). For details see [STARK's Levenshtein implementation](https://stark.markparker.me/tools/stark-levenshtein/index.md).

## IPA Providers

The `transcription()` function accepts an `ipa_provider` parameter:

- `EspeakIpaProvider()`: default, requires the [espeak-ng](https://github.com/espeak-ng/espeak-ng) system binary
- `EpitranIpaProvider()`: pure Python via the `epitran` library, supports 120+ languages, slightly slower than `EspeakIpaProvider`, with a different set of supported languages
- `LatinPassthroughProvider(fallback=None)`: returns latin text unchanged (lowercased), delegates non-latin text to the fallback provider, and raises `ValueError` for non-latin text if no fallback is provided. No external dependencies for latin-only text. Fastest for latin-only text, but less accurate.

You are free to implement your own IPA provider by subclassing `IpaProvider`.
```python from stark.tools.phonetic.transcription import transcription, LatinPassthroughProvider # No espeak needed for English result = transcription("hello world", "en", ipa_provider=LatinPassthroughProvider()) # β†’ "hello world" ``` ## Notes - These functions are used internally by the Dictionary and [Corrections](https://stark.markparker.me/tools/corrections/index.md) for phonetic and fuzzy lookup. - For more details, see the source code or use your IDE's autocomplete. # Sliding Window Parser ## Overview `sliding_window_parser` helps you find and extract parameters from free text using a parser function even if it doesn't parse the entire input or returns just the value without the substring or the span. It slides through the sentence with growing/shrinking substring windows and tests each span until finds a suitable match. ### Basic Usage ```python from stark.tools.sliding_window_parser import sliding_window_parse, Span async def date_parser(text: str): if text.lower() in {"september 5", "5 september"}: return ("date", "2024-09-05") if text.lower() == "september": return ("month", "09") return None result = await sliding_window_parse( "remind me to call mom on september 5", parser=date_parser, ) print(result) # [(Span(27, 39), "september 5", ("date", "2024-09-05"))] ``` ### Parameters ```text async def sliding_window_parse( phrase: str, parser: Callable[[str], Awaitable[T]], min_window: int = 1, max_window: int | None = None, concurrency: int | None = None, find_one: bool = True, ) -> list[tuple[Span, str, T]]: - **phrase** – text to parse - **parser** – async callable returning a parsed value, `None`, or ParseError - **min_window / max_window** – window size range in tokens (words) - **concurrency** – limit parallel parser calls, default is `None` (unlimited) - **find_one** – stop after first match instead of collecting all Returns: A list of tuples (span, substring, value) for each match, where: - span: Span object with character offsets (start, end) in the 
original phrase - substring: the matched substring (phrase[span.start:span.end]) - value: the value returned by the parser If find_one=True, returns a single-item list with the first match (faster, less parser calls). If no match is found, raises ParseError, so the list is never empty, meaning result[0] is always safe. ``` # STARK-Levenshtein - Fuzzy String Matching ## Overview Minimal wrappers for Levenshtein distance and similarity, with optional phonetic/character proximity graphs, substring search, and prefix/suffix ignoring. Useful for fuzzy string matching, similarity scoring, and fuzzy substring search. Written in cython and compiled for performance. ### Basic Usage ```python from stark.tools.levenshtein import ( levenshtein_distance, levenshtein_similarity, levenshtein_match, levenshtein_distance_substring, levenshtein_search_substring, SIMPLEPHONE_PROXIMITY_GRAPH, # Is more meaningful to use for simplephone strings, see phonetic tools docs SKIP_SPACES_GRAPH, # ignores spaces while matching ) # Get the Levenshtein distance (lower = more similar, 0 = exact match) lev = levenshtein_distance(s1="kitten", s2="sitting") # Get similarity score (0.0 to 1.0, higher = more similar) sim = levenshtein_similarity(s1="kitten", s2="sitting") # Check if two strings are similar enough (similarity >= threshold) is_match = levenshtein_match(s1="kitten", s2="sitting", threshold=0.7) # Find all substrings in s2 with minimal distance to s1 dist_spans = levenshtein_distance_substring(s1="kitten", s2="the sitting cat") # Returns: list of (Span, distance) # Find substrings in s2 where similarity to s1 is above threshold search_spans = levenshtein_search_substring(s1="kitten", s2="the sitting cat", threshold=0.7) # Returns: list of (Span, similarity) ``` ### Parameters All functions accept: - `s1: str` – first string to compare (**required**) - `s2: str` – second string to compare (**required**) - `proximity_graph: dict[str, dict[str, float]] | None = None` – custom operation costs 
instead of default 1. For example, based on phonetic similarity, keyboard proximity, or just to ignore some characters. - `max_distance: float | None = None` – skip calculation if distance exceeds this value and early_return is True (optional) - `ignore_prefix: bool = False` – ignore matching prefixes, required for substring search - `ignore_suffix: bool = False` – ignore matching suffixes, breaks substring search - `narrow: bool = False` – restrict to shortest possible substring (substring search) - `early_return: bool = True` – return as soon as threshold is met (faster). False value is for debug only. - `lower: bool = False` – compare strings as lowercase Functions with a `threshold` parameter: - `threshold: float = 0` – similarity threshold for match/search; used to calc max_distance, which stops the calculation early if distance exceeds this value to improve performance ### Constants ```python type ProximityGraph = dict[str, dict[str, float]] PROX_MED = 0.5 PROX_LOW = 0.25 PROX_MIN = 0.01 SIMPLEPHONE_PROXIMITY_GRAPH: ProximityGraph = { "w": {"f": PROX_MED, "a": PROX_LOW, "y": PROX_LOW}, "y": {"a": PROX_LOW, "w": PROX_LOW}, "a": {"y": PROX_LOW, "w": PROX_LOW, "-": PROX_LOW}, # '-' for deletion "f": {"w": PROX_MED}, " ": {"-": PROX_MIN}, # ignore spaces "-": {"a": PROX_LOW, " ": PROX_MIN}, # insertion } SKIP_SPACES_GRAPH = {" ": {"-": PROX_MIN}, "-": {" ": PROX_MIN}} ``` ______________________________________________________________________ For more advanced usage, see the source code or use your IDE's autocomplete. # Advanced # Speech Interface Protocols and Custom Implementation When working with voice-driven applications, a robust and flexible architecture for handling both speech recognition and synthesis is vital. The Stark framework provides these features via interfaces (protocols) that can be easily extended and customized. This page dives deeper into the Stark framework's speech interface protocols and provides details on their implementation. 
## Recognizer ### Protocol ```python @runtime_checkable class SpeechRecognizerDelegate(Protocol): async def speech_recognizer_did_receive_final_result(self, result: str): pass async def speech_recognizer_did_receive_partial_result(self, result: str): pass async def speech_recognizer_did_receive_empty_result(self): pass @runtime_checkable class SpeechRecognizer(Protocol): is_recognizing: bool delegate: SpeechRecognizerDelegate | None async def start_listening(self): pass def stop_listening(self): pass ``` ### Explanation #### SpeechRecognizerDelegate This protocol provides callback methods to output results of various states of the speech recognition: - `speech_recognizer_did_receive_final_result`: Triggered when a final transcript is available. - `speech_recognizer_did_receive_partial_result`: Fired upon receiving an interim transcript. - `speech_recognizer_did_receive_empty_result`: Called when no speech was detected. #### SpeechRecognizer This protocol defines the primary input interface for any speech recognition implementation. It consists of: - `is_recognizing`: A flag indicating if the recognizer is currently active. - `delegate`: An instance responsible for handling the recognition results. - `start_listening`: A method to initiate the listening process. - `stop_listening`: A method to halt the listening process. ### Implementation Reference To illustrate a custom implementation, we can reference the `VoskSpeechRecognizer`. This implementation leverages the Vosk offline speech recognition library. It downloads and initializes the Vosk model, sets up an audio queue, and provides methods to start and stop the recognition process. For a deeper understanding, review the source code of the `VoskSpeechRecognizer` implementation. 
## Synthesizer ### Protocol ```python @runtime_checkable class SpeechSynthesizerResult(Protocol): async def play(self): pass @runtime_checkable class SpeechSynthesizer(Protocol): async def synthesize(self, text: str) -> SpeechSynthesizerResult: pass ``` ### Explanation - **SpeechSynthesizerResult**: This protocol defines a structure for the output of the speech synthesis process. It provides a method, `play`, to audibly present the synthesized speech. - **SpeechSynthesizer**: This protocol represents the primary interface for any speech synthesis implementation. It contains: - `synthesize`: An asynchronous method that takes text input and returns a `SpeechSynthesizerResult` instance. ### Implementation Reference For a hands-on example, the `SileroSpeechSynthesizer` and `GCloudSpeechSynthesizer` classes illustrate how one might implement the synthesizer protocol using the Silero models and Google Cloud Text-to-Speech services, respectively. To gain more insights, you can check the source code of the `SileroSpeechSynthesizer` implementation. ## Alternative Interfaces ### CLI Interface In this approach, you leverage the terminal or command line of a computer as the interface for both speech recognition and synthesis. Instead of speaking into a microphone and receiving audio feedback: - **Recognition**: Users type their queries or commands into the terminal. The system then processes these textual inputs as if they were transcribed from spoken words. - **Synthesis**: Instead of "speaking" or playing synthesized voice, the system displays the response as text in the terminal. This creates a chat-like experience directly within the terminal. This is an excellent method for debugging, quick testing, or when dealing with environments where audio interfaces aren't feasible. ### GUI Interface The GUI (Graphical User Interface) provides an intuitive and interactive way to implement custom speech interfaces for voice assistants. 
It offers a multifaceted experience, allowing users to: - **Text Outputs**: Display text-based responses, enabling clear communication with users through written messages. - **Context Visualization**: Visualize context and relevant information using graphics, charts, or interactive elements to enhance user understanding. - **Text and Speech Input**: Accept input through both text and speech, allowing users to interact in the manner most convenient for them. - **Trigger with Buttons**: Incorporate buttons or interactive elements that users can click or tap to initiate voice assistant interactions, providing a user-friendly interface. The GUI interface serves as a versatile canvas for crafting engaging voice assistant experiences, making it an excellent choice for applications where graphical interaction enhances user engagement and comprehension. ### Telegram Bot as an Interface Telegram, a popular messaging platform, provides an amazing bot API that developers can use to create custom bots. By leveraging this API, you can emulate speech interfaces in two distinct ways: #### 1. **Voice Messages** - **Recognition**: Users send voice messages to the Telegram bot. These voice messages can be transcribed into text using a speech recognition system. The recognized text can then be processed further by the bot for commands or queries. - **Synthesis**: Instead of sending back text responses, the bot can use a text-to-speech system to generate voice messages, which it then sends back to the users. This method provides a more authentic "voice assistant" experience within the messaging environment. By utilizing voice messages, you can create a more immersive experience for users, closely resembling interactions with traditional voice assistants. #### 2. **Text Messages** - **Recognition**: Users send text messages to the Telegram bot. The bot then treats these messages as if they were the transcribed text of spoken words. 
- **Synthesis**: Rather than synthesizing spoken responses, the bot sends back text messages as its replies. The users read these messages as if they were listening to the synthesized voice of the system.

This approach offers a chat-like experience directly within the Telegram app, providing a seamless interaction that many users find intuitive.

In both methods, the use of a Telegram bot allows developers to introduce voice command functionalities in messaging environments, reaching users on various devices and platforms.

______________________________________________________________________

Bear in mind that these are mere illustrations of potential implementations. The canvas of possibilities is vast, bounded solely by the horizons of your creativity.

# Custom Processors

Processors form a modular pipeline for string pre-processing and command search. Each processor in the pipeline receives the input string and can either find commands, enrich the parsing context, or pass through to the next processor.

## Data Flow

```text
      process_string(input)
               │
               ▼
┌─────────────────────────────────────┐
│ Processor 1: Pre-processing         │  e.g. CorrectionsProcessor
│ Input: string, recognized_entities  │  - reads string metadata
│ Output: ([], 0), pass-through       │  - updates corrections
│ Side effects:                       │  - appends to recognized_entities
│   string.corrections                │
│   recognized_entities               │
└──────────────┬──────────────────────┘
               │
               ▼
┌─────────────────────────────────────┐
│ Processor 2: Pre-processing         │  e.g. SpacyNERProcessor
│ Input: string, recognized_entities  │  - reads string text
│ Output: ([], 0), pass-through       │  - appends RecognizedEntity objects
│ Side effects:                       │
│   recognized_entities               │
└──────────────┬──────────────────────┘
               │
               ▼
┌─────────────────────────────────────┐
│ Processor 3: Search                 │  e.g. SearchProcessor
│ Input: string, recognized_entities  │  - uses corrections
│ Output: ([SearchResult, ...], 0)    │    for regex expansion
│ Uses:                               │  - uses recognized_entities
│   PatternParser.match()             │    for parameter extraction
│   string.translate_position()       │  - uses alternative_texts
│   string[start:end]                 │    for matrix matching
└──────────────┬──────────────────────┘
               │
               ▼
┌─────────────────────────────────────┐
│ Processor 4: Fallback (optional)    │  e.g. LLM, web search,
│ Only reached if Processor 3         │    template response
│ returned no results                 │
└─────────────────────────────────────┘
```

When `CommandsContext.process_string()` is called, it runs the input through each processor in order:

1. Each processor receives the **same** `string` object and the **shared** `recognized_entities` list
1. If a processor returns non-empty results, processing **stops** and subsequent processors are skipped
1. Pre-processors return `([], 0)` to pass through without stopping the pipeline
1. If all processors return empty, the context resets to root

## Built-in Processors

### `CorrectionsProcessor` (pre-processor)

Generates phonetic corrections using `Dictionary`-based phonetic matching.
Accepts any `Dictionary` instances, including one built from recognizable.strings via `build_recognizable_dictionary()`.

For each input word/phrase, runs dictionary sentence search and appends matching corrections to `string.corrections`. These corrections are consumed by `PatternParser._expand_corrections()` to widen compiled regexes, e.g., `"hello"` in the regex becomes `"(hello|helo)"`.

Included automatically for recognizable.strings in the default pipeline when a `localizer` is provided.

See [Corrections](https://stark.markparker.me/tools/corrections/index.md) for full documentation.

### `SpacyNERProcessor` (pre-processor)

Uses spaCy NER to mark named entities (locations, organizations, etc.) as `RecognizedEntity` objects. These narrow parameter extraction bounds in subsequent processors.

**Complexity:** O(N) where N = input length (spaCy's neural model). Memory: proportional to model size.

### `SearchProcessor` (command search)

Matches input against all registered command patterns.
Handles:

- Pattern matching via `PatternParser.match()`: O(C × P) where C = commands in the current context window, P = pattern complexity
- Matrix cross-language matching across alternative tracks (when `STARK_ENABLE_MULTILANG_MATRIX=1`): multiplies by T (the number of tracks, which is the number of languages with active STT)
- Corrections regex expansion: O(K) string replacements per match, where K = number of corrections
- Overlap resolution with cross-track position translation: O(R), where R = results

**Complexity:** O(T × C × P) for matching + O(R) for overlap resolution

## Creating a Custom Processor

Subclass `CommandsContextProcessor` and override either `process_string` (for pipeline-wide logic) or `process_context_layer` (for per-context-layer logic):

```python
from stark.core.commands_context_processor import CommandsContextProcessor


class MyPreProcessor(CommandsContextProcessor):
    async def process_string(self, string, context, recognized_entities):
        # Pre-process: enrich metadata, add recognized entities
        # Return ([], 0) to pass through to the next processor
        return [], 0


class MySearchProcessor(CommandsContextProcessor):
    async def process_context_layer(self, string, context, context_layer, recognized_entities):
        # Search for commands in this context layer
        # Return list of SearchResult
        return []
```

## Registering Processors

Pass your processors to `CommandsContext` or `run()`. Order matters (pre-processors before search, search before fallback):

```python
from stark.core import CommandsContext
from stark.core.processors import CorrectionsProcessor, SearchProcessor, SpacyNERProcessor
from stark.tools.dictionary import build_recognizable_dictionary

# `main_task_group`, `manager`, and `dictionary` are assumed to be created elsewhere
context = CommandsContext(
    task_group=main_task_group,
    commands_manager=manager,
    processors=[
        CorrectionsProcessor(dictionaries=[dictionary]),          # 1. generate phonetic corrections
        SpacyNERProcessor(lang_models={"en": "en_core_web_sm"}),  # 2. mark entities
        SearchProcessor(),                                        # 3. match commands
        # MyFallbackProcessor(),                                  # 4. optionally handle unmatched input
    ],
)
```

## Metadata on Input Strings

The input string may be a plain Python `str`, but it may also carry metadata via `LocaleString` or its subclasses. Processors can read metadata and append to mutable fields. The following table lists the metadata attributes available on `LocaleString` subclasses:

| Attribute | Type | LocaleString Subclass | Description |
| --- | --- | --- | --- |
| `language_code` | `LanguageCode` | All `LocaleString` | Majority language of the input |
| `words` | `tuple[TranscriptionWord]` | `TranscriptionString` | Per-word language annotations |
| `corrections` | `list[Correction]` | `TranscriptionString` | **Mutable.** Phonetic corrections for regex expansion |
| `alternative_texts` | `dict[str, LocaleString]` | `TranscriptionString` | Same utterance from different language models |
| `track` | `VoiceTranscriptionTrack` | `VoiceTranscriptionString` | Word timestamps, confidence, speaker data. Subclass of `TranscriptionString`. Produced by `VoskSpeechRecognizer` and passed unchanged by `VoiceAssistant` |

The type of the input string is determined by the IO layer (like STARK's `VoiceAssistant`). You can implement your own IO and processor layers and pass any metadata by subclassing `LocaleString` or its subclasses.

## Inter-Processor Communication

### `RecognizedEntity`

Marks a substring that likely corresponds to a specific named entity or parameter type. It narrows parameter extraction bounds, which is the hardest part of parsing.
```python
recognized_entities.append(RecognizedEntity(
    substring="London",
    type=Location,
))
```

`SearchProcessor` uses these to constrain parameter extraction: when a `RecognizedEntity` matches a parameter's type and appears within the regex match, the parser narrows to that exact substring.

### `corrections`

Phonetic correction variants on `TranscriptionString`. Pre-processors append `Correction(variant, keyword)` pairs. `SearchProcessor` injects these into compiled patterns.

```python
from stark.models.voice_transcription import Correction

string.corrections.append(
    Correction(variant="helo", keyword="hello")
)
```

# Custom Run

STARK's flexibility and extensibility come from its ability to cater to various use cases and environments. An essential feature of the framework is the ability to customize the run function. This allows developers to personalize the core functionality, integrate custom setups, or extend the capabilities of the framework. Below is a quick guide to understanding and using a custom run function.

## Understanding the Default Run Function

The `run` function in STARK serves as the primary entry point that sets up and starts the voice assistant.
```python
import asyncer
from stark.interfaces.protocols import SpeechRecognizer, SpeechSynthesizer
from stark.core import CommandsContext, CommandsManager
from stark.voice_assistant import VoiceAssistant
from stark.general.blockage_detector import BlockageDetector


async def run(
    manager: CommandsManager,
    speech_recognizer: SpeechRecognizer,
    speech_synthesizer: SpeechSynthesizer
):
    async with asyncer.create_task_group() as main_task_group:
        context = CommandsContext(
            task_group=main_task_group,
            commands_manager=manager
        )
        voice_assistant = VoiceAssistant(
            speech_recognizer=speech_recognizer,
            speech_synthesizer=speech_synthesizer,
            commands_context=context
        )
        speech_recognizer.delegate = voice_assistant
        context.delegate = voice_assistant

        main_task_group.soonify(speech_recognizer.start_listening)()
        main_task_group.soonify(context.handle_responses)()

        detector = BlockageDetector()
        main_task_group.soonify(detector.monitor)()
```

Let's dissect it:

```python
async def run(
    manager: CommandsManager,
    speech_recognizer: SpeechRecognizer,
    speech_synthesizer: SpeechSynthesizer
):
```

**Parameters:**

- `manager`: An instance of `CommandsManager` which holds all the commands that the voice assistant can recognize and process.
- `speech_recognizer`: The implementation you've selected for speech recognition.
- `speech_synthesizer`: The implementation you've chosen for speech synthesis.

```python
async with asyncer.create_task_group() as main_task_group:
```

Here, a task group is created using `asyncer`. Task groups allow you to manage several tasks concurrently.

```python
context = CommandsContext(
    task_group=main_task_group,
    commands_manager=manager
)
```

A `CommandsContext` is initialized. This holds the context in which commands are executed, including the associated task group and the command manager.
```python
voice_assistant = VoiceAssistant(
    speech_recognizer=speech_recognizer,
    speech_synthesizer=speech_synthesizer,
    commands_context=context
)
```

The `VoiceAssistant` is then created and initialized with the recognizer, synthesizer, and context.

```python
speech_recognizer.delegate = voice_assistant
context.delegate = voice_assistant
```

Both the speech recognizer and the commands context are associated with the voice assistant as their delegates. This setup ensures that when the recognizer captures any speech or when there's a command response to handle, the voice assistant processes them.

```python
main_task_group.soonify(speech_recognizer.start_listening)()
main_task_group.soonify(context.handle_responses)()
```

Tasks are added to the main task group: one to start the speech recognizer's listening process, and the other to handle responses from executed commands.

```python
detector = BlockageDetector()
main_task_group.soonify(detector.monitor)()
```

A blockage detector is created and started. It detects potential deadlocks or blocking calls within the async code so they can be found and fixed.

## Customizing the Run Function

Customizing the `run` function provides a way to inject additional functionality or to adapt the framework to specific needs. For instance, you could:

- Integrate other third-party tools or services.
- Implement custom logging or analytics mechanisms.
- Extend with other asynchronous operations to run concurrently with the voice assistant.

When customizing, ensure that you maintain the core structure, especially the initialization of the main components and the task group management. The ordering can be crucial, especially when setting delegates.

To start your customization, copy the default run function as your foundation and add your specific adjustments as needed.
Consequently, a "Hello, World" implementation with a custom run would appear as:

```python
import asyncer
from stark import CommandsContext, CommandsManager, Response
from stark.interfaces.protocols import SpeechRecognizer, SpeechSynthesizer
from stark.interfaces.vosk import VoskSpeechRecognizer
from stark.interfaces.silero import SileroSpeechSynthesizer
from stark.voice_assistant import VoiceAssistant
from stark.general.blockage_detector import BlockageDetector


VOSK_MODEL_URL = "YOUR_CHOSEN_VOSK_MODEL_URL"
SILERO_MODEL_URL = "YOUR_CHOSEN_SILERO_MODEL_URL"

recognizer = VoskSpeechRecognizer(model_url=VOSK_MODEL_URL)
synthesizer = SileroSpeechSynthesizer(model_url=SILERO_MODEL_URL)

manager = CommandsManager()


@manager.new('hello')
async def hello_command() -> Response:
    text = voice = 'Hello, world!'
    return Response(text=text, voice=voice)


async def run(
    manager: CommandsManager,
    speech_recognizer: SpeechRecognizer,
    speech_synthesizer: SpeechSynthesizer
):
    async with asyncer.create_task_group() as main_task_group:
        context = CommandsContext(
            task_group=main_task_group,
            commands_manager=manager
        )
        voice_assistant = VoiceAssistant(
            speech_recognizer=speech_recognizer,
            speech_synthesizer=speech_synthesizer,
            commands_context=context
        )
        speech_recognizer.delegate = voice_assistant
        context.delegate = voice_assistant

        main_task_group.soonify(speech_recognizer.start_listening)()
        main_task_group.soonify(context.handle_responses)()

        detector = BlockageDetector()
        main_task_group.soonify(detector.monitor)()


async def main():
    await run(manager, recognizer, synthesizer)


if __name__ == '__main__':
    asyncer.runnify(main)()  # or anyio.run(main), same thing
```

# External Triggers

The STARK voice assistant (VA) can be integrated with various external triggers to provide a flexible and dynamic user experience. In the STARK framework, the integration of external triggers is seamless and can greatly enhance the interactivity of the assistant.
In this guide, we will walk through how to set up and use external triggers to activate the STARK Voice Assistant.

## Setting Up External Mode

The STARK framework provides a dedicated mode for external triggers: the "External" mode. When you set the VA mode to "external", it waits for an explicit trigger to activate the `SpeechRecognizer` component.

Additionally, you can use the `stop_after_interaction` property in custom modes:

```python
stop_after_interaction=True
```

When set to `True`, the VA stops the `SpeechRecognizer` after it finishes its current interaction, so the next interaction can be initiated by an external trigger.

Details are on the [Voice Assistant](https://stark.markparker.me/voice-assistant/index.md) page.

## Triggering Using `start_listening()`

Once the VA has stopped listening after an interaction, you can restart the `SpeechRecognizer` using the `start_listening()` method. This method serves as an entry point when you want to reactivate voice recognition after an external trigger.

## Implementing External Triggers

Note that you will probably need to implement a [custom run function](https://stark.markparker.me/advanced/custom-run/index.md) to add a concurrent process or create a separate thread.

The beauty of external triggers lies in their versatility. Here are some ways to integrate them:

### Keyboard Hotkey Shortcut

A simple approach is to use a specific keyboard combination to activate STARK. Tools like Python's `keyboard` library can help detect specific keypresses, enabling you to then call `start_listening()`.

### Hardware Integration

For those looking for a hands-free approach, integrating hardware can be a fascinating option. For instance, using an Arduino microphone module, you can set up a system where STARK activates upon a distinct sound pattern, like a double or triple clap.

### Fast Wakeword Detectors

Wakeword detection is a popular approach in modern VAs.
Using fast, lightweight wakeword detectors like Picovoice's Porcupine, you can have your VA spring into action upon hearing a specific keyword or phrase.

### Implementations and Examples

You can find external trigger implementations at [stark_place/triggers](https://github.com/MarkParker5/STARK-PLACE/tree/master/stark_place/triggers) and examples of usage at [stark_place/examples](https://github.com/MarkParker5/STARK-PLACE/tree/master/stark_place/examples).

______________________________________________________________________

By embracing external triggers, you can improve the adaptability and user experience of your voice assistant. Whether it's a simple keyboard shortcut or an intricate hardware setup, STARK's flexibility ensures that your VA is always ready and responsive, aligned with the needs of your user base.

# Fallback Command / LLM Integration

In voice assistants and speech recognition, it's essential to account for the unpredictability of user input. Despite the comprehensive list of commands you may have configured, there will inevitably be instances where user utterances don't align with any predefined command. This is where the fallback command comes in.

The fallback command in the STARK framework serves as a safety net, ensuring that when a user's voice input doesn't match any set command, there's still an appropriate and meaningful response.

## Setting Up the Fallback Command

You can assign the `fallback_command` to the `CommandsContext` directly:

```python
CommandsContext.fallback_command: Command
```

Here's a practical example:

```python
from stark.core.types import String
...

@manager.new('$string:String', hidden=True)
async def fallback(string: String):
    # Your fallback logic here
    ...

commands_context.fallback_command = fallback
```

In this example, any unrecognized string is directed to the `fallback` function, allowing you to define how the system should respond.

## Fallback Command Options

With the rise of advanced language models like ChatGPT, it's now feasible to provide intelligent and contextually relevant responses even for unexpected user inputs. Integrating an LLM can improve the user experience, making your voice assistant appear more intuitive and responsive.

Fallbacks aren't limited to LLMs. You can get creative with your approach. Consider these options:

- **Wikipedia API**: Search for a quick answer or definition related to the user's query.
- **Google Search Parsing**: Extract snippets from top search results for a quick response.
- **Custom Database Lookups**: If you have a specific dataset or database, direct fallback queries there.
- **Fun random "I don't know" synonyms**

______________________________________________________________________

Fallback commands ensure your voice assistant remains responsive, intelligent, and user-friendly, even in the face of unexpected inputs. With the flexibility of STARK and the power of modern Large Language Models, creating a robust voice assistant has never been easier.

# Feature Flags

S.T.A.R.K uses environment variables to enable or disable experimental and optional features.

## Available Flags

| Flag | Default | Complexity overhead | Description |
| --- | --- | --- | --- |
| `STARK_ENABLE_VOICE_CLI` | `0` | None | Print voice input/output in terminal. See [Voice Assistant](https://stark.markparker.me/voice-assistant/index.md). |
| `STARK_ENABLE_MULTILANG_MATRIX` | `1` | O(T × C × P): multiplies matching cost by T tracks | Match input against all alternative language tracks concurrently. See [Multilanguage Input](https://stark.markparker.me/localization-and-multilingual/multilanguage-input/index.md). |

## Setting Flags

Set via environment variables before running your app:

```bash
STARK_ENABLE_VOICE_CLI=1 python -m your_app
```

# Optimization for Stark

When it comes to STARK, or any software platform, optimization is pivotal to ensuring smooth and efficient operations. Here are some key guidelines and best practices to ensure that STARK runs at its best:

## Non-blocking is Key

**THE MOST IMPORTANT**: Always ensure that you **DO NOT** place blocking code inside `async def` functions. Blocking code can drastically reduce the performance of asynchronous applications by halting the execution of other parts of the application.

If you have commands that run blocking code, always define them using the simple `def` ([Sync-vs-Async](https://stark.markparker.me/sync-vs-async-commands/index.md)). This ensures that STARK creates a separate worker thread to handle the execution of that command. By doing so, STARK remains responsive, even when processing resource-intensive commands.

## Sync vs Async

Understanding the difference between synchronous and asynchronous code is crucial. Asynchronous code allows your application to perform other tasks while waiting for a particular task to complete, thus improving efficiency. The [Sync-vs-Async](https://stark.markparker.me/sync-vs-async-commands/index.md) page provides a comprehensive comparison and guidance on how to effectively leverage both.

## Utilizing the asyncer

The [asyncer](https://asyncer.tiangolo.com) documentation is a valuable resource. It provides an array of tools and methods to help convert synchronous code to asynchronous and vice-versa, aiding in the optimization process.
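As a rough illustration of the sync-to-async conversion mentioned above, the sketch below uses only the standard library: `asyncio.to_thread` stands in for `asyncer.asyncify`, which serves the same purpose of running a blocking call in a worker thread. The names `blocking_lookup`, `heartbeat`, and `handler` are hypothetical and exist only for this example.

```python
import asyncio
import time


def blocking_lookup(query: str) -> str:
    # Hypothetical blocking call (stands in for disk, network, or CPU-heavy work)
    time.sleep(0.1)
    return query.upper()


async def heartbeat(ticks: list[int]) -> None:
    # Keeps making progress while the blocking call runs in a worker thread
    for i in range(3):
        ticks.append(i)
        await asyncio.sleep(0.01)


async def handler(query: str) -> tuple[str, list[int]]:
    ticks: list[int] = []
    # asyncio.to_thread moves the blocking function off the event loop,
    # so heartbeat() is not starved while blocking_lookup() sleeps
    result, _ = await asyncio.gather(
        asyncio.to_thread(blocking_lookup, query),
        heartbeat(ticks),
    )
    return result, ticks


if __name__ == "__main__":
    print(asyncio.run(handler("hello")))
```

Calling `blocking_lookup` directly inside `handler` would instead freeze the event loop for the duration of the sleep, which is the situation the guidelines above warn about.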
## Using asyncer.asyncify

If you need to call blocking synchronous code within an `async def` function, use `asyncer.asyncify`. It runs the synchronous code in a worker thread, so the asynchronous function does not block the event loop.

## Grouping Asynchronous Requests

If you have multiple asynchronous tasks that can be executed concurrently, group them together and await them as one unit. This approach allows tasks to run simultaneously, improving the overall speed of the function.

```python
import asyncio

import anyio
import asyncer


async def task_one(): ...
async def task_two(): ...


# Option 1: anyio task group
async def with_anyio():
    async with anyio.create_task_group() as task_group:
        task_group.start_soon(task_one)
        task_group.start_soon(task_two)


# Option 2: asyncer task group
async def with_asyncer():
    async with asyncer.create_task_group() as task_group:
        task_group.soonify(task_one)()
        task_group.soonify(task_two)()


# Option 3: asyncio.gather
async def with_asyncio():
    await asyncio.gather(task_one(), task_two())
```

## Implement Caching

Caching is the practice of storing frequently used data or results for quicker access in the future. By implementing caching, you can significantly reduce repetitive computations and database lookups, leading to faster response times. Python tools like `cachetools` or `functools.lru_cache` are popular choices for caching.

______________________________________________________________________

Optimization is a continuous process. As STARK grows and evolves, always look out for opportunities to refine and streamline its operations. The key is to ensure STARK remains responsive and efficient, offering users a seamless voice assistant experience.

# Advanced Exploration

Congratulations on navigating through the entirety of the STARK documentation! We've aimed to cover a comprehensive range of topics and scenarios to help you get the most out of the framework. However, there may always be nuances or specific use cases we have not touched upon.
## Delving Deeper

If you've scoured the documentation and still haven't found the precise information you're looking for, consider the following resources:

- **Source Code**: Often, the code itself and its tests can be the best documentation. Explore the inner workings of the STARK framework by reading the source code.
- **Issues & Discussions**: Engage with the community and the developers. The Issues and Discussions sections can offer insights into known challenges, proposed enhancements, and community-contributed solutions.
- **STARK PLACE Repository**: Apart from the main repository, the STARK PLACE repo houses shared modules, extensions, and utilities. It's an excellent place to find (or contribute) additional tools or modules that might be relevant to your needs.

## Customization and Extension

STARK is designed with flexibility at its core. If you come across a situation where the existing methods don't align perfectly with your requirements, remember:

- **Subclassing**: Feel empowered to subclass any module within the STARK framework. By doing so, you can maintain the foundational behavior and only override the specific methods you need to tailor to your needs.

______________________________________________________________________

Should you choose to delve into the source, customize components, or even contribute to the repository, we're excited to have you on board, pushing the boundaries of what STARK can achieve. Here's to building, innovating, and advancing together!

# Localization and Multi-Language

S.T.A.R.K supports multi-language pattern matching and parsing. This section covers how to make your commands and types work across languages.
- [Localizing Parsing](https://stark.markparker.me/localization-and-multilingual/localizing-parsing/index.md): patterns, parameter types, `did_parse`, string bundles, `@key` syntax
- [Multilanguage Input](https://stark.markparker.me/localization-and-multilingual/multilanguage-input/index.md): `TranscriptionString`, per-word language metadata, alternative tracks, matrix matching
- [Localizing Responses](https://stark.markparker.me/localization-and-multilingual/localizing-responses/index.md): output formatting

# Localizing Parsing

S.T.A.R.K supports multi-language pattern matching and parsing out of the box. This page covers how to localize the input recognition side: patterns, parameter types, and `did_parse` logic.

For response localization (output), see [Localizing Responses](https://stark.markparker.me/localization-and-multilingual/localizing-responses/index.md).

## Core Concepts

There are three places where localization applies to input processing:

1. **Command patterns**: how a command is triggered (e.g., `"set timer"` vs `"поставь таймер"`)
1. **Object type patterns**: how a parameter type is recognized (e.g., `Duration` matching `"hours"` vs `"часов"`)
1. **`did_parse` logic**: programmatic parsing that may behave differently per language (e.g., parsing `"five"` vs `"пять"` into a number)

Language metadata flows through the entire pipeline on the input string itself via `LocaleString`, a `str` subclass that carries a `language_code` attribute.

## `LocaleString`

`LocaleString` is a `str` subclass that carries language metadata. It behaves exactly like a regular string: equality, hashing, `in`, regex, `len`, and iteration all work unchanged.

All `str` methods that return a new string (`replace`, `strip`, slicing, `split`, etc.) are overridden to preserve the `language_code`.
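To see why such overrides are needed, here is a minimal standard-library sketch of a `str` subclass that carries a tag. `TaggedStr` is a hypothetical stand-in written for this illustration, not STARK's `LocaleString`, and it overrides only two operations.

```python
import re


class TaggedStr(str):
    """A str subclass carrying a language tag (illustrative only)."""

    language_code: str

    def __new__(cls, value: str, language_code: str = "base") -> "TaggedStr":
        obj = super().__new__(cls, value)
        obj.language_code = language_code
        return obj

    def _with(self, value: str) -> "TaggedStr":
        # Re-attach this instance's tag to a new plain string
        return TaggedStr(value, self.language_code)

    def replace(self, old, new, count=-1):
        # str.replace returns a plain str, so wrap the result again
        return self._with(super().replace(old, new, count))

    def __getitem__(self, index):
        # Indexing and slicing also return plain str by default
        return self._with(super().__getitem__(index))


s = TaggedStr("hello world", "en")
tail = s[6:]                        # overridden: tag survives
swapped = s.replace("hello", "hi")  # overridden: tag survives
upper = s.upper()                   # not overridden: plain str
subbed = re.sub("hello", "hi", s)   # built inside the re module: plain str
```

Any method left without an override, and any string assembled by library code, comes back as a plain `str` with no tag attached.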
Note that third-party libraries and CPython C-level functions (e.g., `re.sub`, spaCy) may reconstruct strings internally, bypassing Python-level overrides. In these cases metadata will be lost. Use `str(locale_string)` when passing to such APIs, and `locale_string._with(result)` to re-attach metadata to the output.

```python
from stark.general.localisation import LocaleString

s = LocaleString("hello world", "en")
s.language_code           # "en"
s[6:]                     # LocaleString("world", "en"), metadata preserved
s.replace("hello", "hi")  # LocaleString("hi world", "en")
```

## Language Codes and `"base"`

The default language code is `"base"`. When no language is specified, all patterns and parsing use the `"base"` variant. This is the fallback for any language that doesn't have a dedicated pattern.

All language codes are typed as `LanguageCode`, a `Literal` union of `"base"` and ISO 639-1 codes (e.g., `"en"`, `"ru"`, `"de"`).

Your app provides the language code via `LocaleString`:

```python
from stark.general.localisation import LocaleString

await context.process_string(LocaleString("set timer for five minutes", "en"))
await context.process_string(LocaleString("поставь таймер на пять минут", "ru"))

# Plain str works too and defaults to "base"
await context.process_string("set timer for five minutes")
```

Language identification is not part of S.T.A.R.K's core; it's the app's responsibility. In a voice assistant setup, the STT engine typically provides the language code alongside the recognized text.

## Localizing Patterns

### Inline `patterns` Dict

The simplest approach.
Override the `patterns` classproperty on your Object type to return per-language Pattern instances:

```python
class Duration(Object):
    value: str

    @classproperty
    def patterns(cls) -> dict[str, Pattern]:
        return {
            "base": Pattern("$n:Word (hours|minutes|seconds)"),
            "ru": Pattern("$n:Word (часов|минут|секунд)"),
        }
```

When matching with `language_code="ru"`, S.T.A.R.K uses the `"ru"` pattern. For any other language code, it falls back to `"base"`.

The single `pattern` classproperty still works: if you don't override `patterns`, it defaults to `{"base": cls.pattern}`. So existing types work unchanged.

> **Note on dict ordering:** If you provide a `dict[str, str]` to `@manager.new` without a `"base"` key, the first entry by iteration order is used as `"base"`. Python dicts preserve insertion order, but it is better to be explicit: always include a `"base"` key to avoid ambiguity.

### `@key` Syntax with String Bundles

For production apps with many languages, embed localization keys in your pattern strings. The `PatternParser` resolves them at compile time from the `Localizer`:

```python
class Duration(Object):
    value: str

    @classproperty
    def pattern(cls) -> Pattern:
        return Pattern("$n:Word (@duration_units)")
```

The `@duration_units` key is looked up in the Localizer's recognizable string files for the active language. This requires setting up a Localizer (see [String Bundles](#string-bundles) below).

Keys must start with a letter or underscore, followed by letters, digits, or underscores. These are the standard identifier rules (e.g., `@duration_units`, `@_private_key`, `@greeting2`).

`@key` references are validated at type registration time and during health checks. If a key is missing from all loaded languages, you get an error at startup, not at runtime when a user triggers the command.

## Localizing Commands

Commands support the same per-language patterns.
Pass a `dict[str, str]` instead of a single string to `@manager.new`:

```python
@manager.new({
    "base": "set timer for $t:Duration",
    "ru": "поставь таймер на $t:Duration",
})
async def set_timer(t: Duration) -> Response:
    ...
```

For a single-language command, just pass a string as before; it becomes the `"base"` pattern.

The `@key` syntax works for commands as well:

```python
@manager.new('@clock_timer_set_command')
async def set_timer(t: Duration) -> Response:
    ...
```

## Using ObjectParser for Localized Parsing

`ObjectParser` is the recommended approach for types that need localized `did_parse` logic, localized programmatic patterns, or both. Every `ObjectParser` instance automatically holds a reference to the `Localizer` (injected by `PatternParser` during type registration), so no manual `__init__` wiring is needed.

### Localized `did_parse`

The `from_string` parameter in `did_parse` is a `LocaleString`. It behaves like a regular string but also provides `from_string.language_code: LanguageCode` for language-specific parsing logic.

With `ObjectParser`, use `self.localizer` for localized lookup tables:

```python
class NLNumberParser(ObjectParser):
    async def did_parse(self, obj: Object, from_string: LocaleString) -> str:
        words_one = self.localizer.get_recognizable("words_one", from_string.language_code)
        ...
```

For simple types that don't need `self.localizer` or any other features of `ObjectParser`, you can still use `did_parse` directly on the Object:

```python
class NLNumber(Object):
    value: float

    async def did_parse(self, from_string: LocaleString) -> str:
        if from_string.language_code == "ru":
            self.value = parse_russian_number(from_string)
        else:
            self.value = parse_english_number(from_string)
        return from_string
```

See [ObjectParser](https://stark.markparker.me/patterns/#defining-custom-object-types) for more details on parsing custom types.

### Programmatic Patterns

For types that generate patterns at runtime (e.g., from a database or API), override the `patterns` property on your `ObjectParser`. It takes priority over the Object's `patterns` classproperty:

```python
class PlaylistParser(ObjectParser):
    _cache: dict[str, Pattern] | None = None

    @property
    def patterns(self) -> dict[str, Pattern] | None:
        if self._cache is not None:
            return self._cache
        playlists = fetch_playlists()  # your data source
        play_word = self.localizer.get_recognizable("play", "base") or "play"
        names = "|".join(playlists)
        self._cache = {"base": Pattern(f"({play_word}) ({names})")}
        return self._cache
```

Pattern resolution order: `parser.patterns[language_code]` > `object_type.patterns[language_code]` > fallback to `"base"` key.

Cache invalidation is the extension's responsibility: the Localizer provides stable strings, but dynamic data (like playlist names) may change.

## String Bundles

String bundles are files that store localized strings.
S.T.A.R.K uses a simple key-value format (`.strings` files):

### File Format

```text
/* optional comment */
"key" = "value";
"greeting" = "hello|hi|hey";
"duration_units" = "hours|minutes|seconds";
```

### Directory Structure

```text
strings/
  base/
    localizable.strings
    recognizable.strings
  en/
    localizable.strings
    recognizable.strings
  ru/
    localizable.strings
    recognizable.strings
```

- **recognizable**: strings used for input matching (patterns, parsing)
- **localizable**: strings used for output formatting (responses). See [Localizing Responses](https://stark.markparker.me/localization-and-multilingual/localizing-responses/index.md)
- **base/**: fallback strings used when a key is missing for a specific language

### Setting Up the Localizer

```python
from stark.general.localisation import Localizer

localizer = Localizer(languages={"en", "ru"}, base_language="en")
localizer.load()  # discovers and reads .strings files
```

Only languages in the `languages` set are loaded; the rest are ignored even if files exist on disk. The `base` directory is always loaded.

`load()` automatically creates missing `strings/{lang}/` directories and empty `.strings` files for all configured languages. If a pattern uses an `@key` that doesn't exist in any loaded language, `health_check` automatically adds the key to the base `recognizable.strings` with its own name as the default value and emits a warning. This means you can start using `@key` syntax immediately: the files and entries are created for you, and you fill in translations later.

Pass the Localizer when creating `CommandsContext`:

```python
context = CommandsContext(
    task_group=main_task_group,
    commands_manager=manager,
    localizer=localizer,
)
```

The Localizer is automatically propagated to `PatternParser` and all `ObjectParser` instances registered on it.
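The `.strings` file format shown above is simple enough to read by hand. The following is only a sketch of how such a file could be parsed with the standard library; it is not STARK's loader (`Localizer.load()` does that work), and `parse_strings` is a hypothetical helper name. It assumes comments use the `/* ... */` form and that values do not themselves contain `/*`.

```python
import re

# One entry per statement: "key" = "value";
# Escape sequences inside the quotes are kept exactly as written.
_ENTRY = re.compile(r'"((?:[^"\\]|\\.)*)"\s*=\s*"((?:[^"\\]|\\.)*)"\s*;')
# Block comments: /* ... */ (may span several lines)
_COMMENT = re.compile(r"/\*.*?\*/", re.DOTALL)


def parse_strings(text: str) -> dict[str, str]:
    """Parse `.strings`-style text into a key -> value mapping."""
    without_comments = _COMMENT.sub("", text)
    return dict(_ENTRY.findall(without_comments))


sample = '''
/* optional comment */
"greeting" = "hello|hi|hey";
"duration_units" = "hours|minutes|seconds";
'''

entries = parse_strings(sample)
```

With a mapping like this, a pattern such as `(@duration_units)` could be expanded by substituting the value for the key, which yields a regular alternation group.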
For mixed-language input with per-word language metadata, see [Multilanguage Input](https://stark.markparker.me/localization-and-multilingual/multilanguage-input/index.md).

# Localizing Responses

Response localization allows your assistant to reply in the user's language. The key idea: store a translation key with format arguments, so the translated template is resolved first, then the arguments are injected into it.

```python
# Without localization (hardcoded language):
Response(text=f"Hello, {name}!")  # gives "Hello, Mark!"

# With localization (deferred):
Response(text=LocalizableString("greeting", "fr", name=str(name)))
# Localizer resolves "greeting" for "fr" → "Bonjour, {name}!"
# Then formats → "Bonjour, Mark!"
```

This matters because argument positions and surrounding text differ between languages, so you can't just translate a pre-formatted string.

## `LocalizableString`

`LocalizableString` stores a key, a language code, and format arguments. At response time, `Localizer.localize()` looks up the key in `localizable.strings` for the given language, then calls `.format(**arguments)` on the resolved template.

```python
from stark.general.localisation import LocalizableString

LocalizableString("greeting", "ru", name="Mark")
# .string = "greeting"           - the key
# .language_code = "ru"          - which translation to use
# .arguments = {"name": "Mark"}  - injected after translation
```

If the key is not found (neither for the requested language nor in `base`), `localize()` emits a `RuntimeWarning` and falls back to the raw key string.

## Using in Commands

`Response.text` and `Response.voice` accept both plain `str` and `LocalizableString`.

To know which language to respond in, annotate any parameter with `LanguageCode`. The framework injects the language of the matched substring automatically via dependency injection.
The parameter name doesn't matter, only the type annotation:

```python
from stark.general.localisation.language_code import LanguageCode

@manager.new({
    "base": "hello $name:Word",
    "ru": "привет $name:Word",
})
async def greet(name: Word, lang: LanguageCode) -> Response:
    return Response(
        text=LocalizableString("greeting_response", lang, name=str(name)),
        voice=LocalizableString("greeting_response", lang, name=str(name)),
    )
```

When the user says "привет мир", the input matches the Russian pattern, so `lang` is `"ru"`. When they say "hello world", `lang` is `"en"`. For mixed-language input with `TranscriptionString`, the language is the majority language of the matched substring's words.

## Resolving at Response Time

The core framework stores `LocalizableString` as-is in the `Response` object. Resolution happens at the delegate level, where the `Localizer` is available. The `VoiceAssistant` provided by STARK already does this automatically under the hood.

```python
# In your custom delegate / response handler:
if isinstance(response.text, LocalizableString):
    text = localizer.localize(response.text)
else:
    text = response.text
```

This keeps the core framework decoupled from any specific output target: the same `Response` can be rendered differently by a voice assistant (TTS), a chat UI, or a logging system.

## Fallback Behavior

If the key is not found in `localizable.strings` for the requested language, `localize()` falls back to:

1. The `base` language strings
1. The raw key string itself (with a `RuntimeWarning`)

## String Bundles

Response strings use the same `.strings` bundle format and directory structure as pattern localization.
The `localizable.strings` files are the output counterpart to `recognizable.strings`:

```text
strings/
  en/
    localizable.strings   ← response strings
    recognizable.strings  ← pattern strings
  ru/
    localizable.strings
    recognizable.strings
```

See [Localizing Parsing](https://stark.markparker.me/localization-and-multilingual/localizing-parsing/index.md) for the full bundle format reference.

## Formatting Complex Values with PyICU

For formatting locale-sensitive values like numbers, dates, units, and currencies in responses, [PyICU](https://pypi.org/project/PyICU/) is a great companion library. It wraps the ICU C++ library (the same engine behind iOS/Swift's `Foundation` formatting) and provides ready-made locale-aware formatting for:

- **Numbers**: decimal, percent, currency, and spelled-out (e.g., `"five"`, `"пять"`)
- **Dates/Times**: locale-specific patterns, relative dates (`"yesterday"`, `"in 2 days"`)
- **Units**: `"5 kilometers"`, `"3 lbs"`, `"2 hours"` with localized names
- **Messages**: pluralization and gender rules (`"{num, plural, one {# item} other {# items}}"`)

PyICU is not a dependency of S.T.A.R.K; use it alongside when you need locale-aware value formatting in your responses.

See [Command Response](https://stark.markparker.me/command-response/index.md) for more examples of response building with formatted values.

# Multilanguage Input

When building custom IO interfaces (beyond the built-in Voice Assistant), you can provide per-word language metadata to the parsing pipeline via `TranscriptionString`. This metadata is optional (the parser works with plain strings and `LocaleString` too), but when available, it enables per-parameter language resolution for mixed-language input.
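To make per-parameter language resolution concrete, here is a minimal standalone sketch that does not use STARK at all. It assumes the metadata is a plain list of `(word, language)` pairs and that a span's language is whichever language tags the most words in it, with ties going to the language seen first. `majority_language`, `TaggedWord`, and the sample utterance are hypothetical, and STARK's own resolution may differ.

```python
from collections import Counter

# Hypothetical per-word metadata: (word, language code) pairs.
TaggedWord = tuple[str, str]


def majority_language(words: list[TaggedWord]) -> str:
    """Return the language that tags the most words in the span.

    Ties go to the language that appears first (an assumption of this sketch).
    """
    if not words:
        raise ValueError("cannot resolve the language of an empty span")
    counts = Counter(lang for _, lang in words)
    # max() returns the first maximal key; Counter keeps first-seen order.
    return max(counts, key=counts.__getitem__)


utterance = [
    ("set", "en"), ("timer", "en"), ("for", "en"),
    ("zwei", "de"), ("Stunden", "de"),
]

whole = majority_language(utterance)         # language of the full utterance
duration = majority_language(utterance[3:])  # language of the parameter span
print(whole, duration)
```

The whole utterance and the trailing parameter span resolve to different languages here, which is the situation per-word metadata is meant to handle.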
## TranscriptionString

`TranscriptionString` extends `LocaleString` with per-word language annotations:

```python
from stark.models.transcription_string import TranscriptionString

ts = TranscriptionString.from_words([
    ("set", "en"), ("timer", "en"), ("for", "en"),
    ("zwei", "de"), ("часа", "ru"),
])
# ts == "set timer for zwei часа"
# ts.language_code == "en" (majority language)
```

When the parser slices a parameter substring (e.g., `"zwei часа"` for a Duration parameter), `TranscriptionString` automatically resolves the majority language of that span: in this case `"de"`, not `"en"`. The parser then uses the German Duration pattern for matching and passes the correct language to `did_parse`.

All string operations (slicing, replace, strip, split) preserve the per-word language metadata.

## When to Use

Use `TranscriptionString` when your input source provides per-word language information:

- **STT engines** that tag each word with its detected language
- **NLP pipelines** that perform language identification per token
- **Translation APIs** that return source language annotations
- **Manual annotation** for testing multilingual commands

For single-language input, plain `LocaleString` is sufficient.

## Alternative Tracks

`TranscriptionString` can carry `alternative_texts`, the same utterance as processed by different language models:

```python
from stark.general.localisation import LocaleString
from stark.models.transcription_string import TranscriptionString

ts = TranscriptionString.from_words(
    [("set", "en"), ("timer", "en")],
    alternative_texts={
        "ru": LocaleString("сет таймер", "ru"),
        "de": LocaleString("set timer", "de"),
    },
)
```

When `STARK_ENABLE_MULTILANG_MATRIX=1` (default), the parser tries each alternative track against its language's command patterns concurrently and merges the results. This catches commands that exist only in specific languages.
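The concurrent matching of alternative tracks can be pictured with plain `asyncio`. The sketch below is not STARK's parser: `match_track`, `match_all`, the `PATTERNS` table, and the merge step are hypothetical stand-ins. It assumes only that each track is matched against the patterns of its own language and that the hits are collected into one list.

```python
import asyncio

# Hypothetical command phrases per language (stand-in for real patterns).
PATTERNS: dict[str, list[str]] = {
    "en": ["set timer"],
    "de": ["timer stellen"],
}


async def match_track(language: str, text: str) -> list[tuple[str, str]]:
    """Match one track against the phrases of its own language only."""
    await asyncio.sleep(0)  # yield control, as a real async matcher would
    return [(language, p) for p in PATTERNS.get(language, []) if p in text]


async def match_all(tracks: dict[str, str]) -> list[tuple[str, str]]:
    """Run every track concurrently and merge the hits into one list."""
    results = await asyncio.gather(
        *(match_track(lang, text) for lang, text in tracks.items())
    )
    return [hit for hits in results for hit in hits]


# "fr" has no phrases in PATTERNS, so that track contributes no hits.
tracks = {"en": "set timer", "de": "timer stellen", "fr": "set timer"}
matches = asyncio.run(match_all(tracks))
print(matches)
```

Because `asyncio.gather` returns results in the order the awaitables were passed, the merged list follows the order of the tracks regardless of which matcher finishes first.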
## VoiceTranscriptionString

For voice input, `VoiceTranscriptionString` extends `TranscriptionString` with time-aligned audio metadata:

```python
from stark.models.voice_transcription_string import VoiceTranscriptionString
```

This adds per-word timestamps, confidence scores, and speaker embeddings. The parser uses this data to resolve overlapping matches across alternative tracks, set priorities, and improve recognition accuracy. Speaker identification is not used yet but is planned for the future.

See [Voice Assistant](https://stark.markparker.me/voice-assistant/index.md) for the built-in multi-STT setup that produces `VoiceTranscriptionString` automatically.

## Passing to the Parser

Pass `TranscriptionString` (or any `LocaleString` subclass) directly to `process_string`:

```python
await context.process_string(ts)
```

The parser handles it as a regular string because `TranscriptionString` is a `str` subclass. The metadata enhances pattern resolution without requiring any changes to your commands or types.

See [Feature Flags](https://stark.markparker.me/advanced/feature-flags/index.md) for additional configuration options, such as tweaking multilingual features.
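Why a `str` subclass passes through string-based code unchanged can be shown without STARK. In this sketch, `TaggedString` and `shout` are hypothetical; `TaggedString` stands in for `TranscriptionString` only to demonstrate that extra metadata on a `str` subclass is invisible to code expecting a plain string. Unlike the real class, it makes no attempt to preserve metadata through slicing or other string operations.

```python
class TaggedString(str):
    """Hypothetical str subclass carrying per-word language tags."""

    words: list[tuple[str, str]]

    def __new__(cls, words: list[tuple[str, str]]) -> "TaggedString":
        # Build the plain text from the words, then attach the metadata.
        instance = super().__new__(cls, " ".join(word for word, _ in words))
        instance.words = list(words)
        return instance


def shout(text: str) -> str:
    """Any function written for plain strings accepts the subclass as-is."""
    return text.upper()


ts = TaggedString([("set", "en"), ("timer", "en")])

print(ts == "set timer")    # compares like a plain string
print(isinstance(ts, str))  # usable wherever str is expected
print(shout(ts))            # string-only code needs no changes
print(ts.words)             # metadata remains available to aware code
```

Note that built-in methods such as `upper()` return a plain `str` here, so a class that needs metadata to survive string operations has to override them itself.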