Advanced Genie Flow concepts
With the tools described so far, one is already able to create sensible dialogues. But there are some nifty tricks to pull even more from the framework.
With the user_input and ai_extraction events, dialogues that go back and forth between the user and an LLM can be implemented. The sequence always looks something like this (a minimal code sketch follows the list):
1. Genie Flow sends an initial text to the user
2. The user sends their input as part of a user_input event
3. The LLM compiles a response and sends it as part of an ai_extraction event
4. Genie Flow sends that response to the user
5. Repeat from step 2, unless a final state has been reached
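A minimal sketch of this loop, expressed in the same GenieStateMachine style used throughout this document (the state names, state values and template paths are illustrative assumptions, not taken from an existing Genie):

from statemachine import State
from genie_flow.genie import GenieStateMachine


class SimpleDialogueMachine(GenieStateMachine):
    # STATES (hypothetical names and values)
    user_enters_query = State(value=100, initial=True)  # assuming the dialogue starts here
    ai_creates_response = State(value=200)

    # EVENTS AND TRANSITIONS
    # step 2: the user's input moves the dialogue to the LLM
    user_input = user_enters_query.to(ai_creates_response)
    # step 3: the LLM's response moves the dialogue back to the user
    ai_extraction = ai_creates_response.to(user_enters_query)

    # TEMPLATES (hypothetical template paths)
    templates = dict(
        user_enters_query="simple/user_input.jinja2",
        ai_creates_response="simple/ai_response.jinja2",
    )

In this sketch, every user_input event triggers an LLM call via the template attached to ai_creates_response, and the resulting ai_extraction event hands the response back to the user, closing the loop.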
Running LLM queries can be time-consuming. Also, the true power of LLMs comes into play when a prompt is split into multiple parts. For instance, for Step Back Prompting you want a first prompt like "take a step back and tell me about the general rules that apply to this problem", and then add the response to that step-back prompt as context to the original query. There are many more cases where you would want to string together a number of consecutive prompts.
Genie Flow has a number of advanced features that enable the programmer to do exactly that.
the advance event
When the intermediate results of a string of prompts need to be fed back to the user, the programmer can introduce transitions on an advance event. These events can be sent to the state machine to make it advance to the next state without receiving any new user input.
For example, look at the following summary from the Claims Genie code, which implements the full claims Genie Flow. Some interesting parts from that code are:
from statemachine import State

from genie_flow.genie import GenieStateMachine


class ClaimsMachine(GenieStateMachine):
    ...

    # STATES
    ai_extracts_information = State(value=200)
    user_views_start_of_generation = State(value=300)
    ai_extracts_categories = State(value=310)

    # EVENTS AND TRANSITIONS
    ai_extraction = ai_extracts_information.to(
        user_views_start_of_generation, cond="have_all_info"
    )
    advance = user_views_start_of_generation.to(ai_extracts_categories)

    ...
The dialogue at some stage enters the state ai_extracts_information, meaning that some information is extracted from the dialogue. When all the information is gathered (cond="have_all_info"), the state machine moves to user_views_start_of_generation and the user is shown a summary of that information.
Here we have defined a transition from the state user_views_start_of_generation towards the state ai_extracts_categories. The idea is that when that first state is reached, the user is sent some intermediate results (in this case, a summary of the information gathered so far), after which the front-end has the option to advance the state machine by sending it an advance event. The state machine then advances to the state ai_extracts_categories, where further processing is done.
This means that the output of the LLM, in this case from the prompt attached to state ai_extracts_information, is sent to the user, who can view it. The front-end should then send an advance event back to Genie Flow to make it advance to the next state, ai_extracts_categories.
This way, the user can stay abreast of what is happening in the background, see some intermediate results, and spend less of the waiting time staring at a screen where nothing happens.
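Stripped of the Genie Flow request cycle, the advance mechanism is just a regular python-statemachine event: triggering it moves the machine to the next state without any new input. A minimal, self-contained sketch using plain python-statemachine (state names are illustrative, and a real Genie would of course attach templates and a model):

from statemachine import State, StateMachine


class AdvanceDemo(StateMachine):
    user_views_summary = State(initial=True)
    ai_processes_further = State(final=True)

    # no new user input is needed; the client simply sends "advance"
    advance = user_views_summary.to(ai_processes_further)


sm = AdvanceDemo()
sm.send("advance")  # what the front-end effectively triggers
assert sm.current_state.id == "ai_processes_further"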
chaining and branching
Although the advance event is a great way to chain the output of one prompt into the input of the next, it takes a round trip across the front-end to progress the dialogue to the next stage. Chaining templates in the backend means that the front-end is not updated with any intermediate results and no such round trips are needed between the back- and front-end.
The way to chain multiple subsequent calls to an LLM, where the output of one is added to the input of the next, is to put the consecutive templates in a list. For instance, q_and_a_trans.py has the following template definitions:
from genie_flow.genie import GenieStateMachine


class QandATransMachine(GenieStateMachine):
    ...

    # TEMPLATES
    templates = dict(
        intro="q_and_a/intro.jinja2",
        user_enters_query="q_and_a/user_input.jinja2",
        ai_creates_response=[
            "q_and_a/ai_response.jinja2",
            "q_and_a/ai_response_summary",
        ],
    )
Here, the state ai_creates_response is assigned a list of templates. The first one is the original template that creates the prompt for the LLM. The second one is something like:
Summarize the following text into one paragraph that is not longer than 50 words.
Be strictly to the point, use short sentences and leave out all fluff words.
Also do not use words with more than two syllables.
---
{{ previous_result }}
This template takes the output of the previous prompt and directs the LLM to summarise it into a paragraph of not more than 50 words. The result of the previous prompt is available to the template as the property previous_result. Any other model properties are also available, as in any normal template rendering.
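This construct also covers the Step Back Prompting example mentioned at the start of this section: the first template in the list asks for the general rules, and the second one feeds those rules back in as context. A sketch of what that second template could look like (the original_query property is an illustrative assumption about what the model exposes):

The following general rules apply to this problem:

{{ previous_result }}

---
With these rules in mind, now answer the original question:

{{ original_query }}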
This construct makes it easy to string together prompts that follow from one another, which is very useful when the next prompt depends on the output of a previous one. If that is not the case, we can branch off into separate prompts that are executed in parallel. This branching is done by assigning a dictionary of prompts, as in the following extract from the Claims Genie example:
from genie_flow.genie import GenieStateMachine


class ClaimsMachine(GenieStateMachine):
    ...

    templates = dict(
        ai_extracts_categories=dict(
            user_role="claims/prompt_extract_categories_user_role.jinja2",
            product_description="claims/prompt_extract_categories_product_description.jinja2",
            target_persona="claims/prompt_extract_categories_target_persona.jinja2",
            further_info="claims/prompt_extract_categories_further_information.jinja2",
        )
    )
Here the template assignment for state ai_extracts_categories is a dictionary of different templates. The Genie Flow framework will create separate LLM calls for each of the keys in that dictionary, which are then run in parallel. The result that is returned is a dictionary with these same keys and the outputs of the LLM for each of the rendered templates.
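For the extract above, the result handed to the next step in the chain (and exposed there as previous_result) would therefore be shaped something like the following, with the actual LLM outputs abbreviated:

{
    "user_role": "<LLM output for the user_role prompt>",
    "product_description": "<LLM output for the product_description prompt>",
    "target_persona": "<LLM output for the target_persona prompt>",
    "further_info": "<LLM output for the further_information prompt>",
}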
Of course, the chaining and branching of templates can be combined. So you can chain together different branching templates, one after another, followed by a simple template, as expressed in the following snippet:
some_state=[
    dict(
        foo="foo-template.jinja2",
        bar="bar-template.jinja2",
    ),
    dict(
        foo_foo="foo-foo-template.jinja2",
        bar_bar="bar-bar-template.jinja2",
    ),
    "finalize.jinja2",
]
This would first run the foo and bar templates in parallel, then feed the output of that (a dictionary with the outputs of each individual prompt) into the foo_foo and bar_bar templates, which are also run in parallel. The finalize template is then executed with a dictionary with keys foo_foo and bar_bar, each containing the output generated by sending the respective rendered template as a prompt to the LLM.
Remember that the result of a previous LLM call in the chain will be available in the property previous_result. If the previous step in the chain was a branching template (a dictionary of templates), that property will contain the value of that dictionary.
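In the snippet above, finalize.jinja2 could therefore address the individual branch outputs by key. A sketch, assuming previous_result behaves like a regular mapping inside the template:

Combine the following two analyses into a single conclusion.

First analysis:
{{ previous_result.foo_foo }}

Second analysis:
{{ previous_result.bar_bar }}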
Mapping and running in parallel
The special template type MapTaskTemplate enables the user to map a template against a list of values in the GenieModel, and receive a list of values in return. The mapping is conducted at run time, meaning that the values that exist in the model at the time of invocation will all be mapped against the given template. All these template invocations are run in parallel, so maximum processing speed is achieved; the actual speed depends on the number of Celery workers available at the time.
With the current release, only singular templates can be used to map against. It is foreseeable that more complex constructs, such as lists and dicts, will be supported in the future.
The way to express a map task template is as follows:
from genie_flow.model.template import MapTaskTemplate
from genie_flow.genie import GenieStateMachine
from statemachine import State

...


class SomeGenieMachine(GenieStateMachine):
    # STATES
    mapping_a_template = State(value=500)

    # TEMPLATES
    templates = dict(
        mapping_a_template=MapTaskTemplate(
            "embed/chunk.jinja2",
            "embedded_doc.chunks[*].content",
        )
    )
Here, a MapTaskTemplate is assigned to the state mapping_a_template. As can be seen, this would map the template embed/chunk.jinja2 to all the values that come out of applying the JMES Path expression embedded_doc.chunks[*].content to the model, at run time.
The template could be something like:
{{ map_value }}
And with a meta.yaml such as:
invoker:
  type: genie_flow_invoker.invoker.docproc.embed.EmbedInvoker
  text2vec_url: http://localhost:8080
  pooling_strategy: masked_mean
This would, for each and every value, call the EmbedInvoker to create an embedding for that value. The result would be a list of embeddings.
Here you can find more information on JMES Path expressions, or follow the Tutorial. The expression in the above example gives a list of all the content values of all the elements of the list of chunks in the embedded_doc. If the JMES Path expression does not render a list, a warning is created and the value is placed in a one-element list.
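The effect of such an expression can be tried out directly with the jmespath package. A small sketch with made-up chunk data:

import jmespath

# a made-up model fragment shaped like the embedded_doc used above
model_data = {
    "embedded_doc": {
        "chunks": [
            {"content": "First chunk of the document."},
            {"content": "Second chunk of the document."},
        ]
    }
}

# the same expression used in the MapTaskTemplate above
values = jmespath.search("embedded_doc.chunks[*].content", model_data)
print(values)  # ['First chunk of the document.', 'Second chunk of the document.']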
The following properties can be set on a MapTaskTemplate (a sketch using them as keyword arguments follows this list):
- template_name - the qualified name of the template to map all values against. NB: currently only singular templates are supported, no lists, dicts or other types.
- list_attribute - the JMES Path expression that will be applied to create the list of values to map
- map_index_field - (default map_index) the name of the field that will contain the index of each of the mappings
- map_value_field - (default map_value) the name of the field that will contain the value of each of the mappings
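Assuming these properties can also be passed as keyword arguments when constructing the template (an assumption based on the property names above, not a confirmed signature), a fully spelled-out version of the earlier example could look like:

from genie_flow.model.template import MapTaskTemplate

templates = dict(
    mapping_a_template=MapTaskTemplate(
        template_name="embed/chunk.jinja2",
        list_attribute="embedded_doc.chunks[*].content",
        map_index_field="map_index",  # default value
        map_value_field="map_value",  # default value
    )
)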
Your own Celery Task
Rather than specifying a reference to a template, or a list or dictionary of some form, the template can also be a Celery Task reference. That Celery task will then be called with, as its argument, a dictionary containing all properties of the data model attached to the state machine.
The return value of that task will be used like any other output of an LLM call. That means that Celery Tasks can be used as part of a chain or branch, and the same rules will apply. This gives the programmer the ability to execute arbitrary code.
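A sketch of what such a task could look like, assuming a standard Celery shared_task that receives the model properties as a dict and returns a string (the task itself and the claim_description property are hypothetical):

from celery import shared_task


@shared_task
def count_claim_words(model_properties: dict) -> str:
    """Hypothetical task: derive a value from the model instead of calling an LLM."""
    # model_properties contains all properties of the data model attached
    # to the state machine; "claim_description" is an assumed property name
    description = model_properties.get("claim_description", "")
    return f"The claim description contains {len(description.split())} words."

The returned string then flows through the chain or branch exactly as an LLM response would.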
Background Tasks
Retrieving a response from an LLM can take some time. A string of prompts, one feeding off another, may take up to minutes to complete. One does not want any client who interacts with Genie Flow to wait for a response: it blocks the flow of the client logic, which could be doing more sensible work than waiting for the result to come back.
To overcome this, Genie Flow always responds immediately, either with a result or with the promise of a result. It is up to the client to poll at its leisure to see whether a background process has concluded and a new result can be obtained.
It is our goal to move away from polling and implement a channel approach where a client can subscribe to messages about the finalisation of a background process.
Background processes are implemented using Celery, a Python framework for distributed queueing and parallel processing.
If a background process is started (typically by a user_input event or an advance event), the Genie Flow framework will inform the client that one of the possible next actions is to send a poll event. The response to that event either carries the output of the longer-running background process, if it has concluded, or, if the background process is still running, no other information than the fact that the next possible action is to send another poll event.
Via this mechanism, the client is free to conduct other work and is able to check the status of any longer-running process by sending a poll event.
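From the client's side, this boils down to a simple polling loop. A hypothetical sketch: the poll callable stands in for however the client actually sends a poll event to the API, and the next_actions and result fields are illustrative assumptions rather than the real response schema:

import time
from typing import Callable


def wait_for_result(poll: Callable[[], dict], interval: float = 1.0):
    """Keep sending poll events until the background process has concluded."""
    while True:
        response = poll()  # hypothetical: sends a poll event and returns the response
        if "poll" not in response.get("next_actions", []):
            return response.get("result")
        time.sleep(interval)  # the client could do other useful work here instead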
Running
As a consequence of running background tasks using Celery, any Genie Flow application requires two different processes to run:
- The API that any client can talk to
- At least one Celery Worker that can pick up background tasks
Besides these two processes, you need to run a message broker and a result backend, as required by Celery. Excellent documentation on how to operate a Celery-based application can be found on the Celery website.