PoC Release
- 26 Aug, 2024
Introducing ParchmentProphet
ParchmentProphet is a Proof-of-Concept framework designed to undertake complex research and write full-length reports on any subject. It facilitates a three-step process that emulates how humans author complex reports in the real world.
- RESEARCH. A new method of knowledge retrieval allows theoretically unlimited input tokens to be condensed down into the principal salient points for a given research objective.
- ANALYSIS. The condensed information is used to answer key research questions.
- REPORTING. The analysis is then used as the input context under a reporting framework to draft full reports that satisfy a given scope.
In addition, ParchmentProphet contains fine-tuning tools allowing you to optimise and improve every aspect of this process for your unique requirements. This means that, as you submit more training data, your LLM models will learn more about your writing style and reporting objectives, as well as your clients, stakeholders, and industry more broadly - just like a human analyst would.
For now, ParchmentProphet is implemented using OpenAI models, and we recommend using gpt-4o as a base. However, it is our objective to support self-hosted models. Refer to the Roadmap section of this article for further information.
Is ParchmentProphet right for you?
Initial testing suggests ParchmentProphet can be highly effective at producing well-written reports addressing complex subject matter - provided you meet the following prerequisites.
- You want to generate a repeatable product or report type with a consistent pre-defined format. Examples might include: an audit, a briefing note, a client proposal or pitch, or some sort of periodic analysis.
- You have at least 10 well-written examples of the reports you want to generate (but more is better). These are used to "seed" the model and ensure acceptable outputs from day-0. You do not necessarily need 10 examples of the same type. For example, you could have 3 market analysis reviews, 5 internal compliance reports, and 2 client proposals.
- You have the time, inclination, and expertise to onboard your report types and fine-tune the output. This is not yet an automated process, so for now you must know your subject matter well and be prepared to spend time on onboarding.
- You make money from writing these reports. ParchmentProphet makes heavy use of third-party AI services. If your reports don't provide you with a direct commercial advantage, using an expensive process to generate them may not suit your economic model.
What does this article contain?
The sections that follow provide technical information about the ParchmentProphet framework, including details on why it was created, the issues it attempts to resolve, and how you can experiment with the framework yourself.
This article is designed to facilitate discussion and gather feedback. The full code is available upon request, but for now is considered closed-source whilst we test the use cases more generally.
1. Background
Many organisations have an incentive to produce complex analysis. The professional services industry might be the most relevant example, given that they are directly compensated for the analysis they produce. But across many sectors - government, healthcare, finance, etc - there are people who are employed solely for the purpose of researching information and communicating it up the chain in condensed formats.
This is, broadly, the role of an analyst.
An analyst typically has an undergraduate degree where they've learned key research skills, and you'll see more than a few out there with a Master's degree or higher as well. They are generally smart, switched on, interesting people. And their job is, by definition, to provide intelligence. They collect information, research their topic, and communicate their assessments in a manner which is helpful for decision makers.
Or at least, that's the theory. Because in reality, it can take years for even the most qualified and eager individual to write reports that don't require extensive editing or reworking. This is often because the analyst is moving from an environment where they wrote essays and took exams through to an environment where they're being asked to provide simple answers to complex, often intimidating, questions.
Questions such as:
- Is Country A likely to develop nuclear weapons in the next 3 years?
- Should my organisation invest £XX,000,000 into Sector B?
- What is the most significant threat to the renewable energy industry?
- What was the root cause of a recent cyber attack?
No matter how well the analyst knows their subject, if they can't package the information into the format their clients or stakeholders need in order to make their decisions, then their knowledge isn't providing a lot of value.
This is not unlike the situation with LLMs. LLMs have been credited with intelligence through the way they understand topics and produce relevant outputs. But in this context, intelligence really only means "smart" or "knowledgeable", because so far they have proven unable to cater to the requirements of businesses and decision makers.
They're not particularly effective at producing compact, relevant, and informative analysis. Especially if you have, like most organisations do, guidelines on house style, key vocabulary, or specialist expertise.
And as with our analysts, that's because the LLMs haven't been through years of training yet. They aren't specialists in your industry, and they don't know your team, your stakeholders, or your objectives.
...But what if they did? What if we could put an LLM through the same training process that we would put any new analyst through?
That's what we set out to achieve with ParchmentProphet.
Objective
Develop a general-purpose framework to enable LLMs to generate full reports that present complex analysis, on any topic.
2. Problems with Retrieval Augmented Generation ('RAG')
Summary
RAG is excellent at certain tasks; however, the most common RAG implementations have critical limitations that mean they often fail to provide the information required to generate complete answers. This is particularly damaging for tasks that involve vast amounts of unstructured data, such as the researched reports we set out to generate with ParchmentProphet.
We quickly identified that the existing frameworks were not sufficient to meet our objective.
What is RAG?
Retrieval Augmented Generation ('RAG') is a generic term that can apply to a wide variety of techniques designed to load information from an external dataset into your LLM's context window.
RAG is necessary because LLMs convert text into representative "tokens" before generating a response, and all models have limits on the number of tokens they can process (referred to as their "context window"). This means there is a finite limit to the information they can review.
To combat this, RAG attempts to retrieve only the information from your input data that is most relevant to your query. This functionality enables you to, for instance, upload code repositories, knowledge libraries, or a large series of documents to an LLM chat application without saturating its context window.
This approach works very well for certain types of tasks, for example, helping a user quickly find specific pieces of information.
Common RAG Implementations
You most commonly see four main RAG implementations:
- RAG with Keyword Search
Input data is chunked and indexed. Simple keyword searches return content from matching chunks.
- RAG with Semantic Search
Input data is chunked and run through a vectorisation algorithm, which represents the chunk as a point in n-dimensional space. Once plotted, we can draw conclusions about the relevance between chunks by measuring the length of the line connecting two points (or its angle). The user's query is then plotted so that the most relevant chunks can be retrieved.
- GraphRAG
Input data is chunked, and each chunk is reviewed by an LLM designed to extract any identified entities or relationships. This type of structured data is called a "Graph", and it can be queried either through Keyword Search or through Semantic Search. Once specific entities or relationships are identified through your search, greater context can be inferred by retrieving connected information.
- Hybrid RAG
Multiple methods can be combined to help rank the relevance of retrieved data.
An example below demonstrates a high-level RAG implementation designed to retrieve information relating to internal company policies. When the information you are requesting is relatively simple, small, and non-contentious (i.e., there is a low likelihood of bias in the source data), then RAG implementations can work very well.
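For the semantic-search variant, a minimal retrieval step might look like the sketch below. This is illustrative only and not ParchmentProphet's code: embed() stands in for whichever embedding model you use, and the chunks are assumed to have been split in advance.
import numpy as np

def embed(text: str) -> np.ndarray:
    """Stand-in for whichever embedding model you use (e.g. an embeddings API call)."""
    raise NotImplementedError

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    # Relevance is judged by the angle between two points in n-dimensional space.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def retrieve(query: str, chunks: list[str], top_k: int = 10) -> list[str]:
    # Indexing: plot every chunk as a point in n-dimensional space.
    chunk_vectors = [embed(chunk) for chunk in chunks]
    # Retrieval: plot the query and return the top-k closest chunks.
    query_vector = embed(query)
    scores = [cosine_similarity(query_vector, v) for v in chunk_vectors]
    ranked = sorted(zip(scores, chunks), key=lambda pair: pair[0], reverse=True)
    return [chunk for _, chunk in ranked[:top_k]]
Only the retrieved chunks ever reach the model's context window, which is exactly where the limitations described next arise.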
Limitations
The implementations discussed above are unfortunately not suitable for the purposes of ParchmentProphet. This is because, regardless of which type of RAG we implement, all approaches have a few universal limitations:
- RAG Limitation One - Not enough information: No matter how much we optimise our RAG implementations, we can never guarantee that we have retrieved all the information required to answer our question completely. This is especially true if there is nuance, bias, or event temporality to deconstruct.
Example
We have indexed two documents, a children's book titled "Green Bananas on Neptune" and an interview transcript with its author. In the interview, the author stated that the book was misprinted, and all instances of the word "blue" were replaced with the word "green".
Our user query is: What colour are bananas on Neptune?
The RAG processes the query and returns what it considers to be the top-10 most relevant chunks from our indexed data. In this case, all the results come either from the book itself, where the colour and location of bananas are frequently described, or from the interview transcript in places where the author introduces their work. But the comment about the misprint ranks far lower in our results, likely because it does not discuss bananas or Neptune directly.
Based on the retrieved data, our LLM produces a response that looks like this:
LLM: In the children's book "Green Bananas on Neptune", bananas are green.
However, a more complete answer to this question might be:
LLM: In the children's book Green Bananas on Neptune, bananas are green. However, in an interview following its publication, the author stated that a misprint of this book replaced the word 'blue' with the word 'green'. The author likely originally intended bananas on Neptune to be blue.
The RAG implementation failed to return the information required to generate a complete answer.
- RAG Limitation Two - Too much information: Your RAG implementation may return too many chunks that it assesses are relevant, exceeding your LLM's context window. LLMs also tend towards decoherency as input tokens increase, meaning they are more prone to errors or hallucinations. Smaller context sizes are preferable, even as context windows continue to expand.
Example
You are an historian undertaking a project to document community perspectives on the standard of living. You organise and transcribe interviews from 50 participants, which you then index. Each transcript is around one hour long, resulting in a total of 5,000,000 tokens for the entire group.
For the sake of argument, you have optimised your RAG implementation and are confident that all data required to generate a complete answer will be returned.
Your query is: Which participant was likely the wealthiest?
Your RAG implementation searches the indexed data and returns the specific chunks it assesses are required to answer the query.
The returned data is 500,000 tokens, nearly double your 200,000 token limit.
You see the error, but cannot identify any information that could feasibly be removed without running into the issue highlighted with Limitation One.
The LLM is unable to generate a response.
3. ParchmentProphet: From retrieval to research
Summary
ParchmentProphet overcomes the limitations of traditional RAG by implementing a new framework named Researched-RAG ('R-RAG'). This has been designed to focus on research-heavy workloads and provides the critical data layer underpinning report generation.
Researched-RAG
ParchmentProphet implements a new RAG technique called Researched-RAG, which resolves RAG Limitation One and RAG Limitation Two through the implementation of two concepts:
- Research Questions. A Research Question is a simple question posed by the user in advance of indexing. Each question must also be assigned a category, used to group questions thematically. This process is not dissimilar to how an analyst would typically identify the questions they want to answer before undertaking their research.
Example
Research Questions can range from the simple and pragmatic, such as "What is the name of the client?", through to more complex and evaluative questions, such as "What technologies has the shipping industry implemented to reduce climate emissions?".
In our database, Research Questions are stored as a simple array of objects, illustrated below.
[
    {
        "question": "What is the client's organisation name?",
        "category": "Company Information"
    },
    {
        "question": "Who are the key commercial contacts or stakeholders?",
        "category": "Commercial"
    },
    ...
]
- Claims. A Claim is an attributable piece of information within a chunk containing a perspective relevant to our Research Questions. We consider Claims inherently biased, and their usefulness requires evaluation.
Example
An example Claim is provided below. Claims contain the information referenced, the source if known, and any supporting quotes. Each Claim also contains a chunk_id (not shown) and metadata from the source document. This means they can be later correlated against their source chunk and document context.
{
    "claim": "John Smith leases a new Mercedes-Benz through his company.",
    "source": "John Smith during a recorded interview.",
    "quotes": [
        "I actually just got a new car. A Mercedes-Benz. It's leased via my company."
    ]
}
With these two concepts in place, the Researched-RAG process runs as follows:
- Prerequisite: User defines their Research Questions and assigns them thematic categories.
- Indexing: Our analyst uploads information relevant to their report. This information is chunked and indexed.
- Claim Extraction: Each chunk is reviewed by an LLM and Claims relevant to the analyst's Research Questions are extracted.
- Researched Answers: An LLM reviews relevant Claims for each Research Question and generates an overall answer. If the volume of Claims exceeds the context window of our LLM (less likely than with RAG, due to the low token count of Claims), recursive summarisation can be used without significant knowledge loss.
Researched-RAG therefore facilitates the generation of complete answers to complex questions, without setting a limit on the number of input tokens. This works around the limitations we identified with traditional RAG implementations, and means we can now deploy a knowledge retrieval engine that is more appropriate for the purposes of report generation.
This process is represented graphically below.
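For readers who prefer code to diagrams, the same flow can be sketched roughly as follows. This is a simplified illustration rather than the actual ParchmentProphet implementation: call_llm() stands in for a chat-completion call (e.g. gpt-4o), the prompts are invented, and character counts are used as a crude proxy for tokens.
def call_llm(prompt: str) -> str:
    """Stand-in for a chat-completion call (e.g. gpt-4o)."""
    raise NotImplementedError

def extract_claims(chunk: str, questions: list[dict]) -> list[str]:
    # Claim Extraction: review a single chunk and pull out any Claims relevant
    # to the Research Questions (shown here as plain strings for brevity).
    topics = "; ".join(q["question"] for q in questions)
    response = call_llm(f"List attributable claims in this text relevant to: {topics}\n\n{chunk}")
    return [line for line in response.splitlines() if line.strip()]

def answer_question(question: dict, claims: list[str], max_chars: int = 50_000) -> str:
    # Researched Answers: answer one Research Question from its Claims.
    evidence = "\n".join(claims)
    if len(evidence) > max_chars and len(claims) > 1:
        # Recursive summarisation: condense each half of the Claims, then combine.
        half = len(claims) // 2
        condensed = [
            answer_question(question, claims[:half], max_chars),
            answer_question(question, claims[half:], max_chars),
        ]
        return answer_question(question, condensed, max_chars)
    return call_llm(f"Using only the claims below, answer: {question['question']}\n\n{evidence}")

def researched_rag(chunks: list[str], questions: list[dict]) -> dict[str, str]:
    # In practice each question would receive only the Claims relevant to it;
    # all Claims are passed here to keep the sketch short.
    claims = [claim for chunk in chunks for claim in extract_claims(chunk, questions)]
    return {q["question"]: answer_question(q, claims) for q in questions}
In ParchmentProphet each stage is handled by its own prompt (and, as described later, its own fine-tuned model), but the overall shape is the same.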
GraphRAG
As illustrated above, ParchmentProphet also has a native GraphRAG implementation as part of its indexing process. However, under this proof-of-concept, GraphRAG is currently not used to answer Research Questions. Instead, we have integrated it because:
- Future improvements to ParchmentProphet will rely more heavily on GraphRAG, specifically when it comes to answering questions about your input data that were not pre-defined as Research Questions.
- It provides a useful visual aid designed to assist an analyst with understanding their input data and recalling key information.
It is also possible that GraphRAG is integrated into Researched-RAG in the future; however, we are still testing different implementations.
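To give a sense of what Graph Extraction produces, the fragment below shows one possible shape for extracted entities and relationships, reusing the interview example from the Claims section. The field names are illustrative and are not ParchmentProphet's actual schema.
# Hypothetical shape of a Graph Extraction result for a single chunk, reusing the
# interview example from the Claims section. Field names are illustrative only.
graph_fragment = {
    "entities": [
        {"name": "John Smith", "type": "Person"},
        {"name": "Mercedes-Benz", "type": "Product"},
    ],
    "relationships": [
        {"source": "John Smith", "target": "Mercedes-Benz", "relation": "leases"},
    ],
    "chunk_id": "[links the fragment back to its source chunk, as with Claims]",
}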
4. ParchmentProphet: Report generation
Summary
Using Researched-RAG to answer our Research Questions, ParchmentProphet facilitates full-length report generation. A separate fine-tuning process is then used to optimise generations and increase quality across the whole framework, with specific support for back-training. This means you can train ParchmentProphet based solely on output documents (your previous reports), even if you have not kept the original input research.
Report Generation Schema
Just like how we define our Research Questions in advance of indexing, we must also define our report structure in advance as a Report Type. Once a Report Type has been defined, it can be used as a template to generate identically-structured reports based on new input data and subjects.
In ParchmentProphet, each Report Type contains a scope, persona, and a list of sections.
- The Report Scope informs the LLM about the top-level reporting objectives, and could include information about the relationship between the report author and the party requesting the report, or outline the types of evidence expected to be reviewed.
Example
This scope helps contextualise who "we" are (ACME) and outlines the relationship with the intended audience (potential client). It also outlines the areas the report is expected to cover.
ACME Consulting LTD ('ACME') was invited to submit a proposal outlining our approach to addressing your core cyber security concerns. This proposal encompasses a detailed analysis of your current IT environment, recommended solutions, implementation strategies, and a roadmap for enhancing your overall security posture.
- The Report Persona informs the LLM principally to assist with Claim Extraction (and Graph Extraction). By outlining the type of persona you wish the LLM to adhere to, you indirectly instruct it to focus on specific types of Claims relevant to your field.
Example
This persona helps prompt the LLM towards identifying claims relevant to the client's holistic security culture, with a special focus on the two mentioned security frameworks.
You are an experienced cyber security consultant, skilled at applying critical insight, industry expertise, and skepticism. You understand that excellent cyber security stems from effective governance and culture, and not from point-in-time configuration reviews. You have a strong working knowledge of cyber security standards such as CIS Top 18 and NIST CSF, but prefer to assess cyber security holistically rather than nitpick over minor issues that you know can be addressed by elevating service maturity and improving governance systems.
- Report Sections allow you to define a structured approach for how the LLM should generate your report. You are able to curate a specific prompt, as well as provide an example using 1-shot generation. Furthermore, sections are generated within a single LLM context window, meaning they should not repeat information that has already been generated. A generate_last key can be used to force a section to generate at the end of the process, which is very useful for sections such as an Executive Summary.
Example
An example schema of a Report Type is illustrated below.
{
    "title": "General Proposal",
    "questionnaire_id": "[link to Research Questions]",
    "report_scope": "[insert information about the reporting objectives and what it will assess]",
    "report_persona": "[describe a persona for the LLM to adopt throughout its research]",
    "sections": [
        {
            "generate_last": true,
            "title": "Requirement",
            "prompt": "Craft a short and succinct overall summary of the client's requirements.",
            "example": "[insert a high-quality example]"
        },
        {
            "generate_last": false,
            "title": "Workstreams",
            "prompt": "Identify the key high-level workstreams we will deliver.",
            "example": "[insert a high-quality example]"
        }
    ]
}
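As a rough illustration of how the generate_last flag might drive drafting order, the sketch below shows one way the sections could be processed. It reuses the hypothetical call_llm() stub from the earlier sketch, the prompt wording is invented, and it is not the actual ParchmentProphet implementation.
def generate_report(report_type: dict, research_answers: dict) -> list[dict]:
    # Sections without generate_last are drafted first; flagged sections (e.g. an
    # Executive Summary) are deferred so they can summarise everything else.
    ordinary = [s for s in report_type["sections"] if not s.get("generate_last")]
    deferred = [s for s in report_type["sections"] if s.get("generate_last")]

    drafted = []
    for section in ordinary + deferred:
        prompt = (
            f"{report_type['report_scope']}\n\n"
            f"Sections already drafted: {[d['title'] for d in drafted]}\n\n"
            f"{section['prompt']}\nExample: {section['example']}\n\n"
            f"Researched answers: {research_answers}"
        )
        drafted.append({"title": section["title"], "content": call_llm(prompt)})
    return drafted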
Info
An earlier draft of ParchmentProphet contained support for "structured" report sections. For example, a section that contains a series of findings, each of which must have identical attributes such as Description, Impact, and Likelihood. However, this was dropped from the proof-of-concept release. It will be added back in shortly. Refer to our Roadmap section for further information.
Combined with Researched-RAG, the ParchmentProphet report generation framework is illustrated below. This process allows the generation of full-length reports aligned to a consistent schema and based on potentially limitless input tokens.
Moving from generation to analysis
Although the ParchmentProphet framework facilitates the generation of reports aligned to a desired schema, using input information designed to address our Research Questions, we still face one key limitation: even with 1-shot prompting, no LLM understands our industry, house style, or the vocabulary that we use in our reports.
Having addressed the issues and limitations preventing us from achieving report generation, we must now demonstrate that we can push this further into providing analysis.
To accomplish this, ParchmentProphet supports (and encourages) fine-tuning the LLM for each discrete task it may perform. There are four key generation tasks:
- Report Generation, where answers from your Research Questions are used as input to write your requested report sections. Fine-tuning this model results in a substantial quality improvement in your final reports.
- Claim Extraction, where your input data chunks are reviewed for relevant claims. Fine-tuning this model improves its ability to identify nuance related to your industry or better assess bias.
- Graph Extraction, same as the above. As discussed earlier, this is not currently used as part of Researched-RAG, but is still a useful aid for an analyst writing a report.
- Answer Generation, the model responsible for creating answers to your Research Questions based on relevant claims. Fine-tuning this model is key to improving the quality of Researched-RAG, which reduces the risk of hallucinations and ensures your final report contains the most relevant information.
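To make the fine-tuning step concrete, the sketch below assembles a single training record for the Report Generation task in OpenAI's chat fine-tuning JSONL format. The message contents are placeholders rather than ParchmentProphet's actual prompts; the important point is that the input mirrors what the model sees at generation time and the target is the human-authored section.
import json

# One training record for the Report Generation model, in OpenAI's chat fine-tuning
# JSONL format. The bracketed contents are placeholders, not ParchmentProphet's prompts.
record = {
    "messages": [
        {"role": "system", "content": "[report persona and report scope for this Report Type]"},
        {"role": "user", "content": "[section prompt plus the answers to the relevant Research Questions]"},
        {"role": "assistant", "content": "[the human-authored section from a past report]"},
    ]
}

with open("report_generation_training.jsonl", "a", encoding="utf-8") as f:
    f.write(json.dumps(record) + "\n")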
Back-Training
Fine-Tuning an LLM requires an input and output. For the purposes of report generation, it is common to retain your final report but not store your original research. This may be because of privacy or confidentiality concerns, or because the information was not captured (such as an in-person meeting where hard-copy notes were taken).
However, with the ParchmentProphet framework, it is possible to create an initial version of a fine-tuned model even if you lack the original research. This is because the Research Questions you define document the information required to write your report, so it is no coincidence that your final report contains answers to those questions in some form. This means we can take a final report, pass it through Researched-RAG as input data, and create a generated version that we can then fine-tune against your human-authored content.
This process works for all four models identified above, and is an excellent way of "seeding" a version of ParchmentProphet that will work for your organisation.
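In outline, back-training might look like the loop below: each archived report is fed through Researched-RAG as though it were input research, and the generated output is paired with the human-authored original as training data. chunk_text() and the report structure shown are hypothetical; researched_rag() refers to the earlier sketch in section 3.
def chunk_text(text: str, size: int = 4_000) -> list[str]:
    # Naive fixed-size chunker, purely for illustration.
    return [text[i:i + size] for i in range(0, len(text), size)]

def back_train(archived_reports: list[dict], questions: list[dict]) -> list[dict]:
    # Each archived report already contains, in some form, the answers to the
    # Research Questions that were needed to write it.
    training_pairs = []
    for report in archived_reports:
        chunks = chunk_text(report["full_text"])
        answers = researched_rag(chunks, questions)   # from the sketch in section 3
        for section in report["sections"]:
            training_pairs.append({
                "input": {"section_title": section["title"], "answers": answers},
                "target": section["human_text"],       # the human-authored content
            })
    return training_pairs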
To illustrate the power of fine-tuning, the sections below contain examples from three different model versions.
Example Generations
The examples below demonstrate different generations of the same Report Type, using the same Research Questions and input data. The Report Type in these examples contained a simple instruction to draft a "Client Requirements" slide for a proposal deck.
- The first example is a generation from a completely untrained model. It provides an overly verbose output, which almost entirely parrots back words and phrases contained within the input data. It does not meet the reporting objectives, as the output text is too large to fit on a single slide within our standard proposal deck.
- The second example has been trained on Report Types that are irrelevant to the subject matter. The output demonstrates greater adherence to our house style and has grouped the core themes into succinct sections. This is a significant improvement, but does not align the requirements against services that we actually offer.
- The third example has been trained on a mix of Report Types, including three that were directly relevant. The key change here is that the model has now inferred the services we are capable of offering and mapped client requirements through to key services we could provide. This meets our reporting objectives and draws key conclusions that were not evident within the input data.
Each example has been further annotated to demonstrate key areas of strength or weakness.
All training occurred with 10 samples.
Untrained
LLM: ACME is transitioning from a startup phase to a more mature state, necessitating the establishment of their own cybersecurity function and infrastructure independent of their parent organisation. Their core requirements include:
- Secure migration of IT systems and data to Azure and Google Cloud Platform (GCP)
- Development of a comprehensive cybersecurity function, including interim CISO services
- Compliance with various regulatory requirements
- Implementation of advanced security measures for their machine learning-driven underwriting processes
To address these needs, we propose a multifaceted approach that includes:
- Establishing secure cloud environments and managing data migration
- Providing interim CISO services and building a robust cybersecurity team
- Ensuring regulatory compliance through policy development and ongoing monitoring
- Integrating advanced security practices into their machine learning and software development processes
- Setting up a Project Management Office to oversee and drive these initiatives
Our tailored strategy aims to enhance Ki Insurance's security posture, ensure regulatory compliance, and support their strategic objective of becoming a fully independent and secure entity.
Trained on irrelevant Report Types
LLM: ACME is an algorithmic insurance underwriter transitioning out of a startup phase. Your back office functions are currently provided through your parent organisation, and you are now looking to complete a secure migration of IT systems. This will also necessitate establishing your own cyber security programme.
Your core requirements include accelerating this migration, building a robust cyber security framework, and ensuring compliance with FCA regulations.
To assist with your migration, we can offer the following services:
- Secure Migration: Facilitating the secure transition of ACME's IT systems and data to new environments in Azure and Google Cloud Platform (GCP).
- Cyber Security Function Development: Establishing a robust cyber security framework, including interim CISO services and the preparation of an RFP for long-term security solutions.
- Regulatory Compliance: Ensuring adherence to the FCA's cybersecurity guidelines and other regulatory requirements.
Trained with a mix of relevant and irrelevant Report Types
LLM: As ACME separates from their parent organisation, you wish to ensure the maintenance of top-level accountability for your cyber security programme and the continued provision of operational security services. You may also require assistance building out a new cyber security programme in the intermediate term (6+ months).
We have outlined below two potential service areas we assess would offer the most value and help ensure a smooth transition from your parent organisation. These services include:
- vCISO and Security Team Augmentation. We can provide a dedicated vCISO to ensure top-level accountability and ownership of your cyber security programme, and supplement your operational capabilities by resourcing security engineers, offensive security expertise, and threat intelligence services.
- Target Operating Model Implementation. We can provide the toolkits, frameworks, and resources required to rapidly create a new security programme, with a specific focus around provisioning secure cloud environments.
5. ParchmentProphet: Roadmap
The current ParchmentProphet codebase is split between a backend Python library and a front-end Streamlit application. This provides a working proof-of-concept application, as shown below.
There are two high-priority next steps for the core ParchmentProphet project.
- Support multi-tenancy and allow open user registration to the existing Streamlit-based PoC. This is still a step before the MVP, but would allow more users to experiment openly with the platform.
- Re-architect the framework into an API-first platform, and incorporate a new front-end. This would be what we consider the MVP.
Other priorities
- Self-Hosted LLMs. Most use cases for reporting involve sending sensitive or confidential data to LLMs. Self-hosted models must be supported for this to be viable for most organisations.
- Fine-Tuning Recommendation Algorithm. For fine-tuning Claim Extraction, Entity Extraction, and Answer Generation, the Streamlit application currently only supports training on the first two generations from each input document on each reporting project. This is due to UI restrictions. To solve this, we want to implement an algorithm designed to recommend the top-k best candidates to submit for fine-tuning. This would ensure that fine-tuning data covers more complex situations, different report and document types, and is more representative of your use case.
- Claim Optimisation. Claims are the integral part of ParchmentProphet's Researched-RAG implementation; however, they are currently quite inefficient, overly verbose, and result in a lot of unnecessary duplication. This needs to be optimised.
- Support for Structured Reports. In early versions of ParchmentProphet, there was full support for report sections that contain structured data. This could include a table, or a series of tables (as is often the case for "Report Findings", where each table must have Description, Impact, and Recommendation keys). This needs to be re-incorporated, as many reports require this functionality.
6. How you can help
If you are intrigued by ParchmentProphet and want to know how you can help, take a look through the points below.
- Email me at james@parchmentprophet.com and ask for a private instance of ParchmentProphet. I am hosting free instances of the solution for experimentation and testing for those with interesting use cases. You'll need to bring your own OpenAI API key, for now!
- If you want to subscribe to updates about ParchmentProphet, join our mailing list here.
- Invite me to come and talk about ParchmentProphet or AI to your company, group, or meetup.