AI Legal Tech & Role Modeling Legal Compliance
- Martha Neustadt
- 2 days ago
- 8 min read
Does your solution meet the legal bar?
The Transparency Imperative: Legal Tech Must Lead the Way.
The world is jumping on the AI bandwagon. Every day, new solutions are being brought to market utilizing AI in new ways. Some of these come from established bulwarks of the industry, adding AI in any way possible (sometimes in name only), while many, many more are startups seeking to ‘disrupt’ the way business has always been done.
Lawyers are not the stereotypical early tech adopters – that field seems far more populated by those who are trying to replace lawyers. They are, however, enthusiastic adopters of the law. So, while legal practice is definitely ripe for disruption by technology, lawyers are simply not going to adopt technology that cannot readily demonstrate compliance with legal obligations.
This is particularly true for in-house legal practice. The in-house practice differs from private practice not just in the business model, where in-house lawyers prize efficiency and repeatable processes, but also in the impact. The in-house legal team is more than a legal advisor; its members are also business partners and role models. How effective can they be in that role if they do not hold themselves to the same standards to which they hold their clients when it comes to AI adoption?
Meeting a Changing Legal Standard
But what is the legal standard required for AI? Is there one?
Of course not – particularly for any company that operates in more than one global jurisdiction. Countries around the world are drafting, adopting, and enforcing a variety of laws and regulations around the responsible development and adoption of AI. It’s enough to make one want to toss their hands in the air and say “why bother?”. Among these myriad rules, though, there are definite themes.
Focusing on themes and common values rather than minutiae is a long-established strategy for developing legal compliance programs designed for growth. This is no different for AI than it has been for privacy, cybersecurity, or other areas of digital and technology law where technology advancement often outpaces regulatory efforts. Like privacy and cybersecurity, there are established and accepted common themes around responsible AI. The most widely used reference is the OECD’s AI Principles.
Developing AI legal compliance programs is a topic for another space (see the blog for more info if you are interested). When evaluating technology solutions – especially legal tech solutions – we can focus on one value with particularity: TRANSPARENCY.
Why is Transparency Key?
Transparency is the cornerstone of legal compliance within a supply chain. Transparency is what allows an organization to adopt third party AI systems in a way that allows the organization to meet its own policy and values commitments. Transparency should also provide insight into security practices, AI fairness and bias mitigation, data protection and processing protocols, accountability, and other areas of legal compliance. And not just into the vendor or its customer individually, but into how those two parties share responsibility for compliance within modern cloud-based information architectures. Digital infrastructure is no longer simple enough to say “Vendor does X. Customer does Y.”, and transparency is key to understanding all of the parties and their responsibilities within a digital supply chain.
Transparency is also critical for evidence-based compliance programs. These are particularly important for in-house legal teams because they are expected to role model the advice they provide to their one and only client. Unlike law firms, which serve many clients with many different compliance programs, in-house counsel serves only one client (their employer) and are subject to the policies and procedures of that client – often policies and procedures that the legal team helped write! Violating a policy that you helped write is neither good optics nor a good idea; it sets up a culture that focuses on exceptions and loopholes rather than a cohesive commitment to the client’s business goals and values. This is a very hard environment in which to provide acceptable legal advice.
How Transparent is Transparency?
“AI is a black box – no one really knows how it works.” – An AI salesperson, when asked about transparency.
Have you ever received this response when asking someone to explain how their AI system works? This is what I call the “you’re not smart enough to understand this stuff” answer, and it is absolute rubbish. But to call out the BS when you see it, it’s important to understand the basics of how large language models (LLMs) actually work. So, let’s break it down with a goal to break through the gate-keeping technical jargon. How do LLMs actually work?
Tokenization.
This is where the LLM breaks down “language” (i.e., words in the context of communication) into “tokens.” A token can be a whole word, part of a word, or even a single character; in other kinds of models, tokens can also represent data points or pixels. This is how language is turned into 1s and 0s, which is the only way computers know how to process data.
Transparency with respect to tokenization methodology is complex and often very technical. The vast majority of AI systems on the market today, however, are not LLMs themselves but systems that use LLMs inside software, services, and platforms. Simply knowing which LLM is used can tell you a great deal about the tokenization within that model. For example, the V4 Final beta product is designed for use with OpenAI’s GPT-4o and GPT-4o mini, and OpenAI provides transparency tools such as its Tokenizer and the tiktoken package for Python. For customers who want to use other LLMs, we can help you find the relevant information on tokenization and understand how you can use those LLMs instead of our default.
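If you want to see tokenization for yourself rather than take anyone’s word for it, here is a minimal sketch using OpenAI’s tiktoken package; the sample sentence is ours, and the exact token splits you see may vary with the tiktoken version installed.

```python
# Minimal sketch: inspecting tokenization with OpenAI's tiktoken package.
# Assumes `pip install tiktoken` (a recent version that knows the GPT-4o models).
import tiktoken

# Look up the encoding used by GPT-4o / GPT-4o mini.
encoding = tiktoken.encoding_for_model("gpt-4o")

sentence = "The bank was closed on Monday."
token_ids = encoding.encode(sentence)                 # text -> integer token IDs
tokens = [encoding.decode([t]) for t in token_ids]    # IDs -> readable pieces

print(token_ids)   # a short list of integers: the "1s and 0s" side of language
print(tokens)      # the human-readable pieces those integers stand for
```

Nothing mystical is happening: a sentence goes in, a list of numbered pieces comes out, and that mapping is published and inspectable.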
Embedding.
Key to the process of retaining context is that the 1s and 0s stay in context (via the creation of “vectors,” for those of us who had to endure vector calculus), so the computer understands how they all relate to each other. “Embedding” is the process by which tokens become vectors.
Like tokenization, information about the embedding process of an AI solution is more likely to be provided directly by the LLM provider, unless the AI solution is using a specialized model. Again as an example, V4 Final is not currently using a specialized model, so the standard text-embedding-ada-002 embeddings are used. OpenAI tests this model to determine how well it extracts context in real-world scenarios and posts the results on a public GitHub page. For solutions running on different LLMs, or that build a specialized model rather than relying on the standard embeddings, this information will be different and is likely to show different scores.[1] Transparency simply means you know what is being used and how it performs.
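For readers who want to see what a vector actually looks like, here is a hedged sketch using the OpenAI Python client and the text-embedding-ada-002 model named above; the two sample sentences and the API key handling are illustrative assumptions, not part of any particular product.

```python
# Minimal sketch: turning text into vectors with text-embedding-ada-002.
# Assumes `pip install openai` and an OPENAI_API_KEY set in the environment.
import math
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

sentences = [
    "The bank was closed on Monday.",
    "The boat moved away from the bank.",
]
response = client.embeddings.create(model="text-embedding-ada-002", input=sentences)
vec_a, vec_b = (item.embedding for item in response.data)

# Cosine similarity: how close the two sentences sit in vector space.
dot = sum(a * b for a, b in zip(vec_a, vec_b))
norm = math.sqrt(sum(a * a for a in vec_a)) * math.sqrt(sum(b * b for b in vec_b))
print(f"vector length: {len(vec_a)}, cosine similarity: {dot / norm:.3f}")
```

Each sentence becomes a long list of numbers, and the geometry of those lists is what lets the model judge how related two pieces of text are.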
Attention.
This is where the LLM decides what to focus on within those vectors. This is the process that teaches the LLM to understand that “bank” has different meanings in the sentences “the bank was closed on Monday” and “the boat moved away from the bank.” Using “self-attention,” it learns which words relate to each other across the entire sequence and determines what information is relevant in different circumstances.
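To take some of the mystery out of the word “attention,” here is a toy sketch of scaled dot-product self-attention, the arithmetic at the core of transformer attention; the tiny made-up matrix stands in for real token embeddings and is purely illustrative.

```python
# Toy sketch of scaled dot-product self-attention (the core of transformer attention).
# The 4x3 matrix of made-up numbers stands in for embeddings of a 4-token sentence.
import numpy as np

X = np.array([          # one row per token, illustrative values only
    [1.0, 0.0, 1.0],    # "the"
    [0.0, 2.0, 0.0],    # "boat"
    [1.0, 1.0, 0.0],    # "left"
    [0.0, 1.0, 1.0],    # "bank"
])

# In a real model, queries, keys, and values come from learned weight matrices;
# here we use X directly so the arithmetic stays visible.
Q, K, V = X, X, X

scores = Q @ K.T / np.sqrt(X.shape[1])                                 # how much each token "looks at" each other token
weights = np.exp(scores) / np.exp(scores).sum(axis=1, keepdims=True)   # softmax per row
output = weights @ V                                                   # each token becomes a weighted mix of all tokens

print(np.round(weights, 2))  # row i shows where token i directs its attention
```

The real thing runs this pattern at enormous scale with learned weights, but the mechanism is a weighted lookup, not magic.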
How tokens and vectors are weighted as part of attention is, again, generally determined at the LLM level, and because most commercial AI tools are built on a few underlying LLMs, transparency is primarily a matter of disclosures about the underlying resources in the digital supply chain that creates the product. In the case of V4 Final, we are not currently building specialized LLMs and rely on the transformer attention used by OpenAI in GPT-4o and GPT-4o mini. While there is a lot of information on what transformer attention is and how it is implemented, most of it is highly technical and may not be relevant to legal compliance obligations (unless, of course, some court in the future holds that it is inadequate for a specific legal use case – and that hasn’t happened…yet).
This is also where people question whether specialized LLMs – built on specific attention models designed to focus on legal matters – are necessary for AI-dependent legal tech. As with all things legal, the answer is likely “it depends.” At V4 Final, we focus on helping in-house lawyers gather the facts and business information needed to make risk-based advisory decisions aligned with business goals and values. We know that most legal advice provided by in-house counsel needs to be good, not perfect. Standard attention models, combined with meticulous, detailed prompt engineering built on decades of experience, yield great results without requiring deep fluency in the subtle nuances of academic legal theory. This is why we don’t recommend using our solution to draft a legal brief for bet-the-company litigation – best to hire great outside counsel for that.
Parameters.
Parameters are the settings within an LLM that define the model’s behavior. There are many kinds of parameters, with “weights” and “biases” being the best known. Weights control how strongly different tokens are linked, and biases are parameters that “nudge” output in a known direction (for better or for worse). Parameters are set in the LLM’s transformer model and may be “fine-tuned” with adaptive layers. Clear information and transparency around the transformer model and any adaptive layers are key to understanding things like accuracy and bias in a system.
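Here is a deliberately tiny, hedged sketch of what “weights” and “biases” mean in practice; every number is made up, but the arithmetic is exactly the operation each layer of an LLM performs billions of times over.

```python
# Toy sketch: weights scale inputs, biases nudge the output in a fixed direction.
# All numbers here are made up for illustration; real LLMs have billions of them.
import numpy as np

inputs = np.array([0.5, -1.0, 2.0])      # stand-in for one token's vector
weights = np.array([0.8, 0.1, -0.3])     # how strongly each input dimension matters
bias = 0.25                              # a constant "nudge" applied regardless of input

output = np.dot(weights, inputs) + bias  # the core operation inside every layer
print(output)                            # 0.8*0.5 + 0.1*(-1.0) + (-0.3)*2.0 + 0.25 = -0.05
```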
Good prompt engineering can also help steer the behavior those weight and bias parameters produce, even though it does not change the parameters themselves. As noted above, the V4 Final beta solution is built on the standard GPT-4o and GPT-4o mini transformers. We can also utilize another LLM that a customer already has. We do not have additional adaptive layers at this time, opting instead to use detailed proprietary prompts, each dedicated to a single task, across multiple LLM calls to create a robust product. We are always considering feedback and ways to improve the product, which may include adaptive layers in the future.
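We are not going to publish our proprietary prompts here, but as a hedged illustration of the “one call, one narrow task” pattern, here is a sketch of a single-task prompt sent to a chat model; the prompt wording, model name, and function are assumptions for illustration, not our actual implementation.

```python
# Illustrative sketch of a single-task prompt: one call, one narrow job.
# The prompt wording and model choice are assumptions, not V4 Final's actual prompts.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def extract_renewal_date(contract_text: str) -> str:
    """Ask the model to do exactly one thing and nothing else."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system",
             "content": "You extract contract renewal dates. Reply with the date only, "
                        "in YYYY-MM-DD format, or 'NOT FOUND' if none is stated."},
            {"role": "user", "content": contract_text},
        ],
        temperature=0,  # keep the narrow task as predictable as possible
    )
    return response.choices[0].message.content.strip()
```

Keeping each prompt scoped to one job makes the output easier to check, which is exactly the kind of evidence an evidence-based compliance program needs.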
A final note on parameters is that there will always be a risk that harmful biases are not overcome or LLM-generated answers are incorrect. This is one reason why another common and important theme in global AI regulation is the need for qualified human oversight and feedback mechanisms to address harmful bias – but that’s a topic for another blog!
Is the Black Box Finally Dead?
No. Of course not. There is still the challenge that LLM output is simply not sufficiently understood at the foundational neural network layers to guarantee that an LLM will produce the same answer every time it is asked the same question. That is the true nature of the “black box problem”: LLM output is not as reliably consistent as traditional software output. But that does not mean that LLMs (and therefore AI) cannot be transparently explained at all.
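If you want to see that consistency gap for yourself, here is a hedged sketch that asks the same question twice and compares the answers; the question, model name, and the temperature and seed settings shown are illustrative, and even those settings reduce variability rather than eliminate it.

```python
# Hedged sketch: the same prompt can yield different answers on different runs.
# Temperature and seed reduce variability but do not guarantee identical output.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def ask(question: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": question}],
        temperature=0,   # lowest-randomness setting
        seed=42,         # best-effort reproducibility, not a hard guarantee
    )
    return response.choices[0].message.content

question = "In one sentence, what does a limitation of liability clause do?"
first, second = ask(question), ask(question)
print("Identical answers:", first == second)
```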
The V4 Final Difference
V4 Final is built on a foundation of the needs of in-house counsel, including the need to meet the corporate policies and requirements of their employer and sole client. We know the pressure to both adopt AI and be responsible about it, and we know the hoops you jump through and the requirements you advise your internal clients on when they seek to purchase AI solutions. We want to help you role model that behavior through our transparency – because the business value of the legal team lies in leadership as well as advice.
[1] Want to really nerd out? Check out the MTEB Leaderboard (a Hugging Face Space maintained by mteb) to see how embedding models compare.