GPT-o1, some thoughts

In legal tech we're constantly on the lookout for tools that can streamline our workflows, enhance our analytical capabilities, and ultimately, improve our service to clients.

The recent release of GPT-o1 has certainly not gone under the radar in legal, and every vendor and tech team is looking at it to see whether it can bring immediate benefits.

"OpenAI o1 ranks in the 89th percentile on competitive programming questions (Codeforces), places among the top 500 students in the US in a qualifier for the USA Math Olympiad (AIME), and exceeds human PhD-level accuracy on a benchmark of physics, biology, and chemistry problems (GPQA)."

It's a big jump capability-wise, but these enhancements come with their own set of challenges and questions, particularly around transparency, cost, and real-world applicability - especially for those of us in legal.

The Hidden Chain of Thought

One of the most significant shifts in GPT-o1 is the obscuring of its detailed chain of thought (CoT) process. Over the past two years, CoT has been surfaced to users in many of our AI applications, offering a window into how the model arrives at its conclusions. This transparency has been key in fields that require rigorous scrutiny and justification, such as legal research and analysis.

OpenAI's decision to keep the detailed CoT hidden in GPT-o1 is a brand-new approach for an LLM.

"o1 thinks before it answers—it can produce a long internal chain of thought before responding to the user."

While they argue this move protects the integrity of the model's thought process and maintains their competitive edge, it raises some serious questions for professionals who rely on this transparency to validate and trust the model's outputs.

The implications of this change are far-reaching. On one hand, by integrating chain-of-thought reasoning directly into the model, GPT-o1 demonstrates better safety alignment with human values. This embedded approach to safety rules shows a better understanding of ethical boundaries, which should, in theory, reduce the likelihood of generating harmful or inappropriate content.

For general public use, this seems like an obvious step in the right direction. But for specialised fields like law, where the ability to scrutinise every step of the reasoning process is crucial, this obfuscation could be a significant setback.

The Cost Factor

Adding another layer of complexity is cost - and as much as I'd like to ignore the concept of ROI when playing around with tools, GPT-o1 comes with a substantially higher price tag than its predecessor, GPT-4o.

GPT-o1 Preview:

  • Input Tokens: £11.85 per 1M tokens
  • Output Tokens: £47.40 per 1M tokens

GPT-4o:

  • Input Tokens: £3.95 per 1M tokens
  • Output Tokens: £11.85 per 1M tokens

This steep increase raises questions about return on investment, especially when weighed against the reduced transparency. For firms and professionals processing large volumes of data, it could significantly impact budgets without necessarily providing proportional benefits.
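To make the pricing gap concrete, here's a back-of-the-envelope sketch using the per-million-token prices listed above. The workload numbers (500 documents at roughly 20k input / 2k output tokens each) are invented purely for illustration:

```python
# Hypothetical cost comparison using the per-1M-token GBP prices quoted above.
PRICES = {
    "gpt-o1-preview": {"input": 11.85, "output": 47.40},
    "gpt-4o": {"input": 3.95, "output": 11.85},
}

def job_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Cost in GBP for a job, given total input and output token counts."""
    p = PRICES[model]
    return (input_tokens * p["input"] + output_tokens * p["output"]) / 1_000_000

# Illustrative workload: 500 documents, ~20k input / ~2k output tokens each
inp, out = 500 * 20_000, 500 * 2_000
for model in PRICES:
    print(f"{model}: £{job_cost(model, inp, out):,.2f}")
```

On those assumed numbers the o1-preview run costs roughly three times the GPT-4o run, with the pricier output tokens doing most of the damage - worth remembering for tasks that generate long drafts rather than short answers.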

That said, we've seen costs plummet over the past year or so, so it's probably just a matter of time before the same happens with o1 - and in the grand scheme of things, compared to humans handling the same tasks, it's still remarkably cheap.

Improved Multilingual Capabilities

Cost issues aside, GPT-o1 demonstrates significantly improved multilingual capabilities over GPT-4o.

"There are huge improvements in languages such as Arabic, Bengali, and Chinese. These enhancements make GPT-o1 better suited for tasks that require a much more robust understanding and processing of various languages, which moves the current AI wrappers away from being aimed only at English language speakers."

Here's where things get interesting for firms with a global footprint. GPT-o1's enhanced multilingual capabilities are, well, very impressive: we're looking at substantial improvements across the board, from Arabic to Mandarin.

This shift away from a primarily English-centric approach is a welcome change, opening up new possibilities for global communication and understanding. For international law practices, this could be a game-changer, potentially streamlining cross-border legal work and improving the efficiency of dealing with multilingual legal documents.

We can't get ahead of ourselves, though. The ability to understand and generate text in multiple languages is impressive, but legal language is complex and often woolly. The nuances of legal terminology and the potential for misinterpretation mean we'll need to approach this capability with cautious optimism - it's a really promising feature, but one that will require rigorous testing in real-world legal scenarios before we can fully rely on it.

Results from the MMLU test set:

Language                   o1-preview   GPT-4o   o1-mini   GPT-4o-mini
Arabic                     0.8821       0.8155   0.7945    0.7089
Bengali                    0.8622       0.8007   0.7725    0.6577
Chinese (Simplified)       0.8800       0.8335   0.8180    0.7305
English (not translated)   0.9080       0.8870   0.8520    0.8200
French                     0.8861       0.8437   0.8212    0.7659
German                     0.8573       0.8292   0.8122    0.7431
Hindi                      0.8782       0.8061   0.7887    0.6916
Indonesian                 0.8821       0.8344   0.8174    0.7452
Italian                    0.8872       0.8435   0.8222    0.7640
Japanese                   0.8788       0.8287   0.8129    0.7255
Korean                     0.8815       0.8262   0.8020    0.7203
Portuguese (Brazil)        0.8859       0.8427   0.8243    0.7677
Spanish                    0.8893       0.8493   0.8303    0.7737
Swahili                    0.8479       0.7708   0.7015    0.6191
Yoruba                     0.7373       0.6195   0.5807    0.4583
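One way to read the table is as the per-language gap between o1-preview and GPT-4o. A quick sketch (using a handful of scores copied verbatim from the table, not recomputed) makes the pattern visible: the biggest gains land in lower-resource languages.

```python
# Per-language improvement of o1-preview over GPT-4o, using a subset of the
# MMLU scores quoted in the table above (values copied, not recomputed).
mmlu = {
    "Arabic":  (0.8821, 0.8155),
    "Bengali": (0.8622, 0.8007),
    "English": (0.9080, 0.8870),
    "Swahili": (0.8479, 0.7708),
    "Yoruba":  (0.7373, 0.6195),
}

# Difference between the two models, rounded to 4 decimal places
deltas = {lang: round(o1 - g4o, 4) for lang, (o1, g4o) in mmlu.items()}

# Largest gains first: Yoruba and Swahili improve far more than English
for lang, gain in sorted(deltas.items(), key=lambda kv: -kv[1]):
    print(f"{lang:8s} +{gain:.4f}")
```

Yoruba gains nearly 12 points while English gains barely 2 - which is exactly why this release matters more for global practices than the headline English benchmarks suggest.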

Enhanced Safety and Alignment

The integration of enhanced safety features and improved alignment with human values in GPT-o1 is a positive development, and for the general public consuming this through ChatGPT it makes perfect sense.

"By embedding safety rules within its reasoning process, GPT-o1 shows a better understanding of safety boundaries which should reduce the likelihood of generating harmful or inappropriate content, which for the general public use seems like a great idea."

GPT-o1's improved safety features and alignment with human values sound great on paper. The idea is that it's less likely to generate harmful or inappropriate content.

However, this raises some intriguing questions for legal applications. How do these safety measures interact with the need to explore all aspects of a case, including potentially unsavory ones?

Could these embedded ethical guardrails inadvertently limit the AI's ability to assist in cases that deal with complex, morally ambiguous situations - which, let's face it, is a significant portion of legal work? It's a topic I've covered before when talking about uncensored models: without insight into the thought process, it's hard to see how it could be used with confidence.

While the improvements in safety and multilingual capabilities are undoubtedly impressive and could pave the way for more responsible and inclusive AI applications, the reduced transparency and increased costs pose significant challenges, particularly for industries like law where the ability to trust and verify outputs is paramount.

The increased cost without corresponding transparency may not justify the switch for legal applications. We'll need to weigh the benefits of things like enhanced safety and multilingual support against the drawbacks of reduced transparency and higher expenses. It's not just about having the most advanced model - it's ultimately about having the right model for the task at hand.

So...

So, where does this leave us? GPT-o1 is undoubtedly impressive, pushing the boundaries of what AI can do... but in the legal tech world, the most advanced tool isn't always the right one.

  • The obscured reasoning process is going to be a significant hurdle for a profession built on precedent and explanation.
  • The cost increase is substantial and needs to be carefully weighed against tangible benefits.
  • The multilingual capabilities are exciting but need to be thoroughly vetted for legal accuracy.

As legal tech professionals, our role isn't just to adopt the latest technology – it's to critically assess how these tools fit into the world of legal practice. GPT-o1 offers some intriguing possibilities, but it also presents challenges that we need to grapple with.

I think ultimately what we really need is a model that combines the advanced capabilities of GPT-o1 with the transparency and cost-effectiveness required in legal applications - which I'm sure the big AI providers are looking at.

In the meantime, I'd say approach GPT-o1 with both curiosity and caution. Test it, push its limits, but always with an eye on how it aligns with the fundamental principles of legal practice. After all, in legal tech it's not just about keeping up with the latest innovations - it's about shaping them to serve the needs of our teams and clients.

Notes

Quotes and MMLU data results from: https://openai.com/index/learning-to-reason-with-llms/