Highlights from OpenAI, Microsoft, and Google's Latest Innovations

OpenAI, Microsoft, and Google have recently made significant announcements, each unveiling advancements in AI technologies. These updates highlight the rapid development in AI capabilities, offering exciting new tools and functionalities for various applications.

I wrote about each of these separately on LinkedIn, but here's a high-level summary of all three events.

OpenAI's GPT-4 Omni Announcements

OpenAI recently introduced several exciting updates and new capabilities at their GPT-4 Omni event.

  • GPT-4 Omni: This new flagship model offers GPT-4 level intelligence with faster performance and reduced costs. It improves capabilities across text, voice, and vision, making it highly versatile. GPT-4 Omni can understand and discuss images better than previous models, grasping the context of a person's surroundings and holding a conversation grounded in them. Future updates are due to support real-time video conversations, which sounds very interesting.
  • Language Capabilities: Enhanced across more than 50 languages, GPT-4 Omni improves efficiency and lowers costs, especially in languages that previously required high token counts.
  • Voice and Vision: The model integrates advanced voice and vision capabilities, allowing for natural, real-time voice interactions and detailed image analysis.

New Features and Tools:

  • Desktop App: A new ChatGPT desktop app for macOS has been launched, with a Windows version coming soon - a release order that reflects where their users are. The app integrates seamlessly with other tasks on your computer, allowing you to ask instant questions and discuss screenshots directly in the app.
  • Advanced Tools for Free Users: GPT-4 Omni brings advanced tools to ChatGPT Free users, including data analysis, chart creation, photo discussions, and file uploads for summarisation, writing, or analysis. Free users now have access to GPT-4 level intelligence with some usage limits. Is this the big play to get more data now that existing available datasets are starting to run dry?

I think this shows OpenAI's effort to make their AI tools more accessible and useful by improving performance, efficiency, and versatility across applications - moving from being just an API service that companies license to being a whole stack.

Microsoft's AI Enhancements at Build

At its recent Build event, Microsoft announced new Surface devices and advanced on-device AI capabilities, branded as Copilot+. This approach focuses on handling tasks locally to enhance performance and reduce cloud reliance.

Key Highlights:

  • OS-level AI Integration: This allows applications to leverage appropriate APIs for complex tasks without embedding AI directly. For example, a user could say, "Find all my messages to Simon mentioning burritos and then find songs to clip together to make those messages," with OS-level AI managing the interactions - no need for each application to bake in AI. This simplifies app development, ensures consistent AI performance, and enhances user experience.
  • Potential for Legal Tech: This centralised AI model could be particularly useful for legal tech solutions, enabling vendors to provide seamless integration and interoperability across applications without embedding AI within each app.
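To make the OS-level integration idea concrete, here is a minimal, purely hypothetical sketch of intent routing. Every name in it (`IntentRouter`, `messages.search`, `media.clip`, and so on) is invented for illustration - this is not Microsoft's actual API, just one way a central AI layer could dispatch a decomposed user request to capabilities that apps have registered, instead of each app embedding its own model.

```python
# Hypothetical sketch of OS-level AI intent routing. The OS keeps a
# registry of app capabilities, and a central AI layer turns a natural
# language request into a plan of calls against those capabilities.
# All names here are invented for illustration, not Microsoft's API.

class IntentRouter:
    def __init__(self):
        self._handlers = {}  # capability name -> callable

    def register(self, capability, handler):
        """Apps register capabilities instead of embedding their own AI."""
        self._handlers[capability] = handler

    def dispatch(self, plan):
        """Execute a plan of (capability, kwargs) steps produced by the
        OS-level model, returning one result per step."""
        return [self._handlers[cap](**kwargs) for cap, kwargs in plan]


# Two apps register simple capabilities with the OS.
router = IntentRouter()
router.register("messages.search",
                lambda contact, keyword: [f"msg to {contact} about {keyword}"])
router.register("media.clip",
                lambda clips: f"montage of {len(clips)} clip(s)")

# The OS-level AI would turn "find my messages to Simon mentioning
# burritos and clip them together" into a plan like this:
messages = router.dispatch(
    [("messages.search", {"contact": "Simon", "keyword": "burritos"})])[0]
montage = router.dispatch([("media.clip", {"clips": messages})])[0]
print(montage)  # montage of 1 clip(s)
```

The point of the pattern is that the apps stay AI-free: they only expose plain callables, and the single OS-level model does the language understanding and planning, which is what would make consistent cross-app experiences (and vendor interoperability) possible.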

Phi-3 Family Enhancements:

  • Phi-3 Vision: A 4.2 billion parameter multimodal model with both language and vision capabilities.
  • Phi-3 Mini: A 3.8 billion parameter language model available in 128K and 4K context lengths.
  • Phi-3 Small: A 7 billion parameter language model available in 128K and 8K context lengths.
  • Phi-3 Medium: A 14 billion parameter language model available in 128K and 4K context lengths.

The vision model excels at extracting content from various document types, including handwritten ones, and at turning graphics from reports into textual insights. The mini, small, and medium models bring edge AI to the masses, with the mini model achieving GPT-3.5 Turbo levels of performance.

Google's Announcements at I/O

Google made a slew of announcements regarding their Gemini and Gemma model families at Google I/O.

Gemini Model Family:

  • Gemini 1.5 Pro: Context window expanded to 2 million tokens, with improvements in translation, coding, and reasoning.
  • Gemini 1.5 Flash: Optimized for high-frequency tasks, offering a 1 million token context at a lower cost.
  • Gemini Gems: Customisable versions of Gemini, Google's answer to OpenAI's custom GPTs.
  • Gemini Live: Voice-based two-way conversations, leading into Project Astra.
  • Gemini Suite: Comprises Ultra, Pro, Flash, and Nano models.

Gemma Model Family:

  • Gemma 2: Now up to 27B, offering near-Llama-3-70B performance.
  • PaliGemma: Vision-language model inspired by PaLI-3.

Two particularly intriguing announcements were Gemini Live, demonstrated with a new Google Glass prototype, and the PaliGemma model.

These announcements from OpenAI, Microsoft, and Google mark an intense month of GenAI advancements, each aiming to lead in the AI space. These developments promise new utilities and capabilities that can support various tech initiatives, including legal tech solutions.