Ainsider AI: The most important AI news and releases from the past week
OpenAI – New Models: GPT-4.1, o3, and o4-mini
OpenAI has introduced three new models:
- GPT-4.1: Available only through the API, with improvements in coding and long-context understanding.
- o3: OpenAI's most advanced reasoning model, capable of "thinking in images" and autonomously using tools.
- o4-mini: A faster and more economical reasoning model with similar capabilities, available in ChatGPT.
Key Features of the Models
GPT-4.1:
- Coding: Significantly improved in creating and debugging code, as well as adhering to diff formats.
- Longer Context: Supports up to 1 million tokens, ideal for large datasets.
- Multimodality: Understands text, images, and video.
- Pricing: $2.00 per million input tokens, $8.00 per million output tokens.
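For readers unfamiliar with the term, a "diff format" expresses a change as a patch instead of a fully rewritten file. The sketch below uses Python's standard `difflib` module to produce a unified diff, which is one common format. The file name `greet.py` and its contents are made up for illustration, and the diff formats OpenAI evaluates against may differ from this one.

```python
import difflib

# Hypothetical before/after versions of a small file (illustrative only).
before = [
    "def greet(name):\n",
    "    print('Hello ' + name)\n",
]
after = [
    "def greet(name):\n",
    "    print(f'Hello, {name}!')\n",
]


def make_unified_diff(old_lines, new_lines, path):
    """Return a unified diff that turns old_lines into new_lines."""
    return "".join(
        difflib.unified_diff(
            old_lines, new_lines, fromfile=f"a/{path}", tofile=f"b/{path}"
        )
    )


patch = make_unified_diff(before, after, "greet.py")
print(patch)
```

A model that adheres to a diff format can return only a patch like this, which is typically much shorter than re-emitting the whole file.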
o3:
- Reasoning: "Thinks" before responding, enhancing quality and accuracy.
- Image Thinking: Analyzes images, such as diagrams and drawings.
- Tools: Autonomously uses ChatGPT functions like web browsing and image generation.
- Benchmark Results: 88.9% on AIME 2025 (mathematics), 69.1% on SWE-Bench Verified (coding), 82.9% on MMMU (visual reasoning).
o4-mini:
- Economy: Faster and cheaper, with similar capabilities to o3.
- Visual Reasoning: Interprets images and performs visual tasks.
- Tools: Utilizes ChatGPT functions.
- Benchmark Results: 68.1% in SWE-Bench Verified (coding).
- Variants: Standard and "o4-mini-high" with a higher level of reasoning.
Google – Gemini 2.5 Flash
Gemini 2.5 Flash is Google's new AI model, combining advanced reasoning capabilities with cost-effectiveness and speed. Its key features, such as hybrid reasoning, a massive context window, multimodality, and integration with Google tools, make it a versatile option for developers and users. A preview version is available for free, so its capabilities are easy to explore.
Key Features and Capabilities:
- Hybrid Reasoning Model:
  - Google's first fully hybrid reasoning model: it can "think" before responding, which improves performance and accuracy.
  - Developers can toggle "thinking" on or off and set a "thinking budget" (from 0 to 24576 tokens) to trade off quality, cost, and latency for a specific task. If no budget is specified, the model assesses task complexity on its own and adjusts how much it thinks.
- Massive Context: Supports 1 million tokens in the input context, enabling the processing of very large datasets like long documents, codebases, and system logs.
- Multimodality: Understands and processes various data types, including text, images, audio, and video. It can generate images and detect objects in photos (e.g., by generating bounding boxes or segmentation masks).
- Code Execution: Can write and execute Python code directly, which is extremely useful for developers.
- Cost Efficiency: Priced at $0.15 per 1 million input tokens and $3.50 per 1 million output tokens with thinking enabled. Google describes the model as having the best price-to-performance ratio in its class, placing it on the "Pareto frontier" of cost and quality.
- Benchmark Results: Excels in diverse tasks, including:
  - Humanity's Last Exam (no tools): 18.8%.
  - GPQA diamond (one-shot): 84.0%.
  - Mathematics AIME 2025 (one-shot): 86.7%.
  - Visual Reasoning MMMU (one-shot): 81.7%.
  - Long Contexts MRCR 1M: 83.1%.
  These results demonstrate the model's speed, efficiency, and precision in complex tasks.
- Integration with Google Workspace: Seamlessly integrates with Google products like Gmail, Docs, and Sheets, facilitating user workflows within a familiar environment.
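The thinking budget and the prices quoted above can be combined into a rough upper-bound cost estimate. This is a minimal sketch, not Google's billing logic: the function names are hypothetical, and it assumes that thinking tokens are billed like output tokens at the thinking-enabled rate. It also applies that rate even when the budget is 0, because the figures above do not give a thinking-off output price.

```python
MAX_THINKING_BUDGET = 24_576       # upper bound quoted above, in tokens
INPUT_USD_PER_M = 0.15             # per 1M input tokens (from the text)
OUTPUT_USD_PER_M_THINKING = 3.50   # per 1M output tokens, thinking enabled


def clamp_thinking_budget(requested: int) -> int:
    """Clamp a requested budget into the 0..24576 token range."""
    return max(0, min(requested, MAX_THINKING_BUDGET))


def estimate_max_cost_usd(input_tokens: int, answer_tokens: int,
                          thinking_budget: int) -> float:
    """Upper-bound cost of one request if the whole budget is spent.

    Assumption: thinking tokens are billed like output tokens.
    """
    budget = clamp_thinking_budget(thinking_budget)
    output_tokens = answer_tokens + budget
    return (input_tokens * INPUT_USD_PER_M
            + output_tokens * OUTPUT_USD_PER_M_THINKING) / 1_000_000


# Example: a 200k-token codebase, a 1k-token answer, an 8k thinking budget.
cost = estimate_max_cost_usd(200_000, 1_000, 8_192)
print(f"Estimated upper bound: ${cost:.4f}")
```

Under these assumptions the thinking budget can account for more of the cost than the input itself, which is why the ability to cap it matters for latency-sensitive or high-volume workloads.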
Kling AI – Kling 2.0
Kling AI, a company specializing in AI-powered video generation, has introduced Kling 2.0, featuring improvements in command understanding, more natural character movements, and a new Multi-Elements Editor for easier video editing.
Advanced Command Understanding
Kling 2.0 appears to better interpret complex user commands, especially those involving sequential actions and camera movements. For example, the model understands both technical terms like "85mm lens with shallow depth of field" and general instructions such as "slowly zoom in on the character." This enhancement allows users to act like directors, precisely controlling the video content.
Dynamic and Natural Movements
A key improvement is the enhanced movement dynamics. Characters in Kling 2.0 exhibit a wider range of motions that are fluid, natural, and highly detailed. Particularly noteworthy are the walking animations, which show correct foot placement, react to surface textures, and maintain consistency through sequences up to 10 seconds, eliminating typical AI "jitter." For instance, the model preserves details during complex actions, providing an immersive viewing experience.
Cinematic Visual Quality
Kling 2.0 generates video in cinematic quality, with resolutions up to 1080p, rich details, and professional lighting. The visual aesthetics have been refined to make generated content look like professional productions rather than typical AI-generated material. Improved facial expression capabilities allow for realistic movements and expressiveness, giving characters a professional level of "acting."
Multi-Elements Editor
An innovative feature is the Multi-Elements Editor, which allows users to add, replace, or remove video elements using simple text prompts or images. For example, a user can generate a video and then change the background or add a character directly within Kling 2.0, without needing external software. This feature seems to offer remarkable flexibility and control over the editing process.
Consistent Style and Quality
Kling 2.0 ensures visual style consistency, which is crucial for maintaining a professional look and feel. Regardless of whether the user starts with text or an image, the model generates video with a uniform style, making it easier to create cohesive and polished content. This improvement appears particularly important for creators who need to maintain brand or aesthetic consistency in their projects.
Canva – Visual Suite 2.0
Canva has announced the launch of Visual Suite 2.0, its largest update since the company's founding in 2012, as stated on the Canva website. New features include:
- Canva AI: A multimodal assistant that designs presentations, generates images, writes text, and edits photos. Users describe what they need in a prompt, e.g., "Create a presentation for a smartwatch brand targeting Gen Z." Details are on the Canva website.
- Canva Sheets: Integrates spreadsheets with data visualization, offering tools like Magic Insights and Magic Charts for creating interactive charts. It can import data from Google Analytics, HubSpot, and Statista to simplify analysis, as described on the Canva website.
- Magic Studio at Scale: Generates hundreds of personalized assets in minutes by combining spreadsheets with AI tools like Magic Write, Translate, and Background Remover, accelerating the creation of marketing campaigns and internal communications.
- Canva Code: Enables the creation of interactive experiences without coding, making it easier to add advanced features to projects.
Visual Suite 2.0 blends creativity with productivity and removes the need to switch between tools, making Canva a versatile platform for professionals and amateurs alike.
Microsoft – Copilot Enhancements
Microsoft has introduced enhancements to Microsoft Copilot.
Key changes include:
- Copilot Vision in Edge: A feature that allows Copilot to analyze the content of the website a user is viewing, for example to summarize recipes, give advice on job applications, or recommend products on sites like Amazon. It is available for free to all Edge users with a Microsoft account, as confirmed in a ZDNET article dated April 18, 2025. Copilot Vision does not store user data, in line with Microsoft's privacy policy, as described on the Microsoft support page.
- Copilot Studio: A tool for creating AI agents that can interactively click and type text in user interfaces, both on desktops and in web applications, opening new possibilities for automation.
These enhancements make Copilot more versatile for both business and individual users.
xAI – Grok Studio and Memories
xAI, Elon Musk's AI company, has introduced two new features for its AI assistant, Grok, as announced on the xAI website and in an X post by Alvaro Cintas. These are:
- Grok Studio: A new interface for co-creating documents, applications, and games in the browser, offering a split screen where Grok builds projects alongside the user, remembering conversation context. Integration with Google Drive allows file uploads, facilitating work on documents, spreadsheets, and presentations, as described in an Engadget article dated April 16, 2025.
- Memories: A feature that allows Grok to remember details from previous conversations, providing personalized responses. Users can manage what Grok remembers, ensuring transparency, as stated in a TechCrunch article dated April 17, 2025. These features are in beta, available on grok.com and mobile apps for iOS and Android, excluding the EU and UK due to privacy restrictions.
OpenAI – Codex CLI
OpenAI has presented Codex CLI, a local coding agent powered by the o4-mini model that runs in the terminal. It can read, edit, and execute code, and three approval modes control how much it may do without asking the user first. The tool is intended to speed up programming, as described in a reply to an X post.
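To make the idea of approval modes concrete, here is a conceptual sketch of how such a gate could work. It is not Codex CLI's actual code. The mode names follow the Suggest, Auto Edit, and Full Auto modes described in the tool's README as understood at the time of writing, and the `Action` type, the `AUTO_ALLOWED` mapping, and the `needs_approval` function are hypothetical.

```python
from enum import Enum


class ApprovalMode(Enum):
    SUGGEST = "suggest"        # agent proposes, user approves every change
    AUTO_EDIT = "auto-edit"    # agent may edit files, asks before commands
    FULL_AUTO = "full-auto"    # agent may edit files and run commands


class Action(Enum):
    READ_FILE = "read"
    EDIT_FILE = "edit"
    RUN_COMMAND = "run"


# Actions each mode may perform without asking the user (assumed mapping).
AUTO_ALLOWED = {
    ApprovalMode.SUGGEST: {Action.READ_FILE},
    ApprovalMode.AUTO_EDIT: {Action.READ_FILE, Action.EDIT_FILE},
    ApprovalMode.FULL_AUTO: {
        Action.READ_FILE, Action.EDIT_FILE, Action.RUN_COMMAND,
    },
}


def needs_approval(mode: ApprovalMode, action: Action) -> bool:
    """Return True if the user must confirm this action in this mode."""
    return action not in AUTO_ALLOWED[mode]


for mode in ApprovalMode:
    asks = [a.value for a in Action if needs_approval(mode, a)]
    print(f"{mode.value}: asks before {asks or 'nothing'}")
```

The design choice illustrated here is that reading is always allowed, while writes and command execution are unlocked progressively, so users can pick the level of autonomy they are comfortable with.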
ByteDance – Seaweed-7B Video
ByteDance has introduced Seaweed-7B, a video generation model with 7 billion parameters that outperforms larger models at lower training costs, making it attractive for video creators, as mentioned in a reply to an X post.
Anthropic – Claude Autonomous Research
Anthropic has added the Autonomous Research feature to Claude, enabling it to search Google Workspace, plan multi-step queries, and provide answers with cited sources, facilitating access to reliable information, as described in a reply to an X post.