
AI Mega Update: New Tools and Huge News

AI news is relentless; it never seems to slow down even a little. There have been huge announcements from all the biggest players, tons of new tools to try out, and, like always, some weird ones. Before we jump into the really useful stuff you can experiment with yourself, I want to start with a quick run-through of some AI-integrated robots because a bunch of announcements have flooded in about them.

First up, and not even the craziest, is Jizai Arms, a real-life Doctor Octopus rig. Looks like we’ve got some competition! It comes from a Japanese robotics project of the same name: a wearable base unit with terminals for up to six detachable, AI-powered robotic arms that are fully controllable by the wearer. It looks like we may see some interesting cyborgs in the near future.

Next is Sanctuary AI with their humanoid robot, Phoenix. Their mission is to create the world’s first human-like intelligence in general-purpose robots that help us work more safely, efficiently, and sustainably. Carbon, their AI control system, uses deep learning and reinforcement learning coupled with modern LLMs to translate natural language into real-world actions. A lot of advanced humanoid robots have been in the works for a while, and now they’re starting to incorporate large language models.
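
If you’re curious what that “language into actions” idea looks like in code, here’s a minimal sketch of the general pattern: an LLM turns a plain-English request into a structured action that a robot controller could dispatch. To be clear, this is not Sanctuary’s Carbon system; the action schema, the prompt, the `instruction_to_action` helper, and the `gpt-3.5-turbo` model choice are all assumptions made just for illustration, with only the 2023-era OpenAI chat API being real.

```python
# Toy illustration of the "natural language -> structured robot action" pattern.
# NOT Sanctuary AI's Carbon; the schema and prompt are invented for this example.
import json
import openai  # assumes the 2023-era openai package and OPENAI_API_KEY in the environment

SYSTEM_PROMPT = """You convert instructions for a warehouse robot into JSON.
Respond with only a JSON object: {"action": "pick" | "place" | "move", "object": str, "location": str}."""

def instruction_to_action(instruction: str) -> dict:
    """Ask the model to translate a plain-English instruction into a dispatchable action."""
    response = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": instruction},
        ],
        temperature=0,  # deterministic, parseable output
    )
    return json.loads(response["choices"][0]["message"]["content"])

if __name__ == "__main__":
    # Expected shape: {"action": "pick", "object": "red bin", "location": "shelf 3"}
    print(instruction_to_action("Grab the red bin from shelf 3."))
```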

Tesla had an announcement about Optimus this week as well. The robots are walking around and exploring their environment, and Tesla keeps teaching them more and more complex tasks. They also released some examples from their Full Self-Driving system. Tesla gets left out of the AI conversation a lot, but they’ve had very advanced AI models for a while now.

A lot of these robots are being developed to handle tasks in various workplace settings, but it’s only a matter of time before they’re made for general consumers and everyday use around the house, although they won’t be in humanoid form at first.

One example is a leak from Amazon about building LLM-style intelligence into their Astro home robot, under a project reportedly code-named Burnham. The first version of Astro wasn’t a big success, but this new version may end up with some genuinely useful capabilities. Amazon also had a job posting for AI-powered search, talking about reimagining Amazon search with a new interactive, conversational experience.

Enough about robotics. It’s fun to check out, but let’s talk about some stuff you can actually test for yourself. ChatGPT plugins are now available to all Pro users, but I’ll get to that in a little bit.

Google has a text-to-music tool, MusicLM, which you can try through their AI Test Kitchen. I’ve generated a ton of clips, and it’s been really fun. You can rate which of the generated versions is better to help the model develop, and you can download them if you want. It does the best job of any music generation platform I’ve seen, so I’m excited to see how this develops as they gather feedback from this test model.

DragGAN has been making the rounds on Hugging Face, and it’s really crazy. You can download the code and experiment for yourself: you drag handle points on an image to manipulate the pose, shape, expression, and layout of all sorts of creatures and objects.

InVideo is a tool I started using for editing shorts. It’s a really intuitive and simple video editor with tons of templates and fast tools. I use Adobe Premiere for my long-form videos because of all its capabilities, but it is extremely time-consuming. A platform like InVideo makes it really fast, so I’ve been implementing it into my workflow. They have a full script-to-video feature where you input your scripts and it will generate the entire video, finding relevant stock footage, adding transitions, and even adding the voice for you. It’s pretty incredible and gives you a great starting point in just a couple of minutes. There’s a waitlist, but you can sign up to test it out.

Midjourney 5.1 came out, and a lot of images have gotten even better, depending on what style you’re trying to create. They also announced that they’re working on text-to-3D shapes.

OpenAI also released their text-to-3D model, called Shap-E. It’s not great yet, but it’s a good step in that direction. Nyric released a preview of their AI world generation platform, where you create an entire 3D world and environment with text prompts. It isn’t available to test yet, but it’s mind-blowing to think we’re getting close to a time when you’ll be able to generate a personalized game for yourself from a few text prompts.
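
If you want to try Shap-E yourself, the model and code are open-sourced at github.com/openai/shap-e. Here’s a rough sketch that follows their text-to-3D example notebook as of mid-2023; the module paths and sampler settings are taken from that example and may have changed since, the prompt and output filename are just placeholders, and a GPU makes it far more practical.

```python
# Sketch of text-to-3D with OpenAI's open-source Shap-E, following the repo's example notebook.
import torch
from shap_e.diffusion.sample import sample_latents
from shap_e.diffusion.gaussian_diffusion import diffusion_from_config
from shap_e.models.download import load_model, load_config
from shap_e.util.notebooks import decode_latent_mesh

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

xm = load_model("transmitter", device=device)      # decodes latents into 3D
model = load_model("text300M", device=device)      # text-conditioned latent diffusion model
diffusion = diffusion_from_config(load_config("diffusion"))

# Sampler settings below are the defaults from the repo's sample_text_to_3d notebook.
latents = sample_latents(
    batch_size=1,
    model=model,
    diffusion=diffusion,
    guidance_scale=15.0,
    model_kwargs=dict(texts=["a red toy robot"]),  # placeholder prompt
    progress=True,
    clip_denoised=True,
    use_fp16=True,
    use_karras=True,
    karras_steps=64,
    sigma_min=1e-3,
    sigma_max=160,
    s_churn=0,
)

# Export the first sample as an OBJ mesh you can open in Blender or a game engine.
mesh = decode_latent_mesh(xm, latents[0]).tri_mesh()
with open("robot.obj", "w") as f:
    mesh.write_obj(f)
```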

Unreal Engine 5.2 released a machine learning deformer sample. It’s used to create high-fidelity, real-time character deformations driven by full muscle, flesh, and cloth simulation. The amount of realism they’re able to achieve is absolutely insane.

Another tool I’ve started using personally is 10Web, a website builder that’s incredibly easy to use and built on WordPress. It has an AI assistant built in to help with writing and brainstorming, and the 10Web Builder is based on Elementor, so you can customize your site with drag-and-drop editing. You also get full access to the WordPress dashboard, which is the main reason I chose it for my site: WordPress is the most used and trusted platform out there, with endless tutorials and resources to learn from.

Enough about tools; let’s talk about a really cool example of Stable Diffusion being used in an ad. Coca-Cola released a promo video, partnering with Tato, and it’s done in a similar way to the technique Corridor Crew used to create their anime. I found a breakdown of the process on Reddit: basically, they composite a mix of real actor footage and 3D animation, then use Stable Diffusion as a kind of stylistic rendering pass. It’s insane to see how much work goes into a commercial like this, but the amount of time saved by using Stable Diffusion is huge, and it gives the spot a really cool style.

Another one in the realm of poison: Wendy’s made a deal with Google. They’re working on a customized AI for Wendy’s drive-through ordering, which they think will give customers a better ordering experience with less miscommunication. The CEO said it will be very conversational, and you won’t know you’re talking to anybody but an employee. With the right training, I assume it will be good at this; there are already plenty of ways to train models on data sets much more complex than a Wendy’s menu. Although I am curious how it will respond to things like a drunk person asking nonsensical questions, or that pay-it-forward thing: “So, can you make that guy behind me pay for this?”

One of the coolest things I saw was a simulation someone created with ChatGPT-powered NPCs, built to coach people who aren’t comfortable meeting new people and to help with social anxiety. I think there’s a lot of potential for other uses in this realm, like practicing for a job interview or preparing for a debate.

Now, let’s jump into the biggest ChatGPT update in a while: access to plugins and web browsing for all Pro users. To enable them, go to Settings, click Beta features, and switch them on; then click the drop-down on a new chat to use them. There are over 70 plugins available. There are ones like Scholar, where you can interact with peer-reviewed data instead of the broad-strokes answers ChatGPT usually gives, where it sometimes just makes things up. With Scholar, you can ask for specifics about a paper, and it can analyze it and link directly to its sources. Right now, it’s only connected to Springer Nature sources, but that covers a wide array of topics, and they’re planning to add additional databases like PubMed. This is all still new, but imagine combining it with another plugin like Wolfram, which can handle data analysis, algorithms, and computation across subjects. There are some really amazing possibilities. Chat with PDF is really useful: you just give it a URL, and it can summarize the PDF or answer questions about it. The Show Me plugin lets ChatGPT create diagrams, which is really cool. They’re adding new plugins regularly, and exploring them, finding the ones that apply to you, and spending a little time to work them into your process can be huge. I am all about creating efficient systems and workflows. There’s a Benjamin Franklin quote, “For every minute spent in organizing, an hour is earned,” and that’s the case even more so now. If you spend that earned time wisely, you can really compound your productivity and what you’re able to achieve.

Access to web browsing has been rolled out to Pro users as well. I’ve had issues with it where it takes a long time or completely fails, but when it does work, it’s been awesome. It’s still new, so I’m sure improvements will come quickly.
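
If you don’t have plugin access yet, you can reproduce the basic Chat with PDF idea yourself through the plain API. The sketch below is just the same pattern, not the plugin itself: it pulls a PDF from a URL, extracts the text, and asks the model a question about it. The `ask_pdf` helper is something I made up for illustration; it assumes the `openai`, `pypdf`, and `requests` packages, an `OPENAI_API_KEY` in your environment, and a PDF short enough to fit in the model’s context window.

```python
# Sketch of the "Chat with PDF" pattern using the plain API (not the plugin itself).
import io
import requests
import openai            # 2023-era openai package; expects OPENAI_API_KEY in the environment
from pypdf import PdfReader

def ask_pdf(url: str, question: str) -> str:
    """Download a PDF, extract its text, and ask the model a question about it."""
    pdf_bytes = requests.get(url, timeout=30).content
    reader = PdfReader(io.BytesIO(pdf_bytes))
    text = "\n".join(page.extract_text() or "" for page in reader.pages)

    # Note: no chunking here; a long paper will blow past the context window.
    response = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=[
            {"role": "system", "content": "Answer questions using only the provided document."},
            {"role": "user", "content": f"Document:\n{text}\n\nQuestion: {question}"},
        ],
    )
    return response["choices"][0]["message"]["content"]

if __name__ == "__main__":
    print(ask_pdf("https://example.com/paper.pdf", "Summarize the key findings in three bullets."))
```
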
This announcement about plugins and web browsing came right on the heels of Google’s I/O event, where they announced all the new features they’re working on; Bard already has access to the internet, and they announced very similar plugins. IBM also announced that they’re working on watsonx, with a plan to release it in July, which feels pretty late to the party.

Google had some really solid announcements. It was a four-hour presentation, and a lot of it isn’t coming out for a while; they said most of it is expected by the end of the year, so I’m just going to do a few quick highlights. The Magic Editor for photos looks awesome: you can move things around or remove objects, and it generates the missing information. Tons of other tools do this, but it looks like it does a great job. The immersive view in Google Maps looks amazing. It will generate cars on the road to simulate current traffic, and you’ll be able to drag a slider to see what any area looks like at any time of day or in any weather. It’s a huge upgrade. The Universal Translator not only translates any language while making it sound like the original speaker, it also matches their lip movements. They partnered with Adobe Firefly to bring image generation to Bard and Slides. They also have text-to-image, speech-to-text, and coding tools in the works. Really, it’s a lot of things we’ve already seen plenty of other companies doing, just implemented within Google.

 
