I've been using this to help me read papers with mathematical notation in them. I screen share with https://aistudio.google.com/live and then ask (out loud) questions like "what do these symbols mean?" - it's a huge help for me, since I never memorized all of that stuff.
Oh this is great! I always have this problem. I find it's one of my biggest barriers when reading queueing theory content. I'm only doing it intermittently, so I haven't memorized the meanings of ρ, σ, μ, λ...
Visually I also often confuse rho and sigma, and math texts will use psi ψ and phi φ in weird fonts and I can never tell them apart.
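For anyone else in the same boat, the core single-queue definitions are short enough to keep on a sticky note (the σ convention varies by text, so treat that part as my assumption about what your book means):

\[
\lambda = \text{mean arrival rate}, \qquad
\mu = \text{mean service rate}, \qquad
\rho = \frac{\lambda}{\mu} \ \text{(utilization; stable only if } \rho < 1\text{)},
\]

and σ is usually the standard deviation of the service time, as in the Pollaczek-Khinchine mean wait for an M/G/1 queue:

\[
W_q = \frac{\lambda\,(\sigma^2 + 1/\mu^2)}{2\,(1-\rho)}.
\]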
Cool tech, but for some reason the very first sentence of every reply the AI gives in the demo videos is really off-putting to me personally. It seems like this kind of joyfully helpful introductory sentence is perhaps expected in US culture, but it immediately turned me off. I come from a culture that is less verbose in general and more to the point, so this feels like a mismatch right away.
[flagged]
That is not what is implied here; OP seems to dislike the speech aesthetics produced by the model. I feel the same; the sugar-coating provided before and after any actually valuable information is (to me):
- an awkward, unsuccessful attempt at an experience comparable to talking with a human person
- not efficient, not enjoyable
- a perfect match for the experience of talking with a highly standardized and gimmicky version of human_v0.01.
Now, that being said, I don't really care about all of this.
The USA population is equivalent to approximately 4% of the total world population.
I am getting extremely skeptical of the AI "age". I was hoping it would unlock a whole new generation of startups like the iPhone did. However, genAI is too generic and too blunt a tool, as in it does everything, and it's too expensive for a small company to do stuff with. It looks like the AI companies (Google and OpenAI) realize that and are even doing the vertical integration themselves. In that case, does genAI end up being the automation tool that you access through OpenAI or Google, and that's it?
I am sure people here see it better than I do, so what new class of problems is this genAI going to solve?
I do some angel investing, so I get a lot of deal flow in my inbox. A lot of the AI pitches (which are now 99% of everything) are not really businesses.
They are at best nice features or capabilities to have in a wider enterprise application suite for, say, HR or accountancy, but on their own it's just a lot of smart people working extremely hard to sell one tiny solution that is a small cogwheel in a much larger problem.
My gut tells me that very soon, if not now, there will be an opportunity for savvy VCs to sweep up some of these cogwheels and package them together into a real business, and it's something I am exploring with a few other angels. Curious what others think on this. Feel free to DM me (details on profile).
I fully agree with this.
I was involved very early with automated speech recognition for transcribing meetings, but then both UberConference and Google Meet just integrated it into their existing offerings, thereby massively reducing the market size for standalone solutions. And given how heavily subsidized AI API calls are at the moment, just relying on them is a huge risk for your business model, because you never know when your suppliers' prices will 10x to represent the true cost of providing those AI services.
In my opinion, the sales of many of these new AI tools are mostly driven by the existing audience of the creator. In many cases, you could just ask an LLM to quickly build you your own competing solution, which you can then use for free. E.g. all those SEO content and LinkedIn blog post bots. Vibe-coded AI "businesses" are the software equivalent of branded white t-shirts.
Won't the general purpose models eventually eat all the application layer business models?
Now that ChatGPT desktop can read files in my code editor and apply changes I've pretty much stopped using dev specific AI tools. Same with spreadsheet problems - where uploading data to ChatGPT and hoping for the best actually works pretty well now.
This is exactly how I see it, and hence my original comment. I am working on solving an actual problem with genAI. However, I kept thinking that nothing prevents the OpenAIs of the world from usurping this solution. My suspicion was confirmed once I started seeing vertical integration by these companies. In fact, I am now convinced that genAI as a service doesn't have a long life outside of being a search engine replacement, which is why these companies started vertically integrating. And once they start, they can come after every vertical, since the underlying tool is truly generic.
I personally don't want to see "a whole new generation of startups". GenAI, LLMs in particular, are a powerful tool for the users precisely because they are general.
After all, what is the business of such startups of the "AI age"? It's using the AI models on the backend, where users can't reach, to sprinkle some magic onto features. But as a user, I don't want your product to use AI as a tool, I want your product to be a tool the AI uses. The former keeps power away from users, the latter gives it back to them.
Here is a different take. GenAI is just another revenue stream for big players with scale or resources.
At work I was tasked with building some magical agentic stuff. After working on it for a while, I realized that although HN shouts that OpenAI/xAI/Google/Amazon/Anthropic have no moat because OSS models are available, the actual moat is access to resources at scale (GPUs, power infrastructure, networking), which is very difficult for a random Joe's startup to build.
You must always rent model access from one of these players (even OpenRouter just delegates to them), and that is the exact moat.
GenAI trades customization for generality: instead of a super-customized, secret-sauce solution as your product's competitive edge, you now build magic prompts so GenAI can take the same input and, hopefully, with enough GenAI instances ingesting the same data and reaching consensus, produce a reasonably useful output that your custom solution used to deliver. Since you no longer have a custom solution, you now pay the GenAI operators (the real moat for everyone hosting it). In the process you also sacrifice the competitive edge of that super-secret IP, and you come to rely heavily on GenAI being prompted correctly and on numerous verification steps with enough automation, which of course again costs money.
GenAI is the new hammer that visionary leadership and executives (a hefty amount of money has been burned on campaigns and PR to convince these people) want to swing everywhere, so the operators can make back some of the money they sank into it. If you superimpose the "AI" of the current year onto the "apps" of the 201x era, when everything had to have an app, you'll realize we've seen this before: most apps need the cloud, and since the cloud has costs, apps moved to subscription models instead of the 200x-era buy once, use forever.
If the only moat really will be the scale of computation resources, that's great news for users, because it will be an extremely competitive market where prices will be driven down very effectively.
I suspect that model quality/vibes and integrations will play a role as well though.
> However genAI is too generic and too blunt a tool, as in it does everything.
How does this preclude the AI "age"? And why is the metric "companies make money off of it"?
I view it more like open source/Linux. When Linux was new, it was immensely useful, but not a means for companies to make money (except for a tiny few).
Or more precisely, far more people used Linux for their own personal benefit than companies making money off of it.
And so it is with generative AI. For me, personally, it's very useful.[1] So assuming the major companies (OpenAI, Anthropic, etc) don't go bankrupt and/or kill it off, gen AI is here to stay, and will continue to disrupt. That startups can't make much money off of it is somewhat irrelevant.
[1] It has revolutionized speech recognition for me. I have several automations running that were coded by Claude. Things that had been in my mind for years but I didn't have time to write them. MCP will add more value to my life soon. Etc.
"I am getting extremely skeptical of Python. What class of problems is this language going to solve?"
"I am getting extremely skeptical of Photoshop. What class of problems is this software going to solve?"
"I am getting extremely skeptical of the internet. What class of problems is this network going to solve?"
Well, what's a tool? I would say:
1. It solves a problem. Doesn't have to be a completely unsolved problem, can just be a new solution. Or even just new packaging on an old solution. But it needs to solve some kind of problem.
2. It's trustworthy. Some people get a tool to suit their own process, but the majority, from anecdotal evidence, will adopt the tool's process. There's this idea that "these guys know how to do invoicing, so I don't have to think about invoicing if I use their invoicing tool".
3. It's known. A bit philosophical, but if something exists that nobody _knows_ solves a problem they might not even know they have, how much of a useful tool is it, really?
DropBox is an interesting example. It wasn't exactly a major scientific breakthrough, and a lot of people asked "why don't people just use FTP?". If you focus on (1), DropBox looked close to pointless. But what they did is nail (2) and (3).
Now, if you subscribe to the hype, you might argue (1) and (2) will soon be covered. AI will magically solve your problem and be a universal domain expert telling you what to do, so you don't have to think about it. You might also argue that it will magically solve (3), with stuff like Gemini Live kinda watching you all day and constantly going "let me tell you how to do that" or "let me do that for you".
Seems unlikely to me. Not impossible, most things I can think of are theoretically possible. Just unlikely. And if you think even just _one_ of those three aspects can't be fully automated in the near future, there's still plenty of opportunity left to differentiate in that area.
I think generative AI does unlock a new generation of startups, because it's a genuinely new technology for which we can find at least some valuable use cases. And an army of startups tends to be better at quickly exploring a new solution space than a few big incumbents. So in that sense it is similar to smartphones, which also opened up a new solution space, and with it, startups.
The problem for me with an AI startup built on someone else's AI is that it's difficult to envision what the moat will be. Perhaps that's because it's hard to imagine a moat for something so generically specified, but whenever I think of something cool, AI seems to be an added-on ability rather than the main driver, and it doesn't give any moat whatsoever.
I agree, it's an implementation detail. What you sell is a solution, and your moat is generally your brand, your existing user base, and any anti-competitive shenanigans you manage to get away with. Not so much the actual quality of the product; what matters more is what people _think_ its quality is.
In what way is AI trustworthy? It's ruining the parts of the internet I use and care about. I can't visit Digikey's site in an incognito tab without having to sit through a ~5 second captcha these days. Mouser is less aggressive, but it's still problematic. Drew's spent how much time combating AI bots instead of improving Sourcehut?
In fact I'd be hard pressed to think of a site that isn't getting boned by AI.
Well, I meant it in the sense of: you use a tool because you think its creators really know the space and you can rely on it. To stick with the DropBox example, their users probably trust them to keep file management convenient and to keep their files safe.
I've seen a depressing number of people treat LLMs like some sort of oracle. So I can picture a significant number of consumers just trusting ChatGPT with their taxes or whatever, based on the assumption that it knows any domain at least as well as human experts.
I'm not saying _I_ find any LLM trustworthy. But if enough people do, it becomes difficult to differentiate there.
I have no idea if this will be as big as an iPhone, but the early iPhone apps and development followed a similar trajectory. Viral flashlight apps and others that then got integrated into the OS, similar to API wrappers and viral photo makers of today getting swallowed up by the next model update.
In terms of what problems it solves, I imagine it will be up to developers/companies to come up with the Uber/Airbnb/TikTok that AI enables, the way the iPhone enabled those. Same as any platform.
It's not too expensive. I did some crawling the other day. AI wrote the crawler automatically, then I had it extract structured info from the crawled product info. It was about 600 requests to o4-mini. Cost me about 2.5 cents.
o4-mini? That is a rumored model building on o3, and different from 4o-mini. Sorry, their naming scheme is a mess. Would be cool if you do have o4-mini access, though.
https://www.theverge.com/news/646458/openai-gpt-4-1-ai-model
My mistake, I meant GPT-4o mini.
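For a rough sanity check on the cost claim (assuming GPT-4o mini's list pricing of roughly $0.15 per million input tokens and $0.60 per million output tokens, and guessing something like 200 input / 10 output tokens per extraction call; neither figure comes from the comment above):

\[
600 \times \frac{200 \times 0.15 + 10 \times 0.60}{10^{6}} \approx \$0.02,
\]

which lands in the same ballpark as the ~2.5 cents reported.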
It's too early to tell. We're only 24-36 months in.
We're still in the post Netscape, pre-dotcom-crash bubble.
Real applications are coming.
Marketing should use their imagination!
Imagine putting dice and random objects (cups, forks..) on a table, pointing your phone at them and asking it to invent a new game for your friends. Tell it to use these objects and also use the live camera as a gameplay element.
Or recognizing bird or plant species.
Or helping a blind person go hiking, helping avoid tree roots and describing the beautiful scenes they’re in.
So much possibility!
Blind person here. I can confidently say that vision AIs for navigating the real world are pretty much hype. It is a bandwidth and latency issue. Latency from camera to AI, and bandwidth from device to brain... Please, don't hype accessibility just for your personal fun. There are people out there with real problems, and dangling impossible solutions before their "eyes" is pretty much cruel.
> describing the beautiful scenes they’re in
And this part is ableism at its best. Do you really think what people like me are missing is computer-generated descriptions of our surroundings, so that we notice the beauty? Reminds me of the cochlear implant debate some people from the deaf community have. Is my life less interesting because I don't see your colours? Am I a lesser human that needs treatment because I don't see your beauty? Methinks no.
Agreed, blind people and sighted people are able to experience different kinds of beauty that are inaccessible to each other. As a sighted person, if there was a technology that could describe the beauty that a blind person experiences in a given moment I would think that would be pretty cool to try.
> Please, don't hype accessibility just for your personal fun. There are people out there with real problems, and dangling impossible solutions before their "eyes" is pretty much cruel
Do you think it's going to be impossible forever? Bandwidth and latency seem like the surest things to improve in AI tech.
I don't know about bandwidth from device to brain. Maybe Neuralink or something similar?
It has been tried a lot. I saw the first sonar-like navigation aid in the early 90s. It basically translated detected obstacles into vibration. That's where you start to realize that bandwidth is the issue, because a single vibration, or even a group of them, doesn't really tell you anything about the nature of the obstacle.
Now we're at a point where vision models (if they don't hallucinate, DANGER!!!) can actually distinguish different obstacles and even describe them. Nice. However, you're pretty much limited to speech synthesis as an information channel. That is not that bad, but it's very distracting when you move about outside. After all, blind people already use their ears to learn about their surroundings. There isn't much, if any, bandwidth left to stick some constantly chatting voice in. You end up diminishing your actual senses' input about the surroundings, which is also dangerous. Nothing beats the turn-around time of your own ears or the tactile info you might get from your cane...
So, to answer your question: Maybe. I haven't seen a technology that can squeeze some more bandwidth out of the current situation yet.
Thanks for the insight. Very cool to get your perspective on this
> Or helping a blind person go hiking, helping avoid tree roots and describing the beautiful scenes they’re in.
"If only you could see, then you could appreciate the beauty of the scenery! Let me describe what are missing out on. I hope your sub-standard experience, as rated by sighted people, is sufficiently adequate to you. May I list more short-comings of your existence in comparison to able people?"
How charitable of you to assume those examples work reliably.
The BeMyEyes app already works quite reliably at describing scenes to the blind.
Haha, did you evaluate this personally?
I did a BeMyEyes test recently, trying to sort about 40 cans according to the existence of a deposit logo. After 90 minutes of submitting photos, plus a second round to make sure it wasn't lying too much, I had 16 cans which, according to BeMyEyes (OpenAI), had a deposit logo. Then I went to the shop to bring them back. It turns out only 4 cans had a logo. So even after a second round to eliminate hallucinations, the success rate was only 25%.
Do you call that reliable?
> I did a BeMyEyes test recently
But isn't the BeMyEyes assisting happening via other humans? I remember signing up for some "when blind people need your help" thing via BeMyEyes and I understood it as it's 100% humans on the other end of the call that will help you.
Yes, what you are describing is how BeMyEyes started, and it still offers that feature.
However, somewhere around 1 or 2 years ago, they added an OpenAI vision-model-based way to send in photos and have them described.
In general, it's a very nice feature, if it works. For instance, I do use it successfully to sort laundry.
But the deposit logo test I did gave horrible results...
That changed a while ago. They also use OpenAI's APIs now.
https://openai.com/index/be-my-eyes/
Are you willing to bet that it wouldn't work reliably in a year, 2 years, 5 years?
If you're releasing something today, should you talk about what it can do now or what it might be able to do in two years?
Watching that demo video, I wonder why they chose to use that?
Gemini only talked about some useless surface knowledge that would be forgotten quickly, whereas if she actually read the Wikipedia page she would learn more and retain it better.
right? not great examples. Here are the kind of prompt examples I would have used instead:
- explain this symbol to me
- what TV show am I watching
- how much is the house I'm looking at worth (bring in geo-location)
- How often should I be watering this plant
- how many calories are in this meal
- what type of car is this
- what's the exact name/type of this screw part
- what are the exact dimensions of this door frame
- what kind of style of art is this
- how often should I be taking this medication
- how many copies has this book sold
- which part of the world is this pic from
My Meta glasses can do a decent number of those, with the ideal form factor for those use cases.
Looks like Google was demonstrating some of that on stage recently: https://gizmodo.com/google-showed-off-advanced-ar-glasses-th...
Because, besides the TV show one, none of your examples benefit from the video/"live" modality? Or at least I wouldn't be motivated to use video for that.
or even the internet one. It would have been nice if it was like "I see you have a NetMan 402-BX with a blinking amber light. That's a common failure mode that can be fixed by updating the firmware. You need to press and hold the power button and put a pin into the reset hole..."
You know ... something more than "reboot that shit"
> how often should I be taking this medication
What could go wrong?
It’s aimed at normies. What you watched would blow their minds and be exactly what they were hoping for.
I think it would be nice if the Pixel Fold could do this: have a browser on the left showing some content, and Gemini on the right, where you can prompt it with questions or ask it to take actions on the left.
It's not the best at helping me play video games yet, lol. Ah well. Blind people are used to waiting. :)
This is Apple Intelligence the way it was supposed to be ("AI for the rest of us"), but Apple just doesn't "get" AI, so here we are—the only platform provider that is taking the correct approach to AI is Google.
It seems very likely that Apple will license this in some form. They are already testing with ChatGPT and Gemini, but I don't think OpenAI is going to be able to undercut Google on price. And the way it looks right now, they might not be able to beat them on performance either.
Deep down though I can feel Apple must have some level of aversion to this. They have been trying to untangle themselves from Google for so long.
Why only Pixel 9? Surely none of the computation is on-device anyways
You can do the same from the browser on any phone (including iphone) by going through AI Studio
https://aistudio.google.com/live
Upvoted. For a couple of weeks now I've had that site bookmarked as an app on my iPhone just to use the Gemini Live feature, though I can only use the camera/voice feature (no screen-sharing ability in the Chrome iOS browser, AFAIK). Super weird that they're not releasing this feature more officially on iOS.
The second sentence contradicts your statement
> This soon will be available to all Gemini Advanced subscribers on Android devices, and today we’re bringing it to more people, starting with all Gemini app users on Pixel 9 and Samsung Galaxy S25 devices
curious, what is done on-device?
Likely nothing. Mobile devices can't do much AI stuff except for the tiniest of models, and it'll likely be a long time before they'll be able to do anything super useful.
Edge compute is a long ways off, even on desktop without a dedicated GPU. But especially on mobile.
You might have bumped into a media website trying to run a WASM-powered onnx runtime background removal tool, or perhaps a super slim LLM. You'll notice how slow these are and how they can lock up your browser. That's about the experience you can expect from edge compute.
Nvidia's proclamation that they're going to be working on robotics as their next growth sector could mean more innovation on the edge / low power compute front. But most of the yield will come from better model architectures and models designed specifically to work with compute constraints.
For now, datacenter inference reigns supreme.
> WASM-powered onnx runtime background removal tool
Yes, it's slow because these tools will most likely execute on the CPU (for sure on iOS). WebGPU is still not enabled in Safari, and WebNN isn't supported anywhere yet (which would let you use an NPU execution provider). ONNX Runtime isn't even the most optimised option when running natively (instead of via WASM) on iOS; e.g., it doesn't support the MPS (GPU) provider, and the NPU provider (via CoreML) implements only a subset of operators (last time I tried). Safari also limits WASM memory usage.
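To make the execution-provider point concrete, here is a minimal onnxruntime-web sketch (the model file name, the "input" feed name, and the tensor shape are placeholders I made up, not taken from any real tool): you request WebGPU and fall back to WASM, and where WebGPU isn't available you end up on the CPU path described above.

    import * as ort from "onnxruntime-web";

    // Minimal sketch: load a segmentation model and run one inference pass.
    // "segmentation.onnx", the "input" feed name and the 1x3x320x320 shape are
    // placeholders; a real model defines its own input names and dimensions.
    async function removeBackground(pixels: Float32Array): Promise<ort.Tensor> {
      // Prefer WebGPU, fall back to WASM (CPU). In browsers where WebGPU is
      // not enabled (e.g. Safari, as described above), everything runs on CPU.
      const session = await ort.InferenceSession.create("segmentation.onnx", {
        executionProviders: ["webgpu", "wasm"],
      });

      // NCHW float32 input tensor.
      const input = new ort.Tensor("float32", pixels, [1, 3, 320, 320]);

      // The heavy compute happens here; on the WASM/CPU path this is the call
      // that can lock up the page unless it is moved into a Web Worker.
      const outputs = await session.run({ input });
      return outputs[session.outputNames[0]] as ort.Tensor;
    }

Moving the run() call into a Web Worker keeps the page responsive but doesn't make it faster, which is why it still feels like the experience described above.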
In practice, when you want the best performance, you have to use a native app and CoreML with the NPU provider, plus a model architecture optimized for the NPU. On iOS the big limitation for now is available RAM, but even my iPhone 13 mini has the same fast NPU as my MacBook M2 Max; when I tested it, it ran at a similar speed to running on the GPU.
I still can't use the "summarize" feature that was used in marketing to sell the Pixel 8 Pro. Not holding my breath on ever being able to use this, either.
Incentives to buy new hardware.
"The Google Pixel 9 is powered by the Tensor G4 chip, a custom-designed processor optimized for AI. This chip, along with 12 GB of RAM (or 16 GB for Pro models), enables advanced AI features and enhances overall performance."
None of that on-board stuff ended up working, AFAIK, as the Tensor chips shipped right as giant LLMs became dominant.
I don't understand why Android only? Why would nobody want to use this on a PC?
Is there a way to record the screen WITH AUDIO and save it?
pretty useless demos haha, I wonder why they chose those cases; maybe it really doesn't do much else correctly rn
and then humanity became illiterate
Is anyone seriously using Gemini daily? How is it compared to other agents you've tried? Do you feel it's a good value prop? What does it excel/fail at?
I already pay for Google One storage, so it's the cheapest of the paid LLMs for me. That's 99% of the reason I use it, and honestly I don't really have any strong opinions on it compared to ChatGPT. It's about the same level, and with new models constantly being released for all the different LLMs I've kind of lost track of what it's particularly good or bad at compared to others.
I will say the video with Gemini live is pretty impressive. My family and I tried it a bit yesterday, and my kids wanted to show Gemini all our pets. My kid showed it our cat, picking it up roughly as she is wont to do, and I was impressed when it asked "Is [name of cat] always so patient being handled like that?"
> I already pay for Google One storage, so it's the cheapest of the paid LLMs for me. That's 99% of the reason I use it, and honestly I don't really have any strong opinions on it compared to ChatGPT
I'm on a ChatGPT Pro plan and have been using it for a good while, but I got an offer on Google One storage so I tried it out for a month. Google's models are far behind OpenAI's, and o1 Pro Mode still seems to be the best out there, albeit obviously slow. It's probably the model I've gotten furthest with on difficult problems, and even the "simpler" models from OpenAI are still better than Gemma 2.5.
It does seem that Google has better tooling available for their models, though, so a combination of Google's tooling with OpenAI's models would probably be optimal, but it's unlikely we'll see that happen.
how long ago? i think 2.5 exp is better than o1 or o3-mini in my experience.
Gemini 2.5 Pro is very competitive with o1/o1-pro in reasoning and has a larger maximum context. It also just rolled out Deep Research, which is also very competitive with (perhaps even better than) OpenAI's.
I'd argue that 2.5 pro is better because it doesn't suffer from GPT-isms, where its blatantly obvious someone used ChatGPT.
It's also much, much faster than o1-pro. That feels like a bigger win every time I use it.
Not daily, but a decent amount I think.
Image search and text recognition have become way better than Google Lens ever was. In-place translation is also nice, and I use it to check idioms and other Grammarly-esque things.
I think it's good at language-related things and transformations, not great at anything else (I wouldn't ask it to create anything new, TBH). I can't even get it to reliably spin up Google Docs from my prompts via the standard web/app interfaces.
Similar to FactolSarin, it's baked into my Google account and subscriptions so I don't really need it to be stellar and wouldn't pay for it as a single product.
I find it's a great complement to Sonnet 3.7 in agentic programming workflows, the extended context is amazing, and it can often solve issues that Sonnet 3.7 can't handle (and vice-versa)
I use it and ChatGPT. I think they are pretty close. Gemini has gotten a lot better, and I think it’s superior on the deep research side. NotebookLM is incredible.
For more general things, ChatGPT is still better imo, but that gap is shrinking, at least for what I do. Their analysis plugin is super useful to me.
I will say that it’s hard to evaluate as some of my habits may drive behavior.
I'm probably on the wrong side of history and generally think LLMs are way overhyped, but Gemini 2.5 Pro is the first one to consistently impress me for pretty much everything.
Up until 2.5 it was kinda meh, but now it's my daily driver. I still have ChatGPT, but swapped out Anthropic.
I use it in coding. It's pretty good.
I use it daily; I've even started using it for planning trips, explaining stuff, coding. Easily the best AI out there, then Claude and then DeepSeek.
You can't upload code files (.java files, build.gradle files, etc) on the web UI, which makes it 100% worthless for my dev flow. I'm not interested in integrating it with my IDE or installing CLI tools, I want to just upload relevant code and get results.
Try using the interface via ai.dev instead of gemini.google.com. I upload code files all the time and it works.
No thanks google.
Cool! I'm looking forward to having a live AI drawing tutor. None of the models are there yet but we're getting close!
the first example video tells you how to improve your home decor by saying that you could add a side table and blanket to your chair. Thank you Gemini for telling me that tables can go next to chairs.
There's really no use for AI outside of making studio ghibli drawings and giving me a bunch of broken code quickly.
The answer also seems very surface-level: wow, it sees a chair and lamp in a corner and believes it's the perfect position, apparently without considering anything else in the room.
Marie Kondo doesn't need to worry about being replaced by AI, yet...
"Screen images simulated."
"Gemini, what's the best way to bury this body?"
"Gemini, please count the cockroaches in this kitchen."
"Gemini, calculate my survival chances sprinting across this interstate."
"Gemini, speculate on the percentage volume of pee-pee in this swimming pool."
"Gemini, how many days will this meal set back my life expectancy?"
"Gemini, will these socks lead to my arrest by the fashion police?"
"Gemini, my cat is making that face again -- has it found out it's adopted?"
"Don't be snarky."
"Please don't post shallow dismissals, especially of other people's work. A good critical comment teaches us something."
https://news.ycombinator.com/newsguidelines.html
Don't be so pessimistic, how about these uses:
- Let Gemini watch you drive so you can get advice on driving better
- Let Gemini help you rewire your outlet for the first time
- Let Gemini help you pick edible mushrooms in the forest
> Don't be so pessimistic, how about these uses:
I suspect these examples were chosen specifically because their failure modes can be lethal?
How about:
- Gemini, help me tell if this is sarcasm
It's funny; you're kidding, what with the impending death and such, but having it be able to look at what you're seeing seems useful to me. E.g., "How do I work this contraption?"
About those edible mushrooms - https://www.vox.com/24141648/ai-ebook-grift-mushroom-foragin...
They're listing the use cases where a 10-20% hallucination rate isn't a big deal. They can't advertise the truly useful applications of the tech because it's not reliable enough (yet).
But they can make it more reliable over time by deploying the product now and training on user data:
> Google collects your chats (including recordings of your Gemini Live interactions), what you share with Gemini Apps (like files, images, and screens), related product usage information, your feedback, and info about your location. (...)
> Google uses this data, consistent with our Privacy Policy, to provide, improve, and develop Google products and services and machine-learning technologies, including Google’s enterprise products such as Google Cloud.
https://support.google.com/gemini/answer/13594961?hl=en
Google has a history of faking their Gemini demos; even if this demo were impressive (it's not), I would feel the same way about it. Yeah, it will get better in time, like self-driving cars, quantum computers, superconductors, nuclear fusion.
I'm sure you laughed at the words and sentences GPT-2 generated.
It seems that there’s a subset of people who will agree unironically.
They are the target market for this