Builds on this tweet thread by Elizabeth Laraki
The current UI for interacting with LLMs mimics messaging. We type in our queries and receive the model's reply in a flat, linear manner. With the latest developments, models have become multimodal (e.g. we can upload images), but the interaction design is still constrained to the same linear pattern (think of how uploading a picture works in a messaging app). The future of interacting with these LLMs will see a paradigm shift.
Would be interesting to see how the interaction evolves. I imagine that if we ask a future model to explain the circulatory system, it would output an image of the inner organs and then use animation to show how blood flows from the heart to the body and back. We would be able to point at an organ and ask what role it plays in the system, and the model would answer back in voice.
On a side note: can newer multimodal LLMs solve Where's Waldo?