Latent Space Podcast 6/8/23 [Summary] - From RLHF to RLHB: The Case for Learning from Human Behavior - with Jeffrey Wang and Joe Reeve of Amplitude
Explore AI & analytics with Jeffrey Wang & Joe Reeve on Latent Space Live! Dive into why AI values Analytics and the power of first-party behavioral data.
Original Link: From RLHF to RLHB: The Case for Learning from Human Behavior - with Jeffrey Wang and Joe Reeve of Amplitude
Summary
AI at Amplitude with Alessio, Jeffrey, Joe, and Swyx
Alessio, the host, starts by explaining the podcast's focus on AI research and its application, emphasizing the show's technical orientation.
Jeffrey, a co-founder and Chief Architect at Amplitude, introduces himself. He’s been in the field of product analytics for about a decade, helping businesses understand user behavior data to make informed product decisions. He finds recent trends in AI especially exciting, recognizing AI's increasing importance in product data and development.
Joe Reeve shares his background in tech startups and mentions his current role leading AI R&D at Amplitude. Both he and Jeffrey express enthusiasm for the innovations they're exploring.
The discussion moves to Amplitude's journey with AI. Jeffrey provides an overview: Amplitude's primary goal is helping customers utilize their product data to enhance their offerings. With digital products generating vast amounts of data, sifting through this information is challenging. Amplitude aims to bridge the gap between the massive data and actionable insights, viewing this as an AI-centric problem. He also recounts the origin of Amplitude, emphasizing the significance of product analytics in the gaming sector, where meticulous analysis of user data is critical.
Drawing parallels, Swyx mentions other tech giants like Slack and Discord that originated from gaming ventures.
The conversation shifts to current R&D efforts. Joe highlights their dual approach: implementing AI in their products and supporting their customers in doing the same. They've developed a framework to identify areas for AI integration by examining collaboration touchpoints, deeply embedding features, and creating supplementary AI tools. One of their recent tools allows users to input queries, which are then translated into charts. However, they faced challenges with AI models losing context or generating errors.
In essence, the conversation offers insights into how a tech company like Amplitude is navigating the ever-evolving world of AI, dealing with challenges, and envisioning the future.
Harnessing AI in Product Evolution: Challenges and Breakthroughs
Swyx and Joe discuss the evolution and application of machine learning models (ML) and large language models (LLM) in product development.
Products Released: Joe elaborates on a feature they rolled out called "question to chart" which allows users to ask a question and get a chart as a response. This new feature is part of their LLM initiative and was rolled out to AI design partners.
Contrast with Existing ML Models: Swyx points out a potential conflict in using both traditional ML models and the new AI initiatives. Joe explains that they use traditional ML to narrow down the data passed to LLMs, which then handle the more complex reasoning.
Challenges Faced: Joe mentions two key pain points: hallucination and multi-query issues. To address these, they're tracking inferences and trying to understand why certain models fail.
Utilizing AI in Amplitude Products: Joe highlights that they use their own product, Amplitude, to measure and improve AI's performance. They are also working on helping companies build AI products with Amplitude.
Measurement of AI Effectiveness: Joe touches upon the challenge of assessing AI's performance. Many companies struggle with understanding how effective their AI models are. By measuring user behaviors, like content sharing, Joe believes that they can get a more accurate gauge of AI's value.
Potential for A/B Testing with AI: They discuss the idea of using AI to generate variations of user interfaces and content, then testing which variations perform best. This leads to a broader conversation about self-improving products and how generative AI can automate processes. Jeffrey gives an example of how generative models might change copywriting by auto-generating and optimizing content.
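The idea of judging AI output by user behavior, and comparing variants on that basis, can be sketched as follows. This is a minimal illustration, not Amplitude's implementation: the event names ai_output_viewed and ai_output_shared, the tuple layout, and the function name are all hypothetical.

```python
from collections import defaultdict

def share_rate_by_variant(events):
    """Return the fraction of exposed users who shared, per variant.

    events: iterable of (user_id, variant, event_name) tuples.
    The event names below are hypothetical placeholders.
    """
    exposed = defaultdict(set)  # variant -> users who saw AI output
    shared = defaultdict(set)   # variant -> users who shared AI output
    for user_id, variant, event_name in events:
        if event_name == "ai_output_viewed":
            exposed[variant].add(user_id)
        elif event_name == "ai_output_shared":
            shared[variant].add(user_id)
    # Only count shares from users who were actually exposed.
    return {
        variant: len(shared[variant] & users) / len(users)
        for variant, users in exposed.items()
        if users
    }
```

A variant whose generated content is shared more often would be treated as the stronger candidate, subject to the usual sample-size and significance checks.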
In essence, the conversation revolves around the potential of AI in product development, its challenges, and the innovative solutions being created to address them.
Just-In-Time UIs and The Evolution of Analytics
Shaping the Future of User-Interface with Advanced Analytics and AI
Alessio delves into the concept of Just-In-Time UIs, in which the interface is generated around what each user wants to see. He mentions how platforms like Amplitude have transitioned from solely dashboard-driven designs to offering displays tailored to user preferences.
Jeffrey envisions multiple paths for the future of analytics. He champions the precision of SQL and code, emphasizing their clarity over natural language. However, he also recognizes the utility of natural language interfaces for those unfamiliar with data structures. Jeffrey believes the balance between natural language expressiveness and code precision will evolve, with natural language leading the initial inquiry phase and precise code defining specifics.
Joe emphasizes the potential of models becoming more integrated features. He stresses the need for users to understand the model's actions and maintain the ability to intervene. Joe also highlights the importance of providing detailed feedback data for continuous model improvement.
Swyx points out the challenges tied to chat interfaces, raising questions about how to optimize user interaction time. He references Copilot's unique metrics on code retention as a successful approach to gauge product efficacy.
Jeffrey and Joe further elaborate on selecting the right AI models, with an inclination towards using general models for exploration. They note the efficiency and cost-effectiveness of embedding-based approaches once the utility of a general model is confirmed. Both stress the importance of understanding user intent, with Jeffrey emphasizing the challenges of quantifying match quality with LLMs.
The discussion ends with Alessio posing a thought on the interplay between the quality of the model versus the volume and quality of data, with Jeffrey noting that both are crucial components in the AI ecosystem.
Striking a Balance: Ethical AI Training and the Nature of Intelligence
The discussion among Swyx, Joe, Jeffrey, and Alessio centered on the ethics and best practices of training AI models on user data, especially when users might not be fully aware that their data is being used.
Ethics on Training with User Data:
Swyx initiated the conversation, inquiring about the ethical implications of using user data for training without their knowledge.
Joe emphasized the importance of keeping Personally Identifiable Information (PII) away from training data to prevent unintended information leakage, such as an AI accidentally generating a social security number. Joe noted ongoing experiments to strip PII from prompts and other data while still allowing models to operate effectively.
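Stripping PII from a prompt before it reaches a model can be sketched as below. This is a minimal illustration that assumes simple regular expressions are sufficient; the two patterns and the redact_pii name are hypothetical, and the episode does not describe how Amplitude's experiments actually work. Real systems usually need broader detection than this.

```python
import re

# Hypothetical patterns covering two common PII shapes.
PII_PATTERNS = {
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
}

def redact_pii(text):
    """Replace each matched PII span with a bracketed label."""
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text
```

Replacing values with labels rather than deleting them keeps the sentence structure intact, so the model can still reason about the prompt.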
User Privacy & Tracking:
Jeffrey spoke about the spectrum of user privacy concerning tracking. On one end, some users don't want any tracking, while on the other, there are those who are fine with being tracked across the internet. The challenge is to strike a balance between tracking everything and having insufficient data to improve products. Jeffrey underscored the importance of using pseudo-anonymized first-party data to improve products without breaching privacy.
Creativity & Intelligence in AI:
During a lightning round of questions, both Jeffrey and Joe touched on the progress of AI in areas previously thought to be purely human domains, like creativity. Jeffrey was surprised by the creative potential of AI in tasks like image and text generation, while Joe discussed the rapid advancements in AI, especially how they've been integrated with the internet.
Defining Intelligence:
Jeffrey posed a fundamental question about the nature of intelligence, noting how traditional benchmarks of human intelligence, like chess, have been surpassed by AI. The conversation touched on concepts like free will and the efficiency of the human brain compared to artificial systems.
Takeaways:
Jeffrey stressed that we are still in the early stages of the AI revolution, implying that there's a lot more to come. Joe highlighted the importance of letting machines augment human capabilities rather than replace them, advocating for the harmony between human skills and machine capabilities.
The session ended with Swyx opening up the floor for audience questions.
Customized AI Query Mechanisms in Chatbots
Audience Inquiry: A member of the audience inquired about the inner workings of an AI-driven chatbot, particularly curious about its model and how it turns single English queries into multiple sub-queries.
Joe's Response:
Their system uses a custom query engine instead of SQL.
They categorize questions to determine chart types, for instance, if they should use segmentation, line, or funnel charts, along with chart naming.
They initially considered LangChain, a tool great for prototyping, but found it limiting and instead developed an internal wrapper with TypeScript. This allows them to write code and infer transactions within their system.
Currently, they use GPT-3.5 but are considering integrating GPT-4 and other models in the future. They also have fallback plans for clients who prefer not to use OpenAI by employing internal or open-source models trained on smaller datasets.
Jeffrey's Insights:
The key to their system's effectiveness is breaking down problems sufficiently, allowing them to guide the model to make specific decisions like chart type selection.
GPT models can be overwhelmed by too much information, so providing contextual prompts and breaking tasks down yields higher-quality output.
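The decomposition described above, where the model makes one narrow decision at a time, can be sketched as follows. This is an illustration under stated assumptions: ask_model stands in for any function that sends a prompt to a model and returns its text reply, the prompt wording is invented, and the chart types are the three named in the summary. It is written in Python for brevity, whereas the team's own wrapper is in TypeScript.

```python
CHART_TYPES = ("segmentation", "line", "funnel")

def build_chart_spec(question, ask_model):
    """Turn a question into a chart spec using two small model calls.

    ask_model: hypothetical callable, prompt string -> reply string.
    """
    # Step 1: a closed-choice question, so the reply can be validated.
    type_prompt = (
        "Choose one chart type for the question below. "
        f"Answer with exactly one of: {', '.join(CHART_TYPES)}.\n"
        f"Question: {question}"
    )
    chart_type = ask_model(type_prompt).strip().lower()
    if chart_type not in CHART_TYPES:
        raise ValueError(f"unexpected chart type: {chart_type!r}")

    # Step 2: name the chart, now that the type is fixed.
    name_prompt = f"Write a short title for a {chart_type} chart answering: {question}"
    title = ask_model(name_prompt).strip()
    return {"type": chart_type, "title": title}
```

Because each step has a small, checkable answer, an out-of-range reply is caught immediately instead of propagating into the final chart.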
Swyx's Questions & Insights:
Swyx was interested in their experience with LangChain and their choice of databases.
Joe mentioned they have been using embedding (vector) search in production with Postgres for a while, and recently shifted to pgvector.
Swyx highlighted the importance of separating taxonomies from actual data to ensure data protection and prevent prompt injection.
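Embedding search over a taxonomy, kept separate from the underlying user data, can be sketched as below. This is a minimal in-memory illustration: the function names, the event names, and the two-dimensional vectors in the usage note are made up, and real embeddings would come from an embedding model. In Postgres, pgvector performs the equivalent ranking in SQL through its cosine distance operator, `<=>`.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def top_matches(query_vec, taxonomy, k=2):
    """Return the k taxonomy names most similar to the query vector.

    taxonomy: dict mapping an event name to its embedding vector.
    Only names and vectors are held here, never user records.
    """
    ranked = sorted(
        taxonomy,
        key=lambda name: cosine_similarity(query_vec, taxonomy[name]),
        reverse=True,
    )
    return ranked[:k]
```

For example, with a taxonomy of {"Sign Up": [1, 0], "Purchase": [0, 1], "Add to Cart": [0.6, 0.8]} and a query vector of [0, 1], the two closest names are "Purchase" and "Add to Cart". Only those names would then be placed in the prompt.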
In essence, the discussion revolved around the intricacies of implementing a chatbot, the choice of models, and ensuring both efficiency and security in their system.
Unpacking Future Potential: Behavioral AI and Predictive Patterns
An audience member asks about a novel model, a hypothetical "Amplitude GPT" trained on user behavior data, and wonders what it could do. Jeffrey outlines the potential of such models. He anticipates superior predictive capabilities, especially for predicting user group behaviors such as churning, purchasing, or upgrading. His vision extends to understanding deeper patterns in sessions, enabling unsupervised categorization of users based on future outcomes. This would help explain why groups of users behave differently and begin to answer hard questions about causation in product analytics.
Jeffrey also highlights how difficult it is to interpret data from tools that automatically record user sessions, because that data is noisy. Ideally, a perfect behavioral model could interpret such data, eliminating the need for manual instrumentation. While such a model is still speculative, the aspiration is to make the analytics process more seamless and intuitive in the future. The conversation concludes with Swyx expressing gratitude to the listeners.