🔑 Key Takeaways

  1. Conducting experiments and A/B testing is crucial for product companies to make informed decisions, as even small changes can have unexpected impacts. It is important to allocate time to high-risk ideas and be prepared for failures. Trust and an experiment-driven culture are essential for successful experiments.
  2. Small changes can have a big impact on revenue. Conducting experiments and not underestimating seemingly trivial changes can lead to significant long-term benefits without compromising user experience.
  3. Small gains in metrics and experiments can lead to significant improvements in revenue, but it requires constant experimentation and evaluation to find the few ideas that truly make a difference.
  4. Embrace experimentation, document surprises, maintain a comprehensive database, and stay data-driven to drive innovation and make informed decisions in an organization.
  5. Experimentation is essential for success, but it requires finding a balance between small changes and risky ideas. Learning from failures and using controlled experiments can determine if users benefit from an idea. A/B testing is valuable in the software industry. Startups should implement it once they have enough users and a supportive platform.
  6. When conducting experimentation and A/B testing, startups should consider a clear overall evaluation criterion that balances revenue growth with user experience and long-term factors such as user satisfaction.
  7. Considering the long-term value of users and incorporating metrics such as retention rates and time to achieve a task is crucial for making accurate predictions and decisions.
  8. By implementing smaller changes and learning from each iteration, teams can avoid negative outcomes and find successful alterations. A data-driven approach is crucial in recognizing the value of incremental changes.
  9. Allocating resources to both known optimizations and high-risk endeavors, prioritizing experiments, and understanding project purpose and impact can optimize processes effectively.
  10. Companies like Airbnb need to prioritize data-driven decision-making, including conducting controlled experiments, to ensure long-term growth and success, especially during uncertain times like the COVID-19 pandemic.
  11. Trust is crucial in an experimentation platform as it provides a safety net for aborting and promoting safe deployments. Trustworthy results are essential for informed decision-making, highlighting the need for organizational trust in the experiment's scientific nature and control.
  12. Sample ratio mismatch in experiments can indicate a problem with the experiment. By using a formula/spreadsheet, one can calculate the probability of mismatch occurring by chance and take necessary actions to address it.
  13. Don't be fooled by impressive results. Investigate and analyze the data thoroughly before drawing conclusions, as they may contain flaws or be misleading. Be cautious of relying solely on P values for statistical significance.
  14. Implementing experimentation and A/B testing can lead to success in a company, even with some uncertainty. Lowering the p-value threshold and consulting experts can increase success, and starting with a focused team can shift the culture towards embracing experimentation.
  15. Building a platform for running experiments is crucial for effective decision-making. It should focus on self-service, reducing costs, strong analysis capabilities, trust, speed, and variance reduction techniques.
  16. Ronny Kohavi highly recommends books like "Calling Bullshit," "Hard Facts, Dangerous Half-Truths And Total Nonsense," and "Mistakes Were Made (But Not by Me)" for insightful perspectives and challenging commonly held beliefs. He also praises the TV series "Chernobyl" and emphasizes the impact of using structured narratives in product development.
  17. Using structured documents improves feedback, decision-making, and the ability to reference information, while prioritizing controlled experiments over anecdotal or observational studies leads to more data-driven decisions.

📝 Podcast Summary

The Importance of A/B Testing and Experimentation

Conducting experiments and A/B testing is crucial for product companies to drive growth and make informed decisions. Ronny Kohavi emphasizes the importance of testing every code change and new feature, as even small changes can have unexpected impacts. He recommends allocating time to high-risk, high-reward ideas, understanding that most experiments will fail. Ronny advises being ready to fail 80% of the time when pursuing significant innovations. He also highlights the significance of trust in successful experiments, along with the importance of creating an experiment-driven culture within a company. This conversation highlights the practicality of running experiments and the potential for surprising results that can lead to breakthroughs.

Uncovering Revenue-Boosting Changes Through Experiments

A small, seemingly insignificant change can have a substantial impact on revenue without compromising the user experience. In the case discussed, shifting the order of two lines in the search results increased Bing's revenue by around 12%, amounting to $100 million. This change did not negatively affect user metrics, unlike superficial methods like displaying more ads. It highlights the importance of conducting experiments to uncover unexpected results and learn from them. Additionally, the conversation emphasizes the significance of not underestimating the value of seemingly trivial changes and the need to prioritize and remember successful experiments to capitalize on their potential long-term benefits.

The Power of Small Improvements and Constant Experimentation

Small improvements can have a significant impact. Both Ronny Kohavi and Lenny discuss examples where small gains in metrics and experiments led to substantial improvements in revenue. Ronny mentions that at Bing, the relevance team's goal was to improve their metric by just 2% each year, a target reached by adding up many small gains from individual experiments. Similarly, at Airbnb, running 250 experiments resulted in a 6% increase in revenue. However, it's important to note that the majority of those experiments (92%) failed to improve the intended metric. So, while small improvements can be powerful, they require constant experimentation and evaluation to find the few ideas that truly make a difference.

The Importance of Institutional Memory and Documentation for Organizational Learning and Improvement

Institutional memory and documentation are crucial for organizational learning and improvement. Ronny Kohavi emphasizes the importance of summarizing the learnings from experiments and conducting regular meetings to discuss the most surprising experiments, whether they were successful or not. Surprising experiments, where the expected outcome differs significantly from the actual result, provide valuable insights and opportunities for learning. It is essential to document these surprises and remember them when designing future iterations or making decisions. Kohavi suggests maintaining a comprehensive database of successes and failures and enabling keyword search to easily retrieve experiment history. By actively embracing experimentation and staying data-driven, organizations can drive innovation and make informed decisions, even for seemingly small changes or bug fixes.

Balancing Incremental Changes and High-Risk Ideas in Experimentation

Experimentation is crucial for success, but it's important to strike a balance between incremental changes and high-risk, high-reward ideas. Ronny Kohavi emphasizes the need for a portfolio of experiments, some of which may lead to significant breakthroughs, while others may fail. It's important to allocate efforts to both types of experiments and be prepared for a high failure rate, especially when attempting big ideas. The conversation highlights the importance of learning from failures and using controlled experiments as the ultimate oracle to determine if users are benefiting from a particular idea. While A/B testing may not be suitable for all domains, it is highly valuable in the software industry, especially when a mature platform with low incremental costs is in place. Startups should consider implementing A/B testing once they have enough users and a platform that supports it.

The Importance of a Clear Overall Evaluation Criterion for Experimentation and A/B Testing in Startups

Experimentation and A/B testing can be valuable tools for startups, but it's important to have a clear overall evaluation criterion (OEC) in place. Simply optimizing for revenue is not enough, as it can lead to actions that harm the user experience in the long run. For example, adding more ads to a search page may increase revenue initially, but it can negatively impact user satisfaction and result in increased churn. The OEC should consider various metrics, such as time to successful result and percentage of successful sessions, to strike a balance between revenue growth and user experience. It's also crucial to consider long-term factors, such as user satisfaction after a purchase or stay.

The Importance of Long-Term Value and Metrics in Accurate Predictions and Decisions

In order to make accurate predictions and decisions, it is crucial to consider the long-term value of users and the countervailing metrics associated with a particular action. Ronny Kohavi emphasizes the importance of defining the OEC (Overall Evaluation Criterion) in a way that causally predicts the lifetime value of the user. By incorporating metrics such as retention rates and time to achieve a task, the OEC becomes more useful in driving long-term success. Moreover, Ronny suggests two approaches for understanding long-term metrics: running long-term experiments to learn and building models based on historical data and background knowledge. This conversation highlights the significance of considering both short-term and long-term impacts when making strategic decisions.

The Importance of Cautious and Iterative Redesigns

It is crucial to approach redesigns and large-scale changes in a cautious and iterative manner. Both Ronny Kohavi and Lenny highlight the negative consequences of full redesigns, emphasizing that they often lead to negative outcomes and require significant effort to rectify. Instead, they advocate for incrementally testing and adjusting changes along the way. By implementing smaller changes and learning from each iteration, teams can identify the ideas that actually work and avoid the negative impact of unsuccessful alterations. Additionally, they stress the importance of being open to failure and adopting a data-driven approach to decision-making. Running experiments and analyzing results can help organizations recognize the value of incremental changes and overcome the resistance to them.

Striking a Balance in Product and Process Redesign

When considering redesigning a product or process, it is important to strike a balance between taking big bets and iterating towards improvement. While completely redesigning something may offer the potential for breakthrough success, it is crucial to recognize that 80% of the time such attempts fail. Allocating resources to both known optimizations and high-risk, high-reward endeavors is a wise approach. This rule of thumb applies to many organizations and is evident in the allocation of resources at Google. Additionally, it is essential to prioritize running experiments and avoiding the shipment of features that do not provide value or even have a negative impact. By understanding the purpose and potential impact of each project, organizations can make informed decisions and optimize their processes effectively.

The Importance of a Data-Driven Approach for Airbnb's Success

Airbnb's shift towards a more top-down, vision-oriented approach may have hindered its potential for success. While design aspects were given attention by CEO Brian Chesky, the search team, responsible for neural networks and search algorithms, relied heavily on A/B testing before launching anything. The absence of controlled experiments in other teams, coupled with the departure of data-driven advocates like Greg Greeley, may have impacted Airbnb's overall performance. Ronny Kohavi also emphasizes the importance of continuing to experiment during the COVID-19 pandemic, as it enables companies to make informed decisions even in uncertain times. In retrospect, these insights highlight the importance of maintaining a data-driven approach and conducting controlled experiments for the long-term growth and success of a company like Airbnb.

Importance of Trust in Experimentation Platform and Results Analysis

Trust is crucial in running experiments and building an experimentation platform. Ronny Kohavi emphasized the importance of trust in two aspects. Firstly, he mentioned that the experimentation platform serves as a safety net, allowing quick aborts when something goes wrong, promoting safe deployments and velocity. Secondly, the platform provides trustworthy results at the end of an experiment, analyzing key metrics and debugging. Kohavi highlighted the need for organizational trust in the experiment's scientific nature and control. He cautioned against using real-time P value monitoring, which could lead to inflated error rates and false positive results, damaging trust in the platform. The conversation serves as a reminder that accurate and reliable experimentation is paramount to make informed decisions.
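
To illustrate why repeatedly checking the P value inflates error rates, here is a minimal simulation sketch. It is not from the conversation: the function name, the number of simulated experiments, the peeking interval, and the 1.96 cutoff (a two-sided 5% test) are all assumptions chosen for illustration. Every simulated experiment is an A/A test with no real difference, so any "significant" result is a false positive.

```python
import math
import random


def simulate_peeking(n_sims=500, n_obs=1000, peek_every=50, z_crit=1.96, seed=0):
    """Return (false positive rate with peeking, rate with one final look).

    Hypothetical A/A setup: both arms draw from N(0, 1), so there is no true effect.
    """
    rng = random.Random(seed)
    peeking_hits = 0
    final_hits = 0
    for _ in range(n_sims):
        running_diff = 0.0
        crossed_at_any_peek = False
        z = 0.0
        for n in range(1, n_obs + 1):
            # One control and one treatment observation; their difference has variance 2.
            running_diff += rng.gauss(0.0, 1.0) - rng.gauss(0.0, 1.0)
            if n % peek_every == 0:
                z = running_diff / math.sqrt(2.0 * n)
                if abs(z) > z_crit:
                    crossed_at_any_peek = True
        # Stopping at the first "significant" peek counts as a (false) win.
        peeking_hits += crossed_at_any_peek
        # The last peek coincides with the planned end of the experiment.
        final_hits += abs(z) > z_crit
    return peeking_hits / n_sims, final_hits / n_sims


peeking_rate, final_rate = simulate_peeking()
print(f"false positives with peeking: {peeking_rate:.1%}, single final look: {final_rate:.1%}")
```

Under these assumptions, the single final look stays near the nominal 5%, while declaring victory at any peek produces false positives several times as often.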

Addressing Sample Ratio Mismatch in Experiments

A common issue when running experiments is sample ratio mismatch, which occurs when the distribution of users between control and treatment groups is not as designed. This can be a red flag that something is wrong with the experiment. By using a formula or spreadsheet, one can determine the probability of such a mismatch occurring by chance. It was found that approximately 8% of experiments at Microsoft suffered from this issue. Bots and problems with the data pipeline are often the causes of sample ratio mismatch. To address this, a warning banner was initially added, but people ignored it. Eventually, a compromise was reached by highlighting the numbers in the scorecard with a red line to signal a sample ratio mismatch.
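
The probability calculation mentioned above can be sketched with a chi-squared goodness-of-fit test. This is one common way to do it, not necessarily the exact formula or spreadsheet referred to in the conversation; the function name, the 50/50 design split, and the example user counts are assumptions.

```python
import math


def srm_p_value(control_users, treatment_users, expected_control_share=0.5):
    """Probability of a split at least this uneven arising by chance (chi-squared, 1 df)."""
    total = control_users + treatment_users
    expected_control = total * expected_control_share
    expected_treatment = total - expected_control
    chi2 = ((control_users - expected_control) ** 2 / expected_control
            + (treatment_users - expected_treatment) ** 2 / expected_treatment)
    # For 1 degree of freedom, the chi-squared tail probability equals erfc(sqrt(chi2 / 2)).
    return math.erfc(math.sqrt(chi2 / 2.0))


# Hypothetical counts for an experiment designed as a 50/50 split.
p = srm_p_value(50_000, 51_000)
print(f"p-value for the observed split: {p:.5f}")
```

A very small p-value means the imbalance is unlikely to be chance, so the results should not be trusted until the cause (for example bots or a data pipeline problem) is found. The cutoff at which to raise the alarm is a design choice for the platform.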

The Importance of Being Skeptical and Investigative in Data Interpretation

We should be cautious when interpreting results that appear too good to be true. Ronny Kohavi explains that people have a natural bias towards wanting to see success, which can lead them to overlook flaws in the data. Twyman's law, a concept introduced by a person working in radio media, states that figures that look interesting or different are usually wrong. This emphasizes the need to investigate and not immediately celebrate extraordinary results, as there is a high probability of finding flaws in the experiment. Additionally, Ronny highlights the misconception around P values and advises against relying solely on them. The false positive risk, which tends to be much higher than commonly thought, should be considered when interpreting statistical significance.
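
The gap between a P value and the false positive risk can be sketched with Bayes' rule. The inputs below are assumptions for illustration: a prior success rate of 8% (matching the 92% failure rate quoted earlier for Airbnb), 80% power, and a one-sided significance level of 0.025 (a two-sided 0.05 test where only improvements count). The function name is hypothetical.

```python
def false_positive_risk(alpha, power, prior_success_rate):
    """Share of statistically significant results that are false positives."""
    # Ideas with no real effect that still cross the significance threshold.
    false_positives = alpha * (1.0 - prior_success_rate)
    # Ideas with a real effect that the experiment manages to detect.
    true_positives = power * prior_success_rate
    return false_positives / (false_positives + true_positives)


risk = false_positive_risk(alpha=0.025, power=0.8, prior_success_rate=0.08)
print(f"false positive risk: {risk:.1%}")
```

With these assumed inputs, roughly a quarter of the "significant" wins would be false positives, far more than the 5% that a P value of 0.05 is often misread to imply.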

The Power and Benefits of Experimentation and A/B Testing

Implementing experimentation and A/B testing in a company can be highly valuable and lead to success. Although data scientists are aware that experiments are not perfect and there is some uncertainty, launching positive experiments can still have a positive impact. It's okay to be occasionally wrong as long as the overall balance is in favor of successful experiments. Lowering the p-value threshold required to declare a win and replicating experiments can increase success and reduce the false positive rate. Additionally, keeping track of experiment failure rates and consulting internal experts can help in getting started. If there is resistance to experimentation, starting with a team or department that launches frequently and has a clear optimization goal can help shift the culture towards embracing experimentation. The success of experimentation in one area, like Bing, can influence and inspire other teams in the company. Using third-party experimentation platforms is also a viable option today.
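
As an illustration of how replication strengthens evidence, the sketch below combines the p-values of an original experiment and an independent replication using Fisher's method. This is one standard way to combine results and is an assumption here, not necessarily the procedure any particular company uses; the function name and example values are hypothetical. It assumes both p-values are one-sided and test the same direction of effect.

```python
import math


def fisher_combined_p(p1, p2):
    """Combine two independent p-values with Fisher's method."""
    # Under the null hypothesis, this statistic is chi-squared with 4 degrees of freedom.
    statistic = -2.0 * (math.log(p1) + math.log(p2))
    # Closed-form tail probability for a chi-squared variable with 4 degrees of freedom.
    return math.exp(-statistic / 2.0) * (1.0 + statistic / 2.0)


combined = fisher_combined_p(0.05, 0.05)
print(f"combined p-value: {combined:.4f}")
```

Two borderline results that agree yield a combined p-value well below either one alone, which is why a successful replication makes a surprising win much more believable.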

The Importance of Building an Experimentation Platform for Effective Decision-Making

Building a platform for running experiments is crucial for effective and efficient experimentation. Ronny Kohavi emphasizes the importance of self-service and reducing the marginal cost of experiments to zero. By providing a platform that allows users to easily set up, run, and analyze experiments, organizations can streamline the experimentation process. Kohavi also highlights the need for strong analysis capabilities within the platform to avoid the reliance on data scientists. Trust is vital in running experiments, but speed is also important. Kohavi recommends having a scorecard soon after the experiment finishes and utilizing variance reduction techniques, such as capping metrics and using pre-experiment data to adjust results. Overall, the conversation emphasizes the value of building a comprehensive experimentation platform to drive effective decision-making.
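
The two variance reduction techniques mentioned above can be sketched as follows. This is a simplified illustration under stated assumptions, not a platform implementation: the function names, the cap value, and the synthetic data are hypothetical, and the adjustment shown is a basic regression-style correction using each user's pre-experiment value of the same metric.

```python
import random


def mean(values):
    return sum(values) / len(values)


def variance(values):
    m = mean(values)
    return sum((v - m) ** 2 for v in values) / (len(values) - 1)


def cap(values, limit):
    """Capping: truncate extreme values so a few outliers do not dominate the metric."""
    return [min(v, limit) for v in values]


def adjust_with_pre_experiment(metric, pre_metric):
    """Subtract the part of each user's metric that pre-experiment data already explains."""
    mean_pre = mean(pre_metric)
    mean_metric = mean(metric)
    covariance = sum((x - mean_pre) * (y - mean_metric)
                     for x, y in zip(pre_metric, metric)) / (len(metric) - 1)
    theta = covariance / variance(pre_metric)
    # The adjustment keeps the average unchanged but removes predictable noise.
    return [y - theta * (x - mean_pre) for x, y in zip(pre_metric, metric)]


# Synthetic example: user activity during the experiment correlates with prior activity.
rng = random.Random(1)
before = [rng.gauss(10.0, 3.0) for _ in range(2000)]
during = [0.8 * x + rng.gauss(0.0, 2.0) for x in before]
adjusted = adjust_with_pre_experiment(during, before)
print(f"variance before adjustment: {variance(during):.2f}, after: {variance(adjusted):.2f}")
```

Lower variance means the same experiment can detect smaller effects, or reach a conclusion with fewer users. In a real A/B test the same adjustment would be applied to control and treatment together before comparing them.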

There are several books and a TV series that Ronny Kohavi highly recommends. One book called "Calling Bullshit" provides insightful perspectives on extreme claims and encourages skepticism. Another book, "Hard Facts, Dangerous Half-Truths And Total Nonsense," challenges commonly held beliefs, showing that many things we consider well-understood may lack justification. The book "Mistakes Were Made (But Not by Me)" explores the fallacies we often succumb to, leading to humbling outcomes. Additionally, the TV series "Chernobyl" is highly praised by Kohavi for its portrayal of the disaster. Furthermore, Kohavi discusses using structured narratives, a concept he learned at Amazon, as a minor change that has had a significant impact on their product development process.

Shifting from PowerPoint to Structured Documents Improves Feedback and Decision-Making

Replacing PowerPoint presentations with structured documents can have a significant impact. Ronny Kohavi shared his experience at Amazon, where they shifted from PowerPoint presentations to structured documents written in Word or Google Docs. This change allowed team members to provide honest feedback and made that feedback easy to document and reference after the meeting. Additionally, Ronny emphasized the importance of the hierarchy of evidence when making decisions based on information. It is crucial to trust controlled experiments, and especially multiple controlled experiments, over anecdotal evidence or observational studies. Understanding the concept of controlled experiments can help individuals make data-driven decisions and improve their overall decision-making processes.