Coding Python packages with AI

I tried using some new LLM tools to code two entire Python packages (instead of editing a handful of lines at a time, as I had done previously). It went well! These tools are not perfect, but they are useful!

Read more…

Why your active learning algorithm may not do better than random

I am a big fan of active learning, but I am also acutely aware of its potential failure modes. A common failure mode is random-like performance: achieving no better "success"1 in picking points than a random policy. A flawed implementation can certainly produce this.2 However, for some problems it may not be possible to beat random-like performance at all. In this post I try to explain why.
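To make the comparison concrete, here is a minimal pool-based sketch of benchmarking an uncertainty-sampling policy against random selection. Everything here (the toy data, the `pick_uncertain` helper, the budget of 50 queries) is my own illustration, not from the post itself:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)

# Toy pool-based setup: a small labeled seed set, an unlabeled pool,
# and a held-out test set from the same distribution.
def make_data(n):
    X = rng.normal(size=(n, 5))
    y = (X[:, 0] + 0.5 * rng.normal(size=n) > 0).astype(int)
    return X, y

X, y = make_data(500)
X_test, y_test = make_data(500)
seed, pool = list(range(20)), list(range(20, 500))

def pick_random(candidates, model):
    return int(rng.choice(candidates))

def pick_uncertain(candidates, model):
    # Query the pool point whose predicted probability is closest to 0.5.
    proba = model.predict_proba(X[candidates])[:, 1]
    return candidates[int(np.argmin(np.abs(proba - 0.5)))]

for policy in (pick_random, pick_uncertain):
    labeled, candidates = list(seed), list(pool)
    for _ in range(50):  # query budget
        model = LogisticRegression().fit(X[labeled], y[labeled])
        i = policy(candidates, model)
        labeled.append(i)
        candidates.remove(i)
    final = LogisticRegression().fit(X[labeled], y[labeled])
    print(policy.__name__, f"test accuracy: {final.score(X_test, y_test):.3f}")
```

If the two printed accuracies are indistinguishable over repeated seeds, the active policy is exhibiting exactly the random-like performance described above.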

Read more…

Using LLMs to improve my Chinese

I've been learning Chinese for almost 10 years now, but I still produce awkward-sounding sentences when I speak. A few months ago I thought "why not use LLMs to help me speak more naturally?", and found that it does not take much prompting to get useful feedback. Here is a conversation with Claude 3.5 from a few months ago:

Read more…

Conceptual confusion about desirable outputs of reaction prediction models

In the literature about machine learning for retrosynthesis, one line of work tries to predict chemical reactions, either in the forward direction (ie what products will A + B form) or in the backward direction (ie what reactants could react to produce molecule C). Such models are often trained on datasets of known reactions like Pistachio or USPTO, with the hope of generalizing to new "correct" reactions. However, this formulation of the problem overlooks a lot of subtleties about what makes a reaction "correct". In this post I will present a more nuanced mental model which (hopefully) clarifies some ambiguities.

Read more…

Punishing poor reviewers at CVPR

This year CVPR pledged to make all authors participate in peer review, and to reject papers from authors who wrote low-quality reviews.1 Last week they confirmed on Twitter that they had followed through, rejecting 19 papers. Presumably this is a tiny fraction of all submissions, but I hope it acts as an effective deterrent for future authors. At the very least, I'm glad that a major conference tried something like this!

Why don't ML conferences provide reviewer instructions?

I remember when I first received an invitation to review papers for an ML conference in late 2020. What surprised me most was not that I was being invited (even though that was a surprise, since I was just a second-year PhD student who had only just finished writing a paper myself). Instead, it was the lack of instruction on how to assess the papers: essentially just "write your reviews by date X" and "evaluate novelty, significance, soundness, etc". In fact, in all the years since, I don't think I have ever received explicit instructions for reviewing ML conference papers.1

Read more…

Alpha over LLMs

On a recent podcast, Patrick McKenzie mentioned the idea of "alpha over LLMs": does a publisher produce text with any meaningful advantage over asking an LLM? I think this is an important question for anybody trying to write regularly, even if their readership is small (eg this blog). I interpret it as:

  • People should not put out content which is obviously wrong and can be corrected by an LLM (eg "I have theory X" where asking an LLM provides clear and convincing counter-arguments to X).
  • People should also not put out content which is worse than the answer you get from asking an LLM (eg the same content but explained less clearly).

I will generally try to uphold this principle in future blog posts.

Is offline model-based optimization a realistic problem? (I'm not convinced)

This is a "quickpost": a post which I have tried to write quickly, without very much editing/polishing. For more details on quickposts, see this blog post.

Offline model-based optimization (OMBO in this post) is essentially one-shot optimization from a fixed dataset. You see the data, do whatever you want with it, then propose a batch of query points, which are evaluated exactly once. Hopefully one (or, ideally, most) of the query points is optimal or near-optimal. End of task.
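To make the setup concrete, here is a minimal sketch of the OMBO protocol. The toy objective, the random-forest surrogate, and the batch size are all illustrative choices of mine, not from any specific benchmark:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)

# Fixed offline dataset: inputs X and noisy scores y from an unknown objective.
X = rng.uniform(-2.0, 2.0, size=(200, 4))
y = -np.sum(X**2, axis=1) + 0.1 * rng.normal(size=200)

# Step 1: fit a surrogate on the fixed dataset (no further queries allowed).
surrogate = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)

# Step 2: propose one batch of query points, eg by scoring random candidates
# with the surrogate and keeping the top k.
candidates = rng.uniform(-2.0, 2.0, size=(10_000, 4))
scores = surrogate.predict(candidates)
batch = candidates[np.argsort(scores)[-8:]]  # k = 8 proposals

# Step 3: the batch is evaluated once by the true objective, and the task ends.
true_values = -np.sum(batch**2, axis=1)
print("Best proposed value:", true_values.max())
```

The whole task is those three steps; there is no second round of queries, which is what makes the setup one-shot.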

Read more…

Experiment: more posts, lower quality

Since starting my new position at Valence, my efforts to write more on my blog have clearly been successful.1 However, my internal list of "things I would like to write a blog post about" is growing far faster than I can actually write the posts, at least for the topics where I think it is worth putting an opinion online.

Read more…