---
title: "Rules of scientific English writing for an international audience."
date: 2024-12-30
tags:
    - speculation
    - language
    - writing
    - _recent-highlight
has_math: false
---

Although English is the common language for international scientific
communication, most scientists are not native English speakers. To account for
this, I think that all scientists (especially native English speakers) should
try to write text which is easy to read for non-native speakers. I propose the
following rules for this:

<!-- TEASER_END -->

1. _ABSOLUTELY NO CULTURAL REFERENCES._ Americans: pay extra attention to this!
   ([link](#no-cultural-references))
2. Use simple words. Repeated words are ok. ([link](#simple-words))
3. No slang, especially current slang. ([link](#no-slang))
4. Use literal language instead of metaphorical language.
   ([link](#literal-language-over-metaphorical-language))
5. Use simple sentence structures. In particular, minimize nested clauses.
   ([link](#simple-sentence-structures))
6. Avoid Latin/French/German phrases.
   ([link](#avoid-latinfrenchgerman-phrases))
7. Minimize the use of idioms/fixed phrases. If you really want to use an
   idiom, use quotation marks. ([link](#minimize-idioms))

## Explanations and examples

### No cultural references

More precisely, my rule is "no cultural references which will not be universally
understood". In practice, however, I think very few cultural references actually
are universally understood (as I explain below). Therefore, I think
"no cultural references" is generally a sufficient description. _If you really
want to include a cultural reference, at least include a footnote which
explains it._


#### Avoiding cultural references is hard

When I hear "avoid cultural references", I think of references to a specific
book/film/TV series that not everybody has read/watched. For example, if I
were to describe a procedure with excessively strict rules as
"soup-Nazi-esque", readers who have not watched a particular episode of the TV
series _Seinfeld_ would not understand it. This is easy to imagine and
therefore easy to avoid (especially since I know many people my own age and
with a similar background who have not watched this episode of Seinfeld).

This kind of reference is atypically easy to avoid. References which may be
less obvious are:

- **"Famous" people**: who is famous varies enormously between places _and_
  between generations. People of my parents' age may not know about Billie
  Eilish, and young people might not know about _The Rolling Stones_. Even
  "tier 1" American celebrities like Tom Cruise or Taylor Swift might not be
  known to, for example, a 60-year-old Chinese scientist.
- **Allusions to "classics":** it may feel safe to assume that people are
  vaguely familiar with "classics", at least their names/themes. Unfortunately,
  what is considered a "classic" varies widely by region. People in non-Western
  countries may not have watched _Star Wars_ or old Disney films.
  Non-anglophones may not have heard of Charles Dickens. Characters from Greek
  mythology (eg Zeus) are probably not well-known outside Europe. Even
  characters from the Bible may not be universally known (eg in largely atheist
  countries like China).
- **Historical figures**: similar to classics, not everybody will know about
  George Washington, Queen Victoria, Augustus Caesar, or Hammurabi.
- **Cuisine**: phrases like "the meat of the argument" reflect European cooking
  practices of a meat-centric meal (not shared in places like India, where
  vegetarian diets are common).
- **Stores/brands**: these vary a lot in different places! Don't assume readers
  will know "Rolex is a luxury watch" or "Ferrari is a fast, expensive, and
  high-quality car".

#### Americans: be extra careful

In my experience, Americans have a particularly strong tendency to overestimate
foreigners' familiarity with American culture. This is somewhat understandable
because American culture _is_ globally dominant and many foreign scientists are
familiar with many popular items from American culture such as McDonald's, rap,
jazz, the TV series _Friends_, etc. However, this familiarity should not be
extrapolated too far. Some examples:

- While the most popular US states might be known (eg California, Florida,
  Texas), less prominent states might not be (eg Montana, Idaho). Don't assume
  that people will know where these states are (even roughly) or their
  stereotypes (eg Nebraska has lots of farms).
- Most of the world does not learn US history (including the Civil War).
- People may not have heard of smaller US cities like St Louis or Denver.

If you are American and (A) do not have family who live abroad and (B) have not
lived abroad yourself, you should probably assume that you cannot judge what
foreigners will and will not know. Therefore, I suggest you default to assuming
nothing.

#### Is there really _nothing_ universal?

Some things are probably safe. For example, McDonald's has restaurants in a
large number of countries, so most people have probably heard of it. Probably
everybody has also heard of Obama, the Queen of England (Elizabeth II), and
Coca-Cola. But even when suggesting these, I have lingering doubts.

#### Exception: the culture of science itself

At least when writing for an audience of scientists, I propose that the
following references _are_ acceptable:

- **Famous scientists** like Einstein and Newton. Most likely all scientists in
  all fields will know at least these two names.
- **Cultural trends within science** like "open data" and "the reproducibility
  crisis".
- **Events** like the development of the atomic bomb, discovery of DNA, etc.

#### Heuristic basket of countries

If you are trying to assess whether a specific cultural reference will be
understood by an international audience, I suggest explicitly considering the
following 5 countries:

- USA
- Germany (representative European country)
- India (huge country with unique culture)
- China (also huge country with a unique culture, and also a lot of censorship
  of Western books/films)
- Japan (Westernized rich country with very non-Western history and culture)

This list is, of course, not representative of the world population (eg it lacks
any African, Middle Eastern, or South American countries), but this lack of
representation is (unfortunately) also reflected in the worldwide population of
scientists (at least in my experience). At the very least, the inclusion of 3
non-Western countries will help avoid extremely Euro-centric references.

### Simple words

Non-native speakers tend to have a smaller vocabulary. Using simpler words helps
readers avoid needing to interrupt their reading to consult a dictionary. Some
examples of simplified words:

- Multitude/numerous/plethora/innumerable/countless => many/a lot
- Utilize/employ/make use of/operate => use
- Obtain/acquire/procure/secure => get
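
As a rough illustration, the substitutions listed above could be turned into a
small checker that flags difficult words in a draft. This is only a sketch: the
table `SIMPLER` and the function `suggest_simpler_words` are hypothetical names
chosen for this example, the table contains only the single-word entries from
the list above, and the matching is deliberately simple (it does not recognize
other forms such as "utilized").

```python
import re

# Hypothetical lookup table built from the example substitutions above.
# Only exact single words are listed; extend it for your own writing.
SIMPLER = {
    "multitude": "many",
    "numerous": "many",
    "plethora": "many",
    "innumerable": "many",
    "countless": "many",
    "utilize": "use",
    "employ": "use",
    "obtain": "get",
    "acquire": "get",
    "procure": "get",
}


def suggest_simpler_words(text):
    """Return (word, suggestion) pairs for each listed word found in text."""
    suggestions = []
    for match in re.finditer(r"[A-Za-z]+", text):
        word = match.group(0)
        simpler = SIMPLER.get(word.lower())  # case-insensitive lookup
        if simpler is not None:
            suggestions.append((word, simpler))
    return suggestions


print(suggest_simpler_words("We utilize numerous samples."))
# prints [('utilize', 'use'), ('numerous', 'many')]
```

A tool like this can only point out candidate words. The writer still has to
decide whether the simpler word keeps the intended meaning.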

**Potential objection 1:** simplifying words loses meaning. For example, saying
"there are many stars in the sky" instead of "there are innumerable stars in
the sky" removes the implication that the number of stars is so large that one
cannot intuitively reason about the number. A reader could potentially
interpret "many" stars as 100, while "innumerable" would not be interpreted
that way. **My counter-argument:** usually these subtle distinctions are not
important, and if they are important then they should be stated explicitly. For
example, one could say "there are many stars in the sky, at least one
trillion."

**Potential objection 2:** I was explicitly taught in high-school English class
that using the same word too many times makes text hard to read. For example,
"the sky contains many stars, many black holes, many pulsars, and many
galaxies" could be changed to "the sky contains many stars, a multitude of
black holes, countless pulsars and innumerable galaxies." **My
counter-argument:** the goal of scientific writing is clear communication, not
enjoyment. Although I personally enjoy reading the second version of the
sentence more, I am a native English speaker. I probably would have enjoyed the
sentence much less if I needed to open a dictionary to understand it.
Therefore, I think this high-school English rule should _explicitly be ignored_
in scientific writing.

### No slang

Most people try to avoid slang already, but it is easy to accidentally include
slang words that you use in everyday speech, eg "cool" or "rough" (like "rough
time"). Always double check your text.

### Literal language over metaphorical language

Metaphorical language is very common in English, and is often considered a key
part of "educated" writing. However, readers who are unfamiliar with a
particular metaphor are likely to interpret it literally before realizing that
it is a metaphor. Therefore, I suggest using literal language instead of
metaphors whenever possible.

Here are some examples:

- "Flowery language": unless the text is actually about flowers, this is a
  metaphor. I suggest "fancy language".
- "Warm greeting": unless the greeting had a high temperature, "friendly
  greeting" is more literal.
- "Hot topic": an abstract concept such as a topic does not have a temperature.
  "Popular topic" is more literal.

**Potential objection:** these uses of words are all legitimate. The
metaphorical meanings of "flowery", "warm", and "hot" all appear in the
dictionary. **My counter-argument 1:** while they do appear in the dictionary,
they are not the first entry. For example, the metaphorical usage of the word
"hot" above was entry #4 in the online dictionary I used. It is not fair to the
reader to make them read 4 dictionary entries. **My counter-argument 2:** these
listings appear in the dictionary because the metaphors are common, not because
they are "legitimate" usages of the words.

### Simple sentence structures

Use simple sentence structures which are easy to read. I think "nested clauses"
in particular cause a lot of difficulties and deserve special attention. An
example of a sentence with a nested clause is:

> The teacher who has students that study hard will see their students succeed.

Native speakers will fairly easily parse this as:

> The teacher (who has students [that study hard]) will see their students
> succeed.

However, non-native speakers may find this more difficult, and even native
speakers may find it hard to read this sentence quickly. Instead, write
something like:

> The teacher will see their students succeed if their students study hard.

This avoids the nested clause.

### Avoid Latin/French/German phrases

Some Latin/French/German phrases effectively function as advanced English
vocabulary, for example:

- _lingua franca_: common language
- _ad infinitum_: repeated infinitely
- _carte blanche_: complete freedom to do what you think is best
- _coup de grâce_: final action which ends a bad situation
- _ansatz_: initial guess

Remember that the contemporary scientific community is not Europe in 1800.
International scientists will be highly educated, but this education will
likely not include French/German/Latin (unless they are in France or Germany).
Using these words makes your text _less_ international, not more. In almost all
cases the English translations are preferable.

NOTE: there are some notable exceptions. I think the abbreviations _i.e._ and
_e.g._ are common enough that most people know them. Some foreign words are
part of the established nomenclature in some fields (like _ansatz_ in physics
and _coup d'état_ in history).

### Minimize idioms

Idioms (sometimes called "expressions", "set phrases", "sayings", etc) are
effectively difficult words which non-native speakers may not know. However,
idioms are harder for readers to understand than difficult words because:

1. They may not appear in a dictionary.
2. It might be difficult to know which words in a sentence are included in the
   idiom.

Therefore, I suggest avoiding idioms as much as possible.

If you ignore this advice and use an idiom anyway, I suggest _surrounding it
with quotation marks_. Although this may look ugly, it clearly signals to the
reader that the words within the quotations should be parsed together and can
be copy/pasted into a search engine.

#### Native speakers beware!

**This is harder than it seems, especially for native speakers.** For example,
when I think about avoiding idioms I imagine the phrase _"it's raining cats and
dogs"_ which has a fixed non-literal meaning (it's raining intensely) but has
an obviously misleading literal meaning. While I agree that this phrase should
not appear in scientific writing, I think this is _not_ the phrase you should
have in mind when you think about avoiding idioms. If a non-native speaker
reads this phrase, it is obviously so absurd that they would probably search
for its meaning.

Instead, I suggest imagining the phrase _"once upon a time"_. This is a fixed
phrase which most native English speakers probably know, and unlike "raining
cats and dogs" it has no obvious "wrong" interpretation. However, read
literally word-by-word, the phrase does not make much sense. Additionally, it
might be difficult for non-native speakers to identify the fixed phrase within
a larger sentence. Imagine the sentence:

> Once upon a time chemists wanted to transform lead into gold.

Native speakers will parse this as:

> (Once upon a time) (chemists wanted to transform lead into gold).

However, a non-native speaker may try to parse it as:

> Once (upon a time chemists wanted) to transform lead into gold.

Does this mean that one time, _at a time which chemists wanted,_ they
transformed lead into gold? That doesn't make sense. Maybe:

> (Once upon) (a time chemists wanted) (to transform lead into gold).

Does this mean that when a specified desired time was reached, chemists
transformed lead into gold? Again, this doesn't make sense. What about:

> (Once upon a time chemists) wanted to transform lead into gold.

Some kind of special chemists wanted to transform lead into gold?

Hopefully this gives you a sense of how non-native speakers may struggle to
figure out what to look up.

#### Some other idioms to avoid

Here are some idioms which I use very naturally as a native speaker, but which
are probably confusing to non-native speakers:

- "few and far between": far between what? Just write "rare".
- "time and time again"/"from time to time"
- "on the go"/"on the run"
- "fall flat": be received poorly / not be understood
- "keep in mind": I think "make a mental note" is clearer
- "bird's eye view": "overview" is clearer

#### References to idioms

References to idioms do not make sense unless the reader knows the idiom. I
highlight this specifically because references to idioms are fairly common, but
it may not occur to native English speakers that they are making a reference.

Example: a biologist might encourage others to distribute their cells between
multiple freezers to "avoid putting all your cells in one basket". This is a
reference to the idiom "don't put all your eggs in one basket", which generally
means "don't allow one accident (like dropping the basket) to ruin everything
you have (like breaking all your eggs)". However, the recommendation does not make
sense if the reader does not know this idiom. I suggest just stating the
consequence literally, eg "avoid losing all your cells if one freezer loses
power".

## Potential objections to these rules, and my responses to these objections

- **These rules encourage dull writing**: it is possible to write engaging text
  without breaking these rules. However, the point of scientific writing is not
  to be exciting or entertaining to read. If you want this, then go read a
  fiction book!
- **Idioms and cultural references make text more interesting to read, I want
  to use them**: first, remember that the interest is lost if you don't get the
  references. Second, under my rules you can still include them if you really
  want to: just use quotation marks and footnotes to explain them.
- **Large language models largely solve this problem**: true, you can ask an
  LLM to explain the idioms and cultural references in a sentence. I tried
  ChatGPT a few times and it did a good job. However, I don't think it is kind
  to your readers to make them use ChatGPT to understand the text of your
  paper, especially when it is mostly a stylistic choice.
- **We should not "dumb down" our English writing, foreign scientists should
  instead improve their English**: this is true to a degree. For example, I
  would not advocate using the word "non-stop" instead of "continuous" just
  because "continuous" is a more difficult word. However, things like metaphors
  and cultural references are largely _optional_ in scientific writing and are
  somewhat unrelated to language skill (in my opinion). Try to empathize with a
  50-year-old Japanese scientist working in Japan. Their life is entirely in
  Japanese, except for reading/writing papers in English and speaking to
  foreigners in English at international conferences approximately once per
  year. How much time do you want them to dedicate to learning advanced English
  to understand your "literary" writing?

## Conclusions

These rules all stem from my personal experience studying foreign languages
(mostly Chinese and French). Basically, try to write text that is easy to read
and eliminate unnecessary complexity whenever possible.

