I have been accumulating these observations for many years. Writing textbooks has led me to think about how best to present mathematics. I have also noted writing errors commonly made by my thesis students and in papers submitted to journals. Here I collect my conclusions.
My first objective for this document was to educate my students, thereby reducing the time needed to edit their theses. Since it exists, I have made it publicly available in the hope that others may find it useful. If you don't find it useful (or if you object to it on principle), then please ignore it. I hope to make some writers of mathematics (especially students) aware of issues they may not have considered; small changes can produce mathematical writing that is easier to read by wider audiences.
After an introductory explanation of why care in writing mathematics is needed, I discuss (1) mathematical style, (2) notation and terminology, (3) punctuation and English grammar as used in mathematical writing, and (4) English usage for non-native speakers. Some points are minor distinctions, but even these make mathematical writing clearer when used consistently. My intent is not to make writing rigid, but rather to make it transparent to avoid distracting the reader by ambiguities or awkwardness in the flow of the narrative.
Mathematical style: Abstract/Intro/Conclusion · syntax for definitions · "where" in definitions · "double-duty" definitions · "Let G=(V,E) be a graph" · expressions as units · separation of formulas · notation starting sentence · "let x,y be" · conditions in parentheses · mixing words & notation · "Let .... Then" · "When/For/Since" · "As/For" as reasons · "Hence/Thus/Therefore" · "by Theorem X" · "so" vs. "so that" · "such that" vs. "so that" · "Assume/Suppose/Let" · "any/each/every" · universal quantifiers · "less/fewer" · sets vs. sizes · possessives on notation · nested proofs · "best possible" · numerals vs. words
Terminology/Notation: ":=" (for definitions) · ":" vs. "|" for "such that" · "sequence/series/list" · "v1,v2,…,vn" · lists of relations · "k=1,2,...,n" · "Big Oh" notation · "maximum degree Δ" · hyphens for parameters · vertex vs. edge parameters · two-word adjectives · adverbs and "well-known" · notation for paths · order and size of graph · "h∈G"; graphs are not sets · digraphs and hypergraphs · connected components · "maximal" vs. "maximum" · multicharacter operators · "induct on", "by induction" · "clique" or "complete subgr." · isomorphism vs. subgraphs · "proper coloring" · "partitions" vs. "parts" · "pairwise" vs. "mutually" · "pairwise disjoint/isomorphic" · "union/join" · edge or path "between" · set minus · "left hand side"
English usage: introductory words · quotations/periods · which/that · antecedents · naked "this" · "distinct/unique" · contractions · "i.e." and "e.g." · "different than" · articles ("a/the") · possessives & titles · capitalization & titles · adjectival names · conjunctions & commas · semicolons · excessive commas · serial comma · appositives · passive voice · "the below" · "either" · "we have been proving" · "non-" include hyphen? · "placement of citation"
For non-native speakers: "bound of" · "a joint work" · "few" vs. "a few" · "usual" · "partial case" · "passing a vertex" · "can not" and "may be" · "evidently" · "principal" vs. "principle" · more extra commas · expressions to avoid
Live mathematical conversations use many shortcuts that are inappropriate in precise mathematical writing. The context is known by all participants, and shortcuts evolve to save time. Furthermore, the speaker can immediately clarify ambiguity. Without immediate access to the author, written mathematics must use language more carefully. Also, mathematical concepts are abstract, without context from everyday experience, so the writing must be more consistent to make the meaning clear. Outside mathematics, imprecise writing can still be understood because the objects and concepts discussed are familiar.
Some mathematicians object to some of my recommendations. Many time-honored practices in the writing of mathematics are grammatically incorrect. These mistakes in writing cause no difficulty for readers with sufficient mathematical sophistication or familiarity with the subject, but it is unnecessary to restrict the audience to such readers. A bit of care leads to clearer writing that makes mathematics more easily accessible and readable to a wider and less specialized audience.
Some languages have conventions of usage or grammar that lead to typical errors in English mathematical writing by their native speakers. I discuss some such items in a separate section at the end. My explanations use terms for English parts of speech and punctuation, giving technical reasons for some recommendations. I hope that readers who are unfamiliar with these terms will still benefit from seeing what the choices are.
I apologize in advance for my own grammatical errors. Habits die hard, and it is easy to err in applying principles of writing. In particular, there are inconsistencies between what I propose here and what I wrote in my earlier books. Those books were written in the previous millennium, and I have learned many things about writing since then. Also, I am a speaker of American English, and some points are consistently different in British English (such as the treatment of "which" vs. "that" and the aversion to serial commas).
Some of my conclusions conflict with manuals of English style. My conclusions are intended to produce clear mathematical writing that is more logically consistent than publishers' conventions. This applies especially to punctuation and to words that serve as logical connectives.
I welcome corrections, suggestions/inquiries, and "pet peeves" that may lead to further items in later versions of this guide.
The first section of the paper is an "Introduction" that should motivate the problem, discuss related results, state the results more completely, and perhaps summarize the techniques or the structure of the paper or crucial definitions.
The introduction should also contain any concluding remarks or key conjectures. There is generally little or no value in a separate section of concluding remarks. Such remarks either are redundant or contain information that readers usually seek in the introduction. Readers who study the full details of the proofs are well aware of the statements that summarize what has been done. Readers who do not read the full details have no reason to go on to the concluding remarks. A mathematical research article is not read like a novel or even like an essay that seeks to "persuade" the reader; it does not need an epilogue.
Many definitions are phrased as "An object has property italicized term if condition holds." Here we use the word "if" even though subsequently it is understood that an object has the property if and only if the defining condition holds. The italicization alerts the reader to this situation. The convention can be justified by saying that the property or object does not actually exist until the definition is complete, so one does not yet in the definition say that the named property implies the condition.
Definitions written by non-native speakers sometimes contain extra commas.
In each sentence below, the comma should be deleted.
"A bipartite graph, is a graph that is 2-colorable".
"A graph is bipartite, if it is 2-colorable".
The first example is a mistaken placement of a comma inside a clause (see discussion of Commas).
Note the difference in italicization above. When written as an adjective-noun combination, the term being defined is the name for structures that have the property; hence the full term bipartite graph is italicized. When the property alone is being defined and is positioned as a predicate adjective, only the adjective is italicized.
Note the difference between "where" and "such that". "Where" is used when the preceding notation is being defined; "such that" is used when it is already defined and its value is being restricted.
Of course, readers sufficiently familiar with the context have no trouble understanding what is meant, but why disenfranchise other readers? One can just as easily write "The neighborhood of a vertex v, denoted N(v), is {u: uv∈ E(G)}". Alternatively, one can introduce the notation as an appositive in a conventional position immediately after the term defined: "The neighborhood N(v) of a vertex v is {u: uv∈ E(G)}".
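As a rough illustration, the two phrasings above might be typeset in LaTeX as follows. The use of \emph for the defined term and \colon for the "such that" separator are assumptions about the author's macros, not something this guide prescribes.

```latex
% Notation introduced after the term, set off by commas:
The \emph{neighborhood} of a vertex $v$, denoted $N(v)$, is
$\{u \colon uv \in E(G)\}$.

% Notation introduced as an appositive immediately after the term:
The \emph{neighborhood} $N(v)$ of a vertex $v$ is
$\{u \colon uv \in E(G)\}$.
```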
A common Double-Duty Definition is "Let G=(V,E) be a graph". The sentence defines the equation G=(V,E) to be a graph. Of course, the writer intends simultaneously to introduce notation for a particular graph and its vertex set and edge set, but that is not what the sentence says. It is better to write "Let G be a graph" and use operators V and E to refer to the vertex and edge sets of G as V(G) and E(G) (see also Operators vs. constants).
A more subtle example is "For each 1≤ i≤ n,". The introduction of the notation i has been lost because the inequalities impose conditions on it before it is defined. Since the expression is a unit, grammatically the phrase is referring to each inequality written in this way. Correct alternatives that express the intended meaning include "For all i such that 1≤i≤n", "For i∈[n]", and "For 1≤i≤n". The third option is slightly different from the others; it means "whenever i is such that the conditions hold", implicitly introducing i in a specified range but avoiding the grammatical problem.
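For comparison, here is a sketch of how the problematic phrase and the three alternatives could appear in LaTeX source. The trailing \dots merely stands for the rest of each sentence, and [n] is assumed to denote the set {1,…,n}.

```latex
% Problematic: the inequalities restrict i before i is introduced.
For each $1 \le i \le n$, \dots

% Alternatives that express the intended meaning:
For all $i$ such that $1 \le i \le n$, \dots
For $i \in [n]$, \dots
For $1 \le i \le n$, \dots
```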
For example, "there exists i<j with x_{i}=x_{j}" ascribes a property to the inequality i<j (and is a Double-Duty Definition of i). Without context, it is hard to tell that the author meant "there exists i such that i<j and x_{i}=x_{j}". Consider also "The number of nonneighbors is n-1-d(u)≥ i." The number of nonneighbors is not an inequality; it is a number. The author is trying to make two statements in one inequality. For clarity, separate the statements: "The number of nonneighbors is n-1-d(u), which is at least i".
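The nonneighbor example might look like this in LaTeX source; this is only a sketch of the separation, with each formula serving as a single noun unit.

```latex
% Two statements fused into one formula:
The number of nonneighbors is $n-1-d(u) \ge i$.

% Separated, so that the formula is just the number:
The number of nonneighbors is $n-1-d(u)$, which is at least $i$.
```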
Exceptions. Applying this principle with very simple expressions leads to ponderous writing. Here are two notable exceptions:
1) In "Choose x∈ V(G) such that x has minimum degree," we are choosing x, not the expression "x∈ V(G)". The justification for this exception is that the membership or containment symbol is read as "in", which is not a verb. (One can treat nonmembership in the same way.)
2) "Let G'=G-x". When introducing notation for an object or expression by a single imperative verb ("let", "set", "put", "choose", etc.), we read the equality symbol as the verb "equal", truly an exception. This exception can be recognized by the lack of any English verb in the sentence. Continuing with another verb, as in "Let G'=G-x be ...", would produce a Double-Duty Definition.
If the introductory part of the sentence is longer, then we may already have a noun and a verb, and the expression again becomes a unit. For example, "Include each vertex independently with probability p=(ln n)/n" should be "Include each vertex independently with probability p, where p=(ln n)/n".
[On the other hand, "we have" is an awkward phrase that often should be dropped when not needed to separate formulas. For example, instead of "By the preceding theorem, we have A=B," prefer "By the preceding theorem, A=B".]
When the second formula just specifies an object, the separation can be accomplished by specifying the type of object, as in "When k=2, the graph G is Eulerian" instead of "When k=2, G is Eulerian." One can always rewrite notational expressions separated by a comma to avoid the difficulty. Usually this is easy, as in changing "For every bipartite graph G, χ(G)≤2" to "If G is bipartite, then χ(G)≤2".
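In LaTeX source the difference is easy to see, because two math-mode groups meet at a comma in the weaker versions. The lines below are a sketch using the examples from the preceding paragraph.

```latex
% Two formulas separated only by a comma:
When $k=2$, $G$ is Eulerian.
For every bipartite graph $G$, $\chi(G) \le 2$.

% Rewritten so that words separate the formulas:
When $k=2$, the graph $G$ is Eulerian.
If $G$ is bipartite, then $\chi(G) \le 2$.
```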
Exceptions. With lists of size at least three, omission of "and" does not cause as much confusion, and including it can be awkward. Here the objection to the common mathematical convention is much weaker: we accept "Let x,y,z be the vertices of T," although writing "Let {x,y,z} be the vertex set of T" would be more precise. Still, "let x, y, and z be the vertices" reads better.
Another sensible exception is "Choose x,y∈ V(G)". Here the relation is between each variable and the set, and we accept this as a single formula. Again a justification is that we can read ∈ as the single word "in", without a verb. Similarly, many mathematicians write, "For n,m≥2" to mean the conjunction of n≥2 and m≥2. The exception for the membership symbol is consistent with other exceptions for the membership symbol; doing it with inequalities is more questionable. Avoid doing it with equalities (see Variable equal to list); it unnecessarily requires a pause for the reader to figure it out.
Other examples: "Suppose there is an edge xy (≠e) in G" should be "Suppose that G has an edge xy other than e". Similarly, "For k≤m with k even" improves on "For k≤m (k even)" or "For k≤m, k even", and "Consider a_{i} for 1≤i≤n" is better than "Consider a_{i} (1≤i≤n)". One can also separate by putting words into the parentheses: "For k≤ m (where k is even)". Note that "Suppose that there is an edge xy≠e in G" is a Double-Duty Definition; "xy≠e" is not an edge.
The same principle applies to logical symbols. In written mathematics, do not use symbols or abbreviations (∃, ∀, ⇒, iff) to substitute for words in sentences. Shorthand notation used to save space on lecture slides need not follow these restrictions, since the slides summarize the lecture and are accompanied orally by sentences.
Used at the beginning of a sentence, the English word "Then" is temporal, as in "Then we left." Since the implicative sense of "then" is so common in mathematics, the temporal sense should rarely be used, to avoid confusion. Usually the temporal "then" at the beginning of a sentence can be changed to "Now" or "Next" with less confusion and essentially the same (and more accurate) meaning, especially in a proof.
When readability would be improved by omitting "then", the sentence should instead start with "When" or "For", as in this sentence itself. A comma still follows the condition introduced by "When" or "For". The structure of a sentence beginning with "Since" is like those beginning with "When" or "For"; a comma follows the first clause. After "Since" or "Because", the concluding clause cannot begin with "then" or "so"; "then" is used only with "If".
Among these choices, I treat "Therefore" as the most formal, introducing a major conclusion and hence taking a comma. Because "Hence" and "Thus" are single syllables, I use them without commas to indicate the flow of argument without making the writing choppy. This choice modifies strict English punctuation in the service of mathematical understanding. It is not incorrect to put commas after all these introductory words, but it enhances mathematical communication to omit the commas after short words introducing short conclusions that are just a step along the way. Copy editors put in the commas, and I insist that they be removed again.
"Suppose" vs. "Suppose that". After words of hypothesis or conclusion ("suppose", "assume", "implies", "conclude", etc.), use "that" when what follows is a clause with an English verb. Omit "that" when what follows is just a noun unit, such as a notational expression. For example, "Assume the hypothesis" is a complete sentence with imperative verb and object. The structure is the same in "Suppose x+y≤10". When an English verb follows, we have "Suppose that f is a proper coloring".
This distinction is a matter of some debate. Some more formal authors use "Suppose that" when what follows is a formula containing a relational symbol, treating the symbol as a verb. I think it is better to maintain the consistency of treating formulas as noun units. The clarification accomplished by "that" when a verb follows becomes unnecessary when the clause is condensed into notation. When the notation is displayed, its role as a fact (noun) is clearer and makes "that" especially unnecessary; the use of "that" should be the same when the formula is not displayed. A related example is "the case k=2", as opposed to "the case that k=2"; here "k=2" is the case, which is a noun, so there is no "that".
Writers who always drop "that" from "Suppose that" have a valid point. In spoken English, we usually drop "that" in this context to avoid ponderous language. When the instruction is informal, without abstract concepts, it is reasonable to drop "that". For example, "Suppose the hypothesis is true" would be awkward with "that". Similarly, the very short "Suppose there is" would be awkward with "that" after "Suppose". Here the verb is gone before one even notices it; this is almost like "Suppose [notation]".
This exception may seem awkward. A better solution when introducing notation is to avoid "Suppose x is" entirely: "Let G be a graph" is better than "Suppose G is a graph". Compare "Suppose x=1" and "Let x=1". The first assumes the truth of an equality and treats the equation as a unit. The second is more active. Because we never say "Let that . . .", we either view "Let" as the verb or view the equality sign as the verb. This usage of "Let" is an exception to the treatment of expressions as noun units; it is not used with inequalities, because an inequality sign would need to be read as the lengthy "be less than or equal to" to become a verb.
Numbered plural variables cause difficulty. In English, "for every two elements" is awkward because "every" is singular. Thus here it is better to say "for any two elements". The presence of "for" is suggestive of the universal quantification and helps avoid ambiguity. Confusion can still arise: consider "Form G' from G by adding an edge joining any two vertices with distance 2 in G." Here some readers will think that only one edge is added.
Avoiding "any" is not imperative. Evaluate its use in context, making sure to prevent misinterpretation. "Any" is a good substitute for "an arbitrary", and the meaning of "not any" is fairly clear.
Using an indefinite article ("a" or "an") as a universal quantifier can be dangerous, as in "Prove that a bipartite graph has no odd cycle." Some readers (often students) may interpret "a" as "one" or "some", turning universality into existence. Using "every" is clearer. Putting "must" before the conclusion can suggest universality but is usually unnecessary.
The informal phrase "is most likely" is similar to "is best possible"; there is no article because "most likely" is used as a single term. It means that the probability is high, whereas "is the most likely" means having higher probability than any other outcome. Another example is "best practice", which is a single technical term in areas of management. It is used as a single term, without "the". For example, I have seen the title "Best Practices in Online and Blended Learning and Teaching".
Although "This result is best possible" is a complete sentence, it is somewhat vague, since it does not specify the sense in which the result cannot be improved. Often it is more informative to say something like "the constant in the upper bound cannot be improved". For this reason, some writers suggest avoiding the term "best possible".
The usage of "series" in English is contrary to its usage in mathematics. In English a "series" usually consists of finitely many occurrences in order, as in the "World Series" or the title "A Series of Unfortunate Events". In mathematics a series is an infinite sum. So I believe, but one correspondent tells me that a finite sum is also a series, though I would just call it a summation or finite sum.
Although HTML does not provide line-centered dots, the ellipsis in an indexed list with relations should be vertically centered on the line ("\cdots" in TeX), while the ellipsis in an indexed list separated by commas should be on the baseline ("\ldots" in TeX).
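A minimal LaTeX sketch of the two cases follows; the particular lists are illustrative choices, not examples taken from this guide.

```latex
% Entries joined by relations: dots centered on the line.
$x_1 \le x_2 \le \cdots \le x_n$

% Entries separated by commas: dots on the baseline.
$v_1, v_2, \ldots, v_n$
```

Authors who load the amsmath package can usually write \dots in both cases and let the package choose the placement from the symbol that follows, but writing \cdots and \ldots explicitly makes the intent visible in the source.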
It is tempting for mnemonic reasons to write "We write V=V(G) and Δ=Δ(G)". Admittedly, this usage is not confusing when discussing only one graph at a time; the difference between a graph invariant and a real-valued function is that we rarely focus on the value of a real-valued function at just one point. Nevertheless, it is rare that a paper discusses only one graph, and hence it is better to use V(G) and Δ(G) for objects associated with G. The problem is particularly bad with Δ, since this character also occurs in mathematics as a difference operator. One often sees "Δn" meaning the change in the value of n, so one should not use "Δn" to mean the maximum degree times the number of vertices in a graph. (In my textbook I violated this principle by using n(G) and e(G) for the numbers of vertices and edges in a graph G while using n for the number of vertices of a particular graph and e as a particular edge; the error will be corrected in the third edition.)
Furthermore, using the hyphen in the edge context maintains consistency with the needed usage explained in the preceding item. When comparing "edge-coloring" and "list coloring", we are not coloring the lists, so the hyphenation of the term is different.
The presence of the word "vertex" sometimes becomes an issue. As mentioned above, the fundamental parameters involve vertices and do not require the word "vertex" as a modifier. Similarly, when we speak of "disjoint subgraphs", it must be that they cannot share anything, vertices or edges, so the word "vertex" is unnecessary. The concept "edge-disjoint" indicates a less restrictive condition. Saying "vertex disjoint" suggests vertices as an alternative to edges; it is better just to say "disjoint". Clearly disjoint cycles share no vertices.
Further examples: "graph-theoretic techniques", "straight-line drawing" (what is a straight line drawing or a straight line segment as opposed to one that is not straight?).
The adverb "well" is a possible exception. In "well-known theorem" we think of the combination "well-known" as a single technical term, leading to "A well-known theorem is a theorem that is well known." The term "well-defined" also behaves this way. However, opinion on this point is sharply divided; some authors insist that because "well" is an adverb the term should not be hyphenated. Further support for the hyphen: the mathematical usage of "well" in the hyphenated term differs from the English usage of "well" in the unhyphenated expression. A "well defined function" is a function for which we have done a good job of giving a definition, but a "well-defined function" is an object that has been given a valid definition as a function, with every domain element given a unique image.
It must be admitted that "order" and "size" are quite convenient, while overuse of "number of vertices" and "number of edges" becomes quite awkward. Introducing notation for the numbers of vertices and edges minimizes this difficulty. Unfortunately, there is no generally agreed notation for operators returning the numbers of vertices and edges of a graph G. The only notation that cannot be misunderstood is the absence of special notation: |V(G)| and |E(G)|. Even these expressions are cumbersome to use repeatedly, so it is often beneficial to write "Let n=|V(G)| and m=|E(G)|."
However, when A⊆V(G) it is clear that |A| is the size of the vertex subset A. It can then be useful to use ||A|| to denote |E(G[A])|, the number of edges in the subgraph of G induced by A.
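In LaTeX, one way to typeset these expressions is with the paired delimiter commands from the amsmath package, which give better spacing than bare vertical bars. The sentences below are a sketch; the choice of n and m follows the text above, and the amsmath dependency is an assumption about the author's preamble.

```latex
% Requires \usepackage{amsmath} for \lvert, \rvert, \lVert, \rVert.
Let $n = \lvert V(G) \rvert$ and $m = \lvert E(G) \rvert$.
For $A \subseteq V(G)$, let $\lVert A \rVert = \lvert E(G[A]) \rvert$.
```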
Similarly, one should not use "hyperedges" to refer to the edges of a hypergraph. Hypergraphs generalize graphs by allowing edges to have arbitrary size. Calling them "hyperedges" eliminates the possibility of saying that graphs arise as a special case, since graphs have edges, not hyperedges.
Although this distinction is sensible and has become established in many settings (such as "maximum antichain" and "maximum independent set"), potential confusion can be reduced by using "largest" and "smallest" instead of "maximum" and "minimum". For example, it is harder to misinterpret "a largest matching" than to misinterpret "a maximum matching".
For consistency, then, one should not write "a vertex of maximal degree" or "the maximal number of edges"; that is, "maximal" should not be applied to numerical values. This is consistent with usage in continuous mathematics, where we write that a continuous function "attains its maximum" on a closed and bounded set.
A different problem arises in the induction step. When we cite the induction hypothesis, we must write "By the induction hypothesis", not "By induction". To obtain the conclusion for the smaller instance, we are invoking the hypothesis that the claim holds for smaller values; we are not invoking the principle of mathematical induction.
Hence we should never write "a P_{n}" for a member of that class. We can write that a graph "contains a path with n vertices", because that is a structural description of the subgraph, but we cannot write "contains a P_{n}" or "consider a P_{n} in G". We can say "contains ten copies of P_{n}" to refer to subgraphs that are n-vertex paths; each such subgraph is a member of the isomorphism class denoted by P_{n}.
Nevertheless, some flexibility is helpful here. When H is the notation for an isomorphism class, we still write "H⊆G" to mean that some subgraph of G belongs to the isomorphism class or is "isomorphic to H" (or "G contains a copy of H"), even though we are not specifying the particular vertices or edges of G used in the subgraph.
Some authors who write extensively about chromatic number and edge-chromatic number drop the word "proper" and use k-[edge-]coloring for the restricted concept. The minor convenience gained by dropping this word is overwhelmed by the negative influence of introducing inconsistency of terminology in combinatorics. Use "proper k-coloring" when that is what is meant. For other variations, such as "acyclic k-coloring" or "dynamic k-coloring", the adjectives replace "proper" by imposing further restrictions on the k-coloring, so the word "proper" is then no longer needed.
A bipartition is a partition into two parts. In particular, we say that a bipartition of a bipartite graph is a partition of its vertex set into two independent sets. In the past I used "partite sets" to refer to the parts of such a partition, but there are objections to that term, and students never get it (for example, they refer to one "partite" of a graph, and certainly "partite" is not a noun). Hence I now refer to the "parts" of a bipartite graph. This is a slight abuse of terminology, but I think its familiarity as a word better facilitates discussion.
Some authors use G+H instead (or also!) to denote the join of G and H, which consists of the disjoint union plus edges joining every vertex of G to every vertex of H. Other notation has been used, such as G∨H, borrowing the join operation (x∨y) in lattices or logic, but this is not satisfactory. Instead the best notation for the graph join is \diamondplus (unavailable in HTML?), which overstrikes a diamond and a plus, much like "⊕" except with a rotated square whose corners are at the points of the "+" ("⊕" is unavailable because it represents symmetric difference or binary sum). The \diamondplus is consistent with the Nešetřil notation for graph products: the symbol is a picture of the result of applying the operation to two copies of K_{2}. In addition, the use of "+" indicates that the number of vertices is additive.
When the phrase after the relative pronoun specifies a further restriction of the class that has just been introduced, the correct pronoun is "that", and the subsequent phrase tells which of the items in the class are those being discussed. If the subsequent phrase speaks about the totality of the class, then the proper pronoun is "which". When "that" and "which" both seem usable, use "that" when the sense is "having the property that", and use "which" when the sense is "all of which" or "the only one of which". Usually a comma is appropriate before "which". Usually "that" is correct when an indefinite article ("a" or "an") has been used on the word being modified. Beware: This distinction is not made or is made the opposite way in British English. Some American style manuals don't care, but in mathematics there are two distinct meanings to be expressed.
The word "distinct" has the same meaning as "different". Two things can be distinct, but one thing cannot be distinct. Thus the sentence "Every value is distinct" is incorrect; it has no meaning. Many beginning students think it means that each value is different from every other value, but it does not.
The word "unique" indicates that there is only one of the items being described. It does not mean that this item is different from other items. Some students think that "The function f maps the points in A to unique points in B" is a statement that f is injective, but it is not. Every function from A to B maps each point in A to a unique point in B.
The distinction between the words "distinct" and "unique" is made clear by a typical boast on the World Wide Web. The sentence "Our website has one million unique visitors" makes no sense. The intent is to say that among millions of hits there are one million distinct visitors; if there is a unique visitor, then there is no other visitor.
Functions or parameters assign a number to each domain object. The resulting value is specific for the object; there is only one choice for it. Hence we do not say "the graph has a chromatic number 3" or "the vertex has a degree 3". These sentences suggest that the object may have more than one value of the parameter. The answer to the question "What is the degree of this vertex?" may be "This vertex has degree 3", but it cannot be "This vertex has a degree 3".
We also do not say "This vertex has the degree 3", although "The degree of this vertex is 3" is correct. Consider the sentence "Every graph has an even number of vertices with odd degree, which means that the list of vertex degrees has even sum." The term "even number" takes the article "an" because we are saying which type of number is being used (it is one of the even numbers). The later "odd degree" and "even sum" do not, because these are properties that the vertices and the list do or do not satisfy. Articles are inappropriate when invoking a property.
Articles also are not used with conceptual nouns. Compare with familiar conversation: we say "This chair has value $100" and not "This chair has the value $100." "Value" and "degree" are abstract properties. Here is another non-mathematical example: We say "I receive compensation for my work," not "I receive a compensation for my work." Compensation is an amount, but here only the abstract concept of receiving compensation is meant, not some number of things. Hence we do not use an article.
Similarly, abstract properties do not take articles. We say "because transitivity of A implies transitivity of B", not "because the transitivity of A implies the transitivity of B". The property in question is "transitivity", not "the transitivity".
When discussing a result by two authors, we cannot put possessives on both names, and making only the second name possessive would be wrong. Hence we write "the Greene--Kleitman Theorem". Here "the" serves as a definite article for the unique object "Greene--Kleitman Theorem". When the result is less celebrated, one can indicate the possessive by "of", as in "the theorem of Greene and Kleitman".
Two clauses (in essence, two complete sentences) may be combined using a conjunction; the conjunction must be preceded by a comma. Examples of conjunctions are "and", "but", "then", and "so" (the latter two should be treated as conjunctions in mathematical writing). Since a conjunction joins two things, sentences should not begin with these words. This is a logical approach that helps keep writing clear, though strict English usage (especially British) may call some of these words adverbs. See further comments on the use of then and so.
Exception. The situation is more complicated when the second clause itself contains a conjunction. Compare "If A, then B holds and C holds" with "If A, then B holds, and C holds". In the first sentence, it is clear that A implies both B and C. The proper grouping or meaning in the second sentence is unclear. Since we only have one comma symbol and don't parenthesize sentences to indicate grouping, a short conjunction of two sentences within a larger conjunction is written without a comma.
One reason for using the serial comma in lists is to avoid confusion in sentences that do not contain lists. Consider the sentences "Like a, b and c have the same property" and "Later, Early and Jones proved the conjecture". These are not lists, and using a comma would be wrong, but when a document does not use serial commas these examples initially appear to be lists. Similarly, in that context an item in a list that itself joins two subitems with "and" looks like the last two items in a list.
Omitting the serial comma can also cause confusion mathematically, as in "The value of f is positive at 2, negative at 1 and 0 at 0."
When an appositive is short enough or contains essential information, the commas are omitted: "My friend Bob is a student." In mathematical writing, a similar situation applies when notation is introduced: "The degree d(v) of a vertex v is the number of neighbors of v." Here "d(v)" is a brief appositive. One could argue that the notation for "degree" is not essential to the sense of the sentence, but putting commas around very short appositives can produce very choppy sentences. A speaker need not pause for such appositives, and hence one may omit the commas.
The expression "may be" does exist in English, when used as a verb as in "It may be true" or "This may be the only component". However, when it appears at the start of a clause, most likely the word "maybe" is intended, as in "Maybe this proof will work." Note that in this situation there is another verb ("work"), and the initial expression means "Possibly", which is not a verb.
Similarly, "as evidenced by" generally is not used in English; change to "as shown by".