Document

Coset Codes-Part I: Introduction and Geometrical Classification

G. DAVID FORNEY, JR., FELLOW, IEEE
Invited Paper

Abstract

Practically all known good constructive coding techniques for band-limited channels, including lattice codes and various recently proposed trellis-coded modulation schemes, can be characterized as coset codes. A coset code is defined by a lattice partition $\Lambda / \Lambda^{\prime}$ and by a binary encoder $C$ that selects a sequence of cosets of the lattice $\Lambda^{\prime}$ .

#13 Decoding complexity as a geometric parameter

The fundamental coding gain of a coset code, as well as other important parameters such as the error coefficient, the decoding complexity, and the constellation expansion factor, are purely geometric parameters determined by $C$ and $\Lambda / \Lambda^{\prime}$ .

The known types of coset codes, as well as a number of new classes that systematize and generalize known codes, are classified and compared in terms of these parameters.

I. Introduction

A. History

THE FACT that the channel capacity of band-limited channels with white Gaussian noise is some 9 dB beyond what can be achieved with simple pulse amplitude modulation (PAM) was an immediate consequence of Shannon's original work (see [1]). It is therefore remarkable that approximately three decades passed before serious work began on developing constructive coding techniques that could achieve sizable fractions of this potential gain.

The field was not completely inactive during this period. Shannon's work had indicated that there must be sphere packings in spaces of high dimension with sufficiently high density to approach channel capacity. In the 1950's and 1960's mathematicians developed some constructive dense sphere packings based on lattices in spaces of moderate to high dimension, notably the $2^{n}$ -dimensional Barnes-Wall lattices [2], and the ultradense 24-dimensional Leech lattice [3]. The earliest advocate of the use of lattices for communications appears to have been Lang in Canada in the early 1960's (see, e.g., [4, preface]). In an interesting interplay between mathematics and communications, it seems that Lang's calculations of bounds on maximum lattice density for $8-32$ dimensions helped to motivate Leech to

[^0]discover his now-famous lattice [3, p. 265]. Another longtime proponent of lattices in communications, also in Canada, has been deBuda, who proved that the coding theorem applies to lattice codes [5]. Connections between lattice theory and coding theory were made by Leech and Sloane [6], and Sloane has since continued to develop bridges between these disciplines. This work is authoritatively and comprehensively summarized in [7].

It is probably fair to say, however, that it was the trellis-coded modulation schemes of Ungerboeck [8] that captured the attention of the modulation community and inspired widespread practical application as well as intensified research. Through a technique called "mapping by set partitioning" of one- and two-dimensional signal constellations, combined with binary convolutional codes, Ungerboeck showed how coding gains of the order of 3 dB could be obtained with simple four-state codes, while gains of up to 6 dB could be obtained with more complex (128-state) codes. A variant [9] of Ungerboeck's eight-state two-dimensional scheme has been adopted in international standards for both $14.4 \mathrm{kbit} / \mathrm{s}$ private-line modems and $9.6 \mathrm{kbit} / \mathrm{s}$ switched-network modems and is coming into wide commercial use.

More recently, trellis-coded modulation schemes using multidimensional signal constellations have been developed. A simple four-dimensional scheme of Gallager was presented by Forney et al. in [1], and a similar scheme was discovered independently by Calderbank and Sloane [10]. Wei [11] has developed a class of multidimensional schemes that are highly suited for implementation, one of which is used in a Codex 19.2-kbit/s modem. Calderbank and Sloane [12], [13] have also developed a variety of classes of new trellis codes.

In an earlier paper [1] Forney et al. pointed out that all schemes known at that time, including the most important lattice codes, could be generated by the same basic elements:

a conventional binary encoder, block or convolutional, operates on certain of the data bits to provide a larger number of coded bits;
these coded bits select one of the subsets of a partitioned signal constellation;
additional uncoded bits select an individual signal point from the selected subset.

This way of looking at coding schemes has several important consequences:

The code distance properties, and thus the fundamental coding gain, are determined by the binary encoder and the subset partitioning, which are largely decoupled from the choice of signal constellation in the third step. In this last step there is a trade-off between optimal shaping of the signal constellation and implementation simplicity, but this is almost independent of the fundamental coding scheme and has only a minor effect on the overall coding gain. Finally, in decoding, the first operation can always be to determine the best signal point in each subset and its metric; after that step, decoding depends again only on the fundamental code structure determined by the first two encoding operations.
Different classes of codes can be readily compared and contrasted in this common framework. In this paper, which is to a large extent a sequel to [1], we concern ourselves only with the code structure imposed by the first two encoding operations, regarding the constellation shaping as peripheral; in our view this clarifies the similarities and differences between various schemes.

B. Introduction to Coset Codes

Calderbank and Sloane [13] have made the important observations that the signal constellation should be regarded as a finite set of points taken from an infinite lattice, and that the partitioning of the constellation into subsets corresponds to the partitioning of that lattice into a sublattice and its cosets. This lattice/coset language is both illuminating and powerful, and we have found that all of the good coded modulation schemes mentioned above can be put into this framework. (These codes may be more specifically characterized as lattice-type coset codes, which are the only type considered in this paper, apart from a brief mention of phase-modulated coset codes in the next section.)

We call this general class of coded modulation schemes coset codes. They seem to provide a general approach to the construction of implementable codes for band-limited channels that approach channel capacity, just as conventional codes (both block and convolutional) do for the power-limited case.

We now give a quick preview of the elements of coset codes, and of key terms and concepts that will figure in the rest of the paper. Fig. 1 illustrates the general structure of an encoder for a coset code, embodying the three principal elements just described and using the language of lattices

Fig. 1. General structure of encoder for coset code $\mathbb{C}\left(\Lambda / \Lambda^{\prime} ; C\right)$ .

and cosets (compare [1, fig. 10] or [13, fig. 2]). The main ingredients are as follows:

An $N$ -dimensional lattice $\Lambda$ , which we can think of as an infinite regular array of points in $N$ -space. The signal points will all be taken from a finite subset of points lying within a translate (coset) of $\Lambda$ , and the set of all possible signal points is called the signal constellation.
A sublattice $\Lambda^{\prime}$ of $\Lambda$ , i.e., a subset of the points of $\Lambda$ which is itself an $N$ -dimensional lattice. The sublattice induces a partition $\Lambda / \Lambda^{\prime}$ of $\Lambda$ into $\left|\Lambda / \Lambda^{\prime}\right|$ cosets of $\Lambda^{\prime}$ , where $\left|\Lambda / \Lambda^{\prime}\right|$ is the order of the partition; when $\Lambda$ and $\Lambda^{\prime}$ are binary lattices, the order of the partition is a power of 2 , say $2^{k+r}$ , and correspondingly, the partition divides the signal constellation into $2^{k+r}$ subsets, each corresponding to a distinct coset of $\Lambda^{\prime}$ .
A rate- $k /(k+r)$ binary encoder $C$ , which takes in $k$ bits per $N$ dimensions and puts out $k+r$ coded bits; the latter select one of the cosets of $\Lambda^{\prime}$ in the partition $\Lambda / \Lambda^{\prime}$ . The redundancy $r(C)$ of $C$ is $r$ bits per $N$ dimensions; the normalized redundancy per two dimensions is $\rho(C)= 2 r(C) / N$ .

The coset code $\mathbb{C}\left(\Lambda / \Lambda^{\prime} ; C\right)$ is the set of all sequences of signal points that lie within a sequence of cosets of $\Lambda^{\prime}$ that could be specified by a sequence of coded bits from $C$ . Some lattices, including the most useful ones, can be generated as lattice codes $\mathbb{C}\left(\Lambda / \Lambda^{\prime} ; C\right)$ , where $\Lambda$ and $\Lambda^{\prime}$ are lattices of lower dimension, and $C$ is a binary block code. If $C$ is a convolutional encoder, then $\mathbb{C}\left(\Lambda / \Lambda^{\prime} ; C\right)$ is a trellis code.

#7 Definition of fundamental volume and redundancy in the Intro…

The fundamental coding gain of the coset code is denoted by $\gamma(\mathbb{C})$ and is defined by two elementary geometrical parameters: the minimum squared distance $d_{\text {min }}^{2}(\mathbb{C})$ between signal point sequences in $\mathbb{C}$ and the fundamental volume $V(\mathbb{C})$ per $N$ dimensions, which is equal to $2^{r(\mathbb{C})}$ where the redundancy $r(\mathbb{C})$ is equal to the sum of the redundancy $r(C)$ of the encoder $C$ and the redundancy $r(\Lambda)$ of the lattice $\Lambda$ . In fact,

\gamma(\mathbb{C})=2^{-\rho(\mathbb{C})} d_{\min }^{2}(\mathbb{C})

where the normalized redundancy $\rho(\mathbb{C})$ (per two dimensions) is equal to $2 r(\mathbb{C}) / N$ .

To transmit $n$ bits per $N$ dimensions, the signal constellation must consist of $2^{n+r}$ points from a coset of $\Lambda$ , partitioned into $2^{k+r}$ subsets, each consisting of $2^{n-k}$ points from a different coset of $\Lambda^{\prime}$ . Given a selected coset of $\Lambda^{\prime}$ , $n-k$ uncoded bits select a particular signal point from that coset. The constellation expansion factor (compared to an uncoded constellation of $2^{n}$ points from a coset of $\Lambda$ ) is thus $2^{r(C)}$ per $N$ dimensions, or $2^{\rho(C)}$ per two dimensions. This translates into an average power cost of a factor of $2^{\rho(C)}$ (or $\rho(C) \cdot 3.01 \mathrm{~dB}$ ), which is reflected in the formula for the fundamental coding gain $\gamma(\mathbb{C})$ just given.

The total coding gain $\gamma_{\text {tot }}(\mathbb{C})$ is the product of the fundamental coding gain $\gamma(\mathbb{C})$ with the shape gain $\gamma_{s}$ of the finite constellation ( $\gamma_{s}$ is defined as $\gamma_{\text {tot }}(\mathbb{C}) / \gamma(\mathbb{C})$ and is approximately equal to the ratio of the normalized second moment [7] of an $N$ -cube to that of the region of $N$ -space in which the constellation is contained). If the constella- tion is the set of points in a coset of $\Lambda$ that lie within an $N$ -cube, $\gamma_{s}=1$ and $\gamma_{\text {tot }}(\mathbb{C})=\gamma(\mathbb{C})$ . If the constellation is chosen more like an $N$ -sphere to reduce average power, $\gamma_{s}$ measures this reduction. Thus $\gamma_{s}$ , unlike $\gamma(\mathbb{C})$ , reflects finite constellation effects. Even for the same code $\mathbb{C}, \gamma_{s}$ will in general vary with $n$ , unlike $\gamma(\mathbb{C})$ . $\gamma_{s}$ is usually much smaller than $\gamma(\mathbb{C})$ , being upper-bounded by the shape gain of an $N$ -sphere (see next paragraph); however, calculating $\gamma_{s}$ (or $\gamma_{\text {tot }}(\mathbb{C})$ ) is usually more cumbersome than calculating $\gamma(\mathbb{C})$ . Finally, $\gamma_{s}$ is determined not by $\mathbb{C}$ but by the choice of constellation boundary; in general, a similar gain can be achieved in uncoded systems by choosing a similar boundary for an uncoded constellation in $N$ dimensions. For all these reasons we prefer to focus on the fundamental coding gain $\gamma(\mathbb{C})$ in this paper. (Multidimensional constellations will be considered in [14].) We feel that the various values for $\gamma_{\text {tot }}(\mathbb{C})$ that have appeared in the prior literature have confused rather than clarified the fundamental properties of these codes and the comparisons between them. (Calderbank and Sloane [13] also calculate an asymptotic coding gain whose value is independent of finite constellation effects. This gain is the combination of our "fundamental coding gain" and the shape gain of an $N$ -sphere over an $N$ -cube, which for $N$ even is $G_{\otimes}= \pi(n+1) /\left[6(n!)^{1 / n}\right]$ , where $n=N / 2$ ; thus $G_{\otimes}=\pi / 3(0.20$ dB ) for $N=2, \pi / 2^{3 / 2}(0.46 \mathrm{~dB})$ for $N=4,5 \pi / 6(24)^{1 / 4} (0.73 \mathrm{~dB})$ for $N=8$ , with a limit of $\pi e / 6(1.53 \mathrm{~dB})$ as $N \rightarrow \infty$ [1].)

As a simple example of a coset code, let us consider the four-state two-dimensional Ungerboeck code illustrated in Fig. 2, transmitting 5 bits per two dimensions with the "square" 64 -point constellation of Fig. 3(a). In this case the lattice $\Lambda$ is the two-dimensional integer lattice $\boldsymbol{Z}^{2}$ , i.e., the set of all integer 2-tuples (the signal constellation is actually chosen from its translate $\boldsymbol{Z}^{2}+(1 / 2,1 / 2)$ for symmetry). Thus the scale is such that the minimum (squared) distance between points in the constellation is one. The sublattice $\Lambda^{\prime}$ is the two-dimensional lattice $2 \boldsymbol{Z}^{2}$ , i.e., the set of all even integer 2 -tuples. The partition $\boldsymbol{Z}^{2} / 2 \boldsymbol{Z}^{2}$ has order 4, i.e., $\boldsymbol{Z}^{2}$ is the union of four cosets of $2 \boldsymbol{Z}^{2}$ , which correspond to the points labeled $A, B, C$ , and $D$ in Fig. 3. There are 16 points in the constellation from each of the four cosets. The minimum squared distance between points in any coset of $2 Z^{2}$ is four. The encoder $C$ is a rate-1/2 four-state convolutional encoder. Thus the redundancy per two dimensions is $r(C)=\rho(C)=1$ . One of the five input bits per two dimensions goes into the encoder, and the two resulting coded bits select one of the

Fig. 2. Four-state two-dimensional Ungerboeck code ( $n=5$ ).

Fig. 3. Two 64 -point signal constellations based on partition $\boldsymbol{Z}^{2} / 2 \boldsymbol{Z}^{2}$ . (a) Square. (b) Cross.

four cosets $A, B, C$ , and $D$ ; the remaining four uncoded bits select one of the 16 points from the selected coset.

As we show below, $C$ can be chosen so that $d_{\text {min }}^{2}(\mathbb{C})= d_{\text {min }}^{2}\left(\Lambda^{\prime}\right)=4$ ; since $\rho(\mathbb{C})=\rho(C)=1$ (the integer lattice has zero redundancy), the fundamental coding gain $\gamma(\mathbb{C})$ is $2^{-1} \cdot 4=2(3.01 \mathrm{~dB})$ (the minimum squared distance gain is a factor of four, but this is offset by a constellation expansion power cost of a factor of two, leaving a net gain of a factor of 2). Because the constellation is square, this is also the total coding gain. The total coding gain could be improved slightly by use of a more circular constellation, e.g., the 64 -point "cross" constellation of Fig. 3(b), which is about 0.1 dB better [1].

C. Other Coset Codes

The coset codes described in this paper are based on partitions of binary lattices. More generally, a coset code $\mathbb{C}(S / T ; C)$ can be defined whenever $S$ is some set of discrete elements that forms an algebraic group, with some distance measure between elements of $S, T$ is a subgroup of $S$ such that the quotient group $S / T$ has finite order $\mid S / T$ , and $C$ is an appropriate code whose codewords select sequences of cosets of $T$ in the partition $S / T$ .

For instance, $S$ can be a binary block code, with $T$ a subcode, and Hamming distance as the distance measure. We shall see some examples of the construction of convolutional codes in this way in this paper (other such constructions appear in [15]). In [18] we shall see how such codes as the Reed-Muller and Golay codes can be built up from short codes in this way. In [16], we shall show how ternary codes, lattices, and trellis codes can be constructed as coset codes, where $S$ and $T$ are ternary block codes or lattices, and $C$ is a block or convolutional code over the ternary field $\mathrm{GF}(3)$ .

Finally, phase-modulated codes can be constructed as coset codes as follows. The signal constellation for $m$ -ary PSK ( $m$ PSK) can be regarded as the $m$ complex $m$ th roots of unity, which forms a group $S$ under complex multiplication. If $n$ divides $m$ , the $n$ PSK constellation is a subgroup T. Thus, for example, 16PSK/8PSK/4PSK/2PSK/1PSK is a chain of two-way partitions. Although the minimum squared distances within these constellations are somewhat different from those in lattice partitions (e.g., $0.152 / 0.586 / 2 / 4 / \infty$ for these constellations, compared to 1/2/4/8/16 for the comparable two-dimensional lattice partition $\boldsymbol{Z}^{2} / R \boldsymbol{Z}^{2} / 2 \boldsymbol{Z}^{2} / 2 R \boldsymbol{Z}^{2} / 4 \boldsymbol{Z}^{2}$ ), similar constructions to those presented here often yield good phase- modulated codes (e.g., the phase-modulated codes of Ungerboeck [8] or of LaFanchere et al. [17]).

#8 Role of algebraic structure in the coset code framework

In general, these code constructions rely very little on the linearity properties of the groups (e.g., lattices, sublattices) on which they are based, and the codes so constructed are often not linear, particularly the trellis codes. The essential properties of these sets seem to be their partition structure and related distance properties, which of course were the basis for Ungerboeck's constructions via 'mapping by set partitioning.' The primary benefit of starting with sets that are groups seems to be that their subgroups naturally induce useful partitions via coset decompositions.

D. Outline

The primary subject of this paper is the categorization of various lattice-type coset codes in terms of their key parameters and, ultimately, in terms of their performance as measured by their fundamental coding gains.

Since these codes are based on partitions of binary lattices, we begin with an introduction to such lattices in Section II, which is intended to be self-contained. A more comprehensive introduction to this family of lattices is given in the companion paper [18], hereafter referred to as part II. (By far the best general reference on lattices is the forthcoming encyclopedic book by Conway and Sloane [7].) Section III summarizes those results from part II most relevant to this paper, particularly the performance of lattices as lattice codes, their partition/distance structure, and the decoding complexity of lattices and lattice partitions as measured by the number of binary operations of the trellis-based decoding algorithms given in part II.

Section IV then extends lattice theory to trellis codes. Section V characterizes the principal trellis codes that have been introduced to date as coset codes. Section VI introduces some generic classes of trellis codes whose main parameters can be easily determined and which include codes similar to (and in many cases equivalent to) the principal known codes. Finally, Section VII compares and contrasts all of these codes in terms of performance vs. complexity.

It is intended that this paper and part II may be read independently; as a result, there is some overlap. The reader who desires to read both papers in the most logical order is advised to skim this paper quickly through Section II, omitting proofs; then to read part II, with primary focus on the material relating to Barnes-Wall lattices; and then to return to the rest of this paper. The mathematical level is kept as elementary as possible; for the more mathematically inclined reader, we recommend learning about lattices by reading Conway and Sloane [7].

II. A Lattice Primer

A. Definitions

A real lattice $\Lambda$ is simply a discrete set of vectors (points, $N$ -tuples) in real Euclidean $N$ -space $\boldsymbol{R}^{N}$ that forms a group under ordinary vector addition, i.e., the sum or difference of any two vectors in $\Lambda$ is in $\Lambda$ . Thus $\Lambda$ necessarily includes the all-zero $N$ -tuple $\mathbf{0}$ , and if $\lambda$ is in $\Lambda$ , then so is its additive inverse $-\lambda$ . The vectors in a lattice may possibly span fewer than $N$ dimensions; however, this will not be the case for any lattice considered here, so there will be no confusion if we call a lattice of real $N$ -tuples an $N$ -dimensional real lattice.

As an example, the set $\boldsymbol{Z}$ of all integers is essentially the only one-dimensional real lattice, up to scaling, and the prototype of all lattices. The set $\boldsymbol{Z}^{N}$ of all integer $N$ -tuples is an $N$ -dimensional real lattice for any $N$ .

Lattices have only two principal structural characteristics. Algebraically, a lattice is a group; this property leads to the study of subgroups (sublattices) and partitions (coset decompositions) induced by such subgroups. Geometrically, a lattice is endowed with the properties of the space in which it is embedded, such as the Euclidean distance metric and the notion of volume in $\boldsymbol{R}^{N}$ . The following two sections are concerned with these two aspects of lattice structure.

Lattices closely related to a given real $N$ -dimensional lattice $\Lambda$ are obtained by the following operations.

Scaling: If $r$ is any real number, then $r \Lambda$ is the lattice consisting of all multiples $r \lambda$ of vectors $\lambda$ in $\Lambda$ by the scalar $r$ .
Orthogonal Transformation: More generally, if $T$ is any scaled orthogonal transformation of $N$ -space, then $T \Lambda$ is the lattice consisting of all transformations $T \lambda$ of vectors $\lambda$ in $\Lambda$ by $T$ . We say that $T \Lambda$ is a version of $\Lambda$ .
Cartesian Product: The $M$ -fold Cartesian product of $\Lambda$ with itself-i.e., the set of all $M N$ -tuples ( $\lambda_{1}, \lambda_{2}, \cdots, \lambda_{M}$ ) where each $\lambda_{j}$ is in $\Lambda$ -is an $M N$ -dimensional lattice denoted by $\Lambda^{M}$ .

For example, $\boldsymbol{Z}^{N}$ is the $N$ -fold Cartesian product of $\boldsymbol{Z}$ with itself, and $r \boldsymbol{Z}^{N}$ is a scaled version of $\boldsymbol{Z}^{N}$ for any $r$ and $N$ . The two-dimensional lattice $Z^{2}$ is illustrated in Fig. 4.

Fig. 4. Lattice $\boldsymbol{Z}^{2}$ and its sublattice $R \boldsymbol{Z}^{2}$ (black dots).

The most important scaled orthogonal transformation for our purposes is the rotation operator $R$ , defined by the $2 \times 2$ matrix

R \triangleq\left(\begin{array}{rr} 1 & 1 \\ 1 & -1 \end{array}\right) .

$R \boldsymbol{Z}^{2}$ is a version of $\boldsymbol{Z}^{2}$ obtained by rotating $\boldsymbol{Z}^{2}$ by $45^{\circ}$ and scaling by $2^{1 / 2}$ and is also illustrated in Fig. 4. The points in $R \boldsymbol{Z}^{2}$ are a subset of the points in $\boldsymbol{Z}^{2}$ , meaning that $R \boldsymbol{Z}^{2}$ is a sublattice of $\boldsymbol{Z}^{2}$ . Note that $R^{2}=2 I$ , where $I$ is the identity operator (in two dimensions), so that $R^{2} Z^{2} =2 \boldsymbol{Z}^{2}$ .

We can define a $2 N$ -dimensional rotation operator by letting $R$ operate on each pair of coordinates in a $2 N$ tuple; with a slight abuse of notation, we denote by $R$ any such rotation operator. For instance, in four dimensions,

R \triangleq\left(\begin{array}{rrrr} 1 & 1 & 0 & 0 \\ 1 & -1 & 0 & 0 \\ 0 & 0 & 1 & 1 \\ 0 & 0 & 1 & -1 \end{array}\right)

Note that $R^{2}=2 I$ for any $N$ , where $I$ is the identity operator in $2 N$ dimensions, so that $R^{2} \Lambda=2 \Lambda$ for any real $2 N$ -dimensional lattice $\Lambda$ .

B. Group Properties

A coset of a lattice $\Lambda$ , denoted by $\Lambda+c$ , is the set of all $N$ -tuples of the form $\lambda+c$ , where $\lambda$ is any point in $\Lambda$ and $\boldsymbol{c}$ is some constant $N$ -tuple that specifies the coset. Geometrically, the coset $\Lambda+c$ is therefore a translate of $\Lambda$ by $\boldsymbol{c}$ (if $\boldsymbol{c}$ is in $\Lambda$ , then $\Lambda+\boldsymbol{c}=\Lambda$ ). Two $N$ -tuples are equivalent modulo $\Lambda$ if their difference is a point in $\Lambda$ . Thus the coset $\Lambda+\boldsymbol{c}$ is the set of all points equivalent to $\boldsymbol{c}$ modulo $\Lambda$ .

A sublattice $\Lambda^{\prime}$ of a lattice $\Lambda$ is a subset of the elements of $\Lambda$ that is itself a lattice, i.e., $\Lambda^{\prime}$ is a subgroup of the additive group $\Lambda$ . Thus, by elementary group theory, a sublattice $\Lambda^{\prime}$ induces a partition (denoted by $\Lambda / \Lambda^{\prime}$ ) of $\Lambda$ into equivalence classes modulo $\Lambda^{\prime}$ (the equivalence classes may be added modulo $\Lambda^{\prime}$ and form the quotient group $\Lambda / \Lambda^{\prime}$ ). We shall say that the order of the partition (or quotient group) $\Lambda / \Lambda^{\prime}$ is the number $\left|\Lambda / \Lambda^{\prime}\right|$ of such equivalence classes (in the mathematical literature, $\left|\Lambda / \Lambda^{\prime}\right|$ is usually called the index of $\Lambda^{\prime}$ in $\Lambda$ ). Each equivalence class is a coset of $\Lambda^{\prime}$ (one being $\Lambda^{\prime}$ itself), or, geometrically, a translate of $\Lambda^{\prime}$ . For example, the partition $\boldsymbol{Z}^{2} / R \boldsymbol{Z}^{2}$ has order $\left|\boldsymbol{Z}^{2} / R \boldsymbol{Z}^{2}\right|=2$ , and Fig. 4 illustrates $\boldsymbol{Z}^{2}$ as the union of two cosets of $R \boldsymbol{Z}^{2}$ . Of course, any $N$ -dimensional integer lattice $\Lambda$ is a sublattice of $\boldsymbol{Z}^{N}$ .

If we take one element from each equivalence class, we obtain a system of coset representatives for the partition $\Lambda / \Lambda^{\prime}$ , denoted by $\left[\Lambda / \Lambda^{\prime}\right]$ . (In general, there are many ways of selecting such a system $\left[\Lambda / \Lambda^{\prime}\right]$ , so the notation does not entirely specify the system.) Then every element of $\Lambda$ can be written uniquely as a sum $\lambda=\lambda^{\prime}+\boldsymbol{c}$ , where $c \in\left[\Lambda / \Lambda^{\prime}\right]$ is the coset representative of the equivalence class in which $\lambda$ lies, and $\lambda^{\prime}=\lambda-c$ is an element of $\Lambda^{\prime}$ (because $\lambda \equiv c \bmod \Lambda^{\prime}$ ). This is called a coset decomposition of $\Lambda$ and will be written here as

\Lambda=\Lambda^{\prime}+\left[\Lambda / \Lambda^{\prime}\right] .

For example, the two 2 -tuples $(0,0)$ and $(1,0)$ are a system of coset representatives for the partition $\boldsymbol{Z}^{2} / R \boldsymbol{Z}^{2}$ , and every element of $\boldsymbol{Z}^{2}$ may be written as the sum of one of these two 2 -tuples with an element of $R \boldsymbol{Z}^{2}$ , i.e., $\boldsymbol{Z}^{2}$ is the union of $R \boldsymbol{Z}^{2}+(0,0)=R \boldsymbol{Z}^{2}$ and $R \boldsymbol{Z}^{2}+(1,0)$ (the black dots and white dots in Fig. 4, respectively).

As another example, if $m$ is any integer, the lattice $m \boldsymbol{Z}$ of integer multiples of $m$ is a sublattice of $\boldsymbol{Z}$ . The partition $\boldsymbol{Z} / m \boldsymbol{Z}$ is the partition of the integers into $m$ equivalence classes modulo $m \boldsymbol{Z}$ (modulo $m$ ), and the order of the partition is $m$ . The integers $\{0,1, \cdots, m-1\}$ form a system of coset representatives for the partition $\boldsymbol{Z} / m \boldsymbol{Z}$ , and every integer $n$ can be written uniquely as $n=a m+c$ , where $a m$ is an element of $m \boldsymbol{Z}$ and $c \in\{0,1, \cdots, m-1\}=[\boldsymbol{Z} / m \boldsymbol{Z}]$ (thus $[\boldsymbol{Z} / \boldsymbol{m} \boldsymbol{Z}]$ is essentially the ring $\boldsymbol{Z}_{m}$ of integers modulo $m)$ . In particular, the partition $\boldsymbol{Z} / 2 \boldsymbol{Z}$ has order 2 and divides the integers into two subsets, $2 \boldsymbol{Z}$ (the even integers) and $2 \boldsymbol{Z}+1$ (the odd integers).

More generally, for any $m \in \boldsymbol{Z}$ , the lattice $m \boldsymbol{Z}^{N}$ of all $N$ -tuples of integer multiples of $m$ is a sublattice of $\boldsymbol{Z}^{N}$ of order $m^{N}$ , and $[\boldsymbol{Z} / m \boldsymbol{Z}]^{N}$ is a system of coset representatives for $\boldsymbol{Z}^{N} / m \boldsymbol{Z}^{N}$ ; hence $\boldsymbol{Z}^{N}=m \boldsymbol{Z}^{N}+[\boldsymbol{Z} / m \boldsymbol{Z}]^{N}$ .

A partition $\Lambda / \Lambda^{\prime}$ also induces a coset decomposition of any coset of $\Lambda$ , say $\Lambda+\boldsymbol{c}$ ; for $\Lambda+\boldsymbol{c}=\Lambda^{\prime}+\left[\Lambda / \Lambda^{\prime}\right]+\boldsymbol{c}$ .

A partition chain $\Lambda / \Lambda^{\prime} / \Lambda^{\prime \prime} / \cdots$ is a sequence of lattices such that each is a sublattice of the previous one (in other words, $\Lambda \supseteq \Lambda^{\prime} \supseteq \Lambda^{\prime \prime} \supseteq \cdots$ ). For example, $\boldsymbol{Z} / 2 \boldsymbol{Z} / \mathbf{4} \boldsymbol{Z} / \cdots$ is an infinite sequence of two-way partitions of the integers.

#14 Mismatch in coset decomposition formulas

A partition chain induces a multiterm coset decomposition chain, with a term corresponding to each partition; e.g., if $\Lambda / \Lambda^{\prime} / \Lambda^{\prime \prime}$ is a partition chain, then

\Lambda=\Lambda^{\prime \prime}+\left[\Lambda^{\prime} / \Lambda^{\prime \prime}\right]+\left[\Lambda / \Lambda^{\prime}\right]

meaning that every element of $\Lambda$ can be expressed as an element of $\Lambda^{\prime \prime}$ plus a coset representative from $\left[\Lambda^{\prime} / \Lambda^{\prime \prime}\right]$ plus a coset representative from $\left[\Lambda / \Lambda^{\prime}\right]$ . For example, the chain $\boldsymbol{Z} / \mathbf{2} \boldsymbol{Z} / \mathbf{4 Z} / \cdots$ leads to the standard binary representation of an integer $m$ :

m=a_{0}+2 a_{1}+4 a_{2}+\cdots

where $a_{0}, a_{1}, a_{2}, \cdots \in\{0,1\}$ , and $a_{0}$ specifies the coset in the partition $\boldsymbol{Z} / 2 \boldsymbol{Z}, 2 a_{1}$ specifies the coset in the partition $2 \boldsymbol{Z} / 4 \boldsymbol{Z}$ , and so forth. That is,

\boldsymbol{Z}=[\boldsymbol{Z} / 2 \boldsymbol{Z}]+[2 \boldsymbol{Z} / 4 \boldsymbol{Z}]+[4 \boldsymbol{Z} / 8 \boldsymbol{Z}]+\cdots

For a related example with a finite chain, we can specify one of the eight cosets of $\boldsymbol{Z} / 8 \boldsymbol{Z}$ (one of the equivalence classes of integers modulo 8 ) by 3 bits ( $a_{0}, a_{1}, a_{2}$ ), where $a_{2} a_{1} a_{0}$ is the standard binary representation of the coset representative $c \in\{0,1, \cdots, 7\}$ .

We may illustrate such a decomposition chain by a partition tower, as shown in Fig. 5(a). Each block in the tower represents one partition $\Lambda / \Lambda^{\prime}$ in the chain, and the input to that block is a variable which selects one of the $\left|\Lambda / \Lambda^{\prime}\right|$ cosets of $\Lambda^{\prime}$ in that partition (or, equivalently, one of the coset representatives in $\left[\Lambda / \Lambda^{\prime}\right]$ ). The standard bi-

Fig. 5. Partition towers illustrating coset decomposition chains induced by lattice partition chains. (a) $\Lambda / \Lambda^{\prime} / \Lambda^{\prime \prime} / \cdots$ . (b) $Z / 2 Z / \boldsymbol{4} \boldsymbol{Z} / \boldsymbol{8} \boldsymbol{Z} / \cdots$ .

nary representation is illustrated in this way in Fig. 5(b) (note that the "least significant bit" $a_{0}$ appears at the top).

C. Geometric Properties

The geometry of a real lattice $\Lambda$ arises from the geometry of a real Euclidean $N$ -space $\boldsymbol{R}^{N}$ . The two principal geometrical parameters of $\Lambda$ are the minimum squared distance $d_{\min }^{2}(\Lambda)$ between its points and its fundamental volume $V(\Lambda)$ ; these determine its fundamental coding gain $\gamma(\Lambda)$ .

The norm $\|\boldsymbol{x}\|^{2}$ of a vector $\boldsymbol{x}$ in $\boldsymbol{R}^{N}$ is the sum of the squares of its coordinates. Norms are nonnegative and in fact nonzero unless $\boldsymbol{x}=\mathbf{0}$ . The squared distance between two vectors $\boldsymbol{x}$ and $\boldsymbol{y}$ is the norm of their difference $\|x-y\|^{2}$ .

Because a lattice $\Lambda$ consists of discrete points, the norms of all lattice points are an infinite set of discrete values that can be enumerated in ascending order We call this the weight distribution of the lattice (theta series, in the lattice literature). The weight distribution is also the squared distance distribution between any point in the lattice and all other points, since any point $\lambda$ in $\Lambda$ can be taken as the origin $\mathbf{0}$ by translation of $\Lambda$ by $\lambda$ (looking out from any point in $\Lambda$ , the lattice looks the same).

The minimum nonzero norm is thus the minimum squared distance $d_{\min }^{2}(\Lambda)$ between any two points in $\Lambda$ . The number of elements of $\Lambda$ with this norm is the number of nearest neighbors of any lattice point (also called the kissing number, or multiplicity), and will be called here the error coefficient $N_{0}(\Lambda)$ .

For example, for any $N$ , the integer lattice $\boldsymbol{Z}^{N}$ has $d_{\text {min }}^{2}\left(\boldsymbol{Z}^{N}\right)=1$ . The set of all integer $N$ -tuples of norm 1 is the set of all permutations and sign changes of the vector $(1,0, \cdots, 0)$ , so $N_{0}\left(\boldsymbol{Z}^{N}\right)=2 N$ .

Loosely, the fundamental volume $V(\Lambda)$ is the volume of $N$ -space per lattice point, or the reciprocal of the number of lattice points per unit volume. More precisely, if we can partition $N$ -space into regions of equal volume, one associated with each lattice point, then $V(\Lambda)$ is the volume of each such region. For example, it is easy to see that we may partition $N$ -space into $N$ -cubes of side 1, one associated with each point of $\boldsymbol{Z}^{N}$ , so $V\left(\boldsymbol{Z}^{N}\right)=1$ .

To treat the general case, note that $\boldsymbol{R}^{N}$ is itself a group under ordinary vector addition (but not a lattice), because its points are not discrete). Any real $N$ -dimensional lattice $\Lambda$ is a subgroup of $\boldsymbol{R}^{N}$ . Thus there is a partition $\boldsymbol{R}^{N} / \Lambda$ of $N$ -space into equivalence classes modulo $\Lambda$ (cosets of $\Lambda$ ) (in our original definition of a coset of $\Lambda$ , implicitly we meant a coset in the partition $\boldsymbol{R}^{N} / \Lambda$ ). Define a fundamental region $\mathbb{R}(\Lambda)$ as a region of $N$ -space that contains one and only one point from each such equivalence class modulo $\Lambda$ ; thus $\mathbb{R}(\Lambda)$ is a system of coset representatives for the partition $\boldsymbol{R}^{N} / \Lambda$ . Every point $\boldsymbol{x}$ in $\boldsymbol{R}^{N}$ is thus uniquely representable as $\boldsymbol{x}=\lambda+\boldsymbol{c}$ , where $\lambda \in \Lambda$ and $\boldsymbol{c} \in \mathbb{R}(\Lambda)$ , i.e., there is a coset decomposition $\boldsymbol{R}^{N}= \Lambda+\mathbb{R}(\Lambda)$ . Geometrically, this is a tesselation of $N$ -space by translates of fundamental regions of $\Lambda$ . While there is no unique fundamental region, every fundamental region $\mathbb{R}(\Lambda)$ must have the same volume $V(\Lambda)$ (if it is measurable), since it is congruent to any other fundamental region modulo $\Lambda$ ; this uniquely defines the fundamental volume $V(\Lambda)$ .

For example, one fundamental region of the onedimensional integer lattice $\boldsymbol{Z}$ is the half-open interval $[0,1)=\{c: 0 \leq c<1\}$ ; another is the half-open interval ( $-1 / 2,1 / 2$ ]. Whatever fundamental region we take, however, its volume (length) must be one. Similarly, for $\boldsymbol{Z}^{N}$ , we may take $\mathbb{R}\left(\boldsymbol{Z}^{N}\right)$ as the half-open $N$ -cube $[0,1)^{N}$ or as $(-1 / 2,1 / 2]^{N}$ ; again the volume $V\left(\boldsymbol{Z}^{N}\right)$ is one for any $N$ .

Happily, the computation of fundamental volumes of an integer lattice $\Lambda$ may be completely avoided by use of the following lemma, if we know the order $\left|\boldsymbol{Z}^{N} / \Lambda\right|$ of the partition $Z^{N} / \Lambda$ .

Lemma 1: If $\Lambda^{\prime}$ is a sublattice of $\Lambda$ of order $\left|\Lambda / \Lambda^{\prime}\right|$ , then $V\left(\Lambda^{\prime}\right)=\left|\Lambda / \Lambda^{\prime}\right| V(\Lambda)$ .

Proof: Since only one of every $\left|\Lambda / \Lambda^{\prime}\right|$ points in $\Lambda$ is in $\Lambda^{\prime}$ , the fundamental volume of $\Lambda^{\prime}$ must be $\left|\Lambda / \Lambda^{\prime}\right|$ times larger than that of $\Lambda$ for its fundamental regions to fill $N$ -space. In fact, we may take a union of $\left|\Lambda / \Lambda^{\prime}\right|$ fundamental regions of $\Lambda$ as a fundamental region of $\Lambda^{\prime}$ , one $(\mathbb{R}(\Lambda)+c)$ associated with each member $c$ of a set of coset representatives for $\Lambda / \Lambda^{\prime}$ , in view of the decomposition chain $\boldsymbol{R}^{N}=\Lambda^{\prime}+\left[\Lambda / \Lambda^{\prime}\right]+\mathbb{R}(\Lambda)$ .

Corollary: If $\Lambda$ is an integer lattice, then $V(\Lambda)=\left|Z^{N} / \Lambda\right|$ . Notice that Lemma 1 does not essentially depend on $\Lambda$ being a lattice; as long as $\Lambda$ is a union of some number $\left|\Lambda / \Lambda^{\prime}\right|$ of cosets of $\Lambda^{\prime}$ , the points of $\Lambda$ are $\left|\Lambda / \Lambda^{\prime}\right|$ times as dense in $N$ -space as those of $\Lambda^{\prime}$ .

From the two geometrical parameters $d_{\min }^{2}(\Lambda)$ and $V(\Lambda)$ , we define the fundamental coding gain $\gamma(\Lambda)$ of a lattice $\Lambda$ as follows:

\gamma(\Lambda) \triangleq d_{\min }^{2}(\Lambda) / V(\Lambda)^{2 / N}

(in the mathematical literature this is called Hermite's parameter and is also denoted by the symbol $\gamma$ ). The fundamental coding gain is a normalized measure of the density of a lattice in the following various senses. a) It is dimensionless. Both $d_{\text {min }}^{2}(\Lambda)$ and $V(\Lambda)^{2 / N}$ have the dimensions of a two-dimensional volume (area). We shall often find that the most appropriate normalization of other parameters is to two dimensions. b) The fundamental coding gain is invariant to scaling, $\gamma(r \Lambda)=\gamma(\Lambda)$ , because $d_{\text {min }}^{2}(r \Lambda)=r^{2} d_{\text {min }}^{2}(\Lambda)$ and $V(r \Lambda) =r^{N} V(\Lambda)$ . c) More generally, the fundamental coding gain is invariant to any scaled orthogonal transformation $T, \gamma(T \Lambda) =\gamma(\Lambda)$ , because $d_{\text {min }}^{2}(T \Lambda)=|\operatorname{det} T|^{2 / N} d_{\text {min }}^{2}(\Lambda)$ and $V(T \Lambda) =|\operatorname{det} T| V(\Lambda)$ , where $\operatorname{det} T$ is the determinant of $T$ . Thus any version of $\Lambda$ has the same fundamental coding gain. d) The fundamental coding gain is invariant to the Cartesian product operation, $\gamma\left(\Lambda^{M}\right)=\gamma(\Lambda)$ , because $d_{\min }^{2}\left(\Lambda^{M}\right)=d_{\min }^{2}(\Lambda)$ and $V\left(\Lambda^{M}\right)=V(\Lambda)^{M}$ . Thus if $\Lambda_{N}$ is an $N$ -dimensional lattice and $\Lambda_{M}$ is an $M$ -dimensional lattice, then in $M N$ -space (where they can be compared directly) $\Lambda_{N}^{M}$ has a greater or lesser density of points per unit volume than does $\Lambda_{N}^{M}$ according to whether $\gamma\left(\Lambda_{N}\right)$ is greater or less than $\gamma\left(\Lambda_{M}\right)$ , provided that they are scaled so that their minimum squared distances are the same. We therefore say that $\Lambda$ is denser than $\Lambda^{\prime}$ if $\gamma(\Lambda)>\gamma\left(\Lambda^{\prime}\right)$ , regardless of whether $\Lambda$ and $\Lambda^{\prime}$ have the same dimension. e) For any $N, \gamma\left(\boldsymbol{Z}^{N}\right)=1$ . An uncoded system may be defined as one that uses constellations based on $\boldsymbol{Z}^{N}$ (e.g., PAM uses constellations based on $\boldsymbol{Z}$ , and narrow-sense quadrature amplitude modulation (QAM) uses constellations based on $\boldsymbol{Z}^{2}$ ). Thus the fundamental coding gain $\gamma(\Lambda)$ of an arbitrary lattice $\Lambda$ may be considered to be the gain using constellations based on $\Lambda$ over an uncoded system using constellations based on $\boldsymbol{Z}^{N}$ . f) More concretely, suppose we form a $2^{n}$ -point $N$ dimensional constellation by taking all points in an $N$ dimensional lattice $\Lambda$ (or a coset of $\Lambda$ ) that lie within an $N$ -sphere with radius chosen just large enough to enclose the desired number of points. The volume of the sphere must then be about $2^{n} V(\Lambda)$ for large $n$ . For the integer lattice $\boldsymbol{Z}^{N}$ , the volume of such a sphere must be about $2^{n}$ . The ratio of the radii of the two spheres is thus about $V(\Lambda)^{-1 / N}$ (this dimensional argument in fact holds for constellation boundaries of any shape). If we scale $\Lambda$ so that $d_{\text {min }}^{2}(\Lambda)=1$ , then the minimum distance is the same as for $\boldsymbol{Z}^{N}$ , but we achieve an average power reduction of about $V(\Lambda)^{-2 / N}$ , or $\gamma(\Lambda)$ , by using the constellation based on $\Lambda$ rather than that based on $\boldsymbol{Z}^{N}$ . Thus $\gamma(\Lambda)$ is normalized properly to measure a power gain, and we will often give its value in decibels.

For example, $R \boldsymbol{Z}^{2}$ is a version of $\boldsymbol{Z}^{2}$ with $d_{\text {min }}^{2}=2$ (the rotation operator $R$ always doubles norms, in any number of dimensions). The partition $\boldsymbol{Z}^{2} / R \boldsymbol{Z}^{2}$ has order 2, and thus $V\left(R Z^{2}\right)=2$ . Thus we verify that $\gamma\left(R Z^{2}\right)=1$ .

As an example of a denser lattice, the Schläfli lattice $D_{4}$ may be defined as the four-dimensional integer lattice consisting of all integer 4 -tuples with an even number of odd coordinates or, equivalently, with even norm. The order of the partition $Z^{4} / D_{4}$ is two, because $Z^{4}$ is the union of $D_{4}$ and its coset $D_{4}+(1,0,0,0)$ (the set of all integer 4-tuples with an odd number of odd coordinates or, equivalently, with odd norm). Thus $V\left(D_{4}\right)=2$ . Clearly, $d_{\text {min }}^{2}\left(D_{4}\right)=2$ ; therefore, the fundamental coding gain of $D_{4}$ is

\gamma\left(D_{4}\right)=2 / 2^{2 / 4}=2^{1 / 2} .

Thus $D_{4}$ is denser than $\boldsymbol{Z}$ or $\boldsymbol{Z}^{4}$ by a factor of $2^{1 / 2}$ (or 1.51 dB ). The elements of norm 2 are the 24 points obtained by permutations and coordinate sign changes of the 4-tuple ( $1,1,0,0$ ), so the error coefficient $N_{0}\left(D_{4}\right)$ is 24 .

Another way of comparing the density of $D_{4}$ to that of $\boldsymbol{Z}^{4}$ is the following. The lattice $R \boldsymbol{Z}^{4}$ is a version of $\boldsymbol{Z}^{4}$ with $d_{\min }^{2}\left(R Z^{4}\right)=2$ (since $R$ doubles norms) and with $V\left(R Z^{4}\right)=4$ (since $\gamma\left(R Z^{4}\right)=1$ ). Moreover, $R Z^{4}$ is a sublattice of $D_{4}$ , which must be of order 2 (by Lemma 1); in fact, $D_{4}$ is the union of $R \boldsymbol{Z}^{4}$ (the lattice of integer 4-tuples with even norms in both pairs of coordinates) and its coset $R Z^{4}+(1,0,1,0)$ (the set of integer 4-tuples with odd norms in both pairs of coordinates). However, $D_{4}$ has the same minimum squared distance as $R \boldsymbol{Z}^{4}$ . Thus it is possible to take a version of $\boldsymbol{Z}^{4}$ and insert a translate of that version into the interstices between points without reducing the minimum distance. So $D_{4}$ has twice as many points as $R Z^{4}$ per unit volume, with no decrease in $d_{\text {min }}^{2}$ ; hence $D_{4}$ is twice as dense as $R Z^{4}$ in four dimensions, or $2^{1 / 2}$ times as dense per two dimensions, which is the normalization used in the definition of $\gamma$ .

We see that $Z^{4} / D_{4} / R Z^{4}$ is a chain of two-way partitions with distances $1 / 2 / 2$ (for short). However, since $D_{4}$ is a sublattice of $Z^{4}$ , it follows that $R D_{4}$ is a sublattice of $R Z^{4}, R^{2} D_{4}=2 D_{4}$ is a sublattice of $R^{2} Z^{4}=2 Z^{4}$ , and so forth. Hence $\boldsymbol{Z}^{4} / D_{4} / R \boldsymbol{Z}^{4} / R D_{4} / 2 \boldsymbol{Z}^{4} / 2 D_{4} / 2 R \boldsymbol{Z}^{4} / 2 R D_{4} / \cdots$ is an infinite chain of two-way partitions, with distances $1 / 2 / 2 / 4 / 4 / 8 / 8 / 16 / \cdots$ .

D. Complex Lattices and Gaussian Integers

A complex lattice $\Lambda$ is a discrete set of points in complex Euclidean $N$ -space $\boldsymbol{C}^{N}$ that forms a group under ordinary (complex) vector addition. Again, we stipulate that the only such lattices to be considered here will actually span $N$ dimensions, so we shall feel free to call such a $\Lambda$ an $N$ -dimensional complex lattice.

An obvious isomorphism (written $\Lambda_{r} \simeq \Lambda_{c}$ ) exists between any $2 N$ -dimensional real lattice $\Lambda_{r}$ and a corresponding $N$ -dimensional complex lattice $\Lambda_{c}$ , formed by taking each pair of coordinates of $\Lambda_{r}$ to specify the real and imaginary parts of each coordinate of $\Lambda_{c}$ , or vice versa. Addition of two points gives the same result in either case. Sublattices, cosets, and all such group properties carry over. Even the norm of two corresponding vectors is the same, so distances are not affected. Thus for most purposes it makes no difference whether we consider a lattice to be real or complex. For all parameters previously defined (e.g., $d_{\text {min }}^{2}(\Lambda), V(\Lambda), \gamma(\Lambda)$ ), we may define the values for a complex lattice to be the same as those for the corresponding real lattice.

The only difference of any significance arises when we consider multiplicative operations, such as scaling, or the taking of inner products. A complex lattice $\Lambda_{c}$ may be scaled by either a real number $r$ or a complex number $\alpha$ , the latter operation involving an equal phase rotation of each coordinate of $\Lambda_{c}$ by the phase of $\alpha$ (as well as a scaling of lengths by $|\alpha|$ , or norms by $|\alpha|^{2}$ ). The inner product $(x, y)$ of two real vectors $x$ and $y$ is the sum of the products of their coordinates and must be real; the (Hermitian) inner product ( $\boldsymbol{x}, \boldsymbol{y}$ ) of two complex vectors $\boldsymbol{x}$ and $y$ is the sum of the products of the coordinates of $x$ with the complex conjugates of the coordinates of $y$ and may be complex. Thus there may arise differences in definitions of orthogonality, duality, and so forth. In general, for the lattices considered in this paper, we shall prefer the complex definitions.

The simplest example of a complex lattice is the onedimensional complex lattice $\boldsymbol{G}$ corresponding to the twodimensional real lattice $\boldsymbol{Z}^{2}$ . The point ( $a, b$ ) in $\boldsymbol{Z}^{2}$ corre- sponds to the point $a+b i$ in $\boldsymbol{G}$ , where $a$ and $b$ may be any pair of integers. The set $\boldsymbol{G}$ is called the set of Gaussian integers.

The Gaussian integers $\boldsymbol{G}$ actually form a system of complex integers analogous to the ordinary real integers $\boldsymbol{Z}$ . Multiplication of two elements of $\boldsymbol{G}$ (using complex arithmetic) yields another element of $\boldsymbol{G}$ , which cannot be 0 unless one of the two elements is 0 (in fact, their norms multiply as real integers). Thus $\boldsymbol{G}$ is a ring and, in fact, an integral domain. Indeed, we have unique factorization in $\boldsymbol{G}$ : every element of $\boldsymbol{G}$ can be expressed uniquely as a product of primes, up to units, where the units (invertible elements) are $\pm 1$ and $\pm i$ , and the primes are the elements that have no divisors other than themselves, up to units. The primes of $\boldsymbol{G}$ , in order of increasing norm, are $1+i$ , $2 \pm i, 3, \cdots$ , with norms $2,5,9, \cdots$ . We denote the prime of least norm by $\phi \triangleq 1+i$ . (Note that $|\phi|^{2}=\phi \phi^{*}=2$ , and thus two is not a prime in $\boldsymbol{G}$ .)

We may scale $\boldsymbol{G}$ by any element $g \in \boldsymbol{G}$ and obtain a sublattice $g \boldsymbol{G}$ of $\boldsymbol{G}$ . By Lemma 1, the partition $\boldsymbol{G} / g \boldsymbol{G}$ must have order $|g|^{2}$ (the norm of $g$ ). There are thus $|g|^{2}$ equivalence classes of $\boldsymbol{G}$ modulo $g$ .

For example, $\phi \boldsymbol{G}$ is a sublattice of $\boldsymbol{G}$ of order $|\phi|^{2}=2$ and, in fact, is the complex lattice corresponding to the real lattice $R \boldsymbol{Z}^{2}$ . As with $R \boldsymbol{Z}^{2}, \phi \boldsymbol{G}$ consists of all the elements of $\boldsymbol{G}$ with even norm, its coset $\phi \boldsymbol{G}+1$ consists of all the elements of $\boldsymbol{G}$ with odd norm, and the union of $\phi \boldsymbol{G}$ and $\boldsymbol{\phi} \boldsymbol{G}+1$ is $\boldsymbol{G}$ (Fig. 4 may equally well be taken to illustrate this partition of $\boldsymbol{G}$ ). The coset representatives $[\boldsymbol{G} / \phi \boldsymbol{G}]$ may thus be taken as $\{0,1\}$ , and are isomorphic to $\boldsymbol{Z}_{2}=\mathrm{GF}(2)$ using modulo $\phi$ arithmetic (since $2 \equiv 0 \bmod \phi$ ).

More generally, $\phi^{\mu} \boldsymbol{G}$ is a sublattice of $\boldsymbol{G}$ of order $|\phi|^{2 \mu}=2^{\mu}$ and, in fact, is the complex lattice corresponding to the real lattice $R^{\mu} \boldsymbol{Z}^{2}$ , which is equal to $2^{\mu / 2} \boldsymbol{Z}^{2}$ for $\mu$ even and $2^{(\mu-1) / 2} R Z^{2}$ for $\mu$ odd. As with $R^{\mu} Z^{2}, \phi^{\mu} G$ consists of all the elements of $\boldsymbol{G}$ whose norms are multiples of $2^{\mu}$ , and thus $d_{\text {min }}^{2}\left(\phi^{\mu} G\right)=2^{\mu}$ . There is then an infinite chain $\boldsymbol{G} / \phi \boldsymbol{G} / \phi^{2} \boldsymbol{G} / \phi^{3} \boldsymbol{G} / \phi^{4} \boldsymbol{G} / \cdots$ of two-way partitions with distances $1 / 2 / 4 / 8 / 16 / \cdots$ , corresponding to the real chain $\boldsymbol{Z}^{2} / R \boldsymbol{Z}^{2} / 2 \boldsymbol{Z}^{2} / 2 R \boldsymbol{Z}^{2} / 4 \boldsymbol{Z}^{2} / \cdots$ . In analogy to the chain $\boldsymbol{Z} / 2 \boldsymbol{Z} / 4 \boldsymbol{Z} / \cdots$ , this chain suggests a complex binary representation of a Gaussian integer $g$ :

g=a_{0}+\phi a_{1}+\phi^{2} a_{2}+\cdots

where $a_{0}, a_{1}, a_{2}, \cdots \in\{0,1\}$ , and $a_{0}$ specifies the coset of $\phi \boldsymbol{G}$ in the partition $\boldsymbol{G} / \phi \boldsymbol{G}, \phi a_{1}$ specifies the coset of $\phi^{2} \boldsymbol{G}$ in the partition $\phi \boldsymbol{G} / \phi^{2} \boldsymbol{G}$ , and so forth. That is, the complex binary representation is based on the coset decomposition

\boldsymbol{G}=[\boldsymbol{G} / \phi \boldsymbol{G}]+\left[\phi \boldsymbol{G} / \phi^{2} \boldsymbol{G}\right]+\left[\phi^{2} \boldsymbol{G} / \phi^{3} \boldsymbol{G}\right]+\cdots

For any lattice $\Lambda$ , if $\lambda$ is any lattice point and $m$ is any integer, then $\pm m \lambda= \pm(\lambda+\lambda+\cdots+\lambda)$ is a lattice point, so $m \Lambda$ is a sublattice of $\Lambda$ , and $\Lambda$ (like any additive group) is a module over the ring $\boldsymbol{Z}$ of ordinary integers. However, a complex lattice $\Lambda$ is not necessarily a module over the ring $\boldsymbol{G}$ of Gaussian integers (for example, the two-dimensional hexagonal lattice is not). It is so if and only if $\lambda \in \Lambda$ implies $i \lambda \in \Lambda$ ; for then if $g=a+b i$ is any Gaussian integer, $g \lambda=a \lambda+b(i \lambda)$ is a lattice point. Then $g \Lambda$ is a sublattice of $\Lambda$ for any $g \in \boldsymbol{G}$ . In particular, $i \Lambda$ is a sublattice of $\Lambda$ ; but since $i(i \Lambda)=-\Lambda=\Lambda$ is a sublattice of $i \Lambda$ , in fact $i \Lambda=\Lambda$ . When necessary, we shall call such a complex lattice a $\boldsymbol{G}$ -lattice.

In general, multiplication of a $\boldsymbol{G}$ -lattice $\Lambda_{\mathrm{c}}$ by the complex scalar $\phi$ has much the same effect as a transformation of the corresponding real lattice $\Lambda_{r}$ by the rotation operator $R$ . The correspondence is not exact because $R$ includes a reflection as well as rotation and scaling, so that $R^{2}=2$ , whereas $\phi^{2}=2 i$ . (We could have avoided this difficulty by exchanging columns in the definition of $R$ .) However, if $\Lambda=\Lambda^{*}$ -i.e., if $\lambda \in \Lambda$ implies $\lambda^{*} \in \Lambda$ , where $\lambda^{*}$ is the complex conjugate of $\lambda$ -as will be true for all lattices to be considered here, then $\phi \Lambda_{\mathrm{c}} \simeq R \Lambda_{r}$ . The difference is slight, but we regard multiplication by the complex scalar $\phi$ as fundamentally a more natural operation than rotation by $R$ .

E. Binary Lattices

Binary lattices have proved to be the most useful class of lattices in applications. On the one hand, this is because they are a natural extension of binary block codes and are well suited to the bit-oriented real world. On the other hand, in many cases they give the best performance, both as lattices and as the basis of trellis codes, a result which would have been harder to predict a priori. For instance, the densest known lattices in $1,4,8,16$ , and 24 dimensions (among others) are binary lattices. We provide a brief introduction here; part II discusses binary lattices in more detail.

A real $N$ -dimensional lattice $\Lambda$ is a binary lattice if it is an integer lattice that has $2^{m} Z^{N}$ as a sublattice for some $m$ . The least such $m$ is called the 2-depth of the lattice. Thus $\boldsymbol{Z}^{N} / \Lambda / 2^{m} \boldsymbol{Z}^{N}$ is a partition chain. It turns out that all of the binary lattices that have proved to be useful to date have 2 -depth equal to one or two; we shall call such lattices mod-2 and mod-4 lattices, respectively.

A complex $N$ -dimensional lattice $\Lambda$ is a binary lattice if it is a Gaussian integer $\boldsymbol{G}$ -lattice that has $\phi^{\mu} \boldsymbol{G}^{N}$ as a sublattice for some $\mu$ . The least such $\mu$ is called the $\phi$ -depth of the lattice. Thus $G^{N} / \Lambda / \phi^{\mu} G^{N}$ is a partition chain.

If $\Lambda$ is a $2 N$ -dimensional real binary lattice, then the corresponding $N$ -dimensional complex lattice is also a complex binary lattice (if it is a $\boldsymbol{G}$ -lattice), and vice versa, since $2^{m} \boldsymbol{Z}^{2 N} \simeq \phi^{2 m} \boldsymbol{G}^{N} \subset \phi^{2 m-1} \boldsymbol{G}^{N}$ . So we may speak of the $\phi$ -depth of a real $2 N$ -dimensional binary lattice. A real $2 N$ -dimensional binary lattice with 2 -depth $m$ has $\phi$ -depth $2 m$ or $2 m-1$ ; thus the $\phi$ -depth is twice as fine-grained a parameter, and we shall henceforth call it simply the depth $\mu$ of a binary lattice. A mod-2 binary lattice thus has depth 1 or 2 , and a mod-4 binary lattice has depth 3 or 4 . For example, since $\boldsymbol{Z}^{4} / D_{4} / R \boldsymbol{Z}^{4}=\boldsymbol{G}^{2} / D_{4} / \phi \boldsymbol{G}^{2}$ is a partition chain, $D_{4}$ is a mod-2 binary lattice with depth $\mu=1$ .

Since the order of the partition $Z^{N} / 2^{m} Z^{N}$ (resp. $\boldsymbol{G}^{N} / \phi^{\mu} \boldsymbol{G}^{N}$ ) is a power of two, the orders of $\boldsymbol{Z}^{N} / \Lambda$ and $\Lambda / 2^{m} \boldsymbol{Z}^{N}$ (resp. $\boldsymbol{G}^{N} / \Lambda$ and $\Lambda / \phi^{\mu} \boldsymbol{G}^{N}$ ) must be powers of two, since their product is $\left|\boldsymbol{Z}^{N} / 2^{m} \boldsymbol{Z}^{N}\right|$ (resp. $\left|\boldsymbol{G}^{N} / \phi^{\mu} \boldsymbol{G}^{N}\right|$ ). The redundancy $r(\Lambda)$ of a binary lattice $\Lambda$ is defined as the binary logarithm of $\left|Z^{N} / \Lambda\right|$ , so that $\left|Z^{N} / \Lambda\right|=2^{r(\Lambda)}$ . In view of the corollary to Lemma 1, the fundamental volume of a binary lattice is therefore $V(\Lambda)=2^{r(\Lambda)}$ , and the fundamental coding gain is

\gamma(\Lambda)=2^{-\rho(\Lambda)} d_{\min }^{2}(\Lambda)

where $\rho(\Lambda)$ is the normalized redundancy (per two dimensions) of $\Lambda, \rho(\Lambda)=r(\Lambda) / N$ , where $2 N$ is the dimension of $\Lambda$ as a real lattice, or $N$ is the dimension of $\Lambda$ as a complex lattice.

If we choose a constellation of (say) $2^{n}$ points from $\Lambda$ , they will occupy a volume in $2 N$ -space approximately $2^{r(\Lambda)}$ times as large as the same number of points chosen from $Z^{2 N}$ would. We therefore say that the constellation expansion factor is $2^{r(\Lambda)}$ in $2 N$ dimensions, or $2^{\rho(\Lambda)}$ in two dimensions. As previously discussed, this translates into a power cost due to constellation expansion of a factor of $2^{\rho(\Lambda)}$ , or $\rho(\Lambda) \cdot 3.01 \mathrm{~dB}$ . The formula for fundamental coding gain just given therefore has an interpretation as follows: the minimum squared distance gain of a factor of $d_{\text {min }}^{2}(\Lambda)$ (relative to $d_{\text {min }}^{2}\left(\boldsymbol{Z}^{N}\right)=1$ ) is partially offset by a constellation expansion power cost of a factor of $2^{-\rho(\Lambda)}$ , leaving a net coding gain of $\gamma(\Lambda)=2^{-\rho(\Lambda)} d_{\text {min }}^{2}(\Lambda)$ .

The order of $\Lambda / 2^{m} \boldsymbol{Z}^{2 N}$ is $2^{2 N m-r(\Lambda)}$ , and the order of $\Lambda / \phi^{\mu} \boldsymbol{G}^{N}$ is $2^{N \mu-r(\Lambda)}$ . We may give $k(\Lambda) \triangleq N \mu(\Lambda)-r(\Lambda)$ the somewhat ugly but dual name of the 'informativity' of $\Lambda$ , where $N$ is the dimension of $\Lambda$ as a complex lattice; the normalized informativity (per two dimensions) of $\Lambda$ is $\kappa(\Lambda) \triangleq k(\Lambda) / N$ , where $N$ is the dimension of $\Lambda$ as a complex lattice, or $2 N$ is the dimension of $\Lambda$ as a real lattice. Thus the depth of a binary lattice $\Lambda$ is the sum of its normalized redundancy and informativity:

\mu(\Lambda)=\rho(\Lambda)+\kappa(\Lambda) .

For example, the depth of $D_{4}$ is 1 , its redundancy and informativity are both equal to 1 , and its normalized redundancy and informativity are both equal to $1 / 2$ .

Moreover, if a sublattice $\Lambda^{\prime}$ of a binary lattice $\Lambda$ is also a binary lattice, then $\Lambda / \Lambda^{\prime}$ is a partition whose order is a power of two, since $\boldsymbol{Z}^{N} / \Lambda / \Lambda^{\prime} / 2^{m} \boldsymbol{Z}^{N}$ is a partition chain for some $m$ . We define the depth of a partition $\Lambda / \Lambda^{\prime}$ of binary lattices as the depth of $\Lambda^{\prime}, \mu\left(\Lambda / \Lambda^{\prime}\right) \triangleq \mu\left(\Lambda^{\prime}\right)$ ; the redundancy as the redundancy of $\Lambda, \rho\left(\Lambda / \Lambda^{\prime}\right) \triangleq \rho(\Lambda)$ ; and the informativity as the informativity of $\Lambda^{\prime}, \kappa\left(\Lambda / \Lambda^{\prime}\right) \triangleq \kappa\left(\Lambda^{\prime}\right)$ . It follows that the order of the partition is

\left|\Lambda / \Lambda^{\prime}\right|=2^{N\left(\mu\left(\Lambda^{\prime}\right)-\rho(\Lambda)-\kappa\left(\Lambda^{\prime}\right)\right)}

where $2 N$ is the dimension of $\Lambda$ as a real lattice.

F. Labelings of Partitions of Binary Lattices

If $\Lambda$ and $\Lambda^{\prime}$ are binary lattices such that $\Lambda^{\prime}$ is a sublattice of $\Lambda$ , then the partition $\Lambda / \Lambda^{\prime}$ has order $2^{K}$ for some integer $K$ . Any map from binary $K$ -tuples $\boldsymbol{a}$ to unique cosets of $\Lambda^{\prime}$ in this partition is called a labeling. The labeling may be defined by a function $\boldsymbol{c}(\boldsymbol{a})$ , where $\boldsymbol{a}$ is any binary $K$ -tuple, called a label, and $\boldsymbol{c}(\boldsymbol{a})$ is a coset representative of the coset of $\Lambda^{\prime}$ specified by $\boldsymbol{a}$ . We always assume that $\boldsymbol{c ( 0 )}=\mathbf{0}$ , i.e., that the zero label maps to the zero coset, namely, $\Lambda^{\prime}$ itself.

The following lemma shows, first, that any such partition can be broken up into a chain of $K$ two-way partitions $\Lambda_{k} / \Lambda_{k+1}, 0 \leq k \leq K-1$ ; second, that there is then a labeling

\boldsymbol{c}(\boldsymbol{a})=\sum a_{k} \boldsymbol{g}_{k}

where $a_{k}$ is the $k$ th coordinate of the binary $K$ -tuple $a$ and $\boldsymbol{g}_{k}$ is an element of $\Lambda_{k}$ but not of $\Lambda_{k+1}$ such that the two vectors $\left\{a_{k} \boldsymbol{g}_{k}, a_{k} \in\{0,1\}\right\}$ , are a system of coset representatives for the cosets of $\Lambda_{k+1}$ in the two-way partition $\Lambda_{k} / \Lambda_{k+1}$ .

#9 Description of coset representatives in Sec. II.F

Thus the $2^{K}$ binary linear combinations $\left\{\sum a_{k} \boldsymbol{g}_{k}\right\}$ of the generators $\boldsymbol{g}_{k}$ are a system of coset representatives $\left[\Lambda_{k} / \Lambda_{k+1}\right]$ for the partition $\Lambda_{k} / \Lambda_{k+1}$ (this is a special case of a general result for groups of order $2^{K}-$ binary groups-discussed in part II).

In the coding context, we call such a labeling an Ungerboeck labeling.

Lemma 2: Let $\Lambda$ and $\Lambda^{\prime}$ be binary lattices such that $\Lambda^{\prime}$ is a sublattice of $\Lambda$ , and let $\left|\Lambda / \Lambda^{\prime}\right|=2^{K}$ . Then there is a sequence of lattices $\Lambda_{0}=\Lambda, \Lambda_{1}, \cdots, \Lambda_{K}=\Lambda^{\prime}$ such that $\Lambda_{0} / \Lambda_{1} / \cdots / \Lambda_{K}$ is a lattice partition chain and each partition $\Lambda_{k} / \Lambda_{k+1}$ is two-way, $0 \leq k \leq K-1 . \Lambda$ has the coset decomposition

\Lambda=\Lambda^{\prime}+\left\{\sum a_{k} \boldsymbol{g}_{k}\right\}

where $a_{k} \in\{0,1\}$ and $\left\{a_{k} g_{k}, a_{k} \in\{0,1\}\right\}$ is a system of coset representatives for $\left[\Lambda_{k} / \Lambda_{k+1}\right], 0 \leq k \leq K-1$ .

Sketch of proof (by induction, from $k=K-1$ down to $k=0$ ): Assume that $\Lambda_{k+1}$ is a binary lattice such that $\Lambda / \Lambda_{k+1} / \Lambda^{\prime}$ is a partition chain; $\Lambda_{K}=\Lambda^{\prime}$ is certainly such a lattice. If $\Lambda_{k+1} \neq \Lambda$ , then a vector $\boldsymbol{g}_{k}$ exists in $\Lambda$ that is not in $\Lambda_{k+1}$ but has order $2 \bmod \Lambda_{k+1}$ , i.e., such that $\boldsymbol{g}_{k}+\boldsymbol{g}_{k} \in \Lambda_{k+1}$ (see part II). Let $\Lambda_{k}$ then be the union of $\Lambda_{k+1}$ and its coset $\Lambda_{k+1}+\boldsymbol{g}_{k} ; \Lambda_{k}$ is clearly a lattice, which by construction is a sublattice of $\Lambda$ and has $\Lambda_{k+1}$ (and thus $\Lambda^{\prime}$ ) as a sublattice. The two vectors $\left\{a_{k} g_{k}, a_{k} \in\right. \{0,1\}\}$ are a system of coset representatives for the cosets of $\Lambda_{k+1}$ in the two-way partition $\Lambda_{k} / \Lambda_{k+1}$ . By induction, the $2^{K-k}$ binary linear combinations $\left\{\Sigma_{k \leq j \leq K-1} a_{j} \boldsymbol{g}_{j}\right\}$ are a system of coset representatives for $\Lambda_{k} / \Lambda^{\prime}$ . The induction terminates when $k=0$ .

Fig. 6 portrays an Ungerboeck labeling in two ways: as a partition tower, as in Fig. 5, and as a binary partition tree. In the tower, the first bit $a_{0}$ in the label $\boldsymbol{a}$ selects one

Fig. 6. Illustration of Ungerboeck labeling by partition tower and by partition tree.

of the two cosets of $\Lambda_{1}$ in the partition $\Lambda / \Lambda_{1}$ , the second bit $a_{1}$ selects one of the two cosets of $\Lambda_{2}$ in the partition $\Lambda_{1} / \Lambda_{2}$ , and so forth. In the tree, these are shown as two-way branches. Thus an Ungerboeck labeling is nested, in the sense that the first $k$ bits of the labeling of the partition $\Lambda / \Lambda^{\prime}$ are a labeling of the partition $\Lambda / \Lambda_{k}$ . If $\lambda$ is a vector in $\Lambda$ in the coset of $\Lambda^{\prime}$ whose label is $\boldsymbol{a}$ , then $\lambda$ is in a coset of $\Lambda_{k}$ determined by the first $k$ bits of $\boldsymbol{a}$ .

Consequently, we have the lattice/coset version of the Ungerboeck distance bound: if $\lambda$ and $\lambda^{\prime}$ are two different points of $\Lambda$ that lie in cosets of $\Lambda^{\prime}$ whose labels agree in their first $k$ bits, then $\left\|\lambda-\lambda^{\prime}\right\|^{2} \geq d_{\min }^{2}\left(\Lambda_{k}\right)$ , because both $\lambda$ and $\lambda^{\prime}$ are in the same coset of $\Lambda_{k}$ (this is a special case of the partition distance lemma of part II).

G. Decomposition of Binary Lattices

This section shows that the structure of binary lattices (whether real or complex) is a generalization of that of binary block codes. Mod-2 binary lattices are essentially isomorphic to linear binary block codes, as we can see in the following lemma and proof (this is "Construction A" of Leech and Sloane [6]).

Recall that an ( $N, K$ ) linear binary block code is any $K$ -dimensional subspace of the $N$ -dimensional linear vector space over $\boldsymbol{Z}_{2}=\mathrm{GF}(2)$ of all binary $N$ -tuples, where the coordinates of each of the $2^{K}$ codewords are regarded as elements of the binary field $\boldsymbol{Z}_{2}=\mathrm{GF}(2)$ . The codewords may be expressed as linear combinations $\boldsymbol{c}(\boldsymbol{a})=\sum a_{k} \boldsymbol{g}_{k}$ of a set of $K$ binary $N$ -tuples $\boldsymbol{g}_{k}, 1 \leq k \leq K$ , called generators. The set is called the generator matrix $G$ , so $\boldsymbol{c}(\boldsymbol{a})=\boldsymbol{a} G$ . The minimum Hamming distance of the code is the minimum number of nonzero coordinates in any nonzero codeword; we shall sometimes call an ( $N, K$ ) code with minimum Hamming distance $d_{H}$ an ( $N, K, d_{H}$ ) code. (Unless otherwise specified, an $(N, K)$ code will always mean a linear binary ( $N, K$ ) code.)

Lemma 3: An $N$ -dimensional real lattice $\Lambda$ is a mod-2 binary lattice if and only if it is the set of all integer $N$ -tuples that are congruent modulo 2 to one of the codewords $\boldsymbol{c}$ in a linear binary ( $N, K$ ) block code $C$ . The redundancy of $\Lambda$ is $r(\Lambda)=N-K$ , and its minimum squared distance is $d_{\min }^{2}(\Lambda)=\min \left[4, d_{H}(C)\right]$ , where $d_{H}(C)$ is the minimum Hamming distance of the code $C$ .

Proof: If $\Lambda$ is a mod- 2 binary lattice, then $\boldsymbol{Z}^{N} / \Lambda / 2 \boldsymbol{Z}^{N}$ is a partition chain, $\Lambda / 2 \boldsymbol{Z}^{N}$ has order $2^{K}$ for some integer $K$ , and $\boldsymbol{Z}^{N} / \Lambda$ then has order $2^{N-K}$ ; so the redundancy of $\Lambda$ is $r(\Lambda)=N-K . \Lambda$ is thus the union of $2^{K}$ cosets of $2 Z^{N}$ (in the partition $Z^{N} / 2 Z^{N}$ ); any such coset of $2 \boldsymbol{Z}^{N}$ has a binary coset representative $\boldsymbol{c}$ , i.e., an $N$ tuple of ones and zeros, and the coset $2 \boldsymbol{Z}^{N}+\boldsymbol{c}$ consists of all integer $N$ -tuples congruent to $c$ modulo 2 . By Lemma 2, $\Lambda=2 \boldsymbol{Z}^{N}+\left\{\sum a_{k} \boldsymbol{g}_{k}\right\}$ , where the $\boldsymbol{g}_{k}$ can be taken as binary $N$ -tuples, $1 \leq k \leq K$ , and addition may be taken modulo $2 \boldsymbol{Z}^{N}$ ; i.e., modulo 2 . The set $C$ of $2^{K}$ coset representatives $\left\{\boldsymbol{c}(\boldsymbol{a})=\sum a_{k} \boldsymbol{g}_{k}\right\}$ is thus a linear binary ( $N, K$ ) block code. Conversely, if $C$ is such a code, then the set of all integer $N$ -tuples that are congruent modulo 2 to code- words $\boldsymbol{c}$ in $C$ is easily seen to be a lattice, with $2 \boldsymbol{Z}^{N}$ as a sublattice (the set of all integer $N$ -tuples congruent to $\mathbf{0}$ modulo 2). If $\boldsymbol{c} \neq \mathbf{0}$ , the minimum norm in the coset $2 \boldsymbol{Z}^{N}+\boldsymbol{c}$ is the norm $\|\boldsymbol{c}\|^{2}$ of $\boldsymbol{c}$ itself, which is also the Hamming weight of $c$ . The minimum such weight is the minimum Hamming distance of the code, $d_{H}(C)$ . If $\boldsymbol{c}=\mathbf{0}$ , the minimum nonzero norm in $2 Z^{N}$ itself is four. Thus $d_{\text {min }}^{2}(\Lambda)= \min \left[4, d_{H}(C)\right]$ .

Lemma 3 says that any mod-2 lattice $\Lambda$ may be constructed as a coset code $\mathbb{C}\left(\boldsymbol{Z}^{N} / 2 \boldsymbol{Z}^{N} ; C\right)$ , where $C$ is a linear binary ( $N, K$ ) block code. We can express $\Lambda$ by the code formula (coset decomposition) $\Lambda=2 \boldsymbol{Z}^{N}+C$ . This explicitly exhibits $\Lambda$ as the union of $2^{K}$ cosets of $2 \boldsymbol{Z}^{N}$ . The labeling $\boldsymbol{c}(\boldsymbol{a})=\Sigma a_{k} \boldsymbol{g}_{k}=\boldsymbol{a} G(\bmod 2)$ is linear in the sense that $\boldsymbol{c}\left(\boldsymbol{a} \oplus \boldsymbol{a}^{\prime}\right)=\boldsymbol{c}(\boldsymbol{a}) \oplus \boldsymbol{c}\left(\boldsymbol{a}^{\prime}\right)$ . This picture of a mod-2 lattice as a coset code is illustrated in Fig. 7.

Fig. 7. Illustration of mod- 2 binary lattice $\Lambda$ as union of $2^{K}$ cosets of $2 \boldsymbol{Z}^{N}$ , each coset corresponding to codeword $c(\boldsymbol{a})$ in linear binary $(N, K)$ code $C$ .

For example, the mod-2 lattices $\boldsymbol{Z}^{2}, R \boldsymbol{Z}^{2}$ , and $2 \boldsymbol{Z}^{2}$ correspond to the $(2,2,1),(2,1,2)$ , and $(2,0, \infty)$ binary codes, respectively, and consequently have minimum squared distances 1,2 , and 4 . The mod-2 lattices $Z^{4}, D_{4}$ , $R \boldsymbol{Z}^{4}, R D_{4}$ and $2 \boldsymbol{Z}^{4}$ correspond to $(4,4,1),(4,3,2),(4,2,2) =(2,1,2)^{2},(4,1,4)$ , and $(4,0, \infty)$ binary codes, with distances $1,2,2,4$ , and 4.

In general, the lattice corresponding to any single-par-ity-check ( $N, N-1,2$ ) code is a mod-2 lattice, the so-called "checkerboard lattice" $D_{N}$ , with minimum squared distance 2 and redundancy 1 , a sublattice of $\boldsymbol{Z}^{N}$ of order 2. The lattice $D_{2}$ is the lattice $R \boldsymbol{Z}^{2} \simeq \phi \boldsymbol{G}$ .

Mod-2 lattices are limited to minimum squared distances of four or less because $N$ -tuples in $2 \boldsymbol{Z}^{N}$ such as $(2,0, \cdots, 0)$ are lattice points. Binary block codes with Hamming distance 4 are therefore of special interest. The Gosset lattice $E_{8}$ is the mod-2 lattice corresponding to the $(8,4,4)$ Reed-Muller code; $E_{8}$ thus has $d_{\text {min }}^{2}\left(E_{8}\right)=4$ , $r\left(E_{8}\right)=4, \rho\left(E_{8}\right)=1$ , and thus $\gamma\left(E_{8}\right)=2^{-1} \cdot 4=2$ ( 3.01 dB ). ( $E_{8}$ has many remarkable properties; it is perhaps the second most important lattice in lattice theory.)

If $\Lambda^{\prime}$ is a sublattice of $\Lambda$ , where both are mod- 2 lattices, it is easy to see that the code $C^{\prime}$ associated with $\Lambda^{\prime}$ must be a subcode of the code $C$ associated with $\Lambda$ .

In general, if $\Lambda$ is a mod-4 lattice, then $4 \boldsymbol{Z}^{N}$ is a sublattice, and Lemma 2 allows us to write

\Lambda=4 \boldsymbol{Z}^{N}+\left\{\sum a_{k} \boldsymbol{g}_{k}\right\}

where each $\boldsymbol{g}_{k}$ may be taken as a coset representative of $4 Z^{N}$ , i.e., as an $N$ -tuple of integers modulo 4 , but where the label $\boldsymbol{a}$ is still a $\{0,1\}$ -valued integer $K$ -tuple. Addition may be taken modulo 4 . This exhibits $\Lambda$ as a union of $2^{K}$ cosets of $4 \boldsymbol{Z}^{N}$ , where the coset representatives are $\boldsymbol{c}(\boldsymbol{a})= \sum a_{k} \boldsymbol{g}_{k}(\bmod 4)$ . If we take the coordinates of $\boldsymbol{c}(\boldsymbol{a})$ to be from the set $\{0, \pm 1,2\}$ , then $\boldsymbol{c}(\boldsymbol{a})$ is also a coset leader of its coset, i.e., an element of minimum norm.

#3 Clarity of argument for mod-4 lattice structure

For a further refinement, let $\Lambda_{e}$ be the set of all points in $\Lambda$ whose coordinates are all even. Then $\Lambda_{e}$ is a lattice, a sublattice of $\Lambda$ , with $\boldsymbol{4} \boldsymbol{Z}^{N}$ as a sublattice, so $\Lambda / \Lambda_{e} / \boldsymbol{4} \boldsymbol{Z}^{N}$ is a partition chain, and there is a coset decomposition of the form $\Lambda=\boldsymbol{4} \boldsymbol{Z}^{N}+\left[\Lambda_{e} / \boldsymbol{4} \boldsymbol{Z}^{N}\right]+\left[\Lambda / \Lambda_{e}\right]$ . The lattice $\Lambda_{e}$ is clearly a mod-2 lattice scaled by a factor of 2 ; consequently, $\Lambda_{e}=4 \boldsymbol{Z}^{N}+2 C$ , where $C$ is a binary ( $N, K^{\prime}$ ) code for some $K^{\prime}$ , by Lemma 3; in other words, the coset representatives $\left[\Lambda_{e} / \boldsymbol{4} \boldsymbol{Z}^{N}\right]$ may be taken as $2 C$ , or $\left\{2 \sum a_{k} \boldsymbol{g}_{k}\right\}$ , where the $\boldsymbol{g}_{k}$ constitute a set of $K^{\prime}$ binary generators for the code $C$ . Thus we may take $K^{\prime}$ of the generators to be $2 \boldsymbol{g}_{k}, 1 \leq k \leq K^{\prime}$ , and we may write

\Lambda=4 \boldsymbol{Z}^{N}+2 C+\left\{\sum a_{k} \boldsymbol{g}_{k}\right\}

where the $\boldsymbol{g}_{k}, K^{\prime}+1 \leq k \leq K$ , are $N$ -tuples of integers modulo 4 that are not all even, such that $\left\{\sum a_{k} \boldsymbol{g}_{k}\right\}$ is a system of coset representatives $\left[\Lambda / \Lambda_{e}\right]$ . The generators $\left\{2 \boldsymbol{g}_{k}, K^{\prime}+1 \leq k \leq K\right\}$ generate a lattice $\Lambda^{\prime}$ that is a sublattice of $\Lambda_{e}$ , whose elements are congruent to $2 \boldsymbol{c}^{\prime}$ modulo 4, where $c^{\prime}$ is a codeword in a binary ( $N, K-K^{\prime}$ ) block code $C^{\prime}$ that is a subcode of the code $C$ , so $K-K^{\prime} \leq K^{\prime}$ .

It is sometimes possible to find a set of generators $\left\{\boldsymbol{g}_{k}, K^{\prime}+1 \leq k \leq K\right\}$ for $\Lambda / \Lambda_{e}$ such that each $\boldsymbol{g}_{k}$ is an $N$ -tuple of ones and zeros and such that the "carries" (the twos-coefficients in the vector sum) of any sum $\boldsymbol{g}_{k}+\boldsymbol{g}_{k^{\prime}}$ are a codeword in $C$ ; then we say that $\Lambda$ is decomposable. The coset decomposition then becomes the code formula $\Lambda=4 \boldsymbol{Z}^{N}+2 C_{1}+C_{0}$ , where $C_{0}$ is a subcode of $C_{1}$ . This means that $\Lambda$ consists of all integer $N$ -tuples whose coordinate ones-coefficients in the standard binary representation form a binary $N$ -tuple $a_{0}$ that is a codeword in $C_{0}$ , and whose coordinate twos-coefficients form a binary $N$ tuple $\boldsymbol{a}_{1}$ that is a codeword in $C_{1}$ (this is the "coordinate array" idea of Leech and Sloane [6]).

The minimum squared distance of a mod-4 decomposable real lattice $\Lambda$ is

d_{\min }^{2}(\Lambda)=\min \left[16,4 d_{H}\left(C_{1}\right), d_{H}\left(C_{0}\right)\right]

because, on the one hand, there are $N$ -tuples in $\boldsymbol{4 Z}^{N}$ of norm 16, in $2 C_{1}$ of norm $4 d_{H}\left(C_{1}\right)$ and in $C_{0}$ of norm $d_{H}\left(C_{0}\right)$ ; on the other hand, if $\lambda \equiv 2 a_{1}+a_{0} \bmod 4$ and $\boldsymbol{a}_{0} \neq \mathbf{0}$ , then $\|\lambda\|^{2} \geq d_{H}\left(C_{0}\right)$ ; if $\boldsymbol{a}_{0}=\mathbf{0}$ but $\boldsymbol{a}_{1} \neq \mathbf{0}$ , then $\|\lambda\|^{2} \geq 4 d_{H}\left(C_{1}\right)$ ; finally, if $\boldsymbol{a}_{0}=\boldsymbol{a}_{1}=\mathbf{0}$ , then $\|\lambda\|^{2} \geq 16$ (if $\lambda \neq \mathbf{0}$ ). This suggests that we shall want to choose codes $C_{1}$ and $C_{0}$ for which the Hamming distances are in the ratio 1:4.

Most of the mod-4 binary lattices useful in practice are decomposable. For example, the lattice $R E_{8}$ is a version of the Gosset lattice which is mod-4 and decomposable, with the code formula $R E_{8}=4 \boldsymbol{Z}^{8}+2(8,7,2)+(8,1,8)$ . The standard binary representation of an integer modulo 4 is a more elementary example of a decomposition of this type, with the code formula $\boldsymbol{Z}=\mathbf{4 Z}+2(1,1)+(1,1)$ ; this simply means that every integer modulo 4 can be expressed as a binary linear combination of the two generators 2 and 1 , where the combination can be specified by a 2-bit label.

The simplest example of an indecomposable mod-4 lattice is the two-dimensional lattice $\Lambda=\left\{\left(\boldsymbol{x}_{1}, \boldsymbol{x}_{2}\right): \boldsymbol{x}_{1}+\boldsymbol{x}_{2} \equiv\right. 0 \bmod 4\}$ , for which $C$ is the $(2,1)$ code with generator $(1,1)$ , so $2 C$ has generator $\boldsymbol{g}_{1}=(2,2)$ , but $\boldsymbol{g}_{2}$ must be taken as $(1,-1)$ or $(-1,1)$ ; however, this lattice can be made decomposable by inverting the sign of one coordinate. The Leech lattice $\Lambda_{24}$ is an example of a fundamentally indecomposable mod-4 lattice.

Fig. 8(a) illustrates the coset decomposition $\Lambda / \Lambda_{e} / 4 \boldsymbol{Z}^{N}$ of a mod-4 binary lattice that is used above. The $K^{\prime}$ -bit label $\boldsymbol{a}_{1}$ specifies a coset of $\boldsymbol{4} \boldsymbol{Z}^{N}$ in the partition $\Lambda_{e} / \boldsymbol{4} \boldsymbol{Z}^{N}$ , and the ( $K-K^{\prime}$ )-bit label $\boldsymbol{a}_{0}$ specifies a coset of $\Lambda_{e}$ in the partition $\Lambda / \Lambda_{e}$ . Altogether, therefore, the $K$ -bit label ( $\boldsymbol{a}_{0}, \boldsymbol{a}_{1}$ ) specifies a coset of $\mathbf{4} \boldsymbol{Z}^{N}$ in the partition $\Lambda / \mathbf{4} \boldsymbol{Z}^{N}$ . Fig. 8(b) illustrates the same decomposition when $\Lambda$ is a decomposable mod-4 binary lattice with code formula $\Lambda=4 \boldsymbol{Z}^{N}+2 C_{1}+C_{0}$ ; the form is then that of a coset code $\mathbb{C}\left(\boldsymbol{Z}^{N} / 2 \boldsymbol{Z}^{N} / 4 \boldsymbol{Z}^{N} ; C_{0}, C_{1}\right)$ . (If $C_{1}$ is the ( $N, N$ ) code, then $\Lambda_{e}=2 \boldsymbol{Z}^{N}, \Lambda$ is actually a mod-2 lattice, and Fig. 8 reduces to Fig. 7.)

Fig. 8. Illustration of mod-4 binary lattice $\Lambda$ as union of $2^{K}$ cosets of $\boldsymbol{4} \boldsymbol{Z}^{N}$ , each coset corresponding to binary label ( $\boldsymbol{a}_{0}, \boldsymbol{a}_{1}$ ). (a) General case. (b) Case where $\Lambda$ is decomposable.

It should be clear how to extend Fig. 8 to real binary lattices of any 2-depth $m$ , using a partition chain $\Lambda / \Lambda_{e} / \cdots / 2^{m} Z^{N}$ of sublattices of $\Lambda$ consisting of all elements of $\Lambda$ whose coordinates are in $2 \boldsymbol{Z}^{N}, 4 \boldsymbol{Z}^{N}$ , and so forth. The decomposition of complex binary lattices is similar. Mod-2 complex binary $\boldsymbol{G}$ -lattices are always decomposable, with a code formula of the form $\Lambda=\phi^{2} \boldsymbol{G}^{N}+ \phi C_{1}+C_{0}$ , as we can see from the following lemma. Complex binary lattices with depths of three or more are not necessarily decomposable, but the ones that we are concerned with generally are.

Lemma 4: An $N$ -dimensional complex $G$ -lattice $\Lambda$ is a mod-2 binary lattice if and only if it is the set of all Gaussian integer $N$ -tuples that are congruent modulo $\phi^{2}$ to an $N$ -tuple of the form $\phi \boldsymbol{c}_{1}+\boldsymbol{c}_{0}$ , where $\boldsymbol{c}_{1}$ is a codeword in a binary ( $N, K$ ) code $C_{1}$ , and $c_{0}$ is a codeword in a binary ( $N, J-K$ ) code $C_{0}$ which is a subcode of $C_{1}$ . The redundancy of $\Lambda$ is $r(\Lambda)=2 N-J$ , and its minimum squared distance is

d_{\min }^{2}(\Lambda)=\min \left[4,2 d_{H}\left(C_{1}\right), d_{H}\left(C_{0}\right)\right] .

Proof: If $\Lambda$ is a mod-2 binary lattice, then $\boldsymbol{G}^{N} / \Lambda / 2 \boldsymbol{G}^{N}=\phi^{2} \boldsymbol{G}^{N}$ is a partition chain, and $\Lambda / 2 \boldsymbol{G}^{N}$ has order $2^{J}$ for some integer $J$ . Since $\boldsymbol{G}^{N} / 2 \boldsymbol{G}^{N}$ has order $2^{2 N}$ , $\boldsymbol{G}^{N} / \Lambda$ has order $2^{2 N-J}$ , so the redundancy of $\Lambda$ is $r(\Lambda)= 2 N-J . \Lambda$ is the union of $2^{J}$ cosets of $2 \boldsymbol{G}^{N}=\phi^{2} \boldsymbol{G}^{N}$ , and we may take the coordinates of the coset representative of any such coset to be of the form $\phi \boldsymbol{c}_{1}+\boldsymbol{c}_{0}$ , where $\boldsymbol{c}_{1}$ and $\boldsymbol{c}_{0}$ are binary $N$ -tuples. Sums of coset representatives may be taken $\bmod 2 G^{N}$ (or modulo 2 ).

Consider the set $\Lambda_{\phi}$ of all points in $\Lambda$ whose coordinates are all multiples of $\phi$ . Then $\Lambda / \Lambda_{\phi} / \phi^{2} G^{N}$ is a partition chain, and there is a coset decomposition of the form $\Lambda=\phi^{2} \boldsymbol{G}^{N}+\left[\Lambda_{\phi} / \phi^{2} \boldsymbol{G}^{N}\right]+\left[\Lambda / \Lambda_{\phi}\right]$ . The generators of the lattice $\Lambda_{\phi}$ modulo $\phi^{2} \boldsymbol{G}^{N}$ may be taken as $\phi \boldsymbol{g}_{k}, 1 \leq k \leq K$ , for some $K$ , where each $\boldsymbol{g}_{k}$ is a binary $N$ -tuple; consequently, $\Lambda_{\phi}=\phi^{2} G^{N}+\phi C_{1}$ , where $C_{1}$ is the binary ( $N, K$ ) code generated by these $\boldsymbol{g}_{k}$ . Similarly, the generators of the lattice $\Lambda$ modulo $\Lambda_{\phi}$ may be taken as $\boldsymbol{g}_{k}, K+1 \leq k \leq J$ , where each $\boldsymbol{g}_{k}$ is a binary $N$ -tuple; consequently, $\Lambda= \Lambda_{\phi}+C_{0}$ , where $C_{0}$ is the binary ( $N, J-K$ ) code generated by the $\boldsymbol{g}_{k}, K+1 \leq k \leq J$ . Since $\Lambda$ is a $\boldsymbol{G}$ -lattice, $\lambda \in \Lambda$ implies $\phi \lambda \in \Lambda$ . Therefore, $C_{0}$ must be a subcode of $C_{1}$ .

The minimum distance expression arises from the fact that there are $N$ -tuples of norm 4 in $\phi^{2} G^{N}$ , of norm $2 d_{H}\left(C_{1}\right)$ in $\phi C_{1}$ , and of norm $d_{H}\left(C_{0}\right)$ in $C_{0}$ ; conversely, if $\lambda \equiv \phi c_{1}+c_{0} \bmod \phi^{2}$ and $c_{0} \neq \mathbf{0}$ , then $\|\lambda\|^{2} \geq d_{H}\left(C_{0}\right)$ ; if $\boldsymbol{c}_{0}=\mathbf{0}$ but $\boldsymbol{c}_{1} \neq \mathbf{0}$ , then $\|\lambda\|^{2} \geq 2 d_{H}\left(C_{1}\right)$ ; finally, if $\boldsymbol{c}_{0}=\boldsymbol{c}_{1} =\mathbf{0}$ , then $\|\lambda\|^{2} \geq 4$ (if $\lambda \neq \mathbf{0}$ ).

For the converse, let $\Lambda$ be the set of all Gaussian integer $N$ -tuples $\lambda \equiv \phi \boldsymbol{c}_{1}+\boldsymbol{c}_{0} \bmod \phi^{2}$ . To show that $\Lambda$ is a $\boldsymbol{G}$ lattice, we must show that if $\lambda_{1}, \lambda_{2} \in \Lambda$ , then $\lambda_{1}+\lambda_{2} \in \Lambda$ , and also if $\lambda \in \Lambda$ , then $-\lambda \in \Lambda$ and $i \lambda \in \Lambda$ . The first two propositions follow immediately from $2 \lambda \equiv 0 \bmod \phi^{2}$ . The third follows from $i \lambda=\phi \lambda-\lambda \in \Lambda$ , where the fact that $\phi \lambda \in \Lambda$ depends on $C_{0}$ being a subcode of $C_{1} . \Lambda$ is a Gaussian integer lattice with $2 G^{N}$ as a sublattice (by construction).

A mod-2 complex $\boldsymbol{G}$ -lattice $\Lambda$ has depth 1 if and only it the code $C_{1}$ of Lemma 4 is the ( $N, N$ ) code; then its code formula can be simplified to $\Lambda=\phi \boldsymbol{G}^{N}+C_{0}$ . Otherwise, it has depth 2.

For example, as a complex lattice, the mod-2 depth-1 Schläfli lattice $D_{4}$ is decomposable with the code formula

D_{4}=\phi \boldsymbol{G}^{2}+(2,1,2) .

Thus $\mu\left(D_{4}\right)=1, r\left(D_{4}\right)=1, \rho\left(D_{4}\right)=1 / 2$ , and $d_{\text {min }}^{2}\left(D_{4}\right)=$ 2, in agreement with the values obtained earlier for $D_{4}$ as a real lattice. As a complex lattice, the mod-2 depth-2 Gosset lattice $E_{8}$ is decomposable with the code formula

E_{8}=\phi^{2} \boldsymbol{G}^{4}+\phi(4,3,2)+(4,1,4)

(see part II). Thus $\mu\left(E_{8}\right)=2, r\left(E_{8}\right)=4, \rho\left(E_{8}\right)=1$ , and $d_{\text {min }}^{2}\left(E_{8}\right)=4$ , in agreement with the values obtained earlier for $E_{8}$ as a real lattice. The complex binary representation of a Gaussian integer $\bmod \phi^{2}$ is a more elementary example of a decomposition of this type.

As in the real case, complex binary lattices that are not mod-2 are not necessarily decomposable, i.e., expressible purely in terms of binary codes. However, the lattices that are useful in applications generally are as follows. Let $C_{\mu-1} / C_{\mu-2} / \cdots / C_{0}$ be a partition chain of binary ( $N, K_{j}$ ) codes, $0 \leq j \leq \mu-1$ , i.e., $C_{j}$ is a subcode of $C_{j+1}$ . Then let $\Lambda$ be a complex binary lattice whose elements are the set of Gaussian integer $N$ -tuples $\lambda$ that are congruent to $\phi^{\mu-1} \boldsymbol{c}_{\mu-1}+\cdots+\boldsymbol{c}_{0}$ modulo $\phi^{\mu}$ , where $\boldsymbol{c}_{j}$ is a codeword in the code $C_{j}$ , i.e., the coefficients of $\phi^{j}$ in the complex binary representation of $\lambda$ are codewords in $C_{j}, 0 \leq j \leq \mu-1$ . We represent this by the (complex) code formula

\Lambda=\phi^{\mu} \boldsymbol{G}^{N}+\phi^{\mu-1} C_{\mu-1}+\cdots+C_{0} .

This is the coordinate array idea again, but using the complex binary representation rather than the standard binary representation.

Some of the principal properties of such a complex decomposable lattice are as follows. a) $\Lambda$ is a $\boldsymbol{G}$ -lattice, because if $\lambda \in \Lambda$ , then $\phi \lambda \in \Lambda$ , in view of the subcode structure of the code formula; thus $i \lambda=(\phi-1) \lambda \in \Lambda$ . b) $\Lambda$ has depth $\mu$ (assuming that $K_{\mu-1}<N$ ). c) The order of the partition $\Lambda / \phi^{\mu} \boldsymbol{G}^{N}$ is the product of the $2^{K_{j}}$ ; the informativity of $\Lambda$ is $k(\Lambda)=\sum K_{j}$ ; the redundancy of $\Lambda$ is $r(\Lambda)=N \mu-\sum K_{j}=\sum\left(N-K_{j}\right)$ ; and the normalized redundancy of $\Lambda$ is $\rho(\Lambda)=\mu-\sum K_{j} / N= \Sigma\left(1-K_{j} / N\right)$ . d) The minimum squared distance of $\Lambda$ is

d_{\min }^{2}(\Lambda)=\min \left[2^{\mu}, 2^{\mu-1} d_{H}\left(C_{\mu-1}\right), \cdots, d_{H}\left(C_{0}\right)\right]

because, on the one hand, we can exhibit $N$ -tuples with all of these norms by appropriate choice of codewords; on the other hand, if $j(\lambda)$ is the smallest index such that $\boldsymbol{c}_{j} \neq \mathbf{0}$ in the above congruence, then $\lambda$ has at least $d_{H}\left(C_{j}\right)$ coordinates with norm at least $2^{j}$ . This suggests that we shall want to choose a code partition chain $C_{\mu-1} / C_{\mu-2} / \cdots / C_{0}$ for which the Hamming distances are in the ratio $2 / 4 / \cdots$ .

A decomposable complex lattice of depth $\mu$ may be depicted as a coset code $\mathbb{C}\left(\boldsymbol{G}^{N} / \phi \boldsymbol{G}^{N} / \cdots / \phi^{\mu} \boldsymbol{G}^{N} ; C_{0}, \cdots\right.$ , $C_{\mu-1}$ ), analogously to Fig. 8(b), where the coset of $\phi^{j} \boldsymbol{G}^{N}$ in the partition $\phi^{j} \boldsymbol{G}^{N} / \phi^{j+1} \boldsymbol{G}^{N}$ is determined by a codeword from $C_{j}, 0 \leq j \leq \mu-1$ .

H. Dual Lattices

If $\Lambda$ is a binary lattice of depth $\mu$ , either $2 N$ -dimensional real or $N$ -dimensional complex, we define its dual lattice $\Lambda^{\perp}$ as the set of all Gaussian integer $N$ -tuples $y$ that are orthogonal to all vectors $\boldsymbol{x} \in \Lambda$ modulo $\phi^{\mu}$ , where we use the complex inner product $(\boldsymbol{x}, \boldsymbol{y})$ . That is, $(\boldsymbol{x}, \boldsymbol{y})$ is a Gaussian integer in the lattice $\phi^{\mu} \boldsymbol{G}$ , or a multiple of $\phi^{\mu}$ . For example, it is easy to verify that $D_{4}$ is self-dual, using the fact that $D_{4}$ is the set of all pairs of Gaussian integers that are both even (in $\boldsymbol{\phi} \boldsymbol{G}$ ) or both odd (in $\boldsymbol{\phi} \boldsymbol{G}+1$ ).

If $\Lambda$ is a binary lattice of depth $\mu$ , we may regard $\boldsymbol{G}^{N} / \Lambda / \phi^{\mu} \boldsymbol{G}^{N}$ and $\boldsymbol{G}^{N} / \Lambda^{\perp} / \phi^{\mu} \boldsymbol{G}^{N}$ as dual partition chains, since $\phi^{\mu} G^{N}$ is the lattice of all vectors orthogonal to all vectors in $\boldsymbol{G}^{N}$ modulo $\phi^{\mu}$ , and vice versa. It is straightforward to verify that the order of $\boldsymbol{G}^{N} / \Lambda$ is the same as the order of $\Lambda^{\perp} / \phi^{\mu} \boldsymbol{G}^{N}$ , which implies that the order of $\boldsymbol{G}^{N} / \Lambda^{\perp}$ is the same as the order of $\Lambda / \phi^{\mu} \boldsymbol{G}^{N}$ . Therefore, the redundancy of $\Lambda^{\perp}$ is the informativity of $\Lambda$ , and vice versa. Moreover, if $\Lambda / \Lambda^{\prime}$ is a partition of binary lattices of depth $\mu$ , then $\boldsymbol{G}^{N} / \Lambda / \Lambda^{\prime} / \phi^{\mu} \boldsymbol{G}^{N}$ and $\boldsymbol{G}^{N} / \Lambda^{\prime \perp} / \Lambda^{\perp} / \phi^{\mu} \boldsymbol{G}^{N}$ are dual partition chains, and $\Lambda^{\prime \perp} / \Lambda^{\perp}$ is a partition of binary lattices with the same order and depth as $\Lambda / \Lambda^{\prime}$ .

If $\Lambda$ is a decomposable $N$ -dimensional complex binary lattice of depth $\mu$ , with code formula

\Lambda=\phi^{\mu} \boldsymbol{G}^{N}+\phi^{\mu-1} C_{\mu-1}+\cdots+C_{0}

then its dual lattice $\Lambda^{\perp}$ is a decomposable complex binary lattice of depth $\mu$ , with code formula

\Lambda^{\perp}=\phi^{\mu} \boldsymbol{G}^{N}+\phi^{\mu-1} C_{0}^{\perp}+\cdots+C_{\mu-1}^{\perp}

where $C_{j}^{\perp}$ is the dual code to $C_{j}$ . (Recall that the dual code to a linear binary ( $N, K$ ) block code is an ( $N, N-K$ ) code and that, if $C^{\prime}$ is a subcode of $C$ , then $C^{\perp}$ is a subcode of $\left(C^{\prime}\right)^{\perp}$ .) This proposition may be verified by noting that $C_{0}^{\perp} / \cdots / C_{\mu-1}^{\perp}$ is a code partition chain, that every generator $\phi^{\mu-j-1} \boldsymbol{g}_{j}^{\perp}$ of $\Lambda^{\perp} / \phi^{\mu} \boldsymbol{G}^{N}$ is orthogonal $\bmod \phi^{\mu}$ to every generator $\phi^{j^{\prime}} \boldsymbol{g}_{j^{\prime}}$ of $\Lambda / \phi^{\mu} \boldsymbol{G}^{N}$ , and that the dimensions are such that the informativity of $\Lambda^{\perp}$ is equal to the redundancy of $\Lambda$ and vice versa.

For example, the Schläfli lattice $D_{4}$ is self-dual, because its complex code formula is $D_{4}=\phi \boldsymbol{G}^{2}+(2,1,2)$ , and the ( $2,1,2$ ) code is self-dual. The Gosset lattice $E_{8}$ is self-dual, because its complex code formula is $E_{8}=\phi^{2} \boldsymbol{G}^{4}+\phi(4,3,2) +(4,1,4)$ , and the $(4,3,2)$ and $(4,1,4)$ codes are duals. An alternative definition of the dual of an $N$ -dimensional real lattice $\Lambda$ with 2-depth $m$ is the lattice $\Lambda^{\perp}$ consisting of all integer $N$ -tuples orthogonal to all $N$ tuples in $\Lambda$ modulo $2^{m}$ , using the real inner product. For the lattices that we will be considering, this definition coincides with the definition given above when the depth ( $\phi$ -depth) $\mu$ of $\Lambda$ is even, so that $\mu=2 m$ ; when $\mu$ is odd, $\mu=2 m-1$ , the dual lattice under this definition will be a version of the dual lattice as defined earlier, rotated by the rotation operator $R$ . When a real lattice is decomposable, the code formula for its dual has the same relation to its code formula as is given in the complex case above. For example, the Gosset lattice $E_{8}$ has depth 2 and has real code formula $E_{8}=2 Z^{8}+(8,4,4)$ ; its dual under this alternative definition is still itself, because the $(8,4,4)$ code is self-dual. The Schläfli lattice $D_{4}$ has depth 1 and has real code formula $D_{4}=2 \boldsymbol{Z}^{4}+(4,3,2)$ ; its dual under this alternative definition is $R D_{4}$ , which has code formula $R D_{4}=$ , $2 \boldsymbol{Z}^{4}+(4,1,4)$ , since the $(4,3,2)$ and $(4,1,4)$ codes are duals.

III. Useful Lattices and Their Partitions

In this section we list the lattices that have proved useful in applications. These are primarily the sequence of Barnes-Wall lattices and their principal sublattices, all of which are closely interrelated. Another close relative is the Leech lattice, probably the most important lattice in lattice theory. We give the principal properties of these lattices, including their geometrical parameters and their partition properties. For further details, see part II.

The Barnes-Wall lattices are a family of $2^{n}$ -dimensional binary lattices. The $n$ th member of the family, which we denote by $\Lambda(0, n)$ , may be regarded as either a $2^{n}$ -dimensional complex $\boldsymbol{G}$ -lattice or a $2^{n+1}$ -dimensional real lattice. The first few members of the family are $\Lambda(0,0)=\boldsymbol{G} \simeq \boldsymbol{Z}^{2}$ (the Gaussian integer lattice or the two-dimensional real integer lattice), $\Lambda(0,1)=D_{4}$ (the Schläfli lattice), $\Lambda(0,2) =E_{8}$ (the Gosset lattice), $\Lambda(0,3)=\Lambda_{16}$ (the 16-dimensional Barnes-Wall lattice), and $\Lambda(0,4)=\Lambda_{32}$ (the 32dimensional Barnes-Wall lattice). These are the densest lattices known in 4,8 , and 16 dimensions (and, until recently, 32) [7].

The Barnes- Wall lattices are decomposable, with code formulas that involve the family of Reed-Muller codes. Recall that the Reed-Muller code $\mathrm{RM}(r, n), 0 \leq n, 0 \leq r \leq n$ , is a code of length $N=2^{n}$ , minimum distance $d_{H}=2^{n-r}$ , and with $K(r, n)=\sum_{0 \leq j \leq r} C_{n j}$ information bits, where $C_{n_{j}}$ is the combinatorial coefficient $(n!) /[(j!)((n-j)!)]$ ; further, the Reed-Muller codes of a given length are nested, in the sense that $\mathrm{RM}(n, n) / \mathrm{RM}(n-1, n) / \cdots / \mathrm{RM}(0, n)$ is a code partition chain.

The Barnes-Wall lattice $\Lambda(0, n)$ has depth $\mu$ equal to $n$ and may be defined as the complex $\boldsymbol{G}$ -lattice that has the code formula

\Lambda(0, n)=\phi^{n} \boldsymbol{G}^{N}+\phi^{n-1} \mathrm{RM}(n-1, n)+\cdots+\mathrm{RM}(0, n),

where $N=2^{n}$ . For example, as we have already seen, $\Lambda(0,0)=\boldsymbol{G}, \quad \Lambda(0,1)=D_{4}=\phi \boldsymbol{G}^{2}+(2,1,2), \quad \Lambda(0,2)=E_{8}= \phi^{2} \boldsymbol{G}^{4}+\phi(4,3,2)+(4,1,4)$ , and so forth. Thus the complex code formula involves all Reed-Muller codes of length $N=2^{n}$ (this construction is due to Cusack [19]). There are similar real code formulas involving alternate Reed-Muller codes of length $2 N$ , as we shall tabulate below.

From the properties of decomposable binary lattices and Reed-Muller codes, the redundancy and informativity of $\Lambda(0, n)$ are both equal to $n 2^{n-1}=n N / 2$ , the normalized redundancy and informativity are both equal to $n / 2$ , and the minimum squared distance $d_{\min }^{2}$ is equal to $2^{n}$ . Consequently, the fundamental coding gain is $\gamma=2^{n / 2}$ . Because $\mathrm{RM}(n-r-1, n)$ and $\mathrm{RM}(r, n)$ are dual codes, the dual of $\Lambda(0, n)$ has the same code formula as $\Lambda(0, n)$ itself, and $\Lambda(0, n)$ is self-dual for any $n$ .

The principal sublattices of the Barnes-Wall lattices are a family of lattices $\Lambda(r, n), 0 \leq n, 0 \leq r \leq n$ , which may be defined as decomposable $2^{n}$ -dimensional complex $\boldsymbol{G}$ lattices of depth $\mu=n-r$ with the code formulas

\begin{aligned} \Lambda(r, n)=\phi^{n-r} \boldsymbol{G}^{N}+\phi^{n-r-1} \mathrm{RM}(n-1, & n) \\ & +\cdots+\mathrm{RM}(r, n) \end{aligned}

where $N=2^{n}$ . Thus $\Lambda(n, n)$ is $G^{N}$ , or the integer lattice $\boldsymbol{Z}^{2 N}$ , and $\Lambda(0, n)$ is the Barnes-Wall lattice as previously defined. The lattice $\Lambda(n-1, n), n \geq 1$ , is the "checkerboard lattice" $D_{N}$ , with code formula $D_{N}=\phi \boldsymbol{G}^{N}+ \mathrm{RM}(n-1, n)=\phi \boldsymbol{G}^{N}+(N, N-1,2)$ , where $N=2^{n}$ and ( $N, N-1,2$ ) is the single-parity-check code of length $N$ .

In view of the general formula for the dual of a decomposable complex $\boldsymbol{G}$ -lattice and the duality properties of Reed-Muller codes, the duals of the principal sublattices are the decomposable $2^{n}$ -dimensional complex $G$ -lattices

TABLE I Useful Binary Lattices

( $r, n$ )	Λ	$2 N$	$\mu$	Real Code Formula	Complex Code Formula
$(0,0)$	$\boldsymbol{Z}^{\mathbf{2}}$	2	0	$\boldsymbol{Z}^{\mathbf{2}}$	G
$(1,1)$	$Z^{4}$	4	0	$Z^{4}$	$G^{2}$
$(0,1)$	$D_{4}$	4	1	$2 \boldsymbol{Z}^{4}+(4,3,2)$	$\phi \boldsymbol{G}^{2}+(2,1,2)$
$(2,2)$	$Z^{8}$	8	0	$Z^{8}$	$G^{4}$
$(1,2)$	$D_{8}$	8	1	$\mathbf{2} \boldsymbol{Z}^{8}+(8,7,2)$	$\phi \boldsymbol{G}^{4}+(4,3,2)$
$(0,2)$	$E_{8}$	8	2	$2 \boldsymbol{Z}^{8}+(8,4,4)$	$\phi^{2} \boldsymbol{G}^{4}+\phi(4,3,2)+(4,1,4)$
$(3,3)$	$Z^{16}$	16	0	$Z^{16}$	$G^{8}$
$(2,3)$	$D_{16}$	16	1	$2 \boldsymbol{Z}^{16}+(16,15,2)$	$\phi \boldsymbol{G}^{8}+(8,7,2)$
$(1,3)$	$H_{16}$	16	2	$2 \boldsymbol{Z}^{16}+(16,11,4)$	$\phi^{2} \boldsymbol{G}^{8}+\phi(8,7,2)+(8,4,4)$
$(0,3)$	$\Lambda_{16}$	16	3	$\boldsymbol{4} \boldsymbol{Z}^{16}+2(16,15,2)+(16,5,8)$	$\phi^{3} \boldsymbol{G}^{8}+\phi^{2}(8,7,2)+\phi(8,4,4)+(8,1,8)$
$(4,4)$	$Z^{32}$	32	0	$Z^{32}$	$G^{16}$
$(3,4)$	$D_{32}$	32	1	$2 \boldsymbol{Z}^{32}+(32,31,2)$	$\phi G^{16}+(16,15,2)$
$(2,4)$	$X_{32}$	32	2	$2 \boldsymbol{Z}^{32}+(32,26,4)$	$\phi^{2} \boldsymbol{G}^{16}+\phi(16,15,2)+(16,11,4)$
$(1,4)$	$H_{32}$	32	3	$\boldsymbol{4} \boldsymbol{Z}^{32}+2(32,31,2)+(32,16,8)$	$\phi^{3} \boldsymbol{G}^{16}+\phi^{2}(16,15,2)+\phi(16,11,4)$ + (16, 5, 8)
$(0,4)$	$\Lambda_{32}$	32	4	$4 Z^{32}+2(32,26,4)+(32,6,16)$

\begin{gathered} \phi^{4} G^{16}+\phi^{3}(16,15,2)+\phi^{2}(16,11,4) \\ +\phi(16,5,8)+(16,1,16) \end{gathered}

| | - | $Z^{24}$ | 24 | 0 | $\boldsymbol{Z}^{\mathbf{2 4}}$ | $G^{12}$ | | - | $D_{24}$ | 24 | 1 | $2 Z^{24}+(24,23,2)$ | $\phi G^{12}+(12,11,2)$ | | - | $X_{24}$ | 24 | 2 | $2 \boldsymbol{Z}{ }^{24}+(24,18,4)$ | $\phi^{2} \boldsymbol{G}^{12}+\phi(12,11,2)+(12,7,4)$ | | - | $H_{24}$ | 24 | 3 | $4 Z^{24}+2(24,23,2)+(24,12,8)$ | | | | | | | | $\phi^{3} G^{12}+\phi^{2}(12,11,2)+\phi(12,7,4) +(12,5,8)^{\prime}$ | | - | $\Lambda_{24}$ | 24 | 4 | $4 Z^{24}+2(24,18,4)+(24,6,16)^{\prime}$ |

\begin{gathered} \phi^{4} G^{12}+\phi^{3}(12,11,2)+\phi^{2}(12,7,4) \\ +\phi(12,5,8)^{\prime}+(12,1,16)^{\prime} \end{gathered}

of depth $\mu=n-r$ with the code formulas

\begin{aligned} \Lambda(r, n)^{\perp}=\phi^{n-r} \boldsymbol{G}^{N}+\phi^{n-r-1} \mathrm{RM}(n-r & -1, n) \\ & +\cdots+\mathrm{RM}(0, n) \end{aligned}

where $N=2^{n}$ .

#4 Undefined notation for the dual of a Barnes-Wall lattice

Thus $\Lambda(n, n)^{\perp}=\boldsymbol{G}^{N} \simeq \boldsymbol{Z}^{2 N}, \Lambda(0, n)^{\perp}= \Lambda(n)$ , and $\Lambda(n-1, n)^{\perp}, n \geq 1$ , is the dual $D_{N}^{\perp}$ of the checkerboard lattice $D_{N}$ , with code formula $D_{N}^{\perp}=\phi \boldsymbol{G}^{N}+ \mathrm{RM}(0, n)=\phi \boldsymbol{G}^{N}+(N, 1, N)$ , where $N=2^{n}$ and $(N, 1, N)$ is the repetition code of length $N$ .

Table I gives the real and complex code formulas of the Barnes-Wall lattices and their principal sublattices for up to 32 real dimensions ( 16 complex dimensions). In addition to the designations already introduced, we designate $\Lambda(1,3)$ as $H_{16}, \Lambda(1,4)$ as $H_{32}$ , and $\Lambda(2,4)$ as $X_{32}$ . For reference, we also give "code formulas" for the Leech lattice $\Lambda_{24}$ and its principal sublattices $H_{24}, X_{24}, D_{24}$ , and $Z^{24}$ , which are closely related to their 32 -dimensional relatives. ( $\Lambda_{24}$ and its principal sublattice $H_{24}$ are indecomposable complex binary lattices of depths 4 and 3, respectively; in these two cases a notation such as $(24,6,16)^{\prime}$ means the set of all binary linear combinations modulo 4 of a set of six generators whose coordinates are integers modulo 4 , such that the minimum nonzero norm in any coset with such a representative is 16 .)

Table II gives additional information on these lattices. The informativity $k(\Lambda)$ and the normalized informativity $\kappa(\Lambda)$ follow from the dimensionality of the Reed-Muller codes in the complex code formula, and from $k(\Lambda)$ (or from the code formulas) we can compute the redundancy $r(\Lambda)$ and the normalized redundancy $\rho(\Lambda)$ . For all of these lattices, the minimum squared distance $d_{\text {min }}^{2}(\Lambda)$ is equal to $2^{\mu}$ ; it follows that the fundamental coding gain is given by $\gamma(\Lambda)=2^{\mu-\rho}=2^{\kappa}$ . (This expression for the fundamental coding gain has the following interpretation: both $\Lambda$ and $\phi^{\mu} \boldsymbol{G}^{N}$ have the same minimum squared distance $2^{\mu}$ , but $\Lambda$ is the union of $2^{k(\Lambda)}$ cosets of $\phi^{\mu} G^{N}$ and is therefore $2^{k(\Lambda)}$ times as dense as $\phi^{\mu} G^{N}$ in $2 N$ -space. A constellation based on $\Lambda$ will therefore be a factor of $2^{k(\Lambda)}$ smaller in $2 N$ dimensions, or $2^{\kappa(\Lambda)}$ in two dimensions, which translates into power savings of a factor of $2^{\kappa(\mathrm{A})}$ .) We also give the normalized error coefficient $\tilde{N}_{0}= N_{0}(\Lambda) / N$ (normalized to two dimensions), which may be obtained from the lattice literature or from the weight distributions of Reed-Muller codes. Finally, we give the number $2^{\nu}$ of states in the trellis diagrams for $\Lambda$ that are derived in Part II, as well as the corresponding normalized decoding complexity $\tilde{N}_{D}=N_{D}(\Lambda) / N$ , per two dimensions, where $N_{D}(\Lambda)$ is the number of binary operations (additions or comparisons of two numbers) required to decode $\Lambda$ (find the closest element of $\Lambda$ to an arbitrary point $\boldsymbol{r}$ in $2 N$ -space), using the trellis-based algorithms of Part II. (Note that the constellation expansion factor $2^{\rho(\Lambda)}$ is almost a factor of two smaller for $H_{2 N}$ than for $\Lambda_{2 N}$ , while the fundamental coding gain is only slightly inferior (by a factor of $2^{-1 / N}$ ). The error coefficient and decoding complexity are also about a factor of two smaller. Therefore, the lattices $H_{2 N}$ may be attractive alternatives to the densest lattices $\Lambda_{2 N}$ in practical applications. In addition,

TABLE II Useful Binary Lattices

A	$k(\mathrm{~A})$	$\kappa(\Lambda)$	$r(\Lambda)$	$\rho(\Delta)$	$d_{\text {min }}^{2}(\Lambda)$	$\gamma(\Lambda)$	dB	$\tilde{N}_{0}$	$2^{\nu}$	$\tilde{N}_{D}$
$Z^{2}$	0	0	0	0	1	1	0.00	4	1	1
$Z^{4}$	0	0	0	0	1	1	0.00	4	1	1
$D_{4}$	1	1/2	1	1/2	2	$2^{1 / 2}$	1.51	12	2	3.5
$Z^{8}$	0	0	0	0	1	1	0.00	4	1	1
$D_{8}$	3	3/4	1	1/4	2	$2^{3 / 4}$	2.26	28	2	5.75
$E_{8}$	4	1	4	1	4	2	3.01	60	4	11.75
$Z^{16}$	0	0	0	0	1	1	0.00	4	1	1
$D_{16}$	7	7/8	1	1/8	2	$2^{7 / 8}$	2.63	60	2	$\sim 7$
$H_{16}$	11	11/8	5	5/8	4	$2^{11 / 8}$	4.14	284	8	~32
$\Lambda_{16}$	12	3/2	12	3/2	8	$2^{3 / 2}$	4.52	540	16	~ 64
$\boldsymbol{Z}{ }^{32}$	0	0	0	0	1	1	0.00	4	1	1
$D_{32}$	15	15/16	1	1/16	2	$2^{15 / 16}$	2.82	124	2	7.5
$X_{32}$	26	13/8	6	3/8	4	$2{ }^{13 / 8}$	4.89	1244	16	~76
$H_{32}$	31	31/16	17	17/16	8	$2^{31 / 16}$	5.83	5084	128	~792
$\Lambda_{32}$	32	2	32	2	16	4	6.02	9180	256	~1584
$Z^{24}$	0	0	0	0	1	1	0.00	4	1	1
$D_{24}$	11	11/12	1	1/12	2	$2^{11 / 12}$	2.76	92	2	7.25
$X_{24}$	18	3/2	6	1/2	4	$2^{3 / 2}$	4.52	508	8	~42
$H_{24}$	23	23/12	13	13/12	8	$2^{23 / 12}$	5.77	8188	128	~632
$\Lambda_{24}$	24	2	24	2	16	4	6.02	16380	256	~1264

the Leech half-lattice $H_{24}$ is decomposable as a real lattice, with the ( $24,12,8$ ) Golay code appearing in the code formula, whereas the Leech lattice $\Lambda_{24}$ itself is not.)

Because of the nested character of Reed-Muller codes, the code formulas show that $Z^{2 N} \simeq \Lambda(n, n) / \Lambda(n-1, n) / \cdots / \Lambda(0, n)$ is a partition chain of $2^{n}$ -dimensional complex lattices of depths $0 / 1 / \cdots / n$ and with distances $1 / 2 / \cdots / 2^{n}$ (for short). Also, we may verify that $\Lambda(0, n) / \phi \Lambda(1, n) / \cdots / \phi^{n} \Lambda(n, n)=\phi^{n} G^{N} \simeq R^{n} Z^{2 N}$ is a partition chain of $2^{n}$ -dimensional complex lattices of depths $n / n / \cdots / n$ and with distances $2^{n} / 2^{n} / \cdots / 2^{n}$ . Similarly, $\boldsymbol{Z}^{2 N} \simeq \Lambda(n, n)^{\perp} / \Lambda(n-1, n)^{\perp} / \cdots / \Lambda(0, n)^{\perp}=\Lambda(0, n)$ is a partition chain of $2^{n}$ -dimensional complex lattices of depths $0 / 1 / \cdots / n$ and with distances $1 / 2 / \cdots / 2^{n}$ , and $\Lambda(0, n)=\Lambda(0, n)^{\perp} / \phi \Lambda(1, n)^{\perp} / \cdots / \phi^{n} \Lambda(n, n)^{\perp}=\phi^{n} G^{N} \simeq R^{n} Z^{2 N}$ is a partition chain of $2^{n}$ -dimensional complex lattices of depths $n / n / \cdots / n$ and with distances $2^{n} / 2^{n} / \cdots / 2^{n}$ .

Fig. 9 is an illustration of these partition chains in dimensions $2,4,8$ , and 16 , extended indefinitely using the lattices $\phi^{j} \Lambda(r, n) \simeq R^{j} \Lambda(r, n)$ for all $j \geq 0$ . The lattices are arranged in columns according to depth, where for the purposes of this diagram we regard $\phi^{j} \Lambda(r, n) \simeq R^{j} \Lambda(r, n)$ as having depth $n-r$ for any $j \geq 0$ . One unit of vertical distance corresponds to a two-way partition. The lines indicate sublattice relationships. From this diagram, we can easily find the least $\mu$ for which $\phi^{\mu} G^{N} \simeq R^{\mu} Z^{2 N}$ is a sublattice of any given lattice, and thus verify the depths of lattices and partition chains.

In the rest of this paper, we will be considering coset codes $\mathbb{C}\left(\Lambda / \Lambda^{\prime} ; C\right)$ based on partitions $\Lambda / \Lambda^{\prime}$ of lattices that appear in Fig. 9. The partitions that we will use are generally those with $\Lambda^{\prime}$ at least as dense as $\Lambda$ , depths no greater than four, and orders no greater than $2^{12}$ . Table III summarizes some of the principal properties of such partitions, including: the (real) dimension $2 N$ ; the order $\left|\Lambda / \Lambda^{\prime}\right|$

Fig. 9. Partition chains involving Barnes-Wall lattices, principal sublattices, and duals of principal sublattices in $2,4,8$ , and 16 dimensions.

of the partition; its depth $\mu\left(\Lambda / \Lambda^{\prime}\right)=\mu\left(\Lambda^{\prime}\right)$ , normalized informativity $\kappa\left(\Lambda / \Lambda^{\prime}\right)=\kappa\left(\Lambda^{\prime}\right)$ , and normalized redundancy $\rho\left(\Lambda / \Lambda^{\prime}\right)=\rho(\Lambda)$ ; the minimum squared distances of $\Lambda$ and $\Lambda^{\prime}$ ; and the normalized (per two dimensions) complexity $\tilde{N}_{D}=N_{D} / N$ of decoding the partition, where $N_{D}$ is the number of binary operations required by the trellis-based decoding algorithms of part II to determine the closest element of each of the $\left|\Lambda / \Lambda^{\prime}\right|$ cosets of $\Lambda^{\prime}$ in the partition $\Lambda / \Lambda^{\prime}$ to an arbitrary point $r$ in $2 N$ -space. (Note: if $\Lambda / \Lambda^{\prime}$ is a partition, then so is $R \Lambda / R \Lambda^{\prime} \simeq \phi \Lambda / \phi \Lambda^{\prime}$ ; if it is simpler to decode $R \Lambda / R \Lambda^{\prime}$ than $\Lambda / \Lambda^{\prime}$ , the lesser $\tilde{N}_{D}$ is given.) The final column gives $\tilde{N}_{D} /\left|\Lambda / \Lambda^{\prime}\right|$ , to show that $\tilde{N}_{D}$ is approximated by $\alpha \mid \Lambda / \Lambda^{\prime}$ , where $\alpha$ is a small number in the range from one to six.

#10 Notation for a lattice in Table III

TABLE III Useful Lattice Partitions

Λ	$\Lambda^{\prime}$	$2 N$	$\left\\|\Lambda / \Lambda^{\prime}\right\\|$	$\mu$	$\kappa$	$\rho$	$d_{\text {min }}^{2}(\Lambda)$	$d_{\text {min }}^{2}\left(\Lambda^{\prime}\right)$	$\tilde{N}_{D}$	$\tilde{N}_{D} /\left\\|\Lambda / \Lambda^{\prime}\right\\|$
$Z^{2}$	$R Z^{2}$	2	2	1	0	0	1	2	2	1
$Z^{2}$	$2 Z^{2}$	2	4	2	0	0	1	4	4	1
$\boldsymbol{Z}^{\mathbf{2}}$	$2 R Z^{2}$	2	8	3	0	0	1	8	8	1
$\boldsymbol{Z}^{\mathbf{2}}$	$4 \boldsymbol{Z}^{\mathbf{2}}$	2	16	4	0	0	1	16	16	1
$Z^{4}$	$D_{4}$	4	2	1	1/2	0	1	2	5	2.5
$D_{4}$	$R D_{4}$	4	4	2	1/2	1/2	2	4	10	2.5
$Z^{4}$	$R D_{4}$	4	8	2	1/2	0	1	4	16	2
$D_{4}$	$2 D_{4}$	4	16	3	1/2	1/2	2	8	32	2
$Z^{4}$	$2 D_{4}$	4	32	3	1/2	0	1	8	56	1.75
$D_{4}$	$2 R D_{4}$	4	64	4	1/2	1/2	2	16	112	1.75
$Z^{4}$	$2 R D_{4}$	4	128	4	1/2	0	1	16	208	1.6
$Z^{8}$	$D_{8}$	8	2	1	3/4	0	1	2	6.5	3.25
$D_{8}$	$E_{8}$	8	8	2	1	1/4	2	4	30	3.75
$Z^{8}$	$E_{8}$	8	16	2	1	0	1	4	44	2.75
$E_{8}$	$R E_{8}$	8	16	3	1	1	4	8	60	3.75
$D_{8}^{1}$	$R E_{8}$	8	32	3	1	3/4	2	8	88	2.75

| $D_{8}$ | $R E_{8}$ | 8 | 128 | 3 | 1 | 1/4 | 2 | 8 | 280 | 2.2 | | $Z^{8}$ | $R E_{8}$ | 8 | 256 | 3 | 1 | 0 | 1 | 8 | 504 | $\sim 2$ | | $E_{8}$ | $2 E_{8}$ | 8 | 256 | 4 | 1 | 1 | 4 | 16 | 560 | 2.2 | | $D_{8}$ | $2 E_{8}$ | 8 | $2^{11}$ | 4 | 1 | 1/4 | 2 | 16 | 3792 | ~ 2 | | $Z^{8}$ | $2 E_{8}$ | 8 | $2^{12}$ | 4 | 1 | 0 | 1 | 16 | 7376 | -2 | | $Z^{16}$ | $D_{16}$ | 16 | 2 | 1 | 7/8 | 0 | 1 | 2 | 7.25 | 3.6 | | $D_{16}$ | $H_{16}$ | 16 | 16 | 2 | 11/8 | 1/8 | 2 | 4 | 74 | 4.6 | | $Z^{16}$ | $H_{16}$ | 16 | 32 | 2 | 11/8 | 0 | 1 | 4 | 104 | 3.25 | | $H_{16}$ | $\Lambda_{16}$ | 16 | 128 | 3 | 3/2 | 5/8 | 4 | 8 | 776 | $\sim 6$ | | $D_{16}$ | $\Lambda_{16}$ | 16 | $2^{11}$ | 3 | 3/2 | 1/8 | 2 | 8 | 8440 | $\sim 4$ | | $Z^{16}$ | $\Lambda_{16}$ | 16 | $2^{12}$ | 3 | 3/2 | 0 | 1 | 8 | 16376 | ~4 | | $\Lambda_{16}$ | $R \Lambda_{16}$ | 16 | 256 | 4 | 3/2 | 3/2 | 8 | 16 | 1552 | ~6 |

IV. Trellis Codes

A. Introduction to Trellis Codes

A trellis code is a coset code $\mathbb{C}\left(\Lambda / \Lambda^{\prime} ; C\right)$ as shown in Fig. 1, where $C$ is a rate- $k /(k+r)$ convolutional code. In this paper $C$ will always be a binary convolutional code, and $\Lambda$ and $\Lambda^{\prime}$ binary lattices, generally mod- 2 or mod- 4 .

The codewords in a rate- $k /(k+r)$ convolutional code may be expressed as sequences ( $\boldsymbol{a}_{t}, \boldsymbol{a}_{t+1}, \cdots$ ) of binary ( $k+r$ )-tuples $\boldsymbol{a}_{j}$ , which serve as labels that select cosets $\Lambda^{\prime}+\boldsymbol{c}\left(\boldsymbol{a}_{j}\right)$ of $\Lambda^{\prime}$ in the partition $\Lambda / \Lambda^{\prime}$ . The code sequences in a trellis code $\mathbb{C}\left(\Lambda / \Lambda^{\prime} ; C\right)$ therefore consist of the sequences of elements of $\Lambda$ that are congruent to some coset representative sequence ( $c\left(a_{t}\right), c\left(a_{t+1}\right), \cdots$ ) modulo $\Lambda^{\prime}$ , where ( $\boldsymbol{a}_{t}, \boldsymbol{a}_{t+1}, \cdots$ ) is a codeword in $C$ .

For technical reasons, all sequences ( $s_{t}, s_{t+1}, \cdots$ ) are assumed to have a definite starting time $t$ , although they may continue indefinitely. We may associate with any such sequence a formal power series in the delay operator $D$ ,

s(D)=s_{t} D^{t}+s_{t+1} D^{t+1}+\cdots

where $t$ may be any integer, positive or negative. Thus a coset code maps a label sequence $\boldsymbol{a}(D)$ to a coset representative sequence $\boldsymbol{c}(D)$ .

The important properties of a convolutional code are linearity and time-invariance. Linearity means that the mod-2 sum of any two codewords is a codeword. Timeinvariance means that the time shift of any codeword is a codeword, i.e., if $\boldsymbol{a}(D)$ is a codeword, then so is $D \boldsymbol{a}(D)$ . (It follows that a convolutional code is a vector space over the field of binary formal power series $f(D), f_{t} \in\{0,1\}$ ; the dimension of this vector space is $k$ , and any codeword $\boldsymbol{a}(D)$ can be written as $\boldsymbol{a}(D)=\Sigma f_{j}(D) \boldsymbol{g}_{j}(D)$ , where the $\boldsymbol{g}_{j}(D), 1 \leq j \leq k$ , are a set of $k$ generator sequences that form a generator matrix $G$ ; see [20].) We assume that the labeling function $\boldsymbol{c}\left(\boldsymbol{a}_{t}\right)$ is time-invariant; then a trellis code is also time-invariant, although, as we shall see in more detail below, not necessarily linear.

A convolutional code $C$ has a well-defined state space, which is a vector space over the binary field of some dimension $\nu$ . The parameter $\nu$ is called the overall constraint length, or just constraint length, of $C$ . The code $C$ can be generated by a linear (binary) finite-state machine with $k$ inputs, $k+r$ outputs, and $\nu$ binary memory elements; such an encoder has $2^{\nu}$ states.

A trellis diagram for a $2^{\nu}$ -state, rate- $k /(k+r)$ convolutional code is an extended state transition diagram for the encoder that generates $C$ . For each time $t$ , it has $2^{\nu}$ nodes, or states, representing the possible states at time $t$ . For each possible state transition, it has a branch connecting the two corresponding nodes. There are $2^{k}$ branches leaving and entering each node, and each is labeled with the ( $k+r$ )-tuple $\boldsymbol{a}$ that represents the encoder output associated with that state transition. Thus we may obtain a trellis diagram for a trellis code $\mathbb{C}\left(\Lambda / \Lambda^{\prime} ; C\right)$ by taking a trellis diagram for $C$ and replacing each label $\boldsymbol{a}$ by the corresponding coset representative $\boldsymbol{c}(\boldsymbol{a})$ , representing the coset $\Lambda^{\prime}+c(a)$ .

The minimum Hamming distance $d_{H}(C)$ of a convolutional code $C$ is the minimum Hamming distance between any two codewords in $C$ , i.e., the minimum number of coordinate differences between the outputs $\boldsymbol{a}$ on any two paths in the trellis that start and end in a common state. Because $C$ is linear, this is also the minimum Hamming weight of any codeword, which is the minimum weight of any path that starts and ends on a zero state.

The minimum squared distance $d_{\text {min }}^{2}(\mathbb{C})$ of a trellis code $\mathbb{C}\left(\Lambda / \Lambda^{\prime} ; C\right)$ is the minimum squared distance between any two code sequences in $\mathbb{C}$ , which is the lesser of a) the minimum distance $\left\|\lambda_{1}(D)-\lambda_{2}(D)\right\|^{2}$ between sequences $\lambda_{1}(D)$ and $\lambda_{2}(D)$ that correspond to two distinct paths that begin and end in a common state; and b) the minimum distance $d_{\min }^{2}\left(\Lambda^{\prime}\right)$ between elements of $\Lambda^{\prime}$ , corresponding to "parallel transitions" associated with any given branch. (If $\lambda(D)$ is a code sequence, then so is $\lambda(D)+\lambda^{\prime} D^{j}$ for any $\lambda^{\prime} \in \Lambda^{\prime}$ .)

#15 Description of the Ungerboeck encoder in Sec. IV.A

For example, the four-state Ungerboeck code shown in Figs. 2 and 3 uses the four-state rate-1/2 convolutional code whose encoder and trellis diagram are illustrated in Fig. 10. Contrary to convention, the encoder is shown in coset code form, using the partition ( $2,2,1$ )/ $(2,1,2) /(2,0, \infty)$ of binary codes of length 2 . Let $\boldsymbol{g}_{0}$ be a coset representative for the nonzero coset in the partition $(2,2) /(2,1)$ , e.g., $\boldsymbol{g}_{0}=[10]$ , and let $\boldsymbol{g}_{1}$ be the coset representative for the nonzero coset in the partition $(2,1) /(2,0)$ , i.e., $\boldsymbol{g}_{1}=[11]$ .

The two bits $\boldsymbol{a}=\left(\boldsymbol{a}_{0}, \boldsymbol{a}_{1}\right)$ select a 2 -tuple $\boldsymbol{c}(\boldsymbol{a})=a_{0} \boldsymbol{g}_{0}+a_{1} \boldsymbol{g}_{1}$ , representing one of the four cosets of $(2,0)$ (the single codeword [00]) in the four-way partition $(2,2) /(2,0)$ . In the trellis diagram, branches are labeled by both $\boldsymbol{a}$ and $\boldsymbol{c}(\boldsymbol{a})$ .

Fig. 10. Convolutional encoder $C$ for four-state Ungerboeck code of Figs. 2 and 3, and trellis diagram labeled with both $a$ and $c(a)$ .

The minimum Hamming distance of this code (taking the outputs as $\boldsymbol{c}(\boldsymbol{a}))$ is five, because the distance between distinct paths is at least two where they diverge, two where they merge, and one somewhere in between. (This is because the difference between paths is a codeword of the ( $2,1,2$ ) code where they merge and diverge, and a codeword in the $(2,2,1)$ code somewhere in between; that is, we are exploiting the Ungerboeck distance bound for this code partition chain.)

The trellis code of Figs. 2 and 3 is obtained by replacing the code partition chain $(2,2) /(2,1) /(2,0)$ by the corresponding partition of $\bmod -2$ lattices, $Z^{2} / R Z^{2} / 2 Z^{2} \simeq \boldsymbol{G} / \boldsymbol{\phi} \boldsymbol{G} / \boldsymbol{\phi}^{2} \boldsymbol{G}$ . Note that $\boldsymbol{g}_{0}$ and $\boldsymbol{g}_{1}$ are still coset representatives for the nonzero cosets of $R \boldsymbol{Z}^{2}$ in the partition $\boldsymbol{Z}^{2} / R \boldsymbol{Z}^{2} \simeq \boldsymbol{G} / \phi \boldsymbol{G}$ and of $2 \boldsymbol{Z}^{2}$ in the partition $R \boldsymbol{Z}^{2} / 2 \boldsymbol{Z}^{2} \simeq \phi \boldsymbol{G} / \phi^{2} \boldsymbol{G}$ , respectively, if we regard them as integers modulo 2. The trellis diagram of Fig. 10 then continues to represent this trellis code, where we now regard the 2tuples $\boldsymbol{c}(\boldsymbol{a})$ as coset representatives of cosets of $2 \boldsymbol{Z}^{2}$ .

It is easy to see that the minimum squared distance between code sequences corresponding to distinct paths in the trellis is the minimum Hamming distance between sequences $c(D)$ of coset representatives and thus is equal to $d_{H}(C)=5$ . However, since $d_{\min }^{2}\left(2 \boldsymbol{Z}^{2}\right)=4$ , the minimum squared distance of the trellis code is $d_{\text {min }}^{2}(\mathbb{C})=4$ . If $\lambda(D)$ is any code sequence, the only code sequences at distance 4 from $\lambda(D)$ are the sequences $\lambda(D)+\lambda^{\prime} D^{j}$ , where $\lambda^{\prime}$ is one of the four elements of $2 \boldsymbol{Z}^{2}$ of norm 4 , namely $\pm(2,0)$ and $\pm(0,2)$ . These are special cases of general results for trellis codes based on partitions of mod-2 lattices that will be given below.

B. Geometrical Parameters

As with lattices, the two principal geometrical parameters of a trellis code are the minimum squared distance $d_{\text {min }}^{2}(\mathbb{C})$ between its code sequences and its fundamental volume $V(\mathbb{C})$ (per $N$ dimensions); these determine its fundamental coding gain $\gamma(\mathbb{C})$ .

We have already introduced $d_{\text {min }}^{2}(\mathbb{C})$ and noted that $d_{\text {min }}^{2}(\mathbb{C})=\min \left[d_{0}^{2}, d_{\text {min }}^{2}\left(\Lambda^{\prime}\right)\right]$ , where $d_{0}^{2}$ is the minimum squared distance $\left\|\lambda_{1}(D)-\lambda_{2}(D)\right\|^{2}$ between code sequences $\lambda_{1}(D)$ and $\lambda_{2}(D)$ that correspond to two distinct paths that begin and end in a common state in the code trellis. If, as with convolutional codes, the distribution of distances from a given code sequence to all other code sequences does not depend on the given code sequence, then $d_{\text {min }}^{2}(\mathbb{C})$ is the minimum squared distance from the all-zero code sequence to any other code sequence, i.e., the minimum norm of any code sequence. We call such codes distance-invariant. All codes in this paper are distanceinvariant.

The error coefficient $N_{0}(\mathbb{C})$ is the average number of code sequences that differ from a given sequence by $d_{\text {min }}^{2}(\mathbb{C})$ and that first differ from the given sequence at a given time $t$ . By time-invariance, $N_{0}(\mathbb{C})$ does not depend on the time $t$ , and we may take $t=0$ , say. If the code is distanceinvariant, then the number of code sequences that differ from a given sequence by $d_{\text {min }}^{2}(\mathbb{C})$ does not depend on the given sequence, and we may take the given sequence as the all-zero sequence. Thus in a distance-invariant code $N_{0}(\mathbb{C})$ is the number of sequences of norm $d_{\text {min }}^{2}(\mathbb{C})$ that start at time zero.

The fundamental volume is a trickier concept. Intuitively, since the code $C$ conveys $k$ bits of information per unit time (per $N$ dimensions) and has $r$ bits of redundancy, it is clear that in some sense the trellis code $\mathbb{C}$ is $2^{k}$ times as dense as $\Lambda^{\prime}$ and a factor of $2^{r}$ less dense than $\Lambda$ , per $N$ dimensions. Therefore, the fundamental volume $V(\mathbb{C})$ should be equal to $2^{-k} V\left(\Lambda^{\prime}\right)$ , or to $2^{r} V(\Lambda)$ .

To substantiate this proposition, we argue as follows. Define $\mathbb{C}_{t}$ as the set of all code sequences that start at time $t$ or later. By time-invariance, all such sets are isomorphic to each other and to a particular such set, say $\mathbb{C}_{0}$ . How- ever, $\mathbb{C}_{t}$ is also a proper subset of $\mathbb{C}_{t^{\prime}}$ if $t>t^{\prime}$ ; e.g., $\mathbb{C}_{1}$ is a proper subset of $\mathbb{C}_{0}$ .

Define two code sequences as equivalent modulo $\mathbb{C}_{1}$ if their first difference is at time $t$ or later. Two sequences in $\mathbb{C}_{0}$ are then equivalent modulo $\mathbb{C}_{1}$ if and only if their first element $c_{0}$ is the same. Let $\Lambda_{0}$ then be the set of all possible first elements $c_{0}$ ; that is, $\Lambda_{0}=\left\{\lambda: \lambda \in \Lambda^{\prime}+\right. \boldsymbol{c}\left(\boldsymbol{a}_{0}\right)$ }, where $\boldsymbol{a}_{0}$ is a possible time-zero output from the encoder $C$ given that all previous outputs were zero or, equivalently, given that the encoder starts in the zero state. There are $2^{k}$ such $a_{0}$ , and thus $\Lambda_{0}$ is the union of $2^{k}$ cosets of $\Lambda^{\prime}$ . Ordinarily, $\Lambda_{0}$ is a lattice, which we call the time-zero lattice. By Lemma $1, V\left(\Lambda_{0}\right)=2^{-k} V\left(\Lambda^{\prime}\right)$ ; since $V\left(\Lambda^{\prime}\right)=2^{k+r} V(\Lambda)$ , we also have $V\left(\Lambda_{0}\right)=2^{r} V(\Lambda)$ .

In an appropriate sense, therefore, the equivalence classes of $\mathbb{C}_{0}$ modulo $\mathbb{C}_{1}$ , which we may write as $\mathbb{C}_{0} / \mathbb{C}_{1}$ , are isomorphic to the time-zero lattice $\Lambda_{0}$ . The set $\mathbb{C}_{0}$ has the decomposition

\mathbb{C}_{0}=\mathbb{C}_{0} / \mathbb{C}_{1}+\mathbb{C}_{1} / \mathbb{C}_{2}+\cdots

which by time-invariance is isomorphic to the Cartesian product $\Lambda_{0} \times \Lambda_{0} \times \cdots$ . In other words, $\mathbb{C}_{0}$ fills space as densely as does $\Lambda_{0} \times \Lambda_{0} \times \cdots$ . Thus it is reasonable to define the fundamental volume of $\mathbb{C}$ per $N$ dimensions as $V(\mathbb{C})=V\left(\Lambda_{0}\right)$ .

Now we may define the fundamental coding gain of a trellis code in the same way as we did for a lattice:

\gamma(\mathbb{C})=d_{\min }^{2}(\mathbb{C}) / V(\mathbb{C})^{2 / N} .

Let us now write $k(C)$ and $r(C)$ for the parameters $k$ and $r$ of a convolutional code $C$ , and $\kappa(C)$ and $\rho(C)$ for the normalized informativity and redundancy $\kappa(C)= 2 k(C) / N$ and $\rho(C)=2 r(C) / N$ , respectively. Since $V(\mathbb{C}) =V\left(\Lambda_{0}\right)=2^{-k(C)} V\left(\Lambda^{\prime}\right)=2^{r(C)} V(\Lambda)$ , we also have the expressions

\begin{aligned} \gamma(\mathbb{C}) & =2^{\kappa(C)} d_{\min }^{2}(\mathbb{C}) / V\left(\Lambda^{\prime}\right)^{2 / N} \\ & =2^{\kappa(C)}\left[d_{\min }^{2}(\mathbb{C}) / d_{\min }^{2}\left(\Lambda^{\prime}\right)\right] \gamma\left(\Lambda^{\prime}\right) \\ & =2^{\kappa(\mathbb{C})}\left[d_{\min }^{2}(\mathbb{C}) / 2^{\mu(\mathbb{C})}\right] \end{aligned}

where $\kappa(\mathbb{C})=\kappa(C)+\kappa\left(\Lambda^{\prime}\right)$ and $\mu(\mathbb{C})=\mu\left(\Lambda^{\prime}\right)$ . Also,

\begin{aligned} \gamma(\mathbb{C}) & =2^{-\rho(C)} d_{\min }^{2}(\mathbb{C}) / V(\Lambda)^{2 / N} \\ & =2^{-\rho(C)}\left[d_{\min }^{2}(\mathbb{C}) / d_{\min }^{2}(\Lambda)\right] \gamma(\Lambda) \\ & =2^{-\rho(\mathbb{C})} d_{\min }^{2}(\mathbb{C}) \end{aligned}

where $\rho(\mathbb{C})=\rho(C)+\rho(\Lambda)$ . Thus if we define the normalized redundancy, informativity, and depth of the trellis code $\mathbb{C}$ as the sums of the corresponding quantities for the code $C$ and partition $\Lambda / \Lambda^{\prime}$ , where we regard the depth of $C$ as 0 , then we get expressions analogous to those that we obtained for lattices.

The following lemma is both useful in itself and also gives an intuitive explanation of these formulas.

Lemma 5: If $\mathbb{C}\left(\Lambda / \Lambda^{\prime} ; C\right)$ is a trellis code based on a partition $\Lambda / \Lambda^{\prime}$ of binary lattices, where the depth of $\Lambda^{\prime}$ is $\mu$ , and a $2^{\nu}$ -state, rate- $k /(k+r)$ convolutional code $C$ , then there is an equivalent trellis code $\mathbb{C}\left(\boldsymbol{G}^{N} / \phi^{\mu} \boldsymbol{G}^{N^{\prime}} ; C^{\prime}\right)$ based on the partition $\boldsymbol{G}^{N} / \phi^{\mu} \boldsymbol{G}^{N}$ , where $C^{\prime}$ is a $2^{\nu}$ -state, rate- $\left[k+k\left(\Lambda^{\prime}\right)\right] / N \mu$ convolutional code, and $N$ is the dimension of $\Lambda$ or $\Lambda^{\prime}$ as complex lattices.

Proof: If $\Lambda^{\prime}$ is a binary lattice of depth $\mu$ , then $\boldsymbol{G}^{N} / \Lambda / \Lambda^{\prime} / \phi^{\mu} \boldsymbol{G}^{N}$ is a partition chain, with $\left|\boldsymbol{G}^{N} / \Lambda\right|=2^{r(\Lambda)}$ , $\left|\Lambda / \Lambda^{\prime}\right|=2^{k+r},\left|\Lambda^{\prime} / \phi^{\mu} \boldsymbol{G}^{N}\right|=2^{k\left(\Lambda^{\prime}\right)}$ , and $\left|\boldsymbol{G}^{N} / \phi^{\mu} \boldsymbol{G}^{N}\right|=2^{N_{\mu}}$ . In view of the coset decomposition $\boldsymbol{G}^{N}=\phi^{\mu} \boldsymbol{G}^{N}+ \left[\Lambda^{\prime} / \phi^{\mu} \boldsymbol{G}^{N}\right]+\left[\Lambda / \Lambda^{\prime}\right]+\left[\boldsymbol{G}^{N} / \Lambda\right]$ , we may select a coset of $\phi^{\mu} \boldsymbol{G}^{\boldsymbol{N}}$ in the partition $\boldsymbol{G}^{\boldsymbol{N}} / \phi^{\mu} \boldsymbol{G}^{\boldsymbol{N}}$ by the following set of $N \mu$ bits: an all-zero $r(\Lambda)$ -tuple $\mathbf{0}$ , which selects the zero coset of $\Lambda$ in the partition $G^{N} / \Lambda$ , namely, $\Lambda$ itself; a $(k+r)$ -tuple $\boldsymbol{a}$ , which selects the coset $\Lambda^{\prime}+\boldsymbol{c}(\boldsymbol{a})$ of $\Lambda^{\prime}$ in the partition $\Lambda / \Lambda^{\prime}$ as in the original trellis code $\mathbb{C}\left(\Lambda / \Lambda^{\prime} ; C\right)$ ; and finally, a $k\left(\Lambda^{\prime}\right)$ -tuple $a^{\prime}$ of "uncoded bits" which selects one of the $2^{k\left(\Lambda^{\prime}\right)}$ cosets of $\phi^{\mu} G^{N}$ whose union is $\Lambda^{\prime}$ . These $N \mu$ bits can be regarded as the outputs of an augmented convolutional encoder $C^{\prime}$ , which has $k+k\left(\Lambda^{\prime}\right)$ information bits, $r+r(\Lambda)$ redundant bits, and the same number of states as the original encoder for $C$ , as illustrated in Fig. 11. The set of code sequences that may be generated by this augmented encoder are the same as those in the original code $\mathbb{C}\left(\Lambda / \Lambda^{\prime} ; C\right)$ .

Fig. 11. Augmented encoder $C^{\prime}$ of Lemma 5.

An alternative form of Lemma 5 is as follows. If the 2-depth of $\Lambda^{\prime}$ is $m$ , then $\mathbb{C}\left(\Lambda / \Lambda^{\prime} ; C\right)$ is equivalent to a code $\mathbb{C}\left(\boldsymbol{Z}^{N} / 2^{m} \boldsymbol{Z}^{N^{\prime}} ; C^{\prime}\right)$ based on the partition $\boldsymbol{Z}^{N} / 2^{m} \boldsymbol{Z}^{N}$ , where $C^{\prime}$ is a $2^{\nu}$ -state, rate- $[N m-r-r(\Lambda)] / N m$ convolutional code, and $N$ is the dimension of $\Lambda$ or $\Lambda^{\prime}$ as real lattices. The proof is essentially the same. Indeed, if $\mu$ is even, $\mu=2 m$ , then the partition $\boldsymbol{Z}^{2 N} / \Lambda / \Lambda^{\prime} / 2^{m} \boldsymbol{Z}^{2 N}$ is the same as $\boldsymbol{G}^{N} / \Lambda / \Lambda^{\prime} / \boldsymbol{\phi}^{\mu} \boldsymbol{G}^{N}$ , and the augmented encoder $C^{\prime}$ is the same; if $\mu$ is odd, $\mu=2 m-1$ , then the partition $\boldsymbol{Z}^{2 N} / \Lambda / \Lambda^{\prime} / 2^{m} \boldsymbol{Z}^{2 N}$ is an extension of $\boldsymbol{G}^{N} / \Lambda / \Lambda^{\prime} / \phi^{\mu} \boldsymbol{G}^{N}$ by the partition $\phi^{\mu} \boldsymbol{G}^{N} / \phi^{\mu+1} \boldsymbol{G}^{N}$ , and the augmented encoder $C^{\prime}$ just uses $N$ more uncoded bits.

The coding gain $\gamma(\mathbb{C})$ may now be related to that of the lattices $\boldsymbol{G}^{N}, \Lambda, \Lambda^{\prime}$ , and $\phi^{\mu} \boldsymbol{G}^{N}$ as follows. Relative to $\gamma\left(\Lambda^{\prime}\right)=2^{\kappa\left(\Lambda^{\prime}\right)}\left[d_{\text {min }}^{2}\left(\Lambda^{\prime}\right) / 2^{\mu\left(\Lambda^{\prime}\right)}\right]$ , the gain $\gamma(\mathbb{C})$ is greater by a factor of $2^{\kappa(C)}$ due to the fact that $\mathbb{C}$ conveys $\kappa(C)$ more bits of information per two dimensions, offset by a distance loss factor of $d_{\text {min }}^{2}(\mathbb{C}) / d_{\text {min }}^{2}\left(\Lambda^{\prime}\right)$ (if any).

#16 Incorrect formula for lattice coding gain

Relative to $\gamma(\Lambda)=2^{-\rho(\Lambda)} d_{\text {min }}^{2}\left(\Lambda^{\prime}\right)$ , the gain $\gamma(\mathbb{C})$ is greater by the distance gain factor of $d_{\text {min }}^{2}(\mathbb{C}) / d_{\text {min }}^{2}(\Lambda)$ , offset by a power loss of $2^{-\rho(C)}$ due to constellation expansion.

If $\Lambda^{\prime}=\phi^{\mu} \boldsymbol{G}^{N}$ , then $\gamma\left(\Lambda^{\prime}\right)=1$ and $d_{\text {min }}^{2}\left(\Lambda^{\prime}\right)=2^{\mu\left(\Lambda^{\prime}\right)}$ , so $\gamma(\mathbb{C})$ is simply $2^{\kappa(\mathbb{C})}$ , offset by a distance loss factor of $d_{\text {min }}^{2}(\mathbb{C}) / 2^{\mu\left(\Lambda^{\prime}\right)}$ (if any). If $\Lambda=\boldsymbol{G}^{N}$ , then $\gamma(\Lambda)=1$ and $d_{\text {min }}^{2}(\Lambda)=1$ , so $\gamma(\mathbb{C})$ is simply $d_{\text {min }}^{2}(\mathbb{C})$ , offset by a power loss of $2^{-\rho(\mathbb{C})}$ due to constellation expansion.

This last expression is the simplest and shows that we need to know only the minimum squared distance $d_{\text {min }}^{2}(\mathbb{C})$ and the normalized redundancy $\rho(\mathbb{C})$ to compute the fundamental coding gain $\gamma(\mathbb{C})=2^{-\rho(\mathbb{C})} d_{\text {min }}^{2}(\mathbb{C})$ , where the normalized redundancy is simply the sum of the normalized redundancies of the code $C$ and the lattice $\Lambda$ . If $\Lambda=\boldsymbol{G}^{N} \simeq \boldsymbol{Z}^{2 N}$ , then it suffices to know the normalized redundancy $r / N$ of the code $C$ . A small normalized redundancy $\rho(\mathbb{C})$ is thus desirable to both minimize constellation expansion and maximize coding gain for a given $d_{\text {min }}^{2}(\mathbb{C})$ , as was recognized by Wei [11].

Lemma 5 shows that all trellis codes based on partitions of binary lattices are equivalent to trellis codes based on partitions $\boldsymbol{G}^{N} / \phi^{\mu} \boldsymbol{G}^{N}$ , or, by extension, partitions $\boldsymbol{Z}^{N} / 2^{m} \boldsymbol{Z}^{N}$ of the integer lattice $\boldsymbol{Z}^{N}$ , so that in principle only these kinds of partitions need to be considered to discover all binary trellis codes. In practice, consideration of more general partitions $\Lambda / \Lambda^{\prime}$ both facilitates the search for good codes and simplifies their encoding and decoding.

C. Linear Trellis Codes

In general, if $c_{1}$ and $c_{2}$ are two code sequences in a trellis code $\mathbb{C}$ , it is not necessarily true that $\boldsymbol{c}_{1}+\boldsymbol{c}_{2}$ and $c_{1}-c_{2}$ are also code sequences. When this property does hold, we say that $\mathbb{C}$ is a linear trellis code. In this section we give some examples of linear trellis codes, including the important case where $\Lambda / \Lambda^{\prime}$ is a partition of mod-2 binary lattices.

If a trellis code is linear, then it is a lattice, albeit an infinite-dimensional lattice. Because it is a time-invariant infinite-dimensional lattice, it is usually possible to define its parameters on a per-unit-time or per-two-dimensions basis, so that they are perfectly finite and analogous to the parameters of a finite-dimensional lattice, as we have already seen in the previous section. In this sense, linear trellis codes are to finite-dimensional lattices as convolutional codes are to block codes. (To extend this analogy, nonlinear trellis codes are a generalization of nonlattice finite-dimensional sphere packings.)

A trivial example of a linear trellis code is the repeated use of a lattice code. If $\Lambda$ is any lattice, let $\Lambda^{\infty}$ be defined as the set of all sequences ( $\lambda_{t}, \lambda_{t+1}, \cdots$ ), where $\lambda_{t} \in \Lambda$ , $j \geq t$ . This may be regarded as a trellis code based on the "dummy partition" $\Lambda / \Lambda$ where the convolutional encoder $C$ disappears. The trellis associated with $\Lambda^{\infty}$ has one state for each time $t$ , and the branch in each time interval is labeled by $\Lambda$ .

If $\mathbb{C}\left(\Lambda / \Lambda^{\prime} ; C\right)$ is a linear trellis code, it is a sublattice of $\Lambda^{\infty}$ and has $\left(\Lambda^{\prime}\right)^{\infty}$ as a sublattice, so $\Lambda^{\infty} / \mathbb{C} /\left(\Lambda^{\prime}\right)^{\infty}$ is a partition chain. If the partition $\Lambda / \Lambda^{\prime}$ has depth $\mu$ , then $\left(\boldsymbol{G}^{N}\right)^{\infty} / \mathbb{C} /\left(\phi^{\mu} \boldsymbol{G}^{N}\right)^{\infty}$ is a partition chain; if it has 2 -depth $m$ , then $\left(\boldsymbol{Z}^{N}\right)^{\infty} / \mathbb{C} /\left(2^{m} \boldsymbol{Z}^{N}\right)^{\infty}$ is a partition chain.

A trellis code $\mathbb{C}$ has little chance of being linear unless the mapping $c(a)$ from encoder output ( $k+r$ )-tuples (labels) $\boldsymbol{a}$ to coset representatives $\boldsymbol{c}$ is linear modulo $\Lambda^{\prime}$ , as in Lemma 2; i.e., $\boldsymbol{c}(\boldsymbol{a})=\boldsymbol{a} G$ , where $G=\left\{\boldsymbol{g}_{j}, 1 \leq j \leq k+r\right\}$ is a generator matrix of $k+r$ vectors of $\Lambda$ that span $\Lambda$ , modulo $\Lambda^{\prime}$ . The following lemma shows that when $\Lambda / \Lambda^{\prime}$ is a partition of mod-2 lattices and the labeling map is linear, $\mathbb{C}$ is linear, and indeed isomorphic to a binary convolutional code, in the same sense as a mod-2 binary lattice $\Lambda$ is isomorphic to a binary block code, described in Lemma 3 (recall that if $\boldsymbol{c}(\boldsymbol{a})$ is an Ungerboeck labeling, it is linear modulo $\Lambda^{\prime}$ ).

#17 Incomplete premise in Lemma 6

Lemma 6: If $\Lambda^{\prime}$ is a mod-2 lattice, $C$ is a $2^{\nu}$ -state, rate- $k /(k+r)$ convolutional code and the labeling map $\boldsymbol{c}(\boldsymbol{a})$ is linear modulo $\Lambda^{\prime}$ , then a trellis code $\mathbb{C}\left(\Lambda / \Lambda^{\prime} ; C\right)$ is the set of all sequences of integer $N$ -tuples that are congruent modulo 2 to one of the words in a $2^{\nu}$ -state rate- $[N-r(\mathbb{C})] / N$ convolutional code $C^{\prime}$ .

The redundancy of $\mathbb{C}$ is $r(\mathbb{C})=r+r(\Lambda)$ , and its minimum squared distance is $d_{\text {min }}^{2}(\mathbb{C})=\min \left[4, d_{H}\left(C^{\prime}\right)\right]$ .

Sketch of proof: By the extension of Lemma 5, $\mathbb{C}$ is equivalent to a code based on the partition $\boldsymbol{Z}^{N} / 2 \boldsymbol{Z}^{N}$ , where the augmented encoder $C^{\prime}$ has $N$ output bits and redundancy $r\left(C^{\prime}\right)=r+r(\Lambda)$ . In the augmented encoder, the ( $k+r$ )-tuple $\boldsymbol{a}$ and the uncoded bits $\boldsymbol{a}^{\prime}$ specify a coset of $2 \boldsymbol{Z}^{N}$ in the partition $\boldsymbol{Z}^{N} / 2 \boldsymbol{Z}^{N}$ , which may be specified by a binary $N$ -tuple $\boldsymbol{c}^{\prime}\left(\boldsymbol{a}, \boldsymbol{a}^{\prime}\right)$ . The mapping $\boldsymbol{c}^{\prime}\left(\boldsymbol{a}, \boldsymbol{a}^{\prime}\right)$ may be taken to be linear $\bmod 2$ if $\boldsymbol{c}(\boldsymbol{a})$ is linear modulo $\Lambda^{\prime}$ . Thus $\mathbb{C}$ is set of all sequences of integer $N$ -tuples that are congruent modulo 2 to one of the words in the convolutional code $C^{\prime}$ . The minimum squared distance between code sequences corresponding to distinct codewords of $C^{\prime}$ is $d_{H}\left(C^{\prime}\right)$ , and $d_{\min }^{2}\left(2 \boldsymbol{Z}^{N}\right)=4$ .

The four-state Ungerboeck code shown in Figs. 2, 3, and 10 is an example of a code of this type. The encoder of Fig. 10 is of the form of the augmented encoder of Fig. 11. Many of the important known codes to be listed in the next section, including Gallager-Calderbank-Sloane (GCS)-type codes and most of the Wei codes, are of this type.

Even when a code is not linear, it may still be regular in the sense of Calderbank and Sloane [13]. A labeling $\boldsymbol{c}(\boldsymbol{a})$ is defined as regular if the minimum squared distance between points in two cosets $\Lambda^{\prime}+\boldsymbol{c}(\boldsymbol{a})$ and $\Lambda^{\prime}+\boldsymbol{c}\left(\boldsymbol{a}^{\prime}\right)$ is a function only of the mod-2 sum $\boldsymbol{a} \oplus \boldsymbol{a}^{\prime}$ of their labels or, equivalently, if the minimum norm in the coset $\Lambda^{\prime}+c(a) -\boldsymbol{c}\left(\boldsymbol{a}^{\prime}\right)$ is equal to the minimum norm in the coset $\Lambda^{\prime}+ \boldsymbol{c}\left(\boldsymbol{a} \oplus \boldsymbol{a}^{\prime}\right)$ . For example, the 2-bit standard binary representation is a regular labeling of the four cosets in the partition $\boldsymbol{Z} / \mathbf{4 Z}$ . If $\mathbb{C}\left(\Lambda / \Lambda^{\prime} ; C\right)$ is a code based on a partition $\Lambda / \Lambda^{\prime}$ with a regular labeling, the minimum squared distance between code sequences in the coset sequences $\boldsymbol{c}(\boldsymbol{a}(D))$ and $\boldsymbol{c}\left(\boldsymbol{a}^{\prime}(D)\right)$ corresponding to codewords $\boldsymbol{a}(D)$ and $\boldsymbol{a}^{\prime}(D)$ is then equal to the minimum norm of any code sequence in the coset sequence $c(a(D) \oplus \boldsymbol{a}^{\prime}(D)$ ) corresponding to the codeword $\boldsymbol{a}(D) \oplus \boldsymbol{a}^{\prime}(D)$ . Therefore, the distribution of distances from any given code sequence to all other code sequences is the same as the norm distribution of code sequences, and the code is thus distance-invariant.

A labeling is regular under any of the following conditions: a) if it is linear, in the sense that $\boldsymbol{c}(\boldsymbol{a})-\boldsymbol{c}\left(\boldsymbol{a}^{\prime}\right)=\boldsymbol{c}\left(\boldsymbol{a} \oplus \boldsymbol{a}^{\prime}\right)$ modulo $\Lambda^{\prime}$ , e.g., whenever $\Lambda$ and $\Lambda^{\prime}$ are mod- 2 lattices; b) if it is an Ungerboeck labeling and the Ungerboeck distance bound always holds with equality, e.g., any Ungerboeck labeling for $\boldsymbol{Z}^{2} / R \boldsymbol{Z}^{2} / 2 \boldsymbol{Z}^{2} / 2 R \boldsymbol{Z}^{2} \simeq \boldsymbol{G} / \phi \boldsymbol{G} / \phi^{2} \boldsymbol{G} / \phi^{3} \boldsymbol{G}$ ; c) if $\Lambda$ and $\Lambda^{\prime}$ are $N$ -fold Cartesian products $\Lambda^{N}$ and $\left(\Lambda^{\prime}\right)^{N}$ , and the labeling for $\Lambda^{N} /\left(\Lambda^{\prime}\right)^{N}$ is the $N$ -fold Cartesian product of a regular labeling for $\Lambda / \Lambda^{\prime}$ , e.g., when the partition is $\boldsymbol{G}^{N} / \phi^{3} \boldsymbol{G}^{N}$ or $\boldsymbol{Z}^{N} / \mathbf{4} \boldsymbol{Z}^{N}$ .

In fact, regular labelings (although not necessarily regular Ungerboeck labelings) exist for all partitions used in all the codes covered in this paper.

V. Known Classes of Trellis Codes

We shall now categorize the principal classes of trellis codes that have so far appeared in the literature according to the parameters of the previous sections. In the next section, we give further generic classes. We shall then compare and contrast all of those schemes, including lattice codes.

Ungerboeck [8] developed classes of one- and twodimensional trellis codes using rate- $k /(k+1)$ binary convolutional codes. From the viewpoint of this paper, his one-dimensional schemes are based on the four-way partition $\boldsymbol{Z} / \mathbf{4 Z}$ of the integers into the four residue classes modulo 4 , in combination with a binary rate- $1 / 2$ convolutional coder to select cosets of $\mathbf{4 Z}$ . His two-dimensional schemes for rectangular constellations, which have the greatest practical importance, are based on either the fourway partition $\boldsymbol{Z}^{2} / 2 \boldsymbol{Z}^{2}$ in combination with a rate-1/2 convolutional encoder, or the eight-way partition $Z^{2} / 2 R Z^{2}$ with a rate- $2 / 3$ convolutional encoder. He also gives codes using phase-modulated constellations that are based on similar principles and may be regarded as coset codes (see Section I-C), but that will not be covered here.

Table IV gives the characteristics of the Ungerboeck one- and two-dimensional schemes. The codes achieve increasing $d_{\text {min }}^{2}$ as the number $2^{\nu}$ of states increases from 4 to 512 , up to the maximum possible value of $d_{\text {min }}^{2}\left(\Lambda^{\prime}\right)=2^{\mu}$ . (Note: We use the codes listed in [21], where minor corrections have been made in the earlier code tables.) The depth is $\mu=4$ for the one-dimensional schemes, while in two dimensions $\mu=2$ or 3 . The redundancy $r$ is one for both classes, but the normalized redundancy $\rho$ (per two dimensions) is thus two in the one-dimensional case, versus one in the two-dimensional case. The fundamental coding gain $\gamma$ is given by the formula $2^{-\rho} d_{\text {min }}^{2}$ and is also given in decibels. $N_{0}$ is the number of nearest neighbors, and $\tilde{N}_{0}=2 N_{0} / N$ is the error coefficient normalized to two dimensions. $N_{D}$ is the number of decoding operations using the trellis-based decoding algorithms of the partition $\Lambda / \Lambda^{\prime}$ whose complexity is given in Table III, followed by a conventional Viterbi algorithm for the convolutional code, and $\tilde{N}_{D}=2 N_{D} / N$ is the decoding complexity per two dimensions. (For each unit of time, for each of the $2^{\nu}$ states, the Viterbi algorithm requires $2^{k}$ additions and a comparison of $2^{k}$ numbers, or $2^{k}-1$ binary comparisons, so that its complexity is $\beta 2^{k+\nu}$ , where $\beta=2-2^{-k}$ , and $2^{k+\nu}$ is the number of branches per stage of the trellis, which is the measure of complexity used by Ungerboeck [21], following Wei [11].)

The error coefficient reduces the effective coding gain by an amount that depends on the steepness of the error probability curve. In this paper, we will use the rule of thumb that every factor of two increase in the error coefficient reduces the coding gain by about 0.2 dB (at error rates of the order of $10^{-6}$ ); this will enable us to compute an effective coding gain $\gamma_{\text {eff }}$ (in dB), normalized for the error coefficient $\tilde{N}_{0}$ . In principle, the error coefficients at every distance ought to be considered, and the effective coding gain evaluated in the same way for each; if

TABLE IV Ungerboeck Codes

$N$	Λ	$\Lambda^{\prime}$	$2^{\nu}$	$k /(k+r)$	$\rho$	$d_{\text {min }}^{2}$	$\gamma$	dB	$\tilde{N}_{0}$	$\tilde{N}_{D}$
One-dimensional
1	$Z$	$4 Z$	4	1/2	2	9	2.25	3.52	8	24
1	$Z$	$4 Z$	8	1/2	2	10	2.5	3.98	8	48
1	$Z$	$4 Z$	16	1/2	2	11	2.75	4.39	16	96
1	$Z$	$4 Z$	32	1/2	2	13	3.25	5.12	24	192
1	$Z$	$4 Z$	64	1/2	2	14	3.5	5.44	72	384
1	$Z$	$4 Z$	128	1/2	2	16	4	6.02	132	768
1	$Z$	$4 Z$	256	1/2	2	16	4	6.02	4	1536
1	$Z$	$4 Z$	512	1/2	2	16	4	6.02	4	3072
Two-dimensional
2	$Z^{2}$	$2 Z^{2}$	4	1/2	1	4	2	3.01	4	16
2	$Z^{2}$	$2 R Z^{2}$	8	2/3	1	5	2.5	3.98	16	64
2	$\boldsymbol{Z}^{\mathbf{2}}$	$2 R Z^{2}$	16	2/3	1	6	3	4.77	56	120
2	$\boldsymbol{Z}^{\mathbf{2}}$	$2 R Z^{2}$	32	2/3	1	6	3	4.77	16	232
2	$Z^{2}$	$2 R Z^{2}$	64	2/3	1	7	3.5	5.44	56	456
2	$Z^{2}$	$2 R Z^{2}$	128	2/3	1	8	4	6.02	344	902
2	$Z^{2}$	$2 R Z^{2}$	256	2/3	1	8	4	6.02	44	1800
2	$Z^{2}$	$2 R Z^{2}$	512	2/3	1	8	4	6.02	4	3592

they grow too large too rapidly, they can dominate performance. We will not go beyond the error coefficient $\tilde{N}_{0}$ in this paper, except for Ungerboeck-type codes, where we can present results of Eyuboglu and Li (unpublished) that take into account the next two normalized coefficients, $\tilde{N}_{1}$ and $\tilde{N}_{2}$ .

Honig [22] has performed a search for one-dimensional Ungerboeck-type codes based on the four-way partition $\boldsymbol{Z} / \mathbf{4} \boldsymbol{Z}$ , using a criterion that includes the effect of the error coefficient, and has obtained an improvement at 64 states (the apparently improved 16 -state code is actually catastrophic). Similarly, Pottie and Taylor [23] have searched for two-dimensional Ungerboeck-type codes based on the eight-way partition $Z^{2} / 2 R Z^{2}$ and have obtained improvements at 64 and 128 states; the 128-state code has a lower fundamental coding gain $\gamma$ but a greater effective coding gain $\gamma_{\text {eff }}$ due to its much lower error coefficient. Finally, Eyuboglu and Li have made a reasonably exhaustive search for the best codes of both classes in terms of the effective coding gain criterion, for up to 256 states, with modest improvements at as few as 16 states.

Table V gives the performance parameters $d_{\min }^{2}, \tilde{N}_{0}, \tilde{N}_{1}$ , $\tilde{N}_{2}$ and the consequent effective coding gains $\gamma_{\text {eff }}$ for a number of the codes of Ungerboeck, Honig, Pottie and Taylor, and Eyuboglu and Li . When the dominant error coefficient is other than $\tilde{N}_{0}$ , it is starred. The parity-check polynomials $\boldsymbol{h}^{j}$ for these codes are also given, in the octal notation of Ungerboeck [8], [21]. All parameters not given (including decoding complexity) are the same as for the Ungerboeck code with the same number of states.

Fig. 12(a) plots the effective coding gain $\gamma_{\text {eff }}$ versus the normalized complexity $\tilde{N}_{D}$ for the best of these Unger-boeck-type one- and two-dimensional codes. We see that the graphs are fairly linear on this log-log plot over most of their range. An increase of a factor of two in complexity yields an increase of about 0.4 dB in coding gain, until the effective coding gain passes 5 dB . The one-dimensional codes are of the order of 0.2 dB better over this linear range (however, the two-dimensional codes have been generally preferred in practice because their constellation expansion factor $2^{\rho}$ is only two, not four).

The first multidimensional code seems to have been developed by Gallager [1]. In this code, an eight-state rate3/4 convolutional encoder selects two successive cosets from the four-way two-dimensional partition $\boldsymbol{Z}^{2} / 2 \boldsymbol{Z}^{2}$ or, equivalently, one coset from the 16-way fourdimensional lattice partition $Z^{4} / 2 Z^{4}$ (equivalently, four successive cosets from the two-way one-dimensional partition $\boldsymbol{Z} / 2 \boldsymbol{Z}$ ). The basic idea, as in Lemma 6, is that with such partitions the minimum squared distance $d_{\text {min }}^{2}$ between code sequences is simply the minimum Hamming distance $d_{H}$ of the binary code, as long as $d_{H} \leq 4$ . Quite independently, Calderbank and Sloane [10] discovered a very similar code, although with improved error coefficient $\tilde{N}_{0}$ due to the choice of an eight-state rate- $3 / 4$ binary code with a lower error coefficient. We shall call codes based

TABLE V Effective Coding Gains

$2^{\nu}$	$\boldsymbol{h}^{\mathbf{2}}$	$\boldsymbol{h}^{\mathbf{1}}$	$\boldsymbol{h}^{0}$	$d_{\text {min }}^{2}$	$\gamma$	dB	$\tilde{N}_{0}$	$\tilde{N}_{1}$	$\tilde{N}_{2}$	$\gamma_{\text {eff }}(\mathrm{dB})$	Type
For $\boldsymbol{Z} / \mathbf{4} \boldsymbol{Z}$ codes
4		2	5	9	2.25	3.52	8	16	32	3.32	U
8		04	13	10	2.5	3.98	8	16	32	3.78	U
16		04	23	11	2.75	4.39	16	16	32	3.99	U
16		10	23	11	2.75	4.39	8	16	48	4.19	EL
32		10	45	13	3.25	5.12	24	56	112	4.60	U
64		024	103	14	3.5	5.44	72	0	180	4.61	U
64		054	161	14	3.5	5.44	16	${ }^{*} 64$	132	4.94	H
128		126	235	16	4	6.02	132	0	512	5.01	U
128		160	267	15	3.75	5.74	16	68	*200	5.16	EL
128		124	207	14	3.5	5.44	8	16	28	5.24	EL
256		362	515	16	4	6.02	4	64	*160	5.47	U
256		370	515	15	3.75	5.74	8	12	*80	5.42	EL
512		0342	1017	16	4	6.02	4	0	*112	5.57	U
For $\boldsymbol{Z}^{2} / 2 \boldsymbol{R} \boldsymbol{Z}^{2}$ codes
4	-	2	5	4	2	3.01	4	32	128	3.01	U
8	04	02	11	5	2.5	3.98	16	72	320	3.58	U
16	16	04	23	6	3	4.77	56	160	820	4.01	U
32	10	06	41	6	3	4.77	16	104	404	4.37	U
32	34	16	45	6	3	4.77	8	*128	404	4.44	EL
64	064	016	101	7	3.5	5.44	56	260	1008	4.68	U
64	060	004	143	7	3.5	5.44	48	292	1184	4.72	PT
64	036	052	115	7	3.5	5.44	40	252	992	4.78	EL
128	042	014	203	8	4	6.02	344	0	5900	4.74	U
128	056	150	223	8	4	6.02	172	624	2568	4.94	EL
128	024	100	245	7	3.5	5.44	8	*188	968	4.91	PT
128	164	142	263	7	3.5	5.44	8	*132	752	5.01	EL
256	304	056	401	8	4	6.02	44	*304	1316	5.28	U
256	370	272	417	8	4	6.02	36	*308	1224	5.28	EL
256	274	162	401	7	3.5	5.44	4	*64	248	5.22	EL
512	0510	0346	1001	8	4	6.02	4	128	*700	5.50	U

Fig. 12. Performance versus complexity. (a) For Ungerboeck-type onedimensional and two-dimensional codes (as improved by Eyuboglu and Li). (b) For Wei codes. (c) For Calderbank-Sloane-type codes.

on partitions $\boldsymbol{Z}^{N} / 2 \boldsymbol{Z}^{N}$ (with $N>2$ ) GCS-type codes. Table VI gives the parameters of the GCS-type code just described, with the error coefficient for the Calder-bank-Sloane (CS) version (as given in [13]).

Wei [11] has developed a variety of multidimensional codes. We shall say that a code is "Wei-type" if $\Lambda / \Lambda^{\prime}$ is a lattice partition where $\Lambda^{\prime}$ is a denser lattice than $\Lambda$ . Wei stresses that $\Lambda^{\prime}$ should be dense to maximize coding gain and to simplify code construction but that $\Lambda$ should have low redundancy $\rho(\Lambda)$ to maximize coding gain and to minimize the constellation expansion factor. Wei is also willing to increase the number of states $2^{\nu}$ (and decoding complexity) to minimize the error coefficient $\tilde{N}_{0}$ , even when the fundamental coding gain $\gamma$ is not improved thereby. Table VII gives the parameters for the codes discussed in [11]; Wei also mentions that any of his codes that are based on the 16-way partition $Z^{8} / E_{8}$ can be translated into a code based on the 16-way partition $E_{8} / R E_{8}$ , with the same error coefficient, but with twice the constellation size. (Wei's lattice " $D E_{8}$ " is here called $D_{8}^{\perp}$ , in keeping with the notation of Section III.) Note that the 32 -state code based on $\boldsymbol{Z}^{4} / 2 \boldsymbol{Z}^{4}$ is a GCS-type code. Indeed, all of the codes in which $\Lambda^{\prime}$ is a lattice of depth 2 are equivalent to GCS-type codes in view of Lemma 6; in particular, the eight-state code based on $Z^{4} / R D_{4}$ is equivalent to the CS version of the GCS-type code just described, although its decoding complexity is less because of the symmetries of $R D_{4}$ (the blank error coefficients are unknown but large).

Ungerboeck [21] has found additional 128 -state Wei-type codes that extend the four- and eight-dimensional families; these codes are listed in Table VIII. Fig. 12(b) plots the effective coding gain $\gamma_{\text {eff }}$ versus the normalized complexity $\tilde{N}_{D}$ for the Wei codes for which the error coefficient is known, versus the codes of Fig. 12(a) as benchmarks.

Calderbank and Sloane [12], [13] have also developed a number of multidimensional codes. We shall say that a

TABLE VI GCS-Type Code

$N$	$\Lambda$	$\Lambda^{\prime}$	$2^{\nu}$	$k /(k+r)$	$\rho$	$d_{\min }^{2}$	$\gamma$	dB	$\tilde{N}_{0}$	$\tilde{N}_{D}$	$\gamma_{\text {eff }}(\mathrm{dB})$
4	$\boldsymbol{Z}^{4}$	$2 \boldsymbol{Z}^{4}$	8	$3 / 4$	$1 / 2$	4	$2^{3 / 2}$	4.52	44	64	3.82

TABLE VII Wei Codes

$N$	Λ	$\Lambda^{\prime}$	$2^{y}$	$k /(k+r)$	$\rho$	$d_{\text {min }}^{2}$	$\gamma$	dB	$\tilde{N}_{0}$	$\tilde{N}_{D}$	$\gamma_{\text {eff }}(\mathrm{dB})$
4	$Z^{4}$	$R D_{4}$	8	2/3	1/2	4	$2^{3 / 2}$	4.52	44	44	3.82
4	$Z^{4}$	$R D_{4}$	16	2/3	1/2	4	$2^{3 / 2}$	4.52	12	72	4.20
4	$Z^{4}$	$2 Z^{4}$	32	3/4	1/2	4	$2^{3 / 2}$	4.52	4	244	4.52
4	$Z^{4}$	$2 D_{4}$	64	4/5	1/2	5	$5 / 2^{1 / 2}$	5.48	72	1048	4.65
8	$Z^{8}$	$E_{8}$	16	3/4	1/4	4	$2^{7 / 4}$	5.27	316	104	4.01
8	$Z^{8}$	$E_{8}$	32	3/4	1/4	4	$2^{7 / 4}$	5.27	124	164	4.28
8	$Z^{8}$	$E_{8}$	64	3/4	1/4	4	$2^{7 / 4}$	5.27	60	284	4.49
16	$Z^{16}$	$H_{16}$	32	4/5	1/8	4	$2^{15 / 8}$	5.64		228
16	$Z^{16}$	$H_{16}$	64	4/5	1/8	4	$2^{15 / 8}$	5.64	796	352	4.12
16	$Z^{16}$	$H_{16}$	128	4/5	1/8	4	$2^{15 / 8}$	5.64	412	600	4.31
8	$D_{8}^{\frac{1}{8}}$	$R E_{8}$	32	4/5	1	8	4	6.02		336
8	$D_{8}^{\perp}$	$R E_{8}$	64	4/5	1	8	4	6.02	316	584	4.76
8	$D_{8}^{\perp}$	$R E_{8}$	128	4/5	1	8	4	6.02	124	1080	5.03

TABLE VIII Further Wei-Type Codes (Ungerboeck)

$N$	$\Lambda$	$\Lambda^{\prime}$	$2^{\nu}$	$k /(k+r)$	$\rho$	$d_{\min }^{2}$	$\gamma$	dB	$\tilde{N}_{0}$	$\tilde{N}_{D}$	$\gamma_{\mathrm{eff}}(\mathrm{dB})$
4	$\boldsymbol{Z}^{4}$	$2 D_{4}$	128	$4 / 5$	$1 / 2$	6	$6 / 2^{1 / 2}$	6.28	728	2040	4.77
8	$\boldsymbol{Z}^{8}$	$R D_{8}$	128	$4 / 5$	$1 / 4$	4	$2^{7 / 4}$	5.27	28	1032	4.71

TABLE IX Calderbank-Sloane Codes

$N$	Λ	$\Lambda^{\prime}$	$2^{\nu}$	$k /(k+r)$	$\rho$	$d_{\text {min }}^{2}$	$\gamma$	dB	$\tilde{N}_{0}$	$\tilde{N}_{D}$	$\gamma_{\text {eff }}(\mathrm{dB})$
2	$Z^{2}$	$\boldsymbol{4} \boldsymbol{Z}^{\mathbf{2}}$	4	2/4	2	8	2	3.01	4	44	3.01
2	$\boldsymbol{Z}^{\mathbf{2}}$	$\boldsymbol{4} \boldsymbol{Z}^{\mathbf{2}}$	8	2/4	2	11	2.75	4.39	32	72	3.79
2	$\boldsymbol{Z}^{\mathbf{2}}$	$\mathbf{4} \boldsymbol{Z}^{\mathbf{2}}$	16	2/4	2	12	3	4.77	48	128	4.05
2	$\boldsymbol{Z}^{\mathbf{2}}$	$4 \boldsymbol{Z}^{\mathbf{2}}$	64	2/4	2	14	3.5	5.44	48	464	4.72
2	$\boldsymbol{Z}^{\mathbf{2}}$	$4 \boldsymbol{Z}^{\mathbf{2}}$	128	2/4	2	16	4	6.02	228	912	4.85
4	$D_{4}$	$2 D_{4}$	16	3/4	1	6	3	4.77	152	152	3.72
4	$D_{4}$	$2 D_{4}$	64	3/4	1	8	4	6.02	828	512	4.48
8	$E_{8}$	$R E_{8}$	8	3/4	5/4	8	$2^{7 / 4}$	5.27	764	90	3.75
8	$E_{8}$	$R E_{8}$	16	3/4	5/4	8	$2^{7 / 4}$	5.27	316	120	4.01
8	$E_{8}$	$R E_{8}$	32	3/4	5/4	8	$2^{7 / 4}$	5.27	124	180	4.28
8	$E_{8}$	$R E_{8}$	64	3/4	5/4	8	$2^{7 / 4}$	5.27	60	300	4.49

Note added in proof: J. Chow (private communication) has obtained values of $\tilde{N}_{0}=88$ and $\gamma_{\text {eff }}=3.88 \mathrm{~dB}$ for the 16-state $D_{4} / R D_{4}$ code, and of $d_{\text {min }}^{2}=6, \tilde{N}_{0}=16$ , and $\gamma_{\text {eff }}=4.37 \mathrm{~dB}$ for the 64-state $D_{4} / R D_{4}$ code.

TABLE X Further $\boldsymbol{Z}^{\mathbf{2}} \boldsymbol{/} \mathbf{4} \boldsymbol{Z}^{\mathbf{2}}$ Codes (Eyuboglu)

$N$	Λ	$\Lambda^{\prime}$	$2^{\nu}$	$k /(k+r)$	$\rho$	$d_{\text {min }}^{2}$	$\gamma$	dB	$\tilde{N}_{0}$	$\tilde{N}_{D}$	$\gamma_{\text {eff }}(\mathrm{dB})$
2	$Z^{2}$	$\boldsymbol{4} \boldsymbol{Z}^{\mathbf{2}}$	4	2/4	2	9	2.25	3.52	8	44	3.32
2	$\boldsymbol{Z}^{\mathbf{2}}$	$4 Z^{2}$	16	2/4	2	12	3	4.77	16	128	4.37
2	$\boldsymbol{Z}^{\mathbf{2}}$	$4 Z^{2}$	32	2/4	2	12	3	4.77	4	240	4.77
2	$\boldsymbol{Z}^{\mathbf{2}}$	$4 \boldsymbol{Z}^{\mathbf{2}}$	32	2/4	2	13	3.25	5.12	16	240	4.72

code is "CS-type" if $\Lambda / \Lambda^{\prime}$ is a lattice partition where $\Lambda$ and $\Lambda^{\prime}$ are versions of the same lattice. Some of their codes are shown in Table IX. (

#19 Unclear reference to "last three codes"

They also consider the following: Ungerboeck-type codes based on partitions $\boldsymbol{Z} / \mathbf{4} \boldsymbol{Z}, \boldsymbol{Z}^{2} / 2 \boldsymbol{Z}^{2}$ and $\boldsymbol{Z}^{2} / 2 R \boldsymbol{Z}^{2}$ , but without improvement over Ungerboeck either in fundamental coding gain $\gamma$ or in error coefficient $\tilde{N}_{0}$ , except for the $\nu=6$ case also found by Pottie and Taylor; the GCS-type code based on the partition $\boldsymbol{Z}^{4} / 2 \boldsymbol{Z}^{4}$ , as previously mentioned; and codes using the nonbinary two-dimensional hexagonal lattice $A_{2}$ , for which the results are not particularly encouraging. The last three codes appear to be equivalent to the aforementioned translation of Wei's $\boldsymbol{Z}^{8} / E_{8}$ codes.

)

Finally, Eyuboglu has also searched for codes based on the 16-way two-dimensional partition $\boldsymbol{Z}^{2} / 4 \boldsymbol{Z}^{2}$ . The additional codes found that improve on codes already listed are summarized in Table X.

Fig. 12(c) plots the effective coding gain $\gamma_{\text {eff }}$ versus the normalized complexity $\tilde{N}_{D}$ for the Calderbank-Sloane codes, as improved by the codes in Table X (up to 32 states), again with the codes of Fig. 12(a) for comparison. Note that the $\boldsymbol{Z}^{2} / \boldsymbol{4}^{2}$ codes ought to be compared to the one-dimensional $\boldsymbol{Z} / \mathbf{4} \boldsymbol{Z}$ codes, since they have the same redundancy and depth, and in fact include the latter as a subset; their performance improvement is in fact small, and, taking complexity into account, they are no better.

VI. Further Classes of Trellis Codes

In this section we present a number of additional generic classes of codes that can be described relatively simply. Our objective is more to round out the picture than to improve on earlier results; in general, these codes have parameters comparable to those of the known codes of the previous section (or indeed of lattice codes). Our main point, in fact, is that "there are many ways to modulate," and that the complexity of the encoder and decoder required to achieve a given coding gain and error coefficient remains remarkably constant across a wide variety of codes.

We describe eight different classes of codes $\mathbb{C}\left(\Lambda / \Lambda^{\prime} ; C\right)$ , based on all possible choices of the three following binary characteristics. The codes are based either on a lattice partition $\Lambda / \Lambda^{\prime}$ with minimum squared distances $d_{\text {min }}^{2}(\Lambda) / d_{\text {min }}^{2}\left(\Lambda^{\prime}\right)$ in the ratio $1: 2$ , or on a partition chain $\Lambda / \Lambda^{\prime} / \Lambda^{\prime \prime}$ with distances $d_{\text {min }}^{2}(\Lambda) / d_{\text {min }}^{2}\left(\Lambda^{\prime}\right) / d_{\text {min }}^{2}\left(\Lambda^{\prime \prime}\right)$ in the ratio $1: 2: 4$ (in the latter case, we use an Ungerboeck labeling and exploit the Ungerboeck distance bound). The convolutional code $C$ is either a rate- $k / 2 k$ code, with $\left|\Lambda / \Lambda^{\prime}\right|=2^{2 k}$ or else $\left|\Lambda / \Lambda^{\prime}\right|=\left|\Lambda^{\prime} / \Lambda^{\prime \prime}\right|=2^{k}$ , or a rate- $k / (k+1)$ code, with $\left|\Lambda / \Lambda^{\prime}\right|=2^{k+1}$ or else $\left|\Lambda / \Lambda^{\prime}\right|=2$ and $\left|\Lambda^{\prime} / \Lambda^{\prime \prime}\right|=2^{k}$ . The constraint length $\nu$ of $C$ is either $k$ or $2 k$ , with each of the $k$ input bits being held in memory for either one or two time units. All codes are noncatastrophic, i.e., there is no infinite sequence of nonzero inputs that leads to a finite sequence of nonzero outputs.

The resulting codes have many characteristics similar to those of the Wei and Calderbank-Sloane codes, as well as the four-state two-dimensional Ungerboeck code. Except for error coefficient, the rate- $k / 2 k$ codes also have parameters resembling those of the Barnes-Wall lattices.

Class I Codes

Let $\Lambda / \Lambda^{\prime}$ be a $2^{2 k}$ -way lattice partition with $d_{\text {min }}^{2}\left(\Lambda^{\prime}\right)= 2 d_{\min }^{2}(\Lambda)$ . Let $C$ be a "unit-memory" rate- $k / 2 k$ binary convolutional encoder, as shown in Fig. 13(a). In each time

Fig. 13. Encoders. (a) For Class I code. (b) For Class II codes. (c) For Class III and IV codes. (d) For Class V codes. (e) For Class VI codes. (f) For Class VII and VIII codes.

unit $k$ information bits enter the encoder and are stored for one time unit; the encoder output is the combination of the $k$ current and $k$ previous bits, or $b$ bits altogether, which select one of the $2^{2 k}$ cosets of $\Lambda^{\prime}$ . The encoder has $2^{k}$ states, and the code has a trellis diagram in which every current state is connected to every next state, so there are a total of $2^{2 k}$ branches in each time unit, one corresponding to each coset of $\Lambda^{\prime}$ . The code is thus not catastrophic, because only one branch corresponds to the zero coset of $\Lambda^{\prime}$ ( $\Lambda^{\prime}$ itself). Any two paths through the trellis must differ in at least two time units, so the minimum squared distance between paths is $2 d_{\text {min }}^{2}(\Lambda)$ , which is the same as the minimum squared distance $d_{\text {min }}^{2}\left(\Lambda^{\prime}\right)$ within any coset of $\Lambda^{\prime}$ ; thus $d_{\min }^{2}(\mathbb{C})=d_{\min }^{2}\left(\Lambda^{\prime}\right)$ .

The multiplicity $N_{0}$ of sequences at distance $d_{\text {min }}^{2}(\mathbb{C})$ from a given sequence that start at a given time is

N_{0}=N_{\Lambda^{\prime}}+\left(2^{k}-1\right) N_{\Lambda}^{2}

where $N_{\Lambda^{\prime}}$ is the number of points of weight $d_{\min }^{2}\left(\Lambda^{\prime}\right)$ in $\Lambda^{\prime}$ , and $N_{\Lambda}$ is the number of points of weight $d_{\min }^{2}(\Lambda)$ in any nonzero coset of $\Lambda^{\prime}$ (if it is the same for all such cosets).

Table XI gives the parameters for Class I codes based on the partitions $\boldsymbol{Z}^{4} / R \boldsymbol{Z}^{4}, D_{4} / R D_{4}, E_{8} / R E_{8}$ , and $\Lambda_{16} / R \Lambda_{16}$ , which have orders $4,4,16$ , and 256 , and depths $1,2,3$ , and 4 , respectively (the $D_{4} / R D_{4}$ code was developed as a phase-modulated code by Divsalar and Simon [24]). These codes are closely related to the lattices $D_{4}, E_{8}, \Lambda_{16}$ , and $\Lambda_{32}$ , and have the same principal parameters $\mu, \rho$ , and $\gamma$ . In the case of the two-state code based on $\boldsymbol{Z}^{4} / R \boldsymbol{Z}^{4}$ , it is possible to make the minimum squared distance between distinct paths equal to $3 d_{\text {min }}^{2}(\Lambda)$ (use the partition chain $\boldsymbol{Z}^{4} / D_{4} / R \boldsymbol{Z}^{4}$ , and let one of the two bits control each of these two-way partitions; see Class III), so that the error coefficient achieves its minimum possible value, $\tilde{N}_{0}=4$ .

Class II Codes

#11 Inconsistent encoder description for Class II codes

Let $\Lambda / \Lambda^{\prime}$ again be a $2^{2 k}$ -way lattice partition with $d_{\text {min }}^{2}\left(\Lambda^{\prime}\right)=2 d_{\text {min }}^{2}(\Lambda)$ . Let $C$ be a rate- $1 / 2,2^{2 k}$ -state convolutional encoder as shown in Fig. 13(b), with $k$ information bits entering in each time unit, two units of memory, and $2 k$ output bits, $k$ representing the inputs one time unit earlier, and $2 k$ representing the mod- 2 sum of the current inputs with those two time units earlier, which together select one of the $2^{2 k}$ cosets of $\Lambda^{\prime}$ .

The trellis diagram has $2^{2 k}$ states, with $2^{k}$ branches leaving and entering every state. Because the distance between paths is at least $d_{\text {min }}^{2}(\Lambda)$ when they diverge, $d_{\text {min }}^{2}(\Lambda)$ when they merge, and $d_{\text {min }}^{2}(\Lambda)$ in some other time unit, the distance between distinct paths is at least $3 d_{\text {min }}^{2}(\Lambda)$ . Thus $d_{\text {min }}^{2}(\mathbb{C}) =d_{\text {min }}^{2}\left(\Lambda^{\prime}\right)=2 d_{\text {min }}^{2}(\Lambda)$ , the distance within cosets of $\Lambda^{\prime}$ , and the error coefficient $\tilde{N}_{0}$ is the same as that of $\Lambda^{\prime}$ (the minimum possible for any coset code based on the partition $\Lambda / \Lambda^{\prime}$ ). The code is noncatastrophic because every nonzero input creates a nonzero output one time unit later.

TABLE XI

$\bar{N}$	Λ	$\Lambda^{\prime}$	2v	$k /(k+r)$	$\rho$	$d_{\text {min }}^{2}$	$\gamma$	dB	$\tilde{N_{0}}$	$\tilde{N}_{D}$	$\gamma_{\text {eff }}(\mathrm{dB})$
Class I codes
2	$Z^{4}$	$R Z^{4}$	2	1/2	1/2	2	$2^{1 / 2}$	1.51	4	5	1.51
2	$D_{4}$	$R D_{4}$	2	$1 / 2$	1	4	2	3.01	44	13	2.32
4	$E_{8}$	$R E_{8}$	4	2/4	3/2	8	$2^{3 / 2}$	4.52	252	67	3.24
8	$\Lambda_{16}$	$R \Lambda_{16}$	16	4/8	2	16	4	6.02		1614
Class II codes
4	$D_{4}$	$R D_{4}$	4	1/2	1	4	2	3.01	12	16	2.69
8	$E_{8}$	$R E_{8}$	16	2/4	3/2	8	$2^{3 / 2}$	4.52	60	88	3.73
16	$\Lambda_{16}$	$R \Lambda_{16}$	256	4/8	2	Class III codes	4	6.02	540	2544	4.61
2	$\boldsymbol{Z}^{\mathbf{2}}$	$2 Z^{2}$	2	1/2	1	3	3/2	1.76	8	8	1.56
4	$D_{4}$	$2 D_{4}$ $2 E_{8}$	4 16	2/4	3/2	6	$3 / 2^{1 / 2}$	3.27		46
8	$E_{8}$	$2 E_{8}$	16	4/8	2	16	4	6.02	1020	684	4.42
Class IV codes
2	$Z^{2}$	$2 Z^{2}$	4	1/2	1	4	2	3.01	4	16	3.01
4	$D_{4}$	$2 D_{4}$	16	2/4	3/2	8	$2^{3 / 2}$	4.52	12	88	4.20
8	$E_{8}$	$2 E_{8}$	256	4/8	2	16	4	6.02	60	2544	5.24
Class V codes
4	$Z^{4}$	$R Z^{4}$	2	1/2	1/2	2	$2^{1 / 2}$	1.51	4	5	1.51
4	$D_{4}$	$R D_{4}$	2	1/2	1	4	2	3.01	44	13	2.32
8	$E_{8}$	$R E_{8}$	8	3/4	5/4	8	$2^{7 / 4}$	5.27	764	90	3.75
16	$\Lambda_{16}$	$R \Lambda_{16}$	128	7/8	13/8	16	$2^{19 / 8}$	7.15		5632
8	$D_{8}$	$E_{8}$	4	2/3	1/2	4	$2^{3 / 2}$	4.52	316	37	3.25
16	$D_{16}$	$H_{16}$	8	3/4	1/4	4	$2^{7 / 4}$	5.27	1692	89	3.52
16	$H_{16}$	$\Lambda_{16}$	64	6/7	3/4	8	$2^{9 / 4}$	6.77		1792
Class VI codes
4	$D_{4}$	$R D_{4}$	4	1/2	1	4	2	3.01	12	16	2.69
8	$E_{8}$	$R E_{8}$	64	3/4	5/4	8	$2^{7 / 4}$	5.27	60	300	4.49
16	$\Lambda_{16}$	$R \Lambda_{16}$	$2{ }^{14}$	7/8	13/8	16	$2^{19 / 8}$	7.15	540	$\sim 2^{19}$	5.73
8	$D_{8}$	$E_{8}$	16	2/3	1/2	4	$2^{3 / 2}$	4.52	60	58	3.73
8	$D_{16}$	$H_{16}$	64	3/4	1/4	4	$2^{7 / 4}$	5.27	284	194	4.04
16	$H_{16}$	$\Lambda_{16}$	$2^{12}$	6/7	3/4	8	$2^{9 / 4}$	6.77	540	$-2^{16}$	5.36
Class VII codes
2	$Z^{2}$	$2 \boldsymbol{Z}^{\mathbf{2}}$	2	1/2	1	3	3/2	1.76	8	8	1.56
4	$Z^{4}$	$R D_{4}$	4	2/3	1/2	3	$3 / 2^{1 / 2}$	3.27	24	30	2.75
8	$Z^{8}$	$E_{8}$	8	3/4	1/4	3	$3 / 2^{1 / 4}$	4.02		74
16	$Z^{16}$	$H_{16}$	16	4/5	1/8	3	$3 / 2^{1 / 8}$	4.39		166
8	$D_{8}^{\frac{1}{8}}$	$R E_{8}$	16	4/5	1	6	3	4.77		212
Class VIII codes
2	$Z^{2}$	$2 \boldsymbol{Z}^{\mathbf{2}}$	4	1/2	1	4	2	3.01	4	16	3.01
4	$Z^{4}$	$R D_{4}$	16	2/3	1/2	4	$2^{3 / 2}$	4.52	12	72	4.20
8	$Z^{8}$	$E_{8}$	64	3/4	1/4	4	$2^{7 / 4}$	5.27	60	284	4.49
16	$Z^{16}$	$H_{16}$	256	4/5	1/8	4	$2^{15 / 8}$	5.64	284	1096	4.41
8	$D_{8}^{1}$	$R E_{8}$	256	4/5	1	8	4	6.02	60	2072	5.24

Table XI gives the parameters for Class II codes based on the partitions $D_{4} / R D_{4}, E_{8} / R E_{8}$ , and $\Lambda_{16} / R \Lambda_{16}$ . Their parameters, including coding gain, are the same as those of Class I codes, except that the increase of the number of states to equal the order $2^{2 k}$ of the partition $\Lambda / \Lambda^{\prime}$ results in a reduction of the error coefficient $\tilde{N}_{0}$ to its minimum value. The decoding complexity increases only modestly because it is dominated by the complexity of decoding $\Lambda / \Lambda^{\prime}$ .

Class III and IV Codes

Let $\Lambda / \Lambda^{\prime} / \Lambda^{\prime \prime}$ now be a chain of two $2^{k}$ -way partitions with distances $d_{\min }^{2}\left(\Lambda^{\prime \prime}\right)=2 d_{\min }^{2}\left(\Lambda^{\prime}\right)=4 d_{\min }^{2}(\Lambda)$ . For a Class III code (resp. Class IV), let $C$ be the same rate- $k / 2 k$ encoder as in Class I (resp. Class II), but now let the first set of $k$ output bits select the coset of $\Lambda^{\prime}$ in the partition $\Lambda / \Lambda^{\prime}$ , and the second set of $k$ output bits select the coset of $\Lambda^{\prime \prime}$ in the partition $\Lambda^{\prime} / \Lambda^{\prime \prime}$ , as shown in Fig. 13(c). These encoders are still noncatastrophic.

Because the label selecting the coset of $\Lambda^{\prime}$ in the partition $\Lambda / \Lambda^{\prime}$ is zero at the time of the first nonzero input, the set of possible initial code symbols (the time-zero lattice $\Lambda_{0}$ ) is $\Lambda^{\prime}$ , so the minimum squared distance between distinct paths is $d_{\text {min }}^{2}\left(\Lambda^{\prime}\right)$ where they first diverge. In the case of Class III codes, the minimum squared distance between distinct paths is $d_{\text {min }}^{2}(\Lambda)$ where they merge, so $d_{\text {min }}^{2}(\mathbb{C})=3 d_{\text {min }}^{2}(\Lambda)$ . In the case of Class IV codes, the set of possible final code symbols is also $\Lambda^{\prime}$ , so that the minimum squared distance between distinct paths is $d_{\text {min }}^{2}\left(\Lambda^{\prime}\right)$ where they merge, and also at least $d_{\text {min }}^{2}(\Lambda)$ at some other time, so that the $d_{\text {min }}^{2}$ between distinct paths is $5 d_{\text {min }}^{2}(\Lambda)$ . This means that $d_{\text {min }}^{2}(\mathbb{C})=d_{\text {min }}^{2}\left(\Lambda^{\prime \prime}\right)= 4 d_{\min }^{2}(\Lambda)$ , and furthermore the normalized error coefficient $\tilde{N}_{0}$ is the same as that of $\Lambda^{\prime \prime}$ .

Table XI gives the parameters for Class III codes based on the partitions $\boldsymbol{Z}^{2} / R \boldsymbol{Z}^{2} / 2 \boldsymbol{Z}^{2}, D_{4} / R D_{4} / 2 D_{4}$ , and $E_{8} / R E_{8} / 2 E_{8}$ , which have orders 4,16 , and 256 , and depths 2,3 , and 4 , respectively. The first is a noncatastrophic Ungerboeck-type two-state code with $\gamma=1.5 (1.76 \mathrm{~dB})$ (this code appears as a phase-modulated code in Divsalar et al. [25]). The others are CS-type codes. For the $E_{8} / R E_{8} / 2 E_{8}$ code, we use the fact that there exists an alternative partition $E_{8} / R^{*} E_{8} / 2 E_{8}$ , where $R^{*} E_{8}$ , like $R E_{8}$ , is a version of $E_{8}$ with $d_{\text {min }}^{2}=8$ , such that the system of coset representatives $\left[R^{*} E_{8} / 2 E_{8}\right]$ is also a system of coset representatives for $E_{8} / R E_{8}$ (see part II); this ensures that $d_{\min }^{2}=8$ when paths merge as well as when they diverge, so that $d_{\text {min }}^{2}(\mathbb{C})=d_{\text {min }}^{2}\left(2 E_{8}\right)=16$ .

Table XI gives the parameters for Class IV codes based on the same partitions $\boldsymbol{Z}^{2} / R \boldsymbol{Z}^{2} / 2 \boldsymbol{Z}^{2}, D_{4} / R D_{4} / 2 D_{4}$ , and $E_{8} / R E_{8} / 2 E_{8}$ . The first is Ungerboeck's four-state code, which is the prototype of this class. These are CS-type codes that are closely related to Class III codes, except that they have twice the constraint length and $4 / 3$ the minimum squared distance and coding gain (except in the last case, where only the error coefficient improves). Again, these codes are closely related to the lattices $E_{8}, \Lambda_{16}$ , and $\Lambda_{32}$ and to the corresponding Class I and II codes and have the same principal parameters. In fact, even the decoding complexity is the same as that of Class II codes, but the error coefficient is still further reduced.

Class V Codes

Let $\Lambda / \Lambda^{\prime}$ now be a $2^{k+1}$ -way partition with distances $d_{\text {min }}^{2}\left(\Lambda^{\prime}\right)=2 d_{\text {min }}^{2}(\Lambda)$ , and let $C$ be a rate- $k /(k+1)$ convolutional encoder as shown in Fig. 13(d), with $k$ information bits entering in each time unit, and $k+1$ output bits generated as follows. Let $T$ be a linear (modulo 2 ) circuit with $k$ input bits, namely the $k$ -tuple $\boldsymbol{x}_{t-1}$ of input bits delayed by one time unit, and $k+1$ output bits. One output bit goes directly to the coset selector; the remaining $k$ bits are added (modulo 2) to the $k$ -tuple $\boldsymbol{x}_{t}$ , and the $k$ -bit sum forms the remaining inputs to the coset selector.

The circuit $T$ need have only the following two properties: a) its outputs are all-zero only when its inputs are all-zero, and b) there is no infinite input sequence ( $\boldsymbol{x}_{0}, \boldsymbol{x}_{1}, \cdots$ ) into $C$ that generates a finite output sequence from $C$ (so that the code is noncatastrophic). A simple circuit $T$ with these properties is the one whose output is simply the ( $k+1$ )-tuple ( $x_{t-1}, 0$ ), where the leftmost bit is the one that goes directly into the coset selector. Property a) is obvious. Property b) follows from the fact that if $\boldsymbol{x}_{0} \neq \mathbf{0}$ , then there is no sequence ( $\boldsymbol{x}_{1}, \boldsymbol{x}_{2}, \cdots$ ) such that $\left(\boldsymbol{x}_{0}, 0\right) \oplus\left(0, \boldsymbol{x}_{1}\right)=\mathbf{0},\left(\boldsymbol{x}_{1}, 0\right) \oplus\left(0, \boldsymbol{x}_{2}\right)=\mathbf{0}$ , etc., since $\boldsymbol{x}_{1}$ can only match the $k-1$ low-order bits in $\boldsymbol{x}_{0}, \boldsymbol{x}_{2}$ can then only match the $k-2$ low-order bits in $\boldsymbol{x}_{0}$ , etc., and so eventually the highest order nonzero bit in $\boldsymbol{x}_{0}$ "shifts" to the highest order position, where it cannot be matched.

Property a) ensures that the minimum squared distance is at least $d_{\text {min }}^{2}\left(\Lambda^{\prime}\right)=2 d_{\text {min }}^{2}(\Lambda)$ , because two distinct paths differ by at least $d_{\text {min }}^{2}(\Lambda)$ where they diverge and $d_{\text {min }}^{2}(\Lambda)$ where they merge. The multiplicity $N_{0}$ of sequences at distance $d_{\text {min }}^{2}\left(\Lambda^{\prime}\right)$ from any given sequence starting at any given time is

N_{0}=N_{\Lambda^{\prime}}+\left(2^{k+1}-k-2\right) N_{\Lambda}^{2}

where $N_{\Lambda^{\prime}}$ is the number of points of weight $d_{\min }^{2}\left(\Lambda^{\prime}\right)$ in $\Lambda^{\prime}$ , and $N_{\Lambda}$ is the number of points of weight $d_{\text {min }}^{2}(\Lambda)$ in any nonzero coset of $\Lambda^{\prime}$ (if it is the same for all such cosets).

#5 Path enumeration for Class V error coefficient

The coefficient of $N_{\Lambda}^{2}$ follows from the observation that in the code trellis, starting from a given zero state and ending at some later zero state, there are $2^{k}-1$ nonzero paths of length $2,2^{k-1}-1$ nonzero paths of length 3 , and so forth, up to $2-1=1$ nonzero path of length $k+1$ , so that the total number of nonzero paths is $2^{k+1}-k-2$ (this is generally true for any noncatastrophic rate- $k /(k+1)$ encoder; see Forney [26]).

Table XI gives the parameters for Class V codes based on the partitions $Z^{4} / R Z^{4}, D_{4} / R D_{4}, E_{8} / R E_{8}$ , and $\Lambda_{16} / R \Lambda_{16}$ , as in Class I, as well as $D_{8} / E_{8}, D_{16} / H_{16}$ , and $H_{16} / \Lambda_{16}$ . The first two are just the two-state Class I codes again, since $k+1=2 k=2$ . The third code is a code equivalent to the eight-state CS code based on $E_{8} / R E_{8}$ , which may be considered as the prototype of this class. The fourth is a 128-state code with a coding gain in excess of 7 dB , but with a huge error coefficient and decoding complexity. The last three further illustrate that Class V codes attain large coding gains for relatively few states (and thus small decoding complexity) but with outsize error coefficients: the fifth code attains $\gamma=2^{3 / 2}(4.52 \mathrm{~dB})$ with only four states, while the last gets considerably beyond 6 dB with only 64 states. (In unpublished work, Wei had earlier constructed codes based on such 16dimensional partitions as $H_{16} / \Lambda_{16}$ , with comparable coding gains.) Even if the effective number of states is taken as the order of the partition $\Lambda / \Lambda^{\prime}$ rather than $2^{\nu}$ (since the decoding complexity is dominated by decoding $\Lambda / \Lambda^{\prime}$ ), so that the effective number of states doubles, these are still very good (ignoring $\tilde{N}_{0}$ ).

Class VI Codes

Once again, let $\Lambda / \Lambda^{\prime}$ be a $2^{k+1}$ -way partition with distances $d_{\text {min }}^{2}\left(\Lambda^{\prime}\right)=2 d_{\text {min }}^{2}(\Lambda)$ , but now let $C$ be a rate $k /(k+1)$ convolutional encoder as shown in Fig. 13(e). This is the same as the Class V encoder, including the circuit $T$ , except that there is a second memory element, and its output $\boldsymbol{x}_{t-2}$ is further added to the $k$ -tuple output of the encoder of Fig. 13(d).

Property b) of circuit $T$ again ensures that the code is noncatastrophic.

#1 Minimum distance argument for Class VI codes

Furthermore, it ensures that if the input sequence has a finite number of nonzero $\boldsymbol{x}_{t}$ , then the encoder outputs are nonzero at at least three different times: once when the sequence begins, once at some inter- mediate time when $T$ has a nonzero output, and once when the last nonzero input finally leaves the encoder. Consequently, the minimum distance between two distinct paths is at least $3 d_{\min }^{2}(\Lambda)$ , so that the minimum distance of the code is $d_{\text {min }}^{2}(\mathbb{C})=d_{\text {min }}^{2}\left(\Lambda^{\prime}\right)=2 d_{\text {min }}^{2}(\Lambda)$

, and the error coefficient $\tilde{N}_{0}$ is the same as that of $\Lambda^{\prime}$ .

Table XI gives the parameters for Class VI codes based on the same partitions as for Class V (except for $Z^{4} / R Z^{4}$ , where no improvement is achieved). The four-state $D_{4} / R D_{4}$ code is again a Class II code, and the 64-state $E_{8} / R E_{8}$ code is equivalent to the corresponding Wei/CalderbankSloane code. The remaining codes illustrate that Class VI codes have the same coding gains as Class V codes, but with reasonable error coefficients, at the cost of increased decoding complexity (vastly increased, for those codes with gains more than 6 dB ).

Class VII and VIII Codes

Now let $\Lambda / \Lambda^{\prime} / \Lambda^{\prime \prime}$ be a two-level partition chain with distances $d_{\text {min }}^{2}\left(\Lambda^{\prime \prime}\right)=2 d_{\text {min }}^{2}\left(\Lambda^{\prime}\right)=4 d_{\text {min }}^{2}(\Lambda)$ and orders 2 and $2^{k}$ . Let $C$ be a rate- $k /(k+1)$ encoder as in Fig. 13(d) and (e), but with one output bit selecting one of the two cosets of $\Lambda^{\prime}$ in the partition $\Lambda / \Lambda^{\prime}$ , and the remaining $k$ bits selecting a coset of $\Lambda^{\prime \prime}$ in the partition $\Lambda^{\prime} / \Lambda^{\prime \prime}$ , as shown in Fig. 13(f).

Again, the codes are noncatastrophic. With Class VII codes, as with Class III, two distinct paths differ by at least $d_{\text {min }}^{2}\left(\Lambda^{\prime}\right)$ where they diverge and $d_{\text {min }}^{2}(\Lambda)$ where they merge, so that $d_{\min }^{2}(\mathbb{C})=3 d_{\min }^{2}(\Lambda)$ . With Class VIII codes, as with Class IV, two distinct paths differ by at least $d_{\text {min }}^{2}\left(\Lambda^{\prime}\right)$ where they diverge, by at least $d_{\text {min }}^{2}(\Lambda)$ at some intermediate time, and by at least $d_{\min }^{2}\left(\Lambda^{\prime}\right)$ where they merge, so that the minimum squared distance between distinct paths is at least $5 d_{\text {min }}^{2}(\Lambda)$ . Hence $d_{\text {min }}^{2}(\mathbb{C})= d_{\text {min }}^{2}\left(\Lambda^{\prime \prime}\right)=4 d_{\text {min }}^{2}(\Lambda)$ , and the error coefficient $\tilde{N}_{0}$ is the same as that for $\Lambda^{\prime \prime}$ .

Table XI gives the parameters for Class VII and Class VIII codes based on the partitions $\boldsymbol{Z}^{2} / R \boldsymbol{Z}^{2} / 2 \boldsymbol{Z}^{2}, \boldsymbol{Z}^{4} / D_{4} / R D_{4}, \quad Z^{8} / D_{8} / E_{8}, \quad Z^{16} / D_{16} / H_{16}$ , and $D_{8}^{\perp} / E_{8} / R E_{8}$ , which have orders $4,8,16,32$ , and 32, and depths $2,2,2$ , 2, and 3, respectively. The first Class VII code is the two-state Class III code again, and the first Class VIII code is again Ungerboeck's four-state code. The remaining codes are Wei-type codes. In particular, the second and third Class VIII codes correspond to Wei's 16-state fourdimensional and 64 -state eight-dimensional codes, which are the prototypes of this class, and the last two Class VIII codes are 256 -state elaborations of codes that Wei investigated for $2^{\nu}=32,64$ , and 128. The Class VII codes are closely related to Class VIII codes, except that they have half the state space dimension and $3 / 4$ the minimum squared distance and coding gain.

VII. Discussion

A large number of codes have been discussed in a common framework in this paper. In this section we draw what conclusions seem warranted by the evidence.

Trellis codes and lattice codes are comparable, with respect to fundamental parameters such as fundamental coding gain $\gamma$ versus number of states $2^{\nu}$ . Considering the sequence of Barnes-Wall lattices, we see that it takes two states to get $\gamma=2^{1 / 2}(1.51 \mathrm{~dB})$ , four states to get $\gamma=2$ ( 3.02 dB ), 16 states to get $\gamma=2^{3 / 2}(4.52 \mathrm{~dB})$ , and 256 states to get $\gamma=4(6.02 \mathrm{~dB})$ . The depths $\mu$ of these lattices are $1,2,3$ , and 4 ; their redundancies $\rho$ are $1 / 2,1,3 / 2$ , and 2 ; and their minimum squared distances are $2,4,8$ , and 16 .

These properties are shared by the generic trellis codes that we have called Class I, II, and IV, which include a two-state $\gamma=2^{1 / 2}$ code based on the partition $\boldsymbol{Z}^{4} / R \boldsymbol{Z}^{4}$ and the four-state Ungerboeck $\gamma=2$ code based on the partition $Z^{2} / 2 Z^{2}$ , both with minimal error coefficient $\tilde{N}_{0}=4$ .

All of the trellis codes that achieve $\gamma=2^{3 / 2}(4.52 \mathrm{~dB})$ require 16 states, except for the GCS/Wei eight-state four-dimensional code, which has an error coefficient of $\tilde{N}_{0}=44$ , and four-state Class I and V codes, whose error coefficients are very large and whose decoding complexity is not that much less than that of the Wei/Class VIII 16 -state four-dimensional code, for example.

#12 Parameter mismatch for lattice X_32 in Discussion

Note also that the lattices $X_{24}$ and $X_{32}$ achieve $\gamma=2^{3 / 2}(4.52 \mathrm{~dB})$ and $\gamma=2^{13 / 8}(4.89 \mathrm{~dB})$ with 8 and 16 states, respectively, but with $\mu=2, \rho=1 / 2$ and $d_{\min }^{2}=4$ , like the Wei codes.

#6 Incorrect example for γ=3 codes in Discussion

There is a nearby cluster of codes that achieve $\gamma=3$ ( 4.77 dB ), with either $\mu=3, \rho=1$ and $d_{\min }^{2}=6$ , or $\mu=4$ , $\rho=2$ and $d_{\text {min }}^{2}=12$ ; e.g., the 16- and 32-state two-dimensional Ungerboeck codes, the 16-state CS/Eyuboglu codes, or the 16 -state Class III and Class VII codes.

There is another cluster of codes at $\gamma=2^{7 / 4}(5.27 \mathrm{~dB})$ . While there are codes that achieve this fundamental coding gain with as few as eight states (e.g., the eight-dimensional CS code, or two of the Class V codes), it seems to take 32 or 64 states to get reasonable error coefficients (e.g., the Wei or CS eight-dimensional codes).

To achieve $\gamma=4(6.02 \mathrm{~dB})$ , all of the trellis codes with reasonable error coefficient ( $\tilde{N}_{0}<100$ ) require 256 states. There are such codes with as few as 16 states (e.g., Class I $\Lambda_{16} / R \Lambda_{16}$ ) but with very large error coefficients and without as much saving in decoding complexity as the low number of states would suggest (because most of the complexity occurs in decoding the lattice partition). There are a number of good 128 -state codes, but there are also lattices ( $H_{24}$ and $H_{32}$ ) that achieve nearly 6 dB with 128 states. The 256 -state one-dimensional Ungerboeck code is remarkable: it obtains $\gamma=4$ with minimal error coefficient $\tilde{N}_{0}=4$ , and with quite low decoding complexity. Note that it has $\mu=4, \rho=2$ and $d_{\text {min }}^{2}=16$ , like $\Lambda_{24}$ and $\Lambda_{32}$ .

In summary, we propose a folk theorem: it takes two states to get 1.5 dB , four states to get $3 \mathrm{~dB}, 16$ states to get 4.5 dB , perhaps 64 states to get 5.25 dB , and 256 states to get 6 dB , as long as we require a reasonably small error coefficient (for trellis codes). 2) Trellis codes are better than lattice codes, if we consider effective coding gain versus decoding complexity. Granted, our measure of effective coding gain is based on a rule of thumb that is only approximately valid for moderately low error rates and that generally does not take into account neighbors other than the nearest; granted also, our measure of decoding complexity is based specifically on the algorithms of part II and is highly implemen-tation-dependent. We have not even given an effective coding gain $\gamma_{\text {eff }}$ for lattice codes because our rule of thumb is questionable when the number of nearest neighbors is so high. Nonetheless, it seems clear that the very large error coefficients of lattice codes will mean that their effective performance will be significantly inferior to that of comparable trellis codes. For codes with the same parameters, due to the many symmetries of the lattice codes, the decoding complexity does seem to increase slightly as we go from a lattice code to Class I to Class II to Class IV-i.e., as the code becomes "more convolutional"-but this slight effect is very much outweighed by the large reduction in error coefficient. 3) It is best to keep the redundancy $\rho$ as small as possible, within reasonable limits. The densest lattices are all selfdual, so their redundancy is equal to half their depth (the comparable trellis codes use rate- $k / 2 k$ encoders). Ungerboeck [8] made the point, using channel capacity arguments, that there is little to be gained by going beyond 1 bit of redundancy per symbol, i.e., by using rate- $k /$ ( $k+r$ ) encoders with $r>1$ . Wei [11] recognized that, by going beyond two dimensions, the normalized redundancy $\rho$ could be reduced below one and thus that good codes could be obtained with small constellation expansion. The evidence of the codes presented here is that, while the very best codes (e.g., Ungerboeck's four-state two-dimensional code, or all of his one-dimensional codes) may have informativity equal to redundancy, like the best lattices, there is very little loss if redundancy is reduced as long as we do not go to extremes (e.g., Wei's 16-dimensional codes, with $\rho=1 / 8$ ). Compare, for example, the Ungerboeck-type two-dimensional codes with the one-dimensional; or the codes (or lattices) with $\rho=1 / 2$ that achieve $\gamma=2^{3 / 2} (4.52 \mathrm{~dB})$ , versus those with $\rho=3 / 2$ . 4) The Ungerboeck codes are still the benchmark. Comparing all codes shown in Fig. 12(a)-(c), we see that little improvement has been achieved over Ungerboeck's original results. The one-dimensional codes are generally slightly better than the two-dimensional codes, but this is offset by their normalized redundancy of $\rho=2$ , which gives a constellation expansion factor of four, versus the two-dimensional redundancy of $\rho=1$ , which gives a constellation expansion factor of two. Some of the $\boldsymbol{Z}^{\mathbf{2}} \boldsymbol{/} \mathbf{4} \boldsymbol{Z}^{\mathbf{2}}$ codes found by Eyuboglu are slightly better, but this is not surprising because any $2^{\nu}$ -state rate- $1 / 2 \quad \boldsymbol{Z} / 4 \boldsymbol{Z}$ code can also be regarded as a $2^{\nu}$ -state rate- $2 / 4 \boldsymbol{Z}^{2} / 4 \boldsymbol{Z}^{2}$ code. Some of Wei's four-dimensional codes are also slightly better; this is more surprising and significant because these codes also have normalized redundancy $\rho=1 / 2$ . (The Wei eightdimensional codes are also in the vicinity of the twodimensional Ungerboeck codes but with $\rho=1 / 4$ ; the comparable Wei/Calderbank-Sloane codes have slightly higher decoding complexity and, more importantly, $\rho=5 / 4$ .)

#18 Parameter comparison between Wei code and X24 lattice

Of all the codes we have considered, a few stand out as "special." The four-state two-dimensional Ungerboeck code is certainly in this category because it is the unique code with $\gamma=2$ and $\tilde{N}_{0}=4$ and because of its symmetries and close relationship to the special lattice $E_{8}$ . As mentioned before, the 256 -state one-dimensional Ungerboeck code is also special because it is a code with $\gamma=4$ and $\tilde{N}_{0}=4$ , which makes it the trellis cousin of the very special lattice $\Lambda_{24}$ . The 16-state four-dimensional Wei code is the single code that most clearly improves on the Ungerboeck-type codes; note that it has the same parameters as the lattice $X_{24}$ .

(However, could there be a 16 -state code with $\mu=3, \rho=3 / 2, d_{\text {min }}^{2}=8$ -i.e., with the same parameters as $\Lambda_{16}$ -that achieves $\gamma=2^{3 / 2}, \tilde{N}_{0}=4$ or 8 ? or a 16-state, $\mu=2, \rho=1, d_{\text {min }}^{2}=6$ code that achieves $\gamma=3$ , $\tilde{N}_{0}=4$ or 8 ?)

These results suggest that there is little likelihood of finding significantly better codes in terms of the parameters that we have considered. Above 5 dB , where the curves for the Ungerboeck-type codes begin to tail off, there could be codes with 64 or more states that are superior to those known, although it is also possible that this is close enough to channel capacity that the performance/complexity curve will tend to saturate for all codes. We do not expect to need depths more than three to four in this range, so in view of Lemma 5 a systematic search of $\boldsymbol{Z}^{N} / \boldsymbol{4 Z}^{N}$ codes should settle the question. (Ternary codes may also be attractive in this region; see [16].)

VIII. Conclusion

We have defined coset codes in such a way as to embrace all of the good known codes and to suggest a large variety of extensions. Their characterization in terms of geometrical parameters like the fundamental coding gain turns out to be quite simple and allows us to sort out from the variety of schemes that have been proposed those that seem to have the best combinations of coding gain, decoding complexity, and constellation expansion.

With respect to those parameters, Ungerboeck's original codes continue to stand up very well vis-à-vis the rest of the codes considered. Wei's codes probably represent the most significant improvement, particularly because they reduce the constellation expansion factor below two, while achieving some gains in coding gain and decoding complexity. While the codes of Calderbank and Sloane do not rise to the top in any of our comparisons, their introduction of the lattice/coset viewpoint has clearly been the most significant conceptual contribution since Ungerboeck.

In the opinion of the author, while many of the best codes may have already been discovered, the fields of coset codes and trellis codes are no further developed than that of ordinary coding theory in the early 1960's. There may well be better codes still to be discovered in the $3-6-\mathrm{dB}$ range, as indicated in the previous section. Suboptimal decoders should be investigated, as well as codes specifically tailored for such decoders. The design of good sphere packings in large dimensions is a topic of active mathematical interest, and the development of still more powerful trellis codes is wide open. Codes which combine good coding gain with other properties, such as rotational invariance and decoding delay, will be important for applications. The theory of phase-modulated coset codes should be brought along in parallel with that of the lattice-type codes. The combination of these codes with spectral shaping requirements, e.g., signalling for partial-response or other band-limited channels, is an important topic. The vector quantization problem is dual to the sphere packing problem and in the block case has been attacked successfully with lattices; there should also be good trellis quantizers. The question of how to design good multidimensional constellations is not closed. Finally, it seems that mathematicians should be interested in trellis codes, particularly as infinite-dimensional generalizations of fi-nite-dimensional sphere-packings. As in other parts of information theory, the interplay between theoretically and practically motivated research is likely to prove fruitful for some time.

Acknowledgment

This work was directly stimulated by the work of L.-F. Wei on multidimensional trellis codes. Remarks by A. R. Calderbank and preprints of the Calderbank-Sloane papers were most helpful in pointing the way to the lattice/coset viewpoint. G. R. Lang kindly provided references to the early history of lattices in communications. I am indebted to M. V. Eyuboglu for providing some of the code parameters and for permission to publish the improved codes cited in the text. I am grateful for comments on earlier versions of this paper by J. B. Anderson, A. R. Calderbank, R. G. Gallager, M. I. Klun, G. Ungerboeck, and L.-F. Wei.

References

[1] G. D. Forney, Jr., R. G. Gallager, G. R. Lang, F. M. Longstaff, and S. U. Qureshi, "Efficient modulation for band-limited channels," IEEE J. Select. Areas Commun., vol. SAC-2, pp. 632-647, 1984. [2] E. S. Barnes and G. E. Wall, "Some extreme forms defined in terms of Abelian groups," J. Australian Math. Soc., vol. 1, pp. 47-63, 1959. [3] J. Leech, "Notes on sphere packings," Can. J. Math., vol. 19, pp. 251-267, 1967. [4] H. S. M. Coxeter, Twelve Geometric Essays. Carbondale, IL: Southern Illinois Univ. Press, 1968. [5] R. deBuda, "The upper bound of a new near optimal code," IEEE Trans. Inform. Theory, vol. IT-21, pp. 441-445, 1975. [6] J. Leech and N. J. A. Sloane, "Sphere packings and error-correcting codes," Can. J. Math., vol. 23, pp. 718-745, 1971. [7] J. H. Conway and N. J. A. Sloane, Sphere Packings, Lattices and Groups. New York: Springer-Verlag, 1988. [8] G. Ungerboeck, "Channel coding with multilevel/phase signals." IEEE Trans. Inform. Theory, vol. IT-28, pp. 55-67, 1982. [9] L. F. Wei, "Rotationally invariant convolutional channel coding with expanded signal space. Part II: Nonlinear coding," IEEE J. Select. Areas Commun., vol. SAC-2, pp. 672-686, 1984. [10] A. R. Calderbank and N. J. A. Sloane, "Four-dimensional modulation with an eight-state trellis code," AT&T Tech. J., vol. 64, pp. 1005-1018, 1985. [11] L. F. Wei, "Trellis-coded modulation with multidimensional constellations," IEEE Trans. Inform. Theory, vol. IT-33, pp. 483-501, 1987. [12] A. R. Calderbank and N. J. A. Sloane, "An eight-dimensional trellis code," Proc. IEEE, vol. 74, pp. 757-759, 1986. [13] __, "New trellis codes based on lattices and cosets," IEEE Trans. Inform. Theory, vol. IT-33, pp. 177-195, 1987. [14] G. D. Forney, Jr., and L. F. Wei, "Multidimensional signal constellations," in preparation, 1989. [15] F. Pollara, R. J. McEliece, and K. Abdel-Ghaffar, "On finite-state codes," submitted to IEEE Trans. Inform. Theory, 1987. [16] G. D. Forney, Jr., "Coset codes - Part III: Ternary codes, lattices, and trellis codes," in preparation, 1989. [17] A. LaFanchere, R. H. Deng, and D. J. Costello, Jr., "Multidimensional trellis coded phase modulation using unit-memory and par-tial-unit-memory convolutional codes," submitted to IEEE Trans. Inform. Theory, 1987. [18] G. D. Forney, Jr., "Coset codes-Part II: Binary lattices and related codes," IEEE Trans. Inform. Theory, this issue, pp. 1152-1187. [19] E. L. Cusack, "Error control codes for QAM signalling," Electron. Lett., vol. 20, pp. 62-63, 1984. [20] G. D. Forney, Jr., "Convolutional codes I: Algebraic structure," IEEE Trans. Inform. Theory, vol. IT-16, pp. 720-738, 1970. [21] G. Ungerboeck, "Trellis-coded modulation with redundant signal sets. Part II: State of the art," IEEE Commun. Mag., vol. 25, no. 2, pp. 12-21, 1987. [22] M. L. Honig, "Optimization of trellis codes with multilevel amplitude modulation with respect to an error probability criterion," IEEE Trans. Commun., vol. COM-34, pp. 821-825, 1986. [23] G. J. Pottie and D. P. Taylor, "An approach to Ungerboeck coding for rectangular signal sets," IEEE Trans. Inform. Theory, vol. IT-33, pp. 285-290, 1987. [24] D. Divsalar and M. K. Simon, "Multiple trellis-coded modulation (MTCM)," preprint, 1986. [25] D. Divsalar, M. K. Simon, and J. H. Yuen, "Trellis coding with asymmetrical modulations," IEEE Trans. Commun., vol. COM-35, pp. 130-141, 1987. [26] G. D. Forney, Jr., "Structural analysis of convolutional codes via dual codes," IEEE Trans. Inform. Theory, vol. IT-19, pp. 512-518, 1973.

[^0]: Manuscript received September 2, 1986; revised September 18, 1987. This paper was presented at the 1987 Information Theory Workshop, Bellagio, Italy, June 24, 1987.

The author is with the Codex Corporation, 7 Blue Hill River Road, Canton, MA 02021.

IEEE Log Number 8824503.

Feedback

Overall FeedbackBeta

Here are some high-level reactions to the text.

Outline

The paper proceeds in a logical sequence, moving from historical context to specific geometric definitions. First, it establishes the general framework of coset codes defined by a lattice partition $\Lambda/\Lambda'$ and a binary encoder $C$ . Second, it provides a self-contained primer on lattice theory, specifically focusing on binary lattices and their decomposition. Third, it applies this framework to classify specific lattice families (such as Barnes-Wall) and trellis codes. The final sections are intended to compare these schemes in terms of performance versus complexity.

Broadness of the central classification

The introduction states that "practically all" constructive coding techniques for band-limited channels can be modeled as coset codes. Readers generally expect such sweeping universality to be supported by explicit mappings for the major families of existing codes, such as the multidimensional TCM schemes of Wei or the Calderbank–Sloane codes. Currently, the text focuses heavily on the 4-state Ungerboeck code and Barnes-Wall lattices. It would be helpful to either include specific parameterizations for the other major families to demonstrate the framework's universality or, alternatively, to align the introductory claims with the specific subset of lattice codes fully detailed in the text.

Distinguishing synthesis from new results

The manuscript combines a synthesis of existing lattice theory (referencing Construction A, Barnes-Wall lattices, and work by Calderbank and Sloane) with a new unifying perspective. Because the exposition is so seamless, it is occasionally difficult to distinguish the synthesis of known mathematics from the novel contributions of this specific paper. A distinct subsection itemizing which definitions, structural lemmas, and complexity comparisons are original to this work would help the audience appreciate the specific theoretical advances being made here.

The predictive value of fundamental coding gain

The framework evaluates schemes primarily via the fundamental coding gain $\gamma(\mathbb{C}) = 2^{-\rho(\mathbb{C})} d_{\min}^2(\mathbb{C})$ . While the text categorizes shaping gain as "peripheral," relying solely on $\gamma$ leaves a gap between the geometric classification and practical modem performance. Readers looking for concrete engineering trade-offs might look for analytical bounds or simulations that link the $\gamma$ -ordering to error rates or capacity gaps at finite SNRs. Clarifying whether the classification is intended only for asymptotic, infinite-lattice behavior or for practical operation would resolve this ambiguity.

Precision in the proposed taxonomy

The text aims to categorize "known types of coset codes" and systematize them. However, without explicit $(\Lambda/\Lambda', C)$ descriptions for the broader set of industry-standard codes mentioned in the introduction, the work risks looking like a partial survey rather than a definitive classification. Providing precise definitions for the "generic classes" introduced later would allow experts to use the framework to locate any given practical scheme within the design space.

Self-containment relative to Part II

Several key structural arguments—particularly regarding redundancies, partition chains, and decoding complexity—rely on results cited from the "to appear" companion paper (Part II). Since the taxonomy and performance claims hinge on these lattice parameters, the current text forces the reader to accept certain premises on faith. Incorporating the essential proofs or theorems directly into this manuscript would ensure the classification is rigorous and verifiable on its own terms.

Detailed Feedback

Minimum distance argument for Class VI codes

The minimum-distance argument for Class VI codes relies on the statement that any finite-weight input sequence produces encoder outputs that are nonzero in at least three time instants, leading to a path-distance lower bound of $3 d_{\min}^2(\Lambda)$ . It is not clear how this follows from property b) of circuit $T$ ("no infinite input sequence produces a finite output sequence"), and for the simple example $T$ given in the Class V section (output $(\boldsymbol{x}_{t-1},0)$ ) this three-output property appears to fail. For that $T$ , an input impulse on a coordinate other than the first produces nonzero outputs only at two times. In that case the generic lower bound on the distance between distinct paths is only $2 d_{\min}^2(\Lambda)$ , so it is not established that path distances are strictly greater than $d_{\min}^2(\Lambda') = 2 d_{\min}^2(\Lambda)$ , nor that the error coefficient is necessarily equal to that of $\Lambda'$ . This part of the Class VI discussion would benefit from either a more restrictive, explicit condition on $T$ that guarantees the three-output property, or a revised argument that justifies the stated minimum distance and error coefficient.

Inconsistency in Decoding Complexity Calculation

N_{D} is the number of decoding operations using the trellis-based decoding algorithms of the partition \Lambda / \Lambda^{\prime} whose complexity is given in Table III, followed by a conventional Viterbi algorithm for the convolutional code, and \tilde{N}_{D}=2 N_{D} / N is the decoding complexity per two dimensions. (For each unit of time, for each of the 2^{\nu} states, the Viterbi algorithm requires 2^{k} additions and a comparison of 2^{k} numbers, or 2^{k}-1 binary comparisons, so that its complexity is \beta 2^{k+\nu}, where \beta=2-2^{-k}, and 2^{k+\nu} is the number of branches per stage of the trellis, which is the measure of complexity used by Ungerboeck [21], following Wei [11].)

The definition immediately before Table IV states that $N_D$ is the sum of the complexity of decoding the partition $\Lambda/\Lambda'$ (from Table III) and the complexity of the Viterbi algorithm, with $\tilde N_D = 2 N_D / N$ as the normalized figure. For the two-dimensional Ungerboeck codes, the reported $\tilde N_D$ values are consistent with this rule: they equal the sum of the Table III partition complexity and the Viterbi complexity, normalized as described.

For the one-dimensional Ungerboeck codes based on $Z/4Z$ , however, the tabulated $\tilde N_D$ values appear to equal the Viterbi complexity alone (i.e., $2 \beta 2^{k+\nu}/N$ with $k=1$ ), with no additional term that could be attributed to decoding the partition $Z/4Z$ . Since $Z/4Z$ is not listed in Table III, it is unclear whether its partition complexity has been deliberately neglected, treated as negligible, or handled by a different convention.

This discrepancy between the stated definition of $N_D$ and the way the 1D entries of Table IV are actually computed can make it difficult for a reader to reproduce the numbers and to understand exactly what operations are being counted. It would be helpful either to clarify explicitly that, for the one-dimensional $Z/4Z$ codes, $N_D$ intentionally includes only the Viterbi component, or to revise the table so that the 1D entries consistently incorporate the partition-decoding complexity as defined.

Clarity of argument for mod-4 lattice structure

For a further refinement, let $\Lambda_{e}$ be the set of all points in $\Lambda$ whose coordinates are all even. Then $\Lambda_{e}$ is a lattice, a sublattice of $\Lambda$ , with $\boldsymbol{4} \boldsymbol{Z}^{N}$ as a sublattice, so $\Lambda / \Lambda_{e} / \boldsymbol{4} \boldsymbol{Z}^{N}$ is a partition chain, and there is a coset decomposition of the form $\Lambda=\boldsymbol{4} \boldsymbol{Z}^{N}+\left[\Lambda_{e} / \boldsymbol{4} \boldsymbol{Z}^{N}\right]+\left[\Lambda / \Lambda_{e}\right]$ . The lattice $\Lambda_{e}$ is clearly a mod-2 lattice scaled by a factor of 2 ; consequently, $\Lambda_{e}=4 \boldsymbol{Z}^{N}+2 C$ , where $C$ is a binary ( $N, K^{\prime}$ ) code for some $K^{\prime}$ , by Lemma 3; in other words, the coset representatives $\left[\Lambda_{e} / \boldsymbol{4} \boldsymbol{Z}^{N}\right]$ may be taken as $2 C $, or$ \left{2 \sum a_{k} \boldsymbol{g}{k}\right} $, where the$ \boldsymbol{g}{k} $constitute a set of$ K^{\prime} $binary generators for the code$ C $. Thus we may take$ K^{\prime} $of the generators to be$ 2 \boldsymbol{g}_{k}, 1 \leq k \leq K^{\prime}$, and we may write

\Lambda=4 \boldsymbol{Z}^{N}+2 C+\left\{\sum a_{k} \boldsymbol{g}_{k}\right\}

In the paragraph beginning "For a further refinement, let $\Lambda_e$ be the set of all points in $\Lambda$ whose coordinates are all even," the transition from the generators $2g_k$ for $k>K'$ to the statement about a binary subcode $C'\subset C$ is quite compressed. The text asserts that the lattice generated by $\{2g_k,\,K'+1\le k\le K\}$ has elements congruent to $2c' $modulo 4 for$ c' $in an$ (N,K-K') $code$ C' $that is a subcode of$ C $, but it does not spell out how$ C' $is defined from$ C$ or why it must be nested.

Because $\Lambda_e=4\boldsymbol{Z}^N+2C$ has already been established, the intended reasoning is that each $2g_k\in\Lambda_e$ must satisfy $2g_k\equiv 2c'_k\pmod{4}$ for some $c'_k\in C$ , and the binary code $C'$ is the subcode of $C$ generated by these $c'_k$ . Making this map from $\Lambda_e$ to $C$ explicit, and then noting that the image of the sublattice $\Lambda'\subset\Lambda_e$ under this map is a subcode $C'\subset C$ , would more clearly justify the subcode relationship. Adding one or two sentences along these lines would align this real mod‑4 argument with the more explicit complex mod‑2 argument given later in Lemma 4 and make the structure of general mod‑4 lattices easier to follow.

Undefined notation for the dual of a Barnes-Wall lattice

The sentence listing special cases of the duals of the principal sublattices includes the identity $\Lambda(0, n)^{\perp}= \Lambda(n)$ . The notation $\Lambda(n)$ , with a single argument, is not defined elsewhere in the paper; the family is consistently denoted $\Lambda(r, n)$ , and Barnes–Wall lattices themselves are written as $\Lambda(0, n)$ (or with subscripts such as $\Lambda_{16}$ ). Moreover, just above, you have shown that $\Lambda(0, n)$ is self-dual, and the general dual code formula specialized to $r=0$ reproduces the original code formula for $\Lambda(0, n)$ . In this context, it appears that the right-hand side should read $\Lambda(0, n)$ , and that " $\Lambda(n)$ " is a typographical slip introducing undefined notation and an apparent contradiction with the stated self-duality. Clarifying or correcting this identity would prevent possible confusion.

Path enumeration for Class V error coefficient

The coefficient of $N_{\Lambda}^{2}$ follows from the observation that in the code trellis, starting from a given zero state and ending at some later zero state, there are $2^{k}-1$ nonzero paths of length $2,2^{k-1}-1$ nonzero paths of length 3 , and so forth, up to $2-1=1 $nonzero path of length$ k+1 $, so that the total number of nonzero paths is$ 2^{k+1}-k-2 $(this is generally true for any noncatastrophic rate-$ k /(k+1)$ encoder; see Forney [26]).

The derivation of the coefficient of $N_{\Lambda}^{2}$ in the Class V error coefficient is not very transparent from the description of the encoder in Fig. 13(d). As written, the sentence reads as though "nonzero paths" means all paths in the state diagram that start and end in the zero state with no intermediate zero states. If one models the state as the previous $k$ ‑bit input block, a straightforward counting argument then gives $(2^k-1)^{L-1}$ such input sequences of length $L$ , which does not match the claimed pattern $2^k-1,2^{k-1}-1,\dots,2-1$ for $L=2,\dots,k+1$ .

It would help if you could clarify precisely what class of rate‑ $k/(k+1)$ encoders is being considered in this counting (e.g., whether $T$ is assumed memoryless and the state is exactly $x_{t-1}$ , or whether you are invoking a particular canonical realization from [26]), and what exactly is meant by "nonzero paths" (distinct zero‑state error events in the code trellis, as opposed to arbitrary nonzero input or state sequences). A brief reminder of the relevant result from [26], or a more specific pointer into that reference, would make it easier for the reader to see why the factor $2^{k+1}-k-2$ is appropriate here.

Incorrect example for γ=3 codes in Discussion

In the quoted sentence, the 16‑state Class III code is cited as an example of a code that "achieve[s] $\gamma=3$ (4.77 dB)." This appears inconsistent with the parameters given earlier in the paper. According to Table XI, the only 16‑state Class III instance is the $N=8$ code based on the partition $E_8/2E_8$ , with normalized redundancy $\rho=2$ and $d_{\min}^2=16$ , so its fundamental coding gain is $\gamma = 2^{-2}\cdot 16 = 4$ (6.02 dB), not 3. The $N=4$ Class III code based on $D_4/2D_4$ has $d_{\min}^2=6$ and $\rho=3/2$ , giving $\gamma = 3/\sqrt{2} \approx 2.12$ (3.27 dB), but this is a 4‑state code, not 16‑state. Thus none of the tabulated Class III constructions lies at $\gamma=3$ , even though the 16‑state Class VII example in the same sentence (based on $D_8^{\perp}/RE_8$ ) does. It would be helpful to correct or clarify this part of the Discussion so that the listed examples of $\gamma=3$ codes align with the parameters in Table XI.

Definition of fundamental volume and redundancy in the Introduction

Nitpick

\gamma(\mathbb{C})=2^{-\rho(\mathbb{C})} d_{\min }^{2}(\mathbb{C})

where the normalized redundancy $\rho(\mathbb{C})$ (per two dimensions) is equal to $2 r(\mathbb{C}) / N$ .

At first the statement that the "fundamental volume $V(\mathbb{C})$ per $N$ dimensions … is equal to $2^{r(\mathbb{C})}$ where … $r(\mathbb{C})$ is equal to … $r(\Lambda)$ of the lattice $\Lambda$ " made me think this relation was being asserted in full generality, even though the notion of lattice redundancy $r(\Lambda)$ had not yet been defined. Then I understood from Sections II.E and III.B that $r(\Lambda)$ is introduced specifically for binary lattices, with $V(\Lambda)=2^{r(\Lambda)}$ under the normalization $V(\mathbb{Z}^N)=1$ , and that $V(\mathbb{C})=2^{r(\mathbb{C})}$ is a consequence of these later results in that restricted setting.

To avoid any momentary ambiguity, it might help if this introductory paragraph signaled more explicitly that the formula $V(\mathbb{C})=2^{r(\mathbb{C})}$ (and the appearance of $r(\Lambda)$ ) rely on the binary-lattice framework developed in Section II, or if a brief forward reference to the later definition of $r(\Lambda)$ were added here.

Role of algebraic structure in the coset code framework

Nitpick

At first the sentence "these code constructions rely very little on the linearity properties of the groups … on which they are based" made me wonder whether this was meant to downplay the importance of the underlying group structure itself, which is clearly required to define coset partitions. Then I understood from the continuation ("the codes so constructed are often not linear") and from the surrounding discussion that the intended contrast is between linear-code structure (in the usual coding sense) and the partition/distance properties that actually drive performance.

To make this distinction immediately transparent, it might still help to indicate more explicitly what notion of "linearity" is meant here—e.g., linearity of the resulting codes over GF(2) versus the more basic group axioms that enable coset decompositions—so that readers do not conflate the two when interpreting this paragraph.

Description of coset representatives in Sec. II.F

Nitpick

The sentence

"Thus the $2^{K}$ binary linear combinations $\{\sum a_k \boldsymbol{g}_k\}$ of the generators $\boldsymbol{g}_k$ are a system of coset representatives $[\Lambda_k / \Lambda_{k+1}]$ for the partition $\Lambda_k / \Lambda_{k+1}$ …"

appears inconsistent with the definitions just given. Each intermediate quotient $\Lambda_k / \Lambda_{k+1}$ is explicitly two‑way, so any system of coset representatives for such a partition should contain exactly two vectors, whereas the set $\{\sum a_k \boldsymbol{g}_k\}$ has $2^K$ elements.

Reading Lemma 2 immediately below makes the intended structure clear: for each $k$ , the pair $\{a_k \boldsymbol{g}_k, a_k\in\{0,1\}\}$ serves as representatives for the two-way partition $\Lambda_k / \Lambda_{k+1}$ , and the $2^K$ combinations $\{\sum a_k \boldsymbol{g}_k\}$ form a system of representatives for the overall partition $\Lambda / \Lambda'$ . To avoid the cardinality mismatch, it would be helpful to correct this sentence so that the role of the $2^K$ combinations is associated with the full partition $\Lambda / \Lambda'$ rather than with a single two-way quotient $\Lambda_k / \Lambda_{k+1}$ .

#10

Notation for a lattice in Table III

Nitpick

TABLE III Useful Lattice Partitions

Λ	$\Lambda^{\prime}$	$2 N$	$\left\\|\Lambda / \Lambda^{\prime}\right\\|$	$\mu$	$\kappa$	$\rho$	$d_{\text {min }}^{2}(\Lambda)$	$d_{\text {min }}^{2}\left(\Lambda^{\prime}\right)$	$\tilde{N}_{D}$	$\tilde{N}_{D} /\left\\|\Lambda / \Lambda^{\prime}\right\\|$
...
$D_{8}^{1}$	$R E_{8}$	8	32	3	1	3/4	2	8	88	2.75
...

The lattice labeled $D_8^1$ in Table III does not seem to be defined anywhere in the paper or in Tables I–II. However, all the numerical parameters in that row (index 32, depth 3, $\kappa=1$ , $\rho=3/4$ , $d_{\min}^2(\Lambda)=2$ , $d_{\min}^2(\Lambda')=8$ ) match exactly what one obtains for the partition $D_8^{\perp}/R E_8$ , where $D_8^{\perp}$ is the dual of the checkerboard lattice $D_8$ as defined in Section II‑H. It would be helpful to correct the notation in Table III from $D_8^1$ to $D_8^{\perp}$ (or otherwise define $D_8^1$ explicitly) so that the table is consistent with the earlier dual‑lattice notation and with the subsequent discussion of the $D_8^{\perp}/R E_8$ partition in Section V.

#11

Inconsistent encoder description for Class II codes

Nitpick

Let $\Lambda / \Lambda^{\prime}$ again be a $2^{2 k}$ -way lattice partition with $d_{\text {min }}^{2}\left(\Lambda^{\prime}\right)=2 d_{\text {min }}^{2}(\Lambda)$ . Let $C$ be a rate- $1 / 2,2^{2 k}$ -state convolutional encoder as shown in Fig. 13(b), with $k$ information bits entering in each time unit, two units of memory, and $2 k $output bits,$ k $representing the inputs one time unit earlier, and \$ 2 k $representing the mod- 2 sum of the current inputs with those two time units earlier, which together select one of the$ 2^{2 k} $cosets of$ \Lambda^{\prime}$.

Statement " $2k$ output bits, $k$ representing the inputs one time unit earlier, and $2k$ representing the mod‑2 sum of the current inputs with those two time units earlier" seemed to imply a total of $3k$ bits rather than $2k$ output bits. After checking Fig. 13(b) and the surrounding discussion, it is clear that there are two $k$ ‑bit output groups: one equal to the previous input vector and the other equal to the mod‑2 sum of the current input with that two time units earlier. It would help to adjust the wording so that it is explicit that the $2k$ outputs consist of these two $k$ ‑bit groups, and thus avoid the impression that an extra $2k$ ‑bit component is present.

#12

Parameter mismatch for lattice X_32 in Discussion

Nitpick

In the Discussion paragraph comparing $X_{24}$ and $X_{32}$ to Wei's codes, the normalized redundancy is stated as $\rho = 1/2$ for both lattices. According to Table II and the code formulas, $X_{24}$ has $\rho = 1/2$ but $X_{32}$ has $\rho = 3/8$ , while both have $\mu = 2$ and $d_{\min}^2 = 4$ . It would be good to adjust this sentence so that the value of $\rho$ quoted for $X_{32}$ is consistent with Table II.

#13

Decoding complexity as a geometric parameter

Nitpick

Statement that "the decoding complexity … [is a] purely geometric parameter" initially made me think you were putting complexity on exactly the same footing as quantities like $d_{\min}^2$ and $V(\mathbb{C})$ , derived directly from the Euclidean embedding of the code. Then I understood from Sections I‑D, III, and V that you are working with a specific complexity measure tied to canonical trellis/partition ML decoders, so that once $C$ and $\Lambda/\Lambda'$ are fixed (and $C$ is taken in minimal form), the corresponding decoding complexity is indeed an intrinsic parameter of the code, just like $d_{\min}^2$ and $N_0$ .

Because "purely geometric" is used earlier in a more narrow sense (distances, volumes, densities), it might be helpful in the abstract to slightly soften or qualify this wording when referring to decoding complexity, e.g. by indicating that complexity is an intrinsic algorithmic parameter determined by $(C,\Lambda/\Lambda')$ rather than a geometric invariant in the strict sense. This would reduce the chance that readers interpret the phrase more literally than you intend.

#14

Mismatch in coset decomposition formulas

Nitpick

\Lambda=\Lambda^{\prime \prime}+\left[\Lambda^{\prime} / \Lambda^{\prime \prime}\right]+\left[\Lambda / \Lambda^{\prime}\right]

m=a_{0}+2 a_{1}+4 a_{2}+\cdots

\boldsymbol{Z}=[\boldsymbol{Z} / 2 \boldsymbol{Z}]+[2 \boldsymbol{Z} / 4 \boldsymbol{Z}]+[4 \boldsymbol{Z} / 8 \boldsymbol{Z}]+\cdots

At first the multi-term formula $\Lambda = \Lambda'' + [\Lambda'/\Lambda''] + [\Lambda/\Lambda']$ made me expect that every example of a partition chain decomposition would show an explicit "base lattice" term. Then, in the integer example for the chain $\boldsymbol{Z}/2\boldsymbol{Z}/4\boldsymbol{Z}/\cdots$ , the expression

\boldsymbol{Z} = [\boldsymbol{Z}/2\boldsymbol{Z}] + [2\boldsymbol{Z}/4\boldsymbol{Z}] + [4\boldsymbol{Z}/8\boldsymbol{Z}] + \cdots

involves only coset-representative sets, and no obvious analogue of the $\Lambda''$ term. After reading more carefully, it is clear that here the chain is infinite and its intersection is the trivial lattice $\{0\}$ , so the "finest" sublattice term is just $\{0\}$ and can be omitted without effect; this reconciles the two views. Still, it might help some readers if you briefly noted in the text that, for the integer chain, the limiting sublattice is $\{0\}$ , which is why the binary expansion can be written purely as a sum of coset representatives.

#15

Description of the Ungerboeck encoder in Sec. IV.A

Nitpick

Statement "Contrary to convention, the encoder is shown in coset code form, using the partition (2,2,1)/(2,1,2)/(2,0,∞) of binary codes of length 2" initially made me wonder whether the convolutional encoder $C$ itself was being characterized as a coset code. However, the following sentences make clear that what is being constructed is the labeling map $c(a)$ : $g_0$ and $g_1$ are coset representatives for the short block-code partition chain, and $c(a)=a_0 g_0 + a_1 g_1$ maps the encoder output bits to coset representatives, while $C$ remains a conventional four-state rate-1/2 convolutional code as in Fig. 10. To avoid any momentary ambiguity about what is meant by "encoder" here, it might help to say explicitly that it is the labeling (or overall coset-code encoder) that is being shown in coset code form, rather than the internal convolutional encoder $C$ itself.

#16

Incorrect formula for lattice coding gain

Nitpick

The sentence "Relative to $\gamma(\Lambda)=2^{-\rho(\Lambda)} d_{\min }^{2}(\Lambda')$ …" appears to contain a notational slip. Earlier, and throughout the paper, the fundamental coding gain of a binary lattice is defined as $\gamma(\Lambda)=2^{-\rho(\Lambda)} d_{\min }^{2}(\Lambda)$ , and this is the expression that is used in the displayed formulas immediately before and after this sentence. To keep the narrative statement consistent with those equations and with the general definition, the factor inside $d_{\min}^2(\cdot)$ here should be $\Lambda$ , not $\Lambda'$ . As written, this single prime does not alter any subsequent derivation (since the correct form is used in the formulas), but changing it would remove a small potential source of confusion.

#17

Incomplete premise in Lemma 6

Nitpick

At first the premise of Lemma 6, which only assumes that $\Lambda'$ is a mod‑2 lattice, made me think that $\Lambda$ might be allowed to be an arbitrary lattice, so that the conclusion about "sequences of integer $N$ ‑tuples" and the appeal to Lemma 5 would require stronger local hypotheses. Then I understood from the beginning of Section IV that, throughout this section, both $\Lambda$ and $\Lambda'$ are assumed to be binary lattices (hence integer lattices), and that the extra requirement in Lemma 6 is only that $\Lambda'$ be mod‑2 so that the reduction to the partition $\boldsymbol{Z}^N/2\boldsymbol{Z}^N$ applies.

Given that global setup, the logic and conclusion of Lemma 6 are sound. However, because the sentence just before the lemma speaks of "the important case where $\Lambda/\Lambda'$ is a partition of mod‑2 binary lattices" while the lemma statement itself mentions only $\Lambda'$ being mod‑2, it may be helpful to align the wording. Explicitly reminding the reader in or near the lemma that $\Lambda$ is a binary lattice (as per the section‑wide assumption) would make the statement more self‑contained and avoid any momentary doubt about whether further conditions on $\Lambda$ are intended.

#18

Parameter comparison between Wei code and X24 lattice

Nitpick

At first the sentence "note that it has the same parameters as the lattice $X_{24}$ " made me think that all listed quantities for the 16-state four-dimensional Wei code and $X_{24}$ were being claimed equal, including dimension, number of states, and error coefficient, which they clearly are not. Then I understood from the following parenthetical about $\Lambda_{16}$ , and by checking Tables II and VII, that "same parameters" here is meant in the more specific sense of the geometric parameters $(\mu,\rho,d_{\min}^2,\gamma)$ , for which the Wei code and $X_{24}$ do coincide. Because other readers may also initially read "parameters" more broadly, it could help to say explicitly which parameters are intended in this comparison, to avoid any momentary confusion when cross-checking the tables.

#19

Unclear reference to "last three codes"

Nitpick

At first the phrase "The last three codes appear to be equivalent to the aforementioned translation of Wei's $\boldsymbol{Z}^{8} / E_{8}$ codes" made me think it was referring to the three code families just enumerated in the parenthesis (Ungerboeck-type 1D/2D codes, the GCS-type $\boldsymbol{Z}^4/2\boldsymbol{Z}^4$ code, and the $A_2$ codes), which would be inconsistent dimensionally with the 8-dimensional Wei codes. Then I understood from Tables VII and IX, together with the earlier remark about translating Wei's $\boldsymbol{Z}^8/E_8$ codes to $E_8/R E_8$ , that "the last three codes" is intended to denote the last three 8-dimensional $E_8/R E_8$ entries in Table IX (those with $2^\nu = 16, 32, 64), which are indeed equivalent to those translations. It might still help some readers if the intended antecedent of "the last three codes" were identified more explicitly.