1 Introduction

Consider a generic complex-valued finite-dimensional inverse problem scenario in which the forward model is represented as

$$\begin{aligned} {\mathbf {b}} = {\mathcal {A}}({\mathbf {x}}) + {\mathbf {n}}, \end{aligned}$$
(1)

where \({\mathbf {b}} \in {\mathbb {C}}^M\) represents the measured data, \({\mathcal {A}}(\cdot ): {\mathbb {C}}^N \rightarrow {\mathbb {C}}^M\) is the measurement operator, \({\mathbf {n}} \in {\mathbb {C}}^M\) represents noise, and \({\mathbf {x}} \in {\mathbb {C}}^N\) represents the unknown signal that we wish to estimate based on knowledge of \({\mathbf {b}}\) and \({\mathcal {A}}(\cdot )\). A common approach to solving this inverse problem is to find a least-squares solution

$$\begin{aligned} \hat{{\mathbf {x}}} = \mathop {{{\,\mathrm{\arg \min }\,}}}\limits _{{\mathbf {x}} \in {\mathbb {C}}^N} \Vert {\mathcal {A}}({\mathbf {x}}) - {\mathbf {b}}\Vert _2^2, \end{aligned}$$
(2)

where \(\Vert \cdot \Vert _2\) denotes the standard \(\ell _2\)-norm. This choice of formulation can be justified in multiple ways; for example, it corresponds to the maximum likelihood estimator when the noise vector \({\mathbf {n}}\) is independent and identically distributed (i.i.d.) Gaussian (Kay 1993). Even for more complicated noise statistics that follow, e.g., the Poisson, Rician, or noncentral chi distributions, there exist iterative methods that obtain the maximum likelihood estimator by solving a sequence of least-squares objective functions (Erdogan and Fessler 1999; Fessler and Erdogan 1998; Varadarajan and Haldar 2015). Least-squares formulations are also popular because the optimization problem is frequently easy to solve. For example, in the case where \({\mathcal {A}}(\cdot )\) is a linear operator (i.e., \({\mathcal {A}}(\cdot )\) can be represented in an equivalent matrix form as \({\mathcal {A}}({\mathbf {x}}) = {\mathbf {A}}{\mathbf {x}}\) for some matrix \({\mathbf {A}}\in {\mathbb {C}}^{M \times N}\)) with a trivial nullspace, the solution to Eq. (2) has the analytic closed-form expression (Luenberger 1969)

$$\begin{aligned} \hat{{\mathbf {x}}} = ({\mathbf {A}}^H{\mathbf {A}})^{-1}{\mathbf {A}}^H {\mathbf {b}}, \end{aligned}$$
(3)

where \(^H\) denotes the conjugate-transpose operation. In large-scale problems where N is very large, the matrix inversion in Eq. (3) may be computationally intractable, although there exist a variety of simple iterative algorithms that are guaranteed to converge to a globally-optimal solution, including Landweber iteration (Landweber 1951), the conjugate gradient (CG) algorithm (Hestenes and Stiefel 1952), and LSQR (Paige and Saunders 1982).
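For illustration, the following minimal Python sketch (with arbitrary illustrative sizes and names) verifies that the closed-form expression in Eq. (3) matches a generic least-squares solver for a random complex matrix with a trivial nullspace.

```python
import numpy as np

# Illustrative check: the closed form of Eq. (3) matches a generic least-squares solver
# when A is a complex matrix with a trivial nullspace.
rng = np.random.default_rng(0)
M, N = 8, 3
A = rng.standard_normal((M, N)) + 1j * rng.standard_normal((M, N))
b = rng.standard_normal(M) + 1j * rng.standard_normal(M)

x_closed = np.linalg.solve(A.conj().T @ A, A.conj().T @ b)  # Eq. (3)
x_lstsq = np.linalg.lstsq(A, b, rcond=None)[0]              # generic solver
assert np.allclose(x_closed, x_lstsq)
```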

Instead of assuming linearity, we focus in this work on solving least-squares problems in the scenario where \({\mathcal {A}}(\cdot )\) is nonlinear, but can be represented as the summation and/or composition of some operators that are linear and some operators that are antilinear. Such nonlinear operators have sometimes been termed real-linear operators in mathematical physics (Huhtanen and Ruotsalainen 2011). Important common examples of operators that possess this kind of nonlinear structure include the complex-conjugation operator

$$\begin{aligned} {\mathcal {A}}({\mathbf {x}}) = \overline{{\mathbf {x}}}, \end{aligned}$$
(4)

the operator that takes the real part of a complex vector

$$\begin{aligned} {\mathcal {A}}({\mathbf {x}}) = \mathrm {real}({\mathbf {x}}) \triangleq \frac{1}{2}{\mathbf {x}} + \frac{1}{2}\overline{{\mathbf {x}}}, \end{aligned}$$
(5)

and the operator that takes the imaginary part of a complex vector

$$\begin{aligned} {\mathcal {A}}({\mathbf {x}}) = \mathrm {imag}({\mathbf {x}}) \triangleq \frac{1}{2i}{\mathbf {x}} - \frac{1}{2i}\overline{{\mathbf {x}}}. \end{aligned}$$
(6)

Even though the descriptions we present in this paper are generally applicable to arbitrary real-linear operators, we were initially motivated to consider such operators because of specific applications in magnetic resonance imaging (MRI) reconstruction. In particular, MRI images are complex-valued, and real-linear operators have previously been used to incorporate prior information about the image phase characteristics into the image reconstruction process, which helps to regularize/stabilize the solution when the inverse problem is ill posed. For example, there is a line of research within MRI that poses phase-constrained image reconstruction as either (Bydder and Robson 2005; Willig-Onwuachi et al. 2005; Lew et al. 2007; Hoge et al. 2007; Haldar et al. 2013; Blaimer et al. 2016; Markovsky 2011)

$$\begin{aligned} \begin{aligned} \hat{{\mathbf {x}}}&= \mathop {{{\,\mathrm{\arg \min }\,}}}\limits _{{\mathbf {x}} \in {\mathbb {C}}^N} \Vert {\mathbf {A}}{\mathbf {x}} - {\mathbf {b}}\Vert _2^2 + \lambda \Vert \mathrm {imag}({\mathbf {B}}{\mathbf {x}})\Vert _2^2\\&= \mathop {{{\,\mathrm{\arg \min }\,}}}\limits _{{\mathbf {x}} \in {\mathbb {C}}^N} \left\| \begin{bmatrix} {\mathbf {A}}{\mathbf {x}} \\ \sqrt{\lambda } \cdot \mathrm {imag}({\mathbf {B}}{\mathbf {x}}) \end{bmatrix} - \begin{bmatrix} {\mathbf {b}} \\ {{\mathbf {0}}} \end{bmatrix} \right\| _2^2, \end{aligned} \end{aligned}$$
(7)

or (Bydder 2010)

$$\begin{aligned} \begin{aligned} \hat{{\mathbf {x}}}&= \mathop {{{\,\mathrm{\arg \min }\,}}}\limits _{{\mathbf {x}} \in {\mathbb {R}}^N} \Vert {\mathbf {A}}{\mathbf {x}} - {\mathbf {b}}\Vert _2^2 \\&= \mathop {{{\,\mathrm{\arg \min }\,}}}\limits _{{\mathbf {x}} \in {\mathbb {C}}^N} \Vert {\mathbf {A}}\mathrm {real}({\mathbf {x}}) - {\mathbf {b}}\Vert _2^2 + \Vert \mathrm {imag}({\mathbf {x}})\Vert _2^2\\&= \mathop {{{\,\mathrm{\arg \min }\,}}}\limits _{{\mathbf {x}} \in {\mathbb {C}}^N} \left\| \begin{bmatrix} {\mathbf {A}}\mathrm {real}({\mathbf {x}}) \\ \mathrm {imag}({\mathbf {x}}) \end{bmatrix} - \begin{bmatrix} {\mathbf {b}} \\ {{\mathbf {0}}} \end{bmatrix} \right\| _2^2. \end{aligned} \end{aligned}$$
(8)

In the case of Eq. (7), \(\lambda \in {\mathbb {R}}\) is a positive regularization parameter and the matrix \({\mathbf {B}}\) embeds prior information about the image phase such that the regularization encourages \({\mathbf {B}}{\mathbf {x}}\) to be real-valued. In the case of Eq. (8), the phase information is embedded in the matrix \({\mathbf {A}}\), such that the vector \({\mathbf {x}}\) is strictly forced to be real-valued. Another line of research within MRI instead imposes phase constraints by leveraging linear predictability and the conjugate-symmetry characteristics of the Fourier transform, leading to an inverse problem formulation that can take the general form (Haldar 2014; Haldar and Setsompop 2020; Haldar 2015; Kim and Haldar 2018)

$$\begin{aligned} \begin{aligned} \hat{{\mathbf {x}}}&= \mathop {{{\,\mathrm{\arg \min }\,}}}\limits _{{\mathbf {x}} \in {\mathbb {C}}^N} \Vert {\mathbf {A}}{\mathbf {x}} - {\mathbf {b}}\Vert _2^2 + \lambda \Vert {\mathbf {C}}{\mathbf {x}} - {\mathbf {D}} \overline{({\mathbf {E}} {\mathbf {x}})}\Vert _2^2\\&= \mathop {{{\,\mathrm{\arg \min }\,}}}\limits _{{\mathbf {x}} \in {\mathbb {C}}^N} \left\| \begin{bmatrix} {\mathbf {A}}{\mathbf {x}} \\ \sqrt{\lambda }{\mathbf {C}}{\mathbf {x}} - \sqrt{\lambda }{\mathbf {D}} \overline{({\mathbf {E}} {\mathbf {x}})} \end{bmatrix} - \begin{bmatrix} {\mathbf {b}} \\ {{\mathbf {0}}} \end{bmatrix} \right\| _2^2, \end{aligned} \end{aligned}$$
(9)

for appropriate matrices \({\mathbf {C}}\), \({\mathbf {D}}\), and \({\mathbf {E}}\).

Although these are nonlinear least-squares problems because the operators involved are nonlinear, previous work has benefitted from the fact that this kind of inverse problem can be transformed into an equivalent higher-dimensional real-valued linear least-squares problem (Bydder and Robson 2005; Willig-Onwuachi et al. 2005; Lew et al. 2007; Hoge et al. 2007; Haldar et al. 2013; Blaimer et al. 2016; Haldar 2014; Haldar and Setsompop 2020; Haldar 2015; Kim and Haldar 2018). Specifically, this can be done by replacing all complex-valued quantities with real-valued quantities, e.g., separating \({\mathbf {x}} \in {\mathbb {C}}^N\) into its real and imaginary components, and treating this as an inverse problem in \({\mathbb {R}}^{2N}\) rather than the original space \({\mathbb {C}}^N\). While this real-valued transformation of the problem is effective and enables the use of standard linear least-squares solution methods, it can also cause computational inefficiencies and can sometimes be difficult to implement when the operators involved have complicated structure.

In this work, we describe theory that enables provably-convergent linear least-squares iterative algorithms to be applied to this nonlinear least-squares problem setting, without requiring a real-valued transformation of the original complex-valued vectors and operators. This can enable both improved computation speed and simplified algorithm implementations.

2 Background

2.1 Linear, antilinear, and real-linear operators

In this section, we briefly summarize some definitions and properties of linear and antilinear operators, with simplifications corresponding to our finite-dimensional problem context. Readers interested in a more detailed and more general treatment are referred to Refs. (Rudin 1991; Huhtanen and Ruotsalainen 2011).

Definition 1

(Linear Operator) An operator \({\mathcal {F}}(\cdot ): {\mathbb {C}}^N \rightarrow {\mathbb {C}}^M\) is said to be linear (or complex-linear) if it satisfies both additivity

$$\begin{aligned} {\mathcal {F}}({\mathbf {x}} + {\mathbf {y}}) = {\mathcal {F}}({\mathbf {x}}) + {\mathcal {F}}({\mathbf {y}}) \text { for } \forall {\mathbf {x}},{\mathbf {y}} \in {\mathbb {C}}^N \end{aligned}$$
(10)

and homogeneity

$$\begin{aligned} {\mathcal {F}}(\alpha {\mathbf {x}} ) = \alpha {\mathcal {F}}({\mathbf {x}}) \text { for } \forall {\mathbf {x}} \in {\mathbb {C}}^N, \forall \alpha \in {\mathbb {C}}. \end{aligned}$$
(11)

Property 1

For any linear operator \({\mathcal {F}}(\cdot ): {\mathbb {C}}^N \rightarrow {\mathbb {C}}^M\), there is a unique matrix \({\mathbf {F}} \in {\mathbb {C}}^{M \times N}\) such that \({\mathcal {F}}({\mathbf {x}}) = {\mathbf {F}}{\mathbf {x}}\) for \(\forall {\mathbf {x}} \in {\mathbb {C}}^N\).

Definition 2

(Antilinear Operator) An operator \({\mathcal {G}}(\cdot ): {\mathbb {C}}^N \rightarrow {\mathbb {C}}^M\) is said to be antilinear (or conjugate-linear) if it satisfies both additivity

$$\begin{aligned} {\mathcal {G}}({\mathbf {x}} + {\mathbf {y}}) = {\mathcal {G}}({\mathbf {x}}) + {\mathcal {G}}({\mathbf {y}}) \text { for } \forall {\mathbf {x}},{\mathbf {y}} \in {\mathbb {C}}^N \end{aligned}$$
(12)

and conjugate homogeneity

$$\begin{aligned} {\mathcal {G}}(\alpha {\mathbf {x}} ) = \overline{\alpha } {\mathcal {G}}({\mathbf {x}})\text { for } \forall {\mathbf {x}} \in {\mathbb {C}}^N, \forall \alpha \in {\mathbb {C}}. \end{aligned}$$
(13)

Property 2

For any antilinear operator \({\mathcal {G}}(\cdot ): {\mathbb {C}}^N \rightarrow {\mathbb {C}}^M\), there is a unique matrix \({\mathbf {G}} \in {\mathbb {C}}^{M \times N}\) such that \({\mathcal {G}}({\mathbf {x}}) = \overline{({\mathbf {G}}{\mathbf {x}})}\) for \(\forall {\mathbf {x}} \in {\mathbb {C}}^N\).

Note that by taking the matrix \({\mathbf {G}}\) as the identity matrix, we observe that applying complex conjugation \(\overline{{\mathbf {x}}}\) is an antilinear operation on the vector \({\mathbf {x}}\).

Definition 3

(Real-Linear Operator) An operator \({\mathcal {A}}(\cdot ): {\mathbb {C}}^N \rightarrow {\mathbb {C}}^M\) is said to be real-linear if it satisfies both additivity

$$\begin{aligned} {\mathcal {A}}({\mathbf {x}} + {\mathbf {y}}) = {\mathcal {A}}({\mathbf {x}}) + {\mathcal {A}}({\mathbf {y}}) \text { for } \forall {\mathbf {x}},{\mathbf {y}} \in {\mathbb {C}}^N \end{aligned}$$
(14)

and homogeneity with respect to real-valued scalars

$$\begin{aligned} {\mathcal {A}}(\alpha {\mathbf {x}} ) = \alpha {\mathcal {A}}({\mathbf {x}}) \text { for } \forall {\mathbf {x}} \in {\mathbb {C}}^N, \forall \alpha \in {\mathbb {R}}. \end{aligned}$$
(15)

Real-linearity is a generalization of both linearity and antilinearity, as can be seen from the following property.

Property 3

Every real-linear operator \({\mathcal {A}}(\cdot ): {\mathbb {C}}^N \rightarrow {\mathbb {C}}^M\) can be uniquely decomposed as the sum of a linear operator and an antilinear operator. In particular, \({\mathcal {A}}({\mathbf {x}}) = {\mathcal {F}}({\mathbf {x}}) + {{\mathcal {G}}}({\mathbf {x}})\) for \(\forall {\mathbf {x}} \in {\mathbb {C}}^N\), where \({\mathcal {F}}(\cdot ): {\mathbb {C}}^N\rightarrow {\mathbb {C}}^M\) is the linear operator defined by

$$\begin{aligned} {\mathcal {F}}({\mathbf {x}}) \triangleq \frac{1}{2} {\mathcal {A}}({\mathbf {x}}) - \frac{i}{2} {\mathcal {A}}(i{\mathbf {x}}) \end{aligned}$$
(16)

and \({\mathcal {G}}(\cdot ): {\mathbb {C}}^N\rightarrow {\mathbb {C}}^M\) is the antilinear operator defined by

$$\begin{aligned} {\mathcal {G}}({\mathbf {x}}) \triangleq \frac{1}{2} {\mathcal {A}}({\mathbf {x}}) + \frac{i}{2} {\mathcal {A}}(i{\mathbf {x}}). \end{aligned}$$
(17)

Property 4

For any real-linear operator \({\mathcal {A}}(\cdot ): {\mathbb {C}}^N \rightarrow {\mathbb {C}}^M\), there are unique matrices \({\mathbf {F}},{\mathbf {G}} \in {\mathbb {C}}^{M \times N}\) such that \({\mathcal {A}}({\mathbf {x}}) = {\mathbf {F}}{\mathbf {x}} + \overline{({\mathbf {G}}{\mathbf {x}})}\) for \(\forall {\mathbf {x}} \in {\mathbb {C}}^N\).

Notably, both the \(\mathrm {real}(\cdot )\) and \(\mathrm {imag}(\cdot )\) operators from Eqs. (5) and (6) are observed to have real-linear form.
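As a quick numerical illustration of Property 3, the following Python sketch (using assumed notation) splits the \(\mathrm {imag}(\cdot )\) operator into its linear and antilinear parts via Eqs. (16) and (17), and checks the corresponding homogeneity properties.

```python
import numpy as np

# Splitting the real-linear operator imag(.) into its linear part F (Eq. (16)) and
# antilinear part G (Eq. (17)), then checking the decomposition numerically.
rng = np.random.default_rng(0)

def A(x):                       # real-linear example: elementwise imaginary part
    return x.imag + 0j

def F(x):                       # linear part, Eq. (16)
    return 0.5 * A(x) - 0.5j * A(1j * x)

def G(x):                       # antilinear part, Eq. (17)
    return 0.5 * A(x) + 0.5j * A(1j * x)

x = rng.standard_normal(5) + 1j * rng.standard_normal(5)
alpha = 0.7 - 1.3j

assert np.allclose(A(x), F(x) + G(x))                    # Property 3: decomposition
assert np.allclose(F(alpha * x), alpha * F(x))           # F is complex-linear
assert np.allclose(G(alpha * x), np.conj(alpha) * G(x))  # G is conjugate-linear
```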

Property 5

For any two real-linear operators \({\mathcal {A}}_1(\cdot ): {\mathbb {C}}^N \rightarrow {\mathbb {C}}^M\) and \({\mathcal {A}}_2(\cdot ): {\mathbb {C}}^N \rightarrow {\mathbb {C}}^M\), their sum \({\mathcal {A}}_1(\cdot ) + {\mathcal {A}}_2(\cdot )\) is also a real-linear operator.

Property 6

For any two real-linear operators \({\mathcal {A}}_1(\cdot ): {\mathbb {C}}^N \rightarrow {\mathbb {C}}^P\) and \({\mathcal {A}}_2(\cdot ): {\mathbb {C}}^P \rightarrow {\mathbb {C}}^M\), their composition \(({\mathcal {A}}_2 \circ {\mathcal {A}}_1)(\cdot ): {\mathbb {C}}^N \rightarrow {\mathbb {C}}^M\), defined by \(({\mathcal {A}}_2 \circ {\mathcal {A}}_1)({\mathbf {x}}) \triangleq {\mathcal {A}}_2({\mathcal {A}}_1({\mathbf {x}}))\), is also a real-linear operator.

As can be seen, any operator that can be represented as the summation and/or composition of some operators that are linear and some operators that are antilinear can be viewed as a real-linear operator. As a result, the scenarios of interest in this paper all involve real-linear operators, and the remainder of this paper will assume that \({\mathcal {A}}(\cdot )\) obeys real-linearity, and has been decomposed in matrix form as \({\mathcal {A}}({\mathbf {x}}) = {\mathbf {F}}{\mathbf {x}} + \overline{({\mathbf {G}}{\mathbf {x}})}\).

2.2 Real-valued transformation of complex-valued least squares

Assuming \({\mathcal {A}}(\cdot )\) is real-linear as described in the previous subsection, Eq. (2) can be rewritten as

$$\begin{aligned} \hat{{\mathbf {x}}} = \mathop {{{\,\mathrm{\arg \min }\,}}}\limits _{{\mathbf {x}} \in {\mathbb {C}}^N} \Vert {\mathbf {F}}{\mathbf {x}} + \overline{({\mathbf {G}}{\mathbf {x}})} - {\mathbf {b}} \Vert _2^2, \end{aligned}$$
(18)

which is a nonlinear least squares problem. However, as stated in the introduction, previous work (Bydder and Robson 2005; Willig-Onwuachi et al. 2005; Lew et al. 2007; Hoge et al. 2007; Haldar et al. 2013; Blaimer et al. 2016; Haldar 2014; Haldar and Setsompop 2020; Haldar 2015; Kim and Haldar 2018) has transformed this problem into the form of a conventional linear least-squares problem by treating the variable \({\mathbf {x}}\) as an element of \({\mathbb {R}}^{2N}\) instead of \({\mathbb {C}}^N\). This was achieved by rewriting \({\mathbf {x}} \in {\mathbb {C}}^N\) as \({\mathbf {x}} = {\mathbf {x}}_r + i {\mathbf {x}}_i\), where the real-valued vectors \({\mathbf {x}}_r, {\mathbf {x}}_i \in {\mathbb {R}}^N\) represent the real and imaginary components of \({\mathbf {x}}\). This allows us to equivalently rewrite the solution to Eq. (18) as \(\hat{{\mathbf {x}}} = \hat{{\mathbf {x}}}_r + i \hat{{\mathbf {x}}}_i\), with

$$\begin{aligned} \begin{aligned} \{\hat{{\mathbf {x}}}_r,\hat{{\mathbf {x}}}_i\}&= \mathop {{{\,\mathrm{\arg \min }\,}}}\limits _{{\mathbf {x}}_r,{\mathbf {x}}_i \in {\mathbb {R}}^N}\Vert {\mathbf {F}}{\mathbf {x}}_r + i{\mathbf {F}}{\mathbf {x}}_i + \overline{{\mathbf {G}}} {\mathbf {x}}_r -i \overline{{\mathbf {G}}} {\mathbf {x}}_i - {\mathbf {b}} \Vert _2^2\\&= \mathop {{{\,\mathrm{\arg \min }\,}}}\limits _{{\mathbf {x}}_r,{\mathbf {x}}_i \in {\mathbb {R}}^N}\left\| \begin{bmatrix} \mathrm {real}({\mathbf {F}}{\mathbf {x}}_r + i{\mathbf {F}}{\mathbf {x}}_i + \overline{{\mathbf {G}}} {\mathbf {x}}_r -i \overline{{\mathbf {G}}} {\mathbf {x}}_i - {\mathbf {b}}) \\ \mathrm {imag}({\mathbf {F}}{\mathbf {x}}_r + i{\mathbf {F}}{\mathbf {x}}_i + \overline{{\mathbf {G}}} {\mathbf {x}}_r -i \overline{{\mathbf {G}}} {\mathbf {x}}_i - {\mathbf {b}}) \end{bmatrix} \right\| _2^2\\&= \mathop {{{\,\mathrm{\arg \min }\,}}}\limits _{\tilde{{\mathbf {x}}} \in {\mathbb {R}}^{2N}} \left\| \tilde{{\mathbf {A}}} \tilde{{\mathbf {x}}} - \tilde{{\mathbf {b}}} \right\| _2^2, \end{aligned} \end{aligned}$$
(19)

where

$$\begin{aligned}&\tilde{{\mathbf {x}}} \triangleq \begin{bmatrix} {\mathbf {x}}_r \\ {\mathbf {x}}_i \end{bmatrix} \in {\mathbb {R}}^{2N}, \end{aligned}$$
(20)
$$\begin{aligned}&\tilde{{\mathbf {A}}} \triangleq \begin{bmatrix} \mathrm {real}({\mathbf {F}}) + \mathrm {real}({\mathbf {G}}) &{} - \mathrm {imag}({\mathbf {F}}) - \mathrm {imag}({\mathbf {G}}) \\ \mathrm {imag}({\mathbf {F}}) - \mathrm {imag}({\mathbf {G}}) &{} \mathrm {real}({\mathbf {F}}) - \mathrm {real}({\mathbf {G}}) \end{bmatrix} \in {\mathbb {R}}^{2M \times 2N}, \end{aligned}$$
(21)

and

$$\begin{aligned} \tilde{{\mathbf {b}}} \triangleq \begin{bmatrix} \mathrm {real}({\mathbf {b}}) \\ \mathrm {imag}({\mathbf {b}}) \end{bmatrix} \in {\mathbb {R}}^{2M}. \end{aligned}$$
(22)
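For concreteness, the following Python sketch (with illustrative random matrices) constructs \(\tilde{{\mathbf {A}}}\) from \({\mathbf {F}}\) and \({\mathbf {G}}\) according to Eq. (21), and verifies that multiplying \(\tilde{{\mathbf {A}}}\) with the stacked real and imaginary parts of \({\mathbf {x}}\) reproduces \({\mathcal {A}}({\mathbf {x}})\).

```python
import numpy as np

# Forming the real-valued matrix of Eq. (21) for a random real-linear operator
# A(x) = F x + conj(G x), and checking that it reproduces A(x).
rng = np.random.default_rng(1)
M, N = 4, 3
F = rng.standard_normal((M, N)) + 1j * rng.standard_normal((M, N))
G = rng.standard_normal((M, N)) + 1j * rng.standard_normal((M, N))

A_tilde = np.block([
    [F.real + G.real, -F.imag - G.imag],
    [F.imag - G.imag,  F.real - G.real],
])                                                   # Eq. (21), shape (2M, 2N)

x = rng.standard_normal(N) + 1j * rng.standard_normal(N)
Ax = F @ x + np.conj(G @ x)                          # complex-valued forward model

x_tilde = np.concatenate([x.real, x.imag])           # Eq. (20)
out = A_tilde @ x_tilde
assert np.allclose(out[:M], Ax.real) and np.allclose(out[M:], Ax.imag)
```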

The final expression in Eq. (19) has the form of a standard real-valued linear least-squares problem, and therefore can be solved using any of the linear least-squares solution methods described in the introduction. For example, the Landweber iteration (Landweber 1951) applied to this problem would proceed as given in Algorithm 1, and with infinite numerical precision, \(\hat{{\mathbf {x}}}_k\) is guaranteed to converge to a globally optimal solution as \(k \rightarrow \infty \) whenever \(0< \alpha < 2/ \Vert \tilde{{\mathbf {A}}}\Vert _2^2\).

[Algorithm 1: Landweber iteration applied to the real-valued linear least-squares problem in Eq. (19)]
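Since the algorithm box is not reproduced here, the following minimal Python sketch shows the standard form of the Landweber update for Eq. (19), \(\tilde{{\mathbf {x}}}_{k+1} = \tilde{{\mathbf {x}}}_k + \alpha \tilde{{\mathbf {A}}}^T(\tilde{{\mathbf {b}}} - \tilde{{\mathbf {A}}}\tilde{{\mathbf {x}}}_k)\); the step size and iteration count are illustrative choices.

```python
import numpy as np

# Minimal sketch of the Landweber update x_{k+1} = x_k + alpha * A_tilde^T (b_tilde - A_tilde x_k).
def landweber_real(A_tilde, b_tilde, n_iter):
    alpha = 1.0 / np.linalg.norm(A_tilde, 2) ** 2   # satisfies 0 < alpha < 2 / ||A_tilde||_2^2
    x = np.zeros(A_tilde.shape[1])
    for _ in range(n_iter):
        x = x + alpha * (A_tilde.T @ (b_tilde - A_tilde @ x))
    return x

rng = np.random.default_rng(2)
A_tilde = rng.standard_normal((20, 6))
b_tilde = rng.standard_normal(20)
x_hat = landweber_real(A_tilde, b_tilde, n_iter=5000)
x_ref = np.linalg.lstsq(A_tilde, b_tilde, rcond=None)[0]
print(np.linalg.norm(x_hat - x_ref))                # small after enough iterations
```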

As another example, the CG algorithm (Hestenes and Stiefel 1952) applied to this problem would proceed as given in Algorithm 2, and with infinite numerical precision, \(\hat{{\mathbf {x}}}_k\) would be guaranteed to converge to a globally optimal solution after at most 2N iterations.

[Algorithm 2: conjugate gradient (CG) algorithm applied to the real-valued linear least-squares problem in Eq. (19)]
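A minimal Python sketch of CG applied to the normal equations \(\tilde{{\mathbf {A}}}^T\tilde{{\mathbf {A}}}\tilde{{\mathbf {x}}} = \tilde{{\mathbf {A}}}^T\tilde{{\mathbf {b}}}\) (the presumed form of Algorithm 2, without stopping rules or other refinements) is given below.

```python
import numpy as np

# Minimal sketch of CG applied to the normal equations A_tilde^T A_tilde x = A_tilde^T b_tilde.
def cg_real(A_tilde, b_tilde, n_iter):
    x = np.zeros(A_tilde.shape[1])
    r = A_tilde.T @ (b_tilde - A_tilde @ x)   # residual of the normal equations
    p = r.copy()
    rs_old = r @ r
    for _ in range(n_iter):
        q = A_tilde.T @ (A_tilde @ p)
        gamma = rs_old / (p @ q)
        x = x + gamma * p
        r = r - gamma * q
        rs_new = r @ r
        p = r + (rs_new / rs_old) * p
        rs_old = rs_new
    return x

rng = np.random.default_rng(3)
A_tilde = rng.standard_normal((20, 6))
b_tilde = rng.standard_normal(20)
x_hat = cg_real(A_tilde, b_tilde, n_iter=A_tilde.shape[1])  # at most 2N iterations needed in exact arithmetic
x_ref = np.linalg.lstsq(A_tilde, b_tilde, rcond=None)[0]
print(np.linalg.norm(x_hat - x_ref))                        # small
```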

Compared to the analytic linear least-squares solution corresponding to Eq. (3), these iterative algorithms are generally useful for larger-scale problems where the matrix \(\tilde{{\mathbf {A}}}\) may be too large to store in memory, and where the matrix has structure so that matrix-vector multiplications with \(\tilde{{\mathbf {A}}}\) and \(\tilde{{\mathbf {A}}}^H\) can be computed quickly using specially-coded function calls rather than working with actual matrix representations (e.g., if \(\tilde{{\mathbf {A}}}\) has convolution structure so that matrix-vector multiplication can be implemented using the Fast Fourier Transform, if \(\tilde{{\mathbf {A}}}\) is sparse, etc.).

Although the problem transformation from Eq. (19) has been widely used (Bydder and Robson 2005; Willig-Onwuachi et al. 2005; Lew et al. 2007; Hoge et al. 2007; Haldar et al. 2013; Blaimer et al. 2016; Haldar 2014; Haldar and Setsompop 2020; Haldar 2015; Kim and Haldar 2018), it can also be cumbersome to work with if the operator \({\mathcal {A}}(\cdot )\) has more complicated structure. For example, the optimization problem in Eq. (9) involves the composition of linear and antilinear operators, and the \(\tilde{{\mathbf {A}}}\) matrix corresponding to this case has a complicated structure that is laborious to derive. In particular, with much manipulation, the matrix for this case can be derived to be

$$\begin{aligned} \tilde{{\mathbf {A}}} = \begin{bmatrix} \mathrm {real}({\mathbf {A}}) &{} -\mathrm {imag}({\mathbf {A}}) \\ {\mathbf {H}}_{11} &{} {\mathbf {H}}_{12} \\ \mathrm {imag}({\mathbf {A}}) &{} \mathrm {real}({\mathbf {A}}) \\ {\mathbf {H}}_{21} &{} {\mathbf {H}}_{22} \end{bmatrix}, \end{aligned}$$
(23)

with

$$\begin{aligned}&{\mathbf {H}}_{11} = \sqrt{\lambda }\cdot \mathrm {real}({\mathbf {C}})- \sqrt{\lambda }\cdot \mathrm {real}({\mathbf {D}})\mathrm {real}({\mathbf {E}}) -\sqrt{\lambda }\cdot \mathrm {imag}({\mathbf {D}})\mathrm {imag}({\mathbf {E}}), \end{aligned}$$
(24)
$$\begin{aligned}&{\mathbf {H}}_{12} = -\sqrt{\lambda }\cdot \mathrm {imag}({\mathbf {C}}) + \sqrt{\lambda }\cdot \mathrm {real}({\mathbf {D}})\mathrm {imag}({\mathbf {E}}) - \sqrt{\lambda }\cdot \mathrm {imag}({\mathbf {D}})\mathrm {real}({\mathbf {E}}) , \end{aligned}$$
(25)
$$\begin{aligned}&{\mathbf {H}}_{21} = \sqrt{\lambda }\cdot \mathrm {imag}({\mathbf {C}}) - \sqrt{\lambda }\cdot \mathrm {imag}({\mathbf {D}})\mathrm {real}({\mathbf {E}}) + \sqrt{\lambda }\cdot \mathrm {real}({\mathbf {D}})\mathrm {imag}({\mathbf {E}}) , \end{aligned}$$
(26)

and

$$\begin{aligned} {\mathbf {H}}_{22} = \sqrt{\lambda }\cdot \mathrm {real}({\mathbf {C}}) + \sqrt{\lambda }\cdot \mathrm {imag}({\mathbf {D}})\mathrm {imag}({\mathbf {E}}) + \sqrt{\lambda }\cdot \mathrm {real}({\mathbf {D}})\mathrm {real}({\mathbf {E}}). \end{aligned}$$
(27)

Of course, Eq. (9) relies on a relatively simple mixture of linear and antilinear operators, and the corresponding matrices for problems involving more complicated mixtures would be even more laborious to derive.

Beyond just the effort required to compute the general form of \(\tilde{{\mathbf {A}}}\), it can also be computationally expensive to try to use this type of expression in an iterative algorithm, particularly when the different operators have been implemented as specially-coded function calls. For example, if we were not given the actual matrix representations of \({\mathbf {A}}\), \({\mathbf {C}}\), \({\mathbf {D}}\), and \({\mathbf {E}}\) in Eq. (23) and only had function calls that implemented matrix-vector multiplication with these matrices, then a naive implementation of matrix multiplication between \(\tilde{{\mathbf {A}}}\) and a vector would require 4 calls to the function that computes multiplication with \({\mathbf {A}}\) (e.g., to compute \(\mathrm {real}({\mathbf {A}}){\mathbf {r}}\) for an arbitrary real-valued vector \({\mathbf {r}} \in {\mathbb {R}}^N\), we could instead compute the complex-valued matrix-vector multiplication function call to obtain \({\mathbf {s}} = {\mathbf {A}}{\mathbf {r}}\), and then use \(\mathrm {real}({\mathbf {A}}){\mathbf {r}} = \mathrm {real}({\mathbf {s}})\), with an analogous approach for computing \(\mathrm {imag}({\mathbf {A}}){\mathbf {t}}\) for an arbitrary real-valued vector \({\mathbf {t}} \in {\mathbb {R}}^N\)), 4 calls to the function that computes multiplication with \({\mathbf {C}}\), 8 calls to the function that computes multiplication with \({\mathbf {D}}\), and 8 calls to the function that computes multiplication with \({\mathbf {E}}\). This relatively large number of function calls represents a substantial increase in computational complexity compared to a standard evaluation of the complex-valued forward model, which would only require the use of one function call for each operator. Of course, this number of computations is based on a standard naive implementation, and additional careful manipulations could be used to reduce these numbers of function calls by exploiting redundant computations – however, this would contribute further to the laborious nature of deriving the form of \(\tilde{{\mathbf {A}}}\).
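The parenthetical trick described above is easy to check numerically; the short sketch below (with a hypothetical matvec_A stand-in for a specially-coded function call) confirms that \(\mathrm {real}({\mathbf {A}}){\mathbf {r}}\) and \(\mathrm {imag}({\mathbf {A}}){\mathbf {t}}\) can each be obtained from a single complex-valued call.

```python
import numpy as np

# real(A) r and imag(A) t for real-valued r, t, each obtained from one complex-valued "function call".
rng = np.random.default_rng(9)
A = rng.standard_normal((6, 4)) + 1j * rng.standard_normal((6, 4))
matvec_A = lambda v: A @ v          # hypothetical stand-in for a specially-coded function call

r = rng.standard_normal(4)
t = rng.standard_normal(4)
assert np.allclose(A.real @ r, np.real(matvec_A(r)))  # real(A) r = real(A r) when r is real
assert np.allclose(A.imag @ t, np.imag(matvec_A(t)))  # imag(A) t = imag(A t) when t is real
```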

3 Main results

Our main results are given by the following lemmas, which enable the use of the real-valued linear least-squares framework from Sect. 2.2 while relying entirely on complex-valued representations and computations.

Lemma 1

Consider a real-linear operator \({\mathcal {A}}(\cdot ): {\mathbb {C}}^N \rightarrow {\mathbb {C}}^M\), with corresponding \(\tilde{{\mathbf {A}}}\) matrix as defined in Eq. (21). Also consider arbitrary vectors \({\mathbf {m}} \in {\mathbb {C}}^N\) and \({\mathbf {n}} \in {\mathbb {C}}^M\), which are decomposed into their real and imaginary components according to \({\mathbf {m}} = {\mathbf {m}}_r + i{\mathbf {m}}_i\) and \({\mathbf {n}} = {\mathbf {n}}_r + i {\mathbf {n}}_i\), with \({\mathbf {m}}_r,{\mathbf {m}}_i \in {\mathbb {R}}^N\) and \({\mathbf {n}}_r,{\mathbf {n}}_i \in {\mathbb {R}}^M\). Then

$$\begin{aligned} \tilde{{\mathbf {A}}} \begin{bmatrix} {\mathbf {m}}_r \\ {\mathbf {m}}_i \end{bmatrix} = \begin{bmatrix} \mathrm {real}({\mathcal {A}}({\mathbf {m}})) \\ \mathrm {imag}({\mathcal {A}}({\mathbf {m}})) \end{bmatrix} \end{aligned}$$
(28)

and

$$\begin{aligned} \tilde{{\mathbf {A}}}^H \begin{bmatrix} {\mathbf {n}}_r \\ {\mathbf {n}}_i \end{bmatrix} = \begin{bmatrix} \mathrm {real}({\mathcal {A}}^*({\mathbf {n}})) \\ \mathrm {imag}({\mathcal {A}}^*({\mathbf {n}})) \end{bmatrix}, \end{aligned}$$
(29)

with \({\mathcal {A}}^*(\cdot )\) defined below.

Definition 4

(\({\mathcal {A}}^*(\cdot )\)) Consider a real-linear operator \({\mathcal {A}}(\cdot ): {\mathbb {C}}^N \rightarrow {\mathbb {C}}^M\), which is represented for \(\forall {\mathbf {x}} \in {\mathbb {C}}^N\) as \({\mathcal {A}}({\mathbf {x}}) = {\mathbf {F}}{\mathbf {x}} + \overline{({\mathbf {G}}{\mathbf {x}})}\) for some matrices \({\mathbf {F}},{\mathbf {G}} \in {\mathbb {C}}^{M \times N}\). We define \({\mathcal {A}}^*(\cdot ): {\mathbb {C}}^M \rightarrow {\mathbb {C}}^N\) as the mapping \( {\mathcal {A}}^*({\mathbf {n}}) \triangleq {\mathbf {F}}^H {\mathbf {n}} + {\mathbf {G}}^H \overline{{\mathbf {n}}}\) for \(\forall {\mathbf {n}} \in {\mathbb {C}}^M\).

Note that \({\mathcal {A}}^*(\cdot )\) is also a real-linear operator, and can be equivalently written in real-linear form as \({\mathcal {A}}^*({\mathbf {n}}) \triangleq {\mathbf {F}}^H {\mathbf {n}} + \overline{({\mathbf {G}}^T {\mathbf {n}})}\) for \(\forall {\mathbf {n}} \in {\mathbb {C}}^M\), where \(^T\) denotes the transpose operation (without conjugation). Interestingly, it can also be shown that \({\mathcal {A}}^*(\cdot )\) matches the definition of the adjoint operator of \({\mathcal {A}}(\cdot )\) from real-linear operator theory (Huhtanen and Ruotsalainen 2011). We believe this is interesting because our definition of \({\mathcal {A}}^*(\cdot )\) was derived independently of adjoint concepts. In addition, while the adjoint of a linear operator is well-known to be useful for solving linear least-squares problems, the adjoint definition used in real-linear operator theory is different from that used in the linear case (Huhtanen and Ruotsalainen 2011), and we had no prior expectations that the real-linear definition of adjoint would be related to the present work.
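The following Python sketch (assumed notation, random matrices) implements Definition 4 and numerically checks the identity \(\mathrm {real}({\mathbf {n}}^H{\mathcal {A}}({\mathbf {m}})) = \mathrm {real}(({\mathcal {A}}^*({\mathbf {n}}))^H{\mathbf {m}})\), which follows from Lemma 1 together with Property 7 below.

```python
import numpy as np

# Sketch of Definition 4: for A(x) = F x + conj(G x), A*(n) = F^H n + G^H conj(n).
rng = np.random.default_rng(4)
M, N = 5, 3
F = rng.standard_normal((M, N)) + 1j * rng.standard_normal((M, N))
G = rng.standard_normal((M, N)) + 1j * rng.standard_normal((M, N))

A      = lambda x: F @ x + np.conj(G @ x)
A_star = lambda n: F.conj().T @ n + G.conj().T @ np.conj(n)

# Adjoint-like identity implied by Lemma 1 (with Property 7): real(n^H A(m)) = real((A*(n))^H m).
m = rng.standard_normal(N) + 1j * rng.standard_normal(N)
n = rng.standard_normal(M) + 1j * rng.standard_normal(M)
assert np.isclose(np.real(np.vdot(n, A(m))), np.real(np.vdot(A_star(n), m)))
```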

Lemma 2

Consider a real-linear operator \({\mathcal {A}}(\cdot ): {\mathbb {C}}^N \rightarrow {\mathbb {C}}^M\) that can be written as the composition \({\mathcal {A}}(\cdot ) = {\mathcal {A}}_2(\cdot ) \circ {\mathcal {A}}_1(\cdot )\) of real-linear operators \({\mathcal {A}}_1(\cdot ): {\mathbb {C}}^N \rightarrow {\mathbb {C}}^P\) and \({\mathcal {A}}_2(\cdot ): {\mathbb {C}}^P \rightarrow {\mathbb {C}}^M\). Then \({\mathcal {A}}^*({\mathbf {n}}) = {\mathcal {A}}_1^*({\mathcal {A}}_2^*({\mathbf {n}}))\) for \(\forall {\mathbf {n}} \in {\mathbb {C}}^M\).

Lemma 3

Consider a real-linear operator \({\mathcal {A}}(\cdot ): {\mathbb {C}}^N \rightarrow {\mathbb {C}}^M\) that can be written as the summation \({\mathcal {A}}(\cdot ) = {\mathcal {A}}_1(\cdot ) + {\mathcal {A}}_2(\cdot )\) of real-linear operators \({\mathcal {A}}_1(\cdot ): {\mathbb {C}}^N \rightarrow {\mathbb {C}}^M\) and \({\mathcal {A}}_2(\cdot ): {\mathbb {C}}^N \rightarrow {\mathbb {C}}^M\). Then \({\mathcal {A}}^*({\mathbf {n}}) = {\mathcal {A}}_1^*({\mathbf {n}})+{\mathcal {A}}_2^*({\mathbf {n}})\) for \(\forall {\mathbf {n}} \in {\mathbb {C}}^M\).

The proofs of these three lemmas are straightforward, and are given in the appendices. Note that results similar to Lemmas 2 and 3 have been previously described for the adjoints of real-linear operators (Huhtanen and Ruotsalainen 2011), although the previous results were derived in a different context and relied on different concepts and proof techniques than those used in this paper (i.e., our approach is completely free of the inner-product-based real-linear adjoint concepts that were central to previous literature).

When combined together, Lemmas 1-3 completely eliminate the need to derive or work with the real-valued matrix \(\tilde{{\mathbf {A}}}\) in the context of iterative algorithms, because the effects of multiplication with the real-valued matrices \(\tilde{{\mathbf {A}}}\) and \(\tilde{{\mathbf {A}}}^H\) can be obtained equivalently using the complex-valued nonlinear operators \({\mathcal {A}}(\cdot )\) and \({\mathcal {A}}^*(\cdot )\). This can also lead to computational savings, since e.g., computing \(\mathrm {real}({\mathcal {A}}({\mathbf {m}}))\) and \(\mathrm {imag}({\mathcal {A}}({\mathbf {m}}))\) (as needed for computing multiplication of the matrix \(\tilde{{\mathbf {A}}}\) with a vector using Eq. (28)) only requires a single call to the function that computes \({\mathcal {A}}({\mathbf {m}})\). Likewise, computing multiplication of the matrix \(\tilde{{\mathbf {A}}}^H\) with a vector only requires a single call to the function that computes \({\mathcal {A}}^*(\cdot )\). And further, if \({\mathcal {A}}(\cdot )\) is represented as a complicated summation and/or composition of real-linear operators, we can rely on Properties 5 and 6 and Lemmas 2 and 3 to build \({\mathcal {A}}^*(\cdot )\) incrementally from the individual constituent operators, rather than needing to use Definition 4 (which would require \({\mathcal {A}}(\cdot )\) to be simplified so that it can be decomposed using \({\mathbf {F}}\) and \({\mathbf {G}}\) matrices).
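To illustrate the incremental construction, the following sketch (with illustrative names and sizes) builds \({\mathcal {B}}^*(\cdot )\) for the regularization operator \({\mathcal {B}}({\mathbf {x}}) = {\mathbf {C}}{\mathbf {x}} - {\mathbf {D}}\overline{({\mathbf {E}}{\mathbf {x}})}\) from Eq. (9) using Lemmas 2 and 3, and checks the result against Definition 4 applied to the explicit \({\mathbf {F}}\) and \({\mathbf {G}}\) decomposition.

```python
import numpy as np

# Building B*(.) incrementally for B(x) = C x - D conj(E x), using Lemmas 2 and 3:
# linear factors contribute their conjugate transposes, and the conjugation operator
# (F = 0, G = I in Definition 4) contributes an elementwise conjugation.
rng = np.random.default_rng(5)
P, N, Q = 6, 4, 5
C = rng.standard_normal((P, N)) + 1j * rng.standard_normal((P, N))
D = rng.standard_normal((P, Q)) + 1j * rng.standard_normal((P, Q))
E = rng.standard_normal((Q, N)) + 1j * rng.standard_normal((Q, N))

B      = lambda x: C @ x - D @ np.conj(E @ x)
B_star = lambda y: C.conj().T @ y - E.conj().T @ np.conj(D.conj().T @ y)  # via Lemmas 2 and 3

# Cross-check against Definition 4 applied to the explicit decomposition B(x) = F x + conj(G x),
# with F = C and G = -conj(D) E.
F, G = C, -np.conj(D) @ E
B_star_ref = lambda y: F.conj().T @ y + G.conj().T @ np.conj(y)
y = rng.standard_normal(P) + 1j * rng.standard_normal(P)
assert np.allclose(B_star(y), B_star_ref(y))
```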

As a consequence of these lemmas, it is, e.g., possible to replace the real-valued Landweber iteration from Algorithm 1 with the simpler complex-valued iteration given by Algorithm 3.

[Algorithm 3: complex-valued Landweber iteration based on \({\mathcal {A}}(\cdot )\) and \({\mathcal {A}}^*(\cdot )\)]

With infinite numerical precision, Algorithm 3 will produce the exact same sequence of iterates as Algorithm 1, and will therefore have the exact same global convergence guarantees stated previously for Landweber iteration.
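A minimal Python sketch of this complex-valued Landweber iteration is shown below; the conservative step-size bound is an illustrative choice that avoids ever forming \(\tilde{{\mathbf {A}}}\), which is built at the end only to validate the result.

```python
import numpy as np

# Minimal sketch of the complex-valued Landweber update: x_{k+1} = x_k + alpha * A*(b - A(x_k)).
rng = np.random.default_rng(6)
M, N = 12, 4
F = rng.standard_normal((M, N)) + 1j * rng.standard_normal((M, N))
G = rng.standard_normal((M, N)) + 1j * rng.standard_normal((M, N))
b = rng.standard_normal(M) + 1j * rng.standard_normal(M)

A      = lambda x: F @ x + np.conj(G @ x)                    # real-linear forward operator
A_star = lambda n: F.conj().T @ n + G.conj().T @ np.conj(n)  # companion operator (Definition 4)

# Conservative step size: ||A_tilde||_2^2 <= 2(||F||_F^2 + ||G||_F^2), so this alpha
# satisfies 0 < alpha < 2/||A_tilde||_2^2 without ever forming A_tilde.
alpha = 1.0 / (2 * (np.linalg.norm(F, 'fro') ** 2 + np.linalg.norm(G, 'fro') ** 2))

x = np.zeros(N, dtype=complex)
for _ in range(20000):
    x = x + alpha * A_star(b - A(x))

# Validation only (the iteration above never touches the real-valued matrix):
A_tilde = np.block([[F.real + G.real, -F.imag - G.imag],
                    [F.imag - G.imag,  F.real - G.real]])
xt = np.linalg.lstsq(A_tilde, np.concatenate([b.real, b.imag]), rcond=None)[0]
print(np.linalg.norm(x - (xt[:N] + 1j * xt[N:])))            # small
```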

We can make similar modifications to the CG algorithm from Algorithm 2, although we need the following additional property to correctly handle the inner products appearing in the CG algorithm.

Property 7

Consider arbitrary vectors \({\mathbf {p}},{\mathbf {q}} \in {\mathbb {C}}^N\), which are decomposed into their real and imaginary components according to \({\mathbf {p}} = {\mathbf {p}}_r + i{\mathbf {p}}_i\) and \({\mathbf {q}} = {\mathbf {q}}_r + i {\mathbf {q}}_i\), with \({\mathbf {p}}_r,{\mathbf {p}}_i,{\mathbf {q}}_r,{\mathbf {q}}_i \in {\mathbb {R}}^N\). Define \(\tilde{{\mathbf {p}}},\tilde{{\mathbf {q}}} \in {\mathbb {R}}^{2N}\) according to

$$\begin{aligned} \tilde{{\mathbf {p}}} = \begin{bmatrix} {\mathbf {p}}_r \\ {\mathbf {p}}_i \end{bmatrix} \text { and } \tilde{{\mathbf {q}}} = \begin{bmatrix} {\mathbf {q}}_r \\ {\mathbf {q}}_i \end{bmatrix} \end{aligned}$$
(30)

Then \(\tilde{{\mathbf {p}}}^H\tilde{{\mathbf {q}}} = \mathrm {real}({\mathbf {p}}^H{\mathbf {q}})\).
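This property is straightforward to verify numerically, as in the short illustrative check below.

```python
import numpy as np

# Numerical check of Property 7: the real Euclidean inner product of the stacked
# real/imaginary representations equals real(p^H q).
rng = np.random.default_rng(10)
p = rng.standard_normal(5) + 1j * rng.standard_normal(5)
q = rng.standard_normal(5) + 1j * rng.standard_normal(5)
p_tilde = np.concatenate([p.real, p.imag])
q_tilde = np.concatenate([q.real, q.imag])
assert np.isclose(p_tilde @ q_tilde, np.real(np.vdot(p, q)))
```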

Combining this property with the previous lemmas leads to the simple complex-valued iteration for the CG algorithm given by Algorithm 4.

[Algorithm 4: complex-valued conjugate gradient (CG) algorithm based on \({\mathcal {A}}(\cdot )\) and \({\mathcal {A}}^*(\cdot )\)]
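A minimal Python sketch of the presumed form of this complex-valued CG iteration is shown below: it is ordinary CG on the normal equations, expressed through \({\mathcal {A}}(\cdot )\) and \({\mathcal {A}}^*(\cdot )\), with every inner product replaced by \(\mathrm {real}({\mathbf {p}}^H{\mathbf {q}})\) per Property 7; the function and variable names are illustrative.

```python
import numpy as np

# Minimal sketch of a complex-valued CG iteration for min ||A(x) - b||_2^2 with real-linear A:
# ordinary CG on the normal equations, with all inner products taken as real(p^H q).
def cg_real_linear(A, A_star, b, N, n_iter):
    x = np.zeros(N, dtype=complex)
    r = A_star(b - A(x))                 # residual of the (implicit) normal equations
    p = r.copy()
    rs_old = np.real(np.vdot(r, r))
    for _ in range(n_iter):
        q = A_star(A(p))
        gamma = rs_old / np.real(np.vdot(p, q))   # Property 7: real part of the inner product
        x = x + gamma * p
        r = r - gamma * q
        rs_new = np.real(np.vdot(r, r))
        p = r + (rs_new / rs_old) * p
        rs_old = rs_new
    return x

rng = np.random.default_rng(7)
M, N = 12, 4
F = rng.standard_normal((M, N)) + 1j * rng.standard_normal((M, N))
G = rng.standard_normal((M, N)) + 1j * rng.standard_normal((M, N))
b = rng.standard_normal(M) + 1j * rng.standard_normal(M)
A      = lambda x: F @ x + np.conj(G @ x)
A_star = lambda n: F.conj().T @ n + G.conj().T @ np.conj(n)

x_hat = cg_real_linear(A, A_star, b, N, n_iter=2 * N)   # at most 2N iterations in exact arithmetic
print(np.linalg.norm(A_star(b - A(x_hat))))             # near zero at the least-squares solution
```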

While we have only shown complex-valued adaptations of the Landweber and CG algorithms, this same approach is easily applied to other related algorithms like LSQR (Paige and Saunders 1982).

4 Useful relations for common real-linear operators

Before demonstrating the empirical characteristics of our proposed approach, we enumerate some of the most common real-linear \({\mathcal {A}}(\cdot )\) operators and their corresponding \({\mathcal {A}}^*(\cdot )\) operators, which we believe will make the proposed framework easier to use. Such a list is provided in Table 1.

Table 1 Common real-linear \({\mathcal {A}}(\cdot )\) operators and corresponding \({\mathcal {A}}^*(\cdot )\) operators. We also provide expressions for \({\mathcal {A}}^*({\mathcal {A}}(\cdot ))\) in cases where the combined operator takes a simpler form than applying each operator sequentially. In the last two rows, it is assumed that the matrix \({\mathbf {A}} \in {\mathbb {C}}^{M_1 \times N}\), and that the vector \({\mathbf {y}} \in {\mathbb {C}}^M\) is divided into two components \({\mathbf {y}}_1 \in {\mathbb {C}}^{M_1}\) and \({\mathbf {y}}_2 \in {\mathbb {C}}^{M-M_1}\) with \({\mathbf {y}} = \begin{bmatrix} {\mathbf {y}}_1^T&{\mathbf {y}}_2^T \end{bmatrix}^T\). In the last row, we take \({\mathcal {B}}({\mathbf {x}}) \triangleq {\mathbf {C}}{\mathbf {x}} - {\mathbf {D}} \overline{({\mathbf {E}} {\mathbf {x}})}\), with corresponding \({\mathcal {B}}^*({\mathbf {y}}) = {\mathbf {C}}^H{\mathbf {y}} - {\mathbf {E}}^H \overline{({\mathbf {D}}^H{\mathbf {y}})}\). Note that a special case of equivalent complex-valued operators associated with Eq. (7) (with \({\mathbf {B}}\) chosen as the identity matrix) was previously presented in Bydder and Robson (2005), although without the more general real-linear mathematical framework developed in this work

5 Numerical example

To demonstrate the potential benefits of our proposed complex-valued approach, we will consider an instance of the problem described by Eq. (9). In this case, the use of complex-valued operations can lead to both a simpler problem formulation and faster numerical computations.

To address simplicity, it should be apparent by inspection that the process of deriving \(\tilde{{\mathbf {A}}}\) for this case (as given in Eq. (23), and needed for the conventional real-valued iterative computations) was non-trivial and labor-intensive, while the derivation of \({\mathcal {A}}(\cdot )\) and \({\mathcal {A}}^*(\cdot )\) (as given in Table 1, and needed for the proposed new complex-valued iterative computations) was comparatively fast and easy.

To address the computational benefits of the proposed approach, we will consider a specific realization of Eq. (9), in which \({\mathbf {x}} \in {\mathbb {C}}^{1000}\), \({\mathbf {n}} \in {\mathbb {C}}^{20000}\), \({\mathbf {A}} \in {\mathbb {C}}^{20000\times 1000}\), \({\mathbf {C}} \in {\mathbb {C}}^{30000 \times 1000}\), \({\mathbf {D}} \in {\mathbb {C}}^{30000 \times 2000}\), and \({\mathbf {E}} \in {\mathbb {C}}^{2000 \times 1000}\), with the real and imaginary parts of all of these vectors and matrices drawn i.i.d. at random from a Gaussian distribution. We then took \({\mathbf {b}} = {\mathbf {A}}{\mathbf {x}} + {\mathbf {n}}\), and set \(\lambda = 10^{-3}\). For this random problem instance, we find the optimal nonlinear least-squares solution in four distinct ways:

  • Conventional real-valued approach with matrices: We assume that \({\mathbf {A}}\), \({\mathbf {C}}\), \({\mathbf {D}}\), and \({\mathbf {E}}\) are available to us in matrix form, such that it is straightforward to directly precompute the real-valued matrix \(\tilde{{\mathbf {A}}} \in {\mathbb {R}}^{100000\times 2000}\) from Eq. (23). We then use this precomputed matrix directly in iterative linear least-squares solution algorithms like Landweber iteration, CG, and LSQR. Although the form of this \(\tilde{{\mathbf {A}}}\) matrix was complicated to derive, multiplications with the precomputed \(\tilde{{\mathbf {A}}}\) and \(\tilde{{\mathbf {A}}}^H\) matrices within each iteration should be very computationally efficient, particularly since we have taken 4 separate complex-valued matrices \({\mathbf {A}}\), \({\mathbf {C}}\), \({\mathbf {D}}\), and \({\mathbf {E}}\) that were originally specified by a sum total of \(1.12\times 10^8\) complex-valued entries (\(2.24 \times 10^8\) real numbers), and replaced them with a single real-valued matrix specified by only \(2\times 10^8\) real numbers. Every matrix-vector multiplication involving \(\tilde{{\mathbf {A}}}\) or \(\tilde{{\mathbf {A}}}^H\) will therefore require \(2\times 10^8\) real-valued scalar multiplications.

  • Proposed complex-valued approach with matrices: As in the previous case, we assume that \({\mathbf {A}}\), \({\mathbf {C}}\), \({\mathbf {D}}\), and \({\mathbf {E}}\) are available to us in matrix form, which allows us to directly form the \({\mathbf {F}}\) and \({\mathbf {G}}\) matrices corresponding to the complex-valued real-linear formulation of the problem. Specifically, \({\mathbf {F}}\) was formed as

    $$\begin{aligned} {\mathbf {F}} = \begin{bmatrix} {\mathbf {A}} \\ \sqrt{\lambda } {\mathbf {C}} \end{bmatrix} \end{aligned}$$
    (31)

    and \({\mathbf {G}}\) was formed as

    $$\begin{aligned} {\mathbf {G}} = \begin{bmatrix} {{\mathbf {0}}} \\ -\sqrt{\lambda } \, \overline{{\mathbf {D}}}{\mathbf {E}} \end{bmatrix}. \end{aligned}$$
    (32)

    We then used these precomputed matrices to evaluate \({\mathcal {A}}(\cdot )\) and \({\mathcal {A}}^*(\cdot )\) as needed in our proposed complex-valued iterative algorithms. These precomputed \({\mathbf {F}}\) and \({\mathbf {G}}\) matrices are each specified by \(5\times 10^7\) complex-valued entries (i.e., a sum total of \(1\times 10^8\) complex-valued entries or \(2\times 10^8\) real-valued entries considering \({\mathbf {F}}\) and \({\mathbf {G}}\) together). However, since one complex-valued scalar multiplication corresponds to four real-valued scalar multiplications, applying the \({\mathcal {A}}(\cdot )\) or \({\mathcal {A}}^*(\cdot )\) operators to vectors will require \(4\times 10^8\) real-valued scalar multiplications, which is twice the number of multiplications required in the previous case.

  • Conventional real-valued approach with function calls: We assume that we do not have direct access to the \({\mathbf {A}}\), \({\mathbf {C}}\), \({\mathbf {D}}\), and \({\mathbf {E}}\) matrices, but are only given blackbox functions that calculate matrix-vector multiplications with these matrices and their conjugate transposes. As such, we implement matrix-vector multiplication with \(\tilde{{\mathbf {A}}}\) (and similarly for \(\tilde{{\mathbf {A}}}^H\)) naively in each iteration of the conventional iterative linear least-squares solution algorithms, using multiple calls to each of these functions as described in Sect. 2.2. This approach is not expected to be computationally efficient given the large number of function calls, although it is simpler to implement than more advanced approaches that might be developed to exploit redundant computations within Eq. (23).

  • Proposed complex-valued approach with function calls: As in the previous case, we assume that we do not have direct access to the \({\mathbf {A}}\), \({\mathbf {C}}\), \({\mathbf {D}}\), and \({\mathbf {E}}\) matrices, but are only given blackbox functions that calculate matrix-vector multiplications with these matrices and their conjugate transposes. We implement the proposed complex-valued iterative algorithms using the techniques described in Sect. 3, using the expressions for \({\mathcal {A}}(\cdot )\) and \({\mathcal {A}}^*(\cdot )\) given in Table 1. In this case, applying the \({\mathcal {A}}(\cdot )\) or \({\mathcal {A}}^*(\cdot )\) operators to vectors will require \(1.12 \times 10^8\) complex-valued scalar multiplications (equal to the total number of entries in \({\mathbf {A}}\), \({\mathbf {C}}\), \({\mathbf {D}}\), and \({\mathbf {E}}\)), which corresponds to \(4.48 \times 10^8\) real-valued scalar multiplications (slightly more than the proposed complex-valued approach with matrices). A scaled-down illustrative sketch of this approach appears after this list.
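As referenced in the last item above, the following Python sketch illustrates the proposed complex-valued approach with function calls on a scaled-down random instance of Eq. (9) (the sizes, seed, and names are illustrative assumptions, separate from the MATLAB supplementary code); the result is checked against the conventional real-valued solution computed from the consolidated matrices.

```python
import numpy as np

# Scaled-down illustrative instance of Eq. (9); the sizes are placeholders, not the paper's.
rng = np.random.default_rng(8)
N, M1, M2, P = 50, 120, 150, 80
lam = 1e-3
Amat = rng.standard_normal((M1, N)) + 1j * rng.standard_normal((M1, N))
C = rng.standard_normal((M2, N)) + 1j * rng.standard_normal((M2, N))
D = rng.standard_normal((M2, P)) + 1j * rng.standard_normal((M2, P))
E = rng.standard_normal((P, N)) + 1j * rng.standard_normal((P, N))
x_true = rng.standard_normal(N) + 1j * rng.standard_normal(N)
b = Amat @ x_true + 0.1 * (rng.standard_normal(M1) + 1j * rng.standard_normal(M1))

# "Black-box" complex-valued function calls only (no consolidated matrices):
def A_op(x):       # A(x) = [A x ; sqrt(lam) (C x - D conj(E x))]
    return np.concatenate([Amat @ x, np.sqrt(lam) * (C @ x - D @ np.conj(E @ x))])

def A_star_op(y):  # companion operator, built via Table 1 / Lemmas 2 and 3
    y1, y2 = y[:M1], y[M1:]
    return (Amat.conj().T @ y1
            + np.sqrt(lam) * (C.conj().T @ y2 - E.conj().T @ np.conj(D.conj().T @ y2)))

rhs = np.concatenate([b, np.zeros(M2, dtype=complex)])

# Proposed complex-valued CG (as in Algorithm 4), using only the function calls:
x = np.zeros(N, dtype=complex)
r = A_star_op(rhs - A_op(x))
p = r.copy()
rs = np.real(np.vdot(r, r))
for _ in range(2 * N):
    q = A_star_op(A_op(p))
    gamma = rs / np.real(np.vdot(p, q))
    x = x + gamma * p
    r = r - gamma * q
    rs_new = np.real(np.vdot(r, r))
    p = r + (rs_new / rs) * p
    rs = rs_new

# Reference: conventional real-valued solution via the consolidated matrices of Eqs. (31)-(32).
Fm = np.vstack([Amat, np.sqrt(lam) * C])
Gm = np.vstack([np.zeros((M1, N)), -np.sqrt(lam) * np.conj(D) @ E])
At = np.block([[Fm.real + Gm.real, -Fm.imag - Gm.imag],
               [Fm.imag - Gm.imag,  Fm.real - Gm.real]])
xt = np.linalg.lstsq(At, np.concatenate([rhs.real, rhs.imag]), rcond=None)[0]
print(np.linalg.norm(x - (xt[:N] + 1j * xt[N:])))  # small: the two approaches agree
```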

For the sake of reproducible research, Matlab code corresponding to this example is included as supplementary material.

Fig. 1 Results for Landweber iteration. The plots show the total number of multiplications, the normalized cost function value (normalized so that the initial value is 1), the computation time in seconds, and the relative difference between the solution from the conventional method with matrices and the solutions obtained with the other methods

Fig. 2 Results for the conjugate gradient algorithm. The plots show the total number of multiplications, the normalized cost function value (normalized so that the initial value is 1), the computation time in seconds, and the relative difference between the solution from the conventional method with matrices and the solutions obtained with the other methods

For each case, we ran 50 iterations of Landweber iteration and 15 iterations of CG and LSQR in MATLAB 2018b, on a system with an Intel Core i7-8700K 3.70 GHz CPU. For each approach, each algorithm, and at each iteration, we computed (1) the total cumulative number of real-valued scalar multiplications (with 1 complex-valued scalar multiplication counted as 4 real-valued scalar multiplications) used by the algorithm thus far; (2) the cost function value from Eq. (9) using the current estimate (either \({\mathbf {x}}_k\) or \(\tilde{{\mathbf {x}}}_k\)); (3) the total computation time in seconds; and (4) the relative \(\ell _2\)-norm difference between the \({\mathbf {x}}_k\) value estimated by the proposed method with function calls and the values estimated by the other methods, where we define the relative \(\ell _2\)-norm difference between arbitrary vectors \({\mathbf {p}}\) and \({\mathbf {q}}\) as \(\Vert {\mathbf {p}}-{\mathbf {q}}\Vert _2/\Vert \frac{1}{2}{\mathbf {p}}+\frac{1}{2}{\mathbf {q}}\Vert _2\). To minimize random fluctuations in computation speed due to background processing, the computation times we report represent the average over 15 identical trials.

Results for Landweber iteration and the CG algorithm are reported in Figs. 1 and 2, respectively. The results for LSQR were quite similar to the results for CG (as expected theoretically; Paige and Saunders 1982), and have been omitted. The results confirm that, as should be expected from the theory, all of the different approaches yield virtually identical cost function values and virtually identical solution estimates \({\mathbf {x}}_k/\tilde{{\mathbf {x}}}_k\) at each iteration for each of the different algorithms. There are some very minor differences on the order of \(10^{-15}\), which can be attributed to numerical effects resulting from finite-precision arithmetic. In terms of computational complexity, we observe that the matrix-based approaches are generally associated with fewer multiplications than the implementations that use function calls, which should be expected because the matrix-based approaches were able to precompute simpler consolidated matrix representations [e.g., the \(\tilde{{\mathbf {A}}}\) matrix from Eq. (21) or the \({\mathbf {F}}\) and \({\mathbf {G}}\) matrices from Eqs. (31) and (32)] that were not available to the function call approaches.

The proposed approaches required a moderate number of multiplications, intermediate between the conventional approach with matrices (which had the fewest multiplications, as expected based on our previous discussion of the efficiency of the conventional approach with precomputation) and the conventional approach with function calls (which had the most multiplications). However, in terms of actual computation time, the conventional approach with function calls was much slower than any of the other three methods, while the other three methods were all similar to one another. It is perhaps surprising that the computation times are not directly proportional to the numbers of multiplications, although this discrepancy is likely related to MATLAB's use of efficient parallelized matrix multiplication libraries. Importantly, we observe that both variations of the proposed approach are quite fast, with computation times similar to the conventional real-valued approach with matrices (which, as mentioned, was expected to have excellent computational efficiency). There was a negligible difference between the computation times associated with the matrix and function-call implementations of the proposed method, which was definitely not the case for the conventional approaches. In terms of implementation, the proposed approach with function calls was the easiest to implement, since it did not require us to derive the forms of any special matrices like \(\tilde{{\mathbf {A}}}\), \({\mathbf {F}}\), or \({\mathbf {G}}\); we could instead work directly with the individual original matrices \({\mathbf {A}}\), \({\mathbf {C}}\), \({\mathbf {D}}\), and \({\mathbf {E}}\).

6 Conclusion

This work proposed a new approach to solving nonlinear least-squares problems involving real-linear operators. The new approach allows the use of the original complex-valued operators without transforming them into an unwieldy real-valued form. Theoretically, the approach produces iterates identical to those of the conventional real-valued transformation, but with much simpler implementation options and potentially much faster computations. We expect the proposed approach to be valuable for solving general complex-valued nonlinear least-squares problems involving real-linear operators. Note that the proposed complex-valued approach is also an integral but previously-undescribed component of the most recent version of an open-source MRI reconstruction software package released by the authors (Kim and Haldar 2018).