1 Introduction

Consider a generic complex-valued finite-dimensional inverse problem scenario in which the forward model is represented as

$$\begin{aligned} {\mathbf {b}} = {\mathcal {A}}({\mathbf {x}}) + {\mathbf {n}}, \end{aligned}$$
(1)

where \({\mathbf {b}} \in {\mathbb {C}}^M\) represents the measured data, \({\mathcal {A}}(\cdot ): {\mathbb {C}}^N \rightarrow {\mathbb {C}}^M\) is the measurement operator, \({\mathbf {n}} \in {\mathbb {C}}^M\) represents noise, and \({\mathbf {x}} \in {\mathbb {C}}^N\) represents the unknown signal that we wish to estimate based on knowledge of \({\mathbf {b}}\) and \({\mathcal {A}}(\cdot )\). A common approach to solving this inverse problem is to find a least-squares solution

$$\begin{aligned} \hat{{\mathbf {x}}} = \mathop {{{\,\mathrm{\arg \min }\,}}}\limits _{{\mathbf {x}} \in {\mathbb {C}}^N} \Vert {\mathcal {A}}({\mathbf {x}}) - {\mathbf {b}}\Vert _2^2, \end{aligned}$$
(2)

where \(\Vert \cdot \Vert _2\) denotes the standard \(\ell _2\)-norm. This choice of formulation can be justified in multiple ways; for example, it corresponds to the maximum likelihood estimator when the noise vector \({\mathbf {n}}\) is independent and identically distributed (i.i.d.) Gaussian (Kay 1993). Even for more complicated noise statistics that follow, e.g., the Poisson, Rician, or noncentral chi distributions, there exist iterative methods that obtain the maximum likelihood estimator by solving a sequence of least-squares objective functions (Erdogan and Fessler 1999; Fessler and Erdogan 1998; Varadarajan and Haldar 2015). Least-squares formulations are also popular because the optimization problem is frequently easy to solve. For example, in the case where \({\mathcal {A}}(\cdot )\) is a linear operator (i.e., \({\mathcal {A}}(\cdot )\) can be represented in an equivalent matrix form as \({\mathcal {A}}({\mathbf {x}}) = {\mathbf {A}}{\mathbf {x}}\) for some matrix \({\mathbf {A}}\in {\mathbb {C}}^{M \times N}\)) with a trivial nullspace, the solution to Eq. (2) has the analytic closed-form expression (Luenberger 1969)

$$\begin{aligned} \hat{{\mathbf {x}}} = ({\mathbf {A}}^H{\mathbf {A}})^{-1}{\mathbf {A}}^H {\mathbf {b}}, \end{aligned}$$
(3)

where \(^H\) denotes the conjugate-transpose operation. In large-scale problems where N is very large, the matrix inversion in Eq. (3) may be computationally intractable, although there exist a variety of simple iterative algorithms that are guaranteed to converge to a globally-optimal solution, including Landweber iteration (Landweber 1951), the conjugate gradient (CG) algorithm (Hestenes and Stiefel 1952), and LSQR (Paige and Saunders 1982).
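For illustration, the following minimal Python sketch (with arbitrary illustrative sizes and names) verifies that the closed-form expression in Eq. (3) matches a generic least-squares solver for a random complex matrix with a trivial nullspace.

```python
import numpy as np

# Illustrative check: the closed form of Eq. (3) matches a generic least-squares solver
# when A is a complex matrix with a trivial nullspace.
rng = np.random.default_rng(0)
M, N = 8, 3
A = rng.standard_normal((M, N)) + 1j * rng.standard_normal((M, N))
b = rng.standard_normal(M) + 1j * rng.standard_normal(M)

x_closed = np.linalg.solve(A.conj().T @ A, A.conj().T @ b)  # Eq. (3)
x_lstsq = np.linalg.lstsq(A, b, rcond=None)[0]              # generic solver
assert np.allclose(x_closed, x_lstsq)
```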

Instead of assuming linearity, we focus in this work on solving least-squares problems in the scenario where \({\mathcal {A}}(\cdot )\) is nonlinear, but can be represented as the summation and/or composition of some operators that are linear and some operators that are antilinear. Such nonlinear operators have sometimes been termed real-linear operators in mathematical physics (Huhtanen and Ruotsalainen 2011). Important common examples of operators that possess this kind of nonlinear structure include the complex-conjugation operator

$$\begin{aligned} {\mathcal {A}}({\mathbf {x}}) = \overline{{\mathbf {x}}}, \end{aligned}$$
(4)

the operator that takes the real part of a complex vector

$$\begin{aligned} {\mathcal {A}}({\mathbf {x}}) = \mathrm {real}({\mathbf {x}}) \triangleq \frac{1}{2}{\mathbf {x}} + \frac{1}{2}\overline{{\mathbf {x}}}, \end{aligned}$$
(5)

and the operator that takes the imaginary part of a complex vector

$$\begin{aligned} {\mathcal {A}}({\mathbf {x}}) = \mathrm {imag}({\mathbf {x}}) \triangleq \frac{1}{2i}{\mathbf {x}} - \frac{1}{2i}\overline{{\mathbf {x}}}. \end{aligned}$$
(6)

Even though the descriptions we present in this paper are generally applicable to arbitrary real-linear operators, we were initially motivated to consider such operators because of specific applications in magnetic resonance imaging (MRI) reconstruction. In particular, MRI images are complex-valued, and real-linear operators have previously been used to incorporate prior information about the image phase characteristics into the image reconstruction process, which helps to regularize/stabilize the solution when the inverse problem is ill posed. For example, there is a line of research within MRI that poses phase-constrained image reconstruction as either (Bydder and Robson 2005; Willig-Onwuachi et al. 2005; Lew et al. 2007; Hoge et al. 2007; Haldar et al. 2013; Blaimer et al. 2016; Markovsky 2011)

$$\begin{aligned} \begin{aligned} \hat{{\mathbf {x}}}&= \mathop {{{\,\mathrm{\arg \min }\,}}}\limits _{{\mathbf {x}} \in {\mathbb {C}}^N} \Vert {\mathbf {A}}{\mathbf {x}} - {\mathbf {b}}\Vert _2^2 + \lambda \Vert \mathrm {imag}({\mathbf {B}}{\mathbf {x}})\Vert _2^2\\&= \mathop {{{\,\mathrm{\arg \min }\,}}}\limits _{{\mathbf {x}} \in {\mathbb {C}}^N} \left\| \begin{bmatrix} {\mathbf {A}}{\mathbf {x}} \\ \sqrt{\lambda } \cdot \mathrm {imag}({\mathbf {B}}{\mathbf {x}}) \end{bmatrix} - \begin{bmatrix} {\mathbf {b}} \\ {{\mathbf {0}}} \end{bmatrix} \right\| _2^2, \end{aligned} \end{aligned}$$
(7)

or (Bydder 2010)

$$\begin{aligned} \begin{aligned} \hat{{\mathbf {x}}}&= \mathop {{{\,\mathrm{\arg \min }\,}}}\limits _{{\mathbf {x}} \in {\mathbb {R}}^N} \Vert {\mathbf {A}}{\mathbf {x}} - {\mathbf {b}}\Vert _2^2 \\&= \mathop {{{\,\mathrm{\arg \min }\,}}}\limits _{{\mathbf {x}} \in {\mathbb {C}}^N} \Vert {\mathbf {A}}\mathrm {real}({\mathbf {x}}) - {\mathbf {b}}\Vert _2^2 + \Vert \mathrm {imag}({\mathbf {x}})\Vert _2^2\\&= \mathop {{{\,\mathrm{\arg \min }\,}}}\limits _{{\mathbf {x}} \in {\mathbb {C}}^N} \left\| \begin{bmatrix} {\mathbf {A}}\mathrm {real}({\mathbf {x}}) \\ \mathrm {imag}({\mathbf {x}}) \end{bmatrix} - \begin{bmatrix} {\mathbf {b}} \\ {{\mathbf {0}}} \end{bmatrix} \right\| _2^2. \end{aligned} \end{aligned}$$
(8)

In the case of Eq. (7), \(\lambda \in {\mathbb {R}}\) is a positive regularization parameter and the matrix \({\mathbf {B}}\) embeds prior information about the image phase such that the regularization encourages \({\mathbf {B}}{\mathbf {x}}\) to be real-valued. In the case of Eq. (8), the phase information is embedded in the matrix \({\mathbf {A}}\), such that the vector \({\mathbf {x}}\) is strictly forced to be real-valued. Another line of research within MRI instead imposes phase constraints by leveraging linear predictability and the conjugate-symmetry characteristics of the Fourier transform, leading to an inverse problem formulation that can take the general form (Haldar 2014; Haldar and Setsompop 2020; Haldar 2015; Kim and Haldar 2018)

$$\begin{aligned} \begin{aligned} \hat{{\mathbf {x}}}&= \mathop {{{\,\mathrm{\arg \min }\,}}}\limits _{{\mathbf {x}} \in {\mathbb {C}}^N} \Vert {\mathbf {A}}{\mathbf {x}} - {\mathbf {b}}\Vert _2^2 + \lambda \Vert {\mathbf {C}}{\mathbf {x}} - {\mathbf {D}} \overline{({\mathbf {E}} {\mathbf {x}})}\Vert _2^2\\&= \mathop {{{\,\mathrm{\arg \min }\,}}}\limits _{{\mathbf {x}} \in {\mathbb {C}}^N} \left\| \begin{bmatrix} {\mathbf {A}}{\mathbf {x}} \\ \sqrt{\lambda }{\mathbf {C}}{\mathbf {x}} - \sqrt{\lambda }{\mathbf {D}} \overline{({\mathbf {E}} {\mathbf {x}})} \end{bmatrix} - \begin{bmatrix} {\mathbf {b}} \\ {{\mathbf {0}}} \end{bmatrix} \right\| _2^2, \end{aligned} \end{aligned}$$
(9)

for appropriate matrices \({\mathbf {C}}\), \({\mathbf {D}}\), and \({\mathbf {E}}\).

Although these are nonlinear least-squares problems because the operators involved are nonlinear, previous work has benefitted from the fact that this kind of inverse problem can be transformed into an equivalent higher-dimensional real-valued linear least-squares problem (Bydder and Robson 2005; Willig-Onwuachi et al. 2005; Lew et al. 2007; Hoge et al. 2007; Haldar et al. 2013; Blaimer et al. 2016; Haldar 2014; Haldar and Setsompop 2020; Haldar 2015; Kim and Haldar 2018). Specifically, this can be done by replacing all complex-valued quantities with real-valued quantities, e.g., separating \({\mathbf {x}} \in {\mathbb {C}}^N\) into its real and imaginary components, and treating this as an inverse problem in \({\mathbb {R}}^{2N}\) rather than the original space \({\mathbb {C}}^N\). While this real-valued transformation of the problem is effective and enables the use of standard linear least-squares solution methods, it can also cause computational inefficiencies and can sometimes be difficult to implement when the operators involved have complicated structure.

In this work, we describe theory that enables provably-convergent linear least-squares iterative algorithms to be applied to this nonlinear least-squares problem setting, without requiring a real-valued transformation of the original complex-valued vectors and operators. This can enable both improved computation speed and simplified algorithm implementations.

2 Background

2.1 Linear, antilinear, and real-linear operators

In this section, we briefly summarize some definitions and properties of linear and antilinear operators, with simplifications corresponding to our finite-dimensional problem context. Readers interested in a more detailed and more general treatment are referred to Refs. (Rudin 1991; Huhtanen and Ruotsalainen 2011).

Definition 1

(Linear Operator) An operator \({\mathcal {F}}(\cdot ): {\mathbb {C}}^N \rightarrow {\mathbb {C}}^M\) is said to be linear (or complex-linear) if it satisfies both additivity

$$\begin{aligned} {\mathcal {F}}({\mathbf {x}} + {\mathbf {y}}) = {\mathcal {F}}({\mathbf {x}}) + {\mathcal {F}}({\mathbf {y}}) \text { for } \forall {\mathbf {x}},{\mathbf {y}} \in {\mathbb {C}}^N \end{aligned}$$
(10)

and homogeneity

$$\begin{aligned} {\mathcal {F}}(\alpha {\mathbf {x}} ) = \alpha {\mathcal {F}}({\mathbf {x}}) \text { for } \forall {\mathbf {x}} \in {\mathbb {C}}^N, \forall \alpha \in {\mathbb {C}}. \end{aligned}$$
(11)

Property 1

For any linear operator \({\mathcal {F}}(\cdot ): {\mathbb {C}}^N \rightarrow {\mathbb {C}}^M\), there is a unique matrix \({\mathbf {F}} \in {\mathbb {C}}^{M \times N}\) such that \({\mathcal {F}}({\mathbf {x}}) = {\mathbf {F}}{\mathbf {x}}\) for \(\forall {\mathbf {x}} \in {\mathbb {C}}^N\).

Definition 2

(Antilinear Operator) An operator \({\mathcal {G}}(\cdot ): {\mathbb {C}}^N \rightarrow {\mathbb {C}}^M\) is said to be antilinear (or conjugate-linear) if it satisfies both additivity

$$\begin{aligned} {\mathcal {G}}({\mathbf {x}} + {\mathbf {y}}) = {\mathcal {G}}({\mathbf {x}}) + {\mathcal {G}}({\mathbf {y}}) \text { for } \forall {\mathbf {x}},{\mathbf {y}} \in {\mathbb {C}}^N \end{aligned}$$
(12)

and conjugate homogeneity

$$\begin{aligned} {\mathcal {G}}(\alpha {\mathbf {x}} ) = \overline{\alpha } {\mathcal {G}}({\mathbf {x}})\text { for } \forall {\mathbf {x}} \in {\mathbb {C}}^N, \forall \alpha \in {\mathbb {C}}. \end{aligned}$$
(13)

Property 2

For any antilinear operator \({\mathcal {G}}(\cdot ): {\mathbb {C}}^N \rightarrow {\mathbb {C}}^M\), there is a unique matrix \({\mathbf {G}} \in {\mathbb {C}}^{M \times N}\) such that \({\mathcal {G}}({\mathbf {x}}) = \overline{({\mathbf {G}}{\mathbf {x}})}\) for \(\forall {\mathbf {x}} \in {\mathbb {C}}^N\).

Note that by taking the matrix \({\mathbf {G}}\) as the identity matrix, we observe that applying complex conjugation \(\overline{{\mathbf {x}}}\) is an antilinear operation on the vector \({\mathbf {x}}\).

Definition 3

(Real-Linear Operator) An operator \({\mathcal {A}}(\cdot ): {\mathbb {C}}^N \rightarrow {\mathbb {C}}^M\) is said to be real-linear if it satisfies both additivity

$$\begin{aligned} {\mathcal {A}}({\mathbf {x}} + {\mathbf {y}}) = {\mathcal {A}}({\mathbf {x}}) + {\mathcal {A}}({\mathbf {y}}) \text { for } \forall {\mathbf {x}},{\mathbf {y}} \in {\mathbb {C}}^N \end{aligned}$$
(14)

and homogeneity with respect to real-valued scalars

$$\begin{aligned} {\mathcal {A}}(\alpha {\mathbf {x}} ) = \alpha {\mathcal {A}}({\mathbf {x}}) \text { for } \forall {\mathbf {x}} \in {\mathbb {C}}^N, \forall \alpha \in {\mathbb {R}}. \end{aligned}$$
(15)

Real-linearity is a generalization of both linearity and antilinearity, as can be seen from the following property.

Property 3

Every real-linear operator \({\mathcal {A}}(\cdot ): {\mathbb {C}}^N \rightarrow {\mathbb {C}}^M\) can be uniquely decomposed as the sum of a linear operator and an antilinear operator. In particular, \({\mathcal {A}}({\mathbf {x}}) = {\mathcal {F}}({\mathbf {x}}) + {{\mathcal {G}}}({\mathbf {x}})\) for \(\forall {\mathbf {x}} \in {\mathbb {C}}^N\), where \({\mathcal {F}}(\cdot ): {\mathbb {C}}^N\rightarrow {\mathbb {C}}^M\) is the linear operator defined by

$$\begin{aligned} {\mathcal {F}}({\mathbf {x}}) \triangleq \frac{1}{2} {\mathcal {A}}({\mathbf {x}}) - \frac{i}{2} {\mathcal {A}}(i{\mathbf {x}}) \end{aligned}$$
(16)

and \({\mathcal {G}}(\cdot ): {\mathbb {C}}^N\rightarrow {\mathbb {C}}^M\) is the antilinear operator defined by

$$\begin{aligned} {\mathcal {G}}({\mathbf {x}}) \triangleq \frac{1}{2} {\mathcal {A}}({\mathbf {x}}) + \frac{i}{2} {\mathcal {A}}(i{\mathbf {x}}). \end{aligned}$$
(17)

Property 4

For any real-linear operator \({\mathcal {A}}(\cdot ): {\mathbb {C}}^N \rightarrow {\mathbb {C}}^M\), there are unique matrices \({\mathbf {F}},{\mathbf {G}} \in {\mathbb {C}}^{M \times N}\) such that \({\mathcal {A}}({\mathbf {x}}) = {\mathbf {F}}{\mathbf {x}} + \overline{({\mathbf {G}}{\mathbf {x}})}\) for \(\forall {\mathbf {x}} \in {\mathbb {C}}^N\).

Notably, both the \(\mathrm {real}(\cdot )\) and \(\mathrm {imag}(\cdot )\) operators from Eqs. (5) and (6) are observed to have real-linear form.
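As a quick numerical illustration of Property 3, the following Python sketch (using assumed notation) splits the \(\mathrm {imag}(\cdot )\) operator into its linear and antilinear parts via Eqs. (16) and (17), and checks the corresponding homogeneity properties.

```python
import numpy as np

# Splitting the real-linear operator imag(.) into its linear part F (Eq. (16)) and
# antilinear part G (Eq. (17)), then checking the decomposition numerically.
rng = np.random.default_rng(0)

def A(x):                       # real-linear example: elementwise imaginary part
    return x.imag + 0j

def F(x):                       # linear part, Eq. (16)
    return 0.5 * A(x) - 0.5j * A(1j * x)

def G(x):                       # antilinear part, Eq. (17)
    return 0.5 * A(x) + 0.5j * A(1j * x)

x = rng.standard_normal(5) + 1j * rng.standard_normal(5)
alpha = 0.7 - 1.3j

assert np.allclose(A(x), F(x) + G(x))                    # Property 3: decomposition
assert np.allclose(F(alpha * x), alpha * F(x))           # F is complex-linear
assert np.allclose(G(alpha * x), np.conj(alpha) * G(x))  # G is conjugate-linear
```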

Property 5

For any two real-linear operators \({\mathcal {A}}_1(\cdot ): {\mathbb {C}}^N \rightarrow {\mathbb {C}}^M\) and \({\mathcal {A}}_2(\cdot ): {\mathbb {C}}^N \rightarrow {\mathbb {C}}^M\), their sum \({\mathcal {A}}_1(\cdot ) + {\mathcal {A}}_2(\cdot )\) is also a real-linear operator.

Property 6

For any two real-linear operators \({\mathcal {A}}_1(\cdot ): {\mathbb {C}}^N \rightarrow {\mathbb {C}}^P\) and \({\mathcal {A}}_2(\cdot ): {\mathbb {C}}^P \rightarrow {\mathbb {C}}^M\), their composition \(({\mathcal {A}}_2 \circ {\mathcal {A}}_1)(\cdot ): {\mathbb {C}}^N \rightarrow {\mathbb {C}}^M\), defined by \(({\mathcal {A}}_2 \circ {\mathcal {A}}_1)({\mathbf {x}}) \triangleq {\mathcal {A}}_2({\mathcal {A}}_1({\mathbf {x}}))\), is also a real-linear operator.

As can be seen, any operator that can be represented as the summation and/or composition of some operators that are linear and some operators that are antilinear can be viewed as a real-linear operator. As a result, the scenarios of interest in this paper all involve real-linear operators, and the remainder of this paper will assume that \({\mathcal {A}}(\cdot )\) obeys real-linearity, and has been decomposed in matrix form as \({\mathcal {A}}({\mathbf {x}}) = {\mathbf {F}}{\mathbf {x}} + \overline{({\mathbf {G}}{\mathbf {x}})}\).

2.2 Real-valued transformation of complex-valued least squares

Assuming \({\mathcal {A}}(\cdot )\) is real-linear as described in the previous subsection, Eq. (2) can be rewritten as

$$\begin{aligned} \hat{{\mathbf {x}}} = \mathop {{{\,\mathrm{\arg \min }\,}}}\limits _{{\mathbf {x}} \in {\mathbb {C}}^N} \Vert {\mathbf {F}}{\mathbf {x}} + \overline{({\mathbf {G}}{\mathbf {x}})} - {\mathbf {b}} \Vert _2^2, \end{aligned}$$
(18)

which is a nonlinear least squares problem. However, as stated in the introduction, previous work (Bydder and Robson 2005; Willig-Onwuachi et al. 2005; Lew et al. 2007; Hoge et al. 2007; Haldar et al. 2013; Blaimer et al. 2016; Haldar 2014; Haldar and Setsompop 2020; Haldar 2015; Kim and Haldar 2018) has transformed this problem into the form of a conventional linear least-squares problem by treating the variable \({\mathbf {x}}\) as an element of \({\mathbb {R}}^{2N}\) instead of \({\mathbb {C}}^N\). This was achieved by rewriting \({\mathbf {x}} \in {\mathbb {C}}^N\) as \({\mathbf {x}} = {\mathbf {x}}_r + i {\mathbf {x}}_i\), where the real-valued vectors \({\mathbf {x}}_r, {\mathbf {x}}_i \in {\mathbb {R}}^N\) represent the real and imaginary components of \({\mathbf {x}}\). This allows us to equivalently rewrite the solution to Eq. (18) as \(\hat{{\mathbf {x}}} = \hat{{\mathbf {x}}}_r + i \hat{{\mathbf {x}}}_i\), with

$$\begin{aligned} \begin{aligned} \{\hat{{\mathbf {x}}}_r,\hat{{\mathbf {x}}}_i\}&= \mathop {{{\,\mathrm{\arg \min }\,}}}\limits _{{\mathbf {x}}_r,{\mathbf {x}}_i \in {\mathbb {R}}^N}\Vert {\mathbf {F}}{\mathbf {x}}_r + i{\mathbf {F}}{\mathbf {x}}_i + \overline{{\mathbf {G}}} {\mathbf {x}}_r -i \overline{{\mathbf {G}}} {\mathbf {x}}_i - {\mathbf {b}} \Vert _2^2\\&= \mathop {{{\,\mathrm{\arg \min }\,}}}\limits _{{\mathbf {x}}_r,{\mathbf {x}}_i \in {\mathbb {R}}^N}\left\| \begin{bmatrix} \mathrm {real}({\mathbf {F}}{\mathbf {x}}_r + i{\mathbf {F}}{\mathbf {x}}_i + \overline{{\mathbf {G}}} {\mathbf {x}}_r -i \overline{{\mathbf {G}}} {\mathbf {x}}_i - {\mathbf {b}}) \\ \mathrm {imag}({\mathbf {F}}{\mathbf {x}}_r + i{\mathbf {F}}{\mathbf {x}}_i + \overline{{\mathbf {G}}} {\mathbf {x}}_r -i \overline{{\mathbf {G}}} {\mathbf {x}}_i - {\mathbf {b}}) \end{bmatrix} \right\| _2^2\\&= \mathop {{{\,\mathrm{\arg \min }\,}}}\limits _{\tilde{{\mathbf {x}}} \in {\mathbb {R}}^{2N}} \left\| \tilde{{\mathbf {A}}} \tilde{{\mathbf {x}}} - \tilde{{\mathbf {b}}} \right\| _2^2, \end{aligned} \end{aligned}$$
(19)

where

$$\begin{aligned}&\tilde{{\mathbf {x}}} \triangleq \begin{bmatrix} {\mathbf {x}}_r \\ {\mathbf {x}}_i \end{bmatrix} \in {\mathbb {R}}^{2N}, \end{aligned}$$
(20)
$$\begin{aligned}&\tilde{{\mathbf {A}}} \triangleq \begin{bmatrix} \mathrm {real}({\mathbf {F}}) + \mathrm {real}({\mathbf {G}}) &{} - \mathrm {imag}({\mathbf {F}}) - \mathrm {imag}({\mathbf {G}}) \\ \mathrm {imag}({\mathbf {F}}) - \mathrm {imag}({\mathbf {G}}) &{} \mathrm {real}({\mathbf {F}}) - \mathrm {real}({\mathbf {G}}) \end{bmatrix} \in {\mathbb {R}}^{2M \times 2N}, \end{aligned}$$
(21)

and

$$\begin{aligned} \tilde{{\mathbf {b}}} \triangleq \begin{bmatrix} \mathrm {real}({\mathbf {b}}) \\ \mathrm {imag}({\mathbf {b}}) \end{bmatrix} \in {\mathbb {R}}^{2M}. \end{aligned}$$
(22)
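For concreteness, the following Python sketch (with illustrative random matrices) constructs \(\tilde{{\mathbf {A}}}\) from \({\mathbf {F}}\) and \({\mathbf {G}}\) according to Eq. (21), and verifies that multiplying \(\tilde{{\mathbf {A}}}\) with the stacked real and imaginary parts of \({\mathbf {x}}\) reproduces \({\mathcal {A}}({\mathbf {x}})\).

```python
import numpy as np

# Forming the real-valued matrix of Eq. (21) for a random real-linear operator
# A(x) = F x + conj(G x), and checking that it reproduces A(x).
rng = np.random.default_rng(1)
M, N = 4, 3
F = rng.standard_normal((M, N)) + 1j * rng.standard_normal((M, N))
G = rng.standard_normal((M, N)) + 1j * rng.standard_normal((M, N))

A_tilde = np.block([
    [F.real + G.real, -F.imag - G.imag],
    [F.imag - G.imag,  F.real - G.real],
])                                                   # Eq. (21), shape (2M, 2N)

x = rng.standard_normal(N) + 1j * rng.standard_normal(N)
Ax = F @ x + np.conj(G @ x)                          # complex-valued forward model

x_tilde = np.concatenate([x.real, x.imag])           # Eq. (20)
out = A_tilde @ x_tilde
assert np.allclose(out[:M], Ax.real) and np.allclose(out[M:], Ax.imag)
```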

The final expression in Eq. (19) has the form of a standard real-valued linear least-squares problem, and therefore can be solved using any of the linear least-squares solution methods described in the introduction. For example, the Landweber iteration (Landweber 1951) applied to this problem would proceed as given in Algorithm 1, and with infinite numerical precision, \(\hat{{\mathbf {x}}}_k\) is guaranteed to converge to a globally optimal solution as \(k \rightarrow \infty \) whenever \(0< \alpha < 2/ \Vert \tilde{{\mathbf {A}}}\Vert _2^2\).

[Algorithm 1: Landweber iteration applied to the real-valued linear least-squares problem in Eq. (19)]
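Since the algorithm box is not reproduced here, the following minimal Python sketch shows the standard form of the Landweber update for Eq. (19), \(\tilde{{\mathbf {x}}}_{k+1} = \tilde{{\mathbf {x}}}_k + \alpha \tilde{{\mathbf {A}}}^T(\tilde{{\mathbf {b}}} - \tilde{{\mathbf {A}}}\tilde{{\mathbf {x}}}_k)\); the step size and iteration count are illustrative choices.

```python
import numpy as np

# Minimal sketch of the Landweber update x_{k+1} = x_k + alpha * A_tilde^T (b_tilde - A_tilde x_k).
def landweber_real(A_tilde, b_tilde, n_iter):
    alpha = 1.0 / np.linalg.norm(A_tilde, 2) ** 2   # satisfies 0 < alpha < 2 / ||A_tilde||_2^2
    x = np.zeros(A_tilde.shape[1])
    for _ in range(n_iter):
        x = x + alpha * (A_tilde.T @ (b_tilde - A_tilde @ x))
    return x

rng = np.random.default_rng(2)
A_tilde = rng.standard_normal((20, 6))
b_tilde = rng.standard_normal(20)
x_hat = landweber_real(A_tilde, b_tilde, n_iter=5000)
x_ref = np.linalg.lstsq(A_tilde, b_tilde, rcond=None)[0]
print(np.linalg.norm(x_hat - x_ref))                # small after enough iterations
```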

As another example, the CG algorithm (Hestenes and Stiefel 1952) applied to this problem would proceed as given in Algorithm 2, and with infinite numerical precision, \(\hat{{\mathbf {x}}}_k\) would be guaranteed to converge to a globally optimal solution after at most 2N iterations.

[Algorithm 2: conjugate gradient (CG) algorithm applied to the real-valued linear least-squares problem in Eq. (19)]
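A minimal Python sketch of CG applied to the normal equations \(\tilde{{\mathbf {A}}}^T\tilde{{\mathbf {A}}}\tilde{{\mathbf {x}}} = \tilde{{\mathbf {A}}}^T\tilde{{\mathbf {b}}}\) (the presumed form of Algorithm 2, without stopping rules or other refinements) is given below.

```python
import numpy as np

# Minimal sketch of CG applied to the normal equations A_tilde^T A_tilde x = A_tilde^T b_tilde.
def cg_real(A_tilde, b_tilde, n_iter):
    x = np.zeros(A_tilde.shape[1])
    r = A_tilde.T @ (b_tilde - A_tilde @ x)   # residual of the normal equations
    p = r.copy()
    rs_old = r @ r
    for _ in range(n_iter):
        q = A_tilde.T @ (A_tilde @ p)
        gamma = rs_old / (p @ q)
        x = x + gamma * p
        r = r - gamma * q
        rs_new = r @ r
        p = r + (rs_new / rs_old) * p
        rs_old = rs_new
    return x

rng = np.random.default_rng(3)
A_tilde = rng.standard_normal((20, 6))
b_tilde = rng.standard_normal(20)
x_hat = cg_real(A_tilde, b_tilde, n_iter=A_tilde.shape[1])  # at most 2N iterations needed in exact arithmetic
x_ref = np.linalg.lstsq(A_tilde, b_tilde, rcond=None)[0]
print(np.linalg.norm(x_hat - x_ref))                        # small
```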

Compared to the analytic linear least-squares solution corresponding to Eq. (3), these iterative algorithms are generally useful for larger-scale problems where the matrix \(\tilde{{\mathbf {A}}}\) may be too large to store in memory, and where the matrix has structure so that matrix-vector multiplications with \(\tilde{{\mathbf {A}}}\) and \(\tilde{{\mathbf {A}}}^H\) can be computed quickly using specially-coded function calls rather than working with actual matrix representations (e.g., if \(\tilde{{\mathbf {A}}}\) has convolution structure so that matrix-vector multiplication can be implemented using the Fast Fourier Transform, if \(\tilde{{\mathbf {A}}}\) is sparse, etc.).

Although the problem transformation from Eq. (19) has been widely used (Bydder and Robson 2005; Willig-Onwuachi et al. 2005; Lew et al. 2007; Hoge et al. 2007; Haldar et al. 2013; Blaimer et al. 2016; Haldar 2014; Haldar and Setsompop 2020; Haldar 2015; Kim and Haldar 2018), it can also be cumbersome to work with if the operator \({\mathcal {A}}(\cdot )\) has more complicated structure. For example, the optimization problem in Eq. (9) involves the composition of linear and antilinear operators, and the \(\tilde{{\mathbf {A}}}\) matrix corresponding to this case has a complicated structure that is laborious to derive. In particular, with much manipulation, the matrix for this case can be derived to be

$$\begin{aligned} \tilde{{\mathbf {A}}} = \begin{bmatrix} \mathrm {real}({\mathbf {A}}) &{} -\mathrm {imag}({\mathbf {A}}) \\ {\mathbf {H}}_{11} &{} {\mathbf {H}}_{12} \\ \mathrm {imag}({\mathbf {A}}) &{} \mathrm {real}({\mathbf {A}}) \\ {\mathbf {H}}_{21} &{} {\mathbf {H}}_{22} \end{bmatrix}, \end{aligned}$$
(23)

with

$$\begin{aligned}&{\mathbf {H}}_{11} = \sqrt{\lambda }\cdot \mathrm {real}({\mathbf {C}})- \sqrt{\lambda }\cdot \mathrm {real}({\mathbf {D}})\mathrm {real}({\mathbf {E}}) -\sqrt{\lambda }\cdot \mathrm {imag}({\mathbf {D}})\mathrm {imag}({\mathbf {E}}), \end{aligned}$$
(24)
$$\begin{aligned}&{\mathbf {H}}_{12} = -\sqrt{\lambda }\cdot \mathrm {imag}({\mathbf {C}}) + \sqrt{\lambda }\cdot \mathrm {real}({\mathbf {D}})\mathrm {imag}({\mathbf {E}}) - \sqrt{\lambda }\cdot \mathrm {imag}({\mathbf {D}})\mathrm {real}({\mathbf {E}}) , \end{aligned}$$
(25)
$$\begin{aligned}&{\mathbf {H}}_{21} = \sqrt{\lambda }\cdot \mathrm {imag}({\mathbf {C}}) - \sqrt{\lambda }\cdot \mathrm {imag}({\mathbf {D}})\mathrm {real}({\mathbf {E}}) + \sqrt{\lambda }\cdot \mathrm {real}({\mathbf {D}})\mathrm {imag}({\mathbf {E}}) , \end{aligned}$$
(26)

and

$$\begin{aligned} {\mathbf {H}}_{22} = \sqrt{\lambda }\cdot \mathrm {real}({\mathbf {C}}) + \sqrt{\lambda }\cdot \mathrm {imag}({\mathbf {D}})\mathrm {imag}({\mathbf {E}}) + \sqrt{\lambda }\cdot \mathrm {real}({\mathbf {D}})\mathrm {real}({\mathbf {E}}). \end{aligned}$$
(27)

Of course, Eq. (9) relies on a relatively simple mixture of linear and antilinear operators, and the corresponding matrices for problems involving more complicated mixtures would be even more laborious to derive.

Beyond just the effort required to compute the general form of \(\tilde{{\mathbf {A}}}\), it can also be computationally expensive to try to use this type of expression in an iterative algorithm, particularly when the different operators have been implemented as specially-coded function calls. For example, if we were not given the actual matrix representations of \({\mathbf {A}}\), \({\mathbf {C}}\), \({\mathbf {D}}\), and \({\mathbf {E}}\) in Eq. (23) and only had function calls that implemented matrix-vector multiplication with these matrices, then a naive implementation of matrix multiplication between \(\tilde{{\mathbf {A}}}\) and a vector would require 4 calls to the function that computes multiplication with \({\mathbf {A}}\) (e.g., to compute \(\mathrm {real}({\mathbf {A}}){\mathbf {r}}\) for an arbitrary real-valued vector \({\mathbf {r}} \in {\mathbb {R}}^N\), we could instead compute the complex-valued matrix-vector multiplication function call to obtain \({\mathbf {s}} = {\mathbf {A}}{\mathbf {r}}\), and then use \(\mathrm {real}({\mathbf {A}}){\mathbf {r}} = \mathrm {real}({\mathbf {s}})\), with an analogous approach for computing \(\mathrm {imag}({\mathbf {A}}){\mathbf {t}}\) for an arbitrary real-valued vector \({\mathbf {t}} \in {\mathbb {R}}^N\)), 4 calls to the function that computes multiplication with \({\mathbf {C}}\), 8 calls to the function that computes multiplication with \({\mathbf {D}}\), and 8 calls to the function that computes multiplication with \({\mathbf {E}}\). This relatively large number of function calls represents a substantial increase in computational complexity compared to a standard evaluation of the complex-valued forward model, which would only require the use of one function call for each operator. Of course, this number of computations is based on a standard naive implementation, and additional careful manipulations could be used to reduce these numbers of function calls by exploiting redundant computations – however, this would contribute further to the laborious nature of deriving the form of \(\tilde{{\mathbf {A}}}\).
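The parenthetical trick described above is easy to check numerically; the short sketch below (with a hypothetical matvec_A stand-in for a specially-coded function call) confirms that \(\mathrm {real}({\mathbf {A}}){\mathbf {r}}\) and \(\mathrm {imag}({\mathbf {A}}){\mathbf {t}}\) can each be obtained from a single complex-valued call.

```python
import numpy as np

# real(A) r and imag(A) t for real-valued r, t, each obtained from one complex-valued "function call".
rng = np.random.default_rng(9)
A = rng.standard_normal((6, 4)) + 1j * rng.standard_normal((6, 4))
matvec_A = lambda v: A @ v          # hypothetical stand-in for a specially-coded function call

r = rng.standard_normal(4)
t = rng.standard_normal(4)
assert np.allclose(A.real @ r, np.real(matvec_A(r)))  # real(A) r = real(A r) when r is real
assert np.allclose(A.imag @ t, np.imag(matvec_A(t)))  # imag(A) t = imag(A t) when t is real
```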

3 Main results

Our main results are given by the following lemmas, which enable the use of the real-valued linear least-squares framework from Sect. 2.2 while relying entirely on complex-valued representations and computations.

Lemma 1

Consider a real-linear operator \({\mathcal {A}}(\cdot ): {\mathbb {C}}^N \rightarrow {\mathbb {C}}^M\), with corresponding \(\tilde{{\mathbf {A}}}\) matrix as defined in Eq. (21). Also consider arbitrary vectors \({\mathbf {m}} \in {\mathbb {C}}^N\) and \({\mathbf {n}} \in {\mathbb {C}}^M\), which are decomposed into their real and imaginary components according to \({\mathbf {m}} = {\mathbf {m}}_r + i{\mathbf {m}}_i\) and \({\mathbf {n}} = {\mathbf {n}}_r + i {\mathbf {n}}_i\), with \({\mathbf {m}}_r,{\mathbf {m}}_i \in {\mathbb {R}}^N\) and \({\mathbf {n}}_r,{\mathbf {n}}_i \in {\mathbb {R}}^M\). Then

$$\begin{aligned} \tilde{{\mathbf {A}}} \begin{bmatrix} {\mathbf {m}}_r \\ {\mathbf {m}}_i \end{bmatrix} = \begin{bmatrix} \mathrm {real}({\mathcal {A}}({\mathbf {m}})) \\ \mathrm {imag}({\mathcal {A}}({\mathbf {m}})) \end{bmatrix} \end{aligned}$$
(28)

and

$$\begin{aligned} \tilde{{\mathbf {A}}}^H \begin{bmatrix} {\mathbf {n}}_r \\ {\mathbf {n}}_i \end{bmatrix} = \begin{bmatrix} \mathrm {real}({\mathcal {A}}^*({\mathbf {n}})) \\ \mathrm {imag}({\mathcal {A}}^*({\mathbf {n}})) \end{bmatrix}, \end{aligned}$$
(29)

with \({\mathcal {A}}^*(\cdot )\) defined below.

Definition 4

(\({\mathcal {A}}^*(\cdot )\)) Consider a real-linear operator \({\mathcal {A}}(\cdot ): {\mathbb {C}}^N \rightarrow {\mathbb {C}}^M\), which is represented for \(\forall {\mathbf {x}} \in {\mathbb {C}}^N\) as \({\mathcal {A}}({\mathbf {x}}) = {\mathbf {F}}{\mathbf {x}} + \overline{({\mathbf {G}}{\mathbf {x}})}\) for some matrices \({\mathbf {F}},{\mathbf {G}} \in {\mathbb {C}}^{M \times N}\). We define \({\mathcal {A}}^*(\cdot ): {\mathbb {C}}^M \rightarrow {\mathbb {C}}^N\) as the mapping \( {\mathcal {A}}^*({\mathbf {n}}) \triangleq {\mathbf {F}}^H {\mathbf {n}} + {\mathbf {G}}^H \overline{{\mathbf {n}}}\) for \(\forall {\mathbf {n}} \in {\mathbb {C}}^M\).

Note that \({\mathcal {A}}^*(\cdot )\) is also a real-linear operator, and can be equivalently written in real-linear form as \({\mathcal {A}}^*({\mathbf {n}}) \triangleq {\mathbf {F}}^H {\mathbf {n}} + \overline{({\mathbf {G}}^T {\mathbf {n}})}\) for \(\forall {\mathbf {n}} \in {\mathbb {C}}^M\), where \(^T\) denotes the transpose operation (without conjugation). Interestingly, it can also be shown that \({\mathcal {A}}^*(\cdot )\) matches the definition of the adjoint operator of \({\mathcal {A}}(\cdot )\) from real-linear operator theory (Huhtanen and Ruotsalainen 2011). We believe this is interesting because our definition of \({\mathcal {A}}^*(\cdot )\) was derived independently of adjoint concepts. In addition, while the adjoint of a linear operator is well-known to be useful for solving linear least-squares problems, the adjoint definition used in real-linear operator theory is different from that used in the linear case (Huhtanen and Ruotsalainen 2011), and we had no prior expectations that the real-linear definition of adjoint would be related to the present work.
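The following Python sketch (assumed notation, random matrices) implements Definition 4 and numerically checks the identity \(\mathrm {real}({\mathbf {n}}^H{\mathcal {A}}({\mathbf {m}})) = \mathrm {real}(({\mathcal {A}}^*({\mathbf {n}}))^H{\mathbf {m}})\), which follows from Lemma 1 together with Property 7 below.

```python
import numpy as np

# Sketch of Definition 4: for A(x) = F x + conj(G x), A*(n) = F^H n + G^H conj(n).
rng = np.random.default_rng(4)
M, N = 5, 3
F = rng.standard_normal((M, N)) + 1j * rng.standard_normal((M, N))
G = rng.standard_normal((M, N)) + 1j * rng.standard_normal((M, N))

A      = lambda x: F @ x + np.conj(G @ x)
A_star = lambda n: F.conj().T @ n + G.conj().T @ np.conj(n)

# Adjoint-like identity implied by Lemma 1 (with Property 7): real(n^H A(m)) = real((A*(n))^H m).
m = rng.standard_normal(N) + 1j * rng.standard_normal(N)
n = rng.standard_normal(M) + 1j * rng.standard_normal(M)
assert np.isclose(np.real(np.vdot(n, A(m))), np.real(np.vdot(A_star(n), m)))
```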

Lemma 2

Consider a real-linear operator \({\mathcal {A}}(\cdot ): {\mathbb {C}}^N \rightarrow {\mathbb {C}}^M\) that can be written as the composition \({\mathcal {A}}(\cdot ) = {\mathcal {A}}_2(\cdot ) \circ {\mathcal {A}}_1(\cdot )\) of real-linear operators \({\mathcal {A}}_1(\cdot ): {\mathbb {C}}^N \rightarrow {\mathbb {C}}^P\) and \({\mathcal {A}}_2(\cdot ): {\mathbb {C}}^P \rightarrow {\mathbb {C}}^M\). Then \({\mathcal {A}}^*({\mathbf {n}}) = {\mathcal {A}}_1^*({\mathcal {A}}_2^*({\mathbf {n}}))\) for \(\forall {\mathbf {n}} \in {\mathbb {C}}^M\).

Lemma 3

Consider a real-linear operator \({\mathcal {A}}(\cdot ): {\mathbb {C}}^N \rightarrow {\mathbb {C}}^M\) that can be written as the summation \({\mathcal {A}}(\cdot ) = {\mathcal {A}}_1(\cdot ) + {\mathcal {A}}_2(\cdot )\) of real-linear operators \({\mathcal {A}}_1(\cdot ): {\mathbb {C}}^N \rightarrow {\mathbb {C}}^M\) and \({\mathcal {A}}_2(\cdot ): {\mathbb {C}}^N \rightarrow {\mathbb {C}}^M\). Then \({\mathcal {A}}^*({\mathbf {n}}) = {\mathcal {A}}_1^*({\mathbf {n}})+{\mathcal {A}}_2^*({\mathbf {n}})\) for \(\forall {\mathbf {n}} \in {\mathbb {C}}^M\).

The proofs of these three lemmas are straightforward, and are given in the appendices. Note that results similar to Lemmas 2 and 3 have been previously described for the adjoints of real-linear operators (Huhtanen and Ruotsalainen 2011), although the previous results were derived in a different context and relied on different concepts and proof techniques than those used in this paper (i.e., our approach is completely free of the inner-product-based real-linear adjoint concepts that were central to previous literature).

When combined together, Lemmas 1-3 completely eliminate the need to derive or work with the real-valued matrix \(\tilde{{\mathbf {A}}}\) in the context of iterative algorithms, because the effects of multiplication with the real-valued matrices \(\tilde{{\mathbf {A}}}\) and \(\tilde{{\mathbf {A}}}^H\) can be obtained equivalently using the complex-valued nonlinear operators \({\mathcal {A}}(\cdot )\) and \({\mathcal {A}}^*(\cdot )\). This can also lead to computational savings, since e.g., computing \(\mathrm {real}({\mathcal {A}}({\mathbf {m}}))\) and \(\mathrm {imag}({\mathcal {A}}({\mathbf {m}}))\) (as needed for computing multiplication of the matrix \(\tilde{{\mathbf {A}}}\) with a vector using Eq. (28)) only requires a single call to the function that computes \({\mathcal {A}}({\mathbf {m}})\). Likewise, computing multiplication of the matrix \(\tilde{{\mathbf {A}}}^H\) with a vector only requires a single call to the function that computes \({\mathcal {A}}^*(\cdot )\). And further, if \({\mathcal {A}}(\cdot )\) is represented as a complicated summation and/or composition of real-linear operators, we can rely on Properties 5 and 6 and Lemmas 2 and 3 to build \({\mathcal {A}}^*(\cdot )\) incrementally from the individual constituent operators, rather than needing to use Definition 4 (which would require \({\mathcal {A}}(\cdot )\) to be simplified so that it can be decomposed using \({\mathbf {F}}\) and \({\mathbf {G}}\) matrices).
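To illustrate the incremental construction, the following sketch (with illustrative names and sizes) builds \({\mathcal {B}}^*(\cdot )\) for the regularization operator \({\mathcal {B}}({\mathbf {x}}) = {\mathbf {C}}{\mathbf {x}} - {\mathbf {D}}\overline{({\mathbf {E}}{\mathbf {x}})}\) from Eq. (9) using Lemmas 2 and 3, and checks the result against Definition 4 applied to the explicit \({\mathbf {F}}\) and \({\mathbf {G}}\) decomposition.

```python
import numpy as np

# Building B*(.) incrementally for B(x) = C x - D conj(E x), using Lemmas 2 and 3:
# linear factors contribute their conjugate transposes, and the conjugation operator
# (F = 0, G = I in Definition 4) contributes an elementwise conjugation.
rng = np.random.default_rng(5)
P, N, Q = 6, 4, 5
C = rng.standard_normal((P, N)) + 1j * rng.standard_normal((P, N))
D = rng.standard_normal((P, Q)) + 1j * rng.standard_normal((P, Q))
E = rng.standard_normal((Q, N)) + 1j * rng.standard_normal((Q, N))

B      = lambda x: C @ x - D @ np.conj(E @ x)
B_star = lambda y: C.conj().T @ y - E.conj().T @ np.conj(D.conj().T @ y)  # via Lemmas 2 and 3

# Cross-check against Definition 4 applied to the explicit decomposition B(x) = F x + conj(G x),
# with F = C and G = -conj(D) E.
F, G = C, -np.conj(D) @ E
B_star_ref = lambda y: F.conj().T @ y + G.conj().T @ np.conj(y)
y = rng.standard_normal(P) + 1j * rng.standard_normal(P)
assert np.allclose(B_star(y), B_star_ref(y))
```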

As a consequence of these lemmas, it is, e.g., possible to replace the real-valued Landweber iteration from Algorithm 1 with the simpler complex-valued iteration given by Algorithm 3.

[Algorithm 3: complex-valued Landweber iteration based on \({\mathcal {A}}(\cdot )\) and \({\mathcal {A}}^*(\cdot )\)]

With infinite numerical precision, Algorithm 3 will produce the exact same sequence of iterates as Algorithm 1, and will therefore have the exact same global convergence guarantees stated previously for Landweber iteration.
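A minimal Python sketch of this complex-valued Landweber iteration is shown below; the conservative step-size bound is an illustrative choice that avoids ever forming \(\tilde{{\mathbf {A}}}\), which is built at the end only to validate the result.

```python
import numpy as np

# Minimal sketch of the complex-valued Landweber update: x_{k+1} = x_k + alpha * A*(b - A(x_k)).
rng = np.random.default_rng(6)
M, N = 12, 4
F = rng.standard_normal((M, N)) + 1j * rng.standard_normal((M, N))
G = rng.standard_normal((M, N)) + 1j * rng.standard_normal((M, N))
b = rng.standard_normal(M) + 1j * rng.standard_normal(M)

A      = lambda x: F @ x + np.conj(G @ x)                    # real-linear forward operator
A_star = lambda n: F.conj().T @ n + G.conj().T @ np.conj(n)  # companion operator (Definition 4)

# Conservative step size: ||A_tilde||_2^2 <= 2(||F||_F^2 + ||G||_F^2), so this alpha
# satisfies 0 < alpha < 2/||A_tilde||_2^2 without ever forming A_tilde.
alpha = 1.0 / (2 * (np.linalg.norm(F, 'fro') ** 2 + np.linalg.norm(G, 'fro') ** 2))

x = np.zeros(N, dtype=complex)
for _ in range(20000):
    x = x + alpha * A_star(b - A(x))

# Validation only (the iteration above never touches the real-valued matrix):
A_tilde = np.block([[F.real + G.real, -F.imag - G.imag],
                    [F.imag - G.imag,  F.real - G.real]])
xt = np.linalg.lstsq(A_tilde, np.concatenate([b.real, b.imag]), rcond=None)[0]
print(np.linalg.norm(x - (xt[:N] + 1j * xt[N:])))            # small
```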

We can make similar modifications to the CG algorithm from Algorithm 2, although we need the following additional property to correctly handle the inner products appearing in the CG algorithm.

Property 7

Consider arbitrary vectors \({\mathbf {p}},{\mathbf {q}} \in {\mathbb {C}}^N\), which are decomposed into their real and imaginary components according to \({\mathbf {p}} = {\mathbf {p}}_r + i{\mathbf {p}}_i\) and \({\mathbf {q}} = {\mathbf {q}}_r + i {\mathbf {q}}_i\), with \({\mathbf {p}}_r,{\mathbf {p}}_i,{\mathbf {q}}_r,{\mathbf {q}}_i \in {\mathbb {R}}^N\). Define \(\tilde{{\mathbf {p}}},\tilde{{\mathbf {q}}} \in {\mathbb {R}}^{2N}\) according to

$$\begin{aligned} \tilde{{\mathbf {p}}} = \begin{bmatrix} {\mathbf {p}}_r \\ {\mathbf {p}}_i \end{bmatrix} \text { and } \tilde{{\mathbf {q}}} = \begin{bmatrix} {\mathbf {q}}_r \\ {\mathbf {q}}_i \end{bmatrix} \end{aligned}$$
(30)

Then \(\tilde{{\mathbf {p}}}^H\tilde{{\mathbf {q}}} = \mathrm {real}({\mathbf {p}}^H{\mathbf {q}})\).
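This property is straightforward to verify numerically, as in the short illustrative check below.

```python
import numpy as np

# Numerical check of Property 7: the real Euclidean inner product of the stacked
# real/imaginary representations equals real(p^H q).
rng = np.random.default_rng(10)
p = rng.standard_normal(5) + 1j * rng.standard_normal(5)
q = rng.standard_normal(5) + 1j * rng.standard_normal(5)
p_tilde = np.concatenate([p.real, p.imag])
q_tilde = np.concatenate([q.real, q.imag])
assert np.isclose(p_tilde @ q_tilde, np.real(np.vdot(p, q)))
```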

Combining this property with the previous lemmas leads to the simple complex-valued iteration for the CG algorithm given by Algorithm 4.

[Algorithm 4: complex-valued conjugate gradient (CG) algorithm based on \({\mathcal {A}}(\cdot )\) and \({\mathcal {A}}^*(\cdot )\)]
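A minimal Python sketch of the presumed form of this complex-valued CG iteration is shown below: it is ordinary CG on the normal equations, expressed through \({\mathcal {A}}(\cdot )\) and \({\mathcal {A}}^*(\cdot )\), with every inner product replaced by \(\mathrm {real}({\mathbf {p}}^H{\mathbf {q}})\) per Property 7; the function and variable names are illustrative.

```python
import numpy as np

# Minimal sketch of a complex-valued CG iteration for min ||A(x) - b||_2^2 with real-linear A:
# ordinary CG on the normal equations, with all inner products taken as real(p^H q).
def cg_real_linear(A, A_star, b, N, n_iter):
    x = np.zeros(N, dtype=complex)
    r = A_star(b - A(x))                 # residual of the (implicit) normal equations
    p = r.copy()
    rs_old = np.real(np.vdot(r, r))
    for _ in range(n_iter):
        q = A_star(A(p))
        gamma = rs_old / np.real(np.vdot(p, q))   # Property 7: real part of the inner product
        x = x + gamma * p
        r = r - gamma * q
        rs_new = np.real(np.vdot(r, r))
        p = r + (rs_new / rs_old) * p
        rs_old = rs_new
    return x

rng = np.random.default_rng(7)
M, N = 12, 4
F = rng.standard_normal((M, N)) + 1j * rng.standard_normal((M, N))
G = rng.standard_normal((M, N)) + 1j * rng.standard_normal((M, N))
b = rng.standard_normal(M) + 1j * rng.standard_normal(M)
A      = lambda x: F @ x + np.conj(G @ x)
A_star = lambda n: F.conj().T @ n + G.conj().T @ np.conj(n)

x_hat = cg_real_linear(A, A_star, b, N, n_iter=2 * N)   # at most 2N iterations in exact arithmetic
print(np.linalg.norm(A_star(b - A(x_hat))))             # near zero at the least-squares solution
```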

While we have only shown complex-valued adaptations of the Landweber and CG algorithms, this same approach is easily applied to other related algorithms like LSQR (Paige and Saunders 1982).

4 Useful relations for common real-linear operators

Before demonstrating the empirical characteristics of our proposed approach, we enumerate some of the most common real-linear \({\mathcal {A}}(\cdot )\) operators and their corresponding \({\mathcal {A}}^*(\cdot )\) operators, which we believe will make the proposed framework easier to use. Such a list is provided in Table 1.

Table 1 Common real-linear \({\mathcal {A}}(\cdot )\) operators and corresponding \({\mathcal {A}}^*(\cdot )\) operators. We also provide expressions for \({\mathcal {A}}^*({\mathcal {A}}(\cdot ))\) in cases where the combined operator takes a simpler form than applying each operator sequentially. In the last two rows, it is assumed that the matrix \({\mathbf {A}} \in {\mathbb {C}}^{M_1 \times N}\), and that the vector \({\mathbf {y}} \in {\mathbb {C}}^M\) is divided into two components \({\mathbf {y}}_1 \in {\mathbb {C}}^{M_1}\) and \({\mathbf {y}}_2 \in {\mathbb {C}}^{M-M_1}\) with \({\mathbf {y}} = \begin{bmatrix} {\mathbf {y}}_1^T&{\mathbf {y}}_2^T \end{bmatrix}^T\). In the last row, we take \({\mathcal {B}}({\mathbf {x}}) \triangleq {\mathbf {C}}{\mathbf {x}} - {\mathbf {D}} \overline{({\mathbf {E}} {\mathbf {x}})}\), with corresponding \({\mathcal {B}}^*({\mathbf {y}}) = {\mathbf {C}}^H{\mathbf {y}} - {\mathbf {E}}^H \overline{({\mathbf {D}}^H{\mathbf {y}})}\). Note that a special case of equivalent complex-valued operators associated with Eq. (7) (with \({\mathbf {B}}\) chosen as the identity matrix) was previously presented in Bydder and Robson (2005), although without the more general real-linear mathematical framework developed in this work

5 Numerical example

To demonstrate the potential benefits of our proposed complex-valued approach, we will consider an instance of the problem described by Eq. (9). In this case, the use of complex-valued operations can lead to both a simpler problem formulation and faster numerical computations.

To address simplicity, it should be apparent by inspection that the process of deriving \(\tilde{{\mathbf {A}}}\) for this case (as given in Eq. (23), and needed for the conventional real-valued iterative computations) was non-trivial and labor-intensive, while the derivation of \({\mathcal {A}}(\cdot )\) and \({\mathcal {A}}^*(\cdot )\) (as given in Table 1, and needed for the proposed new complex-valued iterative computations) was comparatively fast and easy.

To address the computational benefits of the proposed approach, we will consider a specific realization of Eq. (9), in which \({\mathbf {x}} \in {\mathbb {C}}^{1000}\), \({\mathbf {n}} \in {\mathbb {C}}^{20000}\), \({\mathbf {A}} \in {\mathbb {C}}^{20000\times 1000}\), \({\mathbf {C}} \in {\mathbb {C}}^{30000 \times 1000}\), \({\mathbf {D}} \in {\mathbb {C}}^{30000 \times 2000}\), and \({\mathbf {E}} \in {\mathbb {C}}^{2000 \times 1000}\), with the real and imaginary parts of all of these vectors and matrices drawn i.i.d. at random from a Gaussian distribution. We then took \({\mathbf {b}} = {\mathbf {A}}{\mathbf {x}} + {\mathbf {n}}\), and set \(\lambda = 10^{-3}\). For this random problem instance, we find the optimal nonlinear least-squares solution in four distinct ways:

  • Conventional real-valued approach with matrices: We assume that \({\mathbf {A}}\), \({\mathbf {C}}\), \({\mathbf {D}}\), and \({\mathbf {E}}\) are available to us in matrix form, such that it is straightforward to directly precompute the real-valued matrix \(\tilde{{\mathbf {A}}} \in {\mathbb {R}}^{100000\times 2000}\) from Eq. (23). We then use this precomputed matrix directly in iterative linear least-squares solution algorithms like Landweber iteration, CG, and LSQR. Although the form of this \(\tilde{{\mathbf {A}}}\) matrix was complicated to derive, multiplications with the precomputed \(\tilde{{\mathbf {A}}}\) and \(\tilde{{\mathbf {A}}}^H\) matrices within each iteration should be very computationally efficient, particularly since we have taken 4 separate complex-valued matrices \({\mathbf {A}}\), \({\mathbf {C}}\), \({\mathbf {D}}\), and \({\mathbf {E}}\) that were originally specified by a sum total of \(1.12\times 10^8\) complex-valued entries (\(2.24 \times 10^8\) real numbers), and replaced them with a single real-valued matrix specified by only \(2\times 10^8\) real numbers. Every matrix-vector multiplication involving \(\tilde{{\mathbf {A}}}\) or \(\tilde{{\mathbf {A}}}^H\) will therefore require \(2\times 10^8\) real-valued scalar multiplications.

  • Proposed complex-valued approach with matrices: As in the previous case, we assume that \({\mathbf {A}}\), \({\mathbf {C}}\), \({\mathbf {D}}\), and \({\mathbf {E}}\) are available to us in matrix form, which allows us to directly form the \({\mathbf {F}}\) and \({\mathbf {G}}\) matrices corresponding to the complex-valued real-linear formulation of the problem. Specifically, \({\mathbf {F}}\) was formed as

    $$\begin{aligned} {\mathbf {F}} = \begin{bmatrix} {\mathbf {A}} \\ \sqrt{\lambda } {\mathbf {C}} \end{bmatrix} \end{aligned}$$
    (31)

    and \({\mathbf {G}}\) was formed as

    $$\begin{aligned} {\mathbf {G}} = \begin{bmatrix} {{\mathbf {0}}} \\ -\sqrt{\lambda } \, \overline{{\mathbf {D}}}{\mathbf {E}} \end{bmatrix}. \end{aligned}$$
    (32)

    We then used these precomputed matrices to evaluate \({\mathcal {A}}(\cdot )\) and \({\mathcal {A}}^*(\cdot )\) as needed in our proposed complex-valued iterative algorithms. These precomputed \({\mathbf {F}}\) and \({\mathbf {G}}\) matrices are each specified by \(5\times 10^7\) complex-valued entries (i.e., a sum total of \(1\times 10^8\) complex-valued entries or \(2\times 10^8\) real-valued entries considering \({\mathbf {F}}\) and \({\mathbf {G}}\) together). However, since one complex-valued scalar multiplication corresponds to four real-valued scalar multiplications, applying the \({\mathcal {A}}(\cdot )\) or \({\mathcal {A}}^*(\cdot )\) operators to vectors will require \(4\times 10^8\) real-valued scalar multiplications, which is twice the number of multiplications required in the previous case.

  • Conventional real-valued approach with function calls: We assume that we do not have direct access to the \({\mathbf {A}}\), \({\mathbf {C}}\), \({\mathbf {D}}\), and \({\mathbf {E}}\) matrices, but are only given blackbox functions that calculate matrix-vector multiplications with these matrices and their conjugate transposes. As such, we implement matrix-vector multiplication with \(\tilde{{\mathbf {A}}}\) (and similarly for \(\tilde{{\mathbf {A}}}^H\)) naively in each iteration of the conventional iterative linear least-squares solution algorithms, using multiple calls to each of these functions as described in Sect. 2.2. This approach is not expected to be computationally efficient given the large number of function calls, although it is simpler to implement than more advanced approaches that might be developed to exploit redundant computations within Eq. (23).

  • Proposed complex-valued approach with function calls: As in the previous case, we assume that we do not have direct access to the \({\mathbf {A}}\), \({\mathbf {C}}\), \({\mathbf {D}}\), and \({\mathbf {E}}\) matrices, but are only given blackbox functions that calculate matrix-vector multiplications with these matrices and their conjugate transposes. We implement the proposed complex-valued iterative algorithms using the techniques described in Sect. 3, using the expressions for \({\mathcal {A}}(\cdot )\) and \({\mathcal {A}}^*(\cdot )\) given in Table 1. In this case, applying the \({\mathcal {A}}(\cdot )\) or \({\mathcal {A}}^*(\cdot )\) operators to vectors will require \(1.12 \times 10^8\) complex-valued scalar multiplications (equal to the total number of entries in \({\mathbf {A}}\), \({\mathbf {C}}\), \({\mathbf {D}}\), and \({\mathbf {E}}\)), which corresponds to \(4.48 \times 10^8\) real-valued scalar multiplications (slightly more than the proposed complex-valued approach with matrices). A scaled-down illustrative sketch of this approach appears after this list.
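As referenced in the last item above, the following Python sketch illustrates the proposed complex-valued approach with function calls on a scaled-down random instance of Eq. (9) (the sizes, seed, and names are illustrative assumptions, separate from the MATLAB supplementary code); the result is checked against the conventional real-valued solution computed from the consolidated matrices.

```python
import numpy as np

# Scaled-down illustrative instance of Eq. (9); the sizes are placeholders, not the paper's.
rng = np.random.default_rng(8)
N, M1, M2, P = 50, 120, 150, 80
lam = 1e-3
Amat = rng.standard_normal((M1, N)) + 1j * rng.standard_normal((M1, N))
C = rng.standard_normal((M2, N)) + 1j * rng.standard_normal((M2, N))
D = rng.standard_normal((M2, P)) + 1j * rng.standard_normal((M2, P))
E = rng.standard_normal((P, N)) + 1j * rng.standard_normal((P, N))
x_true = rng.standard_normal(N) + 1j * rng.standard_normal(N)
b = Amat @ x_true + 0.1 * (rng.standard_normal(M1) + 1j * rng.standard_normal(M1))

# "Black-box" complex-valued function calls only (no consolidated matrices):
def A_op(x):       # A(x) = [A x ; sqrt(lam) (C x - D conj(E x))]
    return np.concatenate([Amat @ x, np.sqrt(lam) * (C @ x - D @ np.conj(E @ x))])

def A_star_op(y):  # companion operator, built via Table 1 / Lemmas 2 and 3
    y1, y2 = y[:M1], y[M1:]
    return (Amat.conj().T @ y1
            + np.sqrt(lam) * (C.conj().T @ y2 - E.conj().T @ np.conj(D.conj().T @ y2)))

rhs = np.concatenate([b, np.zeros(M2, dtype=complex)])

# Proposed complex-valued CG (as in Algorithm 4), using only the function calls:
x = np.zeros(N, dtype=complex)
r = A_star_op(rhs - A_op(x))
p = r.copy()
rs = np.real(np.vdot(r, r))
for _ in range(2 * N):
    q = A_star_op(A_op(p))
    gamma = rs / np.real(np.vdot(p, q))
    x = x + gamma * p
    r = r - gamma * q
    rs_new = np.real(np.vdot(r, r))
    p = r + (rs_new / rs) * p
    rs = rs_new

# Reference: conventional real-valued solution via the consolidated matrices of Eqs. (31)-(32).
Fm = np.vstack([Amat, np.sqrt(lam) * C])
Gm = np.vstack([np.zeros((M1, N)), -np.sqrt(lam) * np.conj(D) @ E])
At = np.block([[Fm.real + Gm.real, -Fm.imag - Gm.imag],
               [Fm.imag - Gm.imag,  Fm.real - Gm.real]])
xt = np.linalg.lstsq(At, np.concatenate([rhs.real, rhs.imag]), rcond=None)[0]
print(np.linalg.norm(x - (xt[:N] + 1j * xt[N:])))  # small: the two approaches agree
```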

For the sake of reproducible research, Matlab code corresponding to this example is included as supplementary material.

Fig. 1 Results for Landweber iteration. The plots show the total number of multiplications, the normalized cost function value (normalized so that the initial value is 1), the computation time in seconds, and the relative difference between the solution from the conventional method with matrices and the solutions obtained with the other methods

Fig. 2 Results for the conjugate gradient algorithm. The plots show the total number of multiplications, the normalized cost function value (normalized so that the initial value is 1), the computation time in seconds, and the relative difference between the solution from the conventional method with matrices and the solutions obtained with the other methods

For each case, we ran 50 iterations of Landweber iteration and 15 iterations of CG and LSQR in MATLAB 2018b, on a system with an Intel Core i7-8700K 3.70 GHz CPU. For each approach, each algorithm, and at each iteration, we computed (1) the total cumulative number of real-valued scalar multiplications (with 1 complex-valued scalar multiplication counted as 4 real-valued scalar multiplications) used by the algorithm thus far; (2) the cost function value from Eq. (9) using the current estimate (either \({\mathbf {x}}_k\) or \(\tilde{{\mathbf {x}}}_k\)); (3) the total computation time in seconds; and (4) the relative \(\ell _2\)-norm difference between the \({\mathbf {x}}_k\) value estimated by the proposed method with function calls and the values estimated by the other methods, where we define the relative \(\ell _2\)-norm difference between arbitrary vectors \({\mathbf {p}}\) and \({\mathbf {q}}\) as \(\Vert {\mathbf {p}}-{\mathbf {q}}\Vert _2/\Vert \frac{1}{2}{\mathbf {p}}+\frac{1}{2}{\mathbf {q}}\Vert _2\). To minimize random fluctuations in computation speed due to background processing, the computation times we report represent the average over 15 identical trials.

Results for Landweber iteration and the CG algorithm are reported in Figs. 1 and 2, respectively. The results for LSQR were quite similar to the results for CG (as expected theoretically; Paige and Saunders 1982), and have been omitted. The results confirm that, as should be expected from the theory, all of the different approaches yield virtually identical cost function values and virtually identical solution estimates \({\mathbf {x}}_k/\tilde{{\mathbf {x}}}_k\) at each iteration for each of the different algorithms. There are some very minor differences on the order of \(10^{-15}\), which can be attributed to numerical effects resulting from finite-precision arithmetic. In terms of computational complexity, we observe that the matrix-based approaches are generally associated with fewer multiplications than the implementations that use function calls, which should be expected because the matrix-based approaches were able to precompute simpler consolidated matrix representations [e.g., the \(\tilde{{\mathbf {A}}}\) matrix from Eq. (21) or the \({\mathbf {F}}\) and \({\mathbf {G}}\) matrices from Eqs. (31) and (32)] that were not available to the function call approaches.

The proposed approaches required a moderate number of multiplications, intermediate between the conventional approach with matrices (which had the fewest multiplications, as expected based on our previous discussion of the efficiency of the conventional approach with precomputation) and the conventional approach with function calls (which had the most multiplications). However, in terms of actual computation time, the conventional approach with function calls was much slower than any of the other three methods, while the other three methods were all similar to one another. It is perhaps surprising that the computation times are not directly proportional to the numbers of multiplications, although this discrepancy is likely related to MATLAB's use of efficient parallelized matrix multiplication libraries. Importantly, we observe that both variations of the proposed approach are quite fast, with computation times similar to the conventional real-valued approach with matrices (which, as mentioned, was expected to have excellent computational efficiency). There was a negligible difference between the computation times associated with the matrix and function-call implementations of the proposed method, which was definitely not the case for the conventional approaches. In terms of implementation, the proposed approach with function calls was the easiest to implement, since it did not require us to derive the forms of any special matrices like \(\tilde{{\mathbf {A}}}\), \({\mathbf {F}}\), or \({\mathbf {G}}\); we could instead work directly with the individual original matrices \({\mathbf {A}}\), \({\mathbf {C}}\), \({\mathbf {D}}\), and \({\mathbf {E}}\).

6 Conclusion

This work proposed a new approach to solving nonlinear least-squares problems involving real-linear operators. The new approach allows the use of the original complex-valued operators without transforming them into an unwieldy real-valued form. Theoretically, the approach produces iterates identical to those of the conventional real-valued transformation, but with much simpler implementation options and potentially much faster computations. We expect the proposed approach to be valuable for solving general complex-valued nonlinear least-squares problems involving real-linear operators. Note that the proposed complex-valued approach is also an integral but previously-undescribed component of the most recent version of an open-source MRI reconstruction software package released by the authors (Kim and Haldar 2018).