An efficient source wavefield reconstruction scheme using single boundary layer values for the spectral element method

In the adjoint‐state method, the forward‐propagated source wavefield and the backward‐propagated receiver wavefield must be available simultaneously either for seismic imaging in migration or for gradient calculation in inversion. A feasible way to avoid the excessive storage demand is to reconstruct the source wavefield backward in time by storing the entire history of the wavefield in perfectly matched layers. In this paper, we make full use of the elementwise global property of the Laplace operator of the spectral element method (SEM) and propose an efficient source wavefield reconstruction method at the cost of storing the wavefield history only at single boundary layer nodes. Numerical experiments indicate that the accuracy of the proposed method is identical to that of the conventional method and is independent of the order of the Lagrange polynomials, the element type, and the temporal discretization method. In contrast, the memory‐saving ratios of the conventional method versus our method is at least N when using either quadrilateral or hexahedron elements, respectively, where N is the order of the Lagrange polynomials used in the SEM. A higher memory‐saving ratio is achieved with triangular elements versus quadrilaterals. The new method is applied to reverse time migration by considering the Marmousi model as a benchmark. Numerical results demonstrate that the method is able to provide the same result as the conventional method but with about 1/25 times lower storage demand. With the proposed wavefield reconstruction method, the storage demand is dramatically reduced; therefore, in‐core memory storage is feasible even for large‐scale three‐dimensional adjoint inversion problems.


Introduction
The adjoint methods, such as least-squares reverse time migration (LSRTM) (Tang YX, 2009;Dai W et al., 2011;Wong et al., 2015;Liu XJ et al., 2016;Liu YS et al., 2016) and full waveform inversion (FWI) (Tarantola, 1984;Pratt and Shipp, 1999;Tromp et al., 2005;Fichtner et al., 2006aFichtner et al., , 2006bPlessix, 2009;Virieux and Operto, 2009), are powerful tools in geophysics that allow the computation of the first derivative (or gradient) of a physical observation or an associated objective function, with respect to its dependency parameter, to be achievable (Fichtner et al., 2006a(Fichtner et al., , 2006bPlessix, 2006;Virieux and Operto, 2009). In adjoint methods, the computation of the gradient requires both the forward-propagated source wavefield and the backward-propagated receiver wavefield (or adjoint wavefield) to be available simultaneously (Tarantola, 1984). The source and receiver wavefields are extrapolated separ-ately along opposite directions, which means that the source wavefield needs to be saved or reproduced (Dussaud et al., 2008;Feng B and Wang HZ, 2012). For small-sized problems, the source wavefields can simply be stored in memory and the gradient can be computed on the fly, which is the simplest way to access the source and receiver wavefields synchronously (Whitmore and Lines, 1986;Sun WJ and Fu LY, 2013). However, the storage and input/output (I/O) requirements for this method become increasingly demanding for data from elastic, multicomponent, and three-dimensional (3-D) surveys (Nguyen and McMechan, 2014); therefore, this method is not feasible because the memory cost is unaffordable. The above wavefield preservation method (i.e., the storing of an extrapolated wavefield) thus suffers a stringent memory bottleneck for large-scale 3-D imaging applications. A popular and feasible method of making the source wavefield accessible during the receiver wavefield extrapolation is wavefield reconstruction, which attempts to reproduce or reconstruct an extrapolated wavefield (Gauthier et al., 1986;Dussaud et al., 2008;Vasmel and Robertsson, 2016). method with a minimal storage cost rests on the random boundary condition (Clapp, 2009;Fletcher and Robertsson, 2011;Shen XK and Clapp, 2015;Shi Y and Wang YH, 2016). Wave propagation is reversible in random media (the boundary layers); therefore, the wavefield can theoretically be reconstructed by requiring only the last two time-slice wavefield snapshots. However, the main drawback is that this may induce additional noise in the gradient of the adjoint method, for example, in the migration image (Liu HW et al., 2013;Shi Y and Wang YH, 2016). Furthermore, it usually requires relatively thick boundary layers to obtain a result without a strong boundary reflection. Symes (2007) proposed an efficient reconstruction of the source wavefield that involved using the optimal checkpointing method, in which wavefield snapshots at several time steps are stored and later used as initial data to restart the wavefield simulation (Anderson et al., 2012;Nguyen and McMechan, 2014). This method uses a little more storage than the random boundary condition, and it can be used even in dissipated media without a loss of accuracy (Yang PL et al., 2016a). The most popular reconstruction method among the wavefield reconstruction methods uses the wavefields at the last two time steps and the forward simulation boundary values to recompute the source wavefield backward in time (Gauthier et al., 1986;Dussaud et al., 2008;Tang C and Wang D, 2012;Feng B et al., 2013). This method gains in efficiency by compensating for the time-consuming I/O demands with the recomputing cost. Many variations of this approach have been developed. The only difference is the number of stored boundary values. Feng B and Wang HZ (2012) stored wavefield values in only a single layer near the boundary to approximately reconstruct the source wavefield at the price of losing spatial accuracy near the boundaries. However, the order of the finite-difference (FD) stencil was only degree 2 near the boundary, which cannot reproduce the source wavefield accurately. Especially in 3-D, this method cannot meet the accuracy requirement because the wave propagation is comparatively complex.
An accurate wavefield reconstruction method, namely, the standard method, reproduces the source wavefield in the reverse time direction by storing the boundary values with a thickness of a half-number of the FD stencil as the boundary condition and by storing wavefields at the last two states as the initial condition (Gauthier et al., 1986;Dussaud et al., 2008;Tang C and Wang D, 2012;Nguyen and McMechan, 2014). In this study, the boundary values refer to the wavefield values in perfectly matched layers (PMLs) (Berenger, 1994;Komatitsch and Tromp, 2003;Liu YS et al., 2014). Even though this method can perfectly reconstruct the source wavefield, the memory requirements may be unaffordable for a typical workstation (Tan SR and Huang LJ, 2014). To mitigate this situation, Tan SR and Huang LJ (2014) developed an extrapolation-based wavefield reconstruction scheme for the staggered grid method. In this method, high-order polynomials are constructed to extrapolate the missing boundary values. It is able to offer high accuracy with fewer storage requirements than the standard method, although it still needs to store several layers of wavefield values depending on the length of the FD stencil. To further reduce the storage demand, Liu SL et al. (2015) proposed a source wavefield reconstruction method that stores only one wavefield value as a linear combination of wavefields in the boundary region. They used an optimization technique in the Fourier domain to obtain the coefficients for this linear combination, and they presented an implementation for a regular grid. This method, which is less accurate than the standard method (Liu SL et al., 2015), essentially adopts an asymmetric FD operator, that is, a longer operator toward the PMLs domain and a shorter operator toward the computational domain. Recently, Vasmel and Robertsson (2016) presented an alternative scheme that adopts the method of multiple point sources. Their method reconstructs the source wavefield with minimal memory requirements by using a combination of monopole and dipole sources on an injection surface surrounding the model. It can accurately reproduce the source wavefield at the cost of storing one or two surface records. The accuracy of reconstruction is independent of the order of spatial accuracy of the FD stencil. Following the ideas of Tan SR and Huang LJ (2014) and Sun WJ and Fu LY (2013), Yang PL et al. (2016b) recently proposed a method that stores only the boundary values according to the Nyquist sampling principle. To obtain the boundary values at each time step, the missing values that are not stored are reproduced by interpolation methods. This method can dramatically decimate the saved boundary values without a significant loss of accuracy (Yang PL et al., 2016b).
In recent years, the SEM has becoming a popular method of addressing regional and global problems, especially because the open-source package SPECFEM (Komatitsch and Vilotte, 1998;Tromp, 1999, 2002a, b ) can handle complex geometry through accurate numerical calculations of the wavefield (Komatitsch and Vilotte, 1998;Komatitsch and Tromp, 1999). Similarly, advances in numerical methods combined with developments in high-performance computing have enabled unprecedented simulations of seismic wave propagation in realistic 3-D global Earth models by using the SEM (Komatitsch and Tromp, 2002a, b ;Chaljub et al., 2003;Chaljub and Valette, 2004;Capdeville and Marigo, 2007;Peter et al., 2011). Conversely, by virtue of the adjoint-state method (Tarantola, 1984;Pratt and Shipp, 1999;Tromp et al., 2005Tromp et al., , 2008Plessix, 2006;Liu QY andTromp, 2006, 2008;Virieux and Operto, 2009), the gradient can be computed with respect to the functionals at a low cost by modeling the wavefield twice. Thus, the adjoint tomography or inversion provides new opportunities for improving images of the Earth's interior (Fichtner et al., 2006a(Fichtner et al., , b, 2009Tape et al., 2007Tape et al., , 2010Bozdağ et al., 2011Bozdağ et al., , 2016Lee et al., 2014;Chen et al., 2015;Zhu HJ et al., 2015). For these applications (as a special case of adjoint inversion), simultaneously accessing the forward-propagated source wavefield and backward-propagated receiver wavefield cannot be avoided. A favorable and feasible solution is one that reproduces the source wavefield backward in time. To apply the adjoint inversion to large-scale 3-D study cases efficiently, a memory-efficient source wavefield reconstruction needs to be developed.
For the adjoint tomography application in SEM, some source wavefield reconstruction methods that can be derived directly from the FD method have been implemented. The most direct way of reconstructing the source wavefield with the SEM is that one store the last two time-slice wavefield values and the wavefield information in the PMLs (with reference to the conventional method). However, the nonuniform spacing of the interpolation points for algebraic polynomials imposes a tight constraint on the time-step compared with a uniform spacing grid (Carcione and Wang PJ, 1993). As a consequence, the Courant-Friedrichs-Lewy (CFL) numbers decrease exponentially with the increasing degree of Lagrange polynomials (Liu SL et al., 2014;Sen, 2007, 2010;Liu SL et al., 2017b) (Figure 1). This means that the SEM requires many more time steps for a given problem compared with the FD method (uniform grid). This dramatically increases the storage amount of the boundary values used for the source wavefield reconstruction. Likewise, the excessive I/O operations significantly reduce its efficiency. For these reasons, a costefficient source wavefield reconstruction method is needed that will minimize the storage cost for the SEM. Recently, Komatitsch et al. (2016) introduced the checkpointing algorithm to the source wavefield reconstruction of the SEM. During forward propagation of the source wavefield, one saves checkpointing or restart files to disk every few hundred or thousand time steps. During backward propagation of the adjoint wavefield, one can forward recompute the source wavefield starting from the previous restart file by reading back from disk and storing the wavefields between two adjacent restart files in memory. Although this method can be applied to dissipative media, it still requires a huge amount of memory to save the wavefields between the adjacent restart time instants. Recently, Liu SL et al. (2017a) proposed a memory-efficient storage method at the interface of teleseismic wavefield modeling with a hybrid method that significantly reduces the storage demand.
To save the storage cost of SEM-based adjoint inversion, we first analyze the difference between the FD method and the SEM, bearing in mind that the Laplace operator is pointwise local in the former, whereas it is elementwise global in the latter. Because our goal is to develop a novel efficient wavefield reconstruction algorithm for the SEM, here we propose a method that requires only wavefield values stored in a single boundary layer to accurately reconstruct the source wavefield. For the SEM, the accumulated acceleration values at the boundary nodes depend on the nodes in the computational domain and the PMLs domain. When the wavefield information in the PMLs is not available, the accumulated acceleration at the boundary nodes is incorrect. To accurately reconstruct the source wavefield, the wavefields at the boundary nodes need to be replaced by the corresponding stored correct wavefield values at each time step of the backward extrapolation of the source wavefield. In general, this reconstruction method shares similar ideas with the method proposed by Liu SL et al. (2017a) in teleseismic wavefield modeling using the hybrid method. The storage cost is comparable to that of the method proposed by Liu SL et al. (2015) using the FD method. However, unlike the latter, our reconstruction method does not involve any optimization of coefficients as in the method by Liu SL et al. (2015), nor does it require an extrapolation step as described by Tan SR and Huang LJ (2014) and Yang PL et al. (2016b). We then use the Marmousi model to demonstrate its effectiveness and analyze its accuracy and efficiency. Finally, the reverse time migration (RTM) images obtained by the proposed method are shown and compared with the result obtained by the conventional method.

Acoustic Wave Equation
We present here a short exposition of the discrete formulation of the SEM that is needed in the following sections (Komatitsch and Vilotte, 1998;Liu YS et al., 2017a). We start from the acoustic wave equation in an isotropic heterogeneous medium, which is where p(x, t) is the acoustic pressure wavefield at spatial location x and time t; c(x) is the velocity structure; f(x, t) is the source term; and is the Laplace operator. Multiplying equation (1) by the time-independent test function w, integrating by parts, and applying the Neumann boundary condition on boundaries, we obtain the variational form of the wave equation (1): where the repeated subscripts represent summation over the affected variables and Ω is the computational domain. To numerically solve the integral equation (2), the computational domain Ω is decomposed into N e nonoverlapping quadrilateral or triangular elements Ω e in two dimensions (physical elements). Each of these physical elements can then be transformed into a unit or reference element. More details can be found in Komatitsch and Vilotte (1998) and Liu YS et al. (2017a).
On each reference element, the pressure field and test function can be expressed as where ϕ i (x) is the ith shape function or Lagrange function on the reference element; M N is the number of interpolation nodes on each element; and N is the order of the Lagrange polynomials. Substituting equation (3) into (2), we obtain the following (and well-known) ordinary difference equation, p where is the second-order temporal derivative of the pressure vector p with respect to time; M is the mass matrix, and K is the stiffness matrix, whose respective entities are and f is the load vector.
In the Lagrange-type SEM, the collocated interpolation and integration nodes are always adopted. As a consequence, the mass matrix is usually a diagonal matrix, which can substantially improve the numerical efficiency. After discretizing the second-order temporal derivative by using the leapfrog scheme, we obtain the following recursive equation: where n is the time index. Indeed, many other temporal discretization methods can be adopted (Newmark, 1959;Martin et al., 2010); however, for simplicity, we use the leapfrog format.

Source Wavefield Reconstruction
In the adjoint inversion, the source wavefield will be accessible during the time-reversal extrapolation of the receiver wavefield. However, the source and receiver wavefields are extrapolated in opposite temporal directions. To solve this dilemma, a popular method is to use the wavefields at the last two time steps and the boundary values of the forward simulation to reconstruct the source wavefield backward in time (Gauthier et al., 1986;Dussaud et al., 2008). Although this method has received considerable attention in the FD community, it faces a challenge in the SEM community. For the SEM, the CFL numbers decrease faster than those for the FD method because of the nonuniform spacing of the interpolation points (Carcione and Wang PJ, 1993). The smaller CFL number means that many more time steps are required; therefore, the memory size increases rapidly with the increasing degree N of the Lagrange polynomials. Practical strategies are thus needed to minimize the memory requirements for reconstruction of the source wavefield.
In the two-dimensional (2-D) case, as shown in Figure 1, the CFL number of the FD method decreases slowly with the increasing order N of the operator, whereas the CFL numbers of the quadrilateral SEM (QSEM) and triangular SEM (TSEM) decrease exponentially as N increases Sen, 2007, 2010;Liu SL et al., 2017b). To compare the amount of memory required in each case, we adopt a simple model for analysis. The adopted model consists of only two squares that represent the computational domain and the PMLs domain, respectively (see Figure 2). The gray line denotes the boundary that separates the domains from each other. We set the maximum modeling time to be T = 4 s. For the 2Nth order FD method, the number of layers in the PMLs domain to be stored is N; the memory size of the standard reconstruction scheme (i.e., storing the boundary values with a thickness of a half-number of the FD stencil as a boundary condition) is N × (N + 1) × nt when disregarding the memory requirements of the last two-slice wavefields, where nt is the total number of time steps. For the Nth QSEM and TSEM, we assume that the boundary layers consist of only a single element so that the number of nodes on the edges is identical to the number of layers in PMLs of the FD method (see Figure 2b and 2c). As shown in Figure 2b, for the Nthorder QSEM, the memory size of the conventional wavefield reconstruction scheme is also N × (N + 1) × nt. For the Nth-order TSEM, the total number of nodes in the PMLs domain is 2M N -(N -1), where the factor 2 is due to the number of triangles usually being twice the number of quadrilaterals and N -1 is the duplicate number of nodes on the common edge (excluding the two end points). Therefore, the memory size of the conventional wavefield reconstruction scheme is [2M N -(N -1) -(N + 1)] × nt, where N + 1 is the number of nodes on the edge (solid gray circles on the gray line in Figure 2c). For each method, the number nt is different because it is calculated according to the respective CFL number. As shown in Figure 3, even for this very small model, the memory requirements of the conventional source wavefield reconstruction scheme for the SEM methods are several orders of magnitude larger than those for the FD method. Therefore, it is important to develop a memory-efficient wavefield reconstruction algorithm for the SEM.
In the FD method, the Laplace operator depends on the wavefield values at the FD stencil (by solving the strong or differential form of the wave equation), whereas in the SEM, the Laplace operator depends on the wavefield values in the entire element (by solving the weak or integral form of the wave equation). Thus, the Laplace operator of the FD method is pointwise local, whereas the Laplace operator of the SEM is elementwise global. Because of this property of the Laplace operator, we are able to develop an efficient wavefield reconstruction method for the SEM. To carry out the analysis, we assume the model consists simply of two squares. As shown in Figure 4, we decompose the model into two domains, namely, the computational domain and the PMLs domain. The former consists of element 1 and the latter consists of element 2 ( Figure 4). The model has two types of nodes: the inner nodes, denoted by the cross and plus symbols, and the boundary nodes, denoted by the solid circles and open circles. In the SEM, the global mass and stiffness matrices are the accumulated contributions from all the neighboring elements. In both elements, the accumulated acceleration values at the inner nodes (represented by the cross and plus symbols) depend only on the contributions from the nodes in the current element, whereas the accumulated acceleration values at the boundary nodes (represented by circles) depend on the contributions from the nodes in the current element and connected elements. Specifically, the accumulated acceleration values at the common nodes (solid gray circles in Figure 4) are , which are the summation of the contributions from elements 1 and 2.
In the backward propagation of the source wavefield, we use the following equation to extrapolate the source wavefield: . Memory requirements of the conventional wavefield reconstruction method for the finite difference method (solid black line), the quadrilateral spectral element method (solid gray line), and the triangular spectral element method (dashed black line). The conventional wavefield reconstruction method reconstructs the source wavefield by using the last two-slice wavefield snapshots plus the wavefield history in the perfectly matched layers (PMLs), whereas the proposed wavefield reconstruction method reproduces the source wavefield by using the last two-slice wavefield snapshots plus the wavefield history at the boundary nodes. For comparison, we assume that the PMLs consist of only one element and disregard the storage cost of the two-slice wavefield snapshots.

PMLs domain
Computational domain Boundary ② ① In this process, and in the computational domain are always available. Given that the accumulated acceleration at the inner nodes depends only on the contribution from the nodes in the current element, the updated wavefield values at these nodes (plus symbols in Figure 4) are always correct even though the wavefield values in the PMLs (i.e., the solid circles and cross symbols in Figure 4) are unavailable. This is because the accumulated acceleration at the boundary nodes depends on nodes in the current element and connected elements, that is, . When the wavefield values in the PMLs are not available, the accumulated acceleration at these boundary nodes (solid gray circles shown in Figure 4) cannot be reproduced correctly. Therefore, the updated wavefield values at these boundary nodes (solid gray circles in Figure 4, and nodes on the thick solid line in Figure 5b) are not exact. To correct the incorrect wavefields at these boundary nodes, we can replace the boundary wavefield values with the corresponding stored correct values to reconstruct the source wavefield accurately (i.e., wavefields values at the nodes of the outer boundary of the computational domain shown in Figure 5). We can then explicitly and accurately extrapolate the wavefield at time n-1 by using the wavefields at times n and n+1. Thus, only the wavefield snapshots at the last two time slices and the wavefield values at the boundary nodes need to be saved during the forward propagation of the source wavefield. In the backward reconstruction of the source wavefield, the corresponding reproduced incorrect values should then be replaced with the stored correct ones. In comparison with the total PMLs (Figure 5a), the memory requirement for the wavefield values at the boundary nodes ( Figure  5b) can be dramatically reduced. Although the analysis is conducted on two elements, its application to more elements is straightforward. The source wavefield reconstruction method is illustrated in Table 1. In the next sections, we perform a series of numerical tests to verify the effectiveness of this novel wavefield reconstruction method.

Efficiency
In the FD community, the source wavefield is popularly reconstructed by using the last two time-slice wavefield snapshots plus the wavefield history in the PMLs (Gauthier et al., 1986;Dussaud et al., 2008). This scheme can be transplanted directly to the SEM. For an enlightening comparison, we reconstruct the source wavefield by using the following two schemes: (1) storing the last two time-slice wavefield snapshots plus the entire history of the wavefield in the PMLs, which is the conventional method directly from the FD community; or (2) storing the last two time-slice wavefield snapshots plus the complete history of the wavefield only at the boundary nodes, which is the new source wavefield reconstruction method. For purposes of discussion, we adopt the model shown in Figure 2. We can then define the following memory-saving ratio (MSR) for a comparison of efficiency (disregarding the last two-slice wavefields): where M PML denotes the memory requirement of the wavefield reconstruction method with the wavefield history in the PMLs (first working scheme) and M bound denotes an analogous memory requirement with the wavefield history at the boundary nodes (second working scheme). The nodes that need to be stored to reconstruct the source wavefield for the FD method, the QSEM, and the TSEM are shown in Figure 2. For the 2-D FD method and the QSEM, the total number of nodes in the PMLs domain is (Figure 2a and 2b), whereas the number of nodes that needs to be stored is for the conventional source wavefield reconstruction scheme. Therefore, the MSR for the FD method and the QSEM is . For the TSEM, the total number of nodes in the PMLs domain is , whereas the numbers of nodes that need to be stored are and for the conventional and proposed source wavefield reconstruction schemes, respectively. As a consequence, the MSR for the TSEM is . For the 3-D FD method and the hexahedral SEM, the total number of nodes in the PMLs domain is and the numbers of nodes that need to be stored are and for the conventional and proposed source wavefield reconstruction schemes, respectively. Therefore, it still leads to MSRs of N. Certainly, this is just the minimum MSR  Forward Propagation for t = 0 to t = t max forward propagate source wavefield using equation (7) add source function save wavefield at boundary nodes end for Backward Propagation for t = t max to t = 0 subtract time-reversed source function backward propagate source wavefield using equation (8) replace wavefield at boundary nodes with stored ones end for Earth and Planetary Physics doi: 10.26464/epp2019035 347 because at least two elements are generally adopted to suppress spurious reflections from artificial boundaries well (Komatitsch et al., 2003). In Figure 6, with the identical parameters illustrated in Figure 2, we compare the memory requirements for the QSEM and TSEM when using the conventional and proposed source wavefield reconstruction schemes. We can observe that the memory sizes of the source wavefield reconstruction schemes for the QSEM and TSEM can be significantly reduced when the new reconstruction scheme is adopted. As the degree of the polynomials increases, the memory-saving size increases. In Figure 7, we show the MSRs versus the order of the Lagrange polynomials for the SEM in quadrilateral, triangular, and hexahedral elements, respectively. Notably, the MSRs for the SEM in both the quadrilateral and hexahedral elements are at least 1, whereas the MSRs for the SEM in triangular elements are at least 2. For the usually adopted fourth-order degree polynomials (Komatitsch and Vilotte, 1998), the MSR for the former is 4, whereas the MSR for the latter is about 5.6. This illustrates that the developed wavefield reconstruction method is more appealing for higher order methods.

Effectiveness
Using the elementwise global property of the Laplace operator in the SEM, we can reconstruct the source wavefield by saving the wavefield values at the boundary nodes plus the wavefield snapshots at the last two time slices (Figure 5b). In this section, we report on a series of numerical tests carried out to check the effectiveness of the proposed wavefield reconstruction method. The dimensions of the adopted model size are 3.82km × 1.21km (Marmousi model, Figure 8). An adaptive triangular mesh made by using the open-source package Triangle (http://www.cs.cmu. edu/~quake/triangle.html) discretizes the seismic velocity model. The seismic source, denoted by a cross symbol, is located at the point (1.91 km, −0.05 km) and is modeled by a Ricker wavelet with a dominant frequency of 22 Hz. The duration of wavefield propagation is 3.2 s. The corresponding CFL number determines the time interval. The inverted triangle denotes a receiver located at the point (3.8 km, −0.1 km) that records the wavefield for further analysis. In addition, 383 equally spaced receivers are deployed exactly on the surface to record forward-and backwardpropagated source wavefields.
In this example, we handle the top boundary as a PMLs boundary condition. For the triangular mesh shown in Figure 8, we apply the cubature points-based TSEM (Liu YS et al., 2017a) to numerically solve the acoustic wave equation (1). In each element, the cubature points are both the interpolation nodes and the integration nodes. This also leads to a diagonal mass matrix and exhibits an accuracy comparable to that of the QSEM (Liu YS et al., 2017a). In this test, we adopt fourth-order Lagrange polynomials in each triangular element. According to its CFL number (i.e., 0.0553), the maximum allowable time interval for calculation is 0.2 ms to achieve stable wavefield extrapolation. Throughout this study, the thickness of the PMLs is assumed to be 0.1 km. Figure 9 shows (from left to right) the forward-propagated wavefield (left), the backward-propagated wavefield (central), and the residual multiplied by 10 4 (right) at time steps of 0.1 s (top), 0.5 s (middle), and 1.0 s (bottom). The reconstructed wavefields are almost identical to the theoretical ones (i.e., the forward-propagated wavefield). Although the residuals (obtained by subtracting the values included in the central column from those plotted in the left column) are enlarged by 10,000 times, no coherent events are observed. This indicates that the source wavefield is perfectly reproduced, which proves the effectiveness of the novel wavefield reconstruction method. To check whether the error level is reasonable, we have also reconstructed the wavefield by using the conventional method (i.e., the first scheme mentioned in the previous section). The corresponding reconstructed wavefields proposed (dashed lines) wavefield reconstruction methods for the quadrilateral spectral element method (black lines) and the triangular spectral element method (gray lines). The conventional wavefield reconstruction method reconstructs the source wavefield by using the last two-slice wavefield snapshots plus the wavefield history in the perfectly matched layers (PMLs), whereas the proposed wavefield reconstruction method reproduces the source wavefield by using the last two-slice wavefield snapshots plus the wavefield history at the boundary nodes. For comparison, we assume that the PMLs consist of only one element and disregard the storage cost of the two-slice wavefield snapshots. and scaled residuals at time instants of 0.1, 0.5, and 1.0 s are shown in Figure 10. It can be seen that the wavefields reconstructed with the conventional method (Figure 10b, 10e, and 10h) are comparable to those reconstructed by the novel method ( Figure  9b, 9e, and 9h). In particular, the scaled residuals in one case and another (Figure 9c, 9f, 9i, 10c, 10f, and 10i) are almost identical. This indicates that the accuracy of the developed wavefield reconstruction method is comparable to that of the conventional method.
To verify whether the reconstructed wavefields are exact at each time instant, here we show the seismograms recorded by the receivers deployed on the surface. Figure 11 shows the seismograms produced by the forward-propagated wavefield ( Figure  11a) and the backward-propagated wavefield (Figure 11b), together with the scaled residual ( Figure 11c). Clearly, the common shot gathers obtained by the forward-propagated wavefield (Figure 11a) are identical to those provided by the backward-propagated wavefield (Figure 11b). The residuals are exactly zero ( Figure  11c) because the stored wavefield history on the surface restricts the wavefield reconstruction.
In this case, the stored boundary values (at the four edges) can completely restrict the wavefield reconstruction. To exclude this effect, we also display the wavefield history at a fixed receiver located at the position (3.8 km, −0.1 km). Figure 12 shows the records of the forward-propagated wavefields (thick black line) and the records of the backward-propagated wavefields obtained by the new (blue dashed and dotted line) and conventional (red dashed line) wavefield reconstruction methods. The residuals multiplied by a factor of 10 4 (cyan and magenta solid lines) are also superimposed on the records. The errors can be observed to Reconstruction with the boundary nodes In this experiment, there are 22,616 triangular elements whose minimum and maximum edge lengths are 11.86 and 67.16 m, respectively. The number of cubature points is 249,809 in total. The Reconstruction with the PMLs Figure 10. Same as in Figure 9. The backward-propagated wavefields are reconstructed by using the last two time-slice wavefield snapshots and the wavefield history in the perfectly matched layers (PMLs) (see Figures 4 and 5), i.e., the conventional algorithm. In the right column, the scaled residuals are computed by subtracting the values included in the central column from those plotted in the left column and then multiplying by a factor of 10 4 .

Forward propagation Backward propagation Residual×10 4
Reconstruction with the boundary nodes number of nodes in the PMLs is 58,219, whereas the number of boundary nodes is 1,932. The memory requirements of the two methods are 117.93 MB and 3.47 GB, respectively; therefore, the MSR is ~30, which significantly reduces the storage demand. Even though the adopted model is very small, the conventional method still requires several gigabytes because of the nonuniform spacing of the interpolation points (Carcione and Wang PJ, 1993).

Tests with Different Degrees of Polynomials
In the previous section, we have analyzed the proposed reconstruction method by using the TSEM and a fixed order of the Lagrange polynomials. In this section, assuming the same physics and geometry as before, we check its effectiveness with different degrees of the Lagrange polynomials. We adopt polynomials of degrees 3 and 5 for comparison. According to the CFL numbers (i.e., 0.105 and 0.024), the maximum allowable time intervals for stable calculation are 0.3 and 0.1 ms, respectively. The adaptive triangular mesh (not included here) used for seismic modeling is similar to the one shown in Figure 8. The top boundary is set as a free surface boundary condition. Because the new wavefield method is comparable to the conventional one, the reconstructed wavefields obtained by the conventional method are not shown. Figure 13 shows the forward-propagated and backward-propagated wavefields at a time instant of 0.5 s, once calculated by the TSEM (Liu YS et al., 2017a) with polynomials of degrees 3 and 5. Generally, the wavefields are also perfectly reconstructed (Figures 13a-13b and 15d-15e) when compared with previous results . No coherent events can be observed in the scaled residuals (Figure 13c and 13f), which demonstrates that the proposed wavefield reconstruction method is independent of the order of the Lagrange polynomials.
When polynomials of degree 3 are used, there are 33,513 triangular elements whose minimum and maximum edge lengths are 8.31 m and 51.76 m, respectively. The number of cubature points is 218,833 in total. The number of nodes in the PMLs is 17,868, whereas the number of boundary nodes is 844. The memory re-  Figure 12. Seismic records at a fixed receiver located at the point (3.8 km, −0.1 km) (shown in Figure 8). The top boundary is conducted as a perfectly matched layers (PMLs) boundary condition. The order of the Lagrange polynomials is degree 4. The solid black line is the record of the forward-propagated wavefield. The dashed red line is the record of the backward-propagated wavefield reconstructed by using the last two time-slice wavefield snapshots plus the wavefield history in the PMLs, whereas the solid magenta line is the corresponding residual multiplied by a factor of 10 4 . The dashed and dotted line is the backward-propagated wavefield reconstructed by using the last two time-slice wavefield snapshots plus the wavefield history at the boundary nodes, whereas the solid cyan line is the corresponding residual multiplied by a factor of 10 4 .
Reconstruction with the boundary nodes

Tests with Different Types of Elements
In the previous sections, we have tested only in triangular elements. Here, we test the wavefield reconstruction method in quadrilateral elements. In each element, we adopt the Gauss-Lobatto-Legendre points as the interpolation and integration nodes, which usually generates a diagonal mass matrix (Komatitsch and Vilotte, 1998). Fourth-order Lagrange polynomials are usually adopted (Komatitsch and Vilotte, 1998). Figure 14 shows the forward-propagated and backward-propagated wavefields at a time instant of 0.5 s, once calculated with the SEM Vilotte, 1998: Liu YS et al., 2017a), and the meshes of triangular and quadrilateral elements. The comparison of the reconstructed wavefields (Figure 14b and 14e) reveals results almost identical to the theoretical ones (Figure 14a and 14d) whether with triangular or quadrilateral elements. The scaled residuals (Figure 14c and 14f) are comparable, which further demonstrates that the proposed wavefield reconstruction method is independent of the element type.
In this experiment, there are 13,066 quadrilateral elements with the same minimum and maximum edge lengths of 20.0 m. The number of Gauss-Lobatto-Legendre points is 213,325 in total.
The number of nodes in the PMLs is 27,153, whereas the number of boundary nodes is 1,253. The memory requirements of both methods are 50.99 MB and 1.08 GB, respectively, and the MSR is 21.7.

Tests with Different Temporal Discretization Schemes
In the previous sections, the temporal discretization method is the second-order leapfrog scheme (equation (8)). In this section, we test the developed reconstruction method with different temporal discretization formats. The temporal discretization schemes considered are the second-order leapfrog format, the second-order explicit Newmark format (Newmark, 1959), and the fourth-order Lax-Wendroff format (Dablain, 1986;Jund and Salmon, 2007).
In the forward propagation of the source wavefield, the displacement values at the boundary nodes are stored for the second-order leapfrog scheme; the accumulated acceleration values at the boundary nodes are stored for the second-order Newmark scheme; and both the displacement and accumulated acceleration values at the boundary nodes are stored for the fourth-order Lax-Wendroff scheme. In the backward propagation of the source wavefield, the stored values replace the updated values at the corresponding positions. In this section, we take the same physics and geometry as before. The top boundary is considered a free surface boundary condition. The fourth-order cubature pointsbased TSEM is adopted to solve equation (1). Figure 15 shows the snapshots and scaled residuals computed by the three temporal discretization schemes. In general, the reconstructed wavefields are almost identical to the theoretical ones. The scaled residuals are at the rounding error level. This further indicates that the developed source wavefield reconstruction method is independent of the temporal discretization scheme.
Reconstruction with the boundary nodes

Application to Reverse Time Migration
The tests above demonstrate that the proposed wavefield reconstruction method is independent of the boundary condition, the order of the Lagrange polynomials, and the element type used for implementation. To test the accuracy and efficiency of the novel algorithm in the adjoint inversion, such as LSRTM (Tang YX, 2009;Liu XJ et al., 2016, 2017Liu YS et al., 2016) and FWI (Tarantola, 1984;Virieux and Operto, 2009;Liu YS et al., 2017b), here we apply the new method to the RTM because the first iteration in the LSRTM is equivalent to one RTM. The adopted model is a 11.5km × 3.75km-sized Marmousi model. Figure 16a shows the model used to synthetize the data, whereas Figure 16b shows a Gaussian smoothed version of the Marmousi velocity model as the migration velocity model to avoid backscattered waves. A Ricker wavelet with a dominant frequency of 22 Hz located at a depth of 0.05 km models the seismic source in this experiment. The synthetic data are acquired from 58 shots equally spaced from each other at a distance of 0.2 km. The first shot is located at a point of coordinates (0.05 km, −0.05 km). The receivers are evenly deployed on the surface with a fixed-spread acquisition geometry. The model is discretized by triangles, whose minimum and maximum edge lengths are 12.0 m and 114.5 m, respectively. Figure 16c shows an enlarged view of the portion of mesh delimited by the rectangle in the center of the other two plots. The number of triangular elements is 126,731. The cubature points-based TSEM (Liu YS et al., 2017a) is adopted to solve the acoustic wave equation. The order of the Lagrange polynomials is 4, which leads to 1,396,664 cubature points in total. According to the CFL number, the maximum allowable time interval is 0.2 ms for a stable wavefield simulation. The wavefield is propagated until 7 s, which leads to 35,000 time steps. The thickness of the PMLs is still 0.1 km. To purely investigate the accuracy and efficiency of the wavefield reconstruction method, we deal with the top boundary condition as a PMLs boundary condition to eliminate the effect of multiple waves on the RTM image, although this is slightly impractical. Figure 17 shows the RTM images obtained by reconstructing the wavefield history from the information in the PMLs (Figure 17a) or at the boundary nodes ( Figure 17b). Intuitively, the images obtained by the two working schemes are quite comparable, which indicates that the developed wavefield reconstruction method can be reliably applied to the adjoint inversion, such as LSRTM and FWI. Of course, for comparison purposes, we have run identical code on the same computer (Intel Xeon E5-2683 v4, 2.10 GHz) except for the difference between the source wavefield reconstruction methods. The same computational language (Fortran 90, 64 threads parallel with the OpenMP) has been implemented and executed throughout the calculation routines on a Windows operating system for the two runs. In terms of the elapsed time, the RTM image obtained by the wavefield values in the PMLs consumed 124.8 hours, whereas the RTM image obtained by the information at the boundary nodes consumed only 111.6 hours. This demonstrates that the I/O operations involved in the conventional reconstruction method (by saving the wavefield history of  the PMLs on disk) significantly depreciates the efficiency of the algorithm. As shown in Figure 7, this efficiency difference in favor of the boundary nodes-based method will be more evident in higher order methods.
In this numerical experiment, the number of nodes in the PMLs is 132,979, whereas the number of boundary nodes is 5,184. As a result, the memory requirement of the second scheme is only 692.2 MB, whereas that of the first scheme (i.e., the conventional method) is up to 17.3 GB, which leads to an MSR of 25.7. With the second scheme, it is perfectly possible to store the entire wavefield history at the boundary nodes in the computer memory. However, with the first scheme, the amount of computer memory needed to store the wavefield history may not be affordable and can be saved only on hard disk, which involves many I/O opera- tions. Even operating with a 2-D model, the proposed wavefield reconstruction method can significantly improve the efficiency of RTM because the nonuniform spacing of the interpolation points for algebraic polynomials places a stringent constraint on the time step (Carcione and Wang PJ, 1993).

Discussion
In this paper, we have reconstructed the source wavefield by storing the last two time-slice wavefield snapshots plus the wavefield history at the boundary nodes. This method makes full use of the elementwise global property of the Laplace operator in the SEM. The memory requirements are equivalent to those needed in the FD scheme (Liu SL et al., 2015;Vasmel and Robertsson, 2016). In comparison with the FD approach, the new method is algorithmically simpler because it does not involve any extra operations. However, the proposed method cannot be extended to the FD domain because the property of the Laplace operator here differs from this property in the FD method. Without limiting ourselves to the SEM, our wavefield reconstruction method could potentially be applied to a weak form-based wave equation solver, for example in the context of the finite element method and the discontinuous Galerkin method (Brossier et al., 2010).
Although the experiments conducted in this study have been performed in 2-D, the application of the new method to 3-D is straightforward. However, like the conventional wavefield reconstruction method in the FD community, this method is valid only in nondissipative media. In dissipative media, it can be combined with the optimal checkpointing method, as Yang PL et al. (2016a) have done through the FD method.
With respect to the memory requirement, interpolating the history of significantly decimated boundaries can mitigate this issue (Yang PL et al., 2016b) because the injection of the boundary sequence in time is essentially determined by the Nyquist sampling principle rather than the much smaller time interval given by the CFL number. To access information at the boundaries each time, it is necessary to adopt some interpolation method, such as the discrete Fourier transform interpolation, Kaiser-windowed sinc interpolation, or Lagrange polynomial interpolation, to interpolate the missing values. Therefore, our method combined with that of Yang PL et al. (2016b) allows the in-core memory-saving wavefield values at the boundaries to be feasible in large-scale 3-D imaging applications. In addition, regardless of whether we present the wavefield reconstruction method in an acoustic medium, its application to other media is straightforward.

Conclusions
We have developed an efficient wavefield reconstruction method for the SEM by making full use of the elementwise global property of the Laplace operator. A series of numerical experiments have allowed us to demonstrate that the accuracy of this method is independent of the order of the Lagrange polynomials, the element type, and the temporal discretization method used for implementation. The residuals obtained with the boundary nodes and also PMLs, magnified up to ten thousand times, reveal the absence of coherent events, thus indicating that the former is able to accurately reconstruct the source wavefield when compared with the conventional method. Because the boundary nodesbased wavefield reconstruction method needs to store only the last two time-slice wavefield snapshots plus the wavefield history on the boundary, it can save memory, from double up to several hundred times, without a loss of accuracy. Not only does the memory requirement decrease significantly, but it also improves the efficiency of the adjoint inversion because it avoids excessive and inefficient I/O operations. To compare the proposed method with the conventional scheme, the new method is finally implemented to obtain RTM imaging. The results indicate that the new algorithm produces almost the same good RTM images, but it requires about 1/25 times the memory compared with the conventional method. With the new method, saving the boundary wavefields in the computer memory is feasible. In addition, among the advantages of the method is that it can potentially be applied to a weak form-based wave equation solver, such as the finite element or discontinuous Galerkin method. Finally, the proposed method is especially appealing for working with higher order methods.