Machine‐learning‐facilitated earthquake and anthropogenic source detections near the Weiyuan Shale Gas Blocks, Sichuan, China

Seismic hazard assessment and risk mitigation depend critically on rapid analysis and characterization of earthquake sequences. Increasing seismicity in shale gas blocks of the Sichuan Basin, China, has presented a serious challenge to monitoring and managing the seismicity itself. In this study, to detect events we apply a machine‐learning‐based phase picker (PhaseNet) to continuous seismic data collected between November 2015 and November 2016 from a temporary network covering the Weiyuan Shale Gas Blocks (SGB). Both P‐ and S‐phases are picked and associated for location. We refine the velocity model by using detected explosions and earthquakes and then relocate the detected events using our new velocity model. Our detections and absolute relocations provide the basis for building a high‐precision earthquake catalog. Our primary catalog contains about 60 times as many earthquakes as those in the catalog of the Chinese Earthquake Network Center (CENC), which used only the sparsely distributed permanent stations. We also measure the local magnitude and achieve magnitude completeness of ML0. We relocate clusters of events, showing sequential migration patterns overlapping with horizontal well branches around several well pads in the Wei202 and Wei204 blocks. Our results demonstrate the applicability of a machine‐learning phase picker to a dense seismic network. The algorithms can facilitate rapid characterization of earthquake sequences.


Introduction
During the past several years, the Southern Sichuan Basin of China has experienced a marked increase in seismicity near hydraulic fracturing (HF) sites, accompanied by increasing seismic hazard as the maximum magnitudes of earthquakes has climbed (Figure 1) (Lei XL et al., 2017Meng LY et al., 2019;Sheng MH et al., 2020;Yang HF et al., 2020). Earthquakes associated with the development and production of natural gas from tight formations pose a threat not only to local communities but also to the safe produc-tion of shale gas (Ellsworth, 2013;Holland, 2013;Clarke et al., 2014;Atkinson et al., 2016;Bao XW and Eaton, 2016;Grigoli et al., 2017;Yang HF et al., 2017). Accurate seismic hazard assessment for potentially induced earthquakes depends critically on how well we know the stress conditions on those faults and the mechanisms of earthquake induction (Walsh III and Zoback, 2016;Schoenball and Ellsworth, 2017;Yu HY et al., 2019).
Various techniques, e.g., focal mechanism inversion, geodetic observations, and geologic surveys, have been widely used to investigate fault structures after earthquakes occur, but all of these approaches have limited resolution of small-scale fault structures, focal depths, and hidden faults (Zhu LP and Helmberger, 1996;Xu WB et al., 2015;Yang HF, 2015;Zhou PC et al., 2019). This is particularly true for induced earthquake-prone regions, where small faults, including many that are unmapped, have proved capable of hosting small-to-moderate-size earthquakes (Schoenball et al., 2018). Because many of the potentially seismogenic faults have limited or no surface expression, their detailed geometry is difficult to determine. High-precision earthquake hypocenters have the potential to illuminate the activating faults before damaging earthquakes occur (Rubin et al., 1999;Waldhauser and Ellsworth, 2002;Kao H and Shan SJ, 2007;Peng ZG and Zhao P, 2009;Yang HF et al., 2009;Schoenball et al., 2018). During hydraulic fracturing operations, or later during production, a near-real-time precise earthquake catalog can thus inform both operators and authorities about evolving hazard conditions (Lee et al., 2019;Langenbruch et al., 2020;Schultz et al., 2020).
A combination of the number and accuracy of picked phase arrivals and the distribution of stations determines the quality of an earthquake catalog (Lee and Stewart, 1981). The template matching method (e.g., Gibbons and Ringdal, 2006;Peng ZG and Zhao P, 2009;Yang HF et al., 2009;Skoumal et al., 2015;Zhang M and Wen LX, 2015;Shelly et al., 2016) has proven to be a powerful means to expand seismicity catalogs but relies on the initial number of earthquakes and on the quality and completeness of recorded template waveforms. Unsupervised search for earthquakes using data mining methods are highly effective but currently impractical for real-time application (Yoon et al., 2015(Yoon et al., , 2019. Conventional automatic algorithms (e.g., short-term average/longterm average STA/LTA method; Allen, 1978) work well for impulsive P-wave arrivals but suffer from limited efficiency and accuracy when signals are emergent, or noise levels are high. Moreover, in densely instrumented areas where the data flow outpaces seismic analyst data processing rates, many events must be skipped, and S waves may go unanalyzed.
Machine-learning-based techniques have now matured to outperform routine analysis methods (e.g., Ross et al., 2018;Mousavi et al., 2019b;Wang J et al., 2019;Zhu WQ and Beroza, 2019). PhaseNet (Zhu WQ and Beroza, 2019) is a deep-neural-network-based arrival-time picking method. Although it was trained on northern California earthquakes, it has since demonstrated success in picking accurate arrival times of both P and S waves in diverse regions and signal environments, which has made it an attractive choice for routine earthquake detection Liu M et al., 2020;Park et al., 2020;Wang MM et al., 2020;Tan YJ et al., 2021).
In addition to robust arrival time picking, real-time seismic monitoring and catalog construction depend crucially on accurate event association and location techniques (Johnson et al., 1997;Zhang M et al., 2019;Johnson, 2020). Rapid communication of accurate information between network monitoring and oil field operations, particularly when small earthquakes occur in rapid succession, is fundamentally important for safety (Schultz et al., 2020). Phase association methods follow a variety of strategies to group candidate picks into events that best fit the predicted arrival times (e.g., Stewart, 1977;Ester et al., 1996;Johnson et al., 1997;Grigoli et al., 2018;Bergen and Beroza, 2018;Ross et al., 2019). Among them, the Rapid Earthquake Association and Location (REAL) method  can determine the location directly with sufficient accuracy for decision making, given a  0  10  20  30  40  50  60  70  80  90  100  110  120 (Yang HF et al., 2020). The magenta cross marks the location of a Wastewater Disposal (WD) well with an injection depth of 4 km. The stars mark earthquakes (black, before our study period; yellow, after our study period). (Inset) Location map of the Sichuan Basin; the red square indicates the study region. (b) Numbers of wells completely annually and cumulative well count (black curve). (c) Magnitude-time plot of the CENC catalog in the study region, and cumulative count of earthquakes. Earthquakes occurring in our study period in red. sufficient number of phase picks and a reasonably accurate velocity model. Simultaneous solutions of both velocity and hypocentral parameters are required to achieve precise earthquake locations (e.g., VELEST; Kissling et al., 1994). Recent applications of REAL in several regions have demonstrated its ability to characterize earthquake sequences rapidly from raw seismic data, in combination with a machine-learning phase picker (e.g., PhaseNet), and a sequential workflow (Liu M et al., 2020;Park et al., 2020;Wang RJ et al., 2020;Tan YJ et al., 2021;Wong et al., 2021).
In this study, we investigate earthquakes that occurred in the region surrounding the Weiyuan Shale Gas Blocks (SGB), Sichuan, China, between November 2015 and November 2016, using a temporary network of 50 stations (Figure 1). The deployment occurred during a time when small-magnitude earthquakes were increasing in number, but three years before the deadly M w 4−5 class earthquakes that occurred in 2019 (Sheng MH et al., 2020;Wang MM et al., 2020;Yang HF et al., 2020). Here we demonstrate the feasibility of rapidly generating a catalog using a dense network and machine-learning algorithms. To generate the base for building a high-resolution earthquake catalog, we follow the workflow of Liu M et al. (2020), starting with phase detection and picking by applying the machine learning picker PhaseNet (Zhu WQ and Beroza, 2019) followed by event association with REAL  and hypocentral location using VELEST (Kissling et al., 1994). Our REAL catalog contains 79,395 events, constructed using 981,284 P and 859,576 S arrival time readings. Based on our VELEST-located earthquakes, we are able further to refine and reveal the seismicity migration patterns surrounding several HF wells.

Data
The Weiyuan SGB is located in the Weiyuan-Rongxian counties and Zizhong City in the Sichuan Province. The geology is characterized by the Weiyuan anticline, an NE-SW trending dome structure with many mapped faults trending in the NE direction on the north flank and several faults trending in the SE direction on the south flank ( Figure 1) (Liang X et al., 2019). Between 1980 and 2015, the region within 25-km from Weiyuan County was struck by only 12 earthquakes with magnitudes greater than 3, according to the China Earthquake Network Center (CENC) routine catalog ( Figure 1); most of these 12 events were located to the south of our study region, near Zigong City. The occurrence rate of smaller earthquakes (M < 3) was as low as about one earthquake per week before April 2015 and after 2009, when the CENC routine catalog became available. However, frequent earthquakes (~12 earthquakes per day) have occurred in the Weiyuan SGB since May 2015, at pace with the development of HF wells in the region ( Figure 1). Shale gas resource is abundant in the upper Ordovician Wufeng Formation to lower Silurian Longmaxi Shale Formation (Lei XL et al., 2017;Wu X et al., 2019). By 2015, more than 100 HF well pads were projected, and among them, 17 had been drilled, with typical completion depths of around 4.8 km below ground level (~400 m) in the Wei202 block (near station s15 in Figure 1) and 5.8 km below ground level (~350 m) for Wei204 block (near station s22 in Figure 1) (Safety and Environmental Protection Technology Research Institute of CNPC and Beijing CNPC Construction Project Labor Safety and Health Pre Evaluation Co., Ltd, 2015;Dong DZ et al., 2018). Figure 1 shows HF wells in production or being drilled by late 2015.
A temporary seismic network of 50 stations was deployed in the area from November 2015 to November 2016 (Figure 2a), covering the Weiyuan SGB ( Figure 1). Each station was equipped with a Guralp CMG-40T seismograph that was set with a sampling rate of 100 Hz. The average station spacing was 5 km. The network was in normal service starting from late November 2015. However, in the late stage of deployment of the network, some stations failed due  The seismic monitoring system in the Weiyuan region relied on the Sichuan network with more distant regional stations (Yang HF et al., 2020). Among them, only three stations are located within 30 km of Weiyuan County, which is in the center of our study region ( Figure 1a). CENC uses these and more distant stations for constructing the routinely reported catalog in the Sichuan Basin.
During the operation period of the temporary seismic network, 1,217 earthquakes were reported with a magnitude range of 0.1−3.3 in the study area falling within Latitude [29.35°, 29.85°] and Longitude [104.5°, 105°] ( Figure 1).

Construction of the Event Catalog
We applied a sequential workflow to process to one year's continuous seismic data to demonstrate how machine learning methods can expand the catalog in this region, even without a priori information from the standard catalog.

Events Detection
We use a 30-s time window to divide the continuous data into segments for input to PhaseNet. Neighboring segments overlap by 50% to enhance the completeness of picking by avoiding edge effects. To PhaseNet, we input three-component seismic waveforms of each segment; PhaseNet classifies each sample point with a probability that it is either a P wave, S wave, or noise; the three probabilities sum to unity. The noise is defined as the complement to P and S waves in the data (Zhu WQ and Beroza, 2019). Peaks in the P-wave and S-wave probability distributions corres-pond to the P and S arrival times ( Figure 3). Conservatively, we set a probability of 0.5 as the acceptance threshold for both P and S picks (e.g., Figure 4). Smaller values might lead, especially in noisy data, to a larger number of false or bad picks (e.g., Liu M et al., 2019;Wang RJ et al., 2020).
The detected phases consist of 4,860,125 P-wave arrival times and 3,567,610 S-wave arrival times ( Figure 2b). The number of detections at each station correlates with the station's uptime . Complete P and S picking from one month's continuous data from a single station, performed on a MacBook laptop computer (3.1 GHz Dual-core Intel Core i5 processor), takes approximately 2 CPU hours.

Events Association
Next, we use the Rapid Earthquake Association and Location (REAL) package  to create events out of the raw-pick data stream. REAL scans the input picks in time order, searching for associations of both P-and S-phases to form events on a 3D grid. Here we use a grid spacing of 2 km. The algorithm projects arrival times from each station onto the grid and searches for the grid point with the largest number of coincident arrivals. If several grid point have the same (maximum) number of picks, we select the one with the smallest root-mean-square misfit . The optimal stacking point of arrival times corresponds to the initial hypocenter. Associated arrivals are declared in an event and thus removed, and the process steps forward in time with the oldest remaining arrival time.
In the original implementation, REAL associated arrival picks to a particular earthquake by counting the number of picks within the

Earth and Planetary Physics
doi: 10.26464/epp2021053 505 theoretical travel-time windows. Occasionally, when events overlapped in space and time, a pick could be associated to multiple events, which is obviously problematic. To make the result more robust against a large event splitting into multiple small events with small residuals, we use a modified REAL that assigns individual picks to the event with the greatest number of picks (Tan YJ et al., 2021). Consequently, this version of REAL keeps only the most reliable event within a time window (e.g., 1 s around P waves and 1.6 s around S waves), thus potentially missing events that occur closely in space and time.
We precalculated the theoretical P and S travel-time tables using the TauP Toolkit (Crotwell et al., 1999) and a 1-D velocity model (Lei XL et al., 2017). The search intervals for the event location are 0.02° over a 0.2° by 0.2° (in latitude and longitude) horizontal range and 2 km for a depth range above 20 km, respectively. The grid is centered at the station that recorded the earliest P phase. We set the association threshold to 5 P picks and a total of 10 P or S picks. So at least five stations are needed to declare an event.
Events occurring close to the boundary of the study area or with a station azimuthal gap larger than 250° are deleted in the association procedure. We also remove S picks that occur less than 0.5 seconds after P, as described by Zhang M et al. (2019).
To evaluate the rate of false positives, we visually checked the associated 703 P and 710 S phase picks on station s22 for one day's data (data from 2016-03-07; Figure 4), finding that PhaseNet picked phase arrivals times that show subsample-level agreement with manual picks, with average differences of 0.002 and 0.007 s for P and S phases, respectively (Figure 5a−b). Moreover, most of the associated phases show picking probabilities over 0.7 (Figure 5c), which validates setting a probability of 0.5 as the acceptance threshold for both P and S picks, as our approach to dis-carding bad picks on noisy data.
The REAL catalog contains 79,395 events from 12 November 2015 through 18 November 2016 ( Figure 6). 25% of the PhaseNet picks are successfully associated, consisting of 981,284 P-wave arrival times and 859,576 S-wave arrival times (Figures 2c and 5d). The computation time depends on the number of picks recorded by our network. For one day of data, it takes about 10 minutes on a MacBook laptop computer (3.1 GHz Dual-core Intel Core i5 processor) to finish associating ten thousand picks.

Local Magnitude Estimation
For each event, we measure the maximum amplitudes of horizontal components after deconvolving the instrument response from the raw waveforms and then convolving with the theoretical Wood-Anderson seismometer response . The measured waveform window starts 0.5 sec before P arrivals and is twice the S-P interval in length (e.g., Figure 7a). If an arrival time is missing, either P or S, we use the predicted travel times. We compute source-to-station distances using the locations from REAL, and the attenuation table of Richter (1958) to determine the local magnitude (M L ). Then we average the measurements at individual stations to estimate the magnitude of each event.
The estimated magnitudes range from M L −0.5 to 3.5. Just two earthquakes were larger than M L 3: an M L 3.3 earthquake on 2016-01-07, and an M L 3.5 on 2016-07-26. Moreover, we compare the event waveforms for the M L > 2.8 earthquakes, which were all reported by CENC but with generally lower magnitudes. The waveforms and peak amplitudes of M L > 2.8 earthquakes, with similar epicentral distances between 14 and 16 km recorded on stations s22, suggest that their magnitudes are reliably estimated. Location uncertainties are significantly reduced; the RMS (root mean  (Figure 9a). Overall, by using the temporary array we are able to locate about 60 times more events than are in the CENC catalog ( Figure 9b). Note that the seismicity rate trends of the two catalogs generally agree well with each other except after October 2016, when the CENC seismicity rate climbs to its peak while our REAL catalog seismicity rate flattens. The numbers of our located events show a three-fold increase than CENC catalog after October 2016, but the removal of 41 of the 50 stations of our temporary network (Figure 2a) results in decreased event locatability; thus the REAL catalog seismicity rate drops abnormally at the very end of our observation period ( Figure 9b).   (Figures 10a−b). Note that most of the larger magnitude earthquakes are located close to HF wells (Figures 10c−d). Moreover, earthquakes with M L over 2.5 are concentrated near the wells in the Wei202 block (the west region in Figure 6d). In contrast, the Wei204 block features very dense concentrations of smaller events (east region in Figures 6d and 10a−b).

1-D Velocity Model Refinement
Here we present a modified 1-D velocity model of the shallow structure by using VELEST (Kissling et al., 1994) and our picked P and S arrivals. In VELEST, the non-linear 'coupled hypocenter-velocity model problem' is linearized and iteratively solved by inversion of the damped least-square matrix of traveltime partial derivatives (Kissling et al., 1994). We solve for the S-and the P-wave ve-locity models independently by using VELEST since many P-and S-wave arrival times of high quality were picked by PhaseNet (Zhu WQ and Beroza, 2019).
A starting model was derived from combinations of several documented velocity models with different layered interfaces that were potentially applicable to our study. Zhao Z et al. (1997)  (2020) inverted the S velocities from ambient noise tomography results. In running VELEST, we modified the model by Zhao Z et al. (1997) by adding several shallow layers above 2 km to the initial model ( Figure 8); previous studies had suggested the possibility of significantly lower shallow velocities.
To utilize VELEST in updating the velocity model, we selected 1,600 events with 400 explosions in the south and 1200 earthquakes that are well-distributed over the Wei202 block, the Wei204 block, and the northern region near Zizhong County. The selected events are the best-recorded ones in each subregion, each with at least 20 associated picks whose phase probabilities were greater than 0.7. Starting with the 1D velocity models modified from Lei XL et al. (2017) and Zeng Q et al. (2020), we estimate optimal 1D P-and S-velocity models by averaging over the outputs of 20 experiments on the selected sub-datasets to account for lateral heterogeneities and the trade-off between epicentral locations and station effects. Damping factors for the hypocentral parameters and velocity parameters were selected by optimizing the data misfit reduction and the parameter resolution. Here we heavily damped the station terms. Low-velocity layers have been avoided so as not to introduce instabilities in the inversion process. Convergence to a stable solution is obtained after 30 iterations.
Despite different initial models, the resulting optimal 1D P-and Svelocity models turn out to be very similar both for the layered depths and the velocities, showing minor dependences on initial velocities and numbers of layered interfaces. These confirm both the precision and accuracy of our locations and demonstrate that the final model we derive is robust.
The best-estimated model, outperforming the other ones in relocating explosions back to the shallow surface (Figure 9a), is defined as the one initiated from Lei XL et al. (2017). The optimal model, with much slower velocities in the shallow part and varying V p /V s ratios for different layered depths (solid red curves in Figure 11a), is adopted in the absolute relocations of the REAL catalog events in the following part.

Absolute Relocation
Subsequently, we relocate all of the 79,395 REAL catalog events ( Figure 12a) by using VELEST and the updated 1-D velocity model ( Figure 11). The well-constrained events are retained with < 200°s tation gap and < 0.6-s travel time residual, resulting in 74,921 relocated events (Figures 12b), showing a success rate of 94%. More specifically, 5,692 out of 7,979 (71%) explosions and 69,229 out of 71,416 (97%) non-explosion events are relocated by VELEST. The explosions and other events that are lost were mostly ones that occurred near the margins of our seismic network (Figures 8 and  12).
The VELEST locations show increased spatial clustering ( Figure  12b). It is noteworthy that the depths of the explosions in the south are relocated to be very shallow, with a concentration above 1 km (Figure 13b). The mean RMS travel time residuals for those explosions drop from 0.207 s to 0.121 s (Figure 13c). As a result, the sweeping behaviors of explosions are much more obvious in VELEST locations (Figure 8b). After removing the relocated explosions, the VELEST catalog's depths peak at around 3 to 4 km, with a much better depth resolution than the REAL locations ( Figures 12 and 13a−b). Moreover, the mean RMS travel time residuals drop from 0.194 s to 0.145 s (Figure 13d).

Catalog Characterization
Among the obtained 79,395 detections, we identify 7,979 explosions and locate 71,416 non-explosion events; the smallest magnitude is −0.5 (Figures 9 and 12). With an updated 1-D velocity model, we are able to relocate 69,229 out of 71,416 events, which are not evenly distributed but concentrate in a few clusters (Figures 4a and 10). The densest clusters appear in the east of the study region, close to the wells of the Wei204 block, with the second densest cluster near the Wei202 block to its west.

Explosions
We relocated 5,692 out of 7,979 detected explosions (M L < 0.5) that occurred exclusively between 2015-12-22 and 2016-01-20 ( Figure 8a). The explosions serve as a benchmark for the perform-ances of our earthquake timing, association, and locations ( Figure  8). They occur with a semi-diurnal pattern of occurrence, almost exclusively at night (Figure 8c). Their magnitudes range from M L −0.4 to 0.5, and most have focal depths at 0 km. During the time of the 3D survey, we located an average of 266 events/day; the peak was 461, recorded on 2016-01-15 (Figure 8c). The temporal progression of these small events sweeps systematically from west to east (Figures 8a−b), showing typical migration characteristics of an active source seismic survey. Conveniently, the explosions facilitate the validation of our velocity model ( Figure 11). Explosions near the southwestern and southeastern margins of our network (Figures 8a−b  gap in retaining the data for VELEST relocations.

Magnitudes of Earthquakes in Our Catalog
We double-checked all M L 2.5 earthquakes in our REAL catalog, and found that they were all reported by CENC but with generally lower magnitudes, which could explain the greater number of M 3 class earthquakes in the REAL catalog compared to the CENC catalog ( Figure 14). The M L > 2.8 earthquakes are carefully validated with their event waveform inspections (Figure 7).
We compare our REAL catalog with those reported in the CENC catalog specified in our study region. Event pairs with their origin time differences smaller than 3 seconds are considered to be the same events. Using that criterion, we have successfully detected 1129 out of 1217 CENC earthquakes and are able to relocate 1,067 of them (Figure 14a). For the same earthquakes, magnitudes in our REAL catalog are systematically larger than those in the CENC catalog by an average of 0.5 units (Figures 14b−c). Such a difference in magnitudes is likely due to different definitions, including our use of Richter's original attenuation table. The mean RMS travel time residuals drop from 0.895 s to 0.199 s and 0.146 s for CENC, REAL, and VELEST relocations, respectively (Figure 15a). The offset between event pairs varies between kilometers to tens of kilometers (Figures 15b−c). We also note that some epicenters in the north from the CENC catalog appear in the south in our REAL catalog and became more tightly clustered (Figures 14a and . Output models using different initial models. Initial model 1 is modified from Lei XL et al. (2017). Initial model 2 is from Zeng Q et al. (2020). The solid red one is applied in the VELEST locations in this study. (a) Input and output velocity models. (b) Zoomed view of depths above 6 km for the velocity models in (a).  Overall, 88 out of 1,217 (7%) of the CENC earthquakes were lost in our REAL catalog (Figures 14a−b) due to too large station azimuthal gaps or insufficient numbers of picks. More specifically, out of the 88 missing CENC earthquakes, 16 occurred before 2015-12 when the temporary network was yet to be fully commis-   The yellow dots show events that were reported by CENC but were missed by our association. The cyan dots are REAL associated locations (CENC events) but lost by VELEST relocation processes. (b) Magnitude-time plot for CENC (gray) and REAL (red) events, with their sizes scaled according to RMS residuals. The yellow dots are missed CENC events. (c) Magnitude differences between our estimated magnitudes and CENC routine reported ones for the same events. The magnitude bin is 0.1. sioned, and 66 occurred after 2016-08, when the temporary network was partly decommissioned (Figure 2a). The remaining six occurred exclusively after 2016-08 during which most temporary stations were removed (Figure 2a). Therefore, our detection success rate can be as high as 99.5% if we focus only on events between 2015-12 and 2016-07 when the temporary network functioned well. With a more strict threshold in retaining the data for VELEST relocations, another 62 earthquakes are lost, their locations distributed near the eastern and western margins of our seismic network (Figure 14a).
We used ZMAP to estimate the magnitude of completeness (M c ) for the REAL and CENC catalogs using the maximum curvature technique for the cumulative frequency-magnitude (Figure 16) (Wiemer and Wyss, 2000;Wiemer, 2001;Mignan and Woessner, 2012;Mignan et al., 2018). After removing explosions, we retain 71,416 REAL events in the magnitude range −0.5 to 3.5, showing uneven distributions of events between day and night (Figures  17a−b). The M c value is estimated to be 0, and the b-value is 1.09. In comparison, the M c value in the CENC catalog is 0.8, and the b-value is 1.04 (Figure 12a). Although our magnitudes are larger than those in the CENC catalog (Figure 14c), we have a much smaller M c value (Figure 12b). Such improvement is attributed to our unique privilege of using a temporary network, which was able to detect and locate many more earthquakes than are in the CENC catalog (Figures 12).
To estimate the temporal pattern of M c , we used a moving window approach and selected the sample window as 500 events overlapping with 125 events. The magnitude bin is 0.1. We compute M c and the b-value in each window and assign the corresponding time as the middle of the window (Figure 17c−d). The M c value remained relatively stable around 0 until June of 2016 (Figure 17c), by which time many stations in the north had been removed (see Figure 2). After then the M c value increased to be around 0.6 ( Figure 17c). The b-value ranged from ~1.0 during quiet times to >1.5 during active times (Figures 17d).

Spatiotemporal Patterns of the Catalog
The REAL and VELEST catalogs exhibit improved spatiotemporal To better understand the characteristics of the REAL seismicity, we split the events into different magnitude ranges ( Figure 10). The 67,774 events with magnitudes below M L 0.5 are widely distributed over the study region ( Figure 10a). As the magnitude range increases, the events become more localized in space. There are 10,996 events with magnitudes between M L 0.5 and 1.5, most of which occurred after February 2016, concentrating near the HF wells and in the north part of our study region (Figure 10b). Clustering in the vicinity of the HF wells becomes more pronounced for events with magnitudes over M L 1.5 (Figures 10c−d). Except for sporadic events near mapped faults and in the south part of the study region, most of the larger earthquakes are located within ~5 km of the nearest HF wells (Figure 10). One exception is the cluster in the northeast near Zizhong City (Figures 10 and 12).
We detected and located about twice as many events at night time compared to daytime (Figures 17a−b). It is likely a result of variations of cultural noise levels and wind-generated noise (e.g., Wang RJ et al., 2020;Tan YJ et al., 2021). The difference can also be seen from the estimated M c and b-values for the separate analysis of the two groups of diurnal seismicity (Figures 16b and 17c−d).   both the relatively quiet and active periods (Figures 17c−d). Overall, the M c and b-values of earthquakes in the daytime show more significant temporal variations.
The spatiotemporal clustering patterns are obvious for events with a magnitude below M L 1.5 but less so for larger magnitude events ( Figures 12). Overall, the larger magnitude events near the HF wells show relatively much weaker temporal clustering than do smaller events (Figure 18a−b). The VELEST catalog shows multiple distinct clusters by considering their occurrence times and distance to nearby HF wells (Figures 12b and 18a−b). With a depth concentration between 2 and 4 km, they are located well within the seismic network ( Figure 12b); thus, their hypocenters are better constrained and discussed in detail.
The seismicity in Wei202 block consists of 4,749 total events. Most seismicity in Wei202 block is densely located atop the horizontal branches of the six HF wells (Figure 18a). The seismicity in the northeastern part of the Wei202 block (Figure 18a), with the largest magnitude of M L 3.3, shows close proximity to the upward wells of Wei202H3, suggesting a causal link. The magnitude-time plots show that the seismicity rate is initially high in 2015-11, the beginning of the Weiyuan earthquake sequence (Figure 18c). Later on, the seismicity continues to increase, and we see a gently decreased level of seismic activity until 2016-04. The seismicity rate remains at low levels (~1 event per day) between 2016-05 and 2016-07. After 2016-08, clusters of seismicity emerge near the well toe of Wei202H6 and the well head of Wei202H4.  2016-02, and early 2016-10, with rapid decay of seismicity near the onset regions (Figure 18b). We found events densely located near the Wei204 block between 2016-03 and 2016-05 ( Figure  18b). The b-values are larger than 1, and M c is below 0 during this time interval (Figures 17c−d). Given the strong spatial correlation between populated seismicity and wells, it is highly likely that the groups of seismicity between 2016-03 and 2016-05 are linked to HF activities of different horizontal well branches from Wei204H6 (Figures 18b and 18d). The largest earthquake in our observed Weiyuan sequence is located near the well toes of the upward horizontal wells from Wei204H6. Additionally, we are able to identify three distinctly grouped events extending around upward wells of Wei204H4, Wei204H9 and Wei204H11. Each group of seismicity lasted only for several weeks and occurred separately in the very beginning and end of our study period (Figures 18b and  18d). The location, orientation, and timing of the seismicity suggest that the two groups of events were probably induced by HF simulation of wells Wei204H4 Wei204H9 and Wei204H11, respectively.

Discussion and Conclusions
We have built a comprehensive earthquake catalog from raw continuous seismic data using a machine-learning phase picker (Zhu WQ and Beroza, 2019) and a sequential workflow of phase association and location algorithms Liu M et al., 2020). In this study, we are able to report that the processing of 1year datasets from 50 stations can be done in about 750 CPU hours. Moreover, parallel processing and GPU-based execution will be even faster, costing only minutes (Liu M et al., 2020).
Machine learning and seismology benefit from each other (Kong QK et al., 2019). An important ingredient for improving machine learning is the assembly of large labeled data sets. In seismology, we have such large labeled seismic data sets, e.g., waveforms with manually or automatically picked and manually reviewed arrival times (Mousavi et al., 2019a). Seismology is one of the best fields for developing and testing machine learning algorithms, which will eventually drive the development of machine learning. PhaseNet demonstrates high precision, particularly for S waves (Zhu WQ and Beroza, 2019). The database of accurate P and S arrival times allows us to improve both P-and S-wave velocity models and is especially useful for further polarization-based and Swave-based seismic analysis (Zhu WQ and Beroza, 2019).
In the phase picking step, the sensitive detection performance of PhaseNet can facilitate REAL in successful association and initial location of weak events. PhaseNet also detects explosions with clear P or S arrivals. The local velocity model we used (Lei XL et al., 2017) was initially developed for the Changning region, which is within ~100 km from our study region. Consequently, the velocities of the deeper depths are likely suitable for our study region, but the shallow velocities might not be very accurate, as can be seen from our velocity model validations ( Figure 11). Using such a reasonably good velocity model, the depths of more than half of the 7,979 explosions are recovered to be 0 km (Figure 13a). Suspected shots are typically treated as a nuisance during construction of seismic catalogs; however, as we have shown, they can be quite valuable for the validation of location accuracy since they provide us the information of their depths from the surface and benefit our velocity model calibrations (Figures 11 and 13). In this study, we knew of their existence ( Figure 8) (INFOPETRO, 2016) and could remove them during post-processing. Therefore, explosions were of minimal concern to the accuracy of sequential workflow during the phase association and relocation procedures (e.g., Park et al., 2020). Some events that occur closely in space and time might be missing in this type of analysis, since REAL keeps only the most reliable one within a time window .
Compared to continuous-waveform-based earthquake detection methods (e.g., Zhang M and Wen LX, 2015;Kao H and Shan SJ, 2007), REAL reduces the computational load by using discrete time phase picks. REAL possesses the main advantages of both pick-based and waveform-based detection and location methods , enabling it to work successfully with sparse station coverage. We detected relatively fewer events after August 2016 near the Wei204 block, a bias that is likely to have been due to the removal of stations. Only 10 stations were in service after that time in this area ( Figure 2). Station coverage in the east also became poor due to the removal of stations. Even so, we detected and located some event clusters in the north of the Wei204 block ( Figure 12).
The phase data and initial hypocenters produced by this workflow serve as the initial step toward the construction of a highquality and robust earthquake catalog, to be followed by precision relocation using a range of modern tools, e.g., hypoDD (Waldhauser and Ellsworth, 2002). Although the number of detections greatly increased over routine analysis, we likely missed events for a variety of reasons, including insufficient number of picks, events spaced too closely in space and time, or events with large station gaps and/or travel time residuals. Small numbers of picks (especially S picks) or large timing errors can result in large location errors since the depth-origin trade-off most strongly affects locations based on P waves (Zhu WQ and Beroza, 2019).
We suggest that much of the microseismicity in the earthquake sequence may be a result of HF stimulation in Wei202 and Wei204 blocks. The combination of sensitive detection and precise location of microseismicity, and a preliminary database of HF well information, has allowed us to distinguish groups of events induced by HF. The multiple well influences on seismicity pose significant challenges for seismic hazard mitigation, where different actions would be required for different wells.
In the CENC catalog, it is difficult to associate earthquakes with HF wells due to the small number of earthquakes and uncertain locations (Figures 14a). For the same earthquakes in the two catalogs, significant location improvements were obtained (Figures 14a−b  and 15a). Our results improve the spatial resolution of the ~1,200 earthquakes reported in the CENC catalog (Figure 14a), showing potential to better separate the seismicity near HF wells from natural tectonic events (Figure 12b). Although we found many small earthquakes clustered near HF wells (Figures 18a−b), it is still challenging to understand the connection, if any, with shale gas development and production without a higher precision catalog. The presence of multiple closely-spaced HF wells poses significant challenges for seismic hazard mitigation since different ac-tions might be required depending on the causal connection (Figures 18a−b). Thus, we are motivated to refine further the VELEST catalog by using relative relocation methods to unfold their detailed spatiotemporal patterns in our companion paper.
In sum, we have constructed a basic earthquake catalog for the Weiyuan earthquake sequence from 12 November 2015 to 18 November 2016 from raw data without manual review. We have detected and located about 60 times as many events as the routine CENC catalog. We have also used explosion and earthquake data to calibrate the local 1-D P and S velocity models. We were able to relocate 5,692 explosions and other 69,229 events. Our study demonstrates the feasibility of characterizing earthquake sequences, potentially induced seismicity, and artificial sources, using machine-learning detection and timing and a highperformance sequential earthquake monitoring workflow.