DFT-EFE
 
Loading...
Searching...
No Matches
MPIRequestersNBX.h
/******************************************************************************
 * Copyright (c) 2021.
 * The Regents of the University of Michigan and DFT-EFE developers.
 *
 * This file is part of the DFT-EFE code.
 *
 * DFT-EFE is free software: you can redistribute it and/or modify
 * it under the terms of the Lesser GNU General Public License as
 * published by the Free Software Foundation, either version 3 of
 * the License, or (at your option) any later version.
 *
 * DFT-EFE is distributed in the hope that it will be useful, but
 * WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
 * See the Lesser GNU General Public License for more details.
 *
 * You should have received a copy of the GNU Lesser General Public
 * License at the top level of DFT-EFE distribution. If not, see
 * <https://www.gnu.org/licenses/>.
 ******************************************************************************/

/*
 * @author Bikash Kanungo
 */

#ifndef dftefeMPIRequestersNBX_h
#define dftefeMPIRequestersNBX_h

#include <utils/TypeConfig.h>
#include <utils/MPITypes.h>
#include <utils/MPIRequestersBase.h>
#include <vector>
#include <memory>
#include <set>
namespace dftefe
{
  namespace utils
  {
    namespace mpi
    {
      class MPIRequestersNBX : public MPIRequestersBase
      {
        /*
         * @brief Implements the Non-blocking Consensus (NBX) algorithm as
         * described in the following paper to determine the list of
         * requesting processors for the current processor. A brief usage
         * sketch is given in a comment following the class definition.
         * @article{hoefler2010scalable,
         *   title={Scalable communication protocols for dynamic sparse data
         *          exchange},
         *   author={Hoefler, Torsten and Siebert, Christian and Lumsdaine,
         *           Andrew},
         *   journal={ACM Sigplan Notices},
         *   volume={45},
         *   number={5},
         *   pages={159--168},
         *   year={2010},
         *   publisher={ACM New York, NY, USA}
         * }
         */

        /*
         * The following is a brief description of the typical use case. Each
         * processor has a list of target processors to which it wants to
         * send a message (think of it as a message to another processor
         * requesting some data that is owned by that processor). Similarly,
         * other processors might be requesting the current processor for
         * some of the data owned by the current processor. However, the
         * current processor has no a priori knowledge of which processors
         * will be requesting data from it. The challenge is to use the
         * current processor's list of target processors to determine the
         * current processor's requesting processors. In other words, we have
         * to use the one-way communication information to figure out its
         * dual (the other way).
         *
         * Perhaps a more concrete example might help. Let's say we have a
         * vector/array that is distributed across a set of processors. Each
         * processor owns a part of the vector. The ownership is exclusive,
         * i.e., a processor is the sole owner of its part of the vector. In
         * practice, this means that the processor owns a set of indices of
         * the vector. Additionally, the sets of owning indices are disjoint
         * across the processors, and the union of the sets across all the
         * processors gives the full set of indices of the distributed
         * vector. However, each processor also needs information on a set
         * of non-owned indices (hereafter termed ghost indices), depending
         * on the needs of the application. Given its ghost indices, the
         * current processor can easily determine the processors that own
         * them. These are termed the target processors, to which the
         * current processor has to send a request to access the ghost data.
         * Similarly, the ghost indices of some other processor might be
         * owned by this processor. In that case, the other processor will
         * send a request to the current processor to access some of its
         * data (data which is ghost to the other processor but owned by the
         * current processor). But the current processor has no a priori
         * knowledge of which processors will be requesting data from it.
         * Knowing this in advance helps the current processor prepare for
         * those requests (an illustrative sketch of determining the target
         * processors is given right after this paragraph).
         *
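         * As a hedged illustration of the previous paragraph (the names
         * ownedIndexStart, ownedIndexEnd, ghostIndices, and targetRanks are
         * hypothetical and not part of this class), the target processors
         * could be derived from the ghost indices and the per-processor
         * ownership ranges roughly as follows:
         *
         * @code
         * // ownedIndexStart[p] and ownedIndexEnd[p] bound the half-open
         * // range [ownedIndexStart[p], ownedIndexEnd[p]) of vector indices
         * // owned by rank p, assumed to be known on every processor.
         * std::vector<size_type> ownedIndexStart, ownedIndexEnd;
         * std::vector<size_type> ghostIndices;
         * std::set<size_type>    targetRanks;
         * for (const size_type ghostIndex : ghostIndices)
         *   for (size_type p = 0; p < ownedIndexStart.size(); ++p)
         *     if (ghostIndex >= ownedIndexStart[p] &&
         *         ghostIndex < ownedIndexEnd[p])
         *       {
         *         targetRanks.insert(p);
         *         break;
         *       }
         * // targetRanks now holds this processor's target processors. The
         * // dual question, i.e., which processors will request data from
         * // this processor, is what the NBX algorithm answers.
         * @endcode
         *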
         * In cases of sparse communication, that is, where each processor
         * only needs to communicate with a small subset of the total number
         * of processors, the NBX algorithm determines the list of requesting
         * processors with O(log(P)) complexity (where P is the number of
         * processors). The algorithm works as follows:
         *
         * 1. The current processor sends a nonblocking synchronous message
         * (i.e., MPI_Issend) to all its target processors. Remember that the
         * current processor already has information about its target
         * processors. Also, note that the status of a nonblocking
         * synchronous send turns to "completed" only when the message has
         * been received by the receiving processor. Let's call this
         * operation the "local-send", as we are sending requests to target
         * processors that are locally known to the current processor.
         *
         * 2. The current processor keeps doing a nonblocking probe for
         * incoming messages (i.e., MPI_Iprobe). MPI_Iprobe checks whether
         * there is an incoming message matching a given source and tag. The
         * source is the rank of the processor sending the message and the
         * tag is an MPI tag associated with the exchange. It does not
         * initiate any receive operation; it only verifies whether there is
         * something to be received. For our purpose, we use the wildcards
         * MPI_ANY_SOURCE and MPI_ANY_TAG, as we just want to know whether
         * there is an incoming message. In the event that there is an
         * incoming message (i.e., MPI_Iprobe's flag is true), we extract the
         * source processor from the status handle of the MPI_Iprobe and
         * append it to a list that stores the requesting processor IDs.
         * Additionally, in that event we call a nonblocking receive (i.e.,
         * MPI_Irecv) to initiate the actual reception of the incoming
         * message. The MPI_Irecv, in turn, will complete the status of the
         * source processor's MPI_Issend through which the incoming message
         * was sent to the current processor. Thus, we achieve two things
         * here: we detected a requesting processor, and we signaled the
         * requesting processor that we have received its message. But this
         * is only half the job done. How do we tell the current processor
         * to stop probing for incoming messages? And how do we inform all
         * the processors involved that all the incoming messages across all
         * the processors have been received? This kind of problem is called
         * a Consensus Problem
         * (https://en.wikipedia.org/wiki/Consensus_(computer_science)).
         * The way consensus is reached in NBX is a two-step process:
         * (a) The current processor checks whether all its "local-send"
         * operations (see #1 above) have been received, i.e., whether the
         * status handles of all its MPI_Issend calls have turned to
         * completed. If all the "local-send" have been completed, we
         * initiate a nonblocking barrier (i.e., MPI_Ibarrier) on the current
         * processor. This informs the network that the current processor
         * has witnessed its part of an event (in this case, the event is
         * the completion of all its "local-send").
         * (b) The above only informs the network that all the "local-send"
         * of the current processor have been received. The current
         * processor can still have incoming messages to be received. Hence,
         * the current processor keeps probing and receiving incoming
         * messages until the nonblocking barrier (MPI_Ibarrier) mentioned
         * in (a) has been invoked by all the processors. This can be
         * checked from the status handle of the MPI_Ibarrier, which
         * completes only when all processors call it. At the stage when the
         * status of the MPI_Ibarrier turns to completed, we know for sure
         * that all the "local-send" of all the processors have been
         * received and that there are no more incoming messages in any
         * processor to be received. Thus, we can safely terminate the
         * nonblocking probe on all processors. A schematic sketch of this
         * loop in terms of raw MPI calls is given in a separate comment
         * block right after this one.
         *
         * @note Since we are only interested in knowing the requesting
         * processors for the current processor, we only need token MPI sends
         * and receives (e.g., just an integer across) instead of large
         * chunks of data. To that end, we hardcode all the send and receive
         * buffers to be of integer type.
         */
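
        /*
         * The following is a minimal, self-contained sketch of the NBX loop
         * described above, written directly in terms of raw MPI calls. It is
         * only an illustration of the protocol and not the implementation
         * used by this class (in particular, it uses a blocking MPI_Recv
         * after the probe for brevity, whereas this class keeps nonblocking
         * MPI_Irecv requests); the function and variable names are chosen
         * for exposition only.
         *
         * @code
         * #include <mpi.h>
         * #include <cstddef>
         * #include <set>
         * #include <vector>
         *
         * std::set<int>
         * nbxRequestingRanks(const std::vector<int> &targetRanks,
         *                    const int               tag,
         *                    const MPI_Comm          comm)
         * {
         *   // 1. Nonblocking synchronous "local-send" of a token message to
         *   // each target rank.
         *   std::vector<int>         sendBufs(targetRanks.size(), 0);
         *   std::vector<MPI_Request> sendReqs(targetRanks.size());
         *   for (std::size_t i = 0; i < targetRanks.size(); ++i)
         *     MPI_Issend(&sendBufs[i], 1, MPI_INT, targetRanks[i], tag, comm,
         *                &sendReqs[i]);
         *
         *   // 2. Keep probing and receiving until consensus is reached via
         *   // the nonblocking barrier.
         *   std::set<int> requestingRanks;
         *   MPI_Request   barrierReq;
         *   bool          barrierStarted = false;
         *   while (true)
         *     {
         *       int        flag = 0;
         *       MPI_Status status;
         *       MPI_Iprobe(MPI_ANY_SOURCE, tag, comm, &flag, &status);
         *       if (flag)
         *         {
         *           // Record the requester and receive its token, which
         *           // also completes the requester's MPI_Issend.
         *           requestingRanks.insert(status.MPI_SOURCE);
         *           int recvBuf;
         *           MPI_Recv(&recvBuf, 1, MPI_INT, status.MPI_SOURCE, tag,
         *                    comm, MPI_STATUS_IGNORE);
         *         }
         *       if (!barrierStarted)
         *         {
         *           // Once all local sends have been received, join the
         *           // nonblocking barrier.
         *           int allSendsDone = 0;
         *           MPI_Testall(static_cast<int>(sendReqs.size()),
         *                       sendReqs.data(), &allSendsDone,
         *                       MPI_STATUSES_IGNORE);
         *           if (allSendsDone)
         *             {
         *               MPI_Ibarrier(comm, &barrierReq);
         *               barrierStarted = true;
         *             }
         *         }
         *       else
         *         {
         *           // The barrier completes only after every processor has
         *           // joined it, i.e., after every message has been
         *           // received.
         *           int barrierDone = 0;
         *           MPI_Test(&barrierReq, &barrierDone, MPI_STATUS_IGNORE);
         *           if (barrierDone)
         *             break;
         *         }
         *     }
         *   return requestingRanks;
         * }
         * @endcode
         */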

      public:
        MPIRequestersNBX(const std::vector<size_type> &targetIDs,
                         const MPIComm &               comm);
        //
        // Default constructor for serial (without MPI) compilation
        //
        MPIRequestersNBX() = default;

        std::vector<size_type>
        getRequestingRankIds() override;

      private:
        // List of target processors to which requests are sent
        std::vector<size_type> d_targetIDs;

        // Send buffers (token integer messages) for the target processors
        std::vector<int> d_sendBuffers;

        // Requests for the nonblocking sends to the target processors
        std::vector<MPIRequest> d_sendRequests;

        // Receive buffers for incoming messages; pointers keep the buffer
        // addresses stable while the nonblocking receives are in flight
        std::vector<std::unique_ptr<int>> d_recvBuffers;

        // Requests for the nonblocking receives of incoming messages
        std::vector<std::unique_ptr<MPIRequest>> d_recvRequests;

        //
        // request for barrier
        //
        MPIRequest d_barrierRequest;

        //
        // MPI communicator
        //
        const MPIComm &d_comm;

        //
        // set of processors that have requested data from the current
        // processor
        //
        std::set<size_type> d_requestingProcesses;

        int d_numProcessors;
        int d_myRank;

        // Checks whether all the "local-send" operations have been received
        bool
        haveAllLocalSendReceived();

        // Signals, via the nonblocking barrier, that all the "local-send"
        // of the current processor have been received
        void
        signalLocalSendCompletion();

        // Checks whether the nonblocking barrier has completed on all
        // processors, i.e., there are no more incoming messages to receive
        bool
        haveAllIncomingMsgsReceived();

        // Probes for an incoming message and, if present, receives it and
        // records the source as a requesting processor
        void
        probeAndReceiveIncomingMsg();

        // Initiates the nonblocking synchronous sends to the target
        // processors (the "local-send")
        void
        startLocalSend();

        // Completes any outstanding MPI requests and performs cleanup
        void
        finish();
      };
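
      /*
       * A hedged usage sketch (assuming an MPI build in which MPIComm wraps
       * MPI_Comm, and that size_type is the project-wide index type from
       * utils/TypeConfig.h): every processor passes the ranks it will send
       * requests to and gets back the ranks that will send requests to it.
       *
       * @code
       * // Ranks this processor needs data from (application specific),
       * // e.g., ranks 1 and 3.
       * std::vector<size_type> targetIDs = {1, 3};
       *
       * MPIRequestersNBX nbx(targetIDs, MPI_COMM_WORLD);
       *
       * // Ranks that will be requesting data from this processor.
       * std::vector<size_type> requestingRanks = nbx.getRequestingRankIds();
       * @endcode
       */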

    } // end of namespace mpi
  }   // end of namespace utils
}     // end of namespace dftefe
#endif // dftefeMPIRequestersNBX_h