DFT-FE 1.1.0-pre
Density Functional Theory With Finite-Elements
MPIRequestersNBX.h
// ---------------------------------------------------------------------
//
// Copyright (c) 2017-2025 The Regents of the University of Michigan and DFT-FE
// authors.
//
// This file is part of the DFT-FE code.
//
// The DFT-FE code is free software; you can use it, redistribute
// it, and/or modify it under the terms of the GNU Lesser General
// Public License as published by the Free Software Foundation; either
// version 2.1 of the License, or (at your option) any later version.
// The full text of the license can be found in the file LICENSE at
// the top level of the DFT-FE distribution.
//
// ---------------------------------------------------------------------
//

/*
 * @author Bikash Kanungo, Sambit Das
 */

#ifndef dftfeMPIRequestersNBX_h
#define dftfeMPIRequestersNBX_h

#include <mpi.h>
#include <TypeConfig.h>
#include <MPIRequestersBase.h>
#include <vector>
#include <set>
#include <memory>
namespace dftfe
{
  namespace utils
  {
    namespace mpi
    {
      class MPIRequestersNBX : public MPIRequestersBase
      {
        /*
         * @brief Implements the Non-blocking Consensus (NBX) algorithm as
         * described in the following paper to determine the list of requesting
         * processors for the current processor
         * @article{hoefler2010scalable,
         *   title={Scalable communication protocols for dynamic sparse data
         *          exchange},
         *   author={Hoefler, Torsten and Siebert, Christian and Lumsdaine, Andrew},
         *   journal={ACM Sigplan Notices},
         *   volume={45},
         *   number={5},
         *   pages={159--168},
         *   year={2010},
         *   publisher={ACM New York, NY, USA}
         * }
         */

        /*
         * The following is a brief description of the typical use case. Each
         * processor has a list of target processors to which it wants to send
         * a message (think of it as a message to another processor to request
         * some data that is owned by the other processor). Similarly, other
         * processors might be requesting the current processor for some of the
         * data owned by the current processor. However, the current processor
         * has no a priori knowledge of which processors will be requesting
         * data from it. The challenge is to utilize the current processor's
         * list of target processors to determine the current processor's
         * requesting processors. In other words, we have to use one-way
         * communication information to figure out the other way (its dual).
         *
         * Perhaps a more concrete example might help. Let's say we have a
         * vector/array which is distributed across a set of processors.
         * Each processor owns a part of the vector. The ownership is
         * exclusive, i.e., a processor is the sole owner of its part of the
         * vector. In practice, this means that the processor owns a set of
         * indices of the vector. Additionally, the sets of owned indices are
         * disjoint across processors, and the union of the sets across all
         * the processors gives the set of indices of the distributed vector.
         * However, each processor also needs information on a set of
         * non-owned indices (hereafter termed ghost indices), based on the
         * needs of the application. From its ghost indices, the current
         * processor can easily determine the processors that own them. These
         * processors are termed the target processors, to which the current
         * processor has to send a request to access the ghost data. Similarly,
         * the ghost indices on some other processor might be owned by this
         * processor. In that case, the other processor will send a request to
         * the current processor to access some of its data (data which is
         * ghost to the other processor but owned by the current processor).
         * But the current processor has no a priori knowledge of which
         * processors will be requesting data from it. Knowing this in advance
         * helps the current processor prepare for those requests.
         *
         * In cases of sparse communication, that is, where each processor only
         * needs to communicate with a small subset of the total number of
         * processors, the NBX algorithm determines the list of requesting
         * processors with O(log(P)) complexity (where P is the number of
         * processors). The algorithm works as follows (an illustrative sketch
         * in terms of the underlying MPI calls is given after this comment):
         *
         * 1. The current processor sends a nonblocking synchronous message
         * (i.e., MPI_Issend) to each of its target processors. Remember that
         * the current processor already has information about its target
         * processors. Also, note that the status of a nonblocking synchronous
         * send turns to "completed" only when the message has been received
         * by the receiving processor. Let's call this operation the
         * "local-send", as we are sending requests to target processors that
         * are locally known by the current processor.
         *
         * 2. The current processor keeps doing a nonblocking probe for
         * incoming messages (i.e., MPI_Iprobe). MPI_Iprobe checks whether
         * there is an incoming message matching a given source and tag, where
         * the source is the rank of the processor sending the message and the
         * tag is an MPI tag associated with the exchange. It does not initiate
         * any receive operation; it only verifies whether there is something
         * to be received. For our purpose, we use the wildcards MPI_ANY_SOURCE
         * and MPI_ANY_TAG, as we just want to know whether there is an
         * incoming message. In the event that there is an incoming message
         * (i.e., MPI_Iprobe's flag is true), we extract the source processor
         * from the status handle of MPI_Iprobe and append it to the list that
         * stores the requesting processor IDs. Additionally, in the event that
         * there is an incoming message, we call a nonblocking receive
         * (i.e., MPI_Irecv) to initiate the actual reception of the incoming
         * message. The MPI_Irecv, in turn, completes the status of the source
         * processor's MPI_Issend through which the incoming message was sent
         * to the current processor. Thus, we achieve two things here: we have
         * detected a requesting processor, and we have signaled the requesting
         * processor that we have received its message. But this is only the
         * job half done. How do we tell the current processor to stop probing
         * for incoming messages? And how do we inform all the processors
         * involved that all the incoming messages across all the processors
         * have been received? This kind of problem is called a Consensus
         * Problem (https://en.wikipedia.org/wiki/Consensus_(computer_science)).
         * The way consensus is reached in NBX is a two-step process:
         * (a) The current processor checks whether all its "local-send"
         * messages (see #1 above) have been received, that is, whether the
         * status handles of all its MPI_Issend calls have turned to completed.
         * If all the "local-send" messages have completed, we initiate a
         * nonblocking barrier (i.e., MPI_Ibarrier) on the current processor.
         * This informs the network that the current processor has witnessed
         * its part of an event (in this case, the event is the completion of
         * all its "local-send" messages).
         * (b) The above only informs the network that all the "local-send"
         * messages of the current processor have been received. But the
         * current processor can still have incoming messages to be received.
         * Hence, the current processor keeps probing and receiving incoming
         * messages until the nonblocking barrier (MPI_Ibarrier) mentioned
         * above in (a) has been invoked by all the processors. This can be
         * checked from the status handle of the MPI_Ibarrier, which completes
         * only when all processors have called it. At the stage when the
         * status of the MPI_Ibarrier turns to completed, we know for sure
         * that all the "local-send" messages of all the processors have been
         * received and that there are no more incoming messages on any
         * processor to be received. Thus, we can now safely terminate the
         * nonblocking probe on all processors.
         *
         * @note Since we are only interested in knowing the requesting
         * processors for the current processor, we only need token MPI sends
         * and receives (e.g., just an integer across) instead of large chunks
         * of data. To that end, we hardcode all the send and receive buffers
         * to be of integer type.
         */
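
        /*
         * A minimal, illustrative sketch of the NBX loop described above,
         * written directly in terms of the underlying MPI calls. The names
         * `comm`, `tag`, and `targets` are hypothetical and are not members
         * of this class; a blocking MPI_Recv is used for brevity (the probe
         * guarantees a matching message), whereas the class itself uses
         * MPI_Irecv. The actual implementation may differ in details.
         *
         * @code
         * const int                tag = 0; // hypothetical tag for the exchange
         * std::vector<int>         sendBuffers(targets.size(), 0);
         * std::vector<MPI_Request> sendRequests(targets.size());
         *
         * // "local-send": nonblocking synchronous sends to all target ranks
         * for (std::size_t i = 0; i < targets.size(); ++i)
         *   MPI_Issend(&sendBuffers[i], 1, MPI_INT, targets[i], tag, comm,
         *              &sendRequests[i]);
         *
         * std::set<int> requestingRanks;
         * MPI_Request   barrierRequest;
         * bool          barrierActive = false;
         * while (true)
         *   {
         *     // Probe for any incoming request and, if present, receive it.
         *     int        flag = 0;
         *     MPI_Status status;
         *     MPI_Iprobe(MPI_ANY_SOURCE, tag, comm, &flag, &status);
         *     if (flag)
         *       {
         *         requestingRanks.insert(status.MPI_SOURCE);
         *         int recvBuffer;
         *         MPI_Recv(&recvBuffer, 1, MPI_INT, status.MPI_SOURCE, tag,
         *                  comm, MPI_STATUS_IGNORE);
         *       }
         *     if (!barrierActive)
         *       {
         *         // Once all local sends have completed, enter the barrier.
         *         int allSent = 0;
         *         MPI_Testall(static_cast<int>(sendRequests.size()),
         *                     sendRequests.data(),
         *                     &allSent,
         *                     MPI_STATUSES_IGNORE);
         *         if (allSent)
         *           {
         *             MPI_Ibarrier(comm, &barrierRequest);
         *             barrierActive = true;
         *           }
         *       }
         *     else
         *       {
         *         // Terminate once every processor has entered the barrier.
         *         int barrierDone = 0;
         *         MPI_Test(&barrierRequest, &barrierDone, MPI_STATUS_IGNORE);
         *         if (barrierDone)
         *           break;
         *       }
         *   }
         * @endcode
         */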

      public:
        MPIRequestersNBX(const std::vector<size_type> &targetIDs,
                         const MPI_Comm &              comm);
        //
        // Default constructor for serial (without MPI) compilation
        //
        MPIRequestersNBX() = default;

        std::vector<size_type>
        getRequestingRankIds() override;

      private:
        /**
         * List of processes this processor wants to send requests to.
         */
        std::vector<size_type> d_targetIDs;

        /**
         * Buffers for sending requests.
         */
        std::vector<int> d_sendBuffers;

        /**
         * MPI request handles for the nonblocking sends.
         */
        std::vector<MPI_Request> d_sendRequests;

        /**
         * Buffers for receiving requests.
         * We use a vector of pointers because that
         * guarantees that the buffers themselves
         * are never moved around in memory, even if the vector is
         * resized and consequently its elements (the pointers) are moved
         * around.
         */
        std::vector<std::unique_ptr<int>> d_recvBuffers;
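
        /*
         * Illustration (not part of this class) of why storing the receive
         * buffers by value in a resizable vector would be unsafe: MPI_Irecv
         * keeps the address of its buffer until the request completes, so the
         * buffer must not move. The names `src`, `tag`, and `comm` below are
         * placeholders.
         *
         * @code
         * std::vector<int> buffers(1);
         * MPI_Request      request;
         * MPI_Irecv(&buffers[0], 1, MPI_INT, src, tag, comm, &request);
         * buffers.push_back(0); // may reallocate; MPI could now write through
         *                       // a dangling pointer to the old storage
         * @endcode
         */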

        /**
         * MPI request handles for the nonblocking receives.
         */
        std::vector<std::unique_ptr<MPI_Request>> d_recvRequests;

        //
        // request for barrier
        //
        MPI_Request d_barrierRequest;

        //
        // MPI communicator
        //
        const MPI_Comm &d_comm;

        /**
         * List of processes that have made a request to this process.
         */
        std::set<size_type> d_requestingProcesses;

        int d_numProcessors;
        int d_myRank;

        /**
         * Check whether all of the messages sent from the current processor
         * to other processors have been received or not.
         */
        bool
        haveAllLocalSendReceived();

        /**
         * Signal to all other processors that, for this processor,
         * all its messages sent to other processors have been received.
         * This is done via a nonblocking barrier (i.e., MPI_Ibarrier).
         */
        void
        signalLocalSendCompletion();

        /**
         * Check whether all of the incoming messages from other processors to
         * the current processor have been received.
         */
        bool
        haveAllIncomingMsgsReceived();

        /**
         * Probe for an incoming message and, if there is one, receive it.
         */
        void
        probeAndReceiveIncomingMsg();

        /**
         * Start sending messages to all the target processors.
         */
        void
        startLocalSend();

        /**
         * After all processors have received all the incoming messages,
         * the MPI data structures can be freed and the received messages
         * can be processed.
         */
        void
        finish();
      };
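
      /*
       * Typical usage (illustrative sketch): the ranks in `targetIDs` are the
       * processors this rank wants to request data from, and
       * getRequestingRankIds() returns the ranks that will, in turn, be
       * requesting data from this rank. The example ranks are placeholders.
       *
       * @code
       * MPI_Comm               comm      = MPI_COMM_WORLD;
       * std::vector<size_type> targetIDs = {1, 3}; // e.g., ranks we will query
       * MPIRequestersNBX       nbx(targetIDs, comm);
       * std::vector<size_type> requesterIDs = nbx.getRequestingRankIds();
       * @endcode
       */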

    } // end of namespace mpi
  }   // end of namespace utils
}     // end of namespace dftfe
#endif // dftfeMPIRequestersNBX_h