Article Type : Research Article
Authors : Sinitsyn EV, Laptev VM, Komarova KS and Buzunov AN
Keywords : Probabilistic model of people flows; Forecasting of people/passenger flows; People flows forecasting by means of bankcards payments analysis
Forecasting passenger flows and detecting the places where people concentrate during intra-city movements are important for planning the development of transport infrastructure, estimating the demand for transport services and organizing traffic control. The lessons of the COVID-19 pandemic show that such forecasting is also relevant for anti-epidemic measures. The goal of this paper is to develop a probabilistic mathematical model of intra-city people flows based on the analysis of bank card payments. The model provides information about the territorial and temporal structure of intra-city people flows. Such flows can also be clustered by age and gender composition and by payment purpose. The use of the model is illustrated with the examples of two Russian cities: Ekaterinburg and Moscow.
Long-term planning of transport infrastructure and urban development, public safety tasks (for example, avoiding accidents caused by overcrowding) and anti-epidemic measures all rely on complete and timely information about people flows over specific areas. Such information is also implied by the "Smart city" standards, a promising direction of urban development. Currently, several methods are used to obtain the necessary information [1]:
· Technical vision systems [2,3].
· Use of various sensors ("counters") [4].
· Data on tickets sold.
All the above methods require either special measures or rather expensive equipment [2,3]. Meanwhile, two methods do not have these shortcomings, since the information about people flows over the studied territories is a by-product of technologies that satisfy the regular needs of residents. These methods are the use of data from mobile operators and the analysis of bank card payment data. The latter are concentrated in the ecosystems of large banks. For example, in Russia one may use the data of Sberbank, the largest bank in Russia, Central and Eastern Europe, and one of the leading financial institutions worldwide. Its bank cards are used by most Russian citizens [5,6]. People flow analysis based on mobile operators' data is well known and is implemented in many popular navigation services. Examples of bank card payment analysis for the same purposes are much scarcer. At the same time, such payments are an inalienable part of everyday life for any citizen and can be a regular source of the information needed to solve the above-mentioned problems. In data analysis one can use descriptive analytics and predictive modelling [7]. The methods mentioned above are most commonly applied for descriptive analysis. For the purposes of predictive modelling, it is worth using
probabilistic models (check and the references in it). In this paper we are
going to suggest a variant of such a model that has been used successfully in different analytical tasks [8-10], in particular in tasks related to the analysis of flows of members between various communities. For example, these can be predictions of passenger or pedestrian flows between different places in a city or between different cities [10]. Let us suppose that there are M city regions and X_i is the number of potential participants of flows in region i, so that:
$$\sum_{i=1}^{M} X_i = N \qquad (1)$$
where N is the total number of moving citizens. The task is to determine the probability of the
distribution
And probability
distribution
Here
The data on bank card payments
To test the model, we used data on payments made with Sberbank cards, tied to the registered location of the device through which each payment was made. For the analysis, we used the data for payments made within 7 months from July 2020 to December 2021, grouped by working and non-working days of each month and by two-hour intervals from 0 to 24 hours. Information about the age and gender of cardholders as well as the purpose of each payment was also available. A cardholder was counted as having moved from one place to another if she or he paid twice within one hour in different places. Data were collected both for the number of such persons and for the amounts of money spent in each place. The territory of each city was covered by a grid of hexagons (each with an edge of 170 m) with known coordinates (latitude and longitude) of their centers. These hexagons were grouped, where necessary, into larger geographical objects (city districts, for example). It was these objects that were considered as the places between which the citizens were moving in equation (2).
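The counting rule described above (two payments by the same card within one hour in different places) can be illustrated with a minimal sketch. The record layout `(card_id, timestamp, node)`, the function name `count_moves` and the sample data are assumptions made for illustration only; they are not the format of the bank data used in the paper, and the sketch counts moves between consecutive payments rather than distinct persons.

```python
from collections import Counter
from datetime import datetime, timedelta


def count_moves(payments, window=timedelta(hours=1)):
    """Count moves between nodes from (card_id, timestamp, node) records.

    A move a -> b is counted when the same card pays in node a and then,
    no more than `window` later, in a different node b.
    """
    by_card = {}
    for card_id, ts, node in payments:
        by_card.setdefault(card_id, []).append((ts, node))

    moves = Counter()
    for records in by_card.values():
        records.sort(key=lambda r: r[0])  # chronological order per card
        # Compare each payment with the next payment of the same card.
        for (t0, a), (t1, b) in zip(records, records[1:]):
            if a != b and t1 - t0 <= window:
                moves[(a, b)] += 1
    return moves


# Hypothetical sample data: node labels stand for grouped hexagons.
payments = [
    ("card1", datetime(2020, 7, 1, 10, 5), "A"),
    ("card1", datetime(2020, 7, 1, 10, 40), "B"),  # A -> B within one hour
    ("card1", datetime(2020, 7, 1, 13, 0), "C"),   # too late to count as B -> C
    ("card2", datetime(2020, 7, 1, 9, 0), "B"),
    ("card2", datetime(2020, 7, 1, 9, 30), "B"),   # same node, not a move
    ("card2", datetime(2020, 7, 1, 10, 0), "A"),   # B -> A within one hour
]
moves = count_moves(payments)
```

In this sample only two moves are registered, one from A to B and one from B to A. Grouping by working and non-working days and by two-hour intervals could be added by extending the key of the counter.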
These data were used
to calculate the main parameters in (2) -
Here
M, as above, is the total number of nodes under consideration, and N is the total number of cardholders moving during the considered time T:
And finally:
One can easily show
that for each node:
It is convenient to
introduce dimensionless time
As an example, Fig.1 shows the comparison of ?
distribution for the nodes of two Russian cities: Moscow and Ekaterinburg
(Figure 1).
The overall number of citizens in flows during T – ? (5) is equal:
Figure 1: Relative fraction (y-axis) of nodes i with
The probabilities
Figure 2: The probability of a citizen moving over a given distance.
Methods of people flow forecasting
In this section we discuss possible solutions of the main equation (2) and their consequences, in particular the possible approaches to predicting people flows.
Correlation functions of people concentrations: Let us return to equation (2). It can be used to determine the moment-generating function, or it can be solved numerically. We use both approaches below. The results obtained make it possible to predict the probability
distribution
After standard manipulations using (2), one can transform (6a,b) to the form:
Here
By direct summation in
(7a,b) it can be shown that:
As it should be in
accordance with (1).
From (6) one can find
that for a large N,
Where
The continuity equation and its solutions: In the limit of large N it is convenient
to transform equation (2) by expanding the probability
And correspondingly:
It is easy to show that total current:
We’ll consider the case
In accordance with (3e) matrix
Where:
And other parameters are defined by (3). Either matrix
It can be shown that this stationary state is stable.
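The idea of a stable stationary state can be illustrated with a minimal sketch. It assumes that the movement of people is approximated by a discrete-time Markov chain with a fixed row-stochastic matrix `P` of transition probabilities between nodes. This is a simplified stand-in chosen for illustration, not the continuity equation (9) itself, and the matrix values are hypothetical.

```python
def stationary_distribution(P, tol=1e-12, max_iter=10_000):
    """Iterate x <- x P for a row-stochastic matrix P until x stops changing.

    P[i][j] is the assumed probability of moving from node i to node j
    in one time step; x[i] is the fraction of people in node i.
    """
    m = len(P)
    x = [1.0 / m] * m  # start from uniform concentrations
    for _ in range(max_iter):
        new = [sum(x[i] * P[i][j] for i in range(m)) for j in range(m)]
        if max(abs(a - b) for a, b in zip(new, x)) < tol:
            return new
        x = new
    return x


# Hypothetical three-node transition matrix (each row sums to 1).
P = [
    [0.8, 0.2, 0.0],
    [0.1, 0.7, 0.2],
    [0.1, 0.3, 0.6],
]
x_st = stationary_distribution(P)
```

For this matrix the iteration converges to concentrations close to (1/3, 4/9, 2/9) regardless of the starting vector, which is the discrete analogue of a stable stationary state.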
The solution of (9) can be found by standard methods [14]:
Here
The concrete form of
Let's assume, for example, that the initial distribution is uniform in
some hypercube of the space
and is equal to zero otherwise. One can show that for
Here
The feedback effects: Instead of solving the multi-node problem, one can use simpler approximations. For example, the most probable concentrations of citizens in the nodes where intra-city flows cross can be estimated in a two-node model: X is the node of interest, and Y represents all other nodes (XY model). To analyse flows between nodes i and j one can use a three-node model: i is node X, j is node Y, and all other nodes are merged into node Z (XYZ model). For
instance, in Ekaterinburg standard deviation between concentrations
where
c is a constant,
that does not influence the results,
Here
The solution (18)
makes it possible to analyse the peculiarities of
Let us consider a uniform initial distribution (all concentrations x are equally probable), so
Here:
And:
Thus, if
· The avoidance of the places with small people concentrations.
· The tendency to visit such places.
Suppose that X is the investigated place and Y
all other places of the city available for visits. For simplicity, let's assume
that in (18)
Figure 3: The transition (X→Y, Y→X) probabilities
It is worth mentioning that in the case of avoiding places with small people concentrations, equations (2) and (9) have two stable singular points:
(
In this section we apply the above models to the analysis of people flows in two Russian cities: Moscow and Ekaterinburg (the home town of the authors). The hexagons for Ekaterinburg were grouped into 69 nodes corresponding to the established administrative division; large shopping centers were also considered as independent nodes. In Moscow we simply grouped neighboring hexagons to reduce their number to a manageable quantity. Figure 4 shows the
distributions of stationary concentrations
Figure 4: The fraction of given values of
The statistical
parameters of
The presented data show that the distribution of people concentration over the nodes is highly heterogeneous. The shift of the distribution for Moscow towards low concentrations compared to Ekaterinburg is likely caused by studying the people flows between larger administrative entities in the latter city. Quite expectedly, the maximum
Figure 5: The probability density functions of variable
Table 1: The statistical characteristics of
City | Average value | Standard deviation | Variation coefficient | Median
Moscow | 1.97E-03 | 1.85E-03 | 93.6% | 1.49E-03
Ekaterinburg | 1.45E-02 | 7.93E-03 | 54.7% | 1.30E-02
Figure 5 shows the distribution of flow intensities in the stationary states for the two above-mentioned cities. To estimate the total load on the city’s road network, we sum the flows “to node i”:
and “from node i”:
for each node. The total flow for node i in both directions is therefore:
Recall that the absolute values of flow intensities are obtained by multiplying (21c) by the total number of citizens in flows (Figure 5).
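The summation of flows "to" and "from" each node can be sketched as follows. The matrix `J`, with `J[i][j]` standing for the relative flow intensity from node i to node j, is a hypothetical name and layout introduced for illustration; its values and the total number of moving citizens `N` are invented numbers, not data from the paper.

```python
def node_loads(J):
    """Return (flow_in, flow_out, total) per node for a flow matrix J.

    J[i][j] is the assumed relative flow intensity from node i to node j;
    diagonal entries (staying in the same node) are ignored.
    """
    m = len(J)
    flow_in = [sum(J[j][i] for j in range(m) if j != i) for i in range(m)]
    flow_out = [sum(J[i][j] for j in range(m) if j != i) for i in range(m)]
    total = [a + b for a, b in zip(flow_in, flow_out)]
    return flow_in, flow_out, total


# Hypothetical relative flow intensities between three nodes.
J = [
    [0.00, 0.02, 0.01],
    [0.03, 0.00, 0.04],
    [0.01, 0.02, 0.00],
]
flow_in, flow_out, total = node_loads(J)

# Absolute loads follow by multiplying by the total number of moving citizens.
N = 100_000  # assumed value, for illustration only
absolute_total = [N * t for t in total]
```

Summed over all nodes, the incoming and outgoing flows coincide, since every move leaves one node and enters another; this gives a simple consistency check on the input matrix.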
Detailed analysis of people flows shows that most citizens move over small distances. For example, 88% of citizens in Ekaterinburg and 72% in Moscow move over distances of less than 1750 m. The analysis of pedestrian traffic is therefore of high practical importance. Such analysis is implemented in the special service “Pedestrian Traffic”, built on big data from SBER. This service has access to more than 75% of the transaction activity of individual clients. The service makes it possible to track and forecast daily and seasonal pedestrian activity, to build a map of the most popular pedestrian routes, and to record the number of people living in a certain area for both the city center and the suburbs.
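A share of short moves such as the one quoted above can be estimated from move counts and the coordinates of node centers. The sketch below is only an illustration: it assumes great-circle (haversine) distances between centers, and the node labels, coordinates and counts are hypothetical values chosen for the example rather than data from the paper.

```python
import math


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two points given in degrees."""
    r = 6_371_000.0  # mean Earth radius in meters
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def share_of_short_moves(moves, centers, threshold_m=1750.0):
    """Fraction of moves whose origin and destination centers are closer than threshold_m."""
    total = sum(moves.values())
    short = sum(
        n
        for (a, b), n in moves.items()
        if haversine_m(*centers[a], *centers[b]) < threshold_m
    )
    return short / total if total else 0.0


# Hypothetical node centers (latitude, longitude) and move counts.
centers = {
    "A": (56.8380, 60.5970),
    "B": (56.8400, 60.6050),
    "C": (56.9000, 60.6500),
}
moves = {("A", "B"): 80, ("B", "A"): 8, ("A", "C"): 12}
share = share_of_short_moves(moves, centers)
```

Here nodes A and B are roughly half a kilometer apart and C is several kilometers away, so 88 of the 100 sample moves fall below the 1750 m threshold.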
The proposed model can also be used for planning anti-epidemic measures, an activity that is unfortunately required at present. Indeed, with it one can trace the flows of infected persons, starting from “patient zero”. Information about the infection spread coefficient (R_t, the number of people who can be infected on average by one infected person) makes it possible to estimate the statistical characteristics of the spread of the disease over a specific city by means of the proposed model. As an example, Fig. 6 shows the time evolution of a contagious disease in Ekaterinburg under various restrictions on people flows. Two types of restrictions were considered:
Soft isolation: the restriction of the most significant flows between the administrative districts of the city, while preserving free movement within each of them;
Lockdown: the restriction of all significant flows, both between the administrative districts of the city and inside them.
It was assumed that the
infection spread coefficient was
Fig. 6 shows that restrictions on people flows over the city significantly slow down the spread of the disease. This gives the healthcare system time to prepare for receiving patients. The proposed model makes it possible to introduce such restrictions very selectively, so the spread of the disease can be slowed down without serious problems for business activity and quality of life (Figure 6).
Figure 6: The influence of restrictions on people flows on the rate of the disease’s spread, calculated on the basis of the model proposed in Section 2.2. Arrows show the time reserve gained by the healthcare system.
As was shown in Section 2.2, the stationary states are reached during a sufficiently long process of random movements of people over the city with the given probabilities of transitions between the nodes. This makes it possible to interpret
Figure 7: The concentrations of visitors and revenue in thirteen main shopping centers of Ekaterinburg.
To study the potential disadvantages of the above approach, we analyzed the passenger flows of public bus transport in the Sverdlovsk region. The data obtained within the framework of the proposed model were compared with direct counts of tickets sold. The comparison showed that the proposed model correctly reflects the trends in passenger flows by day of the week and time of day. The passenger concentrations at the stops predicted by the model agree less well with the actual data, although their relative values at different stops are reflected more or less correctly. Apparently, this is related to the above-mentioned feature of the model: it analyses the buyers of goods along the route rather than all the passengers.
Figure 8: The concentrations of visitors and revenue by age group in the shopping center with the maximum revenue in Ekaterinburg.
A possible model for the analysis of pedestrian and passenger flows, based on the analysis of data on bank card payments made during movements along the routes, was considered. We formulated algorithms that make it possible to predict the places of the most probable concentration of citizens in the process of intra-city flows and the magnitudes of passenger and pedestrian flows. It should be emphasized that the proposed model makes it possible to predict not only the flows of citizens, but also the flows of effective demand along the routes of people flows. We express our gratitude to the employees of SBER Analytics Company for providing the data for our analysis.