Abstract
Black–Scholes (BS) is a remarkable quotation model for European option pricing in financial markets. Option prices are calculated using an analytical formula whose main inputs are strike (at which price to exercise) and volatility. The BS framework assumes that volatility remains constant across all strikes; however, in practice, it varies. How do traders come to learn these parameters? We introduce natural agent-based models, in which traders update their beliefs about the true implied volatility based on the opinions of other agents. We prove exponentially fast convergence of these opinion dynamics, using techniques from control theory and leader-follower models, thus providing a resolution between theory and market practices. We allow for two different models, one with feedback and one with an unknown leader.
1 Introduction
Econophysics divides into two paradigms. Statistical Econophysics relies on data, fitting certain power laws to existing asset prices at various time scales [1,2]. In statistical Econophysics, zero-intelligence agents have random interactions. Agents are homogeneous and have no learning ability. The central object of study is historical price data. The viewpoint is that interacting zero-intelligence traders’ actions are already incorporated into price fluctuations. The focus is on the macroscopic aggregation of interactions in the form of available data.
While this is an important area of research, agent-based Econophysics offers the opportunity to study the microscopic interactions in more detail, where agents are heterogeneous.
Our objective is to offer a cogent and clear motivation for agent-based Econophysics in the context of option volatilities, whereby learning and interaction are made explicit. To an outsider, it may seem that financial assets are observed at one price, decided by the market. In reality, prices fluctuate throughout the day and there is no equilibrium price: it is always in flux. Interaction between strategic traders and other players is embedded in all transactions and informational channels. Interaction is vital to understanding markets. This paper was inspired by the works of Kirman [3] and Föllmer et al. [4]. Rather than develop a thorough game-theoretic or mean-field model, we advocate something in between. We aim to take a more nuanced view of agent-based Econophysics as espoused by Chakraborti et al. [5].
1.1 Our contribution
We introduce two different classes of learning models that converge to a consensus. Our interest is not in the equilibrium itself but in the process that leads to it [6–8]. The first introduces a feedback mechanism (§4.1, theorem 4.1) where agents who are off the true ‘hidden’ volatility parameter feel a slight (even infinitesimally small) pull towards it, along with all the other ‘random’ chatter of the market. This model captures the setting where traders have access to an alternative trading venue or an information source provided by brokers and private message boards. The second model incorporates a market leader (e.g. Goldman Sachs) that is confident in its own internal metrics or is privy to client flow (private information) and does not give any weight to outside opinions (§4.3, theorem 4.4). Proving the convergence results (as well as establishing the exponentially fast convergence rates) requires tools from discrete dynamical systems. We complement our theoretical results with experiments (e.g. figure 2a–d), which show, for example, that convergence is no longer guaranteed once we move away from our models.
We formalize the multi-dimensional analogues of our two models by using Kronecker products (§5, theorems 5.1 and 5.3). Thus, our models show how a volatility curve could function as a global attractor given adaptive agents. We conclude the paper by discussing future work and connections to other fields.
2 Derivatives and social learning
Before discussing the main models of this paper, we give an overview of options markets and trading. We then motivate our framework and explain why certain social learning models are appropriate.
2.1 Trading
Most trading is done electronically. To be dominant, firms now invest huge sums in technology to get an edge. For futures trading, speed is vital to profits. Trading complex derivatives requires not only speed but also heavy investment in quantitative models. This, in turn, feeds the need for mathematicians, computer scientists and engineers. Over the last two decades, the way trading is conducted has changed drastically. Electronification of the markets has affected instruments traded both on and off exchange. Algorithmic trading drives not only plain vanilla instruments like stocks and futures but also derivatives [9–11]. Furthermore, the distinction between stock exchanges and over-the-counter (OTC) markets is not as clear as it once was [12]. In OTC markets, trading is between two counterparties and there is no centralized marketplace. Over the last decade, there has been a growing regulatory push to make OTC markets more exchange-like. In OTC markets, participants may see what their competitors are quoting for a particular security, but the volume and the actual price transacted remain private to the two counterparties. In some quarters, OTC markets are referred to as quote-driven or truly dark markets [13]. Regulation in the USA and European Union has resulted in fragmented exchange-based trading but centralization of opaque OTC markets.
2.2 Options markets
Derivative contracts are actively traded across the world’s financial markets with a total estimated value in the trillions of dollars. To get an intuitive understanding of the setting and the issues at hand, let us consider the prototypical example of European options.
A European option is the right to buy or sell an underlying asset at some point in the future at a fixed price, also known as the strike. A call option gives the right to buy an asset and a put option gives the right to sell an asset at the agreed price. On the opposite side of the buyer is the seller, who has relinquished control over exercise. Buyers of puts and calls can exercise the right to buy or sell. Sellers of options have to fulfil their obligations when exercised against. The payoff to the buyer of a call option with stock price ST at expiry time T and exercise price K is max{ST − K, 0}, whereas for a put option it is max{K − ST, 0}.
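The two payoff formulas above can be written out directly. The following is a minimal sketch; the function names `call_payoff` and `put_payoff` and the sample numbers are illustrative choices, not part of the original text.

```python
def call_payoff(s_T: float, strike: float) -> float:
    """Payoff to the buyer of a European call at expiry: max(S_T - K, 0)."""
    return max(s_T - strike, 0.0)


def put_payoff(s_T: float, strike: float) -> float:
    """Payoff to the buyer of a European put at expiry: max(K - S_T, 0)."""
    return max(strike - s_T, 0.0)


# Hypothetical terminal prices around a strike of 100.
for s_T in (80.0, 100.0, 120.0):
    print(s_T, call_payoff(s_T, 100.0), put_payoff(s_T, 100.0))
```

The seller's payoff is the negative of the buyer's in each case, which is the obligation referred to above.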
Figure 1. (a) A typical implied volatility smile for varying strikes K divided by fixed spot price. Moneyness is K/S0. ATM denotes At-The-Money where K equals S0. (b) Consensus occurs as all traders’ opinions of the implied volatility converge, round by round, to a distinct value for varying strikes.
How does the market decide what the quoted volatility should be (e.g. for a stock index three months from now)? This is a critical but not well-understood question, and it is exactly what we aim to study by introducing models of learning agents who update their beliefs about the volatility. Agent-based models of volatility smile formation and interaction have not been thoroughly addressed in finance or Econophysics. They remain a challenge [18]. Previous attempts have been made, but the focus has never been on the mathematical or specific nature of interaction [19,20]. Furthermore, our work takes into account the physicality of how trading occurs. An alternative perspective is offered in [21,22], though again the nature of interaction is missing. Nevertheless, these early attempts indicate that the problem has garnered significant interest in different disciplines.
2.3 Econophysics
The challenge for physicists is not to force existing physics-based models on human behaviour but rather to develop new models [23–25]. To go from local microscopic interactions to global macroscopic behaviour is not an easy task [26,27]. In fact, the choice of models seems infinite. There is a plethora of agent-based models [5,25,28]. Which one is correct? And, moreover, which type of social learning is representative of financial markets trading? LeBaron provides an early guide [29]. Agent-based models were proclaimed as the future for Econophysics [30,31]. While development in this area has been steady, the problem of the emergence of volatility smiles remains unresolved. The volatility smile is an active and vigorous area of research in the mathematical finance community [32–34]. Many models postulate a stochastic process for the underlying stock and volatility combined.
2.4 Knightian uncertainty
Risk and uncertainty are two different concepts [35–37]. Risky assets are those on which the probabilities of random events are well defined and known. For instance, suppose we observe historical data of a stock price. Are we confident to claim we know the distribution of the stock’s returns? If we are, then the stock is considered risky. Its risk is quantifiable. However, if we were unsure of even the correct probability measure, then we would be faced with uncertainty. In a sense, this captures the essence of financial markets. Traders and players use different probability measures when trading and quoting options. No single measure dominates. In fact, there are many models that are consistent with the observation of a finite number of strike volatilities in the market [38–41]. In practice, the choice of a correct probability measure such that a derivative contract is priced correctly is a subjective and quantitative exercise. In any case, no perfect model exists [42–46]. As a result, participants in financial markets are free to choose whichever probability model they calibrate to market data [47–49].
The problem with economics-based models and those in mathematical finance literature is that many times the analysis is centred on a representative agent. In the case of risk and uncertainty, the choice of pricing a derivative contract reduces to choosing a correct equivalent martingale measure under which a derivative claim is replicable. For market-makers and dealers, the choice of models is vast. Each player has to make a choice and inevitably no two institutions will use the same models with the same parameters. In this case, it is remarkable that the market will aggregate the diverse beliefs to arrive at a consensus smile. At the microscopic level, though, the dealers are observing one another’s updates. Hence, our model can be seen as a meta-opinion dynamics framework built upon the individual choices of the dealers.
2.5 Non-Bayesian financial markets
In financial markets, updating occurs at high frequency across geographical locations [50,51]. Agents move simultaneously: cancellations are the norm [52–54]. In practical terms, sequential Bayesian learning models do not seem appropriate [55,56]. Bayesian observational learning examples include [57–59]. These models are sequential in nature. They study herd behaviour. As time passes, a player in turn observes the actions of previous agents and receives a private signal. Each agent has a one-off decision when she updates her posterior probability and takes an action. In some instances, the nth agent may reach the truth as n → ∞.
In DeGroot learning, myopic updating occurs in each iteration. Agents in our set-up have fixed weights but update their responses until consensus is reached. Recently, there have been some experimental papers on the evidence of DeGroot updating [60,61]. Repeated averaging models are our base precisely because they capture the nature of interaction and learning in financial markets so compactly. Players can observe previous choices but not the payoffs of their competitors. A more in-depth discussion of learning in games would take us further away from our goal of studying the mathematical nature of interaction. The reader can consult [62,63] for a game-theoretic perspective.
3 Model description
In mathematical opinion dynamic models, agents take views of other agents into account before arriving at their own updated estimate. Agents can observe other agents’ previous signals.
DeGroot [64] was one of the early developers of such observational learning dynamics. While simple, these models allow us to examine convergence to consensus. In a sense, these types of models are called naive models, as agents can recall perfectly what the other players submitted in the previous round. See the survey papers [65–68].
3.1 Volatility basics
Agents have an initial opinion of the implied volatility, which they update after taking into account volatilities of other agents. A feedback mechanism aids the agents in arriving at the true volatility parameter.
At all times, the focus is on a static picture of the volatility smile. Within this static framework agents are updating their opinion of the true implied volatility. This updating occurs in a high-frequency sense. In an exchange setting, one can think of all bids and offers as visible to agents. The agents initially are unsure of the true value of the implied volatility, but by learning—and feedback—reach consensus on the true parameter. Our first attempt is a naive learning model common in social networks. Learning occurs between trading times. Therefore, our implicit assumption is that no transactions occur while traders are adjusting and learning each other’s quotes.
This rather peculiar feature is market practice. Trading happens at longer intervals than quote updating. This is as true for high-frequency trading of stocks as it is for options markets. Quotes and prices—or rather vols—are changing more frequently than actual transactions.
Each dollar value of an option corresponds to an implied volatility parameter σ(K, T) ∈ (0, 1) that depends on strike and expiry. Implied volatility is quoted in percentage terms.
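The correspondence between a dollar price and an implied volatility can be sketched numerically. The snippet below uses the standard BS formula for a European call on a non-dividend-paying asset and inverts it by bisection over the interval (0, 1) stated above. The function names, the interest-rate argument and the sample inputs are assumptions made for illustration only.

```python
from math import erf, exp, log, sqrt


def norm_cdf(x: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))


def bs_call(spot: float, strike: float, expiry: float, rate: float, vol: float) -> float:
    """Black-Scholes price of a European call (no dividends)."""
    d1 = (log(spot / strike) + (rate + 0.5 * vol ** 2) * expiry) / (vol * sqrt(expiry))
    d2 = d1 - vol * sqrt(expiry)
    return spot * norm_cdf(d1) - strike * exp(-rate * expiry) * norm_cdf(d2)


def implied_vol(price: float, spot: float, strike: float, expiry: float, rate: float,
                lo: float = 1e-6, hi: float = 1.0, tol: float = 1e-10) -> float:
    """Invert bs_call in vol by bisection; the call price is increasing in vol."""
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if bs_call(spot, strike, expiry, rate, mid) < price:
            lo = mid  # model price too low, so the implied vol is higher
        else:
            hi = mid
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)


# Round trip with hypothetical inputs: quote a vol, price it, recover the vol.
quoted_vol = 0.20
dollar_price = bs_call(100.0, 100.0, 0.25, 0.01, quoted_vol)
recovered_vol = implied_vol(dollar_price, 100.0, 100.0, 0.25, 0.01)
```

Because the map from volatility to price is monotone, quoting in volatility and quoting in dollars carry the same information for a fixed strike and expiry.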
Assumption 3.1.
We have three types of players: agents/traders, brokers and leaders. Brokers give feedback to the traders. The ability of agents to determine this feedback is their learning ability. Leaders are unknown and do not give feedback but their quotes are visible.
3.2 Naive opinion dynamics
Definition 3.2 (consensus).
The n agents (3.2) are said to reach consensus if, for any fixed initial condition X1, the opinions of agents i and j become arbitrarily close as t → ∞ for all i, j ∈ {1, …, n}.
Definition 3.3 (consensus to a point).
The n agents (3.2) are said to reach consensus to a point if for any initial condition X1, lim t→∞ Xt = c1n, where 1n denotes the n × 1 vector composed of only ones and c is a real constant. The constant c is often referred to as the consensus value.
Proposition 3.4.
Consider the opinion dynamics in equation (3.2). If A is aperiodic and irreducible, then for any initial condition X1, consensus to a point is reached. The consensus value c depends on both the matrix A and the initial condition X1.
Remark 3.5.
Proposition 3.4 implies that if the row stochastic opinion matrix A is aperiodic and irreducible, then all the agents converge to some consensus value c. However, since c depends on the unknown initial opinion X1, the consensus value c is unknown and, in general, different from the true volatility σ(K, T). We wish to alleviate this and thus introduce two novel models.
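The point of remark 3.5 can be illustrated with a small simulation. The sketch below assumes that the naive dynamics take the standard DeGroot form, in which the opinion vector is multiplied by the row stochastic matrix A in every round; the 3 × 3 matrix and the initial opinions are hypothetical. For such dynamics the consensus value is the stationary distribution of A (its normalized left eigenvector for eigenvalue one) applied to the initial opinions.

```python
import numpy as np


def degroot(A: np.ndarray, x1: np.ndarray, steps: int = 200) -> np.ndarray:
    """Repeated averaging: apply the row stochastic matrix A to the opinions."""
    x = np.asarray(x1, dtype=float)
    for _ in range(steps):
        x = A @ x
    return x


# Hypothetical trust matrix: all entries positive, hence aperiodic and irreducible.
A = np.array([[0.5, 0.3, 0.2],
              [0.2, 0.6, 0.2],
              [0.1, 0.4, 0.5]])

# Stationary distribution of A: left eigenvector for the eigenvalue 1, normalized.
eigvals, eigvecs = np.linalg.eig(A.T)
pi = np.real(eigvecs[:, np.argmax(np.real(eigvals))])
pi = pi / pi.sum()

x1 = np.array([0.15, 0.25, 0.35])      # initial volatility opinions
x1_alt = np.array([0.35, 0.25, 0.15])  # same opinions, held by different agents

x_inf = degroot(A, x1)
c, c_alt = pi @ x1, pi @ x1_alt        # consensus values for the two starts
```

Both runs reach consensus, but on different values, and neither value is tied to any true volatility. This is the gap that the models of §4 are meant to close.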
4 Consensus (scalar agent dynamics)
In this section, we assume that the agents are able to learn how far off they are from the true volatility by informational channels in the marketplace. There are many avenues, platforms and private online chat rooms that provide quotes for option prices; some of these are stale and some are fresh. The agents’ learning ability determines the quality of the feedback from all these sources. In reality, options are not traded on one exchange or platform. There are multiple venues and, though there might be a dominant marketplace, the same instruments can be traded across different venues and locations. We aggregate all of this information in the form of feedback with learning ability. If agents are fast learners, they adjust their volatility estimates quickly.
4.1 Consensus with feedback
Theorem 4.1.
Consider the agent dynamics in (4.2) and assume that εi ∈ (0, aii), i ∈ {1, …, n}. Then consensus to the true volatility σ(K, T) is reached, i.e. lim t→∞ Xt = σ(K, T)1n.
Proof.
Corollary 4.2.
Consensus to the true volatility σ(K, T) is reached exponentially for every agent i ∈ {1, …, n}, with a convergence rate expressed through ‖·‖∞, the matrix norm induced by the vector infinity norm.
Proof.
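A numerical sketch may help fix ideas. It assumes, as a labelled guess rather than a restatement of (4.2), that each agent averages the opinions of the others and adds a pull of size εi towards the true volatility: x ← Ax + ε(σ − x). Under this assumed form the consensus error is multiplied by A − E in every round, where E = diag(ε1, …, εn) is notation introduced here. The matrix, learning rates and opinions are hypothetical.

```python
import numpy as np


def feedback_step(A: np.ndarray, eps: np.ndarray, x: np.ndarray, sigma: float) -> np.ndarray:
    """One round of the assumed update: averaging plus a pull of size eps_i towards sigma."""
    return A @ x + eps * (sigma - x)


A = np.array([[0.5, 0.3, 0.2],
              [0.2, 0.6, 0.2],
              [0.1, 0.4, 0.5]])
eps = np.array([0.05, 0.10, 0.02])   # each eps_i lies in (0, a_ii)
sigma = 0.20                         # hypothetical true volatility
x = np.array([0.15, 0.25, 0.35])     # initial opinions

# Induced infinity norm (maximum absolute row sum) of the assumed error matrix.
rate = np.linalg.norm(A - np.diag(eps), ord=np.inf)

errors = []
for _ in range(1500):
    errors.append(np.max(np.abs(x - sigma)))
    x = feedback_step(A, eps, x, sigma)
final_error = np.max(np.abs(x - sigma))
```

With εi ∈ (0, aii) the matrix A − E is entrywise non-negative with row sums 1 − εi, so its induced infinity norm is the largest 1 − εi, which is below one. In this sketch the error in round t is bounded by that norm raised to the power t times the initial error, which is the kind of exponential rate the corollary describes.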
4.2 Random case
Under suitable random conditions on the trust matrix A and the learning rates, we can still have consensus. In this case, the learning rates and weights are independent and identically distributed across iterations. However, we need a condition to ensure convergence, namely that on average the learning rates are less than the self-belief weights. Since this condition holds only in expectation, a probabilistic statement, there is some leeway: the learning rates need not be strictly less than the self-belief aii at every time t.
Theorem 4.3.
Proof.
Note we do not require the stronger condition that the learning rates are strictly less than the self-belief weights for all t. Unlike the deterministic case, the random case allows considerable flexibility. Neither self-belief aii > 0 nor positive learning εi is required for all times. However, there must be some interaction and learning for beliefs to converge. As matrix products do not commute, fully expanding the recursion in any of the dynamics would produce long, unwieldy matrix products. Random matrix products and dynamics are an active area of research not only in mathematics but also in physics and control theory [73–78]. While the random case is certainly interesting, in this article our focus is on the first steps of modelling interaction and learning dynamics.
4.3 Consensus with an unknown leader
One criticism of model (4.2) is that feedback, even if it is not perfect, has to be learned. In practice, there might not be a helpful mechanism that provides feedback. An alternative is to have an unknown leader embedded in the set of traders. The agents are unsure who the leader is but, by averaging the opinions of other traders, they all arrive at the opinion of the leader. In Markov chain theory, the leader corresponds to an absorbing state. The leader guides the system to the true value. We assume that the identity of the leader is unknown to all agents.
Theorem 4.4.
Consider the opinion dynamics in (4.5) and assume that the associated matrix is substochastic and irreducible. Then consensus to the true volatility σ(K, T) is reached.
Proof.
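Because the row sum is strictly less than one for at least one i, and the matrix is substochastic and irreducible, its spectral radius is strictly less than one, see lemma 6.28 in [69]; it follows that the powers of the matrix converge to zero. Therefore, lim t→∞ et = 0 and the assertion follows. ▪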
Corollary 4.5.
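Let ‖·‖ denote some matrix norm under which the norm of the matrix in theorem 4.4 is strictly less than one (such a norm always exists because its spectral radius is strictly less than one under the conditions of theorem 4.4). Then consensus to the true volatility σ(K, T) is reached exponentially with the convergence rate given by this norm, for every agent i ∈ {1, …, n} and up to some positive constant.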
Proof.
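See lemma 5.6.10 in [72] on how to construct such a norm. Now consider the consensus error et defined in the proof of theorem 4.4, which evolves according to the difference equation (4.6). It follows that et is obtained by applying the (t − 1)th power of the matrix in (4.6) to e1, where e1 denotes the initial consensus error. Under the assumptions of theorem 4.4, the spectral radius of this matrix is strictly less than one. By lemma 5.6.10 in [72], this implies that there exists some matrix norm under which the norm of the matrix is also strictly less than one. We restate the error with norms and obtain a bound on the norm of et by the (t − 1)th power of this matrix norm times the norm of e1. Because all norms are equivalent in finite dimensional vector spaces (see ch. 5 in [72]), the same bound holds in any other norm up to some positive constant. As the matrix norm is strictly less than one, the norm of the consensus error converges to zero exponentially with rate given by this matrix norm. ▪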
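The leader mechanism can be sketched as follows, under stated assumptions. The leader is modelled as an agent who puts all weight on its own quote (a row of A with a single one on the diagonal) and who starts at the true volatility; the remaining agents average with fixed weights. The 4 × 4 matrix, the position of the leader and the opinions are hypothetical, and the block structure is a guess at the form of (4.5), not a restatement of it.

```python
import numpy as np

sigma = 0.20  # hypothetical true volatility, held by the leader throughout

# Row 0 is the leader: full weight on itself, none on anyone else.
# Only agent 1 listens to the leader directly; agents 2 and 3 learn indirectly.
A = np.array([[1.0, 0.0, 0.0, 0.0],
              [0.3, 0.4, 0.3, 0.0],
              [0.0, 0.3, 0.4, 0.3],
              [0.0, 0.0, 0.5, 0.5]])

# Follower-to-follower weights: substochastic (first row sums to 0.7) and
# irreducible (every follower is connected to every other through a chain).
followers = A[1:, 1:]
rho = np.max(np.abs(np.linalg.eigvals(followers)))  # spectral radius

x = np.array([sigma, 0.10, 0.30, 0.45])  # leader first, then follower opinions
for _ in range(500):
    x = A @ x
leader_error = np.max(np.abs(x - sigma))
```

The followers never need to know which row is the leader's: the fixed averaging weights are enough, because the spectral radius of the follower block is below one and the error among followers therefore dies out geometrically.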
5 Consensus (vectored agent dynamics)
In this section, we suppose that agents have beliefs over a range of strikes. Thus, each agent’s opinion of the volatility curve is a vector with each entry corresponding to a particular strike. Typically, in markets, options are quoted for At-The-Money (ATM) K = S0 and for two further strikes left of and right of the ATM level. Here, we examine the case of k strikes and n agents, i.e. each agent i now has k quotes for k different moneyness levels. In this configuration, the true volatility is a vector with k entries, one for each strike. See figure 1b.
5.1 Consensus with feedback
Theorem 5.1.
Consider the opinion dynamics in (5.2) and assume that εi ∈ (0, aii), i ∈ {1, …, n}. Then consensus to the true volatility vector is reached.
Proof.
Corollary 5.2.
Consensus to the true volatility vector is reached exponentially with an explicit convergence rate.
The proof of the above result is very similar to previous corollaries and is omitted.
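The role of Kronecker products in the multi-strike setting can be sketched in a few lines. The snippet assumes that every agent applies the same scalar feedback update independently at each strike, so that the stacked system uses A ⊗ Ik and E ⊗ Ik, with E = diag(ε1, …, εn) introduced here as notation. This is a guess at the structure of (5.2); the smile values, weights and learning rates are hypothetical.

```python
import numpy as np

n, k = 3, 5  # agents and strikes

A = np.array([[0.5, 0.3, 0.2],
              [0.2, 0.6, 0.2],
              [0.1, 0.4, 0.5]])
eps = np.array([0.05, 0.10, 0.02])                    # each eps_i in (0, a_ii)
smile = np.array([0.28, 0.23, 0.20, 0.22, 0.26])      # hypothetical true vols per strike

I_k = np.eye(k)
A_big = np.kron(A, I_k)                # agent i averages strike by strike
E_big = np.kron(np.diag(eps), I_k)     # same learning rate at every strike
target = np.kron(np.ones(n), smile)    # true smile repeated once per agent

rng = np.random.default_rng(0)
X = rng.uniform(0.05, 0.60, size=n * k)  # stacked initial quotes, agent by agent

for _ in range(1500):
    X = A_big @ X + E_big @ (target - X)

quotes = X.reshape(n, k)  # row i holds agent i's k quotes
```

Stacking the quotes agent by agent means that row i of `quotes` is agent i's view of the whole curve, and after enough rounds every row coincides with the assumed true smile.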
5.2 Consensus with an unknown leader
Theorem 5.3.
Consider the opinion dynamics in (5.4) and assume that the associated matrix is substochastic and irreducible. Then consensus to the true volatility vector is reached.
The proof of theorem 5.3 follows the same line of reasoning as the proof of theorem 4.4 and it is omitted here.
Corollary 5.4.
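Let ‖·‖ denote some matrix norm under which the norm of the matrix in theorem 5.3 is strictly less than one. Then consensus to the true volatility vector is reached exponentially with the convergence rate given by this norm, up to some positive constant.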
6 Numerical simulations
Figure 2. Evolution of the agents’ dynamics (4.2): (a) without learning, (b) with learning and εi satisfying the conditions of theorem 4.1, (c) with learning and εi not satisfying the conditions of theorem 4.1, and (d) evolution of the agents’ dynamics with a leader (4.5).
Figure 3. Evolution of the multi-dimensional agents’ dynamics with learning (5.2).
7 Arbitrage bounds
We have taken the true volatility parameter as exogenous to our models. Our only requirement is that there is no static arbitrage, by which we mean that the volatility quotes, once translated into option prices, do not allow a trader to lock in a profit by trading across the different strikes. Checking whether a volatility surface is indeed arbitrage-free is non-trivial; nevertheless, some sufficient conditions are well known [81–83]. As long as the volatility surface satisfies them, our analysis implies global stability towards an arbitrage-free smile.
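- Condition 1: (Call Spread) For 0 < K1 ≤ K2, the call price at strike K1 is at least the call price at strike K2.
- Condition 2: (Butterfly Spread) For 0 < K1 < K2 < K3, the call price is convex in the strike, so that a butterfly spread built on K1, K2 and K3 has a non-negative price.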
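A check of the two conditions on a finite grid of strikes can be sketched as below. The code uses the commonly stated form of the conditions (call prices non-increasing in strike, and a non-negative value for the butterfly with weights chosen so that the position is convexity-testing); this form is an assumption and may differ in detail from the statement in [81–83]. The function names and the sample prices are hypothetical, and converting volatility quotes into call prices is left outside the sketch.

```python
from typing import Sequence


def call_spread_ok(calls: Sequence[float], tol: float = 1e-12) -> bool:
    """Call prices, listed by increasing strike, must be non-increasing."""
    return all(calls[i] >= calls[i + 1] - tol for i in range(len(calls) - 1))


def butterfly_ok(strikes: Sequence[float], calls: Sequence[float], tol: float = 1e-12) -> bool:
    """Call prices must be convex in strike.

    Checking adjacent triples is enough on a discrete grid, since convexity on
    every adjacent triple implies convexity on every triple.
    """
    for i in range(len(strikes) - 2):
        k1, k2, k3 = strikes[i:i + 3]
        c1, c2, c3 = calls[i:i + 3]
        value = c1 - (k3 - k1) / (k3 - k2) * c2 + (k2 - k1) / (k3 - k2) * c3
        if value < -tol:
            return False
    return True


strikes = [90.0, 100.0, 110.0]
good_calls = [12.0, 5.0, 1.5]   # decreasing and convex in strike
bad_calls = [12.0, 8.0, 1.5]    # decreasing, but the middle price is too high

print(call_spread_ok(good_calls), butterfly_ok(strikes, good_calls))
print(call_spread_ok(bad_calls), butterfly_ok(strikes, bad_calls))
```

In the second example a trader could sell two of the middle calls and buy one of each wing for a net credit while holding a position whose payoff is never negative, which is the kind of static arbitrage the conditions rule out.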
8 Discussion
8.1 Future work
Social learning is an active area of research in many different fields. By combining aspects of social learning models with dynamical systems, we were able to develop insightful analysis for the volatility smile. This can be extended further. There are several immediate possibilities. Can the number of strikes be infinite? We restricted the models to a finite number of strikes: fixed k. In practical terms, at any given time, there are usually two strikes below and two strikes above the ATM level that are liquid. This means the corresponding quotes are visible or updated for five strikes. One way to circumvent this is to consider arbitrage-free volatility curves. But again, we are faced with the observational nature of our framework. A trader only observes a fixed number of strikes of his competitors. The issue of how to introduce heterogeneity in the volatility curves, which themselves emanate from specific pricing models, remains open.
The number of agents can also be infinite. Perhaps a propagation of chaos type of result could shed some light on how an individual trader interacts with the mean-field limit [89–91]. In this case, we lose the heterogeneity of beliefs and the behaviour we are trying to study would have a different implication. Moreover, considerable technical machinery is required [92,93]. We could study the pure limiting behaviour as t, n → ∞. In our current framework, this would have to be balanced with whether an individual can observe an infinite number of competitors. While the technical subtleties are not insurmountable, the modelling issues are more subjective.
The technical issues in random matrix products, briefly discussed in this paper, indicate that much more work needs to be done on the modelling and mathematical front. For example, the trust matrix A and the learning rates can be dependent, with correlation decreasing in time. Work in this direction has been addressed by Popescu & Vaidya [94].
8.2 Connection
Recently, there has been some rather interesting work at the intersection of computer science and option pricing. DeMarzo et al. [95] showed how to use efficient online trading algorithms to price the current value of financial instruments, deriving both upper and lower bounds. Moreover, Abernethy et al. [96,97] derived the BS price as the value of a sequential two-player zero-sum game. While these papers made an excellent start at bridging the gap between two different academic communities, namely mathematical finance and theoretical computer science, they do not address the reality of volatility smiles and trading. Our contribution can be viewed as making these connections more concrete. The smile itself is a conundrum and there have even been articles questioning whether it can be solved [98]. The traditional way from the ground up is to develop a stochastic process for the volatility and asset price, possibly introducing jumps or more diffusions through uncertainty [99,100]. Such models have been successfully developed, but the time is ripe to incorporate multi-agent models with arbitrage-free curves.
Introducing learning agents in stochastic differential equation models [101], such as the BS model, is an exciting proposition. Moreover, opinion dynamics as a subject on its own has been studied quite extensively. Recent references that present an expansive discussion in computer science are [8,102]. Econophysics is the right community to develop new models. After all, there is no attachment to utilities of players or stochastic volatility models so entrenched in the mathematical finance community. Free from these shackles, researchers can use a range of tools and techniques to build more sophisticated models. Moreover, there is no restriction or debate on continuous or discrete time. While our framework is discrete, continuous time could perhaps show a way forward to incorporate models from mathematical finance and financial economics [103–105]. Jarrow [106] makes the case for continuous time, arguing that today’s financial markets trade and update at high frequency.
In this paper, we introduce models of learning agents in the context of option trading. A key open question in this setting is how the market comes to a consensus about market volatility, which is reflected in derivative pricing through the BS formula. The framework we have established allows us to explore other areas. Thus far, we took the smile as an exogenous object, proving convergence to equilibrium beliefs. A natural step forward would be to look at the beliefs as probability measures, where each measure corresponds to a different option pricing model. Our learning models focus on interaction between agents. Actually, agents can be interpreted as algorithms. Each algorithm corresponds to a particular belief of a pricing model. Until now, the replication paradigm has led to very sophisticated models. The future may belong to deep hedging arguments [107]. Still, whether we consider models or algorithms, interaction will always be a topic of interest.
Data accessibility
This article has no additional data. Code for simulations is available within the Dryad Digital Repository: https://doi.org/10.5061/dryad.prr4xgxjg [108].
Authors' contributions
T.V. conceptualized the model. T.V. and C.M. formalized the mathematical framework. G.P. guided the work and aided the discussions and structuring of the manuscript. T.V. and C.M. wrote the manuscript.
Competing interests
We declare we have no competing interests.
Funding
T.V. acknowledges a SUTD Presidential fellowship. C.M. acknowledges the National Research Foundation (NRF), Prime Minister’s Office, Singapore, under its National Cybersecurity R&D Programme (Award no. NRF2014NCR-NCR001-40) and administered by the National Cybersecurity R&D Directorate. G.P. acknowledges AcRF Tier 2 grant nos. 2016-T2-1-170, PIE-SGP-AI-2020-01, NRF2019-NRF-ANR095 ALIAS grant and NRF2018 Fellowship NRF-NRFF2018-07.
Acknowledgements
The authors thank Ioannis Panageas, Ionel Popescu, Niels Nygaard and JM Schumacher for fruitful discussions.
Footnotes
Using the BS formula with a particular implied volatility, traders obtain a dollar value for the price.



