AI Code

Overview

For this project, I designed an AI engine for two reasons:

  1. To control the online demo; and

  2. To assist with card evaluation and discovery.

The first point should be obvious (even if the demo isn’t implemented yet); the second is more subtle: I want to make sure that all the cards in this game are balanced against one another, and that no single card is overpowered. (This can be evaluated by investigating the weights and win percentages under various strategies as the system plays itself and learns.)

Architecture

Each ~titans.ai.strategy.StandardStrategy object is a feedforward neural network that maps the game’s current state (described as the number of cards in each game ~titans.ai.enum.Zone) to the decision that should be made (e.g. awaken such-and-such card, or choose to do nothing). Each ~titans.ai.player.Player has multiple strategy objects: one for each type of decision they make (e.g. one for awakening cards and a separate one for playing cards).

When training a strategy, each player starts with randomly-initialized neural networks (or with an explicit ~titans.ai.strategy.RandomStrategy) and plays many games against another (virtual) player. During each game, each time a decision is made, the game state (neural network input) and the chosen decision (neural network output) are recorded. Additionally, at the completion of each game, each decision is labeled according to whether the player won or lost the game. These decisions can then be coalesced into a dataset: each game state is assigned a label for each decision, based upon that decision’s win percentage. (Of course, most decisions will be labeled with np.NaN, since they did not occur during the game.)
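A sketch of how recorded decisions might be coalesced into such a dataset, with a toy `NUM_CHOICES` and a hypothetical history layout (choice → win/loss outcomes per state):

```python
import numpy as np

# Hypothetical sketch: coalesce recorded decisions into (X, y) training data.
# NUM_CHOICES stands in for the real constant; outcomes are 1 = won, 0 = lost.
NUM_CHOICES = 3

history = {
    b"state-a": {0: [1, 1, 0], 2: [0]},  # choice -> outcomes across games
    b"state-b": {1: [1]},
}

X, y = [], []
for state, choices in history.items():
    labels = np.full(NUM_CHOICES, np.nan)  # unseen choices stay NaN
    for choice, outcomes in choices.items():
        labels[choice] = np.mean(outcomes)  # win percentage for this choice
    X.append(np.frombuffer(state, dtype=np.uint8))  # bytes -> numeric state
    y.append(labels)

y = np.vstack(y)  # one row of per-choice win rates per game state
```

Here `y[0]` holds the win rates for choices 0 and 2 of the first state, with the never-taken choice left as NaN.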

Using this setup, players can play many games against one another, with their strategies updated every fixed number of games. Players can thus learn strategies through bootstrapping: as one player learns, its opponent learns too, and after playing many games, they’ll (hopefully) be left with decent strategies.

Examples

Vs Random Player

As a proof-of-concept, a learning player was set up to play against a random player. After ~800 games, the learning player won >90% of the time.

Code and results can be found here.

Vs Previous-Best

Work-in-progress

Card Discovery

Work-in-progress

Bonus: Optimization

A batch decision-making process was implemented to accelerate training. With it, playtime was accelerated approximately 10x. Results can be found here.

Future Work

The biggest additions I hope to make to the AI package include:

  1. Deploying this as part of the demo. I’ll stand up a top-performing set of strategies in a FastAPI container (similar to the website API bot that’s working now), and have it accept (likely) JSON input while returning integer decisions and/or probability distributions.

  2. Getting card discovery / weighting working. I think a straightforward way of showing card strength would be to take a highly-trained strategy, and compare win percentages between two players, where one player doesn’t have access to any given card. If losing access to a single card greatly decreases win percentage, that’d be a sign of an overpowered card.

  3. Implementing the remaining cards in code.
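The restriction-based comparison from item 2 can be sketched as a toy simulation, with cards, strengths, and the win model all hypothetical:

```python
import random

random.seed(0)

# Toy card-discovery sketch: compare win rates when one player loses
# access to a given card. Strengths are invented for illustration only.
CARDS = ["MONK", "WIZARD", "SMOLDERING_DRAGON"]
STRENGTH = {"MONK": 1.0, "WIZARD": 1.2, "SMOLDERING_DRAGON": 3.0}

def win_rate(restricted=None, games=4000):
    allowed = [c for c in CARDS if c != restricted]
    wins = 0
    for _ in range(games):
        # stand-in for a full game: each side fields one random allowed card
        ours, theirs = random.choice(allowed), random.choice(CARDS)
        p = STRENGTH[ours] / (STRENGTH[ours] + STRENGTH[theirs])
        wins += random.random() < p
    return wins / games

baseline = win_rate()
drops = {card: baseline - win_rate(card) for card in CARDS}
# a large drop when a card is removed suggests that card is overpowered
```

In this toy model, restricting the strongest card produces by far the largest drop in win rate, which is exactly the signal described above.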

Tests

Tests can be invoked by running

pytest tests/titans/ai

(We recommend executing ai tests separately from other tests.)

API Reference

Strategies

Strategies are the basic “thinking” (and learning!) units of the game. Think of them as sklearn-compliant classes for making decisions. The base (abstract) class is Strategy, which is inherited by RandomStrategy and StandardStrategy.
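A sketch of that sklearn-style interface, using a hypothetical stand-in class (the real classes live in titans.ai.strategy; the sizes mirror the constants documented below):

```python
import numpy as np

NUM_FEATURES, NUM_CHOICES = 148, 21  # sizes assumed from the constants section

class SketchRandomStrategy:
    """Hypothetical stand-in showing the fit/predict interface."""

    def __init__(self, random_state=None):
        self._rng = np.random.default_rng(random_state)

    def fit(self, X, y, /):
        return self  # nothing to learn for random play

    def predict(self, X, /):
        # one predicted "win probability" per possible choice, per state
        return self._rng.random((X.shape[0], NUM_CHOICES))

states = np.zeros((5, NUM_FEATURES))
scores = SketchRandomStrategy(random_state=42).fit(states, None).predict(states)
choices = scores.argmax(axis=1)  # pick the top-ranked decision per state
```

The sklearn-style contract is the important part: `fit` returns the calling instance (so calls chain), and `predict` ranks every possible choice rather than returning a single decision.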

RandomStrategy

class titans.ai.strategy.RandomStrategy(random_state: int = None)[source]

Strategy for making random decisions

Parameters:
random_state: int, optional, default=None

random seed

Methods

fit(X, y, /)

Fit model

predict(X, /)

Predict best course of action

predict(X: ndarray, /) ndarray[source]

Predict best course of action

Parameters:
X: np.ndarray

data to predict for

Returns:
np.ndarray

predictions, returned as the predicted probability of winning given each possible choice

StandardStrategy

class titans.ai.strategy.StandardStrategy(*, scale: bool = True, **kwargs)[source]

Standard strategy for making decisions

This class scales data (via z-score normalization) and then uses an MLP (a feedforward regression ANN) to predict the best courses of action (given the player’s / game’s state).

This model is designed so that you can resume fitting at any point: if you call the fit function multiple times, the ANN will simply resume fitting from where it left off. (The scaler, however, is static: it is configured the first time you call fit. This ensures that new data is on the same scale as old data.)
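That freeze-on-first-fit behavior can be sketched with a hypothetical stand-in class (the real StandardStrategy also wraps an MLP):

```python
import numpy as np

# Sketch of "the scaler is configured on the first fit only".
class SketchScaledModel:
    def __init__(self):
        self._mean = self._std = None

    def fit(self, X, y, /):
        if self._mean is None:  # freeze scaling parameters on first fit
            self._mean = X.mean(axis=0)
            self._std = X.std(axis=0) + 1e-9
        X_scaled = (X - self._mean) / self._std
        # ...resume fitting the ANN on (X_scaled, y) here...
        return self

model = SketchScaledModel()
model.fit(np.array([[0.0], [10.0]]), None)    # scaler configured here
model.fit(np.array([[90.0], [110.0]]), None)  # scaled with the ORIGINAL stats
```

Because the second call reuses the first call’s mean and standard deviation, new data lands on the same scale the network was originally trained on.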

Parameters:
scale: bool, optional, default=True

use sklearn.preprocessing.StandardScaler to scale data

**kwargs: Any

passed to RandomStrategy on __init__()

Attributes:
restricted: list[int] | None

If set, then don’t predict these values. This must be manually set, and is provided for exploring the relative values of cards
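The restricted attribute could plausibly be applied as a mask at predict time; a hypothetical sketch:

```python
import numpy as np

# Hypothetical sketch of masking out restricted choices at predict time.
scores = np.array([0.2, 0.9, 0.4, 0.7])  # predicted win rate per choice
restricted = [1]                         # e.g. the card being evaluated
masked = scores.copy()
masked[restricted] = -np.inf             # restricted choices can never win
choice = int(masked.argmax())            # best remaining choice
```

Even though choice 1 scores highest, the mask forces the player onto the best non-restricted option, which is what makes relative card valuation possible.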

Methods

fit(X, y, /)

Fit model

predict(X, /)

Predict best course of action

fit(X: ndarray, y: ndarray, /) Strategy[source]

Fit model

Parameters:
X: np.ndarray

data to use for fitting

y: np.ndarray

labels

Returns:
Strategy

calling instance

predict(X: ndarray, /) ndarray[source]

Predict best course of action

Parameters:
X: np.ndarray

data to predict for

Returns:
np.ndarray

predictions, returned as the predicted probability of winning given each possible choice

Strategy

class titans.ai.strategy.Strategy[source]

Strategy abstract base class

Methods

fit(X, y, /)

Fit model

predict(X, /)

Predict best course of action

fit(X: ndarray, y: ndarray, /) Strategy[source]

Fit model

Parameters:
X: np.ndarray

data to use for fitting

y: np.ndarray

labels

Returns:
Strategy

calling instance

abstract predict(X: ndarray, /) ndarray[source]

Predict best course of action

Parameters:
X: np.ndarray

data to predict for

Returns:
np.ndarray

predictions, returned as the predicted probability of winning given each possible choice

Basic game constructs

Here, we list the basic game constructs that are strung together to simulate games. Card objects are used to represent actual in-game cards. Player objects use decks of cards to play against each other. Game objects orchestrate two players playing against one another. Trainer objects take players through many games to learn how to play the game well.

Card

class titans.ai.card.Card(name: Name, /)[source]

Card, with all properties

This class both contains the logic to instantiate cards and holds all card properties as a player uses cards in a game.

Parameters:
name: Name

Name of card

Attributes:
abilities: dict[Ability, int]

card abilities (count of each ability)

cost: int

amount of energy required to awaken

element: Element

card element

name: Name

card name

power: int

card power

species: Species

card species

Player

class titans.ai.player.Player(identity: Identity, /, cards: list[titans.ai.card.Card], strategies: dict[titans.ai.enum.Action, titans.ai.strategy.Strategy] | None = None, *, random_state: int | None = None, temperature: float | None = None)[source]

Player class

This class performs all of the core operations of each player. These operations are meant to be relatively low-level: the logic of walking through an actual game is located in the titans.ai.game.Game class.

Parameters:
identity: Identity

player’s identity

cards: list[Card]

cards used in this game

strategies: dict[Action, Strategy] | None, optional, default=None

strategies for performing actions. If None, random choices will be used

random_state: int | None, optional, default=None

player’s random seed

temperature: float | None, optional, default=None

randomness to add into decision making. This is important for making sure strategies don’t get stuck in local minima during training.

Attributes:
cards: dict[Zone, list[Card]]

player’s cards, in each zone

identity: Identity

player’s identity

ritual_piles: list[Card]

cards in the shared ritual piles. This is harmonized between players in Player.handshake()

strategies: dict[Action, Strategy]

strategies for performing actions

temples: int

number of temples under player’s control

Methods

awaken_card()

Awaken a card from the ritual piles

battle_opponent()

Battle opponent

draw_cards([count])

Draw cards

freeze_state()

Freeze state (for simultaneous actions)

get_energy()

Get total energy from all cards in play

get_power()

Get total power from all cards in play

get_state()

Player's state

handshake(opponent, /)

Set this player's opponent (and vice-versa)

play_cards([count])

Play one or more cards

shuffle_cards()

Shuffle all cards together

awaken_card() tuple[titans.ai.card.Card | None, int][source]

Awaken a card from the ritual piles

The awakened card is added to your discard pile. You can always choose to not awaken.

Returns:
Card | None

awakened card. This is automatically added to the discard zone but is returned here for easy debugging / logging. None is returned if we choose to not awaken.

int

decision made. This is the index of the decision matrix that was ultimately chosen.

battle_opponent() Identity | None[source]

Battle opponent

This should NOT be called by each player, but only once per battle

Returns:
Identity | None

winner of the battle

draw_cards(count: int = 1, /) list[titans.ai.card.Card][source]

Draw cards

Parameters:
count: int

number of cards to draw

Returns:
list[Card]

the cards drawn. These are added to the hand zone, but are also returned for easy debugging / logging.

freeze_state() Generator[ndarray, None, None][source]

Freeze state (for simultaneous actions)

This causes self.get_state() to return the state as it was when you called self.freeze_state().

get_energy() int[source]

Get total energy from all cards in play

Returns:
int

available energy

get_power() int[source]

Get total power from all cards in play

Returns:
int

total power

get_state() ndarray[source]

Player’s state

This is the numeric state that is fed into the ML model as the training data

Returns:
np.ndarray

state, from this instance’s point-of-view

handshake(opponent: Player, /) None[source]

Set this player’s opponent (and vice-versa)

This sets self.opponent, which is used when getting the game state

You must always handshake before starting a game.

Parameters:
opponent: Player

this instance’s competitor for the game

play_cards(count: int = 1, /) tuple[list[titans.ai.card.Card], list[int]][source]

Play one or more cards

Parameters:
count: int, optional, default=1

how many cards to play

Returns:
list[Card]

cards played. Played cards are automatically added to the play zone, but are also returned here for easy debugging / logging

list[int]

decisions made (i.e. the indices of the decision matrix that were executed)

shuffle_cards() None[source]

Shuffle all cards together

Game

class titans.ai.game.Game(player_kwargs: dict[str, Any] | dict[titans.ai.enum.Identity, dict[str, Any]] = None, /, *, turn_limit: int = 1000)[source]

Game Class

This class contains all the logic to play a full game.

Parameters:
player_kwargs: dict[str, Any] | dict[Identity, dict[str, Any]]

These dictionaries are unpacked as kwargs to initialize the players. If you provide a single dictionary keyed by strings, its values will be unpacked to initialize both players. If you instead provide a dictionary mapping each identity to its own dictionary of kwargs, each player will be initialized with the kwargs provided for that identity.

turn_limit: int, optional, default=1000

max number of turns before a draw is declared
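The two accepted player_kwargs shapes, with identities stood in by plain strings (the real keys are Identity members), might look like:

```python
# Hypothetical sketch of the two accepted player_kwargs shapes.
shared = {"random_state": 0}  # one kwargs dict, applied to both players

per_player = {                # or: one kwargs dict per identity
    "MIKE": {"temperature": 0.5},
    "BRYAN": {"temperature": 1.0},
}

def kwargs_for(player_kwargs, identity):
    # hypothetical resolution logic: a per-identity entry wins if present
    value = player_kwargs.get(identity)
    return value if isinstance(value, dict) else player_kwargs
```

With the shared form, both players receive the same kwargs; with the per-player form, each identity gets its own entry.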

Attributes:
cards: list[Card]

cards in the game

history: dict[bytes, dict[Action, dict[Identity, list[int]]]]

here, the history of each player’s state, and the choices they made given that state, are recorded. This variable contains three nested dictionaries:

  1. The top-level dictionary is indexed by each game state at which a decision was made (converted from np.ndarray -> bytes, for hashability)

  2. The mid-level dictionary is indexed by each action made at that decision point

  3. The bottom-level dictionary is indexed by the player that made that action, and maps to the choice(s) that player made at that decision point

players: list[Player]

players playing the game

transcript: str

human-readable transcript (log) of the game

winner: Identity | None

winner of game
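A hypothetical walk over the three nested levels of the history attribute (action and player keys stood in by strings; the real keys are enum members):

```python
import numpy as np

# Sketch of Game.history's three nested levels.
state = np.array([3, 1, 0, 2], dtype=np.uint8)
history = {
    state.tobytes(): {          # level 1: game state, as bytes
        "AWAKEN": {             # level 2: action taken at that state
            "MIKE": [4],        # level 3: player -> choice(s) made
            "BRYAN": [20],
        },
    },
}

for state_bytes, actions in history.items():
    recovered = np.frombuffer(state_bytes, dtype=np.uint8)  # bytes -> state
    for action, players in actions.items():
        for player, choices in players.items():
            pass  # e.g. label `choices` by whether `player` won the game
```

Storing states as bytes keeps them hashable as dictionary keys, while np.frombuffer recovers the original numeric state exactly.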

Methods

parallel_play()

Play game, returning a generator that pauses at each decision point

play()

Play game end-to-end

parallel_play() Generator[dict[titans.ai.enum.Identity, numpy.ndarray] | None, dict[titans.ai.enum.Identity, dict[titans.ai.enum.Action, numpy.ndarray]], None][source]

Play game, returning a generator that pauses at each decision point

This lets an operator make decisions simultaneously across many different games. This is important because the slowest part of processing is using the ANNs to make decisions, so batching decisions across games speeds up simulations considerably.

Returns:
Generator

This generator will play the game to each decision point, and will then expect you to send in the decision_matrices that rank each possible choice.

Yields: dict[Identity, np.ndarray] | None

dictionary mapping from each player to their current state. If None is yielded, the game is over.

Send: dict[Identity, dict[Action, np.ndarray]]

dictionary mapping each player to a dictionary that maps each action to that player’s choices for that action, wherein each possible choice is ranked by its relative value (highest value = perform that decision)

Returns: None
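The yield/send protocol can be exercised against a toy generator that mimics this contract (names, shapes, and the number of decision points are all hypothetical):

```python
import numpy as np

# Toy generator mimicking the parallel_play protocol: it yields per-player
# states, expects decision matrices via send(), and yields None at game end.
def toy_parallel_play(decision_points=3):
    for _ in range(decision_points):
        matrices = yield {"MIKE": np.zeros(4), "BRYAN": np.ones(4)}
        assert matrices is not None  # operator must send decisions back
    yield None  # game over

game = toy_parallel_play()
states = next(game)  # run to the first decision point
decisions_made = 0
while states is not None:
    # rank all 21 choices for each player (random stand-in for the ANNs)
    matrices = {p: {"PLAY": np.random.rand(21)} for p in states}
    states = game.send(matrices)
    decisions_made += 1
```

An operator driving many such generators at once can collect all the yielded states, run one batched ANN prediction, and send each game its slice of the results, which is where the speedup comes from.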

play() Game[source]

Play game end-to-end

Returns:
Game

calling instance

Trainer

class titans.ai.trainer.Trainer(*, baseline: bool = False, epochs: int = 10, games_per_epoch: int = 100, parallel: bool = True, patience: int = 3, retention: int = 3)[source]

Standard trainer

This class trains up strategies for playing Titans of Eden well. It works by executing a number of training epochs; in each epoch, a batch of games is played, and then data from those games is used to train better and better strategies.

During each epoch, the current strategy is played against the previous-best strategy to generate new training data.

Early stopping, weight restoration, and patience are all used to ensure the best strategy dominates. Strategies are judged on (1) how well they play against the baseline, and (2) whether they outperform previous-best strategies.

A temperature parameter is used and varied across games to ensure that strategies don’t get stuck in local minima.

Parameters:
baseline: bool, optional, default=False

if True, only play against random-card-choosing strategies (instead of playing against latest-gen-minus-one)

epochs: int, optional, default=10

number of training epochs to execute

games_per_epoch: int, optional, default=100

number of games to play each epoch

parallel: bool, optional, default=True

play games in parallel, performing batch decision making at each decision point. This is much faster than making decisions one-at-a-time.
patience: int, optional, default=3

number of epochs to wait for improvement before early-stopping

retention: int, optional, default=3

when training, keep game histories across this many epochs to train on. (i.e. training data from games more than retention epochs ago will be forgotten.)

Attributes:
history: deque[dict[bytes, dict[Action, dict[bool, list[int]]]]]

here, the history of winning and losing players is recorded. This variable maps every state to every choice made given that state, recording whether the player that made that choice ultimately won or lost the game.

This is saved as a circular buffer (deque), so that the Trainer can retain the histories from the most recent N epochs.

strategies: dict[Action, Strategy]

Strategies trained by this class

Methods

discover()

Discover best cards

train()

Train network

discover() ndarray[source]

Discover best cards

This function uses this trainer’s strategy to weight each card according to its relative value.

Returns:
np.ndarray of shape (len(Name),) and dtype float

weights (relative value) of each card

train() Self[source]

Train network

All parameters for training are set in __init__.
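The retention behavior described for Trainer can be sketched with a bounded deque:

```python
from collections import deque

# Sketch of retention: histories live in a bounded deque, so epochs older
# than `retention` are forgotten automatically as new ones are appended.
retention = 3
histories = deque(maxlen=retention)
for epoch in range(5):
    histories.append({"epoch": epoch})  # one game-history dict per epoch

kept = [h["epoch"] for h in histories]  # only the most recent epochs remain
```

After five epochs with retention=3, only epochs 2 through 4 survive; training data from earlier epochs has been discarded.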

Constants and enums

Here, we list all the constants and enums that are available for easy-to-read code references. Please note that most enum members are not themselves documented, as we assume they are self-explanatory to anyone familiar with the game.

The constants are mostly used internally for sizing neural networks.

Most enums here do not resolve to integers (without explicit casting), which helps catch errors where enums are mixed up or supplied in the wrong order.
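A stand-in mirroring this enum style (a plain Enum rather than IntEnum, so members never silently compare equal to ints):

```python
from enum import Enum

# Hypothetical stand-in mirroring titans.ai.enum.Zone.
class Zone(Enum):
    DECK = 0
    DISCARD = 1
    HAND = 2
    PLAY = 3

silently_mixed = (Zone.DECK == 0)  # False: a caught bug, not a wrong answer
index = Zone.HAND.value            # explicit cast when an int is needed
```

Passing a raw 0 where a Zone is expected thus fails loudly instead of being misinterpreted, which is the error-catching benefit described above.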

NUM_CHOICES

titans.ai.constants.NUM_CHOICES: int = 21

Number of choices for each action

NUM_FEATURES

titans.ai.constants.NUM_FEATURES: int = 148

Size of player states, fed as input to ML models

Ability

class titans.ai.enum.Ability(value, names=None, *, module=None, qualname=None, type=None, start=1, boundary=None)[source]

Card Abilities

BOLSTER_FIRE = 0
BOLSTER_ICE = 1
BOLSTER_RIVALS = 2
BOLSTER_ROCK = 3
BOLSTER_STORM = 4
DISCARD = 5
DRAW = 6
ENERGY = 7
ENERGY_ARC = 8
FLASH = 9
HAUNT = 10
PROTECT = 11
PURIFY = 12
SACRIFICE = 13
SUBSTITUTE = 14
SUBVERT_CAVE_IN = 15
SUBVERT_HARMLESS = 16
SUBVERT_MINDLESS = 17
SUBVERT_TRAITOROUS = 18
SUMMON = 19

Action

class titans.ai.enum.Action(value, names=None, *, module=None, qualname=None, type=None, start=1, boundary=None)[source]

Various actions (one for each strategy)

AWAKEN = 0
PLAY = 1

Element

class titans.ai.enum.Element(value, names=None, *, module=None, qualname=None, type=None, start=1, boundary=None)[source]

Card Elements

DESERT = 1
FIRE = 3
FOREST = 0
ICE = 4
ROCK = 5
STORM = 2

Identity

class titans.ai.enum.Identity(value, names=None, *, module=None, qualname=None, type=None, start=1, boundary=None)[source]

Player names

BRYAN = 1
MIKE = 0

Name

class titans.ai.enum.Name(value, names=None, *, module=None, qualname=None, type=None, start=1, boundary=None)[source]

Card Names

AKARI_TIMELESS_FIGHTER = 16
AURORA_DRACO = 6
CAVERNS_DEFENDER = 18
FINAL_JUDGMENT = 11
FROSTBREATH = 14
GHOST = 3
HELL_FROZEN_OVER = 15
JACE_WINTERS_FIRSTBORN = 12
LIVING_VOLCANO = 9
MADNESS_OF_A_THOUSAND_STARS = 7
MONK = 0
NIKOLAI_THE_CURSED = 4
RETURN_OF_THE_FROST_GIANTS = 13
SMOLDERING_DRAGON = 10
SPINE_SPLITTER = 17
TRAVELER = 2
WHAT_LIES_BELOW = 19
WINDS_HOWL = 5
WIZARD = 1
ZODIAC_THE_ETERNAL = 8

Species

class titans.ai.enum.Species(value, names=None, *, module=None, qualname=None, type=None, start=1, boundary=None)[source]

Card Species

BEAST = 2
DRAGON = 3
DWELLER = 0
TITAN = 4
WARRIOR = 1

Zone

class titans.ai.enum.Zone(value, names=None, *, module=None, qualname=None, type=None, start=1, boundary=None)[source]

Different zones a player’s cards can be in

DECK = 0
DISCARD = 1
HAND = 2
PLAY = 3