AI Code

Overview

For this project, I designed an AI engine for two reasons:

  1. To control the online demo; and

  2. To assist with card evaluation and discovery.

The first point should be obvious (even if the demo isn’t implemented yet); the second is more subtle: I want to make sure that all the cards in this game are balanced against one another, and that no single card is overpowered. (This can be evaluated by investigating the weights and win percentages under various strategies as the system plays itself and learns.)

Architecture

Each ~titans.ai.strategy.StandardStrategy object is a feedforward neural network that maps the game’s current state (described as the number of cards in each game ~titans.ai.enum.Zone) to the decision that should be made (e.g. awaken such-and-such card, or choose to do nothing). Each ~titans.ai.player.Player has multiple strategy objects: one for each type of decision they make (e.g. one for awakening cards and a separate one for playing cards).

When training a strategy, each player starts with randomly-initialized neural networks (or with an explicit ~titans.ai.strategy.RandomStrategy) and plays many games against another (virtual) player. During each game, each time a decision is made, the game state (neural network input) and the chosen decision (neural network output) are recorded. Additionally, at the completion of each game, each decision is labeled according to whether the player won or lost the game. These decisions can then be coalesced into a dataset: each game state is assigned a label for each decision, based upon that decision’s win percentage. (Of course, most decisions will be labeled with np.NaN, since they did not occur during the game.)
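A sketch of how recorded decisions might be coalesced into such a dataset, with a toy `NUM_CHOICES` and a hypothetical history layout (choice → win/loss outcomes per state):

```python
import numpy as np

# Hypothetical sketch: coalesce recorded decisions into (X, y) training data.
# NUM_CHOICES stands in for the real constant; outcomes are 1 = won, 0 = lost.
NUM_CHOICES = 3

history = {
    b"state-a": {0: [1, 1, 0], 2: [0]},  # choice -> outcomes across games
    b"state-b": {1: [1]},
}

X, y = [], []
for state, choices in history.items():
    labels = np.full(NUM_CHOICES, np.nan)  # unseen choices stay NaN
    for choice, outcomes in choices.items():
        labels[choice] = np.mean(outcomes)  # win percentage for this choice
    X.append(np.frombuffer(state, dtype=np.uint8))  # bytes -> numeric state
    y.append(labels)

y = np.vstack(y)  # one row of per-choice win rates per game state
```

Here `y[0]` holds the win rates for choices 0 and 2 of the first state, with the never-taken choice left as NaN.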

Using this setup, players can play many games against one another, with their strategies updated every fixed number of games. Players can thus learn strategies through bootstrapping: as one player learns, its opponent learns too, and after playing many games, they’ll (hopefully) be left with decent strategies.

Examples

Vs Random Player

As a proof-of-concept, a learning player was set up to play against a random player. After ~800 games, the learning player won >90% of the time.

Code and results can be found here.

Vs Previous-Best

Work-in-progress

Card Discovery

Work-in-progress

Bonus: Optimization

A batch decision-making process was implemented to accelerate training. With it, playtime was accelerated approximately 10x. Results can be found here.

Future Work

The biggest additions I hope to make to the AI package include:

  1. Deploying this as part of the demo. I’ll stand up a top-performing set of strategies in a FastAPI container (similar to the website API bot that’s working now), and have it accept (likely) JSON input while returning integer decisions and/or probability distributions.

  2. Getting card discovery / weighting working. I think a straightforward way of showing card strength would be to take a highly-trained strategy, and compare win percentages between two players, where one player doesn’t have access to any given card. If losing access to a single card greatly decreases win percentage, that’d be a sign of an overpowered card.

  3. Implementing the remaining cards in code.
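The restriction-based comparison from item 2 can be sketched as a toy simulation, with cards, strengths, and the win model all hypothetical:

```python
import random

random.seed(0)

# Toy card-discovery sketch: compare win rates when one player loses
# access to a given card. Strengths are invented for illustration only.
CARDS = ["MONK", "WIZARD", "SMOLDERING_DRAGON"]
STRENGTH = {"MONK": 1.0, "WIZARD": 1.2, "SMOLDERING_DRAGON": 3.0}

def win_rate(restricted=None, games=4000):
    allowed = [c for c in CARDS if c != restricted]
    wins = 0
    for _ in range(games):
        # stand-in for a full game: each side fields one random allowed card
        ours, theirs = random.choice(allowed), random.choice(CARDS)
        p = STRENGTH[ours] / (STRENGTH[ours] + STRENGTH[theirs])
        wins += random.random() < p
    return wins / games

baseline = win_rate()
drops = {card: baseline - win_rate(card) for card in CARDS}
# a large drop when a card is removed suggests that card is overpowered
```

In this toy model, restricting the strongest card produces by far the largest drop in win rate, which is exactly the signal described above.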

Tests

Tests can be invoked by running

pytest tests/titans/ai

(We recommend executing ai tests separately from other tests.)

API Reference

Strategies

Strategies are the basic “thinking” (and learning!) units of the game. Think of them as sklearn-compliant classes for making decisions. The base (abstract) class is Strategy, which is inherited by RandomStrategy and StandardStrategy.
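A sketch of that sklearn-style interface, using a hypothetical stand-in class (the real classes live in titans.ai.strategy; the sizes mirror the constants documented below):

```python
import numpy as np

NUM_FEATURES, NUM_CHOICES = 148, 21  # sizes assumed from the constants section

class SketchRandomStrategy:
    """Hypothetical stand-in showing the fit/predict interface."""

    def __init__(self, random_state=None):
        self._rng = np.random.default_rng(random_state)

    def fit(self, X, y, /):
        return self  # nothing to learn for random play

    def predict(self, X, /):
        # one predicted "win probability" per possible choice, per state
        return self._rng.random((X.shape[0], NUM_CHOICES))

states = np.zeros((5, NUM_FEATURES))
scores = SketchRandomStrategy(random_state=42).fit(states, None).predict(states)
choices = scores.argmax(axis=1)  # pick the top-ranked decision per state
```

The sklearn-style contract is the important part: `fit` returns the calling instance (so calls chain), and `predict` ranks every possible choice rather than returning a single decision.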

RandomStrategy

class titans.ai.strategy.RandomStrategy(random_state: int = None)[source]

Strategy for making random decisions

Parameters:
random_state: int, optional, default=None

random seed

Methods

fit(X, y, /)

Fit model

predict(X, /)

Predict best course of action

predict(X: ndarray, /) ndarray[source]

Predict best course of action

Parameters:
X: np.ndarray

data to predict for

Returns:
np.ndarray

predictions, returned as the predicted probability of winning given each possible choice

StandardStrategy

class titans.ai.strategy.StandardStrategy(*, scale: bool = True, **kwargs)[source]

Standard strategy for making decisions

This class scales data (via z-score normalization) and then uses an MLP (a feedforward regression ANN) to predict the best courses of action (given the player’s / game’s state).

This model is designed so that you can resume fitting at any point: if you call the fit function multiple times, the ANN will simply resume fitting from where it left off. (The scaler, however, is static: it is configured the first time you call fit. This ensures that new data is on the same scale as old data.)
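That freeze-on-first-fit behavior can be sketched with a hypothetical stand-in class (the real StandardStrategy also wraps an MLP):

```python
import numpy as np

# Sketch of "the scaler is configured on the first fit only".
class SketchScaledModel:
    def __init__(self):
        self._mean = self._std = None

    def fit(self, X, y, /):
        if self._mean is None:  # freeze scaling parameters on first fit
            self._mean = X.mean(axis=0)
            self._std = X.std(axis=0) + 1e-9
        X_scaled = (X - self._mean) / self._std
        # ...resume fitting the ANN on (X_scaled, y) here...
        return self

model = SketchScaledModel()
model.fit(np.array([[0.0], [10.0]]), None)    # scaler configured here
model.fit(np.array([[90.0], [110.0]]), None)  # scaled with the ORIGINAL stats
```

Because the second call reuses the first call’s mean and standard deviation, new data lands on the same scale the network was originally trained on.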

Parameters:
scale: bool, optional, default=True

use sklearn.preprocessing.StandardScaler to scale data

**kwargs: Any

passed to RandomStrategy on __init__()

Attributes:
restricted: list[int] | None

If set, then don’t predict these values. This must be manually set, and is provided for exploring the relative values of cards
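The restricted attribute could plausibly be applied as a mask at predict time; a hypothetical sketch:

```python
import numpy as np

# Hypothetical sketch of masking out restricted choices at predict time.
scores = np.array([0.2, 0.9, 0.4, 0.7])  # predicted win rate per choice
restricted = [1]                         # e.g. the card being evaluated
masked = scores.copy()
masked[restricted] = -np.inf             # restricted choices can never win
choice = int(masked.argmax())            # best remaining choice
```

Even though choice 1 scores highest, the mask forces the player onto the best non-restricted option, which is what makes relative card valuation possible.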

Methods

fit(X, y, /)

Fit model

predict(X, /)

Predict best course of action

fit(X: ndarray, y: ndarray, /) Strategy[source]

Fit model

Parameters:
X: np.ndarray

data to use for fitting

y: np.ndarray

labels

Returns:
Strategy

calling instance

predict(X: ndarray, /) ndarray[source]

Predict best course of action

Parameters:
X: np.ndarray

data to predict for

Returns:
np.ndarray

predictions, returned as the predicted probability of winning given each possible choice

Strategy

class titans.ai.strategy.Strategy[source]

Strategy abstract base class

Methods

fit(X, y, /)

Fit model

predict(X, /)

Predict best course of action

fit(X: ndarray, y: ndarray, /) Strategy[source]

Fit model

Parameters:
X: np.ndarray

data to use for fitting

y: np.ndarray

labels

Returns:
Strategy

calling instance

abstract predict(X: ndarray, /) ndarray[source]

Predict best course of action

Parameters:
X: np.ndarray

data to predict for

Returns:
np.ndarray

predictions, returned as the predicted probability of winning given each possible choice

Basic game constructs

Here, we list the basic game constructs that are strung together to simulate games. Card objects are used to represent actual in-game cards. Player objects use decks of cards to play against each other. Game objects orchestrate two players playing against one another. Trainer objects take players through many games to learn how to play the game well.

Card

class titans.ai.card.Card(name: Name, /)[source]

Card, with all properties

This class both contains the logic to instantiate cards and holds all card properties as a player uses cards in a game.

Parameters:
name: Name

Name of card

Attributes:
abilities: dict[Ability, int]

card abilities (count of each ability)

cost: int

amount of energy required to awaken

element: Element

card element

name: Name

card name

power: int

card power

species: Species

card species

Player

class titans.ai.player.Player(identity: Identity, /, cards: list[titans.ai.card.Card], strategies: dict[titans.ai.enum.Action, titans.ai.strategy.Strategy] | None = None, *, random_state: int | None = None, temperature: float | None = None)[source]

Player class

This class performs all of the core operations of each player. These operations are meant to be relatively low-level: the logic of walking through an actual game is located in the titans.ai.game.Game class.

Parameters:
identity: Identity

player’s identity

cards: list[Card]

cards used in this game

strategies: dict[Action, Strategy] | None, optional, default=None

strategies for performing actions. If None, random choices will be used

random_state: int | None, optional, default=None

player’s random seed

temperature: float | None, optional, default=None

randomness to add into decision making. This is important for making sure strategies don’t get stuck in local minima during training.

Attributes:
cards: dict[Zone, list[Card]]

player’s cards, in each zone

identity: Identity

player’s identity

ritual_piles: list[Card]

cards in the shared ritual piles. This is harmonized between players in Player.handshake()

strategies: dict[Action, Strategy]

strategies for performing actions

temples: int

number of temples under player’s control

Methods

awaken_card()

Awaken a card from the ritual piles

battle_opponent()

Battle opponent

draw_cards([count])

Draw cards

freeze_state()

Freeze state (for simultaneous actions)

get_energy()

Get total energy from all cards in play

get_power()

Get total power from all cards in play

get_state()

Player's state

handshake(opponent, /)

Set this player's opponent (and vice-versa)

play_cards([count])

Play one or more cards

shuffle_cards()

Shuffle all cards together

awaken_card() tuple[titans.ai.card.Card | None, int][source]

Awaken a card from the ritual piles

The awakened card is added to your discard pile. You can always choose to not awaken.

Returns:
Card | None

awakened card. This is automatically added to the discard zone but is returned here for easy debugging / logging. None is returned if we choose to not awaken.

int

decision made. This is the index of the decision matrix that was ultimately chosen.

battle_opponent() Identity | None[source]

Battle opponent

This should NOT be called by each player, but only once per battle

Returns:
Identity | None

winner of the battle

draw_cards(count: int = 1, /) list[titans.ai.card.Card][source]

Draw cards

Parameters:
count: int

number of cards to draw

Returns:
list[Card]

the cards drawn. These are added to the hand zone, but are also returned for easy debugging / logging.

freeze_state() Generator[ndarray, None, None][source]

Freeze state (for simultaneous actions)

This causes self.get_state() to return the state as it was when you called self.freeze_state().

get_energy() int[source]

Get total energy from all cards in play

Returns:
int

available energy

get_power() int[source]

Get total power from all cards in play

Returns:
int

total power

get_state() ndarray[source]

Player’s state

This is the numeric state that is fed into the ML model as the training data

Returns:
np.ndarray

state, from this instance’s point-of-view

handshake(opponent: Player, /) None[source]

Set this player’s opponent (and vice-versa)

This sets self.opponent, which is used when getting the game state

You must always handshake before starting a game.

Parameters:
opponent: Player

this instance’s competitor for the game

play_cards(count: int = 1, /) tuple[list[titans.ai.card.Card], list[int]][source]

Play one or more cards

Parameters:
count: int, optional, default=1

how many cards to play

Returns:
list[Card]

cards played. Played cards are automatically added to the play zone, but are also returned here for easy debugging / logging

list[int]

decisions made (i.e. the indices of the decision matrix that were executed)

shuffle_cards() None[source]

Shuffle all cards together

Game

class titans.ai.game.Game(player_kwargs: dict[str, Any] | dict[titans.ai.enum.Identity, dict[str, Any]] = None, /, *, turn_limit: int = 1000)[source]

Game Class

This class contains all the logic to play a full game.

Parameters:
player_kwargs: dict[str, Any] | dict[Identity, dict[str, Any]]

These dictionaries are unpacked as kwargs to initialize the players. If you provide a single dictionary keyed by strings, its values will be unpacked to initialize both players. If you instead provide a dictionary mapping each identity to its own dictionary of kwargs, each player will be initialized with the kwargs provided for that identity.

turn_limit: int, optional, default=1000

max number of turns before a draw is declared
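The two accepted player_kwargs shapes, with identities stood in by plain strings (the real keys are Identity members), might look like:

```python
# Hypothetical sketch of the two accepted player_kwargs shapes.
shared = {"random_state": 0}  # one kwargs dict, applied to both players

per_player = {                # or: one kwargs dict per identity
    "MIKE": {"temperature": 0.5},
    "BRYAN": {"temperature": 1.0},
}

def kwargs_for(player_kwargs, identity):
    # hypothetical resolution logic: a per-identity entry wins if present
    value = player_kwargs.get(identity)
    return value if isinstance(value, dict) else player_kwargs
```

With the shared form, both players receive the same kwargs; with the per-player form, each identity gets its own entry.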

Attributes:
cards: list[Card]

cards in the game

history: dict[bytes, dict[Action, dict[Identity, list[int]]]]

here, the history of each player’s state, and the choices they made given that state, are recorded. This variable contains three nested dictionaries:

  1. The top-level dictionary is indexed by each game state at which a decision was made (converted from np.ndarray -> bytes, for hashability)

  2. The mid-level dictionary is indexed by each action made at that decision point

  3. The bottom-level dictionary is indexed by the player that made that action, and maps to the choice(s) that player made at that decision point

players: list[Player]

players playing the game

transcript: str

human-readable transcript (log) of the game

winner: Identity | None

winner of game
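A hypothetical walk over the three nested levels of the history attribute (action and player keys stood in by strings; the real keys are enum members):

```python
import numpy as np

# Sketch of Game.history's three nested levels.
state = np.array([3, 1, 0, 2], dtype=np.uint8)
history = {
    state.tobytes(): {          # level 1: game state, as bytes
        "AWAKEN": {             # level 2: action taken at that state
            "MIKE": [4],        # level 3: player -> choice(s) made
            "BRYAN": [20],
        },
    },
}

for state_bytes, actions in history.items():
    recovered = np.frombuffer(state_bytes, dtype=np.uint8)  # bytes -> state
    for action, players in actions.items():
        for player, choices in players.items():
            pass  # e.g. label `choices` by whether `player` won the game
```

Storing states as bytes keeps them hashable as dictionary keys, while np.frombuffer recovers the original numeric state exactly.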

Methods

parallel_play()

Play game, returning a generator that pauses at each decision point

play()

Play game end-to-end

parallel_play() Generator[dict[titans.ai.enum.Identity, numpy.ndarray] | None, dict[titans.ai.enum.Identity, dict[titans.ai.enum.Action, numpy.ndarray]], None][source]

Play game, returning a generator that pauses at each decision point

This lets an operator make decisions simultaneously across many different games. This is important because the slowest part of processing is using the ANNs to make decisions, so batching decisions across games speeds up simulations considerably.

Returns:
Generator

This generator will play the game to each decision point, and will then expect you to send in the decision_matrices that rank each possible choice.

Yields: dict[Identity, np.ndarray] | None

dictionary mapping from each player to their current state. If None is yielded, the game is over.

Send: dict[Identity, dict[Action, np.ndarray]]

dictionary mapping each player to a dictionary that maps each action to that player’s choices for that action, wherein each possible choice is ranked by its relative value (highest value = perform that decision)

Returns: None
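The yield/send protocol can be exercised against a toy generator that mimics this contract (names, shapes, and the number of decision points are all hypothetical):

```python
import numpy as np

# Toy generator mimicking the parallel_play protocol: it yields per-player
# states, expects decision matrices via send(), and yields None at game end.
def toy_parallel_play(decision_points=3):
    for _ in range(decision_points):
        matrices = yield {"MIKE": np.zeros(4), "BRYAN": np.ones(4)}
        assert matrices is not None  # operator must send decisions back
    yield None  # game over

game = toy_parallel_play()
states = next(game)  # run to the first decision point
decisions_made = 0
while states is not None:
    # rank all 21 choices for each player (random stand-in for the ANNs)
    matrices = {p: {"PLAY": np.random.rand(21)} for p in states}
    states = game.send(matrices)
    decisions_made += 1
```

An operator driving many such generators at once can collect all the yielded states, run one batched ANN prediction, and send each game its slice of the results, which is where the speedup comes from.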

play() Game[source]

Play game end-to-end

Returns:
Game

calling instance

Trainer

class titans.ai.trainer.Trainer(*, baseline: bool = False, epochs: int = 10, games_per_epoch: int = 100, parallel: bool = True, patience: int = 3, retention: int = 3)[source]

Standard trainer

This class trains up strategies for playing Titans of Eden well. It works by executing a number of training epochs; in each epoch, a batch of games is played, and then data from those games is used to train better and better strategies.

During each epoch, the current strategy is played against the previous-best strategy to generate new training data.

Early stopping, weight restoration, and patience are all used to ensure the best strategy dominates. Strategies are judged on (1) how well they play against the baseline, and (2) whether they outperform previous-best strategies.

A temperature parameter is used and varied across games to ensure that strategies don’t get stuck in local minima.

Parameters:
baseline: bool, optional, default=False

if True, only play against random-card-choosing strategies (instead of playing against latest-gen-minus-one)

epochs: int, optional, default=10

number of training epochs to execute

games_per_epoch: int, optional, default=100

number of games to play each epoch

parallel: bool, optional, default=True

play games in parallel, performing batch decision making at each decision point. This is much faster than making decisions one-at-a-time.
patience: int, optional, default=3

number of epochs to wait for improvement before early-stopping

retention: int, optional, default=3

when training, keep game histories across this many epochs to train on. (i.e. training data from games more than retention epochs ago will be forgotten.)

Attributes:
history: deque[dict[bytes, dict[Action, dict[bool, list[int]]]]]

here, the history of winning and losing players is recorded. This variable maps every state to every choice made given that state, recording whether the player that made that choice ultimately won or lost the game.

This is saved as a circular buffer (deque), so that the Trainer can retain the histories from the most recent N epochs.

strategies: dict[Action, Strategy]

Strategies trained by this class

Methods

discover()

Discover best cards

train()

Train network

discover() ndarray[source]

Discover best cards

This function uses this trainer’s strategy to weight each card according to its relative value.

Returns:
np.ndarray of shape (len(Name),) and dtype float

weights (relative value) of each card

train() Self[source]

Train network

All parameters for training are set in __init__.
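The retention behavior described for Trainer can be sketched with a bounded deque:

```python
from collections import deque

# Sketch of retention: histories live in a bounded deque, so epochs older
# than `retention` are forgotten automatically as new ones are appended.
retention = 3
histories = deque(maxlen=retention)
for epoch in range(5):
    histories.append({"epoch": epoch})  # one game-history dict per epoch

kept = [h["epoch"] for h in histories]  # only the most recent epochs remain
```

After five epochs with retention=3, only epochs 2 through 4 survive; training data from earlier epochs has been discarded.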

Constants and enums

Here, we list all the constants and enums that are available for easy-to-read code references. Please note that most enum members are not themselves documented, as we assume they are self-explanatory to anyone familiar with the game.

The constants are mostly used internally for sizing neural networks.

Most enums here do not resolve to integers (without explicit casting), which helps catch errors where enums are mixed up or supplied in the wrong order.
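A stand-in mirroring this enum style (a plain Enum rather than IntEnum, so members never silently compare equal to ints):

```python
from enum import Enum

# Hypothetical stand-in mirroring titans.ai.enum.Zone.
class Zone(Enum):
    DECK = 0
    DISCARD = 1
    HAND = 2
    PLAY = 3

silently_mixed = (Zone.DECK == 0)  # False: a caught bug, not a wrong answer
index = Zone.HAND.value            # explicit cast when an int is needed
```

Passing a raw 0 where a Zone is expected thus fails loudly instead of being misinterpreted, which is the error-catching benefit described above.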

NUM_CHOICES

titans.ai.constants.NUM_CHOICES: int = 21

Number of choices for each action

NUM_FEATURES

titans.ai.constants.NUM_FEATURES: int = 148

Size of player states, fed as input to ML models

Ability

class titans.ai.enum.Ability(value, names=None, *, module=None, qualname=None, type=None, start=1, boundary=None)[source]

Card Abilities

BOLSTER_FIRE = 0
BOLSTER_ICE = 1
BOLSTER_RIVALS = 2
BOLSTER_ROCK = 3
BOLSTER_STORM = 4
DISCARD = 5
DRAW = 6
ENERGY = 7
ENERGY_ARC = 8
FLASH = 9
HAUNT = 10
PROTECT = 11
PURIFY = 12
SACRIFICE = 13
SUBSTITUTE = 14
SUBVERT_CAVE_IN = 15
SUBVERT_HARMLESS = 16
SUBVERT_MINDLESS = 17
SUBVERT_TRAITOROUS = 18
SUMMON = 19

Action

class titans.ai.enum.Action(value, names=None, *, module=None, qualname=None, type=None, start=1, boundary=None)[source]

Various actions (one for each strategy)

AWAKEN = 0
PLAY = 1

Element

class titans.ai.enum.Element(value, names=None, *, module=None, qualname=None, type=None, start=1, boundary=None)[source]

Card Elements

DESERT = 1
FIRE = 3
FOREST = 0
ICE = 4
ROCK = 5
STORM = 2

Identity

class titans.ai.enum.Identity(value, names=None, *, module=None, qualname=None, type=None, start=1, boundary=None)[source]

Player names

BRYAN = 1
MIKE = 0

Name

class titans.ai.enum.Name(value, names=None, *, module=None, qualname=None, type=None, start=1, boundary=None)[source]

Card Names

AKARI_TIMELESS_FIGHTER = 16
AURORA_DRACO = 6
CAVERNS_DEFENDER = 18
FINAL_JUDGMENT = 11
FROSTBREATH = 14
GHOST = 3
HELL_FROZEN_OVER = 15
JACE_WINTERS_FIRSTBORN = 12
LIVING_VOLCANO = 9
MADNESS_OF_A_THOUSAND_STARS = 7
MONK = 0
NIKOLAI_THE_CURSED = 4
RETURN_OF_THE_FROST_GIANTS = 13
SMOLDERING_DRAGON = 10
SPINE_SPLITTER = 17
TRAVELER = 2
WHAT_LIES_BELOW = 19
WINDS_HOWL = 5
WIZARD = 1
ZODIAC_THE_ETERNAL = 8

Species

class titans.ai.enum.Species(value, names=None, *, module=None, qualname=None, type=None, start=1, boundary=None)[source]

Card Species

BEAST = 2
DRAGON = 3
DWELLER = 0
TITAN = 4
WARRIOR = 1

Zone

class titans.ai.enum.Zone(value, names=None, *, module=None, qualname=None, type=None, start=1, boundary=None)[source]

Different zones a player’s cards can be in

DECK = 0
DISCARD = 1
HAND = 2
PLAY = 3