AI Code
Overview
For this project, I designed an AI engine for two reasons:
To control the online demo; and
To assist with card evaluation and discovery.
The first point should be obvious (even if it’s not yet implemented). The second is more subtle: I want to make sure that the cards in this game are all well-balanced against one another, with no single overpowered card. (This can be evaluated by inspecting the weights and win percentages under various strategies as the system plays itself and learns.)
Architecture
Each ~titans.ai.strategy.StandardStrategy object is a feedforward neural network that maps from the game’s current state (described as the number of cards in each game ~titans.ai.enum.Zone) to the decision that should be made (e.g. awaken such-and-such card, or choose to do nothing). Each ~titans.ai.player.Player has multiple strategy objects: one for each type of decision they make (e.g. one for awakening cards and a separate one for playing cards).
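As a rough illustration of this mapping, here is a minimal sketch of a single linear layer choosing a decision from zone counts. The names and sizes (`NUM_ZONES`, the example state) are illustrative stand-ins, not the real titans.ai internals:

```python
# Minimal sketch: a single linear layer standing in for the feedforward
# network that maps zone counts to a decision. Sizes are illustrative.
import random

NUM_ZONES = 7      # hypothetical input size: cards counted per Zone
NUM_CHOICES = 21   # one output per possible decision (incl. "do nothing")

def predict(state, weights):
    """Score every decision and pick the highest-scoring one."""
    scores = [sum(w * x for w, x in zip(row, state)) for row in weights]
    return scores.index(max(scores))

random.seed(0)
weights = [[random.uniform(-1, 1) for _ in range(NUM_ZONES)]
           for _ in range(NUM_CHOICES)]
state = [12, 5, 3, 0, 4, 1, 30]  # e.g. card counts in each game Zone
print("chosen decision index:", predict(state, weights))
```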
When training a strategy, each player starts with randomly-initialized neural
networks (or with an explicit ~titans.ai.strategy.RandomStrategy state)
and plays many games against another (virtual) player. During each game, each
time a decision is made, the game state (neural network input) and the chosen
decision (neural network output) is recorded. Additionally, at the completion
of each game, each decision is labeled according to whether the player won or
lost the game. These decisions can then be coalesced into a dataset: Each game
state can be assigned labels for each decision based upon that decision’s win
percentage. (Of course, most decisions will be labeled with np.nan,
since they did not occur during the game.)
Using this setup, players can play many games against one another, with their strategies updated every so many games. Players can thus learn strategies through bootstrapping: as they learn, their opponent learns, and after playing many games, they’ll (hopefully) be left with decent strategies.
Examples
Vs Random Player
As a proof-of-concept, a learning player was set up to play against a random player. After ~800 games, the learning player won >90% of the time.
Code and results can be found here.
Vs Previous-Best
Work-in-progress
Card Discovery
Work-in-progress
Bonus: Optimization
A batch decision-making process was implemented to accelerate training. With it, playtime was accelerated approximately 10x. Results can be found here.
Future Work
The biggest additions I hope to make to the AI package include:
Deploying this as part of the demo. I’ll stand up a top-performing set of strategies in a FastAPI container (similar to the website API bot that’s working now), and have it accept (likely) JSON input while returning integer decisions and/or probability distributions.
Getting card discovery / weighting working. I think a straightforward way of showing card strength would be to take a highly-trained strategy, and compare win percentages between two players, where one player doesn’t have access to any given card. If losing access to a single card greatly decreases win percentage, that’d be a sign of an overpowered card.
Implementing the remaining cards in code.
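The card-strength comparison described above might look roughly like this; all of the numbers and the threshold are invented for illustration:

```python
# Hypothetical sketch of the card-discovery idea: compare win rates when
# one player loses access to a given card. All numbers are made up.
def win_rate(games_won, games_played):
    return games_won / games_played

baseline = win_rate(520, 1000)      # both players use all cards
without_card = win_rate(340, 1000)  # one player can't use card X

drop = baseline - without_card
print(f"win-rate drop without card X: {drop:.2f}")
if drop > 0.10:  # threshold is arbitrary, for illustration only
    print("card X may be overpowered")
```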
Tests
Tests can be invoked by running
pytest tests/titans/ai
(We recommend executing ai tests separately from other tests.)
API Reference
Strategies
Strategies are the basic “thinking” (and learning!) units of the game. Think of
them as sklearn-compliant classes for making decisions. The base (abstract)
class is Strategy, which is inherited by
RandomStrategy and
StandardStrategy.
RandomStrategy
StandardStrategy
- class titans.ai.strategy.StandardStrategy(*, scale: bool = True, **kwargs)[source]
Standard strategy for making decisions
This class scales data (via z-score normalization) and then uses an MLP (a feedforward regression ANN) to predict the best courses of action (given the player’s / game’s state).
This model is designed so that you can resume fitting at any point: if you call the fit function multiple times, the ANN will simply resume fitting from where you left off. (The scaler, though, is static: it is configured the first time you call fit. This ensures that new data is on the same scale as old data.)
- Parameters:
- scale: bool, optional, default=True
use sklearn.preprocessing.StandardScaler to scale data
- **kwargs: Any
passed to RandomStrategy on __init__()
- Attributes:
- restricted: list[int] | None
If set, then don’t predict these values. This must be manually set, and is provided for exploring the relative values of cards
Methods
fit(X, y, /)Fit model
predict(X, /)Predict best course of action
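The resume-fitting behavior (scaler configured once, network warm-started on later fits) could be sketched with scikit-learn roughly as follows; SketchStrategy and its internals are a hypothetical stand-in, not the actual StandardStrategy implementation:

```python
# Sketch: scaler fit only once; MLP resumes fitting on later calls.
import warnings

import numpy as np
from sklearn.exceptions import ConvergenceWarning
from sklearn.neural_network import MLPRegressor
from sklearn.preprocessing import StandardScaler

class SketchStrategy:
    def __init__(self):
        self._scaler = None
        # warm_start=True makes successive fit() calls resume training
        self._model = MLPRegressor(warm_start=True, max_iter=50)

    def fit(self, X, y):
        if self._scaler is None:  # scaler configured only on first fit
            self._scaler = StandardScaler().fit(X)
        with warnings.catch_warnings():
            warnings.simplefilter("ignore", ConvergenceWarning)
            self._model.fit(self._scaler.transform(X), y)
        return self

    def predict(self, X):
        return self._model.predict(self._scaler.transform(X))

rng = np.random.default_rng(0)
X, y = rng.normal(size=(32, 8)), rng.normal(size=32)
strategy = SketchStrategy().fit(X, y).fit(X, y)  # second fit resumes
print(strategy.predict(X).shape)  # (32,)
```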
Strategy
- class titans.ai.strategy.Strategy[source]
Strategy abstract base class
Methods
fit(X, y, /)Fit model
predict(X, /)Predict best course of action
Basic game constructs
Here, we list the basic game constructs that are strung together to simulate
games. Card objects are used to represent actual
in-game cards. Player objects use decks of cards to
play against each other. Game objects orchestrate two
players playing against one another. Trainer
objects take players through many games to learn how to play the game well.
Card
- class titans.ai.card.Card(name: Name, /)[source]
Card, with all properties
This class both has all the logic to instantiate cards, and holds all properties as a player uses cards in a game.
- Parameters:
- name: Name
Name of card
- Attributes:
- abilities: dict[Ability, int]
card abilities (count of each ability)
- cost: int
amount of energy required to awaken
- element: Element
card element
- name: Name
card name
- power: int
card power
- species: Species
card species
Player
- class titans.ai.player.Player(identity: Identity, /, cards: list[titans.ai.card.Card], strategies: dict[titans.ai.enum.Action, titans.ai.strategy.Strategy] | None = None, *, random_state: int | None = None, temperature: float | None = None)[source]
Player class
This class performs all of the core operations of each player. These actions are meant to be relatively low-level: The logic of walking through an actual game is located in the titans.ai.game.Game class.
- Parameters:
- identity: Identity
player’s identity
- cards: list[Card]
cards used in this game
- strategies: dict[Action, Strategy] | None, optional, default=None
strategies for performing actions. If None, random choices will be used
- random_state: int | None, optional, default=None
player’s random seed
- temperature: float | None, optional, default=None
randomness to add into decision making. This is important for making sure strategies don’t get stuck in local minima during training.
- Attributes:
- cards: dict[Zone, list[Card]]
player’s cards, in each zone
- identity: Identity
player’s identity
- ritual_piles: list[Card]
cards in the shared ritual piles. This is harmonized between players in Player.handshake()
- strategies: dict[Action, Strategy]
strategies for performing actions
- temples: int
number of temples under player’s control
Methods
awaken_card()Awaken a card from the ritual piles
battle_opponent()Battle opponent
draw_cards([count])Draw cards
freeze_state()Freeze state (for simultaneous actions)
Get total energy from all cards in play
Get total power from all cards in play
get_state()Player's state
handshake(opponent, /)Set this player's opponent (and vice-versa)
play_cards([count])Play one or more cards
Shuffle all cards together
- awaken_card() tuple[titans.ai.card.Card | None, int][source]
Awaken a card from the ritual piles
The awakened card is added to your discard pile. You can always choose to not awaken.
- Returns:
- Card | None
awakened card. This is automatically added to the discard zone but is returned here for easy debugging / logging. None is returned if we choose to not awaken.
- int
decision made. This is the index of the decision matrix that was ultimately chosen.
- battle_opponent() Identity | None[source]
Battle opponent
This should NOT be called for each player, but only once
- Returns:
- Identity | None
winner of the battle
- draw_cards(count: int = 1, /) list[titans.ai.card.Card][source]
Draw cards
- Parameters:
- count: int
number of cards to draw
- Returns:
- list[Card]
the cards drawn. These are added to the hand zone, but are also returned for easy debugging / logging.
- freeze_state() Generator[ndarray, None, None][source]
Freeze state (for simultaneous actions)
This causes self.get_state() to return the state as it was when self.freeze_state() was called.
- get_state() ndarray[source]
Player’s state
This is the numeric state that is fed into the ML model as the training data
- Returns:
- np.ndarray
state, from this instance’s point-of-view
- handshake(opponent: Player, /) None[source]
Set this player’s opponent (and vice-versa)
This sets self.opponent, which is used when getting the game state
You must always handshake before starting a game.
- Parameters:
- opponent: Player
this instance’s competitor for the game
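A toy illustration of the handshake pattern described above, in which one call leaves each player holding a reference to the other; ToyPlayer is a stand-in, not the real Player class:

```python
# Toy handshake: each player stores a reference to its opponent so
# state queries can see both sides of the game.
class ToyPlayer:
    def __init__(self, name):
        self.name = name
        self.opponent = None

    def handshake(self, opponent):
        """Set this player's opponent (and vice-versa)."""
        self.opponent = opponent
        opponent.opponent = self

north, south = ToyPlayer("north"), ToyPlayer("south")
north.handshake(south)  # one call wires up both players
print(north.opponent.name, south.opponent.name)  # south north
```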
- play_cards(count: int = 1, /) tuple[list[titans.ai.card.Card], list[int]][source]
Play one or more cards
- Parameters:
- count: int, optional, default=1
how many cards to play
- Returns:
- list[Card]
cards played. The played cards are automatically added to the play zone, but are also returned here for easy debugging / logging
- list[int]
decisions made (i.e. the indices of the decision matrix that were executed)
Game
- class titans.ai.game.Game(player_kwargs: dict[str, Any] | dict[titans.ai.enum.Identity, dict[str, Any]] = None, /, *, turn_limit: int = 1000)[source]
Game Class
This class contains all the logic to play a full game.
- Parameters:
- player_kwargs: dict[str, Any] | dict[Identity, dict[str, Any]]
These dictionaries are unpacked as kwargs to initialize the players. If you provide a dictionary of strings, then the provided values will be unpacked to initialize both players. If you provide a dictionary of identities mapping to a dictionary of strings, then each player will be initialized with the corresponding kwargs provided for that player.
- turn_limit: int, optional, default=1000
max number of turns before a draw is declared
- Attributes:
- cards: list[Card]
cards in the game
- history: dict[bytes, dict[Action, dict[Identity, list[int]]]]
here, the history of each player’s state, and the choices they made given that state, are recorded. This variable contains three nested dictionaries:
- The top-level dictionary is indexed by each game state at which a decision was made (converted from np.ndarray -> bytes, for hashability)
- The mid-level dictionary is indexed by each action made at that decision point
- The bottom-level dictionary is indexed by the player that made that action, and maps to the choice(s) that player made at that decision point
- players: list[Player]
players playing the game
- transcript: str
human-readable transcript (log) of the game
- winner: Identity | None
winner of game
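The three-level history structure can be pictured with a toy literal; the keys here are placeholders, not real game states or enum members:

```python
# Toy illustration of the nested `history` structure: state -> action ->
# player -> choice indices. Keys are placeholders for illustration.
state = bytes([0, 3, 1])  # np.ndarray -> bytes, for hashability
history = {
    state: {                # top level: game state at a decision point
        "AWAKEN": {         # mid level: action taken at that state
            "RED": [4],     # bottom level: player -> choice indices
            "BLUE": [0],
        },
    },
}
print(history[state]["AWAKEN"]["RED"])  # [4]
```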
Methods
parallel_play()Play game, returning a generator that pauses at each decision point
play()Play game end-to-end
- parallel_play() Generator[dict[titans.ai.enum.Identity, numpy.ndarray] | None, dict[titans.ai.enum.Identity, dict[titans.ai.enum.Action, numpy.ndarray]], None][source]
Play game, returning a generator that pauses at each decision point
This lets an operator simultaneously make decisions across many different games. This matters because the slowest part of processing is using the ANNs to make decisions; batching those decisions across games speeds up game simulations considerably.
- Returns:
- Generator
This generator will play the game to each decision point, and will then expect you to send in the decision_matrices that rank each possible choice.
- Yields: dict[Identity, np.ndarray] | None
dictionary mapping from each player to their current state. If None is yielded, the game is over.
- Send: dict[Identity, dict[Action, np.ndarray]]
dictionary mapping from each player to a dictionary mapping from each action to the choices that player should make for that action, wherein each possible choice is ranked by its relative value (highest value = perform that decision)
- Returns: None
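A toy sketch of how an operator might drive several such generators at once, batching the decision step; the games below are trivial stand-ins for real Game.parallel_play() generators:

```python
# Toy version of the parallel-play driver loop: advance every live game
# to its next decision point, then send in one batch of decisions.
def toy_game(total):
    """Yield a state, receive a decision, until `total` decisions are made."""
    score = 0
    for _ in range(total):
        decision = yield score  # pause at each decision point
        score += decision
    yield None                  # None signals the game is over

games = [toy_game(3) for _ in range(4)]
states = [g.send(None) for g in games]  # prime generators to first pause

while any(s is not None for s in states):
    # one batched "model call" would rank choices for every live game here
    decisions = [1 for _ in states]
    states = [
        g.send(d) if s is not None else None
        for g, s, d in zip(games, states, decisions)
    ]
print("all games finished")
```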
Trainer
- class titans.ai.trainer.Trainer(*, baseline: bool = False, epochs: int = 10, games_per_epoch: int = 100, parallel: bool = True, patience: int = 3, retention: int = 3)[source]
Standard trainer
This class trains strategies for playing Titans of Eden well. It works by executing a number of training epochs: in each epoch, a batch of games is played, and data from those games is used to train progressively better strategies.
During each epoch, the current strategy is played against the previous-best strategy to generate new training data.
Early stopping, weight restoration, and patience are all used to ensure the best strategy dominates: the retained strategy is the one that (1) plays best against the baseline, and (2) outperforms previous-best strategies.
A temperature parameter is used, and varied across games, to ensure that strategies don’t get stuck in local minima.
- Parameters:
- baseline: bool, optional, default=False
if True, only play against random-card-choosing strategies (instead of playing against latest-gen-minus-one)
- epochs: int, optional, default=10
number of training epochs to execute
- games_per_epoch: int, optional, default=100
number of games to play each epoch
- parallel: bool, optional, default=True
play games in parallel, performing batch decision making at each decision point. This is much faster than making decisions one-at-a-time.
- retention: int, optional, default=3
when training, keep game histories across this many epochs to train on. (i.e. training data from games more than retention epochs ago will be forgotten.)
- Attributes:
- history: deque[dict[bytes, dict[Action, dict[bool, list[int]]]]]
here, the history of winning and losing players is recorded. This variable maps every state to every choice made given that state, recording whether the player that made that choice ultimately won or lost the game.
This is saved as a circular buffer (deque), so that the Trainer can retain the histories from the most recent N games.
- strategies: dict[Action, Strategy]
Strategies trained by this class
Methods
discover()Discover best cards
train()Train network
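The retention behavior described above can be pictured with a small deque sketch; this is illustrative, not the real Trainer internals:

```python
# Sketch of `retention`: a bounded deque keeps only the most recent N
# epochs of game history, so older training data is forgotten.
from collections import deque

retention = 3
history = deque(maxlen=retention)  # old epochs fall off automatically

for epoch in range(5):
    history.append({"epoch": epoch, "games": []})  # one entry per epoch

print([h["epoch"] for h in history])  # [2, 3, 4]
```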
Constants and enums
Here, we list all the constants and enums that are available for easy-to-read code references. Please note that most enum members are not themselves documented, as we assume they are self-explanatory to anyone familiar with the game.
The constants are mostly used internally for sizing neural networks.
Most enums here do not resolve to integers (without explicit casting), which helps catch errors from mixing them up or supplying them in the wrong order.
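The distinction can be shown with Python’s standard enum module; these toy enums mirror the idea, not the actual titans.ai.enum members:

```python
# Plain Enum members do not compare equal to ints, so mixing up an enum
# with a raw integer (or another enum) fails fast; IntEnum does not.
from enum import Enum, IntEnum

class Zone(Enum):       # plain Enum: no implicit integer behavior
    HAND = 0
    DECK = 1

class Choice(IntEnum):  # IntEnum: silently interchangeable with ints
    PASS = 0
    AWAKEN = 1

print(Choice.PASS == 0)      # True  -- easy to confuse with other ints
print(Zone.HAND == 0)        # False -- mix-ups are caught
print(Zone.HAND.value == 0)  # True  -- explicit casting is intentional
```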
NUM_CHOICES
- titans.ai.constants.NUM_CHOICES: int = 21
Number of choices for each action
NUM_FEATURES
- titans.ai.constants.NUM_FEATURES: int = 148
Size of player states, fed as input to ML models
Ability
- class titans.ai.enum.Ability(value, names=None, *, module=None, qualname=None, type=None, start=1, boundary=None)[source]
Card Abilities
- BOLSTER_FIRE = 0
- BOLSTER_ICE = 1
- BOLSTER_RIVALS = 2
- BOLSTER_ROCK = 3
- BOLSTER_STORM = 4
- DISCARD = 5
- DRAW = 6
- ENERGY = 7
- ENERGY_ARC = 8
- FLASH = 9
- HAUNT = 10
- PROTECT = 11
- PURIFY = 12
- SACRIFICE = 13
- SUBSTITUTE = 14
- SUBVERT_CAVE_IN = 15
- SUBVERT_HARMLESS = 16
- SUBVERT_MINDLESS = 17
- SUBVERT_TRAITOROUS = 18
- SUMMON = 19
Action
Element
Identity
Name
- class titans.ai.enum.Name(value, names=None, *, module=None, qualname=None, type=None, start=1, boundary=None)[source]
Card Names
- AKARI_TIMELESS_FIGHTER = 16
- AURORA_DRACO = 6
- CAVERNS_DEFENDER = 18
- FINAL_JUDGMENT = 11
- FROSTBREATH = 14
- GHOST = 3
- HELL_FROZEN_OVER = 15
- JACE_WINTERS_FIRSTBORN = 12
- LIVING_VOLCANO = 9
- MADNESS_OF_A_THOUSAND_STARS = 7
- MONK = 0
- NIKOLAI_THE_CURSED = 4
- RETURN_OF_THE_FROST_GIANTS = 13
- SMOLDERING_DRAGON = 10
- SPINE_SPLITTER = 17
- TRAVELER = 2
- WHAT_LIES_BELOW = 19
- WINDS_HOWL = 5
- WIZARD = 1
- ZODIAC_THE_ETERNAL = 8