Portable Game Notation

| Standard | Portable Game Notation Specification and Implementation Guide |
|---|---|
| Revised | 1994.03.12 |
| Authors | Interested readers of the Internet newsgroup rec.games.chess |
| Coordinator | Steven J. Edwards (send comments to sje@world.std.com) |
"If now, while they are one people, all speaking the same language, they have started to do this, nothing will later stop them from doing whatever they propose to do." Genesis XI, v.6
PGN is not intended to be a general purpose standard that is suitable for every possible use; no such standard could fill all conceivable requirements. Instead, PGN is proposed as a universal portable representation for data interchange. The idea is to allow the construction of a family of chess applications that can quickly and easily process chess game data using PGN for import and export among themselves.
[Event "F/S Return Match"]
[Site "Belgrade, Serbia JUG"]
[Date "1992.11.04"]
[Round "29"]
[White "Fischer, Robert J."]
[Black "Spassky, Boris V."]
[Result "1/2-1/2"]

1. e4 e5 2. Nf3 Nc6 3. Bb5 a6 4. Ba4 Nf6 5. O-O Be7 6. Re1 b5 7. Bb3 d6 8. c3
O-O 9. h3 Nb8 10. d4 Nbd7 11. c4 c6 12. cxb5 axb5 13. Nc3 Bb7 14. Bg5 b4 15.
Nb1 h6 16. Bh4 c5 17. dxe5 Nxe4 18. Bxe7 Qxe7 19. exd6 Qf6 20. Nbd2 Nxd6 21.
Nc4 Nxc4 22. Bxc4 Nb6 23. Ne5 Rae8 24. Bxf7+ Rxf7 25. Nxf7 Rxe1+ 26. Qxe1 Kxf7
27. Qe3 Qg5 28. Qxg5 hxg5 29. b3 Ke6 30. a3 Kd6 31. axb4 cxb4 32. Ra5 Nd5 33.
f3 Bc8 34. Kf2 Bf5 35. Ra7 g6 36. Ra6+ Kc5 37. Ke1 Nf4 38. g3 Nxh3 39. Kd2 Kb5
40. Rd6 Kc5 41. Ra6 Nf2 42. g4 Bd3 43. Re6 1/2-1/2
Other than formats, there is the additional topic of PGN presentation. While both PGN import and export formats are designed to be readable by humans, there is no recommendation that either of these be an ultimate mode of chess data presentation. Rather, software developers are urged to consider all of the various techniques at their disposal to enhance the display of chess data at the presentation level (i.e., highest level) of their programs. This means that the use of different fonts, character sizes, color, and other tools of computer aided interaction and publishing should be explored to provide a high quality presentation appropriate to the function of the particular program.
Sadly, there are some accidents of history that survive to this day that have baroque representations for a newline: multicharacter sequences, end-of-line record markers, start-of-line byte counts, fixed length records, and so forth. It is well beyond the scope of the PGN project to reconcile all of these to the unified world of ANSI C and those enjoying the bliss of a single '\n' convention. Some systems may just not be able to handle an archival PGN text file with native text editors. In these cases, an indulgence of sorts is granted to use the local newline convention in non-archival PGN files for those text editors.
The 32 ISO 8859/1 code values from 128 to 159 are non-printing control characters. They are not used for PGN data representation. The 32 code values from 160 to 191 are mostly non-alphabetic printing characters and their use for PGN data is discouraged as their graphic representation varies considerably among other ISO Latin sets. Finally, the 64 code values from 192 to 255 are mostly alphabetic printing characters with various diacritical marks; their use is encouraged for those languages that require such characters. The graphic representations of this last set of 64 characters are fairly constant for the ISO Latin family.
Printing character codes outside of the seven bit ASCII range may only appear in string data and in commentary. They are not permitted for use in symbol construction.
Because some PGN users' environments may not support presentation of non-ASCII characters, PGN game authors should refrain from using such characters in critical commentary or string values in game data that may be referenced in such environments. PGN software authors should have their programs handle such environments by displaying a question mark ("?") for non-ASCII character codes. This is an important point because there are many computing systems that can display eight bit character data, but the display graphics may differ among machines and operating systems from different manufacturers.
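
The fallback display rule above can be sketched in a few lines of Python. This is only an illustration: the function name `to_ascii_display` is hypothetical, and the sketch assumes the text has already been decoded into a Python string.

```python
def to_ascii_display(text: str) -> str:
    """Return text with every character outside seven bit ASCII shown as '?'."""
    return "".join(ch if ord(ch) < 128 else "?" for ch in text)


# "\u00e9" is the ISO 8859/1 letter e with an acute accent (code value 233).
print(to_ascii_display("R\u00e9ti, Richard"))  # prints: R?ti, Richard
```

A program running in an environment that can render ISO 8859/1 would skip this step and show the original characters.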
Only four of the ASCII control characters are permitted in PGN import format; these are the horizontal and vertical tabs along with the linefeed and carriage return codes.
The external representation of the newline character may differ among platforms; this is an acceptable variation as long as the details of the implementation are hidden from software implementors and users. When a choice is practical, the Unix "newline is linefeed" convention is preferred.
In some cases, very long tag values will require 80 or more columns, but these are relatively rare. An example of this is the "FEN" tag pair; it may have a long tag value, but this particular tag pair is only used to represent a game that doesn't start from the usual initial position.
Brace comments do not nest; a left brace character appearing in a brace comment loses its special meaning and is ignored. A semicolon appearing inside of a brace comment loses its special meaning and is ignored. Braces appearing inside of a semicolon comment lose their special meaning and are ignored.
*** Export format representation of comments needs definition work.
A percent sign appearing in any place other than the first position in a line does not trigger the escape mechanism.
A string token is a sequence of zero or more printing characters delimited by a pair of quote characters (ASCII decimal value 34, hexadecimal value 0x22). An empty string is represented by two adjacent quotes. (Note: an apostrophe is not a quote.) A quote inside a string is represented by the backslash immediately followed by a quote. A backslash inside a string is represented by two adjacent backslashes. Strings are commonly used as tag pair values (see below). Non-printing characters like newline and tab are not permitted inside of strings. A string token is terminated by its closing quote. Currently, a string is limited to a maximum of 255 characters of data.
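
The string token rules above can be sketched as a small reader. The names `read_string_token` and `MAX_STRING_LENGTH` are hypothetical. One point is an assumption rather than something the text states: a backslash followed by anything other than a quote or a backslash is rejected here, since the text defines only those two escapes.

```python
from typing import Tuple

MAX_STRING_LENGTH = 255  # current limit on string data, per the text above


def read_string_token(text: str, start: int = 0) -> Tuple[str, int]:
    """Read a string token at text[start]; return (value, index past closing quote)."""
    if start >= len(text) or text[start] != '"':
        raise ValueError("a string token must begin with a quote")
    chars = []
    i = start + 1
    while i < len(text):
        ch = text[i]
        if ch == "\\":
            # Only \" and \\ are defined escapes (strictness is an assumption).
            if i + 1 < len(text) and text[i + 1] in '"\\':
                chars.append(text[i + 1])
                i += 2
                continue
            raise ValueError("backslash must be followed by a quote or a backslash")
        if ch == '"':
            value = "".join(chars)
            if len(value) > MAX_STRING_LENGTH:
                raise ValueError("string exceeds 255 characters of data")
            return value, i + 1
        if not ch.isprintable():
            # Newline, tab, and other non-printing characters are not permitted.
            raise ValueError("non-printing character inside string")
        chars.append(ch)
        i += 1
    raise ValueError("unterminated string token")
```

For example, `read_string_token('""')` yields the empty string, and an apostrophe inside a string is ordinary data because it is not a quote.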
An integer token is a sequence of one or more decimal digit characters. It is a special case of the more general "symbol" token class described below. Integer tokens are used to help represent move number indications (see below). An integer token is terminated just prior to the first non-symbol character following the integer digit sequence.
A period character (".") is a token by itself. It is used for move number indications (see below). It is self terminating.
An asterisk character ("*") is a token by itself. It is used as one of the possible game termination markers (see below); it indicates an incomplete game or a game with an unknown or otherwise unavailable result. It is self terminating.
The left and right bracket characters ("[" and "]") are tokens. They are used to delimit tag pairs (see below). Both are self terminating.
The left and right parenthesis characters ("(" and ")") are tokens. They are used to delimit Recursive Annotation Variations (see below). Both are self terminating.
The left and right angle bracket characters ("<" and ">") are tokens. They are reserved for future expansion. Both are self terminating.
A Numeric Annotation Glyph ("NAG", see below) is a token; it is composed of a dollar sign character ("$") immediately followed by one or more digit characters. It is terminated just prior to the first non-digit character following the digit sequence.
A symbol token starts with a letter or digit character and is immediately followed by a sequence of zero or more symbol continuation characters. These continuation characters are letter characters ("A-Za-z"), digit characters ("0-9"), the underscore ("_"), the plus sign ("+"), the octothorpe sign ("#"), the equal sign ("="), the colon (":"), and the hyphen ("-"). Symbols are used for a variety of purposes. All characters in a symbol are significant. A symbol token is terminated just prior to the first non-symbol character following the symbol character sequence. Currently, a symbol is limited to a maximum of 255 characters in length.
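
The token classes described above can be sketched as a regular-expression tokenizer. The names `TOKEN_RE` and `tokenize` are hypothetical, and the sketch is deliberately incomplete: it does not handle comments or the percent sign escape mechanism, it does not enforce the 255 character limits, and its string pattern is looser than the string rules. It also follows the continuation character list literally, so a game termination marker such as "1/2-1/2" is rejected because the solidus is not in that list; a fuller reader would need a special case.

```python
import re
from typing import List, Tuple

TOKEN_RE = re.compile(r'''
    (?P<string>"(?:[^"\\]|\\.)*")                    # quoted string with backslash escapes
  | (?P<nag>\$[0-9]+)                                # Numeric Annotation Glyph
  | (?P<symbol>[A-Za-z0-9][A-Za-z0-9_+\#=:-]*)       # symbol (integers included)
  | (?P<punct>[.*\[\]()<>])                          # self terminating tokens
''', re.VERBOSE)


def tokenize(text: str) -> List[Tuple[str, str]]:
    """Split PGN text into (kind, value) pairs, skipping white space."""
    tokens = []
    pos = 0
    while pos < len(text):
        if text[pos].isspace():
            pos += 1
            continue
        match = TOKEN_RE.match(text, pos)
        if match is None:
            raise ValueError(f"unexpected character {text[pos]!r} at offset {pos}")
        kind, value = match.lastgroup, match.group()
        # An integer is the special case of a symbol made only of digits.
        if kind == "symbol" and value.isdigit():
            kind = "integer"
        tokens.append((kind, value))
        pos = match.end()
    return tokens
```

Applied to `1. e4`, this yields an integer token, a period token, and a symbol token, because the period is not a symbol continuation character.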
A PGN game is composed of two sections. The first is the tag pair section and the second is the movetext section. The tag pair section provides information that identifies the game by defining the values associated with a set of standard parameters. The movetext section gives the usually enumerated and possibly annotated moves of the game along with the concluding game termination marker. The chess moves themselves are represented using SAN (Standard Algebraic Notation), also described later in this document.
A tag pair is composed of four consecutive tokens: a left bracket token, a symbol token, a string token, and a right bracket token. The symbol token is the tag name and the string token is the tag value associated with the tag name. (There is a standard set of tag names and semantics described below.) The same tag name should not appear more than once in a tag pair section.
A further restriction on tag names is that they are composed exclusively of letters, digits, and the underscore character. This is done to facilitate mapping of tag names into key and attribute names for use with general purpose database programs.
For PGN import format, there may be zero or more white space characters between any adjacent pair of tokens in a tag pair.
For PGN export format, there are no white space characters between the left bracket and the tag name, there are no white space characters between the tag value and the right bracket, and there is a single space character between the tag name and the tag value.
Tag names, like all symbols, are case sensitive. All tag names used for archival storage begin with an upper case letter.
PGN import format may have multiple tag pairs on the same line and may even have a tag pair spanning more than a single line. Export format requires each tag pair to appear left justified on a line by itself; a single empty line follows the last tag pair.
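
A minimal sketch of writing one tag pair in export format follows. The function name `export_tag_pair` is hypothetical. The sketch applies the tag name restriction (letters, digits, underscore) and the string escaping rules for quotes and backslashes; it does not check the upper case first letter expected for archival storage.

```python
import re

TAG_NAME_RE = re.compile(r"[A-Za-z0-9_]+")


def export_tag_pair(name: str, value: str) -> str:
    """Format one tag pair as a left justified export format line."""
    if not TAG_NAME_RE.fullmatch(name):
        raise ValueError(f"invalid tag name: {name!r}")
    # Escape backslashes first so the backslashes added for quotes survive.
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    # No space after '[' or before ']', and a single space before the value.
    return f'[{name} "{escaped}"]'


print(export_tag_pair("Event", "F/S Return Match"))  # prints: [Event "F/S Return Match"]
```

Each returned line would be written on a line by itself, with a single empty line after the last tag pair.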
Some tag values may be composed of a sequence of items. For example, a consultation game may have more than one player for a given side. When this occurs, the single character ":" (colon) appears between adjacent items. Because of this use as an internal separator in strings, the colon should not otherwise appear in a string.
The tag pair format is designed for expansion; initially only strings are allowed as tag pair values. Tag value formats associated with the STR (Seven Tag Roster, see below) will not change; they will always be string values. However, there are long term plans to allow general list structures as tag values for non-STR tag pairs. Use of these expanded tag values will likely be restricted to special research programs. In all events, the top level structure of a tag pair remains the same: left bracket, tag name, tag value, and right bracket.
For import format, the order of tag pairs is not important. For export format, the STR tag pairs appear before any other tag pairs. (The STR tag pairs must also appear in order; this order is described below). Also for export format, any additional tag pairs appear in ASCII order by tag name.
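
The export ordering rule can be sketched as follows. The function name `export_order` is hypothetical, and the non-STR tag names in the usage line are illustrative only. Python compares strings by code point, which coincides with ASCII order for ASCII tag names.

```python
from typing import Iterable, List

# The Seven Tag Roster, in its required export order.
STR_ORDER = ["Event", "Site", "Date", "Round", "White", "Black", "Result"]


def export_order(tag_names: Iterable[str]) -> List[str]:
    """Return tag names in export order: STR tags first, then ASCII order."""
    names = set(tag_names)
    roster = [name for name in STR_ORDER if name in names]
    extras = sorted(name for name in names if name not in STR_ORDER)
    return roster + extras


print(export_order(["FEN", "Result", "Black", "White", "Round", "Date", "Site", "Event"]))
```

Because upper case letters sort before lower case letters in ASCII, a tag name beginning with a lower case letter would sort after every name beginning with an upper case letter.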
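The seven tag names of the STR are (in order):
1. Event (the name of the event)
2. Site (the location of the event)
3. Date (the starting date of the game)
4. Round (the playing round for the game)
5. White (the player or players of the white pieces)
6. Black (the player or players of the black pieces)
7. Result (the result of the game)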
A set of supplemental tag names is given later in this document.
For PGN export format, a single blank line appears after the last of the tag pairs to conclude the tag pair section. This helps simple scanning programs to quickly determine the end of the tag pair section and the beginning of the movetext section.
The Event tag value should be reasonably descriptive. Abbreviations are to be avoided unless absolutely necessary. A consistent event naming should be used to help facilitate database scanning. If the name of the event is unknown, a single question mark should appear as the tag value.
Examples:
[Event "FIDE World Championship"]
[Event "Moscow City Championship"]
[Event "ACM North American Computer Championship"]
[Event "Casual Game"]
The Site tag value should include city and region names along with a standard name for the country. The use of the IOC (International Olympic Committee) three letter names is suggested for those countries where such codes are available. If the site of the event is unknown, a single question mark should appear as the tag value. A comma may be used to separate a city from a region. No comma is needed to separate a city or region from the IOC country code. A later section of this document gives a list of three letter nation codes along with a few additions for "locations" not covered by the IOC.
Examples:
[Site "New York City, NY USA"]
[Site "St. Petersburg RUS"]
[Site "Riga LAT"]
The Date tag value gives the starting date for the game. (Note: this is not necessarily the same as the starting date for the event.) The date is given with respect to the local time of the site given in the Site tag. The Date tag value field always uses a standard ten character format: "YYYY.MM.DD". The first four characters are digits that give the year, the next character is a period, the next two characters are digits that give the month, the next character is a period, and the final two characters are digits that give the day of the month. If any of the digit fields are not known, then question marks are used in place of the digits.
Examples:
[Date "1992.08.31"]
[Date "1993.??.??"]
[Date "2001.01.01"]
The Round tag value gives the playing round for the game. In a match competition, this value is the number of the game played. If the use of a round number is inappropriate, then the field should be a single hyphen character. If the round is unknown, a single question mark should appear as the tag value. Some organizers employ unusual round designations and have multipart playing rounds and sometimes even have conditional rounds. In these cases, a multipart round identifier can be made from a sequence of integer round numbers separated by periods. The leftmost integer represents the most significant round and succeeding integers represent round numbers in descending hierarchical order.
Examples:
[Round "1"]
[Round "3.1"]
[Round "4.1.2"]
The White tag value is the name of the player or players of the white pieces. The names are given as they would appear in a telephone directory. The family or last name appears first. If a first name or first initial is available, it is separated from the family name by a comma and a space. Finally, one or more middle initials may appear. (Wherever a comma appears, the very next character should be a space. Wherever an initial appears, the very next character should be a period.) If the name is unknown, a single question mark should appear as the tag value.
The intent is to allow meaningful ASCII sorting of the tag value that is independent of regional name formation customs. If more than one person is playing the white pieces, the names are listed in alphabetical order and are separated by the colon character between adjacent entries. A player who is also a computer program should have appropriate version information listed after the name of the program.
The format used in the FIDE Rating Lists is appropriate for use for player name tags.
Examples:
[White "Tal, Mikhail N."]
[White "van der Wiel, Johan"]
[White "Acme Pawngrabber v.3.2"]
[White "Fine, R."]
The Black tag value is the name of the player or players of the black pieces. The names are given here as they are for the White tag value.
Examples:
[Black "Lasker, Emmanuel"]
[Black "Smyslov, Vasily V."]
[Black "Smith, John Q.:Woodpusher 2000"]
[Black "Morphy"]
The Result field value is the result of the game. It is always exactly the same as the game termination marker that concludes the associated movetext. It is always one of four possible values: "1-0" (White wins), "0-1" (Black wins), "1/2-1/2" (drawn game), and "*" (game still in progress, game abandoned, or result otherwise unknown). Note that the digit zero is used in both of the first two cases; not the letter "O".
All possible examples:
[Result "0-1"]
[Result "1-0"]
[Result "1/2-1/2"]
[Result "*"]
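
The Date, Round, and Result value formats described above lend themselves to simple checks. The following is a sketch with hypothetical names. It assumes that an unknown date field is written entirely in question marks (the text does not say whether digits and question marks may be mixed within one field), and it checks only the shape of a date, not whether the month and day are in range.

```python
import re

# Each date field is either all digits or all question marks (an assumption).
DATE_RE = re.compile(r"([0-9]{4}|\?{4})\.([0-9]{2}|\?{2})\.([0-9]{2}|\?{2})")
# A round is "?", "-", or integers separated by periods.
ROUND_RE = re.compile(r"\?|-|[0-9]+(?:\.[0-9]+)*")
RESULTS = {"1-0", "0-1", "1/2-1/2", "*"}


def is_valid_date(value: str) -> bool:
    return DATE_RE.fullmatch(value) is not None


def is_valid_round(value: str) -> bool:
    return ROUND_RE.fullmatch(value) is not None


def is_valid_result(value: str) -> bool:
    return value in RESULTS


print(is_valid_date("1993.??.??"), is_valid_round("4.1.2"), is_valid_result("O-1"))
```

The last call uses the letter "O" instead of the digit zero, so the result check returns False for it.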
Because illegal moves are not real chess moves, they are not permitted in PGN movetext. They may appear in commentary, however. One would hope that illegal moves are relatively rare in games worthy of recording.
In PGN export format, tokens in the movetext are placed left justified on successive text lines each of which has fewer than 80 printing characters. As many tokens as possible are placed on a line with the remainder appearing on successive lines. A single space character appears between any two adjacent symbol tokens on the same line in the movetext. As with the tag pair section, a single empty line follows the last line of data to conclude the movetext section.
Neither the first nor the last character on an export format PGN line is a space. (This may change in the case of commentary; this area is currently under development.)
PGN import format move number indications may have zero or more period characters following the digit sequence that gives the move number; one or more white space characters may appear between the digit sequence and the period(s).
There are two export format move number indication formats: one that appears immediately before a white move element and one that appears immediately before a black move element. A white move number indication is formed from the integer giving the fullmove number with a single period character appended. A black move number indication is formed from the integer giving the fullmove number with three period characters appended.
All white move elements have a preceding move number indication. A black move element has a preceding move number indication only in two cases: first, if there is intervening annotation or commentary between the black move and the previous white move; and second, if there is no previous white move in the special case where a game starts from a position where Black is the active player.
There are no other cases where move number indications appear in PGN export format.
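
The export rules for move number indications and line filling can be sketched as below. The function and parameter names are hypothetical. The sketch covers plain move sequences only: it does not handle commentary, annotation, or variations, so the only black move number indication it emits is the one for a game that starts with Black to move.

```python
from typing import List


def export_movetext(moves: List[str], result: str, black_first: bool = False,
                    first_fullmove: int = 1, width: int = 79) -> str:
    """Lay out SAN moves and the termination marker on lines of at most `width` characters."""
    tokens = []
    fullmove = first_fullmove
    white_to_move = not black_first
    for index, san in enumerate(moves):
        if white_to_move:
            tokens.append(f"{fullmove}.")    # every white move is numbered
        elif index == 0:
            tokens.append(f"{fullmove}...")  # game begins with Black to move
        tokens.append(san)
        if not white_to_move:
            fullmove += 1
        white_to_move = not white_to_move
    tokens.append(result)

    # Greedy fill: put as many tokens as possible on each line.
    lines, line = [], ""
    for token in tokens:
        if line and len(line) + 1 + len(token) > width:
            lines.append(line)
            line = token
        else:
            line = f"{line} {token}" if line else token
    if line:
        lines.append(line)
    return "\n".join(lines)


print(export_movetext(["e4", "e5", "Nf3", "Nc6"], "*"))  # prints: 1. e4 e5 2. Nf3 Nc6 *
```

Joining tokens with single spaces and breaking only between tokens keeps the first and last character of every line from being a space.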
Examples of SAN recorded games are found throughout most modern chess publications. SAN as presented in this document uses English language single character abbreviations for chess pieces, although this is easily changed in the source. English is chosen over other languages because it appears to be the most widely recognized.
An alternative to SAN is FAN (Figurine Algebraic Notation). FAN uses miniature piece icons instead of single letter piece abbreviations. The two notations are otherwise identical.
SAN identifies each of the sixty four squares on the chessboard with a unique two character name. The first character of a square identifier is the file of the square; a file is a column of eight squares designated by a single lower case letter from "a" (leftmost or queenside) up to and including "h" (rightmost or kingside). The second character of a square identifier is the rank of the square; a rank is a row of eight squares designated by a single digit from "1" (bottom side [White's first rank]) up to and including "8" (top side [Black's first rank]). The initial squares of some pieces are: white queen rook at a1, white king at e1, black queen knight pawn at b7, and black king rook at h8.
SAN identifies each piece by a single upper case letter. The standard English values: pawn = "P", knight = "N", bishop = "B", rook = "R", queen = "Q", and king = "K".
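
The square naming scheme described above maps directly onto a pair of zero-based indices. The following sketch uses hypothetical function names; the choice of zero-based (file, rank) tuples is an illustration, not something the text prescribes.

```python
from typing import Tuple

FILES = "abcdefgh"
RANKS = "12345678"


def square_to_coords(square: str) -> Tuple[int, int]:
    """Convert a square name such as 'e1' to zero-based (file, rank) indices."""
    if len(square) != 2 or square[0] not in FILES or square[1] not in RANKS:
        raise ValueError(f"not a square name: {square!r}")
    return FILES.index(square[0]), RANKS.index(square[1])


def coords_to_square(file_index: int, rank_index: int) -> str:
    """Convert zero-based (file, rank) indices back to a square name."""
    return FILES[file_index] + RANKS[rank_index]


print(square_to_coords("a1"), square_to_coords("h8"))  # prints: (0, 0) (7, 7)
```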
The letter code for a pawn is not used for SAN moves in PGN export format movetext. However, some PGN import software disambiguation code may allow for the appearance of pawn letter codes. Also, pawn and other piece letter codes are needed for use in some tag pair and annotation constructs.
It is admittedly a bit chauvinistic to select English piece letters over those from other languages. There is a slight justification in that English is a de facto universal second language among most chessplayers and program users. It is probably the best that can be done for now. A later section of this document gives alternative piece letters, but these should be used only for local presentation software and not for archival storage or for dynamic interchange among programs.
A basic SAN move is given by listing the moving piece letter (omitted for pawns) followed by the destination square. Capture moves are denoted by the lower case letter "x" immediately prior to the destination square; pawn captures include the file letter of the originating square of the capturing pawn immediately prior to the "x" character.
SAN kingside castling is indicated by the sequence "O-O"; queenside castling is indicated by the sequence "O-O-O". Note that the upper case letter "O" is used, not the digit zero. The use of a zero character is not only incompatible with traditional text practices, but it can also confuse parsing algorithms which also have to understand about move numbers and game termination markers. Also note that the use of the letter "O" is consistent with the practice of having all chess move symbols start with a letter; also, it follows the convention that all non-pawn move symbols start with an upper case letter.
En passant captures do not have any special notation; they are formed as if the captured pawn were on the capturing pawn's destination square.
Pawn promotions are denoted by the equal sign "=" immediately following the destination square with a promoted piece letter (indicating one of knight, bishop, rook, or queen) immediately following the equal sign. As above, the piece letter is in upper case.
In the case of ambiguities (multiple pieces of the same type moving to the same square), the first appropriate disambiguating step of the three following steps is taken:
First, if the moving pieces can be distinguished by their originating files, the originating file letter of the moving piece is inserted immediately after the moving piece letter.
Second (when the first step fails), if the moving pieces can be distinguished by their originating ranks, the originating rank digit of the moving piece is inserted immediately after the moving piece letter.
Third (when both the first and the second steps fail), the two character square coordinate of the originating square of the moving piece is inserted immediately after the moving piece letter.
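
The three disambiguation steps above can be sketched as a small function. The names are hypothetical. The sketch assumes the caller has already done the chess-specific work: `rivals` must contain the origin squares of the other pieces of the same type that can legally move to the same destination square.

```python
from typing import Sequence


def disambiguator(origin: str, rivals: Sequence[str]) -> str:
    """Return the text inserted after the piece letter for a move from `origin`."""
    if not rivals:
        return ""                # no ambiguity, nothing is inserted
    if all(rival[0] != origin[0] for rival in rivals):
        return origin[0]         # step 1: the originating file is unique
    if all(rival[1] != origin[1] for rival in rivals):
        return origin[1]         # step 2: the originating rank is unique
    return origin                # step 3: the full originating square


# Knights on c3 and g1 that can both move to e2 give "Nce2" and "Nge2".
print("N" + disambiguator("c3", ["g1"]) + "e2")  # prints: Nce2
```

The third step is only reached when at least three pieces of the same type can reach the destination, for example after promotions.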
Note that the above disambiguation is needed only to distinguish among moves of the same piece type to the same square; it is not used to distinguish among attacks of the same piece type to the same square. An example of this would be a position with two white knights, one on square c3 and one on square g1 and a vacant square e2 with White to move. Both knights attack square e2, and if both could legally move there, then a file disambiguation is needed; the (nonchecking) knight moves would be "Nce2" and "Nge2". However, if the white king were at square e1 and a black bishop were at square b4 with a vacant square d2 (thus an absolute pin of the white knight at square c3), then only one white knight (the one at square g1) could move to square e2: "Ne2".
If the move is a checking move, the plus sign "+" is appended as a suffix to the basic SAN move notation; if the move is a checkmating move, the octothorpe sign "#" is appended instead.
Neither the appearance nor the absence of either a check or checkmating indicator is used for disambiguation purposes. This means that if two (or more) pieces of the same type can move to the same square, the differences in checking status of the moves do not alleviate the need for the standard rank and file disambiguation described above. (Note that a difference in checking status for the above may occur only in the case of a discovered check.)
Neither the checking nor the checkmating indicator is considered annotation, as neither communicates subjective information. Therefore, they are qualitatively different from move suffix annotations like "!" and "?". Subjective move annotations are handled using Numeric Annotation Glyphs as described in a later section of this document.
There are no special markings used for double checks or discovered checks.
There are no special markings used for drawing moves.
SAN moves can be as short as two characters (e.g., "d4"), or as long as seven characters (e.g., "Qa6xb7#", "fxg1=Q+"). The average SAN move length seen in realistic games is probably just fractionally longer than three characters. If the SAN rules seem complicated, be assured that the earlier notation systems of LEN (Long English Notation) and EDN (English Descriptive Notation) are much more complex, and that LAN (Long Algebraic Notation, the predecessor of SAN) is unnecessarily bulky.
PGN export format always uses the above canonical SAN to represent moves in the movetext section of a PGN game. Import format is somewhat more relaxed and it makes allowances for moves that do not conform exactly to the canonical format. However, these allowances may differ among different PGN reader programs. Only data appearing in export format is in all cases guaranteed to be importable into all PGN readers.
There are a number of suggested guidelines for use with implementing PGN reader software for permitting non-canonical SAN move representation. The idea is to have a PGN reader apply various transformations to attempt to discover the move that is represented by non-canonical input. Some suggested transformations are: letter case remapping, capture indicator insertion, check indicator insertion, and checkmate indicator insertion.
Import format PGN allows for the use of traditional suffix annotations for moves. There are exactly six such annotations available: "!", "?", "!!", "!?", "?!", and "??". At most one such suffix annotation may appear per move, and if present, it is always the last part of the move symbol.
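
The canonical SAN shapes described above can be approximated with a regular expression. This is a syntactic sketch only, with hypothetical names: it checks that a string has the form of a SAN move, not that the move is legal or correctly disambiguated in any position, and it accepts some strings that could never be real moves (for example a promotion by a piece other than a pawn).

```python
import re

SAN_RE = re.compile(
    r"(?:O-O(?:-O)?"                        # castling, written with the letter O
    r"|[KQRBN]?[a-h]?[1-8]?x?[a-h][1-8]"    # piece, disambiguation, capture, destination
    r"(?:=[QRBN])?)"                        # promotion piece
    r"[+#]?"                                # check or checkmate indicator
)


def looks_like_san(move: str) -> bool:
    """Return True if `move` has the shape of a canonical SAN move."""
    return SAN_RE.fullmatch(move) is not None


for move in ["d4", "Qa6xb7#", "fxg1=Q+", "O-O", "0-0", "Pe4"]:
    print(move, looks_like_san(move))
```

The last two inputs are rejected: "0-0" uses the digit zero, and "Pe4" uses the pawn letter that export format omits.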
When exported, a move suffix annotation is translated into the corresponding Numeric Annotation Glyph as described in a later section of this document. For example, if the single move symbol "Qxa8?" appears in an import format PGN movetext, it would be replaced with the two adjacent symbols "Qxa8 $2".
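
The suffix translation can be sketched as below. The function names are hypothetical; the NAG numbers follow the NAG table in this document (1 for "!", 2 for "?", 3 for "!!", 4 for "??", 5 for "!?", 6 for "?!").

```python
from typing import Optional, Tuple

SUFFIX_TO_NAG = {"!": 1, "?": 2, "!!": 3, "??": 4, "!?": 5, "?!": 6}


def split_suffix(move: str) -> Tuple[str, Optional[int]]:
    """Split an import format move symbol into (SAN move, NAG number or None)."""
    # Two character suffixes are tried first so "??" is not read as "?".
    for suffix in ("!!", "??", "!?", "?!", "!", "?"):
        if move.endswith(suffix):
            return move[: -len(suffix)], SUFFIX_TO_NAG[suffix]
    return move, None


def export_move(move: str) -> str:
    """Rewrite a suffixed move as a SAN move followed by its NAG."""
    san, nag = split_suffix(move)
    return san if nag is None else f"{san} ${nag}"


print(export_move("Qxa8?"))  # prints: Qxa8 $2
```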
*** The specification for import/export representation of RAV elements needs further development.
There are six kinds of TimeControl fields.
The first kind is a single question mark ("?") which means that the time control mode is unknown. When used, it is usually the only descriptor present.
The second kind is a single hyphen ("-") which means that there was no time control mode in use. When used, it is usually the only descriptor present.
The third TimeControl field kind is formed as two positive integers separated by a solidus ("/") character. The first integer is the number of moves in the period and the second is the number of seconds in the period. Thus, a time control period of 40 moves in 2 1/2 hours would be represented as "40/9000".
The fourth TimeControl field kind is used for a "sudden death" control period. It should only be used for the last descriptor in a TimeControl tag value. It is sometimes the only descriptor present. The format consists of a single integer that gives the number of seconds in the period. Thus, a five minute blitz game would be represented with a TimeControl tag value of "300".
The fifth TimeControl field kind is used for an "incremental" control period. It should only be used for the last descriptor in a TimeControl tag value and is usually the only descriptor in the value. The format consists of two positive integers separated by a plus sign ("+") character. The first integer gives the minimum number of seconds allocated for the period and the second integer gives the number of extra seconds added after each move is made. So, an incremental time control of 90 minutes plus one extra minute per move would be given by "5400+60" in the TimeControl tag value.
The sixth TimeControl field kind is used for a "sandclock" or "hourglass" control period. It should only be used for the last descriptor in a TimeControl tag value and is usually the only descriptor in the value. The format consists of an asterisk ("*") immediately followed by a positive integer. The integer gives the total number of seconds in the sandclock period. The time control is implemented as if a sandclock were set at the start of the period with an equal amount of sand in each of the two chambers and the players invert the sandclock after each move with a time forfeit indicated by an empty upper chamber. Electronic implementation of a physical sandclock may be used. An example sandclock specification for a common three minute egg timer sandclock would have a tag value of "*180".
Additional TimeControl field kinds will be defined as necessary.
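
The six TimeControl field kinds can be recognized with a handful of patterns. The sketch below uses hypothetical names and dictionary keys. It parses a single field only; splitting a tag value into its descriptors is left out, and the sketch does not enforce that the integers are positive.

```python
import re


def parse_time_control_field(field: str) -> dict:
    """Classify one TimeControl field and extract its integers."""
    if field == "?":
        return {"kind": "unknown"}
    if field == "-":
        return {"kind": "none"}
    match = re.fullmatch(r"([0-9]+)/([0-9]+)", field)
    if match:  # moves per period / seconds per period
        return {"kind": "moves", "moves": int(match.group(1)), "seconds": int(match.group(2))}
    match = re.fullmatch(r"([0-9]+)\+([0-9]+)", field)
    if match:  # base seconds + seconds added after each move
        return {"kind": "incremental", "seconds": int(match.group(1)), "increment": int(match.group(2))}
    match = re.fullmatch(r"\*([0-9]+)", field)
    if match:  # sandclock seconds
        return {"kind": "sandclock", "seconds": int(match.group(1))}
    if re.fullmatch(r"[0-9]+", field):
        return {"kind": "sudden_death", "seconds": int(field)}
    raise ValueError(f"unrecognized TimeControl field: {field!r}")


# 40 moves in 2 1/2 hours: 2.5 * 3600 = 9000 seconds.
print(parse_time_control_field("40/9000"))
```

As a check on the arithmetic, 90 minutes is 90 * 60 = 5400 seconds, so a 90 minute base with a one minute increment parses from "5400+60".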
Strings that may appear as Termination tag values:
NAGs with values from 1 to 9 annotate the move just played.
NAGs with values from 10 to 135 modify the current position.
NAGs with values from 136 to 139 describe time pressure.
Other NAG values are reserved for future definition.
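
The NAG value ranges above can be sketched as a simple classifier. The function name and the category labels are hypothetical. NAG 0 is treated separately because the table in this document lists it as the null annotation.

```python
def nag_category(nag: int) -> str:
    """Return a label for the range that a NAG value falls in."""
    if nag == 0:
        return "null"
    if 1 <= nag <= 9:
        return "move"            # annotates the move just played
    if 10 <= nag <= 135:
        return "position"        # modifies the current position
    if 136 <= nag <= 139:
        return "time pressure"
    return "reserved"            # left for future definition


print(nag_category(2), nag_category(14), nag_category(200))  # prints: move position reserved
```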
Note: the number assignments listed below should be considered preliminary in nature; they are likely to be changed as a result of reviewer feedback.
| NAG | Interpretation |
|---|---|
| 0 | null annotation |
| 1 | good move (traditional "!") |
| 2 | poor move (traditional "?") |
| 3 | very good move (traditional "!!") |
| 4 | very poor move (traditional "??") |
| 5 | speculative move (traditional "!?") |
| 6 | questionable move (traditional "?!") |
| 7 | forced move (all others lose quickly) |
| 8 | singular move (no reasonable alternatives) |
| 9 | worst move |
| 10 | drawish position |
| 11 | equal chances, quiet position |
| 12 | equal chances, active position |
| 13 | unclear position |
| 14 | White has a slight advantage |
| 15 | Black has a slight advantage |
| 16 | White has a moderate advantage |
| 17 | Black has a moderate advantage |
| 18 | White has a decisive advantage |
| 19 | Black has a decisive advantage |
| 20 | White has a crushing advantage (Black should resign) |
| 21 | Black has a crushing advantage (White should resign) |
| 22 | White is in zugzwang |
| 23 | Black is in zugzwang |
| 24 | White has a slight space advantage |
| 25 | Black has a slight space advantage |
| 26 | White has a moderate space advantage |
| 27 | Black has a moderate space advantage |
| 28 | White has a decisive space advantage |
| 29 | Black has a decisive space advantage |
| 30 | White has a slight time (development) advantage |
| 31 | Black has a slight time (development) advantage |
| 32 | White has a moderate time (development) advantage |
| 33 | Black has a moderate time (development) advantage |
| 34 | White has a decisive time (development) advantage |
| 35 | Black has a decisive time (development) advantage |
| 36 | White has the initiative |
| 37 | Black has the initiative |
| 38 | White has a lasting initiative |
| 39 | Black has a lasting initiative |
| 40 | White has the attack |
| 41 | Black has the attack |
| 42 | White has insufficient compensation for material deficit |
| 43 | Black has insufficient compensation for material deficit |
| 44 | White has sufficient compensation for material deficit |
| 45 | Black has sufficient compensation for material deficit |
| 46 | White has more than adequate compensation for material deficit |
| 47 | Black has more than adequate compensation for material deficit |
| 48 | White has a slight center control advantage |
| 49 | Black has a slight center control advantage |
| 50 | White has a moderate center control advantage |
| 51 | Black has a moderate center control advantage |
| 52 | White has a decisive center control advantage |
| 53 | Black has a decisive center control advantage |
| 54 | White has a slight kingside control advantage |
| 55 | Black has a slight kingside control advantage |
| 56 | White has a moderate kingside control advantage |
| 57 | Black has a moderate kingside control advantage |
| 58 | White has a decisive kingside control advantage |
| 59 | Black has a decisive kingside control advantage |
| 60 | White has a slight queenside control advantage |
| 61 | Black has a slight queenside control advantage |
| 62 | White has a moderate queenside control advantage |
| 63 | Black has a moderate queenside control advantage |
| 64 | White has a decisive queenside control advantage |
| 65 | Black has a decisive queenside control advantage |
| 66 | White has a vulnerable first rank |
| 67 | Black has a vulnerable first rank |
| 68 | White has a well protected first rank |
| 69 | Black has a well protected first rank |
| 70 | White has a poorly protected king |
| 71 | Black has a poorly protected king |
| 72 | White has a well protected king |
| 73 | Black has a well protected king |
| 74 | White has a poorly placed king |
| 75 | Black has a poorly placed king |
| 76 | White has a well placed king |
| 77 | Black has a well placed king |
| 78 | White has a very weak pawn structure |
| 79 | Black has a very weak pawn structure |
| 80 | White has a moderately weak pawn structure |
| 81 | Black has a moderately weak pawn structure |
| 82 | White has a moderately strong pawn structure |
| 83 | Black has a moderately strong pawn structure |
| 84 | White has a very strong pawn structure |
| 85 | Black has a very strong pawn structure |
| 86 | White has poor knight placement |
| 87 | Black has poor knight placement |
| 88 | White has good knight placement |
| 89 | Black has good knight placement |
| 90 | White has poor bishop placement |
| 91 | Black has poor bishop placement |
| 92 | White has good bishop placement |
| 93 | Black has good bishop placement |
| 94 | White has poor rook placement |
| 95 | Black has poor rook placement |
| 96 | White has good rook placement |
| 97 | Black has good rook placement |
| 98 | White has poor queen placement |
| 99 | Black has poor queen placement |
| 100 | White has good queen placement |
| 101 | Black has good queen placement |
| 102 | White has poor piece coordination |
| 103 | Black has poor piece coordination |
| 104 | White has good piece coordination |
| 105 | Black has good piece coordination |
| 106 | White has played the opening very poorly |
| 107 | Black has played the opening very poorly |
| 108 | White has played the opening poorly |
| 109 | Black has played the opening poorly |
| 110 | White has played the opening well |
| 111 | Black has played the opening well |
| 112 | White has played the opening very well |
| 113 | Black has played the opening very well |
| 114 | White has played the middlegame very poorly |
| 115 | Black has played the middlegame very poorly |
| 116 | White has played the middlegame poorly |
| 117 | Black has played the middlegame poorly |
| 118 | White has played the middlegame well |
| 119 | Black has played the middlegame well |
| 120 | White has played the middlegame very well |
| 121 | Black has played the middlegame very well |
| 122 | White has played the ending very poorly |
| 123 | Black has played the ending very poorly |
| 124 | White has played the ending poorly |
| 125 | Black has played the ending poorly |
| 126 | White has played the ending well |
| 127 | Black has played the ending well |
| 128 | White has played the ending very well |
| 129 | Black has played the ending very well |
| 130 | White has slight counterplay |
| 131 | Black has slight counterplay |
| 132 | White has moderate counterplay |
| 133 | Black has moderate counterplay |
| 134 | White has decisive counterplay |
| 135 | Black has decisive counterplay |
| 136 | White has moderate time control pressure |
| 137 | Black has moderate time control pressure |
| 138 | White has severe time control pressure |
| 139 | Black has severe time control pressure |
As game files are commonly arranged by chronological order, games with missing or incomplete Date tag pair data are to be avoided. Any question mark characters in a Date tag value will be treated as zero digits for collation within a file and also for file naming.
Large quantities of PGN data arranged by chronological order should be organized into hierarchical directories. A directory containing all PGN data for a given year would have a four character name in the format "YYYY"; directories containing PGN files for a given month would have a six character name in the format "YYYYMM".
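The naming rule above can be sketched in a few lines of Python. This is only an illustration: the function name `chronological_directories` is hypothetical, and it assumes that the zero-digit substitution described for file naming also applies to directory names.

```python
def chronological_directories(date_tag):
    """Return the (year, month) directory names for a PGN Date tag value.

    Assumption: question marks are replaced by zero digits, following the
    rule given for collation and file naming.
    """
    year, month, _day = date_tag.replace("?", "0").split(".")
    return year, year + month


print(chronological_directories("1992.11.04"))
```

A game dated "1992.11.04" would therefore be filed under the names "1992" and "199211".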
| PGN | master directory of the PGN subtree (pub/chess/Game-Databases/PGN) |
| PGN/Events | directory of PGN files, each for a specific event |
| PGN/Events/News | news and status of the event collection |
| PGN/Events/ReadMe | brief description of the local directory contents |
| PGN/MGR | directory of the Master Games Repository subtree |
| PGN/MGR/News | news and status of the entire PGN/MGR subtree |
| PGN/MGR/ReadMe | brief description of the local directory contents |
| PGN/MGR/YYYY | directory of games or subtrees for the year YYYY |
| PGN/MGR/YYYY/ReadMe | description of local directory for year YYYY |
| PGN/MGR/YYYY/News | news and status for year YYYY data |
| PGN/News | news and status of the entire PGN subtree |
| PGN/Players | directory of PGN files, each for a specific player |
| PGN/Players/News | news and status of the player collection |
| PGN/Players/ReadMe | brief description of the local directory contents |
| PGN/ReadMe | brief description of the local directory contents |
| PGN/Standard | the PGN standard (this document) |
| PGN/Tools | software utilities that access PGN data |
The first (most important, primary key) is the Date tag. Earlier dated games appear prior to games played at a later date. This field is sorted by ascending numeric value first with the year, then the month, and finally the day of the month. Query characters used for unknown date digit values will be treated as zero digit characters for ordering comparison.
The second key is the Event tag. This is sorted in ascending ASCII order.
The third key is the Site tag. This is sorted in ascending ASCII order.
The fourth key is the Round tag. This is sorted in ascending numeric order based on the value of the integer used to denote the playing round. A query or hyphen used for the round is ordered before any integer value. A query character is ordered before a hyphen character.
The fifth key is the White tag. This is sorted in ascending ASCII order.
The sixth key is the Black tag. This is sorted in ascending ASCII order.
The seventh key is the Result tag. This is sorted in ascending ASCII order.
The eighth key is the movetext itself. This is sorted in ascending ASCII order with the entire text including spaces and newline characters.
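The eight keys above map naturally onto a tuple used as a sort key. The following Python sketch is illustrative only: the `game` dictionary layout and the function names are assumptions, and the Round value is assumed to be a plain integer, a query, or a hyphen as described above. Python compares strings by code point, which matches ASCII order for ASCII text.

```python
def date_key(date):
    # "1992.11.04" -> (1992, 11, 4); unknown "?" digits collate as zero.
    year, month, day = date.replace("?", "0").split(".")
    return (int(year), int(month), int(day))


def round_key(round_value):
    # A query sorts before a hyphen, and both sort before any integer.
    if round_value == "?":
        return (0, 0)
    if round_value == "-":
        return (1, 0)
    return (2, int(round_value))


def collation_key(game):
    # `game` is a hypothetical dict holding tag values and the movetext.
    return (
        date_key(game["Date"]),
        game["Event"],
        game["Site"],
        round_key(game["Round"]),
        game["White"],
        game["Black"],
        game["Result"],
        game["movetext"],
    )
```

A list of such dictionaries can then be ordered with `sorted(games, key=collation_key)`.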
In addition to the PGN standard, there are two more chess standards of interest to the chess software community. These are the FEN standard (Forsyth-Edwards Notation) for position notation and the EPD standard (Extended Position Description) for comprehensive position description for automated interprogram processing. These are described in a later section of this document.
Some PGN software is freeware and can be gotten from ftp sites and other sources. Other PGN software is payware and appears as part of commercial chessplaying programs and chess database managers. Those who are interested in the propagation of the PGN standard are encouraged to support manufacturers of chess software that use the standard. If a particular vendor does not offer PGN compatibility, it is likely that a few letters to them along with a copy of this specification may help them decide to include PGN support in their next release.
The staff at the University of Oklahoma at Norman (USA) have graciously provided an ftp site chess.uoknor.edu for the storage of chess related data and programs. Because file names change over time, those accessing the site are encouraged to first retrieve the file "pub/chess/ls-lR.gz" for a current listing. A scan of this listing will also help locate versions of PGN programs for machine types and operating systems other than those listed below. Further information about this archive can be gotten from its administrator, Chris Petroff (chris@uoknor.edu).
For European users, the kind staff at the University of Hamburg (Germany) have provided the ftp site ftp.math.uni-hamburg.de; this carries a daily mirror of the pub/chess directory at the chess.uoknor.edu site.
There is a report that mail2pgn has been superseded by the newer program "MV2PGN" described below.
A vendor for North America is:
International Chess Enterprises
P.O. Box 19457
Seattle, WA 98109
USA
(800) 262-4277
A vendor for Europe is:
Gambit-Soft
Feckenhauser Strasse 27
D-78628 Rottweil
GERMANY
49-741-21573
A vendor for North America is:
International Chess Enterprises
P.O. Box 19457
Seattle, WA 98109
USA
(800) 262-4277
The BOOKUP 8.1.1 Addenda notes dated 1993.12.17 provide comprehensive information on how to use EPD in conjunction with "analyst" programs such as Zarkov and HIARCS. Specifically, the search and evaluation abilities of an analyst program are combined with the information organization abilities of the BOOKUP database program to provide position scoring. This is done by first having BOOKUP export a database in EPD format, then having an analyst program annotate each EPD record with a numeric score, and then having BOOKUP import the changed EPD file. BOOKUP can then apply minimaxing to the imported database; this results in scores from terminal positions being propagated back to earlier positions and even back to moves from the starting array.
For some reason, BOOKUP calls this process "backsolving", but it's really just standard minimaxing. In any case, it's a good example of how different programs from different authors performing different types of tasks can be integrated by use of a common, non-proprietary standard. This allows for a new set of powerful features that are beyond the capabilities of any one of the individual component programs.
BOOKUP allows for some customizing of EPD actions. One such customization is to require the positional evaluations to follow the EPD standard; this means that the score is always given from the viewpoint of the active player. This is explained more fully in the section on the "ce" (centipawn evaluation) opcode in the EPD description in a later section of this document. To ensure that BOOKUP handles the centipawn evaluations in the "right" way, the EPD setting "Positive for White" must be set to "N". This makes BOOKUP work correctly with Zarkov and with all other programs that use the "right" centipawn evaluation convention. There is an apparent problem with HIARCS that requires this option to be set to "Y"; but this really means that, if true, HIARCS needs to be adjusted to use the "right" centipawn evaluation convention.
A vendor in North America is:
BOOKUP
2763 Kensington Place West
Columbus, OH 43202
USA
(800) 949-5445
(614) 263-7219
HIARCS
c/o BOOKUP
2763 Kensington Place West
Columbus, OH 43202
USA
(800) 949-5445
(614) 263-7219
The ChessBase related utilities (cb2pgn/pgn2cb) are found at chess.uoknor.edu in the pub/chess/Game-Databases/ChessBase directory.
The NIC related utilities (nic2pgn/pgn2nic) are found at chess.uoknor.edu in the pub/chess/Game-Databases/NIC directory.
For further information about the Hansen utilities, the contact person is the author, Carsten Hansen (ch0506@hdc.hha.dk).
Slappy may also be useful to those who have a full feature program but who also need to run time consuming chess database tasks on a spare computer.
Suggestions and comments should be directed to its author, Steven J. Edwards (sje@world.std.com). More details will appear here as they become available.
CHESSX Software
12 Bluebell Close
Glenmore Park
AUSTRALIA 2745.
The ideas behind CHESSOP can be seen in CHESSOPN (alias CHESSOPG), a free
version on the ICS server which has a reduced openings database (25,000
positions) and no PGN or transposition support but is otherwise the same as
CHESSOP. (These are the files "chessopg.zip" in the directory pub/chess/DOS at
the chess.uoknor.edu ftp site.)
| Code | Country or location |
|---|---|
| AFG | Afghanistan |
| AIR | Aboard aircraft |
| ALB | Albania |
| ALG | Algeria |
| AND | Andorra |
| ANG | Angola |
| ANT | Antigua |
| ARG | Argentina |
| ARM | Armenia |
| ATA | Antarctica |
| AUS | Australia |
| AZB | Azerbaijan |
| BAN | Bangladesh |
| BAR | Bahrain |
| BHM | Bahamas |
| BEL | Belgium |
| BER | Bermuda |
| BIH | Bosnia and Herzegovina |
| BLA | Belarus |
| BLG | Bulgaria |
| BLZ | Belize |
| BOL | Bolivia |
| BRB | Barbados |
| BRS | Brazil |
| BRU | Brunei |
| BSW | Botswana |
| CAN | Canada |
| CHI | Chile |
| COL | Colombia |
| CRA | Costa Rica |
| CRO | Croatia |
| CSR | Czechoslovakia |
| CUB | Cuba |
| CYP | Cyprus |
| DEN | Denmark |
| DOM | Dominican Republic |
| ECU | Ecuador |
| EGY | Egypt |
| ENG | England |
| ESP | Spain |
| EST | Estonia |
| FAI | Faroe Islands |
| FIJ | Fiji |
| FIN | Finland |
| FRA | France |
| GAM | Gambia |
| GCI | Guernsey-Jersey |
| GEO | Georgia |
| GER | Germany |
| GHA | Ghana |
| GRC | Greece |
| GUA | Guatemala |
| GUY | Guyana |
| HAI | Haiti |
| HKG | Hong Kong |
| HON | Honduras |
| HUN | Hungary |
| IND | India |
| IRL | Ireland |
| IRN | Iran |
| IRQ | Iraq |
| ISD | Iceland |
| ISR | Israel |
| ITA | Italy |
| IVO | Ivory Coast |
| JAM | Jamaica |
| JAP | Japan |
| JRD | Jordan |
| JUG | Yugoslavia |
| KAZ | Kazakhstan |
| KEN | Kenya |
| KIR | Kyrgyzstan |
| KUW | Kuwait |
| LAT | Latvia |
| LEB | Lebanon |
| LIB | Libya |
| LIC | Liechtenstein |
| LTU | Lithuania |
| LUX | Luxembourg |
| MAL | Malaysia |
| MAU | Mauritania |
| MEX | Mexico |
| MLI | Mali |
| MLT | Malta |
| MNC | Monaco |
| MOL | Moldova |
| MON | Mongolia |
| MOZ | Mozambique |
| MRC | Morocco |
| MRT | Mauritius |
| MYN | Myanmar |
| NCG | Nicaragua |
| NET | The Internet |
| NIG | Nigeria |
| NLA | Netherlands Antilles |
| NLD | Netherlands |
| NOR | Norway |
| NZD | New Zealand |
| OST | Austria |
| PAK | Pakistan |
| PAL | Palestine |
| PAN | Panama |
| PAR | Paraguay |
| PER | Peru |
| PHI | Philippines |
| PNG | Papua New Guinea |
| POL | Poland |
| POR | Portugal |
| PRC | People's Republic of China |
| PRO | Puerto Rico |
| QTR | Qatar |
| RIN | Indonesia |
| ROM | Romania |
| RUS | Russia |
| SAF | South Africa |
| SAL | El Salvador |
| SCO | Scotland |
| SEA | At Sea |
| SEN | Senegal |
| SEY | Seychelles |
| SIP | Singapore |
| SLV | Slovenia |
| SMA | San Marino |
| SPC | Aboard spacecraft |
| SRI | Sri Lanka |
| SUD | Sudan |
| SUR | Surinam |
| SVE | Sweden |
| SWZ | Switzerland |
| SYR | Syria |
| TAI | Thailand |
| TMT | Turkmenistan |
| TRK | Turkey |
| TTO | Trinidad and Tobago |
| TUN | Tunisia |
| UAE | United Arab Emirates |
| UGA | Uganda |
| UKR | Ukraine |
| UNK | Unknown |
| URU | Uruguay |
| USA | United States of America |
| UZB | Uzbekistan |
| VEN | Venezuela |
| VGB | British Virgin Islands |
| VIE | Vietnam |
| VUS | U.S. Virgin Islands |
| WLS | Wales |
| YEM | Yemen |
| YUG | Yugoslavia |
| ZAM | Zambia |
| ZIM | Zimbabwe |
| ZRE | Zaire |
A single FEN record uses one text line of variable length composed of six data fields. The first four fields of the FEN specification are the same as the first four fields of the EPD specification.
A text file composed exclusively of FEN data records should have a file name with the suffix ".fen".
Many interesting chess problem sets represented using FEN can be found at the chess.uoknor.edu ftp site in the directory pub/chess/SAN_testsuites.
A FEN description has six fields. Each field is composed only of non-blank printing ASCII characters. Adjacent fields are separated by a single ASCII space character.
An en passant target square is given if and only if the last move was a pawn advance of two squares. Therefore, an en passant target square field may have a square name even if there is no pawn of the opposing side that may immediately execute the en passant capture.
rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1
And after the move 1. e4:
rnbqkbnr/pppppppp/8/8/4P3/8/PPPP1PPP/RNBQKBNR b KQkq e3 0 1
And then after 1. ... c5:
rnbqkbnr/pp1ppppp/8/2p5/4P3/8/PPPP1PPP/RNBQKBNR w KQkq c6 0 2
And then after 2. Nf3:
rnbqkbnr/pp1ppppp/8/2p5/4P3/5N2/PPPP1PPP/RNBQKB1R b KQkq - 1 2
For two kings on their home squares and a white pawn on e2 (White to move) with thirty eight full moves played with five halfmoves since the last pawn move or capture:
4k3/8/8/8/8/8/4P3/4K3 w - - 5 39
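As an illustration of the six field layout, the following Python sketch splits a FEN record on single spaces and expands the piece placement field into eight rows of eight squares. The function names, the dictionary keys, and the use of "." for an empty square are choices made for this example only, and the sketch does not check the legality of the position.

```python
def expand_rank(rank):
    """Expand one rank, e.g. "4P3" -> "....P...", using "." for empty squares."""
    return "".join("." * int(ch) if ch.isdigit() else ch for ch in rank)


def parse_fen(record):
    """Split a FEN record into its six fields and expand the board."""
    fields = record.split(" ")
    if len(fields) != 6:
        raise ValueError("a FEN record has exactly six fields")
    placement, active, castling, en_passant, halfmove, fullmove = fields
    board = [expand_rank(rank) for rank in placement.split("/")]
    if len(board) != 8 or any(len(rank) != 8 for rank in board):
        raise ValueError("piece placement must describe 8 ranks of 8 squares")
    return {
        "board": board,  # index 0 is the eighth rank, index 7 the first rank
        "active": active,
        "castling": castling,
        "en_passant": en_passant,
        "halfmove_clock": int(halfmove),
        "fullmove_number": int(fullmove),
    }


position = parse_fen("rnbqkbnr/pppppppp/8/8/4P3/8/PPPP1PPP/RNBQKBNR b KQkq e3 0 1")
print(position["board"][4], position["en_passant"])
```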
A single EPD record uses one text line of variable length composed of four data fields followed by zero or more operations. The four fields of the EPD specification are the same as the first four fields of the FEN specification.
A text file composed exclusively of EPD data records should have a file name with the suffix ".epd".
Many interesting chess problem sets represented using EPD can be found at the chess.uoknor.edu ftp site in the directory pub/chess/SAN_testsuites.
(Note: due to the likelihood of future expansion of EPD, implementors are encouraged to have their programs handle EPD text lines of up to 1024 characters long.)
Each EPD data field is composed only of non-blank printing ASCII characters. Adjacent data fields are separated by a single ASCII space character.
An en passant target square is given if and only if the last move was a pawn advance of two squares. Therefore, an en passant target square field may have a square name even if there is no pawn of the opposing side that may immediately execute the en passant capture.
Multiple operations are separated by a single space character. If there is at least one operation present in an EPD line, it is separated from the last (fourth) data field by a single space character.
An operand is either a set of contiguous non-white space printing characters or a string. A string is a set of contiguous printing characters delimited by a quote character at each end. A string value must have less than 256 bytes of data.
If at least one operand is present in an operation, there is a single space between the opcode and the first operand. If more than one operand is present in an operation, there is a single blank character between every two adjacent operands. If there are no operands, a semicolon character is appended to the opcode to mark the end of the operation. If any operands appear, the last operand has an appended semicolon that marks the end of the operation.
Any given opcode appears at most once per EPD record. Multiple operations in a single EPD record should appear in ASCII order of their opcode names (mnemonics). However, a program reading EPD records may allow for operations not in ASCII order by opcode mnemonics; the semantics are the same in either case.
Some opcodes that allow for more than one operand may have special ordering requirements for the operands. For example, the "pv" (predicted variation) opcode requires its operands (moves) to appear in the order in which they would be played. All other opcodes that allow for more than one operand should have operands appearing in ASCII order. An example of the latter set is the "bm" (best move[s]) opcode; its operands are moves that are all immediately playable from the current position.
Some opcodes require one or more operands that are chess moves. These moves should be represented using SAN. If a different representation is used, there is no guarantee that the EPD will be read correctly during subsequent processing.
Some opcodes require one or more operands that are integers. Some opcodes may require that an integer operand must be within a given range; the details are described in the opcode list given below. A negative integer is formed with a hyphen (minus sign) preceding the integer digit sequence. An optional plus sign may be used for indicating a non-negative value, but such use is not required and is indeed discouraged.
Some opcodes require one or more operands that are floating point numbers. Some opcodes may require that a floating point operand must be within a given range; the details are described in the opcode list given below. A floating point operand is constructed from an optional sign character ("+" or "-"), a digit sequence (with at least one digit), a radix point (always "."), and a final digit sequence (with at least one digit).
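The integer and floating point operand formats described above can be written as regular expressions. This is a sketch under the stated grammar only: the helper names are hypothetical, and range checks for particular opcodes are left out.

```python
import re

# Optional sign followed by at least one digit.
INTEGER_OPERAND = re.compile(r"[+-]?[0-9]+")
# Optional sign, digits, a "." radix point, and a final digit sequence.
FLOAT_OPERAND = re.compile(r"[+-]?[0-9]+\.[0-9]+")


def is_integer_operand(text):
    return INTEGER_OPERAND.fullmatch(text) is not None


def is_float_operand(text):
    return FLOAT_OPERAND.fullmatch(text) is not None
```

Under this reading, "-150" and "+7" are integer operands, "1.5" is a floating point operand, and ".5" or "5." are neither, because a digit sequence is required on both sides of the radix point.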
Opcode mnemonics used only by a single program or an experimental suite of programs should start with an upper case letter. This is so they may be easily distinguished should they inadvertently be encountered by other programs. When such a "private" opcode is demonstrated to be widely useful, it should be brought into the official list (appearing below) in a lower case form.
If a given program does not recognize a particular opcode, that operation is simply ignored; it is not signaled as an error.
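The layout rules above (four data fields, then operations that each end with a semicolon, with quoted string operands) can be read with a small scanner. The Python sketch below is one possible approach rather than a reference implementation: the function name and the returned structure are assumptions, the sample record is made up for illustration, and error handling for malformed lines is minimal.

```python
def parse_epd(line):
    """Return (fields, operations) for one EPD text line.

    `operations` maps each opcode to its list of operands. Unrecognized
    opcodes are kept as-is; a caller may simply ignore them.
    """
    parts = line.strip().split(" ", 4)
    if len(parts) < 4:
        raise ValueError("an EPD record has four data fields")
    fields, rest = parts[:4], (parts[4] if len(parts) == 5 else "")
    operations, tokens, current, in_string = {}, [], "", False
    for ch in rest:
        if in_string:
            if ch == '"':  # closing quote ends the string operand
                tokens.append(current)
                current, in_string = "", False
            else:
                current += ch
        elif ch == '"':
            in_string = True
        elif ch in " ;":
            if current:
                tokens.append(current)
                current = ""
            if ch == ";" and tokens:  # semicolon ends the operation
                operations[tokens[0]] = tokens[1:]
                tokens = []
        else:
            current += ch
    return fields, operations


fields, ops = parse_epd(
    'rnbqkbnr/pp1ppppp/8/2p5/4P3/8/PPPP1PPP/RNBQKBNR w KQkq c6 '
    'id "demo; 1"; pm Nf3; pv Nf3 d6 d4;'
)
print(ops)
```

A semicolon inside a quoted string is treated as part of the string, so only a semicolon outside quotes ends an operation.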
This ten member comment family of opcodes is intended for use as descriptive commentary for a complete game or game fragment. The usual processing of these opcodes is as follows:
Values are restricted to integers that are equal to or greater than -32767 and are less than or equal to 32766.
A value greater than 32000 indicates the availability of a forced mate to the active player. The number of plies until mate is given by subtracting the evaluation from the value 32767. Thus, a winning mate in N fullmoves is a mate in ((2 * N) - 1) halfmoves (or ply) and has a corresponding centipawn evaluation of (32767 - ((2 * N) - 1)). For example, a mate on the move (mate in one) has a centipawn evaluation of 32766 while a mate in five has a centipawn evaluation of 32758.
A value less than -32000 indicates the availability of a forced mate to the passive player. The number of plies until mate is given by subtracting the evaluation from the value -32767 and then negating the result. Thus, a losing mate in N fullmoves is a mate in (2 * N) halfmoves (or ply) and has a corresponding centipawn evaluation of (-32767 + (2 * N)). For example, a mate after the move (losing mate in one) has a centipawn evaluation of -32765 while a losing mate in five has a centipawn evaluation of -32757.
A value of -32767 indicates an illegal position. A stalemate position has a centipawn evaluation of zero as does a position drawn due to insufficient mating material. Any other position known to be a certain forced draw also has a centipawn evaluation of zero.
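The mate score arithmetic above can be checked with a short Python sketch. The function names are hypothetical; the constants and formulas are the ones given in the text.

```python
MATE_BASE = 32767       # evaluations are measured against this value
MATE_THRESHOLD = 32000  # magnitudes above this indicate a forced mate


def ce_for_mate(fullmoves, winning=True):
    """Centipawn evaluation for a forced mate in `fullmoves` fullmoves."""
    if winning:
        # The active player mates in (2 * N) - 1 ply.
        return MATE_BASE - ((2 * fullmoves) - 1)
    # The active player is mated in 2 * N ply.
    return -MATE_BASE + (2 * fullmoves)


def plies_to_mate(ce):
    """Number of ply until mate, or None if `ce` is not a mate score."""
    if ce > MATE_THRESHOLD:
        return MATE_BASE - ce
    if -MATE_BASE < ce < -MATE_THRESHOLD:
        return -(-MATE_BASE - ce)
    return None  # ordinary evaluation, or -32767 for an illegal position


print(ce_for_mate(5), ce_for_mate(5, winning=False))
```

For the examples in the text, a winning mate in five gives 32758 (nine ply) and a losing mate in five gives -32757 (ten ply).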
This opcode is intended for use with problem sets composed of positions requiring direct mate answers as solutions.
The usage is similar to that of the "ECO" tag pair of the PGN standard.
This opcode is used to explicitly represent the fullmove number in EPD that is present by default in FEN as the sixth field. Fullmove number information is usually omitted from EPD because it does not affect move generation (commonly needed for EPD-using tasks) but it does affect game notation (commonly needed for FEN-using tasks). Because of the desire for space optimization for large EPD files, fullmove numbers, which are present in EPD's parent FEN, were dropped from EPD. The halfmove clock information was similarly dropped.
This opcode is used to explicitly represent the halfmove clock in EPD that is present by default in FEN as the fifth field. Halfmove clock information is usually omitted from EPD because it does not affect move generation (commonly needed for EPD-using tasks) but it does affect game termination issues (commonly needed for FEN-using tasks). Because of the desire for space optimization for large EPD files, halfmove clock values, which are present in EPD's parent FEN, were dropped from EPD. The fullmove number information was similarly dropped.
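Since the four EPD data fields match the first four FEN fields, a FEN record can be rebuilt from an EPD record once the halfmove clock and fullmove number are known. The Python sketch below assumes the operations have already been read into a dictionary of opcode to operand list, uses the opcode names "hmvc" and "fmvn", and falls back to a halfmove clock of 0 and a fullmove number of 1 when an opcode is absent; those fallback values are an assumption of this example.

```python
def epd_to_fen(fields, operations):
    """Build a FEN record from four EPD data fields and parsed operations.

    `operations` is a hypothetical mapping such as {"hmvc": ["5"]}.
    Missing values fall back to 0 and 1 (an assumption of this sketch).
    """
    halfmove_clock = int(operations.get("hmvc", ["0"])[0])
    fullmove_number = int(operations.get("fmvn", ["1"])[0])
    return " ".join(list(fields) + [str(halfmove_clock), str(fullmove_number)])


print(epd_to_fen(
    ["4k3/8/8/8/8/8/4P3/4K3", "w", "-", "-"],
    {"hmvc": ["5"], "fmvn": ["39"]},
))
```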
This opcode is intended for use with test suites used for measuring chessplaying program strength. An example "id" operand for the seven hundred fifty seventh position of the one thousand one problems in Reinfeld's _1001 Winning Chess Sacrifices and Combinations_ would be "WCSAC.0757" while the fifteenth position in the twenty four problem Bratko-Kopec test suite would have an "id" operand of "BK.15".
The usage is similar to that of the "NIC" tag pair of the PGN standard.
If a non-empty "pv" (predicted variation) line of play is also present in the same EPD record, the first move of the predicted variation is the same as the predicted move. The "pm" opcode is intended for use as a general "display hint" mechanism.
If a "pm" (predicted move) operation is also present in the same EPD record, the predicted move is the same as the first move of the predicted variation.
The "sm" opcode is intended for use to communicate the most recent played move in an active game. It is used to communicate moves between programs in automatic play via a network. This includes correspondence play using e-mail and also programs acting as network front ends to human players.
This ten member variation name family of opcodes is intended for use as traditional variation names for a complete game or game fragment. The usual processing of these opcodes is as follows:
For the above authors only, a list of alternative piece letter codes is provided:
| Language | Pawn | Knight | Bishop | Rook | Queen | King |
|---|---|---|---|---|---|---|
| Czech | P | J | S | V | D | K |
| Danish | B | S | L | T | D | K |
| Dutch | O | P | L | T | D | K |
| English | P | N | B | R | Q | K |
| Estonian | P | R | O | V | L | K |
| Finnish | P | R | L | T | D | K |
| French | P | C | F | T | D | R |
| German | B | S | L | T | D | K |
| Hungarian | G | H | F | B | V | K |
| Icelandic | P | R | B | H | D | K |
| Italian | P | C | A | T | D | R |
| Norwegian | B | S | L | T | D | K |
| Polish | P | S | G | W | H | K |
| Portuguese | P | C | B | T | D | R |
| Romanian | P | C | N | T | D | R |
| Spanish | P | C | A | T | D | R |
| Swedish | B | S | L | T | D | K |
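As an illustration of how a presentation layer might apply these letters, the Python sketch below rewrites the piece letters of an English SAN move for display in another language. The function name and the small excerpt of the letter list are choices made for this example; pawn letters are skipped because SAN does not write them, and the sketch is meant for display only.

```python
# Piece letters in the order pawn, knight, bishop, rook, queen, king
# (an excerpt of the list above).
PIECE_LETTERS = {
    "English": "PNBRQK",
    "French": "PCFTDR",
    "German": "BSLTDK",
    "Spanish": "PCATDR",
}


def localize_san(san, language):
    """Replace English piece letters in a SAN move with localized ones."""
    # Skip the pawn letter (index 0): SAN moves never include it.
    table = str.maketrans(PIECE_LETTERS["English"][1:], PIECE_LETTERS[language][1:])
    # translate() maps every character in a single pass, so a letter that
    # is both a source and a target (French R and K) is not mapped twice.
    return san.translate(table)


print(localize_san("Nbd7", "German"), localize_san("Rxe1+", "French"))
```

Lower case file letters and the castling symbols are left untouched because only the upper case letters N, B, R, Q and K are mapped.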
The binary coded version of PGN is PGC (PGN Game Coding). PGC is a binary representation standard of PGN data designed for the dual goals of storage efficiency and program I/O. A file containing PGC data should have a name with a suffix of ".pgc".
Unlike PGN text files that may have locale dependent representations for newlines, PGC files have data that does not vary due to local processing environment. This means that PGC files may be transferred among systems using general binary file methods.
PGC files should be used only when the use of PGN is impractical due to time and space resource constraints. As the general level of processing capabilities increases, the need for PGC over PGN will decrease. Therefore, implementors are encouraged not to use PGC as the default representation because it is much more difficult (than PGN) to understand without proper software.
PGC data is composed of a sequence of PGC records. Each record is composed of a sequence of one or more bytes. The first byte is the PGC record marker and it specifies the interpretation of the remaining portion of the record. This remaining portion is composed of zero or more PGC record items. Item types include move sequences, move sets, and character strings.
A one byte integer item is called "int-1". A two byte integer item is called "int-2". A four byte integer item is called "int-4".
Characters are stored as bytes using the ISO 8859/1 Latin-1 (ECMA-94) code set. There is no provision for other character sets or representations.
Examples: From the initial position, there are twenty moves. Move ordinal 0 corresponds to the SAN move string "Na3"; move ordinal 1 corresponds to "Nc3", move ordinal 4 corresponds to "a