When a protein molecule binds to another biological polymer (protein or nucleic acid) to form a complex, the interface residues that account for most of the binding free energy are called binding hot spots. The KFC2 Server provides a user-friendly, web-based tool for predicting protein binding hot spots based on machine learning approaches. For each residue within the binding interface, the KFC2 Server characterizes its local structural environment and compares it to known environments of experimentally determined hot spots. A prediction is then made as to whether or not the residue is a hot spot. After the computational analysis is complete, the results may be visualized using an interactive job viewer. In addition to standard molecular viewing functionality, the job viewer allows the user to quickly highlight predicted hot spots and surrounding structural features. Two different machine learning methods are implemented on the KFC2 Server.
The original KFC method comprises two decision tree-based models: K-FADE (based on shape specificity features) and K-CON (based on biochemical contacts such as intermolecular hydrogen bonds and atomic contacts). Using a data set of experimental alanine-scanning mutations, each model is trained to recognize local structural environments that are indicative of hot spots. For this method, hot spots are defined as mutations associated with a change in binding energy (∆∆G) greater than 2 kcal/mol.
With KFC2, two separate SVM models are implemented. These are referred to as KFC2a and KFC2b. KFC2a offers higher sensitivity and accuracy, but lower specificity, than KFC2b. The user may examine both model scores interactively with the KFC Viewer as described below. The results may also be retrieved in tabular form. Depending on how the predictions are to be used, one model may be preferred over the other. If the user wants the highest degree of confidence that the predicted hot spots are truly hot spots, then the KFC2b model should be used. On the other hand, if it is important not to overlook any possible hot spots, then KFC2a should be used.
You will notice an additional button near the "submit" button which allows you to choose the old version. Presently, nucleic acid chains are not handled with the new KFC2 method. While work in this area is under way, submissions of interfaces involving nucleic acid chains will automatically be run using the original KFC method.
The citations given below provide a complete discussion of the development and performance of the KFC and KFC2 methods:
Please cite the appropriate article in any work that uses the KFC/KFC2 Server:
Users can register prior to submitting jobs to any of the tools hosted by the Mitchell Lab or submit jobs anonymously. Personal information is only used to contact users when their analysis is complete; it will not be shared. To register, enter a unique user name and email address on the registration page, then click the submit button. An error message will display if the selected user name is in use by another user.
Once registered, users may log in to the server. Although login is not required to submit jobs, it allows users to view their personal jobs in the job viewer. Both the user name and password are case sensitive. By default, a login will expire after two weeks; however, a user may also log off manually.
A protein binding interface is the region between two or more polymer chains where the atoms from the different chains interact strongly enough to form a stable complex. A valid PDB file submitted to the KFC2 Server must contain at least two separate polymer chains in order to contain an interface. In the case of homodimers, the chemical composition of the chains will be identical, but the PDB file should still contain two unique chain label identifiers. In some cases, the PDB file downloaded from the Protein Data Bank may require the application of symmetry operators in order to generate the biologically significant interface. Although the Protein Data Bank does offer a download containing the biologically significant interface, that file should be checked for proper chain labels, and edited if necessary, before it is run through KFC2. The current version of these generated files does not use unique chain labels to distinguish the interface chains; instead, it separates the chain groups using the MODEL and ENDMDL keywords.
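The checks described above can be run locally before uploading a file. The following is a minimal sketch, not part of the KFC2 Server: the function names `summarize_pdb` and `interface_warnings` are hypothetical, and the code assumes standard fixed-column PDB records, with the chain label in column 22 of each ATOM line.

```python
def summarize_pdb(pdb_text):
    """Return (chain labels in order of first appearance, number of MODEL records)."""
    chain_labels = []
    model_count = 0
    for line in pdb_text.splitlines():
        record = line[:6].strip()
        if record == "MODEL":
            model_count += 1
        elif record == "ATOM" and len(line) > 21:
            label = line[21]  # column 22 (1-based) holds the chain label
            if label not in chain_labels:
                chain_labels.append(label)
    return chain_labels, model_count


def interface_warnings(pdb_text):
    """List reasons the file may not define a usable interface (hypothetical helper)."""
    chain_labels, model_count = summarize_pdb(pdb_text)
    warnings = []
    if len(chain_labels) < 2:
        warnings.append("fewer than two unique chain labels")
    if model_count > 1:
        warnings.append("chains are split across MODEL/ENDMDL blocks")
    return warnings
```

An empty list from `interface_warnings` only means that neither of these two problems was detected; it does not confirm that the chains actually form a biologically significant interface.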
In the job submission form, two chain sets may be specified. Chain labels are case sensitive. Valid chain labels may be letters, uppercase or lowercase, or digits 0 through 9. It is recommended that you specify the chain labels; if no chain labels are specified in the form, the KFC Server will attempt to assign the interface chain sets automatically.
The automatic chain selection works as follows. If a TER record is found in the PDB file, all ATOM records before the TER are used for Chain Set 1, and all ATOM records after the TER are used for Chain Set 2. If a chain in the PDB file has no chain label (a space character in column 22), a unique chain label will be assigned to that chain, with the chain boundaries defined by the location of the TER records. Also, if the same chain label is used before and after the first TER record, the second chain will be given the lowercase label of the first chain. If a chain-separating TER record is not found in the PDB file, the automatic chain selection is based on the chain labels: all ATOM records with the first chain label are used for Chain Set 1, and all ATOM records after this chain are used for Chain Set 2. In this mode, if additional chains after the first chain carry the same chain label as the first chain, these chains will be assigned different unique chain labels. Beware that TER records are also used to mark breaks within a single chain in regions where the structure is undetermined. The presence of such chain breaks may lead to confusing results.
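To preview how a file is likely to be split, the selection rules above can be approximated as follows. This is an illustrative sketch of the described behavior, not the server's actual code: `chain_label` and `split_chain_sets` are hypothetical names, only the first chain-separating TER record is considered, and the relabeling rules (blank labels, lowercase duplicates) are omitted.

```python
def chain_label(line):
    """Chain label from column 22 (1-based) of an ATOM record; blank if the line is short."""
    return line[21] if len(line) > 21 else " "


def split_chain_sets(pdb_lines):
    """Split ATOM records into (Chain Set 1, Chain Set 2)."""
    before, after = [], []
    passed_ter = False
    for line in pdb_lines:
        record = line[:6].strip()
        if record == "TER" and before:
            passed_ter = True  # everything after the first TER goes to set 2
        elif record == "ATOM":
            (after if passed_ter else before).append(line)
    if after:
        return before, after
    # No chain-separating TER: fall back to the chain labels.
    if not before:
        return [], []
    first = chain_label(before[0])
    cut = next(
        (i for i, line in enumerate(before) if chain_label(line) != first),
        len(before),
    )
    return before[:cut], before[cut:]
```

If the second set comes back empty, or the first set ends in the middle of a chain, the file probably contains a TER record that marks a chain break rather than a chain boundary, and the chain labels should be specified in the form instead.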
Note that the Jmol viewer, when auto-selecting the color of a chain based on its chain label, uses the same color for uppercase and lowercase labels. Also, chain labels that are digits 0-9 use the same colors as chains Q-Z. You may have to rename the chains in order to obtain unique colors for the chains in the displayed molecule. By using the Viewer to toggle a given chain on or off, it is relatively simple to locate specific chains by label. Also, hovering over a chain pops up an atom label that includes the chain label. A complete summary of the chain sets used in a job may be viewed as described below in the Examining Results section.
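A quick way to see whether a set of chain labels will produce ambiguous colors is sketched below. The function name `color_collisions` is hypothetical. Because the exact digit-to-letter color pairing is not stated here, the sketch only reports that a digit label and a Q-Z label are both present, not which pair would collide.

```python
from collections import defaultdict

Q_TO_Z = set("QRSTUVWXYZ")


def color_collisions(labels):
    """Report chain labels that may share a Jmol auto-assigned color."""
    by_letter = defaultdict(set)
    for label in labels:
        by_letter[label.upper()].add(label)
    # Uppercase and lowercase forms of the same letter share one color.
    case_clashes = [sorted(group) for group in by_letter.values() if len(group) > 1]
    digits = sorted(l for l in set(labels) if l.isdigit())
    late_letters = sorted(l for l in set(labels) if l.upper() in Q_TO_Z)
    return {
        "case_clashes": case_clashes,
        "digits": digits,
        "q_to_z_letters": late_letters,
        # Digits reuse the Q-Z colors, so a clash is possible when both occur.
        "digit_clash_possible": bool(digits and late_letters),
    }
```

For example, the labels H, h, X, and 1 give one case clash (H and h) and a possible digit clash (1 with X), so renaming h and 1 before submission would keep the displayed colors distinct.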
Note that the KFC Server does not predict interfaces; it analyzes given interfaces for hot spots.
If you are trying to predict potential interface residues for a single protein chain, we highly recommend that you look into ConSurf or Evolutionary Trace or one of the many websites mentioned in this paper.
In addition, for model structures containing many clashes, the number of hot spots may be vastly overestimated. Please remove these clashes from your PDB file before submitting it to the server.
Finally, the original KFC model is able to analyze structures containing proteins and DNA/RNA but not other types of molecules. Presently, nucleic acid chains are not handled with the new KFC2 method. Until work is completed to add this capability, the current KFC2 Server will automatically switch to using the original KFC model for cases when interface nucleic acid chains are selected.
To analyze an interface, enter the following information on the submission page and click the Submit button:
The job queue displays the current status for each submitted job (Queued, Active, View Results, or Error), and provides links to KFC input and output files. After processing begins, a typical KFC analysis finishes within two minutes. When the task is complete, an email is optionally sent to you with a link to your KFC hot spot predictions or an error message. If the job finishes successfully, the status field will contain a link to the interactive job viewer.
For jobs run under the original KFC server, the following error codes were used:
Using the Job Queue display, you may access KFC input, output, and error files by clicking on a job’s identification number. This is the number in the very first column of the job queue listing. Clicking on the number brings up a list of files which you may examine or download. The Chains link on this file list page displays how the interface chain sets were defined, along with their Jmol colors. Depending on your web browser, this page will be displayed either in a new browser window or in a separate tab. Also available from the file list is a file whose name ends with the extension .results. This file contains the numerical results, as well as the hot spot classification, for the residues determined to be part of the interface. The results files generated by the KFC2 Server for the PDB file 1DVA and the interface between chains H and X are given below for both the original KFC and KFC2 methods.
KFC Hot Spot Prediction Server @mitchell-lab.org from Thu, 17 Mar 2011 12:19:50 CDT
JobId: 3749 JobName: Demo_22_1dva_kfc_classic
                K-FADE  K-FADE  K-CON    K-CON  ConSurf  ConSu  Rosetta  Roset  Exper   Exper
Chain Res  Num  Class     Conf  Class     Conf  Class    Value  Class      DDG  Class   Value
---------------------------------------------------------------------------------------------
H     LEU   32  -------   0.53  -------   0.92  -------      2  -------   0.41  Hotspot Str
H     LEU   34  -------   0.91  -------   0.91  -------      2  -------   1.25  Hotspot Str
H     ASN   37  -------   1.00  -------   1.00  -------      1  -------   0.01  ------- Ins
H     GLY   38  -------   1.00  -------   1.00  -------      3  -------    ---  ------- ---
H     ALA   39  -------   1.00  -------   1.00  -------      1  -------    ---  ------- ---
H     GLN   40  -------   0.64  -------   1.00  -------      6  -------   0.01  ------- ---
H     ILE   65  -------   0.67  -------   0.92  -------      3  -------   0.73  ------- Ins
H     VAL   67  -------   0.75  -------   0.92  -------      5  -------   0.70  ------- Ins
H     GLU   70  -------   0.63  -------   0.72  Conserv      7  -------   1.02  ------- Weak
H     LEU   73  -------   0.50  Hotspot   0.63  -------      2  -------   0.53  ------- Ins
H     SER   74  -------   0.81  -------   0.64  -------      5  -------   0.11  ------- Ins
H     GLU   75  -------   1.00  -------   0.70  -------      1  -------   0.00  ------- Ins
H     HIS   76  -------   0.93  -------   1.00  -------      1  -------   0.43  Hotspot Str
H     GLU   80  -------   0.92  -------   0.70  Conserv      7  -------   0.01  ------- Ins
H     SER   82  -------   1.00  -------   1.00  -------      1  -------  -0.01  ------- Ins
H     LEU  144  Bakbone   0.64  -------   0.52  -------      1  -------   0.28  ------- Ins
H     LEU  145  -------   1.00  -------   0.92  -------      1  -------    ---  ------- ---
H     ASP  146  -------   1.00  -------   1.00  -------      1  -------    ---  ------- ---
H     ARG  147  -------   1.00  -------   1.00  -------      1  -------    ---  ------- ---
H     LEU  153  -------   0.66  -------   0.92  -------      2  -------   0.82  ------- Weak
X     ALA    1  -------   1.00  -------   0.73  -------    ---  -------    ---  ------- ---
X     LEU    2  -------   0.58  -------   0.60  -------    ---  Hotspot   2.31  Hotspot Str
X     ARG    7  -------   1.00  -------   1.00  -------    ---  Hotspot   4.40  ------- Weak
X     VAL    8  -------   0.72  -------   1.00  -------    ---  -------   0.57  ------- Int
X     ASP    9  -------   0.65  Hotspot   0.53  -------    ---  -------   0.66  ------- Int
X     TRP   11  -------   0.52  -------   0.74  -------    ---  Hotspot   2.61  Hotspot Str
X     TYR   12  -------   0.75  Hotspot   0.55  -------    ---  Hotspot   3.16  Hotspot Str
X     PHE   15  Hotspot   0.58  -------   0.69  -------    ---  -------   1.58  Hotspot Str
KFC2 Hot Spot Prediction Server @mitchell-lab.org from Thu, 17 Mar 2011 12:18:45 CDT
JobId: 3748 JobName: Demo_22_1dva_kfc2
                KFC2-A  KFC2-A  KFC2-B  KFC2-B  ConSurf  ConSu  Rosetta  Roset  Exper   Exper
Chain Res  Num  Class     Conf  Class     Conf  Class    Value  Class      DDG  Class   Value
---------------------------------------------------------------------------------------------
H     LEU   32  -------  -0.75  Hotspot   0.10  -------      2  -------   0.41  Hotspot Str
H     LEU   34  -------  -0.71  Hotspot   0.11  -------      2  -------   1.25  Hotspot Str
H     ASN   37  -------  -1.79  -------  -0.97  -------      1  -------   0.01  ------- Ins
H     GLY   38  -------  -0.15  -------  -0.61  -------      3  -------    ---  ------- ---
H     ALA   39  -------  -1.59  -------  -0.87  -------      1  -------    ---  ------- ---
H     GLN   40  -------  -1.53  -------  -0.98  -------      6  -------   0.01  ------- ---
H     ASP   60  -------  -----  -------  -----  -------      1  -------    ---  ------- ---
H     ILE   65  -------  -0.77  -------  -0.40  -------      3  -------   0.73  ------- Ins
H     VAL   67  -------  -0.30  -------  -0.12  -------      5  -------   0.70  ------- Ins
H     GLU   70  -------  -1.28  -------  -0.73  Conserv      7  -------   1.02  ------- Weak
H     LEU   73  Hotspot   0.14  Hotspot   0.24  -------      2  -------   0.53  ------- Ins
H     SER   74  -------  -1.20  -------  -0.89  -------      5  -------   0.11  ------- Ins
H     GLU   75  -------  -1.83  -------  -0.98  -------      1  -------   0.00  ------- Ins
H     HIS   76  -------  -0.95  -------  -0.81  -------      1  -------   0.43  Hotspot Str
H     GLU   80  -------  -1.26  -------  -0.65  Conserv      7  -------   0.01  ------- Ins
H     GLN   81  -------  -2.03  -------  -0.98  -------      2  -------    ---  ------- ---
H     SER   82  -------  -1.23  -------  -0.86  -------      1  -------  -0.01  ------- Ins
H     SER  129  -------  -----  -------  -----  -------      6  -------    ---  ------- ---
H     LEU  144  -------  -0.75  -------  -0.19  -------      1  -------   0.28  ------- Ins
H     LEU  145  -------  -1.62  -------  -0.98  -------      1  -------    ---  ------- ---
H     ASP  146  -------  -2.30  -------  -0.92  -------      1  -------    ---  ------- ---
H     ARG  147  -------  -2.29  -------  -0.78  -------      1  -------    ---  ------- ---
H     GLY  149  -------  -2.34  -------  -0.73  -------      6  -------    ---  ------- ---
H     ALA  152  -------  -2.16  -------  -0.87  Conserv      7  -------    ---  ------- ---
H     LEU  153  -------  -0.71  -------  -0.08  -------      2  -------   0.82  ------- Weak
H     GLN  170  -------  -----  -------  -----  -------      2  -------    ---  ------- ---
H     TYR  184  -------  -----  -------  -----  Conserv      7  -------    ---  ------- ---
H     LYS  188  -------  -----  -------  -----  -------      5  -------    ---  ------- ---
X     ALA    1  -------  -1.60  -------  -0.90  -------    ---  -------    ---  ------- ---
X     LEU    2  Hotspot   0.39  Hotspot   0.14  -------    ---  Hotspot   2.31  Hotspot Str
X     CYS    3  -------  -1.93  -------  -0.88  -------    ---  -------    ---  ------- ---
X     ASP    5  -------  -1.66  -------  -0.89  -------    ---  -------   1.65  ------- ---
X     ARG    7  -------  -0.52  -------  -0.29  -------    ---  Hotspot   4.40  ------- Weak
X     VAL    8  -------  -0.57  -------  -0.36  -------    ---  -------   0.57  ------- Int
X     ASP    9  -------  -0.01  -------  -0.27  -------    ---  -------   0.66  ------- Int
X     TRP   11  -------  -0.53  Hotspot   0.01  -------    ---  Hotspot   2.61  Hotspot Str
X     TYR   12  Hotspot   0.34  Hotspot   0.35  -------    ---  Hotspot   3.16  Hotspot Str
X     GLN   14  -------  -1.99  -------  -0.94  -------    ---  -------   0.10  ------- ---
X     PHE   15  Hotspot   0.04  Hotspot   0.13  -------    ---  -------   1.58  Hotspot Str
X     VAL   16  -------  -2.21  -------  -1.00  -------    ---  -------   0.22  ------- ---
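For downstream analysis, a .results listing like the ones above can be read with a few lines of code. The sketch below is an assumption-laden illustration rather than a supported parser: the function names and dictionary keys are hypothetical, and it assumes every residue row splits on whitespace into exactly 13 fields in the column order shown above. The first two class/score pairs are named generically (`model1`, `model2`) because they hold K-FADE and K-CON in a classic file and KFC2-A and KFC2-B in a KFC2 file.

```python
def to_number(token):
    """Convert a numeric field to float; placeholder dashes become None."""
    try:
        return float(token)
    except ValueError:
        return None


def parse_results(text):
    """Parse residue rows from a .results listing into dictionaries."""
    rows = []
    for line in text.splitlines():
        tokens = line.split()
        if len(tokens) != 13:
            continue  # title, job line, or separator
        try:
            num = int(tokens[2])
        except ValueError:
            continue  # header row whose third field is not a residue number
        rows.append({
            "chain": tokens[0], "res": tokens[1], "num": num,
            "model1_class": tokens[3], "model1_conf": to_number(tokens[4]),
            "model2_class": tokens[5], "model2_conf": to_number(tokens[6]),
            "consurf_class": tokens[7], "consurf_value": to_number(tokens[8]),
            "rosetta_class": tokens[9], "rosetta_ddg": to_number(tokens[10]),
            "exper_class": tokens[11], "exper_value": tokens[12],
        })
    return rows


def predicted_hot_spots(rows, require_both=False):
    """Residues labeled Hotspot by either model, or by both if require_both is set."""
    combine = all if require_both else any
    return [
        f"{r['res']}{r['num']}:{r['chain']}"
        for r in rows
        if combine(r[key] == "Hotspot" for key in ("model1_class", "model2_class"))
    ]
```

With a KFC2 file, calling `predicted_hot_spots(rows)` lists residues flagged by either KFC2a or KFC2b, while `require_both=True` keeps only residues on which the two models agree.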
The job viewer has two major components: a molecular viewer on the left, and a control panel on the right. Users can directly interact with the molecular viewer or use the control panel to affect the display.
In the sample screen shot below, the longer chain H is colored light coral and the shorter chain X teal. Since only these two chains were selected in the job submission form, the other chains contained in the PDB file for 1DVA are not shown. Also included in the view, in space-filling mode, are heteroatom residues that have at least one atom within 4 Angstroms of an atom in one of the chosen chains.
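The 4 Angstrom rule for displaying heteroatom residues amounts to a simple distance test. The following sketch illustrates that test under stated assumptions; it is not the viewer's own code. The function name and the input layout (plain coordinate tuples, with each heteroatom tagged by a residue identifier) are hypothetical, and "within 4 Angstroms" is taken to include exactly 4.

```python
import math


def hetero_residues_near_chains(chain_atoms, hetero_atoms, cutoff=4.0):
    """Return heteroatom residue ids having at least one atom within `cutoff` of a chain atom.

    chain_atoms:  list of (x, y, z) coordinates for atoms in the chosen chains
    hetero_atoms: list of (residue_id, (x, y, z)) pairs, one per heteroatom
    """
    near = []
    for residue_id, coord in hetero_atoms:
        if residue_id in near:
            continue  # one close atom is enough for the whole residue
        if any(math.dist(coord, atom) <= cutoff for atom in chain_atoms):
            near.append(residue_id)
    return near
```

This brute-force comparison is quadratic in the number of atoms, which is adequate for a single interface; a spatial grid would be the usual refinement for much larger structures.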
Just below the Interface and KFC-2 Hot Spots heading, the residues determined to be in the interface region are listed. For interface residues predicted to be hot spots by either of the two KFC2 models or by any of the additional data (ConSurf, Rosetta, Experimental), a pink background is used in the box immediately surrounding the residue label. Non-hot spot interface residues are indicated with a white background. For each of the three hot spot residues LEU32:H, LEU34:H, and TYR12:X, the first check box immediately below the residue label has been clicked in order to display these residues in space-filling mode. Notice that in this snapshot, the mouse pointer was positioned over the TYR12:X label, causing the data for this hot spot to pop up. Depending on your browser, you may have to click the label to get the pop-up data, though simply hovering should cause it to appear.
Each component is described in more detail below.
KFC uses the Fast Atomic Density Evaluator (FADE) to analyze the shape specificity within a protein-protein interface. Users can highlight different degrees of shape specificity by clicking on the different color-coded checkboxes.
These controls alter the appearance of the selected atoms. By default, KFC selects all protein atoms in the complex. Advanced users may change the atom selection by using the Jmol scripting language.
Additionally, users can save up to four different views of their session.
The three checkboxes in each cell control the display of an interface residue.
The coloring within each cell also encodes information about the residue.
Jmol is the molecular viewer used throughout the Mitchell Lab website. It is an applet written in Java, so users must enable Java and JavaScript in their web browsers in order to use the KFC Server. Also, Windows users may need to install the most current Sun Java Runtime Environment (JRE) in order to use Jmol. Jmol is extensively documented, so we direct users to the following websites for information about its use.
If you use the console to make selections and change displays, the selections shown in the Control Panel may no longer be accurate. Actions taken using the console override any mouse-driven selection and display controls.