Sequence Import

The commercial version of abYsis allows users to import sequences into the abYsis database and process them for analysis.

Select Import on the abYsis main menu:

On first use you will most likely be presented with this screen, where the address will be the location at which your IT team has installed your local version of abYsis.

They will have determined what the Username & Password should be during the installation process. This allows companies to restrict who can import sequences into abYsis whilst still providing broader access to other users.

Note: You must follow the abYsis-specific Fasta format described in this document for trouble-free importing.

Once you have successfully logged in, you will see the following screen:

Select File

From the Upload tab, select a file from your local computer. It must be in the Fasta-based format described later in this document.

Translation method:

Unless you have a particular reason to select another method, we suggest simple translation. See the Annotate section for more detail.

Upload

Click Upload to upload your dataset. You can monitor progress on the Loaded tab.

An active job is indicated by a spinning icon. (Currently, the system only permits one active job.)

Successful processing is indicated by a green tick.

A red cross indicates an issue; check the log files for feedback on what went wrong with your upload.

Once your data is in the database the web pages will reflect the updated information.

Options

Note: Do not run more than one of the following options at the same time. Wait for each one to finish before starting the next. If more than one option is accidentally started, wait the allotted time and then manually refresh the page, because only the first completed task may update the page automatically.

Delete Uploads

You can remove uploads from the database using the Delete Uploads button. Select which entries to delete and click the button. We recommend not doing this while a statistics and frequencies rebuild is occurring.

Remove failures

If any data fails to upload then a failed entry may exist in the table. To remove all unsuccessful uploads click the Remove failures button.

Download Uploaded Data

It is possible to download all the uploaded sequences. This is a good way to back-up your uploaded sequences and transfer them to a new installation.

Click the Prepare Fasta button and wait.

When the file is ready, the Download Fasta button will appear.

Rebuild Statistics and Frequencies

This will update the Statistics tab and reflect uploads that have been made since the last Rebuild. This will also update the amino acid frequencies throughout the site. This action can take up to an hour on older servers. Please be patient.

Note: You may need to force a refresh/reload of pages if your browser has cached an older version.

Optimise Database

After uploading more than 10,000 sequences it can be beneficial to optimise the database. The Optimise button will only be available when 10,000 new sequences have been uploaded. The database will be slower for a couple of hours while optimisation runs, so we recommend doing this overnight or when system usage is low.

abYsis Fasta Format

The input file must conform to a specific format.

The Title line must start with a '>' character, followed by six fields separated by vertical bars '|' as follows:

>accession|partner|species|datasource|chaintype|moleculetype
  1. accession: An accession code for the chain (required)
  2. partner: An accession code for the partner chain (required only for pairing)
  3. species: A Latin species name, lower case (optional but recommended; you can make up your own species if you wish to keep data separate)
  4. datasource: A data source name (required)
  5. chaintype: The chain type, 'heavy' or 'light' (optional for unpaired, required for paired)
  6. moleculetype: The sequence type, 'dna' or 'protein' (required)

An example:

>001485|007059|homo sapiens|mydata|heavy|protein
QVQLKESGPGLVPSSLSCTVSFSTSGVWVRPPGKGLEWLMIWSGGSTVNAAFISRLSISK
DNSKSVFMNSLQANDTAIYYCARDYGYQGTLTVSA 
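The title-line layout above can be checked before upload with a few lines of Python. This is only a sketch based on the field list in this document, not the check that abYsis itself performs; the names TitleLine and parse_title_line are hypothetical, and stripping surrounding whitespace from each field is an assumption.

```python
from dataclasses import dataclass

FIELD_NAMES = ("accession", "partner", "species",
               "datasource", "chaintype", "moleculetype")


@dataclass
class TitleLine:
    accession: str
    partner: str
    species: str
    datasource: str
    chaintype: str
    moleculetype: str


def parse_title_line(line: str) -> TitleLine:
    """Split one title line into its six fields and apply the documented rules."""
    line = line.strip()
    if not line.startswith(">"):
        raise ValueError("title line must start with '>'")
    # Assumption: whitespace around a field is not significant.
    fields = [f.strip() for f in line[1:].split("|")]
    if len(fields) != len(FIELD_NAMES):
        raise ValueError(f"expected 6 fields, found {len(fields)}")
    title = TitleLine(*fields)
    if not title.accession:
        raise ValueError("accession is required")
    if not title.datasource:
        raise ValueError("datasource is required")
    if title.moleculetype not in ("dna", "protein"):
        raise ValueError("moleculetype must be 'dna' or 'protein'")
    if title.chaintype not in ("", "heavy", "light"):
        raise ValueError("chaintype must be 'heavy', 'light' or empty")
    if title.partner and not title.chaintype:
        raise ValueError("chaintype is required when a partner is given")
    return title


print(parse_title_line(">001485|007059|homo sapiens|mydata|heavy|protein"))
```

A line with the wrong number of fields, or with a partner accession but no chain type, raises ValueError, so problems can be caught before the file is uploaded.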

Further notes:

  • While each file may contain only a limited number of sequences (the limit is set by your installation), you may upload multiple files.
  • Accessions must be unique for a given data source. Paired chains must have the same data source and be in the same file.
  • You may have a mix of paired and unpaired chains in the same file.
  • Chain types must be provided for paired chains.
  • Pairing information must be consistent within the file. If the new data replaces or updates data_source/accessions that are already loaded in the database, pairing information for the chains involved will be deleted from the database prior to loading the new data.
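Because paired chains must be in the same file, a large dataset cannot simply be cut every N records. The sketch below shows one possible way to split a file while keeping each pair together. It assumes that the upload limit counts title lines, that pairing is stated on both chains, and that the limit is passed in as max_seqs; the function names and the short demo sequences are hypothetical.

```python
def read_records(text):
    """Return a list of (title_line, sequence_lines) tuples from Fasta text."""
    records = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            records.append((line, []))
        elif records:
            records[-1][1].append(line)
    return records


def split_records(records, max_seqs):
    """Pack records into chunks of at most max_seqs, keeping pairs together."""
    groups = {}  # (datasource, accessions) -> records that must share a file
    for title, seq in records:
        accession, partner, _species, datasource = title[1:].split("|")[:4]
        # A chain and its partner map to the same key; unpaired chains stand alone.
        members = frozenset(a.strip() for a in (accession, partner) if a.strip())
        groups.setdefault((datasource.strip(), members), []).append((title, seq))

    chunks, current = [], []
    for group in groups.values():
        if len(group) > max_seqs:
            raise ValueError("a paired group is larger than max_seqs")
        if current and len(current) + len(group) > max_seqs:
            chunks.append(current)
            current = []
        current.extend(group)
    if current:
        chunks.append(current)
    return chunks


def to_fasta(chunk):
    """Render one chunk back to Fasta text."""
    return "\n".join("\n".join([title] + seq) for title, seq in chunk) + "\n"


# Hypothetical demo data: one heavy/light pair and one unpaired chain.
demo = """>H1|L1|mus musculus|demo|heavy|protein
QVQ
>H2||mus musculus|demo|heavy|protein
LES
>L1|H1|mus musculus|demo|light|protein
TQS
"""
for number, chunk in enumerate(split_records(read_records(demo), max_seqs=2), 1):
    print(f"file {number}: {[title for title, _ in chunk]}")
```

In the demo, H1 and L1 end up in the first file even though H2 sits between them in the input, and H2 goes into a second file.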

All fields must be present but optional fields can be left empty.

For example, if a chain is unpaired, leave the partner accession field empty, like this:

>001485||homo sapiens|mydata|heavy|protein 
QVQLKESGPGLVPSSLSCTVSFSTSGVWVRPPGKGLEWLMIWSGGSTVNAAFISRLSISK
DNSKSVFMNSLQANDTAIYYCARDYGYQGTLTVSA

You may provide DNA sequence data, protein sequence data, or both for a given accession.

If providing both, repeat the title line (exactly) except for the final field (protein or dna).

>001485|007059|homo sapiens|mydata|heavy|protein 
QVQLKESGPGLVPSSLSCTVSFSTSGVWVRPPGKGLEWLMIWSGGSTVNAAFISRLSISK
DNSKSVFMNSLQANDTAIYYCARDYGYQGTLTVSA 
>001485|007059|homo sapiens|mydata|heavy|dna 
caggtgcagctgaaggagtcaggacctggcctagtgccctcagcctgtcctgcacagtct
ctttctcactagcggtgttgggttcgccctccaggaaagggtctggagtggctgatgatatg
gagtggtggaagcacagtcaatgcagctttcatatccagactgagcatcagcaaggaca
attccaagagcgtttttatgaacagtctgcaagctaatgacacagccatatattactgtgcca
gagactatgggtaccaagggactctgactgtctctgca
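The rule that the two title lines must match except for the final field is easy to break by hand. A minimal sketch of a check is shown below; the function name check_repeated_titles is hypothetical, and treating a repeated molecule type for the same accession as an error is an interpretation of the uniqueness rule rather than documented behaviour.

```python
from collections import defaultdict


def check_repeated_titles(title_lines):
    """Return a list of problems among title lines that share an accession."""
    seen = defaultdict(list)  # (datasource, accession) -> list of field lists
    for line in title_lines:
        fields = [f.strip() for f in line.strip().lstrip(">").split("|")]
        seen[(fields[3], fields[0])].append(fields)

    problems = []
    for (datasource, accession), entries in seen.items():
        types = [entry[5] for entry in entries]
        if len(types) != len(set(types)):
            problems.append(
                f"{datasource}/{accession}: molecule type given more than once")
        # The first five fields must be identical on every repeated line.
        if len({tuple(entry[:5]) for entry in entries}) > 1:
            problems.append(
                f"{datasource}/{accession}: title lines differ before the final field")
    return problems


good = [
    ">001485|007059|homo sapiens|mydata|heavy|protein",
    ">001485|007059|homo sapiens|mydata|heavy|dna",
]
bad = [
    ">001485|007059|homo sapiens|mydata|heavy|protein",
    ">001485||homo sapiens|mydata|heavy|dna",  # partner field dropped by mistake
]
print(check_repeated_titles(good))
print(check_repeated_titles(bad))
```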

Avoiding Errors

Only use pairing information if you really require it

Example 1.

This file has only one entry, but it will load correctly because the second field, which describes heavy chain/light chain pairing, contains no accession. You do not need both protein and DNA; the system will work with just one of them.

>140665||mus musculus|mydata|heavy|protein
QVQLKESGPGLVPSSLSCTVSFSTSGVWVRPPGKGLEWLMIWSGGSTVNAAFISRLSISK
DNSKSVFMNSLQANDTAIYYCARDYGYQGTLTVSA
>140665||mus musculus|mydata|heavy|dna
caggtgcagctgcaggagtctggacctggcctagtgcagccctcacagagcctgtccatcacct
gcacagtctctggtttctcattaactagctatggtgtacactgggttcgccagtctccaggaaaggg
tctggagtggctgggagtgatatggagtggtggaagcacagactataatgcagctttcatatcca
gactgagcatcagcaaggacaattccaagagccaagttttctttaaaatgaacagtctgcaagct
aatgacacagccatatattactgtgccagaaacgcttactggggccaagggactctggtcactgt
ctctgca 

Example 2.

This file also has only one entry, but loading will fail because a partner accession (020167) is given for pairing, yet the upload file contains no sequence with accession 020167.

>140666|020167|mus musculus|johnfile|heavy|protein
QVQIKESGPGLVPSSLSCTVSFSTSGVWVRPPGKGLEWLMIWSGGSTVNAAFISRLSISK
DNSKSVFMNSLQANDTAIYYCARDYGYQGTLTVSA
>140666|020167|mus musculus|johnfile|heavy|dna 
aggtgcagctgcaggagtctggacctggcctagtgcagccctcacagagcctgtccatcacct
gcacagtctctggtttctcattaactagctatggtgtacactgggttcgccagtctccaggaaaggg
tctggagtggctgggagtgatatggagtggtggaagcacagactataatgcagctttcatatcca
gactgagcatcagcaaggacaattccaagagccaagttttctttaaaatgaacagtctgcaagct
aatgacacagccatatattactgtgccagaaacgcttactggggccaagggactctggtcactgt
ctctgca 

Example 3.

This contains two sequences that will load correctly because the pairing information refers to another sequence in the upload file.

>Z-AAA50198|Z-AAA50199|mus musculus|nishikawadata|heavy|protein
LESGAELVKPGASVKLSCKASGYTFSSYWMHWVKQRPGQGLEWIGEIHPSNGLTNYNEKF
KSKATLTVDKSSSTAYMQLSSLTSEDSAVYYCAKGKELGRFAYWGQGTLVTVSA
>Z-AAA50198|Z-AAA50199|mus musculus|nishikawadata|heavy|dna
ctcgagtctggggctgaactggtgaagcctggggcttcagtgaagttgtcctgcaaggcttct
ggctacaccttcagtagttactggatgcactgggtgaagcagaggcctggacaaggacttg
agtggattggagagattcatcctagcaacggtcttactaactataatgagaagttcaagagt
aaggccacattgactgtagacaaatcctccagcacagcctacatgcaactcagcagcctg
acatctgaggactctgcggtctattactgtgcaaaagggaaggaactgggacggtttgcgt
actggggccaagggactctggtcactgtctctgca
>Z-AAA50199|Z-AAA50198|mus musculus|nishikawadata|light|protein
TQSPASLAVSLGQRATISCRASESVDSYGNSFMHWYQQKPGQPPKLLIYLASNLESGVPA
RFSGSGSRTDFTLTIDPVEADDAATYYCQQNNEDPYTFGGGTKLEIKRA
>Z-AAA50199|Z-AAA50198|mus musculus|nishikawadata|light|dna
acccagtctccagcttctttggctgtgtctctagggcagcgggccaccatatcctgcagagcc
agtgaaagtgttgatagttatggcaatagttttatgcactggtaccagcagaaaccaggaca
gccacccaaactcctcatctatcttgcatccaacctagaatctggggtccctgccaggttcagt
ggcagtgggtctaggacagacttcaccctcaccattgatcctgtggaggctgatgatgctgca
acctattactgtcagcaaaataatgaggatccgtacacgttcggaggggggaccaagctgg
aaataaaacgggct

Example 4.

A file like this will also load, because the upload file still contains sufficient pairing information: the light chain 020169 is present, even though only its DNA sequence is given.

>140667|020169|mus musculus|andrew-ig|heavy|protein
QVQIKESGPGLVPSSLSCTVSFSTSGVWVRPPGKGLEWLMIWSGGSTVNAAFISRLSISK
DNSKSVFMNSLQANDTAIYYCARDYGYQGTLTVSA
>140667|020169|mus musculus|andrew-ig|heavy|dna
aggtgcagctgcaggagtctggacctggcctagtgcagccctcacagagcctgtccatcacct
gcacagtctctggtttctcattaactagctatggtgtacactgggttcgccagtctccaggaaaggg
tctggagtggctgggagtgatatggagtggtggaagcacagactataatgcagctttcatatcca
gactgagcatcagcaaggacaattccaagagccaagttttctttaaaatgaacagtctgcaagct
aatgacacagccatatattactgtgccagaaacgcttactggggccaagggactctggtcactgt
ctctgca 
>020169|140667|mus musculus|andrew-ig|light|dna
gtgcagatcaaggagagcggccccggcctggtgcccagcagcctgagctgcaccgtgagc
ttcagcaccagcggcgtgtggatcaggccccccggcaagggcctggagtggctgatgatct
ggagcggcggcagcaccgtgaacgccgccttcatcaccaagctgagcatcagcagggac
aacagcaagagcgtgttcatgcagagcctgcaggccaacgacaccgccatctactactgcg
ccagggagtacggctaccagggcaccctgaccgtgagcgc
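
The four examples above can be reproduced with a small pre-upload check on the title lines alone. This is a sketch under stated assumptions, not the validation abYsis runs: the function name find_pairing_problems is hypothetical, and "consistent" is interpreted here as meaning that a partner must be present in the same file, in the same data source, and must point back to the original chain.

```python
def find_pairing_problems(title_lines):
    """Report partner accessions that are missing or not reciprocated."""
    partners = {}  # (datasource, accession) -> set of partner fields stated
    for line in title_lines:
        accession, partner, _species, datasource = (
            f.strip() for f in line.strip().lstrip(">").split("|")[:4]
        )
        partners.setdefault((datasource, accession), set()).add(partner)

    problems = []
    for (datasource, accession), stated in partners.items():
        if len(stated) > 1:
            problems.append(f"{accession}: conflicting partner fields")
            continue
        partner = next(iter(stated))
        if not partner:
            continue  # unpaired chain, nothing to check
        if (datasource, partner) not in partners:
            problems.append(f"{accession}: partner {partner} not in file")
        elif partners[(datasource, partner)] != {accession}:
            problems.append(f"{partner}: does not point back to {accession}")
    return problems


example_1 = [">140665||mus musculus|mydata|heavy|protein",
             ">140665||mus musculus|mydata|heavy|dna"]
example_2 = [">140666|020167|mus musculus|johnfile|heavy|protein",
             ">140666|020167|mus musculus|johnfile|heavy|dna"]
example_3 = [">Z-AAA50198|Z-AAA50199|mus musculus|nishikawadata|heavy|protein",
             ">Z-AAA50198|Z-AAA50199|mus musculus|nishikawadata|heavy|dna",
             ">Z-AAA50199|Z-AAA50198|mus musculus|nishikawadata|light|protein",
             ">Z-AAA50199|Z-AAA50198|mus musculus|nishikawadata|light|dna"]
example_4 = [">140667|020169|mus musculus|andrew-ig|heavy|protein",
             ">140667|020169|mus musculus|andrew-ig|heavy|dna",
             ">020169|140667|mus musculus|andrew-ig|light|dna"]

for name, lines in [("Example 1", example_1), ("Example 2", example_2),
                    ("Example 3", example_3), ("Example 4", example_4)]:
    print(name, find_pairing_problems(lines) or "no pairing problems found")
```

Only Example 2 is flagged, because its partner accession 020167 has no title line of its own in the file.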