Perl Coding Standards

From DictyWiki

Note: Read Damian Conway's 'Perl Best Practices'. It is an outstanding resource for writing reusable, understandable Perl Software.

Class names

Class names should start with an upper case letter. Different words are separated by an underscore.

  Modware::Class_name;
  instead of
  Modware::ClassName;

For a cut-and-paste class template with POD and a constructor, see: Class template


There is one exception to the rule. When defining new feature types which use SO terms, the class name should be all caps. For example, when creating a new feature of type sequence_variant, the class name would be

  Modware::Feature::SEQUENCE_VARIANT;

Compile test

All classes should compile cleanly without warnings.

perl -w -e "use New::Class;"

should result in no warnings.
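
The same check can be scripted so that it runs over several classes at once. The following is only a sketch: the helper name compile_warnings is hypothetical, and File::Spec is used purely as a stand-in for one of our own classes.

```perl
use strict;
use warnings;

# Hypothetical helper: load a class by name and collect every warning
# (or load failure) produced while it compiles.
sub compile_warnings {
    my ($class) = @_;
    my @warnings;

    # Capture warnings instead of printing them, and switch on -w behaviour
    local $SIG{__WARN__} = sub { push @warnings, @_ };
    local $^W = 1;

    eval "require $class; 1"
      or push @warnings, "failed to load $class: $@";

    return @warnings;
}

# File::Spec stands in for a real class name here
my @problems = compile_warnings('File::Spec');
print @problems
  ? "File::Spec is not clean:\n@problems"
  : "File::Spec compiles cleanly\n";
```

An empty list means the class loaded without warnings; anything else should be fixed before the class is committed.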

Named arguments

Writing a constructor (or method) with named arguments

We have implemented the _rearrange method from BioPerl, which allows for case-insensitive named arguments. Passing an array ref of argument names, followed by the argument list, returns an array of the corresponding values that can be assigned to variables with code similar to the following:

    my ( $genotype_id, $name, $description, $mutant_type_id ) =
      $self->_rearrange( [qw( GENOTYPE_ID NAME DESCRIPTION MUTANT_TYPE_ID)],
        @args );
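
To make the behaviour concrete, here is a minimal sketch of what such a helper does. It is not the BioPerl implementation: the function name rearrange_sketch is hypothetical, and it assumes the arguments always arrive as a flat list of -name => value pairs.

```perl
use strict;
use warnings;

# Hypothetical stand-in for _rearrange: returns the values of the named
# arguments in the order requested, ignoring case and the leading dash.
sub rearrange_sketch {
    my ( $order, @args ) = @_;

    my %value_for;
    while ( my ( $key, $value ) = splice( @args, 0, 2 ) ) {
        ( my $label = uc $key ) =~ s/^-//;
        $value_for{$label} = $value;
    }

    return map { $value_for{ uc $_ } } @{$order};
}

my ( $name, $description ) = rearrange_sketch(
    [qw( NAME DESCRIPTION )],
    -Description => 'This is a null mutant',
    -name        => 'Fake Strain',
);

print "$name: $description\n";    # prints "Fake Strain: This is a null mutant"
```

Because the lookup is by name, the caller may pass the arguments in any order and in any case.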

Calling a function with named arguments

Although we have case insensitive argument passing (when the _rearrange method is used), argument names should be lower case for code consistency:

my $genotype = new dicty::Genetics::Genotype(
    -name           => "Fake Strain",
    -description    => "This is a null mutant",
    -mutant_type_id => 500
);


Code formatting

Refer to 'Perl Best Practices'. The things that we have read and understood and will implement are:

  • Use PerlTidy. There is a common .perltidyrc that is distributed and you should use. PerlTidy should be linked into your text editor so it can be run OFTEN AND EASILY. This will automatically set whitespace and line break conventions.
  • Code Chunking. Read this section in 'Perl Best Practices'. Adopt this habit and use it always.
  • Whitespace: No tabs, only spaces. If you use PerlTidy, this will all be taken care of for you.

These techniques alone will make our code more uniform and easy to follow.

Method/Function/Variable names

We favor function names with words separated by underscores:

# in Modware we do NOT LIKE camelCasing:
$someObject->someMethod(); # bad 
$SomeObject->SomeMethod(); # bad 

# but we favor this
$some_object->some_method(); # good

Documentation

We will use POD (Plain Old Documentation) style documentation. The benefit of documenting the code with POD is that we can generate HTML pages displaying our object interface, similar to the way the BioPerl project does. By following the guidelines on this page, we will have a consistently documented code base, which makes efficient code sharing possible.

Of course, Perl Best Practices by Damian Conway (if you are here, you need to get a copy) has a great section on documentation. Have your text editor generate the boilerplate documentation so that writing it is as painless as possible.

Methods

Each method in the module should have a small POD description above it.

For example:

 =head2 name_of_method

  Title    : name_of_method
  Usage    : my $results = $my_object->name_of_method( -argument_label => 'argument_value' );
  Function : creates a gene object based on either locus_no or gene_name
  Returns  : reference to a Modware::Gene object
  Args     : named arguments:
           : -argument1 => number
           : -argument2 => Modware::Some_class object
           :
 =cut

Packages

Include some short working examples in each Package. These examples should represent two or three common use cases of the object. The exact syntax used in the examples MUST be in the unit test. Documentation review will take place regularly on the HTML generated documentation.

Example:

 =head1 NAME

    Modware::Some_module

 =head1 SYNOPSIS

  Here, you want to concisely show a couple of SIMPLE use cases.  You should describe what you are doing and then write code that will run if pasted into a script.  

  For example:

  USE CASE: PRINT A LIST OF PRIMARY IDS OF RELATED FEATURES

    my $gene = new Modware::Gene( -feature_no => 4161 );

    foreach my $feature ( @{ $gene->features() } ) {
       print $feature->primary_id()."\n";
    }

  =head1 DESCRIPTION

   Here, AT A MINIMUM, you explain why the object exists and where it might be used.  Ideally you would be very detailed here. There is no limit on what you can write here.  Obviously, lesser used 'utility' objects will not be heavily documented.

   For example: 

   This object attempts to group together all information about a gene.
   Most of this information is returned as references to arrays of other objects.  For example,
   the features array is one such association.  You would use this whenever you want to read or write any 
   properties of a gene.

  =head1 AUTHOR - Your Name

    Your Name your_email@northwestern.edu

 =head1 APPENDIX

    The rest of the documentation details each of the object
    methods. Internal methods are usually preceded with a _

 =cut


Search Classes

These classes return arrays of objects. They all start with Modware::Search, and the type of object returned is denoted in the class name. For example, Modware::Search::Gene is a class that contains methods that return arrays of Gene objects matching certain criteria, or counts of Genes matching certain criteria. Methods in these classes are class methods (you don't need to instantiate a Modware::Search::Gene object to call one of its methods). Because they are class methods, the first letter of each method name should be capitalized. Two example methods in Modware::Search::Gene:

sub Search_by_name {
    my ( $self, @args ) = @_;

    my $dbh    = new Modware::DBH;

    my @results;

    # Queries in search classes generate
    # lists of identifiers
    my $sth = $dbh->prepare( "
        SELECT G.FEATURE_ID
          FROM $ENV{'CHADO_USER'}.V_GENE_FEATURES G
         WHERE LOWER(G.NAME) LIKE LOWER(?)
      ORDER BY LOWER(G.NAME)
   " );

    $sth->execute(@args);

    # Identifiers returned in $sth are converted
    # to iterators
    my $itr = $self->sth_to_iterator($sth);
    $sth->finish();

    # return array of Gene objects if method called in array context:
    # i.e. my @array = Modware::Search::Gene->Search_by_name( 'sa%' );
    # otherwise return iterator
    return wantarray ? $itr->to_array() : $itr;

}
sub Count_by_name{
...
}


Calling each method in a script:

my $count = Modware::Search::Gene->Count_by_name( 'abc%' );

print "There are $count genes with the name matching abc\%\n";
my $genes = Modware::Search::Gene->Search_by_name( 'abc%' );

print "The genes with names matching abc\% are:\n";
while (my $gene = $genes->next() ) {
   print $gene->gene_name()."\n";
}

Alternatively,

my @genes = Modware::Search::Gene->Search_by_name( 'abc%' );

print "The genes with names matching abc\% are:\n";
for my $gene ( @genes ) {
   print $gene->gene_name()."\n";
}
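
The context-sensitive return used by the search methods can be tried out without a database. The sketch below is self-contained and makes several assumptions: Sketch::Iterator and Sketch::Search::Gene are hypothetical classes, the gene names are made up, and the iterator only imitates the next and to_array methods used above.

```perl
use strict;
use warnings;

# Hypothetical minimal iterator, standing in for the object
# that sth_to_iterator returns in the real search classes.
package Sketch::Iterator;

sub new {
    my ( $class, @items ) = @_;
    return bless { items => [@items] }, $class;
}

sub next {
    my ($self) = @_;
    return shift @{ $self->{items} };
}

sub to_array {
    my ($self) = @_;
    return @{ $self->{items} };
}

# Hypothetical search class with one capitalized class method
package Sketch::Search::Gene;

my @GENE_NAMES = qw( abcA abcB carA );    # stand-in for the database

sub Search_by_prefix {
    my ( $class, $prefix ) = @_;

    my $itr = Sketch::Iterator->new(
        grep { index( $_, $prefix ) == 0 } @GENE_NAMES );

    # list context gets the array, scalar context gets the iterator
    return wantarray ? $itr->to_array() : $itr;
}

package main;

my @genes = Sketch::Search::Gene->Search_by_prefix('abc');
print "list context: @genes\n";

my $itr = Sketch::Search::Gene->Search_by_prefix('abc');
while ( defined( my $name = $itr->next() ) ) {
    print "iterator: $name\n";
}
```

The iterator form avoids building the whole array when the caller only needs to walk the results once.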

Database access

Except in the case of 'search' classes which require complex queries, there should be no SQL directly on the database handle in ANY class. Database access should be through either the Class::DBI ORM or Dbtable (if you are dealing with SGD legacy tables for which a Dbtable class exists). All updates, inserts, deletes and MOST retrieval should be done through the methods provided by the ORM.

For right now, we will deal with single rows of data. We normally don't deal with more than one database row at a time. If we need to do that, it is usually done by calling a method in one of the Search classes, which returns an array of OBJECTS (not database rows).


Standard Get/Set Accessors

dictyBase and Modware code uses simple get/set accessors as seen all over BioPerl. If you call the accessor with an argument, that argument is stored on the object and returned. Subsequent calls to the method without an argument will also return that value.

The following boilerplate method can be used.

You should program your text editor to write this automatically.

Note: We do not use Autoloaded methods from Class::Accessor as a rule. This is because they slip through the documentation generation software and we prefer a more explicit method declaration.

sub myaccessor {
   my ($self, $obj) = @_;

   if(scalar @_ > 1) {
      $self->{myaccessor} = $obj;
   }
   return $self->{myaccessor};
}
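
As an illustration of why the boilerplate tests the argument count rather than the truth of the argument, here is a small self-contained sketch. The class Sketch::Strain and its copy_number accessor are hypothetical names used only for this example.

```perl
use strict;
use warnings;

package Sketch::Strain;    # hypothetical class for illustration

sub new {
    my ($class) = @_;
    return bless {}, $class;
}

# Standard get/set accessor, following the boilerplate above
sub copy_number {
    my ( $self, $obj ) = @_;

    if ( scalar @_ > 1 ) {
        $self->{copy_number} = $obj;
    }
    return $self->{copy_number};
}

package main;

my $strain = Sketch::Strain->new();
$strain->copy_number(3);
$strain->copy_number(0);    # a false value is still stored

print $strain->copy_number(), "\n";    # prints 0
```

Checking scalar @_ > 1 means that 0, the empty string, and undef can all be assigned; a test such as if ($obj) would silently ignore them.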


When designing a new object that needs to store the Class::DBI representation of the database object, the accessor must be named _database_object:

sub _database_object {
    my ( $self, $obj ) = @_;
    if ($obj) {
        $self->{_cvterm_dbobj} = $obj;
    }
    return $self->{_cvterm_dbobj};
}

'Lazy' Evaluated Accessors

The idea behind Lazy accessors is that they should act like the standard get/set accessors except that they are not populated with a database value until the accessor is called. For instance, if I create a gene object, many times I don't care what references are associated with it. Only in certain circumstances would I want to incur the overhead of creating reference objects for all of the references associated with the genes. In code I just say my @references = @{$gene->references()} and behind the scenes, the references are being queried from the database the first time that the 'references' method is called.

There should be two methods for a lazy evaluated accessor. One can be automatically generated and is simply a modified version of a standard get/set accessor. You only need to fetch the item from the database if the hash key has not been set and the method has not been passed an argument (passing an argument means the value is being set in client code). There should be a separate method that gets the value from the database and sets the value using the standard accessor.

This approach allows the same accessor to be inherited by different classes. For this example, any class that holds an array of references can inherit the 'references' method. Individual classes would define their own '_get_references' method.

A boilerplate lazy get/set accessor (just replace the word 'myaccessor' with whatever you want your method to be called):

You should program your text editor to write this automatically.

sub myaccessor {
   my ($self, $obj) = @_;

  #
  # fetches myaccessor from database (_get_myaccessor) if myaccessor is not yet defined
  #  and the user is not attempting to set the myaccessor options
  #
   exists $self->{myaccessor} || scalar @_ > 1 || $self->_get_myaccessor();

   if(scalar @_ > 1) {
      $self->{myaccessor} = $obj;
   }
   return $self->{myaccessor};
}

sub _get_myaccessor {
   my ($self) = @_;

   my $myaccessor_value;

   # SOME CODE TO GET VALUE OF MYACCESSOR FROM THE DATABASE

   $self->myaccessor( $myaccessor_value );

}


An example lazy evaluated accessor for references: 2 methods

=head2 references

 Title    : references
 Usage    : foreach my $reference ( @{ $feature->references() } ) {
          :    print $reference->citation()."\n";
          : }
 Function : gets or sets the references for this gene
 Returns  : returns reference to array of Modware::Reference objects
 Args     : reference to array of Modware::Reference objects ( optional )

=cut

sub references {
   my ($self, $obj) = @_;

  #
  # fetches references from database (_get_references) if references is not yet defined
  #  and the user is not attempting to set the references options
  #
   exists $self->{references} || scalar @_ > 1 || $self->_get_references();

   if(scalar @_ > 1) {
      $self->{references} = $obj;
   }
   return $self->{references};
}

The method that does the database call, defined by each class with a 'references' method.

=head2 _get_references

 Title    : _get_references
 Usage    : $gene->_get_references();
 Function : gets array of references ( L<Modware::Reference> objects )
 Returns  : nothing
 Args     : none

=cut

sub _get_references {
   my ($self) = @_;

   my @references   = Modware::Search::Reference->Search_by_feature( $self->feature_no );

   $self->references( \@references );

}
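
The effect of the lazy pattern can be checked without a database. The following is a sketch under stated assumptions: Sketch::Gene is a hypothetical class, the fetch_count key exists only to show how often the fetch runs, and the hard-coded list stands in for the real query.

```perl
use strict;
use warnings;

package Sketch::Gene;    # hypothetical class, not part of Modware

sub new {
    my ($class) = @_;
    return bless { fetch_count => 0 }, $class;
}

# Lazy get/set accessor, following the boilerplate above
sub references {
    my ( $self, $obj ) = @_;

    exists $self->{references} || scalar @_ > 1 || $self->_get_references();

    if ( scalar @_ > 1 ) {
        $self->{references} = $obj;
    }
    return $self->{references};
}

# Stand-in for the database query; counts how often it runs
sub _get_references {
    my ($self) = @_;

    $self->{fetch_count}++;
    $self->references( [ 'Reference A', 'Reference B' ] );
    return;
}

package main;

my $gene   = Sketch::Gene->new();
my @first  = @{ $gene->references() };    # triggers the fetch
my @second = @{ $gene->references() };    # served from the object

print scalar(@first), " references, fetched $gene->{fetch_count} time(s)\n";
```

The second call does not reach _get_references because the hash key already exists, and a caller that sets the value explicitly never triggers the fetch at all.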


Public IDs (e.g. DDB0000000, DSC0000000)

We use one sequence to generate all public identifiers. This makes the number portion of the identifier unique across ALL PUBLIC IDENTIFIERS, and the prefix varies according to what kind of entity the ID refers to. Currently DSC is a strain and DDB is a chromosome, RNA, pseudogene, contig, gap, or GenBank record.

my $id_dbxref = dicty::DBH->get_public_id( 'DSC' );

=head2 get_public_id

 Title    : get_public_id
 Usage    : $feature->get_public_id('DDB');
 Function : Generates a padded identifier from a GLOBAL sequence
 Returns  : Chado::Dbxref object
 Args     : a prefix for the identifier.
          : for example calling with DSC would generate an ID
          : that looks like DSC0007546
          : if no value is passed in then $ENV{'IDENTIFIER_PREFIX'} is used.

=cut

The calling code can then retrieve the accession or the database ID from the returned object:

$id_dbxref->accession();    # for retrieving the ID for display (e.g. DSC0000001)
$id_dbxref->dbxref_id();    # for linking this dbxref in the database
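
The idea of one shared sequence with varying prefixes can be sketched as follows. This is not the get_public_id implementation: the function next_public_id is hypothetical, a plain counter stands in for the database sequence, and the seven-digit padding is assumed from the example IDs above.

```perl
use strict;
use warnings;

{
    # Stand-in for the single global database sequence
    my $next_value = 7546;

    # Hypothetical helper: pads the next sequence value to seven digits
    # and prepends the prefix for the kind of entity being created.
    sub next_public_id {
        my ($prefix) = @_;
        return sprintf( '%s%07d', $prefix, $next_value++ );
    }
}

print next_public_id('DSC'), "\n";    # prints DSC0007546
print next_public_id('DDB'), "\n";    # prints DDB0007547
```

Because both prefixes draw from the same counter, the numeric part never repeats across identifier types.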
