Scholar parser

(highly experimental)

This class parses a Google Scholar profile page to extract publication data and the researcher's citation statistics. The page can be read directly from Google Scholar by supplying the user's profile ID, or the class can be given an HTML file saved from Scholar.
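For reference, a profile ID is the value of the user parameter in a Scholar profile URL. The sketch below builds such a URL from an ID. scholar_profile_url is a hypothetical helper, not part of ScholarProfileParser, and the class may construct its request differently.

```php
<?php
/**
 * Hypothetical helper (not part of ScholarProfileParser): build the public
 * Google Scholar profile URL for a given profile ID.
 */
function scholar_profile_url(string $scholar_id, string $lang = "en"): string
{
    // The ID goes in "user"; "hl" selects the interface language
    return "https://scholar.google.com/citations?" . http_build_query([
        "user" => $scholar_id,
        "hl"   => $lang,
    ], "", "&");
}

// Prints https://scholar.google.com/citations?user=Pm3O_58AAAAJ&hl=en
echo scholar_profile_url("Pm3O_58AAAAJ"), "\n";
```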

Installation

This module requires PHP-PhantomJS, which can be found at http://jonnnnyw.github.io/php-phantomjs/. For easy installation, I recommend using Composer (https://getcomposer.org/) with the included composer.json file, by executing the command

php composer.phar install

Example usage

Below is a very basic example. For a more elaborate one, see the example.php file, which uses a basic caching mechanism so that Scholar is not queried on every page view.

<?php
// Load the parser class and create a new instance
require_once("scholar_profile_parser.class.php");
$parser = new ScholarProfileParser();

// The profile ID to parse (mine in this case): the value of the user= parameter in the profile URL
$scholar_id = "Pm3O_58AAAAJ";

// Read the html from Scholar into a DOM object
$parser->read_html_from_scholar_profile($scholar_id);
// Parse publication data from the DOM
$parser->parse_publications();
// Parse stats from the DOM (h-index, citation count, i10-index)
$parser->parse_stats();

// Print the output
$parser->print_parsed_data_raw();            // Basic output, as stored in JSON
echo $parser->format_publications_in_APA();  // Formatted as an HTML table
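The caching used by example.php is not reproduced here. The following is a minimal sketch of one way to avoid querying Scholar on every page view, assuming a plain file cache with a fixed maximum age is enough. cached_output and the cache file name are hypothetical; the sketch only relies on format_publications_in_APA returning the HTML as a string, as the echo above implies.

```php
<?php
/**
 * Hypothetical file-based cache helper (not part of ScholarProfileParser).
 * Returns the cached string while $cache_file is younger than $max_age
 * seconds; otherwise calls $produce(), stores its result and returns it.
 */
function cached_output(string $cache_file, int $max_age, callable $produce): string
{
    // Drop PHP's cached stat data so filemtime() sees the current file
    clearstatcache(true, $cache_file);

    if (is_file($cache_file) && (time() - filemtime($cache_file)) < $max_age) {
        $cached = file_get_contents($cache_file);
        if ($cached !== false) {
            return $cached; // fresh cache hit: no request to Scholar
        }
    }

    // Cache miss or stale entry: regenerate the output and store it
    $output = (string) $produce();
    file_put_contents($cache_file, $output, LOCK_EX);
    return $output;
}

// Usage with the parser from the example above (cache for one day):
// echo cached_output("scholar_apa.html", 24 * 3600, function () use ($parser, $scholar_id) {
//     $parser->read_html_from_scholar_profile($scholar_id);
//     $parser->parse_publications();
//     return $parser->format_publications_in_APA();
// });
```

A time-based file cache keeps the sketch dependency-free; the trade-off is that new publications appear on the page only after the cache file expires.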

API reference

The documentation for the API can be found here.