UniversityLite Design Overview
I developed UniversityLite as a rapid-deployment e-commerce tool, written in PHP, for marketing university products and information to students over the web. In other words, UniversityLite creates, deploys and maintains university websites automatically using custom PHP functions. As of 2016, the tool has generated and maintains over 7,000 websites, each receiving about 1,000 views a month. The tool is based on the Model-View-Controller (MVC) architectural pattern.
In its simplest form, UniversityLite can be understood as a function, where the input is simply the name of a university and the output is a full and complete website tailored to that university. Therefore the job of the function is to automatically create the database and website code for a full and complete website. The method by which this is accomplished is through the systematic gathering of information and the resulting interpretation and presentation of that information based on complex, but predictable logic.
As the base input for content generation, the tool was fed 7,000 university names that I gathered from the National Center for Education Statistics (NCES).
Let’s look at one of those university names, “Monroe Community College”, as an example to see how the tool generates basic information about the university.
First, the tool must determine basic information about the string, “Monroe Community College”. Since we can reasonably assume the string is the name of a university, the tool develops an array of queries that represent how the university might be referenced on the internet. For example, the tool sees the substring “Community College” and can reasonably determine that the college might also go by its acronym, “MCC”. This logic is used to generate up to four unique names each university might go by, such as “Brockport University” from the string “State University of New York at Brockport”.
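As a rough illustration of this name-variant step, here is a minimal sketch in Python (the tool itself is PHP). The function name `name_variants`, the stop-word list and the two rules shown are assumptions made for the example, not the tool's actual rule set.

```python
import re

# Hypothetical stop words skipped when building an acronym.
STOP_WORDS = {"of", "at", "the", "and", "in"}


def name_variants(name: str, limit: int = 4) -> list[str]:
    """Return up to `limit` unique names a university might go by."""
    variants = [name]

    # Acronym from the significant words,
    # e.g. "Monroe Community College" gives "MCC".
    acronym = "".join(
        word[0].upper() for word in name.split() if word.lower() not in STOP_WORDS
    )
    if len(acronym) > 1:
        variants.append(acronym)

    # "... University ... at <Place>" gives "<Place> University".
    match = re.search(r"\bUniversity\b.*\bat\s+(.+)$", name)
    if match:
        variants.append(f"{match.group(1)} University")

    # Drop duplicates while preserving order, then cap the list.
    return list(dict.fromkeys(variants))[:limit]
```

Each returned variant would then serve as one search term for the API queries that follow.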
Now that the tool has an array of search terms, it can get to work using intelligent functions with several open APIs (such as those of NCES and Wikipedia) to build a SQL database of variables associated with this university. It would be exhausting to go through the process for every variable, as each university name generates thousands of them. Here are some of the obvious ones, along with the techniques used to gather them:
- Preferred subdomain such as “mcc.universitylite.com” (query NCES API for website address and strip out the subdomain, ‘mcc’)
- University Colors such as “Blue and Gold”, as hex values (stripping relevant color names from the wikipedia_infobar_request function and sending them to a color_names_to_hex function).
- City, State and Zip of university, such as “Rochester”, “NY” and “14626” (query NCES API for these values)
…and this process goes on and on to gather such information as demographics, weather, local attractions (such as “Monroe County Museum”), apparel categories (such as “MCC Men’s Pants”), degree programs and product categories based on degree programs (such as “Nursing Supplies”).
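Two of the techniques in the list above can be sketched briefly. This is a Python illustration (the tool itself is PHP). The color table, the body of `color_names_to_hex` and the helper `preferred_subdomain` are assumptions for the example, and the website address in the usage note is a placeholder, not a value returned by NCES.

```python
import re
from urllib.parse import urlparse

# Hypothetical lookup table; a real one would cover many more color names.
COLOR_HEX = {
    "blue": "#0000FF",
    "gold": "#FFD700",
    "red": "#FF0000",
    "white": "#FFFFFF",
    "black": "#000000",
}


def color_names_to_hex(text: str) -> list[str]:
    """Pick known color names out of free text and map them to hex values."""
    words = re.findall(r"[a-z]+", text.lower())
    return [COLOR_HEX[word] for word in words if word in COLOR_HEX]


def preferred_subdomain(website: str) -> str:
    """Return the first host label of a website address, ignoring 'www'."""
    # urlparse only fills in the host when the address has a '//' prefix.
    parsed = urlparse(website if "//" in website else "//" + website)
    labels = [label for label in (parsed.hostname or "").split(".") if label != "www"]
    return labels[0] if labels else ""
```

For instance, `color_names_to_hex("Blue and Gold")` yields the two hex strings for blue and gold, and `preferred_subdomain("http://www.mcc.example.edu/")` yields `mcc`.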
UniversityLite builds the SQL database with these variables, organizing each major category into a relevant table. To streamline access and storage, we break each table into the data, a data dictionary describing the data and a variable dictionary (as indicated by the VARS and DD tables below).
Now that UniversityLite has created our SQL database of a couple thousand variables associated with “Monroe Community College”, we can begin to generate content for Monroe Community College as a university.
For starters, the functions will begin to create the skeleton content for the website, such as colors, titles, logos and basic content. When displaying products, for example men’s apparel, we use variables such as the university’s name, colors, demographics and degree programs to query the Amazon AWS API.
In this example, we determine that the university's total student population is 52% male, which indicates it is fine to go ahead and generate the REST request to Amazon for a male category (as opposed to an all-female university, for example). The variables also show us this university has several sports teams, so we decide to put weight on men’s apparel that is sports related. This weight comes in the form of an array of possible search terms, each given a numerical weight based on the funding its sports team receives (that is, the team's size relative to the university’s overall budget).
The chosen search term from the array, let’s say “MCC Men’s Basketball Activewear”, is used to build an Amazon REST request by referencing a custom REST generator built for Men’s Apparel (to ensure we only return items within a certain scope of Men’s Apparel). This logic is repeated across the range of probabilities we determine from the sports weight function (and other similarly performing functions) to generate a product list which reflects the weight of sports and other variables actually represented by university statistical data.
Once we have finally created our REST request as a string, we pass it to Amazon’s API, which in turn returns an XML document of product information. This is then saved on the UniversityLite server as a local file so the same product string can be served quickly when it is called again (the file is dumped after 24 hours to keep pricing current). Each XML response is between 10MB and 30MB, which can quickly grow to over 10GB of XML data in a couple of days, depending on how actively the site is being used by actual users. Managing this data across 7,000 active sites (not to mention all the other generated data, such as images) requires tightly controlled data purging to ensure the servers that run UniversityLite do so smoothly.
Merging all of the XMLs related to the probability function for sports weight, etc. (using a custom XML Merge function just for this purpose) creates a final XML locally on our server that represents the final collection of information that will be used to display one collection of MCC apparel (in this case, MCC Men’s Apparel).
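The custom XML merge function is not shown here. As a rough idea of what such a merge might involve, the following Python sketch (the tool itself is PHP) combines the product items from several XML documents and skips repeats. The element names `Item`, `ASIN` and `MergedItems`, and the absence of XML namespaces, are assumptions made for the example.

```python
import xml.etree.ElementTree as ET


def merge_product_xml(documents: list[str], item_tag: str = "Item",
                      key_tag: str = "ASIN") -> str:
    """Merge the items of several XML documents into one, skipping repeats."""
    merged = ET.Element("MergedItems")  # hypothetical root element name
    seen: set[str] = set()
    for document in documents:
        root = ET.fromstring(document)
        for item in root.iter(item_tag):
            key = item.findtext(key_tag)
            if key is not None:
                if key in seen:
                    continue  # the same product came back from another query
                seen.add(key)
            merged.append(item)
    return ET.tostring(merged, encoding="unicode")
```

Items are kept in the order the documents are supplied, so documents from higher-weighted search terms can simply be passed first.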
A sampled section of such a merged XML can be seen below.
This type of process is used for a couple dozen different product categories which are used throughout the site. In Monroe Community College’s case, it is used for both men's and women's apparel and merchandise, in addition to several product categories that relate to variables specific to the university.
Of the greatest importance is applying this logic to textbooks a student could purchase. Because we don’t know what textbooks a student might desire, the student's search query becomes the basis for the REST request. The only major difference is that the query is passed to a function expecting book titles or ISBNs specifically. One unique aspect, however, is the inclusion of the UniversityLite Aggresearch™. This is a custom solution I put in place to ensure that when UniversityLite Aggresearch™ is chosen, the very lowest priced book opportunities are returned. The basis for this logic is searching several possibilities from multiple vendors and inserting those into an array that we can sort by lowest price. The result is that UniversityLite Aggresearch™ often returns books at competitive prices versus other book searching algorithms found on the internet.
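The aggregate-and-sort idea behind this search can be sketched as follows, in Python rather than the tool's PHP. The `Offer` type, the vendor callbacks and the decision to skip a failing vendor are assumptions for the example, not a description of the actual Aggresearch™ code.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass(frozen=True)
class Offer:
    vendor: str
    isbn: str
    price_cents: int  # integer cents avoid floating-point rounding in comparisons


def lowest_priced_offers(query: str,
                         vendors: Iterable[Callable[[str], list[Offer]]],
                         limit: int = 10) -> list[Offer]:
    """Query every vendor, pool the offers, and return the cheapest first."""
    offers: list[Offer] = []
    for search in vendors:
        try:
            offers.extend(search(query))
        except Exception:
            continue  # assumption: one failing vendor should not sink the search
    return sorted(offers, key=lambda offer: offer.price_cents)[:limit]
```

Each vendor callback here stands in for a function that turns a title or ISBN into that vendor's REST request and parses the response into offers.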
One other major area of the site that falls into the same category is the ability for students to post and sell their own books. Although the process is intended to look seamless to a user, this area has completely different functionality. The buying and selling mechanism is based on the WPAdverts – Classifieds Plugin. A major change I made to the plugin is the ability to automatically fill in the classified posting with the author, title, ISBN, pictures of the book and suggested prices so the user does not have to. Simply entering either the ISBN or the title will auto-populate all relevant fields. This is a major improvement to the WPAdverts Plugin, and one that makes the plugin unique to UniversityLite. In addition, the e-mail domain variable is used to force users to log in with a valid university email to sell a book. This creates a niche and protected market for students at a particular university, where they know books are being bought and sold by verified students.
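The e-mail domain check might look something like this minimal Python sketch (the tool itself is PHP). The function name is hypothetical, and accepting subdomains of the university's domain is an assumption rather than documented behavior.

```python
def is_university_email(email: str, email_domain: str) -> bool:
    """Check that an address belongs to the university's e-mail domain."""
    local, separator, domain = email.strip().rpartition("@")
    if not separator or not local or not domain:
        return False
    domain = domain.lower()
    email_domain = email_domain.lower()
    # Assumption: subdomains such as "student.<domain>" also count as valid.
    return domain == email_domain or domain.endswith("." + email_domain)
```

Requiring the leading dot in the suffix check keeps lookalike domains that merely end with the same characters from passing.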
Aside from product generation, UniversityLite also determines useful characteristics about the university that are used to create community pages that update automatically. One such page displays recent YouTube sports videos associated with the university, while another displays recent videos from university students based on a geolocation search against the YouTube API. In the former case, the sports team used for the YouTube API search is determined using the Wikipedia API; in the latter, the geolocation is based on an NCES API call where we query the longitude and latitude. The resulting geolocation is passed to the code snippet below, which in turn builds a list of YouTube videos we can insert.
Another major component of the generated site is the dozens of articles, reports and graphs that are automatically “written” to display information about the university. This unique content is used to drive traffic and interest to the site. A sample of such an article can be seen below.
The many graphs for the article are created using several different methods, most of which use the NCES and Wikipedia variable results now in the SQL Database. In the sample below, we iterate through available variables to build the foundation for a JPEG graph that will be displayed on the site.
A sentence within the article might follow logic something like this code snippet below.
UniversityLite is load balanced across three Ubuntu 14.04 LTS virtual servers located in New York, NY, with one backup server in Webster, NY. The three primary servers are balanced alphabetically by university name. There are 7,000 websites available as subdomains of the universitylite.com domain (such as rit.universitylite.com). Each subdomain is given resources by the server only while it is actively being viewed. As seen in the image below, an arbitrary university named ‘David’s University College’ would be hosted on Server1. Because the functions and algorithms to create any of the A-Z universities reside on each of the servers, the balancing is done for resource balancing, not necessarily content balancing.
Because of the large set of active subdomains, load balancing through DNS would be cumbersome and time consuming, as partial wildcards (such as a*.universitylite.com) in a DNS zone are not defined behavior within the RFC. Manually adding a record for rit.universitylite.com, for example, would also require us to add a manual record for the 6,999 other subdomains. Therefore a DNS zone can only practically be used with a full wildcard such as *.universitylite.com pointing to one of the universitylite.com servers. As a result, load balancing is handled first by the Apache virtualhost file, and then by PHP logic once the subdomain is called.
As seen in the function below, we can query the SQL table once a subdomain is passed to the UniversityLite webserver. If the subdomain is found as a SQL entry, we can go ahead and assume the server referenced is the correct load-balanced server. If it isn’t, then either a) it is load balanced on one of the other servers or b) it is simply not a valid subdomain anywhere on the site. For example, trying to go to gaboldygooky.universitylite.com will fail all load balancing checks and return an error to the browser.
In this way, I have created a dynamic load balancing system that allows sites to be added and removed ad hoc without having to alter DNS. Of course, this is just general domain load balancing, and we must further balance the content itself. There are, after all, 7,000 different subdomains with uniquely generated HTML, photos, graphs, etc. The easiest way I found to handle this is to have UniversityLite be one central source of programming code, and to allow other content to be displayed dynamically when required. That is not to say, however, that each site can be dynamically created each time it is viewed, but we can remove some redundancy from the equation.
Below are areas of daily generated content that can be shared between subdomains, on the assumption that the same item (a calculator and its description, for example) might be viewed by more than one subdomain. This data alone amounts to about 10GB of freshly generated content in a 24-hour period for the sites on server1. Had we not consolidated this data, it could easily have grown to 23TB in just 24 hours on one server.
Directory Listing of ~/www/universitylite.com/public_html/sites/shared$:
drwxr-xr-x 2 maiolo99 maiolo99 4096 Oct 20 16:49 adpics
drwxr-xr-x 7 maiolo99 maiolo99 4096 Oct 20 21:29 amazon_xml
drwxr-xr-x 2 maiolo99 maiolo99 4096 Sep 23 18:07 apparel_images
drwxrwx--- 2 maiolo99 maiolo99 4096 Sep 18 16:54 avatars
drwxrwx--- 2 maiolo99 maiolo99 4096 Jul 26 23:29 holidays
drwxr-xr-x 356 maiolo99 maiolo99 20480 Oct 8 00:30 ms_images
drwxr-xr-x 2 maiolo99 maiolo99 7966720 Oct 21 16:51 product_images
However, sometimes there is only so much we can do. As seen in the directory listing below, these files are created uniquely for one of the subdomains (mcc.universitylite.com). This information is entirely unique to Monroe Community College, and there is nothing we can really do to share it.
Directory Listing of ~/www/universitylite.com/public_html/sites/MCCTextbooks:
graph_MCC_All Instructional Staff Total_1473662381.jpg
graph_MCC_All Instructional Staff Total_1473662723.jpg
graph_MCC_All Instructional Staff Total_1473722731.jpg
graph_MCC_All Instructional Staff Total_1473773589.jpg
graph_MCC_All Instructional Staff Total_1473791287.jpg
graph_MCC_All Instructional Staff Total_1473813691.jpg
graph_MCC_All Instructional Staff Total_1474227934.jpg
graph_MCC_All Instructional Staff Total_1474457249.jpg
graph_MCC_All Instructional Staff Total.jpg
The best we can do with this data is purge it (beyond what we might want to keep for historical purposes). We do this by adding historically usable data into a SQL table for that university, and simply deleting the rest.
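One way to pick out the files to purge is sketched below, in Python rather than the tool's PHP. It assumes the numeric suffix in file names such as `graph_MCC_All Instructional Staff Total_1473662381.jpg` is a 10-digit Unix timestamp, and that the file without a suffix is the current graph and must be kept. The function name and the `keep_latest` parameter are hypothetical.

```python
import re
from pathlib import Path

# Matches "<stem>_<10-digit timestamp>.jpg"; the un-timestamped file does not match.
SNAPSHOT = re.compile(r"^(?P<stem>graph_.+)_(?P<ts>\d{10})\.jpg$")


def stale_snapshots(directory: Path, keep_latest: int = 1) -> list[Path]:
    """List timestamped graph snapshots that are safe to delete."""
    groups: dict[str, list[tuple[int, Path]]] = {}
    for path in directory.iterdir():
        match = SNAPSHOT.match(path.name)
        if match:
            groups.setdefault(match["stem"], []).append((int(match["ts"]), path))

    stale: list[Path] = []
    for snapshots in groups.values():
        snapshots.sort(key=lambda snapshot: snapshot[0], reverse=True)  # newest first
        stale.extend(path for _, path in snapshots[keep_latest:])
    return sorted(stale)
```

The caller would record whatever is worth keeping from each stale file in the university's SQL table before calling `unlink()` on it.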
The load balancing techniques used on UniversityLite have allowed us to maintain 7,000 unique websites on only three servers.
Now that I have the groundwork for this structure in place, I am already applying it to other markets such as automobile parts and niche Hawaiian products.
For more information, or a closer demo of my work or code, please feel free to contact me.