Reference
This is the Web API code reference, detailing the classes, methods, parameters, and attributes of each part of the application.
.
├── app
│ ├── adapters/
│ │ ├── factories/
│ │ ├── gateway/
│ │ ├── helpers/
│ │ └── presenters/
│ ├── core/
│ │ ├── common/
│ │ ├── config/
│ │ ├── data/
│ │ ├── domain/
│ │ └── usecases/
│ ├── framework/
│ │ ├── dependencies/
│ │ ├── exceptions/
│ │ ├── fastapi/
│ │ └── middleware/
│ └── utils/
SearchParams
Type validator for API Key and Keywords search params
Source code: `app/core/common/types.py` (module `core.common`)
A model validator built on Pydantic `BaseModel` for the API Key and Keywords parameters of the Scopus Search API search, validating their types and values with the Pydantic `Field()` function.
Note
Read more about the API Key and Keywords rules and specifications in the requirements section.
Parameter | Type | Description |
---|---|---|
api_key | str | API Key search parameter |
keywords | Keywords | Keywords search parameter |
ScopusResult
Serializer for each item of the `entry` field in the response JSON schema
Source code: `app/core/data/serializers.py` (module `core.data`)
A model serializer built on Pydantic `BaseModel` for the `entry` items of the Scopus Search API response, parsing them with the Pydantic `Field()` function.
Parameter | Type | JSON Field | Description |
---|---|---|---|
link | str | @_fa | Top-level navigation links |
url | str | prism:url | Content Abstract Retrieval API URI |
scopus_id | str | dc:identifier | Article Scopus ID |
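As a rough illustration of how the JSON field names in the table map onto attributes, here is a standard-library sketch. It is not the project's Pydantic model: the `from_entry()` helper and the `ALIASES` mapping are hypothetical, the sample identifiers are made up, and the `link` field is left out.

```python
from dataclasses import dataclass
from typing import Any, Optional

# JSON field name -> attribute name, following the table above
ALIASES = {"prism:url": "url", "dc:identifier": "scopus_id"}


@dataclass
class ScopusResult:
    """Hypothetical standard-library stand-in for the Pydantic serializer."""

    url: Optional[str] = None
    scopus_id: Optional[str] = None

    @classmethod
    def from_entry(cls, entry: dict[str, Any]) -> "ScopusResult":
        # Keep only the aliased JSON fields and rename them; missing ones become None
        return cls(**{attr: entry.get(key) for key, attr in ALIASES.items()})


item = ScopusResult.from_entry({
    "prism:url": "https://api.elsevier.com/content/abstract/scopus_id/85000000000",
    "dc:identifier": "SCOPUS_ID:85000000000",
    "dc:title": "Not mapped, so it is ignored",
})
print(item.scopus_id)  # SCOPUS_ID:85000000000
```

Fields absent from the entry simply stay `None`, mirroring the "default to null" behavior the serializers are described as having.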
ScopusSearch
Serializer for the Scopus Search API response JSON schema
Source code: `app/core/data/serializers.py` (module `core.data`)
class ScopusSearch(
total_results: int = Field(),
items_per_page: int = Field(),
entry: list[ScopusResult] = Field()
)
A model serializer built on the Pydantic `BaseModel` for the Scopus Search API JSON response, parsing it with the Pydantic `Field()` function.
Before validation, it accesses the `search-results` JSON field to flatten the hierarchy and get the actual data.
Info
Flattening is the process of transforming a nested JSON data structure into a single level of key-value pairs.
Parameter | Type | JSON Field | Description |
---|---|---|---|
total_results | int | opensearch:totalResults | Total number of articles found |
items_per_page | int | opensearch:itemsPerPage | Number of articles per page |
entry | list[ScopusResult] | entry | List of article data with the fields specified in the search |
Note
Read more about the returned JSON body and its fields.
pages_count
Calculates the number of pages by dividing the total results by the number of items per page and rounding up with the `math.ceil()` function, returning the smallest `int` greater than or equal to the quotient.
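The flattening step and `pages_count` can be sketched with a dataclass standing in for the Pydantic model. This is an approximation under stated assumptions: `from_response()` is a hypothetical helper, `entry` is kept as raw dicts instead of `ScopusResult` objects, and the zero-division guard exists only in this sketch. With 53 results and 25 items per page, 53 / 25 = 2.12 rounds up to 3 pages.

```python
import math
from dataclasses import dataclass, field
from typing import Any


@dataclass
class ScopusSearch:
    """Hypothetical standard-library stand-in for the Pydantic serializer."""

    total_results: int
    items_per_page: int
    entry: list[dict[str, Any]] = field(default_factory=list)

    @classmethod
    def from_response(cls, body: dict[str, Any]) -> "ScopusSearch":
        # Flatten: the actual data sits one level down, under "search-results"
        data = body["search-results"]
        return cls(
            total_results=int(data["opensearch:totalResults"]),
            items_per_page=int(data["opensearch:itemsPerPage"]),
            entry=data.get("entry", []),
        )

    @property
    def pages_count(self) -> int:
        if self.items_per_page == 0:  # guard added for this sketch only
            return 0
        # Round up so a partially filled last page still counts as a page
        return math.ceil(self.total_results / self.items_per_page)


search = ScopusSearch.from_response({
    "search-results": {
        "opensearch:totalResults": "53",
        "opensearch:itemsPerPage": "25",
        "entry": [],
    }
})
print(search.pages_count)  # 3
```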
ScopusQuotaRateLimit
Serializer for Scopus API responses
Source code: `app/core/data/serializers.py` (module `core.data`)
class ScopusQuotaRateLimit(
reset: float = Field(),
status: str = Field(),
error_code: str = Field()
)
A model serializer built on Pydantic `BaseModel` for the Scopus API responses, parsing them with the Pydantic `Field()` function.
Before validation, it retrieves the response headers and, if present, the `error-response` field of the JSON response.
Parameter | Type | Response Field | Description |
---|---|---|---|
reset | float | X-RateLimit-Reset | Date/Time in Epoch seconds when API quota resets |
status | str | X-ELS-Status | Elsevier server/Scopus API status |
error_code | str | error-code | Elsevier server/Scopus API error code |
Info
Epoch is the number of seconds that have elapsed since January 1, 1970, also known as Unix time.
reset_datetime
Converts the epoch timestamp from the quota reset header to a `datetime`, formats it, and returns it as a more readable `str` telling when the API request quota will be reset.
quota_exceeded
Checks the value of the response status header, returning `True` if it equals `QUOTA_EXCEEDED - Quota Exceeded` and `False` otherwise.
Note
Learn more about the API request quota limit.
rate_limit_exceeded
Checks the value of the response error code field, returning `True` if it equals `RATE_LIMIT_EXCEEDED` and `False` otherwise.
Note
Learn more about the API request throttling rate limit.
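The three properties above can be approximated as follows. The sketch assumes header values arrive as strings, uses a plain dict where a real HTTP client would provide a case-insensitive header mapping, and renders the reset time in UTC with a `DD-MM-YYYY HH:MM:SS` format; the project's actual format and time zone are not specified here, and `from_response()` is a hypothetical helper.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Any, Mapping, Optional


@dataclass
class ScopusQuotaRateLimit:
    """Hypothetical standard-library stand-in for the Pydantic serializer."""

    reset: Optional[float] = None
    status: Optional[str] = None
    error_code: Optional[str] = None

    @classmethod
    def from_response(cls, headers: Mapping[str, str],
                      body: Mapping[str, Any]) -> "ScopusQuotaRateLimit":
        # The error code, when present, is nested under "error-response"
        error = body.get("error-response") or {}
        reset = headers.get("X-RateLimit-Reset")
        return cls(
            reset=float(reset) if reset is not None else None,
            status=headers.get("X-ELS-Status"),
            error_code=error.get("error-code"),
        )

    @property
    def reset_datetime(self) -> Optional[str]:
        # Epoch seconds -> aware datetime (UTC assumed) -> readable string
        if self.reset is None:
            return None
        moment = datetime.fromtimestamp(self.reset, tz=timezone.utc)
        return moment.strftime("%d-%m-%Y %H:%M:%S UTC")

    @property
    def quota_exceeded(self) -> bool:
        return self.status == "QUOTA_EXCEEDED - Quota Exceeded"

    @property
    def rate_limit_exceeded(self) -> bool:
        return self.error_code == "RATE_LIMIT_EXCEEDED"


limits = ScopusQuotaRateLimit.from_response(
    {"X-RateLimit-Reset": "1700000000", "X-ELS-Status": "QUOTA_EXCEEDED - Quota Exceeded"},
    {},
)
print(limits.reset_datetime)  # 14-11-2023 22:13:20 UTC
print(limits.quota_exceeded)  # True
```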
ScopusAbstract
Serializer for the Scopus Abstract Retrieval API response JSON schema
Source code: `app/core/data/serializers.py` (module `core.data`)
class ScopusAbstract(
url: str = Field(),
scopus_id: str = Field(),
authors: str = Field(),
title: str = Field(),
publication_name: str = Field(),
abstract: str = Field(),
date: str = Field(),
eid: str = Field(),
doi: str = Field(),
volume: str = Field(),
citations: str = Field()
)
A model serializer built on the Pydantic `BaseModel` for the Scopus abstracts of the articles in the JSON response, parsing them with the Pydantic `Field()` function and setting to `null` any fields that are not returned.
Before validation, the hierarchy is flattened to get the actual data. First, the `abstracts-retrieval-response` JSON field is accessed; then the `authors` field is set from the `author` JSON field, taken from the `authors` JSON field if it is returned, or from the `dc:creator` JSON field otherwise.
Info
Flattening is the process of transforming a nested JSON data structure into a single level of key-value pairs.
Additionally, the author names are taken from the `ce:indexed-name` JSON field of the author data and concatenated into a single string. Finally, the `coredata` JSON field is accessed and updated with the author data before it is returned.
When serialized to a `dict`, the `date` field, when not `null`, is formatted as DD-MM-YYYY.
Parameter | Type | JSON Field | Description |
---|---|---|---|
url | str | link ref=scopus | Scopus article preview page URL |
scopus_id | str | dc:identifier | Article Scopus ID |
authors | str | authors or dc:creator | Complete author list or only the first author |
title | str | dc:title | Article title |
publication_name | str | prism:publicationName | Source title |
abstract | str | dc:description | Article complete abstract |
date | str | prism:coverDate | Publication date |
eid | str | eid | Article Electronic ID |
doi | str | prism:doi | Document Object Identifier |
volume | str | prism:volume | Identifier for a serial publication |
citations | str | citedby-count | Cited-by count |
Note
Read more about the returned fields in the Scopus Search Views documentation.
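The pre-validation flattening and the date formatting described for `ScopusAbstract` can be sketched with plain functions. Several details are assumptions: `flatten_abstract()` and `format_date()` are hypothetical names, `dc:creator` is assumed to sit inside `coredata`, names are joined with `"; "`, a single author object is accepted as well as a list, and `prism:coverDate` is assumed to use the `YYYY-MM-DD` format.

```python
from datetime import datetime
from typing import Any, Optional


def flatten_abstract(body: dict[str, Any]) -> dict[str, Any]:
    """Hypothetical sketch of the pre-validation flattening step."""
    response = body["abstracts-retrieval-response"]
    coredata = dict(response.get("coredata") or {})
    # Prefer the complete author list; otherwise fall back to dc:creator (first author)
    source = response.get("authors") or coredata.get("dc:creator") or {}
    authors = source.get("author") or []
    if isinstance(authors, dict):  # assumption: a lone author may not be wrapped in a list
        authors = [authors]
    names = [a["ce:indexed-name"] for a in authors if a.get("ce:indexed-name")]
    coredata["authors"] = "; ".join(names) or None  # None when no names were found
    return coredata


def format_date(cover_date: Optional[str]) -> Optional[str]:
    # prism:coverDate (YYYY-MM-DD) -> DD-MM-YYYY, leaving null values untouched
    if cover_date is None:
        return None
    return datetime.strptime(cover_date, "%Y-%m-%d").strftime("%d-%m-%Y")


record = flatten_abstract({
    "abstracts-retrieval-response": {
        "coredata": {"dc:title": "Example title", "prism:coverDate": "2021-03-15"},
        "authors": {"author": [
            {"ce:indexed-name": "Silva J."},
            {"ce:indexed-name": "Souza M."},
        ]},
    }
})
print(record["authors"])                       # Silva J.; Souza M.
print(format_date(record["prism:coverDate"]))  # 15-03-2021
```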
AccessToken
Get and validate the Access Token
Source code: `app/framework/dependencies/access_token.py` (module `framework.dependencies`)
A route dependency that implements the `__call__` method, creating a callable instance that obtains and validates the Access Token header via the FastAPI `Header()` function or the request.
For extra security, the application automatically generates a Token that is passed to the application's API web page, which in turn sends it in the request header for validation.
Parameter | Type | Description |
---|---|---|
request | Request | The FastAPI Request object |
access_token | str or None | Request Token header descriptor and validator. Default: None |
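A minimal sketch of the token mechanism, written without FastAPI so it can run on its own. The token length and the use of `secrets.compare_digest()` are choices made for this example, not confirmed details of the project; a real dependency would presumably reject an invalid token with the application's 401 Unauthorized exception rather than return `False`.

```python
import secrets
from typing import Optional


class AccessToken:
    """Hypothetical standard-library sketch of the token dependency."""

    def __init__(self) -> None:
        # Generated once at startup, then handed to the API web page
        self.token = secrets.token_urlsafe(32)

    def __call__(self, access_token: Optional[str] = None) -> bool:
        if access_token is None:
            return False
        # Constant-time comparison so timing does not reveal how much of the token matched
        return secrets.compare_digest(access_token.encode(), self.token.encode())


check = AccessToken()
print(check(check.token))      # True
print(check("not-the-token"))  # False
```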
QueryParams
Get and validate the Query Params
Source code: `app/framework/dependencies/query_params.py` (module `framework.dependencies`)
class QueryParams(
request: Request,
api_key: Annotated[str | None, APIKeyQuery] = None,
keywords: Annotated[Keywords | None, KeywordsQuery] = None
)
A route dependency that implements the `__call__` method, creating a callable instance that obtains and validates the API Key and Keywords query parameters via the FastAPI `Query()` function or the request.
Parameter | Type | Description |
---|---|---|
request | Request | The FastAPI Request object |
api_key | str or None | Request API Key Query Param descriptor and validator. Default: None |
keywords | Keywords or None | Request Keywords Query Param descriptor and validator. Default: None |
equals
Compares the instance's API Key and Keywords with another API Key and Keywords, returning `True` if they are equal and `False` otherwise.
Parameter | Type | Description |
---|---|---|
api_key | str | API Key to compare against |
keywords | list[str] | Keywords to compare against |
to_dict
Serializes the API Key and Keywords instance attributes as a `dict`.
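The `equals` and `to_dict` methods can be approximated with a dataclass. This sketch skips the FastAPI `Query()` validation entirely and assumes that keyword order matters when comparing, which the description above does not specify.

```python
from dataclasses import asdict, dataclass
from typing import Any, Optional


@dataclass
class QueryParams:
    """Hypothetical sketch without FastAPI's Query() validation."""

    api_key: Optional[str] = None
    keywords: Optional[list[str]] = None

    def equals(self, api_key: str, keywords: list[str]) -> bool:
        # Order-sensitive comparison (assumption)
        return self.api_key == api_key and self.keywords == keywords

    def to_dict(self) -> dict[str, Any]:
        return asdict(self)


params = QueryParams(api_key="abc123", keywords=["machine learning", "health"])
print(params.equals("abc123", ["machine learning", "health"]))  # True
print(params.to_dict())  # {'api_key': 'abc123', 'keywords': ['machine learning', 'health']}
```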
HTTPRetryHelper
Make HTTP requests with throttling and retry mechanisms
Source code: `app/adapters/helpers/http_retry_helper.py` (module `adapters.helpers`)
An HTTP client for making requests with the following mechanisms:
- Throttling: controls the rate of data flow into a service by limiting the number of API requests a user can make in a given period.
- Retry: automatically retries failed operations to recover from unexpected failures and keep functioning correctly.
- Rate Limiting: limits network traffic by controlling the number of requests that can be made within a given period.
- Session: persists certain parameters and reuses the same connection across all requests.
- Cache: temporarily stores data so that future requests for that data can be served more quickly.
Parameter | Type | Description |
---|---|---|
for_search | bool | Indicates in the log message whether the request is directed to the Scopus Search API. Default: None |
mount_session
Initializes the session and mounts it by registering the cache-control connection adapter with the retry configuration.
Parameter | Type | Description |
---|---|---|
headers | Headers | The HTTP headers to send in the request |
close
Closes the cache-control connection adapter and session.
request
Initializes the request, prepares it with the session, sends it, and returns the response as a Requests `Response` object.
Parameter | Type | Description |
---|---|---|
url | str | The URL to send the request to |
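Two of the mechanisms listed for `HTTPRetryHelper`, throttling and retrying, can be illustrated without third-party packages. The sketch is hypothetical: `RetryThrottle`, its parameters, the exponential backoff, and retrying only on `ConnectionError` are assumptions, and the session and cache, which the helper gets from its cache-control connection adapter, are left out. The `sleep` parameter is injectable so the behavior can be checked without waiting.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RetryThrottle:
    """Hypothetical sketch: throttling plus retries with exponential backoff."""

    def __init__(self, min_interval: float = 0.1, retries: int = 3,
                 backoff: float = 0.5,
                 sleep: Callable[[float], None] = time.sleep) -> None:
        self.min_interval = min_interval  # throttling: minimum gap between requests
        self.retries = retries            # retry: extra attempts after a failure
        self.backoff = backoff            # first retry delay, doubled each time
        self.sleep = sleep
        self._last_call = float("-inf")

    def _throttle(self) -> None:
        # Wait until min_interval seconds have passed since the previous request
        wait = self.min_interval - (time.monotonic() - self._last_call)
        if wait > 0:
            self.sleep(wait)
        self._last_call = time.monotonic()

    def request(self, send: Callable[[], T]) -> T:
        attempt = 0
        while True:
            self._throttle()
            try:
                return send()
            except ConnectionError:
                if attempt >= self.retries:
                    raise  # out of attempts: propagate the last error
                self.sleep(self.backoff * 2 ** attempt)
                attempt += 1


delays: list[float] = []
client = RetryThrottle(min_interval=0.0, sleep=delays.append)
calls = {"count": 0}


def flaky() -> str:
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("temporary failure")
    return "ok"


print(client.request(flaky), delays)  # ok [0.5, 1.0]
```

A request that fails twice before succeeding waits 0.5 s and then 1.0 s between attempts, because the delay doubles on each retry.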
URLBuilderHelper
Generate and format URLs for HTTP requests
Source code: `app/adapters/helpers/url_builder_helper.py` (module `adapters.helpers`)
A builder that generates the Scopus API resource URLs and the pagination URL.
get_search_url
Generates the Scopus Search API resource URL and returns it as a `str`.
Parameter | Type | Description |
---|---|---|
keywords | Keywords | The keywords that will be used in the search |
get_pagination_url
Generates the Scopus Search API pagination URL and returns it as a `str`.
Parameter | Type | Description |
---|---|---|
page | int | The page index for the start of pagination |
get_article_page_url
Generates the Scopus Abstract Retrieval API resource URL and returns it as a `str`.
Parameter | Type | Description |
---|---|---|
url | str | The Scopus Abstract Retrieval API base resource URL |
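For illustration, the search and pagination URLs could be built with `urllib.parse.urlencode()`. The base URL is the public Scopus Search API endpoint, and `query`, `apiKey`, `count` and `start` are Scopus Search API parameters; the `TITLE-ABS-KEY(...) AND ...` query shape, the page size of 25, and passing the search URL into `get_pagination_url()` are assumptions of this sketch rather than the helper's actual behavior.

```python
from urllib.parse import parse_qs, urlencode, urlparse

SEARCH_API = "https://api.elsevier.com/content/search/scopus"
PAGE_SIZE = 25  # assumed number of items per page


def get_search_url(api_key: str, keywords: list[str]) -> str:
    # Assumed query shape: every keyword must appear in the title, abstract or keywords
    query = " AND ".join(f'TITLE-ABS-KEY("{word}")' for word in keywords)
    params = {"query": query, "apiKey": api_key, "count": PAGE_SIZE}
    return f"{SEARCH_API}?{urlencode(params)}"


def get_pagination_url(search_url: str, page: int) -> str:
    # "start" is the zero-based offset of the first result on the requested page
    return f"{search_url}&{urlencode({'start': page * PAGE_SIZE})}"


url = get_search_url("MY_API_KEY", ["deep learning", "covid"])
print(parse_qs(urlparse(url).query)["query"])
# ['TITLE-ABS-KEY("deep learning") AND TITLE-ABS-KEY("covid")']
print(parse_qs(urlparse(get_pagination_url(url, 2)).query)["start"])  # ['50']
```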
ScopusSearchAPI
Search and retrieve articles via the Scopus Search API
Source code: `app/adapters/gateway/scopus_search_api.py` (module `adapters.gateway`)
First, the request headers for the Scopus API are built with the API Key, the resource URL is built with the API Key and Keywords as search parameters, and the articles are searched via the Scopus Search API. The response is then validated, retrieving the articles if successful or handling errors otherwise.
An error is returned when no articles are found, the API Key quota is exceeded, the Scopus Search API returns an HTTP status error, or the JSON response cannot be validated.
Note
Read more about the quota that limits how much data an API Key can retrieve.
The articles data is validated, defaulting to `null` for fields that are not returned. When there are multiple pages of articles to fetch, it may use threads with the `ThreadPoolExecutor`, building each URL with the page index.
Parameter | Type | Description |
---|---|---|
http_helper | HttpRetry | Injects HTTPRetryHelper to make the requests |
url_builder | UrlBuilder | Injects URLBuilderHelper to build the URLs |
search_articles
Searches for articles via the Scopus Search API, then compiles and returns all retrieved data as a `list` of `ScopusResult`.
Parameter | Type | Description |
---|---|---|
search_params | SearchParams | Validated API Key and Keywords search parameters |
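The pagination strategy can be sketched independently of HTTP: fetch the first page, compute the number of pages, then fetch the rest on a thread pool. `search_all_pages()` and the `fetch_page` callable are hypothetical, the fake fetcher only simulates the API, and `LookupError` stands in for the application's own not-found error.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable

Entry = dict[str, Any]
# fetch_page(page) -> (total_results, items_per_page, entries of that page)
FetchPage = Callable[[int], tuple[int, int, list[Entry]]]


def search_all_pages(fetch_page: FetchPage, max_workers: int = 4) -> list[Entry]:
    """Hypothetical sketch: fetch page 0, then the remaining pages on a thread pool."""
    total, per_page, first = fetch_page(0)
    if total == 0 or per_page == 0:
        raise LookupError("No articles found")  # stand-in for the app's error
    results = list(first)
    pages = -(-total // per_page)  # ceiling division, same result as math.ceil
    if pages > 1:
        with ThreadPoolExecutor(max_workers=max_workers) as pool:
            # map() yields results in page order, whatever order the threads finish in
            for _, _, entries in pool.map(fetch_page, range(1, pages)):
                results.extend(entries)
    return results


DATA = [{"dc:identifier": f"SCOPUS_ID:{i}"} for i in range(53)]


def fake_fetch(page: int) -> tuple[int, int, list[Entry]]:
    # Simulates the API: 25 items per page out of 53 results
    start = page * 25
    return len(DATA), 25, DATA[start:start + 25]


articles = search_all_pages(fake_fetch)
print(len(articles))  # 53
```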
ScopusAbstractRetrievalAPI
Retrieves Scopus abstracts via the Scopus Abstract Retrieval API
Source code: `app/adapters/gateway/scopus_abstract_retrieval_api.py` (module `adapters.gateway`)
First, the request headers for the Scopus API are built with the API Key, the resource URL is built from the base resource URL, and the Scopus abstracts are retrieved via the Scopus Abstract Retrieval API. The response is then validated, retrieving the abstracts if successful or handling errors otherwise.
An error is returned when the API Key quota is exceeded, the Scopus Abstract Retrieval API returns an HTTP status error, or the JSON response cannot be validated.
The abstracts data is validated, defaulting to `null` for fields that are not returned. When there are multiple abstracts to retrieve, it can use threads with the `ThreadPoolExecutor`.
Parameter | Type | Description |
---|---|---|
http_helper | HttpRetry | Injects HTTPRetryHelper to make the requests |
url_builder | UrlBuilder | Injects URLBuilderHelper to build the URLs |
retrieve_abstracts
Retrieves Scopus abstracts via the Scopus Abstract Retrieval API, then compiles and returns all fetched data as a Pandas `DataFrame`.
Parameter | Type | Description |
---|---|---|
api_key | str | Validated API Key search parameter |
entry | list[ScopusResult] | List of articles data |
ArticlesSimilarityFilter
Filter articles from identical authors with similar titles
Source code: `app/core/usecases/articles_similarity_filter.py` (module `core.usecases`)
From the `DataFrame` containing all the article information already gathered, the authors are counted and those appearing at least twice are selected. Then, the titles of each selected author's articles are compared with one another using the TheFuzz `ratio()` function, and those with a similarity of at least 80% are grouped together.
Note
After consideration and testing, we set the similarity ratio for article selection at 80%.
For each group of similar articles, the first one is kept and the rest are discarded. If all the authors are unique, meaning none are repeated, or no similar titles are found, the same `DataFrame` is returned.
filter
Filters out articles in the `DataFrame` that are from identical authors with similar titles, and returns the remaining data as a Pandas `DataFrame`.
Parameter | Type | Description |
---|---|---|
dataframe | DataFrame | The DataFrame containing all the gathered article information to be filtered |
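A list-of-dicts sketch of the filtering steps described for `ArticlesSimilarityFilter`. TheFuzz is swapped here for the standard library's `difflib.SequenceMatcher`, scaled to 0-100, so scores may differ slightly from TheFuzz `ratio()`; `filter_similar()` and the `authors`/`title` keys are assumptions, and the real use case works on a Pandas DataFrame.

```python
from collections import Counter
from difflib import SequenceMatcher

THRESHOLD = 80  # minimum similarity percentage, as chosen by the project


def ratio(a: str, b: str) -> int:
    # Stand-in for TheFuzz ratio(): difflib similarity scaled to 0-100
    return round(SequenceMatcher(None, a, b).ratio() * 100)


def filter_similar(articles: list[dict]) -> list[dict]:
    counts = Counter(article["authors"] for article in articles)
    repeated = {author for author, n in counts.items() if n >= 2}
    if not repeated:
        return articles  # every author is unique: return the input unchanged
    discarded: set[int] = set()
    for i, first in enumerate(articles):
        if i in discarded or first["authors"] not in repeated:
            continue
        for j in range(i + 1, len(articles)):
            other = articles[j]
            if j in discarded or other["authors"] != first["authors"]:
                continue
            if ratio(first["title"], other["title"]) >= THRESHOLD:
                discarded.add(j)  # keep the earliest article, drop later similar ones
    return [article for k, article in enumerate(articles) if k not in discarded]


articles = [
    {"authors": "Silva J.", "title": "Deep learning for early cancer detection"},
    {"authors": "Silva J.", "title": "Deep learning for early cancer detection: a review"},
    {"authors": "Silva J.", "title": "Blockchain applications in supply chains"},
    {"authors": "Souza M.", "title": "Deep learning for early cancer detection: a review"},
]
print([a["title"] for a in filter_similar(articles)])
```

Only the second article is dropped: it shares the author of the first and scores 89, while the identical title by a different author is kept.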
ScopusArticlesAggregator
Gathers, filters and compiles data from Scopus articles
Source code: `app/core/usecases/scopus_articles_aggregator.py` (module `core.usecases`)
class ScopusArticlesAggregator(
search_api: SearchAPI,
abstract_api: AbstractAPI,
similarity_filter: SimilarityFilter
)
First, articles are searched via the Scopus Search API using the provided search parameters, and their Scopus abstracts are retrieved via the Scopus Abstract Retrieval API.
Next, exact duplicate articles are removed, articles with the same authors and title are also discarded, and similar articles are filtered out using `ArticlesSimilarityFilter`.
An error is returned when no articles are found.
Parameter | Type | Description |
---|---|---|
search_api | SearchAPI | Injects ScopusSearchAPI to search and get the articles via the Scopus Search API |
abstract_api | AbstractAPI | Injects ScopusAbstractRetrievalAPI to retrieve the Scopus abstracts via the Scopus Abstract Retrieval API |
similarity_filter | SimilarityFilter | Injects ArticlesSimilarityFilter to filter articles by identical authors with similar titles |
get_articles
Gathers and filters data from Scopus articles, writes all remaining articles to a CSV file, and returns it as a FastAPI `FileResponse` object.
Parameter | Type | Description |
---|---|---|
params | SearchParams | The validated API Key and Keywords search parameters |
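The duplicate removal and CSV writing steps of `ScopusArticlesAggregator` can be approximated with the standard library. This is a hypothetical sketch: `drop_duplicates()` and `to_csv()` are illustrative names, articles sharing the same authors and title are dropped (which also covers exact duplicates), the CSV is returned as a string instead of a `FileResponse`, and the similarity filtering step is not repeated here.

```python
import csv
import io


def drop_duplicates(articles: list[dict]) -> list[dict]:
    """Hypothetical sketch: keep the first article for each (authors, title) pair."""
    seen: set[tuple] = set()
    unique = []
    for article in articles:
        key = (article.get("authors"), article.get("title"))
        if key not in seen:  # exact duplicates share this key too
            seen.add(key)
            unique.append(article)
    return unique


def to_csv(articles: list[dict]) -> str:
    # The real use case writes a file and returns a FileResponse; a string stands in here
    if not articles:
        raise LookupError("No articles found")
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(articles[0]))
    writer.writeheader()
    writer.writerows(articles)
    return buffer.getvalue()


rows = drop_duplicates([
    {"authors": "Silva J.", "title": "T1", "date": "15-03-2021"},
    {"authors": "Silva J.", "title": "T1", "date": "15-03-2021"},
    {"authors": "Silva J.", "title": "T1", "date": "01-01-2022"},
    {"authors": "Souza M.", "title": "T1", "date": "15-03-2021"},
])
print(to_csv(rows))
```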
TemplateContextBuilder
Generates context values for template responses
Source code: `app/adapters/presenters/template_context.py` (module `adapters.presenters`)
Compiles and builds data, such as context values, for the templates that Jinja renders, loading them into HTML templates that are returned as a Jinja2Templates `TemplateResponse` object.
Parameter | Type | Description |
---|---|---|
request | Request | The FastAPI Request object |
get_web_app_context
Returns the data needed to build the API web page response template: the request object, the template name, and the context values.
Context values:
Field | Description |
---|---|
request | The FastAPI Request object |
version | Application version. Example: 3.0.0 |
repository | URL of the application's GitHub repository |
swagger_url | Swagger page URL. Default: / |
token | Application Token |
filename | CSV filename. Default: articles.csv |
table_url | Table web page URL. Default: /scopus-survey/api/table |
search_url | API URL. Default: /scopus-survey/api/search-articles |
description | Application description |
get_table_context
Returns the data needed to build the Table web page response template: the request object, the template name, and the context values.
Context values:
Field | Description |
---|---|
request | The FastAPI Request object |
version | Application version. Example: 3.0.0 |
repository | URL of the application's GitHub repository |
swagger_url | Swagger page URL. Default: / |
content | Table content. List of the articles found or None if there are no articles |
web_app_url | Application web page URL. Default: /scopus-survey/api |
ExceptionJSON
Generates JSON representation responses for exceptions
Source code: `app/adapters/presenters/exception_json.py` (module `adapters.presenters`)
A presenter built on FastAPI `JSONResponse` that generates JSON representation responses for exceptions. The error details are filtered to remove the `PydanticUndefined` error from Pydantic `ValidationError`, and the `Request` object data is retrieved.
The timestamp is set as an ISO-format `str`, and finally all data is converted and encoded with the FastAPI `jsonable_encoder()` function.
Parameter | Type | Description |
---|---|---|
request | Request | The FastAPI Request object |
code | int | HTTP status error code |
message | str | Exception description |
errors | Errors | Error metadata and details |
Exceptions and Errors
HTTP Exceptions are models built from FastAPI `HTTPException` that represent HTTP error status codes sent in the response to notify the client of an error. The ones implemented are `401 Unauthorized`, `404 NotFound`, `422 UnprocessableContent`, `500 InternalError`, `502 BadGateway` and `504 GatewayTimeout`.
Application Errors are models built from the base class `Exception` that indicate an error has occurred in the core of the application's processing. The ones implemented are `InterruptError`, for the shutdown/exit interrupt signal, and `ScopusAPIError`, for the Scopus Search API HTTP status error.
Exception Handlers are routines that process and respond to exceptions/errors or specific special situations during program execution, returning their JSON representation. The implemented handlers are for Starlette `HTTPException`, FastAPI `HTTPException`, `RequestValidationError`, `ResponseValidationError`, `ValidationError`, `HTTPException`, `ApplicationError` and `Exception`.
About the Exception JSON Response:
Field | Type | Description |
---|---|---|
success | bool | Result of the operation, always a failure for an exception. Default: False |
code | int | HTTP error status code |
message | str | Exception/error description |
request | dict[str, Any] | Request data in a dict |
errors | Errors | Details of the exception/error in a dict. Default: None |
timestamp | str | The datetime timestamp as a str in ISO format |
About the `request` field:
Field | Description |
---|---|
host | The request client host. Default: 127.0.0.1 |
port | The request client port. Default: 8000 |
method | The request method |
url | The request URL path |
headers | The request headers |
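A sketch of assembling the exception body with the fields listed in the tables above. `exception_body()` is a hypothetical helper that builds a plain dict instead of using `JSONResponse` and `jsonable_encoder()`, the UTC timestamp is an assumption, and the request values are sample data.

```python
import json
from datetime import datetime, timezone
from typing import Any, Optional


def exception_body(code: int, message: str, request: dict[str, Any],
                   errors: Optional[Any] = None) -> dict[str, Any]:
    """Hypothetical sketch of the body produced for an exception."""
    return {
        "success": False,  # an exception is always a failed operation
        "code": code,
        "message": message,
        "request": request,
        "errors": errors,
        "timestamp": datetime.now(timezone.utc).isoformat(),  # ISO 8601 string
    }


body = exception_body(
    404,
    "Not Found",
    {"host": "127.0.0.1", "port": 8000, "method": "GET",
     "url": "/scopus-survey/api/unknown", "headers": {"accept": "application/json"}},
)
print(json.dumps(body, indent=2))
```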
About the `ScopusAPIError` error details:
Field | Description |
---|---|
status | HTTP status error code returned by the Scopus APIs |
api_error | Response status error description returned by the Scopus APIs. Default: null |
content | The Scopus APIs JSON response content itself |
Note
See the response status error descriptions in the documentation.
Middlewares
Middlewares are mechanisms built on top of the Starlette `BaseHTTPMiddleware` that work in the application's request-response cycle, intercepting calls and processing them. They can access and modify each request before it is processed by any route handler, and each response before it is returned. Three are implemented.
The `TraceExceptionControl` middleware traces each request, reporting the client, the URL accessed, the response status code, and the processing time. It also handles any unexpected exceptions and signal-interrupt errors.
The `RedirectNotFoundRoutes` middleware redirects any request that receives a `404 Not Found` error and does not target a mapped, allowed route. It also handles signal-interrupt errors.
The FastAPI `CORSMiddleware` implements and configures the CORS mechanism, allowing any origin, any credentials, any header, and only the `GET` method.
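The tracing behavior of `TraceExceptionControl` follows the `dispatch(request, call_next)` pattern of Starlette's `BaseHTTPMiddleware`. The sketch below mimics that pattern with plain `asyncio`; the `Response` dataclass, the `trace()` function, the log format, and mapping unexpected errors to a bare 500 response are all assumptions for illustration, not the middleware's actual code.

```python
import asyncio
import time
from dataclasses import dataclass
from typing import Awaitable, Callable


@dataclass
class Response:
    """Hypothetical stand-in for a Starlette response."""
    status_code: int


CallNext = Callable[[str], Awaitable[Response]]


async def trace(path: str, call_next: CallNext, log: list[str]) -> Response:
    """Hypothetical sketch of tracing plus a catch-all for unexpected errors."""
    start = time.perf_counter()
    try:
        response = await call_next(path)
    except Exception:
        response = Response(status_code=500)  # unexpected error -> 500
    elapsed_ms = (time.perf_counter() - start) * 1000
    log.append(f"{path} -> {response.status_code} ({elapsed_ms:.1f} ms)")
    return response


async def ok_route(path: str) -> Response:
    return Response(status_code=200)


async def failing_route(path: str) -> Response:
    raise RuntimeError("unexpected failure")


log: list[str] = []
print(asyncio.run(trace("/scopus-survey/api", ok_route, log)).status_code)  # 200
print(asyncio.run(trace("/scopus-survey/api/table", failing_route, log)).status_code)  # 500
print(log)
```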
SignalHandler
Set signal handlers to set the shutdown event flag
Source code: `app/utils/signal_handler.py` (module `utils`)
Creates an event object, either a threading `Event` or an asyncio `Event` depending on the parameter value, and registers handlers for the `SIGINT` and `SIGTERM` signals using the `signal()` function. The handlers catch shutdown signals and set the event flag, so that process-based or threaded operations can be terminated gracefully.
Info
A graceful shutdown is a controlled and orderly process to perform a safe shutdown and free up resources when the application is suddenly interrupted or receives a shutdown/kill signal.
Parameter | Type | Description |
---|---|---|
for_async | bool | Indicates whether the event will be asynchronous or not. Default: None |
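A minimal sketch of the event-plus-handlers approach using only `signal`, `threading` and `asyncio`. It is not the project's code: the class body and the `_handle()` method are hypothetical. `signal.signal()` must be called from the main thread, and for asyncio code `loop.add_signal_handler()` is the event-loop-aware alternative on Unix.

```python
import asyncio
import signal
import threading
from typing import Optional, Union


class SignalHandler:
    """Hypothetical sketch: SIGINT/SIGTERM handlers that set a shutdown flag."""

    def __init__(self, for_async: Optional[bool] = None) -> None:
        # asyncio.Event for coroutines, threading.Event for threaded code
        self.event: Union[asyncio.Event, threading.Event] = (
            asyncio.Event() if for_async else threading.Event()
        )
        # signal.signal() only works in the main thread of the main interpreter
        for sig in (signal.SIGINT, signal.SIGTERM):
            signal.signal(sig, self._handle)

    def _handle(self, signum: int, frame: object) -> None:
        # Only set the flag; workers poll it and stop at a safe point
        self.event.set()


handler = SignalHandler()
handler._handle(signal.SIGTERM, None)  # simulate a delivered signal
print(handler.event.is_set())  # True
```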